Day | Time | Alarm Color | Action |
---|---|---|---|
Weekday | Normal working hours | Yellow | Notify the administrator |
Weekday | Normal working hours | Red | Notify the administrator and log the occurrence |
Any day | Outside working hours | Yellow | Monitor the situation closely; notify the administrator on the next working day |
Any day | Outside working hours | Red | Notify the ON-CALL administrator; notify the administrator on the next working day |
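
The table above can be read as a simple lookup. The following is a minimal sketch, not part of any monitoring product: the function names and the 09:00 to 17:00 window are assumptions, since the table only says "normal working hours", and weekend daytime is treated here as outside working hours.

```python
from datetime import datetime, time

# Assumed working hours; the table does not define them.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def is_working_hours(when: datetime) -> bool:
    """True on Monday to Friday between WORK_START and WORK_END."""
    return when.weekday() < 5 and WORK_START <= when.time() < WORK_END


def action_for(color: str, when: datetime) -> str:
    """Return the operator action for a yellow or red alarm at a given time."""
    color = color.lower()
    if color not in ("yellow", "red"):
        raise ValueError(f"no action defined for alarm color {color!r}")
    if is_working_hours(when):
        if color == "yellow":
            return "Notify the administrator"
        return "Notify the administrator and log the occurrence"
    if color == "yellow":
        return "Monitor closely; notify the administrator on the next working day"
    return ("Notify the on-call administrator; "
            "notify the administrator on the next working day")
```

For example, `action_for("red", datetime(2024, 1, 6, 10, 0))` falls on a Saturday, so it returns the on-call action even though the time is mid-morning.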
Common causes are a flood of email caused by a mail loop of some kind (/var), too much auditing (/var, /home, even / or /usr), or system logs that have not been cleaned out on a regular basis (/var or /usr). Huge system logs indicate a failure of some kind and should be examined to determine the cause of the problem.
When /home fills up, the cause is most often the users themselves. Cleaning out Netscape caches in /home/*/.netscape/cache/*/* can provide temporary relief. For a more permanent solution, find out who is using the most disk space with "du -s /home/*" and ask those users by email to clean up their accounts. If all of the disk usage is actually required for valid reasons, then either expand the filesystem or create a /home2 and move some users to it. Another alternative is to move some users to a different server that has more capacity.
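
As a rough companion to "du -s /home/*", the sketch below ranks home directories by size so the heaviest users can be contacted first. The function names are hypothetical, and the totals are the sum of file sizes in bytes rather than the disk blocks that du reports, so the numbers will not match du exactly.

```python
import os


def dir_size(path):
    """Sum the sizes in bytes of all files under path, skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat so that symbolic links are not followed
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass
    return total


def largest_home_dirs(home="/home", limit=5):
    """Return up to `limit` (size_in_bytes, name) pairs, largest first."""
    sizes = []
    with os.scandir(home) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.name))
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    for size, name in largest_home_dirs():
        print(f"{size:>15d}  {name}")
```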
Yellow alarms are NOT serious and only serve as a warning so that the situation can be monitored more closely. The administrator does not need to be notified and a report is not required.
The inetd daemon is responsible for spawning processes to handle remote connections such as telnet, ftp, remote shells, and pop3 (required for PC-based mail clients). Without the inetd process the system cannot accept any new remote connections, and in most cases it will have to be accessed from the system console.
If the console cannot be logged into, then in most cases the system will require a hard boot. This involves pressing the reset button (if one exists) or powering the system OFF, waiting for at least 30 seconds, and then powering it back on.
Sendmail provides email delivery to and from a server and must be running at all times.
NNTP provides news service and must be running for users to be able to both read and send out news postings.
DNS provides host name lookup service and must be running 24/7. It is a critical service.
A report should be logged for all red alarms if they persist for more than 30 minutes and it seems like the queues are not clearing up. The administrator should be notified in all cases so that the cause can be determined and steps taken to handle the problem. Perhaps in some cases the alarm threshold will have to be raised.
They could indicate a problem with the Postoffice systems, a hostname resolution problem (DNS, NIS, or /etc/hosts), a disk space problem, or a problem with the sendmail daemon itself or its config file.
A report should be logged for all red alarms if they persist for more than 30 minutes and it seems like the queues are not clearing up. The administrator should be notified in all cases so that the cause can be determined and steps taken to avoid the problem in the future.
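
The reporting rule above combines two conditions: the red alarm has lasted more than 30 minutes, and the queues do not appear to be draining. A minimal sketch of that check follows; the function name and the idea of comparing the first and last queue-length samples are assumptions made for illustration, not a documented procedure.

```python
from datetime import datetime, timedelta

PERSIST_LIMIT = timedelta(minutes=30)


def should_log_report(red_since, now, queue_samples):
    """Decide whether a persistent red queue alarm warrants a logged report.

    red_since     -- datetime when the alarm first turned red
    now           -- current datetime
    queue_samples -- queue lengths observed since red_since, oldest first
    """
    persisted = (now - red_since) > PERSIST_LIMIT
    # Assumed test for "not clearing up": the latest sample is no smaller
    # than the earliest one.
    not_clearing = len(queue_samples) >= 2 and queue_samples[-1] >= queue_samples[0]
    return persisted and not_clearing
```

The administrator is notified in every case; this check only decides whether a report is also logged.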
Sometimes a single ping will fail over a slow serial connection, but subsequent ones will succeed. This is why more than one ping is normally sent and the number of responses is counted. If 0 responses are received, the alarm is set to red. If all are returned, the alarm is set to green. The yellow threshold can be set somewhere in the middle for unreliable connections.
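
The counting rule can be sketched as below. The source does not say exactly how the yellow threshold is applied, so this sketch assumes one reading: a count at or above the threshold is green, anything lower but non-zero is yellow, and the threshold defaults to the number of pings sent. The function and parameter names are hypothetical.

```python
def ping_alarm(sent, received, yellow_threshold=None):
    """Map a ping response count to an alarm color.

    sent             -- number of pings sent (must be positive)
    received         -- number of responses counted
    yellow_threshold -- assumed meaning: minimum responses still treated as
                        green; defaults to `sent`, so any loss is yellow
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    if yellow_threshold is None:
        yellow_threshold = sent
    if received == 0:
        return "red"
    if received >= yellow_threshold:
        return "green"
    return "yellow"
```

On an unreliable serial link, `ping_alarm(5, 3, yellow_threshold=3)` stays green, while the default threshold would report the same result as yellow.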
SMTP provides mail service and must be running for users to be able to receive and send out mail messages.
These alarms generally indicate a problem with either the system load (usually accompanied by a CPU load alarm), a full or nearly full filesystem (again usually accompanied by a Free Space alarm), a failure of the inetd daemon (which is usually the method by which the POP3 daemon is started), or a more serious problem such as the removal of all or part of the required software. The administrator should be notified so that the problem can be tracked down and fixed. If the POP3 service is not required, then it should be removed from the alarm checks.
HTTP provides web service and must be running during normal working hours.