ECLIPSE Help
- Introduction
- Client configuration
- Client configuration file format (ECLIPSE/etc/client*)
- Server configuration
- Server configuration files (ECLIPSE/etc/server.*)
- Web page generation
- Sample client checks
Introduction
ECLIPSE was written to give operations staff and system administrators a simple, extensible tool for monitoring the status of multiple clients and network connections. All monitoring is done through automatically generated Web pages that have a refresh built into them.
All code is written in plain Tcl and requires Tcl version 7.5 or later. It has been extensively tested and is supported on the following platforms: SunOS 4.1.x, Solaris 2.x, AIX 3.2.5, AIX 4.x, and Linux.
Client configuration
The default client configuration file (ECLIPSE/etc/client) will check disk space, CPU load, daemon processes, and messages in the mail queue.
It is also possible to check connections from a client to remote hosts, using either ping or a TCP socket connection to any given port.
Custom client configuration files can be created with additional checks to be performed.
The client (ECLIPSE/bin/eclipse_client) will first look for ECLIPSE/etc/client.hostname.domain, then ECLIPSE/etc/client.hostname, and finally default to ECLIPSE/etc/client.
All client configuration files can thus be created in one master location and distributed to all clients.
Additionally, it is possible to simply NFS mount the ECLIPSE directory on the client in a standard location (the HOME of the user running the client, /opt, /apps, or /usr/local).
The client will work even if the directory is mounted read-only.
You can also specify the location of the ECLIPSE directory explicitly with the ECLIPSE environment variable.
If a report cannot be sent to one of the servers, a message is logged using the unix logger utility.
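The configuration file lookup order described above can be sketched as follows. This is only an illustration: the proc name find_client_config and its arguments are hypothetical and are not taken from eclipse_client itself.

```tcl
# Hypothetical helper (an assumption, not the actual eclipse_client code).
# Returns the first client configuration file that exists under
# $eclipse_dir/etc, trying the most specific name first, or an empty
# string if none of the three candidates exists.
proc find_client_config {eclipse_dir hostname domain} {
    set candidates [list \
        $eclipse_dir/etc/client.$hostname.$domain \
        $eclipse_dir/etc/client.$hostname \
        $eclipse_dir/etc/client]
    foreach path $candidates {
        if {[file exists $path]} {
            return $path
        }
    }
    return ""
}
```

A caller would typically pass the value of the ECLIPSE environment variable as eclipse_dir and source the returned file if the result is not empty.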
Client configuration file format (ECLIPSE/etc/client*)
The client configuration file is a Tcl script that is sourced by the client script and sets the gv(monitors) and gv(monitor) variables.
The format of the file is indicated below.
set gv(monitors) {}
lappend gv(monitors) [list {server} {port}]
lappend gv(monitors) [list {server} {port}]
...
set gv(monitor) {}
lappend gv(monitor) [list {label} {command} {helpurl} {condition} {condition} ...]
lappend gv(monitor) [list {label} {command} {helpurl} {condition} {condition} ...]
...
Where :
- server is the name of a host running an ECLIPSE server
- port is the port to connect to (normally 1997)
- label is a label to be shown on the Web page describing the check
- command is the check command to run on the client (one of the commands listed below)
- helpurl is a link to a Help URL for the check
- condition is a condition statement of the form {color op value}
- color is one of green, yellow, or red
- op is one of =, !=, <, >, <=, >=
- value is the value to compare against
Note that if the helpurl starts with a "#" mark, then it is assumed to be a name reference in standard.htm and will be prefixed with info/standard.htm.
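To make the {color op value} conditions concrete, here is a minimal sketch of how a returned value might be turned into a color. The document does not say how several matching conditions are resolved, so this sketch assumes that the most severe matching color wins and that green is the default. The proc name eval_conditions is hypothetical.

```tcl
# Hypothetical helper. Assumptions (not stated in this document):
#   - severity order is green < yellow < red
#   - the most severe matching condition wins
#   - green is returned when no condition matches
proc eval_conditions {value conditions} {
    set rank(green) 0
    set rank(yellow) 1
    set rank(red) 2
    set result green
    foreach cond $conditions {
        # Each condition is a three-element list: color op value
        set color [lindex $cond 0]
        set op    [lindex $cond 1]
        set limit [lindex $cond 2]
        # The file format writes equality as "=", Tcl's expr needs "=="
        if {[string compare $op "="] == 0} {
            set op "=="
        }
        # Builds an expression such as: $value < $limit
        set matched [expr "\$value $op \$limit"]
        if {$matched && $rank($color) > $rank($result)} {
            set result $color
        }
    }
    return $result
}
```

Under these assumptions, the conditions {yellow < 5} {red = 0} would classify a value of 0 as red, 3 as yellow, and 5 as green.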
- Command : diskfree directory
- This command will return the number of free Kbytes in the file system containing directory.
- Command : uptime ( users | cpu5 | cpu10 | cpu15 )
- This command will return the number of users currently on the system, or the load average over the past 5, 10, or 15 minutes.
- Command : process searchstring
- This command will return a count of the number of lines in a complete process listing that contain the indicated searchstring.
- Command : filecount filespec
- This command will return a count of the number of files matching the supplied filespec.
- Command : fileage ( atime | ctime | mtime ) value filespec
- This command will return a count of the number of files matching the supplied filespec that are older than the indicated age (in seconds).
The access time (atime), inode change time (ctime, often loosely called creation time), or modification time (mtime) may be checked.
- Command : ping hostname count
- This command will return the number of successful returns of count pings to the indicated host.
- Command : connect port host ...
- A TCP socket connection on the indicated port will be attempted to each of the listed hosts and a count of the number of successful connections will be returned.
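Putting the format and the commands together, a small client configuration file might look like the sketch below. The server name, labels, help anchors, and thresholds are all illustrative assumptions. How overlapping conditions (such as yellow and red on the same check) are resolved is not stated here, so verify the thresholds against your installation.

```tcl
# Example ECLIPSE/etc/client file (illustrative values only).

# Servers that receive this client's reports (hypothetical host name)
set gv(monitors) {}
lappend gv(monitors) [list {monitor1.example.com} {1997}]

set gv(monitor) {}
# Free space in the file system containing /var, in Kbytes
lappend gv(monitor) [list {/var free} {diskfree /var} {#diskfree} {yellow < 50000} {red < 10000}]
# Load average as reported by the cpu5 keyword
lappend gv(monitor) [list {CPU load} {uptime cpu5} {#uptime} {yellow > 2} {red > 5}]
# Number of process listing lines containing "sendmail"
lappend gv(monitor) [list {sendmail daemon} {process sendmail} {#process} {red = 0}]
```

The help anchors starting with "#" rely on the standard.htm prefixing rule noted above. Whether anchors with these names exist in standard.htm is an assumption.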
Server configuration
Servers can monitor any number of clients, and a given client can send its reports to multiple servers.
The processing burden for the server daemon is very low, and the Web page construction (done by running ECLIPSE/bin/eclipse_web) can be configured to run at any interval, the default being every 5 minutes.
A lock file (/tmp/eclipse_web.pid) is used to prevent multiple copies of eclipse_web from running at the same time.
A script named ECLIPSE/bin/eclipse_checkdaemon is provided to ensure that the server is running.
It is designed to be invoked by cron at frequent intervals (every 5 minutes by default) and will start the server if it is not already running.
The eclipse server daemon itself (ECLIPSE/bin/eclipse_server) listens on port 1997 for incoming client reports.
Identification of clients is based upon the IP address corresponding to the incoming socket connection.
Mapping the IP addresses to client names is controlled by the ECLIPSE/etc/hosts file.
If a report comes in from an unidentified IP address, then the client_name is set to UNKNOWN:IP_address.
This provides a convenient way to identify new clients, clients that have changed their IP addresses, and a way to map several IP addresses to the same hostname (in the case where a client has multiple adapters and dynamic routing).
The reports are dropped by the server daemon (ECLIPSE/bin/eclipse_server) into the directory ECLIPSE/logs/client_name and the filename corresponds to a timestamp of when it was received (YYYYMMDDHHMM).
Old reports are removed by the eclipse_web script.
If for some reason this script does not run, the reports will accumulate.
Depending on the number of clients that are being monitored, this could rapidly cause disk space problems.
By default, a maximum of 12 reports per client is kept (at 5 minutes apart, this gives 1 hour of coverage), but this may be changed by setting gv(maxkeep) in the server configuration file.
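The retention rule above can be sketched as a small helper that decides which report files fall outside the newest gv(maxkeep). The proc name reports_to_remove is hypothetical, and this is not the actual eclipse_web code. It relies only on the fact that YYYYMMDDHHMM filenames sort chronologically as plain strings.

```tcl
# Hypothetical helper (an assumption, not taken from eclipse_web).
# Given the report filenames of one client directory, return the oldest
# ones that exceed the newest $maxkeep reports.
proc reports_to_remove {filenames maxkeep} {
    # YYYYMMDDHHMM names sort chronologically with a plain string sort
    set sorted [lsort $filenames]
    set excess [expr {[llength $sorted] - $maxkeep}]
    if {$excess <= 0} {
        return {}
    }
    return [lrange $sorted 0 [expr {$excess - 1}]]
}
```

A cleanup pass could then call this with the result of a glob over one ECLIPSE/logs/client_name directory and delete each returned file.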
Server configuration files (ECLIPSE/etc/server.*)
If a file named ECLIPSE/etc/server.hostname exists, then when the web pages are built by ECLIPSE/bin/eclipse_web, a link will be added that points to each of the servers listed in the file.
The file is sourced by the ECLIPSE/bin/eclipse_web script and must contain properly formatted Tcl code.
set gv(servers) {}
lappend gv(servers) [list {name} {URL}]
lappend gv(servers) [list {name} {URL}]
...
Where :
- name is the server name to show on the page
- URL is the URL corresponding to the server name
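As an illustration, a server configuration file could look like the following. The server names and URLs are placeholders, and placing gv(maxkeep) in this same file is an assumption based on the description of that variable in the Server configuration section.

```tcl
# Example ECLIPSE/etc/server.hostname file (placeholder names and URLs).
set gv(servers) {}
lappend gv(servers) [list {East site} {http://eclipse-east.example.com/eclipse/}]
lappend gv(servers) [list {West site} {http://eclipse-west.example.com/eclipse/}]

# Assumed to belong in this file: keep 24 reports per client
# instead of the default 12.
set gv(maxkeep) 24
```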
Web page generation
Web page generation is done by the ECLIPSE/bin/eclipse_web script.
The latest information is read from the log files in ECLIPSE/logs/* directories.
These log filenames correspond to timestamps so that the latest one can be easily identified.
Each line in the file corresponds to a check that was performed on the client and must be in Tcl list format. The fields are :
- hostname is the hostname of the client
- timestamp is the timestamp of when the check completed (seconds since the epoch)
- status is the color to show on the web page
- label is the text to show in the table identifying the check
- command is the actual client command that was run
- value is the value returned from the command
- helpurl is the URL to jump to for help
- condition 1 is a condition in the form color operator value
- condition 2 is a condition in the form color operator value
- ...
Note that if the helpurl starts with a # sign, then it is assumed to be a link to a name within the standard.htm file and will be prefixed with info/standard.htm.
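Because each report line is a Tcl list, its fields can be picked apart with lindex and lrange. The sketch below uses a made-up report line and a hypothetical proc name, parse_report_line. It is not the actual eclipse_web code.

```tcl
# Hypothetical helper: split one report line into named fields and apply
# the "#" prefixing rule to helpurl. Returns a name/value list suitable
# for "array set".
proc parse_report_line {line} {
    set rec(hostname)   [lindex $line 0]
    set rec(timestamp)  [lindex $line 1]
    set rec(status)     [lindex $line 2]
    set rec(label)      [lindex $line 3]
    set rec(command)    [lindex $line 4]
    set rec(value)      [lindex $line 5]
    set rec(helpurl)    [lindex $line 6]
    # Everything after helpurl is a condition
    set rec(conditions) [lrange $line 7 end]
    if {[string match "#*" $rec(helpurl)]} {
        set rec(helpurl) "info/standard.htm$rec(helpurl)"
    }
    return [array get rec]
}

# A made-up report line in the field order listed above
set line {myhost 875275200 yellow {/var free} {diskfree /var} 42000 #diskfree {yellow < 50000} {red < 10000}}
array set report [parse_report_line $line]
puts "$report(hostname): $report(label) is $report(status) ($report(value))"
```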
Sample client checks
- Checking SMTP connections to multiple remote servers
- The following command will attempt to open a TCP socket connection on port 25 to each of the servers in turn and return the number of successful connections. This would normally be used to check a site that is served by a load-balanced group of servers using MX records. You would normally set a red threshold of 0, indicating that no connection was possible.
connect 25 server1 server2 server3...
- Checking connection to a remote host using ping over a bad link
- The following command will send 5 pings to the indicated host. In the conditions, you should set the yellow threshold somewhere in the range 1 to 5 (meaning a bad but working connection) and the red threshold to 0 (meaning no connection).
ping host 5
- Checking for undelivered mail
- This command will check the sendmail queue for messages that are "stuck". A count of the number of messages that were created more than 10 minutes ago is returned.
fileage ctime 600 /var/spool/mqueue/q*
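For reference, the three sample checks above might appear in a client configuration file as the entries sketched below. The labels, help anchors, and the yellow thresholds are assumptions chosen for illustration. The host names are the same placeholders used in the samples.

```tcl
# Sample checks written as gv(monitor) entries (illustrative values only).
# Red when none of the SMTP servers accepts a connection
lappend gv(monitor) [list {SMTP servers} {connect 25 server1 server2 server3} {#connect} {red = 0}]
# Yellow when some of the 5 pings are lost, red when all are lost
lappend gv(monitor) [list {Link to host} {ping host 5} {#ping} {yellow < 5} {red = 0}]
# Yellow as soon as any queue file is older than 10 minutes (600 seconds)
lappend gv(monitor) [list {Stuck mail} {fileage ctime 600 /var/spool/mqueue/q*} {#fileage} {yellow > 0}]
```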
September 26 1997 - John van Gulik