Statmgr Overview

(last revised 10 July, 2002)

Statmgr monitors error messages produced by other Earthworm modules and determines whether and how to report each error. Errors are reported by sending email or by generating TYPE_PAGE messages. User-provided software can then pick up a TYPE_PAGE message and hand it to paging software, which transmits the message via modem to a pager service. Statmgr also monitors the heartbeats of client modules; if heartbeats are not received, an email and/or pager message is produced.

Statmgr has a restart feature that allows the system to recover from a hung module by restarting only that module. Any module can request to be restarted if its heartbeat stops; for modules that do not make this request, no restart attempt is made. If statmgr detects that heartbeats from a module that requested restarts have stopped, it sends a message of type TYPE_RESTART to the startstop program. Startstop then kills the module process and restarts the module.
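
The heartbeat and restart behavior described above can be illustrated with a short sketch. This is not statmgr's actual code; it is a minimal Python model written under assumptions. The names MonitoredModule, timeout_sec, wants_restart and check_heartbeats are hypothetical, and the "notify" and "restart" actions stand in for the email/pager report and the TYPE_RESTART message respectively.

```python
from dataclasses import dataclass


@dataclass
class MonitoredModule:
    """Hypothetical record of one module whose heartbeat is being watched."""
    name: str
    timeout_sec: float           # assumed: longest allowed gap between heartbeats
    wants_restart: bool          # assumed: module asked to be restarted when hung
    last_heartbeat: float        # time (seconds) the last heartbeat arrived
    reported_dead: bool = False  # True once the current outage has been reported


def check_heartbeats(modules, now):
    """Return (action, module_name) pairs for modules whose heartbeat expired.

    "notify" models the email and/or pager report; "restart" models a
    TYPE_RESTART request, issued only for modules that asked for it.
    """
    actions = []
    for m in modules:
        if now - m.last_heartbeat <= m.timeout_sec:
            m.reported_dead = False   # heartbeat is current; clear any old outage
            continue
        if m.reported_dead:
            continue                  # this outage was already handled
        m.reported_dead = True
        actions.append(("notify", m.name))
        if m.wants_restart:
            actions.append(("restart", m.name))
    return actions


if __name__ == "__main__":
    watched = [
        MonitoredModule("module_a", 30.0, True, last_heartbeat=0.0),
        MonitoredModule("module_b", 30.0, False, last_heartbeat=0.0),
    ]
    for action, name in check_heartbeats(watched, now=60.0):
        print(action, name)
```

In this sketch each outage is acted on once; whether statmgr itself repeats notifications or restart requests is not stated here, so that choice is an assumption of the example.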

For each module monitored by statmgr, a descriptor file must exist and must be listed in the statmgr configuration file. The Earthworm convention is to use the suffix '.desc' to indicate a descriptor file. In the descriptor file, the user may specify the following: