April 17, 2002
The following notes are intended to be a general guide to installing an Automatic Earthworm system. Note that, the Earthworm project consists of the Automatic and Interactive portions of Earthworm. As described in Section1,"Overview", the Automatic portion consists of the "real-time", rapid-response modules, and the Interactive portion consists of the DBMS and web-page codes. These notes deal with the installation of the Automatic portion of Earthworm.
These notes are based on the experiences gained from numerous installations, and represent an idealized, average situation. In practice, of course, each installation presents unique challenges which require specific solutions. It is hoped, however, that these notes may be of use in preventing common problems and expediting the process.
Installation tasks can be divided into those requiring primarily computer engineering skills, and those requiring seismological expertise. Engineering tasks include interfacing to communication links, integration of computer hardware, establishing network connections, and most of the software configuration.
Seismological tasks include determining the desired processing functions, establishing agreed-upon data exchanges, and configuration of the core processing modules. The lists of such modules is growing, but currently consists of the P-phase picker ("pick_ew"), the event associator ("bind_ew"), the STA/LTA trigger ("carlstatrig" and "carlsubtrig"), and hypoinverse. In particular, configuring the hypoinverse magnitude calculation parameters has proven to be a significant seismological task.
1. DETERMINE THE DESIRED FUNCTIONS
The main body of documentation is available on the web (http://folkworm.ceri.memphis.edu/ew-doc), and additional features are being continuously developed. New processing modules are initially documented in the release notes and subsequently integrated into the documentation set at the above web site. However, for planning purposes, Earthworm currently offers the following general features:
1.1. Analog data acquisition: 12 or 16 bit A/D conversion, up to 256 channels per Windows 2000 machine, any number of such machines per system.
1.2. Digital acquisition: Currently, most commercial data loggers are supported. All data streams, whether analog or digital, can be processed in the same manner.
1.3. Auto Location: Automatic P-picking, event association, and hypo-inverse solutions.
1.4. Auto Notification: Selective event notification via e-mail or pager system.
1.5. STA/LTA Events: Traditional STA/LTA coincidence event generation.
1.6 Event Archive: Events can be archived in various existing formats, including PC-SUDS, SAC, AH, and UW2. Other archive formats will be integrated as required.
1.7. Error Processing: email and pager notification of system malfunctions.
1.8. Manual Events: Events can be declared manually within the time interval available in the on-line trace data storage (below).
1.9 On-line Trace Data Storage: all acquired trace data can be made available via an IP server for a configurable time period, ranging from hours to weeks.
1.10. Distributed systems: Any number of Earthworm systems can be linked via IP communication links. The extend of such linkages is configurable. Possible configurations include a 'master' central site and any number of remote nodes operating as one integrated system, as well as limited, negotiated linkages between two sovereign central systems.
1.11. Multi-machine configurations: Any number of networked Windows 2000 or Solaris computers can be combined to run one Earthworm system.
1.12 Data Exchange: Any type of internal data can be exchanged in real time with other Earthworm systems. This exchange mechanism has proved to be quite reliable, and has performed well in numerous applications. It requires a communications link capable of physically interfacing to a computer's ethernet port and capable of carrying IP messages. It functions over the public internet as well as private dedicated links.
1.13. Long period Events: A volcanic long-period event detector, developed by John Evans at Menlo Park.
2. ESTIMATE THE HARDWARE REQUIREMENTS:
Earthworm has been developed under the mandate of speed and reliability. It therefore tends to use fast, low-level operating system services rather than high-level end-user packages. A side effect of this is that Earthworm is fairly economical in terms of hardware requirements. In general, a contemporary Pentium- or Ultra- class machine is adequate for most regional network loads. Specifically, the hardware requirements can be divided into four categories:
a. Data acquisition: This can be via the ethernet network port, system bus, or serial ports. In practice, we have not observed problems in the first two cases. We have observed problems with serial port acquisition, especially when running on the same machine as the bus-based A/D acquisition. The symptoms are intermittent data loss from the serial ports, and warning messages from the A/D subsystem.
b. Numeric processing: This is not generally a problem when the current, common processing features are enabled. A contemporary machine (PentiumII or Sparc Ultra) has been demonstrated to be capable of supporting the numeric and I/O requirements of processing a large events on over 500 channels in rear-real time. However, modules are currently being worked on which may require considerable numeric resources.
c. On-line Trace Storage: This is provided by the WaveServerV module, which maintains a circular, on-line history of trace data . Each channel is maintained in a separate file, and the user is free to specify the location and size of each such file. In operation, trace data is written to these files as it is acquired, and multiple concurrent client programs can request data via network connections. Therefore, the critical hardware resources are disk storage capacity, disk access time, and network bandwidth. Disk storage requirements can be estimated by multiplying the number of channels to be served, the sampling rate, the number of bytes per sample, the time interval to be available, and an overhead factor of perhaps ten percent. Disk access time depends on the anticipated usage. A write is generated as each trace packet is acquired, and a burst of reads is generated with each client request. In practice, a contemporary 10,000 RPM SCSI disk supporting 64 channels seems conservative; twice that load is successfully functioning at some sites.
Network bandwidth requirements are critical on the input side, as the trace input streams are real-time. The server-side responses, or course, are not. In situations where this is a concern, a second, dedicated ethernet subnet is usually established to carry real-time trace data, and the participating machines are equipped with two (or more) network adapters.
d. Event storage and processing: As mentioned above, Earthworm can produce directories of event files from various triggers in various formats. Adequate disk space has to be provided to hold such events until they are moved to off-line media. Considerations there include the expected worst-case seismicity levels, time of storage, and off-line media capacity and life-time.
* Determine the hardware required (or available) at each node and the central site.
Given the desired features, as discussed above, and the above considerations, it is then possible to determine the hardware required at each node.
* Identify available communication link between central and remote nodes. The relevant parameters include nominal bandwidth, likelyhood of degraded performance, and interruptions. This may affect the features to be activated at the various nodes: how autonomous should they be, how much data should they be able to store.
3. PRODUCE A TEST EVENT
This is an optional step which has proven to be extremely useful in rapidly producing a reliable configuration in low-seismicity areas. It will not be needed until testing, but the effort should be initiated early to produce a timely result. The procedure is to find a suitable past event recorded on digital media, and to convert this to an Earthworm 'player' file. Earthworm includes a 'player' module ("waveserver") which can be used to repeatedly inject the trace data of the event to test the operation of the system.
This usually requires help from members of the Earthworm development team, and may require some effort, although some conversion software from other formats exists (such as PC-SUDS and CUSP).
4. PRODUCE A NETWORK STATION LIST
* Request an FDSN network code by submitting the form at http://fdsn.org/getcode.html. This two-letter code is used throughout the system as part of the trace data naming convention. It is also essential if any data exchange with other networks is contemplated.
* Request an Earthworm "Installation Id" from Mitch Withers, CERI, Memphis. Mitch will update the distribution to include your new installation id, and issue an updated "earthworm_global.d" to all involved installations.
* Earthworm uses the IRIS "Station, Component, Network" channel naming convention. See the IRIS web site for details. An installation is free to define its own naming convention, but this may present problems if linkages to other institutions are desired.
* Define all station names and locations, and produce a station list in hypoinverse format. If some parameters are not known, use unique temporary names and values which can later be search-and-replaced with real values.
5. CREATE THE EARTHWORM SOFTWARE DIAGRAMS
For each node, draw a diagram showing modules, message rings, and message paths. Examples are shown in Examples. The diagram should include:
* Final naming structure:
It is encouraged that each module in an installation be given a unique module name ( MOD_ID) and that the name include an identifier indicating which node this module runs in. This is not required, but in practice avoids confusion during initial configuration, and especially for later modifications. It has been found that finalizing the names (as discussed below) at this time can avoid a prolonged debugging period later.
* Identify port numbers for each node:
Various modules which send and receive data involve IP addresses and port numbers. To avoid later conflicts, the diagram should include the final values of these parameters.
* Note that each node must include a 'startstop' and 'statmgr' module to permit startup and re-starts of modules.
5.1 EARTHWORM NAMING SCHEME (BACKGROUND):
There are four types of names within the Earthworm system: "Installation Id", "Module Id", "Message Type", and "Ring Name". Each type of name is represented as an ASCII string at the human interface level, but is resolved to a numeric value when the system is running. The motivation for the naming scheme is that
* Earthworm is a message-passing system,
* An Earthworm system consists of many modules which listen to and produce messages, and
* Many Earthworm systems can exchange messages.
To implement this, each message, as it is generated, is given a 'shipping label' which contains the numeric values of "Installation Id", "Module Id", "Message Type". The "Installation Id" is the name of the installation at which this Earthworm is running. The "Module Id" is the name of the module which generated this message, and "Message Type" is the name of the type of message this is (eg. trace data, pick, location, heartbeat, etc)
Since any message may be shipped to any installation, this combination has to be unique amongst all Earthworm installations. This is assured by keeping the "Installation Id" unique. These are assigned by Mitch Withers, CERI Memphis Tennessee.
A module specifies which messages it is interested in listening to by specifying the 'shipping label'. Therefore, to be general, the "Module Id" must be unique within an installation. (Note that a module may specify a list of desired 'shipping labels', as well as specify wild-card values for selected portions of a label.)
In order to facilitate meaningful data exchange, there have to be "Message Type" values which are common to all installations. That is, if two installations decide to exchange pick data, both systems must have the same value for "Message Type" pick. Any installation can, of course, define additional private "Message Type" for its own internal purposes.
"Ring Names" are used by each module to specify which message ring the module is to 'listen to', and which ring it is to broadcast messages into. These names must be unique to each node. A set of traditional names have evolved, (eg. WAVE_RING, HYPO_RING, etc) and it is recommend to adhere to those, as it greatly reduces the amount of editing required.
The names appear in the configuration files of each module in the form of ASCII strings. The tables which define the numeric values of those strings are stored in two files: "earthworm.d" and "earthworm_global.d". "earthworm_global.d" contains the definitions which are common to all Earthworm sites: the "Installation Id" and "Message Type" values which are exchanged between installations. Changes to this file are coordinated through Mitch Withers, CERI, Memphis. "earthworm.d" contains definitions for local "Message Type", "Module Id", and "Ring Name".
In the Earthworm release these two files are included in the subdirectory "environment". They have to be moved from there to the local /run/params directory (see section 7).
6. LOAD THE EARTHWORM SOFTWARE
The Earthworm Directory Structure:
.../earthworm /run /params /log /vx.xx /bin /environment
/earthworm is the root of the Earthworm software;
/run is the directory which specifies and contains all data specific to an earthworm configuration. Many such 'run' directories can be stored under /earthworm, each specifying a distinct construction. Note that only one such construction can be running at a time.
/params contains the files specifying the configuration and environment for this earthworm construction.
/log contains all log files generated by this construction.
/vx.xx represents some earthworm version, as obtained from the distribution.
/bin contains all executables for this version, as obtained from the distribution, or by local compilation.
/environment contains files of global definitions a discussed in next section. Some of these must be moved to the /params directory.
Loading the Software:
* Create the root directory for Earthworm (e.g. c:\earthworm). Bring the current Earthworm release from FTP site into this directory, and unpack it there.
* Note that the executable binaries are in separate directories of the ftp site. The appropriate version of the executables should be placed in the \bin directory (e.g. c:\earthworm\v#.#\bin).
* Note that the "src" (\earthworm\v#.#\src) directory of the release contains a subdirectory for each module. Each such directory contains the source and make files, as well as example error processing configuration files (*.desc) and parameter files (*.d).
7. CREATE THE RUN DIRECTORIES
It is recommended that each node have the same directory structure. This is not required, but has been found useful in simplifying maintenance and updates.
* Under the \earthworm root directory, create a 'run' directory for each node. (ie., c:\earthworm\run_central, c:\earthworm\run_remote1, c:\earthworm\run_remote2 etc)
* Under each such run directory create a \log and a \params directory. The \log directory will contain all log files witten by modules in this node. The \params directory will contain all configuration files (.d and .desc) neccessary to run this node.
* Copy three files from the release the each of the \parms directories. These are "earthworm_global.d", "earthworm.d", and "ew_nt.cmd" (or "ew_sol_sparc.cmd", or "ew_sol_intel.cmd", depending on the platform). These files are located in the earthworm release in the \environment directory.
8. CONFIGURE THE EARTHWORM ENVIRONMENT FILES
* The earthworm_global.d file should contain your Installation Id. Contact Mitch Withers, CERI, Memphis, who acts as control for this file. Local editing of this file is possible, but will prevent the installation form exchanging data with other sites.
* Edit earthworm.d: Using your Earthworm software diagram, enter a definition for each each mesasage Ring Name and Module ID at each node. At this point, do not alter the Message Types found there, but use the default entries.
* Create a .cmd file for each node. This file contains the environment variables used by the Earthworm system. As mentioned above, these files are platform specific, and currently three versions exist: for Windows 2000 and NT: "ew_nt.cmd", for Solaris on Sparc: "ew_sol_sparc.cmd", and for Solaris on Intel: "ew_sol_intel.cmd". (OS/2 has sadly been dropped). Thus, for example, one would create files such as "ew_sol_sparc_central.cmd", "ew_nt_remote1.cmd", and "ew_nt_remote2.cmd".
In each file, edit the definitions for EW_INSTALLATION, EW_HOME, EW_VERSION, EW_PARAMS and EW_LOG. The remainder of the file requires no change:
EW_INSTALLATION is the installation id of this institution, as listed in "earthworm_global.d".
EW_HOME is the earthworm root directory ("/earthworm").
EW_VERSION is the version of the Earthworm release to be used (eg. v4.0). This permits several versions to be stored concurrently. Changing this definition will cause various versions to be run.
EW_PARAMS specifies the path to the dirctory from will paramete files will be read (eg. /earthworm/run_central/params). This will be different for each node.
EW_LOG similarly determines where log files will be written.
9. MODULE CONFIGURATION
For each module within a node you will need to create a parameter (.d) and an error processing file (.desc). Samples of these files are in the source directory for each module (e.g. c:\earthworm\v#.#\src).
To configure a module, first create the invocation paragraph for the module in startstop_nt.d (see below). Then copy the sample .d and .desc files from that module's source directory. Change the names to match the name of the Module Id. Edit the files to suit the configuration. At a minimum this involves changing the name of the Module Id (this is where the executable learns its Module Id), Installation Id (in the .desc file) and possibly the ring names. Change any other parameters to suit the configuration.
"startstop" and "statmgr" are required to run in all nodes an an earthworm system. Since the "startstop" and "statmgr" module's parameter and error processing files contain settings that control and effect the overall node it is advised to first configure these modules. "startstop" is the module which starts first, and brings up the system by creating the specified rings and starting each module. "statmgr" is responsible for processing error messages created by other modules, monitoring their heartbeats, and issuing restart requests when a module's heartbeat ceases (see below).
This module is system-specific. Therefore, there are system-specific configuration files (startstop_nt.d and startstop_sol.d) It's configuration file contains the substance of the earthworm configuration diagram (Sect.4). It first lists the rings required for this construction, and then lists a paragraph for each module to be started. This paragraph lists the name of the module's executable, the configuration file it is to read when starting, the priority it is to run at, and (in the Windows 2000 case) whether the module is to have its own window.
This module's configuration file (statmgr.d) lists the error processing files (.desc) of all modules which it is to keep track of. Here are listed the procedures to be followed for the various error types which a client module may issue (email, page, etc), as well as the token "restartMe". If this token appears in a module's .desc file, statmgr will initiate the restart procedure for that module if it's heartbeat should not appear within the stated time period. This procedure involves issuing a 'kill' to the offending module's process id, and "startstop" will then restart the module in a manner identical to that used at startup.
Note that "statmgr", like all modules, has one input. That is, it attaches and listens to one message ring. In order for it to receive error and heartbeat messages from all modules in a node, there are helper modules which 'conduct' such messages from one ring to another. These are the module "copystatus". They are extremely simple, and have no associated .d or .desc files. They are listed in "startstop_xx.d" as other modules, but their command line arguments, instead of being the name of the configuration file, name the input and output rings to which they are to attach.
9.3 OTHER MODULES
The Earthworm release currently includes over 50 modules. These are described on the Earthworm web site (http://folkworm.ceri.memphis.edu/ew-doc). However, several key modules are discussed below:
These modules perform long-distance message exchange between Earthworm systems. The Import ("import_generic") module is supplied with the IP address and port of the Export it is to accept connections from. When it senses a connect request, it connects, and places all incoming messages on it's specified output message ring. The 'shipping label' of the message is that of the incoming message. It exchanges predetermined heartbeat texts with it's Import at specified rates. If the incoming heartbeat text does not match the specification in its configuration file, it closes the connection. If the heartbeat from its Import does not arrive on time, it re-initialized the connection.
The Export module is given the address of the network port through which they are to communicate - the assumption is that the host machine may have several ports - and the port number to use. Its is given the heartbeat text which it is to send to its Imports, the rate to send them, as well as the text and expected rate of the incoming heartbeats. If the incoming heartbeat does not arrive on time, the connection is closed and re-initialized. It has buffering capability to prevent loss of messages during this process. The Export module comes in two varieties:
"export_generic" accepts a list of shipping labels which it is to ship. "export_scn" ('scn' as in Station Component Network), specializes in shipping only trace messages, and accepts a list of SCN names to ship. This permits selective sending of only specified data channels.
* RingtoCoax / CoaxtoRing
These modules effect short-distance replication of messages on a ring. "ringtocoax" (the name dates back to thin-net times), will broadcast all messages in its specified input ring onto a network adapter. The messages are broadcast in the IP sense, in that no recipient is specified. This allows any number of unknown systems to be listening, without affecting the sending system.
* Assoc (binder_ew)
* Carl Trig
10. INITIAL STARTUP
After all modules are configured, the system can be started.
*For mulit-node configurations, it is recommended that at least one remote node be physically close to the central node for initial debugging.
* Establish and test all communication links.
* Start each node. This is initially best done manually: create a command window, and manually change the directory to the /run_xx/params of this node. Execute the applicable environment file (ew_nt.cmd, ew_sol_sparc.cmd, etc.). Enter the command 'startstop'.
* The window should display a list of modules and their status, as produced by "startstop" presssing 'enter' will re-display this list. The window will also display any error messages from modules which are sharing this window (see below).
11. DEBUGGING TOOLS AND HINTS:
In the startstop configuration file (startstop_nt.d), each module which is to be run has a "NewConsole" / "NoNewConsole" specification. This determines whether the module's standard output is written to its own command window or into the window used to start the system. For initial testing, it is recommended to set all these to "NoNewConsole". In routine operation, individual windows are convenient for checking on system status. During initial debugging, however, modules tend to exit due to gross configuration errors (typical are invalid "Module Id" and "Ring Name" values). Under Windows 2000 this causes the window of the deceased module to vanish, carrying with it the module's error messages. The "NoNewConsole" setting permits the user to analyze terminal error messages.
As mentioned above, Earthworm includes a logging scheme in which each module can produce a log file in the /run/log directory. When a module has identity information to create a log file, it will do so, creating a new log file at midnight. These log files are the primary method for analyzing system malfunctions after initial setup. Note that under Windows 2000,a log file which is in use cannot be edited. One solution is to copy the current log file to a scratch file, and then look at that scratch file with an editor.
Another possible problem during initial configuration is that a configuration file error may cause "startstop" to exit. Since "startstop" is responsible for terminating any modules which may be running, this can leave modules running, which will cause chaos when the system is re-started again. To clean up such 'wild' modules, either terminate them via the Win2000 Task Manager, or simply reboot the machine.
It is often useful to have direct confirmation of what (if any) messages are appearing in a given message ring. "sniffring" is a useful debugging program which will show a summary of all messages being broadcast into a specified ring. To run this program, open a command window, execute the currently used "ew_nt.cmd" file (see section 9), and then enter the command "sniffring RING_NAME", where RING_NAME is the name of the ring you wish to examine. The program will display the lengths, shipping labels, and summaries of various messages.
Often, there are many messages in a ring, and one wishes to know if trace data from a specified channel is flowing through that ring. The program "sniffwave" is similar to "sniffring" above, but in addition takes as command line arguments the "Station" "Component" and "Network" name of the channel to be displayed.
After starting an Earthworm system with "startstop", a short list of the modules in operation, and their status is displayed. This list can be re-displayed by pressing return in the window in which "startstop" was invoked. The same display can be produced by entering the command "status" in a window in which "ew_nt.cmd" has been executed.
12. COMMON CONFIUGRATION ERRORS:
* Module ID:
A module will terminate because it was given a "Module Id" which is not defined in "earthworm.d".
* Port Numbers:
Strange behavior can occur if several modules are using the same port number. The simplest solution is to assign unique port numbers within an installation.
* Ring Name:
A problem of a module not functioning is that it has been configured to listen to the wrong ring, or that the module which is to be supplying the input messages is putting them on some other ring.
* Heartbeat rates:
Modules can produce heartbeat messages which "statmgr" can use to determine when a module has died. The rate at which heartbeats are produced (specified in a module's ".d" file) should be reasonably higher than the rate at which "statmgr" expects to see them (as specified in the module's ".desc" file). Setting them the same, or very close can cause a module to be spuriously declared dead (and optionally restarted).
Heartbeats are also used by the "Import" and "Export" modules to assure that long-distance links are functional. Note that these heartbeats are sent over the long-distance link, and are distinct from the internal heartbeats mentioned above. Heartbeats are sent in both directions. The parameter file contains the rate at which they are generated, and the rate at which they are expected. The sending rate should be considerably larger than the expected rate, to allow for transmission delays. In practice, sending rates of 60 seconds, and expected rates of 300 seconds have been successful over long-haul, low-quality IP links.
"Import" and "Export", like other modules, produce internal heartbeats. These should be configured to be slower than the heartbeats being exchanged over the long-haul link. Otherwise, "statmgr" may decide that the module has died, and perform needless numerous restarts.
13. INCREASING THE NUMBER OF SHARED MEMORY RINGS UNDER SOLARIS
Under Solaris 2.6 (and probably other versions as well), the maximum number of shared memory segments is six. This means that on an out-of-the-box machine you can only configure six rings. If you try to configure more than that, you will see a cryptic message from tport_create about too many open files. The fix to this problem is to add the following lines to the /etc/system file, and then reboot the system.
set shmsys:shminfo_shmmax = 4294967295 set shmsys:shminfo_shmmin = 1 set shmsys:shminfo_shmmni = 100 set shmsys:shminfo_shmseg = 20 set semsys:seminfo_semmns = 200 set semsys:seminfo_semmni = 70 This allows for 20 rings.
Date last modified: April 16, 2002