Chapter 22. The Boot utility

The boot utility is a program that works with the databases created by the Experiment Configuration utility and the state manager to initialize the components that make up an experiment's data acquisition system. The boot utility manages the transitions from the NotReady state through the Booting state and into the Ready state, falling back into the NotReady state if something goes wrong. To understand what this means, be sure to look at: Experiment (State manager) State Diagram.

The boot utility requires the environment variables that are set by the $DAQROOT/daqsetup.bash script ($DAQROOT is the location of the top level directory of an installation of NSCLDAQ at version 11.0 or greater). The boot program also requires that you can log in to target systems via ssh without supplying a password, and that the version of NSCLDAQ you are using is installed on the target systems in the same location as on the system that runs boot. For information on how to set up password-free logins see: http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/ or, if that page no longer exists, search the web for "ssh login without password"; you'll find a large number of pages that show how to set this up.
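Before running boot, it can help to check both requirements from the machine that will run it. The following Python sketch is a hypothetical pre-flight check, not part of NSCLDAQ. It assumes that sourcing daqsetup.bash exports DAQROOT. It relies on two standard OpenSSH behaviors: the BatchMode=yes option makes ssh fail instead of prompting for a password, and the remote command `test -d` checks that the same NSCLDAQ directory exists on the target.

```python
import os
import subprocess
from typing import List


def daqroot() -> str:
    """Return $DAQROOT (assumed to be exported by sourcing daqsetup.bash)."""
    root = os.environ.get("DAQROOT")
    if not root:
        raise RuntimeError("DAQROOT is not set; source daqsetup.bash first")
    return root


def probe_command(host: str, root: str) -> List[str]:
    # BatchMode=yes: fail rather than prompt if password-free login is not set up.
    # "test -d root": succeed only if NSCLDAQ is installed at the same path there.
    return ["ssh", "-o", "BatchMode=yes", host, "test", "-d", root]


def target_ready(host: str, root: str, timeout: float = 15.0) -> bool:
    """True if password-free ssh to host works and root exists on it."""
    try:
        result = subprocess.run(probe_command(host, root),
                                capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh missing, or the host did not answer in time
    return result.returncode == 0
```

For example, `target_ready("spdaq20", daqroot())` (the hostname is made up) returns False if either the login would prompt for a password or the installation directory is missing on that host.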

The boot program accepts the following options:

--help

Outputs some program help information. When --help is specified, the program will exit after outputting the help message.

--server=statemgr-host

Specifies the host on which the state manager is running. If this optional switch is omitted, the state manager is assumed to be running on localhost.

--state-service=service-name

Specifies the service name the state manager uses to publish its state and transitions. If not supplied, this defaults to StatePublish.

--transition-service=service-name

Specifies the service name on which the state manager accepts transition requests. If not specified, this defaults to StateRequest.
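To make the option defaults concrete, here is a hypothetical Python model of the command line described above, built with the standard argparse module. It is not boot's own parser. Requiring at least one database (`nargs="+"`) is an assumption. argparse adds --help automatically and exits after printing it, which matches the documented behavior.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical model of boot's documented options and defaults."""
    parser = argparse.ArgumentParser(prog="boot")
    parser.add_argument("--server", default="localhost",
                        help="host running the state manager")
    parser.add_argument("--state-service", default="StatePublish",
                        help="service on which state and transitions are published")
    parser.add_argument("--transition-service", default="StateRequest",
                        help="service that accepts transition requests")
    # Assumption: at least one experiment configuration database is required.
    parser.add_argument("databases", nargs="+",
                        help="experiment configuration databases, in boot order")
    return parser
```

The order of the positional databases is preserved, which matters because boot processes them from left to right.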

In addition to the options described above, the boot program accepts a sequence of experiment configuration databases. Here's an example of a sequence of commands that starts up boot given that NSCLDAQ is installed at /usr/opt/daq/11.0:


. /usr/opt/daq/11.0/daqsetup.bash
$DAQBIN/boot s800.experiment caesar.experiment s800-ceasar-merge.experiment
            

A few points are worth mentioning from this example.

22.1. Details of operation

This section describes what the boot program actually does. First recall that the experiment configuration database defines several entities:

The boot program manages the creation and teardown of Rings and Programs at the appropriate times in the experiment.

The key state transitions the boot program is interested in are transitions to Booting from NotReady and transitions to NotReady from any state.

The transition to Booting tells the boot manager it's time to create the entities required by its experiment databases. Once all entities have been successfully created, the boot manager requests that the state manager transition to the Ready state. A failure to start or create any entity results in a request that the state manager transition back to the NotReady state.

The transition to NotReady means it's time to destroy the entities that still exist, ensuring that the next transition to Booting does not carry any baggage from the previous run.

Note that the databases are read again each time the experiment boots, so there's no need to restart the boot manager if you reconfigure the experiment.
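The reaction to these two transitions can be modeled as a small dispatcher. This Python sketch is hypothetical. The callables `load_databases`, `create`, `destroy`, and `request` stand in for reading the databases, creating the entities, destroying them, and asking the state manager for a transition. None of them are real NSCLDAQ APIs. The databases are reloaded on every boot, which is why reconfiguring does not require a restart.

```python
from typing import Callable


class BootReactor:
    """Hypothetical model of how boot reacts to state-manager transitions."""

    def __init__(self,
                 load_databases: Callable[[], list],
                 create: Callable[[list], None],
                 destroy: Callable[[], None],
                 request: Callable[[str], None]) -> None:
        self.load_databases = load_databases  # re-read on every boot
        self.create = create                  # raises if any entity fails
        self.destroy = destroy                # tears down surviving entities
        self.request = request                # asks the state manager to transition

    def on_transition(self, old: str, new: str) -> None:
        if old == "NotReady" and new == "Booting":
            try:
                self.create(self.load_databases())
            except Exception:
                self.request("NotReady")  # any failure falls back
            else:
                self.request("Ready")     # everything was created
        elif new == "NotReady":
            self.destroy()                # from any state
```

Reporting a failed creation by raising an exception is a design choice of the sketch. It keeps "any entity failed" as a single path back to NotReady.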

22.1.1. Booting the experiment

When the boot program becomes aware that the state manager is in the Booting state, it iterates twice over the databases that were supplied on its command line. Both iterations go from left to right in command-line order.

In the first iteration, the boot program creates the ring buffers defined by the databases. This comes first because programs may need these rings to exist. Ring buffers are created in the order the databases appear on the command line and, within each database, in order of increasing ring buffer id. In general, however, the order in which ring buffers are created is not important: they are the communications infrastructure that ties the experiment's data flow together. As ring buffers are created, the boot manager remembers them.
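The ring buffer ordering rule can be written down directly. In this hypothetical Python sketch, `Database` is an in-memory stand-in for one experiment configuration database, holding (ring id, ring name) pairs, and `make_ring` stands in for whatever actually creates a ring. The returned list is the "memory" of created rings that teardown later walks in reverse.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Database:
    """Hypothetical stand-in for one experiment configuration database."""
    name: str
    rings: List[Tuple[int, str]] = field(default_factory=list)  # (ring id, ring name)


def create_rings(databases: List[Database],
                 make_ring: Callable[[str], None]) -> List[str]:
    """Create rings database by database (command-line order), in increasing
    ring id within each database, and remember them in creation order."""
    created: List[str] = []
    for db in databases:
        for _ring_id, ring in sorted(db.rings):
            make_ring(ring)
            created.append(ring)  # remembered for teardown
    return created
```

With made-up names, databases `[s800: (2, "s800raw"), (1, "s800")]` followed by `[caesar: (1, "caesar")]` give the creation order s800, s800raw, caesar.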

In the second iteration, the boot program starts the programs defined by the experiment configuration databases. The databases are again processed from left to right as they appear on the command line and within each database, programs are started in program id order. If any program cannot be started, the boot manager gives up and initiates a state transition back to the NotReady state. The assumption is that all programs are needed for the system to work.

Once all programs are successfully started, the boot manager initiates a state transition to Ready. The assumption is that at this time, users can start taking data. The boot manager remembers which programs it started, and the order in which they were started.
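The second iteration, with its all-or-nothing rule, can be sketched as follows. This is a hypothetical Python model. Each database is represented as a list of (program id, program name) pairs, and `start` stands in for launching a program and returning a handle to it. On the first failure, the sketch stops and reports NotReady. It still returns the programs that did start, in start order, so that the shutdown described later can stop them.

```python
from typing import Callable, List, Tuple


def start_programs(databases: List[List[Tuple[int, str]]],
                   start: Callable[[str], object]
                   ) -> Tuple[str, List[Tuple[str, object]]]:
    """Start programs in command-line database order and increasing program
    id within each database. Returns the transition to request ("Ready" or
    "NotReady") and the (name, handle) pairs started so far, in start order."""
    started: List[Tuple[str, object]] = []
    for programs in databases:
        for _prog_id, name in sorted(programs):
            try:
                handle = start(name)
            except Exception:
                return "NotReady", started  # all programs are assumed necessary
            started.append((name, handle))
    return "Ready", started
```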

22.1.2. After the experiment is booted

Each program the boot manager starts runs inside a wrapper process that can detect when the program exits (normally or abnormally). If any program exits, the experiment software is assumed to be failing and the boot manager initiates a transition to NotReady (see the next section for what this triggers).
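One way to build such a wrapper is to start the program and block on its exit in a background thread. This is a hypothetical Python sketch, not the boot manager's implementation. It runs the command locally. In practice, the command would presumably be run on a target host, for example through ssh. The `on_exit` callback is where a real implementation would request the NotReady transition.

```python
import subprocess
import threading
from typing import Callable, List


def watch(argv: List[str],
          on_exit: Callable[[str, int], None]) -> subprocess.Popen:
    """Start argv and call on_exit(name, returncode) when it exits for any
    reason; a negative returncode means it was killed by a signal (POSIX)."""
    proc = subprocess.Popen(argv)

    def waiter() -> None:
        rc = proc.wait()  # blocks until the program exits, normally or not
        on_exit(argv[0], rc)

    threading.Thread(target=waiter, daemon=True).start()
    return proc
```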

The initial version of the boot manager also captures the output and error streams of these processes and displays them (not very nicely) on its own stdout.
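A minimal way to relay a child's output and error to your own stdout is to merge the two streams into one pipe and copy it line by line. This Python sketch is hypothetical. Prefixing each line with the program name is an assumption added for readability and does not describe how boot formats its output. Because the streams are merged, lines written to stdout and stderr may appear in a different order than the program wrote them.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def relay_output(name: str, argv: List[str],
                 emit: Optional[Callable[[str], object]] = None) -> int:
    """Run argv with stderr merged into stdout, pass each line to emit
    (sys.stdout.write by default) prefixed by name, return the exit status."""
    emit = emit or sys.stdout.write
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    with proc.stdout:
        for line in proc.stdout:
            emit(f"{name}: {line}")
    return proc.wait()
```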

22.1.3. Shutting down the experiment

When the boot manager detects a transition to NotReady, for any reason, it destroys the remaining entities in the reverse order of their creation.

First, the boot manager shuts down the programs it started. This is done in three steps; each step is performed in the reverse of the order in which the remaining programs were started.

  1. There may be some programs that are aware of the state manager. The boot manager sleeps for a few seconds to give those programs time to become aware of the state manager's transition to NotReady and to do what they need to exit cleanly.

  2. Other programs may accept an exit command. The string exit is pushed into the stdin of all remaining programs and, once more, time is given for them all to exit.

  3. Finally, a Control-C interrupt (SIGINT) is sent to all remaining programs. Programs can catch this signal and use it to exit gracefully if desired.
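The three steps above can be sketched with Python's subprocess module. This is a hypothetical, POSIX-only model. It assumes each program was started locally with `stdin=subprocess.PIPE` and `text=True`, and the `grace` delay stands in for the "few seconds" the boot manager waits. Any program that survives all three steps is returned to the caller rather than killed outright.

```python
import signal
import subprocess
import time
from typing import List


def shutdown(procs: List[subprocess.Popen],
             grace: float = 2.0) -> List[subprocess.Popen]:
    """Stop programs (given in start order) in three steps; return survivors.
    Assumes each Popen was created with stdin=subprocess.PIPE and text=True."""

    def remaining() -> List[subprocess.Popen]:
        # Reverse start order, skipping programs that have already exited.
        return [p for p in reversed(procs) if p.poll() is None]

    # Step 1: let state-manager-aware programs notice NotReady and exit.
    time.sleep(grace)

    # Step 2: push the string "exit" into the stdin of every remaining program.
    for p in remaining():
        if p.stdin is None:
            continue
        try:
            p.stdin.write("exit\n")
            p.stdin.flush()
        except (OSError, ValueError):
            pass  # program closed its stdin; step 3 still applies
    time.sleep(grace)

    # Step 3: send Control-C (SIGINT) to whatever is left, then wait briefly.
    survivors = remaining()
    for p in survivors:
        p.send_signal(signal.SIGINT)
    for p in survivors:
        try:
            p.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            pass
    return remaining()
```

A program that ignores SIGINT can survive all three steps, so the caller decides what to do with the returned list.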