Condor Trigger Version 0.9.0 Manual

Condor Team, University of Wisconsin-Madison


License

Please see the LICENSE file for details of the Condor® Public License.


Overview

Condor_trigger is a ClassAd-based system for taking action according to the status of some entity, such as a machine in a Condor pool. For example, we might want to take some action if a machine has been heavily loaded for a certain amount of time. This can be done using condor_trigger.

Condor_trigger uses the concepts of states and triggers. A state is some condition of an entity being monitored that is either true or false (for example, whether the system load average is above some threshold). A trigger fires when some action should be taken (for example, if a system has been heavily loaded for a certain amount of time).

Separating states from triggers makes condor_trigger much more flexible than a scheme that maps status directly to actions: the same state can drive several kinds of triggers (see Trigger Types).

Most simply, entity status is translated to state, which is then translated to actions.

See the processing section for more information.

For more information about the ClassAd language, see the ClassAd documentation.


Ad Types

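Condor Trigger uses several types of ClassAds: status ads, preprocessing ads, state definition ads, state ads, trigger ads, and action ads. Each type is described below.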

Note that the different types of ads are not syntactically different; the difference is solely in how they are used in the Condor Trigger process.

Status ads contain the information to be monitored. For example, this might be information about the status of your Condor pool, obtained with the condor_status -xml command. Status ads come from the "outside world" as far as Condor Trigger is concerned, and must be in XML form. (If you have ClassAds in "normal" form, you can convert them with the classad_convert program that is part of the Condor Trigger distribution.)

Preprocessing ads define preprocessing to be done on the incoming status ads before any other processing. Preprocessing generates a new set of status ads, which are then used in the subsequent processing steps. Preprocessing ads are optional; they are needed only when the incoming status ads cannot be properly matched with state definition ads (see the Example Ads for more information).

State definition ads define the states you are interested in. For example, if you want to monitor machine loads, you might write a state definition ad that defines a heavily loaded machine as a machine where the load average is more than 5.0. State definition ads are matched with status ads to produce state ads.

State ads define whether a given state is true for a given entity. For example, if you define a "heavily loaded" state, that state is true or false for a given machine at any time. State ads are generated by the first phase of condor_trigger processing.

Trigger ads define when an action should be taken (in terms of a state), and what action should be taken. Trigger ads are matched with state ads to produce action ads. The details of the trigger expression determine the type of trigger (see below).

Action ads represent the actions that should be taken as a result of the current status. They are produced by the second phase of condor_trigger processing.


Ad Details

Note that any "extra" attributes not discussed below are simply ignored; they will not affect the condor_trigger processing.

Status Ads

Status ads are produced by whatever is to be monitored. So far, we have mostly used condor_trigger with status ads generated by the HawkEye monitoring tool; however, the status ads can come from any source, as long as they are legal ClassAds and have the required attributes.

Required attributes:

Optional attributes:

Preprocessing Ads

As currently implemented, preprocessing ads can take sets of attributes from a status ad and break them out into their own ads, renaming the attributes in the process. This capability is necessary largely because HawkEye does not support nested ClassAds at the present time. Therefore, if a number of similar properties are being monitored (various file systems on a single machine, for example), they end up as attributes with names such as disk_root_used, disk_scratch_used, etc., all in a single large ClassAd. The subsequent condor_trigger processing steps cannot easily deal with such attributes, so the preprocessing step transforms the status ad into a form that can be processed more easily.

Required attributes:

Note: during the preprocessing step, asterisks in the CopyAttrs Old values and in the NewAdName value are replaced either with the items in the list named by ListAttr or with the integers 1 through the value of the attribute named by CountAttr.

Please study the example ads for more clarification.
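
The asterisk substitution described in the note can be sketched in Python. This is only an illustration of the idea, not the condor_trigger implementation: the function name expand_status_ad, the use of plain dictionaries for ads, and the cpu_* attributes in the second call are all assumptions made for the example.

```python
def expand_status_ad(status_ad, copy_attrs, new_ad_name,
                     list_attr=None, count_attr=None):
    """Break one flat status ad into one ad per list item (or per index)."""
    if list_attr is not None:
        # The list attribute holds whitespace-separated item names.
        items = status_ad[list_attr].split()
    else:
        # Otherwise substitute the integers 1..N, with N taken from count_attr.
        items = [str(i) for i in range(1, status_ad[count_attr] + 1)]

    new_ads = []
    for item in items:
        ad = {"Name": new_ad_name.replace("*", item)}
        for old, new in copy_attrs:
            source = old.replace("*", item)
            if source in status_ad:
                ad[new] = status_ad[source]
        new_ads.append(ad)
    return new_ads


status = {
    "Machine": "chopin.cs.wisc.edu",
    "disk_INDEX": "root scratch",
    "disk_root_pct_used": 77,
    "disk_scratch_pct_used": 91,
}
copy_attrs = [("Machine", "Machine"), ("disk_*_pct_used", "disk_pct_used")]
ads = expand_status_ad(status, copy_attrs,
                       status["Machine"] + "_*_disk_info",
                       list_attr="disk_INDEX")

# Hypothetical CountAttr case: attributes numbered 1..n instead of named.
numbered = expand_status_ad({"n": 2, "cpu_1_load": 0.5, "cpu_2_load": 1.5},
                            [("cpu_*_load", "load")], "cpu_*", count_attr="n")

for ad in ads + numbered:
    print(ad)
```

Each generated ad gets its own Name and the renamed attributes, which is the shape the later matching steps expect.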

State Definition Ads

Required attributes:

Optional attributes:

State Ads

Attributes:

<State> subad attributes:

Trigger Ads

Required attributes:

Action subad required attributes:

Action subad optional attributes (required if the type is "Mail"):

Action Ads

Attributes:


Processing

Condor_trigger processing takes place in two steps:

  1. Generation of state ads.
  2. Generation of action ads.

In step 1, the status ads, state definition ads, and previous state ads are combined to generate the current state ads.

In step 2, the current state ads and trigger ads are combined to generate the action ads, and the appropriate actions are taken.
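
To make step 1 more concrete, here is a rough Python sketch of how a status ad, the state definitions, and the previous state ad might be combined. It is not the actual implementation. The attribute names currentState, previousState, StateChangeTime, LastHeardFrom, and PreviousHeardFrom are taken from the trigger expressions in this manual; the LoadAvg attribute, the function name, and the rule that StateChangeTime is reset whenever the state value flips are assumptions.

```python
def make_state_ad(status_ad, state_defs, previous_ad=None):
    """Step 1: combine a status ad, state definitions, and the previous state ad."""
    now = status_ad["LastHeardFrom"]
    state_ad = {"Name": status_ad["Name"], "LastHeardFrom": now}
    if previous_ad is not None:
        state_ad["PreviousHeardFrom"] = previous_ad["LastHeardFrom"]

    for name, predicate in state_defs.items():
        current = bool(predicate(status_ad))
        old = (previous_ad or {}).get(name)
        sub = {"currentState": current}
        if old is not None:
            sub["previousState"] = old["currentState"]
        # Assumption: the change time is reset whenever the value flips.
        if old is None or old["currentState"] != current:
            sub["StateChangeTime"] = now
        else:
            sub["StateChangeTime"] = old["StateChangeTime"]
        state_ad[name] = sub
    return state_ad


# Hypothetical state definition: load average above 5.0.
state_defs = {"HeavilyLoaded": lambda ad: ad["LoadAvg"] > 5.0}

first = make_state_ad(
    {"Name": "m1", "LastHeardFrom": 1000, "LoadAvg": 6.2}, state_defs)
second = make_state_ad(
    {"Name": "m1", "LastHeardFrom": 1300, "LoadAvg": 7.0}, state_defs, first)
third = make_state_ad(
    {"Name": "m1", "LastHeardFrom": 1600, "LoadAvg": 1.0}, state_defs, second)

for ad in (first, second, third):
    print(ad)
```

The point of the sketch is that the previous state ad is what lets the new state ad record both the earlier value and the time of the last change.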


Trigger Types

There are several different types of triggers, as follows:

A state trigger is a trigger that fires every time the system is in a given state.

An edge trigger is a trigger that fires on the transition into or out of a given state.

A time trigger is a trigger that fires when the system has been in a given state for a given amount of time. The time trigger can be either "continuous" or "one-time". The "continuous" time trigger fires every time after the initial delay has expired; the "one-time" trigger fires only the first time after the delay has expired.

The type of trigger is not specified explicitly; it is determined by the trigger expression.

State Trigger Example

Here's an example of a trigger expression for a state trigger:

    ActionTrigger = other.DiskFull.currentState;

This trigger fires whenever the DiskFull state is true.

Edge Trigger Example

Here's an example of a trigger expression for an edge trigger:

    ActionTrigger = (other.StateHeavilyLoaded.currentState &&
          !other.StateHeavilyLoaded.previousState);

This trigger fires on the false to true transition of the StateHeavilyLoaded state.

Time Trigger Examples

Here's an example of a trigger expression for a continuous time trigger:

    ActionTrigger = (other.FullQueue.currentState &&
          other.LastHeardFrom - other.FullQueue.StateChangeTime > 550);

This trigger fires whenever the FullQueue state has been true for more than 550 seconds.

Here's an example of a trigger expression for a one-time time trigger:

    ActionTrigger = (other.FullQueue.currentState &&
          other.LastHeardFrom - other.FullQueue.StateChangeTime > 550 &&
          other.PreviousHeardFrom - other.FullQueue.StateChangeTime <= 550);

This trigger fires the first time the FullQueue state has been true for more than 550 seconds.
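
The difference between the trigger types is easiest to see by evaluating all four expressions over the same series of samples. The Python sketch below mirrors the expressions above. The sample data (one sample every 300 seconds, with the state becoming true at t=300) is made up for illustration, and the state sub-ad is flattened into a single dictionary for brevity.

```python
DELAY = 550  # seconds, as in the examples above


def state_trigger(s):
    return s["currentState"]


def edge_trigger(s):
    return s["currentState"] and not s["previousState"]


def continuous_time_trigger(s):
    return s["currentState"] and s["LastHeardFrom"] - s["StateChangeTime"] > DELAY


def one_time_trigger(s):
    return (continuous_time_trigger(s)
            and s["PreviousHeardFrom"] - s["StateChangeTime"] <= DELAY)


# Hypothetical samples: the state becomes true at t=300 and stays true.
samples = [
    {"LastHeardFrom": t, "PreviousHeardFrom": t - 300,
     "currentState": cur, "previousState": prev,
     "StateChangeTime": 300 if cur else 0}
    for t, cur, prev in [(0, False, False), (300, True, False),
                         (600, True, True), (900, True, True),
                         (1200, True, True)]
]


def fired_at(trigger):
    """Return the sample times at which the given trigger fires."""
    return [s["LastHeardFrom"] for s in samples if trigger(s)]


print("state:     ", fired_at(state_trigger))            # [300, 600, 900, 1200]
print("edge:      ", fired_at(edge_trigger))             # [300]
print("continuous:", fired_at(continuous_time_trigger))  # [900, 1200]
print("one-time:  ", fired_at(one_time_trigger))         # [900]
```

With these samples, the time triggers first fire at t=900, the first sample at which the state has been true for more than 550 seconds.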


Trigger Actions

At the present time, sending email is the only action implemented by condor_trigger. A given trigger can send email to a single user, or to a list of users. If one invocation of condor_trigger results in multiple triggers sending email to a given user, the messages will be combined into a single email.

Condor_trigger may be extended in the future to support actions other than sending email. In the meantime, if you want to perform other actions, it would be fairly easy to write a program (using the ClassAd library) that reads the action ads and performs appropriate actions. (If you specify action types other than "Mail", the action ads are still generated, although no action is actually taken.)
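
As a rough illustration of the combining behavior, the following Python sketch groups the text of "Mail" actions by recipient and skips other action types. It assumes the action ads have already been parsed into dictionaries. The recipient names and the "Restart" type are hypothetical, and how condor_trigger itself formats the combined message (including its subject) is not shown here.

```python
from collections import defaultdict


def group_mail(action_ads):
    """Collect the ActionText of every "Mail" action, keyed by recipient."""
    per_user = defaultdict(list)
    for ad in action_ads:
        action = ad["Action"]
        if action["Type"] != "Mail":
            continue  # other action types are generated but not acted on
        for user in action["To"]:
            per_user[user].append(ad["ActionText"])
    # One message body per recipient, one line per triggered action.
    return {user: "\n".join(lines) for user, lines in per_user.items()}


action_ads = [
    {"ActionText": "Disk /scratch is 91% full",
     "Action": {"Type": "Mail", "To": ["alice"], "Subject": "disk"}},
    {"ActionText": "Queue has been full for 10 minutes",
     "Action": {"Type": "Mail", "To": ["alice", "bob"], "Subject": "queue"}},
    {"ActionText": "restart daemon",
     "Action": {"Type": "Restart"}},  # hypothetical non-mail action
]

bodies = group_mail(action_ads)
for user, body in bodies.items():
    print(user)
    print(body)
```

A program that performs other kinds of actions could follow the same pattern, dispatching on the Type attribute instead of skipping it.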


Example Ads

Note: the example ads in this section are taken from the test6 example that is distributed with the condor_trigger code.

Status Ads

This is a status ad containing information relating to various disk partitions on a system. This ad is a portion of an ad generated by the HawkEye monitoring tool.

    [
        LastHeardFrom = 1068575609;
        Machine = "chopin.cs.wisc.edu";
        Name = "chopin.cs.wisc.edu";
        disk_FIELDS = "used mnt pct_used size fs avail "; 
        disk_INDEX = "tmp var  usr root scratch"; 
        disk_LastUpdate = 1068573945; 
        disk_root_avail = 45662; 
        disk_root_fs = "/dev/sda1"; 
        disk_root_mnt = "/"; 
        disk_root_pct_used = 77; 
        disk_root_size = 202220; 
        disk_root_used = 146118; 
        disk_scratch_avail = 2436884; 
        disk_scratch_fs = "/dev/sda8"; 
        disk_scratch_mnt = "/scratch"; 
        disk_scratch_pct_used = 91; 
        disk_scratch_size = 28327964; 
        disk_scratch_used = 24452060; 
        disk_tmp_avail = 667480; 
        disk_tmp_fs = "/dev/sda5"; 
        disk_tmp_mnt = "/tmp"; 
        disk_tmp_pct_used = 33; 
        disk_tmp_size = 1035660; 
        disk_tmp_used = 315572; 
        disk_usr_avail = 1682168; 
        disk_usr_fs = "/dev/sda2"; 
        disk_usr_mnt = "/usr"; 
        disk_usr_pct_used = 42; 
        disk_usr_size = 3020172; 
        disk_usr_used = 1184584; 
        disk_var_avail = 178756; 
        disk_var_fs = "/dev/sda6"; 
        disk_var_mnt = "/var"; 
        disk_var_pct_used = 63; 
        disk_var_size = 497829; 
        disk_var_used = 293371; 
    ]

Preprocessing Ads

This is a preprocessing ad to process the above status ad. This preprocessing ad breaks out the information about each disk partition into a separate ad, so that it can be processed in subsequent steps.

    [
        Name = "disk preprocess";
        ListAttr = "disk_INDEX";
        CopyAttrs = {
            [
                Old = "LastHeardFrom";
                New = "LastHeardFrom"
            ],
            [
                Old = "Machine";
                New = "Machine"
            ],
            [
                Old = "disk_*_pct_used";
                New = "disk_pct_used"
            ],
            [
                Old = "disk_*_mnt";
                New = "disk_mnt"
            ]
        };
        NewAdName = strcat(Machine, "_*_disk_info");
    ]

Preprocessed Status Ads

This is what the status ads look like after preprocessing with the above preprocessing ad. You can see that there is now a separate ClassAd for each disk partition. The original status ClassAd is unchanged.

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        ...
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        disk_mnt = "/"; 
        Name = "chopin.cs.wisc.edu_root_disk_info"; 
        disk_pct_used = 77
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        disk_mnt = "/scratch"; 
        Name = "chopin.cs.wisc.edu_scratch_disk_info"; 
        disk_pct_used = 91
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        disk_mnt = "/tmp"; 
        Name = "chopin.cs.wisc.edu_tmp_disk_info"; 
        disk_pct_used = 33
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        disk_mnt = "/usr"; 
        Name = "chopin.cs.wisc.edu_usr_disk_info"; 
        disk_pct_used = 42
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        disk_mnt = "/var"; 
        Name = "chopin.cs.wisc.edu_var_disk_info"; 
        disk_pct_used = 63
    ]

State Definition Ads

This ClassAd defines the "DiskFull" state. In this case, we are defining DiskFull to be true if the relevant disk is more than 90% full. Note that the StateTrigger definition can be any legal ClassAd expression on status ad attributes.

    [
        StateName = "DiskFull";
        StateTrigger = ( other.disk_pct_used > 90 );
        CopyAttrs = { "Machine", "disk_mnt", "disk_pct_used" };
    ]

State Ads

After the first phase of processing, these state ads are generated. The first ad (the one having only the LastHeardFrom and Name attributes) comes from the original status ad; the other state ads come from the individual disk status ads that are generated by the preprocessing.

    [
        LastHeardFrom = 1068575609; 
        Name = "chopin.cs.wisc.edu"
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        DiskFull = 
            [
                currentState = false
            ]; 
        disk_mnt = "/"; 
        Name = "chopin.cs.wisc.edu_root_disk_info"; 
        disk_pct_used = 77
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        DiskFull = 
            [
                currentState = true
            ]; 
        disk_mnt = "/scratch"; 
        Name = "chopin.cs.wisc.edu_scratch_disk_info"; 
        disk_pct_used = 91
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        DiskFull = 
            [
                currentState = false
            ]; 
        disk_mnt = "/tmp"; 
        Name = "chopin.cs.wisc.edu_tmp_disk_info"; 
        disk_pct_used = 33
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        DiskFull = 
            [
                currentState = false
            ]; 
        disk_mnt = "/usr"; 
        Name = "chopin.cs.wisc.edu_usr_disk_info"; 
        disk_pct_used = 42
    ]

    [
        LastHeardFrom = 1068575609; 
        Machine = "chopin.cs.wisc.edu"; 
        DiskFull = 
            [
                currentState = false
            ]; 
        disk_mnt = "/var"; 
        Name = "chopin.cs.wisc.edu_var_disk_info"; 
        disk_pct_used = 63
    ]

Trigger Ads

This is an example of a state trigger. The trigger fires every time a monitored entity (here, a disk partition) is in the DiskFull state.

    [
        ActionTrigger = other.DiskFull.currentState;
        ActionText = strcat("Disk ", other.ad.disk_mnt, " on ",
                other.ad.Machine, " is ", other.ad.disk_pct_used, "% full");
        Action = [
            Type = "Mail";
            To = { "current user" };
            Subject = "Test 6 -- disk full";
        ];
    ]

Action Ads

The second phase of processing generates this action ad, and sends the corresponding email. Note that specifying "current user" as the email address sends mail to the user running condor_trigger (as defined by the LOGNAME environment variable).

    [
        ActionText = "Disk /scratch on chopin.cs.wisc.edu is 91% full"; 
        Action = 
            [
                Subject = "Test 6 -- disk full"; 
                To = 
                   {
                      "current user"
                   }; 
                Type = "Mail"
            ]
    ]
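
The way the trigger ad turns state ads into this action ad can be approximated in Python as follows. This is a sketch of the matching step only, under the assumption that the ads are available as dictionaries. It hard-codes the DiskFull trigger from this example instead of evaluating ClassAd expressions, and the function name run_trigger is made up.

```python
def run_trigger(state_ads):
    """Return one action ad for every state ad whose DiskFull state is true."""
    actions = []
    for ad in state_ads:
        state = ad.get("DiskFull")
        if state is None or not state["currentState"]:
            continue  # the state is undefined or false for this entity
        text = "Disk {} on {} is {}% full".format(
            ad["disk_mnt"], ad["Machine"], ad["disk_pct_used"])
        actions.append({
            "ActionText": text,
            "Action": {"Type": "Mail", "To": ["current user"],
                       "Subject": "Test 6 -- disk full"},
        })
    return actions


state_ads = [
    {"Name": "chopin.cs.wisc.edu", "LastHeardFrom": 1068575609},
    {"Machine": "chopin.cs.wisc.edu", "disk_mnt": "/",
     "disk_pct_used": 77, "DiskFull": {"currentState": False}},
    {"Machine": "chopin.cs.wisc.edu", "disk_mnt": "/scratch",
     "disk_pct_used": 91, "DiskFull": {"currentState": True}},
]

actions = run_trigger(state_ads)
print(actions[0]["ActionText"])  # Disk /scratch on chopin.cs.wisc.edu is 91% full
```

Only the /scratch partition is in the DiskFull state, so only one action ad is produced.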

Tests

More details will be added here soon. For now, please see the README file included with the distribution.


Command Line Arguments

Condor_trigger takes the following command-line arguments:

-actadsout filename
    file to receive output action ads
-preprocads filename
    file containing ads specifying status ad preprocessing
-shortversion
    print the version number only
-stateadsin filename
    file containing state ads
-stateadsout filename
    file to receive output state ads
-statedefads filename
    file containing state definition ads
-statusads filename
    file containing status ads (or - to read from stdin)
-statuspreout filename
    save preprocessed status ads to the given file
-triggerads filename
    file containing trigger ads
-usage
    print the usage message and exit
-verbosity number
    set verbosity level (default is 3)
-version
    print the version number and compile date

Building Condor Trigger

Condor_trigger relies on the ClassAd library, so you need to download and build that before you build condor_trigger. Note that you need version 0.9.6 or higher of the C++ ClassAd library; earlier versions have a bug that prevents condor_trigger from working properly.

Once you've built the ClassAd library, edit the condor_trigger Makefile, changing CLASSAD_DIR to point to the directory in which you installed the ClassAd library. Then run the command 'make'. You should end up with condor_trigger and classad_convert executables.

Once you've built the executables, run the command 'make test6'. You should get an email with the following message:

    Disk /scratch on chopin.cs.wisc.edu is 91% full

If you get that message, the test worked correctly.

Also see the README file included in the distribution for more information.


Using Condor Trigger

Condor_trigger is meant as a tool for ongoing monitoring; therefore it is normally run repeatedly, processing updated status ads each time. For example, condor_trigger can be run as a cron job.

It is currently not possible to do both step 1 and step 2 of processing in a single invocation of condor_trigger, so condor_trigger is normally run twice (with different arguments) for each set of status ads to be processed.

For step 1 of the processing, the following arguments must be specified:

Normally, -stateadsin is also specified, with the given file containing the state ads produced by the previous run's step 1.

For step 2 of the processing, the following arguments must be specified:

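Here's an example of a script that is used to run condor_trigger as a cron job (this is a simplified version of run_triggers, which is included in the condor_trigger distribution):

    #!/bin/csh -f

    # Fetch the current pool status as XML ClassAds.
    condor_status -pool condor -xml > status.xml

    # Step 1: generate the new state ads from the status ads, the state
    # definitions, and the state ads saved by the previous run.
    ./condor_trigger -statusads status.xml -statedefads statedef.ca \
                -stateadsin state.old.ca -stateadsout state.new.ca

    # Step 2: match the new state ads against the trigger ads.
    ./condor_trigger -stateadsin state.new.ca -triggerads trigger.ca \
                -actadsout actions.ca

    # Keep the new state ads as input for the next run.
    \rm state.old.ca
    mv state.new.ca state.old.ca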

The main thing to note is that, at the end, the script renames state.new.ca to state.old.ca, so that the state ads just produced will be used as input the next time the script is run. Edge and time triggers work only if this is done, because their expressions refer to information from earlier runs, such as previousState and StateChangeTime.
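
If you would rather drive the two steps from Python than from csh, the same sequence could be sketched as follows. The command-line flags and file names are the ones used in the script above; the function names, the use of subprocess, and the use of os.replace for the rename are choices made for this sketch, not part of the distribution. The status file is assumed to have been written already (for example, by condor_status).

```python
import os
import subprocess


def build_commands(status="status.xml", statedefs="statedef.ca",
                   triggers="trigger.ca", old_state="state.old.ca",
                   new_state="state.new.ca", actions="actions.ca"):
    """Return the argument lists for step 1 and step 2 of one monitoring pass."""
    step1 = ["./condor_trigger", "-statusads", status,
             "-statedefads", statedefs,
             "-stateadsin", old_state, "-stateadsout", new_state]
    step2 = ["./condor_trigger", "-stateadsin", new_state,
             "-triggerads", triggers, "-actadsout", actions]
    return step1, step2


def run_once(old_state="state.old.ca", new_state="state.new.ca"):
    """Run both steps, then keep the new state ads for the next pass."""
    step1, step2 = build_commands(old_state=old_state, new_state=new_state)
    subprocess.run(step1, check=True)
    subprocess.run(step2, check=True)
    # Replace the old state file only after both steps have succeeded.
    os.replace(new_state, old_state)


step1, step2 = build_commands()
print(" ".join(step1))
print(" ".join(step2))
```

Running the rename only after both steps succeed means that a failed pass leaves the previous state ads in place for the next attempt.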


condor-admin@cs.wisc.edu