Please see the LICENSE file for details of the Condor® Public License.
Condor_trigger is a ClassAd-based system for taking action according to the status of some entity, such as a machine in a Condor pool. For example, we might want to take some action if a machine has been excessively heavily loaded for a certain amount of time. This can be done using condor_trigger.
Condor_trigger uses the concepts of states and triggers. A state is some condition of an entity being monitored that is either true or false (for example, whether the system load average is above some threshold). A trigger fires when some action should be taken (for example, if a system has been heavily loaded for a certain amount of time).
Using the dual concepts of states and triggers allows condor_trigger to be much more flexible than if we used a simpler scheme.
Most simply, entity status is translated to state, which is then translated to actions.
See the processing section for more information.
More info about the ClassAd language.
Condor Trigger uses several types of ClassAds:
Note that the different types of ads are not syntactically different; the difference is solely in how they are used in the Condor Trigger process.
Status ads contain the information to be monitored. For example, this might be information about the status of your Condor pool, obtained with the condor_status -xml command. Status ads come from the "outside world" as far as Condor Trigger is concerned, and must be in XML form. (If you have ClassAds in "normal" form, you can convert them with the classad_convert program that is part of the Condor Trigger distribution.)
Preprocessing ads define preprocessing to be done on the incoming status ads, before other processing is done. The preprocessing of the status ads generates a new set of status ads, which are then used in subsequent processing steps. Preprocessing ads are optional. Preprocessing ads are used when the incoming status ads cannot be properly matched with state definition ads (see the Example Ads for more information).
State definition ads define the states you are interested in. For example, if you want to monitor machine loads, you might write a state definition ad that defines a heavily loaded machine as a machine where the load average is more than 5.0. State definition ads are matched with status ads to produce state ads.
State ads define whether a given state is true for a given entity. For example, if you define a "heavily loaded" state, that state is true or false for a given machine at any time. State ads are generated by the first phase of condor_trigger processing.
Trigger ads define when an action should be taken (in terms of a state), and what action should be taken. Trigger ads are matched with state ads to produce action ads. The details of the trigger expression determine the type of trigger (see below).
Action ads represent the actions that should be taken as a result of the current status. They are produced by the second phase of condor_trigger processing.
Note that any "extra" attributes not discussed below are simply ignored; they will not affect the condor_trigger processing.
Status ads are produced by whatever is to be monitored. So far, we have mostly used condor_trigger with status ads generated by the HawkEye monitoring tool; however, the status ads can come from any source, as long as they are legal ClassAds and have the required attributes.
Required attributes:
Optional attributes:
As currently implemented, preprocessing ads can take sets of attributes from a status ad and break them out into their own ads; as part of the process, the attributes are renamed. This capability is necessary largely because HawkEye does not support nested ClassAds at the present time. Therefore, if a number of similar properties are being monitored (various file systems on a single machine, for example), they end up having attribute names such as disk_root_used, disk_scratch_used, etc., and being part of a single large ClassAd. The subsequent condor_trigger processing steps cannot easily deal with such attributes, so the preprocessing step was implemented to transform the status ad into a form that can be more easily processed.
Required attributes:
Either ListAttr, the name of an attribute in the status ad containing a list of the subads to be generated; or CountAttr, the name of an attribute in the status ad containing the number of subads to be generated, in which case the subads are named 1 through n, where n is the value of the CountAttr attribute.
CopyAttrs, a list of subads, each containing the attributes Old and New. Old is the name of the attribute to copy from the existing status ad; New is the name that attribute will have in the new status ad.
Note: during the preprocessing step, asterisks in the CopyAttrs Old values, and in the NewAdName value, are replaced with either the items in the list pointed to by ListAttr or the integer values 1 through the value of the attribute pointed to by CountAttr.
Please study the example ads for more clarification.
Required attributes:
Optional attributes:
Attributes:
The LastHeardFrom value from the status ad.
The LastHeardFrom value from the previous time condor_trigger was run.
<State> subad attributes:
Required attributes:
Action subad required attributes:
Action subad optional attributes (required if the type is "Mail"):
Attributes:
Condor_trigger processing takes place in two steps:
In step 1, the status ads, state definition ads, and previous state ads are combined to generate the current state ads:
In step 2, the current state ads and trigger ads are combined to generate the action ads, and the appropriate actions are taken:
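The way step 1 carries state history forward from one run to the next can be illustrated with a small Python sketch. This is not the condor_trigger implementation: ads are modeled as plain dictionaries, the LoadAvg attribute and the update_state function are hypothetical, and the rules for updating previousState and StateChangeTime (including treating a first run as a change at the current time) are assumptions inferred from the trigger expressions shown later in this document.

```python
def update_state(status_ad, previous_state_ad, state_name, is_true):
    """Build the current state ad for one entity from its status ad
    and the state ad produced by the previous run (None on the first run)."""
    now = status_ad["LastHeardFrom"]
    current = is_true(status_ad)
    prev = (previous_state_ad or {}).get(state_name)

    if prev is None:
        # Assumption: with no history, the state is treated as just changed.
        previous, change_time = False, now
    elif prev["currentState"] != current:
        # The state flipped since the last run: record the new change time.
        previous, change_time = prev["currentState"], now
    else:
        # No change: keep the old change time.
        previous, change_time = prev["currentState"], prev["StateChangeTime"]

    return {
        "Name": status_ad["Name"],
        "LastHeardFrom": now,
        "PreviousHeardFrom": previous_state_ad["LastHeardFrom"] if previous_state_ad else now,
        state_name: {
            "currentState": current,
            "previousState": previous,
            "StateChangeTime": change_time,
        },
    }


def heavy(ad):
    # Hypothetical state definition: load average above 5.0.
    return ad["LoadAvg"] > 5.0


# Three consecutive runs for the same machine.
s1 = update_state({"Name": "m1", "LastHeardFrom": 100, "LoadAvg": 6.0}, None, "HeavilyLoaded", heavy)
s2 = update_state({"Name": "m1", "LastHeardFrom": 400, "LoadAvg": 7.0}, s1, "HeavilyLoaded", heavy)
s3 = update_state({"Name": "m1", "LastHeardFrom": 700, "LoadAvg": 1.0}, s2, "HeavilyLoaded", heavy)
```

In this sketch the second run keeps the change time from the first run because the state did not change, while the third run records a new change time because the machine stopped being heavily loaded.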
There are several different types of triggers, as follows:
A state trigger is a trigger that fires every time the system is in a given state.
An edge trigger is a trigger that fires on the transition into or out of a given state.
A time trigger is a trigger that fires when the system has been in a given state for a given amount of time. The time trigger can be either "continuous" or "one-time". The "continuous" time trigger fires every time after the initial delay has expired; the "one-time" trigger fires only the first time after the delay has expired.
The type of trigger is not specified explicitly; it is determined by the trigger expression.
Here's an example of a trigger expression for a state trigger:
ActionTrigger = other.DiskFull.currentState;
This trigger fires whenever the DiskFull state is true.
Here's an example of a trigger expression for an edge trigger:
ActionTrigger = (other.StateHeavilyLoaded.currentState && !other.StateHeavilyLoaded.previousState);
This trigger fires on the false to true transition of the StateHeavilyLoaded state.
Here's an example of a trigger expression for a continuous time trigger:
ActionTrigger = (other.FullQueue.currentState && other.LastHeardFrom - other.FullQueue.StateChangeTime > 550);
This trigger fires whenever the FullQueue state has been true for more than 550 seconds.
Here's an example of a trigger expression for a one-time time trigger:
ActionTrigger = (other.FullQueue.currentState && other.LastHeardFrom - other.FullQueue.StateChangeTime > 550 && other.PreviousHeardFrom - other.FullQueue.StateChangeTime <= 550);
This trigger fires the first time the FullQueue state has been true for more than 550 seconds.
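To compare how the four trigger expressions above behave over successive runs, here is a minimal Python sketch that mirrors them on dictionary-based state ads. The function names and the make_ad helper are hypothetical, and the simulated run interval of 300 seconds is an assumption chosen for illustration; only the attribute names come from the expressions above.

```python
def state_trigger(ad, state):
    # Fires whenever the state is true.
    return ad[state]["currentState"]


def edge_trigger(ad, state):
    # Fires only on the false-to-true transition.
    s = ad[state]
    return s["currentState"] and not s["previousState"]


def continuous_time_trigger(ad, state, delay):
    # Fires on every run once the state has been true longer than delay.
    s = ad[state]
    return s["currentState"] and ad["LastHeardFrom"] - s["StateChangeTime"] > delay


def one_time_trigger(ad, state, delay):
    # Fires only on the first run after the delay has expired.
    s = ad[state]
    return (s["currentState"]
            and ad["LastHeardFrom"] - s["StateChangeTime"] > delay
            and ad["PreviousHeardFrom"] - s["StateChangeTime"] <= delay)


def make_ad(now, previous_run, previous_state):
    # Hypothetical state ad: FullQueue became true at time 0 and stays true.
    return {
        "LastHeardFrom": now,
        "PreviousHeardFrom": previous_run,
        "FullQueue": {"currentState": True,
                      "previousState": previous_state,
                      "StateChangeTime": 0},
    }


# Runs at t = 0, 300, 600 and 900 seconds.
ads = [make_ad(0, -300, False), make_ad(300, 0, True),
       make_ad(600, 300, True), make_ad(900, 600, True)]

state_fired = [state_trigger(a, "FullQueue") for a in ads]
edge_fired = [edge_trigger(a, "FullQueue") for a in ads]
continuous_fired = [continuous_time_trigger(a, "FullQueue", 550) for a in ads]
one_time_fired = [one_time_trigger(a, "FullQueue", 550) for a in ads]
```

With these simulated runs, the state trigger fires on all four runs, the edge trigger only on the first, the continuous time trigger on the last two, and the one-time trigger only on the third.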
At the present time, sending email is the only action implemented by condor_trigger. A given trigger can send email to a single user, or to a list of users. If one invocation of condor_trigger results in multiple triggers sending email to a given user, the messages will be combined into a single email.
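The combining of messages described above can be sketched in Python as grouping action texts by recipient. This is only an illustration under stated assumptions: action ads are modeled as dictionaries, the combine_mail function, the recipient names, the second message text, and the "Page" action type are hypothetical, and how condor_trigger actually formats the combined message (including how subjects are handled) is not specified here.

```python
from collections import defaultdict


def combine_mail(action_ads):
    """Return one message body per recipient, joining the ActionText of
    every "Mail" action addressed to that recipient."""
    bodies = defaultdict(list)
    for ad in action_ads:
        action = ad["Action"]
        if action["Type"] != "Mail":
            # Non-mail actions produce action ads but no email.
            continue
        for recipient in action["To"]:
            bodies[recipient].append(ad["ActionText"])
    return {recipient: "\n".join(lines) for recipient, lines in bodies.items()}


actions = [
    {"ActionText": "Disk /scratch on chopin.cs.wisc.edu is 91% full",
     "Action": {"Type": "Mail", "To": ["alice", "bob"], "Subject": "Test 6 -- disk full"}},
    {"ActionText": "FullQueue has been true for more than 550 seconds",
     "Action": {"Type": "Mail", "To": ["alice"], "Subject": "queue full"}},
    {"ActionText": "not mailed",
     "Action": {"Type": "Page", "To": ["alice"]}},
]

messages = combine_mail(actions)
```

Here the hypothetical user alice receives a single message containing both mail texts, and bob receives a message containing only the first.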
Condor_trigger may be extended in the future to support actions other than sending email. In the meantime, if you want to perform other actions, it would be fairly easy to write a program (using the ClassAd library) that reads the action ads and performs appropriate actions. (If you specify action types other than "Mail", the action ads are still generated, although no action is actually taken.)
Note: the example ads in this section are taken from the test6 example that is distributed with the condor_trigger code.
This is a status ad containing information relating to various disk partitions on a system. This ad is a portion of an ad generated by the HawkEye monitoring tool.
[
    LastHeardFrom = 1068575609;
    Machine = "chopin.cs.wisc.edu";
    Name = "chopin.cs.wisc.edu";
    disk_FIELDS = "used mnt pct_used size fs avail ";
    disk_INDEX = "tmp var usr root scratch";
    disk_LastUpdate = 1068573945;
    disk_root_avail = 45662; disk_root_fs = "/dev/sda1"; disk_root_mnt = "/"; disk_root_pct_used = 77; disk_root_size = 202220; disk_root_used = 146118;
    disk_scratch_avail = 2436884; disk_scratch_fs = "/dev/sda8"; disk_scratch_mnt = "/scratch"; disk_scratch_pct_used = 91; disk_scratch_size = 28327964; disk_scratch_used = 24452060;
    disk_tmp_avail = 667480; disk_tmp_fs = "/dev/sda5"; disk_tmp_mnt = "/tmp"; disk_tmp_pct_used = 33; disk_tmp_size = 1035660; disk_tmp_used = 315572;
    disk_usr_avail = 1682168; disk_usr_fs = "/dev/sda2"; disk_usr_mnt = "/usr"; disk_usr_pct_used = 42; disk_usr_size = 3020172; disk_usr_used = 1184584;
    disk_var_avail = 178756; disk_var_fs = "/dev/sda6"; disk_var_mnt = "/var"; disk_var_pct_used = 63; disk_var_size = 497829; disk_var_used = 293371;
]
This is a preprocessing ad to process the above status ad. This preprocessing ad breaks out the information about each disk partition into a separate ad, so that it can be processed in subsequent steps.
[
    Name = "disk preprocess";
    ListAttr = "disk_INDEX";
    CopyAttrs = {
        [ Old = "LastHeardFrom"; New = "LastHeardFrom" ],
        [ Old = "Machine"; New = "Machine" ],
        [ Old = "disk_*_pct_used"; New = "disk_pct_used" ],
        [ Old = "disk_*_mnt"; New = "disk_mnt" ]
    };
    NewAdName = strcat(Machine, "_*_disk_info");
]
This is what the status ads look like after preprocessing with the above preprocessing ad. You can see that there is now a separate ClassAd for each disk partition. The original status ClassAd is unchanged.
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; ... ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; disk_mnt = "/"; Name = "chopin.cs.wisc.edu_root_disk_info"; disk_pct_used = 77 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; disk_mnt = "/scratch"; Name = "chopin.cs.wisc.edu_scratch_disk_info"; disk_pct_used = 91 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; disk_mnt = "/tmp"; Name = "chopin.cs.wisc.edu_tmp_disk_info"; disk_pct_used = 33 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; disk_mnt = "/usr"; Name = "chopin.cs.wisc.edu_usr_disk_info"; disk_pct_used = 42 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; disk_mnt = "/var"; Name = "chopin.cs.wisc.edu_var_disk_info"; disk_pct_used = 63 ]
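The asterisk substitution performed by the preprocessing step can be sketched in Python as follows. This is a rough model, not the actual condor_trigger code: ads are plain dictionaries, the preprocess function is hypothetical, and the NewAdName expression strcat(Machine, "_*_disk_info") is approximated by a Python format template, which is an assumption made to avoid evaluating ClassAd expressions. The sketch returns only the newly generated ads.

```python
def preprocess(status_ad, prep_ad):
    """Break one status ad into several ads, one per item named by
    ListAttr (or numbered 1..n when CountAttr is used instead)."""
    if "ListAttr" in prep_ad:
        items = status_ad[prep_ad["ListAttr"]].split()
    else:
        count = status_ad[prep_ad["CountAttr"]]
        items = [str(i) for i in range(1, count + 1)]

    new_ads = []
    for item in items:
        # Replace the asterisk, then fill in attributes of the status ad.
        name = prep_ad["NewAdName"].replace("*", item).format(**status_ad)
        new_ad = {"Name": name}
        for pair in prep_ad["CopyAttrs"]:
            old = pair["Old"].replace("*", item)
            if old in status_ad:
                new_ad[pair["New"]] = status_ad[old]
        new_ads.append(new_ad)
    return new_ads


# A reduced version of the status ad above, with two partitions.
status = {
    "LastHeardFrom": 1068575609,
    "Machine": "chopin.cs.wisc.edu",
    "Name": "chopin.cs.wisc.edu",
    "disk_INDEX": "root scratch",
    "disk_root_mnt": "/", "disk_root_pct_used": 77,
    "disk_scratch_mnt": "/scratch", "disk_scratch_pct_used": 91,
}

prep = {
    "ListAttr": "disk_INDEX",
    "CopyAttrs": [
        {"Old": "LastHeardFrom", "New": "LastHeardFrom"},
        {"Old": "Machine", "New": "Machine"},
        {"Old": "disk_*_pct_used", "New": "disk_pct_used"},
        {"Old": "disk_*_mnt", "New": "disk_mnt"},
    ],
    # Assumption: stands in for strcat(Machine, "_*_disk_info").
    "NewAdName": "{Machine}_*_disk_info",
}

disk_ads = preprocess(status, prep)
```

Each generated dictionary corresponds to one of the per-partition ads listed above, with the partition name substituted into both the copied attribute names and the new ad name.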
This ClassAd defines the "DiskFull" state. Here, DiskFull is true if the relevant disk is more than 90% full. Note that the StateTrigger definition can be any legal ClassAd expression on status ad attributes.
[
    StateName = "DiskFull";
    StateTrigger = ( other.disk_pct_used > 90 );
    CopyAttrs = { "Machine", "disk_mnt", "disk_pct_used" };
]
After the first phase of processing, these state ads are generated. The first ad (the one having only the LastHeardFrom and Name attributes) comes from the original status ad; the other state ads come from the individual disk status ads that are generated by the preprocessing.
[ LastHeardFrom = 1068575609; Name = "chopin.cs.wisc.edu" ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; DiskFull = [ currentState = false ]; disk_mnt = "/"; Name = "chopin.cs.wisc.edu_root_disk_info"; disk_pct_used = 77 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; DiskFull = [ currentState = true ]; disk_mnt = "/scratch"; Name = "chopin.cs.wisc.edu_scratch_disk_info"; disk_pct_used = 91 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; DiskFull = [ currentState = false ]; disk_mnt = "/tmp"; Name = "chopin.cs.wisc.edu_tmp_disk_info"; disk_pct_used = 33 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; DiskFull = [ currentState = false ]; disk_mnt = "/usr"; Name = "chopin.cs.wisc.edu_usr_disk_info"; disk_pct_used = 42 ]
[ LastHeardFrom = 1068575609; Machine = "chopin.cs.wisc.edu"; DiskFull = [ currentState = false ]; disk_mnt = "/var"; Name = "chopin.cs.wisc.edu_var_disk_info"; disk_pct_used = 63 ]
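How a state definition ad and a status ad combine into a state ad can be modeled with a short Python sketch. The make_state_ad and disk_full functions are hypothetical, ads are plain dictionaries, and the handling of a status ad that lacks disk_pct_used (no state subad and no copied attributes) is an assumption inferred from the first state ad shown above, not a documented rule.

```python
def disk_full(status_ad):
    # Models StateTrigger = ( other.disk_pct_used > 90 ).
    # Returns None when the attribute is missing (undefined in ClassAd terms).
    pct = status_ad.get("disk_pct_used")
    return None if pct is None else pct > 90


state_def = {
    "StateName": "DiskFull",
    "StateTrigger": disk_full,
    "CopyAttrs": ["Machine", "disk_mnt", "disk_pct_used"],
}


def make_state_ad(status_ad, state_def):
    """Match one status ad against one state definition ad."""
    state_ad = {"LastHeardFrom": status_ad["LastHeardFrom"],
                "Name": status_ad["Name"]}
    value = state_def["StateTrigger"](status_ad)
    if value is None:
        # Assumption: an undefined result adds nothing to the state ad.
        return state_ad
    state_ad[state_def["StateName"]] = {"currentState": value}
    for attr in state_def["CopyAttrs"]:
        if attr in status_ad:
            state_ad[attr] = status_ad[attr]
    return state_ad


original = {"LastHeardFrom": 1068575609, "Machine": "chopin.cs.wisc.edu",
            "Name": "chopin.cs.wisc.edu"}
scratch = {"LastHeardFrom": 1068575609, "Machine": "chopin.cs.wisc.edu",
           "disk_mnt": "/scratch", "Name": "chopin.cs.wisc.edu_scratch_disk_info",
           "disk_pct_used": 91}
root = {"LastHeardFrom": 1068575609, "Machine": "chopin.cs.wisc.edu",
        "disk_mnt": "/", "Name": "chopin.cs.wisc.edu_root_disk_info",
        "disk_pct_used": 77}

state_ads = [make_state_ad(ad, state_def) for ad in (original, scratch, root)]
```

Only the /scratch partition exceeds the 90% threshold, so it is the only entity whose DiskFull state is true in this sketch.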
This is an example of a state trigger. The trigger fires every time an entity is in the DiskFull state.
[
    ActionTrigger = other.DiskFull.currentState;
    ActionText = strcat("Disk ", other.ad.disk_mnt, " on ", other.ad.Machine, " is ", other.ad.disk_pct_used, "% full");
    Action = [
        Type = "Mail";
        To = { "current user" };
        Subject = "Test 6 -- disk full";
    ];
]
The second phase of processing generates this action ad, and sends the corresponding email. Note that specifying "current user" as the email address sends mail to the user running condor_trigger (as defined by the LOGNAME environment variable).
[
    ActionText = "Disk /scratch on chopin.cs.wisc.edu is 91% full";
    Action = [ Subject = "Test 6 -- disk full"; To = { "current user" }; Type = "Mail" ]
]
More details will be added here soon. For now, please see the README file included with the distribution.
Condor_trigger takes the following command-line arguments:
Condor_trigger relies on the ClassAd library, so you need to download and build that before you build condor_trigger. Note that you need version 0.9.6 or higher of the C++ ClassAd library; earlier versions have a bug that prevents condor_trigger from working properly.
Once you've built the ClassAd library, edit the condor_trigger Makefile, changing CLASSAD_DIR to point to the directory in which you installed the ClassAd library. Then run the command 'make'. You should end up with condor_trigger and classad_convert executables.
Once you've built the executables, run the command 'make test6'. You should get an email with the following message:
Disk /scratch on chopin.cs.wisc.edu is 91% full
If you get that message, the test worked correctly.
Also see the README file included in the distribution for more information.
Condor_trigger is meant as a tool for ongoing monitoring; therefore it is normally run repeatedly, processing updated status ads each time. For example, condor_trigger can be run as a cron job.
It is currently not possible to do both step 1 and step 2 of processing in a single invocation of condor_trigger, so condor_trigger is normally run twice (with different arguments) for each set of status ads to be processed.
For step 1 of the processing, the following arguments must be specified:
-stateadsin is also specified, with the given file containing the state ads produced by the previous run's step 1.
For step 2 of the processing, the following arguments must be specified:
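#!/bin/csh -f
# Fetch the current status ads in XML form.
condor_status -pool condor -xml > status.xml
# Step 1: combine status ads, state definitions, and previous state ads.
./condor_trigger -statusads status.xml -statedefads statedef.ca \
    -stateadsin state.old.ca -stateadsout state.new.ca
# Step 2: combine the new state ads with the trigger ads and take actions.
./condor_trigger -stateadsin state.new.ca -triggerads trigger.ca \
    -actadsout actions.ca
# Keep the new state ads as input for the next run.
\rm state.old.ca
mv state.new.ca state.old.ca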
The main thing to note is that, at the end, the script renames state.new.ca to state.old.ca, so that the state ads just produced will be used as input the next time the script is run. Edge and time triggers work only if the previous run's state ads are fed back in this way.