condor_submit is the program for submitting jobs to Condor. condor_submit requires a submit-description file which contains commands to direct the queuing of jobs. One description file may contain specifications for the queuing of many Condor jobs at once. All jobs queued by a single invocation of condor_submit must share the same executable, and are referred to as a ``job cluster''. It is advantageous to submit multiple jobs as a single cluster because only one copy of the checkpoint file is needed to represent all jobs in the cluster until they begin execution, and there is much less overhead for Condor to start the next job in a cluster than to start a brand new cluster.
SUBMIT DESCRIPTION FILE COMMANDS
Each Condor job description file describes one cluster of jobs to be placed in the Condor execution pool. All jobs in a cluster must share the same executable, but they may have different input and output files, different program arguments, etc. The submit-description file is then used as the only command-line argument to condor_submit.
The submit-description file must contain one executable command and at least one queue command. All of the other commands have default actions.
The commands which can appear in the submit-description file are:
requirements = Memory >= 64 && Mips > 45

Only one requirements command may be present in a description file. By default, condor_submit appends clauses to the requirements expression requiring that the job run on a machine with the same Arch and OpSys as the submit machine, with enough Disk to hold the executable, and with enough VirtualMemory for the job's image size.
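To illustrate (a sketch only; the actual appended clauses depend on the submit machine's platform and the job), the requirements command above might effectively become:

# Arch/OpSys values shown here are illustrative assumptions
requirements = (Memory >= 64 && Mips > 45) && (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= ExecutableSize) && (VirtualMemory >= ImageSize)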
For example:

requirements = Memory > 60
rank = Memory

asks Condor to find all available machines with more than 60 megabytes of memory and give the job the one with the most memory. See the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
For example: Suppose you have a job that occasionally segfaults, but you know that if you run it again on the same data, chances are it will finish successfully. This is how you would represent that with on_exit_remove (assuming the signal identifier for segmentation fault is 11):
on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)
The above expression keeps the job in the queue (to be run again) if it exited by a signal and that signal number was 11 (representing segmentation fault). In any other case of the job exiting, it will leave the queue as it normally would have done.
If left unspecified, this will default to True.
periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and a *_hold expression takes precedence over a *_remove expression.
This expression is available only under UNIX and only for the standard and vanilla universes.
For example: Suppose you have a job that you know will run for at least an hour. If the job exits after less than an hour, you would like it to be placed on hold, with e-mail notification, instead of being allowed to leave the queue.
on_exit_hold = (ServerStartTime - JobStartDate) < 3600
The above expression will place the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became true.
periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.
If left unspecified, this will default to False.
This expression is available only under UNIX and only for the standard and vanilla universes.
For example: Suppose you would like your job removed if the total suspension time of the job is more than half of the run time of the job.
periodic_remove = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)
The above expression will remove the job once this condition becomes true.
Notice:
periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.
If left unspecified, this will default to False.
This expression is available only under UNIX and only for the standard and vanilla universes.
For example: Suppose you would like your job held if the total suspension time of the job is more than half of the total run time of the job.
periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)
The above expression will place the job on hold if its total suspension time exceeds half of its total run time. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became true.
If left unspecified, this will default to False.
periodic_* expressions (defined elsewhere in this man page) take precedence over on_exit_* expressions, and any *_hold expression takes precedence over a *_remove expression.
This expression is available only under UNIX and only for the standard and vanilla universes.
job-owner@UID_DOMAIN

where UID_DOMAIN is specified by the Condor site administrator. If UID_DOMAIN has not been specified, Condor will send the e-mail to:
job-owner@submit-machine-name
<parameter> = <value>

Multiple environment variables can be specified by separating them with a semicolon (``;''). These environment variables will be placed into the job's environment before execution. The environment specification is currently limited to 10240 characters in total.
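For example, a sketch setting two variables (the names and values are purely illustrative):

environment = DATA_DIR=/tmp/data;DEBUG_LEVEL=2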
SIGTSTP, which tells the Condor libraries to initiate a checkpoint of the process. For jobs submitted to the vanilla universe, the default is SIGTERM, which is the standard way to terminate a program in UNIX.
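If the program installs its own handler for some other signal, kill_sig can name that signal's number instead. A minimal sketch, assuming the job traps SIGUSR1 and that signal number 10 corresponds to SIGUSR1 on the execution machine:

# assumption: 10 is SIGUSR1 on the execution machine
kill_sig = 10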
If your job attempts to access any of the files mentioned in this list, Condor will automatically compress them (if writing) or decompress them (if reading). The compress format is the same as used by GNU gzip.
The files given in this list may be simple filenames or complete paths and may include * as a wildcard. For example, this list causes the file /tmp/data.gz, any file named event.gz, and any file ending in .gzip to be automatically compressed or decompressed as needed:
compress_files = /tmp/data.gz, event.gz, *.gzip
Due to the nature of the compression format, compressed files must only be accessed sequentially. Random access reading is allowed but is very slow, while random access writing is simply not possible. This restriction may be avoided by using both compress_files and fetch_files at the same time. When this is done, a file is kept in the decompressed state at the execution machine, but is compressed for transfer to its original location.
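A sketch of this combination (the file name data.gz is purely illustrative):

# keep the file decompressed on the execution machine, but
# compress it when transferring it back to its original location
fetch_files    = data.gz
compress_files = data.gz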
This option only applies to standard-universe jobs.
If your job attempts to access a file mentioned in this list, Condor will automatically copy the whole file to the executing machine, where it can be accessed quickly. When your job closes the file, it will be copied back to its original location. This list uses the same syntax as compress_files, shown above.
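For example, to have a hypothetical input file and any file ending in .db copied whole to the executing machine:

fetch_files = dataset.1, *.db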
This option only applies to standard-universe jobs.
If your job attempts to access a file mentioned in this list, Condor will force all writes to that file to be appended to the end. Furthermore, condor_submit will not truncate it. This list uses the same syntax as compress_files, shown above.
This option may yield some surprising results. If several jobs attempt to write to the same file, their output may be intermixed. If a job is evicted from one or more machines during the course of its lifetime, such an output file might contain several copies of the results. This option should only be used when you wish a certain file to be treated as a running log instead of a precise result.
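For example, to treat a hypothetical running log file this way:

append_files = progress.log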
This option only applies to standard-universe jobs.
If your job attempts to access a file mentioned in this list, Condor will cause it to be read or written at the execution machine. This is most useful for temporary files not used for input or output. This list uses the same syntax as compress_files, shown above.
local_files = /tmp/*
This option only applies to standard-universe jobs.
Directs Condor to use a new filename in place of an old one. name describes a filename that your job may attempt to open, and newname describes the filename it should be replaced with. newname may include an optional leading access specifier, local: or remote:. If left unspecified, the default access specifier is remote:. Multiple remaps can be specified by separating each with a semicolon.
This option only applies to standard-universe jobs.

If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.
For example, suppose that your job reads a file named dataset.1. To instruct Condor to force your job to read other.dataset instead, add this to the submit file:

file_remaps = "dataset.1=other.dataset"
As another example, suppose that many jobs all read the same large file, very.big. If this file can be found in the same place on a local disk on every machine in the pool (say, /bigdisk/bigfile), you can instruct Condor of this fact by remapping very.big to /bigdisk/bigfile and specifying that the file is to be read locally, which will be much faster than reading over the network:

file_remaps = "very.big = local:/bigdisk/bigfile"
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"
These options only apply to standard-universe jobs.
If needed, you may set the buffer controls individually for each file using the buffer_files option. For example, to set the buffer size to 1MB and the block size to 256KB for the file 'input.data', use this command:
buffer_files = "input.data=(1000000,256000)"
Alternatively, you may use these two options to set the default sizes for all files used by your job:
buffer_size = 1000000
buffer_block_size = 256000
If you do not set these, Condor will use the values given by these two config file macros:
DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000
Finally, if no other settings are present, Condor will use a buffer of 512KB and a block size of 32KB.
...
GlobusScheduler = lego.bu.edu/jobmanager-lsf
queue
In addition to commands, the submit-description file can contain macros and comments:
<macro_name> = <string>

Two pre-defined macros are supplied by the description-file parser. The $(Cluster) macro supplies the number of the job cluster, and the $(Process) macro supplies the number of the job within the cluster. These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with many jobs, and can also be used to supply a Condor process with its own cluster and process numbers on the command line. The $(Process) macro should not be used for PVM jobs.
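For instance, a sketch that uses both macros to give each job in a cluster distinct files (the file names are illustrative):

input  = in.$(Cluster).$(Process)
output = out.$(Cluster).$(Process)
error  = err.$(Cluster).$(Process)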
If you happen to want a ``$'' as a literal character, then you must use
$(DOLLAR)
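For instance, a sketch that passes a literal ``$100'' on the job's command line (the arguments themselves are illustrative):

arguments = -budget $(DOLLAR)100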
In addition to the normal macro, there is also a special kind of macro called a ``Substitution Macro'' that allows you to substitute expressions defined on the resource machine itself (obtained after the job has been matched with a machine) into specific expressions in your submit-description file. The special substitution macro is of the form:
$$(attribute)
The substitution macro can only be used in three expressions in the submit-description file: executable, environment, and arguments. The most common use of this macro is for heterogeneous submission of an executable:
executable = povray.$$(opsys).$$(arch)

The opsys and arch attributes will be substituted at match time for any given resource. This will allow Condor to automatically choose the right executable for the right machine.
condor_submit will exit with a status value of 0 (zero) upon success, and a non-zero value upon failure.
Example 1: The example below queues three jobs for execution by Condor. The first will be given command line arguments of '15' and '2000', and will write its standard output to 'foo.out1'. The second will be given command line arguments of '30' and '2000', and will write its standard output to 'foo.out2'. Similarly, the third will have arguments of '45' and '6000', and will use 'foo.out3' for its standard output. Standard error output (if any) from the three runs will appear in 'foo.err1', 'foo.err2', and 'foo.err3', respectively.
####################
#
# Example 1: queueing multiple jobs with differing
# command line arguments and output files.
#
####################

Executable     = foo
Arguments      = 15 2000
Output         = foo.out1
Error          = foo.err1
Queue

Arguments      = 30 2000
Output         = foo.out2
Error          = foo.err2
Queue

Arguments      = 45 6000
Output         = foo.out3
Error          = foo.err3
Queue
Example 2: This submit-description file example queues 150 runs of program 'foo' which must have been compiled and linked for Silicon Graphics workstations running IRIX 6.x. Condor will not attempt to run the processes on machines which have less than 32 megabytes of physical memory, and will run them on machines which have at least 64 megabytes if such machines are available. Stdin, stdout, and stderr will refer to ``in.0'', ``out.0'', and ``err.0'' for the first run of this program (process 0). Stdin, stdout, and stderr will refer to ``in.1'', ``out.1'', and ``err.1'' for process 1, and so forth. A log file containing entries about where/when Condor runs, checkpoints, and migrates processes in this cluster will be written into file ``foo.log''.
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################

Executable     = foo
Requirements   = Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI"
Rank           = Memory >= 64
Image_Size     = 28 Meg
Error          = err.$(Process)
Input          = in.$(Process)
Output         = out.$(Process)
Log            = foo.log
Queue 150
+WantCheckpoint = False

in the submit-description file before the queue command(s).
See the HTCondor Manual for additional notices.