This manual page describes the format of the template file for the
cluster queue configuration. Via the -aq and -mq options of the
qconf(1) command, you can add cluster queues and modify the
configuration of any queue in the cluster. Any of these change
operations can be rejected as a result of a failed integrity
verification.
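For example, a new cluster queue might be created from the template and
later modified as follows (the queue name my.q is illustrative):

   # add a new cluster queue, opening the template in an editor
   qconf -aq my.q

   # modify the configuration of an existing cluster queue
   qconf -mq my.q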
The queue configuration parameters take as values strings, integer dec-
imal numbers or boolean, time and memory specifiers (see time_specifier
and memory_specifier in sge_types(5)) as well as comma separated lists.
Note: Univa Grid Engine allows backslashes (\) to be used to escape
newline (\newline) characters. The backslash and the newline are
replaced with a space (" ") character before any interpretation.
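As an illustrative sketch (host1, host2 and host3 are placeholder host
names), the two-line entry

   hostlist  host1 host2 \
             host3

is therefore read as if it had been written as
"hostlist host1 host2 host3".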
The following list of parameters specifies the queue configuration
file content:

qname
The name of the cluster queue as defined for queue_name in
sge_types(5). As template default "template" is used.
hostlist
A list of host identifiers as defined for host_identifier in
sge_types(5). For each host, Univa Grid Engine maintains a queue
instance for running jobs on that particular host. Large numbers of
hosts can easily be managed by using host groups rather than single
host names. As list separators, white-space and "," can be used.
(template default: NONE).
If more than one host is specified, it can be desirable to specify
divergences from the parameter settings described below for certain
hosts. These divergences can be expressed using the enhanced queue
configuration specifier syntax. This syntax builds upon the regular
parameter specifier syntax separately for each parameter:

   "["host_identifier=value[,value,...]"]"[,"["host_identifier=value[,value,...]"]"]...

Note: even in the enhanced queue configuration specifier syntax an
entry without brackets denoting the default setting is required and
used for all queue instances where no divergences are specified.
Tuples with a host group host_identifier override the default setting.
Tuples with a host name host_identifier override both the default and
the host group setting.
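As a sketch of this syntax (the host group @bigmachines and the host
node42 are placeholders), a slots setting might read:

   slots 2,[@bigmachines=8],[node42=4]

Here 2 is the default for all queue instances, hosts in the host group
@bigmachines get 8 slots, and the single host node42 gets 4 slots,
overriding both the default and any host group setting.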
Note that also with the enhanced queue configuration specifier syntax a
default setting is always needed for each configuration attribute; oth-
erwise the queue configuration gets rejected. Ambiguous queue configu-
rations with more than one attribute setting for a particular host are
rejected. Configurations containing override values for hosts not
enlisted in the hostlist are accepted.

seq_no
In conjunction with the host load situation at a given time, this
parameter specifies this queue's position in the scheduling order
within the suitable queues for a job to be dispatched, under
consideration of the queue_sort_method (see sched_conf(5)).
Regardless of the queue_sort_method setting, qstat(1) reports queue
information in the order defined by the value of the seq_no. Set this
parameter to a monotonically increasing sequence. (type number; tem-
plate default: 0).
load_thresholds is a list of load thresholds. As soon as one of the
thresholds is exceeded, no further jobs will be scheduled to the queues
and qmon(1) will signal an overload condition for this node. Arbitrary
load values being defined in the "host" and "global" complexes (see
complex(5) for details) can be used.
The syntax is that of a comma separated list with each list element
consisting of the complex_name (see sge_types(5)) of a load value, an
equal sign and the threshold value being intended to trigger the over-
load situation (e.g. load_avg=1.75,users_logged_in=5).
Note: Load values as well as consumable resources may be scaled differ-
ently for different hosts if specified in the corresponding execution
host definitions (refer to host_conf(5) for more information). Load
thresholds are compared against the scaled load and consumable values.
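In the queue configuration file this threshold list appears as a
single attribute line; a minimal sketch (values are arbitrary
examples):

   load_thresholds   np_load_avg=1.75,users_logged_in=5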
suspend_thresholds
A list of load thresholds with the same semantics as that of the
load_thresholds parameter (see above), except that exceeding one of
the denoted thresholds initiates suspension of one of multiple jobs in
the queue. See the nsuspend parameter below for details on the number
of jobs which are suspended. There is an important relationship
between the suspend_threshold and the scheduler_interval: if, for
example, you have a suspend threshold on the np_load_avg, and the load
exceeds the threshold, this does not have an immediate effect. Jobs
continue running until the next scheduling run, where the scheduler
detects the threshold has been exceeded and sends an order to qmaster
to suspend the job. The same applies for unsuspending.
nsuspend
The number of jobs which are suspended/enabled per time interval if at
least one of the load thresholds in the suspend_thresholds list is
exceeded, or if no suspend_threshold is violated anymore, respectively.
Nsuspend jobs are suspended in each time interval until no sus-
pend_thresholds are exceeded anymore or all jobs in the queue are sus-
pended. Jobs are enabled in the corresponding way if the sus-
pend_thresholds are no longer exceeded. The time interval in which the
suspensions of the jobs occur is defined in suspend_interval below.
suspend_interval
The time interval in which further nsuspend jobs are suspended if one
of the suspend_thresholds (see above for both) is exceeded by the
current load on the host on which the queue is located. The time
interval is also used when enabling the jobs. The syntax is that of a
time_specifier in sge_types(5).
min_cpu_interval
The time between two automatic checkpoints in the case of transparently
checkpointing jobs. The maximum of the time requested by the user via
qsub(1) and the time defined by the queue configuration is used as
checkpoint interval. Since checkpoint files may be considerably large
and thus writing them to the file system may become expensive, users
and administrators are advised to choose sufficiently large time inter-
vals. min_cpu_interval is of type time and the default is 5 minutes
(which usually is suitable for test purposes only). The syntax is that
of a time_specifier in sge_types(5).
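For example, to checkpoint at most once per half hour (an arbitrary
illustrative value):

   min_cpu_interval  00:30:00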
processors
A set of processors in the case of a multiprocessor execution host can
be defined to which the jobs executing in this queue are bound. The value
type of this parameter is a range description like that of the -pe
option of qsub(1) (e.g. 1-4,8,10) denoting the processor numbers for
the processor group to be used. Obviously the interpretation of these
values relies on operating system specifics and is thus performed
inside sge_execd(8) running on the queue host. Therefore, the parsing
of the parameter has to be provided by the execution daemon and the
parameter is only passed through sge_qmaster(8) as a string.
Currently, support is only provided for multiprocessor machines running
Solaris, SGI multiprocessor machines running IRIX 6.2 and Digital UNIX
multiprocessor machines. In the case of Solaris the processor set must
already exist when this processors parameter is configured, so the
processor set has to be created manually. In the case of Digital UNIX
only one job per processor set is allowed to execute at the same time,
i.e. slots (see above) should be set to 1 for this queue.
qtype
The type of queue. Currently batch, interactive, a combination of both
in a comma separated list, or NONE.
Jobs that need to be scheduled immediately (qsh, qlogin, qrsh and qsub
with option -now yes) can ONLY be scheduled on interactive queues.
The formerly supported types parallel and checkpointing are not allowed
anymore. A queue instance is implicitly of type parallel/checkpointing
if there is a parallel environment or a checkpointing interface
specified for this queue instance in pe_list/ckpt_list. Formerly
possible settings, e.g.

   qtype   PARALLEL

could be transferred into

   qtype   NONE
   pe_list pe_name

(type string; default: batch interactive).
rerun
Defines a default behavior for jobs which are aborted by system
crashes or a manual "violent" (via kill(1)) shutdown of the complete
Univa Grid Engine system (including the sge_shepherd(8) of the jobs
and their process hierarchy) on the queue host. As soon as
sge_execd(8) is restarted and detects that a job has been aborted for
such reasons, it can be
restarted if the jobs are restartable. A job may not be restartable,
for example, if it updates databases (first reads then writes to the
same record of a database/file) because the abortion of the job may
have left the database in an inconsistent state. If the owner of a job
wants to overrule the default behavior for the jobs in the queue the -r
option of qsub(1) can be used.
The type of this parameter is boolean, thus either TRUE or FALSE can be
specified. The default is FALSE, i.e. do not restart jobs
automatically.
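For example, a queue might enable automatic restart by default, while
the owner of a non-restartable job overrules it at submission time
(job.sh is a placeholder script name):

   rerun  TRUE

   qsub -r n job.sh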
slots
The maximum number of concurrently executing jobs allowed in any queue
instance defined by the queue. Type is number; valid values are 0 to
9999999.
The tmpdir parameter specifies the absolute path to the base of the
temporary directory filesystem. When sge_execd(8) launches a job, it
creates a uniquely-named directory in this filesystem for the purpose
of holding scratch files during job execution. At job completion, this
directory and its contents are removed automatically. The environment
variables TMPDIR and TMP are set to the path of each job's scratch
directory (type string; default: /tmp).
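A job script can rely on this behavior for scratch files; a minimal
sketch (the data paths are placeholders):

   #!/bin/sh
   # $TMPDIR points to the job-private scratch directory created by
   # sge_execd; it and its contents are removed at job completion
   cd "$TMPDIR"
   sort /data/input.dat > sorted.dat   # scratch file lands in $TMPDIR
   cp sorted.dat /data/results/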
shell
If either posix_compliant or script_from_stdin is specified as the
shell_start_mode parameter in sge_conf(5), the shell parameter
specifies the executable path of the command interpreter (e.g. sh(1)
or csh(1)) to be used to process the job scripts executed in the
queue. The definition of shell can be overruled by the job owner via
the qsub(1) -S option.

The type of the parameter is string. The default is /bin/csh.
shell_start_mode
This parameter defines the mechanisms which are used to actually invoke
the job scripts on the execution hosts. The following values are
recognized:

unix_behavior
If a user starts a job shell script under UNIX interactively by
invoking it just with the script name, the operating system's
executable loader uses the information provided in a comment such as
`#!/bin/csh' in the first line of the script to detect which command
interpreter to start to interpret the script. This mechanism is used
by Univa Grid Engine when starting jobs if unix_behavior is defined as
shell_start_mode.

posix_compliant
POSIX does not consider first script line comments such as
`#!/bin/csh' as being significant. Thus, if shell_start_mode is set to
posix_compliant, Univa Grid Engine uses either the command interpreter
indicated by the -S option of qsub(1) or the shell parameter of the
queue (see above) to process the job script.

script_from_stdin
Setting the shell_start_mode parameter either to posix_compliant or
unix_behavior requires you to set the umask in use for sge_execd(8)
such that every user has read access to the active_jobs directory in
the spool directory of the corresponding execution daemon. In case you
have prolog and epilog scripts configured, they also need to be
readable by any user who may execute jobs in the queue.
If this violates your site's security policies you may want to
set shell_start_mode to script_from_stdin. This will force Univa
Grid Engine to open the job script as well as the epilogue and
prologue scripts for reading into STDIN as root (if sge_execd(8)
was started as root) before changing to the job owner's user
account. The script is then fed into the STDIN stream of the
command interpreter indicated by the -S option of the qsub(1)
command or the shell parameter of the queue to be used (see above).

Thus, setting shell_start_mode to script_from_stdin also implies
posix_compliant behavior. Note, however, that feeding scripts
into the STDIN stream of a command interpreter may cause trouble
if commands like rsh(1) are invoked inside a job script as they
also process the STDIN stream of the command interpreter. These
problems can usually be resolved by redirecting the STDIN chan-
nel of those commands to come from /dev/null (e.g. rsh host date
< /dev/null). Note also, that any command-line options associ-
ated with the job are passed to the executing shell. The shell
will only forward them to the job if they are not recognized as
valid shell options.
The default for shell_start_mode is posix_compliant. Note, though,
that shell_start_mode can only be used for batch jobs submitted by
qsub(1) and can't be used for interactive jobs submitted by qrsh(1),
qsh(1) or qlogin(1).
prolog
The executable path of a shell script that is started before execution
of Univa Grid Engine jobs with the same environment setting as that for
the Univa Grid Engine jobs to be started afterwards. An optional prefix
"user@" specifies the user under which this procedure is to be started.
The procedure's standard output and the error output stream are written
to the same file used also for the standard output and error output of
each job. This procedure is intended as a means for the Univa Grid
Engine administrator to automate the execution of general site specific
tasks like the preparation of temporary file systems with the need for
the same context information as the job. This queue configuration entry
overwrites cluster global or execution host specific prolog
definitions (see sge_conf(5)).

The default for prolog is the special value NONE, which prevents
execution of a prologue script. The special variables for constituting
a command line are the same as in the prolog definitions of the clus-
ter configuration (see sge_conf(5)).
Exit codes for the prolog attribute can be interpreted based on the
following exit values:

   0: Success
   99: Reschedule job
   100: Put job in error state
   Anything else: Put queue in error state
epilog
The executable path of a shell script that is started after execution
of Univa Grid Engine jobs, with the same environment setting as that
for the Univa Grid Engine jobs that have just completed. An optional
prefix "user@" specifies the user under which this procedure is to be
started. This queue configuration entry overwrites cluster global or
execution host specific epilog definitions (see sge_conf(5)).

The default for epilog is the special value NONE, which prevents
execution of an epilogue script. The special variables for
constituting a command line are the same as in the prolog definitions
of the clus-
ter configuration (see sge_conf(5)).
Exit codes for the epilog attribute can be interpreted based on the
following exit values:
   0: Success
   99: Reschedule job
   100: Put job in error state
   Anything else: Put queue in error state
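A minimal prolog sketch using these exit codes (the scratch path is a
placeholder; $JOB_ID is set in the prolog's environment):

   #!/bin/sh
   # create a per-job work area; put the job in the error state
   # if that fails
   mkdir -p /scratch/"$JOB_ID" || exit 100
   exit 0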
starter_method
The specified executable path will be used as a job starter facility
responsible for starting batch jobs. The executable path will be exe-
cuted instead of the configured shell to start the job. The job argu-
ments will be passed as arguments to the job starter. The following
environment variables are used to pass information to the job starter
concerning the shell environment which was configured or requested to
start the job.
SGE_STARTER_SHELL_PATH
       The name of the requested shell to start the job

SGE_STARTER_SHELL_START_MODE
       The configured shell_start_mode

SGE_STARTER_USE_LOGIN_SHELL
       Set to "true" if the shell is supposed to be used as a login
       shell (see login_shells in sge_conf(5))
The starter_method will not be invoked for qsh, qlogin or qrsh acting
as rlogin.
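A minimal starter_method sketch (the log path is a placeholder; the
SGE_STARTER_* variables are those listed above):

   #!/bin/sh
   # record which interpreter was requested, then hand the job
   # script and its arguments over to that interpreter
   echo "starting job via $SGE_STARTER_SHELL_PATH" >> /var/log/starter.log
   exec "$SGE_STARTER_SHELL_PATH" "$@"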
suspend_method, resume_method, terminate_method
These parameters can be used for overwriting the default method used by
Univa Grid Engine for suspension, release of a suspension and for ter-
mination of a job. Per default, the signals SIGSTOP, SIGCONT and
SIGKILL are delivered to the job to perform these actions. However, for
some applications this is not appropriate.
If no executable path is given, Univa Grid Engine takes the specified
parameter entries as the signal to be delivered instead of the default
signal. A signal must be either a positive number or a signal name with
"SIG" as prefix and the signal name as printed by kill -l (e.g.
Univa Grid Engine's unique job identification number.
The name of the job.
$queue The name of the queue.
The pid of the job.
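Two hedged configuration sketches: the first replaces the default
suspend signal, the second delegates termination to a site-specific
script (the script path is a placeholder):

   suspend_method    SIGTSTP
   terminate_method  /usr/local/sbin/graceful_kill.sh $job_id $job_pid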
notify
The time waited between delivery of SIGUSR1/SIGUSR2 notification
signals and suspend/kill signals if the job was submitted with the
qsub(1) -notify option.
owner_list
The owner_list contains a comma separated list of login(1) user names
(see user_name in sge_types(5)) of those users who are authorized to disable
and suspend this queue through qmod(1) (Univa Grid Engine operators and
managers can do this by default). It is customary to set this field for
queues on interactive workstations where the computing resources are
shared between interactive sessions and Univa Grid Engine jobs, allow-
ing the workstation owner to have priority access. (default: NONE).
user_lists
The user_lists parameter contains a comma separated list of Univa Grid
Engine user access list names as described in access_list(5). Each
user contained in at least one of the enlisted access lists has access
to the queue. If the user_lists parameter is set to NONE (the
default), any user has access who is not explicitly excluded via the
xuser_lists
parameter described below. If a user is contained both in an access
list enlisted in xuser_lists and user_lists the user is denied access
to the queue.
xuser_lists
The xuser_lists parameter contains a comma separated list of Univa Grid
Engine user access list names as described in access_list(5). Each
user contained in at least one of the enlisted access lists is not
allowed to access the queue. If the xuser_lists parameter is set to
NONE (the default) any user has access. If a user is contained both in
an access list enlisted in xuser_lists and user_lists the user is
denied access to the queue.
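For example (the access list names staff and contractors are
placeholders), the following grants access to members of staff while
always excluding members of contractors, even if a user appears in
both lists:

   user_lists    staff
   xuser_lists   contractors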
projects
The projects parameter contains a comma separated list of Univa Grid
Engine projects (see project(5)) that have access to the queue. Any
project not in this list is denied access to the queue. If set to NONE
(the default), any project has access that is not specifically
excluded via the xprojects parameter described below. If a project is
in both the projects and xprojects parameters, the project is denied
access to the queue.

xprojects
The xprojects parameter contains a comma separated list of Univa Grid
Engine projects (see project(5)) that are denied access to the queue.
If set to NONE (the default), no project is denied access. If a
project is in both the projects and xprojects parameters, the project
is denied access to the queue.

subordinate_list
There are two different types of subordination:

1. Queuewise subordination
A list of Univa Grid Engine queue names as defined for queue_name in
sge_types(5). Subordinate relationships are in effect only between
queue instances residing at the same host. The relationship does not
apply and is ignored when jobs are running in queue instances on other
hosts. Queue instances residing on the same host will be suspended
when a specified count of jobs is running in this queue instance. The
list specification is the same as that of the load_thresholds parameter
above, e.g. low_pri_q=5,small_q. The numbers denote the job slots of
the queue that have to be filled in the superordinated queue to trigger
the suspension of the subordinated queue. If no value is assigned a
suspension is triggered if all slots of the queue are filled.
On nodes which host more than one queue, you might wish to accord bet-
ter service to certain classes of jobs (e.g., queues that are dedicated
to parallel processing might need priority over low priority production
queues; default: NONE).
2. Slotwise preemption
The slotwise preemption provides a means to ensure that high priority
jobs get the resources they need, while at the same time low priority
jobs on the same host are not unnecessarily preempted, maximizing the
host utilization. The slotwise preemption is designed to provide dif-
ferent preemption actions, but with the current implementation only
suspension is provided. This means there is a subordination relation-
ship defined between queues similar to the queuewise subordination, but
if the suspend threshold is exceeded, not the whole subordinated queue
is suspended; instead, only single tasks running in single slots are
suspended.

Like with queuewise subordination, the subordination relationships are
in effect only between queue instances residing at the same host. The
relationship does not apply and is ignored when jobs and tasks are run-
ning in queue instances on other hosts.
The syntax is:

   subordinate_list slots=<threshold>(<queue>[:<seq_no>][:<action>][, <queue>[:<seq_no>][:<action>]]...)

where

   <threshold> = a positive integer number
   <queue>     = a Univa Grid Engine queue name as defined for
                 queue_name in sge_types(5).
   <seq_no>    = sequence number among all subordinated queues of the
                 same depth in the tree. The higher the sequence
                 number, the lower is the priority of the queue.
                 Default is 0, which is the highest priority.
   <action>    = the action to be taken if the threshold is exceeded.
                 Supported are:
                 "sr": Suspend the task with the shortest run time.
                 "lr": Suspend the task with the longest run time.
                 Default is "sr".

a) The simplest configuration

   subordinate_list slots=2(B.q)
which means the queue "B.q" is subordinated to the current queue (let's
call it "A.q"), the suspend threshold for all tasks running in "A.q"
and "B.q" on the current host is two, the sequence number of "B.q" is
"0" and the action is "suspend task with shortest run time first". This
subordination relationship looks like this:

      A.q
       |
      B.q
This could be a typical configuration for a host with a dual core CPU.
This subordination configuration ensures that tasks that are scheduled
to "A.q" always get a CPU core for themselves, while jobs in "B.q" are
not preempted as long as there are no jobs running in "A.q".
If there is no task running in "A.q", two tasks are running in "B.q"
and a new task is scheduled to "A.q", the sum of tasks running in "A.q"
and "B.q" is three. Three is greater than two, this triggers the
defined action. This causes the task with the shortest run time in the
subordinated queue "B.q" to be suspended. After suspension, there is
one task running in "A.q", one task running in "B.q" and one task sus-
pended in "B.q".
b) A simple tree
subordinate_list slots=2(B.q:1, C.q:2)
This defines a small tree that looks like this:

        A.q
       /   \
     B.q   C.q
A use case for this configuration could be a host with a dual core CPU
and queue "B.q" and "C.q" for jobs with different requirements, e.g.
"B.q" for interactive jobs, "C.q" for batch jobs. Again, the tasks in
"A.q" always get a CPU core, while tasks in "B.q" and "C.q" are sus-
pended only if the threshold of running tasks is exceeded. Here the
sequence number among the queues of the same depth comes into play.
Tasks scheduled to "B.q" can't directly trigger the suspension of tasks
in "C.q", but if there is a task to be suspended, first "C.q" will be
searched for a suitable task.
If there is one task running in "A.q", one in "C.q" and a new task is
scheduled to "B.q", the threshold of "2" in "A.q", "B.q" and "C.q" is
exceeded. This triggers the suspension of one task in either "B.q" or
"C.q". The sequence number gives "B.q" a higher priority than "C.q",
therefore the task in "C.q" is suspended. After suspension, there is
one task running in "A.q", one task running in "B.q" and one task sus-
pended in "C.q".
c) More than two levels

   Configuration of queue "A.q": subordinate_list slots=2(B.q)
   Configuration of queue "B.q": subordinate_list slots=2(C.q)

This defines a tree that looks like this:

      A.q
       |
      B.q
       |
      C.q

In this tree, the tasks of the "leaf node" "C.q" can be suspended by
any task scheduled to any queue of this tree. First, always the direct
parent of the queue where the new task is scheduled to
is checked, the number of tasks running there is counted. If the
threshold which is defined in "B.q" is exceeded, the job in "C.q" is
suspended. Then the whole tree is checked, if the number of tasks run-
ning in "A.q", "B.q" and "C.q" exceeds the threshold defined in "A.q"
the task in "C.q" is suspended. This means, the effective threshold of
any subtree is not higher than the threshold of the root node of the
tree. If in this example a task is scheduled to "A.q", immediately the
number of tasks running in "A.q", "B.q" and "C.q" is checked against
the threshold defined in "A.q".
d) Any tree

        A.q
       /   \
     B.q   C.q
     /     /  \
   D.q   E.q  F.q
                \
                G.q
The computation of the tasks that are to be (un)suspended always starts
at the queue instance that is modified, i.e. a task is scheduled to, a
task ends at, the configuration is modified, a manual or other auto-
matic (un)suspend is issued, except when it is a leaf node, like "D.q",
"E.q" and "G.q" in this example. Then the computation starts at its
parent queue instance (like "B.q", "C.q" or "F.q" in this example).
From there first all running tasks in the whole subtree of this queue
instance are counted. If the sum exceeds the threshold configured in
the subordinate_list, in this subtree a task is searched to be sus-
pended. Then the algorithm proceeds to the parent of this queue
instance, counts all running tasks in the whole subtree below the par-
ent and checks if the number exceeds the threshold configured at the
parent's subordinate_list. If so, it searches for a task to suspend in
the whole subtree below the parent. And so on, until it did this compu-
tation for the root node of the tree.
complex_values defines quotas for resource attributes managed via this
queue. The syntax is the same as for load_thresholds (see above). The
quotas are related to the resource consumption of all jobs in a queue
in the case of consumable resources (see complex(5) for details on con-
sumable resources) or they are interpreted on a per queue slot (see
slots above) basis in the case of non-consumable resources. Consumable
resource attributes are commonly used to manage free memory, free disk
space or available floating software licenses while non-consumable
attributes usually define distinctive characteristics like the type of
hardware installed.
For consumable resource attributes an available resource amount is
determined by subtracting the current resource consumption of all run-
ning jobs in the queue from the quota in the complex_values list. Jobs
can only be dispatched to a queue if no resource requests exceed any
corresponding resource availability obtained by this scheme.
Note also: The resource consumption of running jobs (used for the
availability calculation) as well as the resource requests of the jobs
waiting to be dispatched either may be derived from explicit user
requests during job submission (see the -l option to qsub(1)) or from a
"default" value configured for an attribute by the administrator (see
complex(5)). The -r option to qstat(1) can be used for retrieving full
detail on the actual resource requests of all jobs in the system.
For non-consumable resources Univa Grid Engine simply compares the
job's attribute requests with the corresponding specification in com-
plex_values taking the relation operator of the complex attribute defi-
nition into account (see complex(5)). If the result of the comparison
is "true", the queue is suitable for the job with respect to the par-
ticular attribute. For parallel jobs each queue slot to be occupied by
a parallel task is meant to provide the same resource attribute value.
Note: Only numeric complex attributes can be defined as consumable
resources and hence non-numeric attributes are always handled on a per
queue slot basis.
The default value for this parameter is NONE, i.e. no administrator
defined resource attribute quotas are associated with the queue.
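A hedged sketch (the license attribute lic_app is a placeholder
consumable assumed to be defined in complex(5)):

   complex_values  virtual_free=4G,lic_app=2

Here at most 4 gigabytes of virtual_free and two lic_app licenses are
handed out to the jobs in each queue instance; jobs requesting more
than the remaining amount stay pending.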
calendar
Specifies the calendar to be valid for this queue or contains NONE
(the default). A calendar defines the availability of a queue depending on
time of day, week and year. Please refer to calendar_conf(5) for
details on the Univa Grid Engine calendar facility.
Note: Jobs can request queues with a certain calendar model via a "-l
c=<cal_name>" option to qsub(1).
initial_state
Defines an initial state for the queue either when adding the queue to
the system for the first time or on start-up of the sge_execd(8) on the
host on which the queue resides. Possible values are:
default The queue is enabled when adding the queue, or is reset to the
previous status when sge_execd(8) comes up (this corresponds
to the behavior of earlier Univa Grid Engine releases not
supporting initial_state).
enabled The queue is enabled in either case. This is equivalent to a
manual and explicit 'qmod -e' command (see qmod(1)).
disabled The queue is disabled in either case. This is equivalent to a
manual and explicit 'qmod -d' command (see qmod(1)).
The first two resource limit parameters, s_rt and h_rt, are implemented
by Univa Grid Engine. They define the "real time" (also called
"elapsed" or "wall clock" time) passed since the start of the job. If
h_rt is exceeded by a job running in the queue, it is aborted via the
SIGKILL signal (see kill(1)). If s_rt is exceeded, the job is first
"warned" via the SIGUSR1 signal (which can be caught by the job) and
finally aborted after the notification time defined in the queue
configuration parameter notify (see above) has passed.

The resource limit parameters s_cpu and h_cpu are implemented by Univa
Grid Engine as a job limit. They impose a limit on the amount of com-
bined CPU time consumed by all the processes in the job. If h_cpu is
exceeded by a job running in the queue, it is aborted via a SIGKILL
signal (see kill(1)). If s_cpu is exceeded, the job is sent a SIGXCPU
signal which can be caught by the job. If you wish to allow a job to
be "warned" so it can exit gracefully before it is killed then you
should set the s_cpu limit to a lower value than h_cpu. For parallel
processes, the limit is applied per slot which means that the limit is
multiplied by the number of slots being used by the job before being
applied.
The resource limit parameters s_vmem and h_vmem are implemented by
Univa Grid Engine as a job limit. They impose a limit on the amount of
combined virtual memory consumed by all the processes in the job. If
h_vmem is exceeded by a job running in the queue, it is aborted via a
SIGKILL signal (see kill(1)). If s_vmem is exceeded, the job is sent a
SIGXCPU signal which can be caught by the job. If you wish to allow a
job to be "warned" so it can exit gracefully before it is killed then
you should set the s_vmem limit to a lower value than h_vmem. For par-
allel processes, the limit is applied per slot which means that the
limit is multiplied by the number of slots being used by the job before
being applied.
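A hedged sketch of soft limits set below their hard counterparts, so
that jobs receive a catchable warning signal before being killed
(values are arbitrary examples):

   s_cpu   01:55:00
   h_cpu   02:00:00
   s_vmem  3800M
   h_vmem  4G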
The remaining parameters in the queue configuration template specify
per job soft and hard resource limits as implemented by the setr-
limit(2) system call. See this manual page on your system for more
information. By default, each limit field is set to infinity (which
means RLIM_INFINITY as described in the setrlimit(2) manual page). The
value type for the CPU-time limits s_cpu and h_cpu is time. The value
type for the other limits is memory. Note: Not all systems support all
of these limits; see the setrlimit(2) manual page on your operating
system for details.
Note also: s_vmem and h_vmem (virtual memory) are only available on
systems supporting RLIMIT_VMEM (see setrlimit(2) on your operating
system).
The UNICOS operating system supplied by SGI/Cray does not support the
setrlimit(2) system call, using their own resource limit-setting system
call instead. For UNICOS systems only, the following meanings apply:
s_cpu The per-process CPU time limit in seconds.
s_core The per-process maximum core file size in bytes.
s_data The per-process maximum memory limit in bytes.
s_vmem The same as s_data (if both are set the minimum is used).
h_cpu The per-job CPU time limit in seconds.
h_data The per-job maximum memory limit in bytes.
h_vmem The same as h_data (if both are set the minimum is used).