SYNOPSIS
sge_shepherd
DESCRIPTION
sge_shepherd provides the parent process functionality for a single
Univa Grid Engine job. The parent functionality is necessary on UNIX
systems to retrieve resource usage information (see getrusage(2)) after
a job has finished. In addition, the sge_shepherd forwards signals to
the job, such as the signals for suspension, enabling, termination and
the Univa Grid Engine checkpointing signal (see sge_ckpt(1) for
details).
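The parent-process role described above can be illustrated with a minimal sketch. It is not the sge_shepherd implementation: the function name run_and_measure and the set of forwarded signals are assumptions chosen for illustration. The parent starts one child, relays selected signals to it, waits for it, and then reads the accumulated child resource usage (see getrusage(2)).

```python
import resource
import signal
import subprocess
import sys
import threading

# Assumption: an illustrative set of signals to relay, not the set
# that sge_shepherd itself uses.
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGUSR1, signal.SIGUSR2)


def run_and_measure(argv):
    """Run argv as a child process; return (exit_status, rusage)."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Relay the signal to the child instead of acting on it in the parent.
        child.send_signal(signum)

    previous = {}
    # Python only allows installing signal handlers from the main thread.
    if threading.current_thread() is threading.main_thread():
        previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        status = child.wait()
    finally:
        # Restore whatever handlers were in place before.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
    # RUSAGE_CHILDREN is cumulative over all children that have been waited for.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return status, usage


if __name__ == "__main__":
    status, usage = run_and_measure([sys.executable, "-c", "print('job')"])
    print(status, usage.ru_utime, usage.ru_maxrss)
```

Usage figures are only available to the process that waited for the child, which is why a dedicated parent per job is needed.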
sge_shepherd receives information about the job to be started from
sge_execd(8). During the execution of the job it starts up to five
child processes, one after another. First, a prolog script is run if this feature
is enabled by the prolog parameter in the cluster configuration. (See
sge_conf(5).) Next a parallel environment startup procedure is run if
the job is a parallel job. (See sge_pe(5) for more information.) After
that, the job itself is run, followed by a parallel environment shutdown
procedure for parallel jobs, and finally an epilog script if
requested by the epilog parameter in the cluster configuration. The
prolog and epilog scripts as well as the parallel environment startup
and shutdown procedures are to be provided by the Univa Grid Engine
administrator and are intended for site-specific actions to be taken
before and after execution of the actual user job.
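The ordering of the up to five child processes can be sketched as follows. This is a simplified illustration, not the behaviour of sge_shepherd: the function name run_stages and its parameters are hypothetical, and the sketch runs every configured stage regardless of earlier exit statuses, whereas the real program reacts to failures.

```python
import subprocess
import sys


def run_stages(job_argv, prolog=None, pe_start=None, pe_stop=None, epilog=None):
    """Run the configured stages in order; return a list of (name, exit_status).

    Each stage is an argv list, or None when the stage is not configured
    (for example, pe_start and pe_stop for a non-parallel job).
    """
    stages = [
        ("prolog", prolog),
        ("pe_start", pe_start),
        ("job", job_argv),
        ("pe_stop", pe_stop),
        ("epilog", epilog),
    ]
    results = []
    for name, argv in stages:
        if argv is None:
            continue  # stage not configured, so no child is started
        results.append((name, subprocess.call(argv)))
    return results


if __name__ == "__main__":
    noop = [sys.executable, "-c", "pass"]
    # A serial job with prolog and epilog starts three children.
    print(run_stages(noop, prolog=noop, epilog=noop))
```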
After the job has finished and the epilog script has been processed,
sge_shepherd retrieves resource usage statistics about the job, places
them in a job-specific subdirectory of the sge_execd(8) spool directory
for reporting through sge_execd(8), and then exits.
sge_shepherd also places an exit status file in the spool directory.
This exit status can be viewed with qacct -j JobId (see qacct(1)); it
is not the exit status of sge_shepherd itself but of one of the methods
executed by sge_shepherd. Its meaning depends on the method in which
an error occurred (if any). The possible methods are: prolog, parallel
start, job, parallel stop, epilog, suspend, restart, terminate, clean,
migrate, and checkpoint.
The following exit values are returned:
0      All methods: Operation was executed successfully.
99     Job script, prolog and epilog: When FORBID_RESCHEDULE is not set
       in the configuration (see sge_conf(5)), the job gets re-queued.
       Otherwise see "Other".
100    Job script, prolog and epilog: When FORBID_APPERROR is not set
       in the configuration (see sge_conf(5)), the job gets re-queued.
       Otherwise see "Other".
Other  Job script: This is the exit status of the job itself. No action
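The exit values listed above can be read as a small decision procedure. The sketch below encodes only the rules stated in the list; the function name interpret_exit_status, the method labels, and the returned strings are hypothetical and are not part of Univa Grid Engine. The two boolean parameters stand for whether FORBID_RESCHEDULE and FORBID_APPERROR are set in the configuration.

```python
# Methods for which the values 99 and 100 have a special meaning.
REQUEUE_METHODS = ("job", "prolog", "epilog")


def interpret_exit_status(status, method,
                          forbid_reschedule=False, forbid_apperror=False):
    """Classify an exit status as 'success', 'requeue' or 'other'."""
    if status == 0:
        return "success"  # applies to all methods
    if method in REQUEUE_METHODS:
        if status == 99 and not forbid_reschedule:
            return "requeue"
        if status == 100 and not forbid_apperror:
            return "requeue"
    # Everything else falls under the "Other" entry of the list.
    return "other"


if __name__ == "__main__":
    print(interpret_exit_status(99, "job"))
    print(interpret_exit_status(99, "job", forbid_reschedule=True))
```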
RESTRICTIONS
sge_shepherd should not be invoked manually, but only by sge_execd(8).
FILES
sgepasswd contains a list of user names and their corresponding
encrypted passwords. If available, the password file will be
used by sge_shepherd. To change the contents of this file, use
the sgepasswd command; editing the file manually is not advised.
<execd_spool>/job_dir/<job_id> job-specific directory
SEE ALSO
sge_intro(1), sge_conf(5), sge_execd(8).
COPYRIGHT
See sge_intro(1) for a full statement of rights and permissions.
UGE 8.0.0 $Date: 2007/07/19 09:04:33 $ SGE_SHEPHERD(8)