sched_conf  defines  the  configuration  file  format  for  Univa  Grid
       Engine's  scheduler.  In order to modify  the  configuration,  use  the
       graphical user interface qmon(1) or the  -msconf  option  of  the
       qconf(1) command. A default configuration is provided together with the
       Univa Grid Engine distribution package.

       Note, Univa Grid Engine allows backslashes (\) to be used to  escape
       newline (\newline) characters. The backslash and the newline are replaced
       with a space (" ") character before any interpretation.

       The following parameters are recognized by the Univa Grid Engine sched-
       uler if present in sched_conf:

   algorithm
       Note: Deprecated, may be removed in a future release.
       Allows for the selection of alternative scheduling algorithms.

       Currently default is the only allowed setting.

   load_formula
       A simple algebraic expression used to derive  a  single  weighted  load
       value  from all or part of the load parameters reported by sge_execd(8)
       for each host and from all or part of  the  consumable  resources  (see
       complex(5)) being maintained for each host. The load formula expression
       syntax is that of a summation of weighted load values, that is:

              load_val1[*w1][{+|-}load_val2[*w2][{+|-}...]]

       Note, no blanks are allowed in the load formula.
       The load values and consumable resources (load_val1, ...)   are  speci-
       fied by the name defined in the complex (see complex(5)).
       Note:  Administrator defined load values (see the load_sensor parameter
       in sge_conf(5) for details) and consumable resources available for  all
       hosts (see complex(5)) may be used as well as Univa Grid Engine default
       load parameters.
       The weighting factors  (w1,  ...)  are  positive  integers.  After  the
       expression  is  evaluated for each host the results are assigned to the
       hosts and are used to sort the  hosts  corresponding  to  the  weighted
       load. The sorted host list is used to sort queues subsequently.
       The default load formula is "np_load_avg".
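
       For example, to rank hosts primarily by CPU load with memory usage
       as a secondary factor, an entry like the following could be used
       (the attribute names are only an illustration; any load value or
       consumable defined in the complex may be referenced):

              load_formula np_load_avg*10+mem_used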

   job_load_adjustments
       The load imposed by the Univa Grid Engine jobs running on a  system
       varies in time and often, e.g. for the CPU load,  requires  some
       amount  of time to be reported in the appropriate quantity by the oper-
       ating system. Consequently, if a job was  started  very  recently,  the
       reported  load  may not provide a sufficient representation of the load
       which is already imposed on that host by the  job.  The  reported  load
       will adapt to the real load over time, but the period of time, in which
       the reported load is too low, may already lead to an  oversubscrip-
       tion of that host. To compensate, the scheduler raises the load  of
       a host by the values in the job_load_adjustments list for every job
       dispatched to it and uses the corrected values to compute the  com-
       bined and weighted load of the hosts with the load_formula (see above)
       and to compare the load and consumable values against the load  thresh-
       old  lists defined in the queue configurations (see queue_conf(5)).  If
       the load_formula consists simply of the default CPU load average param-
       eter np_load_avg, and if the jobs are very compute intensive, one might
       want to set the job_load_adjustments list  to  np_load_avg=1.00,  which
       means  that  every  new job dispatched to a host will require 100 % CPU
       time, and thus the machine's load is instantly increased by 1.00.

   load_adjustment_decay_time
       The load corrections  in  the  "job_load_adjustments"  list  above  are
       decayed  linearly  over time from the point of the job start, where the
       corresponding load or consumable parameter is raised by the  full  cor-
       rection   value,   until   after   a   time   period  of  "load_adjust-
       ment_decay_time", where the correction becomes  0.  Proper  values  for
       "load_adjustment_decay_time" greatly depend upon the load or consumable
       parameters used and the specific operating system(s).  Therefore,  they
       can  only  be  determined  on-site and experimentally.  For the default
       np_load_avg load parameter a "load_adjustment_decay_time" of 7  minutes
       has proven to yield reasonable results.
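
       As an illustration (the values are examples, not defaults),  both
       parameters could be combined for compute intensive jobs as follows:

              job_load_adjustments        np_load_avg=1.00
              load_adjustment_decay_time  0:7:0

       Each newly dispatched job then raises the host's  np_load_avg  by
       1.00, and this correction decays linearly to zero over seven  min-
       utes.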

   maxujobs
       The  maximum  number  of jobs any user may have running in a Univa Grid
       Engine cluster at the same time. If set to 0 (default)  the  users  may
       run an arbitrary number of jobs.

   schedule_interval
       At the time the scheduler thread initially registers at  the  event
       master thread in sge_qmaster(8), schedule_interval is used  to  set
       the  time  interval  in  which the event master thread sends scheduling
       event updates to the scheduler thread.  A scheduling event is a  status
       change  that  has  occurred  within sge_qmaster(8) which may trigger or
       affect scheduler decisions (e.g. a job has finished and thus the  allo-
       cated resources are available again).
       In  the Univa Grid Engine default scheduler the arrival of a scheduling
       event report triggers a scheduler run. The scheduler  waits  for  event
       reports otherwise.
       Schedule_interval  is  a time value (see queue_conf(5) for a definition
       of the syntax of time values).
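
       For example, to have scheduling event updates sent every 15 seconds
       (an illustrative value, not necessarily the default):

              schedule_interval 0:0:15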

   queue_sort_method
       This parameter determines in which order  several  criteria  are  taken
       into account to produce a sorted queue list.  Currently,  two  set-
       tings are valid: seqno and load. However, in both cases, Univa Grid
       Engine attempts to maximize the number of soft requests (see  qsub(1)
       -soft option) being fulfilled by the queues for a particular job as
       the primary criterion.
       Then, if the queue_sort_method parameter is set to seqno, Univa Grid
       Engine will use the seq_no parameter as configured in the current queue
       configurations  (see  queue_conf(5))  as the next criterion to sort the
       queue list. The load_formula (see above) only has a meaning if  two
       queues have equal sequence numbers. If  queue_sort_method  is  set
       to load, the queues are sorted according to the load_formula,  and
       the sequence number is consulted only for queues with equal load.

   halftime
       When executing under a share based policy, the  scheduler  decays
       accumulated usage over time. The halftime defines the time  period
       in which accumulated usage is decayed to half its original  value.
       Valid values are specified in the time format as specified in
       queue_conf(5).
       If the value is set to 0, the usage is not decayed.
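
       As an illustration, assuming the value is interpreted as hours,  a
       half-life of one week for accumulated usage could be configured as:

              halftime 168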

   usage_weight_list
       Univa  Grid  Engine  accounts for the consumption of the resources CPU-
       time, memory and IO to determine the usage which is imposed on a system
       by  a  job.  A  single  usage  value is computed from these three input
       parameters by multiplying the individual values by weights  and  adding
       them  up.  The weights are defined in the usage_weight_list. The format
       of the list is

              cpu=wcpu,mem=wmem,io=wio

       where wcpu, wmem and wio are the configurable weights. The  weights
       are real numbers. The sum of all three weights should be 1.
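
       For example, to weight CPU consumption most heavily  (illustrative
       values whose sum is 1):

              usage_weight_list cpu=0.700000,mem=0.200000,io=0.100000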

   compensation_factor
       Determines  how fast Univa Grid Engine should compensate for past usage
       below or above the share entitlement defined in the share tree.
       Recommended values are between 2 and 10, where 10 means faster com-
       pensation.

   weight_user
       The relative importance of the user shares in  the  functional  policy.
       Values are of type real.

   weight_project
       The relative importance of the project shares in the functional policy.
       Values are of type real.

   weight_department
       The relative importance of the department shares in the functional pol-
       icy. Values are of type real.

   weight_job
       The  relative  importance  of  the job shares in the functional policy.
       Values are of type real.

   weight_tickets_functional
       The maximum number of functional tickets available for distribution  by
       Univa Grid Engine. Determines the relative importance of the functional
       policy.  See under sge_priority(5) for an overview on job priorities.

   weight_tickets_share
       The maximum number of share based tickets available for distribution by
       Univa Grid Engine. Determines the relative importance of the share tree
       policy. See under sge_priority(5) for an overview on job priorities.
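
       As an illustration (the ticket amounts are examples, not defaults),
       a site that wants the share tree policy to outweigh the  functional
       policy could configure:

              weight_tickets_functional  10000
              weight_tickets_share       90000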

   weight_deadline
       The weight applied on the remaining time until  a  job's  latest  start
       time.  Determines  the  relative  importance of the deadline. See under
       sge_priority(5) for an overview on job priorities.

   weight_ticket
       The weight applied on the normalized ticket amount when determining
       the priority finally used. Determines the relative importance of the ticket
       policies. See under sge_priority(5) for an overview on job  priorities.

   flush_finish_sec
       This parameter is provided for tuning  the  system's  scheduling
       behavior. By default, a scheduler run is  triggered  at  each  sched-
       ule_interval. When this parameter is set to 1 or larger, the  sched-
       uler will be
       triggered x seconds after a job has finished. Setting this parameter to
       0 disables the flush after a job has finished.

   flush_submit_sec
       This parameter is provided for tuning  the  system's  scheduling
       behavior. By default, a scheduler run is  triggered  at  each  sched-
       ule_interval. When this parameter is set to 1 or larger, the  sched-
       uler will be
       triggered  x seconds after a job was submitted to the  system.  Setting
       this parameter to 0 disables the flush after a job was submitted.
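
       For example, to trigger a scheduling run one second after every job
       submission and one second after every job finish (illustrative val-
       ues):

              flush_submit_sec  1
              flush_finish_sec  1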

   schedd_job_info
       The default scheduler can keep track of why jobs could not be sched-
       uled during the last scheduler run. This parameter enables or  dis-
       ables the observation. The value true enables the monitoring; false
       turns it off.

       It is also possible to activate the observation only for certain  jobs.
       This  will  be  done  if the parameter is set to job_list followed by a
       comma separated list of job ids.
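
       For example (the job ids are illustrative):

              schedd_job_info job_list 3127,3128,3129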

       The user can obtain the collected information with the command qstat -j.

   params
       This parameter is intended for passing additional parameters to the
       Univa Grid Engine scheduler. The following values are recognized:

       DURATION_OFFSET
              If set, overrides the default value of 60 seconds. This parame-
              ter  is  used  by  the Univa Grid Engine scheduler when planning
              resource utilization as the delta between net job  runtimes  and
              total  time until resources become available again. Net job run-
              time  as  specified  with  -l  h_rt=...   or  -l   s_rt=...   or
              default_duration  always  differs  from total job runtime due to
              delays before and after actual job start and finish.  Among  the
              delays  before  job  start is the time until the end of a sched-
              ule_interval, the time it takes to deliver a job to sge_execd(8)
              and the delays caused  by  prolog  in  queue_conf(5),
              start_proc_args in sge_pe(5) and  starter_method  in
              queue_conf(5). Among the delays after job finish  are  the
              notification and termination phase (notify,  terminate_method
              or checkpointing), procedures run after actual  job  finish,
              such as stop_proc_args in sge_pe(5)  or  epilog  in
              queue_conf(5), and the delay until a new scheduling  run  is
              triggered.

       JC_FILTER
              If set, the scheduler limits the number of  jobs  per  job
              category that are considered during  a  scheduling  run.  An
              exception are jobs which request a resource reservation; they
              are included regardless of the number of jobs in a category.

              This setting is turned off by default because, in very  rare
              cases, the scheduler can make a  wrong  decision.  It  is  also
              advised to turn report_pjob_tickets off;  otherwise  qstat  -ext
              can report outdated ticket amounts. The information  shown  by
              qstat -j for a job that was excluded in a  scheduling  run  is
              very limited.

       PROFILE
              If set equal to 1, the scheduler logs profiling information sum-
              marizing each scheduling run.

       MONITOR
              If set equal to 1, the scheduler records  information  for  each
              scheduling run that allows job resource utilization to be repro-
              duced, in the file <sge_root>/<cell>/common/schedule.

       PE_RANGE_ALG
              This parameter sets the algorithm for the pe range  computation.
              The  default  is  automatic, which means that the scheduler will
              select the best one, and it should not be necessary to change it
              to  a different setting in normal operation. If a custom setting
              is needed, the following values are available:
              auto       : the scheduler selects the best algorithm
              least      : starts the resource matching with the lowest
                           slot amount first
              bin        : starts the resource matching in the middle of
                           the pe slot range
              highest    : starts the resource matching with the highest
                           slot amount first
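
              For example, to enable the profiling summaries provided  by
              PROFILE:

                     params PROFILE=1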

       Changing params will take immediate effect. The default for  params
       is none.

   reprioritize_interval
       Interval (HH:MM:SS) to reprioritize jobs on the execution  hosts  based
       on  the  current ticket amount for the running jobs. If the interval is
       set to 00:00:00 the reprioritization is turned off. The  default  value
       is 00:00:00.  The reprioritization tickets are calculated by the sched-
       uler and update events for running jobs are only sent after the  sched-
       uler has calculated new values. How often the scheduler should calculate the
       tickets is defined by the reprioritize_interval.  Because the scheduler
       is  only  triggered  in  a  specific interval (scheduler_interval) this
       means the reprioritize_interval has only a meaning if set greater  than
       the  scheduler_interval.   For  example, if the scheduler_interval is 2
       minutes and reprioritize_interval is set to 10 seconds, this means  the
       jobs get re-prioritized every 2 minutes.

   halflife_decay_list
       The halflife_decay_list is turned off by default, and the  halftime
       is used instead.
       The  halflife_decay_list  also  allows one to configure different decay
       rates for each usage type being tracked (cpu, io, and mem). The list is
       specified in the following format:

              <USAGE_TYPE>=<TIME>[:<USAGE_TYPE>=<TIME>[:<USAGE_TYPE>=<TIME>]]

       <USAGE_TYPE> can be one of the following: cpu, io, or mem.
       <TIME>  can  be  -1, 0 or a timespan specified in minutes. If <TIME> is
       -1, only the usage of currently running jobs is used. 0 means that  the
       usage is not decayed.
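
       For example (illustrative values), to decay cpu usage with a  half-
       life of 60 minutes, leave mem usage undecayed, and consider only the
       io usage of currently running jobs:

              halflife_decay_list cpu=60:mem=0:io=-1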

   policy_hierarchy
       This  parameter  sets  up  a dependency chain of ticket based policies.
       Each ticket based policy in the dependency chain is influenced  by  the
       previous policies and influences the following policies. A typical sce-
       nario is to assign precedence for the override policy over  the  share-
       based  policy. The override policy determines in such a case how share-
       based tickets are assigned among jobs of  the  same  user  or  project.
       Note  that  all  policies contribute to the ticket amount assigned to a
       particular job regardless of the policy hierarchy definition.  Yet  the
       tickets calculated in each of the policies can be different  depend-
       ing on the order defined by the policy hierarchy.

       The "POLICY_HIERARCHY" parameter can be an up to 3 letter combination of
       the  first letters of the 3 ticket based policies S(hare-based), F(unc-
       tional) and O(verride). So a value "OFS" means that the override policy
       takes  precedence  over the functional policy, which finally influences
       the share-based policy.  Less than 3 letters  mean  that  some  of  the
       policies do not influence other policies and also are not influenced by
       other policies. So a value of "FS" means  that  the  functional  policy
       influences  the  share-based  policy  and that there is no interference
       with the other policies.

       The special value "NONE" switches off policy hierarchies.
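
       For example, to have the override policy  take  precedence  over
       the functional policy, which in turn influences  the  share-based
       policy:

              policy_hierarchy OFS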

   share_override_tickets
       If set to "true" or  "1",  override  tickets  of  any  override  object
       instance  are shared equally among all running jobs associated with the
       object. The pending jobs will get as many  override  tickets  as  they
       would have if they were running. If set to "false" or  "0",  each  job
       gets the full value of the override tickets associated with the object.
       The default value is "true".

   share_functional_shares
       If  set  to  "true"  or "1", functional shares of any functional object
       instance are shared among all the jobs associated with the  object.  If
       set to "false" or "0", each job associated with a functional  object
       gets the full functional shares of that object. The default value is
       "true".

   max_reservation
       The maximum number of reservations scheduled  within  a  schedule
       interval. When a runnable job cannot be started due to a  shortage
       of resources, a reservation can be scheduled instead. For  parallel
       jobs, reservations are also done for the slots resource as specified
       in sge_pe(5). As job runtime the maximum of the
       time specified with -l h_rt=... or -l s_rt=...  is  assumed.  For  jobs
       that  have  neither  of them the default_duration is assumed.  Reserva-
       tions prevent jobs of lower priority as  specified  in  sge_priority(5)
       from  utilizing the reserved resource quota during the time of reserva-
       tion.  Jobs of lower priority are allowed  to  utilize  those  reserved
       resources  only if their prospective job end is before the start of the
       reservation (backfilling).  Reservation is done only for  non-immediate
       jobs  (-now  no) that request reservation (-R y). If max_reservation is
       set to "0" no job reservation is done.

       Note that reservation scheduling can  be  performance-intensive  and
       hence is switched off by default. Since  the  performance  cost  of
       reservation scheduling is known to grow with the number of  pending
       jobs, the use of the -R y option is  recommended  only  for
       those jobs actually queuing for bottleneck  resources.   Together  with
       the max_reservation parameter this technique can be used to narrow down
       performance impacts.

   default_duration
       When job reservation is enabled through the max_reservation  parame-
       ter (see above), the default_duration is assumed as runtime for jobs
       that have neither -l h_rt=... nor -l s_rt=... specified. In contrast
       to a
       h_rt/s_rt time limit the default_duration is not enforced.
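
       As an illustration (the values are examples, not defaults), a  site
       could allow reservations for up to 20 jobs and assume an eight hour
       runtime for jobs without h_rt/s_rt limits:

              max_reservation   20
              default_duration  8:00:00

       A pending job then only receives a reservation when it is submitted
       with qsub -R y.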

FILES
       <sge_root>/<cell>/common/sched_configuration
                  scheduler thread configuration

SEE ALSO
       sge_intro(1),   qalter(1),  qconf(1),  qstat(1),  qsub(1),  complex(5),
       queue_conf(5), sge_execd(8), sge_qmaster(8), Univa Grid Engine  Instal-
       lation and Administration Guide

COPYRIGHT
       See sge_intro(1) for a full statement of rights and permissions.

UGE 8.0.0                $Date: 2009/07/08 14:42:40 $            SCHED_CONF(5)
