Fun With Grid Engine Topology Masks -or- How to Bind Compute Jobs to a Different Number of Cores Depending on Where the Jobs Run (2013-01-09)

A compute cluster or grid often varies in the installed hardware. Some compute nodes offer processors with strong compute capabilities while others are older and slower. This becomes an issue when parallel compute jobs are submitted to a job/cluster scheduler: when a job runs on slower hardware it should use more compute cores than on faster hardware, but at submission time parameters like the number of cores used for core binding are usually fixed. An obvious solution is to restrict the scheduler so that the job can only run on a cluster partition with similar hardware (in Grid Engine, a queue domain requested with -q queue@@hostgroup). The drawback of this approach is that the cluster is not fully utilized while jobs wait for a fast machine.

An alternative is to exploit the new Grid Engine topology masks introduced in Univa Grid Engine 8.1.3. While the core binding submission parameter (qsub -binding) requests a specific, fixed number of cores for the job, a topology mask binds the core selection to the resource, i.e. to the execution machine the scheduler selects. Hence the Grid Engine scheduler can translate a virtual entity into machine-specific meanings. That's a bit abstract, so an example should demonstrate it:

We assume that we have two types of machines in our cluster: old two-socket machines with quad-core processors and new octo-core machines, also with two sockets. The workload creates lots of threads, or can create as many threads as processors are available (during job execution the granted processors can be read from the $SGE_BINDING environment variable). What we can do now is define a virtual compute entity which expresses a certain amount of compute power. Let's call it quad here. Since the old compute cores are slower than the new ones, you can define that on the old machines a quad is all four cores of a processor, while on a new host a quad is equivalent to two cores (or one core if you like).

All you have to do is define the new compute entity quad in the Grid Engine complex configuration as a per-host consumable of type RSMAP (resource map).

       qconf -mc
        quad  qd  RSMAP  <=  YES  HOST  0  0
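For orientation, the fields of that line follow the usual qconf -mc column order. The consumable column set to HOST is what makes quad a per-host consumable, so it is counted once per host rather than once per granted slot. A commented version (the comment line is mine, not part of the qconf input):

```
#name  shortcut  type   relop  requestable  consumable  default  urgency
quad   qd        RSMAP  <=     YES          HOST        0        0
```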

Now we need to assign the quads to the hosts. This is done in the host configuration. On an old host a quad represents 4 cores, hence 2 quads are available per host.

       qconf -me oldhost
       complex_values quad=2(quad1:SCCCCScccc quad2:SccccSCCCC)

Now we have two quad resources available: quad1, which is bound to the 4 cores of the first socket, and quad2, which is bound to the 4 cores of the second socket. In the topology mask string, each S marks the start of a socket, an uppercase C a core the resource is bound to, and a lowercase c a core that is left untouched.

The same needs to be done on the new hosts, but there a quad is just two CPU cores.

       qconf -me newhost
       complex_values quad=8(quad1:SCCccccccScccccccc quad2:SccCCccccScccccccc \
                             quad3:SccccCCccScccccccc quad4:SccccccCCScccccccc \
                             quad5:SccccccccSCCcccccc quad6:SccccccccSccCCcccc \
                             quad7:SccccccccSccccCCcc quad8:SccccccccSccccccCC)
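Typing these mask strings by hand is error-prone. As a sanity check, the line for the new hosts can also be generated with a small script; a minimal sketch for the layout assumed above (2 sockets with 8 cores each, quad N covering two adjacent cores):

```shell
#!/bin/sh
# Generate RSMAP topology masks for a 2-socket host with 8 cores per socket,
# where quad N covers cores 2(N-1) and 2(N-1)+1, counted across both sockets.
out=""
q=1
while [ $q -le 8 ]; do
  start=$(( (q - 1) * 2 ))
  mask=""
  core=0
  while [ $core -lt 16 ]; do
    # an 'S' opens each socket, 'C' marks a selected core, 'c' an unselected one
    [ $(( core % 8 )) -eq 0 ] && mask="${mask}S"
    if [ $core -eq $start ] || [ $core -eq $(( start + 1 )) ]; then
      mask="${mask}C"
    else
      mask="${mask}c"
    fi
    core=$(( core + 1 ))
  done
  out="$out quad$q:$mask"
  q=$(( q + 1 ))
done
echo "complex_values quad=8(${out# })"
```

The output can be pasted directly into the qconf -me editor session.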

This looks like a little work for the administrator, but afterwards it is very simple to use. The user just requests the number of quads he wants for the job, and the core selection and core binding are done automatically. The scheduler always selects the first free quad in the order of the complex_values definition. The following job submission demonstrates it:

       qsub -l quad=1 -pe mytestpe 4 -b y sleep 123
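On the execution side, a job script can size its thread pool from the granted cores instead of hard-coding a count. A minimal sketch, assuming $SGE_BINDING holds one whitespace-separated entry per granted core (myapp is a placeholder, not a real binary):

```shell
#!/bin/sh
# Derive the thread count from the cores Grid Engine granted to this job.
# SGE_BINDING is set by the execution daemon when core binding is active;
# here we only count its whitespace-separated entries.
SGE_BINDING=${SGE_BINDING:-"0 1"}  # fallback so the sketch runs outside Grid Engine
nthreads=$(echo "$SGE_BINDING" | wc -w)
echo "granted cores: $SGE_BINDING -> starting $nthreads threads"
OMP_NUM_THREADS=$nthreads
export OMP_NUM_THREADS
# exec myapp   # placeholder for the real multithreaded application
```

This way the same script uses four threads on an old host and two on a new one without any per-host changes.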

[Image: Resource Based Core Binding]
In order to force all jobs to use this compute entity quad, a JSV script could be written which adds the request to all jobs (while disallowing any explicit core binding requests).
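Such a JSV could look roughly like the following sketch. It leans on the stock shell JSV library shipped with Grid Engine under $SGE_ROOT/util/resources/jsv; the rejection message and the default of one quad are my choices, not from the original text, and the script only runs inside a configured Grid Engine installation:

```sh
#!/bin/sh
# Sketch of a server-side JSV: reject explicit -binding requests and make
# sure every job carries a quad request.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   # disallow explicit core binding; the quad resource drives binding instead
   if [ "$(jsv_get_param binding_strategy)" != "" ]; then
      jsv_reject "please request quads (-l quad=N) instead of -binding"
      return
   fi
   # add a default quad request if the job does not have one yet
   if [ "$(jsv_sub_get_param l_hard quad)" = "" ]; then
      jsv_sub_add_param l_hard quad 1
   fi
   jsv_correct "job was modified to request quads"
   return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main
```

Configured as a server JSV (jsv_url in the global cluster configuration), this applies to every submitted job.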

When the job now runs on a new host it is bound to two cores, while when Grid Engine schedules it to an old host it runs on four cores. Of course you can also request as many quads as you like. NB: when requesting more than 2 quads per job, the job can't be scheduled to an old host; it will only run on new hosts. The quad is a per-host request, meaning that on each host the job runs on, the requested number of quads is chosen (independent of the requested and granted number of slots on that host).

This feature is unique in Univa Grid Engine and was introduced in UGE 8.1.3.