Univa Grid Engine 8.1.0 Features (Part 2) - Better Resource Management with the RSMAP Complex (2012-05-25)

Grid Engine handles several resource types: memory, countable resources (integer), named resources (strings/regex), resources with floating-point values (double), and boolean values. When you have a countable resource on a specific host, for example a number of GPU cards, Grid Engine can track how many of those cards are in use, so that the host is not overloaded with jobs. But the jobs do not know which specific GPU instance they are supposed to access. In older Grid Engine installations this is often solved by writing external scripts that manage access to the specific resources. With the introduction of the RSMAP complex, this can now be handled by Grid Engine itself.

An RSMAP (resource map) is a list of strings that is managed by the Grid Engine scheduler. Because strings can represent arbitrary resource IDs, this new complex type is very flexible. In the GPU example the list could be something like "GPU1 GPU2". Such a list is initialized in the "complex_values" field of the execution host configuration, just like integer and other resource types. Hence it is a per-host resource.

The scheduler treats each item of the list as an individual resource, but all of them are requested by the same resource name (the complex name) at job submission time. You can of course also request the number of items your job needs. When a job requests 1 of such a resource, the job gets one of the "GPU1" or "GPU2" values attached. You can see the selected instance in the qstat output. The job itself can find out which resource was selected by reading a special environment variable.

If you want shared access to your resources, let's say 2 jobs are allowed to access each of the GPUs, you can easily configure that by creating a string list in which the entries appear multiple times: "GPU1 GPU2 GPU1 GPU2". The first job then gets the first GPU, the second job the second GPU, and the third job the first GPU again (in the first scheduler run). Items are handed out in the order of the currently unused items in the list.

The same concept can be used for accessing licenses when a special ID is needed to access one. Since numeric IDs are very common, you can create an ID list very easily with the notation "<from>-<to>", which creates strings from a number range. So "1-10" implicitly creates the list "1 2 3 4 5 6 7 8 9 10".
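The selection order and the range notation described above can be illustrated with a small Python sketch. This is a toy model for illustration only, not Grid Engine's scheduler code: the names `expand_ids`, `ResourceMap`, `grant`, and `release` are hypothetical, and the model assumes that a finished job simply returns its items to their original positions in the list.

```python
from typing import List, Optional


def expand_ids(spec: str) -> List[str]:
    """Expand a whitespace-separated ID list; '<from>-<to>' becomes a number range."""
    ids: List[str] = []
    for token in spec.split():
        start, sep, end = token.partition("-")
        if sep and start.isdigit() and end.isdigit():
            ids.extend(str(n) for n in range(int(start), int(end) + 1))
        else:
            ids.append(token)
    return ids


class ResourceMap:
    """Toy model of one RSMAP list on one host (hypothetical, not the real scheduler)."""

    def __init__(self, spec: str) -> None:
        self.items = expand_ids(spec)
        # owner[i] is the job holding items[i], or None if the item is unused.
        self.owner: List[Optional[str]] = [None] * len(self.items)

    def grant(self, job: str, amount: int = 1) -> Optional[List[str]]:
        """Give the job the first `amount` unused items, in list order."""
        unused = [i for i, holder in enumerate(self.owner) if holder is None]
        if amount > len(unused):
            return None  # not enough unused items; the job cannot run here
        chosen = unused[:amount]
        for i in chosen:
            self.owner[i] = job
        return [self.items[i] for i in chosen]

    def release(self, job: str) -> None:
        """Mark all items held by the job as unused again."""
        self.owner = [None if holder == job else holder for holder in self.owner]


if __name__ == "__main__":
    gpus = ResourceMap("GPU1 GPU2 GPU1 GPU2")  # each GPU shared by up to 2 jobs
    for job in ("job1", "job2", "job3"):
        print(job, gpus.grant(job))
    print(expand_ids("1-10"))
```

In this model, listing an ID twice is all it takes to allow two jobs on the same device; no separate sharing counter is needed.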