Mastering GPUs with Open Cluster Scheduler (2024-07-1)

Mastering GPUs with Open Cluster Scheduler's RSMAP

Unlock the full potential of your GPU resources with Open Cluster Scheduler's latest feature, Resource Map (RSMAP). This powerful and flexible resource type assigns specific resource instances, such as individual GPUs, to jobs so that they are used efficiently and without conflicts.

Why Choose RSMAP?

  • Collision-Free Use: Ensures exclusive access to resources, preventing conflicts between jobs.
  • Detailed Monitoring & Accounting: Facilitates precise tracking and reporting of actual resource usage.
  • Versatile Resource Management:
    • Host-Level Resources: Manage local resources such as port numbers, GPUs, NUMA resources, network devices, and storage devices.
    • Global Resources: Manage network-wide resources like IP addresses, DNS names, license servers, and port numbers.

Example: Efficient GPU Management

  1. Define Your GPU Resource: Open the resource complex configuration with qconf -mc and add the following line:

    GPU gpu RSMAP <= YES YES NONE 0
    

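    This defines a resource named GPU (shortcut gpu) of type RSMAP. The remaining columns set the relational operator (<=), mark the resource as requestable (YES) and consumable (YES), and give it no default request (NONE) and an urgency of 0.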

  2. Initialize Resource on Hosts: Assign values to the GPU resources on a specific host by modifying the host configuration with qconf -me <hostname>. For a host with 4 GPUs:

    complex_values GPU=4(0 1 2 3)
    

    This indicates the host has 4 GPU instances with IDs 0, 1, 2, and 3.

  3. Submit Your Job: Create a job script, job.sh, that prints the GPU IDs granted to the job:

    #!/bin/bash
    env | grep SGE_HGR
    

    Submit the job and request two GPUs with the command:

    qsub -l GPU=2 ./job.sh
    

    The scheduler allocates the requested GPUs to the job and exposes the granted IDs, separated by spaces, in the SGE_HGR_GPU environment variable, so you can confirm them in the job's output. To use them with NVIDIA jobs, convert the IDs to a comma-separated list:

    export CUDA_VISIBLE_DEVICES=$(echo $SGE_HGR_GPU | tr ' ' ',')
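
The steps above can be sketched as two small Bash helpers. This is a minimal sketch, not part of Open Cluster Scheduler: the function names rsmap_complex_value and hgr_to_cuda are hypothetical, and it assumes that GPU IDs on a host are numbered from 0 and that SGE_HGR_GPU holds the granted IDs separated by spaces, as in the example above.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the names are not part of
# Open Cluster Scheduler.

# Build the complex_values entry for a host with N GPUs, assuming the
# GPU IDs are simply 0..N-1.
rsmap_complex_value() {
    local count=$1
    local ids=""
    local i
    for ((i = 0; i < count; i++)); do
        # Prefix a space before every ID except the first.
        ids+="${ids:+ }$i"
    done
    printf 'GPU=%s(%s)\n' "$count" "$ids"
}

# Convert space-separated granted IDs (the SGE_HGR_GPU format) into the
# comma-separated form used by CUDA_VISIBLE_DEVICES.
hgr_to_cuda() {
    local granted=$1
    local -a ids
    # Split on whitespace into an array.
    read -r -a ids <<< "$granted"
    # Join the array elements with commas.
    local IFS=,
    printf '%s\n' "${ids[*]}"
}

# Inside a job script: only set CUDA_VISIBLE_DEVICES when the scheduler
# actually granted GPU IDs.
if [ -n "${SGE_HGR_GPU:-}" ]; then
    export CUDA_VISIBLE_DEVICES="$(hgr_to_cuda "$SGE_HGR_GPU")"
fi
```

Placing these helpers at the top of a job script keeps the ID conversion in one place. Outside a scheduled job, where SGE_HGR_GPU is unset, CUDA_VISIBLE_DEVICES is left untouched.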
    

This approach to resource management improves both performance and resource tracking, making it a must-have for efficient computing. In addition, HPC Gridware is set to release a new GPU package featuring streamlined configuration, improved GPU accounting, and automated environment variable management, taking the hassle out of GPU cluster management.

For more detailed information, check out the full article here. It's your go-to guide for mastering GPU management with the Open Cluster Scheduler!