Gridware Cluster Scheduler 9.0.7: Enhanced Stability and Performance (2025-07-08)

Release Date: 2025-07-08

We're pleased to announce the release of Gridware Cluster Scheduler 9.0.7, built on Open Cluster Scheduler 9.0.7 (formerly known as "Sun Grid Engine"). This release continues our commitment to delivering reliable, high-performance workload management for HPC environments across diverse computing architectures.

Key Improvements in 9.0.7

Enhanced Stability and Reliability

Version 9.0.7 addresses several important areas to improve system reliability:

Thread Safety Improvements: The accounting and reporting code has been made fully thread-safe, eliminating potential race conditions in high-throughput environments.

Core Binding Fixes: Resolved issues with both striding and explicit core binding strategies that could prevent optimal core allocation even when cores were available.

Error Reporting: Fixed truncation issues in error messages displayed by qstat -j, ensuring administrators receive complete diagnostic information.

Installation Improvements: Addressed installer issues and corrected documentation references to ensure smooth deployment experiences.

Seamless Binary Replacement Upgrades

For existing 9.0.x deployments, upgrading to 9.0.7 remains straightforward with our binary replacement approach:

  1. Stop current services
  2. Replace binaries with 9.0.7 versions
  3. Restart services

No configuration changes or extended maintenance windows are required, making this an ideal upgrade for production environments.

Comprehensive Architecture Support

Open Cluster Scheduler 9.0.7 maintains extensive support for modern computing architectures:

  • x86-64: Full support across major Linux distributions (RHEL, Rocky, Ubuntu, SUSE)
  • ARM64: Comprehensive support including NVIDIA Grace Hopper platforms
  • Specialized Architectures: Support for PowerPC (ppc64le), s390x, and RISC-V platforms
  • Operating Systems: Linux distributions, FreeBSD, Solaris, and macOS (client tools)

This broad compatibility ensures organizations can deploy consistent workload management across heterogeneous computing environments.

Notable Features from the 9.0.x Series

Since many users may be upgrading from earlier versions, it's worth highlighting key capabilities introduced throughout the 9.0.x series:

qtelemetry (Developer Preview)

Integrated metrics exporter for Prometheus and Grafana, providing detailed cluster monitoring including host metrics, job statistics, and qmaster performance data.

Enhanced NVIDIA GPU Support

The qgpu command simplifies GPU resource management with automatic setup, per-job accounting, and support for Grace Hopper architectures.

MPI Integration Templates

Out-of-the-box support for major MPI distributions (Intel MPI, OpenMPI, MPICH, MVAPICH) with ready-to-use parallel environment configurations.

Advanced Resource Management

  • RSMAP (Resource Map) complex type for managing specialized resources like GPU devices
  • Per-host consumable resources
  • Resource and queue requests per scope for parallel jobs

Performance and Scalability

The 9.0.x series represents significant performance improvements over previous versions through:

  • Multi-threaded Architecture: Separate thread pools for different request types
  • Enhanced Data Stores: Multiple data stores reducing internal contention
  • Automatic Session Management: Ensures data consistency while maintaining performance
  • Optimized Scheduling: Improved algorithms for large-scale deployments

Continued 9.0.x Support

We remain committed to supporting the entire 9.0.x series with ongoing maintenance, security updates, and technical support. This provides organizations with confidence in their long-term deployment strategy while allowing flexibility in upgrade timing.

Getting Started

Quick Evaluation

For testing Open Cluster Scheduler 9.0.7 (the most feature rich and modern open source "Sun Grid Engine" successor) on major Linux distributions:

# Review the script before running
curl -s https://raw.githubusercontent.com/hpc-gridware/quickinstall/refs/heads/main/ocs.sh | OCS_VERSION=9.0.7 sh

If you are interested in our commercially supported Gridware Cluster Scheduler, please speak with us.

Production Deployment

Production environments should follow our comprehensive installation guide included with the release, ensuring proper configuration for specific requirements and environments.

Resources

Looking Forward

Version 9.0.7 reflects our ongoing dedication to providing robust, high-performance workload management solutions. Whether you're running traditional HPC simulations, modern AI workloads, or mixed computing environments, Gridware Cluster Scheduler delivers the reliability and performance your critical applications require.

The combination of enhanced stability, seamless upgrade paths, and broad architecture support makes 9.0.7 an excellent foundation for both current and future computing needs.


For technical questions or deployment assistance, please connect with our community through GitHub or contact our support team. We're committed to helping you maximize the value of your HPC infrastructure.