The Singularity is :%s/near/here/g - Using wfl and Go drmaa2 with Singularity (2018-12-04)

The Singularity containerization software has been rewritten in Go. That’s great news and it immediately caught my interest :-). For a short introduction to Singularity there are many sources, including Wikipedia and the official documentation.

A typical problem when using Docker in a broader context, for example when integrating it into workload managers, is that it runs as a daemon. That means requests to start a Docker container end up as processes that are children of the daemon and not of the original process that initiated the container creation request. Workload managers are built around managing and supervising child processes. They have reliable mechanisms for supervising processes that detach themselves from their parent processes, but these mechanisms don’t work very well with the Docker daemon. That’s why most serious workload managers do container orchestration using a different mechanism (like using runc or rkt directly, or even handling cgroups and namespaces themselves). Supervision includes resource usage measurements and limits, which sometimes exceed Docker’s native capabilities. Singularity brings back the behavior workload managers expect. There are more reasons why people use Singularity, which I don’t cover in this article.
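To illustrate why direct child processes are convenient for a supervisor, here is a minimal sketch using only the Go standard library. The helper name supervise is hypothetical and not part of drmaa2os or wfl. It only shows that a parent can read the exit code and the consumed CPU time of a process it started itself, information that is not available in the same way when a daemon owns the process.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// supervise (hypothetical helper) runs a command as a direct child process,
// waits for it, and returns its exit code and consumed CPU time.
func supervise(name string, args ...string) (int, time.Duration, error) {
	cmd := exec.Command(name, args...)
	err := cmd.Run()

	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		// The process could not be started or waited for at all.
		return -1, 0, err
	}

	// ProcessState is filled in because we are the parent that reaped the child.
	state := cmd.ProcessState
	return state.ExitCode(), state.UserTime() + state.SystemTime(), nil
}

func main() {
	code, cpu, err := supervise("sh", "-c", "exit 3")
	if err != nil {
		panic(err)
	}
	fmt.Printf("exit code: %d, cpu time: %s\n", code, cpu)
}
```

A non-zero exit code is treated as a normal result here, not as an error, since a supervisor usually wants to record it and move on.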

This article is about using Singularity in Go applications with wfl and drmaa2. Since I wanted to give Singularity a try myself, I started adding support for it to the drmaa2 Go binding by implementing a job tracker. It was a very short journey to get the first container running, as most of the necessary functionality was already implemented.

If you are a Go programmer and write Singularity workflows for testing, performance measurements, or running simulations, I would be more than happy to get feedback about future improvements.

DRMAA2 is a standard for job management which includes basic primitives like start, stop, pause, and resume. That’s enough to build workflow management applications or even simpler abstractions like wfl around it.

In order to make use of the implemented DRMAA2 calls for Singularity in your Go application, you need to import the drmaa2os package along with the drmaa2interface dependency:


    import (
        "github.com/dgruber/drmaa2interface"
        "github.com/dgruber/drmaa2os"
    )

The next step is to create a DRMAA2 job session. As the underlying implementation needs to make the session names and job IDs persistent, you need to provide a path to a file which is used for that purpose.


    sm, err := drmaa2os.NewSingularitySessionManager(filepath.Join(os.TempDir(), "jobs.db"))
    if err != nil {
        panic(err)
    }

Note that when Singularity is not installed the call fails.

Creating a job session works as defined in the standard. Since creation fails when a session with that name already exists (also as defined in the standard), we fall back to opening it:


    js, err := sm.CreateJobSession("jobsession", "")
    if err != nil {
        js, err = sm.OpenJobSession("jobsession")
        if err != nil {
            panic(err)
        }
    }

The JobTemplate is the key for specifying the details of the job and the container.


    jt := drmaa2interface.JobTemplate{
        RemoteCommand: "/bin/sleep",
        Args:          []string{"600"},
        JobCategory:   "shub://GodloveD/lolcow",
        OutputPath:    "/dev/stdout",
        ErrorPath:     "/dev/stderr",
    }

The RemoteCommand is the application executed within the container and therefore must be accessible inside the container. Args is the list of arguments passed to that command. The JobCategory is mandatory and specifies the image to be used. Here we are downloading the image from Singularity Hub (shub). In order to see some output in the shell we need to set the OutputPath to /dev/stdout. We should do the same for the ErrorPath, otherwise we will miss error messages printed by applications running within the container.

In order to give Singularity more parameters (which are of course not part of the standard) we can use the extensions:


    jt.ExtensionList = map[string]string{
        "debug": "true",
        "pid":   "true",
    }

Please check out the drmaa2os Singularity job tracker sources for a complete list of evaluated parameters (some might be missing). debug is a global parameter (placed before exec) and pid is an exec-specific parameter. Since they are boolean parameters, it does not matter whether they are set to true or to an empty string. They are ignored only when set to false or FALSE.

This JobTemplate will result in a process managed by the Go drmaa2 implementation, started with:

    singularity --debug exec --pid shub://GodloveD/lolcow /bin/sleep 600
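The mapping from a JobTemplate to such a command line can be sketched roughly as follows. This is not the drmaa2os source: the template type, buildArgs, commandLine, and the globalFlags table are hypothetical stand-ins, and treating every non-global extension as a boolean exec flag is an assumption made for brevity. Only the placement of debug and pid and the false/FALSE rule are taken from the description above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// template mirrors only the JobTemplate fields used in this article.
type template struct {
	RemoteCommand string
	Args          []string
	JobCategory   string
	ExtensionList map[string]string
}

// globalFlags lists extensions placed before "exec". Only "debug" is
// confirmed by the article; a real table would contain more entries.
var globalFlags = map[string]bool{"debug": true}

// buildArgs (hypothetical) turns a template into a singularity argument list.
func buildArgs(jt template) []string {
	keys := make([]string, 0, len(jt.ExtensionList))
	for k := range jt.ExtensionList {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output

	var global, local []string
	for _, k := range keys {
		v := jt.ExtensionList[k]
		if v == "false" || v == "FALSE" {
			continue // explicitly disabled boolean flag
		}
		if globalFlags[k] {
			global = append(global, "--"+k)
		} else {
			local = append(local, "--"+k) // assumption: everything else is an exec flag
		}
	}

	args := append(global, "exec")
	args = append(args, local...)
	args = append(args, jt.JobCategory, jt.RemoteCommand)
	return append(args, jt.Args...)
}

// commandLine renders the full command as a single string.
func commandLine(jt template) string {
	return "singularity " + strings.Join(buildArgs(jt), " ")
}

func main() {
	jt := template{
		RemoteCommand: "/bin/sleep",
		Args:          []string{"600"},
		JobCategory:   "shub://GodloveD/lolcow",
		ExtensionList: map[string]string{"debug": "true", "pid": ""},
	}
	// prints "singularity --debug exec --pid shub://GodloveD/lolcow /bin/sleep 600"
	fmt.Println(commandLine(jt))
}
```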

RunJob() finally creates the process and returns immediately.


    job, err := js.RunJob(jt)
    if err != nil {
        panic(err)
    }

The returned job can be used to perform actions on the container. It can be suspended:


    err = job.Suspend()

and resumed


    err = job.Resume()

or killed


    err = job.Terminate()

The implementation sends the necessary signals to the process group.
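For readers unfamiliar with process groups, the idea can be sketched with the standard library alone (Unix only). The helper names are hypothetical, and the concrete signals (SIGSTOP, SIGCONT, SIGKILL) are my assumption of a conventional choice; the job tracker sources are the authority on what is actually sent.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startInGroup starts a command in its own process group, so that a signal
// sent to the group also reaches any processes the command forks.
func startInGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// signalGroup sends sig to the whole process group. With Setpgid the group
// ID equals the PID of the started process; a negative PID addresses a group.
func signalGroup(cmd *exec.Cmd, sig syscall.Signal) error {
	return syscall.Kill(-cmd.Process.Pid, sig)
}

// Assumed signal choices; the real job tracker may use different ones.
func suspend(cmd *exec.Cmd) error   { return signalGroup(cmd, syscall.SIGSTOP) }
func resume(cmd *exec.Cmd) error    { return signalGroup(cmd, syscall.SIGCONT) }
func terminate(cmd *exec.Cmd) error { return signalGroup(cmd, syscall.SIGKILL) }

func main() {
	cmd, err := startInGroup("sleep", "600")
	if err != nil {
		panic(err)
	}
	for _, step := range []func(*exec.Cmd) error{suspend, resume, terminate} {
		if err := step(cmd); err != nil {
			panic(err)
		}
	}
	// Wait reaps the child; it reports an error because the process was killed.
	fmt.Println("wait returned:", cmd.Wait())
}
```

Signalling the group instead of a single PID matters for containers, because the process started first is often only a launcher for the actual workload.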

You can wait until it is terminated (with a timeout or drmaa2interface.InfiniteTime).


    err = job.WaitTerminated(time.Second * 240)
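A wait with a timeout on a plain OS process can be sketched like this. It is a standard-library illustration of the pattern, not the drmaa2os implementation; start and waitTerminated are hypothetical helpers. The important detail is that Wait is called exactly once in a goroutine, so the caller can time out and later wait again on the same channel.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// start launches a command and returns a channel that receives the result
// of Wait exactly once, when the process has ended.
func start(name string, args ...string) (<-chan error, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- cmd.Wait() }()
	return done, nil
}

// waitTerminated blocks until the process has ended or the timeout expired.
// It reports whether the process ended and, if so, the error from Wait.
func waitTerminated(done <-chan error, timeout time.Duration) (bool, error) {
	select {
	case err := <-done:
		return true, err
	case <-time.After(timeout):
		return false, nil
	}
}

func main() {
	done, err := start("sleep", "1")
	if err != nil {
		panic(err)
	}

	// First attempt uses a timeout shorter than the runtime of the process.
	finished, _ := waitTerminated(done, 100*time.Millisecond)
	fmt.Println("finished within 100ms:", finished)

	// Second attempt gives the process enough time to end.
	finished, err = waitTerminated(done, 10*time.Second)
	fmt.Println("finished within 10s:", finished, "error:", err)
}
```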

Don’t forget to close the JobSession at the end and destroy it properly.

    
    js.Close()
    sm.DestroyJobSession("jobsession")

While that is easy, it still requires lots of code when writing real applications. That’s what we have wfl for. wfl now also includes Singularity support, as it is based on the drmaa2 interfaces.

Since last Sunday wfl can make use of default JobTemplates, which allow you to set a container image and parameters like OutputPath and ErrorPath once for every subsequent job run. This further reduces the lines of code to be written by avoiding RunT().

For starting a Singularity container you need a workflow based on the new SingularityContext which accepts a DefaultTemplate.


    var template = drmaa2interface.JobTemplate{
        JobCategory: "shub://GodloveD/lolcow",
        OutputPath:  "/dev/stdout",
        ErrorPath:   "/dev/stderr",
    }

    flow := wfl.NewWorkflow(wfl.NewSingularityContextByCfg(wfl.SingularityConfig{DefaultTemplate: template}))

Now flow lets you manage your Singularity containers. To run 100 containers executing the sleep binary in parallel and re-execute any failed containers up to 3 times, you can do the following:


    job := flow.Run("/bin/sleep", "15").
        Resubmit(99).
        Synchronize().
        RetryAnyFailed(3)
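What this chain expresses (start many tasks, wait for all of them, rerun the failed ones a limited number of times) can be sketched with goroutines. This is not how wfl is implemented, since wfl submits real jobs through the DRMAA2 interfaces; runAll, runWithRetries, and flakyTask are hypothetical names, and the tasks are simulated functions, not containers.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll starts task(id) for every id concurrently, waits for all of them
// (the "synchronize" step), and returns the ids whose task failed.
func runAll(ids []int, task func(id int) error) []int {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed []int
	)
	for _, id := range ids {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := task(id); err != nil {
				mu.Lock()
				failed = append(failed, id)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return failed
}

// runWithRetries runs n tasks in parallel and reruns the failed ones for up
// to maxRetries additional rounds. It returns the ids that still failed.
func runWithRetries(n, maxRetries int, task func(id int) error) []int {
	ids := make([]int, n)
	for i := range ids {
		ids[i] = i
	}
	failed := runAll(ids, task)
	for round := 0; round < maxRetries && len(failed) > 0; round++ {
		failed = runAll(failed, task)
	}
	return failed
}

// flakyTask returns a simulated task in which every id divisible by 10
// fails on its first `failures` attempts and succeeds afterwards.
func flakyTask(failures int) func(id int) error {
	var mu sync.Mutex
	attempts := map[int]int{}
	return func(id int) error {
		mu.Lock()
		defer mu.Unlock()
		attempts[id]++
		if id%10 == 0 && attempts[id] <= failures {
			return fmt.Errorf("task %d failed on attempt %d", id, attempts[id])
		}
		return nil
	}
}

func main() {
	failed := runWithRetries(100, 3, flakyTask(1))
	fmt.Println("still failed:", len(failed)) // prints "still failed: 0"
}
```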

For more capabilities of wfl, please also check out the other examples at https://github.com/dgruber/wfl.

Of course you can mix backends in your applications, i.e. you can manage OS processes, k8s jobs, Docker containers, and Singularity containers within the same application.

Any feedback welcome.