Julia Lang, Docker & Nextflow

Preamble

This is basically just a write-up of a few things I learned while trying out some basics in Julia. It is by no means an expert summary, so YMMV.

Julia Language

I recently came across the Julia language and have played around with it from time to time. Some basic calculations were relatively easy to do. However, if you are coming from R with a rich visualization framework such as ggplot2, or from Python with Seaborn, you might feel somewhat limited when plotting and visualizing data with Julia. You can always take a hybrid approach and call e.g. R scripts from within Julia, similar to what Python can do with rpy2, which is something I’m going to try soon, too.

Package management in Julia

Julia has its own package manager called Pkg, which installs packages available in the Julia package registry, similar to CRAN for R or PyPI for Python. It handles the downloads for you and is pretty efficient at it: in the tests I did, the required packages were resolved very quickly, and the resulting installation was (by default) placed in my $HOME/.julia/... directory. So far so good!

One thing that slightly worries me (and the same applies to pip installations that grow over time) is what happens once I’ve accumulated a lot of packages in my .julia directory. There are apparently Project.toml files, similar to environment.yml files for conda, that describe per-project environments, but I’ve not managed to make efficient use of these yet. Maybe something for later; this was the first attempt and likely not the last.
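To make the Project.toml idea a bit more concrete, here is a minimal shell sketch of a per-project environment. The directory name my_analysis and the two packages are placeholders picked for illustration; JULIA_PROJECT, Pkg.add and Pkg.instantiate are standard Julia features. The julia calls are skipped if Julia is not on the PATH.

```shell
# Hypothetical project directory; each project gets its own Project.toml
project_dir="$PWD/my_analysis"
mkdir -p "$project_dir"

# JULIA_PROJECT makes every later `julia` call use this environment,
# equivalent to passing --project="$project_dir" each time
export JULIA_PROJECT="$project_dir"

if command -v julia >/dev/null 2>&1; then
    # Pkg.add records the packages in Project.toml and the exact
    # resolved versions in Manifest.toml inside the project directory
    julia -e 'using Pkg; Pkg.add(["CSV", "DataFrames"])'
    # On another machine, reinstall what the manifest records
    julia -e 'using Pkg; Pkg.instantiate()'
else
    echo "julia not found on PATH, skipping package installation" >&2
fi
```

Keeping Project.toml and Manifest.toml next to the script should then play roughly the role that environment.yml plays for conda, although I have not used this setup in anger yet.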

Dockerizing Julia

My first idea was to call my Julia script from within a Nextflow pipeline. This of course required a suitable container with Julia and the packages my script needs, so I looked around, found the official Julia Docker images and tried using them.

This works fine as long as you only require “base” packages, i.e. the ones that are shipped with the Julia executable, but it will not work if you need additional packages. My usual approach here would be to rely on bioconda and thus also Biocontainers, but after reading up a bit on this, there seem to be no individually packaged Julia packages in the conda-verse (yet), as there are for R and Python packages. From the discussion it is also unclear, at least to me, whether this is going to happen at all.

What I ultimately did, after reading up some more, was to write a Dockerfile manually that extends the official Julia base image, adds the additional packages, precompiles them and makes them available in the container without interfering with any user installation of Julia that might be present on the host.

The last bit is crucial, as we’ve come a long way at nf-core to enable R / Python usage inside pipelines without interfering with the host. Such interference is especially problematic when running in an HPC setting using Singularity, which automatically and transparently mounts the user’s home directory inside the container. This is a good thing, as it makes it easy to run anything in the container, but it can also cause trouble: packages that were installed into the home directory inside the container are no longer easily accessible, because the HOST home is “overwriting” the CONTAINER home.
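The mechanism behind this can be sketched in a few lines of shell. Julia searches for packages in the depots listed in JULIA_DEPOT_PATH, and when that variable is unset the first depot is ~/.julia, which under Singularity ends up being the host’s home directory. The path /usr/local/share/julia is the one used in the Dockerfile in this post; the variable name container_depot is just for illustration.

```shell
# Depot location outside of any home directory (same path as in the Dockerfile)
container_depot="/usr/local/share/julia"

# With JULIA_DEPOT_PATH unset, Julia's first depot is ~/.julia.
# Pointing it at a container-only path avoids the mounted host home.
export JULIA_DEPOT_PATH="$container_depot"

# Print the depots Julia will actually search (only if julia is installed)
if command -v julia >/dev/null 2>&1; then
    julia -e 'foreach(println, DEPOT_PATH)'
fi
```

The same idea is what keeps R and Python from picking up user-level libraries on the host: tell the interpreter explicitly where to look instead of relying on the home directory.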

My final Dockerfile then looked like this:

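FROM julia:1.6.3-buster

# procps provides `ps`, which Nextflow uses to collect task metrics
RUN apt-get update && apt-get install -y procps

# Install and precompile the packages into a depot outside any home directory
RUN JULIA_DEPOT_PATH=/usr/local/share/julia \
    julia -e 'using Pkg; Pkg.update(); Pkg.add(["DifferentialEquations", "NumericalIntegration", "CSV", "XLSX", "DataFrames", "Glob", "ArgParse", "Plots"]); Pkg.precompile();'

# Smoke test: the build fails if the package cannot be loaded from the depot
RUN JULIA_DEPOT_PATH=/usr/local/share/julia \
    julia -e 'using DifferentialEquations'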
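For completeness, this is roughly how such an image can be built and checked locally. Note that in the Dockerfile JULIA_DEPOT_PATH is only set for the individual RUN steps, so it has to be passed again at runtime for Julia to find the packages. The image tag julia-nf:dev is a made-up name, and the docker calls are skipped when Docker is not installed.

```shell
image="julia-nf:dev"              # hypothetical image tag
depot="/usr/local/share/julia"    # depot path used in the Dockerfile

if command -v docker >/dev/null 2>&1; then
    # Build the image from the Dockerfile in the current directory
    docker build -t "$image" .
    # Pass the depot path at runtime, then try loading one of the packages
    docker run --rm -e JULIA_DEPOT_PATH="$depot" "$image" \
        julia -e 'using DataFrames; println("packages load from the container depot")'
else
    echo "docker not found on PATH, skipping build" >&2
fi
```

An alternative would be an ENV line in the Dockerfile; passing the variable from outside keeps the image itself unchanged and leaves the decision to whatever runs the container.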

Running Julia code from Nextflow

Everything else worked as it does for R / Python scripts, too: simply place your *.jl script inside the bin folder of your pipeline, make it executable with chmod +x bin/*.jl, and then call it in the script section of your Nextflow process as usual:

script:
'''
foo_bar.jl param1 param2 param3 ...
'''
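For the script to be callable by name like this, it also needs a shebang line as its first line. A minimal sketch, best run in a scratch directory since it overwrites bin/foo_bar.jl; the script body is a placeholder that only echoes its parameters:

```shell
mkdir -p bin

# Write a tiny placeholder script; foo_bar.jl is the example name used above
cat > bin/foo_bar.jl <<'EOF'
#!/usr/bin/env julia
# ARGS holds the command-line parameters passed by the Nextflow process
println("received ", length(ARGS), " parameters: ", join(ARGS, ", "))
EOF

# Nextflow puts the pipeline's bin folder on the PATH, so the file
# only needs to be executable
chmod +x bin/*.jl
```

Using /usr/bin/env in the shebang means the julia found on the PATH inside the container is used, rather than a hard-coded location.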

I’ve opened a PR to enable this in the nf-core Nextflow template for new pipelines, so that other users might benefit from these findings in the future (and also to document what I’ve tried, to save future-me some time, too). Locally, the Docker container built with the recipe above allows me to run my Julia scripts, and on cluster systems I can rely on the transparent Docker -> Singularity conversion to get a functional dependency container, too.
