Exercise 3 - Advanced Docker and Singularity
The following session assumes that you have Docker running on your
computer (we tested with Docker version 18.09.7, build 2d0083d). If
you’re running on a Linux host it further assumes that you have it set
up in such a way that you don’t have to prepend sudo to the
docker command each time. There is a number of ways you can do that,
e.g. by creating a docker group and adding your user to it or by running
the Docker daemon as a non-root user (find some more info/instructions
here). If for some reason you don’t want to or can do any of these
things, all of the below should also work if you simply prepend sudo
to the docker command.
Disclaimer
Parts of the session below were inspired and indeed reuse examples from a CyVerse workshop documentation (Release 0.2.0; March 05, 2020; online - dowloaded as pdf 03.04.2023).
Build your own image
In the previous sesssion you have been shown how to run existing, pre-build Docker containers. Now it’s time to make your own Docker image running only a particular piece of software of your choosing.
As an example I have chosen BLAST, which is used to find regions of
similarity between biological sequences, DNA- and/or Protein. The BLAST
algorithm (Basic Local Alignment Search Tool) and its derivatives is
among the most used tools in DNA-sequence based research. There is an
online version hosted by NCBI, but there are many situations where it’s
handy to have it running locally.
Interactive Build
Since you’ve already used it in the previous session we will be starting from a base Ubuntu image. Let’s start a container interactively.
(host)-$ docker run -it -h my_manual_blast --name ${USER}s_manual_blast ubuntu:18.04
Note the additional flags I am using - these are optional:
-h- Gives a defined name to the Docker host - just so it looks the same for all of us--name- Name the container
Your prompt should look like this now:
root@my_manual_blast:/#
Now, let’s see if Ubuntu already comes with BLAST pre-installed,
specifically a programm called blastn, which is used for comparing
DNA sequences.
root@my_manual_blast:/$ blastn
bash: blastn: command not found
No. But Ubuntu has an online registry for software and it happens to
contain the standard suite of BLAST programs. To set this up we need to
update this system first. Ubuntu has a free-software user interface
called apt (Advanced Package Tool) to manage installation and
removal of software between your system and the registry.
root@my_manual_blast:/$ apt-get update
Note: If you’re unsure if the program you want to install is in the
registry and what the package is called (it’s important to know exactly,
because the command line is case sensitive and will not recognize your
request if you have misstyped), apt has a function for searching the
registry.
(host)-$ apt-cache search blast
This is not a great example, because you get lots of packages back. But
somewhere in there you’ll find ncbi-blast+. This is what you want.
Then we can install the desired program as follows using apt-get.
You will be notified that the installation will require some additional
diskspace and need to agree with typing y.
root@my_manual_blast:/$ apt-get install ncbi-blast+
If we try again now, it seems to work.
root@my_manual_blast:/$ blastn -h
USAGE
blastn [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-seqidlist filename]
[-negative_gilist filename] [-entrez_query entrez_query]
[-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-out output_file] [-evalue evalue] [-word_size int_value]
[-gapopen open_penalty] [-gapextend extend_penalty]
[-perc_identity float_value] [-qcov_hsp_perc float_value]
[-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
[-xdrop_gap_final float_value] [-searchsp int_value]
[-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
[-min_raw_gapped_score int_value] [-template_type type]
[-template_length int_value] [-dust DUST_options]
[-filtering_db filtering_database]
[-window_masker_taxid window_masker_taxid]
[-window_masker_db window_masker_db] [-soft_masking soft_masking]
[-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
[-best_hit_score_edge float_value] [-window_size int_value]
[-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
[-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
[-outfmt format] [-show_gis] [-num_descriptions int_value]
[-num_alignments int_value] [-line_length line_length] [-html]
[-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
[-version]
DESCRIPTION
Nucleotide-Nucleotide BLAST 2.6.0+
Use '-help' to print detailed descriptions of command line arguments
Note, that I call the software and add a -h to the call. This is a
very common, so-called *flag* in command line software that usually
gives you some kind of help about the program. In this case it shows all
the options the blastn program has. In this case there is even a
more extensive help you can get by typing blastn -help.
Great! Now you have blastn running in a container. But how to make
this permanent?
Let’s exit the container and see what we can do. The following will get you out of the container and bring your original prompt back.
root@my_manual_blast:/$ exit
Now, if you type docker container ls -a (in older docker versions
this is the same as docker ps -a) you will see the list of all
containers that you ran so far (the ones that are running as well as
those which are already exited), including ${USER}s_manual_blast,
which you have just exited.
We can convert this container, including the changes you made to the
base Ubuntu image to a new image - I will call it
${USER}s_manual_blast_image. Docker has a subroutine for that, called
commit. You need to also provide a commit message via the -m
flag. This is usually short information about how you changed the image,
so when you look at it later you will be able to remember what the
changes were.
(host)-$ docker commit -m "ubuntu + blast" ${USER}s_manual_blast ${USER}s_manual_blast_image
Great! Now the image should show up if you type docker image ls.
You can use it like we’ve done before. The --rm just tells Docker to
remove the container once you’re done. This is handy, otherwise you will
accumulate excited containers very fast. Below I dropped the --name
flag, because it’s optional and at this stage I don’t care which name
the system gives to the container while it’s running.
(host)-$ docker run -it --rm -h my_manual_blast ${USER}s_manual_blast_image
root@my_manual_blast:/$ blastn -h
USAGE
.
.
.
root@manual_blast_image:/$ exit
You can use the image also like an executable, rather than interactively, try:
(host)-$ docker run --rm ${USER}s_manual_blast_image blastn -h
Automatic Build
Docker can build images automatically, reading instructions from a text
file, the so-called Dockerfile. This is simply a text document that
contains all the commands you would normally execute manually in order
to build your Docker image.
The official reference to all features and extensions is provided at Docker’s official documentation here.
Let’s try to create your first Dockerfile and design it to build an
image as the one we did manually above.
To keep things tidy, let’s first make and move to a new directory.
(host)-$ mkdir automatic-blast && cd automatic-blast
Using your favorite text editor, create a file called Dockerfile and
copy/paste/type the following text into it.
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install ncbi-blast+
Note that these are mostly the exact same commands that we just ran interactively, but that we prepend specific directives that will be interpreted by Docker.
FROM- tells Docker to start building our image onto a certain base image.RUN- The commands in each of these lines are actually executed during the process of building the image.
Let’s try to build the image as instructed in the Dockerfile. Docker has a command for that.
(host)-$ docker build -t ${USER}s_automatic_blast_image .
Note the flag -t which I use to name the image
automatic_blast_image. The . is mandatory and just tells it to
look for a file called Dockerfile (per default) in your current
working directory. This behavior can be changed, i.e. you can specifiy a custom filename for your Dockerfile, but you can try to
figure that one out for yourself if you want.
You will see the same information as before during system update, but
then unfortunately we get an error. Remember you’ve been prompted to
agree that extra diskspace is used before? Docker does not allow user
interaction during build. Let’s make a small change to the Dockerfile to
fix that - add -y to the apt install command, which tells apt: ‘don’t prompt - yes to all’.
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y ncbi-blast+
Try again.
(host)-$ docker build -t ${USER}s_automatic_blast_image .
Looks good! Take a second to inspect the output Docker created and note that during the second build attempt Docker has not redone the update, but rather continued from from the first line in the Dockerfile that caused the error.
If you type docker image ls now, the image should exist. We can try
it out, like so:
(host)-$ docker run --rm ${USER}s_automatic_blast_image blastn -h
Exercises
Warning
If you do the following exercises on a shared resource, i.e. you are sharing Docker with multiple users on a single computer (e.g. AWS instance), please make sure that you build your images under unique names. Something like ${USER}_imagename.
Exercise 1
Clustalo is a very popular tool for multiple sequence alignment. It can be easily installed with conda, or built from source, or simply setup with precompiled binaries.
Write a Dockerfile and build an image to run clustalo version 1.2.4 in Ubuntu:20.04 or Ubuntu:22.04.
Possible solutions can be found here:
Exercise 2
Flye is a denovo genome assembler built for long reads (PacBio and ONT). It performs very well and is relatively fast and memory efficient, as far as denovo assemblers go.. ;-) Check out the installation instructions of Flye on their Github page.
Write a Dockerfile and build an image for the Flye assembler (version 2.9) running in Ubuntu 20.04. According to the installation instructions you could get it through conda or build it locally.
Possible solutions can be found here:
Exercise 3
Choose your favourite piece of bioinformatics software (command line tools) and try to run it through a container via Docker or Singularity.
Two options:
find a suitable existing Docker image (e.g. try to Google for it or explore Dockerhub)
Create your own Dockerfile and build an image yourself.
We’ll briefly discuss your experiences in the group.
Phew, for a minute there … Well Done !!!
Demos (require local Docker setup)
Warning
The following demos are assuming that you are running Docker locally on your computer. They can also be run on a server and forwarded to your local computer via port forwarding, but this is a little bit more advanced topic.
Running an RStudio server
The demo is inspired by this tutorial (last accessed 24.04.2020) and relies on images provided by The Rocker Project (see also the Github Wiki).
Start the RStudio server Docker container like so:
(host)-$ docker run -e PASSWORD=yourpassword --rm -p 8787:8787 rocker/rstudio:4.0.3
Then scoot to http://localhost:8787 in your webbrowser. Enter your
username rstudio (per default) and password we’ve set it to
yourpassword when we called the container.
If you also want to read/write files on your host from within the container, you can extend the above command, like so, e.g.:
(host)-$ docker run -d -e PASSWORD=yourpassword -e USERID=$UID --rm -v $(pwd):/working -w /working -p 8787:8787 rocker/rstudio:4.0.3
For an example Dockerfile you can use to build an Rstudio image that has some packages already pre-installed, see this Dockerfile. Incidentally, this is the RStudio server setup I used for doing the Differential Expression analyses a few lectures ago. For a Dockerfile setting up plain R with all dependencies for running SarTools go here and find the corresponding image here.
Jupyter Notebook
Cyverse US has created a number of Docker images and deposited the contexts on Github here.
Very nice are for example their Jupyterlab Servers in Docker containers. Try the following, but note that this image is rather large and may take a while to download, depending on your download speed..
(host)-$ docker run -it --rm -v /$HOME:/app --workdir /app -p 8888:8888 -e REDIRECT_URL=http://localhost:8888 cyversevice/jupyterlab-scipy:2.2.9
Once the download has finished and the server started running move to
http://localhost:8888 in your webbrowser. Cool, no?
Here’s another one that has snakemake setup within it.
(host)-$ docker run -it --rm -v /$HOME:/app --workdir /app -p 8888:8888 -e REDIRECT_URL=http://localhost:8888 chrishah/snakemake-vice:v05062020
Links
Dockerhub’s documentation
Contact
Christoph Hahn - christoph.hahn@uni-graz.at