
Creating and using virtual environments

Conda environments

Start by downloading and installing conda somewhere with enough room to hold many applications (so not your home directory).

# Get the Miniforge installer
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
# When the installer prompts "Miniforge3 will now be installed into this location:",
# enter your preferred location, for example:
#   /nfs/BaRC/USER/conda
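If you would rather not answer the installer's prompts, a sketch like the following may help. It uses the installer's batch options ('-b' for no prompts, '-p' for the install location) and checks free space first. The helper names 'free_gb' and 'install_miniforge' and the 20 GB threshold are assumptions made for illustration, not part of conda.

```shell
# Print the free space, in whole GB, of the filesystem holding directory $1.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Install Miniforge without prompts, but only if there is enough room.
#   $1 = installer script, $2 = install location, $3 = minimum free GB
install_miniforge() {
    if [ "$(free_gb "$(dirname "$2")")" -lt "$3" ]; then
        echo "Less than $3 GB free near $2; choose another location" >&2
        return 1
    fi
    # -b: batch mode (no prompts); -p: install location
    bash "$1" -b -p "$2"
}

# Example (the 20 GB threshold is an assumption; adjust as needed):
# install_miniforge Miniforge3-Linux-x86_64.sh /nfs/BaRC/USER/conda 20
```

The space check looks at the parent of the install location because the location itself does not exist yet.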

Create your desired environment

# Load conda into the current shell (pointing to where you installed conda)
eval "$(/nfs/BaRC/USER/conda/bin/conda shell.bash hook)"
# Create a new environment
# '--no-default-packages' skips any packages listed under 'create_default_packages' in your .condarc
/nfs/BaRC/USER/conda/bin/conda create --name RNAseq_2024a --no-default-packages
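When these steps go into a setup script that may be run more than once, it can help to create the environment only if it is missing. This is a minimal sketch that assumes conda has already been loaded into the shell; the helper names 'env_exists' and 'create_env_once' are hypothetical.

```shell
# Succeed if an environment named $1 appears in 'conda env list'.
env_exists() {
    conda env list | awk '{ print $1 }' | grep -qxF "$1"
}

# Create environment $1 unless it is already there.
create_env_once() {
    if env_exists "$1"; then
        echo "Environment $1 already exists; skipping"
    else
        conda create --yes --name "$1" --no-default-packages
    fi
}

# Example:
# create_env_once RNAseq_2024a
```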

Activate the environment

conda activate RNAseq_2024a

Add applications to your environment, specifying versions if you want the install commands to be reproducible. These will be installed under your original conda location. The newest version of some software can cause problems, either with existing data (for example, STAR reports "Genome version: 2.7.1a is INCOMPATIBLE with running STAR version: 2.7.11b" when given an index built by an older version) or with conda's dependency resolution.

conda install -c bioconda STAR=2.7.11b
conda install -c bioconda multiqc=1.25.2
conda install -c bioconda fastqc=0.12.1
# Replace STAR 2.7.11b with the older version that matches the existing genome index
conda install -c bioconda STAR=2.7.1
conda install -c bioconda subread=2.0.8
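As an alternative to separate install commands, the pinned versions can be passed to a single 'conda install' so that conda resolves them together, which may surface incompatibilities earlier. This is a sketch: the helper name 'install_pinned' is hypothetical, and keeping STAR at 2.7.1 rather than 2.7.11b is an assumption based on the last STAR command above.

```shell
# Pinned specs copied from the commands above (STAR kept at the older version).
specs="STAR=2.7.1 multiqc=1.25.2 fastqc=0.12.1 subread=2.0.8"

# Install all pinned packages in one solver run.
install_pinned() {
    # $specs is intentionally unquoted so each spec becomes its own argument.
    conda install --yes -c bioconda $specs
}

# Example (run inside the activated environment):
# install_pinned
```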

Get a list of packages in our environment

conda list -n RNAseq_2024a

Leave the environment

conda deactivate

Go back to the environment

conda activate RNAseq_2024a

The name of your current environment should appear at the start of your command prompt.

(RNAseq_2024a) gbell@sparky ~$

Save the environment

conda env export > RNAseq_2024a.environment.yml

Someone else should be able to create a new environment from this YAML file

conda env create -f RNAseq_2024a.environment.yml
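A full export records build strings that may not exist on another platform. The sketch below uses the '--no-builds' option of 'conda env export' to leave them out, and names the file after the environment. The helper name 'export_env' is hypothetical.

```shell
# Export environment $1 to <name>.environment.yml without build strings.
export_env() {
    conda env export -n "$1" --no-builds > "$1.environment.yml"
}

# Example:
# export_env RNAseq_2024a
# A colleague can then recreate it, optionally under a different name:
# conda env create -f RNAseq_2024a.environment.yml -n RNAseq_copy
```

Whether to keep or drop build strings is a trade-off: keeping them is more exact on the same system, dropping them tends to be more portable.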

Remove a problematic piece of software from the environment

conda remove STAR

If you no longer want the environment

conda remove -n ENV_NAME --all

If we want to use slurm, we need to add the path to the slurm commands. Is there a better way to do this?

export PATH=$PATH:/opt/slurm/bin
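One possible refinement, offered as a sketch rather than a site recommendation, is to add the directory only when it exists and is not already on the PATH, so that repeating the command does not grow the PATH. The helper name 'append_path' is hypothetical.

```shell
# Append directory $1 to PATH only if it exists and is not already listed.
append_path() {
    [ -d "$1" ] || return 0
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

append_path /opt/slurm/bin
```

To apply this every time the environment is activated, the same lines could be saved as a '.sh' file under "$CONDA_PREFIX/etc/conda/activate.d/", which conda sources on 'conda activate'. The file name is up to you.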

To test the environment, run the RNA-seq Hot Topics exercises; they should work.

See also the Whitehead IT conda page: https://clusterguide.wi.mit.edu/software/conda/

Singularity environments

Singularity allows you to create and run containers that package up pieces of software in a way that is portable and reproducible. Some software is now distributed this way, so that "installation" is simply downloading a SIF file.

One example is AGAT (Another Gtf/Gff Analysis Toolkit), which provides instructions on how to download and run the AGAT container that includes a series of applications:

# Download
singularity pull docker://quay.io/biocontainers/agat:1.0.0--pl5321hdfd78af_0
# Run
singularity run agat_1.0.0--pl5321hdfd78af_0.sif
# When finished
exit

Then one can run commands such as 'agat_convert_sp_gff2gtf.pl'. The trouble is that the container doesn't see our usual filesystem, which makes it not very useful. The 'singularity' command needs to be modified to bind the required folder(s), as in the following one-line command:

singularity run -B /lab/BaRC_projects:/lab/BaRC_projects --cleanenv --pwd /lab/BaRC_projects /nfs/BaRC_Public/apps/AGAT/agat_1.0.0--pl5321hdfd78af_0.sif
# Go where we want
cd /lab/BaRC_projects
# Check that the environment includes our desired files/folders
ls
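For scripts and batch jobs, an interactive session is less convenient than running a single command inside the container. The sketch below wraps 'singularity exec' with the same bind, '--cleanenv' and '--pwd' options used above. The function name 'in_agat' and the variable names are hypothetical.

```shell
SIF=/nfs/BaRC_Public/apps/AGAT/agat_1.0.0--pl5321hdfd78af_0.sif
PROJECTS=/lab/BaRC_projects

# Run one command inside the AGAT container with the project folder bound.
in_agat() {
    singularity exec -B "$PROJECTS:$PROJECTS" --cleanenv --pwd "$PROJECTS" "$SIF" "$@"
}

# Examples:
# in_agat ls
# in_agat agat_convert_sp_gff2gtf.pl --help
```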

One problem is that this is an older version of AGAT (v1.0.0). Another problem is that some of the commands require samtools, which is not present in the container. What can we do about this?

One way to build a customized singularity container is to use the Seqera Container Builder. We can search for and add AGAT (not the first hit) and samtools, then specify that we want a singularity container and click on "Get Container". When it's ready, run 'singularity pull' on the oras link, like

singularity pull oras://community.wave.seqera.io/library/agat_samtools:d30ed34317069fe6

We end up downloading a file like 'agat_samtools_d30ed34317069fe6.sif'. Then we can use the 'singularity run' command as above and run both AGAT and samtools.
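To confirm that a downloaded container really provides every tool we need, a quick check like the following may be useful. It is a sketch that assumes the container includes a POSIX 'sh'; the function name 'check_tools' is hypothetical.

```shell
# Report whether each named tool is on the PATH inside container $1.
check_tools() {
    sif=$1
    shift
    for tool in "$@"; do
        if singularity exec "$sif" sh -c 'command -v "$1"' sh "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: MISSING"
        fi
    done
}

# Example:
# check_tools agat_samtools_d30ed34317069fe6.sif samtools agat_convert_sp_gff2gtf.pl
```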

By the way, '--cleanenv' in the 'singularity run' command clears your host shell's environment variables (including any exported from your .bashrc file) before the container starts. If you want the container to inherit them, remove '--cleanenv'.

See also the Whitehead IT singularity page: https://clusterguide.wi.mit.edu/software/singularity/
