Creating and using virtual environments
Conda environments
Start by downloading and installing conda (here via the Miniforge installer) in a location with enough room to hold many applications, so not your home directory.
# Get the Miniforge installer
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
# The installer prompts:
#   Miniforge3 will now be installed into this location:
# [choose your preferred location], e.g.
/nfs/BaRC/USER/conda
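The installer can also run without prompts, which helps if you script the setup. This is a sketch, assuming the Miniforge installer's `-b` (batch mode) and `-p` (install prefix) options; the prefix is a placeholder, and the `DRY_RUN=1` switch and helper names are hypothetical. It reports free space first, since the install location needs room for many applications.

```shell
#!/usr/bin/env bash
# Sketch of a scripted Miniforge install.
# Assumptions: placeholder prefix; hypothetical DRY_RUN=1 switch only prints commands.

# Print the free space, in whole GB, on the filesystem holding a directory.
free_gb() {
  # df -P prints one line per filesystem; field 4 is available 1K blocks
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

install_miniforge() {
  local prefix=$1
  local installer=Miniforge3-Linux-x86_64.sh
  local url=https://github.com/conda-forge/miniforge/releases/latest/download/$installer
  local run=""
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    run=echo   # print each command instead of executing it
  fi

  # The prefix itself should not exist yet, so check its parent directory
  echo "Free space under $(dirname "$prefix"): $(free_gb "$(dirname "$prefix")") GB"

  $run wget "$url"
  # -b: batch mode (no prompts); -p: install location
  $run bash "$installer" -b -p "$prefix"
}

# Example (placeholder path):
# install_miniforge /nfs/BaRC/USER/conda
```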
Create your desired environment
# Set up conda in the current shell (pointing to where you installed conda)
eval "$(/nfs/BaRC/USER/conda/bin/conda shell.bash hook)"
# Create a new environment
# '--no-default-packages' skips any packages listed under create_default_packages in your .condarc
/nfs/BaRC/USER/conda/bin/conda create --name RNAseq_2024a --no-default-packages
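If you script this step, it can help to check whether the environment already exists before creating it. A minimal sketch, assuming the first column of `conda env list` output is the environment name and that lines starting with `#` are headers; `env_in_list` and `create_env_if_missing` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if an environment name appears in `conda env list` output on stdin.
env_in_list() {
  # Skip '#' header lines; compare the first column with the requested name
  awk -v n="$1" '!/^#/ && $1 == n { found = 1 } END { exit !found }'
}

# Create an empty environment only if it is not already listed.
create_env_if_missing() {
  local name=$1
  if conda env list | env_in_list "$name"; then
    echo "Environment $name already exists"
  else
    conda create --yes --name "$name" --no-default-packages
  fi
}

# Example:
# create_env_if_missing RNAseq_2024a
```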
Activate the environment
conda activate RNAseq_2024a
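`conda activate` works in a shell where conda has been set up. In a non-interactive batch script (such as a Slurm job), run the same `eval` hook line first. The sketch below writes such a job script; the function name `write_job_script`, the `CONDA_BASE` variable, the resource requests, and the command are assumptions to adapt.

```shell
#!/usr/bin/env bash
# Write a Slurm job script that sets up conda and activates an environment.
# Assumptions: conda lives under /nfs/BaRC/USER/conda; resources are examples only.
write_job_script() {
  local out=$1 env_name=$2 cmd=$3
  local conda_base=${CONDA_BASE:-/nfs/BaRC/USER/conda}
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=${env_name}
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

# Set up conda's shell functions so 'conda activate' works in a batch job
eval "\$(${conda_base}/bin/conda shell.bash hook)"
conda activate ${env_name}

${cmd}
EOF
}

# Example (placeholder command):
# write_job_script run_fastqc.sh RNAseq_2024a "fastqc sample.fastq.gz"
# sbatch run_fastqc.sh
```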
Add applications to your environment, specifying versions if you want the install commands to be reproducible. They will be installed under your original conda location. The newest version of a tool can cause problems, either incompatibility with existing files (such as with STAR: "Genome version: 2.7.1a is INCOMPATIBLE with running STAR version: 2.7.11b") or conflicts with other conda packages.
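conda install -c bioconda STAR=2.7.11b
conda install -c bioconda multiqc=1.25.2
conda install -c bioconda fastqc=0.12.1
# Installing a second STAR version replaces the first (e.g., to match a genome index built with an older STAR)
conda install -c bioconda STAR=2.7.1
conda install -c bioconda subread=2.0.8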
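Each separate `conda install` re-solves the environment, and the conda documentation recommends installing the programs you want at the same time to avoid dependency conflicts. A sketch, assuming you keep the pins in a plain-text spec file passed to `conda create --file`; the file name and the helpers `write_spec` and `create_from_spec` are hypothetical, and the STAR version should be whichever matches your genome index.

```shell
#!/usr/bin/env bash
# Sketch: keep version pins in one spec file and create the environment in one step.
# Assumptions: hypothetical file/helper names; choose the STAR version that
# matches the version used to build your genome index.
write_spec() {
  local out=$1 star_version=${2:-2.7.11b}
  cat > "$out" <<EOF
star=${star_version}
multiqc=1.25.2
fastqc=0.12.1
subread=2.0.8
EOF
}

create_from_spec() {
  local name=$1 spec=$2
  if ! command -v conda > /dev/null; then
    echo "conda not found; run the 'eval ... shell.bash hook' line first" >&2
    return 1
  fi
  # One solve for all packages; conda-forge is listed before bioconda
  conda create --yes --name "$name" -c conda-forge -c bioconda --file "$spec"
}

# Example:
# write_spec RNAseq_2024a.spec.txt 2.7.1
# create_from_spec RNAseq_2024a RNAseq_2024a.spec.txt
```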
Get a list of packages in our environment
conda list -n RNAseq_2024a
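To confirm that a pin took effect, you can read the version column for one package (`conda list` also accepts a name pattern, e.g. `conda list -n RNAseq_2024a star`). A sketch, assuming `conda list` prints name, version, build and channel columns after `#` header lines; `pkg_version` and `check_pin` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Print the version of one package from `conda list` output on stdin.
pkg_version() {
  awk -v p="$1" '!/^#/ && $1 == p { print $2 }'
}

# Return 0 if the installed version equals the expected pin.
check_pin() {
  local env_name=$1 pkg=$2 want=$3 have
  have=$(conda list -n "$env_name" | pkg_version "$pkg")
  if [[ $have == "$want" ]]; then
    echo "$pkg $have OK"
  else
    echo "$pkg: expected $want, found ${have:-nothing}" >&2
    return 1
  fi
}

# Example:
# check_pin RNAseq_2024a star 2.7.11b
```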
Leave the environment
conda deactivate
Go back to environment
conda activate RNAseq_2024a
The name of the active environment appears in your command prompt, for example:
(RNAseq_2024a) gbell@sparky ~$
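Inside a script there is no prompt to look at, but `conda activate` also sets the `CONDA_DEFAULT_ENV` variable to the environment name. A small guard, here with the hypothetical name `require_env`, can stop a script that is run from the wrong environment:

```shell
#!/usr/bin/env bash
# Fail unless the named conda environment is the active one.
require_env() {
  local want=$1
  if [[ ${CONDA_DEFAULT_ENV:-} != "$want" ]]; then
    echo "Please run 'conda activate $want' first (active: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
  fi
}

# Example, at the top of a pipeline script:
# require_env RNAseq_2024a || exit 1
```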
Save the environment
conda env export > RNAseq_2024a.environment.yml
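The exported file records exact build strings and ends with a `prefix:` line pointing to your own install path, which other users don't need. A sketch for a more portable file, assuming `conda env export --no-builds` to drop the build strings and a hypothetical `drop_prefix` filter to remove the `prefix:` line. (`conda env export --from-history` is another option; it lists only the packages you explicitly requested.)

```shell
#!/usr/bin/env bash
# Remove the machine-specific 'prefix:' line from `conda env export` output on stdin.
drop_prefix() {
  grep -v '^prefix:'
}

# Export the named environment without build strings or prefix.
export_portable() {
  local name=$1
  conda env export --name "$name" --no-builds | drop_prefix > "$name.environment.yml"
}

# Example:
# export_portable RNAseq_2024a
```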
Someone else should be able to create a new environment from this YAML file:
conda env create -f RNAseq_2024a.environment.yml
Remove a problem piece of software from the environment
conda remove STAR
If you no longer want the environment
conda remove -n ENV_NAME --all
To use Slurm from within the environment, we need to add the Slurm commands to our PATH. Is there a better way to do this?
export PATH=$PATH:/opt/slurm/bin
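One possible answer to the question above: conda sources any `*.sh` scripts in the environment's `etc/conda/activate.d/` directory when you activate it, and those in `etc/conda/deactivate.d/` when you deactivate it. A sketch, assuming `/opt/slurm/bin` is the Slurm location (as above); the helper `add_slurm_hooks` and the file name `slurm_path.sh` are hypothetical. Run it once with the environment active.

```shell
#!/usr/bin/env bash
# Sketch: install activate/deactivate hooks that add Slurm to PATH.
# Assumes the target environment is active, so CONDA_PREFIX points to it.
add_slurm_hooks() {
  local slurm_bin=${1:-/opt/slurm/bin}
  local prefix=${CONDA_PREFIX:-}
  if [[ -z $prefix ]]; then
    echo "Activate the environment first (CONDA_PREFIX is empty)" >&2
    return 1
  fi
  mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"

  # Sourced on 'conda activate': append Slurm if it is not already on PATH
  cat > "$prefix/etc/conda/activate.d/slurm_path.sh" <<EOF
case ":\$PATH:" in
  *:${slurm_bin}:*) ;;
  *) export PATH="\$PATH:${slurm_bin}" ;;
esac
EOF

  # Sourced on 'conda deactivate': remove every Slurm entry from PATH
  cat > "$prefix/etc/conda/deactivate.d/slurm_path.sh" <<EOF
PATH=\$(printf '%s' "\$PATH" | tr ':' '\n' | grep -vxF '${slurm_bin}' | paste -sd: -)
export PATH
EOF
}

# Example (with RNAseq_2024a active):
# add_slurm_hooks /opt/slurm/bin
```

Note that the deactivate hook removes the Slurm entry even if it was already on your PATH before activation.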
To test the environment, try the RNA-seq Hot Topics exercises; they should work.
See also the Whitehead IT conda page: https://clusterguide.wi.mit.edu/software/conda/
Singularity environments
Singularity lets you create and run containers that package software in a portable, reproducible way. Some software is now distributed like this, so "installation" is simply downloading a SIF (Singularity Image Format) file.
One example is AGAT (Another Gtf/Gff Analysis Toolkit), whose documentation explains how to download and run an AGAT container that includes its suite of applications:
# Download
singularity pull docker://quay.io/biocontainers/agat:1.0.0--pl5321hdfd78af_0
# Run
singularity run agat_1.0.0--pl5321hdfd78af_0.sif
# When finished
exit
Inside the container one can then run commands such as 'agat_convert_sp_gff2gtf.pl'. The trouble is that our usual lab folders are not visible inside the container, which makes it much less useful. The 'singularity' command needs a bind option (-B) for each required folder, as in the following command:
singularity run -B /lab/BaRC_projects:/lab/BaRC_projects --cleanenv --pwd /lab/BaRC_projects /nfs/BaRC_Public/apps/AGAT/agat_1.0.0--pl5321hdfd78af_0.sif
# Go where we want
cd /lab/BaRC_projects
# Check that the environment includes our desired files/folders
ls
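To run a single AGAT tool without opening a shell in the container, you can wrap `singularity exec` (which runs one command inside a container) in a shell function. A sketch, assuming the SIF path and bind folder shown above; `agat_run`, `AGAT_SIF`, `AGAT_BIND` and the `DRY_RUN` switch are hypothetical names.

```shell
#!/usr/bin/env bash
# Run one command inside the AGAT container with the lab folder bound.
# Assumptions: SIF path and bind folder copied from the example above.
AGAT_SIF=${AGAT_SIF:-/nfs/BaRC_Public/apps/AGAT/agat_1.0.0--pl5321hdfd78af_0.sif}
AGAT_BIND=${AGAT_BIND:-/lab/BaRC_projects}

agat_run() {
  local cmd=(singularity exec -B "$AGAT_BIND:$AGAT_BIND" --cleanenv
             --pwd "$PWD" "$AGAT_SIF" "$@")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"   # show the command without running it
  else
    "${cmd[@]}"
  fi
}

# Example (run from a folder under /lab/BaRC_projects):
# agat_run agat_convert_sp_gff2gtf.pl --help
```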
One problem is that this is an older version of AGAT (v1.0.0). Another is that some of the commands require samtools, which is not included in the container. What can we do about this?
One way to build a customized Singularity container is with the Seqera Container Builder. Search for and add AGAT (not the first hit) and samtools, specify that you want a Singularity container, and click "Get Container". When it's ready, run 'singularity pull' on the oras:// link, for example:
singularity pull oras://community.wave.seqera.io/library/agat_samtools:d30ed34317069fe6
This downloads a file such as 'agat_samtools_d30ed34317069fe6.sif'. Running it with the same 'singularity run' command as above (substituting the new SIF file) gives us both AGAT and samtools.
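`singularity pull` names the downloaded file after the image name and tag, as in the two SIF files above, so a script can skip the download when the file is already present. A sketch; `sif_name` and `pull_if_missing` are hypothetical helpers, and the naming rule is inferred from those two examples. Passing the file name explicitly to `singularity pull` keeps the name predictable either way.

```shell
#!/usr/bin/env bash
# Derive a SIF file name from an image URI, e.g.
#   docker://quay.io/biocontainers/agat:1.0.0--pl5321hdfd78af_0
#   -> agat_1.0.0--pl5321hdfd78af_0.sif
sif_name() {
  local uri=${1#*://}     # drop the docker:// or oras:// scheme
  local base=${uri##*/}   # keep the last path component (name:tag)
  echo "${base/:/_}.sif"  # name:tag -> name_tag.sif
}

# Download an image only if its SIF file is not already present.
pull_if_missing() {
  local uri=$1 sif
  sif=$(sif_name "$uri")
  if [[ -f $sif ]]; then
    echo "$sif already present"
  else
    singularity pull "$sif" "$uri"
  fi
}

# Example:
# pull_if_missing oras://community.wave.seqera.io/library/agat_samtools:d30ed34317069fe6
```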
By the way, '--cleanenv' in the 'singularity run' command keeps environment variables from your host session (such as those set in your .bashrc) from being passed into the container. If you want those settings inside the container, remove '--cleanenv'.
See also the Whitehead IT singularity page: https://clusterguide.wi.mit.edu/software/singularity/