
What is Nextflow?

Nextflow is a workflow system for creating scalable, portable, and reproducible workflows. It has been developed specifically to ease the creation and execution of bioinformatics pipelines. It allows you to run your analysis pipeline on a large-scale dataset in a streamlined and parallel manner. Nextflow can deploy workflows on a variety of execution platforms, including your local machine, HPC schedulers, and cloud. Additionally, Nextflow supports a range of compute environments, software container runtimes, and package managers, allowing workflows to be executed in reproducible and isolated environments.

Why use Nextflow?

The rise of big data has made it increasingly necessary to analyze and perform experiments on large datasets in a portable and reproducible manner. Nextflow has several key features that support reproducible and efficient pipeline implementation.

  1. Reproducibility

Nextflow supports Docker and Singularity container technologies.

This, along with integration with the GitHub code-sharing platform, allows you to write self-contained pipelines, manage versions, and rapidly reproduce any former configuration.
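For example, container use is enabled in a pipeline's nextflow.config file. A minimal sketch, assuming Singularity is available on the system:

// nextflow.config: run every process in a Singularity container
singularity {
    enabled    = true
    autoMounts = true
}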

  1. Continuous checkpoints

All the intermediate results produced during the pipeline execution are automatically tracked.

This allows you to resume execution from the last successfully completed step, no matter the reason it stopped.
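In practice, you request this behavior by adding the -resume option when relaunching a run, e.g. (foo/bar is the placeholder project used later on this page):

nextflow run foo/bar -resume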

  1. Portability

Nextflow pipelines can be executed on multiple platforms without changing their code.

It supports various executors, including batch schedulers such as SLURM, LSF, and PBS, as well as Kubernetes and cloud platforms such as Amazon AWS, Google Cloud, and Microsoft Azure.
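The executor is chosen in the configuration, not in the pipeline code. A minimal sketch for SLURM; the queue name is a hypothetical example to replace with your cluster's partition:

process {
    executor = 'slurm'
    queue    = 'normal'   // hypothetical partition name
}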

Installation of Nextflow

Nextflow can be used on Linux, macOS, and Windows (through WSL). It requires Bash 3.2 (or later) and Java 17 (or later, up to 23) to be installed. For instructions on installing Nextflow, please refer to this page: https://www.nextflow.io/docs/latest/install.html

The Nextflow command line tool has been installed on the WI slurm cluster: /nfs/BaRC_Public/apps/nextflow/nextflow

The current version is nextflow version 24.04.4.5917.
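To call it simply as nextflow, you can add the installation directory (the path above) to your PATH:

export PATH=/nfs/BaRC_Public/apps/nextflow:$PATH
nextflow -version   # confirm the installed version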

The main purpose of the Nextflow CLI is to run Nextflow pipelines with the run command. Nextflow can execute a local script (e.g. ./main.nf) or a remote project (e.g. github.com/foo/bar).

To launch the execution of a pipeline project hosted in a remote code repository, you simply need to specify its qualified name or the repository URL after the run command. The qualified name is formed by two parts: the owner name and the repository name, separated by a / character.

In other words, if a Nextflow project is hosted, for example, in a GitHub repository at http://github.com/foo/bar, it can be executed by entering the following command in your shell terminal:

nextflow run foo/bar

or using the project URL:

nextflow run http://github.com/foo/bar

If the project is found, it will be automatically downloaded to the Nextflow home directory ($HOME/.nextflow by default) and cached for subsequent runs.
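The cached copy can be managed with Nextflow's project commands, e.g. (using the same placeholder project):

nextflow list          # list projects downloaded to $HOME/.nextflow
nextflow pull foo/bar  # update the cached copy to the latest revision
nextflow drop foo/bar  # delete the cached copy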

Try this simple example by running the following command:

nextflow run nextflow-io/hello

This is a simple script showing the basic 'Hello World!' example for the Nextflow framework. It will download a trivial example from the repository published at http://github.com/nextflow-io/hello and execute it on your computer. Run this example to confirm all tools are installed properly.
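After the run finishes, you can list the executions recorded in the current directory with:

nextflow log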

What is nf-core?

nf-core is a global community effort to collect a curated set of open-source analysis pipelines built using Nextflow. There are currently 128 pipelines available as part of nf-core. Browse them at https://nf-co.re/pipelines/.

How to run nf-core pipelines

To run a pipeline:

  1. Configure Nextflow to run on your system:

The simplest way to run is with -profile docker (or singularity). This instructs Nextflow to execute jobs locally, with Docker (or Singularity) to fulfill software dependencies.

Please note that if you are running the pipeline on the slurm cluster, you can only use -profile singularity, because you do not have permission to run Docker on it.

Conda is also supported with -profile conda. However, this option is not recommended, as reproducibility of results can’t be guaranteed without containerization.
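When using Singularity, it helps to cache downloaded container images in one shared location so repeated runs do not re-download them. The path below is a hypothetical example:

export NXF_SINGULARITY_CACHEDIR=/path/to/singularity_cache   # hypothetical cache path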

  1. Run the tests for your pipeline in the terminal to confirm everything is working:
nextflow run nf-core/<pipeline_name> -profile test,singularity --outdir <OUTDIR>

Replace <pipeline_name> with the name of an nf-core pipeline.

Nextflow will pull the code from the GitHub repository and fetch the software requirements automatically, so there is no need to download anything first.
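For example, to run the test profile of the nf-core/rnaseq pipeline (here rnaseq_test is an arbitrary output directory name):

nextflow run nf-core/rnaseq -profile test,singularity --outdir rnaseq_test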

  1. Read the pipeline documentation to see which command-line parameters are required. This will be specific to your data type and usage.
  1. To launch the pipeline with real data, omit the test config profile and provide the required pipeline-specific parameters. For example, to run the CUTandRun pipeline, your command will be similar to this (a sketch of the input samplesheet follows this list):
nextflow run nf-core/cutandrun \
    -profile singularity \
    --input samplesheet.csv \
    --peakcaller 'seacr,MACS2' \
    --genome GRCh38 \
    --outdir nextflow_cutandrun
  1. Once complete, check the pipeline execution and quality control reports (such as the multiqc_report.html file produced by MultiQC). Each pipeline's documentation describes the outputs to expect.
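As referenced in step 4 above, here is a sketch of a CUT&RUN input samplesheet. The group names and file names are hypothetical; check the nf-core/cutandrun documentation for the exact columns required by your pipeline version:

group,replicate,fastq_1,fastq_2,control
h3k27me3,1,h3k27me3_rep1_R1.fastq.gz,h3k27me3_rep1_R2.fastq.gz,igg
igg,1,igg_rep1_R1.fastq.gz,igg_rep1_R2.fastq.gz,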

Please refer to nf-core documentation for more details (https://nf-co.re/docs/usage/getting_started/introduction).

Run nf-core pipelines on the slurm cluster

Recommendations for running individual nf-core pipelines

  1. nf-core CUTandRun pipeline
  1. nf-core Ribo-seq pipeline
  1. nf-core ATAC-seq pipeline
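A common pattern on the cluster is to submit the Nextflow head job itself as a batch job. A minimal sketch using the cluster installation above; the job name, resources, and time limit are hypothetical values to adjust for your run:

#!/bin/bash
#SBATCH --job-name=nf-core     # hypothetical job name
#SBATCH --cpus-per-task=2      # the head job itself needs few resources
#SBATCH --mem=8G
#SBATCH --time=48:00:00        # adjust to your pipeline's expected runtime

# Launch the head job; with the 'slurm' executor configured (see above),
# Nextflow submits each pipeline task as its own SLURM job.
/nfs/BaRC_Public/apps/nextflow/nextflow run nf-core/cutandrun \
    -profile singularity \
    --input samplesheet.csv \
    --genome GRCh38 \
    --outdir nextflow_cutandrun \
    -resume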