
Version 2 (modified by xinlei.gao, 10 days ago)

--

What is Nextflow?

Nextflow is a workflow system for creating scalable, portable, and reproducible workflows. It lets you run an analysis pipeline on large datasets in a streamlined, parallel manner. Nextflow can deploy workflows on a variety of execution platforms, including your local machine, HPC schedulers, and cloud platforms. It also supports a range of compute environments, software container runtimes, and package managers, so workflows can be executed in reproducible, isolated environments.

Why use Nextflow?

The rise of big data has made it increasingly necessary to analyze large datasets in a portable and reproducible manner. Nextflow has several features that help with implementing reproducible and efficient pipelines.

  1. Reproducibility

Nextflow supports Docker and Singularity container technologies.

This, along with its integration with the GitHub code-sharing platform, allows you to write self-contained pipelines, manage versions, and rapidly reproduce any earlier configuration.

  2. Continuous checkpoints

All intermediate results produced during pipeline execution are tracked automatically.

This allows you to resume a pipeline from the last successfully executed step, regardless of why it stopped.

  3. Portability

A Nextflow pipeline can be executed on multiple platforms without changing its code.

Nextflow supports various executors, including batch schedulers such as SLURM, LSF, and PBS, and cloud platforms such as Kubernetes, Amazon AWS, Google Cloud, and Microsoft Azure.
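
The three features above correspond to options of the run command. The following is a minimal sketch, not a recommended setup: the nf helper, the DRY_RUN switch, the file name slurm.config, and the revision name are illustrative assumptions, while -r, -resume, and -c are standard Nextflow options. By default the helper only prints each command; set DRY_RUN=0 to actually execute it.

```shell
#!/usr/bin/env bash
# Sketch: how the three features map onto `nextflow run` options.
PIPELINE="nextflow-io/hello"      # example project used elsewhere on this page
REVISION="master"                 # assumption: any branch, tag, or commit ID
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = run them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
nf() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "nextflow $*"
    else
        nextflow "$@"
    fi
}

# 1. Reproducibility: pin the pipeline to a specific revision.
nf run "$PIPELINE" -r "$REVISION"

# 2. Continuous checkpoints: reuse cached results of tasks that already finished.
nf run "$PIPELINE" -resume

# 3. Portability: switch the executor in a config file, not in the pipeline code.
#    (slurm.config is a hypothetical file name.)
cat > slurm.config <<'EOF'
process.executor = 'slurm'
EOF
nf run "$PIPELINE" -c slurm.config
```

Because the executor is set in a separate config file, the same pipeline code can run locally or on the cluster without modification.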

Installation of Nextflow

Nextflow can be used on Linux, macOS, and Windows (through WSL). It requires Bash 3.2 (or later) and Java 17 (or later, up to 23) to be installed. For installation instructions, see https://www.nextflow.io/docs/latest/install.html
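
Before installing, you can check the two prerequisites from a shell. This is a rough sketch under the version limits stated above; the function names bash_ok and java_major are illustrative, and the parsing assumes the usual java -version output format with the version number in double quotes.

```shell
#!/usr/bin/env bash
# Sketch: check the Bash and Java prerequisites for Nextflow.

# Succeeds if the running Bash is version 3.2 or later.
bash_ok() {
    local major="${BASH_VERSINFO[0]}" minor="${BASH_VERSINFO[1]}"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
}

# Prints the major Java version; fails if java is not on PATH.
java_major() {
    command -v java >/dev/null 2>&1 || return 1
    # `java -version` writes to stderr, e.g.: openjdk version "17.0.9" ...
    java -version 2>&1 |
        awk -F'"' '/version/ { split($2, v, "."); print v[1] + 0; exit }'
}

if bash_ok; then
    echo "Bash ${BASH_VERSION}: OK"
else
    echo "Bash ${BASH_VERSION}: too old, 3.2 or later is required"
fi

if major="$(java_major)" && [ -n "$major" ]; then
    if [ "$major" -ge 17 ] && [ "$major" -le 23 ]; then
        echo "Java ${major}: OK"
    else
        echo "Java ${major}: outside the supported range 17 to 23"
    fi
else
    echo "Java: not found on PATH"
fi
```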

The Nextflow command-line tool is installed on the WI Slurm cluster at /nfs/BaRC_Public/apps/nextflow/nextflow

The currently installed version is 24.04.4.5917.
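
To avoid typing the full path each time, you can add the installation directory to your PATH. A minimal sketch, assuming you are logged in to the cluster; the variable name NEXTFLOW_DIR is illustrative, and the check simply reports when the path is not visible from the current machine.

```shell
#!/usr/bin/env bash
# Sketch: put the cluster-wide Nextflow on PATH and confirm the version.
NEXTFLOW_DIR="/nfs/BaRC_Public/apps/nextflow"

if [ -x "$NEXTFLOW_DIR/nextflow" ]; then
    export PATH="$NEXTFLOW_DIR:$PATH"
    nextflow -version      # prints the installed version and build
else
    echo "nextflow not found in $NEXTFLOW_DIR (are you on the cluster?)" >&2
fi
```

Adding the export line to your ~/.bashrc makes the setting persist across sessions.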

The main purpose of the Nextflow CLI is to run Nextflow pipelines with the run command. Nextflow can execute a local script (e.g. ./main.nf) or a remote project (e.g. github.com/foo/bar).

To launch a pipeline project hosted in a remote code repository, specify its qualified name or the repository URL after the run command. The qualified name consists of two parts, the owner name and the repository name, separated by a / character.

For example, if a Nextflow project is hosted in a GitHub repository at http://github.com/foo/bar, you can run it by entering the following command in your shell terminal:

nextflow run foo/bar

or using the project URL:

nextflow run http://github.com/foo/bar

If the project is found, it will be automatically downloaded to the Nextflow home directory ($HOME/.nextflow by default) and cached for subsequent runs.
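
The cached copy can be inspected and updated with the pull, list, and info subcommands. A short sketch, assuming nextflow is on your PATH and using the nextflow-io/hello example project; the variable names are illustrative. Downloaded projects are stored in the assets subdirectory of the Nextflow home directory.

```shell
#!/usr/bin/env bash
# Sketch: inspect the local cache of downloaded pipeline projects.
PROJECT="nextflow-io/hello"
# NXF_HOME overrides the default Nextflow home directory ($HOME/.nextflow).
ASSETS_DIR="${NXF_HOME:-$HOME/.nextflow}/assets"

if command -v nextflow >/dev/null 2>&1; then
    nextflow pull "$PROJECT"     # download the project or update the cached copy
    nextflow list                # list all projects downloaded so far
    nextflow info "$PROJECT"     # show the local path and available revisions
    ls "$ASSETS_DIR/$PROJECT"    # the cached files themselves
else
    echo "nextflow is not on PATH; cached projects would be in $ASSETS_DIR" >&2
fi
```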

Try this simple example by running the following command:

nextflow run nextflow-io/hello

This command downloads the basic 'Hello World!' example for the Nextflow framework from the repository at http://github.com/nextflow-io/hello and executes it on your computer.

What is nf-core?

Guidelines for running nf-core pipelines

Running nf-core pipelines on the Slurm cluster

Recommendations for running individual nf-core pipelines

  1. nf-core CUTandRun pipeline
  2. nf-core Ribo-seq pipeline
  3. nf-core ATAC-seq pipeline