SOPs/ENCODE pipeline – BaRC Wiki

Run ENCODE ATAC-seq Pipeline on the Whitehead Server

If you have human (hg38, hg19) or mouse (mm10, mm9) samples with biological replicates, you can run the ENCODE ATAC-seq Pipeline. The pipeline takes FASTQ files, cleans and maps the reads, filters the aligned reads, and calls peaks. Here is the schema of the workflow. In addition, it performs quality control. Here is a sample QC report. The steps below show you how to run it on our Whitehead server. Note: it only works with Python 2.

Content of the input file sample.json:

{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv",
    "atac.fastqs_rep1_R1" : [
        "/fullpath/sample_rep1_1.fastq.gz"
    ],
    "atac.fastqs_rep1_R2" : [
        "/fullpath/sample_rep1_2.fastq.gz"
    ],
    "atac.fastqs_rep2_R1" : [
        "/fullpath/sample_rep2_1.fastq.gz"
    ],
    "atac.fastqs_rep2_R2" : [
        "/fullpath/sample_rep2_2.fastq.gz"
    ],
    "atac.paired_end" : true,
    "atac.auto_detect_adapter" : true,
    "atac.enable_tss_enrich" : true,
    "atac.title" : "sample",
    "atac.description" : "ATAC-seq mouse sample"
}
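
Because every FASTQ entry must be a full path, it can help to confirm that the listed files exist before starting a long run. The sketch below is one possible pre-flight check, not part of the pipeline: `check_fastqs` is a hypothetical helper name, it only looks at quoted absolute paths ending in `.fastq.gz`, and it does not check URLs.

```shell
# Hypothetical pre-flight check (not part of the ENCODE pipeline):
# report local FASTQ paths in a .json input file that do not exist.
# Assumes each path is a quoted absolute path ending in .fastq.gz.
check_fastqs() {
    # Extract the quoted paths, strip the quotes, then test each one.
    # The explicit subshell lets the loop's result become the exit status.
    grep -o '"/[^"]*\.fastq\.gz"' "$1" | tr -d '"' | (
        missing=0
        while IFS= read -r path; do
            if [ ! -f "$path" ]; then
                echo "missing: $path" >&2
                missing=1
            fi
        done
        exit "$missing"
    )
}
```

Running `check_fastqs sample.json` prints any missing paths to standard error and returns a non-zero status if at least one file is absent.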

Genome files for the supported assemblies (hg19, hg38, mm9, and mm10) are in /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline. Set atac.genome_tsv in the .json file to the path for your genome:
- hg19: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg19/hg19.tsv
- hg38: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg38/hg38.tsv
- mm9: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm9/mm9.tsv
- mm10: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv
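
The four paths above follow one pattern, so the lookup can be sketched as a small shell function. `genome_tsv` is a hypothetical helper name used only for illustration; the base directory and genome names are the ones listed above.

```shell
# Hypothetical helper: print the atac.genome_tsv path for a supported genome.
genome_tsv() {
    case "$1" in
        hg19|hg38|mm9|mm10)
            printf '/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/%s/%s.tsv\n' "$1" "$1"
            ;;
        *)
            # Anything else is not listed as supported on this page.
            echo "unsupported genome: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `genome_tsv hg38` prints the hg38 path, and an unlisted genome name returns a non-zero status.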

To start conda on the Whitehead server:

# Be sure to keep the first dot in the command below:
. /nfs/BaRC_Public/conda/start_barc_conda

Before running the ENCODE pipeline, verify there is no preexisting conda startup code with the command below:
```
conda env list
```
You have no preexisting conda if you get "conda: command not found". Otherwise, log out, log back in, start the new conda instance, and activate encode-atac-seq-pipeline.
Ignore the developer's instructions and use your home directory for conda and the pipeline.
```
conda activate encode-atac-seq-pipeline
```
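
The two checks described above (whether a conda command is already present, and whether the right environment is active) can also be scripted. This is a minimal sketch with hypothetical helper names, `has_command` and `conda_env_active`. It assumes that conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```shell
# Hypothetical helper: succeed if the named command can be found.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed if the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which conda sets on "conda activate".
conda_env_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

# Example guard to run before starting the pipeline.
if ! conda_env_active encode-atac-seq-pipeline; then
    echo "encode-atac-seq-pipeline is not active; run 'conda activate encode-atac-seq-pipeline' first" >&2
fi
```

In a fresh login session, `has_command conda || echo "no preexisting conda"` gives the same answer as reading the output of `conda env list`.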

Run the pipeline. Input files in the .json file can be given as URLs or full paths. Detailed information about the .json file is available in the pipeline documentation.

caper run /nfs/BaRC_Public/atac-seq-pipeline/atac.wdl -i sample.json
# After the job finishes, you can deactivate conda with
conda deactivate

The QC report is at call-qc_report/execution/qc.html.
IDR peak files:
- rep1: call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz
- rep2: call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz
- Note: shard-0 refers to the first biological replicate, shard-1 to the second biological replicate, and so on
- rep1 and rep2: call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz
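
Since the shard number is the replicate number minus one, the per-replicate file names under call-idr_pr can be built rather than typed. The sketch below uses a hypothetical helper, `idr_pr_peak`. It reproduces the two paths listed above, and extending the same naming pattern to a third or later replicate is an assumption based on the "and so on" in the note.

```shell
# Hypothetical helper: print the relative path of the IDR peak file
# for one biological replicate (1, 2, ...), using shard = replicate - 1.
idr_pr_peak() {
    rep=$1
    shard=$((rep - 1))
    printf 'call-idr_pr/shard-%d/execution/rep%d-pr1_vs_rep%d-pr2.idr0.05.bfilt.narrowPeak.gz\n' \
        "$shard" "$rep" "$rep"
}
```

From the directory that contains the call-* folders, `gzip -dc "$(idr_pr_peak 1)" | wc -l` would count the peaks for the first replicate.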
