== Run ENCODE ATAC-seq Pipeline on the Whitehead Server ==

If you have human (hg38, hg19) or mouse (mm10, mm9) samples with biological replicates, you can run the [[https://github.com/ENCODE-DCC/atac-seq-pipeline|ENCODE ATAC-seq Pipeline]]. The pipeline takes fastq files, cleans and maps the reads, filters the aligned reads, and calls peaks. Here is the [[https://www.encodeproject.org/pipelines/ENCPL787FUN/|schema of the workflow]]. In addition, it performs quality control. Here is a [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_ATACseq_2021/qc.html | sample QC report]]. The steps below show you how to run it on our Whitehead server. Note: It only works on python2.

* Content of the input sample.json:
{{{
{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv",
    "atac.fastqs_rep1_R1" : [ "/fullpath/sample_rep1_1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "/fullpath/sample_rep1_2.fastq.gz" ],
    "atac.fastqs_rep2_R1" : [ "/fullpath/sample_rep2_1.fastq.gz" ],
    "atac.fastqs_rep2_R2" : [ "/fullpath/sample_rep2_2.fastq.gz" ],
    "atac.paired_end" : true,
    "atac.auto_detect_adapter" : true,
    "atac.enable_tss_enrich" : true,
    "atac.title" : "sample",
    "atac.description" : "ATAC-seq mouse sample"
}
}}}

* Supported genome files for hg19, hg38, mm9, and mm10 can be found in /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline. The atac.genome_tsv to use in the .json is:
  * hg19: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg19/hg19.tsv
  * hg38: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg38/hg38.tsv
  * mm9: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm9/mm9.tsv
  * mm10: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv

* To initiate conda inside Whitehead:
{{{
# Be sure to keep the first dot in the command below:
. /nfs/BaRC_Public/conda/start_barc_conda
}}}

* Before running the ENCODE pipeline, verify there is no preexisting conda startup code with the command below:
{{{
conda env list
}}}
You have no preexisting conda if you get "conda: command not found". Otherwise, log out, log back in, start the new conda instance, and activate encode-atac-seq-pipeline.

* Ignore the developer's instructions and use your home directory for conda and the pipeline.
{{{
conda activate encode-atac-seq-pipeline
}}}

* Run the pipeline. Input files can be given as URLs or full paths. [[https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md | Detailed information about the .json file]]
{{{
caper run /nfs/BaRC_Public/atac-seq-pipeline/atac.wdl -i sample.json

# After the job finishes, you can deactivate conda with
conda deactivate
}}}

* The QC report is call-qc_report/execution/qc.html
* IDR peak files:
  * rep1: call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz
  * rep2: call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz
    * Note: shard-0 refers to the first biological replicate, shard-1 refers to the second biological replicate, and so on.
  * rep1 and rep2: call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz
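
* As a quick sanity check on the results, you can peek at the reproducible peaks and count how many passed the IDR threshold. The commands below are a minimal sketch assuming your output directory layout matches the illustrative paths listed above (adjust the shard and file names to your own run); narrowPeak files are plain BED-like text once decompressed, so standard command-line tools work on them.
{{{
# Look at the first few reproducible peaks
# (narrowPeak columns: chrom, start, end, name, score, strand,
#  signalValue, -log10(p-value), -log10(q-value), summit offset)
zcat call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz | head

# Count how many peaks passed the IDR 0.05 cutoff in each set
zcat call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz | wc -l
zcat call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz | wc -l
zcat call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz | wc -l
}}}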
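
* If you want a rough measure of agreement between the two replicates, bedtools can report how many peaks in one self-pseudoreplicate set overlap the other. This is a sketch and not part of the pipeline itself; it assumes bedtools is available in your PATH and uses the same illustrative file paths as above.
{{{
# Number of rep1 pseudoreplicate peaks that overlap at least one rep2 pseudoreplicate peak
bedtools intersect -u \
    -a <(zcat call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz) \
    -b <(zcat call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz) \
    | wc -l
}}}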