 * If you have human (hg38, hg19) or mouse (mm10, mm9) samples with biological replicates, you can run the [[https://github.com/ENCODE-DCC/atac-seq-pipeline|ENCODE ATAC-seq Pipeline]]. The pipeline takes fastq files, cleans and maps the reads, filters the aligned reads, and calls peaks. Here is the [[https://www.encodeproject.org/pipelines/ENCPL787FUN/|schema of the workflow]]. It also runs quality controls; here is a [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_ATACseq_2021/qc.html|sample QC report]]. The steps below show how to run it on our Whitehead server. Note: it only works with python2.
 * Contents of the input file sample.json:
{{{
{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv",
    "atac.fastqs_rep1_R1" : [
        "/fullpath/sample_rep1_1.fastq.gz"
    ],
    "atac.fastqs_rep1_R2" : [
        "/fullpath/sample_rep1_2.fastq.gz"
    ],
    "atac.fastqs_rep2_R1" : [
        "/fullpath/sample_rep2_1.fastq.gz"
    ],
    "atac.fastqs_rep2_R2" : [
        "/fullpath/sample_rep2_2.fastq.gz"
    ],
    "atac.paired_end" : true,
    "atac.auto_detect_adapter" : true,
    "atac.enable_tss_enrich" : true,
    "atac.title" : "sample",
    "atac.description" : "ATAC-seq mouse sample"
}
}}}
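Because sample.json is usually edited by hand, it can help to confirm that it still parses as JSON before starting a run. The sketch below is one way to do that. The function name `check_json` is illustrative (it is not part of the pipeline), and it assumes `python3` is on your PATH; `python -m json.tool` behaves the same way under python2.

```shell
# check_json FILE: succeed only if FILE is syntactically valid JSON.
# Hypothetical helper, not part of the ENCODE pipeline.
# Assumes python3 is available; json.tool ships with the standard library.
check_json() {
    if python3 -m json.tool "$1" > /dev/null 2>&1; then
        echo "$1: valid JSON"
    else
        echo "$1: NOT valid JSON" >&2
        return 1
    fi
}

# Usage: check_json sample.json
```

This only checks syntax (commas, quotes, brackets); it does not check that the fastq paths exist or that the keys are ones the pipeline recognizes.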
 * Supported genome files for hg19, hg38, mm9 and mm10 can be found in /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline. Set atac.genome_tsv in the .json file to the path for your genome:
  * hg19: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg19/hg19.tsv
  * hg38: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg38/hg38.tsv
  * mm9: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm9/mm9.tsv
  * mm10: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv

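The four genome paths above follow one pattern, so the right value for atac.genome_tsv can be derived from the genome name. A minimal sketch, where `genome_tsv` is a hypothetical helper name and only the four genomes listed above are accepted:

```shell
# genome_tsv GENOME: print the atac.genome_tsv path for a supported genome.
# Hypothetical helper; the paths are the four listed above.
genome_tsv() {
    case "$1" in
        hg19|hg38|mm9|mm10)
            echo "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/$1/$1.tsv"
            ;;
        *)
            echo "unsupported genome: $1" >&2
            return 1
            ;;
    esac
}

# Print the path to paste into sample.json for a human hg38 sample.
genome_tsv hg38
```

Rejecting anything outside the four supported names avoids building a path to a genome directory that does not exist.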
 * To initialize conda on the Whitehead server:
{{{
# Be sure to keep the first dot in the command below:
. /nfs/BaRC_Public/conda/start_barc_conda
}}}
 * Before running the ENCODE pipeline, verify that there is no preexisting conda startup code with the command below:
{{{
conda env list
}}}
 If you get "conda: command not found", you have no preexisting conda. Otherwise, log out, log back in, start the new conda instance, and activate encode-atac-seq-pipeline.
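The same check can be scripted so that it prints an explicit message instead of relying on the shell's error text. This is a sketch only; `conda_status` is an illustrative name, and `command -v` is the standard POSIX way to ask whether a command is visible in the current shell.

```shell
# conda_status: report whether a conda command is already visible in this shell.
# Hypothetical helper for illustration.
conda_status() {
    if command -v conda > /dev/null 2>&1; then
        echo "preexisting conda found: $(command -v conda)"
    else
        echo "no preexisting conda"
    fi
}

conda_status
```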
 * Ignore the developer's instructions and use your home directory for conda and the pipeline.
{{{
conda activate encode-atac-seq-pipeline
}}}
 * Run the pipeline. Input files can be given as URLs or full paths. See the [[https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md|detailed information about the .json file]].
{{{
caper run /nfs/BaRC_Public/atac-seq-pipeline/atac.wdl -i sample.json
# After the job finishes, you can deactivate conda with
conda deactivate
}}}
 * The QC report is at call-qc_report/execution/qc.html
 * IDR peak files:
  * rep1: call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz
  * rep2: call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz
  * Note: shard-0 refers to the first biological replicate, shard-1 to the second, and so on.
  * rep1 and rep2: call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz
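The output paths above are relative to the workflow's output directory, whose exact location depends on how the run was set up. The sketch below therefore searches recursively from a starting directory rather than assuming a fixed layout. `find_atac_outputs` is a hypothetical helper name; the patterns are taken from the file names listed above.

```shell
# find_atac_outputs DIR: list the QC report and IDR peak files found below DIR.
# Hypothetical helper; DIR defaults to the current directory.
# In find's -path patterns, "*" also matches "/", so call-idr* covers both
# call-idr/shard-N/execution and call-idr_pr/shard-N/execution.
find_atac_outputs() {
    find "${1:-.}" -type f \( \
        -path '*/call-qc_report/execution/qc.html' -o \
        -path '*/call-idr*/execution/*.bfilt.narrowPeak.gz' \
    \) | sort
}

# Usage: run from the directory where caper was started.
# find_atac_outputs .
```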
Follow this for species other than human/mouse, or if there are no replicates