 * If you have human (hg38, hg19) or mouse (mm10, mm9) samples with biological replicates, run the [[https://github.com/ENCODE-DCC/atac-seq-pipeline|ENCODE ATAC-seq Pipeline]]. The pipeline takes fastq files, trims and maps the reads, filters the aligned reads, and calls peaks. Here is the [[https://www.encodeproject.org/pipelines/ENCPL787FUN/|schema of the workflow]]. It also performs quality control; here is a [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_ATACseq_2021/qc.html | sample QC report]]. The steps below show how to run it on our Whitehead server. Note: it only works with Python 2.
 * Contents of the input sample.json (a quick syntax check is sketched after the block):
{{{
{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv",
    "atac.fastqs_rep1_R1" : [
        "/fullpath/sample_rep1_1.fastq.gz"
    ],
    "atac.fastqs_rep1_R2" : [
        "/fullpath/sample_rep1_2.fastq.gz"
    ],
    "atac.fastqs_rep2_R1" : [
        "/fullpath/sample_rep2_1.fastq.gz"
    ],
    "atac.fastqs_rep2_R2" : [
        "/fullpath/sample_rep2_2.fastq.gz"
    ],
    "atac.paired_end" : true,
    "atac.auto_detect_adapter" : true,
    "atac.enable_tss_enrich" : true,
    "atac.title" : "sample",
    "atac.description" : "ATAC-seq mouse sample"
}
}}}
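 * Optional sanity check (a suggestion, not part of the pipeline documentation): JSON is strict about commas and quotes, so validating sample.json before launching can save a failed run. Python's built-in json.tool prints the parsed file on success and points at the offending line on failure:
{{{
# Validate the JSON syntax of sample.json
python -m json.tool sample.json
}}}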
 * Supported genome files for hg19, hg38, mm9, and mm10 are in /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline; the atac.genome_tsv value to use in the .json is one of the following (a quick existence check is sketched after this list):
  * hg19: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg19/hg19.tsv
  * hg38: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg38/hg38.tsv
  * mm9: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm9/mm9.tsv
  * mm10: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv
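 * A quick check (illustrative; assumes the NFS share is mounted on your node) that the genome bundle you plan to use is readable, using mm10 as the example:
{{{
# Confirm the .tsv exists and peek at the reference files it points to
ls -lh /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv
head /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv
}}}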

 * To initialize conda on the Whitehead servers:
{{{
# Be sure to keep the first dot in the command below:
. /nfs/BaRC_Public/conda/start_barc_conda
}}}
 * Before running the ENCODE pipeline, check for preexisting conda startup code with the command below:
{{{
conda env list
}}}
If you get "conda: command not found", you have no preexisting conda. Otherwise, log out, log back in, start the BaRC conda instance as above, and then activate encode-atac-seq-pipeline.
 * Ignore the developer's instructions to install conda and the pipeline in your home directory; the shared installations under /nfs/BaRC_Public are used instead (an optional check is sketched after the block below).
{{{
conda activate encode-atac-seq-pipeline
}}}
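 * Optional check (illustrative) that the environment activated cleanly; the active environment is flagged with an asterisk, and if caper is installed inside the environment it should now be on your PATH:
{{{
# The active environment is marked with '*'
conda env list
# caper is the workflow runner used in the next step
which caper
}}}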
 * Run the pipeline (a background-run option is sketched after the block below). Fastq files in the .json can be given as URLs or full paths. [[https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md | Detailed information about the .json file]]
{{{
caper run /nfs/BaRC_Public/atac-seq-pipeline/atac.wdl -i sample.json
# After the job finishes, you can deactivate conda with
conda deactivate
}}}
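 * The pipeline can take many hours. One option (a suggestion only; caper_run.log is an arbitrary name) is to launch it with nohup, or inside screen/tmux, so the run survives a closed terminal:
{{{
# Run in the background and keep a log of the run
nohup caper run /nfs/BaRC_Public/atac-seq-pipeline/atac.wdl -i sample.json > caper_run.log 2>&1 &
# Follow progress
tail -f caper_run.log
}}}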
 * The QC report is at call-qc_report/execution/qc.html
 * IDR peak files (a quick inspection is sketched after this list):
  * rep1: call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz
  * rep2: call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz
  * Note: shard-0 refers to the first biological replicate, shard-1 refers to the second biological replicate, and so on
  * rep1 and rep2: call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz
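 * A minimal sketch for inspecting the IDR output (narrowPeak files are gzipped, BED-like text; paths shortened to the file name here):
{{{
# Peek at the first peaks and count how many passed the IDR 0.05 / blacklist filter
zcat rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz | head
zcat rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz | wc -l
}}}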
Follow this for species other than human/mouse, or if there are no replicates: