Changes between Version 109 and Version 110 of SOPs/atac_Seq


Timestamp:
04/03/24 14:48:08 (10 months ago)
Author:
byuan
Comment:

--

  • SOPs/atac_Seq

    v109 v110  
    120120MACS v2 is applicable for ATAC-Seq using the appropriate options/parameters.
    121121
    122   * If you have human (hg38, hg19) or mouse (mm10, mm9) samples with biological replicates, you can run the [[https://github.com/ENCODE-DCC/atac-seq-pipeline|ENCODE ATAC-seq Pipeline]]. The pipeline takes fastq files, cleans and maps the reads, filters the aligned reads, and calls peaks. Here is the [[https://www.encodeproject.org/pipelines/ENCPL787FUN/|schema of the workflow]]. It also runs quality control; here is a [[http://barc.wi.mit.edu/education/hot_topics/ChIPseq_ATACseq_2021/qc.html | sample QC report]]. The steps below show how to run it on our Whitehead server. Note: It only works with python2.
    123       * content in input sample.json:
    124 {{{
    125 {
    126     "atac.pipeline_type" : "atac",
    127     "atac.genome_tsv" : "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv",
    128     "atac.fastqs_rep1_R1" : [
    129         "/fullpath/sample_rep1_1.fastq.gz"
    130     ],
    131     "atac.fastqs_rep1_R2" : [
    132         "/fullpath/sample_rep1_2.fastq.gz"
    133     ],
    134     "atac.fastqs_rep2_R1" : [
    135         "/fullpath/sample_rep2_1.fastq.gz"
    136     ],
    137     "atac.fastqs_rep2_R2" : [
    138         "/fullpath/sample_rep2_2.fastq.gz"
    139     ],
    140     "atac.paired_end" : true,
    141     "atac.auto_detect_adapter" : true,
    142     "atac.enable_tss_enrich" : true,
    143     "atac.title" : "sample",
    144     "atac.description" : "ATAC-seq mouse sample"
    145 }
    146 }}}
    147       * Supported genome files for hg19, hg38, mm9 and mm10 can be found in /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline. The atac.genome_tsv value to use in the .json file is:
    148           * hg19: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg19/hg19.tsv
    149           * hg38: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/hg38/hg38.tsv
    150           * mm9: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm9/mm9.tsv
    151           * mm10: /nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline/mm10/mm10.tsv
    152 
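The sample.json above can also be generated rather than edited by hand. The following is a minimal sketch, not part of the pipeline: the helper name `build_atac_input` and its arguments are hypothetical, and it assumes paired-end data with one R1/R2 fastq pair per biological replicate. The keys and the genome path layout are copied from the example and the list above.

```python
import json

# Directory layout copied from the genome list above.
GENOME_ROOT = "/nfs/BaRC_datasets/ENCODE_ATAC-seq_Pipeline"
SUPPORTED_GENOMES = ("hg19", "hg38", "mm9", "mm10")


def build_atac_input(genome, replicates, title, description):
    """Return the input dict for paired-end samples (hypothetical helper).

    replicates: list of (R1_path, R2_path) tuples, one per biological replicate.
    """
    if genome not in SUPPORTED_GENOMES:
        raise ValueError("unsupported genome: %s" % genome)
    config = {
        "atac.pipeline_type": "atac",
        "atac.genome_tsv": "%s/%s/%s.tsv" % (GENOME_ROOT, genome, genome),
        "atac.paired_end": True,
        "atac.auto_detect_adapter": True,
        "atac.enable_tss_enrich": True,
        "atac.title": title,
        "atac.description": description,
    }
    # Replicates are numbered from 1 in the key names (rep1, rep2, ...).
    for i, (r1, r2) in enumerate(replicates, start=1):
        config["atac.fastqs_rep%d_R1" % i] = [r1]
        config["atac.fastqs_rep%d_R2" % i] = [r2]
    return config


example = build_atac_input(
    "mm10",
    [("/fullpath/sample_rep1_1.fastq.gz", "/fullpath/sample_rep1_2.fastq.gz"),
     ("/fullpath/sample_rep2_1.fastq.gz", "/fullpath/sample_rep2_2.fastq.gz")],
    title="sample",
    description="ATAC-seq mouse sample",
)
print(json.dumps(example, indent=4))
```

Redirecting the printed output to sample.json should give a file equivalent to the hand-written example; check it against the pipeline's input documentation before running.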
    153       * To initiate conda inside Whitehead:
    154 {{{
    155 # Be sure to keep the first dot in the command below:
    156 . /nfs/BaRC_Public/conda/start_barc_conda
    157 }}}
    158       * Before running the ENCODE pipeline, verify that there is no preexisting conda startup code with the command below:
    159 {{{
    160 conda env list
    161 }}}
    162        You have no preexisting conda if you get "conda: command not found". Otherwise, log out, log back in, start the new conda instance, and activate encode-atac-seq-pipeline.
    163       * Ignore the developer's instructions and use your home directory for conda and the pipeline.
    164 {{{
    165 conda activate encode-atac-seq-pipeline
    166 }}}
    167       * Run the pipeline. Input files can be given as URLs or full paths. [[https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md | Detailed information about the .json file]]
    168 {{{
    169 caper run /nfs/BaRC_Public/atac-seq-pipeline/atac.wdl -i sample.json
    170 # After the job finishes, you can deactivate conda with
    171 conda deactivate
    172 }}}
    173       * The QC report is written to call-qc_report/execution/qc.html
    174       * IDR peak files:
    175            * rep1: call-idr_pr/shard-0/execution/rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.narrowPeak.gz
    176            * rep2: call-idr_pr/shard-1/execution/rep2-pr1_vs_rep2-pr2.idr0.05.bfilt.narrowPeak.gz
    177            * Note: shard-0 refers to the first biological replicate, shard-1 refers to the 2nd biological replicate, and so on
    178            * rep1 and rep2: call-idr/shard-1/execution/rep1_vs_rep2.idr0.05.bfilt.narrowPeak.gz
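The shard numbering above can be sketched as a small helper that builds the per-replicate IDR file path. This is only an illustration: the function name `idr_pr_peaks` and the `run_dir` argument are hypothetical, and it assumes the paths listed above are relative to the output directory of a single run, which the text does not state.

```python
import posixpath

# Relative path of the QC report, as listed above.
QC_REPORT = "call-qc_report/execution/qc.html"


def idr_pr_peaks(rep, run_dir=""):
    """Path of the IDR peaks for one biological replicate (hypothetical helper).

    rep is 1-based; the shard index is 0-based, so rep1 lives in shard-0.
    run_dir is an assumed output directory for the run.
    """
    if rep < 1:
        raise ValueError("replicate numbers start at 1")
    name = "rep%d-pr1_vs_rep%d-pr2.idr0.05.bfilt.narrowPeak.gz" % (rep, rep)
    return posixpath.join(run_dir, "call-idr_pr", "shard-%d" % (rep - 1),
                          "execution", name)


for rep in (1, 2):
    print(idr_pr_peaks(rep))
```

For rep 1 and rep 2 this reproduces the two per-replicate paths listed above.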
    179 Follow the steps below for species other than human/mouse, or if there are no replicates:
    180122     * Run macs2 using paired-end BED as input and the options "--shift -75 --extsize 150". With those settings you create a profile of reads around the cutting sites (one at each end of the fragment/paired read), which results in peaks centered on the cutting sites (open chromatin). This is an important difference from ChIP-seq analysis. In ChIP-seq the binding event tends to be in the middle of the fragment; in ATAC-seq the chromatin was open where the cutting occurred, and that is the end of the fragment. [[https://twitter.com/XiChenUoM/status/1336658454866325506|cutting/insertion sites enrichment in ATAC-seq]].
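The effect of "--shift -75 --extsize 150" can be sketched with a little arithmetic. This is a simplified model under stated assumptions, not MACS2 code: it assumes the 5' end of each read is first moved by the shift (a negative value moves it upstream of the read) and then extended by extsize in the 5'->3' direction, and it ignores MACS2's exact coordinate conventions. The function name `macs2_window` is hypothetical.

```python
def macs2_window(five_prime, strand, shift=-75, extsize=150):
    """Approximate (start, end) covered by one read end (hypothetical helper).

    five_prime: coordinate of the read's 5' end, i.e. the cutting site.
    """
    if strand == "+":
        # Upstream is toward lower coordinates; extend toward higher ones.
        start = five_prime + shift
        return (start, start + extsize)
    if strand == "-":
        # Upstream is toward higher coordinates; extend toward lower ones.
        end = five_prime - shift
        return (end - extsize, end)
    raise ValueError("strand must be '+' or '-'")


print(macs2_window(1000, "+"))
print(macs2_window(1000, "-"))
```

Under this model, a cut at position 1000 gives a 150 bp window spanning 925 to 1075 on either strand, which is why the pileup is centered on the cutting site and not on the middle of the fragment.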
    181123{{{