Changes between Version 48 and Version 49 of SOPs/atac_Seq


Ignore:
Timestamp:
06/02/21 11:53:23 (4 years ago)
Author:
byuan
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/atac_Seq

    v48 v49  
    130130MACS v2 is applicable for ATAC-Seq using the appropriate options/parameters.
    131131
    132 MACS2 [[ https://github.com/alexyfyf/atac_nf/blob/7f996b7de0e349c5a10dbbd75b2c266339517a3b/atac.nf#L341 | commands used in ATAC-seq review paper ]]
     132  * ENCODE pipeline macs2 code: pair-end tags as macs2 input. It works for hg19, hg38, mm9 and mm10.
     133{{{
     134
     135   bedtools bamtobed -bedpe -mate1 -i $name_sorted.bam | gzip -nc > foo.bedpe.gz
     136
     137   # Create final TA file
     138   zcat foo.bedpe.gz | awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}' | gzip -nc > tagAlign.gz
     139
     140
     141  # Call peaks: ENCODE uses very small p-value, so it could get enough peaks for IDR
     142  # If you don't do IDR, you need to make p-value more stringent, such as with q 0.01
     143  macs2 callpeak -t tagAlign.gz -n foo.narrow -f BED -g ${species} -p 0.01 --nomodel --shift -75 --extsize 150 --call-summits --keep-dup all -B --SPMR --call-summits
     144
     145}}}
     146
     147
     148  * Using pair-end bed as macs2 input. [[ https://twitter.com/XiChenUoM/status/1336658454866325506 | It considers ends of both mates, focus on cutting/insertion sites enrichment in ATAC-seq ]]. Codes implemented in the ATAC-seq review paper ( https://github.com/alexyfyf/atac_nf/blob/7f996b7de0e349c5a10dbbd75b2c266339517a3b/atac.nf#L341 )
    133149 [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1929-3 | From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis]]
    134150{{{
     
    139155}}}
    140156
    141 ENCODE pipeline macs2 code:
    142 {{{
    143 
    144   # Create virtual SE file containing both read pairs
    145   bedtools bamtobed -i filtered.bam | awk 'BEGIN{OFS="\t"}{$4="N";$5="1000";print $0}'| gzip -c > tagAlign.gz
    146 
    147   # Call peaks: ENCODE uses very small p-value, so it could get enough peaks for IDR
    148   macs2 callpeak -t tagAlign.gz -n foo.narrow -f BED -g ${species} -p 0.01 --nomodel --shift -75 --extsize 150 --call-summits --keep-dup all -B --SPMR --call-summits
    149 
    150 }}}
    151 
    152 
    153 If using the 'BAMPE' option with paired-end reads, let MACS run the pileup and calculate 'extsize';
    154 #note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1.  Removing duplicates is recommended.
    155 {{{
    156 
    157 macs2 callpeak -f BAMPE -t file.sorted.bam  --keep-dup all -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
    158 
    159 
    160 
    161 #BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual,
    162 "If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup."
    163 
    164 }}}
    165 
    166 Additional notes on calling peaks using MACS: ''using MACS default options without BAMPE will not call the correct peaks''.  Alternative options/parameters in calling peaks using MACS, if BAMPE is not used are, --nomodel --shift s --extsize 2s, this can be used for single-end reads as well, e.g. --shift 100 --extsize 200 is suitable for ATAC-Seq.   
    167     * MACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size -  works for finding accessible regions within cut sites.  The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable.
     157   * Run macs2 with BAMPE option: If using the 'BAMPE' option with paired-end reads, let MACS run the pileup and calculate 'extsize'.
     158       * #BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual,
     159       * "If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup."
     160
     161{{{
     162
     163    #note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1.  Removing duplicates is recommended.
     164    macs2 callpeak -f BAMPE -t file.sorted.bam  --keep-dup all -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
     165}}}
     166
     167   * Run macs 2 using --nomodel --shift s --extsize 2s. This can be used for single-end reads. For PE reads, [[ https://github.com/macs3-project/MACS/issues/145 | it ignores the the right mate, and not recommended ]]. [[ https://github.com/macs3-project/MACS/issues/145 | In case where short fragment population is extremely dominant, the final output won't be off much as compared with BAMPE ]]
     168{{{
     169
     170   macs2 callpeak -t $foo.bam --broad -B -q 0.01 -g hs -n ${prefix} --nomodel --shift -75 --extsize 150
     171}}}
     172 
     173    * Which macs optMACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size -  works for finding accessible regions within cut sites.  The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable.
    168174      * Shifting reads, pos. strand +4 and neg strand -5 (see recommendations below) may be needed as well to find //exact// cut sites.
    169     * another approach is to convert the bam file to bed (using bedtools), and use the options, -f BED --shift 100 --extsize 200
    170 
    171 [[https://github.com/jsh58/Genrich | Genrich]] is another piece of software for peak-calling. It has the advantages of (a) running all of the post-alignment steps through peak-calling with one command, and (b) can process multiple replicates. Detailed information can be found in [[https://informatics.fas.harvard.edu/atac-seq-guidelines.html|Harvard ATAC-seq Guidelines]]
     175
     176
     177   * [[https://github.com/jsh58/Genrich | Genrich]] is another piece of software for peak-calling. It has the advantages of (a) running all of the post-alignment steps through peak-calling with one command, and (b) can process multiple replicates. Detailed information can be found in [[https://informatics.fas.harvard.edu/atac-seq-guidelines.html|Harvard ATAC-seq Guidelines]]
    172178
    173179{{{