132 | | MACS2 [[ https://github.com/alexyfyf/atac_nf/blob/7f996b7de0e349c5a10dbbd75b2c266339517a3b/atac.nf#L341 | commands used in ATAC-seq review paper ]] |
| 132 | * ENCODE pipeline macs2 code: pair-end tags as macs2 input. It works for hg19, hg38, mm9 and mm10. |
| 133 | {{{ |
| 134 | |
| 135 | bedtools bamtobed -bedpe -mate1 -i $name_sorted.bam | gzip -nc > foo.bedpe.gz |
| 136 | |
| 137 | # Create final TA file |
| 138 | zcat foo.bedpe.gz | awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\tN\t1000\t%s\n%s\t%s\t%s\tN\t1000\t%s\n",$1,$2,$3,$9,$4,$5,$6,$10}' | gzip -nc > tagAlign.gz |
| 139 | |
| 140 | |
| 141 | # Call peaks: ENCODE uses very small p-value, so it could get enough peaks for IDR |
| 142 | # If you don't do IDR, you need to make p-value more stringent, such as with q 0.01 |
| 143 | macs2 callpeak -t tagAlign.gz -n foo.narrow -f BED -g ${species} -p 0.01 --nomodel --shift -75 --extsize 150 --call-summits --keep-dup all -B --SPMR --call-summits |
| 144 | |
| 145 | }}} |
| 146 | |
| 147 | |
| 148 | * Using pair-end bed as macs2 input. [[ https://twitter.com/XiChenUoM/status/1336658454866325506 | It considers ends of both mates, focus on cutting/insertion sites enrichment in ATAC-seq ]]. Codes implemented in the ATAC-seq review paper ( https://github.com/alexyfyf/atac_nf/blob/7f996b7de0e349c5a10dbbd75b2c266339517a3b/atac.nf#L341 ) |
141 | | ENCODE pipeline macs2 code: |
142 | | {{{ |
143 | | |
144 | | # Create virtual SE file containing both read pairs |
145 | | bedtools bamtobed -i filtered.bam | awk 'BEGIN{OFS="\t"}{$4="N";$5="1000";print $0}'| gzip -c > tagAlign.gz |
146 | | |
147 | | # Call peaks: ENCODE uses very small p-value, so it could get enough peaks for IDR |
148 | | macs2 callpeak -t tagAlign.gz -n foo.narrow -f BED -g ${species} -p 0.01 --nomodel --shift -75 --extsize 150 --call-summits --keep-dup all -B --SPMR --call-summits |
149 | | |
150 | | }}} |
151 | | |
152 | | |
153 | | If using the 'BAMPE' option with paired-end reads, let MACS run the pileup and calculate 'extsize'; |
154 | | #note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1. Removing duplicates is recommended. |
155 | | {{{ |
156 | | |
157 | | macs2 callpeak -f BAMPE -t file.sorted.bam --keep-dup all -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks |
158 | | |
159 | | |
160 | | |
161 | | #BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual, |
162 | | "If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup." |
163 | | |
164 | | }}} |
165 | | |
166 | | Additional notes on calling peaks using MACS: ''using MACS default options without BAMPE will not call the correct peaks''. Alternative options/parameters in calling peaks using MACS, if BAMPE is not used are, --nomodel --shift s --extsize 2s, this can be used for single-end reads as well, e.g. --shift 100 --extsize 200 is suitable for ATAC-Seq. |
167 | | * MACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size - works for finding accessible regions within cut sites. The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable. |
| 157 | * Run macs2 with BAMPE option: If using the 'BAMPE' option with paired-end reads, let MACS run the pileup and calculate 'extsize'. |
| 158 | * #BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual, |
| 159 | * "If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup." |
| 160 | |
| 161 | {{{ |
| 162 | |
| 163 | #note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1. Removing duplicates is recommended. |
| 164 | macs2 callpeak -f BAMPE -t file.sorted.bam --keep-dup all -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks |
| 165 | }}} |
| 166 | |
| 167 | * Run macs 2 using --nomodel --shift s --extsize 2s. This can be used for single-end reads. For PE reads, [[ https://github.com/macs3-project/MACS/issues/145 | it ignores the the right mate, and not recommended ]]. [[ https://github.com/macs3-project/MACS/issues/145 | In case where short fragment population is extremely dominant, the final output won't be off much as compared with BAMPE ]] |
| 168 | {{{ |
| 169 | |
| 170 | macs2 callpeak -t $foo.bam --broad -B -q 0.01 -g hs -n ${prefix} --nomodel --shift -75 --extsize 150 |
| 171 | }}} |
| 172 | |
| 173 | * Which macs optMACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size - works for finding accessible regions within cut sites. The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable. |