| | 17 | === ATAC-Seq Analysis Details === |
| | 18 | |
| | 19 | * Paired-end reads are better and recommended over single reads for ATAC-seq. The original [[https://www.nature.com/articles/nmeth.2688 | ATAC-Seq method]] from the Greenleaf lab recommends PE reads, stating, "We found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning." |
| | 20 | * Check adapter contamination with fastqc. Remove adapters with cutadapt or trim_galore: |
| | 21 | |
| | 22 | {{{ |
| | 23 | # --fastqc Run FastQC in the default mode on the FastQ file once trimming is complete. |
| | 24 | # --nextera Adapter sequence to be trimmed is the first 12bp of the Nextera adapter |
| | 25 | 'CTGTCTCTTATA' instead of the default auto-detection of adapter sequence. |
| | 26 | trim_galore --fastqc -nextera --paired --length 30 -o clean_adaptor_folder read1.fq read2.fq |
| | 27 | }}} |
| | 28 | |
| | 29 | * Map reads with bowtie2, for PE keep only concordant pairs |
| | 30 | |
| | 31 | {{{ |
| | 32 | # --very-sensitive options --no-discordant suppress discordant alignments for paired reads |
| | 33 | # -X/--maxins <int> maximum fragment length (500). Increase to 2000 to get a better nucleosome distribution. |
| | 34 | # --no-discordant suppress discordant alignments for paired reads |
| | 35 | |
| | 36 | bowtie2 --very-sensitive --no-discordant -p 2 -X 2000 -x /nfs/genomes/human_hg38_dec13_no_random/bowtie/hg38 -1 read1.fq -2 read2.fq | samtools view -ub - | samtools sort -T $name - >| bowtie_out.bam |
| | 37 | }}} |
| | 38 | |
| | 39 | * Remove duplicates, if needed check fastqc to see deduplication level. Remove with Picard MarkDuplicates or Samtools rmdup |
| | 40 | |
| | 41 | * Remove reads mapped to mitochondria |
| | 42 | |
| | 43 | {{{ |
| | 44 | samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam |
| | 45 | }}} |
| | 46 | |
| | 47 | * Call peaks using MACS |
| | 48 | |
| | 49 | {{{ |
| | 50 | #use this only if using BAMPE, let MACS pileup and calculate extsize; duplicates were not removed |
| | 51 | macs2 callpeak -f BAMPE -t file.sorted.bam --broad --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks |
| | 52 | |
| | 53 | #note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1. Removing duplicates is recommended. |
| | 54 | |
| | 55 | #BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual, |
| | 56 | "If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup." |
| | 57 | |
| | 58 | }}} |
| | 59 | |
| | 60 | * Additional notes on calling peaks using MACS: using MACS default options without BAMPE may not call the correct peaks. Alternative options/parameters in calling peaks using MACS, if BAMPE is not used are, --nomodel --shift s --extsize 2s, this can be used for single-end reads as well, e.g. --shift 100 --extsize 200 is suitable for ATAC-Seq. |
| | 61 | * MACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size - works for finding accessible regions within cut sites. The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable. |
| | 62 | |
| | 63 | * Genrich is another piece of software for peak-calling. It has the advantage of running all of the post-alignment steps through peak-calling with one command. It can take multiple replicates. Detailed information can be found in [[https://informatics.fas.harvard.edu/atac-seq-guidelines.html|Harvard ATAC-seq Guidelines]] |
| | 64 | |
| | 65 | {{{ |
| | 66 | |
| | 67 | # need to sort by name |
| | 68 | samtools sort -n -T abc bowtie/ATAC.bam >ATAC_sortedByName.bam" |
| | 69 | Genrich -e chrM -j -r -v -t ATAC_sortedByName.bam -o peaks/ATAC_peak -f peaks/ATAC_log |
| | 70 | |
| | 71 | Where |
| | 72 | -j ATAC-seq mode (must be specified) intervals are interpreted that are centered on transposase cut sites (the ends of each DNA fragment). Only properly paired alignments are analyzed by default |
| | 73 | -r Remove PCR duplicates |
| | 74 | -d <int> Expand cut sites to the given length (default 100bp) |
| | 75 | -e <arg> Chromosomes (reference sequences) to exclude. Can be a comma-separated list, e.g. -e chrM,chrY. |
| | 76 | -E <file> Input BED file(s) of genomic regions to exclude, such as 'N' homopolymers or high mappability regions |
| | 77 | -t <file> Input SAM/BAM file(s) for experimental sample(s) |
| | 78 | -o <file> Output peak file (in ENCODE narrowPeak format) |
| | 79 | -f <LOG> Those who wish to explore the results of varying the peak-calling parameters (-q/-p, -a, -l, -g) should consider having Genrich produce a log file when it parses the SAM/BAM files (for example, with -f <LOG> added to the above command). Then, Genrich can call peaks directly from the log file with the -P option: |
| | 80 | Genrich -P -f <LOG> -o <OUT2> -p 0.01 -a 200 -v |
| | 81 | }}} |
| | 82 | |
| | 83 | |
| | 84 | * Other software: [[https://github.com/LiuLabUB/HMMRATAC | HMMRATAC]] |
| | 85 | |
| | 86 | |
| | 87 | |