Context Navigation

Changes between Version 2 and Version 3 of SOPs/atac_Seq

Timestamp:: 08/07/20 12:52:54 (5 years ago)
Author:: thiruvil
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

SOPs/atac_Seq

-              v2
+              v3
 ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is a method to identify regions of accessible ("open") chromatin that are presumed to be available for binding by transcription factors and other DNA-binding proteins, which in turn can regulate genome function.  It can help answer experimental questions previously addressed by techniques like  MNase-seq, DNase-Seq, and FAIRE-Seq.
+Basic Approach
+=== Basic Approach ===
  * Pre-process reads (remove adapters and other "contamination")
  * Map reads to the genome (with an unspliced mapping tool)
 …
+=== ATAC-Seq Analysis Details ===
+  * Paired-end reads are better and recommended over single reads for ATAC-seq.  The original [[https://www.nature.com/articles/nmeth.2688 | ATAC-Seq method]] from the Greenleaf lab recommends PE reads, stating, "We found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning."
+  * Check adapter contamination with fastqc. Remove adapters with cutadapt or trim_galore:
+{{{
+# --fastqc                Run FastQC in the default mode on the FastQ file once trimming is complete.
+# --nextera               Adapter sequence to be trimmed is the first 12bp of the Nextera adapter
+                        'CTGTCTCTTATA' instead of the default auto-detection of adapter sequence.
+trim_galore --fastqc  -nextera --paired --length 30 -o clean_adaptor_folder read1.fq read2.fq
+}}}
+  * Map reads with bowtie2, for PE keep only concordant pairs
+{{{
+# --very-sensitive options --no-discordant    suppress discordant alignments for paired reads
+# -X/--maxins <int>  maximum fragment length (500). Increase to 2000 to get a better nucleosome distribution.
+# --no-discordant    suppress discordant alignments for paired reads
+bowtie2 --very-sensitive --no-discordant -p 2 -X 2000 -x /nfs/genomes/human_hg38_dec13_no_random/bowtie/hg38 -1 read1.fq -2 read2.fq | samtools view -ub - | samtools sort -T $name - >| bowtie_out.bam
+}}}
+  * Remove duplicates, if needed check fastqc to see deduplication level.  Remove with Picard MarkDuplicates or Samtools rmdup
+  * Remove reads mapped to mitochondria
+{{{
+samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam
+}}}
+  * Call peaks using MACS
+{{{
+#use this only if using BAMPE, let MACS pileup and calculate extsize; duplicates were not removed
+macs2 callpeak -f BAMPE -t file.sorted.bam --broad --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
+#note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1.  Removing duplicates is recommended.
+#BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual,
+"If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup."
+}}}
+  * Additional notes on calling peaks using MACS: using MACS default options without BAMPE may not call the correct peaks.  Alternative options/parameters in calling peaks using MACS, if BAMPE is not used are, --nomodel --shift s --extsize 2s, this can be used for single-end reads as well, e.g. --shift 100 --extsize 200 is suitable for ATAC-Seq.
+    * MACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size -  works for finding accessible regions within cut sites.  The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable.
+  * Genrich is another piece of software for peak-calling. It has the advantage of running all of the post-alignment steps through peak-calling with one command. It can take multiple replicates. Detailed information can be found in [[https://informatics.fas.harvard.edu/atac-seq-guidelines.html|Harvard ATAC-seq Guidelines]]
+{{{
+# need to sort by name
+samtools sort -n -T abc bowtie/ATAC.bam >ATAC_sortedByName.bam"
+Genrich -e chrM -j -r -v -t ATAC_sortedByName.bam  -o peaks/ATAC_peak  -f peaks/ATAC_log
+Where
+-j      ATAC-seq mode (must be specified)  intervals are interpreted that are centered on transposase cut sites (the ends of each DNA fragment). Only properly paired alignments are analyzed by default
+-r      Remove PCR duplicates
+-d <int>        Expand cut sites to the given length (default 100bp)
+-e <arg>        Chromosomes (reference sequences) to exclude. Can be a comma-separated list, e.g. -e chrM,chrY.
+-E <file>       Input BED file(s) of genomic regions to exclude, such as 'N' homopolymers or high mappability regions
+-t  <file>       Input SAM/BAM file(s) for experimental sample(s)
+-o  <file>       Output peak file (in ENCODE narrowPeak format)
+-f <LOG>        Those who wish to explore the results of varying the peak-calling parameters (-q/-p, -a, -l, -g) should consider having Genrich produce a log file when it parses the SAM/BAM files (for example, with -f <LOG> added to the above command). Then, Genrich can call peaks directly from the log file with the -P option:
+                        Genrich  -P  -f <LOG>  -o <OUT2>  -p 0.01  -a 200  -v
+}}}
+* Other software: [[https://github.com/LiuLabUB/HMMRATAC | HMMRATAC]]