SOPs/atac_Seq – BaRC Wiki

Context Navigation

Version 5 (modified by thiruvil, 5 years ago) ( diff )
--

Using ATAC-Seq to identify open chromatin

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is a method to identify regions of accessible ("open") chromatin that are presumed to be available for binding by transcription factors and other DNA-binding proteins, which in turn can regulate genome function. It can help answer experimental questions previously addressed by techniques like MNase-seq, DNase-Seq, and FAIRE-Seq.

Basic Approach

Pre-process reads (remove adapters and other "contamination")
Map reads to the genome (with an unspliced mapping tool)
Run quality control and calculate QC metrics
Identify ATAC-seq "peaks" with a high coverage of mapped reads
Analyze peak regions for binding motifs
Identify differentially accessible regions (for multiple-sample experiments)
Identify potential targets of peak regions and/or differentially accessible regions

ATAC-Seq Analysis Details

Paired-end reads are better and recommended over single reads for ATAC-seq. The original ATAC-Seq method from the Greenleaf lab recommends PE reads, stating, "We found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning."
Check adapter contamination with fastqc. Remove adapters with cutadapt or trim_galore:

# --fastqc                Run FastQC in the default mode on the FastQ file once trimming is complete.
# --nextera               Adapter sequence to be trimmed is the first 12bp of the Nextera adapter
                        'CTGTCTCTTATA' instead of the default auto-detection of adapter sequence.
trim_galore --fastqc  -nextera --paired --length 30 -o clean_adaptor_folder read1.fq read2.fq

Map reads with bowtie2, for PE keep only concordant pairs

# --very-sensitive options --no-discordant    suppress discordant alignments for paired reads
# -X/--maxins <int>  maximum fragment length (500). Increase to 2000 to get a better nucleosome distribution.
# --no-discordant    suppress discordant alignments for paired reads

bowtie2 --very-sensitive --no-discordant -p 2 -X 2000 -x /nfs/genomes/human_hg38_dec13_no_random/bowtie/hg38 -1 read1.fq -2 read2.fq | samtools view -ub - | samtools sort -T $name - >| bowtie_out.bam

Remove duplicates, if needed check fastqc to see deduplication level. Remove with Picard MarkDuplicates or Samtools rmdup

Remove reads mapped to mitochondria

samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam

Call peaks using MACS

#use this only if using BAMPE, let MACS pileup and calculate extsize; duplicates were not removed
macs2 callpeak -f BAMPE -t file.sorted.bam --broad --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks

#note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1.  Removing duplicates is recommended.

#BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual, 
"If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup."

Additional notes on calling peaks using MACS: using MACS default options without BAMPE may not call the correct peaks. Alternative options/parameters in calling peaks using MACS, if BAMPE is not used are, --nomodel --shift s --extsize 2s, this can be used for single-end reads as well, e.g. --shift 100 --extsize 200 is suitable for ATAC-Seq.
- MACS' author, T.Liu, recommends using -f BAMPE if PE reads are used https://github.com/taoliu/MACS/issues/331, using BAMPE option asks MACS to pileup and calculate the extension size - works for finding accessible regions within cut sites. The additional parameters can also be used to look only at the exact cut sites by Tn5 instead of the open/accessible regions https://github.com/taoliu/MACS/issues/145, if so, -f BAMPE may not be suitable.

Genrich is another piece of software for peak-calling. It has the advantage of running all of the post-alignment steps through peak-calling with one command. It can take multiple replicates. Detailed information can be found in Harvard ATAC-seq Guidelines

# need to sort by name
samtools sort -n -T abc bowtie/ATAC.bam >ATAC_sortedByName.bam"
Genrich -e chrM -j -r -v -t ATAC_sortedByName.bam  -o peaks/ATAC_peak  -f peaks/ATAC_log

Where
-j      ATAC-seq mode (must be specified)  intervals are interpreted that are centered on transposase cut sites (the ends of each DNA fragment). Only properly paired alignments are analyzed by default
-r      Remove PCR duplicates
-d <int>        Expand cut sites to the given length (default 100bp)
-e <arg>        Chromosomes (reference sequences) to exclude. Can be a comma-separated list, e.g. -e chrM,chrY.
-E <file>       Input BED file(s) of genomic regions to exclude, such as 'N' homopolymers or high mappability regions
-t  <file>       Input SAM/BAM file(s) for experimental sample(s)
-o  <file>       Output peak file (in ENCODE narrowPeak format)
-f <LOG>        Those who wish to explore the results of varying the peak-calling parameters (-q/-p, -a, -l, -g) should consider having Genrich produce a log file when it parses the SAM/BAM files (for example, with -f <LOG> added to the above command). Then, Genrich can call peaks directly from the log file with the -P option:
                        Genrich  -P  -f <LOG>  -o <OUT2>  -p 0.01  -a 200  -v

Tips/Recommendations for ATAC-Seq

Based on presentation by H.Liu at BioC 2020 presentation

post-alignment filtering: remove mito/ChrM reads, remove duplciates, remove mapping artifacts: < 38bp and fragments > 2kb, and discordant reads
- The NRF should be > 38bp to < 147 bases (one nucleosome); mono nucleosome fragments are in the range of 147 -200 bp
- don't recommend to remove di-, tri nucleosome fragments for downstream analysis unless you have a great number of reads. Otherwise the you might miss many open chromatin regions during peak calling.
check fragment size distribution (from ATACSeqQC)
- see Fig 2 from Buenrostro, J.D., et al. for expected distribution of fragment size
Tn5 produces 5’ overhangs of 9 bases long: pos. strand + 4 and neg strand -5 (shiftGAlignmentsList and shiftReads function) splitGcAlignmentsByCut: creates different bins of reads: nucleosome free regions (NFR), mono, di, etc. Shifted reads that do not fit into any of the bins should be discarded.
use housekeeping genes to check QC: signal enrichment is expected in the regulatory regions of housekeeping genes in good ATACSeq eperiments. Use IGVSnapshot function with geneNames param. splitGAlignmentsByCut: creates different bins of reads: NFR, mono, di, etc. Shifted reads that do not fit into any of the bins should be discarded.

Other software: HMMRATAC

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text