Changes between Version 2 and Version 3 of SOPs/atac_Seq


Ignore:
Timestamp:
08/07/20 12:52:54 (4 years ago)
Author:
thiruvil
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/atac_Seq

    v2 v3  
    55ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is a method to identify regions of accessible ("open") chromatin that are presumed to be available for binding by transcription factors and other DNA-binding proteins, which in turn can regulate genome function.  It can help answer experimental questions previously addressed by techniques like  MNase-seq, DNase-Seq, and FAIRE-Seq.
    66
    7 Basic Approach
     7=== Basic Approach ===
    88 * Pre-process reads (remove adapters and other "contamination")
    99 * Map reads to the genome (with an unspliced mapping tool)
     
    1515
    1616
     17=== ATAC-Seq Analysis Details ===
     18
     19  * Paired-end reads are better and recommended over single reads for ATAC-seq.  The original [[https://www.nature.com/articles/nmeth.2688 | ATAC-Seq method]] from the Greenleaf lab recommends PE reads, stating, "We found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning."
     20  * Check adapter contamination with fastqc. Remove adapters with cutadapt or trim_galore:
     21
     22{{{
     23# --fastqc                Run FastQC in the default mode on the FastQ file once trimming is complete.
     24# --nextera               Adapter sequence to be trimmed is the first 12bp of the Nextera adapter
     25                        'CTGTCTCTTATA' instead of the default auto-detection of adapter sequence.
     26trim_galore --fastqc  -nextera --paired --length 30 -o clean_adaptor_folder read1.fq read2.fq
     27}}}
     28
     29  * Map reads with bowtie2, for PE keep only concordant pairs
     30
     31{{{
     32# --very-sensitive options --no-discordant    suppress discordant alignments for paired reads
     33# -X/--maxins <int>  maximum fragment length (500). Increase to 2000 to get a better nucleosome distribution.
     34# --no-discordant    suppress discordant alignments for paired reads
     35
     36bowtie2 --very-sensitive --no-discordant -p 2 -X 2000 -x /nfs/genomes/human_hg38_dec13_no_random/bowtie/hg38 -1 read1.fq -2 read2.fq | samtools view -ub - | samtools sort -T $name - >| bowtie_out.bam
     37}}}
     38
     39  * Remove duplicates, if needed check fastqc to see deduplication level.  Remove with Picard MarkDuplicates or Samtools rmdup
     40
     41  * Remove reads mapped to mitochondria
     42
     43{{{
     44samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam
     45}}}
     46
     47  * Call peaks using MACS
     48
     49{{{
     50#use this only if using BAMPE, let MACS pileup and calculate extsize; duplicates were not removed
     51macs2 callpeak -f BAMPE -t file.sorted.bam --broad --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
     52
     53#note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1.  Removing duplicates is recommended.
     54
     55#BAMPE format: there is no special format for BAMPE - MACS will treat PE reads as coming from the same fragment, from the manual,
     56"If the BAM file is generated for paired-end data, MACS will only keep the left mate(5' end) tag. However, when format BAMPE is specified, MACS will use the real fragments inferred from alignment results for reads pileup."
     57
     58}}}
     59
     60  * Additional notes on calling peaks using MACS: using MACS default options without BAMPE may not call the correct peaks.  Alternative options/parameters in calling peaks using MACS, if BAMPE is not used are, --nomodel --shift s --extsize 2s, this can be used for single-end reads as well, e.g. --shift 100 --extsize 200 is suitable for ATAC-Seq.   
     61    * MACS' author, T.Liu, recommends using -f BAMPE if PE reads are used [[https://github.com/taoliu/MACS/issues/331]], using BAMPE option asks MACS to pileup and calculate the extension size -  works for finding accessible regions within cut sites.  The additional parameters can also be used to look only at the //exact// cut sites by Tn5 instead of the open/accessible regions [[https://github.com/taoliu/MACS/issues/145]], if so, -f BAMPE may not be suitable.
     62
     63  * Genrich is another piece of software for peak-calling. It has the advantage of running all of the post-alignment steps through peak-calling with one command. It can take multiple replicates. Detailed information can be found in [[https://informatics.fas.harvard.edu/atac-seq-guidelines.html|Harvard ATAC-seq Guidelines]]
     64
     65{{{
     66
     67# need to sort by name
     68samtools sort -n -T abc bowtie/ATAC.bam >ATAC_sortedByName.bam"
     69Genrich -e chrM -j -r -v -t ATAC_sortedByName.bam  -o peaks/ATAC_peak  -f peaks/ATAC_log
     70
     71Where
     72-j      ATAC-seq mode (must be specified)  intervals are interpreted that are centered on transposase cut sites (the ends of each DNA fragment). Only properly paired alignments are analyzed by default
     73-r      Remove PCR duplicates
     74-d <int>        Expand cut sites to the given length (default 100bp)
     75-e <arg>        Chromosomes (reference sequences) to exclude. Can be a comma-separated list, e.g. -e chrM,chrY.
     76-E <file>       Input BED file(s) of genomic regions to exclude, such as 'N' homopolymers or high mappability regions
     77-t  <file>       Input SAM/BAM file(s) for experimental sample(s)
     78-o  <file>       Output peak file (in ENCODE narrowPeak format)
     79-f <LOG>        Those who wish to explore the results of varying the peak-calling parameters (-q/-p, -a, -l, -g) should consider having Genrich produce a log file when it parses the SAM/BAM files (for example, with -f <LOG> added to the above command). Then, Genrich can call peaks directly from the log file with the -P option:
     80                        Genrich  -P  -f <LOG>  -o <OUT2>  -p 0.01  -a 200  -v
     81}}}
     82
     83
     84* Other software: [[https://github.com/LiuLabUB/HMMRATAC | HMMRATAC]]
     85
     86
     87
    1788
    1889