Changes between Version 40 and Version 41 of SOPs/atac_Seq


Timestamp:
05/27/21 09:23:17 (4 years ago)
Author:
byuan
Comment:

--

  • SOPs/atac_Seq

    v40 v41  
    4141}}}
    4242
     43  * Remove reads mapped to mitochondria.
     44{{{
     45samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam
     46}}}
     47  * Remove reads with low mapping quality (MAPQ < 30) with alignmentSieve from [[https://deeptools.readthedocs.io/en/develop/|DeepTools]]
     48{{{
     49alignmentSieve -b file.bam --minMappingQuality 30 -o MAPQ30.bam
     50}}}
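
The two filters above (drop mitochondrial reads and keep proper pairs, then drop MAPQ < 30) can be illustrated on SAM text. This is only a sketch of the logic, not a replacement for samtools or alignmentSieve. The function names `keep_record` and `filter_sam` are hypothetical, and the mitochondrial contig is assumed to be named `chrM`. Unlike `grep -v chrM`, which drops any line containing that string (including the `@SQ` header line), this sketch checks only the RNAME and RNEXT columns and passes header lines through unchanged.

```python
def keep_record(sam_line, min_mapq=30, mito="chrM"):
    """Return True if one SAM alignment line passes all three filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    rname = fields[2]       # column 3: reference name of this read
    mapq = int(fields[4])   # column 5: mapping quality
    rnext = fields[6]       # column 7: reference name of the mate ("=" if same)
    if rname == mito or rnext == mito:
        return False        # read or its mate maps to the mitochondrial contig
    if not flag & 0x2:
        return False        # not a properly paired read (same test as -f 0x2)
    return mapq >= min_mapq  # keep MAPQ >= 30 by default


def filter_sam(lines, min_mapq=30, mito="chrM"):
    """Yield header lines unchanged and alignment lines that pass keep_record."""
    for line in lines:
        if line.startswith("@") or keep_record(line, min_mapq, mito):
            yield line
```

In practice one would stream `samtools view -h file.bam` through `filter_sam` and pipe the result back into `samtools view -b`.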
     51
    4352  * Remove duplicates with Picard's 'MarkDuplicates' or 'samtools rmdup'.
    44   * Check deduplication level with 'fastqc'.
    45   * Remove reads mapped to mitochondria.
    46 
    47 {{{
    48 samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam
    49 }}}
    50 
     53{{{
     54java -jar /usr/local/share/picard-tools/picard.jar MarkDuplicates I=foo.bam O=noDups.bam M=foo.marked_dup_metrics.txt REMOVE_DUPLICATES=true
     55}}}
     56  * Check deduplication level with [[http://barcwiki.wi.mit.edu/wiki/SOPs/qc_shortReads | 'fastqc']].
     57 
    5158   * For samples from human, mouse, fly, or C. elegans, one can prevent some probable false-positive peaks by removing reads that overlap "blacklisted" regions.  The blacklist, [https://www.nature.com/articles/s41598-019-45839-z popularized by ENCODE], is a comprehensive set of genomic regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The blacklist regions can be downloaded from [https://github.com/Boyle-Lab/Blacklist/].  We have them on Whitehead servers at /nfs/BaRC_datasets/ENCODE_blacklist/Blacklist/lists
    5259
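
The blacklist test described above is an interval-overlap check. The sketch below shows that check on BED-style coordinates (0-based, half-open). `load_blacklist` and `overlaps_blacklist` are hypothetical helper names, and the linear scan is for illustration only; a real run would normally use an interval tool rather than this code.

```python
from collections import defaultdict


def load_blacklist(bed_lines):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    regions = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header/comment lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def overlaps_blacklist(chrom, start, end, regions):
    """True if the half-open interval [start, end) overlaps any blacklisted region."""
    return any(
        start < b_end and b_start < end
        for b_start, b_end in regions.get(chrom, ())
    )
```

A read (or fragment) is discarded when `overlaps_blacklist` returns True for its coordinates.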
     
    6269=== [=#QC Run quality control and calculate QC metrics] ===
    6370
    64 Calculate the fragment size distribution with the [https://www.bioconductor.org/packages/release/bioc/html/ATACseqQC.html ATACseqQC] R package:
     71   * Calculate the fragment size distribution with the [https://www.bioconductor.org/packages/release/bioc/html/ATACseqQC.html ATACseqQC] R package:
    6572{{{
    6673library("ATACseqQC")
     
    7178See [[https://www.nature.com/articles/nmeth.2688/figures/2 | Fig 2]] from Buenrostro et al. for the ideal distribution of fragment sizes.
    7279
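
For intuition about what the fragment size distribution measures, here is a minimal Python sketch that tallies fragment lengths from SAM text. It assumes paired-end alignments where column 9 (TLEN) holds the signed fragment length, and it counts only positive values so that each pair is counted once. The function name `fragment_size_counts` is hypothetical; this is not how ATACseqQC is implemented.

```python
from collections import Counter


def fragment_size_counts(sam_lines):
    """Count fragment lengths using the TLEN column of paired-end SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        tlen = int(line.split("\t")[8])  # column 9: signed template length
        if tlen > 0:
            # Only the leftmost mate carries a positive TLEN, so each
            # fragment contributes exactly one count.
            counts[tlen] += 1
    return counts
```

Plotting the resulting counts against fragment length gives the kind of curve shown in the figure referenced above.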
    73 Calculate the TSS enrichment score (the degree to which transcription start sites show enrichment for ATAC-seq reads) using BaRC code (/nfs/BaRC_Public/BaRC_code/Python/calculate_TSS_enrichment_score/calculate_TSS_enrichment_score.py)
     80   * Calculate the TSS enrichment score (the degree to which transcription start sites show enrichment for ATAC-seq reads) using BaRC code (/nfs/BaRC_Public/BaRC_code/Python/calculate_TSS_enrichment_score/calculate_TSS_enrichment_score.py)
    7481{{{
    7582# USAGE: calculate_TSS_enrichment_score.py --outdir OUTDIR --outprefix OUTPREFIX --fastq1 FASTQ1 --tss TSS_BED --chromsizes CHROMSIZES --bam BAM
     
    7885}}}
    7986
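
As a rough illustration of what a TSS enrichment score is, the sketch below follows the definition given in the ENCODE ATAC-seq standards: the aggregate read profile around all TSSs (2000 bp either side) is divided by the mean depth in the 100 bp at each end of the window, and the normalized value at the center is reported. This is an assumption about the general method, not a description of what the BaRC script computes. The function name `tss_enrichment` is hypothetical, and the aggregate profile is assumed to have been computed already.

```python
def tss_enrichment(profile, flank=100):
    """Center of the aggregate TSS profile divided by the mean depth of the two end flanks.

    profile: per-base aggregate read depth across a window centered on the TSS
    flank:   number of positions at each end used as background
    """
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile is too short for the requested flank size")
    flanks = list(profile[:flank]) + list(profile[-flank:])
    background = sum(flanks) / len(flanks)
    if background == 0:
        raise ValueError("flank depth is zero; the score is undefined")
    center = profile[len(profile) // 2]
    return center / background
```

With a flat profile the score is 1; the more reads pile up at the TSS relative to the flanks, the higher the score.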
    80 The [https://www.encodeproject.org/atac-seq/#standards ENCODE project] has some recommended TSS enrichment scores, depending on the genome.
    81 
    82 In addition to do varies quality controls, [[https://www.sciencedirect.com/science/article/pii/S240547122030079X | ataqv]] summarizes QC results into an interactive html page, which also allows you to view multiple samples together.
     87
     88  * The [https://www.encodeproject.org/atac-seq/#standards ENCODE project] has [[https://www.encodeproject.org/atac-seq/#standards|recommendations]] on TSS enrichment scores and fragment size distribution. You can also download the ENCODE pipeline and analyze your samples with it. Its QC output HTML file includes quality-control results and their interpretation. It also checks the uniqueness of reads to estimate library complexity.
     89
     90  * In addition to performing various quality-control checks, [[https://www.sciencedirect.com/science/article/pii/S240547122030079X | ataqv]] summarizes the QC results in an interactive HTML page, which also lets you view multiple samples together.
    8391
    8492First, run ataqv on each bam file to generate JSON files.
     
    125133{{{
    126134
    127 macs2 callpeak -f BAMPE -t file.sorted.bam --broad --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
     135macs2 callpeak -f BAMPE -t file.sorted.bam --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
    128136
    129137#note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1.  Removing duplicates is recommended.
     
    170178}}}
    171179
     180
    172181=== [=#Analyze Analyze peak regions for binding motifs] ===
    173182