Changes between Version 40 and Version 41 of SOPs/atac_Seq
- Timestamp: 05/27/21 09:23:17
SOPs/atac_Seq
v40 v41
 41  41    }}}
 42  42
     43  + * Remove reads mapped to mitochondria.
     44  + {{{
     45  + samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam
     46  + }}}
     47  + * Remove reads with low mapping quality (MAPQ < 30) with alignmentSieve from [[https://deeptools.readthedocs.io/en/develop/|DeepTools]]
     48  + {{{
     49  + alignmentSieve -b file.bam --minMappingQuality 30 -o MAPQ30.bam
     50  + }}}
     51  +
 43  52    * Remove duplicates with Picard's 'MarkDuplicates' or 'samtools rmdup'.
 44      - * Check deduplication level with 'fastqc'.
 45      - * Remove reads mapped to mitochondria.
 46      -
 47      - {{{
 48      - samtools view -h file.bam | grep -v chrM | samtools view -b -h -f 0x2 - | samtools sort - > file.sorted.bam
 49      - }}}
 50      -
     53  + {{{
     54  + java -jar /usr/local/share/picard-tools/picard.jar MarkDuplicates I=foo.bam O=noDups.bam M=foo.marked_dup_metrics.txt REMOVE_DUPLICATES=true
     55  + }}}
     56  + * Check deduplication level with [[http://barcwiki.wi.mit.edu/wiki/SOPs/qc_shortReads | 'fastqc']].
     57  +
 51  58    * For samples from human, mouse, fly, or C. elegans, one can prevent some probable false-positive peaks by removing reads that overlap "blacklisted" regions. The blacklist, [https://www.nature.com/articles/s41598-019-45839-z popularized by ENCODE], is a comprehensive set of genomic regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The blacklist regions can be downloaded from [https://github.com/Boyle-Lab/Blacklist/].
           We have them on Whitehead servers at /nfs/BaRC_datasets/ENCODE_blacklist/Blacklist/lists
 52  59
  …   …
 62  69    === [=#QC Run quality control and calculate QC metrics] ===
 63  70
 64      - Calculate the fragment size distribution with the [https://www.bioconductor.org/packages/release/bioc/html/ATACseqQC.html ATACseqQC] R package:
     71  + * Calculate the fragment size distribution with the [https://www.bioconductor.org/packages/release/bioc/html/ATACseqQC.html ATACseqQC] R package:
 65  72    {{{
 66  73    library("ATACseqQC")
  …   …
 71  78    See [[https://www.nature.com/articles/nmeth.2688/figures/2 | Fig 2]] from Buenrostro et al. for the ideal distribution of fragment sizes.
 72  79
 73      - Calculate the TSS enrichment score (the degree to which transcription start sites show enrichment for ATAC-seq reads) using BaRC code (/nfs/BaRC_Public/BaRC_code/Python/calculate_TSS_enrichment_score/calculate_TSS_enrichment_score.py)
     80  + * Calculate the TSS enrichment score (the degree to which transcription start sites show enrichment for ATAC-seq reads) using BaRC code (/nfs/BaRC_Public/BaRC_code/Python/calculate_TSS_enrichment_score/calculate_TSS_enrichment_score.py)
 74  81    {{{
 75  82    # USAGE: calculate_TSS_enrichment_score.py --outdir OUTDIR --outprefix OUTPREFIX --fastq1 FASTQ1 --tss TSS_BED --chromsizes CHROMSIZES --bam BAM
  …   …
 78  85    }}}
 79  86
 80      - The [https://www.encodeproject.org/atac-seq/#standards ENCODE project] has some recommended TSS enrichment scores, depending on the genome.
 81      -
 82      - In addition to running various quality controls, [[https://www.sciencedirect.com/science/article/pii/S240547122030079X | ataqv]] summarizes QC results into an interactive HTML page, which also lets you view multiple samples together.
     87  +
     88  + * The [https://www.encodeproject.org/atac-seq/#standards ENCODE project] has [[https://www.encodeproject.org/atac-seq/#standards|recommendations]] on TSS enrichment scores and fragment size distribution. You can also download the ENCODE pipeline and run it on your samples.
         + Its QC output HTML file includes quality control results and their interpretation. It also checks read uniqueness to estimate library complexity.
     89  +
     90  + * In addition to running various quality controls, [[https://www.sciencedirect.com/science/article/pii/S240547122030079X | ataqv]] summarizes QC results into an interactive HTML page, which also lets you view multiple samples together.
 83  91
 84  92    First, run ataqv on each bam file to generate JSON files.
  …   …
125 133    {{{
126 134
127      - macs2 callpeak -f BAMPE -t file.sorted.bam --broad--keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
    135  + macs2 callpeak -f BAMPE -t file.sorted.bam --keep-dup 1 -B -q 0.01 -g mm -n MACS_ATACSeq_Peaks
128 136
129 137    #note: if duplicates were removed, use --keep-dup all; if not, use --keep-dup 1. Removing duplicates is recommended.
  …   …
170 178    }}}
171 179
    180  +
172 181    === [=#Analyze Analyze peak regions for binding motifs] ===
173 182