| 66 | }}} |
| 67 | |
| 68 | === [=#postAlignQC Post-alignment quality control and calculate QC metrics] === |
| 69 | |
| 70 | * Calculate the fragment size distribution using BaRC code |
| 71 | (/nfs/BaRC_Public/BaRC_code/R/calculate_ATACseq_fragment_size_distribution/calculate_ATACseq_fragment_size_distribution.R) |
| 72 | {{{ |
| 73 | # USAGE: calculate_ATACseq_fragment_size_distribution.R input.bam output.pdf title_of_the_figure |
| 74 | calculate_ATACseq_fragment_size_distribution.R sample.bam sample_fragment_sizes.pdf Sample |
| 75 | }}} |
| 76 | |
| 77 | This script is a wrapper for fragSizeDist from the [https://www.bioconductor.org/packages/release/bioc/html/ATACseqQC.html ATACseqQC] R package: |
| 78 | {{{ |
| 79 | library("ATACseqQC") |
| 80 | pdf("My_sample.fragment_sizes.pdf", w=11, h=8.5) |
| 81 | fragSizeDist("Mapped_reads.bam", "My_sample") |
| 82 | dev.off() |
| 83 | }}} |
| 84 | See [[https://www.nature.com/articles/nmeth.2688/figures/2 | Fig 2]] from Buenrostro et al. for the ideal distribution of fragment sizes. |
| 85 | |
| 86 | * Calculate the TSS enrichment score (the degree to which transcription start sites show enrichment for ATAC-seq reads) using BaRC code (/nfs/BaRC_Public/BaRC_code/Python/calculate_TSS_enrichment_score/calculate_TSS_enrichment_score.py) |
| 87 | {{{ |
| 88 | # USAGE: calculate_TSS_enrichment_score.py --outdir OUTDIR --outprefix OUTPREFIX --fastq1 FASTQ1 --tss TSS_BED --chromsizes CHROMSIZES --bam BAM |
| 89 | |
| 90 | ./calculate_TSS_enrichment_score.py --outdir OUT_QC_1 --outprefix Sample_A --fastq1 ATACseq_reads.fq.gz --tss TSS.hg38.bed --chromsizes chromInfo.hg38.txt --bam ATACseq_mapped_reads.bam >| Sample_A.TSS_enrichment_score.txt |
154 | | |
155 | | |
156 | | |
157 | | === [=#Blacklist Blacklist filtering for peaks ] === |
158 | | * For samples from human, mouse, fly, or C. elegans, one can prevent some probable false-positive peaks by removing reads that overlap "blacklisted" regions. The blacklist, [https://www.nature.com/articles/s41598-019-45839-z popularized by ENCODE], is a a comprehensive set of genomic regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The blacklist regions can be downloaded from [https://github.com/Boyle-Lab/Blacklist/]. We have them on Whitehead servers at /nfs/BaRC_datasets/ENCODE_blacklist/Blacklist/lists |
159 | | {{{ |
160 | | bedtools intersect -v -a ${PEAK} -b ${BLACKLIST} \ |
161 | | | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' \ |
162 | | | grep -P 'chr[\dXY]+[ \t]' | gzip -nc > ${FILTERED_PEAK} |
163 | | |
164 | | }}} |
165 | | |
166 | | === Subset peaks into reproducible peaks === |
| 180 | ==== Subset peaks into reproducible peaks ==== |
178 | | === [=#QC Run quality control and calculate QC metrics] === |
179 | | |
180 | | * Calculate the fragment size distribution with the [https://www.bioconductor.org/packages/release/bioc/html/ATACseqQC.html ATACseqQC] R package: |
181 | | {{{ |
182 | | library("ATACseqQC") |
183 | | pdf("My_sample.fragment_sizes.pdf", w=11, h=8.5) |
184 | | fragSizeDist("Mapped_reads.bam", "My_sample") |
185 | | dev.off() |
186 | | }}} |
187 | | See [[https://www.nature.com/articles/nmeth.2688/figures/2 | Fig 2]] from Buenrostro et al. for the ideal distribution of fragment sizes. |
188 | | |
189 | | * Calculate the TSS enrichment score (the degree to which transcription start sites show enrichment for ATAC-seq reads) using BaRC code (/nfs/BaRC_Public/BaRC_code/Python/calculate_TSS_enrichment_score/calculate_TSS_enrichment_score.py) |
190 | | {{{ |
191 | | # USAGE: calculate_TSS_enrichment_score.py --outdir OUTDIR --outprefix OUTPREFIX --fastq1 FASTQ1 --tss TSS_BED --chromsizes CHROMSIZES --bam BAM |
192 | | |
193 | | ./calculate_TSS_enrichment_score.py --outdir OUT_QC_1 --outprefix Sample_A --fastq1 ATACseq_reads.fq.gz --tss TSS.hg38.bed --chromsizes chromInfo.hg38.txt --bam ATACseq_mapped_reads.bam >| Sample_A.TSS_enrichment_score.txt |
194 | | }}} |
195 | | |
| 192 | |
| 193 | === [=#AfterPeakQC Quality control after calling peaks] === |
| 234 | |
| 235 | }}} |
| 236 | |
| 237 | |
| 238 | === [=#Blacklist Blacklist filtering for peaks ] === |
| 239 | * For samples from human, mouse, fly, or C. elegans, one can prevent some probable false-positive peaks by removing reads that overlap "blacklisted" regions. The blacklist, [https://www.nature.com/articles/s41598-019-45839-z popularized by ENCODE], is a a comprehensive set of genomic regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The blacklist regions can be downloaded from [https://github.com/Boyle-Lab/Blacklist/]. We have them on Whitehead servers at /nfs/BaRC_datasets/ENCODE_blacklist/Blacklist/lists |
| 240 | {{{ |
| 241 | bedtools intersect -v -a ${PEAK} -b ${BLACKLIST} \ |
| 242 | | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' \ |
| 243 | | grep -P 'chr[\dXY]+[ \t]' | gzip -nc > ${FILTERED_PEAK} |