51 | | * For samples from human/mouse/fly/c.elegans, remove reads mapped to the ENCODE blacklist. Blacklist is a comprehensive set of genomic regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. [https://www.nature.com/articles/s41598-019-45839-z ENCODE paper ]. The blacklist can be downloaded from [https://github.com/Boyle-Lab/Blacklist/]. Blacklist can be filtered with alignmentSieve from deepTools package: |
52 | | {{{ |
53 | | alignmentSieve -b foo.bam --blackListFileName hg38-blacklist.bed -o no_blackList.bam |
| 51 | * For samples from human, mouse, fly, or C. elegans, one can avoid some likely false-positive peaks by removing reads that overlap "blacklisted" regions. The blacklist, [https://www.nature.com/articles/s41598-019-45839-z popularized by ENCODE], is a comprehensive set of genomic regions that show anomalous, unstructured, or high signal in next-generation sequencing experiments, independent of cell line or experiment. The blacklist regions can be downloaded from [https://github.com/Boyle-Lab/Blacklist/]. We have them on Whitehead servers at /nfs/BaRC_datasets/ENCODE_blacklist/Blacklist/lists
| 52 | |
| 53 | * Reads overlapping the blacklist can be filtered out with either alignmentSieve (from the deepTools package) or 'intersectBed -v' (from the bedtools suite):
| 54 | {{{ |
| 55 | alignmentSieve -b Reads.bam --blackListFileName hg38-blacklist.bed -o Reads.no_blackList.bam   # deepTools; input BAM must be indexed (Reads.bam.bai)
| 56 | intersectBed -v -a Reads.bam -b hg38-blacklist.bed > Reads.no_blackList.bam   # or, with bedtools