Changes between Version 28 and Version 29 of SOPs/SAMBAMqc
- Timestamp: 09/21/20 15:48:30
Legend:
- Unmodified: line prefixed with a space
- Added: line prefixed with "+"
- Removed: line prefixed with "-"
- Modified: old line shown as removed, followed by the new line as added

```diff
--- SOPs/SAMBAMqc (version 28)
+++ SOPs/SAMBAMqc (version 29)
@@ -1,21 +1,26 @@
 = SAM/BAM quality control: Analyzing short read quality (after mapping) =
 
-\\
-
-
-== Remove Duplicates ==
-* Remove duplicates, for eg. from PCR
-
-{{{
-#samtools command
-samtools rmdup [-sS] <input.srt.bam> <output.bam>
--s or -S depending on PE data or not
-}}}
-
-== Determining the paired-end insert size for DNA samples ==
-
-If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running Bowtie.
-
-When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping.
+'''Contents'''
+* [#dups Remove Duplicates]
+* [#insert Determine the paired-end insert size for DNA samples]
+* [#stats QC to get a (visual) summary of mapping statistics]
+* [#analyze_dups Graphically analyze read duplication]
+* [#qc Interpret quality control issues]
+
+== [=#dups Remove Duplicates] ==
+
+Remove duplicates, for example, when one molecule is amplified via PCR and sequenced multiple times.
+
+{{{
+# Use 'samtools rmdup'
+# on single-end reads (-s option) or paired-end reads (-S option)
+samtools rmdup [-sS] <input.srt.bam> <output.bam>
+}}}
+
+== [=#insert Determine the paired-end insert size for DNA samples] ==
+
+If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running an unspliced mapper.
+
+When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping.
 See our [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping mapping SOP] for mapping details.
 
@@ -50,5 +55,7 @@
 You might need to specify a different java path if above command is not working. On local tak, you can use /usr/local/jre1.8/bin/java
 
-== QC to get a (visual) summary of mapping statistics. For eg. coverage/distribution of mapped reads across the genome or transcriptome ==
+== [#stats QC to get a (visual) summary of mapping statistics] ==
+
+This includes the coverage/distribution of mapped reads across the genome or transcriptome
 
 ==== Use [http://broadinstitute.github.io/picard/ Picard] CollectRnaSeqMetrics.jar to find coverage across gene body for 5' or 3' bias ====
@@ -66,7 +73,4 @@
 }}}
 The VALIDATION_STRINGENCY=SILENT option will keep the program from crashing if it finds something unexpected. The default: VALIDATION_STRINGENCY=STRICT
-
-
-
 
 
@@ -185,5 +189,5 @@
 
 
-== Graphically analyze read duplication==
+== [=#analyze_dups Graphically analyze read duplication] ==
 
 The R/Bioconductor package [https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html dupRadar] can do this, analyzing a BAM file that has had duplicates flagged (such as with Picard's MarkDuplicates tool).
@@ -197,5 +201,5 @@
 }}}
 
-== Interpreting quality control issues==
+== [=#qc Interpret quality control issues] ==
 
 See [https://sequencing.qcfail.com/ QCFAIL.com] from the Babraham Institute
```
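The changed insert-size section says the paired-end insert size can be extracted from a BAM/SAM file after mapping. The steps the SOP itself uses are in lines this diff does not show. The sketch below illustrates one common approach, not necessarily the SOP's: it summarizes the template length (TLEN, column 9 of a SAM record) of properly paired reads (FLAG bit 0x2). The function name `insert_size_stats` is hypothetical. Counting only positive TLEN values is an assumption meant to keep one value per pair.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of samtools or this SOP):
# summarize paired-end insert sizes from SAM text on stdin,
# using FLAG (column 2) and TLEN (column 9) of each record.
insert_size_stats() {
  # Keep properly paired records (FLAG bit 0x2) with a positive TLEN,
  # so each pair contributes one value; skip header lines starting with '@'.
  awk -F '\t' '!/^@/ && int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }' |
    sort -n |
    awk '
      { v[NR] = $1; sum += $1; sumsq += $1 * $1 }
      END {
        if (NR == 0) { print "no properly paired records with positive TLEN"; exit 1 }
        mean = sum / NR
        var = sumsq / NR - mean * mean      # population variance
        if (var < 0) var = 0                # guard against rounding error
        # input is sorted, so the median is the middle value (or mean of the two middle values)
        median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
        printf "pairs=%d mean=%.1f sd=%.1f median=%.1f\n", NR, mean, sqrt(var), median
      }'
}
```

For example, `samtools view -f 2 sample.srt.bam | insert_size_stats` (the file name is a placeholder). TLEN conventions differ slightly between mappers, and a few chimeric pairs can inflate the mean and sd, so the median is usually the more robust number. Picard's CollectInsertSizeMetrics produces a full histogram if more detail is needed.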
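The dupRadar section assumes a BAM file whose duplicates have been flagged, for example by Picard's MarkDuplicates, which sets FLAG bit 0x400 rather than removing reads. Before running dupRadar, it can help to confirm that the flag is present. The sketch below reports the fraction of mapped, primary records carrying it. The function name `dup_fraction` is hypothetical. Excluding unmapped, secondary, and supplementary records is an assumption about what "mapped reads" should mean here. `samtools flagstat` reports a comparable duplicates count.

```shell
#!/bin/sh
# Sketch (hypothetical helper): fraction of mapped, primary records whose
# FLAG has the duplicate bit (0x400) set. Reads SAM text on stdin;
# each read of a pair is counted separately.
dup_fraction() {
  awk -F '\t' '
    /^@/ { next }                                   # header lines
    # skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
    int($2 / 4) % 2 || int($2 / 256) % 2 || int($2 / 2048) % 2 { next }
    {
      total++
      if (int($2 / 1024) % 2) dups++                # duplicate bit (0x400)
    }
    END {
      if (total == 0) { print "no mapped primary records"; exit 1 }
      printf "mapped=%d duplicates=%d fraction=%.4f\n", total, dups, dups / total
    }'
}
```

For example, `samtools view sample.markdup.bam | dup_fraction` (the file name is a placeholder). A fraction of exactly 0 on a library expected to contain PCR duplicates suggests the duplicates were never flagged, so dupRadar would have nothing to analyze.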