Changes between Version 28 and Version 29 of SOPs/SAMBAMqc


Timestamp:
09/21/20 15:48:30 (4 years ago)
Author:
gbell
Comment:

--

  • SOPs/SAMBAMqc

    v28 v29  
    11= SAM/BAM quality control: Analyzing short read quality (after mapping) =
    22
    3 \\
    4 
    5 
    6 == Remove Duplicates ==
    7   * Remove duplicates, for eg. from PCR
    8 
    9  {{{
    10    #samtools command
    11     samtools rmdup [-sS] <input.srt.bam> <output.bam>
    12     -s or -S depending on PE data or not
    13 }}}
    14 
    15 == Determining the paired-end insert size for DNA samples ==
    16 
    17 If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running Bowtie. 
    18 
    19 When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping. 
     3'''Contents'''
     4  * [#dups Remove Duplicates]
     5  * [#insert Determine the paired-end insert size for DNA samples]
     6  * [#stats QC to get a (visual) summary of mapping statistics]
     7  * [#analyze_dups Graphically analyze read duplication]
     8  * [#qc Interpret quality control issues]
     9
     10== [=#dups Remove Duplicates] ==
     11
     12Remove duplicates, for example, when one molecule is amplified via PCR and sequenced multiple times.
     13
     14{{{
     15# Use 'samtools rmdup' (by default it expects paired-end reads)
     16# -s: single-end reads; -S: treat paired-end reads as single-end
     17samtools rmdup [-sS] <input.srt.bam> <output.bam>
     18}}}
     19
     20== [=#insert Determine the paired-end insert size for DNA samples] ==
     21
     22If the paired-end insert size or distance is unknown or needs to be verified, it can be extracted from a BAM/SAM file after mapping with an unspliced mapper.
     23
     24When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping.
    2025See our [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping  mapping SOP] for mapping details.
    2126
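As a rough illustration of extracting the insert size from mapped reads, the sketch below summarizes the TLEN field (column 9 of a SAM record). It is a minimal sketch, not necessarily the method this SOP uses: the function name `insert_size_summary` and the file name in the usage comment are hypothetical, and it assumes properly paired reads from an unspliced mapper.

```shell
#!/bin/sh
# Summarize insert sizes from SAM records read on standard input.
# Column 9 of a SAM record is TLEN (observed template length). It is positive
# for the leftmost read of a pair and negative for its mate, so only positive
# values are counted to avoid using each pair twice.
insert_size_summary() {
    awk -F '\t' '
        !/^@/ && $9 > 0 {              # skip header lines; keep positive TLEN only
            v = $9 + 0                 # force a numeric value
            n++; sum += v
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n == 0) { print "no records with positive TLEN"; exit 1 }
            printf "pairs=%d mean=%.1f min=%d max=%d\n", n, sum / n, min, max
        }'
}

# Hypothetical usage (the BAM file name is a placeholder):
#   samtools view -f 66 mapped.sorted.bam | insert_size_summary
# Flag 66 keeps reads that are properly paired (2) and first in pair (64).
```

The mean is sensitive to a few very distant pairs, so the minimum and maximum are printed alongside it as a quick sanity check; a dedicated tool gives a fuller distribution.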
     
    5055You might need to specify a different Java path if the above command does not work. On local tak, you can use /usr/local/jre1.8/bin/java
    5156
    52 == QC to get a (visual) summary of mapping statistics.  For eg. coverage/distribution of mapped reads across the genome or transcriptome ==
     57== [=#stats QC to get a (visual) summary of mapping statistics] ==
     58
     59This includes the coverage/distribution of mapped reads across the genome or transcriptome.
    5360
    5461==== Use [http://broadinstitute.github.io/picard/ Picard] CollectRnaSeqMetrics.jar to find coverage across gene body for 5' or 3' bias ====
     
    6673}}}
    6774The VALIDATION_STRINGENCY=SILENT option will keep the program from crashing if it finds something unexpected. The default is VALIDATION_STRINGENCY=STRICT.
    68 
    69 
    70 
    7175
    7276
     
    185189
    186190
    187 == Graphically analyze read duplication ==
     191== [=#analyze_dups Graphically analyze read duplication] ==
    188192
    189193The R/Bioconductor package [https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html dupRadar] can do this, analyzing a BAM file that has had duplicates flagged (such as with Picard's MarkDuplicates tool).
     
    197201}}}
    198202
    199 == Interpreting quality control issues ==
     203== [=#qc Interpret quality control issues] ==
    200204
    201205See [https://sequencing.qcfail.com/ QCFAIL.com] from the Babraham Institute