Changes between Version 33 and Version 34 of SOPs/miningSAMBAM


Ignore:
Timestamp:
04/27/16 11:14:03 (9 years ago)
Author:
ibarrasa
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/miningSAMBAM

    v33 v34  
    1 == SAM/BAM summarizing, processing and quality control(QC) ==
     1== SAM/BAM summarizing and processing ==
    22
    33Many of these involve [http://samtools.sourceforge.net/samtools.shtml samtools]
     
    123123   * Converting a flag into its components may be easiest with the Picard [https://broadinstitute.github.io/picard/explain-flags.html Explain SAM Flags] tool.
    124124
    125 === QC to get a (visual) summary of mapping statistics.  For eg. coverage/distribution of mapped reads across the genome or transcriptome. ===
    126125
    127 ==== [http://rseqc.sourceforge.net/ | RSeQC]: RNA-Seq quality control package for getting mapping statistics (eg. unique/multi-mapped reads) ====
    128 {{{
    129 bam_stat.py -i myFile.bam
    130 }}}
    131 
    132 ==== [http://broadinstitute.github.io/picard/ | Picard]: CollectRnaSeqMetrics.jar to find coverage across gene body for 5' or 3' bias ====
    133 {{{
    134 java -jar /usr/local/share/picard-tools/CollectRnaSeqMetrics.jar INPUT=accepted_hits.bam REF_FLAT=refFlat.txt STRAND_SPECIFICITY=NONE OUTPUT=Out_RnaSeqMetrics.txt REFERENCE_SEQUENCE=hg19.fa CHART_OUTPUT=Out_RnaSeqMetrics.pdf
    135 }}}
    136 
    137 If you get an "SequenceListsDifferException" error from picard (using a BAM file from TopHat, for example), you may first need to reorder the header BAM header with a command like
    138 {{{
    139 java -jar /usr/local/share/picard-tools/ReorderSam.jar INPUT=accepted_hits.bam OUTPUT=accepted_hits.reordered.bam REFERENCE=/path/to/reference/genome.fa
    140 }}}
    141 
    142 ==== [http://qualimap.bioinfo.cipf.es/ | QualiMap]: can be used on DNA or RNA-Seq to get summary of mapping and coverage/distribution ====
    143 {{{
    144 # Graphical interface: enter 'qualimap' on the command line
    145 # Command line:
    146 unset DISPLAY  #needed for submitting to cluster
    147 bsub "qualimap bamqc -bam myFile.bam -outdir output_qualimap"
    148 # For huge data, you can increase memory with --java-mem-size="4800M" to avoid OutOfMemoryError: Java heap space
    149 
    150 
    151 #rnaseq qc
    152 bsub "qualimap rnaseq -bam myFile.bam -gtf Homo_sapiens.GRCh37.72.canonical.gtf -outdir output_qualimap_rnaseq -p non-strand-specific"
    153 
    154 #counts qc (after using htseq-count or similar program to generate a matrix of counts)
    155 qualimap counts -d countsqc_input.txt -c -s HUMAN -outdir counts_qc
    156 #Format of countsqc_input.txt (below), totalCounts.txt is a matrix of counts; header lines must be commented "#" and species is human or mouse only.
    157 #Sample Condition       Path    Column
    158 HMLE1   HMLE    totalCounts.txt 2
    159 HMLE2   HMLE    totalCounts.txt 3
    160 HMLE3   HMLE    totalCounts.txt 4
    161 N81     N8      totalCounts.txt 5
    162 N82     N8      totalCounts.txt 6
    163 N83     N8      totalCounts.txt 7
    164 
    165 }}}
    166 
    167 
    168 ==== infer_experiment.py from RseQC package: can be used to check if the RNA-seq reads are stranded. ====
    169 {{{
    170 
    171 # Command line:
    172 
    173 bsub infer_experiment.py -i accepted_hits.bam -r hs.bed
    174 
    175 -i INPUT_FILE in SAM or BAM format
    176 -r Reference gene model in bed fomat.
    177 
    178 # sample output on strand-specific PE reads:
    179 This is PairEnd Data
    180 Fraction of reads explained by "1++,1--,2+-,2-+": 0.0193
    181 Fraction of reads explained by "1+-,1-+,2++,2--": 0.9807
    182 Fraction of reads explained by other combinations: 0.0000
    183 
    184 # sample output on non-stranded PE reads:
    185 This is PairEnd Data
    186 Fraction of reads explained by "1++,1--,2+-,2-+": 0.5103
    187 Fraction of reads explained by "1+-,1-+,2++,2--": 0.4897
    188 Fraction of reads explained by other combinations: 0.0000
    189 
    190 For pair-end RNA-seq, there are two different ways to strand reads:
    191   i) 1++,1--,2+-,2-+
    192      read1 mapped to '+' strand indicates parental gene on '+' strand
    193      read1 mapped to '-' strand indicates parental gene on '-' strand
    194      read2 mapped to '+' strand indicates parental gene on '-' strand
    195      read2 mapped to '-' strand indicates parental gene on '+' strand
    196   ii) 1+-,1-+,2++,2--
    197      read1 mapped to '+' strand indicates parental gene on '-' strand
    198      read1 mapped to '-' strand indicates parental gene on '+' strand
    199      read2 mapped to '+' strand indicates parental gene on '+' strand
    200      read2 mapped to '-' strand indicates parental gene on '-' strand
    201 
    202 For single-end RNA-seq, there are two different ways to strand reads:
    203   i) ++,--
    204      read mapped to '+' strand indicates parental gene on '+' strand
    205      read mapped to '-' strand indicates parental gene on '-' strand
    206   ii) +-,-+
    207      read mapped to '+' strand indicates parental gene on '-' strand
    208      read mapped to '-' strand indicates parental gene on '+' strand
    209 
    210 
    211 }}}
    212126
    213127=== Split by strand by matched strand ===