Changes between Version 33 and Version 34 of SOPs/miningSAMBAM

Timestamp: 04/27/16 11:14:03 (9 years ago)
Author: ibarrasa
Comment: --

SOPs/miningSAMBAM

-              v33
+              v34
 == SAM/BAM summarizing, processing and quality control(QC) ==
+== SAM/BAM summarizing and processing ==
 Many of these involve [http://samtools.sourceforge.net/samtools.shtml samtools]
 …
    * Converting a flag into its components may be easiest with the Picard [https://broadinstitute.github.io/picard/explain-flags.html Explain SAM Flags] tool.
-=== QC to get a (visual) summary of mapping statistics, e.g. coverage/distribution of mapped reads across the genome or transcriptome ===
-==== [http://rseqc.sourceforge.net/ | RSeQC]: RNA-Seq quality control package for getting mapping statistics (e.g. unique/multi-mapped reads) ====
-{{{
-bam_stat.py -i myFile.bam
-}}}
-==== [http://broadinstitute.github.io/picard/ | Picard]: CollectRnaSeqMetrics.jar to find coverage across gene body for 5' or 3' bias ====
-{{{
-java -jar /usr/local/share/picard-tools/CollectRnaSeqMetrics.jar INPUT=accepted_hits.bam REF_FLAT=refFlat.txt STRAND_SPECIFICITY=NONE OUTPUT=Out_RnaSeqMetrics.txt REFERENCE_SEQUENCE=hg19.fa CHART_OUTPUT=Out_RnaSeqMetrics.pdf
-}}}
-If you get a "SequenceListsDifferException" error from Picard (using a BAM file from TopHat, for example), you may first need to reorder the BAM file so that its sequence order matches the reference, with a command like
-{{{
-java -jar /usr/local/share/picard-tools/ReorderSam.jar INPUT=accepted_hits.bam OUTPUT=accepted_hits.reordered.bam REFERENCE=/path/to/reference/genome.fa
-}}}
-==== [http://qualimap.bioinfo.cipf.es/ | QualiMap]: can be used on DNA or RNA-Seq to get summary of mapping and coverage/distribution ====
-{{{
-# Graphical interface: enter 'qualimap' on the command line
-# Command line:
-unset DISPLAY  #needed for submitting to cluster
-bsub "qualimap bamqc -bam myFile.bam -outdir output_qualimap"
-# For huge data, you can increase memory with --java-mem-size="4800M" to avoid OutOfMemoryError: Java heap space
-#rnaseq qc
-bsub "qualimap rnaseq -bam myFile.bam -gtf Homo_sapiens.GRCh37.72.canonical.gtf -outdir output_qualimap_rnaseq -p non-strand-specific"
-#counts qc (after using htseq-count or similar program to generate a matrix of counts)
-qualimap counts -d countsqc_input.txt -c -s HUMAN -outdir counts_qc
-#Format of countsqc_input.txt (below), totalCounts.txt is a matrix of counts; header lines must be commented "#" and species is human or mouse only.
-#Sample Condition       Path    Column
-HMLE1   HMLE    totalCounts.txt 2
-HMLE2   HMLE    totalCounts.txt 3
-HMLE3   HMLE    totalCounts.txt 4
-N81     N8      totalCounts.txt 5
-N82     N8      totalCounts.txt 6
-N83     N8      totalCounts.txt 7
-}}}
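The description file shown in the removed block above follows a regular pattern (one row per sample, with the column index into the counts matrix increasing by one), so it can be generated rather than typed by hand. The sketch below is illustrative only: `make_countsqc_input` is a hypothetical helper name, the `sample:condition` argument format is an assumption made for this example, and it assumes the file is tab-delimited and that samples occupy consecutive columns starting at column 2, as in the example rows.

```shell
# Hypothetical helper (not part of qualimap): print a counts-QC description file.
# Usage: make_countsqc_input COUNTS_FILE SAMPLE:CONDITION [SAMPLE:CONDITION ...]
# Assumes tab-delimited output and consecutive sample columns starting at 2.
make_countsqc_input() {
    counts_file=$1
    shift
    # Header line must be commented with "#"
    printf '#Sample\tCondition\tPath\tColumn\n'
    col=2
    for pair in "$@"; do
        sample=${pair%%:*}      # text before the first ':'
        condition=${pair#*:}    # text after the first ':'
        printf '%s\t%s\t%s\t%d\n' "$sample" "$condition" "$counts_file" "$col"
        col=$((col + 1))
    done
}
```

For the six samples above this could be called as `make_countsqc_input totalCounts.txt HMLE1:HMLE HMLE2:HMLE HMLE3:HMLE N81:N8 N82:N8 N83:N8 > countsqc_input.txt`.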
-==== infer_experiment.py from the RSeQC package: can be used to check if the RNA-seq reads are stranded. ====
-{{{
-# Command line:
-bsub infer_experiment.py -i accepted_hits.bam -r hs.bed
--i INPUT_FILE in SAM or BAM format
--r Reference gene model in BED format.
-# sample output on strand-specific PE reads:
-This is PairEnd Data
-Fraction of reads explained by "1++,1--,2+-,2-+": 0.0193
-Fraction of reads explained by "1+-,1-+,2++,2--": 0.9807
-Fraction of reads explained by other combinations: 0.0000
-# sample output on non-stranded PE reads:
-This is PairEnd Data
-Fraction of reads explained by "1++,1--,2+-,2-+": 0.5103
-Fraction of reads explained by "1+-,1-+,2++,2--": 0.4897
-Fraction of reads explained by other combinations: 0.0000
-For pair-end RNA-seq, there are two different ways to strand reads:
-  i) 1++,1--,2+-,2-+
-     read1 mapped to '+' strand indicates parental gene on '+' strand
-     read1 mapped to '-' strand indicates parental gene on '-' strand
-     read2 mapped to '+' strand indicates parental gene on '-' strand
-     read2 mapped to '-' strand indicates parental gene on '+' strand
-  ii) 1+-,1-+,2++,2--
-     read1 mapped to '+' strand indicates parental gene on '-' strand
-     read1 mapped to '-' strand indicates parental gene on '+' strand
-     read2 mapped to '+' strand indicates parental gene on '+' strand
-     read2 mapped to '-' strand indicates parental gene on '-' strand
-For single-end RNA-seq, there are two different ways to strand reads:
-  i) ++,--
-     read mapped to '+' strand indicates parental gene on '+' strand
-     read mapped to '-' strand indicates parental gene on '-' strand
-  ii) +-,-+
-     read mapped to '+' strand indicates parental gene on '-' strand
-     read mapped to '-' strand indicates parental gene on '+' strand
-}}}
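The two sample outputs in the removed block above can be told apart mechanically by comparing the two "Fraction of reads explained by" values. The following is a minimal sketch, assuming the output of infer_experiment.py has been saved to a text file in the format shown. `classify_strandedness`, its output labels, and the 0.8 cutoff are all hypothetical choices made for illustration, not part of RSeQC.

```shell
# Hypothetical helper: classify saved infer_experiment.py output.
# Usage: classify_strandedness OUTPUT_FILE [CUTOFF]
# The default cutoff of 0.8 is an illustrative assumption, not an RSeQC default.
classify_strandedness() {
    awk -F': ' -v cutoff="${2:-0.8}" '
        # read1 (or the single read) on the same strand as the gene
        index($0, "1++,1--,2+-,2-+") || index($0, "++,--") { sense = $2 }
        # read1 (or the single read) on the opposite strand from the gene
        index($0, "1+-,1-+,2++,2--") || index($0, "+-,-+") { antisense = $2 }
        END {
            if (sense == "" || antisense == "") { print "unknown"; exit }
            if (sense + 0 >= cutoff)          print "read1-sense"
            else if (antisense + 0 >= cutoff) print "read1-antisense"
            else                              print "unstranded"
        }' "$1"
}
```

A typical use would be `infer_experiment.py -i accepted_hits.bam -r hs.bed > strand.txt` followed by `classify_strandedness strand.txt`. Libraries that fall between the two extremes are reported as "unstranded" here, and are worth inspecting by hand.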
 === Split by strand by matched strand ===