| 127 | | ==== [http://rseqc.sourceforge.net/ | RSeQC]: RNA-Seq quality control package for getting mapping statistics (eg. unique/multi-mapped reads) ==== |
| 128 | | {{{ |
| 129 | | bam_stat.py -i myFile.bam |
| 130 | | }}} |
| 131 | | |
| 132 | | ==== [http://broadinstitute.github.io/picard/ | Picard]: CollectRnaSeqMetrics.jar to find coverage across gene body for 5' or 3' bias ==== |
| 133 | | {{{ |
| 134 | | java -jar /usr/local/share/picard-tools/CollectRnaSeqMetrics.jar INPUT=accepted_hits.bam REF_FLAT=refFlat.txt STRAND_SPECIFICITY=NONE OUTPUT=Out_RnaSeqMetrics.txt REFERENCE_SEQUENCE=hg19.fa CHART_OUTPUT=Out_RnaSeqMetrics.pdf |
| 135 | | }}} |
| 136 | | |
| 137 | | If you get an "SequenceListsDifferException" error from picard (using a BAM file from TopHat, for example), you may first need to reorder the header BAM header with a command like |
| 138 | | {{{ |
| 139 | | java -jar /usr/local/share/picard-tools/ReorderSam.jar INPUT=accepted_hits.bam OUTPUT=accepted_hits.reordered.bam REFERENCE=/path/to/reference/genome.fa |
| 140 | | }}} |
| 141 | | |
| 142 | | ==== [http://qualimap.bioinfo.cipf.es/ | QualiMap]: can be used on DNA or RNA-Seq to get summary of mapping and coverage/distribution ==== |
| 143 | | {{{ |
| 144 | | # Graphical interface: enter 'qualimap' on the command line |
| 145 | | # Command line: |
| 146 | | unset DISPLAY #needed for submitting to cluster |
| 147 | | bsub "qualimap bamqc -bam myFile.bam -outdir output_qualimap" |
| 148 | | # For huge data, you can increase memory with --java-mem-size="4800M" to avoid OutOfMemoryError: Java heap space |
| 149 | | |
| 150 | | |
| 151 | | #rnaseq qc |
| 152 | | bsub "qualimap rnaseq -bam myFile.bam -gtf Homo_sapiens.GRCh37.72.canonical.gtf -outdir output_qualimap_rnaseq -p non-strand-specific" |
| 153 | | |
| 154 | | #counts qc (after using htseq-count or similar program to generate a matrix of counts) |
| 155 | | qualimap counts -d countsqc_input.txt -c -s HUMAN -outdir counts_qc |
| 156 | | #Format of countsqc_input.txt (below), totalCounts.txt is a matrix of counts; header lines must be commented "#" and species is human or mouse only. |
| 157 | | #Sample Condition Path Column |
| 158 | | HMLE1 HMLE totalCounts.txt 2 |
| 159 | | HMLE2 HMLE totalCounts.txt 3 |
| 160 | | HMLE3 HMLE totalCounts.txt 4 |
| 161 | | N81 N8 totalCounts.txt 5 |
| 162 | | N82 N8 totalCounts.txt 6 |
| 163 | | N83 N8 totalCounts.txt 7 |
| 164 | | |
| 165 | | }}} |
| 166 | | |
| 167 | | |
| 168 | | ==== infer_experiment.py from RseQC package: can be used to check if the RNA-seq reads are stranded. ==== |
| 169 | | {{{ |
| 170 | | |
| 171 | | # Command line: |
| 172 | | |
| 173 | | bsub infer_experiment.py -i accepted_hits.bam -r hs.bed |
| 174 | | |
| 175 | | -i INPUT_FILE in SAM or BAM format |
| 176 | | -r Reference gene model in bed fomat. |
| 177 | | |
| 178 | | # sample output on strand-specific PE reads: |
| 179 | | This is PairEnd Data |
| 180 | | Fraction of reads explained by "1++,1--,2+-,2-+": 0.0193 |
| 181 | | Fraction of reads explained by "1+-,1-+,2++,2--": 0.9807 |
| 182 | | Fraction of reads explained by other combinations: 0.0000 |
| 183 | | |
| 184 | | # sample output on non-stranded PE reads: |
| 185 | | This is PairEnd Data |
| 186 | | Fraction of reads explained by "1++,1--,2+-,2-+": 0.5103 |
| 187 | | Fraction of reads explained by "1+-,1-+,2++,2--": 0.4897 |
| 188 | | Fraction of reads explained by other combinations: 0.0000 |
| 189 | | |
| 190 | | For pair-end RNA-seq, there are two different ways to strand reads: |
| 191 | | i) 1++,1--,2+-,2-+ |
| 192 | | read1 mapped to '+' strand indicates parental gene on '+' strand |
| 193 | | read1 mapped to '-' strand indicates parental gene on '-' strand |
| 194 | | read2 mapped to '+' strand indicates parental gene on '-' strand |
| 195 | | read2 mapped to '-' strand indicates parental gene on '+' strand |
| 196 | | ii) 1+-,1-+,2++,2-- |
| 197 | | read1 mapped to '+' strand indicates parental gene on '-' strand |
| 198 | | read1 mapped to '-' strand indicates parental gene on '+' strand |
| 199 | | read2 mapped to '+' strand indicates parental gene on '+' strand |
| 200 | | read2 mapped to '-' strand indicates parental gene on '-' strand |
| 201 | | |
| 202 | | For single-end RNA-seq, there are two different ways to strand reads: |
| 203 | | i) ++,-- |
| 204 | | read mapped to '+' strand indicates parental gene on '+' strand |
| 205 | | read mapped to '-' strand indicates parental gene on '-' strand |
| 206 | | ii) +-,-+ |
| 207 | | read mapped to '+' strand indicates parental gene on '-' strand |
| 208 | | read mapped to '-' strand indicates parental gene on '+' strand |
| 209 | | |
| 210 | | |
| 211 | | }}} |