| 30 | | * Typically we use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene |
| | 30 | |
| | 31 | * Currently our favorite tool for this is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package. |
| | 32 | * featureCounts is much faster than htseq-count, but the details of its counting method is quite different from that of htseq-count, especially for paired-end reads |
| | 33 | * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools) |
| | 34 | * featureCounts needs the paired-read BAM file to be sorted by read ID, but if it isn't, it'll do the sorting. |
| | 35 | * Sample commands: |
| | 36 | {{{ |
| | 37 | # single-end reads (unstranded) |
| | 38 | featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| | 39 | # single-end reads (forward stranded) |
| | 40 | featureCounts -s 1 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| | 41 | # single-end reads (reverse stranded) |
| | 42 | featureCounts -s 2 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| | 43 | |
| | 44 | |
| | 45 | # paired-end reads (unstranded) |
| | 46 | featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| | 47 | # paired-end reads (forward stranded) |
| | 48 | featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
| | 49 | # paired-end reads (reverse stranded) |
| | 50 | featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
| | 51 | }}} |
| | 52 | |
| | 53 | * [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] works fine to get counts for each gene, but it's quite slow. |
| 56 | | |
| 57 | | * Another tool to use [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package |
| 58 | | * featureCounts is much faster than htseq-count, but the details of its counting method is quite different from that of htseq-count, especially for paired-end reads |
| 59 | | * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools) |
| 60 | | * featureCounts needs the paired-read BAM file to be sorted by read ID, but if it isn't, it'll do the sorting. |
| 61 | | * Sample commands: |
| 62 | | {{{ |
| 63 | | # single-end reads (unstranded) |
| 64 | | featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 65 | | # single-end reads (forward stranded) |
| 66 | | featureCounts -s 1 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 67 | | # single-end reads (reverse stranded) |
| 68 | | featureCounts -s 2 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 69 | | |
| 70 | | |
| 71 | | # paired-end reads (unstranded) |
| 72 | | featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 73 | | # paired-end reads (forward stranded) |
| 74 | | featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
| 75 | | # paired-end reads (reverse stranded) |
| 76 | | featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
| 77 | | }}} |
| 78 | | |
| 79 | | * For some analyses (or for visualization), you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization. |
| | 79 | |