30 | | * Typically we use [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] to get counts for each gene |
| 30 | |
| 31 | * Currently our favorite tool for this is [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package. |
| 32 | * featureCounts is much faster than htseq-count, but the details of its counting method is quite different from that of htseq-count, especially for paired-end reads |
| 33 | * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools) |
| 34 | * featureCounts needs the paired-read BAM file to be sorted by read ID, but if it isn't, it'll do the sorting. |
| 35 | * Sample commands: |
| 36 | {{{ |
| 37 | # single-end reads (unstranded) |
| 38 | featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 39 | # single-end reads (forward stranded) |
| 40 | featureCounts -s 1 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 41 | # single-end reads (reverse stranded) |
| 42 | featureCounts -s 2 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 43 | |
| 44 | |
| 45 | # paired-end reads (unstranded) |
| 46 | featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
| 47 | # paired-end reads (forward stranded) |
| 48 | featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
| 49 | # paired-end reads (reverse stranded) |
| 50 | featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
| 51 | }}} |
| 52 | |
| 53 | * [[http://www-huber.embl.de/users/anders/HTSeq/doc/count.html|htseq-count]] works fine to get counts for each gene, but it's quite slow. |
56 | | |
57 | | * Another tool to use [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package |
58 | | * featureCounts is much faster than htseq-count, but the details of its counting method is quite different from that of htseq-count, especially for paired-end reads |
59 | | * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools) |
60 | | * featureCounts needs the paired-read BAM file to be sorted by read ID, but if it isn't, it'll do the sorting. |
61 | | * Sample commands: |
62 | | {{{ |
63 | | # single-end reads (unstranded) |
64 | | featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
65 | | # single-end reads (forward stranded) |
66 | | featureCounts -s 1 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
67 | | # single-end reads (reverse stranded) |
68 | | featureCounts -s 2 -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
69 | | |
70 | | |
71 | | # paired-end reads (unstranded) |
72 | | featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam |
73 | | # paired-end reads (forward stranded) |
74 | | featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
75 | | # paired-end reads (reverse stranded) |
76 | | featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam |
77 | | }}} |
78 | | |
79 | | * For some analyses (or for visualization), you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization. |
| 79 | |