== Using RNA-Seq to quantify gene levels and assay for differential expression of transposable elements ==

=== Background ===
 * Transposable elements make up between 20% and 80% of the genome sequence of many eukaryotes, yet they are typically excluded from the analyses that follow transcriptomic profiling with RNA-seq. This exclusion stems from the repetitive nature of transposons, which makes it ambiguous where to assign reads that map to multiple locations (multi-mapping reads).

=== Step by step analysis ===
 * **Mapping**
  * Use [https://github.com/alexdobin/STAR STAR] or another spliced mapper to map short reads to the genome of choice.
  * See our [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping mapping SOP] for more details.
 * **Quantification of raw counts**
  * Is your sequencing library stranded or unstranded? Counting tools such as featureCounts need this information to assign reads to features accurately. Strandedness of some library prep methods:
   * TruSeq Stranded mRNA Kits ("TruSeqStrandedPolyA"): reads are reverse stranded (read 1 maps to the strand opposite the transcript).
   * SMART-Seq v4 Ultra Low Input RNA Kit ("SMARTerUltra-lowPOLYA-V4"): reads are unstranded.
   * KAPA RNA HyperPrep Kits ("KAPAHyperPrepmRNA"): reads are reverse stranded.
   * The Whitehead Genome Core has more [http://genomecore.wi.mit.edu/index.php/NCBISubmission Library Prep Descriptions].
  * See [[SAMBAMqc]] (and/or look at mapped reads in a genome browser) to determine or verify strandedness.
{{{
# -s 0 (the default) = unstranded, -s 1 = forward stranded, -s 2 = reverse stranded

# single-end reads (unstranded)
featureCounts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
# single-end reads (forward stranded)
featureCounts -s 1 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
# single-end reads (reverse stranded)
featureCounts -s 2 -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam

# paired-end reads: -p declares paired-end input; with Subread 2.0.2 or later,
# also add --countReadPairs to count read pairs (fragments) instead of reads
# paired-end reads (unstranded)
featureCounts -p -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
# paired-end reads (forward stranded); several BAM files can be given at once,
# and featureCounts writes one count column per file
featureCounts -p -s 1 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
# paired-end reads (reverse stranded)
featureCounts -p -s 2 -a gene_annotations.gtf -o MySamples.featureCounts.txt *sortedByName.bam
}}}
 * **Other**
  * Review articles:
   * [[http://www.ncbi.nlm.nih.gov/pubmed/24020486|Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.]] - Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D. Genome Biol. 2013 Sep 10;14(9):R95.
   * [[http://www.ncbi.nlm.nih.gov/pubmed/21106489|A survey of statistical software for analyzing RNA-seq data.]] - Gao D, Kim J, Kim H, Phang TL, Selby H, Tan AC, Tong T. Hum Genomics. 2010 Oct;5(1):56-60.
   * [[http://www.ncbi.nlm.nih.gov/pubmed/21176179|From RNA-seq reads to differential expression results.]] - Oshlack A, Robinson MD, Young MD. Genome Biol. 2010;11(12):220. Epub 2010 Dec 22.
  * For more practical information, see the third session of [http://jura.wi.mit.edu/bio/education/R2011/ An introduction to R and Bioconductor: A BaRC Short Course] and the [http://jura.wi.mit.edu/bio/hot_topics/ BaRC Hot Topic] (under "Short Read Sequencing", see "Practical RNA-Seq analysis").
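
The strandedness choices in the quantification step can be captured in a small shell helper so that every sample in a batch gets the same featureCounts setting. This is a minimal sketch, not a BaRC tool: the function names `strand_flag` and `featurecounts_opts` are hypothetical, only the three library prep names listed on this page are covered, and the script prints the featureCounts options (a dry run) rather than running featureCounts. A kit-based default is only a starting point; verify strandedness with [[SAMBAMqc]] or a genome browser.

{{{
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers, not a BaRC tool): map a library prep
# name from the list on this page to a featureCounts -s value, then print the
# featureCounts options for a given read layout. Nothing is executed.

# Prints the -s value for a known library prep name; fails for unknown names.
strand_flag() {
  case "$1" in
    TruSeqStrandedPolyA|KAPAHyperPrepmRNA) echo 2 ;;  # reverse stranded
    SMARTerUltra-lowPOLYA-V4) echo 0 ;;               # unstranded
    *) echo "unknown library prep: $1" >&2; return 1 ;;
  esac
}

# Prints featureCounts options for a layout ("single" or "paired") and a kit.
featurecounts_opts() {
  local layout="$1" kit="$2" s
  s=$(strand_flag "$kit") || return 1
  case "$layout" in
    single) printf '%s\n' "-s $s" ;;
    paired) printf '%s\n' "-p -s $s" ;;
    *) echo "unknown layout: $layout" >&2; return 1 ;;
  esac
}

# Dry run: print (not run) the command for one paired-end TruSeq sample.
opts=$(featurecounts_opts paired TruSeqStrandedPolyA)
echo featureCounts $opts -a gene_annotations.gtf -o MySample.featureCounts.txt MySample.bam
}}}

Keeping the kit-to-flag mapping in one place means an unrecognized kit name fails loudly instead of silently falling back to unstranded counting.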