Changes between Version 38 and Version 39 of SOPs/rna-seq-diff-expressions


Ignore:
Timestamp:
05/10/17 10:13:13 (8 years ago)
Author:
thiruvil
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/rna-seq-diff-expressions

    v38 v39  
    3434      * Counting can be also performed transcript level but this may be too much information.
    3535      * htseq-count can accept SAM or BAM (-f option), for example
    36          * ''htseq-count -f bam -m intersection-strict --stranded=reverse MySample.accepted_hits.sortedByName.bam gene_models.gtf >| MySample.counts.txt''
    37          * Note that htseq-count assumes that your reads are strand-specific; default is --stranded=yes
    38          * If your reads are not stranded, use "--stranded=no" (or half of your reads won't be counted).  If they're stranded and in the opposite direction from the transcript, use "--stranded=reverse".
    39          * Note the "-" can be used for input from standard input (stdin)
    40          * For paired-end reads the sam file has to be sorted by read name: ''bsub  "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"''
    41          * To request a certain amount of memory and a specific node use ''bsub  -R "rusage[mem=50000]" -m NodeName ''
    42          * Remove the rows at the bottom with descriptions like no_feature, ambiguous, etc.
     36
     37{{{
     38#Note that htseq-count assumes that your reads are strand-specific; default is --stranded=yes
     39#If your reads are not stranded, use "--stranded=no" (or half of your reads won't be counted).
     40#If they're stranded and in the opposite direction from the transcript, use "--stranded=reverse".
     41
     42htseq-count -f bam -m intersection-strict --stranded=reverse MySample.accepted_hits.sortedByName.bam gene_models.gtf > MySample.counts.txt
     43
     44#Examine to see the last rows at the bottom with descriptions like no_feature, ambiguous, etc.
     45#if too many reads are thrown out.  Otherwise, these rows can be removed for downstream analysis.
     46
     47#For PE reads the bam files needs to be sorted by name (default for htseq-count), eg.
     48#bsub "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"
     49#If the bam file is sorted by coordinate you may try htseq-count -r option, eg. -r pos , however,
     50#this may not always work (htseq-count throws numerous errors).
     51
     52#To request a certain amount of memory and a specific node use bsub -R "rusage[mem=50000]" -m NodeName
     53 
     54}}}
     55       
    4356    * Another tool to use [[http://bioinf.wehi.edu.au/featureCounts/|featureCounts]], part of the [[http://subread.sourceforge.net/|Subread]] package
    4457      * featureCounts is much faster than htseq-count, but the details of its counting method is quite different from that of htseq-count, especially for paired-end reads
    4558      * See [[http://www.ncbi.nlm.nih.gov/pubmed/24227677|Liao et al., 2014]] for details of the method (and comparisons with other counting tools)
    4659      * Sample commands:
    47          * ''featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam'' (single-end reads)
    48          * ''featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam'' (paired-end reads)
     60{{{
     61#single-end reads
     62featureCounts -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam
     63#PE reads
     64featureCounts -p -a gene_anotations.gtf -o MySample.featureCounts.txt MySample.bam
     65}}}
    4966    * For some analyses (or for visualization), you can add a pseudocount (such as 1 or another small number) to all genes in all samples to prevent log2 ratios that require dividing by 0 and reduce background count noise -- BUT be aware that some statistical methods (like DESeq) require raw input values without any pseudocounts or normalization.
    5067