Changes between Initial Version and Version 1 of SOPs/ShortReadExpDesign


Timestamp:
07/22/16 09:24:15 (9 years ago)
Author:
gbell

  • SOPs/ShortReadExpDesign

    v1 v1  
     1
     2= Experimental design of short read sequencing experiments =
     3
     4== How long should the reads be?  Should they be single or paired-end? ==
     5
     6 * What is the goal of your experiment?
     7   * For typical RNA-seq expression level quantification, a read or read pair gets one count, regardless of its length.  As a result, shorter reads may provide data that are just as good, as long as they aren't so short that many of them map to multiple locations.
     8   * Longer and/or paired-end reads are clearly beneficial if the experimental goal is
     9       * novel gene discovery: longer reads are much better at identifying novel splice junctions
     10   * For variant discovery, coverage is key, whether it comes from fewer long reads or more short reads (as long as the reads are long enough to map uniquely)
     11 * How much of the read length is taken up by primers, adapters, barcodes, etc.?  Make sure that enough of the actual experimental DNA is left for effective mapping.
     12
     13== If you are able to sequence more than one lane, how should the samples be partitioned? ==
     14
     15   * The magnitude of a lane effect is typically small but non-zero.
     16   * To balance any lane effect, barcode your samples, pool them, and sequence the pool on each of your lanes.
     17   * Another benefit of barcoding and mixing all samples together is that the samples can be re-sequenced in other lanes in the future (from the same library preparation) without unbalancing the experimental design.
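
The balanced layout described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `balanced_lane_layout`, the sample names, and the use of a simple integer as a stand-in for a barcode are all hypothetical and not part of this SOP.

```python
from itertools import product

def balanced_lane_layout(samples, lanes):
    """Return (lane, sample, barcode_id) rows with every sample on every lane.

    Hypothetical helper: barcode_id is just a 1-based index standing in
    for a real barcode sequence.
    """
    barcode_ids = {sample: i + 1 for i, sample in enumerate(samples)}
    return [(lane, sample, barcode_ids[sample])
            for lane, sample in product(lanes, samples)]

# Hypothetical experiment: four barcoded samples pooled and run on two lanes.
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
lanes = [1, 2]
layout = balanced_lane_layout(samples, lanes)

for lane, sample, barcode_id in layout:
    print(f"lane {lane}: {sample} (barcode {barcode_id})")
```

Because each lane holds the same pool, any lane effect is shared equally by all samples, and adding another lane of the same pool later keeps the design balanced.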
     18
     19== How many reads are needed for each sample? ==
     20
     21== Calculating number of DNA or RNA reads needed to obtain the desired coverage ==
     22
     23 * Some useful references:
     24   * Sims et al., 2014.  [http://www.ncbi.nlm.nih.gov/pubmed/24434847 Sequencing depth and coverage: key considerations in genomic analyses.] 
     25       * Includes methods to estimate the number of reads required for single nucleotide variant calling, and RNA-seq and ChIP-seq experiments
     26   * Ajay et al., 2011. [http://www.ncbi.nlm.nih.gov/pubmed/21771779/ Accurate and comprehensive sequencing of personal genomes.] 
     27       * Includes methods to estimate the number of reads required for single nucleotide variant calling
     28
     29 * ''Example 1'' (genome sequencing): For a genome of 3e+9 nt, to get 35x coverage we would need:
     30   * For 40-nt reads:
     31      * 3e+9 * 35 / 40 = 2.625e+09 => ~2.6 billion reads
     32   * For 100-nt reads:
     33      * 3e+9 * 35 / 100 = 1.05e+09 => ~1 billion reads
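
The arithmetic in Example 1 can be wrapped in a small Python helper. The name `reads_needed` and its parameters are illustrative, not part of this SOP. The sketch assumes that every sequenced base maps to the genome and ignores duplicates and uneven coverage, so the result is best read as a lower bound.

```python
import math

def reads_needed(genome_size_nt, coverage, read_length_nt):
    """Minimum number of reads for a target mean coverage (hypothetical helper).

    Assumes every base of every read maps to the genome.
    """
    return math.ceil(genome_size_nt * coverage / read_length_nt)

# Same inputs as Example 1: 3e+9 nt genome, 35x coverage.
for read_length in (40, 100):
    n = reads_needed(3e9, 35, read_length)
    print(f"{read_length}-nt reads: {n:.3e} reads")
```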
     34
     35 * ''Example 2'' (RNA-seq experiment):
     36   * If we have
     37     * 6 million pairs of 35x35-nt paired-end reads
     38     * a genome with ~7000 genes expressed
     39     * average gene length = 5741 bp
     40   * then the total length of the transcriptome is 7000 x 5741 => 40,187,000 nt
     41   * and the total length of the reads is 6 million x 70 nt [35 + 35] => 420,000,000 nt
     42   * so the average coverage will be 420,000,000 / 40,187,000 => ~10x
     43   * but note that coverage will be very irregular due to a wide range of expression levels
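
The reverse calculation in Example 2 (reads in hand, coverage out) can be sketched the same way. The function name `mean_coverage` and its parameter names are assumptions made for illustration. The result is only a transcriptome-wide average: it treats every expressed gene as equally abundant, which real expression data will not match.

```python
def mean_coverage(n_fragments, nt_per_fragment, n_genes, mean_gene_length_nt):
    """Average coverage of the expressed transcriptome (hypothetical helper).

    n_fragments: number of reads (single-end) or read pairs (paired-end)
    nt_per_fragment: sequenced bases per fragment, e.g. 35 + 35 for 35x35
    """
    total_read_nt = n_fragments * nt_per_fragment
    transcriptome_nt = n_genes * mean_gene_length_nt
    return total_read_nt / transcriptome_nt

# Inputs as stated in Example 2.
cov = mean_coverage(6_000_000, 35 + 35, 7000, 5741)
print(f"average coverage: ~{cov:.1f}x")
```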
     44
     45
     46