SOPs/InProgress – BaRC Wiki

What is the goal of your experiment?
- For typical RNA-seq expression level quantification, a read or read pair gets one count, regardless of its length. As a result, shorter reads may provide data that are just as good, as long as they aren't so short that they map to multiple (repetitive) locations.
- Longer and/or paired-end reads are beneficial if the experimental goal is
  - novel gene discovery: longer reads are much better at identifying novel splice junctions
- For variant discovery, coverage is key, whether it comes from fewer longer reads or more shorter reads (as long as the reads are long enough to map uniquely).
How much of the read length is used for primers, adapters, barcodes, etc.? Make sure that enough of each read is actual experimental DNA for effective mapping.
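The overhead question above is simple subtraction, sketched here in Python. The function name and the example numbers (a 50-nt read with a 6-nt in-line barcode) are hypothetical and not taken from any particular protocol.

```python
def usable_read_length(read_length_nt, *overhead_nt):
    """Return the number of bases left for mapping after subtracting
    bases spent on primers, adapters, barcodes, etc."""
    usable = read_length_nt - sum(overhead_nt)
    if usable <= 0:
        raise ValueError("no experimental sequence left after overhead")
    return usable

# Hypothetical example: 50-nt read, 6 nt of which are an in-line barcode
print(usable_read_length(50, 6))
```

Whether the remaining length is enough to map uniquely depends on the genome and the aligner, so treat the result as a starting point only.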

The magnitude of a lane effect is typically small but is rarely zero.
To balance any lane effect, sequence all of your samples on each of your lanes.
Another benefit of barcoding and mixing all samples together is that the samples can be re-sequenced in other lanes in the future (from the same library preparation) without unbalancing the experimental design.
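As a rough planning aid for the pooled design described above, the expected reads per sample can be estimated as follows. This is a sketch that assumes equal pooling and the same yield on every lane; the function name and the lane yield in the example (200 million reads) are hypothetical.

```python
def reads_per_sample(lane_yield_reads, n_lanes, n_samples):
    """Expected reads per sample when all barcoded samples are pooled
    equally and the same pool is run on every lane."""
    per_lane = lane_yield_reads // n_samples  # each sample's share of one lane
    return per_lane * n_lanes                 # summed over all lanes

# Hypothetical example: 8 samples pooled across 2 lanes of 200 million reads each
print(reads_per_sample(200_000_000, 2, 8))
```

Real pools are never perfectly even, so actual per-sample counts will vary around this estimate.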

Some useful references:
- Sims et al., 2014. Sequencing depth and coverage: key considerations in genomic analyses.
  - Includes methods to estimate the number of reads required for single nucleotide variant calling, and RNA-seq and ChIP-seq experiments
- Ajay et al., 2011. Accurate and comprehensive sequencing of personal genomes.
  - Includes methods to estimate the number of reads required for single nucleotide variant calling

Example 1 (genome sequencing): For a genome of 3e+9 nt, to get 35x coverage we would need:
- For 40-nt reads:
  - 3e+9 * 35 / 40 = 2.625e+09 => ~2.6 billion reads
- For 100-nt reads:
  - 3e+9 * 35 / 100 = 1.05e+09 => ~1 billion reads
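The same arithmetic as a small Python sketch, for trying other genome sizes, coverages, or read lengths. The function name is illustrative. It assumes single-end reads and that every base of every read maps to the genome, so the result is a lower bound.

```python
def reads_needed(genome_size_nt, coverage, read_length_nt):
    """Number of reads needed for a target average coverage:
    genome size * coverage / read length, rounded up."""
    total_bases = genome_size_nt * coverage
    return -(-total_bases // read_length_nt)  # integer ceiling division

for read_length in (40, 100):
    n = reads_needed(3_000_000_000, 35, read_length)
    print(f"{read_length}-nt reads: {n:,} reads")
```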
