Changes between Version 23 and Version 24 of SOPs/qc_shortReads


Ignore:
Timestamp:
04/27/16 11:20:16 (9 years ago)
Author:
ibarrasa
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SOPs/qc_shortReads

    v23 v24  
    243243= Analyzing short read quality (after mapping) =
    244244
    245 == Remove Duplicates ==
    246   * Remove duplicates, for eg. from PCR
    247 
    248  {{{
    249    #samtools command
    250     samtools rmdup [-sS] <input.srt.bam> <output.bam>
    251     -s or -S depending on PE data or not
    252 }}}
    253 
    254 == Determining the paired-end insert size for DNA samples ==
    255 
    256 If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running Bowtie. 
    257 
    258 When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping.  See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for mapping details.
    259 
    260 
    261 Method 1: Get insert sizes from BAM file
    262 {{{
    263    # Using a SAM file (at Unix command prompt)
    264    awk -F "\t" '$9 > 0 {print $9}' s_1_bowtie.sam > s_1_insert_sizes.txt
    265    # Using a BAM file (at Unix command prompt)
    266    samtools view s_1_bowtie.bam | awk -F"\t" '$9 > 0 {print $9}' > s_1_insert_sizes.txt
    267 
    268    # and then process column of numbers with R (or Excel)
    269    # In R Session
    270    sizeFile = "s_1_insert_sizes.txt"
    271    sample.name = "My paired reads"
    272    distance = read.delim(sizeFile, h=F)[,1]
    273    pdf(paste(sample.name, "insert.size.histogram.pdf", sep="."), w=11, h=8.5)
    274    hist(distance, breaks=200, col="wheat", main=paste("Insert sizes for", sample.name), xlab="length (nt)")
    275    dev.off()
    276 }}}
    277 
    278 Method 2: Calculate insert sizes with CollectInsertSizeMetrics function from picard (http://picard.sourceforge.net).  This is also a good approximation for RNA samples.
    279 {{{
    280    #
    281    # I=File    Input SAM or BAM file.  (Required)
    282    # O=File    File to write the output to.  (Required)
    283    # H=File    File to write insert size histogram chart to.  (Required)
    284    # output: CollectInsertSizeMetrics.txt: values for -r and --mate-std-dev can be found in this text file
    285    #         CollectInsertSizeMetrics_hist.pdf: insert size histogram (graphic representation)
    286 bsub java -jar  /usr/local/share/picard-tools/CollectInsertSizeMetrics.jar I=foo.bam O=CollectInsertSizeMetrics.txt H=CollectInsertSizeMetrics_hist.pdf
    287 }}}
    288 
    289 == [RNA-seq only] Get global coverage profile across transcripts ==
    290 
    291 Do reads come from across the length of a typical transcript, or is there 3' or 5' bias (where most reads come from one end of a typical transcript)?
    292 
    293 One way to look at this is with Picard's CollectRnaSeqMetrics tool
    294 {{{
    295    # Usage:
    296    java -jar /usr/local/share/picard-tools/CollectRnaSeqMetrics.jar INPUT=bamFile REF_FLAT=refFlatFile STRAND_SPECIFICITY=NONE OUTPUT=outputFile REFERENCE_SEQUENCE=/path/to/genome.fa CHART_OUTPUT=output.pdf VALIDATION_STRINGENCY=SILENT
    297    # Example command
    298    java -jar /usr/local/share/picard-tools/CollectRnaSeqMetrics.jar INPUT=WT.bam REF_FLAT=/nfs/genomes/mouse_mm10_dec_11_no_random/anno/refFlat.txt STRAND_SPECIFICITY=NONE OUTPUT=QC_metrics/WT.RnaSeqMetrics.txt REFERENCE_SEQUENCE=/nfs/genomes/mouse_mm10_dec_11_no_random/fasta_whole_genome/mm10.fa CHART_OUTPUT=QC_metrics/WT.RnaSeqMetrics.pdf VALIDATION_STRINGENCY=SILENT
    299 
    300 }}}
    301 The VALIDATION_STRINGENCY=SILENT option will keep the program from crashing if it finds something unexpected.  The default: VALIDATION_STRINGENCY=STRICT
    302 
    303 
    304 = Interpreting quality control issues =
    305 
    306 See [[https://sequencing.qcfail.com/|QC Fail Sequencing]] from the Babraham Institute
     245See [[http://barcwiki.wi.mit.edu/wiki/SOPs/SAMBAMqc|SAM/BAM quality control]]
     246
     247