Changes between Version 21 and Version 22 of SOPs/qc_shortReads


Timestamp:
04/26/16 09:29:51
Author:
gbell
Comment:

--

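Legend:

Unmodified: line numbered in both the v21 and v22 columns
Added: line numbered only in the v22 column
Removed: line numbered only in the v21 column
Modified: shown as a removed line followed by an added line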
  • SOPs/qc_shortReads

  v21  v22
    3    3  \\
    4    4
    5       = Analyzing short read quality =
         5  = Analyzing short read quality (before mapping) =
    6    6
    7    7  \\
  ...  ...
   39   39  {{{
   40   40     bsub "/nfs/BaRC_Public/BaRC_code/Perl/cmpfastq/cmpfastq.pl s_8_1_sequence.txt s_8_2_sequence.txt"  # fastq inputs
   41       }}}
   42
   43       == Determining the paired-end insert size for DNA samples ==
   44
   45       If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running Bowtie.
   46
   47       When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping.  See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for mapping details.
   48
   49
   50       Method 1: Get insert sizes from BAM file
   51       {{{
   52          # Using a SAM file (at Unix command prompt)
   53          awk -F "\t" '$9 > 0 {print $9}' s_1_bowtie.sam > s_1_insert_sizes.txt
   54          # Using a BAM file (at Unix command prompt)
   55          samtools view s_1_bowtie.bam | awk -F"\t" '$9 > 0 {print $9}' > s_1_insert_sizes.txt
   56
   57          # and then process column of numbers with R (or Excel)
   58          # In R Session
   59          sizeFile = "s_1_insert_sizes.txt"
   60          sample.name = "My paired reads"
   61          distance = read.delim(sizeFile, h=F)[,1]
   62          pdf(paste(sample.name, "insert.size.histogram.pdf", sep="."), w=11, h=8.5)
   63          hist(distance, breaks=200, col="wheat", main=paste("Insert sizes for", sample.name), xlab="length (nt)")
   64          dev.off()
   65       }}}
   66
   67       Method 2: Calculate insert sizes with CollectInsertSizeMetrics function from picard (http://picard.sourceforge.net).  This is also a good approximation for RNA samples.
   68       {{{
   69          #
   70          # I=File    Input SAM or BAM file.  (Required)
   71          # O=File    File to write the output to.  (Required)
   72          # H=File    File to write insert size histogram chart to.  (Required)
   73          # output: CollectInsertSizeMetrics.txt: values for -r and --mate-std-dev can be found in this text file
   74          #         CollectInsertSizeMetrics_hist.pdf: insert size histogram (graphic representation)
   75       bsub java -jar  /usr/local/share/picard-tools/CollectInsertSizeMetrics.jar I=foo.bam O=CollectInsertSizeMetrics.txt H=CollectInsertSizeMetrics_hist.pdf
   76   41  }}}
   77   42
  ...  ...
  234  199  }}}
  235  200
  236       == Remove Duplicates ==
  237         * Remove duplicates, for eg. from PCR
  238
  239        {{{
  240          #samtools command
  241           samtools rmdup [-sS] <input.srt.bam> <output.bam>
  242           -s or -S depending on PE data or not
  243       }}}
  244
  245  201  == Select reads that are paired [for paired-end sequencing]  ==
  246  202
  ...  ...
  285  241  }}}
  286  242
       243  = Analyzing short read quality (after mapping) =
       244
       245  == Remove Duplicates ==
       246    * Remove duplicate reads, e.g. PCR duplicates
       247
       248   {{{
       249     # samtools command (input must be a coordinate-sorted BAM file)
       250      samtools rmdup [-sS] <input.srt.bam> <output.bam>
       251      # default: paired-end reads; -s: single-end reads; -S: treat paired-end reads as single-end
       252  }}}
       253
       254  == Determining the paired-end insert size for DNA samples ==
       255
       256  If the paired-end insert size (or mate distance) is unknown or needs to be verified, it can be extracted from the SAM/BAM file produced by Bowtie or another mapper.
       257
       258  When mapping with Bowtie (or another mapper), the insert size can often be given as an input parameter (for Bowtie, -X 500 sets the maximum insert size to 500 bp), which can help with mapping.  See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for mapping details.
       259
       260
       261  Method 1: Get insert sizes from a SAM or BAM file (column 9, TLEN; keeping only positive values counts each pair once)
       262  {{{
       263     # Using a SAM file (at Unix command prompt)
       264     awk -F "\t" '$9 > 0 {print $9}' s_1_bowtie.sam > s_1_insert_sizes.txt
       265     # Using a BAM file (at Unix command prompt)
       266     samtools view s_1_bowtie.bam | awk -F "\t" '$9 > 0 {print $9}' > s_1_insert_sizes.txt
       267
       268     # Then process the column of numbers with R (or Excel)
       269     # In an R session
       270     sizeFile = "s_1_insert_sizes.txt"
       271     sample.name = "My paired reads"
       272     distance = read.delim(sizeFile, header=FALSE)[,1]
       273     pdf(paste(sample.name, "insert.size.histogram.pdf", sep="."), width=11, height=8.5)
       274     hist(distance, breaks=200, col="wheat", main=paste("Insert sizes for", sample.name), xlab="length (nt)")
       275     dev.off()
       276  }}}
       277
       278  Method 2: Calculate insert sizes with the CollectInsertSizeMetrics tool from Picard (http://picard.sourceforge.net).  This also gives a good approximation for RNA samples.
       279  {{{
       280     # Arguments:
       281     # I=File    Input SAM or BAM file.  (Required)
       282     # O=File    File to write the output to.  (Required)
       283     # H=File    File to write insert size histogram chart to.  (Required)
       284     # output: CollectInsertSizeMetrics.txt: mean insert size and standard deviation, from which TopHat's -r (mean insert size minus both read lengths) and --mate-std-dev can be derived
       285     #         CollectInsertSizeMetrics_hist.pdf: insert size histogram (graphic representation)
       286  bsub java -jar /usr/local/share/picard-tools/CollectInsertSizeMetrics.jar I=foo.bam O=CollectInsertSizeMetrics.txt H=CollectInsertSizeMetrics_hist.pdf
       287  }}}
       288
  287  289  = Interpreting quality control issues =
  288  290
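The one-column file written by Method 1 (s_1_insert_sizes.txt) can also be summarized at the Unix prompt without an R session. This is a minimal sketch, assuming a POSIX shell with awk: the function name insert_stats is hypothetical, and the optional read-length argument is a value you supply yourself (it is not read from the BAM file). The inner distance it prints assumes TopHat's -r convention, mean insert size minus both read lengths.

```shell
# Hypothetical helper: summarize a one-column file of insert sizes
# (e.g. s_1_insert_sizes.txt from Method 1).
# Usage: insert_stats FILE [READ_LENGTH]
# READ_LENGTH is an assumed value you provide; it is only used to estimate
# the mate inner distance (mean insert size minus both read lengths).
insert_stats() {
    awk -v readlen="${2:-0}" '
        # count only positive insert sizes, matching the $9 > 0 filter in Method 1
        $1 > 0 { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) { print "no positive insert sizes found" > "/dev/stderr"; exit 1 }
            mean = sum / n
            # sample variance (n - 1 denominator); clamp tiny negative rounding errors
            var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0
            printf "pairs\t%d\nmean\t%.1f\nsd\t%.1f\n", n, mean, sqrt(var)
            if (readlen > 0) printf "inner_distance\t%.1f\n", mean - 2 * readlen
        }' "$1"
}
```

For example, `insert_stats s_1_insert_sizes.txt 50` reports the number of pairs, the mean, and the sample standard deviation, plus an inner-distance estimate for 50 nt reads. These values can be cross-checked against the Picard metrics from Method 2. awk is used so the sketch runs on any machine that has samtools, without needing R.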