| 41 | | }}} |
| 42 | | |
| 43 | | == Determining the paired-end insert size for DNA samples == |
| 44 | | |
| 45 | | If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running Bowtie. |
| 46 | | |
| 47 | | When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping. See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for mapping details. |
| 48 | | |
| 49 | | |
| 50 | | Method 1: Get insert sizes from BAM file |
| 51 | | {{{ |
| 52 | | # Using a SAM file (at Unix command prompt) |
| 53 | | awk -F "\t" '$9 > 0 {print $9}' s_1_bowtie.sam > s_1_insert_sizes.txt |
| 54 | | # Using a BAM file (at Unix command prompt) |
| 55 | | samtools view s_1_bowtie.bam | awk -F"\t" '$9 > 0 {print $9}' > s_1_insert_sizes.txt |
| 56 | | |
| 57 | | # and then process column of numbers with R (or Excel) |
| 58 | | # In R Session |
| 59 | | sizeFile = "s_1_insert_sizes.txt" |
| 60 | | sample.name = "My paired reads" |
| 61 | | distance = read.delim(sizeFile, h=F)[,1] |
| 62 | | pdf(paste(sample.name, "insert.size.histogram.pdf", sep="."), w=11, h=8.5) |
| 63 | | hist(distance, breaks=200, col="wheat", main=paste("Insert sizes for", sample.name), xlab="length (nt)") |
| 64 | | dev.off() |
| 65 | | }}} |
| 66 | | |
| 67 | | Method 2: Calculate insert sizes with CollectInsertSizeMetrics function from picard (http://picard.sourceforge.net). This is also a good approximation for RNA samples. |
| 68 | | {{{ |
| 69 | | # |
| 70 | | # I=File Input SAM or BAM file. (Required) |
| 71 | | # O=File File to write the output to. (Required) |
| 72 | | # H=File File to write insert size histogram chart to. (Required) |
| 73 | | # output: CollectInsertSizeMetrics.txt: values for -r and --mate-std-dev can be found in this text file |
| 74 | | # CollectInsertSizeMetrics_hist.pdf: insert size histogram (graphic representation) |
| 75 | | bsub java -jar /usr/local/share/picard-tools/CollectInsertSizeMetrics.jar I=foo.bam O=CollectInsertSizeMetrics.txt H=CollectInsertSizeMetrics_hist.pdf |
| | 243 | = Analyzing short read quality (after mapping) = |
| | 244 | |
| | 245 | == Remove Duplicates == |
| | 246 | * Remove duplicates, for eg. from PCR |
| | 247 | |
| | 248 | {{{ |
| | 249 | #samtools command |
| | 250 | samtools rmdup [-sS] <input.srt.bam> <output.bam> |
| | 251 | -s or -S depending on PE data or not |
| | 252 | }}} |
| | 253 | |
| | 254 | == Determining the paired-end insert size for DNA samples == |
| | 255 | |
| | 256 | If paired-end insert size or distance is unknown or need to be verified, it can be extracted from a BAM/SAM file after running Bowtie. |
| | 257 | |
| | 258 | When mapping with bowtie (or another mapper), the insert size can often be included as an input parameter (example for bowtie: -X 500), which can help with mapping. See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for mapping details. |
| | 259 | |
| | 260 | |
| | 261 | Method 1: Get insert sizes from BAM file |
| | 262 | {{{ |
| | 263 | # Using a SAM file (at Unix command prompt) |
| | 264 | awk -F "\t" '$9 > 0 {print $9}' s_1_bowtie.sam > s_1_insert_sizes.txt |
| | 265 | # Using a BAM file (at Unix command prompt) |
| | 266 | samtools view s_1_bowtie.bam | awk -F"\t" '$9 > 0 {print $9}' > s_1_insert_sizes.txt |
| | 267 | |
| | 268 | # and then process column of numbers with R (or Excel) |
| | 269 | # In R Session |
| | 270 | sizeFile = "s_1_insert_sizes.txt" |
| | 271 | sample.name = "My paired reads" |
| | 272 | distance = read.delim(sizeFile, h=F)[,1] |
| | 273 | pdf(paste(sample.name, "insert.size.histogram.pdf", sep="."), w=11, h=8.5) |
| | 274 | hist(distance, breaks=200, col="wheat", main=paste("Insert sizes for", sample.name), xlab="length (nt)") |
| | 275 | dev.off() |
| | 276 | }}} |
| | 277 | |
| | 278 | Method 2: Calculate insert sizes with CollectInsertSizeMetrics function from picard (http://picard.sourceforge.net). This is also a good approximation for RNA samples. |
| | 279 | {{{ |
| | 280 | # |
| | 281 | # I=File Input SAM or BAM file. (Required) |
| | 282 | # O=File File to write the output to. (Required) |
| | 283 | # H=File File to write insert size histogram chart to. (Required) |
| | 284 | # output: CollectInsertSizeMetrics.txt: values for -r and --mate-std-dev can be found in this text file |
| | 285 | # CollectInsertSizeMetrics_hist.pdf: insert size histogram (graphic representation) |
| | 286 | bsub java -jar /usr/local/share/picard-tools/CollectInsertSizeMetrics.jar I=foo.bam O=CollectInsertSizeMetrics.txt H=CollectInsertSizeMetrics_hist.pdf |
| | 287 | }}} |
| | 288 | |