41 | | }}} |
42 | | |
| 243 | = Analyzing short read quality (after mapping) = |
| 244 | |
| 245 | == Remove Duplicates == |
| 246 | * Remove duplicate reads, e.g. those introduced by PCR amplification. |
| 247 | |
| 248 | {{{ |
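| 249 | # samtools command (input BAM must be sorted by coordinate) |
| 250 | samtools rmdup [-sS] <input.srt.bam> <output.bam> |
| 251 | # Default: paired-end mode. Use -s for single-end reads, or -S to treat paired-end reads as single-end. |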
| 252 | }}} |
| 253 | |
| 254 | == Determining the paired-end insert size for DNA samples == |
| 255 | |
| 256 | If the paired-end insert size (distance) is unknown or needs to be verified, it can be extracted from a BAM/SAM file after mapping with Bowtie. |
| 257 | |
| 258 | When mapping with bowtie (or another mapper), the maximum expected insert size can often be supplied as an input parameter (for bowtie, e.g., -X 500), which can help with mapping. See the [[http://barcwiki.wi.mit.edu/wiki/SOPs/mapping|mapping SOP]] for mapping details. |
| 259 | |
| 260 | |
| 261 | Method 1: Get insert sizes from a SAM or BAM file |
| 262 | {{{ |
| 263 | # Using a SAM file (at Unix command prompt); skip header lines and keep positive template lengths (column 9) |
| 264 | awk -F "\t" '!/^@/ && $9 > 0 {print $9}' s_1_bowtie.sam > s_1_insert_sizes.txt |
| 265 | # Using a BAM file (at Unix command prompt); samtools view omits the header by default |
| 266 | samtools view s_1_bowtie.bam | awk -F "\t" '$9 > 0 {print $9}' > s_1_insert_sizes.txt |
| 267 | |
| 268 | # Then process the column of numbers with R (or Excel) |
| 269 | # In an R session |
| 270 | sizeFile = "s_1_insert_sizes.txt" |
| 271 | sample.name = "My paired reads" |
| 272 | distance = read.delim(sizeFile, header=FALSE)[,1] |
| 273 | pdf(paste(sample.name, "insert.size.histogram.pdf", sep="."), width=11, height=8.5) |
| 274 | hist(distance, breaks=200, col="wheat", main=paste("Insert sizes for", sample.name), xlab="length (nt)") |
| 275 | dev.off() |
| 276 | }}} |
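If only summary numbers are needed rather than a histogram, the same one-column file can be summarized directly at the command prompt. The following is a minimal sketch, not part of the original SOP: the function name `insert_size_stats` is hypothetical, and it reports the population standard deviation without filtering outliers, so a few very large values can inflate the result.

```shell
# Summarize a one-column file of insert sizes (hypothetical helper name).
# Prints the count, mean, and population standard deviation of positive values.
insert_size_stats() {
    awk '
        $1 > 0 { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) { print "n=0"; exit 1 }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0   # guard against floating-point round-off
            printf "n=%d mean=%.1f sd=%.1f\n", n, mean, sqrt(var)
        }
    ' "$1"
}

# Usage (file name taken from the commands above):
# insert_size_stats s_1_insert_sizes.txt
```

Checking the histogram first is still advisable, since a long tail of large values would make the mean and standard deviation misleading.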
| 277 | |
| 278 | Method 2: Calculate insert sizes with the CollectInsertSizeMetrics tool from Picard (http://picard.sourceforge.net). This also gives a good approximation for RNA samples. |
| 279 | {{{ |
| 280 | # CollectInsertSizeMetrics arguments: |
| 281 | # I=File Input SAM or BAM file. (Required) |
| 282 | # O=File File to write the output to. (Required) |
| 283 | # H=File File to write insert size histogram chart to. (Required) |
| 284 | # Output: CollectInsertSizeMetrics.txt: values for the mapper options -r and --mate-std-dev (as used by TopHat) can be found in this text file |
| 285 | #         CollectInsertSizeMetrics_hist.pdf: insert size histogram (graphic representation) |
| 286 | bsub java -jar /usr/local/share/picard-tools/CollectInsertSizeMetrics.jar I=foo.bam O=CollectInsertSizeMetrics.txt H=CollectInsertSizeMetrics_hist.pdf |
| 287 | }}} |
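The mean and standard deviation can be pulled out of CollectInsertSizeMetrics.txt without opening it by hand. The sketch below is an illustration under stated assumptions: the function name `picard_insert_summary` is hypothetical, and it assumes the metrics file is tab-delimited with a header row beginning with MEDIAN_INSERT_SIZE that contains MEAN_INSERT_SIZE and STANDARD_DEVIATION columns, followed directly by the data row. Check this against the output of your Picard version.

```shell
# Print the mean and standard deviation from a Picard insert size metrics file.
# Assumption: the header row starts with MEDIAN_INSERT_SIZE and the first
# data row follows immediately after it.
picard_insert_summary() {
    awk -F "\t" '
        # First multi-column row after the header is the data row
        header_seen && NF > 1 {
            if (mean_col && sd_col) print "mean=" $mean_col " sd=" $sd_col
            exit
        }
        # Locate the columns by name rather than by fixed position
        /^MEDIAN_INSERT_SIZE/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "MEAN_INSERT_SIZE") mean_col = i
                if ($i == "STANDARD_DEVIATION") sd_col = i
            }
            header_seen = 1
        }
    ' "$1"
}

# Usage (output file name taken from the command above):
# picard_insert_summary CollectInsertSizeMetrics.txt
```

Looking columns up by name keeps the sketch working if the column order differs between Picard versions, as long as the column names themselves are unchanged.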
| 288 | |