== Mining, summarizing, and processing SAM/BAM files ==

Many of these recipes use [http://samtools.sourceforge.net/samtools.shtml | samtools].

=== Convert, sort, and/or index ===

{{{
Convert SAM to BAM (-S is required by samtools 0.1.x only; 1.x detects the input format and ignores it):
samtools view -bS -o foo.bam foo.sam
}}}

{{{
Convert BAM to SAM (-h keeps the header):
samtools view -h -o foo.sam foo.bam
}}}
{{{
Sort BAM file (samtools 0.1.x syntax, where ".bam" is added to "foo.sorted"):
samtools sort foo.bam foo.sorted
With samtools 1.x, give the output file name explicitly:
samtools sort -o foo.sorted.bam foo.bam
}}}
{{{
Index a sorted BAM file (which creates foo.sorted.bam.bai):
samtools index foo.sorted.bam

Both foo.sorted.bam and foo.sorted.bam.bai are needed for visualization.
}}}
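
The convert, sort, and index steps above can be chained into one helper. This is a minimal sketch, assuming samtools 1.x is on the PATH (it relies on `samtools sort -o`, which the 0.1.x series does not accept). The function name `sam_to_sorted_bam` and the output naming scheme are illustrative choices, not part of samtools.

```shell
# sam_to_sorted_bam: hypothetical helper, not part of samtools.
# Given PREFIX.sam, writes PREFIX.bam, PREFIX.sorted.bam and PREFIX.sorted.bam.bai.
# Assumes samtools 1.x (uses "sort -o"); stops at the first failing step.
sam_to_sorted_bam() {
    sam=$1
    prefix=${sam%.sam}          # strip the .sam extension
    samtools view -b -o "$prefix.bam" "$sam" &&
    samtools sort -o "$prefix.sorted.bam" "$prefix.bam" &&
    samtools index "$prefix.sorted.bam"
}
```

Calling `sam_to_sorted_bam foo.sam` would leave the sorted BAM and its index side by side, ready for a genome browser.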

=== Count the number of mapped reads by chromosome ===

{{{
Method 1 (all chromosomes)
1 - Index the BAM file:
samtools index mapped_reads.bam
2 - Get index statistics (the number of mapped reads is in the third column):
samtools idxstats mapped_reads.bam
}}}
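
The idxstats table can be summarized further. The sketch below turns it into per-chromosome shares of the mapped reads. It assumes the usual four tab-separated idxstats columns (name, length, mapped, unmapped) and skips the trailing `*` row. The helper name `mapped_fraction` is made up for this example.

```shell
# mapped_fraction: hypothetical helper that reads "samtools idxstats" output on stdin
# and prints: chromosome, mapped count, percentage of all mapped reads.
mapped_fraction() {
    awk -F'\t' '
        $1 != "*" { name[++n] = $1; mapped[n] = $3; total += $3 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\t%.1f%%\n", name[i], mapped[i],
                       (total ? 100 * mapped[i] / total : 0)
        }'
}
```

Typical use would be `samtools idxstats mapped_reads.bam | mapped_fraction`.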

{{{
Method 2 (one chromosome, without a BAM index; counts distinct read names, so both mates of a pair count once)
From SAM:
awk -F"\t" '$3 == "chr2" {print $1}' mapped_reads.sam | sort -u | wc -l
From BAM:
samtools view mapped_reads.bam | awk -F"\t" '$3 == "chr2" {print $1}' | sort -u | wc -l
}}}
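
Method 2 can be extended to every chromosome in one pass. The following is a sketch under the same assumption as above (distinct read names per reference). Header lines (starting with `@`) and records with no reference (`*`) are skipped. The function name `count_reads_by_chrom` is hypothetical.

```shell
# count_reads_by_chrom: hypothetical helper; takes a SAM file and prints
# "chromosome<TAB>number of distinct read names" for each reference.
count_reads_by_chrom() {
    awk -F'\t' '!/^@/ && $3 != "*" { print $3 "\t" $1 }' "$1" |
        LC_ALL=C sort -u |          # one line per (chromosome, read name) pair
        cut -f1 |                   # keep the chromosome only
        uniq -c |                   # count pairs per chromosome
        awk '{ print $2 "\t" $1 }'  # reorder to name, count
}
```

For a BAM file, the same awk pipeline can be fed from `samtools view` instead of reading a file.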

=== Remove unmapped reads ===

{{{
Keep only mapped reads (-F 4 drops records with the "unmapped" flag, 0x4, set):
samtools view -hS -F 4 mapped_unmapped.sam > mapped_only.sam
}}}
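
To make the flag test concrete, here is a rough awk equivalent of the `-F 4` filter for plain SAM text. It is only a sketch: it keeps header lines and any record whose FLAG (column 2) has bit 0x4 clear, using integer arithmetic so it does not depend on gawk-only bitwise functions. The name `drop_unmapped` is invented for illustration.

```shell
# drop_unmapped: hypothetical awk stand-in for "samtools view -h -F 4" on SAM text.
# int(FLAG / 4) % 2 is 1 exactly when the 0x4 (unmapped) bit is set.
drop_unmapped() {
    awk -F'\t' '/^@/ || int($2 / 4) % 2 == 0' "$1"
}
```

Usage would be `drop_unmapped mapped_unmapped.sam > mapped_only.sam`; for BAM input, samtools itself remains the practical choice.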

=== How many multiply/uniquely mapped reads are in a BAM/SAM file? ===

{{{
Using bam_stat.py from the RSeQC package:
bam_stat.py -i mapped_reads.bam
}}}
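
Where bam_stat.py is not available, a crude approximation is possible on SAM text. One common heuristic treats alignments at or above a mapping-quality cutoff as uniquely mapped. The sketch below counts mapped alignment records (not distinct reads) on either side of that cutoff. The default of 30 is an assumption, since MAPQ scales differ between aligners, and `count_by_mapq` is a hypothetical name.

```shell
# count_by_mapq: hypothetical helper; splits mapped SAM records by MAPQ (column 5).
# Usage: count_by_mapq FILE.sam [CUTOFF]   (CUTOFF defaults to an assumed 30)
count_by_mapq() {
    sam=$1
    cutoff=${2:-30}
    awk -F'\t' -v q="$cutoff" '
        /^@/ || int($2 / 4) % 2 == 1 { next }   # skip headers and unmapped records
        $5 >= q { hi++; next }                  # at or above the cutoff
        { lo++ }                                # below the cutoff
        END { printf "mapq>=%d\t%d\nmapq<%d\t%d\n", q, hi, q, lo }' "$sam"
}
```

For BAM input, the same awk program can read from `samtools view -h mapped_reads.bam` on stdin by replacing the file argument with `-`.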