| 9 | | One may choose between bowtie version 1 (faster, but ignores indels) and bowtie version 2 (slower, but performs gapped alignment, i.e., handles indels). For a feature comparison, see [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml How is Bowtie 2 different from Bowtie 1?] |
| 10 | | |
| 11 | | '''[http://bowtie-bio.sourceforge.net/index.shtml bowtie version 1]''' |
| 12 | | |
| 13 | | Sample command: |
| 14 | | {{{ |
| 15 | | bsub bowtie -k 1 -n 2 -l 50 --best --sam --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 Sample_A.fq Sample_A.mm9.k1.n2.l50.best.sam |
| 16 | | }}} |
| 17 | | |
| 18 | | Parameters included in the sample command: |
| 19 | | * '''-l/--seedlen <int>''' seed length for -n (default: 28) -- Set to longest possible length of high-quality bases (but no longer than 40-50, or mapping may become too stringent). Use the FastQC output to determine length of high-quality positions. |
| 20 | | * '''-n/--seedmms <int>''' max mismatches in seed (can be 0-3, default: -n 2) |
| 21 | | * '''-k <int>''' report up to <int> good alignments per read (default: 1) -- If you want only uniquely mapped reads, also use '-m 1' to ignore multi-mapped reads; use --all to report all alignments (much slower; i.e., turn off the -k limit) |
| 22 | | * '''--best''' (in the case of multi-mapped reads, keep only the best hit(s)) |
| 23 | | * '''--sam''' to get SAM output format (which is the best format for downstream analysis) |
| 24 | | |
| 25 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 26 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 27 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| 28 | | * '''--phred33-quals''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
| 29 | | |
| 30 | | To see other parameters log into tak and type '''bowtie''' |
| 31 | | |
| 32 | | |
| 33 | | '''[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie version 2]''' |
| 34 | | |
| 35 | | Bowtie 2 was designed as an improvement to bowtie 1; specifically, it supports gapped alignment (indels). See the first [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml bowtie2 FAQ] for how they differ. Early versions of bowtie 2 had some issues, but these seem to have been fixed. Bowtie 2 uses a different set of genome index files (*.bt2) than bowtie 1 (*.ebwt). |
| 36 | | |
| 37 | | Sample command: |
| 38 | | {{{ |
| 39 | | bsub bowtie2 --phred64 -L 22 -N 1 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U Sample_A.fq -S Sample_A.mm9.L22.N1.sam |
| 40 | | }}} |
| 41 | | |
| 42 | | The parameters included in the sample command: |
| 43 | | * '''-L <int>''' length of seed substrings; must be >3 and <32 (default=22) |
| 44 | | * '''-N <int>''' max # mismatches in seed alignment; can be 0 or 1 (default=0) |
| 45 | | * '''-S''' name of SAM output file |
| 46 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 47 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 48 | | * '''--phred64''' (for input quality scores from Illumina versions 1.3-1.7) |
| 49 | | * '''--phred33''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
| 50 | | |
| 51 | | bowtie2 can also perform local alignments where the unaligned end(s) of a read are clipped (so, for example, remaining adapter won't prevent alignment) by adding the argument '''--local'''. |
| 52 | | |
| 53 | | The bowtie2 command can be modified to output mapped reads as BAM, such as |
| 54 | | |
| 55 | | {{{ |
| 56 | | bsub "bowtie2 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U s_7.txt | samtools view -bS - > s7_mm9.bam" |
| 57 | | }}} |
| 58 | | |
| 59 | | '''[http://bio-bwa.sourceforge.net/ bwa - Burrows-Wheeler Alignment Tool ]''' |
| 60 | | |
| 61 | | Bwa is a software package containing several related algorithms using the Burrows-Wheeler Transform. It works well even with indels, but not with spliced (RNA) reads. |
| 62 | | |
| 63 | | ''Sample commands for short (upto 100 bp) reads:'' |
| | 17 | === [=#bwa BWA] === |
| | 18 | |
| | 19 | The [[http://bio-bwa.sourceforge.net/ | Burrows-Wheeler Alignment (BWA) tool]] is a software package containing several related algorithms using the Burrows-Wheeler Transform. It works well even with indels, but not with spliced (RNA) reads. |
| | 20 | |
| | 21 | ''Sample commands for short (up to 100 bp) reads:'' |
| 86 | | [[BR]] |
| | 44 | |
| | 45 | One may choose between bowtie version 1 (faster, but ignores indels) and bowtie version 2 (slower, but performs gapped alignment, i.e., handles indels). For a feature comparison, see [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml How is Bowtie 2 different from Bowtie 1?] |
| | 46 | |
| | 47 | '''[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie version 2]''' |
| | 48 | |
| | 49 | === [=#bowtie2 Bowtie2] === |
| | 50 | |
| | 51 | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2] was designed as an improvement to bowtie 1; specifically, it supports gapped alignment (indels). See the first [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml bowtie2 FAQ] for how they differ. Bowtie 2 uses a different set of genome index files (*.bt2) than bowtie 1 (*.ebwt). |
| | 52 | |
| | 53 | Sample command: |
| | 54 | {{{ |
| | 55 | bsub bowtie2 --phred64 -L 22 -N 1 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U Sample_A.fq -S Sample_A.mm9.L22.N1.sam |
| | 56 | }}} |
| | 57 | |
| | 58 | The parameters included in the sample command: |
| | 59 | * '''-L <int>''' length of seed substrings; must be >3 and <32 (default=22) |
| | 60 | * '''-N <int>''' max # mismatches in seed alignment; can be 0 or 1 (default=0) |
| | 61 | * '''-S''' name of SAM output file |
| | 62 | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| | 63 | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| | 64 | * '''--phred64''' (for input quality scores from Illumina versions 1.3-1.7) |
| | 65 | * '''--phred33''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
| | 66 | |
| | 67 | bowtie2 can also perform local alignments where the unaligned end(s) of a read are clipped (so, for example, remaining adapter won't prevent alignment) by adding the argument '''--local'''. |
| | 68 | |
| | 69 | The bowtie2 command can be modified to output mapped reads as BAM, such as |
| | 70 | |
| | 71 | {{{ |
| | 72 | bsub "bowtie2 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U s_7.txt | samtools view -bS - > s7_mm9.bam" |
| | 73 | }}} |
| | 74 | |
| | 75 | === [=#bowtie Bowtie] === |
| | 76 | |
| | 77 | [http://bowtie-bio.sourceforge.net/index.shtml Bowtie] may still have some advantages over bowtie2 for specific use cases. |
| | 78 | |
| | 79 | Sample command: |
| | 80 | {{{ |
| | 81 | bsub bowtie -k 1 -n 2 -l 50 --best --sam --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 Sample_A.fq Sample_A.mm9.k1.n2.l50.best.sam |
| | 82 | }}} |
| | 83 | |
| | 84 | Parameters included in the sample command: |
| | 85 | * '''-l/--seedlen <int>''' seed length for -n (default: 28) -- Set to longest possible length of high-quality bases (but no longer than 40-50, or mapping may become too stringent). Use the FastQC output to determine length of high-quality positions. |
| | 86 | * '''-n/--seedmms <int>''' max mismatches in seed (can be 0-3, default: -n 2) |
| | 87 | * '''-k <int>''' report up to <int> good alignments per read (default: 1) -- If you want only uniquely mapped reads, also use '-m 1' to ignore multi-mapped reads; use --all to report all alignments (much slower; i.e., turn off the -k limit) |
| | 88 | * '''--best''' (in the case of multi-mapped reads, keep only the best hit(s)) |
| | 89 | * '''--sam''' to get SAM output format (which is the best format for downstream analysis) |
| | 90 | |
| | 91 | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| | 92 | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| | 93 | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| | 94 | * '''--phred33-quals''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
| | 95 | |
| | 96 | To see other parameters log into tak and type '''bowtie''' |
| | 97 | |
| 177 | | |
| 178 | | '''tophat version 1 (old)''' |
| | 193 | === [=#tophat2 TopHat version 2] === |
| | 194 | |
| | 195 | '''[http://ccb.jhu.edu/software/tophat/index.shtml TopHat version 2] is no longer recommended.''' The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2]. |
| | 196 | TopHat version 2 uses bowtie2, rather than bowtie, for its mapping. As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt). |
| | 197 | |
| | 198 | Sample command: |
| | 199 | {{{ |
| | 200 | # Single-end reads |
| | 201 | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
| | 202 | # Paired-end reads |
| | 203 | # For PE reads, specify the expected (mean) inner distance using the -r option (default is 50bp). The inner distance, or insert size, does not include the length of the reads/mates. For example, for a PE run with fragments selected at 300bp, where each end is 50bp, set -r to 200. |
| | 204 | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt |
| | 205 | }}} |
| | 206 | |
| | 207 | The parameters included in the sample command are: |
| | 208 | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
| | 209 | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
| | 210 | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
| | 211 | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed. |
| | 212 | * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file. Not used if looking for novel isoforms. |
| | 213 | * '''--library-type''' Take advantage of strandedness of library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand |
| | 214 | |
| | 215 | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| | 216 | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| | 217 | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| | 218 | * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", TopHat can use the default "phred33" encoding |
| | 219 | |
| | 220 | Choices for controlling alignment (e.g., mismatches) |
| | 221 | * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2). |
| | 222 | * '''--read-gap-length''' Final read alignments with a total gap length greater than this are discarded (default is 2). |
| | 223 | * '''--read-edit-dist''' Final read alignments with an edit distance (i.e., mismatches + indels) greater than this are discarded (default is 2). |
| | 224 | * '''--segment-mismatches''' Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2). |
| | 225 | |
| | 226 | === [=#tophat TopHat version 1] === |
| 206 | | '''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]''' |
| 207 | | |
| 208 | | '''TopHat version 2 is no longer recommended.''' The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2]. |
| 209 | | TopHat version 2 uses bowtie2, rather than bowtie, for its mapping. As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt). |
| 210 | | |
| 211 | | Sample command: |
| 212 | | {{{ |
| 213 | | # Single-end reads |
| 214 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
| 215 | | # Paired-end reads |
| 216 | | # For PE reads, specify the expected (mean) inner distance using the -r option (default is 50bp). The inner distance, or insert size, does not include the length of the reads/mates. For example, for a PE run with fragments selected at 300bp, where each end is 50bp, set -r to 200. |
| 217 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt |
| 218 | | }}} |
| 219 | | |
| 220 | | The parameters included in the sample command are: |
| 221 | | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
| 222 | | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
| 223 | | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
| 224 | | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed. |
| 225 | | * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file. Not used if looking for novel isoforms. |
| 226 | | * '''--library-type''' Take advantage of strandedness of library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand |
| 227 | | |
| 228 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 229 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 230 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| 231 | | * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", TopHat can use the default "phred33" encoding |
| 232 | | |
| 233 | | Choices for controlling alignment (e.g., mismatches) |
| 234 | | * '''--read-mismatches/-N''' Final read alignments with more than this many mismatches are discarded (default is 2). |
| 235 | | * '''--read-gap-length''' Final read alignments with a total gap length greater than this are discarded (default is 2). |
| 236 | | * '''--read-edit-dist''' Final read alignments with an edit distance (i.e., mismatches + indels) greater than this are discarded (default is 2). |
| 237 | | * '''--segment-mismatches''' Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2). |
| 238 | | |
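
The TopHat notes above give two rules of thumb: the value for -r is the fragment length minus both mates, and --segment-length is half the read length for reads shorter than ~45 nt (otherwise the default of 25). The following is a minimal sketch of that arithmetic. The function names are hypothetical helpers, not part of TopHat, and treating "~45 nt" as a strict cutoff at 45 is an assumption made only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TopHat) for the two rules of thumb above.

# Value for -r: fragment length minus the length of both mates.
tophat_inner_distance() {
    fragment_len=$1
    read_len=$2
    echo $(( fragment_len - 2 * read_len ))
}

# Value for --segment-length: half the read length for short reads,
# otherwise the default of 25. The strict "< 45" cutoff is an assumption.
tophat_segment_length() {
    read_len=$1
    if [ "$read_len" -lt 45 ]; then
        echo $(( read_len / 2 ))
    else
        echo 25
    fi
}

# Example: 300 bp fragments with 50 nt mates, and 40 nt single-end reads.
echo "-r $(tophat_inner_distance 300 50)"
echo "--segment-length $(tophat_segment_length 40)"
```

With the values from the notes above (300 bp fragments, 50 nt mates, 40-nt reads), this prints `-r 200` and `--segment-length 20`, which can then be added to the tophat command line by hand.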