9 | | One may choose between bowtie version 1 (faster but ignores indels) and bowtie version 2 (slower but performs gapped alignment (i.e., indels)). For a feature comparision, see [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml How is Bowtie 2 different from Bowtie 1?] |
10 | | |
11 | | '''[http://bowtie-bio.sourceforge.net/index.shtml bowtie version 1]''' |
12 | | |
13 | | Sample command: |
14 | | {{{ |
15 | | bsub bowtie -k 1 -n 2 -l 50 --best --sam --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 Sample_A.fq Sample_A.mm9.k1.n2.l50.best.sam |
16 | | }}} |
17 | | |
18 | | Parameters included in the sample command: |
19 | | * '''-l/--seedlen <int>''' seed length for -n (default: 28) -- Set to longest possible length of high-quality bases (but no longer than 40-50, or mapping may become too stringent). Use the FastQC output to determine length of high-quality positions. |
20 | | * '''-n/--seedmms <int>''' max mismatches in seed (can be 0-3, default: -n 2) |
21 | | * '''-k <int>''' report up to <int> good alignments per read (default: 1) -- If you want only uniquely mapped reads, however, also use '-m 1' to ignore multi-mapped reads; use --all to report all alignments (much slower, ie. turn-off -k option) |
22 | | * '''--best''' (in the case of multi-mapped reads, keep only the best hit(s)) |
23 | | * '''--sam''' to get SAM output format (which is the best format for downstream analysis) |
24 | | |
25 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
26 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
27 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
28 | | * '''--phred33-quals''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
29 | | |
30 | | To see other parameters log into tak and type '''bowtie''' |
31 | | |
32 | | |
33 | | '''[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie version 2]''' |
34 | | |
35 | | Bowtie 2 was designed as an improvement to bowtie 1, specifically, it supports gapped alignment. See the first [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml bowtie2 FAQ] for how they differ. Early versions of bowtie 2 had some issues, but these seem to have been fixed. Bowtie 2 uses a different set of genome index files (*.bt2) than bowtie 1 (*.ebwt). Bowtie 2 works with indels. |
36 | | |
37 | | Sample command: |
38 | | {{{ |
39 | | bsub bowtie2 --phred64 -L 22 -N 1 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U Sample_A.fq -S Sample_A.mm9.L22.N1.sam |
40 | | }}} |
41 | | |
42 | | The parameters included in the sample command: |
43 | | * '''-L <int>''' length of seed substrings; must be >3 and <32 (default=22) |
44 | | * '''-N <int>''' max # mismatches in seed alignment; can be 0 or 1 (default=0) |
45 | | * '''-S''' name of SAM output file |
46 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
47 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
48 | | * '''--phred64''' (for input quality scores from Illumina versions 1.3-1.7) |
49 | | * '''--phred33''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
50 | | |
51 | | bowtie2 can also perform local alignments where the unaligned end(s) of a read are clipped (so, for example, remaining adapter won't prevent alignment) by adding the argument '''--local'''. |
52 | | |
53 | | The bowtie2 command can be modified to output mapped reads as BAM, such as |
54 | | |
55 | | {{{ |
56 | | bsub "bowtie2 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U s_7.txt | samtools view -bS - > s7_mm9.bam" |
57 | | }}} |
58 | | |
59 | | '''[http://bio-bwa.sourceforge.net/ bwa - Burrows-Wheeler Alignment Tool ]''' |
60 | | |
61 | | Bwa is a software package containing several related algorithms using the Burrows-Wheeler Transform. It works well even with indels, but not with spliced (RNA) reads. |
62 | | |
63 | | ''Sample commands for short (upto 100 bp) reads:'' |
| 17 | === [=#bwa BWA] === |
| 18 | |
| 19 | The [[http://bio-bwa.sourceforge.net/ | Burrows-Wheeler Alignment (BWA) tool]] is a software package containing several related algorithms using the Burrows-Wheeler Transform. It works well even with indels, but not with spliced (RNA) reads. |
| 20 | |
| 21 | ''Sample commands for short (up to 100 bp) reads:'' |
86 | | [[BR]] |
| 44 | |
| 45 | One may choose between bowtie version 1 (faster, but ignores indels) and bowtie version 2 (slower, but performs gapped alignment and so handles indels). For a feature comparison, see [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml How is Bowtie 2 different from Bowtie 1?] |
| 46 | |
| 47 | '''[http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie version 2]''' |
| 48 | |
| 49 | === [=#bowtie2 Bowtie2] === |
| 50 | |
| 51 | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml Bowtie2] was designed as an improvement to bowtie 1; specifically, it supports gapped alignment, so it can align reads that contain indels. See the first [http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml bowtie2 FAQ] for how they differ. Bowtie 2 uses a different set of genome index files (*.bt2) than bowtie 1 (*.ebwt). |
| 52 | |
| 53 | Sample command: |
| 54 | {{{ |
| 55 | bsub bowtie2 --phred64 -L 22 -N 1 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U Sample_A.fq -S Sample_A.mm9.L22.N1.sam |
| 56 | }}} |
| 57 | |
| 58 | The parameters included in the sample command: |
| 59 | * '''-L <int>''' length of seed substrings; must be >3 and <32 (default=22) |
| 60 | * '''-N <int>''' max # mismatches in seed alignment; can be 0 or 1 (default=0) |
| 61 | * '''-S''' name of SAM output file |
| 62 | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 63 | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 64 | * '''--phred64''' (for input quality scores from Illumina versions 1.3-1.7) |
| 65 | * '''--phred33''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
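The encoding choices listed above can be wrapped in a small shell helper. This is only a sketch: the function name `bowtie2_qual_flag` is hypothetical (it is not part of bowtie2), and it assumes you pass the Illumina pipeline version number (for example `1.5`) taken from the FastQC "Encoding" field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bowtie2): map an Illumina pipeline
# version number to the bowtie2 quality-encoding option listed above.
bowtie2_qual_flag() {
    awk -v v="$1" 'BEGIN {
        if (v < 1.3)      print "--solexa-quals"   # Illumina 1.2 and earlier
        else if (v < 1.8) print "--phred64"        # Illumina 1.3-1.7
        else              print "--phred33"        # Illumina 1.8 and later
    }'
}

# Example: print the option for reads from Illumina pipeline 1.5
bowtie2_qual_flag 1.5
```

The result could then be substituted into the sample command, for instance `bowtie2 $(bowtie2_qual_flag 1.5) -x <index> -U Sample_A.fq -S Sample_A.sam`.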
| 66 | |
| 67 | With the argument '''--local''', bowtie2 performs local alignment, in which the unaligned end(s) of a read are clipped (so that, for example, leftover adapter sequence won't prevent alignment). |
| 68 | |
| 69 | To write the alignments as BAM instead of SAM, pipe the bowtie2 output through samtools, for example: |
| 70 | |
| 71 | {{{ |
| 72 | bsub "bowtie2 -x /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 -U s_7.txt | samtools view -bS - > s7_mm9.bam" |
| 73 | }}} |
| 74 | |
| 75 | === [=#bowtie Bowtie] === |
| 76 | |
| 77 | [http://bowtie-bio.sourceforge.net/index.shtml Bowtie] (version 1) may still have some advantages over bowtie2 for specific use cases; in particular, it is faster, although it ignores indels. |
| 78 | |
| 79 | Sample command: |
| 80 | {{{ |
| 81 | bsub bowtie -k 1 -n 2 -l 50 --best --sam --solexa1.3-quals /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 Sample_A.fq Sample_A.mm9.k1.n2.l50.best.sam |
| 82 | }}} |
| 83 | |
| 84 | Parameters included in the sample command: |
| 85 | * '''-l/--seedlen <int>''' seed length for -n (default: 28) -- Set to longest possible length of high-quality bases (but no longer than 40-50, or mapping may become too stringent). Use the FastQC output to determine length of high-quality positions. |
| 86 | * '''-n/--seedmms <int>''' max mismatches in seed (can be 0-3, default: -n 2) |
| 87 | * '''-k <int>''' report up to <int> good alignments per read (default: 1) -- If you want only uniquely mapped reads, also use '-m 1' to ignore multi-mapped reads; to report all alignments, use --all instead of -k (much slower) |
| 88 | * '''--best''' (in the case of multi-mapped reads, keep only the best hit(s)) |
| 89 | * '''--sam''' to get SAM output format (which is the best format for downstream analysis) |
| 90 | |
| 91 | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 92 | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 93 | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| 94 | * '''--phred33-quals''' (default "Sanger format"; for input quality scores from Illumina versions 1.8 and later) |
| 95 | |
| 96 | To see other parameters, log into tak and type '''bowtie'''. |
| 97 | |
177 | | |
178 | | '''tophat version 1 (old)''' |
| 193 | === [=#tophat2 TopHat version 2] === |
| 194 | |
| 195 | '''[http://ccb.jhu.edu/software/tophat/index.shtml TopHat version 2] is no longer recommended.''' The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2]. |
| 196 | TopHat version 2 uses bowtie2, rather than bowtie, for its mapping. As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt). |
| 197 | |
| 198 | Sample command: |
| 199 | {{{ |
| 200 | # Single-end reads |
| 201 | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
| 202 | # Paired-end reads |
| 203 | # For PE reads, specify the expected (mean) inner distance between mates using the -r option (default is 50 bp). The inner distance does not include the length of the reads/mates. For example, for a PE run with fragments selected at 300 bp, where each end is 50 bp, set -r to 200. |
| 204 | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt |
| 205 | }}} |
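The -r arithmetic from the paired-end comment in the sample command can be checked with a one-line shell function. This is a minimal sketch; the name `mate_inner_dist` is hypothetical, and it assumes both mates have the same length.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TopHat):
# inner distance = fragment length - 2 * read length
mate_inner_dist() {
    local fragment_len=$1 read_len=$2
    echo $(( fragment_len - 2 * read_len ))
}

# Fragments selected at 300 bp with 50-bp mates: 300 - 2*50
mate_inner_dist 300 50   # prints 200
```

The printed value is what would be passed to tophat as `-r 200` for the paired-end run.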
| 206 | |
| 207 | The parameters included in the sample command are: |
| 208 | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
| 209 | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
| 210 | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
| 211 | * '''-G <GTF file>''' Supply TopHat with a GTF file of transcript models. This can help TopHat identify junctions that may otherwise be missed. |
| 212 | * '''--no-novel-juncs''' Only look for spliced reads across junctions in the supplied GTF file. Do not use this option if you are looking for novel isoforms. |
| 213 | * '''--library-type''' Take advantage of strandedness of library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand |
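The rule of thumb for '--segment-length' in the list above can be sketched as a shell function. The name `tophat_segment_length` is hypothetical, the 45-nt cutoff is taken from the "~45 nt" guidance, and rounding down for odd read lengths is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TopHat): suggest a --segment-length
# value from the read length, following the rule of thumb above.
tophat_segment_length() {
    local read_len=$1
    if [ "$read_len" -lt 45 ]; then
        # Short reads: half the read length (integer division rounds down)
        echo $(( read_len / 2 ))
    else
        # Longer reads: the TopHat default
        echo 25
    fi
}

tophat_segment_length 40   # prints 20
```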
| 214 | |
| 215 | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
| 216 | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
| 217 | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
| 218 | * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", TopHat can use the default "phred33" encoding |
| 219 | |
| 220 | Choices for controlling alignment (e.g., mismatches) |
| 221 | * '''--read-mismatches/-N''' Final read alignments having more than this many mismatches are discarded (default is 2). |
| 222 | * '''--read-gap-length''' Final read alignments with a total gap length greater than this value are discarded (default is 2). |
| 223 | * '''--read-edit-dist''' Final read alignments with an edit distance (i.e., mismatches + indels) greater than this value are discarded (default is 2). |
| 224 | * '''--segment-mismatches''' Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2). |
| 225 | |
| 226 | === [=#tophat TopHat version 1] === |
206 | | '''[http://ccb.jhu.edu/software/tophat/index.shtml tophat version 2]''' |
207 | | |
208 | | '''TopHat version 2 is no longer recommended.''' The authors of TopHat currently recommend [http://ccb.jhu.edu/software/hisat2/index.shtml HISAT2]. |
209 | | TopHat version 2 uses bowtie2, rather than bowtie, for its mapping. As a result, TopHat 2 uses a different set of genome index files (*.bt2) than TopHat 1 (*.ebwt). |
210 | | |
211 | | Sample command: |
212 | | {{{ |
213 | | # Single-end reads |
214 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.txt |
215 | | # Paired-end reads |
216 | | # For PE reads, specifiy expected (mean) inner distance using -r option (default is 50bp). The inner distance, or insert size, does not include length of the reads/mates. For example, PE run with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. |
217 | | bsub tophat -o s_7_tophat_out --phred64-quals --library-type fr-firststrand --segment-length 20 -I 200000 -G /nfs/genomes/mouse_gp_jul_07_no_random/gtf/Mus_musculus.NCBIM37.67_noNT.gtf --no-novel-juncs /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_7.1.txt s_7.2.txt |
218 | | }}} |
219 | | |
220 | | The parameters included in the sample command are: |
221 | | * '''-o/--output-dir <word>''' All output files will be created in this directory (default = tophat_out) |
222 | | * '''--segment-length <int>''' Shortest length of a spliced read that can map to one side of the junction. For reads shorter than ~45 nt, set this to half the read length (so set '--segment-length 20' for 40-nt reads). For longer reads, the default length (25) can be used. |
223 | | * '''-I <int>''' Maximum intron length. If your genome has introns that are all shorter (or many that are longer) than the default value (500000), set this to a more appropriate value. |
224 | | * '''-G <GTF file>''' Supply bowtie with a GTF file of transcript models. This can help bowtie identify functions that may otherwise be missed. |
225 | | * '''--no-novel-juncs ''' Only look for spliced reads across junctions in the supplied GTF file. Not used if looking for novel isoforms. |
226 | | * '''--library type ''' Take advantage of strandedness of library for mapping (especially across splice junctions); can be fr-unstranded, fr-firststrand, or fr-secondstrand |
227 | | |
228 | | Choices for fastq encoding (which is listed as "Encoding" in the top "Basic Statistics" table of the FastQC output file). See the [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format page] for more details. |
229 | | * '''--solexa-quals''' (for input quality scores from Illumina versions 1.2 and earlier) |
230 | | * '''--solexa1.3-quals''' or '''--phred64-quals''' (for input quality scores from Illumina versions 1.3-1.7) |
231 | | * For "Sanger / Illumina 1.8" or "Sanger / Illumina 1.9", bowtie can use the default "phred33" encoding |
232 | | |
233 | | Choices for controlling alignment (eg. mismatches) |
234 | | * '''--read-mismatches/-N''' Final read alignments having more than these many mismatches are discarded (default is 2). |
235 | | * '''--read-gap-length''' Final read alignments having more than these many total length of gaps are discarded (default is 2). |
236 | | * '''--read-edit-dist''' Final read alignments having more than these many edit distance (ie. mismatches+indels) are discarded (default is 2). |
237 | | * '''--segment-mismatches''' Read segments are mapped independently, allowing up to this many mismatches in each segment alignment (default is 2). |
238 | | |