| 45 | | == Call raw variants == |
| | 45 | == Call raw variants with mpileup+bcftools == |
| | 46 | |
| | 47 | Call variants (one sample vs. reference) with samtools' mpileup+bcftools (see the [http://samtools.sourceforge.net/mpileup.shtml| samtools' variant calling page] for more details) |
| | 48 | {{{ |
| | 49 | samtools mpileup -d100000 -uf /nfs/genomes/sgd_2010/bwa/sacCer3.fa A_reads.bt2.sorted_unique.bam | bcftools view -bvcg - >| A_reads.bt2.sorted_unique.raw.bcf |
| | 50 | }}} |
| | 51 | Call variants (multiple sample vs. reference) using a set of BAM files |
| | 52 | {{{ |
| | 53 | samtools mpileup -d100000 -uf /nfs/genomes/sgd_2010/bwa/sacCer3.fa *_reads.bt2.sorted_unique.bam | bcftools view -bvcg - >| ALL_reads.bt2.sorted_unique.raw.bcf |
| | 54 | }}} |
| | 55 | |
| | 56 | == Add filters and annotations to raw variants == |
| | 57 | |
| | 58 | This step uses vcf-annotate from the [http://vcftools.sourceforge.net/docs.html VCFtools suite] |
| | 59 | |
| | 60 | Annotate variants by adding tags ("filters" but all variants are kept) to each variant, using all default filters |
| | 61 | {{{ |
| | 62 | bcftools view -L -vcg A_reads.bt2.sorted_unique.raw.bcf | vcf-annotate -f + > A_reads.bt2.sorted_unique.withTags.bcf |
| | 63 | }}} |
| | 64 | |
| | 65 | Prepare file of known SNPs for use with vcf-annotate. |
| | 66 | Start with tab-delimited file (ex: SNP137.bed) that looks like |
| | 67 | |
| | 68 | chr1 1360 1361 rs000000001 |
| | 69 | {{{ |
| | 70 | bgzip SNP137.bed |
| | 71 | tabix -p bed SNP137.bed.gz |
| | 72 | }}} |
| | 73 | |
| | 74 | Annotate variants by adding tags, more analysis, and any SNPdb overlaps |
| | 75 | {{{ |
| | 76 | bcftools view -L -vcg A_reads.bt2.sorted_unique.raw.bcf | vcf-annotate -f +/d=10 --fill-HWE --fill-type -n -a SNP137.bed.gz -c CHROM,FROM,TO,INFO/SNP_ID -d key=INFO,ID=SNP_ID,Number=1,Type=Integer,Description='SNP137 sites' > A_reads.bt2.sorted_unique.filtered.vcf |
| | 77 | }}} |
| | 78 | |
| | 79 | Any tags will appear in the FILTER field, and SNPdb overlaps will appear in the INFO field of the output VCF file. |
| | 80 | |
| | 81 | Overlap with annotations can also be identified with intersectBed (where annotation will appear in new fields of the output VCF file): |
| | 82 | {{{ |
| | 83 | intersectBed -wao -split -a A_reads.bt2.sorted_unique.raw.vcf -b SNP137.bed > A_reads.bt2.sorted_unique.annotated.vcf |
| | 84 | }}} |
| | 85 | |
| | 86 | == Annotate effect(s) of variants on genes == |
| | 87 | |
| | 88 | Use snpEff (assuming snpEff has gene+protein annotations for your genome): |
| | 89 | {{{ |
| | 90 | snpEff -c /usr/local/share/snpEff/snpEff.config -s A_snpEff.html SacCer_Apr2011.18 A_reads.bt2.sorted_unique.filtered.vcf > A_reads.bt2.sorted_unique.filtered.snpEff.vcf |
| | 91 | }}} |
| | 92 | |
| | 93 | Get information only for variants that overlap protein-coding exons: |
| | 94 | {{{ |
| | 95 | snpEff -c /usr/local/share/snpEff/snpEff.config -no-downstream -no-intergenic -no-intron -no-upstream -no-utr -s A_snpEff.html SacCer_Apr2011.18 A_reads.bt2.sorted_unique.filtered.vcf > A_reads.bt2.sorted_unique.filtered.snpEff.vcf |
| | 96 | }}} |