45 | | == Call raw variants == |
| 45 | == Call raw variants with mpileup+bcftools == |
| 46 | |
| 47 | Call variants (one sample vs. reference) with samtools' mpileup+bcftools (see the [http://samtools.sourceforge.net/mpileup.shtml| samtools' variant calling page] for more details) |
| 48 | {{{ |
| 49 | samtools mpileup -d100000 -uf /nfs/genomes/sgd_2010/bwa/sacCer3.fa A_reads.bt2.sorted_unique.bam | bcftools view -bvcg - >| A_reads.bt2.sorted_unique.raw.bcf |
| 50 | }}} |
| 51 | Call variants (multiple sample vs. reference) using a set of BAM files |
| 52 | {{{ |
| 53 | samtools mpileup -d100000 -uf /nfs/genomes/sgd_2010/bwa/sacCer3.fa *_reads.bt2.sorted_unique.bam | bcftools view -bvcg - >| ALL_reads.bt2.sorted_unique.raw.bcf |
| 54 | }}} |
| 55 | |
| 56 | == Add filters and annotations to raw variants == |
| 57 | |
| 58 | This step uses vcf-annotate from the [http://vcftools.sourceforge.net/docs.html VCFtools suite] |
| 59 | |
| 60 | Annotate variants by adding tags ("filters" but all variants are kept) to each variant, using all default filters |
| 61 | {{{ |
| 62 | bcftools view -L -vcg A_reads.bt2.sorted_unique.raw.bcf | vcf-annotate -f + > A_reads.bt2.sorted_unique.withTags.bcf |
| 63 | }}} |
| 64 | |
| 65 | Prepare file of known SNPs for use with vcf-annotate. |
| 66 | Start with tab-delimited file (ex: SNP137.bed) that looks like |
| 67 | |
| 68 | chr1 1360 1361 rs000000001 |
| 69 | {{{ |
| 70 | bgzip SNP137.bed |
| 71 | tabix -p bed SNP137.bed.gz |
| 72 | }}} |
| 73 | |
| 74 | Annotate variants by adding tags, more analysis, and any SNPdb overlaps |
| 75 | {{{ |
| 76 | bcftools view -L -vcg A_reads.bt2.sorted_unique.raw.bcf | vcf-annotate -f +/d=10 --fill-HWE --fill-type -n -a SNP137.bed.gz -c CHROM,FROM,TO,INFO/SNP_ID -d key=INFO,ID=SNP_ID,Number=1,Type=Integer,Description='SNP137 sites' > A_reads.bt2.sorted_unique.filtered.vcf |
| 77 | }}} |
| 78 | |
| 79 | Any tags will appear in the FILTER field, and SNPdb overlaps will appear in the INFO field of the output VCF file. |
| 80 | |
| 81 | Overlap with annotations can also be identified with intersectBed (where annotation will appear in new fields of the output VCF file): |
| 82 | {{{ |
| 83 | intersectBed -wao -split -a A_reads.bt2.sorted_unique.raw.vcf -b SNP137.bed > A_reads.bt2.sorted_unique.annotated.vcf |
| 84 | }}} |
| 85 | |
| 86 | == Annotate effect(s) of variants on genes == |
| 87 | |
| 88 | Use snpEff (assuming snpEff has gene+protein annotations for your genome): |
| 89 | {{{ |
| 90 | snpEff -c /usr/local/share/snpEff/snpEff.config -s A_snpEff.html SacCer_Apr2011.18 A_reads.bt2.sorted_unique.filtered.vcf > A_reads.bt2.sorted_unique.filtered.snpEff.vcf |
| 91 | }}} |
| 92 | |
| 93 | Get information only for variants that overlap protein-coding exons: |
| 94 | {{{ |
| 95 | snpEff -c /usr/local/share/snpEff/snpEff.config -no-downstream -no-intergenic -no-intron -no-upstream -no-utr -s A_snpEff.html SacCer_Apr2011.18 A_reads.bt2.sorted_unique.filtered.vcf > A_reads.bt2.sorted_unique.filtered.snpEff.vcf |
| 96 | }}} |