= Manipulating VCF files = Create a VCF ([http://en.wikipedia.org/wiki/Variant_Call_Format variant call format]) file [with about any program that identifies variants], such as * samtools' mpileup+bcftools: {{{ # One file of mapped reads samtools mpileup -uf indexed_genome My_mapped_reads.bam | bcftools view -bvcg - >| My_mapped_reads.raw.bcf # Multiple files of mapped reads samtools mpileup -uf indexed_genome *.bam | bcftools view -bvcg - >| Multiple_samples.raw.bcf }}} Convert from BCF (binary version of VCF) to VCF: {{{ bcftools view My_mapped_reads.raw.bcf > My_mapped_reads.raw.vcf }}} Convert from VCF to BCF: {{{ bcftools view -bS -D chr_list.txt My_mapped_reads.raw.vcf > My_mapped_reads.raw.bcf }}} Merge multiple VCF files -- works on raw VCF files but apparently not with those processed by vcf-annotate {{{ # For each VCF file: bgzip Variants_sample_A.raw.vcf tabix -p vcf Variants_sample_A.raw.vcf.gz }}} Merge multiple bgzipped, tabixed files: {{{ vcf-merge *.raw.vcf.gz >| Variants_all_samples.raw.vcf }}} Annotate a VCF file using [http://vcftools.sourceforge.net/perl_module.html#vcf-annotate vcf-annotate] {{{ # Apply all filters with default values: vcf-annotate -f + Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf # Apply all filters with default values except for specified ones (MinAB(a) = 10; MinDP(b) = 20): vcf-annotate -f +/a=10/d=20 Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf # Add custom filter(s) as described in file "My_filters.txt": vcf-annotate -f My_filters.txt Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf }}} For the last command, My_filters.txt contains a filter (such as an example one that calculates PLdiff values for 1/1 - 0/0 and 1/1 - 0/1, making sure that each difference is greater than 20). In this case, the filter is designed for selection, rather than filtering out. [[BR]] For more examples, go to [http://vcftools.sourceforge.net/perl_module.html#vcf-annotate vcf-annotate] and click on "Read even more". {{{ { tag => 'FORMAT/PL', name => 'GoodPLdiff', desc => 'Homozygote alternate and PLdiff greater than 20', apply_to => 'all', test => sub { for my $pl (@$MATCH) { my @pls = split(/,/,$pl); if ( ($pls[2]-$pls[0])>20 && ($pls[1]-$pls[0])>20 ) { return $FAIL; } } return $PASS; }, }, }}} Annotate variants with SNP IDs (if they overlap known SNPs) {{{ # Index VCF file of SNPs (or other annotations) bgzip dbSNP.version.vcf tabix -p vcf dbSNP.version.vcf.gz # Use annotation file to add info to ID column (if there's overlap between variant and annotation) vcf-annotate -a dbSNP.version.vcf.gz -c CHROM,POS,ID,-,-,-,-,- -d key=INFO,ID=RS_ID,Number=1,Type=Integer,Description="SNPs from a subset of dbSNP" < My_variants.hg38.vcf > My_variants.hg38.dbSNP.vcf }}} See [http://www.htslib.org/doc/tabix.html tabix] for how to index BED or other file types. See [http://vcftools.sourceforge.net/perl_module.html#vcf-annotate vcf-annotate] for details of last command. Sort by chromosome and then coordinates {{{ vcf-sort Variants.vcf > Variants.sorted.vcf }}} Validate VCF file (for use with GATK, for example) {{{ java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T ValidateVariants -R /path/to/indexed/genome --variant:VCF SNPs.vcf }}} Convert from VCF to bed (using the [http://bedops.readthedocs.org BEDOPS] toolkit): {{{ vcf2bed < my_variants.vcf > my_variants.bed }}}