= Manipulating VCF files = 

Create a VCF ([http://en.wikipedia.org/wiki/Variant_Call_Format variant call format]) file [with about any program that identifies variants], such as
* samtools' mpileup+bcftools: 
{{{
# One file of mapped reads
samtools mpileup -uf indexed_genome My_mapped_reads.bam | bcftools view -bvcg - >| My_mapped_reads.raw.bcf
# Multiple files of mapped reads
samtools mpileup -uf indexed_genome *.bam | bcftools view -bvcg - >| Multiple_samples.raw.bcf
}}}

Convert from BCF (binary version of VCF) to VCF:
{{{
bcftools view My_mapped_reads.raw.bcf > My_mapped_reads.raw.vcf
}}}

Convert from VCF to BCF:
{{{
bcftools view -bS -D chr_list.txt My_mapped_reads.raw.vcf > My_mapped_reads.raw.bcf
}}}

Merge multiple VCF files -- works on raw VCF files but apparently not with those processed by vcf-annotate

{{{
# For each VCF file:
bgzip Variants_sample_A.raw.vcf
tabix -p vcf Variants_sample_A.raw.vcf.gz
}}}
Merge multiple bgzipped, tabixed files:
{{{
vcf-merge *.raw.vcf.gz >| Variants_all_samples.raw.vcf
}}}

Annotate a VCF file using [https://vcftools.github.io/perl_module.html#vcf-annotate vcf-annotate]
{{{
# Apply all filters with default values:
vcf-annotate -f + Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf
# Apply all filters with default values except for specified ones (MinAB(a) = 10; MinDP(b) = 20):
vcf-annotate -f +/a=10/d=20 Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf
# Add custom filter(s) as described in file "My_filters.txt":
vcf-annotate -f My_filters.txt Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf
}}}

For the last command, My_filters.txt contains a filter (such as an example one that calculates PLdiff values for 1/1 - 0/0 and 1/1 - 0/1, making sure that each difference is greater than 20).  In this case, the filter is designed for selection, rather than filtering out. [[BR]]
For more examples, go to [https://vcftools.github.io/perl_module.html#vcf-annotate vcf-annotate] and click on "Read even more".
{{{
{
    tag      => 'FORMAT/PL',
    name     => 'GoodPLdiff',
    desc     => 'Homozygote alternate and PLdiff greater than 20',
    apply_to => 'all',
    test     => sub {
            for my $pl (@$MATCH)
            {
                my @pls = split(/,/,$pl);
                if ( ($pls[2]-$pls[0])>20 && ($pls[1]-$pls[0])>20 ) { return $FAIL; }
            }
        return $PASS;
    },
},
}}}

Similar sorts of tasks can be performed with [http://snpeff.sourceforge.net/SnpSift.html#filter SnpSift's filter]  command, like
{{{
# Get sites where first sample is reference and the second sample is homozygous for the variant
cat Variants.vcf | java -jar SnpSift.jar filter "isRef(GEN[0]) & isHom(GEN[1]) & isVariant(GEN[1])" > Selected_variants.vcf
}}}

Annotate variants with SNP IDs (if they overlap known SNPs)
{{{
# Index VCF file of SNPs (or other annotations)
bgzip dbSNP.version.vcf
tabix -p vcf dbSNP.version.vcf.gz
# Use annotation file to add info to ID column (if there's overlap between variant and annotation)
vcf-annotate -a dbSNP.version.vcf.gz -c CHROM,POS,ID,-,-,-,-,- -d key=INFO,ID=RS_ID,Number=1,Type=Integer,Description="SNPs from a subset of dbSNP" < My_variants.hg38.vcf > My_variants.hg38.dbSNP.vcf
}}}
See [http://www.htslib.org/doc/tabix.html tabix] for how to index BED or other file types.
See [http://vcftools.sourceforge.net/perl_module.html#vcf-annotate vcf-annotate] for details of last command.

Sort by chromosome and then coordinates
{{{
vcf-sort Variants.vcf > Variants.sorted.vcf
}}}

Validate VCF file (for use with GATK, for example)
{{{
java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T ValidateVariants -R /path/to/indexed/genome --variant:VCF SNPs.vcf
}}}

Convert from VCF to bed (using the [http://bedops.readthedocs.org BEDOPS] toolkit):
{{{
vcf2bed < my_variants.vcf > my_variants.bed
}}}

Filter vcf with [http://snpeff.sourceforge.net/SnpSift.html#filter snpSift Filter ] function 
{{{
# Only keep the variants with DV (number of high-quality non-reference bases) at least 1 in both the 1st and 2nd samples:
java -jar /nfs/BaRC/apps/snpEff/Version_4/SnpSift.jar  filter " (GEN[0].DV>=1) & (GEN[1].DV>=1)" foo.vcf > s1.DVabove1.plus.s2.DVabove1.vcf
}}}

Extract fields from a VCF file to a text, tab-separated format with the [http://snpeff.sourceforge.net/SnpSift.html#Extract snpSift extractFields] function
{{{
# Extract chromosome, position, reference base(s), alternate bases(s), all the samples' ADF and ADR (number of high-quality ref-fwd, alt-fwd, ref-reverse, and alt-reverse bases):
java -jar SnpSift.jar extractFields foo.vcf CHROM POS REF ALT "GEN[*].ADF" "GEN[*].ADR" > Foo.ADF.ADR.vcf

# Similar command, but extract and split all ADF and ADR fields
java -jar SnpSift.jar extractFields foo.vcf CHROM POS REF ALT "GEN[*].ADF[*]" "GEN[*].ADR[*]" > Foo.ADF.ADR.split.vcf
}}}

Extract fields from a VCF file to a text, tab-separated file with the [https://samtools.github.io/bcftools/howtos/query.html bcftools query] function

{{{
# Extract any fields that are in the INFO or other fields
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%AC\t%AN\t%AF\n' gnomad.genomes.r2.1.1.sites.21.vcf.bgz > gnomad.genomes.r2.1.1.sites.21.vcf.AC+AN+AF.txt
# Start by keeping only SNPs (so remove indels)
bcftools view --types snps gnomad.genomes.r2.1.1.sites.21.vcf.bgz | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%AC\t%AN\n' >| gnomad.genomes.r2.1.1.sites.21.vcf.SNPs_only.AC+AN+AF.txt
# Make a bedgraph file using an output value (such as AF for the score)
awk -F"\t" '{print "chr"$1 "\t" $2-1 "\t" $2 "\t" $7}' gnomad.genomes.r2.1.1.sites.21.vcf.AC+AN+AF.txt >| gnomad.genomes.r2.1.1.sites.21.AF.bedgraph
# Make output a bed file showing the REF and ALT alleles (like "G>C" for QC)
awk -F"\t" '{print "chr"$1 "\t" $2-1 "\t" $2 "\t" $3">"$4}' gnomad.genomes.r2.1.1.sites.21.vcf.AC+AN+AF.txt >| gnomad.genomes.r2.1.1.sites.21.bed
}}}