= Interpreting VCF files =

The [http://vcftools.sourceforge.net/specs.html VCF (Variant Call Format) specification] describes most of what you need to know.

Tags in the FILTER, INFO, and FORMAT fields are described in the VCF header.

A Phred score P corresponds to a probability (between 0 and 1) of '''10^-P/10^'''. For example, P = 10 corresponds to a probability of 0.1, and P = 30 to 0.001.
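
The Phred relationship can be sketched in a few lines of Python. The function names here are illustrative only and are not part of any VCF tool.

```python
import math

def phred_to_prob(phred):
    """Convert a Phred score P to a probability: 10^(-P/10)."""
    return 10 ** (-phred / 10)

def prob_to_phred(prob):
    """Convert a probability (0 < prob <= 1) to a Phred score: -10*log10(prob)."""
    return -10 * math.log10(prob)

# A QUAL of 30 means the ALT call is wrong with probability of about 0.001.
error_prob = phred_to_prob(30)
```

The same conversion applies to any Phred-scaled value in the file, such as the QUAL field.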

For reference, common tags and scores are tabulated below:

'''QUAL''' field: QUAL = -10*log,,10,,(Probability(call in ALT is wrong))

'''FILTER''' field (typically generated by vcf-annotate):

||'''Tag''' || '''Description''' || '''Default threshold''' ||
||BaseQualBias || Min P-value for baseQ bias || 0 ||
||EndDistBias || Min P-value for end distance bias || 0.0001 ||
||GapWin || Window size for filtering adjacent gaps || 3 ||
||MapQualBias || Min P-value for mapQ bias || 0 ||
||MaxDP || Maximum read depth || 10000000 ||
||MinAB || Minimum number of alternate bases || 2 ||
||MinDP || Minimum read depth || 2 ||
||MinMQ || Minimum RMS mapping quality for SNPs || 10 ||
||Qual || Minimum value of the QUAL field || 10 ||
||RefN || Reference base is N || [] ||
||SnpGap || SNP within INT bp around a gap to be filtered || 10 ||
||StrandBias || Min P-value for strand bias || 0.0001 ||
||VDB || Minimum Variant Distance Bias || 0 ||
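
As a rough illustration of how a few of these thresholds act on a site, the sketch below checks depth, QUAL and mapping quality against the defaults from the table. It is not the vcf-annotate implementation: the function name, its arguments, and the choice of strict comparisons at the boundaries are assumptions made for the example.

```python
# Default thresholds copied from the FILTER table above.
DEFAULTS = {"MinDP": 2, "MaxDP": 10000000, "Qual": 10, "MinMQ": 10}

def failed_filters(qual, depth, mapq, thresholds=DEFAULTS):
    """Return the names of the filters a site would fail (hypothetical helper).

    Assumption: a value exactly equal to a threshold passes.
    """
    failed = []
    if depth < thresholds["MinDP"]:
        failed.append("MinDP")
    if depth > thresholds["MaxDP"]:
        failed.append("MaxDP")
    if qual < thresholds["Qual"]:
        failed.append("Qual")
    if mapq < thresholds["MinMQ"]:
        failed.append("MinMQ")
    return failed

# A shallow, low-quality site fails MinDP and Qual under this sketch.
low_site = failed_filters(qual=4.5, depth=1, mapq=37)
```

A site that fails no filter is conventionally marked PASS in the FILTER column.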

'''INFO''' field (typically generated by bcftools and expanded with vcf-annotate):

||'''Tag''' || '''Description''' || '''More details''' ||
||AC || Allele count in genotypes ||  ||
||AC1 || Max-likelihood estimate of the first ALT allele count (no HWE assumption) ||  ||
||AF1 || Max-likelihood estimate of the first ALT allele frequency (assuming HWE) ||  ||
||AN || Total number of alleles in called genotypes ||  ||
||CGT || The most probable constrained genotype configuration in the trio ||  ||
||CLR || Log ratio of genotype likelihoods with and without the constraint ||  ||
||DP || Raw read depth || For multiple-sample VCFs, the sum for all samples ||
||DP4 || Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases ||  ||
||FQ || Phred probability of all samples being the same ||  ||
||G3 || ML estimate of genotype frequencies ||  ||
||HWE || Hardy-Weinberg equilibrium test (PMID:15789306) ||  ||
||ICF || Inbreeding coefficient F ||  ||
||INDEL || Indicates that the variant is an INDEL. ||  ||
||IS || Maximum number of reads supporting an indel and fraction of indel reads ||  ||
||MDV || Maximum number of high-quality nonRef reads in samples ||  ||
||MQ || Root-mean-square mapping quality of covering reads ||  ||
||PC2 || Phred probability of the nonRef allele frequency in group1 samples being larger (or smaller) than in group2. ||  ||
||PCHI2 || Posterior weighted chi2 P-value for testing the association between group1 and group2 samples. ||  ||
||PR || Number of permutations yielding a smaller PCHI2. ||  ||
||PV4 || P-values for strand bias, baseQ bias, mapQ bias and tail distance bias ||  ||
||QBD || Quality by Depth: QUAL/#reads ||  ||
||QCHI2 || Phred scaled PCHI2. ||  ||
||RPB || Read Position Bias ||  ||
||SF || Source File (index to sourceFiles, f when filtered) ||  ||
||TYPE || Variant type ||  ||
||UGT || The most probable unconstrained genotype configuration in the trio ||  ||
||VDB || Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data. ||  ||
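
The INFO column holds these tags as semicolon-separated `key=value` pairs, with flags such as INDEL appearing as a bare key. A minimal parsing sketch follows; the helper names and the example INFO string are made up for illustration, and real files are better handled with a dedicated VCF library.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without a value map to True."""
    fields = {}
    if info in ("", "."):  # "." denotes an empty INFO column
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def parse_dp4(value):
    """Unpack a DP4 value into its four strand-specific base counts."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in value.split(","))
    return {"ref_fwd": ref_fwd, "ref_rev": ref_rev,
            "alt_fwd": alt_fwd, "alt_rev": alt_rev}

# Hypothetical INFO string, not taken from a real file.
info = parse_info("DP=14;AF1=1;DP4=0,0,6,8;MQ=37;INDEL")
dp4 = parse_dp4(info["DP4"])
alt_bases = dp4["alt_fwd"] + dp4["alt_rev"]
```

Values are kept as strings here because the type of each tag (Integer, Float, Flag, String) is declared in the VCF header rather than in the INFO column itself.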