= Interpreting VCF files =

The [http://vcftools.sourceforge.net/specs.html VCF (Variant Call Format) specification] describes most of what you need to know. Tags in the FILTER, INFO, and FORMAT fields are described in the VCF header.

Probability (ranging from 0 to 1) for a Phred score P is defined as '''10^-P/10^'''.

As a tabular reference, common tags and scores are as follows:

'''QUAL''' field:

QUAL = -10*log,,10,,(Probability(call in ALT is wrong))

'''FILTER''' field (typically generated by vcf-annotate):

||'''Tag''' || '''Description''' || '''Default threshold''' ||
||BaseQualBias || Min P-value for baseQ bias || 0 ||
||EndDistBias || Min P-value for end distance bias || 0.0001 ||
||GapWin || Window size for filtering adjacent gaps || 3 ||
||MapQualBias || Min P-value for mapQ bias || 0 ||
||MaxDP || Maximum read depth || 10000000 ||
||MinAB || Minimum number of alternate bases || 2 ||
||MinDP || Minimum read depth || 2 ||
||MinMQ || Minimum RMS mapping quality for SNPs || 10 ||
||Qual || Minimum value of the QUAL field || 10 ||
||RefN || Reference base is N || [] ||
||SnpGap || SNP within INT bp around a gap to be filtered || 10 ||
||StrandBias || Min P-value for strand bias || 0.0001 ||
||VDB || Minimum Variant Distance Bias || 0 ||

'''INFO''' field (typically generated by bcftools and expanded with vcf-annotate):

||'''Tag''' || '''Description''' || '''More details''' ||
||AC || Allele count in genotypes || ||
||AC1 || Max-likelihood estimate of the first ALT allele count (no HWE assumption) || ||
||AF1 || Max-likelihood estimate of the first ALT allele frequency (assuming HWE) || ||
||AN || Total number of alleles in called genotypes || ||
||CGT || The most probable constrained genotype configuration in the trio || ||
||CLR || Log ratio of genotype likelihoods with and without the constraint || ||
||DP || Raw read depth || For multiple-sample VCFs, the sum for all samples ||
||DP4 || Number of high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases || ||
||FQ || Phred probability of all samples being the same || ||
||G3 || ML estimate of genotype frequencies || ||
||HWE || Hardy-Weinberg equilibrium test (PMID:15789306) || ||
||ICF || Inbreeding coefficient F || ||
||INDEL || Indicates that the variant is an INDEL. || ||
||IS || Maximum number of reads supporting an indel and fraction of indel reads || ||
||MDV || Maximum number of high-quality nonRef reads in samples || ||
||MQ || Root-mean-square mapping quality of covering reads || ||
||PC2 || Phred probability of the nonRef allele frequency in group1 samples being larger (, smaller) than in group2. || ||
||PCHI2 || Posterior weighted chi2 P-value for testing the association between group1 and group2 samples. || ||
||PR || Number of permutations yielding a smaller PCHI2. || ||
||PV4 || P-values for strand bias, baseQ bias, mapQ bias and tail distance bias || ||
||QBD || Quality by Depth: QUAL/#reads || ||
||QCHI2 || Phred scaled PCHI2. || ||
||RPB || Read Position Bias || ||
||SF || Source File (index to sourceFiles, f when filtered) || ||
||TYPE || Variant type || ||
||UGT || The most probable unconstrained genotype configuration in the trio || ||
||VDB || Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data. || ||
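
As an illustration of the Phred relation and the INFO tags listed on this page, the following is a minimal Python sketch, not part of any VCF tool. The function names (`phred_to_prob`, `prob_to_phred`, `parse_info`, `dp4_counts`) and the example INFO string are hypothetical and chosen only for this example. It assumes the INFO column is a semicolon-separated list of `KEY=VALUE` pairs and bare flags such as INDEL, with `.` meaning no tags.

```python
import math


def phred_to_prob(q):
    """Return the error probability for a Phred score q, i.e. 10^(-q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Return the Phred score -10*log10(p) for an error probability 0 < p <= 1."""
    return -10 * math.log10(p)


def parse_info(info):
    """Split a VCF INFO column into a dict.

    Tags without '=' (flags such as INDEL) are stored as True.
    Values are kept as strings; the caller converts them as needed.
    """
    tags = {}
    if info == ".":  # '.' marks an empty INFO column
        return tags
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        tags[key] = value if sep else True
    return tags


def dp4_counts(tags):
    """Return DP4 as four integers:
    (ref-forward, ref-reverse, alt-forward, alt-reverse)."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in tags["DP4"].split(","))
    return ref_fwd, ref_rev, alt_fwd, alt_rev


# Hypothetical INFO string, made up for this example only.
example_info = "DP=35;AF1=1;DP4=0,0,17,14;MQ=42;INDEL"
example_tags = parse_info(example_info)
example_dp4 = dp4_counts(example_tags)
```

With these helpers, a QUAL of 30 corresponds to `phred_to_prob(30)`, a probability of about 0.001 that the call in ALT is wrong, and `prob_to_phred` inverts the conversion. Keeping INFO values as strings avoids guessing types; the authoritative type of each tag is declared in the VCF header.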