Version 3 (modified by gbell, 11 years ago)

Interpreting VCF files

The VCF (Variant Call Format) specification describes most of what you need to know.

Tags in the FILTER, INFO, and FORMAT fields are described in the VCF header.
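
As a minimal sketch of reading those header descriptions programmatically, the Python below collects the ID and Description of every ##FILTER, ##INFO, and ##FORMAT line. The function name header_tag_descriptions and the example header lines are illustrative assumptions, not part of any tool mentioned here; the regular expressions assume the usual ##KEY=<ID=...,Description="..."> layout from the VCF specification.

```python
import re

# Matches structured header lines such as:
#   ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
HEADER_RE = re.compile(r'^##(FILTER|INFO|FORMAT)=<(.*)>$')
ID_RE = re.compile(r'(?:^|,)ID=([^,]+)')
DESC_RE = re.compile(r'Description="((?:[^"\\]|\\.)*)"')


def header_tag_descriptions(lines):
    """Map each FILTER/INFO/FORMAT tag ID to its header description.

    Hypothetical helper: `lines` is any iterable of VCF lines
    (for example an open file handle).
    """
    tags = {"FILTER": {}, "INFO": {}, "FORMAT": {}}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = HEADER_RE.match(line)
        if not match:
            continue  # other meta lines, e.g. ##fileformat
        field, body = match.groups()
        id_match = ID_RE.search(body)
        desc_match = DESC_RE.search(body)
        if id_match:
            tags[field][id_match.group(1)] = desc_match.group(1) if desc_match else ""
    return tags


# Made-up header lines, for illustration only.
example_header = [
    '##fileformat=VCFv4.1',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">',
    '##FILTER=<ID=MinDP,Description="Minimum read depth [2]">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
]
tags = header_tag_descriptions(example_header)
print(tags["INFO"]["DP"])
```

For a real file, passing an open file handle in place of example_header should work the same way, since parsing stops at the first line that does not begin with ##.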

For quick reference, common tags are tabulated below:

FILTER field (typically generated by vcf-annotate):

Tag           Description                                    Default threshold
BaseQualBias  Min P-value for baseQ bias                     0
EndDistBias   Min P-value for end distance bias              0.0001
GapWin        Window size for filtering adjacent gaps        3
MapQualBias   Min P-value for mapQ bias                      0
MaxDP         Maximum read depth                             10000000
MinAB         Minimum number of alternate bases              2
MinDP         Minimum read depth                             2
MinMQ         Minimum RMS mapping quality for SNPs           10
Qual          Minimum value of the QUAL field                10
RefN          Reference base is N                            none
SnpGap        SNP within INT bp around a gap to be filtered  10
StrandBias    Min P-value for strand bias                    0.0001
VDB           Minimum Variant Distance Bias                  0
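
The tags above appear in the FILTER column of each record. Per the VCF specification, that column holds PASS when the record passed every filter, "." when no filters were applied, and otherwise a semicolon-separated list of the filters that failed. A minimal sketch of interpreting the column follows; the function name failed_filters is an illustrative assumption.

```python
def failed_filters(filter_value):
    """Interpret the FILTER column of one VCF record.

    Returns [] if the record passed all filters, None if no filters
    were applied ("."), otherwise the list of failed filter names.
    """
    if filter_value == "PASS":
        return []
    if filter_value == ".":
        return None
    return filter_value.split(";")


print(failed_filters("MinDP;StrandBias"))
```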

INFO field (typically generated by bcftools and expanded with vcf-annotate):

Tag    Description
AC     Allele count in genotypes
AC1    Max-likelihood estimate of the first ALT allele count (no HWE assumption)
AF1    Max-likelihood estimate of the first ALT allele frequency (assuming HWE)
AN     Total number of alleles in called genotypes
CGT    The most probable constrained genotype configuration in the trio
CLR    Log ratio of genotype likelihoods with and without the constraint
DP     Raw read depth
DP4    Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases
FQ     Phred probability of all samples being the same
G3     ML estimate of genotype frequencies
HWE    Hardy-Weinberg equilibrium test (PMID:15789306)
ICF    Inbreeding coefficient F
INDEL  Indicates that the variant is an INDEL
IS     Maximum number of reads supporting an indel and fraction of indel reads
MDV    Maximum number of high-quality nonRef reads in samples
MQ     Root-mean-square mapping quality of covering reads
PC2    Phred probabilities of the nonRef allele frequency in group1 samples being larger (first value) or smaller (second value) than in group2
PCHI2  Posterior weighted chi2 P-value for testing the association between group1 and group2 samples
PR     Number of permutations yielding a smaller PCHI2
PV4    P-values for strand bias, baseQ bias, mapQ bias and tail distance bias
QBD    Quality by Depth: QUAL / number of reads
QCHI2  Phred-scaled PCHI2
RPB    Read Position Bias
SF     Source File (index to sourceFiles, f when filtered)
TYPE   Variant type
UGT    The most probable unconstrained genotype configuration in the trio
VDB    Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data
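
In a record, the INFO column is a semicolon-separated list of KEY=VALUE entries, where a value may itself be comma-separated (as with DP4 and PV4) and flags such as INDEL appear without a value. The sketch below splits that column into a dictionary and, as one example of using it, derives the fraction of high-quality bases supporting the alternate allele from DP4. Both function names and the sample INFO string are illustrative assumptions.

```python
def parse_info(info_value):
    """Split a VCF INFO column into a dict.

    Keys with values map to a list of strings (split on commas);
    valueless flags such as INDEL map to True.
    """
    info = {}
    if info_value == ".":
        return info  # "." means the INFO column is empty
    for item in info_value.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value.split(",") if sep else True
    return info


def alt_read_fraction(info):
    """Fraction of high-quality bases supporting ALT, computed from DP4.

    Hypothetical helper; DP4 order is ref-forward, ref-reverse,
    alt-forward, alt-reverse.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"])
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return (alt_fwd + alt_rev) / total if total else 0.0


# Made-up INFO string, for illustration only.
info = parse_info("DP=20;DP4=6,4,5,5;MQ=50;INDEL")
print(info["DP"], info["INDEL"], alt_read_fraction(info))
```

Values are left as strings here because the correct type for each tag (Integer, Float, String, Flag) is declared in the header rather than in the record itself.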