
Version 5 (modified by gbell, 11 years ago)

--

Manipulating VCF files

Create a VCF (Variant Call Format) file with almost any program that identifies variants, such as:

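  • samtools mpileup piped into bcftools view (samtools/bcftools 0.1.x syntax):
    # mpileup -u: uncompressed BCF output; -f: faidx-indexed reference FASTA
    # bcftools view -b: BCF output; -v: variant sites only; -c: call SNPs; -g: call genotypes
    # '>|' overwrites the output file even if the shell's noclobber option is set
    # One file of mapped reads
    samtools mpileup -uf indexed_genome My_mapped_reads.bam | bcftools view -bvcg - >| My_mapped_reads.raw.bcf
    # Multiple files of mapped reads
    samtools mpileup -uf indexed_genome *.bam | bcftools view -bvcg - >| Multiple_samples.raw.bcf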
    

Convert from BCF (binary version of VCF) to VCF:

bcftools view My_mapped_reads.raw.bcf > My_mapped_reads.raw.vcf
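To confirm that the conversion produced a readable text VCF, a short shell sketch like the one below can count the data records and list the sample names. It relies only on the VCF layout: meta-information lines start with "##", the column header line starts with "#CHROM", and sample columns begin at column 10. The function names are hypothetical.

```shell
# Hypothetical helpers for a quick look at an uncompressed VCF file.

# Count data records (every line that does not start with '#').
vcf_count_records() {
    grep -vc '^#' "$1"
}

# Print the sample names, one per line, taken from the tab-separated
# #CHROM header line (columns 10 onward).
vcf_list_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Example:
#   vcf_count_records My_mapped_reads.raw.vcf
#   vcf_list_samples  My_mapped_reads.raw.vcf
```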

Convert from VCF to BCF (-b: BCF output; -S: input is VCF; -D: sequence dictionary, here chr_list.txt):

bcftools view -bS -D chr_list.txt My_mapped_reads.raw.vcf > My_mapped_reads.raw.bcf
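If no chr_list.txt is at hand, one way to build it is to collect the distinct values of the CHROM column in order of first appearance. This sketch rests on an assumption: that the dictionary passed to -D only needs one sequence name per line (check the usage message of your bcftools version). The helper name is hypothetical. Taking the names from the reference instead (the first column of its .fai index holds the sequence names) is safer when the VCF does not cover every chromosome.

```shell
# Hypothetical helper: print the distinct chromosome names found in a VCF,
# one per line, in the order they first appear.
# Assumption: bcftools view -D accepts a plain one-name-per-line list.
vcf_chrom_list() {
    awk -F '\t' '!/^#/ && !seen[$1]++ { print $1 }' "$1"
}

# Example:
#   vcf_chrom_list My_mapped_reads.raw.vcf > chr_list.txt
# Alternative, from the reference FASTA index:
#   cut -f1 indexed_genome.fai > chr_list.txt
```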

Merge multiple VCF files. This works on raw VCF files but apparently not on files processed by vcf-annotate. First compress and index each file:

# For each VCF file:
bgzip Variants_sample_A.raw.vcf
tabix -p vcf Variants_sample_A.raw.vcf.gz
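The two commands above can be run over every raw VCF in the current directory with a loop. This sketch assumes the files follow the *.raw.vcf naming used on this page and that bgzip and tabix are on the PATH; the function name is hypothetical. Note that bgzip replaces each .vcf file with its .vcf.gz version.

```shell
# Hypothetical helper: bgzip-compress and tabix-index every *.raw.vcf
# in the current directory. bgzip replaces X.raw.vcf with X.raw.vcf.gz;
# tabix then writes the index X.raw.vcf.gz.tbi next to it.
bgzip_and_index_all() {
    for f in *.raw.vcf; do
        [ -e "$f" ] || continue    # no matching files: the glob stays literal
        bgzip "$f" || return 1
        tabix -p vcf "$f.gz" || return 1
    done
}

# Example:
#   bgzip_and_index_all
```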

Then merge the bgzipped, tabix-indexed files:

vcf-merge *.raw.vcf.gz >| Variants_all_samples.raw.vcf

Annotate a VCF file (applying all filters with default values):

vcf-annotate -f + < Variants_all_samples.raw.vcf > Variants_all_samples.withTags.vcf
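With -f, vcf-annotate records filter results in the FILTER column (column 7): sites passing all filters are marked PASS, and others carry the names of the filters they failed. Assuming that convention, a sketch like the following keeps the header plus only the passing sites. The function and output file names are hypothetical.

```shell
# Hypothetical helper: copy all header lines, then only the records whose
# FILTER column (7th tab-separated field) is exactly PASS.
vcf_keep_pass() {
    awk -F '\t' '/^#/ || $7 == "PASS"' "$1"
}

# Example:
#   vcf_keep_pass Variants_all_samples.withTags.vcf > Variants_all_samples.PASS.vcf
```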

Sort by chromosome, then by position:

vcf-sort Variants.vcf > Variants.sorted.vcf
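The idea behind sorting a VCF can be sketched with standard tools: print the header lines first, then sort the records by CHROM as text and by POS numerically. This is not vcf-sort's actual implementation, and the function name is hypothetical. With plain byte ordering, chr10 sorts before chr2, so the chromosome order may not match the reference's order.

```shell
# Hypothetical helper: print header lines unchanged, then data records
# sorted by CHROM (byte order) and POS (numeric), splitting on tabs.
vcf_sort_sketch() {
    grep '^#' "$1"
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Example:
#   vcf_sort_sketch Variants.vcf > Variants.sorted.vcf
```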

Validate a VCF file (for example, before using it with GATK):

java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T ValidateVariants -R /path/to/indexed/genome --variant:VCF SNPs.vcf
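Before running GATK, a rough structural check can catch some obvious problems. The hypothetical helper below only checks that the first line declares ##fileformat=VCF, that a #CHROM header line exists, and that every record has as many tab-separated columns as that header. It is far less thorough than ValidateVariants, which also checks records against the reference.

```shell
# Hypothetical pre-check (much weaker than GATK ValidateVariants):
# prints a message and returns non-zero on the first problem found.
vcf_quick_check() {
    awk -F '\t' '
        NR == 1 && $0 !~ /^##fileformat=VCF/ { print "line 1: missing ##fileformat"; bad = 1; exit }
        /^##/ { next }                     # other meta-information lines
        /^#CHROM/ { ncol = NF; next }      # remember the header column count
        {
            if (!ncol) { print "line " NR ": record before #CHROM header"; bad = 1; exit }
            if (NF != ncol) { print "line " NR ": " NF " columns, expected " ncol; bad = 1; exit }
        }
        END {
            if (!bad && !ncol) { print "no #CHROM header line"; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Example:
#   vcf_quick_check SNPs.vcf && echo "basic structure OK"
```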