Variant Call Format
The Variant Call Format (VCF) is a specification for storing gene sequence variations. The format was developed with the advent of large-scale genotyping and gene sequencing projects, such as the 1000 Genomes Project. Existing formats for genetic data, such as the General Feature Format (GFF), stored all of the genetic data, much of which is redundant because it is shared across genomes. With VCF, only the variations need to be stored, along with a reference genome.
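The idea of storing only the differences from a reference can be illustrated with a minimal Python sketch. The function name `apply_variants`, the toy reference string, and the two variants are assumptions made up for illustration; they are not part of the VCF specification or of any VCF library.

```python
def apply_variants(reference, variants):
    """Rebuild a sample sequence from a reference and (pos, ref, alt) tuples.

    Positions are 1-based, as in VCF. Variants are assumed not to overlap.
    """
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged reference bases
        pieces.append(alt)                      # the variant allele
        cursor = start + len(ref)
    pieces.append(reference[cursor:])           # remainder of the reference
    return "".join(pieces)


# Hypothetical toy data: one substitution and one deletion.
reference = "ACGTGTCTAA"
variants = [(2, "C", "T"), (5, "GTCT", "G")]
sample = apply_variants(reference, variants)
print(sample)
```

Only the two variant tuples need to be stored per sample; the shared reference is stored once.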
The standard is currently in version 4.0, although the 1000 Genomes Project has developed its own specification for structural variations such as duplications, which are not easily accommodated in the existing schema. A set of tools is also available for editing and manipulating the files.
Example
##fileformat=VCFv4.0
##fileDate=20110705
##reference=1000GenomesPilot-NCBI37
##phasing=partial
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##FILTER=
##FILTER=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3
2 4370 rs6057 G A 29 . NS=2;DP=13;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:52,51 1|0:48:8:51,51 1/1:43:5:.,.
2 7330 . T A 3 q10 NS=5;DP=12;AF=0.017 GT:GQ:DP:HQ 0|0:46:3:58,50 0|1:3:5:65,3 0/0:41:3
2 110696 rs6055 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
2 130237 . T . 47 . NS=2;DP=16;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:56,51 0/0:61:2
2 134567 microsat1 GTCT G,GTACT 50 PASS NS=2;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
(...)
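The layout of a data line in the example above can be made concrete with a short Python sketch. It is a simplified illustration, not a full VCF parser: the function names `parse_record` and `genotype_alleles` are hypothetical, the line is split on any whitespace (real VCF files are tab-delimited), and all values are kept as strings apart from POS and QUAL.

```python
def parse_record(line):
    """Split one VCF data line into its eight fixed columns and the samples."""
    fields = line.split()  # assumption: no field contains whitespace
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]

    # INFO is a semicolon-separated list of key=value pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True

    # FORMAT names the colon-separated sub-fields of every sample column.
    # A sample may omit trailing sub-fields; zip() simply stops early.
    keys = fmt.split(":")
    samples = [dict(zip(keys, s.split(":"))) for s in fields[9:]]

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
        "samples": samples,
    }


def genotype_alleles(record, gt):
    """Translate a GT string such as '1|2' into allele strings.

    Index 0 is the REF allele, 1 and up index into ALT.
    '|' marks a phased genotype, '/' an unphased one.
    """
    alleles = [record["ref"]] + record["alt"]
    phased = "|" in gt
    indices = gt.replace("|", "/").split("/")
    return [None if i == "." else alleles[int(i)] for i in indices], phased


line = ("2 110696 rs6055 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB "
        "GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4")
rec = parse_record(line)
first = genotype_alleles(rec, rec["samples"][0]["GT"])
third = genotype_alleles(rec, rec["samples"][2]["GT"])
print(rec["alt"], first, third)
```

For the third record of the example, the first sample's genotype `1|2` resolves to the phased alleles G and T, while the third sample's `2/2` resolves to an unphased T/T.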