Joint Variant Calling
Avocado’s Jointer command supports joint variant calling from gVCF-styled data.
The Jointer command can also be used to export Apache Parquet Genotype data to VCF, and to jointly genotype a collection of samples that were all scored against the same set of variants.
Our joint variant calling approach is described in Chapter 7 of this thesis.
To run the Jointer command, you must provide two parameters:
- The path to all input files for joint genotyping (to load multiple files, use Hadoop’s glob syntax).
- The path to save the output to, as a VCF file.
To save the VCF file as a single file (instead of sharded output), pass the -single flag.
If run on a single sample, Jointer will calculate variant statistics (VCF INFO column attributes) and qualities only. If run on multiple samples, the Jointer command will update the called genotypes using a binomial prior that is informed by the observed allele frequency of the variant across all samples with confident calls. If the input data for the multiple samples is in gVCF format, pass the -from_gvcf flag.
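To make the multi-sample update concrete, the following is a minimal sketch of how a binomial prior derived from the observed allele frequency can shift a genotype call. It is an illustration under stated assumptions, not Avocado’s implementation: the function names, the diploid default, and the use of None to mark a sample without a confident call are all hypothetical.

```python
from math import comb


def allele_frequency(calls, ploidy=2):
    """Alt allele frequency across samples with confident calls.

    `calls` holds the number of alt alleles per sample; None marks a
    sample without a confident call (an assumption of this sketch).
    """
    confident = [c for c in calls if c is not None]
    if not confident:
        return 0.0
    return sum(confident) / (ploidy * len(confident))


def binomial_prior(alt_freq, ploidy=2):
    """Prior probability of carrying k alt alleles, for k = 0..ploidy."""
    return [
        comb(ploidy, k) * alt_freq ** k * (1 - alt_freq) ** (ploidy - k)
        for k in range(ploidy + 1)
    ]


def update_genotype(likelihoods, prior):
    """Combine per-genotype likelihoods with the prior.

    Returns the most probable alt allele count and the normalized posterior.
    """
    weighted = [l * p for l, p in zip(likelihoods, prior)]
    total = sum(weighted)
    if total == 0:
        # Degenerate case: fall back to the unweighted likelihoods.
        weighted, total = list(likelihoods), sum(likelihoods)
    posterior = [w / total for w in weighted]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior


# Four confident samples carry one alt allele out of eight in total.
freq = allele_frequency([0, 0, 0, None, 1])
prior = binomial_prior(freq)

# A sample whose likelihoods weakly favor a heterozygous call.
best, posterior = update_genotype([0.4, 0.5, 0.1], prior)
print(freq, prior, best)
```

In this example the likelihoods alone favor one alt allele, but because the variant is rare among the confident calls, the prior moves the call to homozygous reference. How Avocado weights and thresholds this update in practice is described in the thesis chapter referenced above.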