VCF annotation

wrap_vcf_vep_annotate_unified

Short description

Provides a single entry point to run VEP on a VCF file with any combination of protein, gene, and variant-class annotations in one call.

Signature

def wrap_vcf_vep_annotate_unified(
    vcf_file: Union[str, Path],
    cache_dir: Union[str, Path],
    fasta: Union[str, Path],
    output_file: Optional[Union[str, Path]] = None,
    synonyms_file: Optional[Union[str, Path]] = None,
    assembly: Optional[str] = None,
    version: Optional[str] = None,
    no_stats: bool = True,
    annotate_protein: bool = False,
    annotate_gene: bool = False,
    annotate_variant_class: bool = False,
    distance: Optional[int] = None
) -> Tuple[bool, str]:

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| vcf_file | str \| Path | Yes | Path to the input VCF file. |
| cache_dir | str \| Path | Yes | Directory containing the local VEP cache. |
| fasta | str \| Path | Yes | Reference FASTA used by VEP. |
| output_file | str \| Path | No | Destination for the annotated VCF. If omitted, a directory named vep_annotation_<timestamp> is created beside the VCF. |
| synonyms_file | str \| Path | No | Chromosome-synonyms file. When None, it is inferred from the cache directory. |
| assembly | str | No | Genome assembly. If omitted, parsed from cache_dir. |
| version | str | No | VEP cache version. If omitted, parsed from cache_dir. |
| no_stats | bool | No | Run VEP with --no_stats. Default True. |
| annotate_protein | bool | No | Add protein-level data (--protein --uniprot --domains --symbol). Default False. |
| annotate_gene | bool | No | Add gene symbols (and optionally nearest gene) via --symbol. Default False. |
| annotate_variant_class | bool | No | Add variant class information (--variant_class). Default False. |
| distance | int | No | Distance (bp) for nearest-gene lookup when annotate_gene=True. Ignored otherwise. |
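
As a rough illustration of how the boolean switches in the table could translate into VEP command-line options, here is a minimal sketch. `sketch_vep_flags` is a hypothetical helper, not part of the documented module, and the wrapper's real implementation may differ. It covers only the options named in the table and leaves out `distance`, because the table does not say which option that parameter maps to.

```python
from typing import List


def sketch_vep_flags(
    no_stats: bool = True,
    annotate_protein: bool = False,
    annotate_gene: bool = False,
    annotate_variant_class: bool = False,
) -> List[str]:
    """Hypothetical helper: collect the VEP options listed in the parameter table."""
    flags: List[str] = []
    if no_stats:
        flags.append("--no_stats")
    if annotate_protein:
        # Protein-level annotation already includes --symbol.
        flags += ["--protein", "--uniprot", "--domains", "--symbol"]
    if annotate_gene and "--symbol" not in flags:
        # Avoid passing --symbol twice when both switches are enabled.
        flags.append("--symbol")
    if annotate_variant_class:
        flags.append("--variant_class")
    return flags


print(sketch_vep_flags(annotate_protein=True, annotate_variant_class=True))
# ['--no_stats', '--protein', '--uniprot', '--domains', '--symbol', '--variant_class']
```

Because `annotate_protein` and `annotate_gene` both rely on `--symbol`, the sketch deduplicates it; whether the actual wrapper does the same is an assumption.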

Return value

Tuple[bool, str] where

  • bool – True if VEP completed without error, else False.
  • str – Path to the annotated VCF (even on certain non-fatal errors); never None.

Exceptions

  • ValueError – if none of annotate_protein, annotate_gene, or annotate_variant_class is True.
  • FileNotFoundError – if vcf_file, cache_dir, or fasta does not exist.
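
The two documented exceptions can be anticipated with a caller-side check before any VEP run is started. The sketch below mirrors the conditions listed above; `preflight_check` is a hypothetical name, and the order in which the real function performs these checks is an assumption.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def preflight_check(
    vcf_file: PathLike,
    cache_dir: PathLike,
    fasta: PathLike,
    annotate_protein: bool = False,
    annotate_gene: bool = False,
    annotate_variant_class: bool = False,
) -> None:
    """Hypothetical helper: raise the same exception types the wrapper documents."""
    if not (annotate_protein or annotate_gene or annotate_variant_class):
        raise ValueError(
            "Enable at least one of annotate_protein, annotate_gene, "
            "or annotate_variant_class."
        )
    for label, path in (("vcf_file", vcf_file), ("cache_dir", cache_dir), ("fasta", fasta)):
        if not Path(path).exists():
            raise FileNotFoundError(f"{label} does not exist: {path}")


try:
    # No annotation switch is enabled here, so ValueError is raised first.
    preflight_check("cohort.vcf.gz", "/data/vep_cache", "/data/genome/GRCh38.fa")
except (ValueError, FileNotFoundError) as exc:
    print(f"Cannot run VEP: {exc}")
```

The same `try`/`except` pattern applies when calling `wrap_vcf_vep_annotate_unified` directly; the boolean in its return value still needs to be checked, since a run can fail without raising.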

Minimal usage example

```python
from pathlib import Path
from vep_annotate import wrap_vcf_vep_annotate_unified

success, info = wrap_vcf_vep_annotate_unified(
    vcf_file="cohort.vcf.gz",
    cache_dir=Path("/data/vep_cache"),
    fasta=Path("/data/genome/GRCh38.fa"),
    annotate_protein=True,
    annotate_variant_class=True,
)
print(success, info)
```