VCF annotation
wrap_vcf_vep_annotate_unified
Short description
Provides a single entry point for running VEP on a VCF file. One call can apply any combination of protein, gene, and variant-class annotations, but at least one must be enabled.
Signature
def wrap_vcf_vep_annotate_unified(
    vcf_file: Union[str, Path],
    cache_dir: Union[str, Path],
    fasta: Union[str, Path],
    output_file: Optional[Union[str, Path]] = None,
    synonyms_file: Optional[Union[str, Path]] = None,
    assembly: Optional[str] = None,
    version: Optional[str] = None,
    no_stats: bool = True,
    annotate_protein: bool = False,
    annotate_gene: bool = False,
    annotate_variant_class: bool = False,
    distance: Optional[int] = None
) -> Tuple[bool, str]:
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
`vcf_file` | `str` or `Path` | Yes | Path to the input VCF file. |
`cache_dir` | `str` or `Path` | Yes | Directory containing the local VEP cache. |
`fasta` | `str` or `Path` | Yes | Reference FASTA used by VEP. |
`output_file` | `str` or `Path` | No | Destination for the annotated VCF. If omitted, a directory named `vep_annotation_<timestamp>` is created beside the VCF. |
`synonyms_file` | `str` or `Path` | No | Chromosome-synonyms file. When `None`, it is inferred from the cache directory. |
`assembly` | `str` | No | Genome assembly. If omitted, parsed from `cache_dir`. |
`version` | `str` | No | VEP cache version. If omitted, parsed from `cache_dir`. |
`no_stats` | `bool` | No | Run VEP with `--no_stats`. Default `True`. |
`annotate_protein` | `bool` | No | Add protein-level data (`--protein --uniprot --domains --symbol`). Default `False`. |
`annotate_gene` | `bool` | No | Add gene symbols (and optionally nearest gene) via `--symbol`. Default `False`. |
`annotate_variant_class` | `bool` | No | Add variant class information (`--variant_class`). Default `False`. |
`distance` | `int` | No | Distance (bp) for nearest-gene lookup when `annotate_gene=True`. Ignored otherwise. |
Return value
`Tuple[bool, str]`, where:
- `bool` – `True` if VEP completed without error, else `False`.
- `str` – path to the annotated VCF (returned even on certain non-fatal errors); never `None`.
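
Because a path is returned even when the status flag is `False`, callers may want to check both the flag and the file before using the output. A minimal sketch, assuming the tuple shape described above; `usable_output` is a hypothetical helper, not part of the documented API.

```python
from pathlib import Path
from typing import Optional, Tuple


def usable_output(result: Tuple[bool, str]) -> Optional[Path]:
    """Hypothetical helper: return the output path only if the run
    succeeded and the annotated file actually exists on disk."""
    success, output_path = result
    path = Path(output_path)
    if success and path.is_file():
        return path
    return None
```

A caller could pass the return value of `wrap_vcf_vep_annotate_unified(...)` straight into this helper and treat `None` as "do not continue with downstream steps".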
Exceptions
- `ValueError` – if none of `annotate_protein`, `annotate_gene`, or `annotate_variant_class` is `True`.
- `FileNotFoundError` – if `vcf_file`, `cache_dir`, or `fasta` does not exist.
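
A pre-flight check can report every missing input at once instead of stopping at the first `FileNotFoundError`. The sketch below is one way to do that; `missing_inputs` is a hypothetical helper, and the parameter names simply mirror the three required arguments.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def missing_inputs(vcf_file: PathLike, cache_dir: PathLike, fasta: PathLike) -> List[str]:
    """Hypothetical pre-flight check: names of required inputs that do not exist."""
    required = {"vcf_file": vcf_file, "cache_dir": cache_dir, "fasta": fasta}
    return [name for name, path in required.items() if not Path(path).exists()]
```

An empty list means all three paths exist, so the wrapper should not raise `FileNotFoundError` for them.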
Minimal usage example
```python
from pathlib import Path
from vep_annotate import wrap_vcf_vep_annotate_unified

# Request protein-level and variant-class annotations in one call.
success, info = wrap_vcf_vep_annotate_unified(
    vcf_file="cohort.vcf.gz",
    cache_dir=Path("/data/vep_cache"),
    fasta=Path("/data/genome/GRCh38.fa"),
    annotate_protein=True,
    annotate_variant_class=True,
)
# success is the status flag; info is the path to the annotated VCF.
print(success, info)
```