MAF annotation

wrap_maf_vep_annotate_protein

Short description

Annotates a MAF file with protein-level VEP information and automatically merges the VEP output back into the original MAF, producing an annotated file ready for downstream analysis.

Signature

def wrap_maf_vep_annotate_protein(
    maf_file: Union[str, Path],
    cache_dir: Union[str, Path],
    fasta: Union[str, Path],
    output_file: Optional[Union[str, Path]] = None,
    synonyms_file: Optional[Union[str, Path]] = None,
    assembly: Optional[str] = None,
    version: Optional[str] = None,
    compress: bool = True,
    no_stats: bool = True
) -> Tuple[bool, str]:

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `maf_file` | `str \| Path` | Yes | Path to the input MAF file (`.maf` or `.maf.gz`). |
| `cache_dir` | `str \| Path` | Yes | Directory containing the local VEP cache. |
| `fasta` | `str \| Path` | Yes | Reference FASTA used by VEP. |
| `output_file` | `str \| Path` | No | Destination for the raw VEP output. If omitted, a directory named `vep_annotation_<timestamp>` is created beside the MAF. |
| `synonyms_file` | `str \| Path` | No | Chromosome synonyms file. When `None`, it is inferred from the cache directory. |
| `assembly` | `str` | No | Genome assembly (e.g. `"GRCh38"`). If omitted, parsed from `cache_dir`. |
| `version` | `str` | No | VEP cache version (e.g. `"113"`). If omitted, parsed from `cache_dir`. |
| `compress` | `bool` | No | Gzip-compress the merged MAF (default `True`). |
| `no_stats` | `bool` | No | Run VEP with `--no_stats` to skip statistics (default `True`). |
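
This page does not document how assembly and version are parsed from cache_dir. As a rough illustration only, the sketch below assumes the final path component follows the `<version>_<assembly>` pattern that VEP caches typically use (for example `113_GRCh38`). The helper name `parse_cache_dir` and the regular expression are hypothetical and are not part of this module.

```python
import re
from pathlib import Path
from typing import Optional, Tuple, Union

# Assumption: the cache directory name looks like "<version>_<assembly>",
# e.g. "113_GRCh38". The real function may use a different rule.
_CACHE_NAME = re.compile(r"^(?P<version>\d+)_(?P<assembly>[A-Za-z0-9.]+)$")


def parse_cache_dir(cache_dir: Union[str, Path]) -> Optional[Tuple[str, str]]:
    """Return (assembly, version) if the directory name matches, else None."""
    match = _CACHE_NAME.match(Path(cache_dir).name)
    if match is None:
        return None
    return match.group("assembly"), match.group("version")
```

When the directory name carries no such information, passing assembly and version explicitly avoids relying on inference.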

Return value

Tuple[bool, str] where

  • bool – True if the VEP run succeeded (the subsequent merge may still have failed); False on a fatal error.
  • str – Path(s) to the VEP output and (when merge succeeds) the final annotated MAF. Never None.
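
Because True only signals that the VEP run itself succeeded, callers may want to treat the string as something to inspect rather than as proof that a merged MAF exists. The helper below is a minimal sketch of that pattern. `summarize_result` is a hypothetical name, and the exact format of the returned string is not specified on this page, so it is passed through unchanged.

```python
from typing import Tuple


def summarize_result(result: Tuple[bool, str]) -> str:
    """Turn the (success, info) tuple into a one-line status message."""
    success, info = result
    if not success:
        # Fatal error: info is expected to describe what went wrong.
        return f"VEP annotation failed: {info}"
    # VEP ran, but the merge step may still have failed; surface info for review.
    return f"VEP run succeeded; check that the merged MAF is listed: {info}"
```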

Exceptions

  • ValueError – if cache_dir name does not encode assembly/version and they are not provided.
  • FileNotFoundError – if maf_file, cache_dir, or fasta does not exist.
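
One way to handle both documented exceptions at the call site is a thin wrapper that converts them into the same (bool, str) shape as a normal return. This is a sketch under assumptions: `run_annotation` is a hypothetical helper, and the annotate callable is passed in as an argument so the snippet does not depend on the real module being installed.

```python
from typing import Callable, Tuple


def run_annotation(
    annotate: Callable[..., Tuple[bool, str]], **kwargs
) -> Tuple[bool, str]:
    """Call annotate(**kwargs) and map documented exceptions to (False, message)."""
    try:
        return annotate(**kwargs)
    except FileNotFoundError as exc:
        # maf_file, cache_dir, or fasta does not exist.
        return False, f"missing input: {exc}"
    except ValueError as exc:
        # assembly/version were not given and could not be parsed from cache_dir.
        return False, f"cannot infer assembly/version: {exc}"
```

In practice the first argument would be wrap_maf_vep_annotate_protein, with maf_file, cache_dir, and fasta supplied as keyword arguments.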

Minimal usage example

```python
from pathlib import Path
from vep_annotate import wrap_maf_vep_annotate_protein

# Only the three required arguments are given; everything else uses defaults.
success, info = wrap_maf_vep_annotate_protein(
    maf_file="tumor_samples.maf.gz",
    cache_dir=Path("/data/vep_cache"),
    fasta=Path("/data/genome/GRCh38.fa"),
)
print(success, info)
```