MAF annotation
wrap_maf_vep_annotate_protein
Short description
Annotates a MAF file with protein-level VEP information and automatically merges the VEP output back into the original MAF, producing an annotated file ready for downstream analysis.
Signature
def wrap_maf_vep_annotate_protein(
    maf_file: Union[str, Path],
    cache_dir: Union[str, Path],
    fasta: Union[str, Path],
    output_file: Optional[Union[str, Path]] = None,
    synonyms_file: Optional[Union[str, Path]] = None,
    assembly: Optional[str] = None,
    version: Optional[str] = None,
    compress: bool = True,
    no_stats: bool = True
) -> Tuple[bool, str]:
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
`maf_file` | `str \| Path` | Yes | Path to the input MAF file (`.maf` or `.maf.gz`). |
`cache_dir` | `str \| Path` | Yes | Directory containing the local VEP cache. |
`fasta` | `str \| Path` | Yes | Reference FASTA used by VEP. |
`output_file` | `str \| Path` | No | Destination for the raw VEP output. If omitted, a directory named `vep_annotation_<timestamp>` is created beside the MAF. |
`synonyms_file` | `str \| Path` | No | Chromosome synonyms file. When `None`, it is inferred from the cache directory. |
`assembly` | `str` | No | Genome assembly (e.g. `"GRCh38"`). If omitted, parsed from `cache_dir`. |
`version` | `str` | No | VEP cache version (e.g. `"113"`). If omitted, parsed from `cache_dir`. |
`compress` | `bool` | No | Gzip-compress the merged MAF (`True`, default). |
`no_stats` | `bool` | No | Run VEP with `--no_stats` to skip statistics (`True`, default). |
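The three required paths can be checked before the call so that problems surface early. The following is a minimal sketch, not part of the library: `preflight_problems` is a hypothetical helper, and the extension check assumes the `.maf` / `.maf.gz` naming listed in the table above.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def preflight_problems(maf_file: PathLike, cache_dir: PathLike, fasta: PathLike) -> List[str]:
    """Return human-readable problems with the required inputs (hypothetical helper)."""
    maf, cache, ref = Path(maf_file), Path(cache_dir), Path(fasta)
    problems = []
    # The parameter table lists .maf and .maf.gz as the accepted MAF forms.
    if not maf.name.endswith((".maf", ".maf.gz")):
        problems.append(f"unexpected MAF extension: {maf.name}")
    if not maf.is_file():
        problems.append(f"MAF file not found: {maf}")
    if not cache.is_dir():
        problems.append(f"cache directory not found: {cache}")
    if not ref.is_file():
        problems.append(f"FASTA not found: {ref}")
    return problems
```

An empty list means the inputs look usable; anything else can be printed or logged before deciding whether to call `wrap_maf_vep_annotate_protein` at all.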
Return value
`Tuple[bool, str]`, where

- `bool` – `True` on a successful VEP run (the merge may still fail) and `False` on a fatal error.
- `str` – Path(s) to the VEP output and (when the merge succeeds) the final annotated MAF. Never `None`.
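Because `True` only covers the VEP run itself, callers may want to keep the returned string visible rather than treat success as final. A minimal sketch of that idea follows; `summarize_result` is a hypothetical helper, and since the exact layout of the returned string is not specified here, it is passed through untouched.

```python
from typing import Tuple


def summarize_result(result: Tuple[bool, str]) -> str:
    """Turn the documented (success, info) tuple into a one-line status (hypothetical helper)."""
    success, info = result
    if not success:
        return f"VEP annotation failed: {info}"
    # success only covers the VEP run; the merge may still have failed,
    # so the message is passed through for the caller to inspect.
    return f"VEP run finished: {info}"
```

For example, `print(summarize_result(wrap_maf_vep_annotate_protein(...)))` would log one line per run.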
Exceptions
- `ValueError` – if the `cache_dir` name does not encode `assembly`/`version` and they are not provided.
- `FileNotFoundError` – if `maf_file`, `cache_dir`, or `fasta` does not exist.
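The two documented exceptions can be folded into the same `(bool, str)` shape as the return value, which keeps batch scripts simple. This is only a sketch under that assumption: `run_annotation_safely` is a hypothetical wrapper, and the annotation function is passed in as a callable so the snippet does not depend on any particular import path.

```python
from typing import Callable, Tuple


def run_annotation_safely(
    annotate: Callable[..., Tuple[bool, str]], **kwargs
) -> Tuple[bool, str]:
    """Call annotate(**kwargs) and map the documented exceptions to (False, message)."""
    try:
        return annotate(**kwargs)
    except FileNotFoundError as exc:
        # Raised when maf_file, cache_dir, or fasta does not exist.
        return False, f"missing input: {exc}"
    except ValueError as exc:
        # Raised when assembly/version cannot be parsed from cache_dir;
        # passing assembly= and version= explicitly avoids this case.
        return False, f"could not determine assembly/version: {exc}"
```

Usage would look like `run_annotation_safely(wrap_maf_vep_annotate_protein, maf_file=..., cache_dir=..., fasta=...)`. Any other exception still propagates, which is deliberate: only the cases listed above are handled.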
Minimal usage example
```python
from pathlib import Path

from vep_annotate import wrap_maf_vep_annotate_protein

# assembly and version are not passed here, so they are parsed from cache_dir;
# pass them explicitly if the directory name does not encode them.
success, info = wrap_maf_vep_annotate_protein(
    maf_file="tumor_samples.maf.gz",
    cache_dir=Path("/data/vep_cache"),
    fasta=Path("/data/genome/GRCh38.fa"),
)
print(success, info)
```