Skip to content

Read MAF Files

read_maf

Short description

Reads a MAF file and returns a PyMutation object with automatic caching and performance optimizations.

Signature

def read_maf(path: str | Path,
             assembly: str,
             cache_dir: Optional[str | Path] = None,
             consolidate_variants: bool = True
) -> PyMutation:

Parameters

Parameter Type Required Description
path str \| Path Yes Path to the MAF file (.maf or .maf.gz).
assembly str Yes Version of the genome assembly. Must be either "37" or "38".
cache_dir str \| Path No Directory for caching processed files. If None, uses a .pymut_cache directory next to the input file.
consolidate_variants bool No If True (default), consolidates identical variants that appear in multiple samples into a single row. If False, maintains the original behavior where each MAF row becomes a separate DataFrame row.

Return value

Returns a PyMutation object containing the mutations read from the MAF file, converted to wide-format. Includes metadata with information about the source file, comments, and configuration.

Exceptions

  • FileNotFoundError: if the specified MAF file path does not exist.
  • ValueError: if the MAF file is missing required columns or has an invalid format, or if the assembly parameter is not "37" or "38".
  • ImportError: if the 'pyarrow' library cannot be imported and the 'c' engine alternative also fails.
  • Exception: for any other errors encountered while reading or processing the file.

Minimal usage example

>>> from pyMut import read_maf
>>> mutations = read_maf("data/mutations.maf", assembly="38")
>>> print(mutations.data.shape)
(1000, 25)