Extract Signatures

extractSignatures

Short description

Extracts mutational signatures from a 96 × samples trinucleotide-context count matrix using non-negative matrix factorization (NMF) with a Kullback-Leibler (KL) divergence loss and multiplicative updates. Returns the signature profiles and the per-sample contributions.
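To make the description above concrete, here is a minimal NumPy sketch of NMF with KL-divergence multiplicative updates (the classic Lee and Seung rules). It is not the code inside extractSignatures; the names `nmf_kl_mu` and `kl_divergence`, the random initialization, the iteration count and the synthetic counts are all assumptions made for illustration.

```python
import numpy as np

def nmf_kl_mu(V, n, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical helper: factorize V (features x samples) as W @ H
    using multiplicative updates for the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, n)) + eps   # signature-like factor
    H = rng.random((n, n_samples)) + eps    # contribution-like factor
    for _ in range(n_iter):
        WH = W @ H + eps
        # W_ik <- W_ik * sum_j(H_kj * V_ij / WH_ij) / sum_j(H_kj)
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        # H_kj <- H_kj * sum_i(W_ik * V_ij / WH_ij) / sum_i(W_ik)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH); zero entries of V add only WH."""
    V = np.asarray(V, dtype=float)
    log_term = np.where(V > 0, V * np.log((V + eps) / (WH + eps)), 0.0)
    return float(np.sum(log_term - V + WH))

# Synthetic 96 x 20 count matrix (made-up data, only to exercise the code).
rng = np.random.default_rng(1)
V = rng.poisson(5.0, size=(96, 20)).astype(float)
W, H = nmf_kl_mu(V, n=3)
print(W.shape, H.shape)
```

Because the returned nmfObj is an sklearn.decomposition.NMF, a comparable configuration in scikit-learn would presumably be `solver="mu"` with `beta_loss="kullback-leibler"`; the exact options the function passes are not stated here.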

Signature

def extractSignatures(
    contexts_df: pd.DataFrame,
    n: int,
    parallel: int = 4,
    pConstant: Optional[float] = None,
) -> Dict:

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| contexts_df | pd.DataFrame | Yes | 96 × N matrix of raw counts produced by trinucleotideMatrix. |
| n | int | Yes | Number of signatures to extract. |
| parallel | int | No | Number of cores to use. Currently not used (kept for compatibility). |
| pConstant | float or None | No | Small positive constant added to avoid numerical issues. Must be > 0 if provided. If None and there are samples with total mutations = 0, 0.01 is applied automatically. |

Constraints: n ≥ 1 and contexts_df must have exactly 96 rows. If provided, pConstant must be > 0.
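The constraints and the pConstant default can be sketched as a small validation helper. This is an illustration of the documented rules, not the function's actual source: `check_inputs` is a hypothetical name, and returning 0.0 when no constant is needed is an assumption.

```python
from typing import Optional

import pandas as pd

def check_inputs(contexts_df: pd.DataFrame, n: int,
                 pConstant: Optional[float] = None) -> float:
    """Hypothetical helper: validate the arguments and return the constant
    that would be added to the counts (0.0 when none is needed)."""
    if contexts_df.shape[0] != 96:
        raise ValueError(
            f"contexts_df must have exactly 96 rows, got {contexts_df.shape[0]}"
        )
    if n < 1:
        raise ValueError("n must be >= 1")
    if pConstant is not None:
        if pConstant <= 0:
            raise ValueError("pConstant must be > 0 when provided")
        return float(pConstant)
    # No constant given: fall back to 0.01 only if a sample has zero mutations.
    has_empty_sample = bool((contexts_df.sum(axis=0) == 0).any())
    return 0.01 if has_empty_sample else 0.0

# Toy input: two samples, the second with no mutations at all.
ctx = pd.DataFrame(1, index=range(96), columns=["s1", "s2"])
ctx["s2"] = 0
print(check_inputs(ctx, n=2))  # 0.01
```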

Return value

dict with keys:

| Key | Type | Meaning |
| --- | --- | --- |
| signatures | pd.DataFrame | Scaled signature matrix 96 × n (each signature sums to 1). |
| contributions | pd.DataFrame | Normalized contribution matrix n × samples (each sample sums to 1). |
| contributions_abs | pd.DataFrame | Absolute contribution matrix n × samples (not normalized). |
| nmfObj | sklearn.decomposition.NMF | Fitted NMF model object. |
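One way the three matrices could relate to raw NMF factors is sketched below. It assumes that the scale removed from each signature is moved into the absolute contributions, so that their product is unchanged; the source only states which matrices are normalized, so this bookkeeping and the name `scale_factors` are assumptions.

```python
import numpy as np

def scale_factors(W, H):
    """Hypothetical helper: rescale raw factors W (96 x n) and H (n x samples).
    Assumes every signature and every sample has a nonzero total."""
    col_sums = W.sum(axis=0)                        # one total per signature
    signatures = W / col_sums                       # each column sums to 1
    contributions_abs = H * col_sums[:, None]       # keeps W @ H unchanged
    sample_totals = contributions_abs.sum(axis=0)   # one total per sample
    contributions = contributions_abs / sample_totals  # each column sums to 1
    return signatures, contributions, contributions_abs

# Random non-negative factors standing in for a fitted model (made-up data).
rng = np.random.default_rng(0)
W = rng.random((96, 4)) + 0.1
H = rng.random((4, 12)) + 0.1
signatures, contributions, contributions_abs = scale_factors(W, H)
print(signatures.sum(axis=0))     # all close to 1
print(contributions.sum(axis=0))  # all close to 1
```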

Exceptions

  • ImportError – scikit-learn missing.
  • ValueError – Invalid arguments (wrong shapes, n < 1, pConstant <= 0) or NMF failure.

Minimal usage example

# ctx is the 96 × samples DataFrame returned by trinucleotideMatrix
res = extractSignatures(ctx, n=4)

print("signatures:", res["signatures"].shape)            # (96, 4)
print("contributions (norm):", res["contributions"].shape)     # (4, n_samples)
print("contributions (abs):", res["contributions_abs"].shape)  # (4, n_samples)
print(type(res["nmfObj"]))
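To try the call without real data, a toy 96 × samples DataFrame can be built as sketched below. The row-label format (`A[C>A]A` style) and the helper name `make_toy_contexts` are assumptions; the labels produced by trinucleotideMatrix may differ, and the Poisson counts are synthetic.

```python
import itertools

import numpy as np
import pandas as pd

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"

def make_toy_contexts(n_samples=10, seed=0):
    """Hypothetical helper: build a 96 x n_samples matrix of synthetic counts."""
    # 6 substitution types x 4 five-prime bases x 4 three-prime bases = 96 rows.
    labels = [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in itertools.product(BASES, repeat=2)
    ]
    rng = np.random.default_rng(seed)
    counts = rng.poisson(3.0, size=(len(labels), n_samples))
    columns = [f"sample_{i}" for i in range(n_samples)]
    return pd.DataFrame(counts, index=labels, columns=columns)

ctx = make_toy_contexts()
print(ctx.shape)  # (96, 10)
```

The resulting `ctx` has the documented shape, so it could then be passed as `extractSignatures(ctx, n=4)`.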