Combining MAF Files with pyMut¶
This notebook demonstrates how to combine two MAF files using the combine_pymutations
method from pyMut.
Example: Combining TCGA LAML and PAAD-TP MAF files¶
We'll load and combine tcga_laml.maf.gz
and PAAD-TP.final_analysis_set.maf.gz
files, then save the result to the output folder.
In [1]:
Copied!
# Import necessary functions
from pyMut.input import read_maf
from pyMut.combination import combine_pymutations
# Import necessary functions
from pyMut.input import read_maf
from pyMut.combination import combine_pymutations
In [2]:
Copied!
# Load the first MAF file (TCGA LAML)
maf1_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"
pymut1 = read_maf(path=maf1_path, assembly="37")
# Load the first MAF file (TCGA LAML)
maf1_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"
pymut1 = read_maf(path=maf1_path, assembly="37")
2025-08-01 02:01:59,559 | INFO | pyMut.input | Starting MAF reading: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz 2025-08-01 02:01:59,560 | INFO | pyMut.input | Loading from cache: ../../../src/pyMut/data/examples/MAF/.pymut_cache/tcga_laml.maf_8bfbda65c4b23428.parquet 2025-08-01 02:01:59,591 | INFO | pyMut.input | Cache loaded successfully in 0.03 seconds
In [3]:
Copied!
# Load the second MAF file (PAAD-TP)
maf2_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"
pymut2 = read_maf(path=maf2_path, assembly="37")
# Load the second MAF file (PAAD-TP)
maf2_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"
pymut2 = read_maf(path=maf2_path, assembly="37")
2025-08-01 02:01:59,667 | INFO | pyMut.input | Starting MAF reading: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz 2025-08-01 02:01:59,668 | INFO | pyMut.input | Loading from cache: ../../../src/pyMut/data/examples/MAF/.pymut_cache/tcga_laml.maf_8bfbda65c4b23428.parquet 2025-08-01 02:01:59,684 | INFO | pyMut.input | Cache loaded successfully in 0.02 seconds
In [4]:
Copied!
# Combine the two PyMutation instances
combined_pymut = combine_pymutations(pymut1, pymut2)
# Combine the two PyMutation instances
combined_pymut = combine_pymutations(pymut1, pymut2)
2025-08-01 02:01:59,714 | INFO | pyMut.combination | Starting combination of PyMutation instances: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz and ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz 2025-08-01 02:01:59,715 | INFO | pyMut.combination | Assembly check passed: both instances have assembly 37 2025-08-01 02:01:59,715 | INFO | pyMut.combination | Combined samples: 193 from first instance + 193 from second instance = 193 unique samples 2025-08-01 02:01:59,763 | INFO | pyMut.combination | Created unique variant identifiers for 2091 variants in first instance and 2091 variants in second instance 2025-08-01 02:01:59,764 | INFO | pyMut.combination | Column analysis: 217 common columns, 0 unique to first instance, 0 unique to second instance 2025-08-01 02:01:59,765 | INFO | pyMut.combination | Found 2091 unique variants across both instances 2025-08-01 02:02:00,211 | INFO | pyMut.combination | Processed 217 common columns, 0 columns unique to first instance, and 0 columns unique to second instance 2025-08-01 02:02:00,215 | INFO | pyMut.combination | Reordered columns: 7 standard columns + 193 sample columns + 16 annotation columns 2025-08-01 02:02:00,215 | INFO | pyMut.combination | Created new metadata with assembly 37 2025-08-01 02:02:00,216 | INFO | pyMut.combination | Combination complete: created PyMutation instance with 2091 variants and 216 columns
In [5]:
Copied!
# Save the combined result to the output folder
output_path = "output/combined_2maf_output.maf"
combined_pymut.to_maf(output_path)
# Save the combined result to the output folder
output_path = "output/combined_2maf_output.maf"
combined_pymut.to_maf(output_path)
2025-08-01 02:02:00,228 | INFO | pyMut.output | Starting MAF export to: output/combined_2maf_output.maf 2025-08-01 02:02:00,230 | INFO | pyMut.output | Starting to process 2091 variants from 193 samples 2025-08-01 02:02:00,234 | INFO | pyMut.output | Processing sample 1/193: TCGA-AB-2994 (0.5%) 2025-08-01 02:02:00,245 | INFO | pyMut.output | Sample TCGA-AB-2994: 12 variants found 2025-08-01 02:02:00,263 | INFO | pyMut.output | Processing sample 3/193: TCGA-AB-2974 (1.6%) 2025-08-01 02:02:00,275 | INFO | pyMut.output | Sample TCGA-AB-2974: 8 variants found 2025-08-01 02:02:00,312 | INFO | pyMut.output | Processing sample 6/193: TCGA-AB-2879 (3.1%) 2025-08-01 02:02:00,324 | INFO | pyMut.output | Sample TCGA-AB-2879: 5 variants found 2025-08-01 02:02:00,359 | INFO | pyMut.output | Processing sample 9/193: TCGA-AB-2941 (4.7%) 2025-08-01 02:02:00,370 | INFO | pyMut.output | Sample TCGA-AB-2941: 5 variants found 2025-08-01 02:02:00,406 | INFO | pyMut.output | Processing sample 12/193: TCGA-AB-2804 (6.2%) 2025-08-01 02:02:00,418 | INFO | pyMut.output | Sample TCGA-AB-2804: 7 variants found 2025-08-01 02:02:00,454 | INFO | pyMut.output | Processing sample 15/193: TCGA-AB-2848 (7.8%) 2025-08-01 02:02:00,467 | INFO | pyMut.output | Sample TCGA-AB-2848: 1 variants found 2025-08-01 02:02:00,504 | INFO | pyMut.output | Processing sample 18/193: TCGA-AB-2873 (9.3%) 2025-08-01 02:02:00,517 | INFO | pyMut.output | Sample TCGA-AB-2873: 2 variants found 2025-08-01 02:02:00,553 | INFO | pyMut.output | Processing sample 21/193: TCGA-AB-2889 (10.9%) 2025-08-01 02:02:00,566 | INFO | pyMut.output | Sample TCGA-AB-2889: 5 variants found 2025-08-01 02:02:00,600 | INFO | pyMut.output | Processing sample 24/193: TCGA-AB-2850 (12.4%) 2025-08-01 02:02:00,612 | INFO | pyMut.output | Sample TCGA-AB-2850: 6 variants found 2025-08-01 02:02:00,646 | INFO | pyMut.output | Processing sample 27/193: TCGA-AB-2917 (14.0%) 2025-08-01 02:02:00,659 | INFO | pyMut.output | Sample TCGA-AB-2917: 16 variants found 2025-08-01 02:02:00,693 | INFO | pyMut.output | Processing sample 30/193: TCGA-AB-2978 (15.5%) 2025-08-01 02:02:00,702 | INFO | pyMut.output | Sample TCGA-AB-2978: 18 variants found 2025-08-01 02:02:00,731 | INFO | pyMut.output | Processing sample 33/193: TCGA-AB-2806 (17.1%) 2025-08-01 02:02:00,741 | INFO | pyMut.output | Sample TCGA-AB-2806: 17 variants found 2025-08-01 02:02:00,769 | INFO | pyMut.output | Processing sample 36/193: TCGA-AB-2853 (18.7%) 2025-08-01 02:02:00,782 | INFO | pyMut.output | Sample TCGA-AB-2853: 9 variants found 2025-08-01 02:02:00,812 | INFO | pyMut.output | Processing sample 39/193: TCGA-AB-2973 (20.2%) 2025-08-01 02:02:00,822 | INFO | pyMut.output | Sample TCGA-AB-2973: 4 variants found 2025-08-01 02:02:00,852 | INFO | pyMut.output | Processing sample 42/193: TCGA-AB-2839 (21.8%) 2025-08-01 02:02:00,862 | INFO | pyMut.output | Sample TCGA-AB-2839: 21 variants found 2025-08-01 02:02:00,890 | INFO | pyMut.output | Processing sample 45/193: TCGA-AB-2945 (23.3%) 2025-08-01 02:02:00,900 | INFO | pyMut.output | Sample TCGA-AB-2945: 13 variants found 2025-08-01 02:02:00,927 | INFO | pyMut.output | Processing sample 48/193: TCGA-AB-2906 (24.9%) 2025-08-01 02:02:00,936 | INFO | pyMut.output | Sample TCGA-AB-2906: 15 variants found 2025-08-01 02:02:00,965 | INFO | pyMut.output | Processing sample 51/193: TCGA-AB-2995 (26.4%) 2025-08-01 02:02:00,974 | INFO | pyMut.output | Sample TCGA-AB-2995: 6 variants found 2025-08-01 02:02:01,005 | INFO | pyMut.output | Processing sample 54/193: TCGA-AB-2844 (28.0%) 2025-08-01 02:02:01,015 | INFO | pyMut.output | Sample TCGA-AB-2844: 13 variants found 2025-08-01 02:02:01,044 | INFO | pyMut.output | Processing sample 57/193: TCGA-AB-2888 (29.5%) 2025-08-01 02:02:01,054 | INFO | pyMut.output | Sample TCGA-AB-2888: 9 variants found 2025-08-01 02:02:01,083 | INFO | pyMut.output | Processing sample 60/193: TCGA-AB-2842 (31.1%) 2025-08-01 02:02:01,094 | INFO | pyMut.output | Sample TCGA-AB-2842: 2 variants found 2025-08-01 02:02:01,121 | INFO | pyMut.output | Processing sample 63/193: TCGA-AB-2882 (32.6%) 2025-08-01 02:02:01,130 | INFO | pyMut.output | Sample TCGA-AB-2882: 17 variants found 2025-08-01 02:02:01,160 | INFO | pyMut.output | Processing sample 66/193: TCGA-AB-2854 (34.2%) 2025-08-01 02:02:01,169 | INFO | pyMut.output | Sample TCGA-AB-2854: 11 variants found 2025-08-01 02:02:01,199 | INFO | pyMut.output | Processing sample 69/193: TCGA-AB-2841 (35.8%) 2025-08-01 02:02:01,211 | INFO | pyMut.output | Sample TCGA-AB-2841: 4 variants found 2025-08-01 02:02:01,239 | INFO | pyMut.output | Processing sample 72/193: TCGA-AB-2833 (37.3%) 2025-08-01 02:02:01,249 | INFO | pyMut.output | Sample TCGA-AB-2833: 8 variants found 2025-08-01 02:02:01,277 | INFO | pyMut.output | Processing sample 75/193: TCGA-AB-2885 (38.9%) 2025-08-01 02:02:01,288 | INFO | pyMut.output | Sample TCGA-AB-2885: 13 variants found 2025-08-01 02:02:01,316 | INFO | pyMut.output | Processing sample 78/193: TCGA-AB-2908 (40.4%) 2025-08-01 02:02:01,325 | INFO | pyMut.output | Sample TCGA-AB-2908: 20 variants found 2025-08-01 02:02:01,353 | INFO | pyMut.output | Processing sample 81/193: TCGA-AB-2924 (42.0%) 2025-08-01 02:02:01,363 | INFO | pyMut.output | Sample TCGA-AB-2924: 11 variants found 2025-08-01 02:02:01,392 | INFO | pyMut.output | Processing sample 84/193: TCGA-AB-2869 (43.5%) 2025-08-01 02:02:01,402 | INFO | pyMut.output | Sample TCGA-AB-2869: 12 variants found 2025-08-01 02:02:01,431 | INFO | pyMut.output | Processing sample 87/193: TCGA-AB-2955 (45.1%) 2025-08-01 02:02:01,441 | INFO | pyMut.output | Sample TCGA-AB-2955: 19 variants found 2025-08-01 02:02:01,469 | INFO | pyMut.output | Processing sample 90/193: TCGA-AB-2939 (46.6%) 2025-08-01 02:02:01,479 | INFO | pyMut.output | Sample TCGA-AB-2939: 15 variants found 2025-08-01 02:02:01,508 | INFO | pyMut.output | Processing sample 93/193: TCGA-AB-2957 (48.2%) 2025-08-01 02:02:01,519 | INFO | pyMut.output | Sample TCGA-AB-2957: 2 variants found 2025-08-01 02:02:01,548 | INFO | pyMut.output | Processing sample 96/193: TCGA-AB-2904 (49.7%) 2025-08-01 02:02:01,558 | INFO | pyMut.output | Sample TCGA-AB-2904: 22 variants found 2025-08-01 02:02:01,588 | INFO | pyMut.output | Processing sample 99/193: TCGA-AB-2940 (51.3%) 2025-08-01 02:02:01,598 | INFO | pyMut.output | Sample TCGA-AB-2940: 4 variants found 2025-08-01 02:02:01,628 | INFO | pyMut.output | Processing sample 102/193: TCGA-AB-2918 (52.8%) 2025-08-01 02:02:01,638 | INFO | pyMut.output | Sample TCGA-AB-2918: 2 variants found 2025-08-01 02:02:01,668 | INFO | pyMut.output | Processing sample 105/193: TCGA-AB-2871 (54.4%) 2025-08-01 02:02:01,679 | INFO | pyMut.output | Sample TCGA-AB-2871: 15 variants found 2025-08-01 02:02:01,710 | INFO | pyMut.output | Processing sample 108/193: TCGA-AB-2891 (56.0%) 2025-08-01 02:02:01,720 | INFO | pyMut.output | Sample TCGA-AB-2891: 16 variants found 2025-08-01 02:02:01,749 | INFO | pyMut.output | Processing sample 111/193: TCGA-AB-2881 (57.5%) 2025-08-01 02:02:01,759 | INFO | pyMut.output | Sample TCGA-AB-2881: 9 variants found 2025-08-01 02:02:01,790 | INFO | pyMut.output | Processing sample 114/193: TCGA-AB-2929 (59.1%) 2025-08-01 02:02:01,804 | INFO | pyMut.output | Sample TCGA-AB-2929: 16 variants found 2025-08-01 02:02:01,851 | INFO | pyMut.output | Processing sample 117/193: TCGA-AB-2819 (60.6%) 2025-08-01 02:02:01,863 | INFO | pyMut.output | Sample TCGA-AB-2819: 16 variants found 2025-08-01 02:02:01,893 | INFO | pyMut.output | Processing sample 120/193: TCGA-AB-2927 (62.2%) 2025-08-01 02:02:01,902 | INFO | pyMut.output | Sample TCGA-AB-2927: 27 variants found 2025-08-01 02:02:01,934 | INFO | pyMut.output | Processing sample 123/193: TCGA-AB-2968 (63.7%) 2025-08-01 02:02:01,944 | INFO | pyMut.output | Sample TCGA-AB-2968: 18 variants found 2025-08-01 02:02:01,974 | INFO | pyMut.output | Processing sample 126/193: TCGA-AB-2838 (65.3%) 2025-08-01 02:02:01,984 | INFO | pyMut.output | Sample TCGA-AB-2838: 20 variants found 2025-08-01 02:02:02,014 | INFO | pyMut.output | Processing sample 129/193: TCGA-AB-2855 (66.8%) 2025-08-01 02:02:02,023 | INFO | pyMut.output | Sample TCGA-AB-2855: 4 variants found 2025-08-01 02:02:02,051 | INFO | pyMut.output | Processing sample 132/193: TCGA-AB-2934 (68.4%) 2025-08-01 02:02:02,061 | INFO | pyMut.output | Sample TCGA-AB-2934: 10 variants found 2025-08-01 02:02:02,088 | INFO | pyMut.output | Processing sample 135/193: TCGA-AB-2895 (69.9%) 2025-08-01 02:02:02,098 | INFO | pyMut.output | Sample TCGA-AB-2895: 16 variants found 2025-08-01 02:02:02,126 | INFO | pyMut.output | Processing sample 138/193: TCGA-AB-2928 (71.5%) 2025-08-01 02:02:02,136 | INFO | pyMut.output | Sample TCGA-AB-2928: 10 variants found 2025-08-01 02:02:02,165 | INFO | pyMut.output | Processing sample 141/193: TCGA-AB-2846 (73.1%) 2025-08-01 02:02:02,174 | INFO | pyMut.output | Sample TCGA-AB-2846: 15 variants found 2025-08-01 02:02:02,202 | INFO | pyMut.output | Processing sample 144/193: TCGA-AB-2971 (74.6%) 2025-08-01 02:02:02,212 | INFO | pyMut.output | Sample TCGA-AB-2971: 11 variants found 2025-08-01 02:02:02,241 | INFO | pyMut.output | Processing sample 147/193: TCGA-AB-2832 (76.2%) 2025-08-01 02:02:02,251 | INFO | pyMut.output | Sample TCGA-AB-2832: 10 variants found 2025-08-01 02:02:02,282 | INFO | pyMut.output | Processing sample 150/193: TCGA-AB-2834 (77.7%) 2025-08-01 02:02:02,291 | INFO | pyMut.output | Sample TCGA-AB-2834: 1 variants found 2025-08-01 02:02:02,319 | INFO | pyMut.output | Processing sample 153/193: TCGA-AB-2919 (79.3%) 2025-08-01 02:02:02,328 | INFO | pyMut.output | Sample TCGA-AB-2919: 11 variants found 2025-08-01 02:02:02,356 | INFO | pyMut.output | Processing sample 156/193: TCGA-AB-2829 (80.8%) 2025-08-01 02:02:02,366 | INFO | pyMut.output | Sample TCGA-AB-2829: 10 variants found 2025-08-01 02:02:02,395 | INFO | pyMut.output | Processing sample 159/193: TCGA-AB-2851 (82.4%) 2025-08-01 02:02:02,405 | INFO | pyMut.output | Sample TCGA-AB-2851: 7 variants found 2025-08-01 02:02:02,433 | INFO | pyMut.output | Processing sample 162/193: TCGA-AB-2987 (83.9%) 2025-08-01 02:02:02,443 | INFO | pyMut.output | Sample TCGA-AB-2987: 7 variants found 2025-08-01 02:02:02,472 | INFO | pyMut.output | Processing sample 165/193: TCGA-AB-2861 (85.5%) 2025-08-01 02:02:02,482 | INFO | pyMut.output | Sample TCGA-AB-2861: 19 variants found 2025-08-01 02:02:02,513 | INFO | pyMut.output | Processing sample 168/193: TCGA-AB-2847 (87.0%) 2025-08-01 02:02:02,523 | INFO | pyMut.output | Sample TCGA-AB-2847: 10 variants found 2025-08-01 02:02:02,612 | INFO | pyMut.output | Processing sample 171/193: TCGA-AB-3001 (88.6%) 2025-08-01 02:02:02,621 | INFO | pyMut.output | Sample TCGA-AB-3001: 12 variants found 2025-08-01 02:02:02,650 | INFO | pyMut.output | Processing sample 174/193: TCGA-AB-2808 (90.2%) 2025-08-01 02:02:02,661 | INFO | pyMut.output | Sample TCGA-AB-2808: 10 variants found 2025-08-01 02:02:02,690 | INFO | pyMut.output | Processing sample 177/193: TCGA-AB-2911 (91.7%) 2025-08-01 02:02:02,701 | INFO | pyMut.output | Sample TCGA-AB-2911: 2 variants found 2025-08-01 02:02:02,729 | INFO | pyMut.output | Processing sample 180/193: TCGA-AB-2956 (93.3%) 2025-08-01 02:02:02,739 | INFO | pyMut.output | Sample TCGA-AB-2956: 4 variants found 2025-08-01 02:02:02,767 | INFO | pyMut.output | Processing sample 183/193: TCGA-AB-2913 (94.8%) 2025-08-01 02:02:02,777 | INFO | pyMut.output | Sample TCGA-AB-2913: 16 variants found 2025-08-01 02:02:02,805 | INFO | pyMut.output | Processing sample 186/193: TCGA-AB-2923 (96.4%) 2025-08-01 02:02:02,816 | INFO | pyMut.output | Sample TCGA-AB-2923: 23 variants found 2025-08-01 02:02:02,844 | INFO | pyMut.output | Processing sample 189/193: TCGA-AB-2814 (97.9%) 2025-08-01 02:02:02,853 | INFO | pyMut.output | Sample TCGA-AB-2814: 10 variants found 2025-08-01 02:02:02,881 | INFO | pyMut.output | Processing sample 192/193: TCGA-AB-2915 (99.5%) 2025-08-01 02:02:02,890 | INFO | pyMut.output | Sample TCGA-AB-2915: 24 variants found 2025-08-01 02:02:02,894 | INFO | pyMut.output | Processing sample 193/193: TCGA-AB-2907 (100.0%) 2025-08-01 02:02:02,905 | INFO | pyMut.output | Sample TCGA-AB-2907: 16 variants found 2025-08-01 02:02:02,931 | INFO | pyMut.output | Sample processing completed: 193/193 samples processed 2025-08-01 02:02:02,931 | INFO | pyMut.output | Total variants found: 2207 variants 2025-08-01 02:02:02,933 | INFO | pyMut.output | Using MAF_COL_ORDER.csv column order: 21 columns arranged 2025-08-01 02:02:02,935 | INFO | pyMut.output | Writing 2207 variants to file 2025-08-01 02:02:02,945 | INFO | pyMut.output | Progress: 2207/2207 variants written (100.0%) 2025-08-01 02:02:02,946 | INFO | pyMut.output | MAF export completed successfully: 2207 variants processed and written to output/combined_2maf_output.maf 2025-08-01 02:02:02,946 | INFO | pyMut.output | Conversion summary: 193 samples, 2091 input variants, 2207 output variants