MAF to VCF and MAF Conversion
This notebook demonstrates how to:
- Read a MAF file using read_maf with assembly=37
- Export the PyMutation object to VCF format using to_vcf
- Export the PyMutation object to MAF format using to_maf
Import the necessary functions
In [1]:
import os
from pyMut import read_maf
print("✅ Functions imported correctly")
✅ Functions imported correctly
Define the path to the MAF file
In [2]:
# Path to the MAF file
maf_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"
print("📁 File to process:")
print(f" - MAF file: {maf_path}")
# Verify that the file exists
if os.path.exists(maf_path):
    print("✅ File found")
else:
    print("❌ File not found")
📁 File to process:
 - MAF file: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz
✅ File found
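Before handing a gzipped MAF to a reader, it can help to look at its column names. The following is a minimal sketch using only the standard library; the helper name peek_maf_columns and the synthetic demo file are illustrative assumptions and are not part of pyMut.

```python
import gzip
import os
import tempfile


def peek_maf_columns(path):
    """Return the column names from the first non-comment line of a gzipped MAF.

    Hypothetical helper for illustration; not a pyMut function.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # MAF files may begin with comment lines such as "#version 2.4"
            if line.startswith("#"):
                continue
            return line.rstrip("\n").split("\t")
    return []


# Demo on a tiny synthetic file so the sketch runs without the real dataset
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.maf.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as out:
        out.write("#version 2.4\n")
        out.write("Hugo_Symbol\tChromosome\tStart_Position\n")
        out.write("DNMT3A\t2\t25457161\n")
    demo_columns = peek_maf_columns(demo_path)

print(demo_columns)
```

The same helper could be pointed at maf_path from the cell above, assuming the file is tab-separated.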
Read the MAF file with assembly=37
In [3]:
print("📖 Reading MAF file...")
try:
    # Read the MAF file with assembly=37
    pymutation_obj = read_maf(maf_path, "37")
    print("✅ PyMutation object created successfully")
    print(f" DataFrame shape: {pymutation_obj.data.shape}")
    print(f" Number of variants: {len(pymutation_obj.data)}")
    print(f" Number of columns: {len(pymutation_obj.data.columns)}")
    print(f" Number of samples: {len(pymutation_obj.samples)}")
except Exception as e:
    print(f"❌ Error reading the file: {e}")
    import traceback
    traceback.print_exc()
2025-08-01 01:51:46,880 | INFO | pyMut.input | Starting MAF reading: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz
2025-08-01 01:51:46,881 | INFO | pyMut.input | Loading from cache: ../../../src/pyMut/data/examples/MAF/.pymut_cache/tcga_laml.maf_8bfbda65c4b23428.parquet
2025-08-01 01:51:46,910 | INFO | pyMut.input | Cache loaded successfully in 0.03 seconds
📖 Reading MAF file...
✅ PyMutation object created successfully
 DataFrame shape: (2091, 216)
 Number of variants: 2091
 Number of columns: 216
 Number of samples: 193
Show the first rows of the DataFrame
In [4]:
print("🔍 First 3 rows of the DataFrame:")
pymutation_obj.head(3)
🔍 First 3 rows of the DataFrame:
Out[4]:
 | CHROM | POS | ID | REF | ALT | QUAL | FILTER | TCGA-AB-2988 | TCGA-AB-2869 | TCGA-AB-3009 | ... | Strand | Variant_Classification | Variant_Type | Reference_Allele | Tumor_Seq_Allele1 | Tumor_Seq_Allele2 | Tumor_Sample_Barcode | Protein_Change | i_TumorVAF_WU | i_transcript_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | chr9 | 100077177 | . | T | C | . | . | T\|T | T\|T | T\|T | ... | + | SILENT | SNP | T | T | C | TCGA-AB-2886 | p.T431T | 9.76 | NM_020893.1 |
1 | chr9 | 100085148 | . | G | A | . | . | G\|G | G\|G | G\|G | ... | + | MISSENSE_MUTATION | SNP | G | G | A | TCGA-AB-2917 | p.R581H | 18.4 | NM_020893.1 |
2 | chr9 | 100971322 | . | A | C | . | . | A\|A | A\|A | A\|A | ... | + | MISSENSE_MUTATION | SNP | A | A | C | TCGA-AB-2841 | p.L593R | 45.83 | NM_018421.3 |
3 rows × 216 columns
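The DataFrame is in a wide layout: one row per variant and one column per sample, where each sample cell holds a genotype written as two bases separated by a pipe (for example T|T). A minimal sketch of how such a cell could be interpreted against the REF column follows; the function name classify_genotype and the example genotypes other than T|T are illustrative assumptions, not values taken from the dataset.

```python
def classify_genotype(genotype, ref):
    """Classify a base-encoded genotype such as "T|C" relative to the REF allele.

    Hypothetical helper for illustration; assumes alleles are separated by "|".
    """
    alleles = genotype.split("|")
    n_alt = sum(1 for allele in alleles if allele != ref)
    if n_alt == 0:
        return "hom_ref"   # every allele matches the reference
    if n_alt == len(alleles):
        return "hom_alt"   # no allele matches the reference
    return "het"           # a mix of reference and non-reference alleles


print(classify_genotype("T|T", "T"))
print(classify_genotype("T|C", "T"))
print(classify_genotype("C|C", "T"))
```

In the three rows shown above, every visible sample cell equals REF|REF, which this sketch would label hom_ref.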
Define output paths for VCF and MAF exports
In [5]:
# Create output directory if it doesn't exist
output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)
# Define output paths
vcf_output_path = os.path.join(output_dir, "maf_to_vcf_output.vcf")
maf_output_path = os.path.join(output_dir, "maf_to_maf_output.maf")
print("📁 Output files will be saved to:")
print(f" - VCF output: {vcf_output_path}")
print(f" - MAF output: {maf_output_path}")
📁 Output files will be saved to:
 - VCF output: ./output/maf_to_vcf_output.vcf
 - MAF output: ./output/maf_to_maf_output.maf
Export to VCF format
In [6]:
print("📝 Exporting to VCF format...")
try:
    # Export to VCF format
    pymutation_obj.to_vcf(vcf_output_path)
    # Check if the file was created
    if os.path.exists(vcf_output_path):
        print(f"✅ VCF file created successfully: {vcf_output_path}")
        print(f" File size: {os.path.getsize(vcf_output_path) / (1024 * 1024):.2f} MB")
    else:
        print("❌ VCF file was not created")
except Exception as e:
    print(f"❌ Error exporting to VCF: {e}")
    import traceback
    traceback.print_exc()
2025-08-01 01:51:47,147 | INFO | pyMut.output | Starting VCF export to: output/maf_to_vcf_output.vcf
2025-08-01 01:51:47,150 | INFO | pyMut.output | Starting to process 2091 variants from 193 samples
📝 Exporting to VCF format...
2025-08-01 01:51:47,263 | INFO | pyMut.output | Processing genotype data to replace bases with indices
2025-08-01 01:51:50,943 | INFO | pyMut.output | Writing 2091 variants to file
2025-08-01 01:51:51,008 | INFO | pyMut.output | Progress: 2091/2091 variants written (100.0%)
2025-08-01 01:51:51,011 | INFO | pyMut.output | VCF export completed successfully: 2091 variants processed and written to output/maf_to_vcf_output.vcf
2025-08-01 01:51:51,012 | INFO | pyMut.output | Conversion summary: 193 samples, 2091 input variants, 2091 output variants
✅ VCF file created successfully: ./output/maf_to_vcf_output.vcf
 File size: 1.84 MB
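The export log mentions "Processing genotype data to replace bases with indices". VCF genotypes refer to alleles by position (0 for REF, 1 for the first ALT, and so on) rather than by base. The sketch below shows one way such a translation could work; it is an illustration under that assumption, and the function genotype_to_indices is hypothetical rather than pyMut's internal implementation.

```python
def genotype_to_indices(genotype, ref, alts):
    """Translate a base-encoded genotype (e.g. "T|C") into VCF allele indices (e.g. "0|1").

    Hypothetical helper for illustration. Alleles that match neither REF nor
    any ALT are written as ".", the VCF symbol for a missing allele.
    """
    alleles = [ref] + list(alts)                 # index 0 is REF, 1..n are ALTs
    separator = "|" if "|" in genotype else "/"  # keep phased/unphased notation
    indices = []
    for base in genotype.split(separator):
        indices.append(str(alleles.index(base)) if base in alleles else ".")
    return separator.join(indices)


print(genotype_to_indices("T|T", "T", ["C"]))
print(genotype_to_indices("T|C", "T", ["C"]))
print(genotype_to_indices("G/N", "G", ["A"]))
```

Keeping the original separator preserves the distinction VCF makes between phased (|) and unphased (/) genotypes.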
Export to MAF format
In [7]:
print("📝 Exporting to MAF format...")
try:
    # Export to MAF format
    pymutation_obj.to_maf(maf_output_path)
    # Check if the file was created
    if os.path.exists(maf_output_path):
        print(f"✅ MAF file created successfully: {maf_output_path}")
        print(f" File size: {os.path.getsize(maf_output_path) / (1024 * 1024):.2f} MB")
    else:
        print("❌ MAF file was not created")
except Exception as e:
    print(f"❌ Error exporting to MAF: {e}")
    import traceback
    traceback.print_exc()
2025-08-01 01:51:51,038 | INFO | pyMut.output | Starting MAF export to: output/maf_to_maf_output.maf
2025-08-01 01:51:51,039 | INFO | pyMut.output | Starting to process 2091 variants from 193 samples
2025-08-01 01:51:51,043 | INFO | pyMut.output | Processing sample 1/193: TCGA-AB-2988 (0.5%)
2025-08-01 01:51:51,056 | INFO | pyMut.output | Sample TCGA-AB-2988: 15 variants found
2025-08-01 01:51:51,085 | INFO | pyMut.output | Processing sample 3/193: TCGA-AB-3009 (1.6%)
2025-08-01 01:51:51,098 | INFO | pyMut.output | Sample TCGA-AB-3009: 42 variants found
2025-08-01 01:51:51,132 | INFO | pyMut.output | Processing sample 6/193: TCGA-AB-2920 (3.1%)
2025-08-01 01:51:51,144 | INFO | pyMut.output | Sample TCGA-AB-2920: 11 variants found
2025-08-01 01:51:51,180 | INFO | pyMut.output | Processing sample 9/193: TCGA-AB-2999 (4.7%)
2025-08-01 01:51:51,191 | INFO | pyMut.output | Sample TCGA-AB-2999: 11 variants found
2025-08-01 01:51:51,224 | INFO | pyMut.output | Processing sample 12/193: TCGA-AB-2923 (6.2%)
📝 Exporting to MAF format...
2025-08-01 01:51:51,235 | INFO | pyMut.output | Sample TCGA-AB-2923: 23 variants found
2025-08-01 01:51:51,269 | INFO | pyMut.output | Processing sample 15/193: TCGA-AB-2931 (7.8%)
2025-08-01 01:51:51,280 | INFO | pyMut.output | Sample TCGA-AB-2931: 11 variants found
2025-08-01 01:51:51,312 | INFO | pyMut.output | Processing sample 18/193: TCGA-AB-2906 (9.3%)
2025-08-01 01:51:51,322 | INFO | pyMut.output | Sample TCGA-AB-2906: 15 variants found
2025-08-01 01:51:51,354 | INFO | pyMut.output | Processing sample 21/193: TCGA-AB-2945 (10.9%)
2025-08-01 01:51:51,363 | INFO | pyMut.output | Sample TCGA-AB-2945: 13 variants found
2025-08-01 01:51:51,396 | INFO | pyMut.output | Processing sample 24/193: TCGA-AB-2952 (12.4%)
2025-08-01 01:51:51,407 | INFO | pyMut.output | Sample TCGA-AB-2952: 15 variants found
2025-08-01 01:51:51,439 | INFO | pyMut.output | Processing sample 27/193: TCGA-AB-2862 (14.0%)
2025-08-01 01:51:51,451 | INFO | pyMut.output | Sample TCGA-AB-2862: 11 variants found
2025-08-01 01:51:51,485 | INFO | pyMut.output | Processing sample 30/193: TCGA-AB-2911 (15.5%)
2025-08-01 01:51:51,496 | INFO | pyMut.output | Sample TCGA-AB-2911: 2 variants found
2025-08-01 01:51:51,528 | INFO | pyMut.output | Processing sample 33/193: TCGA-AB-2910 (17.1%)
2025-08-01 01:51:51,539 | INFO | pyMut.output | Sample TCGA-AB-2910: 12 variants found
2025-08-01 01:51:51,572 | INFO | pyMut.output | Processing sample 36/193: TCGA-AB-2822 (18.7%)
2025-08-01 01:51:51,583 | INFO | pyMut.output | Sample TCGA-AB-2822: 22 variants found
2025-08-01 01:51:51,618 | INFO | pyMut.output | Processing sample 39/193: TCGA-AB-2807 (20.2%)
2025-08-01 01:51:51,629 | INFO | pyMut.output | Sample TCGA-AB-2807: 29 variants found
2025-08-01 01:51:51,662 | INFO | pyMut.output | Processing sample 42/193: TCGA-AB-2897 (21.8%)
2025-08-01 01:51:51,673 | INFO | pyMut.output | Sample TCGA-AB-2897: 7 variants found
2025-08-01 01:51:51,708 | INFO | pyMut.output | Processing sample 45/193: TCGA-AB-2929 (23.3%)
2025-08-01 01:51:51,719 | INFO | pyMut.output | Sample TCGA-AB-2929: 16 variants found
2025-08-01 01:51:51,752 | INFO | pyMut.output | Processing sample 48/193: TCGA-AB-2935 (24.9%)
2025-08-01 01:51:51,763 | INFO | pyMut.output | Sample TCGA-AB-2935: 10 variants found
2025-08-01 01:51:51,799 | INFO | pyMut.output | Processing sample 51/193: TCGA-AB-2889 (26.4%)
2025-08-01 01:51:51,811 | INFO | pyMut.output | Sample TCGA-AB-2889: 5 variants found
2025-08-01 01:51:51,848 | INFO | pyMut.output | Processing sample 54/193: TCGA-AB-2990 (28.0%)
2025-08-01 01:51:51,859 | INFO | pyMut.output | Sample TCGA-AB-2990: 9 variants found
2025-08-01 01:51:51,892 | INFO | pyMut.output | Processing sample 57/193: TCGA-AB-2864 (29.5%)
2025-08-01 01:51:51,902 | INFO | pyMut.output | Sample TCGA-AB-2864: 16 variants found
2025-08-01 01:51:51,929 | INFO | pyMut.output | Processing sample 60/193: TCGA-AB-2903 (31.1%)
2025-08-01 01:51:51,934 | INFO | pyMut.output | Sample TCGA-AB-2903: 1 variants found
2025-08-01 01:51:51,960 | INFO | pyMut.output | Processing sample 63/193: TCGA-AB-2959 (32.6%)
2025-08-01 01:51:51,967 | INFO | pyMut.output | Sample TCGA-AB-2959: 27 variants found
2025-08-01 01:51:51,988 | INFO | pyMut.output | Processing sample 66/193: TCGA-AB-2888 (34.2%)
2025-08-01 01:51:51,993 | INFO | pyMut.output | Sample TCGA-AB-2888: 9 variants found
2025-08-01 01:51:52,013 | INFO | pyMut.output | Processing sample 69/193: TCGA-AB-3002 (35.8%)
2025-08-01 01:51:52,019 | INFO | pyMut.output | Sample TCGA-AB-3002: 27 variants found
2025-08-01 01:51:52,042 | INFO | pyMut.output | Processing sample 72/193: TCGA-AB-2991 (37.3%)
2025-08-01 01:51:52,048 | INFO | pyMut.output | Sample TCGA-AB-2991: 8 variants found
2025-08-01 01:51:52,143 | INFO | pyMut.output | Processing sample 75/193: TCGA-AB-2874 (38.9%)
2025-08-01 01:51:52,150 | INFO | pyMut.output | Sample TCGA-AB-2874: 15 variants found
2025-08-01 01:51:52,171 | INFO | pyMut.output | Processing sample 78/193: TCGA-AB-2821 (40.4%)
2025-08-01 01:51:52,176 | INFO | pyMut.output | Sample TCGA-AB-2821: 15 variants found
2025-08-01 01:51:52,195 | INFO | pyMut.output | Processing sample 81/193: TCGA-AB-2814 (42.0%)
2025-08-01 01:51:52,200 | INFO | pyMut.output | Sample TCGA-AB-2814: 10 variants found
2025-08-01 01:51:52,221 | INFO | pyMut.output | Processing sample 84/193: TCGA-AB-2978 (43.5%)
2025-08-01 01:51:52,226 | INFO | pyMut.output | Sample TCGA-AB-2978: 18 variants found
2025-08-01 01:51:52,246 | INFO | pyMut.output | Processing sample 87/193: TCGA-AB-3006 (45.1%)
2025-08-01 01:51:52,253 | INFO | pyMut.output | Sample TCGA-AB-3006: 19 variants found
2025-08-01 01:51:52,274 | INFO | pyMut.output | Processing sample 90/193: TCGA-AB-2857 (46.6%)
2025-08-01 01:51:52,279 | INFO | pyMut.output | Sample TCGA-AB-2857: 14 variants found
2025-08-01 01:51:52,299 | INFO | pyMut.output | Processing sample 93/193: TCGA-AB-2813 (48.2%)
2025-08-01 01:51:52,304 | INFO | pyMut.output | Sample TCGA-AB-2813: 16 variants found
2025-08-01 01:51:52,323 | INFO | pyMut.output | Processing sample 96/193: TCGA-AB-2970 (49.7%)
2025-08-01 01:51:52,329 | INFO | pyMut.output | Sample TCGA-AB-2970: 8 variants found
2025-08-01 01:51:52,351 | INFO | pyMut.output | Processing sample 99/193: TCGA-AB-2971 (51.3%)
2025-08-01 01:51:52,356 | INFO | pyMut.output | Sample TCGA-AB-2971: 11 variants found
2025-08-01 01:51:52,376 | INFO | pyMut.output | Processing sample 102/193: TCGA-AB-2985 (52.8%)
2025-08-01 01:51:52,383 | INFO | pyMut.output | Sample TCGA-AB-2985: 5 variants found
2025-08-01 01:51:52,404 | INFO | pyMut.output | Processing sample 105/193: TCGA-AB-2851 (54.4%)
2025-08-01 01:51:52,409 | INFO | pyMut.output | Sample TCGA-AB-2851: 7 variants found
2025-08-01 01:51:52,428 | INFO | pyMut.output | Processing sample 108/193: TCGA-AB-2858 (56.0%)
2025-08-01 01:51:52,434 | INFO | pyMut.output | Sample TCGA-AB-2858: 13 variants found
2025-08-01 01:51:52,453 | INFO | pyMut.output | Processing sample 111/193: TCGA-AB-2868 (57.5%)
2025-08-01 01:51:52,458 | INFO | pyMut.output | Sample TCGA-AB-2868: 13 variants found
2025-08-01 01:51:52,478 | INFO | pyMut.output | Processing sample 114/193: TCGA-AB-2937 (59.1%)
2025-08-01 01:51:52,484 | INFO | pyMut.output | Sample TCGA-AB-2937: 12 variants found
2025-08-01 01:51:52,503 | INFO | pyMut.output | Processing sample 117/193: TCGA-AB-2881 (60.6%)
2025-08-01 01:51:52,510 | INFO | pyMut.output | Sample TCGA-AB-2881: 9 variants found
2025-08-01 01:51:52,531 | INFO | pyMut.output | Processing sample 120/193: TCGA-AB-2803 (62.2%)
2025-08-01 01:51:52,536 | INFO | pyMut.output | Sample TCGA-AB-2803: 15 variants found
2025-08-01 01:51:52,556 | INFO | pyMut.output | Processing sample 123/193: TCGA-AB-2806 (63.7%)
2025-08-01 01:51:52,561 | INFO | pyMut.output | Sample TCGA-AB-2806: 17 variants found
2025-08-01 01:51:52,580 | INFO | pyMut.output | Processing sample 126/193: TCGA-AB-2810 (65.3%)
2025-08-01 01:51:52,587 | INFO | pyMut.output | Sample TCGA-AB-2810: 14 variants found
2025-08-01 01:51:52,607 | INFO | pyMut.output | Processing sample 129/193: TCGA-AB-2849 (66.8%)
2025-08-01 01:51:52,612 | INFO | pyMut.output | Sample TCGA-AB-2849: 26 variants found
2025-08-01 01:51:52,632 | INFO | pyMut.output | Processing sample 132/193: TCGA-AB-2928 (68.4%)
2025-08-01 01:51:52,638 | INFO | pyMut.output | Sample TCGA-AB-2928: 10 variants found
2025-08-01 01:51:52,659 | INFO | pyMut.output | Processing sample 135/193: TCGA-AB-2843 (69.9%)
2025-08-01 01:51:52,664 | INFO | pyMut.output | Sample TCGA-AB-2843: 12 variants found
2025-08-01 01:51:52,685 | INFO | pyMut.output | Processing sample 138/193: TCGA-AB-2940 (71.5%)
2025-08-01 01:51:52,691 | INFO | pyMut.output | Sample TCGA-AB-2940: 4 variants found
2025-08-01 01:51:52,710 | INFO | pyMut.output | Processing sample 141/193: TCGA-AB-3007 (73.1%)
2025-08-01 01:51:52,716 | INFO | pyMut.output | Sample TCGA-AB-3007: 8 variants found
2025-08-01 01:51:52,736 | INFO | pyMut.output | Processing sample 144/193: TCGA-AB-2983 (74.6%)
2025-08-01 01:51:52,741 | INFO | pyMut.output | Sample TCGA-AB-2983: 14 variants found
2025-08-01 01:51:52,762 | INFO | pyMut.output | Processing sample 147/193: TCGA-AB-2829 (76.2%)
2025-08-01 01:51:52,768 | INFO | pyMut.output | Sample TCGA-AB-2829: 10 variants found
2025-08-01 01:51:52,788 | INFO | pyMut.output | Processing sample 150/193: TCGA-AB-2946 (77.7%)
2025-08-01 01:51:52,794 | INFO | pyMut.output | Sample TCGA-AB-2946: 3 variants found
2025-08-01 01:51:52,814 | INFO | pyMut.output | Processing sample 153/193: TCGA-AB-2809 (79.3%)
2025-08-01 01:51:52,819 | INFO | pyMut.output | Sample TCGA-AB-2809: 4 variants found
2025-08-01 01:51:52,838 | INFO | pyMut.output | Processing sample 156/193: TCGA-AB-2873 (80.8%)
2025-08-01 01:51:52,843 | INFO | pyMut.output | Sample TCGA-AB-2873: 2 variants found
2025-08-01 01:51:52,862 | INFO | pyMut.output | Processing sample 159/193: TCGA-AB-2919 (82.4%)
2025-08-01 01:51:52,867 | INFO | pyMut.output | Sample TCGA-AB-2919: 11 variants found
2025-08-01 01:51:52,888 | INFO | pyMut.output | Processing sample 162/193: TCGA-AB-2967 (83.9%)
2025-08-01 01:51:52,893 | INFO | pyMut.output | Sample TCGA-AB-2967: 11 variants found
2025-08-01 01:51:52,913 | INFO | pyMut.output | Processing sample 165/193: TCGA-AB-2981 (85.5%)
2025-08-01 01:51:52,918 | INFO | pyMut.output | Sample TCGA-AB-2981: 6 variants found
2025-08-01 01:51:52,939 | INFO | pyMut.output | Processing sample 168/193: TCGA-AB-2877 (87.0%)
2025-08-01 01:51:52,946 | INFO | pyMut.output | Sample TCGA-AB-2877: 20 variants found
2025-08-01 01:51:52,967 | INFO | pyMut.output | Processing sample 171/193: TCGA-AB-2998 (88.6%)
2025-08-01 01:51:52,971 | INFO | pyMut.output | Sample TCGA-AB-2998: 10 variants found
2025-08-01 01:51:52,991 | INFO | pyMut.output | Processing sample 174/193: TCGA-AB-2982 (90.2%)
2025-08-01 01:51:52,997 | INFO | pyMut.output | Sample TCGA-AB-2982: 2 variants found
2025-08-01 01:51:53,016 | INFO | pyMut.output | Processing sample 177/193: TCGA-AB-2840 (91.7%)
2025-08-01 01:51:53,023 | INFO | pyMut.output | Sample TCGA-AB-2840: 1 variants found
2025-08-01 01:51:53,044 | INFO | pyMut.output | Processing sample 180/193: TCGA-AB-2942 (93.3%)
2025-08-01 01:51:53,050 | INFO | pyMut.output | Sample TCGA-AB-2942: 1 variants found
2025-08-01 01:51:53,072 | INFO | pyMut.output | Processing sample 183/193: TCGA-AB-2826 (94.8%)
2025-08-01 01:51:53,077 | INFO | pyMut.output | Sample TCGA-AB-2826: 4 variants found
2025-08-01 01:51:53,099 | INFO | pyMut.output | Processing sample 186/193: TCGA-AB-2948 (96.4%)
2025-08-01 01:51:53,105 | INFO | pyMut.output | Sample TCGA-AB-2948: 2 variants found
2025-08-01 01:51:53,125 | INFO | pyMut.output | Processing sample 189/193: TCGA-AB-2941 (97.9%)
2025-08-01 01:51:53,131 | INFO | pyMut.output | Sample TCGA-AB-2941: 5 variants found
2025-08-01 01:51:53,152 | INFO | pyMut.output | Processing sample 192/193: TCGA-AB-2855 (99.5%)
2025-08-01 01:51:53,158 | INFO | pyMut.output | Sample TCGA-AB-2855: 4 variants found
2025-08-01 01:51:53,162 | INFO | pyMut.output | Processing sample 193/193: TCGA-AB-2933 (100.0%)
2025-08-01 01:51:53,170 | INFO | pyMut.output | Sample TCGA-AB-2933: 1 variants found
2025-08-01 01:51:53,226 | INFO | pyMut.output | Sample processing completed: 193/193 samples processed
2025-08-01 01:51:53,226 | INFO | pyMut.output | Total variants found: 2207 variants
2025-08-01 01:51:53,235 | INFO | pyMut.output | Using MAF_COL_ORDER.csv column order: 21 columns arranged
2025-08-01 01:51:53,239 | INFO | pyMut.output | Writing 2207 variants to file
2025-08-01 01:51:53,254 | INFO | pyMut.output | Progress: 2207/2207 variants written (100.0%)
2025-08-01 01:51:53,255 | INFO | pyMut.output | MAF export completed successfully: 2207 variants processed and written to output/maf_to_maf_output.maf
2025-08-01 01:51:53,255 | INFO | pyMut.output | Conversion summary: 193 samples, 2091 input variants, 2207 output variants
✅ MAF file created successfully: ./output/maf_to_maf_output.maf
 File size: 0.35 MB
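The conversion summary reports 2091 input variants but 2207 output variants, and the log shows the export walking through samples one at a time. One plausible reading, offered here as an assumption rather than a description of pyMut's internals, is that the wide table is expanded to one MAF row per (sample, variant) pair, so a variant carried by several samples produces several rows. The sketch below illustrates that expansion on synthetic data; the function wide_to_long, the sample names, and the toy variants are all hypothetical.

```python
def wide_to_long(variants, sample_columns):
    """Expand wide variant rows into one MAF-style row per carrying sample.

    Hypothetical helper for illustration. A sample is treated as a carrier
    when its genotype differs from REF|REF.
    """
    rows = []
    for sample in sample_columns:  # per-sample loop, mirroring the export log
        for variant in variants:
            hom_ref = f"{variant['REF']}|{variant['REF']}"
            if variant[sample] != hom_ref:
                rows.append({
                    "Tumor_Sample_Barcode": sample,
                    "Start_Position": variant["POS"],
                    "Reference_Allele": variant["REF"],
                    "Tumor_Seq_Allele2": variant["ALT"],
                })
    return rows


# Synthetic toy data: the first variant is carried by both samples
toy_variants = [
    {"POS": 100, "REF": "T", "ALT": "C", "SAMPLE_A": "T|C", "SAMPLE_B": "T|C"},
    {"POS": 200, "REF": "G", "ALT": "A", "SAMPLE_A": "G|G", "SAMPLE_B": "G|A"},
]
toy_rows = wide_to_long(toy_variants, ["SAMPLE_A", "SAMPLE_B"])
print(len(toy_variants), "input variants ->", len(toy_rows), "output rows")
```

In this toy case two input variants become three output rows, the same direction of change as in the log.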
Examine the exported files
In [8]:
# Show the first few lines of the exported VCF file
print("🔍 First 10 lines of the exported VCF file:")
!head -10 {vcf_output_path}
🔍 First 10 lines of the exported VCF file:
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
##fileformat=VCFv4.3
##fileDate=20250801
##source=https://github.com/Luisruimor/pyMut
##reference=37
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=9>
##contig=<ID=X>
##contig=<ID=14>
##contig=<ID=2>
##contig=<ID=12>
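The VCF header is a series of ##key=value meta lines, so it can be checked programmatically instead of by eye. The following is a minimal standard-library sketch that collects simple keys and contig IDs; the function parse_vcf_meta is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re


def parse_vcf_meta(lines):
    """Collect simple ##key=value pairs and contig IDs from VCF meta lines.

    Hypothetical helper for illustration; stops at the first non-## line.
    """
    meta = {}
    contigs = []
    for line in lines:
        if not line.startswith("##"):
            break
        key, _, value = line[2:].rstrip("\n").partition("=")
        if key == "contig":
            match = re.search(r"ID=([^,>]+)", value)
            if match:
                contigs.append(match.group(1))
        else:
            meta.setdefault(key, value)  # keep the first value seen per key
    return meta, contigs


header_lines = [
    "##fileformat=VCFv4.3",
    "##fileDate=20250801",
    "##source=https://github.com/Luisruimor/pyMut",
    "##reference=37",
    '##FILTER=<ID=PASS,Description="All filters passed">',
    "##contig=<ID=9>",
    "##contig=<ID=X>",
    "##contig=<ID=14>",
    "##contig=<ID=2>",
    "##contig=<ID=12>",
]
vcf_meta, vcf_contigs = parse_vcf_meta(header_lines)
print(vcf_meta["fileformat"], vcf_meta["reference"], vcf_contigs)
```

To run it against the exported file, the list could be replaced by an open file handle on vcf_output_path, since the function only iterates over lines.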
In [9]:
# Show the first few lines of the exported MAF file
print("🔍 First 10 lines of the exported MAF file:")
!head -10 {maf_output_path}
🔍 First 10 lines of the exported MAF file:
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build NCBI_Build Chromosome Start_Position Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele1 Tumor_Seq_Allele2 Tumor_Seq_Allele2 dbSNP_RS Tumor_Sample_Barcode Tumor_Sample_Barcode FILTER i_TumorVAF_WU End_position Protein_Change i_transcript_name QUAL
BAAT 570 genome.wustl.edu 37 37 9 104124840 104124840 104124840 + MISSENSE_MUTATION SNP G G G G A A . TCGA-AB-2988 TCGA-AB-2988 . 48.35 104124840 p.T376M NM_001701.1 .
TKTL1 8277 genome.wustl.edu 37 37 X 153557894 153557894 153557894 + SILENT SNP C C C C T T . TCGA-AB-2988 TCGA-AB-2988 . 41.11 153557894 p.A549A NM_012253.1 .
ANG 283 genome.wustl.edu 37 37 14 21161742 21161742 21161742 + MISSENSE_MUTATION SNP G G G G A A . TCGA-AB-2988 TCGA-AB-2988 . 47.43 21161742 p.V7I NM_001097577.2 .
DNMT3A 1788 genome.wustl.edu 37 37 2 25457161 25457161 25457161 + MISSENSE_MUTATION SNP A A A A C C . TCGA-AB-2988 TCGA-AB-2988 . 45.44 25457161 p.F909C NM_022552.3 .
LRWD1 222229 genome.wustl.edu 37 37 7 102106693 102106693 102106693 + MISSENSE_MUTATION SNP C C C C A A . TCGA-AB-2988 TCGA-AB-2988 . 46.34 102106693 p.N136K NM_152892.1 .
GUCA2A 2980 genome.wustl.edu 37 37 1 42629190 42629190 42629190 + MISSENSE_MUTATION SNP A A A A G G . TCGA-AB-2988 TCGA-AB-2988 . 42.08 42629190 p.F56S NM_033553.2 .
SPTBN5 51332 genome.wustl.edu 37 37 15 42168393 42168393 42168393 + SILENT SNP G G G G A A . TCGA-AB-2988 TCGA-AB-2988 . 49.28 42168393 p.N1347N NM_016642.2 .
SLC17A3 10786 genome.wustl.edu 37 37 6 25850330 25850330 25850330 + MISSENSE_MUTATION SNP G G G G A A . TCGA-AB-2988 TCGA-AB-2988 . 44.82 25850330 p.L279F NM_001098486.1 .
NPM1 4869 genome.wustl.edu 37 37 5 170837547 170837547 170837547 + FRAME_SHIFT_INS INS - - - - CATG CATG . TCGA-AB-2988 TCGA-AB-2988 . 170837548 p.WQ288fs NM_002520.1 .
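The exported header repeats several column names (for example NCBI_Build and Start_Position each appear twice), which can trip up tools that expect unique names. The sketch below flags repeated names in a header line; duplicated_columns is a hypothetical helper, the header is copied from the output above, and the tab separator is an assumption based on the MAF format being tab-delimited.

```python
from collections import Counter


def duplicated_columns(header_line, sep="\t"):
    """Return column names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; the comparison is case-sensitive.
    """
    counts = Counter(header_line.rstrip("\n").split(sep))
    return [name for name, count in counts.items() if count > 1]


exported_header = "\t".join([
    "Hugo_Symbol", "Entrez_Gene_Id", "Center", "NCBI_Build", "NCBI_Build",
    "Chromosome", "Start_Position", "Start_Position", "End_Position", "Strand",
    "Variant_Classification", "Variant_Type", "Reference_Allele",
    "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele1",
    "Tumor_Seq_Allele2", "Tumor_Seq_Allele2", "dbSNP_RS",
    "Tumor_Sample_Barcode", "Tumor_Sample_Barcode", "FILTER", "i_TumorVAF_WU",
    "End_position", "Protein_Change", "i_transcript_name", "QUAL",
])
repeated = duplicated_columns(exported_header)
print(repeated)
```

Because the check is case-sensitive, End_Position and End_position count as distinct names here; comparing lowercased names would flag that pair as well.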
Summary
In this notebook, we demonstrated how to:
- Read a MAF file using read_maf with assembly=37
- Export the PyMutation object to VCF format using to_vcf
- Export the PyMutation object to MAF format using to_maf
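Together, these exports let a dataset loaded from MAF be written out as either VCF or MAF from the same PyMutation object. Row counts need not match between the two formats: in this run the VCF export wrote 2091 variants, while the MAF export wrote 2207 rows from the same 2091 input variants.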