Reading VCF Files with pyMut
This notebook demonstrates how to read a VCF file with the read_vcf
function from pyMut.input.
Example: Loading a 1000 Genomes VCF file
We'll load the subset_1k_variants_ALL.chr10.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased_vep_protein_gene_variant_class.vcf
file, passing assembly="38" to select the GRCh38 assembly.
In [1]:
# Import the read_vcf function
from pyMut.input import read_vcf
In [2]:
# Load the VCF file
vcf_path = "../../../src/pyMut/data/examples/VCF/subset_1k_variants_ALL.chr10.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased_vep_protein_gene_variant_class.vcf"
py_mut = read_vcf(path=vcf_path, assembly="38")
2025-08-01 01:44:54,794 | INFO | pyMut.input | Starting optimized VCF reading: ../../../src/pyMut/data/examples/VCF/subset_1k_variants_ALL.chr10.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased_vep_protein_gene_variant_class.vcf
2025-08-01 01:44:54,795 | INFO | pyMut.input | Loading from cache: ../../../src/pyMut/data/examples/VCF/.pymut_cache/subset_1k_variants_ALL.chr10.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased_vep_protein_gene_variant_class_e646b1f7d5dca1c4.parquet
2025-08-01 01:44:54,959 | INFO | pyMut.input | Cache loaded successfully in 0.16 seconds
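The relative path used above is resolved against the current working directory, so the same cell can fail when the notebook is launched from a different folder. The following is a minimal sketch of a guard that resolves the path and checks that the file exists before it is handed to read_vcf. The helper name resolve_vcf_path and its base_dir parameter are hypothetical and are not part of pyMut.

```python
from pathlib import Path
from typing import Optional


def resolve_vcf_path(relative_path: str, base_dir: Optional[Path] = None) -> Path:
    """Resolve a VCF path against base_dir (default: current working directory).

    Hypothetical helper for illustration only; it is not part of pyMut.
    """
    base = Path.cwd() if base_dir is None else Path(base_dir)
    candidate = (base / relative_path).resolve()
    # Fail early with a clear message instead of failing inside the reader.
    if not candidate.is_file():
        raise FileNotFoundError(f"VCF file not found: {candidate}")
    return candidate
```

With this helper, the load step could be written as read_vcf(path=str(resolve_vcf_path(vcf_path)), assembly="38"), which passes a string path just as the original cell does.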
In [3]:
# Display the first 5 rows
py_mut.head()
Out[3]:
| | CHROM | POS | ID | REF | ALT | QUAL | FILTER | HG00096 | HG00097 | HG00099 | ... | VEP_ENSP | VEP_SWISSPROT | VEP_TREMBL | VEP_UNIPARC | VEP_UNIPROT_ISOFORM | VEP_NEAREST | VEP_DOMAINS | Hugo_Symbol | Variant_Classification | Variant_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr10 | 11501 | . | C | A | . | PASS | C\|A | C\|C | C\|C | ... | | | | | | TUBB8 | | TUBB8 | INTRON | SNP |
| 1 | chr10 | 36097 | . | G | A | . | PASS | G\|A | A\|G | G\|G | ... | | | | | | TUBB8 | | TUBB8 | INTRON | SNP |
| 2 | chr10 | 45900 | . | C | T | . | PASS | C\|C | C\|C | C\|C | ... | ENSP00000456206 | Q3ZCM7.157 | | UPI000007238E | | TUBB8 | | TUBB8 | 3'FLANK | SNP |
| 3 | chr10 | 47049 | . | GGA | G | . | PASS | GGA\|GGA | GGA\|GGA | GGA\|GGA | ... | ENSP00000456206 | Q3ZCM7.157 | | UPI000007238E | | TUBB8 | | TUBB8 | 3'UTR_DEL | DEL |
| 4 | chr10 | 47064 | . | ACCT | A | . | PASS | ACCT\|ACCT | ACCT\|ACCT | ACCT\|ACCT | ... | ENSP00000456206 | Q3ZCM7.157 | | UPI000007238E | | TUBB8 | MobiDB_lite:mobidb-lite&MobiDB_lite:mobidb-lit... | TUBB8 | RNA_DEL | DEL |
5 rows × 2601 columns
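In the preview, each sample column (HG00096, HG00097, ...) holds a phased genotype written as two allele strings separated by "|". As a rough sketch of how such columns might be summarised, the example below rebuilds a few columns of the five preview rows as a plain pandas DataFrame and counts ALT alleles per sample. The names preview and count_alt_alleles are hypothetical, the code does not use any pyMut API, and it assumes a single ALT allele per row, as in the rows shown.

```python
import pandas as pd

# A few columns from the five preview rows (values copied from the output above).
preview = pd.DataFrame(
    {
        "CHROM": ["chr10"] * 5,
        "POS": [11501, 36097, 45900, 47049, 47064],
        "REF": ["C", "G", "C", "GGA", "ACCT"],
        "ALT": ["A", "A", "T", "G", "A"],
        "HG00096": ["C|A", "G|A", "C|C", "GGA|GGA", "ACCT|ACCT"],
        "HG00097": ["C|C", "A|G", "C|C", "GGA|GGA", "ACCT|ACCT"],
        "Variant_Type": ["SNP", "SNP", "SNP", "DEL", "DEL"],
    }
)


def count_alt_alleles(genotype: str, alt: str) -> int:
    """Count how many alleles in a phased genotype such as 'C|A' equal ALT.

    Assumes one ALT allele per row; multi-allelic sites are not handled.
    """
    return sum(allele == alt for allele in genotype.split("|"))


sample_columns = ["HG00096", "HG00097"]

# One column of ALT-allele counts (0, 1, or 2) per sample.
alt_counts = pd.DataFrame(
    {
        sample: [
            count_alt_alleles(genotype, alt)
            for genotype, alt in zip(preview[sample], preview["ALT"])
        ]
        for sample in sample_columns
    },
    index=preview.index,
)

# Number of variants of each type in the preview.
type_counts = preview["Variant_Type"].value_counts()

print(alt_counts)
print(type_counts)
```

Comparing allele strings rather than numeric indices matches how the genotypes appear in this table. A standard VCF stores genotypes as indices such as 0|1, so this sketch applies only to the allele-string form shown here.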