A High-Dimensional Genomic Framework for Leukemia Subtype Classification Using LightGBM and SHAP-based Explainable AI
DOI:
https://doi.org/10.54361/ajmas.269403
Abstract
Differentiating between acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) is critical for treatment selection and patient prognosis. Conventional diagnostic schemes, while essential, are often insufficient to reveal the full molecular complexity of these malignancies. Cytogenetics offers only a limited view of the genome, whereas gene expression profiling provides a more comprehensive molecular snapshot; however, genomic data are high-dimensional and pose considerable analytical challenges. In this work, we propose a novel computational scheme that combines LightGBM with SHAP-based explainable artificial intelligence (XAI) to classify leukemia subtypes with high precision and to identify significant genomic biomarkers. We used the classic Golub dataset, which contains 7129 gene expression features for 72 patients, of whom 47 had ALL and 25 had AML. The LightGBM classifier was trained using stratified 5-fold cross-validation. The model is made interpretable through SHAP, which provides global feature importance as well as local, patient-level explanations via dependence plots and waterfall plots. The LightGBM model achieved 97.14% accuracy and an AUC-ROC of 0.9974, outperforming state-of-the-art methods and indicating excellent diagnostic performance. SHAP analysis yielded a concise genomic signature dominated by CD33 (M23197_at) and TCF3 (M31523_at), well-known lineage markers with direct therapeutic relevance in AML and ALL, respectively. Dependence plots revealed non-linear relationships between significant genes, and waterfall plots provided an intelligible, patient-specific diagnostic rationale. This study suggests that biological interpretability and high-performing machine learning are not mutually exclusive. By integrating computational results with clinically interpretable molecular findings, this framework may provide a blueprint for building reliable AI systems in hematologic oncology and support a shift toward precision medicine.
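To make the patient-level waterfall explanation concrete, the following is a minimal sketch of the additive attribution that SHAP reports: a base value plus one contribution per gene sums to the model output for that patient. It is an illustration under stated assumptions, not the study's pipeline. The scorer `toy_aml_score`, the placeholder feature `other_gene`, the background cohort, and all expression values are hypothetical, and exact Shapley values are computed by brute-force enumeration over three features rather than with the tree-specific algorithm that SHAP applies to LightGBM models.

```python
from itertools import combinations
from math import factorial

# Feature order used throughout; "other_gene" is a hypothetical placeholder.
GENES = ["CD33", "TCF3", "other_gene"]


def toy_aml_score(x):
    """Hypothetical scorer returning a pseudo-probability of AML.

    This stands in for a trained classifier; it is NOT the paper's model.
    """
    cd33, tcf3, other = x
    score = 0.35
    if cd33 > 0.5:
        score += 0.40          # high CD33 pushes the score toward AML
    if tcf3 > 0.5:
        score -= 0.25          # high TCF3 pushes the score toward ALL
    if cd33 > 0.5 and other > 0.5:
        score += 0.10          # small interaction between two genes
    return score


# Made-up, scaled expression values serving as the background cohort.
BACKGROUND = [
    (0.9, 0.1, 0.7),
    (0.8, 0.2, 0.3),
    (0.1, 0.9, 0.6),
    (0.2, 0.8, 0.2),
    (0.3, 0.7, 0.9),
]


def coalition_value(x, known, background):
    """Mean score when genes in `known` take the patient's values and the
    remaining genes are filled in from each background row."""
    total = 0.0
    for row in background:
        mixed = tuple(x[i] if i in known else row[i] for i in range(len(x)))
        total += toy_aml_score(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition of the other genes."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                gain = (coalition_value(x, known | {i}, background)
                        - coalition_value(x, known, background))
                phi[i] += weight * gain
    return phi


patient = (0.95, 0.15, 0.8)   # hypothetical patient profile
base_value = coalition_value(patient, set(), BACKGROUND)
phi = shapley_values(patient, BACKGROUND)
prediction = toy_aml_score(patient)

# Text version of a waterfall plot: start at the base value and add each
# gene's contribution, largest magnitude first.
running = base_value
print(f"base value        {base_value:.3f}")
for gene, contribution in sorted(zip(GENES, phi), key=lambda p: -abs(p[1])):
    running += contribution
    print(f"{gene:<12} {contribution:+.3f} -> {running:.3f}")
print(f"model output      {prediction:.3f}")
```

The final running total matches the model output, which is the property a waterfall plot visualizes. Brute-force enumeration is only practical for a handful of features; with thousands of genes, a tree-aware method is what makes the computation tractable.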
License
Copyright (c) 2026 Soha Salih

This work is licensed under a Creative Commons Attribution 4.0 International License.