A High-Dimensional Genomic Framework for Leukemia Subtype Classification Using LightGBM and SHAP-based Explainable AI

Authors

DOI:

https://doi.org/10.54361/ajmas.269403

Abstract

Distinguishing acute lymphoblastic leukemia (ALL) from acute myeloid leukemia (AML) is critical for patient treatment and prognosis. Conventional diagnostic schemes, while essential, are often insufficient to reveal the full molecular complexity of these tumors. Cytogenetics offers only a limited view of the genome, whereas gene expression profiling provides a more comprehensive molecular snapshot; however, genomic data are high-dimensional and pose considerable analytical challenges. In this work, we propose a novel computational framework that combines LightGBM with SHAP-based explainable artificial intelligence (XAI) to classify leukemia subtypes with high precision and to identify significant genomic biomarkers. We used the classic Golub dataset, which contains expression values for 7129 genes from 72 patients, of whom 47 had ALL and 25 had AML. The LightGBM classifier was trained with stratified 5-fold cross-validation. SHAP makes the model interpretable, providing global feature importance as well as local, patient-level explanations through dependence plots and waterfall plots. The LightGBM model outperformed state-of-the-art methods, achieving 97.14% accuracy and an AUC-ROC of 0.9974. SHAP analysis yielded a concise genomic signature dominated by CD33 (M23197_at) and TCF3 (M31523_at), well-known lineage markers with direct therapeutic relevance in AML and ALL, respectively. Dependence plots revealed non-linear relationships among the significant genes, and waterfall plots provided an intelligible, patient-specific diagnostic rationale. This study suggests that biological interpretability and high-performing machine learning are not mutually exclusive. By integrating computational results with clinically interpretable molecular findings, this framework may provide a blueprint for building reliable AI systems in hematologic oncology and support a shift toward precision medicine.
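The stratified 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the author's code: the function name `stratified_kfold`, the seed, and the round-robin assignment are assumptions made for illustration, and the labels are synthetic placeholders that only reproduce the 47 ALL / 25 AML class counts. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used instead.

```python
import random
from collections import Counter


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]

    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    # Shuffle within each class, then deal indices round-robin across folds.
    # The running offset keeps total fold sizes balanced across classes.
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)
        offset += len(idxs)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Placeholder labels matching the class counts reported for the Golub dataset.
labels = ["ALL"] * 47 + ["AML"] * 25
splits = list(stratified_kfold(labels, k=5, seed=0))

for n, (train, test) in enumerate(splits):
    counts = dict(Counter(labels[i] for i in test))
    print(f"fold {n}: train={len(train)} test={len(test)} test classes={counts}")
```

Each test fold holds roughly one fifth of each class, so every fold evaluates the classifier on both ALL and AML samples despite the class imbalance.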

Published

2026-04-06

How to Cite

1.
Soha Salih. A High-Dimensional Genomic Framework for Leukemia Subtype Classification Using LightGBM and SHAP-based Explainable AI. Alq J Med App Sci [Internet]. 2026 Apr. 6 [cited 2026 Apr. 6];:793-804. Available from: https://journal.utripoli.edu.ly/index.php/Alqalam/article/view/1523

Section

Articles