Identification of glycolysis genes signature for predicting prognosis in malignant pleural mesothelioma by bioinformatics and machine learning

Frontiers in Endocrinology 2022 November 29

Yingqi Xiao, Wei Huang, Li Zhang, Hongwei Wang

Abstract

Background: The value of glycolysis-related genes as prognostic markers in malignant pleural mesothelioma (MPM) remains unclear. We aimed to explore the relationship between glycolytic pathway genes and MPM prognosis by constructing prognostic risk models through bioinformatics and machine learning.

Methods: We screened the dataset GSE51024 from the GEO database for gene set enrichment analysis (GSEA) and performed differential expression analysis on the glycolytic pathway gene sets to identify differentially expressed genes (DEGs). Cox regression analysis was then used to identify prognosis-associated glycolytic genes and establish a risk model. The validity of the risk model was evaluated using the dataset GSE67487 from the GEO database. Finally, a specimen classification model was constructed with support vector machine (SVM) and random forest (RF) to further screen prognostic genes.
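As a rough illustration of how a Cox-based risk model is typically applied, the sketch below computes a risk score as the sum of each gene's Cox coefficient multiplied by its expression value, then splits samples into high-risk and low-risk groups. The gene names, coefficients, and expression values are hypothetical placeholders, not values from this study, and the median cutoff is an assumption, since the abstract does not state how the groups were defined.

```python
from statistics import median

# Hypothetical Cox coefficients (log hazard ratios) for placeholder genes.
# These are NOT the coefficients reported by the study.
COEFFICIENTS = {"gene_a": 0.5, "gene_b": -0.25, "gene_c": 1.0}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(samples, coefficients=COEFFICIENTS):
    """Score every sample and label it 'high' or 'low' risk.

    Assumption: the cutoff is the median risk score; scores strictly above
    the median are labeled 'high', all others 'low'.
    """
    scores = {sid: risk_score(expr, coefficients) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return scores, cutoff, groups


# Made-up expression values, for illustration only.
samples = {
    "s1": {"gene_a": 2.0, "gene_b": 4.0, "gene_c": 1.0},
    "s2": {"gene_a": 4.0, "gene_b": 2.0, "gene_c": 3.0},
    "s3": {"gene_a": 1.0, "gene_b": 8.0, "gene_c": 0.5},
    "s4": {"gene_a": 6.0, "gene_b": 0.0, "gene_c": 2.0},
}

scores, cutoff, groups = stratify(samples)
for sid in samples:
    print(sid, scores[sid], groups[sid])
```

In practice the coefficients would come from a fitted multivariate Cox model, and the resulting groups would be compared with a survival test on an independent cohort.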

Results: Differential expression analysis identified five glycolysis-related pathway gene sets (17 glycolytic genes) that were highly expressed in MPM tumor tissues. In addition, 11 genes associated with MPM prognosis were identified in TCGA-MPM patients, and 6 of them (COL5A1, ALDH2, KIF20A, ADH1B, SDC1, VCAN) were retained by multivariate Cox analysis to construct a prognostic risk model for MPM patients, with an area under the ROC curve (AUC) of 0.830. Dataset GSE67487 further confirmed the validity of the risk model, showing a significant difference in overall survival (OS) between the low-risk and high-risk groups (P < 0.05). Finally, machine learning screening identified five prognostic genes with the highest risk for MPM; in order of importance, these were ALDH2, KIF20A, COL5A1, ADH1B and SDC1.
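For readers unfamiliar with the AUC metric, the following minimal sketch shows one common way to compute it for a binary outcome: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. This is a generic illustration on made-up numbers. It assumes a simple binary label (for example, event versus no event) and is not necessarily the ROC procedure the authors used, which for survival data may be time-dependent.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores (higher means higher predicted risk).
    labels: 1 for samples with the event, 0 for samples without it.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up scores and labels, for illustration only.
example_scores = [0.9, 0.8, 0.3, 0.2]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
print(example_auc)
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of the two groups.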

Conclusions: A risk model based on six glycolytic genes (ALDH2, KIF20A, COL5A1, ADH1B, SDC1, VCAN) can effectively predict the prognosis of MPM patients.