Ensemble methods for classification of patients for personalized medicine with high-dimensional data

Artificial Intelligence in Medicine. 2007 Nov;41(3):197-207. Epub 2007 Aug 23.

Moon H, Ahn H, Kodell RL, Baek S, Lin CJ, Chen JJ.

Department of Mathematics and Statistics, California State University-Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90840, USA. hmoon623@yahoo.com

Abstract

Objective: Personalized medicine is defined by the use of genomic signatures of patients in a target population to assign more effective therapies, to improve diagnosis, and to enable earlier interventions that might prevent or delay disease. The objective of this study is to develop a novel classification algorithm that predicts response to therapy, in order to help individualize the clinical assignment of treatment.

Methods and materials: Classification algorithms must be highly accurate if each patient is to receive optimal treatment. Typically, there are numerous genomic and clinical variables but a relatively small number of patients, which makes it difficult for most traditional classification algorithms to avoid over-fitting the data. We developed a robust classification algorithm for high-dimensional data based on ensembles of classifiers built from the optimal number of random partitions of the feature space. The software is available on request from the authors.
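The abstract does not specify the base classifier or the tuning procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' software. It assumes that "number of partitions" means the number k of disjoint feature subsets each random partition produces, uses a nearest-centroid rule as a hypothetical stand-in base classifier, combines members by majority voting, and picks k by cross-validated accuracy (both named in the keywords). All function names are hypothetical.

```python
import random
from collections import Counter


def random_partition(n_features, k, rng):
    """Split feature indices 0..n_features-1 into k disjoint, roughly equal random subsets."""
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y, features):
    """Hypothetical base learner: per-class mean vector on one feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, features, row):
    """Assign the class whose centroid is nearest (squared Euclidean) on the subset."""
    def sqdist(c):
        return sum((row[f] - c[j]) ** 2 for j, f in enumerate(features))
    return min(model, key=lambda label: sqdist(model[label]))


def fit_ensemble(X, y, k, n_partitions=1, seed=0):
    """Train one base classifier per subset, over n_partitions independent random partitions."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_partitions):
        for features in random_partition(len(X[0]), k, rng):
            members.append((features, fit_centroids(X, y, features)))
    return members


def predict_ensemble(members, row):
    """Majority vote over all base classifiers; ties go to the label seen first."""
    votes = Counter(predict_centroids(model, feats, row) for feats, model in members)
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=5, seed=0):
    """Cross-validated accuracy of an ensemble whose partitions have k subsets."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(folds):
        test = set(order[f::folds])
        X_train = [X[i] for i in order if i not in test]
        y_train = [y[i] for i in order if i not in test]
        members = fit_ensemble(X_train, y_train, k, seed=seed)
        correct += sum(predict_ensemble(members, X[i]) == y[i] for i in test)
    return correct / len(X)


def choose_k(X, y, candidates, folds=5):
    """Pick the k with the best CV accuracy; ties go to the smaller k."""
    return max(candidates, key=lambda k: (cv_accuracy(X, y, k, folds), -k))


if __name__ == "__main__":
    # Synthetic two-class data: 40 samples, 20 features shifted by the class label.
    rng = random.Random(1)
    y = [i % 2 for i in range(40)]
    X = [[label + rng.gauss(0, 0.3) for _ in range(20)] for label in y]
    best_k = choose_k(X, y, candidates=[2, 4, 5])
    members = fit_ensemble(X, y, best_k, n_partitions=3)
    print(best_k, predict_ensemble(members, [1.0] * 20))
```

Partitioning the features keeps each base classifier low-dimensional relative to the sample size, which is one way such an ensemble can limit over-fitting; the base learner, the candidate values of k, and the fold count here are illustrative choices only.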

Results: The proposed algorithm is applied to genomic data sets from lymphoma and lung cancer patients, to distinguish disease subtypes for optimal treatment, and to genomic data from breast cancer patients, to identify those most likely to benefit from adjuvant chemotherapy after surgery. The performance of the proposed algorithm consistently ranks highly relative to the other classification algorithms considered.

Conclusion: The statistical classification method for individualized treatment of diseases developed in this study is expected to play a critical role in developing safer and more effective therapies that replace one-size-fits-all drugs with treatments that focus on specific patient needs.

Keywords: Class prediction, Cross-validation, Ensembles, Majority voting, Risk profiling