A cluster-genetic programming approach for detecting pulmonary tuberculosis
Tuberculosis (TB) remains a global health concern. It commonly spreads through the air and primarily affects people with weakened immune systems. TB is among the most prevalent health problems in low- and middle-income countries. Genetic programming (GP) is a machine learning technique for discovering useful relationships among the variables in complex clinical data. It is particularly appropriate when the form of the solution model is unknown a priori. The main objective of this study is to develop a GP-based model that detects positive cases among patients suspected of having TB, using a real dataset of TB suspects and hospitalized patients. First, the dataset is pre-processed and the target variable is identified using cluster analysis. This data-driven cluster analysis identifies two distinct clusters of patients, representing TB positive and TB negative. Then, GP is trained on the training dataset to construct a prediction model and tested on a separate, previously unseen dataset. Over 30 runs, the median performance of GP on the test data was good (sensitivity = 0.78, specificity = 0.95, accuracy = 0.89, AUC = 0.91). We find that GP performs better in predicting TB than the other machine learning models considered. The study demonstrates that the GP model could support clinicians in screening patients for TB.
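The sketch below shows one way the reported summary statistics (median sensitivity, specificity, and accuracy over repeated runs) can be computed from binary predictions. It is an illustration, not the study's code: the function names, the label encoding (1 = TB positive, 0 = TB negative), and the toy data are all assumptions. AUC is omitted because it requires continuous model scores rather than hard labels.

```python
from statistics import median


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = TB positive, 0 = TB negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for one run.

    Assumes the test set contains at least one positive and one negative case.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of TB-positive cases detected
        "specificity": tn / (tn + fp),  # share of TB-negative cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def median_over_runs(per_run_metrics):
    """Median of each metric across independent runs."""
    keys = per_run_metrics[0].keys()
    return {k: median(m[k] for m in per_run_metrics) for k in keys}


# Hypothetical toy data: one shared test set and the predictions from three runs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
runs_pred = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
]

summary = median_over_runs([screening_metrics(y_true, p) for p in runs_pred])
print(summary)
```

Reporting the median rather than the mean keeps the summary robust to an occasional poor run, which matters for stochastic methods such as GP, where individual runs can vary considerably.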