Comparison of machine learning methods for the prediction of type 2 diabetes in primary care setting using EHR data

Amos Otieno Olwendo; George Ochieng; Kenneth Rucha

doi:10.4314/jagst.v23i1.3


Published: Oct 30, 2023


Keywords: Comparison, machine learning, classification, clustering, type 2 diabetes

Issue: Vol. 23 No. 1 (2024)

Section: Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

Open access articles published in the Journal of Agriculture, Science and Technology are under the terms of the Creative Commons Attribution (CC BY) License which permits use, distribution and reproduction in any medium, provided the original work is properly cited. The CC BY license permits commercial and non-commercial re-use of an open-access article, as long as the author is properly attributed.

Copyright on any research article published in the Journal of Agriculture, Science and Technology is retained by the author(s). The authors grant the Journal of Agriculture, Science and Technology a license to publish the article and to identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.

Use of the article in whole or in part in any medium requires proper citation as follows:

Title of Article, Names of the Author, Year of Publication, Journal Title, Volume (Issue) and page. Links to the final article on the JSRE website are encouraged.

The Creative Commons Attribution License does not affect any other rights held by authors or third parties in the article, including without limitation the rights of privacy and publicity. Use of the article must not assert or imply, whether implicitly or explicitly, any connection with, endorsement or sponsorship of such use by the author, publisher or any other party associated with the article.

For any reuse or distribution, users must include the copyright notice and make clear to others that the article is made available under a Creative Commons Attribution license, linking to the relevant Creative Commons web page. Users may impose no restrictions on the use of the article other than those imposed by the Creative Commons Attribution license.

To the fullest extent permitted by applicable law, the article is made available as is and without representation or warranties of any kind whether express, implied, statutory or otherwise and including, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of defects, accuracy, or the presence or absence of errors.

ORCID (Amos Otieno Olwendo): https://orcid.org/0000-0002-8537-791X


Abstract

Diabetes remains a major global public health challenge, creating a need for better methods of managing the disease. Machine learning could provide reliable support for the early detection and management of diabetes. This study compared selected machine learning approaches to determine their suitability for the early detection of diabetes in primary care. A retrospective study was conducted using an electronic health record (EHR) dataset of confirmed diabetes cases collected during routine care at Nairobi Hospital. Institutional ethical approvals were obtained, and data were retrieved from the database through sampling stratified by gender. Diagnoses were confirmed using ICD-10 codes. Records with approximately 5% missing values were excluded from the analysis. Data were then cleaned by correcting errors and replacing missing values with measures of central tendency, and normalized using decimal scaling. The data were analyzed with selected supervised and unsupervised learning algorithms, and model performance was validated using classification and clustering evaluation metrics, respectively. Random Forest achieved the highest accuracy (0.95) and the lowest error rate (0.05), while Gradient Boosting and a multilayer perceptron (MLP) with three hidden layers each achieved an accuracy of 0.94 and an error rate of 0.06. The selection of machine learning algorithms should explore both supervised and unsupervised learning techniques. In addition, an appropriately designed MLP architecture could deliver strong results for classification tasks in primary care settings.
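The abstract names three preprocessing steps (sampling stratified by gender, imputation with a measure of central tendency, and decimal-scaling normalization) and reports error rate as the complement of accuracy (0.95 and 0.05, 0.94 and 0.06). The plain-Python sketch below illustrates those operations under stated assumptions; it is not the authors' pipeline. The function names (`stratified_sample`, `impute_central`, `decimal_scale`, `error_rate`), the encoding of missing values as `None`, the sampling fraction, and the toy glucose readings are all hypothetical, and the paper does not say which central-tendency measure was applied to which variable.

```python
import random
from collections import defaultdict
from statistics import mean, median, mode


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction of records from every stratum defined by `key`
    (e.g. gender). `fraction` and `seed` are illustrative assumptions."""
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    rng = random.Random(seed)
    sample = []
    for group in strata.values():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample


def impute_central(values, strategy="median"):
    """Replace missing entries (assumed to be None) with the mean, median,
    or mode of the observed values in the same column."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def decimal_scale(values):
    """Decimal scaling: v' = v / 10**j, where j is the smallest non-negative
    integer such that max(|v'|) < 1. Returns the scaled values and j."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10**j >= 1:
        j += 1
    return [v / 10**j for v in values], j


def error_rate(accuracy):
    """Error rate of a classifier is the complement of its accuracy."""
    return 1.0 - accuracy


if __name__ == "__main__":
    # Hypothetical glucose readings with one missing value (toy data only).
    glucose = [98, 126, None, 210, 145]
    filled = impute_central(glucose, "median")
    print(filled)  # [98, 126, 135.5, 210, 145]
    scaled, j = decimal_scale(filled)
    print(j, scaled)  # 3 [0.098, 0.126, 0.1355, 0.21, 0.145]
    print(round(error_rate(0.95), 2))  # 0.05
```

Decimal scaling only shifts the decimal point, so each value keeps its sign and its ratio to every other value in the column, and the result falls in (-1, 1). Unlike min-max scaling, the transform does not depend on the column minimum.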

Journal of Agriculture, Science and Technology