Comparative Analysis of Diabetes Diagnosis with Machine Learning Methods

dc.contributor.authorAktaş, Tuğba
dc.contributor.authorTemel, İsmail Mert
dc.contributor.authorSaygılı, Ahmet
dc.date.accessioned2025-04-06T12:12:30Z
dc.date.available2025-04-06T12:12:30Z
dc.date.issued2024
dc.departmentTekirdağ Namık Kemal Üniversitesi
dc.description.abstractDiabetes is a disease that occurs when the body cannot regulate the level of sugar (glucose) in the blood. Early diagnosis is important for preventing more serious diseases that may arise later. Within the scope of this study, an attempt was made to optimize the diabetes data set and to train different models on it. Initially, the Logistic Regression, KNN, SVM (Support Vector Machine), CART (Classification and Regression Trees), RF (Random Forest), AdaBoost, GBM (Gradient Boosting Machines), XGBoost (Extreme Gradient Boosting), LGBM (Light Gradient Boosting Machine), and CatBoost models were used. Judged by accuracy and F1 values, the best models were RF, LGBM, and XGBoost, in that order. The Random Forest model, which produced the most successful results, achieved Accuracy: 0.88, F1 Score: 0.84, and ROC AUC: 0.95.
dc.identifier.doi10.47897/bilmes.1447878
dc.identifier.doihttps://doi.org/10.47897/bilmes.1447878
dc.identifier.endpage32
dc.identifier.issn2618-5938
dc.identifier.issue1
dc.identifier.startpage22
dc.identifier.urihttps://hdl.handle.net/20.500.11776/16052
dc.identifier.volume8
dc.language.isoen
dc.publisherUmut SARAY
dc.relation.ispartofInternational Scientific and Vocational Studies Journal
dc.relation.publicationcategoryArticle - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_DergiPark_20250406
dc.subjectDiabetes
dc.subjectMachine learning
dc.subjectRandom forest
dc.titleComparative Analysis of Diabetes Diagnosis with Machine Learning Methods
dc.typeResearch Article
