Comparative Analysis of Diabetes Diagnosis with Machine Learning Methods
dc.contributor.author | Aktaş, Tuğba | |
dc.contributor.author | Temel, İsmail Mert | |
dc.contributor.author | Saygılı, Ahmet | |
dc.date.accessioned | 2025-04-06T12:12:30Z | |
dc.date.available | 2025-04-06T12:12:30Z | |
dc.date.issued | 2024 | |
dc.department | Tekirdağ Namık Kemal Üniversitesi | |
dc.description.abstract | Diabetes is a disease that occurs when the body cannot regulate the level of sugar (glucose) in the blood. Early diagnosis is important for preventing more serious diseases that may arise later. Within the scope of this study, different models were trained on a diabetes data set in an attempt to optimize its use. Initially, Logistic Regression, KNN, SVM (Support Vector Machine), CART (Classification and Regression Trees), RF (Random Forest), AdaBoost, GBM (Gradient Boosting Machines), XGBoost (Extreme Gradient Boosting), LGBM (Light Gradient Boosting Machine), and CatBoost models were used. Based on accuracy and F1 values, the best models were RF, LGBM, and XGBoost, in that order. The Random Forest model, which produced the most successful results, achieved an accuracy of 0.88, an F1 score of 0.84, and a ROC AUC of 0.95. | |
dc.identifier.doi | 10.47897/bilmes.1447878 | |
dc.identifier.doi | https://doi.org/10.47897/bilmes.1447878 | |
dc.identifier.endpage | 32 | |
dc.identifier.issn | 2618-5938 | |
dc.identifier.issue | 1 | |
dc.identifier.startpage | 22 | |
dc.identifier.uri | https://hdl.handle.net/20.500.11776/16052 | |
dc.identifier.volume | 8 | |
dc.language.iso | en | |
dc.publisher | Umut SARAY | |
dc.relation.ispartof | International Scientific and Vocational Studies Journal | |
dc.relation.publicationcategory | Article - National Refereed Journal - Institutional Academic Staff | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.snmz | KA_DergiPark_20250406 | |
dc.subject | Diabetes | |
dc.subject | Machine learning | |
dc.subject | Random forest | |
dc.title | Comparative Analysis of Diabetes Diagnosis with Machine Learning Methods | |
dc.type | Research Article |