Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones

dc.authorid: SAHIN, MEHMET FATIH/0000-0002-0926-3005
dc.contributor.author: Sahin, Mehmet Fatih
dc.contributor.author: Topkac, Erdem Can
dc.contributor.author: Dogan, Cagri
dc.contributor.author: Seramet, Serkan
dc.contributor.author: Ozcan, Ridvan
dc.contributor.author: Akgul, Murat
dc.contributor.author: Yazici, Cenk Murat
dc.date.accessioned: 2024-10-29T17:58:37Z
dc.date.available: 2024-10-29T17:58:37Z
dc.date.issued: 2024
dc.department: Tekirdağ Namık Kemal Üniversitesi
dc.description.abstract: Objective: To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS).
Materials and Methods: Google Trends was used to identify pertinent search terms related to KS. Each AI chatbot was provided with the sequence of the 25 most commonly searched phrases as input. The responses were assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria.
Results: The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most KS-related searches. None of the AI chatbots attained the requisite level of comprehensibility. Grok demonstrated the highest FKRE (55.6 +/- 7.1) and lowest FKGL (10.0 +/- 1.1) scores (p = 0.001), whereas Claude outperformed the other chatbots on DISCERN (47.6 +/- 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 +/- 2.0), and actionability was highest for Claude (61.8 +/- 3.5) (p = 0.001).
Conclusion: GPT-4 had the most complex language structure of the five chatbots, making its answers the most difficult to read and comprehend, whereas Grok's were the simplest. Claude produced the highest-quality KS text. Chatbot technology can improve healthcare material and make it easier to grasp.
dc.identifier.doi: 10.1089/end.2024.0474
dc.identifier.issn: 0892-7790
dc.identifier.issn: 1557-900X
dc.identifier.pmid: 39212674
dc.identifier.scopus: 2-s2.0-85203523003
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1089/end.2024.0474
dc.identifier.uri: https://hdl.handle.net/20.500.11776/14422
dc.identifier.wos: WOS:001306276500001
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.language.iso: en
dc.publisher: Mary Ann Liebert, Inc
dc.relation.ispartof: Journal of Endourology
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: artificial intelligence
dc.subject: Claude
dc.subject: GPT-4
dc.subject: Google PaLM
dc.subject: Grok
dc.subject: Mistral
dc.subject: kidney stone
dc.title: Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones
dc.type: Article
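The readability metrics named in the abstract (FKRE and FKGL) are computed from word, sentence, and syllable counts. A minimal sketch of the standard Flesch-Kincaid formulas is shown below; the function names and example counts are illustrative, not taken from the study itself:

```python
def fkre(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Reading Ease: higher scores (0-100) mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical passage: 100 words, 5 sentences, 130 syllables
ease = fkre(100, 5, 130)   # ~76.6, "fairly easy" on the FKRE scale
grade = fkgl(100, 5, 130)  # ~7.6, roughly an 8th-grade reading level
```

By these formulas, Grok's FKGL of about 10 in the study corresponds to a 10th-grade reading level, still above the roughly 6th-grade level commonly recommended for patient education materials.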
