Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones

dc.authorid: SAHIN, MEHMET FATIH/0000-0002-0926-3005
dc.contributor.author: Sahin, Mehmet Fatih
dc.contributor.author: Topkac, Erdem Can
dc.contributor.author: Dogan, Cagri
dc.contributor.author: Seramet, Serkan
dc.contributor.author: Ozcan, Ridvan
dc.contributor.author: Akgul, Murat
dc.contributor.author: Yazici, Cenk Murat
dc.date.accessioned: 2024-10-29T17:58:37Z
dc.date.available: 2024-10-29T17:58:37Z
dc.date.issued: 2024
dc.department: Tekirdağ Namık Kemal Üniversitesi
dc.description.abstract: Objective: To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS).
Materials and Methods: Google Trends was used to identify pertinent search terms related to KS. Each AI chatbot was provided with the sequence of the 25 most commonly searched phrases as input. The responses were assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria.
Results: The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most KS-related searches. None of the AI chatbots attained the requisite level of comprehensibility. Grok demonstrated the highest FKRE (55.6 +/- 7.1) and lowest FKGL (10.0 +/- 1.1) scores (p = 0.001), whereas Claude outperformed the other chatbots on DISCERN (47.6 +/- 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 +/- 2.0), and actionability was highest for Claude (61.8 +/- 3.5) (p = 0.001).
Conclusion: GPT-4 had the most complex language structure of the five chatbots, making its answers the most difficult to read and comprehend, whereas Grok's were the simplest. Claude produced the highest-quality KS text. Chatbot technology can improve healthcare material and make it easier to grasp.
dc.identifier.doi: 10.1089/end.2024.0474
dc.identifier.issn: 0892-7790
dc.identifier.issn: 1557-900X
dc.identifier.pmid: 39212674
dc.identifier.scopus: 2-s2.0-85203523003
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1089/end.2024.0474
dc.identifier.uri: https://hdl.handle.net/20.500.11776/14422
dc.identifier.wos: WOS:001306276500001
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: PubMed
dc.language.iso: en
dc.publisher: Mary Ann Liebert, Inc
dc.relation.ispartof: Journal of Endourology
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: artificial intelligence
dc.subject: Claude
dc.subject: GPT-4
dc.subject: Google PaLM
dc.subject: Grok
dc.subject: Mistral
dc.subject: kidney stone
dc.title: Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones
dc.type: Article
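The readability metrics named in the abstract (FKRE and FKGL) are computed from word, sentence, and syllable counts. A minimal sketch of the standard Flesch-Kincaid formulas is shown below; the function names and example counts are illustrative, not taken from the study itself:

```python
def fkre(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Reading Ease: higher scores (0-100) mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical passage: 100 words, 5 sentences, 130 syllables
ease = fkre(100, 5, 130)   # ~76.6, "fairly easy" on the FKRE scale
grade = fkgl(100, 5, 130)  # ~7.6, roughly an 8th-grade reading level
```

By these formulas, Grok's FKGL of about 10 in the study corresponds to a 10th-grade reading level, still above the roughly 6th-grade level commonly recommended for patient education materials.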
