Which current chatbot is more competent in urological theoretical knowledge? A comparative analysis by the European board of urology in-service assessment

dc.authoridSAHIN, MEHMET FATIH/0000-0002-0926-3005
dc.contributor.authorSahin, Mehmet Fatih
dc.contributor.authorDogan, Cagri
dc.contributor.authorTopkac, Erdem Can
dc.contributor.authorSeramet, Serkan
dc.contributor.authorTuncer, Furkan Batuhan
dc.contributor.authorYazici, Cenk Murat
dc.date.accessioned2025-04-06T12:23:56Z
dc.date.available2025-04-06T12:23:56Z
dc.date.issued2025
dc.departmentTekirdağ Namık Kemal Üniversitesi
dc.description.abstractIntroduction: The European Board of Urology (EBU) In-Service Assessment (ISA) evaluates urologists' theoretical knowledge and interpretation skills. Artificial intelligence (AI) chatbots are widely used by physicians as a source of theoretical information. This study compares the test performance of five current chatbots on knowledge and interpretation questions. Materials and methods: GPT-4o, Copilot Pro, Gemini Advanced, Claude 3.5, and Sonar Huge answered 596 questions from six exams administered between 2017 and 2022. The questions were divided into two categories: those measuring knowledge and those requiring data interpretation. The chatbots' exam performances were compared. Results: All chatbots except Claude 3.5 passed the examinations with overall scores above the 60% threshold. Copilot Pro scored highest and Claude 3.5 lowest, and the difference was significant (71.6% vs. 56.2%, p = 0.001). Across the 444 knowledge and 152 analysis questions, Copilot Pro answered the highest percentage correctly and Claude 3.5 the lowest, both for knowledge (72.1% vs. 57.4%, p = 0.001) and for analysis (70.4% vs. 52.6%, p = 0.019). Conclusions: Four of the five chatbots passed the EBU examinations with scores exceeding 60%; only one did not. Copilot Pro performed best on the EBU ISA examinations, whereas Claude 3.5 performed worst. All chatbots scored worse on analysis questions than on knowledge questions. Thus, although current chatbots are successful in terms of theoretical knowledge, their competence in analyzing questions remains questionable.
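The abstract reports pairwise comparisons of correct-answer percentages between the best and worst performers. As a minimal sketch only, the snippet below shows how such a two-proportion comparison could be run with a chi-square test on 2x2 contingency tables; the paper does not state here which test or corrections were used, and the correct-answer counts are back-calculated from the reported percentages, so the printed p-values are illustrative and need not match those reported.

```python
# Sketch (not the authors' code): chi-square comparison of correct-answer
# proportions for the best (Copilot Pro) and worst (Claude 3.5) performers.
# Counts are approximations derived from the percentages in the abstract.
from scipy.stats import chi2_contingency

TOTALS = {"overall": 596, "knowledge": 444, "analysis": 152}

# Approximate correct-answer counts (Copilot Pro, Claude 3.5) reconstructed
# from 71.6%/56.2%, 72.1%/57.4%, and 70.4%/52.6% of the question totals.
CORRECT = {
    "overall":   (427, 335),
    "knowledge": (320, 255),
    "analysis":  (107, 80),
}

for category, total in TOTALS.items():
    copilot_correct, claude_correct = CORRECT[category]
    table = [
        [copilot_correct, total - copilot_correct],  # Copilot Pro: correct / incorrect
        [claude_correct,  total - claude_correct],   # Claude 3.5: correct / incorrect
    ]
    chi2, p_value, _, _ = chi2_contingency(table)
    print(f"{category:9s}  Copilot {copilot_correct}/{total} vs "
          f"Claude {claude_correct}/{total}  p = {p_value:.4f}")
```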
dc.description.sponsorshipTekirdağ Namık Kemal University; Executive Committee of the European Board of Urology
dc.description.sponsorshipWe thank the Executive Committee of the European Board of Urology (EBU) for allowing us to use the in-service assessment questions from 2017 to 2022 for our research.
dc.identifier.doi10.1007/s00345-025-05499-3
dc.identifier.issn0724-4983
dc.identifier.issn1433-8726
dc.identifier.issue1
dc.identifier.pmid39932577
dc.identifier.scopus2-s2.0-85218436313
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1007/s00345-025-05499-3
dc.identifier.urihttps://hdl.handle.net/20.500.11776/17270
dc.identifier.volume43
dc.identifier.wosWOS:001419987100001
dc.identifier.wosqualityQ2
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofWorld Journal of Urology
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20250406
dc.subjectEuropean board of urology
dc.subjectIn-service assessment
dc.subjectChatbot
dc.subjectGPT
dc.subjectGemini
dc.subjectCopilot
dc.subjectSonar huge
dc.subjectClaude
dc.titleWhich current chatbot is more competent in urological theoretical knowledge? A comparative analysis by the European board of urology in-service assessment
dc.typeArticle
