Which current chatbot is more competent in urological theoretical knowledge? A comparative analysis by the European board of urology in-service assessment

dc.authoridSAHIN, MEHMET FATIH/0000-0002-0926-3005
dc.contributor.authorSahin, Mehmet Fatih
dc.contributor.authorDogan, Cagri
dc.contributor.authorTopkac, Erdem Can
dc.contributor.authorSeramet, Serkan
dc.contributor.authorTuncer, Furkan Batuhan
dc.contributor.authorYazici, Cenk Murat
dc.date.accessioned2025-04-06T12:23:56Z
dc.date.available2025-04-06T12:23:56Z
dc.date.issued2025
dc.departmentTekirdağ Namık Kemal Üniversitesi
dc.description.abstractIntroduction: The European Board of Urology (EBU) In-Service Assessment (ISA) evaluates urologists' theoretical knowledge and interpretation skills. Artificial intelligence (AI) chatbots are widely used by physicians as a source of theoretical information. This study compares the test performance of five current chatbots on knowledge and interpretation questions. Materials and methods: GPT-4o, Copilot Pro, Gemini Advanced, Claude 3.5, and Sonar Huge answered 596 questions from six exams administered between 2017 and 2022. The questions were divided into two categories: those measuring knowledge and those requiring data interpretation. The chatbots' exam performances were compared. Results: All chatbots except Claude 3.5 passed the examinations with overall scores above the 60% threshold. Copilot Pro scored highest and Claude 3.5 lowest, and the difference was significant (71.6% vs. 56.2%, p = 0.001). Across the 444 knowledge and 152 analysis questions, Copilot Pro answered the highest percentage correctly and Claude 3.5 the lowest, both for knowledge (72.1% vs. 57.4%, p = 0.001) and for analysis (70.4% vs. 52.6%, p = 0.019). Conclusions: Four of the five chatbots passed the EBU examinations with scores exceeding 60%; only one did not. Copilot Pro performed best on the EBU ISA examinations, whereas Claude 3.5 performed worst. All chatbots scored worse on analysis questions than on knowledge questions. Thus, although current chatbots are successful in terms of theoretical knowledge, their competence in analyzing questions remains questionable.
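The abstract reports pairwise comparisons of correct-answer percentages between the best and worst performers. As a minimal sketch only, the snippet below shows how such a two-proportion comparison could be run with a chi-square test on 2x2 contingency tables; the paper does not state here which test or corrections were used, and the correct-answer counts are back-calculated from the reported percentages, so the printed p-values are illustrative and need not match those reported.

```python
# Sketch (not the authors' code): chi-square comparison of correct-answer
# proportions for the best (Copilot Pro) and worst (Claude 3.5) performers.
# Counts are approximations derived from the percentages in the abstract.
from scipy.stats import chi2_contingency

TOTALS = {"overall": 596, "knowledge": 444, "analysis": 152}

# Approximate correct-answer counts (Copilot Pro, Claude 3.5) reconstructed
# from 71.6%/56.2%, 72.1%/57.4%, and 70.4%/52.6% of the question totals.
CORRECT = {
    "overall":   (427, 335),
    "knowledge": (320, 255),
    "analysis":  (107, 80),
}

for category, total in TOTALS.items():
    copilot_correct, claude_correct = CORRECT[category]
    table = [
        [copilot_correct, total - copilot_correct],  # Copilot Pro: correct / incorrect
        [claude_correct,  total - claude_correct],   # Claude 3.5: correct / incorrect
    ]
    chi2, p_value, _, _ = chi2_contingency(table)
    print(f"{category:9s}  Copilot {copilot_correct}/{total} vs "
          f"Claude {claude_correct}/{total}  p = {p_value:.4f}")
```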
dc.description.sponsorshipTekirdağ Namık Kemal University; Executive Committee of the European Board of Urology
dc.description.sponsorshipWe thank the Executive Committee of the European Board of Urology (EBU) for allowing us to use the in-service assessment questions from 2017 to 2022 for our research.
dc.identifier.doi10.1007/s00345-025-05499-3
dc.identifier.issn0724-4983
dc.identifier.issn1433-8726
dc.identifier.issue1
dc.identifier.pmid39932577
dc.identifier.scopus2-s2.0-85218436313
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1007/s00345-025-05499-3
dc.identifier.urihttps://hdl.handle.net/20.500.11776/17270
dc.identifier.volume43
dc.identifier.wosWOS:001419987100001
dc.identifier.wosqualityQ2
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofWorld Journal of Urology
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20250406
dc.subjectEuropean board of urology
dc.subjectIn-service assessment
dc.subjectChatbot
dc.subjectGPT
dc.subjectGemini
dc.subjectCopilot
dc.subjectSonar huge
dc.subjectClaude
dc.titleWhich current chatbot is more competent in urological theoretical knowledge? A comparative analysis by the European board of urology in-service assessment
dc.typeArticle
