Which current chatbot is more competent in urological theoretical knowledge? A comparative analysis by the European board of urology in-service assessment

Küçük Resim Yok

Tarih

2025

Dergi Başlığı

Dergi ISSN

Cilt Başlığı

Yayıncı

Springer

Erişim Hakkı

info:eu-repo/semantics/openAccess

Özet

IntroductionThe European Board of Urology (EBU) In-Service Assessment (ISA) test evaluates urologists' knowledge and interpretation. Artificial Intelligence (AI) chatbots are being used widely by physicians for theoretical information. This research compares five existing chatbots' test performances and questions' knowledge and interpretation.Materials and methodsGPT-4o, Copilot Pro, Gemini Advanced, Claude 3.5, and Sonar Huge chatbots solved 596 questions in 6 exams between 2017 and 2022. The questions were divided into two categories: questions that measure knowledge and require data interpretation. The chatbots' exam performances were compared.ResultsOverall, all chatbots except Claude 3.5 passed the examinations with a percentage of 60% overall score. Copilot Pro scored best, and Claude 3.5's score difference was significant (71.6% vs. 56.2%, p = 0.001). When a total of 444 knowledge and 152 analysis questions were compared, Copilot Pro offered the greatest percentage of information, whereas Claude 3.5 provided the least (72.1% vs. 57.4%, p = 0.001). This was also true for analytical skills (70.4% vs. 52.6%, p = 0.019).ConclusionsFour out of five chatbots passed the exams, achieving scores exceeding 60%, while only one did not pass the EBU examination. Copilot Pro performed best in EBU ISA examinations, whereas Claude 3.5 performed worst. Chatbots scored worse on analysis than knowledge questions. Thus, although existing chatbots are successful in terms of theoretical knowledge, their competence in analyzing the questions is questionable.

Açıklama

Anahtar Kelimeler

European board of urology, In-service assessment, Chatbot, GPT, Gemini, Copilot, Sonar huge, Claude

Kaynak

World Journal of Urology

WoS Q Değeri

Q2

Scopus Q Değeri

Q1

Cilt

43

Sayı

1

Künye