• Can Large Language Models Act as Medical Students in Medical Exams? Strengths and Limitations
  • AmirAli Moodi Ghalibaf,1,* Keivan Lashkari,2
    1. Student Committee of Medical Education Development, Education Development Center, Birjand University of Medical Sciences, Birjand, Iran
    2. Student Committee of Medical Education Development, Education Development Center, Ardabil University of Medical Sciences, Ardabil, Iran


  • Introduction: Clinical reasoning is a logical, reasoned process of gathering information, understanding the patient's problems and condition, planning and implementing interventions, and evaluating those interventions and the resulting feedback in the learning process. Today it is regarded as a necessary competence and part of clinical vision. However, few training programs emphasize this innovative and creative teaching strategy. With the emergence and expansion of artificial intelligence in recent decades, it has become one of the most widely used tools in human life, especially in medicine. Among the artificial intelligence models developed so far, large language models (LLMs) are systems designed to understand, produce, and respond to human language. The present study reviews the potential role and performance of large language models in medical exams; in other words, can large language models act as medical students in medical exams?
  • Methods: To address the aims of the present study, a comprehensive systematic search was conducted through the electronic databases PubMed, Scopus, Embase, and Web of Science using the keywords “Medical Education”, “Medical exam”, “medical students”, “large language models”, and other related MeSH terms up to August 2024. Original studies, review studies, and the references of the review studies were included. Finally, the studies that examined the role of large language models in medical exams were selected and reviewed.
  • Results: According to the reviewed studies, answering this question requires weighing both the capabilities and the limitations of large language models. The capabilities of LLMs include an extensive knowledge base (LLMs are trained on vast datasets that include medical literature, textbooks, clinical guidelines, and case studies, allowing them to generate responses drawing on a wide range of medical knowledge; they can provide information on diseases, treatments, pharmacology, and diagnostic criteria, which are often tested in medical exams), natural language processing (LLMs excel at understanding and generating human language, making them capable of interpreting exam questions and articulating coherent responses; they can simulate the language and terminology used in medical exams, which helps them answer questions in a format that aligns with exam expectations), and pattern recognition (LLMs can identify patterns in clinical scenarios and apply relevant information to answer questions, particularly in multiple-choice formats or straightforward case-based questions; they can also generate differential diagnoses from the symptom descriptions provided in exam questions). On the other hand, the limitations of LLMs include a lack of clinical experience (unlike medical students, LLMs have no hands-on clinical experience or patient interactions; this experiential learning is crucial for understanding the complexities of real-world medicine, and LLMs cannot perform physical examinations or interact with patients, which are integral components of medical training), limited clinical reasoning and judgment (medical exams often require higher-order thinking skills such as clinical reasoning, ethical consideration, and decision-making that go beyond rote knowledge; LLMs may struggle with complex clinical scenarios that require nuanced judgment or prioritization of competing clinical factors), limited contextual awareness (LLMs may not fully grasp the context surrounding a specific clinical situation, leading to responses that are inappropriate or incomplete; they cannot integrate real-time clinical data or adapt their responses to findings beyond their training cutoff), ethical and legal implications (the use of LLMs in medical settings raises significant ethical concerns, including accountability for decisions based on their outputs; there is a risk of misinformation or oversimplification of complex medical scenarios, which could have serious consequences in a clinical context), and static knowledge (LLMs can provide information only from their training data and cannot continuously learn or adapt to new research or clinical guidelines after their last training update). While LLMs can simulate certain aspects of medical knowledge and provide valuable information, they cannot fully replicate the comprehensive skill set required of medical students in exams or clinical practice. They may perform well on certain types of questions, especially those focused on factual recall or straightforward application of medical knowledge, but they lack the critical thinking, ethical reasoning, and experiential insight that human medical students develop through their education.
  • Conclusion: In summary, while LLMs can assist learning and provide supplementary information in a medical context, they should not be viewed as substitutes for actual medical students in exams or clinical settings. Their role is best seen as that of a supportive educational tool rather than a replacement for the nuanced understanding and judgment required in medicine.
  • Keywords: Medical Education, Medical Student, Medical Exam, Large Language Models