Defeating the “Devil’s Drop”: How AI Can Save Medical Empathy

Medical students enter training with high empathy levels. Then something breaks. Research documents a significant drop in empathy during clinical years, precisely when students begin intensive patient contact. This “Devil’s Drop” is not a character flaw. It reflects a training system that fails to provide the repeated practice required to maintain these critical skills under stress [1].

The stakes are clinical, not just philosophical. Studies show that greater physician empathy correlates with improved patient satisfaction and better clinical outcomes, such as hemoglobin A1c control in patients with diabetes [2]. Higher empathy in primary care clinicians has also been associated with reduced all-cause mortality over ten-year follow-up in patients with type 2 diabetes [3]. When communication breaks down, the consequences can be catastrophic: communication problems are among the most frequently identified contributing factors in sentinel events and are implicated in the majority of serious medical errors, which in turn drive malpractice risk [4]. When patients feel unheard, they sue. When clinicians lack tools for difficult conversations, they burn out [5].

From Trait to Competency

For most of the 20th century, medical education treated empathy as innate: you either had bedside manner or you didn’t. Even today, while physicians broadly recognize the importance of communicating serious diagnoses, many report feeling undertrained for these conversations [5]. The innate-trait view changed when accrediting bodies elevated communication to a core competency. The ACGME formally designated interpersonal and communication skills as a core competency in 1999 [6], and the USMLE added Step 2 Clinical Skills in 2004, requiring students to demonstrate empathetic communication with standardized patients [7]. Structured frameworks like the SPIKES protocol emerged, offering a learnable, six-step process for breaking bad news to patients with cancer [8].

Meta-analyses confirmed that these skills respond to training. For example, a systematic review and meta-analysis of communication skills training in oncology found meaningful improvements in observable provider behaviors, with longer or more intensive training yielding greater benefits [9]. Specific programs such as Oncotalk (now foundational to VitalTalk) demonstrated that oncology fellows and other clinicians can acquire multiple bad-news and transition-to-palliative-care skills after intensive workshops [10]. These findings helped shift empathy and communication from presumed traits to teachable, assessable competencies.

The Scalability Problem

Standardized patients (SPs)—actors trained to portray specific cases—became the gold standard, offering realistic practice in controlled environments. The model works but does not scale easily. Market analyses of the healthcare simulation sector highlight high operational costs, which limit how often learners can access high-fidelity human simulation [11]. Students cannot master breaking bad news by attempting it annually; they need repeated attempts in a short period, which the economics of human simulation often prohibit [12].

Beyond cost, human variability creates assessment challenges. Performance can drift over the course of a day or across exam cycles, which complicates efforts to standardize learner assessment, a concern echoed in simulation and communication-skills literature [12]. Recruiting and retaining diverse standardized patients is also difficult. For instance, studies document challenges in building and sustaining African American and Latino SP pools, creating representation gaps that can unintentionally reinforce bias [13]. Performance anxiety adds another limitation. High-stakes encounters with human actors can trigger threat responses that impair the prefrontal functions needed for complex social and emotional processing, a pattern described across stress and cognition research [14]. Fear of judgment makes it harder for learners to experiment, fail, and grow.

The Generative AI Turn

Large language models change the equation fundamentally. Modern AI-powered virtual patients can conduct natural conversations, adapting their responses based on what learners say. Unlike older branching logic systems that required multiple-choice selections, generative AI supports free-text (and increasingly voice) dialogue that more closely mirrors real clinical encounters [14].
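At its core, such a virtual patient is just a persona plus a conversation history fed to a language model on every turn. The sketch below illustrates that loop with a trivial keyword-based stub standing in for the model; the persona text, class names, and stub behavior are all hypothetical, not any vendor's API.

```python
# Minimal sketch of a generative-AI virtual-patient loop.
# In a real system, `model` would call a large language model with the
# persona and the full dialogue history; here a keyword stub lets the
# sketch run offline.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class VirtualPatient:
    persona: str  # system-style instructions describing the patient
    model: Callable[[str, List[Tuple[str, str]]], str]
    history: List[Tuple[str, str]] = field(default_factory=list)

    def respond(self, learner_utterance: str) -> str:
        """Record the learner's turn and generate the patient's free-text reply."""
        self.history.append(("learner", learner_utterance))
        reply = self.model(self.persona, self.history)
        self.history.append(("patient", reply))
        return reply

    def reset(self) -> None:
        """Instantly restart the encounter so the learner can retry a strategy."""
        self.history.clear()

def stub_model(persona: str, history: List[Tuple[str, str]]) -> str:
    """Illustrative stand-in for an LLM: reacts with denial to a disclosure."""
    last = history[-1][1].lower()
    if "cancer" in last:
        return "No... that can't be right. Are you sure?"
    return "Okay. What happens next?"

patient = VirtualPatient(
    persona="58-year-old returning for biopsy results; initially reacts with denial.",
    model=stub_model,
)
print(patient.respond("The biopsy shows that this is cancer."))
```

The pluggable `model` callable is the key design point: the same loop drives a scripted stub in testing and a generative model in production, and `reset()` gives the instant do-over that human simulation cannot economically provide.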

The emerging evidence is striking, though still early. A cross-over randomized trial in undergraduate nursing education reported that generative-AI patient simulations improved perceived clinical competency and AI readiness compared with other immersive formats [15]. In text-based interactions, a 2023 study of responses to real-world patient questions on a social media forum found that clinician evaluators preferred ChatGPT’s answers to physicians’ answers 78.6% of the time and rated the AI responses as more empathetic and of higher quality [16]. A multi-course evaluation involving hundreds of learners across health-care-focused settings reported large relative gains in self-reported confidence and measurable knowledge improvements after repeated practice with AI-powered virtual-human simulations [17]. Collectively, these tools can dramatically reduce the barriers to frequent, on-demand practice that constrain exclusively human-based simulation.

Where AI Excels

Generative AI appears particularly effective for high-stakes, language-dependent encounters. In breaking-bad-news training, virtual patients can be designed to react with denial, anger, or withdrawal when students disclose a cancer diagnosis, allowing learners to practice validation, silence, and empathic reflection repeatedly until they find approaches that de-escalate distress [19]. If a given strategy fails, they can reset instantly, experiment with alternative phrasing, and receive targeted feedback, something that is logistically difficult to achieve at scale with human SPs [12].

For medical error disclosure, AI virtual human platforms can analyze transcripts or real-time speech to flag distancing language such as “a mistake was made” and prompt students to rephrase using accountable, first-person statements, aligning with communication patterns associated with higher patient trust in the disclosure literature. Studies of virtual patients for breaking bad news and related scenarios show that learners value the ability to practice in a low-risk environment and that these tools can reinforce technical and organizational aspects of communication while complementing relational skills training [19]. The psychological safety of working with AI—where learners do not feel socially judged—helps them enter a “challenge” rather than “threat” state, which is more conducive to skill acquisition, especially for sensitive topics such as suicide screening or sexual history taking [20].
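As a rough illustration of the transcript-flagging idea, a naive pattern-based check might look like the sketch below. The phrase list and rephrasing prompts are hypothetical examples, not a validated lexicon; real platforms would combine such rules with NLP or LLM-based analysis.

```python
# Sketch: flag agentless, distancing constructions in an error-disclosure
# transcript and suggest accountable, first-person rephrasings.

import re

# Illustrative patterns only; not a validated instrument.
DISTANCING_PATTERNS = [
    (r"\ba mistake was made\b", "Name the actor: 'I made a mistake.'"),
    (r"\ban error occurred\b", "Take ownership: 'I made an error.'"),
    (r"\bcomplications (arose|happened)\b",
     "Describe what you did and what happened as a result."),
]

def flag_distancing(transcript: str) -> list:
    """Return (matched phrase, rephrasing prompt) pairs found in the transcript."""
    findings = []
    for pattern, prompt in DISTANCING_PATTERNS:
        match = re.search(pattern, transcript, flags=re.IGNORECASE)
        if match:
            findings.append((match.group(0), prompt))
    return findings

feedback = flag_distancing("Unfortunately, a mistake was made during your procedure.")
for phrase, prompt in feedback:
    print(f"Flagged '{phrase}': {prompt}")
```

Even this crude version shows why the feedback can be immediate: detecting a passive, agentless construction is cheap, so the system can interrupt and prompt a rephrase in real time rather than waiting for faculty review.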

The Path Forward

The next five years will likely see broad integration of AI simulation, not as a replacement but as a complement to human encounters. Medical schools and health systems are already experimenting with assigning extensive AI-based practice prior to live OSCEs or bedside interactions, enabling scarce human SP and faculty time to be reserved for higher-level coaching and assessment [20]. Voice-enabled systems are beginning to incorporate prosody and sentiment analysis, and multimodal platforms are under development that can interpret facial expression and body language to give feedback on nonverbal empathy cues [18].

Critical gaps remain. Most current evidence focuses on perceived competence, knowledge, or communication behavior rather than hard patient outcomes, so whether AI-trained skills translate to better morbidity, mortality, or satisfaction at the bedside is still an open question [18]. Algorithmic bias requires vigilant mitigation: virtual patients and feedback algorithms must be designed and audited to avoid encoding and amplifying existing inequities, a concern explicitly highlighted in early scoping reviews of generative AI in simulation and in the AAMC’s principles for responsible AI in medical education [21]. The technology functions best when combined with human facilitation and guided reflection, where faculty help learners integrate what they practice with AI into their professional identity and ethical framework. The goal is not choosing between human and artificial intelligence, but using AI to provide the repetitions that build skill, freeing human faculty to mentor the moral development of healers.

References

  1. Hojat M, Vergare MJ, Maxwell K, et al. The devil is in the third year: a longitudinal study of erosion of empathy in medical school. Acad Med. 2009;84(9):1182-1191.
  2. Hojat M, Louis DZ, Markham FW, et al. Physicians’ empathy and clinical outcomes for diabetic patients. Acad Med. 2011;86(3):359-364.
  3. Dambha-Miller H, Feldman AL, Kinmonth AL, Griffin SJ. Association between primary care practitioner empathy and risk of cardiovascular events and all-cause mortality among patients with type 2 diabetes: a population-based prospective cohort study. Ann Fam Med. 2019;17(4):311-318.
  4. The Joint Commission. Sentinel Event Data: Root Causes by Event Type 2004–2015. Oakbrook Terrace, IL: The Joint Commission; 2015.
  5. Monden KR, Gentry L, Cox TR. Delivering bad news to patients. Proc (Bayl Univ Med Cent). 2016;29(1):101-102.
  6. Accreditation Council for Graduate Medical Education (ACGME). ACGME Core Competencies. Chicago, IL: ACGME; 1999.
  7. National Board of Medical Examiners (NBME). USMLE Step 2 Clinical Skills Information Booklet. Philadelphia, PA: NBME; 2004.
  8. Baile WF, Buckman R, Lenzi R, et al. SPIKES—A six-step protocol for delivering bad news: application to the patient with cancer. Oncologist. 2000;5(4):302-311.
  9. Barth J, Lannen P. Efficacy of communication skills training courses in oncology: a systematic review and meta-analysis. Ann Oncol. 2011;22(5):1030-1040.
  10. Back AL, Arnold RM, Baile WF, et al. Efficacy of communication skills training for giving bad news and discussing transitions to palliative care. Arch Intern Med. 2007;167(5):453-460.
  11. Medi-Tech Insights. Healthcare Simulation Market Size, Trends & Forecast to 2030. Brussels, Belgium: Medi-Tech Insights; 2023.
  12. Kron FW, Fetters MD, Scerbo MW, et al. Using a computer simulation for teaching communication skills: a blinded multisite mixed methods randomized controlled trial. Patient Educ Couns. 2017;100(4):748-759.
  13. Everett MR, May W, Nowels CT, Main DS. Recruitment, retention, and training of African American and Latino standardized patients: a collaborative study. Med Sci Educ. 2005;15(2):74-80.
  14. Lee J, Kim H, Kim KH, et al. Effective virtual patient simulators for medical communication training: a systematic review. Med Educ. 2020;54(9):786-795.
  15. Fung TCJ, Chan SL, Lam CF, et al. Effects of generative artificial intelligence (GenAI) patient simulation on perceived clinical competency among global nursing undergraduates: a cross-over randomised controlled trial. BMC Nurs. 2025;24:934.
  16. Ayers JW, Poliak A, Dredze M, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589-596.
  17. Rozenfeld B, Nott I. Multi-course evaluation: AI-powered virtual-human simulations for healthcare communication. ACEhp Almanac. 2025.
  18. Janssen E, McLagan R, Habeck J, et al. Barriers to breakthroughs: a scoping review of generative AI in healthcare simulation. Clin Simul Nurs. 2025;107:101791.
  19. Carrard V, Bourquin C, Orsini S, Schmid Mast M, Berney A. Virtual patient simulation in breaking bad news training for medical students. Patient Educ Couns. 2020;103(7):1435-1438.
  20. Zhang B, Liu X, Wang Y, et al. Human or LLM as standardized patients? A comparative study for medical education. arXiv [preprint]. 2025:2511.14783.
  21. Association of American Medical Colleges (AAMC). Principles for the Responsible Use of AI in Medical Education. Washington, DC: AAMC; 2025.

Published: January 13, 2026