Calibration of AI Algorithm Scores Can Help Identify High-Risk Hypertrophic Cardiomyopathy Patients

Mount Sinai researchers studying hypertrophic cardiomyopathy have calibrated an artificial intelligence algorithm to more specifically identify patients with the condition and flag them as high risk for greater attention. And in a further study, the team developed an AI model to make individualized treatment recommendations for atrial fibrillation patients.

Mount Sinai researchers studying hypertrophic cardiomyopathy (HCM) have calibrated an artificial intelligence (AI) algorithm to more specifically identify patients with the condition and flag them as high risk for greater attention. The algorithm, Viz HCM, had previously been cleared by the Food and Drug Administration for the detection of HCM on an electrocardiogram (ECG). The Mount Sinai study, published in April 2025 in NEJM AI, assigns numeric probabilities to the algorithm’s findings.

For example, while the algorithm might previously have said, "flagged as suspected HCM" or "high risk of HCM," the Mount Sinai study allows for interpretations such as, "You have about a 60 percent chance of having HCM," says corresponding author Joshua Lampert, MD, Medical Director of Machine Learning at Mount Sinai Fuster Heart Hospital. As a result, patients who had not previously been diagnosed with HCM may be able to get a better understanding of their individual disease risk, leading to a faster and more individualized evaluation, along with treatment to potentially prevent complications such as sudden cardiac death.

“This is an important step forward in translating novel deep-learning algorithms into clinical practice by providing clinicians and patients with more meaningful information. Clinicians can improve their clinical workflows by ensuring the highest-risk patients are identified at the top of their clinical work list using a sorting tool. Patients can be better counseled by receiving more individualized information through model calibration, which improves interpretability of model classification scores,” says Dr. Lampert, Assistant Professor of Medicine (Cardiology, and Data-Driven and Digital Medicine) at the Icahn School of Medicine at Mount Sinai. “Whether this local model calibration strategy is universally applicable to other settings remains to be demonstrated.”

In the study, which was sponsored by Viz.ai, Mount Sinai researchers ran the Viz HCM algorithm on nearly 71,000 patients who had an electrocardiogram between March 7, 2023, and January 18, 2024. The algorithm flagged 1,522 as being high risk for HCM. Cardiologists manually reviewed medical records and imaging to further classify individuals as HCM positive (HCM+) or HCM negative (HCM−). Logistic regression was used for calibration to transform model outputs to probabilities in the flagged sample. The team compared calibrated probabilities with the estimated probability of disease in the flagged sample and estimated positive predictive value (PPV) at different thresholds of the calibrated score.
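The calibration step described above can be illustrated with a small sketch. This is not the study's code: the data are synthetic, and the function names (make_synthetic_flagged_sample, fit_logistic_calibrator, ppv_at_threshold, rank_worklist) are hypothetical. It assumes each flagged patient has a single raw model score and a cardiologist-adjudicated HCM+/HCM− label, fits a one-variable logistic regression to map scores to probabilities, and then estimates PPV at a few thresholds of the calibrated score.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_flagged_sample(n, seed=0):
    """Synthetic stand-in for a flagged sample: raw scores and 0/1 labels."""
    rng = random.Random(seed)
    scores = [rng.uniform(0.5, 1.0) for _ in range(n)]
    # Assumed relationship between score and disease, for illustration only.
    labels = [1 if rng.random() < sigmoid(8 * (s - 0.85)) else 0 for s in scores]
    return scores, labels


def fit_logistic_calibrator(scores, labels, iterations=25):
    """Fit P(HCM+ | score) = sigmoid(a * score + b) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            w = p * (1.0 - p)
            g_a += (p - y) * s      # gradient of the negative log-likelihood
            g_b += p - y
            h_aa += w * s * s       # entries of the 2x2 Hessian
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break  # degenerate data, e.g. a single distinct score
        a -= (h_bb * g_a - h_ab * g_b) / det
        b -= (h_aa * g_b - h_ab * g_a) / det
    return a, b


def ppv_at_threshold(probs, labels, threshold):
    """Share of true HCM+ among patients at or above the threshold."""
    selected = [y for p, y in zip(probs, labels) if p >= threshold]
    return sum(selected) / len(selected) if selected else None


def rank_worklist(patient_ids, probs):
    """Sort patients from highest to lowest probability; nobody is dropped."""
    return sorted(zip(patient_ids, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    scores, labels = make_synthetic_flagged_sample(500)
    a, b = fit_logistic_calibrator(scores, labels)
    probs = [sigmoid(a * s + b) for s in scores]
    for t in (0.2, 0.4, 0.6):
        print(f"threshold {t:.1f}: PPV = {ppv_at_threshold(probs, labels, t)}")
```

In this sketch the ranking keeps every flagged patient on the list and only changes the order, in keeping with the study's caution that calibrated scores should not be used to filter lower-risk patients out of clinical review.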

The team found that the calibrated model gave an accurate estimate of a patient’s likelihood of having HCM.

Using the model to analyze patients’ ECG results could allow cardiologists to prioritize the highest-risk patients and bring them in sooner for an appointment and treatment, before symptoms begin or worsen. This may help get new patients engaged and into care to prevent adverse outcomes associated with HCM, such as sudden death or symptoms from the thickened heart muscle obstructing blood flow.

“This study provides much-needed granularity to help rethink how we triage, risk-stratify, and counsel patients. In an era of augmented intelligence, we must grow to incorporate novel sophistication in our approach to patient care,” says co-senior author Vivek Reddy, MD, Director of Cardiac Arrhythmia Services for the Mount Sinai Health System, and the Leona M. and Harry B. Helmsley Charitable Trust Professor of Medicine in Cardiac Electrophysiology. “Using hypertrophic cardiomyopathy as an illustrative use case, we show how we can pragmatically operationalize novel tools even in the setting of less common diseases by sorting AI classifications to triage patients.”

“An out-of-the-box FDA-cleared deep-learning HCM algorithm that flags patients for further follow-up may have a low PPV, as those individuals flagged can still be relatively low risk,” the study concluded. “Calibration can be used to estimate the probability of HCM for individuals in the flagged sample, thereby providing important information for clinical decision-making or risk-thresholding for further follow-up. Sorting probability scores from highest to lowest can be helpful for prioritizing follow-up, though it may reduce the negative predictive value. Whether this local calibration strategy is universally applicable to other settings remains to be demonstrated. However, this mechanism should not be used as a filter to remove patients at a lower perceived risk of disease from clinical review.”

“This study reflects pragmatic implementation science at its best, demonstrating how we can responsibly and thoughtfully integrate advanced AI tools into real-world clinical workflows,” says co-senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Chief AI Officer of the Mount Sinai Health System, and Barbara T. Murphy Professor of AI at the Icahn School of Medicine at Mount Sinai. “It’s not just about building a high-performing algorithm—it’s about making sure it supports clinical decision-making in a way that improves patient outcomes and aligns with how care is actually delivered.”

AI Model Accurately Identifies Which Atrial Fibrillation Patients Need Blood Thinners to Prevent Stroke

In a further study, Mount Sinai researchers developed an AI model to make individualized treatment recommendations for atrial fibrillation (AF) patients, helping clinicians decide accurately whether to treat them with anticoagulants to prevent stroke, the standard treatment course in this patient population. This model presents a new approach to how clinical decisions are made for AF patients and could represent a paradigm shift in this area.

In this study, which was a late-breaking science presentation at the European Society of Cardiology in September 2025, the AI model recommended against anticoagulant treatment for up to half of the AF patients who otherwise would have received it based on standard-of-care tools. This could have profound ramifications for global health, the researchers say.

The AI model uses the patient’s electronic health record to make an individualized treatment recommendation. It weighs the risk of having a stroke against the risk of major bleeding (whether this would occur organically or as a result of treatment with the blood thinner). This approach to clinical decision-making is truly individualized compared to current practice, in which clinicians use risk scores and tools that estimate average risk across a studied patient population, not risk for individual patients.
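The idea of weighing stroke risk against bleeding risk can be sketched in a few lines. This is a generic decision-analysis illustration, not the Mount Sinai model, whose internals are not described here. The RiskEstimates fields, the bleed_weight parameter, and all numbers are hypothetical; the sketch assumes some upstream model has already produced per-patient probabilities of stroke and major bleeding with and without anticoagulation.

```python
from dataclasses import dataclass


@dataclass
class RiskEstimates:
    """Hypothetical per-patient probabilities over a fixed time horizon."""
    stroke_untreated: float
    stroke_treated: float
    bleed_untreated: float
    bleed_treated: float


def net_benefit(r, bleed_weight=1.0):
    """Stroke risk avoided minus (weighted) extra bleeding risk incurred.

    bleed_weight is an assumed trade-off: how many strokes one major
    bleed is judged to be worth. It is a clinical judgment, not a fact.
    """
    strokes_avoided = r.stroke_untreated - r.stroke_treated
    bleeds_added = r.bleed_treated - r.bleed_untreated
    return strokes_avoided - bleed_weight * bleeds_added


def favors_anticoagulation(r, bleed_weight=1.0):
    """True when the estimated net benefit of treatment is positive."""
    return net_benefit(r, bleed_weight) > 0


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the study.
    patient_a = RiskEstimates(0.06, 0.02, 0.01, 0.03)
    patient_b = RiskEstimates(0.01, 0.005, 0.02, 0.05)
    print(favors_anticoagulation(patient_a))
    print(favors_anticoagulation(patient_b))
```

Keeping the stroke and bleeding terms separate mirrors the point made in the article that a recommendation can be decomposed into its component risks when counseling a patient.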

Researchers trained the AI model on the electronic health records of 1.8 million patients, spanning 21 million doctor visits, 82 million notes, and 1.2 billion data points. The model then generated a net-benefit recommendation on whether to treat each patient with blood thinners.

To validate the model, researchers tested its performance in 38,642 patients with atrial fibrillation within the Mount Sinai Health System. They also externally validated the model on 12,817 patients from publicly available datasets from Stanford University.

The model generated treatment recommendations that aligned with the goal of reducing both stroke and bleeding. It recommended against anticoagulation for around half of the AF patients who would have received anticoagulants under current treatment guidelines.

“Avoiding stroke is the single most important goal in the management of patients with atrial fibrillation, a heart rhythm disorder that is estimated to affect 1 in 3 adults sometime in their life,” says Dr. Reddy, co-senior author of the study. “If future randomized clinical trials demonstrate that this AI model is even only a fraction as effective in discriminating the high vs low risk patients as observed in our study, the model would have a profound effect on patient care and outcomes.”

“This approach overcomes the need for clinicians to extrapolate population-level statistics to individuals while assessing the net benefit to the individual patient—which is at the core of what we hope to accomplish as clinicians,” says Dr. Lampert, corresponding author of the study. “The model can not only compute initial recommendations but also dynamically update recommendations based on the patient’s entire electronic health record prior to an appointment. Notably, these recommendations can be decomposed into probabilities for stroke and major bleeding, which relieves the clinician of the cognitive burden of weighing between stroke and bleeding risks not tailored to an individual patient, avoids human labor needed for additional data gathering, and provides discrete, relatable risk profiles to help counsel patients.”

Dr. Lampert is a paid consultant for Viz.ai.

Featured

Vivek Reddy, MD

Director of Cardiac Electrophysiology, and the Leona and Harry B. Helmsley Charitable Trust Professor of Cardiac Electrophysiology

Girish N. Nadkarni, MD, MPH

Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Chief AI Officer, Mount Sinai Health System

Joshua Lampert, MD

Assistant Professor of Medicine (Cardiology, and Data-Driven and Digital Medicine)