How Good Is AI? A New Study Shows It Can Outperform Seasoned Ophthalmology Specialists

A large language model (LLM) AI system can match, or in some cases outperform, human ophthalmologists in diagnosis and decision-making about glaucoma and retina disease, a new Mount Sinai study shows.

Ophthalmologists skeptical about the performance of artificial intelligence (AI) in their specialized field may find a new study from New York Eye and Ear Infirmary of Mount Sinai (NYEE) an eye-opener.

Researchers demonstrated that a large language model (LLM) AI system can match, or in some cases outperform, human ophthalmologists in diagnosis and decision-making about glaucoma and retina disease. The study, published in the February issue of JAMA Ophthalmology, compared the knowledge of ophthalmic specialists against the capabilities of the latest-generation LLM system, GPT-4 (Generative Pre-Trained Transformer 4) from OpenAI, designed to scale up deep learning to replicate human-level performance.

“We applied GPT-4 to clinical data it had never seen before, and it proved in our small sample size to be better than glaucoma specialists in its assessments and treatment plans, and at least as good as retina specialists in those same areas,” says Louis Pasquale, MD, senior author of the study and Deputy Chair for Research, Department of Ophthalmology, Icahn School of Medicine at Mount Sinai. “Artificial intelligence is pretty astounding in terms of what it can do. Access to GPT-4 is like having the world’s knowledge at your fingertips.”

Advanced AI tools are seen by many as revolutionizing the field of medicine in the years ahead. Trained on vast amounts of patient data, text, and images, they have already shown an ability to diagnose and provide treatment guidance and confirmation on cases ranging from routine to complex. Those strengths could prove particularly valuable to ophthalmic specialists who typically handle large caseloads. Through its high level of accuracy and the comprehensiveness of its LLM-generated clinical responses, AI has the potential to ease some of that workload, giving ophthalmologists more time to practice evidence-based medicine.

For the human side of their study, the NYEE research team recruited 12 attending specialists and three senior trainees from the Department of Ophthalmology at Mount Sinai. A set of 20 questions commonly asked by patients (10 each on glaucoma and retina) was randomly selected from the American Academy of Ophthalmology’s “Ask an Ophthalmologist,” along with 20 de-identified patient cases culled from Mount Sinai-affiliated eye clinics. Responses to each question and patient case were then elicited from both the GPT-4 system and the human specialists, rated for accuracy and comprehensiveness on a Likert scale (a rating scale commonly used in clinical research), and statistically analyzed.

The results showed that AI matched or outperformed the human specialists in both the accuracy and the completeness of its assessments. More specifically, it demonstrated superior performance on glaucoma questions and case management advice, while the outcome in retina disease was more balanced: AI matched humans in accuracy but exceeded them in completeness.

GPT-4 is currently in use by a coterie of early adopters at NYEE, including Andy S. Huang, MD, a PGY-2 ophthalmology resident and lead author of the study. “Most of the time we use AI to confirm what we already know—the nitty-gritty details of a case—but other times it can give us new insights and point us in directions we hadn’t thought about,” Dr. Huang says. “For me, it’s been transformative in providing access to information and generating assessment plans that, as our study showed, are similar to if not better than those of top-level subspecialty doctors.”

Beyond diagnosis and treatment of ocular disorders, artificial intelligence could potentially play an important role in physician education and research—pathways that are being carefully explored at NYEE. According to Dr. Pasquale, plans are afoot to incorporate AI into lectures and the overall resident training experience, as well as into research projects as a way of familiarizing young investigators with the promising new technology. Another way NYEE plans to set the stage for AI in clinical practice is by working with the Center for Ophthalmic Artificial Intelligence and Human Health, launched last year by NYEE in partnership with Icahn Mount Sinai, to develop breakthrough applications for AI-driven diagnosis and clinical care of eye disease. As one example, the Center will soon embark on a program to further automate, using artificial intelligence, NYEE’s innovative central retinal artery occlusion initiative to reduce the time it takes to diagnose this medical emergency and get patients started on sight-saving treatment.

“I’ve started using GPT-4 along with colleagues who have very busy ophthalmology practices, and we’re excited about the prospect of putting the technology to work in ways that aren’t even clear to us yet,” notes Dr. Pasquale.

Featured

Andy S. Huang, MD, PGY-2 ophthalmology resident

Louis Pasquale, MD, Shelley and Steven Einhorn Distinguished Chair of Ophthalmology