The landscape of the modern doctor’s office has changed remarkably over the last decade, especially with the emergence of artificial intelligence. The days of turning to Google for medical questions are becoming few and far between as large language models such as ChatGPT or the Bing chatbot help create personalized, comprehensive responses to routine medical questions. As a family physician, I ask myself how this tool might be helpful not only in the office but also to my patients after they leave our appointment.
Previously, when my patients arrived at their visits with research from Google, much of the visit consisted of teasing out pearls of reliable information from a slurry of online misinformation. Patients would go home with handouts full of relevant but overly complex information, usually never to be looked at again. Recent studies suggest that artificial intelligence may be a key player in closing the gap between what patients take home and what their doctors think they know.
Artificial intelligence is revolutionary in the sense that, over time, it can create its own connections and form new narratives after constructing a foundation of knowledge from the web. But how accurate are these narratives? Given all of the misinformation present online, if medical questions were posed to a language model such as ChatGPT, could it give reliable information to our patients? Take screening mammography as an example: guidelines recommend that women have screening mammograms through age 74, but recommendations for women over 74 are nuanced and vary among organizations.
In one recent study, a team of six clinicians – experts in general internal medicine, family medicine, geriatric medicine, population health, cancer control, and radiology – evaluated how appropriate ChatGPT’s responses were to questions about screening mammography after age 74. ChatGPT produced an appropriate response 64 percent of the time and an inappropriate response 18 percent of the time; the remaining responses were unreliable or lacked a true consensus.
For more straightforward medical questions, large language models seem to do much better. A study by a team of radiologists found that the Bing chatbot could easily handle questions about imaging studies, with 93 percent of responses entirely correct and the remaining 7 percent mostly correct.
The reliability of the responses may depend on a few different factors:
1) question structure: stacking multiple questions into a single prompt can degrade the answer, so it is best to ask one question at a time
2) platform: some AI platforms have been shown to fabricate information, while others cite the sources from which they drew their information
3) subject matter: areas of medicine without clear-cut answers may not produce reliable or appropriate responses from AI
Another place where AI has been shown to thrive is patient education. Large language models can streamline patient education materials, make them easier to understand, and in some cases translate them into different languages.
Medical jargon often overcomplicates patient instructions and leads to miscommunication between patients and their clinicians. Because large language models can interpret and clarify patient education materials, these barriers to communication can be lowered without prohibitive time and expense.
As a primary care physician, I am always looking for ways to connect my patients with reliable medical information when they want to take a deeper dive before or after our appointments. With some improvements in reliability, I think artificial intelligence could be the glue that connects my patients to more meaningful office visits, where we address their health concerns together.