My Healthcare AI Thoughts as of 5/2026

by Jonathan A. Handler, MD, FACEP, FAMIA

Here are some summaries of my thoughts on various topics related to healthcare AI as of 5/2026, mostly from a talk I recently gave at the Illinois Society of Health and Risk Management. These may include a mix of my opinions and facts. My thoughts and this content are subject to change and they might be wrong or erroneous. YMMV.

  1. Many define AI super-broadly. Under many definitions, many common medical devices might be considered AI (e.g., automated blood pressure devices, pulse oximetry, MRI and CT scanners, EKG machines and cardiac monitors, electronic health record decision support, and more), possibly almost anything with a computer chip.
  2. Many seem to define AI as the AI they’re not used to yet. As soon as they get used to it, they no longer seem to recognize it as AI.
  3. People use AI frequently in their everyday lives, with many of us literally putting our lives in the hands of AI on a regular basis (e.g., automatic transmission).
  4. What are some examples of healthcare AI seeming to get a lot of attention now?
  5. AI may have risks.
    • Some risks may include:
      • Bias (examples here and here).
      • Hallucinations (example)
      • Copyright and IP concerns
    • Regulation seems to have struggled in how to best balance safety vs. speeding adoption (see here and here).
    • There is much work in attempting to identify and mitigate the risks (examples here, here, here, here, here, here, here, and here), and huge investment in AI (billions and billions and billions), so much of what we know or think about AI today may be different in the future.
    • Many mitigate AI risks by using “human-in-the-loop” (HITL) workflows (e.g., a human expert reviews and modifies the AI output as needed before the output is sent to another human). In the absence of robust data to the contrary, this seems appropriate practice for use cases having the potential for non-negligible harm. It is not clear whether that will continue to hold true as AI improves in the future. Over time, we may increasingly demand rigorous data about each AI that tells us whether the human alone, the AI alone, or the human plus the AI creates the best outcomes with the least risk.
  6. AI is getting so good so quickly that many fear AI will replace their jobs.
    • As noted above, some studies have shown that AI can perform better than doctors in reasoning on clinical cases, although some of these studies have been criticized, for example, for presenting highly processed cases or clinical questions that do not actually represent the clinical experience.
    • As noted above, studies have shown that AI may provide responses that are more consistently empathetic than those of human clinicians, although the experience of empathy may be reduced when users find out that the respondent was an AI.
    • Some studies of AI in medical decision-making have found that humans using AI may perform better than humans not using AI, but worse than AI doing the work alone (example).
    • For clinicians using AI to outperform AI alone, I suspect that clinicians will need to learn when (and when not) to trust the AI, and that clinicians will need to be perceived by patients as consistently more empathetic, compassionate, and caring than AI.
  7. Why does it seem that AI has such promise yet we often read about disappointments? I see two common reasons (among many):
    • #1: Overhype and inadequate due diligence, especially related to common statistics
      • Many AI uses involve deciding yes vs. no (e.g., does this patient have sepsis, will this patient get readmitted, does this mammogram show cancer, etc.)
      • For many (probably most) yes/no uses of AI, “No” is the right answer far more often than “Yes” (examples here, here, and here). In these cases, it often doesn’t make sense to give full credit for guessing “No” when the right answer is usually “No”.
        • Why? Imagine…
          • One of my sons wants to earn a few extra bucks. I tell him I need 5 books from the local library. We have a fantastic library holding 250,000 books. I give him the list of books I want. I will probably end up returning the books late (overdue) and incur fees, so I really want him to get the right books. I don’t want to pay overdue fees for books I never wanted in the first place.
          • I tell him I will pay him $1 for every book I wanted that he checks out of the library for me.
          • I also tell him that I will subtract $1 from what I owe him for every book he checks out for me that I didn’t want.
          • My kid agrees, and runs into the library and checks out all 5 books I wanted, and no others. Great! I give him $5.
          • My kid gets angry with me, and tells me I owe him $249,995! I ask him why.
          • My kid says I should pay him $1 for every book I didn’t want that he was kind enough to leave in the library. The library has 249,995 books I didn’t want. Therefore, he says I owe him another $249,995!
          • Does that sound ridiculous? Well that’s how many commonly used statistics work in these situations.
        • Statistics that give credit each time the model correctly guesses “No” (finds “True Negatives,” or “TN”) may lead to overly optimistic expectations of the model’s performance, resulting in disappointment when the model is put into practice. These statistics are like my son in the imaginary story, wanting full credit for every book I didn’t want that he left in the library.
        • Specificity, Accuracy, Negative Predictive Value, and AUC-ROC (Area Under the Receiver Operating Characteristic curve) are some stats that give credit each time the model correctly guesses “No”, and therefore these stats may lead to overly optimistic expectations of the model’s performance, and ultimately disappointment.
        • Positive Predictive Value (“Precision”), Sensitivity (“Recall”), and AUC-PR (Area Under the Precision-Recall Curve) are some stats that do not include True Negatives in their calculations and therefore tend to provide a more meaningful description of the performance users will experience.
        • Study authors, the press, and sales and marketing folks often seem to prefer reporting the AUC-ROC. I suspect this may be because they want to report a statistic about their work that people will interpret as good (even if incorrectly) and/or perhaps they don’t understand the AUC-ROC.
        • Some academics and others legitimately think the AUC-ROC is better to report than AUC-PR, but I generally don’t agree with their reasons. For example, some say the AUC-ROC allows models to be compared to one another even across studies of different populations having differing ratios of Yes’s vs. No’s. However, across studies, the inclusion criteria, populations, input features, and more are virtually never exactly the same, so the numbers likely aren’t comparable anyway.
        • I propose that the AUC-ROC and AUC-PR should both be reported, and the graphs (or tables) of these curves should also be provided.
        • For more of my thoughts on statistics that may be useful in achieving more successful implementations, you may find value in my blog posts on utility-based metrics (here and here).
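To make the library story concrete, here is a minimal Python sketch of the standard confusion-matrix formulas applied to it. The story's son is perfect, so to show the effect I made up two other helpers: `sloppy`, who brings back 2 of the 5 wanted books plus 8 unwanted ones, and `lazy`, who checks out nothing at all. Those helpers, their counts, and the names `Confusion` and `metrics` are my own assumptions for illustration, not data from any study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # wanted books that were checked out (True Positives)
    fp: int  # unwanted books that were checked out (False Positives)
    fn: int  # wanted books left in the library (False Negatives)
    tn: int  # unwanted books left in the library (True Negatives)


def metrics(c: Confusion) -> dict:
    """Compute common yes/no statistics from confusion-matrix counts."""

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (0/0) are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        # These three give credit for True Negatives.
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        # These two never look at True Negatives.
        "precision": ratio(c.tp, c.tp + c.fp),
        "recall": ratio(c.tp, c.tp + c.fn),
    }


LIBRARY_SIZE = 250_000  # books in the library (from the story)
WANTED = 5              # books on my list (from the story)

# Hypothetical helpers, invented for illustration only.
sloppy = Confusion(tp=2, fp=8, fn=3, tn=LIBRARY_SIZE - WANTED - 8)
lazy = Confusion(tp=0, fp=0, fn=WANTED, tn=LIBRARY_SIZE - WANTED)

for name, helper in (("sloppy", sloppy), ("lazy", lazy)):
    print(name)
    for stat, value in metrics(helper).items():
        print(f"  {stat:<12}{value:.6f}")
```

Under these made-up counts, the sloppy helper scores above 0.9999 on accuracy, specificity, and negative predictive value, yet has a precision of only 0.2 and a recall of only 0.4. The lazy helper, who never fetches a single book, still scores above 0.9999 on accuracy while having a recall of 0 (and an undefined precision, since he checked out nothing).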
    • #2: Grudin’s Law: “When those who benefit are not those who do the work, then the technology is likely to fail or, at least, be subverted.” and similarly Grudin’s Paradox: “What may be in the managers’ best interests may not be in the interests of individual contributors, and therefore not used.”
      • When technology initiatives in healthcare fail, people tend to blame clinicians for being “resistant to change” and “technology-averse.”
      • However, I have seen many examples of rapid uptake of technology and change by clinicians, often so avidly that IT departments try to slow, control, and “govern” adoption and use. These examples seem to suggest that clinicians are neither resistant to change nor technology-averse (e.g., Internet, iPhones, Google, ChatGPT, PACS, ambient LLM scribes, etc.).
      • Rather, clinicians, like virtually everyone, tend to resist change or technology they perceive will be bad for them, and it seems often rightly so.
      • Technologies that directly benefit users by saving them time without causing other perceived harms often find rapid adoption and successful implementation.
      • Although I believe this goes against the grain of popular “best practice,” I think that the usual approaches to successful implementations, such as finding “champions” of the technology, instituting careful change management, training, incentivizing use temporarily until “habit is developed,” and making it easy by “minimizing additional effort” are all likely to fail if the users do not directly benefit from the technology, usually from net time savings. When users adopt a technology that does not save them time, generally some other critical task must be deprioritized or abandoned, often leading to unintended and worse outcomes.
  8. Many key things must be considered with AI implementations, such as whether a vendor or any other third party will use your prompts and/or your data to train their models. If so, does that threaten the security of your data and patient privacy? Are you willing to let them make their AI better using your data while you also pay them for the use of their AI? A partial listing of things you might consider when implementing AI tech is below. However, I might ask, are these different than what needs to be considered for virtually any other implementation in healthcare?
    • FAVES: Is the AI Fair, Appropriate, Valid, Effective, and Safe?
    • ROI: If you implement the AI, what will be the Return On Investment?
    • Contract Terms, including Intellectual Property (“IP”) considerations
    • Privacy
    • Security
    • Usability
    • Maintainability
    • Regulation
    • And much more
  9. Although many seem to be opting for centralized AI governance (e.g., an AI Governance Committee), some are opting for a more decentralized approach.
    • It’s not clear which AI governance approach will “win” in the long run.
    • I suspect that the broad definitions of “AI” will lead to centralized AI Governance committees becoming either overwhelmed or greatly limiting their scope of oversight.
    • I suspect that the “intended use” of the solution, rather than its underlying technology (AI vs. not AI), should and ultimately will drive governance. Therefore, I tentatively and with trepidation predict that, over time, a more decentralized approach will prevail.
  10. As I’ve previously written, predicting seems more often luck than skill, so I make predictions tentatively and with trepidation. But… here I go anyway. 😀
    • I predict that many institutions will move from having an “AI Strategy” to instead:
      • In light of new AI, modify their institutional goals to aim higher to achieve something even better than they previously thought possible.
      • Use AI as an enabling technology for even better strategies to achieve those loftier goals.
      • In other words, move from an AI Strategy to recognizing that AI will allow them to achieve more strategic goals and design better strategies that utilize AI (in part) to achieve them.

Will my predictions be proven right? Or will time prove my foolishness in even trying to make predictions? Maybe some of both, hopefully more of the former than the latter, but that’s probably wishful thinking on my part. 😀

All opinions expressed here are entirely those of the author(s) and do not necessarily represent the opinions or positions of their employers (if any), affiliates (if any), or anyone else. The author(s) reserve the right to change his/her/their minds at any time.
