Cali: A ChatGPT-like Clinical Baloney Detector Prototype

False Statement image by Nick Youngson CC BY-SA 3.0 Alpha Stock Images — from https://www.picpedia.org/chalkboard/f/false-statement.html

— Jonathan A. Handler, MD, FACEP, FAMIA

My wife and I are currently into the streaming series “Poker Face” on Peacock. Many have called it a modern-day version of Columbo, a classic (and awesome) TV detective show. The Poker Face protagonist, Charlie Cale, has a preternatural ability to detect when someone is lying. When she hears a lie, her knee-jerk reaction is often to blurt out “Baloney!” It’s actually a less “G-rated” word, but I’m keeping this post kid-friendly for all the budding little informaticists out there. 🙂

I’ve had many great encounters with excellent doctors. Unfortunately, I’ve also had situations in which I had to bite my tongue to stop myself from exclaiming “Baloney!” while in a healthcare setting.

  • I had a bout of pericarditis once. The first emergency physician I saw told me that my EKG finding of P-R depression was irrelevant, and only S-T segment elevations on the EKG are relevant to that diagnosis. Baloney!
  • I was with a relative at an academic-affiliated hospital and the hospitalist told me the problem couldn’t be Wernicke’s Encephalopathy because my relative does not have alcoholism. Baloney! In fact, Carl Wernicke himself first described the syndrome in a non-alcoholic in 1881.
  • A friend’s relative had severe pain and incontinence after back surgery. I spoke to the chief resident in neurosurgery who “guaranteed” me the symptoms weren’t related to her spine. Baloney! Two hours later he called back to (sheepishly) report they had operated on the wrong part of the spine.
  • When I was in medical school, one of my relatives was diagnosed with an abdominal aortic aneurysm and needed an operation to fix it. I asked the surgeon to do a thallium stress test before the operation in order to make sure his heart could handle the surgery. The surgeon refused, saying he’d already screened my relative to ensure the surgery was safe by using the Goldman Risk Index and concluded no further workup was necessary. Baloney! My relative had a massive heart attack on the table and never made it out of the hospital.

No one is perfect. There are things we know we don’t know. There are things we don’t know and we don’t know that we don’t know them. Worst of all, there are things we think we know that are flat-out wrong. We all could use a friendly, helpful voice to call out “Baloney!” at the right times. That could be especially helpful in cases of potentially faulty clinical thinking.

Not long ago, I wrote that most healthcare AI was neither shareable nor generalizable. Just two days after I published that post, OpenAI changed everything by releasing ChatGPT, a conversational AI bot that is “blowing everybody’s mind.” My friend Sean (and many others) wrote that ChatGPT may herald the start of a new and exciting era in artificial intelligence. If someone asks for the definition of “ironic,” just tell them about the timing of my post and the release of ChatGPT. OpenAI should have just posted “Baloney!” in the comments on my post and linked to ChatGPT. The post is still correct, as most healthcare AI is still neither shareable nor generalizable. However, suddenly a path to an exciting future for healthcare AI seems more clearly in sight than ever for properly selected use cases.

Reports note that ChatGPT is imperfect, and may even confabulate information that is simply untrue. But our doctors are imperfect too, and imperfect tools can still help imperfect doctors provide better care under the right circumstances.

Around 2001, I built a voice-in, voice-out digital assistant for doctors that I called “Hali.” I built the first prototype, and then my friend and mentor, Dr. Craig Feied, helped design a natural syntax for use with Hali, and worked with me to design a scalable, cloud-based architecture for the system. That was well over 10 years before Apple’s release of Siri, Amazon’s release of Alexa, and the other digital assistants commonly in use today. Hali was brought live and used in the emergency department shortly after. A user could ask Hali a question by voice and it would speak back with a helpful answer. You can see how Hali (and I) evolved over time in 2001, then 2003, and 2005. Note: of course, synthetic patient info was used in the 2003 video.

With that background in mind, I decided to explore the capabilities of OpenAI’s APIs (application programming interfaces). It didn’t take long at all to create a new software bot that I have named “Cali.” When run, Cali listens to a single statement and then says “Baloney” if it thinks the statement is incorrect.

Click here to see a video of Cali in action.

The Cali software is a rudimentary, quick ‘n’ dirty prototype. The speech recognition software used by the program is imperfect, especially for medical terms. OpenAI’s “davinci” API (the AI that powers Cali) is also imperfect. The speech synthesis sounds robotic. Still, finding, installing, learning, and programming the various speech-in, speech-out, and OpenAI APIs into a software program took me just a few hours. Imagine how much more powerful this could become with a little more time, and as the AI improves.
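Stripped of the speech layers, the truth check at the heart of the prototype can be distilled into a few lines. Here is a minimal sketch; the `ask_model` and `fake_model` callables are stand-ins of my own invention (not part of the actual program below), replacing the real OpenAI call so the idea can be shown without an API key:

```python
def baloney_check(statement: str, ask_model) -> str:
    """Ask a model whether a statement is true; call 'Baloney!' otherwise.

    ask_model is any callable that takes a prompt string and returns the
    model's text answer (e.g., a thin wrapper around a completion API).
    """
    prompt = f"Is it true that {statement}?"
    answer = ask_model(prompt)
    # We trust a "not true" verdict more than a "true" one, so hedge the positive case.
    return "Possibly correct." if "Yes" in answer else "Baloney!"

# A canned stand-in model, purely for demonstration:
def fake_model(prompt: str) -> str:
    return "No." if "five chambers" in prompt else "Yes."

print(baloney_check("the heart has five chambers", fake_model))  # Baloney!
print(baloney_check("the heart has four chambers", fake_model))  # Possibly correct.
```

The full program below follows this same pattern, with speech recognition feeding the statement in and text-to-speech reading the verdict out.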

When I use Microsoft Teams, the optional “Speaker Coach” function listens in to my meetings and provides tips and feedback to help me better communicate on my Teams calls. Just after I finished the first draft of this post, the Wall Street Journal published an article about AI chatbots coaching call center personnel (ironically, one of those chatbots was named Charlie). Similarly, every doctor may soon have a bot to call “Baloney” after every faulty clinical statement or decision. Perhaps the patient will also have a bot providing the same feedback, enabling a meaningful dialog to begin that will strengthen the doctor-patient relationship and improve clinical care. Even if a bot does not always give the right answer, a quick double-check will resolve the issue. If the bots are right more often than wrong, and if they give their feedback in just the right way, they might create a net clinical benefit.

If you are technically oriented and would like to play around with Cali, the Python source code is below. Note that Cali is not accurate enough for anything but entertainment and example purposes. Don’t use it for medical decision-making or any other real-world purpose. The code is provided purely as a hypothetical example. Use of the code is entirely at your own risk. I expressly disclaim and deny any responsibility for any use or misuse of this code or its functionality. With that said, enjoy!

########### CALI ################
###   FOR DEMONSTRATION PURPOSES ONLY.
###   USE AT YOUR OWN RISK!!!
### THIS SHOULD NOT BE CONSIDERED HIPAA COMPLIANT.
### DO NOT USE FOR PATIENT CARE OR MEDICAL DECISION MAKING OR ANYTHING BUT ENTERTAINMENT.
#### To install speech recognition (thanks Anthony Zhang [uberi]):
#### pip install SpeechRecognition
#### On Apple silicon Macs, I also needed to install PyAudio. This is likely needed on other systems also.
#### To install PyAudio on my Apple silicon Mac, I had to use this approach: https://discussions.apple.com/thread/252638887
#### Sphinx may not have the best recognition capabilities, especially for clinical terms, and the quality of the microphone matters.
#### Consider using Whisper, Google, or another speech recognizer, but know
#### that some will send your content to the cloud.
#### Also consider using a good microphone.

#### I also needed to install pocketsphinx
#### pip install pocketsphinx
#### For more info, and to try different recognizers, see https://pypi.org/project/SpeechRecognition/

import openai
import tiktoken
import speech_recognition as sr
import pyttsx3
# import torch # required for whisper if used as speech recognition engine
# import whisper # If you use whisper as speech recognition engine
import secret # Remove this line if you put your secret API key inline in the code below instead of in a secret.py file.

#### Global variables
tts_obj = pyttsx3.init() # Initialize text to speech object
verbose = False
enable_editing = False
speak_caveats = True
#### Initialize speech recognition
r = sr.Recognizer()
my_name = 'Cali'

###################################
# Main subroutine
###################################
def main():

    #### Who am I?
    speak(f"Hello. I am {my_name}.", True)
    print("----------------")

    keep_going = 'y'
    while keep_going == 'y':

        #### Ask them for the statement, then convert it to text.
        speak("Make a statement out loud and I will say 'Baloney' if it's not true.", True)
        spoken = recognize()

        #### Tell them what was heard:
        speak(f"I heard you say: {spoken}", verbose)
        print("----------------")

        #### Unfortunately, this speech recognition engine does not always recognize medical content well.
        #### So, give the user a chance to enter the text via typing if desired.
        if enable_editing:
            speak("Did I hear you correctly? If so, hit Enter. If I heard incorrectly, type the correct text.", True)
            typed = input("Type corrected text here: ")
            if typed:
                spoken = typed

        #### Set the api_key, which you can get from OpenAI at https://platform.openai.com/account/api-keys
        openai.api_key = secret.secret #replace this with your key (in quotes) -- but don't let anyone else see it.

        #### Craft the request to OpenAI
        prompt = f"Is it true that {spoken}?"

        #### Count the tokens in the prompt
        prompt_tokens = tiktoken.encoding_for_model("text-davinci-003").encode(prompt)

        #### Add one more token so we can be returned at least a "Yes" or "No" (note: token counting
        #### does not work exactly as I thought, but for this demo it seems adequate).
        max_tokens = len(prompt_tokens) + 1

        openai_obj = openai.Completion.create(
          engine="text-davinci-003",
          prompt=prompt,
          temperature=0,
          max_tokens=max_tokens
        )

        #### Extract the answer from OpenAI
        answer = str(openai_obj.choices[0].text)

        #### Show the answer for debugging
        print("-------------")
        print(f"From OpenAI I received the answer {answer}")
        print("-------------")

        #### If the answer is "Yes", hedge by saying it's only possibly correct.
        #### Note that we trust Cali more to tell us that something is not true than to tell us something is true.
        if 'Yes' in answer:
            speak("Possibly correct.", speak_caveats)

        #### Otherwise, call it Baloney!
        else:
            speak("Baloney!", True)
            #### Note that we can never be certain that Cali is correct -- request a double-check.
            speak("I'm often right when I call baloney, but not always, so you should double-check that.", speak_caveats)

        #### Keep going?
        speak("Should I continue listening?", True)
        spoken = recognize()
        if 'yes' in spoken.lower():
            keep_going = 'y'
        else:
            keep_going = 'n'

    #### All done
    print("All done!")


###################################
# Speak subroutine -- speaks and prints
###################################
def speak(words:str, speak_it=verbose):
    print(words)

    #### Only speak it if we were requested to do so.
    if speak_it:
        tts_obj.say(words)
        tts_obj.runAndWait()
    return

###################################
# Recognize speech subroutine
###################################
def recognize():
    #### Listen to the microphone
    with sr.Microphone() as source:
        print("Now listening...")
        audio = r.listen(source)

    #### Perform speech recognition on the audio
    spoken = ''
    try:
        spoken = r.recognize_sphinx(audio)
    except sr.UnknownValueError:
        speak("Failed recognition of the audio.")
    except sr.RequestError as e:
        speak(f"Error in recognition: {e}")
    return spoken


###################################
# Typical ending of Python program
###################################
if __name__ == "__main__":
    main()

Opinions expressed here are those of the authors, not necessarily those of anyone else, including any employers the authors may or may not have.
