What Happens As A.I. Enters the Real World
1. What Happens As A.I. Enters the Real World?
From a User Perspective
Benjamin P. Geisler, M.D., M.P.H., F.A.C.P., M.R.C.P (London), F.H.M.
Massachusetts General Hospital/Harvard Medical School, Boston, MA
3. Outline
• My Vantage Point
• Why Sepsis is a Great Condition to Intervene On With A.I.
• EPIC: What Actually Happened
• A Later Model Properly Evaluated
• What to Do Instead?
17. 1. Test (or even retrain) clinical AI models in different settings
• A well-validated model may perform poorly if you implement it
elsewhere
2. Involve multiple stakeholders from the planning stage
• Key opinion leaders can include professional societies and their
leaders
• These stakeholders can later become champions in their own
institutions and departments
18. 3. For the initial go-live:
• Involve hospital-level IT and departmental clinical champions
• Don’t just rely on mandatory online training modules – develop a
strong narrative instead and do “academic detailing” in advance
19. 4. For further refinements:
• Study not just quantitative results from structured data but “take
the pulse” of the front-line personnel: What do they love and
hate? Can you make it more intuitive? Develop metrics?
Benchmark? Gamify?
• Integration into clinical workflows could or should be a continuous
quality improvement process:
Editor's Notes
Hi everybody. Thanks for the kind introduction, Mari. As someone who spends a lot of time on Zoom, it’s nice to see you all in person. My name is Ben. I’m originally from Germany; I’ve lived in the U.S. for the last 16 years, but I’m in the process of moving to Norway. So, last night, at the welcome dinner, we had this discussion about A.I. experts – data scientists – and subject matter experts. How many of you would say that you are primarily a data scientist – raise your hands. And how many of you would say that you are a subject matter expert – a physicist or an engineer or someone else applying A.I. – raise your hands. And now who would say they’re both a data scientist and a subject matter expert? Oh OK. So I’m a subject matter expert in medicine because I went to medical school. Do you know what they do with all the kids in high school that are bad at math? … They send them to medical school.
Alright, these are my financial conflicts of interest. And I’ll tell you a little bit about my perspective as someone in between medicine, writing and publishing, and consulting for industry in a minute.
The talk is only fifteen minutes, but here is the outline for you. As I said, I’ll first qualify my perspective a little bit more – where I am coming from. Then I’ll briefly explain the subject matter in the case, sepsis, an overwhelming inflammatory response that many die from. Next, I’ll give you a first-hand account of when an A.I. model hit our electronic health record or EHR system, which is called EPIC. I mean it was also kind of EPIC what happened, but I’ll let you be the judge of that. Then I’ll give you the top-level results of a peer-reviewed manuscript where the model was formally evaluated. And then I’ll close with some take-home points – which will be informed by what we’ll have gone through by then as well as some other stuff that I’ve come across.
So first: my perspective, where I am coming from when I talk about AI entering the real world.
I’m both a clinician and a research consultant. I work as a hospitalist in a large academic medical center on the Eastern seaboard of the U.S. I’ve done this now for over ten years. On the left here is a picture of me while I was rounding on COVID-19 patients. And then I have also worked for about 12 years as a researcher and industry consultant, and my field was health economics and outcomes research until I pivoted to A.I. And this is me after visiting a client in Paris, France.
So now I’ll give you a crash course on systemic inflammatory response syndrome and sepsis. Or at least I’ll tell you why it’s a great target for an A.I.-based intervention.
When the body has an overwhelming bacterial infection, the response is intense inflammation, and that is called sepsis. This can be from any severe infection, for example pneumonia, a urinary tract infection, a wound… The bacteria often end up in the blood stream, you get a fever, your heart rate goes up, cells and fluid migrate into the tissue from your now leaky vessels, your blood pressure goes down, and then your organs start failing. By then you are in severe sepsis or even septic shock. The mortality rate from septic shock can range from 20-60%. Sepsis was at some point believed to be the third-most common cause of death. It’s absolutely critical to start treatment with fluids and antibiotics and then maybe vasopressors immediately – we call this the golden hour – and to work up the reason why someone is septic and whether they need surgery or some other intervention; we call this source control. So, so far, sounds like a great application for A.I., right?
OK, so I was working in a hospital around 2014, treating my patients, minding my own business. They had introduced this new electronic health record system called EPIC a little while earlier. And just a little side note on that: I have not really seen any EHR that works really well. They called EPIC the “cream of the crap.” It’s kind of like Microsoft Office, or Microsoft 365. It’s not fantastic, it’s OK, and at least all the different parts look the same. I was once asked by Google to come in and take a look at some experimental EHRs: they tracked my eye movements to see where I would look for information, what would be intuitive. And that’s my main criticism of EHRs: they seem to have been developed without involving subject matter experts – doctors and nurses and other health care workers. You know, us, the users! The developers don’t really seem to understand what our clinical workflows are.
So one day, boom, I get this alert in EPIC that looked like one of these. They called it a best practice advisory, and they advised me that my patient may be septic. In that first patient that was indeed true – the only problem was that we had started to treat that patient for sepsis long ago, like hours earlier. Also, it was kind of odd to get this alert because no one had told us about them. Some iterations contained links to an order set, so you could order all the right tests and treatments; others were just sort of a pop-up that kept coming back unless you wrote a little explanation. There had been this intense focus on sepsis. Because it had been undertreated, there was the Surviving Sepsis Campaign and its guidelines, and we were already thinking about sepsis in every patient. One problem with sepsis is that you have a similar response, called systemic inflammatory response syndrome or SIRS, that mimics the vital signs and sometimes symptoms of sepsis. And as an internal or emergency medicine doctor your job is to think long and hard (but also quickly) about whether this is SIRS or sepsis. My recollection of these alerts is that they seemed to suggest that I was a bad doctor not thinking about sepsis, but they came on pretty indiscriminately – most of the time the patient didn’t have sepsis – and they came on late, sometimes hours later. What I later learned was that this was a shallow learning model, developed centrally by EPIC. There was no peer-reviewed publication on it and also not much transparency otherwise. It had to be turned on in each individual hospital, but wasn’t validated there. It was sold to hospital leadership as one way to decrease length of stay and costs and potentially also legal liability. IT was involved, but – at least in the beginning – the clinical departments were not. And I don’t think national sepsis experts or societies were involved.
This model didn’t go away; it was sort of iterated on and refined. And so I was very surprised when I saw a …
… peer-reviewed study evaluating what they called the EPIC Sepsis Model or ESM in 2021. That model was by then based on machine learning; they predicted a score and used a cut-off for the alerts, for example 6 or greater in the table above. This paper was an external validation, so they tested the model on their own patients against a clinical gold standard, a diagnosis of sepsis. And, as you can see here, this was not an entirely uncommon diagnosis: almost 7% of hospital patients had sepsis. But their AUC, their area under the receiver operating characteristic curve, was only .64 for the hospitalization. What I want to focus on here is the authors’ conclusion and then the positive predictive value. So they wrote that the ESM (in their external validation cohort) had poor discrimination and calibration in predicting the onset of sepsis and that its adoption raises concerns about sepsis management in the U.S. That’s kind of wild, isn’t it? That we (or some of us) may think that we can outsource part of the decision-making process for this deadly disease to an algorithm that hadn’t even been externally validated – and that, if you just asked any “front line health care worker,” didn’t seem to work well at all. Now, to the positive predictive value here – this is graphed over the model score where you set the cut-off value for the alert to fire. The PPV was 12% for the entire hospitalization, and for the first 4 hours – remember, it’s the first hour that really counts – it was under 1%. You needed to send 109 “best practice advisories” for one true case.
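To make the arithmetic behind those numbers concrete, here is a minimal Python sketch relating positive predictive value to the number of alerts needed per true case. The 12% PPV, the 109 alerts and the roughly 7% prevalence are the figures quoted in the talk; the function names are made up for illustration, and the sensitivity and specificity values are hypothetical placeholders, not results from the paper.

```python
def number_needed_to_evaluate(ppv: float) -> float:
    """Alerts a clinician must work through, on average, per true case."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


def ppv_from_number_needed(alerts_per_true_case: float) -> float:
    """Inverse of the above: PPV implied by 'N alerts for one true case'."""
    if alerts_per_true_case < 1:
        raise ValueError("need at least one alert per true case")
    return 1.0 / alerts_per_true_case


def ppv_from_rates(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Bayes' rule: share of positive alerts that are true positives."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Figures quoted in the talk.
    print(f"PPV 12% -> about {number_needed_to_evaluate(0.12):.1f} alerts per true case")
    print(f"109 alerts per true case -> PPV of {ppv_from_number_needed(109):.2%}")

    # Hypothetical sensitivity/specificity, chosen only to illustrate how a
    # ~7% prevalence keeps PPV low even with a seemingly decent specificity.
    print(f"Illustrative PPV: {ppv_from_rates(0.07, 0.33, 0.83):.1%}")
```

The point of the sketch is that with a condition this uncommon, even a modest false-positive rate swamps the true positives, which is consistent with the alert fatigue described above.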
Here you see a bunch of individual patients being graphed in the alert to the outcome over time. And if you zoom in here to the first 24 hours
You can see all the individual alerts that the doctors and nurses have been getting. OK, Ima let this graph speak for itself.
Let me close with some take-home messages, some conclusions.
The first, to you as data scientists, may be obvious. A centrally trained model may not perform well in a hospital with very different patients, so it’s gotta be externally validated. Second, you should involve multiple stakeholders already when you’re planning your intervention. In the case of medicine, this could be national experts or societies or what have you. The more stakeholders you involve, the more of them you can use later as change agents in their home institutions.
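As a rough illustration of what an external validation check can look like, the Python sketch below computes the AUC (via the rank-based Mann-Whitney formulation) for the same score on two cohorts. Everything here is an assumption for illustration: the `auc` helper, the cohort names and the toy scores and labels are synthetic and are not data from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive case outscores a random negative one.

    Ties count as half a win. Labels are 1 (condition present) or 0 (absent).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Synthetic model scores, identical in both cohorts.
    scores = [1, 2, 3, 4, 5, 6, 7, 8]

    # Synthetic outcomes: at the (hypothetical) development site the score
    # separates cases cleanly; at the local site it barely does.
    development_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    local_labels = [0, 1, 0, 1, 0, 1, 0, 1]

    print("development AUC:", auc(scores, development_labels))
    print("local AUC:", auc(scores, local_labels))
```

In practice the local cohort would come from the hospital's own records with a clinically adjudicated outcome, and calibration would be checked alongside discrimination.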
Third, before you go live, there has to be some kind of collaboration between IT and clinicians. Ideally, you develop champions in the departments – it can be so valuable to have them. And then work on the why: a strong narrative goes so much further than just another online training you have to click through. You can do academic detailing – this is slang. Detailing was what representatives from pharmaceutical companies did with doctors when they were trying to convince them to use a new intervention, in that case a drug: they fed them data and a narrative that made sense.
And finally, a large-scale intervention like this, A.I. or not, does need some qualitative evaluation, too. I mean you can see from the data what happened – but in this case I could have told you years earlier that something was not working so well here. Ask the front-line clinicians – by the way, front-line is a war term – what they love and hate; take their pulse. Make it more intuitive, develop metrics, benchmark against the competition, your past performance, your threshold where you claim victory. Maybe gamify it, I dunno. But then when it comes to clinical practice, you are impacting how people are being diagnosed and treated. So if you have such an, I would argue, invasive intervention, you need to integrate it well into the clinical workflow, and that may be best done in a quality improvement project, maybe even a continuous iteration of PDSA cycles. That’s what I have.
Oh, here are my contact details. Thank you very much.