Not seeing results from AI? Engineering may be the missing piece
There’s no doubt that one of the hottest topics in healthcare right now is artificial intelligence. The promise of AI is exciting: It has helped identify cancers in radiology images, detect diabetic retinopathy in retinal scans and predict patient mortality risk, to name a few examples of the medical advances it can deliver.
But the paths health systems take to make AI a reality are often flawed, resulting in dabbling in AI with no measurable results. When the wrong path is taken, they end up with AI “solutions” to perceived problems without being able to verify whether those problems are, in fact, real or measurable.
Vendors often turn on AI solutions … then walk away, leaving health systems unsure of how to use these new insights within the bounds of their old workflows. And these tools are often deployed without the engineering rigor to make sure this new technology is testable or resilient.
The result? These potential AI insights are often ignored, marginally helpful, quickly outdated, or – at worst – harmful. But who’s to know?
One common AI solution that is often a source of excitement among health systems and vendors alike is early sepsis detection.
In fact, finding septic patients happened to be my first assignment at Penn Medicine. The idea was that if we could identify patients at risk of sepsis earlier, treatments could be applied sooner, resulting (we thought) in lives saved.
Coming from a background in missile defense, I naively thought this would be an easy problem to solve. There was a “find the missile, shoot the missile” similarity that seemed intuitive.
My team developed one of the top-performing sepsis models ever created. [1] It was validated, deployed and it resulted in more lab tests and faster ICU transfers – yet it produced zero patient outcome changes.
It turns out that Penn Medicine was already good at finding septic patients, and that this state-of-the-art algorithm wasn’t, in fact, needed at all. Had we gone through the full engineering process that’s now in place at Penn Medicine, we would’ve found no evidence that the original problem statement, “find septic patients,” described a real problem at all.
This engineering design effort would have saved many months of work and the deployment of a system that was ultimately distracting.
Over the last few years, hundreds of claims of successful AI implementations have been made by vendors and health systems alike. So why is it that only a handful of the resulting studies have been able to show actual value? [2]
The issue is that many health systems try to solve healthcare problems by simply configuring vendor products. What’s missed in this approach is the engineering rigor needed to design a complete solution, one that includes technology, human workflow, measurable value and long-term operational capability.
This vendor-first approach is often siloed, with independent teams assigned isolated tasks, and the completion of those tasks becomes how project success is measured.
Success, then, is firmly task-based, not value-based. Linking these tasks (or projects) to the measures that actually matter – lives saved, dollars saved – is difficult, and requires a comprehensive engineering approach.
Whether these projects are working, how well they are working (or whether they were ever needed to begin with) is not typically measured. The incomplete way of looking at it is: If AI technology is deployed, success is claimed and the project is complete. The engineering required to both define and measure value is not there.
Getting value from healthcare AI is a problem that requires a nuanced, thoughtful and long-term solution. Even the most useful AI technology can abruptly stop performing when hospital workflows change.
For example, a readmission risk model at Penn Medicine suddenly showed a subtle reduction in risk scores. The culprit? An accidental EHR configuration change. Because a complete solution had been engineered, the data feed was being monitored and the teams were able to quickly communicate and correct the EHR change.
We estimate that these types of situations have arisen approximately twice a year for each predictive model deployed. So ongoing monitoring of the system, the workflow and the data is needed, even during operations.
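To make the monitoring idea concrete, here is a minimal sketch (in Python) of one way a team might watch a deployed model’s risk scores for an unexpected shift, like the subtle drop described above. The function, threshold and sample numbers are illustrative assumptions, not a description of Penn Medicine’s actual tooling.

    from statistics import mean, stdev

    def check_score_drift(baseline_scores, recent_scores, z_threshold=3.0):
        # Compare the recent window's mean risk score to the baseline mean,
        # measured in units of the baseline's standard deviation.
        base_mean = mean(baseline_scores)
        base_std = stdev(baseline_scores)
        recent_mean = mean(recent_scores)
        shift = abs(recent_mean - base_mean) / base_std if base_std > 0 else 0.0
        return {
            "baseline_mean": base_mean,
            "recent_mean": recent_mean,
            "shift_in_sd": shift,
            "alert": shift > z_threshold,  # if True: inspect the data feed and the workflow
        }

    # Hypothetical example: a subtle but consistent drop in readmission risk scores.
    baseline = [0.32, 0.28, 0.35, 0.30, 0.31, 0.29, 0.33, 0.34]
    recent = [0.22, 0.20, 0.24, 0.21, 0.23, 0.19, 0.25, 0.22]
    print(check_score_drift(baseline, recent))  # alert: True

In practice, a check like this would run on a schedule against the live data feed, and an alert would trigger exactly the kind of cross-team conversation that caught the EHR configuration change.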
For AI in healthcare to reach its potential, health systems must expand their energies beyond clinical practice, and focus on total ownership of all AI solutions. Rigorous engineering, with clearly defined outcomes tied directly to measurable value, will be the foundation on which to build all successful AI programs.
Value must be defined in terms of lives saved, dollars saved, or patient/clinical satisfaction. The health systems that will realize success from AI will be the ones who carefully define their problems, measure evidence of those problems, and form experiments to connect the hypothesized interventions to better outcomes.
Successful health systems will understand that rigorous design processes are needed to properly scale their solutions in operations, and be willing to consider both the technologies and human workflows as part of the engineering process.
Like Blockbuster, which famously failed to rethink the way it delivered movies, health systems that refuse to see themselves as engineering houses risk falling drastically behind in their ability to properly leverage AI technology.
It’s one thing to make sure websites and email servers are working; it’s quite another to make sure the health system is optimizing care for heart failure.
One is an IT service, the other is a complete product solution that requires a comprehensive team of clinicians, data scientists, software developers, and engineers, as well as clearly defined metrics of success: lives and/or dollars saved.
[1] Giannini, H. M., Chivers, C., Draugelis, M., Hanish, A., Fuchs, B., Donnelly, P., & Mikkelsen, M. E. (2017). Development and Implementation of a Machine-Learning Algorithm for Early Identification of Sepsis in a Multi-Hospital Academic Healthcare System. American Journal of Respiratory and Critical Care Medicine, 195.
[2] Halamka, J., & Cerrato, P. (2020). The Digital Reconstruction of Health Care. NEJM Catalyst Innovations in Care Delivery. DOI: https://doi.org/10.1056/CAT.20.0082
Mike Draugelis is chief data scientist at Penn Medicine, where he leads its Predictive Healthcare team.