Improving Healthcare Operations Using Process Data Mining
It’s estimated that 80% of healthcare data is unstructured, which makes it challenging to do any sort of analytics to drive improvements in population health, patient care and operational efficiency. Machine learning techniques can be utilized to predict future events from similar past events, anticipate resource capacity issues and proactively identify bottlenecks and patient outcome risks. This session will provide an overview of how process data mining can be applied to healthcare and provide real-world examples of process data mining in action.
4. 4
• Brand Sentiment → Higher NPS
• 360° Customer View → Loyal Customers
• Product Recommendation → More Sales
• Propensity to Churn → Greater Retention
• Real-time Demand/Supply Forecast → More Efficient
• Predictive Maintenance → Less Downtime
• Fraud Detection → Lower Risk
• Network Optimization → Lower Cost
• Insider Threats → Greater Security
• Risk Mitigation, Real-time → Retain Market Value
• Asset Tracking → Increase Productivity
• Personalized Care → Loyal Customers
Sources of data
Time series based data
• Online Services
• Web Services
• Servers
• Security
• GPS Location
• Storage
• Desktops
• Networks
• Packaged Applications
• Custom Applications
• Messaging
• Telecoms
• Online Shopping Cart
• Web Clickstreams
• Databases
• Energy Meters
• Call Detail Records
• Smartphones and Devices
• RFID
On-Premises | Private Cloud | Public Cloud
A whole class of new use cases (new questions)
More complete picture of the business process
The desire for business operations “as it happens”
Exhaust from Apps and Devices
Volume | Velocity | Variety | Variability
5. 5
Correlating Data Provides Critical Insights to Business
Fields highlighted in the sample events: Rx ID, Pt Comment, Time Waiting for RN, Rad ID, Hospital’s ID, Patient ID, Lab ID
Sources: Order Processing, Survey, Triage Care, IVR, Middleware Error
6. 6
Domains of Data Diversity in Health Data
• Subjects: Persons, Sensors, Actuators, Mobile Devices
• Information Users: Clinical, Family, Patient
• System and Locations: Home, Hospital, ER, Nursing Homes
• Ownership and Management
7. 7
Healthcare Data is Time Oriented and Diverse
Virtual | Physical | Cloud
Healthcare Apps | IT Systems and Med Devices | Patient Generated Data
Sources shown: EHR Systems, Web Services, Developers, App Support, Telecoms, Networking, Desktops, Servers, Security, Devices, Storage, Messaging, Patient Surveys, Clickstream, HIE, Patient Networks, Medical Devices, CDR, Mobile, PHI Access Audit Logs, HL7 Messaging, Sensors, Departmental and Homegrown Applications
14. A large military health system gains operational visibility and improves dental service delivery with Splunk
Key Challenges
• Limited visibility into device bottlenecks and customer satisfaction factors.
• Limited data for capacity planning and workflow optimization.
Key Splunk Functions
• Integrate dental device logs, DICOM image metadata, and patient satisfaction surveys.
• Alert on anomalies.
• Correlate wait time with patient satisfaction data and system performance degradations.
Business Value
• Faster identification of system capacity bottlenecks such as excessive wait time.
• Proactively find unused resources and reallocate them.
• Saved millions by optimizing current resource allocation instead of buying new devices.
16. 16
What can Machine Learning Do?
• Optimizing access to treatments such as chemotherapy
• Increase operating room efficiency
• Forecast in-patient bed capacity
• Decrease wait times
• Etc.
17. End to End Process KPI Dashboards
• Waiting Times/Delays at Highly Utilized Flow Steps (ex: ER Wait Time, Outpatient Wait Time, Number of Patients Waiting)
• Patient Arrival and Departure Patterns by Time
• Service Time (ex: Time for a Brain MRI/patient)
• Bed, OR, Staff Capacity Utilization by Services
• Device Availability, Cycle Time, and Throughput
• Current discharge to bed readiness time
31. 32
Adding Metadata Knowledge: Search with Tags
1. Search events with tag in any field: Tag=GLYCEMIC, ASTHMA
2. Search events with tag in a specific field: tag::DX=diabetes type 2
3. Search events with tag using wildcards: Tag=diabetes*
32. Aliases
Normalize field labels to simplify search and correlation
Apply multiple aliases to a single field
Example: Username | cs_username | User → user
Example: pid | patient | patient_id → PATIENTID
Aliases appear alongside original fields
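The alias idea on this slide can be sketched outside of Splunk as well. The following is a minimal Python sketch, not Splunk's own mechanism: it assumes events are plain dictionaries, and it assumes that the last name in each slide example (user, PATIENTID) is the normalized alias. The sample events are hypothetical.

```python
# Alias table based on the slide's two examples; the events are hypothetical.
FIELD_ALIASES = {
    "pid": "PATIENTID",
    "patient": "PATIENTID",
    "patient_id": "PATIENTID",
    "Username": "user",
    "cs_username": "user",
    "User": "user",
}


def apply_aliases(event):
    """Return a copy of the event with alias fields added next to the originals."""
    aliased = dict(event)
    for field, value in event.items():
        alias = FIELD_ALIASES.get(field)
        if alias is not None and alias not in aliased:
            aliased[alias] = value
    return aliased


events = [
    {"pid": "12345", "source": "lab"},
    {"patient_id": "12345", "cs_username": "jdoe", "source": "portal"},
]
normalized = [apply_aliases(e) for e in events]
```

After normalization both events carry a PATIENTID field, so they can be correlated on one key while the original field names stay available.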
33. Event Tagging
Classify and group common events
Capture and share knowledge
Based on search
Use in combination with fields and tags to define event topography
34. Feature Extraction from Texts
1) Regular Expression
2) Natural Language Processing using SDK and REST API
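As an illustration of the regular-expression option, here is a minimal Python sketch that pulls medication mentions out of a free-text note. The note text and the pattern are hypothetical, and the pattern only covers simple "drug dose unit" mentions; real clinical text would need a far richer approach.

```python
import re

# Hypothetical pattern: a word, then a number, then a dose unit.
DOSE_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b",
    re.IGNORECASE,
)


def extract_medications(text):
    """Return one feature dict per medication mention found in the text."""
    return [
        {
            "drug": m.group("drug").lower(),
            "dose": float(m.group("dose")),
            "unit": m.group("unit").lower(),
        }
        for m in DOSE_PATTERN.finditer(text)
    ]


# Hypothetical free-text note.
note = "Pt discharged after knee surgery. Continue ibuprofen 400 mg and Metformin 500mg daily."
features = extract_medications(note)
```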
Do we know what a drug or diagnosis code means, and does it mean the same in different EHRs? Similarly, do we know what an event in an EHR event log means, and does it mean the same in different systems? This last point will be important for comparing process models, as EHRs are so user-customizable. “Check Meds” in one EHR might be called “Medications” in another. What exactly does “Check Meds” mean? Where, exactly, does it fit in a hierarchy of tasks, such as “checking” other things besides medications or involvement of medications in other activities besides “checking”? Is asking a patient about medications (or retrieving the medication list from online) an example of “Check Meds”?
Is there a difference in the ordering and frequency of activities between patients that were treated by either a high- or low-volume surgeon? (control-flow perspective)
Is there a difference in resource involvement between patients that were treated by either a high- or low-volume surgeon? (organisational perspective)
Is there a difference in time-related performance between patients that were treated by either a high- or low-volume surgeon? (performance perspective)
Is there a difference in the ordering and frequency of activities between patients that had surgical continuity and patients that had surgical discontinuity? (control-flow perspective)
Is there a difference in resource involvement between patients that had surgical continuity and patients that had surgical discontinuity? (organisational perspective)
Is there a difference in time-related performance between patients that had surgical continuity and patients that had surgical discontinuity? (performance perspective)
Is there a difference in the ordering and frequency of activities between patients that had a throughput time of 80 minutes or less in the pre-operative examination and 40 minutes or less in the final postoperative examination, and patients with a longer throughput time? (control-flow perspective)
Is there a difference in resource involvement between patients that had a throughput time of 80 minutes or less in the pre-operative examination and 40 minutes or less in the final postoperative examination, and patients with a longer throughput time? (organisational perspective)
Is there a difference in time-related performance between patients that had a throughput time of 80 minutes or less in the pre-operative examination and 40 minutes or less in the final postoperative examination, and patients with a longer throughput time? (performance perspective)
It is apparent that the business processes in the medical domain are dynamic, ad-hoc, unstructured and multi-disciplinary in nature. The goal of clustering is to obtain homogeneous groups of patients.
To frame our discussion, let’s use this example: you call ACME Hospital’s nurse triage line (the 1-800 number) to ask about a medication you were prescribed after being discharged following knee surgery, and then you tweet about your experience. All these events are captured, as they occur, in the machine data.
Each of the underlying systems has the potential to generate millions of machine data events daily. Here we see small excerpts from just some of them.
When we look more closely at the data, we see that it contains valuable information (immediately we see the time you waited on the phone for the triage RN, your patient ID, the radiology ID from your MRI, and the lab ID from when they had blood on standby for your surgery), right down to what was tweeted.
What’s important is, first of all, the ability to see across all these data sources, and then to correlate related events and provide meaningful insight.
If you can correlate and visualize the data, you can build a picture of activity, behavior and experience. And what if you can do all of this in real-time? You can respond more quickly to events that matter.
This example ties into your scenario but you can also extrapolate this example to a wide range of use cases – security and fraud, transaction monitoring and analysis, web analytics, IT operations and so on.
Subjects, locations, users, different data governance rules and standards that may conflict with each other
A defining characteristic of modern health care is the rapidly accelerating increase in information that is available to assist with the delivery of care and system management.
1. Time-oriented data, 2. High diversity, 3. Some data is functional; other data is event logs generated by machines.
Data comes from activities that are part of a sequential process
Data is timestamped
Activities are interdependent discrete events
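These three properties are what make the data usable as an event log. A minimal Python sketch, assuming a hypothetical log of (patient ID, activity, timestamp) rows, shows how timestamped events can be grouped into per-patient sequences and how the time between consecutive activities can be measured. All names and values here are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case/patient ID, activity, timestamp), in arrival order.
EVENT_LOG = [
    ("P1", "Triage", "2024-01-01 09:10"),
    ("P1", "Register", "2024-01-01 09:00"),
    ("P2", "Register", "2024-01-01 09:05"),
    ("P1", "MRI", "2024-01-01 10:00"),
    ("P2", "Triage", "2024-01-01 09:35"),
]


def build_traces(log):
    """Group events by case and order each case's events by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, ts in log:
        cases[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    return {case_id: sorted(events) for case_id, events in cases.items()}


def step_durations(trace):
    """Minutes elapsed between consecutive activities in one trace."""
    return [
        (a, b, (t_b - t_a).total_seconds() / 60)
        for (t_a, a), (t_b, b) in zip(trace, trace[1:])
    ]


traces = build_traces(EVENT_LOG)
p1_steps = step_durations(traces["P1"])
p2_steps = step_durations(traces["P2"])
```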
Machine data is generated by many different sources within the healthcare IT infrastructure. These sources include healthcare-specific data sources such as electronic health record (EHR) systems, HL7 messaging, and connected medical devices. They also include core IT systems that support different applications, such as desktops, servers, storage and network devices. Finally, they include all the patient-facing applications and systems – portals, billing systems, claim management systems.
Machine data generated by this infrastructure shares the core characteristics of big data – a lot of data (high volume), created rapidly (high velocity), from different sources (variety), and data that changes over time (variability). Getting timely and relevant insight into this data can be a source of huge value for the healthcare ecosystem.
Health Level Seven (HL7) International is one of several American National Standards Institute (ANSI) -accredited Standards Developing Organizations (SDOs) operating in the healthcare arena.
HL7 provides standards for interoperability that improve care delivery, optimize workflow, reduce ambiguity and enhance knowledge transfer among all of our stakeholders, including healthcare providers, government agencies, the vendor community, fellow SDOs and patients.
Provides healthcare systems a standard for clinical and administration purposes, particularly between systems/apps responsible for patient care, medical devices, pharmacy, billing, imaging, etc.
Each message starts with an MSH (message header) segment
A message is composed of multiple segments, each ending with a single carriage return.
Each segment begins with a three-letter identifier and contains specific information: EVN (event), PID (patient demographics), PV1 (patient visit)
Each field within a segment carries a specific piece of information
Sometimes the values are coded, so Splunk is ideal for enriching them!
Types of messages: ADT (admission, discharge, transfer), Orders, Reports, etc.
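The structure described above can be sketched in a few lines of Python. This is a simplified illustration, not a full HL7 parser: it assumes the default "|" field separator and "^" component separator, ignores repetition and escape handling, and uses a made-up sample message.

```python
# Hypothetical HL7 v2 message; segments are joined with a carriage return.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|TRIAGE|ACME|EHR|ACME|20240101120000||ADT^A01|MSG0001|P|2.5",
    "EVN|A01|20240101120000",
    "PID|1||12345^^^ACME^MR||DOE^JANE",
    "PV1|1|I|WARD1^101^A",
])


def parse_hl7(message):
    """Split a message into segments keyed by their three-letter identifier."""
    segments = {}
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


parsed = parse_hl7(SAMPLE_MESSAGE)

# In MSH the field separator itself counts as MSH-1, so list index 8 is MSH-9.
message_type = parsed["MSH"][0][8]

# For other segments the list index matches the field number: index 3 is PID-3.
patient_id = parsed["PID"][0][3].split("^")[0]
```

Once the segments and fields are split out like this, coded values can be enriched with lookups, which is the point made on the slide.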
An alert is triggered when the results of the search on which it is based meet certain conditions. Alerts can be based on both historical and real-time searches.
When an alert is triggered, it performs an alert action. This action can be the sending of the alert information to a designated set of email addresses, or the posting of the alert information to an RSS feed. Alerts can also be set up to run a custom script when they are triggered.
You can base these alerts on a wide range of threshold and trend-based scenarios, including empty shopping carts, brute force firewall attacks, and server system errors.
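The condition-then-action pattern can be illustrated with a small, generic Python sketch. This is not Splunk's alerting API: the function name, the search results and the 60-minute threshold are all hypothetical, and a list append stands in for the email, RSS or script action.

```python
def check_alert(results, condition, action):
    """Run `action` on the matching results if any result meets `condition`."""
    matches = [r for r in results if condition(r)]
    if matches:
        action(matches)
    return bool(matches)


# Hypothetical search results: current wait time in minutes per clinic.
results = [
    {"clinic": "ER", "wait_minutes": 95},
    {"clinic": "Dental", "wait_minutes": 20},
]

sent = []  # stands in for an email, RSS post, or custom script
fired = check_alert(results, lambda r: r["wait_minutes"] > 60, sent.append)
```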
By looking at historical demand patterns and operational constraints, sophisticated forecasting algorithms can predict the daily volume and mix of patients and orchestrate appointment slots such that there are no “gaps” between treatments. This radically improves chair utilization, lowers patient waiting times and reduces the overall cost of operations. Doing this without sophisticated data science is hard: for example, the number of orders in which just 70 patients can be slotted for their treatments in a 35-chair infusion center exceeds 10^100, as this analysis shows. Trying to solve this problem with pen, paper or Excel is a pointless exercise.
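The size of that number is easy to check. The notes do not spell out how it was derived; the sketch below assumes it counts the possible orderings of 70 patients, that is 70 factorial.

```python
import math

# Assumption: the figure counts the orderings (permutations) of 70 patients.
orderings = math.factorial(70)

exceeds_googol = orderings > 10**100
digit_count = len(str(orderings))
```

Under that assumption the count does exceed 10^100, while 69 patients would still fall below it.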
Study after study shows that the OR utilization at most large hospitals is at best 50-60 percent. In most hospitals, operating rooms are allocated to surgeons using “blocks” — for simplicity, the blocks are often either half-day or full-day blocks. Even the most prolific and productive surgeons often don’t fully utilize the blocks they are given, and the process for reallocating blocks on a monthly basis or even for last minute block swaps is cumbersome and manual. Using data science and machine learning, hospitals can monitor utilization, identify pockets for improvement, automatically reallocate underutilized blocks, and improve overall operating room utilization. A 3-5 point improvement in block utilization is worth $2 million per year for a surgical suite with just four operating rooms.
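Monitoring block utilization is the first step in that loop. A minimal Python sketch, using a hypothetical block schedule and an assumed 60 percent threshold for flagging blocks as candidates for reallocation:

```python
# Hypothetical block schedule: surgeon, allocated minutes, minutes actually used.
BLOCKS = [
    {"surgeon": "A", "allocated": 480, "used": 430},
    {"surgeon": "B", "allocated": 480, "used": 250},
    {"surgeon": "C", "allocated": 240, "used": 110},
]


def utilization(block):
    """Fraction of the allocated block time that was actually used."""
    return block["used"] / block["allocated"]


def underutilized(blocks, threshold=0.6):
    """Surgeons whose block utilization falls below the assumed threshold."""
    return [b["surgeon"] for b in blocks if utilization(b) < threshold]


overall = sum(b["used"] for b in BLOCKS) / sum(b["allocated"] for b in BLOCKS)
candidates = underutilized(BLOCKS)
```

A real reallocation step would also have to weigh surgeon demand, case length and staffing, which this sketch leaves out.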
Imagine looking at each overnight patient, finding the 1,000 prior patients over the last two years who entered the hospital with a similar diagnostic or procedure code and reviewing their “flight path” through the hospital (i.e., # days spent in each of the units prior to discharge); then, an aggregate probabilistic assessment of the likely occupancy of each unit could be developed. Not only would it provide a better answer for today, it would also help anticipate the evolving unit capacity situation over the next 5-7 days, thereby leading to smarter operational decisions on transfers, elective surgery rescheduling, etc.
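The "flight path" idea can be sketched as follows. This is a simplified Python illustration under stated assumptions: the history is a hypothetical dictionary of prior paths per diagnosis code, similarity means an identical code, and the estimate ignores how long the current patient has already stayed beyond indexing into the prior paths.

```python
from collections import Counter, defaultdict

# Hypothetical history: diagnosis code -> prior "flight paths";
# each path lists the unit occupied on each day of the stay.
HISTORY = {
    "KNEE": [["ICU", "WARD", "WARD"], ["WARD", "WARD"], ["ICU", "ICU", "WARD", "WARD"]],
    "CHF": [["ICU", "ICU"], ["ICU", "WARD", "WARD"]],
}


def unit_probabilities(paths, day):
    """Share of prior patients in each unit on a given day of stay.

    Patients already discharged by that day count toward no unit.
    """
    counts = Counter(path[day] for path in paths if day < len(path))
    return {unit: n / len(paths) for unit, n in counts.items()}


def expected_occupancy(current_patients, horizon):
    """Expected patients per unit for each of the next `horizon` days.

    current_patients: list of (diagnosis_code, days_already_in_hospital).
    """
    forecast = []
    for offset in range(horizon):
        totals = defaultdict(float)
        for code, days_in in current_patients:
            probs = unit_probabilities(HISTORY[code], days_in + offset)
            for unit, p in probs.items():
                totals[unit] += p
        forecast.append(dict(totals))
    return forecast


forecast = expected_occupancy([("KNEE", 0), ("CHF", 1)], horizon=2)
```

Summing the per-patient probabilities gives the expected occupancy of each unit per day, which is the aggregate assessment the paragraph describes.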
VMware – House of Demos app. VM forest, ESX server.
Status of VMs when you click on particular one.
One of the most useful types of visualizations is a “Sankey diagram”, which is used to describe flows through systems.
These can be customer flows through marketing or sales funnels, traffic flows through the actual network, energy flows through a physical system, capital flows through a financial system, etc.
It’s a very streamlined form of visualization that cuts out everything unrelated to “flow”.
Technically, this is a graph visualization: the nodes are smushed to these bars along the side, and edges are represented by these fat bars connecting nodes.
The width of a node is proportional to the volume of flow in and out of the node, and the width of an edge is proportional to the flow from the start node to the end node.
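The quantities a Sankey diagram needs can be computed directly from journeys. A minimal Python sketch, assuming hypothetical patient journeys through hospital units; it only produces the link and node sizes, not the drawing itself, and it takes a node's size as the larger of its inflow and outflow.

```python
from collections import Counter

# Hypothetical patient journeys through hospital units.
JOURNEYS = [
    ["ER", "Radiology", "Ward", "Discharge"],
    ["ER", "Ward", "Discharge"],
    ["ER", "Radiology", "Discharge"],
]


def link_flows(journeys):
    """Count how many journeys pass along each (start node, end node) edge."""
    flows = Counter()
    for path in journeys:
        flows.update(zip(path, path[1:]))
    return flows


def node_volumes(flows):
    """Size each node by the larger of its total inflow and total outflow."""
    inflow, outflow = Counter(), Counter()
    for (src, dst), n in flows.items():
        outflow[src] += n
        inflow[dst] += n
    nodes = set(inflow) | set(outflow)
    return {node: max(inflow[node], outflow[node]) for node in nodes}


flows = link_flows(JOURNEYS)
volumes = node_volumes(flows)
```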
Customer journey: convert, repeat
Mobile Patent Suits
Dashed links are resolved suits; green links are licensing.
“Thomson Reuters published a rather abysmal infographic showing the "bowl of spaghetti" that is current flurry of patent-related suits in the mobile communications industry. So, inspired by a comment by John Firebaugh, I remade the visualization to better convey the network. That company in the center? Yeah, it's the world's largest, so little wonder it has the most incoming suits.”
mbostock’s block #1153292 August 18, 2011
http://bl.ocks.org/mbostock/1153292
One reason for agility is the handling of data at scale using parallel data processing techniques. And lastly, we enable operational integration in two ways: 1) speed of computation, and 2) system integration through REST API support.
Splunk products are being used for data volumes ranging from gigabytes to hundreds of terabytes per day. Splunk software and cloud services reliably collect and index machine data, from a single source to tens of thousands of sources, all in real time. Once data is in Splunk Enterprise, you can search, analyze, report on and share insights from your data. The Splunk Enterprise platform is optimized for real-time, low-latency and interactive use, making it easy to explore, analyze and visualize your data. This is described as Operational Intelligence.
The insights gained from machine data support a number of use cases and can drive value across your organization.
[In North America]
Splunk Cloud is available in North America and offers Splunk Enterprise as a cloud-based service – essentially empowering you with Operational Intelligence without any operational effort.