Geisinger Health System:
Mark Mossel, Director of Data Team Operations
Dhruv Mathrawala, Senior Data Architect
Integrated health services organization
Innovative care delivery models
Serves >3 million residents in 45 counties
>30,000 employees
>1,500 employed physicians
12 hospital campuses
551,000-member health plan
A good first start:
• Data assembled in a central location
• Allowed for self-service
• Could link disparate data
[Diagram: sources feeding the data warehouse: Health Record Data, Surveys, Cardiology, Oncology, Financials, Codesets, External Data, Claims]
“There are too many undocumented data sources.”
“There is no documented understanding of business requirements for CDIS business analytics.”
“We don’t have the transformations that the business users really need.”
“Cannot provide data that is fit for purpose.”
“Data dictionary does not exist today.”
“Can’t ‘match’ from encounters to bills to claim.”
“Much of my group’s time is spent entering data manually.”
“The platform/architecture in place for CDIS analytics is not correct for the types of work being performed.”
“Clinical data quality problems related to patient safety exist.”
“Hierarchies exist at many levels.”
“The level of detail that I need is not there in the data.”
“There are too many pockets of data.”
“The CDIS ‘lift and shift’ model perpetuates the problem with too many views/analytics.”
• If data isn’t accurate, it is worse than nothing.
• Incomplete data isn’t useful.
• Data that isn’t timely is less than desirable.
• When multiple versions of data exist, relying on the wrong value can lead to bad decisions.
• There must be ONE source of truth for data.
• Data without documentation is of questionable value.
Often, the first exposure of new data highlights data quality issues.
A unified data architecture (UDA) is a more comprehensive view of the overall enterprise
architecture; a collection of services, platforms, applications, and tools that help customers
define and deploy an architecture that makes the best use of available technologies to
unleash the optimal value of data. TDWI: Jun 6, 2013
The UDA at Geisinger Health System is the integration of key analytic platforms (e.g., Hadoop, EDW, EHR) with a common semantic layer, all operating under the umbrella of the same Data Governance structure.
• Less expensive due to commodity hardware
– It could be as little as 10% of the cost of our traditional EDW.
• Faster ingestion of data
– Traditional data warehousing uses early binding, so mapping, modeling, etc. are typically done up front. Hadoop's late binding allows data to simply be loaded without detailed analysis and preparation.
• Multiple views of the data
– Our multi-zoned Hadoop system allows for many views of the data, including temporal, modeled, etc.
• Unstructured and semi-structured data
– Unlike traditional analytic platforms, Hadoop is not confined to structured data in discrete fields.
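The late-binding point can be illustrated with a minimal schema-on-read sketch: raw records are stored exactly as received, and a schema is applied only when someone reads them. The sample records, field names, and the `read_with_schema` helper are hypothetical and do not describe Geisinger's actual loading code.

```python
import json
from typing import Any, Callable

# Raw zone: records kept exactly as they arrived (late binding).
# Sample records and field names are hypothetical.
RAW_RECORDS = [
    '{"mrn": "001", "sbp": "142", "note": "BP elevated"}',
    '{"mrn": "002", "sbp": "118"}',
    '{"mrn": "003", "sbp": "n/a", "device": "home cuff"}',
]


def read_with_schema(
    raw_lines: list[str], schema: dict[str, Callable[[Any], Any]]
) -> list[dict[str, Any]]:
    """Bind a schema at read time; values that fail to convert become None."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        rows.append(row)
    return rows


# Two analysts can bind different schemas to the same raw data,
# without any up-front modeling of the stored records.
vitals = read_with_schema(RAW_RECORDS, {"mrn": str, "sbp": int})
devices = read_with_schema(RAW_RECORDS, {"mrn": str, "device": str})
print(vitals)
print(devices)
```

The trade-off is that bad values (here `"n/a"`) surface at query time rather than at load time, which fits the earlier observation that first exposure of new data tends to reveal quality issues.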
THE V’S OF BIG DATA
Controlling Data Volume, Velocity, and Variety
• Volume (scale of data)
– 600 TB
– 184M clinical notes
– 9,000 Epic Clarity tables
– >136,000 patient-participants for exome sequencing
• Velocity (speed of ingestion)
– Late binding: days versus months
– Real-time capabilities
– <2 seconds to search all clinical notes
• Variety (different forms and views)
– Non-traditional sources: home devices, KeyHIE, social media, patient apps
– Device integration, genomics, Lawson
– Multi-zoned
• Veracity (uncertainty of data)
– Encryption at rest
– PHI masked
– Appropriate authentication, authorization, and access
– Single source of truth
• Value (cost and resources)
– $20,000 vs. $500K for 10 TB
– Open source, commodity hardware
– NLP
• ROI: Make the case for open-source software on commodity hardware.
• Change: The SQL team is unfamiliar with the Big Data ecosystem.
• Data load: Load EVERYTHING into Hadoop by building prototypes, not use cases.
• Self-service: Push for self-service as much as possible.
• Adoption: Develop valuable early wins; invest in visualization (e.g., Tableau).
• Data zones: Create separate data zones; split PHI from non-PHI data.
• Surge capacity: Burst to cloud-based options when surge capacity is needed.
PRODUCTION FOOTPRINT
CDIS
Teradata production server
– Version 14.10
– ~13TB compressed
– ~30TB uncompressed
Hadoop production cluster
– Hortonworks Data Platform v2.6
– 30 nodes
– 600TB total
– 200TB usable (3 copies)
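The gap between 600 TB total and 200 TB usable follows from HDFS block replication: with every block stored three times, usable capacity is roughly raw capacity divided by the replication factor. A minimal sketch; it ignores reserved disk and scratch space (which reduce usable space further in practice) and assumes storage is spread evenly across nodes.

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Approximate usable HDFS capacity when each block is stored
    `replication_factor` times. Ignores reserved and scratch space."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return raw_tb / replication_factor


def per_node_raw_tb(raw_tb: float, nodes: int) -> float:
    """Average raw storage per node, assuming an even spread (an assumption)."""
    return raw_tb / nodes


print(usable_capacity_tb(600))   # 200.0
print(per_node_raw_tb(600, 30))  # 20.0
```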
MAJOR DATA SOURCES
Traditional EDW
• Health Record (clinical) data
• Financial
• Claims
• Pulmonary
• Pathology
• Oncology
Hadoop
• All EDW sources, plus:
• Lawson
– Financials, supply chain, accounts payable
• RIS (Radiology)
• Microbiology
• KeyHIE (Health Info Exchange)
• Lab System Data
• Phone Systems
• Lumedx (Cardiology)
LLAP STATISTICS
Configuration
• Running on 10 nodes
• Using 40% of the cluster
• 100 GB cache available
Teradata vs. LLAP
• Queries under 1 minute: 80% of queries performed better than on Teradata
• Queries over 1 minute: 95% of queries performed better than on Teradata
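A comparison like this can be computed by bucketing each query by runtime and counting how often LLAP was faster. A minimal sketch: bucketing on the Teradata runtime is an assumption (the slide does not say which runtime defines "under 1 minute"), and the timings are made up for illustration, not the benchmark data behind the figures above.

```python
from dataclasses import dataclass


@dataclass
class QueryTiming:
    query_id: str
    teradata_s: float  # runtime on Teradata, in seconds
    llap_s: float      # runtime on Hive LLAP, in seconds


def llap_win_rate(
    timings: list[QueryTiming], threshold_s: float = 60.0
) -> dict[str, float]:
    """Share of queries where LLAP beat Teradata, split by whether the
    Teradata runtime was under or over `threshold_s` (assumed bucketing)."""
    buckets: dict[str, list[bool]] = {"under_1_min": [], "over_1_min": []}
    for t in timings:
        key = "under_1_min" if t.teradata_s < threshold_s else "over_1_min"
        buckets[key].append(t.llap_s < t.teradata_s)
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in buckets.items()}


# Hypothetical timings, for illustration only.
sample = [
    QueryTiming("q1", 12.0, 8.5),
    QueryTiming("q2", 45.0, 50.0),
    QueryTiming("q3", 300.0, 90.0),
    QueryTiming("q4", 120.0, 60.0),
]
print(llap_win_rate(sample))  # {'under_1_min': 0.5, 'over_1_min': 1.0}
```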
[Diagram: Epic data flow. Epic Cache (the primary clinical dataset containing patient health records) produces .ext files (the ETL files feeding the clinical reporting database). These load Epic Clarity (the clinical reporting DB), which in turn feeds the EDW (traditional enterprise data warehouse). Hadoop (the new Big Data platform) ingests the .ext files directly.]
Results in data available hours before the traditional EDW.
• More tables loaded nightly
– ~1,100 in Teradata
– ~7,200 in Hadoop
• Incremental EXT files (~3,500 EXT files/night)
• Automated Epic loading process using MapReduce and Java
• Landing Zone
– Source system pushes to landing zone
– Stored separately by source system
– Securely transferred
• Raw Zone
– Auditing, traceability, compliance, and lineage
– New source data is appended, not deleted
– Partitioned by load date
– Compressed
• Refined Zone
– Data still temporal
– Data types match source
– Partitioned by load date
– Organized by business attributes and load date
• Current Zone
– Current snapshot (temporal history is merged to give the latest view)
• Integrated Zone
– Purpose-built datasets for quicker analytics
– Patient/member uniquely identified across systems
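The step from the append-only, load-date-partitioned zones to the Current zone amounts to keeping, for each business key, the most recent version of a record. A minimal in-memory sketch; the record layout (`patient_id`, `load_date`, `address`), the `/raw/...` path, and the table name are hypothetical. On the cluster this would more likely be done in Hive or Spark, for example with a `ROW_NUMBER()` window ordered by load date.

```python
from datetime import date
from typing import Any

# Append-only history: every load adds rows; nothing is updated or deleted.
# Field names and values are hypothetical.
history: list[dict[str, Any]] = [
    {"patient_id": "P1", "load_date": date(2018, 1, 1), "address": "12 Elm St"},
    {"patient_id": "P2", "load_date": date(2018, 1, 1), "address": "4 Oak Ave"},
    {"patient_id": "P1", "load_date": date(2018, 3, 5), "address": "99 Pine Rd"},
]


def partition_path(table: str, load_date: date) -> str:
    """Hive-style partition directory for one load date (hypothetical root)."""
    return f"/raw/{table}/load_date={load_date.isoformat()}"


def current_snapshot(
    rows: list[dict[str, Any]], key: str
) -> dict[str, dict[str, Any]]:
    """Merge temporal history into the latest view: one row per key,
    taken from the most recent load_date."""
    latest: dict[str, dict[str, Any]] = {}
    for row in rows:
        k = row[key]
        if k not in latest or row["load_date"] > latest[k]["load_date"]:
            latest[k] = row
    return latest


snapshot = current_snapshot(history, "patient_id")
print(snapshot["P1"]["address"])                       # 99 Pine Rd
print(partition_path("adt_events", date(2018, 3, 5)))  # /raw/adt_events/load_date=2018-03-05
```

Because the raw history is never overwritten, the Current zone can be rebuilt at any time, and earlier versions remain available for temporal queries.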
• Encryption at rest for Hadoop data
• Authentication/Authorization
– LDAPS and AD integration using Ranger/Knox
• Connections
– SSL endpoint encryption active for all network connections
– ODBC: SSL secured
– JDBC: SSL secured
• Data
– Appropriate access and roles as required. These roles will continue to be defined by the Data Manager or a designee.
– All PHI data will be masked in the Development environment.
• Kerberos authentication: to thwart impersonation threats
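One common way to mask PHI for a development environment is keyed pseudonymization: identifiers are replaced with an HMAC so joins across tables still line up, while direct identifiers are redacted. A minimal sketch; the column classification, the choice of HMAC-SHA256, and the 16-character truncation are assumptions, not a description of Geisinger's masking process, and the key itself must never be stored in the development environment.

```python
import hashlib
import hmac
from typing import Any

# Hypothetical column classification; a real policy comes from data governance.
PSEUDONYMIZE = {"mrn", "member_id"}            # keep joinable, hide the value
REDACT = {"name", "ssn", "phone", "address"}   # drop the value entirely


def mask_record(record: dict[str, Any], key: bytes) -> dict[str, Any]:
    """Return a copy of `record` with PHI pseudonymized or redacted."""
    masked = {}
    for field, value in record.items():
        if field in PSEUDONYMIZE and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]  # truncated token
        elif field in REDACT:
            masked[field] = "***"
        else:
            masked[field] = value
    return masked


# Hypothetical key; in practice load it from a secrets store outside dev.
secret = b"example-key-stored-outside-dev"
a = mask_record({"mrn": "12345", "name": "Jane Doe", "dx": "I10"}, secret)
b = mask_record({"mrn": "12345", "ssn": "000-00-0000"}, secret)
print(a)
print(a["mrn"] == b["mrn"])  # True: the same MRN maps to the same token
```

Using a keyed HMAC rather than a plain hash matters here: MRNs come from a small, guessable space, so an unkeyed hash could be reversed by brute force.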
• Bundled Payments Care Initiative
• Data Model
• De-identification of PHI/BSI
• Natural Language Processing
• Sepsis
• O.R. Workflows
• Bactec
• Social Security Death File
• Supply Chain
• Registries
• MPOG, AAA, Ortho Infection, Ortho Trauma
• Lung Nodules
• Abdominal Aortic Aneurysms
• RetrospectOR
• Check Please
• Problem
• Patients with lung nodules found on imaging are lost to follow-up
• Solution
• Ingestion of data from radiology imaging notes
• NLP
• Value
• Identify lung nodules
LUNG NODULES – TEXT ANALYTICS WORKFLOW
Radiology notes (~10 million notes)
→ NLP and dictionary annotator: annotates with UMLS concept codes
→ Lung nodule filter annotator: identifies lung nodule notes
→ Lung nodule in note?
– No: ~9.7 million notes
– Yes: ~300 thousand notes → Negation annotator → Measurement/Lung-RADS calculator → . . .
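The filter-then-negate structure of this workflow can be sketched with plain regular expressions. This is a deliberately simplified, NegEx-style stand-in: the production pipeline annotates UMLS concept codes with an NLP dictionary annotator, whereas the keyword patterns, negation cues, and 40-character window below are assumptions for illustration. The measurement step here only extracts sizes in millimetres; it does not assign Lung-RADS categories.

```python
import re

# Hypothetical patterns; a real pipeline maps text to UMLS concepts.
NODULE_TERMS = re.compile(
    r"\b(pulmonary|lung)\s+nodules?\b|\bnodules?\b", re.IGNORECASE
)
NEGATION_CUES = re.compile(r"\b(no|without|negative for|free of)\b", re.IGNORECASE)
SIZE_MM = re.compile(r"(\d+(?:\.\d+)?)\s*mm\b", re.IGNORECASE)


def mentions_nodule(note: str) -> bool:
    """Filter step: does the note mention a nodule at all?"""
    return NODULE_TERMS.search(note) is not None


def is_negated(sentence: str, window: int = 40) -> bool:
    """Negation step: a cue within `window` characters before the mention."""
    match = NODULE_TERMS.search(sentence)
    if match is None:
        return False
    preceding = sentence[max(0, match.start() - window):match.start()]
    return NEGATION_CUES.search(preceding) is not None


def positive_nodule_sizes(note: str) -> list[float]:
    """Measurement step: sizes (mm) from sentences with a non-negated mention."""
    sizes: list[float] = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if mentions_nodule(sentence) and not is_negated(sentence):
            sizes.extend(float(s) for s in SIZE_MM.findall(sentence))
    return sizes


notes = [
    "Chest CT: 6 mm nodule in the right upper lobe. No effusion.",
    "No pulmonary nodules identified.",
    "Mild cardiomegaly.",
]
kept = [n for n in notes if mentions_nodule(n)]
print(len(kept))                        # 2
print(positive_nodule_sizes(notes[0]))  # [6.0]
print(positive_nodule_sizes(notes[1]))  # []
```

Filtering first keeps the more expensive negation and measurement steps confined to the small subset of notes that mention a nodule at all.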
LUNG NODULES
                   Actual: Yes       Actual: No
Predicted: Yes     True Positive     False Positive
Predicted: No      False Negative    True Negative
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
Results: precision 0.87, recall 0.95, accuracy 0.91
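The formulas above can be checked with a small class that computes all four metrics from confusion-matrix counts. The counts in the example are hypothetical, chosen only so that the rounded results match the reported values; the source does not give the actual counts. Division by zero is not handled in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self) -> float:
        # Parentheses matter: (TP + TN) is divided by the total count.
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)


# Hypothetical counts, for illustration of the arithmetic only.
c = Confusion(tp=87, fp=13, fn=5, tn=95)
print(round(c.precision(), 2), round(c.recall(), 2), round(c.accuracy(), 2))
# 0.87 0.95 0.91
```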
• Problem
• Patients with abdominal aortic aneurysms (AAA) are lost to follow-up
• Solution
• Ingestion of data from radiology imaging notes
• Use NLP and care-gap closure technologies
• Value
• Ensure proper follow-up
502 patients identified; 23 required urgent surgery
• Use case
• Provide capabilities to perform retrospective analysis of OR data
• Solution
• Ingest key data elements and metrics into a data model on Hadoop
• Provide advanced visualization and drill down capabilities using Tableau
• Value
• Improve OR utilization and quality of care using lessons learned from retrospective analysis
• Scheduled vs. actual analysis
• OR staff summary information
• Various filters to slice and dice the data in different ways
• Next-day data availability
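A scheduled-versus-actual comparison of the kind described above reduces to comparing each case's booked duration with its actual in-room time. A minimal sketch; the `ORCase` fields and the 15-minute tolerance are hypothetical and are not taken from the RetrospectOR data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ORCase:
    # Hypothetical fields; the real data model is not described in the source.
    case_id: str
    room: str
    scheduled_start: datetime
    scheduled_end: datetime
    actual_in_room: datetime
    actual_out_room: datetime

    def scheduled_minutes(self) -> float:
        return (self.scheduled_end - self.scheduled_start).total_seconds() / 60

    def actual_minutes(self) -> float:
        return (self.actual_out_room - self.actual_in_room).total_seconds() / 60


def classify(case: ORCase, tolerance_min: float = 15.0) -> str:
    """Label a case as over, under, or on schedule within a tolerance."""
    delta = case.actual_minutes() - case.scheduled_minutes()
    if delta > tolerance_min:
        return "over"
    if delta < -tolerance_min:
        return "under"
    return "on_time"


cases = [
    ORCase("c1", "OR-1",
           datetime(2018, 5, 1, 7, 30), datetime(2018, 5, 1, 9, 30),
           datetime(2018, 5, 1, 7, 40), datetime(2018, 5, 1, 10, 20)),
    ORCase("c2", "OR-1",
           datetime(2018, 5, 1, 10, 0), datetime(2018, 5, 1, 11, 0),
           datetime(2018, 5, 1, 10, 25), datetime(2018, 5, 1, 11, 15)),
]
print([(c.case_id, classify(c)) for c in cases])
# [('c1', 'over'), ('c2', 'on_time')]
```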
• Use case
• Understand the supply costs associated with OR procedures and their variance by provider/service/location
• Solution
• Ingest key data elements from EMR, Billing and Supply Chain systems
• Provide advanced visualization and drill down capabilities using Tableau
• Value
• Identify areas of greatest potential variance/opportunity to manage costs
• Opportunities to isolate data issues, share best practices across platforms, optimize supply chain costs, and improve processes
• Compare supply cost for multiple providers for the same procedure
• Cost band indicates +/- 1 standard deviation
• Compare cost for the same procedure by surgical role
• Heatmap of cost variance across all service lines
• Heatmap of cost variance by service line
• Can be filtered by lead procedures per case
• Drill-down capability to show implants/explants and supply cost per procedure and per case
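The cost band (mean ± 1 standard deviation) can be computed per procedure, then used to flag providers whose average supply cost falls outside it. A minimal sketch; the provider names and costs are hypothetical, and computing the band over provider averages with the population standard deviation is an assumption, since the source does not say how the dashboard defines it.

```python
from collections import defaultdict
from statistics import mean, pstdev

Case = tuple[str, str, float]  # (procedure, provider, supply cost in USD)

# Hypothetical cases, for illustration only.
cases: list[Case] = [
    ("Total knee", "Dr. A", 4200.0), ("Total knee", "Dr. A", 4400.0),
    ("Total knee", "Dr. B", 5200.0), ("Total knee", "Dr. B", 5400.0),
    ("Total knee", "Dr. C", 3100.0), ("Total knee", "Dr. C", 3300.0),
]


def provider_averages(rows: list[Case]) -> dict[tuple[str, str], float]:
    """Average supply cost per (procedure, provider)."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for procedure, provider, cost in rows:
        grouped[(procedure, provider)].append(cost)
    return {k: mean(v) for k, v in grouped.items()}


def outside_band(rows: list[Case], k: float = 1.0) -> list[Case]:
    """(procedure, provider, average) for providers whose average lies outside
    mean +/- k * SD of the provider averages for that procedure."""
    averages = provider_averages(rows)
    by_procedure: dict[str, list[float]] = defaultdict(list)
    for (procedure, _), avg in averages.items():
        by_procedure[procedure].append(avg)
    flagged = []
    for (procedure, provider), avg in averages.items():
        mu = mean(by_procedure[procedure])
        sd = pstdev(by_procedure[procedure])
        if abs(avg - mu) > k * sd:
            flagged.append((procedure, provider, avg))
    return flagged


print(outside_band(cases))
# [('Total knee', 'Dr. B', 5300.0), ('Total knee', 'Dr. C', 3200.0)]
```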
Geisinger Health System leverages data to improve care

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Geisinger Health System leverages data to improve care

  • 1. Geisinger Health System: Mark Mossel, Director of Data Team Operations; Dhruv Mathrawala, Senior Data Architect
  • 2. Integrated health services organization; innovative care delivery models; serves >3 million residents in 45 counties; >30,000 employees; >1,500 employed physicians; 12 hospital campuses; 551,000-member health plan
  • 3.
  • 4. A good first start: data assembled in a central location; allowed for self-service; could link disparate data. Sources feeding the Data Warehouse: Health Record, Surveys, Cardiology, Oncology, Financials, Codesets, External Data, Claims
  • 5. “There are too many undocumented data sources.” “There is no documented understanding of business requirements for CDIS business analytics.” “We don’t have the transformations that the business users really need.” “Cannot provide data that is fit for purpose.” “Data dictionary does not exist today.” “Can’t ‘match’ from encounters to bills to claim.” “Much of my group’s time is spent entering data manually.” “The platform/architecture in place for CDIS analytics is not correct for the types of work being performed.” “Clinical data quality problems related to patient safety exist.” “Hierarchies exist at many levels.” “The level of detail that I need is not there in the data.” “There are too many pockets of data.” “The CDIS ‘lift and shift’ model perpetuates the problem with too many views/analytics.”
  • 6. • If data isn’t accurate, it is worse than nothing. • Incomplete data isn’t useful. • Data that isn’t timely is less than desirable. • When multiple versions of data exist, relying on the wrong value can lead to bad decisions. • There must be ONE source of truth for data. • Data without documentation is of questionable value. Often, the first exposure of new data highlights data quality issues.
  • 7. A unified data architecture (UDA) is a more comprehensive view of the overall enterprise architecture; a collection of services, platforms, applications, and tools that help customers define and deploy an architecture that makes the best use of available technologies to unleash the optimal value of data. TDWI: Jun 6, 2013 The UDA at Geisinger Health System is the integration of key analytic platforms (e.g., Hadoop, EDW, EHR) with a common semantic layer, all operating under the umbrella of the same Data Governance structure.
  • 8. • Less expensive due to commodity hardware • It could be as little as 10% of the cost of our traditional EDW. • Faster ingestion of data • Traditional data warehousing uses early binding, so any mapping, modeling, etc., is typically done upfront. The late binding of Hadoop allows data to simply be loaded without detailed analysis and preparation. • Multiple views of the data • Our multi-zoned Hadoop system allows for many views of the data, including temporal, modeled, etc. • Unstructured and semi-structured data • Hadoop is not confined to structured data in discrete fields, as is the case with traditional analytic platforms.
  • 9.
  • 10. THE V’S OF BIG DATA: Controlling Data Volume, Velocity, and Variety
  • 11. Volume (scale of data): 600 TB; 184M clinical notes; 9,000 Epic Clarity tables; >136,000 patient-participants for exome sequencing
  • 13. Variety (different forms and views): non-traditional sources; home devices; KeyHIE; social media; patient apps; device integration; genomics; structured data; multi-zoned; Lawson
  • 14. Veracity (uncertainty of data): encryption at rest; PHI masked; appropriate authentication, authorization, and access; single source of TRUTH
  • 16. • ROI: use the open-source, commodity-hardware argument • Change: the SQL team is unfamiliar with the Big Data ecosystem • Data load: load EVERYTHING into Hadoop by building prototypes, not use cases • Self-service: push for self-service as much as possible • Adoption: develop valuable early wins; invest in visualization (e.g., Tableau) • Data zones: create separate data zones; split PHI from non-PHI data • Surge capacity: pop off to cloud-based options when capacity needs surge
  • 17. PRODUCTION FOOTPRINT CDIS Teradata production server: Version 14.10; ~13 TB compressed; ~30 TB uncompressed. Hadoop production cluster: Hortonworks Data Platform v2.6; 30 nodes; 600 TB total; 200 TB usable (3 copies)
  • 18. MAJOR DATA SOURCES Traditional EDW: • Health Record (clinical) data • Financial • Claims • Pulmonary • Pathology • Oncology Hadoop: • All EDW sources, plus: • Lawson (finance, supply chain, A/P) • RIS (Radiology) • Microbiology • KeyHIE (Health Info Exchange) • Lab System Data • Phone Systems • Lumedx (Cardiology)
  • 19. LLAP STATISTICS Configuration: • Running on 10 nodes • Using 40% of the cluster • 100 GB cache available Teradata vs. LLAP: • Queries under 1 minute: 80% of queries performed better on LLAP than on Teradata • Queries over 1 minute: 95% of queries performed better on LLAP than on Teradata
  • 20. Data flow: Epic Cache (primary clinical dataset containing patient health records) produces .ext files (ETL files feeding the clinical reporting database, Epic Clarity), and Clarity in turn feeds the EDW (traditional enterprise data warehouse). Hadoop (the new Big Data platform) ingests the .ext files directly, which results in data being available hours before it reaches the traditional EDW.
  • 21. • More tables loaded nightly: ~1,100 in Teradata vs. ~7,200 in Hadoop • Incremental EXTs (~3,500 EXT files/night) • Automated Epic loading process using MapReduce and Java
  • 22. Zones, in order: Landing Zone → Raw Zone → Refined Zone → Current Zone → Integrated Zone. Landing: • Source system pushes to the landing zone • Stored separately by source system • Securely transferred. Raw and Refined: • Auditing, traceability, compliance, and lineage • New source data is appended, not deleted • Partitioned by load date • Compressed • Data still temporal • Data types match source • Organized by business attributes and load date. Current: • Current snapshot (temporal history is merged to give the latest view). Integrated: • Purpose-built datasets for quicker analytics • Patient/member uniquely identified across systems
  • 23. • Encryption at rest for Hadoop data • Authentication/authorization: LDAPS and AD integration using Ranger/Knox • Connections: SSL endpoint encryption active for all network connections; ODBC – SSL secured; JDBC – SSL secured • Data: appropriate access and roles as required. These roles will continue to be defined by the Data Manager or his designate. All PHI data will be masked in the Development environment. • Kerberos authentication: to thwart impersonation threats
  • 24. • Bundled Payments Care Initiative • Data Model • De-identification of PHI/BSI • Natural Language Processing • Sepsis • O.R. Workflows • Bactec • Social Security Death File • Supply Chain • Registries • MPOG, AAA, Ortho Infection, Ortho Trauma
  • 25. • Lung Nodules • Abdominal Aortic Aneurysms • RetrospectOR • Check Please
  • 26. • Problem • Patients with lung nodules found on imaging are lost to follow-up • Solution • Ingestion of data from radiology imaging notes • NLP • Value • Identify lung nodules
  • 27. LUNG NODULES – TEXT ANALYTICS WORKFLOW: Radiology notes (~10 million notes) → NLP and dictionary annotator (annotates with UMLS concept codes) → Lung nodule filter annotator (identifies lung nodule notes; “Lung nodule in note?” NO: ~9.7 million notes; YES: ~300 thousand notes) → Negation annotator → Measurement/Lung-RADS calculator
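  • 28. Confusion matrix (predicted vs. actual): Predicted Yes & Actual Yes = True Positive (TP); Predicted Yes & Actual No = False Positive (FP); Predicted No & Actual Yes = False Negative (FN); Predicted No & Actual No = True Negative (TN). • Precision = TP / (TP + FP) • Recall = TP / (TP + FN) • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) • Accuracy = (TP + TN) / (TP + TN + FP + FN)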
  • 30. • Problem • Patients with AAA (abdominal aortic aneurysm) are lost to follow-up • Solution • Ingestion of data from radiology imaging notes • Use NLP and care-gap closure technologies • Value • Ensure proper follow-up
  • 31. 502 patients identified; 23 required urgent surgery
  • 32. • Use case • Provide capabilities to perform retrospective analysis of OR data • Solution • Ingest key data elements and metrics into a data model on Hadoop • Provide advanced visualization and drill-down capabilities using Tableau • Value • Improve OR utilization and quality of care using learnings from retrospective analysis
  • 33. • Scheduled vs Actual Analysis • OR Staff Summary Information • Various filters to slice and dice the data in different ways • Next day data availability
  • 34. • Use case • Understand the supply costs associated with OR procedures and variance by provider/service/location • Solution • Ingest key data elements from EMR, Billing and Supply Chain systems • Provide advanced visualization and drill-down capabilities using Tableau • Value • Identify areas of greatest potential variance/opportunity to manage costs • Opportunities for isolation of data issues, best practices across platforms, supply chain cost optimization, and process improvement
  • 35. • Compare supply cost for multiple providers for same procedure • Cost band indicates +/- 1 standard deviation • Compare cost for same procedure by surgical role
  • 36. • Heatmap of cost variance across all service lines • Heatmap of cost variance by service lines • Can be filtered by lead procedures per case • Drill down capability to show implants/explants and supply cost per procedure and per case
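Slide 8 contrasts the early binding of a traditional EDW with Hadoop's late binding, where data is loaded first and interpreted later. A minimal schema-on-read sketch in Python, assuming a pipe-delimited extract; the sample rows, the `SCHEMA` field names, and `read_with_schema` are hypothetical and not Geisinger's loader:

```python
from datetime import date

# Hypothetical raw extract rows, loaded as-is with no upfront modeling (late binding).
RAW_LINES = [
    "1001|2018-03-01|120.50",
    "1002|2018-03-02|98.00",
]

# Illustrative schema, applied only when the data is read; names and types are assumptions.
SCHEMA = [
    ("patient_id", int),
    ("visit_date", date.fromisoformat),
    ("charge", float),
]


def read_with_schema(lines, schema, delimiter="|"):
    """Yield typed records by applying the schema at read time (schema-on-read)."""
    for line in lines:
        values = line.split(delimiter)
        if len(values) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(values)}: {line!r}")
        yield {name: cast(value) for (name, cast), value in zip(schema, values)}


rows = list(read_with_schema(RAW_LINES, SCHEMA))
print(rows[0])
```

Because the raw lines are stored untouched, a different schema can later be applied to the same data without reloading it.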
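The Hadoop figures on slide 17 are consistent with HDFS block replication: with three copies of every block, 600 TB of raw disk yields about 200 TB of usable space. A small Python sketch of that arithmetic; `usable_capacity_tb` is a hypothetical helper, and the per-node figure assumes disk is spread evenly across the 30 nodes:

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Logical HDFS capacity when each block is stored replication_factor times.

    Ignores space reserved for temporary and intermediate data, which a real
    sizing exercise would also subtract.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


raw_total_tb = 600
nodes = 30
print(f"raw per node: {raw_total_tb / nodes:.0f} TB")           # raw per node: 20 TB
print(f"usable: {usable_capacity_tb(raw_total_tb, 3):.0f} TB")  # usable: 200 TB
```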
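Slide 19 reports the share of queries that ran faster on LLAP than on Teradata, split at one minute. A sketch of how such a comparison could be tallied from paired timings, assuming each query is bucketed by its Teradata runtime (the slide does not say which runtime defines the bucket); `QueryTiming`, `share_faster_on_llap`, and the sample timings are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class QueryTiming:
    teradata_s: float  # runtime on Teradata, in seconds
    llap_s: float      # runtime on Hive LLAP, in seconds


def share_faster_on_llap(timings, threshold_s=60.0):
    """Return (share under threshold, share over threshold) of queries faster on LLAP.

    Queries are bucketed by their Teradata runtime (an assumption).
    An empty bucket yields NaN.
    """
    buckets = {"under": [], "over": []}
    for t in timings:
        key = "under" if t.teradata_s < threshold_s else "over"
        buckets[key].append(t.llap_s < t.teradata_s)

    def share(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    return share(buckets["under"]), share(buckets["over"])


sample = [QueryTiming(10, 5), QueryTiming(20, 30), QueryTiming(120, 60), QueryTiming(300, 200)]
print(share_faster_on_llap(sample))  # (0.5, 1.0)
```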
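Slide 22 describes an append-only Raw Zone partitioned by load date and a Current Zone that merges temporal history into the latest view. A minimal in-memory sketch of those two ideas in Python, assuming each record carries a stable key; the dictionary-of-partitions layout, the `patient_id` key, and both function names are hypothetical stand-ins for HDFS/Hive partitions:

```python
from collections import defaultdict
from datetime import date


def append_to_raw(raw_zone, load_date, records):
    """Append a batch under its load-date partition; existing data is never updated or deleted."""
    raw_zone[load_date].extend(records)


def current_snapshot(raw_zone, key="patient_id"):
    """Merge the temporal history into the latest version of each record (Current Zone view)."""
    latest = {}
    for load_date in sorted(raw_zone):  # replay partitions oldest first
        for record in raw_zone[load_date]:
            # A later load (or a later row in the same load) overwrites the earlier version.
            latest[record[key]] = {**record, "load_date": load_date}
    return latest


raw = defaultdict(list)  # load_date -> list of records (in-memory stand-in for partitions)
append_to_raw(raw, date(2018, 3, 1), [{"patient_id": 1, "status": "admitted"}])
append_to_raw(raw, date(2018, 3, 2), [{"patient_id": 1, "status": "discharged"},
                                      {"patient_id": 2, "status": "admitted"}])
snapshot = current_snapshot(raw)
print(snapshot[1]["status"])  # discharged
```

The raw partitions still hold both versions of patient 1, so the temporal history remains available even though the snapshot shows only the latest state.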
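Slide 23 states that all PHI is masked in the Development environment but does not say how. One common approach is deterministic pseudonymization with a keyed hash, sketched here in Python; the `PHI_FIELDS` set, the key, and `mask_record` are hypothetical, and this is not a description of Geisinger's or Apache Ranger's masking mechanism:

```python
import hashlib
import hmac

# Hypothetical PHI column names; a real list would come from Data Governance metadata.
PHI_FIELDS = {"name", "mrn", "birth_date", "phone"}


def mask_record(record: dict, secret: bytes) -> dict:
    """Return a copy of record with PHI values replaced by a truncated HMAC-SHA256 token.

    The same input always maps to the same token, so masked dev tables can still be joined.
    """
    masked = {}
    for field, value in record.items():
        if field in PHI_FIELDS and value is not None:
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            masked[field] = digest[:16]
        else:
            masked[field] = value
    return masked


dev_row = mask_record({"mrn": "123456", "name": "Jane Doe", "encounter_type": "OP"}, b"dev-only-key")
print(dev_row)
```

Because the token is deterministic, the same MRN masks to the same value in every table, so joins still work in development; the key itself must stay out of the development environment for the masking to hold.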
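The formulas on slide 28 translate directly into code. A short Python sketch; `classification_metrics` is a hypothetical helper, and the counts are invented only to exercise the arithmetic:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Parentheses matter: tp + tn / total would divide only tn.
    accuracy = (tp + tn) / total
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a sample of 1,000 reviewed notes.
print(classification_metrics(tp=80, fp=20, fn=20, tn=880))
```

With these counts accuracy is 0.96; writing `tp + tn / (tp + tn + fp + fn)` without parentheses would instead give 80.88, because the division binds before the addition.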
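Slide 35 compares supply cost across providers for the same procedure against a band of ±1 standard deviation. A Python sketch under stated assumptions: the band is computed over all cases of one procedure using the population standard deviation, each provider's mean case cost is compared to it, and the provider names, costs, and function names are invented:

```python
from statistics import mean, pstdev


def cost_band(costs):
    """Return (mean - 1 SD, mean, mean + 1 SD) for a list of supply costs."""
    mu = mean(costs)
    sd = pstdev(costs)  # population SD; the slide does not say which SD is used
    return mu - sd, mu, mu + sd


def providers_outside_band(case_costs):
    """case_costs maps provider -> per-case supply costs for ONE procedure.

    The band is computed over all cases; each provider's average is compared to it.
    """
    all_costs = [c for costs in case_costs.values() for c in costs]
    low, _, high = cost_band(all_costs)
    averages = {provider: mean(costs) for provider, costs in case_costs.items()}
    return {p: avg for p, avg in averages.items() if not low <= avg <= high}


# Hypothetical per-case supply costs (USD) for one procedure.
costs = {"Provider A": [100, 110], "Provider B": [100, 90], "Provider C": [300, 320]}
print(providers_outside_band(costs))
```

For these numbers the band runs from roughly 71 to 269 USD, so only Provider C (mean 310) is flagged.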

Editor's Notes

  1. Brief introduction about Geisinger
  2. EHR in the mid-90s. By 2006, leadership wanted an EDW. CDIS (Clinical Decision Intelligence System) went live in 2008. A big win early: few healthcare organizations had this integration platform at the time. Internally, departments (e.g., research) no longer had to request extracts from Epic for analytics. One platform of data (clinical, financial, claims) for analytics, to transform the delivery of care. It has gone through a number of iterations and currently supports much of the analytics running our day-to-day operations, with over 2,100 users. In 2012, switched to Teradata (higher performance). In 2016, UDA: integrate all key analytics platforms (Hadoop, Cerner, Epic EDW).
  3. Next phase of our analytics platform: Hadoop (Big Data)
  4. Late binding of Hadoop allows for the data to simply be loaded without detailed analysis and preparation up-front.
  5. Our multi-zoned Hadoop system allows for many views of the data, including temporal, modeled, etc. Hadoop is not confined to structured data in discrete fields, as is the case with traditional analytic platforms.
  6. LDAP and AD integration using Ranger/Knox. Encryption at rest. SSL endpoint encryption active for all network connections. Kerberos authentication: to thwart impersonation threats. Appropriate access and roles as required; these roles will continue to be defined by the Data Manager or his designate. All PHI data will be masked in the Development environment.
  7. Less costly hardware for storing increasing data (structured and unstructured). $5 million to purchase new Teradata hardware. Prevent “one-off” data systems (e.g., IoT data capture, ICU real-time data capture, cybersecurity).
  8. Lung nodules are commonly identified in free text within radiology reports and can easily be lost to follow-up, with potential for delayed cancer diagnosis. A treasure trove of useful, relevant, and unstructured clinical information, in the form of text blobs and semi-templated data, is locked inside EHRs. We used Solr, part of the Apache Hadoop ecosystem, to expose the data and let users perform rapid searches. This gives the ability to sort through over 184M clinical notes across 20 years' worth of inpatient and outpatient records, and serves as a framework to run cTAKES and other Natural Language Processing programs to find signal in the text noise and make the data actionable.
  9. UMLS: Unified Medical Language System. Negations: nearly 30% of identified lung nodule notes are negative results. The NLP engine constructs a grammar tree and associates negation words with the identified lung nodule text. Calculate Lung-RADS scores based on nodule size and description. Future tasks: measure the accuracy of predicted Lung-RADS scores and improve performance.
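Note 9 describes a negation annotator that uses a grammar tree to link negation words to nodule mentions. As a much simpler stand-in, a NegEx-style rule looks for a negation cue shortly before each mention. The Python sketch assumes a hand-picked cue list and a 40-character look-back window; `classify_note` is hypothetical and is not the cTAKES annotator:

```python
import re

# Simplified NegEx-style cue list (an assumption); cTAKES uses its own negation annotator.
NEGATION_CUES = re.compile(r"\b(?:no evidence of|negative for|free of|without|no)\b", re.IGNORECASE)
NODULE = re.compile(r"\bnodules?\b", re.IGNORECASE)


def classify_note(text: str, window: int = 40) -> str:
    """Return 'positive', 'negated', or 'none' for nodule mentions in a radiology note."""
    mentioned = False
    for match in NODULE.finditer(text):
        mentioned = True
        # Look back a fixed number of characters, but only within the current clause.
        preceding = text[max(0, match.start() - window):match.start()]
        clause = re.split(r"[.;:]", preceding)[-1]
        if not NEGATION_CUES.search(clause):
            return "positive"  # at least one affirmed mention
    return "negated" if mentioned else "none"


print(classify_note("No pulmonary nodules identified."))         # negated
print(classify_note("No effusion. A 5 mm nodule is seen."))      # positive
```

A fixed window misreads phrases such as "no change in the nodule" as negations and does not distinguish lung nodules from, say, thyroid nodules, which is the kind of error a syntax-aware annotator is meant to avoid.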