COVID-19: Building Trust in Data
To Save Lives
Presented by:
Paul Balas
pbalas@303Computing.com
Over 25 Years of Experience Leading Digital
Transformations
Multiple MDM Implementations, Data Governance, and
Data Warehouse Initiatives
● Digital Transformation Consultant
● IT Executive
● Enterprise Architect
● Developer
This is a presentation to uncover the
systemic challenges in our
Government’s response to The COVID
Pandemic through the use of data...
And a proposal to fix it.
The Team!
Mingo Sanchez
Elizabeth Michel
Keith Worfolk
Katie Everett
Kamal Maheshwari
Alexandra-Cosmina Comaniciu
Pooja K Swamy
Evan Hu
Bryan Haagsman, PMP
Agenda
● Why is the Data Wrong?
● Why the New HHS System Won’t
Work
● How To Fix It - a POC
Why is the COVID-19 Data Wrong?
➢ Data Quality is suspect
➢ The CDC had an aging system
➢ The virus is spreading quickly - huge volumes of new
data to process
➢ We aren’t capturing the data we need
Why Do We Need The Data?
➢ Managing our supply chain
(PPE, Testing Supplies, Hospital ICU Beds)
➢ Ensuring we have enough doctors, nurses, and other healthcare
professionals
➢ Issuing protective orders to stop the spread of the virus (Shelter-in-Place,
Social Distancing, Shuttering Businesses)
We’ll be asking ourselves, “Could we have saved more lives?”
But without trusted data,
no decision will be driven
with conviction
Dr. Deborah Birx said in a White House Coronavirus task force meeting -
“There is nothing from the CDC that I can trust.”
The Data Isn’t Standardized - How Many Tests Have
Been Given?
There are literally thousands of
examples of incorrectly
reported data
The CDC System Isn’t Solving The
Problem
The Scope of The Problem
➢ CDC has 1,200 Users and about 950 State CDC
Partners
➢ There are over 6,000 Hospitals in the US
➢ About 2,000 Hospitals provide COVID data directly to
HHS Protect
➢ TeleTracking provides data from about 1,100
Hospitals
(the contract with TeleTracking was a quick way to get more data into the system faster)
How The CDC is Trying to Solve The Problem
The main focus of the new system is speed,
data format standardization, and validation,
not improving Data Governance and Collaboration
Thousands of Organizations
Health and Human Services Goals for Data Governance
“Use of data across programs… remains a challenge.”
“Data are often housed in … data silos”
Even with a standardized platform like
TeleTracking, it’s just a data entry app that is
driven by procedural rules (though it is much
better than handing everyone an instruction
book and Excel)
The main problem in achieving
Data Quality is PEOPLE
Standardized Data Entry Vs. Agile Data Management
And here we have the problem: definitions
that may or may not be adhered to when
data is entered into the system
Medical forms are extremely complex and
require a great deal of training for health
practitioners to get them right
The Fallacy of Standardized Data Entry Solutions
Instructions for TeleTracking
If you’re a manager who believes that a
standardized data entry screen fixes
your organization’s data quality
problems...
I strongly encourage you to speak with
your data scientists or data warehouse
developers
How To Fix It - a POC
The New HHS System Will Fall Short
The new system doesn’t solve the hard problems in achieving
Data Quality:
➢ People efficiently aligning to create standards
➢ Embedding standards into the system (Procedures vs. ML)
Let’s see how we envision a faster approach
Imagine This is Your Problem to Solve
You are the CDO of the CDC, tasked with improving our Nation’s ability
to manage this and the next pandemic through the use of
data
Your first goal is to understand the key issues in the current system
(“as-is”), and develop a roadmap to address them
The stakes are high
Architectural Principles
You outline your key architectural principles to keep the broad team focused on
outcomes
Goal 1 - Build better trust in the data
Goal 2 - Understand which issues to fix first (Prioritize)
Goal 3 - The system should be agile to change
(Days and Weeks, not Months or Years for new
features)
Goal 4 - Efficient e-collaboration
Two Paths
You decide to split the problem in two:
Path 1 - Standardize data entry systems - long path
Path 2 - Build a framework for efficient Data Governance and
do it quickly
Path 2 - The Reference Platform
Master Data Management
Taxonomy Management
Data Quality
Cloud Data Integration
Data Transformation
Orchestration
Natural Language / Feature Extraction
Data Lake
Compute and Storage
POC - Understand Data Issues
You want to focus on the systemic issues in the underlying data flow across all
stakeholders
You choose to look at issues around ‘Testing’ as you believe you can get
immediate benefits for public health if you can build confidence in the test
data
What problems are states having in processing Test Data?
Is Test Data being reported consistently and accurately?
JHU
Johns Hopkins University has become the de facto authority on COVID-19 data,
but did you know they are pulling it from other agencies? What types of problems
do they see?
“The website relies upon publicly available data from multiple sources that are not always consistent in how and when they are released and
updated. States may report components of testing data with different cadences, or they may even change how they report categories of
data over time, all of which can affect calculations of the rate of positivity. For example, some states report testing positives separately from
testing negatives, which may make it appear that 100% of their tests were positive or 100% negative on that day. Also, states have changed how
they count positives and negative test results and may retroactively change the numbers reported.” - JHU COVID Website
If you could classify all the issues across stakeholders, you believe you could have
a tool to get alignment with stakeholders by listening to them through their own
words
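The reporting quirk JHU describes can be caught mechanically. Below is a minimal sketch in Python, assuming a hypothetical list of daily state reports with `positive` and `negative` counts. The field names, the sample figures, and the `flag_suspect_days` helper are illustrative and not part of the POC.

```python
from typing import Optional

def positivity(positive: int, negative: int) -> Optional[float]:
    """Share of reported results that were positive, or None if nothing was reported."""
    total = positive + negative
    return positive / total if total else None

def flag_suspect_days(reports):
    """Return (date, reason) pairs for days whose counts look like split reporting."""
    flagged = []
    for r in reports:
        rate = positivity(r["positive"], r["negative"])
        if rate is None:
            flagged.append((r["date"], "no results reported"))
        elif rate == 1.0:
            flagged.append((r["date"], "positives reported without negatives"))
        elif rate == 0.0:
            flagged.append((r["date"], "negatives reported without positives"))
    return flagged

# Illustrative numbers only, not real state data.
reports = [
    {"date": "2020-07-01", "positive": 120, "negative": 1880},
    {"date": "2020-07-02", "positive": 150, "negative": 0},
    {"date": "2020-07-03", "positive": 0, "negative": 2100},
]
suspect = flag_suspect_days(reports)
```

Flagged days are candidates for a follow-up conversation with the reporting state rather than automatic corrections.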
When is a Test Not a Test?
The CDC, Johns Hopkins, Covid Tracking Project and hundreds of other sites all
deal with testing differently. It seems simple, but even the CDC made this mistake.
“The Centers for Disease Control and Prevention is conflating the results of two different types of
coronavirus tests, distorting several important metrics and providing the country with an inaccurate picture of
the state of the pandemic. We’ve learned that the CDC is making, at best, a debilitating mistake: combining
test results that diagnose current coronavirus infections with test results that measure whether someone has
ever had the virus. The upshot is that the government’s disease-fighting agency is overstating the country’s
ability to test people who are sick with COVID-19.” - Alexis C. Madrigal and Robinson Meyer, May 21, 2020
This is something we want to correct for and monitor in our POC. Can our system
compare test reports from various agencies and help explain why they differ?
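One way to picture such a comparison is a sketch in Python that assumes each agency's total can be broken down by test type. The agency names, the counts, and the `explain_gap` helper are hypothetical.

```python
def explain_gap(report_a: dict, report_b: dict) -> dict:
    """Per test type, how much report_a exceeds report_b (missing types count as 0)."""
    types = sorted(set(report_a) | set(report_b))
    return {t: report_a.get(t, 0) - report_b.get(t, 0) for t in types}

# Illustrative counts only. Agency A combines viral (diagnostic) and antibody tests;
# Agency B counts viral tests alone.
agency_a = {"viral": 90_000, "antibody": 15_000}
agency_b = {"viral": 90_000}

gap = explain_gap(agency_a, agency_b)
total_gap = sum(agency_a.values()) - sum(agency_b.values())
# In this example the whole difference in totals comes from antibody tests.
```

Breaking the gap down by type turns "the totals disagree" into a specific question about which tests each agency is counting.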
DataOps In Action
It’s Going Well
The framework is proving valuable. You can now see the systemic data
quality issues and importantly communicate with stakeholders effectively
to get alignment.
You see an opportunity to do more, quickly: Can we use the system to show
how people in the public eye might influence people to get tested? You
know that will be critical once our testing capacity increases.
You ask the team to classify news outlets by Public Influencers, Events, and
Locations.
Using a traditional tool, this wouldn’t be possible, but you’ve seen how
efficiently you can master and classify data through this platform
Public Influencers - New York
DataOps and More
Your new DataOps system will
provide more than just good data
quality for COVID and other
Pandemics
It will also allow you to conduct
data science experiments to see if
there is a correlation between
public policy actions, infection
rates, and ultimately deaths
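A first experiment of this kind could be as simple as a lagged correlation. The sketch below is in Python and uses made-up weekly series. The stringency index, the infection counts, and the three-week lag are assumptions for illustration, not results from the POC.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(policy, outcome, lag):
    """Correlate policy[t] with outcome[t + lag]."""
    if lag:
        return pearson(policy[:-lag], outcome[lag:])
    return pearson(policy, outcome)

# Made-up weekly series: a policy stringency index and new infections.
stringency = [0, 0, 1, 2, 3, 3, 3, 3]
infections = [100, 140, 200, 260, 240, 180, 120, 90]

same_week = lagged_correlation(stringency, infections, 0)
three_weeks_later = lagged_correlation(stringency, infections, 3)
```

In this toy data the same-week correlation is weakly positive while the three-week lag is strongly negative, which is the kind of pattern that would prompt a proper experiment rather than settle the question.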
What Did You Achieve as CDC CDO?
➢ You delivered a DataOps framework that will expedite realization of data
standards
➢ It puts the power of data governance and master data management into the
hands of the experts at the CDC, HHS, Hospitals and Labs
➢ It complements systems like TeleTracking
➢ It will scale beyond infectious disease data and can serve as a model for HHS
to ensure and promote data quality for all citizens
How Was It Built
[Architecture diagram. Components shown:]
● Internet data sources: Twitter API, News API, State Health Department Web Pages, JHU GitHub
● Google Cloud Platform: BigQuery, VM instances, Google Cloud Storage, Natural Language
● Data orchestration and other tools: InfoWorks, Tamr, Tableau, Python
I Had To Extract Meaning From Text
Tamr - Data Experts Spend More Time Analyzing/Strategizing
Before: Experts spend too much time manually fixing data
Today: ML can do 80% of the data mastering lift, enabling experts to put the final
touches on the last 20%.
The Tamr Agile Approach to Data Mastering
OLD WAY: Rules-based
● Diagram labels: Source data, Rules, Mastered data
● Months 1–4: Identify developers, get business input, write rules, review with business
● Months 5–12+: Modify rules, create exceptions, iterate
● Time: Months to years
● Quality: 60%–80% accuracy
NEW WAY: Machine-driven
● Diagram labels: Source data, Unified data, Mastered data
● Weeks 1–12: Iterate with human-guided machine learning
● Time: Days to weeks
● Quality: 90%+ accuracy
ML vs. Procedural MDM
[Chart: Effort versus Time across the Train and Operate phases, comparing ML with Procedural MDM]
7 Tamr Projects Built in a Few Weeks
Taxonomies: Before vs. After Tamr
Tamr enabled us to create standardized taxonomies that can be managed by a
networked group of hospitals, labs, and health officials
These taxonomies are critical to having good quality and conformed data across a
widely distributed data network
There is an efficient mechanism for building consensus across experts at the same time
as fixing the data
There is no solution like it in the market.
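To make the role of a shared taxonomy concrete, here is a small sketch in Python. It is not Tamr's mechanism. The `STANDARD_TEST_TYPES` mapping, the record fields, and the `conform` helper are hypothetical and only illustrate conforming local labels while routing unknown ones to experts.

```python
# Hypothetical standard taxonomy: local label (lowercased) -> agreed category.
STANDARD_TEST_TYPES = {
    "pcr": "viral",
    "molecular": "viral",
    "antigen": "viral",
    "serology": "antibody",
    "antibody": "antibody",
}

def conform(records, taxonomy=STANDARD_TEST_TYPES):
    """Split records into conformed rows and rows whose label still needs expert review."""
    conformed, needs_review = [], []
    for rec in records:
        label = rec["test_type"].strip().lower()
        if label in taxonomy:
            conformed.append({**rec, "test_type": taxonomy[label]})
        else:
            needs_review.append(rec)
    return conformed, needs_review

# Illustrative records only.
records = [
    {"site": "Lab A", "test_type": "PCR"},
    {"site": "Hospital B", "test_type": " Serology "},
    {"site": "Lab C", "test_type": "rapid swab"},
]
conformed, needs_review = conform(records)
```

The review queue is where consensus gets built: once experts agree on how "rapid swab" should be classified, the mapping grows and the same label conforms automatically next time.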
MDM - People Mastering
Mastering People: 530K to 9K in a Few Days
Using Tamr, I was able to take a corpus of over 500k entity records identified by Google
Natural Language across 60,000 news articles, hundreds of web pages, and thousands of
tweets, and reduce it to about 9k Golden Master People Records, each linked back to every
news article that referenced the person, regardless of spelling or abbreviation
I estimate the system can be maintained in one to two hours a week at scale,
decreasing to minutes a week as the model learns
I don’t even have to monitor it. Tamr can notify me of my quality score and of any
pairs that it’s unsure how to match
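To make the idea of a Golden Master record concrete, here is a toy sketch in Python. It is not how Tamr works (Tamr uses human-guided machine learning). The sketch swaps in plain string similarity from the standard library's `difflib`, and the mentions, the 0.8 threshold, and the helper functions are assumptions for illustration.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """True when the normalized names are at least `threshold` similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def cluster_mentions(mentions, threshold=0.8):
    """Greedy clustering: a mention joins the first cluster whose first member it resembles."""
    clusters = []
    for source, name in mentions:
        for cluster in clusters:
            if similar(name, cluster[0][1], threshold):
                cluster.append((source, name))
                break
        else:
            clusters.append([(source, name)])
    return clusters

# Illustrative mentions only; each keeps a link back to its source document.
mentions = [
    ("article-1", "Dr. Deborah Birx"),
    ("article-2", "Deborah Birx"),
    ("tweet-7", "Dr Deborah Birx"),
    ("article-3", "Jane Doe"),
    ("article-4", "Dr. Jane Doe"),
]
clusters = cluster_mentions(mentions)
```

Each cluster stands in for one master person record with its source links preserved. A fixed threshold like this breaks down quickly on real data, which is where a learned model with expert feedback on uncertain pairs earns its keep.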
People Master Workflow
Conclusions
● The COVID Pandemic data challenges are a macro-view of the same
challenges we all face in our own companies as we use data as information to
improve outcomes
● People need to work together more effectively so we can erase this Pandemic
from our lives
● Trusted data can truly help us save more lives
Thank You!
Paul Balas
303-912-5912
pbalas@303computing.com
http://www.303computing.com/

 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 

COVID-19 - How to Improve Outcomes By Improving Data

  • 1. Presented by: COVID-19: Building Trust in Data To Save Lives
  • 2. Paul Balas, pbalas@303Computing.com. Over 25 years' experience leading digital transformations. Multiple MDM implementations, data governance, and data warehouse initiatives. ● Digital Transformation Consultant ● IT Executive ● Enterprise Architect ● Developer
  • 3. This is a presentation to uncover the systemic challenges in our Government’s response to the COVID Pandemic through the use of data... and a proposal to fix it.
  • 4. The Team! Mingo Sanchez, Elizabeth Michel, Keith Worfolk, Katie Everett, Kamal Maheshwari, Alexandra-Cosmina Comaniciu, Pooja K Swamy, Evan Hu, Bryan Haagsman, PMP
  • 5. Agenda ● Why is the Data Wrong? ● Why the New HHS System Won’t Work ● How To Fix It - a POC
  • 6. Why is the COVID-19 Data Wrong?
  • 7. ➢ Data Quality is suspect ➢ The CDC had an aging system ➢ The virus is spreading quickly - huge volumes of new data to process ➢ We aren’t capturing the data we need
  • 8. Why Do We Need The Data? ➢ Managing our supply chain (PPE, Testing Supplies, Hospital ICU Beds) ➢ Ensuring we have enough doctors, nurses, and other healthcare professionals ➢ Issuing protective orders to stop the spread of the virus (Shelter-in-Place, Social Distancing, Shuttering Businesses)
  • 9. We’ll be asking ourselves, “Could we have saved more lives?” But without trusted data, no decision will be driven with conviction.
  • 10. Dr. Deborah Birx said in a White House Coronavirus task force meeting - “There is nothing from the CDC that I can trust.”
  • 11.
  • 12. The Data Isn’t Standardized - How Many Tests Have Been Given?
  • 13. There are literally thousands of examples of incorrectly reported data
  • 14. The CDC System Isn’t Solving The Problem
  • 15. The Scope of The Problem ➢ CDC has 1,200 Users and about 950 State CDC Partners ➢ There are over 6,000 Hospitals in the US ➢ About 2,000 Hospitals provide COVID data directly to HHS Protect ➢ TeleTracking provides data from about 1,100 Hospitals (the contract with TeleTracking was a quick way to get more data into the system faster)
  • 16. How The CDC is Trying to Solve The Problem: The new system focuses on speed, data format standardization, and validation, not on improving Data Governance and Collaboration.
  • 18. Health and Human Services Goals for Data Governance: “Use of data across programs… remains a challenge.” “Data are often housed in … data silos”
  • 19. Standardized Data Entry Vs. Agile Data Management: Even a standardized platform like TeleTracking is just a data entry app driven by procedural rules (though it is much better than handing everyone an instruction book and Excel). The main problem in achieving Data Quality is PEOPLE.
  • 20. The Fallacy of Standardized Data Entry Solutions (image: Instructions for TeleTracking): And here we have the problem: definitions that may or may not be adhered to when data is entered into the system. Medical forms are extremely complex, and health practitioners need a great deal of training to get them right.
  • 21. If you’re a manager who believes that a standardized data entry screen fixes your organization’s data quality problems... I strongly encourage you to speak with your data scientists or data warehouse developers.
  • 22. How To Fix It - a POC
  • 23. The New HHS System Will Fall Short: The new system doesn’t solve the hard problems in achieving Data Quality: people efficiently aligning to create standards, and embedding those standards into the system (Procedures vs. ML). Let’s see how we envision a faster approach.
  • 24. Imagine This is Your Problem to Solve: You are the CDO of the CDC, tasked with improving our Nation’s ability to manage this pandemic and the next one through the use of data. Your first goal is to understand the key issues in the current system (“as-is”) and develop a roadmap to address them. The stakes are high.
  • 25. Architectural Principles: You outline your key architectural principles to keep the broad team focused on outcomes. Goal 1 - Build better trust in the data. Goal 2 - Understand which issues to fix first (Prioritize). Goal 3 - The system should be agile to change (Days and Weeks, not Months or Years, for new features). Goal 4 - Efficient e-collaboration.
  • 26. Two Paths: You decide to split the problem in two. Path 1 - Standardize data entry systems - long path. Path 2 - Build a framework for efficient Data Governance and do it quickly.
  • 27. Path 2 - The Reference Platform: Master Data Management Taxonomy Management Data Quality Cloud Data Integration Data Transformation Orchestration Natural Language / Feature Extraction Data Lake Compute and Storage
  • 28. POC - Understand Data Issues: You want to focus on the systemic issues in the underlying data flow across all stakeholders. You choose to look at issues around ‘Testing’, as you believe you can get immediate benefits for public health if you can build confidence in the test data. What problems are states having in processing Test Data? Is Test Data being reported consistently and accurately?
  • 29. JHU: Johns Hopkins University has become the de facto authority on COVID-19 data, but did you know it pulls that data from other agencies? What types of problems do they see? “The website relies upon publicly available data from multiple sources that are not always consistent in how and when they are released and updated. States may report components of testing data with different cadences, or they may even change how they report categories of data over time, all of which can affect calculations of the rate of positivity. For example, some states report testing positives separately from testing negatives, which may make it appear that 100% of their tests were positive or 100% negative on that day. Also, states have changed how they count positives and negative test results and may retroactively change the numbers reported.” - JHU COVID Website. If you could classify all the issues across stakeholders, you believe you would have a tool for getting alignment with stakeholders by listening to them in their own words.
  • 30. When is a Test Not a Test? The CDC, Johns Hopkins, the COVID Tracking Project, and hundreds of other sites all deal with testing differently. It seems simple, but even the CDC made this mistake. “The Centers for Disease Control and Prevention is conflating the results of two different types of coronavirus tests, distorting several important metrics and providing the country with an inaccurate picture of the state of the pandemic. We’ve learned that the CDC is making, at best, a debilitating mistake: combining test results that diagnose current coronavirus infections with test results that measure whether someone has ever had the virus. The upshot is that the government’s disease-fighting agency is overstating the country’s ability to test people who are sick with COVID-19.” - Alexis C. Madrigal and Robinson Meyer, May 21, 2020. This is something we want to correct for and monitor in our POC. Can our system compare test reports from various agencies and help explain why they differ?
  • 32. It’s Going Well: The framework is proving valuable. You can now see the systemic data quality issues and, importantly, communicate with stakeholders effectively to get alignment. You see an opportunity to do more, quickly: can we use the system to show how people in the public eye might influence people to get tested? You know that will be critical once our testing capacity increases. You ask the team to classify news outlets by Public Influencers, Events, and Locations. With a traditional tool this wouldn’t be possible, but you’ve seen how efficiently you can master and classify data through this platform.
  • 34. DataOps and More: Your new DataOps system will provide more than just good data quality for COVID and other pandemics. It will also allow you to conduct data science experiments to see if there is a correlation between public policy actions, infection rates, and ultimately deaths.
  • 35. What Did You Achieve as CDC CDO? ➢ You delivered a DataOps framework that will expedite realization of data standards ➢ It puts the power of data governance and master data management into the hands of the experts at the CDC, HHS, Hospitals and Labs ➢ It complements systems like TeleTracking ➢ It will scale beyond infectious disease data and can serve as a model for HHS to ensure and promote data quality for all citizens
  • 36. How Was It Built (architecture diagram). Diagram labels: Google Cloud Platform, InfoWorks, Data orchestration, Tableau, Tamr, BigQuery, VM instances, Google Cloud Storage, Natural Language, Python. Internet data sources: Twitter API, News API, State Health Department Web Pages, JHU GitHub.
  • 37. I Had To Extract Meaning From Text
  • 38.
  • 39. Tamr - Data Experts - Spend More Time Analyzing/Strategizing. Before: Experts spend too much time manually fixing data. Today: ML can do 80% of the data mastering lift, enabling experts to put the final touches on the last 20%.
  • 40. The Tamr Agile Approach to Data Mastering (diagram; axes: Time, Quality). OLD WAY, rules-based: source data, rules, mastered data. Months 1–4: identify developers, get business input, write rules, review with business. Months 5–12+: modify rules, create exceptions, iterate. Months to years; 60%–80% accuracy. NEW WAY, machine-driven: source data, mastered data. Weeks 1–12: iterate with human-guided machine learning. Days to weeks; 90%+ accuracy. Other diagram label: Unified data.
  • 41. ML vs. Procedural MDM (chart: Effort over Time across the Train and Operate phases; curve labels: ML, P)
  • 42. 7 Tamr Projects Built in a Few Weeks
  • 43. Taxonomies: Before vs. After Tamr. Tamr enabled us to create standardized taxonomies that can be managed by a networked group of hospitals, labs, and health officials. These taxonomies are critical to having good-quality, conformed data across a widely distributed data network. There is an efficient mechanism for building consensus across experts at the same time as fixing the data. There is no solution like it in the market.
  • 44. MDM - People Mastering
  • 45. Mastering People: 530K to 9K in a Few Days. Using Tamr, I was able to take a corpus of over 500k entity records, identified by Google Natural Language across 60,000 news articles, hundreds of web pages, and thousands of tweets, and reduce it to about 9k Golden Master People Records, each linked back to every news article that referenced the person, regardless of spelling or abbreviation. I estimate the system can be maintained in one to two hours a week at scale, decreasing to minutes a week as the model learns. I don’t even have to monitor it: Tamr can notify me of my quality score and of any pairs it’s unsure how to match.
  • 47. Conclusions ● The COVID Pandemic data challenges are a macro-view of the same challenges we all face in our own companies as we use data as information to improve outcomes ● People need to work together more effectively so we can erase this Pandemic from our lives ● Trusted data can truly help us save more lives
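
The reporting problem JHU describes on slide 29 (positives and negatives released on different days, so a single day can look 100% positive or 100% negative) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the DailyReport record, the function names, and the numbers are hypothetical and do not come from the presentation, JHU, or any state feed. It shows one possible check, not the method used in the POC: flag days where only one component was reported, and compute positivity over a pooled window instead of per day.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DailyReport:
    """One state's testing report for one day (hypothetical schema)."""
    day: str
    positives: Optional[int]  # None: this component was not reported that day
    negatives: Optional[int]


def naive_positivity(report: DailyReport) -> Optional[float]:
    """Per-day positivity that treats a missing component as zero."""
    pos = report.positives or 0
    neg = report.negatives or 0
    total = pos + neg
    return pos / total if total else None


def incomplete_days(reports: List[DailyReport]) -> List[str]:
    """Days on which exactly one of the two components was reported."""
    return [r.day for r in reports
            if (r.positives is None) != (r.negatives is None)]


def window_positivity(reports: List[DailyReport]) -> Optional[float]:
    """Positivity pooled over the whole window, so split releases even out."""
    pos = sum(r.positives or 0 for r in reports)
    neg = sum(r.negatives or 0 for r in reports)
    total = pos + neg
    return pos / total if total else None


# Made-up numbers: positives arrive on day 1, negatives on day 2,
# and both arrive together on day 3.
reports = [
    DailyReport("2020-07-01", positives=50, negatives=None),
    DailyReport("2020-07-02", positives=None, negatives=950),
    DailyReport("2020-07-03", positives=40, negatives=960),
]

if __name__ == "__main__":
    for r in reports:
        print(r.day, naive_positivity(r))
    print("incomplete days:", incomplete_days(reports))
    print("pooled positivity:", window_positivity(reports))
```

In this made-up example the naive per-day figure swings from 100% to 0% on the first two days, while the pooled figure over the three days is 4.5%. Pooling only hides the cadence problem; the list of incomplete days is what a data steward would take back to the reporting state.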