3. The Fourth Hurdle Requires Real World Evidence from RWD
Cost-effectiveness (or CER) has become the “fourth hurdle” to market access
4. Real World Big Data
• Complexity
• Variety: unstructured data types, e.g. clinical notes
• Volume: massive data sets, e.g. longitudinal claims/EMR
• Velocity: fast, real-time data collection and transmission, e.g. HIE, wearables
5. Volume: Real World Population and Real World Data
• Real World Evidence (RWE) evaluates safety, effectiveness and outcomes using real world data (RWD).
• Not RCT data and broader than observational data, RWD is health data collected from actual practice by healthcare providers or in day-to-day situations by patients or caregivers.
[Figure: number of patients (#patients, axis ticks from 100 to 10,000,000) across Phase 1, Phase 2, Phase 3, Phase 4, 5 yrs and 10 yrs. Typical pharma data covers the randomized clinical trial population and the observational study population; real world data covers the real world population.]
6. Variety: Major Real World Data Types and Sources
• Claims (from payers or data vendors): Truven (MarketScan), IMS (PharMetrics),
United Health Group (Optum), Wellpoint, Aetna, Humana, CMS, ...
• EMR/EHR (from Healthcare providers or EMR vendors):
Nation-wide: VA, DoD, GE Centricity, Allscripts, Cerner, Humedica, Flatiron, etc…
Regional: Kaiser, Regenstrief, Partners, Mayo, Intermountain, Geisinger, ...
Academic: Harvard, Univ of Utah, Vanderbilt, Cincinnati Children's Hospital, ...
• Surveys and registries: NCHS (NHANES, NHIS, NAMCS, NHAMCS, NSAS, NHDS, NNHS, NNAS, etc.), SEER registries, MEPS, ACC registries, ...
• PBM/Pharmacy Databases: Medco, Walgreens, CVS, Walmart, …
• Lab databases: Quest, LabCorp, …
• PHRs: patient portals, MS HealthVault™, Indivo X, CMS PHR Pilots, …
• Patient forums/social media: PatientsLikeMe, inspire.com, smartpatients.com, …
• Monitoring/wearables: medical device data, Apple ResearchKit, …
8. Complexity, Variability, Veracity
• Patient journeys are complex
- Real-world treatment pathways can be messy
- Physicians not following clinical practice guidelines
- Patients not adhering to medications
• Treatment pathways are difficult to reconstruct using healthcare data:
- Technical hurdles: need to repeatedly query and merge across a large number of tables
- Conceptual hurdles of secondary use: claims are recorded for transactions, EMRs for patient care
9. Typical Solutions
• Use business rules to translate data to events of interest
- Example: ndMM (newly diagnosed multiple myeloma) patient cohort
One inpatient diagnosis or two outpatient diagnoses (on two separate dates), from a list of ICD-9 codes
One or more MM-specific treatments, from a list of drugs and procedures
First diagnosis: “index date”
At least 6 or 12 months of continuous coverage before the index date
At least 12 or 24 months of continuous coverage after the index date
- What is a therapy line?
- What is a drug switch, discontinuation, add-on, combo, “drug holiday”?
• Addresses some parts of the conceptual challenge
• Creates new problems
- How sensitive are our results to the rule definitions?
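The ndMM cohort rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code lists are hypothetical placeholders (the slides do not enumerate the ICD-9 or drug codes), each patient is assumed to have a single continuous enrollment span, a month is approximated as 30 days, and the `Claim` and `Enrollment` types are invented for the example. Running the same rule with different coverage windows gives a rough feel for the sensitivity question raised above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical placeholder code lists; a real study would use curated lists.
MM_DIAGNOSIS_CODES = {"DX_MM_A", "DX_MM_B"}
MM_TREATMENT_CODES = {"RX_MM_A", "RX_MM_B"}

@dataclass(frozen=True)
class Claim:
    service_date: date
    setting: str  # "inpatient" or "outpatient"
    code: str     # diagnosis, drug, or procedure code

@dataclass(frozen=True)
class Enrollment:
    start: date   # assumed: one continuous coverage span per patient
    end: date

def index_date(claims):
    """Return the first MM diagnosis date, or None if the diagnosis rule fails."""
    dx = [c for c in claims if c.code in MM_DIAGNOSIS_CODES]
    has_inpatient = any(c.setting == "inpatient" for c in dx)
    outpatient_dates = {c.service_date for c in dx if c.setting == "outpatient"}
    if not (has_inpatient or len(outpatient_dates) >= 2):
        return None
    return min(c.service_date for c in dx)

def in_cohort(claims, enrollment, pre_months, post_months):
    """Apply the diagnosis, treatment, and continuous-coverage rules."""
    idx = index_date(claims)
    if idx is None:
        return False
    treated = any(c.code in MM_TREATMENT_CODES for c in claims)
    # A month is approximated as 30 days for this sketch.
    covered_before = enrollment.start <= idx - timedelta(days=30 * pre_months)
    covered_after = enrollment.end >= idx + timedelta(days=30 * post_months)
    return treated and covered_before and covered_after

# Toy patients, invented for illustration only.
patients = {
    "P1": ([Claim(date(2012, 3, 1), "outpatient", "DX_MM_A"),
            Claim(date(2012, 4, 1), "outpatient", "DX_MM_A"),
            Claim(date(2012, 4, 15), "outpatient", "RX_MM_A")],
           Enrollment(date(2011, 8, 1), date(2014, 6, 30))),
    "P2": ([Claim(date(2012, 5, 10), "inpatient", "DX_MM_B"),
            Claim(date(2012, 6, 1), "outpatient", "RX_MM_B")],
           Enrollment(date(2010, 1, 1), date(2013, 8, 31))),
    "P3": ([Claim(date(2012, 7, 1), "outpatient", "DX_MM_A"),
            Claim(date(2012, 7, 20), "outpatient", "RX_MM_A")],
           Enrollment(date(2010, 1, 1), date(2015, 1, 1))),
}

# Sensitivity check: cohort size under each (pre, post) coverage variant.
variants = [(6, 12), (12, 12), (6, 24), (12, 24)]
sizes = {
    v: sum(in_cohort(claims, enr, *v) for claims, enr in patients.values())
    for v in variants
}
print(sizes)
```

Even on three toy patients the cohort shrinks as the coverage windows widen, which is the kind of rule sensitivity the slide warns about.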
10. Potential Technical Solution: Hadoop and MapReduce
• Hadoop: an open source software project
- Hadoop Distributed File System (HDFS)
- MapReduce: compute paradigm for parallel computing
- A whole ecosystem of additional products/services/tools
• History:
- 2003 Google File System paper
- 2004 Google MapReduce paper
- Adopted by Yahoo, donated to the open source community in 2009
• The gist of it:
- Distributed file system, “cheap” storage on computer clusters
- Compute paradigm that abstracts the parallelism by breaking down operations into “map” and “reduce”
- Hadoop framework takes care of everything else
11. MapReduce in a Nutshell
• Mappers work on data and “emit” key-value pairs
• Shuffle-Sort: intermediate data is sorted and distributed by key
• Reducer works on all values (data) for the same key
• We write Mappers and Reducers; Hadoop takes care of everything else
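To make the three phases concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: `run_mapreduce`, `claim_count_mapper`, and `sum_reducer` are hypothetical names, and the in-memory dictionary merely stands in for the distributed shuffle-sort that the framework would perform across a cluster.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the map, shuffle-sort, and reduce phases."""
    groups = defaultdict(list)
    # Map: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Shuffle-sort: values are grouped by key and keys are visited in order.
    # Reduce: one call per key, with all values for that key.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}

def claim_count_mapper(claim):
    # Emit the patient ID as the key and 1 as the value.
    yield claim["patient_id"], 1

def sum_reducer(key, values):
    # Sum all values emitted for one key.
    return sum(values)

# Toy records, invented for illustration.
claims = [
    {"patient_id": "P2", "code": "A"},
    {"patient_id": "P1", "code": "B"},
    {"patient_id": "P2", "code": "C"},
]
counts = run_mapreduce(claims, claim_count_mapper, sum_reducer)
print(counts)
```

Only the mapper and reducer are problem-specific; everything inside `run_mapreduce` corresponds to the part the framework handles.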
12. Building Patient Timelines using Hadoop and MapReduce
• Load data into HDFS
- “Transactional” data (claims, interactions)
• Reconstructing a patient’s timeline is a textbook MapReduce exercise:
- Mapper:
Read a piece of data. Example: claim
Figure out who it relates to. Example: patient ID
Return key-value pairs:
Key: patient ID
Value: the full piece of information (claim)
- Reducer:
Gets as input a key and the set of all values (claims) associated with that key (patient ID)
Organizes the values (claims) to produce a basic patient history
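The mapper and reducer described on this slide might look roughly like the following, written in the line-oriented style that Hadoop Streaming uses (tab-separated key and value, reducer input sorted by key). The sketch makes several assumptions that are not in the slides: a comma-separated claim layout of `patient_id,date,code`, ISO-formatted dates, and the function names `map_claims` and `reduce_timelines`. A local `sorted()` call stands in for Hadoop's shuffle-sort.

```python
from itertools import groupby

def map_claims(lines):
    """Mapper: emit 'patient_id<TAB>claim' for each raw claim line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        patient_id = line.split(",")[0]  # assumed layout: patient_id,date,code
        yield f"{patient_id}\t{line}"

def reduce_timelines(sorted_lines):
    """Reducer: input is sorted by key, so each patient's claims are contiguous."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for patient_id, group in groupby(pairs, key=lambda kv: kv[0]):
        claims = [value.split(",") for _, value in group]
        claims.sort(key=lambda fields: fields[1])  # ISO dates sort chronologically
        yield patient_id, [(day, code) for _, day, code in claims]

# Toy claim lines, invented for illustration.
raw = [
    "P2,2014-03-01,DRUG_B",
    "P1,2014-01-15,DX_1",
    "P2,2014-01-20,DRUG_A",
    "P1,2014-02-10,DRUG_A",
]
shuffled = sorted(map_claims(raw))  # stands in for Hadoop's shuffle-sort
timelines = dict(reduce_timelines(shuffled))
for patient_id, events in timelines.items():
    print(patient_id, events)
```

In a real streaming job the two functions would read from standard input and write to standard output in separate scripts; they take iterables here so the sketch can run on its own.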
13. Building Patient Timelines using MapReduce Followed by Visual Analytics
[Figure: Mapper → Shuffle-Sort (“Hadoop magic”) → Reducer]
14. Treatment Cost Trends
Cost analysis of psoriatic arthritis (PsA) and psoriasis (PsO) treatments
Biologic treatment costs have been high and are rising
Presented as posters at AMCP and ISPOR 2015
18. Future Directions
• Cost of care analysis, comparing across different pathways
• Healthcare resource utilization analysis, comparing across different pathways
• Patterns of care analysis: predictive modeling combining patient similarity measures and clustering
• Comparison to Clinical Practice Guidelines (Compliance and Adherence)
• Outcomes of care/CER: incorporating clinical outcomes using integrated claims/EMR data
19. Some Learning Points
• Some Hadoop functionality is perfectly suited for patient timeline analysis
- MapReduce for creating patient timelines
- Once patient timelines are created, everything else scales linearly
- Map(Reduce) for calculating patient metrics and complex events
- MapReduce for analyzing treatment pathways
• Cheap, scalable storage capacity and compute power
- Scalability allows robust analysis
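As a rough illustration of the pathway analysis mentioned in these learning points: once timelines exist, each patient can be processed independently (the map step), and the per-patient results can then be aggregated (the reduce step). The sketch below assumes timelines are lists of `(date, drug)` pairs with ISO dates, and it uses a deliberately simplistic, hypothetical definition of a pathway (the sequence of drugs with consecutive repeats collapsed). Real definitions of therapy lines, switches, and add-ons would need the business rules discussed earlier.

```python
from collections import Counter

def pathway(timeline):
    """Collapse a list of (date, drug) events into an ordered drug sequence."""
    drugs = []
    for _, drug in sorted(timeline):  # ISO dates sort chronologically
        if not drugs or drugs[-1] != drug:  # a change of drug starts a new step
            drugs.append(drug)
    return " > ".join(drugs)

# Toy timelines, invented for illustration.
timelines = {
    "P1": [("2014-01-10", "DRUG_A"), ("2014-02-10", "DRUG_A"),
           ("2014-06-01", "DRUG_B")],
    "P2": [("2014-03-01", "DRUG_A"), ("2014-09-15", "DRUG_B")],
    "P3": [("2014-05-05", "DRUG_A")],
}

# Map: one pathway per patient, computed independently of other patients.
pathways = {pid: pathway(events) for pid, events in timelines.items()}

# Reduce: count how many patients follow each pathway.
pathway_counts = Counter(pathways.values())
print(pathway_counts.most_common())
```

Because the map step touches one patient at a time, the work grows in proportion to the number of patients, which is consistent with the linear-scaling point above.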
20. Healthcare Decision Making Requires Real-world Big Data Analytics
Efficacy and safety from RCT settings – FDA to approve
Cost effectiveness – payers' willingness to pay
Clinical effectiveness (long-term efficacy and safety) – physicians to prescribe, patients to adhere
Comparative effectiveness, patient-reported outcomes – physicians to prescribe, patients to adhere
• To Innovate: Industry
• To Approve: FDA
• To Pay for: Health Plan, IDS, Government
• To Prescribe: Physician
• To Adhere: Patient
21. Forthcoming
Thank You!
Leveraging Hadoop MapReduce in Building Patient Timelines and Analyzing
Health Resource Utilization
Special Issue on Big Data in Pharmacoeconomics
Saar Golde, Ph.D., Knowledgent Group and NYU
Zhaohui “John” Cai, M.D. Ph.D., Celgene Corporation