Big Data Analytics for Treatment Pathways John Cai



  2. Big Data for Pharma Decision Making
  3. The Fourth Hurdle Requires Real World Evidence from RWD
     • Cost-effectiveness (or CER) has become the “fourth hurdle” to market access
  4. Real World Big Data Complexity
     • Variety: unstructured data types, e.g. clinical notes
     • Volume: massive data sets, e.g. longitudinal claims/EMR
     • Velocity: fast, real-time data collection and transmission, e.g. HIE, wearables
  5. Volume: Real World Population and Real World Data
     • Real World Evidence (RWE) evaluates safety, effectiveness and outcomes using real world data (RWD).
     • RWD is not RCT data and is broader than observational data: it is health data collected from actual practice by healthcare providers, or in day-to-day situations by patients or caregivers.
     [Figure: number of patients (100 to 10,000,000) plotted against Phase 1 through Phase 4, 5 yrs and 10 yrs. Labels: Randomized Clinical Trial Population, Observational Study Population, Real World Population; Typical Pharma Data, Real World Data.]
  6. Variety: Major Real World Data Types and Sources
     • Claims (from payers or data vendors): Truven (MarketScan), IMS (PharMetrics), UnitedHealth Group (Optum), WellPoint, Aetna, Humana, CMS, ...
     • EMR/EHR (from healthcare providers or EMR vendors):
       - Nationwide: VA, DoD, GE Centricity, Allscripts, Cerner, Humedica, Flatiron, ...
       - Regional: Kaiser, Regenstrief, Partners, Mayo, Intermountain, Geisinger, ...
       - Academic: Harvard, Univ of Utah, Vanderbilt, Cincinnati Children's Hospital, ...
     • Surveys and registries: NCHS (NHANES, NHIS, NAMCS, NHAMCS, NSAS, NHDS, NNHS, NNAS, etc.), SEER registries, MEPS, ACC registries, ...
     • PBM/pharmacy databases: Medco, Walgreens, CVS, Walmart, ...
     • Lab databases: Quest, Labcorp, ...
     • PHRs: patient portals, MS HealthVault™, Indivo X, CMS PHR Pilots, ...
     • Patient forums/social media: PatientsLikeMe, ...
     • Monitoring/wearables: medical device data, Apple ResearchKit, ...
  7. Velocity: Real World Data Transmission to Pharma
     [Diagram labels: Payer/PBM, Real World Data and Pharma, with two links marked “?”. Pharma uses listed: CER, EBM, proactive pharmacovigilance, trial design & interpretation, PHC, cost effectiveness, drug repositioning/new indications, patient recruitment.]
  8. Complexity, Variability, Veracity
     • Patient journeys are complex
     • Real-world treatment pathways can be messy
     • Physicians do not always follow clinical practice guidelines
     • Patients do not always adhere to medications
     Treatment pathways are difficult to reconstruct using healthcare data:
     • Technical hurdles: need to repeatedly query and merge across a large number of tables
     • Conceptual hurdles of secondary use
       - Claims are collected for transactions
       - EMRs are collected for patient care
  9. Typical Solutions
     • Use business rules to translate data to events of interest
       - Example: ndMM (newly diagnosed multiple myeloma) patient cohort
         ▪ One inpatient diagnosis or two outpatient diagnoses (two separate dates) → list of ICD-9 codes
         ▪ One or more MM-specific treatments → list of drugs and procedures
         ▪ First diagnosis: “index date”
         ▪ At least 6 or 12 months continuous coverage before index date
         ▪ At least 12 or 24 months continuous coverage after index date
         ▪ What is a therapy line?
         ▪ What is a drug switch, discontinuation, add-on, combo, “drug holiday”?
     • Addresses some parts of the conceptual challenge
     • Creates new problems
       - How sensitive are our results to the rule definitions?
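The cohort rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Diagnosis` record, the function names, and the use of day counts (183 and 365) as stand-ins for "6 months" and "12 months" are all hypothetical, and the diagnoses are assumed to be pre-filtered to the relevant ICD-9 code list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical record; assumed already filtered to the ICD-9 code list."""
    patient_id: str
    service_date: date
    setting: str  # "inpatient" or "outpatient"


def qualifies_by_diagnosis(dx):
    """One inpatient diagnosis, or two outpatient diagnoses on separate dates."""
    if any(d.setting == "inpatient" for d in dx):
        return True
    outpatient_dates = {d.service_date for d in dx if d.setting == "outpatient"}
    return len(outpatient_dates) >= 2


def continuously_covered(enrollment, start, end):
    """True if the (span_start, span_end) pairs cover [start, end] with no gap."""
    covered_through = start - timedelta(days=1)
    for span_start, span_end in sorted(enrollment):
        if span_start > covered_through + timedelta(days=1):
            break  # a gap begins before the window is fully covered
        covered_through = max(covered_through, span_end)
    return covered_through >= end


def in_cohort(dx, has_mm_treatment, enrollment, pre_days=183, post_days=365):
    """Apply the rules; pre_days/post_days are assumed proxies for months."""
    if not dx or not has_mm_treatment or not qualifies_by_diagnosis(dx):
        return False
    index = min(d.service_date for d in dx)  # first diagnosis = index date
    return continuously_covered(
        enrollment,
        index - timedelta(days=pre_days),
        index + timedelta(days=post_days),
    )


dx = [
    Diagnosis("p1", date(2012, 3, 1), "outpatient"),
    Diagnosis("p1", date(2012, 4, 15), "outpatient"),
]
enrollment = [
    (date(2011, 1, 1), date(2011, 12, 31)),
    (date(2012, 1, 1), date(2013, 12, 31)),
]
print(in_cohort(dx, True, enrollment))                 # 12-month follow-up rule
print(in_cohort(dx, True, enrollment, post_days=730))  # 24-month follow-up rule
```

Keeping the look-back and follow-up windows as parameters makes it straightforward to rerun the selection under different rule definitions, which is one way to probe the sensitivity question the slide raises.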
  10. Potential Technical Solution: Hadoop and MapReduce
      • Hadoop: an open source software project
        - Hadoop Distributed File System (HDFS)
        - MapReduce: compute paradigm for parallel computing
        - A whole ecosystem of additional products/services/tools
      • History:
        - 2003: Google File System paper
        - 2004: Google MapReduce paper
        - Adopted by Yahoo, donated to the open source community in 2009
      • The gist of it:
        - Distributed file system, “cheap” storage on computer clusters
        - Compute paradigm that abstracts the parallelism by breaking down operations into “map” and “reduce”
        - Hadoop framework takes care of everything else
  11. MapReduce in a Nutshell
      • Mappers work on data and “emit” key-value pairs
      • Shuffle-Sort: intermediate data is sorted and distributed by key
      • Reducer works on all values (data) for the same key
      ✓ We write Mappers and Reducers
      ✓ Hadoop takes care of everything else
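The three phases can be illustrated with a small in-memory model in plain Python. This is a teaching sketch only: it runs on one machine and does not use Hadoop, and the `map_reduce` helper, the claim fields, and the drug names are made up for the example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process model of map -> shuffle-sort -> reduce."""
    groups = defaultdict(list)
    # Map: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: values are collected under their key.
            groups[key].append(value)
    # Sort by key, then reduce all values that share a key.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def drug_mapper(claim):
    """Emit the drug as key and the claim cost as value."""
    yield claim["drug"], claim["cost"]


def total_cost_reducer(drug, costs):
    """Combine all costs emitted for one drug."""
    return sum(costs)


claims = [
    {"patient_id": "p1", "drug": "drug_a", "cost": 100},
    {"patient_id": "p2", "drug": "drug_b", "cost": 50},
    {"patient_id": "p1", "drug": "drug_a", "cost": 200},
]
totals = map_reduce(claims, drug_mapper, total_cost_reducer)
print(totals)
```

Only `drug_mapper` and `total_cost_reducer` are problem-specific; the grouping and ordering in `map_reduce` stand in for the part that the slide says the framework handles.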
  12. Building Patient Timelines using Hadoop and MapReduce
      • Load data into HDFS
        - “Transactional” data (claims, interactions)
      • Reconstructing a patient’s timeline is a textbook MapReduce exercise:
        - Mapper:
          ▪ Read a piece of data. Example: claim
          ▪ Figure out who it relates to. Example: patient ID
          ▪ Return key-value pairs. Key: patient ID. Value: the full piece of information (claim)
        - Reducer:
          ▪ Gets as input a key and the set of all values (claims) associated with that key (patient ID)
          ▪ Organizes the values (claims) to produce a basic patient history
  13. Building Patient Timelines using MapReduce, Followed by Visual Analytics
      [Diagram: Mapper → Shuffle-Sort (“Hadoop magic”) → Reducer]
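The mapper and reducer described for patient timelines might look roughly like the following, written in the style of streaming jobs that read and emit records one at a time. It is a sketch under assumptions: claims are assumed to arrive as JSON lines with hypothetical `patient_id`, `service_date` (ISO format) and `event` fields, and a local `sorted` call stands in for the shuffle-sort that Hadoop would perform across the cluster.

```python
import json
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Read one claim per line and key it by patient ID."""
    for line in lines:
        claim = json.loads(line)
        yield claim["patient_id"], claim


def reducer(sorted_pairs):
    """Receive pairs sorted by key and build one timeline per patient."""
    for patient_id, group in groupby(sorted_pairs, key=itemgetter(0)):
        claims = [claim for _, claim in group]
        # ISO-formatted dates sort correctly as plain strings.
        claims.sort(key=itemgetter("service_date"))
        yield patient_id, [(c["service_date"], c["event"]) for c in claims]


raw_claims = [
    json.dumps({"patient_id": "p1", "service_date": "2014-02-10", "event": "drug_a"}),
    json.dumps({"patient_id": "p2", "service_date": "2014-03-01", "event": "diagnosis"}),
    json.dumps({"patient_id": "p1", "service_date": "2014-01-05", "event": "diagnosis"}),
    json.dumps({"patient_id": "p1", "service_date": "2014-06-01", "event": "drug_b"}),
]

# Local stand-in for shuffle-sort: order the mapper output by key only.
shuffled = sorted(mapper(raw_claims), key=itemgetter(0))
timelines = dict(reducer(shuffled))
print(timelines["p1"])
```

Because each reducer call only sees the claims of one patient, timelines for different patients can be built independently, which is what lets the work spread across a cluster.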
  14. Treatment Cost Trends
      • Cost analysis of PsA and PsO treatments
      • Biologics treatment costs have been high and going up
      • Presented at AMCP and ISPOR 2015 as posters
  15. Co-medication Usage
  16. Treatment Pathways
  17. Patient timelines - “individual story”
  18. Future Directions
      • Cost of care analysis, comparing across different pathways
      • Healthcare resource utilization analysis, comparing across different pathways
      • Patterns of care analysis: predictive modeling combining patient similarity measures and clustering
      • Comparison to clinical practice guidelines (compliance and adherence)
      • Outcomes of care/CER: incorporating clinical outcomes using integrated claims/EMR data
  19. Some Learning Points
      • Some Hadoop functionality is perfectly suited for patient timeline analysis
        - MapReduce for creating patient timelines
        - Once patient timelines are created, everything else scales linearly
          ▪ Map(Reduce) for calculating patient metrics and complex events
          ▪ MapReduce for analyzing treatment pathways
      • Cheap, scalable storage capacity and compute power
        - Scalability allows robust analysis
  20. Healthcare Decision Making Requires Real-world Big Data Analytics
      • Efficacy and safety from RCT settings – FDA to approve
      • Cost effectiveness – payer's willingness to pay
      • Clinical effectiveness (long-term efficacy and safety) – physicians to prescribe, patients to adhere
      • Comparative effectiveness, patient-reported outcomes – physicians to prescribe, patients to adhere
      [Diagram labels: To Innovate, To Approve, To Pay for, To Prescribe, To Adhere; Industry, FDA, Physician, Patient, Health Plan, IDS, Government]
  21. Thank You!
      Forthcoming: “Leveraging Hadoop MapReduce in Building Patient Timelines and Analyzing Health Resource Utilization”, Special Issue on Big Data in Pharmacoeconomics
      Saar Golde, Ph.D., Knowledgent Group and NYU
      Zhaohui “John” Cai, M.D. Ph.D., Celgene Corporation