Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Quality in Pharmaceuticals

Like most of healthcare and the life sciences, pharmaceutical companies are undergoing a data-driven transformation. The industry-wide need to reduce the cost of developing, manufacturing and distributing drugs while bringing new products to market is not a novel concept or challenge. However, the ability to process and analyze large amounts of data using cutting-edge massively parallel processing (MPP) technologies means innovation can be found beyond the traditional hypothesis-driven approaches we have come to expect. New technologies and approaches make it possible to incorporate all available data, structured and unstructured. At Pivotal, it is the goal of our data science practice to demonstrate the capabilities of the technologies we offer. We focus on building predictive models by combining the vast and variable data that is available to elicit action or generate insights. In our talk we will focus on a use case in pharmaceutical manufacturing, wherein we created a predictive model to produce more consistent, high-quality products and drive decisions to abandon lots with expected poor outcomes. In addition, we demonstrate how we used machine learning to cleanse data and to improve efficiencies in data collection by identifying low information-content measurements and incorporating under-utilized data sources in manufacturing. Beyond this use case, we will discuss our vision of using machine learning in all areas of the industry, from research through distribution, to drive change.

Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Quality in Pharmaceuticals Presentation Transcript

  • A NEW PLATFORM FOR A NEW ERA
  • Data Driven Drugs: Predictive Models to Improve Product Quality in Pharmaceuticals Sarah Aerni, PhD Senior Data Scientist at Pivotal saerni@gopivotal.com Strata RX September 26, 2013 © Copyright 2013 Pivotal. All rights reserved. 2
  • The Quantified Patient: Medical History, Genetics, Family History, Imaging, Clinical Narratives, Medications, Molecular Diagnostics, Lab tests, Environment, Sensors & Mobile
  • Data driven drugs: From discovery to delivery (Drug discovery + development; Clinical Trials; Distribution and surveillance; Manufacturing; Marketing)
    RICH DATA SOURCES
    – Molecular data: cellular drug screens, animal models
    – Clinical data including notes, images, markers (e.g. genomics, lab results)
    – Sensor and assay data
    – Internal and partner/purchased external data
    – Contact center data
    – Patient registries, public and federal data, clinical partnerships
  • Data integration: How Pivotal can enable industries to extract new value from data sources
  • Successful transformation into a data-driven enterprise requires a paradigm shift. DATA IS THE NEW CENTER OF GRAVITY (Data > Application)
    – Bring available data sources to a central location: Integration of a variety of data leads to new insights
    – Analyze large volumes of variable data for richer models: Building models without data movement reduces time to insight
    – Share data, insights and ideas: Leveraging various expertise will lead to more relevant business insights
  • Traditional Analytics Processes: If you think databases are only good for storing data [Diagram labels: Time-to-Insights; sample; In-memory statistics tool; In-memory optimization tool; solution; forecast]
  • Pivotal One: Heritage [Diagram labels: Application Fabric; Data Fabric; GemFire; Ingest & Query: very high-capacity & in-memory; Scale-out storage: HDFS/Object; vFabric; Languages & Frameworks; Services; Analytics; Automation: App Provisioning & Life-cycle; Service Registry; Cloud Abstraction (portability); Cloud Fabric]
  • Performance Through Parallelism
    – Automatic parallelization: load and query like any database; tables automatically distributed across nodes; no need for manual partitioning or tuning
    – Analytics optimized: analytics-oriented query optimization
    – Extremely scalable MPP shared-nothing architecture: all nodes can scan and process in parallel; linear scalability by adding nodes
    [Diagram labels: Database; Interconnect; Compute; Storage; Loading; ETL; File Systems; External Sources: Loading, streaming, etc.]
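
The shared-nothing idea on this slide can be illustrated with a small sketch. It is not how any Pivotal product is implemented: the CRC32 hash, the `lot_id` distribution key, the `potency` column and the four segments are all assumptions chosen for illustration, and the "parallel" step is simulated sequentially. It shows rows being spread across segments by a hash of a key, each segment computing a partial aggregate on its own rows, and a coordinator combining the partial results.

```python
import zlib
from collections import defaultdict


def distribute(rows, key, n_segments):
    """Assign each row to a segment by hashing its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        segments[digest % n_segments].append(row)
    return segments


def segment_partial_sum(rows, column):
    """Work done locally on one segment: it only touches its own rows."""
    return sum(row[column] for row in rows)


def combined_sum(segments, column):
    """Coordinator step: combine the partial sums from every segment."""
    partials = [segment_partial_sum(rows, column) for rows in segments.values()]
    return sum(partials)


# Hypothetical lot records, invented for this example.
rows = [{"lot_id": i, "potency": 12.0 + 0.1 * (i % 5)} for i in range(100)]
segments = distribute(rows, "lot_id", n_segments=4)
total = combined_sum(segments, "potency")
```

Because each segment needs only its own rows for the partial sum, adding segments reduces the work per segment, which is the intuition behind the scalability claim on the slide.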
  • Pivotal HD Architecture [Diagram labels: Pivotal HD Enterprise; Apache; HDFS; HBase; Pig, Hive, Mahout; Map Reduce; Yarn; Zookeeper; Sqoop; Flume; Resource Management & Workflow; Hadoop Virtualization (HVE); Data Loader; Command Center: Deploy, Configure, Monitor, Manage; HAWQ Advanced Database Services: ANSI SQL + Analytics, Xtension Framework, Query Optimizer, Dynamic Pipelining, Catalog Services]
  • Leveraging healthcare data to drive predictive and precision care
    – Data: Clinical Narratives, Medications, Imaging, Genetics, Environment, Lab tests
    – Uses: Decision support, Precision care, Cohort identification
    – Unified data supporting unified risk evaluation, decision-making, etc. Acting on full patient and medical profile
  • Analytics with Pivotal: A single address for everything analytics [Diagram labels: Time-to-Insights; Forecasting; Clustering; Regression; Optimization; Classification]
  • Analytics Ecosystem
    – COMMERCIAL: SAS/ACCESS, SAS Scoring Accelerator, SAS High Performance Analytics
    – OPEN SOURCE: MADlib
    – In-database analytics: PL/R, PL/Python, PL/Java
  • MADlib: Machine Learning at Scale. Collaborators
  • Data driven drugs: From discovery to delivery (Drug discovery + development; Clinical Trials; Distribution and surveillance; Manufacturing; Marketing)
    – Molecular data: cellular drug screens, animal models
    – Clinical data including notes, images, markers (e.g. genomics, lab results)
    – Sensor and assay data
    – Internal and partner/purchased external data
    – Contact center data
    – Patient registries, public and federal data, clinical partnerships
  • Manufacturing: Data-driven approaches to tuning a drug manufacturing process
  • Predicting potency in vaccine manufacturing
    Customer: A major pharmaceutical company
    Business Problem: Predict potency and antigen levels of live virus vaccines based on manufacturing sensor data and manual data collected throughout the process.
    Challenges
    – Customer’s data model was not optimal for running analytical queries
    – Manual data quality issues
    – Data capture was performed with varying consistency due to high cost associated with manual data collection
    Solution
    – Introduced a new data model to make data accessible and enable analytics
    – Built automated outlier detection/correction methods to address manual data entry quality issues
    – Devised imputation methods to deal with data completeness issues
    – Built predictive models with high accuracy
  • Building predictive models to improve outcomes in manufacturing of vaccines [Diagram: process stages Cell expansion, Virus propagation, Pooling into final product; measurements over Time: Temp, Counts, Duration of step; Future Looking Predictive Models; Backward Looking Models; example alert "Warning! Entered value not in expected range"]
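
The data-entry alert shown on this slide ("Warning! Entered value not in expected range") can be sketched as a simple range check. This is a minimal illustration, not the system described in the talk: the measurement names and the numeric limits in `EXPECTED_RANGES` are hypothetical, and in practice the limits would more likely be derived from historical data than hard-coded.

```python
# Hypothetical expected ranges per measurement (assumed values for illustration).
EXPECTED_RANGES = {
    "temp_c": (35.0, 39.0),
    "step_duration_h": (12.0, 16.0),
}


def check_entry(name, value, expected_ranges=EXPECTED_RANGES):
    """Return None if the value is plausible, otherwise a warning message."""
    low, high = expected_ranges[name]
    if low <= value <= high:
        return None
    return (
        f"Warning! Entered value not in expected range: "
        f"{name}={value} (expected {low} to {high})"
    )


ok = check_entry("temp_c", 37.0)        # plausible entry
flagged = check_entry("temp_c", 370.0)  # likely a typing error
```

Returning the message instead of raising an exception lets a data-entry form show the warning immediately while still allowing the operator to confirm an unusual but genuine value.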
  • Enabling predictive models through rearchitecting (process stages: Cell expansion, Virus propagation, Pooling into final product)
    Challenges and responses
    – Accessibility: Certain parts of the data have never been used in any predictive modeling because they are extremely hard to query. Response: Purpose-built data models for rapid data querying and exploration
    – Data Integrity: Manual data entries are prone to errors. There is no immediate feedback to examine the validity of the values entered. Response: Automated data cleansing techniques
    – Data Completeness: Manual data entry is time consuming. There is no feedback on which data are most useful in improving efficiency and quality, and hence no prioritization of what data should be collected. Response: Opportunities to eliminate collection of incomplete or non-predictive data
  • Identifying and correcting data integrity problems: Creating automated methods for detection and correction
    – Data integrity problems cause challenges in modeling
    – Sources of variation in entries of measurements: variable units of measurement; manual data entry errors
    – Approach: Detect the optimal threshold to separate two distributions (foreground and background)
    [Figure: plot of all data (axis 0 to 100); frequency histograms of the lower half (values roughly 0.12 to 0.22) and the upper half (values roughly 12 to 24); cleaned histogram with multiplier = 100]
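
One way to read the approach on these slides is sketched below. The slides do not say which thresholding method was used, so this sketch assumes an exhaustive search for the split of the sorted values that minimises the total within-group squared error (the idea behind Otsu-style thresholding). The sample values are invented; the multiplier of 100 is taken from the "cleaned histogram" panel, and here it is passed in by hand instead of being estimated.

```python
def best_split(values):
    """Sort values and find the split index minimising within-group squared error."""
    xs = sorted(values)
    n = len(xs)
    # Prefix sums let us compute the squared error of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(a, b):
        """Sum of squared deviations from the mean for xs[a:b]."""
        total = prefix[b] - prefix[a]
        return (prefix_sq[b] - prefix_sq[a]) - total * total / (b - a)

    index = min(range(1, n), key=lambda i: sse(0, i) + sse(i, n))
    return xs, index


def clean(values, multiplier):
    """Rescale the lower group so both groups share one unit of measurement."""
    xs, index = best_split(values)
    threshold = (xs[index - 1] + xs[index]) / 2
    cleaned = [v * multiplier if v < threshold else v for v in values]
    return cleaned, threshold


# Invented entries: some recorded as fractions, some as percentages.
values = [0.14, 0.16, 0.18, 15.0, 17.0, 0.20, 19.0, 21.0]
cleaned, threshold = clean(values, multiplier=100)
```

After cleaning, all entries fall in a single range, which mirrors the single-peaked "cleaned histogram" on the slide. A check like this only makes sense when the two groups are clearly separated; with overlapping groups the split would need a more careful statistical model.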
  • Building models: First, start with the answer. How to build models that solve the right problem
    Approach: Use historical data to build a model predicting potency of a final product using data from the manufacturing process (Cell expansion, Virus propagation, Pooling into final product)
    – Model form: how do we pick the right one? How do we deal with correlated features? Accuracy or interpretability?
    – Available data: Thousands of features; without expert guidance, how do we choose the right ones? What data do we want to use to predict? When is the right time for an intervention?
  • Model generation and evaluation: Predicting vaccine potency using manufacturing data
    – Feature engineering and transformation: enabled by rapid in-database processing
    – Experimentation with model forms: partial least squares, random forest, regularized regression
    – Interpretation of model results for insight generation: use cross-validation framework to assess variable importance
    [Figure: scatter plot of Predicted Potency against True Potency, both axes from 12.0 to 13.5; Train R2 = 0.823, Test R2 = 0.742]
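
The evaluation idea on this slide, fitting on one part of the data and reporting R2 on held-out lots, can be sketched in a few lines. This is not the model from the talk: it assumes a single feature, a one-dimensional ridge (regularized) regression as a stand-in for the model forms listed, and synthetic data generated inside the example. The function names and the penalty `lam` are hypothetical.

```python
import random


def fit_ridge_1d(x, y, lam):
    """One-feature ridge regression: penalised slope, unpenalised intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def cross_validated_r2(x, y, k, lam):
    """k-fold cross-validation: every point is predicted by a model that never saw it."""
    idx = list(range(len(x)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_ridge_1d(
            [x[i] for i in train], [y[i] for i in train], lam
        )
        for i in fold:
            y_true.append(y[i])
            y_pred.append(slope * x[i] + intercept)
    return r_squared(y_true, y_pred)


# Synthetic stand-in data: one process measurement and a noisy potency value.
rng = random.Random(1)
x = [rng.uniform(0.2, 0.45) for _ in range(200)]
y = [12.0 + 2.0 * xi + rng.gauss(0, 0.05) for xi in x]
cv_r2 = cross_validated_r2(x, y, k=5, lam=0.01)
```

Comparing a held-out R2 like `cv_r2` with the R2 on the training data is what the Train/Test figures on the slide express: a large gap between the two would suggest overfitting.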
  • Sample model insights: Interpreting the utility of a measure obtained during manufacturing based on model outcomes
    – Features consistently absent from models may be uninformative for predicting potency
    – Some features may reveal tunable parameters to alter potency, others may simply be markers
    – Opportunities to provide real-time feedback on data entry errors and predicted potency outcomes
    [Figure: two scatter plots of Log of Potency, one against SP1 Total Viable Cells Harvested Per Sq. Cm (assayed value) and one against SP2 Total Trypsinization Exposure Time per CCS (duration of a step); reported correlations: 0.38 and -0.45]
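
The correlations reported on this slide are simple to compute per feature. The sketch below ranks features by the absolute Pearson correlation with potency. It is a univariate screen only, not the cross-validated variable importance mentioned in the talk, and the feature names and numbers are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_features(features, target):
    """Sort features by the strength (absolute value) of their correlation."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented measurements for five lots.
potency = [12.1, 12.4, 12.6, 12.9, 13.2]
features = {
    "viable_cells_per_sq_cm": [0.22, 0.27, 0.31, 0.36, 0.42],
    "trypsin_exposure_time": [15.5, 14.8, 14.0, 13.1, 12.4],
    "constant_flag": [1, 1, 1, 1, 1],
}
ranked = rank_features(features, potency)
```

A feature that never varies, like `constant_flag` here, lands at the bottom of the ranking, which is one simple signal that collecting it may add little. As the slide notes, a strong correlation does not by itself show whether a feature is a tunable parameter or only a marker.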
  • Data-driven drugs: Opportunities for data mining across the pharmaceutical industry
  • Data driven drugs: From discovery to delivery (Drug discovery + development; Clinical Trials; Distribution and surveillance; Manufacturing; Marketing)
    – Data repurposing: New value exists in leveraging historical data across drugs and stages. Example: Adverse events for new clinical indications
    – Data discovery: External and publicly available datasets can augment proprietary sources. Example: Twitter data to forecast demand
    – Data collection: Obtaining new data from different sources drives additional value. Example: Mobile and sensor data to measure patient adherence and outcomes
  • Leveraging Data to Improve Demand Forecasts [Diagram labels: Hospitals; Doctor’s Offices; Surgery Centers; Pharmacies; Laboratories; Patients; Supply Distr.; Sales Data; Analyze orders from customers; Self-Reporting; Publicly Available Resources; Monitoring Patient Populations]
  • Promising Advancements in Diabetes Studies: Use of telehealth to provide tight glucose control [Diagram labels: Biochemical Measurements; EMR; Genomics; Lifestyle; Intervention]
  • Launching a successful diabetes management program: Multiple potential points of failure, requires use of analytics at every step. Interdisciplinary collaboration of data scientists essential to success
    – Program steps: Increase Awareness; Patient Enrollment; Comparative Effectiveness; Remote Patient Monitoring; Design Interventions; Measure Impact on Population
    – Disciplines: Marketing; Healthcare; Web Analytics; Optimization; General ML
    – Analytics tasks: Identify highest impact channels; Identify influencers; Campaign optimization; Best channel per cohort; Stochastic entity resolution; Measure engagement; Best therapy for each cohort (Medication, Delivery Method, Monitoring Method); Resource allocation decisions; Medication adherence; Churn prediction; Predict risk of negative outcome for next 3 months; A/B testing to design best engagement platform; Attribution models; Careful design of experiment to quantify the impact
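
As one concrete instance of the "A/B testing to design best engagement platform" item, the sketch below compares enrollment rates between two variants with a pooled two-proportion z-test. The talk does not describe how its tests were analysed, so the choice of test and the counts are assumptions made purely for illustration.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: patients enrolled out of patients shown each variant.
z, p_value = two_proportion_z(success_a=100, n_a=1000, success_b=150, n_b=1000)
```

The normal approximation behind this test is reasonable for counts of this size; for small samples or very rare outcomes an exact test would be a safer choice.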
  • Pivotal Labs rapid application development
    – Rheumatoid arthritis remote patient monitoring system: self-reporting; intuitive user interface
    https://itunes.apple.com/us/app/myra/id563338979?mt=8