RWE & Patient Analytics
Leveraging Databricks
A Use Case
Harini Gopalakrishnan & Martin Longpre
Sanofi

Gaining insights and knowledge from real-world health data (RWD), i.e., data acquired outside the context of randomized clinical trials, has been an area of continued opportunity for pharma organizations.

What real-world data and real-world evidence are: how they are generated, what value they drive for life sciences in general, and what kinds of analytics are performed.

Considerations and challenges related to data security, privacy, and industrialization of a big data platform hosted in the cloud.

How we leveraged Databricks to perform big data ingestion, and its advantages over native AWS Batch/Glue. Examples of some of the downstream advanced analytics use cases that leveraged Databricks for RWE.

Note: This solution and one of the use cases leveraging the solution won the 2020 Gartner Eye on Innovation award.

https://www.gartner.com/en/newsroom/press-releases/2020-11-17-gartner-announces-winners-of-the-2020-gartner-healthcare-and-life-sciences-eye-on-innovation-award


  1. RWE & Patient Analytics Leveraging Databricks: A Use Case. Harini Gopalakrishnan & Martin Longpre, Sanofi
  2. Disclaimer
     • The views and opinions expressed in this presentation are those of the individual presenter and should not be attributed to any organization with whom the presenter is employed or affiliated
     • All registered trademarks cited are the property of their respective owners
  3. Agenda
     Harini Gopalakrishnan (20 minutes)
     ▪ What is real-world evidence and real-world data
     ▪ Advanced analytics in RWE generation
     ▪ Security and privacy of our data
     ▪ Our journey: a conceptual view of the architecture and what we have achieved
     Martin Longpre (20 minutes)
     ▪ Databricks implementation: our customization
     ▪ Demo
     ▪ Looking forward: where we want to partner for improvements
     Q&A (20 minutes)
  4. Defining the Problem: Real World Data and Evidence
  5. Context: How do we define RWE & RWD
     ▪ Real-world data (RWD) is a term used to describe health care related data that are collected outside the context of randomized clinical trials (RCTs).
     ▪ Real-world evidence (RWE) is defined as the insight or knowledge derived from the analysis of real-world data, conducted to respond to a specific research question.
     ▪ RWE leverages analytics on RWD to discover, develop, deliver and provide new insights on healthcare interventions.
     ▪ Examples of real-world data sources
     ▪ ~130 TB (EHR/Claims)
     ▪ ~2000 TB per month in versions, transformations
  6. Analysis in RWE: Advanced analytics methodology
     Traditional analytics
     • Traditional RWE statistics, meta-analysis, data modelling, propensity-score matching
     Advanced analytics
     • Predictive modelling, unsupervised clustering, rule extraction, model bootstrapping, natural language processing, machine learning
     Machine learning: a computer program is said to learn from experience (partially captured within data) when its performance increases with experience
     Supervised techniques examples
     • Logistic regression
     • Markov chain
     • Bayesian network
     • K-nearest neighbour
     Non-supervised techniques examples
     • K-means clustering
     • Hierarchical ascendant classification
     • Factorial analyses
     • Non-negative matrix factorization
     Innovation in evidence generation
  7. Uses of RWE: why is it valuable
     https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development
     The driving reasons for leveraging real-world data and evidence more recently include:
     • Easy availability of compute resources for big data
     • Availability of curated and high-quality data sources, both internally and externally
     Real-world evidence influences all aspects of a pharma value chain:
     • Regulatory decision making
     • Reimbursement decisions
     • Clinical guidelines
  8. Transforming RWD to Evidence: Use case in action
     AI-based indication searching approach that relies on real-world data, thus bringing higher confidence and reducing biases. Data is always privacy preserved and de-identified.
     Sanofi: Novel Indications via AI. Finding new treatment indications for an approved therapy is of immense value to pharma for drug re-purposing efforts, R&D candidate prioritization, and overall productivity. Sanofi wanted to develop an AI-based indication searching approach that relies on real-world data, thus bringing higher confidence and reducing biases. Sanofi applied unsupervised machine learning to create a phenotypic cluster of patients in order to identify relevant indications that worked across clusters. The pipeline crunched nearly 17 million patients with 2,700 characteristics derived from electronic health records (EHRs). The initial results of the novel approach recovered 90% of known indications and identified many more deemed credible by development teams, producing a higher level of confidence in results and a reduction in cost and time to market, with fewer, faster and more targeted trials, while minimizing attrition and risk.
     https://www.gartner.com/en/newsroom/press-releases/2020-11-17-gartner-announces-winners-of-the-2020-gartner-healthcare-and-life-sciences-eye-on-innovation-award
  9. Winner of the Gartner Award 2020 for Innovation in Healthcare and Life Sciences
     https://www.gartner.com/en/newsroom/press-releases/2020-11-17-gartner-announces-winners-of-the-2020-gartner-healthcare-and-life-sciences-eye-on-innovation-award
  10. Trust of data and analysis being performed is a MUST
      “Patients and consumers have a significant role to play in the collection of real-world data and generation of real-world evidence, but to be effective, patient and consumer engagement approaches would include considering them partners and capturing outcomes that are important to them”
      ▪ Patient consent is a must
      ▪ Privacy-preserved linkage must be performed; encryption is a key aspect
      ▪ Establish a trusted patient relationship to explain the usage of data and consent (e.g., secondary use of primary data)
      ▪ Data should not be used beyond the intended purpose; governance around the usage is a must
  11. Our Architecture & Implementation
  12. Key aspects of an RWE Ecosystem
      Data Management
      ▪ Secure data storage: triple encrypted with audited access control
      ▪ Full data lineage: complete history of every data transformation
      ▪ Data pipeline: designed for high-performance handling of big data
      Analytics
      ▪ Self-service tools: filtering and querying tools for feasibility and descriptive information
      ▪ Interactive tools: dashboards and applications for study execution
      ▪ Low-level tools: R, Python and SQL for comparative analysis and advanced analytics
      Access Control
      ▪ Multi-tenant configuration: provide each organization with its own namespace
      ▪ User provisioning: role-based access controlled by each organization
      ▪ Inherited data permissions: transformed data retains access control
      Auditing and Monitoring
      ▪ Full auditing of user actions: log each action and generate reports
      ▪ Comprehensive monitoring: performance, usage, and custom actions
  13. What does our system offer?
      ▪ Powerful computing resources to handle billions of rows of data
      ▪ Complete history of all data updates, with the ability to bind to specific versions
      ▪ Complete data traceability: every transform and resulting data set is captured
      ▪ Robust data security and access control for all data and projects
      ▪ Ability to manage metadata, reference data and master data
      ▪ Built on a scalable data lake
  14. The Conceptual Architecture (diagram; disclaimer: for example purposes only)
      Data is always privacy preserved and de-identified. We do not own the key for re-identification within this ecosystem.
      Diagram labels: Clinical Bioinformatics; Internal Sources; External Sources; Self Service Analysis; Advanced Analytics; Data Augmentation; Visualization / Dashboards; Data lake (Sanofi AWS); Artificial Intelligence/ML; Standardized analytical workflows; Cohort Definitions and Data Modelling; Conventional Studies (NLP); Secured and Traceable Sanofi controlled environment; Data and Analysis Collaboration*; Societies and Consortia; Academic Institutions; Regulatory Agencies; Internal sources; Insights; External Collaboration; Other Internal Platforms; Data lake (Secured and Access controlled at the data level)
      https://aws.amazon.com/blogs/industries/sanofi-webinar-performing-end-to-end-real-world-evidence-generation-with-traceability-and-transparency-on-aws/
  15. When do we use Databricks
      ▪ Exploratory use cases: projects where we need to run AI/ML workflows that require GPUs, custom libraries, or NLP/sentiment analysis
      ▪ Cross-functional teams working on a specific project, with both internal and external stakeholders
      ▪ Flexibility: ability for users to manage their own cluster profiles, sizing up and down based on policy
      ▪ Data ingestion pipelines: migrating away from AWS Glue and Batch for cost and performance reasons, with a 30% improvement in costs and productivity
      ▪ Delta Lake: under analysis; today the data is directly managed as Parquet on S3
      ▪ SQL Analytics: under evaluation
  16. Databricks Customization (1/2)
      Passthrough for Security
      ▪ Usage of our Azure AD configuration
      ▪ One AD group per data type
      ▪ Deactivation of the DBFS file system for end users (DBFS does not align with our data restriction policies)
      ▪ All data access is predefined and available through /mnt
      GitLab integration
      ▪ Integration of the Databricks Repos feature, connected directly to our enterprise GitLab services
      ▪ Usage of CI/CD pipelines for deploying scripts and tasks
      Cluster Policies
      ▪ Cluster names suffixed with the policy names for audit and monitoring
      ▪ Limit the types of worker and driver for better budget management
      ▪ Enforce the termination of clusters with default values based on projects/use cases (managed by cluster policies)
  17. Databricks Customization (2/2): Instance Profiling, IAM roles and policies
      ▪ Only used for specific use cases, mostly for RStudio
      ▪ Fully integrated with our AWS stack
      ▪ IAM roles set up for S3 bucket access
      ▪ One home folder per user created by default (internal process)
  18. Demo and Future
      Demo
  19. Improvements
      ▪ Support for RStudio
      ▪ Data access control and policy propagation to restrict unauthorized use of data (no lineage on data)
  20. Summary: Our Journey and benefits
      ▪ Started from a traditional warehouse 3 years ago to create an end-to-end ecosystem for evidence generation and insights
      ▪ Helped move away from conventional to more advanced analytical approaches, leveraging the power of big data and cloud
      ▪ Delivered several evidence-generating studies, i.e., studies at scale that have impacted all aspects of the pharma value chain with demonstrable ROI
      https://www.dovepress.com/cr_data/article_fulltext/s160000/160029/img/jmdh-160029_F003.jpg
  21. Feedback
      Your feedback is important to us. Don’t forget to rate and review the sessions.
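
Slide 8 says unsupervised machine learning was used to group patients into phenotypic clusters, and slide 6 lists K-means clustering among the non-supervised techniques. The slides do not say which algorithm the pipeline actually used, so the following is only a minimal sketch of K-means on synthetic two-feature records. The function names, the toy data and the evenly spaced initialization are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty group of points."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points: Sequence[Point], k: int,
            max_iterations: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain K-means (Lloyd's algorithm). Returns (labels, centroids)."""
    # Assumption: evenly spaced initial centroids keep the sketch
    # reproducible; real pipelines usually use k-means++ or random restarts.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = centroid_of(members)
    return labels, centroids


# Synthetic, hypothetical records: (feature_1, feature_2) per patient.
toy_patients = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(toy_patients, k=2)
print(labels, centroids)
```

On this toy input the two well-separated groups receive different labels. At the scale quoted on slide 8 (millions of patients, thousands of characteristics) a distributed implementation would be needed; this version only shows the assignment and update steps.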
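
Slide 10 requires privacy-preserved linkage with encryption, and slide 14 notes that the re-identification key is not held inside the ecosystem. The slides do not name the linkage technique. One common approach, shown here purely as an assumption, is a keyed hash (HMAC-SHA256) over normalized identifiers, so that records from different sources can be joined on a token without exposing the identifiers themselves. The function names, field choices and key below are hypothetical.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so formatting noise does not change the token."""
    return " ".join(value.strip().lower().split())


def linkage_token(secret_key: bytes, *identifiers: str) -> str:
    """Derive a stable, non-reversible token from identifying fields.

    Assumption: the secret key is held by a trusted party outside the
    analytics environment, so tokens cannot be recomputed from inside it.
    """
    message = "|".join(normalize(v) for v in identifiers).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key and records, for illustration only.
key = b"example-key-held-by-a-trusted-third-party"
token_from_ehr = linkage_token(key, "Jane  Doe", "1980-01-02")
token_from_claims = linkage_token(key, " jane doe ", "1980-01-02")
print(token_from_ehr == token_from_claims)
```

The same person yields the same token in both sources, while a different key yields unrelated tokens. That is the property that lets the key holder control re-identification.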
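
Slides 12 and 13 describe full data lineage: every transform and resulting data set is captured, and analyses can bind to specific versions. The slides do not show how the platform records this. A minimal sketch of the idea, assuming content-derived version identifiers and an in-memory log (the class and method names are hypothetical), could look like this:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    version_id: str
    transform: str
    parents: Tuple[str, ...]


class LineageLog:
    """In-memory record of which transform produced which dataset version."""

    def __init__(self) -> None:
        self._versions: Dict[str, DatasetVersion] = {}

    def record(self, rows: List[dict], transform: str,
               parents: Sequence[str] = ()) -> str:
        # The identifier is derived from content, transform and parents, so
        # the same inputs always map to the same version (assumed design).
        payload = json.dumps(
            {"rows": rows, "transform": transform, "parents": list(parents)},
            sort_keys=True,
        )
        version_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self._versions[version_id] = DatasetVersion(
            version_id, transform, tuple(parents))
        return version_id

    def trace(self, version_id: str) -> List[str]:
        """Walk back from a version to its sources, listing the transforms."""
        chain: List[str] = []
        pending = [version_id]
        while pending:
            version = self._versions[pending.pop()]
            chain.append(version.transform)
            pending.extend(version.parents)
        return chain


log = LineageLog()
raw = log.record([{"id": 1, "age": 70}, {"id": 2, "age": 12}], "ingest")
adults = log.record([{"id": 1, "age": 70}], "filter_adults", parents=[raw])
print(log.trace(adults))
```

Binding an analysis to `adults` rather than to "the latest data" is what makes a result reproducible after later updates.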
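
The cluster policy bullets on slide 16 (restricted worker and driver types, enforced termination defaults, cluster names suffixed with the policy name) map naturally onto a Databricks cluster policy definition. The sketch below builds one as a Python dictionary in the documented policy JSON shape (`allowlist`, `range` and `regex` rule types). The node types, time limits and the `-rwe-exploratory` suffix are made-up example values, not the presenters' settings, and the `violations` helper is a local illustration of what the rules mean, not how Databricks enforces them.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical policy; every value here is an illustrative assumption.
POLICY: Dict[str, Dict[str, Any]] = {
    # Limit worker and driver instance types for budget control.
    "node_type_id": {"type": "allowlist", "values": ["m5.xlarge", "m5.2xlarge"]},
    "driver_node_type_id": {"type": "allowlist", "values": ["m5.xlarge"]},
    # Enforce auto-termination, with a default applied when none is given.
    "autotermination_minutes": {
        "type": "range", "minValue": 10, "maxValue": 60, "defaultValue": 30,
    },
    # Require the policy name as a cluster-name suffix for auditing.
    "cluster_name": {"type": "regex", "pattern": ".*-rwe-exploratory"},
}


def violations(policy: Dict[str, Dict[str, Any]],
               cluster: Dict[str, Any]) -> List[str]:
    """Return the attributes of a requested cluster that break the policy."""
    problems: List[str] = []
    for attribute, rule in policy.items():
        value = cluster.get(attribute, rule.get("defaultValue"))
        kind = rule["type"]
        if kind == "allowlist":
            ok = value in rule["values"]
        elif kind == "range":
            ok = (isinstance(value, (int, float))
                  and rule["minValue"] <= value <= rule["maxValue"])
        elif kind == "regex":
            ok = (isinstance(value, str)
                  and re.fullmatch(rule["pattern"], value) is not None)
        else:
            ok = True  # rule types not covered by this sketch are ignored
        if not ok:
            problems.append(attribute)
    return problems


requested = {
    "node_type_id": "m5.xlarge",
    "driver_node_type_id": "m5.xlarge",
    "cluster_name": "team-a-rwe-exploratory",
}
print(json.dumps(POLICY, indent=2))
print(violations(POLICY, requested))
```

Because `requested` omits `autotermination_minutes`, the default from the policy is used, which mirrors the "default values" wording on the slide.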
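
Slide 17 mentions IAM roles for S3 bucket access and one home folder per user. As a hedged illustration of what such a role's policy might contain, the sketch below generates a standard IAM policy document that limits a user to a personal prefix. The bucket name, the `home/<user>/` layout and the function name are assumptions; the slides do not show the actual policies.

```python
import json
from typing import Any, Dict


def home_folder_policy(bucket: str, user: str) -> Dict[str, Any]:
    """Build an IAM policy document scoped to one user's home prefix."""
    prefix = f"home/{user}/"  # assumed folder layout
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only inside the user's own prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Allow reading and writing objects under that prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# Hypothetical bucket and user names.
print(json.dumps(home_folder_policy("example-rwe-bucket", "jdoe"), indent=2))
```

A policy like this would be attached to the IAM role behind an instance profile, so that a cluster running under it inherits only the access the role grants.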
