The document discusses Celgene's real-world evidence platform, Synapse. It describes how Synapse enables therapeutic innovation through its data lake, analysis tools, code and cohort sharing capabilities, and self-serve applications. Synapse tracks 388 real-world evidence databases, of which 24 have been ingested and are regularly refreshed, with roughly 120-150 million patients per claims database, to support evidence generation across the drug development pipeline.
10. THE SPEAKERS
DAN HOUSMAN
Consulting Managing Director, ConvergeHEALTH, Deloitte Consulting, LLP
Dan Housman is a Director with Deloitte Consulting. He is a software veteran with a scientific education at MIT in Chemistry and Biology. He combines a passion for medicine with a demonstrated track record of valuable and innovative product management for complex distributed data analytics systems. Dan directs ConvergeHEALTH’s product innovation efforts with a focus on creating packaged products for real world evidence and bioinformatics through the Deloitte data warehouse platform and Miner Suite.
PATRICK LOERCH
Sr. Director, Data Sciences, Celgene
Patrick Loerch is a Sr. Director with Celgene. He is a biostatistician and biochemist by training with 15+ years of experience analyzing genomics, clinical and real world healthcare datasets in roles throughout the pharmaceutical pipeline, from target identification through to commercial operations. He is responsible for building out the core data sciences capability at Celgene. In addition to building out the global team, he actively codes in R and Python and brings a strong knowledge and appreciation of the challenges associated with applying scalable data sciences in healthcare to create value for patients.
11. Companies Are Prioritizing Evidence Generation...
Deloitte found in our Real World Evidence Benchmarking Survey from March 2017 that real-world evidence is a top priority for the industry, and companies are looking to transform capabilities through focus and investment.
• 54% are seeking knowledge management solutions that will enable broad sharing of information around the organization about studies conducted, evidence generated, and data available
• 80% of companies with mature evidence capabilities utilize cloud-based systems to support their evidence work
• 93% believe their current real-world evidence capability is not meeting the needs of the organization and are investing in expanding them
Our survey shows that life sciences companies are making progress in using real-world evidence but still have opportunities to expand applications across the value chain, consider new channels to access evidence, and improve overall capabilities.
12. Emerging Trends in Application of Evidence
Major wave of investments by pharma created market learnings that are now driving a shift across industry in operating models and technology choices
Dimension | Current state | Emerging model
RWE mandates | Observing and understanding | Shaping and influencing
IT & Informatics | Tightly coupled data and analytics solutions for technical “experts” | Standards-based, modular architectures to empower domain experts
Source of data and tools | Data providers seen as vendors and suppliers | Data providers seen as partners and collaborators
Value creation | Evidence generation primarily supports commercial objectives | More in-depth understanding of disease, epidemiology, and treatment standards, creating evidence to inform product strategy, design, and patient care
13. Achieving agile dev/ops efficiency for evidence development
Success in evidence life cycle management depends on decreasing cycle time to anticipate data supply needs from sources by iterating to find and fill gaps through collaboration with data providers.
[Figure: an “Ev/Ops” cycle (Find Gaps, Anticipate Data Need, Engage, Ingest, Query) linking data providers (Clinical, Translational, Patient reported, Strategic Partners) to an Insight Pipeline.]
14. New Integrated Analytics, Knowledge Management & Collaboration Platforms Are Emerging
[Figure: Evidence Generation Platform]
Data sources: RWE; RCT; Registry; ‘omics
Data layer: common data models (RWE, Clinical)
Analysis layer: data exploration; measures library; analytics code library; study & partner library
Other components: access control; knowledge management; applications and standard reports; analytical tools and compute environments
End users: analyst; data scientist; data steward; researcher; collaborators; project/program managers
16. Deeper Dive: The Technical Architecture
[Architecture diagram. The extracted labels appear to interleave two versions of the diagram, one built on MapR on EC2 and one on Cloudera on EC2 and Hadoop on EMR. Recoverable labels are grouped below.]
Sources: RWE sources; third-party sources; RCT sources; Claims; EMR; RCT; Registry; Genomics; Social Media Data; Sensor Data
Data Integration (PROD & Non-PROD Cluster): HTTPS and HTTPS/S-FTP transfer; AWS Data Pipeline; micro batch data ingestion; Data Ingestion Cluster
Persistent Storage: Data Lake Storage (Amazon S3); Infrequent Access Zone; Archival Layer (Amazon Glacier)
Analytics Platform: MapR on EC2 (Hadoop, Drill, Spark, HIVE, R) with Amazon EBS, on on-demand, reserved, and segregated instances; Cloudera on EC2 (Impala, Pig, Spark, HIVE, R); Hadoop on EMR (Impala, Pig, Spark, HIVE, R) analytics instances; External Sandbox; Redshift DB; AWS RDS
Applications: Research Application; Tableau on EC2; Spotfire on EC2; i2b2 on EC2; SAS & SPSS on EC2; R & Python on EC2; OHDSI Tools on EC2; Workspaces; Splunk; ODBC; Visualization; Knowledge Mgmt; Data Catalog; GitHub Enterprise; Research Trust Integration; extension to cloud-based ‘omic analysis platforms
Management: Directory Service; Identity & Access; CloudTrail; AWS Config; CloudWatch; Key Mgmt Service; CloudFormation; Ansible; Datadog
Users: internal users; admins; external collaborators
Legend: High Resilience; Future capabilities
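The diagram shows three storage tiers (Amazon S3 data lake, an Infrequent Access zone, and an Amazon Glacier archival layer) but does not say how data moves between them. One common mechanism is an S3 lifecycle rule, so the following is a minimal sketch of that option only. The prefix, the rule ID, and the 30-day and 365-day thresholds are assumptions for illustration, not values from the slides.

```python
from typing import Any, Dict


def build_lifecycle_rules(
    prefix: str,
    infrequent_access_after_days: int = 30,   # assumed threshold, not from the slides
    archive_after_days: int = 365,            # assumed threshold, not from the slides
) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that tiers objects under `prefix`.

    Objects move to the STANDARD_IA storage class first and to GLACIER later.
    The returned dict has the shape that boto3's
    `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`.
    """
    if not 0 < infrequent_access_after_days < archive_after_days:
        raise ValueError("archive threshold must be later than infrequent-access threshold")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",  # hypothetical naming scheme
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": infrequent_access_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Hypothetical prefix for raw claims files in the data lake bucket.
config = build_lifecycle_rules("raw/claims/")
```

Building the configuration as plain data keeps the sketch runnable without AWS credentials; applying it to a real bucket would be a separate, deliberate step.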
18. iKU: Transforming How We Think, Act, and Lead with Data
Build Foundation
Enable Business
Realize Value
● Learn
● Collaborate
● Pioneer
● Network of Strategic Partners
● Organizational Expertise
● Pioneering Mindset
● Connective Technology (Synapse platform)
● Data Harmonization (Governance & taxonomy)
19. Patient Data Drives Decisions Across the Pipeline
[Figure: functions placed along the drug development pipeline.]
Pipeline stages: Research & Early Development; Ph I, Ph II, Ph III; Market Access & HEOR; Commercial & Regulatory
Functions: Early Research; Translational; Clinical; Global Clinical Trials Operations; Regulatory; HEOR; Medical Affairs; Commercial Ops; Market Access; Pharmacoepidemiology
20. Then: The Rise of the External Data Vendor
[Figure: the same pipeline as on the previous slide.]
Pipeline stages: Research & Early Development; Ph I, Ph II, Ph III; Market Access & HEOR; Commercial & Regulatory
Callouts: persistent salespeople; 1,000s of siloed employees; countless, redundant data silos
21. Today: What is Our Data Footprint?
Dataset | Refresh Rate | Population Coverage | Data Type Coverage
Safety Data | Daily (1 day lag) | Subset of Patients | Safety Data
Patient Engagement | Weekly (no lag) | Small Subset of Patients | Limited Sales Data
Sales Operations #1 | Weekly (no lag) | Small Subset of Patients | Sales Data
Sales Operations #2 | Weekly (1 week lag) | Subset of Providers | Limited Sales Data
Sales Operations #2 | Weekly (3-4 day lag) | Subset of Providers | Sales Data
Prescriber Network | Weekly (no lag) | Reference Data | Reference Data
Shipments Data | Weekly | Subset of Providers | Individual Product
Demand Data | Monthly (1 month lag) | Subset of Providers | Individual Product
Healthcare Org Services Data | Monthly (1 month lag) | Subset of Providers | Reference Data
Partner Data #1 | Monthly (1 month lag) | Disease Population | EMR Data
Claims Data #1 | Quarterly (4 month lag) | US Population | Claims Data
Claims Data #2 | Quarterly (9 month lag) | US Population | Claims Data
Claims Data #2 (OMOP) | Quarterly (9 month lag) | US Population | Claims Data
Claims Data #3 | Quarterly (9 month lag) | US Population | Claims Data
Claims Data #3 (OMOP) | Quarterly (9 month lag) | US Population | Claims Data
Provider Network Data #1 | Twice per Year (6 month lag) | US Providers | Channel Affinity
Provider Network Data #2 | Twice per Year (6 month lag) | US Providers | HCP Accessibility
Partner Data #2 | Monthly | Disease Population | EMR, Genomics & Lab Data
Partner Data #3 | Quarterly | European Populations | EMR, Genomics & Lab Data
Partner Data #4 | Monthly | Disease Patients | EMR, Genomics & Lab Data
Market Research Data #1 | Monthly | Subset of Providers | Chart Abstraction
Market Research Data #2 | Monthly | Subset of Providers | Chart Abstraction
[Figure labels: row groups “Market Research Data” and “Real-World Partnerships”; footnote “*to be Ingested”; axes from Narrow Coverage to Broad Coverage and from Slow Refresh to Rapid Refresh; functions shown: CommOps, HEOR, Brand Teams, GCRDO, R&ED, Safety, Medical Affairs, Market Access.]
23. Cultivating a Network of Data Partners
Value
• Cultivating clinical and genomic data on previously uncharacterized patients to drive insights across the value chain
• Leveraging emerging technologies and skill sets to augment internal capabilities
• Risk-balanced portfolio approach with near-, mid-, and long-term value creation
Approach
• Partnerships aligned with Celgene’s strategic direction
• Deal structures tailored to align interests with partners
• Active engagement through business development and alliance management
24. Execution Focused on the Development of Novel Medicines
[Figure: questions from several sources feed one logical cohort definition, which is translated into DB-specific data extraction queries for each dataset to build a comparator cohort.]
Inputs: Clinical Trial Protocol; Medical Affairs Questions; Safety Questions; Market Access Questions; SME Input
Datasets: Registry Dataset #1 (n=24k); Registry Dataset #2 (n=10k); RWD Partner #1 (n=5k); RWD Partner #2 (n=50k); RWD Partner #3 (n=3k)
Comparator Cohort (patient counts in the order shown on the slide): 88 patients; 15 patients; 139 patients; 50 patients; 157 patients
Logical Cohort Definition
Multiple Myeloma Diagnosis
Serum M-protein greater than or equal to 0.5 g/dL
OR Urine M-protein greater than or equal to 200 mg/24 h
OR Serum free light chain (FLC) assay: involved FLC level greater than or equal to 10 mg/dL (100 mg/L) provided serum FLC ratio is abnormal
OR A biopsy-proven evaluable plasmacytoma
OR Bone marrow plasma cells > 30% of total bone marrow cells
OR a patient with two diagnoses of MM (ICD_9_CM = 203.0X*, ICD10 = C90.0-*) at least 30 days apart
Progressive Disease
> 25% increase in M Protein in blood and/or urine
OR > 25% increase in plasma cells in bone marrow
OR New bone lesions or increase in size of existing lesions
OR a gap in all therapy of ≥180 days
OR addition of (or switch to) a new MM drug (any of IMiDs, proteasome inhibitors, biologics, or all cytotoxic agents) to the current regimen after ≥60 days on the prior regimen, unless the new drug(s) is/are for maintenance therapy
OR re-start of previous MM drug(s) after a gap of ≥180 days unless the restart was maintenance therapy
OR an increase in the maintenance dose of any component of the previous regimen back to or above the initial treatment dose. This mainly applies to Revlimid, and should be considered a new line even if it happens within 180 days.
DB-specific data extraction (the slide shows the same truncated query five times, once per dataset)
select h.patientid, j.birthyear,…
from …
where linename like "%Lenalidomide%" or…
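The multiple myeloma diagnosis criteria on this slide are a chain of OR conditions, which makes them straightforward to express as one predicate that each DB-specific query can then mirror. The sketch below is illustrative only and is not Celgene's implementation. The `PatientRecord` type and all of its field names are hypothetical stand-ins for a harmonized record. Of the progressive-disease rules, only the "gap in all therapy of ≥180 days" rule is sketched.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    """Hypothetical harmonized record; field names are illustrative only."""
    serum_m_protein_g_dl: Optional[float] = None
    urine_m_protein_mg_24h: Optional[float] = None
    involved_flc_mg_dl: Optional[float] = None
    flc_ratio_abnormal: bool = False
    biopsy_proven_plasmacytoma: bool = False
    bone_marrow_plasma_cell_pct: Optional[float] = None
    mm_diagnosis_dates: List[date] = field(default_factory=list)


def _at_least(value: Optional[float], threshold: float) -> bool:
    """Missing lab values never satisfy a threshold."""
    return value is not None and value >= threshold


def has_two_diagnoses_30_days_apart(dates: List[date]) -> bool:
    """Some pair is >= 30 days apart exactly when the earliest and latest are."""
    return len(dates) >= 2 and (max(dates) - min(dates)).days >= 30


def meets_mm_diagnosis(p: PatientRecord) -> bool:
    """Apply the slide's OR-chained multiple myeloma diagnosis criteria."""
    return (
        _at_least(p.serum_m_protein_g_dl, 0.5)
        or _at_least(p.urine_m_protein_mg_24h, 200)
        or (_at_least(p.involved_flc_mg_dl, 10) and p.flc_ratio_abnormal)
        or p.biopsy_proven_plasmacytoma
        or (p.bone_marrow_plasma_cell_pct is not None
            and p.bone_marrow_plasma_cell_pct > 30)
        or has_two_diagnoses_30_days_apart(p.mm_diagnosis_dates)
    )


def therapy_gaps(
    exposures: List[Tuple[date, date]], min_gap_days: int = 180
) -> List[Tuple[date, date]]:
    """Return (gap_start, gap_end) spans with no therapy for >= min_gap_days.

    `exposures` holds (start, end) pairs for any MM therapy; overlapping
    exposures are merged by tracking the latest end date seen so far.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(exposures):
        if covered_until is not None and (start - covered_until).days >= min_gap_days:
            gaps.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps
```

Keeping the logic in one place like this gives a reference that each of the five database-specific extractions could be checked against.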
25. Platform components (figure labels): Data Lake & CDMs; Analysis Tools; Code & Cohort Sharing; Visualization; Self-Serve Applications; Data Catalog & Search; Knowledge Management Platform
Summary Statistics
• 388 RWE databases tracked
• 24 DBs ingested & regularly refreshed
• ~120-150M patients per claims DB
Features
• “Big Data” Analytics
• Data governance
• Code governance
• Interactive visualizations
• Self-serve applications (non-coders)
26. Ex.: Real-World Treatment Pathways in Crohn’s Disease
Introduction
• Treatment for CD has advanced over the past 20 years with the introduction of biologics
• Despite the availability of biologics, patients may not be optimally managed
Objective
• The aim of this study is to identify and visualize CD treatment pathways to gain insight into real-world treatment patterns
Methods
• The MarketScan Commercial and Medicare Databases were used to assess treatment pathways in a large US insured population
‒ Patients had ≥ 2 consecutive health claims for CD* or UC† ≥ 30 days apart, with ≥ 1 occurrence of NDC/HCPCS codes for CD or UC medications from January 1, 2008, to March 31, 2016
‒ Required ≥ 3 (1 pre-diagnosis + 2 post-diagnosis) years of continuous enrollment
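The inclusion criteria in the Methods can be read as three checks per patient. The following is a minimal sketch under stated assumptions, not the study's actual code: the `Patient` type and its fields are hypothetical, the index date is assumed to be the first diagnosis claim, enrollment is assumed to be a single continuous span, a year is approximated as 365 days, and "consecutive claims ≥ 30 days apart" is read as any two adjacent diagnosis claims at least 30 days apart.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Medication-claim window stated in the Methods.
STUDY_START = date(2008, 1, 1)
STUDY_END = date(2016, 3, 31)


@dataclass
class Patient:
    """Hypothetical per-patient summary; field names are illustrative only."""
    diagnosis_claim_dates: List[date]    # CD or UC diagnosis claims
    medication_claim_dates: List[date]   # NDC/HCPCS claims for CD or UC drugs
    enrollment_start: date               # assumed single continuous span
    enrollment_end: date


def has_consecutive_claims_30_days_apart(dates: List[date]) -> bool:
    """True if any two adjacent claims (in date order) are >= 30 days apart."""
    ordered = sorted(dates)
    return any((b - a).days >= 30 for a, b in zip(ordered, ordered[1:]))


def is_eligible(p: Patient) -> bool:
    """Apply the three inclusion checks described in the Methods."""
    if not has_consecutive_claims_30_days_apart(p.diagnosis_claim_dates):
        return False
    if not any(STUDY_START <= d <= STUDY_END for d in p.medication_claim_dates):
        return False
    # Assumption: the index date is the first diagnosis claim.
    index_date = min(p.diagnosis_claim_dates)
    # 1 year before and 2 years after the index date, approximated in days.
    return (
        p.enrollment_start <= index_date - timedelta(days=365)
        and p.enrollment_end >= index_date + timedelta(days=730)
    )
```

Real claims data usually stores enrollment as multiple segments, so a production version would need to merge segments and allow for permitted gaps before applying the last check.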
27. Ex.: Real-World Treatment Pathways in Crohn’s Disease
[Figure: treatment pathways following Crohn’s diagnosis, N=16,260. Three successive levels each show the categories Corticosteroids, 5-ASA, 5-ASA+Corticosteroids, IST, Surgery, Biologic, Biologic+IST, Other_Combo_NonBio, and Other_Combo_Bio. The chart labels 42%, 35%, and 7% appear without category names.]
5-ASA=5-aminosalicylic acid; IST=immunosuppressant (i.e., immunomodulator)
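A pathway chart like this one is typically driven by a count of how many patients follow each ordered sequence of treatments. The sketch below shows one way to compute those shares; it is an assumption about the general technique, not the analysis behind the slide. The function names are hypothetical and the example data is invented toy data, not study results.

```python
from collections import Counter
from typing import Dict, List, Tuple


def collapse_repeats(treatments: List[str]) -> List[str]:
    """Drop consecutive repeats so a pathway records only changes in therapy."""
    collapsed: List[str] = []
    for treatment in treatments:
        if not collapsed or collapsed[-1] != treatment:
            collapsed.append(treatment)
    return collapsed


def pathway_shares(
    patient_treatments: Dict[str, List[str]], depth: int = 3
) -> Dict[Tuple[str, ...], float]:
    """Share of patients following each pathway, truncated to `depth` steps.

    `patient_treatments` maps a patient ID to that patient's treatments in
    chronological order. Patients with no recorded treatment are skipped.
    """
    counts = Counter(
        tuple(collapse_repeats(sequence)[:depth])
        for sequence in patient_treatments.values()
        if sequence
    )
    total = sum(counts.values())
    return {pathway: n / total for pathway, n in counts.items()}


# Invented toy data for illustration only.
toy = {
    "p1": ["5-ASA", "5-ASA", "Corticosteroids"],
    "p2": ["5-ASA", "Corticosteroids"],
    "p3": ["Corticosteroids"],
    "p4": ["Biologic", "IST", "Biologic", "Surgery"],
}
shares = pathway_shares(toy)
```

The `depth` of three matches the three levels visible in the figure; each key in `shares` corresponds to one path from the center of the chart outward.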
28. Ex.: Real-World Treatment Pathways in Crohn’s Disease
Data-driven clinical trial site identification
30. Takeaway
• The changing health-care landscape requires pharmaceutical companies to
increasingly become data-driven organizations
• Data needs to be proactively cultivated and evolve within the context of current
and upcoming medicines in the discovery and development pipeline
• As in other industries, IT and information platforms need to encourage data,
knowledge, and code sharing…with the appropriate access controls
• Through working with Deloitte, and leveraging AWS, Celgene has developed
an industry-leading, global platform spanning from data ingestion to knowledge
sharing