tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons Learned in Academic and Life Science Settings
Dan Housman, Recombinant by Deloitte
The Recombinant by Deloitte team has worked with organizations such as the Kimmel Cancer Center as a model case for adapting existing mature i2b2 implementations to meet business and scientific needs. Other organizations are increasingly focused on how to use cloud and high-performance computing models to achieve different performance levels. Advanced initiatives are progressing to link commercial tools such as QlikView to explore tranSMART data and to close key gaps in scientific pipelines. Dan will present recent lessons learned, new capabilities, and some of the impact on the path forward for future tranSMART updates.
4. Recombinant + Deloitte - Organization Within Deloitte
[Org chart] Deloitte U.S. Firms comprise Audit & Enterprise Risk Services, Tax, Financial Advisory Services, and Consulting. Within Consulting, the service areas are Human Capital, Technology, and Strategy & Ops. Recombinant by Deloitte is the innovation group within Information Management & Life Sciences Health Care consulting services in the Technology service area, supported by dedicated US-India (USI) resources.
5. Recombinant Vision For Capabilities for Translational Medicine
[Capability diagram spanning Payors and Pharma]
6. General Market Approach
• Key Capabilities: Data Strategy; Data Governance; Clinical Performance Improvement (Clinical Quality, Operational Excellence, Accountable Care); Translational Research (Cost Effectiveness, Comparative Effectiveness, Pharmacovigilance)
• Products / Tools: Data Trust; Selectrus Analytics; Miner Suite; Data Integration Hub; open source tools (i2b2, SHRINE, tranSMART)
• Services: Data Warehousing/Bioinformatics Implementations; Professional Open Source Support Contracts
• Target Markets: Provider / Research; ACO / Payer; Life Sciences; Federal
11. (Dan Housman’s) Translational Research Enterprise Informatics Infrastructure Maturity Model
• Level 7 – Cognitive computing
• Level 6 – Real time decision support
• Level 5 – External innovation and validation/optimization
• Level 4 – Business focused solutions
• Level 3 – Enterprise utilization and standardization
• Level 2 – Data integration – data warehouse
• Level 1 – Fragmented and siloed analyses
• Level 0 – Reliance on external vendors
12. Translational Research Enterprise Informatics Infrastructure Maturity Model
Level 7
Cognitive computing: Advanced ‘many to many’ unsupervised discovery algorithms with sufficient supporting underlying semantic models.
Use of very large compute to identify hard to find insights. Advanced imaging feature detection analysis. Significant use of NLP enrichment
and ‘on demand’ access to external data on ad-hoc basis. Broad access and use of phenotype and genotype for large populations. Use of in
silico models for systems biology translated from inputs and molecular innovation.
Level 6
Real time decision support: Use of predictive analytics to drive decision support at multiple levels. Patient level decision support with use of
molecular markers such as trial recruitment at point of care leveraging informatics services. Real time access to data from active studies.
Rapid incorporation of a broad array of data from data platforms such as microbiome, PRO, home health devices. CFR 11 validation of
translational analysis tools for use in active studies. Broad establishment of enterprise data driven culture within organization. Advanced
rapid access to data visualizations.
Level 5
External innovation and validation/optimization: Extensive automated data exchange, broad data access contracting. Collaboration cloud
with pre-competitive partners including AMCs, patient advocacy, peers, and commercial data providers. Execution of complex pipelines
across multiple modes of data e.g. mRNA and NGS and literature. Federated queries across multiple institutions and modes to answer key
questions. Collaborative environments with shared users and identity management and social networking. Automated tiered storage and
compute to manage very large data sets and reanalysis pipelines. Use of semantic web tools to expose resources. Common internal and
external tools and approaches such as OMOP.
Level 4
Business focused solutions: Differentiated solutions by business area such as health economics, safety, research, operations, marker
discovery, lab/sample availability, competitive analysis, pre-clinical, etc. Demonstrated and published results driving key business decisions
achieved from enterprise informatics frameworks. Integration of translational research informatics with multiple enterprise systems such as
portfolio management. Significant curated library by use containing clinical studies and associated open/public data. Secure web service API
access to data. Standard and shared algorithms and methods across disparate internal teams. Access to broad array of real world evidence
sources e.g. Twitter, adverse events, surveillance partnerships.
Level 3
Enterprise utilization and standardization: Focus on use of data for decision making in major R&D cycle decisions. Documented governance
of use of data, quality processes for data, and internal/external sharing. Semantic translation of studies into common formats. Cross study
and multiple platform analysis enablement through integration of analytic pipelines and advanced standardization. Central informatics
framework for interfacing to multiple commercial, open, and internally developed research platforms. Policy based self service access to
data. Factory model and self-service curation. Acquired data sets from subscriptions converted into standard formats or repository system.
Level 2
Data integration – data warehouse: Centralization of translational research data sets in single DBMS repository. Data includes clinical
studies, molecular assays, observational studies, 3rd party data. Linkage at patient level across data and between data and analyses. Access
controlled by ad-hoc governance model with honest broker or service delivery focus on analyses on an as needed basis to share data across
groups. Self service access via web for browsing and exploring data including basic analyses. Focused pilots engage early adopter users.
Level 1
Fragmented and siloed analyses: Silo approach to clinical data controlled through experts such as biostatistics groups. Data stored in primary forms such as SAS data sets and files in organized directories. Analyses produced are ad-hoc with specific tools. Internal development of systems to offer intranet or file server access to data files. Recognition of governance needs. Subscription services manage reference data or search external data. Basic catalog available through files or experts. Desktop analysis tools are the primary interface to data.
Level 0
Reliance on external vendors: Historical focus on clinical-only data sets, with no ‘Omics or data integration internally. External vendors exclusively generate analyses for combined clinical and molecular data. Infrastructure for storage is file servers with limited governance and a generally report-focused approach. Limited to no institutional knowledge of available data sets from historical work.
14. Translational Research Platform
[Architecture diagram]
• Application Layer (Miner) – Research Portal, Precision Miner: Study Design; Study Recruitment Manager; Cohort Identification; Population Miner; Outcomes Miner; i2b2; In Memory Exploration; Compare; tranSMART+; Patient Journey; Safety; ‘Omic Explorer
• Business and Analytical Services: Security and Identity Management; Cohort Matching; Metrics Calculation; RIE Services; Data/Messaging APIs; Clinical/Omics Terminology Mapping; Statistical Model Execution; Knowledge Management
• Data Management, Storage and Processing Engine – Data Trust (DT): Research Trust; Data Marts (ADM, Research Mart, CFDM, OMOP, i2b2); Master Patient Indexing; ‘Omic Data Management; OADM DT Extensions; Data Processing Pipelines
• Data Integration – Data Integration Hub (DIH): Data De-ID/Re-ID Services; Data Acquisition; Custom ETL; Packaged Parsers/Adapters; Metadata/Terminology Services
• Primary Sources: Research Datasets; EMR/Clinical; Clinical Trials
29. Big picture type solution for ‘AMC’ genomics initiatives (Closed Loop)
[Architecture diagram]
• Source Data: Clinical EMRs & Claims; Labs; Clinical Trials, Registries, Internal/External Results; Publications, PDF, Pathology; Biobanks; LIMS
• Data Workflow/Enhancement: Data Curation; Honest Broker Data Pipeline; Data DeIdentification; ETL; Ref Data Mgmt Hub
• Data Warehouse / Research Stores – Data Trust, Research Trust: Research i2b2 (Research Open Source); Research Data Marts; Omics File Store (e.g. genomics: BAM, VCF, CEL); Master Data Management (MPI/Provider); Scientific Reference; Terminology Reference
• Translational Research Applications: Omics/Cohort Explorer; tranSMART/Sample Explorer; Research Portal; Study Recruitment Manager; Statistical Analysis (R, SPSS, SAS); ‘Omics Platforms (CLC Bio); Collaboration
• Common Services: Research Information Exchange; Security; MPI
• Extended Systems: Clinical Partner (clinical data); RI Analytics & Care Delivery; HPC; Portal; Storage
*Note: Representative diagram – not all integrations are shown
37. XML to i2b2
[Pipeline diagram] EDC system of choice → REDCap Archive (ODM XML) → file system → i2b2 Staging (Oracle schema: Ontology, CRC) → i2b2 PRD
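The flow above moves study data out of REDCap as CDISC ODM XML before it is staged for i2b2. As a rough illustration of that first parsing step, here is a minimal sketch that flattens ODM ItemData values into subject-level rows; the fact-tuple shape is an assumption for illustration, not the pipeline’s actual staging layout.

```python
# Sketch: flattening REDCap ODM XML into subject-level rows.
# Element names follow CDISC ODM 1.3; the output tuple shape is
# illustrative, not the real i2b2 staging format.
import xml.etree.ElementTree as ET

ODM_NS = "{http://www.cdisc.org/ns/odm/v1.3}"

def odm_to_facts(odm_xml: str):
    """Yield (subject_key, item_oid, value) tuples from an ODM document."""
    root = ET.fromstring(odm_xml)
    for subject in root.iter(ODM_NS + "SubjectData"):
        key = subject.get("SubjectKey")
        for item in subject.iter(ODM_NS + "ItemData"):
            yield key, item.get("ItemOID"), item.get("Value")

sample = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="S1" MetaDataVersionOID="v1">
    <SubjectData SubjectKey="1001">
      <StudyEventData StudyEventOID="baseline">
        <FormData FormOID="demographics">
          <ItemGroupData ItemGroupOID="demographics.g1">
            <ItemData ItemOID="age" Value="54"/>
            <ItemData ItemOID="sex" Value="F"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""

facts = list(odm_to_facts(sample))
# e.g. [('1001', 'age', '54'), ('1001', 'sex', 'F')]
```

A real loader would also walk the ODM metadata (Study/MetaDataVersion) to map ItemOIDs onto i2b2 ontology paths.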
38. Choose your Stud(ies)
• Choose studies to be imported
• Supply token to be used for study
• Click to initiate export
39. Choose your Stud(ies)
• If a project that has been previously exported is selected, the export process begins by cleaning out all references to the project from the i2b2 staging database.
42. Modifiers POC (Kimmel Cancer Center)
• Dr Jack London – Informatics Core Director, KCC; TJU Director of Research Informatics; Research Professor of Cancer Biology
• Devjani Chatterjee, PhD – Informaticist, KCC Informatics Shared Resource
• Karen Knudsen, PhD – Kimmel Cancer Center Deputy Director for Basic Science and Professor of Cancer Biology
• Hushan Yang, PhD – Assistant Professor, Medical Oncology
• Hallgeir Rui, MD, PhD – Professor, Cancer Biology
• Stephen Tranquillo – Vice President and CIO
48. The FDA needed to explore new approaches to data management and analysis for effective evaluation of product safety and efficacy

Business Problem
• The FDA has committed to improving their overall submission review process
• Resources were spending too much time on basic tasks to aggregate data across clinical trials
• As a result, fewer resources were available for high-value data and regulatory analysis

[Chart: current vs. ideal effort across review activities – Data Curation & Loading, Data Selection (Data Management); Data Analysis, Innovation, Learning, Sharing (Regulatory Science)]

Strategic Goals
• Implement improved data management systems across multiple FDA Centers
• Enable the ability to:
  • Automate the process of loading clinical trial data from multiple source formats
  • Correlate data across clinical trials through a simple and intuitive user interface
  • Conduct advanced analytics across multiple data sets to better inform regulatory decisions
• Shift the utilization of resources from basic data management to high-value regulatory science
Screenshot description: First, we are looking at insights into a single research study, specifically a box plot of a gene signature list against all participants in the study who have genomic data loaded. This shows large variance between two distinct subgroups (Type I and Type IV). We can then create a heatmap, limiting our selection to those subgroups, showing the variation in genetic markers. It shows variations, but they are not statistically significant.
Screenshot Description: From our initial investigations into a single study, we can now perform a comparison of two different study groups to observe first the phenotypic differences (Age, Sex, etc.) and then compare the specific variances of two gene variants between the two treatment regimens. Now, we can observe that there is a statistically better outcome for one of the study groups who received a particular treatment.
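The move in these screenshots from "we see a difference" to "the difference is statistically significant" rests on a standard two-sample comparison. A minimal sketch with made-up expression values, not data from the slides:

```python
# Illustrative sketch of the comparison step: testing whether a gene's
# expression differs between two treatment groups. The numbers are
# invented; a real tranSMART workflow would run this via its R-based
# analysis tools.
from statistics import mean, stdev
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

group_a = [2.1, 2.4, 2.0, 2.6, 2.3]   # expression under treatment A
group_b = [3.0, 3.4, 3.1, 2.9, 3.3]   # expression under treatment B
t = welch_t(group_a, group_b)
# A |t| well above 2 suggests the group difference is unlikely to be noise;
# a full analysis would compute a p-value using Welch-Satterthwaite degrees
# of freedom, as standard statistical packages do.
```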
~150k patients, with various boxes indicating the types of treatments available, along with other characteristics such as age and gender. With the provider’s help, we identified the two key treatment medications believed to be the cornerstone of treatment in patients with severe asthma, but which do not show much difference in outcomes overall…
Token access to selected projects is required. A project’s token may be obtained from the administrator for the REDCap instance. The Last Processed timestamp is displayed for projects that have already been exported, along with the status of the export attempt. If a project that has already been exported is re-selected for export, the export process begins by cleaning out all references to the project from the i2b2 database. That is, a REDCap project that is imported into i2b2 more than once will contain only records from the most recent successful import; all traces of previous exports will be gone.
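The clean-then-reload behaviour described above can be sketched as a single transactional swap. The table and column names below are hypothetical stand-ins, not the real i2b2 staging schema; SQLite stands in for Oracle only to keep the sketch self-contained.

```python
# Sketch of the "clean out, then reload" re-export semantics: all rows for
# a project are deleted before the fresh rows are inserted, so only the
# most recent import survives. Schema is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact (project_id TEXT, concept_cd TEXT, value TEXT)"
)

def import_project(conn, project_id, rows):
    # One transaction: either the whole swap commits or nothing changes.
    with conn:
        conn.execute(
            "DELETE FROM observation_fact WHERE project_id = ?", (project_id,)
        )
        conn.executemany(
            "INSERT INTO observation_fact VALUES (?, ?, ?)",
            [(project_id, c, v) for c, v in rows],
        )

import_project(conn, "P1", [("age", "54")])
import_project(conn, "P1", [("age", "55"), ("sex", "F")])  # re-export replaces
count = conn.execute(
    "SELECT COUNT(*) FROM observation_fact WHERE project_id = 'P1'"
).fetchone()[0]
# count == 2: only rows from the latest import remain
```

Wrapping the delete and insert in one transaction is what makes the re-import safe: a failed export cannot leave the project half-cleaned.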