https://tag.bio • spadhi@tag.bio
Join us: Tag.bio community on Slack
Tag.bio: Self Service Data Mesh Platform
Your questions. Your data. Your answers.
NSF Big Data Hub: Data Sharing and Cyberinfrastructure Meeting
Sanjay Padhi
Chief Technologist
Executive Vice President
Abstract:
The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data
warehouses, and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as
domain centricity/ownership, data as products, and self-serve architecture, with a federated computational layer. Tag.bio data products
combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products for users to discover insights
using FAIR Principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned,
reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive
complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the
platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharmaceutical companies and university health systems are
using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without
explicitly moving data around). The talk also outlines Tag.bio secure data exchange features for real-world evidence datasets, privacy-centric
data products (confidential computing), and integration with cloud services.
2
Agenda
● Introduction
● Data as a Product
● Data Products in a Mesh
● Platform for Collaboration
● Platform for Developers and Integrators
● Demo: Analysis Platform and Developer Studio
● Partnerships with Cloud providers and NIH STRIDES
● Q&A
3
Source: Computing Perspectives: 25th International Conference on Computing in High-Energy and Nuclear Physics, 2021
4
CERN: Project Approach with Distributed Storage
Distributed data management and storage is expensive – hardware and operations
Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
NIH Data Commons: Project approach with Data Lake(s)
Research projects ain’t cheap; the average award for an NIH grant is about half a million dollars.
5
Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
Data Lake based approach with workspaces and Jupyter Notebooks for analysis
6
Source: Susan Gregurick (2020): STRIDES and NIH-supported biomedical data sharing
As of July 2020
It takes months-to-years to derive insights
7
NIH STRIDES (2018 - ): Turning Research Data Into Knowledge and Discovery
8
[Figure labels: Consumers, millions, data]
Life Sciences: Data Growth during Drug development process
9
Data Warehouse(s)
Source: Databricks
Structured Data
Historical - used 40+ years
Coupled Compute and Storage into a single entity: Multiple Data Warehouses
- Metadata layer (where data is located)
- A data model – an abstraction in the data warehouse
- Data lineage – the tale of the origins and transformations of data in the
warehouse
- Summarization – algorithmic work designed to create the data
- KPIs – where are key performance indicators found
- ETL – enabled application data to be transformed into corporate data
Limitations:
- AI/ML introduce iterative algorithms with direct data access (not always SQL based)
- variety of datasets that are not always structured (text, IoT, Objects, Binary)
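The ETL, data lineage, and summarization bullets above can be illustrated with a minimal sketch. It is not taken from the talk or from any warehouse product: the `Dataset` class, the `transform` helper, and the table and column names are all hypothetical, chosen only to show how a transformation step can carry its own lineage record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Dataset:
    """A table plus the lineage of steps that produced it (illustrative only)."""
    name: str
    rows: List[Row]
    lineage: List[str] = field(default_factory=list)


def transform(source: Dataset, name: str, step: str,
              fn: Callable[[Row], Row]) -> Dataset:
    """Apply a row-level transformation and append the step to the lineage."""
    return Dataset(
        name=name,
        rows=[fn(dict(r)) for r in source.rows],  # copy rows; the source stays unchanged
        lineage=source.lineage + [f"{source.name} -> {name}: {step}"],
    )


# Hypothetical application data, as it might arrive from an operational system.
raw = Dataset("app_orders", [{"amount_cents": 1250}, {"amount_cents": 400}])


# ETL step: convert application units into a corporate reporting unit.
def to_usd(row: Row) -> Row:
    row["amount_usd"] = row.pop("amount_cents") / 100
    return row


corporate = transform(raw, "dw_orders", "cents to USD", to_usd)

# Summarization step: a simple KPI computed from the transformed data.
total_usd = sum(r["amount_usd"] for r in corporate.rows)
```

Here `corporate.lineage` records where the warehouse table came from, which is the role the "data lineage" bullet describes; a real warehouse would typically keep this in its metadata layer rather than on the table object.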
10
Data Lakes and Lake-houses
Source: Databricks
Data Architecture(s)
11
● Data Warehouse(s) - Direct coupling between compute and storage
● Distributed to Centralized Data Storage and Compute
● Data Lakes
● Data Lakehouse
● Data Products and Mesh
Ways to communicate (information sharing) via APIs also evolved:
● Salesforce (2000) - added APIs on top of applications
● Facebook (2006) - gave developers access to user information (photos, profiles, events)
● Google (2006) - shared massive geographical data via APIs
● Twilio (2008) - Created an API for their entire product line (Calls, Texts)
12
Project Vs Product Approach towards Data Architecture
Data Product
Data products represent a harmonized, decentralized application layer on top of disparate data sources.
Along with employing a universal “smart” API, they also present a simple, clean, standardized data model for apps and for data scientists who
run queries and extract data frames.
Apps
Data to Data Product
Data as a Product - Tag.bio
13
1. Data (data engineers)
2. Algorithms (data scientists)
3. Analysis apps (domain experts)
Smart API
Data
Map
Algorithms
Analysis apps
2
3
1
Tag.bio Data Products
Bringing together 3 things and 3 groups
14
Components
15
All data products are built with 4 components:
1. Source data in a schema
2. Runtime business logic that can be performed
on source data upon request
3. Smart API to invoke requests and return
responses
4. SDKs/Clients which enable communication
between other systems and the API
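The four components listed above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not Tag.bio's actual API or SDK: the schema, the `mean_age` algorithm, the `handle_request` function, and the `Client` class are hypothetical names, and the "API" is an in-process function call rather than a network service.

```python
import json
from statistics import mean
from typing import Callable, Dict, List

# 1. Source data in a schema (hypothetical columns and values).
SCHEMA = {"patient_id": str, "arm": str, "age": int}
ROWS: List[dict] = [
    {"patient_id": "p1", "arm": "treatment", "age": 61},
    {"patient_id": "p2", "arm": "control", "age": 50},
    {"patient_id": "p3", "arm": "treatment", "age": 47},
]


def validate(rows: List[dict]) -> None:
    """Check every row against the schema before it is served."""
    for row in rows:
        for column, kind in SCHEMA.items():
            if not isinstance(row[column], kind):
                raise TypeError(f"{column} must be {kind.__name__}")


# 2. Runtime business logic, run on the source data upon request.
def mean_age(rows: List[dict], arm: str) -> float:
    return mean(r["age"] for r in rows if r["arm"] == arm)


ALGORITHMS: Dict[str, Callable[..., float]] = {"mean_age": mean_age}


# 3. API: accepts a JSON request, invokes an algorithm, returns a JSON response.
def handle_request(request_json: str) -> str:
    request = json.loads(request_json)
    algorithm = ALGORITHMS[request["algorithm"]]
    result = algorithm(ROWS, **request.get("params", {}))
    return json.dumps({"algorithm": request["algorithm"], "result": result})


# 4. Client: lets another system call the API without building JSON by hand.
class Client:
    def run(self, algorithm: str, **params) -> float:
        request = json.dumps({"algorithm": algorithm, "params": params})
        return json.loads(handle_request(request))["result"]


validate(ROWS)
treatment_mean = Client().run("mean_age", arm="treatment")
```

The point of the separation is that callers only ever see requests and responses; the rows and the algorithm implementations stay behind the API.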
Data
Map
Algorithms
Smart API
1
2
3
4
16
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Cross-Functional Data Team/Role:
A. Data Engineer: Maps data into the data product that data
scientists can use to build analysis apps.
B. Data Scientist
C. Researcher
Smart API
Data
Map
Algorithms
Analysis apps
A
B
C
Data Sources
Siloed data
Data
warehouses
Data lakes
Data products
DNA-Seq
RNA-Seq
Proteomics
Flow cytometry
Clinical trials
Data Types
Data Formats
CSV
JSON
SPARK
XML
SQL
Machine behavior
& maintenance
Other data types
Emerging data
types
17
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data
scientists can use to build analysis apps.
B. Data Scientist: Integrates (ML) algorithms, with interfaces to
R, Python, and ML/AI, as analysis apps that
researchers can use.
C. Researcher
Smart API
Data
Map
Algorithms
Analysis apps
A
B
C
18
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Smart API
Data
Map
Algorithms
Analysis apps
A
B
C
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data
scientists can use to build analysis apps.
B. Data Scientist: Integrates R, Python, ML/AI as analysis apps
that researchers can use.
C. Researcher: Uses no-code, guided analysis apps to ask
& answer their own questions.
Single Cell Gene
Expression
Rmarkdown Gene
Signature Report
Elastic Net Cross
Validation
19
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Smart API
Data
Map
Algorithms
Analysis apps
A
B
C
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data
scientists can use to build analysis apps.
B. Data Scientist: Integrates R, Python, ML/AI as analysis apps
that researchers can use.
C. Researcher: Uses no-code, guided analysis apps to ask
& answer their own questions.
Maximize the value of your data
with domain-driven data products
Sequence/graph
Comparison
Stats
Clustering
Module API
Client
JSON
XML
CSV
SQL
Spark
Key-value
Analysis (API)
Regression
Prediction
Exploration
Data extraction
Machine Learning
How does it work?
Data
Map
Algorithms R & Python Plugin
Data Mapped
1
3
Data Product is a Source
of versioned, immutable,
integrated data.
Developer Studio
(coder)
AI/ML
Analysis Platform
(domain expert UI)
Point and Click
analysis Apps
Domain experts
Notebook
integration
Data
Scientists
Smart API
Data
Map Algorithms
2
20
Data Mesh
It’s a paradigm shift to treat data as a product
Data mesh encompasses data products
that are oriented around domains & owned by cross-functional data teams
21
Zhamak Dehghani: Data Mesh: A Paradigm Shift in Data Architecture
Pharma: Domain Driven Workloads
Drug Development Process
Disparate data types slow the drug development process
22
Clinical
Trials
Preclinical
Basic
Research
Regulatory
Review
RWE &
Patient care
Biomarkers Omics Model
Organisms
Phase I Phase II Phase III Patient
Registries
Phase IV
Regulatory
Submissions
What Happens When You Apply Data Mesh To Pharma?
Biomarkers
Model
Organisms
Phase I
Drug Development Process
Harmonized, connected data sources accelerate drug development
Phase II
Phase III
Omics
Patient
Registries
Phase IV
23
Clinical
Trials
Preclinical
Basic
Research
Regulatory
Review
RWE &
Patient care
Regulatory
What Happens When You Turn Data into a Product?
Streamlined data analysis process
VS.
Data Scientist
Researchers
Data Engineer
Data Warehouses
Analysis Platform
Data Product
Data Product
?
Data Mesh
24
Data Lakes
Siloed Data
Data Product
Months Minutes
Researchers
? ? ? ?!
?!
Data Mesh
Distributed data products
connected into a data mesh
2
25
A customizable self service (end-to-end) data mesh platform
What Is Tag.bio?
Data Product
Domain-driven, harmonized &
decentralized application layer
1
Analysis Environment
Data analysis environment for
researchers & data scientists
3
Data Product
any
cloud
26
Data products deployed in an interoperable data mesh
Tag.bio Data Mesh
Smart API
Data
Map
Algorithms
Analysis apps
Data Product
Data mesh enables organizations to:
● Connect data sources without moving data
● Rapidly add new data types
● Connect all data sources to accelerate the
drug development cycle
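"Connect data sources without moving data" can be read as sending the analysis to each data product and combining only the summaries that come back. The sketch below illustrates that reading; it is an assumption about the general pattern, not a description of how Tag.bio implements it, and the `DataProduct` class, `count_and_sum` method, and sample values are hypothetical.

```python
from typing import List, Tuple


class DataProduct:
    """Holds its rows privately and answers only aggregate requests."""

    def __init__(self, name: str, ages: List[int]) -> None:
        self.name = name
        self._ages = ages  # stays inside the product; never returned to callers

    def count_and_sum(self, min_age: int) -> Tuple[int, int]:
        """Run the analysis next to the data and return only a summary."""
        selected = [a for a in self._ages if a >= min_age]
        return len(selected), sum(selected)


def federated_mean_age(products: List[DataProduct], min_age: int) -> float:
    """Combine per-product summaries into one mesh-wide answer."""
    total_n = 0
    total_sum = 0
    for product in products:
        n, s = product.count_and_sum(min_age)
        total_n += n
        total_sum += s
    if total_n == 0:
        raise ValueError("no records matched the cohort filter")
    return total_sum / total_n


# Hypothetical products, e.g. hosted in different accounts or organizations.
mesh = [
    DataProduct("trial_a", [34, 58, 61]),
    DataProduct("registry_b", [45, 70]),
]
mean_over_50 = federated_mean_age(mesh, min_age=50)
```

Only counts and sums cross the product boundary, so adding a new data source means adding one more `DataProduct` to the list rather than copying its rows into a central store.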
Data Product
Data Product
Data Product
27
Data analysis environment to access data mesh & use data products
Analysis Environment
Analysis Platform
for Researchers
Use data products with
no-code analysis apps that
speak their language.
Collaborate with Data Scientist
on how apps should work.
Developer Studio
for Data Scientists
Build data products using a
familiar, Jupyter
notebook-based setting.
Plug them into the Analysis
Platform for researchers to use.
28
How Are Organizations Using Tag.bio’s Data Mesh Platform?
29
Top 5 Pharma: Translational Oncology
Harmonized 10+ Clinical Trials
Working towards comprehensive
harmonization of
all past & future trials
Example: Phase-III biomarker analysis
Reference: https://www.nature.com/articles/s41591-020-1044-8 Figure 1b
https://demo.tag.bio/node/fc-nct02684006-refined/cox_survival_protocol/results?param_reconfig=2784
30
Example: TCGA Pan-Cancer ATLAS - UMAP Expression Clustering
Reference: https://pubmed.ncbi.nlm.nih.gov/29628290/
Analyzing 1000s
of Flow Cytometry
Samples
The Jackson Laboratory
Enabling users to analyze
samples from various
immunocompromised mouse
strains with xenografts from
human donors
32
HIPAA and California’s Confidentiality of Medical Information Act (CMIA) Compliant environment
https://medschool.ucsd.edu/research/actri/Informatics/Health-Data-Portal/services/Pages/Virtual-Research-Desktop-VRD.aspx
https://campuslisa.ucsd.edu/_files/2020%20Campus%20LISA_HC_Data_mesh_.pdf
Data Products in action at UCSD
33
34
More Examples
Analyzing Phase IV & RWE Data
Top 50 Pharma
Looking at both drug & medication-adherence device clinical trials in
relation to schizophrenia
Immunotherapy & Single-Cell Omics
Cell Therapy Biotech
Deploying an array of proprietary & public-domain data products —
enabling users to investigate & discover gene expression markers with
respect to cell types
Showing how our customers fit into Drug Dev lifecycle
Biotechs
Cell Therapy, Transplant
Large Pharma
Immunology, Oncology, and Neurology
CRO
RWE
CRO
Omics, IHC, TCR
Basic Research
Mouse Models and Other
AMCs - UCSD, UCSF
Value based Healthcare and
Patient Registries
35
36
Next Stage Of Data Evolution
1. Harmonize Data 2. Connect Data Products 3. Accelerate Outputs
Data Warehouses
Data Lakes
Siloed Data
Flat Files
Data Product
Data Product
Data Product
Smart API
Data
Map
Algorithms
Analysis apps
Data Product
Real-time answers,
self-service analysis
Validations, publications,
submissions.
Map data into data products FAIR data (findable, accessible, interoperable, reusable) Saved, shareable, reproducible, full QC
Tag.bio for Collaboration
37
Clinical Trials
Population Health
Clinical Decisions Discovery Biology
Data Mesh
Data Product 1
Data Product 2
Data Product 5
Data Product 3
Data Product 4
The data mesh connects groups to collaborative analysis resources to
form a data driven culture
Collaboration within an organization
38
Different types of data product act together as a
functional data mesh
Annotation
i.e. Gene, Variant,
Demographic, Identifying
data
Proprietary annotation
Domain Specific
Analysis
(Pan-Cancer TCGA
Patient Healthcare)
Usage
Full history of all
user activity
39
Organization 2
Governed access to selected data
products and apps
Clinical Research
COVID
Patient Registries
Oncology
Chronic Inflammation
Autoimmunity
Organization 1
Governed access to selected
data products and apps
Data Mesh
Data Product
1
Data Product
2
Data Product
5
Data Product 3
Data Product 4
How organizations collaborate via data products
Data Products
(in cloud account of organization)
Collaborator
(VPC/Private Link access to data products)
40
41
Tag.bio data exchange: Collaboration with Parkinson’s Foundation to provide data products to researchers
Tag.bio for developers
42
43
A two sided data environment
to enable real time collaboration
Analysis Platform
for Domain Experts:
No-code analysis apps
that speak your
language
Developer Studio
for Data Scientists:
Familiar Jupyter
Notebook-based
Developer Studio
Integration with Cloud Services: AI/ML
44
45
Integration with AWS Services: AI/ML
46
FHIR: Integration with Amazon HealthLake
47
Monitoring and Auto-deployment of products
48
Data Portal (domain expert)
http://demo.tag.bio/
Demo
49
Developer Studio (data scientist)
https://jupyter-aws-demo.tag.bio/
50
How To Get Started?
Tag.bio Resource Center: Knowledge base, Training & Tutorials (to build apps and data products)
https://tag.bio/company/contact-us/
AWS Marketplace Offerings
https://aws.amazon.com/marketplace/pp/prodview-dld5ezl4nh6us
52
How can (NIH funded) researchers access Tag.bio?
53
https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
How can NIH ICOs access Tag.bio?
54
https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
55
Data Products Data Mesh Self-Service Platform
Real time questions to answers Connect proprietary and public data Fully versioned and reproducible
Cross study comparison Pull in annotation automatically Auto-deployed, tested and scalable
UIs for coders and clickers
Bring the analysis to the data
Collaboration between users, groups, and organizations
Apps
Tag.bio is a “datamesh in a box”
Thank You! Questions?

More Related Content

What's hot

Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Eric Kavanagh
 
Wallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapWallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation Roadmap
David Walker
 

What's hot (20)

Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Data mesh
Data meshData mesh
Data mesh
 
Domain Driven Design (DDD)
Domain Driven Design (DDD)Domain Driven Design (DDD)
Domain Driven Design (DDD)
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
SAS Visual Analytics Overview
SAS Visual Analytics OverviewSAS Visual Analytics Overview
SAS Visual Analytics Overview
 
FAIR Data-centric Information Architecture.pptx
FAIR Data-centric Information Architecture.pptxFAIR Data-centric Information Architecture.pptx
FAIR Data-centric Information Architecture.pptx
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Wallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapWallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation Roadmap
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Data Lifecycle Management
Data Lifecycle ManagementData Lifecycle Management
Data Lifecycle Management
 

Similar to Tag.bio: Self Service Data Mesh Platform

Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Yael Garten
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 

Similar to Tag.bio: Self Service Data Mesh Platform (20)

Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
The NIH Data Commons - BD2K All Hands Meeting 2015
The NIH Data Commons -  BD2K All Hands Meeting 2015The NIH Data Commons -  BD2K All Hands Meeting 2015
The NIH Data Commons - BD2K All Hands Meeting 2015
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Ss eb29
Ss eb29Ss eb29
Ss eb29
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Managing R&D Data on Parallel Compute Infrastructure
Managing R&D Data on Parallel Compute InfrastructureManaging R&D Data on Parallel Compute Infrastructure
Managing R&D Data on Parallel Compute Infrastructure
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 

Recently uploaded

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Call Girls in G.T.B. Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in G.T.B. Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in G.T.B. Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in G.T.B. Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
HyderabadDolls
 

Recently uploaded (20)

RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts ServiceCall Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 

Tag.bio: Self Service Data Mesh Platform

  • 1. https://tag.bio • spadhi@tag.bio Join us: Tag.bio community on Slack Tag.bio: Self Service Data Mesh Platform Your questions. Your data. Your answers. NSF Big Data Hub: Data Sharing and Cyberinfrastructure Meeting Sanjay Padhi Chief Technologist Executive Vice President
  • 2. Abstract: The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses, and lake-houses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, data as products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products that let users discover insights using FAIR principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces. Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), and integration with cloud services.
  • 3. Agenda ● Introduction ● Data as a Product ● Data Products in a Mesh ● Platform for Collaboration ● Platform for Developers and Integrators ● Demo: Analysis Platform and Developer Studio ● Partnerships with Cloud providers and NIH STRIDES ● Q&A
  • 4. CERN: Project Approach with Distributed Storage. Distributed data management and storage is expensive (hardware and operations). Source: Computing Perspectives: 25th International Conference on Computing in High-Energy and Nuclear Physics, 2021
  • 5. NIH Data Commons: Project approach with Data Lake(s). Research projects ain’t cheap; the average award for an NIH grant is about half a million dollars. Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
  • 6. Data Lake based approach with workspaces and Jupyter Notebooks for analysis. Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
  • 7. NIH STRIDES (2018 - ): Turning Research Data Into Knowledge and Discovery. As of July 2020. It takes months-to-years to derive insights. Source: Susan Gregurick (2020): STRIDES and NIH-supported biomedical data sharing
  • 8. Life Sciences: Data Growth during Drug development process (figure labels: Consumers, millions, data)
  • 9. Data Warehouse(s) (Source: Databricks). Structured data. Historical: used for 40+ years. Compute and storage coupled into a single entity. Multiple data warehouses: - Metadata layer (where data is located) - A data model: an abstraction in the data warehouse - Data lineage: the tale of the origins and transformations of data in the warehouse - Summarization: algorithmic work designed to create the data - KPIs: where key performance indicators are found - ETL: enables application data to be transformed into corporate data. Limitations: - AI/ML introduce iterative algorithms with direct data access (not always SQL based) - A variety of datasets that are not always structured (text, IoT, objects, binary)
  • 10. Data Lakes and Lake-houses. Source: Databricks
  • 11. Data Architecture(s) ● Data Warehouse(s) - Direct coupling between compute and storage ● Distributed to Centralized Data Storage and Compute ● Data Lakes ● Data Lakehouse ● Data Products and Mesh. Ways to communicate (information sharing) via APIs also evolved: ● Salesforce (2000) - added APIs on top of applications ● Facebook (2006) - gave developers access to user information (photos, profiles, events) ● Google (2006) - shared massive geographical data via APIs ● Twilio (2008) - created an API for their entire product line (calls, texts)
  • 12. Project vs. Product Approach towards Data Architecture. Data Product
  • 13. Data as a Product - Tag.bio. Data products represent a harmonized, decentralized application layer on top of disparate data sources. Along with employing a universal “smart” API, they also present a simple, clean, standardized data model for apps and for data scientists who run queries and extract data frames. Diagram labels: Data to Data Product, Apps
  • 14. Tag.bio Data Products: Bringing together 3 things and 3 groups. 1. Data (data engineers) 2. Algorithms (data scientists) 3. Analysis apps (domain experts). Diagram labels: Smart API, Data Map, Algorithms, Analysis apps
  • 15. Components. All data products are built with 4 components: 1. Source data in a schema 2. Runtime business logic that can be performed on source data upon request 3. Smart API to invoke requests and return responses 4. SDKs/Clients which enable communication between other systems and the API. Diagram labels: Data Map, Algorithms, Smart API
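The four components listed on slide 15 can be illustrated with a minimal Python sketch. This is not Tag.bio's implementation or API: the class names `DataProduct` and `Client`, the request format, and the `column_mean` algorithm are all hypothetical, and a real client would call the API over a network rather than in-process.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class DataProduct:
    """Hypothetical data product; names are illustrative, not Tag.bio's API."""
    schema: Tuple[str, ...]                                         # 1. source data in a schema
    rows: List[dict]
    algorithms: Dict[str, Callable] = field(default_factory=dict)   # 2. runtime business logic

    def __post_init__(self) -> None:
        # Reject rows that do not provide every column named in the schema.
        for row in self.rows:
            missing = [col for col in self.schema if col not in row]
            if missing:
                raise ValueError(f"row is missing columns: {missing}")

    def handle(self, request: dict) -> dict:
        """3. API entry point: dispatch a request and return a response."""
        name = request.get("algorithm")
        if name not in self.algorithms:
            return {"status": "error", "message": f"unknown algorithm: {name}"}
        result = self.algorithms[name](self.rows, **request.get("params", {}))
        return {"status": "ok", "result": result}


class Client:
    """4. Minimal SDK/client wrapper (assumed in-process for this sketch)."""

    def __init__(self, product: DataProduct) -> None:
        self.product = product

    def run(self, algorithm: str, **params):
        response = self.product.handle({"algorithm": algorithm, "params": params})
        if response["status"] != "ok":
            raise ValueError(response["message"])
        return response["result"]


def column_mean(rows: List[dict], column: str) -> float:
    """Example business logic: average one numeric column."""
    return mean([row[column] for row in rows])


product = DataProduct(
    schema=("patient_id", "age"),
    rows=[
        {"patient_id": "p1", "age": 40},
        {"patient_id": "p2", "age": 50},
        {"patient_id": "p3", "age": 60},
    ],
    algorithms={"column_mean": column_mean},
)
client = Client(product)
print(client.run("column_mean", column="age"))
```

The point of the sketch is the separation of concerns: callers only ever see the request/response interface, so the data and the algorithms can change behind it.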
  • 16. Tag.bio Data Product: Domain-driven, harmonized, decentralized application layer. Cross-Functional Data Team/Role: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist. C. Researcher. Diagram labels: Smart API, Data Map, Algorithms, Analysis apps. Data Sources: Siloed data, Data warehouses, Data lakes, Data products. Data Types: DNA-Seq, RNA-Seq, Proteomics, Flow cytometry, Clinical trials. Data Formats: CSV, JSON, SPARK, XML, SQL. Also shown: Machine behavior & maintenance, Other data types, Emerging data types
  • 17. Tag.bio Data Product: Domain-driven, harmonized, decentralized application layer. Cross-Functional Data Team: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist: Integrates (ML) algorithms with an interface to R, Python, and ML/AI as analysis apps that researchers can use. C. Researcher. Diagram labels: Smart API, Data Map, Algorithms, Analysis apps
  • 18. Tag.bio Data Product: Domain-driven, harmonized, decentralized application layer. Cross-Functional Data Team: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist: Integrates R, Python, ML/AI as analysis apps that researchers can use. C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions. Diagram labels: Smart API, Data Map, Algorithms, Analysis apps. Also shown: Single Cell Gene Expression, Rmarkdown Gene Signature Report, Elastic Net Cross Validation
  • 19. Tag.bio Data Product: Domain-driven, harmonized, decentralized application layer. Cross-Functional Data Team: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist: Integrates R, Python, ML/AI as analysis apps that researchers can use. C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions. Diagram labels: Smart API, Data Map, Algorithms, Analysis apps. Maximize the value of your data with domain-driven data products
  • 20. How does it work? Data Product is a source of versioned, immutable, integrated data. Diagram labels: Data Mapped (JSON, XML, CSV, SQL, Spark, Key-value); Data Map; Smart API; Algorithms; R & Python Plugin; Module; API Client; Analysis (API): Sequence/graph, Comparison, Stats, Clustering, Regression, Prediction, Exploration, Data extraction, Machine Learning. Developer Studio (coder): AI/ML, Notebook integration, Data Scientists. Analysis Platform (domain expert UI): Point and Click analysis, Apps, Domain experts
  • 21. Data Mesh. It’s a paradigm shift to treat data as a product. Data mesh encompasses data products that are oriented around domains & owned by cross-functional data teams. Source: Zhamak Dehghani: Data Mesh: A Paradigm Shift in Data Architecture
  • 22. Pharma: Domain Driven Workloads. Drug Development Process: Disparate data types slow the drug development process. Stages: Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient care. Data: Biomarkers, Omics, Model Organisms, Phase I, Phase II, Phase III, Patient Registries, Phase IV, Regulatory Submissions
  • 23. What Happens When You Apply Data Mesh To Pharma? Drug Development Process: Harmonized, connected data sources accelerate drug development. Stages: Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient care. Data: Biomarkers, Omics, Model Organisms, Phase I, Phase II, Phase III, Patient Registries, Phase IV, Regulatory
  • 24. What Happens When You Turn Data into a Product? Streamlined data analysis process: Months vs. Minutes. Diagram labels: Researchers, Data Scientist, Data Engineer, Data Warehouses, Data Lakes, Siloed Data; Researchers, Analysis Platform, Data Products, Data Mesh
  • 25. What Is Tag.bio? A customizable self service (end-to-end) data mesh platform. 1. Data Product: Domain-driven, harmonized & decentralized application layer. 2. Data Mesh: Distributed data products connected into a data mesh. 3. Analysis Environment: Data analysis environment for researchers & data scientists
  • 26. Tag.bio Data Mesh: Data products deployed in an interoperable data mesh (any cloud). Data mesh enables organizations to: ● Connect data sources without moving data ● Rapidly add new data types ● Connect all data sources to accelerate the drug development cycle. Diagram labels: Data Product (Smart API, Data Map, Algorithms, Analysis apps)
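One way to read "connect data sources without moving data" is that each data product computes a small summary locally and only those summaries are combined. The sketch below illustrates that idea with a pooled mean; it is an assumption about the general pattern, not a description of how Tag.bio implements its mesh. The names `LocalDataProduct`, `summarize`, and `federated_mean`, and the `expression` field, are hypothetical.

```python
from typing import Iterable, List, Tuple


class LocalDataProduct:
    """Hypothetical node in a mesh: raw records never leave this object."""

    def __init__(self, name: str, records: Iterable[dict]) -> None:
        self.name = name
        self._records: List[dict] = list(records)  # stays private to the node

    def summarize(self, field: str) -> Tuple[int, float]:
        """Return only aggregate values (count, sum) for one numeric field."""
        values = [r[field] for r in self._records if field in r]
        return len(values), float(sum(values))


def federated_mean(products: Iterable[LocalDataProduct], field: str) -> float:
    """Combine per-node summaries into a pooled mean without pooling rows."""
    total_n, total_sum = 0, 0.0
    for product in products:
        n, s = product.summarize(field)
        total_n += n
        total_sum += s
    if total_n == 0:
        raise ValueError(f"no values found for field: {field}")
    return total_sum / total_n


mesh = [
    LocalDataProduct("site_a", [{"expression": 1.0}, {"expression": 3.0}]),
    LocalDataProduct("site_b", [{"expression": 5.0}]),
]
print(federated_mean(mesh, "expression"))
```

Only two numbers per node cross the boundary here, which is what makes the approach attractive when the underlying records are sensitive or too large to copy.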
  • 27. Analysis Environment: Data analysis environment to access data mesh & use data products. Analysis Platform for Researchers: Use data products with no-code analysis apps that speak their language. Collaborate with Data Scientists on how apps should work. Developer Studio for Data Scientists: Build data products using a familiar, Jupyter notebook-based setting. Plug them into the Analysis Platform for researchers to use.
  • 28. How Are Organizations Using Tag.bio’s Data Mesh Platform?
  • 29. Top 5 Pharma: Translational Oncology. Harmonized 10+ Clinical Trials. Working towards comprehensive harmonization of all past & future trials
  • 30. Example: Phase-III biomarker analysis. Reference: https://www.nature.com/articles/s41591-020-1044-8 Figure 1b. Demo: https://demo.tag.bio/node/fc-nct02684006-refined/cox_survival_protocol/results?param_reconfig=2784
  • 31. Example: TCGA Pan-Cancer ATLAS - UMAP Expression Clustering Reference: https://pubmed.ncbi.nlm.nih.gov/29628290/
  • 32. Analyzing 1000s of Flow Cytometry Samples: The Jackson Laboratory. Enabling users to analyze samples from various immunocompromised mouse strains with xenografts from human donors
  • 33. Data Products in action at UCSD. HIPAA and California’s Confidentiality of Medical Information Act (CMIA) Compliant environment. https://medschool.ucsd.edu/research/actri/Informatics/Health-Data-Portal/services/Pages/Virtual-Research-Desktop-VRD.aspx https://campuslisa.ucsd.edu/_files/2020%20Campus%20LISA_HC_Data_mesh_.pdf
  • 34. More Examples. Analyzing Phase IV & RWE Data (Top 50 Pharma): Looking at both drug & medication-adherence device clinical trials in relation to schizophrenia. Immunotherapy & Single-Cell Omics (Cell Therapy Biotech): Deploying an array of proprietary & public-domain data products, enabling users to investigate & discover gene expression markers with respect to cell types
  • 35. Showing how our customers fit into Drug Dev lifecycle: Biotech’s Cell Therapy, Transplant; Large Pharma’s Immunology, Oncology, and Neurology; CRO RWE; CRO Omics, IHC, TCR; Basic Research Mouse Models and Other AMCs - UCSD, UCSF; Value based Healthcare and Patient Registries
  • 36. Next Stage Of Data Evolution: 1. Harmonize Data 2. Connect Data Products 3. Accelerate Outputs. Sources: Data Warehouses, Data Lakes, Siloed Data, Flat Files. Map data into data products. FAIR data (findable, accessible, interoperable, reusable). Real-time answers, self-service analysis. Validations, publications, submissions. Saved, shareable, reproducible, full QC. Diagram labels: Data Product (Smart API, Data Map, Algorithms, Analysis apps)
  • 38. Collaboration within an organization. The data mesh connects groups to collaborative analysis resources to form a data driven culture. Groups: Clinical Trials, Population Health, Clinical Decisions, Discovery Biology. Data Mesh: Data Product 1, Data Product 2, Data Product 3, Data Product 4, Data Product 5
  • 39. Different types of data product act together as a functional data mesh. Annotation: e.g. Gene, Variant, Demographic, Identifying data, Proprietary annotation. Domain Specific Analysis (Pan-Cancer TCGA Patient Healthcare). Usage: Full history of all user activity
  • 40. How organizations collaborate via data product. Organization 1: Governed access to selected data products and apps. Organization 2: Governed access to selected data products and apps. Domains: Clinical Research, COVID, Patient Registries, Oncology, Chronic Inflammation, Autoimmunity. Data Mesh: Data Product 1, Data Product 2, Data Product 3, Data Product 4, Data Product 5. Data Products (in cloud account of organization). Collaborator (VPC/Private Link access to data products)
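"Governed access to selected data products" can be sketched as an explicit grant table that is checked before a collaborator opens a data product. The sketch below is illustrative only: `AccessPolicy`, `open_data_product`, and the organization and product names are hypothetical, and the slides do not describe how Tag.bio actually enforces access (they mention only VPC/Private Link connectivity).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessPolicy:
    """Hypothetical grant table: organization name -> allowed data products."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, organization: str, data_product: str) -> None:
        self.grants.setdefault(organization, set()).add(data_product)

    def is_allowed(self, organization: str, data_product: str) -> bool:
        return data_product in self.grants.get(organization, set())


def open_data_product(policy: AccessPolicy, organization: str, data_product: str) -> str:
    """Return a session label if access is granted, otherwise refuse."""
    if not policy.is_allowed(organization, data_product):
        raise PermissionError(f"{organization} has no access to {data_product}")
    return f"{organization} -> {data_product}"


policy = AccessPolicy()
policy.grant("organization_1", "data_product_1")
policy.grant("organization_2", "data_product_3")

print(open_data_product(policy, "organization_1", "data_product_1"))
```

Denying by default (an organization with no entry gets an empty set) keeps the data owner in control of which products each collaborator can reach.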
  • 41. Tag.bio data exchange: Collaboration with Parkinson’s Foundation to provide data products to researchers
  • 43. A two sided data environment to enable real time collaboration. Analysis Platform for Domain Experts: No-code analysis apps that speak your language. Developer Studio for Data Scientists: Familiar Jupyter Notebook-based Developer Studio
  • 44. Integration with Cloud Services: AI/ML
  • 46. Integration with AWS Services: AI/ML
  • 47. FHIR: Integration with Amazon HealthLake
  • 49. Demo. Data Portal (domain expert): http://demo.tag.bio/ Developer Studio (data scientist): https://jupyter-aws-demo.tag.bio/
  • 50. How To Get Started?
  • 51. Tag.bio Resource Center: Knowledge base, Training & Tutorials (to build apps and data products) https://tag.bio/company/contact-us/
  • 53. How can (NIH funded) researchers access Tag.bio? https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
  • 54. How can NIH ICOs access Tag.bio? https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
  • 55. Tag.bio is a “datamesh in a box”: Data Products, Data Mesh, Self-Service Platform, Apps. Real time questions to answers. Connect proprietary and public data. Fully versioned and reproducible. Cross study comparison. Pull in annotation automatically. Auto-deployed, tested and scalable. UIs for coders and clickers. Bring the analysis to the data. Collaboration between users, groups, and organizations. Thank You! Questions?