Data Patterns
Life Sciences / Healthcare
Chris Dwan (chris@dwan.org)
https://dwan.org
Take-home messages
Data challenges are large and growing
– Not just volume
– Also variety, velocity, quality
There is no single perfect solution
– Requirements are diverse
– Real world solutions will be hybrid
Metadata management is a huge challenge
– Even the basics are beyond most small organizations
– We need federated systems to transform medicine
Geek Cred: My First Petabyte, 2008
The evolution of data transfer …
Genomic Data Production in Context: Genomic data production @ Broad
I did research computing at Broad from 2014 to 2017.
My first Exabyte: 2014
Data: The new oil*
Database: Structure, queries
Data Warehouse: All the data in one place. Limited integration.
Data Mart: Serve up warehoused data to users (Shiny counts)
Big Data: Volume, Variety, Velocity
Data Lake: Data warehouse, but designed for in-situ analytics
Data Ocean: A data lake, for the cromulently embiggened!
Data Commons: When the benefits of sharing data outweigh the competitive instinct to hoard it
Data Biosphere: A data commons, but for the cool kids
An immature tyrant flycatcher. Needs a data mart, because it doesn’t know R or Linux yet.
Hype-o-meter Impact-o-meter
Primary Data Production
Data are produced on instruments (sequencer / mass spec / …) …
… transformed and distilled (analysis systems, high-performance storage) …
… delivered to downstream processes (customer-facing storage) …
… and archived for various purposes (FDA, HIPAA, intellectual property, …) on durable, cost-effective storage.
I recommend an ‘archive first’ approach.
EMR, ELN, LIMS, LIS
Metadata management is still a massive challenge
Lab_Sample_tracker.xls
Filename_as_metadata_for_eric_v2
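A filename like the one above carries metadata that no system can query or validate. The sketch below is one minimal alternative, under stated assumptions. It writes the same facts explicitly to a JSON "sidecar" file stored next to each data file, and rejects records with missing fields instead of guessing. The `SampleMetadata` fields and the `.meta.json` suffix are hypothetical illustrations, not a standard.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class SampleMetadata:
    # Hypothetical minimal schema; a real deployment would use an agreed vocabulary.
    sample_id: str
    owner: str
    assay: str
    version: int

REQUIRED = ("sample_id", "owner", "assay", "version")

def sidecar_path(data_file: Path) -> Path:
    """Hypothetical convention: metadata lives in <data_file>.meta.json."""
    return data_file.with_name(data_file.name + ".meta.json")

def write_sidecar(data_file: Path, meta: SampleMetadata) -> Path:
    """Record metadata explicitly instead of encoding it in the filename."""
    path = sidecar_path(data_file)
    path.write_text(json.dumps(asdict(meta), indent=2))
    return path

def read_sidecar(data_file: Path) -> SampleMetadata:
    """Load the sidecar; missing fields raise an error rather than being guessed."""
    path = sidecar_path(data_file)
    record = json.loads(path.read_text())
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"{path}: missing fields {missing}")
    return SampleMetadata(**{key: record[key] for key in REQUIRED})
```

Under this convention, "eric" and "v2" become queryable `owner` and `version` fields rather than fragments of a filename.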
Quality Matters
Ask a computational biologist / data scientist what fraction of their time is spent fighting data quality, formatting, and similar issues.
Multiply that by an entire industry.
They deserve better.
Machine Learning (ML)
Algorithms that optimize and tune themselves based on large amounts of data.
These have been around for a very long time (KNN and linear regression are totally ML).
What changed: algorithm innovations (deep neural nets), plus ubiquitous big data, plus improvements in computing, storage, networking, and software.
Killer apps everywhere in image recognition, natural language processing, clustering, and categorization.
Hype-o-meter Impact-o-meter
A ‘swan pink yellow’ columbine flower. Identifying objects in images is machine work now.
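The claim that KNN and linear regression "are totally ML" is easy to see in code. In each case, the model's behavior comes entirely from the data it is given. The sketch below is a minimal pure-Python illustration of both textbook techniques, not production code. It shows k-nearest-neighbor classification by majority vote, and a one-variable least-squares line fit. Function names and any example data are hypothetical.

```python
from collections import Counter
from math import dist  # Python 3.8+

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs. "Training" is just storing the data.
    """
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers slope 2.0 and intercept 1.0. Deep neural nets follow the same principle of parameters tuned from data, with vastly more parameters.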
Data for Analytics / ML / AI
A large and growing set of data, from in-house labs, commercial / outsource labs, and public or licensed datasets, is curated …
… and mined for insights by analysts (analysis systems, high-performance storage).
Insights take both short and long paths back into the system.
Durable, cost-effective storage:
• What does “backup” mean, exactly?
• How do we capture provenance without massive duplication?
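One common answer to the provenance question is content addressing. Each blob is stored under a hash of its bytes, so byte-identical copies collapse into one. Provenance records then point at hashes rather than holding their own copies of the data. The sketch below is a toy in-memory version. The `ContentStore` class and its methods are hypothetical, and a real system would persist blobs to durable storage. Note that this only deduplicates identical bytes; different representations of the same data (for example FASTQ versus BAM) still hash differently.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store with provenance records (hypothetical, in memory).

    Blobs are keyed by SHA-256 digest, so storing identical bytes twice keeps one copy.
    Provenance entries reference digests, never copies of the data.
    """

    def __init__(self):
        self.blobs = {}       # digest -> bytes
        self.provenance = []  # list of derivation-step records

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no-op if these bytes are already stored
        return digest

    def record(self, step: str, inputs: list, output: str, params: dict) -> None:
        """Log how `output` was derived from `inputs` without duplicating any data."""
        for digest in [*inputs, output]:
            if digest not in self.blobs:
                raise KeyError(f"unknown digest {digest}")
        self.provenance.append(
            {"step": step, "inputs": list(inputs), "output": output, "params": params}
        )

    def lineage(self, digest: str) -> list:
        """Walk provenance backwards from `digest` to its original inputs."""
        chain, frontier, seen = [], [digest], set()
        while frontier:
            current = frontier.pop()
            if current in seen:  # guard against revisiting a digest
                continue
            seen.add(current)
            for rec in self.provenance:
                if rec["output"] == current:
                    chain.append(rec)
                    frontier.extend(rec["inputs"])
        return chain
```

Under this design, "backup" reduces to replicating the blob set and the provenance log, rather than copying every derived file alongside its inputs.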
Artificial Intelligence (AI)
Distinguished (for me) by autonomous behavior and clever-looking behavior in the face of unanticipated situations.
No requirement that “intelligent” mean “like a human.”
Machine learning algorithms are a great (but not the only) way to create AI systems.
Beware “bread machine AI.”
Hype-o-meter Impact-o-meter
Getting there!
My cat shows surprising intelligence despite having a brain the size of a walnut.
Incredible opportunities here, and rapidly developing data silos
The Clinical Data Ecosystem
There is an incredible wealth of data available to support both clinical care and research.
Unfortunately, it is carved up and isolated. There are both good and bad reasons for this.
Sources: patient journals, consumer products, longitudinal data from other providers, electronic medical records, diagnostic imaging, clinical notes, hospital telemetry, primary lab data
Possibility of a self-normal (N of 1) over time
Natural language processing has strong potential (clinical notes)
Innovations in the basics of clinical observation
Pressure to avoid incidental findings, prevent bias
Personal Data Impacts Behavior
I use a commercial service that combines labwork with wearable data.
They provide insights and coaching.
I have, personally, found this transformational in how I approach my health.
Why are we here?
Social Mission
• Improved health outcomes
• Quality-adjusted life-years
• Increased therapeutic effectiveness
• Reduced barriers to access
Scientific / Business Goals
• Publications / Patents / Druggable leads
• Accelerated innovation cycle
• Reduced time to market
Technology / Infrastructure
• Speeds & Feeds
• Improved performance on benchmarks
• Lower cost per unit
• Infrastructure agility
Maslow’s Hierarchy of Needs (top to bottom)
Creativity, purpose
Confidence, achievement
Friendship, connectedness, belonging
Safety, physical and economic stability
Air, food, shelter, sleep
Wireless Internet, fully charged battery
If you lack this, you don’t get to engage here.
IT Hierarchy of Needs (top to bottom)
“Thought Partner”
Automation and compliance
Productivity and security, applications, disaster preparedness
Files, formats, naming conventions, access controls
Phones, projectors, Internet, email, chat
Power, building access, laptops, Wi-Fi, identity
If you lack this, you don’t get to engage here.
Data Visibility Saves Money
Private data holdings, public data, backups, … and a private copy of public data ($$ !!)
Lack of data visibility leads to increased costs and engineering challenges.
It is depressingly common to see multiple representations of the same data (for example BCL, FASTQ, and BAM files from the same sequencing run), all being archived together.
This is also a metadata challenge.
Challenge Architecture: The data DMZ
• An architecture to support data creation, delivery, and use
• … for seamless collaboration between organizations …
• … without sacrificing security, appropriate usage, or privacy …
• … and that delivers on the potential of modern analytic capabilities.
Blockchain
“The clown car of our industry in 2018”
• Distributed ledger: trustworthy data / records without a central authority.
• Self-executing contracts: shared, trustworthy code to operate on that data.
• Initial Coin Offerings: a massively accelerated (and deregulated) way to set monetary value on a data ecosystem.
Amazing possibilities in permission / consent management.
When I make snarky comments on LinkedIn, people ask if they can invest.
Hype-o-meter Impact-o-meter
The angel weeps because there are some really compelling use cases for blockchain, but the hype is deafening.
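The distributed-ledger bullet rests on one simple mechanism: each record carries a hash of the record before it, so rewriting history becomes detectable. The sketch below shows only that hash-chaining idea, applied to the consent-management use case mentioned above. It is a single-writer, in-memory log with no consensus protocol, so it is not a blockchain. The `ConsentLog` class, its field names, and the "latest event wins" rule are hypothetical choices for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

@dataclass
class ConsentLog:
    """Append-only, hash-chained consent log (single writer, no consensus).

    Each entry stores the previous entry's hash, so editing any past entry
    breaks verification. This gives tamper evidence, not a distributed ledger.
    """
    entries: list = field(default_factory=list)

    @staticmethod
    def _hash(body: dict) -> str:
        # sort_keys yields a canonical serialization, so the hash is reproducible
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, patient_id: str, purpose: str, granted: bool) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"patient_id": patient_id, "purpose": purpose,
                "granted": granted, "prev": prev}
        entry = {**body, "hash": self._hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; any edited entry makes this False."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def current_consent(self, patient_id: str, purpose: str) -> bool:
        """Latest matching event wins; with no events, consent is not assumed."""
        for entry in reversed(self.entries):
            if entry["patient_id"] == patient_id and entry["purpose"] == purpose:
                return entry["granted"]
        return False
```

What a real distributed ledger adds on top of this is replication of the log across parties who do not trust each other, plus agreement on which entries are valid.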
Take-home messages
Data challenges are large and growing
– Not just volume
– Also variety, velocity, quality
There is no single perfect solution
– Requirements are diverse
– Real world solutions will be hybrid
Metadata management is a huge challenge
– Even the basics are beyond most small organizations
– We need federated systems in order to transform
medicine.
Questions?
chris@dwan.org

SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Data Patterns and Challenges in Life Sciences

  • 1. Data Patterns Life Sciences / Healthcare Chris Dwan (chris@dwan.org) https://dwan.org
  • 2. Take-home messages
    Data challenges are large and growing
      – Not just volume
      – Also variety, velocity, quality
    There is no single perfect solution
      – Requirements are diverse
      – Real-world solutions will be hybrid
    Metadata management is a huge challenge
      – Even the basics are beyond most small organizations
      – We need federated systems to transform medicine
  • 3.
  • 4. Geek Cred: My First Petabyte, 2008
  • 6.
  • 7. The evolution of data transfer …
  • 8. Genomic Data Production in Context: genomic data production @ Broad
  • 9. Genomic Data Production in Context: genomic data production @ Broad. I did research computing at Broad from 2014 to 2017.
  • 10. Geek Cred: My First Exabyte, 2014
  • 11. Data: The new oil*
    – Data Base: structure, queries
    – Data Warehouse: all the data in one place; limited integration
    – Data Mart: serves up warehoused data to users (Shiny counts)
    – Big Data: volume, variety, velocity
    – Data Lake: a data warehouse, but designed for in-situ analytics
    – Data Ocean: a data lake, for the cromulently embiggened!
    – Data Commons: when the benefits of sharing data outweigh the competitive instinct to hoard it
    – Data Biosphere: a data commons, but for the cool kids
    Hype-o-meter / Impact-o-meter
    (An immature tyrant flycatcher. Needs a data mart, because it doesn’t know R or Linux yet.)
  • 12. Primary Data Production
    – Data are produced on instruments (sequencer / mass spec / …) …
    – … transformed and distilled on analysis systems and high-performance storage …
    – … delivered to downstream processes via customer-facing storage …
  • 13. Primary Data Production
    – Data are produced on instruments (sequencer / mass spec / …) …
    – … transformed and distilled on analysis systems and high-performance storage …
    – … delivered to downstream processes via customer-facing storage …
    – … and archived for various purposes (FDA, HIPAA, intellectual property, …) on durable, cost-effective storage.
  • 14. Primary Data Production
    – Data are produced on instruments (sequencer / mass spec / …) …
    – … transformed and distilled on analysis systems and high-performance storage …
    – … delivered to downstream processes via customer-facing storage …
    – … and archived for various purposes (FDA, HIPAA, intellectual property, …) on durable, cost-effective storage.
    I recommend an ‘archive first’ approach.
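A minimal sketch of what an ‘archive first’ ingest step could look like, assuming raw instrument output lands as files on local disk: checksum the file, copy it to durable storage, verify the copy, and append a manifest entry, all before any analysis touches it. The function names, directory layout, and JSON-lines manifest are hypothetical; a real deployment would more likely target object storage or a tape tier than a local directory.

```python
"""Minimal 'archive first' ingest sketch (hypothetical names and layout)."""
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large instrument outputs never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_first(raw_file: Path, archive_dir: Path, manifest: Path) -> str:
    """Copy a raw file into the archive, verify it, and log it before analysis runs."""
    checksum = sha256_of(raw_file)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / raw_file.name
    shutil.copy2(raw_file, archived)
    # Verify the archived copy before trusting it as the durable record.
    if sha256_of(archived) != checksum:
        raise OSError(f"checksum mismatch while archiving {raw_file}")
    # Append-only manifest: one JSON object per line.
    with manifest.open("a") as log:
        log.write(json.dumps({"file": raw_file.name, "sha256": checksum}) + "\n")
    return checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        run = root / "run_001.bcl"  # hypothetical instrument output
        run.write_bytes(b"raw instrument bytes")
        digest = archive_first(run, root / "archive", root / "manifest.jsonl")
        print(digest == hashlib.sha256(b"raw instrument bytes").hexdigest())  # True
        # Only now would the file be handed to the analysis systems.
```

Recording the checksum at ingest also gives every downstream step a stable identifier to cite, which helps with the provenance questions that come up later.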
  • 15. Primary Data Production
    – Data are produced on instruments (sequencer / mass spec / …) …
    – … transformed and distilled on analysis systems and high-performance storage …
    – … delivered to downstream processes via customer-facing storage …
    – … and archived for various purposes (FDA, HIPAA, intellectual property, …) on durable, cost-effective storage.
    I recommend an ‘archive first’ approach.
    Metadata sources: EMR, ELN, LIMS, LIS. Metadata management is still a massive challenge (Lab_Sample_tracker.xls, Filename_as_metadata_for_eric_v2).
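One small step away from “filename as metadata” is to parse whatever the filename encodes into a structured record once, then keep that record in a sidecar file or catalog. The naming convention, field names, and `SampleRecord` class below are hypothetical illustrations, not a standard schema.

```python
"""Sketch: move metadata out of filenames into a structured sidecar record.

The filename convention and fields below are hypothetical.
"""
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical legacy convention: <sample>_<assay>_v<version>.<ext>
LEGACY_NAME = re.compile(r"^(?P<sample>[A-Za-z0-9]+)_(?P<assay>[A-Za-z0-9]+)_v(?P<version>\d+)\.")


@dataclass
class SampleRecord:
    sample_id: str
    assay: str
    version: int
    source_file: str
    owner: Optional[str] = None  # "for_eric" belongs in a field, not in a filename


def parse_legacy_name(filename: str) -> SampleRecord:
    """Turn a conforming filename into a record; reject anything else loudly."""
    match = LEGACY_NAME.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename}")
    return SampleRecord(
        sample_id=match["sample"],
        assay=match["assay"],
        version=int(match["version"]),
        source_file=filename,
    )


def to_sidecar_json(record: SampleRecord) -> str:
    """Serialize the record so it can live next to the data or in a catalog."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    rec = parse_legacy_name("S0042_wgs_v2.fastq.gz")  # hypothetical file
    rec.owner = "eric"
    print(to_sidecar_json(rec))
```

Files that do not match the convention fail loudly instead of silently producing a half-filled record, which is usually what you want when cleaning up a spreadsheet-tracked collection.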
  • 17. Quality Matters
    Ask a computational biologist or data scientist what fraction of their time is spent fighting data quality, formatting, and similar issues. Multiply that by an entire industry. They deserve better.
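As one concrete example of the formatting fights described above, here is a minimal sanity check for FASTQ files. It assumes the common four-lines-per-record layout and treats only A, C, G, T, and N as valid bases; real files may legitimately use other IUPAC codes or older multi-line layouts, so treat this as a sketch rather than a complete validator.

```python
"""Sketch: catch common FASTQ formatting problems before analysis."""
from typing import Iterable, List


def validate_fastq(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems in 4-line FASTQ records (empty list means OK)."""
    problems: List[str] = []
    records = [line.rstrip("\n") for line in lines]
    if len(records) % 4 != 0:
        problems.append(f"line count {len(records)} is not a multiple of 4 (truncated file?)")
    # Only inspect complete 4-line records.
    for start in range(0, len(records) - len(records) % 4, 4):
        header, seq, plus, qual = records[start:start + 4]
        n = start // 4 + 1  # 1-based record number for error messages
        if not header.startswith("@"):
            problems.append(f"record {n}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {n}: sequence length {len(seq)} != quality length {len(qual)}")
        bad = set(seq.upper()) - set("ACGTN")
        if bad:
            problems.append(f"record {n}: unexpected bases {sorted(bad)}")
    return problems


if __name__ == "__main__":
    sample = ["@read1\n", "ACGT\n", "+\n", "IIII\n", "@read2\n", "ACGX\n", "+\n", "III\n"]
    for problem in validate_fastq(sample):
        print(problem)
```

Running checks like this at ingest, rather than midway through an analysis, is where the time savings come from.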
  • 18. Machine Learning (ML)
    – Algorithms that optimize and tune themselves based on large amounts of data.
    – These have been around for a very long time (k-nearest neighbors (KNN) and linear regression are totally ML).
    – Algorithmic innovations (deep neural nets), plus ubiquitous big data, plus improvements in computing, storage, networking, and software.
    – Killer apps everywhere in image recognition, natural language processing, clustering, and categorization.
    Hype-o-meter / Impact-o-meter
    (A ‘swan pink yellow’ columbine flower. Identifying objects in images is machine work now.)
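To make the point that KNN and linear regression are “totally ML”, here is a plain-Python sketch of both: a k-nearest-neighbors classifier that labels a point by majority vote among its closest training examples, and ordinary least squares for a single-variable line fit. The toy data are made up for illustration; in practice you would likely reach for a library such as scikit-learn.

```python
"""Sketch: two classic ML methods in plain Python (toy data, illustration only)."""
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple


def knn_predict(train: Sequence[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 3) -> str:
    """Label a point by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    points = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
              ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
    print(knn_predict(points, (0.2, 0.1)))      # a
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

Both “learn” their behavior entirely from the data they are given, which is the common thread with today’s deep networks; what changed is scale.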
  • 19. Data for Analytics / ML / AI
    – A large and growing set of data is curated from in-house labs, commercial / outsource labs, and public or licensed datasets …
    – … and mined for insights by analysts, using analysis systems and high-performance storage.
  • 20. Data for Analytics
    – A large and growing set of data is curated from in-house labs, commercial / outsource labs, and public or licensed datasets …
    – … and mined for insights by analysts, using analysis systems and high-performance storage.
    – Insights take both short and long paths back into the system.
  • 21. Data for Analytics
    – A large and growing set of data is curated from in-house labs, commercial / outsource labs, and public or licensed datasets …
    – … and mined for insights by analysts, using analysis systems and high-performance storage.
    – Insights take both short and long paths back into the system.
    – Durable, cost-effective storage:
      • What does “backup” mean, exactly?
      • How do we capture provenance without massive duplication?
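One common way to approach the provenance question is content addressing: store each blob under the hash of its bytes, so identical data is kept once, and record each processing step as links between hashes. The `ContentStore` class below is a hypothetical in-memory sketch of that idea, not a description of any particular system.

```python
"""Sketch: content-addressed storage plus provenance records (hypothetical design)."""
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentStore:
    blobs: Dict[str, bytes] = field(default_factory=dict)
    provenance: List[dict] = field(default_factory=list)

    def put(self, data: bytes) -> str:
        """Store bytes under their SHA-256; identical content is stored only once."""
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)
        return key

    def record_step(self, tool: str, inputs: List[str], output: bytes) -> str:
        """Store an output and note which tool produced it from which input hashes."""
        out_key = self.put(output)
        self.provenance.append({"tool": tool, "inputs": list(inputs), "output": out_key})
        return out_key

    def lineage(self, key: str) -> List[str]:
        """Walk provenance backwards from an output to every upstream hash."""
        upstream: List[str] = []
        frontier = [key]
        while frontier:
            current = frontier.pop()
            for step in self.provenance:
                if step["output"] == current:
                    for parent in step["inputs"]:
                        if parent not in upstream:  # also guards against cycles
                            upstream.append(parent)
                            frontier.append(parent)
        return upstream


if __name__ == "__main__":
    store = ContentStore()
    raw = store.put(b"ACGT...")
    store.put(b"ACGT...")  # same bytes again: no second copy is stored
    aligned = store.record_step("aligner", [raw], b"aligned reads")  # hypothetical tools
    calls = store.record_step("caller", [aligned], b"variants")
    print(len(store.blobs), len(store.lineage(calls)))
```

Under this scheme, a “backup” can be defined as making sure every hash referenced by the provenance log also exists in a second location, and re-deriving the same output adds a provenance record without adding a second copy of the bytes.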
  • 22. Artificial Intelligence (AI)
    – Distinguished (for me) by autonomous behavior, and by clever-looking behavior in the face of unanticipated situations.
    – No requirement that “intelligent” mean “like a human.”
    – Machine learning algorithms are a great (but not the only) way to create AI systems.
    – Beware “bread machine AI.”
    Hype-o-meter / Impact-o-meter: Getting there!
    (My cat shows surprising intelligence despite having a brain the size of a walnut.)
  • 23. Artificial Intelligence (AI)
    – Distinguished (for me) by autonomous behavior, and by clever-looking behavior in the face of unanticipated situations.
    – No requirement that intelligence be human-style.
    – Machine learning algorithms are a great (but not the only) way to build AI systems.
    – Beware “bread machine AI.”
    Hype-o-meter / Impact-o-meter: Getting there!
    (My cat shows surprising intelligence despite having a brain the size of a walnut.)
  • 24. The Clinical Data Ecosystem
    There is an incredible wealth of data available to support both clinical care and research:
    – Patient Journals
    – Consumer products
    – Longitudinal Data from other providers …
    – Electronic Medical Records
    – Diagnostic Imaging
    – Clinical Notes (natural language processing has strong potential)
    – Hospital Telemetry
    – Primary Lab Data
    Unfortunately, it is carved up and isolated. There are both good and bad reasons for this.
    Incredible opportunities here, and rapidly developing data silos:
    – Possibility of a self-normal (N of 1) over time
    – Innovations in the basics of clinical observation
    – Pressure to avoid incidental findings / prevent bias
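The “self-normal (N of 1)” idea can be illustrated by scoring a new measurement against a person’s own history instead of a population reference range. This is a minimal sketch under stated assumptions: the history is a handful of comparable measurements, a simple mean and sample standard deviation describe it well enough, and the example values are made up. It is not a clinical method.

```python
"""Sketch: score a new value against a person's own history (N of 1).

Illustration only; the values used below are made up and this is not a clinical method.
"""
from statistics import mean, stdev
from typing import Sequence


def personal_z_score(history: Sequence[float], new_value: float) -> float:
    """How many of this person's own standard deviations the new value sits from their mean."""
    if len(history) < 2:
        raise ValueError("need at least two prior measurements to estimate a personal baseline")
    spread = stdev(history)  # sample standard deviation of the person's own history
    if spread == 0:
        raise ValueError("history has no variation; z-score is undefined")
    return (new_value - mean(history)) / spread


if __name__ == "__main__":
    history = [90, 92, 88, 91, 89]  # made-up longitudinal measurements
    print(round(personal_z_score(history, 99), 2))
```

With this made-up history, a new value of 99 scores roughly 5.7 personal standard deviations above the individual’s mean, even though the same number could look unremarkable against a population-wide range.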
  • 25. Personal Data Impacts Behavior
    I use a commercial service that combines labwork with wearable data. They provide insights and coaching. I have personally found this transformational in how I approach my health.
  • 30. Why are we here?
    Social Mission
      • Improved health outcomes
      • Quality-adjusted life-years
      • Increased therapeutic effectiveness
      • Reduced barriers to access
    Scientific / Business Goals
      • Publications / Patents / Druggable leads
      • Accelerated innovation cycle
      • Reduced time to market
    Technology / Infrastructure
      • Speeds & Feeds
      • Improved performance on benchmarks
      • Lower cost per unit
      • Infrastructure agility
  • 31. Maslow’s Hierarchy of Needs
      • Creativity, purpose
      • Confidence, achievement
      • Friendship, connectedness, belonging
      • Safety, physical and economic stability
      • Air, food, shelter, sleep
    If you lack this, you don’t get to engage here.
  • 32. Maslow’s Hierarchy of Needs
      • Creativity, purpose
      • Confidence, achievement
      • Friendship, connectedness, belonging
      • Safety, physical and economic stability
      • Air, food, shelter, sleep
      • Wireless Internet, fully charged battery
    If you lack this, you don’t get to engage here.
  • 33. IT Hierarchy of Needs
      • “Thought Partner”
      • Automation and compliance
      • Productivity and security: applications, disaster preparedness
      • Files, formats, naming conventions, access controls
      • Phones, projectors, Internet, email, chat
      • Power, building access, laptops, Wi-Fi, identity
    If you lack this, you don’t get to engage here.
  • 34. Data Visibility Saves Money
    Private data holdings … public data … backups … and a private copy of public data ($$ !!).
    Lack of data visibility leads to increased costs and engineering challenges. It is depressingly common to see multiple representations of the same data (BCL, FASTQ, BAM), all being archived together.
    This is also a metadata challenge.
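Some of this cost can be found mechanically. The sketch below groups files under several storage roots by SHA-256 to surface byte-identical copies, such as a private copy of a public dataset. It will not notice that a BCL, a FASTQ, and a BAM describe the same run, because those are different encodings; linking them requires metadata. The function name and directory names are hypothetical.

```python
"""Sketch: find byte-identical copies of the same data across storage locations."""
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def find_duplicates(roots: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files under the given roots by SHA-256; keep only groups with 2+ copies."""
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for root in roots:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                digest = hashlib.sha256()
                with path.open("rb") as handle:
                    # Read in 1 MiB chunks so large files are not loaded whole.
                    for chunk in iter(lambda: handle.read(1 << 20), b""):
                        digest.update(chunk)
                by_hash[digest.hexdigest()].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        private = Path(tmp) / "private_holdings"  # hypothetical locations
        public_copy = Path(tmp) / "public_mirror"
        private.mkdir()
        public_copy.mkdir()
        (private / "sample.fastq").write_bytes(b"@r1\nACGT\n+\nIIII\n")
        (public_copy / "sample_copy.fastq").write_bytes(b"@r1\nACGT\n+\nIIII\n")
        for digest, paths in find_duplicates([private, public_copy]).items():
            print(digest[:12], [p.name for p in paths])
```

On large holdings you would typically group by file size first and only hash files whose sizes collide, since hashing petabytes is itself expensive.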
  • 35. Challenge Architecture: The Data DMZ
    • An architecture to support data creation, delivery, and use …
    • … for seamless collaboration between organizations …
    • … without sacrificing security, appropriate usage, or privacy …
    • … and that delivers on the potential of modern analytic capabilities.
  • 36. Blockchain: “The clown car of our industry in 2018”
    • Distributed ledger: trustworthy data / records without a central authority.
    • Self-executing contracts: shared, trustworthy code to operate on that data.
    • Initial Coin Offerings: a massively accelerated (and deregulated) way to set monetary value on a data ecosystem.
    Amazing possibilities in permission / consent management.
    When I make snarky comments on LinkedIn, people ask if they can invest.
    Hype-o-meter / Impact-o-meter
    (The angel weeps because there are some really compelling use cases for blockchain, but the hype is deafening.)
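The distributed-ledger bullet rests on a simple mechanism: each record carries a hash that covers the previous record’s hash, so editing history is detectable. The sketch below shows only that hash-chaining idea, applied to hypothetical consent events; it has no network, no consensus protocol, and no smart contracts, so it is not a blockchain in the full sense.

```python
"""Sketch: a tamper-evident, hash-chained log of consent events.

Shows only the hash-linking idea behind a distributed ledger: no network,
no consensus protocol, no smart-contract execution.
"""
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def _block_hash(index: int, prev_hash: str, payload: dict) -> str:
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: List[dict], payload: dict) -> dict:
    """Add a block whose hash covers the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    block = {"index": index, "prev": prev_hash, "payload": payload,
             "hash": _block_hash(index, prev_hash, payload)}
    chain.append(block)
    return block


def verify(chain: List[dict]) -> bool:
    """Recompute every hash; editing any earlier block breaks the chain."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != _block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger: List[dict] = []
    append(ledger, {"patient": "P001", "study": "S42", "consent": "granted"})  # hypothetical IDs
    append(ledger, {"patient": "P001", "study": "S42", "consent": "withdrawn"})
    print(verify(ledger))  # True
    ledger[0]["payload"]["consent"] = "granted-forever"  # tamper with history
    print(verify(ledger))  # False
```

Tamper evidence is the easy part; the hard parts of a real ledger (who may append, and how independent parties agree on one history) are exactly what this sketch leaves out.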
  • 37. Take-home messages
    Data challenges are large and growing
      – Not just volume
      – Also variety, velocity, quality
    There is no single perfect solution
      – Requirements are diverse
      – Real-world solutions will be hybrid
    Metadata management is a huge challenge
      – Even the basics are beyond most small organizations
      – We need federated systems in order to transform medicine.