SlideShare a Scribd company logo
data - science - life
Considerations & Challenges in
Building an End-to-End
Microbiome Workflow
data - science - life
Craig earned his Ph.D. in Microbology at the University of Warwick, he then
completed his Postdoctoral research at the John Innes Centre before joining EBI
and developing their bioinformatics capability. Dr McAnulla has extensive
experience in developing large scale pipelines for analyzing microbiome
datasets. His skills cover both software and biology and he often provides deep
insights to industry experts on these topics.
Presenters
Raminderpal earned his PhD in the semiconductor industry where he spent a
decade building advanced technologies for semiconductor modelling and
data mining. In 2012, Dr Singh took the helm as the Business Development
Executive for IBM Research’s genomics program, where he developed and led
the execution of the go-to-market and business plan for the IBM Watson
Genomics program. In addition to many published papers, Dr Singh has two
books and twelve patents.
Dr Craig McAnulla
Senior Consultant Bioinformatics at Eagle Genomics
Dr Raminderpal Singh
Vice President Business Development at Eagle Genomics
Please Tweet your questions
@eaglegen or enter them
into GoTo Webinar
Agenda
1. Considerations
2. Challenges
3. Industrialized Microbiome Workflow & Collaboration
Environment
4. Deployment of a Collaboration System
data - science - life
data - science - life
Considerations for successful microbiome data management and analysis:
1. Intrinsic factors
2. The relatively early maturity of the field
3. Calculation of business value
1. Considerations
data - science - life
Intrinsic
• Microbiomes are complex and diverse biological systems
• Many components – organisms, genes, pathways, metabolites etc
• More DNA/genes than human genome
• Taxonomic diversity
• Many sources of error
• Appropriate study design and metadata collection are crucial.
1. Considerations - Intrinsic
data - science - life
Maturity
• Changing sequencing technologies
• Move from amplicon-based to shotgun
• Rapidly increasing data volumes
• Limited reference data
• Fast-moving software development
• Future data re-analysis with improved methods must be considered.
1. Considerations - Maturity
data - science - life
1. Considerations – Business value
Business value
• Commercial exploitation of the microbiome is in its infancy, and still at an exploratory phase.
• Many research/findings remain controversial and some early claims on the gut microbiome have not
been replicated.
• The true potential of microbiome data is still to be tapped, and some novel experimental designs are
being developed that result in maximum value being generated.
data - science - life
The analysis-at-scale of microbiomics data faces several challenges:
• Metadata management
• Compute resources necessary for storage and analysis
• Statistical techniques
Metadata Management
• Crucial to analysis
• Often done poorly
• Needs to encompass all relevant information
• Check for batch effects
2. Challenges
data - science - life
2. Challenges
Data Storage and Compute Resources
• Shotgun metagenomics example study - 1.7 billion paired-end Illumina reads, 266.5 Gb.
• Bigger data volume than the human genome at 30x coverage.
• Study size is increasing
How to store this?
• Analysis selection critical
• Analyses MUST work with these data volumes
• eg Kraken, Humann2
• Compute resources
• Eg Kraken, multicore machine > 100Gigs
• Solution – the cloud
• AWS, Azure
data - science - life
Statistical Techniques
• Data normalization
• rarefaction, normalization, log
transformation?
• Statistical techniques
• parametric, nonparametric?
• Confusion
• No correct answer
• Depends on the data
• function vs taxonomy
• shotgun vs amplicon
2. Challenges
data - science - life
• Eagle Genomics leads the industry in
configuring and deploying best-in-class
microbiome workflow solutions.
• These run large-scale parallelized
pipelines cost-efficiently, with seamless
connectivity between the source data
and the scientists.
3. Industrialized Microbiome
Workflow & Collaboration
Environment
www.eaglegenomics.com
“Unilever’s digital data program now
processes genetic sequences twenty
times faster - without incurring higher
compute costs. In addition, its robust
architecture supports ten times as many
scientists, all working simultaneously.”
Pete Keeley, eScience Technical Lead - R&D IT
at Unilever
data - science - life
organization
3. Industrialized Microbiome
Workflow & Collaboration
Environment
data - science - life
MICROBIOME
DATA FILES
(amplicon or shotgun)
METADATA: eaglecatalog (implements security model)
QUERY/
FILTERING:
eaglecatalog
DATA
SHARING
CRO’s
DEPOSIT
DATA
ELN
EXPERIMENT
METADATA
INTEGRATION
WITH OMICS
(RNASeq, proteomics)
FEDERATED DATA &
RESULTS
INTEGRATION
TAXONOMY/
FUNCTION
ANALYSIS
REPORTS
R,QIIME
POPULATION
COHORTS
VISUALIZATION
CLOUD STORAGE
eaglecatalog
COLLABORATIVE
ENVIRONMENT
FOR PARTNERS
FEEDBACK TO
EXPERIMENTAL
DESIGN
COST OF
EXPERIMENT
DATA COLLABORATION ARCHITECTURE
Solving the metadata problem
data - science - life
4. Deployment of a Collaboration
System
Eagle Genomics Proposal
Accelerating metadata adoption and scale up
• Offering a new rapid deployment for microbiome R&D organizations looking to
cost-efficiently build world-class integrated workflows.
• Set up of an initial collaboration system for your team.
• 1 to 4 weeks
• Enabling self-load your data to the catalog – leading to a standardized and
streamlined workflow, which is ready for future scale out
• Managing internal data, partner data, or public data
data - science - life
Current Situation Desired State
Negative Consequences
of Non-action
Positive Business Impact,
using Eagle software
Data in silos, not reused
Easily searchable, discoverable,
accessible and intuitive portal by
researchers to all data and results
Significant researcher time spent in
“data wrangling”. Breakage
pending, based on projected
bottleneck
Improved time and ability to insight
through radically improved productivity
of researchers
No correlation between
experiments
Quick and easy exploratory analyses
across experiments and data
Missed opportunities for insights.
Failure to leverage data as an asset
Faster identification and generation of
successful product innovation and
insight, leading to reduced time to
market
No unified view of data
Data and analysis results stored in a
unified catalog, with standardized &
repeatable access using best-in-
class technology
Data (internal, external) and the
associated learning not available or
exploited as necessary
Competitive advantage through
cutting edge research, driving
innovation in the product pipeline
Lack of governance and data
provenance
Secure authenticated access with
relevant data provenance of data
sets; simple data sharing with
vendors/collaborators
Costly, inefficient and limited data
sharing with vendors; risk of
inappropriate data sharing
Efficient compliant, documented,
evidenced, and enforced data sharing
process; effective (time & cost)
collaboration between vendors and
internal teams
Data is in-effect archived
Catalog integrated in the scientific
workflow, used continuously and
keeping pace with innovation in the
industry
Missed insights; silo-ed research
activities, with high barriers to
collaboration
Systematic data sharing and re-use,
driving faster unique cross-functional
insights and reducing waste in spend
BENEFITS
For	more	info,	contact:
Raminderpal Singh
raminderpal.singh@eaglegenomics.com
In Summary,
Considerations & Challenges in
Building an End-to-End Microbiome
Workflow
data - science - life

More Related Content

What's hot

Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance
 
BigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALBigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALJohn Koch
 
X team 2 - presentation
X team 2 - presentationX team 2 - presentation
X team 2 - presentation
Rayna Harris
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
Pistoia Alliance
 
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew LittHealthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
D3 Consutling
 
Fair by design
Fair by designFair by design
Fair by design
Pistoia Alliance
 
Practical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial IntelligencePractical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial Intelligence
Al Dossetter
 
Pharma data analytics
Pharma data analyticsPharma data analytics
Pharma data analytics
Axon Lawyers
 
Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015Dana Caulder
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
Christopher Hart
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
Paul Agapow
 
EY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returnsEY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returns
Thomas Wilckens
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutJosef Scheiber
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
Pattern diagnostics 2015
Pattern diagnostics 2015Pattern diagnostics 2015
Pattern diagnostics 2015
Thomas Wilckens
 
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank NothaftThe Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
Databricks
 
Using the CDD Vault for MM4TB
Using the CDD Vault for MM4TBUsing the CDD Vault for MM4TB
Using the CDD Vault for MM4TB
Sean Ekins
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategyAnton Yuryev
 
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use CasesFrom Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
Neo4j
 
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Covance
 

What's hot (20)

Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
BigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALBigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINAL
 
X team 2 - presentation
X team 2 - presentationX team 2 - presentation
X team 2 - presentation
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew LittHealthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
 
Fair by design
Fair by designFair by design
Fair by design
 
Practical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial IntelligencePractical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial Intelligence
 
Pharma data analytics
Pharma data analyticsPharma data analytics
Pharma data analytics
 
Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
EY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returnsEY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returns
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in Noordwijkerhout
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 
Pattern diagnostics 2015
Pattern diagnostics 2015Pattern diagnostics 2015
Pattern diagnostics 2015
 
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank NothaftThe Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
 
Using the CDD Vault for MM4TB
Using the CDD Vault for MM4TBUsing the CDD Vault for MM4TB
Using the CDD Vault for MM4TB
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategy
 
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use CasesFrom Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
 
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
 

Similar to Considerations and challenges in building an end to-end microbiome workflow

Expert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with UnileverExpert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with Unilever
Eagle Genomics
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
Chris Dwan
 
FocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences researchFocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences research
FOCALCXM
 
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Ilkay Altintas, Ph.D.
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
C. Tobin Magle
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
Denodo
 
A Survey on Big Data Analytics
A Survey on Big Data AnalyticsA Survey on Big Data Analytics
A Survey on Big Data Analytics
BHARATH KUMAR
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)Matt Barnes
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
Vaticle
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
Toronto-Oracle-Users-Group
 
How Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data ManagementHow Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data Management
Agaram Technologies
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia Alliance
 
Healthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealthcare Analytics Adoption Model
Healthcare Analytics Adoption Model
Health Catalyst
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
Eduserv
 
Big data
Big dataBig data
Big data
Palash Jain
 
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
Ari Berman
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
Bonnie Hurwitz
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Amazon Web Services
 
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
PerkinElmer Informatics
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 

Similar to Considerations and challenges in building an end to-end microbiome workflow (20)

Expert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with UnileverExpert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with Unilever
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
FocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences researchFocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences research
 
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
 
A Survey on Big Data Analytics
A Survey on Big Data AnalyticsA Survey on Big Data Analytics
A Survey on Big Data Analytics
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
How Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data ManagementHow Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data Management
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00
 
Healthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealthcare Analytics Adoption Model
Healthcare Analytics Adoption Model
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Big data
Big dataBig data
Big data
 
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
 
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 

Recently uploaded

一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 

Recently uploaded (20)

一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 

Considerations and challenges in building an end to-end microbiome workflow

  • 1. data - science - life Considerations & Challenges in Building an End-to-End Microbiome Workflow
  • 2. data - science - life Craig earned his Ph.D. in Microbology at the University of Warwick, he then completed his Postdoctoral research at the John Innes Centre before joining EBI and developing their bioinformatics capability. Dr McAnulla has extensive experience in developing large scale pipelines for analyzing microbiome datasets. His skills cover both software and biology and he often provides deep insights to industry experts on these topics. Presenters Raminderpal earned his PhD in the semiconductor industry where he spent a decade building advanced technologies for semiconductor modelling and data mining. In 2012, Dr Singh took the helm as the Business Development Executive for IBM Research’s genomics program, where he developed and led the execution of the go-to-market and business plan for the IBM Watson Genomics program. In addition to many published papers, Dr Singh has two books and twelve patents. Dr Craig McAnulla Senior Consultant Bioinformatics at Eagle Genomics Dr Raminderpal Singh Vice President Business Development at Eagle Genomics Please Tweet your questions @eaglegen or enter them into GoTo Webinar
  • 3. Agenda 1. Considerations 2. Challenges 3. Industrialized Microbiome Workflow & Collaboration Environment 4. Deployment of a Collaboration System data - science - life
  • 4. data - science - life Considerations for successful microbiome data management and analysis: 1. Intrinsic factors 2. The relatively early maturity of the field 3. Calculation of business value 1. Considerations
  • 5. data - science - life Intrinsic • Microbiomes are complex and diverse biological systems • Many components – organisms, genes, pathways, metabolites etc • More DNA/genes than human genome • Taxonomic diversity • Many sources of error • Appropriate study design and metadata collection are crucial. 1. Considerations - Intrinsic
  • 6. data - science - life Maturity • Changing sequencing technologies • Move from amplicon-based to shotgun • Rapidly increasing data volumes • Limited reference data • Fast-moving software development • Future data re-analysis with improved methods must be considered. 1. Considerations - Maturity
  • 7. data - science - life 1. Considerations – Business value Business value • Commercial exploitation of the microbiome is in its infancy, and still at an exploratory phase. • Many research/findings remain controversial and some early claims on the gut microbiome have not been replicated. • The true potential of microbiome data is still to be tapped, and some novel experimental designs are being developed that result in maximum value being generated.
  • 8. data - science - life The analysis-at-scale of microbiomics data faces several challenges: • Metadata management • Compute resources necessary for storage and analysis • Statistical techniques Metadata Management • Crucial to analysis • Often done poorly • Needs to encompass all relevant information • Check for batch effects 2. Challenges
  • 9. data - science - life 2. Challenges Data Storage and Compute Resources • Shotgun metagenomics example study - 1.7 billion paired-end Illumina reads, 266.5 Gb. • Bigger data volume than the human genome at 30x coverage. • Study size is increasing How to store this? • Analysis selection critical • Analyses MUST work with these data volumes • eg Kraken, Humann2 • Compute resources • Eg Kraken, multicore machine > 100Gigs • Solution – the cloud • AWS, Azure
  • 10. data - science - life Statistical Techniques • Data normalization • rarefaction, normalization, log transformation? • Statistical techniques • parametric, nonparametric? • Confusion • No correct answer • Depends on the data • function vs taxonomy • shotgun vs amplicon 2. Challenges
  • 11. data - science - life • Eagle Genomics leads the industry in configuring and deploying best-in-class microbiome workflow solutions. • These run large-scale parallelized pipelines cost-efficiently, with seamless connectivity between the source data and the scientists. 3. Industrialized Microbiome Workflow & Collaboration Environment www.eaglegenomics.com “Unilever’s digital data program now processes genetic sequences twenty times faster - without incurring higher compute costs. In addition, its robust architecture supports ten times as many scientists, all working simultaneously.” Pete Keeley, eScience Technical Lead - R&D IT at Unilever
  • 12. data - science - life organization 3. Industrialized Microbiome Workflow & Collaboration Environment
  • 13. data - science - life MICROBIOME DATA FILES (amplicon or shotgun) METADATA: eaglecatalog (implements security model) QUERY/ FILTERING: eaglecatalog DATA SHARING CRO’s DEPOSIT DATA ELN EXPERIMENT METADATA INTEGRATION WITH OMICS (RNASeq, proteomics) FEDERATED DATA & RESULTS INTEGRATION TAXONOMY/ FUNCTION ANALYSIS REPORTS R,QIIME POPULATION COHORTS VISUALIZATION CLOUD STORAGE eaglecatalog COLLABORATIVE ENVIRONMENT FOR PARTNERS FEEDBACK TO EXPERIMENTAL DESIGN COST OF EXPERIMENT DATA COLLABORATION ARCHITECTURE Solving the metadata problem
  • 14. data - science - life 4. Deployment of a Collaboration System Eagle Genomics Proposal Accelerating metadata adoption and scale up • Offering a new rapid deployment for microbiome R&D organizations looking to cost-efficiently build world-class integrated workflows. • Set up of an initial collaboration system for your team. • 1 to 4 weeks • Enabling self-load your data to the catalog – leading to a standardized and streamlined workflow, which is ready for future scale out • Managing internal data, partner data, or public data
  • 15. data - science - life Current Situation Desired State Negative Consequences of Non-action Positive Business Impact, using Eagle software Data in silos, not reused Easily searchable, discoverable, accessible and intuitive portal by researchers to all data and results Significant researcher time spent in “data wrangling”. Breakage pending, based on projected bottleneck Improved time and ability to insight through radically improved productivity of researchers No correlation between experiments Quick and easy exploratory analyses across experiments and data Missed opportunities for insights. Failure to leverage data as an asset Faster identification and generation of successful product innovation and insight, leading to reduced time to market No unified view of data Data and analysis results stored in a unified catalog, with standardized & repeatable access using best-in- class technology Data (internal, external) and the associated learning not available or exploited as necessary Competitive advantage through cutting edge research, driving innovation in the product pipeline Lack of governance and data provenance Secure authenticated access with relevant data provenance of data sets; simple data sharing with vendors/collaborators Costly, inefficient and limited data sharing with vendors; risk of inappropriate data sharing Efficient compliant, documented, evidenced, and enforced data sharing process; effective (time & cost) collaboration between vendors and internal teams Data is in-effect archived Catalog integrated in the scientific workflow, used continuously and keeping pace with innovation in the industry Missed insights; silo-ed research activities, with high barriers to collaboration Systematic data sharing and re-use, driving faster unique cross-functional insights and reducing waste in spend BENEFITS
  • 16. For more info, contact: Raminderpal Singh raminderpal.singh@eaglegenomics.com In Summary, Considerations & Challenges in Building an End-to-End Microbiome Workflow data - science - life