A panel of experts, including Dr Barry Murphy (Microbiomics Science Lead at Unilever), Dr Craig McAnulla (Senior Consultant for Bioinformatics) and Dr Yasmin Alam-Faruque (Scientific Data Manager/Biocurator), discusses first-hand experience and views on how to get better insights faster from microbiome data.
2. Panelists
Dr Barry Murphy
Microbiomics Science Lead at Unilever
Dr Craig McAnulla
Senior Consultant Bioinformatics at Eagle Genomics
Dr Raminderpal Singh
Vice President, Head of Microbiome Division at Eagle Genomics
Dr Yasmin Alam-Faruque
Scientific Data Manager/Biocurator at Eagle Genomics
3. Panelist Topics
• Craig – Example applications for microbiome analyses
• Yasmin – The importance of data catalogue and ontologies
• Barry – Scaling in industry and bringing the data to the hands of the scientist
4. Example 1: Minimal requirements for scaled bioinformatic analyses
• All interested parties (e.g. scientists, managers) can see the data and its context
• Searchable – data re-use
• Cost efficient workflow management
• Point-and-click analysis
• Software versions and parameters reported
• Links to analysis results
• Easy re-analysis
• Programmatic access e.g. integration with other systems
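Several of the requirements above (searchable data, reported software versions and parameters, links to results, easy re-analysis, programmatic access) can be pictured as one record per analysis run. The following is a minimal sketch only: `AnalysisRun`, its fields, `search_runs` and `rerun_with` are hypothetical names invented for illustration, not part of any platform named in this deck.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class AnalysisRun:
    """Hypothetical record for one analysis run (illustrative field names)."""
    dataset_id: str
    tool: str
    tool_version: str            # software version is reported with the run
    parameters: Dict[str, str]   # parameters are reported with the run
    result_link: str             # link to the analysis results
    keywords: List[str] = field(default_factory=list)


def search_runs(runs: List[AnalysisRun], term: str) -> List[AnalysisRun]:
    """Return runs whose dataset id, tool name or keywords contain the term."""
    needle = term.lower()
    hits = []
    for run in runs:
        haystack = [run.dataset_id, run.tool, *run.keywords]
        if any(needle in text.lower() for text in haystack):
            hits.append(run)
    return hits


def rerun_with(run: AnalysisRun, **overrides: str) -> AnalysisRun:
    """Describe a re-analysis: same run, with some parameters changed."""
    return replace(run, parameters={**run.parameters, **overrides})
```

Because each record is plain data, it could be shown in a point-and-click interface or handed to other systems through an API; which of those applies is a design decision outside this sketch.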
5. Example 2: Managing Food Safety Simply & Quickly
• Background
  • Food testing company, working with an ice cream maker/customer
  • Checking samples for potential pathogens etc.
• Retrospective
  • Total bacterial counts within limits but increasing
• How we solved the problem
  • “Data catalogue” + analysis
  • Narrowed the source of the problem
  • Company located the source and fixed it
  • No harm to consumers!
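The retrospective finding ("within limits but increasing") is the kind of pattern that a pass/fail check per sample misses but a check across catalogued samples can flag. A minimal sketch, assuming counts are stored in time order and that a simple least-squares slope is an acceptable trend measure; the function names, the limit and the slope threshold are all illustrative assumptions, not the method used in the case study.

```python
from typing import Sequence


def trend_slope(counts: Sequence[float]) -> float:
    """Least-squares slope of counts against sample index (0, 1, 2, ...)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((i - mean_x) * (c - mean_y) for i, c in enumerate(counts))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def rising_within_limit(counts: Sequence[float], limit: float,
                        min_slope: float = 0.0) -> bool:
    """True when every count passes the limit but the trend is upward."""
    all_pass = max(counts) <= limit
    return all_pass and trend_slope(counts) > min_slope
```

A call such as `rising_within_limit(counts, limit=1000)` would then flag a series in which each sample passes on its own but the sequence drifts upward; a suitable `min_slope` would depend on the normal variation of the assay.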
6. Example 3: Detection of Antibiotic-Resistant Genes
• Problem
  • Antiseptics/disinfectants widely used
  • Effective at killing bacteria
  • Some linked to antibiotic resistance
  • Does your product promote antibiotic-resistant superbugs?
• What needs to be done
  • Integrate deep-learning antibiotic resistance detection
  • Analyse treated vs. untreated microbiome
  • Trends over time
7. Benefits of Data Catalogue
• Resource - legacy & current experimental datasets
• Federation of disparate data sources (internal and external)
• Collaboration and sharing of data within the organisation and with external partners
• Economic benefit - data/sample reuse in new studies/analyses, preventing repeated experiments
• Accessibility of datasets – available for reanalysis with newer analytical tools/algorithms, providing enhanced scientific insight
• Data curation – multistep activity, crucial for data comparison, integration and analysis
• In-house bioinformatics “Analyses” – allows scientists to seamlessly perform meaningful computational analytics
8. Best practice - data management
• Curation requires improved data standards - better data management, integration and interoperability
• Currently - data standards are poorly defined and, when present, often overlooked/ignored
• Hampered by multiple standards and many associated ontologies, which can overlap in the same domain
• Ontology: formal naming and definition of concepts and the relationships between entities in a universal set domain, understood by people and computer software (e.g. microbiome referred to as microbiota, microflora, metagenome)
• Ontologies – important, serving as the “smart glue” for data integration and knowledge management, allowing semantic linking between datasets and their metadata
• Initiatives e.g. Pistoia Alliance Ontologies Mapping project - working towards supporting better tools, services and best practices for ontology management and mapping in life sciences
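The naming example above (microbiome also recorded as microbiota, microflora or metagenome) shows why a shared vocabulary matters when datasets are linked. A minimal sketch of mapping free-text labels to one preferred term; the table below reuses only the terms from the slide, while the function names and the lookup structure are assumptions for illustration and stand in for a real ontology service.

```python
from typing import Dict, List, Optional

# Alternative names taken from the slide's example; the layout is hypothetical.
PREFERRED_TO_ALTERNATIVES: Dict[str, List[str]] = {
    "microbiome": ["microbiota", "microflora", "metagenome"],
}


def build_lookup(table: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every known label (lower-cased) to its preferred term."""
    lookup: Dict[str, str] = {}
    for preferred, alternatives in table.items():
        lookup[preferred.lower()] = preferred
        for label in alternatives:
            lookup[label.lower()] = preferred
    return lookup


def normalise(label: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the preferred term, or None when the label is unknown."""
    return lookup.get(label.strip().lower())


LOOKUP = build_lookup(PREFERRED_TO_ALTERNATIVES)
```

Returning `None` for unknown labels, rather than guessing, leaves those cases visible for a curator to resolve.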
9. Value of Curation and Ontologies
• Ontologies - essential for curation, leading to enhanced data harmonisation and data governance across all levels of the organisation
• Improving the quality of the underlying datasets/metadata
• Pioneering value-driven curation used to bridge the gap between ‘big data’ and ‘biological insight’, to assist with answering business and scientific questions
• Structured metadata/datasets - increase an organisation’s data management maturity across all levels of integration, analysis and reporting
10. Using the right ontologies
[Figure: bar chart (axis 0 to 140) comparing “Initial suggestion” and “Final” ontology choices. Ontologies shown: DOID, GO, UBERON, CLO, OBI, CHEBI, CL, HP, EFO, OMIT, ORDO, HP;DOID, BTO, BIOASSAY, PATHWAY, ECO, XCO, ONTONEO, KEGG;GO, SNOMEDCT, MESH, SNOMEDCT;HGNC, CHEMBL, Reactome, UNIPROT, QIAGEN, PUBCHEM, INTERPRO, NCBIGene, CVCL (cellosaurus), NULL. Category labels: Disease/Phenotype; Biological metabolic reactions/pathways; Experimental design, methodology]
• Many ontologies exist for different domains of biology
• Overlap can cause confusion
• Need to choose best one