Ondex – Data integration and visualisation
Catherine Canevet, Rothamsted Research
London Biogeeks – May Tech Meet
Rothamsted Research North Wyke
Almost certainly the oldest in the world (started in 1843)
350 Scientific staff
Open weekend May 22nd-23rd, 11am-5pm: www.rothamsted.ac.uk/openweekend/
Outline
Data integration in Ondex
Data visualisation in Ondex and application cases
Genomics, transcriptomics, proteomics, metabolomics, …
The biological systems span multiple levels of biological organisation
Integrating the data is non-trivial: 2 main challenges
Syntactic integration challenge
Over 1000 databases freely available to the public
Over 60 million sequences in GenBank
Over 870 complete genomes and many ongoing projects
Over 17 million citations in PubMed
PubMed grows by 600,000 publications each year
Integration of life science data sources is essential for systems biology research
http://www.ncbi.nlm.nih.gov/Database
Semantic integration challenge
Same concept, different names: synonyms
Same name, different concepts: homographs
[Figure label: Ear]
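The two semantic problems named on this slide can be made concrete with a small sketch. The records, source names and concept identifiers below are hypothetical toy data, not Ondex content; the point is only that grouping by concept exposes synonyms, while grouping by name exposes homographs.

```python
from collections import defaultdict

# Hypothetical toy records: (source, concept id, name).
records = [
    ("db_a", "C1", "ear"),            # part of a cereal plant
    ("db_b", "C2", "ear"),            # hearing organ
    ("db_a", "C3", "vitamin C"),
    ("db_b", "C3", "ascorbic acid"),  # same compound, different name
]


def find_synonyms(records):
    """Return concepts that are known under more than one name."""
    names_by_concept = defaultdict(set)
    for _source, concept_id, name in records:
        names_by_concept[concept_id].add(name.lower())
    return {c: n for c, n in names_by_concept.items() if len(n) > 1}


def find_homographs(records):
    """Return names that refer to more than one concept."""
    concepts_by_name = defaultdict(set)
    for _source, concept_id, name in records:
        concepts_by_name[name.lower()].add(concept_id)
    return {n: c for n, c in concepts_by_name.items() if len(c) > 1}


synonyms = find_synonyms(records)
homographs = find_homographs(records)
```

Matching on names alone would merge C1 and C2 and keep the two names of C3 apart, which is why name-based matching needs support from other evidence.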
Outline
Data integration in Ondex
Data visualisation in Ondex and application cases
Concepts and relations (1/2)
Network of concepts and relations, e.g. a protein-protein interaction (PPI) network and the cellular location of proteins
Ontology of concept classes, relation types and additional properties
Concept classes: Protein, CelComp
Relation types: interact (Protein to Protein), located in (Protein to CelComp)
Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum …
Concepts and relations (2/2)
Transformation to binary graph
Concepts: Reaction, Metabolite
Relations: consumed by, produced by
Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum …
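The transformation to a binary graph can be sketched as follows. This is a minimal illustration, not the Ondex data model: the class names `Concept` and `Relation`, the function `reaction_to_binary` and the identifiers R1, A, B and C are all assumptions made for the example. A reaction with several substrates and products becomes its own concept, and each metabolite is linked to it by one two-ended relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    id: str
    concept_class: str  # e.g. "Protein", "Reaction", "Metabolite"


@dataclass(frozen=True)
class Relation:
    source: Concept
    target: Concept
    relation_type: str  # e.g. "consumed by", "produced by"


def reaction_to_binary(reaction_id, substrates, products):
    """Turn one many-to-many reaction into binary relations.

    The reaction becomes a concept of its own, so every relation
    connects exactly two concepts.
    """
    reaction = Concept(reaction_id, "Reaction")
    relations = []
    for name in substrates:
        metabolite = Concept(name, "Metabolite")
        relations.append(Relation(metabolite, reaction, "consumed by"))
    for name in products:
        metabolite = Concept(name, "Metabolite")
        relations.append(Relation(metabolite, reaction, "produced by"))
    return reaction, relations


# Hypothetical reaction: A + B -> C
reaction, relations = reaction_to_binary("R1", ["A", "B"], ["C"])
```

Representing the reaction as a node keeps the graph binary without losing which metabolites take part in the same reaction.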
Data integration in Ondex
Data input: biological databases, ontologies and free text, experimental data
Import into a graph of concepts and relations
Data integration by data alignment: sequence analysis, text mining
Importing data into Ondex
What databases to import
What format these are in
Ondex parsers already written
Generic: OBO, PSI-MI, SBML, tab-delimited, FASTA
Database-specific: AraCyc, AtRegNet, BioCyc, BioGRID, Brenda, Drastic, EcoCyc, GO, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, Oglycbase, OMIM, PDB, Pfam, SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WGS, WordNet
Example of resulting graph
[Diagram: the concepts Target sequence, Gene, Protein, Transcription factor, Enzyme, Protein complex, Catalysing class, Reaction, EC and Pathway, linked by the relations has similar sequence, binds to, repressed by, regulated by, activated by, encoded by, is_a, member is part of and catalyses]
Ondex data integration scheme
Stages: data input and transformation, data integration, visualisation
Heterogeneous data sources (UniProt, AraCyc, KEGG, Transfac, microarray), each read by a parser
Ondex graph warehouse: generalized object data model, database layer, Lucene
Integration methods (graph alignment): accession, name based, transitive, BLAST, ProteinFamily, Pfam2GO
Clients/tools: Ondex Visualization Tool Kit, web client, Taverna, web service
Data exchange: OXL/RDF
Examples: treatments from DRASTIC, pathways from KEGG
Semantic integration by graph alignment
Create relations between equivalent entries from different data sources
Equivalent entries are identified by mapping methods:
Concept accessions (UniProt ID)
Concept name (gene name), synonyms
Sequence methods
Graph neighbourhood
Text mining
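The first mapping method, matching on concept accessions, can be sketched as below. This is an illustration under stated assumptions rather than Ondex's implementation: the function name `map_by_accession`, the dictionary layout, the concept identifiers and the accession strings are all hypothetical placeholders. Concepts from different sources that share an accession are paired as equivalent.

```python
from collections import defaultdict
from itertools import combinations


def map_by_accession(concepts):
    """Pair concepts from different sources that share an accession.

    `concepts` is a list of dicts with the keys 'id', 'source' and
    'accessions' (a set of accession strings).
    Returns a sorted list of (id, id) pairs judged equivalent.
    """
    by_accession = defaultdict(list)
    for concept in concepts:
        for accession in concept["accessions"]:
            by_accession[accession].append(concept)

    equivalences = set()
    for group in by_accession.values():
        for a, b in combinations(group, 2):
            # Only align entries that come from different data sources.
            if a["source"] != b["source"]:
                equivalences.add(tuple(sorted((a["id"], b["id"]))))
    return sorted(equivalences)


# Hypothetical concepts; the accessions are placeholders.
concepts = [
    {"id": "kegg:1", "source": "KEGG", "accessions": {"P12345"}},
    {"id": "uniprot:1", "source": "UniProt", "accessions": {"P12345"}},
    {"id": "aracyc:9", "source": "AraCyc", "accessions": {"Q00001"}},
]

equivalent_pairs = map_by_accession(concepts)
```

Each returned pair would become a new relation in the integrated graph; name-based, sequence-based and text-mining methods would add further pairs in the same way.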
Outline
Data integration in Ondex
Data visualisation in Ondex and application cases
Complexity of interactions
PPI, co-expression, co-citation, …
Candidate gene prioritisation and pathway discovery
Use Ondex tools (filters, annotators, layouts …)
Filters
Integrating different datasets produces a large resulting graph
Need to narrow down
Select meaningful areas of the graph
Example in Ondex: protein-protein interaction network
Filters in Ondex
Protein-protein interactions measured using quantitative techniques
Threshold filter
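A threshold filter over quantitatively measured interactions might look like the sketch below. The slides do not say how Ondex stores or compares the measurements, so the numeric score per interaction, the function name `threshold_filter`, the protein identifiers and the cutoff are all assumptions for illustration.

```python
def threshold_filter(edges, threshold):
    """Keep interactions whose score is at least `threshold`.

    `edges` is an iterable of (protein_a, protein_b, score) tuples.
    Returns the kept edges and the set of proteins still connected.
    """
    kept = [(a, b, score) for a, b, score in edges if score >= threshold]
    nodes = {protein for a, b, _score in kept for protein in (a, b)}
    return kept, nodes


# Hypothetical scored protein-protein interactions.
edges = [
    ("P1", "P2", 0.9),
    ("P2", "P3", 0.4),
    ("P1", "P4", 0.75),
]

kept_edges, kept_nodes = threshold_filter(edges, threshold=0.7)
```

Here P3 drops out of the graph because its only interaction falls below the cutoff, which is the narrowing-down effect that filters are meant to achieve.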
http://www.phi-base.org/
Loss of pathogenicity
Reduced virulence
Only genes validated by gene disruption experiments
Integrated phenotype and comparative genome information
Annotators (1/3)
Colour
Shape
Size
Annotators (2/3)
Virtual Knock-out Annotator: shows how important a single concept is to all possible paths contained in a network
Ondex resizes the concepts based on this score
Scale Concept by Value
Pie charts
Up/down regulation is indicated in red/green
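The idea behind a virtual knock-out can be sketched with a simple stand-in score. The slides do not give the scoring used by Ondex, so the measure below is an assumption chosen for illustration: a concept's score is the number of connected concept pairs that are lost when the concept is removed from the network. The function names and the four-concept example graph are hypothetical.

```python
from collections import deque


def connected_pairs(adjacency, removed=None):
    """Count unordered concept pairs joined by a path, ignoring `removed`."""
    seen = set()
    total = 0
    for start in adjacency:
        if start == removed or start in seen:
            continue
        # Breadth-first search over one connected component.
        queue = deque([start])
        seen.add(start)
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        total += size * (size - 1) // 2
    return total


def knockout_scores(adjacency):
    """Score each concept by the connected pairs lost when it is removed."""
    baseline = connected_pairs(adjacency)
    return {
        node: baseline - connected_pairs(adjacency, removed=node)
        for node in adjacency
    }


# Hypothetical undirected chain A - B - C - D, as symmetric adjacency lists.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

scores = knockout_scores(adjacency)
```

The inner concepts B and C score higher than the end concepts A and D, because removing them splits the chain; a visualisation could then scale each concept's size by its score.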

Ondex: Data integration and visualisation

  • 1. Ondex – Data integration and visualisation Catherine Canevet Rothamsted Research London Biogeeks – May Tech Meet
  • 3. Almost certainly the oldest in the world (started in 1843)
  • 5. Open weekend May 22nd-23rd 11am-5pm www.rothamsted.ac.uk/openweekend/
  • 12. The biological systems span multiple levels of biological organisation
  • 13. Non-trivial to integrate the data 2 main challenges
  • 14. Syntactic integration challenge: over 1000 databases freely available to the public; over 60 million sequences in GenBank; over 870 complete genomes and many ongoing projects; over 17 million citations in PubMed; PubMed grows by 600,000 publications each year. Integration of Life Science data sources is essential for Systems Biology research. http://www.ncbi.nlm.nih.gov/Database
  • 15. Semantic integration challenge: same concept, different names (synonyms); same name, different concepts (homographs). Slide label: "Ear".
  • 19. Concepts and relations (1/2). Examples: a protein–protein interaction (PPI) network, in which Protein concepts are linked by "interact" relations, and the cellular location of proteins, in which a Protein is "located in" a cellular component (Cell). A network of concepts and relations is typed by ConceptClass (Protein, CelComp) and RelationType (interact, located in). Properties: compound name, protein sequence, protein structure, cellular component, KM-value, pH optimum … Together these form an ontology of Concept Classes, Relation Types and additional Properties.
  • 20. Concepts and relations (2/2). Transformation to binary graph. Concepts: Reaction, Metabolite. Relations: consumed by, produced by. Properties: compound name, protein sequence, protein structure, cellular component, KM-value, pH optimum …
  • 24. Importing data into Ondex. Decide what databases to import and what format these are in. Ondex parsers already written: generic (OBO, PSI-MI, SBML, Tab-delimited, Fasta); database-specific (Aracyc, AtRegNet, BioCyc, BioGRID, Brenda, Drastic, EcoCyc, GO, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, Oglycbase, OMIM, PDB, Pfam, SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WGS, WordNet).
  • 25. Example of resulting graph. Labels in the diagram: Gene, Protein, Transcription factor, Enzyme, Protein complex, Reaction, EC, Pathway, Target sequence, Catalysing class; has similar sequence, binds to, repressed by, regulated by, activated by, encoded by, is_a, catalyses, member is part of.
  • 26. Ondex Data Integration Scheme. Stages: data input & transformation, data integration, visualisation. Heterogeneous data sources (e.g. treatments from DRASTIC, pathways from KEGG, UniProt, AraCyc, Transfac, microarray data) are read by parsers into the Ondex graph warehouse (Generalized Object Data Model, Database Layer, Lucene). Integration methods (graph alignment): Accession, Name based, Transitive, Blast, ProteinFamily, Pfam2GO. Clients/Tools: Ondex Visualization Tool Kit, Web Client, Taverna, Web Service. Data exchange: OXL/RDF.
  • 27. Semantic Integration by Graph Alignment. Create relations between equivalent entries from different data sources, identified by mapping methods: concept accessions (UniProt ID); concept name (gene name), synonyms; sequence methods; graph neighbourhood; text mining.
  • 33. Candidate gene prioritisation and pathway discovery: use Ondex tools (filters, annotators, layouts …)
  • 34. Filters. Integrating different datasets → large resulting graph. Need to narrow down: select meaningful areas of the graph. Example in Ondex: protein-protein interaction network.
  • 41. Integrated phenotype and comparative genome information
  • 44. Shape
  • 46. Annotators (2/3). Virtual Knock-out Annotator: shows how important a single concept is to all possible paths contained in a network; Ondex resizes the concepts based on this score. Scale Concept by Value. Pie charts: up/down regulation is indicated in red/green.
  • 47. Application case 2: Mapping microarray expression data to integrated pathways (Jan Taubert). Diagram labels: AraCyc, parser, ONDEX, tab file (Arabidopsis C/N uptake), OXL, tab file. Accession-based mapping using TAIR IDs. Results: interactive exploration in Ondex; enriched spreadsheet, e.g. AraCyc pathways.
  • 53. Network diameter Add annotation to the graph
  • 54. Application case 3: Arabidopsis PPI network (Artem Lysenko). IntAct, TAIR, BioGRID → mapping the 3 databases based on TAIR accessions.
  • 55. Adding 3 sources of evidence: co-expression, sequence similarity, co-occurrence in scientific literature → facilitate the identification of functionally related groups of proteins.
  • 56. Added attributes to nodes/edges. Network stats: betweenness centrality (BWC) → how influential (bridge); degree centrality (DC) → hub likeness. Markov Clustering: identifies strongly connected groups of proteins in the network.
  • 58. Degree centrality represented by node size
  • 59. Betweenness centrality represented by node colour (Artem Lysenko)
  • 60. Filters, annotators and layouts: combination of these three types of tools in Ondex → a more complex application case …
  • 61. Application case 4: Bioenergy Project (Keywan Hassani-Pak). Use bioinformatics to support phenotype-genotype research in bioenergy crops. Given a phenotypic variant, is it possible to pin down the relevant genes? Develop tools to support systematic analysis of QTL regions to pin down relevant genes. Identify genes implicated in biomass production in willow. Prioritise genes for experimental validation. Biofuel Conversion Process: http://www.jgi.doe.gov/education/bioenergy/bioenergy_1.html
  • 62. QTL and Genomic Data. Willow genome is not sequenced yet. QTL may encompass many potential candidates, perhaps hundreds. Poplar is the first tree with a fully sequenced genome: 19 chromosomes, 45778 predicted genes, 4x larger than the Arabidopsis genome. Not much is known about the function of the genes.
  • 63. Linking genes to data sources. Diagram labels: Linked, References model (e.g. Poplar, Arabidopsis), Willow, Pathways, Plant Hormones, QTL, Map, Orthologous, Markers, Physical map, Expression Patterns, Genes, Gene Function. Result: list of candidate genes linked to biological processes.
  • 64. Relevant Data Sources. Release 15.10. Poplar Gene Prediction v2.0 (Jan 2010). All plants: 739,396 proteins; reviewed: 28,404 proteins (3.84%). PoplarCyc 1.0: 285 pathways, 3434 enzymes, 1363 compounds (Oct 2009). Pfam 24.0: 11,912 protein families (Oct 2009). Poplar Transcription Factors: DPTF, 2,576 putative TF (March 2007); PlnTFDB, 2,901 putative TF (July 2009). 29,365 GO terms (Jan 2010). Poplar/Willow QTL: work in progress, preliminary dataset available. Only loading referenced publications: ~15,000 articles.
  • 65. Unique Knowledge Base for Poplar. Proteins annotated with functional information and publications, based on comparative genomics and protein family analysis. Genes and QTLs enriched with positional information. Data integration was done in Ondex.
  • 66. Ondex Genomics Layout. Genomic Layout displays chromosomes, genes and QTLs. Chromosomal regions and QTLs can be selected.
  • 67. Ondex Genomics Filter. Genes of interest. Enriched protein annotation network.
  • 68. Phenotypic Information in Literature. Poplar protein 217086. HMMer: 650581 – HLH, E-value 3.4E-7, score 30.0. BLAST: 217086 – LAX, E-value 8.3E-17, score 80.88. BLAST: 217086 – BHLH63, E-value 8.3E-9, score 54.3. PMID:13130077 “LAX and SPA: major regulators of shoot branching in rice.” We identified two remote homologs, in rice (LAX) and in Arabidopsis (BHLH63), as well as one protein domain (HLH). There is evidence that the LAX homolog is a major regulator of shoot branching → hypothesis generation.
  • 71. Text mining, Experimental Data, Hypothesis, New experiments
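
The accession-based mapping mentioned on slides 27 and 54 can be sketched roughly as follows. This is a minimal illustration in Python and not Ondex's actual implementation: the Concept class, its field names, the "equivalent" relation label and the sample identifiers are all assumptions made for the example. The idea shown is only that concepts of the same class, parsed from different data sources, are linked when they share an accession.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Concept:
    """A node in the integrated graph (hypothetical structure)."""
    concept_id: str
    concept_class: str   # e.g. "Protein"
    source: str          # data source the concept was parsed from
    accessions: frozenset = field(default_factory=frozenset)


def accession_based_mapping(concepts):
    """Link concepts from different sources that have the same
    concept class and share at least one accession."""
    # Index concepts by (concept class, accession).
    by_key = defaultdict(list)
    for c in concepts:
        for acc in c.accessions:
            by_key[(c.concept_class, acc)].append(c)

    relations = set()
    for group in by_key.values():
        for a, b in combinations(group, 2):
            if a.source != b.source:  # only map across data sources
                first, second = sorted((a.concept_id, b.concept_id))
                relations.add((first, "equivalent", second))
    return sorted(relations)


# Illustrative data only; the accessions are placeholders.
concepts = [
    Concept("c1", "Protein", "IntAct", frozenset({"AT1G01010"})),
    Concept("c2", "Protein", "BioGRID", frozenset({"AT1G01010"})),
    Concept("c3", "Protein", "TAIR", frozenset({"AT1G01010", "AT1G01020"})),
    Concept("c4", "Gene", "TAIR", frozenset({"AT1G01010"})),
]

for relation in accession_based_mapping(concepts):
    print(relation)
```

In this sketch the Gene concept c4 is not mapped to the proteins even though it shares an accession, because the mapping is restricted to concepts of the same class. Whether a real mapping method applies that restriction is a design choice that the slides do not specify.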

Editor's Notes

  1. Light pink – increased virulence; light blue – reduced virulence; light green – loss of pathogenicity; yellow – unaffected pathogenicity; star – animal; circle – plant.
  2. The virtual KO score is based on 3 other scores: "extension" gives the number of paths that would be extended if a concept was added; "deletion" gives the number of paths that would be deleted if this concept was deleted; "nochange" gives the number of paths that would not be shortened/extended if this concept was deleted.
  3. IntAct: 4625 protein interactions (data derived from literature curation or direct user submissions). TAIR (The Arabidopsis Information Resource): 1143 interactions; genome sequence, gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications. BioGRID (General Repository for Interaction Datasets): collections of protein and genetic interactions from major model organism species; 1223 interactions for Arabidopsis derived from high-throughput studies and conventional focused studies.
  4. ATTED-II (Arabidopsis thaliana trans-factor and cis-element prediction database): provides co-regulated gene relationships in Arabidopsis to estimate gene functions; gives the Pearson correlation coefficients of co-expressed genes in Arabidopsis calculated from available microarray data. NCBI PSI-BLAST: identify similarities between our reference set of proteins; matching against the Arabidopsis subset of UniProt. Co-occurrence of protein names: 25,900 Medline abstracts related to Arabidopsis thaliana; integrated Lucene-based mapping method.
  5. Solid biomass (in the form of plants and trees) can be converted into liquid fuels (such as ethanol, methanol, and biodiesel). The challenge lies in efficient conversion, creating more energy than the input required to produce it. Increase biomass yield. Develop means to support systematic analysis of QTL regions and prioritise genes for experimental analyses. Identify genes controlling biomass production in willow.
  6. QTL are genomic regions that assign variations observed in a phenotype to a region on the genetic map. Biomass traits: branching, height, leaf number etc. Going from willow to poplar to Arabidopsis and other species.
  7. Reduced hypothesis space from 100 potential candidates to 3 hot candidates. Next steps: cloning and transformation for experimental validation.
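
Note 4 refers to Pearson correlation coefficients of co-expressed genes. As a small worked illustration, the coefficient can be computed from two expression profiles as below. The function name, the gene labels and the expression values are made up for the example and are not taken from ATTED-II or from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values across four microarray experiments.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

print(pearson(expression["geneA"], expression["geneB"]))
print(pearson(expression["geneA"], expression["geneC"]))
```

A coefficient close to 1 suggests the two genes are co-expressed across the experiments, while a value close to -1 indicates opposite expression patterns. Any threshold used to add a co-expression edge to the network would be a separate choice that the notes do not state.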