SlideShare a Scribd company logo
1 of 25
Download to read offline
Reactome2JSON-LD
Converting REACTOME database in JSON-LD
and explore the potential in ElasticSearch/Siren Investigate
François Belleau@ Arnaud Droit Lab
19 January 2021
Initial goal
Create a Biomart like user interface built around Reactome,
CHEBI, Uniprot and GO databases all interlinked with URIs.
for
Project’s final architecture
REST/JSON
REST/JSON
EBI:REST/JSON
WSDL/XML
TXT FILE
XML FILE
1 500 000 documents
from 6 data sources
Initial project’s data model
Siren Investigate final data model
How we did it ?
For datasource in [reactome, chebi, uniprot, go, resid, intact]:
1. ETL for data transformation
a. Get the list of object IDs
b. For each document using Python
i. Use API (REST, WSDL, file) to fetch each object
ii. Load document to ElasticSearch index
c. Convert each document to JSON-LD format using Elasticsearch pipeline
d. Export index to Siren Investigate server using elasticsearch-dump
2. With Siren Investigate
a. For each datasource
i. Create visualisation
ii. Combine them to create a dashboard
b. Configure a relational model
c. Explore Reactome data into graph visualizer
Simple python scripts to fetch API and load into ES
available here https://github.com/fbelleau/swat4ls-2021/tree/main/stuff2json-ld
ES pipelines to transform JSON to JSON-LD
at warp speed in the 4 nodes ES cluster
JSON-LD validation @ https://json-ld.org/playground/
Build Siren visualization and dashboard
Using a web user interface :
1. Create visualization
2. Edit visualization parameters
3. Add visualisation to dashboard
Create relational data model in Siren
The BioMart like user
interface and plus...
Using free Siren Investigate Community edition
https://siren.io/siren-platform-editions/
Slicing and dicing a dataset with facets and text search
Main feature of BioMart
Full text search over 6 data sources
Full text search in Reactome using relations
Biomart was relational
148
pathways
222
physical entities
Visual dashboard with facet browsing
Siren Investigate graph browser
ChEBI molecules involved in process GO:0006096 according to Reactome
Conclusions
Lessons learned
● Crawling existing REST/JSON API is a simple and fast way to access Life
Science data sources
● Converting JSON to JSON-LD is simple and efficient
● Elasticsearch is a good container to store JSON-LD
○ with dereferencable URI for free
○ fast text search and query language
○ efficient disk storage
● Siren Investigate community edition works
○ to browse data with facets (Biomart way)
○ to build visual dashboard
○ to query a relational data model composed of ES indexes
○ to explore the knowledge graph using a performing graph browser
● We now know how to store RDF in Elasticsearch.
Next
We are building Kibio.science project at ADLab, it is
a new approach of data and knowledge
management that will provide a powerful solution
by integrating and connecting healthcare and
research.
It is builded with Elasticsearch and Siren
Investigate and will be hosted on our own
infrastructure.
We are open for collaboration with the SWAT4LS
community.
Arnaud Droit Lab, the
computational biology platform
of the research center of
Quebec
https://compbio.ca/
Thanks to my colleagues at ADLab
● Dr Arnaud Droit (PI)
● Mickael Leclercq
● Regis Ongaro-Carcy
and support from the Siren team
● Manu Agarwal
● Andrew Winter

More Related Content

What's hot

Scripting User Contributed Interlinking
Scripting User Contributed InterlinkingScripting User Contributed Interlinking
Scripting User Contributed Interlinkingwhalb
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncFranz Inc. - AllegroGraph
 
The RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple CountThe RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple CountLeigh Dodds
 
Web Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen ScrapingWeb Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen ScrapingCynthiaCruz55
 
Introduction to Elastic with a hint of Symfony and Docker
Introduction to Elastic with a hint of Symfony and DockerIntroduction to Elastic with a hint of Symfony and Docker
Introduction to Elastic with a hint of Symfony and DockerDaniel Platt
 
A Novel Method and Architecture for Law Processing, Utilising High Performan...
A Novel Method and Architecture  for Law Processing, Utilising High Performan...A Novel Method and Architecture  for Law Processing, Utilising High Performan...
A Novel Method and Architecture for Law Processing, Utilising High Performan...Samos2019Summit
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF Navid Sedighpour
 
4. Crossref and Atypon
4. Crossref and Atypon4. Crossref and Atypon
4. Crossref and AtyponCrossref
 
DBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDimitris Kontokostas
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...
An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...
An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...Jenn Riley
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
Publishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataPublishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataRoberto García
 

What's hot (17)

The Power of Machine Learning and Graphs
The Power of Machine Learning and GraphsThe Power of Machine Learning and Graphs
The Power of Machine Learning and Graphs
 
Scripting User Contributed Interlinking
Scripting User Contributed InterlinkingScripting User Contributed Interlinking
Scripting User Contributed Interlinking
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz Inc
 
STI Summit 2011 - DB vs RDF
STI Summit 2011 - DB vs RDFSTI Summit 2011 - DB vs RDF
STI Summit 2011 - DB vs RDF
 
The RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple CountThe RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple Count
 
Web Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen ScrapingWeb Scraping using Python | Web Screen Scraping
Web Scraping using Python | Web Screen Scraping
 
Introduction to Elastic with a hint of Symfony and Docker
Introduction to Elastic with a hint of Symfony and DockerIntroduction to Elastic with a hint of Symfony and Docker
Introduction to Elastic with a hint of Symfony and Docker
 
A Novel Method and Architecture for Law Processing, Utilising High Performan...
A Novel Method and Architecture  for Law Processing, Utilising High Performan...A Novel Method and Architecture  for Law Processing, Utilising High Performan...
A Novel Method and Architecture for Law Processing, Utilising High Performan...
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF
 
4. Crossref and Atypon
4. Crossref and Atypon4. Crossref and Atypon
4. Crossref and Atypon
 
DBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in Dublin
 
F# for Data*
F# for Data*F# for Data*
F# for Data*
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...
An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...
An Introduction to the Open Archives Initiative Object Reuse and Exchange (OA...
 
Using xml in a data set (ado.net)
Using xml in a data set (ado.net)Using xml in a data set (ado.net)
Using xml in a data set (ado.net)
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
Publishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataPublishing XBRL as Linked Open Data
Publishing XBRL as Linked Open Data
 

Similar to Reactome2JSON-LD: Converting REACTOME Database to JSON-LD

Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
GI 2013 - ENCODE Project Data Access via RESTful API and JSON
GI 2013 - ENCODE Project Data Access via RESTful API and JSONGI 2013 - ENCODE Project Data Access via RESTful API and JSON
GI 2013 - ENCODE Project Data Access via RESTful API and JSONENCODE-DCC
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
The ENCODE Portal REST API
The ENCODE Portal REST API The ENCODE Portal REST API
The ENCODE Portal REST API ENCODE-DCC
 
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...Chunlei Wu
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Benoit Pauwels
 
Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...ULB - Bibliothèques
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseUniversity of Bologna
 
Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)ijcsitcejournal
 
Recovered file 1
Recovered file 1Recovered file 1
Recovered file 1Uthara Iyer
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...João Rocha da Silva
 
The LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataThe LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataDavid Newbury
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Jane Stevenson
 

Similar to Reactome2JSON-LD: Converting REACTOME Database to JSON-LD (20)

Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
GI 2013 - ENCODE Project Data Access via RESTful API and JSON
GI 2013 - ENCODE Project Data Access via RESTful API and JSONGI 2013 - ENCODE Project Data Access via RESTful API and JSON
GI 2013 - ENCODE Project Data Access via RESTful API and JSON
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
Metadata and me
Metadata and meMetadata and me
Metadata and me
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
The ENCODE Portal REST API
The ENCODE Portal REST API The ENCODE Portal REST API
The ENCODE Portal REST API
 
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
 
Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...
 
Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations arise
 
Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)
 
Recovered file 1
Recovered file 1Recovered file 1
Recovered file 1
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...
 
The LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataThe LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked Data
 
Linked sensor data
Linked sensor dataLinked sensor data
Linked sensor data
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011
 

More from François Belleau

Pitch Qliic coopérathon 2017
Pitch Qliic coopérathon 2017Pitch Qliic coopérathon 2017
Pitch Qliic coopérathon 2017François Belleau
 
2015-11-17 Présentation SEAO et ES
2015-11-17 Présentation SEAO et ES2015-11-17 Présentation SEAO et ES
2015-11-17 Présentation SEAO et ESFrançois Belleau
 
BD2K hackathon - Bio2RDF submission
BD2K hackathon - Bio2RDF submissionBD2K hackathon - Bio2RDF submission
BD2K hackathon - Bio2RDF submissionFrançois Belleau
 
Découvrir le web sémantique en 15 minutes (Decideo 2014)
Découvrir le web sémantique en 15 minutes (Decideo 2014)Découvrir le web sémantique en 15 minutes (Decideo 2014)
Découvrir le web sémantique en 15 minutes (Decideo 2014)François Belleau
 
Bio2RDF poster for Biocurator 2014 conference
Bio2RDF poster for Biocurator 2014 conferenceBio2RDF poster for Biocurator 2014 conference
Bio2RDF poster for Biocurator 2014 conferenceFrançois Belleau
 
Acfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDF
Acfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDFAcfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDF
Acfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDFFrançois Belleau
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013François Belleau
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012François Belleau
 
Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...
Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...
Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...François Belleau
 
Bio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and Mouse
Bio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and MouseBio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and Mouse
Bio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and MouseFrançois Belleau
 
Bio2RDF: Towards A Mashup To Build Bioinformatics Knowledge System
Bio2RDF: Towards A Mashup To Build Bioinformatics Knowledge SystemBio2RDF: Towards A Mashup To Build Bioinformatics Knowledge System
Bio2RDF: Towards A Mashup To Build Bioinformatics Knowledge SystemFrançois Belleau
 

More from François Belleau (20)

Bio2RDF @ DILS 2008
Bio2RDF @ DILS 2008Bio2RDF @ DILS 2008
Bio2RDF @ DILS 2008
 
Show de boucane pour ELK
Show de boucane pour ELKShow de boucane pour ELK
Show de boucane pour ELK
 
Pitch Qliic coopérathon 2017
Pitch Qliic coopérathon 2017Pitch Qliic coopérathon 2017
Pitch Qliic coopérathon 2017
 
2015-11-17 Présentation SEAO et ES
2015-11-17 Présentation SEAO et ES2015-11-17 Présentation SEAO et ES
2015-11-17 Présentation SEAO et ES
 
Linuq 20160130
Linuq 20160130Linuq 20160130
Linuq 20160130
 
textOdossier
textOdossiertextOdossier
textOdossier
 
BD2K hackathon - Bio2RDF submission
BD2K hackathon - Bio2RDF submissionBD2K hackathon - Bio2RDF submission
BD2K hackathon - Bio2RDF submission
 
Découvrir le web sémantique en 15 minutes (Decideo 2014)
Découvrir le web sémantique en 15 minutes (Decideo 2014)Découvrir le web sémantique en 15 minutes (Decideo 2014)
Découvrir le web sémantique en 15 minutes (Decideo 2014)
 
Bio2RDF poster for Biocurator 2014 conference
Bio2RDF poster for Biocurator 2014 conferenceBio2RDF poster for Biocurator 2014 conference
Bio2RDF poster for Biocurator 2014 conference
 
Acfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDF
Acfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDFAcfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDF
Acfas 2013 - Comment publier sur le web sémantique : la méthode de Bio2RDF
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...
Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...
Producing, Publishing and Consuming Linked Data Three lessons from the Bio2RD...
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
 
Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009
 
Bio2RDF-ISMB2008
Bio2RDF-ISMB2008Bio2RDF-ISMB2008
Bio2RDF-ISMB2008
 
Bio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and Mouse
Bio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and MouseBio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and Mouse
Bio2RDF : A Semantic Web Atlas of post genomic knowledge about Human and Mouse
 
Bio2RDF should we do it
Bio2RDF should we do itBio2RDF should we do it
Bio2RDF should we do it
 
Bio2RDF: Towards A Mashup To Build Bioinformatics Knowledge System
Bio2RDF: Towards A Mashup To Build Bioinformatics Knowledge SystemBio2RDF: Towards A Mashup To Build Bioinformatics Knowledge System
Bio2RDF: Towards A Mashup To Build Bioinformatics Knowledge System
 
Bio2RDF/Virtuoso
Bio2RDF/VirtuosoBio2RDF/Virtuoso
Bio2RDF/Virtuoso
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Reactome2JSON-LD: Converting REACTOME Database to JSON-LD

  • 1. Reactome2JSON-LD Converting REACTOME database in JSON-LD and explore the potential in ElasticSearch/Siren Investigate François Belleau@ Arnaud Droit Lab 19 January 2021
  • 3. Create a Biomart like user interface built around Reactome, CHEBI, Uniprot and GO databases all interlinked with URIs. for
  • 4. Project’s final architecture REST/JSON REST/JSON EBI:REST/JSON WSDL/XML TXT FILE XML FILE 1 500 000 documents from 6 data sources
  • 7. How we did it ?
  • 8. For datasource in [reactome, chebi, uniprot, go, resid, intact]: 1. ETL for data transformation a. Get the list of object IDs b. For each document using Python i. Use API (REST, WSDL, file) to fetch each object ii. Load document to ElasticSearch index c. Convert each document to JSON-LD format using Elasticsearch pipeline d. Export index to Siren Investigate server using elasticsearch-dump 2. With Siren Investigate a. For each datasource i. Create visualisation ii. Combine them to create a dashboard b. Configure a relational model c. Explore Reactome data into graph visualizer
  • 9. Simple python scripts to fetch API and load into ES available here https://github.com/fbelleau/swat4ls-2021/tree/main/stuff2json-ld
  • 10. ES pipelines to transform JSON to JSON-LD at warp speed in the 4 nodes ES cluster
  • 11. JSON-LD validation @ https://json-ld.org/playground/
  • 12. Build Siren visualization and dashboard Using a web user interface : 1. Create visualization 2. Edit visualization parameters 3. Add visualisation to dashboard
  • 13. Create relational data model in Siren
  • 14. The BioMart like user interface and plus...
  • 15. Using free Siren Investigate Community edition https://siren.io/siren-platform-editions/
  • 16. Slicing and dicing a dataset with facets and text search Main feature of BioMart
  • 17. Full text search over 6 data sources
  • 18. Full text search in Reactome using relations Biomart was relational 148 pathways 222 physical entities
  • 19. Visual dashboard with facet browsing
  • 20. Siren Investigate graph browser ChEBI molecules involved in process GO:0006096 according to Reactome
  • 22. Lessons learned ● Crawling existing REST/JSON API is a simple and fast way to access Life Science data sources ● Converting JSON to JSON-LD is simple and efficient ● Elasticsearch is a good container to store JSON-LD ○ with dereferencable URI for free ○ fast text search and query language ○ efficient disk storage ● Siren Investigate community edition works ○ to browse data with facets (Biomart way) ○ to build visual dashboard ○ to query a relational data model composed of ES indexes ○ to explore the knowledge graph using a performing graph browser ● We now know how to store RDF in Elasticsearch.
  • 23. Next
  • 24. We are building Kibio.science project at ADLab, it is a new approach of data and knowledge management that will provide a powerful solution by integrating and connecting healthcare and research. It is builded with Elasticsearch and Siren Investigate and will be hosted on our own infrastructure. We are open for collaboration with the SWAT4LS community.
  • 25. Arnaud Droit Lab, the computational biology platform of the research center of Quebec https://compbio.ca/ Thanks to my colleagues at ADLab ● Dr Arnaud Droit (PI) ● Mickael Leclercq ● Regis Ongaro-Carcy and support from the Siren team ● Manu Agarwal ● Andrew Winter