Customisable cross-database Bio2RDF queries

•

0 likes•401 views

Peter Ansell

Education

Introduction
● Large number of cross-disciplinary datasources in
different locations
● Scientists require simplified access to many of the
datasources for complex research
● Recently a large number of datasources have been
published using the RDF syntax
● Now, we need to learn how to query across them and
be able to share that knowledge with others

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

2

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

3

Linked Data
1) Use URIs as names for things
2) Use HTTP URIs so that people can look up those
names.
3) When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
4) Include links to other URIs. so that they can
discover more things.
http://www.w3.org/DesignIssues/LinkedData.html

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

4

Linked Data querying
● Strategy 1 (Naive) :
– Retrieve resources
– Retrieve linked resources
– Cache the resources and perform queries locally
● Strategy 2 (Search engine):
– Retrieve resources
– Retrieve directly linked resources
– Query a semantic search engine for related resources
– Cache the resources and perform queries locally
Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

5

Linked Data querying
● Strategy 3 (Distributed query) :
– Mix SPARQL endpoint queries with URI based
resolution to avoid having a large local cache
– Normalise results from each site to form final query
result
– Users can process the results, and perform one or more
queries based on their interpretation of the results

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

6

Bio2RDF distributed queries
● Assign Namespaces to providers
● Query across relevant providers given a users query
● Aggregate all results into a single RDF document
and return to the user
● It works: 700000 queries during the last month
● Largest dataset has 10 billion triples, the Protein
Databank, with others making up about 5 billion
triples

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

7

Workflow
Resolved URI: http://bio2rdf.org/label/go:0000345

Host name: http://bio2rdf.org/ Query: label/go:0000345

Regular expression: label/([w-]+):(.+)

http://bio2rdf.org/query:labelsearch

http://bio2rdf.org/query:labelsearchforgo

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

8

Collaboration
● Query and provider definitions have an RDF
representation
● Any other person is able to take a definition and
change it to suit their needs and redistribute their
definition
● If definitions are Linked Data themselves, as the
Bio2RDF configuration item are, the HTTP URI can
be used by others to pull the definition into their
own software
Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

9

Provenance for queries and data
● Provenance can be attached to each item, including
details such as OpenID URI's and dates
● The sources and queries that were used for a
particular URI can be found by utilising the query
plan option
– http://bio2rdf.org/queryplan/label/go:0000345
– Enables automatic query collaboration, as you could
take this query plan and add or modify queries
inside the query plan

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

10

Conclusion
● Many large distributed datasources
● Single interface, RDF
● Distribute queries efficiently across the endpoints
● Enabling people to create the definitions of what
they did so other people can collaborate, via a single
server or copy and paste

Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009

11

What's hot

The expanding dataverseMerce Crosas

Using Neo4j for exploring the research graph connections made by RD-Switchboardamiraryani

鏈結資料在圖書館的應用20131107皓仁柯

TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)Dag Endresen

Data Citation in The Dataverse NetworkMicah Altman

Scholze liber 2015-06-25_finalKarlsruhe Institute of Technology (KIT)

Standardization and integration of molecular biology information with DASRafael C. Jimenez

GBIF BIFA mentoring, Day 5a Data management, July 2016Dag Endresen

EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)Dag Endresen

Scholze goportis 4-11-14Karlsruhe Institute of Technology (KIT)

DataCite – Bridging the gap and helping to find, access and reuse data – Herb...OpenAIRE

Data exchange alternatives, GIGA TAG (2009)Dag Endresen

FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble

20151102koyamaYukinobu Koyama

Dataset Citation and Identificationguest453b14

re3data.org – a Registry of Research Data RepositoriesHeinz Pampel

Opendata repository-v2Desconnets Jean-Christophe

Global Biodiversity Information Facility - 2013Dag Endresen

Elab 16 5-13-re3data-scholze-finalKarlsruhe Institute of Technology (KIT)

Web service technologies, at CGIAR ICT-KM workshop in Rome (2005)Dag Endresen

What's hot (20)

The expanding dataverse

Using Neo4j for exploring the research graph connections made by RD-Switchboard

鏈結資料在圖書館的應用20131107

TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)

Data Citation in The Dataverse Network

Scholze liber 2015-06-25_final

Standardization and integration of molecular biology information with DAS

GBIF BIFA mentoring, Day 5a Data management, July 2016

EURISCO and GBIF IPT, at the Vavilov Institute in St Petersburg (27 April 2010)

Scholze goportis 4-11-14

DataCite – Bridging the gap and helping to find, access and reuse data – Herb...

Data exchange alternatives, GIGA TAG (2009)

FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...

20151102koyama

Dataset Citation and Identification

re3data.org – a Registry of Research Data Repositories

Opendata repository-v2

Global Biodiversity Information Facility - 2013

Elab 16 5-13-re3data-scholze-final

Web service technologies, at CGIAR ICT-KM workshop in Rome (2005)

Viewers also liked

HIKM2010 - Query Resolution for Biology and MedicinePeter Ansell

12 Jyotirlingastamil nenjam

Take The Timetamil nenjam

Bio2RDF Distributed Querying modelPeter Ansell

Lynes Presentation 1Aama Projects

Mathematics Of Lifetamil nenjam

students using statisticspetsat

32 Ways a Digital Marketing Consultant Can Help Grow Your BusinessBarry Feldman

Viewers also liked (8)

HIKM2010 - Query Resolution for Biology and Medicine

12 Jyotirlingas

Take The Time

Bio2RDF Distributed Querying model

Lynes Presentation 1

Mathematics Of Life

students using statistics

32 Ways a Digital Marketing Consultant Can Help Grow Your Business

Similar to Customisable cross-database Bio2RDF queries

W4 4 marc-alexandre-nolin-v2nolmar01

RDM Programme @ Edinburgh - Service InteroperationEDINA, University of Edinburgh

Linking Open Government Data at Scale Bernadette Hyland-Wood

Making Data Dynamic: Views from UC3, CDLCarly Strasser

The Heterogenous Zone: Six use cases for six research data collections in Edi...EDINA, University of Edinburgh

Engaging the Researcher in RDMEDINA, University of Edinburgh

RDA UpdateResearch Data Alliance

Data Access & Storage @ UWA - UWA Research Week September 2017Katina Toufexis

DSpace for Data RevisitedEDINA, University of Edinburgh

Building a linked data based content discovery service for the RTÉ ArchivesMediaMixerCommunity

Edinburgh DataShare - DSpace for DataHistoric Environment Scotland

Exploring the Semantic WebRoberto García

Sandra Collins - Building a linked data based content discovery service for t...dri_ireland

Researcher Identifiers and National Federated Search Portal for Japanese Inst...National Institute of Informatics

UWA Research Week 2016Katina Toufexis

Metadata for Research Objectsseanb

ElN - repository integration at the University of Goettingenrmacneil88

Geospatial Data Insfrastructures, Cybercartography and Open Data: The Need f...ProgCity

Geospatial Data Insfrastructures, Cybercartography and Open Data: The Need f...Communication and Media Studies, Carleton University

Introduction to Research Data ManagementEDINA, University of Edinburgh

Similar to Customisable cross-database Bio2RDF queries (20)

W4 4 marc-alexandre-nolin-v2

RDM Programme @ Edinburgh - Service Interoperation

Linking Open Government Data at Scale

Making Data Dynamic: Views from UC3, CDL

The Heterogenous Zone: Six use cases for six research data collections in Edi...

Engaging the Researcher in RDM

RDA Update

Data Access & Storage @ UWA - UWA Research Week September 2017

DSpace for Data Revisited

Building a linked data based content discovery service for the RTÉ Archives

Edinburgh DataShare - DSpace for Data

Exploring the Semantic Web

Sandra Collins - Building a linked data based content discovery service for t...

Researcher Identifiers and National Federated Search Portal for Japanese Inst...

UWA Research Week 2016

Metadata for Research Objects

ElN - repository integration at the University of Goettingen

Geospatial Data Insfrastructures, Cybercartography and Open Data: The Need f...

Introduction to Research Data Management

Recently uploaded

Alper Gobel In Media Res Media ComponentInMediaRes1

Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita

Software Engineering Methodologies (overview)eniolaolutunde

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George

CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2

Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732

The Most Excellent Way | 1 Corinthians 13Steve Thomason

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR

9953330565 Low Rate Call Girls In Rohini Delhi NCR9953056974 Low Rate Call Girls In Saket, Delhi NCR

How to Make a Pirate ship Primary Education.pptxmanuelaromero2013

microwave assisted reaction. General introductionMaksud Ahmed

Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari

KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE

MENTAL STATUS EXAMINATION format.docxPoojaSen20

Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN

Presiding Officer Training module 2024 lok sabha electionsanshu789521

Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth

Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝9953056974 Low Rate Call Girls In Saket, Delhi NCR

mini mental status format.docxPoojaSen20

Recently uploaded (20)

Alper Gobel In Media Res Media Component

Class 11 Legal Studies Ch-1 Concept of State .pdf

Software Engineering Methodologies (overview)

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17

CARE OF CHILD IN INCUBATOR..........pptx

Separation of Lanthanides/ Lanthanides and Actinides

The Most Excellent Way | 1 Corinthians 13

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️

9953330565 Low Rate Call Girls In Rohini Delhi NCR

How to Make a Pirate ship Primary Education.pptx

microwave assisted reaction. General introduction

Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf

KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...

MENTAL STATUS EXAMINATION format.docx

Solving Puzzles Benefits Everyone (English).pptx

Presiding Officer Training module 2024 lok sabha elections

Introduction to ArtificiaI Intelligence in Higher Education

Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝

mini mental status format.docx

Customisable cross-database Bio2RDF queries

1. Collaborative development of cross- database Bio2RDF queries Peter Ansell Microsoft Queensland University of Technology eResearch Centre p.ansell@qut.edu.au

2. Introduction ● Large number of cross-disciplinary datasources in different locations ● Scientists require simplified access to many of the datasources for complex research ● Recently a large number of datasources have been published using the RDF syntax ● Now, we need to learn how to query across them and be able to share that knowledge with others Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 2

3. Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 3

4. Linked Data 1) Use URIs as names for things 2) Use HTTP URIs so that people can look up those names. 3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) 4) Include links to other URIs. so that they can discover more things. http://www.w3.org/DesignIssues/LinkedData.html Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 4

5. Linked Data querying ● Strategy 1 (Naive) : – Retrieve resources – Retrieve linked resources – Cache the resources and perform queries locally ● Strategy 2 (Search engine): – Retrieve resources – Retrieve directly linked resources – Query a semantic search engine for related resources – Cache the resources and perform queries locally Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 5

6. Linked Data querying ● Strategy 3 (Distributed query) : – Mix SPARQL endpoint queries with URI based resolution to avoid having a large local cache – Normalise results from each site to form final query result – Users can process the results, and perform one or more queries based on their interpretation of the results Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 6

7. Bio2RDF distributed queries ● Assign Namespaces to providers ● Query across relevant providers given a users query ● Aggregate all results into a single RDF document and return to the user ● It works: 700000 queries during the last month ● Largest dataset has 10 billion triples, the Protein Databank, with others making up about 5 billion triples Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 7

8. Workflow Resolved URI: http://bio2rdf.org/label/go:0000345 Host name: http://bio2rdf.org/ Query: label/go:0000345 Regular expression: label/([w-]+):(.+) http://bio2rdf.org/query:labelsearch http://bio2rdf.org/query:labelsearchforgo Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 8

9. Collaboration ● Query and provider definitions have an RDF representation ● Any other person is able to take a definition and change it to suit their needs and redistribute their definition ● If definitions are Linked Data themselves, as the Bio2RDF configuration item are, the HTTP URI can be used by others to pull the definition into their own software Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 9

10. Provenance for queries and data ● Provenance can be attached to each item, including details such as OpenID URI's and dates ● The sources and queries that were used for a particular URI can be found by utilising the query plan option – http://bio2rdf.org/queryplan/label/go:0000345 – Enables automatic query collaboration, as you could take this query plan and add or modify queries inside the query plan Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 10

11. Conclusion ● Many large distributed datasources ● Single interface, RDF ● Distribute queries efficiently across the endpoints ● Enabling people to create the definitions of what they did so other people can collaborate, via a single server or copy and paste Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 11

Customisable cross-database Bio2RDF queries

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (8)

Similar to Customisable cross-database Bio2RDF queries

Similar to Customisable cross-database Bio2RDF queries (20)

Recently uploaded

Recently uploaded (20)

Customisable cross-database Bio2RDF queries