Publishing and Consuming FAIR Data: A Case in the Agri-Food Domain

Rothamsted Research, UK
Bioinformatics Specialist
#ODS 2021, April 17th, 2021
Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
Find this presentation on SlideShare
background source: https://www.eurekalert.org/multimedia/pub/248200.php
Hello!
• Geek since the 1980s and the C=64 days
• Started working with Life Science Data in 2003
• Started with Semantic Web and LOD
• Univ. of Milano-Bicocca, EMBL-EBI
• and now Rothamsted Research
• Meanwhile, (h)activism in open source, open data
• Especially in Italy (SOD)
• Still with Semantic Web and LOD, but ...
A Major Problem with (Open) Data
How many oil paintings from the 1600s are available in Italy? What are their locations?
Source: Wikipedia:Cattedrale_di_Caltanissetta
• 2 regions using common CSV
• 1 using its own CSV
• 1 using completely custom RDF (!)
• None using Cultural-ON or another standard
Source: Brandizi, Agenda Digitale (2018), tinyurl.com/y72wjhm8, github.com/marco-brandizi/cultural_on_ex
A Common Curse Problem in Many Domains
Source: Kamdar, Musen, 2021, https://www.nature.com/articles/s41597-021-00797-y
Source: Brandizi, IB2019, https://tinyurl.com/y6p78968
What we Do for (Plant) Biology and Agriculture
Based on publications, which genes are related to the yellow rust disease?
In which biological processes are their encoded proteins involved?
Towards FAIRer Data
Based on publications, which genes are related to the yellow rust disease? In which
biological processes are their encoded proteins involved?
[Diagram: AgriSchemas, ontology (BioKNO), ETL Tools, knetminer.org]
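As a rough illustration, the question on this slide can be read as a multi-hop traversal over a knowledge graph: publication, then gene, then protein, then biological process. The sketch below uses plain Python tuples; every identifier (`PUB_1`, `GENE_A`, `PROT_A`) and relation name is a made-up placeholder, not an actual KnetMiner, AgriSchemas or BioKNO term.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples.
# Every identifier and relation name here is a hypothetical placeholder.
TRIPLES = [
    ("PUB_1", "mentions", "GENE_A"),
    ("PUB_1", "about", "yellow rust"),
    ("PUB_2", "mentions", "GENE_B"),
    ("PUB_2", "about", "drought"),
    ("GENE_A", "encodes", "PROT_A"),
    ("GENE_B", "encodes", "PROT_B"),
    ("PROT_A", "participates_in", "defence response"),
    ("PROT_B", "participates_in", "water transport"),
]


def index_by_relation(triples):
    """Build a lookup: relation -> subject -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        index[rel][subj].add(obj)
    return index


def processes_for_topic(triples, topic):
    """Map each gene mentioned in publications about `topic` to the
    biological processes its encoded proteins take part in."""
    idx = index_by_relation(triples)
    result = {}
    for pub, topics in idx["about"].items():
        if topic not in topics:
            continue
        for gene in idx["mentions"].get(pub, ()):
            procs = set()
            for protein in idx["encodes"].get(gene, ()):
                procs |= idx["participates_in"].get(protein, set())
            result.setdefault(gene, set()).update(procs)
    return result


answer = processes_for_topic(TRIPLES, "yellow rust")
print(answer)
```

Against a real graph database, the same traversal would more likely be written as a SPARQL or Cypher query than as Python loops.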
Want a demo?
• Count Data Sources
• Integration of KnetMiner publications and EBI/GXA gene expression experiments
• Using data with Jupyter (and Neo4j, see more here)
Why schema.org?
Simple & Complementary
Why schema.org?
Web-Oriented, Standard and FAIR
Source and recommended read: https://tinyurl.com/yxocd3b9
(3) Findable
Register the dataset DOI on datasetsearch.research.google.com
Recognised via schema.org
(2) Accessible
Resolvable URIs make data accessible
(1) Interoperable
Recognised via schema.org, links to bio-ontologies, standard IDs
Query/representation standards (SPARQL, Cypher, GraphQL, JSON-LD)
(4) Reusable
Clear licence
Ideally, machine-readable licence (e.g., ccREL)
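A minimal sketch of how the FAIR points above could translate into schema.org markup: a `Dataset` description serialised as JSON-LD, with a resolvable URL, a DOI-based identifier and a licence URL. The helper name `dataset_jsonld` and all the values passed to it are hypothetical placeholders; only the schema.org property names (`name`, `description`, `url`, `identifier`, `license`) are real.

```python
import json


def dataset_jsonld(name, description, url, doi, licence_url):
    """Return a schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,  # resolvable URI (Accessible)
        "identifier": f"https://doi.org/{doi}",  # persistent ID (Findable)
        "license": licence_url,  # licence as a URL (Reusable)
    }


# All values below are hypothetical placeholders, not a real dataset.
record = dataset_jsonld(
    name="Example wheat knowledge graph",
    description="Illustrative dataset entry, not a real resource.",
    url="https://example.org/datasets/wheat-kg",
    doi="10.1234/example",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)

markup = json.dumps(record, indent=2)
print(markup)
```

The resulting JSON is typically embedded in the dataset's landing page inside a `<script type="application/ld+json">` element, which is where crawlers such as Google Dataset Search look for it.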
However, we’re schema-agnostic
• Pipelines based on incremental workflows (Snakemake)
• Dependency management (Anaconda)
• RDF-to-RDF conversion via SPARQL
• Ontology API and Ontology annotator (via APIs)
• Want more details? Check it out on GitHub
[Diagram label: ETL Tools]
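To make the SPARQL-based conversion bullet more concrete: a SPARQL CONSTRUCT query matches triples expressed in one vocabulary and emits new triples in another. The sketch below mimics that idea over plain Python tuples rather than a real SPARQL engine, so it is an illustration of the principle and not the project's actual pipeline. The `src:` terms, the mapping tables and the sample data are hypothetical; only the `schema:` names correspond to real schema.org terms.

```python
# Hypothetical mapping from a source vocabulary to schema.org terms.
PREDICATE_MAP = {
    "src:title": "schema:name",
    "src:summary": "schema:description",
}
CLASS_MAP = {
    "src:Experiment": "schema:Dataset",
}


def construct(triples):
    """Emit target-vocabulary triples for every source triple matching a
    mapping rule. Unmatched triples are dropped, much as a CONSTRUCT query
    only returns what its template produces."""
    out = []
    for subj, pred, obj in triples:
        if pred == "rdf:type" and obj in CLASS_MAP:
            out.append((subj, "rdf:type", CLASS_MAP[obj]))
        elif pred in PREDICATE_MAP:
            out.append((subj, PREDICATE_MAP[pred], obj))
    return out


# Hypothetical source data; the last triple has no mapping rule.
source = [
    ("ex:exp1", "rdf:type", "src:Experiment"),
    ("ex:exp1", "src:title", "Sample experiment"),
    ("ex:exp1", "src:internalFlag", "x"),
]

converted = construct(source)
for triple in converted:
    print(triple)
```

Keeping the mapping in data tables rather than in code is one way to stay schema-agnostic: targeting another schema means swapping the tables, not the conversion logic.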
Hence, we could collaborate!
• Do you have a data integration project of your own?
• To perform analysis?
• To try machine learning / artificial intelligence?
• Are you in the agri-food domain?
• Or life sciences, ecology, biomedicine, healthcare?
• Want to build visualisations, data explorers, UI components, etc?
• For known schemas/ontologies, i.e., reusable!
• Are you a student? A teacher?
Acknowledgements
Ajit Singh, Software Engineer
• Samiul Haque, Ed Eyles, IT admins
• Joseph Hearnshaw, software engineer
• Louis Timberlake, visiting student
• Alice Minotto, Earlham Institute, hosting providers
• Robert Davey, Earlham Institute, DFW WP4 coordinator
• William Brown, Ricardo Gregorio, IT admins
• Monika Mistry, Master's student, data curator
• Sandeep Amberkar, bioinformatician, data curator
• Madhu Donepudi, Richard Holland, external contractors, developers
Keywan Hassani-Pak, KnetMiner Team Leader
Chris Rawlings, Head of Computational & Analytical Sciences
Jeremy Parsons, Bioinformatics Scientist
Simple & Complementary (the Profiles Approach)
Source: https://bioschemas.org/profiles/Study/0.2-DRAFT/
Why schema.org? Web-oriented
Source: https://bioschemas.org/liveDeploys/