Sharing data with lightweight data standards, such as schema.org and Bioschemas: the Knetminer case, an application for the agri-food domain and molecular biology.
Presented at Open Data Sicilia (#ODS2021)
1. Publishing and Consuming FAIR Data
A Case in the Agri-Food Domain
#ODS 2021, April 17th, 2021
Marco Brandizi <firstname.lastname@example.org>
Find this presentation on SlideShare
Background image source: https://www.eurekalert.org/multimedia/pub/248200.php
• Geek since the 1980s and the C=64 (Commodore 64) days
• Started working with life science data in 2003
• Started with Semantic Web and LOD
• Univ. of Milano-Bicocca, EMBL-EBI
• and now Rothamsted Research
• Meanwhile, (h)activism in open source, open data
• Especially in Italy (SOD)
• Still with Semantic Web and LOD, but ...
4. A Major Problem with (Open) Data
How many oil paintings from the 1600s are available in Italy? What are their locations?
• 2 regions using common CSV
• 1 using its own CSV
• 1 using completely custom
• None using Cultural-ON or
Source: Brandizi, Agenda Digitale (2018), tinyurl.com/y72wjhm8; github.com/marco-brandizi/cultural_on_ex
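To see why heterogeneous formats hurt, here is a minimal Python sketch, assuming two hypothetical regional CSV exports (the column names, delimiters and records below are invented for illustration, not taken from the real regional datasets). Each layout needs its own adapter before the single question above can be answered.

```python
import csv
import io

# Hypothetical regional exports: layouts and records are invented.
REGION_A = """\
title,technique,year,town
Portrait,oil,1620,Palermo
Landscape,tempera,1640,Catania
"""

REGION_B = """\
name;medium;date;city
Still life;oil on canvas;1650;Milano
View;oil on canvas;1720;Torino
"""


def from_region_a(text):
    """Adapter for the comma-separated layout of region A."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"technique": row["technique"], "year": int(row["year"]),
               "location": row["town"]}


def from_region_b(text):
    """Adapter for the semicolon-separated layout of region B."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        yield {"technique": row["medium"], "year": int(row["date"]),
               "location": row["city"]}


def oil_paintings_1600s(records):
    """Return the locations of oil paintings dated 1600-1699."""
    return [
        r["location"]
        for r in records
        if "oil" in r["technique"].lower() and 1600 <= r["year"] <= 1699
    ]


records = list(from_region_a(REGION_A)) + list(from_region_b(REGION_B))
locations = oil_paintings_1600s(records)
print(len(locations), locations)
```

With one shared format, the two adapter functions would collapse into a single reader; every additional custom format adds another adapter to write and maintain.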
5. A Common Curse (er, Problem) in Many Domains
Source: Kamdar, Musen, 2021, https://www.nature.com/articles/s41597-021-00797-y
Source: Brandizi, IB2019, https://tinyurl.com/y6p78968
6. What We Do for (Plant) Biology and Agriculture
Based on publications, which genes are related to the yellow rust disease?
In which biological processes are their encoded proteins involved?
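Questions like these boil down to following paths in a knowledge graph: publications that mention the disease also mention genes, genes encode proteins, and proteins take part in biological processes. A minimal Python sketch over a toy in-memory graph; node names and predicates are hypothetical placeholders, and this is not Knetminer's actual data model or code.

```python
from collections import defaultdict

# Toy knowledge graph: every node name is a hypothetical placeholder.
EDGES = [
    ("Pub1", "mentions", "YellowRust"),
    ("Pub1", "mentions", "GeneA"),
    ("Pub2", "mentions", "YellowRust"),
    ("Pub2", "mentions", "GeneB"),
    ("Pub3", "mentions", "GeneC"),
    ("GeneA", "encodes", "ProteinA"),
    ("GeneB", "encodes", "ProteinB"),
    ("GeneC", "encodes", "ProteinC"),
    ("ProteinA", "participates_in", "ProcessX"),
    ("ProteinB", "participates_in", "ProcessX"),
    ("ProteinB", "participates_in", "ProcessY"),
    ("ProteinC", "participates_in", "ProcessZ"),
]


def build_index(edges):
    """Index edges as {(subject, predicate): [objects]}."""
    index = defaultdict(list)
    for s, p, o in edges:
        index[(s, p)].append(o)
    return index


def genes_and_processes(edges, disease):
    """Map each gene co-mentioned with the disease to its proteins' processes."""
    index = build_index(edges)
    publications = {s for s, p, o in edges if p == "mentions" and o == disease}
    # Here a "gene" is simply a mentioned node with an outgoing "encodes" edge.
    genes = {
        o
        for pub in publications
        for o in index.get((pub, "mentions"), [])
        if index.get((o, "encodes"))
    }
    result = {}
    for gene in sorted(genes):
        processes = set()
        for protein in index.get((gene, "encodes"), []):
            processes.update(index.get((protein, "participates_in"), []))
        result[gene] = sorted(processes)
    return result


print(genes_and_processes(EDGES, "YellowRust"))
```

GeneC is left out because no publication links it to the disease. In a graph database the same path would typically be a single pattern query rather than hand-written loops.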
7. Towards FAIRer Data
8. Want a demo?
• Count Data Sources
• Integration of Knetminer publications and EBI/GXA gene
• Using data with Jupyter (and Neo4j, see more here)
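A "count data sources" question is the kind of thing a SPARQL endpoint answers with a single aggregate query. Below is a minimal, standard-library-only Python sketch. The endpoint URL and the use of schema:isPartOf to link records to their sources are assumptions, and the sample response is invented only to show the standard SPARQL 1.1 JSON results shape; it contains no real Knetminer figures.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: a placeholder, not Knetminer's actual service.
ENDPOINT = "https://sparql.example.org/query"

# Hypothetical query: assumes each record points to its data source through
# schema:isPartOf. The real predicate depends on the dataset's schema.
COUNT_QUERY = """\
PREFIX schema: <http://schema.org/>
SELECT ?source (COUNT(?item) AS ?n)
WHERE { ?item schema:isPartOf ?source }
GROUP BY ?source
ORDER BY DESC(?n)
"""


def build_request_url(endpoint, query):
    """Encode a query as a GET request, following the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and return the raw JSON text (needs network access)."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def parse_counts(results_json):
    """Turn SPARQL 1.1 JSON results into a {source: count} dictionary."""
    data = json.loads(results_json)
    return {
        b["source"]["value"]: int(b["n"]["value"])
        for b in data["results"]["bindings"]
    }


# Made-up response, only to show the shape of the SPARQL JSON results format.
SAMPLE = json.dumps({
    "head": {"vars": ["source", "n"]},
    "results": {"bindings": [
        {"source": {"type": "uri", "value": "http://example.org/src/A"},
         "n": {"type": "literal", "value": "42"}},
        {"source": {"type": "uri", "value": "http://example.org/src/B"},
         "n": {"type": "literal", "value": "7"}},
    ]},
})

print(parse_counts(SAMPLE))
```

Against a real endpoint, `parse_counts(run_query(ENDPOINT, COUNT_QUERY))` would do the whole round trip, once the query is adapted to the data's actual schema.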
10. Why schema.org?
Web-Oriented, Standard and FAIR
Source and recommended read: https://tinyurl.com/yxocd3b9
Dataset DOI registered; findable on datasetsearch.research.google.com
Recognised via schema.org
Resolvable URIs make data accessible
Recognised via schema.org, links to bio-ontologies, standard IDs
Query/representation standards (SPARQL, Cypher, GraphQL, JSON-LD)
Ideally, a machine-readable licence (e.g., ccREL)
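Several of the FAIR points above correspond to schema.org properties: `identifier` for the DOI, `url` for resolvable access, `license` for reuse. A minimal Python sketch that builds a schema.org `Dataset` description in JSON-LD, the kind of markup Google Dataset Search reads. The helper `dataset_jsonld` is hypothetical, and the name, DOI, URL and keywords are placeholders.

```python
import json


def dataset_jsonld(name, description, doi, url, license_url, keywords):
    """Build a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # A DOI, expressed as a resolvable URL, makes the dataset citable
        "identifier": f"https://doi.org/{doi}",
        # The licence as a URL, so that machines can follow it
        "license": license_url,
        "keywords": keywords,
    }


record = dataset_jsonld(
    name="Example knowledge graph dump",
    description="Hypothetical dataset used to illustrate schema.org markup.",
    doi="10.1234/example.5678",  # placeholder DOI, not a real one
    url="https://data.example.org/kg",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["plant biology", "agri-food", "knowledge graph"],
)

# Typically embedded in a web page inside <script type="application/ld+json">
print(json.dumps(record, indent=2))
```

Here the licence is only a URL; a richer machine-readable licence description would be an extra layer on top of this.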
12. However, we’re schema-agnostic
• Pipelines based on incremental workflows (Snakemake)
• Dependency management (Anaconda)
• RDF-to-RDF conversion via SPARQL
• Ontology API and Ontology annotator (via APIs)
• Want more details? Check it out on GitHub
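Being schema-agnostic largely means keeping vocabulary mappings outside the core data, so the same graph can be exported to schema.org, Bioschemas or other targets. The pipeline does this with SPARQL. Below is a simplified, standard-library Python stand-in (not SPARQL): each rule rewrites one triple at a time, whereas a real CONSTRUCT query can match whole graph patterns. The `ex:` source vocabulary is hypothetical.

```python
# Namespaces: EX is a hypothetical source vocabulary, not a real one.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/src/"
SCHEMA = "http://schema.org/"

# Per-term rules, roughly playing the role of a CONSTRUCT query such as:
#   PREFIX ex: <http://example.org/src/>
#   PREFIX schema: <http://schema.org/>
#   CONSTRUCT { ?g a schema:Gene ; schema:name ?n }
#   WHERE     { ?g a ex:Gene ; ex:prefName ?n }
CLASS_MAP = {EX + "Gene": SCHEMA + "Gene", EX + "Protein": SCHEMA + "Protein"}
PREDICATE_MAP = {EX + "prefName": SCHEMA + "name",
                 EX + "desc": SCHEMA + "description"}


def convert(triples):
    """Rewrite (subject, predicate, object) triples into schema.org terms."""
    converted = set()
    for s, p, o in triples:
        if p == RDF_TYPE and o in CLASS_MAP:
            converted.add((s, RDF_TYPE, CLASS_MAP[o]))
        elif p in PREDICATE_MAP:
            converted.add((s, PREDICATE_MAP[p], o))
        # Triples matched by no rule are dropped, as in a CONSTRUCT output.
    return converted


source = [
    (EX + "gene/1", RDF_TYPE, EX + "Gene"),
    (EX + "gene/1", EX + "prefName", "GeneA"),
    (EX + "gene/1", EX + "internalCode", "X-001"),  # no rule: not exported
]

for triple in sorted(convert(source)):
    print(triple)
```

Adding a new target vocabulary means adding another set of rules, while the source data stays untouched.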
13. Hence, we could collaborate!
• Do you have a data integration project?
• To perform analysis?
• To try machine learning / artificial intelligence?
• Are you in the agri-food domain?
• Or life sciences, ecology, biomedicine, healthcare?
• Want to build visualisations, data explorers, UI components, etc?
• For known schemas/ontologies, i.e., reusable!
• Are you a student? A teacher?
14. Ajit Singh
• Samiul Haque, Ed Eyles, IT admins
• Joseph Hearnshaw, software engineer
• Louis Timberlake, visiting student
• Alice Minotto, Earlham Institute, hosting providers
• Robert Davey, Earlham Institute, DFW WP4 coordinator
• William Brown, Ricardo Gregorio, IT admins
• Monika Mistry, master's student, data curator
• Sandeep Amberkar, bioinformatician, data curator
• Madhu Donepudi, Richard Holland, ext contractors, developers
KnetMiner Team Leader
Head of Computational & Analytical Sciences