
Publishing and Consuming FAIR Data: A Case in the Agri-Food Domain

Sharing data with lightweight data standards, such as schema.org and Bioschemas: the KnetMiner case, an application for the agri-food domain and molecular biology.

Presented at Open Data Sicilia (#ODS2021)



  1. Publishing and Consuming FAIR Data: A Case in the Agri-Food Domain
     #ODS2021, April 17th, 2021
     Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
     Find this presentation on SlideShare.
     Background image source: https://www.eurekalert.org/multimedia/pub/248200.php
  2. Hello!
     • Geek since the 1980s and the C=64 days
     • Started working with life science data in 2003
     • Started with the Semantic Web and LOD
     • Univ. of Milano-Bicocca, EMBL-EBI
     • And now Rothamsted Research
     • Meanwhile, (h)activism in open source and open data
       • Especially in Italy (SOD)
     • Still with the Semantic Web and LOD, but ...
  3. A Major Problem with (Open) Data
     How many oil paintings from the 1600s are available in Italy? Where are they located?
     Image source: Wikipedia, Cattedrale_di_Caltanissetta
  4. A Major Problem with (Open) Data
     How many oil paintings from the 1600s are available in Italy? Where are they located?
     • 2 regions use a common CSV format
     • 1 uses its own CSV format
     • 1 uses completely custom RDF (!)
     • None uses Cultural-ON or another standard
     Source: Brandizi, Agenda Digitale (2018), tinyurl.com/y72wjhm8; github.com/marco-brandizi/cultural_on_ex
  5. A Common Curse (Problem) in Many Domains
     Source: Kamdar, Musen, 2021, https://www.nature.com/articles/s41597-021-00797-y
     Source: Brandizi, IB2019, https://tinyurl.com/y6p78968
  6. What We Do for (Plant) Biology and Agriculture
     Based on publications, which genes are related to the yellow rust disease? In which biological processes are their encoded proteins involved?
  7. Towards FAIRer Data
     Based on publications, which genes are related to the yellow rust disease? In which biological processes are their encoded proteins involved?
     Diagram labels: AgriSchemas; ontology (BioKNO); ETL tools; knetminer.org
  8. Want a Demo?
     • Counting data sources
     • Integrating KnetMiner publications with EBI/GXA gene expression experiments
     • Using the data with Jupyter (and Neo4j, see more here)
  9. Why schema.org? Simple & Complementary
  10. Why schema.org? Web-Oriented, Standard and FAIR
      • (1) Interoperable: recognised via schema.org; links to bio-ontologies; standard IDs; query/representation standards (SPARQL, Cypher, GraphQL, JSON-LD)
      • (2) Accessible: resolvable URIs make data accessible
      • (3) Findable: register a DOI for the dataset; it can then be found on datasetsearch.research.google.com, which recognises it via schema.org
      • (4) Reusable: a clear licence, ideally a machine-readable one (e.g., ccREL)
      Source and recommended read: https://tinyurl.com/yxocd3b9
  11. However, We’re Schema-Agnostic (ETL Tools)
  12. However, We’re Schema-Agnostic (ETL Tools)
      • Pipelines based on incremental workflows (Snakemake)
      • Dependency management (Anaconda)
      • RDF-to-RDF conversion via SPARQL
      • Ontology API and ontology annotator (via APIs)
      • Want more details? Check it out on GitHub
  13. Hence, We Could Collaborate!
      • Do you have a data integration project?
        • To perform analysis?
        • To try machine learning / artificial intelligence?
      • Are you in the agri-food domain?
        • Or life sciences, ecology, biomedicine, healthcare?
      • Want to build visualisations, data explorers, UI components, etc.?
        • For known schemas/ontologies, i.e., reusable!
      • Are you a student? A teacher?
  14. Acknowledgements
      • Keywan Hassani-Pak, KnetMiner Team Leader
      • Chris Rawlings, Head of Computational & Analytical Sciences
      • Jeremy Parsons, Bioinformatics Scientist
      • Ajit Singh, software engineer
      • Samiul Haque, Ed Eyles, IT admins
      • Joseph Hearnshaw, software engineer
      • Louis Timberlake, visiting student
      • Alice Minotto, Earlham Institute, hosting providers
      • Robert Davey, Earlham Institute, DFW WP4 coordinator
      • William Brown, Ricardo Gregorio, IT admins
      • Monika Mistry, Master's student, data curator
      • Sandeep Amberkar, bioinformatician, data curator
      • Madhu Donepudi, Richard Holland, external contractors, developers
  15. Simple & Complementary (the Profiles Approach)
      Source: https://bioschemas.org/profiles/Study/0.2-DRAFT/
  16. Why schema.org? Web-Oriented
      Source: https://bioschemas.org/liveDeploys/
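Slide 8 lists counting data sources among the demos. The sketch below shows what such a count could look like, assuming the data sit behind a SPARQL endpoint that answers in the standard SPARQL 1.1 JSON results format. The use of `schema:isPartOf` to link items to their source is a hypothetical modelling choice, and the sample response is hand-written for illustration; neither is taken from the actual KnetMiner demo.

```python
import json

# Hypothetical query: counts items per data source, ASSUMING items are linked
# to their source via schema:isPartOf (illustrative, not the real modelling)
COUNT_SOURCES_QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?source (COUNT(DISTINCT ?item) AS ?n)
WHERE { ?item schema:isPartOf ?source }
GROUP BY ?source
ORDER BY DESC(?n)
"""


def counts_from_sparql_json(doc: str) -> dict[str, int]:
    """Turn a SPARQL 1.1 JSON result set into a {source: count} dict."""
    data = json.loads(doc)
    counts: dict[str, int] = {}
    for row in data["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}
        counts[row["source"]["value"]] = int(row["n"]["value"])
    return counts


# Hand-written response in the standard format, for illustration only
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
sample = json.dumps({
    "head": {"vars": ["source", "n"]},
    "results": {"bindings": [
        {"source": {"type": "uri", "value": "http://example.org/source/A"},
         "n": {"type": "literal", "value": "120", "datatype": XSD_INT}},
        {"source": {"type": "uri", "value": "http://example.org/source/B"},
         "n": {"type": "literal", "value": "45", "datatype": XSD_INT}},
    ]},
})

print(counts_from_sparql_json(sample))
```

Keeping the query and the result parsing separate means the same parser works whatever endpoint or HTTP client is used to send the query.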
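Slide 10 ties schema.org to the FAIR principles. As a minimal sketch of that idea, the function below builds a schema.org `Dataset` description in JSON-LD carrying a DOI-based identifier (findable), a landing page URL (accessible) and a licence URL (reusable). The function name and every value passed to it are hypothetical placeholders; note that the licence here is a plain link, not a full ccREL description.

```python
import json


def dataset_jsonld(name: str, description: str, doi: str,
                   licence_url: str, landing_page: str) -> dict:
    """Build a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Findable: a persistent identifier, expressed as a resolvable DOI URL
        "identifier": f"https://doi.org/{doi}",
        # Accessible: a landing page reachable over plain HTTP(S)
        "url": landing_page,
        # Reusable: the licence as a URL (a plain link, not ccREL triples)
        "license": licence_url,
    }


# All values below are hypothetical placeholders
doc = dataset_jsonld(
    name="Example gene-disease associations",
    description="Illustrative dataset description for a schema.org sketch.",
    doi="10.5555/example-dataset",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    landing_page="https://example.org/datasets/gene-disease",
)
print(json.dumps(doc, indent=2))
```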
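Slide 12 mentions RDF-to-RDF conversion via SPARQL. The usual tool for this is a `CONSTRUCT` query, which matches a pattern in the source graph and instantiates a template in the target vocabulary once per match. The sketch shows such a query and an in-memory Python equivalent over a set of triples; the `src:` vocabulary is hypothetical and the target terms are illustrative, not the mappings used by the actual ETL tools.

```python
# Namespaces: the source vocabulary is hypothetical; RDF and schema.org are real
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SRC = "http://example.org/source/"
SCHEMA = "http://schema.org/"

# The kind of CONSTRUCT query such a conversion step might run
MAPPING_QUERY = """
PREFIX src:    <http://example.org/source/>
PREFIX schema: <http://schema.org/>
CONSTRUCT {
  ?gene a schema:Gene ;
        schema:name ?label .
}
WHERE {
  ?gene a src:GeneRecord ;
        src:symbol ?label .
}
"""

Triple = tuple[str, str, str]


def construct_genes(triples: set[Triple]) -> set[Triple]:
    """In-memory equivalent of MAPPING_QUERY over a set of (s, p, o) triples."""
    # WHERE, first pattern: subjects typed as src:GeneRecord
    genes = {s for s, p, o in triples if p == RDF_TYPE and o == SRC + "GeneRecord"}
    result: set[Triple] = set()
    for s, p, o in triples:
        # WHERE, second pattern: the src:symbol values of those subjects
        if s in genes and p == SRC + "symbol":
            # CONSTRUCT template, instantiated once per solution
            result.add((s, RDF_TYPE, SCHEMA + "Gene"))
            result.add((s, SCHEMA + "name", o))
    return result


data = {
    (SRC + "g1", RDF_TYPE, SRC + "GeneRecord"),
    (SRC + "g1", SRC + "symbol", "GENE_A"),
    (SRC + "g2", RDF_TYPE, SRC + "GeneRecord"),  # no symbol: no match
    (SRC + "p1", SRC + "symbol", "NOT_A_GENE"),  # not a gene record: no match
}
for triple in sorted(construct_genes(data)):
    print(triple)
```

Because both patterns must match, a gene record without a symbol produces no output, exactly as the `WHERE` clause would behave.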
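Slides 9 and 15 describe the profiles approach: a Bioschemas profile sits on top of a schema.org type and states which properties are minimum, recommended or optional. The checker below sketches that idea. `EXAMPLE_PROFILE` is a hypothetical profile written for illustration; it is NOT the property list of the Bioschemas Study 0.2-DRAFT profile cited on slide 15, and the record is made up.

```python
# Hypothetical profile: NOT the actual Bioschemas Study 0.2-DRAFT property list
EXAMPLE_PROFILE = {
    "type": "Study",
    "minimum": ["identifier", "name", "description"],
    "recommended": ["author", "url"],
}


def check_profile(record: dict, profile: dict) -> dict[str, list[str]]:
    """Report which profile properties a JSON-LD record is missing, per level."""
    if record.get("@type") != profile["type"]:
        raise ValueError(f"expected @type {profile['type']!r}")
    return {
        level: [prop for prop in profile[level] if prop not in record]
        for level in ("minimum", "recommended")
    }


# A made-up record, missing one minimum and one recommended property
record = {
    "@type": "Study",
    "identifier": "https://example.org/study/1",
    "name": "Example study",
    "url": "https://example.org/study/1",
}
print(check_profile(record, EXAMPLE_PROFILE))
```

Separating the levels mirrors why profiles are complementary to schema.org: a record can be valid schema.org yet still fall short of a community's minimum.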
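Slide 16 stresses that schema.org is web-oriented: descriptions are commonly embedded in ordinary web pages inside `<script type="application/ld+json">` elements, which is how crawlers and data consumers can pick them up. A minimal consumer-side sketch using only Python's standard `html.parser`; the page content is made up, and a real harvester would also need fetching, error handling and JSON-LD expansion.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the JSON-LD blocks embedded in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        # Only <script type="application/ld+json"> elements carry JSON-LD
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body><p>Hello</p></body></html>"""
print(extract_jsonld(page))
```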
