Marco Brandizi and Keywan Hassani-Pak, Rothamsted Research, Invited Presentation at SWAT4HCLS 2022.
The FAIR data principles have become a driving force in the life sciences and other scientific domains, helping researchers to share their data and realise its full potential for integrating information and making novel discoveries. Knowledge graphs are an increasingly popular paradigm for modelling data according to these principles, and technologies such as graph databases are emerging as complements to approaches like linked data. All of this applies to the agronomy, farming and food domains too. How advanced is the adoption of sound data management policies in these domains? How does that compare to other life sciences? In this presentation, we will talk about our practical experience, focusing on KnetMiner, a gene and molecular biology discovery platform, which is based on building and publishing knowledge graphs according to the FAIR principles, using a mix of linked data standards for the life sciences and recent graph database and API technologies. We welcome questions and discussion from the audience about similar experiences.
FAIR Agronomy, where are we? The KnetMiner Use Case
1. FAIR Agronomy, where are we?
The KnetMiner Use Case
SWAT4LS 2022
Marco Brandizi marco.brandizi@rothamsted.ac.uk
Keywan Hassani-Pak keywan.hassani-pak@rothamsted.ac.uk
Find this presentation on SlideShare
background source: https://www.flickr.com/photos/60091022@N08/22004495993/
2. Hello!
https://knetminer.com
• Rothamsted Research is a non-profit research centre, focused on agricultural
science, including farming, plant biology, statistics and bioinformatics, livestock
management, entomology
• 300 people, founded in 1843, international collaborations, national
capabilities
• Our team develops the KnetMiner gene discovery platform, offering
knowledge graph based services for plants, pests and their interactions
• We are part of the Computational and Analytical Sciences department, a
community of experts in a wide range of analytical technologies and data
analysis tools
3. What is the future for SW4LS? What can Agronomy learn from Biomedical SW?
Network knowledge is thriving
There is more than Semantic Web and Linked Data now
Lightweight modelling is replac… adding to OWL
Agrifood is leveraging biomedics experience
background source: https://wallpapersafari.com/w/yAIwLx
6. Network Knowledge is Thriving
Good Reads:
• Recent trends in knowledge graphs: theory and practice,
https://link.springer.com/article/10.1007/s00500-021-05756-8
• Wikidata: A large-scale collaborative ontological medical database,
https://www.sciencedirect.com/science/article/pii/S1532046419302114
• KG-COVID-19: A Framework to Produce Customized Knowledge Graphs
for COVID-19 Response, https://doi.org/10.1016/j.patter.2020.100155
• Constructing and Mining Web-Scale Knowledge Graphs,
https://fdocuments.net/reader/full/kdd14-t2-bordes-gabrilovich-3
• Graph Embeddings: The Secret Ingredient for Relationship-Driven AI,
https://www.youtube.com/watch?v=-CscGHDXrZY
20. AgriSchemas Progress
| Use case | Data Types | Data Sources | Status |
|---|---|---|---|
| Molecular Biology | Gene, Protein, Pathway; encodes, participates | Via Knetminer: ENSEMBL, UniProt, TILLING, wheat-expression.com, KEGG | Done. |
| Ontology Annotations | Ontology Term (schema:DefinedTerm); dc:type, schema:additionalType | Via Knetminer: GO, PO, CROP-Onto | Done. |
| Experiments | Study, agri:StudyFactor, PropertyValue | EBI/GXA, GLTen, MIAPPE/BrAPI sources, ? | GXA Done. MIAPPE, much work done during ELIXIR BioHackathon, going on with monthly calls. GLTen use case drafted |
| Literature | agri:ScholarlyPublication; mentions | Via Knetminer: PubMed | Done |
| Gene Expression | bioschema:expressedIn, reified statements, agri:evidence, agri:pvalue, agri:baseCondition | EBI/GXA, Via Knetminer: wheat-expression.com | GXA |
| Host-pathogen interaction | Gene, Phenotype, agri:ScholarlyPublication, agri:HostPathogenInteraction, agri:evidence | PHI-Base | Use case drafted |
| Weather | ? | ? | TO DO |
| Dataset metadata | Dataset, DataCatalog; license, distribution | knetminer.org/data | ongoing |
24. Agrifood is leveraging biomedics experience
| | Biomedics | Agrifood |
|---|---|---|
| Resources | NCBI, EBI, BioPortal, Bio2RDF. | Many used in plant biology too (eg, ENSEMBL, PubMed). Smaller similar projects, eg, AgroLD, AgroPortal, CGIAR/GARDIAN, WheatIS; more data integration needed. |
| Ontologies | Integration efforts by OBO. MolBio ontologies (eg, GO, CL), medical ontologies (eg, UMLS, PATO), experimental reporting ontologies (eg, OBI, SIO, EFO), species classification (NCBITax). | Significant overlap (eg, GO, NCBITax), good coverage of plant biology (eg, Crop Ontology, Plant Trait Onto). Food-related ontologies (eg, FoodOn) and taxonomies (eg, AGROVOC). Coverage of different aspects (eg, animals, forestry). Lack of coverage? Eg, weather. |
| Schemas and formats | Established formats (eg, FASTA, ISA-Tab, CDISC), established standards (HL7, FHIR), ongoing standardisation efforts (BioSchemas). | Significant overlap (eg, FASTA, ISA-Tab). More heterogeneous landscape, eg, MIAPPE and COPO not widely adopted, simpler solutions preferred (eg, Frictionless in DFW). |
| Data Access | Heterogeneous APIs, much use of SPARQL, but still limited. | A few established APIs (eg, BrAPI, WheatIS), more desirable. |
| Data and AI | Much traditional machine learning, a number of graph embedding projects (eg, https://tinyurl.com/yxbau649, https://tinyurl.com/yyq9rdh9). | ML for precision farming (eg, https://tinyurl.com/y5zopcd5). Some graph embedding projects (eg, https://tinyurl.com/yxt9lgk7, https://tinyurl.com/y3mnr2n2). |
25. References
• AgriSchemas
• https://github.com/Rothamsted/agri-schemas
• Use cases: https://github.com/Rothamsted/agri-schemas/tree/master/drafts/201904-dfw-hackathon
• Real data & ETL tools: https://github.com/Rothamsted/agri-schemas/tree/master/dfw-dataset
• Knetminer
• Web site: http://knetminer.org
• Publication: https://doi.org/10.1111/pbi.13583
• Case study about FAIR data:
• https://knetminer.com/cases/the-power-of-standardised-and-fair-knowledge-graphs.html
• FAIR data infrastructure: https://doi.org/10.1515/jib-2018-0023
• Data endpoint: http://knetminer.org/data
• DFW
• AgriSchemas and DFW:
• https://designingfuturewheat.org.uk/dfw-and-fair-agriculture-data-the-knetminer-experience/
• Me
• https://www.slideshare.net/mbrandizi, https://marcobrandizi.info/about-me/
26. Acknowledgements
Ajit Singh
Software Engineer
• Samiul Haque, Ed Eyles, IT admins
• Joseph Hearnshaw, software engineer
• Louis Timberlake, visiting student
• Alice Minotto, Earlham Institute, hosting providers
• Robert Davey, Earlham Institute, DFW WP4 coordinator
• William Brown, Ricardo Gregorio, IT admins
• Monika Mistry, master's student, data curator
• Sandeep Amberkar, bioinformatician, data curator
• Richard Holland, ext contractors, developers
Keywan Hassani-Pak
KnetMiner Team Leader
Chris Rawlings
Head of Computational & Analytical Sciences
Jeremy Parsons
Bioinformatics Scientist
Editor's Notes
Our application is an example of a gene exploration app over KGs. Knowledge is matched to the search input and ranked based on the search input plus biological significance (semantic motifs, covered later).
Initially, and to a large extent still, it was built on an ad-hoc workflow system (KnetBuilder) and an ad-hoc KG format (OXL). So, KGs come in many forms.
Many different knowledge graph databases exist, and also several ETL + data exchange formats. SPARQL alternatives exist for graph-like data access.
We support both SPARQL and Cypher; they have different sets of pros and cons. RDF is at the base of our modelling and ETL. We have developed the rdf2pg tool to combine the two.
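As a rough illustration of what bridging RDF and a property graph involves, the sketch below turns a handful of RDF-like triples into nodes and relationships. It is not rdf2pg's algorithm or API: the triples, the `ex:` names and the rule "rdf:type becomes a label, a literal becomes a node property, anything else becomes a relationship" are all assumptions made for this example.

```python
# Hypothetical triples (subject, predicate, object).
# In this toy encoding, literal values are wrapped in double quotes.
TRIPLES = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:name", '"example gene"'),
    ("ex:prot1", "rdf:type", "ex:Protein"),
    ("ex:gene1", "ex:encodes", "ex:prot1"),
]


def triples_to_property_graph(triples):
    """Map triples to (nodes, relationships) using the assumed rules above."""
    nodes = {}
    relationships = []

    def node(uri):
        # Create the node record on first use.
        return nodes.setdefault(uri, {"labels": set(), "props": {}})

    for subj, pred, obj in triples:
        if pred == "rdf:type":
            node(subj)["labels"].add(obj)
        elif obj.startswith('"'):
            node(subj)["props"][pred] = obj.strip('"')
        else:
            node(subj)
            node(obj)
            relationships.append((subj, pred, obj))
    return nodes, relationships


nodes, relationships = triples_to_property_graph(TRIPLES)
```

The resulting nodes and relationships are the kind of structure a Cypher query would traverse, while the original triples remain the form a SPARQL endpoint would serve.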
An example of Cypher's benefits: entities found by keyword are matched to genes via well-known paths (semantic motifs). Initially, semantic motifs were based on a limited transition-machine syntax; now we have a Cypher traverser, which offers a more expressive syntax.
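The idea of a semantic motif can be sketched in a few lines, assuming a motif is simply an ordered list of relationship types followed from a gene. This is a toy model, not KnetMiner's traverser: the graph, the node names and the motif below are hypothetical.

```python
# Toy graph: (source node, relationship type) -> target nodes. All names are hypothetical.
EDGES = {
    ("gene:G1", "encodes"): ["protein:P1"],
    ("protein:P1", "participates_in"): ["pathway:PW1"],
    ("gene:G2", "encodes"): ["protein:P2"],
}

# A motif here is an ordered list of relationship types, starting from a gene.
MOTIF = ["encodes", "participates_in"]


def follow_motif(start, motif, edges):
    """Return the nodes reached from `start` by following the motif step by step."""
    frontier = {start}
    for relation in motif:
        frontier = {
            target
            for node in frontier
            for target in edges.get((node, relation), [])
        }
    return frontier


def genes_linked_to(hit, genes, motif, edges):
    """Return the genes from which the keyword hit is reachable via the motif."""
    return [gene for gene in genes if hit in follow_motif(gene, motif, edges)]


linked = genes_linked_to("pathway:PW1", ["gene:G1", "gene:G2"], MOTIF, EDGES)
```

In a graph database the same path would be written as a Cypher pattern rather than a Python loop, which is where the more expressive syntax pays off.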
Everyone knows schema.org. Here are two examples of how fairly commercial apps are leveraging it. schema.org and bioschemas are “lightweight”, easier to use and practical for application development. Moreover, they are complementary to “true” ontologies like GO.
What we do with schema.org and bioschemas: our ontology is mapped to the standards, more datasets are integrated, and a new project, AgriSchemas, extends bioschemas to the agrifood domain.
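To give a feel for how lightweight this kind of modelling is, here is a minimal sketch that builds a schema.org `Dataset` description as JSON-LD, using the `license`, `distribution` and `DataCatalog` terms listed under "Dataset metadata". The dataset name, URLs and licence below are placeholders chosen for the example, not KnetMiner's actual metadata.

```python
import json


def dataset_jsonld(name, license_url, download_url, catalog_name):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "license": license_url,
        "distribution": {"@type": "DataDownload", "contentUrl": download_url},
        "includedInDataCatalog": {"@type": "DataCatalog", "name": catalog_name},
    }


# Placeholder values, for illustration only.
doc = dataset_jsonld(
    name="Example wheat knowledge graph",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/data/wheat-kg.ttl",
    catalog_name="Example data catalogue",
)
print(json.dumps(doc, indent=2))
```

A plain dictionary serialised with the standard library is enough here, which is much of the appeal compared with authoring a full OWL ontology.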
A prototype with real datasets is online. Here is a query that combines KnetMiner-specific data (with several mappings and URI reuse) with EBI/GXA data mapped to AgriSchemas.
And here are the results. As you can see, KnetMiner info (eg, gene/publication associations computed via text mining) can be linked to GXA info (ie, gene expression), including ontology annotations. If you want to collaborate, we have ideas on how to exploit this combination for novel insights in wheat.
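The linking described above works because both sources refer to genes by the same URIs. A minimal sketch of that join follows, assuming two in-memory lookups keyed by gene URI. The records are invented for illustration; the real prototype runs a query against the data endpoint rather than Python code.

```python
# Hypothetical records, both keyed by a shared gene URI.
GENE_PUBLICATIONS = {
    "ex:gene1": ["ex:pub1", "ex:pub2"],
    "ex:gene2": ["ex:pub3"],
}
GENE_EXPRESSION = {
    "ex:gene1": [{"condition": "ex:drought", "pvalue": 0.01}],
}


def join_on_gene(publications, expression):
    """Combine publication and expression records for genes present in both sources."""
    rows = []
    for gene in sorted(publications.keys() & expression.keys()):
        for pub in publications[gene]:
            for expr in expression[gene]:
                rows.append((gene, pub, expr["condition"], expr["pvalue"]))
    return rows


rows = join_on_gene(GENE_PUBLICATIONS, GENE_EXPRESSION)
```

Only genes present in both sources appear in the result, which is why URI reuse across datasets matters for this kind of integration.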