The document discusses knowledge representation in the agricultural domain through ontologies, taxonomies, and reference data. It gives an overview of how these can be used to structure information and defines key terms. As an example, it outlines how an ontology for coffee farming could be developed to track variety performance, establish common definitions, and integrate data from different sources. Building ontologies is positioned as a core data governance practice for structuring an organization's knowledge and integrating disparate systems and data.
2. Agenda
• Target Audience and How to Use
• Problem Statement
• Knowledge Representation in the Agri Landscape
• The Journey into Ontologies
• Ontology Use Case
• How do we build and use Ontologies?
• Next Steps
3. Problem Statement
The Open Agri space is crowded with emerging, active, and dying initiatives. We need to:
• Understand what exists, what has worked, and what hasn’t
• Support our own tactical and strategic needs
• Establish where we can and should fit into this ecosystem
4. Target Audience and How to Use
Target Audience
• Chief Data Office
• Agronomists
• Data Scientists
• Data Engineers
• Data Stewards
• Brand Managers
• Sustainability and Supply Chain Personnel
How to Use
Client Context: the asset should be used with clients in the context of:
• Building marketplaces
• Building data platforms
• Enabling agronomy AI use cases
• Grower management
IBM Community:
• Presentation: Lighthouse, Seismic, Solution Gateway
• GitHub Repository: IBM GitHub
• Recorded Webinar: CBDS Data Platform, Data Architect Community
5. Knowledge Representation
• Data (Data Management): Data is discrete; facts have no meaning in isolation.
• Information (Information Management): Information is data with relevance and purpose. It informs and causes the user to change state.
• Knowledge (Knowledge Management): Knowledge is actionable information placed in context, based on facts and meaning.
• Wisdom: Knowledge enables the understanding necessary for effective decision-making.
6. Knowledge Modelling Landscape
• Ontology:
  • Semantic data model
  • Classifies things and defines more specific relations and attributes
  • Local and global ontologies can be combined
  • Culturally neutral
• Taxonomy:
  • Describes the organization’s vocabulary, common terms, and synonyms
  • Organizes terms into a hierarchy
• Reference Data:
  • Set of well-known values for attributes, translated into local meaning
• Glossary:
  • Defines terms in a simple way
Semantic richness, from HIGH to LOW:
• Ontology: Subject - Predicate - Object
• Logical Data Model: Entity - Relationship
• Taxonomy: Hierarchy (tree structure)
• Glossary / Dictionary: Term, Definitions, References
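To make the semantic-richness scale concrete, here is a minimal Python sketch using plain data structures (no RDF library is assumed). The terms come from the coffee examples in this deck, but the predicate names `isA`, `alsoKnownAs`, `hasQualityTier`, and `typicalEndUse` are hypothetical. A glossary can only return a definition, whereas ontology triples can be queried in both directions and traversed as a hierarchy.

```python
from typing import List, Set, Tuple

# Glossary: term -> plain definition. Lowest semantic richness: no structure.
GLOSSARY = {
    "Robusta": "Coffee species also known as Canephora; hardier and higher yielding.",
    "Arabica": "Coffee species associated with higher cup quality.",
}

# Ontology: subject-predicate-object triples (hypothetical predicate names).
# Hierarchy ("isA") and other typed relationships share one structure.
ONTOLOGY: Set[Tuple[str, str, str]] = {
    ("Arabica", "isA", "Coffee"),
    ("Robusta", "isA", "Coffee"),
    ("Coffee", "isA", "Perennial crop"),
    ("Robusta", "alsoKnownAs", "Canephora"),
    ("Arabica", "hasQualityTier", "Higher"),
    ("Robusta", "hasQualityTier", "Lower"),
    ("Robusta", "typicalEndUse", "Instant coffee"),
}

def objects(subject: str, predicate: str) -> List[str]:
    """All objects linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in ONTOLOGY if s == subject and p == predicate)

def subjects(predicate: str, obj: str) -> List[str]:
    """Reverse lookup: all subjects linked to `obj` by `predicate`."""
    return sorted(s for s, p, o in ONTOLOGY if p == predicate and o == obj)

def ancestors(term: str) -> List[str]:
    """Follow isA links upward (assumes an acyclic, single-parent hierarchy)."""
    chain: List[str] = []
    parents = objects(term, "isA")
    while parents:
        chain.append(parents[0])
        parents = objects(parents[0], "isA")
    return chain

print(GLOSSARY["Robusta"])                      # a glossary only answers "what is X?"
print(ancestors("Robusta"))                     # hierarchy traversal
print(subjects("hasQualityTier", "Higher"))     # typed, reversible relationship query
```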
8. Reference Data
“Reference data are data that define the set of permissible values to be used
by other data fields. Reference data gain in value when they are widely re-used
and widely referenced. Typically, they do not change overly much in terms of
definition, apart from occasional revisions.” - Wikipedia
• What are the common names of crops, and how do they correspond across languages and regions?
• What sources of reference data can we find for agriculture?
• What are the terms and conditions around the use of reference data?
• Can we find reference data for crops, growing practices, inputs, equipment, soils, and weather?
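Reference data as described above comes down to a shared code list with per-language labels, into which local values are translated and against which they are validated. A minimal Python sketch, assuming hypothetical codes such as `CROP_MAIZE` and a hand-written table (not any published agricultural standard):

```python
from typing import Dict, List, Tuple

# Hypothetical crop reference data: canonical code -> labels per language.
# The first label in each list is the preferred display name.
CROP_REFERENCE: Dict[str, Dict[str, List[str]]] = {
    "CROP_COFFEE": {"en": ["coffee"], "es": ["café"], "pt": ["café"], "fr": ["café"]},
    "CROP_MAIZE": {"en": ["maize", "corn"], "es": ["maíz"], "pt": ["milho"], "fr": ["maïs"]},
    "CROP_POTATO": {"en": ["potato"], "es": ["papa", "patata"], "pt": ["batata"],
                    "fr": ["pomme de terre"]},
}

def build_index(ref: Dict[str, Dict[str, List[str]]]) -> Dict[Tuple[str, str], str]:
    """Map (language, case-folded label) -> canonical code."""
    index: Dict[Tuple[str, str], str] = {}
    for code, labels in ref.items():
        for lang, names in labels.items():
            for name in names:
                index[(lang, name.casefold())] = code
    return index

INDEX = build_index(CROP_REFERENCE)

def to_code(label: str, lang: str) -> str:
    """Translate a local label into the shared code; reject non-permissible values."""
    key = (lang, label.strip().casefold())
    if key not in INDEX:
        raise ValueError(f"{label!r} is not a permissible crop value for {lang!r}")
    return INDEX[key]

def display_label(code: str, lang: str) -> str:
    """Return the preferred label for a code in the requested language."""
    return CROP_REFERENCE[code][lang][0]

code = to_code("Corn", "en")
print(code, display_label(code, "es"))
```

A production version would load the value sets from a governed source and apply Unicode normalization (for example `unicodedata.normalize`) before matching.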
10. Coffee Farming Use Case
As an Agronomist Manager,
I want to be able to record the known performance of local Arabica and Robusta varieties,
so that agronomists and nurseries know the performance of each variety and can inform farmers of its potential benefits.
KPIs to track
• Yield (metric tons/hectare)
• Product type: Cherry, Green Coffee, or Parchment
• Measured tree density (trees/hectare)
• Expected cup quality (optional): 1.0 (best) to 1.4 (worst)
• Compliance with green coffee specifications: Y / N / Unknown
• Granulometry: weight of 100 beans (grams)
• Drought resistance: Sensitive / Moderate / Tolerant
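The KPI list above maps naturally onto a typed record whose enumerations act as reference data. The Python sketch below is one assumption about how such a record could look; the class and field names are hypothetical, and the example values are invented, not measurements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ProductType(Enum):
    CHERRY = "Cherry"
    GREEN_COFFEE = "Green Coffee"
    PARCHMENT = "Parchment"

class Compliance(Enum):
    YES = "Y"
    NO = "N"
    UNKNOWN = "Unknown"

class DroughtResistance(Enum):
    SENSITIVE = "Sensitive"
    MODERATE = "Moderate"
    TOLERANT = "Tolerant"

@dataclass(frozen=True)
class VarietyPerformance:
    """One performance record for a local Arabica or Robusta variety (hypothetical model)."""
    variety: str
    species: str                          # "Arabica" or "Robusta"
    yield_t_per_ha: float                 # metric tons per hectare
    product_type: ProductType             # product the yield refers to
    tree_density_per_ha: int              # trees per hectare at measurement
    green_spec_compliance: Compliance
    weight_100_beans_g: float             # granulometry: weight of 100 beans in grams
    drought_resistance: DroughtResistance
    cup_quality: Optional[float] = None   # optional: 1.0 (best) to 1.4 (worst)

    def __post_init__(self) -> None:
        if self.species not in ("Arabica", "Robusta"):
            raise ValueError(f"unknown species: {self.species!r}")
        if self.yield_t_per_ha < 0:
            raise ValueError("yield must be non-negative")
        if self.tree_density_per_ha <= 0 or self.weight_100_beans_g <= 0:
            raise ValueError("tree density and 100-bean weight must be positive")
        if self.cup_quality is not None and not 1.0 <= self.cup_quality <= 1.4:
            raise ValueError("cup quality must be between 1.0 and 1.4")

# Invented example values; Enum("...") parses a reference value and raises ValueError if unknown.
example = VarietyPerformance(
    variety="Example Variety A",
    species="Arabica",
    yield_t_per_ha=1.2,
    product_type=ProductType("Green Coffee"),
    tree_density_per_ha=2500,
    green_spec_compliance=Compliance("Unknown"),
    weight_100_beans_g=18.0,
    drought_resistance=DroughtResistance("Moderate"),
    cup_quality=1.1,
)
print(example)
```

Using enumerations rather than free text means an abbreviation such as "Parch." is rejected at entry time instead of fragmenting later analysis.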
11. Coffee Quality Map
Cup Quality Measures
• Aroma
• Flavour
• Aftertaste
• Acidity
• Body
• Balance
• Uniformity
• Cup Cleanliness
• Sweetness
• Moisture
• Defects
Farm Metadata
• Owner
• Country of Origin
• Region
• Farm Name
• Lot Number
• Holding Pattern
• Mill (on site)
• Company
• Location and Altitude
• Farm Map
• Farm Area
Bean Metadata
• Processing Method (wet or dry process, washed or natural)
• Bean size & density
• Bean Colour
• Species (Arabica / Robusta)
• Roast appearance and cup quality in relation to flavour, characteristics and cleanliness
12. Coffee Varieties
Two dominant coffee species out of about 125!
Coffee is a long-term crop with a lifespan of more than 10 years, and considerably longer under good management, so the choice of variety (cultivar) is very important. Because the quality of the coffee bean is crucial for producing high-grade coffee, choose only varieties that are recommended for your area.
13. Reference Data sources for Coffee
Reference data categories for coffee (figure legend: Data Available):
• Production & Yield
• Disease Resistance
• Soil Type
• Plant Genotype & Taxonomy
• Growing Practices
15. Ontology Development Methodology
Preparation
• Framework: Ontologies Guidelines, Ontologies Tooling, Collaboration Process, Ontology Repository
• Domain Definition: Define Use Cases, Define Entities, Define Reference Data, Identify Reusable Ontologies
Execution
• Create Ontology 1, Create Ontology 2, …, Create Ontology N
16. How does it fit with a CDO organization?
Ontologies + Knowledge Graphs =
• Identify all
• Provide context
• Discover hidden facts
“An enterprise knowledge graph is a representation of an organization’s
knowledge domain and artefacts that is understood by both humans and machines”
Graph Database
• For example, a “triple store”
• Collection of references to knowledge objects in their source systems
• Stores properties for each object from the various sources
• Stores relationships between those objects
Enterprise Data Sources
• Variety of data sources and systems
• Many disparate systems
Glossaries, Taxonomies
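The graph-database description above (references to objects in their source systems, properties merged from several sources, relationships between objects) can be sketched in a few lines of Python. This is a toy in-memory store, not a real triple store; identifiers such as `erp:farm/123` and the predicates `locatedIn` and `grows` are hypothetical. The `infer_transitive` step hints at how a graph can surface facts that no single source recorded, similar in spirit to what a reasoner does with OWL's `owl:TransitiveProperty`.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

class TripleStore:
    """Minimal in-memory knowledge graph with per-triple provenance (illustrative only)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        self._sources: Dict[Triple, Set[str]] = {}

    def add(self, s: str, p: str, o: str, source: str) -> None:
        """Record a fact and the system it came from; duplicate facts merge their sources."""
        triple = (s, p, o)
        self._triples.add(triple)
        self._sources.setdefault(triple, set()).add(source)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Pattern query where None acts as a wildcard; returns a sorted list."""
        return sorted(t for t in self._triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def sources(self, s: str, p: str, o: str) -> List[str]:
        """Which systems asserted this fact."""
        return sorted(self._sources.get((s, p, o), set()))

    def infer_transitive(self, predicate: str) -> int:
        """Add implied triples for a transitive predicate; return how many were added."""
        added = 0
        changed = True
        while changed:
            changed = False
            for a, _, b in self.match(p=predicate):        # match() returns a copy
                for _, _, c in self.match(s=b, p=predicate):
                    if (a, predicate, c) not in self._triples:
                        self.add(a, predicate, c, source="inferred")
                        added += 1
                        changed = True
        return added

kg = TripleStore()
kg.add("erp:farm/123", "locatedIn", "geo:region/r1", source="ERP")
kg.add("geo:region/r1", "locatedIn", "geo:country/c1", source="GIS")
kg.add("erp:farm/123", "grows", "ref:crop/coffee", source="ERP")
kg.add("erp:farm/123", "grows", "ref:crop/coffee", source="FarmApp")
print(kg.infer_transitive("locatedIn"))
print(kg.match(s="erp:farm/123", p="locatedIn"))
print(kg.sources("erp:farm/123", "grows", "ref:crop/coffee"))
```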
17. How does it fit with a CDO organization?
• Watson Decision Platform for Agriculture:
  • The Electronic Field/Regional Record holds domain-specific information across growing seasons
  • Terms and reference data are used to define the allowable values for attributes such as crop type, irrigation type, tilling methods, etc.
  • Work is planned to extend this with taxonomies and ontologies covering more extensive information about growth stages, soil types, and planting characteristics.
18. Ontologies as an IBM asset
Example: potato crop ontology built for Yara ODX
Assets for knowledge modelling
• Architecture for building ontologies
• Method and framework for implementing ontologies
• Domain-specific ontologies (e.g. farming & agribusiness)
Commercialization
• Can be owned by industry or service line practices, like industry models
• Can be part of the industry commercial offering package
• Can be extended to any industry
20. Open Agri Landscape Data initiatives
WHAT CAN WE LEARN FROM?
WHAT CAN WE POSITION AGAINST?
WHAT CAN WE REUSE?
21. Work Initiatives
Understand the Marketplace
• Building out the matrix (existence, liveness, usefulness)
• What we can leverage
• What we should not do
Understand Ontologies
• Plants, growing practices, inputs, measurements, environmental
Understand What’s Available
• Translations and regional variability
• Country-specific requirements
Understand Data Sharing
• Industry requirements
• Public expectations
• Agricultural
• Environmental
• Growers
22. Using Structured Knowledge
Ontology
• Why: Graphs representing both simple and complex relationships and attributes; a mechanism to consistently capture knowledge about a domain.
• Examples: Describing environmental rules for where to plant which crops and why; relationships between different inputs and crops. Typically used in both search and analytics.
Taxonomy
• Why: Simple hierarchies to structure and find reference data and glossary terms. The structure implies hierarchical relationships such as containment or part-of.
• Examples: Crop growth stages, crop types, and equipment types all have implicit structure. Used in analytics and presentation.
Reference Data
• Why: Common value sets with translations and regionalization; often the missing piece when integrating data from multiple systems.
• Examples: Think of drop-down lists: kind of crop, kind of irrigation, colors, places, etc. Used to communicate consistently internally and externally.
Glossary
• Why: Common terminology for communicating internally and with customers.
• Examples: Understandable terminology on screens and reports.
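The taxonomy entry above notes that hierarchies are used in analytics. A minimal Python sketch of that use, assuming a hypothetical crop-type hierarchy: counts recorded at the leaf level (for example, number of fields per crop) are rolled up to every ancestor so reports can be viewed at any level of the tree.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical crop-type taxonomy: child -> parent ("is a kind of").
PARENT: Dict[str, str] = {
    "Arabica": "Coffee",
    "Robusta": "Coffee",
    "Coffee": "Perennial crops",
    "Perennial crops": "Crops",
    "Maize": "Cereals",
    "Cereals": "Crops",
    "Potato": "Tubers",
    "Tubers": "Crops",
}

def path_to_root(term: str) -> List[str]:
    """Return the term followed by its ancestors, nearest first (assumes no cycles)."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def roll_up(counts: Dict[str, int]) -> Counter:
    """Aggregate leaf-level counts to every ancestor in the hierarchy."""
    totals: Counter = Counter()
    for term, n in counts.items():
        for node in path_to_root(term):
            totals[node] += n
    return totals

# Invented counts, for illustration only.
totals = roll_up({"Arabica": 12, "Robusta": 5, "Maize": 7})
print(path_to_root("Robusta"))
print(totals["Coffee"], totals["Crops"])
```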
24. Two dominant coffee varieties are being produced
Robusta (aka Canephora)
• Lower quality
• More robust / fewer disease problems
• Easy to grow and manage
• Higher yielding (up to 3 tonnes/ha green bean possible for smallholder production)
• Easy to process
• Dominates the lower end of the market, such as instant coffee and less discerning markets
• One third of world production
• Average world market price/kg approximately half that of Arabica
VS
Arabica
• Higher quality
• Less tolerant of environmental fluctuations / more prone to disease
• More complex physiological management
• Lower yielding (up to 1.5 tonnes/ha green bean possible for smallholder production)
• More sophisticated processing needed
• Operates at the higher end of the market, such as roasted and ground coffee
• Two thirds of world production
• Average world market price/kg approximately twice that of Robusta
Source:
25. Coffee end market segmentation by quality
Source: https://www.cbi.eu/market-information/coffee/belgium/market-entry/
26. Stages of Coffee Crop Growth
Source: https://www.torchcoffee.asia/resources
28. How does it fit with a CDO organization?
• Contextual: Identify subject areas (Ontology)
• Conceptual: Define the meaning of things in the enterprise (Ontology, conceptual ER, business process)
• Logical: Describe the logical representation of properties (entity-relationship model, JSON schema, XML schema)
• Physical: Describe the physical means of data storage (physical schema, asset repository)
• Definition: Represent the coding language of the data platform
• Instance: Store the values of the properties applied to the data in a schema
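The layering above moves from meaning (conceptual) to structure (logical) to stored values (instance). As a hedged illustration, the Python sketch below derives a JSON Schema, using only the draft 2020-12 keywords `type`, `enum`, `properties`, `required`, and `additionalProperties`, from a hypothetical conceptual definition of a coffee lot, turning reference values into `enum` lists. It then checks instances with a deliberately small hand-written validator; a real platform would more likely use an established validator such as the `jsonschema` Python package.

```python
import json
from typing import Any, Dict

# Hypothetical conceptual definition: attribute -> (JSON type, permitted values or None).
COFFEE_LOT: Dict[str, Any] = {
    "name": "CoffeeLot",
    "attributes": {
        "lotNumber": ("string", None),
        "species": ("string", ["Arabica", "Robusta"]),
        "processingMethod": ("string", ["Washed", "Natural"]),
        "altitudeMeters": ("number", None),
    },
    "required": ["lotNumber", "species"],
}

def to_json_schema(concept: Dict[str, Any]) -> Dict[str, Any]:
    """Logical layer: derive a JSON Schema from the conceptual definition."""
    properties: Dict[str, Any] = {}
    for attr, (json_type, allowed) in concept["attributes"].items():
        prop: Dict[str, Any] = {"type": json_type}
        if allowed is not None:
            prop["enum"] = list(allowed)  # reference data becomes an enum
        properties[attr] = prop
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": concept["name"],
        "type": "object",
        "properties": properties,
        "required": list(concept["required"]),
        "additionalProperties": False,
    }

PY_TYPES = {"string": str, "number": (int, float)}

def conforms(instance: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Instance layer: minimal check covering only the keywords emitted above."""
    if any(key not in instance for key in schema["required"]):
        return False
    for key, value in instance.items():
        prop = schema["properties"].get(key)
        if prop is None:  # additionalProperties is False
            return False
        # bool is an int subclass in Python, but JSON treats booleans separately.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[prop["type"]]):
            return False
        if "enum" in prop and value not in prop["enum"]:
            return False
    return True

schema = to_json_schema(COFFEE_LOT)
print(json.dumps(schema, indent=2))
print(conforms({"lotNumber": "L-001", "species": "Arabica", "altitudeMeters": 1600}, schema))
```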