SEMANTICS FOR BIG DATA
APPLICATIONS IN SMART DAIRY FARMING
Jack Verhoosel, Jacco Spek
Presentation at the Semantics 2016 conference
14 September 2016
Collaboration project
3 cooperatives
7 SMEs
5 Research institutes
7 Real farmers
Timeline:
SDF1: 2011 – 2014
Northern part of the Netherlands
Website (in Dutch):
http://www.smartdairyfarming.nl/nl/
Goal of SDF:
to support dairy farmers in the care of individual animals,
with the specific goal of a longer productive stay at the farm through
improved individual health.
Challenge SDF2:
more farmers: from 7 to 60 (and prepare for 2500)
more sensor suppliers and more data consumers
incorporate semantics and big data analysis
Numbers for the Dutch situation:
• 15,000+ farmers
• in total more than 1.5 million milk cows
• 20 to 200+ data fields per cow
• many different stakeholders in the chain
SDF 1.0 (2011 – 2014)
SDF 2.0 (2015 – 2017)
Starting point:
Cow centric thinking
Starting point:
Farmer in control
“De boer aan het roer” (“The farmer at the helm”)
Real time analysis models
(at different organisations)
Sensors from different suppliers: Lely, Delaval, Agis, Gallagher, …
Other data sources: CRV, FC, AgriFirm, Weather, Satellite
InfoBroker: open platform for sharing (sensor) data
Producers and consumers
Cow specifics
Work instructions (SOP)
This project is made possible by:
Data sharing in the dairy chain
Think big, start small
12 GB sensor data per year for 7 farms => 310 GB triples
From 7 to 50 farms of 15,000 in NL
InfoBroker concept
InfoBroker functionalities:
• Open interfaces for data exchange (API)
• Authentication
  – who are you (are you allowed to log in)
• Permissions
  – which data may be used by whom
  – to be set by the farmers
• Naming service
  – location where the data can be found (static data, cow-centric sensor data)
• Integration
  – combining info from different sources
• Pay-per-use
  – fixed costs (connections)
  – variable costs (used data)
So:
• no central data store for (sensor) data!
• but indeed a broker
• which reduces/prevents duplication
[Figure: InfoBroker architecture. Labels: cow-centric sensor data; static data (e.g. feed, date of birth); InfoBroker; cow-centric data; models; dashboard; cow-specific work instructions (SOPs); x 15,000+]
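The broker idea on this slide (permissions set by the farmer, a naming service that records where data lives, and no central copy of the sensor data) can be pictured with a small Python sketch. This is an illustrative toy, not the SDF implementation: the class `InfoBroker`, its method names, and the farm, consumer, and cow identifiers are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# A data source is modelled as a function: cow id -> list of readings.
Source = Callable[[str], List[float]]


class InfoBroker:
    """Toy broker: routes requests to sources, stores no sensor data itself."""

    def __init__(self) -> None:
        # Naming service: (farm, data category) -> where the data can be found.
        self._locations: Dict[Tuple[str, str], Source] = {}
        # Permissions, set by the farmer: (farm, data category) -> allowed consumers.
        self._permissions: Dict[Tuple[str, str], Set[str]] = {}

    def register_source(self, farm: str, category: str, source: Source) -> None:
        self._locations[(farm, category)] = source

    def grant(self, farm: str, category: str, consumer: str) -> None:
        self._permissions.setdefault((farm, category), set()).add(consumer)

    def request(self, consumer: str, farm: str, category: str, cow: str) -> List[float]:
        if consumer not in self._permissions.get((farm, category), set()):
            raise PermissionError(f"{consumer} may not read {category} of {farm}")
        # Fetch on demand from the registered location; nothing is cached here.
        return self._locations[(farm, category)](cow)


# Hypothetical example data held by a sensor supplier, not by the broker.
weights = {"cow-1": [612.0, 615.5]}

broker = InfoBroker()
broker.register_source("farm-1", "weight", lambda cow: weights.get(cow, []))
broker.grant("farm-1", "weight", "model-A")

allowed = broker.request("model-A", "farm-1", "weight", "cow-1")
try:
    broker.request("model-B", "farm-1", "weight", "cow-1")
    denied = False
except PermissionError:
    denied = True
```

In this sketch `model-A` receives the readings because the farmer granted access, while `model-B` is refused. Authentication and pay-per-use accounting are left out to keep the example short.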
InfoBroker – Facts & Figures
Farm 1 Farm 2 Farm 3 Farm 4 Farm 5 Farm 6 Farm 7
# cows/calves 459 186 315 239 706 202 351
Behaviour x x
Temperature x x
Activity x x x x x x
Milk production x x x x x
Food intake x x x
Weight x x x x x x x
Water intake x x
Milk intake x x
Date: February 2015
NB1: these are “sensor data categories” at a farm
NB2: not all animals are monitored for SDF (e.g. at farms 3 and 4 only calves)
InfoBroker – Facts & Figures
[Charts: number of cows vs. time; number of sensor fields vs. time]
WHY LINKED DATA AND SEMANTICS?
1. To make the various data sets accessible in an automatically linkable manner for easier integration
2. To align the semantics of the datasets in isolation as well as in combination using ontologies
3. To enable a rich set of questions to be queried on the datasets for better analysis
BIG DATA ANALYSIS QUESTIONS
AgriFirm:
“How much feed did a group of cows at a dairy farm take in over a certain period, per type of feed, and how strong is the correlation with milk yield?”
CRV:
“What was the average weight per day over the last lactation period of a cow, and what was the weight increase/decrease over that period?”
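The AgriFirm question asks how strongly feed intake and milk yield are correlated. One common way to quantify that is the Pearson correlation coefficient; the slides do not say which measure the project used, so the sketch below is only an assumption about what such an analysis could look like. The function name `pearson` and all numbers are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented daily totals for one group of cows (kg feed, kg milk).
feed_kg = [20.0, 22.0, 24.0, 26.0]
milk_kg = [28.0, 30.0, 31.0, 35.0]
r = pearson(feed_kg, milk_kg)
```

A value of `r` close to 1 would indicate that days with more feed intake tend to be days with more milk; it says nothing about causation.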
SIMPLE EXAMPLE OF LINKED DATA
Subject --predicate--> Object
Cow --is a (type)--> Animal
Cow --grazes on--> Parcel
Parcel --is a (type)--> Grassland
Parcel --has surface--> 40 ha
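The cow and parcel example can be written down as plain subject, predicate, object triples. The sketch below uses only Python tuples rather than an RDF library, treats the cow and the parcel as individuals, and uses a made-up `http://example.org/dairy/` namespace; none of these identifiers come from the project.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace; real linked data would use dereferenceable HTTP URIs.
EX = "http://example.org/dairy/"

triples: List[Triple] = [
    (EX + "cow1", "rdf:type", EX + "Animal"),
    (EX + "cow1", EX + "grazesOn", EX + "parcel1"),
    (EX + "parcel1", "rdf:type", EX + "Grassland"),
    (EX + "parcel1", EX + "hasSurface", "40 ha"),
]


def objects(subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# Follow two links: which surface does the parcel that cow1 grazes on have?
surfaces = {
    surface
    for parcel in objects(EX + "cow1", EX + "grazesOn")
    for surface in objects(parcel, EX + "hasSurface")
}
```

Following links from one resource to the next in this way is what a graph query language such as SPARQL does declaratively.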
LINKED DATA ROADMAP*
The four design principles of Linked Data (by Tim Berners-Lee):
1. Use Uniform Resource Identifiers (URIs) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs so that they can discover more things.
*Based on PLDN LD roadmap
ONTOLOGY-BASED SDF
[Figure: layered architecture, from top to bottom]
Visualization and analysis apps
Common Dairy Ontology
MS-ontology (Measurements Triples) and ST-ontology (Static Triples)
mapping
Cow specific data per farm per sensor equipment: Delaval, Lely, Nedap, Agis, ….. sensor data
Cow specific data per farm: static data (e.g. date of birth)
COMMON DAIRY ONTOLOGY
“What was the average weight per day and weight increase/decrease over the last lactation period of a cow in a group?”
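To make the question concrete, the sketch below computes an average weight per day and the overall increase or decrease for a handful of readings of one cow. The readings are invented, and taking "first day average versus last day average" as the change is an assumption; the slides do not define how the project calculated it.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Invented weight readings (kg) for one cow: (measurement day, weight).
readings: List[Tuple[date, float]] = [
    (date(2015, 3, 1), 600.0),
    (date(2015, 3, 1), 604.0),
    (date(2015, 3, 2), 606.0),
    (date(2015, 3, 3), 609.0),
    (date(2015, 3, 3), 611.0),
]


def average_per_day(rows: List[Tuple[date, float]]) -> Dict[date, float]:
    """Group readings by day and average them; days come out in date order."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for day, weight in rows:
        by_day[day].append(weight)
    return {day: sum(ws) / len(ws) for day, ws in sorted(by_day.items())}


daily = average_per_day(readings)
days = list(daily)
# Positive means the cow gained weight over the period, negative means it lost weight.
change = daily[days[-1]] - daily[days[0]]
```

In the ontology-based set-up the same grouping would be expressed as a query over measurement triples instead of over a Python list.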
ONTOLOGY MAPPING
Common Dairy Ontology
Measurement ontology
rdfs:label = “Activity2hours”
rdfs:label = “BodyWeight”
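A mapping from measurement labels to common ontology terms can be thought of as a lookup table. In the sketch below the two labels are taken from the slide, while the `cdo:` term names, the function `to_common`, and the sample record are hypothetical placeholders, since the actual Common Dairy Ontology terms are not listed here.

```python
from typing import Dict, List, Tuple

# Supplier-specific field label -> common term. The labels come from the slide;
# the "cdo:" term names are hypothetical placeholders.
LABEL_TO_CDO: Dict[str, str] = {
    "Activity2hours": "cdo:Activity",
    "BodyWeight": "cdo:Weight",
}


def to_common(record: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Rename known fields to common terms; report the labels left unmapped."""
    mapped: Dict[str, float] = {}
    unmapped: List[str] = []
    for label, value in record.items():
        term = LABEL_TO_CDO.get(label)
        if term is None:
            unmapped.append(label)
        else:
            mapped[term] = value
    return mapped, unmapped


mapped, unmapped = to_common({"BodyWeight": 612.0, "Activity2hours": 37.0, "Foo": 1.0})
```

Reporting unmapped labels rather than dropping them silently makes it easier to spot fields from a new sensor supplier that still need a mapping.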
PLASIDO: OUR BIG, LINKED DATA PLATFORM
Powerful server: 128 GB memory, 5 TB storage
Triplestores:
• Marmotta: triplestore with relational triple DB
• Jena Fuseki: triplestore with native graph DB
• Virtuoso: classical relational DB with SPARQL-2-SQL interface
All SDF data of 2014 and 2015 retrieved from InfoBroker
• Converted into triples using LODRefine and an RDF generator
• From 12 GB to 310 GB, an increase of roughly a factor of 25
• Stored in Marmotta and Fuseki for comparison
Application development:
• Angular-Javascript JSON converter
• Google Visualization
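The growth from 12 GB to 310 GB follows from how triples are written: every cell of a table row becomes its own statement that repeats the full subject and predicate URIs. The sketch below shows this on one invented CSV row serialized as N-Triples. The base URI, column names, and values are assumptions, and the growth factor of this toy row is not meant to reproduce the reported factor.

```python
import csv
import io
from typing import List

# Hypothetical base URI and column names; the real SDF schema is not shown in the slides.
BASE = "http://example.org/sdf/"
CSV_TEXT = "cow,date,weight_kg,milk_kg\nNL123,2015-02-01,612.0,28.4\n"


def row_to_ntriples(row: dict) -> List[str]:
    """Turn one CSV row into N-Triples: one link to the cow plus one triple per column."""
    subject = f"<{BASE}measurement/{row['cow']}/{row['date']}>"
    lines = [f"{subject} <{BASE}cow> <{BASE}cow/{row['cow']}> ."]
    for column in ("date", "weight_kg", "milk_kg"):
        lines.append(f'{subject} <{BASE}{column}> "{row[column]}" .')
    return lines


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
ntriples = "\n".join(t for row in rows for t in row_to_ntriples(row)) + "\n"

csv_bytes = len(CSV_TEXT.split("\n", 1)[1].encode("utf-8"))  # data row only
rdf_bytes = len(ntriples.encode("utf-8"))
growth = rdf_bytes / csv_bytes  # size of the triples relative to the CSV row

# The factor reported on the slide, computed from its own numbers.
reported_factor = 310 / 12
```

Compact serializations and prefix compression inside a triplestore can reduce this overhead, which is one reason on-disk sizes differ between stores.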
POC DEMO: AGRIFIRM AND CRV
POC DEMO: STATIC COW INFO
POC DEMO: MILKYIELD / FEEDINTAKE
Correlation? How strong? AI data analytics!
POC DEMO: FEEDINTAKE PER FEED
PLASIDO PERFORMANCE TESTS
1. All 2014 SDF triple data from InfoBroker loaded into the Apache Marmotta triplestore
• 12 GB of CSV data turned into +/- 310 GB of RDF triples
• Marmotta uses a classical relational database to store triples
• Simple queries with only one parameter can be answered easily
• More complex queries led to unacceptable response times
• Main reason is inefficient access to the underlying RDB
2. Next step: switch all data to the Apache Jena Fuseki triplestore
• Still +/- 310 GB of RDF triples
• Fuseki uses a modern graph database to store triples
• Simple queries with only one parameter can be answered easily
• More complex analysis queries lead to long, but still acceptable response times, up to 15 minutes
So, an acceptable performance.
See next slide for some numbers.
PLASIDO PERFORMANCE
Query                                                    Input         Graph size   Search par.  Response
Select an overview with the number of cows of a farmer   Stokman       111,604,625  1            0.04s
                                                         Bakker        167,894,559  1            0.03s
                                                         Antonides     79,739,365   1            0.37s
Select the list of cows with number and parity           Stokman       28,704       3            0.934s
                                                         Bakker        9,400        3            15.110s
                                                         Antonides     45,816       3            27.006s
Select feed per type per day over all cows of a farmer   Stokman       66,551,765   3            913.003s
                                                         Bakker        38,034,692   3            350.917s
                                                         Antonides     45,637,592   3            380.470s
Select average weight over all cows per day per parity   Antonides     45,637,592   3            348.704s
Select static info for a cow                             NL 715820911  45,816       2            0.094s
Select weight per day in lactation period                NL 715820911  45,683,408   5            5.129s
Select weight and milkyield per day in lactation period  NL 715820911  45,683,408   7            13.714s
Select milkyield per day in lactation period             NL 715820911  45,683,408   3            4.142s
Measured with set-up 2: Apache Jena Fuseki triplestore with graph DB
PLASIDO PERFORMANCE TESTS
Conclusion of the previous 2 tests: working with large triple sets is acceptable
However, is it really necessary to put all data into triples?
Why not use only the CDO as semantic interface and leave all data in classical table database format?
3. Next step: put all data into tables and use the RDB of Virtuoso
• Only +/- 10 GB of raw data
• Uses a classical relational database to store raw data in table form
• SPARQL interface based on CDO
• Mapping between SPARQL and SQL to translate queries towards the RDB
• Simple queries with only one parameter can be answered easily
• More complex analysis queries lead to errors, because the SPARQL-2-SQL mapping generates SQL queries and results that are too large for Virtuoso to handle
So, an unacceptable performance for this Virtuoso-based solution.
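A generic way to picture why graph queries get heavy on a relational back end is to store triples in a single table and see what a small graph pattern turns into. The sketch below uses SQLite from the Python standard library. The table layout, predicate names, and data are assumptions for illustration only; this is neither Marmotta's actual schema nor Virtuoso's SPARQL-2-SQL mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("m1", "ofCow", "cow1"), ("m1", "onDay", "2015-02-01"), ("m1", "weightKg", "612.0"),
        ("m2", "ofCow", "cow1"), ("m2", "onDay", "2015-02-02"), ("m2", "weightKg", "615.5"),
        ("m3", "ofCow", "cow2"), ("m3", "onDay", "2015-02-01"), ("m3", "weightKg", "580.0"),
    ],
)

# SPARQL-like question: ?m ofCow cow1 . ?m onDay ?day . ?m weightKg ?w
# On a triple table each pattern becomes one more self-join on the subject.
SQL = """
SELECT d.o AS day, CAST(w.o AS REAL) AS weight
FROM triples AS c
JOIN triples AS d ON d.s = c.s AND d.p = 'onDay'
JOIN triples AS w ON w.s = c.s AND w.p = 'weightKg'
WHERE c.p = 'ofCow' AND c.o = 'cow1'
ORDER BY day
"""
rows = conn.execute(SQL).fetchall()
```

Three patterns already need a three-way join. Analysis queries with more parameters, grouping, and aggregation add further joins, so the generated SQL and its intermediate results can grow quickly.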
CONCLUSION AND NEXT STEPS
CDO ontology application in SDF architecture
Further enhancement of CDO to cover the dairy domain extensively
Architectural study to enable the use of CDO with InfoBroker
CDO as semantic interface for InfoBroker
Apply AI deep learning algorithms to perform data analytics
More extensive performance studies
Other linked data platforms to deal with big data: D2RQ
Other possibilities for improvements: Linked Data Fragments
Dealing with heterogeneous sensor data: differently measured
Enabling analysis based on incomplete sensor data
RDF stream processing for dealing with streaming sensor data
THANK YOU FOR YOUR ATTENTION!

Jack Verhoosel | Semantics in Dairy Farming: towards a Common Dairy Ontology

  • 1. SEMANTICS FOR BIG DATA APPLICATIONS IN SMART DAIRY FARMING Jack Verhoosel, Jacco Spek Presentation at the Semantics 2016 conference 14 September 2016
  • 2. aantalle n NL noemen Collaboration project 3 Cooperations 7 SME’s 5 Research institutes 7 Real farmers Timeline: SDF1: 2011 – 2014 Nothern part of the Netherlands Website (in Dutch): http://www.smartdairyfarming.nl/nl/ Goal of SDF: to support dairy farmers in the care of individual animals. with the specific goal of a longer productive stay at the farm due to improvement of individual health. Challenge SDF2: more farmers: from 7 to 60 (and prepare for 2500) more sensor suppliers and more data consumers incorporate semantics and big data analysis Numbers for the Dutch situation: • 15000+ farmers • in total more then 1.5 million milk cows • 20 to 200+ datafields per cow • many different stakeholders in the chain 2 SDF 1.0 (2011 – 2014) SDF 2.0 (2015 – 2017)
  • 3. 3 Starting point: Cow centric thinking Starting point: Farmer in control “De boer aan het roer” Real time analysis models (at different organisations) Sensors from different suppliers: Lely, Delaval, Agis, Gallagher,… Other data sources, CRV, FC, AgriFirm, Weather, Satellite InfoBroker: Open platform for sharing (sensor) data producers and consumers Cow specifics Workinstructions (SOP) This project is made possible by: Data sharing in the dairy chain Think big, start small 12GB sensordata per year for 7 farms => 310 GB triples From 7 to 50 farms of 15.000 in NL
  • 4. 4 InfoBroker concept InfoBroker functionalities:  Open interfaces for data exchange (API)  Authentication  who are you (are you allowed to login)  Permissions  which data may be used by whom  to be set by the farmers  Namingservice  location where the data can be found – static data – cow-centric sensor data  Integration  combining info from different sources  Pay-per-use  fixed costs (connections)  variable costs (used data) So:  no central datastore for (sensor)data!  but indeed a broker  and reduces/prevents duplication cow specific work instructions (SOPs) InfoBroker cow centric data cow centric data Cow centric Sensor data Static data (e.g. feed) Cow centric Sensor data Static data (e.g. date of birth) Dashboard Model Model Model x 15.000+
  • 5. InfoBroker – Facts & Figures 5 Farm 1 Farm 2 Farm 3 Farm 4 Farm 5 Farm 6 Farm 7 # cows/calves 459 186 315 239 706 202 351 Behaviour x x Temperature x x Activity x x x x x x Milk production x x x x x Food intake x x x Weight x x x x x x x Water intake x x Milk intake x x Date: february 2015 NB1: this are “sensor data categories” at a farm NB2: not all animals are monitored for SDF (e.g. 3 and 4 only calves)
  • 6. InfoBroker – Facts & Figures 6 Number of cows vs time Number of sensorfields vs time
  • 7. WHY LINKED DATA AND SEMANTICS? 1. To make the various data sets accessible in an automatically linkable manner for easier integration. 2. To align the semantics of the datasets in isolation as well as in combination using ontologies. 3. To enable a rich set of questions to be queried on the datasets for better analysis.
  • 8. BIG DATA ANALYSIS QUESTIONS. AgriFirm: “How much feed did a group of cows at a dairy farm take in during a certain period, per type of feed, and how strong is the correlation with milk yield?” CRV: “What was the average weight per day over the last lactation period of a cow, and what was the weight increase or decrease over that period?”
  • 9. SIMPLE EXAMPLE OF LINKED DATA. Triples (subject, predicate, object): Cow, is a (type), Animal; Cow, grazes on, Parcel; Parcel, is a (type), Grassland; Parcel, has surface, 40 ha.
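The slide's example can be written down as plain (subject, predicate, object) tuples to show how linking works. The sketch below uses no RDF library, and the helper names `match` and `surface_grazed_by` are made up for illustration; a real setup would use URIs and a triplestore.

```python
# The slide's graph as (subject, predicate, object) tuples.
triples = [
    ("Cow", "is a", "Animal"),
    ("Cow", "grazes on", "Parcel"),
    ("Parcel", "is a", "Grassland"),
    ("Parcel", "has surface", "40 ha"),
]


def match(pattern, store):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def surface_grazed_by(animal, store):
    """Follow two links: animal -> parcel it grazes on -> surface of that parcel."""
    surfaces = []
    for _, _, parcel in match((animal, "grazes on", None), store):
        for _, _, surface in match((parcel, "has surface", None), store):
            surfaces.append(surface)
    return surfaces


print(surface_grazed_by("Cow", triples))
```

The second function is the point of linked data in miniature: the answer comes from joining two triples on the shared node `Parcel`.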
  • 10. LINKED DATA ROADMAP* The four design principles of Linked Data (by Tim Berners-Lee): 1. Use Uniform Resource Identifiers (URIs) as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs so that they can discover more things. Tooling: LODRefine. *Based on PLDN LD roadmap.
  • 11. ONTOLOGY-BASED SDF. Diagram elements: sensor data from Delaval, Lely, Nedap, Agis and others (cow-specific data per farm per sensor equipment); static data, e.g. date of birth (cow-specific data per farm); mappings; MS-ontology with measurement triples; ST-ontology with static triples; Common Dairy Ontology; visualization and analysis apps.
  • 12. COMMON DAIRY ONTOLOGY. “What was the average weight per day and the weight increase or decrease over the last lactation period of a cow in a group?”
  • 13. ONTOLOGY MAPPING. Diagram elements: Common Dairy Ontology; Measurement ontology; rdfs:label = “Activity2hours”; rdfs:label = “BodyWeight”.
  • 14. PLASIDO: OUR BIG, LINKED DATA PLATFORM. Powerful server: 128 GB memory, 5 TB storage. Triplestores: Marmotta triplestore with relational triple DB; Jena Fuseki triplestore with native graph DB; Virtuoso relational classical DB with SPARQL-2-SQL interface. All SDF data of 2014 and 2015 retrieved from InfoBroker; converted into triples using LODRefine and an RDF generator; from 12 GB to 310 GB, an increase of roughly a factor 25; stored in Marmotta and Fuseki for comparison. Application development: Angular JavaScript; JSON converter; Google Visualization.
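One plausible reason for the growth from 12 GB to 310 GB is that every CSV cell becomes its own triple, each repeating a full subject and predicate URI. The sketch below illustrates that effect on two made-up rows; the `http://example.org/sdf/` namespace, the column names and the cow identifier are assumptions, and it does not reproduce the project's LODRefine and RDF generator pipeline or its factor of 25.

```python
import csv
import io

BASE = "http://example.org/sdf/"  # hypothetical namespace, not the project's

# Two made-up measurement rows in CSV form.
csv_text = (
    "cow,date,weight_kg,milk_kg\n"
    "cow-0001,2014-05-01,612,28.4\n"
    "cow-0001,2014-05-02,610,27.9\n"
)


def rows_to_ntriples(text):
    """Turn every CSV cell into one N-Triples-style line."""
    lines = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        subject = f"<{BASE}measurement/{index}>"
        for column, value in row.items():
            # Subject and predicate URIs are repeated for every single cell.
            lines.append(f'{subject} <{BASE}{column}> "{value}" .')
    return lines


ntriples = rows_to_ntriples(csv_text)
csv_bytes = len(csv_text.encode("utf-8"))
rdf_bytes = len("\n".join(ntriples).encode("utf-8"))
print(f"{len(ntriples)} triples, expansion factor {rdf_bytes / csv_bytes:.1f}")
```

The exact factor depends on URI length, the serialisation format and the store's indexes, so the toy number should not be compared with the measured one.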
  • 15. POC DEMO: AGRIFIRM AND CRV
  • 16. POC DEMO: STATIC COW INFO
  • 17. POC DEMO: MILKYIELD / FEEDINTAKE. Correlation? How strong? AI data analytics!
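The "how strong?" question can be made concrete with a correlation coefficient. The sketch below computes Pearson's r with the standard library only; the daily feed and milk figures are invented for illustration and are not SDF data, and Pearson's r is one possible measure, not necessarily the one used in the demo.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Invented daily totals for one group of cows (kg of feed, kg of milk).
feed_kg = [20.0, 21.5, 22.0, 23.5, 25.0]
milk_kg = [26.0, 27.0, 27.5, 29.0, 30.5]

r = pearson(feed_kg, milk_kg)
print(f"Pearson r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relation, a value near 0 a weak one; it says nothing about causation.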
  • 18. POC DEMO: FEEDINTAKE PER FEED
  • 19. PLASIDO PERFORMANCE TESTS. 1. All 2014 SDF triple data from InfoBroker loaded into the Apache Marmotta triplestore: 12 GB of CSV data turned into +/- 310 GB of RDF triples. Marmotta uses a classical relational database to store triples. Simple queries with only one parameter can be answered easily. More complex queries led to unacceptable response times; the main reason is inefficient access to the underlying RDB. 2. Next step: switch all data to the Apache Jena Fuseki triplestore: still +/- 310 GB of RDF triples. Fuseki uses a modern graph database to store triples. Simple queries with only one parameter can be answered easily. More complex analysis queries lead to long but still acceptable response times, up to 15 minutes. So, an acceptable performance. See next slide for some numbers.
  • 20. PLASIDO PERFORMANCE (set-up 2, using the Apache Jena Fuseki triplestore with graph DB)
      Query | Input | Graph size | Search par | Response
      Select an overview with the number of cows of a farmer | Stokman | 111,604,625 | 1 | 0.04s
      Select an overview with the number of cows of a farmer | Bakker | 167,894,559 | 1 | 0.03s
      Select an overview with the number of cows of a farmer | Antonides | 79,739,365 | 1 | 0.37s
      Select the list of cows with number and parity | Stokman | 28,704 | 3 | 0.934s
      Select the list of cows with number and parity | Bakker | 9,400 | 3 | 15.110s
      Select the list of cows with number and parity | Antonides | 45,816 | 3 | 27.006s
      Select feed per type per day over all cows of a farmer | Stokman | 66,551,765 | 3 | 913.003s
      Select feed per type per day over all cows of a farmer | Bakker | 38,034,692 | 3 | 350.917s
      Select feed per type per day over all cows of a farmer | Antonides | 45,637,592 | 3 | 380.470s
      Select average weight over all cows per day per parity | Antonides | 45,637,592 | 3 | 348.704s
      Select static info for a cow | NL 715820911 | 45,816 | 2 | 0.094s
      Select weight per day in lactation period | NL 715820911 | 45,683,408 | 5 | 5.129s
      Select weight and milkyield per day in lactation period | NL 715820911 | 45,683,408 | 7 | 13.714s
      Select milkyield per day in lactation period | NL 715820911 | 45,683,408 | 3 | 4.142s
  • 21. PLASIDO PERFORMANCE TESTS. Conclusion of the previous 2 tests: working with large triple sets is acceptable. However, is it really necessary to put all data into triples? Why not use only the CDO as a semantic interface and leave all data in classical table database format? 3. Next step: put all data into tables and use the RDB of Virtuoso: only +/- 10 GB of raw data. Virtuoso uses a classical relational database to store raw data in table form, with a SPARQL interface based on the CDO and a mapping between SPARQL and SQL to translate queries towards the RDB. Simple queries with only one parameter can be answered easily. More complex analysis queries lead to errors, because the SPARQL-2-SQL mapping generates SQL queries and results that are too large for Virtuoso to handle. So, an unacceptable performance for this Virtuoso-based solution.
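To give a feel for how a SPARQL-to-SQL translation can produce large SQL, the sketch below translates a list of triple patterns into one SQL query over a single `triples(s, p, o)` table, adding one self-join per pattern. This is a generic textbook-style illustration, not Virtuoso's actual mapping (which works against the real tables via the CDO); the function name `patterns_to_sql`, the table layout and the sample data are all assumptions.

```python
import sqlite3


def patterns_to_sql(patterns):
    """Translate triple patterns into SQL over a table triples(s, p, o).

    Terms starting with '?' are variables; every pattern adds one table alias,
    so the SQL grows with each extra pattern in the query.
    """
    select, where, params, seen = [], [], [], {}
    for index, pattern in enumerate(patterns):
        for column, term in zip("spo", pattern):
            ref = f"t{index}.{column}"
            if term.startswith("?"):
                if term in seen:
                    where.append(f"{ref} = {seen[term]}")  # join on shared variable
                else:
                    seen[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:
                where.append(f"{ref} = ?")  # constant becomes a bound parameter
                params.append(term)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select) or '1'} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("m1", "ofCow", "cow-1"), ("m1", "onDay", "2014-05-01"), ("m1", "weightKg", "612"),
        ("m2", "ofCow", "cow-2"), ("m2", "onDay", "2014-05-01"), ("m2", "weightKg", "655"),
    ],
)

# "Weight per day for cow-1", written as three triple patterns.
query = [("?m", "ofCow", "cow-1"), ("?m", "onDay", "?day"), ("?m", "weightKg", "?kg")]
sql, params = patterns_to_sql(query)
rows = db.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Three patterns already yield a three-way join; analysis queries with many patterns, optional parts and aggregates grow accordingly, which is consistent with the slide's observation about oversized generated SQL.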
  • 22. CONCLUSION AND NEXT STEPS. CDO ontology application in the SDF architecture: further enhancement of the CDO to cover the dairy domain extensively; architectural study to enable the use of the CDO with InfoBroker; CDO as semantic interface for InfoBroker; apply AI deep learning algorithms to perform data analytics. More extensive performance studies: other linked data platforms to deal with big data (D2RQ); other possibilities for improvements (Linked Data Fragments). Dealing with heterogeneous sensor data: differently measured data; enabling analysis based on incomplete sensor data; RDF stream processing for dealing with streaming sensor data.
  • 23. THANK YOU FOR YOUR ATTENTION!

Editor's Notes

  1. So, what is SDF about? It is only possible if we integrate the whole chain.
  2. Three challenges: young stock rearing; fertility (when?); transition/dry period (when? -> the right feed).
  3. The relationship between open and linked data. What is LD: connecting data with semantics so that data sources can be combined.
  4. LOD roadmap. Step 1: Select: BOMOD. Step 2: Datasets are similar to raw material: they first have to be refined before they become useful. Data cleaning (also referred to as cleansing or scrubbing) describes the process of fixing errors, transforming and homogenizing formats, aligning inconsistencies in data and metadata, removing duplicate and redundant information, adding lacking information, and making sure the information is up to date. One concrete example is the deletion of white spaces and empty cells in a dataset and the identification of missing data. Step 3: Make a conceptual model of the data by defining concepts and their relationships and properties. You can use the logical data model obtained when preparing the data as input for this step. Investigate how others are already describing similar or related data in vocabularies. Formalize the model and your vocabulary, preferably in the Web Ontology Language (OWL). Step 4: Step 7: About the dataset, follow the CKAN rules: description, website/explanation, LOD stars, use of vocabularies, time period. Step 8: Which issues/challenges? How to turn data into LD? The LOD roadmap. How to make LD available? Infrastructure: adapters, InfoBroker.
  5. Challenges: scalability of the infrastructure; matching sources and connecting semantics.