Big data in agriculture
Andreas Drakos
Project Manager, Agro-Know
Big Data in Agriculture, the SemaGrow and agINFRA experience

Presentation of the SemaGrow and agINFRA projects during the EDBT/ICDT 2014 Special Track on Big Data Management Challenges and Solutions in the Context of European Projects, 27th of March 2014
http://www.edbticdt2014.gr/index.php/eu-projects-track

Notes
  • G-8 International Conference on Open Data for Agriculture: https://sites.google.com/site/g8opendataconference/home
  • http://www.atelier.net/en/trends/articles/farmers-starting-capitalize-big-data-technology_424444
  • Mention Velocity, Variety, Volume, Value, Viscosity, Virality
  • Overcome problems stemming from heterogeneity and from the fact that the distribution of data over nodes is not determined by the needs of better load balancing and more efficient resource discovery, but by data providers
  • Big Data in Agriculture, the SemaGrow and agINFRA experience

    1. Big data in agriculture
       Andreas Drakos, Project Manager, Agro-Know
    2. Presentation Outline
       • The importance of Big Data in Agriculture
       • Major challenges
       • The agINFRA and SemaGrow solutions
       • Supporting Global Initiatives
       EDBT Special Track Big Data, Athens, March 2014
    3. INTRO TO OPEN DATA IN AGRICULTURE
       Source: http://www.agricorner.com/shareholder-demands-to-shape-modern-agriculture/
    4. Agriculture data to solve major societal challenges
       • All demographic and food demand projections suggest that, by 2050, the planet will face severe food crises due to our inability to meet agricultural demand. By 2050:
         – 9.3 billion global population, 34% higher than today
         – 70% of the world’s population will be urban, compared to 49% today
         – food production (net of food used for biofuels) must increase by 70%
       • According to these projections, and in order to achieve the forecasted food levels by 2050, a total investment of USD 83 billion per annum will be required
    5. Open Data in Agriculture
       • In an era of Big Data, one of the most promising routes to bootstrap innovation in agriculture is by the use of Open Data:
         – e.g. provisioning, maintaining, enriching with relevant metadata, making openly available a vast amount of information
       • The use and wide dissemination of these data sets is strongly advocated by a number of global and national policy makers such as:
         – The New Alliance for Food Security and Nutrition G-8 initiative
         – Food & Agriculture Organization of the UN
         – DEFRA & DFID in UK
         – USDA & USAID in the US
    6. Open Data in agriculture: a political priority
       “How Open Data can be harnessed to help meet the challenge of sustainably feeding nine billion people by 2050”
       April, 2013, Washington, D.C. USA
    7. A huge market, globally
       Food & Agricultural commodities production, http://faostat.fao.org
    8. Some figures
       • Food - Gross Production Value globally in 2011: $2,318,966,621
       • Agriculture - Gross Production Value globally in 2011: $2,405,001,443
       • Investment in agriculture - Gross Capital Stock globally: $5,356,830,000
       … they are big
    9. Open data for businesses
    10. Farmers starting to capitalize on Big Data technology
        • Freeing farmers from the constraints of uncertain factors
          – Dairy farm in UK with ‘connected’ herd
            • anticipating the risks of epidemics and spotting random factors in milk production
          – Monsanto’s new acquisition protects farmers from weather issues
        • The spread of smart sensors
          – Wine-growers in Spain reduced application of fertilizers and fungicides by 20%, accompanied by a 15% improvement in overall productivity using humidity sensors
    11. EDBT Special Track Big Data, Athens, March 2014
    12. BIG DATA IN AGRICULTURE
    13. Agricultural data types I
        • Publications, theses, reports, other grey literature
        • Educational material and content, courseware
        • Research data
          – Primary data, such as measurements & observations
            • structured, e.g. datasets as tables
            • digitized, e.g. images, videos
          – Secondary data, such as processed elaborations, e.g. dendrograms, pie charts, models
        • Sensor data
    14. Agricultural data types II
        • Provenance information, incl. authors, their organizations and projects
        • Experimental protocols & methods
        • Social data, tags, ratings, etc.
        • Germplasm data
        • Soil maps
        • Statistical data
        • Financial data
    15. Big Data demand…
        • Storage
          – High volume storage
          – Impractical or impossible to use centralized storage
        • Distribution
        • Federation
        • Computational power
          – For efficient discovering / querying
          – For aggregating and processing
          – For joining
    16. Rationale: Problem statement
        • Enable the inclusion of:
          – Large, live, constantly updated datasets and streams
          – Heterogeneous data
        • Involve publishers that
          – cannot or will not directly and immediately make the transition to standards and best practices
        Open Agricultural Data Liaison Meeting 30-31/10/2013
    17. Use Cases (DLO): Heterogeneous Data Collections & Streams
        • Big data:
          – Sensor data: soil data, weather
          – GIS data: land usage, forest and natural resources management data
          – Historical data: crop yield, economic data
          – Forecasts: climate change models
        • Problem:
          – Combine heterogeneous sources to analyze past food production and forecast future trends
          – Cannot clone and translate: large scale, live data streams
          – Cannot immediately and directly effect a radical re-design of all sensing and processing currently in place
        3rd Plenary & ESG Meeting 21/10/2013
    18. Use Cases (FAO): Reactive Data Analysis
        • Big data:
          – Document collections: past experiences, analysis and research results
          – Databases: climate conditions and crop yield observations, economic data (land and food prices)
        • Problem:
          – Retrieving complete and accurate information to compile reports
            • Raw data and reports, scientific publications, etc.
          – Wastes human resources that could analyze data and synthesize useful knowledge and advice for food production
            • Too much time spent cross-relating responses from different sources
          – Too many different organizations and processes rely on the different schemas to make re-design viable
          – Cloning is inefficient: large and constantly updated stores
    19. Use Cases (AK): Reactive Resource Discovery
        • Big data:
          – Multimedia content about agriculture and biodiversity
        • Problem:
          – Real-time retrieval of relevant content
          – Used to compile educational activities
          – Schema heterogeneity:
            • Different providers (Organic.Edunet, Europeana, VOA3R, etc.)
          – Too many different organizations and processes rely on the different schemas to make re-design viable
          – Cloning is inefficient: large and constantly updated stores
    20. THE AGINFRA & SEMAGROW SOLUTIONS
    21. The agINFRA project
        • e-infrastructure for agricultural research resources (content/data) and services
        • Higher interoperability between agricultural and other data resources (linked data)
        • Improved research data services and tools using Grid and Cloud resources
    22. agINFRA Grid & Cloud resources
        • PARADOX cluster: 704 CPUs; 50 TB
        • Roma Tre cluster: 350 CPUs; 100 TB
        • Catania cluster: 800 CPUs; 700 TB
        • SZTAKI cluster: 8 CPUs
        • PARADOX upgrade: 1696 CPUs; 100 TB
        • Total: 3.5 kCPU; 0.9 PB
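
As a quick check on the totals quoted on slide 22, the per-cluster figures can be summed directly. This is only a sketch: the variable names are illustrative, and it assumes the PARADOX upgrade is counted in addition to the original PARADOX cluster and that SZTAKI contributes no storage (the slide lists none).

```python
# Figures copied from slide 22 as (CPUs, storage in TB).
# Assumption: SZTAKI has no storage entry on the slide, so it is counted as 0 TB.
clusters = {
    "PARADOX": (704, 50),
    "Roma Tre": (350, 100),
    "Catania": (800, 700),
    "SZTAKI": (8, 0),
    "PARADOX upgrade": (1696, 100),
}

# Sum each column across all five entries.
total_cpus = sum(cpus for cpus, _ in clusters.values())
total_tb = sum(tb for _, tb in clusters.values())

print(f"Total CPUs: {total_cpus}")
print(f"Total storage: {total_tb} TB")
```

Under these assumptions the sums are 3558 CPUs and 950 TB, in line with the slide's rounded-down totals of 3.5 kCPU and about 0.9 petabytes.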
    23. The SemaGrow project
        • Develop novel algorithms and methods for querying distributed triple stores
        • Overcome problems stemming from heterogeneity and unbalanced distribution of data
        • Develop scalable and robust semantic indexing algorithms that can serve detailed and accurate data summaries and other data source annotations about extremely large datasets
    24. The SemaGrow Stack
        • Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
        • Targets the federation of independently provided data sources
        • Uses POWDER to mass-annotate large subspaces
          – W3C recommendation, exploits natural groupings of URIs to annotate all resources in a subset of the URI space
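
The idea on slide 24 of annotating every resource in a subset of the URI space can be illustrated with a toy prefix matcher. This is not POWDER syntax and not SemaGrow code: the `ANNOTATIONS` table, the `describe` helper, and the `example.org` URIs are all hypothetical, and plain string prefixes stand in for POWDER's richer IRI-set definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical annotations: each URI prefix stands for a "natural grouping"
# of resources, and the dict holds the description shared by that group.
ANNOTATIONS: List[Tuple[str, Dict[str, str]]] = [
    ("http://example.org/soil/", {"topic": "soil maps", "source": "source-1"}),
    ("http://example.org/crops/", {"topic": "crop yield", "source": "source-2"}),
    ("http://example.org/crops/wheat/", {"crop": "wheat"}),
]


def describe(uri: str) -> Dict[str, str]:
    """Merge the descriptions of every prefix group that contains the URI."""
    merged: Dict[str, str] = {}
    # Apply shorter prefixes first so more specific groups can add detail.
    for prefix, description in sorted(ANNOTATIONS, key=lambda a: len(a[0])):
        if uri.startswith(prefix):
            merged.update(description)
    return merged


print(describe("http://example.org/crops/wheat/2013"))
```

Three rules here describe any number of resources, which is the point of annotating groups rather than individual URIs when the datasets are extremely large.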
    25. Moving Forward
        [Diagram with two panels: "NOW (2012): CASE OF AGRICULTURAL INFRASTRUCTURES" and "2015 (AgINFRA): CASE OF AGRICULTURAL INFRASTRUCTURES"]
        Diagram labels: HARVESTER; OAI-PMH Service Provider #1 … #n; Schema #1 … #n (AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...); INDEXER; Aggregated XML Repository; Web Portals (Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...); RDF Triple Store; Common Schema; SPARQL endpoint (Data Source #1 … #n); SPARQL endpoint
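
The harvester on slide 25 collects metadata from OAI-PMH service providers. A minimal sketch of how such a request could be formed is given below, assuming a hypothetical repository at `http://example.org/oai`; the helper name `list_records_url` is illustrative. The `verb`, `metadataPrefix`, and `resumptionToken` arguments are the standard OAI-PMH ones, and the sketch only builds the URL without contacting any server.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for one harvesting step."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb only, to fetch the next page of a partial response.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# First request, then a follow-up using a token from the previous response.
print(list_records_url("http://example.org/oai"))
print(list_records_url("http://example.org/oai", resumption_token="abc"))
```

A harvester would repeat the second form until the provider stops returning a token, which is the cloning step that the use-case slides describe as inefficient for large, constantly updated stores.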
    26. What Semantic Web can bring into the picture
        [Architecture diagram of the SemaGrow federated endpoint]
        Diagram components: Client; Query Manager; Federated endpoint Wrapper; SemaGrow SPARQL endpoint; Resource Discovery; Query Decomposer (Query Decomposition); Data Source(s) Selector; Candidate Source(s) List; Resource Selector; Query Transformation Service; Schema Mappings; Query Pattern Discovery Service; Query Results Merger; POWDER Inference Layer; P-Store; Instance Statistics; Data Summaries; Load Info; Semantic Proximity; Reactivity parameters; SPARQL endpoint (Data Source #1 … #n)
        Diagram flows: SPARQL query; query patterns; equivalent patterns; query fragment, source; transformed query; schema, transformed schema; query request #1 … #n; query results; Ctrl
        • One Data Access Point for the entire Data Cloud
          – Enabling Service-Data level agreements with Data providers
        • Application-level Vocabularies / Thesauri / Ontologies
          – Enabling different application facets for different communities of users over the SAME data pool
        • Going beyond existing Distributed Triple Store Implementations
          – Link Heterogeneous but Semantically Connected Data
          – Index Extremely Large Information Volumes (Peta Sizes)
          – Improve Information Retrieval response
        • Data (+Metadata) physically stored in Data Provider
          – No need for harvesting
        • Vocabularies / Thesauri / Ontologies of Data Provider choice
          – No need for aligning according to common schemas
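
The flow named on slide 26 (decompose the query, select candidate sources using instance statistics, send fragments, merge results) can be sketched in a few lines. This is a toy model and not the SemaGrow implementation: the in-memory `SOURCES`, the predicate-count statistics, and all function names are assumptions, queries are reduced to patterns of the form `?s <predicate> ?o` joined on `?s`, and each subject is assumed to have one value per predicate.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-memory "sources"; in a real federation each one would be
# a remote SPARQL endpoint kept by its own data provider.
SOURCES: Dict[str, List[Triple]] = {
    "source-1": [("ex:field1", "ex:cropYield", "4.2"),
                 ("ex:field2", "ex:cropYield", "3.9")],
    "source-2": [("ex:field1", "ex:soilType", "clay")],
}


def predicate_counts(triples: List[Triple]) -> Dict[str, int]:
    """Stand-in for instance statistics: triples per predicate."""
    counts: Dict[str, int] = {}
    for _, predicate, _ in triples:
        counts[predicate] = counts.get(predicate, 0) + 1
    return counts


STATISTICS = {name: predicate_counts(t) for name, t in SOURCES.items()}


def select_sources(predicate: str) -> List[str]:
    """Source selection: keep only sources known to hold the predicate."""
    return sorted(n for n, c in STATISTICS.items() if c.get(predicate, 0) > 0)


def run_fragment(source: str, predicate: str) -> List[Tuple[str, str]]:
    """Evaluate one query fragment (?s <predicate> ?o) at one source."""
    return [(s, o) for s, p, o in SOURCES[source] if p == predicate]


def federated_query(predicates: List[str]) -> Dict[str, Dict[str, str]]:
    """Decompose into one pattern per predicate, then merge by subject."""
    merged: Dict[str, Dict[str, str]] = {}
    for predicate in predicates:
        for source in select_sources(predicate):
            for subject, obj in run_fragment(source, predicate):
                merged.setdefault(subject, {})[predicate] = obj
    # Inner join on ?s: keep subjects that matched every pattern.
    return {s: row for s, row in merged.items() if len(row) == len(predicates)}


print(federated_query(["ex:cropYield", "ex:soilType"]))
```

Only `ex:field1` has both a yield and a soil type, so it is the only subject in the merged result. The data stays at the two sources throughout, which mirrors the slide's "no need for harvesting" point.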
    27. SUPPORTING GLOBAL INITIATIVES
    28. • Global Open Data for Agriculture and Nutrition (GODAN), godan.info
        • Research Data Alliance (RDA), rd-alliance.org
          – Agricultural Data Interoperability Interest Group
          – Wheat Data Interoperability Working Group
        • CIARD, global movement dedicated to open agricultural knowledge, www.ciard.net
        • e-Conference on Germplasm Data Interoperability
    29. Thank you!
        Contact: Andreas Drakos, drakos@agroknow.gr