BigDataGrapes_Wine Making Pilot

Coraline Damasio's presentation on the progress of the Wine Making Pilot, given at the BigDataGrapes workshop "Big Data for the Grapevine Industries" in Pisa, Italy (8/3/2019).

  1. Data and Modeling: Plants / Environment and Interactions. Pascal Neveu, INRA Montpellier, 01/15/2018
  2. UMR MISTEA Montpellier: Mathematics, Informatics and STatistics for Agriculture and Environment → particular emphasis on dynamics
     • 2 institutions: INRA and SupAgro
     • Member of two Labex: Numev and Agro
     • Member of the French institute #DigitAg
     • Affiliated with MUSE (Montpellier University)
     • Current staff: over 40, including 22 permanent
     • 2 teams:
       – GAMMA: Gestion Analyse et Modèles pour les Masses de données en Environnement et Agronomie (Management, Analysis and Models for Big Data in Environment and Agriculture)
       – MOCAS: Modélisation Optimisation Commande pour les Agro-écoSystèmes (Modeling, Optimization, Control for AgroEcosystems)
  3. UMR MISTEA: Data Sciences for Big Data and Modeling in Agriculture. Method keywords:
     ● Statistics: high-dimensional data, curve clustering, functional data analysis → frequentist and Bayesian approaches + data integration, knowledge discovery
     ● Artificial intelligence and information systems: ontologies, data semantics, reasoning, interoperability, scalability, constraint programming
     ● Dynamic systems: ODEs, optimisation, risk management, stochastic processes, asymptotic analysis, simulation algorithms, sensitivity analysis, model reduction
  4. Method development for:
     • High-Throughput Plant Phenotyping
     • Digital Agriculture
     • Agroecology
     • Food transformation
     • Environmental interactions
     For understanding and decision support
  5. WWW.BIGDATAGRAPES.EU Wine Making Pilot: data sources & linkability (INRA Wine Pilot)
  6. Data Integration for High-Throughput Plant Observation. Plant Sciences: the need is to integrate complex and heterogeneous data, keep all the information from every experiment, and be able to compare data → data have value when they are grouped
  7. Heterogeneous Data Sources (Pascal Neveu, 11 April 2013)
  8. Data Must Be Grouped
  9. Plant Sciences Data: how to structure data?
  10. OpenSILEX: open source software set
      ● Methods, tools and components to implement information systems for experimental data in agriculture and environment
      ● Organized system for the collection, structuring, storage, exchange and processing of information
      OpenSILEX implementations:
      ● SILEX Viti-oeno
      ● SILEX LBE
      ● PHIS
      Diagram labels: Data, Methods, People, Software, Hardware
  11. OpenSILEX - PHIS: information system for high-throughput phenotyping
      Designed for data management in phenotyping platforms:
      ● Management of huge, complex and heterogeneous data (millions of images, sensor data, etc.)
      ● Ecophysiology and agronomy
      Implements good practices of data management:
      ● Make data FAIR
      ● Foster collaborations (open and flexible)
      ● Ability to understand and reproduce data processing
      ● Ability to enforce DMP and Open Data
  12. OpenSILEX - PHIS: semantics to structure data. Metadata and ontologies provide the meaning of data → link each data element to a controlled, shared and machine-readable vocabulary. [Diagram: nodes #mauguio5, #merlot, #s2351, #Plot and Cultivated Land (Agrovoc/FAO reference thesaurus/ontology), connected by "is-a", "subclass of" and "within" relations]
  13. OpenSILEX - PHIS architecture. [Diagram labels: Scientific Computation and Workflow layer, Web API layer, Data layer, NoSQL database, triplestore, EMPHASIS layer, Semantic Services, e-infrastructure layer, distributed storage system, Web User Interface, software agents]
  14. Knowledge Discovery Illustration: PHIS provides contextualisation: intercepted light value
  15. Knowledge Discovery Illustration: a common relationship between leaf width and intercepted light per plant accounted for variations in width between fields, and for the difference between field and greenhouse
  16. OpenSILEX - PHIS: plant and object information
  17. OpenSILEX - PHIS: environmental sensor
  18. OpenSILEX - PHIS: trait measurements and associated images
  19. OpenSILEX - PHIS: event annotation
  20. OpenSILEX - PHIS: data visualization
  21. OpenSILEX - PHIS: data analysis
  22. OpenSILEX - PHIS: workflow management
  23. OpenSILEX, in short:
      • Allows management of huge and complex data
      • Enables and facilitates cloud computing (data center, EGI) → distributed computing, distributed storage, backup
      • Open technologies
      • International identification (URI and DOI)
      • Semantics (ontologies, standardized vocabularies)
      • Portal interoperability and open technologies
      • Provenance and reproducibility of data processing
      • Flexible design
      • MISTEA team: support and development
      www.opensilex.org
  24. Methodologies
      • More and more data available
      • Challenge: understanding the data
  25. Goal
      • Discover keys in numerical data
        – Keys: combinations of properties that discriminate a resource
      • Evaluate their quality
      • Experimental numerical data in 3 wine flavour datasets (2011-2014)
      How do we discriminate between the wines?
  26. Data pre-processing
      • Objective: interpret numerical data in a symbolic way
      • Solution: use quantiles to group data values
        – Quantiles: cut points dividing a set of observations into equal-sized groups

      Wine    pH (initial data)   Quantile group (transformed data)
      Wine1   3.15                1
      Wine2   3.22                1
      Wine3   3.23                2
      Wine4   3.24                2
      Wine5   3.56                3
      Wine6   3.68                3
  27. Conclusion: we proposed a first step towards interdisciplinary research:
      • A method for key discovery in numerical data
      • Deals with different quality measures and with both symbolic and numerical data
      • A validation step using real data on wine flavours
  28. Data analytics
      • Provide decision support for watering
      • Explain the production of black truffles with precipitation and temperature
      • Functional data method: the scalar production value is associated with a series of precipitation values and a series of temperature values
  29. Data analytics
  30. Conclusions and perspectives
      • Management of huge and complex data
      • Enables and facilitates cloud computing (data center, EGI) → distributed computing, distributed storage, backup
      • Open technologies
      • Shared data analysis tools
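
The quantile-based pre-processing of slide 26 and the notion of a key from slide 25 can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the method used by the authors: the slides do not say how ties are handled or which key discovery algorithm is applied, so the rank-based binning and the brute-force search below are assumptions. The pH values are those of slide 26; the `acidity` column and all function names are hypothetical and exist only to make the example self-contained.

```python
from itertools import combinations


def quantile_bins(values, k):
    """Map each value to a group 1..k by rank, so groups are roughly equal-sized."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: equal values share the group of their first occurrence.
    first_rank = {}
    for rank, v in enumerate(ordered):
        first_rank.setdefault(v, rank)
    return [first_rank[v] * k // n + 1 for v in values]


def is_key(rows, props):
    """True if the combination of properties distinguishes every row."""
    seen = set()
    for row in rows:
        signature = tuple(row[p] for p in props)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def minimal_keys(rows, props):
    """Brute-force search for minimal keys (illustrative, exponential in len(props))."""
    keys = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            # Skip supersets of a key that was already found.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_key(rows, combo):
                keys.append(combo)
    return keys


# pH values for Wine1..Wine6, taken from slide 26.
ph = [3.15, 3.22, 3.23, 3.24, 3.56, 3.68]
# Hypothetical second property, invented for this example only.
acidity = [5.1, 6.3, 5.4, 6.8, 5.0, 6.5]

ph_bins = quantile_bins(ph, 3)            # three groups, as on the slide
acidity_bins = quantile_bins(acidity, 2)  # two groups (arbitrary choice)

rows = [{"ph": p, "acidity": a} for p, a in zip(ph_bins, acidity_bins)]
print(ph_bins)
print(minimal_keys(rows, ["ph", "acidity"]))
```

With these inputs, neither discretized property distinguishes the six wines on its own, while the pair does. That is the sense in which a combination of properties acts as a key once numerical values have been turned into symbolic groups.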
