Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
INRA's Big Data perspectives
and implementation challenges
Pascal Neveu
UMR MISTEA
INRA - Montpellier
11 avril 2013
Pascal Neveu 2
What is INRA?
French Institute for Agricultural Research
• Specialties:
– Agriculture
– Envir...
11 avril 2013
Pascal Neveu 3
What is INRA ?
Technical staff (4000) --> data producer
INRA Data characteristics:
- Spread o...
Pascal Neveu / AG MIA 2014 4
Raise integrated issues and challenges:
– How to adapt agriculture to climate change?
– How a...
11 avril 2013
Pascal Neveu 5
Data challenges in Science
Modern science must deal with:
● More experimental data
● A lot of...
11 avril 2013
Pascal Neveu 6
Big Data
A definition: data sets grow so large and complex that it becomes
impossible to proc...
11 avril 2013
Pascal Neveu 7
Agronomic Big Data
V characteristics
● Volume: massive data and growing size
→ hard to store,...
11 avril 2013
Pascal Neveu 8
Why Big Data is important in
Agronomic Sciences?
Production of a lot of heterogeneous data fo...
11 avril 2013
Pascal Neveu 9
Illustration: Plant Phenotyping
Environment
Plant Genotype
Phenotype
Interactions
11 avril 2013
Pascal Neveu 10
Searching for the most adapted genotypes
High throughput
Various Environments
Many Plant Gen...
11 avril 2013
Pascal Neveu 11
Why high throughput phenotyping
is important for agriculture?
● Adaptation to climate change...
11 avril 2013
Pascal Neveu 12
Phenome
High throughput plant phenotyping
French Infrastructure
9 multi-species platforms
● ...
11 avril 2013
Pascal Neveu 13
5 Field Platforms
Various scales and data types
● Cell, organ, plant, population
● Images, h...
11 avril 2013
Pascal Neveu 14
2 Green house Platforms
-1
Time (d)
0 20 40 60 80 100
0
10
20
30
40
50
60
Plantbiomassg
Vari...
11 avril 2013
Pascal Neveu 15
2 « Omics » platforms
●
Grinding
weighting (-80°C)
Extraction
Fractionation
Pipetting
Incuba...
11 avril 2013
Pascal Neveu 16
Data management challenges
in Phenome: Volume growth
40 Tbytes in 2013, 100 Tbytes in 2014, ...
11 avril 2013
Pascal Neveu 17
Data management challenges
in Phenome: Variety
● Produced by different communities (genetici...
11 avril 2013
Pascal Neveu 18
Data management in Phenome:
Velocity
– Green House platforms produce tens of thousands
image...
11 avril 2013
Pascal Neveu 19
Data management challenges
in Phenome: Validity
Data cleaning
● Automatically diagnose and m...
11 avril 2013
Pascal Neveu 20
Conclusion
High throughput phenotyping data:
– Hard to produce
– Hard to manage
– Hard to an...
Upcoming SlideShare
Loading in …5
×

Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges

716 views

Published on

The AIMS team in cooperation with the BigDataEurope consortium is pleased to announce the webinar “INRA's Big Data Perspectives and Implementation Challenge”. It will present perspectives and implementation challenges of big data for agronomic research organisations. It is open for information management specialists, librarians, software developers and other interested people.

Abstract

"INRA's Big Data Perspectives and Implementation Challenges" webinar will give a perspective on big data and implementation challenges for agronomic research organisations such as the French National Institute for Agricultural Research (INRA). The presenter will focus on the characteristics of big data in agriculture using a specific case study on high throughput plant phenotyping where a lot of heterogeneous data need to be generated and managed. This use case shows how to deal with variety of data.

Published in: Technology
  • Be the first to comment

Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges

  1. 1. INRA's Big Data perspectives and implementation challenges Pascal Neveu UMR MISTEA INRA - Montpellier
  2. 2. 11 avril 2013 Pascal Neveu 2 What is INRA? French Institute for Agricultural Research • Specialties: – Agriculture – Environment – Food ● 10000 Persons (4000 researchers) ● 18 Centres (> 40 sites)
  3. 3. 11 avril 2013 Pascal Neveu 3 What is INRA ? Technical staff (4000) --> data producer INRA Data characteristics: - Spread over territory - Many disciplines - National and international collaborations
  4. 4. Pascal Neveu / AG MIA 2014 4 Raise integrated issues and challenges: – How to adapt agriculture to climate change? – How agriculture impacts environment? – Agroecology «producing and supplying food in a different way » – Global food security and needs of adaptation – Plant treatment and food safety – ... Agronomic Sciences
  5. 5. 11 avril 2013 Pascal Neveu 5 Data challenges in Science Modern science must deal with: ● More experimental data ● A lot of datasets available on the Web ● More collaborative and integrative approaches → Management, sharing and data analysis play an increasing role in research Discover, combine and analyse these data → Big Data Challenges
  6. 6. 11 avril 2013 Pascal Neveu 6 Big Data A definition: data sets grow so large and complex that it becomes impossible to process using traditional data processing methods (Management and Analysis) Make data valuable (information, knowledge, decision support)
  7. 7. 11 avril 2013 Pascal Neveu 7 Agronomic Big Data V characteristics ● Volume: massive data and growing size → hard to store, manage and analyze ● Variety and Complexity: different sources, scales, disciplines different semantics, schemas and formats etc. → hard to understand, combine, integrate ● Velocity: speed of data generation → have to be processed on line ● Validity, Veracity, Vulnerability, Volatility, Visibility, Visualisation, etc.
  8. 8. 11 avril 2013 Pascal Neveu 8 Why Big Data is important in Agronomic Sciences? Production of a lot of heterogeneous data for understanding ● Open new insights ● Allow to know: – Which theories are consistent and which ones are not! – When data did not quite match what we expect… → Decision support needs (integrative and predictive approaches)
  9. 9. 11 avril 2013 Pascal Neveu 9 Illustration: Plant Phenotyping Environment Plant Genotype Phenotype Interactions
  10. 10. 11 avril 2013 Pascal Neveu 10 Searching for the most adapted genotypes High throughput Various Environments Many Plant Genotypes High frequency and dynamic trait observations of a lot of Phenotypes Interactions
  11. 11. 11 avril 2013 Pascal Neveu 11 Why high throughput phenotyping is important for agriculture? ● Adaptation to climate change ● More efficient use of natural resources (including water and soil) in our farming practices ● Sustainable management and equity ● Food security Crop performance (yields are globally decreasing) ● … Genotyping and Phenotyping Plant phenotyping has become a bottleneck for progress in plant science and plant breeding
  12. 12. 11 avril 2013 Pascal Neveu 12 Phenome High throughput plant phenotyping French Infrastructure 9 multi-species platforms ● 2 green house platforms ● 5 field platforms ● 2 omics platforms
  13. 13. 11 avril 2013 Pascal Neveu 13 5 Field Platforms Various scales and data types ● Cell, organ, plant, population ● Images, hyperspectral, spectral, sensors, human readings... Thousands of micro-plots time
  14. 14. 11 avril 2013 Pascal Neveu 14 2 Green house Platforms -1 Time (d) 0 20 40 60 80 100 0 10 20 30 40 50 60 Plantbiomassg Various scales and data types time
  15. 15. 11 avril 2013 Pascal Neveu 15 2 « Omics » platforms ● Grinding weighting (-80°C) Extraction Fractionation Pipetting Incubation Reading Various data complex types Composition and structure Quantification of metabolites and enzyme activities
  16. 16. 11 avril 2013 Pascal Neveu 16 Data management challenges in Phenome: Volume growth 40 Tbytes in 2013, 100 Tbytes in 2014, … ● Volume is a relative concept – Exponential growth makes hard ● Storage ● Management ● Analysis Phenome HPC and Storage→ Grid computing (FranceGrille, EGI) – On-demand infrastructure and Elasticity – Virtualization technologies – Data-Based parallelism (same operation on different data)
  17. 17. 11 avril 2013 Pascal Neveu 17 Data management challenges in Phenome: Variety ● Produced by different communities (geneticians, ecophysiologists, farmers, breeders, etc) ● Data integration needs extensive connections and associations to other types of data ● (environments, individuals, populations, etc.) ● Different semantics, data schemas, … Extremely diverse data → Web API, Ontology sets, NoSQL and Semantic Web methods
  18. 18. 11 avril 2013 Pascal Neveu 18 Data management in Phenome: Velocity – Green House platforms produce tens of thousands images/day (200 days/year) – Field platforms produce tens of thousands images/day (100 days/year) – Omic platforms produce tens of Gbytes/day (300 days/year) Scientific Workflow – OpenAlea /provenance module (Virtual Plant INRIA team) – Scifloware (Zenith INRIA team)
  19. 19. 11 avril 2013 Pascal Neveu 19 Data management challenges in Phenome: Validity Data cleaning ● Automatically diagnose and manage: – Consistency? Duplicates? Wrong? – Annotation consistency? – Outliers? – Disguised missing data? – ... Some approaches – Unsupervised Curve clustering (Zenith INRIA team) – Curve fitting over dynamic constraints – Image Clustering
  20. 20. 11 avril 2013 Pascal Neveu 20 Conclusion High throughput phenotyping data: – Hard to produce – Hard to manage – Hard to analyse → We need new approaches Thank you for your attention

×