SC2 Workshop 1: INRA's Big Data perspectives and implementation challenges

Opening presentation to the Big Data Europe (BDE) Workshop of Societal Challenge 2 (Food security, sustainable agriculture and forestry, marine and maritime and inland water research, and the Bioeconomy) on 22.9.2015 in Paris by Pascal Neveu (INRA).

Published in: Data & Analytics
  1. INRA's Big Data perspectives and implementation challenges
     Pascal Neveu, UMR MISTEA, INRA - Montpellier
  2. What is INRA? (Pascal Neveu, 11 April 2013)
     French Institute for Agricultural Research
     ● Specialties: agriculture, environment, food
     ● 10,000 people (4,000 researchers)
     ● 18 centres (> 40 sites)
  3. What is INRA?
     Technical staff -> data producers
     INRA data characteristics:
     – Spread over the territory
     – Many disciplines
     – National and international collaborations
  4. Agronomic Sciences (Pascal Neveu / AG MIA 2014)
     Raise integrated issues and challenges:
     – How to adapt agriculture to climate change?
     – How does agriculture impact the environment?
     – Agroecology: «producing and supplying food in a different way»
     – Global food security and adaptation needs
     – Plant treatment and food safety
     – ...
  5. Data challenges in science
     Modern science must deal with:
     ● More data production
     ● Many experimental datasets available on the Web
     ● More collaborative and integrative approaches
     → Data management, sharing and analysis play an increasing role in research
     Discovering, combining and analysing these data → Big Data challenges
  6. Agronomic Big Data: "V" characteristics
     ● Volume: massive and growing data → hard to store, manage and analyse
     ● Variety and complexity: different sources, scales and disciplines; different semantics, schemas, formats, etc. → hard to understand, combine and integrate
     ● Velocity: speed of data generation → data have to be processed online
     ● Validity, Veracity, Vulnerability, Volatility, Visibility, Visualisation, etc.
  7. Why is Big Data important in agronomic sciences?
     Production of a lot of heterogeneous data for understanding
     ● Opens up new insights
     ● Lets us know:
       – Which theories are consistent and which ones are not
       – When data do not quite match what we expect
     → Decision support needs (integrative and predictive approaches)
  8. Illustration: high-throughput phenotyping
     High throughput?
     ● Many environments
     ● Many plant genotypes
     ● High frequency and many trait observations of phenotypes
     ● Interactions
  9. Why is high-throughput phenotyping important for agriculture?
     ● Adaptation to climate change
     ● More efficient use of natural resources (including water and soil) in our farming practices
     ● Sustainable management and equity
     ● Food security, crop performance (yields are globally decreasing)
     ● …
     Genotyping and phenotyping: plant phenotyping has become a bottleneck for progress in plant science and plant breeding
  10. Phenome: French infrastructure for high-throughput plant phenotyping
     9 multi-species platforms:
     ● 2 controlled platforms
     ● 5 field platforms
     ● 2 high-throughput omics platforms
  11. 5 field platforms
     Various scales and data types:
     ● Cell, organ, plant, canopy, population
     ● Images, hyperspectral, spectral, sensors, actuators, human readings...
     Thousands of micro-plots
     [Figure; axis: time]
  12. 2 controlled platforms
     Various scales and data types
     [Figure: plant biomass (g) versus time (d)]
  13. 2 «Omics» platforms
     ● Pipeline: grinding, weighing (-80°C), extraction, fractionation, pipetting, incubation, reading
     Various complex data types:
     ● Composition and structure of biopolymers
     ● Quantification of metabolites and enzyme activities
  14. Data management challenges in Phenome: volume growth
     40 Tbytes in 2013, 100 Tbytes in 2014, …
     ● Volume is a relative concept
       – Exponential growth makes storage, management and analysis hard
     Phenome HPC and storage → Cloud (France Grilles, EGI)
     – Easy to use, with a sort of «unlimited scalability»
     – On-demand infrastructure and elasticity (seasonal load)
     – Virtualization technologies
     – Data-based parallelism (same operation on different data)
  15. Data management challenges in Phenome: variety
     – Data can be produced by different communities (geneticists, ecophysiologists, farmers, breeders, etc.)
     – Data integration needs extensive connections to other types of data (genotypes, environments, experimental methods, etc.)
     – Different semantics, data schemas, …
     – Data can be associated in many ways (environments, individuals, populations, etc.)
     ● Extremely diverse data → Web API, ontology sets, NoSQL and Semantic Web methods
  16. Data management in Phenome: velocity
     – Controlled platforms produce tens of thousands of images/day (200 days per year)
     – Field platforms produce tens of thousands of images/day (100 days per year)
     – Omics platforms produce tens of Gbytes/day (300 days per year)
     Scientific workflow:
     – OpenAlea / provenance module (Virtual Plant INRIA team)
     – Scifloware (Zenith INRIA team)
  17. Data management challenges in Phenome: validity
     Data cleaning
     ● Automatically diagnose and manage:
       – Consistency? Duplicates? Wrong values?
       – Annotation consistency?
       – Outliers?
       – Disguised missing data?
       – ...
     Some approaches:
     – Unsupervised curve clustering (Zenith INRIA team)
     – Curve fitting under dynamic constraints
     – Clustering of image histograms
  18. Conclusion
     High-throughput phenotyping data are:
     – Hard to produce
     – Hard to manage
     – Also hard to analyse
     Thank you for your attention
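
Slide 14 gives two data points for the Phenome archive (40 Tbytes in 2013, 100 Tbytes in 2014) and calls the growth exponential. As a rough illustration only, the sketch below extrapolates those two figures under the assumption that the year-over-year factor stays constant; the slides do not state this, and the function name `projected_volume_tb` is hypothetical.

```python
# Archive size in terabytes, as quoted on slide 14.
observed_tb = {2013: 40, 2014: 100}

# Assumption (not from the slides): the year-over-year factor stays constant.
growth_factor = observed_tb[2014] / observed_tb[2013]


def projected_volume_tb(year: int) -> float:
    """Extrapolate the archive size from the 2014 figure, assuming geometric growth."""
    return observed_tb[2014] * growth_factor ** (year - 2014)


for year in range(2013, 2018):
    print(year, round(projected_volume_tb(year), 1), "TB")
```

Under that assumption the archive grows by a factor of 2.5 per year, which is why elastic, on-demand storage is attractive here; real growth would depend on how many platforms and campaigns come online.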

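Slide 14 also mentions data-based parallelism, meaning the same operation applied to different data. A minimal sketch of that pattern follows, using a thread pool from the Python standard library. The per-image operation `mean_intensity` and the toy images are hypothetical stand-ins, not part of the Phenome pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_intensity(image):
    """Hypothetical per-image operation: average value of a grayscale image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def process_batch(images, workers=4):
    """Apply the same operation to every image; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mean_intensity, images))


# Toy stand-ins for platform images (nested lists of grayscale values).
images = [
    [[0, 0], [0, 0]],
    [[10, 20], [30, 40]],
    [[255, 255], [255, 255]],
]
batch_means = process_batch(images)
print(batch_means)
```

Because each image is processed independently, the same structure carries over to a process pool or to cloud workers; threads are used here only to keep the example short, and they would not speed up CPU-bound pure-Python work.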
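
The data-cleaning checks listed on slide 17 (duplicates, disguised missing data, outliers) can be illustrated with a small sketch. This is not the curve clustering or curve fitting mentioned in the slides: it swaps in a simpler, standard technique, the modified z-score based on the median absolute deviation. The sentinel codes, the record layout `(plant_id, day, biomass_g)` and the function name `diagnose` are assumptions made for the example.

```python
from statistics import median

# Assumed placeholder codes that hide missing values (not from the slides).
SENTINELS = {-999.0, -1.0, 9999.0}


def diagnose(readings, threshold=3.5):
    """Label each (plant_id, day, biomass_g) reading: ok, duplicate, missing or outlier."""
    valid = [v for _, _, v in readings if v not in SENTINELS]
    med = median(valid)
    mad = median(abs(v - med) for v in valid)  # median absolute deviation
    labels = []
    seen = set()
    for plant_id, day, value in readings:
        key = (plant_id, day)
        if key in seen:
            labels.append("duplicate")
        elif value in SENTINELS:
            labels.append("missing")
        # Modified z-score; 0.6745 rescales the MAD to a standard deviation.
        elif mad > 0 and 0.6745 * abs(value - med) / mad > threshold:
            labels.append("outlier")
        else:
            labels.append("ok")
        seen.add(key)
    return labels


readings = [
    ("p1", 30, 12.1),
    ("p2", 30, 11.8),
    ("p3", 30, -999.0),  # disguised missing value
    ("p4", 30, 12.4),
    ("p5", 30, 95.0),    # implausible biomass
    ("p1", 30, 12.1),    # repeated record
    ("p6", 30, 12.0),
]
labels = diagnose(readings)
print(list(zip(readings, labels)))
```

A median-based score is used because a single extreme reading barely shifts the median, whereas it would inflate a mean and standard deviation. For time series such as growth curves, the curve-based approaches named on the slide would be the more natural fit.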