In a nutshell…
• Laying a foundation: highly organized data and data culture
• Three exceptionally powerful examples
• Key messages looking forwards – for discussion
What do we propose?
Climate + Soil + Crop management (including varieties) = Productivity per hectare
% ? + % ? + % ? = Variation to explain (100 %)
A complementary bottom-up approach: information from commercial fields,
taking advantage of modern information technologies.
Empirical modelling approaches, mostly based on machine learning techniques,
that aim to identify the combinations of factors leading to either high or
low productivity.
Data-driven agronomy to optimize productivity in agricultural systems!
Crop response
Tremendous Analytical Challenges
Machine Learning
Artificial neural nets
Random Forest
Multiple linear regression
Kohonen self-organizing maps
Conditional Forest
Factorial analysis
Generalised linear models
Mixed models
Multivariate analysis for Saldaña (research station, Andean zone): cropping
events (2007 to 2012), irrigated rice. Technique: C-Forest
Our findings:
• Fedearroz 733 (N=267): 34 % of productivity variation explained
• Cimarron Barinas (N=78): 56 % of productivity variation explained
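One way to read "% of productivity variation explained" is as a coefficient of determination (R²) expressed as a percentage. The slides do not state exactly how the figure was computed, so the sketch below is only an illustration under that assumption. The function name and the yield values are hypothetical and are not data from the Saldaña analysis.

```python
def percent_variation_explained(observed, predicted):
    """Return 100 * R^2: the share of variance in `observed` captured by `predicted`."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed yields around their mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model predictions.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have no variation to explain")
    return 100.0 * (1.0 - ss_resid / ss_total)


# Hypothetical yields (t/ha) and model predictions, NOT data from the study.
observed_yield = [4.0, 5.0, 6.0, 7.0, 8.0]
predicted_yield = [5.0, 5.0, 6.0, 7.0, 7.0]
print(round(percent_variation_explained(observed_yield, predicted_yield), 1))
```

Under this reading, a value of 34 % means that roughly two thirds of the yield differences between cropping events remain unaccounted for by the variables in the model.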
Genomics – CASSAVA BIG DATA
[Figure: groups numbered 1 to 8, compared across three marker sets]
> 18,000 RAD-seq
93 Fluidigm SNPYCHIP
93 SNPYalleles - RAD database
Mutations in restriction sites and current analytical
approaches reduce estimates of observed heterozygosity.
RAD may currently only be suitable for establishing
shallow relationships in population genetics studies.
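The restriction-site effect can be illustrated with a toy example. When a mutation removes the restriction site next to one allele, that allele is never sequenced, so a true heterozygote is called as a homozygote. The sketch below assumes this simple "allele dropout" model and uses hypothetical genotypes and function names. It does not model the effect of the analytical approaches also mentioned above.

```python
def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two called alleles differ."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def apply_allele_dropout(genotypes, dropped_allele):
    """Simulate a restriction-site mutation linked to `dropped_allele`.

    Copies of the dropped allele are never sequenced, so a true
    heterozygote is called as a homozygote for the other allele.
    Individuals homozygous for the dropped allele yield no call at all.
    """
    called = []
    for a, b in genotypes:
        kept = [x for x in (a, b) if x != dropped_allele]
        if len(kept) == 2:
            called.append((a, b))              # unaffected genotype
        elif len(kept) == 1:
            called.append((kept[0], kept[0]))  # false homozygote
        # len(kept) == 0: missing data, the individual is skipped
    return called


# Hypothetical true genotypes at one locus (alleles "A", "B", "C").
true_genotypes = [("A", "B"), ("A", "C"), ("B", "C"),
                  ("A", "A"), ("C", "C"), ("B", "A")]
called_genotypes = apply_allele_dropout(true_genotypes, dropped_allele="C")

print(observed_heterozygosity(true_genotypes))    # true value
print(observed_heterozygosity(called_genotypes))  # lower value after dropout
```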
355 LAC landraces analyzed
10 to 15 TB of genomic data
collected for 1,500 landraces,
wild and improved materials
Forest monitoring
In 2006, Brazil was the only tropical country that monitored deforestation.
Methods to detect deforestation only worked for dense humid forests.
There was no consistent estimate of deforestation trends worldwide (figures were based on
statistics provided by governments).
Vegetation identification and monitoring
• 2 satellites carrying the MODIS sensor (Aqua and Terra)
• They take a picture of the globe daily
• At a 250 m spatial resolution (6.25 ha per pixel)
• We use 16-day composite images to reduce the effect of clouds
• 390 billion individual values were analyzed
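Two of the points above lend themselves to a short illustration: the pixel area implied by a 250 m resolution, and how a 16-day composite reduces cloud effects. The slides do not say which compositing rule is used. The sketch below assumes maximum-value compositing, one common rule, and the function names and sample readings are hypothetical.

```python
def pixel_area_ha(resolution_m):
    """Area of a square pixel in hectares (1 ha = 10,000 m^2)."""
    return resolution_m ** 2 / 10_000


def max_value_composite(daily_values):
    """Combine daily observations of one pixel into a single value.

    `daily_values` holds one vegetation-index reading per day, with None
    where the pixel was flagged as cloudy. Returns the highest clear-sky
    reading, or None if every day in the window was cloudy.
    (Assumed rule: maximum-value compositing.)
    """
    clear = [v for v in daily_values if v is not None]
    return max(clear) if clear else None


print(pixel_area_ha(250))  # 6.25

# Hypothetical 16-day window for one pixel; None marks cloudy days.
window = [None, 0.41, None, None, 0.62, 0.58, None, 0.60,
          None, None, 0.55, None, 0.61, None, None, 0.59]
print(max_value_composite(window))  # 0.62
```

Even though 9 of the 16 days in this made-up window are cloudy, the composite still yields one usable value for the pixel.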
Detections: Jan 2004 to Oct 2012
Context – Method – Results – Impact – Conclusions
We generate a new map
of deforestation every 16
days with a resolution of
250 m for all of Latin
America
Impact
• Data used for a publication in Science
• Data used by independent media and
platforms such as Global Forest Watch
• www.terra-i.org
• +1900 users
• +250 organizations
• Terra-i Peru is now the official alert
system used by the Peruvian
government.
Big Data: A behavior change
• YES, big data requires large amounts of data and therefore big
servers, BUT it is much more than that:
• REUSING the data: extracting embedded knowledge from existing
datasets to answer questions unrelated to the purpose for which
the data were originally captured.
• COMBINING datasets that were never meant to meet makes it
possible to relate more variables and uncover useful correlations.
• ANALYZING with CREATIVITY: the data scientist needs to be
innovative in how the data are used. Who would have guessed
that Google search queries could help fight flu?
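As a small illustration of COMBINING, two datasets collected for different purposes can be linked through a shared key. The sketch below is a minimal inner join in plain Python. The field names, key format, and values are all hypothetical.

```python
def join_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared key field."""
    # Index the right-hand rows by key so each lookup is cheap.
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        # Rows without a match on the other side are dropped.
        for right in right_index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Hypothetical harvest records kept by a growers' association.
harvest_records = [
    {"field_season": "F1-2011A", "yield_t_ha": 6.1},
    {"field_season": "F2-2011A", "yield_t_ha": 4.8},
    {"field_season": "F3-2011A", "yield_t_ha": 5.5},
]
# Hypothetical weather summaries from a separate monitoring network.
weather_summaries = [
    {"field_season": "F1-2011A", "rain_mm": 410},
    {"field_season": "F2-2011A", "rain_mm": 655},
]

combined = join_on_key(harvest_records, weather_summaries, "field_season")
for row in combined:
    print(row)
```

Once yield and rainfall sit in the same record, they can be related to each other, which neither dataset allowed on its own.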
Big Data in Ag: Greater reach
• Open ag science to NON-EXPERIMENTAL DATA: low quality can be compensated
by quantity. Results always carry an uncertainty level: welcome to the world of fuzzy
logic (more complexity for more accuracy).
• TO OBSERVE, not to EXPLAIN: Big data is about identifying patterns and correlations
that tell you that when you do A, B will occur. Even if you don’t know the reason
why, this is of great help to:
• Make tactical decisions on a farm
• Characterize the impact of specific climate patterns on crops
• Prioritize the allocation of research funds (specific zones, genotypes)
• Design plant breeding strategies
• Monitor crops in near real time (or other products, as in industry)
• It’s not only about PREDICTING: we can also use Big Data to UNDERSTAND a
phenomenon by pulling the pieces together: soil data, weather records, management
practice information, price series, remote sensing products, UAV data,
genomics…
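The claim that quantity can compensate for quality has a simple statistical reading: the standard error of a mean shrinks with the square root of the number of observations. The sketch below assumes measurement errors that are random, independent, and unbiased. A systematic bias would not shrink this way. The noise levels and sample sizes are hypothetical.

```python
import math


def standard_error_of_mean(noise_sd, n_observations):
    """Standard error of a mean of n independent, unbiased observations."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return noise_sd / math.sqrt(n_observations)


# Hypothetical comparison: few precise trial plots vs many noisy farm records.
trial_se = standard_error_of_mean(noise_sd=0.5, n_observations=25)
farm_se = standard_error_of_mean(noise_sd=2.0, n_observations=400)

print(trial_se)  # precise measurements, small sample
print(farm_se)   # noisy measurements, large sample
```

In this made-up case, 400 records that are four times noisier give the same standard error as 25 precise plots.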
Democratizing Big Data…
The CGIAR mission: propose ANOTHER BUSINESS MODEL for the use
of these techniques.
Google, Monsanto, and John Deere have all entered the business of big data in
ag, but with the same business model: subscription services for
commercial farmers. Smallholders also stand to benefit greatly from big data,
but can’t always pay for the service.
How do we close equity gaps instead of widening them?
Looking forwards: Data continues to support
CIAT science
• Partnerships upstream: Analytics, data science, infrastructure
• Looking left and right: data commodity players, ICT development
• Partnerships downstream: Farmers and their organisations, local
organisations (private and public) etc.