
Highly Organised, Disruptive Big Data Science in CIAT

  1. Highly Organised, Disruptive Big Data Science in CIAT
  2. In a nutshell… • Laying a foundation: highly organized data and data culture • Three exceptionally powerful examples • Key messages looking forwards – for discussion
  3. • Data-omics (agronomic, phenotypic, genomic) • Spatial • Socio-economics
  4. Site-Specific Agriculture (AEPS, Agricultura Específica por Sitio): Big Data for agronomy
  5. What do we propose? Crop response (productivity/ha) = Climate (?%) + Soil (?%) + Crop management, including varieties (?%), which together should explain 100% of the variation. A complementary, bottom-up approach: • Information from commercial fields, taking advantage of modern information technologies • Empirical modelling approaches (mostly based on machine learning techniques) aimed at identifying the combination of factors that leads to either high or low productivity • Data-driven agronomy to optimize productivity in agricultural systems!!!
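One common way to ask which factors a fitted model relies on is permutation importance: shuffle one factor across fields and measure how much the prediction error grows. The sketch below is the plain (unconditional) version; conditional forests use a conditional variant, and the slides do not say exactly how CIAT computed its shares. The field names, the data, and `toy_model` (a hand-written stand-in for a trained forest) are all hypothetical. Note that this ranks factors rather than producing the additive percentages shown on the slide.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted yields."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, col, n_repeats=10, seed=0):
    """Average increase in MSE when column `col` is shuffled across rows.

    A large increase means the model relies on that factor; zero means
    the model's predictions do not depend on it at all.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return sum(increases) / n_repeats

# Hypothetical field records: [rainfall_index, soil_index, n_rate]
X = [[1.0, 3.0, 0.5], [2.0, 1.0, 0.2], [3.0, 2.0, 0.9],
     [4.0, 5.0, 0.1], [5.0, 4.0, 0.7], [6.0, 6.0, 0.3]]

def toy_model(row):
    """Hypothetical fitted model that only uses rainfall and soil."""
    return 2.0 * row[0] + 0.5 * row[1]

y = [toy_model(row) for row in X]
importances = {name: permutation_importance(toy_model, X, y, col)
               for name, col in [("rainfall", 0), ("soil", 1), ("n_rate", 2)]}
print(importances)
```

Because `toy_model` ignores `n_rate`, shuffling it leaves the error unchanged and its importance is exactly zero.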
  6. Tremendous analytical challenges • Machine learning • Artificial neural nets • Random forest • Multiple linear regression • Kohonen self-organizing maps • Conditional forest • Factor analysis • Generalised linear models • Mixed models
  7. Multivariate analysis for Saldaña (research station, Andean zone): irrigated rice cropping events from 2007 to 2012. Technique: conditional forest (C-Forest). Our findings: • Fedearroz 733 (N=267): 34% of productivity variation explained • Cimarron Barinas (N=78): 56% of productivity variation explained
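"Percentage of productivity variation explained" is usually the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it from observed and predicted yields; the numbers are hypothetical, and the slide does not state whether its figures were computed in-sample or on out-of-bag data.

```python
def variance_explained(observed, predicted):
    """Share of yield variance explained: R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)           # total variation
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained part
    return 1.0 - ss_res / ss_tot

# Hypothetical yields (t/ha) for four cropping events and a model's predictions
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.0, 6.0, 6.5]
print(f"{variance_explained(observed, predicted):.0%}")  # prints 90%
```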
  8. [Figure: yield (t/ha) over the years, from 19XX to 2014, with stages labeled imported technology, regionally adapted agronomy, data-driven agronomy and broadly adapted technology] Some of the reasons why this is so exciting!!! Open call for “Agronomicians”
  9. Worldwide recognition for the joint MADR-CIAT-FEDEARROZ work
  10. Genomics: cassava big data • 10 to 15 TB of genomic data collected for 1,500 landraces, wild and improved materials • 355 LAC landraces analyzed • [Figure: panels numbered 1 to 8 comparing RAD-seq (> 18,000), the Fluidigm SNP chip (93) and 93 SNP alleles from the RAD database] • Mutations in restriction sites, together with current analytical approaches, reduce estimates of observed heterozygosity. RAD may currently be suitable only for establishing shallow relationships in population genetics studies.
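The mechanism behind the heterozygosity bias can be shown with a toy model of allele dropout: if a mutation destroys the restriction site on one haplotype, reads carrying that allele are never sequenced, so heterozygotes are called homozygous and observed heterozygosity (Ho) falls. The sketch below assumes genotypes encoded as allele tuples and complete dropout at a single locus, the extreme case; real dropout is partial, and this is not CIAT's genotyping pipeline.

```python
def observed_heterozygosity(calls):
    """Ho: fraction of non-missing genotype calls that are heterozygous."""
    called = [g for g in calls if g is not None]
    return sum(1 for a, b in called if a != b) / len(called)

def call_with_dropout(genotype, lost_allele):
    """Genotype as seen when reads carrying `lost_allele` are never sequenced
    (e.g. a mutation destroyed the restriction site on that haplotype)."""
    kept = [a for a in genotype if a != lost_allele]
    if not kept:
        return None              # no reads at all: missing call
    return (kept[0], kept[0])    # the surviving allele looks homozygous

# Hypothetical true genotypes of 10 plants at one locus
true_calls = [("A", "B")] * 4 + [("A", "A")] * 4 + [("B", "B")] * 2
seen_calls = [call_with_dropout(g, "B") for g in true_calls]

print(observed_heterozygosity(true_calls))  # 0.4
print(observed_heterozygosity(seen_calls))  # 0.0
```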
  11. Near-real-time pan-tropical monitoring system for detecting natural vegetation conversion
  12. Forest monitoring • In 2006, Brazil was the only country in the tropics that monitored deforestation. • Methods to detect deforestation worked only for dense humid forests. • There was no consistent estimate of deforestation trends worldwide (figures were based on statistics provided by governments).
  13. Vegetation identification and monitoring • 2 satellites (Terra and Aqua) carrying the MODIS sensor • Together they image the entire globe daily • At 250 m spatial resolution (one pixel covers 6.25 ha) • We use 16-day composite images to reduce the effect of clouds • 390 billion individual values were analyzed
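The pixel-area figure and the idea of compositing can be checked with a short sketch. One common compositing rule is the maximum-value composite: clouds depress vegetation indices such as NDVI, so keeping the highest clear value in each 16-day window tends to keep the least contaminated observation. We assume an NDVI-like index and this rule purely for illustration; the slide does not say which compositing rule Terra-i applies, and the function names are hypothetical.

```python
PIXEL_SIDE_M = 250                           # MODIS 250 m resolution
PIXEL_AREA_HA = PIXEL_SIDE_M ** 2 / 10_000   # 62,500 m^2 = 6.25 ha
COMPOSITE_DAYS = 16

def max_value_composite(daily_values):
    """Keep the highest clear-sky value in the window; None marks cloud or no data."""
    clear = [v for v in daily_values if v is not None]
    return max(clear) if clear else None

def composite_series(daily_values, window=COMPOSITE_DAYS):
    """Split one pixel's daily series into consecutive windows and composite each."""
    return [max_value_composite(daily_values[i:i + window])
            for i in range(0, len(daily_values), window)]

# Hypothetical 32 days for one pixel: mostly cloudy, then partly cloudy
daily = [None] * 15 + [0.7] + [0.5] * 10 + [None] * 6
print(PIXEL_AREA_HA)            # 6.25
print(composite_series(daily))  # [0.7, 0.5]
```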
  14. [Map: detections, January 2004 to October 2012] We generate a new deforestation map every 16 days, at 250 m resolution, for all of Latin America.
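Producing a map every 16 days amounts to testing each pixel's newest composite against what its history suggests. The sketch below uses a deliberately simple threshold rule (flag a pixel when the new value drops more than `k` standard deviations below its historical mean); it is a hypothetical stand-in for illustration, not the Terra-i detection method, and the pixel ids and values are invented.

```python
from statistics import mean, stdev

def is_anomalous(history, observed, k=3.0):
    """True when the new composite falls more than k standard deviations
    below the pixel's historical mean (simplified stand-in detector)."""
    mu, sigma = mean(history), stdev(history)
    return observed < mu - k * sigma

def detection_map(histories, new_composite, k=3.0):
    """Apply the test to every pixel; both arguments are keyed by pixel id.
    Pixels with no clear observation this period (None) are skipped."""
    return {
        pid: is_anomalous(histories[pid], value, k)
        for pid, value in new_composite.items()
        if value is not None
    }

# Hypothetical NDVI histories for three forest pixels
histories = {pid: [0.80, 0.82, 0.78, 0.81, 0.79] for pid in ("p1", "p2", "p3")}
new_composite = {"p1": 0.35, "p2": 0.78, "p3": None}
print(detection_map(histories, new_composite))  # {'p1': True, 'p2': False}
```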
  15. Impact • Data used for a publication in Science • Data used by independent media and by platforms such as Global Forest Watch • www.terra-i.org • 1,900+ users • 250+ organizations • Terra-i Peru is now the official alert system used by the Peruvian government.
  16. Big Data: a behavior change • YES, big data requires large amounts of data and therefore big servers, BUT it is much more than that: • REUSING the data: extracting embedded knowledge from existing datasets to answer questions unrelated to the purpose for which the data were originally captured. • COMBINING datasets that were never meant to meet makes it possible to relate more variables and uncover useful correlations. • ANALYZING with CREATIVITY: data scientists need to be innovative in how they use the data. Who would have guessed that Google searches could help fight flu?
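Combining datasets that were never meant to meet usually starts with a join on shared keys, for example linking yield records to weather records by site and year. The sketch below is a minimal inner join over lists of dicts; the field names (`site`, `year`, `yield_t_ha`, `rain_mm`) and values are hypothetical, and in practice a library routine such as pandas `DataFrame.merge` would do this job.

```python
def join_on(left, right, keys):
    """Inner join two lists of dict records on the given key fields."""
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    joined = []
    for rec in left:
        # Records with no match on the other side are dropped (inner join)
        for match in index.get(tuple(rec[k] for k in keys), []):
            joined.append({**rec, **match})
    return joined

# Hypothetical records from two unrelated sources
yields = [
    {"site": "A", "year": 2011, "yield_t_ha": 6.1},
    {"site": "B", "year": 2011, "yield_t_ha": 4.8},
    {"site": "A", "year": 2012, "yield_t_ha": 5.2},
]
weather = [
    {"site": "A", "year": 2011, "rain_mm": 1200},
    {"site": "A", "year": 2012, "rain_mm": 900},
    {"site": "C", "year": 2011, "rain_mm": 1500},
]
combined = join_on(yields, weather, keys=("site", "year"))
for row in combined:
    print(row)
```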
  17. Big Data in Ag: greater reach • Opening agricultural science to NON-EXPERIMENTAL DATA: low quality can be compensated by quantity. Results are always tied to an uncertainty level; welcome to the world of fuzzy logic (more complexity for more exactitude). • TO OBSERVE, not to EXPLAIN: big data is about identifying patterns and correlations that tell you that when you do A, B will occur. Even if you don’t know why, this is of great help to: • Make tactical decisions on a farm • Characterize the impact of specific climate patterns on crops • Prioritize the allocation of research funds (specific zones, genotypes) • Design plant breeding strategies • Monitor crops in near real time (or other products, as in industry) • It’s not only about PREDICTING; we can also use Big Data to UNDERSTAND a phenomenon, pulling pieces together: soil data, weather records, management practice information, price series, remote sensing products, UAV data, genomics….
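The idea that quantity can compensate for low quality has a precise form for random noise: the standard error of a mean shrinks as 1/√n, so four times as many observations halve the uncertainty. The sketch below shows this and attaches an approximate 95% interval (normal approximation, z = 1.96). It does not help with systematic bias, which more data of the same kind cannot remove. The function names are illustrative.

```python
import math
from statistics import mean, stdev

def standard_error(sd, n):
    """Standard error of the mean of n independent observations with spread sd."""
    return sd / math.sqrt(n)

def mean_with_ci(values, z=1.96):
    """Sample mean and approximate 95% confidence half-width (normal approximation)."""
    return mean(values), z * stdev(values) / math.sqrt(len(values))

# Quadrupling the number of noisy observations halves the standard error
print(standard_error(2.0, 4))   # 1.0
print(standard_error(2.0, 16))  # 0.5
print(mean_with_ci([1, 2, 3, 4, 5]))
```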
  18. Democratizing Big Data… The CGIAR mission: propose ANOTHER BUSINESS MODEL for using these techniques. Google, Monsanto and John Deere have all entered the business of big data in agriculture, but with the same business model: a subscription service for commercial farmers. Smallholders also have much to gain from big data but cannot always pay for the service. How do we close equity gaps instead of widening them?
  19. Looking forwards: Data continues to support CIAT science • Partnerships upstream: Analytics, data science, infrastructure • Looking left and right: data commodity players, ICT development • Partnerships downstream: Farmers and their organisations, local organisations (private and public) etc.