Overview of CGIAR’s Big Data Platform
Medha Devare
Sr. Research Fellow – IFPRI
[Big Data Platform Module Lead]
Ibnou Dieng
AfricaRice
February 15, 2019
The goal of the Big Data Platform is to harness the capabilities
of data to accelerate and enhance the impact of international
agricultural research.
https://thelukewarmersway.wordpress.com/2016/02/07/climate-scientists-in-like-flint/
Why share data? Funder, country policies
Why share data? Journal requirements
Journals increasingly require the data underlying publications to be shared
or deposited in an accessible database or repository as a condition
of publication.
…and a growing number of data journals provide citations similar to
those for publications, which may be used as KPIs…
Why share data? Citation advantages
Publicly available data was significantly (p = 0.006)
associated with a 69% increase in citations, independently
of journal impact factor, date of publication, and author
country of origin…
Piwowar, H.A. et al.
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000308
“The goal is to turn data into information, and information into insight.”
– Carly Fiorina, former executive, president, and chair of Hewlett-Packard
Hey Cigi, should I direct seed or transplant my rice?
How should I manage my crop?
Real-time decision support for farmers
Easy natural language as an interface
Smart artificial intelligence trained by
CGIAR and partners
Leveraging open, harmonized, and
interoperable “small” data by pooling it
into a large, queryable data pool
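The idea of pooling harmonized “small” data can be illustrated with a minimal Python sketch. It assumes two hypothetical trial datasets that record the same variables under different local column names. The shared names (such as `rice_grain_yield_kg_ha`) stand in for ontology terms and are not actual Crop Ontology identifiers, and all yield values are made up for illustration, not results. Once each dataset is mapped to the shared names, the records can be concatenated and queried as one pool.

```python
from statistics import mean

# Two hypothetical "small" datasets from different trials, each with its own column names.
# All values are invented for illustration only.
trial_a = [
    {"site": "Benin-1", "establishment": "direct_seeded", "yld_kg_ha": 3100},
    {"site": "Benin-2", "establishment": "transplanted", "yld_kg_ha": 3900},
]
trial_b = [
    {"location": "Benin-3", "method": "transplanted", "grain_yield": 4100},
    {"location": "Benin-4", "method": "direct_seeded", "grain_yield": 2900},
]

# Hypothetical mappings from each source's local column names to shared variable names.
# In practice the shared names would be ontology terms; these are placeholders.
MAPPINGS = {
    "trial_a": {"site": "site", "establishment": "crop_establishment",
                "yld_kg_ha": "rice_grain_yield_kg_ha"},
    "trial_b": {"location": "site", "method": "crop_establishment",
                "grain_yield": "rice_grain_yield_kg_ha"},
}

def harmonize(records, mapping):
    """Rename every key in every record to its shared variable name."""
    return [{mapping[key]: value for key, value in rec.items()} for rec in records]

def build_pool(sources):
    """Harmonize each source and concatenate the results into one pool."""
    pool = []
    for name, records in sources.items():
        pool.extend(harmonize(records, MAPPINGS[name]))
    return pool

def mean_yield_by(pool, establishment):
    """Query the pool: mean yield for one crop establishment method."""
    values = [r["rice_grain_yield_kg_ha"] for r in pool
              if r["crop_establishment"] == establishment]
    return mean(values)

pool = build_pool({"trial_a": trial_a, "trial_b": trial_b})
print(len(pool))
print(mean_yield_by(pool, "transplanted"))
print(mean_yield_by(pool, "direct_seeded"))
```

The design choice is to harmonize at load time through one explicit mapping per source, so queries against the pool only ever see shared variable names.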
How is the Platform helping with CGIAR’s FAIRy tale?
1. Making data FAIR: Technical support for CGIAR Center and partner
efforts to make data Findable, Accessible, Interoperable, and Reusable.
2. Enabling data discovery: Contextually linked discovery of
resources (research outputs, experts, geographies) across CGIAR.
3. Building capacity: Facilitating FAIR data and comfort with Big Data technologies,
including both their power and their risks (in-person training; guidance materials; webinars).
4. Enabling data exploration, analysis, and visualization: Leveraging interoperability
and reusability to allow semantic exploration and seamless “plug and play”
with analytical and visualization tools.
Support for Interoperability…
Support for Center repositories to implement the CG Core Metadata Schema; tools
to facilitate repository- and data-level metadata entry using CG Core elements,
AGROVOC, and ontologies
Refine and develop the Crop Ontology, AgrO, and SociO; invest in tools to enable data
annotation
Facilitate standardization of agronomic trial data at collection rather than at
archiving, via an ontology-based field book (field-testing starts early 2019)
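Metadata checks of the kind described above might look something like the following minimal Python sketch. The required field names and the small subject vocabulary are hypothetical placeholders: the actual CG Core Metadata Schema defines its own elements, and AGROVOC is far larger and uses its own term identifiers. The sketch only shows the general pattern of checking required fields and restricting subject terms to a controlled list.

```python
# Hypothetical required fields; the real CG Core Metadata Schema defines its own element set.
REQUIRED_FIELDS = ("title", "creator", "date", "subject", "country")

# A tiny stand-in for a controlled vocabulary such as AGROVOC (terms chosen for illustration).
CONTROLLED_SUBJECTS = {"rice", "rainfed farming", "yields", "transplanting", "direct sowing"}

def validate_record(record):
    """Return a list of problems in a metadata record; an empty list means it passes."""
    problems = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    # Subject terms must come from the controlled list (case-insensitive match).
    for term in record.get("subject", []):
        if term.lower() not in CONTROLLED_SUBJECTS:
            problems.append(f"subject term not in controlled vocabulary: {term}")
    return problems

good = {
    "title": "Rice establishment trial, Benin (hypothetical)",
    "creator": ["Example, A."],
    "date": "2018",
    "subject": ["Rice", "Transplanting"],
    "country": "Benin",
}
bad = {"title": "Untitled", "subject": ["paddy stuff"]}

print(validate_record(good))
print(validate_record(bad))
```

Checking against a controlled list at entry time, rather than cleaning free-text keywords later, is what makes records filterable by shared terms downstream.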
Support for Reusability (best practices in privacy/ethics)…
https://bigdata.cgiar.org/responsible-data-guidelines/
Getting to (and leveraging) FAIR…
http://gardian.bigdata.cgiar.org/
Results filtered for AfricaRice
71 publications, 18 datasets for Benin
Click to see other data these
authors may have published
Filter to find data in GARDIAN
based on these controlled
vocabulary/ontology terms
GARDIAN algorithms attempt to find publications
related to a dataset, and vice versa
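The slides do not describe how GARDIAN’s algorithms actually link publications and datasets. As a swapped-in illustration only, the minimal Python sketch below scores candidate links by Jaccard similarity between keyword sets (for example, controlled-vocabulary terms attached to each item); the item ids, keywords, and the 0.3 threshold are all hypothetical. Because the score is symmetric, the same function finds publications for a dataset and datasets for a publication.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical keyword sets attached to each item (e.g. controlled-vocabulary terms).
datasets = {
    "ds-rice-benin": {"rice", "benin", "yield", "rainfed"},
    "ds-maize-kenya": {"maize", "kenya", "yield"},
}
publications = {
    "pub-1": {"rice", "benin", "rainfed", "weeds"},
    "pub-2": {"maize", "kenya", "fertilizer", "yield"},
    "pub-3": {"cassava", "nigeria"},
}

def related(item_keywords, candidates, threshold=0.3):
    """Return ids of candidates whose similarity meets the threshold, best match first."""
    scored = [(cid, jaccard(item_keywords, kws)) for cid, kws in candidates.items()]
    matches = [(cid, s) for cid, s in scored if s >= threshold]
    return [cid for cid, _ in sorted(matches, key=lambda pair: pair[1], reverse=True)]

# Dataset -> publications, and the reverse direction with the same function.
print(related(datasets["ds-rice-benin"], publications))
print(related(publications["pub-2"], datasets))
```

Keyword overlap is only a crude proxy; a real linking service could also use authors, text similarity, or explicit citations, none of which is modelled here.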
GARDIAN brings in
data from the Genebank
Platform’s Genesys
Zoom in on the map and drop a pin for a pop-up
summary of “rice rainfed yield” for
the country (Benin), from SPAM 2005;
data from 2010 and 2015 coming soon
…or use the polygon feature for a summary
of “rice rainfed yield” in a particular
area of interest…
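A polygon summary like the one described above can be sketched as: keep the grid cells whose centres fall inside the user-drawn polygon, then average their values. The minimal Python sketch below uses a standard ray-casting point-in-polygon test. The cell coordinates, yield values, and polygon are made up, and nothing here reflects how the GARDIAN map tool actually computes its summaries.

```python
from statistics import mean

def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) is inside the polygon given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray pointing right from (x, y).
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical grid cells: (lon, lat) of the cell centre and an invented yield value (t/ha).
cells = [
    (2.1, 9.1, 2.0),
    (2.3, 9.3, 2.4),
    (2.6, 9.2, 1.8),
    (3.5, 10.5, 3.0),  # falls outside the area of interest below
]

# A hypothetical area of interest drawn as a polygon of (lon, lat) vertices.
area = [(2.0, 9.0), (3.0, 9.0), (3.0, 9.5), (2.0, 9.5)]

def summarize(cells, polygon):
    """Count and unweighted mean of values for cells whose centres fall inside the polygon."""
    values = [v for lon, lat, v in cells if point_in_polygon(lon, lat, polygon)]
    return {"cells": len(values), "mean_yield": mean(values) if values else None}

print(summarize(cells, area))
```

A plain mean treats every cell equally; a summary over gridded data such as SPAM might reasonably weight cells by area or harvested area instead, which is left out here for brevity.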
Collaborate and convene around data and agricultural R4D
Developing Technical Partnerships
Providing Shared Services (data and tools)
Providing Technical Training
Supporting six Communities of Practice
Mini-Grants for Key Datasets
Convene
Communities of Practice
Data-Driven Agronomy | CIAT
Crop Modeling | CIMMYT
Geospatial Analysis | IFPRI
Livestock Data | Univ. Edinburgh
Ontologies | Bioversity Int’l
Socio-Economic Data | CIMMYT
Plus…?
Innovation process to enhance data
science research in CGIAR Research Programs (CRPs)
Competition
5 pilots (100K each); 1 scale-up (250K)
Criteria
- Data use
- Scale
- Impact
- Sustainability
- Innovation
Inspire
Topics
- Revealing Food System Flows
- Monitoring Pests & Diseases
- Disrupting Impact Assessment
- Empowering Data-Driven Farming
AfricaRice Contributors
S. Mohapatra: Head, Marketing & Communications
C. Kacou: Interim Head of ICT Unit
M. Bernard: Head, Knowledge Management
P. Kouame: Data Manager
Thank you!
bigdata.cgiar.org
Questions?
i.dieng@cgiar.org
m.devare@cgiar.org