Making agricultural knowledge globally discoverable: are we there yet?


Slides of talk at TGI tutorial series at IFPRI, Washington DC, July 11, 2014.

Published in: Technology, Education

  1. 1. making agricultural knowledge globally discoverable (and hopefully usable) • Nikos Manouselis, CEO, Agro-Know
  2. 2. background
  3. 3. An extraordinary company that captures, organizes and adds value to the rich information available in agricultural and biodiversity sciences, in order to make it universally accessible, useful and meaningful.
  4. 4. Our way of doing things • We keep our people in focus • We have a culture of shared, co-defined values • We are based on trust and transparency • We see beyond profit by serving our users and customers so that they create societal impact
  5. 5. We develop and put into practice solutions that transform data into meaningful knowledge and services. We help people solve problems informed by data.
  6. 6. data aggregation & sharing solutions: aggregate data from diverse sources; work with different types of data; prepare data for meaningful services. Content: unorganized content in local and remote sites; organized and structured content in local and remote DBs (educational, bibliographic, other). Data platform: ingestion, translation, publication, harvesting, enrichment (Blossom, Cultivation). Services: widgets, authoring services, data discovery services, analytics services.
  7. 7. working with high profile partners & clients • Food and Agriculture Organization (FAO) of the United Nations • World Bank Group • UK’s Dept for International Development (DFID) • Michigan State University (MSU) • Wageningen University & Research (WUR) • French Institute of Agricultural Research (INRA) • Creative Commons
  8. 8. large scale data-related projects • agINFRA: a data infrastructure to support agricultural scientific communities (2011 - now) – EU, $5.2M, 12 partners (incl. FAO); tech coordinator, evaluation, sustainability – in G8 Open Data in Agriculture Action Plan for Europe • SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures (2012 - now) – EU, $3.1M, 8 partners (incl. FAO, WUR); tech coordinator, evaluation, sustainability – in G8 Open Data in Agriculture Action Plan for Europe • Organic.Lingua: Demonstrating the potential of a multilingual Web Portal for Sustainable Agricultural & Environmental Education (2011 - 2014) – EU, $2.4M, 11 partners (incl. INRA); tech+data coordinator, evaluation
  9. 9. data interoperability work • Agricultural Interoperability Interest Group (IG) at Research Data Alliance (RDA) • Database Subgroup, Knowledge & Learning Systems Group, Global Food Safety Partnership (GFSP)
  10. 10. context
  11. 11. “Knowledge is the engine of our economy. And data is its fuel” Neelie Kroes, Vice President of the European Commission agenda/en/news/economic-and-social-benefits-big-data
  12. 12. “By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges.” Big Data Research & Development Initiative crosites/ostp/big_data_press_release_final_2.pdf
  13. 13. policy • USA’s National Research Council on Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age – “researchers to make all research data, methods, and other information underlying results publicly accessible in a timely manner” – “the stewardship of research data is a critical long-term task for the research enterprise and its stakeholders”
  14. 14. internationally • joint USA, EU, Australia, Research Data Alliance (RDA) vision –“researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society”
  15. 15. CIARD’s manifesto • “towards a Knowledge Commons on Agricultural Research for Development” • “agricultural knowledge is freely accessible and contributes to reducing hunger and poverty” • “open knowledge makes it easier to provide better solutions”
  16. 16. GODAN’s statement of purpose • “support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide” • “advocate for the release and re-usability of data in support of Innovation and Economic Growth, Improved Service Delivery and Effective Governance, and Improved Environmental and Social Outcomes”
  17. 17. IFPRI & open access • “…research is an international public good, that should be freely disseminated to the extent possible…” • “IFPRI is committed to the principle of free access to the knowledge it generates”
  18. 18. CGIAR & open access • “CGIAR regards the results of its research and development activities as international public goods and is committed to their widespread dissemination and use to achieve the maximum impact to advantage the poor…”
  19. 19. agricultural knowledge: globally accessible? a “good enough” case study
  20. 20. agricultural bibliography • bibliography on agricultural sciences • several efforts in putting together (aggregating/indexing) metadata records on agricultural publications & grey literature • FAO’s AGRIS service: a prominent example – quite advanced data ingestion workflow & infrastructure – semantic backbone with AGROVOC as LOD & triple store with all aggregated records – more than 7.5 million publications indexed & made discoverable
  21. 21. elaborated, automated workflow: Metadata harvester; Filtering component; File system stores (DC, IEEE LOM, MODS XML); Identification and de-duplication component (MySQL, get unique ID, duplicates); Transformation component (to AKIF); metadata stored in JSON (internal format); Link checking component (records with broken links); Post-processing/Enrichment component; File system (XMLs); Indexing mechanism; API
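
The stages named on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the AGRIS implementation: the `Record` fields, the function names, the de-duplication key, and the JSON keys are all hypothetical (the real AKIF schema is not shown in the slides), and the link check is injected as a stub so that nothing touches the network.

```python
import json
from dataclasses import dataclass


@dataclass
class Record:
    """A harvested metadata record (field names are hypothetical)."""
    title: str
    url: str
    source_format: str  # e.g. "DC", "IEEE LOM", "MODS"


def filter_records(records):
    """Filtering step: drop records with an empty title or URL."""
    return [r for r in records if r.title.strip() and r.url.strip()]


def deduplicate(records):
    """De-duplication step: keep the first record per normalized (title, url)."""
    seen = set()
    unique = []
    for r in records:
        key = (r.title.strip().lower(), r.url.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def to_internal_json(record, record_id):
    """Transformation step: serialize to a JSON internal format.

    The keys below are placeholders, not the real AKIF schema.
    """
    return json.dumps({
        "id": record_id,
        "title": record.title.strip(),
        "url": record.url.strip(),
        "source_format": record.source_format,
    }, sort_keys=True)


def run_pipeline(records, link_ok=lambda url: True):
    """Run filter -> de-duplicate -> link check -> transform.

    `link_ok` is an assumed callable that reports whether a URL resolves.
    Returns (JSON documents ready for indexing, records with broken links).
    """
    indexed, broken = [], []
    for i, rec in enumerate(deduplicate(filter_records(records)), start=1):
        if link_ok(rec.url):
            indexed.append(to_internal_json(rec, record_id=i))
        else:
            broken.append(rec)
    return indexed, broken
```

Keeping each stage as a separate function mirrors the component boxes on the slide, so a stage could be swapped out (for example, a stricter de-duplication key) without touching the others.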
  22. 22. AGRIS search service
  23. 23. results mashing up more info
  24. 24. similar/relevant efforts • PubAg: forthcoming service by National Agricultural Library (NAL) for discovering USDA publications – and beyond • LGU community of ag knowledge: forthcoming service federating institutional repositories of Land Grant Universities • CGIAR open: (to be) federating & providing access to all CG center repositories • …and more to come
  25. 25. but we are not there yet a) each initiative replicating technical & data processing effort (harvesting, transforming, indexing…) b) coverage is not complete – transferring the discovery problem to the level of aggregators c) still not focusing on the needs of each specific subject, group, region, project, … d) agriculture is multi-disciplinary: relevant publications may be found in other domains (health, economics, environment, … )
  26. 26. agricultural knowledge: globally accessible? a more demanding case study
  27. 27. CSPI • the organized voice of the American public on nutrition, food safety, health and other issues – “improve food safety laws and reduce the incidence of foodborne illness” • has tracked foodborne illness outbreaks since 1997 – events where two or more people become ill from eating the same food – outbreaks where both the food and pathogen can be identified
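
The two inclusion criteria on this slide (two or more people ill from the same food, and both food and pathogen identified) can be written as a simple predicate. This is a minimal sketch; the `OutbreakReport` type and its field names are assumptions made for illustration, not CSPI's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutbreakReport:
    """Hypothetical shape of one reported event."""
    cases: int
    food: Optional[str]
    pathogen: Optional[str]


def is_tracked_outbreak(report):
    """True when the event meets both criteria listed on the slide."""
    enough_cases = report.cases >= 2
    fully_identified = bool(report.food) and bool(report.pathogen)
    return enough_cases and fully_identified
```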
  28. 28. US Outbreak Alert Database (until 2011)
  29. 29. US Outbreak Report (after 2011)
  30. 30. Safe Food International
  31. 31. data sources of interest • CDC - Foodborne Outbreak Online Database (FOOD) – • ProMED mail – • Kansas FS-net – blogging at – posting news at – archive at • Project TYCHO –
  32. 32. some of the challenges a) time-consuming & laborious primary data identification and documentation (by hand) b) not complete coverage: incomplete & problematic data collection and sharing c) multiple & outdated databases for secondary/processed data storage and curation d) time-consuming & expensive processed data visualization & publication
  33. 33. improving curation of data • focus on making data documentation, storage, management easier a) migrate existing multiple databases into a single data repository b) improve data organization & classification schemes (e.g. by pathogen, food, geographical location, time reported, etc.) c) improve data curation & filtering workflows (document & store data once, feed multiple sites/access points; US vs. international sites)
  34. 34. modernize outbreak data repository
  35. 35. advanced data organisation & classification
  36. 36. use single data repository for all CSPI sites
  37. 37. improving discovery & processing • focus on foodborne illness outbreak reports & product recalls a) automate as much as possible workflow of reports’ processing (feeding directly into CSPI data repository) b) extend coverage of data types (include food product recalls) c) extend coverage of data sources (include more sites with outbreak reports & product recalls)
  38. 38. auto extract structured data from text
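
One simple way to approach this step is keyword and pattern matching over the report text. The sketch below is an assumption about how such extraction could start, not the system shown in the talk: the vocabularies are tiny illustrative lists, the function name is hypothetical, and a production system would need curated vocabularies and more robust parsing (for example, case counts written as "1,200" or as words are not handled).

```python
import re

# Tiny illustrative vocabularies; real lists would be curated.
PATHOGENS = ["salmonella", "listeria", "e. coli", "norovirus"]
FOODS = ["spinach", "cantaloupe", "peanut butter", "ground beef"]

# Matches phrases such as "33 people" or "12 cases".
CASES_RE = re.compile(r"(\d+)\s+(?:people|persons|cases|illnesses)", re.IGNORECASE)


def extract_outbreak_fields(text):
    """Pull case count, pathogen, and food from a free-text report.

    Returns a dict whose values are None when nothing matched.
    """
    lowered = text.lower()
    match = CASES_RE.search(text)
    return {
        "cases": int(match.group(1)) if match else None,
        "pathogen": next((p for p in PATHOGENS if p in lowered), None),
        "food": next((f for f in FOODS if f in lowered), None),
    }
```

Fields that come back as `None` are a natural signal that a report still needs manual review before it is fed into the repository.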
  39. 39. include & link to food recall data
  40. 40. include waterborne illness data
  41. 41. add more (relevant) data sources
  42. 42. improving visualization & publication • focus on making processed & validated data accessible immediately online a) automate as much as possible workflows for generating filtered reports (feed diagrams & tables for CSPI publications, present directly online through CSPI & SFI web sites) b) offer opportunities for the public to interact with data online (play with parameters and generate new data reports & visualizations) c) share data openly for research, education and awareness through CSPI & SFI web sites
  43. 43. enhance search/discovery of data: landing page; search and filter page; view details and access page
  44. 44. use of advanced data visualizations
  45. 45. allow users to customize data reports
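
A customizable report can be as simple as letting the user choose filters and a grouping dimension. The following is a rough sketch under assumptions: outbreaks are represented as plain dictionaries, and the function name and keys (`pathogen`, `food`, `year`) are hypothetical.

```python
from collections import Counter


def build_report(outbreaks, group_by, **filters):
    """Count outbreaks per value of `group_by`, after equality filters.

    `outbreaks` is a list of dicts; `filters` maps a key to a required value.
    """
    selected = [
        o for o in outbreaks
        if all(o.get(key) == value for key, value in filters.items())
    ]
    return Counter(o.get(group_by) for o in selected)
```

For example, `build_report(data, "food", pathogen="Salmonella")` would count Salmonella outbreaks per food, and the resulting counts could feed a table or chart.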
  46. 46. provide multi-channel access to data
  47. 47. shaping a bigger & hairier goal…
  48. 48. let’s imagine that • we have a very big, open, scalable platform that… – …will catalog all relevant information entities – …will make all information machine readable and discoverable – …will allow information providers to express how, with whom, under which license and for which purposes they share this info – …will help people utilize the collective power of information to solve more societal challenges, better – …will make funding & resource use transparent for donors and the public – …will coordinate, consolidate and harmonize data & technology sharing among agri-food sectors and user communities
  49. 49. for example: CIARD RING
  50. 50. catalogues (some) data
  51. 51. catalogues (some) solutions
  52. 52. catalogues (some) organisations
  53. 53. could federate & include: more data
  54. 54. could federate & include: more software
  55. 55. could federate & include: donors
  56. 56. could federate & include: funding
  57. 57. scale up, per federated info type: meta-registry platform federating all existing registries & making information discoverable; registries of data sources (federated data registry); federated information providers; registries of organisations’ catalogs (federated org registry); registries of software apps/components (federated solution registry); …etc
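
The meta-registry idea can be illustrated with a thin layer that queries several registries and merges the hits. This is a sketch under assumptions, not the CIARD RING design: the class and method names are hypothetical, and each source registry is modelled as a callable that returns entry names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str      # e.g. "data", "organisation", "software"
    name: str
    registry: str  # the source registry that listed this entry


class MetaRegistry:
    """Federates several registries behind a single search call."""

    def __init__(self):
        self._registries = {}

    def register(self, registry_name, kind, fetch):
        """Add a source registry; `fetch` returns an iterable of entry names."""
        self._registries[registry_name] = (kind, fetch)

    def search(self, term, kind=None):
        """Case-insensitive substring search, optionally limited to one kind."""
        term = term.lower()
        hits = []
        for registry_name, (reg_kind, fetch) in self._registries.items():
            if kind is not None and reg_kind != kind:
                continue
            for name in fetch():
                if term in name.lower():
                    hits.append(Entry(reg_kind, name, registry_name))
        return hits
```

Because each registry is only a callable, the same interface could wrap a local file, a database query, or a remote API without changing the search code.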
  58. 58. evolving technology further. NOW (2012), case of agricultural infrastructures: OAI-PMH service providers #1 … #n (schema #1 … schema #n: AGRIS AP schema, IEEE LOM schema, DC schema, ...); HARVESTER; aggregated XML repository; INDEXER; web portals (Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...). 2015 (AgINFRA), case of agricultural infrastructures: RDF triple store with common schema; SPARQL endpoints (data source #1 … data source #n); INDEXER; web portals; SPARQL endpoint. How many? Big Data problem! Is it feasible?
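
For the harvesting side of this picture, the basic OAI-PMH exchange looks roughly like the sketch below: build a `ListRecords` request and read titles plus the resumption token from the response. The helper names are hypothetical and no network call is made; the verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces follow the OAI-PMH 2.0 protocol and Dublin Core.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When resuming, the token is sent on its own, without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

A harvester would loop: request, parse, and request again with the returned token until no token comes back. Repeating that loop for every provider and schema is the replicated effort that the SPARQL-endpoint scenario on the slide aims to avoid.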
  59. 59. wrapping up
  60. 60. which are the real problems that we are trying to solve? information & technology are just enablers
  61. 61. for more info