Making agricultural knowledge globally discoverable: are we there yet?

Slides of talk at TGI tutorial series at IFPRI, Washington DC, July 11, 2014.

Published in: Technology, Education

  • Check how AJAX is characterized as technology
  • Making agricultural knowledge globally discoverable: are we there yet?

    1. 1. making agricultural knowledge globally discoverable (and hopefully usable) Nikos Manouselis CEO Agro-Know www.agroknow.gr
    2. 2. background
    3. 3. An extraordinary company that captures, organizes and adds value to the rich information available in agricultural and biodiversity sciences, in order to make it universally accessible, useful and meaningful. http://www.agroknow.gr
    4. 4. Our way of doing things • We put our people at our focus • We have a culture of shared, co-defined values • We are based on trust and transparency • We see beyond profit by serving our users and customers so that they create societal impact
    5. 5. We develop and put in real practice solutions that transform data into meaningful knowledge and services We help people solve problems informed by data
    6. 6. data aggregation & sharing solutions [Diagram. Inputs: unorganized content in local and remote sites; organized and structured content in local and remote DBs (educational, bibliographic, other). Data platform: ingestion, harvesting, enrichment, translation, publication (stage labels: Cultivation, Blossom). Services: widgets, authoring services, data discovery services, analytics services. Captions: aggregate data from diverse sources; works with different types of data; prepare data for meaningful services.]
    7. 7. working with high profile partners & clients • Food and Agriculture Organization (FAO) of the United Nations • World Bank Group • UK’s Dept for International Development (DFID) • Michigan State University (MSU) • Wageningen University & Research (WUR) • French Institute of Agricultural Research (INRA) • Creative Commons
    8. 8. large scale data-related projects • agINFRA: a data infrastructure to support agricultural scientific communities (2011 - now) – EU, $5.2M, 12 partners (incl. FAO); tech coordinator, evaluation, sustainability – in G8 Open Data in Agriculture Action Plan for Europe • SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures (2012 - now) – EU, $3.1M, 8 partners (incl. FAO, WUR); tech coordinator, evaluation, sustainability – in G8 Open Data in Agriculture Action Plan for Europe • Organic.Lingua: Demonstrating the potential of multilingual Web Portal for Sustainable Agricultural & Environmental Education (2011 - 2014) – EU, $2.4M, 11 partners (incl. INRA); tech+data coordinator, evaluation
    9. 9. data interoperability work • Agricultural Interoperability Interest Group (IG) at Research Data Alliance (RDA) • Database Subgroup, Knowledge & Learning Systems Group, Global Food Safety Partnership (GFSP)
    10. 10. context
    11. 11. “Knowledge is the engine of our economy. And data is its fuel” Neelie Kroes, Vice President of the European Commission http://ec.europa.eu/digital-agenda/en/news/economic-and-social-benefits-big-data
    12. 12. “By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some the Nation’s most pressing challenges.” Big Data Research & Development Initiative http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
    13. 13. policy • USA’s National Research Council on Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age –“researchers to make all research data, methods, and other information underlying results publicly accessible in a timely manner” –“the stewardship of research data is a critical long-term task for the research enterprise and its stakeholders” http://www.nap.edu/catalog.php?record_id=12615
    14. 14. internationally • joint USA, EU, Australia, Research Data Alliance (RDA) vision –“researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society” https://rd-alliance.org/about.html
    15. 15. CIARD’s manifesto • “towards a Knowledge Commons on Agricultural Research for Development” • “agricultural knowledge is freely accessible and contributes to reducing hunger and poverty” • “open knowledge makes it easier to provide better solutions” http://www.ciard.net/about/manifesto
    16. 16. GODAN’s statement of purpose • “support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide” • “advocate for the release and re-usability of data in support of Innovation and Economic Growth, Improved Service Delivery and Effective Governance, and Improved Environmental and Social Outcomes” http://godan.info/statement.html
    17. 17. IFPRI & open access • “…research is an international public good, that should be freely disseminated to the extent possible…” • “IFPRI is committed to the principle of free access to the knowledge it generates”
    18. 18. CGIAR & open access • “CGIAR regards the results of its research and development activities as international public goods and is committed to their widespread dissemination and use to achieve the maximum impact to advantage the poor…”
    19. 19. agricultural knowledge: globally accessible? a “good enough” case study
    20. 20. agricultural bibliography • bibliography on agricultural sciences • several efforts in putting together (aggregating/indexing) metadata records on agricultural publications & grey literature • FAO’s AGRIS service: a prominent example – quite advanced data ingestion workflow & infrastructure – semantic backbone with AGROVOC as LOD & triple store with all aggregated records – more than 7.5 million publications indexed & made discoverable
    21. 21. elaborated, automated workflow [Diagram. Components: metadata harvester; filtering component; identification and de-duplication component (MySQL, get unique ID, duplicates stored separately); transformation component (to AKIF), storing metadata in JSON (internal format); link checking component (records with broken links set aside); post-processing/enrichment component; indexing mechanism; API. Intermediate stores: file system (DC, IEEE LOM, MODS XML); file system (XMLs).]
    22. 22. AGRIS search service
    23. 23. results mashing up more info
    24. 24. similar/relevant efforts • PubAg: forthcoming service by National Agricultural Library (NAL) for discovering USDA publications – and beyond • LGU community of ag knowledge: forthcoming service federating institutional repositories of Land Grant Universities • CGIAR open: (to be) federating & providing access to all CG center repositories • …and more to come
    25. 25. but we are not there yet a) each initiative replicating technical & data processing effort (harvesting, transforming, indexing…) b) coverage is not complete – transferring the discovery problem to the level of aggregators c) still not focusing on the needs of each specific subject, group, region, project, … d) agriculture is multi-disciplinary: relevant publications may be found in other domains (health, economics, environment, … )
    26. 26. agricultural knowledge: globally accessible? a more demanding case study
    27. 27. CSPI • the organized voice of the American public on nutrition, food safety, health and other issues – “improve food safety laws and reduce the incidence of foodborne illness” • has tracked foodborne illness outbreaks since 1997 – events where two or more people become ill from eating the same food – outbreaks where both the food and pathogen can be identified
    28. 28. US Outbreak Alert Database (until 2011) http://cspinet.org/foodsafety/outbreak/pathogen.php
    29. 29. US Outbreak Report (after 2011) http://cspinet.org/foodsafety/outbreak_report.html
    30. 30. Safe Food International http://regionalnews.safefoodinternational.org
    31. 31. data sources of interest • CDC - Foodborne Outbreak Online Database (FOOD) – http://wwwn.cdc.gov/foodborneoutbreaks/ • ProMED mail – http://www.promedmail.org • Kansas FS-net – blogging at http://barfblog.com – posting news at http://bites.ksu.edu – archive at http://www.safefoodhandler.com/fsnet.htm • Project TYCHO – https://www.tycho.pitt.edu
    32. 32. some of the challenges a) time-consuming & laborious primary data identification and documentation (by hand) b) not complete coverage: incomplete & problematic data collection and sharing c) multiple & outdated databases for secondary/processed data storage and curation d) time-consuming & expensive processed data visualization & publication
    33. 33. improving curation of data • focus on making data documentation, storage, management easier a) migrate existing multiple databases in single data repository b) improve data organization & classification schemes (e.g. by pathogen, food, geographical location, time reported, …etc) c) improve data curation & filtering workflows (document & store data once, feed multiple sites/access points; US vs. international sites)
    34. 34. modernize outbreak data repository
    35. 35. advanced data organisation & classification
    36. 36. use single data repository for all CSPI sites
    37. 37. improving discovery & processing • focus on foodborne illness outbreak reports & product recalls a) automate as much as possible workflow of reports’ processing (feeding directly into CSPI data repository) b) extend coverage of data types (include food product recalls) c) extend coverage of data sources (include more sites with outbreak reports & product recalls)
    38. 38. auto extract structured data from text
    39. 39. include & link to food recall data
    40. 40. include waterborne illness data
    41. 41. add more (relevant) data sources
    42. 42. improving visualization & publication • focus on making processed & validated data accessible immediately online a) automate as much as possible workflows for generating filtered reports (feed diagrams & tables for CSPI publications, present directly online through CSPI & SFI web sites) b) offer opportunities for public to interact with data online (play with parameters and generate new data reports & visualizations) c) share data openly for research, education and awareness through CSPI & SFI web sites
    43. 43. enhance search/discovery of data [Screenshots: landing page; search and filter page; view details and access page]
    44. 44. use of advanced data visualizations
    45. 45. allow users to customize data reports
    46. 46. provide multi-channel access to data
    47. 47. shaping a bigger & hairier goal…
    48. 48. let’s imagine that • we have a very big, open, scalable platform that… – …will catalog all relevant information entities – …will make all information machine readable and discoverable – …will allow information providers to express how, with whom, under which license and for which purposes they share this info – …will help people utilize the collective power of information to solve more societal challenges, better – …will make funding & resource use transparent for donors and the public – …will coordinate, consolidate and harmonize data & technology sharing among agri-food sectors and user communities
    49. 49. for example: CIARD RING
    50. 50. catalogues (some) data
    51. 51. catalogues (some) solutions
    52. 52. catalogues (some) organisations
    53. 53. could federate & include: more data
    54. 54. could federate & include: more software
    55. 55. could federate & include: donors
    56. 56. could federate & include: funding
    57. 57. scale up, per federated info type [Diagram. A meta-registry platform federating all existing registries & making information discoverable. Registries of data sources: federated data registry. Registries of organisations’ catalogs: federated org registry. Registries of software apps/components: federated solution registry. Federated information providers. …etc]
    58. 58. evolving technology further [Diagram comparing two cases of agricultural infrastructures. NOW (2012): a harvester collects from OAI-PMH service providers #1 to #n, each with its own schema (AGRIS AP, IEEE LOM, DC, ...), into an aggregated XML repository; an indexer feeds web portals: Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ... 2015 (AgINFRA): SPARQL endpoints for data sources #1 to #n, an RDF triple store with a common schema, an indexer, web portals and a SPARQL endpoint. Questions raised: How many? Big Data problem! Is it feasible?] http://semagrow.eu
    59. 59. wrapping up
    60. 60. which are the real problems that we are trying to solve? information & technology are just enablers
    61. 61. for more info nikosm@agroknow.gr www.agroknow.gr
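
The workflow on slide 21 (filter harvested metadata, identify duplicates, transform to an internal JSON format) can be illustrated with a minimal sketch. This is not the AGRIS or Agro-Know implementation: the field names (`title`, `url`, `schema`), the de-duplication key and all function names are assumptions made for illustration, and link checking is left out because it needs network access.

```python
import hashlib
import json


def filter_records(records):
    """Keep only records that carry both a title and an access URL."""
    return [r for r in records if r.get("title") and r.get("url")]


def record_id(record):
    """Derive a stable ID from normalized title + URL (assumed de-dup key)."""
    key = record["title"].strip().lower() + "|" + record["url"].strip().lower()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Split records into first-seen unique records and later duplicates."""
    seen = {}
    duplicates = []
    for r in records:
        rid = record_id(r)
        if rid in seen:
            duplicates.append(r)
        else:
            seen[rid] = dict(r, id=rid)
    return list(seen.values()), duplicates


def to_internal_json(record):
    """Serialize one record to a hypothetical internal JSON format."""
    return json.dumps(
        {
            "id": record["id"],
            "title": record["title"].strip(),
            "url": record["url"].strip(),
            "source_schema": record.get("schema", "unknown"),
        },
        sort_keys=True,
    )


def run_pipeline(harvested):
    """Filter, de-duplicate and transform a batch of harvested records."""
    kept = filter_records(harvested)
    unique, duplicates = deduplicate(kept)
    return [to_internal_json(r) for r in unique], duplicates


# Hypothetical harvested records: one duplicate and one record without a title.
sample = [
    {"title": "Maize yields in Kenya", "url": "http://example.org/1", "schema": "DC"},
    {"title": "maize yields in kenya ", "url": "http://example.org/1", "schema": "MODS"},
    {"title": "", "url": "http://example.org/2", "schema": "DC"},
]
docs, dups = run_pipeline(sample)
```

Keeping duplicates in a separate list rather than dropping them mirrors the diagram, where duplicates are stored aside instead of being discarded.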

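
The harvester on slide 58 talks to OAI-PMH providers. As a rough sketch of that step, the code below builds a `ListRecords` request URL and reads titles and the resumption token from a Dublin Core response. The base URL and the sample response are made up for illustration, the helper names are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token = root.find(".//oai:resumptionToken", NS)
    has_token = token is not None and token.text
    return titles, (token.text if has_token else None)


# Made-up response from a hypothetical provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
titles, token = parse_titles(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A real harvester would repeat the request with each returned resumption token until the provider stops sending one.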