NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countries and Low-Resource Conditions


Open data is a crucial prerequisite for inventing and disseminating the innovative practices needed for agricultural development. To be usable, data must not just be open in principle—i.e., covered by licenses that allow re-use. Data must also be published in a technical form that allows it to be integrated into a wide range of applications. The webinar will be of interest to any institution seeking ways to publish and curate data in the Linked Data cloud.

This webinar describes the technical solutions adopted by a widely diverse global network of agricultural research institutes for publishing research results. The talk focuses on AGRIS, a central and widely-used resource linking agricultural datasets for easy consumption, and AgriDrupal, an adaptation of the popular, open-source content management system Drupal optimized for producing and consuming linked datasets.

Agricultural research institutes in developing countries share many of the constraints faced by libraries and other documentation centers everywhere, not only in developing countries: institutions are expected to expose their information on the Web in a re-usable form on shoestring budgets, with technical staff who work in local languages and are continually lured by higher-paying work in the private sector. Technical solutions must be easy to adopt and freely available.

Published in: Education, Technology

  • Definitions vary. They are usually based on socio-economic parameters (GDP, …). In any case, developing countries are the majority of the world. The UN has 193 members, but there are more countries and territories in the world… Not only the majority of countries, but also the majority of people. In 2009: "Out of every 100 persons added to the population in the coming decade, 97 will live in developing countries." (Hania Zlotnik, UN Population Division)
  • AGRIS is a central repository aggregating data from more than 200 bibliographic collections worldwide, some of them highly relevant to the agricultural domain. AGRIS ingests data from collections ranging from National Research Centres to open access repositories of full-text scholarly literature, publishers of scientific electronic journals in agriculture, and so on. Open Access repositories in 2012: 29,355 records from the Wageningen UR Library (Netherlands); 28,582 from the Open Knowledge Repository of the World Bank, which recently opened up to OA to ensure that its research projects and publications are widely available; 13,000 from R4D (Research for Development) of the Department for International Development in the UK; 11,600 from the AgEcon open access repository; 15,000 resources from EMBRAPA’s Open Repository.
  • AGRIS consumes metadata provided by the community and publishes it as open data. The metadata is captured either via a harvester client collecting the data from OAI-PMH services and open repositories, or by delivery (via email or FTP) of database dumps from other information systems and cross-reference tools. The data is then ingested, validated, processed and indexed/stored in two different repositories (the XML store and the RDF store). In the next few months, data will be stored only in the RDF repository. The data is disseminated via the OpenAGRIS application.

    1. 1. NISO/DCMI Webinar: Implementing Linked Data in Developing Countries and Low-Resource Conditions • September 25, 2013 • Speakers: Johannes Keizer - Information Systems Officer, Food and Agriculture Organization of the United Nations; Caterina Caracciolo - Senior Information Specialist at the Food and Agriculture Organization of the United Nations
    2. 2. Implementing Linked Data in Developing Countries and Low Resource Conditions NISO/DCMI Webinar 25 September, 2013 Caterina Caracciolo, Johannes Keizer {caterina.caracciolo},{johannes.keizer}
    3. 3. Goal of this Webinar • Overview of Linked Data stack and components • LOD in low resource conditions – Possible? Why do it? • What to think of when doing LOD in low resource conditions • Explain some initiatives to enable LOD in low resource conditions • Exemplify a real-world LOD scenario
    4. 4. The importance of the issue Source: United Nations Population Division, World Population Prospects: The 2010 Revision, medium variant (2011).
    5. 5. World by population
    6. 6. And there is something more: ~ 7000 languages • …es/overview/content_language/all
    7. 7. The world by languages spoken
    8. 8. Let’s get into the nitty gritty
    9. 9. Implementing Linked Data in Developing Countries and Low Resource Conditions Part 2 NISO/DCMI Webinar 25 September, 2013 Caterina Caracciolo
    10. 10. Today • A bird’s eye view on Linked Data lifecycle, from data consumption to data generation • Discussion on major difficulties, especially in the data generation phase • Some considerations on possible solutions, especially from a strategic and organizational point of view • No ambition to have a comprehensive survey of tools!
    11. 11. What are low resource conditions really?
    12. 12. CPU, memory and technology constraints...
    13. 13. Electricity may be unreliable…
    14. 14. …occasionally available…
    15. 15. …expensive…
    16. 16. Internet connection may be slow...
    17. 17. … and dependent on the weather…
    18. 18. Funding... is always a problem 
    19. 19. IT competencies… Few IT people, over-busy, trained on different technologies, with little or no incentives to learn/adopt new ones
    20. 20. IT and domain-specific competencies • Usually, complete separation between those working on IT and those working on collecting/analysing/maintaining data (domain specialists) • Domain specialists do not want to spend time changing formats, validating conversions, explaining intended meaning of data etc. – Tendency to consider data as “my” data
    21. 21. Linked Data
    22. 22. Scenario An institution has data to publish as Linked Data – Data is produced internally, e.g. list of publications produced by the institution, specimens in the local museum, factsheets on local plants, statistics on production, … – Data may be online or inside somebody’s computer – Typically in some RDB, or spreadsheets in file system
    23. 23. Remark • Although not necessary, strictly speaking, here we consider RDF as the format for Linked Data
    24. 24. A typical Linked Data flow • Data lifecycle: data conversion, data linking, data maintenance • Data storage: RDF store • Data exposure: SPARQL endpoint, HTML/RDF content negotiation, RDF dump • Data consumption: LOD based applications
    25. 25. Data consumption
    26. 26. Building LOD based applications is easy… (relatively)
    27. 27. Relatively easy… • It is about making mash-up applications… • But interfacing with the data may be an issue – Developers need to know SPARQL – And how to use it within their framework of choice
    28. 28. A pointer • Research to Impact Hackathon, Kenya, Jan 2013 – @iHub Research, Kenya • local agricultural and nutritional sector – Comments on that in Tim Davies’ blog • Other blogs around … (search for them!)
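To make the point about interfacing with the data concrete, here is a minimal sketch of how a developer might prepare a SPARQL query as an HTTP GET request using only the Python standard library. The endpoint URL is a placeholder and the query is only an illustration; neither comes from the webinar. The request is built but not sent, so the sketch runs without a network connection.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the URL of a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: ten concepts with their English SKOS preferred labels.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # The SPARQL Protocol passes the query text in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and parsing the JSON response would be the next step; that part depends on the endpoint actually being available.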
    29. 29. Data exposure can be done in various ways
    30. 30. Exposing de-referenceable URIs • Need to set up content negotiation mechanism – Serving content for URIs • In our experience, not a big problem – Simple back-ends are available, e.g. Pubby • Still, need server 24/7… properly configured
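As an illustration of what a content negotiation mechanism has to decide, the following sketch picks a representation from an HTTP Accept header. It is a simplified assumption of how such a back-end could work, not a description of Pubby or of any AGRIS component: the list of supported media types is hypothetical, and partial wildcards such as `text/*` are deliberately ignored.

```python
# Media types this hypothetical server can produce, in order of preference.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    parsed = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        parsed.append((media_type, q))
    # The sort is stable, so the client's order is kept among equal q values.
    return sorted(parsed, key=lambda item: item[1], reverse=True)


def negotiate(accept_header: str, supported=SUPPORTED) -> str:
    """Return the media type to serve, falling back to the first supported."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return supported[0]
    return supported[0]


print(negotiate("text/turtle, text/html;q=0.5"))
print(negotiate("application/rdf+xml;q=0.9, */*;q=0.1"))
print(negotiate(""))
```

A browser asking for HTML gets the human-readable page, while an RDF-aware client asking for Turtle or RDF/XML gets the data for the same URI.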
    31. 31. Provide an RDF dump • Always a good choice – Data is downloaded for inclusion in applications – Efficiency of access to data is under control – Perhaps not always clear how to produce the dump, what to include in it… • Only the data? Also the links?
    32. 32. Expose SPARQL endpoint • Endpoint typically provided by triple store • Heavy on server side • Query processing is left to the SPARQL engine – Implementation of reasoning – Implementation of order in clause processing – filters, unions, select • Requires 24/7 server availability
    33. 33. Expose Web Services • Known technology • May be built on top of RDF stores • Good performance • Control over what data may be accessed • API formats to simplify use of linked data by web developers
    34. 34. Data storage is tricky
    35. 35. Triple stores are well known resource-guzzlers • Intense use of CPU, memory • Server configuration needs to be appropriate • Internet connection may be a bottleneck • Again, some tech know-how needed to choose the best solution – Also considering other technologies, e.g. NoSQL
    36. 36. The Semantic Web is a resource guzzler! Downscale the Semantic Web!
    37. 37. Data generation
    38. 38. Producing RDF may be a daunting task
    39. 39. Getting to RDF… from what? • In many cases, RDF means an abrupt jump from formats that we consider long abandoned • From a recent survey, we learn that some AGROVOC users (libraries, institutions) use the paper version – Last published in 1992
    40. 40. RDF generation • It is a simple format, simply triples • But it requires some familiarity with the technology, and especially acquaintance with the mindset around it, in particular regarding standards and reuse
    41. 41. A much simplified example from AGROVOC
        TermCode 1  TermCode 2  TermSpell1      TermSpell2  LangCode 1  LangCode 2  LinkType
        1           2           Irrigated farm  Farm        EN          EN          BT
        1           3           Irrigated farm  Irrigation  EN          EN          RT
    42. 42. Can be turned into some RDF…
        Subject  Predicate  Object
        Entity1  TermSpell  Irrigated farm
        Entity1  BT         Entity2
        Entity2  TermSpell  Farm
        Entity3  TermSpell  Irrigation
        Entity1  RT         Entity3
    43. 43. The problem is the middle column • These are locally defined predicates • One has to guess what they stand for! • Predicate column: TermSpell, BT, TermSpell, TermSpell, RT
    44. 44. Better something like this…
        Subject  Predicate     Object
        URI_1    rdfs:label    “Irrigated farm”
        URI_1    skos:broader  URI_2
        URI_2    rdfs:label    “Farm”
        URI_3    rdfs:label    “Irrigation”
        URI_1    skos:related  URI_3
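The conversion from the simplified AGROVOC rows to triples with standard predicates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URI base `http://example.org/term/` is hypothetical, the mapping of BT to `skos:broader` and RT to `skos:related` follows the slides, and the language tags taken from the LangCode columns are an addition that the slides do not show. Output is N-Triples, produced with the standard library only.

```python
# Rows as on the simplified AGROVOC slide:
# (TermCode1, TermCode2, TermSpell1, TermSpell2, LangCode1, LangCode2, LinkType)
ROWS = [
    (1, 2, "Irrigated farm", "Farm", "EN", "EN", "BT"),
    (1, 3, "Irrigated farm", "Irrigation", "EN", "EN", "RT"),
]

BASE = "http://example.org/term/"  # hypothetical URI base
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Locally defined link types mapped to standard SKOS properties.
LINK_TYPES = {"BT": SKOS + "broader", "RT": SKOS + "related"}


def escape(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(rows) -> list[str]:
    """Convert term-pair rows into a de-duplicated list of N-Triples lines."""
    triples = []
    seen = set()

    def add(line: str) -> None:
        # A label shared by several rows is emitted only once.
        if line not in seen:
            seen.add(line)
            triples.append(line)

    for code1, code2, spell1, spell2, lang1, lang2, link in rows:
        subject = BASE + str(code1)
        target = BASE + str(code2)
        add(f'<{subject}> <{RDFS_LABEL}> "{escape(spell1)}"@{lang1.lower()} .')
        add(f"<{subject}> <{LINK_TYPES[link]}> <{target}> .")
        add(f'<{target}> <{RDFS_LABEL}> "{escape(spell2)}"@{lang2.lower()} .')
    return triples


ntriples = rows_to_ntriples(ROWS)
print("\n".join(ntriples))
```

The only domain decision in the sketch is the `LINK_TYPES` mapping, which is exactly the step where a data manager has to know which standard vocabulary fits the local data.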
    45. 45. Using standard vocabularies is the key • Standard, or de facto standard • Only a few of them: – Dublin Core, BIBO, FOAF, SKOS, .. • Ensure possibility of reuse of data
    46. 46. Standard vocabularies as Step 0 of Linked Data • Reusing existing vocabularies is the first step to have some indications of what data may be linked and what not – E.g. dct:subject in a bibliographic record indicates the “topic” of the record
    47. 47. How to know what vocabulary to use? • And how to know if the right vocabulary exists? – We very often receive questions about this from local institutions (who expect to use AGROVOC for that…) • This is probably the very first conceptual blocker!
    48. 48. Need to support data managers • Initiatives such as Linked Open Vocabularies (LOV) are useful • But also need usable and stable tools to support data managers
    49. 49. Drupal’s way to support small users • Allows one to import data from other sources, create RDF, and expose RDF dumps • At conversion time, one can choose the vocabulary to use • Then, it becomes the tool for data maintenance • No programming skill required, still some competency on Drupal! And you need to understand RDF and your data!
    50. 50. Other attempts along the same line • AgriDrupal – Drupal especially customized for small institutions – And bibliographic data, data on people, organizations • ScratchPad – Customized for biodiversity data
    51. 51. URIs
    52. 52. Is assigning URIs also a problem? • Often not a technical issue… • Choice may have to do with the languages of the data – AGROVOC uses numbers because it was not possible to choose one language over the others, but software developers often complain • Or with the organization’s internal assets • It may require longer time than one would expect…
    53. 53. An AGROVOC URI
    54. 54. Linking data is a bottleneck
    55. 55. Example of linking from AGROVOC • “farmland” from AGROVOC skos:exactMatch (exact match) …Chinese term…
    56. 56. Linking entities • Still active research area • Maintenance still an issue – see example of AGROVOC linked to Chinese thesaurus… • Data validation usually outside the rest of the data lifecycle
    57. 57. Data maintenance • Choice: keep everything in your database and continue periodic generation of RDF • Or move maintenance to different tools
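The first option, keeping the data in the existing database and regenerating RDF periodically, can be sketched as a small export script. Everything specific here is an assumption made for illustration: the `publication` table, its columns, the URI base and the choice of `dct:title` are hypothetical, and an in-memory SQLite database stands in for the institution's real database.

```python
import os
import sqlite3
import tempfile

BASE = "http://example.org/publication/"  # hypothetical URI base
DCT_TITLE = "http://purl.org/dc/terms/title"


def escape(text: str) -> str:
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def export_dump(conn: sqlite3.Connection, out_path: str) -> int:
    """Write one title triple per publication and return the number written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        query = "SELECT id, title FROM publication ORDER BY id"
        for pub_id, title in conn.execute(query):
            out.write(f'<{BASE}{pub_id}> <{DCT_TITLE}> "{escape(title)}" .\n')
            count += 1
    return count


# Stand-in for the institution's database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO publication (id, title) VALUES (?, ?)",
    [(1, "Irrigation on small farms"), (2, 'A "quoted" title')],
)

dump_path = os.path.join(tempfile.mkdtemp(), "dump.nt")
written = export_dump(conn, dump_path)
print(written, "triples written to", dump_path)
```

Run on a schedule, a script of this kind regenerates the whole dump from the database each time, which keeps the database as the single place where data is edited.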
    58. 58. In what language is your data?
    59. 59. Certainly, there are many languages beyond English…
    60. 60. Written in various ways… 汉语/漢語
    61. 61.
    62. 62. Some considerations from a managerial perspective…
    63. 63. Assuming an institution with constrained resources has already planned to go Linked Data, what to do?
    64. 64. Options • Go ahead on your own • Organize a collaboration – A network creation effort
    65. 65. AGRIS is an example of a network • Data coordination at the centre, connected to many partners • Can be much smaller or bigger!
    66. 66. Our conclusions
    67. 67. 1) Semantic Web is energy intensive • Because of infrastructure requirements • The biggest bottleneck is often on the side of IT competencies, and at the interface between IT and domain knowledge, especially for data modeling • Linked Data-related technologies must become lighter in order to be adoptable in low resource conditions
    68. 68. 2) In low resource conditions… • Do a careful assessment of your data and in-house skills • It is a good idea to organize your effort in collaboration • Start mobilizing IT specialists, data curators
    69. 69. 3) Start with Step 0: identify and use standards to describe your data • Mobilize IT specialists, data curators
    70. 70. The AGRIS network
    71. 71. …a bibliographical record (original)
    72. 72. …the same record transformed
    73. 73. Data Flow
    74. 74. OpenAGRIS data flow
    75. 75. How is linked data produced
    76. 76. ……using title and author
    77. 77. ……using title and author
    78. 78. ……using the key words
    79. 79. ……using the key words
    80. 80. …using the journal name
    81. 81.
    82. 82. Linking URIs
    83. 83. Linking vocabularies
    84. 84. Questions?
    85. 85. NISO/DCMI Webinar: Implementing Linked Data in Developing Countries and Low-Resource Conditions • September 25, 2013 • Questions? All questions will be posted with presenter answers on the NISO website following the webinar.
    86. 86. Thank you for joining us today. Please take a moment to fill out the brief online survey. We look forward to hearing from you! THANK YOU