NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countries and Low-Resource Conditions

NISO/DCMI Webinar:
Implementing Linked Data in Developing
Countries and Low-Resource Conditions
September 25, 2013
Speakers:
Johannes Keizer - Information Systems Officer, Food and Agriculture
Organization of the United Nations
Caterina Caracciolo - Senior Information Specialist at the Food and
Agriculture Organization of the United Nations
http://www.niso.org/news/events/2013/dcmi/developing

Implementing Linked Data in
Developing Countries and Low-Resource Conditions
NISO/DCMI Webinar
25 September 2013
Caterina Caracciolo, Johannes Keizer
caterina.caracciolo@fao.org, johannes.keizer@fao.org

Goal of this Webinar
• Overview of the Linked Data stack and its components
• LOD in low-resource conditions
– Possible? Why do it?
• What to consider when doing LOD with low resources
• Explain some initiatives that enable LOD with low resources
• Illustrate a real-world LOD scenario

The importance of the issue
Source: United Nations Population Division, World Population
Prospects: The 2010 Revision, medium variant (2011).

World by population
www.worldmapper.org
http://www.worldmapper.org/extraindex/language_notes.html

And there is something more
• ~7000 languages
http://w3techs.com/technologies/overview/content_language/all

The world by languages spoken
www.worldmapper.org

Let’s get into the nitty-gritty

Implementing Linked Data in
Developing Countries and Low-Resource Conditions
Part 2
NISO/DCMI Webinar
25 September 2013
Caterina Caracciolo
caterina.caracciolo@fao.org

Today
• A bird’s-eye view of the Linked Data lifecycle, from data consumption to data generation
• Discussion of the major difficulties, especially in the data generation phase
• Some considerations on possible solutions, especially from a strategic and organizational point of view
• No ambition to provide a comprehensive survey of tools!

What are low-resource conditions, really?

CPU, memory and technology
constraints...

Electricity may be unreliable…

Internet connection may be slow...

… and dependent on the weather…

Funding...
is always a problem 

IT competencies…
Few IT people, overloaded, trained on different technologies, with little or no incentive to learn or adopt new ones

IT and domain-specific
competencies
• Usually, complete separation between those
working on IT and those working on
collecting/analysing/maintaining data
(domain specialists)
• Domain specialists do not want to spend time changing formats, validating conversions, explaining the intended meaning of data, etc.
– Tendency to consider data as “my” data

Scenario
An institution has data to publish as Linked Data
– Data is produced internally, e.g. a list of publications produced by the institution, specimens in the local museum, factsheets on local plants, production statistics, …
– Data may be online or on somebody’s computer
– Typically stored in a relational database (RDB), or in spreadsheets on a file system

Remark
• Strictly speaking, RDF is not required for Linked Data, but here we consider RDF as the format for Linked Data

A typical Linked Data flow
(Figure. Stages: data lifecycle, data storage, data exposure, data consumption. Lifecycle steps: data conversion, data linking, data maintenance. Components: RDF store, RDF dump, SPARQL endpoint, HTML/RDF via content negotiation, LOD-based applications.)

Building LOD-based applications
is easy…
(relatively)

Relatively easy…
• It is about making mash-up applications…
• But interfacing with the data may be an issue
– Developers need to know SPARQL
– And how to use it within their framework of choice
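For developers new to SPARQL, the following minimal Python sketch (standard library only) shows the basic pattern: send a query over HTTP following the SPARQL 1.1 Protocol and flatten the JSON results into plain dictionaries. The endpoint URL is a hypothetical placeholder, and the query assumes the data uses skos:prefLabel; adapt both to the dataset at hand.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the SPARQL endpoint you actually use.
ENDPOINT = "http://example.org/sparql"

# Assumes the dataset labels its concepts with skos:prefLabel.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    rows = []
    for binding in data["results"]["bindings"]:
        # Each bound variable maps to {"type": ..., "value": ...}; keep the value.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Usage, assuming a reachable endpoint: rows = run_query(ENDPOINT, QUERY). Keeping the parser separate from the network call makes it easy to test offline and to reuse inside whatever framework a developer prefers.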

A pointer
• Research to Impact Hackathon, Kenya, Jan
2013
– @iHub Research, Kenya
• local agricultural and nutritional sector
– Comments on that in Tim Davies’ blog
• http://www.timdavies.org.uk/
• Other blogs around … (search for them!)

Data exposure can be done in
various ways

Exposing dereferenceable URIs
• Need to set up a content negotiation mechanism
– Serving the appropriate content (HTML or RDF) for each URI
• In our experience, not a big problem
– Simple back-ends are available, e.g. Pubby
• Still, you need a properly configured server running 24/7
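Content negotiation can be sketched as follows, assuming the common pattern in which a resource URI answers with a 303 See Other redirect to either an HTML page or an RDF document, depending on the client's Accept header. This is not Pubby's code; the /resource/, /page/ and /data/ paths and the list of supported media types are hypothetical.

```python
# Media types this sketch can serve (an assumption; adapt to your back-end).
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media, q))
    # sorted() is stable, so types with equal q keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header: str, default: str = "text/html") -> str:
    """Pick the best supported media type; wildcards other than */* are ignored."""
    for media, q in parse_accept(header or ""):
        if q <= 0:
            continue
        if media in SUPPORTED:
            return media
        if media == "*/*":
            return default
    # Simplification: a strict server would answer 406 Not Acceptable here.
    return default


def see_other(resource_path: str, media: str) -> str:
    """Target of the 303 redirect, e.g. /resource/c_2808 -> /data/c_2808."""
    local_name = resource_path.rsplit("/", 1)[-1]
    return f"/page/{local_name}" if media == "text/html" else f"/data/{local_name}"
```

In a real deployment the web server, or a back-end such as Pubby, applies this kind of logic on every request, which is one reason the server has to stay up around the clock.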

Provide an RDF dump
• Always a good choice
– Consumers download the data for inclusion in their applications
– Efficiency of access to the data is under the consumer’s control
– It is not always clear how to produce the dump, or what to include in it…
• Only the data? The links too?
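A minimal sketch of producing a gzipped N-Triples dump, with a switch for the "only the data, or the links too?" question. Which predicates count as links (here skos:exactMatch and owl:sameAs) is an assumption to adapt; the Literal type and the function names are hypothetical.

```python
import gzip
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain or language-tagged literal (hypothetical helper type)."""
    text: str
    lang: str = ""


Triple = tuple[str, str, Union[str, Literal]]

# Assumption: these predicates are treated as links to other datasets.
LINK_PREDICATES = {
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://www.w3.org/2002/07/owl#sameAs",
}


def escape(text: str) -> str:
    """Escape characters that must not appear unescaped in N-Triples literals."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple; non-Literal objects are treated as URIs."""
    s, p, o = triple
    if isinstance(o, Literal):
        obj = f'"{escape(o.text)}"' + (f"@{o.lang}" if o.lang else "")
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


def dump_lines(triples: list[Triple], include_links: bool = True) -> list[str]:
    """Serialize triples, optionally leaving out links to other datasets."""
    return [to_ntriples(t) for t in triples
            if include_links or t[1] not in LINK_PREDICATES]


def write_dump(triples: list[Triple], path: str, include_links: bool = True) -> int:
    """Write a gzipped N-Triples dump and return the number of triples written."""
    lines = dump_lines(triples, include_links)
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in lines)
    return len(lines)
```

For example, write_dump(triples, "dump.nt.gz", include_links=False) publishes only the data; publishing a second file with the links alone is another option that keeps the two concerns separate.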

Expose SPARQL endpoint
• The endpoint is typically provided by the triple store
• Heavy on the server side
• Query processing is left to the SPARQL engine
– Implementation of reasoning
– Order in which clauses are processed (filters, unions, select)
• Requires 24/7 server availability

Expose Web Services
• Known technology
• May be built on top of RDF stores
• Good performance
• Control over what data may be accessed
• API formats that simplify the use of Linked Data by web developers, e.g. https://code.google.com/p/linked-data-api/
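A web service can expose only selected properties and rename them to short JSON keys, which is roughly the idea behind API layers such as the Linked Data API. The sketch below is a hypothetical illustration of that idea, not an implementation of the Linked Data API specification; the in-memory TRIPLES list stands in for an RDF store, and the whitelist in EXPOSED is an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data standing in for the RDF store behind the service.
TRIPLES = [
    ("http://example.org/org/1", FOAF + "name", "Example Research Institute"),
    ("http://example.org/org/1", FOAF + "homepage", "http://example.org/"),
    ("http://example.org/org/1", "http://example.org/internal#budgetCode", "X-42"),
]

# Whitelist: only these properties are exposed, each under a short JSON key.
EXPOSED = {FOAF + "name": "name", FOAF + "homepage": "homepage"}


def describe(uri: str) -> str:
    """Return a JSON description of `uri` restricted to whitelisted properties."""
    props: dict[str, list[str]] = {}
    for s, p, o in TRIPLES:
        if s == uri and p in EXPOSED:
            props.setdefault(EXPOSED[p], []).append(o)
    return json.dumps({"uri": uri, **props}, ensure_ascii=False, sort_keys=True)


class DescribeHandler(BaseHTTPRequestHandler):
    """Answers GET /describe?uri=... with the JSON produced by describe()."""

    def do_GET(self) -> None:
        parsed = urlparse(self.path)
        if parsed.path != "/describe":
            self.send_error(404)
            return
        uri = parse_qs(parsed.query).get("uri", [""])[0]
        body = describe(uri).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally: from http.server import HTTPServer; HTTPServer(("localhost", 8000), DescribeHandler).serve_forever(), then request /describe?uri=http://example.org/org/1. Anything not listed in EXPOSED, such as the internal budget code, never leaves the server.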

Triple stores are well-known
resource guzzlers
• Intensive use of CPU and memory
• Server configuration needs to be appropriate
• Internet connection may be a bottleneck
• Again, some technical know-how is needed to choose the best solution
– Also considering other technologies, e.g. NoSQL
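One reason triple stores are memory-hungry: to answer arbitrary triple patterns quickly, a common design keeps the same triples in several permutation indexes. The toy store below (hypothetical, in-memory, no persistence) keeps three, so every triple is stored three times. Production stores use more elaborate structures, such as dictionary encoding and on-disk indexes, but face the same trade-off between lookup speed and memory.

```python
from collections import defaultdict
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class TinyTripleStore:
    """Toy in-memory store keeping three permutation indexes: SPO, POS and OSP."""

    def __init__(self) -> None:
        # Each index maps first term -> second term -> set of third terms,
        # so every triple is stored three times: the price of fast lookups.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s: str, p: str, o: str) -> None:
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching a pattern; None acts as a wildcard."""
        if s is not None:  # subject bound: use SPO
            for pp, objects in self.spo.get(s, {}).items():
                if p is None or pp == p:
                    for oo in objects:
                        if o is None or oo == o:
                            yield (s, pp, oo)
        elif p is not None:  # predicate bound: use POS
            for oo, subjects in self.pos.get(p, {}).items():
                if o is None or oo == o:
                    for ss in subjects:
                        yield (ss, p, oo)
        elif o is not None:  # only the object bound: use OSP
            for ss, predicates in self.osp.get(o, {}).items():
                for pp in predicates:
                    yield (ss, pp, o)
        else:  # nothing bound: full scan
            for ss, by_predicate in self.spo.items():
                for pp, objects in by_predicate.items():
                    for oo in objects:
                        yield (ss, pp, oo)
```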

The Semantic Web is a resource
guzzler!
Downscale the Semantic Web!
http://worldwidesemanticweb.org/events/downscale2012/
http://worldwidesemanticweb.org/events/downscale2013/

Producing RDF may be a daunting
task

Getting to RDF… from what?
• In many cases, RDF means an abrupt jump
from formats that we consider long
abandoned
• From a recent survey, we learned that some AGROVOC users (libraries, institutions) still use the paper version
– Last published in 1992

RDF generation
• It is a simple format: just triples
• But it requires some familiarity with the technology, and especially with the mindset around it, in particular regarding standards and reuse

A much simplified example from
AGROVOC
TermCode1  TermCode2  TermSpell1      TermSpell2  LangCode1  LangCode2  LinkType
1          2          Irrigated farm  Farm        EN         EN         BT
1          3          Irrigated farm  irrigation  EN         EN         RT

Can be turned into some RDF…
Subject  Predicate  Object
Entity1  TermSpell  Irrigated farm
Entity1  BT         Entity2
Entity2  TermSpell  Farm
Entity3  TermSpell  Irrigation
Entity1  RT         Entity3

The problem is the middle column
• These are locally defined predicates (TermSpell, BT, RT)
• One has to guess what they stand for!

Better something like this:
Subject  Predicate     Object
URI_1    rdfs:label    “Irrigated farm”
URI_1    skos:broader  URI_2
URI_2    rdfs:label    “Farm”
URI_3    rdfs:label    “Irrigation”
URI_1    skos:related  URI_3
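The conversion from the flat AGROVOC-style table to SKOS can be sketched as below. It assumes only the two link types shown (BT mapped to skos:broader, RT to skos:related) and mints URIs under a hypothetical base namespace; real AGROVOC URIs have the form http://aims.fao.org/aos/agrovoc/c_2808. Labels keep the rdfs:label property used above, with the language code turned into a language tag.

```python
# Hypothetical base namespace for the minted concept URIs.
BASE = "http://example.org/concept/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Map the local LinkType codes onto standard SKOS properties.
LINK_TYPES = {"BT": SKOS + "broader", "RT": SKOS + "related"}

# (TermCode1, TermCode2, TermSpell1, TermSpell2, LangCode1, LangCode2, LinkType)
ROWS = [
    (1, 2, "Irrigated farm", "Farm", "EN", "EN", "BT"),
    (1, 3, "Irrigated farm", "irrigation", "EN", "EN", "RT"),
]


def literal(text: str, lang: str) -> str:
    """Language-tagged N-Triples literal, e.g. "Farm"@en."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang.lower()}'


def rows_to_ntriples(rows) -> list[str]:
    # A set, because the same term (and its label) appears in several rows.
    lines = set()
    for code1, code2, spell1, spell2, lang1, lang2, link in rows:
        s, o = f"<{BASE}{code1}>", f"<{BASE}{code2}>"
        lines.add(f"{s} <{RDFS_LABEL}> {literal(spell1, lang1)} .")
        lines.add(f"{o} <{RDFS_LABEL}> {literal(spell2, lang2)} .")
        lines.add(f"{s} <{LINK_TYPES[link]}> {o} .")
    return sorted(lines)
```

Because the output uses only standard vocabulary terms, a consumer no longer has to guess what BT or TermSpell meant.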

Using standard vocabularies is the key
• Standard, or de facto standard
• There are only a few of them:
– Dublin Core, BIBO, FOAF, SKOS, ...
• They make it possible to reuse the data

Standard vocabularies as Step 0 of Linked Data
• Reusing existing vocabularies is the first step toward knowing which data may be linked and which may not
– E.g. dct:subject in a bibliographic record indicates the “topic” of the record
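Step 0 in practice: a hypothetical mapping from the local field names of a bibliographic record to Dublin Core terms. The field names, the mapping and the record are assumptions for illustration; the point is that dct:subject takes a concept URI (here the AGROVOC concept c_2808), which is exactly what makes the record linkable.

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical local field names mapped to Dublin Core terms.
FIELD_MAP = {
    "title": DCT + "title",
    "author": DCT + "creator",
    "year": DCT + "issued",
    "keyword": DCT + "subject",
}

# dct:subject points to a vocabulary concept, so its values are URIs, not text.
URI_FIELDS = {"keyword"}


def record_to_ntriples(record_uri: str, record: dict[str, list[str]]) -> list[str]:
    """Convert one record; fields without a mapping are skipped, not invented."""
    lines = []
    for field, values in record.items():
        predicate = FIELD_MAP.get(field)
        if predicate is None:
            continue  # no standard term chosen yet for this local field
        for value in values:
            if field in URI_FIELDS:
                obj = f"<{value}>"
            else:
                obj = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
            lines.append(f"<{record_uri}> <{predicate}> {obj} .")
    return lines
```

Skipping unmapped fields is a deliberate choice in this sketch: it makes gaps in the mapping visible instead of hiding them behind locally defined predicates.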

How do you know which vocabulary to use?
• And how do you know whether the right vocabulary exists?
– We very often receive questions about this from
local institutions (who expect to use AGROVOC for
that…)
• This is probably the very first conceptual
blocker!

Need to support data managers
• Initiatives such as Linked Open Vocabularies
(LOV) are useful:
– http://lov.okfn.org/dataset/lov/index.html
• But usable and stable tools are also needed to support data managers

Drupal’s way of supporting small users
• Allows one to import data from other sources, create RDF, and expose RDF dumps
• At conversion time, one can choose the vocabulary to use
• Then, it becomes the tool for data maintenance
• No programming skills required, but some competency in Drupal is still needed! And you need to understand RDF and your data!

Other attempts along the same lines
• AgriDrupal
– Drupal customized especially for small institutions
– And for bibliographic data, and data on people and organizations
• ScratchPad
– Customized for biodiversity data

Is assigning URIs also a problem?
• Often not a technical issue…
• The choice may depend on the languages of the data
– AGROVOC uses numbers because it was not possible to choose one language over the others, but software developers often complain
• Or on the organization’s internal set-up
• It may take longer than one would expect…
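The trade-off between opaque and readable URIs can be shown in a few lines. Both functions and the base namespace are hypothetical; opaque_uri imitates the numeric style of AGROVOC URIs, while slug_uri derives the identifier from a label and therefore depends on the language and script of that label.

```python
import re
import unicodedata

BASE = "http://example.org/vocab/"  # hypothetical namespace


def opaque_uri(code: int) -> str:
    """Language-neutral URI from a numeric code, in the style of AGROVOC's c_2808."""
    return f"{BASE}c_{code}"


def slug_uri(label: str) -> str:
    """Readable URI derived from a label; ties the identifier to one language."""
    # Decompose accented characters, then drop anything without an ASCII form.
    ascii_text = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")
    return BASE + slug
```

For example, slug_uri("Irrigated farm") yields http://example.org/vocab/irrigated_farm, but slug_uri("汉语") yields only the bare namespace, because the Chinese characters have no ASCII form. A numeric code avoids favoring any one language, at the cost of URIs that developers cannot read.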

Example of linking from AGROVOC
<http://aims.fao.org/aos/agrovoc/c_2808> skos:exactMatch <http://www.caas.net.cn/caas/cat/c_33429>
“farmland” from AGROVOC is an exact match of …Chinese term…

Linking entities
• Still an active research area
• Maintenance is still an issue
– See the example of AGROVOC linked to a Chinese thesaurus…
• Data validation is usually kept outside the rest of the data lifecycle

Data maintenance
• Choice: keep everything in your database and keep regenerating the RDF periodically
• Or move maintenance into different tools
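If you keep everything in your database, regeneration can be a simple script run periodically. This sketch uses SQLite from the Python standard library with a hypothetical two-table schema (term and broader_link) and a hypothetical namespace; comparing two exports shows which triples changed between runs.

```python
import sqlite3

# Hypothetical namespace and schema: term(code, label, lang), broader_link(narrower, broader).
BASE = "http://example.org/concept/"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term (code INTEGER PRIMARY KEY, label TEXT, lang TEXT)")
    conn.execute("CREATE TABLE broader_link (narrower INTEGER, broader INTEGER)")
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)",
                     [(1, "Irrigated farm", "en"), (2, "Farm", "en")])
    conn.execute("INSERT INTO broader_link VALUES (1, 2)")
    return conn


def export_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Regenerate the complete RDF export from the current database content."""
    lines = []
    for code, label, lang in conn.execute("SELECT code, label, lang FROM term ORDER BY code"):
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{BASE}{code}> <{LABEL}> "{escaped}"@{lang} .')
    for narrower, broader in conn.execute(
            "SELECT narrower, broader FROM broader_link ORDER BY narrower, broader"):
        lines.append(f"<{BASE}{narrower}> <{BROADER}> <{BASE}{broader}> .")
    return lines


def diff(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) triples between two exports."""
    return sorted(set(new) - set(old)), sorted(set(old) - set(new))
```

The database stays the single place where data is edited; the RDF is always a derived product, which suits institutions whose domain specialists do not want to change tools.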

In what language is your data?

Certainly, there are many
languages beyond English…

Written in various ways…
汉语/漢語

http://ioannis.parapontis.com/
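Labels in many languages and scripts are expressed in RDF with language tags. In the sketch below (hypothetical names and namespace), the simplified and traditional Chinese forms shown above are distinguished with the BCP 47 tags zh-Hans and zh-Hant. The tag check is a deliberately loose sanity test, not a full BCP 47 validator. RDF 1.1 N-Triples is UTF-8, so the characters can be written as they are.

```python
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Loose shape check (language subtag plus optional subtags); not a full BCP 47 validator.
TAG_SHAPE = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*$")


def is_plausible_tag(tag: str) -> bool:
    return TAG_SHAPE.match(tag) is not None


def labels_to_ntriples(uri: str, labels: dict[str, str]) -> list[str]:
    """One skos:prefLabel per language tag; SKOS allows at most one per tag,
    which keying the dict by tag enforces."""
    lines = []
    for tag, text in labels.items():
        if not is_plausible_tag(tag):
            raise ValueError(f"implausible language tag: {tag!r}")
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{uri}> <{SKOS_PREF_LABEL}> "{escaped}"@{tag} .')
    return lines
```

For example, labels_to_ntriples("http://example.org/concept/42", {"zh-Hans": "汉语", "zh-Hant": "漢語"}) produces two labels for the same concept; when writing them to a file, open it with encoding="utf-8".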

Some considerations from a
managerial perspective…

Assuming an institution with constrained resources has already planned to move to Linked Data, what should it do?

Options
• Go ahead on your own
• Organize a collaboration
– A network creation effort

AGRIS is an example of a network
(Figure: a central data coordination node connected to many partners)
Can be much smaller or bigger!

1) The Semantic Web is energy-intensive
• Because of its infrastructure requirements
• The biggest bottleneck is often IT competencies, and the interface between IT and domain knowledge, especially for data modeling
• Linked Data-related technologies must become lighter in order to be adoptable in low-resource conditions

2) In low-resource conditions…
• Carefully assess your data and in-house skills
• It is a good idea to organize your effort as a collaboration
• Start mobilizing IT specialists and data curators

3) Start with Step 0: identify and use standards to describe your data
• Mobilize IT specialists and data curators

…a bibliographic record (original)

…the same record, transformed

http://agris.fao.org/openagris/search.do?recordID=PL2009000495

Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2013/dcmi/developing

Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU
