EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already there?

Ontology Engineering
for and by the masses:
are we already there?
19th International Conference on Knowledge
Engineering and Knowledge Management
EKAW2014
27/11/2014
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho

License
• This work is licensed under
CC BY-NC-SA 4.0 International
• http://purl.org/NET/rdflicense/cc-by-nc-sa4.0
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions
• Attribution — You must attribute the work by inserting
• “[source Oscar Corcho]” at the footer of each reused slide
• a credits slide stating: “These slides are partially based on
“Ontology Engineering for and by the masses: are we
already there?” by O. Corcho”
• Non-commercial
• Share-Alike

A walk through our Brave Little World of Ontology Engineering
The world is living at year 21 After Gruber (A.G.)
All inhabitants of this world have this motto:
“Nothing is more beautiful than a formal,
explicit specification of a shared conceptualization”
repeated to them every night while they sleep (since year 5 A.G.)
Everybody loves Gruberliness, the property of creating models that are
SHARED, FORMAL and EXPLICIT
It is written in every single building and ontology repository

The world is divided into 10 regions, led by 10 world controllers who have
dominated it so far (according to Google Scholar)
Oxford (Ian Horrocks, 35K cit.)
Milton Keynes (Enrico Motta, 11K cit.)
Buffalo (Barry Smith, 18K cit.)
Trento (Nicola Guarino, 17K cit.)
Stanford (Mark Musen, 25K cit.)
Karlsruhe (Rudi Studer, unknown cit.)
Madrid (Asunción Gómez-Pérez, 13K cit.)
Amsterdam (Frank van Harmelen, 25K cit.)
Toronto (Mark S. Fox, 13K cit.)
Osaka (Riichiro Mizoguchi, 7K cit.)
Disclaimer: This calculation is not exact. It only considers individuals, and favours geographical
distribution

Several wars in these 21 years of existence, which led to the current status:
• The “Language War”. It lasted 4 years. W3C Treaty Signed in Year 11 A.G.
Description logic won over frames, first order logic and semantic networks.
• The “Tool War”. It lasted 10 years. Tools like Protégé, OilEd, SWOOP,
WebODE, OntoEdit, or NeOn Toolkit fought among each other to get
installed on the computers of our world citizens. Protégé won. No treaty
signed

Five social classes
(Figure: the five social classes under the banner “SHARED, FORMAL and EXPLICIT”, with time marks Y0, Y10 A.G., Y16 A.G. and Y20 A.G.)

Alphas
Average age: 50+
Number of individuals: 100+
Education: Formal logic and philosophy.
Sometimes Computer Science
Contribution to the world:
Write formal upper-level ontologies
DOLCE, BFO, GFO, SUMO
Languages spoken:
They are polyglots
First order and many other logics
OWL, OBO
Secret meetings: FOIS
Daily routine:
Wake up
Write a new term on a whiteboard
Think about it carefully
Incorporate it in an upper-level ontology
Some days they don’t include new terms

Betas
Average age: 40+
Number of individuals: 1,000+
Education: Computer Science, Biology,
Geography. Some courses on logic
Domain and application ontologies
Heavyweight, with many axioms, properties
and concepts
DOLCE, BFO, GFO, SUMO
Languages spoken:
Mostly OWL
Secret meetings: EKAW, KCAP
Daily routine:
Wake up
Open ontology design methodology book
Open ontology design pattern website
Open Protégé and activate reasoning
Work on 10 new terms

Gammas
Average age: 30+
Education: Mostly Computer Science
Courses on ontologies as undergrads
Write lightweight ontologies
Call them vocabularies
Create Linked (Open) Data
Languages spoken:
Native RDF Schema, and a bit of OWL
Secret meetings: ESWC, ISWC, WWW
Daily routine:
Wake up
Open LOV or prefix.cc
Look for vocabularies for their dataset
Select, extend and upload them in LOV
Update dataset in datahub.io

Deltas
Average age: unknown
Education: Library Science
Some course on Computer Science
Write codelists and thesauri
Languages spoken:
SKOS, RDF Schema
Secret meetings: Dublin Core Conference
Daily routine:
Wake up
Write a couple of new codelists/thesauri
Submit them to metadataregistry.org
Annotate some documents with them

Epsilons
Average age: 20+
Education: Web development
Write schema.org annotations
Some contributions to schema.org classes
and structured types
Languages spoken:
HTML, RDFa, JSON-LD
Secret meetings: Webinars, Hangouts, meetups
Daily routine:
Wake up
Check positioning of their site in Google
Annotate it with more schema.org tags
Wait for Google/Yahoo! to crawl them
Go back to next step

Open vote
• Are you an alpha, a beta, a gamma, a delta or an
epsilon?
• Or do you think that you can belong to several social
classes?
https://es.surveymonkey.com/s/3933H7C
• There should be a recent tweet from me (@ocorcho,
#ekaw2014), with a link to the survey

• A happy world where all
sorts of ontologies,
vocabularies and
annotations are developed,
as efficiently as possible
• Everybody is happy in their
social class
• And where Gruberliness is
everywhere
Shared
Formal
Explicit
Alphas
Betas
Gammas
Deltas
Epsilons

And if somebody is not happy… a few grammes of recognition
• Every social class can take drugs after (even during)
work, to get even happier.
• One gramme of drug every time that…
• Alphas: a new term included in an upper-level ontology, or a
domain ontology is cleaned with their well-founded terms
• Betas: an inconsistency is found in some data thanks to the
logical axioms of their ontologies, or when their ontology is
included in BioPortal.
• Gammas: their ontology is listed in the LOV repository; and
ten grammes when used in some Linked Data dataset
• Deltas: the same for metadataregistry.org
• Epsilons: one gramme every 10,000 new Web pages
annotated according to schema.org

But is our happy ontology engineering world big enough?
How many other people live outside of it?
There are savage reservations, where ugly non-ontologists live…
• They use relational databases
• Some of them are not even in normal form
• And “oh-my-God” CSVs
• And they communicate in natural language, HTML and
using UML class diagrams
However, sometimes they are visited by our inhabitants…

The savage reservation in Madrid
• AENOR PNE 178301
• Norm on Open Data for
Smart Cities
• Organised by
• Spanish Ministry of
Industry
• AENOR
• AENOR CTN 178 group
• Subcommittee 3 on
Government and Mobility
• Workgroup on
Government
• Subgroup on Open Data

Some of the individuals in the savage reservation
• Coordinator
• Esther Minguela (Localidata)
• 35 members who belong to…
• Medium and large cities (10) – mostly City Information
Managers
• Private companies working for the public sector (6)
• Regions (3) – mostly Region Information Managers
• Ministries or similar bodies (3)
• Geographic sector (3)
• I visited them for six months (January-June 2014),
trying to show the advantages of living in our Brave
Little World of Ontology Engineering
• Did I succeed? … No vote now (don’t spoil my presentation)

The current status of Open Data in Spain
Source: CTIC and OKFN

Main objectives of the work being done
• Make open data projects from cities more systematic
• Provide reference guidelines for local administrations to
define, document and develop open data projects
• Evaluate the maturity of open data projects (through
indicators)
• Kickstarting them
• Continuous improvement
• Sustainability
• Quality and efficiency of the project
• Improve interoperability
• Decide on the 10 top-priority datasets to be opened
• Work on common data structures and vocabularies for these
datasets

37 Metrics, grouped in domains (and dimensions)
Strategic Domain
1. Strategy
2. Leadership
3. Service-level agreement
4. Sustainability
Legal Domain
5. External and internal legal norms
6. Usage and licensing conditions
Organisational Domain
7. Responsible unit
8. Skilled team
9. Inventory of data
10. Priority
11. Measurement of the process
12. Measurements of usage and impact
Technical Domain
13. Catalogue
14. Available in the public sector catalogue
15. Documented datasets
16. Categories and search facilities
17. Availability
18. Persistent and friendly references
19. Accessibility
20. Access for free
21. Access systems in place
22. Primary data
23. Completeness
24. Documentation of data
25. Correctness
26. Geo-referencing
27. Linked Data
28. Update processes
29. Update frequency
30. New dataset inclusion
31. Data quantity
32. Data format
33. Vocabularies
Economic and Social Domain
34. Transparency, participation and
collaboration
35. Complaint/Conflict management
36. Fostering reuse
37. Developed reuse initiatives

An indicator on the maturity of open data projects
• Each metric gets a number
• And each one has a weight, agreed by group members
• A final indicator is then calculated
Level achieved and score
• Level 0 (nothing): 0
• Level 1 (you have started doing it): 1
• Level 2 (you are good): 2
• Level 3 (excellent): 3
Weights (Strategy)
• Strategy: 25 %
• Leadership: 50 %
• Service-level agreements: 10 %
• Sustainability: 15 %
Total value and open data indicator
• 0-200: 1
• 201-400: 2
• 401-600: 3
• 601-800: 4
• 801-1000: 5
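The scoring scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the levels (0 to 3), the Strategy weights and the five indicator bands come from the slide, but the slides do not give the aggregation formula. The normalisation by the maximum level, the scaling of a single domain to 0-1000, and the function names `domain_value` and `open_data_indicator` are all assumptions made here.

```python
import math

# Weights for the Strategy group, as shown on the slide (percentages).
STRATEGIC_WEIGHTS = {
    "Strategy": 25,
    "Leadership": 50,
    "Service-level agreements": 10,
    "Sustainability": 15,
}

MAX_LEVEL = 3  # Level 3 (excellent) is the highest level on the slide


def domain_value(levels, weights, scale=1000):
    """Weighted value of one domain on a 0..scale range.

    ASSUMPTION: the aggregation formula is not given on the slides.
    Here each level is divided by MAX_LEVEL and weighted by its percentage.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * levels[name] / MAX_LEVEL for name in weights)
    return scale * weighted / total_weight


def open_data_indicator(total_value):
    """Map a total value (0-1000) to the 1-5 bands listed on the slide."""
    if not 0 <= total_value <= 1000:
        raise ValueError("total value must be between 0 and 1000")
    return max(1, math.ceil(total_value / 200))


# Hypothetical city: good strategy, excellent leadership, weak SLAs.
levels = {
    "Strategy": 2,
    "Leadership": 3,
    "Service-level agreements": 1,
    "Sustainability": 0,
}
value = domain_value(levels, STRATEGIC_WEIGHTS)
print(round(value), open_data_indicator(value))
```

How the values of the individual domains are combined into the overall total is left open here, because the slides do not say.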

10 Highest-Priority Datasets for 2015
• Listing based on the
current inventories from
all cities (and regions)
• Harmonisation
• Votes according to PSI-
reuse requests
Datasets
Cultural Agenda
Traffic
Population
Streets
Public Transport
Touristic Places and POIs
Budget
Shop Census
Air Quality
Contracts
Parkings

And now the meat…
• All that previous work may
have been done even by
our epsilons…
• Now it’s time to start
working on common data
structures and
vocabularies…
• Did I tell you that these
people were often visited
by some of the people
from our world?
• Before continuing, let’s
see some of the
conversations that we
managed to get access
to…

Some results of previous visits

Cool, we have a methodology…
(Figure: methodology diagram with numbered scenarios 1 to 9. Core activities: O. Specification, O. Conceptualization, O. Formalization, O. Implementation, plus Scheduling. Other activities: Non Ontological Resource Reuse and Reengineering, Ontological Resource Reuse and Reengineering, O. Aligning, O. Merging, Ontology Design Pattern Reuse, Ontology Restructuring (Pruning, Extension, Specialization, Modularization), O. Localization. Knowledge resources: Non Ontological Resources (Thesauri, Dictionaries, Glossaries, Lexicons, Taxonomies, Classification Schemas), Ontological Resources (O. Repositories and Registries, O. Design Patterns, Alignments), in RDF(S) and OWL.)

However…
• Our methodologies do not tell domain experts much about
what they have to do at each step
• So we just gave simple instructions (as most of you
normally do)
• Start with competency questions, with a few answers
• This must come from data reusers’ requests
• And we call them “user stories”
• Extract terms, and classify them into nouns, adjectives and verbs
• Organise them a little bit
• Find common data structures out there (vocabularies,
ontologies) that use those terms or synonyms
• Decide which ones to use
• …
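The first steps above (user stories, then term extraction) can be sketched as follows. This is a minimal illustration, not the procedure the group followed: the questions in `BACKLOG`, the tiny stop-word list and the function name `candidate_terms` are all hypothetical. It only counts candidate terms. Classifying them into nouns, adjectives and verbs, and merging singular and plural forms, is left to the experts.

```python
import re
from collections import Counter

# Hypothetical competency questions ("user stories"); none of these
# come from the actual working group.
BACKLOG = [
    "Which shops are open in a given street?",
    "Which streets belong to a given district?",
    "Which shops have a given economic activity?",
]

# Tiny illustrative stop-word list (an assumption, not a linguistic resource).
STOP_WORDS = {"which", "are", "in", "a", "to", "have", "given", "the", "of"}


def candidate_terms(questions):
    """Count the non-stop-word tokens that appear in the questions."""
    counts = Counter()
    for question in questions:
        for token in re.findall(r"[a-z]+", question.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts


# Most frequent candidates first, ready to paste into a spreadsheet.
for term, count in candidate_terms(BACKLOG).most_common():
    print(term, count)
```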

Savages working on their vocabularies…
• We used an agile-like method with a “competency
question backlog” (first some questions, and go down
the whole path, then some others, etc.)
• And used “common” tools
Google Docs Excel Card-sorting
• And now, let’s build the ontology
• Deadlock!!!

Deadlock 1. I have been told to reuse other ontologies
• We recommend reusing other ontological and non-
ontological resources (well, except for epsilons)
• That’s one of the bases of ontological engineering
• However, savages tend to do that at an early stage of
ontology development
• It confuses them
• Should I use FOAF, or the Organization Ontology, or
vCard, or schema.org?
• And prevents people from being creative
• It causes endless discussions about terms (and lots of
problems with translations)
• Rec1: tell them to forget about reuse. Let them start
providing their own (wrong?) definitions, and agree
on those

Deadlock 2. I want my ontology to do inferences…
• A beta told me…
• OWL is fun to teach at university (especially for betas)
• It’s nice to see reasoning, consistency checking, OWA, etc.
• It is useful in many domains
• But developing such ontologies is not a task for our savages
• Rec2: Just work with text patterns, and guide them to write
good term definitions
• A district contains only neighbourhoods and census sections
• A shop can have at most three economic activities associated
to it
Note: Rabbit may be useful here (although I did not have time to
practice with it with this group)
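To show how such text patterns could later be processed, here is a small sketch that turns the two example sentences into structured restrictions. It is not Rabbit and does not follow its grammar. The two regular expressions, the number-word table and the function name `parse_definition` are assumptions made for this illustration, and only these two sentence shapes are covered.

```python
import re

# "A <subject> contains only <object>, <object> and <object>"
ONLY = re.compile(r"^An? (?P<subject>.+?) contains only (?P<objects>.+)$")

# "A <subject> can have at most <n> <object> [associated to it]"
AT_MOST = re.compile(
    r"^An? (?P<subject>.+?) can have at most (?P<n>\w+) (?P<object>.+?)"
    r"(?: associated to it)?$"
)

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_definition(sentence):
    """Return a dict describing the restriction, or None if no pattern fits."""
    sentence = sentence.strip().rstrip(".")
    match = ONLY.match(sentence)
    if match:
        # Split the object list on commas and on the word "and".
        objects = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", match["objects"])
        return {"pattern": "only", "subject": match["subject"], "objects": objects}
    match = AT_MOST.match(sentence)
    if match:
        word = match["n"]
        limit = int(word) if word.isdigit() else NUMBER_WORDS.get(word)
        if limit is None:
            return None  # number word outside the small table above
        return {
            "pattern": "at_most",
            "subject": match["subject"],
            "max": limit,
            "object": match["object"],
        }
    return None


print(parse_definition("A district contains only neighbourhoods and census sections"))
print(parse_definition("A shop can have at most three economic activities associated to it"))
```

The experts keep writing plain sentences. Whether these restrictions are ever translated into OWL axioms can be decided later by somebody else.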

Deadlock 3. I want my ontology to be lightweight…
• A gamma told me…
• My ontology will be used for Linked Data publishing
(so that I am 5 stars!!)
• I have been told not to put domains or ranges
• I have been told to create only light taxonomies
• I have been told to use only RDF Schema
• Rec3: again, text patterns are a good option here
• Don’t make your experts worry about languages or formal
aspects

Deadlock 4. Which tool should I use?
• We thought that the war had ended?
• Alphas and betas told me to use Protégé
• Some of them said that I could use a Web-based version
• A gamma told me to use Neologism
• And an epsilon called me and said that it was enough if I used
tables with attributes, as in schema.org
• And then I saw an old tool, not available
anymore, that used schema.org-like
table-like descriptions and generated
ontologies in different languages
• WebODE
• Rec4: Use simple tools (e.g., Excel) that allow discussing
easily, without weird constraints

Deadlock 5. But the ontologies to reuse are in English
• These developers and data reusers prefer Spanish
terms
• We all know that identifiers are just symbols
• e.g., labels and comments in different languages should be
enough
• However…
• Should we mix term identifiers in different languages?
• Do we translate all terms to our language?
• Rec5: no idea yet about what to do…
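For reference, the "labels in different languages" option mentioned above looks roughly like this when written out. The sketch builds a Turtle snippet with plain string formatting. The term URI under example.org, the labels and the function name `term_to_turtle` are hypothetical. It illustrates one of the options only and does not settle the open question about identifiers.

```python
RDFS_PREFIX = "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."


def term_to_turtle(term_uri, labels):
    """Serialise one class with an rdfs:label per language tag."""
    if not labels:
        raise ValueError("at least one label is required")
    lines = [RDFS_PREFIX, "", f"<{term_uri}> a rdfs:Class ;"]
    items = sorted(labels.items())
    for index, (lang, text) in enumerate(items):
        # Escape backslashes and double quotes inside the literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        terminator = " ." if index == len(items) - 1 else " ;"
        lines.append(f'    rdfs:label "{escaped}"@{lang}{terminator}')
    return "\n".join(lines)


# Hypothetical term: a Spanish identifier with Spanish and English labels.
print(term_to_turtle(
    "http://example.org/def/comercio#LocalComercial",
    {"es": "Local comercial", "en": "Shop"},
))
```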

The results so far…
Dataset: vocabulary
• General vocabularies
o Postal Address: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/direccionPostal
o Administrative: http://vocab.linkeddata.es/datosabiertos/def/sector-publico/territorio
• Streets: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/callejero
o SKOS: http://vocab.linkeddata.es/datosabiertos/kos/urbanismo-infraestructuras/tipo-via
• Tourism: http://vocab.linkeddata.es/datosabiertos/def/turismo/lugar
• Cultural Agenda: http://vocab.linkeddata.es/datosabiertos/def/cultura-ocio/agenda
• Shop Census: http://vocab.linkeddata.es/datosabiertos/def/comercio/tejidoComercial
o SKOS (NACE): http://vocab.linkeddata.es/datosabiertos/kos/comercio/cnae
• Population: http://www.w3.org/TR/vocab-data-cube/
o SKOS (Age): http://eurostat.linked-statistics.org/dic/age.rdf
o SKOS (Gender): http://eurostat.linked-statistics.org/dic/sex.rdf
o SKOS (Geo): http://eurostat.linked-statistics.org/dic/geo.rdf
• Budget: http://vocab.linkeddata.es/datosabiertos/def/hacienda/presupuesto
• Contracts: http://contsem.unizar.es/def/sector-publico/pproc
• Air Quality: http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
• Traffic: http://vocab.linkeddata.es/datosabiertos/def/transporte/trafico
• Public Transport: http://vocab.linkeddata.es/datosabiertos/def/transporte/transportePublico
• Parkings: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/aparcamiento

A walk through the Brave Little World of Ontology Engineering
Why are we still discussing what ontologies should be used for?
(see recent thread in Google+’s LOV community, started by Bernard Vatant,
on the “intended and real usage of vocabularies in LOV”)
https://plus.google.com/u/0/+BernardVatant/posts/SDYTN3FGkEr
How will our world be at year 25 After Gruber (A.G.)? And at year 50 A.G.?
Will there soon be a revolution led by epsilons to rule the world?
Are we the ones that live in a savage reservation in a larger world?
Or will we conquer the rest of the world?

Which social class do EKAW2014 participants belong to?
https://es.surveymonkey.com/results/SM-F75GGL2V/

Acknowledgements
• First of all, my acknowledgements go to
Aldous Huxley for writing the always-
inspiring “Brave New World” book.
• I would also like to give thanks to some of those who have
helped with comments on this presentation
• Asunción Gómez-Pérez and the whole Ontology Engineering
Group team
• José Manuel Gómez-Pérez
• Those who provided some material (acknowledgements in the
corresponding slides)
• And to all those with whom I have enjoyed building ontologies
for so many years (far too many to enumerate here)
• Especially those from the savage reservation in Madrid

Disclaimers
• The contents of this slideset
represent my own view on this topic
• Not necessarily the views of all
members of the
Ontology Engineering Group (UPM)
• They are based on some of my own
experiences in ontology engineering
• These are not necessarily generalisable
• Especially not valid, probably, in ontology-savvy domains
• And more importantly…
• I was trying to be provocative here, to generate discussion
