EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already there?

Presentation for one of the keynotes at EKAW2014, where I talked about the need to lower the barrier for ontology development for those who have no experience with ontologies.


  1. 1. Ontology Engineering for and by the masses: are we already there? 19th International Conference on Knowledge Engineering and Knowledge Management EKAW2014 27/11/2014 Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho
  2. 2. License • This work is licensed under the license CC BY-NC-SA 4.0 International • http://purl.org/NET/rdflicense/cc-by-nc-sa4.0 • You are free: • to Share — to copy, distribute and transmit the work • to Remix — to adapt the work • Under the following conditions • Attribution — You must attribute the work by inserting • “[source Oscar Corcho]” at the footer of each reused slide • a credits slide stating: “These slides are partially based on “Ontology Engineering for and by the masses: are we already there?” by O. Corcho” • Non-commercial • Share-Alike
  3. 3. A walk through our Brave Little World of Ontology Engineering The world is living at year 21 After Gruber (A.G.) All inhabitants of this world have this motto: “Nothing is more beautiful than a formal, explicit specification of a shared conceptualization” repeated to them every night while they sleep (since year 5 A.G.) Everybody loves Gruberliness, the property of creating models that are SHARED, FORMAL and EXPLICIT It is written in every single building and ontology repository
  4. 4. The world is divided in 10 regions, led by 10 world controllers who have dominated it so far (according to Google Scholar) Oxford (Ian Horrocks, 35K cit.) Milton Keynes (Enrico Motta, 11K cit.) Buffalo (Barry Smith, 18K cit.) Trento (Nicola Guarino, 17K cit.) Stanford (Mark Musen, 25K cit.) Karlsruhe (Rudi Studer, unknown cit.) Madrid (Asunción Gómez-Pérez, 13K cit.) Amsterdam (Frank van Harmelen, 25K cit.) Toronto (Mark S. Fox, 13K cit.) Osaka (Riichiro Mizoguchi, 7K cit.) Disclaimer: This calculation is not exact. It only considers individuals, and favours geographical distribution
  5. 5. Several wars in these 21 years of existence, which led to the current status: • The “Language War”. It lasted 4 years. W3C Treaty Signed in Year 11 A.G. Description logic won over frames, first order logic and semantic networks. • The “Tool War”. It lasted 10 years. Tools like Protégé, OilEd, SWOOP, WebODE, OntoEdit, or NeOn Toolkit fought among each other to get installed on the computers of our world citizens. Protégé won. No treaty signed
  6. 6. Five social classes SHARED, FORMAL and EXPLICIT Y0 Y10 A.G. Y16 A.G. Y20 A.G.
  7. 7. Alphas Average age: 50+ Number of individuals: 100+ Education: Formal logic and philosophy. Sometimes Computer Science Contribution to the world: Write formal upper-level ontologies DOLCE, BFO, GFO, SUMO Languages spoken: They are polyglots First order and many other logics OWL, OBO Secret meetings: FOIS Daily routine: Wake up Write a new term on a whiteboard Think about it carefully Incorporate it in an upper-level ontology Some days they don’t include new terms
  8. 8. Betas Average age: 40+ Number of individuals: 1,000+ Education: Computer Science, Biology, Geography. Some courses on logic Contribution to the world: Domain and application ontologies Heavyweight, with many axioms, properties and concepts DOLCE, BFO, GFO, SUMO Languages spoken: Mostly OWL Secret meetings: EKAW, KCAP Daily routine: Wake up Open ontology design methodology book Open ontology design pattern website Open Protégé and activate reasoning Work on 10 new terms
  9. 9. Gammas Average age: 30+ Number of individuals: 10,000+ Education: Mostly Computer Science Courses on ontologies as undergrads Contribution to the world: Write lightweight ontologies Call them vocabularies Create Linked (Open) Data Languages spoken: Native RDF Schema, and a bit of OWL Secret meetings: ESWC, ISWC, WWW Daily routine: Wake up Open LOV or prefix.cc Look for vocabularies for their dataset Select, extend and upload them in LOV Update dataset in datahub.io
  10. 10. Deltas Average age: unknown Number of individuals: 10,000+ Education: Library Science Some course on Computer Science Contribution to the world: Write codelists and thesauri Languages spoken: SKOS, RDF Schema Secret meetings: Dublin Core Conference Daily routine: Wake up Write a couple of new codelists/thesauri Submit them to metadataregistry.org Annotate some documents with them
  11. 11. Epsilons Average age: 20+ Number of individuals: 100,000+ Education: Web development Contribution to the world: Write schema.org annotations Some contributions to schema.org classes and structured types Languages spoken: HTML, RDFa, JSON-LD Secret meetings: Webinars, Hangouts, meetups Daily routine: Wake up Check positioning of their site in Google Annotate it with more schema.org tags Wait for Google/Yahoo! to crawl them Go back to next step
  12. 12. Open vote • Are you an alpha, a beta, a gamma, a delta or an epsilon? • Or do you think that you can belong to several social classes? https://es.surveymonkey.com/s/3933H7C • There should be a recent tweet from me (@ocorcho, #ekaw2014), with a link to the survey
  13. 13. SHARED, FORMAL and EXPLICIT • A happy world where all sorts of ontologies, vocabularies and annotations are developed, as efficiently as possible • Everybody is happy in their social class • And where Gruberliness is everywhere Shared Formal Explicit Alphas Betas Gammas Deltas Epsilons
  14. 14. And if somebody is not happy… a few grammes of recognition • Every social class can take drugs after (even during) work, to get even happier. • One gramme of drug every time that… • Alphas: a new term is included in an upper-level ontology, or a domain ontology is cleaned with their well-founded terms • Betas: an inconsistency is found in some data thanks to the logical axioms of their ontologies, or when their ontology is included in BioPortal. • Gammas: their ontology is listed in the LOV repository; and ten grammes when used in some Linked Data dataset • Deltas: the same for metadataregistry.org • Epsilons: one gramme every 10,000 new Web pages annotated according to schema.org
  15. 15. SHARED, FORMAL and EXPLICIT But is our happy ontology engineering world big enough? How many other people live outside of it? There are savage reservations, where ugly non-ontologists live… • They use relational databases • Some of them are not even in normal form • And “oh-my-God” CSVs • And they communicate in natural language, HTML and using UML class diagrams However, sometimes they are visited by our inhabitants…
  16. 16. The savage reservation in Madrid • AENOR PNE 178301 • Norm on Open Data for Smart Cities • Organised by • Spanish Ministry of Industry • AENOR • AENOR CTN 178 group • Subcommittee 3 on Government and Mobility • Workgroup on Government • Subgroup on Open Data
  17. 17. Some of the individuals in the savage reservation • Coordinator • Esther Minguela (Localidata) • 35 members who belong to… • Medium&Large Cities (10) – mostly City Information Managers • Private companies working for the public sector (6) • Regions (3) – mostly Region Information Managers • Ministries or alike (3) • Geographic sector (3) • I visited them for six months (January-June 2014), trying to show the advantages of living in our Brave Little World of Ontology Engineering • Did I succeed? … No vote now (don’t spoil my presentation)
  18. 18. The current status of Open Data in Spain Source: CTIC and OKFN
  19. 19. Main objectives of the work being done • Make open data projects from cities more systematic • Provide reference guidelines for local administrations to define, document and develop open data projects • Evaluate the maturity of open data projects (through indicators) • Kickstarting them • Continuous improvement • Sustainability • Quality and efficiency of the project • Improve interoperability • Decide on the 10 top-priority datasets to be opened • Work on common data structures and vocabularies for these datasets
  20. 20. 37 Metrics, grouped in domains (and dimensions) Strategic Domain 1. Strategy 2. Leadership 3. Service-level agreement 4. Sustainability Legal Domain 5. External and internal legal norms 6. Usage and licensing conditions Organisational Domain 7. Responsible unit 8. Skilled team 9. Inventory of data 10. Priority 11. Measurement of the process 12. Measurements of usage and impact Technical Domain 13. Catalogue 14. Available in the public sector catalogue 15. Documented datasets 16. Categories and search facilities 17. Availability 18. Persistent and friendly references 19. Accessibility 20. Access for free 21. Access systems in place 22. Primary data 23. Completeness 24. Documentation of data 25. Correctness 26. Geo-referencing 27. Linked Data 28. Update processes 29. Update frequency 30. New dataset inclusion 31. Data quantity 32. Data format 33. Vocabularies Economic and social domain 34. Transparency, participation and collaboration 35. Complaint/Conflict management 36. Fostering reuse 37. Developed reuse initiatives
  21. 21. An indicator on the maturity of open data projects • Each metric gets a number • And each one has a weight, agreed by group members • A final indicator is then calculated • Level achieved → Score: Level 0 (nothing) → 0; Level 1 (you have started doing it) → 1; Level 2 (you are good) → 2; Level 3 (excellent) → 3 • Weight (Strategy): Strategy 25 %; Leadership 50 %; Service-level agreements 10 %; Sustainability 15 % • Total Value → Open data indicator: 0-200 → 1; 201-400 → 2; 401-600 → 3; 601-800 → 4; 801-1000 → 5
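The indicator on slide 21 can be sketched in a few lines of Python. The band table (total value to indicator 1 to 5), the four Strategy weights and the 0 to 3 levels are taken from the slide. The slide does not say how levels and weights are combined into a 0-1000 total, so the rescaling in `total_value` is an assumption made only for illustration, and all function names are hypothetical.

```python
# Bands from the slide: upper bound of the total value -> open data indicator.
BANDS = [(200, 1), (400, 2), (600, 3), (800, 4), (1000, 5)]

# Weights listed on the slide under "Strategy" (percentages).
STRATEGY_WEIGHTS = {
    "Strategy": 25,
    "Leadership": 50,
    "Service-level agreements": 10,
    "Sustainability": 15,
}

MAX_LEVEL = 3  # Level 3 (excellent) is the highest score on the slide.


def total_value(levels, weights):
    """Weighted level rescaled to 0-1000.

    ASSUMPTION: the slide does not define this step; here the weighted
    average level (0-3) is simply normalised to the 0-1000 range.
    """
    weight_sum = sum(weights.values())
    weighted = sum(weights[metric] * levels[metric] for metric in weights)
    return 1000 * weighted / (weight_sum * MAX_LEVEL)


def indicator(total):
    """Map a total value in 0-1000 to the 1-5 indicator using the slide's bands."""
    if not 0 <= total <= 1000:
        raise ValueError("total value must be between 0 and 1000")
    for upper, value in BANDS:
        if total <= upper:
            return value


# Hypothetical city: levels achieved for the four Strategy metrics.
levels = {
    "Strategy": 2,
    "Leadership": 3,
    "Service-level agreements": 1,
    "Sustainability": 0,
}
score = total_value(levels, STRATEGY_WEIGHTS)
print(score, indicator(score))
```

Under this assumed scaling, a city that reaches level 3 on every metric gets a total of 1000 and an indicator of 5. Extending the sketch to all 37 metrics would need the remaining weights, which the slide does not list.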
  22. 22. 10 Highest-Priority Datasets for 2015 • Listing based on the current inventories from all cities (and regions) • Harmonisation • Votes according to PSI-reuse requests • Datasets: Cultural Agenda; Traffic; Population; Streets; Public Transport; Touristic Places and POIs; Budget; Shop Census; Air Quality; Contracts; Parkings
  23. 23. And now the meat… • All that previous work may have been done even by our epsilons… • Now it’s time to start working on common data structures and vocabularies… • Did I tell you that these people were often visited by some of the people from our world? • Before continuing, let’s see some of the conversations that we managed to get access to…
  24. 24. Some results of previous visits
  25. 25. And some other visits…
  26. 26. Cool, we have a methodology… [Diagram of the methodology’s numbered activities: O. Specification, O. Conceptualization, O. Formalization, O. Implementation; Non Ontological Resource Reuse and Reengineering (Thesauri, Dictionaries, Glossaries, Lexicons, Taxonomies, Classification Schemas); Ontological Resource Reuse (O. Repositories and Registries, RDF(S), OWL) and Ontological Resource Reengineering; O. Aligning and O. Merging (Alignments); Ontology Design Pattern Reuse (O. Design Patterns); Ontology Restructuring (Pruning, Extension, Specialization, Modularization); O. Localization; Scheduling]
  27. 27. However… • Our methodologies do not explain so much to domain experts on what they have to do at each step • So we just gave easy indications (as most of you do normally) • Start with competency questions, with a few answers • This must come from data reusers’ requests • And we call them “user stories” • Extract terms, and classify them in nouns, adjectives, verbs • Organise them a little bit • Find common data structures out there (vocabularies, ontologies) that use those terms or synonyms • Decide which ones to use • …
  28. 28. Savages working on their vocabularies… • We used an agile-like method with a “competency question backlog” (first some questions, and go down the whole path, then some others, etc.) • And used “common” tools Google Docs Excel Card-sorting • And now, let’s build the ontology • Deadlock!!!
  29. 29. Deadlock 1. I have been told to reuse other ontologies • We recommend reusing other ontological and non-ontological resources (well, except for epsilons) • That’s one of the bases of ontological engineering • However, savages tend to do that at an early stage of ontology development • It causes confusion to them • Should I use FOAF, or the Organization Ontology, or vCard, or schema.org? • And prevents people from being creative • It causes endless discussions about terms (and lots of problems with translations) • Rec1: tell them to forget about reuse. Let them start providing their own (wrong?) definitions, and agree on those
  30. 30. Deadlock 2. I want my ontology to do inferences… • A beta told me… • OWL is fun to teach at University (especially for betas) • It’s nice to see reasoning, consistency checking, OWA, etc. • It is useful in many domains • But developing such ontologies is not a task for our savages • Rec2: Just work with text patterns, and guide them to write good term definitions • A district contains only neighbourhoods and census sections • A shop can have at most three economic activities associated to it Note: Rabbit may be useful here (although I did not have time to practice with it with this group)
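To make Rec2 concrete, here is a minimal sketch of how definitions written with text patterns could later be turned into structured constraints without asking the domain experts to touch OWL. It covers only the two sentence shapes quoted on slide 30. The regular expressions, the record layout and the function name are assumptions for illustration; this is not Rabbit and not the method used with the group.

```python
import re

# Small lookup for spelled-out cardinalities (illustrative, not exhaustive).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# "A district contains only neighbourhoods and census sections"
ONLY = re.compile(r"^An? (?P<subject>.+?) contains only (?P<objects>.+)$")

# "A shop can have at most three economic activities associated to it"
AT_MOST = re.compile(
    r"^An? (?P<subject>.+?) can have at most (?P<n>\w+) (?P<object>.+?)"
    r"(?: associated (?:to|with) it)?$"
)


def parse_definition(sentence):
    """Turn one definition sentence into a small constraint record, or None."""
    text = sentence.strip().rstrip(".")
    match = ONLY.match(text)
    if match:
        # Split "x, y and z" style enumerations into separate terms.
        objects = re.split(r",\s*|\s+and\s+", match["objects"])
        return {"kind": "only", "subject": match["subject"], "objects": objects}
    match = AT_MOST.match(text)
    if match:
        word = match["n"].lower()
        limit = int(word) if word.isdigit() else NUMBER_WORDS.get(word)
        if limit is None:
            return None  # Unknown number word: leave for manual review.
        return {
            "kind": "at_most",
            "subject": match["subject"],
            "limit": limit,
            "object": match["object"],
        }
    return None  # Sentence does not follow a known pattern.


for s in [
    "A district contains only neighbourhoods and census sections",
    "A shop can have at most three economic activities associated to it",
]:
    print(parse_definition(s))
```

Sentences that do not fit a pattern return `None`, so they can be sent back to the group for rewording instead of being silently misread.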
  31. 31. Deadlock 3. I want my ontology to be lightweight… • A gamma told me… • My ontology will be used for Linked Data publishing (so that I am 5 stars!!) • I have been told not to put domains or ranges • I have been told to create only light taxonomies • I have been told to use only RDF Schema • Rec3: again, text patterns are a good option here • Don’t make your experts worry about languages or formal aspects
  32. 32. Deadlock 4. Which tool should I use? • We thought that the war had ended? • Alphas and betas told me to use Protégé • Some of them said that I could use a Web-based version • A gamma told me to use Neologism • And an epsilon called me and said that it was enough if I used tables with attributes, as in schema.org • And then I saw an old tool, not available anymore, that used schema.org-like table-like descriptions and generated ontologies in different languages • WebODE • Rec4: Use simple tools (e.g., Excel) that allow discussing easily, without weird constraints
  33. 33. Deadlock 5. But these ontologies to reuse are in English • These developers and data reusers prefer Spanish terms • We all know that identifiers are just symbols • e.g., labels and comments in different languages should be enough • However… • Should we mix term identifiers in different languages? • Do we translate all terms to our language? • Rec5: no idea yet about what to do…
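The "identifiers are just symbols" option from slide 33 can be illustrated with a short sketch: one term identifier, with labels attached in several languages. It does not settle the open questions behind Rec5 (which language the identifier itself should use). The example IRI under `example.org` and the labels are hypothetical; the only real vocabulary term used is `rdfs:label`, and the output is written as Turtle by plain string formatting rather than with an RDF library.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def term_to_turtle(iri, labels):
    """Serialise one term with language-tagged rdfs:label values as Turtle.

    `labels` maps a language tag (e.g. "es") to the label text.
    """
    if not labels:
        raise ValueError("at least one label is required")
    lines = [f"<{iri}>"]
    items = sorted(labels.items())
    for i, (lang, text) in enumerate(items):
        # Escape backslashes and double quotes inside the literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f'    <{RDFS}label> "{escaped}"@{lang}{end}')
    return "\n".join(lines)


# Hypothetical term: a Spanish identifier with Spanish and English labels.
turtle = term_to_turtle(
    "http://example.org/def/comercio#LocalComercial",
    {"es": "Local comercial", "en": "Shop"},
)
print(turtle)
```

Data reusers who prefer Spanish would see the `@es` label and others the `@en` one, while the identifier stays fixed whichever language it was coined in.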
  34. 34. The results so far… (Datasets → Vocabulary) • General vocabularies: Postal Address: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/direccionPostal ; Administrative: http://vocab.linkeddata.es/datosabiertos/def/sector-publico/territorio • Streets: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/callejero ; SKOS: http://vocab.linkeddata.es/datosabiertos/kos/urbanismo-infraestructuras/tipo-via • Tourism: http://vocab.linkeddata.es/datosabiertos/def/turismo/lugar • Cultural Agenda: http://vocab.linkeddata.es/datosabiertos/def/cultura-ocio/agenda • Shop Census: http://vocab.linkeddata.es/datosabiertos/def/comercio/tejidoComercial ; SKOS (NACE): http://vocab.linkeddata.es/datosabiertos/kos/comercio/cnae • Population: http://www.w3.org/TR/vocab-data-cube/ ; SKOS: Age: http://eurostat.linked-statistics.org/dic/age.rdf , Gender: http://eurostat.linked-statistics.org/dic/sex.rdf , Geo: http://eurostat.linked-statistics.org/dic/geo.rdf • Budget: http://vocab.linkeddata.es/datosabiertos/def/hacienda/presupuesto • Contracts: http://contsem.unizar.es/def/sector-publico/pproc • Air Quality: http://www.w3.org/2005/Incubator/ssn/ssnx/ssn • Traffic: http://vocab.linkeddata.es/datosabiertos/def/transporte/trafico • Public Transport: http://vocab.linkeddata.es/datosabiertos/def/transporte/transportePublico • Parkings: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/aparcamiento
  35. 35. A walk through the Brave Little World of Ontology Engineering Why are we still discussing what ontologies should be used for? (see recent thread in Google+’s LOV community, started by Bernard Vatant, on the “intended and real usage of vocabularies in LOV”) https://plus.google.com/u/0/+BernardVatant/posts/SDYTN3FGkEr How will our world be at year 25 After Gruber (A.G.)? And at year 50 A.G.? Will there soon be a revolution led by epsilons to rule the world? Are we the ones that live in a savage reservation in a larger world? Or will we conquer the rest of the world?
  36. 36. Which social class do EKAW2014 participants belong to? https://es.surveymonkey.com/results/SM-F75GGL2V/
  37. 37. Acknowledgements • First of all, my acknowledgements go to Aldous Huxley for writing the always-inspiring “Brave New World” book. • I would also like to give thanks to some of those who have helped with comments on this presentation • Asunción Gómez-Pérez and the whole Ontology Engineering Group team • José Manuel Gómez-Pérez • Those who provided some material (acknowledgements in the corresponding slides) • And to all those with whom I have enjoyed building ontologies for so many years (far too many to enumerate here) • Especially those from the savage reservation in Madrid
  38. 38. Disclaimers • The contents of this slideset represent my own view on this topic • Not necessarily the views of all members of the Ontology Engineering Group (UPM) • They are based on some of my own experiences in ontology engineering • These are not necessarily generalisable • Especially not valid, probably, in ontology-savvy domains • And more important… • I was trying to be provocative here, to generate discussion
  39. 39. Ontology Engineering for and by the masses: are we already there? 19th International Conference on Knowledge Engineering and Knowledge Management EKAW2014 27/11/2014 Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho