Your SlideShare is downloading. ×
0
Knowledge engineering
and the Web
Guus Schreiber
VU University Amsterdam
Computer Science, Network
Institute
Overview of this talk
• Web data representation
– a meta view
• Knowledge for the Web: categories
– key sources
– Alignmen...
My journey
knowledge engineering
• design patterns for
problem solving
• methodology for
knowledge systems
• models of dom...
My journey
access to digital heritage
My journey
Web standards
Chair of
•Web metadata: RDF 1.1
•OWL Web Ontology Language 1.0
•SKOS model for publishing vocabul...
A few words about
Web standardization
• Key success factor!
• Consensus process actually works
– Some of the time at least...
Example: W3C RDF 1.1 group
• 8K group messages (publicly visible)
• 2K messages about external comments
• 125+ teleconfere...
Web data representation
Caution
• Representation languages are there for
you
• And not the other way around ….
HTML5: a leap forward
Rationale
•Consistent separation of
content and presentation
•Semantics of the
structure of informat...
RDF: triples and graphs
RDF is simply labeling resources and links
RDF: multiple graphs
www.example.org/bob
Data modeling on the Web
RDF
•Class hierarchy
•Property hierarchy
•Domain and range
restrictions
•Data types
• OWL
• Prope...
Writing in an ontology language
does not make it an ontology!
• Ontology is vehicle for sharing
• Papers about your own id...
Rationale
•A vocabulary represents
distilled knowledge of a
community
•Typically product of a
consensus process over
longe...
The strength of SKOS lies its
simplicity
Baker et al: Key choices
in the design of SKOS
Beware of ontological over-
commitment
• We have the understandable tendency
to use semantic modeling constructs
whenever ...
Knowledge on the web:
categories
The concept triad
Source: http://www.jamesodell.com/Ontology_White_Paper_2011-07-15.pdf.
Categorization
• OWL (Description logic) takes an
extensional view of classes
– A set is completely defined by its members...
Categories (Rosch)
• Help us to organize the world
• Tools for perception
• Basic-level categories
– Are the prime categor...
Basic-level categories
22
23
FOAF: Friend of a Friend
24
Dublin Core: metadata of Web resources
Iconclass
categorizing image scene
schema.org
categories for TV programs
schema.org
the notion of “Role”
schema.org issues
• Top-down versus bottom-up
• Ownership and control
• Who can update/extend?
• Does use for general sear...
The myth of a unified
vocabulary
• In large virtual collections there are always
multiple vocabularies
– In multiple langu...
Category alignment vs.
identity disambiguation
• Alignment concerns finding links
between (similar) categories, which
typi...
Alignment techniques
• Syntax: comparison of characters of the terms
– Measures of syntactic distance
– Language processin...
Alignment evaluation
Limitations of categorical
thinking
Be modest! Don’t recreate,
but enrich and align
• Knowledge engineers should refrain
from developing their own idiosyncrat...
Using knowledge: visualization
and search
Visualising piracy events
Extracting piracy events
from piracy reports & Web sources
Enriching description of
search results
Using alignment in search
“Tokugawa”
SVCN period
Edo
SVCN is local in-house
ethnology thesaurus
AAT style/period
Edo (Japa...
Sample graph search algorithm
From search term (literal) to art work
•Find resources with matching label
•Find path from r...
Graph search
Example of path clustering
Issues:
•number of clusters
•path length
Location-based search:
Moulin de la Galette
relatively easy
Relation search:
Picasso, Matisse & Braque
Acknowledgements
• Long list of people
• Projects: COMMIT, Agora, PrestoPrime,
EuropeanaConnect, Poseidon,
BiographyNet, M...
Knowledge engineering and the Web
Upcoming SlideShare
Loading in...5
×

Knowledge engineering and the Web

1,409

Published on

Keynote at Web Science Summer School, 25 July 2014, University of Southampton

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,409
On Slideshare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
7
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Knowledge engineering and the Web"

  1. 1. Knowledge engineering and the Web Guus Schreiber VU University Amsterdam Computer Science, Network Institute
  2. 2. Overview of this talk • Web data representation – a meta view • Knowledge for the Web: categories – key sources – Alignment • Using knowledge: visualization and search
  3. 3. My journey knowledge engineering • design patterns for problem solving • methodology for knowledge systems • models of domain knowledge • ontology engineering
  4. 4. My journey access to digital heritage
  5. 5. My journey Web standards Chair of •Web metadata: RDF 1.1 •OWL Web Ontology Language 1.0 •SKOS model for publishing vocabularies on the Web •Deployment & best practices
  6. 6. A few words about Web standardization • Key success factor! • Consensus process actually works – Some of the time at least • Public review – Taking every comment seriously • The danger of over-designing – Principle of minimal commitment
  7. 7. Example: W3C RDF 1.1 group • 8K group messages (publicly visible) • 2K messages about external comments • 125+ teleconferences • 200 issues resolved
  8. 8. Web data representation
  9. 9. Caution • Representation languages are there for you • And not the other way around ….
  10. 10. HTML5: a leap forward Rationale •Consistent separation of content and presentation •Semantics of the structure of information Typical new elements <article> <section> <aside> <header> <footer>
  11. 11. RDF: triples and graphs RDF is simply labeling resources and links
  12. 12. RDF: multiple graphs www.example.org/bob
  13. 13. Data modeling on the Web RDF •Class hierarchy •Property hierarchy •Domain and range restrictions •Data types • OWL • Property characteristics – E.g., inverse, functional, transitive, ….. • Identify management – E.g., same as, equivalent class • …….. I prefer a pick-and-choose approach
  14. 14. Writing in an ontology language does not make it an ontology! • Ontology is vehicle for sharing • Papers about your own idiosyncratic “university ontology” should be rejected at conferences • The quality of an ontology does not depend on the number of, for example, OWL constructs used
  15. 15. Rationale •A vocabulary represents distilled knowledge of a community •Typically product of a consensus process over longer period of time Use •200+ vocabularies published •E.g.: Library of Congress Subject Headings •Mainly in library field SKOS: making existing vocabularies Web accessible
  16. 16. The strength of SKOS lies its simplicity Baker et al: Key choices in the design of SKOS
  17. 17. Beware of ontological over- commitment • We have the understandable tendency to use semantic modeling constructs whenever we can • Better is to limit any Web model to the absolute minimum
  18. 18. Knowledge on the web: categories
  19. 19. The concept triad Source: http://www.jamesodell.com/Ontology_White_Paper_2011-07-15.pdf.
  20. 20. Categorization • OWL (Description logic) takes an extensional view of classes – A set is completely defined by its members • This puts the emphasis on specifying class boundaries • Work of Rosch et al. takes a different view 20
  21. 21. Categories (Rosch) • Help us to organize the world • Tools for perception • Basic-level categories – Are the prime categories used by people – Have the highest number of common and distinctive attributes – What those basic-level categories are may depend on context 21
  22. 22. Basic-level categories 22
  23. 23. 23 FOAF: Friend of a Friend
  24. 24. 24 Dublin Core: metadata of Web resources
  25. 25. Iconclass categorizing image scene
  26. 26. schema.org categories for TV programs
  27. 27. schema.org the notion of “Role”
  28. 28. schema.org issues • Top-down versus bottom-up • Ownership and control • Who can update/extend? • Does use for general search bias the vocabulary?
  29. 29. The myth of a unified vocabulary • In large virtual collections there are always multiple vocabularies – In multiple languages • Every vocabulary has its own perspective – You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links – “Vocabulary alignment”
  30. 30. Category alignment vs. identity disambiguation • Alignment concerns finding links between (similar) categories, which typically have no identity in the real world • Identity disambiguation is finding out whether two or more IDs point to the same object in the real world (e.g., person, building, ship) • The distinction is more subtle that “class versus instance”
  31. 31. Alignment techniques • Syntax: comparison of characters of the terms – Measures of syntactic distance – Language processing • E.g. Tokenization, single/plural, • Relate to lexical resource – Relate terms to place in WordNet hierarchy • Taxonomy comparison – Look for common parents/children in taxonomy • Instance based mapping – Two classes are similar if their instances are similar.
  32. 32. Alignment evaluation
  33. 33. Limitations of categorical thinking
  34. 34. Be modest! Don’t recreate, but enrich and align • Knowledge engineers should refrain from developing their own idiosyncratic ontologies • Instead, they should make the available rich vocabularies, thesauri and databases available in an interoperable (web) format • Techniques: learning, alignment
  35. 35. Using knowledge: visualization and search
  36. 36. Visualising piracy events
  37. 37. Extracting piracy events from piracy reports & Web sources
  38. 38. Enriching description of search results
  39. 39. Using alignment in search “Tokugawa” SVCN period Edo SVCN is local in-house ethnology thesaurus AAT style/period Edo (Japanese period) Tokugawa AAT is Getty’s Art & Architecture Thesaurus
  40. 40. Sample graph search algorithm From search term (literal) to art work •Find resources with matching label •Find path from resource to art work – Cost of each step (step when above cost threshold) – Special treatment of semantics: sameAs, inverseOf, … •Cluster results based on path similarities
  41. 41. Graph search
  42. 42. Example of path clustering Issues: •number of clusters •path length
  43. 43. Location-based search: Moulin de la Galette relatively easy
  44. 44. Relation search: Picasso, Matisse & Braque
  45. 45. Acknowledgements • Long list of people • Projects: COMMIT, Agora, PrestoPrime, EuropeanaConnect, Poseidon, BiographyNet, Multimedian E-Culture
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×