This keynote at the Cooperative Intelligent Agents Workshop was a good opportunity to give my view on the current state of Semantic Web research: what is it about, what is it not about, what has been achieved, what remains to be done. (Includes the now infamous slide "What's it like to be a machine")
Semantic Web research anno 2006: main streams, popular fallacies, current status, future challenges
1. Semantic Web research
anno 2006:
main streams, popular fallacies,
current status, future challenges
Frank van Harmelen
Vrije Universiteit Amsterdam
2. This is NOT
a Semantic Web
evangelization talk
(I assume
you are already
converted)
3. This is a “topical” talk:
Webster:
“referring to the topics of the day,
of temporary interest”
4. Semantic Web research anno 2006:
main streams, popular fallacies,
current status, future challenges
Which Semantic Web
are we talking about?
5. General idea of Semantic Web
Make current web more machine accessible
(currently all the intelligence is in the user)
Motivating use-cases
Search engines
• concepts, not keywords
• semantic narrowing/widening of queries
Shopbots
• semantic interchange, not screenscraping
E-commerce
• Negotiation, catalogue mapping, data-integration
Web Services
• Need semantic characterisations to find them
Navigation
• by semantic proximity, not hardwired links
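The "semantic narrowing/widening of queries" use-case above can be illustrated with a minimal sketch. It assumes a toy concept hierarchy; the `BROADER` table and the `widen` and `narrow` helpers are hypothetical names made up for illustration, not part of any Semantic Web standard or tool.

```python
# Hypothetical toy hierarchy: each concept maps to its broader concept.
BROADER = {
    "aspirin": "painkiller",
    "ibuprofen": "painkiller",
    "painkiller": "drug",
    "antibiotic": "drug",
}

def widen(concept):
    """Return the query concept plus its direct broader concept, if any."""
    parent = BROADER.get(concept)
    return [concept] if parent is None else [concept, parent]

def narrow(concept):
    """Return the query concept plus its direct narrower concepts."""
    children = sorted(c for c, p in BROADER.items() if p == concept)
    return [concept] + children

print(widen("aspirin"))      # query moves one step up the hierarchy
print(narrow("painkiller"))  # query moves one step down the hierarchy
```

A keyword engine would treat "aspirin" and "painkiller" as unrelated strings; with even this small hierarchy a query can be relaxed or sharpened along concept links.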
6. General idea of Semantic Web(2)
Do this by:
Making data and meta-data
available on the Web
in machine-understandable form
(formalised)
Structure the data and meta-data in ontologies
These are non-trivial design decisions.
Alternative would be:
7. “machine-understandable form”
(What it’s like to be a machine)
alleviates
META-DATA
<treatment>
<name>
<symptoms>
IS-A <disease>
<drug>
<drug administration>
9. Which Semantic Web?
Version 1:
"Semantic Web as Web of Data" (TBL)
recipe:
expose databases on the web,
use RDF, integrate
meta-data from:
• expressing DB schema semantics
in machine interpretable ways
enable integration and unexpected re-use
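A minimal sketch of the "expose, use RDF, integrate" recipe, written in plain Python rather than with a real RDF library. It assumes both databases already use the same identifier for the same thing; the `ex:` names, the two sample tables, and the `rows_to_triples` and `describe` helpers are all hypothetical.

```python
def rows_to_triples(key, rows):
    """Turn each row into (subject, predicate, object) triples.

    The value in the key column becomes the subject; every other
    column becomes a predicate pointing at that column's value.
    """
    triples = set()
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.add((subject, column, value))
    return triples

def describe(graph, subject):
    """List everything the merged graph says about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

# Two independent (hypothetical) databases that share an identifier.
pharmacy_db = [{"id": "ex:aspirin", "ex:treats": "ex:headache"}]
catalogue_db = [{"id": "ex:aspirin", "ex:price": "2.50"}]

# Integration is a set union once both sources are expressed as triples.
graph = rows_to_triples("id", pharmacy_db) | rows_to_triples("id", catalogue_db)
print(describe(graph, "ex:aspirin"))
```

The union step is trivial only because the identifiers already agree; when they do not, the mapping problem discussed later in the talk applies.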
10. Which Semantic Web?
Version 2:
“Enrichment of the current Web”
recipe:
Annotate, classify, index
meta-data from:
• automatically producing markup:
named-entity recognition,
concept extraction, tagging, etc.
enable personalisation, search, browse,..
11. Which Semantic Web?
Version 1:
“Semantic Web as Web of Data”
Version 2:
“Enrichment of the current Web”
Different use-cases
Different techniques
Different users
12. Semantic Web research anno 2006:
main streams, popular fallacies,
current status, future challenges
Four popular fallacies
about the Semantic Web
13. First: clear up some popular
misunderstandings
False statement No :
“Semantic Web people try to
enforce meaning from the top”
They only “enforce” a language.
They don’t enforce what is said in that language
Compare: HTML “enforced” from the top,
But content is entirely free.
14. First: clear up some popular
misunderstandings
False statement No :
“The Semantic Web people will require
everybody to subscribe to a single predefined
"meaning" for the terms we use.”
Of course, meaning is fluid, contextual, etc.
Lots of work on (semi-)automatically
bridging between different vocabularies.
15. First: clear up some popular
misunderstandings
False statement No :
“The Semantic Web will require users to
understand the complicated details of
formalised knowledge representation.”
All of this is “under the hood”.
16. First: clear up some popular
misunderstandings
False statement No :
“The Semantic Web people will require us to
manually markup all the existing web-pages.”
Lots of work on automatically producing
semantic markup:
named-entity recognition,
concept extraction, etc.
17. Semantic Web research anno 2006:
main streams, popular fallacies,
current status, future challenges
The current state of
Semantic Web
18. 4 hard questions on the
Semantic Web:
Q1: "where does the meta-data come from?”
NL technology is delivering on concept-extraction
Socially emerging (learning from tagging).
Q2: “where do the meta-data schemas
come from?”
many handcrafted schemas
hierarchy learning remains hard
relation extraction remains hard.
Q3: “what to do with many meta-data schemas?”
ontology mapping/aligning remains VERY hard.
Q4: “where’s the ‘Web’ in the Semantic Web?”
more attention to social aspects (P2P, FOAF)
non-textual media remains hard
deal with typical Web requirements.
19. Q1: Where do the ontologies
come from?
Professional bodies, scientific communities,
companies, publishers, ….
Good old fashioned Knowledge Engineering
Convert from DB-schema, UML, etc.
Learning remains very hard…
20. Q1: Where do the ontologies
come from?
handcrafted
• music: CDnow (2410/5), MusicMoz (1073/7)
community efforts
• biomedical: SNOMED (200k), GO (15k),
commercial: Emtree (45k+190k)
ranging from lightweight (Yahoo)
to heavyweight (Cyc)
ranging from small (METAR)
to large (UNSPSC)
21. Q2: Where do the annotations
come from?
- Automated learning
- shallow natural language analysis
- Concept extraction
Example: Encyclopedia Britannica on “Amsterdam”
Extracted concepts: trade, antwerp, europe, amsterdam, netherlands, merchant, center, city, town
22. Q2: Where do the annotations
come from?
lightweight NLP
• Dutch language semantic search engine
exploit existing legacy-data
• Amazon
• Lab equipment
side-effect from user interaction
• MIT Lab photo-annotator
NOT from manual effort
23. Q3: What to do with many
ontologies?
MeSH
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptions
EMTREE
• Commercial Elsevier, Drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCI Cancer Ontology:
• 17,000 classes (about 1M definitions)
24. Q3: What to do with many
ontologies?
Stitching all this together by hand?
25. Q3: What to do with many
ontologies?
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge
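Of the matching approaches listed above, instance-based matching is the easiest to sketch: two concepts from different ontologies are candidates for a match when they annotate largely the same items. The example below is an illustration under assumptions, not a method taken from the talk; the Jaccard measure, the 0.5 threshold, and the sample concept and document names are all hypothetical choices.

```python
def jaccard(a, b):
    """Overlap of two instance sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_concepts(onto_a, onto_b, threshold=0.5):
    """Pair concepts whose shared instances exceed the threshold.

    Each ontology is a dict mapping a concept name to the set of
    items annotated with that concept. Best matches come first.
    """
    matches = []
    for name_a, inst_a in onto_a.items():
        for name_b, inst_b in onto_b.items():
            score = jaccard(inst_a, inst_b)
            if score >= threshold:
                matches.append((name_a, name_b, score))
    return sorted(matches, key=lambda m: -m[2])

# Hypothetical vocabularies annotating an overlapping document set.
onto_a = {"Analgesic": {"doc1", "doc2", "doc3"},
          "Antibiotic": {"doc4", "doc5"}}
onto_b = {"Painkiller": {"doc1", "doc2", "doc3", "doc6"},
          "Antibacterial": {"doc4", "doc5"}}

for name_a, name_b, score in match_concepts(onto_a, onto_b):
    print(name_a, "~", name_b, round(score, 2))
```

This only works when both vocabularies have been applied to a shared set of instances, which is one reason the talk calls mapping VERY hard in general.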
26. Where are we now: tools
Languages are stable
Tooling is rapidly emerging
• HP, IBM, Oracle, Adobe, …
• Parsers,
• Editors,
• visualisers,
• large scale storage and querying
• Portal generation, search
27. Where are we now: applications
healthy uptake in some areas:
knowledge management / intranets
data-integration
life-sciences
convergence with Semantic Grid
cultural heritage
still very few applications in
personalisation
mobility/context awareness
Most applications for companies,
few applications for the public
28. Semantic Web research anno 2006:
main streams, popular fallacies,
current status, future challenges
Future
directions/challenges
29. Semantic Web as an integrator
of many different subfields
Databases
Natural Language Processing
Knowledge Representation
Machine Learning
Information Retrieval
Agents
HCI
….
30. Provocation…
Ontology research is done……
• We know how to
make, maintain & deploy them
• We have tools & methods for
editing, storing, inferencing, visualising, etc
… except for two problems:
• Learning
• Mapping
Natural lang. technology is also done…
• at least it’s good enough
31. Large open questions
Ontology learning & mapping
emerging semantics (social & statistical)
Semantic Web services
• discovery, composition: realistic?
non-textual media
• the semantic gap: text or social?
Deployment:
1. data-integration
2. search
3. personalisation
32. Changing focus
centralised,
formalised,
complete,
precise
distributed,
heterogeneous,
open, P2P,
approximate,
lightweight
Web 3.0 = Web 2.0 + Semantic Web
33. Slide by Carole Goble
Predicting the future…
[Figure: diagram with two axes, "Semantics" and "Web", each running from "Not much" to "Lots". Labels recoverable from the slide: Artificial Intelligence, Decision making, Knowledge Discovery, Ontology Building, OWL, SWRL, Semantic Web Services, Information linking, NLP, Flexible & extensible Metadata schemas, RDF, FOAF, RSS, Social bookmarking, Collective Intelligence.]