Frank van Harmelen
All the questions we couldn’t ask 10 years ago
Creative Commons License: you may share and remix, but you must attribute the author and may not use it commercially.
The bad news: you’re going to get 3 talks
1. Where are we now?
– The Semantic Web in 4 principles & a movie
– Did we get anywhere?
2. Now what?
– Questions we couldn’t ask 10 years ago
3. Methodological hobby horse
– Science or engineering?
Semantic Web: what is it?
[Figure: web pages annotated with what they are about: a web page in English about Frank; a page about LarKC; another web page about Frank; a page about Stefano; a page about the Vrije Universiteit]
“The Semantic Web” a.k.a. “The Web of Data”
http://www.youtube.com/watch?v=tBSdYi4EY3s
P1. Give all things a name
P2. Relations form a graph between things
P3. The names are addresses on the Web
[Figure: the triple [<x> IsOfType <T>]; example name <analgesic>; names with different owners & locations]
P1+P2+P3 = Giant Global Graph
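A minimal sketch of P1 to P3, assuming triples are plain Python tuples of URI strings. The example.org URIs and the worksAt/worksOn relations are hypothetical, not part of any published vocabulary. Two datasets with different owners become one graph by simple set union, because a shared name is a shared address.

```python
from collections import defaultdict

# Hypothetical URIs: every thing has a global name (P1) that is also a web address (P3).
FRANK = "http://example.org/people/Frank"
VU = "http://example.org/orgs/VrijeUniversiteit"
LARKC = "http://example.org/projects/LarKC"
WORKS_AT = "http://example.org/vocab/worksAt"
WORKS_ON = "http://example.org/vocab/worksOn"

# Two datasets with different owners and locations,
# each a set of (subject, predicate, object) triples (P2).
dataset_a = {(FRANK, WORKS_AT, VU)}
dataset_b = {(FRANK, WORKS_ON, LARKC)}

def merge(*datasets):
    """Merging is plain set union: identical names denote the same node."""
    merged = set()
    for triples in datasets:
        merged |= triples
    return merged

def adjacency(triples):
    """Graph view of the triples: subject -> set of (predicate, object) edges."""
    graph = defaultdict(set)
    for s, p, o in triples:
        graph[s].add((p, o))
    return graph

graph = adjacency(merge(dataset_a, dataset_b))
print(sorted(graph[FRANK]))
```

Frank ends up with two outgoing edges, one contributed by each dataset; no mapping step is needed because both owners used the same name for him.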
P4. explicit & formal semantics
• assign types to things
• assign types to relations
• organise types in a hierarchy
• impose constraints on possible interpretations
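To make "organise types in a hierarchy" concrete, here is a sketch of RDFS-style subclass reasoning: compute the transitive closure of the subclass links, then give each thing every superclass of its asserted type. The class names and the tuple encoding are hypothetical; a real system would use rdfs:subClassOf and rdf:type in a triple store. Constraints on interpretations are not covered here.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def infer_types(types, hierarchy):
    """Every thing of type C is also of every (direct or indirect) superclass of C."""
    closed = transitive_closure(hierarchy)
    inferred = set(types)
    for thing, cls in types:
        inferred |= {(thing, sup) for (sub, sup) in closed if sub == cls}
    return inferred

# Hypothetical hierarchy: (subclass, superclass) pairs.
subclass_of = {
    ("Analgesic", "Drug"),
    ("Drug", "ChemicalSubstance"),
}
# Hypothetical assertion: x IsOfType Analgesic.
is_of_type = {("x", "Analgesic")}

print(sorted(infer_types(is_of_type, subclass_of)))
```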
Examples of “semantics”
[Figure: Frank married-to Lynda; Hazel; annotations “lowerbound” and “upperbound”]
• Frank is male
• married-to relates males to females
• married-to relates 1 male to 1 female
• Lynda = Hazel
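The equality Lynda = Hazel only follows once married-to relates 1 male to at most 1 female. A minimal sketch of that inference, assuming a second edge Frank married-to Hazel (which the conclusion requires) and treating married-to like an OWL functional property; the tuple encoding is illustrative only.

```python
from itertools import combinations

# The second married-to edge is an assumption needed for "Lynda = Hazel".
triples = {
    ("Frank", "married-to", "Lynda"),
    ("Frank", "married-to", "Hazel"),
}

# Constraint: married-to relates 1 male to at most 1 female,
# i.e. it behaves like a functional property.
FUNCTIONAL = {"married-to"}

def infer_same_as(triples, functional):
    """If p is functional and (s, p, o1), (s, p, o2) both hold, conclude o1 = o2."""
    targets = {}
    for s, p, o in triples:
        if p in functional:
            targets.setdefault((s, p), set()).add(o)
    same_as = set()
    for objects in targets.values():
        for a, b in combinations(sorted(objects), 2):
            same_as.add((a, b))
    return same_as

print(infer_same_as(triples, FUNCTIONAL))
```

Without the constraint, the same two triples would just mean that Frank has two spouses; it is the semantics that licenses the equality.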
Semantic Web: where are we now?
Did we get anywhere?
• Google = meaningful search
• NXP = data integration
• BBC = content re-use
• Walmart = SEO (RDFa)
• data.gov = data-publishing
NXP: data integration
[Figure: about 26,000 products; triple stores; departments; customers]
Notice the 3-layer architecture
BBC
Notice the 3-layer architecture
Did we get anywhere?
• Google = meaningful search
• NXP = data integration
• BBC = content re-use
• Best Buy = SEO (RDFa)
• data.gov = data-publishing
Oracle DB, IBM DB2
Reuters, New York Times, Guardian
Sears, Kmart, Overstock, Volkswagen, Renault
GoodRelations ontology, schema.org
Size Matters: 25-45 billion facts
The questions that we couldn’t ask 10 years ago
• Heterogeneity
• Self-organisation, long tails
• Distribution
• Provenance & trust
• Dynamics
• Errors & Noise
• Scale
Heterogeneity is unavoidable
• Linguistic
• Structural
• Logical
• Statistical
• ...
Socio-economic: first to market, market share
Self-organisation
[Figure: bio-medical ontologies in BioPortal with > 5 links]
Self-organisation
Knowledge follows a long tail
• Incidental or universal?
• Impact on mapping?
• Impact on reasoning?
• Impact on storage?
Distribution
• Caching?
• Subgraphs?
• Payload priority?
• Query planning?
Provenance
• Representation?
• From provenance to trust?
• (Re)construction?
• Knowledge about knowledge?
Dynamics
• Streams?
• Incremental reasoning?
• Non-monotonicity?
• Versioning?
Errors & noise
• Maximally consistent subsets?
• Fuzzy semantics?
• Uncertainty semantics?
• Rough semantics?
• Modules?
• Repair?
• Argumentation?
Methodological hobby horse
Laws about the physical universe
Laws about the information universe?
Knowledge follows a long tail
Law: $F = a \cdot r^{-b}$ (frequency $F$ of the item at rank $r$)
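One way to test whether a dataset obeys a long-tail law of the form F = a·r^(-b) (frequency F of the item at rank r) is to fit a and b by least squares in log-log space, since log F = log a - b·log r. A sketch, assuming the input is a list of frequencies already sorted by rank; the data below is synthetic, generated from an exact power law purely to check the fit, and is not taken from any real dataset. The function names are hypothetical.

```python
import math

def fit_power_law(frequencies):
    """Fit F = a * r**(-b) to frequencies sorted by rank (rank 1 = most frequent).

    Ordinary least squares on log F = log a - b * log r.
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def head_share(frequencies, fraction=0.2):
    """Share of all occurrences accounted for by the top `fraction` of items."""
    k = max(1, int(len(frequencies) * fraction))
    return sum(frequencies[:k]) / sum(frequencies)

# Synthetic data following an exact power law with a = 1000, b = 1.2.
freqs = [1000 * r ** -1.2 for r in range(1, 1001)]
a, b = fit_power_law(freqs)
print(round(a, 3), round(b, 3))
print(round(head_share(freqs), 2))
```

The second number illustrates the "vast majority of occurrences from a minority of items" effect: on this synthetic data the top 20% of items account for most of the occurrences.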
Law: $|T| \ll |A|$
T = terminological knowledge
A = assertional knowledge
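Dataset          | Closure of T | Closure of T + A | Ratio
LUBM             | 8 sec        | 1 h 15 min       | 562
Linked Life Data | 332 sec      | 1 h 05 min       | 11
FactForge        | 89 sec       | 2 h 45 min       | 111
(Ratio = time for the closure of T + A divided by time for the closure of T, rounded down)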
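The Ratio column is the time needed to compute the closure of T + A divided by the time for the closure of T alone. A short check, converting the table's timings to seconds; the numbers are copied from the table, and the helper name is hypothetical. Rounding down reproduces the printed ratios.

```python
# Timings from the table, in seconds: (closure of T, closure of T + A).
timings = {
    "LUBM": (8, 1 * 3600 + 15 * 60),
    "Linked Life Data": (332, 1 * 3600 + 5 * 60),
    "FactForge": (89, 2 * 3600 + 45 * 60),
}

def closure_ratios(timings):
    """Integer division, i.e. the ratio rounded down."""
    return {name: t_ta // t_t for name, (t_t, t_ta) in timings.items()}

for dataset, ratio in closure_ratios(timings).items():
    print(f"{dataset}: {ratio}")
# prints:
# LUBM: 562
# Linked Life Data: 11
# FactForge: 111
```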
We don’t have any good laws on complexity
Semantic Web questions we couldn't ask 10 years ago

Editor's Notes

  • #14 TODO: do a slide on data integration at NXP. TODO: find a slide on RDFa at Walmart etc.
  • #22 TODO: do a slide on data integration at NXP. TODO: find a slide on RDFa at Walmart etc. TODO: replace company names with logos?
  • #25 TODO: add trust. TODO: add noisy data (inconsistent, misleading, incomplete).
  • #27 Suggests letting a thousand ontologies blossom, with lots of connections between lots of datasets.
  • #33 Some known information laws already apply: Zipf’s law / long-tail distributions are everywhere (= the vast majority of occurrences are caused by a small minority of items). This phenomenon is sometimes a blessing (nice for compression), sometimes a curse (awful for load balancing), and knowing the law helps us deal with it. That’s why it’s worth trying to discover these laws.
  • #34 TODO: add another long-tail example (e.g. in-degree?)
  • #35 Physical distribution doesn’t work: the web is not a database (and never will be). TODO: add that it is even worse for the long tail.
  • #41 Compare to physics laws: gravity ($F = G m_1 m_2 / r^2$), conservation of energy ($dE/dt = 0$), increase of entropy ($dS/dt \geq 0$). We cannot yet hope for such beautifully mathematised laws, in such a concise language that fits in a very compact space: computer science is like alchemy, a "protoscience".
  • #43 This only works because terminologies are in general only simple hierarchies (it’s easy to build examples where this doesn’t hold, but in practice it turns out to hold). So this law depends on the previous law. As an aside: the graph is now big enough to do statistics on it.
  • #45 Use “complexity” as a measure, not just “size”. Spell out LLD; don’t break FactForge.
  • #46 Semantic Web = an engineering enterprise. This talk = what are the scientific observations/facts/theories after 10 years? What are the big CS (or: KR?) lessons we can learn from a decade of SemWeb? (= regard SemWeb adoption as a giant laboratory for CS laws.) Did we learn any science? (And will the laws be specific to SemWeb? Hopefully not. Hopefully they are generic laws about the structure and behaviour of information!)
  • #47 A gazillion new open questions. Don’t just try to build things, also try to understand things; don’t just ask how, also ask why.