Semantic Web questions we couldn't ask 10 years ago

Talk given at the SSSW 2013 Semantic Web Summer School.
Part 1: What is the "Semantic Web"? (in 4 principles and 1 movie)
Part 2: What questions can we ask now that we couldn't ask 10 years ago?
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4 )

Published in: Education
  • @TODO@: do a slide on data integration at NXP. @TODO@: find a slide on RDFa at Walmart etc. @TODO@: replace company names with logos?
  • @@Add: trust@@ @@Add: noisy data (inconsistent, misleading, incomplete)@@
  • Suggests letting a thousand ontologies blossom, with lots of connections between lots of datasets.
  • Some known information laws already apply: Zipf's law / long-tail distributions are everywhere (= the vast majority of occurrences are caused by a vast minority of items). This phenomenon is sometimes a blessing, sometimes a curse: nice for compression, awful for load balancing. Knowing the law helps us deal with the phenomenon; that's why it's worth trying to discover these laws.
  • @add another long-tail example@ (e.g. in-degree?)
  • Physical distribution doesn't work: the web is not a database (and never will be). @@ADD: even worse for the long tail
  • Compare to physics laws: gravity ($F = G m_1 m_2 / r^2$), conservation of energy ($dE/dt = 0$), increase of entropy ($dS/dt \geq 0$). We cannot yet hope for such beautifully mathematised laws, in such a concise language that fits in a very compact space; computer science is like alchemy, a "protoscience".
  • This only works because terminologies are in general simple hierarchies (it's easy to build examples where this doesn't hold, but in practice it turns out to hold). So this law depends on the previous law. As an aside: the graph is now big enough to do statistics on it.
  • Use “complexity” as a measure, not just “size”. Spell out LLD. Don't break FactForge.
  • Semantic Web = engineering enterprise. This talk = what are the scientific observations/facts/theories after 10 years? What are the big CS (or: KR?) lessons we can learn from a decade of SemWeb? (= regard SemWeb adoption as a giant laboratory for CS laws.) Did we learn any science? (And of course the laws won't be specific to SemWeb? Hopefully not. Hopefully they are generic laws about the structure and behaviour of information!)
  • A gazillion new open questions: don't just try to build things, also try to understand things; don't just ask how, also ask why.
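
The long-tail note (a vast minority of items accounts for the vast majority of occurrences) can be made concrete with a small sketch. It assumes a plain list of item occurrences (for example, the object of every triple, or every in-degree count), estimates the exponent alpha in F ∝ r^(-alpha) with a least-squares line through (log rank, log frequency), and reports what share of all occurrences the top 20% of distinct items cover. The function names and the synthetic data are hypothetical, chosen only for illustration; on this synthetic data the exponent comes out close to 1 and the top 20% of items cover roughly two thirds of all occurrences.

```python
import math
from collections import Counter


def rank_frequencies(items):
    """Occurrence counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(items).values(), reverse=True)


def zipf_exponent(items):
    """Estimate alpha in F ~ r^(-alpha) by a least-squares line through (log r, log F).

    A crude estimator, used here only for illustration.
    """
    freqs = rank_frequencies(items)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct items")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # slope of log F against log r is -alpha


def head_share(items, top_fraction=0.2):
    """Share of all occurrences covered by the top `top_fraction` of distinct items."""
    freqs = rank_frequencies(items)
    k = max(1, int(len(freqs) * top_fraction))
    return sum(freqs[:k]) / sum(freqs)


if __name__ == "__main__":
    # Synthetic long-tail data (not a measurement): item i occurs about 1000 / i times.
    data = [f"item{i}" for i in range(1, 51) for _ in range(round(1000 / i))]
    print(round(zipf_exponent(data), 2), round(head_share(data), 2))
```

For real data, maximum-likelihood fitting is generally preferred over a log-log regression, which is known to be biased; the regression is used here only because it keeps the sketch short.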

    1. Frank van Harmelen: All the questions we couldn’t ask 10 years ago. Creative Commons License: allowed to share & remix, but must attribute & non-commercial.
    2. The bad news: you’re going to get 3 talks. (1) Where are we now? The Semantic Web in 4 principles & a movie; did we get anywhere? (2) Now what? Questions we couldn’t ask 10 years ago. (3) Methodological hobby horse: science or engineering?
    3. Semantic Web: What is it?
    4. “The Semantic Web”, a.k.a. “The Web of Data”
    5. http://www.youtube.com/watch?v=tBSdYi4EY3s
    6. P1. Give all things a name
    7. P2. Relations form a graph between things
    8. P3. The names are addresses on the Web: [<x> IsOfType <T>], where x and T may have different owners & locations (e.g. <analgesic>)
    9. P1 + P2 + P3 = Giant Global Graph
    10. P4. Explicit & formal semantics: • assign types to things • assign types to relations • organise types in a hierarchy • impose constraints on possible interpretations
    11. Examples of “semantics” for married-to: • is male • married-to relates males to females • married-to relates 1 male to 1 female (lower bound = upper bound)
    12. Semantic Web: Where are we now?
    13. Did we get anywhere? • Google = meaningful search • NXP = data integration • BBC = content re-use • Walmart = SEO (RDFa) • data.gov = data publishing
    14. NXP: data integration about 26,000 products (figure: departments, triple stores, customers). Notice the 3-layer architecture.
    15. BBC: notice the 3-layer architecture.
    16. Did we get anywhere? • Google = meaningful search • NXP = data integration • BBC = content re-use • BestBuy = SEO (RDFa) • data.gov = data publishing. Also on the slide: Oracle DB, IBM DB2; Reuters, New York Times, Guardian; Sears, Kmart, OverStock, Volkswagen, Renault; GoodRelations ontology, schema.org
    17. Size matters: 25-45 billion facts
    18. The questions that we couldn’t ask 10 years ago
    19. • Heterogeneity • Self-organisation, long tails • Distribution • Provenance & trust • Dynamics • Errors & noise • Scale
    20. Heterogeneity is unavoidable: • linguistic • structural • logical • statistical • ... Socio-economic factors: first to market, market share
    21. Self-organisation
    22. Self-organisation
    23. Self-organisation
    24. Self-organisation
    25. Self-organisation
    26. Self-organisation: bio-medical ontologies in BioPortal with > 5 links
    27. Knowledge follows a long tail
    28. Incidental or universal? Impact on mapping? Impact on reasoning? Impact on storage?
    29. Distribution: caching? Subgraphs? Payload priority? Query planning?
    30. Provenance: representation? From provenance to trust? (Re)construction? Knowledge about knowledge?
    31. Dynamics: streams? Incremental reasoning? Non-monotonicity? Versioning?
    32. Errors & noise: maximally consistent subsets? Fuzzy semantics? Uncertainty semantics? Rough semantics? Modules? Repair? Argumentation?
    33. All questions together: maximally consistent subsets? Modules? Repair? Argumentation? Fuzzy semantics? Uncertainty semantics? Rough semantics? Streams? Incremental reasoning? Non-monotonicity? Versioning? Representation? From provenance to trust? (Re)construction? Knowledge about knowledge? Caching? Subgraphs? Payload priority? Incidental or universal? Impact on mapping? Impact on reasoning? Impact on storage? Socio-economic factors: first to market, market share
    34. Methodological hobby horse
    35. Laws about the physical universe; laws about the information universe?
    36. Knowledge follows a long tail. Law: $F \propto r^{-\alpha}$ (frequency F falls off as a power of rank r)
    37. Law: $|T| \ll |A|$, where T = terminological knowledge and A = assertional knowledge
    38. We don’t have any good laws on complexity:

        Dataset          | Closure of T | Closure of T + A | Ratio
        LUBM             | 8 s          | 1 h 15 min       | 562
        Linked Life Data | 332 s        | 1 h 05 min       | 11
        FactForge        | 89 s         | 2 h 45 min       | 111
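
As a quick check on slide 38, the Ratio column is the closure time of T + A divided by the closure time of T alone. Converting both columns to seconds gives 4500/8 = 562.5, 3900/332 ≈ 11.7 and 9900/89 ≈ 111.2, which the slide appears to report truncated to whole numbers. The helper and constant names below are illustrative only.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/min/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (closure of T, closure of T + A) per dataset, as given on slide 38.
CLOSURE_TIMES = {
    "LUBM": (to_seconds(seconds=8), to_seconds(hours=1, minutes=15)),
    "Linked Life Data": (to_seconds(seconds=332), to_seconds(hours=1, minutes=5)),
    "FactForge": (to_seconds(seconds=89), to_seconds(hours=2, minutes=45)),
}


def closure_ratios(times):
    """Ratio of the T + A closure time to the closure time of T alone."""
    return {name: t_plus_a / t_only for name, (t_only, t_plus_a) in times.items()}


if __name__ == "__main__":
    for name, ratio in closure_ratios(CLOSURE_TIMES).items():
        print(f"{name}: {ratio:.1f}")
```

The spread of the ratios, from about 11 to over 500, is the point of the slide: dataset size alone does not predict how much more expensive reasoning over A becomes.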

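
Principles P1 to P4 (slides 6 to 11) and the T versus T + A distinction (slides 37 and 38) can be illustrated together in a toy sketch. Things get names (URIs under a hypothetical example.org namespace), relations are (subject, predicate, object) triples that form a graph, and four RDFS entailment rules (rdfs2, rdfs3, rdfs9, rdfs11 from the W3C RDF Semantics specification) give married-to its "relates males to females" meaning. This is a naive fixpoint loop, not a real reasoner; the classes, individuals and the `closure` function are made up for illustration, and the "1 male to 1 female" cardinality constraint is not modelled.

```python
from itertools import product

# Hypothetical names (P1): every thing gets a URI under a made-up namespace (P3).
EX = "http://example.org/"
# Standard RDF/RDFS vocabulary, abbreviated as prefixed names for readability.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

# Terminological knowledge T (P4): a small type hierarchy plus the
# "married-to relates males to females" constraint.
T = {
    (EX + "Male", SUBCLASS, EX + "Person"),
    (EX + "Female", SUBCLASS, EX + "Person"),
    (EX + "Person", SUBCLASS, EX + "Agent"),
    (EX + "marriedTo", DOMAIN, EX + "Male"),
    (EX + "marriedTo", RANGE, EX + "Female"),
}

# Assertional knowledge A (P2): one edge in the graph between two things.
A = {(EX + "bob", EX + "marriedTo", EX + "alice")}


def closure(triples):
    """Naive forward chaining to a fixpoint using four RDFS entailment rules."""
    facts = set(triples)
    while True:
        new = set()
        for (s1, p1, o1), (s2, p2, o2) in product(facts, repeat=2):
            if p1 == SUBCLASS and p2 == SUBCLASS and o1 == s2:  # rdfs11: transitivity
                new.add((s1, SUBCLASS, o2))
            if p1 == TYPE and p2 == SUBCLASS and o1 == s2:  # rdfs9: inherit superclasses
                new.add((s1, TYPE, o2))
            if p2 == DOMAIN and p1 == s2:  # rdfs2: subject gets the domain type
                new.add((s1, TYPE, o2))
            if p2 == RANGE and p1 == s2:  # rdfs3: object gets the range type
                new.add((o1, TYPE, o2))
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


if __name__ == "__main__":
    print(len(closure(T)), len(closure(T | A)))
```

In this toy, the closure of T alone has 7 triples and the closure of T + A has 14, because the single married-to edge types bob and alice all the way up the hierarchy. Keeping T a simple hierarchy is what keeps its own closure cheap.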