LarKC: the large knowledge collider

Brief presentation of vision and mission of the LarKC project, building the Large Knowledge Collider http://www.larkc.eu

Published in: Technology, Business

  1. 1. The Large Knowledge Collider. Frank van Harmelen, Vrije Universiteit Amsterdam. Creative Commons License: allowed to share & remix, but must attribute & non-commercial
  2. 2. • The vision • The project • The consortium • The plan Oh Yes! Shit…
  3. 3. The Vision “a configurable platform for infinitely scalable semantic web reasoning”
  4. 4. Why we need The Large Knowledge Collider Gartner (May 2007): “By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies” • Semantic Technologies at Web Scale? – 20% of 30 billion pages @ 1000 triples per page = 6 trillion triples – 30 billion and 1000 are underestimates, imagine 6 years from now… – data-integration and semantic search at web-scale? 27-June-07
  5. 5. 1 triple: Denny Vrandečić – AIFB, Universität Karlsruhe (TH) 5 http://www.aifb.uni-karlsruhe.de/WBS
  9. 9. Suez Canal: 10^7 triples [OWLIM]
  10. 10. Moon: 10^8 triples, RDF store with subsecond querying [Ingenta]
  11. 11. Earth: ~10^9 triples
  12. 12. Jupiter: ~10^10 triples ≈ 1 triple per web page [LarKC proposal]
  13. 13. ~10^11 triples
  14. 14. Distance Sun – Pluto: ~10^14 triples (Fensel / Harmelen estimate: 10^14 triples)
  15. 15. Infinitely scalable (1/2) • by giving up 100% correctness: • trading quality for size • often completeness is not needed • sometimes even correctness is not needed • “A logician’s nightmare” (Dieter Fensel) [Figure: logic, Semantic Web and IR placed on axes of precision (soundness) and recall (completeness)]
  16. 16. Infinitely scalable (2/2) • by parallelisation: • cluster computing • wide area distribution “Thinking@home”, “self-computing semantic Web” • cloud computing? (Amazon now, Google soon?)
  17. 17. “Configurable platform” “a configurable platform for infinitely scalable semantic web reasoning”
  18. 18. Why “LarKC” ? • The Large Knowledge Collider A configurable platform for experimentation by others
  19. 19. Why “LarKC” ? But also: and also: 1. a merry, carefree adventure. 2. innocent or good-natured mischief; a prank. 3. something extremely easy to accomplish
  20. 20. • The vision • The consortium • The project • The plan
  21. 21. The consortium 50 people present
  22. 22. The Consortium • Combining consortium competence – IR, Cognition – ML, Ontologies – Statistics, ML, Cognition, DB – Logic, DB, Probabilistic Inference – Economics, Decision Theory
  23. 23. The Consortium [Diagram] • Areas: Use Case 1, Use Case 2, Database Technology, RDF technology, Probabilistic Inference, Machine Learning, human problem solving, Information Retrieval, Distributed Computing, Logic, Semantic Web • Partners: WHO-IARC, CEFRIEL, Siemens, Ontotext, CycEur, Saltlux, USFD, HLRS, UIBK, MPG, WICI, VUA
  24. 24. • The vision • The consortium • The project • The plan Oh Shit…
  25. 25. The project • 10M€ budget • 3.5 years • 80 person years • 3 case studies • 14 partners • obtained in FP7 Call1: – overall < 10% funding rate – LarKC has highest funding, longest runtime
  26. 26. Project Workpackages & timeline • WP1: Conceptual Framework & Evaluation • WP2: Retrieval and Selection • WP3: Abstraction and Learning • WP4: Reasoning and Deciding • WP5: Collider Platform • WP6: Use case: Real Time City • WP7a: Use case: Early Clinical Development • WP7b: Use case: Carcinogenesis Reference Production • WP8: Training, dissemination, community building • WP9: Exploitation and standards • WP10: Project Management
  27. 27. Use case: Drug Discovery • Problem: pharmaceutical R&D in early clinical development is stagnating • FDA white paper Innovation or Stagnation (March 2004): “developers have no choice but to use the tools of the last century to assess this century's candidate solutions.” “industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs” • “Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.” (Q1∩Q2∩Q3) – Q1 (Genetics): “Show me all liver toxicity associated with the target or the pathway.” – Q2 (Chemistry): “Show me all liver toxicity associated with compounds with similar structure” – Q3 (Literature): “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population” • Current NCBI: linking but no inference
  28. 28. Use Case: City on-line • Our cities face many challenges • Urban Computing is the ICT way to address them • How can we redevelop existing neighborhoods and business districts to improve the quality of life? • How can we create more choices in housing, accommodating diverse lifestyles and all income levels? • How can we reduce traffic congestion yet stay connected? • How can we include citizens in planning their communities rather than limiting input to only those affected by the next project? • How can we fund schools, bridges, roads, and clean water while meeting short-term costs of increased security? • Is public transportation where the people are? • Which landmarks attract more people? • Where are people concentrating? • Where is traffic moving?
  29. 29. • The vision • The consortium • The project • The plan Oh Shit…
  30. 30. Project Timeline (months 0, 6, 10, 18, 33, 42) • Surveys (plugins, platform) • Requirements (use cases) • Prototype • Internal Release • Public Release • Final Release • Use Cases V1, V2, V3
  31. 31. Communication • Early Access Group • Usage Competition – “we will win if we start to lose” • We deliver: – software – publications – not “deliverables”
  32. 32. And Finally… • People are already looking at us: – “Damn... the EU is where all the cool semweb work is happening these days” – “This kind of infrastructure is exactly the kind of rocket fuel that is needed at this stage of semweb maturity.” – “The LarKC-inspired workshop on new forms of reasoning for the semantic web was a conference highlight for me” – “With the current growth rates of RDF on the Web, LarKC which started out as technologically possible will quickly become operationally necessary” – “this project really has (potentially) […] in terms of both science and impact” • projects already seeking collaboration: OKKAM, MUSING
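
The back-of-the-envelope estimate on slide 4 (20% of 30 billion pages at 1000 triples per page) can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are made up here, and the input figures are the slide's own rough assumptions, not measurements.

```python
# Illustrative check of the slide-4 estimate; all names are hypothetical.
PAGES_ON_WEB = 30_000_000_000   # 30 billion pages (figure from the slide)
ONTOLOGY_SHARE_PERCENT = 20     # Gartner: 20% use Semantic Web-based ontologies
TRIPLES_PER_PAGE = 1_000        # triples per annotated page (figure from the slide)


def estimated_triples(pages: int, share_percent: int, triples_per_page: int) -> int:
    """Return the estimated triple count, using integer arithmetic only."""
    annotated_pages = pages * share_percent // 100
    return annotated_pages * triples_per_page


total = estimated_triples(PAGES_ON_WEB, ONTOLOGY_SHARE_PERCENT, TRIPLES_PER_PAGE)
print(f"{total:,} triples")  # 6,000,000,000,000 triples, i.e. 6 trillion
```

Since the slide calls both 30 billion and 1000 underestimates, the result is best read as a lower bound under those assumptions.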
