LarKC: the large knowledge collider
1. the
Large Knowledge Collider
Frank van Harmelen
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
Vrije Universiteit Amsterdam
2. • The vision
• The project
• The consortium
• The plan
Oh
Yes!
Shit…
3. The Vision
“a configurable platform for
infinitely scalable semantic web reasoning”
4. Why we need
The Large Knowledge Collider
Gartner (May 2007):
“By 2012,
70% of public Web pages will have some level of semantic markup,
20% will use more extensive Semantic Web-based ontologies”
• Semantic Technologies at Web Scale?
– 20% of 30 billion pages @ 1000 triples per page =
6 trillion triples
– 30 billion and 1000 are underestimates,
imagine in 6 years from now…
– data-integration and semantic search at web-scale?
27-June-07
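The estimate on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the slide's own figures (30 billion pages, a 20% share, 1000 triples per page); the function and variable names are illustrative.

```python
# Back-of-envelope estimate using the figures quoted on the slide.
PAGES = 30_000_000_000        # public Web pages (the slide calls this an underestimate)
ONTOLOGY_SHARE_PERCENT = 20   # Gartner: share of pages using Semantic Web-based ontologies
TRIPLES_PER_PAGE = 1_000      # triples per page (also called an underestimate)


def estimate_triples(pages: int, share_percent: int, triples_per_page: int) -> int:
    """Return the estimated number of triples, using integer arithmetic only."""
    return pages * share_percent // 100 * triples_per_page


total = estimate_triples(PAGES, ONTOLOGY_SHARE_PERCENT, TRIPLES_PER_PAGE)
print(f"{total:,} triples")  # 6,000,000,000,000 triples
```

Since both inputs are described as underestimates, the result is best read as a lower bound.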
15. Infinitely scalable (1/2)
• by giving up 100% correctness:
• trading quality for size
• often completeness is not needed
• sometimes even correctness is not needed
[Figure: precision (soundness) plotted against recall (completeness), locating logic, IR and the Semantic Web. Caption: “A logician’s nightmare” (Dieter Fensel)]
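The trade-off between soundness and completeness can be made concrete with the standard precision and recall measures. The sketch below assumes answers are represented as plain sets; the answer sets themselves are hypothetical and only illustrate the two extremes named on the slide.

```python
def precision_recall(returned, correct):
    """Precision: fraction of returned answers that are correct (soundness).
    Recall: fraction of correct answers that were returned (completeness)."""
    returned, correct = set(returned), set(correct)
    hits = returned & correct
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(correct) if correct else 1.0
    return precision, recall


# Hypothetical answer sets, for illustration only.
correct_answers = {"a", "b", "c", "d"}
sound_but_incomplete = {"a", "b"}                 # stops early, never wrong
complete_but_unsound = {"a", "b", "c", "d", "e"}  # finds everything, plus one error

print(precision_recall(sound_but_incomplete, correct_answers))  # (1.0, 0.5)
print(precision_recall(complete_but_unsound, correct_answers))  # (0.8, 1.0)
```

A classical logical reasoner aims for (1.0, 1.0); the slide's point is that accepting lower values on either axis may be the price of scale.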
16. Infinitely scalable (2/2)
• by parallelisation:
• cluster computing
• wide area distribution
“Thinking@home”,
“self-computing semantic Web”
• cloud computing?
(Amazon now, Google soon?)
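As a rough illustration of the parallelisation idea, the sketch below splits a set of triples into shards, matches a pattern on each shard independently, and merges the partial results. It is an assumption-laden toy, not the LarKC design: the function names are hypothetical, the triples are made up, and a local thread pool merely stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(triples, n_shards):
    """Distribute triples round-robin over n_shards lists."""
    shards = [[] for _ in range(n_shards)]
    for i, triple in enumerate(triples):
        shards[i % n_shards].append(triple)
    return shards


def match_shard(shard, predicate):
    """Return the triples in one shard whose predicate matches."""
    return [t for t in shard if t[1] == predicate]


def parallel_match(triples, predicate, n_shards=4):
    """Match each shard in its own worker, then merge and sort the results."""
    shards = partition(triples, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial = list(pool.map(match_shard, shards, [predicate] * n_shards))
    return sorted(t for part in partial for t in part)


# Made-up triples (subject, predicate, object), for illustration only.
triples = [
    ("s1", "type", "City"),
    ("s2", "name", "Milan"),
    ("s3", "type", "Drug"),
]
print(parallel_match(triples, "type", n_shards=2))
```

Simple pattern matching splits cleanly across shards; inference that joins triples from different shards needs communication between workers, which is where the hard scalability questions begin.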
18. Why “LarKC” ?
• The Large Knowledge Collider
A configurable platform
for experimentation
by others
19. Why “LarKC” ?
But also:
and also:
1. a merry, carefree adventure.
2. innocent or good-natured mischief; a prank.
3. something extremely easy to accomplish
20. • The vision
• The consortium
• The project
• The plan
22. The Consortium
• Combining consortium competence
– IR, Cognition
– ML, Ontologies
– Statistics, ML,
Cognition, DB
– Logic, DB,
Probabilistic Inference
– Economics,
Decision Theory
23. The Consortium
• Use cases: Use Case 1, Use Case 2
• Competences: Database Technology, RDF technology, Probabilistic Inference, Machine Learning, human problem solving, Information Retrieval, Distributed Computing, Logic, Semantic Web
• Partners shown: WHO-IARC, CEFRIEL, Siemens, Ontotext, CycEur, Saltlux, USFD, HLRS, UIBK, MPG, WICI, VUA
24. • The vision
• The consortium
• The project
• The plan
Oh
Shit…
25. The project
• 10M€ budget
• 3.5 years
• 80 person years
• 3 case studies
• 14 partners
• obtained in FP7 Call1:
– overall < 10% funding rate
– LarKC has highest funding, longest runtime
26. Project Workpackages & timeline
• WP1: Conceptual Framework & Evaluation
• WP2: Retrieval and Selection
• WP3: Abstraction and Learning
• WP4: Reasoning and Deciding
• WP5: Collider Platform
• WP6: Use case: Real Time City
• WP7a: Use case: Early Clinical Development
• WP7b: Use case: Carcinogenesis Reference Production
• WP8: Training, dissemination, community building
• WP9: Exploitation and standards
• WP10: Project Management
27. Use case: Drug Discovery
• Problem: pharmaceutical R&D in early clinical development is stagnating
FDA white paper Innovation or Stagnation (March 2004):
“developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”
“industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs”
“Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.” (Q1∩Q2∩Q3)
– Q1 (Genetics): “Show me all liver toxicity associated with the target or the pathway.”
– Q2 (Chemistry): “Show me all liver toxicity associated with compounds with similar structure”
– Q3 (Literature): “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population”
Current NCBI: linking but no inference
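The combined query Q1∩Q2∩Q3 can be read as a set intersection over the results of the three sub-queries. The sketch below assumes each sub-query returns a set of finding identifiers; the identifiers are placeholders, not real data.

```python
# Placeholder result sets for the three sub-queries (hypothetical identifiers).
q1_genetics = {"finding-1", "finding-2", "finding-3"}    # target or pathway
q2_chemistry = {"finding-2", "finding-3", "finding-4"}   # similar structure
q3_literature = {"finding-2", "finding-3", "finding-5"}  # literature and reports

# Q1 ∩ Q2 ∩ Q3: findings supported by all three sources.
combined = q1_genetics & q2_chemistry & q3_literature
print(sorted(combined))  # ['finding-2', 'finding-3']
```

The intersection itself is trivial once the three result sets share identifiers; getting them to share identifiers across genetics, chemistry and literature sources is the integration step that linking without inference does not provide.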
28. Use Case: City on-line
• Our cities face many challenges
– How can we redevelop existing neighborhoods and business districts to improve the quality of life?
– How can we create more choices in housing, accommodating diverse lifestyles and all income levels?
– How can we reduce traffic congestion yet stay connected?
– How can we include citizens in planning their communities rather than limiting input to only those affected by the next project?
– How can we fund schools, bridges, roads, and clean water while meeting short-term costs of increased security?
• Urban Computing is the ICT way to address them
– Is public transportation where the people are?
– Which landmarks attract more people?
– Where are people concentrating?
– Where is traffic moving?
29. • The vision
• The consortium
• The project
• The plan
Oh
Shit…
30. Project Timeline
• Surveys (plugins, platform)
• Requirements (use cases)
• Milestones: Prototype, Internal Release, Public Release, Final Release
• Timeline marks (months): 0, 6, 10, 18, 33, 42
• Use Cases: V1, V2, V3
31. Communication
• Early Access Group
• Usage Competition
– “we will win if we start to lose”
• We deliver:
– software
– publications
– not “deliverables”
32. And Finally….
• People are already looking at us:
– “Damn... the EU is where all the cool semweb work is happening these days”
– “This kind of infrastructure is exactly the kind of rocket fuel that is needed at this stage of semweb maturity.”
– “The LarKC-inspired workshop on new forms of reasoning for the semantic web was a conference highlight for me”
– “With the current growth rates of RDF on the Web, LarKC, which started out as technologically possible, will quickly become operationally necessary”
– “this project really has [...] (potentially) in terms of both science and impact”
• projects already seeking collaboration: OKKAM, MUSING