LarKC Tutorial at ISWC 2009 - Introduction

The aim of the EU FP7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform and introduces the platform and the project in general.

  1. Agenda for today, October 2010 @ ISWC (time, presentation title, presenter)
       08:30 – 09:00  Setup
       09:00 – 09:30  Introduction to LarKC (Frank van Harmelen)
       09:30 – 10:30  LarKC Architecture (Michael Witbrock)
       10:30 – 11:00  Coffee break
       11:00 – 11:30  Hands-on: work with an existing LarKC workflow (Florian Fischer)
       11:30 – 12:00  LarKC Data Layer (Florian Fischer)
       12:00 – 13:00  Build a LarKC DECIDEr & create a workflow from existing plugins (Luka Bradesko, Blaz Fortuna)
       13:00 – 14:00  Lunch
       14:00 – 14:30  Distributed Processing in LarKC (Michael Witbrock)
       14:30 – 15:30  Hands-on: Building a LarKC Plugin and integrating it in a workflow (Florian Fischer)
       15:30 – 16:00  Coffee break (Eyal Oren)
       16:00 – 17:00  Hands-on: the Urban computing workflow (Emanuele della Valle)
       17:00 – 18:00  Wrap up: discussion and feedback (Frank van Harmelen)
  2. Welcome to the 2nd LarKC Early Adopters Workshop. Frank van Harmelen, Vrije Universiteit Amsterdam
  3. Health Warning
       - Today is a WORKshop:
           - we first tell you some stuff,
           - then you do stuff
           - (repeat)
       - Goal of today:
           - ours: show LarKC to outsiders [who are we?]
           - yours: [tell us now]
  4. Goals of today
       - At the end of today you will:
           - understand the goals of LarKC
           - understand the architecture of LarKC
           - have hands-on experience with the platform and plugins
       - At the end of the day, you will be able to:
           - roll your own LarKC plugin
           - roll your own LarKC application
  5. Goals of LarKC
       LarKC = a platform for large scale reasoning
       Quote from EU Project Officer: “LarKC's value is as an experimental platform. LarKC is an environment where people can go to replicate (or extend) their results in an environment where all the infrastructural heavy lifting has already been taken care of.”
  6. Goals of LarKC
       LarKC = a platform for large scale reasoning
       Quote from US high-tech CTO: “Semantic web research is stifled by the complexity of writing a large scale engine, with services for data access, storage, aggregation, inference, transport, transformation, etc. Physics research has dealt with a similar problem by providing large scale infrastructure into which experiments can be plugged. The idea behind LarKC, which I found so compelling, is that people who wanted to build small scale plugins, for example plugins for some non-standard deduction, or transformation of text to triples, or estimating the weights for relational models, could do so, taking advantage of the EU's investment in a platform with significant capabilities.”
  7. Goals of LarKC
       LarKC = a platform for large scale reasoning
       Quote from EU Reviewer: “Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost.”
  8. What do we mean by: LarKC = a platform for large scale reasoning
       - reusable components
       - reconfigurable workflows
       - provide infrastructure needed by all users:
           - storage & retrieval
           - registration of plugins
           - communication (plugin2datalayer, plugin2plugin)
           - synchronisation (anytime behaviour)
           - remote execution (abstracts from local/remote invocation)
           - remote data-access (abstracts from local/remote storage)
           - (will) provide instrumentation & measuring
           - (will) provide caching & data-locality
       - integration of very heterogeneous components:
           - heterogeneous data: unstructured text, (semi)structured data
           - heterogeneous code: Java, scripts, remote services ("wrap & integrate")
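The platform services listed above (registration of plugins, communication between plugins, and remote execution that hides where a plugin actually runs) can be illustrated with a minimal Python sketch. It is not the LarKC API (the platform itself is written in Java): `PluginRegistry`, `LocalPlugin` and `RemoteStubPlugin` are hypothetical names, and the "remote" plugin only simulates a network boundary by round-tripping its input and output through JSON.

```python
import json
from typing import Any, Callable, Dict


class LocalPlugin:
    """Runs the wrapped plugin function in the same process."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, data: Any) -> Any:
        return self.fn(data)


class RemoteStubPlugin:
    """Hypothetical stand-in for a remote plugin: request and response are
    serialised to JSON, as they would be when crossing a network boundary."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn  # in a real deployment this would be a client for a remote service

    def invoke(self, data: Any) -> Any:
        request = json.dumps(data)
        response = json.dumps(self.fn(json.loads(request)))
        return json.loads(response)


class PluginRegistry:
    """Registers plugins by name, so workflows never depend on where a plugin runs."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Any] = {}

    def register(self, name: str, plugin: Any) -> None:
        if name in self._plugins:
            raise ValueError(f"plugin already registered: {name}")
        self._plugins[name] = plugin

    def invoke(self, name: str, data: Any) -> Any:
        return self._plugins[name].invoke(data)


def count_triples(triples):
    """A trivial plugin: report how many triples it received."""
    return len(triples)


registry = PluginRegistry()
registry.register("count-local", LocalPlugin(count_triples))
registry.register("count-remote", RemoteStubPlugin(count_triples))

data = [["ex:alice", "foaf:knows", "ex:bob"], ["ex:bob", "foaf:knows", "ex:carol"]]
print(registry.invoke("count-local", data), registry.invoke("count-remote", data))  # 2 2
```

Workflow code only ever calls `registry.invoke(name, data)`, so moving a plugin between the local and remote deployment options changes how it is registered, not the workflow that uses it.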
  9. What do we mean by: LarKC = a platform for large scale reasoning
       - not only from raw large numbers:
           - from a performant data-layer
           - from parallel deployment of plugins
           - from load-balancing strategies
           - …
       - but also from the interaction of multiple components:
           - e.g. avoid reasoning through selection: SELECT + REASON
       - and from allowing for incompleteness and anytime behaviour
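The "avoid reasoning through selection" idea can be sketched in Python: a selector first cuts a large triple set down to the part within a few hops of the query's terms, and only that selection is handed to a reasoner. The hop-based `select` heuristic and the two RDFS entailment rules in `reason` (rdfs11, subclass transitivity, and rdfs9, type inheritance) are illustrative assumptions, not a specific LarKC plugin.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def select(triples, seeds, hops):
    """SELECT: keep only the triples within `hops` steps of the seed terms."""
    terms, selected = set(seeds), set()
    for _ in range(hops):
        step = {t for t in triples if t[0] in terms or t[2] in terms}
        selected |= step
        terms |= {t[0] for t in step} | {t[2] for t in step}
    return selected


def reason(triples):
    """REASON: forward-chain subClassOf transitivity and type inheritance to a fixpoint."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS}
        derived = {(x, SUBCLASS, z) for x, y in sub for y2, z in sub if y == y2}
        derived |= {(i, TYPE, sup) for i, p, c in closure if p == TYPE for c2, sup in sub if c == c2}
        if derived <= closure:
            return closure
        closure |= derived


# A small ontology about alice, buried in 10,000 unrelated triples.
data = {("ex:alice", TYPE, "ex:Student"),
        ("ex:Student", SUBCLASS, "ex:Person"),
        ("ex:Person", SUBCLASS, "ex:Agent")}
data |= {(f"ex:paper{i}", TYPE, "ex:Publication") for i in range(10_000)}

selected = select(data, seeds={"ex:alice"}, hops=3)
closure = reason(selected)
alice_types = {o for s, p, o in closure if s == "ex:alice" and p == TYPE}
print(len(selected), sorted(alice_types))  # 3 ['ex:Agent', 'ex:Person', 'ex:Student']
```

The reasoner sees 3 triples instead of 10,003. The price is incompleteness: with `hops=1` the selection contains only the `rdf:type` triple, and the answer shrinks to `ex:Student`.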
  10. What do we mean by: LarKC = a platform for large scale reasoning
       - not only: deductive inference over given axioms
       - but also:
           - where do the axioms come from? (IDENTIFY)
           - which part of the knowledge & data is required? (SELECTion)
           - when is an answer "good enough" or "best possible"? (DECIDEr)
           - non-deductive inference: inductive, statistical (REASONer)
       “ReaSearch: integrating reasoning and search”
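The IDENTIFY question ("where do the axioms come from?") can be made concrete with a small Python sketch of a source-discovery step: given the terms in a query and an index of which terms each data source mentions, return the most relevant sources. The index, the overlap score and the example.org URLs are hypothetical; in practice this step might wrap an existing search service instead of a local index.

```python
def identify(query_terms, source_index, top_k=2):
    """IDENTIFY: rank data sources by how many query terms they mention (hypothetical scoring)."""
    wanted = set(query_terms)
    scored = [(len(wanted & terms), source) for source, terms in source_index.items()]
    # Drop sources with no overlap; sort by overlap (descending), then by name for stable ties.
    ranked = sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: (-pair[0], pair[1]))
    return [source for _, source in ranked[:top_k]]


source_index = {
    "http://example.org/music.rdf": {"ex:Band", "ex:Album", "ex:genre"},
    "http://example.org/people.rdf": {"foaf:Person", "foaf:knows"},
    "http://example.org/mixed.rdf": {"foaf:Person", "ex:Band"},
}
print(identify(["ex:Band", "ex:genre"], source_index))
# ['http://example.org/music.rdf', 'http://example.org/mixed.rdf']
```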
  11. Overall approach of LarKC
       - Very lightweight platform:
           - communication, synchronisation, registration
           - LarKC = “SPARQL endpoint on steroids”
       - The real work happens in the plugins
       - LarKC gives you:
           - a very scalable data layer
           - standardised interfaces for combining components
           - utilities & infrastructure
       - Three types of LarKC users:
           - people building plugins
           - people configuring workflows
           - people using workflows
  12. How to deploy LarKC
       - All local:
           - platform local, plugins local
           - Example: workstation
       - Calling remote plugins:
           - platform local, (some) plugins remote
           - Example: laptop
       - Fully remote:
           - platform remote (e.g. as a web service)
           - plugins remote
           - Example: cluster
  13. Why would people (like you) want to use LarKC?
       - Workflow builders:
           - easier to get some application scenario running
       - Plugin builders:
           - easier integration with components by others
           - wider take-up of your own component by others
  14. What does a workflow look like? [Diagram: Identifier, Info Set Transformer, Reasoner, Decider, Selector and Query Transformer, with the Data Layer]
  15. What does a workflow look like? [Same diagram, with the Data Layer repeated alongside the components]
  16. What does a workflow look like? [Same diagram, without the Data Layer]
  17. What does a workflow look like? [Reduced to Identifier, Reasoner, Decider and Selector]
  18. What does a workflow look like? [Reduced to Reasoner, Decider and Selector]
  19. What does a workflow look like? [As slide 18]
  20. What does a workflow look like? [Full diagram, extended with further Identifier and Info Set Transformer plugins: "ETCETERA"]
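The workflow diagram can be read as a chain of plugins that a DECIDEr calls in turn. The Python sketch below hard-codes one such chain (query transformer, identifier, information-set transformer, selector, reasoner) over a toy in-memory catalogue. The stage order, the shared `ctx` dictionary and the `CATALOG` data are assumptions made for illustration, not the LarKC plugin interfaces.

```python
from functools import reduce

# Hypothetical data sources: source name -> raw "subject predicate object" lines.
CATALOG = {
    "people.txt": ["ex:alice rdf:type ex:Person", "ex:alice ex:knows ex:bob"],
    "music.txt": ["ex:band1 rdf:type ex:Band"],
    "staff.txt": ["ex:bob rdf:type ex:Person"],
}


def query_transformer(ctx):
    """Turn a one-pattern query like '?x rdf:type ex:Person' into (None, predicate, object)."""
    s, p, o = ctx["query"].split()
    ctx["pattern"] = (None if s.startswith("?") else s, p, o)
    return ctx


def identifier(ctx):
    """Pick the sources whose raw text mentions the pattern's object (a crude substring test)."""
    obj = ctx["pattern"][2]
    ctx["sources"] = [name for name, lines in CATALOG.items() if any(obj in line for line in lines)]
    return ctx


def info_set_transformer(ctx):
    """Convert the raw lines of the identified sources into triples."""
    ctx["triples"] = [tuple(line.split()) for name in ctx["sources"] for line in CATALOG[name]]
    return ctx


def selector(ctx):
    """Keep only the triples whose predicate matches the pattern."""
    predicate = ctx["pattern"][1]
    ctx["triples"] = [t for t in ctx["triples"] if t[1] == predicate]
    return ctx


def reasoner(ctx):
    """Answer the pattern by plain matching; a real reasoner would also derive new triples."""
    s, _, o = ctx["pattern"]
    ctx["answers"] = sorted(t[0] for t in ctx["triples"] if (s is None or t[0] == s) and t[2] == o)
    return ctx


def hardcoded_decider(query, stages):
    """The simplest DECIDEr: run a fixed sequence of plugins, passing the context along."""
    return reduce(lambda ctx, stage: stage(ctx), stages, {"query": query})


WORKFLOW = [query_transformer, identifier, info_set_transformer, selector, reasoner]
result = hardcoded_decider("?x rdf:type ex:Person", WORKFLOW)
print(result["sources"], result["answers"])
# ['people.txt', 'staff.txt'] ['ex:alice', 'ex:bob']
```

Swapping in a different selector or reasoner means changing one entry in `WORKFLOW`, which is the "reconfigurable workflows" goal in miniature.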
  21. What does a DECIDEr look like?
       - Can be a hardcoded sequence of plugins
       - Can be a self-configuring selection of plugins
       - Can make run-time decisions on progress and resource consumption
       - Coded as:
           - Java
           - a Cyc knowledge base
           - ...
           - as long as it complies with the DECIDEr API
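A DECIDEr that makes run-time decisions on progress and resource consumption is what gives a workflow anytime behaviour. A minimal Python sketch, assuming a stopping rule of our own choosing: the decider keeps enlarging the selection, re-runs the reasoner, and stops either when the answer has been stable for a few rounds or when the next round would exceed a resource budget (here counted in triples processed). `anytime_decide` and its parameters are hypothetical, not the DECIDEr API.

```python
def anytime_decide(select, reason, answer_of, budget, patience=2, max_rounds=100):
    """Hypothetical anytime DECIDEr: grow the selection round by round and stop when the
    answer has not changed for `patience` rounds or the next round would exceed `budget`."""
    best, stable, spent, history = None, 0, 0, []
    for k in range(1, max_rounds + 1):
        selection = select(k)
        if spent + len(selection) > budget:
            break  # resource limit reached: keep the best answer found so far
        spent += len(selection)
        answer = answer_of(reason(selection))
        history.append(answer)
        if answer == best:
            stable += 1
            if stable >= patience:
                break  # answer judged "good enough": unchanged for `patience` rounds
        else:
            best, stable = answer, 0
    return best, history


likes = [("ex:s1", "ex:likes", "ex:jazz"), ("ex:s2", "ex:likes", "ex:rock"),
         ("ex:s3", "ex:likes", "ex:jazz"), ("ex:s4", "ex:likes", "ex:rock"),
         ("ex:s5", "ex:likes", "ex:rock"), ("ex:s6", "ex:likes", "ex:rock"),
         ("ex:s7", "ex:likes", "ex:rock"), ("ex:s8", "ex:likes", "ex:rock")]


def select_first(k):
    return likes[:2 * k]  # the selection grows by two triples per round


def no_inference(triples):
    return triples  # stand-in reasoner: derives nothing new


def jazz_fans(triples):
    return frozenset(s for s, _, o in triples if o == "ex:jazz")


answer, history = anytime_decide(select_first, no_inference, jazz_fans, budget=100)
print(sorted(answer), len(history))  # ['ex:s1', 'ex:s3'] 4
```

With `budget=10` the same call stops after two rounds and still returns the best answer found so far, rather than no answer at all.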
  22. Any plugins available already?
       - 5x IDENTIFY
       - 3x TRANSFORM
       - 10x SELECT
       - 4x REASON
       - 4x DECIDE
       - Sometimes sophisticated, sometimes simple
       - Sometimes novel, sometimes wrapped:
           - existing web services (e.g. Sindice, Swoogle)
           - another RDF store (geo-queries in AllegroGraph)
           - a very large (workflow-based) system (GATE)
           - existing reasoners (Jena, Pellet, Cyc, IRIS)
           - XSLT scripts (XML-2-RDF)
           - spreading activation (new)
           - RDF-2-weightedRDF (new)
  23. Goals of LarKC, and where we are
       - Scalable: > 10^9 triples, lazy pipes
       - Reconfigurable: plugins with standard APIs
       - Open: Apache license
       - heterogeneous: TRANSFORM, wrappers
       - experimentation: wrap & integrate
       - allow incompleteness: IDENTIFY, SELECT
       - enable distribution: plugin containers
       - anytime behaviour: streaming APIs
       - web-enabled: remote plugins & data
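"Lazy pipes" and "streaming APIs" mean that plugins pass results along as they are produced instead of materialising whole intermediate sets. Python generators give the same behaviour, so a sketch with hypothetical plugin names and synthetic data can show it: a consumer asking for three answers from a million-triple source causes only 28 triples to be produced.

```python
from itertools import islice


def source(n, stats):
    """Hypothetical data source: yields n synthetic triples and counts how many it produced."""
    for i in range(n):
        stats["produced"] += 1
        yield (f"ex:s{i}", "ex:p", f"ex:o{i % 10}")


def expand_prefix(triples):
    """Hypothetical transformer: expand the 'ex:' prefix in subjects, one triple at a time."""
    for s, p, o in triples:
        yield (s.replace("ex:", "http://example.org/"), p, o)


def select_object(triples, obj):
    """Hypothetical selector: pass on only the triples with the given object."""
    return (t for t in triples if t[2] == obj)


stats = {"produced": 0}
pipe = select_object(expand_prefix(source(1_000_000, stats)), "ex:o7")
first_three = list(islice(pipe, 3))  # the consumer stops after three answers
print([t[0] for t in first_three], stats["produced"])
# ['http://example.org/s7', 'http://example.org/s17', 'http://example.org/s27'] 28
```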
  24. What we will not show today
       - Available but not demo’d:
           - lots of plugins
           - C-SPARQL: extension of SPARQL to enable stream-querying
           - cognition-based heuristics (e.g. selection rules, stopping rules)
           - very cool data-sets:
               - Linked Life Data (1.4B explicit, 2.3B closure, 1.3M links)
               - Milan traffic grid (2M explicit + 2Tb sensor-data (to come))
               - Interest-enhanced DBLP (615k authors + interests)
               - LDSR (358M explicit + 512 inferred, 100m URIs)
           - very large/fast inference engines: MarVIN, Reasoning-Hadoop
       - Not yet available (but will be):
           - plugin-farming on remote CPUs (cloud, cluster)
           - instrumentation & measuring
           - smart data caching