ISWC 2009 LarKC Tutorial: Architecture

The aim of the EU FP7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform and describes the platform architecture.


  1. 1. LarKC Architecture and Technology<br />Michael Witbrock, Cycorp Europe (+UIBK)<br />with contributions from all LarKC developers<br />
  2. 2. Realising the Architecture<br />[Diagram components: Workflow Support System; Plug-in Manager; Plug-in Registry; Plug-in API; Data Layer API; Data Layer; RDF Store]<br />
  3. 3. LarKC Plug-in API: General Plug-in Model<br /><ul><li>Functional properties
  4. 4. Non-functional properties
  5. 5. WSDL description</li></ul>Plug-in<br />Plug-in description<br />+ URI getIdentifier()<br />+ QoSInformation getQoSInformation()<br />Plug-ins are assembled into Workflows to realise a LarKC Experiment or Application<br />Plug-ins are identified by a URI (Uniform Resource Identifier)<br />Plug-ins provide metadata about what they do (Functional properties): e.g. type = Selecter<br />Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost, …<br />Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract: “give me the next 10 results”) and Context information used to store state between invocations<br />
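The general plug-in model above can be sketched in Java roughly as follows. Only `getIdentifier()` and `getQoSInformation()` come from the slide; the `getType()` method, the fields of `QoSInformation`, the example URI and the class names are assumptions made for illustration, not the actual LarKC classes.

```java
import java.net.URI;

// Minimal sketch of the general plug-in model. Everything except the two
// method names getIdentifier() and getQoSInformation() is hypothetical.
class PluginModelSketch {

    // Non-functional properties; field names follow the slide's examples
    // (Throughput, MinMemory, Cost) but are assumptions.
    static class QoSInformation {
        final double throughputPerSec;
        final int minMemoryMb;
        final double cost;

        QoSInformation(double throughputPerSec, int minMemoryMb, double cost) {
            this.throughputPerSec = throughputPerSec;
            this.minMemoryMb = minMemoryMb;
            this.cost = cost;
        }
    }

    interface Plugin {
        URI getIdentifier();                 // from the slide
        QoSInformation getQoSInformation();  // from the slide
        String getType();                    // assumed accessor for a functional property
    }

    // A hypothetical plug-in that only describes itself.
    static class ExampleSelecter implements Plugin {
        public URI getIdentifier() {
            return URI.create("urn:example:plugin:selecter");
        }

        public QoSInformation getQoSInformation() {
            return new QoSInformation(1000.0, 256, 1.0);
        }

        public String getType() {
            return "Selecter";
        }
    }

    // Formats the identifier, the functional type and one QoS value.
    static String describe(Plugin p) {
        return p.getIdentifier() + " type=" + p.getType()
                + " minMemoryMb=" + p.getQoSInformation().minMemoryMb;
    }

    public static void main(String[] args) {
        System.out.println(describe(new ExampleSelecter()));
    }
}
```

A registry or Decider could read exactly this kind of self-description to decide which plug-ins fit into a workflow.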
  6. 6. LarKC Plug-in API: IDENTIFY<br />Identifier<br />+ Collection&lt;InformationSet&gt; identify(Query theQuery, Contract contract, Context context)<br />IDENTIFY: Given a query, identify resources that could be used to answer it<br /><ul><li>Sindice – Triple Pattern Query → RDF Graphs
  7. 7. Google – Keyword Query → Natural Language Document
  8. 8. Triple Store – SPARQL Query → RDF Graphs</li></ul>
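As a rough illustration of the IDENTIFY signature, the sketch below implements a keyword-style identifier over a small in-memory list of document names. The `identify` signature mirrors the slide; the bodies of `Query`, `Contract`, `Context` and `InformationSet`, the `maxResults` field and the document names are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Minimal sketch of an IDENTIFY plug-in. The supporting types are
// hypothetical stand-ins, not the LarKC platform classes.
class IdentifySketch {

    static class Query {
        final String keyword;
        Query(String keyword) { this.keyword = keyword; }
    }

    // Assumed contract: an upper bound on the number of results.
    static class Contract {
        final int maxResults;
        Contract(int maxResults) { this.maxResults = maxResults; }
    }

    // Placeholder for state kept between invocations (unused here).
    static class Context { }

    static class InformationSet {
        final String label;
        InformationSet(String label) { this.label = label; }
    }

    interface Identifier {
        Collection<InformationSet> identify(Query theQuery, Contract contract, Context context);
    }

    // Returns the documents whose name contains the keyword, up to the contract limit.
    static class KeywordIdentifier implements Identifier {
        private static final String[] DOCS = {
            "lark-song.txt", "larkc-overview.txt", "owl-primer.txt"
        };

        public Collection<InformationSet> identify(Query theQuery, Contract contract, Context context) {
            List<InformationSet> found = new ArrayList<InformationSet>();
            for (String doc : DOCS) {
                if (found.size() >= contract.maxResults) {
                    break;
                }
                if (doc.contains(theQuery.keyword)) {
                    found.add(new InformationSet(doc));
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        Identifier identifier = new KeywordIdentifier();
        for (InformationSet s : identifier.identify(new Query("lark"), new Contract(10), new Context())) {
            System.out.println(s.label);
        }
    }
}
```

Lowering `maxResults` in the contract truncates the result, which is the kind of behaviour the “give me the next 10 results” contract on the earlier slide describes.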
  9. 9. LarKC Plug-in API: TRANSFORM (1/2)<br />QueryTransformer<br />+ Set&lt;Query&gt; transform(Query theQuery, Contract theContract, Context theContext)<br />Query TRANSFORM: Transforms a query from one representation to another<br /><ul><li>SPARQL Query → Triple Pattern Query
  10. 10. SPARQL Query → Keyword Query
  11. 11. SPARQL Query → SPARQL Query (different abstraction)
  12. 12. SPARQL Query → CycL Query</li></ul>
  13. 13. LarKC Plug-in API: TRANSFORM (2/2)<br />InformationSetTransformer<br />+ InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)<br />Information Set TRANSFORM: Transforms data from one representation to another<br /><ul><li>Natural Language Document → RDF Graph
  14. 14. Structured Data Sources → RDF Graph
  15. 15. RDF Graph → RDF Graph (e.g. foaf vocabulary to facebook vocabulary)</li></ul>
  16. 16. LarKC Plug-in API: SELECT<br />Selecter<br />+ SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)<br />SELECT: Given a set of statements (e.g. a number of RDF Graphs), chooses a selection/sample from this set<br />Collection of RDF Graphs → Triple Set (Merged)<br />Collection of RDF Graphs → Triple Set (10% of each)<br />Collection of RDF Graphs → Triple Set (N Triples)<br />
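The “N Triples” variant of SELECT could look roughly like the sketch below. The `select` signature follows the slide; representing a `SetOfStatements` as a list of triple strings and giving `Contract` a `maxStatements` field are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a SELECT plug-in that keeps the first N statements.
// All supporting types are hypothetical stand-ins.
class SelectSketch {

    // Assumed contract: how many statements the caller wants.
    static class Contract {
        final int maxStatements;
        Contract(int maxStatements) { this.maxStatements = maxStatements; }
    }

    static class Context { }

    // Statements are plain strings here; a real platform would use RDF triples.
    static class SetOfStatements {
        final List<String> triples;
        SetOfStatements(List<String> triples) { this.triples = triples; }
    }

    interface Selecter {
        SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context);
    }

    static class FirstNSelecter implements Selecter {
        public SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context) {
            int n = Math.min(contract.maxStatements, theSetOfStatements.triples.size());
            // Copy the prefix so the result does not alias the input list.
            return new SetOfStatements(new ArrayList<String>(theSetOfStatements.triples.subList(0, n)));
        }
    }

    static SetOfStatements exampleInput() {
        return new SetOfStatements(Arrays.asList(
                "ex:a ex:knows ex:b",
                "ex:b ex:knows ex:c",
                "ex:c ex:knows ex:a"));
    }

    public static void main(String[] args) {
        SetOfStatements picked = new FirstNSelecter().select(exampleInput(), new Contract(2), new Context());
        System.out.println(picked.triples);
    }
}
```

A sampling selecter (the “10% of each” variant) would have the same shape and differ only in how it picks statements.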
  17. 17. LarKC Plug-in API: REASON<br />Reasoner<br />+ VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />REASON: Executes a query against the supplied set of statements<br />SPARQL Query → Variable Binding (Select)<br />SPARQL Query → Set of statements (Construct)<br />SPARQL Query → Set of statements (Describe)<br />SPARQL Query → Boolean (Ask)<br />
  18. 18. LarKC Plug-in API: DECIDE<br />Decider<br />+ VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />DECIDE: Builds the workflow and manages the control flow<br />Scripted Decider: Predefined workflow is built and executed<br />Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow<br />
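A scripted Decider, in the sense of a predefined workflow that is built and executed, might be sketched as a fixed IDENTIFY, SELECT, REASON chain behind `sparqlAsk`. This is a deliberately simplified illustration: it returns a plain `boolean` instead of a `BooleanInformationSet`, treats a query as a single exact triple string, and the `maxStatements` field of `QoSParameters` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of a scripted Decider running a fixed three-step workflow.
// Types and data are hypothetical stand-ins for the platform classes.
class ScriptedDeciderSketch {

    // Simplification: the "query" is one triple to look for.
    static class SPARQLQuery {
        final String triple;
        SPARQLQuery(String triple) { this.triple = triple; }
    }

    // Assumed QoS knob: how many statements the workflow may reason over.
    static class QoSParameters {
        final int maxStatements;
        QoSParameters(int maxStatements) { this.maxStatements = maxStatements; }
    }

    // Step 1 (IDENTIFY): return candidate statements for the query.
    static List<String> identify(SPARQLQuery query) {
        return Arrays.asList(
                "ex:a ex:knows ex:b",
                "ex:b ex:knows ex:c",
                "ex:c ex:likes ex:a");
    }

    // Step 2 (SELECT): keep at most n statements.
    static List<String> select(List<String> statements, int n) {
        return new ArrayList<String>(statements.subList(0, Math.min(n, statements.size())));
    }

    // Step 3 (REASON): answer ASK by exact membership in the selected set.
    static boolean reasonAsk(SPARQLQuery query, List<String> statements) {
        return statements.contains(query.triple);
    }

    // The scripted workflow: the order of the steps is hard-wired.
    static boolean sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters) {
        List<String> candidates = identify(theQuery);
        List<String> selected = select(candidates, theQoSParameters.maxStatements);
        return reasonAsk(theQuery, selected);
    }

    public static void main(String[] args) {
        SPARQLQuery query = new SPARQLQuery("ex:c ex:likes ex:a");
        System.out.println(sparqlAsk(query, new QoSParameters(3)));
        System.out.println(sparqlAsk(query, new QoSParameters(2)));
    }
}
```

With a budget of two statements the matching triple is never selected, so the answer changes; this is a small instance of the incomplete reasoning trade-off the platform is built around.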
  19. 19. Released System: larkc.sourceforge.net<br /><ul><li>Open Apache 2.0 license
  20. 20. Previous early adopters workshop @ ESWC
  21. 21. 20 people attended
  22. 22. participants modified plug-ins, modified workflows</li></ul>Standard Open Environment: Subversion connection, command-line build or Eclipse; NetBeans soon?<br />[Diagram components: Decider; Plug-in Registry; Pipeline Support System; Plug-in Managers; Plug-in API; plug-ins (Selecter, Query Transformer, Identifier, Reasoner, Info. Set Transformer)]<br />
  23. 23. LarKC Plug-in API<br />Decider<br />+ VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)<br />Reasoner<br />+ VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)<br />Identifier<br />+ Collection&lt;InformationSet&gt; identify(Query theQuery, Contract contract, Context context)<br />QueryTransformer<br />+ Set&lt;Query&gt; transform(Query theQuery, Contract theContract, Context theContext)<br />InformationSetTransformer<br />+ InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)<br />Selecter<br />+ SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)<br /><ul><li>5 types of plug-ins
  24. 24. Plug-in API enables interoperability (between plug-in and platform and between plug-ins)
  25. 25. Plug-in inputs and outputs are abstract data structures of RDF triples => flexibility for assembling plug-ins and for plug-in writers
  26. 26. Compatibility ensured by DECIDER and workflow configurators, based on plug-in description</li></ul>
  27. 27. LarKC Architecture<br />[Architecture diagram, LarKC Plug-in API highlighted. Components: Application; Decider; Pipeline Support System; Plug-in Registry; Platform Utility Functionality; Plug-in Managers; Plug-in API; Plug-ins (Query Transformer, Identifier, Selecter, Reasoner, Info. Set Transformer); External systems; External data sources; Data Layer API; Data Layer (RDF Stores, RDF Docs)]<br />
  28. 28. What does a workflow look like?<br />[Diagram components: Decider; Plug-in Managers; Plug-in API; plug-ins (Selecter, Query Transformer, Identifier, Reasoner, Info. Set Transformer); Plug-in Registry; Workflow Support System; RDF Store]<br />
  29. 29. What Does a Workflow Look Like?<br />[Diagram components: Decider; Plug-in Managers; Plug-in API; plug-ins (Selecter, Query Transformer, Identifier, Reasoner, Info. Set Transformer); Default Graph and RDF Graphs passed between plug-ins; Plug-in Registry; Workflow Support System; Data Layer; RDF Store]<br />
  30. 30. LarKC Data Model: Transport By Reference<br />Labeled Set: Pointers to data<br />Dataset: Collection of named graphs<br />[Diagram: a Default Graph and multiple RDF Graphs]<br />Current Scale: O(10^10) triples<br />
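Transport by reference can be illustrated with a small sketch in which a labelled set carries only graph names and the triples stay in the data layer until someone dereferences them. The `STORE` map, the graph names and the `resolve` helper are hypothetical; the real Data Layer API is not shown on the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of "transport by reference": plug-ins exchange graph names,
// not the triples themselves. All names here are hypothetical.
class TransportByReferenceSketch {

    // Stand-in for the data layer: named graphs keyed by their name.
    static final Map<String, List<String>> STORE = new LinkedHashMap<String, List<String>>();
    static {
        STORE.put("urn:example:g1", Arrays.asList("ex:a ex:p ex:b", "ex:b ex:p ex:c"));
        STORE.put("urn:example:g2", Arrays.asList("ex:c ex:p ex:d"));
    }

    // A labelled set holds pointers (graph names) only.
    static class LabeledSet {
        final List<String> graphNames;
        LabeledSet(List<String> graphNames) { this.graphNames = graphNames; }
    }

    // Dereferences the pointers; unknown graph names are skipped.
    static List<String> resolve(LabeledSet set) {
        List<String> triples = new ArrayList<String>();
        for (String name : set.graphNames) {
            List<String> graph = STORE.get(name);
            if (graph != null) {
                triples.addAll(graph);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        LabeledSet reference = new LabeledSet(Arrays.asList("urn:example:g1", "urn:example:g2"));
        // Passing `reference` between plug-ins costs two strings, whatever the graph sizes.
        System.out.println(resolve(reference).size());
    }
}
```

The point of the design is that the cost of handing data from one plug-in to the next does not grow with the number of triples, which matters at the scale quoted on the slide.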
  31. 31. What Does a Workflow Look Like?<br />[Diagram components: Decider; Plug-in Managers; Plug-in API; plug-ins (Selecter, Query Transformer, Identifier, Reasoner, Info. Set Transformer); Default Graph and RDF Graphs passed between plug-ins; Plug-in Registry; Workflow Support System; Data Layer; RDF Store]<br />
  32. 32. What Does a Pipeline Look Like?<br />[Diagram components: Decider; Plug-in Managers; Plug-in API; plug-ins (Selecter, Query Transformer, Identifier, Reasoner, Info. Set Transformer); Plug-in Registry; Workflow Support System; Data Layer; RDF Store]<br />
  33. 33. Remote and Heterogeneous Plug-ins<br />[Diagram components: Remote Plug-in Manager; TRANSFORM and IDENTIFY plug-ins; Adaptor; SPARQL-GATE API; SPARQL; SPARQL-CycL; External or non-Java Code (Research Cyc, GATE); Data Layer; SINDICE; Medical Data]<br />
  34. 34. What Does a Workflow Look Like?<br />[Diagram components: Decider; Plug-in Managers; Plug-in API; plug-ins (Selecter, Query Transformer, Identifier, Reasoner, Info. Set Transformer); Plug-in Registry; Workflow Support System; Data Layer; RDF Store]<br />
  35. 35. Decider Using Plug-in Registry to Create Pipeline<br />D 1.3.1<br />Represent Properties<br /><ul><li> Functional
  36. 36. Non-functional (e.g. QoS)
  37. 37. WSMO-Lite Syntax</li></ul>[Diagram nodes: Q, T, I]<br />Logical Representation<br /><ul><li>Describes role
  38. 38. Describes Inputs/Outputs
  39. 39. Automatically extracted using API
  40. 40. Decider can use for dynamic configuration
  41. 41. Rule-based
  42. 42. Fast</li></ul>[Diagram nodes: A, B, S, R, VB]<br />
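The idea of a Decider consulting the Plug-in Registry to assemble a pipeline can be sketched as below. This is not the rule-based, WSMO-Lite mechanism the slide refers to: it substitutes a much simpler lookup that picks the cheapest registered plug-in for each role. The `Description` fields, URIs and cost values are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of registry-driven pipeline assembly, using a
// cheapest-per-role lookup in place of rule-based matching.
class PluginRegistrySketch {

    // A plug-in description: one functional property (role) and one
    // non-functional property (cost). Both fields are assumptions.
    static class Description {
        final String uri;
        final String role;
        final double cost;

        Description(String uri, String role, double cost) {
            this.uri = uri;
            this.role = role;
            this.cost = cost;
        }
    }

    static class Registry {
        private final List<Description> entries = new ArrayList<Description>();

        void register(Description d) {
            entries.add(d);
        }

        // Cheapest registered plug-in with the given role, or null if none exists.
        Description cheapest(String role) {
            Description best = null;
            for (Description d : entries) {
                if (d.role.equals(role) && (best == null || d.cost < best.cost)) {
                    best = d;
                }
            }
            return best;
        }
    }

    // Builds a pipeline as an ordered list of plug-in URIs, one per requested role.
    static List<String> buildPipeline(Registry registry, String... roles) {
        List<String> pipeline = new ArrayList<String>();
        for (String role : roles) {
            Description d = registry.cheapest(role);
            if (d == null) {
                throw new IllegalStateException("No plug-in registered for role " + role);
            }
            pipeline.add(d.uri);
        }
        return pipeline;
    }

    static Registry exampleRegistry() {
        Registry r = new Registry();
        r.register(new Description("urn:example:id-slow", "Identifier", 5.0));
        r.register(new Description("urn:example:id-fast", "Identifier", 1.0));
        r.register(new Description("urn:example:sel", "Selecter", 2.0));
        r.register(new Description("urn:example:reasoner", "Reasoner", 9.0));
        return r;
    }

    public static void main(String[] args) {
        System.out.println(buildPipeline(exampleRegistry(), "Identifier", "Selecter", "Reasoner"));
    }
}
```

A self-configuring Decider would replace `cheapest` with reasoning over the full functional and non-functional descriptions, including input and output compatibility between adjacent plug-ins.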
  43. 43. LarKC Plug-ins <br /><ul><li> Provide SPARQL end-points
  44. 44. Run in separate threads
  45. 45. Automatically add meta-data to registry when loaded
  46. 46. Communicate RDF data by passing labelled sets or references to labelled sets</li></ul>[Diagram components: Plug-in Managers and Plug-in APIs wrapping Selecter, Query Transformer, Transformer and Identifier plug-ins]<br /><ul><li> Parallelisation in progress</li></ul><ul><li> Split/Join connectors in progress</li></ul>
  47. 47. LarKC Data Layer<br />[Architecture diagram, Data Layer highlighted. Components: Application; Decider; Pipeline Support System; Plug-in Registry; Platform Utility Functionality; Plug-in Managers; Plug-in API; Plug-ins (Query Transformer, Identifier, Selecter, Reasoner, Info. Set Transformer); Data Layer API; Data Layer (RDF Stores, RDF Docs); External systems; External data sources]<br />
  48. 48. LarKC Data Layer<br />Main goal:<br />The LarKC Data Layer supports all LarKC plug-ins by:<br />providing storage, retrieval and light-weight inference on top of large volumes of data<br />automating the exchange of RDF data by reference and by value<br />offering other utility tools to manage data (e.g. merger)<br />[Diagram: Labeled Set; Dataset with a Default Graph and multiple RDF Graphs]<br />
  49. 49. LarKC Data Layer Performance<br />The implementation of the data layer was evaluated against:<br />Well-known benchmarks: LUBM (Lehigh University Benchmark) and BSBM (Berlin SPARQL Benchmark), and<br />Two views of the web of linked data used in LarKC: PIKB (Pathway and Interaction Knowledge Base) and LDSR (Linked Data Semantic Repository)<br />Loading:<br />15B statements at 18 KSt/sec. on $10,000 server<br />1B statements at 66 KSt/sec. on $2,000 desktop<br />Reasoning & Materialization:<br />LUBM: 21 KSt/sec for 1B statements and 10 KSt/sec for 7B expl. statements<br />LDSR: 14 KSt/sec for 357M expl. statements<br />PIKB: 10 KSt/sec for 1.5B expl. statements<br />Competitive with State of the Art<br />
  50. 50. LarKC Data Layer Evaluation: Loading<br />
  51. 51. LarKC Data Layer Evaluation: Linked Data<br />Inference with both LDSR and PIKB proves to be much more complex than with LUBM, because:<br />The datasets are much better interconnected<br />There are plenty of owl:sameAs links<br />OWL vocabulary is used disregarding its formal semantics<br />E.g. in DBPedia there are skos:broader cycles of categories with length 180<br />Optimizations of the handling of owl:sameAs are crucial<br />PIKB: 1.47B explicit statements + 842M inferred<br />LDSR loaded in 7 hours on desktop:<br />Number of imported statements (NIS): 357M<br />Number of new inferred statements: 512M<br />Number of stored statements (NSS): 869M<br />Number of retrievable statements (NRS): 1.14B<br />owl:sameAs optimisation allowed reducing the indices by 280M statements<br />
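The LDSR figures are mutually consistent, as a quick check shows. The second line assumes that KSt means 1,000 statements and that the 14 KSt/sec reasoning and materialization rate quoted for LDSR on the performance slide applies to this desktop run; neither assumption is stated on the slides.

```latex
\begin{align*}
\text{NSS} &= \text{NIS} + \text{new inferred}
            = 357\,\text{M} + 512\,\text{M} = 869\,\text{M} \\
t_{\text{load}} &\approx
    \frac{357 \times 10^{6}\ \text{statements}}{14 \times 10^{3}\ \text{statements/s}}
    = 25\,500\ \text{s} \approx 7.1\ \text{h}
\end{align*}
```

Both results agree with the stored-statement count and the 7-hour load time reported above.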
  52. 52. Plug-in Architecture Signs of Success<br /><ul><li>Platform and Plug-in APIs are usable
  53. 53. Plug-in count already in the twenties
  54. 54. Plug-ins written with little help from architects
  55. 55. Plug-ins run successfully, and perform together
  56. 56. Outside plug-in writers:
  57. 57. OKKAM, NeOn, Aberdeen</li></ul>[Diagram components: Plug-in Manager; Plug-in API; Identifier]<br />
  58. 58. Active and Ready for the Public<br />2170 check-outs<br />1380 commits<br />23 users of code repository<br />LarKC + Alpha<br />Plus Early Adopters Workshop branch<br />20 downloads of alpha 1 public release since 30th May 2009.<br />
  59. 59. Lessons Learned (1/2)<br />API Design<br />Types of Plug-ins: 5 (+1 =&gt; 2 types of TRANSFORM)<br />I/O data structures more abstract =&gt; more flexibility for assembling plug-ins and for plug-in writers<br />Test API Implementation<br />Validation and refinement of API (introduction of ‘Contract’ and ‘Context’ parameters)<br />Transforming Cyc into LarKC Platform<br />Minimization and reorganization of Cyc code as a basis for the LarKC Platform<br />Plug-ins and Use cases implementation<br />Feedback collected from our first early adopters on different topics (how-to guidelines, context parameter, plug-in types, data caching, …)<br />
  60. 60. Lessons Learned (2/2)<br />Licensing:<br />Licensing policies aligned with partners’ and project’s interests =&gt; maximize openness and external contributions without preventing exploitation<br />Components’ licenses monitored to avoid conflicts<br />MaRVIN and IBIS:<br />strategy applicable to large-scale deployment, autonomous and symmetric nodes, asynchronous communication between nodes, well-balanced load needed<br />abstraction layer hiding resource heterogeneity (IBIS)<br />
  61. 61. Project Timeline<br />[Timeline chart. Month markers: 0, 6, 10, 14, 18, 33, 42. Labels: Surveys (plug-ins, platform) & Requirements (use cases); Plug-ins; Use Cases V1, V2, V3; Prototype; Internal Release; Public Release; Final Release; Offer computing resources; Monitoring & instrumentation; Anytime behaviour; Data caching]<br />
  62. 62. Rapid Progress, but We’re Not Finished…<br />Detailed information in D5.3.1 Requirements Analysis and report on lessons learned during prototyping<br />Requirements (WP 5)<br /><ul><li>Optimisation of complex workflows.
  63. 63. Extend meta-data representation for QoS, parallelism and use it.
  64. 64. Concentrate on parallel and distributed execution.
  65. 65. Concentrate on parallel and distributed data layer; caching and data migration.
  66. 66. Support more plug-in needs while maintaining platform integrity (e.g. efficient weight modification for spreading activation)
  67. 67. Data write for persistent transformation (e.g. rumination reasoning in MaRVIN experiments)</li></ul><ul><li>Sources
  68. 68. Initial Project Objectives (DoW)
  69. 69. LarKC Collider Platform (WP5 discussions)
  70. 70. LarKC Rapid Prototyping
  71. 71. LarKC Use Cases (WP6, WP7a, WP7b)
  72. 72. LarKC Plug-ins (WP2, WP3, WP4)</li></ul>[Architecture diagram components: Application; Plug-in API; Decider; Pipeline Support System; Plug-in Registry; Plug-in Managers; Plug-ins (Query Transformer, Identifier, Selecter, Reasoner, Info. Set Transformer); Data Layer API; Data Layer (RDF Stores, RDF Docs)]<br /><ul><li>Classified according to:
  73. 73. Resources
  74. 74. Heterogeneity
  75. 75. Usage
  76. 76. Interoperability
  77. 77. Parallelization “within plug-ins”
  78. 78. Distributed/remote execution
  79. 79. Data Layer
  80. 80. Data Caching
  81. 81. Anytime Behaviour
  82. 82. Plug-in Registration and Discovery
  83. 83. Plug-in Monitoring and Measurement
  84. 84. Support for Developers
  85. 85. Plug-ins</li></ul><ul><li> Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping)
  86. 86. Support anytime/streaming
  87. 87. Experimental instrumentation and monitoring</li></ul>
  88. 88. Open Issues & Next Steps<br />Distributed Data Layer<br />Caching, data warming/cooling<br />Data Streaming between remote components<br />Parallelization and distribution on different types of environments (high-performance grid, desktop grid, etc.)<br />Experimental instrumentation and monitoring<br />Platform validation<br /><ul><li>Requirements traceability and update
  89. 89. Architecture refinement</li></ul>Early Adopters<br />
  90. 90. fin<br />
