LarKC Tutorial at ISWC 2009 - Architecture
 

The aim of the EU FP7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC, for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform and describes the platform architecture.

    Presentation Transcript

    • LarKC Architecture and Technology Michael Witbrock, Cycorp Europe (+UIBK) with contributions from all LarKC developers
    • Realising the Architecture [architecture diagram: Workflow Support System, Plug-in Registry, Data Layer, Plug-in API, Data Layer API, RDF Store, Plug-in Manager]
    • LarKC Plug-in API: General Plug-in Model
      • Plug-ins are assembled into Workflows, to realise a LarKC Experiment or Application
      • Plug-ins are identified by a URI (Uniform Resource Identifier)
      • Plug-ins provide MetaData about what they do (Functional properties): e.g. type = Selecter
      • Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost,…
      • Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract: “give me the next 10 results”) and Context information used to store state between invocations
      [Plug-in interface: + URI getIdentifier(), + QoSInformation getQoSInformation()]
      [Plug-in description: functional properties, non-functional properties, WSDL description]
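To ground the model, here is a minimal Java sketch of the general plug-in contract; QoSInformation, Contract, and Context are placeholder stand-ins assumed for illustration, not the exact LarKC classes:

```java
import java.net.URI;

// Stand-ins for platform types (assumed, not the real LarKC classes).
interface QoSInformation { }  // non-functional properties: throughput, MinMemory, cost, ...
interface Contract { }        // per-invocation instructions, e.g. "give me the next 10 results"
interface Context { }         // state carried between invocations

// General plug-in model: every plug-in is addressable by a URI and
// can report its quality-of-service information to the platform.
interface Plugin {
    URI getIdentifier();
    QoSInformation getQoSInformation();
}
```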
    • LarKC Plug-in API: IDENTIFY
      • IDENTIFY : Given a query, identify resources that could be used to answer it
        • Sindice – Triple Pattern Query → RDF Graphs
        • Google – Keyword Query → Natural Language Document
        • Triple Store – SPARQL Query → RDF Graphs
      [Identifier: + Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)]
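For illustration, a trivial Identifier that always points at one known graph might look like the following sketch; Query, InformationSet, and GraphRef are simplified stand-in types, and the example URI is hypothetical:

```java
import java.net.URI;
import java.util.Collection;
import java.util.List;

interface Query { String asString(); }   // stand-in query type
interface InformationSet { }             // stand-in for "RDF graphs, documents, ..."
record GraphRef(URI graph) implements InformationSet { }  // data passed by reference

// IDENTIFY: given a query, name resources that could help answer it.
class FixedSourceIdentifier {
    Collection<InformationSet> identify(Query theQuery) {
        // A real plug-in would, e.g., send a triple-pattern query to Sindice;
        // here we always return a single hypothetical graph reference.
        return List.of(new GraphRef(URI.create("http://example.org/graphs/demo")));
    }
}
```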
    • LarKC Plug-in API: TRANSFORM (1/2)
      • Query TRANSFORM: transforms a query from one representation to another
        • SPARQL Query → Triple Pattern Query
        • SPARQL Query → Keyword Query
        • SPARQL Query → SPARQL Query (different abstraction)
        • SPARQL Query → CycL Query
      [QueryTransformer: + Set<Query> transform(Query theQuery, Contract theContract, Context theContext)]
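A minimal sketch of one such transformation, assuming the SPARQL query is handed over as a plain string: harvesting the query's literals as keywords, roughly what a SPARQL-to-keyword transformer feeding a Google-style Identifier might do:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Query TRANSFORM sketch: SPARQL -> keyword query, by extracting literals.
class SparqlToKeywords {
    private static final Pattern LITERAL = Pattern.compile("\"([^\"]+)\"");

    static Set<String> transform(String sparql) {
        Set<String> keywords = new LinkedHashSet<>();
        Matcher m = LITERAL.matcher(sparql);
        while (m.find()) keywords.add(m.group(1));
        return keywords;
    }

    public static void main(String[] args) {
        String q = "SELECT ?s WHERE { ?s rdfs:label \"lark\" ; dc:subject \"reasoning\" . }";
        System.out.println(transform(q));   // [lark, reasoning]
    }
}
```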
    • LarKC Plug-in API: TRANSFORM (2/2)
      • Information Set TRANSFORM: transforms data from one representation to another
        • Natural Language Document → RDF Graph
        • Structured Data Sources → RDF Graph
        • RDF Graph → RDF Graph (e.g. foaf vocabulary to facebook vocabulary)
      [InformationSetTransformer: + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)]
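As a sketch of the vocabulary-rewriting case, the following maps foaf predicates to a hypothetical facebook vocabulary (the target URIs are invented for illustration); the real plug-in operates on InformationSet instances rather than a bare triple list:

```java
import java.util.List;
import java.util.Map;

record Triple(String s, String p, String o) { }  // stand-in for RDF statements

// Information Set TRANSFORM sketch: rewrite predicates between vocabularies.
class VocabularyMapper {
    // Target URIs below are hypothetical, for illustration only.
    private static final Map<String, String> MAPPING = Map.of(
        "http://xmlns.com/foaf/0.1/name",  "http://example.facebook.com/vocab/name",
        "http://xmlns.com/foaf/0.1/knows", "http://example.facebook.com/vocab/friend");

    static List<Triple> transform(List<Triple> in) {
        return in.stream()
                 .map(t -> new Triple(t.s(), MAPPING.getOrDefault(t.p(), t.p()), t.o()))
                 .toList();
    }
}
```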
    • LarKC Plug-in API: SELECT
      • SELECT: given a set of statements (e.g. a number of RDF graphs), chooses a selection/sample from that set
        • Collection of RDF Graphs → Triple Set (merged)
        • Collection of RDF Graphs → Triple Set (10% of each)
        • Collection of RDF Graphs → Triple Set (N triples)
      [Selecter: + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)]
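A hedged sketch of the "10% of each" selection strategy, with plain strings standing in for the platform's SetOfStatements:

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// SELECT sketch: sample a fixed fraction of the supplied statements.
class SampleSelecter {
    private final Random rng = new Random(42);  // fixed seed for reproducibility

    List<String> select(List<String> statements, double fraction) {
        return statements.stream()
                         .filter(s -> rng.nextDouble() < fraction)
                         .collect(Collectors.toList());
    }
}
```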
    • LarKC Plug-in API: REASON
      • REASON: executes a query against the supplied set of statements
        • SPARQL Query → Variable Binding (Select)
        • SPARQL Query → Set of Statements (Construct)
        • SPARQL Query → Set of Statements (Describe)
        • SPARQL Query → Boolean (Ask)
      [Reasoner:
       + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
       + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
       + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
       + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)]
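To make the REASON step concrete, here is a small SPARQL SELECT evaluated against an in-memory set of statements, shown with Apache Jena purely for illustration; Jena is not necessarily what a LarKC Reasoner uses internally:

```java
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// REASON sketch: run a SPARQL SELECT over a supplied set of statements and
// obtain variable bindings. Requires Apache Jena on the classpath.
public class SparqlSelectDemo {
    public static void main(String[] args) {
        Model statements = ModelFactory.createDefaultModel();
        statements.createResource("http://example.org/lark")
                  .addProperty(statements.createProperty("http://example.org/type"), "platform");

        String q = "SELECT ?s ?o WHERE { ?s <http://example.org/type> ?o }";
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), statements)) {
            ResultSet results = qe.execSelect();   // the "variable binding" output
            results.forEachRemaining(System.out::println);
        }
    }
}
```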
    • LarKC Plug-in API: DECIDE
      • DECIDE : Builds the workflow and manages the control flow
        • Scripted Decider: Predefined workflow is built and executed
        • Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow
      [Decider:
       + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
       + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
       + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
       + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)]
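A minimal sketch of a scripted Decider's control flow under heavily simplified assumptions (plain strings instead of the platform's Query, SetOfStatements, and QoSParameters types):

```java
import java.util.List;

// Scripted Decider sketch: a predefined workflow wiring the plug-in steps
// together. All methods here are toy stand-ins, not the LarKC plug-ins.
class ScriptedDecider {

    String sparqlSelect(String query) {
        List<String> sources    = identify(query);               // IDENTIFY
        List<String> statements = select(fetch(sources), 1000);  // SELECT (first N)
        return reason(query, statements);                        // REASON
    }

    private List<String> identify(String query) {
        return List.of("http://example.org/graphs/demo");        // one fixed source
    }

    private List<String> fetch(List<String> sources) {           // resolve references
        return List.of("<http://example.org/lark> a <http://example.org/Platform> .");
    }

    private List<String> select(List<String> statements, int n) {
        return statements.subList(0, Math.min(n, statements.size()));
    }

    private String reason(String query, List<String> statements) {
        return "variable bindings over " + statements.size() + " statements";
    }
}
```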
    • Released System: larkc.sourceforge.net
      • Open Apache 2.0 license
      • Previous early adopters workshop @ ESWC
        • 20 people attended
        • participants modified plug-ins, modified workflows
      • Standard open environment: Subversion checkout, command-line build, or Eclipse (NetBeans soon?)
      [architecture diagram: a Plug-in Manager for each plug-in type (Decider, Query Transformer, Identifier, Info. Set Transformer, Selecter, Reasoner) behind the Plug-in API, together with the Plug-in Registry and Pipeline Support System]
    • LarKC Plug-in API [summary diagram repeating the interface signatures shown above: Identifier, QueryTransformer, InformationSetTransformer, Selecter, Reasoner, Decider]
      • 5 types of plug-ins
      • Plug-in API enables interoperability (between plug-in and platform and between plug-ins)
      • Plug-in inputs and outputs are abstract data structures of RDF triples => flexibility both for assembling plug-ins and for plug-in writers
      • Compatibility ensured by DECIDER and workflow configurators, based on plug-in description
    • LarKC Architecture [full architecture diagram: Application on top; Platform with Decider, Pipeline Support System, Plug-in Registry, and Utility Functionality APIs; the plug-in types behind the Plug-in API with their Plug-in Managers; the Data Layer (RDF stores and RDF documents) behind the Data Layer API; external systems and external data sources]
    • What does a workflow look like? [workflow diagram: the Decider assembles Query Transformer, Identifier, Info Set Transformer, Selecter, and Reasoner plug-ins into a workflow, supported by the Plug-in Registry, Workflow Support System, and RDF Store]
    • What Does a Workflow Look Like? [the same workflow diagram, extended with the Data Layer: plug-ins exchange named RDF graphs, plus a default graph, through the Data Layer]
    • LarKC Data Model: Transport by Reference
      • Dataset: collection of named graphs
      • Labeled Set: pointers to data
      • Current scale: O(10^10) triples
      [diagram: a default graph and many named RDF graphs grouped into a dataset, referenced by a labeled set]
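A sketch of the transport-by-reference idea: a labeled set carries a name plus graph pointers rather than the triples themselves, so at O(10^10) triples only references travel between plug-ins. Types and URIs below are illustrative stand-ins:

```java
import java.net.URI;
import java.util.List;

// Labeled set stand-in: a name and pointers to named graphs in the Data Layer.
record LabeledSet(URI label, List<URI> graphs) { }

class TransportByReferenceDemo {
    public static void main(String[] args) {
        LabeledSet selection = new LabeledSet(
            URI.create("http://example.org/sets/selection-42"),
            List.of(URI.create("http://example.org/graphs/g1"),
                    URI.create("http://example.org/graphs/g2")));
        // A downstream plug-in resolves these references against the Data Layer
        // only when (and if) it actually needs the triples.
        System.out.println("Passing " + selection.graphs().size()
            + " graph references, not their contents");
    }
}
```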
    • What Does a Workflow Look Like? [the same workflow-plus-Data-Layer diagram as above, revisited]
    • What Does a Pipeline Look Like? [pipeline diagram: two Identifiers feeding an Info Set Transformer, over the Plug-in Registry, Workflow Support System, RDF Store, and Data Layer]
    • Remote and Heterogeneous Plug-ins [diagram: a Remote Plug-in Manager with adaptors to external or non-Java code — a SPARQL-CycL TRANSFORM to Research Cyc, a SPARQL-GATE TRANSFORM to the GATE API, and SPARQL IDENTIFY plug-ins for Sindice and medical data, all over the Data Layer]
    • What Does a Workflow Look Like? [worked example diagram: Identifier and Info Set Transformer stages composed with a Reasoner into a workflow over the platform components and Data Layer]
    • Decider Using Plug-in Registry to Create Pipeline (D1.3.1)
      • Represent Properties
      • Functional
      • Non-functional (e.g. QoS)
      • WSMO-Lite Syntax
      • Logical Representation
      • Describes role
      • Describes Inputs/Outputs
      • Automatically extracted using API
      • Decider can use for dynamic configuration
        • Rule-based
        • Fast
      [diagram: two candidate pipelines, A and B, each running Q(uery) → T(ransform) → I(dentify) → S(elect) → R(eason) → variable bindings]
    • LarKC Plug-ins
      • Provide SPARQL end-points
      • Run in separate threads
      • Automatically add meta-data to registry when loaded
      • Communicate RDF data by passing labelled sets or references to labelled sets
      [diagram: Plug-in Managers hosting a Query Transformer and an Identifier behind the Plug-in API]
      • Parallelisation in progress
      [diagram: multiple Transformer, Identifier, and Selector instances running in parallel, each behind its own Plug-in Manager]
      • Split/Join connectors in progress
    • LarKC Data Layer [architecture diagram: the Data Layer (RDF stores and RDF documents) behind the Data Layer API, beneath the platform components, plug-ins, external systems, and external data sources]
    • LarKC Data Layer
      • Main goal: the LarKC Data Layer supports all LarKC plug-ins with respect to:
        • storage, retrieval, and light-weight inference on top of large volumes of data
        • automated exchange of RDF data by reference and by value
        • other utility tools to manage data (e.g. a merger)
      [diagram: a dataset of named RDF graphs plus a default graph, referenced by a labeled set]
      • The implementation of the data layer was evaluated against
        • Well-known benchmarks: LUBM (Lehigh Univ. Benchmark) and BSBM (Berlin SPARQL Benchmark), and
        • Two views of the web of linked data used in LarKC: PIKB (Pathway and Interaction Knowledge Base) and LDSR (Linked Data Semantic Repository)
      • Loading:
        • 15B statements at 18 KSt/sec on a $10,000 server
        • 1B statements at 66 KSt/sec on a $2,000 desktop
      • Reasoning & Materialization:
        • LUBM: 21 KSt/sec for 1B and 10 KSt/sec for 7B explicit statements
        • LDSR: 14 KSt/sec for 357M explicit statements
        • PIKB: 10 KSt/sec for 1.5B explicit statements
      • Competitive with State of the Art
      LarKC Data Layer Performance
      • Inference over both LDSR and PIKB proves to be much more complex than over LUBM, because
        • the datasets are much better interconnected
        • there are plenty of owl:sameAs links
        • the OWL vocabulary is used without regard to its formal semantics
          • e.g. in DBpedia there are skos:broader category cycles of length 180
      • Optimisations of the handling of owl:sameAs are crucial (see the sketch after this list)
      • PIKB: 1.47B explicit statements + 842M inferred
      • LDSR loaded in 7 hours on desktop:
        • Number of imported statements (NIS): 357M
        • Number of new inferred statements: 512M
        • Number of stored statements (NSS): 869M
        • Number of retrievable statements (NRS): 1.14B
          • owl:sameAs optimisation reduced the indices by 280M statements
      LarKC Data Layer Evaluation: Linked Data
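A hedged sketch of the idea behind the owl:sameAs optimisation: collapse each sameAs equivalence class onto one canonical node (here via a simple union-find) so that each statement is indexed once per class rather than once per alias. The data layer's actual mechanism may differ in detail:

```java
import java.util.HashMap;
import java.util.Map;

// owl:sameAs canonicalisation sketch: union-find over node identifiers.
// Indexing canonical(node) instead of node stores one copy per equivalence
// class, which is the principle behind the index-size reduction above.
class SameAsCanonicaliser {
    private final Map<String, String> parent = new HashMap<>();

    private String find(String node) {
        String p = parent.getOrDefault(node, node);
        if (p.equals(node)) return node;
        String root = find(p);
        parent.put(node, root);   // path compression
        return root;
    }

    void sameAs(String a, String b) {   // register an owl:sameAs assertion
        parent.put(find(a), find(b));
    }

    String canonical(String node) {     // apply before indexing a statement
        return find(node);
    }
}
```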
    • Plug-in Architecture Signs of Success
      • Platform and Plug-in APIs are usable
      • Already more than twenty plug-ins
      • Plug-ins written with little help from the architects
      • Plug-ins run successfully, and perform together
      • Outside plug-in writers:
        • OKKAM, NeOn, Aberdeen
    • Active and Ready for the Public
      • 2170 check-outs
      • 1380 commits
      • 23 users of code repository
        • LarKC + Alpha
        • Plus Early Adopters Workshop branch
      • 20 downloads of the alpha 1 public release since 30 May 2009
    • Project Timeline [timeline diagram over project months 0, 6, 10, 14, 18, 33, 42: surveys (plug-ins, platform) and requirements (use cases); prototype, internal release, public release, final release; milestones for plug-ins, use cases V1–V3, data caching, offered computing resources, anytime behaviour, monitoring & instrumentation]
    • Rapid Progress, but We’re Not Finished…
      • Optimisation of complex workflows
      • Extend meta-data representation for QoS and parallelism, and use it
      • Concentrate on parallel and distributed execution
      • Concentrate on a parallel and distributed data layer; caching and data migration
      • Support more plug-in needs while maintaining platform integrity (e.g. efficient weight modification for spreading activation)
      • Data write for persistent transformation (e.g. rumination reasoning in Marvin experiments)
      • Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping)
      • Support anytime/streaming
      • Experimental instrumentation and monitoring
      [architecture diagram, as above: Application over the platform, plug-ins, and Data Layer]
        • Data Caching
        • Anytime Behaviour
        • Plug-in Registration and Discovery
        • Plug-in Monitoring and Measurement
        • Support for Developers
        • Plug-ins
      • Classified according to:
        • Resources
        • Heterogeneity
        • Usage
        • Interoperability
        • Parallelization “within plug-ins”
        • Distributed/remote execution
        • Data Layer
      • Sources
        • Initial Project Objectives (DoW)
        • LarKC Collider Platform (WP5 discussions)
        • LarKC Rapid Prototyping
        • LarKC Use Cases (WP6, WP7a, WP7b)
        • LarKC Plug-ins (WP2, WP3, WP4)
      Detailed information in D5.3.1, Requirements Analysis and report on lessons learned during prototyping. Requirements (WP5)
      • Distributed Data Layer
      • Caching, data warming/cooling
      • Data Streaming between remote components
      • Parallelization and distribution on different types of environments (high-performance grid, desktop grid, etc.)
      • Experimental instrumentation and monitoring
      Open Issues & Next Steps
      • Requirements traceability and update
      • Architecture refinement
      Platform validation with early adopters
    • fin