ISWC 2009 LarKC Tutorial: Architecture

The aim of the EU FP 7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC, for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform, and describes the platform architecture.

Speaker Notes

  • We’ve implemented a platform that realizes the goal of the proposal: supporting the experimentation that allows massive, and necessarily incomplete, reasoning over web-scale data. Most of the work will be in the plug-ins, and we’ve already got interesting ones that we’ll demonstrate. To support them, we’ve added services that support the plug-ins in as lightweight a fashion as possible, but no more: a workflow support system, which allows the plug-ins to execute in the right order; a plug-in management system, which integrates the plug-ins with the platform; a plug-in registry, which supports meta-reasoning and quality of service; and a data layer, designed to make handling massive data practical. We also provide a default RDF store and default meta-reasoning support, and are currently working on the first versions of parallelisation support.
  • Give example of MetaData (include in slides) and QoS info => is this included in WP1 ppt?? Give example of Contract and Context parameters. "Are they web services?" At the moment they are not, and much of the WSDL parts are empty. The reason to use WSDL at all is that the plug-in descriptions use SAWSDL with a WSMO-Lite ontology, and SAWSDL is an extension of WSDL. And anyway, maybe they will be full-fledged WSDL web services one day.
  • "What is a triple pattern?"
  • Better example for the last bullet would be foaf vocabulary to facebook vocabulary.
  • "What is a triple pattern?“During the implementation of the first prototype it was realised that there are essentially two types of transform components in a workflow. The first prototype workflow used the Sindice [10] Web service to ‘identify’ RDF resources on the Web that could be used to answer the input SPARQL query. However, the Sindice service comes in two forms – triple pattern search and keyword search – neither of which can use the input SPARQL query directly. Indeed similar services such as SWOOGLE and Watson also use a variety of input data forms. Hence it became clear that a transformation of the input SPARQL query is required and to facilitate this, a new plug-in interface was created, ‘QueryTransformer’, as a special case of TRANSFORM plug-in. Originally, it was planned for plug-ins to accept and return certain data structures that were identified from the proposed LarKC data model. For example, it made sense that a SELECT plug-in would accept a collection of RDF graphs (data-set) and return a subset of these triples (triple-set). However, this approach meant that it became impossible to wire together two select components in series in a workflow without significant extra programming. So after several revisions of the API, it was realised that from a plug-in’s point of view, the type of the data structures used as input was not relevant. The plug-in just needs to be able to access and process the triples. Therefore, the plug-in interfaces were modified to accept and return only the most abstract data structures containing RDF triples, thus imposing less restrictions on how plug-ins are assembled in a workflow and giving plug-in writers greater freedom to return RDF triples in data-structures appropriate for the algorithm encapsulated within their plug-in. Ensuring compatibility between plug-ins will be done by the DECIDER plug-ins and/or workflow configurators, based on plug-ins metadata (plug-ins description through plug-in annotation language).
  • Recall here the overall architecture picture from Michael’s presentation, in order to introduce the details in the next slides (APIs, …). Mention shortly what we will explain later and where in the platform it is located: APIs, parallelisation/distribution (where they are “hidden” within the architecture picture). Colour legend: purple = platform utility functionality; green = APIs; blue = plug-ins (not sure if the Data Layer should be viewed as a plug-in and thus blue); orange = external systems; red = external data sources.
  • LarKC workflows are more like work flows.
  • The LarKC data model allows triple sets to be physically moved between plug-ins, but this can be expensive, especially during identification and selection, so the data layer also supports the transfer of references to named sets of triples in an RDF store or out on the web.
  • LarKC workflows are more like work flows; data transfer can be virtualised.
  • Heterogeneous: heterogeneous data (TRANSFORM, combining text and triples (WP7: GATE and medical data) → combining different vocabularies) and heterogeneous code (wrappers, combining new & legacy code, Java & non-Java, local code and calls to a web service, etc.).
  • Meta-reasoning can dynamically construct, or reorder, workflows. A decider, reasoning about the contents of the plug-in registry, here constructs two different workflows to answer the same query when provided with two different sets of plug-ins, A and B. Logical representation of a plug-in’s meta-data: plug-in roles; description of inputs and outputs; the logical representation is automatically extracted using only the functions from the API and Java classes. It can automatically assemble API v0.2 plug-ins into a working workflow, using predefined rules for composing plug-ins; this is fast and can be done on the fly. Ongoing: adding QoS parameters to the meta-data, and using QoS parameters when assembling and modifying pipelines.
  • The data layer API gives you powerful (maybe too powerful) tools to manipulate differently structured RDF, for example: merge arbitrary sets of RDF sources (e.g., a dataset with RDF published at a remote URI) and treat them as a single RDF data unit to be consumed by the plug-ins; execute SPARQL queries over any type of RDF structure. Be warned that some of these methods are too powerful, because they try to guarantee complete results (no SPARQL distribution is used; the data is just replicated temporarily and locally to execute the query, which may take a lot of IO and CPU); see the sketch after these notes.
  • Benchmarks and datasets: Berlin SPARQL Benchmark (BSBM); Lehigh University Benchmark (LUBM); Linked Data Semantic Repository (LDSR); Pathway and Interaction Knowledge Base (PIKB): Uniprot (only curated entries; schema), Entrez Gene (complete dataset; custom schema), BioPAX - Cancer Cell Map (BioPAX distribution), BioPAX - NCI Pathway Interaction Database (BioPAX distribution), BioPAX - Reactome (BioPAX distribution), BioPAX - BioGRID (complete dataset; schema aligned to BioPAX), BioPAX - iProClass (complete dataset; custom schema), Gene Ontology (complete dataset; original schema), NCBI Taxonomy (complete dataset; custom schema).
  • D5.5.2 presents an update on the state of the art in scalable RDF engines, as a basis for evaluation of the results of OWLIM. The map presents the loading speed of a few of the most scalable repositories in relation to the size of the dataset and the complexity of the loading. The best published evaluation results have been used for each system. For OWLIM, ORACLE and DAML DB, loading includes forward-chaining and materialization. This diagram shows results up to 1.5 billion explicit statements. One can see that the results for loading are comparable, taking into account that the engines differ in features. Taking BigOWLIM’s results, one can observe how the difference in the semantics supported can alter the loading time by almost a factor of three. Overall, the evaluation demonstrated that the LarKC data layer is very well positioned with respect to the other outstanding engines in the highly competitive niche of the so-called semantic repositories.
  • The results of loading LDSR and PIKB are presented on the first “bubble chart”; the bubbles are bigger than those for LUBM, to indicate higher complexity. Generally, the notion of “reason-able views” makes reasoning with linked data feasible. The Linked Data Semantic Repository (LDSR) is discussed in WP2. The Pathway Interaction KB (PIKB) is presented in WP7A. There will be demos based on LDSR and PIKB at the “demo market”.
  • LarKC API definition: V0.1, non-streaming execution only; V0.2, from non-streaming execution to streaming anytime behaviour; V0.3, integration of the Data Layer API (current stable version). Implementation of two test-rigs in order to validate the API and the general LarKC ideas: a Scripted-DECIDE platform and a Self-configuring-DECIDE platform. They only differ in their code for the Decider plug-in, and in some minimal wrapping code for each plug-in to register the required information about itself in the meta-knowledge-base. All the other code is exactly the same between the two test-rigs, giving us confidence that plug-ins will indeed be re-usable under different Decider plug-ins. What we want to achieve in the next year wrt parallelisation and distribution: mention we have achieved coarse-grain distribution and explain how, with concrete technologies (IBIS, …); layered architecture (implementation oriented), according to the updated slide presented at the EAW by Alexey; details of the kinds of parallelisation and concrete parallelisation techniques to speed up performance; concrete details on how to apply concrete technologies. Planned: distributed Data Layer; data streaming between remote components; caching, data warming/cooling; monitoring/instrumentation; further investigation and application of parallelisation and distribution techniques to different types of distributed environments (high-performance grid, desktop grid, etc.); further investigation and application of “within plug-ins” parallelisation techniques; architecture refinement; requirements traceability (and possible update).
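
The data-layer note above can be made concrete. The following is an illustrative sketch written against the open-source RDF4J library, not the actual LarKC data layer (whose method names this deck does not show), with placeholder URLs. It reproduces the described pattern: replicate remote RDF sources into one local store and answer a SPARQL query over the merged unit, which, as the note warns, can cost significant IO and CPU:

```java
// Illustrative sketch (RDF4J, not the LarKC data layer): merge two remote
// RDF sources into one local store and query the merged data with SPARQL.
// The example URLs are placeholders.
import java.net.URL;
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.sail.memory.MemoryStore;

public class MergeAndQuery {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.init();
        try (RepositoryConnection conn = repo.getConnection()) {
            // "Merging" here means replicating both sources locally,
            // exactly the IO/CPU cost the note warns about.
            conn.add(new URL("http://example.org/a.rdf"), null, RDFFormat.RDFXML);
            conn.add(new URL("http://example.org/b.ttl"), null, RDFFormat.TURTLE);
            try (var result = conn.prepareTupleQuery(QueryLanguage.SPARQL,
                    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10").evaluate()) {
                result.forEach(System.out::println);
            }
        }
        repo.shutDown();
    }
}
```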

Presentation Transcript

  • LarKC Architecture and Technology
    Michael Witbrock, Cycorp Europe (+UIBK)
    with contributions from all LarKC developers
  • Realising the Architecture
    [Architecture diagram: Workflow Support System; Plug-in Manager; Plug-in Registry; Plug-in API; Data Layer API; RDF Store; Data Layer]
    2
  • LarKC Plug-in API: General Plug-in Model
    [Diagram: a Plug-in with its Plug-in description (functional properties, non-functional properties, WSDL description) and the methods + URI getIdentifier(), + QoSInformation getQoSInformation()]
    Plug-ins are assembled into Workflows, to realise a LarKC Experiment or Application
    Plug-ins are identified by a URI (Uniform Resource Identifier)
    Plug-ins provide MetaData about what they do (Functional properties): e.g. type = Selecter
    Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost, …
    Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract: “give me the next 10 results”) and Context information used to store state between invocations
    3
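
Read as code, the general plug-in model above amounts to a small Java contract. A minimal sketch follows; only getIdentifier() and getQoSInformation() come from the slide, while the QoSInformation accessors are assumptions named after the slide’s example properties (Throughput, MinMemory, Cost):

```java
// Minimal sketch (not the official LarKC source) of the general plug-in
// model: every plug-in has a URI identifier plus QoS information.
import java.net.URI;

interface QoSInformation {
    long getThroughput();   // assumed accessor: e.g. statements per second
    long getMinMemory();    // assumed accessor: e.g. minimum memory in bytes
    double getCost();       // assumed accessor: abstract cost for a Decider
}

interface Plugin {
    URI getIdentifier();                // unique URI naming this plug-in
    QoSInformation getQoSInformation(); // non-functional properties
}
```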
  • LarKC Plug-in API: IDENTIFY
    Identifier
    + Collection&lt;InformationSet&gt; identify(Query theQuery, Contract contract, Context context)
    IDENTIFY: Given a query, identify resources that could be used to answer it
    • Sindice – Triple Pattern Query → RDF Graphs
    • Google – Keyword Query → Natural Language Document
    • Triple Store – SPARQL Query → RDF Graphs
    4
  • LarKC Plug-in API: TRANSFORM (1/2)
    QueryTransformer
    + Set&lt;Query&gt; transform(Query theQuery, Contract theContract, Context theContext)
    Query TRANSFORM: Transforms a query from one representation to another
    • SPARQL Query → Triple Pattern Query
    • SPARQL Query → Keyword Query
    • SPARQL Query → SPARQL Query (different abstraction)
    • SPARQL Query → CycL Query
    5
  • LarKC Plug-in API: TRANSFORM (2/2)
    InformationSetTransformer
    + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)
    Information Set TRANSFORM: Transforms data from one representation to another
    • Natural Language Document → RDF Graph
    • Structured Data Sources → RDF Graph
    • RDF Graph → RDF Graph (e.g. foaf vocabulary to facebook vocabulary)
    6
  • LarKC Plug-in API: SELECT
    Selecter
    + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)
    SELECT: Given a set of statements (e.g. a number of RDF Graphs) will choose a selection/sample from this set
    • Collection of RDF Graphs → Triple Set (Merged)
    • Collection of RDF Graphs → Triple Set (10% of each)
    • Collection of RDF Graphs → Triple Set (N Triples)
    7
  • LarKC Plug-in API: REASON
    Reasoner
    + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    REASON: Executes a query against the supplied set of statements
    • SPARQL Query → Variable Binding (Select)
    • SPARQL Query → Set of statements (Construct)
    • SPARQL Query → Set of statements (Describe)
    • SPARQL Query → Boolean (Ask)
    8
  • LarKC Plug-in API: DECIDE
    Decider
    + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    DECIDE: Builds the workflow and manages the control flow
    Scripted Decider: Predefined workflow is built and executed
    Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow
    9
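
Taken together, the five plug-in types can be transcribed into Java interfaces. The sketch below copies the signatures from the preceding slides; the parameter and return types are declared as empty marker interfaces only so the sketch compiles, since the deck does not define them:

```java
// Sketch of the five LarKC plug-in interfaces, transcribed from the slides.
// The marker types below are placeholders, not the real LarKC classes.
import java.util.Collection;
import java.util.Set;

interface Query {}
interface SPARQLQuery extends Query {}
interface Contract {}
interface Context {}
interface InformationSet {}
interface SetOfStatements extends InformationSet {}
interface VariableBinding {}
interface BooleanInformationSet extends InformationSet {}
interface QoSParameters {}

interface Identifier {
    Collection<InformationSet> identify(Query theQuery, Contract contract, Context context);
}

interface QueryTransformer {
    Set<Query> transform(Query theQuery, Contract theContract, Context theContext);
}

interface InformationSetTransformer {
    InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext);
}

interface Selecter {
    SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context);
}

interface Reasoner {
    VariableBinding sparqlSelect(SPARQLQuery q, SetOfStatements s, Contract c, Context ctx);
    SetOfStatements sparqlConstruct(SPARQLQuery q, SetOfStatements s, Contract c, Context ctx);
    SetOfStatements sparqlDescribe(SPARQLQuery q, SetOfStatements s, Contract c, Context ctx);
    BooleanInformationSet sparqlAsk(SPARQLQuery q, SetOfStatements s, Contract c, Context ctx);
}

interface Decider {
    VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters);
}
```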
  • Released System: larkc.sourceforge.net
    • Open Apache 2.0 license
    • Previous early adopters workshop @ ESWC
    • 20 people attended
    • participants modified plug-ins, modified workflows
    Standard Open Environment: subversion connection, command line build, or eclipse; netbeans soon?
    [Diagram: Decider, Pipeline Support System and Plug-in Registry; Plug-in Managers expose the Selecter, Query Transformer, Identifier, Reasoner and Info. Set Transformer plug-ins through the Plug-in API]
    10
  • LarKC Plug-in API
    Decider:
    + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)
    Reasoner:
    + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    Identifier:
    + Collection&lt;InformationSet&gt; identify(Query theQuery, Contract contract, Context context)
    QueryTransformer:
    + Set&lt;Query&gt; transform(Query theQuery, Contract theContract, Context theContext)
    InformationSetTransformer:
    + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)
    Selecter:
    + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)
    • 5 types of plug-ins
    • Plug-in API enables interoperability (between plug-in and platform and between plug-ins)
    • Plug-ins’ I/O uses abstract data structures of RDF triples => flexibility for assembling plug-ins and for plug-in writers
    • Compatibility ensured by DECIDER and workflow configurators, based on plug-in description
    11
  • LarKC Architecture
    [Full architecture diagram, with the Plug-in API labelled: the Application sits on the Platform Utility Functionality (Decider, Pipeline Support System, Plug-in Registry, Plug-in Managers); the APIs (Plug-in API, Data Layer API) connect the platform to the Plug-ins (Query Transformer, Identifier, Selecter, Reasoner, Info. Set Transformer); the Data Layer holds RDF Stores and RDF Docs and connects to external systems and external data sources]
    12
  • [Architecture diagram: Decider, Plug-in Registry, Workflow Support System and RDF Store, with Plug-in Managers wrapping the Identifier, Query Transformer, Info. Set Transformer, Selecter and Reasoner plug-ins behind the Plug-in API]
    What does a workflow look like?
    13
  • What Does a Workflow Look Like?
    [Diagram: the same platform picture with RDF Graphs flowing between the plug-ins (Identifier, Query Transformer, Info. Set Transformer, Selecter, Reasoner) and the Data Layer / RDF Store; the Default Graph and intermediate RDF Graphs are passed along the workflow]
    14
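
The workflow picture is, in effect, function composition: each plug-in consumes what the previous one produced. A toy sketch of that flow, with stand-in string transformations instead of real plug-ins (all names here are illustrative):

```java
// Toy sketch of a linear workflow as function composition.
// Real plug-ins implement the Plug-in API; these are stand-ins.
import java.util.function.Function;

public class WorkflowSketch {
    public static void main(String[] args) {
        Function<String, String> queryTransformer = q -> "pattern(" + q + ")";
        Function<String, String> identifier      = p -> "graphs-for-" + p;
        Function<String, String> selecter        = g -> "sample-of-" + g;
        Function<String, String> reasoner        = t -> "answer-over-" + t;

        String answer = queryTransformer
                .andThen(identifier)   // IDENTIFY resources for the pattern
                .andThen(selecter)     // SELECT a sample of the graphs
                .andThen(reasoner)     // REASON over the selection
                .apply("SELECT ?s WHERE { ?s ?p ?o }");
        System.out.println(answer);
    }
}
```

A real workflow support system adds branching, streaming and anytime behaviour on top of this linear composition.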
  • LarKC Data Model: Transport By Reference
    Labeled Set: pointers to data
    Dataset: collection of named graphs
    [Diagram: a Default Graph plus many RDF Graphs, referenced by a Labeled Set rather than copied]
    Current Scale: O(10^10) triples
    15
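
A sketch of transport by reference with assumed names (the deck does not show the data-layer classes): the workflow passes a labelled set of graph names, and a plug-in asks the data layer to materialise a graph only when it actually needs the triples:

```java
// Illustrative sketch of "transport by reference": pass graph names, not
// the O(10^10) triples themselves. All names here are assumptions.
import java.net.URI;
import java.util.List;

interface SetOfStatements {}                           // stand-in for triples

record LabeledSet(URI label, List<URI> graphNames) {}  // pointers to data

interface DataLayer {
    SetOfStatements resolve(URI graphName);            // fetch by value on demand
}

class ReferencePassingPlugin {
    // The plug-in receives references; it only materialises the graphs it
    // decides to process, avoiding a bulk copy between plug-ins.
    void process(LabeledSet input, DataLayer dataLayer) {
        for (URI name : input.graphNames()) {
            SetOfStatements triples = dataLayer.resolve(name); // late, by value
            // ... inspect/select/reason over 'triples' ...
        }
    }
}
```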
  • What Does a Workflow Look Like?
    [Diagram: as on slide 14, RDF Graphs flowing through the plug-ins and the Data Layer / RDF Store]
    16
  • What Does a Pipeline Look Like?
    [Diagram: the Decider, Plug-in Registry and Workflow Support System arrange Plug-in Managers so that the Identifier, Info Set Transformer, Query Transformer, Selecter and Reasoner plug-ins form a pipeline over the Data Layer / RDF Store]
    17
  • Remote and Heterogeneous Plug-ins
    [Diagram: a remote Plug-in Manager hosts TRANSFORM and IDENTIFY plug-ins; adaptors bridge SPARQL to external or non-Java code (a SPARQL-GATE API in front of GATE, SPARQL-CycL in front of ResearchCyc); the Data Layer connects to SINDICE and to medical data]
    18
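
One way to read the adaptor boxes: a wrapper implements the platform-facing call and translates it into requests to the external system. The sketch below is hypothetical (endpoint, types and names invented) and shows the shape of such a wrapper around a remote HTTP search service:

```java
// Hypothetical adaptor: wraps a remote HTTP service behind a local
// identify() call, so the platform sees ordinary Java. The endpoint URL
// and all type names are invented for illustration.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.List;

interface InformationSet {}
record RemoteResult(String payload) implements InformationSet {}

class RemoteSearchAdaptor {
    private final String endpoint;   // e.g. a Sindice-style search service

    RemoteSearchAdaptor(String endpoint) { this.endpoint = endpoint; }

    Collection<InformationSet> identify(String triplePattern) throws IOException {
        String q = URLEncoder.encode(triplePattern, StandardCharsets.UTF_8);
        URL url = new URL(endpoint + "?q=" + q);
        try (InputStream in = url.openStream()) {
            String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return List.of(new RemoteResult(body)); // real code would parse this
        }
    }
}
```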
  • What Does a Workflow Look Like?
    [Diagram: a branching workflow; Identifier/Info Set Transformer pairs feed the Selecter, Query Transformer and Reasoner, coordinated by the Decider, Plug-in Registry and Workflow Support System over the Data Layer / RDF Store]
    19
  • Decider Using Plug-in Registry to Create Pipeline
    D 1.3.1
    Represent Properties
    • Functional
    • Non-functional (e.g. QoS)
    • WSMO-Lite syntax
    Logical Representation
    • Describes role
    • Describes Inputs/Outputs
    • Automatically extracted using API
    • Decider can use for dynamic configuration
    • Rule-based
    • Fast
    [Diagram: from the same query, two different plug-in sets A and B yield two different pipelines of plug-ins (Q = query transformer, T = transformer, I = identifier, S = selecter, R = reasoner), each ending in a variable binding (VB)]
    20
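
A toy illustration of the rule-based assembly this slide describes (not the actual LarKC decider): plug-ins advertise input and output types in a registry, and the decider chains them until the query’s type reaches the requested answer type. The names, the metadata shape and the greedy matching rule are all assumptions:

```java
// Toy rule-based workflow assembly from plug-in metadata (illustrative only):
// chain plug-ins whose declared output type matches the next input type.
import java.util.ArrayList;
import java.util.List;

record PluginMeta(String name, String inputType, String outputType) {}

final class TinyDecider {
    // Greedily build a chain transforming 'from' into 'to'.
    static List<PluginMeta> assemble(List<PluginMeta> registry, String from, String to) {
        List<PluginMeta> chain = new ArrayList<>();
        String current = from;
        while (!current.equals(to)) {
            String cur = current;
            PluginMeta next = registry.stream()
                .filter(p -> p.inputType().equals(cur))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no plug-in accepts " + cur));
            chain.add(next);
            current = next.outputType();
        }
        return chain;
    }

    public static void main(String[] args) {
        var registry = List.of(
            new PluginMeta("QueryTransformer", "SPARQLQuery", "TriplePatternQuery"),
            new PluginMeta("Identifier",       "TriplePatternQuery", "RDFGraphs"),
            new PluginMeta("Selecter",         "RDFGraphs", "TripleSet"));
        // Prints the chain QueryTransformer -> Identifier -> Selecter:
        System.out.println(assemble(registry, "SPARQLQuery", "TripleSet"));
    }
}
```

Registering a different plug-in set would yield a different chain for the same query, which is the point the two pipelines A and B on the slide make.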
  • LarKC Plug-ins
    • Provide SPARQL end-points
    • Run in separate threads
    • Automatically add meta-data to registry when loaded
    • Communicate RDF data by passing labelled sets or references to labelled sets
    • Parallelisation in progress
    • Split/Join connectors in progress
    [Diagram: multiple Plug-in Managers and Plug-in APIs hosting parallel instances of Identifier, Transformer and Selector plug-ins]
    21
  • LarKC Data Layer
    [Architecture diagram with the Data Layer highlighted: the Application, Platform Utility Functionality (Decider, Pipeline Support System, Plug-in Registry, Plug-in Managers), the APIs (Plug-in API, Data Layer API) and the Plug-ins sit above the Data Layer, which holds RDF Stores and RDF Docs and connects to external systems and external data sources]
    22
  • LarKC Data Layer
    Main goal: the LarKC Data Layer supports all LarKC plug-ins with respect to:
    • storage, retrieval and light-weight inference on top of large volumes of data
    • automating the exchange of RDF data by reference and by value
    • offering other utility tools to manage data (e.g. a merger)
    [Diagram: a Labeled Set pointing into a Dataset of a Default Graph and many RDF Graphs]
    23
  • LarKC Data Layer Performance
    The implementation of the data layer was evaluated against:
    • well-known benchmarks: LUBM (Lehigh Univ. Benchmark) and BSBM (Berlin SPARQL Benchmark), and
    • two views of the web of linked data used in LarKC: PIKB (Pathway and Interaction Knowledge Base) and LDSR (Linked Data Semantic Repository)
    Loading:
    • 15B statements at 18 KSt/sec on a $10,000 server
    • 1B statements at 66 KSt/sec on a $2,000 desktop
    Reasoning & Materialization:
    • LUBM: 21 KSt/sec for 1B and 10 KSt/sec for 7B explicit statements
    • LDSR: 14 KSt/sec for 357M explicit statements
    • PIKB: 10 KSt/sec for 1.5B explicit statements
    Competitive with State of the Art
    24
  • LarKC Data Layer Evaluation: Loading
    [Bubble chart (see speaker notes and D5.5.2): loading speed of the most scalable repositories versus dataset size and loading complexity, up to 1.5 billion explicit statements]
    25
  • LarKC Data Layer Evaluation: Linked Data
    Inference with both LDSR and PIKB proves to be much more complex than LUBM, because:
    • the datasets are much better interconnected
    • there are plenty of owl:sameAs links
    • OWL vocabulary is used disregarding its formal semantics (e.g. in DBpedia there are skos:broader cycles of categories with length 180)
    Optimizations of the handling of owl:sameAs are crucial
    PIKB: 1.47B explicit statements + 842M inferred
    LDSR loaded in 7 hours on a desktop:
    • Number of imported statements (NIS): 357M
    • Number of new inferred statements: 512M
    • Number of stored statements (NSS): 869M
    • Number of retrievable statements (NRS): 1.14B
    owl:sameAs optimisation allowed reducing the indices by 280M statements
    26
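
The slide singles out owl:sameAs handling as the crucial optimisation. One standard implementation (an assumption here; the deck does not say how OWLIM does it) collapses each owl:sameAs equivalence class to a canonical node with union-find, so indices store statements once per class rather than once per member, which is consistent with the 280M-statement index reduction reported above:

```java
// Union-find over owl:sameAs links (illustrative, not OWLIM's actual code):
// every equivalence class gets one canonical representative.
import java.util.HashMap;
import java.util.Map;

final class SameAsIndex {
    private final Map<String, String> parent = new HashMap<>();

    String find(String node) {
        String p = parent.getOrDefault(node, node);
        if (p.equals(node)) return node;
        String root = find(p);
        parent.put(node, root);          // path compression
        return root;
    }

    void sameAs(String a, String b) { parent.put(find(a), find(b)); }

    public static void main(String[] args) {
        SameAsIndex idx = new SameAsIndex();
        idx.sameAs("dbpedia:Rome", "geonames:3169070");
        idx.sameAs("geonames:3169070", "freebase:Rome");
        // All three now share one canonical representative:
        System.out.println(idx.find("dbpedia:Rome").equals(idx.find("freebase:Rome")));
    }
}
```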
  • Plug-in Architecture: Signs of Success
    • Platform and Plug-in APIs are usable
    • More than twenty plug-ins already
    • Plug-ins written with little help from the architects
    • Plug-ins run successfully, and perform together
    • Outside plug-in writers: OKKAM, NeOn, Aberdeen
    27
  • Active and Ready for the Public
    • 2170 check-outs
    • 1380 commits
    • 23 users of the code repository
    • LarKC + Alpha, plus an Early Adopters Workshop branch
    • 20 downloads of the alpha 1 public release since 30th May 2009
    28
  • Lessons Learned (1/2)
    API Design
    • Types of plug-ins: 5 (+1 => 2 types of TRANSFORM)
    • More abstract I/O data structures => more flexibility for assembling plug-ins and for plug-in writers
    Test API Implementation
    • Validation and refinement of the API (introduction of ‘Contract’ and ‘Context’ parameters)
    Transforming Cyc into the LarKC Platform
    • Minimization and reorganization of Cyc code as a basis for the LarKC Platform
    Plug-ins and Use-case Implementation
    • Feedback collected, as our first early adopters, on different topics (how-to guidelines, context parameter, plug-in types, data caching, …)
    29
  • Lessons Learned (2/2)
    Licensing:
    • Licensing policies aligned with partners’ and the project’s interests => maximize openness and external contributions without preventing exploitation
    • Monitoring of components’ licenses to avoid conflicts
    MaRVIN and IBIS:
    • strategy applicable to large-scale deployment; autonomous and symmetric nodes; asynchronous communication between nodes; well-balanced load needed
    • abstraction layer hiding resource heterogeneity (IBIS)
    30
  • Project Timeline
    [Timeline diagram with markers at months 0, 6, 10, 14, 18, 33 and 42: Use Cases V1, V2 and V3; Plug-ins; Surveys (plug-ins, platform) & Requirements (use cases); Offer computing resources; Monitoring & instrumentation; Anytime behaviour; Data caching; Prototype, Internal Release, Public Release and Final Release milestones]
    31
  • Rapid Progress, but We’re Not Finished…
    Detailed information in D5.3.1, Requirements Analysis and report on lessons learned during prototyping
    Sources:
    • Initial Project Objectives (DoW)
    • LarKC Collider Platform (WP5 discussions)
    • LarKC Rapid Prototyping
    • LarKC Use Cases (WP6, WP7a, WP7b)
    • LarKC Plug-ins (WP2, WP3, WP4)
    Requirements (WP 5), classified according to: resources; heterogeneity; usage; interoperability; parallelization “within plug-ins”; distributed/remote execution; Data Layer; data caching; anytime behaviour; plug-in registration and discovery; plug-in monitoring and measurement; support for developers; plug-ins
    • Optimisation of complex workflows
    • Extend meta-data representation for QoS and parallelism, and use it
    • Concentrate on parallel and distributed execution
    • Concentrate on a parallel and distributed data layer; caching and data migration
    • Support more plug-in needs while maintaining platform integrity (e.g. efficient weight modification for spreading activation)
    • Data write for persistent transformation (e.g. rumination reasoning in MaRVIN experiments)
    • Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping)
    • Support anytime/streaming
    • Experimental instrumentation and monitoring
    [Architecture diagram as before: Application, Decider, Pipeline Support System, Plug-in Registry, Plug-in Managers, Plug-in API, the five plug-ins, Data Layer API, Data Layer with RDF Stores and RDF Docs]
    32
  • Open Issues & Next Steps
    • Distributed Data Layer
    • Caching, data warming/cooling
    • Data streaming between remote components
    • Parallelization and distribution on different types of environments (high-performance grid, desktop grid, etc.)
    • Experimental instrumentation and monitoring
    Platform validation:
    • Requirements traceability and update
    • Architecture refinement
    Early Adopters
    33
  • fin