Web Scale Reasoning and the LarKC Project

Presentation Transcript

  • Web Scale Reasoning and the LarKC Project (Introduction)
    Luka Bradeško, Cycorp Europe
  • Goals of LarKC
    LarKC = Large Knowledge Collider
      • Build an integrated, pluggable platform for large-scale reasoning
      • Support parallelization, distribution, remote execution, and data storage
      • Use existing plug-ins and develop new ones
      • Easy integration of components
      • Enable low-cost experimentation
    “Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost”
  • Overall approach of LarKC
      • Very lightweight platform
          – communication, synchronisation, registration
          – LarKC = “SPARQL endpoint on steroids”
      • The real work happens in the plug-ins
      • LarKC gives you:
          – a very scalable data layer
          – standardised interfaces for combining components
          – utilities and infrastructure to abstract from remote deployment
      • Three types of LarKC users:
          – people building plug-ins
          – people configuring workflows
          – people using workflows
  • LarKC Architecture
    [Architecture diagram. Labels: SPARQL Endpoint, Application; Workflow Support System (Decider, Plug-in Registry); Platform Utility Functionality, APIs; five Plug-in Managers, each exposing the Plug-in API to one plug-in (Query Transformer, Identifier, Info. Set Transformer, Selecter, Reasoner); External systems; External data sources; Data Layer API; Data Layer with RDF Stores and RDF Docs.]
  • LarKC Plug-in API
      • 5 types of plug-ins
      • Plug-in API enables interoperability (between plug-in and platform and between plug-ins)
      • Plug-in I/O uses abstract data structures of RDF triples => flexibility for assembling plug-ins and for plug-in writers
      • Compatibility ensured by DECIDER and workflow configurators, based on plug-in description
    Identifier
      + Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)
    QueryTransformer
      + Set<Query> transform(Query theQuery, Contract theContract, Context theContext)
    InformationSetTransformer
      + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)
    Selecter
      + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)
    Reasoner
      + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
    Decider
      + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)
  • What does a workflow look like?
    [Diagram. Labels: a Decider above a workflow of Query Transformer, Identifier, Info Set Transformer, Selecter, Reasoner; below it the platform with Plug-in API, Workflow Support System, Decider, Plug-in Registry, five Plug-in Managers with their plug-ins, and an RDF Store.]
  • What does a workflow look like?
    [Diagram. Labels: a Decider above a workflow with additional Info Set Transformer and Identifier instances alongside Query Transformer, Identifier, Info Set Transformer, Selecter, Reasoner; four Data Layer boxes; below them the platform with Plug-in API, Workflow Support System, Decider, Plug-in Registry, five Plug-in Managers with their plug-ins, and an RDF Store.]
  • Decider Using Plug-in Registry to Create Workflow
    D 1.3.1
    Represent Properties
      • Functional
      • Non-functional (e.g. QoS)
      • WSMO-Lite Syntax
    Logical Representation
      • Describes role
      • Describes Inputs/Outputs
      • Automatically extracted using API
      • Decider can use for dynamic configuration
      • Rule-based
      • Fast
    [Diagram: workflow graphs with nodes labelled Q, B, I, T, S, R, VB, A.]
  • LarKC Plug-in Managers
      • Run in separate threads
      • Automatically add meta-data to registry when loaded
      • Communicate RDF data by value or by reference
      • Parallelisation
          – Split/Join connectors in progress
    [Diagram: Plug-in Managers wrapping Query Transformer, Identifier, Transformer and Selector plug-ins through the Plug-in API, with several instances side by side.]
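The "run in separate threads" and "split/join" points can be pictured with the standard Java executor API. This is only a sketch under assumptions: `PluginManagerSketch` and `runInParallel` are hypothetical names, each plug-in is reduced to a `Callable<String>`, and nothing here is taken from the LarKC code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PluginManagerSketch {

    /** Split: run every plug-in call on its own thread. Join: collect results in submission order. */
    static List<String> runInParallel(List<Callable<String>> plugins) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, plugins.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps the order of the input list.
            for (Future<String> future : pool.invokeAll(plugins)) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("plug-in execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> plugins = new ArrayList<>();
        plugins.add(() -> "identifier-1 done"); // placeholder plug-in work
        plugins.add(() -> "identifier-2 done");
        System.out.println(runInParallel(plugins));
    }
}
```

Results are joined in submission order, so a downstream plug-in sees a deterministic input regardless of which thread finished first.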
  • Example workflow
    Query:
      PREFIX cyc: <http://www.cycfoundation.org/concepts/>
      SELECT ?company WHERE {
        ?company cyc:mentionedInArticle "http://shodan.ijs.si/article.txt" .
        ?company cyc:isa cyc:PubliclyHeldCorporation
      }
    [Diagram. Labels: Query, Identify, Internet, Transform, GATE, Select, Reason, Research Cyc, Result.]
  • LarKC Data Layer
    [Architecture diagram with the Data Layer highlighted. Labels: Application; Pipeline Support System (Decider, Plug-in Registry); Platform Utility Functionality, APIs; five Plug-in Managers, each exposing the Plug-in API to one plug-in (Query Transformer, Identifier, Info. Set Transformer, Selecter, Reasoner); External systems; External data sources; Data Layer API; Data Layer with RDF Stores and RDF Docs.]
  • LarKC Data Layer
    Main goal:
      • The Data Layer supports LarKC plug-ins:
          – storage, retrieval and light-weight inference on top of large volumes of data
          – automates the exchange of RDF data by reference and by value
          – offers other utility tools to manage data (e.g. merger)
    [Diagram labels: Labeled Set, Dataset, Default Graph, RDF Graph.]
  • Used Concepts in the Data Model
    [Diagram: named graphs NG1, NG2, NG3, NG4, NG5 and two labelled groups of statements.]
  • Supported Sets of Statements
    RDF data type                 | Description                                       | Example
    Set of statements             | RDF statements                                    | s1, p1, o1, ng1
                                  |                                                   | s2, p2, o2
                                  |                                                   | s3, p3, o3, ng3, {group1}
    RDF graph                     | Named graph                                       | s1, p1, o1, ng1, {group1}
                                  |                                                   | s2, p2, o2, ng1, {group2}
                                  |                                                   | s3, p3, o3, ng1
    Dataset                       | SPARQL dataset; represents a collection of graphs | s1, p1, o1, ng1
                                  |                                                   | s2, p2, o2, ng2
                                  |                                                   | s3, p3, o3, ng3
    Labelled group of statements  | RDF group of statements                           | s1, p1, o1, ng1, {group1}
                                  |                                                   | s2, p2, o2, ng2, {group1}
                                  |                                                   | s3, p3, o3, ng3, {group1}
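One way to read the table is that every statement carries a triple, an optional named graph, and zero or more group labels, and that the four set types are different views over such statements. The sketch below encodes that reading. The class and method names (`DataModelSketch`, `Statement`, `inGraph`, `withLabel`) are hypothetical and are not the Data Layer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DataModelSketch {

    /** A triple with an optional named graph (null if none) and zero or more group labels. */
    static final class Statement {
        final String s, p, o, graph;
        final Set<String> labels;

        Statement(String s, String p, String o, String graph, String... labels) {
            this.s = s;
            this.p = p;
            this.o = o;
            this.graph = graph;
            this.labels = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(labels)));
        }
    }

    /** "RDF graph" view: all statements that belong to one named graph. */
    static List<Statement> inGraph(List<Statement> all, String graph) {
        List<Statement> result = new ArrayList<>();
        for (Statement st : all) {
            if (graph.equals(st.graph)) result.add(st);
        }
        return result;
    }

    /** "Labelled group" view: all statements carrying a label, across graphs. */
    static List<Statement> withLabel(List<Statement> all, String label) {
        List<Statement> result = new ArrayList<>();
        for (Statement st : all) {
            if (st.labels.contains(label)) result.add(st);
        }
        return result;
    }

    /** The "Set of statements" example row from the table. */
    static List<Statement> example() {
        return Arrays.asList(
                new Statement("s1", "p1", "o1", "ng1"),
                new Statement("s2", "p2", "o2", null),
                new Statement("s3", "p3", "o3", "ng3", "group1"));
    }

    public static void main(String[] args) {
        System.out.println(inGraph(example(), "ng1").size());      // statements in ng1
        System.out.println(withLabel(example(), "group1").size()); // statements labelled group1
    }
}
```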
  • Current Status
  • Released System v1.1: larkc.sourceforge.net
      • Open Apache 2.0 license
      • Previous early adopters workshops @ ESWC ’09, ’10 and ISWC ’09
          – participants modified plug-ins, modified workflows
      • Standard Open Environment: Java, subversion, packaged release, command line build, or eclipse
    Available plug-ins by type:
      • Identify: SINDICE, SWOOGLE
      • Select: Spreading Activation, Geolocation
      • Transform: Annotate GATE, Annotate Cyc, SPARQL-CycL
      • Reason: Jena, IRIS, Pellet; Cyc, PION; Siemens
      • Decide: Scripted: Real-Time City; Dynamic Cyc
    [Diagram: platform overview with Plug-in API, Pipeline Support System, Decider, Plug-in Registry, and Plug-in Managers for Query Transformer, Identifier, Info. Set Transformer, Selecter, Reasoner.]
  • Next Steps
      • Distributed Data Layer
      • Caching, data warming/cooling
      • Data Streaming between remote components
      • Experimental instrumentation and monitoring
    Platform validation
      • Requirements traceability and update
      • Architecture refinement
    Early Adopters
  • End
  • Rapid Progress, but We’re Not Finished…
      • Concentrating on parallel and distributed execution.
      • Optimisation of complex workflows.
      • Extend meta-data representation for QoS, parallelism and use it.
      • Concentrating on parallel and distributed data layer; caching and data migration.
      • Support more plug-in needs while maintaining platform integrity
      • Integrate distributed plug-ins
      • Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping)
      • Support anytime/streaming
      • Experimental instrumentation and monitoring
    Requirements (WP 5)
      • Sources
          – Initial Project Objectives (DoW)
          – LarKC Collider Platform (WP5 discussions)
          – LarKC Rapid Prototyping
          – LarKC Use Cases (WP6, WP7a, WP7b)
          – LarKC Plug-ins (WP2, WP3, WP4)
      • Classified according to:
          – Resources
          – Heterogeneity
          – Usage
          – Interoperability
          – Parallelization “within plug-ins”
          – Distributed/remote execution
          – Data Caching
          – Anytime Behaviour
          – Plug-in Registration and Discovery
          – Plug-in Monitoring and Measurement
          – Support for Developers
          – Data Layer
          – Plug-ins
      • Detailed information in D5.3.1, Requirements Analysis and report on lessons learned during prototyping
    [Several slide layers, including the architecture diagram, overlap in the original transcript at this point.]
  • LarKC Plug-in API: General Plug-in Model
    Plug-in
      + URI getIdentifier()
      + QoSInformation getQoSInformation()
    Plug-in description
      – Functional properties
      – Non-functional properties
      – WSDL description
      • Plug-ins are assembled into Workflows, to realise a LarKC Experiment or Application
      • Plug-ins are identified by a URI (Uniform Resource Identifier)
      • Plug-ins provide MetaData about what they do (Functional properties): e.g. type = Selecter
      • Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost, …
      • Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract: “give me the next 10 results”) and Context information used to store state between invocations
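The general plug-in model can be sketched as a small Java interface. Only the two method signatures come from the slide. `QoSInformation` is a stand-in whose fields are assumed, and `ExampleSelecter`, its URI, and its numbers are invented for illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class PluginModelSketch {

    /** Stand-in for the QoSInformation type named on the slide; its structure is an assumption. */
    static final class QoSInformation {
        final Map<String, Double> values = new LinkedHashMap<>();

        QoSInformation with(String name, double value) {
            values.put(name, value);
            return this;
        }
    }

    /** The two methods the slide lists for every plug-in. */
    interface Plugin {
        URI getIdentifier();

        QoSInformation getQoSInformation();
    }

    /** Hypothetical plug-in used only to exercise the interface. */
    static final class ExampleSelecter implements Plugin {
        @Override
        public URI getIdentifier() {
            return URI.create("urn:example:plugin:ExampleSelecter"); // made-up URI
        }

        @Override
        public QoSInformation getQoSInformation() {
            // Property names follow the slide's examples; the values are invented.
            return new QoSInformation()
                    .with("Throughput", 1000.0)
                    .with("MinMemory", 256.0)
                    .with("Cost", 1.0);
        }
    }

    /** What a registry might record for a plug-in: its URI and the QoS properties it declares. */
    static String describe(Plugin plugin) {
        return plugin.getIdentifier() + " " + plugin.getQoSInformation().values.keySet();
    }

    public static void main(String[] args) {
        System.out.println(describe(new ExampleSelecter()));
    }
}
```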
  • LarKC Plug-in API: IDENTIFY
    Identifier
      + Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)
      • IDENTIFY: Given a query, identify resources that could be used to answer it
          – Sindice: Triple Pattern Query → RDF Graphs
          – Google: Keyword Query → Natural Language Document
          – Triple Store: SPARQL Query → RDF Graphs
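A minimal sketch of an IDENTIFY plug-in follows, keeping the method signature from the slide. `Query`, `Contract`, `Context` and `InformationSet` are reduced to assumed stand-ins, and `KeywordIdentifier` is hypothetical: it searches a fixed list of document names instead of calling Sindice, Google, or a triple store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class IdentifySketch {
    // Stand-ins for the types in the slide's signature; their fields are assumptions.
    static final class Query {
        final String text;
        Query(String text) { this.text = text; }
    }

    static final class Contract {
        final int maxResults; // e.g. "give me the next 10 results"
        Contract(int maxResults) { this.maxResults = maxResults; }
    }

    static final class Context { }

    static final class InformationSet {
        final String source;
        InformationSet(String source) { this.source = source; }
    }

    interface Identifier {
        Collection<InformationSet> identify(Query theQuery, Contract contract, Context context);
    }

    /** Hypothetical identifier: keyword match over document names, capped by the contract. */
    static final class KeywordIdentifier implements Identifier {
        private final List<String> documents;

        KeywordIdentifier(List<String> documents) { this.documents = documents; }

        @Override
        public Collection<InformationSet> identify(Query theQuery, Contract contract, Context context) {
            List<InformationSet> hits = new ArrayList<>();
            String keyword = theQuery.text.toLowerCase();
            for (String doc : documents) {
                if (hits.size() >= contract.maxResults) break; // honour the contract
                if (doc.toLowerCase().contains(keyword)) hits.add(new InformationSet(doc));
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        Identifier identifier = new KeywordIdentifier(
                Arrays.asList("cyc-companies.rdf", "weather.rdf", "companies-2009.rdf"));
        for (InformationSet hit : identifier.identify(new Query("companies"), new Contract(10), new Context())) {
            System.out.println(hit.source);
        }
    }
}
```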
  • LarKC Plug-in API: TRANSFORM (1/2)
    QueryTransformer
      + Set<Query> transform(Query theQuery, Contract theContract, Context theContext)
      • Query TRANSFORM: Transforms a query from one representation to another
          – SPARQL Query → Triple Pattern Query
          – SPARQL Query → Keyword Query
          – SPARQL Query → SPARQL Query (different abstraction)
          – SPARQL Query → CycL Query
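As an illustration of the "SPARQL Query → Keyword Query" case, the sketch below pulls the local names out of prefixed names with a regular expression and joins them into a keyword string. This is a deliberately crude, hypothetical transformer (`KeywordTransformer`), not how any LarKC plug-in is known to work. It does not parse SPARQL, and the `Query`, `Contract` and `Context` types are assumed stand-ins.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryTransformSketch {
    // Stand-ins for the types in the slide's signature; their fields are assumptions.
    static final class Query {
        final String text;
        Query(String text) { this.text = text; }
    }

    static final class Contract { }

    static final class Context { }

    interface QueryTransformer {
        Set<Query> transform(Query theQuery, Contract theContract, Context theContext);
    }

    /** Hypothetical transformer: keeps the local part of every prefixed name (prefix:local). */
    static final class KeywordTransformer implements QueryTransformer {
        private static final Pattern PREFIXED_NAME =
                Pattern.compile("([A-Za-z][A-Za-z0-9]*):([A-Za-z][A-Za-z0-9]*)");

        @Override
        public Set<Query> transform(Query theQuery, Contract theContract, Context theContext) {
            Set<String> keywords = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
            Matcher matcher = PREFIXED_NAME.matcher(theQuery.text);
            while (matcher.find()) {
                keywords.add(matcher.group(2));
            }
            return Collections.singleton(new Query(String.join(" ", keywords)));
        }
    }

    /** Convenience wrapper returning the single keyword query as text. */
    static String keywordsFor(String sparql) {
        return new KeywordTransformer()
                .transform(new Query(sparql), new Contract(), new Context())
                .iterator().next().text;
    }

    public static void main(String[] args) {
        System.out.println(keywordsFor(
                "SELECT ?company WHERE { ?company cyc:isa cyc:PubliclyHeldCorporation }"));
    }
}
```

The interface returns a set because one input query may map to several alternative queries; this sketch always returns exactly one.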
  • LarKC Plug-in API: TRANSFORM (2/2)
    InformationSetTransformer
      + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)
      • Information Set TRANSFORM: Transforms data from one representation to another
          – Natural Language Document → RDF Graph
          – Structured Data Sources → RDF Graph
          – RDF Graph → RDF Graph (e.g. foaf vocabulary to facebook vocabulary)
  • LarKC Plug-in API: SELECT
    Selecter
      + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)
      • SELECT: Given a set of statements (e.g. a number of RDF Graphs), choose a selection/sample from this set
          – Collection of RDF Graphs → Triple Set (Merged)
          – Collection of RDF Graphs → Triple Set (10% of each)
          – Collection of RDF Graphs → Triple Set (N Triples)
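The "N Triples" case is the simplest to sketch: a selecter that returns at most as many statements as the contract allows. The `select` signature follows the slide. `Statement`, `SetOfStatements`, `Contract` and `Context` are assumed stand-ins, and `FirstNSelecter` is a hypothetical example, not a released plug-in.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectSketch {
    // Stand-ins for the types in the slide's signature; their fields are assumptions.
    static final class Statement {
        final String s, p, o;
        Statement(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static final class SetOfStatements {
        final List<Statement> statements;
        SetOfStatements(List<Statement> statements) { this.statements = statements; }
    }

    static final class Contract {
        final int maxStatements;
        Contract(int maxStatements) { this.maxStatements = maxStatements; }
    }

    static final class Context { }

    interface Selecter {
        SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context);
    }

    /** Hypothetical selecter: keeps the first N statements, where N comes from the contract. */
    static final class FirstNSelecter implements Selecter {
        @Override
        public SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context) {
            int available = theSetOfStatements.statements.size();
            int n = Math.max(0, Math.min(contract.maxStatements, available));
            // Copy the sub-list so the result does not alias the input.
            return new SetOfStatements(new ArrayList<>(theSetOfStatements.statements.subList(0, n)));
        }
    }

    /** Builds count dummy statements s1..sN for demonstration. */
    static SetOfStatements numbered(int count) {
        List<Statement> list = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            list.add(new Statement("s" + i, "p" + i, "o" + i));
        }
        return new SetOfStatements(list);
    }

    public static void main(String[] args) {
        SetOfStatements picked = new FirstNSelecter().select(numbered(5), new Contract(3), new Context());
        System.out.println(picked.statements.size());
    }
}
```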
  • LarKC Plug-in API: REASON
    Reasoner
      + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
      • REASON: Executes a query against the supplied set of statements
          – SPARQL Query → Variable Binding (Select)
          – SPARQL Query → Set of statements (Construct)
          – SPARQL Query → Set of statements (Describe)
          – SPARQL Query → Boolean (Ask)
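The difference between the Select and Ask result shapes can be shown with a toy matcher for a single triple pattern. This is a sketch under strong assumptions: it is not a SPARQL engine, it does no inference, the method names `select` and `ask` only echo the slide's `sparqlSelect` and `sparqlAsk`, and plain `String[]` triples and `Map` bindings stand in for `SetOfStatements` and `VariableBinding`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReasonSketch {

    /** Matches one pattern (terms starting with '?' are variables) against one triple; null if no match. */
    static Map<String, String> match(String[] pattern, String[] triple) {
        Map<String, String> binding = new LinkedHashMap<>();
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                // A variable used twice in the pattern must bind to the same value.
                String previous = binding.putIfAbsent(term, triple[i]);
                if (previous != null && !previous.equals(triple[i])) return null;
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return binding;
    }

    /** Select-style result: one variable binding per matching statement. */
    static List<Map<String, String>> select(String[] pattern, List<String[]> statements) {
        List<Map<String, String>> bindings = new ArrayList<>();
        for (String[] triple : statements) {
            Map<String, String> binding = match(pattern, triple);
            if (binding != null) bindings.add(binding);
        }
        return bindings;
    }

    /** Ask-style result: true if at least one statement matches. */
    static boolean ask(String[] pattern, List<String[]> statements) {
        return !select(pattern, statements).isEmpty();
    }

    /** Invented sample data loosely based on the example workflow's query. */
    static List<String[]> sample() {
        return Arrays.asList(
                new String[] {"acme", "isa", "PubliclyHeldCorporation"},
                new String[] {"acme", "mentionedInArticle", "article1"},
                new String[] {"bob", "isa", "Person"});
    }

    public static void main(String[] args) {
        String[] pattern = {"?company", "isa", "PubliclyHeldCorporation"};
        System.out.println(select(pattern, sample()));
        System.out.println(ask(pattern, sample()));
    }
}
```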
  • LarKC Plug-in API: DECIDE
    Decider
      + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)
      • DECIDE: Builds the workflow and manages the control flow
          – Scripted Decider: Predefined workflow is built and executed
          – Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow
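A scripted decider can be pictured as a fixed list of stages run in order, each consuming the previous stage's output. In the sketch below the stages are placeholders that only record the order in which they were called. `ScriptedDeciderSketch`, `then` and `run` are hypothetical names, and the stage order (identify, transform, select, reason) is taken from the example workflow slide rather than from any LarKC source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ScriptedDeciderSketch {
    private final List<UnaryOperator<String>> steps = new ArrayList<>();

    /** Adds a placeholder stage that wraps its input in the stage name, to make call order visible. */
    ScriptedDeciderSketch then(String stageName) {
        steps.add(input -> stageName + "(" + input + ")");
        return this;
    }

    /** Control flow of the scripted workflow: feed each stage the previous stage's output. */
    String run(String query) {
        String data = query;
        for (UnaryOperator<String> step : steps) {
            data = step.apply(data);
        }
        return data;
    }

    /** The predefined pipeline, fixed at construction time. */
    static ScriptedDeciderSketch exampleWorkflow() {
        return new ScriptedDeciderSketch()
                .then("identify")
                .then("transform")
                .then("select")
                .then("reason");
    }

    public static void main(String[] args) {
        System.out.println(exampleWorkflow().run("query"));
    }
}
```

A self-configuring decider would differ in how the list of stages is chosen, for example by consulting plug-in descriptions, while the execution loop could stay the same.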