Web Scale Reasoning and the LarKC Project

1,233 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,233
On SlideShare
0
From Embeds
0
Number of Embeds
378
Actions
Shares
0
Downloads
11
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Web Scale Reasoning and the LarKC Project

  1. 1. Web Scale Reasoning and the LarKC Project (Introduction) Luka Bradeško Cycorp Europe
  2. 2. Goals of LarKC LarKC = Large Knowledge Collider • Build an integrated pluggable platform for large scale reasoning • Support for parallelization, distribution, remote execution, data 2 2 “Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost” distribution, remote execution, data storage • Use existing plug-ins, develop new • Easy integration of components • Enables low cost experimentation
  3. 3. Overall approach of LarKC • Very lightweight platform – communication, synchronisation, registration – LarKC = “SPARQL endpoint on steroids” • The real work happens in the plugins • LarKC gives you: – very scalable datalayer 3 – very scalable datalayer – standardised interfaces for combining components – utilities & infrastructure to abstract from remote deployment • Three types of LarKC users: – people building plugins – people configuring workflows – people using workflows
  4. 4. Workflow Support System Plug-in RegistryDecider Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API SPARQL Endpoint, Application Platform Utility Functionality APIs Plug-ins LarKC Architecture Data Layer API RDF Store RDF Store RDF Store RDF Doc RDF Doc RDF Doc Data Layer Query Transformer Plug-in API Identifier Plug-in API Info. Set Transformer Plug-in API Selecter Plug-in API Reasoner Plug-in API External systems External data sources 4
  5. 5. LarKC Plug-in API + Collection<InformationSet> identify (Query theQuery, Contract contract, Context context) Identifier + Set<Query> transform(Query theQuery, Contract theContract, Context theContext) QueryTransformer + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext) InformationSetTransformer + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context) Selecter + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) Reasoner • 5 types of plug-ins • Plug-in API enables interoperability (between plug-in and platform and between plug-ins) • Plug-ins I/O abstract data structures of RDF triples => flexibility for assembling plug-ins and for plug-in writers 5 + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters) + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters) + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters) + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters) Decider flexibility for assembling plug-ins and for plug-in writers • Compatibility ensured by DECIDER and workflow configurators, based on plug-in description
  6. 6. Decider What does a workflow look like? Decider Plug-in APIPlug-in API Plug-in Manager Query Transformer Plug-in API Plug-in Manager Identifier Plug-in API Plug-in Manager Info. Set Transformer Plug-in API Plug-in Manager Selecter Plug-in API Plug-in Manager Reasoner Plug-in API Plug-in Registry Workflow Support System RDF Store Identifier Info Set Transformer ReasonerSelecter Query Transformer 6
  7. 7. What does a workflow look like? Decider Info Set Transformer Identifier Identifier Decider Plug-in APIPlug-in API Plug-in Manager Query Transformer Plug-in API Plug-in Manager Identifier Plug-in API Plug-in Manager Info. Set Transformer Plug-in API Plug-in Manager Selecter Plug-in API Plug-in Manager Reasoner Plug-in API Plug-in Registry Workflow Support System RDF Store Identifier Info Set Transformer ReasonerSelecter Query Transformer Data Layer Data Layer Data Layer Data Layer 7
  8. 8. Decider Using Plug-in Registry to Create Workflow Q TI S R VB B D 1.3.1 Represent Properties • Functional • Non-functional (e.g. QoS) • WSMO-Lite Syntax Represent Properties • Functional • Non-functional (e.g. QoS) • WSMO-Lite Syntax Q T I S R VB A VB Logical Representation • Describes role • Describes Inputs/Outputs • Automatically extracted using API • Decider can use for dynamic configuration • Rule-based • Fast Logical Representation • Describes role • Describes Inputs/Outputs • Automatically extracted using API • Decider can use for dynamic configuration • Rule-based • Fast 8
  9. 9. LarKC Plug-in Managers Plug-in Manager Query Transformer Plug-in APIPlug-in API Plug-in Manager Identifier Plug-in APIPlug-in API • Run in separate threads • Automatically add meta-data to registry when loaded • Communicate RDF data by value or by reference • Parallelisation Plug-in Manager Transformer Plug-in APIPlug-in API Plug-in Manager Identifier Plug-in APIPlug-in API ransformer Transformer Transformer Plug-in Manager Identifier Plug-in APIPlug-in API Plug-in Manager Identifier Plug-in APIPlug-in API Plug-in Manager Selector Plug-in API Plug-in Manager Selector Plug-in API • Split/Join connectors in progress 9
  10. 10. Example workflow Query Identify Transform GATE Internet PREFIX cyc: <http://www.cycfoundation.org/concepts/> SELECT ?company WHERE { ?company cyc:mentionedInArticle "http://shodan.ijs.si/article.txt" . ?company cyc:isa cyc:PubliclyHeldCorporation } Select ReasonResult Research Cyc GATE
  11. 11. Pipeline Support System Plug-in RegistryDecider Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Application LarKC Data Layer Platform Utility Functionality APIs Plug-ins Data Layer API RDF Store RDF Store RDF Store RDF Doc RDF Doc RDF Doc Data Layer Query Transformer Plug-in API Identifier Plug-in API Info. Set Transformer Plug-in API Selecter Plug-in API Reasoner Plug-in API 11 External systems External data sources Data Layer API Data Layer
  12. 12. LarKC Data Layer RDF Graph RDF Graph RDF Graph Default Graph RDF Graph RDF Graph Dataset Labeled Set Main goal: • The Data Layer supports LarKC plug-ins: – storage, retrieval and light-weight inference on top of large volumes of data – automates the exchange of RDF data by reference and by value 12 RDF Graph RDF Graph RDF Graph Graph Graph RDF Graph RDF Graph RDF Graph RDF Graph RDF Graph and by value – offers other utility tools to manage data (e.g. merger)
  13. 13. Used Concepts in the Data Model NG1 NG3NG2 Labelled groups NG4 NG5 Labelled groups of statements Labelled groups of statements
  14. 14. Supported Sets of Statements RDF data types Description Example Set of statement RDF statements s1, p1, o1, ng1 s2, p2, o2 s3, p3, o3, ng3, {group1} RDF graph Named graph s1, p1, o1, ng1, {group1} s2, p2, o2, ng1, {group2} s3, p3, o3, ng1 Dataset SPARQL dataset represents a collection of graphs s1, p1, o1, ng1 s2, p2, o2, ng2 s3, p3, o3, ng3 Labelled group of statements RDF group of statements s1, p1, o1, ng1, {group1} s2, p2, o2, ng2, {group1} s3, p3, o3, ng3, {group1}
  15. 15. Current Status 15
  16. 16. Released System v1.1: larkc.sourceforge.net Identify • SINDICE • SWOOGLE • SINDICE • SWOOGLE Select • Spreading Activation • Geolocation • Spreading Activation • Geolocation • Annotate GATE • Annotate Cyc • Annotate GATE • Annotate Cyc • Open Apache 2.0 license • Previous early adopters workshops @ ESWC ’09,10 and ISWC ‘09 – participants modified plug-ins, modified workflows Transform • Annotate Cyc • SPARQL-CycL • Annotate Cyc • SPARQL-CycL Reason • Jena, IRIS, Pellet • Cyc, PION • Siemens • Jena, IRIS, Pellet • Cyc, PION • Siemens Decide • Scripted: Real- Time City • Dynamic Cyc • Scripted: Real- Time City • Dynamic Cyc Decider Plug-in API Plug-in Manager Query Transformer Plug-in API Plug-in Manager Identifier Plug-in API Plug-in Manager Info. Set Transformer Plug-in API Plug-in Manager Selecter Plug-in API Plug-in Manager Reasoner Plug-in API Plug-in Registry Pipeline Support System Standard Open Environment: Java, subversion, packaged release, command line build, or eclipse 16
  17. 17. • Distributed Data Layer • Caching, data warming/cooling • Data Streaming between remote components Next Steps • Requirements traceability and update • Architecture refinement Platform validationPlatform validation remote components • Experimental instrumentation and monitoring 17 Early Adopters
  18. 18. End 18
  19. 19. Rapid Progress, but We’re Not Finished… Pipeline Support System Plug-in RegistryDecider Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Plug-in Manager Plug-in API Application • Classified according to: • Sources – Initial Project Objectives (DoW) – LarKC Collider Platform (WP5 discussions) – LarKC Rapid Prototyping – LarKC Use Cases (WP6, WP7a, WP7b) – LarKC Plug-ins (WP2, WP3, WP4) Detailed information in D5.3.1 Requirements Analysis and report on lessons learned during prototyping Requirements (WP 5)• Concentrating on parallel and distributed execution. • Optimisation of complex workflows. • Extend meta-data representation for QoS, parallelism and use it. • Concentrating on parallel and distributed data layer; caching Data Layer API RDF Store RDF Store RDF Store RDF Doc RDF Doc RDF Doc Data Layer Query Transformer Plug-in API Identifier Plug-in API Info. Set Transformer Plug-in API Selecter Plug-in API Reasoner Plug-in API – Data Caching – Anytime Behaviour – Plug-in Registration and Discovery – Plug-in Monitoring and Measurement – Support for Developers – Plug-ins • Classified according to: – Resources – Heterogeneity – Usage – Interoperability – Parallelization “within plug- ins” – Distributed/remote execution – Data Layer prototyping distributed data layer; caching and data migration. • Support more plug-in needs while maintaining platform integrity • Integrate distributed plug-ins • Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping) • Support anytime/streaming • Experimental instrumentation and monitoring 19
  20. 20. LarKC Plug-in API: General Plug-in Model + URI getIdentifier() + QoSInformation getQoSInformation() Plug-in Functional properties Non-functional properties WSDL description Plug-in description • Plug-ins are assembled into Workflows, to realise a LarKC Experiment or Application • Plug-ins are identified by a URI (Uniform Resource Identifier) • Plug-ins provide MetaData about what they do (Functional properties): e.g. type = Selecter • Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost,… • Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract : “give me the next 10 results”) and Context information used to store state between invocations 20
  21. 21. LarKC Plug-in API: IDENTIFY + Collection<InformationSet> identify (Query theQuery, Contract contract, Context context) Identifier • IDENTIFY: Given a query, identify resources that could be used to answer it • Sindice – Triple Pattern Query RDF Graphs • Google – Keyword Query Natural Language Document • Triple Store – SPARQL Query RDF Graphs 21
  22. 22. LarKC Plug-in API: TRANSFORM (1/2) Set<Query> transform(Query theQuery, Contract+ Set<Query> transform(Query theQuery, Contract theContract, Context theContext) QueryTransformer • Query TRANSFORM: Transforms a query from one representation to another • SPARQL Query Triple Pattern Query • SPARQL Query Keyword Query • SPARQL Query SPARQL Query (different abstraction) • SPARQL Query CycL Query 22
  23. 23. LarKC Plug-in API: TRANSFORM (2/2) + InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext) InformationSetTransformer • Information Set TRANSFORM: Transforms data from one representation to another • Natural Language Document RDF Graph • Structured Data Sources RDF Graph • RDF Graph RDF Graph (e.g. foaf vocabulary to facebook vocabulary) 23
  24. 24. LarKC Plug-in API: SELECT + SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context) Selecter • SELECT: Given a set of statements (e.g. a number of RDF Graphs) will choose a selection/sample from this set – Collection of RDF Graphs Triple Set (Merged) – Collection of RDF Graphs Triple Set (10% of each) – Collection of RDF Graphs Triple Set (N Triples) 24
  25. 25. LarKC Plug-in API: REASON + VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) Reasoner • REASON: Executes a query against the supplied set of statements – SPARQL Query Variable Binding (Select) – SPARQL Query Set of statements (Construct) – SPARQL Query Set of statements (Describe) – SPARQL Query Boolean (Ask) theSetOfStatements, Contract contract, Context context) + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context) 25
  26. 26. LarKC Plug-in API: DECIDE + VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters) + SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters) + SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters) Decider • DECIDE: Builds the workflow and manages the control flow – Scripted Decider: Predefined workflow is built and executed – Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow + BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters) 26

×