Stardog Linked Data Catalog

A talk from Semtech NYC 2012 about Stardog Linked Data Catalog, a portfolio management system for enterprise linked data.

Presentation Transcript

  • Stardog Linked Data Catalog
    Héctor Pérez-Urbina, Edgar Rodríguez-Díaz
    Clark & Parsia, LLC
    {hector, edgar}@clarkparsia.com
  • Who are we?
    ● Clark & Parsia is a semantic software startup
    ● HQ in Washington, DC & office in Boston
    ● Provides software development and integration services
    ● Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers
    http://clarkparsia.com/ Twitter: @candp
  • What's SLDC?
    ● Stardog Linked Data Catalog
    ● A catalog of data sources
      ○ Semi-structured
      ○ Relational
      ○ Object-oriented
      ○ ...
    ● Provides a coherent view over existing data repositories so that users and/or applications can easily find and query them
  • Use Cases
    ● Sources
      ○ Management, import, subscription, categorization, sharing
    ● Query
      ○ Management, sharing, results export
      ○ Querying
        ■ Metadata, external sources, integration
    ● Locating sources
      ○ Search, browse
    ● NLP/AI
      ○ Entity extraction, graph algorithms, clustering analysis
  • [Architecture diagram: Application layer; Middleware layer; NLP/AI analytics layer; Data layer]
  • Demo
  • Semantic Technologies
    ● W3C standards
      ○ RDF(S), OWL, SPARQL
    ● Lower operational costs and raise productivity
      ○ Cooperation without coordination
      ○ Appropriate abstractions
      ○ Declarative is better than imperative
      ○ Correctness when it matters; sloppiness when it doesn’t
  • Data Model
    ● Similar to DCAT from W3C
      ○ Catalog entries
    ● Enhanced with
      ○ SSD
      ○ VoID datasets
      ○ SKOS background models
      ○ Axioms & rules
  • Modeling the Domain
    ● Use of axioms to model relationships between classes
      ○ :Query subClassOf :Resource
      ○ :Entry subClassOf :Resource
    ● Retrieve the resources user :u can see
      ○ SELECT ?resource WHERE { ?resource type :Resource . }
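The effect of the two subclass axioms on that query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries, the individuals `:q1`, `:e1`, and `:u`, and the helper names are hypothetical and are not part of SLDC or Stardog.

```python
# Hypothetical in-memory data; class names mirror the slide, individuals are made up.
SUBCLASS_OF = {
    ":Query": {":Resource"},
    ":Entry": {":Resource"},
}
TYPES = {
    ":q1": {":Query"},
    ":e1": {":Entry"},
    ":u": {":User"},
}


def superclasses(cls):
    """Return cls plus every class reachable through subClassOf links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def instances_of(target):
    """Individuals whose asserted or inferred types include target."""
    return sorted(
        individual
        for individual, classes in TYPES.items()
        if any(target in superclasses(c) for c in classes)
    )


print(instances_of(":Resource"))  # [':e1', ':q1']
```

Neither `:q1` nor `:e1` is asserted to be a `:Resource`; both are returned only because the subclass axioms are taken into account, which is what the single-pattern query relies on.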
  • Security
    ● Authentication
      ○ Shiro-based implementation
      ○ Extensible to LDAP and/or AD
    ● Authorization
      ○ Eat-your-own-food approach
      ○ Reasoning-based
      ○ Use of axioms & rules
  • Deriving Permissions
    ● Users have permission roles
    ● Permission roles have permission relations with resources
  • Deriving Permissions
    ● If a user has a permission role containing a read permission associated with a resource, then the user has the same permission over the resource
        :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource)
    ● Everybody has read access to public resources
        :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
  • Deriving Permissions
    ● User :user1 has delete permissions over any source
      ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source)
      ○ :user1 :deleteUserPermission :anySource
    ● Everybody has all permissions to the resources they created
      ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource)
      ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource)
      ○ ...
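To make the permission rules concrete, here is a minimal sketch of naive forward chaining over a set of triples. It is not how Stardog evaluates rules; it only mimics their outcome. The function name, the tuple representation, and the sample individuals (`:user2`, `:role1`, `:query1`, `:report1`) are assumptions for illustration, and the rule from `:allUserPermissions` to `:deleteUserPermission` is assumed to be among the rules the slide elides with "...".

```python
def derive_permissions(facts):
    """Apply the permission rules repeatedly until no new triple is derived."""
    facts = set(facts)
    while True:
        new = set()
        users = {s for s, p, o in facts if p == "type" and o == ":User"}
        public = {s for s, p, o in facts if p == "type" and o == ":PublicResource"}
        sources = {s for s, p, o in facts if p == "type" and o == ":DataSource"}
        for s, p, o in facts:
            if p == ":permissionRole":
                # permissionRole(u, role), readPermission(role, r) -> readUserPermission(u, r)
                new |= {
                    (s, ":readUserPermission", r)
                    for role, q, r in facts
                    if role == o and q == ":readPermission"
                }
            elif p == ":resourceCreator":
                # resourceCreator(u, r) -> allUserPermissions(u, r)
                new.add((s, ":allUserPermissions", o))
            elif p == ":allUserPermissions":
                # allUserPermissions(u, r) -> read (from the slide) and delete (assumed)
                new.add((s, ":readUserPermission", o))
                new.add((s, ":deleteUserPermission", o))
            elif p == ":deleteUserPermission" and o == ":anySource":
                # deleteUserPermission(u, :anySource), DataSource(x) -> deleteUserPermission(u, x)
                new |= {(s, ":deleteUserPermission", src) for src in sources}
        # User(u), PublicResource(r) -> readUserPermission(u, r)
        new |= {(u, ":readUserPermission", r) for u in users for r in public}
        if new <= facts:
            return facts
        facts |= new


closed = derive_permissions({
    (":user1", ":deleteUserPermission", ":anySource"),
    (":source1", "type", ":DataSource"),
    (":user2", "type", ":User"),
    (":user2", ":permissionRole", ":role1"),
    (":role1", ":readPermission", ":source1"),
    (":user2", ":resourceCreator", ":query1"),
    (":report1", "type", ":PublicResource"),
})
print((":user1", ":deleteUserPermission", ":source1") in closed)  # True
```

Once the closure is computed, a permission check is a single membership test, regardless of which rule produced the triple.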
  • Impact of Reasoning
    Can user :user1 delete resource :source1?
        ASK WHERE {
          { :user1 :deleteUserPermission :source1 . }
          UNION { :user1 :permissionRole ?role . ?role :deletePermission :source1 . }
          UNION { :user1 :resourceCreator :source1 . }
          UNION { :user1 :deleteUserPermission :anyResource . }
          UNION { :user1 :allUserPermissions :source1 . }
          UNION { ... }
          UNION ...
  • Impact of Reasoning
    ● Are you sure you're not missing anything?
    ● What about the new awesome way of getting delete permissions you came up with yesterday?
    ● Model knowledge where it belongs and let the reasoner do the work for you:
        ASK WHERE { { :user1 :deleteUserPermission :source1 . } }
  • Too much Inference?
    When I say
        :deleteUserPermission domain :User
        :deleteUserPermission range :Resource
    I mean that for every triple
        :user1 :deleteUserPermission :resource1
    the individual :user1 must be an instance of :User and :resource1 of :Resource.
    But the reasoner doesn't find the error!!
  • Typing Constraint
    Only users can have delete user permissions
    ● :deleteUserPermission domain :User
    ● :user1 :deleteUserPermission :resource1
  • Typing Constraint
    Only users can have delete user permissions
    ● :deleteUserPermission domain :User
    ● :user1 :deleteUserPermission :resource1

                  OWA                              CWA
    Consistent    true                             false
    Reason        Infer that :user1 type :User     Assume that :user1 type not :User
  • CWA or OWA?
    ● Which one?
      ○ Of course use both!
    ● Some axioms should be interpreted under CWA
        :deleteUserPermission domain :User
    ● And others under OWA
        :SuperUser subClassOf :User
    ● So the right thing happens
        :user1 :deleteUserPermission :resource1
        :user1 type :SuperUser
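The difference between the two readings of a domain axiom can be sketched as follows. This is a simplified illustration, not Stardog's validation machinery: `check_domain` and its parameters are hypothetical names, and the subclass lookup only follows direct `subClassOf` links.

```python
def check_domain(triples, types, prop, domain_cls, subclass_of, closed_world):
    """Return (consistent, inferred_types) for a domain axiom under OWA or CWA."""
    inferred = {}
    violations = []
    for s, p, o in triples:
        if p != prop:
            continue
        asserted = types.get(s, set())
        # The subject is acceptable if one of its classes is the domain class
        # or a direct subclass of it (the subclass axiom is read under OWA).
        typed = any(
            c == domain_cls or domain_cls in subclass_of.get(c, set())
            for c in asserted
        )
        if typed:
            continue
        if closed_world:
            # CWA: a missing type is treated as a violation of the constraint.
            violations.append(s)
        else:
            # OWA: the missing type is simply inferred.
            inferred.setdefault(s, set()).add(domain_cls)
    return (not violations, inferred)


triples = [(":user1", ":deleteUserPermission", ":resource1")]
subclass_of = {":SuperUser": {":User"}}

# No type asserted for :user1.
print(check_domain(triples, {}, ":deleteUserPermission", ":User", subclass_of, False))
print(check_domain(triples, {}, ":deleteUserPermission", ":User", subclass_of, True))

# :user1 type :SuperUser is asserted, so the closed-world check passes.
typed = {":user1": {":SuperUser"}}
print(check_domain(triples, typed, ":deleteUserPermission", ":User", subclass_of, True))
```

The last call corresponds to the mixed setting on the slide: the domain axiom acts as a constraint, while the subclass axiom is still used for inference, so a `:SuperUser` satisfies the constraint.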
  • SLDC for Data Integration
    ● SLDC provides descriptions of data sources, relationships between them, and information to query them
    ● We can treat data sources as a single integrated data source
      ○ Distributed querying
      ○ AI analytics
    ● Virtual, materialized, hybrid
  • Mappings
    ● Simple
      ○ pops:Employee subClassOf foaf:Person
      ○ pops:Project equivalentTo foaf:Project
      ○ pops:hasEmployee subPropertyOf foaf:member
    ● SWRL-based
      ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name)
      ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
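As a rough illustration of what the first SWRL-based mapping produces, the sketch below derives `foaf:name` triples from `pops:firstName` and `pops:lastName` triples. The function name and the sample people are hypothetical, and the sketch assumes each person has at most one first name and one last name.

```python
def apply_name_mapping(triples):
    """Derive foaf:name by concatenating pops:firstName and pops:lastName."""
    first, last = {}, {}
    for s, p, o in triples:
        if p == "pops:firstName":
            first[s] = o
        elif p == "pops:lastName":
            last[s] = o
    # The rule only fires for people that have both name parts.
    derived = {
        (person, "foaf:name", f"{first[person]} {last[person]}")
        for person in first.keys() & last.keys()
    }
    return set(triples) | derived


source = {
    (":p1", "pops:firstName", "Ada"),
    (":p1", "pops:lastName", "Lovelace"),
    (":p2", "pops:firstName", "Alan"),
}
mapped = apply_name_mapping(source)
print((":p1", "foaf:name", "Ada Lovelace") in mapped)  # True
```

`:p2` gets no `foaf:name` because the rule body needs both name parts to match.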
  • Summing Up
    ● SLDC is a linked data catalog
      ○ Manage a variety of sources
      ○ Find sources
      ○ Query sources
    ● Implemented using Semantic Technologies
      ○ Reasoning
        ■ Axioms & Rules
      ○ Data validation
      ○ Data integration
  • Questions?
  • Why?
    ● Large organizations
      ○ Disparate departments
      ○ Independent, isolated sources
    ● Where is what?
      ○ Do we have a data source about clients?
      ○ Where is it?
    ● Who created what?
      ○ Who owns it?
    ● Who has access to what?
      ○ Do I have access to it?
      ○ Who do I talk to to get it?
  • Source Management
    ● Management
      ○ Create, delete, update, clone
    ● Import
      ○ RDF, HTML, XML
    ● Subscription
      ○ Endpoint location
    ● Categorization
      ○ Categories
      ○ External vocabularies
    ● Sharing
      ○ To specific users
      ○ Public
  • Querying Sources
    ● Querying metadata
      ○ Queries about the catalog itself
    ● External query
      ○ Querying a particular source
    ● Integrated query
      ○ Querying a set of integrated sources
    ● Query management
    ● Query sharing
    ● Results export
  • Finding Sources
    ● Browse
      ○ Facets
      ○ Pelorus
    ● Search
      ○ Text-based search
      ○ Rich query language
  • Last but not least
    ● NLP processing
      ○ Entity/Event extraction from natural language source descriptions
      ○ Better source classification & search
    ● Graph algorithms
      ○ What's the shortest path between these resources?
    ● Clustering
      ○ Can we discover similar sources based on given criteria?
  • Axioms
    ● It's not always about simple taxonomies...
    ● What about domain/range axioms?
      ○ :someProperty domain :SomeClass
      ○ :a :someProperty :b
      ○ :SomeClass(x)?
    ● What about complex subclass chains?
      ○ :SomeClass subClassOf :someProperty some :OtherClass
      ○ :someProperty some :OtherClass subClassOf :AnotherClass
      ○ :a type :SomeClass
      ○ :AnotherClass(x)?
    ● What about cardinality constraints, universal quantification, datatype reasoning, ...?
  • Data Validation
    ● Fundamental data management problem
      ○ Verify data integrity and correctness
      ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc.
    ● Relevant in many scenarios
      ○ Storing data for stand-alone applications
      ○ Exchanging data in distributed settings
    ● For some use cases, data validation is critical but we still want to do it intelligently
  • Participation Constraint
    Each resource must have been created by a user
    ● :Resource subClassOf inv(resourceCreator) some :User
    ● :resource1 type :Resource

                  OWA                                        CWA
    Consistent    true                                       false
    Reason        Infer that                                 Assume that
                  ● _:b :resourceCreator :resource1          _:b :resourceCreator :resource1 is false
                  ● _:b type :User
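Read under CWA, the participation constraint amounts to looking for resources with no recorded creator. The sketch below shows that check on plain Python data; `resources_without_creator` and the sample individuals are hypothetical, and no reasoning over subclasses is attempted.

```python
def resources_without_creator(types, triples):
    """List resources that lack a :resourceCreator triple from a known :User."""
    created = {
        o
        for s, p, o in triples
        if p == ":resourceCreator" and ":User" in types.get(s, set())
    }
    return sorted(
        r
        for r, classes in types.items()
        if ":Resource" in classes and r not in created
    )


types = {
    ":resource1": {":Resource"},
    ":resource2": {":Resource"},
    ":user1": {":User"},
}
triples = [(":user1", ":resourceCreator", ":resource2")]
print(resources_without_creator(types, triples))  # [':resource1']
```

Under OWA the same data would be consistent, since the reasoner is free to assume an unnamed creator for `:resource1`.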
  • Uniqueness Constraint
    Each data source must belong to at most one catalog entry
    ● :dataSource inverseFunctional
    ● :entry1 :dataSource :dataSource1
    ● :entry2 :dataSource :dataSource1

                  OWA                                     CWA
    Consistent    true                                    false
    Reason        Infer that :entry1 sameAs :entry2       Assume that :entry1 sameAs :entry2 is false
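The two readings of the inverse-functional axiom can be contrasted with a short sketch. It is illustrative only: `check_inverse_functional` and the shape of its result are assumptions, and the open-world branch ignores the case where the entries are explicitly declared different, which would make the data inconsistent even under OWA.

```python
from collections import defaultdict


def check_inverse_functional(triples, prop, closed_world):
    """Evaluate an inverse-functional property under OWA or CWA."""
    subjects_by_object = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_object[o].add(s)
    # Objects shared by more than one subject are the interesting cases.
    clashes = {
        o: sorted(subjects)
        for o, subjects in subjects_by_object.items()
        if len(subjects) > 1
    }
    if closed_world:
        # CWA: distinct names denote distinct entries, so a clash is a violation.
        return {"consistent": not clashes, "violations": clashes, "same_as": []}
    # OWA: the entries sharing an object are inferred to be the same individual.
    same_as = [tuple(subjects) for subjects in clashes.values()]
    return {"consistent": True, "violations": {}, "same_as": same_as}


triples = [
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
]
print(check_inverse_functional(triples, ":dataSource", closed_world=False))
print(check_inverse_functional(triples, ":dataSource", closed_world=True))
```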