
Stardog Linked Data Catalog


A talk from Semtech NYC 2012 about Stardog Linked Data Catalog, a portfolio management system for enterprise linked data.

Published in: Technology

Stardog Linked Data Catalog

  1. 1. Stardog Linked Data Catalog. Héctor Pérez-Urbina, Edgar Rodríguez-Díaz. Clark & Parsia, LLC. {hector, edgar}
  2. 2. Who are we? ● Clark & Parsia is a semantic software startup ● HQ in Washington, DC, with an office in Boston ● Provides software development and integration services ● Specializes in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers ● Twitter: @candp
  3. 3. What's SLDC? ● Stardog Linked Data Catalog ● A catalog of data sources ○ Semi-structured ○ Relational ○ Object-oriented ○ ... ● Provides a coherent view over existing data repositories so that users and applications can easily find and query them
  4. 4. Use Cases ● Sources ○ Management, import, subscription, categorization, sharing ● Query ○ Management, sharing, results export ○ Querying ■ Metadata, external sources, integration ● Locating sources ○ Search, browse ● NLP/AI ○ Entity extraction, graph algorithms, clustering analysis
  5. 5. (Architecture diagram) Application layer ● Middleware layer ● NLP/AI analytics layer ● Data layer
  6. 6. Demo
  7. 7. Semantic Technologies ● W3C standards ○ RDF(S), OWL, SPARQL ● Lower operational costs and raise productivity ○ Cooperation without coordination ○ Appropriate abstractions ○ Declarative is better than imperative ○ Correctness when it matters; sloppiness when it doesn't
  8. 8. Data Model ● Similar to DCAT from W3C ○ Catalog entries ● Enhanced with ○ SSD ○ VoID datasets ○ SKOS background models ○ Axioms & rules
  9. 9. Modeling the Domain ● Use of axioms to model relationships between classes ○ :Query subClassOf :Resource ○ :Entry subClassOf :Resource ● Retrieve the resources user :u can see ○ SELECT ?resource WHERE { ?resource type :Resource . }
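The effect of the two subclass axioms on slide 9 can be sketched in plain Python. This is only an illustration, not how Stardog evaluates the query: the individuals `:query1`, `:entry1`, and `:user1` are made up, and the sketch covers the subclass inference only, not the per-user filtering the slide title mentions.

```python
# Subclass axioms from slide 9, stored as child -> parent.
SUBCLASS_OF = {":Query": ":Resource", ":Entry": ":Resource"}


def inferred_types(asserted_type):
    """Follow subClassOf links upward and collect every class reached."""
    types = []
    current = asserted_type
    while current is not None and current not in types:
        types.append(current)
        current = SUBCLASS_OF.get(current)
    return types


def select_resources(type_assertions):
    """Rough analogue of: SELECT ?resource WHERE { ?resource type :Resource . }"""
    return sorted(
        individual
        for individual, cls in type_assertions.items()
        if ":Resource" in inferred_types(cls)
    )


# Hypothetical data: nothing is asserted to be a :Resource directly.
assertions = {":query1": ":Query", ":entry1": ":Entry", ":user1": ":User"}
resources = select_resources(assertions)
```

No individual is typed as `:Resource` explicitly, yet `:query1` and `:entry1` are returned because the axioms place `:Query` and `:Entry` below `:Resource`.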
  10. 10. Security ● Authentication ○ Shiro-based implementation ○ Extensible to LDAP and/or AD ● Authorization ○ Eat-your-own-food approach ○ Reasoning-based ○ Use of axioms & rules
  11. 11. Deriving Permissions ● Users have permission roles ● Permission roles have permission relations with resources
  12. 12. Deriving Permissions ● If a user has a permission role containing a read permission associated with a resource, then the user has the same permission over the resource ○ :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource) ● Everybody has read access to public resources ○ :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
  13. 13. Deriving Permissions ● User :user1 has delete permissions over any source ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source) ○ :user1 :deleteUserPermission :anySource ● Everybody has all permissions to the resources they created ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource) ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource) ○ ...
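The rules on slides 12 and 13 can be read as forward-chaining rules over triples. The following is a minimal sketch of that reading in Python, assuming facts are `(subject, predicate, object)` tuples; it is a naive fixpoint loop for illustration, not the reasoner SLDC uses. The individuals `:user2`, `:role1`, `:query1`, and `:doc1` are hypothetical, and the rules elided by "..." on slide 13 are left out.

```python
def derive_permissions(facts):
    """Apply the rules from slides 12-13 until no new triple appears."""
    facts = set(facts)
    while True:
        users = {s for s, p, o in facts if p == "type" and o == ":User"}
        public = {s for s, p, o in facts if p == "type" and o == ":PublicResource"}
        sources = {s for s, p, o in facts if p == "type" and o == ":DataSource"}
        new = set()
        for s, p, o in facts:
            # permissionRole(u, r), readPermission(r, x) -> readUserPermission(u, x)
            if p == ":permissionRole":
                for s2, p2, o2 in facts:
                    if s2 == o and p2 == ":readPermission":
                        new.add((s, ":readUserPermission", o2))
            # deleteUserPermission(u, :anySource), DataSource(x) -> deleteUserPermission(u, x)
            if p == ":deleteUserPermission" and o == ":anySource":
                new.update((s, ":deleteUserPermission", src) for src in sources)
            # resourceCreator(u, x) -> allUserPermissions(u, x)
            if p == ":resourceCreator":
                new.add((s, ":allUserPermissions", o))
            # allUserPermissions(u, x) -> readUserPermission(u, x)
            if p == ":allUserPermissions":
                new.add((s, ":readUserPermission", o))
        # User(u), PublicResource(x) -> readUserPermission(u, x)
        new.update((u, ":readUserPermission", r) for u in users for r in public)
        if new <= facts:
            return facts
        facts |= new


# Hypothetical catalog state.
closed = derive_permissions({
    (":user1", "type", ":User"),
    (":user2", "type", ":User"),
    (":user1", ":deleteUserPermission", ":anySource"),
    (":source1", "type", ":DataSource"),
    (":user2", ":permissionRole", ":role1"),
    (":role1", ":readPermission", ":source1"),
    (":user2", ":resourceCreator", ":query1"),
    (":doc1", "type", ":PublicResource"),
})
```

After the loop, `closed` contains the derived permissions, for example that `:user1` may delete `:source1` and that `:user2` may read `:query1`, which takes two rule applications.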
  14. 14. Impact of Reasoning ● Can user :user1 delete resource :source1? ○ ASK WHERE { { :user1 :deleteUserPermission :source1 . } UNION { :user1 :permissionRole ?role . ?role :deletePermission :source1 . } UNION { :user1 :resourceCreator :source1 . } UNION { :user1 :deleteUserPermission :anyResource . } UNION { :user1 :allUserPermissions :source1 . } UNION { ... } UNION ...
  15. 15. Impact of Reasoning ● Are you sure you're not missing anything? ● What about the new awesome way of getting delete permissions you came up with yesterday? ● Model knowledge where it belongs and let the reasoner do the work for you ○ ASK WHERE { { :user1 :deleteUserPermission :source1 . } }
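The contrast between slides 14 and 15 can be sketched as follows, under the assumption that facts are plain `(subject, predicate, object)` tuples. The first function hard-codes the five UNION branches spelled out on slide 14 (the elided ones are not guessed); the second materializes the `:anySource` rule from slide 13 so that a single lookup is enough. Both function names are made up for this illustration.

```python
def can_delete_without_reasoning(facts, user, resource):
    """Mirror the UNION query: every known route is listed by hand."""
    roles = {o for s, p, o in facts if s == user and p == ":permissionRole"}
    return (
        (user, ":deleteUserPermission", resource) in facts
        or any((role, ":deletePermission", resource) in facts for role in roles)
        or (user, ":resourceCreator", resource) in facts
        or (user, ":deleteUserPermission", ":anyResource") in facts
        or (user, ":allUserPermissions", resource) in facts
    )


def apply_any_source_rule(facts):
    """deleteUserPermission(u, :anySource), DataSource(s) -> deleteUserPermission(u, s)"""
    sources = {s for s, p, o in facts if p == "type" and o == ":DataSource"}
    holders = {s for s, p, o in facts
               if p == ":deleteUserPermission" and o == ":anySource"}
    return set(facts) | {(u, ":deleteUserPermission", s)
                         for u in holders for s in sources}


facts = {
    (":user1", ":deleteUserPermission", ":anySource"),
    (":source1", "type", ":DataSource"),
}

# The hand-written check has no branch for :anySource, so it misses this route.
missed = can_delete_without_reasoning(facts, ":user1", ":source1")

# With the rule applied, the one-pattern ASK from slide 15 becomes a plain lookup.
found = (":user1", ":deleteUserPermission", ":source1") in apply_any_source_rule(facts)
```

The hand-written check has to be edited every time a new route to a permission is introduced, whereas the lookup stays the same once the route is modeled as a rule.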
  16. 16. Too much Inference? ● When I say ○ :deleteUserPermission domain :User ○ :deleteUserPermission range :Resource ● I mean that for every triple :user1 :deleteUserPermission :resource1, the individual :user1 must be an instance of :User and :resource1 an instance of :Resource ● But the reasoner doesn't find the error!
  17. 17. Typing Constraint: Only users can have delete user permissions ● :deleteUserPermission domain :User ● :user1 :deleteUserPermission :resource1
  18. 18. Typing Constraint: Only users can have delete user permissions ● :deleteUserPermission domain :User ● :user1 :deleteUserPermission :resource1 ● OWA ○ Consistent: true ○ Reason: infer that :user1 type :User ● CWA ○ Consistent: false ○ Reason: assume that :user1 is not of type :User
  19. 19. CWA or OWA? ● Which one? ○ Of course use both! ● Some axioms should be interpreted under CWA ○ :deleteUserPermission domain :User ● And others under OWA ○ :SuperUser subClassOf :User ● So the right thing happens ○ :user1 :deleteUserPermission :resource1 ○ :user1 type :SuperUser
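Slide 19 mixes the two readings: the subclass axiom is used to infer types (OWA), while the domain axiom is checked against the data (CWA). A minimal sketch of that combination, assuming triples are Python tuples and using invented function names, might look like this:

```python
# Interpreted under OWA: used to infer additional types.
SUBCLASS_OF = {":SuperUser": ":User"}
# Interpreted under CWA: checked as a constraint, never used to infer types.
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}


def infer_types(triples):
    """Collect asserted types and add superclasses via SUBCLASS_OF."""
    types = {}
    for s, p, o in triples:
        if p == "type":
            types.setdefault(s, set()).add(o)
    for classes in types.values():
        frontier = list(classes)
        while frontier:
            parent = SUBCLASS_OF.get(frontier.pop())
            if parent is not None and parent not in classes:
                classes.add(parent)
                frontier.append(parent)
    return types


def violations(triples):
    """Return triples whose subject lacks the type required by a domain constraint."""
    types = infer_types(triples)
    return [
        (s, p, o)
        for s, p, o in triples
        if p in DOMAIN_CONSTRAINTS
        and DOMAIN_CONSTRAINTS[p] not in types.get(s, set())
    ]


# Data from slide 19: valid, because :SuperUser subClassOf :User is applied first.
valid = [
    (":user1", ":deleteUserPermission", ":resource1"),
    (":user1", "type", ":SuperUser"),
]
# Data from slide 17: no type for :user1, so the constraint is violated.
invalid = [(":user1", ":deleteUserPermission", ":resource1")]

valid_report = violations(valid)
invalid_report = violations(invalid)
```

The first data set passes because the inferred type `:User` satisfies the constraint; the second is reported as a violation instead of silently gaining a `:User` type.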
  20. 20. SLDC for Data Integration ● SLDC provides descriptions of data sources, relationships between them, and information to query them ● We can treat data sources as an integrated single data source ○ Distributed querying ○ AI analytics ● Virtual, materialized, hybrid
  21. 21. Mappings ● Simple ○ pops:Employee subClassOf foaf:Person ○ pops:Project equivalentTo foaf:Project ○ pops:hasEmployee subPropertyOf foaf:member ● SWRL-based ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name) ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
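To make the mappings on slide 21 concrete, here is a small Python sketch that applies two of the simple mappings and both SWRL-style rules to a handful of triples. It is an illustration only: the individuals (`pops:alice`, `pops:acme`, `pops:p1`, `pops:p2`) are invented, each person is assumed to have at most one first and one last name, and the `equivalentTo` mapping is omitted for brevity.

```python
def apply_mappings(triples):
    """Derive FOAF triples from pops: triples using the mappings on slide 21."""
    triples = set(triples)
    first = {s: o for s, p, o in triples if p == "pops:firstName"}
    last = {s: o for s, p, o in triples if p == "pops:lastName"}
    active = {s for s, p, o in triples
              if p == "type" and o == "pops:ActiveProject"}
    derived = set()
    # firstName(p, f), lastName(p, l), concat(n, f, " ", l) -> foaf:name(p, n)
    for person in first.keys() & last.keys():
        derived.add((person, "foaf:name", f"{first[person]} {last[person]}"))
    for s, p, o in triples:
        # worksOnProject(p, x), ActiveProject(x) -> foaf:currentProject(p, x)
        if p == "pops:worksOnProject" and o in active:
            derived.add((s, "foaf:currentProject", o))
        # pops:Employee subClassOf foaf:Person
        if p == "type" and o == "pops:Employee":
            derived.add((s, "type", "foaf:Person"))
        # pops:hasEmployee subPropertyOf foaf:member
        if p == "pops:hasEmployee":
            derived.add((s, "foaf:member", o))
    return triples | derived


# Hypothetical source data.
mapped = apply_mappings({
    ("pops:alice", "type", "pops:Employee"),
    ("pops:alice", "pops:firstName", "Alice"),
    ("pops:alice", "pops:lastName", "Smith"),
    ("pops:alice", "pops:worksOnProject", "pops:p1"),
    ("pops:alice", "pops:worksOnProject", "pops:p2"),
    ("pops:p1", "type", "pops:ActiveProject"),
    ("pops:acme", "pops:hasEmployee", "pops:alice"),
})
```

Only `pops:p1` is an active project, so `pops:alice` gets a `foaf:currentProject` link to `pops:p1` but not to `pops:p2`.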
  22. 22. Summing Up ● SLDC is a linked data catalog ○ Manage a variety of sources ○ Find sources ○ Query sources ● Implemented using Semantic Technologies ○ Reasoning ■ Axioms & Rules ○ Data validation ○ Data integration
  23. 23. Questions?
  24. 24. Why? ● Large organizations ○ Disparate departments ○ Independent, isolated sources ● Where is what? ○ Do we have a data source about clients? ○ Where is it? ● Who created what? ○ Who owns it? ● Who has access to what? ○ Do I have access to it? ○ Who do I talk to to get it?
  25. 25. Source Management ● Management ○ Create, delete, update, clone ● Import ○ RDF, HTML, XML ● Subscription ○ Endpoint location ● Categorization ○ Categories ○ External vocabularies ● Sharing ○ To specific users ○ Public
  26. 26. Querying Sources ● Querying metadata ○ Queries about the catalog itself ● External query ○ Querying a particular source ● Integrated query ○ Querying a set of integrated sources ● Query management ● Query sharing ● Results export
  27. 27. Finding Sources ● Browse ○ Facets ○ Pelorus ● Search ○ Text-based search ○ Rich query language
  28. 28. Last but not least ● NLP ○ Entity/event extraction from natural language source descriptions ○ Better source classification & search ● Graph algorithms ○ What's the shortest path between these resources? ● Clustering ○ Can we discover similar sources based on given criteria?
  29. 29. Axioms ● It's not always about simple taxonomies... ● What about domain/range axioms? ○ :someProperty domain :SomeClass ○ :a :someProperty :b ○ :SomeClass(x)? ● What about complex subclass chains? ○ :SomeClass subClassOf :someProperty some :OtherClass ○ :someProperty some :OtherClass subClassOf :AnotherClass ○ :a type :SomeClass ○ :AnotherClass(x)? ● What about cardinality constraints, universal quantification, datatype reasoning, ...?
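The two questions on slide 29 (`:SomeClass(x)?` and `:AnotherClass(x)?`) can be worked through with a small sketch. It assumes class expressions are represented as Python tuples and handles only domain axioms and chains of subClassOf axioms; it does not derive membership in `:someProperty some :OtherClass` from the data, so it is far from a complete OWL reasoner.

```python
# "someProperty some OtherClass" written as a tuple (an assumed encoding).
SOME_OTHER = ("some", ":someProperty", ":OtherClass")
SUBCLASS_AXIOMS = [(":SomeClass", SOME_OTHER), (SOME_OTHER, ":AnotherClass")]
DOMAIN_AXIOMS = {":someProperty": ":SomeClass"}


def superclasses(cls):
    """All classes reachable from cls by following subClassOf axioms."""
    seen, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_AXIOMS:
            if sub == current and sup not in seen:
                seen.add(sup)
                frontier.append(sup)
    return seen


def instances_of(target, triples):
    """Individuals that belong to target via type assertions or domain axioms."""
    members = set()
    for s, p, o in triples:
        if p == "type":
            direct = o
        elif p in DOMAIN_AXIOMS:
            direct = DOMAIN_AXIOMS[p]
        else:
            continue
        if direct == target or target in superclasses(direct):
            members.add(s)
    return members


# Domain example: :a :someProperty :b
some_class_members = instances_of(":SomeClass", [(":a", ":someProperty", ":b")])
# Subclass chain example: :a type :SomeClass
another_class_members = instances_of(":AnotherClass", [(":a", "type", ":SomeClass")])
```

In both cases the answer is `:a`: the domain axiom places it in `:SomeClass`, and the two chained subclass axioms carry it from `:SomeClass` to `:AnotherClass`.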
  30. 30. Data Validation ● Fundamental data management problem ○ Verify data integrity and correctness ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc. ● Relevant in many scenarios ○ Storing data for stand-alone applications ○ Exchanging data in distributed settings ● For some use cases, data validation is critical but we still want to do it intelligently
  31. 31. Participation Constraint: Each resource must have been created by a user ● :Resource subClassOf inv(resourceCreator) some :User ● :resource1 type :Resource ● OWA ○ Consistent: true ○ Reason: infer that _:b :resourceCreator :resource1 and _:b type :User ● CWA ○ Consistent: false ○ Reason: assume that _:b :resourceCreator :resource1 is false
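Under the CWA reading of slide 31, the participation constraint becomes a check that every resource has a known creator who is a user. A minimal sketch, assuming triples are Python tuples and with `:resource2` added as a made-up individual:

```python
def resources_without_creator(triples):
    """Resources with no :resourceCreator triple from a known :User (CWA check)."""
    users = {s for s, p, o in triples if p == "type" and o == ":User"}
    resources = {s for s, p, o in triples if p == "type" and o == ":Resource"}
    created = {o for s, p, o in triples
               if p == ":resourceCreator" and s in users}
    return sorted(resources - created)


# :resource1 has no creator; the hypothetical :resource2 does.
missing = resources_without_creator([
    (":resource1", "type", ":Resource"),
    (":resource2", "type", ":Resource"),
    (":user1", "type", ":User"),
    (":user1", ":resourceCreator", ":resource2"),
])
```

An OWA reasoner would instead assume an unnamed creator exists for `:resource1` and report no inconsistency.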
  32. 32. Uniqueness Constraint: Each data source must belong to at most one catalog entry ● :dataSource inverseFunctional ● :entry1 :dataSource :dataSource1 ● :entry2 :dataSource :dataSource1 ● OWA ○ Consistent: true ○ Reason: infer that :entry1 sameAs :entry2 ● CWA ○ Consistent: false ○ Reason: assume that :entry1 sameAs :entry2 is false
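The CWA reading of the uniqueness constraint on slide 32 can be sketched as a grouping check: two differently named entries that point at the same data source count as a violation rather than as evidence that the entries are the same individual. The function name and the extra `:entry3` / `:dataSource2` individuals are assumptions for this illustration.

```python
from collections import defaultdict


def shared_data_sources(triples):
    """Map each data source to its catalog entries when it has more than one."""
    entries_by_source = defaultdict(set)
    for s, p, o in triples:
        if p == ":dataSource":
            entries_by_source[o].add(s)
    return {
        source: sorted(entries)
        for source, entries in entries_by_source.items()
        if len(entries) > 1
    }


conflicts = shared_data_sources([
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
    (":entry3", ":dataSource", ":dataSource2"),
])
```

Under OWA, the same two triples would lead a reasoner to conclude `:entry1 sameAs :entry2` instead of flagging an error.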