Stardog Linked Data Catalog

A talk from Semtech NYC 2012 about Stardog Linked Data Catalog, a portfolio management system for enterprise linked data.

Published in: Technology

  1. Stardog Linked Data Catalog
     Héctor Pérez-Urbina, Edgar Rodríguez-Díaz
     Clark & Parsia, LLC
     {hector, edgar}@clarkparsia.com
  2. Who are we?
     ● Clark & Parsia is a semantic software startup
     ● HQ in Washington, DC & office in Boston
     ● Provides software development and integration services
     ● Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers
     http://clarkparsia.com/
     Twitter: @candp
  3. What's SLDC?
     ● Stardog Linked Data Catalog
     ● A catalog of data sources
       ○ Semi-structured
       ○ Relational
       ○ Object-oriented
       ○ ...
     ● Provides a coherent view over existing data repositories so that users and applications can easily find and query them
  4. Use Cases
     ● Sources
       ○ Management, import, subscription, categorization, sharing
     ● Query
       ○ Management, sharing, results export
       ○ Querying
         ■ Metadata, external sources, integration
     ● Locating sources
       ○ Search, browse
     ● NLP/AI
       ○ Entity extraction, graph algorithms, clustering analysis
  5. [Diagram of layers]
     ● Application layer
     ● Middleware layer
     ● NLP/AI analytics layer
     ● Data layer
  6. Demo
  7. Semantic Technologies
     ● W3C standards
       ○ RDF(S), OWL, SPARQL
     ● Lower operational costs and raise productivity
       ○ Cooperation without coordination
       ○ Appropriate abstractions
       ○ Declarative is better than imperative
       ○ Correctness when it matters; sloppiness when it doesn't
  8. Data Model
     ● Similar to DCAT from W3C
       ○ Catalog entries
     ● Enhanced with
       ○ SSD
       ○ VoID datasets
       ○ SKOS background models
       ○ Axioms & rules
  9. Modeling the Domain
     ● Use of axioms to model relationships between classes
       ○ :Query subClassOf :Resource
       ○ :Entry subClassOf :Resource
     ● Retrieve the resources user :u can see
       ○ SELECT ?resource WHERE { ?resource type :Resource . }
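To illustrate why the subclass axioms matter for that query, here is a minimal Python sketch, not SLDC's actual implementation. It assumes a single-parent class hierarchy stored in a dict; the individuals `:q1`, `:e1`, and `:u` are made up for the example. Asking for instances of `:Resource` returns queries and entries even though neither is typed as `:Resource` directly.

```python
# Subclass axioms from the slide (child -> parent); single parent assumed.
SUBCLASS_OF = {":Query": ":Resource", ":Entry": ":Resource"}

# Hypothetical instance data, invented for illustration.
DATA = [
    (":q1", "type", ":Query"),
    (":e1", "type", ":Entry"),
    (":u", "type", ":User"),
]

def instances_of(cls, data):
    """Return subjects whose asserted type is cls or a subclass of cls."""
    found = set()
    for subject, predicate, obj in data:
        if predicate != "type":
            continue
        current = obj
        # Walk up the hierarchy until we hit cls or run out of parents.
        while current is not None:
            if current == cls:
                found.add(subject)
                break
            current = SUBCLASS_OF.get(current)
    return found

resources = instances_of(":Resource", DATA)
```

Here `resources` contains `:q1` and `:e1` but not `:u`, which mirrors what a reasoner would return for the SELECT query above.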
  10. Security
     ● Authentication
       ○ Shiro-based implementation
       ○ Extensible to LDAP and/or AD
     ● Authorization
       ○ Eat-your-own-food approach
       ○ Reasoning-based
       ○ Use of axioms & rules
  11. Deriving Permissions
     ● Users have permission roles
     ● Permission roles have permission relations with resources
  12. Deriving Permissions
     ● If a user has a permission role containing a read permission associated with a resource, then the user has the same permission over the resource
       :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource)
     ● Everybody has read access to public resources
       :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
  13. Deriving Permissions
     ● User :user1 has delete permissions over any source
       ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source)
       ○ :user1 :deleteUserPermission :anySource
     ● Everybody has all permissions to the resources they created
       ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource)
       ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource)
       ○ ...
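The permission rules on the last two slides can be read as forward-chaining rules over triples. The following is a rough Python sketch of that reading, not how Stardog evaluates rules. Predicate names come from the slides; `:user2`, `:role1`, `:source1`, and `:source2` are invented for the example, and the rules elided by "..." are left out.

```python
# Asserted facts as (subject, predicate, object) triples.
FACTS = {
    (":user2", ":permissionRole", ":role1"),
    (":role1", ":readPermission", ":source1"),
    (":user1", ":deleteUserPermission", ":anySource"),
    (":source1", "type", ":DataSource"),
    (":source2", "type", ":DataSource"),
    (":user2", ":resourceCreator", ":source2"),
}

def apply_rules(facts):
    """Run each rule once and return only the triples that are new."""
    new = set()
    for s, p, o in facts:
        if p == ":permissionRole":
            # :permissionRole(?u,?role), :readPermission(?role,?r) -> :readUserPermission(?u,?r)
            for s2, p2, o2 in facts:
                if p2 == ":readPermission" and s2 == o:
                    new.add((s, ":readUserPermission", o2))
        if p == ":deleteUserPermission" and o == ":anySource":
            # :deleteUserPermission(?u,:anySource), :DataSource(?src) -> :deleteUserPermission(?u,?src)
            for s2, p2, o2 in facts:
                if p2 == "type" and o2 == ":DataSource":
                    new.add((s, ":deleteUserPermission", s2))
        if p == ":resourceCreator":
            # :resourceCreator(?u,?r) -> :allUserPermissions(?u,?r)
            new.add((s, ":allUserPermissions", o))
        if p == ":allUserPermissions":
            # :allUserPermissions(?u,?r) -> :readUserPermission(?u,?r)
            new.add((s, ":readUserPermission", o))
    return new - facts

def closure(facts):
    """Apply the rules repeatedly until no new triple is derived."""
    facts = set(facts)
    while True:
        new = apply_rules(facts)
        if not new:
            return facts
        facts |= new

derived = closure(FACTS)
```

After the fixpoint, `derived` contains `:user2 :readUserPermission :source2`, which needs two rule applications (creator, then all permissions, then read), so a single pass would miss it.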
  14. Impact of Reasoning
     Can user :user1 delete resource :source1?
       ASK WHERE {
         { :user1 :deleteUserPermission :source1 . }
         UNION { :user1 :permissionRole ?role . ?role :deletePermission :source1 . }
         UNION { :user1 :resourceCreator :source1 . }
         UNION { :user1 :deleteUserPermission :anyResource . }
         UNION { :user1 :allUserPermissions :source1 . }
         UNION { ... }
         UNION ...
  15. Impact of Reasoning
     ● Are you sure you're not missing anything?
     ● What about that awesome new way of getting delete permissions you came up with yesterday?
     ● Model knowledge where it belongs and let the reasoner do the work for you:
       ASK WHERE { { :user1 :deleteUserPermission :source1 . } }
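The contrast between the two ASK queries can be sketched in Python as follows. This is an illustration under assumptions, not SLDC code: the delete rules used in `materialize` are analogues of the read rules shown earlier (the slides elide them with "..."), only three of the UNION arms are covered, and `:admin`, `:user2`, and `:source2` are made-up names.

```python
# Asserted facts only; nothing has been inferred yet.
RAW = {
    (":user1", ":permissionRole", ":admin"),
    (":admin", ":deletePermission", ":source1"),
    (":user2", ":resourceCreator", ":source2"),
}

def ask_without_reasoning(facts, user, resource):
    """Hand-written check: one clause per UNION arm, easy to leave one out."""
    direct = (user, ":deleteUserPermission", resource) in facts
    via_role = any(
        (user, ":permissionRole", role) in facts
        for role, pred, res in facts
        if pred == ":deletePermission" and res == resource
    )
    creator = (user, ":resourceCreator", resource) in facts
    all_perms = (user, ":allUserPermissions", resource) in facts
    return direct or via_role or creator or all_perms

def materialize(facts):
    """Assumed delete rules, applied to a fixpoint (stand-in for the reasoner)."""
    facts = set(facts)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            if p == ":permissionRole":
                new |= {
                    (s, ":deleteUserPermission", res)
                    for role, pred, res in facts
                    if pred == ":deletePermission" and role == o
                }
            elif p == ":resourceCreator":
                new.add((s, ":allUserPermissions", o))
            elif p == ":allUserPermissions":
                new.add((s, ":deleteUserPermission", o))
        changed = not new <= facts
        facts |= new
    return facts

def ask_with_reasoning(facts, user, resource):
    """With inference in place, the query is a single triple lookup."""
    return (user, ":deleteUserPermission", resource) in materialize(facts)
```

Both functions agree on this data, but adding a new way of obtaining delete permission only requires a new rule in `materialize`; every hand-written query like `ask_without_reasoning` would otherwise have to be edited.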
  16. Too much Inference?
     When I say
       :deleteUserPermission domain :User
       :deleteUserPermission range :Resource
     I mean that for every triple
       :user1 :deleteUserPermission :resource1
     the individual :user1 must be an instance of :User and :resource1 of :Resource.
     But the reasoner doesn't find the error!!
  17. Typing Constraint
     Only users can have delete user permissions
     ● :deleteUserPermission domain :User
     ● :user1 :deleteUserPermission :resource1
  18. Typing Constraint
     Only users can have delete user permissions
     ● :deleteUserPermission domain :User
     ● :user1 :deleteUserPermission :resource1

                  OWA                     CWA
     Consistent   true                    false
     Reason       Infer that              Assume that
                  :user1 type :User       :user1 type not :User
  19. CWA or OWA?
     ● Which one?
       ○ Of course use both!
     ● Some axioms should be interpreted under CWA
       :deleteUserPermission domain :User
     ● And others under OWA
       :SuperUser subClassOf :User
     ● So the right thing happens
       :user1 :deleteUserPermission :resource1
       :user1 type :SuperUser
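A small Python sketch of mixing the two readings, offered as an illustration rather than as Stardog's validation algorithm: the subclass axiom is used for inference (OWA), and the domain axiom is then checked as a constraint against the inferred types (CWA). The dictionaries and function names are assumptions made for this example.

```python
# Interpreted under OWA: used to infer additional types.
SUBCLASS_OF = {":SuperUser": ":User"}

# Interpreted under CWA: property -> type its subject must be known to have.
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}

def infer_types(triples):
    """OWA step: close the asserted types under the subclass axioms."""
    types = {(s, o) for s, p, o in triples if p == "type"}
    changed = True
    while changed:
        extra = {(s, SUBCLASS_OF[c]) for s, c in types if c in SUBCLASS_OF}
        changed = not extra <= types
        types |= extra
    return types

def violations(triples):
    """CWA step: report triples whose subject lacks the required type."""
    types = infer_types(triples)
    return [
        (s, p, o)
        for s, p, o in sorted(triples)
        if p in DOMAIN_CONSTRAINTS and (s, DOMAIN_CONSTRAINTS[p]) not in types
    ]

untyped = [(":user1", ":deleteUserPermission", ":resource1")]
typed = untyped + [(":user1", "type", ":SuperUser")]

untyped_report = violations(untyped)  # :user1 is not known to be a :User
typed_report = violations(typed)      # :SuperUser implies :User, so no violation
```

With only the permission triple, the constraint is violated; once `:user1` is typed as `:SuperUser`, the inferred `:User` type satisfies it, which is the "right thing" the slide describes.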
  20. SLDC for Data Integration
     ● SLDC provides descriptions of data sources, relationships between them, and information to query them
     ● We can treat data sources as a single integrated data source
       ○ Distributed querying
       ○ AI analytics
     ● Virtual, materialized, hybrid
  21. Mappings
     ● Simple
       ○ pops:Employee subClassOf foaf:Person
       ○ pops:Project equivalentTo foaf:Project
       ○ pops:hasEmployee subPropertyOf foaf:member
     ● SWRL-based
       ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name)
       ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
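As a rough picture of what these mappings produce, the sketch below applies some of them to a handful of triples in plain Python. It is only an approximation of the rule semantics: it assumes at most one first and last name per person, skips the `pops:Project` equivalence, and uses invented `ex:` individuals.

```python
def apply_mappings(triples):
    """Return the foaf triples implied by the pops triples under the mappings."""
    triples = set(triples)
    out = set()

    # SWRL rule 1: concatenate first and last name into foaf:name.
    first = {s: o for s, p, o in triples if p == "pops:firstName"}
    last = {s: o for s, p, o in triples if p == "pops:lastName"}
    for person in first.keys() & last.keys():
        out.add((person, "foaf:name", f"{first[person]} {last[person]}"))

    # SWRL rule 2: working on an active project makes it a current project.
    active = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "pops:ActiveProject"
    }
    for s, p, o in triples:
        if p == "pops:worksOnProject" and o in active:
            out.add((s, "foaf:currentProject", o))
        # Simple mappings: subproperty and subclass.
        if p == "pops:hasEmployee":
            out.add((s, "foaf:member", o))
        if p == "rdf:type" and o == "pops:Employee":
            out.add((s, "rdf:type", "foaf:Person"))
    return out

# Hypothetical source data.
SOURCE = {
    ("ex:alice", "rdf:type", "pops:Employee"),
    ("ex:alice", "pops:firstName", "Alice"),
    ("ex:alice", "pops:lastName", "Smith"),
    ("ex:alice", "pops:worksOnProject", "ex:p1"),
    ("ex:alice", "pops:worksOnProject", "ex:p2"),
    ("ex:p1", "rdf:type", "pops:ActiveProject"),
    ("ex:acme", "pops:hasEmployee", "ex:alice"),
}

mapped = apply_mappings(SOURCE)
```

Only `ex:p1` becomes a `foaf:currentProject` for `ex:alice`, because `ex:p2` is not typed as an active project.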
  22. Summing Up
     ● SLDC is a linked data catalog
       ○ Manage a variety of sources
       ○ Find sources
       ○ Query sources
     ● Implemented using Semantic Technologies
       ○ Reasoning
         ■ Axioms & Rules
       ○ Data validation
       ○ Data integration
  23. Questions?
  24. Why?
     ● Large organizations
       ○ Disparate departments
       ○ Independent, isolated sources
     ● Where is what?
       ○ Do we have a data source about clients?
       ○ Where is it?
     ● Who created what?
       ○ Who owns it?
     ● Who has access to what?
       ○ Do I have access to it?
       ○ Who do I talk to to get it?
  25. Source Management
     ● Management
       ○ Create, delete, update, clone
     ● Import
       ○ RDF, HTML, XML
     ● Subscription
       ○ Endpoint location
     ● Categorization
       ○ Categories
       ○ External vocabularies
     ● Sharing
       ○ To specific users
       ○ Public
  26. Querying Sources
     ● Querying metadata
       ○ Queries about the catalog itself
     ● External query
       ○ Querying a particular source
     ● Integrated query
       ○ Querying a set of integrated sources
     ● Query management
     ● Query sharing
     ● Results export
  27. Finding Sources
     ● Browse
       ○ Facets
       ○ Pelorus
     ● Search
       ○ Text-based search
       ○ Rich query language
  28. Last but not least
     ● NLP processing
       ○ Entity/event extraction from natural language source descriptions
       ○ Better source classification & search
     ● Graph algorithms
       ○ What's the shortest path between these resources?
     ● Clustering
       ○ Can we discover similar sources based on given criteria?
  29. Axioms
     ● It's not always about simple taxonomies...
     ● What about domain/range axioms?
       ○ :someProperty domain :SomeClass
       ○ :a :someProperty :b
       ○ :SomeClass(x)?
     ● What about complex subclass chains?
       ○ :SomeClass subClassOf :someProperty some :OtherClass
       ○ :someProperty some :OtherClass subClassOf :AnotherClass
       ○ :a type :SomeClass
       ○ :AnotherClass(x)?
     ● What about cardinality constraints, universal quantification, datatype reasoning, ...?
  30. Data Validation
     ● Fundamental data management problem
       ○ Verify data integrity and correctness
       ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc.
     ● Relevant in many scenarios
       ○ Storing data for stand-alone applications
       ○ Exchanging data in distributed settings
     ● For some use cases, data validation is critical but we still want to do it intelligently
  31. Participation Constraint
     Each resource must have been created by a user
     ● :Resource subClassOf inv(resourceCreator) some :User
     ● :resource1 type :Resource

                  OWA                                  CWA
     Consistent   true                                 false
     Reason       Infer that                           Assume that
                  ● _:b :resourceCreator :resource1    _:b :resourceCreator :resource1 is false
                  ● _:b type :User
  32. Uniqueness Constraint
     Each data source must belong to at most one catalog entry
     ● :dataSource inverseFunctional
     ● :entry1 :dataSource :dataSource1
     ● :entry2 :dataSource :dataSource1

                  OWA                         CWA
     Consistent   true                        false
     Reason       Infer that                  Assume that
                  :entry1 sameAs :entry2      :entry1 sameAs :entry2 is false
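Under the CWA reading, the participation and uniqueness constraints from the last two slides become checks over the data rather than sources of new inferences. The Python sketch below shows one way such checks might look. The function names are invented, differently named entries are assumed to be different individuals, and this is not meant to reflect Stardog's actual validation machinery.

```python
from collections import defaultdict

def participation_violations(triples):
    """Resources with no known :resourceCreator that is a known :User."""
    resources = {s for s, p, o in triples if p == "type" and o == ":Resource"}
    users = {s for s, p, o in triples if p == "type" and o == ":User"}
    created = {
        o for s, p, o in triples
        if p == ":resourceCreator" and s in users
    }
    return sorted(resources - created)

def uniqueness_violations(triples):
    """Data sources attached to more than one catalog entry."""
    entries_by_source = defaultdict(set)
    for s, p, o in triples:
        if p == ":dataSource":
            entries_by_source[o].add(s)
    return {
        source: sorted(entries)
        for source, entries in entries_by_source.items()
        if len(entries) > 1
    }

# Data from the two slides, combined.
DATA = {
    (":resource1", "type", ":Resource"),
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
}

missing_creator = participation_violations(DATA)
shared_sources = uniqueness_violations(DATA)

# Adding a creator who is a known user (made-up :user1) repairs the first check.
FIXED = DATA | {(":user1", "type", ":User"), (":user1", ":resourceCreator", ":resource1")}
missing_after_fix = participation_violations(FIXED)
```

An OWA reasoner would instead accept the same data as consistent, inventing an anonymous creator in the first case and concluding `:entry1 sameAs :entry2` in the second.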
