Stardog Linked Data Catalog

A talk from Semtech NYC 2012 about Stardog Linked Data Catalog, a portfolio management system for enterprise linked data.

Transcript of "Stardog Linked Data Catalog"

  1. Stardog Linked Data Catalog
     Héctor Pérez-Urbina, Edgar Rodríguez-Díaz
     Clark & Parsia, LLC
     {hector, edgar}@clarkparsia.com
  2. Who are we?
     ● Clark & Parsia is a semantic software startup
     ● HQ in Washington, DC & office in Boston
     ● Provides software development and integration services
     ● Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers
     http://clarkparsia.com/
     Twitter: @candp
  3. What's SLDC?
     ● Stardog Linked Data Catalog
     ● A catalog of data sources
       ○ Semi-structured
       ○ Relational
       ○ Object-oriented
       ○ ...
     ● Provides a coherent view over existing data repositories so that users and/or applications can easily find them and query them
  4. Use Cases
     ● Sources
       ○ Management, import, subscription, categorization, sharing
     ● Query
       ○ Management, sharing, results export
       ○ Querying
         ■ Metadata, external sources, integration
     ● Locating sources
       ○ Search, browse
     ● NLP/AI
       ○ Entity extraction, graph algorithms, clustering analysis
  5. [Layer diagram: Application layer, Middleware layer, NLP/AI analytics layer, Data layer]
  6. Demo
  7. Semantic Technologies
     ● W3C standards
       ○ RDF(S), OWL, SPARQL
     ● Lower operational costs and raise productivity
       ○ Cooperation without coordination
       ○ Appropriate abstractions
       ○ Declarative is better than imperative
       ○ Correctness when it matters; sloppiness when it doesn’t
  8. Data Model
     ● Similar to DCAT from W3C
       ○ Catalog entries
     ● Enhanced with
       ○ SSD
       ○ VoID datasets
       ○ SKOS background models
       ○ Axioms & rules
  9. Modeling the Domain
     ● Use of axioms to model relationships between classes
       ○ :Query subClassOf :Resource
       ○ :Entry subClassOf :Resource
     ● Retrieve the resources user :u can see
       ○ SELECT ?resource WHERE { ?resource type :Resource . }
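A minimal sketch of what the two subclass axioms on the "Modeling the Domain" slide contribute to the SELECT query, assuming the catalog is held in plain Python dictionaries. The individuals `:q1`, `:e1`, `:u` and the helper `instances_of` are hypothetical; this illustrates the inference only and is not how Stardog evaluates queries.

```python
# Axioms from the slide; the typed individuals below are made up for illustration.
SUBCLASS_OF = {":Query": ":Resource", ":Entry": ":Resource"}
TYPES = {":q1": ":Query", ":e1": ":Entry", ":u": ":User"}


def superclasses(cls):
    """Yield cls and every class reachable from it through subClassOf."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def instances_of(target):
    """Return individuals whose asserted type is target or a subclass of it."""
    return sorted(ind for ind, cls in TYPES.items() if target in superclasses(cls))


# Neither :q1 nor :e1 is asserted to be a :Resource, yet both are returned.
resources = instances_of(":Resource")
print(resources)
```

Without the axioms, the query would match nothing here, because no individual is typed as `:Resource` directly.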
  10. Security
      ● Authentication
        ○ Shiro-based implementation
        ○ Extensible to LDAP and/or AD
      ● Authorization
        ○ Eat-your-own-food approach
        ○ Reasoning-based
        ○ Use of axioms & rules
  11. Deriving Permissions
      ● Users have permission roles
      ● Permission roles have permission relations with resources
  12. Deriving Permissions
      ● If a user has a permission role containing a read permission associated to a resource, then the user has the same permission over the resource
          :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource)
      ● Everybody has read access to public resources
          :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)
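The two read-permission rules can be sketched as a single pass over a set of triples. This is an illustration under stated assumptions: triples are Python tuples, and the individuals (`:alice`, `:bob`, `:analyst`, `:sales`, `:wiki`) and the function `derive_read_permissions` are hypothetical names, not part of SLDC.

```python
def derive_read_permissions(facts):
    """Apply both slide rules once; neither rule's output feeds the other."""
    derived = set()
    # Rule 1: permissionRole(u, role), readPermission(role, res)
    #         -> readUserPermission(u, res)
    for (user, p, role) in facts:
        if p == ":permissionRole":
            for (s2, p2, resource) in facts:
                if p2 == ":readPermission" and s2 == role:
                    derived.add((user, ":readUserPermission", resource))
    # Rule 2: User(u), PublicResource(res) -> readUserPermission(u, res)
    users = {s for (s, p, o) in facts if p == "type" and o == ":User"}
    public = {s for (s, p, o) in facts if p == "type" and o == ":PublicResource"}
    derived |= {(u, ":readUserPermission", r) for u in users for r in public}
    return derived


facts = {
    (":alice", "type", ":User"),
    (":bob", "type", ":User"),
    (":alice", ":permissionRole", ":analyst"),
    (":analyst", ":readPermission", ":sales"),
    (":wiki", "type", ":PublicResource"),
}
derived = derive_read_permissions(facts)
```

Here `:alice` gains read access to `:sales` through her role, and both users gain read access to the public `:wiki`.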
  13. Deriving Permissions
      ● User :user1 has delete permissions over any source
        ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source)
        ○ :user1 :deleteUserPermission :anySource
      ● Everybody has all permissions to the resources they created
        ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource)
        ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource)
        ○ ...
  14. Impact of Reasoning
      Can user :user1 delete resource :source1?
          ASK WHERE {
            { :user1 :deleteUserPermission :source1 . }
            UNION
            { :user1 :permissionRole ?role .
              ?role :deletePermission :source1 . }
            UNION
            { :user1 :resourceCreator :source1 . }
            UNION
            { :user1 :deleteUserPermission :anyResource . }
            UNION
            { :user1 :allUserPermissions :source1 . }
            UNION
            { ... }
            UNION ...
  15. Impact of Reasoning
      ● Are you sure you're not missing anything?
      ● New awesome way of getting delete permissions you came up with yesterday
      ● Model knowledge where it belongs and let the reasoner do the work for you:
          ASK WHERE { { :user1 :deleteUserPermission :source1 . } }
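A rough sketch of why the one-pattern ASK is enough once the rules are applied: saturate the graph with the delete-related rules from the "Deriving Permissions" slides, then do a single membership test. Assumptions are labeled in the code: in particular, the rule deriving `:deleteUserPermission` from `:allUserPermissions` is one of the rules the slide elides with "...", so it is a guess here, and `saturate` / `can_delete` are hypothetical helper names.

```python
def saturate(facts):
    """Forward-chain the rules until no new triple appears (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        sources = {s for (s, p, o) in facts if p == "type" and o == ":DataSource"}
        for (s, p, o) in facts:
            if p == ":deleteUserPermission" and o == ":anySource":
                new |= {(s, ":deleteUserPermission", src) for src in sources}
            elif p == ":resourceCreator":
                new.add((s, ":allUserPermissions", o))
            elif p == ":allUserPermissions":
                new.add((s, ":readUserPermission", o))
                # Assumption: the rules elided on the slide include this one.
                new.add((s, ":deleteUserPermission", o))
        if new <= facts:
            return facts
        facts |= new


def can_delete(facts, user, resource):
    """The single-pattern ASK, evaluated over the saturated graph."""
    return (user, ":deleteUserPermission", resource) in saturate(facts)


facts = {
    (":source1", "type", ":DataSource"),
    (":source2", "type", ":DataSource"),
    (":user1", ":deleteUserPermission", ":anySource"),
    (":user2", ":resourceCreator", ":source2"),
}
print(can_delete(facts, ":user1", ":source1"))
print(can_delete(facts, ":user2", ":source2"))
```

Adding a new way to obtain delete permission then means adding a rule to `saturate`, while the query stays unchanged.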
  16. Too much Inference?
      When I say
          :deleteUserPermission domain :User
          :deleteUserPermission range :Resource
      I mean that for every triple
          :user1 :deleteUserPermission :resource1
      the individual :user1 must be an instance of :User and :resource1 of :Resource.
      But the reasoner doesn't find the error!!
  17. Typing Constraint
      Only users can have delete user permissions
      ● :deleteUserPermission domain :User
      ● :user1 :deleteUserPermission :resource1
  18. Typing Constraint
      Only users can have delete user permissions
      ● :deleteUserPermission domain :User
      ● :user1 :deleteUserPermission :resource1

                    OWA                        CWA
      Consistent    true                       false
      Reason        Infer that                 Assume that
                    :user1 type :User          :user1 type not :User
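The two columns of the typing-constraint table can be sketched as two readings of the same domain axiom. This is an illustrative simplification with hypothetical function names (`check_owa`, `check_cwa`); in this tiny fragment the OWA reading can never be inconsistent, because nothing states that an individual is not a `:User`.

```python
DOMAIN = {":deleteUserPermission": ":User"}


def check_owa(triples):
    """OWA reading: the domain axiom adds type triples, so nothing is flagged."""
    inferred = {(s, "type", DOMAIN[p]) for (s, p, o) in triples if p in DOMAIN}
    return True, inferred


def check_cwa(triples):
    """CWA reading: a subject lacking the asserted type is a violation."""
    violations = {
        s for (s, p, o) in triples
        if p in DOMAIN and (s, "type", DOMAIN[p]) not in triples
    }
    return not violations, violations


data = {(":user1", ":deleteUserPermission", ":resource1")}
owa_ok, owa_inferred = check_owa(data)
cwa_ok, cwa_violations = check_cwa(data)
print(owa_ok, cwa_ok)
```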
  19. CWA or OWA?
      ● Which one?
        ○ Of course use both!
      ● Some axioms should be interpreted under CWA
          :deleteUserPermission domain :User
      ● And others under OWA
          :SuperUser subClassOf :User
      ● So the right thing happens
          :user1 :deleteUserPermission :resource1
          :user1 type :SuperUser
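A small sketch of mixing the two interpretations as the "CWA or OWA?" slide describes: the subclass axiom is used for inference first, and the domain axiom is then checked as a constraint over the result. The helpers `infer_types` and `validate` are hypothetical names, and the two-step pipeline is an assumption about how such a combination could work, not a description of Stardog's validation engine.

```python
SUBCLASS_OF = {":SuperUser": ":User"}        # read under OWA: used to infer
DOMAIN = {":deleteUserPermission": ":User"}  # read under CWA: used to check


def infer_types(triples):
    """Add the superclass types entailed by the subClassOf axioms."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closed):
            parent = SUBCLASS_OF.get(o) if p == "type" else None
            if parent and (s, "type", parent) not in closed:
                closed.add((s, "type", parent))
                changed = True
    return closed


def validate(triples):
    """Return the subjects that break a domain constraint after inference."""
    closed = infer_types(triples)
    return {
        s for (s, p, o) in closed
        if p in DOMAIN and (s, "type", DOMAIN[p]) not in closed
    }


good = {
    (":user1", ":deleteUserPermission", ":resource1"),
    (":user1", "type", ":SuperUser"),
}
bad = {(":user1", ":deleteUserPermission", ":resource1")}
print(validate(good), validate(bad))
```

The first dataset passes because `:user1 type :User` is inferred from `:SuperUser` before the constraint is checked; the second fails because nothing supports that type.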
  20. SLDC for Data Integration
      ● SLDC provides descriptions of data sources, relationships between them, and information to query them
      ● We can treat data sources as an integrated single data source
        ○ Distributed querying
        ○ AI analytics
      ● Virtual, materialized, hybrid
  21. Mappings
      ● Simple
        ○ pops:Employee subClassOf foaf:Person
        ○ pops:Project equivalentTo foaf:Project
        ○ pops:hasEmployee subPropertyOf foaf:member
      ● SWRL-based
        ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name)
        ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
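The two SWRL-based mappings can be mimicked in a few lines to show what they produce. This sketch assumes each person has at most one first and one last name, and the sample individuals (`pops:p1`, `pops:proj1`, `pops:proj2`) and function names are hypothetical.

```python
def apply_name_rule(triples):
    """pops:firstName + pops:lastName -> foaf:name, joined with one space."""
    first = {s: o for (s, p, o) in triples if p == "pops:firstName"}
    last = {s: o for (s, p, o) in triples if p == "pops:lastName"}
    return {
        (person, "foaf:name", f"{first[person]} {last[person]}")
        for person in first.keys() & last.keys()
    }


def apply_current_project_rule(triples):
    """pops:worksOnProject on an active project -> foaf:currentProject."""
    active = {s for (s, p, o) in triples
              if p == "type" and o == "pops:ActiveProject"}
    return {
        (s, "foaf:currentProject", o)
        for (s, p, o) in triples
        if p == "pops:worksOnProject" and o in active
    }


source = {
    ("pops:p1", "pops:firstName", "Ada"),
    ("pops:p1", "pops:lastName", "Lovelace"),
    ("pops:p1", "pops:worksOnProject", "pops:proj1"),
    ("pops:p1", "pops:worksOnProject", "pops:proj2"),
    ("pops:proj1", "type", "pops:ActiveProject"),
}
names = apply_name_rule(source)
current = apply_current_project_rule(source)
```

Only `pops:proj1` is mapped to `foaf:currentProject`, since `pops:proj2` is not typed as an active project.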
  22. Summing Up
      ● SLDC is a linked data catalog
        ○ Manage a variety of sources
        ○ Find sources
        ○ Query sources
      ● Implemented using Semantic Technologies
        ○ Reasoning
          ■ Axioms & Rules
        ○ Data validation
        ○ Data integration
  23. Questions?
  24. Why?
      ● Large organizations
        ○ Disparate departments
        ○ Independent, isolated sources
      ● Where is what?
        ○ Do we have a data source about clients?
        ○ Where is it?
      ● Who created what?
        ○ Who owns it?
      ● Who has access to what?
        ○ Do I have access to it?
        ○ Who do I talk to to get it?
  25. Source Management
      ● Management
        ○ Create, delete, update, clone
      ● Import
        ○ RDF, HTML, XML
      ● Subscription
        ○ Endpoint location
      ● Categorization
        ○ Categories
        ○ External vocabularies
      ● Sharing
        ○ To specific users
        ○ Public
  26. Querying Sources
      ● Querying metadata
        ○ Queries about the catalog itself
      ● External query
        ○ Querying a particular source
      ● Integrated query
        ○ Querying a set of integrated sources
      ● Query management
      ● Query sharing
      ● Results export
  27. Finding Sources
      ● Browse
        ○ Facets
        ○ Pelorus
      ● Search
        ○ Text-based search
        ○ Rich query language
  28. Last but not least
      ● NLP processing
        ○ Entity/Event extraction from natural language source descriptions
        ○ Better source classification & search
      ● Graph algorithms
        ○ What's the shortest path between these resources?
      ● Clustering
        ○ Can we discover similar sources based on given criteria?
  29. Axioms
      ● It's not always about simple taxonomies...
      ● What about domain/range axioms?
        ○ :someProperty domain :SomeClass
        ○ :a :someProperty :b
        ○ :SomeClass(x)?
      ● What about complex subclass chains?
        ○ :SomeClass subClassOf :someProperty some :OtherClass
        ○ :someProperty some :OtherClass subClassOf :AnotherClass
        ○ :a type :SomeClass
        ○ :AnotherClass(x)?
      ● What about cardinality constraints, universal quantification, datatype reasoning, ...?
  30. Data Validation
      ● Fundamental data management problem
        ○ Verify data integrity and correctness
        ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc.
      ● Relevant in many scenarios
        ○ Storing data for stand-alone applications
        ○ Exchanging data in distributed settings
      ● For some use cases, data validation is critical, but we still want to do it intelligently
  31. Participation Constraint
      Each resource must have been created by a user
      ● :Resource subClassOf inv(resourceCreator) some :User
      ● :resource1 type :Resource

                    OWA                                    CWA
      Consistent    true                                   false
      Reason        Infer that                             Assume that
                    ● _:b :resourceCreator :resource1      _:b :resourceCreator :resource1
                    ● _:b type :User                       is false
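The CWA column of the participation constraint amounts to looking for resources with no recorded creator. A minimal sketch, assuming explicit triples only and hypothetical sample data (`:alice`, `:resource2`); `missing_creator` is an illustrative helper, not an SLDC function.

```python
def missing_creator(triples):
    """Return resources with no explicit :resourceCreator triple from a :User."""
    users = {s for (s, p, o) in triples if p == "type" and o == ":User"}
    resources = {s for (s, p, o) in triples if p == "type" and o == ":Resource"}
    created = {o for (s, p, o) in triples
               if p == ":resourceCreator" and s in users}
    return resources - created


catalog = {
    (":resource1", "type", ":Resource"),
    (":resource2", "type", ":Resource"),
    (":alice", "type", ":User"),
    (":alice", ":resourceCreator", ":resource2"),
}
violations = missing_creator(catalog)
print(violations)
```

Under OWA the same data would be consistent, because the reasoner would simply assume an unnamed creator for `:resource1` exists.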
  32. Uniqueness Constraint
      Each data source must belong to at most one catalog entry
      ● :dataSource inverseFunctional
      ● :entry1 :dataSource :dataSource1
      ● :entry2 :dataSource :dataSource1

                    OWA                           CWA
      Consistent    true                          false
      Reason        Infer that                    Assume that
                    :entry1 sameAs :entry2        :entry1 sameAs :entry2 is false
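The CWA reading of the uniqueness constraint can be sketched as a grouping check: collect the entries that point at each data source and flag any source with more than one. This assumes differently named entries denote different things (a unique name assumption), and `shared_data_sources` is a hypothetical helper name.

```python
from collections import defaultdict


def shared_data_sources(triples):
    """Map each data source claimed by more than one entry to those entries."""
    owners = defaultdict(set)
    for (entry, p, source) in triples:
        if p == ":dataSource":
            owners[source].add(entry)
    return {src: sorted(entries)
            for src, entries in owners.items() if len(entries) > 1}


catalog = {
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
    (":entry3", ":dataSource", ":dataSource2"),
}
conflicts = shared_data_sources(catalog)
print(conflicts)
```

Under OWA there is no conflict to report: the inverse-functional axiom would instead lead the reasoner to conclude that `:entry1` and `:entry2` are the same individual.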