Stardog Linked Data Catalog

Héctor Pérez-Urbina
Edgar Rodríguez-Díaz

Clark & Parsia, LLC
{hector, edgar}@clarkparsia.com

Who are we?
● Clark & Parsia is a semantic software startup
● HQ in Washington, DC & office in Boston
● Provides software development and integration
services
● Specializing in Semantic Web, web services, and
advanced AI technologies for federal and
enterprise customers

http://clarkparsia.com/
Twitter: @candp

What's SLDC?
● Stardog Linked Data Catalog
● A catalog of data sources
○ Semi-structured
○ Relational
○ Object-oriented
○ ...
● Provides a coherent view over existing data
repositories so that users and/or
applications can easily find them and query
them

Use Cases
● Sources
○ Management, import, subscription,
categorization, sharing
● Query
○ Management, sharing, results export
○ Querying
■ Metadata, external sources, integration
● Locating sources
○ Search, browse
● NLP/AI
○ Entity extraction, graph algorithms, clustering
analysis

[Architecture diagram: application layer, middleware layer, NLP/AI analytics layer, data layer]

Semantic Technologies
● W3C standards
○ RDF(S), OWL, SPARQL
● Lower operational costs and raise productivity
○ Cooperation without coordination
○ Appropriate abstractions
○ Declarative is better than imperative
○ Correctness when it matters; sloppiness
when it doesn’t

Data Model
● Similar to DCAT from W3C
○ Catalog entries
● Enhanced with
○ SSD
○ VoID datasets
○ SKOS background models
○ Axioms & rules

Modeling the Domain
● Use of axioms to model relationships between classes
○ :Query subClassOf :Resource
○ :Entry subClassOf :Resource
● Retrieve the resources user :u can see
○ SELECT ?resource WHERE { ?resource type :Resource . }
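
To make the effect of the subclass axioms concrete, here is a minimal Python sketch, not SLDC's actual implementation. It treats triples as plain tuples and follows subClassOf until nothing new is inferred. The individuals :q1, :e1, and :u are hypothetical, as is the function name `subclass_closure`.

```python
# Illustrative sketch only: a tiny subclass reasoner over (s, p, o) tuples.
# The instance data (:q1, :e1, :u) is hypothetical and not from the slides.

def subclass_closure(triples):
    """Add inferred 'type' triples by following subClassOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for s, p, o in inferred if p == "subClassOf"}
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for child, parent in sub:
                if o == child and (s, "type", parent) not in inferred:
                    inferred.add((s, "type", parent))
                    changed = True
    return inferred

triples = {
    (":Query", "subClassOf", ":Resource"),
    (":Entry", "subClassOf", ":Resource"),
    (":q1", "type", ":Query"),
    (":e1", "type", ":Entry"),
    (":u", "type", ":User"),
}

closed = subclass_closure(triples)
# Equivalent of: SELECT ?resource WHERE { ?resource type :Resource . }
resources = sorted(s for s, p, o in closed if p == "type" and o == ":Resource")
print(resources)  # [':e1', ':q1']
```

Neither :q1 nor :e1 is asserted to be a :Resource; the query finds them only because the axioms are applied first.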

Security
● Authentication
○ Shiro-based implementation
○ Extensible to LDAP and/or AD
● Authorization
○ Eat-your-own-dog-food approach
○ Reasoning-based
○ Use of axioms & rules

Deriving Permissions
● Users have permission
roles
● Permission roles have
permission relations with
resources

● If a user has a permission role containing a
read permission associated with a resource,
then the user has the same permission over
the resource
:permissionRole(?user,?role),
:readPermission(?role,?resource) ->
:readUserPermission(?user,?resource)
● Everybody has read access to public
resources
:User(?user),
:PublicResource(?resource) ->
:readUserPermission(?user,?resource)

● User :user1 has delete permissions over any
source
○ :deleteUserPermission(?user,:anySource),
:DataSource(?source) ->
:deleteUserPermission(?user,?source)
○ :user1 :deleteUserPermission :anySource
● Everybody has all permissions to the resources
they created
○ :resourceCreator(?user,?resource) ->
:allUserPermissions(?user,?resource)
○ :allUserPermissions(?user,?resource) ->
○ ...
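
The rules above can be sketched as naive forward chaining in Python. This is an illustration under stated assumptions, not the reasoner SLDC uses: the individuals (:editor, :user2, :user3, :source2) are made up, and because the slides elide the head of the allUserPermissions rule, the sketch assumes that delete is one of the permissions it implies.

```python
# Sketch of the permission rules as naive forward chaining over (s, p, o) tuples.
# Individuals (:editor, :user2, :user3, :source2) are hypothetical.

def derive(facts):
    """Apply the rules repeatedly until no new fact appears."""
    derived = set(facts)
    while True:
        new = set()
        for s, p, o in derived:
            if p == ":permissionRole":
                # permissionRole(u, role), readPermission(role, r)
                #   -> readUserPermission(u, r)
                new |= {(s, ":readUserPermission", r)
                        for role, q, r in derived
                        if role == o and q == ":readPermission"}
            elif p == ":resourceCreator":
                # resourceCreator(u, r) -> allUserPermissions(u, r)
                new.add((s, ":allUserPermissions", o))
            elif p == ":allUserPermissions":
                # ASSUMPTION: the slides elide this rule's head; delete is assumed
                # to be one of the permissions that allUserPermissions implies.
                new.add((s, ":deleteUserPermission", o))
            elif p == ":deleteUserPermission" and o == ":anySource":
                # deleteUserPermission(u, :anySource), DataSource(src)
                #   -> deleteUserPermission(u, src)
                new |= {(s, ":deleteUserPermission", src)
                        for src, q, cls in derived
                        if q == "type" and cls == ":DataSource"}
        if new <= derived:
            return derived
        derived |= new

facts = {
    (":user1", ":permissionRole", ":editor"),
    (":editor", ":readPermission", ":source1"),
    (":user2", ":resourceCreator", ":source2"),
    (":user3", ":deleteUserPermission", ":anySource"),
    (":source1", "type", ":DataSource"),
    (":source2", "type", ":DataSource"),
}

closed = derive(facts)

def can_delete(user, resource):
    """After materialization, the permission check is a single lookup."""
    return (user, ":deleteUserPermission", resource) in closed

print(can_delete(":user2", ":source2"))  # True, via resourceCreator
print(can_delete(":user1", ":source1"))  # False, :user1 only has read access
```

Once the rules have run, a permission check does not need to enumerate every way a permission can arise.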

Impact of Reasoning
Can user :user1 delete resource :source1?
ASK WHERE {
{ :user1 :deleteUserPermission :source1 . }
UNION
{ :user1 :permissionRole ?role .
?role :deletePermission :source1 . }
UNION
{ :user1 :resourceCreator :source1 . }
UNION
{ :user1 :deleteUserPermission :anyResource . }
UNION
{ :user1 :allUserPermissions :source1 . }
UNION
{ ... }
UNION
...

Impact of Reasoning
● Are you sure you're not missing anything?
● What about the new, awesome way of getting delete
permissions you came up with yesterday?
● Model knowledge where it belongs and let the
reasoner do the work for you:
ASK WHERE {
{ :user1 :deleteUserPermission :source1 . }
}

Too much Inference?
When I say
:deleteUserPermission domain :User
:deleteUserPermission range :Resource
I mean that for every triple
:user1 :deleteUserPermission :resource1
the individual :user1 must be an instance of :User
and :resource1 of :Resource.

But the reasoner doesn't find the error!!

Typing Constraint
Only users can have delete user permissions
● :deleteUserPermission domain :User
● :user1 :deleteUserPermission :resource1

             OWA                             CWA
Consistent   true                            false
Reason       Infer that :user1 type :User    Assume that :user1 type not :User

CWA or OWA?
● Which one?
○ Of course use both!
● Some axioms should be interpreted under
CWA
:deleteUserPermission domain :User
● And others under OWA
:SuperUser subClassOf :User
● So the right thing happens
:user1 :deleteUserPermission :resource1
:user1 type :SuperUser
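
A minimal Python sketch of mixing the two interpretations, assuming a simple tuple representation of triples. The subClassOf axiom is used for inference, while the domain axiom is only checked as a constraint. The names `SUBCLASS_OF`, `DOMAIN_CONSTRAINTS`, `types_of`, and `violations` are hypothetical and not part of Stardog.

```python
# Sketch: the domain axiom is checked as a constraint (CWA),
# while subClassOf is used for inference (OWA). Helper names are hypothetical.

SUBCLASS_OF = {":SuperUser": ":User"}                     # interpreted under OWA
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}   # interpreted under CWA

def types_of(individual, triples):
    """Asserted types plus those inferred through subClassOf."""
    types = {o for s, p, o in triples if s == individual and p == "type"}
    frontier = list(types)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent and parent not in types:
            types.add(parent)
            frontier.append(parent)
    return types

def violations(triples):
    """Return triples whose subject is not known to be in the property's domain."""
    return [
        (s, p, o)
        for s, p, o in sorted(triples)
        if p in DOMAIN_CONSTRAINTS
        and DOMAIN_CONSTRAINTS[p] not in types_of(s, triples)
    ]

untyped = {(":user1", ":deleteUserPermission", ":resource1")}
typed = untyped | {(":user1", "type", ":SuperUser")}

print(violations(untyped))  # the constraint flags the untyped :user1
print(violations(typed))    # inference supplies :User, so nothing is flagged
```

The constraint never infers that :user1 is a :User, but it does accept a :User type that was inferred from :SuperUser.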

SLDC for Data Integration
● SLDC provides descriptions of data sources,
relationships between them, and information
to query them
● We can treat data sources as an integrated
single data source
○ Distributed querying
○ AI analytics
● Virtual, materialized, hybrid

Mappings
● Simple
○ pops:Employee subClassOf foaf:Person
○ pops:Project equivalentTo foaf:Project
○ pops:hasEmployee subPropertyOf foaf:member
● SWRL-based
○ pops:firstName(?person, ?first),
pops:lastName(?person, ?last),
swrlb:stringConcat(?name, ?first, " ", ?last) ->
foaf:name(?person, ?name)
○ pops:worksOnProject(?person,?project),
pops:ActiveProject(?project) ->
foaf:currentProject(?person,?project)
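
As a rough illustration of what these mappings produce, the Python sketch below applies some of them to a handful of source triples. The data (pops:alice, pops:acme, pops:p1, pops:p2) and the function name `apply_mappings` are hypothetical. The sketch also assumes each person has a single first and last name, and it omits the equivalentTo mapping.

```python
# Sketch: applying the simple and rule-based mappings to (s, p, o) tuples.
# All individuals below are made up for illustration.

def apply_mappings(triples):
    out = set(triples)
    # Assumes one first name and one last name per person.
    first = {s: o for s, p, o in triples if p == "pops:firstName"}
    last = {s: o for s, p, o in triples if p == "pops:lastName"}
    # firstName + lastName -> foaf:name (string concatenation rule)
    for person in first.keys() & last.keys():
        out.add((person, "foaf:name", f"{first[person]} {last[person]}"))
    active = {s for s, p, o in triples
              if p == "type" and o == "pops:ActiveProject"}
    for s, p, o in triples:
        # worksOnProject on an active project -> foaf:currentProject
        if p == "pops:worksOnProject" and o in active:
            out.add((s, "foaf:currentProject", o))
        # pops:Employee subClassOf foaf:Person
        if p == "type" and o == "pops:Employee":
            out.add((s, "type", "foaf:Person"))
        # pops:hasEmployee subPropertyOf foaf:member
        if p == "pops:hasEmployee":
            out.add((s, "foaf:member", o))
    return out

source = {
    ("pops:alice", "type", "pops:Employee"),
    ("pops:alice", "pops:firstName", "Alice"),
    ("pops:alice", "pops:lastName", "Smith"),
    ("pops:alice", "pops:worksOnProject", "pops:p1"),
    ("pops:alice", "pops:worksOnProject", "pops:p2"),
    ("pops:p1", "type", "pops:ActiveProject"),
    ("pops:acme", "pops:hasEmployee", "pops:alice"),
}

mapped = apply_mappings(source)
print(("pops:alice", "foaf:name", "Alice Smith") in mapped)  # True
```

A query written against the FOAF vocabulary can then be answered from data that was originally expressed only in the pops vocabulary.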

Summing Up
● SLDC is a linked data catalog
○ Manage a variety of sources
○ Find sources
○ Query sources
● Implemented using Semantic Technologies
○ Reasoning
■ Axioms & Rules
○ Data validation
○ Data integration

Why?
● Large organizations
○ Disparate departments
○ Independent, isolated sources
● Where is what?
○ Do we have a data source about clients?
○ Where is it?
● Who created what?
○ Who owns it?
● Who has access to what?
○ Do I have access to it?
○ Who do I talk to to get it?

Source Management
● Management
○ Create, delete, update, clone
● Import
○ RDF, HTML, XML
● Subscription
○ Endpoint location
● Categorization
○ Categories
○ External vocabularies
● Sharing
○ To specific users
○ Public

Querying Sources
● Querying metadata
○ Queries about the catalog itself
● External query
○ Querying a particular source
● Integrated query
○ Querying a set of integrated sources
● Query management
● Query sharing
● Results export

Finding Sources
● Browse
○ Facets
○ Pelorus
● Search
○ Text-based search
○ Rich query language

Last but not least
● NLP processing
○ Entity/Event extraction from natural language
source descriptions
○ Better source classification & search
● Graph algorithms
○ What's the shortest path between these
resources?
● Clustering
○ Can we discover similar sources based on
given criteria?

Axioms
● It's not always about simple taxonomies...
● What about domain/range axioms?
○ :someProperty domain :SomeClass
○ :a :someProperty :b
○ :SomeClass(x)?
● What about complex subclass chains?
○ :SomeClass subClassOf :someProperty
some :OtherClass
○ :someProperty some :OtherClass subClassOf
:AnotherClass
○ :a type :SomeClass
○ :AnotherClass(x)?
● What about cardinality constraints, universal
quantification, datatype reasoning, ...?

Data Validation
● Fundamental data management problem
○ Verify data integrity and correctness
○ Data corruption can lead to failures in applications, errors
in decision making, security vulnerabilities, etc.
● Relevant in many scenarios
○ Storing data for stand-alone applications
○ Exchanging data in distributed settings
● For some use cases, data validation is critical but
we still want to do it intelligently

Participation Constraint
Each resource must have been created by a user
● :Resource subClassOf inv(:resourceCreator) some :User
● :resource1 type :Resource

         OWA                                 CWA
Reason   Infer that                          Assume that
         ● _:b :resourceCreator :resource1   _:b :resourceCreator :resource1
         ● _:b type :User                    is false

Uniqueness Constraint
Each data source must belong to at most one
catalog entry
● :dataSource inverseFunctional
● :entry1 :dataSource :dataSource1
● :entry2 :dataSource :dataSource1

         OWA                                 CWA
Reason   Infer that :entry1 sameAs :entry2   Assume that :entry1 sameAs :entry2 is false
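
The participation and uniqueness constraints can be read as closed-world checks over the asserted data. The Python sketch below shows one way to express them over plain tuples. The function names and the extra individuals (:resource2, :user1) are hypothetical, and this is not how Stardog evaluates constraints internally.

```python
# Sketch: two integrity constraints checked under the closed-world assumption.
from collections import defaultdict

def missing_creator(triples):
    """Participation: every :Resource must have a :resourceCreator that is a :User."""
    resources = {s for s, p, o in triples if p == "type" and o == ":Resource"}
    users = {s for s, p, o in triples if p == "type" and o == ":User"}
    created = {o for s, p, o in triples
               if p == ":resourceCreator" and s in users}
    return sorted(resources - created)

def shared_data_sources(triples):
    """Uniqueness: each data source may belong to at most one catalog entry."""
    entries = defaultdict(set)
    for s, p, o in triples:
        if p == ":dataSource":
            entries[o].add(s)
    return {src: sorted(es) for src, es in entries.items() if len(es) > 1}

data = {
    (":resource1", "type", ":Resource"),
    (":resource2", "type", ":Resource"),
    (":user1", "type", ":User"),
    (":user1", ":resourceCreator", ":resource2"),
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
}

print(missing_creator(data))      # [':resource1']
print(shared_data_sources(data))  # {':dataSource1': [':entry1', ':entry2']}
```

Under OWA neither case is an error: a reasoner would posit an unknown creator for :resource1 and conclude that :entry1 and :entry2 are the same individual.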
