Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

SHACL-based data life cycle management

286 views

Published on

PoolParty Semantic Suite is Semantic Web Company’s platform for enterprise information integration based on Linked Data principles. PoolParty consists of several components that process and manage RDF based data sets. These components have consistency requirements towards the data they work on.
Also, users have requirements towards the quality of the data they manage. We want to express constraints for both in a standard way throughout PoolParty components. SKOS-based PoolParty Thesaurus project data requires both consistency and quality.

Published in: Technology
  • Login to see the comments

  • Be the first to like this

SHACL-based data life cycle management

  1. 1. Robert David CTO, Semantic Web Company Maura Moran Senior Content Consultant, Mekon PoolParty Semantic Suite Data Validation along the Linked Data Life Cycle
  2. 2. About 2 ▸ Linked Data Lifecycle ▸ Software Components ▸ Data consistency requirements ▸ Data validation standards ▸ Validation use cases ▸ Live demo
  3. 3. Knowledge Graph Management Along the Linked Data Life Cycle 3
  4. 4. Fact sheet: PoolParty PoolParty Semantic Suite ▸ Most complete Semantic Middleware on the Global Market ▸ Semantic AI: Fusion of Knowledge Graphs, NLP, and Machine Learning ▸ Linked Data Management along the whole Data Life Cycle ▸ W3C standards compliant ▸ First release in 2009 ▸ Current version 7.0 ▸ Over 200 installations world-wide ▸ On-premises or cloud-based ▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016 and 2017 ▸ www.poolparty.biz 4
  5. 5. PoolParty Components 5 PoolParty Thesaurus Server PoolParty Extractor PoolParty GraphEditor PoolParty UnifiedViews PoolParty Semantic Classifier PoolParty GraphSearch
  6. 6. 6 ▸ UnifiedViews ▸ Extractor ▸ Thesaurus Server ▸ GraphEditor ▸ GraphEditor ▸ UnifiedViews ▸ UnifiedViews ▸ Extractor ▸ Thesaurus Server ▸ Extractor ▸ UnifiedViews ▸ Semantic Classifier ▸ UnifiedViews ▸ Thesaurus Server ▸ API ▸ GraphSearch ▸ 3rd party Knowledge Graph Management Along the Linked Data Life Cycle
  7. 7. Data must be consistent so that: ▸ Applications can process them correctly ▸ Data quality is as expected But data is often dirty and complicated, especially if sourced from several applications Perform checks to ▸ ensure it conforms to the scheme you’ve set out ▸ is accurate Use relationships between concepts to perform better checks Data Consistency Motivation 7
  8. 8. Validation for the Linked Data Lifecycle RDF based validation approaches: ▸ SPARQL ▸ Closed World OWL ▸ ShEx ▸ SHACL Data Validation Standards 8
  9. 9. “a language for validating RDF graphs against a set of conditions” ▸ Use RDF to define the conditions ▸ Easy to understand by humans ▸ Can be processed by machines ▸ Well defined semantics ▸ Extendible via SPARQL ▸ W3C Recommendation SHACL Shapes Constraint Language 9
  10. 10. How does it work? ▸ Define shapes using RDF ▸ Shapes define how the data should look like ▸ A processor validates existing data against shapes ▹ detect inconsistencies ▹ improve quality ▸ The result is a conformance report listing violations where the data does not match the shapes SHACL Shapes Constraint Language 10
  11. 11. shape:PoolPartyConceptShape defines a SHACL shape a sh:NodeShape ; for a graph node sh:targetClass skos:Concept ; applied to all skos:Concepts sh:property [ which must satisfy sh:path skos:prefLabel ; sh:disjoint skos:altLabel ; skos:prefLabel and skos:altLabel have to be disjoint sh:uniqueLang true ; the language for skos:prefLabel literals is unique ] ; ... sh:property [ there is a path for each skos:Concept to a skos:ConceptScheme sh:path ( via skos:broader and skos:topConceptOf (and inverse) [ sh:zeroOrMorePath [ sh:alternativePath ( skos:broader [ sh:inversePath skos:narrower ])]] [ sh:alternativePath ( skos:topConceptOf [ sh:inversePath skos:hasTopConcept ])]) ; sh:minCount 1 ; ] ; sh:message "The concept violates PoolParty's concept definition" . reporting this message on violations SHACL Shapes Constraint Language 11
  12. 12. Validation Use Cases Validating data for PoolParty components 12
  13. 13. Component: PoolParty Thesaurus Server ▸ SKOS based data model ▸ Users can import RDF into project ▸ The components has requirements: ▹ SKOS ▹ Additional component-specific constraints ▸ Data has to be validated on import ▸ Data can be repaired for conformance Use Case 1 SKOS Thesaurus Import Validation 13
  14. 14. Component: PoolParty GraphEditor ▸ Ontology based data model ▸ Ontology driven UI ▸ Users can connect to graphs ▸ Users can work freely with RDF data ▸ Not restricted to SKOS ▸ But also less stability for data ▸ Flexible data validation is needed ▸ Define checks for different use cases Use Case 2 Graph Data Validation 14
  15. 15. Component: PoolParty GraphEditor Constraint: There must not be more than two active board members for each Legal Entity. Use Case 3 Legal Data Legal Definitions 15 Board MemberLegal Entity Active hasBoardMembership hasBoardMemberStatus
  16. 16. Component: PoolParty GraphEditor Constraint: If a Legal Entity has a country and a city assigned, then both must be related with a skos:narrower path, so that the geo information is consistent. Use Case 4 Legal Data Geo Consistency 16 Legal Entity Country City isLocatedInCountry isLocatedInCity skos:narrower
  17. 17. Component: PoolParty UnifiedViews ▸ Linked data orchestration tool ▸ Users process different formats XLS, CSV, XML creating “free-form” RDF ▸ RDF data processing works in pipelines ▸ Pipelines consist of Data Processing Units ▸ Data validation using SPARQL and ASK queries ▸ Standardized data validation is needed Use Case 5 UnifiedViews Validation 17
  18. 18. Component: PoolParty UnifiedViews/GraphEditor journal ⇒ impactFactor ⇔ ¬journal ∨ impactFactor Constraint: If a publication has a relation to a journal, that journal must have an impactFactor and a skos:prefLabel. Use Case 5 UnifiedViews Validation Publication dataset 18 Publication impactFactor skos:prefLabel journal
  19. 19. Use Case 5 Shape with logical operators vs SPARQL :PublicationShape a sh:NodeShape ; sh:targetClass :Publication ; sh:property [ sh:path sweb:journal ; sh:sparql [ a sh:SPARQLConstraint ; sh:select """SELECT $this WHERE { $this $PATH ?journal; FILTER NOT EXISTS { ?journal :impactFactor ?impactFactor . ?journal skos:prefLabel ?label . } }""" ; ] ; ] . 19
  20. 20. CONNECT Robert David CTO, Semantic Web Company ▸ robert.david@semantic-web.com ▸ https://www.linkedin.com/in/robert-david-39b47692/ ▸ https://twitter.com/semwebcompany ▸ https://blog.semantic-web.at/ 20 © Semantic Web Company - http://www.semantic-web.com and http://www.poolparty.biz/
  21. 21. CONNECT Maura Moran Senior Content Consultant, ▸ Maura.moran@mekon.com ▸ @MauraMoran17 ▸ linkedin.com/in/moranmaura 21
  22. 22. Why do we need data consistency? Software components: ▸ Stability for application logic ▸ Correctness of processed results Users: ▸ Correctness of analysis results ▸ Quality of data Data Consistency Motivation 22
  23. 23. ▸ Software components support the Linked Data Lifecycle ▸ Managed data has to conform to requirements of software components ▸ Components need input / output validation for data ▸ Ensure stability for software components ▸ Correctness of processed results Data Consistency Software Components and the Linked Data Life Cycle 23
  24. 24. PoolParty Component Data Flow 24

×