This document describes a two-fold quality assurance approach for dynamic knowledge bases that was used in the 3cixty knowledge base project. The approach combines exploratory testing with the Loupe tool to uncover unexpected errors and scripted fine-grained analysis with SPARQL queries executed by the SPARQL Interceptor tool. The two techniques are complementary and helped identify issues such as inconsistencies, outliers, missing properties, and constraint violations during knowledge base generation. The approach adapts lessons learnt from software engineering to the quality assurance of knowledge bases.
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases: The 3cixty Use Case
1. A Two-Fold Quality Assurance Approach
for Dynamic Knowledge Bases:
The 3cixty Use Case
31st of May, 2016
1st International Workshop on Completing and Debugging the Semantic Web
at the 13th Extended Semantic Web Conference
Nandana Mihindukulasooriya1, Giuseppe Rizzo2, Raphaël Troncy3,
Oscar Corcho1, and Raúl García-Castro1
1Ontology Engineering Group, UPM, Spain.
2ISMB, Italy.
3EURECOM, France.
Acknowledgments:
FPI grant (BES-2014-068449), Innovation activity 3cixty (14523) of EIT Digital,
and 4V (TIN2013-46238-C4-2-R), Juan Carlos Ballesteros (Localidata)
3. 3cixty knowledge base
A Semantic Web platform that makes it possible to build real-world and
comprehensive knowledge bases in the domain of culture and tourism
for cities, using public information about places and events.
5. Motivation
• Data with 4Vs
• Volume, Variety, Velocity, Veracity
• Evolving schema
• Plenty of tools involved in the process
• Multiple geographically dispersed teams
• Dependent applications
Many chances for potential errors
The need for a good quality assurance approach
6. Can we adapt some lessons learnt from Software Engineering for knowledge base generation?
8. Cost of defects vs. time
[Chart: cost of a defect plotted against time; axes: Time (x), Cost (y)]
9. Agile testing quadrants
[Quadrant diagram labels: check for expected outputs; analyze the undefined, unknown, and unexpected]
10. A Two-Fold Quality Assurance Approach
• Two techniques
• Scripted fine-grained analysis
• checking for expected results
• Exploratory testing
• analyzing the unexpected results
• Two techniques are complementary
• Exploratory testing can provide heuristics for fine-grained analysis
• Supported by two tools
• SPARQL Interceptor
• Loupe
11. Exploratory Testing
Simultaneous learning, test design, and test execution
Minimal planning and maximum test execution
12. Loupe – Linked Data Inspector
• Web application for exploring and inspecting datasets
• Class explorer
• Property explorer
• Triple pattern explorer
• Named graph explorer
• Starts from high-level statistics and allows “zooming in” to several levels of detail
• Analysis of different datatypes
• most common and least common values
• numeric - min, max, mode, std. dev
• string – string length, URI-like strings
• Avoids the need for boilerplate SPARQL queries
• Ability to view the relevant data directly
15. Fine-grained analysis
• a set of user-defined SPARQL queries (as unit tests)
• Knowledge-base specific
[Diagram: test SPARQL queries are derived from system requirements, schema constraints, conventions and other restrictions, and inputs from exploratory testing]
16. SPARQL Interceptor
• seamless integration with the Jenkins continuous integration system
• executes automatically for each build
• provides
• summary reports
• configurable email notifications
• for each failed test
• the reason for the failure
• a description of the query
• a link to the failed data using a SPARQL endpoint
18. Defects found in exploratory testing
• Inconsistencies in using vocabularies
• locn:hasAddress vs. schema:streetAddress
• http://xmlns.com/foaf/0.1/ and http://xmlns.com/foaf/spec/
• URIs as strings
• “http://.....”
• Outliers
• Typos
• class names in lowercase
• Inconsistencies with the schema
• domain, range
• Value patterns
• codes with 5 letters, URIs with given prefix
• Date time format inconsistencies
19. Defects found in fine-grained analysis
• issues related to property cardinalities
• missing properties
• Each dul:Place or lode:Event must have a title
• presence of duplicated properties
• dul:Place or lode:Event must have exactly one geo location
• missing language labels
• one label per language
• Out-of-bound values for fixed upper and lower limits
• Neighboring cells in a grid (3 to 8)
• Datatype syntax errors
• numeric types
• Datetime types
20. Defects found in fine-grained analysis
• Constraints on value ranges
• geo:lat and geo:long must be within the city’s bounding box area
• triples not associated with producer graphs
• each triple must belong to a producer graph
• presence of unsolicited instances
• e.g., home locations must be removed from the knowledge base
21. Conclusions and future work
• Dynamic knowledge bases require good quality assurance approaches
• Knowledge-base publishers can learn from / adapt practices from software engineering
• Supporting tools improve quality assurance
• In the future,
• Integration with outlier detection algorithms
• Generation of constraints in Loupe
• Integration of SPARQL Interceptor with W3C SHACL