This document discusses JURION's quality assurance efforts. It describes a quality assurance trial that used video to check for errors in legal data. Example errors include inconsistent representations of people in companies over time and shares that do not sum to the partnership capital value. The document then outlines next steps: developing a "schema change" use case to improve the linking of data from different sources and lifecycles in a holistic manner.
7. Example errors in data that we would like to find
• Relations-based: The same person holds both a representative role and an oversight role for the same company at the same moment in time.
• Data-based:
– Do the shares sum up to the partnership capital value at every moment in time?
– Did a company publish multiple annual reports?
• Mixed: Are there multiple shareholders even though the company is labeled as "Sole Shareholder"?
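The checks above are consistency rules over data that changes over time. The following is a minimal sketch, assuming each fact is stored with a validity interval [start, end). The `Fact` type, its kind labels, the sample company and person names, and the reading of "multiple annual reports" as more than one report for the same fiscal year are all hypothetical; this is not JURION's actual data model or implementation. Because every value is piecewise constant, evaluating each rule at every interval boundary covers "every moment in time".

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date

OPEN = date.max  # hypothetical marker for "still valid"


@dataclass(frozen=True)
class Fact:
    """A hypothetical time-bounded fact, valid on the interval [start, end)."""
    company: str
    kind: str      # "representative", "oversight", "share" or "capital"
    subject: str   # person or shareholder; "" for company-level facts
    start: date
    end: date = OPEN
    amount: int = 0  # integer amount (e.g. cents) for shares and capital


def valid_at(facts, t):
    return [f for f in facts if f.start <= t < f.end]


def change_points(facts):
    # Values are piecewise constant, so checking every interval boundary
    # covers "every moment in time".
    return sorted({f.start for f in facts} | {f.end for f in facts})


def role_conflicts(facts):
    """Relations-based: same person is representative and oversight of one company at once."""
    found = set()
    for t in change_points(facts):
        kinds = defaultdict(set)
        for f in valid_at(facts, t):
            if f.kind in ("representative", "oversight"):
                kinds[(f.company, f.subject)].add(f.kind)
        found |= {key for key, ks in kinds.items() if len(ks) == 2}
    return found


def share_mismatches(facts):
    """Data-based: shares must sum to the partnership capital (one entry per change point)."""
    out = []
    for t in change_points(facts):
        now = valid_at(facts, t)
        for company in sorted({f.company for f in now if f.kind == "capital"}):
            capital = sum(f.amount for f in now if f.company == company and f.kind == "capital")
            shares = sum(f.amount for f in now if f.company == company and f.kind == "share")
            if shares != capital:
                out.append((company, t, shares, capital))
    return out


def duplicate_annual_reports(reports):
    """Data-based: (company, fiscal_year) pairs published more than once (assumed reading)."""
    return sorted(key for key, n in Counter(reports).items() if n > 1)


def sole_shareholder_violations(facts, sole_labeled):
    """Mixed: a company labeled "Sole Shareholder" has several shareholders at some moment."""
    found = set()
    for t in change_points(facts):
        holders = defaultdict(set)
        for f in valid_at(facts, t):
            if f.kind == "share" and f.company in sole_labeled:
                holders[f.company].add(f.subject)
        found |= {c for c, h in holders.items() if len(h) > 1}
    return found


# Hypothetical sample data for a limited partnership.
facts = [
    Fact("Acme KG", "representative", "A. Meier", date(2010, 1, 1), date(2015, 1, 1)),
    Fact("Acme KG", "oversight", "A. Meier", date(2014, 6, 1)),
    Fact("Acme KG", "capital", "", date(2010, 1, 1), amount=100_000),
    Fact("Acme KG", "share", "B. Huber", date(2010, 1, 1), amount=60_000),
    Fact("Acme KG", "share", "C. Weber", date(2010, 1, 1), date(2012, 1, 1), amount=40_000),
]
reports = [("Acme KG", 2012), ("Acme KG", 2013), ("Acme KG", 2013)]

print(role_conflicts(facts))                            # {('Acme KG', 'A. Meier')}
print(share_mismatches(facts)[0])                       # ('Acme KG', datetime.date(2012, 1, 1), 60000, 100000)
print(duplicate_annual_reports(reports))                # [('Acme KG', 2013)]
print(sole_shareholder_violations(facts, {"Acme KG"}))  # {'Acme KG'}
```

In a Linked Data setting, the same rules could instead be written as SPARQL queries or SHACL shapes against the RDF store; the Python version only makes the temporal logic explicit.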
8. JURION Content Pipeline
[Diagram: JURION content pipeline. Components: proprietary data sources and external metadata sources; crawler/importer (XML); metadata extraction and enrichment; linking/pattern recognition (XML); CMS; metadata DB; metadata editor; thesaurus manager; PCI metadata; PCI indexer; SQL DB; search; retrieval/search/application; end-user app; quality check (RDF, ontology). Layers: Content Management Model and DB; Conceptual Data Model.]
12. Summary
• Break up silos of isolated lifecycles; take a holistic view of the overall process and start optimizing on this basis
• Use Linked Open Data (LOD) technology to improve data quality
• Build models of your lifecycles where you need them for practical reasons (e.g. interaction points)
• Re-use standards wherever possible
• Solve problems where they originate, not where they show up
• Never underestimate the complexity that comes with mass data from different sources, of varying quality, created for different purposes
• Don’t look for a generic solution; take an iterative and lean approach instead