Dr. Dimitris Gavrilis, Digital Curation Unit - IMIS, Athena R.C.
CARARE Workshop: Archaeology and Architecture in Europeana
Leiden, Netherlands
13-14th June 2017
Data adventures in heritage science
Dimitris Gavrilis,
Digital Curation Unit – IMIS,
Athena Research Centre
Archaeology and Architecture in Europeana,
Leiden, 13-14th
June 2017
Introduction
•A large amount of content (data) exists and is being produced in
heritage science
• Made easier by technology
• Organizations such as Europeana
• …
Technology brings changes
•Richness of both data and metadata (resolution,
expressiveness) has become a problem
• Smartphones can now make use of 4K
• APIs enable easy re-use of content
•A search bar/form is not what people want to use anymore
•Stories / collections / curated content are the new trend
• Navigation using tap/gestures … not the keyboard
• Tap
Resolution and Quality
•All of these “new” trends require better quality content:
• Higher resolution for data
• Richer, more expressive metadata
Data models, metadata schemas
•One to rule them all?
•We’re not there yet…
Diversity & aggregation
•Specialized, highly expressive data models
CARARE schema
•Aggregators enable transformation of metadata among
different data models (or metadata schemas)
Quality-driven enrichment?
Lots of services out there
but…
How do I detect which parts of the record need to be
enriched?
Identifying quality issues
•Presence of an element
•Rules
• e.g. Schematron
•Statistical metrics
• e.g. Distinct number of values
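As a rough illustration of the first and last checks above, the sketch below tests for the presence of a non-empty element and counts the distinct values of that element across a set of records. The namespace URI, the `car:record` wrapper element, and the sample records are hypothetical and only serve the example; only the `car:subject` element name comes from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and sample records, for illustration only.
NS = {"car": "http://example.org/carare"}
WRAP = '<car:record xmlns:car="http://example.org/carare">{}</car:record>'

RECORDS = [
    WRAP.format("<car:subject>Art on a portable stone: burial cairn</car:subject>"),
    WRAP.format("<car:subject>burial cairn</car:subject>"),
    WRAP.format("<car:subject/>"),  # element present but empty
    WRAP.format("<car:subject>burial cairn</car:subject>"),
]


def has_element(record_xml, tag):
    """Presence check: True if the record has a non-empty element `tag`."""
    element = ET.fromstring(record_xml).find(tag, NS)
    return element is not None and bool((element.text or "").strip())


def distinct_values(records, tag):
    """Statistical metric: the set of distinct non-empty values of `tag`."""
    values = set()
    for record_xml in records:
        for element in ET.fromstring(record_xml).findall(tag, NS):
            text = (element.text or "").strip()
            if text:
                values.add(text)
    return values


missing = [i for i, r in enumerate(RECORDS) if not has_element(r, "car:subject")]
n_distinct = len(distinct_values(RECORDS, "car:subject"))
```

A very low number of distinct values across many records can hint that a field was filled with a default value rather than curated per record.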
Content based quality metrics
<car:subject> Art on a portable stone: burial cairn </car:subject>
Extract some features
• Length of text: 34
• Number of words: 7
• Number of common words: 2
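The feature extraction above could be sketched roughly as follows. The list of common words and the tokenization rules are assumptions made for this example, not the ones used in the talk; the exact text length in particular depends on how whitespace and punctuation are counted, so it need not match the figure on the slide.

```python
# Assumed list of common words (stop words); the talk does not specify one.
COMMON_WORDS = {"a", "an", "the", "on", "of", "in", "and", "to"}


def extract_features(text):
    """Return (length of text, number of words, number of common words)."""
    text = text.strip()
    # Split on whitespace and drop surrounding punctuation from each token.
    words = [token.strip(".,:;!?()").lower() for token in text.split()]
    words = [word for word in words if word]
    common = sum(1 for word in words if word in COMMON_WORDS)
    return (len(text), len(words), common)


features = extract_features(" Art on a portable stone: burial cairn ")
length, n_words, n_common = features
```

With this tokenization the subject yields 7 words, 2 of which ("on", "a") are in the assumed common-word list.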
Let’s put this on a graph
• Extract features from all the content available
• Cluster them into 2 classes
• Good
• Bad
• For every new record, identify the
subject, extract the features
• Measure its distance from the two
cluster centers (Good, Bad).
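The steps above can be sketched with a small two-cluster k-means written in plain Python. This is a minimal sketch under stated assumptions, not the method used in the talk: the training feature vectors are made up, the features are left unscaled, and the rule that the cluster with the longer text is "Good" is an assumption of this example.

```python
import math

# Made-up feature vectors: (length of text, number of words, common words).
POINTS = [(4, 1, 0), (6, 1, 0), (5, 1, 0), (37, 7, 2), (42, 8, 3), (35, 6, 2)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(p[i] for p in points) / len(points)
                 for i in range(len(points[0])))


def two_means(points, iterations=20):
    """k-means with k=2, started from the smallest and largest vectors."""
    centers = [min(points), max(points)]
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if distance(p, centers[0]) <= distance(p, centers[1]) else 1
            groups[nearest].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:  # assignments no longer change
            break
        centers = updated
    return centers


def classify(features, good, bad):
    """Label a new record by its nearer center; also return both distances."""
    d_good, d_bad = distance(features, good), distance(features, bad)
    return ("Good" if d_good < d_bad else "Bad", d_good, d_bad)


# Assumption: the cluster with the shorter text is "Bad", the other "Good".
bad, good = sorted(two_means(POINTS))
label, d_good, d_bad = classify((34, 7, 2), good, bad)
```

In practice the features would usually be normalized first, since the text length otherwise dominates the distance.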
What then?
•Automatic enrichment
•Crowd sourcing
•Gamification
•…