LoCloud EVA / Minerva Workshop 2015
Workshop organised by LoCloud as part of the XIIth Annual International Conference for Professionals in Cultural Heritage
Vassilis Tzouvaras and Dimitris Gavrilis
National Technical University of Athens
Digital Curation Unit - IMIS, Athena Research Center
Jerusalem, Israel
8 November 2015
1. The MINT Mapping Tool
The MoRe aggregator
LoCloud is funded by the
European Commission's ICT Policy Support Programme
2. Cultural Heritage Content
• Diversity of cultural heritage content
– Numerous metadata schemas to annotate content (LIDO, CIDOC-CRM, EAD, METS)
• Massive digitization and annotation activities are in
progress
• Need for interoperability
3. MINT Mapping Tool
• Provides users the ability to perform a mapping of
their own metadata schemas to reference domain
models
• Follows a typical web-based architecture
• It was developed for ATHENA, but it is currently used
for EUScreen, CARARE, Judaica, ECLAP, DCA and
Linked Heritage
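The idea of mapping a provider's own schema to a reference model can be illustrated with a minimal sketch. It assumes flat records and a plain field-to-field mapping table; the field names and the `apply_mapping` helper are hypothetical and do not reflect how MINT represents or executes mappings internally.

```python
# Hypothetical mapping from a provider's local field names to target elements.
MAPPING = {
    "titel": "dc:title",
    "ersteller": "dc:creator",
}

def apply_mapping(record, mapping):
    """Return (mapped_record, unmapped_fields) for a flat source record."""
    target = {}
    unmapped = []
    for field_name, value in record.items():
        if field_name in mapping:
            target[mapping[field_name]] = value
        else:
            # Keep track of fields the user still has to map.
            unmapped.append(field_name)
    return target, unmapped
```

Calling `apply_mapping({"titel": "Vase", "inventar": "42"}, MAPPING)` returns the mapped record together with the list of source fields that have no mapping yet.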
4. MINT 2 – What’s new?
• The backend was reconstructed for better
performance
– Larger files can be imported
• The frontend was updated
– New interface
– Workflow is integrated in UI
– Facilitated browsing of input and target schema
12. MoRe Overall Architecture
(Architecture diagram. Components labelled on the slide:)
• Registry
• Storage
– Apache Cassandra cluster
– Fedora-commons
– Temporary storage
• Vocabulary services
• Messaging
– JMS logging
• Core services
• Enrichment service management
– Entity matching / NLP
– Geocoding / Historic place names
– External enrichment services (REST)
• Publish service management
– OAI-PMH
– RDF Store
– Elastic Search
– Archive
14. Distributed
• Enrichment services run in:
– Austria
– Spain
– Greece
– Lithuania
– Slovenia
– Norway
• Scalability can be facilitated through a virtualization
infrastructure
17. Core services: Harvesting
• Core services pipeline: Harvesting, Validation, Ingestion, Transformation, Enrichment, Previewing, Publishing
• Harvests content from metadata sources
– OAI-PMH repository
– MINT
– LoCloud Collections
– Wikimedia
• Multiple schemas are supported
– OAI_DC
– CARARE
– CARARE 2.0
– LIDO
– EAD
– EDM
– ESE
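As an illustration of harvesting from an OAI-PMH repository, here is a minimal sketch that parses one ListRecords response. The element names and namespace come from the OAI-PMH 2.0 protocol; the `parse_list_records` function and the sample record are assumptions for illustration, not MoRe code.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        metadata = rec.find("oai:metadata", OAI_NS)
        # The first child of <metadata> is the record in the provider's schema.
        payload = None
        if metadata is not None and len(metadata) > 0:
            payload = ET.tostring(metadata[0], encoding="unicode")
        records.append({"identifier": identifier, "payload": payload})
    # A non-empty resumptionToken means more pages are available.
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)

# Hypothetical response body used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata><dc xmlns="http://example.org/dc"><title>Vase</title></dc></metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A harvester would repeat the request with the returned token until no token comes back.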
18. Core services: Validation
• Validates incoming information packages
• Executes validation schemes
• Validation micro-services
– Structure
– Schema
– Linking
– Schematron rules
• Flexible
• How it is used in MoRe:
– Pre-validation
– Post-validation
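A validation scheme built from small micro-services might look like the sketch below. It assumes each check is a function that returns a list of error messages; the required-element check is a simplified stand-in for real schema or Schematron validation, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_structure(xml_text):
    """Structure check: the package must be well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return []
    except ET.ParseError as err:
        return [f"structure: {err}"]

def check_required(xml_text, required=("title", "identifier")):
    """Simplified stand-in for a schema check: required elements must occur."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []  # already reported by the structure check
    # Strip namespaces so only local element names are compared.
    present = {el.tag.split("}")[-1] for el in root.iter()}
    return [f"schema: missing <{name}>" for name in required if name not in present]

def run_validation(xml_text, checks):
    """Run each micro-service in order and collect all error messages."""
    errors = []
    for check in checks:
        errors.extend(check(xml_text))
    return errors
```

The same `run_validation` call could be used both before transformation (pre-validation) and after it (post-validation), with a different list of checks each time.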
19. Core services: Ingestion
• Ingests content into storage
• Uses storage layer API
• Pluggable drivers for attaching different technologies / repositories
– Apache Cassandra
– Filesystem-based
– Fedora-commons
• Versioning support
• Complex digital object support
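The pluggable-driver idea can be sketched as one abstract interface with interchangeable back ends. The `StorageDriver` interface and both drivers below are hypothetical; they only illustrate how ingestion code can stay independent of the storage technology, and they assume object identifiers are safe to use as directory names.

```python
from abc import ABC, abstractmethod
from pathlib import Path

class StorageDriver(ABC):
    """Assumed storage layer API: every put() creates a new version."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> int: ...

    @abstractmethod
    def get(self, object_id: str, version: int = -1) -> bytes: ...

class MemoryDriver(StorageDriver):
    def __init__(self):
        self._versions = {}

    def put(self, object_id, data):
        self._versions.setdefault(object_id, []).append(data)
        return len(self._versions[object_id])  # 1-based version number

    def get(self, object_id, version=-1):
        versions = self._versions[object_id]
        return versions[-1] if version == -1 else versions[version - 1]

class FilesystemDriver(StorageDriver):
    def __init__(self, root):
        self.root = Path(root)

    def put(self, object_id, data):
        folder = self.root / object_id
        folder.mkdir(parents=True, exist_ok=True)
        version = len(list(folder.glob("v*.bin"))) + 1
        (folder / f"v{version}.bin").write_bytes(data)
        return version

    def get(self, object_id, version=-1):
        folder = self.root / object_id
        if version == -1:  # -1 means "latest version"
            version = len(list(folder.glob("v*.bin")))
        return (folder / f"v{version}.bin").read_bytes()

def ingest(driver: StorageDriver, object_id: str, data: bytes) -> int:
    """Ingestion only talks to the interface, never to a concrete back end."""
    return driver.put(object_id, data)
```

A driver for Apache Cassandra or Fedora-commons would implement the same two methods.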
20. Core services: Content Model
• Digital objects comprise data streams
• Each data stream can hold any kind of information
– XML/RDF, Image, Video, Documents, etc.
• Each different representation of an information object is stored as a different data stream
• Each curation action generates a new version
– Transformation, Enrichment
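The content model described above could be sketched with two small classes. The class and attribute names are assumptions chosen for illustration; the slide does not specify how MoRe implements the model.

```python
from dataclasses import dataclass, field

@dataclass
class DataStream:
    label: str        # one representation, e.g. "LIDO" or "EDM"
    media_type: str   # e.g. "application/xml", "image/jpeg"
    versions: list = field(default_factory=list)

    def add_version(self, content: bytes, action: str) -> int:
        """Each curation action (transformation, enrichment) adds a version."""
        self.versions.append((action, content))
        return len(self.versions)

    def latest(self) -> bytes:
        return self.versions[-1][1]

@dataclass
class DigitalObject:
    identifier: str
    streams: dict = field(default_factory=dict)

    def stream(self, label, media_type="application/xml"):
        """Return the data stream for a representation, creating it if needed."""
        return self.streams.setdefault(label, DataStream(label, media_type))
```

For example, an object can hold the harvested record in one stream and its EDM representation in another, with the EDM stream gaining a new version after every enrichment run.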
21. Core services: Transformation
• Transforms entire information packages into the Europeana Data Model (EDM), or any other schema
• Multiple transformation routines
– Per schema
– Per project
– Per provider
• User can attach rights statement
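One way to organise routines per schema, per project and per provider is a lookup that prefers the most specific match. The precedence order (provider over project over schema) is an assumption made for this sketch, as are the function and key names.

```python
def select_routine(routines, schema, project=None, provider=None):
    """Pick the most specific routine registered for (schema, project, provider)."""
    candidates = [
        (schema, project, provider),  # provider-specific routine
        (schema, project, None),      # project-wide routine
        (schema, None, None),         # generic routine for the schema
    ]
    for key in candidates:
        if key in routines:
            return routines[key]
    raise LookupError(f"no transformation routine for schema {schema!r}")

# Hypothetical registry: each routine turns a source record into a target record.
ROUTINES = {
    ("LIDO", None, None): lambda record: {"via": "generic", **record},
    ("LIDO", "LoCloud", None): lambda record: {"via": "project", **record},
    ("LIDO", "LoCloud", "museum-1"): lambda record: {"via": "provider", **record},
}
```

With this registry, a record from an unknown provider in the LoCloud project falls back to the project routine, and a record from another project falls back to the generic LIDO routine.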
22. Core services: Enrichment
• The generic enrichment service facilitates the execution of the enrichment micro-services
– Hides the complexity from the user by using enrichment plans
– Provides seamless integration with the UI of MoRe
• Virtual Enrichment driver
– Allows developers/creative industries to create their own enrichment services and declare/use them within MoRe
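The "declare and use" pattern for third-party enrichment services can be sketched as a simple registry. This is a hypothetical illustration: the `enrichment_service` decorator, the registry and the example service are not part of MoRe's actual interface.

```python
SERVICES = {}

def enrichment_service(name):
    """Decorator that declares a callable as an enrichment micro-service."""
    def register(func):
        SERVICES[name] = func
        return func
    return register

@enrichment_service("uppercase-title")
def uppercase_title(record):
    # Toy enrichment: returns a copy of the record with an upper-cased title.
    return {**record, "title": record.get("title", "").upper()}

def enrich(record, service_name):
    """Generic entry point: callers only need the declared service name."""
    return SERVICES[service_name](record)
```

The caller of `enrich` never needs to know where a service is implemented, which is the property that lets externally developed services be plugged in.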
23. Core services: Previewing
• Preview the XML record information for all data streams
• Preview the record in HTML (using the Europeana style sheet)
24. Core services: Publishing
• Publishes transformed / enriched information
– Internal OAI-PMH provider
– XML export
– Publish directly to RDF repositories: Sesame, Virtuoso
– Solr index server
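As a small illustration of preparing a record for an RDF repository, the sketch below serialises literal-valued properties as N-Triples, a line-based RDF syntax that triple stores can typically load. The `to_ntriples` helper and the example URIs are assumptions; this is not the MoRe publish service.

```python
def to_ntriples(subject_uri, properties):
    """Serialise {predicate_uri: literal_value} as N-Triples lines."""
    lines = []
    for predicate_uri, value in sorted(properties.items()):
        # Escape the characters that N-Triples string literals require.
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject_uri}> <{predicate_uri}> "{escaped}" .')
    return "\n".join(lines)

# Hypothetical record: one subject with a single Dublin Core title.
triples = to_ntriples(
    "http://example.org/object/1",
    {"http://purl.org/dc/elements/1.1/title": 'The "Vase"'},
)
```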
25. Enrichment micro-services
• Thematic
– Thesauri collections
– Vocabulary matching
– Background links
• Spatial
– Geo normalization
– Geo coding
– Reverse geo-coding
– Historic place names
• Other
– Language identification
• Resources shown in the slide diagram: SKOS thesauri, GeoNames, DBpedia, Wikipedia
26. Enrichment Plan
• Enrichment micro-services are used within enrichment workflows:
– Enrichment plans
• Each enrichment plan applies to a specific schema
• Each enrichment plan executes enrichment micro-services in a specific order
• Example plan (from the slide diagram): Language identification, Vocabulary matching, Geo-normalization, Geo-coding
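An enrichment plan, as described above, binds a schema to an ordered list of micro-services. The sketch below assumes records are plain dictionaries; the two services are toy placeholders (the language check is a crude heuristic, not a real language identifier), and all names are hypothetical.

```python
def identify_language(record):
    # Placeholder heuristic: Greek letters present means "el", else undetermined.
    text = record.get("description", "")
    is_greek = any("\u0370" <= ch <= "\u03ff" for ch in text)
    return {**record, "language": "el" if is_greek else "und"}

def normalize_place(record):
    # Toy geo-normalization: trim whitespace and fix capitalisation.
    return {**record, "place": record.get("place", "").strip().title()}

# The plan fixes both the schema it applies to and the execution order.
PLAN = {"schema": "EDM", "steps": [identify_language, normalize_place]}

def run_plan(plan, record, schema):
    """Apply the plan's micro-services in order to a record of the right schema."""
    if schema != plan["schema"]:
        raise ValueError(f"plan applies to {plan['schema']}, not {schema}")
    for step in plan["steps"]:
        record = step(record)
    return record
```

Because each step receives the output of the previous one, later services can rely on fields added earlier, which is why the order matters.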
27. Enrichment Plan
• Each enrichment plan defines run-time parameters for specific services
– Content based
• Example plan (from the slide diagram): Language identification, Vocabulary matching, Geo-normalization, Geo-coding
– Example content-based rule: add subject collection A only if term X or Y is matched
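The content-based rule in the slide example ("add subject collection A only if term X or Y is matched") can be sketched as a conditional step that reads the output of vocabulary matching. Function names, record fields and the sample vocabulary are assumptions for illustration.

```python
def vocabulary_matching(record, vocabulary):
    """Record which subject terms occur in the given vocabulary (lower-cased)."""
    terms = {term.lower() for term in record.get("subjects", [])}
    return {**record, "matched_terms": sorted(terms & set(vocabulary))}

def add_collection_if_matched(record, collection, trigger_terms):
    """Add the collection only when at least one trigger term was matched."""
    if set(record.get("matched_terms", [])) & set(trigger_terms):
        collections = record.get("collections", []) + [collection]
        return {**record, "collections": collections}
    return record  # condition not met: record is returned unchanged

# Hypothetical run-time parameters: collection "A" is triggered by two terms.
matched = vocabulary_matching({"subjects": ["Amphora", "Coin"]}, {"amphora", "kylix"})
result = add_collection_if_matched(matched, "A", ["amphora", "kylix"])
```

Here the collection name and the trigger terms play the role of the plan's run-time parameters, so the same service can behave differently in different plans.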