The document discusses ARIADNE, a project funded by the European Commission to integrate archaeological data. It aims to go beyond traditional data aggregation by modeling domain information and using curation services to enrich data. The key points are:
1) ARIADNE uses a flexible microservices architecture and the ARIADNE Catalog Data Model to ingest, validate, clean, enrich and integrate heterogeneous archaeological data.
2) It assigns unique identifiers to resources and represents them in RDF to facilitate integration based on subject, spatial, temporal and resource type dimensions.
3) Curation services normalize locations and dates, map subject terms, and link related resource types to improve search and discovery.
1. ARIADNE is funded by the European Commission's Seventh Framework Programme
Integrating Data for Archaeology
Dimitris Gavrilis, Eleni Afiontzi, Johan Fihn, Olof Olsson,
Achille Felicetti, Franco Niccolucci, Sebastian Cuy
2. Introduction
• Traditional projects in Archaeology focused on aggregating
data into one single format / system
– Provide users with a unified interface
– Improve search and retrieval
– Improve retrieval semantics through specialized metadata schemas
• ARIADNE goes one step further: data integration
– Try to model the domain information (ARIADNE Catalog Data Model)
– Use a curation-aware aggregator to enrich information using the
above model
– Improve user experience through more substantial and powerful
queries
3. Innovation
• Why hasn’t anyone done this before?
– Complexity
– Performance
– Domain knowledge
• Standard aggregation systems / architectures are
insufficient.
ARIADNE Infrastructure
4. ARIADNE Infrastructure
• Flexibility
– Ingest diverse and heterogeneous data
• XML, RDF, Excel, CSV, …
– Handle each datastream independently and according to
its requirements
• Adapting aggregation, validation, enrichment workflows
– Add new curation services easily and on demand
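The flexibility goals above (a separate workflow per datastream, and curation services that can be added on demand) can be illustrated with a minimal sketch. It assumes a simple in-process registry; the service names, datastream names and record fields are hypothetical and are not taken from the ARIADNE codebase.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[Record], Record]

# Hypothetical registry: service name -> callable. Adding a curation
# service only requires registering one more function.
SERVICES: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Register a curation step under a name that workflows can refer to."""
    def decorator(step: Step) -> Step:
        SERVICES[name] = step
        return step
    return decorator


@register("strip_whitespace")
def strip_whitespace(record: Record) -> Record:
    # Cleaning step: trim surrounding whitespace from every value.
    return {key: value.strip() for key, value in record.items()}


@register("lowercase_subject")
def lowercase_subject(record: Record) -> Record:
    # Enrichment-style step: normalize the subject term's case.
    cleaned = dict(record)
    if "subject" in cleaned:
        cleaned["subject"] = cleaned["subject"].lower()
    return cleaned


# Each datastream declares its own ordered workflow (illustrative names).
WORKFLOWS: Dict[str, List[str]] = {
    "csv_excavations": ["strip_whitespace", "lowercase_subject"],
    "xml_reports": ["strip_whitespace"],
}


def run_workflow(datastream: str, record: Record) -> Record:
    """Apply the steps configured for one datastream, in order."""
    for name in WORKFLOWS[datastream]:
        record = SERVICES[name](record)
    return record
```

In this sketch, changing how one datastream is processed means editing only its entry in `WORKFLOWS`; other datastreams are unaffected.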
5. ARIADNE Infrastructure
• Complexity
– Decouple service complexity through a micro-service-oriented
architecture
– Use loosely coupled services in a highly scalable
environment.
• Performance
– Scalable technologies
6. ARIADNE Infrastructure
• Domain knowledge
– Integrate the domain model (ACDM) into the
infrastructure
– Make extensive use of domain thesauri (e.g. AAT) and
label every resource accordingly
– Create specialized micro-services for curating content
according to the domain needs
7. Data Integration Overall Architecture
[Architecture diagram; labels: Repository, Excel Sheet, ARIADNE Registry, Validation, Cleaning, Enrichment, Integration, RDF Store (RDF), Elastic Search, RDF Store (CRM), Archive, ARIADNE Portal, Integration Experiments]
8. Use of RDF
• Every resource is assigned a unique and persistent
identifier that is resolved through a URI
• Every resource has an RDF representation according to
the ACDM schema
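As a rough illustration of the two points above, the sketch below mints a URI from a local identifier and serializes a resource's properties as N-Triples. The base URI, the namespace and the property names are placeholders chosen for the example; the real ACDM vocabulary and ARIADNE URI scheme are not given in the slides.

```python
from typing import Dict, List

# Assumed placeholders, not ARIADNE's actual URI scheme or ACDM namespace.
BASE_URI = "http://example.org/resource/"
ACDM_NS = "http://example.org/acdm#"


def resource_uri(local_id: str) -> str:
    """Build the URI for a resource from its persistent local identifier."""
    return BASE_URI + local_id


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(local_id: str, properties: Dict[str, str]) -> List[str]:
    """Serialize one resource as N-Triples lines, one per property."""
    subject = f"<{resource_uri(local_id)}>"
    lines = []
    for name, value in sorted(properties.items()):
        predicate = f"<{ACDM_NS}{name}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


for line in to_ntriples("dataset-1", {"title": "Field survey", "type": "dataset"}):
    print(line)
```

Because the URI is derived only from the identifier, it stays stable when descriptive metadata such as the title changes.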
9. Data Curation
• Use of curation micro-services for enriching content
– Geo-normalization (identify, extract and normalize places and
coordinates)
– Geo-coding (e.g. GeoNames)
– Thesauri mappings (map native subject terms to a common thesaurus:
AAT)
– Temporal normalization (identify, extract and normalize dates)
– Gazetteers (e.g. DAI Gazetteer)
– Historical & Ancient place names identification (Pelagios & Pleiades)
– Temporal information mappings (Perio.do)
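To make the geo-normalization step more concrete, here is a small sketch that identifies degree/minute/second coordinates in free text and normalizes them to signed decimal degrees. The regular expression and function names are assumptions made for illustration; the actual ARIADNE micro-service is not described in the slides and likely handles many more notations.

```python
import re
from typing import Optional, Tuple

# Matches e.g. 37°58'20"N or 23°43'E (seconds are optional).
DMS_PATTERN = re.compile(
    r"""(\d{1,3})°\s*(\d{1,2})['′]\s*(?:(\d{1,2}(?:\.\d+)?)["″]\s*)?([NSEW])"""
)


def dms_to_decimal(degrees: float, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in "SW" else value


def extract_coordinates(text: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) found in text, or None if incomplete."""
    lat = lon = None
    for deg, minute, sec, hemi in DMS_PATTERN.findall(text):
        value = dms_to_decimal(float(deg), float(minute), float(sec or 0), hemi)
        if hemi in "NS":
            lat = value
        else:
            lon = value
    if lat is None or lon is None:
        return None
    return (round(lat, 6), round(lon, 6))
```

A call such as `extract_coordinates('Trench near 37°58\'20"N, 23°43\'41"E')` yields a latitude/longitude pair in decimal degrees, while text without a complete pair yields `None`.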
10. Data Integration
• Data Integration is based on 3+1 dimensions
– Subject
– Space
– Time
– Resource type
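One way to picture the 3+1 dimensions is as four independent, optional filters on a query. The sketch below is an assumption-laden toy: the `Resource` and `Query` classes, their field names and the in-memory search are all hypothetical, and the portal itself is shown in the slides as backed by Elastic Search rather than by code like this.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Resource:
    identifier: str
    subject: str          # subject term label
    lat: float            # decimal degrees
    lon: float
    start_year: int       # negative values stand for BCE years
    end_year: int
    resource_type: str


@dataclass
class Query:
    subject: Optional[str] = None
    # (min_lat, min_lon, max_lat, max_lon)
    bbox: Optional[Tuple[float, float, float, float]] = None
    period: Optional[Tuple[int, int]] = None   # (start_year, end_year)
    resource_type: Optional[str] = None


def matches(resource: Resource, query: Query) -> bool:
    """A resource matches when it satisfies every dimension that is set."""
    if query.subject is not None and resource.subject != query.subject:
        return False
    if query.bbox is not None:
        min_lat, min_lon, max_lat, max_lon = query.bbox
        inside = (min_lat <= resource.lat <= max_lat
                  and min_lon <= resource.lon <= max_lon)
        if not inside:
            return False
    if query.period is not None:
        start, end = query.period
        # Keep the resource only if the two time intervals overlap.
        if resource.end_year < start or resource.start_year > end:
            return False
    if (query.resource_type is not None
            and resource.resource_type != query.resource_type):
        return False
    return True


def search(resources: List[Resource], query: Query) -> List[Resource]:
    return [resource for resource in resources if matches(resource, query)]
```

Leaving a dimension unset means it does not constrain the result, so the same function serves subject-only, map-only or combined queries.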
11. Identify & Link together Resource Types
• Model individual information resource types (e.g.
collections, bibliographic reports, databases, datasets,
etc).
• Identify each resource's type during ingestion
• Link / group different resource types
– E.g. put all related heterogeneous resource types
(reports, datasets, …) under the same collection
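The grouping idea can be sketched as follows, assuming each ingested record already carries an identified type and a reference to its parent collection. The field names `id`, `type` and `collection` are hypothetical and are not ACDM property names.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_collection(
    resources: List[Dict[str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Group resource ids by parent collection, then by resource type."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for resource in resources:
        grouped[resource["collection"]][resource["type"]].append(resource["id"])
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {collection: dict(by_type) for collection, by_type in grouped.items()}


records = [
    {"id": "rep-1", "type": "report", "collection": "col-A"},
    {"id": "ds-1", "type": "dataset", "collection": "col-A"},
    {"id": "ds-2", "type": "dataset", "collection": "col-B"},
]
print(group_by_collection(records))
```

Here the report and the dataset that belong to `col-A` end up side by side, even though they are different resource types.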
12. Thematic integration
• ARIADNE uses the AAT thesaurus to semantically label
ALL aggregated information.
• AAT terms act as glue between resources and, when combined with
spatial and temporal information, enable more powerful queries
• Semantic expansion of terms is used extensively to improve
retrieval.
• Expansion of multilingual terms facilitates cross-language
search without requiring automatic translation.
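Semantic and multilingual expansion can be sketched with a toy thesaurus: a search term is resolved to a concept, and the query is expanded with the labels of that concept and of its narrower concepts in every language. The concept identifiers, hierarchy and labels below are invented for illustration and are not actual AAT records.

```python
from typing import Dict, List, Optional, Set

# Toy thesaurus (assumed data, not real AAT content).
THESAURUS: Dict[str, Dict] = {
    "c:vessels": {
        "labels": {"en": "vessels", "de": "Gefäße", "it": "vasi"},
        "narrower": ["c:amphorae"],
    },
    "c:amphorae": {
        "labels": {"en": "amphorae", "de": "Amphoren", "it": "anfore"},
        "narrower": [],
    },
}


def find_concept(term: str) -> Optional[str]:
    """Find the concept whose label in any language matches the term."""
    wanted = term.casefold()
    for concept_id, concept in THESAURUS.items():
        if any(label.casefold() == wanted for label in concept["labels"].values()):
            return concept_id
    return None


def expand(term: str) -> Set[str]:
    """Return all labels of the matched concept and its narrower concepts."""
    start = find_concept(term)
    if start is None:
        return {term}  # unknown term: search for it unchanged
    labels: Set[str] = set()
    pending: List[str] = [start]
    seen: Set[str] = set()
    while pending:
        concept_id = pending.pop()
        if concept_id in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(concept_id)
        concept = THESAURUS[concept_id]
        labels.update(concept["labels"].values())
        pending.extend(concept["narrower"])
    return labels
```

With this data, a user typing the Italian term "anfore" also retrieves records labelled "amphorae" or "Amphoren", which is how expansion supports cross-language search without translating the records themselves.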
13. Spatial & Temporal
• All resources with spatial information
– Are assigned WGS84 coordinates
• All resources with temporal information
– Are normalized according to the ACDM date model (which takes into
account periods and period names and supports the ISO date
format).
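The temporal side can be illustrated by normalizing heterogeneous date values (ISO dates, ISO-style year intervals and period names) to a single `(start_year, end_year)` pair. This is a sketch under stated assumptions: the period table and its year boundaries are placeholders, since real period definitions vary by region, and the ACDM date structure itself is not spelled out in the slides.

```python
import re
from typing import Dict, Optional, Tuple

# Placeholder period table; boundaries are illustrative assumptions only.
PERIODS: Dict[str, Tuple[int, int]] = {
    "bronze age": (-3000, -1200),
    "iron age": (-1200, -500),
}

# Accepts YYYY, YYYY-MM or YYYY-MM-DD, with an optional leading minus.
ISO_YEAR = re.compile(r"^(-?\d{4})(?:-\d{2}(?:-\d{2})?)?$")


def normalize_date(value: str) -> Optional[Tuple[int, int]]:
    """Normalize a date string to (start_year, end_year), or None."""
    text = value.strip()
    if "/" in text:
        # Interval written as start/end.
        start_text, end_text = text.split("/", 1)
        start = ISO_YEAR.match(start_text.strip())
        end = ISO_YEAR.match(end_text.strip())
        if start and end:
            return (int(start.group(1)), int(end.group(1)))
        return None
    match = ISO_YEAR.match(text)
    if match:
        year = int(match.group(1))
        return (year, year)
    # Fall back to a lookup by period name.
    return PERIODS.get(text.lower())
```

Once every resource carries such a year range, a period name and an explicit date become directly comparable, for instance through an interval-overlap test.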