Aggregation Workflow at Europeana Aggregator Forum


An overview of the Europeana Aggregation Workflow presented by Dimitra Atsidis of the Europeana Ingestion Team

Published in: Education

  1. Aggregation workflow: Dimitra Atsidis, Aggregator Forum, 22-23 May, The Hague
  2. Content
       • Publication policy
       • Potential partners process
       • Submission deadlines for new and existing providers
       • Europeana ingestion workflow
       • Acceptance criteria and Europeana validation
       • Guidance and help – Europeana pro
       • Future plans for Europeana aggregation workflow
       • Exercise – The ideal aggregation workflow
  3. Publication Policy
  4. Publication Policy
     Clear criteria for acceptance or decline of metadata for publication and for take down of legacy metadata from the Europeana database:
       • Ingestion workflow (deadlines, timelines, prioritisation)
       • Content scope (what is a digital object, kind of content)
       • Technical validation of metadata quality (expected values)
       • Metadata licensing (CC0)
       • Rights Statements for digital objects
           o All digital objects with valid edm:rights
           o PD objects labelled as PD
           o edm:rights & dc:rights not contradictory
  5. Publication Process and Workflow
  6. How to become a data provider to Europeana?
  7. New Provider Timeline
  8. Regular Ingestion Cycle Diagram Timeline
  9. Europeana Ingestion Workflow
  10. Acceptance criteria
  11. Acceptance criteria
       • Completed and submitted the Data Exchange Information Form.
       • Data Exchange Agreement to Europeana
           o Aggregators need to submit the signed Data Exchange Agreements of their data providers
           o Aggregators can use template clauses for the agreement between aggregators and data providers:
       • Metadata are accepted for publication after the feedback of the Europeana Operations Officers
           o EDM schema and guidelines
           o Rights labeling
       • Datasets are prioritised for publication if the edm:rights in the majority of the metadata of the dataset is PDM, CC0, CC BY or CC BY-SA
       • Datasets submitted via OAI-PMH protocol, FTP or file
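The prioritisation rule on this slide can be read as a simple majority check. The following is a minimal sketch, assuming each record's edm:rights value has already been reduced to a short label. The label strings, the function name `is_prioritised`, and the reading of "majority" as strictly more than half are assumptions for illustration, not Europeana's actual implementation.

```python
from collections import Counter

# Hypothetical short labels for the rights statements named on the slide.
OPEN_RIGHTS = {"PDM", "CC0", "CC BY", "CC BY-SA"}


def is_prioritised(rights_labels):
    """Return True if more than half of the records carry an open rights label."""
    if not rights_labels:
        return False
    counts = Counter(rights_labels)
    open_count = sum(n for label, n in counts.items() if label in OPEN_RIGHTS)
    # "Majority" is interpreted here as strictly more than 50% (an assumption).
    return open_count * 2 > len(rights_labels)


dataset = ["CC0", "PDM", "CC BY-NC", "CC BY-SA"]
print(is_prioritised(dataset))  # True: 3 of 4 records are openly licensed
```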
  12. Europeana validation
      Automatic validation:
       • Validation according to the EDM schema (or ESEv3.4)
       • Validation of the mandatory properties
       • Unique identifiers
           o Metadata records that don’t meet this validation are discarded
           o Providers can fix issues first and resubmit, or let Europeana ingest the records that are valid and fix the invalid records at a later stage
       • Validation of URLs for thumbnail creation (ImageMagick)
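The "discard invalid records, ingest the rest" behaviour described on this slide can be sketched as a partition step. This is an illustration only: records are modelled as plain dictionaries, the key names `id` and `edm:rights` and the function `split_valid_records` are hypothetical, and it assumes a record is discarded when it fails either the mandatory-property check or the unique-identifier check.

```python
def split_valid_records(records, mandatory=("id", "edm:rights")):
    """Partition records into (valid, discarded).

    A record is discarded if a mandatory key is missing or empty, or if its
    identifier was already seen earlier in the same dataset.
    """
    seen_ids = set()
    valid, discarded = [], []
    for record in records:
        missing = [key for key in mandatory if not record.get(key)]
        identifier = record.get("id")
        if missing or identifier in seen_ids:
            discarded.append(record)
            continue
        seen_ids.add(identifier)
        valid.append(record)
    return valid, discarded


records = [
    {"id": "a", "edm:rights": "CC0"},
    {"id": "a", "edm:rights": "CC0"},  # duplicate identifier
    {"id": "b"},                       # missing edm:rights
    {"id": "c", "edm:rights": "PDM"},
]
valid, discarded = split_valid_records(records)
print(len(valid), len(discarded))
```

Returning the discarded records alongside the valid ones mirrors the second option on the slide: the valid part can be ingested now, and the provider can fix and resubmit the rest later.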
  13. Mandatory properties

      Applicable class   Mandatory properties (or alternatives)
      Aggregation        edm:dataProvider
      Aggregation        edm:isShownAt or edm:isShownBy
      Aggregation        edm:provider
      Aggregation        edm:rights
      Aggregation        edm:aggregatedCHO
      Aggregation        edm:ugc (when applicable)
      ProvidedCHO        dc:title or dc:description
      ProvidedCHO        dc:language for text objects
      ProvidedCHO        dc:subject or dc:type or dc:coverage or dcterms:spatial
      ProvidedCHO        edm:type
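The mandatory properties listed on this slide, including the "or" alternatives, can be expressed as a small rule table. The sketch below is hedged: it models each class as a dictionary of property names to values, the helper names `missing_properties` and `check_record` are hypothetical, and edm:ugc is left out because the slide only requires it "when applicable". It is not the schema or Schematron validation Europeana runs.

```python
# Each rule is a tuple of alternatives: at least one of them must be present.
AGGREGATION_RULES = [
    ("edm:dataProvider",),
    ("edm:isShownAt", "edm:isShownBy"),
    ("edm:provider",),
    ("edm:rights",),
    ("edm:aggregatedCHO",),
]
PROVIDED_CHO_RULES = [
    ("dc:title", "dc:description"),
    ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
    ("edm:type",),
]


def missing_properties(properties, rules):
    """Return the rules (tuples of alternatives) that `properties` does not satisfy."""
    return [rule for rule in rules if not any(properties.get(p) for p in rule)]


def check_record(aggregation, provided_cho):
    """Return all unmet rules for one Aggregation / ProvidedCHO pair."""
    problems = missing_properties(aggregation, AGGREGATION_RULES)
    problems += missing_properties(provided_cho, PROVIDED_CHO_RULES)
    # dc:language is only mandatory for text objects.
    if provided_cho.get("edm:type") == "TEXT" and not provided_cho.get("dc:language"):
        problems.append(("dc:language",))
    return problems


aggregation = {
    "edm:dataProvider": "Example Museum",
    "edm:isShownAt": "http://example.org/object/1",
    "edm:provider": "Example Aggregator",
    "edm:rights": "http://creativecommons.org/publicdomain/zero/1.0/",
    "edm:aggregatedCHO": "#object-1",
}
provided_cho = {"dc:title": "A letter", "dc:type": "letter", "edm:type": "TEXT"}
print(check_record(aggregation, provided_cho))  # [('dc:language',)]
```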
  14. Europeana validation
      Validation by the operations officers:
       • Feedback follows the EDM schema and guidelines
       • Check that links work and are direct links to files of reasonable size
       • Recommendations to include thumbnails, geolocations, etc.
       • Feedback on (near) duplicate records, and on taking advantage of EDM
       • Feedback on rights statements in edm:rights and dc:rights
       • Relations between the EDM classes
       • Correct use of vocabularies
       • Literals vs resources (e.g. a thumbnail always needs to be a valid URL)
       • Feedback on any other metadata quality related matters (duplication of properties, encoding in the data, wrongly mapped properties, etc.)
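The "literals vs resources" point can be illustrated with a syntactic check that a thumbnail value is a URL rather than free text. This is a sketch under assumptions: the function name `looks_like_url` is hypothetical, only http and https are accepted, and the check says nothing about whether the link actually works or points to an image of reasonable size.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if `value` parses as an absolute http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_url("http://example.org/thumbs/1.jpg"))
print(looks_like_url("Photograph of a vase"))
```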
  15. Guidance and help
  16. Guidance and help
      Europeana Professional: Content inbox – for all ingestion & metadata related matters
  17. Questions?
  18. Future plans for aggregation workflow
  19. Future plans for aggregation workflow
       • The big plan is to open up part of the ingestion workflow to providers
           o Providers can log in and identify the aggregator/project they work for
           o Providers can select the datasets they want to update, or add new datasets
           o Providers can upload their data – protocols besides OAI-PMH and FTP are under discussion
           o Providers can map their data to EDM, or edit data that is already EDM
           o Providers can validate the data against the EDM schema and preview it in a preview portal
       • Europeana wants to provide tools for uploading data, validating, mapping, and previewing
       • Other tools and workflows being considered: link checking, thumbnail caching, enrichment
       • Start with a test environment, to preview and validate a subset of data before sending it to Europeana
       • Eventually, open up part of the Europeana workflow to providers, not only for testing but integrated in the ingestion workflow
  20. Future plans for aggregation workflow
       • Benefits for providers:
           o Possibility to map to EDM
           o Validation according to the EDM schema (with schematron rules we implemented)
           o Previewing before publication
           o Self service, less dependent on Europeana, saving time (you can do many steps yourself, and you spot errors earlier)
       • Benefits for Europeana:
           o Scale up operations – the number of projects, aggregators and therefore datasets has grown exponentially in recent years
           o Focus more on metadata quality and on assisting providers as much as possible with EDM, modelling and metadata related questions
           o Make the ingestion process transparent and more connected to the process on the aggregators' side
  21. The ideal aggregation workflow
       • Consider your own aggregator route: from the data provider, to the aggregator, to data provision to Europeana
       • Consider also the current aggregation workflow of Europeana and the future plans presented
       • Now, draw the ideal workflow to get your data from the data provider, through your aggregator, into Europeana. Make a diagram, a mindmap, or whatever comes to mind.
       • Think, for instance, about the following questions:
           o What steps in your current workflow could you use help with (e.g. mapping, validation, rights clearance)?
           o Would you use any of the workflow steps Europeana plans to open up? Why, or why not?
           o Are there any tools you already use that you could recommend to everyone?
           o Would the aggregator or the data providers (or both) benefit from and use the tools?
       • Use the yellow post-its to signal positive things, improvements, easy wins (and why?)
       • Use the pink post-its to signal foreseeable issues or difficulties (and why?)
  22. Thank you! Dimitra Atsidis