This document summarizes a presentation on an integrated preservation workflow for the ForgetIT project. It describes the key components of the Preservation Object Framework (PoF) middleware and cloud storage architecture. It then outlines the end-to-end workflow to preserve resources from an active system, including retrieving resources, processing content and metadata, preparing submission information packages, ingesting into an archive information system, and storing in a preservation data store. It also describes the resource restore workflow.
4. Preservation event triggered by the Active System
Request sent to PoF Middleware (REST API)
Synergetic Preservation Workflow is triggered
● Scheduler creates a new job, initial message sent
Resource retrieved from the Active System CMIS server
● Collector processes message, retrieves resource from CMIS server
● ID Manager generates a new ID
● Collector stores retrieved resource on the Middleware server
● Flow control returned to Scheduler, new messages sent
ForgetIT Project GA600826, 1st Review Meeting, Kaiserslautern, April 2014
Preliminary integrated end-to-end workflow
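The retrieval steps on this slide can be sketched as a small message-passing loop. This is only an illustration: the class names mirror the components named on the slide, but every method, message field and ID format below is hypothetical, and a plain dictionary stands in for the CMIS server and for storage on the Middleware server.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    """One preservation job tracked by the (hypothetical) Scheduler."""
    job_id: str
    cmis_id: str
    log: list = field(default_factory=list)


class IdManager:
    """Keeps the mapping between Active System IDs and middleware IDs."""

    def __init__(self):
        self.ids = {}

    def new_id(self, cmis_id):
        # Assumed ID format; the slides do not specify one.
        pof_id = f"pof-{uuid.uuid4().hex[:8]}"
        self.ids[pof_id] = {"cmis_id": cmis_id}
        return pof_id


class Collector:
    def __init__(self, cmis_repo, id_manager):
        self.cmis_repo = cmis_repo    # stand-in for the CMIS server
        self.id_manager = id_manager
        self.local_store = {}         # stand-in for Middleware server storage

    def handle(self, message):
        # Retrieve the resource, obtain a new ID, store the copy locally.
        content = self.cmis_repo[message["cmis_id"]]
        pof_id = self.id_manager.new_id(message["cmis_id"])
        self.local_store[pof_id] = content
        # Returning a message hands flow control back to the Scheduler.
        return {"type": "collected", "job_id": message["job_id"], "pof_id": pof_id}


class Scheduler:
    def __init__(self, collector):
        self.collector = collector
        self.jobs = {}

    def preserve(self, cmis_id):
        # Create the job, then send the initial message to the Collector.
        job = Job(job_id=uuid.uuid4().hex, cmis_id=cmis_id)
        self.jobs[job.job_id] = job
        job.log.append("created")
        reply = self.collector.handle(
            {"type": "collect", "job_id": job.job_id, "cmis_id": cmis_id}
        )
        job.log.append(reply["type"])
        return reply
```

In a real deployment the messages would travel over a message broker rather than direct method calls; the direct calls here only keep the sketch self-contained.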
5. Content processing
● Extractor processes messages for image analysis (QA and concept detection)
● Contextualizer processes messages for text contextualization (mentions and context)
● Flow control returned to Scheduler, results messages sent
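The content-processing step amounts to routing each resource to the component that handles its media type and gathering the result messages. A minimal sketch, assuming a hypothetical `kind` field on each resource and placeholder handlers that return empty results instead of running real analysis:

```python
def extractor(message):
    """Placeholder for image analysis (QA and concept detection)."""
    return {
        "component": "extractor",
        "pof_id": message["pof_id"],
        "metadata": {"quality": None, "concepts": []},
    }


def contextualizer(message):
    """Placeholder for text contextualization (mentions and context)."""
    return {
        "component": "contextualizer",
        "pof_id": message["pof_id"],
        "metadata": {"mentions": [], "context": {}},
    }


# Hypothetical routing table: resource kind -> processing component.
ROUTES = {"image": extractor, "text": contextualizer}


def process_content(resources):
    """Dispatch each resource and collect the result messages."""
    results = []
    for resource in resources:
        handler = ROUTES.get(resource["kind"])
        if handler is None:
            continue  # no content processing defined for this kind
        results.append(handler({"pof_id": resource["pof_id"]}))
    return results
```

The returned list plays the role of the results messages sent once flow control is back with the Scheduler.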
SIP preparation
● Archiver processes message to prepare SIP, with all resources and metadata (MD)
SIP ingest
● Archiver processes message to ingest SIP into AIS
● ID Manager updates IDs adding AIP ID
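SIP preparation and the subsequent ID update could look roughly like the following. The slides do not state the SIP packaging format, so the ZIP layout, the `metadata.json` file name and both function names are assumptions made purely for illustration.

```python
import io
import json
import zipfile


def build_sip(pof_id, resources, metadata):
    """Bundle resources and metadata into one in-memory ZIP (assumed SIP layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as sip:
        for name, data in resources.items():
            sip.writestr(f"content/{name}", data)
        manifest = {"pof_id": pof_id, "metadata": metadata}
        sip.writestr("metadata.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def record_aip_id(id_table, pof_id, aip_id):
    """Add the AIP ID returned by the archive to the ID Manager's table."""
    id_table.setdefault(pof_id, {})["aip_id"] = aip_id
```

Keeping the AIP ID next to the earlier IDs is what later lets a restore request be resolved from the Active System's own identifier.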
6. AIP store
● AIS Packager stores AIP into PDS
● ID Manager updates IDs adding PDS ID
● PDS triggers Storlets according to rules
Resource status update
● The resource is shown as “preserved” in the Active System
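Rule-driven triggering of Storlets can be pictured as matching each stored object against a list of predicates. The sketch below is not the PDS rule engine: the `Rule` type, the storlet names and the `content_type` field are all hypothetical and only show the general idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Pairs a storlet name with the condition under which it runs."""
    storlet: str
    applies: Callable[[dict], bool]


# Hypothetical rules; real rules would come from PDS configuration.
RULES = [
    Rule("fixity-check", lambda obj: True),
    Rule(
        "image-transform",
        lambda obj: obj.get("content_type", "").startswith("image/"),
    ),
]


def storlets_for(obj, rules=RULES):
    """Return the names of the storlets a stored object would trigger."""
    return [rule.storlet for rule in rules if rule.applies(obj)]
```

For example, `storlets_for({"content_type": "image/jpeg"})` selects both hypothetical storlets, while a plain-text object selects only the first.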
Resource Restore Workflow
● Active System can send request to PoF Middleware (REST API)
● Scheduler creates a new job, initial message sent
● Collector retrieves DIP, containing original resource (or new one after transformation by PDS), which is returned to Active System
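The restore path can be sketched as an ID lookup followed by a choice between the original and a transformed rendition. The dictionary layouts and the `restore` function are assumptions for illustration; the actual DIP structure is not described on the slides.

```python
def restore(pof_id, id_table, pds):
    """Return a DIP-like dict for the resource identified by pof_id."""
    entry = id_table[pof_id]
    stored = pds[entry["pds_id"]]
    # Use the transformed rendition if the PDS produced one, else the original.
    if "transformed" in stored:
        content = stored["transformed"]
    else:
        content = stored["original"]
    return {
        "cmis_id": entry["cmis_id"],
        "pds_id": entry["pds_id"],
        "content": content,
    }
```

The `cmis_id` in the result is what would let the Active System match the returned content to the resource it asked for.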