2009 PLANETS Vienna - MIXED migration to XML

Snapshot of how we thought about migration infrastructure then: PLANETS for the infrastructure, MIXED as a plugin for the tabular data conversion functionality.

  1. Towards an Infrastructure of Migration (Dirk Roorda)
  2. [no text]
  3. History of MIXED
     Outline: history • defining • developing • using • exploiting
  4. What is it?
     MIXED is a file format converter plus a set of formats called SDFP (Standard Data Formats for Preservation).
  5. Founding idea
     National Archive (NL): testbed
  6. Testbed: spreadsheets
     XML is an appropriate choice for the long-term preservation of spreadsheets. XML can be used to specify the context, content and structure of spreadsheets.
  7. Testbed: databases
     At present, XML is the most effective strategy for the durable preservation of databases. XML is highly capable of representing the context, content, and structure of databases. This strategy can be implemented using a number of different methods.
  8. What do repositories want?
     Conversion to preservable formats:
     • automatically
     • at most once
     • faithfully
  9. Preservation strategy
     Migration and emulation are complementary strategies. Migration is best for offering usable content. Emulation is best for invoking the original experience. Migration to XML is normalised migration, hence we call it smart migration.
  10. Ingredients
      • suitable XML formats for your data
      • software to convert
        • legacy data to XML
        • ingest data to XML
        • XML to dissemination data
      • connectors to your repository workflow
  11. MIXED - snapshot
  12. Timeline
  13. Defining MIXED
      Outline: history • defining • developing • using • exploiting
  14. XML
      XML sounds great. What is MIXED’s XML?
  15. Data kinds
      Data comes in kinds, defined by the typical applications that manipulate it: spreadsheets, databases, rich text, images, audio, video, drawings, ... The need for these applications is the basic reason for the threat of data loss caused by software obsolescence.
  16. Standards for data kinds
      • binary vendor formats (doc)
      • ASCII vendor formats (rtf)
      • open formats (HTML export)
      • interchange formats (ad-hoc XML)
      • standard formats (defined XML: OOXML)
      • preservation formats (selected XML: SDFP)
  17. SDFP: Standard Data Formats for Preservation
      • Spreadsheets: ODF subset
      • Databases: e-David-XML
      • Statistical data: DDI
  18. SDFP as umbrella
  19. Datatypes
      • numbers: ISO 6093
      • date-time: ISO 8601-3
      • characters: Unicode
  20. Scope (kinds)
      • initially: tabular data (spreadsheets and databases)
      • later: statistical data
      • and then: text, still images, ...
  21. Scope (aspects)
      [diagram; labels: content, semantics, databases, cell positions, data model, values, data itself, formulas, spreadsheets]
  22. Aspects that didn’t make it
      • presentation details: fonts, forms
      • action details: update, insert, delete; stored procedures; triggers
  23. Developing MIXED
      Outline: history • defining • developing • using • exploiting
  24. Design principles
      • building block in workflows
      • no built-in user interface
      • easily extensible / updatable
      • use and produce open source code
  25. Framework and plugins
      • framework: managing plugins, managing execution, administration
      • plugins: one for each conversion from/to SDFP
  26. Issues
      • How loose/tight are the components connected?
      • Pure own Java code / borrow existing programs in other languages?
      • Modularity of file type recognition (JHOVE)
  27. Using MIXED
      Outline: history • defining • developing • using • exploiting
  28. Data archives
      collect • preserve • re-use
  29. Improvements for repositories
      • users can select the format most usable to them, irrespective of the producer
      • users can select the preservation format, in case usable formats are not supported
      • fewer uncertainties in interpretation, either by humans or by software
  30. Further improvements
      Combine data from heterogeneous sources:
      • different formats (straightforward)
      • different data models (advanced)
      • different data kinds
  31. Exploiting MIXED
      Outline: history • defining • developing • using • exploiting
  32. Research Infrastructures
  33. Data on an Infrastructure
      • higher demand for interoperability
      • more needs for standards
      • more opportunities for re-use
      • more scope for digital preservation tools
  34. Conversions needed
      Lots of them ...
  35. Conversion as a service
      • a uniform resource
        • yielding uniform results
      • easily accessible
      • product of community effort
        • a good conversion requires a lot of intelligent work
        • quality is reached in an iterative manner
  36. MIXED as Infrastructure
      • provides a standard for preservation formats
      • implements the tools to maintain the standard
      • accumulates the shared wisdom of data formats
  37. When software vendors realize that there should always be an import/export to a preservation format, it means ... The End of MIXED
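
The slides on ingredients (10), datatypes (19), and framework and plugins (25) describe a framework that manages conversion plugins, software that converts legacy data to XML, and ISO 8601 as the date-time representation. The following is a minimal sketch of that idea, not MIXED's actual code: the slides mention Java, while Python is used here for brevity. The `PLUGINS` registry, the `plugin` decorator, the `convert` entry point, the `<table>/<row>/<cell>` element names, and the assumption that legacy dates arrive as day/month/year are all hypothetical and stand in for the real SDFP formats and plugin interfaces.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Callable, Dict

# Hypothetical registry: maps a source format name to a converter function.
PLUGINS: Dict[str, Callable[[str], ET.Element]] = {}


def plugin(source_format: str):
    """Register a converter for one source format (illustrative only)."""
    def register(func: Callable[[str], ET.Element]):
        PLUGINS[source_format] = func
        return func
    return register


def normalise_value(text: str) -> str:
    """Rewrite day/month/year dates as ISO 8601; leave other values as-is."""
    try:
        return datetime.strptime(text, "%d/%m/%Y").date().isoformat()
    except ValueError:
        return text


@plugin("csv")
def csv_to_xml(data: str) -> ET.Element:
    """Convert CSV text into a made-up <table> element (not real SDFP)."""
    table = ET.Element("table")
    rows = list(csv.reader(io.StringIO(data)))
    if not rows:
        return table
    header, body = rows[0], rows[1:]
    for record in body:
        row = ET.SubElement(table, "row")
        for name, value in zip(header, record):
            cell = ET.SubElement(row, "cell", {"name": name})
            cell.text = normalise_value(value)
    return table


def convert(source_format: str, data: str) -> str:
    """Framework entry point: pick the plugin and serialise its result."""
    if source_format not in PLUGINS:
        raise ValueError(f"no plugin for format: {source_format}")
    return ET.tostring(PLUGINS[source_format](data), encoding="unicode")
```

Calling `convert("csv", "name,born\nAda,10/12/1815\n")` returns an XML string in which the date appears as `1815-12-10`. Keeping the registry separate from the converters mirrors the framework/plugin split on slide 25: supporting another format means registering one more function, with no change to `convert`.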
