Digital dark age - Are we doing enough to preserve our website heritage?


When we create websites, we often plan for a lifespan of only 3 to 5 years. With every relaunch
and overhaul we are confronted with content migration and with short-term motives to delete
content that may be valuable. On the other hand, what is the value of our content? Can we assess it
meaningfully? Do we really know in which context it is used?
Scientists have stated that while we are producing more and more digital artifacts, we are failing
to preserve them in a manner that will enable us to find and use them more than a few years
into the future.
This talk will introduce you to the aspects of digital preservation, with a special look at how TYPO3 is
preparing to help its users create a digital heritage.
This talk is part of the ForgetIT project ("Concise Preservation by combining Managed Forgetting and
Contextualized Remembering"). The ForgetIT project is funded by the EC within the
7th Framework Programme under the objective "Digital Preservation" (GA 600826).

  Digital Dark Age
Olivier Dobberkau
  Agenda
Some questions for the attendees in the room
Digital Dark Age - What is the problem?
Why preservation is valuable
How to preserve
The ForgetIT project
Outlook for TYPO3 CMS
Q&A
  Olivier Dobberkau
CEO and founder of dkd
45 years old
TYPO3 "Reverend Neverend"
Member of the EAB TYPO3 Association
@T3RevNeverEnd
http://www.dkd.de
  Disclaimer!
I am not a data curator or preservation
specialist :-)
  Some questions
  How old is your website?
How old is your website?
When was the last time you made a backup?
Are you sure you are keeping the right stuff?
And will it still work 5 years from now?
  300 Funston Avenue
Richmond, San Francisco CA 94118
  300 Funston Avenue
The Internet Archive
founded in 1996 by Brewster Kahle
more than 2 petabytes of data
growing at 20 terabytes per month
  75?
75 Days
  75 days
75 days is the average lifetime of a website.
Source:
  What is the problem?
A closer look into the digital dark age of websites
  Digital Dark Age
Wikipedia says:
"The digital dark age is a possible future situation
where it will be difficult or impossible to read
historical electronic documents and multimedia,
because they have been stored in an obsolete and
obscure file format."
  Digital Dark Age
first mentioned in 1997
all digitally produced data is subject to it
problems arise from different angles
storage medium (disks, tapes, DVDs, etc.)
format of the data
availability of the software and operating
systems
possible encryption
  One example
NASA Viking Mars landing 1976
Magnetic tapes in 1976
Format was not documented
Programmers left or died
Only through a great deal of reverse engineering
was NASA able to extract the images
  Websites: a soon-to-be-extinct species?
The risk of a digital dark age also applies to the
websites we create and maintain
Risk factors we see
Relaunch from scratch
Technical standards change
Browser usage (aka Browser wars)
Marketing expectations
The Jungle we create in daily work...
  Why is preservation valuable?
  Why is preservation valuable?
Preservation is well established in memory
institutions such as national libraries and archives
Still in infancy in most other organizations
Preservation is perceived as a "strategic" and not
as an "operational" goal
Preservation is sometimes done only because of
legal requirements
Quick wins are not easy to achieve
  Why is preservation valuable?
Digital data is your organization's raw material of
the future
Everyone is going "Big Data"
Preservation helps you achieve sustainability
within your organization
  How to preserve?
There is no silver bullet for preservation in
organisations. Preservation is a long-term strategic
goal.
  Tools
Web archiving
Website curator
Wayback Machine
HTTrack
Exporting to
PDF/A
XML
JSON
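
As an illustration of the JSON export option, here is a minimal sketch in Python. It is not TYPO3 or ForgetIT code: the function names, the field names and the example URL are all assumptions made for this example. It shows one way to bundle page content with descriptive metadata and a checksum, so that a future reader can tell what the content was, where it came from, and whether it is still intact.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_export_record(page_id, title, body_html, source_url, exported_at=None):
    """Wrap page content and descriptive metadata in a JSON-serializable dict.

    All field names are hypothetical; pick a schema that suits your archive.
    """
    exported_at = exported_at or datetime.now(timezone.utc)
    payload = body_html.encode("utf-8")
    return {
        "id": page_id,
        "title": title,
        "source_url": source_url,
        "exported_at": exported_at.isoformat(),
        "media_type": "text/html",
        "encoding": "utf-8",
        # A checksum lets a later reader verify that the body is unchanged.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "body": body_html,
    }


def verify_export_record(record):
    """Return True if the stored checksum still matches the stored body."""
    payload = record["body"].encode(record["encoding"])
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


def to_json(record):
    """Serialize with sorted keys so repeated exports produce the same text."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    record = build_export_record(
        42, "About us", "<p>Hello</p>", "http://example.org/about"
    )
    print(to_json(record))
    print("intact:", verify_export_record(record))
```

Storing the media type and encoding next to the content addresses the "format of the data" problem in a small way: the record documents itself instead of relying on the software that produced it.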
  Tools
Use a document store
Alfresco
DuraSpace
d-store
Archivematica
OAIS reference model
  A good Website to explore
Digital Curation Centre, Edinburgh
  The ForgetIT Project
  ForgetIT Project
Consortium of 10 partners
funded by the EC
started in 2013
3 years of research & development
The ForgetIT project is funded by the EC within
the 7th Framework Programme under the
objective "Digital Preservation" (GA 600826).
  3 Concepts of ForgetIT
Managed Forgetting
Contextualized Remembering
Synergetic Preservation
  Managed Forgetting
Managed Forgetting models resource selection as
a function of attention and significance
dynamics.
It is inspired by the important role of forgetting in
human memory and focuses on characteristic
signals of reduction in salience. For this purpose it
relies on multi-faceted information assessment
and offers customizable preservation options
such as full preservation, removing of
redundancy, resource condensation, and also
complete digital forgetting.
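
To make the idea of "selection as a function of attention and significance" more tangible, here is a toy sketch in Python. It is not the ForgetIT model: the exponential decay, the half-life, the weights, the thresholds and all names are assumptions chosen only for illustration, and the options are reduced to three for brevity.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    significance: float            # assumed editorial importance in [0, 1]
    days_since_last_access: float  # assumed proxy for attention


def attention(days_since_last_access, half_life_days=180.0):
    """Attention decays exponentially: it halves every half_life_days."""
    return 0.5 ** (days_since_last_access / half_life_days)


def preservation_value(resource, weight_significance=0.6):
    """Weighted mix of significance and decayed attention, in [0, 1]."""
    w = weight_significance
    decayed = attention(resource.days_since_last_access)
    return w * resource.significance + (1.0 - w) * decayed


def preservation_option(value):
    """Map a preservation value to an option (thresholds are arbitrary)."""
    if value >= 0.6:
        return "full preservation"
    if value >= 0.3:
        return "resource condensation"
    return "complete digital forgetting"


if __name__ == "__main__":
    resources = [
        Resource("annual report", 0.9, 0),
        Resource("press release", 0.5, 180),
        Resource("old news teaser", 0.3, 360),
    ]
    for r in resources:
        value = preservation_value(r)
        print(f"{r.name}: {value:.2f} -> {preservation_option(value)}")
```

The point of the sketch is the shape of the decision, not the numbers: content that is both significant and still in use is kept in full, while content that is neither slowly drifts toward condensation and, finally, forgetting.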
  Contextualized Remembering
Contextualized Remembering targets keeping
preserved content meaningful and useful.
It will be based on a process of dynamic
evolution-aware contextualization, which
combines context extraction and packaging with
evolution detection and intelligent
recontextualization.
  Synergetic Preservation
Synergetic Preservation crosses the chasm that
exists between active information use and
preservation management by making intelligent
preservation processes an integral part of the
content lifecycle in information management and
by developing solutions for smooth bi-directional
transitions.
  Expected Outcomes
Foundations and Models
Approaches for managed forgetting,
contextualized remembering and joint model
for synergetic preservation
Algorithms and methods
preservation-oriented summarization and
aggregation
multifaceted information assessment methods
evolution-aware contextualization and
recontextualization
  Expected Outcomes
Infrastructure and services
Flexible and extensible Preserve-or-Forget
framework, providing an extensible and
adaptable set of services for extending
information management solutions with
intelligent preservation management
  Expected Outcomes
Application pilots
Personal preservation focusing on multimedia
coverage of personal events
Organizational preservation focusing on
smooth preservation in organizational content
management
  Expected Outcomes
Best Practices & Adoption Blueprints
Understand opportunities and barriers for
personal preservation
Form guidelines for offering personal
preservation as a service
  Partners in the ForgetIT Project
Centre for Research and Technology Hellas
dkd Internet Service GmbH
Deutsches Forschungszentrum für Künstliche
Intelligenz GmbH
EURIX Srl
Gottfried Wilhelm Leibniz Universität Hannover
IBM Israel - Science and Technology Ltd
  Partners in the ForgetIT Project
Luleå Tekniska Universitet
The Chancellor, Masters and Scholars of the
University of Oxford
The University of Edinburgh
The University of Sheffield
Turk Telekomunikasyon AS
  Outlook for TYPO3 CMS
  Working on the following
Content Dashboard
Metadata Directory
Semantic Layer
ForgetIT Backend Module
Feedback & Conflicts Module
Recycling & Inducing Module
CMIS integration & transposing
  Open to the TYPO3 Community
We are open to the TYPO3 Community
We want to raise awareness on the matter of
preservation
We will publish our modules under open source
licenses
Want to stay informed?
  Slides will be available at
Follow me on Twitter: @T3RevNeverEnd
email:
  Questions?
  dkd development kommunikation design
thank you.