125 Databases for the Year 2080
1. 125 Databases for the Year 2080
A technology challenge and how it can be met
Dr. Kai Naumann – Landesarchiv Baden-Württemberg (Germany)
WADL Workshop at JCDL 2020, Wuhan (China)
2. Landesarchiv Baden-Württemberg at a glance
• knowledge centre about the past of
the state of Baden-Württemberg
• key research infrastructure
• saves records of all kinds as cultural
heritage, preserves them and makes
them accessible
• provides transparency of
governmental, administrative, and
judicial decision-making
• archives government websites and
other sites with relevance to Baden-
Württemberg since 2006 --> about
300 URLs twice a year
• 9 sites throughout the country
• 11 million EUR overall budget
• 308 employees
• 1207 years: oldest dated charter
• 10,138 consultations per year
• 152,284 meters of occupied shelves
• 2,095,106 photographs
• 13,226,262 pages of scanned
documents
• 290,783,182 dataset rows
• ∞ eternal survival as a task
3. Our Oldest Database – the 1961 census
• Conceived at Statistical Offices of Germany in 1960
• Populated in 1961 on rented IBM machines
• 6 million individual punched cards destroyed by a flood
in 1968
• Surviving part: calculated sums on ca. 1,592,821
punched cards
• Migrated to magnetic tape in the 1960s
• Migrated to CD-ROM in the 1990s
• Transferred to the State Archives in 2006
• Can we do better?!
LABW StAL E 258 II Bü 214
http://www.landesarchiv-bw.de/plink/?f=2-335336
4. Why we set up the challenge
• Emulation as a service - enormous progress since 2010
• SIARD - method of long-term database normalization – efforts to
establish SIARD as a European Union standard
5. The challenge
• How do you preserve 125 databases of diverse origin for future use
from the year 2080 onwards?
• Prepare them in such a way that they can be used in as many ways as
possible in 2080.
• In the following 60 years
• a) no costs should be incurred apart from secure storage
• b) the database contents must not be publicly accessible.
7. Political and legislative issues
Global Intellectual Property (IP) legislation is poorly prepared for
obsolescence.
Orphaned books (author and editor unknown) may freely be copied and
disseminated in most parts of the world.
The status of orphaned software is unclear, and unclear IP claims pose
risks.
In most countries of the world, no agency is responsible for preserving
software.
The European DSM directive has recently moved in a good direction, but
work has to continue in order to ensure a risk-free environment for
software emulation approaches.
8. CSV solution
• Choose the most important tables or prepare archival tables.
• Export them to CSV.
• Make an XML description of the fields and relations.
• Take screenshots of the graphical user interface (GUI).
• Add handbooks and tutorials for the database.
• Wait.
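The export steps above can be sketched roughly as follows. This is only an illustration, assuming the source database is reachable through Python's built-in sqlite3 module; the function name export_tables and the element names in schema.xml (database, table, field, relation) are made up for this example and do not follow any standard.

```python
import csv
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def export_tables(conn, tables, out_dir):
    """Write each table to <table>.csv and describe fields and relations in schema.xml.

    Hypothetical helper for illustration; assumes an SQLite source database.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    root = ET.Element("database")
    for table in tables:
        # Table names come from a hand-picked, trusted list of archival tables.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        fkeys = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        table_el = ET.SubElement(root, "table", name=table, file=f"{table}.csv")
        # table_info rows: (cid, name, type, notnull, default, pk)
        for _cid, name, ctype, notnull, _default, pk in columns:
            ET.SubElement(
                table_el, "field",
                name=name, type=ctype,
                nullable=str(not notnull).lower(),
                primaryKey=str(bool(pk)).lower(),
            )
        # foreign_key_list rows: (id, seq, referenced table, from column, to column, ...)
        for fk in fkeys:
            ET.SubElement(
                table_el, "relation",
                column=fk[3], refTable=fk[2], refColumn=fk[4] or "",
            )
        with open(out_dir / f"{table}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow([col[1] for col in columns])  # header row
            writer.writerows(conn.execute(f'SELECT * FROM "{table}"'))
    ET.ElementTree(root).write(
        str(out_dir / "schema.xml"), encoding="utf-8", xml_declaration=True
    )
    return out_dir
```

Note that CSV cannot distinguish NULL from an empty string, so the field description has to carry that kind of information for a reader in 2080.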
9. XML Solution
• Choose the most important tables or prepare archival tables.
• Export them to an XML Schema containing the most important
features of the DBMS (e.g. SIARD Schema).
• Take screenshots of the graphical user interface (GUI).
• Add handbooks and tutorials for the database.
• Wait.
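As a rough illustration of the idea of keeping structure and content together in one XML document, the sketch below serializes a single table. It deliberately does not produce SIARD; the element names (table, columns, column, rows, row, cell) and the function table_to_xml are assumptions made for this example, and it again assumes an SQLite source with text and numeric columns only.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Return an XML element holding the column definitions and all rows of one table.

    Hypothetical layout for illustration, not the SIARD schema.
    """
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    table_el = ET.Element("table", name=table)
    columns_el = ET.SubElement(table_el, "columns")
    # table_info rows: (cid, name, type, notnull, default, pk)
    for _cid, name, ctype, _notnull, _default, _pk in columns:
        ET.SubElement(columns_el, "column", name=name, type=ctype)
    rows_el = ET.SubElement(table_el, "rows")
    names = [col[1] for col in columns]
    for record in conn.execute(f'SELECT * FROM "{table}"'):
        row_el = ET.SubElement(rows_el, "row")
        for name, value in zip(names, record):
            cell = ET.SubElement(row_el, "cell", column=name)
            if value is None:
                cell.set("null", "true")  # keep NULL distinct from an empty string
            else:
                cell.text = str(value)
    return table_el
```

Unlike a plain CSV export, this form can mark NULL values explicitly and keeps the column types next to the data, at the price of larger files.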
10. Disk image solution
• Take a disk image of the client hardware.
• Take a disk image of the server hardware.
• Preserve necessary Operating System environments.
• Add handbooks or tutorials for the database.
• Regularly check performance of emulative software stack.
11. Docker image solution
• Take a Docker image of the client software.
• Take a Docker image of the server software.
• Preserve necessary Operating System environments.
• Add handbooks or tutorials for the database.
• Regularly check performance of emulative software stack.
12. Web Crawler solution
• This only works for databases with a full web-based frontend
displaying a complete list of their objects.
• Let a crawler translate all database content into an HTML/JavaScript
Container (e.g. WARC file).
• Regularly visit the crawl to test accessibility.
• In order to make quality assessments:
• Let Archive.org crawl the server as well
• Also use the CSV solution on the data
13. Solutions and their cost forecast
[Chart: cost forecast over time, 01.01.2020 to 01.01.2080, one series each for the CSV Solution, XML Solution, Disk Image Solution, Docker Image Solution, and Web Crawler Solution; vertical axis from 0 to 250, unit not stated]
14. Any questions? Want to join the quest?
• Further ideas, business models welcome!
• I will try to continue collecting answers at #WeMissiPRES
• You are invited to a workshop on the issue in Stuttgart (Germany) in
2021!
• Contact me:
• Dr. Kai Naumann, Landesarchiv Baden-Württemberg
• kai <dot> naumann <at> la-bw <dot> de
• Twitter @Naumann_Kai
• Phone 0049 711 212 4284