About the Webinar
The digitization of resources can provide expanded access to information as well as a preservation mechanism for now-fragile materials. Preserving the digital copy of the resource is an issue now being addressed, but what about the software used to create digital files? How can software on media which can no longer be read -- or no longer be read easily -- be preserved? If that software can’t be accessed, what happens to the material created by, and only read by, that software?
Progress has been made in formulating standards for the preservation and description of digital materials, and a framework for addressing digital item preservation has been proposed. However, despite meetings such as the Library of Congress’ “Preserving.exe: Toward a National Strategy for Preserving Software,” no formal standard or framework yet exists for software digitization and preservation. This webinar will feature three presenters who will speak on aspects of software digitization and preservation, including a how-to approach (technical aspects), a metadata component, and observations from the field as part of the continuing discussion on the state of the field and the need for standardization.
Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Software artifacts: Migration and Emulation
Michael Lesk, Professor of Library and Information Science, Rutgers University
Emulation in practice: Emulation as a Service at Yale University Library: Lessons learnt and plans for the future
Euan Cochrane, Digital Preservation Manager, Yale University Library
No (You Can't Expect To Run Your Files Just Because You Saved Them)
Jon Ippolito, Professor of New Media and Director of the Digital Curation graduate program, University of Maine
NISO Webinar: Software Preservation and Use: I Saved the Files But Can I Run Them?
1. NISO Webinar:
Software Preservation and Use:
I Saved the Files But Can I Run Them?
Wednesday, May 13, 2015
Speakers:
Michael Lesk
Professor of Library and Information Science, Rutgers University
Euan Cochrane
Digital Preservation Manager, Yale University Library
Jon Ippolito
Professor of New Media and
Director of the Digital Curation Graduate Program,
University of Maine
http://www.niso.org/news/events/2015/webinars/software/
3. Software preservation
The hard problem is not bad tape; it’s obsolescence.
There are two common answers to the obsolescence
problem.
Migration or emulation?
Migration: Convert the old information to a new
format, e.g., BMP to JPEG.
Emulation: Use old information on a new version of
an old machine, e.g. using a website that looks like
an arcade game platform.
4. Why might old software be lost?
All the copies were thrown away.
The copies still exist, but the media have worn out.
The media are OK, but we have no device to read them.
We can read the bits, but we don’t know what they mean.
We understand the bits, but have no software to process them.
We have software but nothing to run it on.
The software depends on an environment that no longer exists.
We could process the bits, but we lack legal permission.
5. Discarded
We know the first telegram:
“What hath God wrought?” May 24, 1844; Samuel F. Morse, in
Washington, to Alfred Vail, in Baltimore.
We know the first telephone call:
“Mr. Watson—Come here—I want to see you.” March 10, 1876.
Alexander Graham Bell to Thomas A. Watson, in Boston.
We don’t know the first email message. It was in the spring of
1964, in either Cambridge (UK), Cambridge (Mass.), or
Pittsburgh; but whatever it was, it was thrown out, and nobody
kept good records.
The solution to this problem is multiple copies. Digital copies are
perfect and cheap; use them.
6. Media fragility
In the 1970s Brazil stored Landsat space photography of their
country on magnetic tape. These tapes were stored in humid
conditions and deteriorated until they were unreadable.
Magnetic tape is often fragile; audio tape is lost as well. It helps
to start with better quality tape, and linear tape (audio) is better
than helical tape (VHS cassettes). Sometimes it helps to heat the
tape, once; hence one of the great titles in preservation
literature, “If I knew you were coming, I’d have baked a tape”
(Eddie Ciletti).
Again the solution to this is multiple copies, regularly inspected.
Note projects like LOCKSS: Lots of copies keep stuff safe.
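One way to make "multiple copies, regularly inspected" concrete is a fixity check: compute a checksum for each copy and flag any copy that disagrees with the rest. The following is a minimal sketch in Python, assuming the copies are available as byte strings and that most of them are intact. The function names and sample data are illustrative only, and this is not a description of how LOCKSS itself works.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one copy."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose digest differs from the most common one.

    Assumption: the majority of copies are intact, so the most common
    digest is treated as the reference value.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    reference, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != reference)


# Made-up sample data: three copies, one with a single corrupted character.
copies = {
    "site_a": b"Landsat scene 0042",
    "site_b": b"Landsat scene 0042",
    "site_c": b"Landsat scene 0O42",
}
damaged = find_damaged_copies(copies)
```

With only two copies a mismatch can be detected but not resolved, which is one practical argument for keeping at least three.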
7. Devices gone
Where today would you find a diskette drive? And that’s an easy
one: what about a paper tape reader?
The answer to those is eBay, but what about special-purpose
technology that failed in the marketplace, such as kinds of 12”
writeable optical disk from the early 1990s?
Again, the answer is multiple copies on current devices. Even if
your organization thinks it’s prepared to keep its 1980-vintage
DEC computer running for a long time, where would you find
spare parts when it broke? Or a technician who knew what to do
with them?
8. Forgot the format
It is possible to have a format and not know what it is. Suppose you
have a file made by Volkswriter, marketed by Lifetime Software
(which, despite its name, ceased operating independently in 1991).
How would you find out the control codes?
If you can’t find documentation, it may be easier to view this as a
decipherment problem: if you find a funny symbol at “plus ?a
change, plus c’est …” it’s the French ç character.
Now we’re into the real issues: is it better to try to find a copy of the
software or to convert the file to a current standard, like Word? In this
case (word processing) conversion is probably easier.
Solution: use standard formats. Preferably public ones.
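The decipherment approach can be sketched in a few lines: decode the same bytes under several candidate legacy code pages and see which one yields the expected word. The sample bytes and the list of candidates below are assumptions made for illustration; a real Volkswriter file would also contain control codes that this sketch does not handle.

```python
# Candidate legacy code pages to try; the list is an illustrative assumption.
CANDIDATES = ["cp437", "cp850", "cp1252", "mac_roman"]


def decode_candidates(raw: bytes) -> dict[str, str]:
    """Decode the same bytes under each candidate encoding."""
    results = {}
    for name in CANDIDATES:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = "<undecodable>"
    return results


# Hypothetical sample: byte 0x87 sits where the cedilla is expected.
raw = b"plus \x87a change"
decoded = decode_candidates(raw)

# Keep only the encodings under which the text reads as correct French.
expected = "plus \u00e7a change"
matches = [name for name, text in decoded.items() if text == expected]
```

Under the DOS code page 437 the unknown byte decodes to ç, while the Windows and Mac code pages give other symbols, which narrows down where the file came from.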
9. No software is available
Again, the vendor who wrote the software originally used for your
file might have gone out of business. If your file is in a public format,
there is probably an alternative. But if it was in a proprietary
format, it may be difficult to find something that reads it. There
was a time, for example, when Microsoft deliberately arranged for
old MS-Word documents to be unreadable on newer versions so
that customers would be forced to upgrade continuously. And in
those days, Microsoft tried to prohibit other vendors from selling
software that read and translated the “.doc” format; some of them
did it anyway, and Microsoft gave up.
The solution is public formats and current formats; for example the
newer “.docx” files in Microsoft Word have a public description.
10. No machine to run the software
Now we’re into the hard part of the problem: you might have some
kind of program but it was coded to run on a long-gone machine
(Commodore 64, anyone?). Your choice is between:
Finding a machine for sale on eBay – but you can’t get parts to fix it,
and you may have trouble finding out how to make it work.
Migrating whatever this is to a modern platform, ideally expressing
it in public standard terms.
Finding an emulator for the old machine: something that will run
the old code as it was.
11. Migration vs. Emulation
Migration means converting files to newer formats. For example,
Amiga graphics to TIFF or JPEG. If you migrate to a public standard
you minimize the chance of having to do it again. It’s hard to guess
which commercial formats will survive: if you had asked me in the
1990s whether a Kodak image format would survive, I would have
said yes. You have to do it for every format. But you get modern
capabilities with the converted files.
Emulation means programming a current machine to behave like an
old machine. This is a difficult task, but emulators exist for many
common machines, particularly game platforms. A notable project
is Olive (olivearchive.org) which is aimed at preservation of
intellectual content beyond video games (CMU, IBM, and others).
You get only the old behavior of the program.
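To give a feel for what "programming a current machine to behave like an old machine" involves, here is a toy fetch-decode-execute loop in Python. The machine, its four opcodes, and the sample program are invented for this sketch; real emulators also have to model memory, timing, graphics, and input devices.

```python
def run(program: bytes) -> list[int]:
    """Interpret a program for a hypothetical one-register machine.

    Opcodes (invented for this sketch):
      0x00 HALT
      0x01 LOAD n  -> acc = n
      0x02 ADD n   -> acc = (acc + n) mod 256
      0x03 OUT     -> append acc to the output list
    """
    acc = 0      # the single 8-bit accumulator register
    pc = 0       # program counter: index of the next instruction
    output = []
    while pc < len(program):
        opcode = program[pc]
        if opcode == 0x00:
            break
        elif opcode == 0x01:
            acc = program[pc + 1]
            pc += 2
        elif opcode == 0x02:
            acc = (acc + program[pc + 1]) % 256
            pc += 2
        elif opcode == 0x03:
            output.append(acc)
            pc += 1
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {pc}")
    return output


# The "old binary": LOAD 40, ADD 2, OUT, HALT.
old_binary = bytes([0x01, 40, 0x02, 2, 0x03, 0x00])
result = run(old_binary)
```

The binary is never translated or recompiled; the loop reproduces the old machine's behavior, including its 8-bit wraparound, which is why emulation gives you only the old behavior of the program.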
12. Examples
Migration:
JSTOR, and many old journal systems: the early issues, whatever
their original formats, are now in PDF. Often they were just OCR’d
from the printed version, rather than translated digitally (high
proofreading cost but minimal programming complexity). You can
use all modern PDF tools on the articles.
Emulation:
The Internet Arcade is a collection of 1970s-80s arcade games that
you can run in an emulator:
https://archive.org/details/internetarcade
13. Some very special cases
Colossus, 1942. Colossus re-build, 1996
Charles Babbage’s
Difference Engine,
as rebuilt by the
Science Museum
(London), 2002.
14. Analogy
Consider performing early music. Should you play it on old
instruments or modern ones? Old instruments are more authentic,
but have a different effect on the modern ear. Bach’s listeners had
not heard a piano and the organ did not sound “old fashioned”.
Emulation is finding an old church (there are some in Germany
whose architecture and organ pipes are not changed from Bach’s
day) and using old-fashioned performance techniques.
Migration is using a piano (and keyed flutes and trumpets, etc) but
trying to produce the same emotional effect.
Similarly with old books: Caslon and Baskerville did not look old to
people who had never seen Helvetica.
15. If you lack source code
In general, you can’t migrate a piece of software without the source
code, since you want to recompile it on a new machine. There are
disassemblers, but the result is going to be a real pain to
understand. So if you have only the object code, you may be driven
to emulation. Since many software vendors keep source code very
secret, and did so in the past as well, it’s not uncommon to have
only the binary form of some program.
A legal warning: if you can’t find the vendor (out of business) and
get permission, you may not have permission even to use the binary
code, although this may depend on the terms of the original
purchase. It may or may not have allowed transferring the program
to a new user.
16. Features in old and new versions
Suppose you take an ancient word processor file and migrate it to a
modern format. Then you can do things like export HTML, or PDF.
Any tool that will use the modern format can work with your old
file. But the tool will give a modern result – it will run faster, use
modern display fonts, and the like.
If you are using an emulator, you get the old behavior. If the
program only displayed green on black, you get green on black. This
is “authentic” but you may not like it. And you may not be able to
create HTML or PDF from the program. If you are trying to merge
many such older documents into a digital library, the format
incompatibilities will make things worse.
17. Metadata
If you really want to preserve a complex software object, it
helps to know exactly what programs were used to create it.
That means not just the name, but the exact version. Other
issues that are more serious for digital preservation include
provenance: where did this come from? This is relevant for
answering questions about the material, or finding the people
who might know the answer. Similarly it may assist with rights
metadata, or technical metadata. Modern formats sometimes
have technical metadata included in the file (e.g., in a JPEG
header) but older formats often don’t.
Again, it is easiest if you use well-known and common formats.
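As a sketch of what such a record might hold, the Python dataclass below captures the items named above: the creating program, its exact version, provenance, rights, and technical notes. The class, its field names, and the sample values are hypothetical and do not follow any particular metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SoftwareObjectRecord:
    """Hypothetical minimal record; field names are illustrative, not a standard."""

    title: str
    creating_program: str
    program_version: str          # the exact version, not just the product name
    provenance: str               # where the object came from
    rights: str = "unknown"       # rights metadata, if any is known
    technical_notes: list[str] = field(default_factory=list)


# Made-up example values, for illustration only.
record = SoftwareObjectRecord(
    title="Tree growth report",
    creating_program="WordStar",
    program_version="4.0",
    provenance="Donated floppy disk, donor on file",
    technical_notes=["No embedded technical metadata in the file"],
)

# Serialize to JSON so the record can be stored next to the preserved file.
serialized = json.dumps(asdict(record), indent=2)
```

Keeping the record in a plain, widely readable serialization follows the same advice as for the content itself: prefer well-known and common formats.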
18. Standards
“The good thing about standards is that you have so much
choice.”
Even ASCII (ISO 646) is ambiguous. The UK changed the “#”
character to mean “£” and Germany changed “}” to “ü” .
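The two substitutions above can be shown directly: the same bytes read differently depending on which national variant the reader assumes. This Python sketch includes only the two code points mentioned here; the variant labels and the function name are illustrative, and the real national variants redefine more positions than these.

```python
# Partial override tables: only the two code points named in the text.
# Real national variants of ISO 646 redefine several more positions.
VARIANTS = {
    "US": {},
    "UK": {0x23: "£"},
    "DE": {0x7D: "ü"},
}


def decode_iso646(raw: bytes, variant: str) -> str:
    """Decode 7-bit bytes, applying the overrides of one national variant."""
    overrides = VARIANTS[variant]
    return "".join(overrides.get(b, chr(b)) for b in raw)


raw = b"#1 {x}"
as_us = decode_iso646(raw, "US")
as_uk = decode_iso646(raw, "UK")
as_de = decode_iso646(raw, "DE")
```

Nothing in the bytes themselves says which table applies, so the choice of variant has to be recorded as metadata or inferred from context.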
Particularly worrisome are “wrapper” formats. TIFF may
contain different kinds of image compression algorithms (such
as G4 fax, or Lempel-Ziv), and thus a TIFF reader may not be
able to read all TIFF images. Some image viewers understand
progressive images in GIF or JPEG; some don’t. PDF can include
the kitchen sink (e.g., 3-D viewers).
Solution: emphasize the best and most public formats.
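To see why a TIFF reader may fail on a valid TIFF file, it helps to look at the Compression tag (tag 259) that each image carries. The Python sketch below reads that tag from the first image file directory using only the standard library. It assumes the tag is stored as a single SHORT value, lists only a few compression codes, and uses a hand-built header rather than a real image.

```python
import struct

# A few Compression (tag 259) codes from the TIFF 6.0 specification.
COMPRESSION_NAMES = {1: "uncompressed", 4: "CCITT Group 4 fax", 5: "LZW", 7: "JPEG"}


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag value from the first IFD of a TIFF file.

    Simplifying assumption: the tag is a single SHORT stored inline.
    Returns 1 (uncompressed, the TIFF default) if the tag is absent.
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag == 259:
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1


# Hand-built header with one IFD entry: Compression = 5 (LZW). No pixel data.
sample = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 1)
    + struct.pack("<HHIHH", 259, 3, 1, 5, 0)
    + struct.pack("<I", 0)
)
code = tiff_compression(sample)
name = COMPRESSION_NAMES.get(code, "unknown to this reader")
```

A reader that implements only some of these codes will open some TIFF files and reject others, even though every one of them is a valid TIFF.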
19. Missing environment
What would it mean to preserve the “Amazon home page”? It is
different for every person using it and for each instance: it is
synthesized from the browsing and order history of the user, the
current incentives for sales at Amazon, and lots else (geography,
source computer, etc.). There are many pieces of software that
depend on almost everything around them: think about all the
install scripts that ask “we want to use your location,” “we want to
use your browser history,” and so on. (And of course many
programs don’t ask, they just use them.)
No good answer for this. You have to judge what you mean by
preserving the object – what will the users want the behavior to be?
20. Protection from abuse
If you run a general-purpose preservation operation, you need
to think about whether anything in your preservation files is
dangerous or doubtful in some way. People might try to use
your system to distribute malware (viruses) or to enable
software piracy.
Thus, unfortunately, you may want to put out calls like “please
send in examples of early APL software” but you can’t just
accept anything, and can’t rely on statements made by
unknown volunteers about what they are submitting.
21. Legal permission
You may have an object, and know what to do with it, but not have
legal permission to preserve it. For example, many of the video
game companies object to attempts to imitate the old games – to
them, this is creating competition for new games.
Unfortunately, given the copyright trolls out there, who try to make
a living by finding people who have downloaded something they
shouldn’t have, and then threatening them with lawsuits, this is not
an area where it is easier to get forgiveness than permission.
Libraries are often justifiably paranoid.
There is of course the preservation exception in the law; but it limits
a library to on-premises use.
22. Good and bad
Why software preservation is hard: the material is not self-
describing, there were many early products that vanished without
adequate documentation, software can be very complex, it requires
special hardware to run, and so on….
Why software preservation is easy: as with all digital information, it
can be copied without error; if one person has migrated a format or
emulated a machine, that can be used by others; and computers are
new enough that there is probably no computer without some user
who is still alive. I learned to program on a Univac I; that doesn’t
mean I have a tape drive that uses its steel tapes (yes, steel), but at
least I know what they are.
23. Conclusion
The biggest technical choice is migration vs. emulation. I
would generally say:
migration for static formats
emulation for executable programs
There are some ambitious programs: the Computer History
Museum in Mountain View has been able to salvage old
machines like the Xerox Alto.
But the industry does a lot less than we would like; it is more
common to have legal problems in salvaging software than to
get financial help from its original marketer.
24. Emulation in Practice
Emulation as a Service at Yale University Library; lessons learnt and plans for
the future
Euan Cochrane, Digital Preservation Manager, Yale University Library
25. Overview
1. Why should we care about emulation?
2. What is emulation?
3. How do we do emulation?
4. What is Emulation as a Service (EaaS)?
5. How we use EaaS
6. Lessons learnt using EaaS
7. Future work at Yale University Library (YUL)
27. Why? - Executable content
• Video games
• Research data workflows
• Digital Art
• Software as artifact
• Digital artifact museums
(preserving the tools and infrastructure of the digital age)
28. Why? – Software dependent
content
Content that requires software in order to be rendered
or interacted with:
• Office files (documents, spreadsheets, slide sets, etc)
• CAD files
• Outlook inboxes
• eBooks with note taking capability
• Desktop environments
• Code
• Any proprietary, or effectively proprietary, formats
30. Old software is required to
authentically render old content
Original content in original
software (WordPerfect in
Windows 95)
Original content in newer
software (LibreOffice Writer in
Windows Vista)
31. Research results are at risk of loss without original software
Original content in original software (WordStar for DOS in Microsoft DOS)
[NB: equation predicting tree growth rates includes exponents documented using upper line of text]
Original content in newer software (LibreOffice Writer in Windows Vista)
33. How? – Emulation and virtualization software tools
• An emulation software package (“emulator”) is used to create a virtual version of one computer within another computer that has different hardware
• Old software can be run on the “emulated” computer hardware just like it was running on the original physical computer.
• Many emulators were originally developed to run old video games
34. How? – Software tools
• Emulation is often used to support old hardware devices that require obsolete software
(e.g. assembly line management software, scientific instruments, industrial machinery, etc)
• Emulation is widely used by mobile phone application developers to develop software for phone-hardware using desktop-PC hardware
(i.e. phone hardware is emulated on desktop PCs to build phone-compatible applications)
• Virtualization = emulation but with compatible hardware
(some of the host machine’s hardware is used directly by the “virtualized” computer)
Virtualization bridges the gap between departure of recently obsolete hardware and the arrival of hardware powerful enough to emulate it
35. How? – Preserving software and dependencies
• We need to curate and preserve operating systems to support access to assets that depend on them
• We need to curate and preserve software applications to support access to content that depends on them
• We need to curate and preserve fonts, scripts, plug-ins and other dependencies to support access to content that requires them
• We need to preserve whole desktop environments (e.g. Salman Rushdie’s desktop at Emory University) to support access to the experience of interacting with them
• We need to curate and preserve pre-configured disk images with software already installed on them – for running on emulated hardware
36. How? - Documentation
• We need unique, persistent identifiers for software
• We need software catalogues
• We need unique, persistent identifiers for disk images
(installed environments/virtual hard drives)
• We need disk image/virtual hard drive catalogues
• We need unique, persistent identifiers for
emulated/virtualized hardware configurations
• We need hardware configuration catalogues
37. How? - Documentation
We don’t have these (yet!)*
*Mostly, the Internet Archive is doing great work, as are NIST and PRONOM
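As a rough illustration of what the identifiers and catalogues listed above could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the `kind:uuid` identifier form, and the `Catalogue` class are hypothetical and do not describe any existing registry or the EaaS data model.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


def new_identifier(kind: str) -> str:
    """Mint a unique identifier. The 'kind:uuid' form is an assumption."""
    return f"{kind}:{uuid.uuid4()}"


@dataclass(frozen=True)
class SoftwareRecord:
    identifier: str
    title: str
    version: str


@dataclass(frozen=True)
class HardwareConfigRecord:
    identifier: str
    machine: str      # free-text description of the emulated machine
    memory_mb: int


@dataclass(frozen=True)
class DiskImageRecord:
    identifier: str
    software_ids: Tuple[str, ...]   # identifiers of installed software
    hardware_config_id: str         # identifier of a hardware configuration


Record = Union[SoftwareRecord, HardwareConfigRecord, DiskImageRecord]


class Catalogue:
    """In-memory catalogue keyed by identifier (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record: Record) -> str:
        # Identifiers must stay unique, so re-registering one is an error.
        if record.identifier in self._records:
            raise ValueError(f"identifier already in use: {record.identifier}")
        self._records[record.identifier] = record
        return record.identifier

    def resolve(self, identifier: str) -> Record:
        return self._records[identifier]


def installed_titles(catalogue: Catalogue, image_id: str) -> List[str]:
    """List the titles of the software installed on a disk image."""
    image = catalogue.resolve(image_id)
    return [catalogue.resolve(sid).title for sid in image.software_ids]
```

Because a disk image record points at software and hardware records by identifier rather than by copy, one software entry can be shared by many environments.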
39. How? – Configuring emulated hardware
• Admins configure an emulator
• Admins install and/or configure the emulated software
• Requires various emulator specific, technically challenging tools
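To give a sense of what "configuring an emulator" involves, the following Python sketch turns a small hardware description into a QEMU command line. QEMU is used purely as an illustration because it is a widely known emulator; the slides do not say which emulators the administrators configure. The `EmulatedMachine` class and the file names are hypothetical. The flags `-m`, `-hda`, `-cdrom` and `-snapshot` are standard QEMU options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmulatedMachine:
    """Hypothetical description of one emulated PC."""
    disk_image: str                     # virtual hard drive with the OS installed
    memory_mb: int = 64                 # RAM presented to the guest
    cdrom_image: Optional[str] = None   # optional installation medium
    discard_changes: bool = True        # keep the disk image pristine


def qemu_command(machine: EmulatedMachine) -> List[str]:
    """Build (but do not run) a QEMU command line for an emulated x86 PC."""
    cmd = [
        "qemu-system-i386",
        "-m", str(machine.memory_mb),
        "-hda", machine.disk_image,
    ]
    if machine.cdrom_image is not None:
        cmd += ["-cdrom", machine.cdrom_image]
    if machine.discard_changes:
        # -snapshot sends writes to a temporary file instead of the disk image
        cmd.append("-snapshot")
    return cmd
```

The returned list could be passed to `subprocess.run` on a machine with QEMU installed. Even this small case shows how much emulator-specific knowledge an administrator needs.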
40. How? – Accessing emulated environments at libraries and archives
• Users access emulated environments via dedicated machines
• Use dedicated software
• At libraries and archives this is mostly restricted to reading rooms
43. Emulation as a Service – What is it?
Remote access to pre-configured emulated and virtualized
environments via any modern web browser
Abstracts configuration challenges away from end-users
Changes to environments can be saved or discarded at the end
of a session (a fresh/unchanged version is always available)
Interactivity can be restricted where appropriate (e.g. limited
ability to download or copy content to local computer)
Relatively simple way to provide custom online environments
(virtual reading rooms?)
44. Emulation as a Service (EaaS) – Why?
• A lot of old digital content can only be properly accessed using
emulation tools
• Emulation is technically specialized
• Old software can be challenging for modern users to understand
• Modern users don’t expect to have to come into a reading room
to access digital content
• Maintain control over content: users can’t copy data in or out
unless authorized (screenshots are inevitably excluded)
45. Emulation as a Service (EaaS) – Why?
• Strong separation between environments, objects and emulators/configurations
• Emulation can be provided remotely (outsourced), with disk image archives and/or content maintained locally
• Small derivative environments can be created from base-environments – saving space
• Standard environments can be reused and customized
• Provides ability to cite environments
48. EaaS – How it works
(For Technical Administrators)
• Admins configure an emulator on local PC
• Admins configure the emulated software on a local PC
• Configured environment gets saved as a “disk image” with configuration metadata
49. EaaS – How it works
(For Technical Administrators)
• Admins confirm the software environment stored on the disk image works on local PC
• Admins/Archivists/Librarians ingest it into the EaaS service
50. EaaS – How it works
(For Librarians/Archivists)
• Pre-configured software environments (e.g. a Windows 95 + Office 95 environment) can have files added to them and be saved as a variant or as a stand-alone new environment
• Only the difference (delta) between base-environments and customized environment is retained – saving space by not duplicating virtual hard drive content
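The delta idea can be sketched in a few lines of Python. This is a simplified copy-on-write model written for illustration, not the EaaS implementation: the class names and the block-number-to-bytes layout are assumptions. A derived environment stores only the blocks that differ from its base and falls back to the base for everything else.

```python
from typing import Dict


class BaseEnvironment:
    """Read-only virtual hard drive, modelled as block number -> bytes."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self._blocks = dict(blocks)

    def read(self, block: int) -> bytes:
        return self._blocks.get(block, b"")


class DerivedEnvironment:
    """Customized environment that keeps only its delta from the base."""

    def __init__(self, base: BaseEnvironment) -> None:
        self.base = base
        self.delta: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # Writes never touch the base; they are recorded in the delta.
        self.delta[block] = data

    def read(self, block: int) -> bytes:
        # Prefer the changed block, otherwise fall through to the base.
        if block in self.delta:
            return self.delta[block]
        return self.base.read(block)

    def stored_blocks(self) -> int:
        """Number of blocks this environment needs to store itself."""
        return len(self.delta)

    def discard(self) -> None:
        """Drop all changes, returning to the unchanged base."""
        self.delta.clear()
```

The same structure also covers saving or discarding changes at the end of a session: saving keeps the delta, discarding clears it, and the base stays available unchanged either way.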
51. EaaS – How it works
(For Librarians/Archivists)
• CD-ROMs and other software can be ingested, installed/configured on top of a base environment, and tested using an online interface
• Newly customized environment can be stored for future use and
52. EaaS – How it works
(For Librarians/Archivists)
• Librarians/Archivists can also ingest disk images captured from machines they have acquired (e.g. authors’/politicians’ desktops)
53. EaaS – How it works
(For end-users)
• Users can click on links in a catalogue/finding aid to access environments/content
54. EaaS – How it works
(For developers and system
integrators)
• Provides generic access to the functionality of many emulators and virtualization tools via a web service and REST API
• Emulation functionality can be incorporated into existing workflows
• Emulated (or virtualized) environments can be embedded into web pages for
online access and online exhibitions
• Emulated environment citations, thumbnails, and URIs/URLs enable easy
integration with existing catalogues and finding aids
• One-click “image-disk-and-emulate” workflows being developed (collaborating
with digital forensics initiatives)
• Open Source (currently available on request, code will be published in the
future)
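As a loose sketch of the catalogue and finding-aid integration described above, the Python below builds a link to an emulated environment from its identifier. The base URL, the query parameter names and both function names are hypothetical placeholders; they are not the bwFLA/EaaS API, which these slides do not document.

```python
from html import escape
from typing import Optional
from urllib.parse import urlencode

# Hypothetical service address, used only for illustration.
EAAS_BASE_URL = "https://eaas.example.org/emulate"


def environment_url(environment_id: str, object_id: Optional[str] = None) -> str:
    """Build a URL that opens an environment, optionally with an object loaded."""
    params = {"environment": environment_id}
    if object_id is not None:
        params["object"] = object_id
    return f"{EAAS_BASE_URL}?{urlencode(params)}"


def finding_aid_link(label: str, environment_id: str,
                     object_id: Optional[str] = None) -> str:
    """Return an HTML anchor suitable for pasting into a finding aid."""
    url = environment_url(environment_id, object_id)
    # Escape both the URL and the label so the markup stays well-formed.
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
```

A catalogue record would only need to store the environment identifier; the link itself can be generated when the page is rendered.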
55. EaaS – Background
• bwFLA EaaS project from University of Freiburg in
Germany (http://bw-fla.uni-freiburg.de)
• Personally collaborated with bwFLA at Freiburg
while at Archives New Zealand
• Now at Yale University Library and brought
collaboration along
• Yale University Library has (/had!) the only installation outside of Germany
58. Related work
• Olive Archive https://olivearchive.org/
• Internet Archive
https://archive.org/details/software
• Keep Emulation framework
http://emuframework.sourceforge.net/
• QEMU http://wiki.qemu.org/Main_Page
59. EaaS at Yale
• Testing and providing requirements for ongoing
development
• Imaging general collections’ digital media and trialing access via EaaS
• Investigating workflow integration (virtual reading
rooms?)
• Finding gaps in supporting infrastructure
61. Lessons learnt
• Software licensing needs to be solved (abandonware
and out-of-cart software are huge problems)
• Scale is manageable through standardization and
sharing
• Archivists and Librarians can use EaaS with relatively
little training
• The possibilities of using EaaS in workflows are huge
• If EaaS becomes an assumption, creators may change
62. Future work at Yale University Library
• Move EaaS into production
• Increase software archiving
• Develop standard shareable environment images
• Collaborate with others to maximize efficiency of software archiving
• Develop emulation testing standards and frameworks
• Explore options for preserving networked environments
• Make progress on the licensing issues
64. NISO Webinar • May 13, 2015
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2015/webinars/software/
NISO Webinar
Software Preservation and Use:
I Saved the Files But Can I Run Them?
65. Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!