Presentation at Best Practices Exchange conference in Harrisburg, PA. Enjoyable data rescue projects over the years and suggestions re IPDRC - Initiative to Preserve Data Rescue Capabilities.
Data Rescue and Preserving DR Capabilities
1. Presentation to
BEST PRACTICES EXCHANGE 2015
Rescuing Early Digital Assets
and
Preserving Data Rescue Capabilities
chris@mullermedia.com
2. First, a great pre-digital example of
data rescue and preservation.
12,000 pages of New Amsterdam
records sat in an Albany vault for
over 300 years. Re-discovered in
1973, leading to Russell Shorto's
wonderful book "Island at the
Center of the World".
Fortunately, some visual records have had the
“Luxury of Languishing” (LOL, as we say.)
3. “McMoons”
Lunar Orbiter Image Recovery.
Analogue magnetic tapes of a very rare sort, created in the
1960s. Further efforts to recover/re-purpose the data in the
1980s. Funding lapsed. Equipment sat in Nancy Evans's
garage until 2007. A new project under the lead of Dennis Wingo
set up a lab in a former McDonald's building. It was
an amazing challenge to revive and in some
cases fully create the needed capabilities.
Dennis Wingo said this project has “opened a
door that we call ‘Technoarchaeology’… The
preservation of our digital heritage, both
hardware and software is a growing need …”
See http://www.moonviews.com
4. Why be extra-aggressive regarding
older digital materials?
Because TIME is the Phantom Menace…
A. Time and fragile storage media can cause valuable
records to decay on the shelf. (No LOL.)
B. Time and migration to new computers, software and
media render older electronic records incompatible and
unusable on the new systems.
C. Time, down-sizing and rapidly changing priorities lead to
undocumented files that are difficult to decipher.
D. Let’s not forget Time’s creepy ally… The same that most
historians now believe destroyed the Great Library and
many other collections: Lack of Attention.
5. Technology’s Vapor Trail
[Figure: Time & Computer Media (very rough scale). Labels: Techno-Rocket; Our “Vapor Trail” of Older Information. Axis: Time.]
A not-so-good kind of Cloud Computing. In “ancient” times
(10-40 years ago), computer storage was expensive and
difficult to use. So most data had actual value. Without MP3s,
videos, tweets, porn and politics, it often yields a richer
harvest than thought. But it’s getting harder to retrieve.
6. Obstacles to Recovery and Conversion.
It's a bit like peeling back the layers of an onion:
1. Media Compatibility
2. Age, Chemistry, Storage Conditions
3. Recording Method keeping the door open
4. Operating System/Filing System
5. Backup, Exchange, or Archiving Software
6. File Structure
7. File Encoding
Lots and lots of different
onions in that vapor trail…
7. Media Compatibility
Who would have thought it would be hard to find a PC
with a floppy drive this soon? Let alone 9-track reels.
8. Age & Storage Conditions
Dust, Gravity, Chemistry, Heat, Humidity
Media that is OK for day-to-day backup is often not suitable for archiving.
10. Recording Method (continued)
Lighter-Duty Tape Drives (mostly PCs, small Minis, Servers)
• QIC: serpentine recording; 4 track, 9 track, 18 track, etc.; from 11 MB to 24 GB
(also QIC-80 mini-cartridge, Travan, etc.)
• 8mm (Exabyte): helical scan recording; 2.3 GB … 14 GB … Mammoth … AIT
• 4mm DAT: helical scan recording; DDS, DDS-C, DDS-2,3,4,5…; 1 to 72 GB
Muller Media Conversions www.mullermedia.com
11. Operating/Filing System
The way files are stored and identified, for example:
• Folder/directory organization.
• How long a file name can be and which characters are valid.
• File types and their associations.
• How creation date or date of last access are saved.
• How space is allocated for a file.
• Indexed (ISAM) file operations: are they part of the O/S?
12. Backup, Exchange, or Archiving Software
Backup software is often not best for archiving or data
exchange. Software intended for exchange of data with
other systems (e.g. ANSI or IBM Standard, TAR, etc.)
produces the most widely readable tapes, but you can lose
valuable metadata that way.
Next best choice is often the O/S’s standard backup
program, such as Wang VS Backup, VAX VMS Backup,
Windows Backup.
(We are speaking here of older systems. Newer environments often use
software and methods to overcome or at least change these problems.)
13. File Structure
Database, word-processing and other complex file types are
generally not organized sequentially, even within each file.
Most databases use some form of index-sequential layout.
Wang and MultiMate word-processing files used an internal
chain to control their organization. (WordStar, WordPerfect
and some others are sequential.) Microsoft Word and many
other applications stored information within a number of
data streams using a technology called OLE Structured
Storage. (They have since moved to XML-based formats.)
One generally wants to get databases into a flat or
delimited sequential format prior to preservation.
A good sequential storage format for W.P. files is (was) RTF.
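As a rough illustration of "flat or delimited sequential format", here is a minimal Python sketch that slices fixed-width records into fields and writes them out as comma-delimited text. The field layout (`LAYOUT`) and the sample record are hypothetical, invented for this example; a real layout would come from the original system's documentation or from analysis of the data itself.

```python
import csv
import io

# Hypothetical layout of a fixed-width legacy record: (field name, start, length).
# This is an assumption for illustration, not a format from any real system.
LAYOUT = [("id", 0, 6), ("name", 6, 20), ("year", 26, 4)]


def record_to_row(record: str) -> list[str]:
    """Slice one fixed-width record into trimmed field values."""
    return [record[start:start + length].strip() for _, start, length in LAYOUT]


def to_delimited(records: list[str]) -> str:
    """Write records as comma-delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in LAYOUT])
    for record in records:
        writer.writerow(record_to_row(record))
    return out.getvalue()


# One made-up 30-character record: 6-char id, 20-char name, 4-char year.
sample = ["000042" + "Smith, Jane".ljust(20) + "1981"]
print(to_delimited(sample))
```

The `csv` module quotes any field that contains the delimiter, so values such as "Smith, Jane" survive the round trip; that is one reason to prefer a library writer over joining strings by hand.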
14. File Encoding
DATABASES: ASCII, EBCDIC, BCD, PACKED, FLOAT, INTEGER, etc.
WORD PROCESSORS: ASCII as a base (except for older IBM). But
that’s just the tip of the iceberg. Not only do the codes vary, but so does
the way in which they’re used. One simple example:
An underlined word. It may print like that, but internally it might be:
• Extra bit set in the underlined chars (Wang)
• Same code toggles underlining on/off (WordStar)
• Different codes start and end the underline
• Word underscore code (IBM)
• Char/backspace/underscore for each letter, e.g. u<_n<_d<_e<_r<_... (really old ones)
• Attribute pointers from a different part of the file (MS Word)
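To make the encoding layer concrete, here is a hedged Python sketch of three of the cases named on this slide: EBCDIC text, a packed-decimal number, and the "char/backspace/underscore" style of underlining. The choice of code page `cp037`, the `<u>` output markup, and the function names are assumptions made for this example. Real files vary in code page, sign conventions, and the order of the overstrike characters.

```python
def decode_ebcdic(raw: bytes) -> str:
    """Decode EBCDIC text. cp037 (US/Canada) is an assumed code page."""
    return raw.decode("cp037")


def unpack_packed_decimal(raw: bytes) -> int:
    """Decode a packed-decimal field: two BCD digits per byte, with the
    sign in the low nibble of the last byte (0xB or 0xD means negative)."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


def overstrike_to_markup(text: str) -> str:
    """Turn char/backspace/underscore runs into <u>...</u> spans.
    Assumes each underlined character is followed by backspace + '_'."""
    out = []
    underlined = False
    i = 0
    while i < len(text):
        is_ul = text[i + 1:i + 3] == "\b_"
        if is_ul and not underlined:
            out.append("<u>")
        elif not is_ul and underlined:
            out.append("</u>")
        underlined = is_ul
        out.append(text[i])
        i += 3 if is_ul else 1
    if underlined:
        out.append("</u>")
    return "".join(out)


print(decode_ebcdic(bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])))
print(unpack_packed_decimal(b"\x12\x34\x5C"))
old_style = "An " + "".join(c + "\b_" for c in "underlined") + " word."
print(overstrike_to_markup(old_style))
```

Each function handles only one convention. In practice the first step is usually to work out which convention a given file actually uses before any conversion is attempted.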
15. “The Nation’s Records Are a Mess”
WASHINGTON-- A slice of America's history has become as
unreadable as Egyptian hieroglyphics before the
discovery of the Rosetta stone. Vast untold volumes of
historic, scientific and business data are in danger of
dissolving into a meaningless jumble of letters, numbers
and computer symbols. Much information from the last
30 years is stranded on computer tape from primitive or
discarded systems-unintelligible or soon to be so…
...ASSOCIATED PRESS JANUARY 3, 1991
www.mullermedia.com
16. Some pre-Whitewater fun
Clinton + Moscow = ?
Today, we’d tend to think
of something like this:
But in 1969, a young fellow had visited Moscow, and
now (1991) he was becoming a contender for a
presidential nomination. Political rivals curious. FOIA
request. Old State Department tape.
Problem: un-documented file format (a sadly
common occurrence). But the D.A.’s office
found some folks who enjoyed hacking away.
This was fun, but our real exposure to the world of digital preservation
and the great people involved in it was yet to come. (See next slide.)
17. IAC/IRM honors federal I/T leaders
Fynette Eaton of the National Archives
and Records Administration's Center for
Electronic Records receives a GSA
Technology Excellence Award for
developing the Archival Preservation
System.…
…GOVERNMENT COMPUTER NEWS, JUNE, 1996
The next year, NARA issued an RFP for APS.
FDR’s Fireside Chats. What the heck?
Met great people—including Ken
Thibodeau and Fynette Eaton—and
were thrilled to begin a long, happy
relationship with NARA.
18. Some real Whitewater stuff
In the 1980s at a law firm in Little Rock,
Arkansas, someone did legal work for a
real estate project named “Castle Grande”.
By early 1994, all related records had
mysteriously been “vacuumed” from the
firm’s paper files and computer systems.
During the due-diligence process, the firm
requested data recovery from some folks in
New York. Obviously, the perp was not a big
fan of digital preservation, and he did not
realize that good old Wang 8 inch floppies are
not so easily erased. (Sorry, no details permitted.)
19. Some Watergate-ish Fun
In the early 1970s, a White House mainframe used a
proprietary database format to store the presidential
appointment calendar and notes. Two problems:
(a) the 25-year-old tape was in danger of decay, and
(b) the data format had never been figured out.
We were asked to puzzle it out and make a new database and
program for researchers. Luckily, the analyst had
done some work with Vietnam era military records
and noticed similarities in the data structure.
Major Discoveries! Ah, not exactly. Historians
may have noticed interesting items, but for us, at
least we learned that the first visitor to the Nixon
White House was the great Hank Aaron.
20. Beverly Hills High School
A famous & beautiful litigator (see
that movie?) decided that oil wells on
school property had been endangering
student health since 1970.
In the school basement for decades
were dozens of Vydec floppy disks with
student records from the late 1970s.
Anyone old enough to remember
Vydec? A really weird onion. By 2003
the ability to read them had almost
vanished. (Not physically compatible with
standard 8” drives: they spin in the opposite direction!)
21. Tribal Mineral Rights Tapes
It was alleged that the U.S.
government had not properly
accounted for mining, drilling
and related payments. A great
deal of the information was on
9,000 3490 tape cartridges at
the Denver Federal Center.
The content of all of those tapes fit on
a single inexpensive external disk drive!
Future backups and conversions will be
thousands of times easier.
22. IPUMS
A source of real enjoyment: the Minnesota Population Center
and IPUMS.org. They collect/analyze/publish population data
from around the world. It becomes apparent that the need for
data-rescue is even greater in the developing world. Tapes and
disks coming in from such places as:
•Bangladesh
•Egypt
•Kenya
•Mali
•Mexico
•Nepal
•Pakistan
•Peru
•Qatar
•Romania
•Santo Domingo
•Sudan
One example - the Peru Census of 1981; picking
up well-packed tapes at the Consulate -
23. Bangladesh Bureau of Statistics
Data Recovery Center
Very capable people, with other
pressing duties; daily power outages
and other problems make it
impossible to store legacy tapes in an
optimal way, and many are suffering from
decomposition. The initial plan was
to install a read-convert system and tape cleaners, train BBS staff, and
recover the 1981 Census microdata.
There’s a world-wide need for data rescue
which can lead to many fascinating projects,
if focus and funding are available.
24. Penobscot Dictionary
Dr. Frank Siebert dedicated much of his life to
the preservation of the Penobscot language.
He conducted many interviews with native speakers, then
created a detailed dictionary in a rare format.
He left many diskettes not compatible with
standard operating systems; the character encoding
was morphed for unique display hardware.
American Philosophical Society and
Maine Folk Life Center undertook a project to
ensure the preservation of this cultural treasure.
Working with great organizations like APS and MFLC to preserve the legacy of
someone like the brilliant Dr. Siebert can make data-rescue a real pleasure.
25. Old Professors’ Stashes (“OPS”)
Ready to retire or move on
to another university: those
old tapes are my legacy!
Let’s have another look.
Or maybe others realize that (for example) the health data,
re-purposed and combined with population and climate
information, may well produce remarkable insights.
Rediscovered stashes like this are cropping up all the time!
OK, so we all know that there are digital data collections
that are at risk. But what else is fading away?
26. Examples of other things in that vapor trail…
1. In some cases, it’s the old professor himself/herself
who created unique data, or a rescue project, and
once they retire, special tools and skills may go away too.
2. There are still many thousands of older
tape reels on shelves around the world,
but the necessary cleaners and supplies for
them are almost impossible to find.
3. What we call the “elder geeks”, who have
had careers creating many data rescue
software and hardware tools over the years. A
great example is a fellow named Bill King.
Documentation is a big deal, whether it’s about equipment,
software, data content or recovery procedures.
27. Lots of Great Data Rescue Efforts
National Archives and Records Administration and Library
of Congress. (many, e.g. USC and McMoons drives)
Many special projects within government, universities,
libraries and museums.
NPOs such as IEDRO (International Environmental Data
Rescue Organization) and many others.
CODATA Data-at-Risk Task Group, e.g. creation of an
inventory of important scientific data at risk of loss.
RDA Data Rescue Interest Group, just getting under way.
And of course, there are computer museums, etc.
But there is not yet a broad
Initiative to Preserve Data Rescue Capabilities
28. IPDRC
Many of the tools and skills for rescuing older digital assets
apply across the boundaries of science and culture.
Preserving them would support researchers,
scientists, historians, librarians, and archivists.
IPDRC would collect details of capabilities and projects
across government, academia and commercial entities,
and stay in touch, ensuring that the tools don’t just fade
away when projects end or certain staffers retire.
When certain capabilities appear to be at risk of loss, it
would work to find homes for them.
There are still many details to be determined as to the
best structure, organization and methods.
29. IPDRC - How should it work?
New group? Hosted by an existing institution?
If new: NPO or “group initiative”? National or global?
What sort of leadership? (Note: to a large extent, it’s a
type of archive.)
What sort of tools for cataloguing capabilities?
How to design a web site to encourage submission of
capability details?
How to find new stewards for CAROLs (capabilities at risk
of loss)? Should IPDRC itself be a steward of last resort?
Many more issues…
Please help us (a) ask the right questions; (b) find good
answers. Your comments on this will be greatly
appreciated.
30. Data Rescue
and
Preserving Related Capabilities
If it’s worth doing, it’s worth doing now.
(No Luxury of Languishing)
Thank you very much for your time
chris@mullermedia.com