The Challenges of Preserving Every
Digital Format on the Face of the Planet




                                 Leslie Johnston
                                    March 26, 2012
Well, not every format

But we often have little or
no control over what comes
into the Library of
Congress Digital
Collections, and we
manage and preserve a
wide variety of formats.
What are examples
of some of the
collecting and
preservation
challenges?
NATIONAL DIGITAL
NEWSPAPER PROGRAM
chroniclingamerica.loc.gov/
A partnership between the National Endowment for
the Humanities and the Library of Congress:
  Enhance access to American newspapers
  Sustainable digital collection
  Scalable, phased, cost-effective management

The program has:
  Multiple producers (25 now, ultimately 54)
  Digitization standards (http://loc.gov/ndnp/)
  Free and open public access
  APIs for machine access and automated processes

Files
  TIFFs, JPEGs, JPEG 2000s, and XML.
  Over 4 million newspaper pages ingested to date
  Over 250 TB of data
WEB ARCHIVING
    http://www.loc.gov/webarchiving/
    lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html
The Library has been archiving the web since
2000. Subject area specialists curate the
collections, and Library catalogers create
collection-level metadata records.
The collections include:
         U.S. elections
         Web sites created by members of the
          House and Senate
         Thematic collections around events, such
          as elections in the Philippines, the Iraq
          war, and the appointment of Supreme
          Court Justices.
         Collections around an area of study,
          such as Legal “Blawgs”
The file formats include every format possible
on the web. The collection comprises
approximately 5 billion files in 300 TB.
NATIONAL DIGITAL
INFORMATION
INFRASTRUCTURE
& PRESERVATION
PROGRAM
digitalpreservation.gov
CONTENT TYPES




Images and Text   Audio Visual   Geospatial   Web Sites
PACKARD CAMPUS
NATIONAL AUDIO-VISUAL
CENTER
Preserving Film, Broadcast Television, and
Audio
The Packard Campus supports a variety of preservation
workflows, including those for obsolete physical
formats such as wire recordings, wax cylinders,
and 2" videotape. The Campus is fully equipped to
play back and preserve all antique film, video and
sound formats, and to maintain that capability far
into the future.

The facility also handles born-digital video and
audio received directly from producers.

The formats include MPEG-4, MP3, BWF, AVI,
and a wide variety of specialized commercial
formats.
eDEPOSIT FOR eSERIALS
 eDeposit for eSerials is a collaborative effort
  between the U.S. Copyright Office and the
  Library of Congress.

 Copyright Mandatory Deposit represents the
  largest acquisitions channel for the Library. In
  general, all U.S. publishers are legally required to
  submit for deposit two copies of each of their
  publications to the Copyright Office. This
  mechanism has allowed the Library to build the
  collection and to preserve the publications.

 eSerials became subject to mandatory deposit in
  January 2010, with the publication of a new
  interim regulation. Demands began in June 2010
  and files began to arrive in October 2010.

 The files must come to the Library “as published”
  – in whatever their original formats are. This
  means a wide variety of XML content and
  metadata, HTML, and PDFs.
WORLD DIGITAL LIBRARY
www.wdl.org
Deliver historically significant primary
  materials from cultures around the world to
  an international multilingual audience

   Over 100 participating partner institutions, and
    contributions from over 40 institutions so far.

   Representing all 193 UNESCO member
    countries.

   Maps, prints, photographs, rare books,
    manuscripts, journals, sound recordings, and
    motion pictures.

   Metadata in Arabic, Chinese, French, English,
    Portuguese, Russian, and Spanish.

   JPEG 2000s, PDFs, XML.
THE TWITTER ARCHIVE
Every public tweet since Twitter’s launch in March
  2006.
We have a historic 2006-2010 archive and ongoing
  access to new tweets.
We do not receive personal account information,
  linked images, or linked web page content.
Tweets will not move into the archive until six
  months after their initial posting.
The Library’s researcher services will not recreate
  Twitter, and the archive cannot be openly accessible.
We are testing various technologies, and entering a
  pilot phase with test researchers. We will
  announce it when the archive is open to all
  researchers.
The collection comprises only a few TB, but over 80
  billion tweets.

An FAQ is available online at:
  http://blogs.loc.gov/loc/2010/04/the-library-and-twitter-an-faq/
So how are we
making this easier
for the Library to
manage?
Preservation Infrastructure

• The Library developed the BagIt
transfer specification for the movement
of files between and within
organizations.
   • http://www.digitalpreservation.gov/documents/bagitspec.pdf

• The Library inventories all incoming
files, and is inventorying all digital
content.

• We maintain multiple copies of files
on servers and on tape, in
geographically distributed locations.
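
A BagIt-style package and the fixity check it enables can be sketched roughly as follows. This is a minimal illustration, not the Library's tooling: the function names (sha256_of, make_bag, verify_bag) are hypothetical, and the choice of SHA-256 and the "BagIt-Version: 1.0" declaration are assumptions that follow the later published form of the specification rather than the draft linked above. The layout itself (a bagit.txt declaration, a data/ payload directory, and a checksum manifest) is what the specification describes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the bag declaration
    and a payload manifest. bag_dir must not already contain data/."""
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir

    # Bag declaration (version and encoding values are assumptions).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag>" line per file.
    payload = sorted(p for p in data_dir.rglob("*") if p.is_file())
    lines = [
        f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}\n" for p in payload
    ]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> list:
    """Recompute checksums and return a list of problems (empty if none)."""
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        expected, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Running verify_bag on each stored copy of a bag, for example the server copy and the tape copy, is one way an inventory could confirm that geographically distributed copies still match what was originally received.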
Preservation Partnerships

The Library cannot collect everything on
its own, so it works as part of:

The National Digital Stewardship Alliance
http://www.digitalpreservation.gov/ndsa/

The International Internet Preservation
Consortium http://netpreserve.org/about/index.php
among others…
What are the Library’s
strategies for formats?
• The Library has documented
sustainability factors for file formats.
   • http://www.digitalpreservation.gov/formats/

• For cases where we do have control
over what comes in, we have a “Best
Edition” Preferred Formats statement,
which is currently being updated.
   • http://www.copyright.gov/circs/circ07b.pdf

• The Library is developing Format
Preservation Action Plans.
DISCUSSION?




                                Leslie Johnston
                   Chief of Repository Development
 Manager of Technical Architecture Initiatives, NDIIPP
                                       lesliej@loc.gov
