The Future of Archiving


Raj Kumar and I (from the Internet Archive), and Allison Vanderslice (from SF Heritage YP) gave a talk as part of the SF Architectural Heritage lecture series.

From the blurb:
"Come hear from the Internet Archive’s George Oates about how digital archiving works, see highlights from their San Francisco history collections, and learn about how these resources will influence the future of preservation. Perhaps even Heritage’s own collection could be digitized in the future…the possibilities are endless!"

Published in: Travel, Business

The Future of Archiving

  1. 1. hello. Some rights reserved by mattdork
     Monday, September 19, 2011
     Hi, I’m Raj Kumar, and this is George Oates. We work at the Internet Archive, and we’re here today to talk to you about digital archiving, what the Internet Archive is, and how it might help you in your work. There’ll be a little time at the end for Q&A.
     The Internet Archive:
     - a 501(c)(3) non-profit
     - building a digital library
     - Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
     - “universal access to all knowledge”
  2. 2. Why digitize?
     - Because it’s an inexpensive way to preserve something forever.
     - 10 cents a page, including digitization costs, OCR, and lifetime storage costs
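As a rough illustration of the 10-cents-a-page figure on this slide, the sketch below totals the cost for a small collection. Only the per-page rate comes from the talk; the function name and the page counts are made up for the example.

```python
# Per-page rate quoted on the slide (digitization, OCR, lifetime storage).
COST_PER_PAGE_CENTS = 10


def digitization_cost_dollars(page_counts):
    """Return the estimated cost in dollars for books with the given page counts."""
    if any(pages < 0 for pages in page_counts):
        raise ValueError("page counts must be non-negative")
    # Work in whole cents first to avoid accumulating floating-point error.
    total_cents = sum(page_counts) * COST_PER_PAGE_CENTS
    return total_cents / 100


# Hypothetical collection: three books of 120, 300 and 480 pages (900 pages).
print(digitization_cost_dollars([120, 300, 480]))  # prints 90.0
```

At that rate, a 300-page book would come to about 30 dollars.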
  3. 3. Why digitize?
     - It becomes easy to increase public access to archival material.
     - Don’t have to travel to a library
     - Accessible audio versions of books.
     - Full text search across almost 3 million texts, and the web archive
  4. 4. Some rights reserved by heather
     - not a traditional library
     - all of our materials are available online on
  5. 5. By rkumar
     - 2.88 petabytes of hard drives
     - enough storage for about 2 billion books.
     - we have 10.5 petabytes online
     - paired storage
  6. 6. archive.org
     All our materials are accessible on
     - 500,000 movies and videos
     - 1,000,000 audio recordings
     - 3 million scanned texts
     - 150,000,000,000 web pages
  7. 7. - Known as the “Wayback Machine”
     - 165 billion URLs
     - Started collecting web pages in 1996
     - We now crawl the web for the Library of Congress and many national libraries (UK, France, Spain, Chile, Australia), for 43 US states, and about 200 other partners.
  8. 8. August 17, 2000
  9. 9. TV, Movies, Audio
     - 500,000 moving images
     - full length movies, TV shows, home movies, advertisements
     - anyone can upload their movie for free
     - San Francisco-specific collections:
       - Prelinger archive
         - Trip down Market St
         - Lost Landscapes
       - SFGTV and SFGTV2 (Board of Supervisors, Planning Commission meetings, etc.)
       - UCSF Tobacco archives, BAVC, Ourmedia
  10. 10. shows are available online an hour after they air.
  11. 11. 9/11 – Television news archive
      Present one week of TV news for study, research, and analysis
      - “Television is our pre-eminent medium of information, entertainment and persuasion, but until now it has not been a medium of record. This Archive attempts to address this gap by making TV news coverage of this critical week in September 2001 available to those studying these events and their treatment in the media.”
      - 3000 hours of TV news footage from 20 channels around the world
  12. 12. 1,000,000 audio recordings
      - Anyone can upload for free
      - almost 100,000 live concert recordings
        - popularized by the Grateful Dead
        - growing by 50/day
      - Librivox – 5000 audio books
      - Old Time Radio
  13. 13. Book Scanning
      Almost 3 million text items
      - Mostly public-domain books before 1923 with audio (TTS) versions
      - 300,000 modern audio books for those with NLS print-disabled credentials
  14. 14. 1,000 books scanned EVERY day
      24 scanning centers in 5 countries, and we hope for more.
      High-resolution archival-quality color scans
  15. 15. Zoom in with online bookreader
      Searchable PDFs with OCR, original uncropped camera images available
  16. 16. We’re also scanning microfilm, which is much faster than individual books. Here’s an example of the record of the population census from 1790 to 1930. Scanned from microfilm from the collections of the Allen County Public Library and originally from the United States National Archives and Records Administration.
  17. 17. Examples of Cross Writing from Boston Public Library
  18. 18. Physical archive
      - Don’t want books to be thrown away after they are digitized
      - We want libraries that are de-accessioning their materials to send them to us before they send them to a landfill
      - The physical is the authentic and original version
      - Goal is 10 million books
  19. 19. Books, boxes, pallets, shipping containers...
      Over to you, George!
  20. 20. openlibrary.org
      I’m George Oates and I run the Open Library project at the Internet Archive. I’d like to talk to you a bit about what can happen once you’ve digitized things. As well as work from the Internet Archive, I’d also like to show you some examples of other projects around the web that explore digital preservation...
  21. 21. A “Wikipedia for Books”
      There’s a twist though... this library catalog is editable, by anyone, like a Wikipedia for books.
  22. 22. Monday, September 19, 2011
  23. 23. Monday, September 19, 2011
  24. 24. Monday, September 19, 2011
  25. 25. Monday, September 19, 2011
  26. 26. California, San Francisco (Calif.), United States, San Francisco Bay Area, Chinatown (San Francisco, Calif.), New York, Hunters Point (San Francisco, Calif.), San Francisco Bay Area (Calif.), South of Market (San Francisco, Calif.), Mission District (San Francisco, Calif.), Western Addition (San Francisco, Calif.), Hetch Hetchy Valley (Calif.), Presidio of San Francisco (Calif.), Diamond Heights (San Francisco, Calif.), Golden Gate Park (San Francisco, Calif.), New York (State), North Beach (San Francisco, Calif.), Los Angeles, Northern California, Bayview (San Francisco, Calif.)
  27. 27. Monday, September 19, 2011
  28. 28. Monday, September 19, 2011
  29. 29. Monday, September 19, 2011
  30. 30. De Young
  31. 31. The Zamorano Club is a group of bibliophiles and collectors based in LA. A jewel in their collection is the “Zamorano 80” - the books they feel best represent California history. Named after Agustin Zamorano, most noted for bringing the first printing press to California.
      This year, I’ve been working with Mary Elings at the Bancroft Library to try to digitize the entire set of these 80 titles. We’re nearly there! And, I’ve collected them into an Open Library list for easy reference and access.
      Interesting to note here how related subjects are aggregated from the constituent titles. The system does that work for us.
  32. 32. The annals of San Francisco by Frank Soulé, John H. Gihon, James Nisbet, first published in 1855
  33. 33. Colonel John Geary, last alcalde & first mayor of San Francisco - unanimously elected to the post of First Alcalde - Big Cheese.
      Colonel Geary immediately set about the organization of the city, and the establishment of an efficient police force. The task was herculean. Pandemonium had to be quieted - chaos reduced to order. Here was a large maritime city, with a population of about twenty thousand persons, and embracing a strange medley of dangerous and desperate characters - without a solitary officer, or a single law to govern or control them. All these rebellious elements had to be subdued, and good citizens made of daring bravados. This task fell upon the alcalde, who had to perform the duties of every one of the customary officers of a city and county jurisdiction.
      On that happy note, I’d like to take a quick tour of some other useful digital preservation projects out there on the internet...
  34. 34. Monday, September 19, 2011
  35. 35. Monday, September 19, 2011
  36. 36. Photograph of the Effect of Earthquake on Houses Built on Loose or Made Ground After the 1906 San Francisco Earthquake, 1906
      By The U.S. National Archives
  37. 37. By Museum of Photographic Arts Collections in San Diego - circa 1880
  38. 38. The City from California Street
      By Museum of Photographic Arts Collections - circa 1880
  39. 39. burritojustice.com
      guy, loves The Mission.
  40. 40. Monday, September 19, 2011
  41. 41. Monday, September 19, 2011
  42. 42. Monday, September 19, 2011
  43. 43. “You can pry my burrito out of my cold, dead hand.”
      Jon began studying the old Southern Pacific train station at Valencia and 25th
  44. 44. Jon began studying the old Southern Pacific train station at Valencia and 25th
  45. 45. BernalDweller, June 27, 2011 10:19 pm:
      Lots of street renamings in SW Bernal. Jarboe was Jefferson, Tompkins was Union, Ogden was Old Hickory. I’ve spent some time researching street name origins in Bernal… must delve further. Great resource!
      The thread is full of interested people throwing in all sorts of information.
  46. 46. Some rights reserved by Paul Hagon
      Mike Migurski put out a call... to help “geo-rectify” the pages of the Sanborn atlas; to connect them with contemporary map tiles, and stamp them with a latitude and longitude.
      I jumped in to help with the interaction design, how to make it easy to align an old map with a new one.
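To make the idea of “stamping” a scanned page with a latitude and longitude more concrete, here is a minimal sketch of the simplest possible case. It is not how the Sanborn project worked, which the talk does not describe. It assumes a scan that is north-up, unrotated and undistorted, with known corner coordinates; real geo-rectification typically uses several control points and handles rotation and warping. The function name and all coordinates are hypothetical.

```python
def pixel_to_latlon(x, y, width, height, top_left, bottom_right):
    """Map a pixel on a scanned map page to (lat, lon) by linear interpolation.

    Assumes the scan is north-up, unrotated and undistorted, and that the
    (lat, lon) of its top-left and bottom-right corners are already known.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    top_lat, left_lon = top_left
    bottom_lat, right_lon = bottom_right
    # x runs left to right (west to east), y runs top to bottom (north to south).
    lon = left_lon + (x / width) * (right_lon - left_lon)
    lat = top_lat + (y / height) * (bottom_lat - top_lat)
    return lat, lon


# Hypothetical 1000x800 pixel scan. The corner coordinates are made-up values
# roughly in downtown San Francisco, not taken from the Sanborn atlas.
lat, lon = pixel_to_latlon(500, 400, 1000, 800,
                           top_left=(37.80, -122.42),
                           bottom_right=(37.78, -122.40))
print(round(lat, 4), round(lon, 4))  # the centre of the page
```

Once every page has corner coordinates like these, any point on the old map can be placed on a contemporary one.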
  47. 47. Monday, September 19, 2011
  48. 48. Monday, September 19, 2011
  49. 49. Monday, September 19, 2011
  50. 50. Monday, September 19, 2011
  51. 51. maptcha.org
      It was amazing. Within about 2 days of Mike announcing the Sanborn release, about 400 people added all 700 pages to the contemporary map. (There’s still a bit of confirmation happening, but overall - amazingly fast!)
  52. 52. maptcha.org
      If you click on any of the little thumbnails, you’ll get to a bigger version and be able to see maps & pages nearby.
  53. 53. oldsf.org
      OLD SF is a project built by Dan Vanderkam and Raven Keller. Dan went through the SFPL’s photography collection and “geo-coded” photos wherever he could. That means adding latitude/longitude data. That allowed him to add their photos to a map, like you see here.
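The geo-coding step described here can be sketched as follows. This is only an illustration of the general idea, not OldSF’s actual code or method: it assumes each photo record carries a free-text location, and it looks that text up in a small hand-built table. The `Photo` class, the `GAZETTEER` table and its coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Photo:
    title: str
    location_text: str
    lat: Optional[float] = None
    lon: Optional[float] = None


# Hypothetical hand-built lookup table. The coordinates are illustrative
# values roughly in downtown San Francisco, not surveyed positions.
GAZETTEER = {
    "california and mason": (37.79, -122.41),
}


def geocode(photo, gazetteer=GAZETTEER):
    """Return a copy of the photo with lat/lon filled in when the location is known."""
    coords = gazetteer.get(photo.location_text.strip().lower())
    if coords is None:
        return photo  # leave un-geocoded rather than guess
    return Photo(photo.title, photo.location_text, coords[0], coords[1])


def mappable(photos):
    """Keep only the photos that can be placed on a map."""
    return [p for p in photos if p.lat is not None and p.lon is not None]


photos = [
    Photo("Street view", "California and Mason"),
    Photo("Unlabelled portrait", "unknown"),
]
on_map = mappable([geocode(p) for p in photos])
print([p.title for p in on_map])  # prints ['Street view']
```

Photos whose location cannot be resolved are left without coordinates, which matches the “wherever he could” in the notes.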
  54. 54. looking back to that similar view we saw before from the Museum of Photographic Arts Collections
      Corner California and Mason looking down Mason to Bay
      1906 April 27
      OldSF.org|-122.410818,e:AAC-3157|672,m:37.79001|-122.41202|16
  55. 55. View of downtown San Francisco from Stockton and California streets
      ca. 1920
      |-122.407558,e:AAB-3087|526,m:37.79001|-122.41202|16
  56. 56. menus.nypl.org
      With approximately 40,000 menus dating from the 1840s to the present, The New York Public Library’s restaurant menu collection is one of the largest in the world, used by historians, chefs, novelists and everyday food enthusiasts. Trouble is, the menus are very difficult to search for the greatest treasures they contain: specific information about dishes, prices, the organization of meals, and all the stories these things tell us about the history of food and culture.
      As of Monday September 12, 2011, there have been 542,029 dishes transcribed from 9,557 menus (that’s how many they’ve digitized to date).
  57. 57. Monday, September 19, 2011
  58. 58. Monday, September 19, 2011
  59. 59. Monday, September 19, 2011
  60. 60. menus.nypl.org
      Corned beef on 3,142 menus, so far.
  61. 61. zooniverse.org
      Home to the internet’s largest, most popular and most successful citizen science projects
  62. 62. oldweather.org
  63. 63. Monday, September 19, 2011
  64. 64. digitization, description, distribution, translation, re-presentation
      To conclude... digital preservation is not just about turning paper into pictures. There’s a lot more opportunity than that.
      It’s important to consider how digital materials are described and distributed.
      - No Known Restrictions / digital proliferation
      Enthusiasts out there can supplement your metadata, sometimes to a voracious degree! They can also help with the heavy lifting of transcription. In the digital world, you want *more* descriptions of things rather than fewer. The more ways people can find your content in the network, the better. You can see amazing examples of this sort of description working incredibly well on sites that allow tagging and other metadata creation by the public.
      Transforming “old data” into new, like attaching a lat/lon to a photo, will allow that digital artifact to be re-presented and re-mixed with other things, and will provide additional context.
      And now, I’ll hand over to Allison, from SF Heritage YP, to talk through a case study on using materials from IA and OL...
  65. 65. Using the Internet Archive. A Case Study: The San Francisco Waterfront
  66. 66. Monday, September 19, 2011
  67. 67. Prelinger Collection: 1934 Strike
  68. 68. Prelinger Collection: 1934 Strike
  69. 69. Prelinger Collection: San Francisco Scenes, 1920s
  70. 70. Monday, September 19, 2011
  71. 71. Harbor Rules, Regulations and Rates
  72. 72. San Francisco City Directories
  73. 73. The California Architect and Building News
  74. 74. [Screenshot: Open Library search results for “Ferry Building San Francisco”, 13 hits, with filters for ebook availability, author, subjects, places, times and first-published year]
  75. 75. Open Library: History of the San Francisco District
  76. 76. Monday, September 19, 2011
  77. 77. Monday, September 19, 2011
  78. 78. Monday, September 19, 2011
  79. 79. Q&A? glo@archive.org