Raj Kumar and I (from the Internet Archive), and Allison Vanderslice (from SF Heritage YP) gave a talk as part of the SF Architectural Heritage lecture series.
From the blurb:
"Come hear from the Internet Archive’s George Oates about how digital archiving works, see highlights from their San Francisco history collections, and learn about how these resources will influence the future of preservation. Perhaps even Heritage’s own collection could be digitized in the future…the possibilities are endless!"
http://www.sfheritage.org/upcoming_events/lecture-series/
1. hello.
Some rights reserved by mattdork
Monday, September 19, 2011
Hi, I’m Raj Kumar, and this is George Oates. We work at the Internet Archive, and we’re here today to
talk to you about digital archiving, what the Internet Archive is, and how it might help you in your
work. There’ll be a little time at the end for Q&A.
The Internet Archive,
- a 501(c)(3) non-profit,
- building a digital library
- Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
- “universal access to all knowledge”
2. Why digitize?
Why digitize?
- Because it’s an inexpensive way to preserve something forever.
- 10 cents a page, including digitization costs, OCR, and lifetime storage costs
3. Why digitize?
Why digitize?
- It becomes easy to increase public access to archival material.
- Don't have to travel to a library
- Accessible audio versions of books.
- Full text search across almost 3 million texts, and the web archive
4. Some rights reserved by heather
- not a traditional library
- all of our materials are available online on archive.org
5. By rkumar
- 2.88 petabytes of hard drives
- enough storage for about 2 billion books.
- we have 10.5 petabytes online
- paired storage
6. archive.org
All our materials are accessible on archive.org
- 500,000 movies and videos
- 1,000,000 audio recordings
- 3 million scanned texts
- 150,000,000,000 web pages
7.
- Known as the “Wayback Machine”
- 165 Billion URLs
- Started collecting web pages in 1996
- We now crawl the web for LoC and many national libraries (UK, France, Spain, Chile,
Australia), for 43 US states, and about 200 other partners.
9. TV, Movies, Audio
- 500,000 moving images
- full length movies, tv shows, home movies, advertisements
- anyone can upload their movie for free
- San Francisco-specific collections:
- Prelinger archive
- Trip down Market St
- Lost Landscapes
- SFGTV and SFGTV2 (Board of Supervisors, Planning Commission meetings, etc.)
- UCSF Tobacco archives, BAVC, Ourmedia
11. archive.org/911
Archive.org/details/911
Understanding 9/11 – Television news archive
Present one week of TV news for study, research, and analysis
- “Television is our pre-eminent medium of information, entertainment and persuasion, but
until now it has not been a medium of record. This Archive attempts to address this gap by
making TV news coverage of this critical week in September 2001 available to those studying
these events and their treatment in the media.”
- 3000 hours of TV news footage from 20 channels around the world
12.
http://www.archive.org/search.php?query=san%20francisco%20AND%20mediatype%3Aetree
- 1,000,000 audio recordings
- Anyone can upload for free
- almost 100,000 live concert recordings
- popularized by the Grateful Dead
- growing by 50/day
- Librivox – 5000 audio books
- Old Time Radio
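The search link on this slide is just a percent-encoded query string, so the same kind of link can be built for any other query. Here is a minimal sketch in Python. The function name `build_search_url` is made up for illustration, and the code only constructs the URL text; it does not contact archive.org or assume anything about what the search page returns.

```python
from urllib.parse import quote

# Base address copied from the link on this slide.
SEARCH_BASE = "http://www.archive.org/search.php"


def build_search_url(query: str) -> str:
    """Percent-encode a query and append it to the search page address.

    Hypothetical helper for illustration; not part of any archive.org library.
    """
    # safe="" escapes every reserved character, so " " becomes %20 and ":" becomes %3A.
    return f"{SEARCH_BASE}?query={quote(query, safe='')}"


url = build_search_url("san francisco AND mediatype:etree")
print(url)
```

Changing the text passed to `build_search_url` (for example, a different place name) gives a link of the same shape as the one on the slide.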
13. Book Scanning
http://www.archive.org/stream/sanfranciscobloc1906octbloc#page/n7/mode/2up
- Almost 3 million text items
- Mostly public-domain books published before 1923, with audio (text-to-speech) versions
- 300,000 modern audio books for those with NLS print-disabled credentials
14.
1,000 books scanned EVERY day
24 scanning centers in 5 countries, and we hope for more.
high-resolution archival-quality color scans
15.
Zoom in with online bookreader
Searchable PDFs with OCR, Original uncropped camera images available
16.
We’re also scanning microfilm, which is much faster than individual books. Here’s an example of the record of the population census from
1790 to 1930. Scanned from microfilm from the collections of the Allen County Public Library and originally from the United States
National Archives and Records Administration.
18. Physical archive
- Don't want books to be thrown away after they are digitized
- We want libraries that are de-accessioning their materials to send them to us before they
send them to a landfill
- The physical is the authentic and original version
- Goal is 10 Million books
19.
Books, boxes, pallets, shipping containers...
Over to you, George!
20. openlibrary.org
http://openlibrary.org/
Hi - I’m George Oates and I run the Open Library project at the Internet Archive. I’d like to
talk to you a bit about what can happen once you’ve digitized things. As well as work from
the Internet Archive, I’d also like to show you some examples of other digital preservation
projects around the web...
21. A “Wikipedia for Books”
There’s a twist though... this library catalog is editable, by anyone, like a Wikipedia for books.
26. California, San Francisco (Calif.), United States,
San Francisco Bay Area, Chinatown (San Francisco,
Calif.), New York, Hunters Point (San Francisco,
Calif.), San Francisco Bay Area (Calif.), South of
Market (San Francisco, Calif.), Mission District (San
Francisco, Calif.), Western Addition (San Francisco,
Calif.), Hetch Hetchy Valley (Calif.), Presidio of
San Francisco (Calif.), Diamond Heights (San
Francisco, Calif.), Golden Gate Park (San Francisco,
Calif.), New York (State), North Beach (San
Francisco, Calif.), Los Angeles, Northern California,
Bayview (San Francisco, Calif.)
http://openlibrary.org/subjects/place:san_francisco
31.
The Zamorano Club is a group of bibliophiles and collectors based in LA. A jewel in their
collection is the “Zamorano 80” - the books they feel best represent California history.
The club is named after Agustin Zamorano, most noted for bringing the first printing press to California.
This year, I’ve been working with Mary Elings at the Bancroft Library to try to digitize the
entire set of these 80 titles. We’re nearly there! And, I’ve collected them into an Open Library
list for easy reference and access.
Interesting to note here how related subjects are aggregated from the constituent titles. The
system does that work for us.
http://openlibrary.org/people/george08/lists/OL6387L/Zamorano_80_Editions
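One way to picture "the system does that work for us" is a simple tally of subject tags across the items in a list. The sketch below is an assumption about the general idea, not a description of how Open Library actually implements it. The edition records are placeholders invented for illustration, and the function name `aggregate_subjects` is hypothetical.

```python
from collections import Counter

# Placeholder records for illustration only; these are not real catalog entries.
editions = [
    {"title": "Edition A", "subjects": ["California", "History"]},
    {"title": "Edition B", "subjects": ["California", "San Francisco"]},
    {"title": "Edition C", "subjects": ["California", "History", "Printing"]},
]


def aggregate_subjects(items):
    """Count how many items in the list carry each subject tag."""
    counts = Counter()
    for item in items:
        # set() ensures a subject repeated within one record is counted once.
        counts.update(set(item["subjects"]))
    return counts


subject_counts = aggregate_subjects(editions)
for subject, n in subject_counts.most_common():
    print(subject, n)
```

The subjects shared by the most titles rise to the top, which is roughly what a "related subjects" panel on a list page would show.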
32.
The Annals of San Francisco by Frank Soulé, John H. Gihon, and James Nisbet, first
published in 1855
http://www.archive.org/stream/annalsofsanfranc00soul#page/n27/mode/2up
33.
Colonel John Geary, last alcalde & first mayor of San Francisco
http://www.archive.org/stream/annalsofsanfranc00soul#page/n745/mode/1up
1849 - unanimously elected to the post of First Alcalde - Big Cheese.
Colonel Geary immediately set about the organization of the
city, and the establishment of an efficient police force. The task
was herculean. Pandemonium had to be quieted - chaos reduced to
order. Here was a large maritime city, with a population of about
twenty thousand persons, and embracing a strange medley of dangerous
and desperate characters - without a solitary officer, or a single
law to govern or control them. All these rebellious elements had
to be subdued, and good citizens made of daring bravados. This task
fell upon the alcalde, who had to perform the duties of every one
of the customary officers of a city and county jurisdiction.
On that happy note, I’d like to take a quick tour of some other useful digital
preservation projects out there on the internet...
36.
Photograph of the Effect of Earthquake on Houses Built on Loose or Made Ground
After the 1906 San Francisco Earthquake, 1906 By The U.S. National Archives
http://www.flickr.com/photos/usnationalarchives/5553722800/in/photostream/
37.
By Museum of Photographic Arts Collections in San Diego- circa 1880
http://www.flickr.com/photos/mopa1/5711511770/in/photostream/
38.
The City from California Street By Museum of Photographic Arts Collections - circa 1880
http://www.flickr.com/photos/mopa1/5710949415/sizes/l/in/photostream/
43. “You can pry my
burrito out of my
cold, dead hand.”
Jon began studying the old Southern Pacific train station at Valencia and 25th
44.
http://burritojustice.com/2011/06/27/1905-sf-sanborn-maps-now-in-color/
45. BernalDweller permalink
June 27, 2011 10:19 pm
Lots of street renamings in
SW Bernal. Jarboe was
Jefferson, Tompkins was
Union, Ogden was Old
Hickory. I’ve spent some
time researching street
name origins in Bernal…
must delve further. Great
resource!
The thread is full of interested people throwing in all sorts of information.
46.
Some rights reserved by Paul Hagon
Mike Migurski put out a call... to help “geo-rectify” the pages of the Sanborn atlas: to connect
them with contemporary map tiles, and stamp them with a latitude and longitude.
I jumped in to help with the interaction design, how to make it easy to align an old map with
a new one.
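To give a feel for what "connecting" a latitude and longitude with contemporary map tiles involves, here is a small sketch of the widely used Web Mercator "slippy map" tile numbering. This is an assumption for illustration: the talk does not say which tile scheme the Sanborn project used. The function name `latlon_to_tile` is hypothetical, and the sample coordinates are the map centre that appears in the oldsf.org links later in this talk.

```python
import math
from typing import Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) tile containing a point, in the common Web Mercator scheme.

    Assumes tiles are numbered from the top-left corner and that each zoom
    level has 2**zoom tiles per side.
    """
    n = 2 ** zoom
    # Longitude maps linearly onto the x axis.
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Latitude goes through the Mercator projection before being scaled to y.
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


# A point in downtown San Francisco, at a street-level zoom.
tile_x, tile_y = latlon_to_tile(37.79001, -122.41202, 16)
print(tile_x, tile_y)
```

Once an old atlas page is stamped with coordinates like these, it can be lined up against whichever modern tiles cover the same ground.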
51. maptcha.org
It was amazing. Within about 2 days of Mike announcing the Sanborn release, about 400
people added all 700 pages to the contemporary map. (There’s still a bit of confirmation
happening, but overall - amazingly fast!)
52. maptcha.org
If you click on any of the little thumbnails, you’ll get to a bigger version and be able to see
maps & pages nearby.
53. oldsf.org
OLD SF is a project built by Dan Vanderkam and Raven Keller. Dan went through the
SFPL’s photography collection and “geo-coded” photos wherever he could. That means
adding latitude/longitude data. That allowed him to add their photos to a map, like
you see here.
http://www.oldsf.org/about
54.
Looking back to that similar view we saw before from the Museum of Photographic
Arts Collections
Corner California and Mason looking down Mason to Bay
1906 April 27
OldSF.org
http://www.oldsf.org/#ll:37.791835|-122.410818,e:AAC-3157|672,m:37.79001|-122.41202|16
55.
View of downtown San Francisco from Stockton and California streets
ca. 1920
http://www.oldsf.org/#ll:37.792244|-122.407558,e:AAB-3087|526,m:37.79001|-122.41202|16
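The oldsf.org links on these two slides carry the geo-coded data in the part of the URL after the "#". The sketch below splits such a link into its pieces. The meaning of each key is a guess from the values alone: "ll" looks like the photo's latitude/longitude, "e" like a record identifier, and "m" like a map centre plus zoom level. None of this is documented in the talk, and the function name `parse_oldsf_fragment` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_oldsf_fragment(url: str) -> dict:
    """Split the '#...' part of a link into {key: [values]}.

    Assumes fields are separated by ',' with 'key:value' pairs whose
    values are separated by '|', as in the links on these slides.
    """
    fragment = urlsplit(url).fragment
    fields = {}
    for part in fragment.split(","):
        key, _, value = part.partition(":")
        fields[key] = value.split("|")
    return fields


url = ("http://www.oldsf.org/#ll:37.791835|-122.410818,"
       "e:AAC-3157|672,m:37.79001|-122.41202|16")
fields = parse_oldsf_fragment(url)

# Assumed interpretation: "ll" holds latitude then longitude.
lat, lon = (float(v) for v in fields["ll"])
print(lat, lon, fields["e"], fields["m"])
```

Having the coordinates in a plain, parseable form like this is what lets a photo be re-presented on a map or mixed with other geo-coded material.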
56. menus.nypl.org
http://menus.nypl.org/
With approximately 40,000 menus dating from the 1840s to the present, The New York Public Library’s restaurant menu collection is one
of the largest in the world, used by historians, chefs, novelists and everyday food enthusiasts. Trouble is, the menus are very difficult to
search for the greatest treasures they contain: specific information about dishes, prices, the organization of meals, and all the stories these
things tell us about the history of food and culture.
As of Monday September 12, 2011, there have been 542,029 dishes transcribed from 9,557 menus (that’s how many they’ve digitized to
date).
64. digitization
description
distribution
translation
Re-presentation
To conclude... digital preservation is not just about turning paper into pictures. There’s a lot
more opportunity than that.
It’s important to consider how digital materials are described and distributed.
- No Known Restrictions / digital proliferation
Enthusiasts out there can supplement your metadata, sometimes to a voracious degree! They
can also help with the heavy lifting of transcription. In the digital world, you want *more*
descriptions of things, not fewer. The more ways people can find your content in the network,
the better. You can see amazing examples of this sort of description working incredibly well
on sites that allow tagging and other metadata creation by the public.
Transforming “old data” into new, like attaching a lat/lon to a photo, will allow that digital
artifact to be re-presented and re-mixed with other things, and will provide additional
context.
And now, I’ll hand over to Allison, from SF Heritage YP, to talk through a case study on using
materials from IA and OL...
65. Using the Internet Archive
A Case Study: The San Francisco Waterfront
74. Open Library search results for "Ferry Building San Francisco" (13 hits, none available as ebooks). Results shown:
- Ferry Building complex by San Francisco (Calif.). Dept. of City Planning. 1 edition - first published in 1983
- Union depot and ferry house, San Francisco by San Francisco Port Commission. 1 edition - first published in 1978
- The Ferry Building by Nancy Olmsted. 1 edition - first published in 1998
- Ferry Building marketplace by William Wilson & Associates. 1 edition - first published in 1998
- Ferry Building State Park by Joint Committee of the Northern California Chapter of the American Institute of Architects and the California Association of Landscape Architects. 1 edition - first published in 1955
- Request for qualifications by San Francisco Port Commission. 1 edition - first published in 1978
- City Walks: San Francisco by Christina Henry de Tessan. 1 edition - first published in 2004
- Remembered Treasures of San Francisco by Tro Harper. 1 edition - first published in 2003
- Aircraft accident report by United States. National Transportation Safety Board. 99 editions - first published in 1975
- Fort Point by Mary K. Grassick. 3 editions - first published in 1994
Filters offered alongside the results:
- Ebook: yes 0, no 13
- Author: Mary K. Grassick 3, San Francisco Port Commission. 2, Tro Harper 1, San Francisco (Calif.). Dept. of City Planning. 1, United States. National Transportation Safety Board. 1
- Subjects: Buildings, structures 7, Ferry Station Post Office Building (San Francisco, Calif.) 6, History 3, Waterfronts 3, Historic sites 2
- Places: California 8, San Francisco 7, San Francisco (Calif.) 5, Golden Gate National Recreation Area (Calif.) 2, United States 2
- Times: 1983 1, 20th century 1
- First published: 1978 2, 1998 2, 1905 1