The Internet Archive Copyright Commerce and Culture Fall 2009
Internet Library
  • Nonprofit organization based in San Francisco
  • Founded in 1996 by Brewster Kahle
  • “Exercising our right to remember”
Internet Library
  • “Libraries exist to preserve society's cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it's essential for them to extend those functions into the digital world.” -The Internet Archive
  • “It's a fallacy that if something is on the web, it will stay there… It’s not like a piece of paper you put in a file folder and it will be there forever. There’s an urgent need for people to understand that the web is who we are. It's our culture and our social fabric, and we don't want to lose any of it.” -Kristine Hanna, director of web archiving services for the Internet Archive
Brewster Kahle
  • Graduated from MIT
  • Designed supercomputers at Thinking Machines Corp.
  • Founded WAIS (Wide Area Information Servers), the first system for publishing quantities of data in a searchable form on the Internet
  • WAIS clients included the New York Times, Wall Street Journal, and Encyclopedia Britannica
  • Sold WAIS to America Online for $15 million in 1995
  • Founded Alexa Internet, a for-profit search engine that tracks and provides data on online traffic; bought by Amazon
  • Used the funds from the sale of WAIS to initially finance the nonprofit Internet Archive
Internet Archive Holdings
  • Text Archive: 1,793,279 items
  • Moving Image Gallery: 231,260 items
  • Audio Archive: 443,252 items
  • Live Music Archive: 71,267 concerts, 3,870 bands
  • Software: 33,364 items
  • Wayback Machine: 150 billion webpages
The Wayback Machine
  • Virtual time capsule
  • Goes back to 1996
  • Operates by crawling the Internet and systematically and routinely capturing webpages
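One way to picture the "virtual time capsule" is as a store of dated captures per URL: a visitor asks for a page as of some date and receives the most recent capture taken on or before it. The Python sketch below is purely illustrative. `SnapshotStore`, `capture`, and `as_of` are hypothetical names, the ISO date strings are an assumed timestamp format, and the page contents are placeholders; none of it describes how the Wayback Machine is actually built.

```python
from bisect import bisect_right, insort
from collections import defaultdict
from typing import Optional


class SnapshotStore:
    """Toy archive: keeps every dated capture of each URL (hypothetical design)."""

    def __init__(self) -> None:
        # url -> sorted list of capture dates (ISO strings sort chronologically)
        self._dates = defaultdict(list)
        # (url, date) -> captured page content
        self._pages = {}

    def capture(self, url: str, date: str, content: str) -> None:
        """Record what a crawler saw at `url` on `date`."""
        if (url, date) not in self._pages:
            insort(self._dates[url], date)
        self._pages[(url, date)] = content

    def as_of(self, url: str, date: str) -> Optional[str]:
        """Return the latest capture taken on or before `date`, if any."""
        dates = self._dates.get(url, [])
        i = bisect_right(dates, date)
        if i == 0:
            return None  # nothing was captured that early
        return self._pages[(url, dates[i - 1])]


# Placeholder captures, standing in for routine crawls of the same site.
store = SnapshotStore()
store.capture("google.com", "1998-11-11", "1998 version")
store.capture("google.com", "2004-06-01", "2004 version")

print(store.as_of("google.com", "2000-01-01"))
```

Keeping every capture instead of overwriting the previous one is what makes the "then and now" comparisons on the following slides possible.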
Google Revisited 1998
Google Revisited 2000
Google Revisited 2004
Google Revisited Then and Now 1998 and 2009
Facebook Revisited 2004
Facebook Revisited 2005
Facebook Revisited 2006
Facebook Revisited Then and Now 2004 and 2009
Expansion Efforts
  • Archive-It: subscription-based service
  • Sun Microsystems: storing massive amounts of data; ensuring data will be preserved; multiple copies around the world
  • reCAPTCHA: Professor Luis von Ahn; CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart
  • NASA Images
Battleground
  • Claims of copyright infringement in regard to the Wayback Machine
  • Classification as a library rather than a search engine
Copyright Restrictions
  • “The archive’s omnivorous hoarding puts it on ‘incredibly treacherous legal ground… the archive is a public service that is doing a public good... we want to have a historical record of what happened on the Web. Unfortunately, it is directly at odds with US copyright law, and copyright law in most countries’ since the archive collects and stores copyrighted images and texts without obtaining explicit permission from their owners.” -John Palfrey, executive director, Berkman Center for Internet and Society at Harvard
Healthcare Advocates Case
  • Filed in 2005
  • Harding Earley Follmer & Frailey used material obtained with the Wayback Machine in a previous case against Healthcare Advocates
  • Healthcare Advocates claimed copyright infringement by the Internet Archive
  • Robots.txt: opt-out vs. opt-in model
  • DRM circumvention by Harding Earley
  • Case settled out of court in 2006
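The opt-out model mentioned above can be sketched with Python's standard `urllib.robotparser`: a crawler captures a page unless the site's robots.txt tells that crawler to stay away. This is a minimal illustration under stated assumptions, not the Internet Archive's code. The helper `may_archive`, the sample robots.txt text, and the example URL are hypothetical, and `ia_archiver` is used here as an assumed crawler name.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Opt-out check: archiving is allowed unless robots.txt disallows it.

    `agent` is an assumed crawler name used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt in which a site owner opts out of archiving.
OPT_OUT = """User-agent: ia_archiver
Disallow: /
"""

PAGE = "http://example.com/page.html"

print(may_archive(OPT_OUT, PAGE))  # site owner has opted out
print(may_archive("", PAGE))       # no rules published: capture by default
```

Under an opt-in model the default would be reversed: nothing would be captured unless the owner had granted permission first, which is the explicit permission the copyright objections turn on.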
Relevant Court Cases
  • How might the Healthcare Advocates case have been decided?
  • Kelly v. Arriba Soft and Perfect 10, Inc. v. Google: dealt with search engines copying material without permission; ruled fair use
  • Field v. Google: dealt with caching of webpages; ruled fair use
  • Key difference between the Internet Archive and a search engine: “storing and distributing the contents of other sites is only incidental to search engines’ activities, while it is archives’ primary activity.” -Kinari Patel
Libraries: Physical vs. Digital
  • Copyright Act of 1998: exempted libraries from copyright infringement; limited the term “library” to institutions operating out of physical premises
  • The American Memory Project: the goal was a universal digital library of all Library of Congress holdings; achieved digitization of 10% of holdings
  • Kahle’s estimated cost for full digitization: $260 million
  • The Internet Archive: 3 petabytes of information, equivalent to 150 times the information in the Library of Congress
  • The Internet Archive was officially recognized as a library by the State of California in June 2007
Libraries: Physical vs. Digital
  • NSL (National Security Letter) served to the Internet Archive by the FBI
  • NSL requested information about an archive user and mandated silence from Kahle and the Internet Archive
  • Patriot Act protects libraries from responding to NSLs
  • Kahle fought back: sued the FBI, represented by the EFF and ACLU; claimed the gag order violated the Constitution; claimed library status
  • Acting in the tradition of librarians protecting the rights of their patrons
  • FBI withdrew the NSL, only the 3rd time ever
  • Kahle recognized with the Robert B. Downs Intellectual Freedom Award from the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign
Kahle v. Ashcroft
  • Kahle and the Internet Archive partnered with Prelinger Archives to challenge the constitutionality of four copyright laws: the Copyright Act of 1976, the Berne Convention Implementation Act of 1988, the Copyright Renewal Act of 1992, and the Sonny Bono Copyright Term Extension Act of 1998
  • The suit alleged that creating ever-longer terms for copyright + eliminating the registration requirement = Congress fundamentally altering the copyright system
  • The archives hoped to gain access to orphan works, old films, out-of-print books, etc.
  • Judge dismissed the case without hearing arguments, basing the decision on Eldred v. Ashcroft (2003)
Copyright as an Obstacle
  • “As several high-profile disputes involving Google, the Internet Archive, and other digital libraries have illustrated, the potential of digital technology to archive and ensure easy access to all the world’s knowledge is being artificially impeded by overbroad statutory and judicial restraints on the Internet-enabled distribution of once-copyrighted material. The current regime for copyright protection of written and recorded works threatens to greatly impede the building of universal digital libraries.” -Hannibal Travis, Pepperdine Law Review, October 2008

Editor's Notes

  • #5 WAIS, a company that helped publishers put information on the Web and make it searchable.
  • #6 Text Archive: draws on materials from the Library of Congress, Project Gutenberg, and public and university libraries. Moving Image: ranges from classic films in the public domain to television shows, government videos, cartoons, and newsreels. Audio Archive: hosts a range of material including audiobooks, lectures, radio programs, and podcasts. Live Music Archive: recordings of over 47,000 concerts from 2,800 different bands.
  • #22 NSLs: increased use under the USA PATRIOT Act, 50,000 issued annually
  • #23 Works without high commercial value, should be available free to public