The Internet Archive


  • WAIS, a company that helped publishers put information on the Web and make it searchable.
  • Text Archive: draws on materials from the Library of Congress, Project Gutenberg, and public and university libraries
  • Moving Image: ranges from classic films in the public domain to television shows, government videos, cartoons, and newsreels
  • Audio Archive: hosts a range of material, including audio books, lectures, radio programs, and podcasts
  • Live Music Archive: recordings of over 47,000 concerts from 2,800 different bands
  • National Security Letters (NSLs) became more widely used under the USA PATRIOT Act; about 50,000 are issued annually
  • Works without high commercial value should be available free to the public

    1. The Internet Archive: Copyright, Commerce and Culture, Fall 2009
    2. Internet Library
       • Nonprofit organization based in San Francisco
       • Founded in 1996 by Brewster Kahle
       • “Exercising our right to remember”
    3. Internet Library
       • “Libraries exist to preserve society's cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it's essential for them to extend those functions into the digital world.”
         - The Internet Archive
       • “It's a fallacy that if something is on the web, it will stay there… It’s not like a piece of paper you put in a file folder and it will be there forever. There’s an urgent need for people to understand that the web is who we are. It's our culture and our social fabric, and we don't want to lose any of it.”
         - Kristine Hanna, director of web archiving services for the Internet Archive
    4. Brewster Kahle
       • Graduated from MIT
       • Designed supercomputers at Thinking Machines Corp.
       • Founded WAIS (Wide Area Information Servers)
         • The first system for publishing large quantities of data in searchable form on the Internet
         • Clients included the New York Times, the Wall Street Journal, and Encyclopedia Britannica
       • Sold WAIS to America Online for $15 million in 1995
       • Founded Alexa Internet
         • A for-profit company that tracks and provides data on online traffic
         • Bought by Amazon
       • Kahle initially financed the nonprofit Internet Archive with funds from the sale of WAIS
    5. Internet Archive Holdings
       • Text Archive: 1,793,279 items
       • Moving Image Gallery: 231,260 items
       • Audio Archive: 443,252 items
         • Live Music Archive: 71,267 concerts, 3,870 bands
       • Software: 33,364 items
       • Wayback Machine: 150 billion webpages
    6. The Wayback Machine
       • A virtual time capsule
       • Goes back to 1996
       • Operates by crawling the Internet, systematically and routinely capturing webpages
    7. Google Revisited: 1998
    8. Google Revisited: 2000
    9. Google Revisited: 2004
    10. Google Revisited, Then and Now: 1998 and 2009
    11. Facebook Revisited: 2004
    12. Facebook Revisited: 2005
    13. Facebook Revisited: 2006
    14. Facebook Revisited, Then and Now: 2004 and 2009
    15. Expansion Efforts
       • Archive-It
         • Subscription-based service
       • Sun Microsystems
         • Storing massive amounts of data
         • Ensuring data will be preserved
         • Multiple copies around the world
       • reCAPTCHA
         • Professor Luis von Ahn
         • CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
       • NASA Images
    16. Battleground
       • Claims of copyright infringement regarding the Wayback Machine
       • Classification as a library rather than a search engine
    17. Copyright Restrictions
       • “The archive’s omnivorous hoarding puts it on ‘incredibly treacherous legal ground…the archive is a public service that is doing a public good...we want to have a historical record of what happened on the Web. Unfortunately, it is directly at odds with US copyright law, and copyright law in most countries’ since the archive collects and stores copyrighted images and texts without obtaining explicit permission from their owners.”
         - John Palfrey, executive director, Berkman Center for Internet and Society at Harvard
    18. Healthcare Advocates
       • Case filed in 2005
       • The law firm Harding Earley Follmer & Frailey used material obtained with the Wayback Machine in a previous case against Healthcare Advocates
       • Healthcare Advocates claimed:
         • Copyright infringement by the Internet Archive
           • Robots.txt: opt-out vs. opt-in model
         • Circumvention of technological protection measures (under the DMCA) by Harding Earley
       • Case settled out of court in 2006
    19. Relevant Court Cases
       • How might the Healthcare Advocates case have been decided?
       • Kelly v. Arriba Soft; Perfect 10, Inc. v. Google
         • Dealt with search engines copying material without permission
         • Ruled fair use
       • Field v. Google
         • Dealt with caching of webpages
         • Ruled fair use
       • Key difference between the Internet Archive and a search engine: “storing and distributing the contents of other sites is only incidental to search engines’ activities, while it is archives’ primary activity.” (Kinari Patel)
    20. Libraries: Physical vs. Digital
       • Copyright Act of 1998
         • Exempted libraries from certain copyright infringement liability
         • Limited the term “library” to institutions operating out of physical premises
       • The American Memory Project
         • Goal: a universal digital library of all Library of Congress holdings
         • Achieved: digitized 10% of holdings
         • Kahle’s estimated cost for full digitization: $260 million
       • The Internet Archive: 3 petabytes of information, equivalent to 150 times the information in the Library of Congress
       • The Internet Archive was officially recognized as a library by the State of California in June 2007
    21. Libraries: Physical vs. Digital
       • An NSL (National Security Letter) was served to the Internet Archive by the FBI
       • The NSL requested information about an Archive user and mandated silence from Kahle and the Internet Archive
       • The Patriot Act protects libraries from having to respond to NSLs
       • Kahle fought back
         • Sued the FBI, represented by the EFF and ACLU
         • Claimed the gag order violated the Constitution
         • Claimed library status
           • Acting in the tradition of librarians protecting the rights of their patrons
       • The FBI withdrew the NSL
         • Only the 3rd time ever
       • Kahle was recognized with the Robert B. Downs Intellectual Freedom Award from the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign
    22. Kahle v. Ashcroft
       • Kahle and the Internet Archive partnered with Prelinger Archives to challenge the constitutionality of four copyright laws:
         • Copyright Act of 1976
         • Berne Convention Implementation Act of 1988
         • Copyright Renewal Act of 1992
         • Sonny Bono Copyright Term Extension Act of 1998
       • The suit alleged: ever-longer copyright terms + elimination of the registration requirement = Congress fundamentally altering the copyright system
       • The archives hoped to gain access to orphan works, old films, out-of-print books, etc.
       • The judge dismissed the case without hearing arguments, basing the decision on Eldred v. Ashcroft (2003)
    23. Copyright as an Obstacle
       • “As several high-profile disputes involving Google, the Internet Archive, and other digital libraries have illustrated, the potential of digital technology to archive and ensure easy access to all the world’s knowledge is being artificially impeded by overbroad statutory and judicial restraints on the Internet-enabled distribution of once-copyrighted material. The current regime for copyright protection of written and recorded works threatens to greatly impede the building of universal digital libraries.”
         - Hannibal Travis, Pepperdine Law Review, October 2008
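Slide 18 contrasts the robots.txt opt-out model with an opt-in model, and slide 6 describes the Wayback Machine capturing pages by crawling. The Python sketch below is one way to illustrate what "opt-out" means for an archival crawler: a page is captured unless the site's robots.txt disallows it. It uses the standard library's `urllib.robotparser`; the crawler name `example-archive-bot`, the helpers `may_capture` and `snapshot_url`, and the sample rules are hypothetical, and the snapshot address only mimics the public `web.archive.org/web/<timestamp>/<url>` pattern. It is not the Internet Archive's actual crawler code.

```python
from datetime import datetime, timezone
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, for illustration only; it is not the
# Internet Archive's real user-agent string.
CRAWLER_AGENT = "example-archive-bot"


def may_capture(robots_lines: Iterable[str], url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Opt-out model: a page may be captured unless robots.txt disallows it."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules are passed in directly; nothing is downloaded
    return parser.can_fetch(agent, url)


def snapshot_url(url: str, when: datetime) -> str:
    """Mimic a Wayback-style address: a 14-digit UTC timestamp plus the original URL."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{url}"


if __name__ == "__main__":
    # Sample robots.txt in which the site opts its /private/ directory out.
    robots = [
        "User-agent: *",
        "Disallow: /private/",
    ]
    crawl_time = datetime(2009, 10, 1, 12, 0, tzinfo=timezone.utc)
    for page in ("https://example.com/index.html",
                 "https://example.com/private/report.html"):
        if may_capture(robots, page):
            print("capture:", snapshot_url(page, crawl_time))
        else:
            print("skip (opted out):", page)
```

Note that an empty robots.txt lets everything through: under opt-out, silence counts as permission. An opt-in model would flip that default, so the crawler would skip any site that had not explicitly granted permission.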