An essay on Google's book digitization project for Dave Parry's Networked Knowledge class at the University of Texas at Dallas.

Vera Maxakova
ATEC 6V81
David Parry
December 13, 2008

Google’s Print Digitization Efforts: Benefits and Obstacles

In its constant quest to “organize the world's information and make it universally accessible and useful” by indexing vast amounts of digital content, Google plunged into the previously uncharted territory of digitizing and indexing books. Unsurprisingly, the practice raised the dreaded issue of copyright, along with some other, more surprising concerns. The Google Book Search project digitizes and indexes books obtained through Google’s Library Project and Partner Programme. Users can see not only website results but also snippets of text or full books that match their query. This was never possible before the internet, and with this project Google is enabling millions of people all over the world to search for and access books they might otherwise never have discovered, much less obtained. Google is realizing the dream of every library before it, none of which ever had the means to come close to providing such free and easy access to the public. But not everyone appreciates the benefit this project may bring to the public, particularly in light of current copyright law, and lawsuits were quick to follow as Google started adding non-public-domain books to its new “library.”
One of these concerns was raised by Jean-Noël Jeanneney, president of the Bibliothèque nationale de France. In Google and the Myth of Universal Knowledge, Jeanneney’s main argument is that Google, being an American company, will tend to favor English-language books, so that “the dominance of work from the United States may become even greater than it is today” (Jeanneney 6). In May 2005 his fears were confirmed when Google released the first version of what was then called Google Print, in which “the inevitable self-centering of the selections was immediately apparent” (Jeanneney 11).

Jeanneney’s concern is understandable. Google strives to be the archive of all knowledge, and under the old model of the archive, a title’s inclusion meant that it was an important work that needed to be preserved. Because of the constraints of physical space, archives and libraries had to leave out some works in favor of others deemed more significant. Jeanneney may have read Google’s preference for U.S. works over European, non-English ones as a sign that Google considered those works unworthy of inclusion. That would be a gross misinterpretation, however, because the Google Books project is still at a very early stage of development.

Google only recently reached a settlement with the Association of American Publishers, which filed a copyright infringement lawsuit against it in 2005. It would be nothing short of suicidal for Google to try to digitize books in foreign countries while it is having so much trouble with U.S. copyright law alone. In June 2006 alone, two foreign lawsuits were filed against Google’s new project. La Martiniere, a French publisher, accused Google of “counterfeiting and breach of intellectual property rights” after Google indexed and published excerpts of about 100 of the publisher’s titles (French book publisher sues Google). The second lawsuit was filed by the German publisher WBG, with the backing of the German Publishers Association. It was dropped at the end of the month after “the [German] court ruled that there was no copyright violation resulting from the development of Google’s project” (Google's victory in court against German publisher). These cases clearly indicate what is to come if Google attempts to expand into other countries, especially at this early stage of the project’s development and while the copyright situation in the U.S. remains so unfavorable to the project.

Google, for its part, maintains that its practices are lawful. According to Google’s Chairman and CEO Eric Schmidt, they fall within copyright law’s “fair use” doctrine, “balancing the rights of copyright-holders with the public benefits of free expression and innovation [that] allows a wide range of activity … without copyright-holder permission” (Mathes). On these grounds, the University of Michigan allowed Google to digitize its library. The university’s head librarian, Paul Courant, agrees that Google is not breaking any copyright laws by scanning books and providing free access to them. He states that in doing so “the University of Michigan (and the other partner libraries) and Google are changing the world for the better” (Anderson).

Unlike in the old archive, books digitized by Google are preserved on Google’s servers and, in many cases, also on the servers of the institutions that provided them. The University of Michigan, for example, keeps the books scanned by Google and also receives digital copies of the scanned works to use for its own purposes. The obvious aim of the project is to let users see book texts in their search results. Another great benefit of this mass digitization is that it protects books from the damage and loss that often occur in today’s libraries, which is especially important for rare and out-of-print works. “Checking out” books through Google’s new system serves the same principle of dispersal without any risk of damaging or losing the books, and each book has multiple backups.

The notion of the book is fully realized on the internet through projects like Wikipedia and now Google Books. In these projects, knowledge is collected in one place and can be easily accessed (dispersed) by anyone with an internet connection, without being threatened by that dispersal. This gathering and dispersal is the main idea of both the book and the archive (Paper Machine 15). Moving out of the constraints of physical space and into a virtual one allows for a vastly larger collection of books and for a more efficient way of searching and sorting them. Google takes the age-old concept of the card catalog, frees it from the limited space of a note card, and includes the whole text of each publication in its index.

However, some people still struggle to make the mental switch from the old, physically limited model. Among them is Anne Bergman-Tahon, head of the Federation of European Publishers (FEE). Bergman-Tahon believes that “virtual borrowing” will threaten the book, as well as the libraries and bookstores that lack the physical space to hold the volume of books that can be stored on the internet. To remedy this, she plans to “limit the number of copies available to web users. When there are no copies left on the virtual bookshelves, they will have to either reserve a copy and wait, or go to the bookshop and buy an e-book” (Mompel). Ironically, this practice defeats the whole purpose of a “virtual bookshelf.” Its digital copies are not restricted by the confines of the physical world and can be distributed to an unlimited number of readers in any part of the world, at any time. The idea of digitizing books is to distribute knowledge to as many people as possible with few or no barriers to entry, something brick-and-mortar bookstores and libraries could never accomplish.

It is important to note that the library as we know it today did not always operate this way. In the seventeenth century, Oxford’s Bodleian Library tried to safeguard its books by refusing all requests to check them out and take them home. The policy was so strict that even King Charles I was denied this luxury, which we now take for granted. “The library was a temple of learning, where scholars might come to read and learn. The books stayed put” (Macintyre). That is no longer the case: today, anyone can come to the library, take a book home, and study it at leisure. Google Books and similar book-digitizing projects simply take this concept a step further by bringing the library online, where readers are not constrained by the library’s hours of operation or physical location. “Technology has made achievable what the librarians of Alexandria could only dream of: one vast, searchable, all-encompassing book, the complete history of the race” (Macintyre). The seventeenth-century Bodleian model evolved. Now that technology has changed, it may be time for the twentieth-century library to follow its example and become what it was always meant to be: a repository of all the world’s knowledge at readers’ fingertips.

Bergman-Tahon also fears that paperbacks will disappear and that libraries and bookstores will be forced out of business. The argument is as old as the printed word itself. As the printing press gained popularity, there was a similar concern about scribes being put out of work, and, as history showed, they adapted to the new technology. Elizabeth Eisenstein documents one such case in The Printing Revolution in Early Modern Europe. In the late fifteenth century, Vespasiano da Bisticci, “the most celebrated Florentine book merchant,” was forced out of business because he was “dealing exclusively in manuscripts.” Meanwhile his rival, Zanobi di Mariano, flourished because he began selling printed books (Eisenstein 18). Bookstores as we know them today may indeed be forced out of business, or struggle to stay in business using the old model, but new bookstores will inevitably emerge and thrive by embracing new technology.

Google’s aim is not only to store knowledge but to make it easily accessible and usable, since resources that users cannot find may as well not exist. Indexing the whole text of a publication increases its chances of being found when an appropriate search query is entered. Jeanneney argues that a project of this magnitude and significance should not be left to a private company. He believes it should be managed by a more stable agency, such as the government, which seems to contradict his own earlier statement that government-run libraries and archives are “chronically underfunded” (viii). More government funding would certainly help such projects, but as history has shown, the government has failed miserably at providing it, so why would this change now? Given the government’s poor record of funding such projects, leaving this job to it would mean that the digital library either would never have been started or would not be as rich and successful as it will be in Google’s hands. And when a private company with enough means and ambition wants to pursue this endeavor, it should be encouraged.

As stated earlier, resources must be findable, and who better to provide that “findability” than the search engine with the best search algorithms? “Libraries die when people forget what is in them: they thrive when we are reminded of their riches” (Macintyre). The inability to find a publication threatens the dispersal of knowledge: resources that cannot be found cannot be dispersed to users and are therefore useless. The government already neglected to fund its old-style libraries and archives. How can we trust it with the colossal task of collecting, digitizing, and making easily available even more books than it managed there? Even if the government succeeded in collecting and storing this vast body of work, how would it provide the public with easy access and use? Democratic institutions can be measured by how much access their people have to the archive (Archive Fever 4). Considering how most government-run websites are built, and the inexplicable malfunctions and ineffectiveness of their search functions, it is hard to imagine this project reaching its full potential under government administration (as it exists today).

One of Jeanneney’s biggest concerns seems to be what criteria Google will use to choose which books to include in the Google Books database, and the fact that it is up to Google to decide on those criteria (5). He seems very uncomfortable with the idea that a private company will have the power to make this important decision. Until recently, that decision was left to government-operated and government-subsidized libraries and archives. His fears may be well justified in this case (although Google promises not to be evil). As Jacques Derrida stresses in Archive Fever, “[t]here is no political power without control of the archive.” This would mean that whoever controls the archive, in this case the largest archive ever assembled, would hold unprecedented power: a monopoly on knowledge (Archive Fever 4). It is thus understandable that Jeanneney is disturbed by the idea of one private company controlling the largest knowledge bank in the world, and that he suggests a government agency would be a better fit for the job.

So far, the only obstacle preventing Google from indexing every book in the world is copyright law. Unlike other companies that have attempted to digitize books, Google first digitizes the books and then offers publishers the opportunity to opt out of being “published” in Google’s library. Other services, such as MSN, Yahoo!, and even Amazon with its new Search Inside!™ feature, obtain permission from the publisher before posting a title’s full text or even a limited preview online. According to Google, contacting every author to obtain authorization to use their content would slow down its digitization efforts (Eun) and would most likely increase the cost significantly. Google calls its alternative the Opt-Out Approach, and it is the reason Google has been able to digitize more books than the competing services offered by MSN, Yahoo!, and Amazon. The approach is also more efficient because some authors who would be willing to publish their works online through Google may not even be aware of its digitization efforts. Under a permission-first model, if Google never reached them, their work would not be published and users would not be able to find it.
What most people overlook amid these lawsuits and criticisms is that publishers will in fact benefit (and some already do) from this new exposure of their works on the web. The “Thought & Opinions” section of the Google Book Search information site quotes publishers and authors who recognize the project’s marketing potential and the benefits they receive from it, and who praise Google for undertaking such a substantial project. One documented case is C.S. Lewis’s Mere Christianity. Over 16 months it received 351 page views and only 14 clicks on the publisher HarperCollins’ site, while the same book on Google Book Search had over fifteen thousand views and almost three hundred click-throughs (“LBF Daily”). Google’s project helped this publisher raise awareness of its backlist books, which might not otherwise have been discovered or bought. In light of these facts, it is ironic that organizations like the AAP (Association of American Publishers) would seek damages from Google for copyright infringement when their members could have been benefiting from Google’s services all along.

Another interesting lawsuit was filed by several large publishing companies, including Simon & Schuster, the Penguin Group, and McGraw-Hill. It sought to require Google to “destroy all unauthorized copies made by Google through the Google Library Project [now Google Book Search] of any copyrighted works” (Toobin). As ridiculous as this request may seem, it raises a very interesting question: how does one destroy a literary work on the web? In the physical world, this could be accomplished through book burnings, back when books were still rare and a work could actually be wiped out of existence. Today the idea seems ludicrous, and it will seem even more so in the near future, when full works will be available on the web and possibly even downloadable where the publisher permits. But what is most interesting about this case goes back to Jeanneney’s fear of Google monopolizing digitized books. Suppose Google Book Search became the main or sole source of digital works and were somehow forced to destroy a book. Would its compliance be the equivalent of a modern book burning?

Copyright law was created to protect the creative work of authors and publishers, but it was never meant to limit the public’s access to that work. Google is trying to make the most of this public access and provide virtually limitless access to works that were previously unsearchable and thus, in some cases, undiscoverable on the web. If potential readers cannot find a book, that undermines the very reason for its existence. Google is not only trying to create an archive of all published works. Most importantly, it is trying to make that archive easily accessible and searchable in context, returning results relevant to each user’s query so that readers can discover new books and articles they might not otherwise have been able to access.
