(As you may have heard, this past week has been quite tumultuous for the Google Book Settlement. Last Friday, following more than 400 filings in response to the Settlement, the US DoJ recommended that it not be accepted in its current form, and this Tuesday the plaintiffs filed a motion to delay the final hearing on the settlement while it is reworked. However, the DoJ did recognize the clear potential value of a properly constructed settlement, and recommended that the plaintiffs and Google keep discussions going towards this end.) Given the pace of change these past few days, we’ve decided that handouts for this talk might be impractical; I hope that’s okay. What Tony and I would like to do now is lay the groundwork for what is sure to be a lively panel discussion. I’ll start by setting out the general landscape of the Google Book Settlement and its components, flagging the places of particular concern as we go by. Then Tony will walk through the contentious issues in more depth for us. Our presentation is based on a paper that we prepared, together with 6 other colleagues, for the directors of the OCUL libraries this spring. It’s been updated a couple of times since. The agreement is large and complex, so we are simplifying some details and leaving out others.
Let’s start by going back to 2004. Many people were scanning out-of-copyright materials, and there was much discussion of how to move into in-copyright materials. People were thinking about orphan works: those books that are in copyright, but whose rights holders cannot be located. Into this arena stepped Google, announcing Google Book Search. Google funded partnerships with libraries for older books, and with publishers for new books. The project differed from other similar projects in two key ways: Google was not proposing to make the scanned material freely available to be replicated on any server, and Google was going to scan in-copyright materials. In all, 10 million books have been scanned to date, with an eventual goal of 30 million books. Estimates vary, but one breakdown is: 2 million public domain (out of copyright); 7.5 million in copyright but out of print; and 0.5 million in copyright and in print.
The products of the project, for the scanned material, were fairly typical: TIFF master images, JPEGs, OCR, and PDF derivatives for discovery and web delivery. In addition to the scanned materials obtained from libraries, publisher partners provided new titles, in xml-based digital text format.
At the outset, Google announced that the full text of in-copyright books would be indexed for search and discovery only, and argued that this fell within fair use. However, the American Publishers Association and the American Authors Guild did not agree, and so they each mounted class action lawsuits in the United States against Google. The two suits were subsequently merged. In October of last year, a settlement of the class action lawsuit was announced, subject to approval by the US Court. A final fairness hearing was originally scheduled for June of this year, but it was moved back to October, owing to the amount of reaction to the suit and the clear need for further time to assess the issues it raises. Possible outcomes of a final fairness hearing are the acceptance, rejection, or court oversight of the agreement. Restructuring the agreement is not thought to be a possible action. Reaction continued apace over the summer, with more than 400 filings received by the court. Opinion varied as to whether the settlement should stand. Much discussion has arisen around areas where the settlement is silent, as well as those areas where it is explicit. Last Friday (Sept. 18) the US Department of Justice advised the US District Court (NY Southern District) that it should not accept the class action settlement as it stands, but that discussions should continue, as an improved settlement would offer important societal benefits.
Specifically, the DoJ recommended that the parties entertain modifications addressing the open-ended future licensing; conflicts among class members; additional protections for unknown rights holders; and comparable access for competitors. Judge Denny Chin stated yesterday in his order: “The current settlement agreement raises significant issues, as demonstrated not only by the number of objections, but also by the fact that the objectors include countries, states, nonprofit organizations, and prominent authors and law professors.” But Judge Chin goes on to say: “The settlement would offer many benefits to society, as recognized by supporters of the settlement as well as DoJ. It would appear that if a fair and reasonable settlement can be struck, the public would benefit.” So what is the structure of this settlement, that it has raised such concerns?
The settlement afforded Google the right to provide and sell online access to books within the US. Several points are worth noting. It only covers books, not journals, newspapers, etc., nor the illustrations within books, as the rights holders of these materials were not represented. It covers all in-copyright books that are covered by the Berne Convention: anything, therefore, published in the 164 countries that signed the Berne Convention. The class action lawsuit only pertains within the United States. No access beyond the United States was settled, though it was speculated that similar agreements might be made in other countries. Google was to pay $125 million up front, as follows: $34.5 million to establish an independent Book Registry to manage rights and revenues; $45 million to pay the rights holders of books scanned before May 2009, at $60 per title; and $45.5 million for legal fees. (To put that number in perspective, the class action lawsuit launched by Heather Robertson on behalf of freelance newspaper writers, against CTVglobemedia Inc., Thomson Reuters Canada and The Gale Group, settled for $11 million this May.) The settlement gave Google the right to sell online access to the books and to develop other related products, with the revenue stream being split 63% to copyright holders and 37% to Google. The settlement only covered books that existed prior to January 2009. It is also a non-exclusive settlement: rights holders retain the right to make other agreements with other providers. However, it does create a massive one-stop shop for online books. It might be thought, then, to be a compelling channel for publishers to market future works, and an intimidating channel for other ventures to compete against.
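As a quick check on the figures above, here is a minimal Python sketch (ours, not anything taken from the settlement text) that adds up the three up-front components and applies the 63/37 split to a made-up revenue amount. The names and the $100 example are hypothetical, chosen only for illustration.

```python
# Up-front payment components in millions of US dollars, as quoted in the talk.
UPFRONT_PAYMENTS_MILLIONS = {
    "book_registry": 34.5,
    "rights_holders_books_scanned_before_may_2009": 45.0,
    "legal_fees": 45.5,
}

# Revenue split quoted in the talk, as whole percentages.
RIGHTS_HOLDER_PERCENT = 63
GOOGLE_PERCENT = 37


def total_upfront_millions(payments=UPFRONT_PAYMENTS_MILLIONS):
    """Sum the up-front components (in millions of dollars)."""
    return sum(payments.values())


def split_revenue(gross):
    """Split a hypothetical gross revenue figure 63/37.

    Returns (rights_holder_share, google_share), each rounded to cents.
    """
    rights_holders = round(gross * RIGHTS_HOLDER_PERCENT / 100, 2)
    google = round(gross - rights_holders, 2)
    return rights_holders, google


if __name__ == "__main__":
    print(total_upfront_millions())   # the three components add up to 125.0
    print(split_revenue(100.0))       # a made-up $100 sale
```

The three components do account for the full $125 million, and the split function simply makes the 63/37 proportion concrete; it says nothing about how the Registry would actually distribute money among individual rights holders.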
Let’s take a closer look at how books are displayed through Google Book Search, and what, by default, is included and excluded in that display. The settlement lays out display uses and non-display uses of books, and then defines what uses are allowed for each pool of books: out-of-copyright, out-of-print, and in-print. Display uses are defined as the ability to view or annotate an entire book; print or copy-and-paste chunks of a book; preview 20% of a book prior to purchase; and view short snippets or metadata about a book. The non-display uses include indexing for search and discovery but not display, research inquiries across the entire corpus, and Google R&D. By default, in-print books are excluded from display uses, unless the rights holder opts in. (It’s worth noting in passing here that the agreement decreases access to in-print books, as snippet views were formerly available across Google Books.) Out-of-print books are included in display uses, unless the rights holder opts out. “In print” was originally defined as commercially available in the US; however, Google announced in early September, in response to concerns raised by the EU, that it would view books available in Europe as “in-print” also. Revenue will be generated from the display uses, and so, by and large, the out-of-print book pool is the revenue stream. Two sales models are proposed. Individuals may purchase perpetual access to individual titles. The titles are hosted on the Google server, and may not be downloaded to portable book readers. For institutions, Google makes available an Institutional Subscription Database (ISD), which must comprise at least 85% of the out-of-print book pool. Institutions may subscribe to the whole ISD, or to subject-based subsets of it, on an annual basis. Google also has the right to display advertising on book pages, with permission of rights holders.
The Book Rights Registry (BRR) defined in the settlement plays a pivotal role, and was a magnet for many of the concerns expressed. This is a non-profit independent agency that manages the database of book rights holders: What is the status of individual titles? Who holds the rights to individual titles? How is the ISD to be priced? There is no provision in the settlement for the BRR information to be made publicly available. The Board is to have representation (4 members each) from the author and publisher subclasses of the suit. Much discussion has arisen around the composition of the Book Registry Board, the interests that it represents, and the availability and transparency of the data it manages.
Let’s turn now to the proposed licensing models. Google Book Search: Google is the primary host. All individual title subscriptions, and the ISDs, will be accessed on the Google server. Some libraries (fully participating partners) have limited rights to host a portion of the corpus, if they meet stringent security requirements, subject to audit by the BRR. There are other important benefits for Google’s library partners. Partners have a mechanism for challenging the institutional pricing model, and may receive information about the ISD pricing strategy. Some receive subscription discounts (UMich: free for 25 years). Information about inclusion / exclusion of books: Google must disclose to partners information such as whether books are commercially available, and whether books are included in the ISD. Only the identity of public domain books and the identity of books excluded from display uses for editorial reasons may be publicly disclosed. Libraries that do not contribute books for scanning may subscribe to the ISD on an annual basis, on the Google server.
The agreement also provides for a Research Corpus. Under the settlement, Google retains the right to withhold up to 15% of titles from the Research Corpus for unspecified reasons. Concerns around academic freedom and privacy have mounted around this Corpus.
So there are the bones of the settlement, presented briefly. Before I finish, I want to pause here for a moment on the subject of the out-of-print pool. As I mentioned earlier, a major outcome of the settlement is that it enables Google to sell orphan books online. This is because the AAG and APA included all book copyright holders in their class, including the ones that couldn’t be found. The settlement is non-exclusive: rights holders do have the right to license their books elsewhere. But for the orphan books this is moot: no rights holders have come forward to exercise this right. So for this pool of books, through this settlement, and in the absence of any further legislation, Google is the only online seller. So how many orphans might there be? Google argues that because they have created a commercial value for out-of-print books and, through the Book Registry, a mechanism for rights holders to claim their rights, the number of true orphans will diminish to perhaps 10% of the out-of-print pool, or 1 million of the 10 million currently scanned. However, if we look at the $45 million to be paid to the rights holders of books scanned prior to May 2009, at $60 per title that amount suggests an expected claim pool of 750,000 titles. Although the settlement stipulates that more money may be paid out in rights, this suggests a considerably larger orphan pool of 2.75 million out of the 7 million scanned to that point. Prior to the settlement, there was a great hope (in the scanning community at least) that the solution to the orphan book log jam was legislation to define how these books could be digitized and made available online. It has been argued that because the settlement creates a commercial value for these orphan works that did not exist before, it may have a negative impact on any future orphan works legislation that might provide more general models for access.
Another point of note is that under the settlement, revenue would be generated for all titles that are out of print. That revenue is split between Google and the active rights holders, even though orphans may comprise a large percentage of the out-of-print pool.
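The orphan estimate above is simple arithmetic, and a short Python sketch may make the steps easier to follow. It is only an illustration under stated assumptions: the out-of-print base of 3.5 million is the figure given on the accompanying "How many orphans?" slide (it is the base for which 750,000 claimed titles plus 2.75 million orphans add up), and the function name is our own.

```python
# Figures quoted in the talk and on the "How many orphans?" slide.
PAYMENT_POOL_USD = 45_000_000        # payment to rights holders of already-scanned books
PAYMENT_PER_TITLE_USD = 60           # payment per title
OUT_OF_PRINT_SCANNED = 3_500_000     # assumption: base taken from the slide, not the spoken notes


def estimate_orphans(pool_usd, per_title_usd, scanned_titles):
    """Estimate claimed and orphaned titles from the size of the payment pool.

    Returns (claimed_titles, orphan_titles, orphan_share), where orphan_share
    is the fraction of scanned out-of-print titles with no expected claim.
    """
    claimed = pool_usd // per_title_usd
    orphans = scanned_titles - claimed
    return claimed, orphans, orphans / scanned_titles


if __name__ == "__main__":
    claimed, orphans, share = estimate_orphans(
        PAYMENT_POOL_USD, PAYMENT_PER_TITLE_USD, OUT_OF_PRINT_SCANNED
    )
    print(f"claimed: {claimed:,}  orphans: {orphans:,}  orphan share: {share:.0%}")
```

Because the $45 million is described as a minimum payment, the claimed count here is a floor and the orphan count a ceiling; changing `PAYMENT_POOL_USD` shows how sensitive the estimate is to that assumption.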
So the Google Book Settlement has generated strong reactions, both positive and negative, and some serious questions for us to ponder. First, it has made an enormous corpus of books available to a wide audience. Therefore, there is every reason to think that it could be highly useful to our users. However, the settlement in its current form also raises some serious concerns. Does the fact that only Google has the right to provide access to these books create a de facto monopoly? And what sort of chilling effect might this new product have on other similar products from other vendors? The proposed service involves user identification, and dictates that the book products remain on a very few servers. Does this raise issues of user privacy? As defined the Research Corpus use is carefully gate-kept, and the contents of the Rights Registry are not publicly available. Do we have concerns around transparency and intellectual freedom? And this massive corpus, which might have a chilling effect on other similar enterprises, is corporately managed, not freely reproducible, and thus far, only available in the US. What then are our concerns around equity of access and long-term security of data?
The settlement is under US law, as Sian has indicated. Many commentators see an erosion of existing rights, e.g. interlibrary loans and usage of public domain books. Fair use is much broader than fair dealing, e.g. it includes teaching, education, and parody. The settlement can be seen as more restrictive than fair use: for example, under the agreement, free snippet access to in-print titles through Google has decreased. Previously snippets could be viewed; going forward, only bibliographic information and front matter may be viewed. The settlement is seen as a model for the future, in terms of setting various standards, such as the number of pages that can be viewed or printed, and the ability to archive and index text under specific conditions.
The BRR will be key to determining what rights are available, i.e. what can be used and how it can be used. Google has a major advantage that it can use to promote the sales and marketing of books, transforming the book business. It would be very difficult for anyone to compete with Google: they would have to seek agreement with authors, and would not receive terms and conditions any better than Google’s (‘most favoured nation’ status). It is not in the parties’ interests to favour OA or CC licensing, as that would undermine the commercial model.
Reasons why opting out is problematic. BRR & Google will have enormous benefit from non-display uses to analyze the market and set prices accordingly. Huge impact of discoverability – and impact on ILL – but no ILL permitted on digital books. (only originals)
Various applications now available… Can we leverage SFX, eg create portfolios of subject area collections within Book Search and create targets?
Pricing is supposed to be comparable to existing pricing…but there is nothing comparable. Fully Participating Libraries will have their costs subsidized based on the scale of works that have been digitized…U Michigan gets free access for 25 years.
Conflict with privacy…how can we address if we don’t have representation on the board of the BRR?
XML-based content architecture will be important for incorporating content, metadata and rights, in a consistent and scalable manner; requires new processes and skill sets to
Google book settlement olita sept 2009
The Google Book Settlement: Where are we now, and how did we get here? Sian Meikle & Tony Horava, University of Toronto, Sept. 25, 2009
Outline Overview of settlement Access to Google Books Copyright issues Marketplace impacts Integration/curation of content Competition issues Privacy matters Academic freedom Future business models
Google Book Search Started in 2004 42 Library Partners, many publishing partners Google-funded In-copyright and out-of-copyright material Google and selected library partner servers only 10 million books to date: 2 million public domain (20%) 7.5 million in copyright, out of print (75%) 0.5 million in copyright, in print (5%) Eventual aim: 30 million books
Google digital products Metadata Scanned (back files, library partner scans): Scanned images: TIFFs Access derivatives: JPEGs Image-based PDFs (one per page or one per book) Uncorrected OCR Born-digital (front file, new content) Digital text, xml format
Proposed Google Book Settlement 2005: US class-action lawsuit against Google (American Publishers Association, American Authors Guild) October 28 2008: proposed settlement announced Oct 7 2009 (moved from June 11 2009): originally the (final) Court Fairness Hearing Possible outcomes: accept, reject, court oversight Out of scope: change agreement Sept 18: US Department of Justice advises Court not to accept settlement but to encourage further discussion Sept 24: Court accepts motion to delay final hearing; will hold status conference on Oct 7 instead
Google Book Settlement Outline covers online access in US for books: published before January 5 2009 covered by Berne copyright convention Google pays $125 million $34.5 million to establish Book Registry $45 million to rights holders for books scanned prior to May 2009 ($60 per title) $45.5 million for legal fees Split of future revenues: 63% to copyright holders 37% to Google
Google Book Settlement: Products Display uses (saleable products): Access, preview, snippets, book records Non display uses (free and research products): Display of metadata only; full-text and geographic indexing without display of text; analytical research across corpus; and Google R&D Inclusion: In print books: opt-in to display uses Out of print books: opt-out of display uses In print = commercially available in USA and Europe Products: Individuals: sale of perpetual access per title via Google server Institutions: sale of annual access to ISD
Book Registry Non-profit independent agency representing plaintiff interests to: Manage rights database: book status, contact info Negotiate terms and prices of online book uses on behalf of rights holders Distribute share of revenue to rights holders
Institutional Subscription Licensing models Libraries that contribute books for scanning: Fully participating libraries Give in-copyright books; get digital copies, must meet security requirements Cooperating libraries Give in-copyright books; get no digital copies Libraries that do not contribute books for scanning: May subscribe to ISD (either whole or discipline based) Pricing model set by Google and Book Registry Benefits of partnering: Ability to challenge institutional pricing model Some subscription discounts Information about inclusion / exclusion of books
Google Research Corpus All Google books except in-copyright works whose rights holders have removed their works Hosted at Google, up to two other sites Non-consumptive research: linguistic analysis, automated translation, book relationships, index/search techniques Qualified users approved research agenda letter of support from participating library, book registry, google, or corpus host
How many orphans? One possible calculation: 3.5 million out-of-print books scanned prior to May 2009 $45 million (rights claims) ÷ $60 per book = 750,000 claimed books (21%), and 2.75 million orphans (79%) [Chart: out-of-print titles, claimed 21% vs. orphans 79%] But $45 million is the minimum payment, so numbers may vary.
Google Book Settlement: Reactions Positive: huge corpus available to wide audience bypassed orphan works log jam Concerns: de facto monopoly user privacy intellectual freedom transparency equity of access long-term security of data
Copyright challenges Does the settlement erode statutory rights under copyright law? Is ‘fair use’ doctrine affected by the settlement? (NB – ‘fair use’ is much broader than Canadian ‘fair dealing’) Many argue that it does not….it is a private settlement among three parties. ‘Fair use’ legislative provisions haven’t changed. Many argue that it is more restrictive than ‘fair use’ General view that the settlement will be influential in setting de facto standards for ‘fair use’, such as number of pages that can be displayed or printed; the conditions for archiving and indexing of text for discovery purposes Contractual licensing is supplanting copyright legislation as the driver for reproducing and disseminating in-copyright books (in a commercial model) to our collective cultural heritage – enormous risk for the stewardship role of libraries.
Copyright challenges (2) The Registry will not be available to the public - a key tool is being developed privately The board of the Registry will have no librarian or reader representation… will a balanced approach to copyright, access, and pricing issues be possible? The settlement is silent on how agreements between libraries and the Registry would ensure that users can exercise their rights under the US Copyright Act Will have a dampening effect on Open Access and Creative Commons licensing – how will it affect the long-term plans of the Open Content Alliance, for example?
Pricing and market impacts No other provider can be offered license terms better than Google’s for ten years – creates a virtual monopoly and enormous lead time advantage. Upward pressure on pricing– affordability in a limited market. Pricing to be determined by: the pricing of similar products & services; the scope of books available; the quality of the scan; and features offered via the subscription The settlement refers to two goals: 1)market realization of revenues for rights holders, and 2) broad access by the public including institutions of higher education…to be based on “comparable products and services”. But which ones? Only Google can license ‘orphan works’ (in the absence of legislation) Opting out for orphan works would be very problematic. Creates a huge locked-in pool of books. Will have huge influence on market pricing for out-of-print books. Rights holder can set price, or the Registry will use a pricing algorithm based on similar books to determine price
Integration & curation issues Google has opened Book Search via APIs: libraries can embed book images, previews, and links to Book Search within discovery layers and catalogues Book Search won’t allow downloading of in-copyright books to mobile devices. How can we leverage SFX link resolver to obtain maximum benefit from enriched content in Book Search? Libraries can work closer with Google and OCLC to make it very easy to move from search results to purchased content (eg ‘Find in a Library’ link) Could lead to better partnership arrangements with libraries for developing finding aids and user tools Preservation is based on a commercial model, not on certifiable standards in a non-profit, research environment – what guarantee of permanence do we have? What if Google’s business model changes?
Integration & curation (2) Book Search offers a very limited form of collaboration (eg shared annotation among small & predefined groups) but: Doesn’t permit enhancements of texts; Doesn’t permit the layering of new services upon texts; Doesn’t permit use of texts in digital mash-ups. Compare this with dynamic developments in ebook interfaces for searching, sharing, storing, and managing ebook content
The Competition…. The sheer size and scope of Book Search will invite comparisons with established commercial products, such as EEBO, ECCO, and the backlists of major publishers like Oxford or Taylor & Francis. How will the pricing for the Institutional Subscription Database (ISD) affect pricing models in the academic marketplace? There will be much pressure on US libraries to acquire the ISD. If and when it is available in Canada, there will be pressure on libraries to acquire it. How will the ebook aggregators (eg NetLibrary, ebrary) be affected? Google has very deep pockets for R&D Turf wars - Google is providing public domain titles in ePub format to the Sony ebook reader and recently announced that “it would let anyone resell the millions of out-of-print books it has scanned from the nation’s libraries.” What was Amazon’s response? “an Amazon executive immediately rejected the idea of becoming Google’s affiliate”
The Competition (2) Comparison of DRM systems will be important for access to material. How will Google propose to integrate Book Search into the researcher’s workflow? Book Search won’t be able to offer the range of functionality and tools on Scholars Portal as a discovery environment for researchers Can we deem Book Search ‘a collection’ analogous to a library collection, eg on Scholars Portal?
Privacy matters Concerns over user privacy…will these be addressed? Google has an unprecedented opportunity to monitor and track user reading habits, eg when a user prints out pages from a book in the ISD, there will be a visible watermark displaying encrypted session information “which could be used to identify the authorized user that printed the material or the access point from which the material was printed” (art 4.1) “For purchases of online e-book access or access via institutional subscriptions, Google will have the technical ability to track every page that one views, even recording how long is spent on a page.” (Alan Inouye, ALA) What will privacy look like in a Google environment?
Intellectual Freedom If qualified users want to search the Research Corpus for ‘non consumptive’ research, e.g. textual or linguistic analysis, their research agenda needs to be approved by the host institution “Research Agenda” means a document that describes a research project in sufficient detail to demonstrate that it will be Non- Consumptive Research” (p. 17 of Settlement) Host institution is responsible. What will the criteria be? This will certainly conflict with academic freedom…fundamental values will be at play Google can exclude a book for editorial reasons: on what basis? Pressure from governments, powerful interest groups could have an important impact, e.g. Google saving itself from embarrassment or bad PR by suppressing a controversial book The Settlement requires Google to provide public access and the ISD for only 85% of the in-copyright, not commercially available books (potentially 1M books) Censorship & freedom of expression – another conflict with library values
Equity of Access Works within works might be excluded, depending on rights holder exercising his rights independently, eg an essay, a poem, a chart or a table The Settlement doesn’t include pictorial works, eg photographs and illustrations will be blacked out. Momentum driving supply & demand : “…it is possible that faculty and students at institutions of higher education will come to view the institutional subscription as an indispensable only because research libraries have invested significant resources in preserving out of print books.. They might insist that their institution’s library purchase such a subscription. The institution’s administration might also insist that the library purchase an institutional subscription so that the institution can remain competitive with other institutions of higher education in terms of the recruitment and retention of faculty and students.” ALA-ACRL-ARL Brief This can exacerbate inequalities among libraries, based on budget realities.
Future Business opportunities under the settlement… Print on Demand Custom Publishing PDF downloads Consumer subscriptions Summaries, abstracts, compilations To compete, publishers will need to focus more on metadata, rights-management and new logically structured units (ie not pages) using an XML-based content architecture and workflow. Announcement last week: Google will provide the public domain books to On Demand Books for print-on-demand publishing using the Espresso Book Machine.
Conclusion We need to monitor developments closely, and engage in vigorous, balanced advocacy with our stakeholders, and show support for US libraries & organizations that are raising serious concerns What will be the future impact on our libraries? “Google is a behemoth, and the Google Settlement, if approved, will make it the behemoth of the book….Will the restraints of the Book Rights Registry be enough to keep it from abusing such a position, or will they be like the ropes of the Lilliputians around the sleeping Gulliver? This story is surely only in chapter one” - Grace Westcott, Globe & Mail Feb 20, 2009