A database of riches michael cairns



In this report, I estimate the market opportunity that the Google Book database could represent. I have organized my review around which customers are likely to purchase the product and how much and to what degree they will purchase, and I also explore how Google might go about selling and marketing the product. I have excerpted the management summary section on my blog, and the full report is available in pdf here.

Readers interested in discussing this report in more detail should contact me to arrange a meeting. Contact details are included in the document.


  1. A Database of Riches: Measuring the Options for Google’s Book Settlement Roll Out. Michael Cairns – Managing Partner, Information Media Partners. Tel: 908 938 4889
  2. Author: Michael Cairns has been a publishing executive and consultant for over 25 years. As President of R.R. Bowker, he led the team that transitioned the company from a print-based organization to one reliant on web subscription products, and also successfully broadened the company’s revenue base. During his tenure at Bowker, he managed the sale of Bowker from Reed Elsevier and, once that transaction was completed, he executed a strategic plan resulting in the acquisition and integration of five companies in three years. As a consultant, he has managed projects for many large media companies including Thomson Learning (Cengage), Simon & Schuster, Reed Elsevier, The Interpublic Group of Companies, Ogilvy & Mather, Hearst, Gruner + Jahr, Online Computer Library Center (OCLC), AARP and others. In addition, Michael has held executive positions at PricewaterhouseCoopers, Berlitz International, Inc., Macmillan, Inc, and [...]. In his current role at Information Media Partners, Michael consults with a wide spectrum of publishing and media companies, helping them define market opportunities, develop business strategies, identify acquisition opportunities and manage through crisis. Potential clients are encouraged to contact Michael for more information (tel: 908 938 4889).

Notes on this Report: In the summer of 2009, I started to wonder about the potential market opportunity that the Google Book Settlement could represent. Fellow industry consultant Mike Shatzkin and I began to discuss the agreement and I agreed to pull together a spreadsheet that could represent an ‘order of magnitude’ estimate of the market opportunity. This report does not rely on any direct interviews with Google or with representatives of the Book Rights Registry (BRR) and, as such, it only represents a structured approach to analyzing the opportunity.
Nor is this report a definitive statement on pricing, market penetration or the manner in which this market opportunity may be leveraged. In addition to this report on market opportunity, I also constructed an estimate of the potential size of the orphan works population. This material has been available for some time on my blog (personanondata) and in several presentations I have made. I have included this analysis as an attachment to this report. (Other than a few minor punctuation edits, there have been no changes to my original.)

Several people helped in the review of this document and, for their time and effort, I am especially grateful. A special thanks to Mike Shatzkin of The Idea Logical Company, who originally prompted me to look at the market potential of the Google Book Settlement and helped me organize my thoughts. Both OCLC’s WorldCat and Bowker’s Books In Print were invaluable in developing some of the conclusions formulated in this document. Specific citations are noted where applicable.

Readers of this report may be interested in discussing the findings with me directly and in more detail. Please contact me to arrange a time: 908 938 4889. Find me on LinkedIn, Twitter and Scribd.

Copyright: Michael Cairns – Replication and Distribution By Permission
  3. Introduction: Almost five years ago, Google embarked on the most ambitious library development project ever conceived: to create a “Noah’s Ark” of every book ever published and to start by digitizing books held by a rarefied group of five major academic libraries. The immediate response from US publishers was muted until the implications of the project became clear: Google proposed no boundaries to the digitization effort and initiated the scanning of books both in and out of copyright and in and out of print. Adding to publishers’ concerns, Google planned to display “snippets” (small selections) of the book’s content in search results. Despite some hurried conversations among publishers, author groups and Google, Google remained convinced that what they were doing represented a social ‘good’ and that the partial display of the scanned books was legally within the boundaries of fair use.

From the publisher perspective, this was a make-or-break moment, and the implications were more acutely felt by trade publishers, who saw the potential for their business models to be obliterated by easy and ready access to high-quality content via a Google search over which they would exert little or no control. Even worse was the fear that rampant piracy of content would also develop (a debated and contentious point), given the easy access to a digitized version of a work that could be e-mailed or printed at will. The publishers determined that if Google were to ‘get away with it’ without challenge, then anyone would be able to digitize publisher content and possibly replicate what has been going on in the music and motion picture industries for almost ten years.
In mid-2005, prompted by a lawsuit filed by The Authors Guild, the Association of American Publishers (AAP), led by four primary publishers, filed suit against Google in an effort to halt the scanning of in-copyright materials. (The Authors Guild and AAP ultimately combined their filings.) The initial Google Book Settlement (GBS) agreement, given preliminary approval by a court in October 2008, generated a vast amount of argument both in support of the agreement and in challenges to it. A revised agreement was drafted after Judge Chin of the US District Court for the Southern District of New York agreed to delay adjudication; final arguments were heard in late February 2010. To date, Judge Chin has given neither a timetable nor an indication of when and how he will decide the case.

From the perspective of the early leading library participants, Google’s arrival and promise to digitize their purposefully conserved print collections looked like a miracle. Faced with forced declines in the dollars spent on monographs and the ever-rising expense of maintaining over 100 years of print archives, the libraries saw in the Google digitization program a possible solution to many problems. All libraries believe they hold a social covenant to collect, maintain and preserve the most relevant materials of interest to their communities, but maintaining that covenant is a challenge when expenses are rising and libraries must also manage the migration to an on-line world.[1]

[1] It is important to acknowledge that, initially, the GBS may have been seen as a solution to libraries’ conservation and preservation needs; however, libraries have subsequently determined that they need to develop their own preservation options, among which The Hathi Trust is a clear leader.
  4. The library world is typically segmented into public and academic institutions and, while these often varied ‘communities’ may differ in their philosophy towards, for example, collection development or preservation, they do share some common practices. Most importantly, all libraries are committed to resource sharing and, while materials use has historically and primarily been ‘local’ to the library, every institution wants to make its collections available to virtually any patron and institution who requests them. In short, these library collections were always ‘accessible’ to all regardless of geography or copyright: first US Mail, then FedEx, e-mail and the Internet progressively made this sharing easier but, until Google arrived with their digitization program, any sharing beyond the local institution was via physical distribution[2]. In effect, it could be argued that the Google scanning program simply makes an existing practice vastly more efficient.

Even though approval of the Google Book Settlement (GBS) hangs in the balance, under review by Judge Chin of the US District Court for the Southern District of New York, an Executive Director has been named to head the Book Rights Registry (BRR)[3] and is preparing the groundwork to establish the organization in advance of approval.

This report represents an attempt to analyze the size of the market opportunity for Google as it seeks to exploit the Google Book Settlement. Following are our summary findings, which are discussed in more detail in the ensuing pages of this report.
Summary Findings of the Report:
• Libraries will see tremendous advantages, both immediate and over time, from the GBS, although concerns have been voiced (notably from Robert Darnton of Harvard[4])
• Google’s annual subscription revenue for licensing to libraries could approach $260 million by year three of launch
• Over time, publishers (and content owners) will recognize the GBS service as an effective way to reach the library community and are likely to add titles to the service[5]
• Google will add services and may open the platform for other application providers to enhance and broaden the user experience

[2] Resource sharing and improvements in the ‘logistics’ provided by OCLC (WorldCat) or via consortia such as OhioLink have made physical distribution effective and comparatively efficient.
[3] The BRR is the management body tasked with administering the GBS and representing the interests of authors and publishers once approval has been granted by the court.
[4] Robert Darnton, NY Review of Books
[5] The settlement doesn’t provide for adding content prior to 1/5/09; however, we are suggesting that, by mutual consent, additional published content may be added as an expedient method of reaching the library market.
  5. • The manner in which the GBS deals with orphan works will provide a roadmap for other communities of ‘orphans’ in photography, arts, and similar content and intellectual property
  6. Business Analysis: By mid-2008, the lawsuit was background noise adding to the general malaise and discomfort characterizing the media industry, and the announcement that the parties had agreed to settle their differences was initially greeted with support, relief and some surprise. Yet, as the implications of the complex settlement agreement became clearer, a strong (and, at times, strident) opposition developed to argue for substantial revisions to, or the elimination of, key sections of the agreement. Importantly, this opposition also succeeded in prompting the Department of Justice (DoJ) to voice ‘strong opposition’ to segments of the agreement. Combined with the concerns expressed by DoJ, the opposition was able to exact significant changes to the agreement’s terms. A ‘revised agreement’ was presented to, and is now pending approval by, Judge Denny Chin of the US District Court for the Southern District of New York.

Among the principal arguments against approval of the original settlement agreement were the following:
• Opponents argued Google would attain an insurmountable monopoly over in-copyright but out-of-print works
• The obligation to ‘opt-out’ of the agreement places an undue burden on the copyright holder (author)
• Foreign rights holders were under-represented (or insufficiently consulted) and thus disadvantaged by the original agreement
• Monies collected on behalf of copyright holders but never disbursed would be paid into a ‘general expenses’ fund to benefit the Book Rights Registry[6]
• Some authors believed their moral rights to determine the use and replication of their works were circumvented
• The agreement itself will in effect create copyright ‘legislation’, which should be the purview of Congress

The revision to the agreement has partially addressed these issues (excepting the last item), but it has not fully incorporated all of the challenges supported by the settlement opposition and the Department of Justice.

Two aspects of the agreement which generated attention and hyperbole concerned the number of “orphan works” and the revenue model Google would implement to market their full-text database. Both of these issues are used by settlement opponents to argue for the agreement’s rejection by the Court. In each case, very little real analysis has been conducted to determine the true parameters of both the ‘orphan’ issue and the market opportunity.

[6] Changed in the second version of the settlement so that uncollected funds would eventually be distributed to designated charities.

  7. In August 2009, we published an estimate of the potential number of orphan works that may exist. We are unaware of any other detailed analysis that attempts to quantify the collection of titles which remain in copyright but whose copyright holder has not been located. This analysis is included as an attachment to this document[7]. The following chart summarizes the findings of potential orphan works:

| Scenario | Estimate of Orphan Works 1920 – 2000 | Percent of Title Output |
| --- | --- | --- |
| Base Case | 580,388 | 24% |
| High/Aggressive | 824,553 | 34% |

In summary, the orphan analysis estimated a potential orphan population of 580,388 based on a review of pre-existing statistical information documenting the numbers of new titles published in the US since 1920. While we estimated that ‘orphans’ would be more prevalent among older titles, the total annual title output only exceeded 15,000 for the first time in 1960 (according to our source data); therefore, the universe of all titles published between 1920 and 1980 is actually relatively small. Publishing output only rapidly increased during the late 1980s and it is assumed that the majority of these titles will not be ‘orphans’ because copyright information is readily available and confirmable. As noted, the full orphan analysis is included as an attachment to this report. We believe our analysis to be sound, and the results were supported by a different methodology based on data from OCLC’s WorldCat database (as noted in the full report).

After estimating the total number of ‘orphans’, we also estimated the number of foreign works that could potentially be included in the GBS.
This foreign-works analysis is more tenuous statistically because we relied entirely on the OCLC WorldCat database[8] and made several key assumptions and extrapolations. Based on this conditional estimate, we determined there could be approximately 1.2 million titles published in the ten largest languages and an additional 0.2 million in all other languages.

Currently, the content potentially covered by the GBS represents over 12 million titles scanned. Multiple versions of the same work are included in this total; however, even if all foreign works are excluded from the database and authors and publishers voluntarily remove their titles from inclusion, the Google Book subscription product will remain a compelling database for the academic and public library market as well as schools and certain corporations. A significant change adopted in the amended settlement agreement has narrowed the class to UK, Australian and Canadian published books in addition to those registered with the US copyright office.[9]

[7] A related analysis that extrapolates the potential number of foreign language titles that may fall under the umbrella of the settlement has also been completed but is not included in this document.
[8] This is not to assert that the WorldCat data is inaccurate in any way; rather, our assumptions should be considered ‘best-guess’.

  8. The Google Books Database Subscription and Revenue Model

Opponents have suggested that Google will be in a position to exercise monopolistic pricing and to ‘overcharge’ to extract maximum revenues from their customers. We agree that their market position could be abused; however, we believe there is a counter-balance included in the agreement that works against this tendency. Google seeks maximum exposure for the content, not only to support its stated mission of providing wide and broad access to this ‘hidden’ content, but also to support other business opportunities they may implement (such as advertising programs). We believe Google will see overly aggressive pricing as an inhibitor to wide market acceptance of the product. The Book Rights Registry will represent the interests of authors and publishers, who will argue for pricing that maximizes their opportunity. Together, balancing wide access (Google’s position) with pricing considerations will result in an optimal pricing matrix.
In developing our financial and market analysis, we have relied upon several key assumptions[10]:
• Pricing will be variable based on type of institution
• This will be considered a ‘must have’ database product for all libraries
• The Google product will effectively “level the playing field” from small to large academic libraries for the types of books covered by the Settlement
• Google will continue to invest in the Book database product by adding content, functionality and applications/tools to aid usage over time and may raise pricing
• Penetration will not reach 100% for any segment, but is likely to grow over time
• Corporations will be important customers (e.g., science, aeronautics and engineering-based firms)

[9] As an upper limit, the number of ‘non-English’ language titles could be 50% of the total books scanned.
[10] Business models that include advertising are not assumed in this analysis. It may be possible that Google will use the scanned content as content around which they can tailor advertising offers; however, the second amended version has narrowed the application of varied business models and it is difficult to determine that any model other than a subscription-based service will be the primary revenue generator to Google and the BRR. Over time, this may change but that circumstance is not anticipated in this analysis.
  9. In the following analysis, we attempt to define the Google Books Database market opportunity and estimate the potential annual revenues the company may be able to generate from database subscriptions. Google currently markets several services to publishers, which include Google Scholar, Google Partner Program and Google Editions (which will be launched in mid-2010). These current products and services are not included or assumed in this analysis. In estimating the market potential for the Google Settlement database product, we have taken three primary components (or drivers) into account: market segmentation, penetration and pricing.

Market Segment

The agreement provides Google with the right to exploit certain markets including academic, public and special libraries, corporate customers, print-on-demand (POD)[11] and direct-to-consumer sales. In our analysis, we have used American Library Association data itemizing the type and number of libraries in the US and used “best guess” estimates of the market opportunity represented by corporations and consumers. Most commentary to date has focused on the library community, which is where this analysis is strongest in its estimates and where we concentrate our discussion.

An important accommodation of the Settlement is the provision of free access to the database product for all public libraries and certain “Carnegie” classed libraries. Each library accepting this access will receive the equivalent of a single user sign-on that will allow patrons and/or staff to access the Settlement database without restriction.
While this is an important accommodation for some libraries, for the majority of libraries this access will not be appropriately functional and, thus, the site-wide and unlimited user access provided under the terms of the subscription product will remain the better option. We do not believe this free access will materially impact the revenue opportunity for Google and have allowed for this circumstance in our financial model.

In our opinion, academic libraries will consider a subscription to the Google Books database a competitive necessity. For the first time, any subscribing library within the United States may gain direct access to the holdings of some of the largest and most renowned academic collections in North America[12]. In addition, this access will far surpass the inter-library loan process of years past simply because the content is completely indexed. Researchers will no longer have to ‘guess’ that a title may be relevant to their research based on an index or table of contents and, moreover, they eliminate the risk of requesting that a title be delivered to them only to discover that its content is irrelevant.

[11] POD is a right that may be granted to Google in the future pending approval of the Book Rights Registry and the rightsholders they will represent.
[12] The amended settlement has narrowed the class and effectively excludes non-English titles from the database.
  10. Many academic library collections have been built over centuries and titles in their collections are often unique, which is another compelling reason supporting the argument that the Google database represents a singular opportunity for all academic institutions to “narrow the gap” between their research capabilities and those of the country’s largest and best endowed institutions. While some academic collections’ titles are available via inter-library loan, many older, fragile and unique works are only available at the institution itself by special request. The digitization of many (not all) of these works significantly broadens access to and distribution of this content. Undoubtedly, researchers, educators and students at all academic institutions will pressure their administrators and librarians to subscribe to the product[13].

The following chart represents our construct for the potential addressable market segments for the Google book database[14]:

| Market segment | Number of libraries |
| --- | --- |
| Academic Libraries (total) | 3,617 |
| Public Libraries (total) | 9,198 |
| School Libraries | 99,783 |
| Special Libraries | 9,066 |
| Armed Forces | 296 |
| Government | 1,159 |

Market Penetration: We estimate that sales penetration will vary considerably across the segments; however, for the reasons presented earlier, we believe penetration into the academic library segment will lead all other markets. Public libraries (particularly metropolitan library systems) will find value in the database and, as a group, will represent the largest concentration of customers overall. School libraries are unlikely to subscribe to the database in great numbers for budgetary or relevance reasons and, moreover, students will be encouraged to gain access to the product via their public library remote-access facilities.
We expect larger research public libraries (such as The New York Public Library) will be treated as academic libraries for the sake of pricing. We also expect some corporations to access the database product and, while pricing for these ‘for profit’ entities should be comparatively high, the absolute number of customers in this segment will be small.

[13] It is likely that an extensive database of user behavior may be generated by usage of this database. This is data that publishers (and authors) may be interested in mining for product development and/or insights into consumer behavior.
[14] Source: American Library Association

  11. Pricing: Database subscription pricing can be complicated and confusing. Models can be based on population served, purchasing budgets and/or enrollment, and then be subject to multiplication factors such as number of simultaneous users, number of physical locations and other factors. We don’t know which method Google will choose; however, in order to keep our analysis as simple and transparent as possible, we have built our pricing model on the basis of the following criteria:
• Unlimited users per location
• Branch public libraries priced at 25% of base fee per additional branch
• 3% price increases per year
• Institution ‘classification’ based on ALA data
• Full ramp-up will occur over the first three years

Additionally, we expect Google will sell to the ‘highest’ administrative level possible[15]. For example, the University System of Georgia manages licensing contracts under their Galileo program for both public and academic libraries and, therefore, this agency would be the customer rather than individual or local libraries. In New York, Google would license access to the library authorities in each borough. In New York City (Manhattan), this would mean the main library and roughly 50 satellite libraries would have unlimited access via one contract and, based on our pricing matrix, the NYPL would pay approximately $340,000 per year for access ($25,000 for the main library plus $6,250 for each of the 50 branch locations).

For-profit organizations (corporations and businesses) will have a pricing matrix higher than that for non-profit libraries and institutions (generally standard practice). We would expect that only a relatively small percentage of businesses would subscribe to the entire database and we have segmented the target market into Fortune 500, Fortune 1,000 and all others. The corporate customers most likely to subscribe would be those companies with large research needs such as pharmaceutical, aeronautics, engineering and the like.
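The NYPL figure above can be reproduced from the stated pricing criteria. The following is a minimal sketch rather than the report's actual spreadsheet: the function names are illustrative, and applying the 3% annual increase to this particular fee is an assumption made here only to show how the criteria combine.

```python
BASE_FEE = 25_000        # annual fee for the main library (from the report)
BRANCH_RATE = 0.25       # each additional branch priced at 25% of the base fee
ANNUAL_INCREASE = 0.03   # 3% price increase per year (from the pricing criteria)


def system_fee(base_fee, branches, branch_rate=BRANCH_RATE):
    """First-year fee for a library system: main library plus its branches."""
    return base_fee + branches * base_fee * branch_rate


def fee_in_year(first_year_fee, year, increase=ANNUAL_INCREASE):
    """Fee in a later year, assuming (hypothetically) compounding increases."""
    return first_year_fee * (1 + increase) ** (year - 1)


nypl_fee = system_fee(BASE_FEE, branches=50)
print(f"NYPL first-year fee: ${nypl_fee:,.0f}")  # $337,500
print(f"NYPL fee in year three: ${fee_in_year(nypl_fee, 3):,.2f}")
```

The first-year result of $337,500 is consistent with the rounded figure of approximately $340,000 quoted in the text.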
Options to better address the corporate market may include shorter subscription terms, usage-based (metered) pricing or topic/subject-specific packages.

Market Opportunity Summary: We believe Google and the Book Rights Registry (a proxy for authors, authors’ heirs and publishers) will be motivated to maximize access to the Google database in order to maximize viewing of the content which will, in turn, result in optimal revenues for both. We do not believe Google will implement a monopolistic approach to pricing and, in comparison with smaller and more segmented databases, we believe the Google pricing will appear reasonable considering the breadth and depth of content in the database.

[15] Consortia pricing, while an important consideration, would represent a discount to the pricing matrix we present and would be negotiated on a case-by-case basis. We have not made accommodations for Consortia pricing.

Approach to the Market:
  12. In our view, Google has several options for marketing and selling this database product:
• Google sells the product themselves with their own sales force
• Google designates one supplier for each segment
• Google allows all vendors to integrate the books database product into their existing database products and pay Google a defined fee per user

In our view, it is unlikely that Google will establish their own sales force to sell into the library and corporate marketplaces. While Google does have an ad sales force supporting its SEM program(s), this activity is vastly different from building a sales team to call on libraries and corporate clients. Additionally, given Google’s predilection for automation, the hiring of a human sales team doesn’t seem culturally acceptable. Lastly, and possibly more important, we believe licensing this product will become more of a ‘renewal’ business as the market matures (after three to four years), which could require far less sales effort, or an effort significantly different from that required in the first three years. We estimate a fully staffed Google sales force could cost the company $15 million annually but, in short, Google is unlikely to want the headache.

Given the limitations of the above approach, we believe it is more likely Google will contract with one or more of the established players and pay a standard sales commission to the provider. In this model, Google will be able to set prices and targets and retain a degree of control over both the provider of this sales effort and the market delivery (pricing) of the product. Existing providers would bid on the right to sell this database on behalf of Google and, because the product will be highly valued, the bidding would likely be highly competitive. Likely providers to Google would include ProQuest, Gale/Cengage, OCLC or EBSCO.
It is also possible that an ‘outlier’ such as Ingram, Baker & Taylor or Hudson News (LibreDigital) would also see representing this database as a significant opportunity. For an established player, it is likely the provider would see increased sales in their current offering, since simply representing the Google Books database would open new market opportunities. For an ‘outlier’, the Google Books product may represent an opportunity to enter the market using the Google product as a foundation.

In our estimation, the above scenario is not only practical (not having to administer their own sales force is a major advantage), but may also be cost effective. Given the ‘prize’ of representing the Google database, we believe the average cost to Google may be less than 10% of revenues. (“Renewal” sales may also be commissioned at a lower rate than initial sales.)

Working with a single provider thus represents an effective solution for Google, but this strategy may not be the most efficient. In order to achieve greater efficiency in reaching their target market while also eliminating possible “political” issues caused by selecting one vendor over the others, the company may consider allowing any provider to sign a standard distribution agreement with the company and sell and market the product into all markets. This approach has several advantages:
  13. 13.
• Immediately leverages the competitive position of all major providers that otherwise may be mutually exclusive
• Gives a library subscriber a choice of provider and/or allows them to work with an existing ‘preferred’ vendor
• Potentially enables providers to integrate the Google product with their existing products, thus providing rapid development initiatives and built-in content ‘handcuffs’ supporting renewals
• Minimizes Google’s exposure to any supplier limitations and negative customer support issues
• Provides maximum exposure to all market segments virtually immediately
• As part of these agreements, Google may gain access to index all content supplied by its third-party sales partners
Approach to the Market Summary: Based on this review of Google’s tactical options, we believe the company will enable multiple (initially ‘preferred’) vendors to market and sell the product. Google will establish pricing and the vendors will be required to pay Google based on this set price schedule (less vendor commission). Under this model, any vendor will be free to charge the end-customer less than the ‘set price’; however, the vendor would still pay Google based on the higher ‘full’ price. (Selling below the set price could occur when the vendor bundles the database with its other products.)
Forecasted Revenue Expectations: Based on our assumptions documented above, we believe the revenue Google may generate from the Google Books database product could approach $260 million per year.
Our revenue model was based on the following set of assumptions:
• Base pricing by segment
• Price discounts based on size of library holdings or population served
• Penetration levels based on library size
• Revenue represents full implementation, which we expect by year three
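The assumptions above suggest a roughly multiplicative structure: segment revenue is approximately total market × penetration × average price. The short Python sketch below illustrates that reading using the Academics and School figures from the report's estimates chart, together with the $260 million total and the 12mm-title database size quoted in the report. The function name and the flat multiplication are assumptions made for illustration; the report's own spreadsheet applies size-based discounts and tiered penetration, so the results only approximate the published numbers.

```python
# Illustrative sketch only: a flat market x penetration x price model.
# The report's actual model applies size-based discounts and tiered
# penetration, so these values approximate (not reproduce) its figures.

def segment_revenue_mm(total_market: int, penetration: float, avg_price: float) -> float:
    """Return estimated annual segment revenue in $ millions (hypothetical helper)."""
    return total_market * penetration * avg_price / 1_000_000

# Inputs taken from the report's estimates chart.
academics_mm = segment_revenue_mm(3_617, 0.65, 55_000)
schools_mm = segment_revenue_mm(99_783, 0.005, 10_000)

# Nominal annual value per title at the report's headline revenue figure.
total_revenue = 260_000_000
titles_in_database = 12_000_000
per_title_value = total_revenue / titles_in_database

# Per-title cost to an academic library paying the average price.
per_title_cost_academic = 55_000 / titles_in_database

print(f"Academics: ${academics_mm:.1f}MM")
print(f"School: ${schools_mm:.1f}MM")
print(f"Nominal value per title: ${per_title_value:.2f} per year")
print(f"Cost per title (academic library): ${per_title_cost_academic:.4f}")
```

Under this simplified reading, the Academics and School results land close to the chart's reported revenue for those segments, and the per-title figures are consistent with the report's "$22 per year" and "less than $0.05" statements.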
  14. 14. The following chart documents our estimates:

Segment        Total Market   Avg. Penetration   Avg. Pricing   Revenue ($MM)
Academics             3,617                65%        $55,000          $130.1
Publics               9,198                47%        $21,000          $112.8
School               99,783               0.5%        $10,000            $4.9
Special               9,066               0.5%        $25,000            $1.1
Armed Forces            296                 5%        $11,000            $0.1
Government            1,159                25%        $11,000            $3.1
Corporate           100,000                 2%        $37.500            $7.5
Total                                                                  $260.0

As noted, we believe it will take Google three years to ramp up to this full-implementation revenue (we do not see this as a limitation on Google’s part; rather, it is a typical expectation for a new-product roll out). At the above levels, we believe pricing is not only reasonable and affordable, but compares favorably with existing database publishers’ pricing. There are few, if any, other publishers who have products which serve as many (all) segments as the Google Book database. At this revenue level, each of the 12mm titles in the Google database has a nominal value of $22 (per year) to Google. More importantly, the per-unit price paid by each library will be less than $0.05 (five cents).
On a pure cost-avoidance basis, licensing the Google Books database appears to be good value given current costs. If the costs of handling, cataloging, special requests (such as interlibrary loans) and storage are added to the base wholesale price of any title, the title’s full ‘carrying costs’ can double. Some studies have indicated that fulfilling an interlibrary loan request can cost $25 for each segment of the journey from the library to the requestor and back. This cost far exceeds the original (or, in many instances, the replacement) cost of the title [16].
While we believe this database to be an important acquisition for most academic and many public libraries, we do expect that Google will need to sell this product aggressively in the early years to achieve the penetration levels we anticipate.
There are several reasons for this: Firstly, the content of the database is largely unknown and, while it is representative of many important library collections, Google will need to market this collection as important and complementary to the library customers in question. Secondly, the sheer size of the database could be an inhibiting (or intimidating) factor and therefore the navigation,
[16] Users may print all or portions of the titles they select – although the ability (functionality) to do this may be a subsequent grant provided by the BRR to Google – and there is a cost to these activities; however, we maintain that the utility of the database and the ability of the user to be precise in their printing requests will produce only a marginal negative cost (if any) relative to the costs avoided, which are endemic to the current solution.
  15. 15. bibliographic data quality and the delivery of subject ‘collections’ will be important customer acquisition and retention areas for the company to focus on. In summary, we believe Google will be able to successfully launch its Book Database product into the market with fair and reasonable pricing that will encourage a broad base of target customers to subscribe.
Future Market Growth Opportunities: While the launch of this product is the focus of attention, we do believe the company has numerous opportunities to expand the product over time. We do not expect the Google Books database product to ‘stand still’; rather, we believe this product could become the primary access point for textual (monograph) materials into the library market.
Future market opportunities [17]:
• The addition of other content: Publishers may see this product as a viable library market entrance point for all their book content
• Provision of usage data to publishers (and others) for business and product development needs
• Price increases over time, along with increased penetration
• Inclusion of international/non-US market content – English language
• Inclusion of international/non-US market content – Non-English language
• Access to international markets
• Addition of more in-copyright materials closer to current pub dates; the product perhaps becomes a major distribution mechanism for book content
• Topic/segmented collections
• Potential to open the database for third party application development
[17] We expect these opportunities to ‘evolve’ over time based on discussion, negotiation and mutual agreement of the parties.
  16. 16. Summary: This analysis argues that the Google Books Database product will be seen as a ‘must have’ product for a large proportion of academic and public libraries and is, thus, valuable on its merits. Google will price this product at levels that are both lower than those of existing database providers and ‘economically viable’ given cost avoidance justifications. The company retains flexibility in how it will approach selling and marketing the product; however, we believe it will contract out these services. Lastly, we believe there is potential upside to the revenue model based on adding new markets and expanding content.
  17. 17. Addendum A – Orphan Works Analysis
580,388 Orphans (Give or Take)
Clearly one of the most (if not the most) contentious issues regarding the Google Book Settlement (GBS) centers on the nebulous community of “orphans and orphan titles”. And yet, through the entirety of the discussion since the Google Book Settlement agreement was announced, no one has attempted to define how many orphans there really are. Allow me: 580,388. How do I know? Well, I admit, I do my share of guesswork to get to this estimate, but I believe my analysis is based on key facts from which I have extrapolated a conclusion. Interestingly, I completed this analysis starting from two very different points and the first results were separated from the second by only 3,000 works (before I made some minor adjustments).
Before I delve into my analysis, it might be useful to make some observations about the current discussion on the number of orphans. First, when commentators discuss this issue, they refer to the ‘millions’ of orphan titles. This is both deliberate obfuscation and lazy reporting: Most notably, the real issue is not titles but the number of works. My analysis attempts to identify the number of ‘works’; titles are a multiple of works. A work will often have multiple manifestations or derivations (paperback, library version, large print, etc.) and, thus, while the statement that there may be ‘millions of orphan titles’ may be partially correct, it is entirely misleading when the true measure applicable to the GBS discussion is how many orphan works exist. It is the owner (or parent) of the work we want to find. To many reporters and commentators, suggesting there are millions of orphans makes sense because of the sheer number of books scanned by Google but, again, this is laziness.
Because Google has scanned 7-10 million titles then, so the logic goes, there must be ‘millions of orphans’. However, as a 2005 report by OCLC (which I understand they are updating) noted, many definitional disclaimers are applied to this universe of titles, such as titles in foreign languages, titles distributed in the US and titles published in the UK, to name a few. Accounting for these disclaimers significantly reduces the population of titles at the core of this orphan discussion. These points were made in the 2005 OCLC report (although they were not looking specifically at orphans) when they looked at the overlap in title holdings among the first five Google libraries. (And, if you like this stuff, this was pretty interesting.) Prognosticators unfamiliar with the industry may also believe there are millions and millions of published titles since, well, there are just lots and lots in their local B&N and town library.
The two methods I chose to try to estimate the population of orphans relied, firstly, on data from Bowker’s BooksinPrint and OCLC’s Worldcat databases and, secondly, on industry data published by Bowker since 1880 on title output. I accessed BooksinPrint via NYPL (Bowker cut off my sub) and Worldcat is free via the web. The Bowker title data has been published and referred to numerous times over the years and I found this data via Google Book Search; I also purchased an old copy of The Bowker Annual from Alibris. In using these databases, my goal was to determine whether there are consistencies across
  18. 18. the two databases that I could then apply to the Google title counts. In addition to the ‘raw data’ I extracted from the databases, OCLC (Dempsey) also noted some specific numbers of ‘books’ in their database (91mm), titles from the US (13mm) and non-corporate ‘Authors’ (4mm). Against the title counts from both sets of data, I attributed percentages which I then applied to the Google universe of titles (7mm). (My analysis also ‘limits’ these numbers to print books, excluding, for example, dissertations.) In order to complete the analysis to determine a specific orphan population, I reduced my raw results based on “best guess” estimates for non-books in the count, public domain titles and titles where the copyright status is known. These final calculations result in a potential orphan population of 600,000 works. I also stress-tested this calculation by manipulating my percentages, resulting in a possible universe of 1.6mm orphan works. This latter estimate is (in my view) illogical, as I will show in my second analysis.
An important point should be made here: I am calculating the potential orphan population, not the number of orphans. These numbers represent a total before any effort is made to find the copyright holder. These efforts are already underway and will get easier once money collected by the Book Rights Registry begins to be distributed.
My second approach emanated from a desire to validate the first approach. If I could determine how many works had been published each year since 1924, then I could attribute percentages to this annual output based on my estimate of how likely it was that the copyright status would be in doubt. Simply put, my supposition was that the older the work, the more likely it was that it could be an orphan.
Bowker has consistently calculated the number of works published in the US since 1880 (give or take) and the methodology for these calculations remained consistent through the mid-1990s. According to their numbers, approximately 2mm works were published between 1920 and 2000. Unsurprisingly, a look at the distribution of these numbers confirms that the bulk of those works were published recently. If there were (only) 2mm works published since the 1920s, it is impossible to conclude there are millions of orphan works. To complete this analysis, I aggressively estimated the percentage of works published each decade since 1920 which could be orphan works. The analysis suggests a total of 580K potential orphan works which, as a subset of the approximately 2mm works published in the US during this period, seems a reasonable estimate. My objective was to ‘validate’ my first approach (using OCLC and BIP data), and the result shows that both approaches, using different methodologies, reach similar conclusions.
There are several conclusions that can be drawn from this analysis. Firstly, since the universe of works is finite then, beyond a certain point, the Google scanning operation will begin to find ‘new’ orphans at a decreasing rate. I don’t know if this number is 5mm scanned titles or 12mm; my estimate is 7mm because, according to Worldcat, there are 3mm authors to 12mm titles. If you apply this ratio to the Bowker estimate of total works published, the number is around 7-8mm titles. Secondly, publishing output accelerated in
  19. 19. the latter part of the 20th century. While my estimates in percentage terms of the number of more recent orphans were comparatively lower than the percentages applied in the early part of the century for ‘older orphans’, the base number of published titles is much higher; therefore the number of possible orphans is higher. Common sense dictates that it will be far easier to find the parents of these later ‘orphans’.
In the aggregate, the 600K potential orphans may still seem high against a “work” population of 2.2mm (25%). I disagree, given the distribution of the ‘orphan’ works (above paragraph) and because I have assumed no estimate of the BRR’s effort to find and identify the parents. In my view, true orphans will be a much lower number than 600,000, which leads me to my final point. Money collected on behalf of unidentified orphan owners will eventually be disbursed to cover costs of the BRR or to other publishers. There has been some controversy on this point and it derives, again, from the idea that there are millions of orphans and thus the pool of undisbursed revenues will be huge. The true numbers don’t support this conclusion. There will not be a huge pool of royalty revenues to be ultimately disbursed to publishers who don’t ‘deserve’ this windfall because there won’t be very many true orphans. The other point here is that royalty revenues will be calculated on usage and, almost by definition, true orphan titles for the most part are not going to be popular titles and therefore will not generate significant revenues in comparison with all other titles.
This analysis is not definitive; it is directional.
Until someone else can present an argument that examines the true numbers and works in more detail, I think this analysis is more useful to the Google Settlement discussion than referring by rote to the ‘millions of orphans’. The prevailing approach is lazy, misleading and inaccurate.
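Two of the addendum's back-of-the-envelope figures can be retraced with a few lines of Python. This is a sketch under stated assumptions, not the author's spreadsheet: it assumes the Worldcat "3mm authors to 12mm titles" ratio is applied as a simple titles-per-work multiplier to Bowker's approximately 2mm works, and it uses the rounded 600K and 2.2mm figures quoted in the text. The variable names are illustrative.

```python
# Illustrative sketch only: retraces two rough figures from the addendum.
# Assumption: the Worldcat author-to-title ratio is used as a flat multiplier.

worldcat_titles = 12_000_000   # titles, per the Worldcat figure quoted in the text
worldcat_authors = 3_000_000   # authors, per the Worldcat figure quoted in the text
titles_per_author = worldcat_titles / worldcat_authors

bowker_works = 2_000_000       # approx. US works published 1920-2000, per Bowker
implied_titles = bowker_works * titles_per_author

potential_orphans = 600_000    # rounded potential orphan works from the first method
work_population = 2_200_000    # "work" population cited in the closing discussion
orphan_share = potential_orphans / work_population

print(f"Titles per author: {titles_per_author:.1f}")
print(f"Implied titles from Bowker works: {implied_titles:,.0f}")
print(f"Potential orphans as a share of works: {orphan_share:.1%}")
```

On these inputs the implied title count falls at the top of the 7-8mm range mentioned in the text, and the orphan share comes out slightly above the rounded 25% cited, which is consistent with the analysis being directional rather than definitive.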