An empirical investigation of the Italian digital publishing market
Stefano Tombolini's master's thesis in business statistics on the digital publishing industry (dataset from Italy)


  • Stefano Tombolini An empirical investigation of the Italian digital publishing market
  • This work is licensed under a Creative Commons Attribution 3.0 License. To view a copy of this license, visit bit.ly/T67jSf.
  • To Giovanni and Paolino.
  • ABSTRACT
    This thesis analyzes the Italian market for digital books (e-books) from a statistical and economic point of view, using previously unpublished catalog and sales data for the period 2010-2013. First, the pricing strategies of the major Italian publishing houses are described by means of OLS and quantile linear regression models, which identify multiple price points, on catalog data. Thanks to the high level of detail of the available cross-sectional dataset, it was possible to study the relationship between digital and print prices, an analysis that is original with respect to the reference literature. Second, the work examines the concentration and the price sensitivity, at the level of the single title, of the sales of an Italian e-book distributor focused on small and medium publishers and on self-publishing. The reasoned use of concentration statistics and the linear regression model, estimated consistently with the longitudinal nature of the sales panel, reveal strong similarities between the demand for digital books and the demand for print books, rather than changes in purchasing patterns. This is due partly to intrinsic characteristics of the book market and partly to publishing supply policies developed by analogy with the print market, as highlighted by the analysis of the catalog data. In the future, however, alternative business models based on tying and bundling strategies could emerge, because of the strong economic incentives present in the Internet distribution of digital goods, as illustrated in the conclusions.

    Keywords: e-book, price elasticity, OLS linear regression, panel linear regression, quantile linear regression, concentration.
  • Table of contents

    Abstract
    Table of figures
    Table of tables
    I. Introduction
    II. The digital publishing industry
        II.1. History and technicalities
        II.2. Supply chain
        II.3. The Italian e-book market
    III. Review of the literature
        III.1. Digital “cannibalization” of physical sales
        III.2. “Superstars” vs. “underdogs”
        III.3. Survey and descriptive evidence
        III.4. Econometric evidence
    IV. E-book pricing by major Italian publishers
        IV.1. Catalog dataset
            IV.1.1. Descriptive statistics
            IV.1.2. Missing data
        IV.2. OLS linear regression results
        IV.3. Quantile linear regression results
    V. A “long-tail-oriented” distributor sales
        V.1. Sales dataset
            V.1.1. Panel structure
        V.2. Concentration analysis
        V.3. Panel linear regression results
    VI. Discussion
        VI.1. Limitations of the study
            VI.1.1. Catalog analysis
            VI.1.2. Sales analysis
        VI.2. Possible further developments
    Appendix
        Short glossary of the e-book
            3G
            Adobe
            Adobe DRM
            AZW
            DRM
            E-book
            E-book reader (e-reader)
            E Ink®
            EPUB
            LCD
            PDF
            Social DRM
            Tablet
            Wi-Fi
        Short chronology of the e-book
            1971
            1987
            1993
            1994
            1995
            1996
            1998
            1999
            2000
            2002
            2005
            2006
            2007
            2008
            2009
            2010
            2011
    Bibliography
  • Table of figures

    Fig. 1: Growth in US e-book revenue (2002-2011)
    Fig. 2: The publishing industry supply chain
    Fig. 3: Graphical illustration of the “long tail” hypothesis
    Fig. 4: Concentration can be a misleading measure of the “long tail”
    Fig. 5: Calibration between sales and rank
    Fig. 6: Likelihood to pay for downloading a single e-book
    Fig. 7: Zero-profit locus of price and output combinations
    Fig. 8: Pie chart of e-book protection mechanisms
    Fig. 9: Pie chart of e-book subjects
    Fig. 10: Histogram of ebook.price
    Fig. 11: Histogram of paper.price
    Fig. 12: Histogram of ebook.pub.delay
    Fig. 13: Scatter plot of the quantitative catalog variables
    Fig. 14: Residuals normal Q-Q plot (OLS)
    Fig. 15: Plot of residuals vs. fitted values (OLS)
    Fig. 16: Graphical view of const (QUANTREG)
    Fig. 17: Graphical view of paper.price (QUANTREG)
    Fig. 18: Graphical view of file.is.open (QUANTREG)
    Fig. 19: Graphical view of subj.fiction (QUANTREG)
    Fig. 20: Graphical view of ebook.pub.delay.pos (QUANTREG)
    Fig. 21: Graphical view of ebook.pub.delay.pos.sq (QUANTREG)
    Fig. 22: Graphical view of ebook.pub.delay.neg (QUANTREG)
    Fig. 23: Graphical view of ebook.pub.delay.neg.sq (QUANTREG)
    Fig. 24: Graphical view of the pseudo-R-squared (QUANTREG)
    Fig. 25: Bar chart of Stealth sales and catalog data
    Fig. 26: Lorenz curves of revenue distributions (TOT.SAMPLE)
    Fig. 27: Lorenz curves of unit sales distributions (TOT.SAMPLE)
    Fig. 28: Lorenz curves of revenue distributions (SUB.SAMPLE)
    Fig. 29: Lorenz curves of unit sales distributions (SUB.SAMPLE)
    Fig. 30: Plot of 95% confidence intervals for lnprice (PANEL)
    Fig. 31: A Google Trends query by author
    Fig. 32: A Google Trends query by title
    Fig. 33: Demand for bundles of information goods
  • Table of tables

    Tab. 1: Percentage of titles surviving more than 58 years
    Tab. 2: Summary statistics for the quantitative catalog variables
    Tab. 3: Correlation matrix of the catalog variables
    Tab. 4: Table of coefficients (LOGIT)
    Tab. 5: Table of coefficients – preliminary (OLS)
    Tab. 6: White test (OLS)
    Tab. 7: Table of coefficients – final (OLS)
    Tab. 8: Breusch-Godfrey test (OLS)
    Tab. 9: Ramsey RESET test (OLS)
    Tab. 10: Table of coefficients – 0.05th quantile (QUANTREG.05)
    Tab. 11: Table of coefficients – 0.25th quantile (QUANTREG.25)
    Tab. 12: Table of coefficients – 0.50th quantile (QUANTREG.50)
    Tab. 13: Table of coefficients – 0.75th quantile (QUANTREG.75)
    Tab. 14: Table of coefficients – 0.95th quantile (QUANTREG.95)
    Tab. 15: Gini coefficients of revenue distributions (TOT.SAMPLE)
    Tab. 16: Gini coefficients of unit sales distributions (TOT.SAMPLE)
    Tab. 17: Gini coefficients of revenue distributions (SUB.SAMPLE)
    Tab. 18: Gini coefficients of unit sales distributions (SUB.SAMPLE)
    Tab. 19: F test for individual effects (PANEL.1)
    Tab. 20: F test for individual effects (PANEL.2)
    Tab. 21: F test for individual effects (PANEL.3)
    Tab. 22: Hausman test (PANEL.1)
    Tab. 23: Hausman test (PANEL.2)
    Tab. 24: Hausman test (PANEL.3)
    Tab. 25: Table of coefficients (PANEL.1)
    Tab. 26: Table of coefficients (PANEL.2)
    Tab. 27: Table of coefficients (PANEL.3)
  • I. INTRODUCTION

    The present work is a statistical and economic study of original catalog and sales data for the Italian digital publishing market during the period 2010-2013.

    Essential historical, technical, and economic information about the digital publishing industry, and the Italian context in particular, is provided in chapter II.

    The review of the literature in chapter III is quite extensive, and not only for completeness' sake: a thorough review of the literature has been instrumental in defining the economic and statistical framework.

    Chapter IV presents the results of the analysis, through OLS and quantile linear regression on multiple price points, of catalog data for e-books by major Italian publishers. Thanks to the detailed information available in the cross-sectional catalog dataset, it was possible to contribute a joint analysis of e-book and print book prices to the reference literature. An interpretation in terms of pricing behavior and market assumptions by incumbent print publishing houses is attempted.

    Chapter V presents the results of the analysis of sales data for a panel of titles distributed by a “long-tail-oriented” Italian e-book distributor. Concentration measures and panel linear regression models are used to investigate the distribution and the price sensitivity of sales on a title basis. We find little support in the data for the hypothesis of shifts in consumer tastes towards “niche” products. On the one hand, the finding is consistent with inherent characteristics of the book market; on the other, it might also be ascribed to a publishing offer based on analogy with the print market, as suggested by the analysis of catalog data.

    Finally, in chapter VI, the methodological limitations of the research, both in technical and interpretative terms, are discussed. In addition to possible adjustments to the original models, alternative paradigms, more in line with the economics of digital markets for information goods, are outlined. Innovative tying and bundling strategies by e-book distributors could reshape the digital publishing industry and represent a serious competitive threat for incumbent players in the print publishing industry.
  • II. THE DIGITAL PUBLISHING INDUSTRY

    A basic understanding of the digital publishing industry supply chain requires some knowledge of the historical and technical aspects of e-book production and distribution. Sections II.1 and II.2 are preparatory to the comprehension of the variables in the datasets and to the interpretation of the results of the analyses. Readers unfamiliar with the digital publishing landscape might benefit from a quick survey of the short glossary and chronology in the Appendix.

    II.1. HISTORY AND TECHNICALITIES

    The digital storage of books probably dates as far back as the 1960s, in parallel to the development of the ASCII (American Standard Code for Information Interchange) character-encoding scheme.[1] Over the years, the introduction of new encodings has allowed text files to represent many alphabets other than the English one.

    At least since the early 1990s, the supply side of the book market has engaged in the digitization of the production process; by that time, ADOBE Portable Document Format (PDF), which “made it possible to create complex text documents with professional grade software”[2], was already available.[3]

    However, it was only at the end of the 2000s, with the advent of specialized hardware devices such as the AMAZON Kindle e-reader and the APPLE iPad tablet computer, that consumers began to perceive e-books as a viable alternative to traditional printed books.[4]

    [1] See ASCII, Wikipedia, accessed on 09/07/2013, bit.ly/WBUEXv.
    [2] B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 9, bit.ly/XKt7Cp.
    [3] Portable Document Format, Wikipedia, accessed on 09/07/2013, bit.ly/WRAkm3.
    [4] B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 8, bit.ly/XKt7Cp.
  • During 2011, 20% of US Internet users purchased e-books, with sales reaching 8% and 18% of sales of trade books and of fiction books, respectively;[5] the share of US book consumers who also buy e-books grew from 13% in 2010 to 17% in 2011.[6]

    Fig. 1: Growth in US e-book revenue (2002-2011)
    Source: M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, p. 30, fig. 1, available at SSRN: bit.ly/WcNhJq.

    AMAZON, the multinational e-commerce company, is the most successful global market player: during 2011, 70% of US e-book consumers bought at least one of the 950,000 titles available in digital format on Amazon.com in the same year.[7]

    [5] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
    [6] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
    [7] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
  • In July 2010, AMAZON announced that, for the previous three months, “sales of books for its e-reader, the Kindle, outnumbered sales of hardcover books.”[8] Six months later, Kindle books overtook paperback books to become the most popular format on Amazon.com.[9] By May 2011, AMAZON was selling “more Kindle books than all print books – hardcover and paperback – combined”[10].

    These numbers are remarkable indeed: AMAZON has been selling printed books since 1995, whereas Kindle was introduced only in 2007,[11] and the Kindle catalog is still a small fraction of the printed one.[12] The figures do not even include free Kindle books, mostly out-of-copyright pre-1923 titles.[13, 14]

    The earnings reports of large book publishers for 2011 also suggest that “e-books are generally more profitable than print books,”[15] despite roughly flat yearly revenues.

    In April 2012, the US DEPARTMENT OF JUSTICE sued the technology giant APPLE and five major international book publishers (HACHETTE, HARPERCOLLINS, MACMILLAN, SIMON AND SCHUSTER, and PENGUIN) for violation of antitrust laws.[16]

    [8] C. CAIN MILLER, E-books top hardcovers at Amazon, “The New York Times”, 07/19/2010, nyti.ms/118yEuR.
    [9] Amazon.com now selling more Kindle books than print books, “Amazon Media Room: Press Releases”, 05/19/2011, bit.ly/119B0K2.
    [10] Ibid.
    [11] Ibid.
    [12] C. CAIN MILLER, E-books top hardcovers at Amazon, “The New York Times”, 07/19/2010, nyti.ms/118yEuR.
    [13] Ibid.
    [14] Amazon.com now selling more Kindle books than print books, “Amazon Media Room: Press Releases”, 05/19/2011, bit.ly/119B0K2.
    [15] L. HAZARD OWEN, Thanks to e-books, flat revenue is no problem for publishers, “TIME”, 03/30/2012, ti.me/UFbwiq.
    [16] US sues Apple and publishers over e-book prices, BBC, 04/11/2012, bbc.in/13fslFO.
  • In December 2011, the EUROPEAN COMMISSION had already opened formal antitrust proceedings against the same firms for anti-competitive practices.[17] According to the accusations, book publishers teamed up with APPLE to restrain retail price competition in the e-book market.

    Initially, e-books were sold under a wholesale model, where publishers set a price and each retailer decides the cover price that it wants to charge for the title. AMAZON set $9.99 as its own price ceiling for e-books, an aggressive marketing policy to attract customers to the Kindle platform.[18] Publishers reacted by shifting to agency pricing, where they fix the final customer price and pay a commission to the retailer. When APPLE launched the iBooks platform on iPad and iPhone, it accepted and advocated the adoption of such a pricing model for e-books. AMAZON, faced with the prospect of offering a smaller catalog than competitors, had to stop its $9.99 price policy and accept agency terms from publishers.[19, 20]

    At the very beginning of the 2000s, the ADOBE PDF file format had already found success among consumers, thanks to the ADOBE Acrobat Reader free viewing tool, and was used to “pioneer the commercial distribution of ebooks in Internet”[21].

    The PDF file format is page-oriented and provides very precise layout control; these features are convenient for “high quality publications that require very precise element spacing or include many image elements that have to be precisely positioned”[22], but pose technical problems as far as the needs of the e-book market are concerned.

        […] the downturn of the PDF file format is that is quite difficult to read and use on small to medium size screens, like for example smartphones or very small tablet computers. The reason is that the text elements in PDF files are of a fixed size that is relative to the page size of the document and not to the size of the screen of the reader device. The user barely has the difficult choice between viewing the whole page with very small text characters or viewing only a magnified part of the page that has to be moved around all the time.[23]

    During the 2000s, attempts by the digital publishing industry to address the file format issue led to the emergence of two different de facto standards.

    The OPEN EBOOK FORUM, an international publishing industry organization created in 2000 and later renamed INTERNATIONAL DIGITAL PUBLISHING FORUM (IDPF), proposed the Open eBook (OeB) format, later replaced by the EPUB format, “in an attempt to set a common industry standard”[24].

    Meanwhile, MOBIPOCKET, a company founded in 2000, developed the proprietary MOBI file format, very similar to the Open eBook specification. In 2005, AMAZON acquired the company.

        […] the MOBI file format was modified and transformed into the AZW file format. The AZW file format is now a proprietary format of AMAZON and there is no public specification available.[25]

    Unlike PDF, neither EPUB nor AZW allows pagination or precise page layout; however, their text content is reflowable: “it adapts itself to the size of the screen it is displayed on”[26] and to the viewing preferences of the user (e.g. bigger font size). This is a very convenient feature for simple books that do not require precise layout and image positioning (unlike comics, textbooks, and other richly illustrated books).

    The conventional wisdom about “open and standard formats” vs. “closed and proprietary formats” is that the former would be advantageous for producers, because of the complete control over the production process, and for consumers, because of the implicit guarantee against technological lock-in, whereas the latter would ensure distribution exclusivity and, well, customer lock-in.[27]

    This line of reasoning is broadly correct, but, in the specific case of EPUB vs. AZW, it must be qualified by two observations.[28]

    First, AMAZON has taken care to provide interoperability for its format: to date, producers can easily create AZW files starting from EPUB files, and consumers can read AZW files with the official reading software, readily available for free on many platforms and devices other than the Kindle e-reader itself.

    Secondly, in order to prevent copyright infringement, many publishers encrypt their EPUB files with DRM (Digital Rights Management)[29] technologies, similar to those implemented by AMAZON in its AZW format. Usually, the user cannot copy & paste or print the content of encrypted e-books, which can be read only on a limited number of authorized DRM-compatible devices.

    [17] Antitrust: Commission opens formal proceedings to investigate sales of e-books, European Commission, 12/06/2011, bit.ly/ZBWTjW.
    [18] US sues Apple and publishers over e-book prices, BBC, 04/11/2012, bbc.in/13fslFO.
    [19] Ibid.
    [20] M. RICH, B. STONE, E-book price increase may stir readers' passions, “The New York Times”, 12/06/2010, nyti.ms/XF4v2l.
    [21] B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 10, bit.ly/XKt7Cp.
    [22] Ibid., p. 29.
    [23] Ibid.
    [24] Ibid., p. 10. The EPUB file format is an open standard based on existing standard formats and algorithms: XML (eXtensible Markup Language), XHTML (eXtensible HyperText Markup Language), and ZIP (an open archiving file format).
    [25] Ibid., pp. 29-30.
    [26] Ibid., p. 20.
    [27] Ibid., p. 30.
    [28] Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
    [29] See Digital Rights Management, Wikipedia, accessed on 09/07/2013, bit.ly/WzMF04.
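As a concrete illustration of the statement that EPUB builds on ZIP, XML, and XHTML, the following Python sketch assembles a toy archive in memory and inspects it. It is a minimal sketch, not a validator: the file names under OEBPS/ and the helper names (build_toy_epub, inspect_epub) are assumptions made for the example, and the archive deliberately omits the package document that a complete EPUB would need. The mimetype entry and the META-INF/container.xml pointer follow the layout of the EPUB container format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# The container file points to the package document (path chosen for this example).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# A reflowable chapter is plain XHTML: no page size, no fixed layout.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Reflowable text adapts to the screen.</p></body>
</html>
"""


def build_toy_epub() -> bytes:
    """Return the bytes of a toy (incomplete) EPUB-like ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def inspect_epub(data: bytes):
    """Return (mimetype, package document path, entry names) of an archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        mimetype = archive.read("mimetype").decode("ascii")
        root = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return mimetype, rootfile.get("full-path"), names


if __name__ == "__main__":
    print(inspect_epub(build_toy_epub()))
```

Because the content is ordinary XHTML inside an ordinary ZIP archive, any reading system can restyle the text for its own screen, which is what makes the format reflowable.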
  • As a consequence, the EPUB and the AZW formats can be considered very close substitutes.[30]

    II.2. SUPPLY CHAIN

    Fig. 2 presents a modified version of the publishing industry supply chain as depicted by Blazejewski,[31] with the aim of providing a reference scheme and highlighting recent industry trends.

    [30] A note for the tech-savvy reader: hereafter EPUB refers to the widely adopted EPUB 2.0.1 specification. The latest EPUB 3.0 specification introduced many modifications, aimed at improving the presentation of multimedia content and the expression of complex mathematical notation. For more information, see B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, pp. 18-21, bit.ly/XKt7Cp. AMAZON responded with the development of its new KF8 (Kindle Format 8) file format. For more information, see Kindle Format 8 Overview, Amazon.com, accessed on 09/07/2013, amzn.to/Vx6NlD.
    [31] B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 24, bit.ly/XKt7Cp.
  • Fig. 2: The publishing industry supply chain
    Source: own elaboration on B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 24, fig. VIII, bit.ly/XKt7Cp.

    For the sake of explanation, we minimized the level of integration of the supply chain; in practice, we often observe a variety of higher degrees of vertical and horizontal integration (vertical integration between publishing houses and offline distributors, horizontal integration between offline and online bookstores, etc.).

    The author's manuscript, typescript, or, more probably nowadays, digital text document (.doc, .docx, .rtf, .odt, etc.) has to be delivered to the reader in a suitable format, viz. a printed book or an e-book. We can identify three stages in the production and commercialization process.[32]

    [32] Again, these stages may or may not correspond to as many specialized market operators.
  • The three stages are the following:

      • Publishing: the author's input is treated and converted into output formats of quality high enough to be printed and bound, or displayed on e-book readers.
      • Distribution: the output formats are cataloged, stocked, and distributed to retailers.
      • Retailing: the printed and/or digital versions of the book are made available for purchase to the public.

    While the second and the third stage are straightforward enough, the first stage needs a more in-depth analysis. During this phase, the author faces a choice: whether to sign a publishing contract with a publisher or to self-publish.

    The distinction is relevant from many points of view, and there is an ongoing debate about the advantages and disadvantages of each option. However, if we assume that the self-publishing author could find on the market the typical services provided by a publishing house (editing, translation, composition, etc.), for us it will suffice to neglect the technical aspects and focus on a few fundamental economic considerations.

    On the one hand, a “traditional” author receives from his publishing house royalties on books sold, which provide “an incentive to the author to help market his book”[33]. The analogy of the author with “a salesman on commission”[34] is intuitive and more persuasive than the analogy of royalties with taxes.

    On the other, a self-publishing author has to arrange the whole production and commercialization process of his book.

    [33] G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 590, bit.ly/17dxX0b.
    [34] Ibid., p. 591.
  • It is very likely that he will need to outsource part of it, which explains the existence of specialized companies (self-publishing platforms).

    The higher the fixed and circulating capital requirements, the clearer the case for the role of intermediaries in the supply chain. By contrast, thanks to the introduction of low-cost and user-friendly desktop publishing, Internet distribution, and e-book reading devices, it is now virtually possible for an author to achieve a perfect vertical integration of the digital supply chain.[35] Therefore, we may observe an even greater variety of degrees of supply chain integration in the future.

    II.3. THE ITALIAN E-BOOK MARKET

    In 2011, the Italian e-book market experienced fast growth. The number of e-book consumers reached 1.1 million people, approximately 2.3% of the Italian adult (14+) population, with a yearly growth rate of 59%.[36] Sales of e-reading devices increased by 718%, from €16 million in 2010 to €131 million in 2011.[37]

    In the same year, e-book sales still represented a tiny fraction of the publishing industry's total sales (approximately €3 million vs. €1.3 billion),[38] but the estimated 2011-2012 growth rate is 300%, with approximately €12 million worth of e-book sales.[39]

    [35] For a partial, yet worthy of mention, example of disintermediation, see O. SOLON, J.K. Rowling's Pottermore details revealed: Harry Potter e-books and more, “Wired”, 06/23/2011, bit.ly/WLzGY4.
    [36] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
    [37] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
    [38] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
    [39] Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
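The growth figures quoted above can be cross-checked with simple arithmetic. The Python sketch below recomputes the percentage changes from the rounded euro amounts given in the text; the function name growth_rate_pct and the variable names are illustrative only. Because the inputs are rounded to the nearest million, the results can only approximate the cited 718% and 300%, and the implied population is a back-of-the-envelope figure, not a number taken from the source.

```python
def growth_rate_pct(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# E-reading device sales, in millions of euros (2010 -> 2011).
device_growth = growth_rate_pct(16, 131)

# E-book sales, in millions of euros (2011 -> estimated 2012).
ebook_growth = growth_rate_pct(3, 12)

# 1.1 million e-book consumers are said to be about 2.3% of the adult (14+)
# population, which implies an adult population of this many millions.
implied_adult_population = 1.1 / 0.023

# E-book share of total publishing industry sales in 2011 (3 million vs. 1.3 billion).
ebook_share_pct = 3 / 1300 * 100.0

if __name__ == "__main__":
    print(f"Device sales growth: {device_growth:.2f}%")
    print(f"E-book sales growth: {ebook_growth:.0f}%")
    print(f"Implied adult population: {implied_adult_population:.1f} million")
    print(f"E-book share of total sales: {ebook_share_pct:.2f}%")
```

The device figure comes out slightly above 718% because of rounding in the inputs, while the e-book share of total sales is well under 1%, consistent with the “tiny fraction” described in the text.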
  • In May 2012, 32,000 titles were available in digital format, 4.4% of the 2012 book catalog, a 180% increase over the previous period.40 The e-book distribution platform Edigita, by FELTRINELLI, RCS, GEMS, and other Italian publishers, is the market leader in Italy, both in terms of value and catalog size.41 The publishing house MONDADORI, which distributes its own e-books, is comparable to Edigita in terms of value, in spite of a relatively small catalog. Stealth, the e-book distribution platform of SIMPLICISSIMUS BOOK FARM, is comparable to Edigita in terms of catalog size, with relatively low sales, because of its focus on small and medium publishers and self-publishing authors.42 Both the paper and the digital book retailing industries are quite crowded, not only by many publishing houses, distributors, and specialized retailers, but also by “outsiders”, such as consumer electronics retailers and telecommunications operators.43 However, the Italian market is probably being shaken by the entry of the main global industry players, AMAZON in particular, with the national release of Amazon.it in September 2010 and Kindle in December 2011.44 On the 16th of May 2013, ISTAT (ISTITUTO NAZIONALE DI STATISTICA) published a comprehensive report about the supply and demand of books in Italy for the years 2011 and 2012.45 The report is based on the results of two different surveys:46 40 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan. 41 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan. 42 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan. 43 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan. 44 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan. 45 La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE. 46 Nota metodologica, p. 1 in La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE.
  • • a supply-side survey about book production, administered to Italian publishers (almost 2,700 in total) in 2012;47 • a demand-side survey about everyday habits and lifestyles, administered to a sample of Italian families (19,300 in total, distributed across 853 cities and towns) in 2011 and 2012.48 The Italian publishing industry is very concentrated: in 2011, 11.3% of active publishers49 published 75.8% of the entire book catalog and printed 88.7% of the total number of copies.50 Over 15% of the 9,000 print titles published in Italy in 2011 were also made available in e-book format.51 Most of these digital titles (67.2%) are adult non-fiction books (natural sciences, linguistics, law and administration, geography and travel, information technology) and classic literary texts.52 It should be noted that fiction books usually attract a larger audience and are more price-elastic and promotion-elastic than non-fiction books. In the same year, only one e-book out of four presented extra content or additional features (hypertext links, multimedia, etc.) with respect to its print 47 Publications shorter than five pages and propaganda, advertising, and informational materials are excluded from the inquiry. 48 “Readers” are defined as persons aged 6+ who have read at least one book in their leisure time during the 12 months prior to the interview. 49 The survey also includes companies that publish and print books as an ancillary activity: in 2011, 25.2% of the responding publishers did not publish any book. For more information, see Nota metodologica, p. 1 in La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE and La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, p. 11, note 3, bit.ly/13RcbPE. 50 La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, p. 10, tab. 6, bit.ly/13RcbPE. 51 Ibid., p. 17. 52 Ibid.
  • edition,53 while 79.1% of the Italian digital publications were protected by DRM technologies54.55 In 2012, almost 5.5 million people aged 16-74 used a mobile device (cellphones, smartphones, PDAs, MP3 players, e-book readers, handheld game consoles, etc.) to connect to the Internet away from home or the workplace.56 Among them, 13.2% (over 700,000 people) read books online or downloaded e-books, in line with the European average (13%).57,58 53 Ibid. 54 See section II.1. 55 La produzione e la lettura di libri in Italia: anni 2011 e 2012 , Istat, 05/16/2013, p. 17, bit.ly/13RcbPE. 56 Ibid., p. 15. 57 La produzione e la lettura di libri in Italia: anni 2011 e 2012 , Istat, 05/16/2013, p. 15, bit.ly/13RcbPE. 58 The ISTAT estimate is lower than the SIMPLICISSIMUS BOOK FARM private estimate reported above in this section. 23
  • III. REVIEW OF THE LITERATURE Due to the youth of the digital publishing industry, especially in Italy, our research subject is relatively unexplored. Nevertheless, we have drawn valuable contributions from the voluminous literature on related subjects, such as the commercial impact of digital products, the characteristics of Internet distribution, and the peculiarities of the publishing industry. III.1. DIGITAL “CANNIBALIZATION” OF PHYSICAL SALES Some publishers and authors are skeptical about the profitability of e-books and, thus, the sustainability of the industry: they fear that the digital versions of their books could “cannibalize” higher-priced print sales, without enough market growth to offset declining prices.59 The implicit assumption behind this line of reasoning is that paper books and e-books are homogeneous enough to be considered very close substitutes, which implies large positive cross-price elasticities. However, other industry observers point out that e-book consumers represent a relatively distinct market segment for a relatively differentiated product.60 Once consumers invest in a device that lets them carry an entire library and search, scale, and highlight text, they shop directly in the on-device e-book stores. 59 M. RICH, Steal this book (for $9.99), “The New York Times”, 05/16/2009, nyti.ms/11MVb0z. Interestingly, the article reports that publishers expressed similar concerns at the time of the introduction of the paperback format, which, eventually, expanded the demand for books, even though it cannibalized hardcover sales. 60 E. SCHNITTMAN, Ebooks don't cannibalize print, people do, “Black Plastic Glasses”, 09/27/2010, bit.ly/11Nk78l.
  • In contrast, print format selection (hardcover vs. paperback, usually) follows book title selection, which explains the publication delay of lowerpriced paperback editions with respect to higher-priced hardcover editions. 61 Hu and Smith (2011) use data from a natural experiment to test the significance of cross-channel effects between e-books and print books. 62 In April and May 2010, a publisher stopped distributing Kindle titles to AMAZON, but returned to release Kindle e-books and print (hardcover) books simultaneously in June 2010. The titles published in April and May 2010 are similar to those published in March and June 2010 along some observable dimensions. 63 The latter group serves as “control” (no publication delay of the e-book version with respect to the print edition) for the former “experimental” group (publication delay of the e-book version with respect to the print edition variable between one and eight weeks).64 The most robust finding of the research is a significant decrease in overall digital sales caused by delaying the publication of the e-book edition relative to the print edition. However, there is also some evidence of cross-channel substitution for “popular” books (defined in the paper as top 20% books ranked by sales) 65.66 In the case of popular books, content selection might precede channel selection, whereas for “niche” books (defined in the paper as bottom 80% 61 S. LOWE, Do ebooks cannibalize print sales?, “Publishing Bits”, 07/28/2009, bit.ly/YymFj6. 62 Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural experiment, 08/29/2011, p. 9, available at SSRN: bit.ly/Y2G650. 63 Ibid., p. 12. 64 Ibid., p. 10. 65 Ibid., p. 19. 66 Ibid., p. 25. 25
  • books ranked by sales)67 consumers might be more likely to search for alternatives in their preferred channel.68 Unfortunately, in order to look for further evidence on digital cannibalization of physical sales, we have to turn our attention to other publishing products that have already undergone a substantial process of digitization. In an early study of the effects of the addition of Internet channels by newspaper companies, Deleersnyder et al. (2002) collect data for 85 online newspapers launched in UK and Netherlands between 1991 and 2001. 69 In the newspaper industry context, cannibalization is represented by a reduction in circulation and/or advertising revenues. 70 Based on individual and collective evidence, the researchers dismiss “the often-cited cannibalization fears”71 as “largely overstated”72. However, they also report a significantly higher probability of circulation revenues cannibalization in case of high overlap between the online and offline version of the newspapers, measured by surveying the respective webmasters.73 Another noteworthy secondary finding concerns the possible non-neutrality of the digitization process across product categories: some support emerges for the hypothesis that economic newspapers might benefit more than average from an online channel addition.74,75 67 Ibid., p. 19. 68 Ibid., p. 8. 69 B. DELEERSNYDER, I. GEYSKENS, K. GIELENS, M. G. DEKIMPE, How cannibalistic is the Internet channel? A study of the newspaper industry in the United Kingdom and the Netherlands, “International Journal Research in Marketing”, 19, 4, 2002, p. 337, bit.ly/1bP5umm. 70 Ibid., p. 342. 71 Ibid., p. 337. 72 Ibid., p. 346. 73 Ibid. 74 Ibid., p. 343 and p. 344, note 10. 75 Recently, similar evidence emerged for the Italian market: in the first seven months of 2013, Il Sole 24 Ore, the most widespread national daily business newspaper, outperformed the other national newspapers in terms of digital subscriptions. For more information, see G. 
FUSINA, Gli abbonamenti ai quotidiani digitali, dataninja.it, 09/18/2013, bit.ly/1fgfbwG. 26
  • In 2002, associations of publishers and authors complained to AMAZON CEO Jeff Bezos about the negative impact of the promotion of used books by the famous e-tailer on the sales of new titles. 76 Ghose et al. (2006) use data collected between 2002 and 2004 from Amazon.com new- and used-book marketplace to empirically test this theoretically possible proposition.77 Information and communication technology transformed the very inefficient brick-and-mortar used-book market into a relatively efficient online market, which shares with the e-book market potentially lower price tags than the new-book print market. […] while brick-and-mortar bookstores have high search costs, limited inventory capacity, limited geographical coverage, and relatively high prices, IT-enabled markets for used books offer low search costs, nearly unlimited (virtual) inventory ca pacity, global coverage, and—through competition among sellers—relatively low prices. […] Internet sales of used books made up an estimated 67% of all used-book sales in 2004 (Wyatt 2005). This represents the highest Internet penetration for any physical product category that we are aware of […] 78 The cross-price elasticity of new-book sales with respect to used-book prices has the expected positive sign, but is rather low.79 According to the theoretical model,80 […] only 16% of AMAZON used-book sales directly cannibalize new-book purchases; the remaining 84% of sales represent purchases that otherwise would not have occurred at new-book prices.81 76 D. K. KIRKPATRICK, Online sales of used books draw protest, “The New York Times”, 04/10/2002, nyti.ms/Y7mbUm. 77 A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1, 2006, pp. 4-5, bit.ly/1983mWp. 78 Ibid., p. 4. 79 Ibid., pp. 13-14. 80 Ibid., pp. 6-9. 81 Ibid., p. 17. 27
  • Used books may be poor substitutes for new books, mainly because of eventual quality degradation and possible reseller unreliability. The indirect method proposed by the researchers to quantify the substitution effect between used and new books treats the two products as homogeneous.82 If, instead, they were relatively differentiated products, such an estimate, based as it is on cross-price elasticity of demand, would be misleading. This consideration may not be deemed relevant for the used books market, but might be fundamental for e-books and, more in general, digital prod ucts. The often-cited fears expressed by publishing industry participants might rest not so much on sales cannibalization from cheaper versions of the same products, as on market “annihilation” from entirely new products. In UK, the OFCOM (OFFICE OF COMMUNICATIONS) and the IPO (INTELLECTUAL PROPERTY OFFICE) commissioned KANTAR MEDIA to conduct an extensive and rigorous83 survey, to measure online copyright infringement levels during the third quarter of 2012, consumer spend on recorded and digital media, and willingness to pay for six different content types: • music, • films, • TV programs, • computer software, • books, 82 Ibid., p. 8, eq. 8. 83 For the report, data reconciliations, questionnaire, and data tables, see Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS. 28
  • • and video games.84 Among these product categories, books lag furthest behind in terms of digitization: the total estimate of digital and physical books consumed is 176 million, of which 39% are e-books consumed via downloading or accessing online (59% for free, of which 21% illegally).85 These numbers are low compared to those of the other content types. • The total estimate of digital and physical music tracks consumed is 1,403 million, of which 81% are digital tracks consumed via downloading or streaming (72% for free, of which 37% illegally).86 • The total estimate of digital and physical films consumed is 148 million, of which 56% are digital films consumed via downloading or streaming (61% for free, of which 57% illegally).87 • The total estimate of digital and physical TV programs consumed is 272 million, of which 92% are digital TV programs consumed via downloading or streaming (80% for free, of which 21% illegally).88 • The total estimate of digital and physical software products consumed is 69 million, of which 80% are computer software products consumed via downloading or accessing online (85% for free, of which 55% illegally).89 • The total estimate of digital and physical video games consumed is 68 million, of which 55% are digital video games consumed via downloading or accessing online (63% for free, of which 29% illegally).90 84 Report by Kantar Media, pp. 5-6 in Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS. 85 Ibid., pp. 66-67. 86 Ibid., pp. 27-28. 87 Ibid., p. 38. 88 Ibid., pp. 48-49. 89 Ibid., p. 58. 90 Ibid., p. 77.
  • III.2.“SUPERSTARS” VS. “UNDERDOGS” It has always been common knowledge among publishing industry workers that “a small percentage of titles accounts for a large share of sales of copyrighted materials.”91 According to Lindy Hess, director of the Columbia Publishing Course, The truth about this business is that, with rare exceptions, nobody makes a great deal of money. 92 In its 1643 petition to the parliament, the British publishing guild, the STATIONERS' COMPANY, argued that “scarce one book in three sells well, or proves gainfull to the publisher.”93 Similar evidence was brought about by publishers and authors before the 1876-8 ROYAL COMMISSION ON COPYRIGHT. Four books out of five which are published do not pay their expenses […] The most experienced person can do no more than guess whether a book by an unknown author will succeed or fail.94 […] only one book in four is a very moderate calculation of the books which are successful, or the books which pay their expenses. 95 […] not one book in nine has paid its expenses […] still they [two publishers] have been able to carry on the trade.96 In 1986 and 1987, according to Liebowitz' estimations,97 91 S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2, 2005, p. 454, bit.ly/GHOhQB. 92 M. RICH, Math of publishing meets the e-book, “The New York Times”, 02/28/2010, nyti.ms/YUlmO4. 93 M. RICH, Math of publishing meets the e-book, “The New York Times”, 02/28/2010, nyti.ms/YUlmO4. 94 Ibid., p. 183. 95 Ibid., p. 185. 96 Ibid. 97 S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2, 2005, pp. 454-455, bit.ly/GHOhQB. 30
  • best-sellers [defined in the paper as the top 124 best-sellers] were likely to have generated nearly $1 billion in sales out of a total of $1.7 billion. 98 These estimates do not even include “sales of best-sellers from previous years that were still selling in relatively large numbers” 99. Liebowitz addresses also the question of market longevity, by constructing a small sample of 236 titles from a 1920s edition of the Book Review Digest, which reviewed approximately 25% of the new titles. 100 These were the “titles attracting the most attention, written by the more important authors and published by the better-known houses” 101. After 58 years, 54% of the 1920s best-selling titles were still in print, vs. only 33% of the 1920s non-best-selling titles.102 Brynjolfsson et al. (2006) report that a typical brick-and-mortar store in the early 2000s stocked only 40,000-100,000 unique titles, out of more than three million books in print.103 In the same period, Amazon.com and other Internet retailers were selling almost the entire catalog of books in print.104 The researchers estimate that 30-40% of Amazon.com sales were in books not normally available in brick-and-mortar stores. 105 Digital markets could have not only increased product variety, but also deepened consumer preferences, a phenomenon that web marketing experts have dubbed “long tail”.106 98 Ibid., p. 455. 99 Ibid. 100 Ibid. 101 Ibid. 102 Ibid., tab. 1. 103 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, From niches to riches: the anatomy of the long tail, “Heinz Research Papers”, 51, 06/01/2006, p. 3, bit.ly/GHk7y3. 104 Ibid. 105 Ibid. 106 Ibid., p. 4. 31
  • Increased product availability implies a shift of the sales distribution towards the tail, i.e. more obscure titles, while the dispersion of consumer preferences modifies the shape of the distribution: a long tail emerges, eventually at the expense of the head, i.e. top-selling hits.107 Fig. 3: Graphical illustration of the “long tail” hypothesis Source: A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015, 09/05/2006, p. 40, fig. 1, bit.ly/19uLOzX. The distinction between first-order (original) and second-order (derivative) drivers provides a simple framework to describe the supply-side (producers/retailers) and demand-side (consumers) causes of the long tail phenomenon, and to understand its dynamics.108 107 A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015, 09/05/2006, p. 5, bit.ly/19uLOzX. 108 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, From niches to riches: the anatomy of the long tail, “Heinz Research Papers”, 51, 06/01/2006, p. 12, exhibit 4, bit.ly/GHk7y3. 32
  • From the supply-side point of view, information and communication technology loosens physical constraints (virtual shelf-space, aggregation of consumers from different geographical locations, etc.) and reduces production costs (e.g. make-to-order production, such as print-on-demand), distribution costs (e.g. electronic delivery of digital products), and marketing costs (websites, social networks, etc.)109 From the demand-side point of view, information and communication technology reduces search costs for consumers thanks to active search tools (search engines, sampling tools, etc.), passive search tools (recommender systems, product classifications, etc.), and user-generated content (customer reviews, online communities, etc.)110 Thus, “niche” products may become viable options for producers, retailers, and consumers. Moreover, these first-order (original) drivers of the long tail phenomenon could set up a potentially cumulative and self-perpetuating process triggered by second-order (derivative) drivers: • the increased profitability of niche products for producers and retailers (supply-side incentive), • and the further deepening of consumer tastes towards niche products (demand-side positive feedback).111 Whether or not the long tail hypothesis translates into a relevant busi- ness phenomenon is purely an empirical question, which, since the early 2000s, has proved to be an interesting line of research for both the academic and the business literature.112 109 Ibid., pp. 4-5. 110 Ibid., pp. 5-6. 111 Ibid., pp. 7-8. 112 In the following, we discuss only the statistical and econometric literature. For a broader overview, see Long tail, Wikipedia, accessed on 09/07/2013, bit.ly/ZX5aLb. 33
  • Initially, the reason that prompted researchers to study the concentration of book sales was eminently practical. Internet retailers guard their own sales data closely, but usually report sales rankings; hence the need for researchers to map observable sales ranks to the corresponding sales quantities. Given rank data, Chevalier and Goolsbee (2003) hypothesize that the probability distribution of book sales is Pareto, a distributional assumption already exploited by authors and publishers.113 So, a log-linear model can be used for demand estimation:114 $$\log(\text{Sales}) = \beta_1 + \beta_2 \cdot \log(\text{Rank}) + \epsilon \qquad (1)$$ β2 is the shape parameter relating sales quantities to sales ranks, while β1 is a scale parameter. They estimate β2 as -0.855 by means of the sales quantities and the sales ranks obtained before and shortly after a simple experiment.115 […] they first obtained information from a publisher on a book with relatively constant weekly sales, then purchased six copies of the book in a 10-minute period, and tracked the Amazon.com rank […]116 Brynjolfsson et al. (2003) gather weekly sales data for 321 titles from one publisher during the summer of 2001 and the corresponding weekly sales rank data from Amazon.com.117 113 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, pp. 208-209, bit.ly/1b24xGp. 114 Ibid. 115 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”, 49, 11, 2003, pp. 1587-1588, bit.ly/18I1YJp. 116 Ibid. 117 Ibid., p. 1587.
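As an illustration of how model (1) can be estimated, the following minimal sketch fits the log-linear relationship by ordinary least squares. The function names, the rank grid, and the "true" parameters used to generate the synthetic data are assumptions made for this example; they are not taken from Chevalier and Goolsbee (2003) or Brynjolfsson et al. (2003).

```python
import math


def fit_log_linear(ranks, sales):
    """OLS fit of log(sales) = b1 + b2 * log(rank); returns (b1, b2)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sales]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx            # slope: shape parameter of the rank-sales curve
    b1 = y_bar - b2 * x_bar   # intercept: scale parameter
    return b1, b2


def predict_sales(rank, b1, b2):
    """Map an observed sales rank to the sales level implied by the fit."""
    return math.exp(b1 + b2 * math.log(rank))


# Synthetic, noise-free data generated from hypothetical parameters.
true_b1, true_b2 = 10.0, -0.87
ranks = [10, 100, 1_000, 10_000, 100_000]
sales = [math.exp(true_b1 + true_b2 * math.log(r)) for r in ranks]

b1, b2 = fit_log_linear(ranks, sales)
print(round(b1, 3), round(b2, 3))
```

Because the model is fitted in logarithms, titles with zero sales cannot enter the regression as they are; this limitation matters for the later extensions of the approach.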
  • Even though extremely simple, the model in (1) fits the data fairly well (R-squared: 0.8008) and is consistent enough with industry statistics.118 Since their estimate of β2 as -0.871 is based on 861 data points, other researchers have preferred to stick to it rather than execute experiments with few data points.119 The most conservative estimate by Brynjolfsson et al. (2003) of the proportion of total Amazon.com sales generated by “niche” titles is 29.3%, computed as the proportion of total Amazon.com sales lying above rank 250,000, approximately the number of titles available at the largest BARNES & NOBLE superstore in New York City at the time (out of 2,300,000 books in print).120 Ghose and Gu (2006) analyze daily panel data for 3,210 books, gathered from Amazon.com and Barnes&Noble.com between September 2005 and April 2006.121 They show that, even in online markets, search costs for “obscure” books (defined in the paper as books with sales rank higher than 20,000 or 40,000, alternatively)122 are higher than for “popular” books (defined in the paper as books with sales rank lower than 20,000 or 40,000, alternatively) 123, which may limit the scope of the long tail phenomenon.124 118 Ibid. 119 For example, see A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1, 2006, p. 11, bit.ly/1983mWp and A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, p. 11, bit.ly/1aeZtxj. 120 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”, 49, 11, 2003, pp. 1588-1589, bit.ly/18I1YJp. See above and below in this section for less conservative estimates from the same authors. 121 A. GHOSE, B. 
GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, pp. 9-10, bit.ly/1aeZtxj. 122 Ibid., p. 20. 123 Ibid. 124 Ibid., p. 7. 35
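The share of sales attributed to “niche” titles can be derived from model (1) alone, because the scale parameter cancels out of the ratio. The sketch below assumes that expected sales are exactly proportional to Rank raised to the power β2, that ranks run from 1 to the 2,300,000 books in print, and that the share is computed as a discrete sum; the procedure actually followed by Brynjolfsson et al. (2003) may differ in these details.

```python
def tail_share(b2, cutoff_rank, max_rank):
    """Share of total sales from ranks above cutoff_rank when sales ~ rank**b2."""
    total = 0.0
    tail = 0.0
    for rank in range(1, max_rank + 1):
        weight = rank ** b2   # relative sales of the title at this rank
        total += weight
        if rank > cutoff_rank:
            tail += weight
    return tail / total


# Slope, cutoff, and catalog size as reported in the text above.
share = tail_share(b2=-0.871, cutoff_rank=250_000, max_rank=2_300_000)
print(f"{share:.1%}")
```

The printed share can be compared with the 29.3% reported above. Re-running the function with a more negative slope shows how sensitive the result is to the assumed stability of β2.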
  • The stability across ranks and retailers, and over time, of β2 , which measures “customers' relative tastes for popular and obscure books” 125, is a very strong assumption that has been criticized in subsequent works; a number of alternative techniques have been proposed in the literature. Brynjolfsson et al. (2010) suggest that the relationship between sales and sales rank may not be purely log-linear.126 In order to improve the estimate of Amazon.com long tail sales, they use different slope coefficients127 to fit the sales-rank relationship on a sample of 1,598 Amazon.com titles, monitored over a ten-week period from June to August 2008.128 The estimation results of a negative binomial regression model with four splines (knot points at the 25 th, 50th, and 75th percentiles of sales rank)129 show that “the coefficients on all four splines are negative and highly significant.”130 In addition, the slope coefficients gradually become more negative as the book sales rank increases. […] book sales decrease at an increasingly faster pace, as we move from popular books to niche books […]131 The advocated advantage over the OLS linear regression in (1) is that the model takes into account also the frequent observations with zero sales.132 125 A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1, 2006, p. 11, bit.ly/1983mWp. 126 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, The longer tail: the changing shape of Amazon sales distribution curve, 09/20/2010, p. 2, available at SSRN: bit.ly/15QfXyK. 127 Ibid. 128 Ibid., p. 3. 129 Ibid., p. 7. 130 Ibid. 131 Ibid. 132 Ibid., p. 6. 36
  • According to the “old” methodology, the proportion of total Amazon.com sales generated in 2008 by “niche” titles (defined in the paper as books with sales rank higher than 100,000)133 would have been 82.57%, a clear overestimation with respect to 36.7%, the estimate obtained with the “new” methodology.134 The treatment of products with zero sales is critical also for the measurement of the significance of the long tail phenomenon through concentration statistics. Meaningful concentration comparisons, across channels, retailers, and over time, require similar product availability. […] the effect of product availability on the concentration of product sales may be nonmonotonic. A moderate increase in production selection may lead to a less concentrated distribution of product sales, but if the market is flooded by a large number of products that have minimal sales, product sales can actually appear to be more concentrated even if the sales don't change for any of the previously existing products.135 133 Ibid., p. 8. 134 Ibid. 135 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales, “Management Science”, 57, 8, 2011, p. 1374, bit.ly/196Bu39. 37
  • Fig. 4: Concentration can be a misleading measure of the “long tail” Case 1A: 100 products are available and the top 50% of products account for 75% of total sales. Case 2: Add a “tail” of 100 niche products with small sales, while leaving the sales of existing products unchanged. Now 200 products are available, and the top 50% of products account for 95% of total sales. Case 1B: Sales of the top 100 products are exactly the same as in Case 1A. The only change from Case 1A is we now consider 100 niche products that have zero sales. In this case, 200 products are available, and the top 50% of products account for 100% of total sales. Source: E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales, “Management Science”, 57, 8, 2011, p. 1375, fig. 1, bit.ly/196Bu39. Brynjolfsson et al. (2011) use the Lorenz curve and the Gini coefficient to study the concentration of product sales in the catalog channel and the Internet channel of a clothing retailer.136 136 Ibid., pp. 1378-1379. 38
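The three cases of Fig. 4 can be reproduced with a toy catalog. The sketch below is only an illustration: the sales figures (50 titles selling 3 units, 50 selling 1 unit, and 100 added titles selling 0.1 or 0 units) are assumptions chosen to match the percentages in the figure, and the helper names are hypothetical.

```python
def top_half_share(sales):
    """Share of total sales accounted for by the best-selling 50% of products."""
    ordered = sorted(sales, reverse=True)
    top = ordered[: len(ordered) // 2]
    return sum(top) / sum(ordered)


def gini(sales):
    """Gini coefficient of the sales distribution (0 = perfectly even)."""
    ordered = sorted(sales)
    n = len(ordered)
    total = sum(ordered)
    weighted = sum(i * x for i, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


case_1a = [3.0] * 50 + [1.0] * 50    # 100 products, top half sells 75%
case_2 = case_1a + [0.1] * 100       # add 100 niche titles with small sales
case_1b = case_1a + [0.0] * 100      # add 100 titles with zero sales

for name, case in [("1A", case_1a), ("2", case_2), ("1B", case_1b)]:
    print(name, round(top_half_share(case), 3), round(gini(case), 3))
```

Sales of the original 100 products are identical in all three cases, yet both statistics report higher concentration once the additional titles are counted. This is why comparisons across channels or over time require similar product availability.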
  • They analyze an identical selection of products, at an identical set of prices, with the same order-fulfillment facilities, available and “visible” within an identical time window.137 As expected, the Internet channel exhibits a less concentrated distribution of product sales than the catalog channel.138 Elberse and Oberholzer-Gee (2006) use three different techniques and Nielsen VideoScan data to “study the distribution of revenues across products in the context of the US home video industry for the 2000 to 2005 pe riod”139. First, they generate various descriptive statistics for the distribution of sales across titles from year to year and compute the Kolmogorov-Smirnov statistic for pairs of years to test for shifts in the distribution. 140 Location, scale, skewness, kurtosis, and inter-quartile measures are “consistent with a scenario in which the distribution becomes more dispersed, more asymmetrical, and develops a sharper peak and a longer tail over time”141. The Kolmogorov-Smirnov tests reveal that the distributions of weekly sales across titles are significantly different across the years. 142 Then, they “estimate a quantile regression model to examine the factors that underlie the shift in the distribution of sales” 143. 137 Ibid., p. 1374. 138 Ibid., p. 1379. 139 A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015, 09/05/2006, p. 2, bit.ly/19uLOzX. 140 Ibid., pp. 8-9. 141 Ibid., p. 12. 142 Ibid. 143 Ibid., p. 8. 39
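The two-sample Kolmogorov-Smirnov statistic used to compare sales distributions across years is the largest vertical gap between the two empirical distribution functions. The sketch below computes only the statistic, not its p-value, and the weekly sales figures are made up for the example; it is not the code used by Elberse and Oberholzer-Gee (2006).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every data point of either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical weekly unit sales per title in two different years.
sales_year_1 = [120, 80, 60, 40, 30, 20, 10, 5]
sales_year_2 = [90, 50, 30, 15, 8, 4, 2, 1]

print(ks_statistic(sales_year_1, sales_year_2))
```

In practice a library routine such as `scipy.stats.ks_2samp` would also return the significance level needed to decide whether the shift between years is statistically meaningful.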
  • In the case at hand, quantile linear regression is more appropriate than OLS linear regression: the research is not concerned with average effects, but with “how the entire distribution changes with certain covariates”144. Examining multiple quantiles allows for a richer inference than “segmenting the response variable into subsets according to its unconditional distribution and then doing least squares fitting on these subsets”145, a procedure that clearly suffers from sample selection bias. Quantile linear regression results show that the distribution of sales has shifted down in general, but this shift is largest for the better-selling titles.146 [...] the tail of the distribution has seen a much smaller decrease, implying a shift in the mass towards niche products.147 Finally, they estimate a negative binomial regression model to analyze the number of titles that meet certain weekly sales threshold levels (“zero sales, sales below the 70th quantile, sales between the 70th and the 80th quantile, sales between the 80th and the 90th quantile, and sales above the 90th quantile”148).149 The whole picture that emerges from this comprehensive study is quite complex: “superstar” and “long tail” effects are not necessarily antithetical, and, indeed, seem to coexist.
    Are there important superstar and long-tail effects in U.S. home video sales? The answers turn out to be of the “yes, but…” variety. Yes, there is a long-tail effect in that the number of titles that sell only a few copies every week increases during our study period. But at the same time, the number of non-selling titles also increases substantially; it is now four times as high as in 2000. Many underdogs turn out to be losers. We also find evidence of a superstar effect. Among the best-performing titles, it is an ever-smaller number of films that accounts for the bulk of sales. The caveat here is that today's superstars lack the punch of earlier years. Video sales generally decrease over time across all quantiles of the sales distribution, but this effect is most pronounced among best-selling titles.150
    144 Ibid., p. 9. 145 K. F. HALLOCK, R. KOENKER, Quantile regression, “Journal of Economic Perspectives”, 15, 4, 2001, p. 147, bit.ly/15gERVI. 146 A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015, 09/05/2006, p. 13, bit.ly/19uLOzX. 147 Ibid. 148 Ibid., p. 17. 149 Ibid. 150 Ibid., p. 18.
  • In order to estimate the potential producer and consumer welfare arising from the availability in e-book format of the world's out-of-print titles, Smith et al. (2012) match two random samples of titles: one composed of titles already available on the Kindle marketplace, the other of titles not yet available on the Kindle marketplace.151 The project required the mapping of sales ranks of Kindle out-of-print titles to the corresponding sales levels. Initially, they fit (1) to a dataset provided by a major publisher, covering weekly sales and weekly sales ranks for 713 e-book titles over a ten-week period. Since the research object is the “extreme tail”152 of out-of-print books, they also try various polynomial rank terms to produce stronger fits in the tail of the distribution, and obtain better results with a third-degree polynomial function:
    $$\log(\mathit{Sales}) = \beta_1 + \beta_2 \cdot \log(\mathit{Rank}) + \beta_3 \cdot \log(\mathit{Rank})^2 + \beta_4 \cdot \log(\mathit{Rank})^3 + \epsilon. \qquad (2)$$ 153
    151 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, pp. 2-3, available at SSRN: bit.ly/WcNhJq. 152 Ibid., p. 10. 153 Ibid.
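A specification like (2) can be estimated by ordinary least squares on the logarithm of sales. The following is only a minimal sketch in Python, not the procedure of Smith et al. (2012): the coefficients, ranks and sales are made up for illustration, and the helper names (`design_row`, `solve`, `fit_rank_to_sales`, `predict_sales`) are hypothetical.

```python
import math

def design_row(rank):
    """Regressors of equation (2): 1, log(Rank), log(Rank)^2, log(Rank)^3."""
    x = math.log(rank)
    return [1.0, x, x ** 2, x ** 3]

def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * beta[k] for k in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def fit_rank_to_sales(ranks, sales):
    """OLS estimate of beta_1..beta_4 in equation (2) via the normal equations."""
    rows = [design_row(r) for r in ranks]
    y = [math.log(s) for s in sales]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict_sales(beta, rank):
    """Predicted sales level for a given sales rank."""
    return math.exp(sum(b * x for b, x in zip(beta, design_row(rank))))

# Hypothetical, noise-free data generated from made-up coefficients.
true_beta = [9.0, -0.5, -0.03, 0.001]
ranks = [10, 50, 200, 1000, 5000, 20000, 100000, 200000]
sales = [predict_sales(true_beta, r) for r in ranks]
beta_hat = fit_rank_to_sales(ranks, sales)
```

With real data the fit would not be exact, and a library routine for least squares would normally replace the hand-written solver; the sketch only shows how the polynomial rank terms enter the regression.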
  • Fig. 5: Calibration between sales and rank
    Source: M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, p. 33, fig. 5, available at SSRN: bit.ly/WcNhJq.
    Still unsatisfied with the fit produced by (2) for observations with ranks above 200,000, the researchers complement the estimation method with an experiment similar to that of Chevalier and Goolsbee (2003).154 They purchased between one and three copies of 30 randomly selected Kindle titles with ranks between 200,000 and 1,000,000, and tracked their sales before and after this experiment.155
    • For titles with low ranks (below 200,000), they predict sales with (2).156
    154 See above in this section. 155 Ibid. 156 Ibid., p. 11.
  • • For titles with high ranks (above 200,000), they “assign sales according to the expected sales that belong to the interval the title rank falls into based on the experiment described above”157.
    • If a title does not have a rank, they assume that it has no sales.158
    III.3. SURVEY AND DESCRIPTIVE EVIDENCE
    In 2002, the OPEN EBOOK FORUM159 sponsored a consumer survey on e-books, administered during the New York City is Book Country event.160 263 volunteers self-completed the survey, so the results are limited by sample self-selection and incomplete questionnaires.161 Most participants (61%)162 reported that they were willing to buy e-books at the same price as paperback books.163 However, according to survey data from the UK OFCOM Online copyright infringement tracker benchmark study Q3 2012,164 e-book consumers expect lower prices than for print books.165 The likelihood to pay for a single book download decreases steadily with price: among those who have ever downloaded or accessed e-books, 78% are willing to pay at £2, falling to 7% at £10.166 The mean price respondents are willing to pay is £3.49;167 similar results arise for the willingness to pay for a subscription service.168
    157 Ibid. 158 Ibid. 159 See section II.1. 160 Consumer survey on ebooks, Open eBook Forum, 2003, p. 4, available at im+m: bit.ly/15Olump. 161 Ibid., p. 8. 162 Ibid., p. 14. 163 Ibid., p. 19. 164 See section III.1. 165 Report by Kantar Media, p. 69 in Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS. 166 Ibid. 167 Ibid. 168 Ibid., p. 70.
  • Fig. 6: Likelihood to pay for downloading a single e-book
    Question: Assume you saw a new fiction e-book on an online service that you wanted to own. It would be high quality, and you knew it was a reputable and reliable service. How likely would you be to download it if it was the following prices? Base: All 12+ in the UK that have ever downloaded/accessed e-books (652). £3.49 is the average price people are willing to pay for a single e-book download.
    Source: OCI Benchmark study slide pack 5 - Books, p. 31, slide B23 in Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
    Compared to the users of other content types covered in the report, e-book downloaders are more skewed towards females (54%) and have an older age profile (58% are 35+).169 AMAZON Kindle is the most used service for e-books (80% of users, consistent among demographics and sub-groups)170, which might explain, in part, why the estimate of illegal behavior for books is the lowest across content types (11% of users)171. Descriptive evidence on the e-book market was presented in 2012 by Mark Coker, founder of SMASHWORDS, a large distributor of self-published e-books.172
    169 Ibid., p. 61. 170 Ibid., p. 66. 171 Ibid., p. 65. 172 M. COKER, How data-driven decisions *might* help indie ebook authors reach more readers, “RT Booklovers Convention”, 04/25/2012, slidesha.re/Xhccuo.
  • Since 2008, the company has published more than 100,000 indie (independent) e-books distributed to multiple major retailers, and remunerates authors with significantly higher royalties (60% of list price) than traditional publishing houses.173 Aggregate sales data for a nine-month period from the SMASHWORDS distribution network174 reveal strong overall growth, driven by titles achieving viral word of mouth.175 In fact, the sales distribution is characterized, unsurprisingly,176 by few titles selling extremely well, thousands of moderate sellers, and a vast majority of poorly selling titles.177 Sales of individual titles at the APPLE iBookstore rise and fall over time, either randomly or based on author promotions, new releases, and promotions by retailers.178 As for price elasticity, e-books priced between $2.00 and $2.99 sell 6.2 times more units than those priced above $10.00,179 which implies an approximate price elasticity of -2. However, e-books priced between $0.99 and $1.99 seem to underperform in terms of profitability with respect to those priced between $2.99 and $5.99.180
    III.4. ECONOMETRIC EVIDENCE
    We can identify historical and political reasons prompting econometric investigations of the publishing market, and of the relationship between quantity sold and prices in particular.
    173 Ibid., pp. 8-9. 174 Ibid., p. 18. 175 Ibid., p. 24. 176 See section III.2. 177 Ibid., p. 61. 178 Ibid., p. 30. 179 Ibid., p. 51. 180 Ibid., p. 58.
  • Historically, economic arguments have had, and still have, a central role in public debates and political decisions about copyright laws. As far back as 1643, STATIONERS181 argued that “books are luxuries, the demand for which is elastic.”182
    Books (except the sacred Bible) are not of such general use and necessity, as some staple commodities are, which feed and clothe us, nor are they so perishable, or require change in keeping, some of them being once bought, remain to children's children, and many of them are rarities only and useful only to a very few, and of no necessity to any, few men bestow more in Books than what they can spare out of their superfluities […] And therefore property in Books maintained among stationers cannot have the same effect, in order to the public, as it has in other commodities of more public use and necessity.183
    Before the 1876-8 ROYAL COMMISSION ON COPYRIGHT184, the English philosopher Herbert Spencer testified against the introduction of the “compulsory licence”185 system, where any person would be free to print any book after paying a fixed percentage of the selling price to the author.186 According to him, such a system would be “especially injurious to the particular class [of books] which of all others needs encouragement”187, described by the chairman of the commission as “the graver class [of books] which do not appeal to the popular tastes”188. Contrary to the STATIONERS' thesis, the demand for certain books, e.g. philosophical works, seems, according to their own authors, inelastic to a fall in price, which suggests significant economic differences across subject categories.189
    181 See section III.2. 182 A. PLANT, The economic aspects of copyright in books, “Economica”, 1, 2, 1934, p. 177, bit.ly/1giWKb4. 183 Ibid. 184 See section III.2. 185 A. PLANT, The economic aspects of copyright in books, “Economica”, 1, 2, 1934, p. 183, bit.ly/1giWKb4. 186 Ibid., p. 188. 187 Ibid. 188 Ibid., pp. 188-189. 189 See section II.3.
  • For the titles composing Liebowitz's Book Review Digest dataset,190 large differences across subject categories emerge also in market longevity.191
    Tab. 1: Percentage of titles surviving more than 58 years
    Category         All titles   Best-sellers removed
    Academic            68%            68%
    Philosophical       52%            41%
    History             51%            43%
    Biography           49%            42%
    Religion            46%            40%
    Poetry              43%            40%
    Fiction             36%            40%
    Mystery             23%            16%
    Comedy              25%             0%
    Autobiography       19%            11%
    Art                 17%            17%
    Travel               6%             6%
    Sports               0%             0%
    Source: S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2, 2005, p. 456, tab. 2, bit.ly/GHOhQB.
    Bittlingmayer (1992) provides empirical estimates of the elasticity of the demand for books,192 as part of a more general economic analysis of the institutional framework of the publishing industry.
    190 See section III.2. 191 S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2, 2005, p. 456, bit.ly/GHOhQB. 192 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 602, tab. 3 and p. 603, tab. 4, bit.ly/17dxX0b.
  • Data about prices, sales quantities, retail margins, royalties, and inventories for over 1,000 titles for the period 1984-1986 were provided by a West German publishing house, specialized in the production of academic and intellectual books.193 In markets characterized by monopolistic competition, under the assumption of profit maximization, there is an inverse relationship between the percentage margin (the difference between price, p, and marginal cost, c, as a percentage of p) and demand elasticity ($\eta_p$, in absolute value):
    $$\frac{p - c}{p} = \frac{1}{\eta_p}. \qquad (3)$$ 194
    The marginal costs associated with the sale of an extra copy of a book (taxes, royalties, and the retailer's margin) amount to only 40 to 60 percent of the retail price, which in turn implies a price elasticity in the range 1.7 to 2.5.195 Resale price maintenance restricts retailing by allowing publishers to directly control both the retail price (p) and the wholesale price (the difference between the retail price, p, and the retail margin, v).196 In the period considered, resale price maintenance was legal in West Germany, and still is today in the Federal Republic of Germany, observed and enforced by the GERMAN BOOK TRADERS ASSOCIATION.197
    193 Ibid., p. 589 and p. 597. 194 Ibid., p. 588. 195 Ibid., pp. 588-589. 196 See also section II.1 for information about the agency model. 197 German resale price maintenance act, Börsenverein des Deutschen Buchhandels, 07/14/2006, bit.ly/17bdJF6.
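The elasticity range implied by (3) can be checked with a few lines of arithmetic. The sketch below, in Python, simply inverts the margin condition; the function names are hypothetical, and the only inputs are the 40% and 60% marginal cost shares reported in the text.

```python
def margin_from_cost_share(cost_share):
    """Percentage margin (p - c) / p when marginal cost c is a given share of price p."""
    return 1.0 - cost_share

def elasticity_from_margin(margin):
    """Invert condition (3): (p - c) / p = 1 / eta, so eta = 1 / margin (absolute value)."""
    if not 0.0 < margin <= 1.0:
        raise ValueError("margin must lie in (0, 1]")
    return 1.0 / margin

# Marginal cost at 40% and at 60% of the retail price, as reported in the text.
eta_low = elasticity_from_margin(margin_from_cost_share(0.40))   # about 1.67
eta_high = elasticity_from_margin(margin_from_cost_share(0.60))  # about 2.5
print(round(eta_low, 2), round(eta_high, 2))
```

The lower the marginal cost share, the higher the margin and the lower the implied elasticity, which is why the 40% cost share corresponds to the lower bound of the range.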
  • So, under the assumption of positive yet marginally decreasing sales growth in retail selling effort,198 there are two first-order conditions for profit maximization:
    $$\frac{p - v - c}{p} = \frac{1}{\eta_p} \qquad (4)$$ 199
    and
    $$\frac{p - v - c}{v} = \frac{1}{\eta_v}, \qquad (5)$$ 200
    where $\eta_v$ is the elasticity of demand with respect to the retail margin. (4) and (5) imply
    $$-\eta_p + \eta_v = -1 - \frac{c}{p - v - c}. \qquad (6)$$ 201
    Ceteris paribus, the larger the price elasticity ($\eta_p$, in absolute value), the larger the retail margin elasticity ($\eta_v$).
    For a cross section of books that are being optimally marketed and priced, those for which sales are most responsive to extra promotional efforts will also be those for which sales are most responsive to price changes. Novels have a comparatively large elasticity of demand because the quantity purchased can be influenced relatively easily with higher promotional effort. The demand for research monographs in entomology is price inelastic because the quantity sold is not sensitive to promotional efforts.202
    Again, retail margin and, thus, price elasticity vary by type of book, and also by type of bookstore.
    198 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 593, bit.ly/17dxX0b. 199 Ibid., eq. 5. 200 Ibid., p. 594, eq. 6. 201 Ibid., p. 594, eq. 7. 202 Ibid., p. 594.
  • The margin varies by type of book. For example, the margin on academic books is typically 25-30 percent, on literature 40 percent. School books, which are sold through bookstores, have margins of 20 percent. Margins also vary by the type of bookstore. A publisher will often grant bookstores that specialize in a particular topic a deeper discount on the corresponding titles. 203 In monopolistically competitive equilibrium, where free entry drives profits to zero, “books for which cost-covering prices allow large sales will have demand curves that are more elastic than books with more limited audiences.”204 203 Ibid., p. 591. 204 Ibid., p. 595. 50
  • Fig. 7: Zero-profit locus of price and output combinations
    K is the fixed cost of book production, m is the marginal cost of book production and distribution, and q is the quantity produced.
    Source: G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 596, fig. 1, bit.ly/17dxX0b.
    Analogously, “books that involve large marginal production or marketing expenses must have large sales.”205 It can be shown that:
    $$-\eta_p^* + \eta_v^* = -1, \qquad (7)$$ 206
    where $\eta_p^*$ is the equilibrium price elasticity (in absolute value) and $\eta_v^*$ is the equilibrium retail margin elasticity.
    205 Ibid. 206 Ibid.
  • For the econometric estimation of price elasticity and retail margin elasticity, the original dataset is divided into two cross-sectional datasets, composed of the “year-to-year percentage changes in prices and quantities”207 for the periods 1984-1985 and 1985-1986, respectively. When dealing with data from natural experiments, traditional “naïve”208 log-linear regression models cannot distinguish between shifts of the demand curve and movements along the demand curve.
    Suppose that the demand for a title shifts to the right, say, because this year is the centenary of the author's birth or the book has been made into a film. With unchanged margin costs […], the optimal price would increase. If this anticipation of changed demand conditions is common, […] the estimated [price elasticity] […] could very well be positive.209
    The change in the inverse of (4) is an estimate of changes in price elasticity ($\Delta \eta^e_{pt}$) and can be employed as a control for shifts of the demand curve. The result $d\eta_{pt} = d\eta_{vt}$ from (7) yields the final econometric specification:
    $$\Delta \ln(q_t) = a + b_1 \cdot \Delta \ln(p_t) + b_2 \cdot \Delta \ln(v_t) + b_3 \cdot \Delta \eta^e_{pt} \cdot (\ln(p_t) - \ln(v_t)) + \sum_{i=1}^{T} (b_i D_i) + \epsilon, \qquad (8)$$ 210
    where the dummies $D_i$, $i = 1, \ldots, T$, reflect the title vintage. The coefficients $b_1$ and $b_2$ are the estimators of $\eta_p$ and $\eta_v$; theoretically, the coefficient $b_3$ should be equal to -1. Unfortunately, the model in (8) has weak explanatory power.
    Most of the year-to-year variation in sales of books is attributable to influences not captured by price, margin or vintage.211
    207 Ibid., p. 598. 208 Ibid. 209 Ibid. 210 Ibid., p. 599, eq. 14. 211 Ibid., p. 604.
  • The empirical analysis suggests a price elasticity between -2 and -3 for this dataset of academic and intellectual books,212 a figure slightly larger than what is implied by (4) (about -1.7 to -2.5)213. The estimate of retail margin elasticity is about 1.5, positive and “roughly consistent with the theory”214. The coefficient $b_3$ deviates from the predicted value of -1, and the estimated elasticities may be biased by measurement problems, especially for “poorly selling” titles (defined in the paper as the titles with sales lower than “the median number of books sold per title”215). Measurement problems include errors in the price and retail margin variables (e.g. “divergence of actual average sales price from the nominal price”216), and the failure to allocate “marginal printing costs”217 and “title-specific promotional expenses of the publisher”218. Brynjolfsson et al. (2003) show that, under resale price maintenance219, the price elasticity of the aggregate demand for a given title in the retailing market equals the price elasticity of demand for the title faced by its publisher.220 Data from the ASSOCIATION OF AMERICAN PUBLISHERS and discussions of the researchers with various publishers indicate a gross margin of 56%-64% for “the typical obscure title”221.222
    212 Ibid., p. 605. 213 See above in this section. 214 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 605, bit.ly/17dxX0b. 215 Ibid., p. 601. 216 Ibid., pp. 603-604. 217 Ibid., p. 605. 218 Ibid. 219 See above in this section. 220 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”, 49, 11, 2003, p. 1585, eq. 10, bit.ly/18I1YJp. 221 Ibid., p. 1586. 222 Ibid.
  • So, according to (4), for such titles, the price elasticity of the demand faced by the publisher is between -1.56 and -1.79,223 which represents an estimate of the price elasticity of the aggregate demand in the retailing market, obtained “by taking advantage of the characteristics of the book industry structure and available industry statistics on gross margins”224. The online book-selling business has been thriving for almost twenty years to date and anticipated the e-book market in many respects, such as the increased product availability.225 Studies about the electronic commerce of books could very well be relevant for the electronic book industry. At the beginning of the 2000s, online book sales made up about 10% of total book sales in the US;226 the combined market share of Amazon.com and BarnesandNoble.com, the two dominant online bookstores, was higher than 85% in terms of sales, with the former selling almost four times as much as the latter.227 Chevalier and Goolsbee (2003) use sales rank data228 to estimate both the own-price elasticity faced by the two merchants and their cross-price elasticity with respect to each other. Data about a sample of 20,000 books, constructed through stratified random sampling from three different sources (“to get books representative of different parts of the sales distribution”229), were “scraped”230 from the
    223 Ibid. 224 Ibid., p. 1585. 225 See section III.2 for a discussion of the long tail phenomenon. 226 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, p. 205, bit.ly/1b24xGp. 227 Ibid. 228 See section III.2. 229 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, p. 206, bit.ly/1b24xGp. 230 See Web scraping, Wikipedia, accessed on 09/07/2013, bit.ly/1e1Oo8h.
  • websites of Amazon.com and BarnesandNoble.com during April, June and August 2001. The period was characterized by major pricing experiments by the two e-tailers, across broad categories of titles.231 BarnesandNoble.com data are censored for sales rankings greater than about 630,000, whereas Amazon.com provides complete sales rank data.232 Chevalier and Goolsbee (2003) use “the trimmed least absolute deviations (LAD) panel estimator of Honore (1992)”233 for the censored dataset and OLS for the complete one.234 BarnesandNoble.com own-price elasticity is around -3.5 vs. only about -0.45 for Amazon.com.235 The cross-price effect seems to be appreciably positive only for BarnesandNoble.com,236 but a robustness check of the trimmed LAD estimator, performed by dropping the observations missing sales rank and employing OLS in place of the trimmed LAD,237 shows a much lower degree of shifting from Amazon.com to BarnesandNoble.com.238 Interestingly, Amazon.com, the incumbent, prices in the inelastic portion of the demand curve, in contrast with the theory of static imperfectly competitive markets.239
    231 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, p. 206, bit.ly/1b24xGp. 232 Ibid. 233 Ibid., p. 215. 234 Ibid. 235 Ibid., p. 217. 236 Ibid., p. 218. 237 Ibid., p. 219. 238 Ibid. 239 Ibid., pp. 217-218.
  • However, “a firm maximizing dynamic profits might choose a price below […] [the] static profit-maximizing level,”240 which is of some interest for the fast-growing e-book market.
    Prices below the single-period profit-maximizing level would be attractive in a growing market with consumer switching costs, for example.241
    Ghose and Gu (2006) use data gathered from Amazon.com and BarnesandNoble.com to study the significance of search costs in online markets,242 which could create kinks in the demand curve. “The demand elasticity for price increases is different from the demand elasticity for price decreases”243 according to the magnitude of search costs. When search costs are high, if a retailer increases prices, its own customers will notice it and have an incentive to look for better bargains at competing retailers. If the retailer decreases prices, instead, potential new customers will not be aware of it. As a consequence, demand elasticity (in absolute value) for price increases is higher than for price decreases. Vice versa, when search costs are low, if a retailer increases prices, it will affect only its current customers. If the retailer decreases prices, instead, it will attract potential new customers. As a consequence, demand elasticity (in absolute value) for price increases is lower than for price decreases.
    240 Ibid., p. 218. 241 Ibid. 242 See section III.2. 243 A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, p. 7, bit.ly/1aeZtxj.
  • Log-linear models using sales rank data, scraped from the websites of online retailers, are common in the stream of literature under review.244 In order to capture the difference in demand elasticity between price increases and price decreases in such regression models, Ghose and Gu (2006) propose a decomposition of the price explanatory variable:
    $$\beta_1 \log(P_{it}) + \sum_{j=1,2,3,4} \beta_{2j} \left(\log(P_{it}) - \log(\bar{P}_{it})\right) \times \mathit{PriceDecrease}_{it} \times \mathit{Week}_{ijt}, \qquad (9)$$ 245
    where:
    • $P_{it}$ is the retailer price of product i at time t,
    • $\bar{P}_{it}$ is “the price before the most recent price change”246,
    • $\mathit{PriceDecrease}_{it}$ is a dummy that “takes the value of 1 if the most recent action on product i is a price decrease”247,
    • and $\mathit{Week}_{ijt}$ are four ($j = 1, 2, 3, 4$) weekly dummies that represent the number of weeks after the most recent price decrease and quantify the “time for information on price decreases to spread in the market”248.
    Therefore, “$\beta_1$ represents demand elasticity for price increases,”249 while “$\beta_2$ denotes the difference between demand elasticity for price decreases and that for price increases.”250
    244 See section III.2, and also above and below in this section. 245 A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, p. 14, eq. 4, bit.ly/1aeZtxj. 246 Ibid. 247 Ibid., p. 13. 248 Ibid., p. 14. 249 Ibid. 250 Ibid.
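To make the decomposition in (9) concrete, the regressors can be built from a weekly price series of a single product. The following Python sketch is an illustration only, not the code of Ghose and Gu (2006): the function name `decomposition_terms` and the price series are hypothetical, and it assumes that the week in which the price changes counts as week j = 1.

```python
import math

def decomposition_terms(prices, max_weeks=4):
    """Build the regressors of expression (9) for one product's weekly prices.

    Returns one dict per week with the log price, the PriceDecrease dummy and
    the interaction terms (log P - log P_before) * PriceDecrease * Week_j
    for j = 1..max_weeks.
    Assumption: the week of the price change itself is week j = 1.
    """
    rows = []
    prev_price = None    # price before the most recent price change
    change_week = None   # index of the week of the most recent change
    for t, price in enumerate(prices):
        if t > 0 and price != prices[t - 1]:
            prev_price = prices[t - 1]
            change_week = t
        decrease = int(prev_price is not None and price < prev_price)
        gap = math.log(price) - math.log(prev_price) if prev_price is not None else 0.0
        weeks_since = t - change_week + 1 if change_week is not None else None
        interactions = [
            gap * decrease * int(weeks_since == j) for j in range(1, max_weeks + 1)
        ]
        rows.append({
            "log_price": math.log(price),
            "decrease": decrease,
            "interactions": interactions,
        })
    return rows

# Hypothetical series: 10.00 for two weeks, a cut to 8.00, then a rise to 9.00.
rows = decomposition_terms([10.0, 10.0, 8.0, 8.0, 8.0, 9.0])
for t, row in enumerate(rows):
    print(t, row["decrease"], [round(x, 3) for x in row["interactions"]])
```

In the weeks following the cut, exactly one interaction term is non-zero, moving from the first to the second and third weekly dummy; after the subsequent price rise all interaction terms are zero, so only the coefficient on the log price applies.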
  • The relative price elasticities of the two online book retailers with respect to each other are similar: between -1.49 and -1.89 for Amazon.com, and between -1.53 and -1.60 for BarnesandNoble.com.251 However, Amazon.com's relative price elasticity is higher for price decreases than for price increases and gradually increases over time after a price decrease, whereas BarnesandNoble.com's relative price elasticity is lower for price decreases than for price increases, with little information diffusion over time.252 There are a number of possible explanations why the two retailers face different search costs: specific consumer search preferences, targeting of different consumer segments, different implementations of active and passive search tools253, etc. Hu and Smith (2011) analyze sales data, directly provided by a publisher, of both print books and e-books.254 The price elasticity estimates obtained by running an OLS linear regression on the whole dataset are around -3 for both print books and e-books.255 Quantile linear regression256, used in place of OLS on the same dataset, unveils significant differences between best-selling titles (defined in the paper as the top 20% of books ranked by sales)257 and non-best-selling titles (defined in the paper as the bottom 80% of books ranked by sales)258.
    251 Ibid., p. 17. 252 Ibid., p. 25. 253 Ibid., pp. 26-27. 254 See section III.1. 255 Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural experiment, 08/29/2011, p. 9, available at SSRN: bit.ly/Y2G650. 256 See section III.2. 257 Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural experiment, 08/29/2011, p. 19, available at SSRN: bit.ly/Y2G650. 258 Ibid.
  • Price elasticity for print books in the 20th and the 80th percentile is about -1.6 and -1.3, respectively, vs. -3 and -1.9 for e-books in the 20th and 80th percentile.259 Smith et al. (2012) study the potential market for the digital version of out-of-print titles260 through sales rank data, scraped from Amazon.com and the Kindle marketplace.261 First, a probit regression model is applied to random samples of out-of-print titles already available on the Kindle marketplace and of out-of-print titles not yet available on the Kindle marketplace, to calculate the probability of an out-of-print title being digitized.262 The explanatory variables include:
    • the price of the print version,
    • the year of publication of the first print version,
    • the number of pages of the print version,
    • the subject category,
    • the type of audience,
    • the number of Bing search results for the ISBN263,
    • the sales rank of the print version on Amazon.com,
    • the binding format of the print version,
    • and a dummy set to one for “large publishers”.264
    259 Ibid., p. 24, tab. 10. 260 See section III.2. 261 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, pp. 6-7, available at SSRN: bit.ly/WcNhJq. 262 Ibid., pp. 8-9. 263 See International Standard Book Number, Wikipedia, accessed on 09/07/2013, bit.ly/16kdpGT. 264 Ibid., pp. 7-8.
  • Then, with the propensity scores thus obtained, the two samples are matched through the nearest neighbor and the stratification methods.265 In order to obtain more credible confidence intervals for propensity scores, the probit regression model is re-estimated with the Bayesian approach outlined in Chen and Kaplan (2011)266.267 As before, the resulting propensity scores are used to match the two samples, based, again, on the nearest neighbor and the stratification methods.268 Finally, sales and price of the out-of-print titles not yet available on the Kindle marketplace are inferred from sales and price of the matched out-of-print titles already available on the Kindle marketplace.269
    […] bringing the world's 2.7 million out-of-print titles back into print as eBooks could create $740 million in revenue in the first year after publication […]270
    Cost assumptions based on current Kindle sales contracts271 suggest that as much as $460 million would accrue to publishers and authors.272
    265 Ibid., pp. 14-16. 266 C. J. S. CHEN, D. KAPLAN, Bayesian propensity score analysis: simulation and case study, Society for Research on Educational Effectiveness, 2011, bit.ly/16N541w. 267 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, pp. 16-17, available at SSRN: bit.ly/WcNhJq. 268 Ibid., pp. 17-18. 269 Ibid., p. 5. 270 Ibid., p. 25. 271 Ibid., pp. 21-22. 272 Ibid., p. 25.
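The matching step can be illustrated with a simple nearest-neighbor rule on propensity scores. The Python sketch below is only an illustration under stated assumptions: the titles, scores, sales and prices are made up, the function names are hypothetical, and matching is done with replacement, a detail not specified in the text above.

```python
def nearest_neighbor_match(not_digitized, digitized):
    """Match each title not yet digitized to the digitized title whose
    propensity score is closest (matching with replacement: an assumption)."""
    matches = {}
    for title, score in not_digitized.items():
        matches[title] = min(
            digitized, key=lambda d: abs(digitized[d]["score"] - score)
        )
    return matches

def infer_sales_and_price(not_digitized, digitized):
    """Assign to each unmatched title the sales and price of its matched title."""
    matches = nearest_neighbor_match(not_digitized, digitized)
    return {
        title: (digitized[match]["sales"], digitized[match]["price"])
        for title, match in matches.items()
    }

# Hypothetical out-of-print titles already on the e-book marketplace.
digitized = {
    "K1": {"score": 0.20, "sales": 5, "price": 4.99},
    "K2": {"score": 0.55, "sales": 40, "price": 7.99},
    "K3": {"score": 0.90, "sales": 300, "price": 9.99},
}
# Hypothetical propensity scores of titles not yet digitized.
not_digitized = {"N1": 0.25, "N2": 0.60, "N3": 0.85}

matches = nearest_neighbor_match(not_digitized, digitized)
inferred = infer_sales_and_price(not_digitized, digitized)
print(matches)
print(inferred)
```

The stratification method mentioned in the text would instead group titles into propensity score intervals and compare the two samples within each interval; it is not shown here.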
  • IV. E-BOOK PRICING BY MAJOR ITALIAN PUBLISHERS
    Price information and, more generally, public catalog data for digital books published in Italy can be obtained, with relatively little effort, by web scraping mainstream Italian e-book stores (in our case, Ultima Books). Provided that a certain e-book has a corresponding print edition, if the respective publisher has entered the ISBN of the print edition as part of the e-book metadata, it is possible to query print book catalog databases (in our case, Informazioni Editoriali) for information. In the regression of the e-book price on the price of the print edition and other explanatory variables, we restrict our attention to titles of publishing houses distributed by Edigita, the leading e-book distributor in Italy.273 Edigita distributes most of the major Italian publishers274 and provides well-formed and well-compiled metadata. Incidentally, the purposive sample reduction with respect to the whole market was also technically convenient, since it allowed for effective and timely querying of data sources.
    IV.1. CATALOG DATASET
    Cross-sectional catalog data for 14,794 e-books distributed by Edigita, and available for sale on the Ultima Books e-book store on the 28th of November 2012, were collected within a five-day time window (11/28/2012-12/02/2012). We arranged information across a few variables of interest:
    • ebook.price, the retail price of the e-book (in euro, VAT included);
    273 See section II.3. 274 See section II.3 for information about market concentration in the Italian publishing industry.
  • • drm, a dummy set to one if the e-book is encrypted with DRM technologies275, to zero otherwise;
    • watermark, a dummy set to one if the e-book embeds information about the purchase276, to zero otherwise;
    • paper.price, the retail price of the print edition (in euro, VAT included);
    • ebook.pub.delay, the distance in years between the publication of the e-book and of the print edition (either “positive” for print-first titles or “negative” for digital-first titles);277
    • subj.fiction, a dummy set to one if the title subject is fiction, to zero if it is non-fiction.278
    Unfortunately, only 6,053 observations out of a total of 14,794 have complete information about the print edition. Since experiments in digital-only publishing279 by major publishers have been very limited, if any,280 missingness should be attributed to incomplete reporting during metadata compilation. Paragraph IV.1.2 describes the treatment method adopted to deal with the issue of missing values in X.
    IV.1.1. Descriptive statistics
    • Almost every e-book in the sample (96.88%) embeds DRM technologies.
    275 See section II.1. 276 Digital watermarking is a “social” deterrent against piracy and represents a loose alternative to DRM technologies. 277 All the e-books in the sample were published between 2010 and 2012, so, in practice, the publication delay reflects the print edition vintage. 278 See section II.3. 279 For example, see Cos'è quintadicopertina, quintadicopertina, accessed on 09/07/2013, bit.ly/1eeukj9. 280 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
  • • Only a few e-books in the sample (2.76%) employ digital watermarking.
    • Even fewer e-books in the sample (0.36%) employ neither DRM encryption nor digital watermarking.
    • Non-fiction e-books are slightly more represented in the sample than fiction e-books (56.25% vs. 43.75%).
    Fig. 8: Pie chart of e-book protection mechanisms
    Red: DRM encryption (Adobe Content Server 4), 96.88%; Yellow: digital watermarking, 2.76%; Green: open file, 0.36%.
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 9: Pie chart of e-book subjects
    Blue: non-fiction books, 56.25%; Light blue: fiction books, 43.75%.
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Tab. 2 reports summary statistics for the quantitative variables in the dataset.
  • Tab. 2: Summary statistics for the quantitative catalog variables
    Statistic     ebook.price  paper.price  ebook.pub.delay
    Mean               10.893       15.984            2.045
    Median              8.990       14.000            0.000
    Minimum             0.000        3.900           -2.000
    Maximum           109.990      150.000           36.000
    Std. Dev.           7.682        9.899            3.610
    C.V.                0.705        0.619            1.765
    Skewness            3.470        3.314            2.939
    Ex. kurtosis       23.688       21.547           12.924
    5% Perc.            3.990        7.000            0.000
    95% Perc.          24.990       35.000           10.000
    IQ range            5.010        9.000            3.000
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    • All three variables are right-skewed, as is evident from Fig. 10, Fig. 11, and Fig. 12.
    • The median price, more robust to high-priced outliers than the mean price, is €8.99 for e-books and €14.00 for print books.
    • Very few titles were published first in e-book format and only later in print.
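The statistics reported in Tab. 2 can be illustrated with a minimal Python sketch. The sample below is made up for illustration, the function name `summarize` is hypothetical, and the moment-based definitions of skewness and excess kurtosis are an assumption: the software used for Tab. 2 may apply slightly different small-sample corrections.

```python
import statistics


def summarize(values):
    """Return the statistics of Tab. 2 for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Central moments, used for moment-based skewness and excess kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # n=20 yields the 5%, 10%, ..., 95% cut points; n=4 yields the quartiles.
    ventiles = statistics.quantiles(values, n=20, method="inclusive")
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "ex_kurtosis": m4 / m2 ** 2 - 3.0,
        "p05": ventiles[0],
        "p95": ventiles[-1],
        "iqr": quartiles[2] - quartiles[0],
    }


# Hypothetical right-skewed sample of e-book prices (euro).
sample = [0.99, 2.99, 4.99, 6.99, 8.99, 8.99, 9.99, 12.99, 16.99, 49.99]
stats = summarize(sample)
```

As a check on the definitions, the coefficient of variation of ebook.price in Tab. 2 is the ratio of the standard deviation to the mean, 7.682 / 10.893 ≈ 0.705.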
  • Fig. 10: Histogram of ebook.price
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 11: Histogram of paper.price
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 12: Histogram of ebook.pub.delay
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    A quick survey of the correlation matrix in Tab. 3 anticipates both the problems and the results of the subsequent regression analysis.
  • Tab. 3: Correlation matrix of the catalog variables
                     ebook.price     drm  watermark  paper.price  ebook.pub.delay  subj.fiction
    ebook.price            1.000
    drm                    0.059   1.000
    watermark             -0.037  -0.938      1.000
    paper.price            0.927   0.019     -0.006        1.000
    ebook.pub.delay       -0.125  -0.054     -0.021       -0.131            1.000
    subj.fiction          -0.359   0.061     -0.083       -0.356            0.097         1.000
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    The very high (negative) correlation between the drm and the watermark dummies foreshadows quasi-collinearity among the two variables and the constant term; the consequent adjustments to the model are described in section IV.2.
    The very high and positive correlation between the e-book price and the print edition price is clearly visible in the scatter plot in Fig. 13.
    The e-book price seems to decrease with the publication delay, but the effect probably depends on the paper price because, in our sample, the publication delay closely reflects the vintage of the print edition.[281]
    [281] See note 277.
  • Fig. 13: Scatter plot of the quantitative catalog variables
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    IV.1.2. Missing data
    The treatment method applied to missing values is listwise deletion, one of the simplest and most conventional approaches.
        […] conventional statistical methods and software presume that all variables in a specified model are measured for all cases. The default method for virtually all statistical software is simply to delete cases with any missing data on the variables of interest, a method known as listwise deletion or complete case analysis.[282]
    The obvious drawback of listwise deletion is that it often discards a large fraction of information, with the resulting loss of statistical power.
    [282] P. D. ALLISON, Missing data, in A. MAYDEU-OLIVARES, R. E. MILLSAP, The SAGE handbook of quantitative methods in psychology, SAGE Publications, 2009, p. 72, amzn.to/15UAtJe.
  • Even though listwise deletion is not competitive with well-devised and well-executed advanced treatment methods, such as maximum likelihood and multiple imputation, it is “honest”[283] compared to other conventional methods, thanks to its fairly good bias-minimizing properties and standard error estimates.[284] In particular, when the missing values are restricted to X,
        so long as missingness on the predictors does not depend on the dependent variable, listwise deletion will yield approximately unbiased estimates of regression coefficients […] for virtually any kind of regression.[285]
    In our sample, 8,741 observations out of a total of 14,794 (59.08%) have one or more missing values in the explanatory variables. In order to test whether missingness depends on the value of the response variable (ebook.price), we estimate, through logistic regression, the probability of missing values in X conditional on ebook.price. We define the missing.data dummy, set to one for the 8,741 observations with missing values in one or more explanatory variables and to zero for the remaining 6,053 complete observations, and regress it on a constant term and ebook.price.
    Tab. 4: Table of coefficients (LOGIT)
    Coefficient    Estimate  Std. error  z value  p-value
    Intercept        0.3676      0.0167   21.979  < 2E-16 ***
    ebook.price  -1.968E-09   4.999E-09  -0.3940  0.6940
    McFadden R-squared: 7.6970E-06
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    [283] Ibid., p. 76.
    [284] Ibid., p. 75.
    [285] Ibid.
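The counts quoted above allow a quick plausibility check of the logit intercept. The following Python sketch is not the estimation itself: it only relies on the assumption that, when the slope is practically zero, the fitted log odds of missingness are roughly constant and close to the log of the sample odds.

```python
import math

n_total = 14794    # e-books in the catalog sample
n_missing = 8741   # observations with missing values in X
n_complete = n_total - n_missing

share_missing = n_missing / n_total
# Log of the sample odds of missingness (missing vs. complete observations).
log_odds = math.log(n_missing / n_complete)

print(f"complete cases: {n_complete}")
print(f"share missing:  {share_missing:.4f}")
print(f"log odds:       {log_odds:.4f}")
```

The resulting log odds are within 0.001 of the 0.3676 intercept in Tab. 4, which is what one would expect from a model whose only regressor has a negligible coefficient.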
  • The ebook.price coefficient can be interpreted as the change in the log odds of missingness associated with a €1 increase in the e-book price. Its value is close to zero and not significantly different from zero, so, according to the model, the hypothesis that the response variable does not affect the probability of missingness in X cannot be rejected. We can assert, with a certain degree of confidence, that any bias in the estimates of the subsequent analysis should not be ascribed to listwise deletion of observations with missing values in the explanatory variables.
    IV.2. OLS LINEAR REGRESSION RESULTS
    We now model, through OLS linear regression, the relationship between the e-book price (ebook.price) and a set of explanatory variables derived from the information collected in the catalog dataset described in section IV.1. The explanatory variables include:
    • the price of the print edition (paper.price);
    • the file.is.open dummy, computed as 1 − drm, equal to one if the e-book is not encrypted with DRM technologies (that is, if the file either contains a digital watermark or has no protection at all),[286] to zero otherwise;
    • the subj.fiction dummy, set to one for fiction books, to zero for non-fiction books;
    • a set of functional forms of the publication delay in years between the e-book version and the print edition (ebook.pub.delay.pos, ebook.pub.delay.pos.sq, ebook.pub.delay.neg, and ebook.pub.delay.neg.sq).
    [286] See paragraph IV.1.1.
  • \[
    \begin{aligned}
    \text{ebook.price} ={} & \beta_0 + \beta_1\,\text{paper.price} + \beta_2\,\text{file.is.open} + \beta_3\,\text{subj.fiction} \\
    & + \beta_4\,\text{ebook.pub.delay.pos} + \beta_5\,\text{ebook.pub.delay.pos.sq} \\
    & + \beta_6\,\text{ebook.pub.delay.neg} + \beta_7\,\text{ebook.pub.delay.neg.sq}
    \end{aligned}
    \tag{10}
    \]
    From the descriptive analysis of the dataset,[287] we expect paper.price to explain most of the ebook.price variability.
    The model estimates the intercept for DRM-encrypted e-books, the vast majority of titles in the sample,[288] and a constant “no-DRM effect” through the file.is.open dummy.
    The subj.fiction dummy captures a constant effect of the subject on the e-book price, net of the effects of the other variables; in particular, the coefficient estimate is net of the effect of paper.price, which is probably very large and significant. The subj.fiction coefficient could be tentatively interpreted as an indicator of differential price elasticity at subject level between the e-book market and the print market. Consistently with the long tail hypothesis,[289] in the purely digital e-book market we should expect some differential extra-elasticity for niche subjects (non-fiction), which implies a positive sign for the subj.fiction coefficient.
    As for the time delay between the publication of the e-book and of the print edition, we differentiate between print-first titles (“positive” delay) and digital-first titles (“negative” delay, extremely rare in the sample):[290]
    • ebook.pub.delay.pos is the distance in years between the publication of the e-book and of the print edition, if the title is a print-first book, zero otherwise;
    [287] See paragraph IV.1.1.
    [288] See paragraph IV.1.1.
    [289] See section III.2.
    [290] See paragraph IV.1.1.
  • • ebook.pub.delay.neg is the distance in years between the publication of the print edition and of the e-book, if the title is a digital-first book, zero otherwise.
    Because of “backlog demand” piling up during the period of non-availability in e-book format, not-so-old books never published in e-book format before could fetch, ceteris paribus, higher prices than books published simultaneously in e-book format and print edition. The effect is likely to diminish and turn negative as the time distance, measured in years of publication delay, grows. For example, out-of-copyright titles, with decades-old print editions still on sale, have to compete with free e-book editions, available on dedicated websites and major e-book stores. Other possible reasons for a reversal of the effect include relatively more recency-oriented tastes of the digital public and price stickiness in print book catalogs, due to menu costs, brand image considerations, and management of the book value of physical remainders.
    In order to capture such non-linear effects, we add the squares of both time delay variables to the model:
    • ebook.pub.delay.pos.sq is the square of the distance in years between the publication of the e-book and of the print edition, if the title is a print-first book, zero otherwise;
    • ebook.pub.delay.neg.sq is the square of the distance in years between the publication of the print edition and of the e-book, if the title is a digital-first book, zero otherwise.
    Accordingly, the coefficients of the “positive” variables should be positive for ebook.pub.delay.pos and negative for ebook.pub.delay.pos.sq.
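The derived regressors defined above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the “negative” delay enters the model as a non-negative distance, which is how the definitions above read; it is an illustration, not the code used for the analysis.

```python
def delay_regressors(ebook_pub_delay):
    """Split a signed publication delay (in years) into the four regressors."""
    pos = max(ebook_pub_delay, 0)    # non-zero only for print-first titles
    neg = max(-ebook_pub_delay, 0)   # non-zero only for digital-first titles
    return {
        "ebook.pub.delay.pos": pos,
        "ebook.pub.delay.pos.sq": pos ** 2,
        "ebook.pub.delay.neg": neg,
        "ebook.pub.delay.neg.sq": neg ** 2,
    }


def file_is_open(drm):
    """One if the e-book carries no DRM encryption (computed as 1 - drm)."""
    return 1 - drm


print_first = delay_regressors(3)     # print edition three years older
digital_first = delay_regressors(-2)  # e-book two years older
```

For a title published simultaneously in both formats (a delay of zero), all four regressors are zero, so the title falls into the baseline captured by the intercept.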
  • The ebook.pub.delay.neg and ebook.pub.delay.neg.sq coefficients, instead, should not be significantly different from zero.
    First, we report estimates of coefficients, descriptive statistics, and graphs for the OLS linear regression model, all preliminary to the setup of the inferential framework.
    Tab. 5: Table of coefficients – preliminary (OLS)
    Coefficient             Estimate
    Intercept                -0.2020
    paper.price               0.7090
    file.is.open             -1.8993
    subj.fiction             -0.5417
    ebook.pub.delay.pos       0.0571
    ebook.pub.delay.pos.sq   -0.0037
    ebook.pub.delay.neg       1.9596
    ebook.pub.delay.neg.sq   -0.7477
    SSR: 49280.82
    R-squared: 0.8620; Adjusted R-squared: 0.8618
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    The normal quantile-quantile plot in Fig. 14 shows that the distribution of residuals is symmetric, but with fatter tails than the normal distribution.
  • Fig. 14: Residuals normal Q-Q plot (OLS)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Asymptotic inference based on the OLS estimator does not require normality of the error terms, but the conventional standard errors do require homoskedasticity. The plot of residuals vs. fitted values in Fig. 15 shows greater variation in the residuals for higher fitted values and, thus, casts serious doubts on the assumption.
  • Fig. 15: Plot of residuals vs. fitted values (OLS)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    The White test, a general test for heteroskedasticity that follows a chi-squared distribution,[291] rejects the null hypothesis of homoskedasticity.
    Tab. 6: White test (OLS)
    Test statistic  Degrees of freedom  p-value
    1435.2380       23                  < 2.2E-16 ***
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    [291] M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, p. 92, amzn.to/17bknLN.
  • The standard errors and tests in Tab. 7 rely on a heteroskedasticity-consistent covariance matrix (HC1 variant).
    Tab. 7: Table of coefficients – final (OLS)
    Coefficient             Estimate  Std. error   t value    p-value
    Intercept                -0.2020      0.2643   -0.7644     0.4447
    paper.price               0.7090      0.0149   47.5134     0.0000 ***
    file.is.open             -1.8993      0.1764  -10.7638  8.906E-27 ***
    subj.fiction             -0.5417      0.0954   -5.6780  1.426E-08 ***
    ebook.pub.delay.pos       0.0571      0.0189    3.0189     0.0025 ***
    ebook.pub.delay.pos.sq   -0.0037      0.0011   -3.3564     0.0008 ***
    ebook.pub.delay.neg       1.9596      1.2353    1.5863     0.1127
    ebook.pub.delay.neg.sq   -0.7477      0.6473   -1.1551     0.2480
    SSR: 49280.82
    R-squared: 0.8620; Adjusted R-squared: 0.8618
    AIC: 29888.7; BIC: 29949.07
    F(7, 6053): 996.42, p-value: < 2.2E-16 ***
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Overall, the model provides a good fit of the data, with a small constant term, not significantly different from zero.
    As suggested by the descriptive analysis, the paper.price coefficient is very high (0.7090) and significant, with a very small standard error (0.0149). By neglecting the intercept and setting all the explanatory variables to zero, except for the paper price, we could say that DRM-encrypted non-fiction e-books, published simultaneously in e-book format and print edition, are priced on average at a discount of approximately 30% off the print edition (1 − 0.7090 ≈ 0.29). E-book pricing based on strict analogy with the print book market implies no anticipation of significant non-neutral effects of digitization, like those experienced by the newspaper industry.[292]
    [292] See section III.1.
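The idea behind the HC1 correction can be sketched in Python for the simplest case of a regression on a constant and a single regressor. The data and the function name `simple_ols_hc1` are made up for illustration; the estimates in Tab. 7 come from the full eight-coefficient model, for which the same "sandwich" logic applies in matrix form.

```python
import math


def simple_ols_hc1(x, y):
    """OLS of y on a constant and x, with classical and HC1 slope std. errors."""
    n, k = len(x), 2  # k: number of estimated coefficients
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance: one common error variance s^2, divided by Sxx.
    s2 = sum(e ** 2 for e in resid) / (n - k)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 "sandwich": each observation contributes its own squared residual.
    hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    # HC1 applies the degrees-of-freedom correction n / (n - k) to HC0.
    se_hc1 = math.sqrt(hc0 * n / (n - k))
    return slope, intercept, se_classical, se_hc1


# Hypothetical data: print prices (x) and e-book prices (y), in euro.
x = [5.0, 8.0, 10.0, 12.0, 15.0, 20.0, 30.0, 50.0]
y = [3.5, 5.9, 6.8, 8.9, 10.0, 15.5, 19.0, 39.0]
slope, intercept, se_classical, se_hc1 = simple_ols_hc1(x, y)
```

The coefficient estimates are the same under both approaches; only the standard errors, and therefore the tests, change.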
  • As expected, the coefficients of the “negative” time delay variables (ebook.pub.delay.neg and ebook.pub.delay.neg.sq) are not significantly different from zero, with very large standard errors (1.2353 and 0.6473, respectively).
    The coefficients of the “positive” time delay variables (ebook.pub.delay.pos and ebook.pub.delay.pos.sq), instead, are significant and their signs (positive and negative, respectively) are consistent with our hypothesis. Ceteris paribus, the marginal effect on the e-book price of a marginal increase of the (“positive”) delay in years between the publication of the e-book and of the print edition is given by:
    \[
    0.0571 - 0.0037 \times 2 \times \text{ebook.pub.delay.pos}. \tag{11}
    \]
    Titles whose print edition was published approximately eight years before the publication of the e-book get the maximum “seniority price premium”, approximately equal to €0.22. The “seniority effect” on the e-book price turns negative, at an increasing rate, after 16 years from the publication of the print edition, which one could take as a rough estimate of the average useful life of a title (“longevity”).
    Despite the high goodness of fit, the explanatory power of the model is satisfactory only for DRM-encrypted non-fiction e-books, the majority of titles in the sample, since the model restricts the effect of the subject and of the protection mechanism to residual constant effects.
    The subj.fiction coefficient is negative and significant: ceteris paribus, on average a fiction book costs approximately €0.55 less than a non-fiction book. The result is opposite to the hypothesis of differential extra-elasticity for non-fiction books in the e-book market with respect to the print book market, but it may also be due to differences in market longevity or protection mechanisms between fiction and non-fiction e-books.
  • The coefficient of file.is.open is high and significant: ceteris paribus, on average an e-book protected with digital watermarking, or not protected at all, “inexplicably” costs approximately €1.90 less than a DRM-encrypted e-book.
    More generally, the inferential power of the model, and thus its interpretation as a comprehensive pricing formula, is questionable. The rejection of the null hypothesis for both the Breusch-Godfrey test[293] and the Ramsey RESET test[294] suggests the presence of bias due to additional non-linear effects and/or omitted explanatory variables.
    Tab. 8: Breusch-Godfrey test (OLS)
    Test statistic  Degrees of freedom  p-value
    837.8528        2                   < 2.2E-16 ***
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Tab. 9: Ramsey RESET test (OLS)
    Test statistic  Degrees of freedom  p-value
    21.0448         2, 6043             7.799E-10 ***
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Nonetheless, the model retains the descriptive power of the OLS estimator, which is reasonably interesting in itself.
    [293] A test for autocorrelation that follows a chi-squared distribution. For more information, see M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, pp. 101-102, amzn.to/17bknLN.
    [294] A test for functional misspecification that follows an F distribution. For more information, see M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, p. 63, amzn.to/17bknLN.
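The “seniority” figures derived from equation (11) can be reproduced with a short Python sketch. It uses the rounded coefficients published in Tab. 7, so the results are approximate; the function and variable names are illustrative.

```python
b_pos = 0.0571      # ebook.pub.delay.pos coefficient (Tab. 7)
b_pos_sq = -0.0037  # ebook.pub.delay.pos.sq coefficient (Tab. 7)


def seniority_premium(delay_years):
    """Total effect of a positive publication delay on the e-book price (euro)."""
    return b_pos * delay_years + b_pos_sq * delay_years ** 2


def marginal_effect(delay_years):
    """Derivative of the premium with respect to the delay, as in equation (11)."""
    return b_pos + 2 * b_pos_sq * delay_years


# Delay at which the marginal effect is zero, i.e. the premium is at its maximum.
peak_delay = -b_pos / (2 * b_pos_sq)
peak_premium = seniority_premium(peak_delay)
# Delay at which the total premium returns to zero.
break_even_delay = -b_pos / b_pos_sq
```

With these rounded coefficients, the premium peaks at a delay of about 7.7 years, where it is worth roughly €0.22, and it is negative from the 16th year onward, in line with the figures quoted in the text.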
  • As explained in paragraph VI.1.1, a multiplicative model, obtained through logarithmic transformation of the response variable and, possibly, of the non-dummy explanatory variables, should reduce heteroskedasticity and functional misspecification problems, but an additive raw-scale model is preferable for analysis in absolute terms, because of the non-linearity of the logarithmic transformation.
    IV.3. QUANTILE LINEAR REGRESSION RESULTS
    The interpretation of the previous OLS linear regression model[295] as a data generating process requires a “one-model assumption”,[296] viz. the model being supposedly appropriate for all data. However, even if e-book pricing were based on strict analogy with print edition prices, it would probably rely more on a heuristic pricing scheme than on an exact pricing mechanism. After all, books are complex products whose market is part of a historically complex institutional framework; they are hardly homogeneous commodities, either sold in perfectly competitive markets or subject to regulated public tariffs. Thus, departures from a common basic pricing formula are very likely to occur, both at individual level and because of general marketing practices.
    Price points are a common psychological pricing strategy: a few price groups are created to ease economic cognitive processes and/or to anchor the perceptual judgments of consumers.
    According to some researchers,[297] buyers do not judge prices only by the associated use value of the product: the position in the price distribution
    [295] See section IV.2.
    [296] L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, p. 25, amzn.to/1aeZCAO.
    [297] K. B. MONROE, Buyers' subjective perceptions of price, “Journal of Marketing Research”, 10, 1, 1973, pp. 76-77, bit.ly/17MoGQW.
  • itself can affect the perceived value of a product. A counter-intuitive pricing strategy is the creation of contrast effects through high dispersion from the anchoring price range (“standard price”[298]).
        Value is not a physical stimulus, although it is an important attribute, and efficient discrimination between stimuli in terms of differences in value usually is more important than discrimination in terms of price. Accentuation of price differences may lead to a greater accentuation of perceived value differences […][299]
    OLS linear regression describes the behavior of the location of the response variable distribution conditional on a set of explanatory variables, by using the mean as central tendency measure. In addition, it can be adapted to model the scale of the conditional distribution; for example, estimates of standard errors, obtained from additive or multiplicative functions of a set of explanatory variables, may be used to weight the original regressors.
    However, if the response distribution is skewed, the median may be a more informative central tendency measure; furthermore, shape shifts involving the skewness of the distribution can be difficult to capture with the standard deviation as scale measure. So, in the case at hand, quantile linear regression might provide relevant insights, additional and complementary to the results of OLS linear regression.[300]
    Quantile linear regression models estimate conditional quantiles as a linear function of the conditioning variables, by minimizing a weighted sum of absolute errors.[301]
    [298] Ibid., p. 76.
    [299] Ibid., p. 77.
    [300] See section III.2.
    [301] L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, pp. 29-38, amzn.to/1aeZCAO.
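The weighted sum of absolute errors can be illustrated with a minimal Python sketch for the simplest possible case, an intercept-only model, in which the fitted value is just a sample quantile. In the usual formulation, for a quantile tau, positive errors are weighted by tau and negative errors by 1 − tau. The prices and function names below are made up for illustration.

```python
def check_loss(residual, tau):
    """Quantile-regression loss: weight tau for positive errors, 1 - tau for negative ones."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(values, tau):
    """Sample value that minimizes the total check loss (intercept-only model)."""
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


# Hypothetical e-book prices (euro), sorted for readability.
prices = [0.99, 2.99, 4.99, 6.99, 8.99, 9.99, 12.99, 16.99, 24.99]
median_fit = best_constant(prices, 0.50)
upper_fit = best_constant(prices, 0.75)
```

With nine observations, the minimizer for tau = 0.50 is the fifth smallest price (the sample median, 8.99), while for tau = 0.75 it is the seventh smallest (12.99). Quantile linear regression applies the same loss to the residuals of a linear function of the explanatory variables.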
  • The weights of the loss function change according to the quantile under investigation; in the special case of conditional-median regression, the loss function corresponds to the sum of absolute errors.[302]
    Under the assumption of i.i.d. errors,[303] the asymptotic properties of quantile linear regression can be invoked to estimate consistent standard errors and confidence intervals for coefficients.[304] Since “the often-observed skewness and outliers make the error distribution depart from i.i.d.,” empirical distributions, obtained through bootstrap resampling, are usually employed for quantile linear regression inference.[305]
    We now estimate the linear model outlined in section IV.2 for the 0.05th, 0.25th, 0.50th, 0.75th, and 0.95th quantiles of ebook.price. The choice of the quantiles is quite conventional and arbitrary; ideally, they should be proxies for “very cheap”, “cheap”, “standard”, “expensive”, and “very expensive” price points, respectively. For each conditional quantile, we take the 0.025th and 0.975th quantiles of the bootstrap distribution (500 replicates) as the endpoints of the empirical 95% confidence intervals for coefficients.[306] Tab. 10, Tab. 11, Tab. 12, Tab. 13, and Tab. 14 report coefficient estimates and confidence intervals for the 0.05th, 0.25th, 0.50th, 0.75th, and 0.95th quantiles, respectively.
    [302] Ibid.
    [303] Under mild regularity conditions, asymptotic properties can be invoked also in the non-i.i.d. case. The case is more complex because errors no longer have a common distribution, so the asymptotic covariance matrix needs to be weighted accordingly. For more information, see L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, pp. 45-46, amzn.to/1aeZCAO.
    [304] L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, pp. 45-46, amzn.to/1aeZCAO.
    [305] Ibid., pp. 49-51.
    [306] Due to its random nature, bootstrap resampling does not reproduce exactly the same results across estimation runs.
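The percentile bootstrap described above can be sketched in Python as follows. For brevity, the sketch resamples a made-up list of prices and bootstraps their median; in the analysis of this section, each replicate would instead re-estimate the quantile regression coefficients on a resampled dataset. The function name, the data, and the fixed seed are illustrative assumptions.

```python
import random
import statistics


def percentile_bootstrap_ci(values, estimator, replicates=500, seed=0):
    """Empirical 95% interval from the bootstrap distribution of an estimator."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    n = len(values)
    draws = [
        estimator(rng.choices(values, k=n))  # resample n values with replacement
        for _ in range(replicates)
    ]
    # With n=40, the first and last cut points are the 0.025 and 0.975 quantiles.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical sample of e-book prices (euro).
prices = [0.99, 2.99, 3.99, 4.99, 6.99, 6.99, 8.99, 8.99, 9.99, 9.99,
          11.99, 12.99, 14.99, 16.99, 24.99]
lower, upper = percentile_bootstrap_ci(prices, statistics.median)
```

Without a fixed seed, two runs would generally return slightly different endpoints, which is the behavior noted in footnote 306.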
  • Tab. 10: Table of coefficients – 0.05th quantile (QUANTREG.05)
    Coefficient             Estimate  95% C.I.
    Intercept                 1.1980  [-0.1082, 2.1393]
    paper.price               0.2917  [0.2222, 0.4001]
    file.is.open             -2.4069  [-3.5989, -2.0673]
    subj.fiction             -1.0326  [-1.4921, -0.5036]
    ebook.pub.delay.pos       0.2218  [0.1175, 0.3163]
    ebook.pub.delay.pos.sq   -0.0091  [-0.0158, -0.0042]
    ebook.pub.delay.neg       1.1691  [-1.2756, 4.3973]
    ebook.pub.delay.neg.sq    0.5222  [-1.0588, 2.5067]
    Quantile: 0.05
    Pseudo-R-squared: 0.1703
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Tab. 11: Table of coefficients – 0.25th quantile (QUANTREG.25)
    Coefficient             Estimate  95% C.I.
    Intercept                 0.1303  [-0.1969, 0.4703]
    paper.price               0.6361  [0.6173, 0.6566]
    file.is.open             -3.0440  [-3.4614, -2.2584]
    subj.fiction             -0.3180  [-0.5598, -0.1199]
    ebook.pub.delay.pos       0.0538  [0.0067, 0.0927]
    ebook.pub.delay.pos.sq   -0.0051  [-0.0073, -0.0021]
    ebook.pub.delay.neg       0.0506  [-4.4196, 3.3521]
    ebook.pub.delay.neg.sq    0.5115  [-1.5181, 2.2665]
    Quantile: 0.25
    Pseudo-R-squared: 0.5135
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Tab. 12: Table of coefficients – 0.50th quantile (QUANTREG.50)
    Coefficient             Estimate  95% C.I.
    Intercept                 0.1622  [0.0830, 0.2656]
    paper.price               0.6940  [0.6875, 0.6985]
    file.is.open             -0.8288  [-1.3306, -0.6176]
    subj.fiction             -0.0430  [-0.1028, 0.0000]
    ebook.pub.delay.pos       0.0046  [-0.0169, 0.0167]
    ebook.pub.delay.pos.sq   -0.0015  [-0.0025, 0.0008]
    ebook.pub.delay.neg       1.4360  [-0.5678, 2.9088]
    ebook.pub.delay.neg.sq   -0.4787  [-1.5995, 0.5245]
    Quantile: 0.50
    Pseudo-R-squared: 0.6683
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    Tab. 13: Table of coefficients – 0.75th quantile (QUANTREG.75)
    Coefficient             Estimate  95% C.I.
    Intercept                -0.0671  [-0.3995, 0.1845]
    paper.price               0.7714  [0.7497, 0.7948]
    file.is.open             -1.0719  [-1.2227, -0.9119]
    subj.fiction             -0.2857  [-0.3558, -0.1539]
    ebook.pub.delay.pos       0.0365  [0.0127, 0.0812]
    ebook.pub.delay.pos.sq   -0.0017  [-0.0043, -0.0001]
    ebook.pub.delay.neg       1.4778  [-0.0309, 4.2146]
    ebook.pub.delay.neg.sq   -0.5436  [-2.1741, 0.1595]
    Quantile: 0.75
    Pseudo-R-squared: 0.7435
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Tab. 14: Table of coefficients – 0.95th quantile (QUANTREG.95)
    Coefficient             Estimate  95% C.I.
    Intercept                -0.5752  [-0.8409, -0.3247]
    paper.price               0.9317  [0.9231, 0.9436]
    file.is.open             -0.8633  [-1.0679, -0.6454]
    subj.fiction             -0.6149  [-0.8057, -0.4393]
    ebook.pub.delay.pos       0.0483  [0.0363, 0.0857]
    ebook.pub.delay.pos.sq   -0.0009  [-0.0033, -0.0006]
    ebook.pub.delay.neg       0.4422  [-1.4206, 1.7596]
    ebook.pub.delay.neg.sq   -0.4174  [-1.7231, 0.5123]
    Quantile: 0.95
    Pseudo-R-squared: 0.8380
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    The interpretation of quantile linear regression coefficients is analogous to OLS linear regression, but refers to the marginal effect on the specific conditional quantile, instead of the marginal effect on the conditional mean. Therefore, the same caveats in section IV.2 about the subj.fiction and file.is.open dummies apply to the results of the quantile linear regression model.
    Given the large amount of information conveyed by quantile linear regression, graphical views of coefficient estimates and confidence intervals across quantiles are usually more intelligible than tables of coefficients.
  • Fig. 16: Graphical view of const (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 17: Graphical view of paper.price (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 18: Graphical view of file.is.open (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 19: Graphical view of subj.fiction (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 20: Graphical view of ebook.pub.delay.pos (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 21: Graphical view of ebook.pub.delay.pos.sq (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 22: Graphical view of ebook.pub.delay.neg (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
  • Fig. 23: Graphical view of ebook.pub.delay.neg.sq (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    For all the quantiles, the signs of the coefficients are roughly in line with the results of OLS linear regression, and the ebook.pub.delay.neg and ebook.pub.delay.neg.sq coefficients have very wide confidence intervals that include zero.
    In general, coefficient confidence intervals are much wider for the lower, and especially the lowest, quantiles (0.05th and 0.25th).
    For “very cheap” e-books (0.05th quantile), the paper.price coefficient is low and dispersed, with a confidence interval ranging from 0.2222 to 0.4001.
  • The confidence interval becomes narrower for higher quantiles and the estimate increases almost linearly from 0.6361 at the 0.25th quantile to 0.7714 at the 0.75th quantile. The conditional-median estimate of the paper.price coefficient is 0.6940, quite close to the conditional-mean estimate, while at the 0.95th quantile (“very expensive” e-books) the coefficient jumps to 0.9317.
    The coefficients of the “positive” delay variables (ebook.pub.delay.pos and ebook.pub.delay.pos.sq) are not significantly different from zero for the median, but are significant and have the expected signs for the other quantiles. For “very cheap” e-books (0.05th conditional quantile), in particular, the ebook.pub.delay.pos and ebook.pub.delay.pos.sq coefficients are high (in absolute value) and dispersed. Indeed, it is very likely that out-of-copyright titles lie in the very lower tail of the distribution.
    The subj.fiction coefficient is not significantly different from zero for the median, but is significant for the other quantiles. As we move away from the median towards the extreme quantiles, its value decreases, especially at the lowest conditional quantile (0.05th), where it ranges from -1.50 to -0.50, approximately.
    The file.is.open coefficient is low and dispersed (from -2.00 to -3.50, approximately) for the 0.05th and 0.25th quantiles (“very cheap” and “cheap” e-books). It becomes more stable, around -1.00, from the 0.50th to the 0.95th quantiles.
    The intercept is quite close to zero for all the quantiles except the lowest, where the estimate of the constant term is 1.1980, even though its confidence interval ranges from less than zero to more than 2.00, approximately.
  • The pseudo-R-squared[307] across conditional quantiles follows a pattern similar to the trend in the coefficient of paper.price, the dominant explanatory variable.
    Fig. 24: Graphical view of the pseudo-R-squared (QUANTREG)
    Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
    The conditional-median linear regression results are similar to the OLS linear regression results, but the performance of the model varies along the distribution of the response variable.
    [307] The quantile linear regression pseudo-R-squared compares the performance of the full model vs. the intercept-only model. It is analogous to the OLS linear regression R-squared, but uses the weighted sum of absolute errors in place of the error sum of squares. For more information, see L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, pp. 51-54, amzn.to/1aeZCAO.
  • The explanatory power is very weak at the lowest quantile (0.05th), and improves considerably as we move towards higher quantiles. The pricing of “very expensive” e-books (0.95th quantile) closely matches the print book market, while there seems to be more price experimentation among “very cheap” and “cheap” e-books (0.05th and 0.25th quantiles).
  • V. A “LONG-TAIL-ORIENTED” DISTRIBUTOR SALES
    The following analysis of e-book sales distribution and price sensitivity is based on private data from the Stealth distribution platform, courtesy of SIMPLICISSIMUS BOOK FARM.[308]
    Compared with Edigita, Stealth is relatively more focused on small and medium publishers, which in Italy are very numerous.[309] Moreover, at the end of 2010, SIMPLICISSIMUS BOOK FARM launched a self-publishing platform, Narcissus, that leverages Stealth's technological and contractual assets to distribute self-published e-books through the same channels as “professional” (non-self-published) e-books.
    Such an accidental sampling from the whole market is due to the fact that we were not in a position to obtain sales data from other e-book distributors. However, it can also be interpreted as purposive sampling if we are interested in testing assumptions and claims about “long tail” tendencies in digital markets for information goods.
    V.1. SALES DATASET
    Monthly sales data of the Stealth e-book distribution platform, for the period from July 2010 to June 2013, were provided, on a title basis, by SIMPLICISSIMUS BOOK FARM.
    During the period, the market underwent rapid growth, especially from the third quarter of 2012 to the second quarter of 2013, with unit sales growth offsetting declining prices and with a significant increase in the number of titles distributed.
    [308] See section II.3.
    [309] See section II.3 for information about market concentration in the Italian publishing industry.
  • Fig. 25: Bar chart of Stealth sales and catalog data Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). In these circumstances, a meaningful economic interpretation entails the need for technical adjustments in the statistical models. V.1.1. Panel structure We tentatively use panel linear regression on Stealth sales data to estimate price elasticity with a log-log model. The implied isoelastic demand function is consistent with the fact that “information goods are often highly valued only by a relatively small set of customers.”310 310 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, p. 1619, bit.ly/1fb2LYm. 99
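The link between the log-log model and the isoelastic demand function can be made explicit. The notation below is introduced here for illustration only (q for unit sales, p for price, A for a positive scale constant, β for the price elasticity); it is not taken from the thesis.

```latex
q = A \, p^{\beta}
\quad\Longrightarrow\quad
\ln q = \ln A + \beta \ln p ,
\qquad
\frac{\partial \ln q}{\partial \ln p} = \beta
```

Under this functional form the elasticity is the same at every price level, so the coefficient on log price in a linear regression of log quantity can be read directly as the price elasticity.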
  • The problem with natural pricing experiments is that the researcher cannot easily distinguish whether changes in quantity are determined by movements along the demand curve or by shifts of the demand curve itself.311 Therefore, we split the original dataset into three periods (July 2010 – June 2011, July 2011 – June 2012, and July 2012 – June 2013) of relatively more stable demand. The observations in all three panels have a longitudinal nature: the number of individuals observed (N) is much larger than the number of time periods of observation (T). In a given month, some e-books may have zero sales or be temporarily unavailable for sale, whereas other e-books may be published only several months after the beginning of the period. As a consequence, all three panels are highly unbalanced: the number of observations is much smaller than the product of the number of individuals and the number of time periods (NT).
      • In the first panel (2010/2011), there is a total of 10,698 observations for 2,317 titles vs. a theoretical balanced panel of 27,804 observations.
      • In the second panel (2011/2012), there is a total of 33,449 observations for 6,000 titles vs. a theoretical balanced panel of 72,000 observations.
      • In the third panel (2012/2013), there is a total of 74,206 observations for 12,370 titles vs. a theoretical balanced panel of 148,440 observations.
311 See section III.4. 100
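The degree of unbalancedness of the three panels can be checked with simple arithmetic. The sketch below uses only the figures reported above; the term "fill rate" (observations divided by NT) is a label introduced here, not one used in the thesis.

```python
# (observations, titles) for each panel, as reported in the text; T = 12 monthly periods.
panels = {
    "2010/2011": (10_698, 2_317),
    "2011/2012": (33_449, 6_000),
    "2012/2013": (74_206, 12_370),
}
MONTHS = 12


def balanced_size(titles, periods=MONTHS):
    """Number of observations a fully balanced panel would contain (N * T)."""
    return titles * periods


def fill_rate(observations, titles, periods=MONTHS):
    """Share of the theoretical balanced panel that is actually observed."""
    return observations / balanced_size(titles, periods)


for label, (obs, titles) in panels.items():
    print(f"{label}: N*T = {balanced_size(titles):,}, fill rate = {fill_rate(obs, titles):.1%}")
```

In every period no more than about half of the title-month cells are observed.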
  • The period from the beginning of the third quarter of a given year to the end of the second quarter of the next year offers significant advantages over the calendar year, because of peculiar market seasonality. August and December are the typical high-season months for both digital and print books, because of more leisure time in summer and during the Christmas holidays.312 January, instead, seems to be a high-season month specific to the e-book market, possibly because of additional sales driven by e-book reader gifts at the end of December.313 In order to further control for demand shifts, we regress the logarithm of unit sales (lnqty) not only on the logarithm of price (lnprice), but also on a seasonal dummy (high.season) set to one in August, December, and January, and to zero otherwise. Grouping the three high-season months into a single seasonal dummy should guarantee higher variability through time within individuals in the unbalanced panels than three distinct seasonal dummies would. V.2. CONCENTRATION ANALYSIS In order to verify the presence of long tail effects in Stealth sales during the period 2010-2013, we follow the approach described in Brynjolfsson et al. (2011).314 Instead of comparing different distribution channels, we track the evolution of revenues and unit sales through time, and try to distinguish between first-order and second-order long tail drivers.315 312 In accordance with STATIONERS' line of reasoning, see section III.4. 313 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management. 314 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales, “Management Science”, 57, 8, 2011, pp. 1378-1379, bit.ly/196Bu39. See also section III.2. 315 See section III.2. 101
  • First, we analyze the concentration of revenues and unit sales across the whole sample of titles, including those with zero sales, in 2010/2011 (July 2010 – June 2011), 2011/2012 (July 2011 – June 2012), and 2012/2013 (July 2012 – June 2013). Then, we repeat the analysis for a subsample of titles available and with positive sales in all three periods, which should highlight original trends in concentration, irrespective of subsequent modifications to the catalog. The Lorenz curve is a graphical representation of the fraction of the quantitative variable of interest (in our case, either revenues or unit sales), depicted on the vertical axis, that accrues to cumulative fractions of the population (titles, in our case), arranged on the horizontal axis in increasing order of the quantitative variable. In case of perfect equality, the Lorenz curve corresponds to the 45-degree line; the more unequal the distribution, the larger the area between the 45-degree line of equality and the Lorenz curve. Since the Lorenz curve is increasing and (weakly) convex, a given distribution is more unequal than a second distribution if its Lorenz curve lies below (to the right of) that of the second one. Crossings of two Lorenz curves occur in ambiguous situations where “we cannot go from one distribution to the other by a sequence of […] regressive transfers.”316 The Gini coefficient is a measure of statistical dispersion, computed as the sum of absolute differences in the quantitative variable of interest between all pairs of individuals (each pair counted once), normalized by dividing by the square of the number of individuals and by the mean of the quantitative variable. 316 D. RAY, Economic inequality in D. RAY, Development economics, Princeton University Press, 1998, p. 183, amzn.to/1a4AECQ. 102
  • The Gini coefficient corresponds to the ratio of the area between the 45-degree line of equality and the Lorenz curve to the area of the triangle below the 45-degree line of equality. Its value lies between zero and one, and the higher the value, the higher the concentration of the distribution. The Lorenz curves of the revenue distributions for the whole sample of titles are displayed in Fig. 26. Fig. 26: Lorenz curves of revenue distributions (TOT.SAMPLE) Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). The distribution of revenues across titles has grown more unequal with time, especially in the last period (2012/2013), as confirmed also by the Gini coefficients reported in Tab. 15. 103
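The two concentration measures can be sketched as follows. The unit sales are hypothetical, not Stealth data, and the function names are chosen here for illustration. Counting each pair of titles once is the same as summing over ordered pairs and dividing by two, which is the form used in the code; the area-based computation is included to show that both routes give the same number.

```python
def lorenz_points(values):
    """Cumulative (share of titles, share of the variable) pairs, titles in increasing order."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini_pairwise(values):
    """Sum of |x_i - x_j| over all ordered pairs, divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean)


def gini_from_lorenz(values):
    """One minus twice the trapezoidal area under the Lorenz curve."""
    pts = lorenz_points(values)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


# Hypothetical unit sales for eight titles, two of them with zero sales.
unit_sales = [0, 0, 1, 2, 3, 5, 20, 120]
print(round(gini_pairwise(unit_sales), 4), round(gini_from_lorenz(unit_sales), 4))
```

The pairwise version is quadratic in the number of titles, so for a catalog of thousands of titles the Lorenz-based version is the more practical of the two.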
  • Tab. 15: Gini coefficients of revenue distributions (TOT.SAMPLE)
Period       Gini coefficient
2010/2011    0.7661
2011/2012    0.8029
2012/2013    0.8752
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). The Lorenz curves of the corresponding unit sales distributions, depicted in Fig. 27, follow a very similar pattern, with slightly higher Gini coefficients, reported in Tab. 16. 104
  • Fig. 27: Lorenz curves of unit sales distributions (TOT.SAMPLE) Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 16: Gini coefficients of unit sales distributions (TOT.SAMPLE)
Period       Gini coefficient
2010/2011    0.7774
2011/2012    0.8097
2012/2013    0.8923
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). Concentration of revenues in fewer titles seems to be driven by concentration of unit sales, rather than differential evolution of prices through time. 105
  • Both “superstar” and “underdog” effects317 might have been at work in the Stealth catalog during the period 2010-2013. First, publishing houses have gradually started publishing new print releases simultaneously in e-book format, which has increased the availability of best-selling titles on major e-book stores.318 Secondly, the launch of the Narcissus self-publishing platform319 and the consequent addition of many niche titles to the catalog might represent a derivative effect of the long tail phenomenon,320 inflating the concentration measures. The same analysis conducted on the subsample of e-books with positive sales in all three periods should control for modifications to the catalog. The Lorenz curves in Fig. 28 and the Gini coefficients in Tab. 17 show an initial reduction in the concentration of revenues from 2010/2011 to 2011/2012, consistent with the long tail hypothesis. 317 See section III.2. 318 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management. 319 See above in this chapter. 320 See section III.2. 106
  • Fig. 28: Lorenz curves of revenue distributions (SUB.SAMPLE) Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 17: Gini coefficients of revenue distributions (SUB.SAMPLE)
Period       Gini coefficient
2010/2011    0.7059
2011/2012    0.6697
2012/2013    0.7301
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). However, in the last period, the distribution returns to even higher concentration than in the first period. 107
  • The Lorenz curves in Fig. 29 and the Gini coefficients in Tab. 18 refer to the corresponding unit sales distribution. Fig. 29: Lorenz curves of unit sales distributions (SUB.SAMPLE) Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 18: Gini coefficients of unit sales distributions (SUB.SAMPLE)
Period       Gini coefficient
2010/2011    0.7168
2011/2012    0.7174
2012/2013    0.8503
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). 108
  • From 2010/2011 to 2011/2012, there were no substantial changes in the concentration of unit sales: the Lorenz curves cross each other and the Gini coefficients are very close in value. As a consequence, the equalization of revenues that took place from the first to the second period seems to be driven either by price convergence across titles or changing price-composition of best-selling titles (relative success of cheaper e-books). In the last period, instead, the growth of the concentration of unit sales is relevant and probably drives the overall growth of the concentration of revenues. A possible explanation for the 2012/2013 concentration increase in the subsample unit sales is the adoption of e-books by a larger public of readers, with more homogeneous “mass” preferences. First-order drivers of the long tail phenomenon do not seem to be at work in the Stealth catalog, which calls into question also the evidence of derivative effects. Lorenz curves and Gini coefficients do not take into account the degree of “social mobility” among titles, which might provide an interpretative framework alternative to the long tail hypothesis. Lower entry barriers for authors in digital publishing, with respect to print publishing, could foster writing activity and flood the e-book market with niche titles, even without real changes in reader tastes. Furthermore, continuous “on-field” “natural” selection of new best-sellers directly by the digital audience itself could complement slower editorial work by publishing houses321, and provide a powerful incentive to self-publish, in spite of equal, or even lower, success rates. 321 For a seemingly ad hoc example, see G. SBAFFO, Da Narcissus.me a Newton Compton. La rivelazione Anna Premoli vince il premio Bancarella., Simplicissimus Book Farm, 07/22/2013, bit.ly/1fBRgrC. 109
  • V.3. PANEL LINEAR REGRESSION RESULTS The source of the three sales panels described in paragraph V.1.1 is quite new for the literature reviewed in section III.4. Unlike retailer-specific data, distributor-specific sales, of the kind we analyze, do not suffer from differential price elasticity for price increases and price decreases.322 Also publisher-specific sales data track the entire market for a given set of titles, but, usually, a publisher's catalog is more limited in scope and less heterogeneous than a distributor's catalog. In the reference literature, even the papers studying a longitudinal dataset resort to pooled OLS linear regression, without any test for the significance of individual effects. Moreover, the large number of variables and time dummies often employed as controls might overfit the data and compromise the meaningfulness of the model. By contrast, the following panel linear regression analysis is based on an easily interpretable model that relies only on the simple dataset adjustments and the minimal subject matter knowledge presented in paragraph V.1.1. The response variable is lnqty, the logarithm of unit sales, while the two explanatory variables are lnprice, the logarithm of the retail price (in euro, VAT included), and high.season, a dummy set to one for high-season months.323 For each of the three panels (2010/2011, 2011/2012, and 2012/2013), we report the F test statistic for the significance of individual effects 324 and 322 See section III.4. 323 See paragraph V.1.1. 324 The null hypothesis is the non-significance of individual effects. 110
  • the Hausman test statistic, which follows a chi-squared distribution under the null hypothesis, for the comparison of the fixed and the random effects model325. For all three panels, the F test statistics in Tab. 19, Tab. 20, and Tab. 21 clearly confirm the presence of individual effects and do not support the pooling of all the observations in OLS linear regression, which assumes the same coefficients, intercepts included, for all the individuals.
Tab. 19: F test for individual effects (PANEL.1)
Test statistic    Degrees of freedom    p-value
7.7131            2316, 8379            < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 20: F test for individual effects (PANEL.2)
Test statistic    Degrees of freedom    p-value
9.2230            5999, 27447           < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 21: F test for individual effects (PANEL.3)
Test statistic    Degrees of freedom    p-value
18.7558           12369, 61834          < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
In the fixed effects model, “the intercept terms vary over the individual units”326, that is, the constant term is interacted with a dummy variable for each individual in the panel. 324 (continued) For more information, see Y. CROISSANT, G. MILLO, Panel Data Econometrics in R: The plm Package, “Journal of Statistical Software”, 27, 2, 2008, pp. 21-22, bit.ly/1cmn1Fm. 325 The null hypothesis is the consistency of the random effects model. For more information, see M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, pp. 351-352, amzn.to/17bknLN. 326 M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, p. 345, amzn.to/17bknLN. 111
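The degrees of freedom reported in Tab. 19, Tab. 20, and Tab. 21 can be reproduced from the panel dimensions given in paragraph V.1.1. The sketch below assumes the standard form of the F test that compares pooled OLS with the fixed effects model; the function names are illustrative, and the residual sums of squares passed to `f_statistic` would come from the two fitted models, which are not reported in the text.

```python
def individual_effects_f_df(n_obs, n_titles, n_regressors=2):
    """Degrees of freedom of the F test for individual effects (pooled OLS vs. fixed effects)."""
    df1 = n_titles - 1                      # restrictions: all title intercepts equal
    df2 = n_obs - n_titles - n_regressors   # residual degrees of freedom of the fixed effects model
    return df1, df2


def f_statistic(rss_pooled, rss_within, df1, df2):
    """F = [(RSS_pooled - RSS_within) / df1] / [RSS_within / df2]."""
    return ((rss_pooled - rss_within) / df1) / (rss_within / df2)


# (observations, titles) per panel; the two regressors are lnprice and high.season.
for label, (obs, titles) in {
    "PANEL.1": (10_698, 2_317),
    "PANEL.2": (33_449, 6_000),
    "PANEL.3": (74_206, 12_370),
}.items():
    print(label, individual_effects_f_df(obs, titles))
```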
  • It can be shown that the fixed effects estimator is equivalent to the OLS estimator obtained from a time-demeaned model without constant terms (within transformation).327 As a consequence, coefficient estimates are conditional upon the fixed individual effects and capture the effects of time-variant explanatory variables within individuals. The random effects model, instead, assumes that individual effects are random and i.i.d. over individuals, so the error term is composed of a time-invariant individual-specific component and a remainder component uncorrelated over time, with both components uncorrelated with the explanatory variables. The resulting GLS estimator combines “the information from the between and within dimensions [of the panel] in an efficient way,”328 and allows for the estimation of a common intercept term and the inclusion of explanatory variables that vary between individuals but are time-invariant in the within dimension. For all three panels, the Hausman test statistics in Tab. 22, Tab. 23, and Tab. 24 reject the null hypothesis of no correlation between individual effects and explanatory variables.
Tab. 22: Hausman test (PANEL.1)
Test statistic    Degrees of freedom    p-value
87.1593           2                     < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). 327 Ibid., pp. 345-346. 328 Ibid., p. 350. 112
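The within transformation can be illustrated on a toy unbalanced panel. Everything in this sketch is hypothetical: three titles, a single regressor (log price) instead of the two used in the thesis, a true elasticity of -0.5, no noise, and title intercepts that rise with the price level, so that individual effects are correlated with the explanatory variable.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(panel):
    """Fixed effects (within) slope: OLS on title-demeaned data, without a constant."""
    x_dm, y_dm = [], []
    for obs in panel.values():
        xs = [x for x, _ in obs]
        ys = [y for _, y in obs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        x_dm += [x - mx for x in xs]
        y_dm += [y - my for y in ys]
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


TRUE_ELASTICITY = -0.5  # hypothetical value used to generate the toy data
# title -> (title-specific intercept, observed log prices); unbalanced on purpose.
design = {"A": (1.0, [0.0, 0.2, 0.4]), "B": (2.0, [1.0, 1.2]), "C": (3.0, [2.0, 2.2, 2.4, 2.6])}
panel = {
    title: [(lnp, alpha + TRUE_ELASTICITY * lnp) for lnp in prices]
    for title, (alpha, prices) in design.items()
}

pooled_x = [x for obs in panel.values() for x, _ in obs]
pooled_y = [y for obs in panel.values() for _, y in obs]
beta_pooled = ols_slope(pooled_x, pooled_y)
beta_within = within_slope(panel)
print(f"pooled OLS slope: {beta_pooled:.3f}, within slope: {beta_within:.3f}")
```

In this constructed example the within estimator recovers the elasticity used to generate the data, while the pooled slope even has the wrong sign, because it mixes price variation within titles with differences between titles.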
  • Tab. 23: Hausman test (PANEL.2)
Test statistic    Degrees of freedom    p-value
274.4334          2                     < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 24: Hausman test (PANEL.3)
Test statistic    Degrees of freedom    p-value
50.4353           2                     1.117E-11 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Intuitively, the inconsistency of the random effects model is due to the fact that each book is “ʻone of a kindʼ, and cannot be viewed as a random draw from some underlying population”329. Tab. 25, Tab. 26, and Tab. 27 report coefficient estimates of the fixed effects model for the 2010/2011, 2011/2012, and 2012/2013 panels, respectively.
Tab. 25: Table of coefficients (PANEL.1)
Coefficient    Estimate    Std. error    t value     p-value
lnprice        -0.6124     0.0893        -6.8578     7.488E-12 ***
high.season    0.3922      0.0171        22.8804     < 2.2E-16 ***
R-squared: 0.0746; Adjusted R-squared: 0.0584
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). 329 Ibid., p. 351. 113
  • Tab. 26: Table of coefficients (PANEL.2)
Coefficient    Estimate    Std. error    t value     p-value
lnprice        -0.0383     0.0112        -3.4227     0.0006 ***
high.season    0.0511      0.0091        5.5852      2.356E-08 ***
R-squared: 0.0020; Adjusted R-squared: 0.0016
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 27: Table of coefficients (PANEL.3)
Coefficient    Estimate    Std. error    t value     p-value
lnprice        -0.4607     0.0323        -14.2670    < 2.2E-16 ***
high.season    0.0956      0.0058        16.5630     < 2.2E-16 ***
R-squared: 0.0239; Adjusted R-squared: 0.0199
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Standard errors and t tests are based on estimation of the covariance matrix robust to heteroskedasticity and serial correlation, more precisely on the Arellano (1987) variant, which relies on large N asymptotics with small T.330 For all three panels, the coefficient of the high.season dummy is significant, positive (as expected) and remarkable in magnitude, especially for the 2010/2011 panel. For the period 2010/2011, in August, December, and January, a given e-book sold, on average, almost 40% more units than its period average, vs. around 5% in 2011/2012 and 10% in 2012/2013. It is very likely that such a large coefficient estimate for the first period is due to the launch of many other e-book stores and distribution platforms over the end of 2010 and the beginning of 2011.331 330 M. ARELLANO, Computing robust standard errors for within group estimators, “Oxford Bulletin of Economics and Statistics”, 49, 4, 1987, pp. 431-434, bit.ly/18I0cbb. 331 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management. 114
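The percentages quoted for the high.season dummy read the coefficient directly as an approximate percentage change. With a logarithmic response variable, the exact proportional effect of a dummy switching from zero to one is exp(coefficient) - 1, a standard property of log-linear models; the approximation is close for small coefficients and less so for large ones. The sketch below applies that formula to the estimates in Tab. 25, Tab. 26, and Tab. 27; the function name is illustrative.

```python
import math

# high.season coefficient estimates from Tab. 25, Tab. 26, and Tab. 27.
high_season = {"2010/2011": 0.3922, "2011/2012": 0.0511, "2012/2013": 0.0956}


def dummy_effect(coefficient):
    """Exact proportional change in unit sales when the dummy goes from 0 to 1."""
    return math.exp(coefficient) - 1


effects = {period: dummy_effect(b) for period, b in high_season.items()}
for period, effect in effects.items():
    print(f"{period}: coefficient {high_season[period]:.4f}, exact effect {effect:.1%}")
```

For 2011/2012 and 2012/2013 the two readings nearly coincide, while for 2010/2011 the exact effect is noticeably larger than the coefficient itself.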
  • The R-squared of the regressions332 is very low, which is not new to the literature:333 most of the within-title variation in sales cannot be attributed to price changes. However, weak explanatory power of the model is, in a sense, more severe for the pooled OLS estimations in the previous literature than for our fixed effects model: the former employ many additional variables with the aim of controlling for variation in sales that is intentionally left to individual-specific constant terms in the latter. In each period, the 95% confidence interval for the lnprice coefficient, shown in Fig. 30, does not include zero and lies below it, yet well above -1.00, in the inelastic range. 332 In the case of the fixed effects model, the R-squared refers to the within variation. 333 See section III.4. 115
  • Fig. 30: Plot of 95% confidence intervals for lnprice (PANEL)
2010/2011: [-0.7874, -0.4373]
2011/2012: [-0.0602, -0.0163]
2012/2013: [-0.5239, -0.3974]
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM). For a given e-book, on average a one percent price decrease from the period average price translates into a unit sales increase lower than one percent of the period average unit sales. The large difference from the price elasticities around -2.00 reported in section III.4 is due to the improper use of the pooled OLS estimator in the literature. 116
  • Given the superstar nature of the publishing industry,334 it is not surprising that, statistically, from the mere point of view of copies sold, an author would be better off by “milking” his (on average) very few readers. However, analysis of high conditional quantiles (e.g. 80th percentile, a common working definition of best-selling titles335), instead of the conditional mean, or, alternatively, analysis at aggregate-sales level, instead of individual-title level, could produce completely opposite results, which is explored in paragraph VI.1.2 and in section VI.2. 334 See section III.2. 335 See section III.2. 117
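The intervals in Fig. 30 can be approximately reproduced from the lnprice estimates and robust standard errors in Tab. 25, Tab. 26, and Tab. 27. The sketch assumes a normal critical value of 1.96, which is not stated in the text; small differences in the fourth decimal come from the rounding of the tabulated inputs.

```python
Z_95 = 1.96  # normal critical value; an assumption, reasonable with samples of this size

# (estimate, robust standard error) of lnprice, from Tab. 25, Tab. 26, and Tab. 27.
lnprice = {
    "2010/2011": (-0.6124, 0.0893),
    "2011/2012": (-0.0383, 0.0112),
    "2012/2013": (-0.4607, 0.0323),
}


def confidence_interval(estimate, std_error, z=Z_95):
    """Symmetric confidence interval: estimate plus or minus z standard errors."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def is_inelastic(interval):
    """True when the whole interval lies strictly between -1 and 0."""
    low, high = interval
    return -1 < low and high < 0


intervals = {period: confidence_interval(*v) for period, v in lnprice.items()}
for period, (low, high) in intervals.items():
    print(f"{period}: [{low:.4f}, {high:.4f}] inelastic: {is_inelastic((low, high))}")
```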
  • VI. DISCUSSION The quality of data, the theoretical framework, the statistical techniques, and the robustness of the results jointly determine the significance and the interpretation of an econometric study. Although the present work is affected by a number of dataset limitations and technical issues, reported in section VI.1, it is its title-based approach that may represent the main problem of the research. In fact, there are powerful economic incentives at work in digital markets for information goods, discussed in section VI.2, that may call for analysis at a more aggregate level. VI.1. LIMITATIONS OF THE STUDY VI.1.1. Catalog analysis As anticipated in section IV.2, the OLS linear regression model applied to the catalog dataset clearly suffers from heteroskedasticity and may be biased by functional form misspecification and/or omitted explanatory variables. Quantile linear regression results in section IV.3 contradict the assumption of a “one-size-fits-all” model and show substantial differences in the effects of the explanatory variables along the distribution of the response variable, but the quantile linear regression itself neither rests upon alternative model formulations nor relies on testable distributional assumptions. So, inference on the linear model proposed for the catalog dataset described in section IV.1 is doubtful; nonetheless, both the OLS and the quantile linear regression model retain their, in a sense, optimal descriptive properties. 118
  • Given the uniqueness and size of the sample, multidimensional descriptive evidence is of some interest in itself, as a more solid basis for speculation about the market than more or less educated guesses. If one, instead, is interested in valid inference, a multiplicative model should reduce heteroskedasticity and functional misspecification problems, and, thus, prove more appropriate in the case at hand. In fact, logarithmic transformation of the right-skewed response variable should reduce the skewness of the distribution. Furthermore, logarithmic transformation of the non-dummy explanatory quantitative variables allows for the “interpretation of predictor variable effects in relative terms”336 (elasticities). The graphical views of the quantile linear regression coefficients suggest that such a model could perform quite well for e-books in the interquartile price range. In effect, from the 0.25th to the 0.75th quantiles, most variables, and paper.price in particular, show quasi-linear trends in coefficients. However, if one is “interested in the covariate effect on the response variable in absolute terms”337, raw-scale analysis is preferable to logarithmic transformation. Because of the non-linearity of the logarithmic transformation, the conditional-mean e-book price is not the exponential function of the conditional mean of log e-book price. Moreover, even though the conditional median is equivariant to monotonic transformations, “the retransformation of [coefficient] estimates is more complicated because of the nonlinearity of the log transformation.”338 336 L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, p. 77, amzn.to/1aeZCAO. 337 Ibid., p. 81. 338 Ibid. 119
  • Finally, the constant effect of the subject, as captured by the subj.fiction dummy, may be judged restrictive: given the roughly equal distribution of the sample between fiction and non-fiction e-books, interactions of the dummy with the other explanatory variables could have been explored. VI.1.2. Sales analysis The concentration analysis of the sales dataset in section V.2 is technically simple, and its purely descriptive results are straightforward enough in terms of interpretation. The same does not apply to the panel linear regression analysis in section V.3. Notwithstanding dataset adjustments and the addition of a seasonal dummy to control for shifts of the demand curve, variation in e-book sales remains largely unexplained. According to the literature, the inclusion of other explanatory variables, derived from catalog information, and/or of ad hoc time dummies, does not seem to improve model performance.339 The demand for books is affected by so many title- and market-specific factors that subject matter omniscience would be required to discern price effects.340 At the time of writing, the data source that best approximates such an objective and all-knowing nature is Google Trends by GOOGLE, widely used by journalists, businesses, and researchers to explore trends in Internet search volumes through time.341 339 See section III.4. 340 It was only by sheer luck that the high.season dummy was able to capture the effect of the media exposure due to the take-off of the e-book market in terms of product availability. For more information, see section VI.1.2. 341 Y. MATIAS, Insights into what the world is searching for: the new Google Trends, “Inside Search”, 09/27/2012, bit.ly/16T8VnI. 120
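The retransformation issue can be seen on a handful of numbers. The prices below are hypothetical, and the sketch works with unconditional means and medians rather than regression fits, which is enough to show the two properties involved: exponentiating the mean of the logs gives the geometric mean, which is below the arithmetic mean, whereas the median commutes with the (monotonic) exponential function.

```python
import math
import statistics

# Hypothetical e-book prices; an odd count, so the median is an observed value.
log_prices = [math.log(p) for p in (0.99, 2.99, 4.99, 9.99, 24.99)]
prices = [math.exp(v) for v in log_prices]

naive_mean = math.exp(statistics.mean(log_prices))   # geometric mean, not the mean price
true_mean = statistics.mean(prices)

back_transformed_median = math.exp(statistics.median(log_prices))
true_median = statistics.median(prices)

print(f"exp(mean of logs) = {naive_mean:.2f} vs. mean price = {true_mean:.2f}")
print(f"exp(median of logs) = {back_transformed_median:.2f} vs. median price = {true_median:.2f}")
```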
  • Smith et al. (2012) employ the number of Bing search results for the ISBN as an explanatory variable in regression analysis of a cross-sectional dataset of e-book sales.342 Similarly, it could be worth trying to add Google Trends results for a combination of author and title in a panel dataset of e-book sales. Fig. 31: A Google Trends query by author The highest peak corresponds to the base period (100). Capital letters link to relevant major news headlines (e.g. C links to news about Umberto Eco's 80th birthday.) Source: own elaboration of Google Trends public data. 342 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, pp. 14-16, available at SSRN: bit.ly/WcNhJq. For more information, see section III.4. 121
  • Fig. 32: A Google Trends query by title The highest peak corresponds to the base period (100). Capital letters link to relevant major news headlines (e.g. C links to news about the publication of a new revised edition of Il nome della rosa.) Source: own elaboration of Google Trends public data. The possibility to allocate title-specific effects, and, thus, the increased total explanatory power of the model, may allow for the estimation of a random effects model343 or, even more interestingly, of a correlated random effects quantile regression model. Monopolistic competition theory suggests that “books for which cost-covering prices allow large sales will have demand curves that are more elastic than books with more limited audiences.”344 343 See section V.3. 344 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 595, bit.ly/17dxX0b. 122
  • Estimation of high conditional sales quantiles (e.g. 80th percentile, a common working definition of best-selling titles345) might reveal significantly higher price elasticity than conditional-mean estimation. Few methods for panel quantile linear regression exist in the literature so far, but correlated random effects models have a number of desirable properties.346 A subset of the time-variant explanatory variables (in our case, variables derived from Google Trends data), correlated with “time-invariant unobserved characteristics”347 that affect the response variable, is required to correct for “the unexplained ranking mechanism in the model, but at the same time control for endogenous effects”348. Alternatively, a simpler analysis at aggregate sales level of the effect on unit sales of changes in the average price weighted by unit sales should average out individual-specific effects and prove more relevant at general market level. VI.2. POSSIBLE FURTHER DEVELOPMENTS Fan et al. (2011) derive equilibrium conditions for a model of isoelastic consumer demand and duopolistic competition for a digital device with tied digital content.349 The setup of the model reflects real-world competition in the market for e-book readers fairly well, since the incumbent position of Kindle is periodically 344 (continued) For more information, see section III.4. 345 See section III.2. 346 S. H. BACHE, C. M. DAHL, J. T. KRISTENSEN, Headlights on tobacco road to low birthweight outcomes: evidence from a battery of quantile regression estimators and heterogeneous panel, “Center for Research in Econometric Analysis of Time Series Research Papers”, 2008-20, 2011, pp. 4-6, bit.ly/1ghltfQ. 347 Ibid., p. 9. 348 Ibid. 349 M. FAN, Y. HU, A. YU, Pricing strategies for tied digital contents and devices, “Decision Support Systems”, 51, 3, 2011, p. 405, bit.ly/16sHzsj. 123
  • threatened by direct competition from a strong competitor, leveraging the same tying strategy pursued by AMAZON.350,351 Digital media products are information goods, that is, goods with zero, or very low, marginal cost of production and large fixed development costs. An important difference between a tying strategy involving physical products and one involving information goods is that, typically, physical products are consumed in fixed proportions, whereas demand for information goods is endogenously determined.352 In the model, the price of the digital content is given and the two firms determine the price of their digital device in a simultaneous game.353 If the profit margin of the digital content is higher than a certain threshold, inversely proportional to the demand elasticity of the digital content, the price of the digital device and the price of the digital content move in the same direction.354 Thus, a lower digital content price increases profits through high demand elasticity and allows for a more competitive digital device price. High demand elasticity is not the only possible transmission mechanism to increase digital content profits and sell the digital device at lower prices. In the model, since the price of the digital content is exogenous, superior quality of the digital device makes it possible to charge a higher price for the device itself, but does not affect the digital content demand. 350 P. SMITH, The war between Nook & Kindle is over and Amazon has carried the day, “ITworld”, 07/10/2013, bit.ly/19pynkV. 351 P. ST. JOHN MACKINTOSH, E-reader price war breaks out in UK as Kobo chases Nook down, “TeleRead”, 06/17/2013, bit.ly/GAAbkv. 352 M. FAN, Y. HU, A. YU, Pricing strategies for tied digital contents and devices, “Decision Support Systems”, 51, 3, 2011, p. 406, bit.ly/16sHzsj. 353 Ibid., p. 407. 354 Ibid. 124
  • However, product differentiation of the digital device does affect the perceived quality of the digital content and makes it possible either to charge a higher price for the digital content or even to displace competition in the market entirely. Digital content profits can also be increased by negotiating lower licensing fees with publishers and lower delivery costs with mobile operators. AMAZON has exploited all the strategies outlined above, promptly imitated by competitors. • AMAZON tried, unsuccessfully,355 to lower e-book prices by setting a ceiling at $9.99 per e-book in the Kindle marketplace. • The company invested in hardware and software improvements of the Kindle device, in particular with regard to the paper-like rendering of the display.356 • Kindle Direct Publishing, the AMAZON self-publishing platform, formerly known as Digital Text Platform, was launched concurrently with the Kindle device to bypass, at least in part, publisher fees. A possibly more disruptive competitive strategy, not yet exploited in the e-book market but well-established in television broadcasting and recently gaining traction in the music industry, is the bundling of digital media products. Bakos and Brynjolfsson (1999) study optimal bundling strategies for a multiproduct monopolist selling digital information goods, as opposed to conventional price discrimination and micropayments,357 and Bakos and Brynjolfsson (2000) extend the model to a competitive scenario.358 355 See section II.1. 356 C. M. TEICHER, Lots of little improvements make the Kindle 3 the best e-ink e-reader, “Publishers Weekly”, 08/26/2010, bit.ly/1cd3YNB. 357 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, p. 1613, bit.ly/1fb2LYm. 358 Y. BAKOS, E. BRYNJOLFSSON, Bundling and competition on the Internet, “Marketing Science”, 19, 1, 2000, p. 63, bit.ly/GCXl9R. 125
  • Usually, economic models become more complicated as the number of goods analyzed grows; in the case at hand, instead, the results rely on the law of large numbers to derive asymptotic optimality conditions. Under fairly general assumptions, the very low marginal costs of production and distribution of digital information goods make it possible to profitably sell them in large bundles, characterized by consumer valuations with lower variance per good than individual goods.359

Given i.i.d. consumer valuations and any demand function with finite mean and variance, the larger the bundle, the smaller the idiosyncratic component of demand. The number of “moderate” consumers with “quasi-average” valuations increases, and demand becomes more elastic near the mean and less elastic away from it.360

Fig. 33: Demand for bundles of information goods. Linear demand case with i.i.d. valuations uniformly distributed in [0,1]. Source: Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, p. 1617, fig. 1, bit.ly/1fb2LYm.

359 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, p. 1614, bit.ly/1fb2LYm.
360 Ibid., pp. 1616-1617.
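The law-of-large-numbers argument can be illustrated with a small simulation. The sketch below is not taken from Bakos and Brynjolfsson: it only assumes, as in the caption of Fig. 33, i.i.d. valuations uniformly distributed in [0,1], plus zero marginal cost, and all function names, the sample size, and the price grid are hypothetical choices made for illustration. For each bundle size it draws the per-good valuation of every consumer (the mean valuation over the goods in the bundle) and reports its variance, the share of “moderate” consumers with a per-good valuation between 0.4 and 0.6, and the revenue per good at the best price found by grid search.

```python
import random
import statistics


def per_good_valuations(n_goods, n_consumers=5000, seed=0):
    """Per-good valuation of a bundle of n_goods for each simulated consumer.

    Each good's valuation is drawn i.i.d. from Uniform[0, 1] (assumption
    borrowed from the caption of Fig. 33); the bundle valuation is divided
    by n_goods to make bundles of different sizes comparable.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(n_goods)) / n_goods
        for _ in range(n_consumers)
    ]


def demand_share(valuations, price):
    """Fraction of consumers whose per-good valuation is at least `price`."""
    return sum(v >= price for v in valuations) / len(valuations)


def moderate_share(valuations, low=0.4, high=0.6):
    """Fraction of consumers with a 'quasi-average' per-good valuation."""
    return sum(low <= v <= high for v in valuations) / len(valuations)


def best_price(valuations, grid_size=100):
    """Grid search for the per-good price that maximizes revenue per consumer.

    With zero marginal cost (an assumption of this sketch), revenue per good
    equals profit per good.
    """
    grid = [i / grid_size for i in range(1, grid_size)]
    return max(grid, key=lambda p: p * demand_share(valuations, p))


if __name__ == "__main__":
    print("goods  variance  moderate  price  revenue_per_good")
    for n in (1, 2, 20, 100):
        vals = per_good_valuations(n)
        p = best_price(vals)
        print(
            f"{n:5d}  {statistics.pvariance(vals):8.4f}  "
            f"{moderate_share(vals):8.3f}  {p:5.2f}  "
            f"{p * demand_share(vals, p):6.3f}"
        )
```

Under these assumptions the variance of the per-good valuation should shrink roughly in proportion to 1/n, so that a single per-good price close to the mean valuation captures most consumers; the simulated figures depend on the seed and the sample size and are only indicative.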
  • […] this “predictive value of bundling” makes it possible to achieve greater sales, greater economic efficiency, and greater profits per good from a bundle of information goods than can be attained when the same goods are sold separately.361

The model can be extended to include complementarity and substitution effects, budget constraints, diminishing returns, negative valuations, and cognitive costs. Typically, the main difference from the basic model is that a mixed-bundling strategy becomes more efficient than a pure-bundling strategy, where the number of information goods included in the bundle tends to infinity.

These “economies of aggregation”362 (large aggregators are more profitable than small aggregators) are peculiar to digital markets for information goods and must be distinguished from network economies, economies of scale, and economies of scope in production, distribution, or consumption. If significant, economies of aggregation have a number of game-changing competitive implications for the entire market.
• Downstream competition among different goods is replaced by upstream competition for new content: the largest bundler has a higher marginal profitability, so it is able to outbid single- and multiple-good non-bundlers, as well as smaller bundlers.363
• The pricing policy of the bundler is inherently “aggressive”: the optimal price also happens to deter competition, without the need to resort to any short-term “artificial” strategy.364

361 Ibid., p. 1613.
362 Y. BAKOS, E. BRYNJOLFSSON, Bundling and competition on the Internet, “Marketing Science”, 19, 1, 2000, p. 63, bit.ly/GCXl9R.
363 Ibid., pp. 68-70.
364 Ibid., pp. 75-76.
  • An entrant with higher fixed or marginal costs, or with products of lower quality, but using the bundling strategy, may be able to dislodge the non-bundling incumbent.365
• Innovative activity shifts from standalone firms to bundlers, since the former will be reluctant to invest in research and development if a competing product can be incorporated into a larger bundle by the latter.366

The i.i.d. assumption is not likely to be empirically relevant for the publishing market, because it would imply that “consumers differ significantly in their tastes across goods, but not in their total expenditure on the entire set of goods.”367 In fact, segmentation of readers by language, subject and reading habits is feasible, so mixed-bundling strategies might be expected to emerge in the foreseeable future.

365 Ibid., pp. 77-78.
366 Ibid., p. 78.
367 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, pp. 1616-1617, note 8, bit.ly/1fb2LYm.
  • APPENDIX

SHORT GLOSSARY OF THE E-BOOK

3G
Wireless technology that allows devices to connect to the Internet through the cellular data network, i.e. virtually everywhere.

ADOBE
Well-known US software company (Photoshop, Acrobat Reader, etc.).

ADOBE DRM
DRM system, adopted by the majority of publishers, limiting the use of e-books to a fixed maximum number of reading devices registered to the same owner.

AZW
Kindle proprietary file format, not readable on other devices.

DRM
Acronym for Digital Rights Management, a combination of technologies allowing publishers to protect digital content from copying.

E-book
Book in electronic file format.

E-book reader (e-reader)
Device dedicated to e-book reading.

E Ink®
Electronic ink, a technology for the production of displays with paper-like rendering, used by e-book readers.
  • EPUB
The most widespread file format for e-book creation, supported by all e-book readers except Kindle, which uses its own proprietary file format.

LCD
Liquid Crystal Display, used by desktop and tablet computers, characterized by backlight and high image quality.

PDF
Acronym for Portable Document Format, a file format introduced by ADOBE in 1993 to enable the representation of documents independently of the availability of the creation software; the file format is supported by most e-book readers, but is not optimal for these devices.

Social DRM
“Loose” DRM mechanism that allows for the sharing of purchased e-books, watermarked with information about the customer.

Tablet
Portable device capable of performing most of the typical PC functions (e.g. APPLE iPad).

Wi-Fi
Wireless technology that allows devices to connect to the Internet through a home network.

SHORT CHRONOLOGY OF THE E-BOOK

1971
Michael S. Hart launches Project Gutenberg: 1971 is considered the year of birth of the e-book.
  • 1987
Afternoon: a story by Michael Joyce, the first hypertextual novel, is published on floppy disks by EASTGATE SYSTEMS.

1993
DIGITAL BOOK offers floppy disks with 50 e-books in DBF file format.
Brad Templeton publishes on CD-ROMs an anthology of candidates for the Hugo Award for Best Novel.

1994
Progetto Manuzio, the first digital library in Italian language, is launched.

1995
AMAZON starts selling print books online.

1996
Project Gutenberg catalog reaches 1,000 titles.

1998
Kim Blagg obtains the first ISBN code for an e-book and starts selling it on amazon.com, bn.com, and borders.com.
Rocket eBook and SoftBook, the first e-book readers, are launched.

1999
E-book stores, like eReader.com and eReads.com, begin to proliferate online.

2000
Stephen King makes his book Riding the Bullet available in digital format.
  • 2002
RANDOM HOUSE and HARPERCOLLINS begin to sell English-language e-books.

2005
AMAZON acquires MOBIPOCKET.

2006
SONY launches Sony Reader, an e-book reader based on E Ink® technology.

2007
AMAZON launches Kindle in the United States.

2008
SONY partners with ADOBE to support DRM-encrypted e-books on its devices.
SONY launches Sony Reader PRS-505 in UK and France.
BOOKSONBOARD starts selling e-books for iPhone.

2009
AMAZON launches Kindle 2 and Kindle DX.
The integration between the Amazon.com store and the Kindle device allows AMAZON to cover 60% of US e-book sales by the end of 2009.
BARNES & NOBLE launches Nook in the United States.
Bookboon.com reaches 10 million downloads of free e-books in one year.
  • 2010
In May, at the Torino Book Fair, major Italian publishers announce the release of e-books for the next autumn.
APPLE launches iPad, a multipurpose device including e-book reading functions.
APPLE launches iBookstore to compete with AMAZON and BARNES & NOBLE.
GOOGLE announces a new service for the online sale of e-books (Google Editions).

2011
E-book readers with 3G connectivity appear on the Italian market.
  • BIBLIOGRAPHY

P. D. ALLISON, Missing data in A. MAYDEU-OLIVARES, R. E. MILLSAP, The SAGE handbook of quantitative methods in psychology, SAGE Publications, 2009, amzn.to/15UAtJe.
Amazon.com now selling more Kindle books than print books, “Amazon Media Room: Press Releases”, 05/19/2011, bit.ly/119B0K2.
G. ANDERSON, GNU/Linux command-line tools summary, The Linux Documentation Project, 04/15/2006, bit.ly/1fTbAaG.
Antitrust: Commission opens formal proceedings to investigate sales of e-books, European Commission, 12/06/2011, bit.ly/ZBWTjW.
M. ARELLANO, Computing robust standard errors for within group estimators, “Oxford Bulletin of Economics and Statistics”, 49, 4, 1987, bit.ly/18I0cbb.
ASCII, Wikipedia, accessed on 09/07/2013, bit.ly/WBUEXv.
S. H. BACHE, C. M. DAHL, J. T. KRISTENSEN, Headlights on tobacco road to low birthweight outcomes: evidence from a battery of quantile regression estimators and heterogeneous panel, “Center for Research in Econometric Analysis of Time Series Research Papers”, 2008-20, 2011, bit.ly/1ghltfQ.
Y. BAKOS, E. BRYNJOLFSSON, Bundling and competition on the Internet, “Marketing Science”, 19, 1, 2000, bit.ly/GCXl9R.
Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, bit.ly/1fb2LYm.
L. BALLAND-POIRIER, J. HOLLIS WEBER, H. RUSSMAN, Math guide: using the equation editor, The Document Foundation, 07/03/2013, bit.ly/17auTmw.
M. BERI, Espressioni regolari, Apogeo, 2007, amzn.to/1b0tGRH.
G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, bit.ly/17dxX0b.
B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, bit.ly/XKt7Cp.
F. BRIVIO, L'umanista informatico, Apogeo, 2010, amzn.to/1fb3CZd.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”, 49, 11, 2003, bit.ly/18I1YJp.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, From niches to riches: the anatomy of the long tail, “Heinz Research Papers”, 51, 06/01/2006, bit.ly/GHk7y3.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales, “Management Science”, 57, 8, 2011, bit.ly/196Bu39.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, The longer tail: the changing shape of Amazon sales distribution curve, 09/20/2010, available at SSRN: bit.ly/15QfXyK.
C. CAIN MILLER, E-books top hardcovers at Amazon, “The New York Times”, 07/19/2010, nyti.ms/118yEuR.
  • M. CANDUCCI, XML, Apogeo, 2005, amzn.to/1fb4UDn.
C. J. S. CHEN, D. KAPLAN, Bayesian propensity score analysis: simulation and case study, Society for Research on Educational Effectiveness, 2011, bit.ly/16N541w.
J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, bit.ly/1b24xGp.
M. COKER, How data-driven decisions *might* help indie ebook authors reach more readers, “RT Booklovers Convention”, 04/25/2012, slidesha.re/Xhccuo.
Consumer survey on ebooks, Open eBook Forum, 2003, available at im+m: bit.ly/15Olump.
R. CORRIGAL, The essential wGet-GUIde, wGetGUI, 10/17/2007, bit.ly/GFSCV7.
Cos'è quintadicopertina, quintadicopertina, accessed on 09/07/2013, bit.ly/1eeukj9.
Y. CROISSANT, G. MILLO, Panel Data Econometrics in R: The plm Package, “Journal of Statistical Software”, 27, 2, 2008, bit.ly/1cmn1Fm.
C. DE FRANCESCO, G. DELLI ZOTTI, Tesi (e tesine) con PC e Web, Franco Angeli, 2004, amzn.to/GGRYqy.
B. DELEERSNYDER, I. GEYSKENS, K. GIELENS, M. G. DEKIMPE, How cannibalistic is the Internet channel? A study of the newspaper industry in the United Kingdom and the Netherlands, “International Journal of Research in Marketing”, 19, 4, 2002, bit.ly/1bP5umm.
Digital Rights Management, Wikipedia, accessed on 09/07/2013, bit.ly/WzMF04.
U. ECO, Come si fa una tesi di laurea, Bompiani, 1971, amzn.to/1980EQV.
A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015, 09/05/2006, bit.ly/19uLOzX.
M. FAN, Y. HU, A. YU, Pricing strategies for tied digital contents and devices, “Decision Support Systems”, 51, 3, 2011, bit.ly/16sHzsj.
G. FUSINA, Gli abbonamenti ai quotidiani digitali, dataninja.it, 09/18/2013, bit.ly/1fgfbwG.
M. GARDENER, Beginning R: the statistical programming language, John Wiley & Sons, 2012, amzn.to/15VwDzI.
M. GARRELS, Bash guide for beginners, The Linux Documentation Project, 12/27/2008, bit.ly/1fUcnIl.
German resale price maintenance act, Börsenverein des Deutschen Buchhandels, 07/14/2006, bit.ly/17bdJF6.
A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, bit.ly/1aeZtxj.
A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1, 2006, bit.ly/1983mWp.
K. F. HALLOCK, R. KOENKER, Quantile regression, “Journal of Economic Perspectives”, 15, 4, 2001, bit.ly/15gERVI.
L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, amzn.to/1aeZCAO.
  • L. HAZARD OWEN, Thanks to e-books, flat revenue is no problem for publishers, “TIME”, 03/30/2012, ti.me/UFbwiq.
Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural experiment, 08/29/2011, available at SSRN: bit.ly/Y2G650.
International Standard Book Number, Wikipedia, accessed on 09/07/2013, bit.ly/16kdpGT.
Y. KATZNELSON, A (brief) introduction to inferential statistics, UCSC, 2006, bit.ly/1af1h9v.
Kindle Format 8 Overview, Amazon.com, accessed on 09/07/2013, amzn.to/Vx6NlD.
D. K. KIRKPATRICK, Online sales of used books draw protest, “The New York Times”, 04/10/2002, nyti.ms/Y7mbUm.
R. KOENKER, Quantile regression for longitudinal data, “Journal of Multivariate Analysis”, 91, 1, 2004, bit.ly/17MmzwK.
R. KOENKER, Quantile regression in R: a vignette, The Comprehensive R Archive Network, 09/04/2012, bit.ly/1e3qExX.
La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE.
S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2, 2005, bit.ly/GHOhQB.
Long tail, Wikipedia, accessed on 09/07/2013, bit.ly/ZX5aLb.
S. LOWE, Do ebooks cannibalize print sales?, “Publishing Bits”, 07/28/2009, bit.ly/YymFj6.
D. LUBIAN, Appunti di teoria delle distribuzioni limite, Università degli Studi di Verona, 03/13/1999, bit.ly/19uP3rb.
R. J. LUCCHETTI, Elementi di econometria, Università Politecnica delle Marche, 12/21/2011, bit.ly/1fcdTo1.
Y. MATIAS, Insights into what the world is searching for: the new Google Trends, “Inside Search”, 09/27/2012, bit.ly/16T8VnI.
K. B. MONROE, Buyers' subjective perceptions of price, “Journal of Marketing Research”, 10, 1, 1973, bit.ly/17MoGQW.
Nota metodologica in La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE.
Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
G. PALOMBA, Elementi di statistica per l'econometria, Clua Edizioni Ancona, 2010, bit.ly/1af2U7d.
G. PALOMBA, Modelli a variabili dipendenti qualitative, Università Politecnica delle Marche, 2008, bit.ly/19uPBgS.
G. PALOMBA, Panel data, Università Politecnica delle Marche, 2008, bit.ly/17bgJBA.
H. M. PARK, Regression models for binary dependent variables using Stata, SAS, R, LIMDEP and SPSS, Indiana University, 2009, bit.ly/15gIxGY.
  • A. PLANT, The economic aspects of copyright in books, “Economica”, 1, 2, 1934, bit.ly/1giWKb4.
Portable Document Format, Wikipedia, accessed on 09/07/2013, bit.ly/WRAkm3.
D. RAY, Economic inequality in D. RAY, Development economics, Princeton University Press, 1998, amzn.to/1a4AECQ.
Report by Kantar Media in Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
M. RICH, Math of publishing meets the e-book, “The New York Times”, 02/28/2010, nyti.ms/YUlmO4.
M. RICH, Steal this book (for $9.99), “The New York Times”, 05/16/2009, nyti.ms/11MVb0z.
M. RICH, B. STONE, E-book price increase may stir readers' passions, “The New York Times”, 12/06/2010, nyti.ms/XF4v2l.
G. SBAFFO, Da Narcissus.me a Newton Compton. La rivelazione Anna Premoli vince il premio Bancarella, Simplicissimus Book Farm, 07/22/2013, bit.ly/1fBRgrC.
E. SCHNITTMAN, Ebooks don't cannibalize print, people do, “Black Plastic Glasses”, 09/27/2010, bit.ly/11Nk78l.
B. SCHRAMPFER AZAR, Understanding and using English grammar, Pearson Longman, 2002, amzn.to/198c6tS.
M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print ebooks, 08/04/2012, available at SSRN: bit.ly/WcNhJq.
P. SMITH, The war between Nook & Kindle is over and Amazon has carried the day, “IT world”, 07/10/2013, bit.ly/19pynkV.
O. SOLON, J.K. Rowling's Pottermore details revealed: Harry Potter e-books and more, “Wired”, 06/23/2011, bit.ly/WLzGY4.
SQL, Dipartimento del Tesoro, 10/11/2008, bit.ly/1ckfA1t.
P. ST. JOHN MACKINTOSH, E-reader price war breaks out in UK as Kobo chases Nook down, “TeleRead”, 06/17/2013, bit.ly/GAAbkv.
C. M. TEICHER, Lots of little improvements make the Kindle 3 the best e-ink e-reader, “Publishers Weekly”, 08/26/2010, bit.ly/1cd3YNB.
US sues Apple and publishers over e-book prices, BBC, 04/11/2012, bbc.in/13fslFO.
M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, amzn.to/17bknLN.
Web scraping, Wikipedia, accessed on 09/07/2013, bit.ly/1e1Oo8h.
T. J. WEBSTER, Managerial economics: theory and practice, Elsevier, 2003, amzn.to/17MAmn0.