Stefano Tombolini
An empirical investigation of the Italian
digital publishing market
This work is licensed under a Creative Commons Attribution 3.0 License. To view a
copy of this license, visit bit.ly/T67jSf.
To Giovanni and Paolino.
ABSTRACT
This thesis analyzes the Italian market for digital books (e-books) from a statistical and economic point of view, using previously unpublished catalog and sales data for the period 2010-2013.
First, the pricing strategies of the major Italian publishing houses are described through OLS and quantile linear regression models on catalog data, identifying multiple price points.
Thanks to the high level of detail of the available cross-sectional dataset, it was possible to study the relationship between digital and print prices, an analysis that is original with respect to the reference literature.
Second, the work examines the concentration and the price sensitivity, at the level of the individual title, of the sales of an Italian e-book distributor focused on small and medium publishers and self-publishing.
The reasoned use of concentration statistics and the linear regression model, estimated consistently with the longitudinal nature of the sales panel, reveal strong similarities between the demand for digital books and the demand for print books, rather than changes in purchasing patterns.
This is due partly to intrinsic characteristics of the book market, and partly to publishing supply policies developed by analogy with the print market, as shown by the analysis of the catalog data.
In the future, however, alternative business models based on tying and bundling strategies could emerge, owing to strong economic incentives in the Internet distribution of digital goods, as illustrated in the conclusions.
Keywords: e-book, price elasticity, OLS linear regression, panel linear regression, quantile linear regression, concentration.
Table of contents
Abstract
Table of figures
Table of tables
I. Introduction
II. The digital publishing industry
    II.1. History and technicalities
    II.2. Supply chain
    II.3. The Italian e-book market
III. Review of the literature
    III.1. Digital “cannibalization” of physical sales
    III.2. “Superstars” vs. “underdogs”
    III.3. Survey and descriptive evidence
    III.4. Econometric evidence
IV. E-book pricing by major Italian publishers
    IV.1. Catalog dataset
        IV.1.1. Descriptive statistics
        IV.1.2. Missing data
    IV.2. OLS linear regression results
    IV.3. Quantile linear regression results
V. A “long-tail-oriented” distributor sales
    V.1. Sales dataset
        V.1.1. Panel structure
    V.2. Concentration analysis
    V.3. Panel linear regression results
VI. Discussion
    VI.1. Limitations of the study
        VI.1.1. Catalog analysis
        VI.1.2. Sales analysis
    VI.2. Possible further developments
Appendix
    Short glossary of the e-book
        3G
        Adobe
        Adobe DRM
        AZW
        DRM
        E-book
        E-book reader (e-reader)
        E Ink®
        EPUB
        LCD
        PDF
        Social DRM
        Tablet
        Wi-Fi
    Short chronology of the e-book
        1971
        1987
        1993
        1994
        1995
        1996
        1998
        1999
        2000
        2002
        2005
        2006
        2007
        2008
        2009
        2010
        2011
Bibliography
Table of figures
Fig. 1: Growth in US e-book revenue (2002-2011)
Fig. 2: The publishing industry supply chain
Fig. 3: Graphical illustration of the “long tail” hypothesis
Fig. 4: Concentration can be a misleading measure of the “long tail”
Fig. 5: Calibration between sales and rank
Fig. 6: Likelihood to pay for downloading a single e-book
Fig. 7: Zero-profit locus of price and output combinations
Fig. 8: Pie chart of e-book protection mechanisms
Fig. 9: Pie chart of e-book subjects
Fig. 10: Histogram of ebook.price
Fig. 11: Histogram of paper.price
Fig. 12: Histogram of ebook.pub.delay
Fig. 13: Scatter plot of the quantitative catalog variables
Fig. 14: Residuals normal Q-Q plot (OLS)
Fig. 15: Plot of residuals vs. fitted values (OLS)
Fig. 16: Graphical view of const (QUANTREG)
Fig. 17: Graphical view of paper.price (QUANTREG)
Fig. 18: Graphical view of file.is.open (QUANTREG)
Fig. 19: Graphical view of subj.fiction (QUANTREG)
Fig. 20: Graphical view of ebook.pub.delay.pos (QUANTREG)
Fig. 21: Graphical view of ebook.pub.delay.pos.sq (QUANTREG)
Fig. 22: Graphical view of ebook.pub.delay.neg (QUANTREG)
Fig. 23: Graphical view of ebook.pub.delay.neg.sq (QUANTREG)
Fig. 24: Graphical view of the pseudo-R-squared (QUANTREG)
Fig. 25: Bar chart of Stealth sales and catalog data
Fig. 26: Lorenz curves of revenue distributions (TOT.SAMPLE)
Fig. 27: Lorenz curves of unit sales distributions (TOT.SAMPLE)
Fig. 28: Lorenz curves of revenue distributions (SUB.SAMPLE)
Fig. 29: Lorenz curves of unit sales distributions (SUB.SAMPLE)
Fig. 30: Plot of 95% confidence intervals for lnprice (PANEL)
Fig. 31: A Google Trends query by author
Fig. 32: A Google Trends query by title
Fig. 33: Demand for bundles of information goods
Table of tables
Tab. 1: Percentage of titles surviving more than 58 years
Tab. 2: Summary statistics for the quantitative catalog variables
Tab. 3: Correlation matrix of the catalog variables
Tab. 4: Table of coefficients (LOGIT)
Tab. 5: Table of coefficients – preliminary (OLS)
Tab. 6: White test (OLS)
Tab. 7: Table of coefficients – final (OLS)
Tab. 8: Breusch-Godfrey test (OLS)
Tab. 9: Ramsey RESET test (OLS)
Tab. 10: Table of coefficients – 0.05th quantile (QUANTREG.05)
Tab. 11: Table of coefficients – 0.25th quantile (QUANTREG.25)
Tab. 12: Table of coefficients – 0.50th quantile (QUANTREG.50)
Tab. 13: Table of coefficients – 0.75th quantile (QUANTREG.75)
Tab. 14: Table of coefficients – 0.95th quantile (QUANTREG.95)
Tab. 15: Gini coefficients of revenue distributions (TOT.SAMPLE)
Tab. 16: Gini coefficients of unit sales distributions (TOT.SAMPLE)
Tab. 17: Gini coefficients of revenue distributions (SUB.SAMPLE)
Tab. 18: Gini coefficients of unit sales distributions (SUB.SAMPLE)
Tab. 19: F test for individual effects (PANEL.1)
Tab. 20: F test for individual effects (PANEL.2)
Tab. 21: F test for individual effects (PANEL.3)
Tab. 22: Hausman test (PANEL.1)
Tab. 23: Hausman test (PANEL.2)
Tab. 24: Hausman test (PANEL.3)
Tab. 25: Table of coefficients (PANEL.1)
Tab. 26: Table of coefficients (PANEL.2)
Tab. 27: Table of coefficients (PANEL.3)
I. INTRODUCTION
The present work is a statistical and economic study of original catalog
and sales data for the Italian digital publishing market during the period
2010-2013.
Essential historical, technical, and economic information about the digital publishing industry, and the Italian context in particular, is provided in
chapter II.
The review of the literature in chapter III is quite extensive, and not
only for completeness' sake: it has been instrumental in defining the economic and statistical framework.
Chapter IV presents the results of the analysis of catalog data for e-books by
major Italian publishers, carried out through OLS and quantile linear regression and identifying multiple price points.
Thanks to the detailed information available in the cross-sectional catalog dataset, it was possible to contribute a joint analysis of e-book and print
book prices to the reference literature.
An interpretation in terms of pricing behavior and market assumptions
by incumbent print publishing houses is attempted.
Chapter V presents the results of the analysis of sales data for a panel
of titles distributed by a “long-tail-oriented” Italian e-book distributor.
Concentration measures and panel linear regression models are used to
investigate the distribution and the price sensitivity of sales on a title basis.
We find little support in the data for the hypothesis of shifts in consumer tastes towards “niche” products.
On the one hand, the finding is consistent with inherent characteristics
of the book market; on the other, it might also be ascribed to a publishing offer
based on analogy with the print market, as suggested by the analysis of
catalog data.
Finally, in chapter VI, the methodological limitations of the research,
both in technical and interpretative terms, are discussed.
In addition to possible adjustments to the original models, alternative
paradigms, more in line with the economics of digital markets for information
goods, are outlined.
Innovative tying and bundling strategies by e-book distributors could
reshape the digital publishing industry and represent a serious competitive
threat for incumbent players in the print publishing industry.
II. THE DIGITAL PUBLISHING INDUSTRY
A basic understanding of the digital publishing industry supply chain requires some knowledge of the historical and technical aspects of e-book production and distribution.
Sections II.1 and II.2 are preparatory to the comprehension of the variables in the datasets and to the interpretation of the results of the analyses.
Readers unfamiliar with the digital publishing landscape might benefit
from a quick survey of the short glossary and chronology in the Appendix.
II.1. HISTORY AND TECHNICALITIES
The digital storage of books probably dates as far back as the 1960s, in
parallel to the development of the ASCII (American Standard Code for Information Interchange) character-encoding scheme.1
Over the years, the introduction of new encodings has allowed text files
to represent many alphabets other than the English one.
At least since the early 1990s, the supply side of the book market engaged in the digitization of the production process; by that time, ADOBE Portable Document Format (PDF), which “made it possible to create complex
text documents with professional grade software”2, was already available.3
However, it was only at the end of the 2000s, with the advent of specialized hardware devices such as the AMAZON Kindle e-reader and the APPLE
iPad tablet computer, that consumers began to perceive e-books as a viable
alternative to traditional printed books.4
1 See ASCII, Wikipedia, accessed on 09/07/2013, bit.ly/WBUEXv.
2 B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 9, bit.ly/XKt7Cp.
3 Portable Document Format, Wikipedia, accessed on 09/07/2013, bit.ly/WRAkm3.
4 B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 8, bit.ly/XKt7Cp.
In 2011, 20% of US Internet users purchased e-books, and e-book
sales reached 8% of trade book sales and 18% of fiction book sales;5 the share of US book consumers who also buy e-books grew from
13% in 2010 to 17% in 2011.6
Fig. 1: Growth in US e-book revenue (2002-2011)
Source: M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, p. 30, fig. 1, available at SSRN: bit.ly/WcNhJq.
AMAZON, the multinational e-commerce company, is the most successful
global market player: in 2011, 70% of US e-book consumers bought
at least one of the 950,000 titles available in digital format on Amazon.com in
the same year.7
5 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
6 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
7 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
In July 2010, AMAZON announced that, for the previous three months,
“sales of books for its e-reader, the Kindle, outnumbered sales of hardcover
books.”8
Six months later, Kindle books overtook paperback books to become the most
popular format on Amazon.com.9
By May 2011, AMAZON had been selling “more Kindle books than all
print books – hardcover and paperback – combined”10.
These numbers are remarkable indeed: AMAZON has been selling
printed books since 1995, whereas Kindle was introduced only in 2007,11 and
the Kindle catalog is still a small fraction of the printed one.12
The figures do not even include free Kindle books, mostly out-of-copyright pre-1923 titles.13,14
The earnings reports of large book publishers for 2011 also suggest that
“e-books are generally more profitable than print books,”15 despite roughly
flat yearly revenues.
In April 2012, the US DEPARTMENT OF JUSTICE sued the technology giant
APPLE and five major international book publishers (HACHETTE,
HARPERCOLLINS, MACMILLAN, SIMON AND SCHUSTER, and PENGUIN) for violation
of antitrust laws.16
8 C. CAIN MILLER, E-books top hardcovers at Amazon, “The New York Times”, 07/19/2010, nyti.ms/118yEuR.
9 Amazon.com now selling more Kindle books than print books, “Amazon Media Room: Press Releases”, 05/19/2011, bit.ly/119B0K2.
10 Ibid.
11 Ibid.
12 C. CAIN MILLER, E-books top hardcovers at Amazon, “The New York Times”, 07/19/2010, nyti.ms/118yEuR.
13 Ibid.
14 Amazon.com now selling more Kindle books than print books, “Amazon Media Room: Press Releases”, 05/19/2011, bit.ly/119B0K2.
15 L. HAZARD OWEN, Thanks to e-books, flat revenue is no problem for publishers, “TIME”, 03/30/2012, ti.me/UFbwiq.
16 US sues Apple and publishers over e-book prices, BBC, 04/11/2012, bbc.in/13fslFO.
In December 2011, the EUROPEAN COMMISSION had already opened formal antitrust proceedings against the same firms for anti-competitive practices.17
According to the accusations, book publishers teamed up with APPLE to
restrain retail price competition in the e-book market.
Initially, e-books were sold under a wholesale model, in which publishers
set a wholesale price and each retailer decides the retail price to charge
for the title.
AMAZON set $9.99 as its own price ceiling for e-books, an aggressive
marketing policy to attract customers to the Kindle platform.18
Publishers reacted by shifting to agency pricing, where they fix the final
customer price and pay a commission to the retailer.
When APPLE launched the iBooks platform on iPad and iPhone, it accepted and advocated the adoption of such a pricing model for e-books.
AMAZON, faced with the prospect of offering a smaller catalog than competitors, had to stop its $9.99 price policy and accept agency terms from publishers.19,20
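The difference between the two pricing models described above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: the function names, the $12.00 wholesale price, the $12.99 agency price, and the 30% commission rate are hypothetical values chosen for the example and do not come from the sources cited here; only the $9.99 retail price is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """How the price paid by the customer is divided for one copy sold."""
    consumer_price: float
    publisher_revenue: float
    retailer_revenue: float


def wholesale(wholesale_price: float, retail_price: float) -> Split:
    # Wholesale model: the publisher is paid the wholesale price,
    # while the retailer freely chooses the retail price and keeps
    # (or absorbs) the difference.
    return Split(retail_price, wholesale_price, retail_price - wholesale_price)


def agency(consumer_price: float, commission_rate: float) -> Split:
    # Agency model: the publisher fixes the final customer price and
    # pays the retailer a commission on it.
    commission = consumer_price * commission_rate
    return Split(consumer_price, consumer_price - commission, commission)


# Hypothetical wholesale price of $12.00 for a title retailed at $9.99.
w = wholesale(wholesale_price=12.00, retail_price=9.99)

# Hypothetical agency price of $12.99 with an assumed 30% commission.
a = agency(consumer_price=12.99, commission_rate=0.30)
```

Under these assumed numbers the retailer's revenue per copy is negative in the wholesale case, which is the sense in which a low price ceiling can work as an aggressive customer-acquisition policy, whereas under agency terms the retailer's margin is positive by construction but the final price is no longer under its control.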
At the very beginning of the 2000s, the ADOBE PDF file format had already found success among consumers, thanks to the ADOBE Acrobat Reader
free viewing tool, and was used to “pioneer the commercial distribution of ebooks in Internet”21.
17 Antitrust: Commission opens formal proceedings to investigate sales of e-books, European Commission, 12/06/2011, bit.ly/ZBWTjW.
18 US sues Apple and publishers over e-book prices, BBC, 04/11/2012, bbc.in/13fslFO.
19 Ibid.
20 M. RICH, B. STONE, E-book price increase may stir readers' passions, “The New York Times”, 12/06/2010, nyti.ms/XF4v2l.
21 B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 10, bit.ly/XKt7Cp.
The PDF file format is page-oriented and provides very precise layout
control; these features are convenient for “high quality publications that require very precise element spacing or include many image elements that
have to be precisely positioned”22, but pose technical problems as far as the
needs of the e-book market are concerned.
[…] the downturn of the PDF file format is that is quite difficult to read and use
on small to medium size screens, like for example smartphones or very small tablet
computers. The reason is that the text elements in PDF files are of a fixed size that
is relative to the page size of the document and not to the size of the screen of the
reader device. The user barely has the difficult choice between viewing the whole
page with very small text characters or viewing only a magnified part of the page
that has to be moved around all the time. 23
During the 2000s, attempts by the digital publishing industry to address
the file format issue led to the emergence of two different de facto standards.
The OPEN EBOOK FORUM, an international publishing industry organization created in 2000 and later renamed INTERNATIONAL DIGITAL PUBLISHING
FORUM (IDPF), proposed the Open eBook (OeB) format, later replaced by the
EPUB format, “in an attempt to set a common industry standard”24.
Meanwhile, MOBIPOCKET, a company founded in 2000, concurrently developed the proprietary MOBI file format, very similar to the Open eBook
specification.
In 2005, AMAZON acquired the company.
[…] the MOBI file format was modified and transformed into the AZW file format. The AZW file format is now a proprietary format of AMAZON and there is no
public specification available.25
22 Ibid., p. 29.
23 Ibid.
24 Ibid., p. 10. The EPUB file format is an open standard based on existing standard formats and algorithms: XML (eXtensible Markup Language), XHTML (eXtensible HyperText Markup Language), and ZIP (an open archiving file format).
25 Ibid., pp. 29-30.
Unlike PDF, neither EPUB nor AZW allows pagination or precise
page layout; however, their text content is reflowable: “it adapts itself to the
size of the screen it is displayed on”26 and to the viewing preferences of the
user (e.g. bigger font size).
This is a very convenient feature for simple books that do not require
precise layout and image positioning (unlike comics, textbooks, and other
richly illustrated books).
The conventional wisdom about “open and standard formats” vs.
“closed and proprietary formats” is that the former would be advantageous
for producers, because of the complete control over the production process,
and for consumers, because of the implicit guarantee against technological
lock-in, whereas the latter would ensure distribution exclusivity and, well,
customer lock-in.27
This line of reasoning is broadly correct, but, in the specific case of
EPUB vs. AZW, it must be mitigated by the observation of two facts.28
First, AMAZON has taken care to provide interoperability for its format: to date, producers can easily create AZW files starting from EPUB
files, and consumers can read AZW files with the official reading software,
readily available for free on many platforms and devices other than the Kindle e-reader itself.
Secondly, in order to prevent copyright infringement, many publishers
encrypt their EPUB files with DRM (Digital Rights Management)29 technologies, similar to those implemented by AMAZON in its AZW format.
Usually, the user cannot copy & paste or print the content of encrypted
e-books, which can be read only on a limited number of authorized DRM-compatible devices.
26 Ibid., p. 20.
27 Ibid., p. 30.
28 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
29 See Digital Rights Management, Wikipedia, accessed on 09/07/2013, bit.ly/WzMF04.
As a consequence, the EPUB and the AZW formats can be considered
very close substitutes.30
II.2. SUPPLY CHAIN
Fig. 2 presents a modified version of the publishing industry supply
chain as depicted by Blazejewski,31 with the aim of providing a reference
scheme and highlighting recent industry trends.
30 A note for the tech-savvy reader: hereafter EPUB refers to the widely adopted EPUB 2.0.1
specification.
The latest EPUB 3.0 specification introduced many modifications, aimed at improving the
presentation of multimedia content and the expression of complex mathematical notation.
For more information, see B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, pp. 18-21, bit.ly/XKt7Cp.
AMAZON responded with the development of its new KF8 (Kindle Format 8) file format.
For more information, see Kindle Format 8 Overview, Amazon.com, accessed on 09/07/2013,
amzn.to/Vx6NlD.
31 B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse,
2011, p. 24, bit.ly/XKt7Cp.
Fig. 2: The publishing industry supply chain
Source: own elaboration on B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg Suisse, 2011, p. 24, fig. VIII, bit.ly/XKt7Cp.
For the sake of explanation, we minimized the level of integration of the supply chain; in practice, we often observe a variety of higher degrees of vertical
and horizontal integration (vertical integration between publishing houses
and offline distributors, horizontal integration between offline and online
bookstores, etc.).
The author's manuscript, typescript, or, more probably nowadays, digital text document (.doc, .docx, .rtf, .odt, etc.) has to be delivered to the
reader in a suitable format, viz. a printed book or an e-book.
We can identify three stages in the production and commercialization
process.32
• Publishing: the author's input is treated and converted into output formats of quality high enough to be printed and bound, or displayed on e-book readers.
• Distribution: the output formats are cataloged, stocked, and distributed to retailers.
• Retailing: the printed and/or digital versions of the book are made available for purchase to the public.
32 Again, these stages may or may not correspond to as many specialized market operators.
While the second and the third stage are straightforward enough, the
first stage needs a more in-depth analysis.
During this phase, the author is faced with an alternative: whether to
sign a publishing contract with a publisher or to self-publish.
The distinction is relevant from many points of view, and there is an ongoing debate about the advantages and disadvantages of each option.
However, if we assume that the self-publishing author can find on the
market the typical services provided by a publishing house (editing, translation, composition, etc.), we can set aside the technical aspects
and focus on a few fundamental economic considerations.
On the one hand, a “traditional” author receives from his publishing
house royalties on books sold, which provide “an incentive to the author to
help market his book”33.
The analogy of the author with “a salesman on commission”34 is intuitive and more persuasive than the analogy of royalties with taxes.
On the other, a self-publishing author has to arrange the whole production and commercialization process of his book.
33 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the
Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 590,
bit.ly/17dxX0b.
34 Ibid., p. 591.
It is very likely that he will need to outsource part of it, which explains
the existence of specialized companies (self-publishing platforms).
The higher the fixed and circulating capital requirements, the clearer
the case for the role of intermediaries in the supply chain.
Thanks to the introduction of low-cost and user-friendly desktop publishing, Internet distribution, and e-book reading devices, however, it is now possible, at least in principle, for an author to achieve a perfect vertical integration of the digital supply chain.35
Therefore, we may observe an even greater variety of degrees of supply
chain integration in the future.
II.3. THE ITALIAN E-BOOK MARKET
In 2011, the Italian e-book market experienced fast growth.
The number of e-book consumers reached 1.1 million people, approximately 2.3% of the Italian adult (14+) population, with a yearly growth rate of
59%.36
Sales of e-reading devices increased by 718%, from €16 million in 2010
to €131 million in 2011.37
In the same year, e-book sales still represented a tiny fraction of the
publishing industry total sales (€3 million vs. €1.3 billion, approximately),38
but the estimated 2011-2012 growth rate is 300%, with approximately €12
million worth of e-book sales.39
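The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the helper name `growth_rate` is our own, and the inputs are the rounded euro amounts reported in the text, so small discrepancies with the quoted percentages are to be expected.

```python
def growth_rate(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100.0


# E-reading device sales, in millions of euros (2010 vs. 2011).
device_growth = growth_rate(16, 131)

# E-book sales, in millions of euros (2011 vs. the 2012 estimate).
ebook_growth = growth_rate(3, 12)

# E-book sales as a percentage of total publishing sales in 2011
# (3 million vs. approximately 1.3 billion euros).
ebook_share_2011 = 3 / 1300 * 100.0
```

With the rounded inputs, the device figure comes out at about 719%, in line with the reported 718%, and the e-book figure at exactly 300%; the 2011 e-book share of total publishing sales is roughly 0.2%, which gives a sense of how small the base of that growth was.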
35 For a partial, yet worthy of mention, example of disintermediation, see O. SOLON, J.K. Rowling's Pottermore details revealed: Harry Potter e-books and more, “Wired”, 06/23/2011,
bit.ly/WLzGY4.
36 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
37 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
38 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
39 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
In May 2012, 32,000 titles were available in digital format, 4.4% of the
2012 book catalog, a 180% increase over the previous period.40
The e-book distribution platform Edigita, by FELTRINELLI, RCS, GEMS,
and other Italian publishers, is the market leader in Italy, both in terms of
value and catalog size.41
The publishing house MONDADORI, which distributes its own e-books, is
comparable to Edigita in terms of value, in spite of a relatively small catalog.
Stealth, the e-book distribution platform of SIMPLICISSIMUS BOOK FARM, is comparable to Edigita in terms of catalog size but has relatively low sales, because
of its focus on small and medium publishers and self-publishing authors.42
Both the paper and the digital book retailing industry are quite
crowded, not only by many publishing houses, distributors, and specialized
retailers, but also by “outsiders”, such as consumer electronics retailers and
telecommunications operators.43
However, the Italian market is probably being shaken by the entrance of
the main global industry players, AMAZON in particular, with the national release of Amazon.it in September 2010 and Kindle in December 2011.44
On the 16th of May 2013, ISTAT (ISTITUTO NAZIONALE DI STATISTICA) published a comprehensive report about the supply and demand of books in Italy
for the years 2011 and 2012.45
The report is based on the results of two different surveys:46
40 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
41 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
42 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
43 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
44 Private estimates from SIMPLICISSIMUS BOOK FARM 2012 business plan.
45 La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013,
bit.ly/13RcbPE.
46 Nota metodologica, p. 1 in La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE.
• a supply-side survey about book production, administered to Italian publishers (almost 2,700 in total) in 2012;47
• a demand-side survey about everyday habits and lifestyles, administered to a sample of Italian families (19,300 in total, distributed in 853 cities and towns) in 2011 and 2012.48
The Italian publishing industry is very concentrated: in 2011, 11.3% of
active publishers49 published 75.8% of the entire book catalog and printed
88.7% of the total number of copies.50
In 2011, over 15% of the 9,000 print titles published in Italy in the same
year were also made available in e-book format.51
Most of these digital titles (67.2%) are adult non-fiction books (natural
sciences, linguistics, law and administration, geography and travel, information technology) and classic literary texts.52
It must be noted that, usually, fiction books attract a larger public and
are more price-elastic and promotion-elastic than non-fiction books.
In the same year, only one e-book out of four presented extra content or
additional features (hypertext links, multimedia, etc.) with respect to its print
47 Publications shorter than five pages and propagandist, advertising, and informative materials are excluded from the inquiry.
48 “Readers” are defined as persons aged 6+ who have read at least one book in their leisure
time during the 12 months prior to the interview.
49 The survey also includes companies that publish and print books as an accessory activity: in
2011, 25.2% of the responding publishers did not publish any book.
For more information, see Nota metodologica, p. 1 in La produzione e la lettura di libri in
Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE and La produzione e la lettura di
libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, p. 11, note 3, bit.ly/13RcbPE.
50 La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, p. 10, tab. 6,
bit.ly/13RcbPE.
51 Ibid., p. 17.
52 Ibid.
edition,53 while 79.1% of the Italian digital publications were protected by
DRM technologies54.55
In 2012, almost 5.5 million people aged 16-74 used a mobile device
(cellphones, smartphones, PDAs, MP3 players, e-book readers, handheld
game consoles, etc.) to connect to the Internet away from home or the workplace.56
Among them, 13.2% (over 700,000 people) read books online or downloaded e-books, in line with the European average (13%).57,58
53 Ibid.
54 See section II.1.
55 La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, p. 17,
bit.ly/13RcbPE.
56 Ibid., p. 15.
57 La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, p. 15,
bit.ly/13RcbPE.
58 The ISTAT estimate is lower than the SIMPLICISSIMUS BOOK FARM private estimate reported
above in this section.
III. REVIEW OF THE LITERATURE
Due to the youth of the digital publishing industry, especially in Italy,
our research subject is relatively unexplored.
Nevertheless, we have drawn valuable contributions from the voluminous literature on related subjects, such as the commercial impact of digital
products, the characteristics of Internet distribution, and the peculiarities of
the publishing industry.
III.1. DIGITAL “CANNIBALIZATION” OF PHYSICAL SALES
Some publishers and authors are skeptical about the profitability of e-books and, thus, the sustainability of the industry: they fear that the digital
versions of their books could “cannibalize” higher-priced print sales, without
enough market growth to offset declining prices.59
The implicit assumption behind this line of reasoning is that paper
books and e-books are products homogeneous enough to be considered very
close substitutes, which implies highly positive cross-price elasticities.
However, other industry observers point to the fact that e-book consumers represent a relatively distinct market segment for a relatively differentiated product.60
Once consumers invest in a device that allows them to carry an entire library and to search, scale, and highlight text, they shop directly in the on-device e-book stores.
59 M. RICH, Steal this book (for $9.99), “The New York Times”, 05/16/2009, nyti.ms/11MVb0z.
Interestingly, the article reports that publishers expressed similar concerns at the time of
the introduction of the paperback format, which, eventually, expanded the demand for
books, even though it cannibalized hardcover sales.
60 E. SCHNITTMAN, Ebooks don't cannibalize print, people do, “Black Plastic Glasses”,
09/27/2010, bit.ly/11Nk78l.
In contrast, print format selection (hardcover vs. paperback, usually)
follows book title selection, which explains the publication delay of lower-priced paperback editions with respect to higher-priced hardcover editions.61
Hu and Smith (2011) use data from a natural experiment to test the significance of cross-channel effects between e-books and print books.62
In April and May 2010, a publisher stopped distributing Kindle titles to
AMAZON, but returned to release Kindle e-books and print (hardcover) books
simultaneously in June 2010.
The titles published in April and May 2010 are similar to those published in March and June 2010 along some observable dimensions. 63
The latter group serves as “control” (no publication delay of the e-book
version with respect to the print edition) for the former “experimental” group
(publication delay of the e-book version with respect to the print edition variable between one and eight weeks).64
The most robust finding of the research is a significant decrease in
overall digital sales caused by delaying the publication of the e-book edition
relative to the print edition.
However, there is also some evidence of cross-channel substitution for
“popular” books (defined in the paper as top 20% books ranked by sales) 65.66
In the case of popular books, content selection might precede channel
selection, whereas for “niche” books (defined in the paper as bottom 80%
61 S. LOWE, Do ebooks cannibalize print sales?, “Publishing Bits”, 07/28/2009, bit.ly/YymFj6.
62 Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural
experiment, 08/29/2011, p. 9, available at SSRN: bit.ly/Y2G650.
63 Ibid., p. 12.
64 Ibid., p. 10.
65 Ibid., p. 19.
66 Ibid., p. 25.
books ranked by sales)67 consumers might be more likely to search for alternatives in their preferred channel.68
Unfortunately, in order to look for further evidence on digital cannibalization of physical sales, we have to turn our attention to other publishing
products that have already undergone a substantial process of digitization.
In an early study of the effects of the addition of Internet channels by
newspaper companies, Deleersnyder et al. (2002) collect data for 85 online
newspapers launched in the UK and the Netherlands between 1991 and 2001.69
In the newspaper industry context, cannibalization is represented by a
reduction in circulation and/or advertising revenues. 70
Based on individual and collective evidence, the researchers dismiss
“the often-cited cannibalization fears”71 as “largely overstated”72.
However, they also report a significantly higher probability of circulation revenues cannibalization in case of high overlap between the online and
offline version of the newspapers, measured by surveying the respective webmasters.73
Another noteworthy secondary finding concerns the possible non-neutrality of the digitization process across product categories: some support
emerges for the hypothesis that economic newspapers might benefit more
than average from an online channel addition.74,75
67 Ibid., p. 19.
68 Ibid., p. 8.
69 B. DELEERSNYDER, I. GEYSKENS, K. GIELENS, M. G. DEKIMPE, How cannibalistic is the
Internet channel? A study of the newspaper industry in the United Kingdom and the Netherlands, “International Journal of Research in Marketing”, 19, 4, 2002, p. 337, bit.ly/1bP5umm.
70 Ibid., p. 342.
71 Ibid., p. 337.
72 Ibid., p. 346.
73 Ibid.
74 Ibid., p. 343 and p. 344, note 10.
75 Recently, similar evidence emerged for the Italian market: in the first seven months of 2013,
Il Sole 24 Ore, the most widespread national daily business newspaper, outperformed the
other national newspapers in terms of digital subscriptions.
For more information, see G. FUSINA, Gli abbonamenti ai quotidiani digitali, dataninja.it,
09/18/2013, bit.ly/1fgfbwG.
In 2002, associations of publishers and authors complained to AMAZON
CEO Jeff Bezos about the negative impact of the promotion of used books by
the famous e-tailer on the sales of new titles.76
Ghose et al. (2006) use data collected between 2002 and 2004 from
Amazon.com new- and used-book marketplace to empirically test this theoretically possible proposition.77
Information and communication technology transformed the very inefficient brick-and-mortar used-book market into a relatively efficient online
market, which shares with the e-book market potentially lower price tags
than the new-book print market.
[…] while brick-and-mortar bookstores have high search costs, limited inventory
capacity, limited geographical coverage, and relatively high prices, IT-enabled markets for used books offer low search costs, nearly unlimited (virtual) inventory capacity, global coverage, and—through competition among sellers—relatively low
prices. […] Internet sales of used books made up an estimated 67% of all used-book
sales in 2004 (Wyatt 2005). This represents the highest Internet penetration for
any physical product category that we are aware of […] 78
The cross-price elasticity of new-book sales with respect to used-book
prices has the expected positive sign, but is rather low.79
According to the theoretical model,80
[…] only 16% of AMAZON used-book sales directly cannibalize new-book purchases; the remaining 84% of sales represent purchases that otherwise would not
have occurred at new-book prices.81
76 D. K. KIRKPATRICK, Online sales of used books draw protest, “The New York Times”,
04/10/2002, nyti.ms/Y7mbUm.
77 A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1,
2006, pp. 4-5, bit.ly/1983mWp.
78 Ibid., p. 4.
79 Ibid., pp. 13-14.
80 Ibid., pp. 6-9.
81 Ibid., p. 17.
Used books may be poor substitutes for new books, mainly because of
possible quality degradation and reseller unreliability.
The indirect method proposed by the researchers to quantify the substitution effect between used and new books treats the two products as homogeneous.82
If, instead, they were relatively differentiated products, such an estimate, based as it is on cross-price elasticity of demand, would be misleading.
This consideration may not be deemed relevant for the used-book market, but might be fundamental for e-books and, more generally, digital products.
The often-cited fears expressed by publishing industry participants
might rest not so much on sales cannibalization from cheaper versions of the
same products, as on market “annihilation” from entirely new products.
In the UK, OFCOM (OFFICE OF COMMUNICATIONS) and the IPO (INTELLECTUAL PROPERTY OFFICE) commissioned KANTAR MEDIA to conduct an
extensive and rigorous83 survey to measure online copyright infringement
levels during the third quarter of 2012, consumer spending on recorded and digital media, and willingness to pay for six different content types:
• music,
• films,
• TV programs,
• computer software,
• books,
82 Ibid., p. 8, eq. 8.
83 For the report, data reconciliations, questionnaire, and data tables, see Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
• and video games.84
Among these product categories, books lag furthest behind in terms
of digitization: the total estimate of digital and physical books consumed is
176 million, of which 39% are e-books consumed via downloading or accessing online (59% for free, of which 21% illegally).85
These numbers are low compared to those of the other content types.
• The total estimate of digital and physical music tracks consumed is 1,403 million, of which 81% are digital tracks consumed via downloading or streaming (72% for free, of which 37% illegally).86
• The total estimate of digital and physical films consumed is 148 million, of which 56% are digital films consumed via downloading or streaming (61% for free, of which 57% illegally).87
• The total estimate of digital and physical TV programs consumed is 272 million, of which 92% are digital TV programs consumed via downloading or streaming (80% for free, of which 21% illegally).88
• The total estimate of digital and physical software products consumed is 69 million, of which 80% are computer software products consumed via downloading or accessing online (85% for free, of which 55% illegally).89
• The total estimate of digital and physical video games consumed is 68 million, of which 55% are digital video games consumed via downloading or accessing online (63% for free, of which 29% illegally).90
84 Report by Kantar Media, pp. 5-6 in Online copyright infringement tracker benchmark study
Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
85 Ibid., pp. 66-67.
86 Ibid., pp. 27-28.
87 Ibid., p. 38.
88 Ibid., pp. 48-49.
89 Ibid., p. 58.
90 Ibid., p. 77.
III.2. “SUPERSTARS” VS. “UNDERDOGS”
It has always been common knowledge among publishing industry
workers that “a small percentage of titles accounts for a large share of sales
of copyrighted materials.”91
According to Lindy Hess, director of the Columbia Publishing Course,
The truth about this business is that, with rare exceptions, nobody makes a
great deal of money. 92
In its 1643 petition to the parliament, the British publishing guild, the
STATIONERS' COMPANY, argued that “scarce one book in three sells well, or
proves gainfull to the publisher.”93
Similar evidence was presented by publishers and authors before
the 1876-8 ROYAL COMMISSION ON COPYRIGHT.
Four books out of five which are published do not pay their expenses […] The
most experienced person can do no more than guess whether a book by an unknown author will succeed or fail.94
[…] only one book in four is a very moderate calculation of the books which are
successful, or the books which pay their expenses. 95
[…] not one book in nine has paid its expenses […] still they [two publishers]
have been able to carry on the trade.96
In 1986 and 1987, according to Liebowitz's estimates,97
91 S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the
role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2,
2005, p. 454, bit.ly/GHOhQB.
92 M. RICH, Math of publishing meets the e-book, “The New York Times”, 02/28/2010,
nyti.ms/YUlmO4.
93 M. RICH, Math of publishing meets the e-book, “The New York Times”, 02/28/2010,
nyti.ms/YUlmO4.
94 Ibid., p. 183.
95 Ibid., p. 185.
96 Ibid.
97 S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the
role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2,
2005, pp. 454-455, bit.ly/GHOhQB.
best-sellers [defined in the paper as the top 124 best-sellers] were likely to have
generated nearly $1 billion in sales out of a total of $1.7 billion.98
These estimates do not even include “sales of best-sellers from previous
years that were still selling in relatively large numbers” 99.
Liebowitz also addresses the question of market longevity, by constructing a small sample of 236 titles from a 1920s edition of the Book Review Digest, which reviewed approximately 25% of the new titles.100
These were the “titles attracting the most attention, written by the
more important authors and published by the better-known houses” 101.
After 58 years, 54% of the 1920s best-selling titles were still in print,
vs. only 33% of the 1920s non-best-selling titles.102
Brynjolfsson et al. (2006) report that a typical brick-and-mortar store in
the early 2000s stocked only 40,000-100,000 unique titles, out of more than
three million books in print.103
In the same period, Amazon.com and other Internet retailers were selling almost the entire catalog of books in print.104
The researchers estimate that 30-40% of Amazon.com sales were in
books not normally available in brick-and-mortar stores. 105
Digital markets could have not only increased product variety, but also
deepened consumer preferences, a phenomenon that web marketing experts
have dubbed “long tail”.106
98 Ibid., p. 455.
99 Ibid.
100 Ibid.
101 Ibid.
102 Ibid., tab. 1.
103 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, From niches to riches: the anatomy of the long
tail, “Heinz Research Papers”, 51, 06/01/2006, p. 3, bit.ly/GHk7y3.
104 Ibid.
105 Ibid.
106 Ibid., p. 4.
Increased product availability implies a shift of the sales distribution towards the tail, i.e. more obscure titles, while the dispersion of consumer preferences modifies the shape of the distribution: a long tail emerges, possibly
at the expense of the head, i.e. top-selling hits.107
Fig. 3: Graphical illustration of the “long tail” hypothesis
Source: A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the
long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015,
09/05/2006, p. 40, fig. 1, bit.ly/19uLOzX.
The distinction between first-order (original) and second-order (derivative) drivers provides a simple framework to describe the supply-side (producers/retailers) and demand-side (consumers) causes of the long tail phenomenon, and to understand its dynamics.108
107 A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long
tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015,
09/05/2006, p. 5, bit.ly/19uLOzX.
108 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, From niches to riches: the anatomy of the long
tail, “Heinz Research Papers”, 51, 06/01/2006, p. 12, exhibit 4, bit.ly/GHk7y3.
From the supply-side point of view, information and communication
technology loosens physical constraints (virtual shelf-space, aggregation of
consumers from different geographical locations, etc.) and reduces production costs (e.g. make-to-order production, such as print-on-demand), distribution costs (e.g. electronic delivery of digital products), and marketing costs
(websites, social networks, etc.).109
From the demand-side point of view, information and communication
technology reduces search costs for consumers thanks to active search tools
(search engines, sampling tools, etc.), passive search tools (recommender
systems, product classifications, etc.), and user-generated content (customer
reviews, online communities, etc.).110
Thus, “niche” products may become viable options for producers, retailers, and consumers.
Moreover, these first-order (original) drivers of the long tail phenomenon could set up a potentially cumulative and self-perpetuating process triggered by second-order (derivative) drivers:
•
the increased profitability of niche products for producers and retailers (supply-side incentive),
•
and the further deepening of consumer tastes towards niche products
(demand-side positive feedback).111
Whether or not the long tail hypothesis translates into a relevant business
phenomenon is purely an empirical question, which, since the early
2000s, has proved to be an interesting line of research for both the academic
and the business literature.112
109 Ibid., pp. 4-5.
110 Ibid., pp. 5-6.
111 Ibid., pp. 7-8.
112 In the following, we discuss only the statistical and econometric literature.
For a broader overview, see Long tail, Wikipedia, accessed on 09/07/2013, bit.ly/ZX5aLb.
Initially, the reason that prompted researchers to study the concentration of book sales was eminently practical.
Internet retailers guard their own sales data closely, but usually report
sales rankings; hence the need for researchers to map observable sales
ranks to the corresponding sales quantities.
Given rank data, Chevalier and Goolsbee (2003) hypothesize that
book sales follow a Pareto distribution, a distributional assumption
already exploited by authors and publishers.113
So, a log-linear model can be used for demand estimation:114
\[ \log(\mathit{Sales}) = \beta_1 + \beta_2 \cdot \log(\mathit{Rank}) + \epsilon \tag{1} \]
β2 is the shape parameter relating sales quantities to sales ranks, while
β1 is a scale parameter.
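As a minimal sketch of how model (1) can be estimated, the following Python snippet fits the log-linear relationship by ordinary least squares. The data are synthetic and purely hypothetical (generated with assumed values for the two parameters); the function name `fit_log_linear` is illustrative and does not come from the cited papers.

```python
import math
import random

def fit_log_linear(ranks, sales):
    """OLS fit of log(sales) = b1 + b2 * log(rank); returns (b1, b2)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sales]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b2 = sxy / sxx             # slope: the shape parameter
    b1 = mean_y - b2 * mean_x  # intercept: the scale parameter
    return b1, b2

# Synthetic (rank, sales) pairs; TRUE_B1 and TRUE_B2 are assumed values
# chosen for illustration only, not estimates from any dataset.
random.seed(0)
TRUE_B1, TRUE_B2 = 10.0, -0.9
ranks = [random.randint(1, 1_000_000) for _ in range(500)]
sales = [
    math.exp(TRUE_B1 + TRUE_B2 * math.log(r) + random.gauss(0, 0.3))
    for r in ranks
]

b1_hat, b2_hat = fit_log_linear(ranks, sales)
print(f"b1 = {b1_hat:.3f}, b2 = {b2_hat:.3f}")
```

With enough observations the fitted slope should lie close to the assumed shape parameter; with only a handful of points, as in a small purchase experiment, the estimate would be far noisier.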
They estimate β2 as -0.855 by means of the sales quantities and the
sales ranks obtained before and shortly after a simple experiment.115
[…] they first obtained information from a publisher on a book with relatively
constant weekly sales, then purchased six copies of the book in a 10-minute period,
and tracked the Amazon.com rank […]116
Brynjolfsson et al. (2003) gather weekly sales data for 321 titles from
one publisher during the summer of 2001 and the corresponding weekly sales
rank data from Amazon.com.117
113 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com
and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, pp. 208-209,
bit.ly/1b24xGp.
114 Ibid.
115 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”,
49, 11, 2003, pp. 1587-1588, bit.ly/18I1YJp.
116 Ibid.
117 Ibid., p. 1587.
Even though extremely simple, the model in (1) fits the data fairly well
(R-squared: 0.8008) and is consistent enough with industry statistics.118
Since their estimate of β2 as -0.871 is based on 861 data points, other
researchers have preferred to stick to it rather than execute experiments
with few data points.119
The most conservative estimate by Brynjolfsson et al. (2003) of the proportion of total Amazon.com sales generated by “niche” titles is 29.3%, computed as the proportion of total Amazon.com sales lying above rank 250,000,
approximately the number of titles available at the largest BARNES & NOBLE
superstore in New York City at the time (out of 2,300,000 books in print).120
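The logic of such a share computation can be sketched as follows. Under model (1), and ignoring the error term, sales at a given rank are proportional to the rank raised to the power β2, so the scale parameter cancels out of any share. The snippet below sums these implied sales over ranks; the function name `niche_share` is hypothetical, the value -0.871 is used purely for illustration, and the sketch is not meant to reproduce the published figure, which may rest on a different computation.

```python
def niche_share(beta2, cutoff_rank, total_titles):
    """Share of total sales from titles ranked beyond cutoff_rank,
    assuming sales at rank r are proportional to r ** beta2."""
    head = sum(r ** beta2 for r in range(1, cutoff_rank + 1))
    tail = sum(r ** beta2 for r in range(cutoff_rank + 1, total_titles + 1))
    return tail / (head + tail)

# Illustrative inputs taken from the figures quoted in the text:
# cutoff at rank 250,000 and 2,300,000 books in print.
share = niche_share(beta2=-0.871, cutoff_rank=250_000, total_titles=2_300_000)
print(f"Implied niche share: {share:.1%}")
```

Because the result depends only on the shape parameter, the cutoff rank, and the number of titles, it is very sensitive to the assumption that β2 is constant along the whole ranking.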
Ghose and Gu (2006) analyze daily panel data for 3,210 books, gathered
from Amazon.com and Barnes&Noble.com between September 2005 and
April 2006.121
They show that, even in online markets, search costs for “obscure”
books (defined in the paper as books with sales rank higher than 20,000 or
40,000, alternatively)122 are higher than for “popular” books (defined in the
paper as books with sales rank lower than 20,000 or 40,000, alternatively) 123,
which may limit the scope of the long tail phenomenon.124
118 Ibid.
119 For example, see A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books:
an empirical analysis of product cannibalization and welfare impact, “Information Systems
Research”, 17, 1, 2006, p. 11, bit.ly/1983mWp and A. GHOSE, B. GU, Search costs, demand
structure and long tail in electronic markets: theory and evidence, “NET Institute Working
Papers”, 06-19, 2006, p. 11, bit.ly/1aeZtxj.
120 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”,
49, 11, 2003, pp. 1588-1589, bit.ly/18I1YJp.
See above and below in this section for less conservative estimates from the same authors.
121 A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, pp. 9-10, bit.ly/1aeZtxj.
122 Ibid., p. 20.
123 Ibid.
124 Ibid., p. 7.
The stability across ranks and retailers, and over time, of β2, which
measures “customers' relative tastes for popular and obscure books”125, is a
very strong assumption that has been criticized in subsequent works; a number of alternative techniques have been proposed in the literature.
Brynjolfsson et al. (2010) suggest that the relationship between sales
and sales rank may not be purely log-linear.126
In order to improve the estimate of Amazon.com long tail sales, they
use different slope coefficients127 to fit the sales-rank relationship on a sample of 1,598 Amazon.com titles, monitored over a ten-week period from June
to August 2008.128
The estimation results of a negative binomial regression model with
four splines (knot points at the 25th, 50th, and 75th percentiles of sales rank)129
show that “the coefficients on all four splines are negative and highly significant.”130
In addition, the slope coefficients gradually become more negative as
the book sales rank increases.
[…] book sales decrease at an increasingly faster pace, as we move from popular books to niche books […]131
The advocated advantage over the OLS linear regression in (1) is that
the model takes into account also the frequent observations with zero
sales.132
125 A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1,
2006, p. 11, bit.ly/1983mWp.
126 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, The longer tail: the changing shape of Amazon
sales distribution curve, 09/20/2010, p. 2, available at SSRN: bit.ly/15QfXyK.
127 Ibid.
128 Ibid., p. 3.
129 Ibid., p. 7.
130 Ibid.
131 Ibid.
132 Ibid., p. 6.
According to the “old” methodology, the proportion of total Amazon.com sales generated in 2008 by “niche” titles (defined in the paper as books
with sales rank higher than 100,000)133 would have been 82.57%, a clear
overestimation with respect to 36.7%, the estimate obtained with the “new”
methodology.134
The treatment of products with zero sales is critical also for the measurement of the significance of the long tail phenomenon through concentration statistics.
Meaningful concentration comparisons, across channels, retailers, and
over time, require similar product availability.
[…] the effect of product availability on the concentration of product sales may
be nonmonotonic. A moderate increase in production selection may lead to a less
concentrated distribution of product sales, but if the market is flooded by a large
number of products that have minimal sales, product sales can actually appear to
be more concentrated even if the sales don't change for any of the previously existing products.135
133 Ibid., p. 8.
134 Ibid.
135 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales, “Management Science”, 57, 8,
2011, p. 1374, bit.ly/196Bu39.
Fig. 4: Concentration can be a misleading measure of the “long tail”
Case 1A: 100 products are available and the top 50% of products account for 75% of total sales.
Case 2: Add a “tail” of 100 niche products with small sales, while leaving the sales of existing
products unchanged. Now 200 products are available, and the top 50% of products account for
95% of total sales.
Case 1B: Sales of the top 100 products are exactly the same as in Case 1A. The only change from
Case 1A is we now consider 100 niche products that have zero sales. In this case, 200 products
are available, and the top 50% of products account for 100% of total sales.
Source: E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the
effect of search costs on the concentration of product sales, “Management Science”, 57, 8, 2011,
p. 1375, fig. 1, bit.ly/196Bu39.
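The three cases in Fig. 4 can be replicated numerically with a short sketch. The sales figures below are hypothetical, chosen only so that the top-50% shares match those in the caption (75%, about 95%, and 100%); the functions `top_share` and `gini` are illustrative helpers, not code from the cited paper.

```python
def top_share(sales, fraction=0.5):
    """Share of total sales accounted for by the top `fraction` of products."""
    ordered = sorted(sales, reverse=True)
    k = int(len(ordered) * fraction)
    return sum(ordered[:k]) / sum(ordered)

def gini(sales):
    """Gini coefficient of a list of non-negative sales figures."""
    ordered = sorted(sales)
    n = len(ordered)
    total = sum(ordered)
    weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical unit sales per product.
case_1a = [15.0] * 50 + [5.0] * 50   # 100 products
case_2 = case_1a + [0.5] * 100       # add 100 niche products with small sales
case_1b = case_1a + [0.0] * 100      # count 100 products with zero sales

for name, case in [("1A", case_1a), ("2", case_2), ("1B", case_1b)]:
    print(f"Case {name}: top 50% share = {top_share(case):.1%}, "
          f"Gini = {gini(case):.3f}")
```

In this toy example both measures rise from Case 1A to Case 1B although no existing product sells a single copy more or less, which is why comparisons require similar product availability.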
Brynjolfsson et al. (2011) use the Lorenz curve and the Gini coefficient
to study the concentration of product sales in the catalog channel and the Internet channel of a clothing retailer.136
136 Ibid., pp. 1378-1379.
They analyze an identical selection of products, at an identical set of
prices, with the same order-fulfillment facilities, available and “visible” within
an identical time window.137
As expected, the Internet channel exhibits a less concentrated distribution of product sales than the catalog channel.138
Elberse and Oberholzer-Gee (2006) use three different techniques and
Nielsen VideoScan data to “study the distribution of revenues across products in the context of the US home video industry for the 2000 to 2005 period”139.
First, they generate various descriptive statistics for the distribution of
sales across titles from year to year and compute the Kolmogorov-Smirnov
statistic for pairs of years to test for shifts in the distribution. 140
Location, scale, skewness, kurtosis, and inter-quartile measures are
“consistent with a scenario in which the distribution becomes more dispersed, more asymmetrical, and develops a sharper peak and a longer tail
over time”141.
The Kolmogorov-Smirnov tests reveal that the distributions of weekly
sales across titles are significantly different across the years. 142
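For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest vertical distance between two empirical cumulative distribution functions. The sketch below computes only the statistic, on two small hypothetical samples of weekly sales; the function name `ks_statistic` is illustrative, and assessing significance would additionally require critical values or a p-value (for instance from a statistics library).

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that jump only at observed
    # values, so it suffices to compare them at every pooled observation.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d

# Hypothetical weekly unit sales per title in two different years.
sales_early = [120, 95, 80, 60, 44, 30, 21, 15, 9, 4]
sales_late = [90, 40, 22, 12, 8, 5, 3, 2, 1, 0]
print(f"KS statistic: {ks_statistic(sales_early, sales_late):.2f}")
```

A statistic close to zero indicates nearly identical distributions, while a value close to one indicates almost no overlap.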
Then, they “estimate a quantile regression model to examine the factors
that underlie the shift in the distribution of sales” 143.
137 Ibid., p. 1374.
138 Ibid., p. 1379.
139 A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long
tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015,
09/05/2006, p. 2, bit.ly/19uLOzX.
140 Ibid., pp. 8-9.
141 Ibid., p. 12.
142 Ibid.
143 Ibid., p. 8.
In the case at hand, quantile linear regression is more appropriate than
OLS linear regression: the research is not concerned with average effects,
but with “how the entire distribution changes with certain covariates” 144.
Examining multiple quantiles allows for a richer inference than “segmenting the response variable into subsets according to its unconditional distribution and then doing least squares fitting on these subsets” 145, a procedure that clearly suffers from sample selection bias.
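The difference between targeting the average and targeting a quantile can be illustrated with the “check” loss that quantile regression minimizes. The sketch below is a deliberately simplified, intercept-only case on hypothetical sales figures: it searches for the constant that minimizes the total check loss, which turns out to be a sample quantile. A full quantile regression would replace the constant with a linear function of the covariates; the function names are illustrative.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(values, tau):
    """Observed value that minimizes the total check loss at level tau."""
    return min(
        values,
        key=lambda c: sum(check_loss(v - c, tau) for v in values),
    )

# Hypothetical, strongly skewed weekly sales for eleven titles.
sales = [0, 1, 1, 2, 3, 5, 8, 13, 40, 120, 900]

mean_sales = sum(sales) / len(sales)
median_fit = best_constant(sales, 0.5)   # fit at the median
upper_fit = best_constant(sales, 0.9)    # fit at the 90th percentile

print(f"mean = {mean_sales:.1f}, tau=0.5 fit = {median_fit}, "
      f"tau=0.9 fit = {upper_fit}")
```

In such a skewed sample the mean is dominated by the single best-seller, whereas the fits at different quantile levels describe separate parts of the distribution, which is the kind of information the study is after.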
Quantile linear regression results show that
the distribution of sales has shifted down in general, but this shift is largest for
the better-selling titles. 146
[...] the tail of the distribution has seen a much smaller decrease, implying a
shift in the mass towards niche products. 147
Finally, they estimate a negative binomial regression model to analyze
the number of titles that meet certain weekly sales threshold levels (“zero
sales, sales below the 70th quantile, sales between the 70th and the 80th quantile, sales between the 80th and the 90th quantile, and sales above the 90th
quantile”148).149
The whole picture that emerges from this comprehensive study is quite
complex: “superstar” and “long tail” effects are not necessarily antithetical,
and, indeed, seem to coexist.
Are there important superstar and long-tail effects in U.S. home video sales?
The answers turn out to be of the “yes, but…” variety. Yes, there is a long-tail effect
in that the number of titles that sell only a few copies every week increases during
our study period. But at the same time, the number of non-selling titles also increases substantially; it is now four times as high as in 2000. Many underdogs turn
out to be losers. We also find evidence of a superstar effect. Among the best-performing titles, it is an ever-smaller number of films that accounts for the bulk of
sales. The caveat here is that today's superstars lack the punch of earlier years.
Video sales generally decrease over time across all quantiles of the sales distribution, but this effect is most pronounced among best-selling titles.150
144 Ibid., p. 9.
145 K. F. HALLOCK, R. KOENKER, Quantile regression, “Journal of Economic Perspectives”, 15,
4, 2001, p. 147, bit.ly/15gERVI.
146 A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of the long
tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015,
09/05/2006, p. 13, bit.ly/19uLOzX.
147 Ibid.
148 Ibid., p. 17.
149 Ibid.
In order to estimate the potential producer and consumer welfare arising from the availability in e-book format of the world's out-of-print titles, Smith et
al. (2012) match two random samples of titles: one composed of titles already
available on the Kindle marketplace, the other of titles not yet available on
the Kindle marketplace.151
The project required the mapping of sales ranks of Kindle out-of-print titles to the corresponding sales levels.
Initially, they fit (1) to a dataset provided by a major publisher,
covering weekly sales and weekly sales ranks for 713 e-book titles over a ten-week period.
Since the research object is the “extreme tail”152 of out-of-print books,
they also try various different polynomial rank terms to produce stronger fits
in the tail of the distribution, and obtain better results with a third degree
polynomial function:
$$\log(Sales) = \beta_1 + \beta_2 \cdot \log(Rank) + \beta_3 \cdot \log(Rank)^2 + \beta_4 \cdot \log(Rank)^3 + \epsilon . \tag{2}$$153
150 Ibid., p. 18.
151 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, pp. 2-3, available at SSRN: bit.ly/WcNhJq.
152 Ibid., p. 10.
153 Ibid.
Fig. 5: Calibration between sales and rank
Source: M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, p. 33, fig. 5, available at SSRN: bit.ly/WcNhJq.
Still unsatisfied with the fit produced by (2) for observations with ranks
above 200,000, the researchers complement the estimation method with an
experiment similar to that of Chevalier and Goolsbee (2003).154
They purchased between one and three copies of 30 randomly selected
Kindle titles with ranks between 200,000 and 1,000,000, and tracked their
sales before and after this experiment.155
• For titles with low ranks (below 200,000), they predict sales with (2).156
• For titles with high ranks (above 200,000), they “assign sales according to the expected sales that belong to the interval the title rank falls into based on the experiment described above”157.
• If a title does not have a rank, they assume that it has no sales.158
154 See above in this section.
155 Ibid.
156 Ibid., p. 11.
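The three-case rule above can be sketched as a small function. This is only an illustration under stated assumptions: the coefficients in `BETAS` and the interval values in `EXPERIMENT_BINS` are hypothetical placeholders, not the estimates reported by Smith et al. (2012), and the treatment of a rank of exactly 200,000 is an arbitrary choice.

```python
import math

# Hypothetical coefficients for the cubic log-rank polynomial in (2).
BETAS = (10.0, -0.5, -0.02, 0.001)

# Hypothetical expected sales per rank interval, standing in for the
# results of the purchase experiment (lower bound exclusive, upper inclusive).
EXPERIMENT_BINS = [
    (200_000, 500_000, 0.20),
    (500_000, 1_000_000, 0.05),
]

RANK_CUTOFF = 200_000


def sales_from_polynomial(rank, betas=BETAS):
    """Sales predicted by the third-degree polynomial in log(rank)."""
    b1, b2, b3, b4 = betas
    x = math.log(rank)
    return math.exp(b1 + b2 * x + b3 * x ** 2 + b4 * x ** 3)


def estimate_sales(rank):
    """Apply the three-case rule: no rank, low rank, high rank."""
    if rank is None:
        return 0.0  # a title without a rank is assumed to have no sales
    if rank <= RANK_CUTOFF:  # boundary handling is an assumption
        return sales_from_polynomial(rank)
    for low, high, expected in EXPERIMENT_BINS:
        if low < rank <= high:
            return expected
    raise ValueError("rank outside the intervals covered by this sketch")
```

For example, `estimate_sales(None)` returns zero, while `estimate_sales(300_000)` returns the value attached to the interval that contains the rank.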
III.3. SURVEY AND DESCRIPTIVE EVIDENCE
In 2002, the OPEN EBOOK FORUM159 sponsored a consumer survey on e-books, administered during the New York City is Book Country event.160
263 volunteers self-completed the survey, so the results are limited by
sample self-selection and incomplete questionnaires.161
Most participants (61%)162 reported that they were willing to buy e-books at the same price as paperback books.163
However, according to survey data from the UK OFCOM Online copyright infringement tracker benchmark study Q3 2012,164 e-book consumers
expect lower prices than for print books.165
The likelihood of paying for a single book download decreases steadily with
price: among those who have ever downloaded or accessed e-books, 78% are
willing to pay at £2, falling to 7% at £10.166
The mean price that respondents are willing to pay is £3.49;167 similar results arise for the
willingness to pay for a subscription service.168
157 Ibid.
158 Ibid.
159 See section II.1.
160 Consumer survey on ebooks, Open eBook Forum, 2003, p. 4, available at im+m:
bit.ly/15Olump.
161 Ibid., p. 8.
162 Ibid., p. 14.
163 Ibid., p. 19.
164 See section III.1.
165 Report by Kantar Media, p. 69 in Online copyright infringement tracker benchmark study Q3
2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
166 Ibid.
167 Ibid.
168 Ibid., p. 70.
Fig. 6: Likelihood to pay for downloading a single e-book
Question: Assume you saw a new fiction e-book on an online service that you wanted to own. It
would be high quality, and you knew it was a reputable and reliable service. How likely would
you be to download it if it was the following prices?
Base: All 12+ in the UK that have ever downloaded/accessed e-books (652).
£3.49 is the average price people are willing to pay for a single e-book download.
Source: OCI Benchmark study slide pack 5 - Books, p. 31, slide B23 in Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
Compared to the users of other content types covered in the report, e-book downloaders are more skewed towards females (54%) and have an older
age profile (58% are 35+).169
AMAZON Kindle is the most used service for e-books (80% of users, consistent among demographics and sub-groups)170, which might explain, in part,
why the estimate of illegal behavior for books is the lowest across content
types (11% of users)171.
Descriptive evidence on the e-book market was presented in 2012
by Mark Coker, founder of SMASHWORDS, a large distributor of self-published
e-books.172
169 Ibid., p. 61.
170 Ibid., p. 66.
171 Ibid., p. 65.
172 M. COKER, How data-driven decisions *might* help indie ebook authors reach more readers ,
“RT Booklovers Convention”, 04/25/2012, slidesha.re/Xhccuo.
Since 2008, the company has published more than 100,000 indie (independent) e-books distributed to multiple major retailers, and remunerates authors with significantly higher royalties (60% of list price) than traditional
publishing houses.173
Aggregate sales data for a nine-month period from the SMASHWORDS distribution network174 reveal strong overall growth, driven by titles achieving
viral word of mouth.175
In fact, the sales distribution is characterized, unsurprisingly, 176 by few
titles selling extremely well, thousands of moderate sellers, and a vast majority of poor selling titles.177
Sales of individual titles at APPLE iBookstore rise and fall over time, either randomly or based on author promotions, new releases, and promotions
by retailers.178
As for price elasticity, e-books priced between $2.00 and $2.99 sell 6.2
times more units than those priced above $10.00,179 which implies an approximate price elasticity of -2.
However, e-books priced between $0.99 and $1.99 seem to underperform in terms of profitability with respect to those priced between $2.99 and
$5.99.180
III.4. ECONOMETRIC EVIDENCE
We can identify historical and political reasons prompting econometric
investigations of the publishing market, and in particular of the relationship
between quantities sold and prices.
173 Ibid., pp. 8-9.
174 Ibid., p. 18.
175 Ibid., p. 24.
176 See section III.2.
177 Ibid., p. 61.
178 Ibid., p. 30.
179 Ibid., p. 51.
180 Ibid., p. 58.
Historically, economic arguments have had, and still have, a central role
in public debates and political decisions about copyright laws.
As far back as 1643, STATIONERS181 argued that “books are luxuries, the
demand for which is elastic.”182
Books (except the sacred Bible) are not of such general use and necessity, as
some staple commodities are, which feed and clothe us, nor are they so perishable,
or require change in keeping, some of them being once bought, remain to children's children, and many of them are rarities only and useful only to a very few,
and of no necessity to any, few men bestow more in Books than what they can
spare out of their superfluities […] And therefore property in Books maintained
among stationers cannot have the same effect, in order to the public, as it has in
other commodities of more public use and necessity. 183
Before the 1876-8 ROYAL COMMISSION ON COPYRIGHT184, the English
philosopher Herbert Spencer testified against the introduction of the “compulsory licence”185 system, under which any person would be free to print any book
after paying a fixed percentage of the selling price to the author.186
According to him, such a system would be “especially injurious to the
particular class [of books] which of all others needs encouragement” 187, described by the chairman of the commission as “the graver class [of books]
which do not appeal to the popular tastes”188.
Contrary to STATIONERS' thesis, the demand for certain books, e.g. philosophical works, seems, according to their own authors, inelastic to a fall in
price, which suggests significant economic differences across subject categories.189
181 See section III.2.
182 A. PLANT, The economic aspects of copyright in books, “Economica”, 1, 2, 1934, p. 177,
bit.ly/1giWKb4.
183 Ibid.
184 See section III.2.
185 A. PLANT, The economic aspects of copyright in books, “Economica”, 1, 2, 1934, p. 183,
bit.ly/1giWKb4.
186 Ibid., p. 188.
187 Ibid.
188 Ibid., pp. 188-189.
189 See section II.3.
For the titles composing the Liebowitz Book Review Digest dataset,190 large
differences across subject categories emerge also in market longevity.191
Tab. 1: Percentage of titles surviving more than 58 years
Category         All titles    Best-sellers removed
Academic         68%           68%
Philosophical    52%           41%
History          51%           43%
Biography        49%           42%
Religion         46%           40%
Poetry           43%           40%
Fiction          36%           40%
Mystery          23%           16%
Comedy           25%           0%
Autobiography    19%           11%
Art              17%           17%
Travel           6%            6%
Sports           0%            0%
Source: S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the role of theory, empirics, and network effects, “Harvard Journal of
Law & Technology”, 18, 2, 2005, p. 456, tab. 2,
bit.ly/GHOhQB.
Bittlingmayer (1992) provides empirical estimates of the elasticity of
the demand for books, 192 as part of a more general economic analysis of the
190 See section III.2.
191 S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright: the
role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2,
2005, p. 456, bit.ly/GHOhQB.
192 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the
Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 602, tab.
institutional framework of the publishing industry.
Data about prices, sales quantities, retail margins, royalties, and inventories for over 1,000 titles for the period 1984-1986 were provided by a West
German publishing house, specialized in the production of academic and intellectual books.193
In markets characterized by monopolistic competition, under the assumption of profit maximization, there is an inverse relationship between the percentage margin (the difference between price, p, and marginal cost, c, as a percentage of p) and demand elasticity ($\eta_p$, in absolute value):
$$\frac{p - c}{p} = \frac{1}{\eta_p} . \tag{3}$$194
The marginal costs associated with the sale of an extra copy of a book – taxes,
royalties, and the retailer's margin – amount to only 40 to 60 percent of the retail
price, which in turn implies a price elasticity in the range 1.7 to 2.5. 195
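The arithmetic behind the quoted range follows directly from (3): a marginal cost share of 40 to 60 percent of the retail price leaves a percentage margin of 60 to 40 percent, and the elasticity is the inverse of that margin. A minimal sketch, with hypothetical function names:

```python
def elasticity_from_margin(margin_share):
    """Absolute demand elasticity implied by (p - c) / p = 1 / eta_p."""
    if not 0.0 < margin_share <= 1.0:
        raise ValueError("margin share must lie in (0, 1]")
    return 1.0 / margin_share


def elasticity_from_cost_share(cost_share):
    """Same relationship, starting from marginal cost as a share of price."""
    return elasticity_from_margin(1.0 - cost_share)


# Marginal costs of 40% and 60% of the retail price, as in the quotation.
low = elasticity_from_cost_share(0.40)   # about 1.67
high = elasticity_from_cost_share(0.60)  # about 2.5
```

The two results reproduce, up to rounding, the range of 1.7 to 2.5 reported by Bittlingmayer (1992).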
Resale price maintenance restricts retailing by allowing publishers to
directly control both the retail price (p) and the wholesale price (the difference between the retail price, p, and the retail margin, v).196
In the period considered, resale price maintenance was legal in West
Germany, and still is today in the Federal Republic of Germany, observed and
enforced by the GERMAN BOOK TRADERS ASSOCIATION.197
3 and p. 603, tab. 4, bit.ly/17dxX0b.
193 Ibid., p. 589 and p. 597.
194 Ibid., p. 588.
195 Ibid., pp. 588-589.
196 See also section II.1 for information about the agency model.
197 German resale price maintenance act, Börsenverein des Deutschen Buchhandels, 07/14/2006, bit.ly/17bdJF6.
So, under the assumption of positive yet marginally decreasing sales
growth in retail selling effort,198 there are two first-order conditions for profit
maximization:
$$\frac{p - v - c}{p} = \frac{1}{\eta_p} \tag{4}$$199
and
$$\frac{p - v - c}{v} = \frac{1}{\eta_v} , \tag{5}$$200
where $\eta_v$ is the elasticity of demand with respect to the retail margin.
(4) and (5) imply
$$-\eta_p + \eta_v = -1 - \frac{c}{p - v - c} . \tag{6}$$201
Ceteris paribus, the larger the price elasticity ($\eta_p$, in absolute value),
the larger the retail margin elasticity ($\eta_v$).
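The step from (4) and (5) to (6) can be checked numerically. The sketch below solves the two first-order conditions for the elasticities and compares both sides of (6); the values chosen for p, v and c are hypothetical and serve only as an illustration.

```python
def elasticities(p, v, c):
    """Elasticities implied by the first-order conditions (4) and (5)."""
    net = p - v - c  # publisher's net margin per copy
    if net <= 0:
        raise ValueError("p - v - c must be positive")
    eta_p = p / net  # from (4): (p - v - c) / p = 1 / eta_p
    eta_v = v / net  # from (5): (p - v - c) / v = 1 / eta_v
    return eta_p, eta_v


# Hypothetical retail price, retail margin and marginal cost.
p, v, c = 20.0, 6.0, 4.0
eta_p, eta_v = elasticities(p, v, c)

lhs = -eta_p + eta_v              # left-hand side of (6)
rhs = -1.0 - c / (p - v - c)      # right-hand side of (6)
```

With these values both sides equal -1.4, and raising `eta_p` for a given marginal cost term raises `eta_v` one for one, as stated above.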
For a cross section of books that are being optimally marketed and priced,
those for which sales are most responsive to extra promotional efforts will also be
those for which sales are most responsive to price changes. Novels have a comparatively large elasticity of demand because the quantity purchased can be influenced relatively easily with higher promotional effort. The demand for research
monographs in entomology is price inelastic because the quantity sold is not sensitive to promotional efforts.202
Again, retail margin and, thus, price elasticity vary by type of book, and
also by type of bookstore.
198 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the
Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 593,
bit.ly/17dxX0b.
199 Ibid., eq. 5.
200 Ibid., p. 594, eq. 6.
201 Ibid., p. 594, eq. 7.
202 Ibid., p. 594.
The margin varies by type of book. For example, the margin on academic books
is typically 25-30 percent, on literature 40 percent. School books, which are sold
through bookstores, have margins of 20 percent. Margins also vary by the type of
bookstore. A publisher will often grant bookstores that specialize in a particular
topic a deeper discount on the corresponding titles. 203
In monopolistically competitive equilibrium, where free entry drives
profits to zero, “books for which cost-covering prices allow large sales will
have demand curves that are more elastic than books with more limited audiences.”204
203 Ibid., p. 591.
204 Ibid., p. 595.
Fig. 7: Zero-profit locus of price and output combinations
K is the fixed cost of book production, m is the marginal cost of book production and distribution,
and q is the quantity produced.
Source: G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and
the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 596, fig.
1, bit.ly/17dxX0b.
Analogously, “books that involve large marginal production or marketing expenses must have large sales.”205
It can be shown that:
$$-\eta_p^* + \eta_v^* = -1 , \tag{7}$$206
where $\eta_p^*$ is the equilibrium price elasticity (in absolute value) and $\eta_v^*$
is the equilibrium retail margin elasticity.
205 Ibid.
206 Ibid.
For the econometric estimation of price elasticity and retail margin
elasticity, the original dataset is divided into two cross-sectional datasets,
composed of the “year-to-year percentage changes in prices and
quantities”207 for the periods 1984-1985 and 1985-1986, respectively.
When dealing with data from natural experiments, traditional “naïve”208
log-linear regression models cannot distinguish between shifts of the demand
curve and movements along the demand curve.
Suppose that the demand for a title shifts to the right, say, because this year is
the centenary of the author's birth or the book has been made into a film. With unchanged margin costs […], the optimal price would increase. If this anticipation of
changed demand conditions is common, […] the estimated [price elasticity] […]
could very well be positive. 209
The change in the inverse of (4) is an estimate of changes in price elasticity ($\Delta \eta^e_{pt}$) and can be employed as a control for shifts of the demand curve.
The result $d\eta_{pt} = d\eta_{vt}$ from (7) yields the final econometric specification:
$$\Delta \ln(q_t) = a + b_1 \cdot \Delta \ln(p_t) + b_2 \cdot \Delta \ln(v_t) + b_3 \cdot \Delta \eta^e_{pt} \cdot (\ln(p_t) - \ln(v_t)) + \sum_{i=1}^{T} b_i D_i + \epsilon , \tag{8}$$210
where the dummies $D_i$, $i = 1, \dots, T$, reflect the title vintage.
The coefficients $b_1$ and $b_2$ are the estimators of $\eta_p$ and $\eta_v$; theoretically, the coefficient $b_3$ should be equal to -1.
Unfortunately, the model in (8) has weak explanatory power.
Most of the year-to-year variation in sales of books is attributable to influences
not captured by price, margin or vintage. 211
207 Ibid., p. 598.
208 Ibid.
209 Ibid.
210 Ibid., p. 599, eq. 14.
211 Ibid., p. 604.
The empirical analysis suggests a price elasticity between -2 and -3 for
this dataset of academic and intellectual books,212 a figure slightly larger than
that implied by (4) (about -1.7 to -2.5)213.
The estimate of retail margin elasticity is about 1.5, positive and
“roughly consistent with the theory”214.
The coefficient $b_3$ deviates from the predicted value of -1, and the estimated elasticities
may be biased by measurement problems, especially for “poorly selling” titles
(defined in the paper as the titles with sales lower than “the median number
of books sold per title”215).
Measurement problems include errors in the price and retail margin
variables (e.g. “divergence of actual average sales price from the nominal
price”216), and the failure to allocate “marginal printing costs”217 and “title-specific promotional expenses of the publisher”218.
Brynjolfsson et al. (2003) show that, under resale price maintenance219,
the price elasticity of the aggregate demand for a given title in the retailing
market equals the price elasticity of demand for the title faced by its publisher.220
Data from the AMERICAN ASSOCIATION OF PUBLISHERS and discussions of
the researchers with various publishers indicate a gross margin of 56%-64%
for “the typical obscure title”221.222
212 Ibid., p. 605.
213 See above in this section.
214 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the
Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 605,
bit.ly/17dxX0b.
215 Ibid., p. 601.
216 Ibid., pp. 603-604.
217 Ibid., p. 605.
218 Ibid.
219 See above in this section.
220 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers, “Management Science”,
49, 11, 2003, p. 1585, eq. 10, bit.ly/18I1YJp.
221 Ibid., p. 1586.
222 Ibid.
So, according to (4), for such titles, the price elasticity of the demand
faced by the publisher is between -1.56 and -1.79, 223 which represents an estimate of the price elasticity of the aggregate demand in the retailing market,
obtained “by taking advantage of the characteristics of the book industry
structure and available industry statistics on gross margins” 224.
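As a quick check of these figures, taking the gross margin as the left-hand side of (4), the bounds of the reported range are the inverses of the two margin values (in absolute value):

```latex
|\eta_p| = \frac{1}{0.64} \approx 1.56 ,
\qquad
|\eta_p| = \frac{1}{0.56} \approx 1.79 .
```

The higher the gross margin, the lower the implied elasticity, so the 64% margin corresponds to -1.56 and the 56% margin to -1.79.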
The online book-selling business has been thriving for almost twenty
years to date and anticipated the e-book market in many respects, such as
the increased product availability.225
Studies about the electronic commerce of books could very well be relevant for the electronic book industry.
At the beginning of the 2000s, online book sales made up about 10% of
total book sales in the US;226 the combined market share of Amazon.com and
BarnesandNoble.com, the two dominant online bookstores, was higher than
85% in terms of sales, with the former selling almost four times as much as
the latter.227
Chevalier and Goolsbee (2003) use sales rank data228 to estimate both
the own-price elasticity faced by the two merchants and their cross-price
elasticity with respect to each other.
Data about a sample of 20,000 books, constructed through stratified
random sampling from three different sources (“to get books representative
of different parts of the sales distribution” 229), were “scraped”230 from the
223 Ibid.
224 Ibid., p. 1585.
225 See section III.2 for a discussion of the long tail phenomenon.
226 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com
and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, p. 205,
bit.ly/1b24xGp.
227 Ibid.
228 See section III.2.
229 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com
and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, p. 206,
bit.ly/1b24xGp.
230 See Web scraping, Wikipedia, accessed on 09/07/2013, bit.ly/1e1Oo8h.
websites of Amazon.com and BarnesandNoble.com during April, June and
August 2001.
The period was characterized by major pricing experiments by the two
e-tailers, across broad categories of titles.231
BarnesandNoble.com data are censored for sales rankings greater
than about 630,000 whereas Amazon.com provides complete sales rank
data.232
Chevalier and Goolsbee (2003) use “the trimmed least absolute deviations (LAD) panel estimator of Honore (1992)”233 for the censored
dataset and OLS for the complete one.234
BarnesandNoble.com own-price elasticity is around -3.5 vs. only about
-0.45 for Amazon.com.235
The cross-price effect seems to be substantially positive only for BarnesandNoble.com,236 but a robustness check of the trimmed LAD estimator, performed by dropping the observations with missing sales ranks and employing OLS
in place of the trimmed LAD,237 shows a much lower degree of shifting from
Amazon.com to BarnesandNoble.com.238
Interestingly, Amazon.com, the incumbent, prices in the inelastic portion of the demand curve, in contrast with the theory of static imperfectly
competitive markets.239
231 J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com
and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2, 2003, p. 206,
bit.ly/1b24xGp.
232 Ibid.
233 Ibid., p. 215.
234 Ibid.
235 Ibid., p. 217.
236 Ibid., p. 218.
237 Ibid., p. 219.
238 Ibid.
239 Ibid., pp. 217-218.
However, “a firm maximizing dynamic profits might choose a price below […] [the] static profit-maximizing level,”240 which is of some interest for
the fast-growing e-book market.
Prices below the single-period profit-maximizing level would be attractive in a
growing market with consumer switching costs, for example. 241
Ghose and Gu (2006) use data gathered from Amazon.com and BarnesandNoble.com to study the significance of search costs in online markets,242
which could create kinks in the demand curve.
“The demand elasticity for price increases is different from the demand
elasticity for price decreases”243, depending on the magnitude of search costs.
When search costs are high, if a retailer increases prices, its own customers will notice it and have an incentive to look for better bargains at competing retailers.
If the retailer decreases prices, instead, potential new customers will
not be aware of it.
As a consequence, demand elasticity (in absolute value) for price increases is higher than for price decreases.
Vice versa, when search costs are low, if a retailer increases prices, it
will affect only its current customers.
If the retailer decreases prices, instead, it will attract potential new customers.
As a consequence, demand elasticity (in absolute value) for price increases is lower than for price decreases.
240 Ibid., p. 218.
241 Ibid.
242 See section III.2.
243 A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, p. 7, bit.ly/1aeZtxj.
Log-linear models using sales rank data, scraped from the websites of
online retailers, are common in the stream of literature under review. 244
In order to capture the difference in demand elasticity between price increases and price decreases in such regression models, Ghose and Gu (2006)
propose a decomposition of the price explanatory variable.
$$\beta_1 \log(P_{it}) + \sum_{j=1,2,3,4} \beta_{2j} \left( \log(P_{it}) - \log(\bar{P}_{it}) \right) \times PriceDecrease_{it} \times Week_{ijt} , \tag{9}$$245
where:
• $P_{it}$ is the retailer price of product i at time t,
• $\bar{P}_{it}$ is “the price before the most recent price change”246,
• $PriceDecrease_{it}$ is a dummy that “takes the value of 1 if the most recent action on product i is a price decrease”247,
• and $Week_{ijt}$ are four (j = 1, 2, 3, 4) weekly dummies that represent the number of weeks after the most recent price decrease and quantify the “time for information on price decreases to spread in the market”248.
Therefore, “$\beta_1$ represents demand elasticity for price increases,”249
while “$\beta_2$ denotes the difference between demand elasticity for price decreases and that for price increases.”250
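The decomposition in (9) can be made concrete by building the regressors for a single product-week. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes that the weekly dummy j equals one exactly when j weeks have elapsed since the most recent price change, which is one possible reading of the definition above.

```python
import math


def price_terms(price, previous_price, weeks_since_change):
    """Regressors of (9) for one product-week: the log price and the four
    interaction terms that are active only after a price decrease."""
    log_price = math.log(price)
    # Dummy equal to 1 if the most recent price action was a decrease.
    price_decrease = 1 if price < previous_price else 0
    log_change = log_price - math.log(previous_price)
    interactions = [
        log_change * price_decrease * (1 if weeks_since_change == j else 0)
        for j in (1, 2, 3, 4)
    ]
    return log_price, interactions


# Hypothetical price cut from 10 to 8, observed two weeks later.
log_p, terms_after_cut = price_terms(8.0, 10.0, 2)

# Hypothetical price rise from 10 to 12: every interaction term is zero,
# so only the first coefficient in (9) applies.
_, terms_after_rise = price_terms(12.0, 10.0, 1)
```

After a price decrease exactly one interaction term is non-zero, so the coefficients on the interaction terms trace how the response to a price cut evolves week by week.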
244 See section III.2, and also above and below in this section.
245 A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets: theory and evidence, “NET Institute Working Papers”, 06-19, 2006, p. 14, eq. 4, bit.ly/1aeZtxj.
246 Ibid.
247 Ibid., p. 13.
248 Ibid., p. 14.
249 Ibid.
250 Ibid.
The relative price elasticities of the two online book retailers with respect to each other are similar: between -1.49 and -1.89 for Amazon.com, and
between -1.53 and -1.60 for BarnesandNoble.com.251
However, Amazon.com relative price elasticity is higher for price decreases than for price increases and gradually increases over time after a
price decrease, whereas BarnesandNoble.com relative price elasticity is
lower for price decreases than for price increases, with little information diffusion over time.252
There are a number of possible explanations why the two retailers face
different search costs: specific consumer search preferences, targeting of different consumer segments, different implementations of active and passive
search tools253, etc.
Hu and Smith (2011) analyze sales data, directly provided by a publisher, of both print books and e-books.254
The price elasticity estimates obtained by running an OLS linear regression on the whole dataset are around -3 for both print books and e-books.255
Quantile linear regression256, used in place of OLS on the same dataset,
unveils significant differences between best-selling titles (defined in the paper as top 20% books ranked by sales) 257 and non-best-selling titles (defined
in the paper as bottom 80% books ranked by sales) 258.
251 Ibid., p. 17.
252 Ibid., p. 25.
253 Ibid., pp. 26-27.
254 See section III.1.
255 Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural
experiment, 08/29/2011, p. 9, available at SSRN: bit.ly/Y2G650.
256 See section III.2.
257 Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a natural
experiment, 08/29/2011, p. 19, available at SSRN: bit.ly/Y2G650.
258 Ibid.
Price elasticity for print books in the 20th and the 80th percentile is
about -1.6 and -1.3, respectively, vs. -3 and -1.9 for e-books in the 20th and
80th percentile.259
Smith et al. (2012) study the potential market for the digital version of
out-of-print titles260 through sales rank data, scraped from Amazon.com and
the Kindle marketplace.261
First, a probit regression model is applied on random samples of out-of-print titles already available on the Kindle marketplace and of out-of-print titles not yet available on the Kindle marketplace, to calculate the probability
of an out-of-print title being digitized.262
The explanatory variables include:
• the price of the print version,
• the year of publication of the first print version,
• the number of pages of the print version,
• the subject category,
• the type of audience,
• the number of Bing search results for the ISBN263,
• the sales rank of the print version on Amazon.com,
• the binding format of the print version,
• and a dummy set to one for “large publishers”.264
259 Ibid., p. 24, tab. 10.
260 See section III.2.
261 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, pp. 6-7, available at SSRN: bit.ly/WcNhJq.
262 Ibid., pp. 8-9.
263 See International Standard Book Number, Wikipedia, accessed on 09/07/2013,
bit.ly/16kdpGT.
264 Ibid., pp. 7-8.
Then, with the propensity scores thus obtained, the two samples are
matched through the nearest neighbor and the stratification methods.265
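Nearest-neighbor matching on propensity scores can be sketched in a few lines. The example below pairs each title not yet available on Kindle with the already digitized title whose score is closest; the title labels and scores are hypothetical, and matching with replacement is an assumption of this sketch rather than a detail taken from the paper.

```python
def nearest_neighbor_match(undigitized, digitized):
    """Pair each undigitized title with the digitized title that has the
    closest propensity score (matching with replacement)."""
    if not digitized:
        raise ValueError("at least one digitized title is required")
    matches = {}
    for title, score in undigitized.items():
        matches[title] = min(
            digitized, key=lambda other: abs(digitized[other] - score)
        )
    return matches


# Hypothetical propensity scores (estimated probability of being digitized).
undigitized_scores = {"A": 0.30, "B": 0.72}
digitized_scores = {"X": 0.25, "Y": 0.70, "Z": 0.90}

pairs = nearest_neighbor_match(undigitized_scores, digitized_scores)
```

Here title A is paired with X and title B with Y; sales and prices of the matched digitized titles could then serve as proxies for the undigitized ones.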
In order to obtain more credible confidence intervals for propensity
scores, the probit regression model is re-estimated with the Bayesian approach outlined in Chen and Kaplan (2011)266.267
As before, the resulting propensity scores are used to match the two
samples, based, again, on the nearest neighbor and the stratification
method.268
Finally, sales and price of the out-of-print titles not yet available on Kindle marketplace are inferred from sales and price of the matched out-of-print
titles already available on Kindle marketplace.269
[…] bringing the world's 2.7 million out-of-print titles back into print as eBooks
could create $740 million in revenue in the first year after publication […] 270
Cost assumptions based on current Kindle sales contracts271 suggest
that as much as $460 million would accrue to publishers and authors. 272
265 Ibid., pp. 14-16.
266 C. J. S. CHEN, D. KAPLAN, Bayesian propensity score analysis: simulation and case study, Society for Research on Educational Effectiveness, 2011, bit.ly/16N541w.
267 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, pp. 16-17, available at SSRN: bit.ly/WcNhJq.
268 Ibid., pp. 17-18.
269 Ibid., p. 5.
270 Ibid., p. 25.
271 Ibid., pp. 21-22.
272 Ibid., p. 25.
IV. E-BOOK PRICING BY MAJOR ITALIAN PUBLISHERS
Price information and, more generally, public catalog data for digital
books published in Italy can be obtained, with relatively little effort, by web-scraping mainstream Italian e-book stores (in our case, Ultima Books).
Provided that a certain e-book has a corresponding print edition, if the
respective publisher has entered the ISBN of the print edition as part of the
e-book metadata, it is possible to query print book catalog databases (in our
case, Informazioni Editoriali) for information.
In the regression of the e-book price on the price of the print edition
and other explanatory variables, we restrict our attention to titles of publishing houses distributed by Edigita, the leading e-book distributor in Italy.273
Edigita distributes most of the major Italian publishers 274 and provides
well-formed and well-compiled metadata.
Incidentally, the purposive sample reduction with respect to the whole
market was also technically convenient, since it allowed for effective and
timely querying of data sources.
IV.1. CATALOG DATASET
Cross-sectional catalog data for 14,794 e-books distributed by Edigita,
and available for sale on the Ultima Books e-book store on the 28th of November 2012, were collected within a five-day time window (11/28/2012-12/02/2012).
We arranged the information into a few variables of interest:
•
ebook.price, the retail price of the e-book (in euro, VAT included);
273 See section II.3.
274 See section II.3 for information about market concentration in the Italian publishing industry.
61
62. •
drm, a dummy set to one if the e-book is encrypted with DRM technologies275, to zero otherwise;
•
watermark, a dummy set to one if the e-book embeds information
about the purchase276, to zero otherwise;
•
paper.price, the retail price of the print edition (in euro, VAT
included);
•
ebook.pub.delay, the distance in years between the publication of the
e-book and of the print edition (either “positive” for print-first titles or
“negative” for digital-first titles);277
•
subj.fiction, a dummy set to one if the title subject is fiction, to zero if it is non-fiction.278
Unfortunately, only 6,053 observations out of a total of 14,794 include
complete information about the print edition.
Since experiments in digital-only publishing279 by major publishers have
been very limited, if not null, 280 missingness should be attributed to incomplete reporting during metadata compilation.
Paragraph IV.1.2 describes the treatment method adopted to deal with
the issue of missing values in X.
IV.1.1. Descriptive statistics
•
Almost every e-book in the sample (96.88%) embeds DRM technologies.
275 See section II.1.
276 Digital watermarking is a “social” deterrent against piracy and represents a loose alternative to DRM technologies.
277 All the e-books in the sample were published between 2010 and 2012, so, in practice, the
publication delay reflects the print edition vintage.
278 See section II.3.
279 For example, see Cos'è quintadicopertina, quintadicopertina, accessed on 09/07/2013,
bit.ly/1eeukj9.
280 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
•
Only a few e-books in the sample (2.76%) employ digital watermarking.
•
Still fewer e-books in the sample do not employ either DRM encryption or digital watermarking (0.36%).
•
Non-fiction e-books are slightly more represented in the sample than
fiction e-books (56.25% vs. 43.75%).
Fig. 8: Pie chart of e-book protection mechanisms
Red: DRM encryption (Adobe Content Server 4), 96.88%;
Yellow: digital watermarking, 2.76%;
Green: open file, 0.36%.
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 9: Pie chart of e-book subjects
Blue: non-fiction books, 56.25%;
Light blue: fiction books, 43.75%.
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Tab. 2 reports summary statistics for the quantitative variables in the
dataset.
Tab. 2: Summary statistics for the quantitative catalog variables
Statistic      ebook.price   paper.price   ebook.pub.delay
Mean                10.893        15.984             2.045
Median               8.990        14.000             0.000
Minimum              0.000         3.900            -2.000
Maximum            109.990       150.000            36.000
Std. Dev.            7.682         9.899             3.610
C.V.                 0.705         0.619             1.765
Skewness             3.470         3.314             2.939
Ex. kurtosis        23.688        21.547            12.924
5% Perc.             3.990         7.000             0.000
95% Perc.           24.990        35.000            10.000
IQ range             5.010         9.000             3.000
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
•
All three variables are right-skewed, as evident from Fig. 10, Fig. 11,
and Fig. 12.
•
The median price, more robust to pricey outliers than the mean price,
is €8.99 for e-books and €14.00 for print books.
•
Very few titles have been published first in e-book format and only
later in print edition.
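As an illustration of how the statistics in Tab. 2 can be obtained, the following is a minimal sketch in Python. It is not the procedure actually used for the thesis: the price list is made up, and the moment-based definitions of skewness and excess kurtosis are an assumption, since the text does not state which estimators were applied.

```python
import statistics


def summary(values):
    """Return some of the statistics reported in Tab. 2 for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Central moments, used for moment-based skewness and excess kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "cv": sd / mean,                      # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "ex_kurtosis": m4 / m2 ** 2 - 3.0,
        "iq_range": q3 - q1,
    }


# Hypothetical e-book prices in euro (not taken from the dataset).
prices = [0.99, 2.99, 4.99, 6.99, 8.99, 8.99, 9.99, 12.99, 16.99, 49.99]
stats = summary(prices)

# Consistency check on Tab. 2: C.V. of ebook.price = Std. Dev. / Mean.
cv_ebook_price = round(7.682 / 10.893, 3)
```

With a single pricey outlier, the mean of the made-up list lies well above its median and the skewness is positive, which is the same qualitative pattern described for the three catalog variables.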
Fig. 10: Histogram of ebook.price
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 11: Histogram of paper.price
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 12: Histogram of ebook.pub.delay
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
A quick survey of the correlation matrix in Tab. 3 anticipates both the
problems and the results of the subsequent regression analysis.
Tab. 3: Correlation matrix of the catalog variables
                  ebook.price      drm   watermark   paper.price   ebook.pub.delay   subj.fiction
ebook.price             1.000
drm                     0.059    1.000
watermark              -0.037   -0.938       1.000
paper.price             0.927    0.019      -0.006         1.000
ebook.pub.delay        -0.125   -0.054      -0.021        -0.131             1.000
subj.fiction           -0.359    0.061      -0.083        -0.356             0.097          1.000
Source: own elaboration of public catalog data for a sample of e-books by
Italian publishers.
The very high (negative) correlation between the drm and the watermark dummies foreshadows quasi-collinearity among the two variables and
the constant term; consequent adjustments to the model are described in section IV.2.
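The size of the correlation between the two dummies follows almost mechanically from the sample shares. The sketch below assumes that drm and watermark are mutually exclusive (no title is flagged with both) and uses the shares reported in paragraph IV.1.1; the function name is illustrative only.

```python
import math


def corr_exclusive_dummies(p_a, p_b):
    """Pearson correlation between two mutually exclusive 0/1 dummies.

    If the dummies are never both one, E[a*b] = 0, so the covariance is
    -p_a * p_b, while each variance is p * (1 - p).
    """
    cov = -p_a * p_b
    return cov / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


# Shares of DRM-encrypted and watermarked titles from paragraph IV.1.1.
r = corr_exclusive_dummies(0.9688, 0.0276)
```

Under these assumptions the correlation is about -0.94, close to the -0.938 in Tab. 3. Since the open-file share is only 0.36%, drm and watermark sum to one for nearly every title, which is why they are quasi-collinear with the constant term.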
The very high and positive correlation between the e-book price and the
print edition price is very clear in the scatter plot in Fig. 13.
The e-book price seems to decrease with the publication delay, but the effect is probably dependent on the paper price because, in our sample, the
publication delay closely reflects the vintage of the print edition. 281
281 See note 277.
Fig. 13: Scatter plot of the quantitative catalog variables
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
IV.1.2. Missing data
The treatment method applied to missing values is listwise deletion, one
of the simplest and most conventional approaches.
[…] conventional statistical methods and software presume that all variables in
a specified model are measured for all cases. The default method for virtually all
statistical software is simply to delete cases with any missing data on the variables
of interest, a method known as listwise deletion or complete case analysis. 282
The obvious drawback of listwise deletion is that it often discards a
large fraction of information, with the resulting loss of statistical power.
282 P. D. ALLISON, Missing data in A. MAYDEU-OLIVARES, R. E. MILLSAP, The SAGE handbook
of quantitative methods in psychology, SAGE Publications, 2009, p. 72, amzn.to/15UAtJe.
Even though listwise deletion is not competitive with well-devised and
well-executed advanced treatment methods, such as maximum likelihood and
multiple imputation, it is “honest”283 compared to other conventional methods, thanks to its fairly good bias minimizing properties and standard error
estimates.284
In particular, when the missing values are restricted to X,
so long as missingness on the predictors does not depend on the dependent
variable, listwise deletion will yield approximately unbiased estimates of regression
coefficients […] for virtually any kind of regression.285
In our sample, 8,741 observations out of a total of 14,794 (59.08%) have
one or more missing values in the explanatory variables.
In order to test whether missingness depends on the value of the response variable (ebook.price), we estimate, through logistic regression,
the probability of the presence of missing values in X conditional on
ebook.price.
We define the missing.data dummy, set to one for the 8,741 observations with missing values in one or more explanatory variables, and to zero
for the remaining 6,053 complete observations, and regress it on a constant
term and ebook.price.
Tab. 4: Table of coefficients (LOGIT)
Coefficient       Estimate   Std. error    z value   p-value
Intercept           0.3676       0.0167     21.979   < 2E-16 ***
ebook.price     -1.968E-09    4.999E-09    -0.3940   0.6940
McFadden R-squared: 7.6970E-06
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
283 Ibid., p. 76.
284 Ibid., p. 75.
285 Ibid.
The ebook.price coefficient can be interpreted as the change in the log
odds of missingness associated with a €1 increase in the e-book price.
Its value is low and not significantly different from zero, so, according
to the model, the hypothesis that the response variable does not affect the
probability of missingness in X cannot be rejected.
We can assert, with a certain degree of confidence, that any bias in the
estimates of the subsequent analysis should not be ascribed to listwise deletion
of observations with missing values in the explanatory variables.
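A quick plausibility check of Tab. 4 is sketched below, using only the counts and estimates reported in the text. With a slope that is practically zero, the logit intercept should be close to the log odds of missingness in the whole sample, and the fitted probability of missingness should barely change with the price. The helper name and the two example prices are illustrative assumptions.

```python
import math

# Counts reported in the text: incomplete vs. complete observations.
n_missing, n_complete = 8741, 6053

# Sample log odds of missingness; compare with the intercept in Tab. 4.
log_odds = math.log(n_missing / n_complete)


def missing_probability(intercept, slope, price):
    """Fitted probability of missingness from a logit model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * price)))


# Estimates from Tab. 4, evaluated at two arbitrary e-book prices (euro).
p_cheap = missing_probability(0.3676, -1.968e-09, 5.0)
p_pricey = missing_probability(0.3676, -1.968e-09, 50.0)
```

Both fitted probabilities are about 0.59, the share of incomplete observations (59.08%), regardless of the price plugged in.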
IV.2. OLS LINEAR REGRESSION RESULTS
We now model, through OLS linear regression, the relationship between
the e-book price (ebook.price) and a set of explanatory variables derived from
the information collected in the catalog dataset described in section IV.1.
The explanatory variables include:
•
the price of the print edition (paper.price);
•
the file.is.open dummy, computed as 1 − drm, equal to one if the e-book is not encrypted with DRM technologies (that is, either the file
contains a digital watermark or no protection at all) 286, to zero otherwise;
•
the subj.fiction dummy, set to one for fiction books, to zero for non-fiction books;
•
a set of functional forms of the publication delay in years between the
e-book version and the print edition (ebook.pub.delay.pos, ebook.pub.delay.pos.sq, ebook.pub.delay.neg, and ebook.pub.delay.neg.sq).
286 See paragraph IV.1.1.
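\begin{equation}
\begin{aligned}
\text{ebook.price} ={}& \beta_0 + \beta_1\,\text{paper.price} + \beta_2\,\text{file.is.open} + \beta_3\,\text{subj.fiction} \\
&+ \beta_4\,\text{ebook.pub.delay.pos} + \beta_5\,\text{ebook.pub.delay.pos.sq} \\
&+ \beta_6\,\text{ebook.pub.delay.neg} + \beta_7\,\text{ebook.pub.delay.neg.sq}
\end{aligned}
\tag{10}
\end{equation}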
From the descriptive analysis of the dataset, 287 we expect paper.price to
explain most of the ebook.price variability.
The model estimates the intercept for DRM-encrypted e-books, the vast
majority of titles in the sample, 288 and a constant “no-DRM effect” through
the file.is.open dummy.
The subj.fiction dummy captures a constant effect of the subject on the
e-book price, net of the effects of other variables; in particular, the coefficient
estimate is net of the effect of paper.price, probably very large and significant.
The subj.fiction coefficient could be tentatively interpreted as an indicator of differential price elasticity at subject level between the e-book market
and the print market.
Consistently with the long tail hypothesis,289 in the purely-digital e-book
market, we should expect some differential extra-elasticity for niche subjects
(non-fiction), which implies a positive sign for the subj.fiction coefficient.
As for the time delay between the publication of the e-book and of the
print edition, we differentiate between print-first titles (“positive” delay) and
digital-first titles (“negative” delay, extremely rare in the sample)290:
•
ebook.pub.delay.pos is the distance in years between the publication
of the e-book and of the print edition, if the title is a print-first book,
zero otherwise;
287 See paragraph IV.1.1.
288 See paragraph IV.1.1.
289 See section III.2.
290 See paragraph IV.1.1.
•
ebook.pub.delay.neg is the distance in years between the publication
of the print edition and of the e-book, if the title is a digital-first book,
zero otherwise.
Because of “backlog demand” piling up during the period of non-availability in e-book format, not-so-old books never published in e-book format
before could fetch, ceteris paribus, higher prices than books published simultaneously in e-book format and print edition.
The effect is likely to diminish and turn negative, as the time distance,
measured in years of publication delay, grows.
For example, out-of-copyright titles, with decades-old print editions still
on sale, have to compete with free e-book editions, available on dedicated
websites and major e-book stores.
Other possible reasons for a reversal of the effect include relatively
more recency-oriented tastes of the digital public and price stickiness in the
print book catalogs, due to menu costs, brand image considerations, and
management of the book value of physical remainders.
In order to capture such non-linear effects, we add the squares of both
time delay variables to the model:
•
ebook.pub.delay.pos.sq is the square of the distance in years between
the publication of the e-book and of the print edition, if the title is a
print-first book, zero otherwise;
•
ebook.pub.delay.neg.sq is the square of the distance in years between
the publication of the print edition and of the e-book, if the title is a
digital-first book, zero otherwise.
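The construction of the regressors of equation (10) from the catalog variables can be sketched as follows. This is an illustrative Python sketch, not the code used for the thesis; it assumes that the “negative” delay is coded as a positive distance in years, as the definitions above suggest, and the function name is hypothetical.

```python
def design_row(paper_price, drm, subj_fiction, pub_delay):
    """Build the regressors of equation (10) for one title.

    pub_delay is ebook.pub.delay in years: positive for print-first
    titles, negative for digital-first titles.
    """
    delay_pos = max(pub_delay, 0)    # print-first delay, zero otherwise
    delay_neg = max(-pub_delay, 0)   # digital-first delay, zero otherwise
    return {
        "const": 1,
        "paper.price": paper_price,
        "file.is.open": 1 - drm,     # watermark or no protection at all
        "subj.fiction": subj_fiction,
        "ebook.pub.delay.pos": delay_pos,
        "ebook.pub.delay.pos.sq": delay_pos ** 2,
        "ebook.pub.delay.neg": delay_neg,
        "ebook.pub.delay.neg.sq": delay_neg ** 2,
    }


# A DRM-encrypted non-fiction title released as e-book three years after print.
print_first = design_row(paper_price=14.00, drm=1, subj_fiction=0, pub_delay=3)
# An open-file fiction title released as e-book two years before print.
digital_first = design_row(paper_price=10.00, drm=0, subj_fiction=1, pub_delay=-2)
```

Splitting the delay into two non-negative variables lets the model fit separate linear and quadratic terms for print-first and digital-first titles.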
Accordingly, the coefficients of the “positive” delay variables should be
positive for ebook.pub.delay.pos and negative for ebook.pub.delay.pos.sq.
The ebook.pub.delay.neg and ebook.pub.delay.neg.sq coefficients, instead, should not be significantly different from zero.
First, we report estimates of coefficients, descriptive statistics, and
graphs for the OLS linear regression model, all preliminary to the setup of
the inferential framework.
Tab. 5: Table of coefficients – preliminary (OLS)
Coefficient               Estimate
Intercept                  -0.2020
paper.price                 0.7090
file.is.open               -1.8993
subj.fiction               -0.5417
ebook.pub.delay.pos         0.0571
ebook.pub.delay.pos.sq     -0.0037
ebook.pub.delay.neg         1.9596
ebook.pub.delay.neg.sq     -0.7477
SSR: 49280.82
R-squared: 0.8620; Adjusted R-squared: 0.8618
Source: own elaboration of public catalog data
for a sample of e-books by Italian publishers.
The normal quantile-quantile plot in Fig. 14 shows that the distribution of
residuals is symmetric, but with fatter tails than the normal distribution.
Fig. 14: Residuals normal Q-Q plot (OLS)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
The asymptotic properties of the OLS estimator do not require normality of
the error terms, but the conventional standard errors and tests require homoskedasticity.
The plot of residuals vs. fitted values in Fig. 15 shows greater variation
in the residuals for higher fitted values and, thus, casts serious doubts on the
assumption.
Fig. 15: Plot of residuals vs. fitted values (OLS)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
The White test, a general test for heteroskedasticity whose statistic asymptotically follows a chi-squared distribution,291 rejects the null hypothesis of homoskedasticity.
Tab. 6: White test (OLS)
Test statistic   Degrees of freedom   p-value
1435.2380        23                   < 2.2E-16 ***
Source: own elaboration of public catalog data for a
sample of e-books by Italian publishers.
291 M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, p. 92,
amzn.to/17bknLN.
The standard errors and tests in Tab. 7 rely on a heteroskedasticity-consistent covariance matrix (HC1 variant).
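To make the HC1 correction concrete, the following sketch computes the heteroskedasticity-consistent standard error of the slope in a simple regression with one regressor and an intercept. It is a simplified illustration in plain Python, not the estimation routine used for Tab. 7 (which involves seven regressors); the data are made up.

```python
import math


def ols_hc1_simple(x, y):
    """OLS slope of y on x (with intercept) and its HC1 standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich estimator: each squared residual weights its own observation
    # instead of being replaced by a common error variance.
    meat = sum(((xi - x_bar) ** 2) * (ei ** 2) for xi, ei in zip(x, resid))
    hc0_var = meat / sxx ** 2
    hc1_var = hc0_var * n / (n - 2)  # degrees-of-freedom correction, k = 2
    return slope, math.sqrt(hc1_var)


# Hypothetical print and e-book prices in euro (not taken from the dataset).
paper = [5.0, 8.0, 10.0, 14.0, 18.0, 25.0, 40.0]
ebook = [3.5, 5.9, 6.9, 9.9, 12.0, 19.5, 25.0]
slope, se = ols_hc1_simple(paper, ebook)
```

The point estimate is the usual OLS slope; only the standard error changes, which is why the estimates in Tab. 5 and Tab. 7 coincide.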
Tab. 7: Table of coefficients – final (OLS)
Coefficient               Estimate   Std. error    t value   p-value
Intercept                  -0.2020       0.2643    -0.7644   0.4447
paper.price                 0.7090       0.0149    47.5134   0.0000 ***
file.is.open               -1.8993       0.1764   -10.7638   8.906E-27 ***
subj.fiction               -0.5417       0.0954    -5.6780   1.426E-08 ***
ebook.pub.delay.pos         0.0571       0.0189     3.0189   0.0025 ***
ebook.pub.delay.pos.sq     -0.0037       0.0011    -3.3564   0.0008 ***
ebook.pub.delay.neg         1.9596       1.2353     1.5863   0.1127
ebook.pub.delay.neg.sq     -0.7477       0.6473    -1.1551   0.2480
SSR: 49280.82
R-squared: 0.8620; Adjusted R-squared: 0.8618
AIC: 29888.7; BIC: 29949.07
F(7, 6053): 996.42, p-value: < 2.2E-16 ***
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Overall, the model provides a good fit of the data, with a small constant
term, not significantly different from zero.
As suggested by the descriptive analysis, the paper.price coefficient is
very high (0.7090) and significant, with a very small standard error (0.0149).
Neglecting the intercept and setting all the explanatory variables to
zero, except for the paper price, we could say that DRM-encrypted non-fiction
e-books, published simultaneously in e-book format and print edition, are priced at an
average discount of approximately 30% off the print edition price (1 − 0.7090 ≈ 0.29).
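As a worked example, the point estimates in Tab. 7 can be turned into a fitted e-book price. The sketch below is only an illustration: the function name is hypothetical, the digital-first terms are left out (which is equivalent to setting them to zero for print-first or simultaneous titles), and the €14.00 print price is simply the sample median from Tab. 2.

```python
# Point estimates from Tab. 7 (digital-first terms omitted).
COEF = {
    "const": -0.2020,
    "paper.price": 0.7090,
    "file.is.open": -1.8993,
    "subj.fiction": -0.5417,
    "ebook.pub.delay.pos": 0.0571,
    "ebook.pub.delay.pos.sq": -0.0037,
}


def fitted_ebook_price(paper_price, file_is_open=0, subj_fiction=0, delay_pos=0):
    """Fitted e-book price (euro) for a print-first or simultaneous title."""
    return (
        COEF["const"]
        + COEF["paper.price"] * paper_price
        + COEF["file.is.open"] * file_is_open
        + COEF["subj.fiction"] * subj_fiction
        + COEF["ebook.pub.delay.pos"] * delay_pos
        + COEF["ebook.pub.delay.pos.sq"] * delay_pos ** 2
    )


# DRM-encrypted non-fiction title, simultaneous release, print price 14.00.
baseline = fitted_ebook_price(14.00)

# Discount off the print price implied by the slope alone.
discount = 1 - COEF["paper.price"]
```

For the baseline title the fitted price is about €9.72, and the slope alone implies a discount of about 29% off the print price.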
E-book pricing based on strict analogy with the print book market implies no anticipation of significant non-neutral effects of digitization, like
those experienced by the newspaper industry.292
292 See section III.1.
As expected, the coefficients of the “negative” time delay variables
(ebook.pub.delay.neg and ebook.pub.delay.neg.sq) are not significantly different from zero, with very large standard errors (1.2353 and 0.6473, respectively).
The coefficients of the “positive” time delay variables (ebook.pub.delay.pos and ebook.pub.delay.pos.sq), instead, are significant and their signs (positive and negative, respectively) are consistent with our hypothesis.
Ceteris paribus, the marginal effect on the e-book price of a marginal
increase of the (“positive”) delay in years between the publication of the e-book and of the print edition is given by:
\begin{equation}
0.0571 - 0.0037 \times 2 \times \text{ebook.pub.delay.pos}
\tag{11}
\end{equation}
Titles whose print edition has been published approximately eight years
before the publication of the e-book get the maximum “seniority price premium”, approximately equal to €0.22.
The “seniority effect” on the e-book price turns negative, at an increasing rate, after 16 years from the publication of the print edition, which one
could take as a rough estimate of the average useful life of a title
(“longevity”).
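The figures quoted above follow from the quadratic form of the “seniority effect”. The short sketch below reproduces the arithmetic from the two estimates in Tab. 7; variable and function names are illustrative.

```python
# Estimates of ebook.pub.delay.pos and ebook.pub.delay.pos.sq from Tab. 7.
B_POS, B_POS_SQ = 0.0571, -0.0037


def seniority_effect(delay_years):
    """Total effect (euro) of a print-first delay on the e-book price."""
    return B_POS * delay_years + B_POS_SQ * delay_years ** 2


# Delay at which the marginal effect in (11) is zero, i.e. the peak.
peak_delay = -B_POS / (2 * B_POS_SQ)
peak_premium = seniority_effect(peak_delay)

# Delay at which the total effect returns to zero and then turns negative.
zero_delay = -B_POS / B_POS_SQ
```

The peak is reached after roughly 7.7 years with a premium of about €0.22, and the effect changes sign between 15 and 16 years of delay.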
Despite the high goodness of fit, the explanatory power of the model is
satisfactory only for DRM-encrypted non-fiction e-books, the majority of titles in
the sample, because the model restricts the effect of the subject and of the protection mechanism to residual constant effects.
The subj.fiction coefficient is negative and significant: ceteris
paribus, on average a fiction book costs approximately €0.54 less than a non-fiction book.
The result is opposite to the hypothesis of differential extra-elasticity
for non-fiction books in the e-book market with respect to the print book market, but it may also be due to differences in market longevity or protection
mechanisms between fiction and non-fiction e-books.
The coefficient of file.is.open is high and significant: ceteris paribus, on
average an e-book protected with digital watermarking, or not protected at
all, “inexplicably” costs approximately €1.90 less than a DRM-encrypted e-book.
More generally, the inferential power of the model, and thus its interpretation as a comprehensive pricing formula, is questionable.
The rejection of the null hypothesis for both the Breusch-Godfrey test 293
and the Ramsey RESET test294 suggests the presence of bias due to additional
non-linear effects and/or omitted explanatory variables.
Tab. 8: Breusch-Godfrey test (OLS)
Test statistic   Degrees of freedom   p-value
837.8528         2                    < 2.2E-16 ***
Source: own elaboration of public catalog data for a
sample of e-books by Italian publishers.
Tab. 9: Ramsey RESET test (OLS)
Test statistic   Degrees of freedom   p-value
21.0448          2, 6043              7.799E-10 ***
Source: own elaboration of public catalog data for a
sample of e-books by Italian publishers.
Nonetheless, the model retains the descriptive power of the OLS estimator, which is reasonably interesting in itself.
293 A test for autocorrelation that follows a chi-squared distribution.
For more information, see M. VERBEEK, A guide to modern econometrics, John Wiley &
Sons, 2004, pp. 101-102, amzn.to/17bknLN.
294 A test for functional misspecification that follows an F distribution.
For more information, see M. VERBEEK, A guide to modern econometrics, John Wiley &
Sons, 2004, p. 63, amzn.to/17bknLN.
As explained in paragraph VI.1.1, a multiplicative model, obtained
through logarithmic transformation of the response variable and, where
applicable, of the non-dummy explanatory variables, should reduce heteroskedasticity and
functional misspecification problems, but an additive raw-scale model is
preferable for analysis in absolute terms, because of the non-linearity of the logarithmic transformation.
IV.3. QUANTILE LINEAR REGRESSION RESULTS
The interpretation of the previous OLS linear regression model295 as a
data generating process requires a “one-model assumption”296, viz. the model
being supposedly appropriate for all data.
However, even if e-book pricing were based on strict analogy with print
edition prices, it would probably rely more on a heuristic pricing scheme than
on an exact pricing mechanism.
After all, books are complex products whose market is part of a historically complex institutional framework, hardly homogeneous commodities either sold in perfectly competitive markets or subject to regulated public tariffs.
Thus, departures from a common basic pricing formula are very likely
to occur, both at individual level and because of general marketing practices.
Price points are a common psychological pricing strategy: few price
groups are created to ease economic cognitive processes and/or to anchor
perceptual judgments of consumers.
According to some researchers,297 buyers do not judge prices only by
the associated use value of the product: the position in the price distribution
295 See section IV.2.
296 L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, p. 25,
amzn.to/1aeZCAO.
297 K. B. MONROE, Buyers' subjective perceptions of price, “Journal of Marketing Research”,
10, 1, 1973, pp. 76-77, bit.ly/17MoGQW.
itself can affect the perceived value of a product.
A counter-intuitive pricing strategy is the creation of contrast effects
through high dispersion from the anchoring price range (“standard price”298).
Value is not a physical stimulus, although it is an important attribute, and efficient discrimination between stimuli in terms of differences in value usually is
more important than discrimination in terms of price. Accentuation of price differences may lead to a greater accentuation of perceived value differences […] 299
OLS linear regression describes the behavior of the location of the response variable distribution conditional on a set of explanatory variables, by
using the mean as central tendency measure.
In addition, it can be adapted to model the scale of the conditional distribution; for example, estimates of standard errors, obtained from additive
or multiplicative functions of a set of explanatory variables, may be used to
weight the original regressors.
However, if the response distribution is skewed, the median may be a
more informative central tendency measure; furthermore, shape shifts involving the skewness of the distribution can be difficult to capture with the standard deviation as scale measure.
So, in the case at hand, quantile linear regression might provide relevant insights, additional and complementary to the results of OLS linear regression.300
Quantile linear regression models estimate conditional quantiles as a
linear function of the conditioning variables, by minimizing a weighted sum
of absolute errors.301
298 Ibid., p. 76.
299 Ibid., p. 77.
300 See section III.2.
301 L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, pp. 29-38,
amzn.to/1aeZCAO.
The weights of the loss function change according to the quantile under
investigation; in the special case of conditional-median regression, the loss
function corresponds to the sum of absolute errors.302
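The weighted loss function can be illustrated in the simplest possible setting, with no explanatory variables, where the minimizer is just a sample quantile. The sketch below is an illustration on made-up prices, with hypothetical function names; for the median, the weights are equal on both sides, so the loss is proportional to the sum of absolute errors.

```python
def check_loss(residual, tau):
    """Quantile loss: weight tau on positive errors, 1 - tau on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(values, guess, tau):
    """Weighted sum of absolute errors around a constant guess."""
    return sum(check_loss(v - guess, tau) for v in values)


def best_constant(values, tau):
    """Observed value that minimizes the total check loss."""
    return min(values, key=lambda g: total_loss(values, g, tau))


# Hypothetical e-book prices in euro (not taken from the dataset).
prices = [1.99, 3.99, 4.99, 6.99, 8.99, 9.99, 12.99, 14.99, 24.99]

median_like = best_constant(prices, 0.50)
upper_like = best_constant(prices, 0.75)
```

With nine observations, the minimizer for tau = 0.50 is the fifth smallest price (the median) and the minimizer for tau = 0.75 is the seventh smallest. Quantile linear regression applies the same loss to residuals from a linear function of the regressors.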
Under the assumption of i.i.d. errors,303 the asymptotic properties of
quantile linear regression can be invoked to estimate consistent standard errors and confidence intervals for coefficients.304
Since “the often-observed skewness and outliers make the error distribution depart from i.i.d.,” empirical distributions, obtained through bootstrap
resampling, are usually employed for quantile linear regression inference. 305
We now estimate the linear model outlined in section IV.2 for the 0.05th,
0.25th, 0.50th, 0.75th, and 0.95th quantiles of ebook.price.
The choice of the quantiles is quite conventional and arbitrary; ideally,
each should be a proxy for “very cheap”, “cheap”, “standard”, “expensive”,
and “very expensive” price points, respectively.
For each conditional quantile, we take the 0.025th and 0.975th quantiles
of the bootstrap distribution (500 replicates) as the endpoints of the empirical 95% confidence intervals for coefficients.306
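The percentile rule for the confidence intervals can be sketched as follows. For brevity, the example bootstraps a sample median rather than a regression coefficient, so it illustrates the resampling logic only, not the estimation actually performed; the data, the fixed seed, and the simple index rule for the endpoints are all assumptions.

```python
import random
import statistics


def bootstrap_percentile_ci(values, stat, replicates=500, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    draws = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(replicates)
    )
    alpha = (1 - level) / 2
    # Simple index rule; other quantile conventions shift the endpoints slightly.
    lower = draws[int(alpha * replicates)]
    upper = draws[min(replicates - 1, int((1 - alpha) * replicates))]
    return lower, upper


# Hypothetical e-book prices in euro (not taken from the dataset).
prices = [1.99, 3.99, 4.99, 6.99, 8.99, 9.99, 12.99, 14.99, 24.99]
low, high = bootstrap_percentile_ci(prices, statistics.median)
```

Without a fixed seed, two runs generally give slightly different endpoints, which is the sense in which bootstrap results are not exactly reproducible.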
Tab. 10, Tab. 11, Tab. 12, Tab. 13, and Tab. 14 report coefficient estimates and confidence intervals for the 0.05th, 0.25th, 0.50th, 0.75th, and 0.95th
quantiles, respectively.
302 Ibid.
303 Under mild regularity conditions, asymptotic properties can be invoked also in the non-i.i.d.
case.
The case is more complex because errors no longer have a common distribution, so the asymptotic covariance matrix needs to be weighted accordingly.
For more information, see L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications,
2007, pp. 45-46, amzn.to/1aeZCAO.
304 L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, pp. 45-46,
amzn.to/1aeZCAO.
305 Ibid., pp. 49-51.
306 Due to its empirical nature, bootstrap resampling does not reproduce the same results for
each different estimation.
Tab. 10: Table of coefficients – 0.05th quantile (QUANTREG.05)
Coefficient               Estimate   95% C.I.
Intercept                   1.1980   [-0.1082, 2.1393]
paper.price                 0.2917   [0.2222, 0.4001]
file.is.open               -2.4069   [-3.5989, -2.0673]
subj.fiction               -1.0326   [-1.4921, -0.5036]
ebook.pub.delay.pos         0.2218   [0.1175, 0.3163]
ebook.pub.delay.pos.sq     -0.0091   [-0.0158, -0.0042]
ebook.pub.delay.neg         1.1691   [-1.2756, 4.3973]
ebook.pub.delay.neg.sq      0.5222   [-1.0588, 2.5067]
Quantile: 0.05
Pseudo-R-squared: 0.1703
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Tab. 11: Table of coefficients – 0.25th quantile (QUANTREG.25)
Coefficient               Estimate   95% C.I.
Intercept                   0.1303   [-0.1969, 0.4703]
paper.price                 0.6361   [0.6173, 0.6566]
file.is.open               -3.0440   [-3.4614, -2.2584]
subj.fiction               -0.3180   [-0.5598, -0.1199]
ebook.pub.delay.pos         0.0538   [0.0067, 0.0927]
ebook.pub.delay.pos.sq     -0.0051   [-0.0073, -0.0021]
ebook.pub.delay.neg         0.0506   [-4.4196, 3.3521]
ebook.pub.delay.neg.sq      0.5115   [-1.5181, 2.2665]
Quantile: 0.25
Pseudo-R-squared: 0.5135
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Tab. 12: Table of coefficients – 0.50th quantile (QUANTREG.50)
Coefficient               Estimate   95% C.I.
Intercept                   0.1622   [0.0830, 0.2656]
paper.price                 0.6940   [0.6875, 0.6985]
file.is.open               -0.8288   [-1.3306, -0.6176]
subj.fiction               -0.0430   [-0.1028, 0.0000]
ebook.pub.delay.pos         0.0046   [-0.0169, 0.0167]
ebook.pub.delay.pos.sq     -0.0015   [-0.0025, 0.0008]
ebook.pub.delay.neg         1.4360   [-0.5678, 2.9088]
ebook.pub.delay.neg.sq     -0.4787   [-1.5995, 0.5245]
Quantile: 0.50
Pseudo-R-squared: 0.6683
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Tab. 13: Table of coefficients – 0.75th quantile (QUANTREG.75)
Coefficient               Estimate   95% C.I.
Intercept                  -0.0671   [-0.3995, 0.1845]
paper.price                 0.7714   [0.7497, 0.7948]
file.is.open               -1.0719   [-1.2227, -0.9119]
subj.fiction               -0.2857   [-0.3558, -0.1539]
ebook.pub.delay.pos         0.0365   [0.0127, 0.0812]
ebook.pub.delay.pos.sq     -0.0017   [-0.0043, -0.0001]
ebook.pub.delay.neg         1.4778   [-0.0309, 4.2146]
ebook.pub.delay.neg.sq     -0.5436   [-2.1741, 0.1595]
Quantile: 0.75
Pseudo-R-squared: 0.7435
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Tab. 14: Table of coefficients – 0.95th quantile (QUANTREG.95)
Coefficient               Estimate   95% C.I.
Intercept                  -0.5752   [-0.8409, -0.3247]
paper.price                 0.9317   [0.9231, 0.9436]
file.is.open               -0.8633   [-1.0679, -0.6454]
subj.fiction               -0.6149   [-0.8057, -0.4393]
ebook.pub.delay.pos         0.0483   [0.0363, 0.0857]
ebook.pub.delay.pos.sq     -0.0009   [-0.0033, -0.0006]
ebook.pub.delay.neg         0.4422   [-1.4206, 1.7596]
ebook.pub.delay.neg.sq     -0.4174   [-1.7231, 0.5123]
Quantile: 0.95
Pseudo-R-squared: 0.8380
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
The interpretation of quantile linear regression coefficients is analogous
to OLS linear regression, but refers to the marginal effect on the specific conditional quantile, instead of the marginal effect on the conditional mean.
Therefore, the same caveats in section IV.2 about the subj.fiction and
file.is.open dummies apply to the results of the quantile linear regression
model.
Given the large amount of information conveyed by quantile linear regression, graphical views of coefficient estimates and confidence intervals
across quantiles are usually more intelligible than tables of coefficients.
Fig. 16: Graphical view of const (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 17: Graphical view of paper.price (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 18: Graphical view of file.is.open (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 19: Graphical view of subj.fiction (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 20: Graphical view of ebook.pub.delay.pos (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 21: Graphical view of ebook.pub.delay.pos.sq (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 22: Graphical view of ebook.pub.delay.neg (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
Fig. 23: Graphical view of ebook.pub.delay.neg.sq (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
For all the quantiles, the signs of the coefficients are roughly in line
with the results of OLS linear regression, and the ebook.pub.delay.neg and
ebook.pub.delay.neg.sq coefficients have very wide confidence intervals including
zero.
In general, coefficient confidence intervals are much wider for the
lower, and especially the lowest, quantiles (0.05th and 0.25th).
For “very cheap” e-books (0.05th quantile), the paper.price coefficient is
low and dispersed, ranging from 0.2222 to 0.4001.
The confidence interval becomes narrower for higher quantiles and the
estimate increases almost linearly from 0.6361 at the 0.25th quantile to
0.7714 at the 0.75th quantile.
The conditional-median estimate of the paper.price coefficient is
0.6940, quite close to the conditional-mean estimate, while at the 0.95th quantile (“very expensive” e-books) the coefficient jumps to 0.9317.
The coefficients of the “positive” delay variables (ebook.pub.delay.pos and ebook.pub.delay.pos.sq) are not significantly different from zero for the median, but are
significant and have the expected signs for the other quantiles.
For “very cheap” e-books (0.05th conditional quantile), in particular, the
ebook.pub.delay.pos and ebook.pub.delay.pos.sq coefficients are high (in absolute value) and dispersed.
In effect, it is very likely that out-of-copyright titles lie in this very lower
tail of the distribution.
The subj.fiction coefficient is not significantly different from zero for the
median, but is significant for the other quantiles.
As we move away from the median towards the extreme quantiles, its
value decreases, especially at the lowest conditional quantile (0.05th), where
it ranges from -1.50 to -0.50, approximately.
The file.is.open coefficient is low and dispersed (from -2.00 to -3.50, approximately) for the 0.05th and 0.25th quantiles (“very cheap” and “cheap” e-books).
It becomes more stable around -1.00 from the 0.50th to the 0.95th quantiles.
The intercept is quite close to zero for all the quantiles except the lowest, where the estimate of the constant term is 1.1980, even
though its confidence interval ranges from less than zero to more than 2.00, approximately.
The pseudo-R-squared307 across conditional quantiles follows a pattern
similar to the trend in the coefficient of paper.price, the dominant explanatory variable.
Fig. 24: Graphical view of the pseudo-R-squared (QUANTREG)
Source: own elaboration of public catalog data for a sample of e-books by Italian publishers.
The conditional-median linear regression results are similar to the OLS
linear regression results, but the performance of the model varies along the
distribution of the response variable.
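The pseudo-R-squared reported for each quantile compares the weighted sum of absolute errors of the full model with that of the intercept-only model. A minimal sketch of the computation is given below; the function names and the toy data are illustrative assumptions, and the intercept-only fit is taken to be the sample quantile of the response.

```python
def check_loss_sum(values, fitted, tau):
    """Weighted sum of absolute errors for the quantile tau."""
    total = 0.0
    for y, f in zip(values, fitted):
        r = y - f
        total += tau * r if r >= 0 else (tau - 1) * r
    return total


def pseudo_r_squared(values, fitted, tau, sample_quantile):
    """One minus the ratio of full-model loss to intercept-only loss."""
    v_full = check_loss_sum(values, fitted, tau)
    v_null = check_loss_sum(values, [sample_quantile] * len(values), tau)
    return 1.0 - v_full / v_null


# Toy response values with median 6.0 (not taken from the dataset).
y = [2.0, 4.0, 6.0, 8.0, 10.0]
perfect = pseudo_r_squared(y, y, 0.50, 6.0)             # fitted = observed
useless = pseudo_r_squared(y, [6.0] * 5, 0.50, 6.0)     # fitted = sample median
```

A model that fits no better than the sample quantile scores zero, while a perfect fit scores one, so the values between 0.17 and 0.84 in Tab. 10 to Tab. 14 can be read on the same scale.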
307 The quantile linear regression pseudo-R-squared compares the performance of the full
model vs. the intercept-only model.
It is analogous to the OLS linear regression R-squared, but uses the weighted sum of absolute errors in place of the error sum of squares.
For more information, see L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications,
2007, pp. 51-54, amzn.to/1aeZCAO.
The explanatory power is very weak at the lowest quantile (0.05th), and
improves considerably as we move towards higher quantiles.
The pricing of “very expensive” e-books (0.95th quantile) closely
matches the print book market, while there seems to be more price experimentation among “very cheap” and “cheap” e-books (0.05th and 0.25th quantiles).
V. A “LONG-TAIL-ORIENTED” DISTRIBUTOR SALES
The following analysis of e-book sales distribution and price sensitivity
is based on private data from the Stealth distribution platform, courtesy of
SIMPLICISSIMUS BOOK FARM.308
Compared with Edigita, Stealth is relatively more focused on small and medium publishers,
which are very numerous in Italy.309
Moreover, at the end of 2010, SIMPLICISSIMUS BOOK FARM launched a
self-publishing platform, Narcissus, that leverages Stealth's technological and
contractual assets to distribute self-published e-books through the same
channels as “professional” (non-self-published) e-books.
Such an accidental sampling from the whole market is due to the fact
that we were not in a position to obtain sales data from other e-book distributors.
However, it can also be interpreted as purposive sampling if we are interested in testing assumptions and claims about “long tail” tendencies in
digital markets for information goods.
V.1. SALES DATASET
Monthly sales data of the Stealth e-book distribution platform, for the
period from July 2010 to June 2013, were provided, on a title basis, by
SIMPLICISSIMUS BOOK FARM.
During the period, the market has undergone rapid growth, especially
from the third quarter of 2012 to the second quarter of 2013, with unit sales
growth offsetting declining prices and with a significant increase in the number of titles distributed.
308 See section II.3.
309 See section II.3 for information about market concentration in the Italian publishing industry.
Fig. 25: Bar chart of Stealth sales and catalog data
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
In these circumstances, a meaningful economic interpretation entails
the need for technical adjustments in the statistical models.
V.1.1. Panel structure
We tentatively use panel linear regression on Stealth sales data to estimate price elasticity with a log-log model.
The implied isoelastic demand function is consistent with the fact that
“information goods are often highly valued only by a relatively small set of
customers.”310
310 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency,
“Management Science”, 45, 12, 1999, p. 1619, bit.ly/1fb2LYm.
The problem with natural pricing experiments is that the researcher
cannot easily distinguish whether changes in quantity are determined by
movements along the demand curve or shifts of the demand curve itself.311
Therefore, we split the original dataset into three periods (July 2010 –
June 2011, July 2011 – June 2012, and July 2012 – June 2013) of relatively
more stable demand.
The observations in all three panels have a longitudinal nature: the
number of individuals observed (N) is much larger than the number of time
periods of observation (T).
In a given month, some e-books may have zero sales or be temporarily
unavailable for sale, whereas other e-books may be published only several
months after the beginning of the period.
As a consequence, all three panels are highly unbalanced: the number
of observations is much smaller than the product of the number of individuals
and the number of time periods (NT).
• In the first panel (2010/2011), there is a total of 10,698 observations for 2,317 titles vs. a theoretical balanced panel of 27,804 observations.
• In the second panel (2011/2012), there is a total of 33,449 observations for 6,000 titles vs. a theoretical balanced panel of 72,000 observations.
• In the third panel (2012/2013), there is a total of 74,206 observations for 12,370 titles vs. a theoretical balanced panel of 148,440 observations.
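The degree of unbalancedness can be made explicit with a short calculation. The sketch below only reuses the figures listed above, with T = 12 monthly periods per panel; the name `fill_rate` is a hypothetical helper introduced for illustration.

```python
# Observations and titles per panel, as reported in the text.
panels = {
    "2010/2011": (10698, 2317),
    "2011/2012": (33449, 6000),
    "2012/2013": (74206, 12370),
}
MONTHS_PER_PANEL = 12  # each panel runs from July to June


def fill_rate(n_obs, n_titles, n_periods=MONTHS_PER_PANEL):
    """Share of the theoretical balanced panel (N x T) actually observed."""
    return n_obs / (n_titles * n_periods)


rates = {period: fill_rate(n, titles) for period, (n, titles) in panels.items()}
for period, rate in rates.items():
    print(f"{period}: {rate:.1%} of the balanced panel is observed")
```

Under these figures, the share of observed title-months rises from roughly 38% in the first panel to roughly 50% in the third.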
311 See section III.4.
The period from the beginning of the third quarter of a given year to the
end of the second quarter of the next year offers significant advantages compared to the calendar year, because of peculiar market seasonality.
August and December are the typical high-season months for both digital and print books, because of more leisure time in summer and during
Christmas holidays.312
January, instead, seems to be a high-season month specific to the e-book market, possibly because of additional sales driven by e-book reader
gifts at the end of December.313
In order to further control for demand shifts, we regress the logarithm
of unit sales (lnqty) not only on the logarithm of price (lnprice), but also on a
seasonal dummy (high.season) set to one in August, December, and January,
and to zero otherwise.
Grouping the three high-season months in a single seasonal dummy
should guarantee higher variability through time within individuals in the unbalanced panels than three distinct seasonal dummies.
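The specification described in this paragraph can be written compactly as follows. The notation is an assumption introduced here for illustration: i indexes titles, t indexes months, and the title-specific term \alpha_i anticipates the individual effects tested in section V.3.

```latex
\ln q_{it} = \alpha_i + \beta \ln p_{it} + \gamma \, \mathit{high.season}_{t} + \varepsilon_{it},
\qquad
\beta = \frac{\partial \ln q_{it}}{\partial \ln p_{it}}
```

In this log-log form, \beta is the (constant) price elasticity of demand, while \gamma measures the shift in log unit sales during August, December, and January.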
V.2. CONCENTRATION ANALYSIS
In order to verify the presence of long tail effects in Stealth sales during
the period 2010-2013, we follow the approach described in Brynjolfsson et al.
(2011).314
Instead of comparing different distribution channels, we track the evolution of revenues and unit sales through time, and try to distinguish between
first-order and second-order long tail drivers315.
312 In accordance with STATIONERS' line of reasoning, see section III.4.
313 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
314 E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales, “Management Science”, 57, 8,
2011, pp. 1378-1379, bit.ly/196Bu39.
See also section III.2.
315 See section III.2.
First, we analyze the concentration of revenues and unit sales across
the whole sample of titles, including those with zero sales, in 2010/2011 (July
2010 – June 2011), 2011/2012 (July 2011 – June 2012), and 2012/2013 (July
2012 – June 2013).
Then, we repeat the analysis for a subsample of titles available and with
positive sales in all three periods, which should highlight original trends in
concentration, irrespective of subsequent modifications to the catalog.
The Lorenz curve is a graphical representation of the fraction of the
quantitative variable of interest (in our case, either revenues or unit sales),
depicted on the vertical axis, that accrues to cumulative fractions of the population (titles, in our case), arranged on the horizontal axis in increasing order of the quantitative variable.
In case of perfect equality, the Lorenz curve corresponds to the 45-degree line; the more unequal the distribution the larger the area between the
45-degree line of equality and the Lorenz curve.
Since the Lorenz curve is increasing and (weakly) convex, a given distribution is more unequal than a second distribution if it lies below (to the right
of) the second one.
Crossings of two Lorenz curves occur in ambiguous situations where “we
cannot go from one distribution to the other by a sequence of […] regressive
transfers.”316
The Gini coefficient is a measure of statistical dispersion, computed as
the sum of absolute differences in the quantitative variable of interest between all pairs of individuals, normalized by dividing by the square of the
number of individuals and by the mean of the quantitative variable.
316 D. RAY, Economic inequality in D. RAY, Development economics, Princeton University Press,
1998, p. 183, amzn.to/1a4AECQ.
The Gini coefficient corresponds to the ratio between the area enclosed by the 45-degree line of equality and the Lorenz curve and the area of the triangle below the 45-degree line of equality.
Its value lies between zero and one, and the higher the value the
higher the concentration of the distribution.
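The two equivalent definitions of the Gini coefficient given above can be checked on toy data. The sketch below is illustrative only (hypothetical function names, made-up sales figures); in the pairwise version each pair of titles is counted once, which is what makes the normalization by the squared number of titles and the mean sufficient.

```python
def lorenz_points(values):
    """Cumulative population share vs. cumulative share of the variable."""
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    points, running = [(0.0, 0.0)], 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value
        points.append((rank / n, running / total))
    return points


def gini_from_pairs(values):
    """Sum of absolute differences over all pairs, divided by n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    diff_sum = sum(abs(a - b)
                   for i, a in enumerate(values)
                   for b in values[i + 1:])  # each pair counted once
    return diff_sum / (n * n * mean)


def gini_from_lorenz(values):
    """Area between the 45-degree line and the Lorenz curve, over 1/2."""
    pts = lorenz_points(values)
    # Trapezoid rule for the area under the piecewise-linear Lorenz curve.
    area_under = sum((x1 - x0) * (y0 + y1) / 2
                     for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return (0.5 - area_under) / 0.5


# Toy unit sales for four titles: one best-seller and three titles with zero sales.
toy_sales = [0, 0, 0, 1]
g_pairs = gini_from_pairs(toy_sales)
g_area = gini_from_lorenz(toy_sales)
```

Titles with zero sales are kept in the computation, consistently with the whole-sample analysis described above.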
The Lorenz curves of the revenue distributions for the whole sample of
titles are displayed in Fig. 26.
Fig. 26: Lorenz curves of revenue distributions (TOT.SAMPLE)
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
The distribution of revenues across titles has grown more unequal over time,
especially in the last period (2012/2013), as confirmed also by the Gini coefficients reported in Tab. 15.
Tab. 15: Gini coefficients of revenue distributions (TOT.SAMPLE)
Period       Gini coefficient
2010/2011    0.7661
2011/2012    0.8029
2012/2013    0.8752
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
The Lorenz curves of the corresponding unit sales distributions, depicted in Fig. 27, follow a very similar pattern, with slightly higher Gini coefficients, reported in Tab. 16.
Fig. 27: Lorenz curves of unit sales distributions (TOT.SAMPLE)
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 16: Gini coefficients of unit sales distributions (TOT.SAMPLE)
Period       Gini coefficient
2010/2011    0.7774
2011/2012    0.8097
2012/2013    0.8923
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Concentration of revenues in fewer titles seems to be driven by concentration of unit sales, rather than differential evolution of prices through time.
Both “superstar” and “underdog” effects317 might have been at work in
the Stealth catalog during the period 2010-2013.
First, publishing houses have gradually started publishing new print releases simultaneously in e-book format, which has increased the availability
of best-selling titles on major e-book stores.318
Secondly, the launch of the Narcissus self-publishing platform319 and the
consequent addition of many niche titles to the catalog might represent a derivative effect of the long tail phenomenon,320 inflating the concentration
measures.
The same analysis conducted on the subsample of e-books with positive
sales in all three periods should control for modifications to the catalog.
The Lorenz curves in Fig. 28 and the Gini coefficients in Tab. 17 show
an initial reduction in the concentration of revenues from 2010/2011 to
2011/2012, consistent with the long tail hypothesis.
317 See section III.2.
318 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
319 See above in this chapter.
320 See section III.2.
Fig. 28: Lorenz curves of revenue distributions (SUB.SAMPLE)
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 17: Gini coefficients of revenue distributions (SUB.SAMPLE)
Period       Gini coefficient
2010/2011    0.7059
2011/2012    0.6697
2012/2013    0.7301
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
However, in the last period, the distribution returns to even higher concentration than in the first period.
The Lorenz curves in Fig. 29 and the Gini coefficients in Tab. 18 refer to
the corresponding unit sales distribution.
Fig. 29: Lorenz curves of unit sales distributions (SUB.SAMPLE)
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 18: Gini coefficients of unit sales distributions (SUB.SAMPLE)
Period       Gini coefficient
2010/2011    0.7168
2011/2012    0.7174
2012/2013    0.8503
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
From 2010/2011 to 2011/2012, there were no substantial changes in
the concentration of unit sales: the Lorenz curves cross each other and the
Gini coefficients are very close in value.
As a consequence, the equalization of revenues that took place from the
first to the second period seems to be driven either by price convergence
across titles or by a changing price composition of best-selling titles (relative success of cheaper e-books).
In the last period, instead, the growth of the concentration of unit sales
is substantial and probably drives the overall growth of the concentration of revenues.
A possible explanation for the 2012/2013 concentration increase in the
subsample unit sales is the adoption of e-books by a larger public of readers,
with more homogeneous “mass” preferences.
First-order drivers of the long tail phenomenon do not seem to be at
work in the Stealth catalog, which calls into question also the evidence of derivative effects.
Lorenz curves and Gini coefficients do not take into account the degree
of “social mobility” among titles, which might provide an interpretative
framework alternative to the long tail hypothesis.
Lower entry barriers for authors in digital publishing, with respect to
print publishing, could foster writing activity and flood the e-book market
with niche titles, even without real changes in reader tastes.
Furthermore, continuous “on-field” “natural” selection of new best-sellers directly by the digital audience itself could complement slower editorial
work by publishing houses321, and provide a powerful incentive to self-publish, in spite of equal, or even lower, success rates.
321 For a seemingly ad hoc example, see G. SBAFFO, Da Narcissus.me a Newton Compton. La
rivelazione Anna Premoli vince il premio Bancarella., Simplicissimus Book Farm, 07/22/2013,
bit.ly/1fBRgrC.
V.3. PANEL LINEAR REGRESSION RESULTS
The source of the three sales panels described in paragraph V.1.1 is
quite new for the literature reviewed in section III.4.
Unlike retailer-specific data, distributor-specific sales, of the kind we
analyze, do not suffer from differential price elasticity for price increases and
price decreases.322
Publisher-specific sales data also track the entire market for a given set
of titles, but, usually, a publisher's catalog is more limited in scope and less
heterogeneous than a distributor's catalog.
In the reference literature, even the papers studying a longitudinal
dataset resort to pooled OLS linear regression, without any test for the significance of individual effects.
Moreover, the large number of variables and time dummies often employed as controls might overfit the data and compromise the meaningfulness
of the model.
By contrast, the following panel linear regression analysis is based on
an easily interpretable model that relies only on the simple dataset adjustments and the minimal subject matter knowledge presented in paragraph
V.1.1.
The response variable is lnqty, the logarithm of unit sales, while the two
explanatory variables are lnprice, the logarithm of the retail price (in euro,
VAT included), and high.season, a dummy set to one for high-season
months.323
For each of the three panels (2010/2011, 2011/2012, and 2012/2013),
we report the F test statistic for the significance of individual effects324 and
322 See section III.4.
323 See paragraph V.1.1.
324 The null hypothesis is the non-significance of individual effects.
the Hausman test statistic, distributed as a chi-squared distribution, for the
comparison of the fixed and the random effects model325.
For all three panels, the F test statistics in Tab. 19, Tab. 20, and Tab. 21
clearly confirm the presence of individual effects and do not support the pooling of all the observations in OLS linear regression, which assumes the same
coefficients, intercepts included, for all the individuals.
Tab. 19: F test for individual effects (PANEL.1)
Test statistic    Degrees of freedom    p-value
7.7131            2316, 8379            < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 20: F test for individual effects (PANEL.2)
Test statistic    Degrees of freedom    p-value
9.2230            5999, 27447           < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 21: F test for individual effects (PANEL.3)
Test statistic    Degrees of freedom    p-value
18.7558           12369, 61834          < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
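The degrees of freedom reported in the three tables above follow directly from the panel dimensions given in paragraph V.1.1. The sketch below assumes the textbook form of the F test comparing the pooled and the within (fixed effects) regressions, with two slope coefficients; the function names are hypothetical and the residual sums of squares are not reproduced here because they are not reported in the text.

```python
def f_test_degrees_of_freedom(n_obs, n_titles, n_regressors=2):
    """Numerator and denominator degrees of freedom of the F test."""
    df1 = n_titles - 1                     # restrictions: equal intercepts
    df2 = n_obs - n_titles - n_regressors  # residual df of the within model
    return df1, df2


def f_statistic(rss_pooled, rss_within, df1, df2):
    """Textbook F statistic from the two residual sums of squares."""
    return ((rss_pooled - rss_within) / df1) / (rss_within / df2)


# Panel dimensions (observations, titles) as listed in paragraph V.1.1.
dims = {"PANEL.1": (10698, 2317),
        "PANEL.2": (33449, 6000),
        "PANEL.3": (74206, 12370)}
dfs = {name: f_test_degrees_of_freedom(n, titles)
       for name, (n, titles) in dims.items()}
```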
In the fixed effects model, “the intercept terms vary over the individual
units”326, that is, the constant term is interacted with a dummy variable for
each individual in the panel.
For more information, see Y. CROISSANT, G. MILLO, Panel Data Econometrics in R: The plm
Package, “Journal of Statistical Software”, 27, 2, 2008, pp. 21-22, bit.ly/1cmn1Fm.
325 The null hypothesis is the consistency of the random effects model.
For more information, see M. VERBEEK, A guide to modern econometrics, John Wiley &
Sons, 2004, pp. 351-352, amzn.to/17bknLN.
326 M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004, p. 345,
amzn.to/17bknLN.
It can be shown that the fixed effects estimator is equivalent to the OLS
estimator obtained from a time-demeaned model without constant terms
(within transformation).327
As a consequence, coefficient estimates are conditional upon the fixed
individual effects and capture the effects of time-variant explanatory variables within individuals.
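The within transformation can be illustrated on a tiny unbalanced panel. This is a minimal sketch with made-up data and hypothetical function names, not the estimation code behind the tables: each variable is demeaned by title, and the two slopes are then obtained by OLS without a constant, solving the 2x2 normal equations directly.

```python
from collections import defaultdict


def within_transform(rows):
    """rows: (title, y, x1, x2). Subtract each title's mean from its values."""
    groups = defaultdict(list)
    for title, *values in rows:
        groups[title].append(values)
    demeaned = []
    for observations in groups.values():
        means = [sum(col) / len(observations) for col in zip(*observations)]
        for values in observations:
            demeaned.append([v - m for v, m in zip(values, means)])
    return demeaned


def fixed_effects_slopes(rows):
    """OLS on the demeaned data, no constant term (two regressors)."""
    d = within_transform(rows)
    s11 = sum(r[1] * r[1] for r in d)
    s12 = sum(r[1] * r[2] for r in d)
    s22 = sum(r[2] * r[2] for r in d)
    s1y = sum(r[1] * r[0] for r in d)
    s2y = sum(r[2] * r[0] for r in d)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Toy panel: title "A" has three months, title "B" only two (unbalanced).
# lnqty is generated without noise as intercept_i - 0.5*lnprice + 0.3*high.season.
intercepts = {"A": 2.0, "B": 5.0}
design = [("A", 1.0, 0), ("A", 1.5, 1), ("A", 2.0, 0),
          ("B", 0.5, 1), ("B", 1.0, 0)]
rows = [(t, intercepts[t] - 0.5 * lp + 0.3 * hs, lp, hs) for t, lp, hs in design]
b_lnprice, b_season = fixed_effects_slopes(rows)
```

Because demeaning removes the title-specific intercepts, the slopes are recovered even though the two titles have very different sales levels.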
The random effects model, instead, assumes that individual effects are
random and i.i.d. over individuals, so the error term is composed of a time-invariant individual-specific component and a remainder component uncorrelated over time, with both components uncorrelated with the explanatory
variables.
The resulting GLS estimator combines “the information from the between and within dimensions [of the panel] in an efficient way,”328 and allows
for the estimation of a common intercept term and the inclusion of explanatory variables varying between individuals but time-invariant in the within dimension.
For all three panels, the Hausman test statistics in Tab. 22, Tab. 23, and
Tab. 24 reject the null hypothesis of no correlation between individual effects
and explanatory variables.
Tab. 22: Hausman test (PANEL.1)
Test statistic    Degrees of freedom    p-value
87.1593           2                     < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
327 Ibid., pp. 345-346.
328 Ibid., p. 350.
Tab. 23: Hausman test (PANEL.2)
Test statistic    Degrees of freedom    p-value
274.4334          2                     < 2.2E-16 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 24: Hausman test (PANEL.3)
Test statistic    Degrees of freedom    p-value
50.4353           2                     1.117E-11 ***
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
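The p-values of the Hausman tests can be verified by hand, because a chi-squared distribution with two degrees of freedom has the closed-form upper-tail probability exp(-x/2). The sketch below uses only this identity and the test statistics reported in the tables above; the function name is a hypothetical helper.

```python
import math


def chi2_upper_tail_2df(x):
    """P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


hausman_statistics = {"PANEL.1": 87.1593,
                      "PANEL.2": 274.4334,
                      "PANEL.3": 50.4353}
p_values = {name: chi2_upper_tail_2df(stat)
            for name, stat in hausman_statistics.items()}
for name, p in p_values.items():
    print(f"{name}: p-value = {p:.3e}")
```

For the third panel the result agrees with the reported 1.117E-11, while for the first two panels the p-values fall below the 2.2E-16 reporting threshold.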
Intuitively, the inconsistency of the random effects model is due to the
fact that each book is “‘one of a kind’, and cannot be viewed as a random
draw from some underlying population”329.
Tab. 25, Tab. 26, and Tab. 27 report coefficient estimates of the fixed effects model for the 2010/2011, 2011/2012, and 2012/2013 panels, respectively.
Tab. 25: Table of coefficients (PANEL.1)
Coefficient    Estimate    Std. error    t value     p-value
lnprice        -0.6124     0.0893        -6.8578     7.488E-12 ***
high.season     0.3922     0.0171        22.8804     < 2.2E-16 ***
R-squared: 0.0746; Adjusted R-squared: 0.0584
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
329 Ibid., p. 351.
Tab. 26: Table of coefficients (PANEL.2)
Coefficient    Estimate    Std. error    t value     p-value
lnprice        -0.0383     0.0112        -3.4227     0.0006 ***
high.season     0.0511     0.0091         5.5852     2.356E-08 ***
R-squared: 0.0020; Adjusted R-squared: 0.0016
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Tab. 27: Table of coefficients (PANEL.3)
Coefficient    Estimate    Std. error    t value     p-value
lnprice        -0.4607     0.0323        -14.2670    < 2.2E-16 ***
high.season     0.0956     0.0058         16.5630    < 2.2E-16 ***
R-squared: 0.0239; Adjusted R-squared: 0.0199
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
Standard errors and t tests are based on estimation of the covariance
matrix robust to heteroskedasticity and serial correlation, more precisely on
the Arellano (1987) variant, which relies on large N asymptotics with small
T.330
For all three panels, the coefficient of the high.season dummy is significant, positive (as expected) and remarkable in magnitude, especially for the
2010/2011 panel.
For the period 2010/2011, in August, December, and January, a given e-book sold, on average, almost 40% more units than its period average, vs.
around 5% in 2011/2012 and 10% in 2012/2013.
It is very likely that such a large coefficient estimate for the first period
is due to the launch of many other e-book stores and distribution platforms
over the end of 2010 and the beginning of 2011.331
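The percentages quoted above read the high.season coefficients directly as approximate relative changes, which is accurate for small coefficients. In a model with a logarithmic response, the exact relative effect of a dummy is exp(coefficient) - 1. The sketch below applies this standard conversion to the three reported estimates; it is an illustrative check, not part of the original analysis.

```python
import math

# high.season estimates from Tab. 25, Tab. 26, and Tab. 27.
high_season = {"2010/2011": 0.3922,
               "2011/2012": 0.0511,
               "2012/2013": 0.0956}


def exact_percent_effect(coefficient):
    """Exact percentage change in unit sales implied by a dummy coefficient."""
    return 100.0 * (math.exp(coefficient) - 1.0)


effects = {period: exact_percent_effect(b) for period, b in high_season.items()}
for period, pct in effects.items():
    print(f"{period}: log points = {100 * high_season[period]:.1f}, "
          f"exact effect = {pct:.1f}%")
```

The two readings nearly coincide for the second and third periods, whereas for 2010/2011 the exact conversion gives a noticeably larger effect (about 48%) than the log-point approximation.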
330 M. ARELLANO, Computing robust standard errors for within group estimators, “Oxford Bulletin of Economics and Statistics”, 49, 4, 1987, pp. 431-434, bit.ly/18I0cbb.
331 Correspondence and conversations with SIMPLICISSIMUS BOOK FARM management.
The R-squared of the regressions332 is very low, which is not new to the
literature:333 most of the within-title variation in sales cannot be attributed to
price changes.
However, the weak explanatory power of the model is, in a sense, more severe for the pooled OLS estimations in the previous literature than for our
fixed effects model: the former employ many additional variables with the aim of
controlling for variation in sales that is intentionally left to individual-specific
constant terms in the latter.
In each period, the 95% confidence interval for the lnprice coefficient,
shown in Fig. 30, does not include zero and lies below it, yet well above -1.00,
in the inelastic range.
332 In the case of the fixed effects model, the R-squared refers to the within variation.
333 See section III.4.
Fig. 30: Plot of 95% confidence intervals for lnprice (PANEL)
2010/2011: [-0.7874, -0.4373]
2011/2012: [-0.0602, -0.0163]
2012/2013: [-0.5239, -0.3974]
Source: own elaboration of Stealth private sales data (courtesy of SIMPLICISSIMUS BOOK FARM).
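The intervals shown in Fig. 30 can be reproduced, up to rounding, from the lnprice estimates and robust standard errors in Tab. 25, Tab. 26, and Tab. 27. The sketch below assumes the usual normal approximation with a critical value of 1.96; this is an assumption, since the text does not state how the intervals were computed.

```python
Z_95 = 1.96  # assumed normal critical value for a 95% interval

# (estimate, robust standard error) of lnprice in the three panels.
lnprice = {"2010/2011": (-0.6124, 0.0893),
           "2011/2012": (-0.0383, 0.0112),
           "2012/2013": (-0.4607, 0.0323)}


def confidence_interval(estimate, std_error, z=Z_95):
    """Symmetric interval: estimate plus or minus z standard errors."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


intervals = {period: confidence_interval(b, se)
             for period, (b, se) in lnprice.items()}
# A demand is inelastic when the whole interval lies between -1 and 0.
inelastic = {period: -1.0 < low and high < 0.0
             for period, (low, high) in intervals.items()}
```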
For a given e-book, on average a one percent price decrease from the
period average price translates into a unit sales increase lower than one percent of the period average unit sales.
The large difference from the price elasticities around -2.00 reported in
section III.4 is due to the improper use of the pooled OLS estimator in the literature.
Given the superstar nature of the publishing industry,334 it is not surprising that, statistically, from the mere point of view of copies sold, an author
would be better off by “milking” his (on average) very few readers.
However, analysis of high conditional quantiles (e.g. 80th percentile, a
common working definition of best-selling titles335), instead of the conditional
mean, or, alternatively, analysis at aggregate-sales level, instead of individual-title level, could produce completely opposite results, which is explored in
paragraph VI.1.2 and in section VI.2.
334 See section III.2.
335 See section III.2.
VI. DISCUSSION
The quality of data, the theoretical framework, the statistical techniques, and the robustness of the results jointly determine the significance
and the interpretation of an econometric study.
Although the present work is affected by a number of dataset limitations and technical issues, reported in section VI.1, it is its title-based approach that may represent the main problem of the research.
In fact, there are powerful economic incentives at work in digital markets for information goods, discussed in section VI.2, that may call for analysis at more aggregate level.
VI.1. LIMITATIONS OF THE STUDY
VI.1.1. Catalog analysis
As anticipated in section IV.2, the OLS linear regression model applied
to the catalog dataset clearly suffers from heteroskedasticity and may be biased by functional form misspecification and/or omitted explanatory variables.
Quantile linear regression results in section IV.3 contradict the assumption of a “one-size-fits-all” model and show substantial differences in the effects of the explanatory variables along the distribution of the response variable, but the quantile linear regression itself neither rests upon alternative
model formulations nor relies on testable distributional assumptions.
So, inference on the linear model proposed for the catalog dataset described in section IV.1 is doubtful; nonetheless, both the OLS and the quantile
linear regression model retain their, in a sense, optimal descriptive properties.
Given the uniqueness and size of the sample, multidimensional descriptive evidence is of some interest in itself, as a more solid basis for speculation
about the market than more or less educated guesses.
If one is instead interested in valid inference, a multiplicative model
should reduce heteroskedasticity and functional misspecification problems,
and thus prove more appropriate in the case at hand.
In fact, logarithmic transformation of the right-skewed response variable should reduce the skewness of the distribution.
Furthermore, logarithmic transformation of the non-dummy explanatory
quantitative variables allows for the “interpretation of predictor variable effects in relative terms”336 (elasticities).
The graphical views of the quantile linear regression coefficients suggest that such a model could perform quite well for e-books in the interquartile price range.
In effect, from the 0.25th to the 0.75th quantiles, most variables, and paper.price in particular, show quasi-linear trends in coefficients.
However, if one is “interested in the covariate effect on the response
variable in absolute terms”337, raw-scale analysis is preferable to logarithmic
transformation.
Because of non-linearity of the logarithmic transformation, the conditional-mean e-book price is not the exponential function of the conditional
mean of log e-book price.
Moreover, even though the conditional median is equivariant to monotonic transformations, “the retransformation of [coefficient] estimates is
more complicated because of the nonlinearity of the log transformation.”338
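The retransformation issue can be seen on a handful of numbers. The prices below are made up for illustration: exponentiating the mean of log prices gives the geometric mean, which falls short of the arithmetic mean, whereas the median commutes with the logarithm because the transformation is monotonic.

```python
import math
import statistics

# Hypothetical right-skewed e-book prices in euro (illustrative only).
prices = [0.99, 2.99, 4.99, 9.99, 24.99]
log_prices = [math.log(p) for p in prices]

mean_price = statistics.mean(prices)
exp_mean_log = math.exp(statistics.mean(log_prices))      # geometric mean

median_price = statistics.median(prices)
exp_median_log = math.exp(statistics.median(log_prices))  # same as the median

print(f"mean price: {mean_price:.2f}, exp(mean of logs): {exp_mean_log:.2f}")
print(f"median price: {median_price:.2f}, exp(median of logs): {exp_median_log:.2f}")
```

The same asymmetry carries over to conditional means and conditional medians, which is why predictions from a log-scale mean regression cannot simply be exponentiated.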
336 L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007, p. 77, amzn.to/1aeZCAO.
337 Ibid., p. 81.
338 Ibid.
Finally, the constant effect of the subject, as captured by the subj.fiction
dummy, may be judged restrictive: given the roughly equal distribution of the
sample between fiction and non-fiction e-books, interactions of the dummy
with the other explanatory variables could have been explored.
VI.1.2. Sales analysis
The concentration analysis of the sales dataset in section V.2 is technically simple, and its purely descriptive results are straightforward enough in
terms of interpretation.
The same does not apply to the panel linear regression analysis in section V.3.
Notwithstanding dataset adjustments and the addition of a seasonal
dummy to control for shifts of the demand curve, variation in e-book sales remains largely unexplained.
According to the literature, the inclusion of other explanatory variables,
derived from catalog information, and/or of ad hoc time dummies, does not
seem to improve model performance.339
The demand for books is affected by so many title- and market-specific
factors that subject matter omniscience would be required to discern price
effects.340
At the time of writing, the data source that best approximates such an
objective and all-knowing nature is Google Trends by GOOGLE, widely used by
journalists, businesses, and researchers to explore trends in Internet search
volumes through time.341
339 See section III.4.
340 It was only by sheer luck that the high.season dummy was able to capture the effect of the
media exposure due to the take off of the e-book market in terms of product availability.
For more information, see section VI.1.2.
341 Y. MATIAS, Insights into what the world is searching for: the new Google Trends, “Inside
Search”, 09/27/2012, bit.ly/16T8VnI.
Smith et al. (2012) employ the number of Bing search results for the
ISBN as an explanatory variable in regression analysis of a cross-sectional
dataset of e-book sales.342
Similarly, it could be worth trying to add Google Trends results for a
combination of author and title in a panel dataset of e-book sales.
Fig. 31: A Google Trends query by author
The highest peak corresponds to the base period (100).
Capital letters link to relevant major news headlines (e.g. C links to news about Umberto Eco's
80th birthday.)
Source: own elaboration of Google Trends public data.
342 M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, pp. 14-16, available at SSRN: bit.ly/WcNhJq.
For more information, see section III.4.
Fig. 32: A Google Trends query by title
The highest peak corresponds to the base period (100).
Capital letters link to relevant major news headlines (e.g. C links to news about the publication
of a new revised edition of Il nome della rosa.)
Source: own elaboration of Google Trends public data.
The possibility to allocate title-specific effects, and, thus, the increased
total explanatory power of the model, may allow for the estimation of a random effects model343 or, even more interestingly, of a correlated random effects quantile regression model.
Monopolistic competition theory suggests that “books for which cost-covering prices allow large sales will have demand curves that are more elastic than books with more limited audiences.”344
343 See section V.3.
344 G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance and the
Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4, 1992, p. 595,
bit.ly/17dxX0b.
Estimation of high conditional sales quantiles (e.g. 80th percentile, a
common working definition of best-selling titles345) might reveal significantly
higher price elasticity than conditional-mean estimation.
Few methods for panel quantile linear regression exist in the literature so far, but correlated random effects models have a number of desirable properties.346
A subset of the time-variant explanatory variables (in our case, variables derived from Google Trends data), correlated with “time-invariant unobserved characteristics”347 that affect the response variable, is required to
correct for “the unexplained ranking mechanism in the model, but at the
same time control for endogenous effects”348.
Alternatively, a simpler analysis at aggregate sales level of the effect on
unit sales of changes in the average price weighted by unit sales should average out individual-specific effects and prove more relevant at the general market
level.
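The aggregate series suggested above could be built along the following lines. This is only a sketch with made-up figures and hypothetical names; it shows how a unit-sales-weighted average price and total unit sales would be derived for each month from title-level records.

```python
def weighted_average_price(records):
    """records: (price, units) per title; average price weighted by units."""
    total_units = sum(units for _, units in records)
    return sum(price * units for price, units in records) / total_units


def aggregate_by_month(sales):
    """sales: {month: [(price, units), ...]} -> {month: (avg price, units)}."""
    return {month: (weighted_average_price(records),
                    sum(units for _, units in records))
            for month, records in sales.items()}


# Hypothetical title-level records for two months (illustrative only).
toy_sales = {
    "2012-07": [(1.99, 30), (9.99, 10)],
    "2012-08": [(1.99, 50), (9.99, 10)],
}
monthly = aggregate_by_month(toy_sales)
```

In the toy data the average price falls from one month to the next only because cheaper titles gain weight, which is the kind of composition effect that an aggregate analysis would fold into the price variable.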
VI.2. POSSIBLE FURTHER DEVELOPMENTS
Fan et al. (2011) derive equilibrium conditions for a model of isoelastic
consumer demand and duopolistic competition for a digital device with tied
digital content.349
The setup of the model reflects real-world competition in the market for
e-book readers fairly well, since the incumbent position of Kindle is periodically threatened by direct competition from a strong competitor, leveraging
the same tying strategy pursued by AMAZON.350,351
For more information, see section III.4.
345 See section III.2.
346 S. H. BACHE, C. M. DAHL, J. T. KRISTENSEN, Headlights on tobacco road to low birthweight outcomes: evidence from a battery of quantile regression estimators and heterogeneous panel, “Center for Research in Econometric Analysis of Time Series Research Papers”,
2008-20, 2011, pp. 4-6, bit.ly/1ghltfQ.
347 Ibid., p. 9.
348 Ibid.
349 M. FAN, Y. HU, A. YU, Pricing strategies for tied digital contents and devices, “Decision Support Systems”, 51, 3, 2011, p. 405, bit.ly/16sHzsj.
Digital media products are information goods, that is, goods with zero,
or very low, marginal cost of production and large fixed development costs.
An important difference between a tying strategy involving physical
products and one involving information goods is that, typically, physical products are consumed in fixed proportions, whereas demand for information
goods is endogenously determined.352
In the model, the price of the digital content is given and the two firms
determine the price of their digital device in a simultaneous game.353
If the profit margin of the digital content is higher than a certain
threshold, inversely proportional to the demand elasticity of the digital content, the price of the digital device and of the digital content move in the
same direction.354
Thus, a lower digital content price increases profits through high demand elasticity and allows for a more competitive digital device price.
High demand elasticity is not the only possible transmission mechanism
to increase digital content profits and sell the digital device at lower prices.
In the model, since the price of the digital content is exogenous, superior quality of the digital device makes it possible to fetch a higher price for the digital
device itself, but does not affect the digital content demand.
350 P. SMITH, The war between Nook & Kindle is over and Amazon has carried the day, “IT world”, 07/10/2013, bit.ly/19pynkV.
351 P. ST. JOHN MACKINTOSH, E-reader price war breaks out in UK as Kobo chases Nook down, “TeleRead”, 06/17/2013, bit.ly/GAAbkv.
352 M. FAN, Y. HU, A. YU, Pricing strategies for tied digital contents and devices, “Decision Support Systems”, 51, 3, 2011, p. 406, bit.ly/16sHzsj.
353 Ibid., p. 407.
354 Ibid.
However, product differentiation of the digital device does affect the perceived quality of the digital content and makes it possible either to fetch a higher price
for the digital content or even to totally displace competition in the market.
Digital content profits can also be increased by negotiating lower licensing fees with publishers and lower delivery costs with mobile operators.
AMAZON has exploited all the strategies outlined above, promptly imitated by competitors.
• AMAZON tried, unsuccessfully,355 to lower e-book prices by setting a
ceiling at $9.99 per e-book in the Kindle marketplace.
• The company invested in hardware and software improvements of the
Kindle device, in particular with regard to the paper-like rendering of
the display.356
• Kindle Direct Publishing, the AMAZON self-publishing platform, formerly known as Digital Text Platform, was launched concurrently with
the Kindle device to bypass, at least in part, publisher fees.
A possibly more disruptive competitive strategy, not yet exploited in the
e-book market but well-established in television broadcasting and recently
gaining traction in the music industry, is the bundling of digital media products.
Bakos and Brynjolfsson (1999) study optimal bundling strategies for a
multiproduct monopolist selling digital information goods, as opposed to conventional price discrimination and micropayments,357 and Bakos and Brynjolfsson (2000) extend the model to a competitive scenario.358
355 See section II.1.
356 C. M. TEICHER, Lots of little improvements make the Kindle 3 the best e-ink e-reader, “Publishers Weekly”, 08/26/2010, bit.ly/1cd3YNB.
357 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency,
“Management Science”, 45, 12, 1999, p. 1613, bit.ly/1fb2LYm.
358 Y. BAKOS, E. BRYNJOLFSSON, Bundling and competition on the Internet, “Marketing Science”, 19, 1, 2000, p. 63, bit.ly/GCXl9R.
Usually, economic models become more complicated as the number of goods
analyzed grows; in the case at hand, instead, the results rely on the law of large
numbers to derive asymptotic optimality conditions.
Under fairly general assumptions, the very low marginal costs of production and distribution of digital information goods make it possible to profitably sell them in large bundles, characterized by consumer valuations with
lower variance per good than individual goods.359
Given i.i.d. consumer valuations and any demand function with finite
mean and variance, the larger the bundle, the smaller the weight of idiosyncratic
factors in demand.
The number of “moderate” consumers with “quasi-average” valuations
increases, and demand becomes more elastic near the mean and less elastic
away from it.360
Fig. 33: Demand for bundles of information goods
Linear demand case with i.i.d. valuations uniformly distributed in [0,1].
Source: Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, p. 1617, fig. 1, bit.ly/1fb2LYm.
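The flattening of the demand curve around the mean valuation can be reproduced with a small simulation. This is only a sketch under the same assumption stated in the caption of Fig. 33 (i.i.d. valuations uniformly distributed in [0,1]); the function names, the number of simulated consumers, the bundle sizes and the price points are hypothetical choices, not taken from Bakos and Brynjolfsson (1999).

```python
import random


def per_good_valuations(bundle_size, consumers=2000, seed=0):
    """Average valuation per good of each simulated consumer for a bundle.

    Each consumer draws one U[0,1] valuation per good in the bundle.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(bundle_size)) / bundle_size
        for _ in range(consumers)
    ]


def demand_share(valuations, price_per_good):
    """Fraction of consumers whose per-good valuation is at least the price."""
    return sum(v >= price_per_good for v in valuations) / len(valuations)


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    # Illustrative bundle sizes and per-good prices (assumed values).
    for bundle_size in (1, 2, 20):
        vals = per_good_valuations(bundle_size)
        shares = [round(demand_share(vals, p), 2) for p in (0.3, 0.5, 0.7)]
        print(f"bundle of {bundle_size}: variance per good = "
              f"{variance(vals):.4f}, demand at 0.3/0.5/0.7 = {shares}")
```

As the bundle grows, the variance of the per-good valuation shrinks roughly in proportion to the inverse of the bundle size, so almost every simulated consumer buys at a per-good price slightly below the mean (0.5) and almost none buys at a price slightly above it.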
359 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency,
“Management Science”, 45, 12, 1999, p. 1614, bit.ly/1fb2LYm.
360 Ibid., pp. 1616-1617.
[…] this “predictive value of bundling” makes it possible to achieve greater
sales, greater economic efficiency, and greater profits per good from a bundle of information goods than can be attained when the same goods are sold separately.361
The model can be extended to include complementarity and substitution
effects, budget constraints, diminishing returns, negative valuations, and
cognitive costs.
Typically, the main difference from the basic model is that a mixed-bundling strategy becomes more efficient than a pure-bundling strategy,
where the number of information goods included in the bundle tends to infinity.
These “economies of aggregation”362 (large aggregators are more profitable than small aggregators) are peculiar to digital markets for information
goods and must be distinguished from network economies, economies of
scale, and economies of scope in production, distribution, or consumption.
If significant, economies of aggregation have a number of game-changing competitive implications for the entire market.
• Downstream competition among different goods is replaced by upstream competition for new content: the largest bundler has a higher
marginal profitability, so it is able to outbid single- and multiple-good
non-bundlers, and smaller bundlers.363
• The pricing policy of the bundler is inherently “aggressive”: the optimal price also happens to deter competition, without the need to resort to any short-term “artificial” strategy.364
361 Ibid., p. 1613.
362 Y. BAKOS, E. BRYNJOLFSSON, Bundling and competition on the Internet, “Marketing Science”, 19, 1, 2000, p. 63, bit.ly/GCXl9R.
363 Ibid., pp. 68-70.
364 Ibid., pp. 75-76.
• An entrant with higher fixed or marginal costs, or with products of
lower quality, but using the bundling strategy, may be able to dislodge
the non-bundling incumbent.365
• Innovative activity shifts from standalone firms to bundlers, since the
former will be reluctant to invest in research and development if a
competing product can be incorporated into a larger bundle by the latter.366
The i.i.d. assumption is not likely to be empirically relevant for the publishing
market, because it would imply that “consumers differ significantly in
their tastes across goods, but not in their total expenditure on the entire set
of goods.”367
In fact, segmentation of readers by language, subject and reading
habits is feasible, so mixed-bundling strategies might be expected to emerge
in the foreseeable future.
365 Ibid., pp. 77-78.
366 Ibid., p. 78.
367 Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency,
“Management Science”, 45, 12, 1999, pp. 1616-1617, note 8, bit.ly/1fb2LYm.
APPENDIX
SHORT GLOSSARY OF THE E-BOOK
3G
Wireless technology that allows devices to connect to the Internet
through the cellular data network, i.e. virtually everywhere.
ADOBE
Well-known US software company (Photoshop, Acrobat Reader, etc.).
ADOBE DRM
DRM system, adopted by the majority of publishers, limiting the use
of e-books to a fixed maximum number of reading devices registered to the
same owner.
AZW
Kindle proprietary file format, not readable on other devices.
DRM
Acronym for Digital Rights Management, a combination of technologies
allowing publishers to protect digital content from copying.
E-book
Book in electronic file format.
E-book reader (e-reader)
Device devoted to e-book reading.
E Ink®
Electronic ink, technology for the production of displays with paper-like
rendering, used by e-book readers.
EPUB
The most widespread file format for e-book creation, supported by all e-book readers except Kindle, which uses its own proprietary file format.
LCD
Liquid Crystal Display, used by desktop and tablet computers, characterized by backlight and high image quality.
PDF
Acronym for Portable Document Format, file format introduced by
ADOBE in 1993 to enable the representation of documents independently of
the availability of the creation software; the file format is supported by most
e-book readers, but is not optimal for these devices.
Social DRM
“Loose” DRM mechanism that allows for the sharing of purchased e-books, watermarked with information about the customer.
Tablet
Portable device capable of performing most of the typical PC functions
(e.g. APPLE iPad).
Wi-Fi
Wireless technology that allows devices to connect to the Internet
through a home network.
SHORT CHRONOLOGY OF THE E-BOOK
1971
Michael S. Hart launches Project Gutenberg: 1971 is considered the
year of birth of the e-book.
1987
Afternoon: a story by Michael Joyce, the first hypertextual novel, is published on floppy disks by EASTGATE SYSTEMS.
1993
DIGITAL BOOK offers floppy disks with 50 e-books in DBF file format.
Brad Templeton publishes on CD-ROM an anthology of candidates for
the Hugo Award for Best Novel.
1994
Progetto Manuzio, the first digital library in the Italian language, is
launched.
1995
AMAZON starts selling print books online.
1996
Project Gutenberg catalog reaches 1,000 titles.
1998
Kim Blagg obtains the first ISBN code for an e-book and starts selling it
on amazon.com, bn.com, and borders.com.
Rocket eBook and SoftBook, the first e-book readers, are launched.
1999
E-book stores, like eReader.com and eReads.com, begin to proliferate
online.
2000
Stephen King makes his book Riding the Bullet available in digital format.
2002
RANDOM HOUSE and HARPERCOLLINS begin to sell English-language e-books.
2005
AMAZON acquires MOBIPOCKET.
2006
SONY launches Sony Reader, an e-book reader based on E Ink® technology.
2007
AMAZON launches Kindle in the United States.
2008
SONY partners with ADOBE to support DRM-encrypted e-books on its devices.
SONY launches Sony Reader PRS-505 in UK and France.
BOOKSONBOARD starts selling e-books for iPhone.
2009
AMAZON launches Kindle 2 and Kindle DX.
The integration between the Amazon.com store and the Kindle device
allows AMAZON to account for 60% of US e-book sales by the end of 2009.
BARNES & NOBLE launches Nook in the United States.
Bookboon.com reaches 10 million downloads of free e-books in one
year.
2010
In May, at the Torino Book Fair, major Italian publishers announce the
release of e-books for the following autumn.
APPLE launches iPad, a multipurpose device including e-book reading
functions.
APPLE launches iBookstore to compete with AMAZON and BARNES &
NOBLE.
GOOGLE announces a new service for the online sale of e-books (Google
Editions).
2011
E-book readers with 3G connectivity appear on the Italian market.
BIBLIOGRAPHY
P. D. ALLISON, Missing data in A. MAYDEU-OLIVARES, R. E. MILLSAP, The SAGE
handbook of quantitative methods in psychology, SAGE Publications, 2009,
amzn.to/15UAtJe.
Amazon.com now selling more Kindle books than print books, “Amazon Media Room:
Press Releases”, 05/19/2011, bit.ly/119B0K2.
G. ANDERSON, GNU/Linux command-line tools summary, The Linux Documentation
Project, 04/15/2006, bit.ly/1fTbAaG.
Antitrust: Commission opens formal proceedings to investigate sales of e-books, European Commission, 12/06/2011, bit.ly/ZBWTjW.
M. ARELLANO, Computing robust standard errors for within group estimators, “Oxford Bulletin of Economics and Statistics”, 49, 4, 1987, bit.ly/18I0cbb.
ASCII, Wikipedia, accessed on 09/07/2013, bit.ly/WBUEXv.
S. H. BACHE, C. M. DAHL, J. T. KRISTENSEN, Headlights on tobacco road to low
birthweight outcomes: evidence from a battery of quantile regression estimators and
heterogeneous panel, “Center for Research in Econometric Analysis of Time Series
Research Papers”, 2008-20, 2011, bit.ly/1ghltfQ.
Y. BAKOS, E. BRYNJOLFSSON, Bundling and competition on the Internet, “Marketing
Science”, 19, 1, 2000, bit.ly/GCXl9R.
Y. BAKOS, E. BRYNJOLFSSON, Bundling information goods: pricing, profits, and efficiency, “Management Science”, 45, 12, 1999, bit.ly/1fb2LYm.
L. BALLAND-POIRIER, J. HOLLIS WEBER, H. RUSSMAN, Math guide: using the equation editor, The Document Foundation, 07/03/2013, bit.ly/17auTmw.
M. BERI, Espressioni regolari, Apogeo, 2007, amzn.to/1b0tGRH.
G. BITTLINGMAYER, The elasticity of demand for books, resale price maintenance
and the Lerner index, “Journal of Institutional and Theoretical Economics”, 148, 4,
1992, bit.ly/17dxX0b.
B. BLAZEJEWSKI, EPUB: new open standard in e-publishing, Université de Fribourg
Suisse, 2011, bit.ly/XKt7Cp.
F. BRIVIO, L'umanista informatico, Apogeo, 2010, amzn.to/1fb3CZd.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Consumer surplus in the digital economy:
estimating the value of increased product variety at online booksellers, “Management
Science”, 49, 11, 2003, bit.ly/18I1YJp.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, From niches to riches: the anatomy of the
long tail, “Heinz Research Papers”, 51, 06/01/2006, bit.ly/GHk7y3.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, Goodbye Pareto principle, hello long tail:
the effect of search costs on the concentration of product sales, “Management Science”, 57, 8, 2011, bit.ly/196Bu39.
E. BRYNJOLFSSON, Y. J. HU, M. D. SMITH, The longer tail: the changing shape of
Amazon sales distribution curve, 09/20/2010, available at SSRN: bit.ly/15QfXyK.
C. CAIN MILLER, E-books top hardcovers at Amazon, “The New York Times”,
07/19/2010, nyti.ms/118yEuR.
M. CANDUCCI, XML, Apogeo, 2005, amzn.to/1fb4UDn.
C. J. S. CHEN, D. KAPLAN, Bayesian propensity score analysis: simulation and case
study, Society for Research on Educational Effectiveness, 2011, bit.ly/16N541w.
J. CHEVALIER, A. GOOLSBEE, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, “Quantitative Marketing and Economics”, 1, 2,
2003, bit.ly/1b24xGp.
M. COKER, How data-driven decisions *might* help indie ebook authors reach more
readers, “RT Booklovers Convention”, 04/25/2012, slidesha.re/Xhccuo.
Consumer survey on ebooks, Open eBook Forum, 2003, available at im+m:
bit.ly/15Olump.
R. CORRIGAL, The essential wGet-GUIde, wGetGUI, 10/17/2007, bit.ly/GFSCV7.
Cos'è quintadicopertina, quintadicopertina, accessed on 09/07/2013, bit.ly/1eeukj9.
Y. CROISSANT, G. MILLO, Panel Data Econometrics in R: The plm Package, “Journal
of Statistical Software”, 27, 2, 2008, bit.ly/1cmn1Fm.
C. DE FRANCESCO, G. DELLI ZOTTI, Tesi (e tesine) con PC e Web, Franco Angeli,
2004, amzn.to/GGRYqy.
B. DELEERSNYDER, I. GEYSKENS, K. GIELENS, M. G. DEKIMPE, How cannibalistic
is the Internet channel? A study of the newspaper industry in the United Kingdom and
the Netherlands, “International Journal of Research in Marketing”, 19, 4, 2002,
bit.ly/1bP5umm.
Digital Rights Management, Wikipedia, accessed on 09/07/2013, bit.ly/WzMF04.
U. ECO, Come si fa una tesi di laurea, Bompiani, 1971, amzn.to/1980EQV.
A. ELBERSE, F. OBERHOLZER-GEE, Superstars and underdogs: an examination of
the long tail phenomenon in video sales, “Harvard Business School Working Paper Series”, 07-015, 09/05/2006, bit.ly/19uLOzX.
M. FAN, Y. HU, A. YU, Pricing strategies for tied digital contents and devices, “Decision Support Systems”, 51, 3, 2011, bit.ly/16sHzsj.
G. FUSINA, Gli abbonamenti ai quotidiani digitali, dataninja.it, 09/18/2013,
bit.ly/1fgfbwG.
M. GARDENER, Beginning R: the statistical programming language, John Wiley &
Sons, 2012, amzn.to/15VwDzI.
M. GARRELS, Bash guide for beginners, The Linux Documentation Project,
12/27/2008, bit.ly/1fUcnIl.
German resale price maintenance act, Börsenverein des Deutschen Buchhandels,
07/14/2006, bit.ly/17bdJF6.
A. GHOSE, B. GU, Search costs, demand structure and long tail in electronic markets:
theory and evidence, “NET Institute Working Papers”, 06-19, 2006, bit.ly/1aeZtxj.
A. GHOSE, M. D. SMITH, R. TELANG, Internet exchanges for used books: an empirical analysis of product cannibalization and welfare impact, “Information Systems Research”, 17, 1, 2006, bit.ly/1983mWp.
K. F. HALLOCK, R. KOENKER, Quantile regression, “Journal of Economic Perspectives”, 15, 4, 2001, bit.ly/15gERVI.
L. HAO, D. Q. NAIMAN, Quantile regression, SAGE Publications, 2007,
amzn.to/1aeZCAO.
L. HAZARD OWEN, Thanks to e-books, flat revenue is no problem for publishers,
“TIME”, 03/30/2012, ti.me/UFbwiq.
Y. J. HU, M. D. SMITH, The impact of ebook distribution on print sales: analysis of a
natural experiment, 08/29/2011, available at SSRN: bit.ly/Y2G650.
International Standard Book Number, Wikipedia, accessed on 09/07/2013,
bit.ly/16kdpGT.
Y. KATZNELSON, A (brief) introduction to inferential statistics, UCSC, 2006,
bit.ly/1af1h9v.
Kindle Format 8 Overview, Amazon.com, accessed on 09/07/2013, amzn.to/Vx6NlD.
D. K. KIRKPATRICK, Online sales of used books draw protest, “The New York Times”,
04/10/2002, nyti.ms/Y7mbUm.
R. KOENKER, Quantile regression for longitudinal data, “Journal of Multivariate Analysis”, 91, 1, 2004, bit.ly/17MmzwK.
R. KOENKER, Quantile regression in R: a vignette, The Comprehensive R Archive Network, 09/04/2012, bit.ly/1e3qExX.
La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013,
bit.ly/13RcbPE.
S. J. LIEBOWITZ, S. MARGOLIS, Seventeen famous economists weigh in on copyright:
the role of theory, empirics, and network effects, “Harvard Journal of Law & Technology”, 18, 2, 2005, bit.ly/GHOhQB.
Long tail, Wikipedia, accessed on 09/07/2013, bit.ly/ZX5aLb.
S. LOWE, Do ebooks cannibalize print sales?, “Publishing Bits”, 07/28/2009,
bit.ly/YymFj6.
D. LUBIAN, Appunti di teoria delle distribuzioni limite, Università degli Studi di
Verona, 03/13/1999, bit.ly/19uP3rb.
R. J. LUCCHETTI, Elementi di econometria, Università Politecnica delle Marche,
12/21/2011, bit.ly/1fcdTo1.
Y. MATIAS, Insights into what the world is searching for: the new Google Trends, “Inside Search”, 09/27/2012, bit.ly/16T8VnI.
K. B. MONROE, Buyers' subjective perceptions of price, “Journal of Marketing Research”, 10, 1, 1973, bit.ly/17MoGQW.
Nota metodologica in La produzione e la lettura di libri in Italia: anni 2011 e 2012, Istat, 05/16/2013, bit.ly/13RcbPE.
Online copyright infringement tracker benchmark study Q3 2012, Ofcom, 11/20/2012,
bit.ly/Vw6IfS.
G. PALOMBA, Elementi di statistica per l'econometria, Clua Edizioni Ancona, 2010,
bit.ly/1af2U7d.
G. PALOMBA, Modelli a variabili dipendenti qualitative, Università Politecnica delle
Marche, 2008, bit.ly/19uPBgS.
G. PALOMBA, Panel data, Università Politecnica delle Marche, 2008, bit.ly/17bgJBA.
H. M. PARK, Regression models for binary dependent variables using Stata, SAS, R,
LIMDEP and SPSS, Indiana University, 2009, bit.ly/15gIxGY.
A. PLANT, The economic aspects of copyright in books, “Economica”, 1, 2, 1934,
bit.ly/1giWKb4.
Portable Document Format, Wikipedia, accessed on 09/07/2013, bit.ly/WRAkm3.
D. RAY, Economic inequality in D. RAY, Development economics, Princeton University
Press, 1998, amzn.to/1a4AECQ.
Report by Kantar Media in Online copyright infringement tracker benchmark study
Q3 2012, Ofcom, 11/20/2012, bit.ly/Vw6IfS.
M. RICH, Math of publishing meets the e-book, “The New York Times”, 02/28/2010,
nyti.ms/YUlmO4.
M. RICH, Steal this book (for $9.99), “The New York Times”, 05/16/2009,
nyti.ms/11MVb0z.
M. RICH, B. STONE, E-book price increase may stir readers' passions, “The New York
Times”, 12/06/2010, nyti.ms/XF4v2l.
G. SBAFFO, Da Narcissus.me a Newton Compton. La rivelazione Anna Premoli vince il
premio Bancarella., Simplicissimus Book Farm, 07/22/2013, bit.ly/1fBRgrC.
E. SCHNITTMAN, Ebooks don't cannibalize print, people do, “Black Plastic Glasses”,
09/27/2010, bit.ly/11Nk78l.
B. SCHRAMPFER AZAR, Understanding and using English grammar, Pearson Longman, 2002, amzn.to/198c6tS.
M. D. SMITH, R. TELANG, Y. ZHANG, Analysis of the potential market for out-of-print
ebooks, 08/04/2012, available at SSRN: bit.ly/WcNhJq.
P. SMITH, The war between Nook & Kindle is over and Amazon has carried the day,
“IT world”, 07/10/2013, bit.ly/19pynkV.
O. SOLON, J.K. Rowling's Pottermore details revealed: Harry Potter e-books and more,
“Wired”, 06/23/2011, bit.ly/WLzGY4.
SQL, Dipartimento del Tesoro, 10/11/2008, bit.ly/1ckfA1t.
P. ST. JOHN MACKINTOSH, E-reader price war breaks out in UK as Kobo chases Nook
down, “TeleRead”, 06/17/2013, bit.ly/GAAbkv.
C. M. TEICHER, Lots of little improvements make the Kindle 3 the best e-ink e-reader,
“Publishers Weekly”, 08/26/2010, bit.ly/1cd3YNB.
US sues Apple and publishers over e-book prices, BBC, 04/11/2012, bbc.in/13fslFO.
M. VERBEEK, A guide to modern econometrics, John Wiley & Sons, 2004,
amzn.to/17bknLN.
Web scraping, Wikipedia, accessed on 09/07/2013, bit.ly/1e1Oo8h.
T. J. WEBSTER, Managerial economics: theory and practice, Elsevier, 2003,
amzn.to/17MAmn0.