Internet piracy and book sales. A field experiment. 
Wojciech Hardy 
Michał Krawczyk 
EALE '14
Joanna Tyrowicz 
Group for Research in APplied Economics
Summary of the results 
Book industry representatives’ concern about internet piracy might be exaggerated. 
(At least) for now. 
•Many studies on the film and music industries.
•Mixed results depending on:
–The methodology.
–The analyzed good.
–The analyzed period.
•A few studies on the book industry, but:
–Small-scale (almost case studies).
–Specific genres of books.
–Not directly on piracy.
Design 
Recruitment 
Book Data 
Literature 
Matching 
Treatment 
A year passes...
Main problems: 
Seasonal effects 
Causality 
Omitted variables
Perfect tool? Experiments!
In general 
•The sample should be: 
–Large (for statistical inference). 
–Varied (for representativeness). 
–Long (to control for seasonal effects). 
•The experimental methodology would deal with:
–Reverse causality (via a reference group).
–Omitted variables (via randomized assignment of treatment).
In detail
•Acquire book data.
•Match the books into pairs (groups of two or more books that are as similar as possible).
•Protect one randomly chosen book in each pair.
•Keep the protection in place for a year.
•Compare the sales outcomes of protected and unprotected books.
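The comparison step can be illustrated with a minimal Python sketch. It assumes matched pairs only and uses a paired t statistic, which is one simple choice and not necessarily the test used in the study; the function name `paired_t` and the input layout are hypothetical.

```python
import math
import statistics


def paired_t(pairs):
    """Compare matched titles within pairs (hypothetical helper).

    `pairs` holds (protected_sales, control_sales) for each matched pair.
    Returns the mean within-pair difference and its t statistic.
    """
    diffs = [protected - control for protected, control in pairs]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (needs at least two pairs).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("all within-pair differences are identical")
    return mean_diff, mean_diff / se
```

A mean difference close to zero with a small |t| would be consistent with protection having no effect on sales.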
Contribution? 
New industry 
New methodology 
New conclusions
•Around 70 Polish publishers contacted. 
•11 accepted. 
•9 provided all of the data.
The publishers' specialities varied:
•Law-focused, fantasy-focused, mixed, etc.
•Foreign versus national.
Both medium-sized and large publishers, with some less popular titles and some bestsellers (one excluded).
Overall: a considerable sample.
In total: almost 250 books.
Other variables: e-book availability, publication date, price, sales forecasts, first print run, etc.
We matched the books into pairs (or larger groups).
Matching variables:
–Publisher
–Segment
–Publication date (current edition)
–Edition number 
–Page count 
–Versions available (type of cover, digital) 
–Sales forecasts (monthly) 
–Number of unauthorized copies found prior to the experiment 
Matching results: 
–Groups of two: 94 
–Groups of three: 13 
–Groups of five: 1 
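The reported groups cover 94 × 2 + 13 × 3 + 1 × 5 = 232 titles. The sketch below shows one way such groups could be formed: exact matching on categorical variables (e.g. publisher, segment) and greedy nearest-neighbour pairing on range-scaled numeric ones (e.g. page count, sales forecasts). It is an illustrative assumption, not the procedure actually used; `match_books`, `distance` and the dictionary keys are hypothetical, and a title left alone in its stratum stays unmatched.

```python
import math
from collections import defaultdict


def distance(x, y, keys, scales):
    """Euclidean distance on numeric covariates, each divided by its scale."""
    return math.sqrt(sum(((x[k] - y[k]) / scales[k]) ** 2 for k in keys))


def match_books(books, exact_keys, numeric_keys):
    """Group books that share all `exact_keys` and are close on `numeric_keys`.

    Hypothetical greedy matcher. Within a stratum the closest remaining
    pair is matched first; an odd title out joins its nearest group
    (making a triple); a lone title in a stratum stays unmatched.
    """
    # Scale each numeric covariate by its range so no single one dominates.
    scales = {}
    for k in numeric_keys:
        values = [b[k] for b in books]
        scales[k] = (max(values) - min(values)) or 1.0

    # Exact matching: only books with identical categorical values compete.
    strata = defaultdict(list)
    for b in books:
        strata[tuple(b[k] for k in exact_keys)].append(b)

    groups = []
    for members in strata.values():
        pool = list(members)
        stratum_groups = []
        while len(pool) >= 2:
            candidates = [(i, j) for i in range(len(pool)) for j in range(i + 1, len(pool))]
            i, j = min(
                candidates,
                key=lambda ij: distance(pool[ij[0]], pool[ij[1]], numeric_keys, scales),
            )
            stratum_groups.append([pool[i], pool[j]])
            pool = [b for n, b in enumerate(pool) if n not in (i, j)]
        if pool and stratum_groups:
            leftover = pool[0]
            nearest = min(
                stratum_groups,
                key=lambda g: min(distance(leftover, m, numeric_keys, scales) for m in g),
            )
            nearest.append(leftover)
        groups.extend(stratum_groups)
    return groups
```

Greedy matching is simple but not optimal; an optimal matching algorithm would minimise the total distance across all groups at once.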
Within each group, we randomly assigned titles to protection or control.
As a result, the two groups were comparable.
The Control Treatment (CT) group received no intervention.
The agency Plagiat.pl removed unauthorized copies of titles in the Enforcement Treatment (ET) group.
We observed both groups from November 2012 to September 2013.
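Random assignment within matched groups can be sketched as follows. The slides do not state how many titles were protected in groups of three or five, so protecting `len(group) // 2` titles per group is a hypothetical rule, as is the function name `assign_groups`.

```python
import random


def assign_groups(groups, seed=None):
    """Randomly split each matched group into ET (protected) and CT titles.

    Hypothetical rule: in a group of n titles, n // 2 are protected
    (exactly one per pair); the rest form the control.
    """
    rng = random.Random(seed)
    labels = {}
    for group in groups:
        shuffled = list(group)
        rng.shuffle(shuffled)  # random order decides who is protected
        k = len(shuffled) // 2
        for title in shuffled[:k]:
            labels[title] = "ET"
        for title in shuffled[k:]:
            labels[title] = "CT"
    return labels
```

Randomizing inside each group, rather than across the whole sample, keeps protected and control titles balanced on the matching variables by construction.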
A note on file-sharing in Poland. 
Alexa ranking of "most popular websites":
Chomikuj.pl – 17th
The Pirate Bay – 66th
•Two manipulation checks:
1) Based on data from Plagiat.pl.
2) Three research assistants:
–Each searched for 20 titles.
–Found unauthorized copies of fewer protected titles.
–If found, the search took longer.
–If found, mostly at non-"standard" sources.
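A comparison like the research assistants' search (share of protected versus control titles for which unauthorized copies were found) could be formalised as a two-proportion z-test. This is an assumed choice of test, shown as a sketch with a hypothetical function name; the counts are inputs, not results from the study.

```python
import math


def two_proportion_z(found_et, n_et, found_ct, n_ct):
    """z-test for a difference in the share of titles found online.

    Returns (difference in shares, z statistic, two-sided p-value).
    """
    p_et, p_ct = found_et / n_et, found_ct / n_ct
    # Pooled share under the null hypothesis of no difference.
    pooled = (found_et + found_ct) / (n_et + n_ct)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_et + 1 / n_ct))
    if se == 0:
        raise ValueError("all or none of the titles were found")
    z = (p_et - p_ct) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_et - p_ct, z, p_value
```

A negative difference with a small p-value would indicate that enforcement made protected titles harder to find.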
Sales Data 
Tests 
Base Regressions 
Manipulation Check 
Quantile Regressions 
Conclusions
•We received sales data from the publishers.
•Distribution of sales.
•Sales could be negative (and how we handled this).
•Breakdown by genre?
Two comparable groups, so simple tests should suffice:
•No significant difference in sales!
•No significant difference in variance!
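One way to run such simple tests is a permutation test, which makes no normality assumption about heavily skewed sales. This sketch (hypothetical `permutation_test`) compares either the means or the variances of ET and CT sales by passing `statistics.mean` or `statistics.variance` as the statistic; the study's actual tests may differ.

```python
import random
import statistics


def permutation_test(et, ct, stat, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for stat(et) - stat(ct).

    Group labels are reshuffled many times; the p-value is the share of
    reshuffles whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = stat(et) - stat(ct)
    pooled = list(et) + list(ct)
    n_et = len(et)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = stat(pooled[:n_et]) - stat(pooled[n_et:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

Usage: `permutation_test(et_sales, ct_sales, statistics.mean)` for the difference in means, and `permutation_test(et_sales, ct_sales, statistics.variance)` for the difference in variances.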
•Let's recheck our strategy by adding controls.
•No significant effect!
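A regression with controls puts an ET dummy next to covariates such as price or page count; its coefficient estimates the effect of protection. The stdlib-only sketch below (hypothetical `ols`) solves the normal equations and returns coefficients only, without standard errors. A real analysis would use a statistics package and could also add match-group dummies.

```python
def ols(y, X):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows; include a column of ones for the intercept,
    e.g. [1, et_dummy, price, pages]. Returns the coefficient list.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    # Only diagonal entries remain on the left, so each row gives one coefficient.
    return [a[i][k] / a[i][i] for i in range(k)]
```

With rows of the form `[1, et_dummy, control_1, ...]`, the second returned coefficient is the estimated difference in sales between protected and control titles, holding the controls fixed.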
•Does the effect depend on popularity?
•No significant effect!
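Popularity can be examined by asking whether protection shifts different parts of the sales distribution. The sketch below uses a simpler stand-in for quantile regression: differences between ET and CT sales quantiles, which under random assignment estimate quantile treatment effects without controls. The function name `quantile_effects` is hypothetical; `statistics.quantiles` needs Python 3.8+.

```python
import statistics


def quantile_effects(et, ct, n=10):
    """Difference between ET and CT sales at each of the n - 1 cut points.

    With n=10 this compares deciles: the first entries concern weakly
    selling titles, the last ones the bestsellers.
    """
    q_et = statistics.quantiles(et, n=n)
    q_ct = statistics.quantiles(ct, n=n)
    return [a - b for a, b in zip(q_et, q_ct)]
```

Differences near zero at every cut point would mean protection changed neither niche titles nor bestsellers.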
•The good thing about piracy is that you don't need a result.
We performed a large field experiment on the impact of piracy on book sales. We applied a robust methodology and also checked for more complex relationships. Internet piracy does not seem to pose a threat to the book industry. However, we cannot predict future developments, including piracy's impact on the e-book market.
Thank you for your attention!
Author: Wojciech Hardy
e-mail: whardy@wne.uw.edu.pl
More about our research on 
http://grape.uw.edu.pl/ipiracy 
Twitter: @GrapeUW
