The document analyzes the rise of paywalls on news websites and their implications. It finds that over 18% of US news sites and 12.69% of Australian news sites now have paywalls. Paywalls are implemented in different ways like truncated or obscured articles. Popular third-party paywall providers like Piano and Tecnavia manage paywalls for many sites. Paywalls often allow 3-4 free articles but are ineffective as clearing cookies bypasses 75% of paywalls. Paywalled sites also track users and show ads despite subscriptions.
Understanding the Rise of Online Paywalls and Their Impact
1. Keeping out the Masses:
Understanding the Popularity and
Implications of Internet Paywalls
Panagiotis (Panos) Papadopoulos
Peter Snyder, Dimitrios Athanasakis, Benjamin Livshits
Brave Software
2. Current ad ecosystem
• Performance issues
• Ad fraud
• The duopoly (Google and Facebook) harvests more than 70% of global ad revenues
• Privacy implications of user tracking drive users away from viewing ads
Publishers can do nothing but:
• move away from advertisement-based funding
• look for alternative content monetization mechanisms
Source: e-marketer.com
Panagiotis Papadopoulos ~ panpap@brave.com
4. Shift to a new pay-for-access web
From the “advertising-but-open” web to a “paywalled” web
Huge implications for society:
• information stops flowing freely
• paywalls create a web where high-quality information is available to fewer and fewer people
• the rest of the users access less information (of lower accuracy and quality)
5. Too many questions unanswered
• How do paywalls work? What are the different implementations?
• How popular have paywalls become?
• What are the most popular 3rd party paywall libraries, and how does such a library operate?
• What are the different policies (e.g., access control, subscription cost, user tracking)?
• How effective are paywalls at protecting premium content?
Gap in our understanding
7. Dataset
Two sources:
• Inspect several popular paywall-bypassing browser extensions
- 147 paywalled websites
• Use crowd-maintained filter lists that block third party paywall libraries
- 43 third party paywall libraries
- query for each of them in HTTPArchive and PublicWWW
- 1,563 more paywalled websites
In total: 1,710 unique paywalled sites
Open-Sourced: https://gist.github.com/panpap/68af1c99b49366dfce4044a354f6e1b8
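The merging of the two sources into one set of unique sites can be sketched as below. This is only an illustration under stated assumptions: the function names and the normalization rule (lowercase host, leading "www." dropped) are hypothetical and are not taken from the authors' pipeline, and the example hostnames are placeholders.

```python
from urllib.parse import urlsplit


def normalize_host(url):
    """Reduce a URL or bare hostname to a lowercase host without 'www.'.

    Assumption: this normalization rule is illustrative, not the authors'.
    """
    # urlsplit only fills .hostname when a '//' separator is present
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def merge_sources(*sources):
    """Union several lists of site URLs into one set of unique hosts."""
    merged = set()
    for source in sources:
        merged.update(h for h in map(normalize_host, source) if h)
    return merged


# Placeholder inputs standing in for the two sources on the slide
from_extensions = ["https://www.example-news.com/story", "daily.example.org"]
from_libraries = ["http://example-news.com", "https://paper.example.net/"]
unique_sites = merge_sources(from_extensions, from_libraries)
```

Deduplicating on a normalized host keeps a site that appears in both sources from being counted twice.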
8. Types of paywalls
• Soft (metered) paywall: a specific number of articles is available to read for free
• Hard paywall: the user needs to pay before accessing the very first article
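The difference between the two types can be sketched as a single access rule. The class and parameter names below are hypothetical, and modelling a hard paywall as a meter with zero free articles is an assumption made for illustration, not a description of any real implementation.

```python
from dataclasses import dataclass


@dataclass
class Paywall:
    # Assumption: 0 models a hard paywall, any value > 0 a soft (metered) one
    free_articles: int

    def can_read(self, articles_already_read, is_subscriber):
        """Decide whether the next article is served in full."""
        if is_subscriber:
            return True
        return articles_already_read < self.free_articles


hard = Paywall(free_articles=0)
soft = Paywall(free_articles=4)
```

Under this sketch a non-subscriber is blocked by `hard` on the very first article, while `soft` starts blocking only once the free quota is used up.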
11. Popularity of the different paywall types
[Figure: bar chart of paywall enforcing strategies (x-axis) against the portion of paywalled websites (y-axis, 0% to 60%): obscured article 48.2%, truncated article 44.5%, redirection 7.3%.]
• 66.7% of the websites use “soft” paywalls
• 15.7% of the websites use “hard” paywalls
• 16.6% of the websites use a “hybrid” strategy
(i.e., freemium - some content is free, some
requires payment)
Popularity of the different paywall enforcing strategies.
48.2% of the sites obscure and 44.5% truncate the article the user cannot access.
12. Growth of paywall use over time
• When did the sites in our dataset switch to paywalls?
• We use the Wayback archive to browse the historical versions
• In our dataset, paywall use started in 2015
• Paywall use was increasing at a rate of 120%-230% every 6 months
• Paywall use quadrupled in the first 6 months of 2019
[Figure: Growth of paywall deployments per 6 months. x-axis: half-year term (2015A to 2019B); y-axis: growth of paywall deployments (0.0x to 7.0x). Note that the y-axis depicts the growth rate, not absolute numbers.]
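One way to derive a per-half-year growth factor from archived snapshots is sketched below. It assumes, purely for illustration, that growth is the ratio of new paywall deployments in one half-year term to those in the previous observed term; the slides do not spell out the exact definition, and the dates used here are made up.

```python
from collections import Counter


def half_year_term(year, month):
    """Label a date like the figure's x-axis: 'A' = Jan-Jun, 'B' = Jul-Dec."""
    return f"{year}{'A' if month <= 6 else 'B'}"


def growth_factors(first_paywall_dates):
    """Map each term to (new deployments) / (new deployments in the previous term).

    first_paywall_dates: (year, month) of the first archived snapshot in which
    a site shows a paywall. Terms with no deployments are simply absent, so
    ratios are taken between consecutive *observed* terms (an assumption).
    """
    counts = Counter(half_year_term(y, m) for y, m in first_paywall_dates)
    terms = sorted(counts)  # labels such as '2018B' sort chronologically
    return {cur: counts[cur] / counts[prev] for prev, cur in zip(terms, terms[1:])}


# Made-up dates: 1 site in 2018A, 2 in 2018B, 8 in 2019A
dates = [(2018, 3), (2018, 9), (2018, 11)] + [(2019, m) for m in (1, 1, 2, 3, 4, 5, 6, 6)]
factors = growth_factors(dates)
```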
13. Type of content the paywalled sites deliver
• 80.3% deliver news content (local, regional, or
world-level)
• remaining 14 categories account for just 19.7%
of paywalled sites.
[Figure: Type of content that paywalled websites deliver. x-axis: type of content delivered by paywalled sites; y-axis: portion of paywalled sites (log scale, 0% to 100%). News 43.3%, World 19.6%, Regional 17.4%, Sports 6.0%, Business 2.9%, Arts 2.7%, Society 2.3%, Computers 1.3%, Recreation 1.2%, Science 1.0%, Reference 0.5%, Home 0.5%, Kids and Teens 0.4%, Health 0.4%, Shopping 0.3%, Adult 0.1%.]
14. Prevalence of paywalled sites across countries
• Retrieve the Alexa Top 10,000 per country
• Filter the list and remove all non-news sites
• Calculate per country:
the portion of paywalled news sites as a fraction of
all news sites
[Figure: Portion of news sites using paywalls per country. x-axis: top countries (United States, Australia, France, Canada, Germany); y-axis: portion of paywalled sites (4% to 20%).]
Paywall adoption reaches 18.75% in the US and 12.69% in Australia.
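The per-country calculation described above can be sketched as follows. The function name, the country codes, and the site lists are placeholders chosen for illustration; the real inputs would be the filtered per-country top lists and the paywalled-site dataset.

```python
def paywalled_share(news_sites_by_country, paywalled_sites):
    """Percentage of each country's news sites that appear in the paywalled set."""
    shares = {}
    for country, news_sites in news_sites_by_country.items():
        if not news_sites:
            continue  # avoid dividing by zero for countries with no news sites
        hits = sum(1 for site in news_sites if site in paywalled_sites)
        shares[country] = 100.0 * hits / len(news_sites)
    return shares


# Toy data, not the study's measurements
news_sites = {
    "AA": ["a.example", "b.example", "c.example", "d.example"],
    "BB": ["a.example", "e.example"],
}
shares = paywalled_share(news_sites, paywalled_sites={"a.example"})
```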
16. 3rd party paywall libraries - Case study: Tinypass
• Paywall-as-a-service: publishers pay the provider to manage and enforce a paywall on their site.
• To perform access control, the user's engagement with the content is tracked:
- how much time they spend on a website,
- how many articles they read,
- how many times they visit the website
• Whenever a user browses a new article:
- Tinypass code running in the browser queries a remote server to check whether the particular user has access
The server knows what the user reads at any time
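The privacy consequence of this design can be illustrated with a toy model of the remote access check. The `AccessServer` class below is a hypothetical stand-in written for this sketch; it is not Tinypass's actual API or protocol, and the user identifier and URLs are placeholders.

```python
from collections import defaultdict


class AccessServer:
    """Hypothetical stand-in for a remote paywall service (not a real API)."""

    def __init__(self, free_articles):
        self.free_articles = free_articles
        # user id -> ordered list of distinct article URLs the user requested
        self.reading_log = defaultdict(list)

    def check_access(self, user_id, article_url, is_subscriber=False):
        """Record the request, then say whether the article may be shown."""
        history = self.reading_log[user_id]
        if article_url not in history:
            history.append(article_url)
        # Non-subscribers may read only the first `free_articles` distinct articles
        return is_subscriber or history.index(article_url) < self.free_articles


server = AccessServer(free_articles=2)
answers = [server.check_access("u1", url) for url in ("/a1", "/a2", "/a3")]
```

Because every article view triggers a query, the server-side log ends up holding the user's complete reading history, whether or not access was granted.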
17. Popularity of 3rd party paywall libraries
[Figure: Popularity of 3rd party paywall libraries in our dataset. x-axis: third-party library provider; y-axis: portion of paywalled sites (0% to 30%). piano.io/tinypass.com 23.5%, tecnavia.com/newsmemory.com 21.0%, zeen101.com/leaky-paywall 16.6%, bwbx.io 9.6%, poool.fr 5.2%, syncronex.com 4.3%, trbas.com 3.0%, subscriptiongenius.com 2.7%, pigeon.com 2.4%, vindicia.com 2.2%, pelcro.com 1.7%, laterpay.net 1.3%, mediapass.com 1.3%, mg2connext.com 1.0%, GateHouse.com 0.9%, contentpass.net 0.8%, presspadapp.com 0.6%, api.ffx.io 0.5%, inplayer.com 0.4%, mppglobal.com 0.4%, sovrnlabs.net 0.4%, payread.net 0.2%, ppjol.net 0.2%.]
• A small number of providers account for the majority of third-party paywall deployments
• Piano (23.5%) and Tecnavia (21.0%) are the most popular providers
19. Number of free articles allowed
[Figure: Distribution of the number of free articles allowed per user. x-axis: number of articles allowed (0 to 18); y-axis: CDF of paywalled sites.]
• The median paywalled website allows 3.5 articles
• The median soft-paywalled website allows 4 articles
• Hard-paywalled websites do not allow any free articles
20. Annual subscription cost (1/2)
• Manual evaluation of 120 (20 × 6 countries) paywalled websites
• Australia ($193) and Germany ($190) have the highest median subscription costs
• Subscription costs vary widely by site: the most expensive subscriptions cost 2.63× (in Germany) and 3.51× (in the US) more than the median rate
[Figure: Min, 15th percentile, median, 85th percentile, and max annual subscription costs by country. x-axis: top countries by paywall popularity (US, FR, BR, CA, DE, AU); y-axis: annual subscription cost (USD, 0 to 550).]
21. Annual subscription cost (2/2)
• 64.76% provide only a monthly subscription option
• 17.14% provide only an annual one
• 22% charge less than $60
• 21% charge more than $180
• The median annual subscription costs $108
• Lower than estimated in 2018 ($189)*
[Figure: Cumulative distribution of the subscription cost per website for a 12-month content access. x-axis: annual subscription cost (USD, 100 to 600); y-axis: CDF of paywalled sites tested.]
*https://www.niemanlab.org/2019/05/across-seven-countries-the-average-price-for-paywalled-news-is-about-15-75-month/
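The earlier $189 estimate appears consistent with annualizing the roughly $15.75 per month mentioned in the linked article's URL, which the short sketch below checks. Treating the annual figure as twelve monthly payments is an assumption of this sketch, and the function name is hypothetical.

```python
from decimal import Decimal


def annualize(monthly_price):
    """Twelve monthly payments, using Decimal to avoid float rounding."""
    return Decimal(monthly_price) * 12


earlier_estimate = annualize("15.75")        # monthly price taken from the URL above
measured_median = Decimal("108")             # median annual cost from the slide
ratio = measured_median / earlier_estimate   # share of the earlier estimate
```

Under this assumption, the measured median is a little over half of the earlier annualized estimate.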
22. Content popularity and link rate
The impact of paywall use on the site’s backlinks
a) Alexa top 1K news sites (baseline)
b) 1K random paywalled news sites of our dataset
[Figure: Distribution of the incoming site links per news site (normalized by its Alexa rank). x-axis: sites linking to the website (10^1 to 10^6); y-axis: CDF of websites; series: paywalled news sites, Alexa top 1K news sites.]
In median values, paywalled sites get significantly (18.9×) fewer site links
23. Paywalls and privacy
• Advertising systems: users pay for content with their privacy
• Paywalled systems: do users preserve their privacy by paying the subscription costs?
• We paid for subscriptions to 10 sites and filtered their traffic
• Users receive trackers and ads for the content they have paid for
[Figure: Requests captured for vanilla and premium user.]
25. Paywall circumvention
We manually test 32 websites:
1. we browse pages until we trigger the paywall
2. we test a variety of bypassing approaches to circumvent the paywall
[Figure: Success rate of the different paywall bypassing approaches. x-axis: paywall bypassing approaches (screen size, IP hiding, Adblock Plus, UA modification, Reader Mode, paywall library blocking, Pocket, Private Mode, cookie cleaning); y-axis: portion of bypassed paywalls (0% to 80%).]
Clearing the cookie jar alone can bypass 75%
of the paywalls
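Why cookie clearing works so often can be illustrated with a toy metered paywall that keeps its counter in the visitor's cookie jar. This is a simplified model assumed for illustration: the class, the cookie name `articles_read`, and the dictionary standing in for a browser cookie jar are all hypothetical, and real deployments differ.

```python
class MeteredSite:
    """Toy soft paywall whose only state is a counter stored in a cookie."""

    def __init__(self, free_articles):
        self.free_articles = free_articles

    def serve(self, cookies):
        """Return True if the full article is served; updates the jar in place."""
        count = int(cookies.get("articles_read", "0"))  # hypothetical cookie name
        if count >= self.free_articles:
            return False  # meter exhausted: paywall shown
        cookies["articles_read"] = str(count + 1)
        return True


site = MeteredSite(free_articles=3)
jar = {}                                      # stands in for the browser's cookie jar
before = [site.serve(jar) for _ in range(4)]  # the fourth request hits the paywall
jar.clear()                                   # the user clears their cookies
after = site.serve(jar)                       # the meter starts from zero again
```

When the meter lives only on the client, wiping that state makes a returning visitor indistinguishable from a new one.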
27. Paywall detector
• Websites manually labelled as ground truth
• Crawled websites (20 child pages):
- sequentially
- clean state per page
• Extracted Features:
- Visual Features
- e.g., look for popups, article obstructing divs
- Textual Features
- e.g., look for specific keywords (e.g., “subscribe”, “remaining”, “signup”)
- Structural Features
- e.g., look for differences in the amount of article text between the crawled versions (i.e., look for truncated articles)
• Accuracy:
- Despite the great heterogeneity of existing paywall implementations: AUROC of 0.74
Open-Sourced: https://drive.google.com/file/d/1_bR_v_6HJ72cN1DE2oVioaCf4vDs8k5a/view?usp=sharing
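Two of the feature families above, plus the AUROC metric, can be sketched in a few lines. This is a minimal illustration and not the open-sourced detector: the function names are hypothetical, the keyword list is limited to the three examples on the slide, and comparing a clean-state crawl against a sequential crawl of the same page is an assumed reading of "crawled versions".

```python
import re

KEYWORDS = ("subscribe", "remaining", "signup")  # examples given on the slide


def textual_feature(page_text):
    """Count occurrences of paywall-related keywords in the page text."""
    words = re.findall(r"[a-z]+", page_text.lower())
    return sum(words.count(keyword) for keyword in KEYWORDS)


def truncation_feature(clean_state_length, sequential_length):
    """Fraction of the clean-state article text still present in the sequential crawl."""
    if clean_state_length == 0:
        return 1.0
    return sequential_length / clean_state_length


def auroc(labels, scores):
    """Probability that a random paywalled site scores above a random non-paywalled one."""
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A truncation ratio well below 1.0 suggests the article was cut short once the meter ran out, and an AUROC of 0.5 would correspond to random guessing.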
28. Conclusion
The prevalence of paywalls triggers a shift from the free web to a pay-for-access web
Our analysis shows:
• There are 2 different types of paywalls (Soft: 66.7%, Hard: 15.7%) and a hybrid one (Freemium: 16.6%)
• 80.3% of the paywalled sites deliver News content
• Paywall adoption reaches 18.75% in US and 12.69% in Australia
• The median paywalled website allows 3.5 free articles
• Subscription costs vary a lot but the median annual subscription costs 108 USD
• Users get tracked and bombarded with ads for the content they have already paid for
• Paywalls do not adequately protect premium content: Cookie jar clearing can bypass 75% of the paywalls
• We design an ML-based automated tool for detecting whether a site is using a paywall
Dataset: https://gist.github.com/panpap/68af1c99b49366dfce4044a354f6e1b8
Paywall Detector source: https://drive.google.com/file/d/1_bR_v_6HJ72cN1DE2oVioaCf4vDs8k5a/view?usp=sharing