TECHNICAL REPORT
YL-2010-008
EDISCOPE: SOCIAL ANALYTICS FOR ONLINE NEWS
Yury Lifshits
Santa Clara, CA 95054
{lifshits@yahoo-inc.com}
December 20, 2010
ABSTRACT: We present Ediscope — a system for measuring social engagement around online
news articles. Ediscope collects signals from Twitter, Facebook and Bit.ly. Using our link spotter
and social crawler we address a number of questions. What is the lifespan of a typical news story?
What are typical per-pageview engagement numbers? Can social signals be used for pageview
estimates? How much improvement can social optimization bring to a news source? Our first
results indicate that less than 20% of an article's activity happens after its first 24 hours. On average,
a story has 5-20 social actions per 1000 pageviews. For most feeds, the top 7 stories of a week capture
65% of Facebook actions and 25% of retweets. The correlation between pageviews and social signals
is surprisingly low. Our measurements indicate a double-digit improvement potential for social
optimization.
1. Introduction
Online news is on its way to becoming our primary source of information. To win the
competition and delight users, the editors of online news have to constantly optimize their content
strategy. Content strategy is a new applied discipline that addresses the following questions:
What should we write about? How many articles per day? How should coverage shares be allocated
between main topics? How can breaking stories be discovered? Which stories should be promoted
within a website? What is the most effective navigation structure for our content? Alongside content
strategy, there is the emerging field of social media optimization (SMO): How to maximize
engagement? How to maximize secondary traffic from social sources (Facebook, Twitter)? How to
grow the number of followers, subscribers and fans?
Solving the problems of content strategy and social media optimization requires both art
and science. As web news is inherently more measurable than print news, the role of science
is increasing. Until recently, most solutions were based on click-through rates, time spent, eye
tracking and pageviews. This information is typically available only to website owners. Therefore,
it was hard to create generic measurement and optimization solutions. Fortunately, in the last couple
of years, social signals have emerged as a universal and public feedback mechanism. In this paper, we
present a study based on Facebook likes, links in Twitter, and clicks on Bit.ly links. The availability
of social signals for content strategy problems has created the new research direction of social media
analytics [1].
We address the following questions in this study: For how long does an average article receive
user attention? Can we estimate pageview counts from social signals? Can social signals be used to
promote the best stories? Should editors focus on producing better content or on producing more
content? How much improvement can this bring?
Contribution. Our first contribution is the data engineering infrastructure we built for the project.
The Ediscope system has modules for link discovery, signal monitoring, statistical analysis and
visualization. Ediscope data and a lookup tool are available at http://ediscope.labs.yahoo.net.
Currently, the Ediscope toolkit is available on request, as it is subject to third-party API rate limits.
Feel free to contact Yury at lifshits@yahoo-inc.com to use Ediscope for your project or to order a
custom report on your favorite news source.
Our most surprising finding is the low correlation between social signals and actual pageview
counts. The gap is especially large for non-top news, where the Pearson coefficient approaches 0.5.
To understand the implications of these low correlations we introduce a simple user experience model.
Under this model we demonstrate a potential for double-digit improvement at Gawker, Business
Insider, Change.org and Forbes blogs.
On average we see around 10 Facebook/Twitter actions per 1000 pageviews. Correlation between
social activities is higher for top news than in the average case. Mainstream sources have much
more Facebook activity than mentions on Twitter; tech media shows the opposite pattern. Facebook
actions are much more skewed toward top news. Finally, Twitter signals have slightly better
correlation with pageview counts.
Our results show that, almost universally across news sources, less than 20% of activity happens
after the first 24 hours. Feeds and frontpages drive attention to the latest content units. Search brings
traffic to “evergreen” content like Wikipedia. But there is no driver for materials with a mid-range
(few weeks to few months) lifespan. Perhaps we need a new promotion mechanism for this type of
content.
Remark on focus. When scientists work with real-world data, there are two mindsets. One can
focus on hard/intelligent tasks like model fitting and parameter prediction. This approach makes it
easy to judge a project by comparing the accuracy of its results to previous work. The other approach
is to measure the raw signals and turn them into actionable insights for domain experts. In this case,
the findings can be judged by the novelty of the measurements and the importance of the resulting
recommendations. This study follows the second approach. Here are our takeaway lessons for editors
and product managers of online news:
Create new promotion mechanisms for in-depth content. At the moment there is no middle ground
between breaking news and reference content. Perhaps we need dedicated feeds, sections and
frontpage modules that highlight articles with a mid-range lifespan.
Use social signals for content optimization. There is a serious gap between which content units are
most liked and which receive the most pageviews. In other words, user experience can be
improved by using Facebook likes and retweet counts to promote the most popular content.
Check your engagement scores. If you see fewer than 10-20 social actions per 1000 pageviews,
your sharing functionality can be improved. Typically, it is as simple as getting the buttons to
the right size, putting them in the right place, and minimizing the number of clicks needed to
share your content.
Check your head/tail structure. If you have a heavy head, improvements in quality and promotion
mechanisms should be your priorities. If you have a heavy tail, your best opportunity is in
expanding content production. According to our measurements, one has a heavier-than-typical
head if over 75% of weekly Facebook actions or over 45% of weekly retweets are concentrated
in the top 7 articles.
1.1. Related work
Social signals (Facebook likes, Retweet counters, Bit.ly click counters) are relatively new phe-
nomena. In particular, Facebook Like button was introduced in May 2010, just 6 months prior to
this paper. Until now, social analytics research was centered around text-based signals [6, 7, 8]. To
our knowledge, we present the first temporal study of Facebook like counts.
Before social signals, researchers looked into comment counts, Digg counts and YouTube
view counts. Tsagkias, Weerkamp and de Rijke developed several algorithms to predict the total
volume of comments shortly after publication [12, 13]. Paul Ogilvie measured and modeled total
comment counts across various RSS feeds as a part of the FeedHub project [9]. Cha, Kwak,
Rodriguez, Ahn, and Moon performed a long-tail analysis of YouTube and Daum videos [3].
Avramova, Wittevrongel, Bruneel and De Vleeschauwer developed a classifier that distinguishes
videos with exponential and power-law popularity decays [2]. Salman and Rangwala showed how
to predict a total Digg count shortly after publication [10]. Spiliotopoulos studied correlations
between Digg counts and comment counts for the most popular stories [11].
The key advantage of social signals compared to comment/Digg/YouTube counts is their
universality. Only now can one develop optimization/prediction/recommendation systems that are
applicable to any news source on the Web.
2. Overview of Ediscope System
2.1. Architecture of Ediscope.
For our study we implemented a new social analytics system called Ediscope. It has four primary
components. The link spotting tool takes RSS feeds as input and checks them regularly to spot
new links. In many cases, RSS feeds present proxy links in order to measure clicks from RSS
readers; in particular, Feedburner and Pheedo do this. In these cases we convert proxy links to
the original ones. The second component is the signal crawler. It takes news URLs and calls public
APIs (Facebook, Bit.ly, TweetMeme) to retrieve the current numbers for a given story. We also
implemented custom scraping for pageview counts. Next, a monitoring component re-crawls active
links in our database regularly (by default, every hour). Ediscope's monitor computes deltas against
the previous crawl to measure activity over the last interval. Monitoring functionality is used for
temporal analysis of social engagement. Finally, we call the Google Chart API for dynamic
visualization of results on Ediscope's website.
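The delta computation in the monitoring component can be illustrated with a short sketch. The class and its in-memory storage are our own simplification for exposition, not Ediscope's actual code:

```python
class SignalMonitor:
    """Keeps the last crawled cumulative counts per URL and emits
    per-interval deltas on every re-crawl."""

    def __init__(self):
        self.last_counts = {}  # url -> {signal_name: cumulative count}

    def update(self, url, counts):
        """Record a fresh crawl; return activity since the previous crawl."""
        previous = self.last_counts.get(url, {})
        deltas = {signal: value - previous.get(signal, 0)
                  for signal, value in counts.items()}
        self.last_counts[url] = counts
        return deltas

# Two hourly crawls of the same (hypothetical) story:
monitor = SignalMonitor()
monitor.update("http://example.com/story", {"facebook": 40, "twitter": 10})
delta = monitor.update("http://example.com/story", {"facebook": 55, "twitter": 12})
# delta holds the last hour's activity: 15 Facebook actions, 2 retweets
```

In a real deployment the `last_counts` dictionary would live in the crawl database, but the delta logic is the same.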
In its current form, Ediscope has certain limitations. First of all, the APIs we use have strict rate
limits. In particular, TweetMeme only allows 250 requests per 60-minute period. This forced
us to focus on smaller datasets. Secondly, the same news article can be represented by several
URLs. Sometimes, Facebook, Bit.ly or Twitter fail to recognize these links as the same object. As
a result, the APIs return lower engagement numbers, missing likes, clicks and retweets on non-canonical
versions of an article. For example, the Wall Street Journal has different URLs for a story depending
on whether you visit it directly or from the frontpage. Next, many top websites do not have RSS
feeds, or their feeds do not work properly. For example, Yahoo's Today module, the central piece of
its frontpage, does not have a feed. In these cases, one has to use manual lookups or scraping. Finally,
Ediscope uses a pull mechanism to discover new stories. By the time we add an article to our
system, around 15% of its social activity has already happened. In the future, push mechanisms
such as PubSubHubbub can be used to address this issue.
There are several commercial systems in the space of social analytics. Postrank is a proprietary
article ranking algorithm that takes social signals into account. BackType is a lookup system that
retrieves the current values of social metrics; unlike Ediscope, it does not have fully accessible
temporal profiles or pageview extractor modules. Klout uses social signals to rate news sources
and Twitter personalities.
2.2. Datasets.
We created three datasets for our study: a temporal set, a pageview set, and a head-tail set. For
temporal analysis we selected 10 RSS feeds from major US news sources. We used our link spotting
module to discover 20 articles per source. The link spotter checked the RSS feeds every 10 minutes
in order to discover articles almost immediately after publication. Then we used our monitoring tool
to update social counts every hour and compute the corresponding delta values. As a result, we
obtained temporal social profiles for 20 articles at each of 10 sources. For pageview analysis we
considered four major content networks that explicitly show view counts on their articles: Business
Insider, Gawker, Forbes Blogs and Change.org. For every network, we picked three RSS feeds,
launched our link spotting module and kept it live until we had spotted around 50-75 articles per
network. We then waited several days until the total social counts were close to their final values,
and used our crawler to measure social counts and pageview counts for every article in the dataset.
For head/tail analysis we looked at RSS feeds of several major news sources. For every publisher,
we used the link spotter to get all articles from a one-week period (around 200 articles per feed).
Then we crawled them once to collect social counts.
3. Empirical Study
3.1. Article Lifespan
In our temporal study we track 20 articles from each of the following sources: Washington Post,
Gizmodo, CNN, MSNBC, HuffingtonPost, Yahoo News, New York Times, Engadget, Mashable,
and TechCrunch. On average, every story has 901 Facebook actions (likes, shares and Facebook
comments), 221 retweets and 660 clicks on Bit.ly-shortened links. The following table shows the
percentage of activity during the first, second, third, fourth and fifth 24-hour interval after
publication. Note that the total share of activity is significantly less than 100%. This is due to
activity in the interval between the time a story was published and the time Ediscope discovered
it. Out of all sources, Engadget articles have the slowest decay of activity and Yahoo News has the
sharpest decay.
Signals, averaged across sources   Day 1   Day 2   Day 3   Day 4   Day 5
Facebook                           73.94   11.57    2.83    1.29    0.48
Twitter                            70.71    5.11    1.72    0.69    0.37
Bit.ly                             73.27    8.07    2.49    1.06    1.01

Engadget signals
Facebook                           56.13   24.40    9.03    4.35    1.99
Twitter                            71.27    9.24    4.12    1.28    0.71
Bit.ly                             76.53   10.02    4.12    1.54    0.86

Yahoo News signals
Facebook                           85.49    6.69    1.01    0.38    0.10
Twitter                            84.80    4.21    0.33    0.13    0.00
Bit.ly                             33.88    2.08    0.40    0.21    0.05
Figure 1: Average activity of an Engadget article during the first 68 hours of tracking.
Deep blue represents Facebook, light blue represents Twitter, yellow represents Bit.ly.
Figure 2: Social activity of the Engadget article “BlackBerry users running out of
loyalty”
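Each number in the table above is a 24-hour bucket's share of the article's total (eventual) social count. A minimal computation over hourly deltas, with invented numbers, might look like this:

```python
def daily_shares(hourly_deltas, total_count):
    """Sum hourly deltas into 24-hour buckets and express each bucket as a
    percentage of the article's total social count. The total may include
    activity from before tracking began, so shares can sum to < 100%."""
    buckets = {}
    for hour, delta in enumerate(hourly_deltas):
        day = hour // 24
        buckets[day] = buckets.get(day, 0) + delta
    return [100.0 * buckets[day] / total_count for day in sorted(buckets)]

# Toy 48-hour profile; total of 200 actions including pre-discovery activity.
deltas = [10] * 6 + [4] * 18 + [1] * 24
shares = daily_shares(deltas, total_count=200)
# shares -> [66.0, 12.0]; they sum to 78%, mirroring the <100% totals above
```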
Here are our main observations:
• The majority (typically over 80%) of social activity happens during the first 24 hours.
• Monotonicity. Most temporal profiles are monotone, or monotone after daytime correction (the
bump-next-morning effect).
• Twitter is geeky. While mainstream sources like NYT, Yahoo, CNN, MSNBC and Washington
Post have up to 10 Facebook actions for every retweet, TechCrunch and Mashable have more
retweets than Facebook signals. The Facebook advantage over Twitter in mainstream news
indicates that it can be a more reliable signal for content optimization solutions.
• Non-original content has lower activity. HuffingtonPost has two patterns: one for original
posts, another for aggregated content. Five links from the TechCrunch feed are re-posts from
CrunchGear and TechCrunch.EU and have much lower counts than TC-proper articles.
• User experience flaws. Sharing functionality can have a serious effect on the total amount of
activity. In particular, the Twitter buttons at the New York Times do not directly tweet the story,
but instead ask the reader to use Twitter to log into NYT.
The fact that most activity happens during the first day has serious implications for editors
and product managers of online news. As our study shows, the currently used mechanisms for
promotion (feeds, frontpage promotions, cross-linking) are only capable of driving the first-day
audience. In such an environment, weekly/analytic/evergreen content is highly discouraged and
unsustainable. Thus, if a publisher wants to produce longer-lifespan articles, it should depart
from existing content promotion strategies. On the positive side, we feel that the opportunity for
high-quality weekly/monthly analytic content is wide open in almost every vertical.
3.2. Per-pageview Statistics
Several online content networks display actual pageview counts. This allows us to compute
average amounts of social activity per 1000 pageviews. In some cases several top stories have
different activity pattern than the rest of the site. To get more robust results we compute averages
both for full sets of articles and the sets excluding top 10 articles.
Social actions per 1000 pageviews:
Network            Facebook   Twitter   Bit.ly   FB (non-top)   TW (non-top)   BT (non-top)
Gawker                24.59      4.66    13.36          11.55           4.74           2.65
Forbes blogs           4.61      9.16    41.41           5.13          11.86          29.00
Business Insider       3.08      6.40    34.37           3.90          28.99         106.47
Change.org             4.43      2.74     3.54           8.69           4.12           6.25
Then we look at the Pearson correlation coefficient between social signals and the actual pageview
counts. We also compute correlations between Facebook and Twitter signals and between Bit.ly and
Twitter signals.
Network            FB / PV   TW / PV   BT / PV   FB / TW   BT / TW
Gawker                0.92      0.95      0.93      0.95      0.95
Forbes blogs          0.35      0.40      0.63      0.34      0.63
Business Insider      0.93      0.54      0.65      0.65      0.87
Change.org           -0.01      0.45      0.05      0.34      0.65

Excluding top 10 news
Gawker                0.47      0.63      0.41      0.47      0.35
Forbes blogs          0.12      0.34      0.55      0.31      0.56
Business Insider      0.34      0.43      0.53      0.50      0.80
Change.org            0.67      0.50     -0.09      0.47      0.75
To get a visual sense of the correlations we present plots for Gawker and Change.org. Absolute
values are scaled to fit in the same space. The top-right point on the Gawker plot is in fact far outside
the chart (Gawker has one outstandingly popular story).
Figure 3: Correlation between retweets and pageviews at Gawker network
Let us make some observations from the above tables:
• On average, articles have around 10 Facebook/Twitter actions per 1000 pageviews.
• With the exception of Facebook signals at Gawker, top news has fewer social actions per
pageview than average stories.
• For non-top news, the correlation between social signals and pageviews is around 0.5. Recall
that the Pearson coefficient ranges from -1 (perfectly negatively correlated) through 0
(uncorrelated) to 1 (perfectly positively correlated). Thus, a value of 0.5 means that social
signals are as close to perfect correlation as they are to total independence.
• In 6 cases out of 8, retweets have a higher correlation with pageviews than Facebook actions.
• Change.org shows negative correlations in some cases: an article is more likely to get Facebook
activity if it has fewer pageviews. It turns out that the “Social Entrepreneurship” section has
many more pageviews but the same (or even slightly lower) Facebook counts. Once we remove
articles from this section, the correlation returns to a positive value.
• As expected, Bit.ly clicks are better correlated with retweets than Facebook signals are.
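For reference, the coefficients in the tables above can be reproduced with a plain implementation of the Pearson correlation; this is the standard textbook formula, not Ediscope-specific code:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Perfectly proportional samples correlate at 1.0; reversed ones at -1.0.
r = pearson([100, 200, 300], [5, 10, 15])
```

Applied to the (retweets, pageviews) pairs of one feed, this yields the TW / PV column.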
Figure 4: Correlation between pageviews and Facebook (dark blue), Twitter
(light blue) and Bit.ly (yellow) signals at Change.org. The gap in pageviews
reflects the difference in popularity between different sections of the portal.
Looking at our per-pageview results, one can try to reconstruct pageview counts for the rest of
the Web. The baseline guess would be around the Facebook count (or Twitter count) times 100. As
our measurements show, one is more likely to accurately predict the pageviews of a top story than
of an average article. And in light of our lifespan study, we recommend Facebook over Twitter as
the primary signal for mainstream sources.
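A toy version of that baseline guess might look as follows; the choice to take the stronger of the two signals is our own simplification, not part of the study:

```python
def estimate_pageviews(facebook_count, twitter_count):
    """Baseline estimate from the ~10-actions-per-1000-pageviews observation:
    pageviews are roughly 100x the dominant social signal."""
    return 100 * max(facebook_count, twitter_count)

# A hypothetical mainstream story with 450 Facebook actions and 120 retweets:
views = estimate_pageviews(450, 120)  # -> 45000
```

Given the ~0.5 correlations above, such an estimate is only a rough order-of-magnitude guess for non-top articles.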
What lessons can one learn from these measurements? At the moment the role of social traffic
in overall article success seems to be very small. For an average story there is a very low correlation
between social signals and its pageview count. When we include top stories in the picture, social
activity per pageview actually goes down. These observations hint that factors other than likability
and social cascades play the leading role in pageview success. As a result, traffic is allocated to
not-so-likable stories.
Let us do the following thought experiment. Assume for a moment that the Facebook count or
Twitter count represents the actual reader satisfaction score. Then we can compute the total user
satisfaction score as the sum of products of pageviews and Facebook/Twitter counts. Now, let us
reallocate pageview counts so that the top pageview value corresponds to the top Facebook count,
the second highest to the second highest, and so on. Then we can calculate the “optimal” user
satisfaction score. In other words, we want to check how much user benefit promotion-by-likability
can bring to existing content networks. Below is the table of our results.
Network FB increase TW increase FB increase (non-top) TW increase (non-top)
Gawker 1.019 1.026 1.330 1.181
Forbes blogs 1.566 1.403 1.796 1.341
Business Insider 1.047 1.342 1.402 1.227
Change.org 2.346 1.245 1.109 1.110
As we see, all networks have double-digit potential to improve their user experience for non-top
stories. Forbes blogs and Change.org can significantly increase the overall user experience. Again,
the Change.org results look a bit odd because it has two clusters of very different articles: one is
more likable, the other gets more pageviews, so the overall experience can be improved significantly.
Once we remove the top news (one cluster), the rest of the site can only achieve a 10-11% increase.
Of course, our model for the user satisfaction score is an oversimplification, but it can be used as a
first-order approximation of the possible improvement based on social signals.
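The thought experiment amounts to re-pairing sorted pageview counts with sorted like counts; by the rearrangement inequality, this pairing maximizes the sum of products. A sketch with made-up numbers:

```python
def satisfaction_uplift(pageviews, likes):
    """Ratio of 'optimal' satisfaction (largest pageview count paired with
    the largest like count, and so on down) to actual satisfaction
    (the current pairing of pageviews and likes)."""
    actual = sum(pv * lk for pv, lk in zip(pageviews, likes))
    optimal = sum(pv * lk for pv, lk in zip(sorted(pageviews, reverse=True),
                                            sorted(likes, reverse=True)))
    return optimal / actual

# The most-viewed story here is not the most-liked one:
uplift = satisfaction_uplift([1000, 500, 100], [2, 10, 4])
# uplift ~ 1.65, i.e. a 65% potential improvement under this model
```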
3.3. Head vs. Tail Analysis
In our final experiment we collected links from the RSS feeds of several US news sources over
the course of one week. These feeds have from 64 to 226 items per week. Then, for every source,
we retrieved and sorted the social counts of the discovered articles. We compute the percentage of
weekly social activity that corresponds to the top story, the top 7 stories, and all stories outside the
top 7. We use the constant 7 as a reflection of a one-story-per-day strategy.
Feed Articles tracked Top item: FB / TW Top 7: FB / TW The rest: FB / TW
TechCrunch 182 32.3 / 4.6 61.5 / 16.8 38.5 / 83.2
Mashable 162 23.1 / 2.1 47.1 / 13.2 52.9 / 86.8
Wired 120 9.9 / 4.8 41.4 / 24.9 58.6 / 75.1
Engadget 200 44.3 / 18.9 68.7 / 27.5 31.3 / 72.5
Wall Street Journal 201 36.6 / 5.8 65.4 / 18.5 34.6 / 81.5
Vanity Fair 64 21.8 / 11.4 70.5 / 44.7 29.5 / 55.3
Yahoo! Upshot 109 28.8 / 26.0 75.7 / 59.1 24.3 / 40.9
Yahoo! Top News 226 20.9 / 9.1 45.6 / 29.6 54.4 / 70.4
All Things D 139 66.2 / 17.2 89.2 / 41.5 10.8 / 58.5
Gizmodo 82 36.1 / 5.2 70.0 / 21.1 30.0 / 78.9
Aol News 78 19.2 / 11.4 85.1 / 44.4 14.9 / 55.6
One can make several immediate observations:
• Typically, around 65% of Facebook actions and 25% of retweets happen within the top 7 stories.
• Facebook activity is much more heavy-headed than retweets.
• Yahoo! Upshot is the most heavy-headed blog in our study. Only 40% of retweets and 25% of
Facebook actions happen outside the top 7 articles. Perhaps this is because the Upshot has very
few dedicated readers, and the majority of activity corresponds to a few Yahoo-wide promoted
stories. AllThingsD is also fairly heavy-headed.
• Mashable and Wired have the heaviest tails. Both have over 75% of retweets and over
50% of Facebook actions outside the top 7 stories.
Let us offer an interpretation from a content optimization perspective. A heavy head of social
activity means that total user satisfaction can be improved by improving the quality of the tail
content or by finding better ways to promote it. A heavy tail indicates that the tail content has its
own audience and is well promoted; thus, the best opportunity for heavy-tail websites lies in
expanding their content production. For a more accurate interpretation, one should track individual
consumption patterns. Goel, Broder, Gabrilovich and Pang have recently shown that the purpose of
tail inventory is not only to capture new users but also to better serve users who like some-of-the-top
and some-of-the-niche [5].
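The head/tail split used in this section can be computed directly from a week of per-article counts; the numbers below are invented for illustration:

```python
def head_tail_shares(weekly_counts, head_size=7):
    """Percentages of weekly activity in the top story, the top `head_size`
    stories, and everything outside the head."""
    ranked = sorted(weekly_counts, reverse=True)
    total = sum(ranked)
    top_share = 100.0 * ranked[0] / total
    head_share = 100.0 * sum(ranked[:head_size]) / total
    return top_share, head_share, 100.0 - head_share

# 40 articles in a week: a few dominate, the rest form a long tail.
counts = [300, 120, 80, 60, 50, 40, 30, 20] + [10] * 32
top1, head, tail = head_tail_shares(counts)
# head ~ 66.7%, close to the typical ~65% Facebook head reported above
```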
4. Roadmap for Social Analytics
There is a number of natural next steps for Ediscope framework. First, we can turn our mea-
surements into rankings of news sources and individual writers by their engagement scores and
lifespans of their content. It is also informative to compare the signals for the same story covered
at different destinations. Then, one can do in-depth factor analysis to find what features of content
and audience increase the overall success of an article. In particular, what is the role of frontpages
and other in-site promotions? Another important step is to release datasets for research community.
Pageviews vs. social signals spreadsheet is likely to be published first. As we identified the problem
with content of mid-range lifespan, one should have a closer look at this area. Videos and products
have a longer lifespan and should be studied through social signals. And, of course, Ediscope should
collect larger datasets to make its findings more robust.
The general direction of using social signals for content management is wide open. Here is an
overview of the key areas.
Data engineering. Ediscope establishes the basic architecture for news analytics systems. For
a typical study, one needs content discovery, signal crawling, monitoring, statistical analysis and
visualization components. Looking into the future, the research community will benefit from a
shared public stack of these tools; we do not want to recreate the same code again and again. The
Ediscope platform can be extended in a number of ways. Of course, we need more signals:
StumbleUpon, Delicious, Yahoo Site Explorer, Digg, Spinn3r, comment counts, and signals from
public and private hit counters. Future versions of Ediscope can incorporate content metadata:
author, publisher, keywords, topics, headlines, tags, full text, content type, date and time,
staff/guest/sponsored. User data can be harder to add due to privacy concerns, but eventually it will
be a part of analytics systems. We need real-time content discovery and signal stream processing.
Higher rate limits should be negotiated with API providers. Finally, there should be a way to add
prediction, ranking and optimization algorithms on top of the basic infrastructure.
Measurements and modeling. Every category of web content can be a subject of social analytics:
videos, products, movies, books, websites, blogs, newspapers, magazines, TV shows, and
content farms. One can focus either on a particular vertical or on a content network (Yahoo, MSN,
Aol). A number of metrics can be created based on social signals: content lifespan, engagement
score, engagement per visit, and the share of social traffic in overall pageviews. Once we focus on
a certain content source and a metric, it is time for factor analysis: how do features of the content,
audience and user interface affect the social success of published material? Then we need
comprehensive industry studies: baseline numbers for social engagement and leaderboards. Finally,
one can create a taxonomy of engagement scenarios for content units.
Content optimization. Of course, the ultimate goal of social analytics is not just to collect data
and compute metrics and rankings. The real impact is in using social insights to make better
publishing choices. Every online publisher faces the following issues: choosing stories and topics
to cover; balancing recency and importance in news coverage; optimizing headlines; optimizing
article length; optimizing in-network promotion; ranking its own stream of news [4] and making
the best selection for the frontpage; finding and fixing underperforming areas; optimizing the user
interface; and making the best content easy to discover. To conclude, the future of Ediscope and
other analytics systems is to recommend choices that maximize social engagement.
Acknowledgement. The author thanks Benjamin Moseley and Silvio Lattanzi for fruitful discussions
at the early stage of this project.
References
[1] Workshop on Social Media Analytics, 2010. http://snap.stanford.edu/soma2010.
[2] Z. Avramova, S. Wittevrongel, H. Bruneel, and D. De Vleeschauwer. Analysis and model-
ing of video popularity evolution in various online video content systems: power-law versus
exponential decay. In INTERNET’09.
[3] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon. Analyzing the video popular-
ity characteristics of large-scale user generated content systems. IEEE/ACM Trans. Netw.,
17(5):1357–1370, 2009.
[4] G.M. Del Corso, A. Gullì, and F. Romani. Ranking a stream of news. In WWW’05.
[5] S. Goel, A. Broder, E. Gabrilovich, and B. Pang. Anatomy of the long tail: Ordinary people
with extraordinary tastes. In WSDM’10.
[6] J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news
cycle. In KDD’09.
[7] M. Mathioudakis and N. Koudas. TwitterMonitor: Trend detection over the Twitter stream. In
SIGMOD’10.
[8] M. Mendoza, B. Poblete, and C. Castillo. Twitter under crisis: Can we trust what we RT? In
SOMA’10.
[9] P. Ogilvie. Modeling blog post comment counts, 2008.
http://livewebir.com/blog/2008/07/modeling-blog-post-comment-counts/.
[10] J. Salman and H. Rangwala. Digging Digg: Comment mining, popularity prediction, and
social network analysis. In WISM’09.
[11] T. Spiliotopoulos. Votes and comments in recommender systems: The case of Digg, 2010.
http://hci.uma.pt/courses/socialweb/projects/2009.digg.paper.pdf.
[12] M. Tsagkias, W. Weerkamp, and M. de Rijke. News comments: Exploring, modeling, and
online prediction. In ECIR’10.
[13] M. Tsagkias, W. Weerkamp, and M. de Rijke. Predicting the volume of comments on online
news stories. In CIKM’09.