Reading the World Wide Web
The internet is now a mainstream medium: the majority of households in the UK have broadband, and
user numbers are continuing to growi. These users are constantly viewing and creating online
content, meaning that the debate online grows and changes by the minute. This collective action creates large
discussion pools of all the things people communicate to one another. If one were able to take in and understand
all of these posts, one would, in short, immediately know everything everyone is talking about. This would give
the ability to know when information first breaks, who is saying what, and why.
However, the sheer scale of the internet means that it is impossible to process all of this information at once.
To learn anything from the collective discussions, sorting, categorisation and filtering are required to
channel content into relevant intelligence.
If all discussions taking place on the internet could be placed into a usable database, the problem of scale
would be solved. While treating the internet as a database has proved challenging in the past, new tools and
techniques now allow access to the volume of online data in a usable format.
At present, relatively few organisations are taking advantage of these tools to inform business decisions.
This is because until recently, there was no way of collecting and measuring the rapidly changing content
from the many online sources in a meaningful way. There was therefore no efficient way of knowing the
nature of the online discussion, and how changes in the discussion actually reflect real preferences.
In this paper we will show how changes in the online discussion reflect real-world changes. We will
compare changes in the online discussion with real-world events to show that the two are linked, basing this
paper on our case study: the 2010 UK Election.
The UK election is a topic that many people will be familiar with, and for which there is a significant body
of polling survey data available. This data is used throughout this paper for comparison with the analysis
obtained from measuring the online discussion. Some of the study results were unforeseen:
- Changes in daily election poll results could be estimated by measuring changes in the relative amount of online discussion
- 'Traditional media' maintained a high level of influence, while the influence of 'social media' was comparatively low
- The Lib Dems responded to attention like a new brand in an established marketplace
- Labour and the Conservatives had a joint interest in preventing discussion of the Lib Dems
- 'Bigot-gate' hurt Nick Clegg
- Gordon Brown was unpopular, and changing sentiment towards him during the campaign was correlated with Labour's performance
These results suggest that politicians and parties can gain valuable insight from studying the online debate.
In the sections that follow, this paper will discuss the relationship between the online discussion and the poll
results; introduce some of the tools and techniques now available for studying the online discussion further;
and apply these to the UK Election campaign to show how the internet can provide new insight into
people's preferences for brands.
InfluenceMonitor™ – a tool to study the online debate
What is the most valuable resource in the information age? Not raw data: according to Google, the number
of unique URLs online surpassed 1 trillion in 2008ii. What is valuable is the ability to convert all this data into
actionable intelligence. InfluenceMonitor is the software solution developed by Onalytica which collects,
organises, analyses and presents unstructured online data as meaningful insight.
Searching or Crawling the Web
Before analysis can begin, data must be collected from the Web. For this case study, InfluenceMonitor searched
the web for all articles it could locate mentioning the UK Election and associated synonyms. From 23rd
March to 6th May 2010, a sample of approximately 80,000 web pages was obtained from the Internet.
Creation of Sub-Categories
Once collected, these pages can be further categorised based on the user's preference. For example, the
party names or the leaders' names ('Gordon Brown', 'David Cameron', or 'Nick Clegg') and the synonyms of
each can be established as sub-categories for further analysis. The sub-categories may be analysed on their
own, or compared with other sub-categories, such as 'Key Issues' or 'Political Parties' and their associated synonyms.
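As a minimal sketch of how such sub-category tagging might work, a page can be assigned to every sub-category whose synonyms it mentions. The category names and synonym lists below are illustrative, not Onalytica's actual lists:

```python
# Illustrative sub-categories, each defined by a list of synonyms.
# (These lists are examples only; the real lists are available on request.)
SUBCATEGORIES = {
    "Gordon Brown": ["gordon brown", "pm brown"],
    "David Cameron": ["david cameron", "cameron"],
    "Nick Clegg": ["nick clegg", "clegg"],
}

def tag_page(text):
    """Return the set of sub-categories whose synonyms appear in the page."""
    lowered = text.lower()
    return {name for name, synonyms in SUBCATEGORIES.items()
            if any(term in lowered for term in synonyms)}
```

A page may fall into several sub-categories at once, which is why the result is a set rather than a single label.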
Having established categorisation of the data, analysis of the Buzz, Influence and Sentiment can be performed.
For the purposes of this paper, we will take advantage of the findings produced by survey and polling agencies
offline and compare these with the results we discovered online.
Voter preference was estimated throughout the election campaign by polling agencies such as YouGov,
ICM and MORI; the results of their polls have been averaged to make a daily poll-of-polls. These poll results
were sourced from http://ukpollingreport.co.ukiv, and they are used here as a source for comparison with the
online sample of the debate. (This data is available from Onalytica on request.)
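The poll-of-polls construction is simple: on days with several polls, take the simple average of that day's results. A sketch, with illustrative numbers rather than the actual 2010 polling data:

```python
from collections import defaultdict

def poll_of_polls(polls):
    """polls: list of (date, share) tuples -> dict of date -> daily average."""
    by_day = defaultdict(list)
    for date, share in polls:
        by_day[date].append(share)
    # Average every poll published on the same day.
    return {date: sum(vals) / len(vals) for date, vals in by_day.items()}
```

(A day with no polls, such as 4th April in the actual data, would be filled by averaging the neighbouring days or dropped from the analysis.)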
Using our tools and the relevant survey data, we will consider the following questions:
- Does online attention reflect the real world? (Is the online debate representative of the total debate?)
- What is the value of measuring influence? Is Share-of-Attention valuable?
Does online attention reflect the real-world?
Attention is a scarce resource; those who don’t have it want it, and those who do have it want more. Attention is
a necessary first step towards brand awareness. Without obtaining attention it is not possible to communicate;
in politics as in business, the ability to communicate a message is highly sought after.
PR and advertising are conducted with a desire for attention and awareness, since brands with attention are
often selected over, or preferred to, those without it. Performing analysis of Buzz enables a sound estimate
of relative attention.
We can test the degree to which attention equals selection in the UK Election by comparing the amount of
attention the political parties received online to the changes in voting preference reported by the polling
agencies. To do this, we simplified the poll results and Share-of-Buzz to cover the three main parties, then
tested whether or not the changes in each were correlated.
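The test just described can be sketched in a few lines: take the day-over-day change in a party's Share-of-Buzz, pair it with the change in that party's next-day poll share, and compute the Pearson correlation. The series below are illustrative, not the actual campaign data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def changes(series):
    """Day-over-day differences of a daily series."""
    return [b - a for a, b in zip(series, series[1:])]

# Daily Share-of-Buzz for one party, and its poll share lagged one day behind:
buzz = [30, 33, 31, 36, 34, 37]
polls_next_day = [31, 33, 32, 35, 34, 36]
r = pearson(changes(buzz), changes(polls_next_day))
```

A strongly positive `r` is what the study found for all three parties: changes in attention today anticipate changes in the polls tomorrow.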
The sum total of all the discussions online is commonly called Buzz. Buzz is comprised
of web posts freely and publicly available that meet the search criteria; in this case
‘UK Election’ and synonyms of this.
Buzz is sometimes referred to as “User Generated Content” or simply “Word-of-Mouth”.
The data sample includes articles and media coverage, as well as people commenting,
asking questions, submitting reviews, or airing their grievances. A brand’s Share-of-Buzz
is a representation of the brands share of people’s attention.
In Buzz analysis, all pages are treated equally; there is no allowance for the weight of the
stakeholders in the debate. So, for example, in Buzz analysis the BBC (www.bbc.co.uk), a
site many people frequently refer to as a source of information about the election, counts
the same as the men's magazine Nuts (www.nuts.co.uk), which had only one article related
to the election in our samplevii. To solve the problem of equal weighting, for each page
we calculate the stakeholder's influence.
This test of attention is complicated by potential bias. Although the internet continues to be a growing
part of everyday life for most people in the UK, not everyone using the internet contributes to online content.
This raises the second question of whether or not the discussions online are a good representation of the
total debate.
Our finding is that the Share-of-Buzz is a reliable predictor of poll results. This means the evidence suggests
both that the online debate reflects the offline debate, and that relative attention reflects relative preference.
Taking some of the results from our study of the UK Election, Figure 1 below shows the Share-of-Buzz
alongside the next-day share of the poll results for the three main parties during the election campaign.
Looking at all three parties on one graph is busy, but one can see that:
- In general, the Share-of-Buzz appears to track the next-day poll results: when Buzz for a party increases, the next day's poll results increase, and vice versa
- There is an apparent exception around the 28th April; this corresponds with 'Bigot-gate' and is discussed in more detail below
Figure 1: Share of Poll Results and Share of Buzz -1 day, Three Main Parties, 6th April to 6th May, Index 6th April = 100.
Figures 2 to 4: Test results by party.
The X-axis (horizontal) in Figures 2-4 shows the daily change in the party's Share-of-Buzz; this is plotted
against the next-day Share-of-Poll results on the Y-axis (vertical). One can see that in each case there
is a strong correlation.
The relationships may be interpreted as follows: on average, a 10% increase in a party's share of the total
UK Election discussion, the day before a poll, resulted in:
- a 9% increase in poll results for the Tories
- an 8% increase in the polls for the Lib Dems
- a 4% increase in Labour's share of the polls
Note: increased discussion is best correlated with the Lib Dems' performance and least correlated with the
incumbent (Labour).
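The "10% increase in buzz share gives roughly an 8% increase in poll share" figures above are regression slopes. One simple way to estimate such a slope is least squares through the origin, sum(x*y) / sum(x*x); the data points below are invented for illustration (daily % change in Share-of-Buzz against next-day % change in Share-of-Poll):

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y on x with intercept fixed at zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

buzz_change = [10.0, -5.0, 8.0, -2.0]   # % change in Share-of-Buzz
poll_change = [8.0, -4.0, 6.4, -1.6]    # % change in next-day Share-of-Poll
beta = slope_through_origin(buzz_change, poll_change)
# With these made-up points the slope is 0.8: a 10% rise in buzz share
# corresponds to an 8% rise in poll share, the Lib Dem-style pattern above.
```

The reported per-party slopes (0.9, 0.8 and 0.4) are read off fits of exactly this kind, one party at a time.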
Similar findings to the above have been replicated elsewhere; for example, the Tweetminster website also
found that the amount of discussion on Twitter correlated with the poll resultsviii.
The patterns are also similar to those seen in analysis of other markets. For material products such as cars,
books, or films, a change in relative attention is often a leading indicator of a change in market share through
sales. Here we find that, although political parties differ from products (for example, they are intangible
and have no price), the pattern is the same: the level of attention reflects a preference of one brand over
another. The correlation is often particularly strong for new and growing products.
The correlation in this election analysis is so significant that those maintaining a daily ticker of the change
in Buzz could have made reasonably good predictions of future poll results. For the political parties
themselves, the cause of the changes in Buzz is important; understanding what gets attention could
power the creation of a superior promotion strategy. Understanding Buzz by reading the entire
online discussion would be difficult, if not impossible: there were 80,000 web pages in this one-month
sample, and at just one minute per page it would take about 33 forty-hour weeks to read them all.
Instead, we can automate this process by using a tool to create queries and analyse the content.
An important insight would be to learn who is influencing the debate. When considering Buzz, all voices are
treated equally; there is no attempt to consider the weight of a voice or the source of the conversation. In the
next section, we look more closely at who instigates the Buzz by measuring who is influential.
The Value of Measuring Influence
Information overload occurs when more information is received than is necessary to make a decision (or than
can be understood and digested in the time available). If one were to listen to all of the Buzz about the UK
Election, one would hear a lot of duplicated content being passed from one stakeholder to the next. Imagine
knowing, on an up-to-the-minute basis, where the relevant information that people care about is generated.
This would mean having the ability to target those who generate the most interest, and potentially taking
action to capture their interest. Measuring influence provides us with this information.
Previously, without a large database of categorised and dated discussion, it was not possible to measure
influence without encountering heavy survey bias and significant cost.
Below we use the influence measure to answer two questions: i) what was the influence of 'traditional media',
and ii) what was the influence of 'social media' in the UK General Election?
The influence score is the best method available of measuring the relative authority of
one source compared to another on a topic.
Here, the influence score is objectively measured using Leontief's input-output modelix.
This model is widely used in economics, and also in academia for the purposes
of citation analysis. The influence score is topical, which is important because few
stakeholders are universally influential. A site with an influence score of two may be seen
as twice as influential as a site with an influence score of one. In the UK Election
debate, the influence score ranged from 1 to 20, meaning some stakeholders had twenty
times the influence of others in this debate.
Influence presents a valuable metric because it gives the ability to estimate which
stakeholders are the best for communicating a message: influential stakeholders generate
the content that others make reference to. Thus content provided by the stakeholders
with high influence often leads to significant Buzz.
Knowing the relative amount of attention that a brand or product receives from influential
stakeholders and how this changes is important. Increasing Share-of-Influence represents
more attention, leading to greater awareness which typically leads to preference or
selection. There are instances where increasing Share-of-Influence does not lead to
increased preference, especially if discussion surrounding a brand is very negative. To
check if discussion around a brand is negative we also factor in sentiment.
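The exact InfluenceMonitor formulation is not published, but the input-output idea behind it can be sketched. In the Leontief spirit, a site's influence is its direct citations plus the influence flowing through the sites that cite it, i.e. x = c + d·A·x, which can be solved by fixed-point iteration. The matrix, citation counts and damping factor below are all illustrative assumptions:

```python
def influence(A, cites, d=0.5, iters=200):
    """Iteratively solve x = cites + d * A @ x.

    A[i][j] is the share of site j's outgoing citations that go to site i
    (columns sum to at most 1), cites[i] is site i's direct citation count,
    and d < 1 damps the propagated influence so the iteration converges.
    """
    n = len(cites)
    x = list(cites)
    for _ in range(iters):
        x = [cites[i] + d * sum(A[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x

# Three sites: site 1 and site 2 cite only site 0; site 0 cites site 1.
A = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
scores = influence(A, cites=[1, 1, 1])
```

A site cited by other well-cited sites ends up with a higher score than one with the same raw citation count from obscure sources, which is the point of weighting the debate this way.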
Who was Influential in the UK Election debate?
Who is influential is a common topic during elections, and the influence of media agencies is
often heavily scrutinised. For instance, a Times story asked: "Do Newspapers win Elections?" However,
the discussion of who is influential typically lacks quantitative evidence. The influence score
is useful in answering this question.
The major UK newspapers, or 'traditional media', have websites that often provide the same report content
as their printed versions. If the newspapers were not influential in the UK Election, we would expect them
to rank poorly on the measured influence score; this, however, is not the case. Many of the most influential
stakeholders in the UK Election debate were in fact 'traditional media' (see Table 1). As a major source of the
discussion, the papers have the ability to present information and commentary that drives debate; their
influence means they can set the agenda for much of the discussion.
Table 1: The Most Influential Stakeholders in the UK Election 2010.
www.telegraph.co.uk ........................................................... 14.74
www.timesonline.co.uk ....................................................... 9.46
www.dailymail.co.uk ........................................................... 9.04
www.libdems.org.uk ............................................................ 8.45
www.caledonianmercury.com ............................................. 7.19
www.ft.com .......................................................................... 4.02
www2.labour.org.uk ............................................................. 3.70
www.thesun.co.uk ................................................................ 3.64
www.news.sky.com .............................................................. 3.55
www.libdemvoice.org .......................................................... 3.28
www.thisislondon.co.uk ...................................................... 3.28
The potential influence of Social Media was heavily discussed in the lead-up to the election, for example: “Will
this be a Social Media Election?xi”. There is evidence that Social Media reflected the wider election debatexii,
increased the speed and intensity of debatexii and in some casesxiii was used effectively as a communication
tool to engage with a network of local voters. However, a low influence measurement indicates that Social
Media does not frequently generate discussion that is cited outside of those networks. Consistent with
findings elsewherexv, the content is more typically generated by the larger media agencies.
This finding means that, to gain attention, the political parties could take actions to ensure they are
discussed by the 'traditional media', as the weight of this media in the overall discussion is significant.
Before attempting to attract the attention of key stakeholders, analysis of what is being said about the brands
should be undertaken. Not all discussion is positive, and, especially for mature brands, not all news is good
news. Thus, not all discussion can be expected to have a positive impact on brand performance.
Perception of the Marketplace from the Online Debate
Armed with knowledge of who is generating discussion in the UK Election debate, we now seek further
understanding of the market by investigating what is being said. We begin by discussing the products and
the dynamics of the market overall, before presenting a few examples of brand- and product-specific issues.
A market can be characterised by the way it is discussed. In the UK Election, the data suggests a
clear grouping of the three major parties, with the Lib Dems on one side and the Conservatives and Labour on
the other. From the outset, Labour and the Conservatives were discussed most, with the Lib Dems seeking growth.
Figure 5 shows the Share-of-Buzz amongst the three main parties during the campaign. The Lib Dems began
with a relatively small Share-of-Buzz, but made significant gains following the launch of Nick Clegg.
Figure 5: Share-of-Buzz 6th April – 6th May.
The Lib Dems' performance relative to the amount of online discussion was similar to that of a new brand
entering a mature marketplace: increased attention from influential stakeholders translated into improved poll
performance. The Conservatives and Labour responded like mature brands: for mature brands, mention by
influential stakeholders remains significant, but sentiment also becomes more important.
Labour and the Conservatives had a common interest in
diminishing the Lib Dems' appeal, because increases in the
latter's share of the discussion were highly correlated with losses
in the share of intended votes for both parties.
Whenever the Lib Dems increased their share of discussion amongst the influential stakeholders, both the
Conservatives and Labour lost out. The relationship is shown graphically in Figure 6 and Figure 7. The X-axis
displays the change in the Lib Dems' Share-of-Influence, whilst the Y-axis displays the next-day polls for the
Conservatives (Figure 6) and Labour (Figure 7). In both instances, we see a very strong negative correlation.
Numerically, the correlation may be interpreted as follows: a 10% increase in the Lib Dems' Share-of-Influence
led to approximately a 6% drop in Tory, and a 4% drop in Labour, Share-of-Poll results.
For the Liberal Democrats, the above pattern suggests that attention from influential stakeholders is key to
stealing market share. For the Conservatives and Labour, this relationship raises an interesting question: what
if they had never discussed the Lib Dems?
At the start of the campaign for market share, the Conservatives and Labour both had relatively well-known
products: in this context, the leaders Cameron and Brown. The third brand, the Lib Dems, launched a new,
relatively unknown product which captured the public's fascination: Nick Clegg.
The Key Products
How important were the leaders?
Much of the literature surrounding Political Marketing suggests that political formations may be analysed as
one might analyse other brands competing for market share1. Thus, we regard the political parties as brands,
and their people and policies as products. In this election, it was the party leaders who were the key products.
The table below shows the ratio of leader mentions to party mentions in the same sample.
Table 2: Ratio of Pages Mentioning Leaders to Parties, 6th April to 6th May
            Leader on Pages   Party on Pages   Ratio
Labour      18,100            33,700           0.54
Tory        17,300            32,700           0.53
Lib Dems    13,200            18,000           0.73
The quantity of mentions that the leaders received compared to their parties is impressive: Gordon Brown and
David Cameron were each mentioned at a frequency of more than 50% of that of their brands. For the Lib Dems,
leader mentions relative to party mentions were significantly higher, at more than 70%. This indicates how
much attention Nick Clegg actually received.
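The ratios in Table 2 are simply leader-page counts divided by party-page counts, rounded to two decimal places, as this small check shows:

```python
# Page counts from Table 2: (pages mentioning leader, pages mentioning party).
mentions = {
    "Labour":   (18_100, 33_700),
    "Tory":     (17_300, 32_700),
    "Lib Dems": (13_200, 18_000),
}
ratios = {party: round(leader_pages / party_pages, 2)
          for party, (leader_pages, party_pages) in mentions.items()}
```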
Brand and Product Specific Issues
The Nick Clegg Effect, the product and the Brand
Nick Clegg's soaring popularity, generated by inclusion in the first televised election debate, has been well
documented. However, the success of the Clegg product did not translate into equal success for the Lib Dem brand.
1 Jennifer Lees-Marshment, 2009. Political Marketing, Routledge.
Figure 8 below shows that from 6th April until 6th May there was always more influential debate about Nick
Clegg, compared to his peers, than there was about the Lib Dems compared to Labour and the Conservatives.
Relative to David Cameron and Gordon Brown, Nick Clegg's Share-of-Influence increased from 20% on the
day before the first televised debate to 35% one day after. However, the increase for the Lib Dems (relative to
the Conservatives and Labour) was not as significant, rising from only 11% to 17% over the same period.
This discrepancy suggests that Nick Clegg was not tightly associated with his brand. The data lends support to
the notion that, had the party taken action to align the brand more closely with its leader, overall performance
could have been improved.
Figure 8: Nick Clegg's Share-of-Influence among the Three Leaders vs. the Lib Dems' Share-of-Influence
among the Top Three Parties, 6th April – 6th May (annotated with the three live TV debates and 'Bigot-gate').
The Impact of ‘bigot-gate’ on Nick Clegg’s Spotlight
Shifts in attention, for any reason, can represent a crisis for those losing the interest. Tracking the online debate
allows immediate measurement of how much debate a brand is receiving; when there is a shift, players
should consider action.
Figure 8 above shows that following the second TV debate (22nd April), Nick Clegg's share of the influential
debate dropped from 34% to 25% over the week; similarly, the Lib Dems' share among the parties dropped from
17% to 13%.
The reason for the dramatic shift away from Nick Clegg can be understood by analysing the key issues being
discussed. The key-issue analysis shows that 'bigot-gate' was largely to blame: the scandal
caused a huge amount of attention to shift from Nick Clegg to Gordon Brown.
There were approximately 4,600 posts within the sample mentioning 'bigot' and associated terms during the
election campaign, nearly all occurring from the 28th April. For comparison, despite the short time frame, this
is more than twice the number of posts mentioning the expenses scandal (2,300), more than welfare (3,600),
and more than half the size of large issues (that are also commonly used terms!) such as environment (7,900)
and crime (8,300). Thus 'bigot-gate', because it was discussed mostly in terms of Gordon Brown, was likely
a major cause of the loss of influential discussion about Nick Clegg.
This loss of attention was never regained; by being excluded from the discussion of 'bigot-gate', Nick Clegg
lost the spotlight, and with it the ability to promote himself and his brand.
Note that Gordon Brown's influence boost from the 'bigot-gate' scandal did not translate into an equally rapid
poll increase for Labour. This can be explained by considering the sentiment towards him, which is examined
in the section on Party Brand Sentiment below.
Conservatives, Labour and the Influential Stakeholders
The mature brands responded differently to mentions from key stakeholders. Figure 9 below shows that
influence was a predictor of next-day Conservative poll results: a 10% increase in Share-of-Influence resulted
in a 4.5% increase in the share of poll results.
Figure 9: Conservative’s Share-of-Influence and Share of Poll Results, 6th April – 6th May
The Conservatives had a special relationship with many of the influential stakeholders in the debate (refer
back to Table 1): 7 of the top 15 most influential stakeholders officially endorsed the Conservative
partyxvi. By contrast, only two endorsed the rival Labour party. This positive endorsement helps explain why
influence correlated with Conservative poll results but not with Labour's (influence was only a significant
predictor of Labour poll results if sentiment was included): the influential stakeholders were unlikely to
discuss the Labour brand favourably.
Party Brand Sentiment
Adding sentiment analysis helps to explain why Share-of-Influence can increase without leading
to an increase in poll performance. We apply sentiment analysis to determine the extent to which brands
or issues are discussed in negative or positive terms.
Sentiment analysis aims to determine the attitude of a speaker or a writer with respect
to a topic. The attitude may be judgment, evaluation, or the intended emotional
communication (that is to say, the emotional effect the author wishes to have on the
reader). Sentiment is provided by InfluenceMonitor for every post collected.
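InfluenceMonitor's sentiment scoring is proprietary, but the general idea can be illustrated with a minimal lexicon-based scorer that counts positive and negative words and returns a label. The word lists below are invented for illustration only:

```python
# Tiny illustrative sentiment lexicons (a real system would be far richer
# and would handle negation, phrases, and context).
POSITIVE = {"good", "strong", "popular", "win", "success"}
NEGATIVE = {"bad", "weak", "unpopular", "lose", "bigot", "scandal"}

def sentiment(text):
    """Label a post as positive, negative or neutral by word counting."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Scoring every collected post this way is what allows sentiment to be aggregated per brand, as in Table 3 below.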
Table 3 below shows the percentage of debate, weighted by influence, that was measured as 'very negative'
in two time periods. It shows that the Conservatives were consistently mentioned with less very negative
sentiment than Labour.
Among the political parties, the Lib Dems' sentiment was the least negative, but subject to a very high degree
of variance. Table 3 shows that prior to the first television debate, only 11% of the influential Lib Dem debate
was 'very negative'. Following the first television debate, 15% of influential debate became very negative.
This pattern could be an issue limiting potential upward growth for the Lib Dems: it seems that as the party
gained more attention, it disproportionately gained more very negative attention. If the party had this
information, analysis of the mostly negative posts, and of the stakeholders producing negative discussion, would
be a rational next step.
Table 3: Percent of Influence-Weighted Debate Measured as 'Very Negative'
               23rd March – 15th April   15th April – 6th May   Change
Labour         28%                       27%                    -2%
Conservatives  22%                       22%                    0%
Lib Dems       11%                       15%                    4%
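The influence weighting behind Table 3 can be sketched as follows: each post counts in proportion to its source's influence score, so the 'very negative' share is an influence-weighted fraction rather than a raw page count. The data below is invented for illustration:

```python
def weighted_very_negative(posts):
    """posts: list of (influence_score, is_very_negative) pairs.

    Returns the influence-weighted fraction of 'very negative' debate.
    """
    total = sum(infl for infl, _ in posts)
    negative = sum(infl for infl, is_neg in posts if is_neg)
    return negative / total

# One very negative post from a low-influence source counts for less than
# it would in a raw page count when high-influence sources are positive.
share = weighted_very_negative([(10, True), (10, False), (20, False)])
```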
As with brands, sentiment scores for products can also be measured. Here sentiment has been measured for
the party leaders.
Our results echo the massive unpopularity of Gordon Brown found by the 2010 BESxvii. Gordon Brown was
discussed more frequently on 'very negative' pages: 29% of the pages discussing Gordon Brown had very
negative sentiment scores, compared to 24% for David Cameron and 20% for Nick Clegg.
Sentiment towards Gordon Brown was a significant predictor of the party's poll results. On average, a
combined 10% increase in sentiment and Share-of-Influence resulted in a roughly 4.3% change in Labour's
share of the polls.
Given that the sentiment scores were very negative, the polls did not convey good news; this partially
explains why Share-of-Influence alone did not predict poll results for Labour.
A Finger on the Pulse of Society
The internet consists of information created by millions of private, public, academic, business, and government
networks of local to global scope. In short, it contains a large sample of all information communicated
amongst society. Listening to this communication would mean taking the pulse of society: knowing what
people are saying, who is saying it, and knowing instantly.
However, the Internet's sheer scale means that listening to all of the noise would result in information
overload. Before it can be understood effectively, the content needs to be harnessed into usable data. The tools
applied throughout this paper are designed to do just that: to focus on relevant discussion, we use sub-categories;
to ensure that the main concerns of society are given the most attention, we measure relative influence; and to
understand public opinion on topics, we measure sentiment.
Applied to the election, InfluenceMonitor has shown that actionable information can be obtained at a relatively
low cost. Before these tools, knowing what people are saying would have required time-, resource- and
budget-consuming surveys.
If these tools had been used during the election by any of the players, proactive strategies might have been
created to improve party performance. At present, it is not known if any of the political parties utilise the
internet in this way. In future, it is likely that some form of internet study will be regularly performed by all of
those interested in public opinion. As with any valuable technology, those who implement the tools first will
obtain an advantage; and those who implement these tools last, will be left behind.
In this paper we have demonstrated that the Internet can be put to use as a research database, but our
Election case study has not been comprehensive; readers may know of many other tests that would be
valuable to explore further on this topic. Researchers interested in pursuing further study with the election
data set are welcome to contact Onalytica for information on how to obtain the data.
Notes
i According to the ONS, 60% of households had broadband internet access in 2009 (a rapid increase from 40% in 2006). http://www.statistics.gov.uk/pdfdir/iahi0809.pdf
ii The Official Google Blog, 2008. "We Knew it was Big". http://googleblog.blogspot.com/2008/07/we-knew-web-
iii A list of all of the sub-categories and search terms used in the analysis is available on request.
iv Between 23rd March and 6th May there were 110 polls; on days when multiple polls were conducted we have used the average of all poll results for that day. The 4th April had no poll data; for this day the average of the 3rd and 5th April poll results was used. An alternative method would be to remove this date from the analysis.
v Thomas H. Davenport, 2001. The Attention Economy. Harvard Business School Press.
vi A variant of this analysis would be to include more parties, or an 'other' category. The simplification was performed here to reduce analysis time requirements.
vii Incidentally, the one article was for an 'Election Zombie' video game. http://www.nuts.co.uk/4bd75ad04913c/
viii Tweetminster, 2010. "Is word-of-mouth correlated to General Election results? The results are in."
ix Wassily Leontief, 1966. Input-Output Economics. New York: Oxford University Press.
x Times Online, 30th September 2009. "Analysis: do newspapers win elections?" http://business.timesonline.
xii Tweetminster, 2010. "Is word-of-mouth correlated to General Election results? The results are in."
xiii An example of such use is explained by Stella Creasy, MP, in Walthamstow. http://www.thersa.org/events/
xv journalism.org, 23rd May 2010. "New Media, Old Media: Social media's agenda versus the MSM."
xvii Clarke, Sanders, Stewart and Whiteley, 2010. "Electoral Choice in Britain, 2010: Emerging Evidence From the BES."
Founded in 2004, Onalytica offers a range of solutions designed to provide organisations with
forward-looking insight, enabling users to interpret trends and issues earlier, and make more
informed decisions, sooner.
Onalytica Ltd, 29th Floor, Centre Point, 103 New Oxford Street, London WC1A 1DD Tel: +44 (0) 20 7407 7642 www.onalytica.com