UNIVERSITY COLLEGE LONDON, DIVISION OF PSYCHOLOGY AND LANGUAGE SCIENCE
MSc Cognitive and Decision Sciences 2018-19
Seeking wisdom: Wiser crowd judgements through weighted methods
Ethics approval code: CPB.2013/015
Date of submission: 10th August 2019
Abstract
Introduction
Method
Design
Participants
Procedure
Results
Hypotheses and predicted results
Discussion
Review of hypotheses
Possible explanations for the current findings
Conclusion
References
Appendix
Abstract
Background: How can we make the wisdom of the crowd wiser? This question comes at a time when society faces increasing amounts of misinformation dispersed quickly and at scale through the internet. With the spread of “fake news”, facts no longer seem like facts until they have been verified by an unbiased entity. To cope with this growing volume of information, society is seemingly turning to platforms where “the crowd”, or communities, can help filter out some of this misinformation through social proof techniques. However, these communities have weaknesses; for example, low-quality contributors or spammers can skew the crowd’s judgement. In this study, we examined whether weighted methods can be applied to the crowd to increase its accuracy in judging truthfulness, arguably making it “wiser”.
Method: Through a popular online crowdsourcing platform (Amazon MTurk), participants completed a questionnaire of 120 questions related to current affairs and world news. A repeated measures design was used to detect any differences between the weighted methods.
Results: Applying performance-based weighted methods significantly improved the crowd’s accuracy compared with the purely unweighted “democratic” method. Interestingly, aggregating by confidence significantly decreased accuracy.
Conclusion: These findings suggest that weighting performance-based experts more heavily significantly improves the crowd’s accuracy compared with an exclusively democratic (“unweighted”) aggregation method.
Introduction
Background
The internet has revolutionised the way human society interacts more than any other invention since the printing press, which inaugurated the dissemination of printed information to the masses in Europe between 1450 and 1500 (Dittmar, 2011). In this digital age, the internet is a stage on which the “vox populi” can be heard. Digital platforms that allow communities to be built and individuals to contribute are having a large impact on society. Conversely, the ease of access the internet provides has also made it easier for low-quality contributors and spammers to participate in crowdsourcing events. Howe (2006) coined the term “crowdsourcing”, defined as outsourcing a task or function that would traditionally have been performed by a single agent to an undefined network of labourers. Crowdsourcing is now a popular way to distribute tasks in exchange for a monetary reward or recognition.
The aim of this study is to determine whether the accuracy of crowd-level judgements on a set of claims found in the news and general-knowledge factual statements can be improved by applying various weighted methods to the responses within the crowd.
Wisdom of the crowd
The commonly known “wisdom of the crowd” (WoC) effect is predicated on the belief that a diverse group of independent people can achieve a better result, measured in this study by accuracy, than any of the individuals in the group. James Surowiecki (2005) outlined the WoC phenomenon, describing how, under the right circumstances, collective intelligence can be smarter than the smartest person in the group. Given our limited ability to recall information at will and the bounded rationality of humans highlighted by Simon (1955), the advantages of collective decision-making appear to remain broadly supported (Bonabeau, 2009).
Francis Galton (1907), an English statistician, is credited with discovering the power of crowd wisdom through his analysis of a weight-judging competition held at the annual ‘West of England Fat Stock and Poultry Exhibition’. The competition allowed attendees, mostly experienced butchers and farmers but also people with no specific expertise in cattle raising, to guess the weight of an ox once it had been slaughtered and dressed, for a small entry fee of 6d (sixpence, equivalent to £1.96 in today’s money). Prizes were offered for the most accurate estimates. Galton studied the 787 estimates submitted by the crowd and found that the crowd’s median estimate (1,197 lb) was 1 lb away from the actual weight (1,198 lb). A recent re-examination of Galton’s data identified errors in the original calculations; once these were corrected, the crowd’s mean (as opposed to median) estimate was exactly right (Wallis, 2014).
Galton’s conclusion supported the notion that the correct judgement or decision is more likely to be reached through a democratic mechanism. This has led to further studies seeking ways to enrich the crowd by identifying experts and eliminating the contributions of poor performers (Budescu & Chen, 2014; Drew, 2018; Zhao & Zhu, 2014).
The search for an expert
Experts are typically perceived as people who demonstrate good judgement and high predictive accuracy. According to Merriam-Webster, an expert is “one with the special skill or knowledge representing mastery of a particular subject” (Expert, n.d.). Experts can be hard to come by, and expertise is subjective, highly domain-dependent and relative to the group with which an expert is compared. Nevertheless, many people believe that experts are better than other people or groups at making judgements and decisions, and turn to them for signals in times of uncertainty. For example, a professor of neurology could be deemed an “expert” in neuroscience because of the depth of knowledge built over years of experience, but a novice in sports knowledge if that is not a domain they devote time to or are interested in. An expert’s performance can therefore vary significantly depending on the task. Previous research has argued that experts can be knowledgeable but bad at predicting outcomes (Camerer & Johnson, 1997), and that expectations of expert performance are unproven, making the relationship between expertise and accuracy unpredictable (Hinds, 1999; Norman et al., 1989). However, contrasting research has demonstrated that reliable experts do exist and can perform well; for example, a group of National Weather Service (NWS) forecasters provided reliable probability predictions of precipitation and temperature (Murphy & Winkler, 1977).
Crowd diversity
Group diversity is an important attribute to consider in the context of the wisdom of the crowd, and it offers a counter-argument to the expert judgement view. Research has shown crowd diversity to have a positive effect on performance (Hong & Page, 2004): a team of randomly selected intelligent agents outperformed a team composed of the best-performing agents on problem-solving tasks. That paper focuses exclusively on functional diversity, defined by the agents’ perspectives and heuristics, and is inspired by research in organisational behaviour (Kephart, Hogg & Huberman, 1990; Miller, Burke & Glick, 1998) and psychology (Polzer, Milton & Swann, 2002; Nisbett & Ross, 1980). Cognitive diversity in the crowd is valuable even when seeking experts. Expertise is commonly domain-specific and relative to the size of the crowd: the smaller the crowd, the more significant the role of an expert. A diversified crowd fills the knowledge gaps that a group of experts may leave.
Regarding applications of group diversity, recent research has also studied the effects of diverse leadership on businesses (Noland, Moran & Kotschwar, 2016), finding a positive relationship between the proportion of female leaders and net revenue.
Group diversity and independence can also counter potential “group-think” effects. A high-profile illustration of the adverse effects of group-think is the Challenger disaster: the Challenger, an American space shuttle, exploded 73 seconds after launch, and an investigation into the cause attributed the failure of the launch to group-think. Key traits of group-think include a concurrence-seeking tendency and a homogeneous group with a similar ideology, social background and so on. These can lead to symptoms such as overestimation of the group, closed-mindedness and pressure towards uniformity, which in turn can lead to defective decision-making (Janis, 2008). This highlights the value of crowd diversity. Through crowdsourcing, group-think effects can be circumvented because crowd members should perform tasks independently.
Crowd motivation
When assessing crowdsourcing, it is important to consider the motivation of crowd members. Understanding crowd motivation and structuring incentives accordingly are integral to mitigating biases and attracting high-quality participation. The motivation for this type of participation sits well within the ‘belonging’ and ‘self-esteem’ categories of Maslow’s hierarchy of needs (Maslow, 1943, 1954). A crowd member can be motivated to participate and contribute to a platform in several ways, for example, through an immediate payoff such as a payment (Lakhani & Wolf, 2005) or through delayed payoffs in the form of signalling and stakeholder feedback (Hackman & Oldham, 1980). Further studies have analysed different motivation design approaches in the context of “ubiquitous crowdsourcing” (crowdsourcing on the go through mobile phones) to assess how motivational design affects crowd participation and contribution quality (Goncalves, Hosio, Rogstadius, Karapanos & Kostakos, 2015). That study found a positive effect on participation rates from motivation techniques such as psychological empowerment, self-efficacy and causal importance. However, increased incentives, in particular extrinsic incentives, can have an adverse effect on quality, as discussed in previous papers (Kittur et al., 2013).
Sourcing the crowd
Howe (2006) was the first to coin the phrase “crowdsourcing”, which describes a way in which micro problem-solving tasks can be completed by a distributed network of many agents. Brabham (2013) has also described crowdsourcing as requiring the following ingredients:
1) An organisation that has a task that it needs performed,
2) A community (crowd) that is willing to perform the task voluntarily,
3) An online environment that allows the work to take place and the community
to interact with the organization, and
4) Mutual benefit for the organization and the community
By harnessing collective intelligence, growing companies and industries have been able to leverage wisdom of the crowd effects through the internet. A well-known beneficiary of crowdsourcing is Wikipedia. The highly successful free online encyclopedia has effectively leveraged crowd coordination to curate content at a scale and quality that would be hard for another company to replicate. Wikipedia has succeeded by having a large number of agents contribute to the platform, improving the accuracy and completeness of the encyclopedia while reducing its bias. Wikipedia has over 2.5m pages of information that have had over six million contributors (Kittur & Kraut, 2008), requiring a high degree of coordination to harness the crowd’s wisdom. Although this is a high-touch approach, Kittur and Kraut found that, in this case, organised crowdsourcing was highly effective.
Approaches to judgement aggregation
In a recent study, Budescu and Chen (2014) explored weighted aggregation methods that could improve the crowd’s overall forecast accuracy. The study collected responses on 104 “events” from 1,233 participants. Only 420 participants were included in the data analysis, because participants who responded to fewer than 10 events could unduly skew the crowd’s effectiveness and reduce the efficacy of the study. The approach adopted was to (1) identify experts and (2) disregard non-expert contributions from the overall forecast. The study observed a 39% improvement by applying and periodically re-computing “expert weights” throughout the survey. Applying this approach, removing non-experts, could be highly controversial in a democratic society: it risks displacing so-called “non-experts”, which could reduce the level of participation through their disengagement.
Fact-checking domain
Fact-checking has a long history. Notably, the role of an entity or person carrying out independent validation of claims and facts appears to date back to 1913, when Ralph Pulitzer and Isaac White of the New York World established the ‘Bureau of Accuracy and Fair Play’. The primary goal of this bureau was to track repeated offences of misinformation and disinformation, seeking reprimands or public apologies from the accused (Machor, 2008). In recent times, a significant amount of disinformation and ‘fake news’ has been prevalent in the media, with a perceived major influence on democratic processes such as the 2016 US election and the UK Brexit referendum. Fake news can be defined as news articles or claims that are intentionally and verifiably fabricated and likely to mislead recipients (Allcott & Gentzkow, 2017). The proliferation of misinformation and disinformation on social media platforms has prompted government-led inquiries into the role that social media platforms play in combating ‘fake news’.
In a recent report, Full Fact discussed a fact-checking initiative it runs with Facebook, the ‘Third Party Fact Checking programme’¹, to fact-check posts that Facebook has flagged as possibly false. Flagged posts were added to a queue for fact-checking. Once the content had been checked for misinformation, Full Fact published the outcome and attached the fact-check article to the Facebook post together with a rating: False, Mixture, False Headline, True, Not eligible, Satire, Opinion, Prank generator, or Not rated. Of the 96 fact checks published as part of the programme, 61.4% of the claims were rated as ‘false’.
¹ Report on the Facebook Third Party Fact Checking programme (https://fullfact.org/media/uploads/tpfc-q1q2-2019.pdf).
Research has highlighted some of the effects of fake news: those who are exposed to fake news are likely to believe it (Silverman & Singer-Vine, 2016; Pennycook & Rand, 2018). With today’s technology, the barriers to disseminating information, true or false, to a large number of people are low. This allows individuals to be exposed to fake news more frequently, and psychological experiments have shown that trust increases with familiarity through cognitive fluency (Begg, Anas & Farinacci, 1992; Alter & Oppenheimer, 2009), introducing the risk of familiarity bias. Another example of this fluency effect can be found in a recent study by Pennycook et al. (2018), whose results suggested that social media platforms help to incubate belief in “fake news” and that prior exposure to misinformation can create an illusory truth effect, albeit within a plausibility boundary. The heavy circulation of fake news in 2016 has also led to a decline in trust in mass media amongst American voters, particularly Republicans (Swift, 2016).
What might be an effective solution to the rise of fake news? Three possible solutions are envisaged: (1) reduce the number of fake news stimuli the masses are exposed to through structural changes to social media and other news platforms; (2) empower the crowd to evaluate news and claims effectively, demoting those the crowd deems false; and (3) use machine learning and AI techniques to parse world news and flag misinformation. For the purpose of this paper, I will briefly expand on points 2 and 3.
Empowering the crowd. Independent and non-partisan fact-checking entities (e.g. FullFact, PolitiFact, Snopes and FactCheck) have become increasingly important in recent times. However, independence alone is not sufficient: their sources of funding should also be neutral for them to remain effective and free of outside influence, political or otherwise. There remains an imbalance between the speed at which news is “digitally printed” and human fact-checkers’ ability to perform in-depth investigations. Distributed fact-checking through online crowdsourcing could be a worthwhile solution. Attempts have been made in this domain, in particular a blockchain project called Avow (Shamlo & Alavi, 2018). The Avow project aims to counter disinformation and “fake news” by creating a system that aggregates the crowd’s anonymised opinions on various claims and news items, rewarding contributors with ERC-20² Ethereum³ cryptocurrency tokens. This approach uses a novel technology with an incentive system to encourage participation for greater social impact. However, recent cryptocurrency price volatility, a challenge that many blockchain projects encounter, may require further thought.
Wisdom of the machines. The relentless rate at which news is published, compounded by fragmented distribution channels on the internet, makes fact-checking at scale a practically impossible task for humans. A recent MIT study (Baly et al., 2018), inspired by previous research (Horne et al., 2018a, 2018b), tackles the problem of exponentially growing “fake news” by using a machine learning system to assess whether a news source is factually accurate or politically biased. Collecting information from multiple online sources such as Wikipedia, Twitter, the articles themselves and Alexa Rank⁴, the system is trained on a rich set of features covering 1,066 news sources. The research found that whether a news source has a Wikipedia page can help predict factuality but not political bias, and that analysing the source’s Twitter account (not its tweets) does not provide any significant indication of factuality or bias. The best-performing features came from the articles on the source’s website, which highlights the importance of analysing the content of the news source; analysing article titles alone was not strong enough. There is clearly a case, and a need, for leveraging AI techniques to tackle the scale of “fake news”, and this study suggests that assessing the news source could reduce the need for claim-by-claim assessment if bias and factuality scores were easily visible on all publishing platforms.
² ERC-20 is a technical standard for smart contracts used on the Ethereum blockchain (https://en.wikipedia.org/wiki/ERC-20).
³ Ethereum is an open-source, public, blockchain-based distributed computing platform (https://en.wikipedia.org/wiki/Ethereum).
⁴ Alexa is a website traffic data provider that ranks websites (https://www.alexa.com/siteinfo).
Confident crowds
Can a confident crowd be trusted? A recent study by Aydin and colleagues (2014) found that taking participant confidence into account can significantly improve accuracy, particularly when performance is combined with confidence. Previous studies have also shown that expert confidence should be treated with caution, as experts have been found to overestimate their capabilities; for example, Glenberg and Epstein (1987) found that physics and music experts overestimated their ability to understand texts associated with physics or music, respectively, compared with novices.
Overview of the current experiment
A related previous study (Drew, 2018) was unable to find any significant difference in crowd accuracy from applying performance-based weighted methods to the crowd. This appeared to be primarily due to a small sample (n = 36) that required non-parametric analysis. Drew also used a 5-point Likert scale of truthfulness, which resulted in part of the sample being removed, as “neutral” truthfulness responses were believed not to provide a strong enough signal. The current study therefore aims to address these limitations by increasing the sample size (n = 113), increasing the question set (n = 120) and replacing the 5-point Likert scale with a binary ‘mostly true’ or ‘mostly false’ choice. The effects of these changes are unknown; however, the literature reviewed above provides good reason to hypothesise the following:
(i) The crowd’s accuracy improves when low performers are removed.
(ii) The crowd’s accuracy improves when expert judgements are given the most weight, while “crowd diversity” is deliberately maintained.
(iii) The crowd’s accuracy improves when only confident answers are considered.
The current experiment aims to test these hypotheses using questionnaire accuracy as the target variable. Given that the limitations of the previous related study appeared to stem mainly from its sample size, which reduced the achieved power and produced data that required non-parametric analysis (Drew, 2018), we aim to overcome these limitations by increasing the sample size to 113 (previously 37) and the question set to 120 (previously 90).
Method
Design
For the purpose of this study, a repeated measures design was applied. Participants’ truthfulness assessments and confidence ratings for a set of 120 ground-truth claims were used to examine whether weighted methods can improve the accuracy of the group’s overall performance. The claims, presented to participants as a survey, related to general knowledge (e.g. society and culture) and current affairs. They covered a broad range of news categories, the top five being economics (16%), politics (12.5%), science (11.6%), health (10.83%) and immigration (6.67%). All questions were taken from highly reputable sources, with a strong skew towards independent fact-checking organisations, and each claim was carefully assessed qualitatively for potential biases. 61% of the claims related to UK and USA news. The survey was slightly imbalanced in favour of true statements (52.5%) over false statements (47.5%). Each page presented a single short claim, of the kind typically seen as a headline or tweet, and participants responded with whether they thought the claim was ‘mostly true’ or ‘mostly false’, together with their level of confidence, ranging from ‘unconfident’ to ‘very confident’ (see Figure 1).
Figure 1. An example of the format of the question page for each claim.
A large majority of the claims were chosen from unbiased and rigorous fact-checking experts. Broader news sources were also used to avoid creating a very niche knowledge quiz: basing the survey only on fact-checking output, which is predominantly skewed towards political fact-checking, would have produced a very narrow and nuanced survey for participants. Fact-checking organisations and other sources included ‘BBC Reality Check’, ‘Snopes’, ‘FullFact’, ‘PolitiFact’, ‘Fact Check’, ‘Channel 4’, ‘Washington Post’, ‘Africa Check’ and ‘World Bank’.
We decided not to show participants their results relative to other participants, or their performance midway through the questionnaire, as we were not interested in the effects of this type of stimulus on participants’ performance.
Self-reported questionnaire
Before the questionnaire, all participants were required to answer a number of self-report questions. These covered demographic information (age, nationality, gender, religiosity, education level and occupation). Participants were also asked ‘how happy do you feel today?’, inspired by research showing a positive correlation between emotional intelligence and performance (Schutte, Schuettpelz & Malouff, 2001). Participants were also required to indicate on a five-point Likert scale their attitude on free markets, Brexit vote, frequency of news consumption and medium of news consumption. Another set of self-report questions focused on participants’ level of expertise in news domains: general news, economics and politics, science & health, pop culture, international affairs, crime and art.
Participants
113 UK-based participants (62 male, 51 female) were recruited through the popular crowdsourcing platform Amazon Mechanical Turk (MTurk). The crowd was incentivised with immediate extrinsic motivation in the form of a $5.00 monetary reward on completion of the survey, which took approximately 21 minutes on average (equivalent to about $14.29/h). To incentivise high performance, an additional $20.00 Amazon voucher bonus was offered to the three highest-scoring participants. This pay rate is notably higher than the median wage of $1.38 to $2.30/h that previous research has found MTurk crowd workers to earn (Horton & Chilton, 2010; Hara, 2018). The average age of the participants was 32.80 years (SD = 8.64, range = 18 to 67 years). Participants were only eligible if they were aged 18 or over and spoke English as their first language or with equivalent fluency. Participants were given a brief outlining the general objective of the survey: we were assessing their ability to judge the truthfulness of information relating to news and current affairs, in order to apply methods for validating information more efficiently and accurately. To prevent bots from participating, participants had to pass a CAPTCHA task before the survey began. Participants were also required to have had 100% of their previous “Human Intelligence Tasks” (“HITs”) approved, a criterion recommended on online forums to deter uncommitted participants. The study was approved by the UCL Ethics Board Committee (ethics approval code: CPB/2013/015).
Procedure
Participants were recruited on the widely used crowdsourcing “knowledge-worker” platform Amazon Mechanical Turk (www.mturk.com) and, if they met the eligibility criteria (UK-based, with a 98% approval rate), were directed to the survey brief and questionnaire, which were hosted on the Gorilla platform. All participants were presented with news headlines, quotes and factual statements, with a binary choice of whether each statement was ‘mostly true’ or ‘mostly false’, along with their level of confidence in their response. Each claim was presented on a new page, and participants were given no ‘back’ option to revise their responses. Based on the learnings of the previous study (Drew, 2018), it was necessary to limit the truthfulness response to two binary options rather than the 5-point Likert scale used in that research, which had resulted in a loss of sample data because neutral responses gave insufficient direction. Confidence was rated on a 5-point Likert scale ranging from ‘very confident’ to ‘unconfident’. Collecting two data points per claim made it possible to assess an additional dimension of the wisdom of the crowd effect. Participants indicated their choices by clicking on them.
Splitting the question set
To establish the most efficient number of questions for each of the two sets, 120 simulations were performed on the complete question set. The objective of this simulation was to find the point beyond which adding more questions yielded diminishing improvement in the crowd’s accuracy. Each simulation run drew a random sample of questions and assessed the crowd’s accuracy on it, with one additional question added in each successive run. Figure 2 shows crowd accuracy stabilising between 40 and 60 questions; the average accuracy within this range was 0.609. Because efficiency in expert classification is a key consideration in this study (identifying the experts within the crowd with the fewest questions seemed highly beneficial for applying the wisdom of the crowd effect), 40 of the 120 questions were chosen to form the first set. This choice was also supported by a second test, which took the mean difference in accuracy between Set A (x) and Set B (y). Three candidate splits (20/100, 40/80 and 60/60) were used to test the mean difference between the two question sets. There was a significant difference between the question splits, F(2, 336) = 9.83, p < .001, and the standard error was lowest for the 40/80 split (SE = 0.0532), which was consistent with the visual inspection of the graph (see Table 1 and Figure 3).
Figure 2. Line graph showing the level of accuracy stabilising as more questions are added to the simulation group.
Figure 3. Box plot showing the range of the mean difference for each question split.
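The two checks described in “Splitting the question set” can be sketched in Python. This is a minimal sketch on synthetic data, not the study’s analysis code: the synthetic crowd (a per-claim “ease” plus a personal skill offset), the helper names `crowd_accuracy`, `accuracy_curve` and `split_differences`, the rule that a tied vote counts as incorrect, and the use of per-participant Set A minus Set B differences for the standard error are all assumptions made for illustration.

```python
import math
import random
import statistics


def crowd_accuracy(responses, truth, questions):
    """Share of `questions` on which the unweighted crowd vote matches the truth.

    responses[p][q] and truth[q] are +1 ('mostly true') or -1 ('mostly false').
    """
    questions = list(questions)
    correct = 0
    for q in questions:
        total = sum(r[q] for r in responses)
        crowd = (total > 0) - (total < 0)  # sign of the vote; a tie (0) never matches
        correct += crowd == truth[q]
    return correct / len(questions)


def accuracy_curve(responses, truth, seed=0):
    """Run k = 1..n: draw k random questions and record crowd accuracy on them."""
    rng = random.Random(seed)
    n = len(truth)
    return [crowd_accuracy(responses, truth, rng.sample(range(n), k))
            for k in range(1, n + 1)]


def split_differences(responses, truth, n_rank, seed=0):
    """Per-participant accuracy on a random Set A of `n_rank` questions minus
    accuracy on the remaining (Set B) questions."""
    rng = random.Random(seed)
    order = list(range(len(truth)))
    rng.shuffle(order)
    set_a, set_b = order[:n_rank], order[n_rank:]

    def acc(r, qs):
        return sum(r[q] == truth[q] for q in qs) / len(qs)

    return [acc(r, set_a) - acc(r, set_b) for r in responses]


# Hypothetical synthetic crowd: 113 participants x 120 claims. Each claim has an
# "ease" (chance a typical participant is right), shifted by a personal skill offset.
rng = random.Random(42)
truth = [rng.choice([1, -1]) for _ in range(120)]
ease = [rng.uniform(0.3, 0.9) for _ in range(120)]
skill = [rng.gauss(0.0, 0.1) for _ in range(113)]
responses = [[t if rng.random() < min(max(e + s, 0.0), 1.0) else -t
              for t, e in zip(truth, ease)]
             for s in skill]

curve = accuracy_curve(responses, truth)  # inspect where the curve levels off
for n_rank in (20, 40, 60):
    d = split_differences(responses, truth, n_rank)
    se = statistics.stdev(d) / math.sqrt(len(d))
    print(f"{n_rank}/{120 - n_rank}: mean diff {statistics.mean(d):+.3f}, SE {se:.4f}")
```

With real responses, `curve` would be plotted against the number of questions to look for the plateau; on this synthetic crowd the printed numbers only illustrate the procedure.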
Comparison of weighted methods effects
For the purpose of this study, five different weighted aggregation methods were selected to explore the crowd’s wisdom in an attempt to improve on the mean participant accuracy of 61% (refer to Table 1). To rank the participants, performance was measured on a random set of 40 questions (‘Set A’), which served as the “rank set”. Participant weights were calculated from individual performance on Set A and then used to aggregate participants’ responses on the second set of 80 questions (‘Set B’). Set A and Set B questions were kept mutually exclusive. Aggregation produced a binary crowd response, mostly true or mostly false, for each question. The crowd’s responses were then compared with the expected answers, grounded by fact-checking sources and implausible facts, and an overall accuracy score was calculated on the complete set of 80 questions. Given the large number of possible splits of the question dataset, $C(n, r) = C(120, 40) \approx 1.1456 \times 10^{32}$, this process was simulated one thousand times to reduce the likelihood of high variance and increase reproducibility. Table 1 summarises the mean accuracy of each weighted method; these means were then compared statistically.
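The size of the split space and the repeated A/B simulation can be sketched as follows. This is an illustrative sketch under stated assumptions: `simulate_method` and its `weight_fn` argument are hypothetical names, a tied weighted vote is counted as incorrect, and a synthetic crowd stands in for the real responses.

```python
import math
import random
import statistics

N_QUESTIONS, N_RANK = 120, 40

# Number of distinct 40-question rank sets (Set A) that can be drawn from 120.
n_splits = math.comb(N_QUESTIONS, N_RANK)
print(f"{n_splits:.4e}")  # → 1.1456e+32


def simulate_method(responses, truth, weight_fn, n_runs=1000, seed=0):
    """Mean Set B accuracy of a weighted crowd over `n_runs` random A/B splits.

    weight_fn (hypothetical interface) maps Set A accuracies to weights w_p^a.
    """
    rng = random.Random(seed)
    questions = list(range(len(truth)))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(questions)
        set_a, set_b = questions[:N_RANK], questions[N_RANK:]  # mutually exclusive
        acc_a = [sum(r[q] == truth[q] for q in set_a) / N_RANK for r in responses]
        weights = weight_fn(acc_a)
        hits = 0
        for q in set_b:
            total = sum(w * r[q] for w, r in zip(weights, responses))
            hits += ((total > 0) - (total < 0)) == truth[q]  # ties count as wrong
        scores.append(hits / len(set_b))
    return statistics.mean(scores)


# Hypothetical synthetic crowd: 113 participants, each right with probability 0.6.
rng = random.Random(7)
truth = [rng.choice([1, -1]) for _ in range(N_QUESTIONS)]
responses = [[t if rng.random() < 0.6 else -t for t in truth] for _ in range(113)]

# Equal weights reproduce the unweighted method; fewer runs keep the demo fast.
unweighted = simulate_method(responses, truth, lambda acc: [1.0] * len(acc), n_runs=50)
print(round(unweighted, 3))
```

Because the seed fixes the sequence of shuffles, passing a different `weight_fn` with the same seed evaluates every method on identical random splits, which keeps the comparison between methods paired.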
For the purpose of data analysis, each participant’s individual response $x_p^q$ (participant $p$, question $q$) was encoded as 1 (‘mostly true’) or -1 (‘mostly false’). Depending on the weighted method, a weight $w_p^a$, based on participant $p$’s relative performance on Set A, was then applied to the participant’s response, and the aggregate response of the participants, $\sum_p w_p^a x_p^q$, would indicate the crowd’s answer (positive for mostly true, negative for mostly false).
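The encoding and aggregation rule can be written as a short Python function. This is a minimal sketch, assuming the crowd’s answer is the sign of the weighted sum; `ENCODING` and `crowd_answer` are illustrative names, not taken from the study’s code.

```python
ENCODING = {"mostly true": 1, "mostly false": -1}  # x_p^q for each answer


def crowd_answer(weights, answers):
    """Sign of sum_p w_p^a * x_p^q for one question q.

    Returns +1 (mostly true), -1 (mostly false) or 0 for an exact tie.
    """
    total = sum(w * ENCODING[a] for w, a in zip(weights, answers))
    return (total > 0) - (total < 0)


answers = ["mostly true", "mostly false", "mostly true"]
print(crowd_answer([1.0, 1.0, 1.0], answers))  # → 1: equal weights, majority says true
print(crowd_answer([1.0, 3.0, 1.0], answers))  # → -1: one heavily weighted dissenter wins
```

The second call shows why weighting matters: a single participant with three times the weight can overturn a two-to-one majority.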
Unweighted method (UW)
For this method, all participants have equal weight, and the mean response of all participants to each question was taken as the crowd’s response. For example, if 75 of 113 participants voted for a claim being mostly true, the result would be (75 - 38)/113 ≈ 0.327. A positive mean indicates that the crowd believes the claim to be mostly true, and a negative mean that the crowd believes it to be mostly false. The same aggregation approach applies to all tested weighted methods.
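The unweighted example can be checked with a few lines of Python; `unweighted_response` is an illustrative name, not the study’s code.

```python
def unweighted_response(votes):
    """UW: mean of +1/-1 votes with every participant weighted equally."""
    return sum(votes) / len(votes)


votes = [1] * 75 + [-1] * 38  # 75 'mostly true' and 38 'mostly false' (n = 113)
m = unweighted_response(votes)
print(f"{m:.3f}")  # → 0.327
print("mostly true" if m > 0 else "mostly false" if m < 0 else "tie")  # → mostly true
```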
Performance-weighted method (PW)
Participants who were wrong more often than they were correct (i.e. scored below .50 accuracy on Set A) were given a zero weight, $w_p^a = 0$, when aggregating the crowd's response on Set B.
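A minimal sketch of the PW weights, assuming Set A accuracy is available as a proportion per participant. The text only specifies the zero weight; giving everyone else a weight of 1, and keeping participants at exactly .50, are assumptions.

```python
def performance_weights(set_a_accuracy, threshold=0.5):
    """PW sketch: participants below `threshold` accuracy on Set A get weight 0.

    Assumption: everyone else (including exactly .50) keeps weight 1.
    """
    return [0.0 if acc < threshold else 1.0 for acc in set_a_accuracy]


print(performance_weights([0.45, 0.50, 0.72]))  # [0.0, 1.0, 1.0]
```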
Expert weighted method (ExW)
Participants identified as “top performers”, defined as those scoring in the top 25% (at or above the 75th percentile) on Set A, were given a performance-based weight, which was used when aggregating the crowd's responses on Set B.
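The ExW selection step might look like the sketch below. The 75th-percentile cut point uses Python's `statistics.quantiles` (its default 'exclusive' method), which may differ slightly from whatever percentile routine the study used. Because the text does not define the “performance-based weight”, using an expert's Set A accuracy as their weight, and zero for everyone else, is an assumption.

```python
from statistics import quantiles


def expert_weights(set_a_accuracy):
    """ExW sketch: experts are participants at or above the 75th percentile of Set A accuracy.

    Assumptions: an expert's weight is their own Set A accuracy; non-experts get 0.
    """
    q3 = quantiles(set_a_accuracy, n=4)[2]  # third quartile = 75th-percentile cut point
    return [acc if acc >= q3 else 0.0 for acc in set_a_accuracy]


accs = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
print(expert_weights(accs))
```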
Confidence weighted method (CW)
Participants' confidence ratings were normalised by applying a z-score within each participant across their Set B responses. The normalised confidence was then multiplied by the participant's response to that question, which was -1 for ‘mostly false’ and +1 for ‘mostly true’. A large negative value would therefore indicate that the participant is relatively confident the claim is ‘mostly false’. The crowd's answer was then taken as mostly true or mostly false depending on the sign of the aggregated confidence-weighted responses.
Z-score weighted methods (W)
A z-score $z_p^a$ was calculated on each participant's accuracy on Set A and then used as an exponent to weight their responses on Set B, giving a weight of $x^{z_p^a}$ for base $x$. By adjusting the base $x$ on a scale (1, 4, 5, 6, 7, 1000), we were able to explore the optimal relative weight for each participant. For example, W1 results in $1^{z_p^a}$ and W5 results in $5^{z_p^a}$; because $1^{z} = 1$ for every participant, W1 is equivalent to the unweighted method.
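The z-score weighting can be sketched as follows, assuming the z-score is taken across participants' Set A accuracies with the population standard deviation (the text does not say which). `zscore_weights` is a hypothetical name; `base` corresponds to the number in the method labels W1 to W1000.

```python
from statistics import mean, pstdev


def zscore_weights(set_a_accuracy, base):
    """W-method sketch: weight_p = base ** z_p, where z_p is participant p's
    z-scored Set A accuracy relative to the whole crowd.

    Assumption: population standard deviation; if everyone scores the same,
    all weights are 1.
    """
    mu = mean(set_a_accuracy)
    sd = pstdev(set_a_accuracy)
    if sd == 0:
        return [1.0] * len(set_a_accuracy)
    return [base ** ((acc - mu) / sd) for acc in set_a_accuracy]


accs = [0.50, 0.60, 0.70]
for b in (1, 4, 5, 6, 7, 1000):
    print(b, [round(w, 3) for w in zscore_weights(accs, b)])
```

Above-average participants get weights above 1 and below-average participants weights below 1, and a larger base widens that gap exponentially, which is why very large bases concentrate almost all influence on the top performers.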
Results
Hypotheses and predicted results
Hypothesis one (H1) predicts that the crowd’s accuracy improves when the influence of low-performers is decreased. Hypothesis two (H2) predicts that the crowd’s accuracy improves when expert judgements are heavily weighted. Hypothesis three (H3) predicts that there is an optimal intermediate weighting method between expert weighting and crowdsourcing (“unweighted”). Hypothesis four (H4) predicts that the crowd’s accuracy improves when the participants’ confidence is taken into consideration. In summary:
(H1) The crowd’s accuracy improves when decreasing the influence of low-performers.
(H2) The crowd’s accuracy improves when expert judgements are heavily weighted.
(H3) There is an optimal intermediate weighting method between expert weighting and the crowdsourcing (“unweighted”) method.
(H4) The crowd’s accuracy improves when taking the participants’ confidence into consideration.
Normality analysis
Several checks were conducted to assess whether the scores were Gaussian and therefore suitable for parametric tests; the main concern was the distribution of the crowd’s accuracy. Figure 4 shows the distribution of crowd accuracy, which visually appears Gaussian, and the Q-Q plot in figure 5 supports this. As previous research has shown the Shapiro-Wilk test to be a reliable test of normality (Mendes & Pala, 2003), we also conducted a Shapiro-Wilk test (W = 0.987, p = 0.367). The non-significant result indicates no detectable departure from normality, so parametric statistical methods were used.
Figure 4. Distribution of crowd accuracy. The histogram has the approximate shape of a Gaussian distribution.
Figure 5. Q-Q plot of crowd accuracy. The plot shows no marked skew in the accuracy.
Participant characteristics
The mean participant accuracy was 0.60 (min = 0.44, max = 0.77, SD = 0.07). Given that the questions were predominantly related to British and American news, it was necessary to check whether participants of other nationalities were disadvantaged. There was no significant difference between British participants (M = 0.60, SD = 0.07) and non-British participants (M = 0.60, SD = 0.07); a two-sample t-test showed no clear evidence of a meaningful difference, t(19) = -0.04, p = .48, d = .01.
The majority of participants had a high level of education (88% held A-Levels or equivalent or above). Of the participants, 5.3% were students and 81% were employed (including full-time, part-time and self-employed). The largest share of participants had British nationality (86%), but the overall sample was diverse, with a total of 14 different nationalities represented.
Defining an expert
After carrying out pairwise t-tests on different percentile thresholds, we found no significant difference between defining experts as those who scored in the top 15% and those who scored in the top 25%, F(1, 999) = 3.25, p = .071, $\eta_p^2$ = 0.03. The greatest mean difference from the unweighted method was observed when experts were defined using the top-25% threshold.
Repeated-measures ANOVA analysis
To test whether applying weighted methods had a statistically significant effect on accuracy, repeated-measures ANOVAs were used to compare mean accuracy across methods, with each of the 1,000 simulations treated as a repeated measurement. The analysis was carried out separately within two distinct groups: group 1 (UW, PW, ExW) and group 2 (W1, W4, W5, W6, W7, W1000). Refer to table 2 and figure 6.
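The repeated-measures ANOVA can be sketched in plain Python, treating each simulation run as the repeated unit. With n runs and k methods the degrees of freedom are (k - 1, (k - 1)(n - 1)); for example, 6 methods and 1,000 runs give (5, 4995), as reported for group 2. This sketch applies no sphericity correction and omits the Tukey post-hoc tests; the text does not state whether a correction was used, and `rm_anova` is a hypothetical helper name.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores[i][j] is the accuracy of method j in simulation run i; runs act as
    the repeated 'subjects'. Returns (F, df_effect, df_error, partial_eta_sq).
    """
    n = len(scores)      # number of runs
    k = len(scores[0])   # number of weighted methods
    grand = sum(map(sum, scores)) / (n * k)
    method_means = [sum(row[j] for row in scores) / n for j in range(k)]
    run_means = [sum(row) / k for row in scores]

    ss_methods = n * sum((m - grand) ** 2 for m in method_means)
    ss_runs = k * sum((r - grand) ** 2 for r in run_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_methods - ss_runs  # residual after removing run-to-run variation

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_methods / df_effect) / (ss_error / df_error)
    partial_eta_sq = ss_methods / (ss_methods + ss_error)
    return f_stat, df_effect, df_error, partial_eta_sq


# Toy example: 3 runs x 2 methods.
print(rm_anova([[1, 2], [2, 4], [3, 3]]))  # (3.0, 1, 2, 0.6)
```

Removing the run-to-run variation (`ss_runs`) from the error term is what distinguishes this from an ordinary between-groups ANOVA, and it is why paired simulation runs give the analysis so much sensitivity.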
Figure 6. Mean accuracy (over 1,000 simulations) of each weighted method, with error bars.
Group 1 analysis
The repeated-measures ANOVA showed a significant main effect of weighted method on crowd accuracy, F(3, 2997) = 700.21, p < .001, $\eta_p^2$ = 0.524. Post-hoc comparisons using Tukey’s HSD indicated that the expert method (ExW) produced the largest improvement in accuracy over the unweighted (UW) method (p < .001). The results also showed that weighting purely on the top performer decreased overall accuracy compared with the unweighted method. Taken together, the group 1 results support the hypothesis that applying weighted methods to the crowd can lead to better outcomes.
Figure 7. Density plot of accuracy for each weighted method (excluding the z-score weighted methods; see figure 8).
Group 2 analysis
The repeated-measures ANOVA showed there was a significant main effect of
weighted methods on crowd accuracy, ​F​(5, 4995) = 895.82,​ p​ < .001, = 0.431. To furtherρη 2
explore what specifically is significant within the test a post-hoc analyses were performed
using Tukey’s HSD indicated that applying a z-score weight of 4,5 or 6 significantly
26
improves accuracy compared to an unweighted method (W1) (p = <.001). The results also
showed there was no significant difference between W4, W5, W6 and W7, and accuracy
significantly decreases when experts are disproportionately weighted (w1000). This result
supports the prediction that accuracy improves when decreasing the weight of
low-performers (H1), while also providing an insight on (H2) showing that there is an
optimal limit for how much experts should be weighted and heavily weighting on experts has
an adverse effect on the crowd’s wisdom.
Figure 8. Density plot of accuracy for each z-score weighted method.
Expert analysis
A further examination was carried out on the weighted methods that significantly improved crowd accuracy compared with the unweighted approach in both groups (ExW, W4, W5, W6, W7). A repeated-measures ANOVA showed a significant main effect of expert-weighted method on crowd accuracy, F(4, 3996) = 15.16, p < .006, $\eta_p^2$ = 0.15. Post-hoc analyses using Tukey’s HSD indicated a significant difference only between the W4 and W7 weighted methods.
Confidence analysis
An additional test was performed to check whether confidence weighting (CW) affects crowd accuracy. Participants were required to rate their confidence from 1 to 5 along with their answer (mostly true or mostly false). To address individual confidence biases, we applied a z-score normalisation to each participant's confidence ratings across their Set B questions, then used the z-scored confidence value as the exponent of a base of 2. This produced a strictly positive confidence weight, which was then multiplied by the participant's response to that question: -1 for ‘mostly false’ and +1 for ‘mostly true’. For example, a large negative weighted response would indicate that the participant is relatively confident that the claim is ‘mostly false’. The crowd's answer was then taken as mostly true or mostly false depending on the sign of the aggregate. The results showed a significant decrease in crowd accuracy for the confidence-weighted method (M = .52, SD = .032) compared with the unweighted method (M = .71, SD = .029), F(1, 999) = 17228, p < .001.
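A sketch of the CW weighting for a single participant, following the description above (within-participant z-score, base-2 exponent, sign taken from the response). The helper names are hypothetical, and using the population standard deviation and assigning z = 0 when a participant's confidence never varies are assumptions.

```python
from statistics import mean, pstdev


def confidence_weighted_votes(responses, confidences):
    """CW sketch for one participant's Set B answers.

    responses:   +1 ('mostly true') / -1 ('mostly false') per question.
    confidences: 1-5 ratings per question.
    Each rating is z-scored within the participant, turned into a strictly
    positive weight 2 ** z, and multiplied by the response.
    """
    mu = mean(confidences)
    sd = pstdev(confidences)
    # Assumption: constant confidence gives z = 0, i.e. weight 1 for every answer.
    zs = [(c - mu) / sd if sd else 0.0 for c in confidences]
    return [r * 2 ** z for r, z in zip(responses, zs)]


def cw_crowd_answer(signed_votes_for_question):
    """Crowd verdict for one question from the sign of the summed weighted votes."""
    return 1 if sum(signed_votes_for_question) > 0 else -1


print(confidence_weighted_votes([1, -1, 1], [5, 3, 1]))
```

Because each participant's ratings are centred on their own mean, habitually over-confident and under-confident raters end up on a comparable scale before aggregation.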
Power analysis
The simulation produced 1,000 results per weighted method (N = 1000), which played a key part in detecting significant differences with the repeated-measures approach. A post-hoc power analysis indicated a large effect size for groups 1 and 2 (d = 1.0), with achieved power above the recommended .80 level (Cohen, 1988).
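The text does not say how power was computed. As an illustration only, the sketch below uses a normal approximation for a two-sided paired comparison (a swapped-in technique; exact power would use the noncentral t distribution). With d = 1.0 and 1,000 paired runs, the approximate power is essentially 1. `approx_power` is a hypothetical helper name.

```python
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided paired (one-sample) test via the normal distribution.

    d: standardised effect size (Cohen's d) of the paired differences.
    n: number of paired observations (here, simulation runs).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # expected value of the test statistic under the alternative
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(approx_power(1.0, 1000))
print(approx_power(0.5, 32))
```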
Self-reported experts
We also examined self-reported expertise. For each of seven domains (general news, economics and politics, science and health, pop culture, international affairs, crime and art), participants rated their own knowledge on a 5-point Likert scale ranging from ‘very poor’ to ‘excellent’, giving a total knowledge score with a maximum of 35. Self-reported experts were defined as participants whose knowledge score was above the mean (M = 23.93, max = 35, SD = 4.36). Self-reported experts were significantly less accurate (M = .585, SD = .074) than self-reported non-experts (M = .625, SD = .057), t(111) = -3.23, p < .002. These results suggest that a more conservative crowd may perform better than a more confident one.
Relationship between performance and level of education
Participants were asked “What is the highest degree or level of school you have completed? If currently enrolled, the highest degree received.” and were given 6 options ranging from ‘GCSE (or equivalent)’ to ‘Doctorate’. A Welch’s t-test showed a significant difference in performance between less formally educated participants, defined as those with or studying GCSEs (or equivalent) or A-levels (or equivalent) (M = 0.62, SD = 0.044, n = 29), and participants with Bachelor degrees or higher (M = 0.56, SD = 0.076, n = 84); t(111) = -2.18, p = 0.0313. These results suggest that, in this sample, higher formal education was associated with poorer news judgement. This appears counterintuitive, particularly because the “highly educated” group was the larger crowd. That said, the “highly educated” group contained many low-performing participants.
Relationship between performance and happiness
Previous studies have shown a positive correlation between happiness and performance (Schutte, Schuettpelz & Malouff, 2001). Participants were asked their level of happiness before the questionnaire commenced. A Welch’s t-test showed a significant difference between participants who responded ‘OK’, ‘slightly unhappy’ or ‘very unhappy’ (M = 0.62, SD = 0.062) and those who reported a slightly happier emotional state (M = 0.59, SD = 0.072); t(111) = -2.90, p = 0.004. These results suggest that, in this sample, participants reporting a neutral or negative mood performed better on the news-judgement questions than those reporting a more positive mood. A Welch’s t-test was used because Levene’s test showed a significant difference between the group variances, F(1, 92) = 14.11, p < .001.
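Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be computed from group summary statistics, as in the sketch below. `welch_t` is a hypothetical helper; it returns the statistic and degrees of freedom only, not a p-value.

```python
from math import sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    m, s, n: mean, standard deviation and size of each group.
    """
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


print(welch_t(1.0, 1.0, 2, 0.0, 1.0, 2))  # (1.0, 2.0)
```

Unlike Student's t-test, the Welch degrees of freedom are generally not an integer and are at most n1 + n2 - 2, shrinking towards the smaller group's n - 1 as the variances diverge.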
Relationship between performance and age
We were interested to see whether there would be a significant difference in performance between age groups. Participants were split into two groups at the average age of 32 years. An independent-samples t-test showed no significant difference between under-32s (M = 0.61, SD = 0.063) and over-32s (M = 0.59, SD = 0.079); t(90) = 1.24, p = 0.21.
Relationship between performance and Brexit vote
A Welch’s t-test showed no significant difference in performance between ‘leavers’ (M = 0.59, SD = 0.055) and ‘remainers’ (M = 0.60, SD = 0.073). A Welch’s t-test was used because Levene’s test showed a significant difference between the group variances, F(1, 92) = 4.91, p = 0.02.
Relationship between performance and gender
An independent-samples t-test showed no significant relationship between performance and gender: there was no significant difference in the performance of males (M = 0.60, SD = 0.064) and females (M = 0.61, SD = 0.08); t(111) = 0.68, p = 0.49.
Relationship between performance and religiosity
Participants were asked “how religious are you?” and given a 5-point Likert scale from ‘Not religious at all’ to ‘Very religious’. Participants who answered ‘neutral’ were excluded from both groups. An independent-samples t-test showed that religious participants (M = 0.57, SD = 0.06) performed significantly worse than non-religious participants (M = 0.64, SD = 0.05); t(94) = 6.46, p < .001. The t value is well above the critical value of 1.98 (α = .05, two-tailed), so the difference is significant.
Performance of frequent vs. infrequent news readers
Participants were asked “on average, how often do you read/watch the news?” and given 5 choices from ‘Rarely or never’ to ‘More than once a day’. An independent-samples t-test showed no significant difference in performance between frequent news readers (M = 0.60, SD = 0.07, n = 108) and infrequent news readers (M = 0.58, SD = 0.047, n = 5); t(111) = 0.60, p = 0.54.
Figure 9. Box plots of the group comparisons. The y-axis represents accuracy and the x-axis represents groups.
Discussion
Review of hypotheses
Our study found a significant improvement in accuracy when applying weighted methods to crowd aggregation, in particular when applying higher weights to those deemed “experts” within the crowd (H1 & H2). We also found a similar positive effect on crowd accuracy when low-performing “non-experts” were either given no weight (ExW) or down-weighted on an exponential scale relative to high performers (W5, W6, W7). We also found evidence of an optimal intermediate weighting method between expert and unweighted aggregation (H3). However, aggregating participant-normalised confidence (H4) had the opposite effect, reducing accuracy compared with the unweighted method.
Possible explanations for the current findings
This study yielded support for the hypothesis that weighted methods can make the crowd wiser. Measuring “crowd wisdom” as overall questionnaire accuracy, we explored various aggregation methods for defining experts and weighting them relative to non-experts. The group 2 analysis proved fruitful. By applying a more objective, z-score-based method, as opposed to the more subjective thresholds used in group 1, we were able to explore an optimal weight that significantly improved the crowd's accuracy, particularly compared with the unweighted method. When we explored the “weight space” for experts, we observed a loss in accuracy with a base number greater than 5 and a significant drop in accuracy in the extreme case of $x = 1000$. This method is highly sensitive because of its exponential form. Due to time constraints, we were unable to find the exact optimal base number, but the results indicated that it lies around a base of 4 (W4). These results also showed that over-dependence on experts can have a negative effect on crowd accuracy (for example, W1000). One explanation could be that although experts may perform better overall, their gaps in knowledge may be domain-specific, and the crowd can compensate for them. For example, an academic may have very little knowledge of sports and business, whereas a “generalist” in the crowd may possess such knowledge. This may be particularly true of this fact-checking survey, as the questions were spread across multiple news categories, making consistency a challenge for domain experts.
The crowd confidence weighted method (CW) did not increase the accuracy of the crowd and was, in fact, the weakest performing weighted method. A normalisation was required to remove participants' individual confidence biases as much as possible. The decrease in crowd accuracy could be due to low-performing participants being over-confident in incorrect decisions and experts being conservative, a pattern known as the Dunning-Kruger effect (Kruger & Dunning, 1999; Hodges, Regehr & Martin, 2001). A consistent pattern appeared in the self-reported expertise data, where self-reported experts were less accurate than non-experts. The confidence results contradict the conclusions of Aydin et al. (2014), who found that aggregating only the ‘certain’ opinions and weighting them by degree of confidence was an effective method that resulted in higher accuracy. Interestingly, we observed higher accuracy in participants who were in a negative mood than in those who were in a slightly positive mood, as defined by their self-reported level of happiness. This result contrasts with previous studies which found that positive mood promotes higher performance (Kavanagh, 1987).
Areas for future research
The explanations above remain informed speculation until they are tested by further experimental work. Therefore, I propose the following areas of interest for future research in this domain.
(i) Firstly, further testing should be carried out to reproduce these findings. Although we found a significant effect of weighting experts in the crowd through repeated-measures testing, the design of this study focused narrowly on current affairs and news.
(ii) Secondly, it would be worthwhile to explore the expert space to better understand
the possible and best ways in which an expert can be defined and identified. This study did not exhaustively explore the possibilities for defining an expert, for example, defining experts using metadata (age, nationality, news frequency).
(iii) Thirdly, if such a mechanism were to be implemented and made available to the public, further investigation should be carried out into a dynamic weighted method that takes into consideration the most recent performance compared with historical performance at both the individual and group level.
(iv) In this study, participants were anonymous and had no reputation-risk of
performing badly. As crowdsourcing systems are decentralised by design, the effect of anonymity and reputational risk in a crowdsourcing system should be tested further. Anonymity may be acceptable and have no impact on discrete tasks; however, incorporating some form of reputational risk could play an important part in designing higher-performing crowdsourcing systems.
(v) Lastly, creating a hybrid (‘Augmented Intelligence’) system combining Artificial Intelligence and traditional human judgement to help detect misinformation on the internet could help improve accuracy and tackle the scale at which “fake news” is being published.
Conclusion
Our study found that applying a performance-based “expert” weighted method to the crowd improves the crowd's wisdom, as measured by crowd accuracy. This finding contrasts with previous research that was not able to find a significant improvement in accuracy by applying weighted methods. This research suggests that, to optimise crowdsourcing, experts within the crowd should be given higher weighting than non-experts. Future research should investigate ways in which experts can be assessed and weighted dynamically.
References
Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236.
Alter, A. L., & Oppenheimer, D. M. (2009). Uniting the tribes of fluency to form a metacognitive nation. Personality and Social Psychology Review, 13(3), 219-235.
Aydin, B. I., Yilmaz, Y. S., Li, Y., Li, Q., Gao, J., & Demirbas, M. (2014, June). Crowdsourcing for multiple-choice question answering. In Twenty-Sixth IAAI Conference.
Baly, R., Karadzhov, G., Alexandrov, D., Glass, J., & Nakov, P. (2018). Predicting factuality of reporting and bias of news media sources. arXiv preprint arXiv:1810.01765.
Begg, I. M., Anas, A., & Farinacci, S. (1992). Dissociation of processes in belief: Source recollection, statement familiarity, and the illusion of truth. Journal of Experimental Psychology: General, 121(4), 446.
Bonabeau, E. (2009). Decisions 2.0: The power of collective intelligence. MIT Sloan Management Review, 50(2), 45.
Brabham, D. C. (2013). Crowdsourcing. MIT Press.
Budescu, D. V., & Chen, E. (2014). Identifying expertise to extract the wisdom of crowds. Management Science, 61(2), 267-280.
Camerer, C. F., & Johnson, E. J. (1997). The process-performance paradox in expert judgment: How can experts know so much and predict so badly? Research on Judgment and Decision Making: Currents, Connections, and Controversies, 342.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dittmar, J. E. (2011). Information technology and economic change: The impact of the printing press. The Quarterly Journal of Economics, 126(3), 1133-1172.
Drew, I. (2018). Applying weighting methods to crowd truth judgements: Can the wisdom of the crowd be made wiser?
Expert. (n.d.). Retrieved July 12, 2019, from https://www.merriam-webster.com/dictionary/expert
Galton, F. (1907). Vox populi (the wisdom of crowds). Nature, 75(7), 450-451.
Glenberg, A. M., & Epstein, W. (1987). Inexpert calibration of comprehension. Memory & Cognition, 15(1), 84-93.
Goncalves, J., Hosio, S., Rogstadius, J., Karapanos, E., & Kostakos, V. (2015). Motivating participation and improving quality of contribution in ubiquitous crowdsourcing. Computer Networks, 90, 34-48.
Hackman, J. R., & Oldham, G. R. (1980). Work redesign.
Hammond, K. R., Hamm, R. M., Grassia, J., & Pearson, T. (1987). Direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. IEEE Transactions on Systems, Man, and Cybernetics, 17(5), 753-770.
Hara, K., Adams, A., Milland, K., Savage, S., Callison-Burch, C., & Bigham, J. P. (2018, April). A data-driven analysis of workers' earnings on Amazon Mechanical Turk. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 449). ACM.
Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied, 5(2), 205.
Hodges, B., Regehr, G., & Martin, D. (2001). Difficulties in recognizing one's own incompetence: Novice physicians who are unskilled and unaware of it. Academic Medicine, 76(10), S87-S89.
Hong, L., & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences, 101(46), 16385-16389.
Horne, B. D., Dron, W., Khedr, S., & Adali, S. (2018b, April). Assessing the news landscape: A multi-module toolkit for evaluating the credibility of news. In Companion Proceedings of The Web Conference 2018 (pp. 235-238). International World Wide Web Conferences Steering Committee.
Horne, B. D., Khedr, S., & Adali, S. (2018a, June). Sampling the news producers: A large news and feature data set for the study of the complex media landscape. In Twelfth International AAAI Conference on Web and Social Media.
Horton, J. J., & Chilton, L. B. (2010, June). The labor economics of paid crowdsourcing. In Proceedings of the 11th ACM Conference on Electronic Commerce (pp. 209-218). ACM.
Howe, J. (2006). The rise of crowdsourcing. Wired Magazine, 14(6), 1-4.
Janis, I. L. (2008). Groupthink. IEEE Engineering Management Review, 36(1), 36.
Kavanagh, D. J. (1987). Mood, persistence, and success. Australian Journal of Psychology, 39(3), 307-318.
Kephart, J. O., Hogg, T., & Huberman, B. A. (1990). Collective behavior of predictive agents. Physica D: Nonlinear Phenomena, 42(1-3), 48-65.
Kittur, A., & Kraut, R. E. (2008, November). Harnessing the wisdom of crowds in Wikipedia: Quality through coordination. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (pp. 37-46). ACM.
Kittur, A., Nickerson, J. V., Bernstein, M., Gerber, E., Shaw, A., Zimmerman, J., ... & Horton, J. (2013, February). The future of crowd work. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (pp. 1301-1318). ACM.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121.
Lakhani, K. R., Wolf, R. G., Feller, J., & Fitzgerald, B. (2005). Perspectives on free and open source software. Perspectives on Free and Open Source Software, 1-22.
Machor, P. G. J. L. (2008). New directions in American reception study. Oxford University Press on Demand.
Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50(4), 370-396.
Maslow, A. H. (1954). Motivation and personality. New York: Harper and Row.
Mendes, M., & Pala, A. (2003). Type I error rate and power of three normality tests. Pakistan Journal of Information and Technology, 2(2), 135-139.
Miller, C. C., Burke, L. M., & Glick, W. H. (1998). Cognitive diversity among upper-echelon executives: Implications for strategic decision processes. Strategic Management Journal, 19(1), 39-58.
Murphy, A. H., & Winkler, R. L. (1977). Can weather forecasters formulate reliable probability forecasts of precipitation and temperature? National Weather Digest, 2(2), 2-9.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment.
Noland, M., Moran, T., & Kotschwar, B. R. (2016). Is gender diversity profitable? Evidence from a global survey. Peterson Institute for International Economics Working Paper, (16-3).
Norman, G. R., Rosenthal, D., Brooks, L. R., Allen, S. W., & Muzzin, L. J. (1989). The development of expertise in dermatology. Archives of Dermatology, 125(8), 1063-1068.
Pennycook, G., & Rand, D. G. (2018). Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking. Journal of Personality.
Pennycook, G., Cannon, T. D., & Rand, D. G. (2018). Prior exposure increases perceived accuracy of fake news. Journal of Experimental Psychology: General.
Polzer, J. T., Milton, L. P., & Swann Jr, W. B. (2002). Capitalizing on diversity: Interpersonal congruence in small work groups. Administrative Science Quarterly, 47(2), 296-324.
Schutte, N. S., Schuettpelz, E., & Malouff, J. M. (2001). Emotional intelligence and task performance. Imagination, Cognition and Personality, 20(4), 347-354.
Shamlo, N. B., & Alavi, S. (2018, August). Fact Checker: AVOW. Retrieved July 14, 2019, from https://www.avow.ai/
Silverman, C., & Singer-Vine, J. (2016). Most Americans who see fake news believe it, new survey says. BuzzFeed.
Simon, H. A. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99-118.
Surowiecki, J. (2005). The wisdom of crowds. Anchor.
Swift, A. (2016). Americans’ trust in mass media sinks to new low. Gallup News, 14.
Wallis, K. F. (2014). Revisiting Francis Galton's forecasting competition. Statistical Science, 420-424.
Zhao, Y., & Zhu, Q. (2014). Evaluation on crowdsourcing research: Current status and future direction. Information Systems Frontiers, 16, 417-434.
Appendix
Appendix A – The list of the 120 claims participants were required to respond to during the
quiz.
Claim
In the UK, youth unemployment is down 44% since 2010.
Ghanaians are allowed to divorce only if they attend court dressed the same clothing they
wore when they got married.
All cameras on M1 and M25 go live at midnight tonight, set at 72mph. Auto ticket
generating system with 6 point penalty. Watch your speed and tell everyone else tonight,
any speed over 90mph is instant ban & possible court & custodial sentence order!! Drive
safely.”
Lawmakers in California have proposed a new law called the "Check Your Oxygen
Privilege Act."
MDMA shown to increase empathy over other substances
Sniffing rosemary increases human memory by up to 75 percent.
UK won more gold medals in Rio Olympics 2016 than China
UK taxpayers are paying less income tax than 2010.
There are 4000 abortions a week in Britain.
NASA rejected Hillary Clinton's childhood dream of becoming an astronaut.
Over the past year the number of illegal immigrants crossing the Mexico-US has
significantly decreased.
‘A record number of people kill themselves in prisons in England and Wales in 2016,
figures show.’
The UK will be paying the Brexit divorce bill until 2064.
The number of poor [working class] students dropping out of university at the highest
level in five years.
The western cape has the lowest unemployment rate of all provinces.
Trump used crib notes during listening session with parkland survivors
Police officers have been cut by 21,000 since 2010
More than 80% of student graduates won't repay their loan in full.
China has a Panda shaped solar farm
The US is the largest donor of humanitarian aid in Syria
Some antidepressants are more effective than others
Wealthy professionals are most likely to drink regularly.
BBC Newsnight edited photos of Jeremy Corbyn to make him look close to Russia.
In a 2014 survey 24% of people thought that the USA was the country that posed the
greatest danger to world peace.
A woman who entered an Uber in Tampa, Florida, on 18 February 2019 was the victim of
an attempted kidnapping by a "sex traffic worker."
The Institute for Public Policy Research found household bills will rise by between £245
and £1,961 a year after Brexit.
Luxembourg is the capital of Luxembourg
The Walton family makes more money in one minute than Walmart workers do in an
entire year. This is what we mean when we talk about a rigged economy.
In 2018, Apple was the largest publicly traded company in the world
The single market is dependent on membership of the EU. What we’ve said all along is
that we want a tariff free trade access to the European market and a partnership with
Europe in the future.
"we just had 2 years (2016-2018) of record-breaking Global Cooling"
"Trump's action could push the Earth over the brink, to become like Venus, with a
temperature of two hundred and fifty degrees, and raining sulphuric acid."
Drug kingpin Joaquín "El Chapo" Guzmán testified that he gave millions of dollars to
Nancy Pelosi, Adam Schiff, and Hillary Clinton.
Musician Jay-Z said that "satan is our true lord" and that "only idiots believe in Jesus"
during a backstage tirade in November 2017.
President Trump's oft-repeated slogan "America First" was also a credo of the white
supremacist Ku Klux Klan organization.
The new technology, developed by private company ASI Data science, can detect Daesh
propaganda “with 99.99%% accuracy”.
The size of the world's ice caps (type of glacier) are at record high levels.
After leaving the EU, the UK will take back control of roughly £350 million per week.
Facebook shut down an AI experiment after chatbots developed their own language.
Illegal crossings at the US-Mexico border have reduced by 40%.
An individual's psychological attributes can be determined by observing and feeling the
skull.
Staying in the single market & customs union would not cover services.
The position and relative movement of continents is at least partially due to the volume of
Earth increasing.
98% of US mass shootings occur in gun-free zones
Former Federal Bureau of Investigation director Robert Mueller’s indictments (formal
accusation that a person has committed a crime) prove that there was no collusion
between Trump campaign and Russia.
Vaccines may cause autism.
Eating bacon is better for you than tilapia (common name for nearly a hundred species of
cichlid fish).
Claim: If you live in an area where the council is run by the Labour party, you pay £100
more than under the Conservatives.
A vintage Heineken advertisement showed a toddler drinking a beer and boasted about
having the youngest customers in the business.
New study shows that Marijuana leads to a 'complete remission of Crohn's Disease'.
The McDonald's fast food chain announced they will be phasing out the Big Mac by July
1st.
Coffee causes cancer
There are 480,000 young people who are hidden from the unemployment figures.
Snapchat CEO has said that the app is for rich people and so did not want to expand
Parents should ask a baby's permission before changing their nappy/diaper.
Medical marijuana has no health risks says WHO
Trump doubled his African-American poll numbers (from 11% to 22%) in a week.
It would take $135 billion to eradicate global poverty.
More than 700 attacks have been launched from the Afrin area under PYD/YPG
Google search spike suggests many people don't know why they voted for Brexit
David Davis [Secretary of State for Exiting the European Union] has never said the
government had impact assessments of the effect of Brexit on different parts of the
economy.
“There is more money going into our schools in this country than ever before. We know
that real-terms funding per pupil is increasing across the system, and with the national
funding formula, each school will see at least a small cash increase.”
“It is an absolute scandal that the Conservatives are pressing ahead with a plan that could
leave over a million children without a hot meal in schools.”
Japan’s prime minister Shinzo Abe’s championing of women’s advancement is a factor in
“the beginning of a new era in female success”.
Donald Trump has been much tougher on Russia than Barack Obama.
“The top 1% of earners in this country are paying 28% of the tax burden. That is the
highest percentage ever, under any Government.”
“Last year, we increased the number of tourists [in South Africa] by 12.8%”
The National Institute of Health (NIH) have plans for lifting ban on human-animal
chimeras.
There are only 18 minutes of total action in the average baseball game.
Diesel cars are more polluting than petrol cars
Nigeria contributes 23% of the global malaria cases
47% of the population don’t earn enough money to bring in a wife or husband from
outside the EU.
Fewer than half of Britons think Princess Diana's death was accidental.
Trump signed a bill blocking Obama-era background checks on guns for people with
mental illness
Spending on mental health went up by £575 million last year
60% of UK trade is through EU trade agreements.
700,000 public workers use up half of Kenya's taxes
The type of cladding used on Grenfell Tower is banned in Britain
The total number of London murders, even excluding victims of terrorism, has risen by
38% since 2014.
In May 2018, President Donald Trump established a 'religious office' to give religious
groups a 'voice in government'.
The CIA paid two psychologists $81 million "to develop and run their torture program."
3.7 million people living in the UK are citizens of another EU country. That’s about 6%
of the UK population, according to the latest figures covering the year to June 2018.
12.7% of NHS staff say that their nationality is not British
Indians are the second most common nationality of NHS staff
76% of British people support Shamima Begum being stripped of her citizenship
Romanians are the most common EU nationals to live in the UK
The number of EU nurses coming to the UK has fallen by 90% since the Brexit vote.
In the Brexit campaign, parties on both sides of the EU referendum made false claims.
50% of Irish exports go to Northern Ireland.
Only 5% of Northern Ireland’s GDP goes to Ireland.
Every five minutes, 70 children will be born in the UK, 20 to mothers not born here.
The number of EEA nurses and midwives who joined the NHS for the first time fell by
91% from 2015/16 to 2017/18.
China is Britain's top trading partner
Neil Armstrong was the first man on the moon
Consumption of sugar causes type 2 diabetes
Approximately 1 in 4 people in the UK will experience a mental health problem each
year
Eating organic food doesn't come with any nutritional benefits over non-organic food
Refugees or illegal immigrants living in Britain get a total yearly benefit of £29,900.
The plane in the Malaysian Flight MH370 was hidden away and reintroduced as Flight
MH17 later the same year in order to be shot down over Ukraine for political purposes
We only use approximately 10% of our brain
Cold weather causes colds
The UK population is approximately 66 million in 2017
In January 2019, Cristiano Ronaldo had the highest number of Instagram followers in
the world
The most expensive Big Mac can be found in Switzerland at 6.62 USD
The UK drinks approximately 95 million cups of coffee per day
In 2018, Amazon was the largest publicly traded company in the world
Singapore has the highest average IQ level in the world
2 billion tons of waste was dumped in 2016
American Bison is the heaviest land mammal
The EU currently costs the UK over £350 million each week - nearly £20 billion a year
“Labour reveals over 200,000 nurses have quit the NHS since 2010.”
The Ethiopian calendar is 7.5 years behind the Gregorian calendar due to the fact that it
has 13 months.
Drinking lemon mixed with hot water for one to three months will cause cancer to
disappear.
75% of the world’s diet is produced from just 12 plant and five different animal species.
If you call 999 in an emergency but can’t speak, press 55 and they can track where you
are calling from using new technology.
There are 118 elements in the periodic table
Staying in the single market & customs union would not cover services.
The population of the UK in 2017 was 66 million
World War II ended in 1945
The Beatles are the best-selling artists of all time

  • 4. Introduction Background The internet has revolutionised the way human society interacts more than any other invention since the printing press, which inaugurated the dissemination of printed information to the masses in Europe between the years 1450 and 1500 (Dittmar, 2011). In this digital age, the internet is now a stage for the “vox populi” to be heard. Digital platforms that allow communities to be built and individual contributions to be made are making a large impact on society. Conversely, the ease of access the internet has also allowed low-quality contributors and spammers to more easily participate in crowdsourcing events. Howe (2006) originally coined the term “crowdsourcing” in 2006, defined as outsourcing a task and function that would traditionally have been performed by a single agent to an undefined network of labourers. Crowdsourcing is now a popular way to distribute tasks in exchange for a monetary reward or recognition. The aim of this study is to determine if the veracity of judgements at the crowd level, on a set of claims found in the news and general knowledge factual statements, can be improved upon by applying various weighted methods to responses within the crowd. Wisdom of the crowd The commonly known “wisdom of the crowd” (WoC) effect is predicated on the belief that a diverse group of independent people can achieve a better result, measured by accuracy in this study, than any of the individuals in the group. James Surowiecki (2005) outlined the WoC phenomenon, describing the remarkability of collective intelligence in the right circumstances can be smarter than the smartest person in the group. Giving our inability 3
  • 5. to recall at will information in our brain and humans’ bounded rationality highlighted by Simon (1955), the advantages of collective decision making appears to remain broadly supported (​Bonabeau, 2009)​. Francis Galton (1907), an English Statistician, is known to have discovered the power of crowd wisdom from his analysis of a weight-judging competition, which was held at the annual show of the ‘West of England Fat Stock and Poultry Exhibition’. This competition allowed attendees, mostly consisting of experienced butchers and farmers as well as people with no specific expertise in cattle raising and the like, to guess the weight of an Ox, once it had been slaughtered and dressed, for a small entry fee of 6d (sixpence, equivalent to £1.96 in today’s money). The participants were incentivised with prizes for those who submitted the most accurate estimate. Galton studied the 787 estimates submitted by the crowd and discovered that the median weight of the crowd (1,197 lb) was 1 lb away from the actual weight (1198 lb). A recent re-examination of Galton’s findings demonstrated that Galton’s data indicated some errors in the original calculation and when corrected, taking the mean as opposed to the median, the crowd produced a perfect estimate (Wallis, 2014). The aforementioned conclusion from Galton supported the notion that there is a higher probability in achieving the correct judgment or decision through a democratic mechanism. This has led to further studies that endeavour to seek ways in which the crowd could be enriched by identifying experts and eliminating the contribution of poor performers (Budescu, Chen, 2014; Drew, 2018; Zhao & Zhu, 2014). The search for an expert Experts are typically perceived as someone who demonstrates good judgment and high predictive accuracy. According to Merriam Webster, an expert is​ “one with the special 4
skill or knowledge representing mastery of a particular subject” (Expert, n.d.). Experts can be hard to come by, and expertise is subjective, highly domain-dependent and relative to the group with which an expert is compared. Nevertheless, many people believe that experts are better than average at making good judgements and decisions, and turn to them for signals in times of uncertainty. For example, a professor of neurology could be deemed an “expert” in neuroscience because of the depth of knowledge built over years of experience, but a novice in sports knowledge if that is not a domain they allocate time to or have an interest in. An expert’s performance can therefore vary significantly depending on the task. Previous research has argued that experts can be knowledgeable yet poor at predicting outcomes (Camerer & Johnson, 1997), and that expectations of experts are unproven, making the relationship between expertise and accuracy unpredictable (Hinds, 1999; Norman et al., 1989). However, contrasting research has demonstrated that reliable experts do exist and can perform well; for example, a group of National Weather Service (NWS) forecasters provided reliable probability forecasts of precipitation and temperature (Murphy & Winkler, 1977).
Crowd diversity
Group diversity is an important attribute to consider in the context of the wisdom of the crowd, and it provides a counter-argument to the expert judgement view. Research has shown crowd diversity to have a positive effect on performance (Hong & Page, 2004): a team of randomly selected intelligent agents outperformed a team composed of the best-performing agents on problem-solving tasks. That paper focuses exclusively on functional diversity, constructed from the agents’ perspectives and heuristics, and was inspired by research in organisational behaviour (Kephart, Hogg & Huberman, 1990; Miller, Burke
& Glick, 1998) and psychology (Polzer, Milton & Swann, 2002; Nisbett & Ross, 1980). Cognitive diversity in the crowd is valuable even when seeking experts. Expertise is commonly domain-specific and relative to the size of the crowd: the smaller the crowd, the more significant the role of an expert. A diversified crowd fills knowledge gaps that a group of experts may leave. In terms of applied group diversity, recent research has also studied the effects of diverse leadership on businesses (Noland, Moran & Kotschwar, 2016), finding a positive relationship between the proportion of female leaders and net revenue. Group diversity and independence can also counteract potential “group-think” effects. A high-profile illustration of the adverse effects of group-think is the Challenger disaster. The Challenger was an American space shuttle that exploded 73 seconds after launch, and an investigation into the cause attributed the failure to group-think. Key traits of group-think include a concurrence-seeking tendency and a homogeneous group with similar ideology and social background; these can lead to symptoms such as overestimation of the group, close-mindedness and pressure towards uniformity, which can in turn lead to defective decision-making (Janis, 2008). This highlights the value of crowd diversity. Through crowdsourcing, group-think effects can be circumvented, as crowd members should perform tasks independently.
Crowd motivation
When assessing crowdsourcing, it is important to consider the motivation of crowd members. Understanding crowd motivation and designing incentive structures accordingly are integral to mitigating biases and attracting high-quality participation. The motivation for this type
of participation sits well within the ‘belonging’ and ‘self-esteem’ categories of Maslow’s hierarchy of needs (Maslow, 1943, 1954). A crowd member can be motivated to participate in and contribute to a platform in several ways, for example through an immediate payoff such as payment (Lakhani & Wolf, 2005), or through delayed payoffs in the form of signalling and stakeholder feedback (Hackman & Oldham, 1980). Further studies have analysed different motivation design approaches in the context of “ubiquitous crowdsourcing” (crowdsourcing on the go through mobile phones) to assess the effects of motivational design on crowd participation and contribution quality (Goncalves, Hosio, Rogstadius, Karapanos & Kostakos, 2015). That study found a positive effect on participation rates from motivation techniques such as psychological empowerment, self-efficacy and causal importance. However, increased incentives, in particular extrinsic incentives, can have an adverse effect on quality, as discussed in previous papers (Kittur et al., 2013).
Sourcing the crowd
As noted above, Howe (2006) was the first to coin the term “crowdsourcing”, which describes a way in which micro problem-solving tasks can be completed by a distributed network of many agents. Brabham (2013) has also described crowdsourcing as requiring the following ingredients: 1) an organisation that has a task it needs performed; 2) a community (crowd) that is willing to perform the task voluntarily; 3) an online environment that allows the work to take place and the community to interact with the organisation; and 4) mutual benefit for the organisation and the community.
By harnessing collective intelligence, growing companies and industries have been able to leverage wisdom-of-the-crowd effects through the internet. A well-known beneficiary of crowdsourcing is Wikipedia, the highly successful free online encyclopedia, which has effectively leveraged crowd coordination to curate content at a scale and quality that would be hard for another company to replicate. Wikipedia has succeeded by having a large number of agents contribute to the platform, improving the accuracy and completeness of the encyclopedia while reducing its bias. Wikipedia has over 2.5 million pages of information and has had over six million contributors (Kittur & Kraut, 2008), requiring a high degree of coordination to harness the crowd’s wisdom. Although this is a high-touch approach, Kittur and Kraut found that, in this case, organised crowdsourcing was highly effective.
Approaches to judgement aggregation
In a recent study, Budescu and Chen (2014) explored weighted aggregation methods that could improve the crowd’s overall forecast accuracy. The study collected forecasts on 104 “events” from 1,233 participants. Only 420 participants were included in the data analysis, as participants who responded to fewer than 10 events could unduly skew the crowd’s effectiveness and reduce the efficacy of the study. The approach adopted was to 1) identify experts, and 2) disregard non-expert contributions to the overall forecast. The study observed a 39% improvement by applying “expert weights” and re-computing them periodically throughout the survey. This approach of removing non-experts could be highly controversial in a democratic society, as it risks displacing non-experts and could reduce participation by disengaging so-called “non-experts”.
Fact-checking domain
Fact-checking has a long history. Notably, the role of an entity or person carrying out independent validation of claims and facts appears to date back to 1913, when Ralph Pulitzer and Isaac White of the New York World established the ‘Bureau of Accuracy and Fair Play’. The primary goal of this bureau was to track repeated offences of misinformation and disinformation, seeking reprimands or public apologies from the accused (Machor, 2008). In recent times, a significant amount of disinformation and ‘fake news’ has been prevalent in the media, with a perceived major influence on democratic processes such as the 2016 US election and the UK Brexit referendum. Fake news can be defined as news articles or claims that are intentionally and verifiably false and likely to mislead recipients (Allcott & Gentzkow, 2017). The proliferation of misinformation and disinformation on social media platforms has prompted government-led inquiries into the role that social media platforms play in combating ‘fake news’. In a recent report,¹ Full Fact discussed a fact-checking initiative (the ‘Third Party Fact Checking programme’) that it leads with Facebook to fact-check posts flagged by Facebook as possibly false. These posts were added to a queue for fact-checking. Once the content had been checked for misinformation, Full Fact published the fact-check outcome and attached the fact-check article to the Facebook post together with a rating: False, Mixture, False Headline, True, Not eligible, Satire, Opinion, Prank generator, or Not rated. Of the 96 fact checks published as part of the Third Party Fact Checking programme, 61.4% of the claims were rated as ‘False’.
¹ Report on the Facebook Third Party Fact Checking programme (https://fullfact.org/media/uploads/tpfc-q1q2-2019.pdf).
Research has highlighted some of the effects of fake news: those who are exposed to fake news are likely to believe it (Silverman & Singer-Vine, 2016; Pennycook & Rand,
2018). With today’s technology, the barriers to disseminating information, true or false, to a large number of people are low. This allows more frequent exposure to fake news, and psychological experiments have shown that trust increases with familiarity through cognitive fluency (Begg, Anas & Farinacci, 1992; Alter & Oppenheimer, 2009), introducing the risk of familiarity bias. Another example of this fluency effect can be found in a recent study by Pennycook et al. (2018), whose results suggested that social media platforms help to incubate belief in “fake news” and that prior exposure to misinformation can create an illusory truth effect, albeit within a plausibility boundary. The prevalence of fake news in 2016 has also led to a decline in trust in mass media among American voters, particularly Republicans (Swift, 2016). What might be an effective solution to the rise of fake news? Three possible solutions are envisaged: (1) reduce the number of fake news stimuli the public is exposed to through structural changes on social media and other news platforms; (2) empower the crowd to evaluate news and claims effectively, demoting those deemed false by the crowd; and (3) use machine learning and AI techniques to parse world news and flag misinformation. For the purposes of this paper, I will briefly expand on points 2 and 3.
Empowering the crowd. Independent and non-partisan fact-checking entities (e.g. FullFact, PolitiFact, Snopes and FactCheck) have become increasingly important in recent times. However, independence alone is not sufficient; the source of funds should also be neutral to maintain efficacy free from outside influence, political or otherwise. There remains an imbalance between the velocity at which news is “digitally printed” and human fact-checkers’ ability to perform in-depth investigations.
Through online crowdsourcing methods, distributed fact-checking could be a worthwhile solution. Attempts
have been made in this domain, in particular a blockchain project called Avow (Shamlo & Alavi, 2018). The Avow project aims to counter disinformation and “fake news” by creating a system that aggregates the crowd’s anonymised opinions on various claims and news items, rewarding contributors with ERC-20² Ethereum³ cryptocurrency tokens. This approach combines a novel technology with an incentive system to encourage participation for greater social impact, although recent cryptocurrency price volatility points to a challenge that many blockchain projects encounter and that may require further thought.
Wisdom of the machines. The relentless rate at which news is published, compounded by fragmented distribution channels on the internet, makes fact-checking at scale a practically impossible task for humans. A recent MIT study (Baly et al., 2018), inspired by previous research (Horne et al., 2018a, 2018b), tackles the exponential growth of “fake news” by using a machine learning system to assess whether a news source is accurate or politically biased. By collecting information from multiple online sources, such as Wikipedia, Twitter, the articles themselves and Alexa Rank⁴, the system is trained on a rich set of features for 1,066 news sources. The research found that whether a news source has a Wikipedia page can help predict factuality but not political bias, and that analysing the source’s Twitter account (not its tweets) does not provide any significant indication of factuality or bias. The best-performing feature was the articles from the source website, which highlights the importance of analysing the content of the news source; analysing article titles alone was not sufficient.
² ERC-20 is a technical standard for smart contracts used on the Ethereum blockchain (https://en.wikipedia.org/wiki/ERC-20).
³ Ethereum is an open-source, public, blockchain-based distributed computing platform (https://en.wikipedia.org/wiki/Ethereum).
⁴ Alexa is a website traffic data provider that ranks websites (https://www.alexa.com/siteinfo).
There is clearly a strong case for leveraging AI techniques to tackle the scale of “fake news”, and this study suggests that the advantage of assessing
the news source is that a bias and factuality score, if easily visible on all publishing platforms, could reduce the need to assess claims one by one.
Confident crowds
Can a confident crowd be trusted? A recent study by Aydin and colleagues (2014) found that taking participants’ confidence into account can significantly improve accuracy, particularly when performance is combined with confidence. Previous studies have also shown that expert confidence should be treated with care, as experts have been found to overestimate their capabilities; for example, Glenberg and Epstein (1987) found that physics and music experts overestimated their ability to understand texts associated with physics or music, respectively, compared with novices.
Overview of the current experiment
A related previous study (Drew, 2018) was unable to find any significant difference in crowd accuracy from applying performance-based weighted methods to the crowd. This appeared to be primarily due to a small, non-parametric sample (n = 36). Drew also used a 5-point Likert scale of truthfulness, which resulted in part of the sample being removed, as “neutral” truthfulness responses were believed not to provide a strong enough signal. The current study therefore aims to address these limitations by increasing the sample size (n = 113), increasing the question set (n = 120) and replacing the 5-point Likert scale with a binary choice of ‘mostly true’ or ‘mostly false’. The effects of these changes are unknown; however, the literature reviewed above provides good reason to hypothesise the following:
(i) The crowd’s accuracy improves when low performers are removed.
(ii) The crowd’s accuracy improves when expert judgements are mainly considered, while deliberately maintaining “crowd diversity”.
(iii) The crowd’s accuracy improves when only confident answers are considered.
The current experiment aims to test these hypotheses using questionnaire accuracy as the target variable. Since the limitations of the previous related study appeared to be caused mainly by its sample size, which reduced the achieved power and resulted in non-parametric data (Drew, 2018), we aim to overcome them by increasing the sample size to 113 (previously 37) and the question set to 120 (previously 90).
Method
Design
A repeated-measures design was applied in this study. Participants’ truthfulness judgements and confidence ratings on 120 claims with established ground truth were used to examine whether weighted methods can improve the accuracy of the group’s overall performance. The questions were presented to participants as a survey of 120 claims related to general knowledge (e.g. society and culture) and current affairs. The claims covered a broad range of news categories, the top five being economics (16%), politics (12.5%), science (11.6%), health (10.83%) and immigration (6.67%). All the questions were taken from highly reputable sources, with a strong skew towards independent fact-checking organisations, and each claim was carefully assessed qualitatively for potential biases. 61% of the claims related to UK and US news. The survey was slightly imbalanced in favour of more truthful
statements (52.5%) than false statements (47.5%). Each page presented a single short claim of the kind typically seen as a headline or tweet, and participants were required to respond whether they thought the claim was ‘mostly true’ or ‘mostly false’, together with their level of confidence, ranging from ‘unconfident’ to ‘very confident’ (see Figure 1).
Figure 1. An example of the format of the question page for each claim.
A large majority of the claims were chosen from unbiased and rigorous fact-checking experts. Broader news sources were also used to avoid creating a very niche knowledge quiz: basing the survey only on fact-checking, which is predominantly skewed towards politics, would have made it very narrow for participants. Fact-checking organisations included ‘BBC Reality Check’, ‘Snopes’, ‘FullFact’, ‘PolitiFact’, ‘Fact Check’, ‘Channel 4’, ‘Washington Post’, ‘Africa Check’ and ‘World Bank’. We decided not to show participants their results relative to other participants, or their performance midway through the questionnaire, as the effects of this type of feedback on performance were not of interest in this study.
Self-reported questionnaire
All participants were required to answer a number of self-report questions before the questionnaire. These covered demographic information (age, nationality, gender, religiosity, education level and occupation). Participants were also asked ‘how happy do you feel today?’, inspired by research showing a positive correlation between emotional intelligence and performance (Schutte, Schuettpelz & Malouff, 2001). Participants also indicated on a five-point Likert scale their attitudes towards free markets and the Brexit vote, and the frequency and medium of their news consumption. A further set of self-report questions focused on participants’ level of expertise in news domains: general news, economics and politics, science and health, pop culture, international affairs, crime and art.
Participants
113 UK-based participants (62 males, 51 females) were recruited through the popular crowdsourcing platform Amazon Mechanical Turk (MTurk). The survey took approximately 21 minutes to complete on average. The crowd was given immediate extrinsic motivation in the form of a $5.00 monetary reward on completion of the survey (equivalent to approximately $14.28/h), and an additional $20 Amazon voucher was offered to the top three performers. This is notably higher than the median wage of $1.38-2.30/h that previous research has shown MTurk crowd workers to earn (Horton & Chilton, 2010; Hara, 2018). The average age of the participants was 32.80 years (SD = 8.64, range = 18-67 years). Participants were only eligible if they were over the age of 18 and spoke English as their first language or with an equivalent level of fluency. Participants were given a brief outlining the general objective of the survey, namely that we were assessing
their ability to judge the truthfulness of information relating to news and current affairs, with a view to applying methods for validating information more efficiently and accurately. All participants were paid $5.00 for their participation, and, to incentivise high performance, $20.00 Amazon vouchers were offered as a bonus to the top three scoring participants. To prevent bots from participating, participants were required to pass a CAPTCHA task before the survey commenced. Participants were also required to have had 100% of their previous Human Intelligence Tasks (HITs) approved; these criteria were recommended on online forums as a way to deter uncommitted participants. The study was approved by the UCL Ethics Board Committee (ethics approval code: CPB/2013/015).
Procedure
Participants were recruited on the widely used crowdsourcing “knowledge-worker” platform Amazon Mechanical Turk (www.mturk.com) and, if they met the eligibility criteria (UK-based and a 98% approval rate), were directed to the survey brief and questionnaire, which were hosted on the Gorilla platform. All participants were presented with news headlines, quotes and factual statements, and made a binary choice as to whether each statement was ‘mostly true’ or ‘mostly false’, along with their level of confidence in their response. Each claim was presented on a new page, and no ‘back’ option was given for revising responses. Based on the learnings of the previous study (Drew, 2018), truthfulness responses were limited to two binary options rather than the 5-point Likert scale used in that research, which had resulted in a loss of sample data because neutral responses gave insufficient direction. Confidence was rated on a 5-point Likert
scale, ranging from ‘very confident’ to ‘unconfident’ in five increments. Collecting two data points per claim made it possible to assess a further dimension of the wisdom-of-the-crowd effect. Participants indicated their choice by clicking on it.
Splitting the question set
To establish the most efficient number of questions for the two sets, 120 simulations were performed on the complete question set. The objective was to find the point beyond which adding further questions produced diminishing improvements in the crowd’s accuracy. In each simulation run, a random sample of questions was drawn and the crowd accuracy on that sample was measured, with one additional question added in each successive run. Crowd accuracy can be seen to stabilise between 40 and 60 questions in Figure 2; the average accuracy within this range was 0.609. Because efficient expert classification is a key consideration in this study (i.e., identifying experts within the crowd using as few questions as possible is highly beneficial to applying the wisdom-of-the-crowd effect), 40 of the 120 questions were chosen to form the first set. This choice was also supported by a second test, which took the mean difference in accuracy between Set A (x) and Set B (y) for three selected question splits (20/100, 40/80 and 60/60). Results showed a significant difference between question splits, F(2, 336) = 9.83, p < .001; the standard error was lowest for the 40/80 split (SE = 0.0532), consistent with the visual graph (see Table 1 and Figure 3).
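The incremental simulation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study’s code: it reads “crowd accuracy” for a sample as the mean participant accuracy on the sampled questions, runs one simulation per sample size k = 1 to 120, and uses synthetic responses (113 hypothetical participants, each correct with probability 0.6) in place of the real data. The function name `accuracy_curve` is hypothetical.

```python
import random
from statistics import mean


def accuracy_curve(responses, answers, seed=0):
    """Return one accuracy value per sample size k = 1..len(answers).

    responses[p][q] and answers[q] are coded +1 ('mostly true') or
    -1 ('mostly false'). For each k, k questions are drawn at random and
    the mean participant accuracy on that sample is recorded (an assumed
    reading of 'crowd accuracy').
    """
    rng = random.Random(seed)
    n_questions = len(answers)
    curve = []
    for k in range(1, n_questions + 1):
        sample = rng.sample(range(n_questions), k)
        per_participant = [
            sum(row[q] == answers[q] for q in sample) / k for row in responses
        ]
        curve.append(mean(per_participant))
    return curve


# Synthetic data only: 113 hypothetical participants, 120 claims,
# each participant correct with probability 0.6 on every claim.
rng = random.Random(42)
answers = [rng.choice([1, -1]) for _ in range(120)]
responses = [[a if rng.random() < 0.6 else -a for a in answers] for _ in range(113)]

curve = accuracy_curve(responses, answers)
print(round(mean(curve[39:60]), 3))  # average over samples of 40 to 60 questions
```

Plotting `curve` against k gives a line graph of the kind shown in Figure 2; with real responses, the sample size at which the curve flattens suggests how few rank-set questions are needed.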
Figure 2. Line graph showing crowd accuracy stabilising as more questions are added to the simulation.
Figure 3. Box plot showing the range of the mean difference for each question split.
Comparison of weighted methods effects
Five different weighted aggregation methods were selected to explore the crowd’s wisdom, in an attempt to improve upon the mean participant accuracy of 61% (see Table 1). To rank participants, performance was assessed on a random set of 40 questions (‘Set A’), which was used as the “rank set”. Participant weights were calculated from individual performance on Set A and then used to aggregate participants’ responses on the second set of 80 questions (‘Set B’). Set A and Set B questions were kept mutually exclusive. As a result of the aggregation, the crowd’s response to each question was binarised as mostly true or mostly false. The crowd’s responses were then compared with the expected answers, grounded by fact-checking sources and implausible facts, and an overall accuracy score was calculated over the complete set of 80 questions. Given the large number of possible sample combinations that could be drawn from the question dataset, $C(n, r) = C(120, 40) \approx 1.145568482 \times 10^{32}$, this
process was simulated 1,000 times to reduce variance and improve reproducibility. Table 1 summarises the mean accuracy of each weighted method; these means were then compared statistically. For the data analysis, each participant p’s response $x_{pq}$ to question q was coded as 1 (‘mostly true’) or -1 (‘mostly false’). Depending on the weighted method, a weight $w_{pa}$, based on the participant’s relative performance on Set A, was applied to the participant’s response, and the aggregate weighted response of the participants, $\sum_p w_{pa} x_{pq}$, indicated the crowd’s answer.
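The split-and-aggregate procedure and the aggregation rule $\sum_p w_{pa} x_{pq}$ can be sketched as follows. This is a hypothetical illustration rather than the study’s implementation: `simulate`, `crowd_answer` and `weight_fn` are names introduced here, a tie in the weighted sum is assumed to count as ‘mostly false’, and the responses are synthetic. The `math.comb` call reproduces the number of possible 40-question rank sets quoted above.

```python
import math
import random
from statistics import mean


def crowd_answer(responses, weights, q):
    """Sign of the weighted sum over participants of w_pa * x_pq.
    Assumption: a zero sum (tie) is treated as 'mostly false' (-1)."""
    total = sum(w * row[q] for w, row in zip(weights, responses))
    return 1 if total > 0 else -1


def simulate(responses, answers, weight_fn, n_runs=1000, n_rank=40, seed=0):
    """Repeat a random Set A / Set B split n_runs times and return the
    crowd's accuracy on Set B for each run."""
    rng = random.Random(seed)
    questions = list(range(len(answers)))
    results = []
    for _ in range(n_runs):
        rng.shuffle(questions)
        set_a, set_b = questions[:n_rank], questions[n_rank:]  # mutually exclusive
        # Each participant's accuracy on the rank set (Set A).
        acc_a = [sum(row[q] == answers[q] for q in set_a) / n_rank for row in responses]
        weights = weight_fn(acc_a)  # one weight w_pa per participant
        correct = sum(crowd_answer(responses, weights, q) == answers[q] for q in set_b)
        results.append(correct / len(set_b))
    return results


print(math.comb(120, 40))  # number of distinct 40-question rank sets, about 1.15e32

# Synthetic data only: 113 hypothetical, independent participants, 60% accurate.
rng = random.Random(1)
answers = [rng.choice([1, -1]) for _ in range(120)]
responses = [[a if rng.random() < 0.6 else -a for a in answers] for _ in range(113)]

unweighted = simulate(responses, answers, lambda acc: [1.0] * len(acc), n_runs=200)
print(round(mean(unweighted), 3))
```

Only `weight_fn` changes between methods. Because these synthetic participants err independently, their majority vote is far more accurate than the .71 reported for the real crowd, so the printed value should not be read as a prediction.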
Unweighted method (UW)
In this method, all participants have equal weight, and the mean response of all participants to each question was taken as the crowd’s response. For example, if 75 of 113 participants judged a claim to be mostly true, the result would be (75 - 38)/113 ≈ 0.33. A positive mean indicates that the crowd believes the claim to be mostly true, and a negative mean that it believes the claim to be mostly false. The same aggregation approach applies to all the weighted methods tested.
Performance-weighted method (PW)
Participants who were wrong more often than they were right (accuracy below .50 on Set A) were given a zero weight, $w_{pa} = 0$, when aggregating the crowd’s response on Set B.
Expert weighted method (ExW)
Participants identified as “top performers”, defined as those in the top 25% on Set A, were given a performance-based weight that was used in aggregating the crowd’s responses on Set B.
Confidence weighted method (CW)
Participants’ confidence was normalised by applying a z-score across Set B responses. The normalised confidence was then multiplied by the participant’s response to that question, coded -1 for ‘mostly false’ and +1 for ‘mostly true’; a high negative product would indicate that the participant was relatively confident the claim was ‘mostly false’. The sign of the aggregated confidence-weighted responses then determined whether the crowd’s answer was mostly true or mostly false.
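The UW worked example and the PW and ExW rules above can be expressed compactly. This is a sketch under assumptions the text leaves open: participants at or above .50 keep weight 1 in PW; in ExW, experts are weighted by their Set A accuracy and non-experts receive a hypothetical `non_expert_weight` (default 0). The top-25% cut-off uses the upper quartile from Python’s `statistics.quantiles`.

```python
from statistics import quantiles


def weighted_crowd_score(votes, weights):
    """Weighted mean of +1/-1 votes; a positive score means 'mostly true'."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # no participant carries weight: treated as undecided
    return sum(w * v for w, v in zip(weights, votes)) / total_weight


def performance_weights(acc_a):
    """PW: zero weight below .50 accuracy on Set A.
    Assumption: everyone else keeps weight 1."""
    return [0.0 if a < 0.5 else 1.0 for a in acc_a]


def expert_weights(acc_a, non_expert_weight=0.0):
    """ExW: participants at or above the upper quartile of Set A accuracy
    (the top 25%) are weighted by that accuracy.
    Assumption: the expert weight and non_expert_weight are illustrative."""
    cutoff = quantiles(acc_a, n=4)[2]  # third quartile
    return [a if a >= cutoff else non_expert_weight for a in acc_a]


# Worked UW example from the text: 75 of 113 participants vote 'mostly true'.
votes = [1] * 75 + [-1] * 38
score = weighted_crowd_score(votes, [1.0] * len(votes))  # equal weights
print(round(score, 2))  # 0.33, i.e. (75 - 38) / 113
```

Normalising by the total weight keeps scores on the same -1 to +1 scale as the unweighted mean; only the sign matters for the crowd’s binarised answer.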
Z-score weighted methods (W)
A z-score, $z_a$, was calculated from each participant’s accuracy on Set A and used as an exponent to weight that participant’s responses on Set B, $w_{pa} = x^{z_a}$. By adjusting the base $x$ on a scale (1, 4, 5, 6, 7, 1000), we were able to explore the optimal relative weighting of participants. For example, W1 results in $w_{pa} = 1^{z_a} = 1$ for every participant (equivalent to the unweighted method) and W5 results in $w_{pa} = 5^{z_a}$.
Results
Hypotheses and predicted results
Hypothesis one (H1) predicts that the crowd’s accuracy improves when the influence of low performers is decreased. Hypothesis two (H2) predicts that the crowd’s accuracy improves when expert judgements are heavily weighted. Hypothesis three (H3) predicts that there is an optimal intermediate weighting method between expert weighting and crowdsourcing (“unweighted”). Hypothesis four (H4) predicts that the crowd’s accuracy improves when participants’ confidence is taken into consideration. In summary:
(H1) The crowd’s accuracy improves when the influence of low performers is decreased.
(H2) The crowd’s accuracy improves when expert judgements are heavily weighted.
(H3) There is an optimal intermediate weighting method between expert weighting and the crowdsourcing (“unweighted”) method.
(H4) The crowd’s accuracy improves when participants’ confidence is taken into consideration.
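To make the W1 to W1000 comparison behind H2 and H3 concrete, the z-score weighting $w_{pa} = x^{z_a}$ from the Method section can be sketched as below. Assumptions not stated in the text: $z_a$ uses the population standard deviation of Set A accuracies (`statistics.pstdev`), and a crowd with no variation in accuracy falls back to equal weights. `zscore_weights` is a hypothetical name.

```python
from statistics import mean, pstdev


def zscore_weights(acc_a, base):
    """Weight each participant by base ** z, where z is the z-score of their
    Set A accuracy. base = 1 reproduces the unweighted method; larger bases
    shift influence towards high scorers."""
    mu, sd = mean(acc_a), pstdev(acc_a)
    if sd == 0:
        return [1.0] * len(acc_a)  # no spread in accuracy: equal weights
    return [base ** ((a - mu) / sd) for a in acc_a]


acc_a = [0.5, 0.6, 0.7]
for base in (1, 4, 5, 1000):
    print(base, [round(w, 3) for w in zscore_weights(acc_a, base)])
```

A participant at the mean accuracy always gets weight 1, so the base controls only how sharply weights diverge above and below the mean; very large bases such as 1000 hand almost all influence to the top scorers.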
Normality analysis
Various tests were conducted to assess whether participants’ scores were normally distributed and therefore suitable for parametric tests. The main concern was whether the crowd’s accuracy was normally distributed. Figure 4 shows the distribution of crowd accuracy, which visually appears Gaussian, and the Q-Q plot in Figure 5 further supports normality. As previous research has shown the Shapiro-Wilk test to be a reliable test of normality (Mendes & Pala, 2003), we conducted a Shapiro-Wilk test to confirm the visual plots (W = 0.987, p = 0.367); the non-significant result indicated no departure from normality, so the dataset was suitable for parametric statistical tests.
Figure 4. Distribution of crowd accuracy. The histogram’s shape approximates a Gaussian distribution.
Figure 5. Q-Q plot of crowd accuracy. The plot shows no skew in accuracy.
Participant characteristics
The mean participant accuracy was 0.60 (min = 0.44, max = 0.77, SD = 0.07). Given that the questions were predominantly related to British and American news, it was necessary to check whether participants of other nationalities were disadvantaged. There was no significant difference between British participants (M = 0.6, SD =
0.07) and non-British participants (M = 0.6, SD = 0.07); a two-sample t-test provided no clear evidence that the difference was meaningful, t(19) = -0.04, p = .48, d = .01. A majority of participants had a high level of education (88% held an A-Level or equivalent qualification or above). 5.3% of participants were students and 81% were employed (including full-time, part-time and self-employed). The largest proportion of participants had British nationality (86%), but the overall sample was diverse, with 14 different nationalities represented.
Defining an expert
A pairwise t-test on different percentile thresholds found no significant difference between defining experts as those who scored in the top 15th or the top 25th percentile, F(1, 999) = 3.25, p = .071, ηp² = 0.03. The greatest mean difference was observed between the unweighted method and experts defined at the 25th percentile threshold.
Repeated-measures ANOVA analysis
To test whether applying weighted methods had a statistically significant effect on accuracy, a repeated-measures ANOVA was used to compare mean accuracy. The analysis was carried out separately within two distinct groups: Group 1 (UW, PW, ExW) and Group 2 (W1, W4, W5, W6, W7, W1000) (see Table 2 and Figure 6).
Figure 6. Mean accuracy (over 1,000 simulations) of each weighted method, with error bars.
Group 1 analysis
The repeated-measures ANOVA showed a significant main effect of weighted method on crowd accuracy, F(3, 2997) = 700.21, p < .001, ηp² = 0.524. To explore which differences drove this effect, post-hoc analyses were performed
using Tukey’s HSD; these indicated that the expert method (ExW) had the strongest positive effect on accuracy compared with the unweighted (UW) method (p < .001). The results also showed that weighting purely on the top performer decreased overall accuracy compared with the unweighted method. The group 1 results therefore support the hypothesis that there is value in applying weighted methods to the crowd for better outcomes.
Figure 7. Density plot of accuracy for each weighted method (excluding the z-score weighted methods; see Figure 8).
Group 2 analysis
The repeated-measures ANOVA showed a significant main effect of weighted method on crowd accuracy, F(5, 4995) = 895.82, p < .001, ηp² = 0.431. To explore which differences drove this effect, post-hoc analyses were performed using Tukey’s HSD; these indicated that applying a z-score weight base of 4, 5 or 6 significantly
  • 28. improves accuracy compared to an unweighted method (W1) (p = <.001). The results also showed there was no significant difference between W4, W5, W6 and W7, and accuracy significantly decreases when experts are disproportionately weighted (w1000). This result supports the prediction that accuracy improves when decreasing the weight of low-performers (H1), while also providing an insight on (H2) showing that there is an optimal limit for how much experts should be weighted and heavily weighting on experts has an adverse effect on the crowd’s wisdom. Figure 8​. Density plot of accuracy for z-score each weighted methods Expert analysis A further examination was carried out on the weighted methods which appeared improved crowd accuracy significantly compared to the unweighted approach in both groups (ExW, W4, W5, W6, W7). A repeated-measures ANOVA showed there was a significant main effect within of expert-weighted methods on crowd accuracy ​F​(4, 3996) = 15.16,​ p​ < 27
  • 29. .006, = 0.15. Pos-thoc analyses using Tukey’s HSD indicated there was only a significantρη 2 difference between the W4 and W7 weighted methods, as well as between W4 and W7. Confidence analysis An additional test was performed to check if crowd confidence (CW) affects crowd accuracy. The participants were required to rate their confidence between 1 to 5 along with their answer (mostly true or mostly false). To address confidence biases, we applied a z-score normalisation to the confidence of each participant per question in Set B, then used the z-score confidence value as the exponent of a base number 2. This created a one-sided confidence distribution which was then multiplied by the participant's response for that particular question which was -1 for ​‘mostly false’​ and +1 for ​‘mostly true’​. This meant, for example, a high negative z-score would indicate the participant is relatively confident that the claim is ​‘mostly false’​. The aggregation of the participant's confidence was then taken as mostly true or false depending on the sign of the aggregate number. The results showed there was a significant decrease in crowd accuracy when comparing between unweighted (​M​=.71, SD​=.029) and Confidence weighted (​M​=.52, ​SD ​=.032), ​F​(1, 999) = 17228, ​p​ <.001. Power analysis By conducting a simulation, 1000 results were created to compare the effects of weighted methods (​N​ = 1000). This played a key part in detecting significance with the repeated measures approach. The post-hoc power analysis revealed that on the basis of the mean, the achieved power on group 1 and 2 (​d​ = 1.0) is considered large and above the recommended .80 level (Cohen, 1988). 28
Self-reported experts
We also examined self-reported expertise. Each participant rated themselves on a 5-point Likert scale, from 'very poor' to 'excellent', for each of seven domains: general news, economics and politics, science and health, pop culture, international affairs, crime, and art. Self-reported experts were defined as participants whose total knowledge score was above the mean (M = 23.93, max = 35, SD = 4.36). Accuracy differed significantly between self-reported experts (M = .585, SD = .074) and self-reported non-experts (M = .625, SD = .057), t(111) = -3.23, p < .002. These results suggest that a conservative crowd is likely to perform better than a confident one.

Relationship between performance and level of education
Participants were asked "What is the highest degree or level of school you have completed? If currently enrolled, the highest degree received." They were given six options ranging from 'GCSE (or equivalent)' to 'Doctorate'. Less formally educated participants were defined as those holding or studying for GCSEs (or equivalent) or A-levels (or equivalent). A Welch's t-test showed a significant difference in performance between these participants (M = 0.62, SD = 0.044, n = 29) and participants with a Bachelor's degree or higher (M = 0.56, SD = 0.076, n = 84), t(111) = -2.18, p = .0313. These results suggest that, in this sample, higher education is associated with poorer news judgement. This appears somewhat counterintuitive, particularly because the "highly educated" group formed the larger crowd. That said, there were many low-performing participants in the "highly educated" group.
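For reference, the Welch's t-test used in the education comparison can be sketched with the standard library. It compares the two means using separate (unpooled) variance estimates and approximates the degrees of freedom with the Welch-Satterthwaite formula. This is a minimal sketch, not the author's analysis code: `welch_t` is a hypothetical name, and no p-value is computed.

```python
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    # Squared standard error of each mean, using the sample variance (n - 1 denominator).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])  # t = -sqrt(3), about -1.732; df about 4.41
```

Unlike the pooled-variance value of n1 + n2 - 2, the Welch-Satterthwaite df is generally not a whole number. `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same t together with a two-sided p-value.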
Relationship between performance and happiness
Previous studies have shown a positive correlation between happiness and performance (Schutte, Schuettpelz & Malouff, 2001). Participants were asked to rate their level of happiness before the questionnaire began. A Welch's t-test showed a significant difference between participants who responded 'OK', 'slightly unhappy' or 'very unhappy' (M = 0.62, SD = 0.062) and those who reported a slightly happier emotional state (M = 0.59, SD = 0.072), t(111) = -2.90, p = .004. These results suggest that people in a neutral or negative mood performed significantly better on this general knowledge task than those whose mood skewed positive. A Welch's t-test was used because Levene's test showed a significant difference between the group variances, F(1, 92) = 14.11, p < .001.

Relationship between performance and age
We were interested in whether performance differed between age groups. Participants were split into two groups at the mean age of 32 years. An independent-samples t-test showed no significant difference between participants under 32 (M = 0.61, SD = 0.063) and those over 32 (M = 0.59, SD = 0.079), t(90) = 1.24, p = .21.

Relationship between performance and Brexit vote
A Welch's t-test showed no significant difference in performance between 'leavers' (M = 0.59, SD = 0.055) and 'remainers' (M = 0.6, SD = 0.073). A Welch's t-test was used because Levene's test showed a significant difference between the group variances, F(1, 92) = 4.91, p = .02.
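Several comparisons above switch to Welch's t-test when Levene's test rejects equal variances. The statistic behind that decision can be sketched in plain Python as a one-way ANOVA on each observation's absolute deviation from its group mean. This is the original, mean-centred form of Levene's test. The text does not say which centring was used, so this choice is an assumption, and `levene_w` is a hypothetical helper name.

```python
from statistics import mean


def levene_w(*groups: list[float]) -> tuple[float, int, int]:
    """Levene's W statistic (mean-centred) and its F(k - 1, N - k) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    devs = []
    for g in groups:
        m = mean(g)
        devs.append([abs(x - m) for x in g])
    dev_means = [mean(d) for d in devs]
    grand = sum(sum(d) for d in devs) / n_total
    # One-way ANOVA on the absolute deviations: between-group vs within-group spread.
    between = sum(len(d) * (dm - grand) ** 2 for d, dm in zip(devs, dev_means))
    within = sum((x - dm) ** 2 for d, dm in zip(devs, dev_means) for x in d)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


print(levene_w([1, 2, 3, 4], [2, 4, 6, 8]))  # W = 2.4 on F(1, 6)
```

If W exceeds the critical F(k - 1, N - k) value, equal variances are rejected and Welch's t-test is preferred over Student's. `scipy.stats.levene(a, b, center='mean')` should return the same statistic along with a p-value. SciPy's default, `center='median'`, is the Brown-Forsythe variant.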
Relationship between performance and gender
An independent-samples t-test found no relationship between performance and gender. There was no significant difference between the performance of males (M = 0.60, SD = 0.064) and females (M = 0.61, SD = 0.08), t(111) = 0.68, p = .49.

Relationship between performance and religiosity
Participants were asked "how religious are you?" and responded on a 5-point Likert scale from 'Not religious at all' to 'Very religious'. Participants who answered 'neutral' were excluded from both groups. An independent-samples t-test showed a significant difference in performance between religious participants (M = 0.57, SD = 0.06) and non-religious participants (M = 0.64, SD = 0.05), t(94) = 6.46, p < .001. Non-religious participants performed better; the t value is well above the critical value of 1.98.

Performance of frequent vs. infrequent news readers
Participants were asked "on average, how often do you read/watch the news?" and were given five choices from 'Rarely or never' to 'More than once a day'. An independent-samples t-test showed no significant difference in performance between frequent news readers (M = 0.6, SD = 0.07, n = 108) and infrequent news readers (M = 0.58, SD = 0.047, n = 5), t(111) = 0.60, p = .54.
Figure 9. Box plots of the group comparisons. The y-axis represents accuracy; the x-axis represents groups.

Discussion
Review of hypotheses
Our study found a significant improvement in accuracy when applying weighted methods to crowd aggregation, in particular when applying higher weights to those deemed "experts" within the crowd (H1 & H2). We also found a similar positive effect on crowd accuracy when low-performing "non-experts" were either given no weight (ExW) or down-weighted on an exponential scale relative to high performers (W5, W6, W7). We also found evidence of an optimal intermediate weighting between the expert and unweighted methods (H3). However, aggregating participant-normalised confidence (H4) had the opposite effect, reducing accuracy relative to the unweighted method.

Possible explanations for the current findings
This study yielded support for the hypothesis that weighted methods can make the crowd wiser. Measuring "crowd wisdom" as overall questionnaire accuracy, we explored various aggregation methods for defining experts and weighting them relative to non-experts. The group 2 analysis proved particularly fruitful. Group 2 used a more objective method than the subjective methods in group 1. This allowed us to search for an optimal weight that significantly improved the crowd's accuracy, particularly compared to the unweighted method. When we explored the "weight space" for experts, we observed a loss in accuracy with a base number greater than 5 and a significant drop in accuracy in the extreme case of χ = 1000. The method is highly sensitive because the weights grow exponentially with the base. Due to time constraints, we were unable to find the exact optimal base number, but the results indicated that it lies around 4 (W4). These results also showed that over-dependence on experts can have a negative effect on crowd accuracy (for example, W1000). One explanation is that, although experts may perform better overall, their gaps in knowledge may be domain-specific, and the wider crowd can compensate for them. For example, an academic may have little knowledge of sports and business, whereas a "generalist" in the crowd may know those areas well. This may be particularly true of this fact-checking survey, as the questions were spread across multiple news categories, making consistency a challenge for domain experts.

The crowd confidence weighted method (CW) did not increase the accuracy of the crowd and was, in fact, the weakest-performing weighted method. Normalisation was required to remove participants' confidence biases as far as possible. The decrease in crowd accuracy could be due to low-performing participants being over-confident in incorrect decisions while experts remain conservative. This pattern is consistent with the Dunning-Kruger effect (Kruger & Dunning, 1999; Hodges, Regehr & Martin, 2001). A similar effect was observed in our data: self-reported experts were less accurate than self-reported non-experts. These confidence results contradict the conclusions of Aydin et al. (2014), who found that aggregating only the 'certain' opinions and weighting them by degree of confidence was an effective method that resulted in higher accuracy. Interestingly, we observed higher accuracy in participants who were in a negative mood than in those who were in a slightly positive mood, as defined by their self-reported level of happiness. This result contrasts with previous studies that found positive mood promotes higher performance (Kavanagh, 1987).

Areas for future research
The conclusions above cannot go beyond informed speculation without further experimental work to test them. I therefore propose the following areas of interest for future research in this domain.
(i) Firstly, further testing should be carried out to reproduce these findings. We found a significant effect of weighting experts in the crowd through repeated-measures testing, but the design of this study focused narrowly on current affairs and news.
(ii) Secondly, it would be worthwhile to explore the expert space to better understand how an expert can best be defined and identified. This study did not exhaustively explore the possible definitions of an expert. For example, experts could be defined using participant metadata (age, nationality, news frequency).
(iii) Thirdly, if such a mechanism were implemented and made available to the public, a dynamic weighted method should be investigated. Such a method would weigh recent performance against historical performance, at both the individual and the group level.
(iv) In this study, participants were anonymous and faced no reputational risk for performing badly. Crowdsourcing systems are decentralised by design, so the effect of anonymity and reputational risk on them should be tested further. Anonymity may be acceptable and have no impact on discrete tasks. However, some form of reputational risk could play an important part in designing higher-performing crowdsourcing systems.
(v) Lastly, a hybrid ('Augmented Intelligence') system combining artificial intelligence with traditional human judgement could help detect misinformation on the internet. Such a system could improve accuracy and help address the scale at which "fake news" is being published.

Conclusion
Our study found that applying a performance-based "expert" weighted method to the crowd improves the crowd's wisdom, as measured by crowd accuracy. This finding contrasts with previous research that was unable to find a significant improvement in accuracy from applying weighted methods. This research indicates that, to optimise crowdsourcing, experts within the crowd should be given higher weight than non-experts. Future research should investigate ways in which experts can be assessed and weighted dynamically.
References
Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236.
Alter, A. L., & Oppenheimer, D. M. (2009). Uniting the tribes of fluency to form a metacognitive nation. Personality and Social Psychology Review, 13(3), 219-235.
Aydin, B. I., Yilmaz, Y. S., Li, Y., Li, Q., Gao, J., & Demirbas, M. (2014, June). Crowdsourcing for multiple-choice question answering. In Twenty-Sixth IAAI Conference.
Baly, R., Karadzhov, G., Alexandrov, D., Glass, J., & Nakov, P. (2018). Predicting factuality of reporting and bias of news media sources. arXiv preprint arXiv:1810.01765.
Begg, I. M., Anas, A., & Farinacci, S. (1992). Dissociation of processes in belief: Source recollection, statement familiarity, and the illusion of truth. Journal of Experimental Psychology: General, 121(4), 446.
Bonabeau, E. (2009). Decisions 2.0: The power of collective intelligence. MIT Sloan Management Review, 50(2), 45.
Brabham, D. C. (2013). Crowdsourcing (p. 3). MIT Press.
Budescu, D. V., & Chen, E. (2014). Identifying expertise to extract the wisdom of crowds. Management Science, 61(2), 267-280.
Camerer, C. F., & Johnson, E. J. (1997). The process-performance paradox in expert judgment: How can experts know so much and predict so badly? Research on judgment and decision making: Currents, connections, and controversies, 342.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dittmar, J. E. (2011). Information technology and economic change: The impact of the printing press. The Quarterly Journal of Economics, 126(3), 1133-1172.
Drew, I. (2018). Applying weighting methods to crowd truth judgements: Can the wisdom of the crowd be made wiser.
Expert. (n.d.). Retrieved July 12, 2019, from https://www.merriam-webster.com/dictionary/expert
Galton, F. (1907). Vox populi (the wisdom of crowds). Nature, 75(7), 450-451.
Glenberg, A. M., & Epstein, W. (1987). Inexpert calibration of comprehension. Memory & Cognition, 15(1), 84-93.
Goncalves, J., Hosio, S., Rogstadius, J., Karapanos, E., & Kostakos, V. (2015). Motivating participation and improving quality of contribution in ubiquitous crowdsourcing. Computer Networks, 90, 34-48.
Hackman, J. R., & Oldham, G. R. (1980). Work redesign.
Hammond, K. R., Hamm, R. M., Grassia, J., & Pearson, T. (1987). Direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. IEEE Transactions on Systems, Man, and Cybernetics, 17(5), 753-770.
Hara, K., Adams, A., Milland, K., Savage, S., Callison-Burch, C., & Bigham, J. P. (2018, April). A data-driven analysis of workers' earnings on Amazon Mechanical Turk. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 449). ACM.
Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied, 5(2), 205.
Hodges, B., Regehr, G., & Martin, D. (2001). Difficulties in recognizing one's own incompetence: Novice physicians who are unskilled and unaware of it. Academic Medicine, 76(10), S87-S89.
Hong, L., & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences, 101(46), 16385-16389.
Horne, B. D., Dron, W., Khedr, S., & Adali, S. (2018b, April). Assessing the news landscape: A multi-module toolkit for evaluating the credibility of news. In Companion Proceedings of The Web Conference 2018 (pp. 235-238). International World Wide Web Conferences Steering Committee.
Horne, B. D., Khedr, S., & Adali, S. (2018a, June). Sampling the news producers: A large news and feature data set for the study of the complex media landscape. In Twelfth International AAAI Conference on Web and Social Media.
Horton, J. J., & Chilton, L. B. (2010, June). The labor economics of paid crowdsourcing. In Proceedings of the 11th ACM Conference on Electronic Commerce (pp. 209-218). ACM.
Howe, J. (2006). The rise of crowdsourcing. Wired Magazine, 14(6), 1-4.
Janis, I. L. (2008). Groupthink. IEEE Engineering Management Review, 36(1), 36.
Kavanagh, D. J. (1987). Mood, persistence, and success. Australian Journal of Psychology, 39(3), 307-318.
Kephart, J. O., Hogg, T., & Huberman, B. A. (1990). Collective behavior of predictive agents. Physica D: Nonlinear Phenomena, 42(1-3), 48-65.
Kittur, A., & Kraut, R. E. (2008, November). Harnessing the wisdom of crowds in Wikipedia: Quality through coordination. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (pp. 37-46). ACM.
Kittur, A., Nickerson, J. V., Bernstein, M., Gerber, E., Shaw, A., Zimmerman, J., ... & Horton, J. (2013, February). The future of crowd work. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (pp. 1301-1318). ACM.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121.
Lakhani, K. R., Wolf, R. G., Feller, J., & Fitzgerald, B. (2005). Perspectives on free and open source software. Perspectives on Free and Open Source Software, 1-22.
Machor, P. G. J. L. (2008). New directions in American reception study. Oxford University Press on Demand.
Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50(4), 370-396.
Maslow, A. H. (1954). Motivation and personality. New York: Harper and Row.
Mendes, M., & Pala, A. (2003). Type I error rate and power of three normality tests. Pakistan Journal of Information and Technology, 2(2), 135-139.
Miller, C. C., Burke, L. M., & Glick, W. H. (1998). Cognitive diversity among upper-echelon executives: Implications for strategic decision processes. Strategic Management Journal, 19(1), 39-58.
Murphy, A. H., & Winkler, R. L. (1977). Can weather forecasters formulate reliable probability forecasts of precipitation and temperature? National Weather Digest, 2(2), 2-9.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment.
Noland, M., Moran, T., & Kotschwar, B. R. (2016). Is gender diversity profitable? Evidence from a global survey. Peterson Institute for International Economics Working Paper, (16-3).
Norman, G. R., Rosenthal, D., Brooks, L. R., Allen, S. W., & Muzzin, L. J. (1989). The development of expertise in dermatology. Archives of Dermatology, 125(8), 1063-1068.
Pennycook, G., & Rand, D. G. (2018). Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking. Journal of Personality.
Pennycook, G., Cannon, T. D., & Rand, D. G. (2018). Prior exposure increases perceived accuracy of fake news. Journal of Experimental Psychology: General.
Polzer, J. T., Milton, L. P., & Swann Jr, W. B. (2002). Capitalizing on diversity: Interpersonal congruence in small work groups. Administrative Science Quarterly, 47(2), 296-324.
Schutte, N. S., Schuettpelz, E., & Malouff, J. M. (2001). Emotional intelligence and task performance. Imagination, Cognition and Personality, 20(4), 347-354.
Shamlo, N. B., & Alavi, S. (2018, August). Fact Checker: AVOW. Retrieved July 14, 2019, from https://www.avow.ai/
Silverman, C., & Singer-Vine, J. (2016). Most Americans who see fake news believe it, new survey says. BuzzFeed.
Simon, H. A. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99-118.
Surowiecki, J. (2005). The wisdom of crowds. Anchor.
Swift, A. (2016). Americans’ trust in mass media sinks to new low. Gallup News, 14.
Wallis, K. F. (2014). Revisiting Francis Galton's forecasting competition. Statistical Science, 420-424.
Zhao, Y., & Zhu, Q. (2014). Evaluation on crowdsourcing research: Current status and future direction. Information Systems Frontiers, 16, 417-434.
Appendix
Appendix A – The list of the 120 claims participants were required to respond to during the quiz.
Claim
In the UK, youth unemployment is down 44% since 2010.
Ghanaians are allowed to divorce only if they attend court dressed the same clothing they wore when they got married.
All cameras on M1 and M25 go live at midnight tonight, set at 72mph. Auto ticket generating system with 6 point penalty. Watch your speed and tell everyone else tonight, any speed over 90mph is instant ban & possible court & custodial sentence order!! Drive safely.”
Lawmakers in California have proposed a new law called the "Check Your Oxygen Privilege Act."
MDMA shown to increase empathy over other substances
Sniffing rosemary increases human memory by up to 75 percent.
UK won more gold medals in Rio Olympics 2016 than China
UK taxpayers are paying less income tax than 2010.
There are 4000 abortions a week in Britain.
NASA rejected Hillary Clinton's childhood dream of becoming an astronaut.
Over the past year the number of illegal immigrants crossing the Mexico-US has significantly decreased.
‘A record number of people kill themselves in prisons in England and Wales in 2016, figures show.’
The UK will be paying the Brexit divorce bill until 2064.
The number of poor [working class] students dropping out of university at the highest level in five years.
The western cape has the lowest unemployment rate of all provinces.
Trump used crib notes during listening session with parkland survivors
Police officers have been cut by 21,000 since 2010
More than 80% of student graduates won't repay their loan in full.
China has a Panda shaped solar farm
The US is the largest donor of humanitarian aid in Syria
Some antidepressants are more effective than others
Wealthy professionals are most likely to drink regularly.
BBC Newsnight edited photos of Jeremy Corbyn to make him look close to Russia.
In a 2014 survey 24% of people thought that the USA was the country that posed the greatest danger to world peace.
A woman who entered an Uber in Tampa, Florida, on 18 February 2019 was the victim of an attempted kidnapping by a "sex traffic worker."
The Institute for Public Policy Research found household bills will rise by between £245 and £1,961 a year after Brexit.
Luxembourg is the capital of Luxembourg
The Walton family makes more money in one minute than Walmart workers do in an entire year. This is what we mean when we talk about a rigged economy.
In 2018, Apple was the largest publicly traded company in the world
The single market is dependent on membership of the EU.
What we’ve said all along is that we want a tariff free trade access to the European market and a partnership with Europe in the future.
"we just had 2 years (2016-2018) of record-breaking Global Cooling"
"Trump's action could push the Earth over the brink, to become like Venus, with a temperature of two hundred and fifty degrees, and raining sulphuric acid."
Drug kingpin Joaquín "El Chapo" Guzmán testified that he gave millions of dollars to Nancy Pelosi, Adam Schiff, and Hillary Clinton.
Musician Jay-Z said that "satan is our true lord" and that "only idiots believe in Jesus" during a backstage tirade in November 2017.
President Trump's oft-repeated slogan "America First" was also a credo of the white supremacist Ku Klux Klan organization.
The new technology, developed by private company ASI Data science, can detect Daesh propaganda “with 99.99%% accuracy”.
The size of the world's ice caps (type of glacier) are at record high levels.
After leaving the EU, the UK will take back control of roughly £350 million per week.
Facebook shut down an AI experiment after chatbots developed their own language.
Illegal crossings at the US-Mexico border have reduced by 40%.
An individual's psychological attributes can be determined by observing and feeling the skull.
Staying in the single market & customs union would not cover services.
The position and relative movement of continents is at least partially due to the volume of Earth increasing.
98% of US mass shootings occur in gun-free zones
Former Federal Bureau of Investigation director Robert Mueller’s indictments (formal accusation that a person has committed a crime) prove that there was no collusion between Trump campaign and Russia.
Vaccines may cause autism.
Eating bacon is better for you than tilapia (common name for nearly a hundred species of cichlid fish).
If you live in an area where the council is run by the Labour party, you pay £100 more than under the Conservatives.
A vintage Heineken advertisement showed a toddler drinking a beer and boasted about having the youngest customers in the business.
New study shows that Marijuana leads to a 'complete remission of Crohn's Disease.
The McDonald's fast food chain announced they will be phasing out the Big Mac by July 1st.
Coffee causes cancer
There are 480,000 young people who are hidden from the unemployment figures.
Snapchat CEO has said that the app is for rich people and so did not want to expand
Parents should ask a baby's permission before changing their nappy/diaper.
Medical marijuana has no health risks says WHO
Trump doubled his African-american poll numbers (from 11% to 22%) in a week.
It would take $135 billion to eradicate global poverty.
More than 700 attacks have been launched from the Afrin area under PYD/YPG
Google search spike suggests many people don't know why they voted for Brexit
David Davis [Secretary of State for Exiting the European Union] has never said the government had impact assessments of the effect Brexit on different parts of the economy.
“There is more money going into our schools in this country than ever before. We know that real-terms funding per pupil is increasing across the system, and with the national funding formula, each school will see at least a small cash increase.”
“It is an absolute scandal that the Conservatives are pressing ahead with a plan that could leave over a million children without a hot meal in schools.”
Japan’s prime minister, Shinzo Abe, championing of women’s advancement is a factor in “the beginning of a new era in female success”.
Donald Trump has been much tougher on Russia than Barack Obama.
“The top 1% of earners in this country are paying 28% of the tax burden. That is the highest percentage ever, under any Government.”
Last year, we increased the number of tourists [in South Africa] by 12.8%”
The National Institute of Health (NIH) have plans for lifting ban on human-animal chimeras.
There are only 18 minutes of total action in the average baseball game.
Diesel cars are more polluting than petrol cars
Nigeria contributes 23% of the global malaria cases
47% of the population don’t earn enough money to bring in a wife or husband from outside the EU.
Fewer than half Britons think Princess Diana's death was accidental.
Trump signed a bill blocking Obama-era background checks on guns for people with mental illness
Spending on mental health went up by £575 million last year
60% of UK trade is through EU trade agreements.
700,000 public workers use up half of kenya's taxes
The type of cladding used on Grenfell Tower is banned in Britain
The total number of london murders, even excluding victims of terrorism, has risen by 38% sine 2014.
In May 2018, president Donald Trump established a 'religious office' to give religious groups a 'voice in government'.
The CIA paid two psychologists $81 million "to develop and run their torture program."
3.7 million people living in the UK are citizens of another EU country. That’s about 6% of the UK population, according to the latest figures covering the year to June 2018.
12.7% of NHS staff say that their nationality is not British
Indians are the second most common nationalities of NHS staff
76% of British people support Shamima Begum being stripped of her citizenship
Romanians are the most common EU national to live in the UK
The number of EU nurses coming to the UK has fallen by 90% since the Brexit vote.
In the Brexit campaign, parties on both sides of the EU referendum made false claims.
50% of Irish exports go to Northern Ireland. Only 5% of Northern Ireland’s GDP goes to Ireland.
Every five minutes, 70 children will be born in the UK, 20 to mothers not born here.
The number of EEA nurses and midwives who joined the NHS for the first time fell by 91% from 2015/16 to 2017/18.
China is Britain's top trading partner
Neil Armstrong was the first man on the moon
Consumption of sugar causes type 2 diabetes
Approximately 1 in 4 people in the UK will experience a mental health problem each year
Eating organic food doesn't come with any nutritional benefits over non-organic food
Refugees or illegal immigrants living in Britain get a total yearly benefit of £29,900.
The plane in the Malaysian Flight MH370 was hidden away and reintroduced as Flight MH17 later the same year in order to be shot down over Ukraine for political purposes
We only used approx. 10% of our brain
Cold weather causes colds
The UK population is approximately 66 million in 2017
In January 2019, Christiano Ronaldo had the highest amount of Instagram followers in the world
The most expensive Big Mac can be found in Switzerland at 6.62 USD
UK drink approximately 95 million cups of coffee per day
In 2018, Amazon was the largest publicly traded company in the world
Singapore has the highest average IQ level in the world
2 billion tons of waste was dumped in 2016
American Bison is the heaviest land mammal
The EU currently costs the UK over £350 million each week - nearly £20 billion a year
“Labour reveals over 200,000 nurses have quit the NHS since 2010.”
The Ethiopian calendar is 7.5 years behind the Gregorian calendar due to the fact that it has 13 months.
Drinking lemon mixed with hot water for one to three months will cause cancer to disappear.
75% of the world’s diet is produced from just 12 plant and five different animal species.
If you call 999 in an emergency but can’t speak, press 55 and they can track where you are calling from using new technology.
There are 118 elements in the periodic table
Staying in the single market & customs union would not cover services.
The population of the UK in 2017 was 66 million
World War II ended in 1945
The Beatles are the best-selling artists of all time