This document summarizes a study that aimed to validate the Wordscores text analysis method. The study found that Wordscores lacked content, construct, and criterion validity when used to estimate policy positions of political parties based on electoral manifestos. Specifically, Wordscores did not adequately represent the constructs being measured, did not correlate well with other measures, and did not perform as expected compared to expert surveys. The study concludes that Wordscores should not be used to estimate party policy positions from manifestos without further validation efforts on a case-by-case basis.
2. Computer assisted methods for text analysis
Fig. 1 An overview of text as data methods.
Justin Grimmer and Brandon M. Stewart
Bruinsma, Gemenis Validating Wordscores
3. Wordscores
Originally proposed by Laver, Benoit & Garry (2003)
Popular tool (869 citations on Google Scholar)
Developed for political manifestos, but also used to study:
Party mergers, electoral coalitions, policy preferences,
speeches, reports from US state lotteries, Chinese newspaper
articles, public statements by US Senators, open-ended
questions ...
Attempts at validation are rather limited
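For readers unfamiliar with the method, the scoring step proposed by Laver, Benoit & Garry (2003) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy texts and the reference positions are all hypothetical, tokenization is a plain whitespace split, and the rescaling of virgin-text scores is left out.

```python
from collections import Counter


def word_scores(reference_texts, reference_positions):
    """Score each word from reference texts with known positions."""
    # Relative frequency of every word within each reference text.
    rel_freq = {}
    for name, text in reference_texts.items():
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        rel_freq[name] = {w: c / total for w, c in counts.items()}

    vocabulary = set().union(*rel_freq.values())
    scores = {}
    for word in vocabulary:
        # P(reference r | word w), derived from the relative frequencies.
        denom = sum(rel_freq[r].get(word, 0.0) for r in rel_freq)
        scores[word] = sum(
            (rel_freq[r].get(word, 0.0) / denom) * reference_positions[r]
            for r in rel_freq
        )
    return scores


def score_virgin_text(text, scores):
    """Raw virgin-text score: frequency-weighted mean of its word scores."""
    # Words that never occur in a reference text carry no score and are skipped.
    counts = Counter(w for w in text.lower().split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("virgin text shares no words with the reference texts")
    return sum(c / total * scores[w] for w, c in counts.items())


# Hypothetical toy data: two reference texts placed at 0 and 10.
references = {"left": "tax tax welfare", "right": "market market welfare"}
positions = {"left": 0.0, "right": 10.0}
scores = word_scores(references, positions)
virgin_score = score_virgin_text("tax welfare market market", scores)
print(scores["welfare"], virgin_score)
```

In this toy case a word used equally by both reference texts lands midway between their positions, which shows how words without policy content pull every virgin text toward the centre of the scale.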
5. Previous attempts at validation
Mostly against CMP data though Benoit & Laver (2007)
advise against this
Only assess criterion validity
Only assess ordinal placement (Hjorth et al. 2015)
Only use Spearman’s ρ or Pearson’s r (and thus no
assessment of systematic measurement error)
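The last point can be illustrated with a small sketch. The placements below are hypothetical, and the helper functions are written out by hand for illustration only (the rank helper assumes no ties). An estimate that compresses and shifts the expert scale still correlates perfectly with it, so neither coefficient reveals the systematic error.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; assumes no ties, for brevity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical expert placements and a biased estimate of them.
expert = [2.0, 4.0, 5.0, 7.0, 9.0]
estimate = [0.5 * v + 5.0 for v in expert]  # compressed and shifted upward

mean_error = sum(e - x for e, x in zip(estimate, expert)) / len(expert)
print(pearson_r(expert, estimate), spearman_rho(expert, estimate), mean_error)
```

Both coefficients come out at (numerically) 1 even though the estimates sit well above the expert placements on average.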
6. Replication of the original Laver et al. article
Table 1: Replication of the original scores
[Table with embedded plots: party placements on the EC and SO scales (0 to 20), with columns for 5 parties and 7 parties and rows by Stata version. Row labels recoverable from the source: 0.36; Laver et al. (2003); 23-Jun-2009; Laver et al. (2003) Replication Material. Parties shown: DL, Labour, FF, FG, PD, SF, Greens. The plotted positions are not recoverable.]
7. Hjorth et al. validation
[Figure: one panel per election year, with axis labels ws_rank and exp (as extracted), each running from low to high. Years: 1945, 1950, 1953, 1957, 1960, 1964, 1966, 1968, 1971, 1973, 1977, 1979, 1981, 1984, 1987, 1988, 1990, 1994, 1998, 2001, 2005, 2007.]
12. Study Design
Documents
Using 2004 Euromanifestos to score 2009 Euromanifestos
Euromanifestos obtained from the Manifesto Project Database
Reference scores
Chapel Hill Expert Study (2002), Benoit & Laver Expert
Survey (2003-2004), Euromanifestos Project (2004)
Comparison
Chapel Hill Expert Study (2010), EU Profiler (2009),
Euromanifestos Project (2009)
Analysis
Use Lin’s Concordance Correlation Coefficient instead of
Spearman’s ρ or Pearson’s r
25 countries/territories × 4 dimensions × 3 reference scores × 2
transformations = 600 analyses
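As a rough sketch of why the concordance coefficient is preferred here: Lin's coefficient divides twice the covariance by the sum of the two variances plus the squared difference in means, so it falls below 1 whenever the two sets of scores differ in location or scale. The function name and the data are hypothetical, and the use of 1/n moments is an assumption of this sketch (software implementations may differ).

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The squared mean difference penalizes systematic shifts between scales.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical expert placements and an estimate shifted by 3 points.
expert = [2.0, 4.0, 5.0, 7.0, 9.0]
shifted = [v + 3.0 for v in expert]

ccc_identical = lin_ccc(expert, expert)
ccc_shifted = lin_ccc(expert, shifted)
print(ccc_identical, ccc_shifted)
```

Identical scores give a coefficient of 1, while the shifted estimate scores markedly lower even though its ordering and spacing match the expert placements exactly.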
16. Types of validity
Following Carmines & Zeller (1979):
Content Validity
Does the method represent all facets of a construct?
Construct Validity
Does the method correlate with other measures reflecting the
same concept?
Criterion Validity
Does the method behave as expected within a given theoretical
context?
17. Content validity for EU Integration
[Figure: density plots of mean word relevance (x-axis from 0 to 1, y-axis density), one panel each for BNP, Conservatives, Greens, Labour, LibDem, PC, SNP, UKIP and Total.]
20. Conclusion
No serious validation of Wordscores until now
This validation found it lacking in content, construct, and
criterion validity
Wordscores should not be used to estimate parties’ policy
positions using electoral manifestos as reference and virgin
texts
21. Outlook
Wordscores might still be useful in other applications where
the assumptions of ideal point estimation for words might be
approximated
However, a case-by-case validation should be applied