Third webinar of the series: Measuring people's perceptions, evaluations and experiences, 29 September 2020, More information at: http://www.oecd.org/statistics/lac-well-being-metrics.htm
The Measurement of Trust and Subjective Well-being: OECD Guidelines and practical applications, Lara Fleischer
1. #3 The Measurement of Trust and Subjective Well-being: OECD Guidelines and practical applications
Webinar series – Measuring people's perceptions, evaluations and experiences
29 September 2020
2. Languages today
• Translation into both Spanish and English is available
• Please write questions into the chat as we go along – if possible, in Spanish and English
3. Today's agenda
(1) OECD Guidelines on measuring trust and subjective well-being (SWB) – Lara Fleischer, OECD WISE Centre
(2) SWB in Latin America – Mariano Rojas, Universidad Internacional de La Rioja
(3) Using trust and SWB measures in practice – Lina Martínez, POLIS Observatory of Public Policies
4. OECD Guidelines on Trust and SWB
1. Why do we care about trust and swb?
2. OECD Guidelines on Measuring Trust and Subjective Well-Being
• Overview
• Deep dive: mitigating measurement error through smart survey design
5. 1. Why do we care about trust and subjective well-being?
7. Definition – SWB
Good mental states, including the various evaluations that people make of their lives, and the affective reactions of people to their experiences
Life evaluations (e.g. life satisfaction)
Affect: positive (happy, content); negative (sad, anxious)
Eudaimonia (positive mental functioning; meaning + purpose in life; personal growth)
8. SWB emphasises the views of citizens
Captures people's own views about and experiences of life
• Doesn't assume to know what makes people happy; lets people be the judge of how their lives are going
• Helps understand whether changes in society are having a positive or negative impact on how people feel
• Experimental, but with increasing usage in social cost-benefit analysis
9. SWB moves with the times…
[Charts: life satisfaction (left axis, 5.6-7.6) and long-term unemployment rate (right axis, 0-7%), 2007-2012, for the United States and the OECD Euro area (selected countries)]
Source: OECD (2013) How’s Life? 2013, based on Gallup World Poll
12. Definition – Trust
A person's belief that another person or institution will act consistently with their expectations of positive behavior
Other people (interpersonal trust)
Institutions (institutional trust)
13. Interpersonal trust matters beyond its intrinsic value…
Positive association with income per capita…
Source: Algan and Cahuc, 2013
14. Interpersonal trust matters beyond its intrinsic value…
… and market regulation
Source: Algan and Cahuc, 2013
15. Interpersonal trust matters beyond its intrinsic value…
… Italian provinces with higher social capital adopted COVID-19 social distancing earlier
Source: Durante et al., 2020
16. Institutional trust is also important
[Charts: Trust in Government vs. GDP per capita (R²: 0.54) and Trust in the Judicial System vs. GDP per capita (R²: 0.64); confidence in national government / the judicial system (%) plotted against GDP per capita, thousands of dollars]
Data source: Gallup World Poll, OECD.Stat (2006-2015)
Source: OECD, 2020
17. Institutional trust is affected by crises
Impact of the GFC on trust in parliament in Europe
Source: Algan et al., 2017
19. OECD Measurement Guidelines
• Key audiences: National Statistical Offices, other data producers, and data users
• Aim to improve the quality and availability of measures of SWB and trust
• Essential that official measures are collected in a consistent way to enable comparisons, both between surveys and over time
20. What do the Guidelines cover?
Reporting and analysing the data
Methodological issues
Concept and validity
Good practice in data collection
*prototype question modules*
23. Where does self-report survey error come from?
Measurement error arises from an interaction of factors:
• Respondent factors: motivation, fatigue, memory
• Survey factors: cognitive demands of questions, survey mode, question order effects, sampling frame and time
• Situation factors: the weather, an election
• The construct being measured: how interesting/relevant is the topic for a respondent?
24. There is no “perfect” measure …
• All measures contain error (objective ones too!)
• The goal is to find a "good enough" measure to distinguish meaningful patterns
• We need to understand how survey design can reduce or manage error:
– Question wording
– Response formats
– Survey context
– Survey mode
– (Cross-cultural) response styles
26. Question wording - general considerations
Goal: respondents comprehend and interpret questions in a similar and unambiguous way
• Avoid phrases that are idiomatic / age-specific / not widely used: e.g. "in the past four weeks, have you felt full of pep?"
• "Double-barrelled" questions can confuse/demotivate: e.g. "in the past four weeks, have you felt calm and happy?"
• Avoid vague quantifiers ("very", "somewhat", "a little"): e.g. "on a scale from 0-4, have you felt a little anxious recently?"
• Avoid leading questions: e.g. "Is it true that you are happier now than you were 5 years ago?"
27. Question wording – do changes in wording matter? I
Interpersonal trust is frequently asked via the "Rosenberg question":
"Generally speaking, would you say that most people can be trusted or that you can't be too careful in dealing with people?"
• Caution is not exactly the same concept as distrust
• Initial evidence that being careful carries different connotations for different population subgroups (Smith 1997; Soroka, Helliwell and Johnston 2007)
28. Question wording – do changes in wording matter? I
The "can't be too careful" phrasing induces a priming effect on relatively vulnerable groups. A more neutral question wording is preferable for interpersonal trust.
29. Question wording – do changes in wording matter? II
For institutional trust, specifying the context of institutional behaviour can make a difference in some cases. Further investigation is needed.
30. Question wording - timeframes for affect
Trade-off: the shorter the timeframe, the lower the (assumed) risk of recall bias but the higher the impact of random events; the longer the timeframe, the reverse.
• Right now / Last four hours: most usually seen in experience sampling and time-use diaries (repeated measures)
• Yesterday: most usually seen in large-scale surveys
• Last two weeks / Last four weeks: most usually seen in small-scale survey and mood (disorder) research
31. Response formats - general considerations
Goal: properly represent the construct of interest and the full range of possible responses BUT also be understandable to respondents
• Too few response options can restrict expression: e.g. not at all happy / pretty happy / completely happy; do not trust / trust somewhat / trust
• Too many can overburden or fail to add value: e.g. on a 0-100 scale, people tend to respond in multiples of 5 or 10 (see the sketch below)
• Verbal labels can add to cognitive burden and introduce vagueness: e.g. a little happy / somewhat happy / slightly happy
• Numerical scales are easier to remember and reinforce the idea of equal intervals… also easier from a translation perspective
• Yes/no measures have some advantages for long batteries of negative and positive affect items (which are later summed)
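To illustrate the 0-100 heaping point above, here is a minimal sketch in Python using simulated (not survey) data, showing how answers pile up on multiples of 5:

```python
# Illustrative sketch only: simulate digit "heaping" on a 0-100 scale.
import numpy as np

rng = np.random.default_rng(1)
responses = rng.integers(0, 101, size=1000)          # hypothetical raw answers
responses[:600] = np.round(responses[:600] / 5) * 5  # 60% round to a multiple of 5

share = np.mean(responses % 5 == 0)
print(f"Share on multiples of 5: {share:.0%} (about 21% expected if uniform)")
```

If heaping like this is present, a 0-100 scale adds little information over a shorter one.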
32. Response formats: number of response options
"Taking all things together, how would you say things are these days – would you say that you're….?"
• Longer scale shows greater variation
• Whole distribution shifts upwards when given 5 response options
[Chart: share of respondents (0-60%) choosing "not at all happy", "not too happy", "pretty happy", "very happy", "completely happy" under Version A (3 options) and Version B (5 options)]
Source: Smith (1979) National Opinion Research Center, Surveys 5059A and 5059B
33. Response formats: key take-aways
Take a standardised approach to response format to ensure the consistency of measurement, especially in an international context!
For life evaluation and trust:
• 0-10 numerical scales with verbal scale anchors are likely to perform best
• Numerical response order should be presented consistently (i.e. 0-10 instead of 10-0)
• When choosing scale anchors, the labels should represent absolute responses (e.g. completely / not at all)
For affect and eudaimonia:
• Less is known
• If multiple questions are used for affect, the sensitivity of any one question is less critical
34. Response styles – general considerations
Response style: a respondent's repeated tendency towards a particular response bias or heuristic, e.g. 'moderate responding' = selecting response options towards the middle of the response scale
• Potential issue if there are systematic differences between groups of individuals or countries
• Discussed to some degree for all perception measures, but especially frequently for subjective well-being
35. Response styles – country fixed effects in life evaluation
Notes: OLS regression on individual-level data, including controls for: survey year, age, age², gender, marital status, number of children, income, education, area of residence (urban, rural, etc.), employment status, immigrant status, migrated to country <5 years ago. Gallup World Poll, N = 677,302 observations. Life evaluations measured on a 0-10 Cantril Ladder scale.
Source: Exton, C., C. Smith and D. Vandendriessche (2015), "Comparing Happiness across the World: Does Culture Matter?", OECD Statistics Working Papers, No. 2015/04, OECD Publishing, Paris, http://dx.doi.org/10.1787/5jrqppzd9bs2-en
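For readers who want the gist of this analysis in code, here is a hedged sketch of a country fixed-effects regression. The data, column names and reduced set of controls are invented stand-ins; the authors' actual Gallup World Poll specification differs in detail.

```python
# Sketch (not the authors' code): OLS with country fixed effects,
# Germany as the reference group, as in the chart on this slide.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Tiny synthetic stand-in for the individual-level data (illustrative only)
rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "country": rng.choice(["Germany", "France", "Hungary", "Denmark"], n),
    "life_eval": rng.normal(6.5, 2, n).clip(0, 10),  # 0-10 Cantril Ladder
    "age": rng.integers(16, 90, n),
    "log_income": rng.normal(10, 1, n),
})

model = smf.ols(
    "life_eval ~ C(country, Treatment(reference='Germany'))"
    " + age + I(age**2) + log_income",
    data=df,
).fit()

# The C(country) coefficients are the "country fixed effects": residual
# country-level differences in life evaluations after the controls.
print(model.params.filter(like="country"))
```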
36. Response styles – what could explain this?
Different possibilities:
• Unmeasured life circumstances (e.g. social context / institutions / social connections)
• Language differences that influence scale use; difficulty translating concepts
• Cultural impact: differences in how people feel about their life circumstances (meaningful)
• Cultural bias: differences in what people say about their life circumstances and how they communicate emotions
37. Response styles – key take-aways
• Important difference between cultural impact and cultural bias
• Even where the existence of response styles is established, they do not necessarily harm overall data quality
• Rather than relying on ex post statistical adjustment techniques, focus on good survey design
• When cultural response effects are significant, look at changes in response patterns rather than levels of responding for international comparisons (see the sketch below)
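A minimal sketch of that last take-away, comparing changes rather than levels; the `panel` DataFrame and its numbers are purely illustrative.

```python
import pandas as pd

# Hypothetical country-year means (illustrative numbers only)
panel = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "mean_life_eval": [7.2, 7.0, 5.9, 5.8],
})
panel = panel.sort_values(["country", "year"])
# Stable, culture-specific scale use drops out of year-on-year changes
panel["change"] = panel.groupby("country")["mean_life_eval"].diff()
print(panel)  # compare `change` across countries, not the levels
```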
38. Main take-away!
• Good survey design matters for all measures, but particularly for perceptions!
• We need more controlled experimentation with methods
• Consistency is more important than finding the "perfect measure"
• Check out the Guidelines at: https://www.oecd.org/statistics/better-life-initiative.htm
39. The OECD Better Life Initiative
• It is increasingly recognised that it is important to go beyond GDP in measuring the progress of societies
• Economic growth is a means to an end, not an end in itself
• Measuring well-being is about capturing the final outcomes that matter to people
• People's subjective perceptions, evaluations, and experiences of life are a crucial part of that
Editor's Notes
Welcome, everyone to the third webinar.
My name is Lara Fleischer, and I work at the newly launched OECD WISE Centre for Well-being, Inclusion, Sustainability and Equal Opportunities.
I am joined by two other great speakers, Mariano Rojas and Lina Martinez who I will introduce in a second.
Today's webinar delves into two types of perception measures – trust and subjective well-being (SWB), on which the OECD has written measurement guidelines. We will look at some solutions to potential measurement challenges, reflect on these measures in the Latin American context, and consider how such measures can inform decision-making.
Before we begin, a note on the languages today.
I know we have a lot of Spanish speakers in the audience, so simultaneous translation into Spanish and English is available. You can activate it by clicking on the globe button at the bottom of your Zoom screen.
All slides will be in English, and my part of the webinar will be in spoken English, but Mariano and Lina will be speaking in Spanish.
This is today’s agenda. We will start with a 40-45 min presentation on the guidelines on trust and swb.
This will be followed by two 25-30 min presentations: by Mariano on SWB in Latin America, and by Lina, who will give practical examples of how trust and SWB surveys have been used by the POLIS institute in Colombia to evaluate policy interventions and provide rapid results on the impact of COVID-19 on happiness.
We will have plenty of time for Q&A after.
Time to introduce the other two speakers:
Mariano Rojas is probably a familiar face: he holds a PhD in Economics from Ohio State University, was President of the International Society for Quality of Life Studies and is currently affiliated with Universidad Internacional de La Rioja, Spain.
Mariano has been doing research on subjective well-being and happiness for the past two decades, with a regional focus on Latin America. His research is fundamentally interdisciplinary and multicultural, and he is the author of many books – too many to list them all – including Well-Being in Latin America: Policies and Drivers; The Scientific Study of Happiness; The Economics of Happiness: How the Easterlin Paradox Transformed our Understanding of Well-Being and Progress; and Measuring the Progress of Societies: Perspectives from Latin America.
Lina Martinez is an associate professor of public policy and director of the Polis observatory of public policies at Universidad Icesi, Colombia. She holds a Ph.D in public policy and her current research focuses on the informal economy, urban health policies, social mobility and life satisfaction.
I definitely encourage you to check out both of their publications.
My presentation will focus on some lessons learned from the oecd guidelines.
There is increasing consensus that we need to go beyond aggregate and solely economic measures of development, such as GDP, as a yardstick to how societies and the people in them are doing.
The OECD Better Life Initiative, which was launched almost a decade ago, underlines that measuring well-being is about the final outcomes that matter to people.
People’s subjective perceptions and experiences are a crucial part of that.
OECD has developed a multidimensional well-being framework shown on this slide.
It consists of three parts.
We operationalize this with more than 80 indicators.
We often say "happiness" but we usually mean much more than that.
It can be conceptualised as three types:
Global evaluations of life as a whole
People's more short-term moods
Eudaimonia
Including during COVID – data from the UK for the year ending March 2020.
In the year ending March 2020, average ratings of life satisfaction, happiness and anxiety in the UK all deteriorated; this is the first time since we started measuring them in 2011 that these three measures have significantly worsened compared with the year before.
But life satisfaction is still higher than at the start of measurement (in the midst of the financial crisis).
Compared to the year before:
average ratings of anxiety increased by 6.3%
average ratings of happiness in the UK fell by 1.1%
Life satisfaction ratings decreased by 0.9% over the year, from an average rating of 7.72 (out of 10) in Quarter 1 (Jan to Mar) 2019 to 7.65 in the same quarter of 2020. Although life satisfaction declined in the first quarter of 2020, it was still well above the lowest levels observed in 2012.
It can also be a predictor
Ward (2015): ups and downs in life satisfaction predict election outcomes in Europe better than macroeconomic variables, and life satisfaction inequality was related to the likelihood of voting for Brexit in UK local authorities.
Institutional trust is also related to SDG 16
Trust is essentially the best available proxy of social capital
intrinsically valuable
Some evidence that this is causal
Trust is computed as the country average from responses to the trust question in the five waves of the World Values Survey (1981-2008), the four waves of the European Values Survey (1981-2008) and the third wave of the Afrobarometer (2005). The question asks “Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?” Trust is equal to 1 if the respondent answers “Most people can be trusted” and 0 otherwise.
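The construction described above boils down to a country mean of a 0/1 variable; a minimal sketch, with made-up rows and hypothetical column names (the real WVS/EVS variables are coded differently):

```python
import pandas as pd

# Made-up micro-data; illustrative only.
wvs = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y"],
    "answer": ["Most people can be trusted", "Need to be very careful",
               "Most people can be trusted", "Need to be very careful",
               "Need to be very careful"],
})
# Trust = 1 if the respondent answers "Most people can be trusted", else 0
wvs["trust"] = (wvs["answer"] == "Most people can be trusted").astype(int)
print(wvs.groupby("country")["trust"].mean())  # country-level trust share
```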
Ease of opening a business (days)
The state must step in to regulate the relations among individuals when they are incapable of cooperating spontaneously, generating higher transaction costs
Using Italian data from phone location tracking of movements made by individuals after the pandemic began, this study finds sharper drops in mobility in areas with higher 'civic capital'.
(I think they also used blood donations and newspaper readership)
21 February – 9 March
Similar findings from other countries, e.g. counties in the United States with higher social capital recorded fewer deaths per capita
Intrinsic: A prerequisite for people’s political voice
Instrumental: Important for success of policies and reforms that depend on people’s compliance
There is a sharp decline in trust in the national political system in the post-crisis period; you can see the distribution shift to the right.
Data from European Social Survey
We will see what happens with COVID
Also, via the unemployment channel, the crisis has been linked to the rise of populist voting in European countries post-GFC.
Trust and subjective well-being are inherently personal experiences, something that is usually self-reported via surveys, so there is somewhat more debate on how to measure these intangible constructs. While many national statistical offices have started to collect official data on these in recent years, the majority still comes from smaller-scale unofficial surveys, especially beyond OECD countries.
Available for free online
There are lots of other guidelines as well (quality of the working environment, household micro-wealth statistics).
Both guidelines cover 5 topics
-Sets out a conceptual framework and working definition
Reviews what is known about the statistical quality (reliability and validity) of measures. I will not focus on this today, but the Guidelines show there is pretty good evidence that measures of SWB and trust give reliable and valid results and can be considered fit for purpose
Address methodological issues, such as the potential for measurement error in survey data
Give advice on best practice in data collection, planning, sample design and size, frequency, interviewer training
- Address the interpretation of results and analysis of microdata
- Provides a concrete set of question modules that data producers can insert into their household surveys
Both guidelines give a toolbox of measures for data producers to choose from, but each also includes a short core module that is supposed to be used as is and contains the questions with the highest evidence on validity, for which we want to encourage international comparability.
Within each core module, there is an additional “primary” measure which is supposed to be chosen if there is only space for one single question.
In this case, life evaluation.
Diving into methodological issues:
Measurement error is the extent to which a survey measure reflects unintended concepts
Survey error usually arises from a complex interaction between a range of different factors such as
Respondent factors (such as motivation, fatigue and memory)
Survey factors (cognitive demands made by questions, whether they are constructed in a non-confusing manner, survey mode, question order effects, sampling frame and time of year/day)
Situation factors (irrelevant cues like the weather, an election or a pandemic)
The construct of interest itself (how interesting/boring respondents find the survey topic)
Some of these are systematic (fixed survey design) and others random (especially situational factors).
As a data producer the thing you have most control over is the ‘survey factors’, which we will focus on today.
Meaningful patterns – such as changes over time and differences between population subgroups – need to be distinguished from noise in the data.
The guidelines discuss various aspects of survey design, I will focus today on question wording, response format and cross cultural response styles
I will give some examples, some drawn from trust and some from SWB, including some of the solutions we know about (but there are many open issues, and the guidelines will need to be updated as more evidence arrives)
ALL of these need to be consistent! If we want comparability!
Goal: respondents comprehend and interpret questions in a similar and unambiguous way
I am going to start with some very general (perhaps obvious) advice on what not to do, and then give an example from the trust guidelines that addresses how specific question wording has to be to conclude that it indeed taps into trust.
e.g. an early version of the European Health Interview Survey asked if respondents felt "full of pep" in the last 4 weeks …
…. Which means full of energy, vitality and high spirits.
Cognitive field testing by ONS suggested many did not understand this (particularly when they were also asked if they felt “full of energy”, which broadly amounts to the same thing)
Ways to test whether this is the case include conducting cognitive testing (though results are not often published), comparing response latencies, or comparing response distributions between different versions of a question and looking for patterns (a minimal sketch follows).
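For the distribution-comparison route, a chi-square test on the cross-tabulation of question version by response category is a common starting point; the counts below are invented for illustration:

```python
# Do two question wordings produce different response distributions?
from scipy.stats import chi2_contingency

counts = [
    [120, 340, 400, 140],  # version A, respondents per response category
    [90, 310, 420, 180],   # version B (invented counts)
]
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.4f}")  # small p: wording matters
```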
Unbalanced question: it only specifies one direction of trust, instead of also spelling out the alternative ("or do you think that most people cannot be trusted?").
Soroka, Helliwell and Johnston (2007) examine four different versions of interpersonal trust questions included in the 2000/2001 Equality, Security and Community (ESC) survey carried out in Canada, two of which feature a “caution rider”. They find that while women are less trusting than men when the standard trust question is used, they are more trusting than men when a question without the cannot be too careful rider is used. It therefore may be cautiousness, rather than trustworthiness, that drives gender differences in the Rosenberg trust question.
We worked with the ONS and Mexico to carry out split-sample experiments in the course of the Guidelines. Small sample sizes of 500 each, so results are not definitive and not always statistically significant, but they point in similar directions.
In the first experiment, respondents were offered either the standard Rosenberg question with a caution rider: “Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?” (with a dichotomous answering choice) or the more neutral 11-point scale European Social Survey version: “On a scale where 0 is not at all and 10 is completely, in general how much do you think people can be trusted?”
When asked about trust with the caution rider version, fewer women reported that most people can be trusted than was the case for men (30.7% vs. 33.6%). By contrast, when using the question wording without the caution rider, more women (40.4%) reported a score of 7 to 10 than was the case for men (36.8%). A comparable pattern can be observed for older people (over 45 years) vs. younger people (aged 16-44).
In a follow-up experiment, the phrasing of the 11-point scale item was changed slightly to also include a caution rider. This effectively reversed the results of the first experiment.
This shows experimentation is very important.
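To see why samples of roughly 500 rarely yield significance for differences of a few percentage points, here is a two-proportion test using the shares quoted above; the group sizes are assumed, since the notes only give percentages:

```python
# 30.7% of women vs 33.6% of men endorsing "most people can be trusted"
from statsmodels.stats.proportion import proportions_ztest

trusting = [77, 84]  # assumed counts matching 30.7% and 33.6%
n_obs = [251, 250]   # assumed split of a ~500-person sample
stat, p = proportions_ztest(trusting, n_obs)
print(f"z = {stat:.2f}, p = {p:.2f}")  # typically far from significant
```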
Key message:
Specifying the context of institutional behaviour can make a difference in some cases (“A trust B to do X” vs “A trusts B”).
For banks, adding “to act in the national interest” leads to a more differentiated evaluation.
It is worth further investigating which other specifications (e.g. “to act in my personal interest”) matter for which institutions.
The OECD also partnered with INEGI of Mexico, placing various trust question versions in its June 2016 National Survey on Urban Public Security. This study was not a split-sample experiment (each respondent was asked two questions within the same survey), and therefore it cannot be ruled out that differences in responses are due to priming or shared method variance. Nevertheless, when the two versions (regular trust-in-institutions questions vs. "to act in the national interest") were posed to the 500-person Mexican sample, adding "to act in the national interest" did not lead to a strong drop in the share of the population indicating that banks can be trusted "a great deal". On the other hand, the civil service experienced a drop from 10.6% to 6.8% of respondents indicating that they trust this institution "a great deal". These results potentially indicate that institutions carry different connotations in the Mexican compared to the UK context. Further experimental research will be needed to clarify this question.
1) There are lots of different timeframes used in affect measures…
2) When choosing among different timeframes, there is a basic trade-off between the risk of RECALL BIAS and the impact of RANDOM EVENTS.
… so for example, very short timeframes are good for recall accuracy, but are very much going to be influenced by what someone is doing at the time.
… the reverse may be true for long timeframes. You might get much greater recall bias, context effects, personality trait information… but you will likely get a less strong impact of small-scale random events.
3) This means that the measure you select will depend on the study you are doing.
4) High impact of random events is tolerated in experience sampling and time use diaries because you usually have REPEATED MEASURES from the same individual.
Also, one of the goals of these studies is to understand the relationships between activities and affect = so if affect varies according to your activities, that is information not noise!
5) In small scale and clinical research, where you only take one measure per person and you have a small sample, tolerance for the impact of random events is lower. So you might trade off recall accuracy in order to get a more “dispositional” measure.
… In very large sample surveys, using a timeframe of “yesterday” is a compromise between recall accuracy and the risk of random events impacting on reporting. But you do need a large sample, because the measure will be noisier.
If it’s experienced affect, 24 hours might be the limit for tolerable accuracy of recall.
Not only the question wording but the response format matters.
Aspects of the response format include scale length, verbal or numerical and how the scale is labelled
Full range of possible responses includes no-response options, because we want to capture variability.
TRADE-OFF between making the response options too simple, causing meaningful variation to be lost, and minimising cognitive burden of the respondent.
With VERBAL LABELS the limit to memory and attention seems to be 5 response categories in a verbal interview setting (without showcards).
With NUMERICAL SCALES, only the scale anchors need to be held in memory, so that becomes easier.
Digit preferences (lucky number 7) could come into play more.
In BIPOLAR attitude measures, it’s a good idea to offer a neutral category…even though responses tend to bunch around the midpoint
Some evidence that longer numerical scales generally increase internal consistency and test-retest reliability (although gains are small), and possibly validity too.
For instance, 11-point trust questions, but not dichotomous ones, have been found to be correlated with other expected outcomes (e.g. volunteering).
An odd number of response options/a mid-point allow expressing a neutral position (Bradburn et al 2004)
Numerical scales are much less likely to pose translation challenges (OECD 2013).
Scale answers that include “agree/disagree” are linked to acquiescence (Krosnick 1999).
What is clear is that different response options lead to different and not necessarily interchangeable measures
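Since batteries of affect items are often summed (see slide 31), internal consistency is the usual check on whether the items hang together; a self-contained sketch of Cronbach's alpha on simulated yes/no items (illustrative data only):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scored responses."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
mood = rng.random((200, 1))  # latent affect per simulated respondent
battery = (rng.random((200, 6)) < 0.3 + 0.5 * mood).astype(int)  # six yes/no items
print(f"alpha = {cronbach_alpha(battery):.2f}")  # higher = more consistent battery
```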
Just to quickly illustrate what happens when you change the number of response options available on a scale:
Two surveys – same question:
one has 3 response options (not too happy, pretty happy, and very happy)
version B adds to that “not at all happy” at the negative pole, and “completely happy” at the positive pole.
We can see that at the negative pole, respondents are split between “not at all” and “not too” happy.
But there is NO OVERALL DIFFERENCE in the number of people at the negative pole.
Something different happens at the upper end of the scale.
Here, we see around 13% of respondents saying they are "completely happy". But the number saying they are "very happy" stays about the same.
Which has a big impact on the number of people endorsing the scale midpoint, “pretty happy”.
So there seems to be a general upwards drag effect and more variation when you add the extra category. Meaning of a verbal label partly understood with reference to other labels surrounding it?
● The available evidence suggests that a numerical 11-point scale with verbal scale anchors is preferable over the alternatives, as it allows for a greater degree of variance in responses and increases overall data quality as well as translatability across languages.
At least for SWB, we see international consensus moving in this direction.
● Consistent response order: to minimise mental switching between positive and negative normative outcomes.
● Absolute scale-anchor labels: to allow for the full spectrum of possible responses and minimise socially desirable responding.
Less is known for affect and eudaimonia: various approaches continue to be used, but with little systematic experimentation.
>> BUT, good reason to think a 0-10 scale should still work well
These things are OK (or at least, we can live with them) when they occur AT RANDOM throughout a population.
They become a problem when they are systematic, because that could influence comparisons between groups.
Response styles at the country or cultural level or “cultural bias”
It is also discussed for trust, though, especially for trust in institutions.
What do we know about the risk of cultural bias in subjective well-being?
This chart shows the size of “country fixed effects” in a regression analysis that controls for a few basic factors like demographics, income, unemployment and education.
Germany is used as the reference group here – and you’ll see that France, Italy and Belgium don’t vary much from the German baseline after taking these factors into account.
We can see that countries like Hungary, China and Russia have strongly negative coefficients.
While Brazil, Switzerland, Canada and Denmark have positive coefficients.
Now, a great many things could be driving these differences… (we’ve controlled for only a VERY limited number of covariates here…)
There are some concerns about potential biases in the data that might operate at country level. This could be about difficulties translating the concepts, or about different cultural values and norms when it comes to communicating feelings and emotions.
BUT, overall these are not large enough to declare SWB unfit for purpose or invalid.
A cross-country study drawing on Gallup World Poll data concluded that culture may account for at most 20% of unexplained country-specific variance in subjective-wellbeing (Exton, Smith, Vandendriessche 2015).
Cross-cultural response styles are difficult to verify externally against a common standard or actual behaviour.
We do not want to adjust for culture completely because then we also get rid of cultural impact, which is meaningful
Mention use of vignettes
Designing the questionnaire so that items are as simple, easy to interpret and minimally burdensome as possible – in line with some of the things already mentioned.
There are also ways of appropriately tailoring the survey design to reduce social desirability bias and threat of disclosure concerns.
Controlled, systematic experimentation with methods is important to move our knowledge of “good practice” forwards
NSOs have a critical role to play here!
Rapidly evolving literature – many recent contributions
Consistency of approach is important where data comparability is important (e.g. for policy uses)