The document discusses causality and experiments in modeling social data. It explains that while prediction involves forecasting from observation alone, causation requires determining the effect of making a change in the world. Three approaches to causal inference are discussed: observational studies, which can be biased by confounding factors; randomized experiments, which aim to eliminate bias through random assignment but have practical limitations; and natural experiments, which exploit real-world occurrences that resemble randomized experiments. Overall, the document emphasizes that determining causal effects, rather than just predictive correlations, is important but challenging.
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
1. Modeling Social Data:
Causality and Experiments
Guest lecturers: Andrew Mao, Amit Sharma
Slides by: Jake Hofman, Andrew Mao, Amit Sharma
2. Prediction
Make a forecast, leaving the world as it is
(seeing my neighbor with an umbrella might predict rain)
vs.
Causation
Anticipate what will happen when you make a change in the world
(but handing my neighbor an umbrella doesn’t cause rain)
3. “Causes of effects”
It’s tempting to ask “what caused Y”, e.g.
◦ What makes an email spam?
◦ What caused my kid to get sick?
◦ Why did the stock market drop?
This is “reverse causal inference”, and is generally quite hard
John Stuart Mill (1843)
4. “Effects of causes”
Alternatively, we can ask “what happens if we do X?”, e.g.
◦ How does education impact future earnings?
◦ What is the effect of advertising on sales?
◦ How does hospitalization affect health?
This is “forward causal inference”: still hard, but less contentious!
John Stuart Mill (1843)
5. Example: Hospitalization on health
What’s wrong with estimating this model from observational data?
[Diagram: “Hospital visit today” → “Health tomorrow”, with the arrow labeled “Effect?”; an arrow means “X causes Y”]
6. Confounds
The effect and cause might be confounded by a common cause, and so change together as a result.
[Diagram: “Health today” (dashed circle means “unobserved”) points to both “Hospital visit today” and “Health tomorrow”; the “Effect?” arrow runs between the latter two]
7. Confounds
If we only get to observe them changing together, we can’t estimate the effect of hospitalization changing alone.
[Diagram: same as before, with unobserved “Health today” confounding “Hospital visit today” and “Health tomorrow”]
8. Observational estimates
Let’s say all sick people in our dataset went to the hospital today, and
healthy people stayed home
The observed difference in health tomorrow is:
Δobs = (Sick and went to hospital) – (Healthy and stayed home)
9. Observational estimates
Let’s say all sick people in our dataset went to the hospital today, and
healthy people stayed home
The observed difference in health tomorrow is:
Δobs = [(Sick and went to hospital) – (Sick if stayed home)] +
[(Sick if stayed home) - (Healthy and stayed home)]
10. Selection bias
Let’s say all sick people in our dataset went to the hospital today, and
healthy people stayed home
The observed difference in health tomorrow is:
Δobs = [(Sick and went to hospital) – (Sick if stayed home)] +
[(Sick if stayed home) - (Healthy and stayed home)]
The first bracketed term is the causal effect; the second is selection bias (the baseline difference between those who opted in to the treatment and those who didn’t).
11. Basic identity of causal inference¹
Let’s say all sick people in our dataset went to the hospital today, and
healthy people stayed home
The observed difference in health tomorrow is:
Observed difference = Causal effect – Selection bias
Selection bias is likely negative here, making the observed difference
an underestimate of the causal effect
¹ Varian (2016)
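This identity can be illustrated with a toy simulation. All numbers below are hypothetical (the assumed effect size, the rule that only the sick seek care, and the health scale are invented for illustration, not taken from the lecture): when sicker people select into hospital visits, the naive observed difference understates the positive causal effect because the selection-bias term is negative.

```python
import random

random.seed(0)

# Toy simulation of the hospital example (all numbers hypothetical).
# Sick people (health_today < 0) select into treatment, so the naive
# observed difference understates the true positive causal effect.
TRUE_EFFECT = 2.0  # assumed benefit of a hospital visit

rows = []
for _ in range(100_000):
    health_today = random.gauss(0, 1)
    treated = health_today < 0  # only the sick go to the hospital
    health_tomorrow = health_today + (TRUE_EFFECT if treated else 0.0)
    rows.append((treated, health_tomorrow))

def mean(xs):
    return sum(xs) / len(xs)

observed = (mean([h for t, h in rows if t])
            - mean([h for t, h in rows if not t]))
selection_bias = observed - TRUE_EFFECT  # baseline health gap between groups

print(f"observed difference: {observed:+.2f}")        # far below +2.00
print(f"selection bias:      {selection_bias:+.2f}")  # negative
```

Here the treated group started out sicker, so the raw comparison makes hospitals look much less effective than they are, exactly the decomposition on this slide.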
12. A fundamental problem across science and industry
• Knowledge (Science)
• What is the effect of doing X?
• Social sciences
• Biology and medicine
• Solving problems (Industry)
• When should I do X?
• Predictive models in practice
• Policies for better outcomes
24. Simpson’s paradox
Selection bias can be so large
that observational and causal
estimates give opposite effects
(e.g., going to hospitals makes
you less healthy)
http://vudlab.com/simpsons
26. Comparing old versus new algorithm

                  Old Algorithm (A)    New Algorithm (B)
Overall CTR       50/1000 (5%)         54/1000 (5.4%)
27. Frequent users of the Store tend to be different from new users

                        Old Algorithm (A)    New Algorithm (B)
Low-activity CTR        10/400 (2.5%)        4/200 (2%)
High-activity CTR       40/600 (6.6%)        50/800 (6.2%)

[Bar chart: CTR (0–8%) for low- and high-activity users under each algorithm]
28. Simpson’s paradox
• Is Algorithm A better?

                              Old Algorithm (A)    New Algorithm (B)
CTR for low-activity users    10/400 (2.5%)        4/200 (2%)
CTR for high-activity users   40/600 (6.6%)        50/800 (6.2%)
Total CTR                     50/1000 (5%)         54/1000 (5.4%)
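The paradox in these counts can be checked directly. A short script, using only the click and impression totals given above, shows algorithm A winning within each activity stratum while B wins on the pooled totals:

```python
# Click-through counts from the slides: (clicks, impressions) per
# algorithm and user-activity stratum.
data = {
    "A": {"low": (10, 400), "high": (40, 600)},
    "B": {"low": (4, 200), "high": (50, 800)},
}

def ctr(clicks, impressions):
    return clicks / impressions

for stratum in ("low", "high"):
    a = ctr(*data["A"][stratum])
    b = ctr(*data["B"][stratum])
    print(f"{stratum}-activity: A={a:.1%} vs B={b:.1%}")  # A wins both

# Pool the strata: B now comes out ahead, because B was shown far more
# often to high-activity users, who click more under any algorithm.
total = {alg: ctr(sum(c for c, _ in d.values()),
                  sum(n for _, n in d.values()))
         for alg, d in data.items()}
print(f"overall: A={total['A']:.1%} vs B={total['B']:.1%}")  # B wins overall
```

The reversal comes entirely from the unequal mix of user types across the two arms, not from either algorithm changing behavior.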
29. Answer (as usual): Maybe, maybe not.
• E.g., Algorithm A could have been shown at different times than B.
• There could be other hidden causal variations.
30. Example: Simpson’s paradox in Reddit
• Average comment length decreases over time.
But for each yearly cohort of users, comment length increases over time.
32. “To find out what happens when you change something, it is necessary to change it.”
-GEORGE BOX
33. Counterfactuals
To isolate the causal effect, we have to change one and only one thing (hospital visits), and compare outcomes.
[Figure: Reality (what happened) vs. the Counterfactual (what would have happened)]
34. Counterfactuals
We never get to observe what would have happened if we did something else, so we have to estimate it.
[Figure: Reality (what happened) vs. the Counterfactual (what would have happened)]
35. Random assignment
We can use randomization to create two groups that differ only in which treatment they receive, restoring symmetry.
[Figure: a coin flip (heads vs. tails) assigns each person to World 1 or World 2]
38. Basic identity of causal inference
The observed difference is now the causal effect:
Observed difference = Causal effect – Selection bias
= Causal effect
Selection bias is zero, since there’s no difference, on average,
between those who were hospitalized and those who weren’t
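A toy simulation makes this concrete (hypothetical numbers: the effect size and health scale are invented for illustration). When treatment is assigned by coin flip rather than by how sick people are, the observed difference recovers the true effect:

```python
import random

random.seed(1)

# Toy illustration (hypothetical numbers): treatment is assigned by a
# coin flip, independent of health, so selection bias is zero on average.
TRUE_EFFECT = 2.0

rows = []
for _ in range(100_000):
    health_today = random.gauss(0, 1)
    treated = random.random() < 0.5  # coin flip, ignores health_today
    health_tomorrow = health_today + (TRUE_EFFECT if treated else 0.0)
    rows.append((treated, health_tomorrow))

def mean(xs):
    return sum(xs) / len(xs)

observed = (mean([h for t, h in rows if t])
            - mean([h for t, h in rows if not t]))
print(f"observed difference ≈ causal effect: {observed:.2f}")
```

Compare this with the observational version of the same setup, where selecting the sick into treatment drives the observed difference far from the true effect; randomization is the only change.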
39. Random assignment
Random assignment determines the treatment independent of any confounds.
[Diagram: a coin flip sets “Hospital visit today” (double lines mean “intervention”), which affects “Health tomorrow”; “Health today” still affects “Health tomorrow” but no longer influences treatment]
40. Experiments: Caveats / limitations
Random assignment is the “gold standard” for causal inference, but
it has some limitations:
◦ Randomization often isn’t feasible and/or ethical
◦ Experiments are costly in terms of time and money
◦ It’s difficult to create convincing parallel worlds
◦ Inevitably people deviate from their random assignments
Anyone can flip a coin, but it’s difficult to create convincing parallel
worlds
41. Two goals for experiments
Internal validity: Could anything other than the treatment (i.e. a confound) have produced this outcome? Did doctors give the experimental drug to some especially sick patients (breaking randomization), hoping that it would save them?
External validity (generalization): Do the results of the experiment hold in settings we care about? Would this medication be just as effective outside of a clinical trial, when usage is less rigorously monitored?
42. How we conduct behavioral experiments
Lab experiments have better internal validity (correctness):
• Greatest procedural control
• Can carefully curate situations
But less external validity (generalization):
• Artificial context, simple tasks
• Demand effects
• Homogeneous (WEIRD) subject pools
• Time/scale limitations
Field experiments have better generalization:
• Experiment findings apply to at least one real-world setting
But:
• Less control, more potential confounds
• Demands of the experiment conflict with the goals of real organizations
• More effort to conduct and manage
• More room for error
43. Natural experiments
Sometimes we get lucky and nature effectively runs experiments for
us, e.g.:
◦ As-if random: People are randomly exposed to water sources
◦ Instrumental variables: A lottery influences military service
◦ Discontinuities: Star ratings get arbitrarily rounded
◦ Difference in differences: Minimum wage changes in just one state
44. Natural experiments
Experiments happen all the time, we just have to notice them
45. As-if random
Idea: Nature randomly assigns conditions
Example: People are randomly exposed to water sources (Snow, 1854)
http://bit.ly/johnsnowmap
47. Regression discontinuities
Idea: Things change around an
arbitrarily chosen threshold
Example: Star ratings get
arbitrarily rounded (Luca, 2011)
[Figure 4: Average revenue around discontinuous changes in rating.
Each restaurant's log revenue is de-meaned, then averaged within bins
by how far the rating is from a rounding threshold; points with
positive (negative) distance are rounded up (down).]
http://bit.ly/yelpstars
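The rounding design can be sketched with simulated data (all numbers invented, not from the Luca study): revenue depends smoothly on the true rating plus a jump from the displayed half-star rating, so comparing restaurants just below vs. just above a rounding cutoff isolates the jump.

```python
import random

random.seed(2)

n = 20_000
ratings = [random.uniform(3.0, 4.0) for _ in range(n)]

def displayed(r):
    return round(r * 2) / 2  # rounded to the nearest half star

# Smooth effect of the true rating, plus a 0.5-per-star effect of the
# displayed rating, plus noise (all coefficients hypothetical).
revenue = [1.0 * r + 0.5 * displayed(r) + random.gauss(0, 0.3) for r in ratings]

cutoff, bw = 3.25, 0.05  # a rounding threshold and a narrow bandwidth
below = [rev for r, rev in zip(ratings, revenue) if cutoff - bw <= r < cutoff]
above = [rev for r, rev in zip(ratings, revenue) if cutoff <= r < cutoff + bw]

# Restaurants near the cutoff are comparable except for the rounding,
# so the jump estimates the effect of the extra half star (0.5 * 0.5 = 0.25),
# plus a small bias from the smooth trend inside the bandwidth.
jump = sum(above) / len(above) - sum(below) / len(below)
print(f"estimated jump at the cutoff: {jump:.2f}")
```

In practice one would fit local regressions on each side of the cutoff rather than simple bin means, but the comparison of narrow bins conveys the idea.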
48. Difference in
differences
Idea: Compare differences after a
sudden change with trends in a
control group
Example: Minimum wage changes
in just one state (Card & Krueger,
1994)
http://stats.stackexchange.com/a/125266
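The difference-in-differences arithmetic is simple enough to show directly. The employment numbers below are invented for illustration (they are not the Card & Krueger estimates): the treated state's before/after change is compared to the change the control state experienced over the same period.

```python
# Average employment per store, before and after a minimum wage change
# in the treated state only (hypothetical numbers).
employment = {
    ("treated", "before"): 20.4,
    ("treated", "after"):  21.0,
    ("control", "before"): 23.3,
    ("control", "after"):  21.2,
}

# DiD: the treated state's change, minus the control state's change
# (which stands in for the trend the treated state would have followed).
did = (employment[("treated", "after")] - employment[("treated", "before")]) \
    - (employment[("control", "after")] - employment[("control", "before")])
print(f"DiD estimate: {did:+.1f}")  # (21.0 - 20.4) - (21.2 - 23.3) = +2.7
```

The key (untestable) assumption is parallel trends: absent the wage change, both states would have moved together.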
54. Natural experiments: Caveats
Natural experiments are great, but:
◦ Good natural experiments are hard to find
◦ They rely on many (untestable) assumptions
◦ The treated population may not be the one of interest
Sometimes we can use additional data + algorithms to
automatically find natural experiments
55. What can we do without an experiment or
natural experiment?
• Condition on the “right” variables
• How to find the right variables?
• No good answer
• Depends on domain knowledge
• Many different frameworks (e.g. Pearlian graphical models and do-calculus)
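Conditioning on the right variable can be illustrated with a minimal simulation (all parameters invented): a confounder Z drives both treatment T and outcome Y, so the raw comparison is biased, while averaging the within-stratum differences over the distribution of Z recovers the true effect. This is the simplest case of backdoor adjustment; it only works because Z here really is the right variable.

```python
import random

random.seed(3)

n = 100_000
true_effect = 1.0
data = []
for _ in range(n):
    z = random.random() < 0.5                  # confounder
    t = random.random() < (0.8 if z else 0.2)  # treatment depends on z
    y = true_effect * t + 2.0 * z + random.gauss(0, 1)
    data.append((z, t, y))

def mean_y(rows):
    ys = [y for _, _, y in rows]
    return sum(ys) / len(ys)

# Naive comparison: biased upward, because treated units tend to have z = 1.
naive = mean_y([r for r in data if r[1]]) - mean_y([r for r in data if not r[1]])

# Adjusted: treated-vs-untreated difference within each stratum of z,
# weighted by the share of units in that stratum.
adjusted = 0.0
for zval in (True, False):
    stratum = [r for r in data if r[0] == zval]
    diff = mean_y([r for r in stratum if r[1]]) - mean_y([r for r in stratum if not r[1]])
    adjusted += diff * len(stratum) / n

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # adjusted near 1.0
```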
57. Behavioral science labs are very limiting
• High degree of procedural control
• Optimized for causal inference
But, many limitations:
• Artificial environment
• Simple tasks, demand effects
• Homogeneous (WEIRD)* subject pools
• Time/scale limitations
• Expensive, difficult to set up
Poor generalization, expensive, slow
[Photos: behavioral labs, ca. 1960s and ca. 2000s]
* Western, Educated, Industrialized,
Rich, Democratic [Henrich et al. 2010]
59. Most social science experiments aren’t social
• Vast majority of experiments done
with individuals, focused on
individual behavior: logistically, it’s
just easier
• We actually don’t know the causal
effects of policies in many large-
scale, collective behavior settings!
60. Economic
Inequality
Social mobility: if you work
hard, can you be more
successful than your parents?
• How do social safety nets,
economic redistribution,
etc. affect social mobility
and resulting inequality?
• What policies can we enact
that will improve social
mobility or reduce
inequality?
The American Dream, Quantified at Last. NYTimes, Dec. 8 2016.
61. Systems of Governance
I think we should look to
countries like Denmark, like
Sweden and Norway, and
learn from what they have
accomplished for their
working people
What would happen if we
suddenly replaced our
current political system
with a Scandinavian
“socialist state”?
Would everyone be
happier?
Or would it be culturally
untenable?
62. The black box of
macroeconomic
policy
There are no clear theories
about whether lowering
interest rates, raising them,
or quantitative easing
works as intended.
Given a sufficiently large
simulated economy, can we
study these experimentally?
63. So, about them experiments…
• They’re costly to run
• Yet limited in the types of questions they can answer
• Published experimental research is probably wrong
• They’re far from answering many big important questions
How do we fix this?
64. Expanding the experiment design space
[Diagram: three design axes (complexity/realism, size/scale,
duration/participation), with physical labs clustered near the origin]
Expanding along these axes:
• Longer periods of time
• Fewer constraints on location
• More samples of data
• Large-scale social interaction
• Realistic vs. abstract, simple tasks
• More precise instrumentation
A software-based “virtual lab”
with online participants
65. Behavioral experiments + the Internet
[Diagram: a spectrum from control (volunteer web labs) to
generalization (unwitting labs in the field), with the virtual lab
in between]
67. Q: What happens to people’s behavior when
they are in a social dilemma for a long time?
Also known as: real life.
A: According to lab experiments, we don’t know. It seems like people
start stabbing each other in the back when they play this game
…but the world seems to (mostly) be doing okay.
68. Longitudinal behavior in a social dilemma
[Diagram: a game of 10 rounds between anonymous partners; in each
round a player can Cooperate or Defect]
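The game on this slide can be sketched as a payoff matrix. These particular payoff values are illustrative conventions for the prisoner's dilemma, not necessarily the ones used in the experiment:

```python
# Each entry maps (my move, partner's move) to (my payoff, partner's payoff).
C, D = "cooperate", "defect"
PAYOFFS = {
    (C, C): (3, 3),  # mutual cooperation
    (C, D): (0, 5),  # I am exploited: the "sucker's payoff"
    (D, C): (5, 0),  # I exploit my partner: the "temptation"
    (D, D): (1, 1),  # mutual defection
}

# The dilemma: whatever the partner does, defecting pays more for me...
for partner in (C, D):
    assert PAYOFFS[(D, partner)][0] > PAYOFFS[(C, partner)][0]
# ...yet mutual cooperation beats mutual defection for both players.
assert PAYOFFS[(C, C)][0] > PAYOFFS[(D, D)][0]
print("defection dominates, but mutual cooperation is better for everyone")
```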
69. Random rematching across games
[Diagram: Games 1 through 6; after each game, players are randomly
re-matched with new partners]
73. Demo time!
You get to play prisoner’s dilemma with each other!
The person with the most money at the end wins.
Navigate your laptop or mobile browser to:
http://skygeirr.andrewmao.net:3000
74. A “supercollider” for social science
Realistic settings for collective
behavior…
with precise instrumentation and
measurement…
studied longitudinally over
periods of time…
with a high degree of
experimental control.
We have seen descriptive social data, but we want to know what policies to enact.
What we’d like to do is to make a copy of the world
Both experiments are pains in the butt to set up
Why experiments, particularly lab experiments, for decades?
From 1960s until now, mostly undergrads at universities (WH behavioral lab)
Lots of control, can construct environments, good internal validity - causal conclusions drawn from an experiment are correct. Causality is what we care about in design.
But curious setting: many limitations, including low generalization, logistical issues
One of the biggest limitations is that social science experiments not social
Let’s talk a little about potential problems
Bernie Sanders, Denmark
Wealth redistribution
To study the crowd experimentally, we want social science experiments to be social
One way to think about the types of tasks we want to study is along different design axes
Physical labs are clustered around the origin; fall short – simple, abstract tasks; for short times; by undergrads in a small room
Our goal is to expand this using the Internet as a lab, controlled by software, using crowdsourcing
Potential to study many novel social problems, increasing our knowledge about how people interact, with each other and within computational systems
This dichotomy changes when we have the Internet. I like to think of the Internet as basically replacing the binary lab/field distinction with a continuous spectrum.
Some of you, when thinking of virtual lab, might think of “Amazon Mechanical Turk”. But I would put that around here.
On the left we have things like TestMyBrain, an example of a web-based lab using pure volunteers. The amazing thing is that the volunteers have been enough to do 10^5 experiments.
All the way on the right, in the field, we have real systems like FB, Google, and Microsoft, which we know do experiments all the time on their users, although many of those are not scientifically interesting.
Interestingly, in the middle, there are some things that look like the field, but are actually set up much like a lab
This region on the spectrum is what I think of the virtual lab. We give up a bit of control, but get a lot more generalization from getting out of lab limitations. More importantly, we can choose where we design experiments on this spectrum.
What is a social dilemma? Interests of the individual at odds of what’s best for everyone.
Many examples, but one of the fun ones comes from this British TV show called Golden Balls
You’re always no worse off if you steal.
This interaction looks very similar to a game called PD.
Tension between private incentive and public interest underlies many social and economic interactions, and that’s why PD is used to study cooperation.
Make a smoother transition to the next slide – maybe overlay payoffs?
That brings us to our experiment. We recruited about a hundred participants from MTurk and paired them to play 10 rounds of prisoner’s dilemma, many times.
I’ll call each set of 10 rounds a game.
Here’s an example of what players might do, where green represents cooperate and red means defect. The second player defected in round 8.
These players know that they have the same partner for 10 rounds, but their identity is anonymous.
Pairs of participants play many of these games simultaneously. After each set of games, we randomly re-match players with different partners and they play again. The black line shows all the different partners for one player. Identities are still anonymous, but players know they have a new partner. We do this for many games in each session, which takes place over a day.
Each day has a total of 20 sets of games. We match players within a rather large pool, so that there are about 50 pairs of players for each set of games.
And finally, we did this every day for 20 weekdays in August.
This slide effectively shows all of the data from our experiment.
Even if you aren't interested in PD, this shows it's possible to study interaction among 100 people over a month.
Instead of just showing you this experiment, we’re going to do something really dangerous…a live demo.
Simple game showing all features of turkserver.
Most social science isn’t social
We often can’t measure the data we’re interested in
We see stuff over time that we can’t in short studies
We want to be able to test theories causally
ScienceAtHome is very promising for this
Physics approach to social science
Mobile devices