Say:
Greetings friends. Happy to have you with us.
We will circle back to the warm-up throughout the next 90 minutes, as we work tirelessly toward being able to answer those three questions.
Give:
G/S 30 seconds to read today’s objectives, also on pg. 1 of the interactive handout.
Say:
Here’s our agenda for the day, also in your interactive handout, pg. 1. A couple of thoughts on our pacing for the day…
Aggregated data is not raw data! Aggregation is when we start grouping results together by some shared characteristic or quality.
Ask:
Entertainers vs. Athletes – is there a difference?
Give grad students 2 minutes to answer the question.
ASR: No, the only difference is Sheen vs. Serena. Otherwise everybody is the same.
Not every finding is an interesting finding. (Good rules of thumb for everybody): Folks in here with very small classrooms should be mindful of this scenario – there is a difference between a statistical finding and an interesting finding. Make sure to interrogate your data and think carefully!
Another way to say 'statistical finding' vs. 'interesting finding'
We also describe this as 'statistically significant' vs. 'practically significant'.
Speaking of which… What is statistical significance?
Say:
The phrase “statistically significant” does not belong in your Data Narrative unless you are using a statistical software package and basing your analyses on statistical hypothesis tests that are appropriate for the data you have collected.
When do we talk about statistical significance?
http://xkcd.com/539/
Randomized trial measuring cancer rates in postmenopausal women who received hormone therapy vs placebo (sugar pills) – was the increased incidence of cancer “statistically significant”?
There’s a funny story where the world of statistics collided with the legal world. Everybody knows cigarettes cause cancer. Oh wait, did you say “cause” cancer? As in, cigarettes themselves ‘cause’ people to get cancer? Is that a semantic argument like “guns don’t kill people, people kill people”? The problem is that you can only prove that something causes something if you follow the principles of the scientific method and create a control group and an experimental group, apply the treatment (cigarette smoking, in this case), and see what happens. Maybe people who smoke are already reckless and their smoking is a manifestation of their disregard for their personal health. How can you say it causes cancer?
Important Point:
In the statistical world, unless you're using randomized trials and experimentation, you cannot say that something caused something else. You can say that two things are 'related', or may be 'contributing factors', but not that one caused the other.
Say:
Next we’re going to discuss a phenomenon called Simpson's Paradox. This will help us understand why a goal of 1.5 years of growth isn’t enough.
You'll notice some data here for SAT scores from 1981 to 2002. But the data appears contradictory.
Turn and Talk:
Using everyday language, clearly describe the paradoxical nature of the data in the graphs/table.
ASR: The overall average doesn’t change at all, but when you look at scores by subgroup every single group increased.
Say:
What was going on here? There was a difference in the composition of who was taking the test.
That is, in 1981 the test-takers came almost exclusively from one racial subgroup, but by 2002 that had changed. So while every group's scores went up, the share of test-takers in each group changed, and this reweighted the overall average.
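Facilitator reference (optional): a minimal Python sketch of the reweighting described above. The group names, head counts, and scores are hypothetical, chosen only so the arithmetic is easy to follow; they are not the real SAT figures.

```python
# Hypothetical counts and scores, chosen only to illustrate reweighting.
# They are NOT the real SAT figures.
# Each entry: group -> (number of test-takers, average score)
year_1981 = {"Group A": (900, 520), "Group B": (100, 420)}
year_2002 = {"Group A": (600, 550), "Group B": (400, 450)}


def overall_average(groups):
    """Average across all test-takers, weighting each group by its size."""
    total_points = sum(count * score for count, score in groups.values())
    total_takers = sum(count for count, _ in groups.values())
    return total_points / total_takers


# Every group's average goes up between the two years.
for group in year_1981:
    before = year_1981[group][1]
    after = year_2002[group][1]
    print(f"{group}: {before} -> {after}")

# The overall average stays flat because the mix of test-takers shifted
# toward the group with the lower average.
avg_1981 = overall_average(year_1981)
avg_2002 = overall_average(year_2002)
print(f"Overall: {avg_1981} -> {avg_2002}")
```

In this made-up example both groups gain 30 points, yet the overall average is 510 in both years, because Group B grows from 10% to 40% of test-takers.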
Let’s further examine what ‘on average’ can actually hide.
UC Berkeley: In 1973, a lawsuit was filed against UC Berkeley for discrimination – overall, 44% of men were admitted and only 35% of women were admitted to graduate programs. Why? A greater percentage of women were applying to the programs that had a lower rate of acceptance.
http://hoopedia.nba.com/index.php?title=Oski_the_Bear_California
Justice vs. Jeter: You don’t have to understand the mechanism behind this paradox; you just have to know that averages alone can be deceiving. The combined batting averages told a different story than the season-by-season averages because the number of at-bats differed.
http://blog.naver.com/PostView.nhn?blogId=bottle1k&logNo=140099282872&redirect=Dlog&widgetTypeCall=true
http://www.askmehelpdesk.com/trading-cards/nolan-ryan-david-justice-rookie-cards-192484.html
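Facilitator reference (optional): a short Python sketch of the Justice vs. Jeter comparison. The hits and at-bats below are the commonly cited 1995 and 1996 figures for this example; they do not appear in these notes, so treat them as an assumption and verify them before presenting. The function names are just illustrative.

```python
# Commonly cited figures for this example (an assumption, not from these notes).
# Each entry: season -> (hits, at-bats)
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}


def batting_average(hits, at_bats):
    """Batting average for a single season."""
    return hits / at_bats


def combined_average(seasons):
    """Batting average over all seasons pooled: total hits / total at-bats."""
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return total_hits / total_at_bats


# Season by season, Justice has the higher average.
for season in (1995, 1996):
    print(
        f"{season}: Jeter {batting_average(*jeter[season]):.3f}, "
        f"Justice {batting_average(*justice[season]):.3f}"
    )

# Pooled over both seasons, Jeter has the higher average.
print(
    f"Combined: Jeter {combined_average(jeter):.3f}, "
    f"Justice {combined_average(justice):.3f}"
)
```

With these figures, Justice leads in each season taken separately, but Jeter leads once the two seasons are pooled, because most of Jeter's at-bats fall in his stronger season and most of Justice's fall in his weaker one.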
So how does this apply to the Data Narrative? Well, you need to disaggregate your data! For the Data Narrative, we want you to interrogate your data. Rough it up a little. Shake out the real story. Don’t just look at the averages. Then tell that story in your Data Narrative.