2. Conor vs Khabib
This fight is taking place on 6th October.
Who should you bet on to win? What do the experts think?
3. Conor vs Khabib
• Let’s imagine that there are 550 experts who offer a prediction
publicly
• Let’s say that 286 of the experts think Conor will win
• Seems like a majority opinion, right? Conor is a clear favourite
• Well…
4. Conor vs Khabib
• When an expert is asked, they have to say either Conor will win OR Khabib
will win
(draws are technically possible in MMA, but very rarely happen).
• There are essentially 2 options, and nobody knows which is right.
• Given 2 options about anything, if people were picking at random we would expect 50% to
choose option a, and 50% to choose option b.
• Therefore we would expect 275 votes for Conor (half of 550) if people decided by
tossing a coin!
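To get a feel for what "deciding by tossing a coin" looks like, here is a minimal simulation sketch. It is not part of the original slides: the function name `simulate_coin_votes`, the number of repeats and the fixed seed are all illustrative assumptions.

```python
import random


def simulate_coin_votes(n_experts=550, n_trials=1000, seed=0):
    """Count 'Conor' votes when every expert just tosses a fair coin.

    Returns one vote count per simulated panel of experts.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(n_experts))
        for _ in range(n_trials)
    ]


counts = simulate_coin_votes()

# Average number of Conor votes across the simulated panels.
mean_votes = sum(counts) / len(counts)

# Share of simulated panels with at least 286 Conor votes.
share_at_least_286 = sum(c >= 286 for c in counts) / len(counts)

print(f"average Conor votes: {mean_votes:.1f}")
print(f"share of panels with 286 or more: {share_at_least_286:.3f}")
```

The average should land close to 275, and the second number shows how often pure coin tossing produces a count as high as the one we observed.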
5. Conor vs Khabib
• The question is whether significantly more experts think Conor will
win than we would expect by chance.
• To decide, we have to compare the number of people voting Conor
to the number we would expect by chance alone.
• This is what chi square does.
6. Chi square
• Test of nominal data
• Count people who fall into different categories
• People can only fit into one category
Conor supporters   Khabib supporters
286                264
7. Chi square
• WARNING – there’s a formula on the next page. Don’t run away, I’ll
explain everything and you’ll find it’s not that scary
9. Chi square
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The χ² on the left is the way to write “chi square.” It is the Greek lowercase letter chi (not an X) with a superscript 2.
Looks fancy, but there’s nothing scary about that.
10. Chi square
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The Σ is the Greek uppercase letter sigma. It means “the sum of” –
basically add up all the numbers you get from doing the next bit to the right
13. Chi square
• In plain words, chi square tells you to
1. Take a number from your data collection (e.g. 286 Conor supporters)
2. Subtract the number you’d expect to pick that option by chance (in this case half of
550 = 275, so 286 – 275 = 11)
3. Square the result of step 2 (11 x 11 = 121)
4. Divide the result of 3 by the number expected by chance (121/275 = 0.44)
5. Repeat steps 1-4 for another number from your data collection until you run out of
numbers. (264 Khabib supporters – 275 = -11 then -11 x -11 = 121 then 121/275 =
0.44)
6. Add up the results of all the step 4s you had to do. That’s your chi square value. (0.44
+ 0.44 = 0.88)
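The six steps above can be sketched in a few lines of Python. This is only an illustration of the hand calculation; the function name `chi_square` and the variable names are assumptions, not something from the slides.

```python
def chi_square(observed, expected):
    """Steps 1-6: sum of (observed - expected)^2 / expected over all categories."""
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e        # step 2: subtract the number expected by chance
        squared = difference ** 2  # step 3: square the result
        total += squared / e       # step 4, then step 6: divide by expected, add up
    return total


observed = [286, 264]  # Conor supporters, Khabib supporters

# With 2 equally likely options, each category is expected to get half the votes.
n_experts = sum(observed)
expected = [n_experts / len(observed)] * len(observed)

chi2_value = chi_square(observed, expected)
print(round(chi2_value, 2))  # 0.88, matching the hand calculation
```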
14. Chi square
• In the old days, you’d then look up the critical value of chi square for your
degrees of freedom
• For a single row chi-square like ours, the degrees of freedom are k – 1, where k is the number of
available categories (2 categories here, so 1 degree of freedom).
• see Glossary section on Blackboard for an explanation of what degrees of freedom are if you can’t
remember
• If the value you calculated (0.88 in our case) is bigger than the critical
value (in this case 3.84, for 1 degree of freedom at the .05 significance level) then you have observed a number that is
significantly different from what would be expected by chance
• We haven’t. Don’t use the experts to influence your bet. Plus I made it up
anyway
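The comparison against the critical value can be sketched as below. The 3.84 comes from the slide; the p-value part is an addition that is not in the slides, and it relies on the fact that for 1 degree of freedom the chi square tail probability equals `erfc(sqrt(x / 2))`, so it only works for a 2-category test like this one. The names `CRITICAL_VALUE_DF1` and `p_value_df1` are illustrative.

```python
import math

# Critical value quoted on the slide (1 degree of freedom).
CRITICAL_VALUE_DF1 = 3.84


def p_value_df1(chi2_value):
    """Chance of a chi square at least this large, for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2_value / 2))


chi2_value = 0.88  # the value calculated by hand for the 550 experts

significant = chi2_value > CRITICAL_VALUE_DF1
p = p_value_df1(chi2_value)

print(f"chi square = {chi2_value}, critical value = {CRITICAL_VALUE_DF1}")
print(f"significantly different from chance? {significant}")
print(f"p = {p:.3f}")
```

Statistics software reports the p-value directly, which is why looking up critical values in a table is an "old days" job.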
15. RxC Chi square
• Most of the time our design is more complicated than the Conor vs
Khabib example.
• By the way, that version is known as the “goodness of fit chi square” because we are
seeing how well an observed pattern fits the expected distribution.
• It is more likely that you have 2 variables changing at once.
• As long as your data follow the rules, you can still use chi square
• The rules are the same as before – see slide 6
16. RxC Chi square
• Say someone wanted to determine whether fans of different music
were more or less likely to get hurt at a show
• Count the number of fans leaving the venue bloodied
• Three different acts
• Concerts lasted the same amount of time
                       Welsh National Opera   Slayer   Ed Sheeran   Total
Ended up bleeding
Did not bleed at all
Total
17. RxC Chi square
                       Act seen
                       Welsh National Opera   Slayer   Ed Sheeran   Total
Ended up bleeding
Did not bleed at all
Total
18. RxC Chi square
Injury?                Welsh National Opera   Slayer   Ed Sheeran   Total
Ended up bleeding
Did not bleed at all
Total
19. RxC Chi square
                       Welsh National Opera   Slayer   Ed Sheeran   Total
Ended up bleeding      5                      18       12
Did not bleed at all   95                     82       88
Total
People observed for each category
20. RxC Chi square
                       Welsh National Opera   Slayer   Ed Sheeran   Total
Ended up bleeding      5                      18       12           35
Did not bleed at all   95                     82       88           265
Total                  100                    100      100          300
21. RxC Chi square
• This is an example of a multivariate or row (R) x column (C) chi square
• The principle is the same, and so is the maths
• The only difference is in the calculation of the expected frequency for each cell.
• There are just more numbers
• If there is no relationship between the act onstage and the number of
people injured then we should observe the same proportion of
bleeders in each of our columns – chi square tells you whether this is
what happened.
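Here is a minimal sketch of the expected frequency calculation for the concert table. The slides do not spell the rule out at this point, so treat it as an assumption: it is the standard one, expected = row total x column total / grand total. The function names `expected_frequencies` and `chi_square_rxc` are illustrative.

```python
def expected_frequencies(table):
    """Expected count per cell if rows and columns are unrelated:
    row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_rxc(table):
    """Same sum of (O - E)^2 / E as before, taken over every cell."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )


# Columns: Welsh National Opera, Slayer, Ed Sheeran
observed = [
    [5, 18, 12],   # Ended up bleeding
    [95, 82, 88],  # Did not bleed at all
]

expected = expected_frequencies(observed)
chi2_value = chi_square_rxc(observed)

for row in expected:
    print([round(e, 2) for e in row])
print(f"chi square = {chi2_value:.2f}")
```

Because every act had 100 fans, the expected number of bleeders is the same in each column (35 x 100 / 300), which is exactly the "same proportion in each column" idea from the last bullet.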
22. Remember
THE LARGER YOUR χ² VALUE, THE BIGGER THE
DIFFERENCE BETWEEN WHAT WAS OBSERVED AND
WHAT WAS EXPECTED