SlideShare a Scribd company logo
1 of 48
Download to read offline
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Making Sense of Statistics in HCI
Part 1 – Wild and Wide
concerning randomness and distributions
Alan Dix
http://alandix.com/statistics/
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
unexpected wildness
of random
just how random is the world?
raindrops and horse races
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
a story
In the far off land of Gheisra there lies the plain of Nali. For one hundred miles
in each direction it spreads, featureless and flat, no vegetation, no habitation; except,
at its very centre, a pavement of 25 tiles of stone, each perfectly level with the others
and with the surrounding land.
The origins of this pavement are unknown – whether it was set there by some
ancient race for its own purposes, or whether it was there from the beginning of the
world.
Rain falls but rarely on that barren plain, but when clouds are seen gathering
over the plain of Nali, the monks of Gheisra journey on pilgrimage to this shrine of
the ancients, to watch for the patterns of the raindrops on the tiles. Oftentimes the
rain falls by chance, but sometimes the raindrops form patterns, giving omens of
events afar off.
Some of the patterns recorded by the monks are shown on the following pages.
Which are mere chance and which foretell great omens?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
day 1
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
day 2
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
day 3
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
your choice?
day 1 day 2 day 3
which are by chance ...
and which are unusual?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
why?
……………………………………….
……………………………………….
……………………………………….
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
really random
empty squares & overfull squares
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
random but not uniform
clumped towards the middle
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
too uniform
every square has 5 rain drops
too good to be true!
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
two horse races
toss 20 coins
add the heads to one row
the tails to a second
the winner is the first row to 10
before you start:
what do you think will happen?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
the race
did you get a clear winner?
or was it neck and neck?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
the world is very random
probability head = 0.5
number of heads ≠
number of tosses
2
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
lessons
apparent differences may be chance
real data has some bad values
e.g. Mendel's sweet peas and
electron discovery were too good!
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
quick (and dirty!) tip
for survey or other count data
do square root times two
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Quick (and dirty!) Tip [1]
estimate variation of survey data
(for categories with small response)
• survey 1000 people on favourite colour
• say 10% of sample say it is blue
• this represents 100 people
• variation is roughly 2 x square root of 100
• that is about +/- 20 people, or +/- 2%
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Quick (and dirty!) Tip [2]
For response nearer top end the same, but use
the number who did not answer
• Say 85% of sample of 200 say colour is green
• 15% = 30 people did not say green
• variation is roughly 2 x square root of 30
• that is about +/- 11 people, or +/- 5.5%
• real value anything from 80 to 90 %
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Quick (and dirty!) Tip [3]
If nearer 50:50 (e.g. presidential election!) then
this is a little over estimate, use 1.5 instead
• Say 50% of sample of 2000 say yes to question
• 50% = 1000 people did not say green
• variation is roughly 1.5 x square root of 1000
• that is about +/- 50 people, or +/- 2.5%
N.B exact formula variance ~ n p (1-p)
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
… but really
far more important:
the fairness of the sample
self-selection for surveys
the phrasing of the question
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
bias and
non-independence
is it fair?
is it reliable?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
bias
systematic effects
that skew results one way or other
e.g. choice of LinkedIn vs. Snapchat survey
WEIRD (Western, Educated, Industrialized, Rich, and Democratic)
bias persists no matter your sample size
but can sometimes be corrected
e.g. sample variance … hence the n-1
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
bias vs. variability
high variability with no bias
– measurements equally likely to be more or less
than real value, but may be far away in either
direction
– poor estimate of right thing
low variability with bias
– Measurements consistent with one another, but
systematically one way
– good estimate of wrong thing
increase sample size
or reduce variability
eliminate bias
or model it
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
independence
different kinds of independence:
• measurements (normal kind)
• factor effects
• sample prevalence
non-independence often increases variability
… but may cause bias too
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
independence – measurements
measurements may be related to each other:
order effects
sequential measures on same person (learning, etc.)
‘day effects’
e.g. sunny day improves productivity
experimenter effects
your mood effects participants
all increase variability
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
independence – factor effects
Relationships and correlations between things
you are measuring (or failing to measure)
can confuse causality
e.g. death rates often higher in specialist hospitals
… because more sick people sent there
even reverse effects
… Simpson Paradox
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Simpson’s Paradox
You run a course with some full-time and some
part-time students
You calculate:
– average FT student marks go up each year
– average PT student marks increase each year
University complains your marks are going down
Who is right?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Simpson’s paradox – you both are!
FT students getting
higher marks
number of PT
students increased
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
independence – sample
the way you obtain your sample
internal – subjects related to each other
e.g. snowball sampling
if not external – increases variance
external – subject choice related to topic
e.g. measuring age based on mobile app survey
may create bias
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
crucial question
Is the way I’ve organised my sample
likely to be independent
of the thing I want to measure?
e.g. Fitts’ law with different colour targets.
Are student subjects likely to behave
differently than the general population?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
play!
experiment with
bias and independence
http://www.meandeviation.com/tutorials/stats/morecoin.html
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
nos.of coins
per row
nos.of rows
of coins
press coin
to start
modify independence
+ve more likely to be same
-ve more likely to alternate
set bias
0.5 = fair coin
summary
statistics
area
coins
appear
here
row counts
appear here
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
fair coin (independent tosses)
but individual
rows vary a lot
average
close to
50:50
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
biased coin (independent)
average
lower
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
positive correlation (runs)
average
50:50
but variation
higher
note
long
runs
see them
swop!
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
negative correlation (alternation)
average
still 50:50
but variation
lower
largely
alternate
H/T
very
consistent
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
distributions
normal or not
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
stats 101 – what kind of data
continuous:
e.g. time to complete task (12.73 secs)
discrete ...
arithmetic:
e.g. number of errors (average makes some sense)
ordered/ordinal:
.g. satisfaction rating (?average rating?)
nominal/categorical:
e.g. menu item chosen ( (File+Font)/2 = Flml ?)
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
bounded, tails, …
the data
number of heads in 6 tosses – discrete, finite
number of heads until first tail – discrete, unbounded
wait before next bus – continuous bounded below (zero),
… but not above!
difference between heights – continuous, unbounded
your question
is error rate higher – discrete, one tailed
are completion times different – continuous, two tailed
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
approximations
may approximate
one type of
distribution with
another
esp. using Normal
https://commons.wikimedia.org/wiki/File:Binomial_Distribution.svg
continuous
/ discrete
bounded /
unbounded
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
why is Normal normal?
central limit theorem
if you:
– average lots of things (or near linearly combine)
– around the same size (so none dominates)
– nearly independent
– and have finite variance
then you get Normal distribition
what?
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
non-Normal – what can go wrong?
non-linearity – e.g. thresholds
+ve / -ve feedback
snowflakes and clouds
bi-modal exam marks
unbounded variance
the more you sample the bigger the variation
used to be rare e.g. wage/wealth distributions
… but now Power Law …
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
earthquakes, sand piles,
… and networks
e.g. Facebook connections
power law data is NOT Normal
even when averaged
power law – scale free
https://en.wikipedia.org/wiki/Power_law
long tail
unbounded
variance
small number
of frequent
elements
Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix

More Related Content

More from Alan Dix

Why pandemics and climate change are hard to understand and make decision mak...
Why pandemics and climate change are hard to understand and make decision mak...Why pandemics and climate change are hard to understand and make decision mak...
Why pandemics and climate change are hard to understand and make decision mak...Alan Dix
 
Beyond the Wireframe: tools to design, analyse and prototype physical devices
Beyond the Wireframe: tools to design, analyse and prototype physical devicesBeyond the Wireframe: tools to design, analyse and prototype physical devices
Beyond the Wireframe: tools to design, analyse and prototype physical devicesAlan Dix
 
Forever Cyborgs – a long view on physical-digital interaction
Forever Cyborgs – a long view on physical-digital interactionForever Cyborgs – a long view on physical-digital interaction
Forever Cyborgs – a long view on physical-digital interactionAlan Dix
 
Truth in an Age of Information
Truth in an Age of InformationTruth in an Age of Information
Truth in an Age of InformationAlan Dix
 
Rome Seminar: Designing User Interactions with AI
Rome Seminar: Designing User Interactions with AIRome Seminar: Designing User Interactions with AI
Rome Seminar: Designing User Interactions with AIAlan Dix
 
Tools and technology to support rich community heritage
Tools and technology to support rich community heritageTools and technology to support rich community heritage
Tools and technology to support rich community heritageAlan Dix
 
Maps with Meaning
Maps with MeaningMaps with Meaning
Maps with MeaningAlan Dix
 
Democratising Digitisation Tools to Support Small Community Archives
Democratising Digitisation Tools to Support Small Community ArchivesDemocratising Digitisation Tools to Support Small Community Archives
Democratising Digitisation Tools to Support Small Community ArchivesAlan Dix
 
Follow your nose: history frames the future
Follow your nose: history frames the futureFollow your nose: history frames the future
Follow your nose: history frames the futureAlan Dix
 
What Next for UX Tools: from screens to smells, from sketch to code, supporti...
What Next for UX Tools: from screens to smells, from sketch to code, supporti...What Next for UX Tools: from screens to smells, from sketch to code, supporti...
What Next for UX Tools: from screens to smells, from sketch to code, supporti...Alan Dix
 
Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...
Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...
Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...Alan Dix
 
Alien Presence Detector – Background
Alien Presence Detector – BackgroundAlien Presence Detector – Background
Alien Presence Detector – BackgroundAlan Dix
 
Alien Presence Detector – What is it?
Alien Presence Detector – What is it?Alien Presence Detector – What is it?
Alien Presence Detector – What is it?Alan Dix
 
AI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive ActionAI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive ActionAlan Dix
 
Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...
Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...
Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...Alan Dix
 
Regret from cognition to code
Regret from cognition to codeRegret from cognition to code
Regret from cognition to codeAlan Dix
 
Designing User Interactions with AI: Servant, Master or Symbiosis.
Designing User Interactions with AI: Servant, Master or Symbiosis. Designing User Interactions with AI: Servant, Master or Symbiosis.
Designing User Interactions with AI: Servant, Master or Symbiosis. Alan Dix
 
Acting out of the Box
Acting out of the BoxActing out of the Box
Acting out of the BoxAlan Dix
 
Modelling interactions: digital and physical – Part 1 – lightning tour
Modelling interactions: digital and physical – Part 1 – lightning tourModelling interactions: digital and physical – Part 1 – lightning tour
Modelling interactions: digital and physical – Part 1 – lightning tourAlan Dix
 
Modelling interactions: digital and physical – Part 2 – getting physical
Modelling interactions: digital and physical – Part 2 – getting physicalModelling interactions: digital and physical – Part 2 – getting physical
Modelling interactions: digital and physical – Part 2 – getting physicalAlan Dix
 

More from Alan Dix (20)

Why pandemics and climate change are hard to understand and make decision mak...
Why pandemics and climate change are hard to understand and make decision mak...Why pandemics and climate change are hard to understand and make decision mak...
Why pandemics and climate change are hard to understand and make decision mak...
 
Beyond the Wireframe: tools to design, analyse and prototype physical devices
Beyond the Wireframe: tools to design, analyse and prototype physical devicesBeyond the Wireframe: tools to design, analyse and prototype physical devices
Beyond the Wireframe: tools to design, analyse and prototype physical devices
 
Forever Cyborgs – a long view on physical-digital interaction
Forever Cyborgs – a long view on physical-digital interactionForever Cyborgs – a long view on physical-digital interaction
Forever Cyborgs – a long view on physical-digital interaction
 
Truth in an Age of Information
Truth in an Age of InformationTruth in an Age of Information
Truth in an Age of Information
 
Rome Seminar: Designing User Interactions with AI
Rome Seminar: Designing User Interactions with AIRome Seminar: Designing User Interactions with AI
Rome Seminar: Designing User Interactions with AI
 
Tools and technology to support rich community heritage
Tools and technology to support rich community heritageTools and technology to support rich community heritage
Tools and technology to support rich community heritage
 
Maps with Meaning
Maps with MeaningMaps with Meaning
Maps with Meaning
 
Democratising Digitisation Tools to Support Small Community Archives
Democratising Digitisation Tools to Support Small Community ArchivesDemocratising Digitisation Tools to Support Small Community Archives
Democratising Digitisation Tools to Support Small Community Archives
 
Follow your nose: history frames the future
Follow your nose: history frames the futureFollow your nose: history frames the future
Follow your nose: history frames the future
 
What Next for UX Tools: from screens to smells, from sketch to code, supporti...
What Next for UX Tools: from screens to smells, from sketch to code, supporti...What Next for UX Tools: from screens to smells, from sketch to code, supporti...
What Next for UX Tools: from screens to smells, from sketch to code, supporti...
 
Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...
Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...
Apply for 2022 Cohort – Centre for Doctoral Training in Enhancing Human Inter...
 
Alien Presence Detector – Background
Alien Presence Detector – BackgroundAlien Presence Detector – Background
Alien Presence Detector – Background
 
Alien Presence Detector – What is it?
Alien Presence Detector – What is it?Alien Presence Detector – What is it?
Alien Presence Detector – What is it?
 
AI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive ActionAI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive Action
 
Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...
Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...
Qualitative–Quantitative Reasoning: understanding and managing the behaviour ...
 
Regret from cognition to code
Regret from cognition to codeRegret from cognition to code
Regret from cognition to code
 
Designing User Interactions with AI: Servant, Master or Symbiosis.
Designing User Interactions with AI: Servant, Master or Symbiosis. Designing User Interactions with AI: Servant, Master or Symbiosis.
Designing User Interactions with AI: Servant, Master or Symbiosis.
 
Acting out of the Box
Acting out of the BoxActing out of the Box
Acting out of the Box
 
Modelling interactions: digital and physical – Part 1 – lightning tour
Modelling interactions: digital and physical – Part 1 – lightning tourModelling interactions: digital and physical – Part 1 – lightning tour
Modelling interactions: digital and physical – Part 1 – lightning tour
 
Modelling interactions: digital and physical – Part 2 – getting physical
Modelling interactions: digital and physical – Part 2 – getting physicalModelling interactions: digital and physical – Part 2 – getting physical
Modelling interactions: digital and physical – Part 2 – getting physical
 

Recently uploaded

Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligencePriyadharshiniG41
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
Film cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrkFilm cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrk494f574xmv
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvws73678sri
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdfDSP Mutual Fund
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Data Discovery With Power Query in excel
Data Discovery With Power Query in excelData Discovery With Power Query in excel
Data Discovery With Power Query in excelKapilSidhpuria3
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbas73678sri
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachAdekunleJoseph4
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 

Recently uploaded (19)

Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligence
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
Film cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrkFilm cover research.pptx for media courseowrk
Film cover research.pptx for media courseowrk
 
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvwAdobe Scan 06-Mar-2024 (1).pdf shavashwvw
Adobe Scan 06-Mar-2024 (1).pdf shavashwvw
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
testingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdftestingsdadadadaaddadadadadadadadaad.pdf
testingsdadadadaaddadadadadadadadaad.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Data Discovery With Power Query in excel
Data Discovery With Power Query in excelData Discovery With Power Query in excel
Data Discovery With Power Query in excel
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approach
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 

Making Sense of Statistics in HCI: part 1 - wild and wide

  • 1. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Making Sense of Statistics in HCI Part 1 – Wild and Wide concerning randomness and distributions Alan Dix http://alandix.com/statistics/
  • 2. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix unexpected wildness of random just how random is the world? raindrops and horse races
  • 3. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix a story In the far off land of Gheisra there lies the plain of Nali. For one hundred miles in each direction it spreads, featureless and flat, no vegetation, no habitation; except, at its very centre, a pavement of 25 tiles of stone, each perfectly level with the others and with the surrounding land. The origins of this pavement are unknown – whether it was set there by some ancient race for its own purposes, or whether it was there from the beginning of the world. Rain falls but rarely on that barren plain, but when clouds are seen gathering over the plain of Nali, the monks of Gheisra journey on pilgrimage to this shrine of the ancients, to watch for the patterns of the raindrops on the tiles. Oftentimes the rain falls by chance, but sometimes the raindrops form patterns, giving omens of events afar off. Some of the patterns recorded by the monks are shown on the following pages. Which are mere chance and which foretell great omens?
  • 4. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix day 1
  • 5. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix day 2
  • 6. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix day 3
  • 7. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix your choice? day 1 day 2 day 3 which are by chance ... and which are unusual?
  • 8. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix why? ………………………………………. ………………………………………. ……………………………………….
  • 9. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix really random empty squares & overfull squares
  • 10. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix random but not uniform clumped towards the middle
  • 11. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix too uniform every square has 5 rain drops too good to be true!
  • 12. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix two horse races toss 20 coins add the heads to one row the tails to a second the winner is the first row to 10 before you start: what do you think will happen?
  • 13. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix the race did you get a clear winner? or was it neck and neck?
  • 14. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix the world is very random probability head = 0.5 number of heads ≠ number of tosses 2
  • 15. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix lessons apparent differences may be chance real data has some bad values e.g. Mendel's sweet peas and electron discovery were too good!
  • 16. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  • 17. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix quick (and dirty!) tip for survey or other count data do square root times two
  • 18. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Quick (and dirty!) Tip [1] estimate variation of survey data (for categories with small response) • survey 1000 people on favourite colour • say 10% of sample say it is blue • this represents 100 people • variation is roughly 2 x square root of 100 • that is about +/- 20 people, or +/- 2%
  • 19. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Quick (and dirty!) Tip [2] For response nearer top end the same, but use the number who did not answer • Say 85% of sample of 200 say colour is green • 15% = 30 people did not say green • variation is roughly 2 x square root of 30 • that is about +/- 11 people, or +/- 5.5% • real value anything from 80 to 90 %
  • 20. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Quick (and dirty!) Tip [3] If nearer 50:50 (e.g. presidential election!) then this is a little over estimate, use 1.5 instead • Say 50% of sample of 2000 say yes to question • 50% = 1000 people did not say green • variation is roughly 1.5 x square root of 1000 • that is about +/- 50 people, or +/- 2.5% N.B exact formula variance ~ n p (1-p)
  • 21. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix … but really far more important: the fairness of the sample self-selection for surveys the phrasing of the question
  • 22. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  • 23. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix bias and non-independence is it fair? is it reliable?
  • 24. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix bias systematic effects that skew results one way or other e.g. choice of LinkedIn vs. Snapchat survey WEIRD (Western, Educated, Industrialized, Rich, and Democratic) bias persists no matter your sample size but can sometimes be corrected e.g. sample variance … hence the n-1
  • 25. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix bias vs. variability high variability with no bias – measurements equally likely to be more or less than real value, but may be far away in either direction – poor estimate of right thing low variability with bias – Measurements consistent with one another, but systematically one way – good estimate of wrong thing increase sample size or reduce variability eliminate bias or model it
  • 26. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix independence different kinds of independence: • measurements (normal kind) • factor effects • sample prevalence non-independence often increases variability … but may cause bias too
  • 27. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix independence – measurements measurements may be related to each other: order effects sequential measures on same person (learning, etc.) ‘day effects’ e.g. sunny day improves productivity experimenter effects your mood effects participants all increase variability
  • 28. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix independence – factor effects Relationships and correlations between things you are measuring (or failing to measure) can confuse causality e.g. death rates often higher in specialist hospitals … because more sick people sent there even reverse effects … Simpson Paradox
  • 29. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Simpson’s Paradox You run a course with some full-time and some part-time students You calculate: – average FT student marks go up each year – average PT student marks increase each year University complains your marks are going down Who is right?
  • 30. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Simpson’s paradox – you both are! FT students getting higher marks number of PT students increased
  • 31. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix independence – sample the way you obtain your sample internal – subjects related to each other e.g. snowball sampling if not external – increases variance external – subject choice related to topic e.g. measuring age based on mobile app survey may create bias
  • 32. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix crucial question Is the way I’ve organised my sample likely to be independent of the thing I want to measure? e.g. Fitts’ law with different colour targets. Are student subjects likely to behave differently than the general population?
  • 33. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  • 34. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix play! experiment with bias and independence http://www.meandeviation.com/tutorials/stats/morecoin.html
  • 35. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix nos.of coins per row nos.of rows of coins press coin to start modify independence +ve more likely to be same -ve more likely to alternate set bias 0.5 = fair coin summary statistics area coins appear here row counts appear here
  • 36. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix fair coin (independent tosses) but individual rows vary a lot average close to 50:50
  • 37. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix biased coin (independent) average lower
  • 38. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix positive correlation (runs) average 50:50 but variation higher note long runs see them swop!
  • 39. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix negative correlation (alternation) average still 50:50 but variation lower largely alternate H/T very consistent
  • 40. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  • 41. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix distributions normal or not
  • 42. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix stats 101 – what kind of data continuous: e.g. time to complete task (12.73 secs) discrete ... arithmetic: e.g. number of errors (average makes some sense) ordered/ordinal: .g. satisfaction rating (?average rating?) nominal/categorical: e.g. menu item chosen ( (File+Font)/2 = Flml ?)
  • 43. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix bounded, tails, … the data number of heads in 6 tosses – discrete, finite number of heads until first tail – discrete, unbounded wait before next bus – continuous bounded below (zero), … but not above! difference between heights – continuous, unbounded your question is error rate higher – discrete, one tailed are completion times different – continuous, two tailed
  • 44. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix approximations may approximate one type of distribution with another esp. using Normal https://commons.wikimedia.org/wiki/File:Binomial_Distribution.svg continuous / discrete bounded / unbounded
  • 45. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix why is Normal normal? central limit theorem if you: – average lots of things (or near linearly combine) – around the same size (so none dominates) – nearly independent – and have finite variance then you get Normal distribition what?
  • 46. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix non-Normal – what can go wrong? non-linearity – e.g. thresholds +ve / -ve feedback snowflakes and clouds bi-modal exam marks unbounded variance the more you sample the bigger the variation used to be rare e.g. wage/wealth distributions … but now Power Law …
  • 47. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix earthquakes, sand piles, … and networks e.g. Facebook connections power law data is NOT Normal even when averaged power law – scale free https://en.wikipedia.org/wiki/Power_law long tail unbounded variance small number of frequent elements
  • 48. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix