1.
Intro to Research in Information Studies
Inferential Statistics
Standard Error of the Mean
Significance
Inferential tests you can use
2.
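Do you speak the language?
$t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\left[\frac{\left(\sum X_A^2 - \frac{(\sum X_A)^2}{n_1}\right) + \left(\sum X_B^2 - \frac{(\sum X_B)^2}{n_2}\right)}{(n_1 - 1) + (n_2 - 1)}\right] \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$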
3.
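Don’t Panic !!
$t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\left[\frac{\left(\sum X_A^2 - \frac{(\sum X_A)^2}{n_1}\right) + \left(\sum X_B^2 - \frac{(\sum X_B)^2}{n_2}\right)}{(n_1 - 1) + (n_2 - 1)}\right] \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
o Numerator: the difference between the means
o Denominator: compare with the SD formula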
4.
Basic types of statistical treatment
o Descriptive statistics, which summarize the characteristics of a sample of data
o Inferential statistics, which attempt to say something about a population on the basis of a sample of data (infer to all on the basis of some)
Note: Statistical tests are inferential
5.
Two kinds of descriptive statistic:
o Measures of central tendency (or whereabouts on the measurement scale most of the data fall)
   - mean
   - median
   - mode
o Measures of dispersion (variation) (or how spread out they are)
   - range
   - inter-quartile range
   - variance/standard deviation
The different measures have different sensitivity and should be used at the appropriate times…
6.
Symbol check
o $\sum$ (Sigma): means the ‘sum of’
o $\sum_{i=1}^{n} x_i$ (Sigma, 1 to n, x of i): means add all values of $x_i$ for i from 1 to n in a data set
o $x_i$ = the i-th data point
7.
Mean
Sum of all observations divided by the number of observations
In notation: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Note: Refer to handout on notation. See example on next slide.
Mean uses every item of data but is sensitive to extreme ‘outliers’
8.
Variance and standard deviation
Note: To overcome problems with range etc. we need a better measure of spread
o A deviation is a measure of how far a score in our data is from the mean
o Sample: 6, 4, 7, 5; mean = 5.5
o Each score can be expressed in terms of distance from 5.5
o 6, 4, 7, 5 => 0.5, -1.5, 1.5, -0.5 (these are distances from the mean)
o Since these are measures of distance, some are positive (greater than the mean) and some are negative (less than the mean)
o TIP: The sum of these distances ALWAYS = 0
9.
Symbol check
o $\bar{x}$: called ‘x bar’; refers to the ‘mean’
o $(x - \bar{x})$: called ‘x minus x-bar’; implies subtracting the mean from a data point x. Also known as a deviation from the mean
10.
Two ways to get SD
Method 1: $sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
• Sum the squared deviations from the mean
• Divide by the number of observations
• Take the square root of the result
Method 2: $sd = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$
• Sum the squared raw scores
• Divide by N
• Subtract the squared mean
• Take the square root of the result
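The two recipes on this slide can be checked against each other in a few lines. The sketch below is only an illustration: the function names are made up for this example, and the data are the four scores (6, 4, 7, 5) from the deviation slide.

```python
import math


def sd_from_deviations(data):
    """Method 1: root of the mean squared deviation from the mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_from_raw_scores(data):
    """Method 2: root of (mean of squared raw scores minus squared mean)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum(x * x for x in data) / n - mean ** 2)


scores = [6, 4, 7, 5]  # sample from the deviation slide, mean = 5.5
sd_a = sd_from_deviations(scores)
sd_b = sd_from_raw_scores(scores)
print(round(sd_a, 3), round(sd_b, 3))
```

Both methods divide by n, so they should agree up to floating-point rounding; the second is usually the easier one to do by hand.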
11.
Worked example (n = 10)
   x     x²
   2     4
   2     4
   2     4
   2     4
   2     4
   3     9
   3     9
   4     16
   4     16
   5     25
Σx = 29, Σx² = 95
$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{95}{10} - 2.9^2} = \sqrt{9.5 - 8.41} = \sqrt{1.09} = 1.044$
Note: What if we recalculate the variance with 60 instead of the 5 in the data…
12.
If we include a large outlier:
   x     x²
   2     4
   2     4
   2     4
   2     4
   2     4
   3     9
   3     9
   4     16
   4     16
   60    3600
Σx = 84, Σx² = 3670
$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{3670}{10} - 8.4^2} = \sqrt{367 - 70.56} = \sqrt{296.44} = 17.22$
Like the mean, the standard deviation uses every piece of data and is therefore sensitive to extreme values. Note the increase in SD.
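To see the outlier effect numerically, the sketch below recomputes the mean and SD for both versions of the data. It assumes the ten scores are 2, 2, 2, 2, 2, 3, 3, 4, 4, 5 (the values consistent with Σx = 29 and Σx² = 95), with the 5 swapped for 60 in the second set; the helper name is illustrative.

```python
import math


def mean_and_sd(data):
    """Return (mean, SD) using the raw-score formula with division by n."""
    n = len(data)
    mean = sum(data) / n
    variance = sum(x * x for x in data) / n - mean ** 2
    return mean, math.sqrt(variance)


original = [2, 2, 2, 2, 2, 3, 3, 4, 4, 5]       # assumed slide data
with_outlier = [2, 2, 2, 2, 2, 3, 3, 4, 4, 60]  # 5 replaced by 60

mean_1, sd_1 = mean_and_sd(original)
mean_2, sd_2 = mean_and_sd(with_outlier)
print(f"original:     mean={mean_1:.2f}, sd={sd_1:.3f}")
print(f"with outlier: mean={mean_2:.2f}, sd={sd_2:.2f}")
```

A single extreme value moves both statistics a long way, which is why the median and inter-quartile range are often preferred when outliers are present.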
13.
Two sets of data can have the same mean but different standard deviations.
The bigger the SD, the more s-p-r-e-a-d out the data are.
14.
On the use of N or N-1
o When your observations are the complete set of people that could be measured (parameter):
   $sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
o When you are observing only a sample of potential users (statistic):
   $sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
o The use of N-1 increases the size of sd slightly
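Python's standard `statistics` module exposes both versions, which makes the difference easy to inspect. This is a small sketch using the four scores from the deviation slide as stand-in data: `pstdev` divides by N and `stdev` divides by N-1.

```python
import statistics

scores = [6, 4, 7, 5]  # stand-in data from the deviation slide

population_sd = statistics.pstdev(scores)  # divides by N (complete population)
sample_sd = statistics.stdev(scores)       # divides by N-1 (sample of users)

print(round(population_sd, 3), round(sample_sd, 3))
```

With only four scores the gap is noticeable; as N grows the two values converge.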
15.
Summary
Measures of Central Tendency
• Mode: Most frequent observation. Use with nominal data
• Median: ‘Middle’ of data. Use with ordinal data or when data contain outliers
• Mean: ‘Average’. Use with interval and ratio data if no outliers
Measures of Dispersion
• Range: Dependent on two extreme values
• Interquartile Range: More useful than range. Often used with median
• Variance / Standard Deviation: Same conditions as mean. With mean, provides excellent summary of data
16.
Deviation units: Z scores
Any data point can be expressed in terms of its distance from the mean in SD units:
$z = \frac{x - \bar{x}}{sd}$
o A positive z score implies a value above the mean
o A negative z score implies a value below the mean
Note (Andrew Dillon): Move this to later in the course, after distributions?
17.
Interpreting Z scores
o Mean = 70, SD = 6
o Then a score of 82 is 2 sd [(82-70)/6] above the mean, or 82 = Z score of 2
o Similarly, a score of 64 = a Z score of -1
o By using Z scores, we can standardize a set of scores to a scale that is more intuitive
o Many IQ tests and aptitude tests do this, setting a mean of 100 and an SD of 10 etc.
18.
Comparing data with Z scores
You score 49 in class A but 58 in class B
How can you compare your performance in both?
Class A: Mean = 45, SD = 4; 49 is a Z = 1.0
Class B: Mean = 55, SD = 6; 58 is a Z = 0.5
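The class comparison is a direct application of the z formula. The sketch below uses a hypothetical helper, `z_score`, and the means and SDs given on the slides.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in SD units."""
    return (x - mean) / sd


z_class_a = z_score(49, mean=45, sd=4)  # class A
z_class_b = z_score(58, mean=55, sd=6)  # class B
z_82 = z_score(82, mean=70, sd=6)       # example from the previous slide

print(z_class_a, z_class_b, z_82)
```

Although the raw score in class B is higher, the z score shows the class A result sits further above its class mean.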
19.
With normal distributions
Mean, SD and Z tables in combination provide a powerful means of estimating what your data indicate
20.
Graphing data - the histogram
[Figure: histogram. Vertical axis (number of errors): the frequency of occurrence for the measure of interest, e.g., errors, time, scores on a test etc. Horizontal axis: the categories of data we are studying, e.g., task or interface, or user group etc.]
Note: Graph gives instant summary of data - check spread, similarity, outliers, etc.
21.
Very large data sets tend to have a distinct shape:
[Figure: frequency distribution plot]
22.
Normal distribution
o Bell shaped, symmetrical, measures of central tendency converge
   - mean, median, mode are equal in a normal distribution
   - Mean lies at the peak of the curve
o Many events in nature follow this curve
   - IQ test scores, height, tosses of a fair coin, user performance in tests
23.
The Normal Curve
[Figure: normal curve; vertical axis f; Mean, Median and Mode marked at the peak]
NB: position of measures of central tendency
o 50% of scores fall below the mean
24.
Positively skewed distribution
[Figure: positively skewed curve; vertical axis f; marked left to right: Mode, Median, Mean]
Note how the various measures of central tendency separate now - note the direction of the change… mode moves left of the other two, mean stays highest, indicating frequency of scores less than the mean
25.
Negatively skewed distribution
[Figure: negatively skewed curve; vertical axis f; marked left to right: Mean, Median, Mode]
Here the tendency to have higher values more common serves to increase the value of the mode
26.
Other distributions
o Bimodal
   - Data shows 2 peaks separated by a trough
o Multimodal
   - More than 2 peaks
o The shape of the underlying distribution determines your choice of inferential test
27.
Bimodal
[Figure: bimodal curve; vertical axis f; marked: Mode, Mean, Median, Mode]
Will occur in situations where there might be distinct groups being tested e.g., novices and experts
Note how each mode is itself part of a normal distribution (more later)
28.
Standard deviations and the normal curve
o 68% of observations fall within ± 1 s.d.
o 95% of observations fall within ± 2 s.d. (approx)
[Figure: normal curve marked in steps of 1 sd either side of the Mean; vertical axis f]
29.
Z scores and tables
Knowing a Z score allows you to determine where under the normal distribution it occurs
Z score between:
o 0 and 1 = 34% of observations
o 1 and -1 = 68% of observations etc.
o Or 16% of scores are more than 1 Z score above the mean
Check out Z tables in any basic stats book
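The same proportions can be read off in code instead of a printed Z table. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

between_0_and_1 = std_normal.cdf(1) - std_normal.cdf(0)  # about 34%
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)      # about 68%
above_1_sd = 1 - std_normal.cdf(1)                        # about 16%
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)      # about 95%

for label, p in [("0 to 1", between_0_and_1), ("-1 to 1", within_1_sd),
                 ("above 1", above_1_sd), ("-2 to 2", within_2_sd)]:
    print(f"{label}: {p:.1%}")
```

`cdf(z)` gives the proportion of the distribution below z, so any band is the difference between two `cdf` values.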
30.
Remember:
o A Z score reflects position in a normal distribution
o The Normal Distribution has been plotted out such that we know what proportion of the distribution occurs above or below any point
31.
Importance of distribution
o Given the mean, the standard deviation, and some reasonable expectation of normal distribution, we can establish the confidence level of our findings
o With a distribution, we can go beyond descriptive statistics to inferential statistics (tests of significance)
32.
So - for your research:
o Always summarize the data by graphing it - look for the general pattern of distribution
o Then, determine the mean, median, mode and standard deviation
o From these we know a LOT about what we have observed
33.
Inference is built on Probability
o Inferential statistics rely on the laws of probability to determine the ‘significance’ of the data we observe.
o Statistical significance is NOT the same as practical significance
o In statistics, we generally consider ‘significant’ those differences that would occur by chance alone less than 1 time in 20
34.
Calculating probability
o Probability refers to the likelihood of any given event occurring out of all possible events e.g.:
   - Tossing a coin - outcome is either head or tail
   - Therefore probability of head is 1/2
   - Probability of two heads on two tosses is 1/4 since the other possible outcomes are two tails, and two possible sequences of head and tail.
o The probability of any event is expressed as a value between 0 (no chance) and 1 (certain)
Note: At this point I ask people to take out a coin and toss it 10 times, noting the exact sequence of outcomes e.g., h,h,t,h,t,t,h,t,t,h. Then I have people compare outcomes….
35.
Sampling distribution for 3 coin tosses
[Figure: bar chart of the number of possible sequences for each outcome]
   0 heads: 1
   1 head: 3
   2 heads: 3
   3 heads: 1
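The counts in this chart can be reproduced by listing every possible sequence of three tosses. The sketch below is one way to do that; `heads_distribution` is a made-up helper name.

```python
from collections import Counter
from itertools import product


def heads_distribution(n_tosses):
    """Count how many of the 2**n equally likely sequences contain each number of heads."""
    counts = Counter(seq.count("H") for seq in product("HT", repeat=n_tosses))
    return dict(sorted(counts.items()))


three_toss_counts = heads_distribution(3)
print(three_toss_counts)
```

Dividing each count by the total number of sequences (8 for three tosses) turns the chart into probabilities.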
36.
Probability and normal curves
o Q? When is the probability of getting 10 heads in 10 coin tosses the same as getting 6 heads and 4 tails?
   - HHHHHHHHHH
   - HHTHTHHTHT
o Answer: when you specify the precise order of the 6H/4T sequence:
   - (1/2)^10 = 1/1024 (specific order)
   - But to get 6 heads in any order it is: 210/1024 (or about 1:5)
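The 210 on this slide is the number of ways to place 6 heads among 10 tosses. A minimal sketch, assuming Python 3.8+ for `math.comb` and using illustrative function names:

```python
from math import comb


def prob_specific_sequence(n_tosses):
    """Probability of one exact sequence of fair coin tosses."""
    return 0.5 ** n_tosses


def prob_heads_any_order(n_heads, n_tosses):
    """Probability of n_heads heads in n_tosses fair tosses, in any order."""
    return comb(n_tosses, n_heads) * 0.5 ** n_tosses


p_sequence = prob_specific_sequence(10)    # 1/1024
p_six_heads = prob_heads_any_order(6, 10)  # 210/1024

print(p_sequence, p_six_heads)
```

Any single sequence, whether all heads or a mixed one, has the same 1/1024 chance; mixed results are more common only because many different sequences produce them.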
37.
What use is probability to us?
o It tells us how likely any event is to occur by chance
o This enables us to determine if the behavior of our users in a test is just chance or is being affected by our interfaces
38.
Determining probability
o Your statistical test result is plotted against the distribution of all scores on such a test.
o It can be looked up in stats tables or is calculated for you in EXCEL or SPSS etc.
o This tells you its probability of occurrence
o The distributions have been determined by statisticians.
Note: Introduce simple stats tables here
39.
What is a significance level?
o In research, we estimate the probability level of finding what we found by chance alone.
o Convention dictates that this level is 1:20 or a probability of .05, usually expressed as: p<.05.
o However, this level is negotiable
   - But the higher it is (e.g., p<.30 etc) the more likely you are to claim a difference that is really just occurring by chance (known as a Type 1 error)
40.
What levels might we choose?
o In research there are two types of errors we can make when considering probability:
   - Claiming a significant difference when there is none (type 1 error)
   - Failing to claim a difference where there is one (type 2 error)
o The p<.05 convention is the ‘balanced’ case but tends to minimize type 1 errors
41.
Using other levels
o Type 1 and 2 errors are interwoven: if we lessen the probability of one occurring, we increase the chance of the other.
o If we think that we really want to find any differences that exist, we might accept a probability level of .10 or higher
42.
Thinking about p levels
o The p<.x level means we believe our results could occur by chance alone (not because of our manipulation) fewer than x in 100 times
   - p<.10 => our results should occur by chance less than 1 in 10 times
   - p<.20 => our results should occur by chance less than 2 in 10 times
   - Depending on your context, you can take your chances :)
   - In research, the consensus is 1:20 is high enough…..
43.
Putting probability to work
o Understanding the probability of gaining the data you have can guide your decisions
o Determine how precise you need to be IN ADVANCE, not after you see the result
o It is like making a bet….you cannot play the odds after the event!
44.
Sampling error and the mean
o Usually, our data forms only a small part of all the possible data we could collect
   - All possible users do not participate in a usability test
   - Every possible respondent did not answer our questions
o The mean we observe therefore is unlikely to be the exact mean for the whole population
   - The scores of our users in a test are not going to be an exact index of how all users would perform
Note: I find that this is the hardest part of stats for novices to grasp, since it is the bridge between descriptive and inferential stats…..needs to be explained slowly!!
45.
How can we relate our sample to everyone else?
o Central limit theorem
   - If we repeatedly sample and calculate means from a population, our list of means will itself be (approximately) normally distributed
   - Holds true even for samples taken from a skewed population distribution
o This implies that our observed mean follows the same rules as all data under the normal curve
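A small simulation can make the central limit theorem concrete. Everything in this sketch is an assumption chosen for illustration: the skewed population is drawn from an exponential distribution, samples hold 30 scores each, and the seeds are fixed so the run is repeatable.

```python
import math
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=sample_size))
            for _ in range(n_samples)]


# A positively skewed population (exponential), purely for illustration.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=30, n_samples=2000)

print("population mean:", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:", round(statistics.stdev(means), 3))
print("population SD / sqrt(30):",
      round(statistics.pstdev(population) / math.sqrt(30), 3))
```

The sample means cluster around the population mean, and their spread comes out close to the population SD divided by the square root of the sample size, even though the population itself is far from normal.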
46.
The distribution of the means forms a smaller normal distribution about the true mean:
[Figure: distribution of sample means inside the population distribution; horizontal axis 2 to 18]
47.
True for skewed distributions too
[Figure: skewed distribution with a plot of means from samples; vertical axis f; Mean marked]
Here the tendency to have higher values more common serves to increase the value of the mode
48.
How means behave..
o A mean of any sample belongs to a normal distribution of possible means of samples
o Any normal distribution behaves lawfully
o If we calculate the SD of all these means, we can determine what proportion (%) of means fall within specific distances of the ‘true’ or population mean
49.
But...
o We only have a sample, not the population…
o We use an estimate of this SD of means known as the Standard Error of the Mean:
   $SE = \frac{SD}{\sqrt{N}}$
50.
Implications
o Given a sample of data, we can estimate how confident we are in it being a true reflection of the ‘world’ or…
o If we test 10 users on an interface or service, we can estimate how much variability about our mean score we will find within the intended full population of users
51.
Example
o We test 20 users on a new interface:
   - Mean error score: 10, sd: 4
   - What can we infer about the broader user population?
o According to the central limit theorem, our observed mean (10 errors) is itself 95% likely to be within 2 standard errors (the s.d. of the distribution of means) of the ‘true’ (but unknown to us) mean of the population
52.
The Standard Error of the Mean
$SE = \frac{s.d.(sample)}{\sqrt{N}} = \frac{4}{\sqrt{20}} = \frac{4}{4.47} = 0.89$
53.
If standard error of mean = 0.89
o Then the observed (sample) mean is within a normal distribution about the ‘true’ or population mean
o So we can be:
   - 68% confident that the true mean = 10 ± 0.89 (1 SE)
   - 95% confident our population mean = 10 ± 1.78 (2 SE)
   - 99% confident it is within 10 ± 2.67 (3 SE)
o This offers a strong method of interpreting our data
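The intervals on this slide follow from the standard error and the 1, 2 and 3 SE rules of thumb used in the deck. The sketch below recomputes them for the example of 20 users, mean 10 and sd 4; the function names are illustrative.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return sd / math.sqrt(n)


def interval(mean, se, multiple):
    """Interval of `multiple` standard errors either side of the mean."""
    return (mean - multiple * se, mean + multiple * se)


se = standard_error(sd=4, n=20)

# Rules of thumb from the slides: 1 SE ~ 68%, 2 SE ~ 95%, 3 SE ~ 99%.
intervals = {label: interval(10, se, k)
             for label, k in (("68%", 1), ("95%", 2), ("99%", 3))}

print(round(se, 2))
for label, (low, high) in intervals.items():
    print(f"{label}: {low:.2f} to {high:.2f}")
```

The slide rounds the SE to 0.89 before multiplying, so its limits can differ from the unrounded output in the second decimal place.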
54.
Issues to note
o If s.d. is large and/or sample size is small, the estimated deviation of the population means will appear large.
   - e.g., in the last example, if n=9, SE mean = 1.33
   - So the confidence interval becomes 10 ± 2.66 (i.e., we are now 95% confident that the true mean is somewhere between 7.34 and 12.66)
o Hence confidence improves as sample size increases and variability lessens
o Or in other words: the more users you study, the more sure you can be….!
55.
Exercise:
o If the mean = 10 and the s.d. = 4, what is the 68% confidence interval when we have:
   - 16 users?
   - 9 users?
o If the s.d. = 12, and mean is still 10, what is the 95% confidence interval for those N?
Answers: 9-11; 8.67-11.33; 4-16; 2-18
56.
Exercise answers:
o If the mean = 10 and the s.d. = 4, what is the 68% confidence interval when we have:
   - 16 users? = 9-11 (hint: sd/√n = 4/4 = 1)
   - 9 users? = 8.67-11.33
o If the s.d. = 12, and mean is still 10, what is the 95% confidence interval for those N?
   - 16 users: 4-16 (hint: 95% CI implies 2 SE either side of mean)
   - 9 users: 2-18
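The four exercise answers can be checked with one small helper. This is a sketch under the deck's own rules of thumb (1 SE for 68%, 2 SE for 95%); `confidence_interval` is a hypothetical name.

```python
import math


def confidence_interval(mean, sd, n, n_se):
    """Return (low, high): the mean plus or minus n_se standard errors."""
    se = sd / math.sqrt(n)
    return (mean - n_se * se, mean + n_se * se)


ci_68_n16 = confidence_interval(10, sd=4, n=16, n_se=1)
ci_68_n9 = confidence_interval(10, sd=4, n=9, n_se=1)
ci_95_n16 = confidence_interval(10, sd=12, n=16, n_se=2)
ci_95_n9 = confidence_interval(10, sd=12, n=9, n_se=2)

for name, (low, high) in [("68%, n=16", ci_68_n16), ("68%, n=9", ci_68_n9),
                          ("95%, n=16", ci_95_n16), ("95%, n=9", ci_95_n9)]:
    print(f"{name}: {low:.2f} to {high:.2f}")
```

Comparing the n=16 and n=9 rows shows how the interval widens as the sample shrinks.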
57.
Recap
o Summarizing data effectively informs us of central tendencies
o We can estimate how our data deviates from the population we are trying to estimate
o We can establish confidence intervals to enable us to make reliable ‘bets’ on the effects of our designs on users
58.
Comparing 2 means
o The differences between means of samples drawn from the same population are also normally distributed
o Thus, if we compare means from two samples, we can estimate if they belong to the same parent population
Note: This is the beginning of significance testing
59.
SE of difference between means
$\sigma_{[\bar{x}_1 - \bar{x}_2]} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2}$
$SE_{diff.\,means} = \sqrt{SE(sample_1)^2 + SE(sample_2)^2}$
This lets us set up confidence limits for the differences between the two means
60.
Regardless of population mean:
o The difference between 2 true measures of the mean of a population is 0
o The differences between pairs of sample means from this population are normally distributed about 0
61.
Consider two interfaces:
We capture 10 users’ times per task on each.
The results are:
   Interface A = mean 8, sd = 3
   Interface B = mean 10, sd = 3.5
Q? - is Interface A really different?
How do we tackle this question?
62.
Calculate the SE difference between the means
SEa = 3/√10 = 0.95
SEb = 3.5/√10 = 1.11
SEa-b = √(0.95² + 1.11²) = √(0.90 + 1.23) = 1.46
Observed difference between means = 2.0
95% confidence interval of difference between means is 2 × (1.46) or 2.92 (i.e. we expect to find a difference between 0 and 2.92 by chance alone).
This suggests there is no significant difference at the p<.05 level.
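The steps on this slide translate directly into code. The sketch below uses the slide's figures for interfaces A and B (10 users each); the helper names are made up for illustration.

```python
import math


def se_mean(sd, n):
    """Standard error of a single sample mean."""
    return sd / math.sqrt(n)


def se_difference(se_1, se_2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)


se_a = se_mean(3, 10)     # Interface A
se_b = se_mean(3.5, 10)   # Interface B
se_diff = se_difference(se_a, se_b)

observed_difference = 10 - 8
margin_95 = 2 * se_diff   # 2 SE rule of thumb for 95%
significant_at_05 = observed_difference > margin_95

print(round(se_a, 2), round(se_b, 2), round(se_diff, 2))
print(round(margin_95, 2), significant_at_05)
```

The observed difference of 2.0 is smaller than the 2 SE margin, which matches the slide's conclusion of no significant difference at p<.05.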
63.
But what else?
We can calculate the exact probability of finding this difference by chance:
o Divide the observed difference between the means by the SE (diff between means): 2.0/1.46 = 1.37
o Gives us the number of standard deviation units between two means (Z scores)
o Check Z table: 82% of observations are within 1.37 sd, 18% are greater; thus the precise sig level of our findings is p<.18.
Thus - Interface A is different, with rough odds of 5:1
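The Z-table lookup can also be done in code. This sketch assumes Python 3.8+ (`statistics.NormalDist`) and uses the rounded figures from the slide, a difference of 2.0 and an SE of 1.46. The computed value comes out at roughly 0.17, in line with the slide's table-based p<.18.

```python
from statistics import NormalDist

observed_difference = 2.0
se_diff = 1.46

z = observed_difference / se_diff

# Proportion of the normal curve lying within z SDs of the centre,
# and the remainder in the two tails (the chance probability).
proportion_within = NormalDist().cdf(z) - NormalDist().cdf(-z)
p_two_tailed = 1 - proportion_within

print(round(z, 2), round(proportion_within, 3), round(p_two_tailed, 3))
```

This is the two-tailed probability: it counts differences of this size in either direction.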
64.
Hold it!
o Didn’t we first conclude there was no significant difference?
   - Yes, no significant difference at p<.05
   - But the probability of getting the differences we observed by chance was approximately 0.18
   - Not good enough for science (must avoid type 1 error), but very useful for making a judgment on design
   - But you MUST specify the levels you will accept BEFORE, not after….
o Note - for small samples (n<20) the t-distribution is better than the z distribution when looking up probability
65.
Why t?
o Similar to the normal distribution
o t distribution is flatter than Z for small degrees of freedom (n-1), but virtually identical to Z when N>30
o Exact shape of the t-distribution depends on sample size
66.
Simple t-test:
o You want all users of a new interface to score at least 70% on an effectiveness test. You test 6 users on a new interface and gain the following scores:
   62, 92, 75, 68, 83, 95
   Mean = 79.17
   Sd = 13.17
67.
T-test:
$t = \frac{79.17 - 70}{13.17 / \sqrt{6}} = \frac{9.17}{5.38} = 1.71$
From t-tables, we can see that this value of t exceeds the t value (with 5 d.f.) for the p<.10 level
So we are confident at the 90% level that our new interface leads to improvement
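The calculation can be reproduced from the six raw scores. In this sketch the critical values are hard-coded from a standard t-table (one-tailed, 5 degrees of freedom) rather than computed, since the Python standard library has no t-distribution; the variable names are illustrative.

```python
import math
import statistics

scores = [62, 92, 75, 68, 83, 95]
target = 70

mean = statistics.mean(scores)
sd = statistics.stdev(scores)        # sample SD (divides by n-1)
se = sd / math.sqrt(len(scores))     # standard error of the mean
t = (mean - target) / se

# One-tailed critical values for 5 d.f., copied from a standard t-table.
T_CRIT_P10 = 1.476
T_CRIT_P05 = 2.015

print(round(mean, 2), round(sd, 2), round(se, 2), round(t, 2))
print("exceeds p<.10 value:", t > T_CRIT_P10)
print("exceeds p<.05 value:", t > T_CRIT_P05)
```

The t value clears the p<.10 threshold but not the p<.05 one, which is why the slide claims confidence only at the 90% level.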
68.
T-test:
$t = \frac{79.17 - 70}{13.17 / \sqrt{6}} = \frac{9.17}{5.38} = 1.71$
o 79.17 is the sample mean
o The denominator, 13.17/√6 = 5.38, is the SE mean
Thus - we can still talk in confidence intervals, e.g., we are 68% confident the mean of the population = 79.17 ± 5.38
69.
Predicting the direction of the difference
o Since you stated that you wanted to see if the new interface was BETTER (>70), not just DIFFERENT (< or > 70%), this is asking for a one-sided test….
o For a two-sided test, I just want to see if there is ANY difference (better or worse) between A and B.
70.
One tail (directional) test
o Tester narrows the odds by half by testing for a specific difference
o One sided predictions specify which part of the normal curve the difference observed must reside in (left or right)
o Testing for ANY difference is known as ‘two-tail’ testing
o Testing for a directional difference (A>B) is known as ‘one-tail’ testing
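The "narrows the odds by half" point can be seen by computing both probabilities for the same z value. This sketch assumes Python 3.8+ and uses the normal curve, as the slide does; the function names are illustrative, and z = 1.37 is reused from the two-interface example.

```python
from statistics import NormalDist


def one_tailed_p(z):
    """Chance of a difference at least this large in one predicted direction."""
    return 1 - NormalDist().cdf(abs(z))


def two_tailed_p(z):
    """Chance of a difference at least this large in either direction."""
    return 2 * one_tailed_p(z)


z = 1.37
p_one = one_tailed_p(z)
p_two = two_tailed_p(z)

print(round(p_one, 3), round(p_two, 3))
```

The one-tailed probability is exactly half the two-tailed one, which is only a fair comparison when the direction was predicted before seeing the data.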
71.
So to recap
o If you are interested only in certain differences, you are being ‘directional’ or ‘one-sided’
o Under the normal curve, random or chance differences occur equally on both sides
o You MUST state directional expectations (hypothesis) in advance
72.
Why would you predict the direction?
o Theoretical grounds
   - Experience or previous findings suggested the difference
o Practical grounds
   - You redesigned the interface to make it better, so you EXPECT users will perform better….