# Introduction To Statistics

## by Alan P Jack on Oct 11, 2008

Uploaded via SlideShare as Microsoft Word

## Introduction To Statistics: Document Transcript

• Introduction to Statistics. We don't have to actually do statistics, but we do need to know how to READ them and have some understanding of what they are telling us. The whole point is to be able to tell whether research is usable in practice: "I know what that word means and why they are doing it." We don't have to understand the workings out. This is not highfalutin mathematics; it is about why the researchers have done what they have done.
• Statistics The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling. about a population numbers a sample of the population use numbers to help us in what we are looking at Numerical data. Statistics is the study of large numbers, such as those produced by government departments, with the aim of extracting some approximate truth from them. APROXIMATE TRUTH – or as near as we can get….
• Two types: descriptive statistics and inferential statistics.
• Descriptive statistics: numerical values such as the mean, median, and mode, which describe the chief features of a group of scores without regard to a larger population. We do not necessarily infer anything from the results.
• Inferential statistics: these do not just describe numbers. We use them to draw inferences (informed guesses) about causes and about situations where we have only gathered part of the information that exists. The part of the information we have is called a sample. The whole body of information from which it is taken is called the population. In a basic statistical test we would have two samples and would try to establish whether they are significantly different. One group has something the other does not, so the question is: are the two samples the same or different? If they differ, we hope to infer that the finding applies to the whole population.
• Looking at the two types in turn to see what goes on in them, starting with descriptive statistics. There are two types of summary. Pictorial summaries: frequency tables, histograms, pie charts, graphs, bar charts.
• Numerical Summaries mean, median, mode, variance, range, standard deviation. Bar Charts Picture How many represented in each bar You can see immediately what the difference Catagories You can use a bar chart (nominal (= name) data Histograms Picture Like a bar chart
• In order Ie patient satisfactory Make it easier Likert score. Put something Not categories but different levels….. Pie Charts See pitcture Represent a % Want to present it as a picture Descriptive stats presented in the best light……
• Frequency table: you start with a whole list of data, e.g. scores: 1, 3, 5, 2, 4, 3, 1, 4, 3, 2, 3, 3, 4, 2, 2, 3. The table then gives the number of times each score appeared in the data (see picture). This matters most when you are dealing with a thousand scores: you can do a lot more with the data, and you can see where the majority of scores lie.
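As a rough illustration of the counting step, the sketch below builds a frequency table from the scores listed in these notes. The variable names and the text-bar layout are assumptions made for the example, not part of the lecture.

```python
from collections import Counter

# Scores as listed in the notes above.
scores = [1, 3, 5, 2, 4, 3, 1, 4, 3, 2, 3, 3, 4, 2, 2, 3]

# Count how many times each score appears.
frequency = Counter(scores)

# Print one row per score, with a simple text bar to show the shape.
for score in sorted(frequency):
    count = frequency[score]
    print(f"score {score}: {count:2d} {'#' * count}")
```

Even with only 16 scores, the rows make it easy to see where most of the scores lie.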
• Frequency table: patient satisfaction with care (see picture).
• Representation of averages (central tendency): in everyday use, the average usually means adding up the values (say, ages) and dividing by the number of them. In research there are different ways to look at the middle of the data, so we use the term CENTRAL TENDENCY.
• 1. Mean (written as x with a bar over it, x̄): the arithmetic average, i.e. the sum of the scores divided by the total number of scores. Problem: it can be distorted by abnormally high or low scores, which can skew your results.
• 2. Median: the middle point in a set of data, i.e. the score halfway along when the scores are put in order. Problem: extreme scores shift it far less than they shift the mean, but it gives no indication of the frequency of occurrence, so it won't tell you what most people got.
• 3. Mode: the most frequently occurring score in a set of data. Problem: it gives no indication of the highest and lowest scores, i.e. the range of the data.
• So you need some combination of the three. The mean is the most used of these descriptive statistics.
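A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module and the scores from the frequency-table example in these notes. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Scores from the frequency-table example in these notes.
scores = [1, 3, 5, 2, 4, 3, 1, 4, 3, 2, 3, 3, 4, 2, 2, 3]

score_mean = mean(scores)      # sum of the scores divided by how many there are
score_median = median(scores)  # middle value once the scores are sorted
score_mode = mode(scores)      # most frequently occurring score

print(f"mean={score_mean}, median={score_median}, mode={score_mode}")
```

Here the three values sit close together, which is what you would hope for when the data are not badly skewed.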
• Picture 1 = normal distribution: the attribute is distributed so that the mean, mode, and median are all in the same place. Totally unskewed, the “perfect world”. If you have a normal distribution curve you can do lots of statistics with it.
• Picture 2 = positive skew: most people score about 20 and only a few score 100 (the mode comes early, the mean is pulled towards the upper end, and the median is in the middle).
• Picture 3 = negative skew: most people score high (low mean, high mode, median still in the middle).
• Example: calculate the averages for the list (values in £k). Mean = 12k. Median = 12.5k: the middle of the list falls between 12 and 13, so you take the score halfway between them. Mode = 12k. So they are fairly near each other.
• Adding in two values of £50k: mean = 19k, median = 13k, mode = 12k. The extra values increase the mean a lot and the median a little. It may seem very obvious where the middle is, but it may not be, so we need to know more than the mean. We use another form of measurement, the measurement of distribution: how far the other scores are spread around the mean.
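The effect of a couple of extreme values can be sketched as follows. The transcript does not include the lecture's actual list, so the salaries below are a hypothetical list chosen only so that its summary values land close to the figures quoted above.

```python
from statistics import mean, median, mode

# Hypothetical salaries in £k (NOT the lecture's list, which is not in the notes).
salaries = [5, 12, 12, 12, 13, 13, 14, 15]
with_outliers = salaries + [50, 50]  # add two salaries of £50k


def summarise(values):
    """Return the three measures of central tendency for a list of values."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


before = summarise(salaries)
after = summarise(with_outliers)

print("before:", before)
print("after: ", after)
```

In this sketch the mean jumps by several thousand pounds, the median moves only slightly, and the mode does not move at all.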
• Measurement of distribution Range: describes the difference between the highest & lowest score. Highlights extreme values only Either between top and bottom and top or the diffence in the score… Quartiles: A way of dividing the range into 4 equal parts. Each part will contain 25% of the data. Inter- quartile range The distance between the lower and upper quartile in
• the distribution curve, representing 50% of the data Measurement of distribution See picture Tells you where half the data lies… Can plot if your data can see where you sit.. Ie Centile charts… Mean weight where you are expected…. Do you fall within or without
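A small sketch of the range and inter-quartile range, again using the scores from the frequency-table example. It assumes Python 3.8 or later for `statistics.quantiles`. Different textbooks and packages use slightly different quartile conventions, so the exact quartile values can differ a little from software such as SPSS.

```python
from statistics import quantiles

# Scores from the frequency-table example in these notes.
scores = [1, 3, 5, 2, 4, 3, 1, 4, 3, 2, 3, 3, 4, 2, 2, 3]

# Range: difference between the highest and lowest score.
data_range = max(scores) - min(scores)

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = quantiles(scores, n=4)

# Inter-quartile range: distance between the lower and upper quartile.
iqr = q3 - q1

print(f"range={data_range}, lower quartile={q1}, upper quartile={q3}, IQR={iqr}")
```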
• Standard deviation (SD): the statistic most often used after the mean (x̄ = …, SD = …). It is a measure of how the scores deviate from the mean: are they bunched together or not? This is the most common way of showing distribution. Mean + SD together give probably the most accurate description of the data.
• In a normal distribution, the band from one SD below the mean to one SD above the mean includes approx 68% of the scores, and the band from two SD below to two SD above includes approx 95% of the scores (see diagram).
• There is a formula, but we don't need to work it out. It is calculated from the mean and the individual scores, and it shows how the scores are distributed and where they lie.
• Example: mean = 100, SD = 10. So 1 SD below = 90 and 1 SD above = 110. Therefore 68% of the scores lie between 90 and 110 (see the picture).
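The 68% and 95% figures can be checked with a short sketch, assuming a perfectly normal distribution with the mean of 100 and SD of 10 from the example above. It uses `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

mean_score, sd = 100, 10
dist = NormalDist(mu=mean_score, sigma=sd)


def share_within(k):
    """Return (low, high, share of scores) for the band mean +/- k SDs."""
    low, high = mean_score - k * sd, mean_score + k * sd
    return low, high, dist.cdf(high) - dist.cdf(low)


for k in (1, 2):
    low, high, share = share_within(k)
    print(f"within {k} SD: {low} to {high} holds {share:.1%} of the scores")
```

Real data are rarely perfectly normal, so these percentages are a guide rather than a guarantee.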
• You don't have to worry too much about how they get it. The researchers have done the calculations and come up with a figure for the standard deviation. The proportions are always the same: 68% of scores lie within 1 SD of the mean. The SD helps understanding of a set of data, but it applies to the sample that you have got. We want to leap from what we found in our sample to the wider population. Will they share the same thing? So we talk about confidence intervals.
• Confidence intervals (CI): you are sometimes given a CI. A confidence interval is a range or interval of values. In a normal distribution 95% of the area falls between + or - 1.96 SD from the mean. In the same way, about 95% of the time the population mean will fall within +/- 1.96 standard errors of a sample mean, where the standard error is the SD divided by the square root of the sample size. It is this range that is called a (95%) confidence interval. The narrower the CI, the more certain you can be about the true result. Small is good! Wide is uncertain.
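A rough sketch of a 95% confidence interval for a mean, using the scores from the frequency-table example as the sample. It assumes the normal approximation, where the interval is the sample mean plus or minus 1.96 standard errors and the standard error is the sample SD divided by the square root of the sample size. For a sample this small, statistical packages would normally use a slightly wider t-based interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Scores from the frequency-table example, treated here as a sample.
sample = [1, 3, 5, 2, 4, 3, 1, 4, 3, 2, 3, 3, 4, 2, 2, 3]

n = len(sample)
sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(n)  # sample SD / square root of sample size

# z value that leaves 2.5% in each tail of the normal curve (about 1.96).
z = NormalDist().inv_cdf(0.975)

ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean = {sample_mean:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

A larger sample gives a smaller standard error and so a narrower interval, which is why small studies tend to report wide CIs.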
• A wide interval means the result may not be translatable to the wider population.
• Levels of measurement: depending on what level of data you have got, you can use specific tests.
• Nominal: non-interval. Names or categories that just exist and cannot be put in order; something either is or is not in a category. Indicates ‘sameness’ or ‘difference’. Examples: male/female, blood groups, categories. You can do the least with this level: some things, but not much.
• Ordinal: a scale where the size of the intervals is not known or not equal. Measures ‘more’ or ‘less’. Examples: Good, Fair, Bad; staff grades etc. You can put the values in order but you can't be exact. There is an order but not a measurable level.
• Interval: a scale of equal intervals with no fixed zero. Example: temperature (ºC) & (ºF). You can do a lot more with this level. You have a scale and, although there is no fixed zero, there is a relationship between the values. Measurable. Comparable.
• Ratio: a scale of equal intervals with a fixed zero, i.e. as interval but with a zero. Examples: weight, time, length.
• So look at the type of data you are dealing with: does it call for descriptive statistics, like modes and medians, or for more complicated tests?
• Inferential statistics: inferring from the data.
• Parametric tests: powerful tests used on large, homogeneous, random, normally distributed samples (we like them). Used with interval and ratio level data, e.g. t-test, ANOVA, Pearson's r (correlation). We are not expected to know how each test works. What we need to know is: is it the right test? Has it been used appropriately?
• What kind of test? What kind of data?
• Non-parametric tests: used on small, random, non-normally distributed (SKEWED) samples. Used with nominal and ordinal level data, e.g. chi-square, Spearman rank order (correlation), Kendall tau, Mann-Whitney U. We don't have to work them out, but we are expected to know what they are and what they are used for.
• Probability (p): what is the probability of it happening by chance? Was it a fluke?
• Or did it happen because it is a real phenomenon? p is the likelihood that the event, situation, or pattern of numbers has occurred by chance. Was it to do with the IV (independent variable) or just chance? Is it applicable to the whole population? Is it generalisable? Is it SIGNIFICANT? Probability may be expressed as a ratio, percentage, or decimal: 50:50, 50%, 0.5. For example, 0.5 is the probability of tossing a coin and getting heads.
• Significance: we need a high degree of certainty that a result is not down to chance. If we want to be sure of the research, we want a higher level of certainty. Statistical tests are needed to determine how safe it is to attribute any difference obtained to a real difference in the phenomena as opposed to being due to chance. The level of significance is normally set at p = 0.01 or 0.05: ideally 0.01 (one in 100), with 0.05 the very minimum that is acceptable in research. Always ask: at what level have they set the p value?
• USUALLY 0.05. A significance level of 0.05 means that, if only chance were at work, a result like this would turn up no more than 5 times in 100. A result that passes is therefore taken as a REAL finding, a result of the IV and not just chance. But 5 in 100 may still be chance, so bear in mind that 5% chance of chance. You may want a higher level of confidence.
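To make "could it be a fluke?" concrete, here is a sketch built on the coin-toss idea from these notes. It works out the chance of getting at least 9 heads in 10 tosses of a fair coin and compares it with the two usual significance levels. The choice of 9 heads out of 10 and the function name are assumptions made for the example, and the calculation is one-sided (only "too many heads" counts as extreme).

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of seeing 9 or 10 heads in 10 tosses if the coin is fair.
p_value = prob_at_least(9, 10)
print(f"p = {p_value:.4f}")

# Compare against the two significance levels mentioned in the notes.
for alpha in (0.05, 0.01):
    verdict = "significant" if p_value <= alpha else "not significant"
    print(f"at the {alpha} level: {verdict}")
```

The same result counts as significant at 0.05 but not at 0.01, which is why it matters what level the researchers set in advance.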
• You may also be given an exact p value: p = 0.07 is not significant; p = 0.000000001 is significant. Sometimes we want far stricter levels of confidence, such as 0.0000000000000000001, like when going in a plane. Results are reported cautiously: “The result suggests that…”
• Correlation: is there a relationship between x and y? Draw a scatter diagram, then do a test. Is there a relationship?
• Correlation is the examination of relationships between variables. Statistically it is measured by the correlation coefficient, on a scale of +1 to -1. +1 is perfect positive correlation: the more of one, the more of the other. 0 is no association at all: nothing happens. -1 is perfect inverse correlation: the more of one, the less of the other. Absolute 1s are not really possible, but it would be nice. Over 0.75 (+ or -) is acceptable in some cases. Inter-rater reliability: use Pearson's.
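A minimal sketch of Pearson's correlation coefficient, written out by hand so the steps are visible. The function name and the small data sets are hypothetical and chosen only to show the +1 and -1 ends of the scale; in practice a package such as SPSS or Minitab would do this for you.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much each varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]               # hypothetical scores
more_the_more = [2, 4, 6, 8, 10]  # rises exactly in step with x
more_the_less = [10, 8, 6, 4, 2]  # falls exactly as x rises

print(pearson_r(x, more_the_more))
print(pearson_r(x, more_the_less))
```

Real data never line up this neatly, so values strictly between -1 and +1 are what you should expect to see in a paper.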
• Using a computer for statistical analysis: statistical analysis can be conducted manually or by computer. Advantages of computer software include: only having to input the data once; only needing to know which tests you want to conduct, not how to do them; no errors in calculation; and being able to recalculate, do different tests, and produce graphs, tables etc. once the input is carried out. SPSS and Minitab are both on the UoP system. These are not tests but computer programs: you type it all in and you get the results.
• The computer may not make a mistake, but you have to put the right things in. You still have to know which tests you need to do. It will do what you ask it, and different data need different tests.