
# T7 data analysis

Introduction to Data Analysis and descriptive measures

Published in: Education, Technology

### T7 data analysis

#### 1. Analyzing and interpreting data

By Rama Krishna Kompella

#### 2. Myths

- Complex analysis and big words impress people.
- Analysis comes at the end, when there is data to analyze.
- Qualitative analysis is easier than quantitative analysis.
- Data have their own meaning.
- Stating limitations weakens the evaluation.
- Computer analysis is always easier and better.

#### 3. Blind men and an elephant (Indian fable)

Things aren’t always what we think!

Six blind men go to observe an elephant. One feels the side and thinks the elephant is like a wall. One feels the tusk and thinks the elephant is like a spear. One touches the squirming trunk and thinks the elephant is like a snake. One feels the knee and thinks the elephant is like a tree. One touches the ear and thinks the elephant is like a fan. One grasps the tail and thinks it is like a rope. They argue long and loud, and though each was partly in the right, all were in the wrong.

For a detailed version of this fable, see: http://www.wordinfo.info/words/index/info/view_unit/1/?letter=B&spage=3

#### 4. Data analysis and interpretation

- Think about analysis EARLY
- Start with a plan
- Code, enter, clean
- Analyze
- Interpret
- Reflect
  - What did we learn?
  - What conclusions can we draw?
  - What are our recommendations?
  - What are the limitations of our analysis?

#### 5. Why do I need an analysis plan?

- To make sure that the questions and your data collection instrument will get the information you want
- Think about your “report” when you are designing your data collection instruments

#### 6. Do you want to report…

- the number of people who answered each question?
- how many people answered a, b, c, d?
- the percentage of respondents who answered a, b, c, d?
- the average number or score?
- the mid-point among a range of answers?
- a change in score between two points in time?
- how people compared?
- quotes and people’s own words?

#### 7. Common descriptive statistics

- Count (frequencies)
- Percentage
- Mean
- Mode
- Median
- Range
- Standard deviation
- Variance
- Ranking
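
As a rough illustration, every statistic in this list can be computed with Python's standard `statistics` and `collections` modules. The sketch below uses a hypothetical helper `describe()` and reuses the ten scores from the range example later in this deck. It uses the population (divide-by-N) versions of variance and standard deviation, matching the variance definition given later.

```python
import statistics
from collections import Counter


def describe(scores):
    """Hypothetical helper: compute the common descriptive statistics for a list of scores."""
    n = len(scores)
    counts = Counter(scores)  # frequency of each distinct score
    return {
        "count": n,
        "frequencies": dict(counts),
        # percentage of all observations taking each value
        "percentages": {value: 100 * c / n for value, c in counts.items()},
        "mean": statistics.mean(scores),
        "mode": statistics.mode(scores),
        "median": statistics.median(scores),
        "range": max(scores) - min(scores),
        # population formulas (divide by N)
        "variance": statistics.pvariance(scores),
        "std_dev": statistics.pstdev(scores),
        # distinct values ranked from most to least frequent
        "ranking": counts.most_common(),
    }


data = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]  # data set from the range example (slide 22)
summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```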

#### 8. Key components of a data analysis plan

- Purpose of the evaluation
- Questions
- What you hope to learn from the question
- Analysis technique
- How data will be presented

#### 9. Steps in processing data

- Preparation of raw data
- Editing
  - Field editing
  - Office editing
- Coding
  - Establishing appropriate categories
  - Making the categories mutually exclusive
- Tabulation
  - Sorting and counting
  - Summarizing the data
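
The editing, coding and tabulation steps can be sketched in Python as follows. The codebook (`CODEBOOK`), the helper names and the raw answers are all hypothetical and serve only to show each step. A real study would define its own categories.

```python
from collections import Counter

# Hypothetical codebook: each cleaned answer maps to exactly one code,
# so the categories are mutually exclusive.
CODEBOOK = {"agree": 1, "neutral": 2, "disagree": 3}


def edit(answer):
    """Editing step: normalise case and stray whitespace in a raw answer."""
    return answer.strip().lower()


def code(answer):
    """Coding step: map an edited answer to its numeric code (None if it fits no category)."""
    return CODEBOOK.get(edit(answer))


def tabulate(raw_answers):
    """Tabulation step: count the coded answers and sort them in codebook order."""
    counts = Counter(code(a) for a in raw_answers)
    # uncodable answers (None) are placed last
    return sorted(counts.items(), key=lambda item: (item[0] is None, item[0] or 0))


raw = ["Agree", "agree ", "Disagree", "neutral", "AGREE", "dont know"]
print(tabulate(raw))
```

The `None` row counts answers that fit no category ("dont know" here). Its presence shows that the codebook is not yet exhaustive.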

#### 10. Types of tabulation

- Simple or one-way tabulation
  - Question with only one response per respondent (percentages add up to 100)
  - Multiple responses to a question (percentages don't add up to 100)
- Cross tabulation or two-way tabulation
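
The sketch below builds the three kinds of table from hypothetical survey records, where `satisfied` allows one answer and `media` allows several.

```python
from collections import Counter

# Hypothetical survey records: one single-choice and one multiple-choice question.
respondents = [
    {"gender": "F", "satisfied": "yes", "media": ["tv", "radio"]},
    {"gender": "M", "satisfied": "no", "media": ["tv"]},
    {"gender": "F", "satisfied": "yes", "media": ["web", "tv"]},
    {"gender": "M", "satisfied": "yes", "media": ["radio"]},
]
n = len(respondents)

# One-way table, single response: percentages add up to 100.
single = Counter(r["satisfied"] for r in respondents)
single_pct = {k: 100 * v / n for k, v in single.items()}

# One-way table, multiple responses: percentages are per respondent,
# so their total can exceed 100.
multi = Counter(m for r in respondents for m in r["media"])
multi_pct = {k: 100 * v / n for k, v in multi.items()}

# Two-way (cross) tabulation: counts for each (gender, satisfied) pair.
cross = Counter((r["gender"], r["satisfied"]) for r in respondents)

print(single_pct)
print(multi_pct, sum(multi_pct.values()))
print(cross)
```

Because a respondent can pick more than one medium, the multiple-response percentages total 150 rather than 100.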

#### 11. Classification of data

- Number of groups
- Width of the class interval
- Exclusive categories
- Exhaustive categories
- Avoid extremes

#### 12. Frequency distribution tables

[Figure: frequency distribution table showing class lower and upper limits]

#### 13. Frequency distribution tables
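
One way to build a grouped frequency distribution table is sketched below. It assumes integer scores, classes of equal width, and a hypothetical helper `frequency_table()`. Each class includes its lower limit and excludes its upper limit, which keeps the classes mutually exclusive. The final check catches scores that fall outside every class, which would mean the classes are not exhaustive.

```python
def frequency_table(scores, lower, width, groups):
    """Hypothetical helper: count scores in `groups` classes of equal `width` starting at `lower`.

    Each class covers [lower limit, upper limit), so no score is counted twice.
    Raises ValueError if some scores fall outside all classes.
    """
    table = []
    for g in range(groups):
        lo = lower + g * width
        hi = lo + width
        freq = sum(1 for s in scores if lo <= s < hi)
        table.append((lo, hi, freq))
    covered = sum(freq for _, _, freq in table)
    if covered != len(scores):
        raise ValueError(f"{len(scores) - covered} score(s) fall outside the classes")
    return table


scores = [12, 15, 21, 22, 27, 30, 33, 38, 41, 45]  # hypothetical data
for lo, hi, freq in frequency_table(scores, lower=10, width=10, groups=4):
    # hi - 1 as the printed upper limit is only valid for whole-number scores
    print(f"{lo}-{hi - 1}: {freq}")
```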

#### 14. Measures of central tendency

- A measure of central tendency of a data set is a measure of the "middle" value of the data set.
- The mean, median and mode are all valid measures of central tendency.
- Under different conditions, however, some measures of central tendency are more appropriate than others.

#### 15. Mean

- The mean (or average) is the most popular and best-known measure of central tendency. It is the sum of the scores divided by the number of scores: $\bar{X} = \frac{\sum X}{N}$.
- It can be used with both discrete and continuous data, although it is most often used with continuous data.

#### 16. Median and mode

- The median is the middle score of a set of data that has been arranged in order of magnitude. When there is an even number of scores, it is the average of the two middle scores. The median is less affected by outliers and skewed data than the mean.
- The mode is the most frequent score in the data set. It corresponds to the highest bar in a bar chart or histogram, so the mode can sometimes be considered the most popular option.
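
The sketch below shows why the median is less affected by outliers. It uses hypothetical scores and compares the three measures before and after one extreme value is added.

```python
import statistics

scores = [3, 4, 4, 5, 6]       # hypothetical scores
with_outlier = scores + [60]   # the same scores plus one extreme value

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", round(statistics.mean(data), 2),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

The mean jumps from 4.4 to about 13.67. The median only moves from 4 to 4.5, and the mode stays at 4.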

#### 17. Skewed distributions

#### 18. Choosing the appropriate measure

| Type of variable | Best measure of central tendency |
|---|---|
| Nominal | Mode |
| Ordinal | Median |
| Interval/Ratio (not skewed) | Mean |
| Interval/Ratio (skewed) | Median |
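
The table above can be expressed as a simple lookup. This is only a sketch: the caller must state whether interval/ratio data are skewed, and both the helper `central_tendency()` and the level names are hypothetical. For ordinal data it uses `statistics.median_low`, an illustrative choice that always returns an observed category rather than an average of two categories.

```python
import statistics

# The table as a lookup: (type of variable, skewed?) -> best measure.
BEST_MEASURE = {
    ("nominal", False): "mode",
    ("ordinal", False): "median",
    ("interval/ratio", False): "mean",
    ("interval/ratio", True): "median",
}


def central_tendency(data, level, skewed=False):
    """Hypothetical helper returning (measure, value) for 'nominal', 'ordinal' or 'interval/ratio' data."""
    # skewness only changes the choice for interval/ratio data
    measure = BEST_MEASURE[(level, skewed and level == "interval/ratio")]
    if measure == "mode":
        return measure, statistics.mode(data)
    if measure == "median":
        if level == "ordinal":
            # median_low keeps the result an observed category
            return measure, statistics.median_low(data)
        return measure, statistics.median(data)
    return measure, statistics.mean(data)


print(central_tendency(["red", "blue", "red"], "nominal"))
print(central_tendency([2, 3, 3, 4, 40], "interval/ratio", skewed=True))
```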

#### 19. How to represent the results

- Graphics should be used whenever practical.
- Graphics commonly used to depict results are:
  - Bar charts
  - Line charts
  - Pie (round) charts

#### 20. Measures of dispersion

Measures of dispersion (also called variability or spread) indicate the extent to which the observed values are “spread out” around the center. In other words, they show how “far apart” observed values typically are from each other, and therefore from some average value (in particular, the mean).

#### 21. Measures of dispersion

- There are three main measures of dispersion:
  - The range
  - The semi-interquartile range (SIR)
  - Variance / standard deviation

#### 22. The range

- The range is defined as the difference between the largest score and the smallest score in the set of data: $X_L - X_S$.
- What is the range of the following data: 4 8 1 6 6 2 9 3 6 9?
- The largest score ($X_L$) is 9 and the smallest score ($X_S$) is 1, so the range is $X_L - X_S = 9 - 1 = 8$.

#### 23. When to use the range

- The range is used when
  - you have ordinal data, or
  - you are presenting your results to people with little or no knowledge of statistics.
- The range is rarely used in scientific work, as it is fairly insensitive:
  - It depends on only two scores in the set of data, $X_L$ and $X_S$.
  - Two very different sets of data can have the same range: 1 1 1 1 9 vs. 1 3 5 7 9.
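
Here is a short Python check of both range slides, using a hypothetical helper `value_range()`. The first call reproduces the range of 8 for the example data. The loop shows that `1 1 1 1 9` and `1 3 5 7 9` share a range of 8 even though their population standard deviations differ (3.2 vs. about 2.83).

```python
import statistics


def value_range(scores):
    """Range = largest score (X_L) minus smallest score (X_S)."""
    return max(scores) - min(scores)


print(value_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # example data from the range slide

a = [1, 1, 1, 1, 9]
b = [1, 3, 5, 7, 9]
# Same range, different spread as measured by the population standard deviation.
for data in (a, b):
    print(data, "range =", value_range(data), "sd =", round(statistics.pstdev(data), 2))
```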

#### 24. The semi-interquartile range

- The semi-interquartile range (SIR) is defined as the difference between the third and first quartiles, divided by two:
  - The first quartile ($Q_1$) is the 25th percentile.
  - The third quartile ($Q_3$) is the 75th percentile.

$$\text{SIR} = \frac{Q_3 - Q_1}{2}$$

#### 25. SIR example

- What is the SIR for the following data? 2, 4, 6, 8, 10, 12, 14, 20, 30, 60
- 25% of the scores are below 5 (between 4 and 6), so 5 is the first quartile (25th percentile).
- 25% of the scores are above 25 (between 20 and 30), so 25 is the third quartile (75th percentile).

$$\text{SIR} = \frac{Q_3 - Q_1}{2} = \frac{25 - 5}{2} = 10$$

#### 26. When to use the SIR

- The SIR is often used with skewed data, as it is insensitive to extreme scores.
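
The SIR can be sketched with `statistics.quantiles` (Python 3.8+), wrapped in a hypothetical helper `semi_interquartile_range()`. Quartile conventions differ, so software rarely reproduces hand-placed quartiles exactly. For the example data, the `"exclusive"` method gives an SIR of 8.5 and `"inclusive"` gives 6.0, rather than the 10 obtained from the hand-placed quartiles of 5 and 25. The last line illustrates why the SIR suits skewed data: replacing the extreme score 60 with 600 leaves it unchanged.

```python
import statistics


def semi_interquartile_range(scores, method="exclusive"):
    """SIR = (Q3 - Q1) / 2, with quartiles from statistics.quantiles.

    `method` ("exclusive" or "inclusive") selects the quartile convention,
    which can change the answer on small data sets.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method=method)
    return (q3 - q1) / 2


data = [2, 4, 6, 8, 10, 12, 14, 20, 30, 60]  # data from the SIR example
print(semi_interquartile_range(data))                # quartiles 5.5 and 22.5
print(semi_interquartile_range(data, "inclusive"))   # quartiles 6.5 and 18.5
print(semi_interquartile_range(data[:-1] + [600]))   # extreme score made more extreme
```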

#### 27. Mean deviation

The key concept for describing normal distributions and making predictions from them is called deviation from the mean.

We could simply calculate the average distance between each observation and the mean.

- We must take the absolute value of each distance; otherwise the distances would cancel out to zero!

Formula:

$$\text{MD} = \frac{\sum |\bar{X} - X_i|}{N}$$

#### 28. Mean deviation: an example

Data: $X = \{6, 10, 5, 4, 9, 8\}$, so $\bar{X} = 42 / 6 = 7$.

1. Compute $\bar{X}$ (the average).
2. Compute $\bar{X} - X_i$ and take the absolute value to get the absolute deviations.
3. Sum the absolute deviations.
4. Divide the sum of the absolute deviations by N.

| $\bar{X} - X_i$ | Absolute deviation |
|---|---|
| 7 − 6 | 1 |
| 7 − 10 | 3 |
| 7 − 5 | 2 |
| 7 − 4 | 3 |
| 7 − 9 | 2 |
| 7 − 8 | 1 |
| Total | 12 |

Mean deviation = 12 / 6 = 2
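
The four steps of this example translate directly into code. Below is a minimal sketch with a hypothetical `mean_deviation()` helper, applied to the example data. The final line shows why absolute values are needed: the signed deviations from the mean cancel out.

```python
def mean_deviation(scores):
    """Mean deviation: average absolute distance of each score from the mean."""
    n = len(scores)
    mean = sum(scores) / n                        # step 1: the average
    abs_devs = [abs(mean - x) for x in scores]    # step 2: absolute deviations
    return sum(abs_devs) / n                      # steps 3 and 4: sum, then divide by N


data = [6, 10, 5, 4, 9, 8]
print(mean_deviation(data))       # 12 / 6
print(sum(7 - x for x in data))   # signed deviations from the mean (7) sum to 0
```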

#### 29. What does it mean?

- On average, each observation is two units away from the mean.

Is it really that easy?

- No!
- Absolute values are difficult to manipulate algebraically.
- Absolute values cause enormous problems for calculus (the absolute value function is not differentiable at zero).
- We need something else…

#### 30. Variance

- Variance is defined as the average of the squared deviations:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

#### 31. What does the variance formula mean?

- First, it says to subtract the mean from each of the scores.
  - This difference is called a deviate or a deviation score.
  - The deviate tells us how far a given score is from the typical, or average, score.
  - Thus, the deviate is a measure of dispersion for a given score.

#### 32. What does the variance formula mean?

- Why can’t we simply take the average of the deviates? That is, why isn’t variance defined as:

$$\sigma^2 \neq \frac{\sum (X - \mu)}{N}$$

This is not the formula for variance!

#### 33. What does the variance formula mean?

- One of the definitions of the mean is that it always makes the sum of the scores minus the mean equal to 0.
- Thus, the average of the deviates must be 0, since the sum of the deviates must equal 0.
- To avoid this problem, statisticians square the deviate scores before averaging them.
  - Squaring the deviate scores makes all the squared scores positive (or zero).
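
The sketch below follows the definitional formula, using hypothetical helpers `deviates()` and `population_variance()`. It uses `fractions.Fraction` so that the sum of the deviates comes out as exactly 0 instead of a tiny floating-point remainder.

```python
from fractions import Fraction


def deviates(scores):
    """Deviation scores X - mu, kept exact with Fraction to avoid rounding noise."""
    mu = Fraction(sum(scores), len(scores))
    return [x - mu for x in scores]


def population_variance(scores):
    """sigma^2 = sum((X - mu)^2) / N: square the deviates, then average them."""
    devs = deviates(scores)
    return sum(d * d for d in devs) / len(devs)


data = [6, 10, 5, 4, 9, 8]
print(sum(deviates(data)))        # the plain deviates always sum to 0
print(population_variance(data))  # 28/6, printed as a reduced fraction
```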

#### 34. Computational formula

- When calculating variance, it is often easier to use a computational formula, which is algebraically equivalent to the definitional formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

- $\sigma^2$ is the population variance, $X$ is a score, $\mu$ is the population mean, and $N$ is the number of scores.

#### 35. Computational formula example

| X | X² | X − µ | (X − µ)² |
|---|---|---|---|
| 9 | 81 | 2 | 4 |
| 8 | 64 | 1 | 1 |
| 6 | 36 | −1 | 1 |
| 5 | 25 | −2 | 4 |
| 8 | 64 | 1 | 1 |
| 6 | 36 | −1 | 1 |
| Σ = 42 | Σ = 306 | Σ = 0 | Σ = 12 |
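
#### 36. Computational formula example

Definitional formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$$

Computational formula:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$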
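
The two formulas can be checked against each other in code. This is a sketch with hypothetical helper names; for the example scores both return 2.0.

```python
def variance_definitional(scores):
    """sigma^2 = sum((X - mu)^2) / N"""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n


def variance_computational(scores):
    """sigma^2 = (sum(X^2) - (sum(X))^2 / N) / N, using only the sums of X and X^2."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / n


data = [9, 8, 6, 5, 8, 6]
print(variance_definitional(data), variance_computational(data))  # (306 - 294) / 6 both ways
```

One caution applies to floating-point arithmetic. When the scores are large relative to their spread, the computational formula subtracts two large, nearly equal numbers and can lose precision. In software, the definitional (two-pass) form is usually the safer choice.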

#### 37. Variance of a sample

- Because the sample mean is not a perfect estimate of the population mean, the formula for the variance of a sample is slightly different from the formula for the variance of a population:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

- $s^2$ is the sample variance, $X$ is a score, $\bar{X}$ is the sample mean, and $N$ is the number of scores.
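
The sketch below compares the two divisors on the example scores from the computational formula slides. `sample_variance()` is a hypothetical helper, while `statistics.variance` (divides by N - 1) and `statistics.pvariance` (divides by N) are standard-library functions.

```python
import statistics


def sample_variance(scores):
    """s^2 = sum((X - mean)^2) / (N - 1)"""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


data = [9, 8, 6, 5, 8, 6]
print(sample_variance(data))       # 12 / 5
print(statistics.variance(data))   # standard-library sample variance (N - 1)
print(statistics.pvariance(data))  # population variance (N), for comparison
```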

#### 38. Homework

- The following are test scores from a class of 20 students:
  96 95 93 89 83 83 81 77 77 77 71 71 70 68 68 65 57 55 48 42
- Find the measures of central tendency and dispersion.
- What do you observe from the values of the measures of central tendency?

#### 39. Q & As