
Can we present this graphically?

Called a segmented bar chart

Unemployment Rates sheet

ExcelTutorial5_timeseriesgraph

Non-zero origins are a great way to lie

Very popular in government

Boy, that really changes your impression of the data and the underlying trend. The drop from 1992 to 1997 was 7%. Does this graph under or overstate a 7% change over this period?
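
The effect of a non-zero origin can be put into rough numbers. The sketch below compares the actual relative change in a series with the change a viewer sees when bar heights start at a truncated axis. The values (a fall from 100 to 93, an axis starting at 90) and the function name are hypothetical and chosen only to mirror a 7% drop; they are not taken from the graph in the slides.

```python
def actual_and_apparent_change(start, end, axis_origin=0.0):
    """Return (actual, apparent) relative change from start to end.

    actual   : change relative to the true value (origin at zero)
    apparent : change relative to the visible bar height above axis_origin
    """
    if start <= axis_origin:
        raise ValueError("start must lie above the axis origin")
    actual = (end - start) / start
    apparent = (end - start) / (start - axis_origin)
    return actual, apparent


# Hypothetical series: falls from 100 to 93, drawn on an axis that starts at 90.
actual, apparent = actual_and_apparent_change(100.0, 93.0, axis_origin=90.0)
print(f"actual change: {actual:.0%}, apparent change: {apparent:.0%}")
# prints "actual change: -7%, apparent change: -70%"
```

With these assumed numbers the bar loses 70% of its visible height although the data fell by only 7%, which is the kind of overstatement a truncated axis produces.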

No mode

For the first distribution we have 15.5 + 3.338 = 18.838 and 15.5 - 3.338 = 12.162.

Assuming this is how much restaurant patrons spend, this means that most of the patrons probably spend between $12.16 and $18.84.

In the second example, we have 15.5 + 0.926 = 16.43 and 15.5 - 0.926 = 14.57, which shows less spread in the data.

In the third example we have 15.5 + 4.57 = 20.07 and 15.5 - 4.57 = 10.93, which is the most spread.
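
The same arithmetic can be checked in a few lines. This is a minimal sketch that uses only the mean and the three standard deviations quoted in these notes; the labels "first", "second" and "third" and the helper name are placeholders of my own.

```python
MEAN = 15.5
STD_DEVS = {"first": 3.338, "second": 0.926, "third": 4.570}


def one_sd_interval(mean, s):
    """Return the interval (mean - s, mean + s)."""
    return (mean - s, mean + s)


intervals = {name: one_sd_interval(MEAN, s) for name, s in STD_DEVS.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

The wider the interval, the more spread there is around the common mean of 15.5.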

Excel 4 minutes

Food Expenditures 2

ExcelTutorial9_Dispersion.mp4

More often than not, we are interested in describing the relationship between variables

On Oct. 28 we learned about scatter plots as a graphical way to describe a relationship between two variables.

We also learned about cross tabs (also known as contingency tables) for nominal/ordinal variables

Let’s look a little more closely at measures of relationship for ratio-level data
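
As a rough illustration of such measures, the sketch below computes the sample covariance and the sample correlation coefficient by hand, using the usual n - 1 denominator. The data (`hours`, `score`) and all function names are hypothetical and exist only for this example.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return math.sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """Sample correlation coefficient r = Cov(x, y) / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical ratio-level data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
cov = sample_covariance(hours, score)
r = correlation(hours, score)
print(f"Cov = {cov:.2f}, r = {r:.3f}")

# A perfectly dependent but non-linear pair: y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(f"Cov for y = x^2: {sample_covariance(xs, ys):.2f}")
```

The second pair shows why a covariance of zero only rules out a linear relationship: y is fully determined by x, yet the covariance is 0.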

- 1. Week 11: Basic Descriptive Quantitative Data Analysis: Tables, Graphs, & Summary Statistics
- 2. Objectives: Learn about basic descriptive quantitative analysis. How to perform these tasks in Excel. Starting point for 502B. Excel knowledge and quantitative skills are highly desired by employers. EC stream.
- 3. Introduction: Without data, it is anyone’s opinion. Why use tables, graphs, summary stats? “At their best, tables, graphs, and statistics are instruments for reasoning about complex quantitative information.” Why learn how to design them appropriately? “At their worst, tables, graphs and summary statistics are instruments of evil used for deceiving a naive viewer.” Does your mindset match my dataset! http://www.ted.com/talks/hans_rosling_at_state.html
- 4. Quantitative Research Process
- 5. Introduction
- 6. Presenting the Data
- 7. Frequency Distribution: A convenient way of summarizing a lot of tabular data. What is a frequency distribution? A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which data fall within each class or category. For nominal/ordinal data.
- 8. Introduction
- 9. Table 1: Univariate Frequencies of Percentage of Sales Reported to Tax Authorities

  | % of Sales Reported | Frequency | Percent (%) |
  |---|---|---|
  | 100% | 3307 | 40.56 |
  | 90-99% | 1096 | 13.44 |
  | 80-89% | 916 | 11.24 |
  | 70-79% | 703 | 8.62 |
  | 60-69% | 501 | 6.14 |
  | 50-59% | 694 | 8.51 |
  | <50% | 936 | 11.48 |
  | Total | 8153 | 100 |

  Source: 1999 World Bank World Business Environment Survey (WBES), excludes missing observations. http://www.enterprisesurveys.org/
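
The percent row of Table 1 follows directly from the frequencies. A minimal sketch, using the counts from the table; the variable names are my own:

```python
# Frequencies from Table 1 (% of sales reported to tax authorities).
counts = {
    "100%": 3307,
    "90-99%": 1096,
    "80-89%": 916,
    "70-79%": 703,
    "60-69%": 501,
    "50-59%": 694,
    "<50%": 936,
}

total = sum(counts.values())

# Relative frequency of each class, in percent, rounded to two decimals.
percents = {cls: round(100 * n / total, 2) for cls, n in counts.items()}

for cls, n in counts.items():
    print(f"{cls:>7}  {n:>5}  {percents[cls]:>6.2f}")
print(f"{'Total':>7}  {total:>5}")
```

Dividing each class frequency by the total gives the relative frequency; the class with the largest count ("100%") is the modal category.
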
- 10. Contingency/Pivot/Cross Table: May also want to produce a table with more categories. Cross table, contingency table, or pivot table. Suitable if you have two nominal/ordinal variables. Simple extension to a univariate table. Considers the relationship between two variables. Row variable (dependent); column variable (independent).
- 11. Table 2: Percentage of Sales Reported to Tax Authorities by Region

  | % of Sales Reported | Africa | Transition Europe | Asia | Latin America | OECD | Former Soviet Countries | Total |
  |---|---|---|---|---|---|---|---|
  | 100% | 490 | 554 | 416 | 794 | 446 | 607 | 3,307 |
  | 90-99% | 266 | 196 | 142 | 119 | 145 | 228 | 1,096 |
  | 80-89% | 158 | 152 | 117 | 192 | 73 | 224 | 916 |
  | 70-79% | 162 | 117 | 103 | 153 | 43 | 125 | 703 |
  | 60-69% | 140 | 69 | 70 | 115 | 22 | 85 | 501 |
  | 50-59% | 140 | 105 | 141 | 118 | 16 | 174 | 694 |
  | <50% | 100 | 106 | 283 | 296 | 25 | 126 | 936 |
  | Total | 1,456 | 1,299 | 1,272 | 1,787 | 770 | 1,569 | 8,153 |

  Source: 1999 World Bank World Business Environment Survey (WBES). Excludes missing observations.
- 12. Features of a Table: Title that accurately summarizes the data: simple, indicates major variables and time frame (if applicable). Source: data set or origin of table. Explanatory footnotes. Easy to read & separated from text. Properly formatted for style (see APA rules). Necessary to advance analysis. See Module 7 for APA Table Checklist. Reproduced from APA manual.
- 13. Presenting the Data
- 14. Bar Graph: Often used to describe categorical data (ordinal/nominal). Draws attention to the frequency of each category.
- 15. Table 1: Univariate Frequencies of Percentage of Sales Reported to Tax Authorities

  | % of Sales Reported | Frequency | Percent (%) |
  |---|---|---|
  | 100% | 3307 | 40.56 |
  | 90-99% | 1096 | 13.44 |
  | 80-89% | 916 | 11.24 |
  | 70-79% | 703 | 8.62 |
  | 60-69% | 501 | 6.14 |
  | 50-59% | 694 | 8.51 |
  | <50% | 936 | 11.48 |
  | Total | 8153 | 100 |

  Source: 1999 World Bank World Business Environment Survey (WBES), excludes missing observations. http://www.enterprisesurveys.org/
- 16. Bar Graph: Figure 1. Percentage of sales reported to tax authority. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 17. Relative Frequency Polygon
- 18. Pie Graph: Emphasizes the proportion of each category. Something that may be good for our tax evasion data. Circle represents the total. Segments represent the shares of the total. Segment size is proportional to frequency.
- 19. Pie Graph: Figure 1. Percentage of sales reported to tax authority. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 20. Pie Graph: Figure 1. Percentage of sales reported to tax authority. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 21. Pie Graph: Figure 1. Percentage of sales reported to tax authority. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 22. Charts in Excel I
- 23. Table 2: Percentage of Sales Reported to Tax Authorities by Region

  | % of Sales Reported | Africa | Transition Europe | Asia | Latin America | OECD | Former Soviet Countries | Total |
  |---|---|---|---|---|---|---|---|
  | 100% | 490 | 554 | 416 | 794 | 446 | 607 | 3,307 |
  | 90-99% | 266 | 196 | 142 | 119 | 145 | 228 | 1,096 |
  | 80-89% | 158 | 152 | 117 | 192 | 73 | 224 | 916 |
  | 70-79% | 162 | 117 | 103 | 153 | 43 | 125 | 703 |
  | 60-69% | 140 | 69 | 70 | 115 | 22 | 85 | 501 |
  | 50-59% | 140 | 105 | 141 | 118 | 16 | 174 | 694 |
  | <50% | 100 | 106 | 283 | 296 | 25 | 126 | 936 |
  | Total | 1,456 | 1,299 | 1,272 | 1,787 | 770 | 1,569 | 8,153 |
- 24. Bar Graph: Figure 1. Percentage of sales reported to tax authority. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 25. Segmented Bar Chart: Figure 1. Percentage of sales reported to tax authority. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 26. Pie Graph: Figure 2. Percentage of sales reported to tax authority by region. Source: 1999 World Bank World Business Environment Survey (WBES). Note. Excludes missing observations. n = 8314
- 27. Vertical Bar Chart
- 28. Charts in Excel II
- 29. Time Series Graph: Time series are often used in social sciences. Data collected at various time periods: daily, weekly, monthly, quarterly, annually, etc. Examples include GDP, unemployment, university tuition. Plot the series of interest over time. Let’s look at a graph of the unemployment rate by gender and age.
- 30. Line Graph
- 31. Histogram: Used for continuous data. Frequency distribution for continuous data. Summary graph showing a count of the data points falling in various ranges. Rough approximation of the distribution of the data. A histogram is a way to summarize data. The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data.
- 32. Histogram
- 33. Scatter Graphs: Graph the relationship between two continuous variables.
- 34. Scatter Graph
- 35. Principles of Graphical Excellence: Well-designed presentation of interesting data. Substance & design. Simplicity of design, complexity of data. Proportion and balance. Clear, precise, efficient. Know what you are trying to show (have a story) and make sure your graph shows it. Well formatted, professional. Choose a format that reflects your data and the story. Informative and legible axes. Fully labelled & legible. Gets across main point(s) in the shortest time with the least ink in the smallest space. Adds information not otherwise available to the reader, but supplemented with text describing the figure. Tells the truth about the data. Limits complexity and confusion. Avoid chart junk.
- 36. Examples of Chartjunk: [Figure: sample charts with quarterly categories (1st Qtr to 4th Qtr) and regional series (West, North, Northeast, Southwest, Mexico, Europe, Japan, East, South, International)]
- 37. Examples of Chartjunk: Gridlines! Vibration. Pointless fake 3-D effects. Filled “floor”. Clip art. In or out? Filled “walls”. Borders and fills galore. Unintentional heavy or double lines. Filled labels. Serif font with thin & thick lines.
- 38. Displaying Data: “Mistakes”. Graphs are also instruments of evil used for deceiving a naive viewer. Non-zero origin. Omitting data that refutes your “evidence”. Limiting scope of data.
- 39. What is Wrong with this Graph? Provincial Personal Income Taxes: single individual with $45,000 in income claiming basic personal tax credits.
- 40. The Real Story
- 41. Exaggerates a change in data. Source: Statistics Canada, CANSIM II, V31215364
- 42. Dr. Kendall
- 43. Worst Recession Since the Depression (?)
- 44. Presenting the Data
- 45. Describing Data Numerically: Central Tendency (arithmetic mean, median, mode). Variation (range, variance, standard deviation). Association (covariance, correlation). Shape of the Distribution.
- 46. Mode: A measure of central tendency. Value that occurs most often. Not affected by extreme values. Used for either numerical or categorical data. There may be no mode or several modes. What are the modes for the displayed data? [Figure: two dot plots, one on a 0 to 14 axis and one on a 0 to 6 axis]
- 47. Mode: A measure of central tendency. Value that occurs most often. Not affected by extreme values. Used for either numerical or categorical data. There may be no mode. There may be several modes. [Figure: dot plot on a 0 to 14 axis, Mode = 9; dot plot on a 0 to 6 axis, No Mode]
- 48. Mode: There may be several modes. [Figure: dot plot on a 0 to 14 axis, Mode = 5 & 9]
- 49. Mode: Caution: the mode may not be representative of the data, e.g. {0.1, 0.1, 5000, 4900, 4500, 5200,…}
- 50. Median: In an ordered list, the median is the “middle” number (50% above, 50% below). [Figure: two dot plots on a 0 to 10 axis]
- 51. Mean: The “balancing point” (centre of gravity) of the data. E.g. the data “balances” at 5. [Figure: number line from 1 to 9 with deviations of -2, -1 and +3 around the balancing point]
- 52. Arithmetic Mean: The arithmetic mean (mean) is the most common measure of central tendency. Calculated by summing the observed values and dividing by the number of observations. For a sample of size n: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where the $x_i$ are the observed values and $n$ is the number of observations.
- 53. Arithmetic Mean: The most common measure of central tendency. Mean = sum of values divided by the number of values. Affected by extreme values (outliers). What is the mean for these examples? [Figure: two dot plots on a 0 to 10 axis]
- 54. Arithmetic Mean: The most common measure of central tendency. Mean = sum of values divided by the number of values. Affected by extreme values (outliers). First example: $\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3$, so Mean = 3. Second example: $\frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$, so Mean = 4.
- 55. Measures of Central Tendency: Overview. Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. Median: midpoint of ranked values (50% below, 50% above). Mode: most frequently observed value.
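
The three measures can be compared on the two small examples from the mean slides (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10). The sketch below uses Python's standard `statistics` module for the mean and median. The `modes` helper and the two extra data lists are my own illustration; the helper returns an empty list when every value is equally frequent, which is how the slides use "no mode".

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent value(s); empty list if all are equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # treated as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 10]  # same data with one extreme value

print(mean(a), median(a))  # 3 3
print(mean(b), median(b))  # 4 3

# Hypothetical data for the mode cases.
print(modes([1, 5, 5, 5, 7, 9, 9, 9]))  # [5, 9]
print(modes([0, 1, 2, 3, 4, 5, 6]))     # []
```

The single extreme value moves the mean from 3 to 4 but leaves the median at 3, which is the sense in which the mean is affected by outliers.
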
- 56. The “Shape of a Distribution”: Use information on mean, median, and mode to “visualize” the data. A data distribution is said to be symmetric if its shape is the same on both sides of the median. Symmetry implies that median = arithmetic mean. If a distribution is unimodal and symmetric, then median = mean = mode.
- 57. The “Shape of a Distribution”: [Figure: unimodal histogram of # of obs. by value, with the median splitting the data 50%/50%] Symmetric: median = mean. Symmetric & unimodal: median = mean = mode.
- 58. The “Shape of a Distribution”: [Figure: bimodal histogram of # of obs. by value, with the median splitting the data 50%/50%] Symmetric: median = mean. Symmetric & bimodal: median = mean ≠ mode.
- 59. The “Shape of a Distribution”: [Figure: flat histogram of # of obs. by value, with the median splitting the data 50%/50%] Symmetric: median = mean. Mode? Symmetric & no mode: median = mean (uniform).
- 60. The “Shape of a Distribution”: An asymmetric distribution is said to be skewed: (1) negatively if mean < median < mode; (2) positively if mean > median > mode. Hence, by comparing our measures of central tendency, we can start to visualize the shape and characteristics of the data.
- 61. The “Shape of a Distribution”: [Figure: histogram with mode = 2, median = 3 (50%/50% split), mean = 3.2] Mode < median < mean = positively skewed distribution.
- 62. Example: Positively skewed variable. The Distribution of After-Tax Income shows the distribution of income across all Canadian households.
- 63. Example: Positively skewed variable. The mode income is the most common income and was in the range from $15,000 to $19,999. The median income is the level of income that separates the population into two groups of equal size and was $39,700. The mean income is the average income and was $48,400.
- 64. Example: Positively skewed variable. A distribution in which the mean exceeds the median and the median exceeds the mode is positively skewed, which means it has a long tail of high values. The distribution of income in Canada is positively skewed. The median is more likely to be reported than the mean, since the long tail distorts the average.
- 65. Example: Positively skewed variable. Volunteer hours. Charitable contributions. # of cigarette packs smoked (excluding 0). Collective bargaining agreement duration (in years). # of beers consumed on a Saturday night. Duration of low income (in years). Number of children.
- 66. The “Shape of a Distribution”: [Figure: histogram with mode = 6, median = 5 (50%/50% split), mean = 4.7] Mean < median < mode = negatively skewed distribution.
- 67. Examples: University grades. Age. Years in school. Etc.
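
The comparison rule for skewness can be written as a small function. This is only a sketch of the rule of thumb stated in the slides (mean > median > mode for positive skew, the reverse for negative skew); the function name and return labels are my own, and the income mode of 17,500 is an assumed midpoint of the $15,000 to $19,999 range.

```python
def skew_direction(mean, median, mode):
    """Classify skew by comparing the three measures of central tendency."""
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    if mean == median == mode:
        return "none (consistent with symmetric, unimodal)"
    return "undetermined"


print(skew_direction(3.2, 3, 2))            # positive
print(skew_direction(4.7, 5, 6))            # negative
print(skew_direction(48400, 39700, 17500))  # positive (mode is an assumed midpoint)
```

Any ordering that matches neither pattern is reported as undetermined rather than forced into a category.
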
- 68. Describing Data Numerically: Central Tendency (arithmetic mean, median, mode). Variation (range, variance, standard deviation). Association (covariance, correlation). Shape of the Distribution.
- 69. Measures of Dispersion/Variability: Same center, different variation. Variation: range, variance, standard deviation. Measures of variation give information on the spread or variability of the data values.
- 70. Range: Simplest measure of variation. Difference between the largest and the smallest observations: Range = X_largest – X_smallest. Example: [Figure: dot plot on a 0 to 14 axis]
- 71. Range: Simplest measure of variation. Difference between the largest and the smallest observations: Range = X_largest – X_smallest. Example: [Figure: dot plot on a 0 to 14 axis] Range = 14 - 1 = 13.
- 72. The Range: Problem: ignores all but two data points. These values may be “outliers” (i.e. not representative).
- 73. Disadvantages of the Range: Ignores the way in which data are distributed: [Figure: two differently shaped dot plots on a 7 to 12 axis] both have Range = 12 - 7 = 5. Sensitive to outliers: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, while 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119.
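
The outlier example on the range slide is easy to reproduce. A minimal sketch with the two data sets from the slide; the helper name is my own:

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then 4 and 5 (25 observations).
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]

# Same data with the last observation replaced by an outlier of 120.
with_outlier = base[:-1] + [120]

print(value_range(base))          # 4
print(value_range(with_outlier))  # 119
```

Changing one observation out of 25 moves the range from 4 to 119, because the range looks only at the two most extreme values.
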
- 74. The Variance: A single summary measure of dispersion would be more helpful. Takes account of all data values.
- 75. The Variance: 1. Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. 2. Standard deviation: $s = \sqrt{s^2} = \sqrt{\text{variance}}$.
- 76. Measuring variation: [Figure: two distributions] Small standard deviation. Large standard deviation.
- 77. Comparing Standard Deviations: [Figure: three dot plots on an 11 to 21 axis] Data A: Mean = 15.5, s = 3.338. Data B: Mean = 15.5, s = 0.926. Data C: Mean = 15.5, s = 4.570.
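
For reference, the sample variance and standard deviation can be computed from scratch with the usual n - 1 denominator and checked against Python's `statistics` module. The data below are hypothetical (picked so that the mean is 15.5) and are not the data behind the plots in the slides.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample with mean 15.5.
data = [12, 14, 15, 16, 18, 18]

s2 = sample_variance(data)
s = math.sqrt(s2)
print(f"variance = {s2:.3f}, standard deviation = {s:.3f}")
# prints "variance = 5.500, standard deviation = 2.345"
```

`statistics.variance` and `statistics.stdev` use the same n - 1 convention, so they should agree with the hand calculation.
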
- 78. Describing Data Numerically: Central Tendency (arithmetic mean, median, mode). Variation (range, variance, standard deviation). Association (covariance, correlation). Shape of the Distribution.
- 79. The Sample Covariance: The covariance measures the strength of the linear relationship between two variables. The sample covariance: $\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$. Only concerned with the strength of the relationship. No causal effect is implied.
- 80. Interpreting Covariance: Covariance between two variables: Cov(x,y) > 0: x and y tend to move in the same direction. Cov(x,y) < 0: x and y tend to move in opposite directions. Cov(x,y) = 0: x and y have no linear relationship (this alone does not establish that they are independent).
- 81. Coefficient of Correlation: Measures the relative strength of the linear relationship between two variables. Sample correlation coefficient: $r = \frac{\mathrm{Cov}(x, y)}{s_X s_Y}$.
- 82. Features of Correlation Coefficient, r: Unit free. Ranges between –1 and 1. The closer to –1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.
- 83. Interpreting the Correlation Coefficient, r
- 84. Scatter Plots of Data with Various Correlation Coefficients: [Figure: six scatter plots of Y against X] r = -1 (Cov < 0), r = -.6 (Cov < 0), r = 0 (Cov = 0), r = +.3, r = +1, r = 0.
- 85. 502B
- 86. Fun with Graphs: Does your mindset match my dataset! http://www.ted.com/talks/hans_rosling_at_state.html
- 87. Looking ahead: SRs to client (cc) and Turnitin on Wednesday by noon. No class next week. Work on 598 critiques. 598 critiques due in class & Turnitin Nov. 30. Comments on your SRs will be ready Nov. 30. Final SRs (if required) due Dec. 8 @ 11:55 PM PST. Note carefully the requirements. Moodle site will be inaccessible sometime in December. Final grades reported via usource once approved by the Director.
