defining descriptive statistics required for business statistics


- 1. Descriptive Statistics
- 2. Descriptive Statistics
  - Tabular, graphical, or numerical summaries of data.

  | Age | Value |
  |---|---|
  | Mean | 42.57 |
  | Median | 40 |
  | Mode | 40 |
  | Standard Deviation | 10.63 |
  | Sample Variance | 113.01 |
  | Range | 44 |
  | Minimum | 21 |
  | Maximum | 65 |

  | | Frequency |
  |---|---|
  | Female | 12 |
  | Male | 18 |
  | Grand Total | 30 |

  [Figure: Bar Chart for Opinions. x-axis: Opinion (1 to 5); y-axis: Frequency.]
- 3. Summarizing Data for Categorical Variables
  - Let us focus on tabular and graphical summaries first. We will deal with numerical summaries later.
  - Tabular:
    - Frequency distribution
    - Relative frequency distribution
    - Percent frequency distribution
  - Graphical:
    - Bar chart
    - Pie chart
- 4. Frequency Distribution
  - A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several non-overlapping categories or classes.

  | Opinion | Frequency |
  |---|---|
  | Strongly disagree | 8 |
  | Disagree | 4 |
  | Neutral | 6 |
  | Agree | 7 |
  | Strongly agree | 5 |
  | Grand Total | 30 |
- 5. Relative Frequency Distribution

  $$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{\text{Total number of observations}}$$

  $$\text{Percent frequency of a class} = \frac{\text{Frequency of the class}}{\text{Total number of observations}} \times 100\%$$
- 6.

  | Opinion | Frequency | Relative frequency | Percent frequency |
  |---|---|---|---|
  | Strongly disagree | 8 | 0.27 | 27% |
  | Disagree | 4 | 0.13 | 13% |
  | Neutral | 6 | 0.20 | 20% |
  | Agree | 7 | 0.23 | 23% |
  | Strongly agree | 5 | 0.17 | 17% |
  | Grand Total | 30 | 1.00 | 100% |
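
The relative and percent frequency columns can be reproduced from the raw counts. The following is a minimal sketch, assuming the opinion counts from the frequency table; the names `frequency` and `relative_frequencies` are illustrative and do not come from the slides.

```python
# Opinion counts copied from the frequency distribution slide.
frequency = {
    "Strongly disagree": 8,
    "Disagree": 4,
    "Neutral": 6,
    "Agree": 7,
    "Strongly agree": 5,
}


def relative_frequencies(counts):
    """Return {class: frequency of the class / total number of observations}."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


relative = relative_frequencies(frequency)
# Percent frequency is the relative frequency multiplied by 100.
percent = {label: 100 * value for label, value in relative.items()}

for label, count in frequency.items():
    print(f"{label:<18} {count:>3} {relative[label]:.2f} {percent[label]:.0f}%")
```

The relative frequencies always sum to 1, which is a quick sanity check on any hand-built table.
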
- 7. Bar Chart
  [Figure: Number of people in each age category. Categories: Elderly, Middle-aged, Young; axes: Age category and Frequency.]
- 8. Pie Chart
  [Figure: Age distribution of people. Slices: Elderly, Middle-aged, Young.]
- 9. Summarizing Data for Quantitative Variables
  - Let us focus on tabular and graphical summaries first. We will deal with numerical summaries later.
  - Tabular:
    - Frequency distribution
    - Relative frequency distribution
    - Percent frequency distribution
  - Graphical:
    - Histogram
- 10. Frequency Distribution
  - We need to bin/bucket the quantitative variable of interest.
  - Three steps:
    1. Determine the number of non-overlapping classes.
    2. Determine the width of each class.
    3. Determine the class limits.
  - Choosing the number of classes is tricky! It is done by trial and error.
  - Five to twenty classes are preferred. (Not too few, not too many, just enough to informatively show the variation in the frequencies.)
- 11. Frequency Distribution

  $$\text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$$
- 12. Frequency Distribution

  | Class | Frequency |
  |---|---|
  | [31000, 35200] | 1 |
  | (35200, 39400] | 3 |
  | (39400, 43600] | 2 |
  | (43600, 47800] | 7 |
  | (47800, 52000] | 3 |
  | (52000, 56200] | 4 |
  | (56200, 60400] | 3 |
  | (60400, 64600] | 4 |
  | (64600, 68800] | 1 |
  | (68800, 73000] | 0 |
  | (73000, 77200] | 0 |
  | (77200, 81400] | 2 |
- 13. Relative/Percent Frequency

  | Class | Frequency | Rel. Freq. | Perc. Freq. (%) |
  |---|---|---|---|
  | [31000, 35200] | 1 | 0.033 | 3.33 |
  | (35200, 39400] | 3 | 0.100 | 10.00 |
  | (39400, 43600] | 2 | 0.067 | 6.67 |
  | (43600, 47800] | 7 | 0.233 | 23.33 |
  | (47800, 52000] | 3 | 0.100 | 10.00 |
  | (52000, 56200] | 4 | 0.133 | 13.33 |
  | (56200, 60400] | 3 | 0.100 | 10.00 |
  | (60400, 64600] | 4 | 0.133 | 13.33 |
  | (64600, 68800] | 1 | 0.033 | 3.33 |
  | (68800, 73000] | 0 | 0.000 | 0.00 |
  | (73000, 77200] | 0 | 0.000 | 0.00 |
  | (77200, 81400] | 2 | 0.067 | 6.67 |
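
The three binning steps can be sketched in code. This sketch assumes the smallest and largest data values are 31000 and 81400 and that 12 classes were chosen, which is what the class limits in the table suggest; the raw data are not given in the slides, so the `values` list is purely hypothetical.

```python
import math


def class_limits(smallest, largest, number_of_classes):
    """Return the class width and a list of (lower, upper) class limits."""
    width = (largest - smallest) / number_of_classes
    edges = [smallest + k * width for k in range(number_of_classes + 1)]
    return width, list(zip(edges[:-1], edges[1:]))


def frequency_distribution(values, smallest, width, number_of_classes):
    """Count values per class; classes are (lower, upper], and the first
    class also includes its lower limit, as in the table."""
    counts = [0] * number_of_classes
    for value in values:
        index = max(0, math.ceil((value - smallest) / width) - 1)
        counts[index] += 1
    return counts


width, limits = class_limits(31000, 81400, 12)

# Hypothetical observations, chosen only to exercise the class boundaries.
values = [31000, 35200, 35201, 47000, 81400]
counts = frequency_distribution(values, 31000, width, 12)
```

A value that sits exactly on a boundary, such as 35200, is counted in the class that ends there, which matches the `(lower, upper]` notation used in the table.
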
- 14. Histogram
- 15. Skewness
  - To which side is the tail of the distribution longer or more drawn out?
  - Positive/right skew: the tail is longer on the right.
  - Negative/left skew: the tail is longer on the left.
  - Zero skewness means a symmetric distribution.
- 16. Skewness
- 17. Summarizing Data for Two Categorical Variables
  - Tabular:
    - Crosstabulation
  - Graphical:
    - Side-by-side bar chart
    - Stacked bar chart
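- 18. Crosstabulation

  | | Strongly agree | Agree | Neutral | Disagree | Strongly disagree | Grand Total |
  |---|---|---|---|---|---|---|
  | Elderly | 0 | 0 | 0 | 0 | 3 | 3 |
  | Middle-aged | 4 | 6 | 5 | 2 | 4 | 21 |
  | Young | 1 | 1 | 1 | 2 | 1 | 6 |
  | Grand Total | 5 | 7 | 6 | 4 | 8 | 30 |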
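
A crosstabulation is a count of observations for every pair of categories. As a rough sketch, the cell counts from the slide are expanded below into one (age category, opinion) record per respondent and then tallied again; the record layout and variable names are assumptions made for illustration.

```python
from collections import Counter

opinions = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]

# Cell counts copied from the crosstabulation slide, one row per age category.
table = {
    "Elderly": [0, 0, 0, 0, 3],
    "Middle-aged": [4, 6, 5, 2, 4],
    "Young": [1, 1, 1, 2, 1],
}

# Expand the table into one (age category, opinion) record per respondent.
records = [
    (age, opinion)
    for age, counts in table.items()
    for opinion, count in zip(opinions, counts)
    for _ in range(count)
]

# Tally the pairs (the crosstabulation) and the two margins.
crosstab = Counter(records)
row_totals = Counter(age for age, _ in records)
column_totals = Counter(opinion for _, opinion in records)
```

The column totals recover the one-variable frequency distribution of opinions, and the row totals give the distribution of age categories.
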
- 19. Side-by-side Bar Chart
  [Figure: Opinions vs. age categories. x-axis: Opinions (Strongly agree to Strongly disagree); y-axis: Frequency; series: Elderly, Middle-aged, Young.]
- 20. Stacked Bar Chart
  [Figure: Opinions vs. age categories. x-axis: Opinions (Strongly agree to Strongly disagree); y-axis: Percentage (0% to 100%); series: Elderly, Middle-aged, Young.]
- 21. Scatterplot: Visualizing the Relationship Between Two Quantitative Variables
  [Figure: Salary vs. Age. x-axis: Age (years); y-axis: Salary ($).]
- 22. Creating Effective Graphical Displays
  - Give the display a clear and concise title.
  - Keep the display simple.
  - Clearly label each axis and provide the units of measure.
  - If colors are used, make sure they are distinct.
  - If multiple colors or line types are used, provide a legend.
- 23. Statistical Inference (Recap)
  - Population → (draw) → Sample
  - Population parameter, e.g., population average income $\mu$
  - Sample statistic, e.g., sample average income $\bar{x}$
  - Sample statistic → (infer) → Population parameter
  - A sample statistic is a point estimator of the corresponding population parameter.
- 24. Descriptive Statistics: Numerical Measures
  - Measures of location:
    - Measures of central location: (A single number which indicates a typical value of the data.)
      - Sample mean
      - Sample median
      - Sample mode
    - Sample percentiles
    - Sample quartiles
  - Measures of variability: (A single number which indicates the variability in the data.)
    - Sample range
    - Sample IQR
    - Sample variance
    - Sample standard deviation
  - Measures of distribution shape: (A single number which lets us know the shape of the distribution of the data.)
    - Skewness
    - Kurtosis
- 25. Some Common Notation
  - Let $x$ represent a variable of interest.
  - Let $n$ be the number of observations in the sample. This is the sample size.
  - Let $x_i$ be the $i$th observation.
  - Let $N$ be the number of observations in the population. This is the size of the population.
- 26. Measures of Location
  - Measures of central location: (A single number which indicates a typical value of the data.)
    - Sample mean
    - Sample median
    - Sample mode
  - Sample percentiles
  - Sample quartiles
- 27. Sample Mean
  - Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
  - Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
- 28. Sample Median
  - The median of a data set is the value in the middle when the data items are arranged in ascending order.
  - The median divides the dataset into two parts, each with approximately 50% of the observations.
  - Arrange the data in ascending order (smallest value to largest value).
  - For an odd number of observations, the median is the middle value.
  - For an even number of observations, the median is the average of the two middle values.
- 29. Sample Mode
  - The mode of a data set is the value that occurs with the greatest frequency.
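
The three measures of central location can be checked with Python's standard `statistics` module. This is a small sketch on a hypothetical list of ages (the slides do not provide the raw data).

```python
import statistics

# Hypothetical ages; not taken from the slides.
ages = [21, 34, 40, 40, 45, 52, 65]

sample_mean = statistics.mean(ages)      # sum of the observations divided by n
sample_median = statistics.median(ages)  # middle value of the sorted data (n is odd)
sample_mode = statistics.mode(ages)      # value that occurs most often

# With an even number of observations, the median is the average
# of the two middle values.
even_median = statistics.median([21, 34, 40, 45])
```

Here the median and the mode are both 40, while the mean is pulled slightly above 40 by the largest value, 65.
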
- 30. Sample Percentile
  - The $p$th percentile is a value such that at least $p$ percent of the observations are less than or equal to this value and at least $(100 - p)$ percent of the observations are greater than or equal to this value.
- 31. Sample Percentile
  - Arrange the data in ascending order.
  - Location of the $p$th percentile: $L_p = \dfrac{p}{100}(n + 1)$
- 32. Sample Quartiles
  - The quartiles divide the dataset into four parts, each with approximately 25% of the observations.
  - First quartile $Q_1$ = 25th percentile
  - Second quartile $Q_2$ = 50th percentile
  - Third quartile $Q_3$ = 75th percentile
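
The location formula $L_p = \frac{p}{100}(n + 1)$ gives a position in the sorted data, which is not always a whole number. The sketch below assumes linear interpolation between the two neighbouring observations when the position is fractional, and clamps the position to the range 1 to $n$; the slides state only the location formula, so both choices are assumptions. The data are hypothetical.

```python
import math


def percentile(data, p):
    """p-th percentile using the location L_p = p/100 * (n + 1).

    Assumption: fractional locations are resolved by linear interpolation.
    """
    ordered = sorted(data)
    n = len(ordered)
    location = p / 100 * (n + 1)
    location = min(max(location, 1), n)  # keep the position inside 1..n
    lower = math.floor(location)
    fraction = location - lower
    if fraction == 0:
        return ordered[lower - 1]  # positions are 1-based
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical ages; not taken from the slides.
ages = [21, 34, 40, 40, 45, 52, 65]

q1 = percentile(ages, 25)  # location 2
q2 = percentile(ages, 50)  # location 4, the median
q3 = percentile(ages, 75)  # location 6
iqr = q3 - q1
```

As far as I know, the default `"exclusive"` method of `statistics.quantiles` in the standard library is based on the same $(n + 1)$ positions, so it should agree with this sketch on such data.
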
- 33. Measures of Variability
  - Measures of variability: (A single number which indicates the variability in the data.)
    - Sample range
    - Sample IQR
    - Sample variance
    - Sample standard deviation
- 34. Sample Range
  - $\text{Sample range} = \text{Largest value} - \text{Smallest value}$
- 35. Sample Interquartile Range (IQR)
  - $IQR = Q_3 - Q_1$
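- 36. Box Plot
  - The box spans $Q_1$ to $Q_3$, with a line at the median.
  - The upper whisker ends at the maximum value less than the upper inner fence; the lower whisker ends at the minimum value greater than the lower inner fence.
  - Inner fences: $Q_1 - 1.5 \times IQR$ and $Q_3 + 1.5 \times IQR$
  - Outer fences: $Q_1 - 3 \times IQR$ and $Q_3 + 3 \times IQR$
  - Minor outlier: a value between an inner fence and the corresponding outer fence.
  - Major outlier: a value beyond an outer fence.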
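
The fences translate directly into a rule for flagging outliers. The following sketch assumes quartiles of 34 and 52, which are hypothetical values rather than figures from the slides, and the function name `classify` is illustrative.

```python
def classify(value, q1, q3):
    """Label a value using the inner (1.5 * IQR) and outer (3 * IQR) fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if value < outer_low or value > outer_high:
        return "major outlier"
    if value < inner_low or value > inner_high:
        return "minor outlier"
    return "not an outlier"


# Hypothetical quartiles: IQR = 18, inner fences at 7 and 79,
# outer fences at -20 and 106.
labels = {value: classify(value, 34, 52) for value in (65, 90, 120)}
```

With these quartiles, 65 lies inside the inner fences, 90 falls between the upper inner and outer fences, and 120 lies beyond the upper outer fence.
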
- 37. Sample Variance
  - Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  - Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- 38. Sample Standard Deviation
  - Sample standard deviation: $s = \sqrt{s^2}$
  - Population standard deviation: $\sigma = \sqrt{\sigma^2}$
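
The only difference between the sample and population formulas is the divisor ($n - 1$ versus $N$) and which mean is used. A minimal sketch on a hypothetical data set, treating the same eight numbers first as a sample and then as a whole population:

```python
import math


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def population_variance(data):
    """Sum of squared deviations from the population mean, divided by N."""
    size = len(data)
    mean = sum(data) / size
    return sum((x - mean) ** 2 for x in data) / size


# Hypothetical values; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)            # 32 / 7
sigma2 = population_variance(data)    # 32 / 8
s = math.sqrt(s2)
sigma = math.sqrt(sigma2)
```

The standard library functions `statistics.variance` and `statistics.pvariance` compute the same two quantities.
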
- 39. Chebyshev’s Theorem
  - At least $\left(1 - \dfrac{1}{z^2}\right)$ of the data values must be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.
- 40. Suppose that you are interested in analyzing the amount of time spent by users browsing through Swiggy before they come to a decision about what to order. You know that the average time spent browsing is 6.9 minutes. Suppose that the standard deviation is 1.2 minutes.
  - What can you say about the percentage of users who spend between 4.5 minutes and 9.3 minutes browsing Swiggy?
  - What can you say about the percentage of users who spend between 5.4 minutes and 9.3 minutes browsing Swiggy?
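
One possible way to work through the two questions with Chebyshev's theorem is sketched below. For an interval that is not symmetric about the mean, the sketch applies the theorem to the largest symmetric interval it contains, which gives a valid but conservative lower bound; this approach and the function names are assumptions, not a solution given in the slides.

```python
def chebyshev_bound(z):
    """Minimum share of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def interval_bound(mean, sd, low, high):
    """Lower bound on the share of data in [low, high].

    Uses the smaller of the two distances to the mean, i.e. the largest
    symmetric interval around the mean that fits inside [low, high].
    """
    z = min(mean - low, high - mean) / sd
    return chebyshev_bound(z)


mean, sd = 6.9, 1.2

# 4.5 and 9.3 are both 2 standard deviations from the mean.
first = interval_bound(mean, sd, 4.5, 9.3)

# 5.4 is 1.25 standard deviations below the mean and 9.3 is 2 above,
# so the symmetric interval is 6.9 +/- 1.25 * 1.2 = [5.4, 8.4].
second = interval_bound(mean, sd, 5.4, 9.3)
```

Under these assumptions, at least about 75% of users browse between 4.5 and 9.3 minutes, and at least about 36% browse between 5.4 and 9.3 minutes.
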
- 41. Measures of Association Between Two Variables
  - Covariance
  - Correlation
- 42. Covariance
  - Covariance is a descriptive measure of the strength of linear association between two variables.
  - Sample covariance: $s_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
  - Population covariance: $\sigma_{xy} = \dfrac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
  - A positive value indicates a positive relationship.
  - A negative value indicates a negative relationship.
  - Sensitive to the units of measurement of the variables!
- 43. Correlation
  - The correlation coefficient is a dimensionless measure of the strength of linear association between two variables.
  - Sample correlation coefficient: $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$
  - Population correlation coefficient: $\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$
  - Bounded between -1 and 1.
  - Values close to 0 indicate a weak linear relationship.
  - Values close to 1 indicate a strong positive linear relationship.
  - Values close to -1 indicate a strong negative linear relationship.
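
The sketch below computes the sample covariance and correlation on hypothetical age and salary data and shows why covariance is unit-sensitive while correlation is not. The data and function names are made up for illustration.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the variance of x
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data; not taken from the slides.
ages = [25, 30, 35, 40, 45]
salaries = [30000, 42000, 39000, 56000, 61000]
salaries_in_thousands = [s / 1000 for s in salaries]

cov_units = sample_covariance(ages, salaries)
cov_thousands = sample_covariance(ages, salaries_in_thousands)
r_units = sample_correlation(ages, salaries)
r_thousands = sample_correlation(ages, salaries_in_thousands)
```

Rescaling salary to thousands shrinks the covariance by a factor of 1000, while the correlation coefficient stays the same.
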
