This document provides an overview of various methods for displaying and exploring data, including frequency distributions, histograms, dot plots, stem-and-leaf displays, measures of position such as quartiles and percentiles, box plots, coefficients of skewness and kurtosis, scatterplots, and contingency tables. It discusses how to construct and interpret each method, provides examples, and outlines the key learning objectives for each topic. The document aims to help readers gain additional insight into the distribution and relationships within data through visual and statistical analysis techniques.
The document discusses various methods for describing and exploring data, including dot plots, stem-and-leaf displays, percentiles, box plots, and skewness. It provides examples of each method using sample data sets and step-by-step calculations. Contingency tables are also introduced as a way to study relationships between nominal or ordinal variables.
Business statistics. In this chapter, you learn:
To construct and interpret confidence interval estimates for the population mean
To determine the sample size necessary to develop a confidence interval for the population mean
A point estimate is a single number; a confidence interval provides additional information about the variability of the estimate.
Suppose confidence level = 95%
Also written (1 - α) = 0.95 (so α = 0.05)
A relative frequency interpretation:
95% of all the confidence intervals that can be constructed will contain the unknown true parameter
A specific interval either will contain or will not contain the true parameter
No probability is involved in a specific interval.
A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.
Determine a 95% confidence interval for the true mean resistance of the population.
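With a known population standard deviation, the interval is the sample mean plus or minus Z(σ/√n): 2.20 ± 1.96 (0.35/√11) = 2.20 ± 0.2068.
We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms.
Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.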
Interpreting this interval requires the assumption that the population you are sampling from is approximately a normal distribution (especially since n is only 11).
This condition can be checked by creating a:
Normal probability plot or
Boxplot
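The circuit example above can be reproduced with a short script. This is a minimal sketch that assumes the known-sigma (Z) interval; the function name `z_interval` is made up for illustration and is not from the original slides.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    alpha = 1 - confidence
    # Critical value of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


# Values taken from the circuit example: n = 11, mean = 2.20, sigma = 0.35.
lower, upper = z_interval(mean=2.20, sigma=0.35, n=11)
print(f"{lower:.4f} to {upper:.4f}")  # 1.9932 to 2.4068
```

Changing `confidence` to 0.99 widens the interval, which shows the trade-off between confidence level and precision.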
The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - α)
The margin of error, also called sampling error, is the amount of imprecision in the estimate of the population parameter: the amount added to and subtracted from the point estimate to form the confidence interval.
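As a rough illustration of the sample size calculation, the sketch below solves the margin of error e = Z(σ/√n) for n and rounds up. The target margin of 0.10 ohms is a hypothetical value chosen for the example, not one given in the text, and `required_sample_size` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, e, confidence=0.95):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve e = z * sigma / sqrt(n) for n, then round up to a whole unit.
    return ceil((z * sigma / e) ** 2)


# sigma = 0.35 comes from the circuit example; e = 0.10 is an assumed target.
n_needed = required_sample_size(sigma=0.35, e=0.10)
print(n_needed)  # 48
```

The result is always rounded up, because rounding down would give a margin of error slightly larger than the one requested.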
Descriptive statistics helps users to describe and understand the features of a specific dataset, by providing short summaries and a graphic depiction of the measured data. Descriptive Statistical algorithms are sophisticated techniques that, within the confines of a self-serve analytical tool, can be simplified in a uniform, interactive environment to produce results that clearly illustrate answers and optimize decisions.
This chapter discusses various methods for summarizing and exploring data, including dot plots, stem-and-leaf displays, percentiles, box plots, and scatter plots. Dot plots and stem-and-leaf displays organize data in a way that shows the distribution while maintaining each data point. Percentiles such as the median and quartiles divide data into equal portions. Box plots graphically show the center, spread, and outliers of data. Scatter plots reveal relationships between two variables, while contingency tables summarize categorical data relationships.
This chapter discusses various methods for describing and exploring quantitative data, including dot plots, stem-and-leaf displays, percentiles, box plots, measures of skewness, scatter diagrams, and contingency tables. It provides examples and explanations of how to construct and interpret each method. Key goals are to develop an understanding of distributions and relationships within data sets.
The document provides an overview of techniques for describing and exploring data, including dot plots, stem-and-leaf displays, measures of central tendency, box plots, coefficients of skewness, scatterplots, and contingency tables. It defines each technique and provides examples to illustrate how to construct and interpret the visualizations. The learning objectives cover how to use each technique to analyze and draw conclusions from sets of quantitative data.
The document provides an overview of various techniques for describing and exploring data, including dot plots, stem-and-leaf displays, measures of position such as percentiles, box plots, skewness, scatterplots, and contingency tables. It defines each technique and provides examples of their construction and interpretation. Learning objectives cover how to construct and interpret each type of graph or statistical measure.
The chi-square test is used to determine if there is a significant difference between expected frequencies and observed frequencies in categorical data. It compares observed values to expected values that would occur according to a specific hypothesis. The chi-square test formula calculates the sum of the squares of the differences between observed and expected values divided by the expected value. A researcher would perform a chi-square test on survey data to understand the relationship between categorical variables like gender, age, and responses.
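The formula described above, the sum of (observed - expected)² / expected, can be sketched for a contingency table as follows. The 2x2 counts are invented for illustration, and expected counts are computed under the usual independence assumption (row total times column total divided by the grand total).

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical survey counts: rows are two groups, columns are two responses.
observed = [[20, 30], [30, 20]]
chi2 = chi_square_statistic(observed)
print(chi2)  # 4.0
```

The statistic is then compared with a chi-square critical value for (rows - 1) x (columns - 1) degrees of freedom, a step this sketch leaves out.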
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptx RanggaMasyhuriNuur
The document discusses various graphical methods for presenting data, including histograms, polygons, pie charts, ogives, and stem-and-leaf plots. Histograms display the frequency distribution of data using bars of varying heights. Polygons connect the midpoints of histogram bars with straight lines. Pie charts represent proportions using circular slices. Ogives show cumulative frequencies with class limits on the x-axis and cumulative counts on the y-axis. Stem-and-leaf plots break values into "stems" and "leaves" for an organized display of the raw data. Examples are provided for constructing each type of graph using sample data sets.
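To make the stem-and-leaf idea concrete, here is a small sketch that splits two-digit values into a tens "stem" and a units "leaf". The data values and the function name `stem_and_leaf` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the text lines of a stem-and-leaf display for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        # Tens digit(s) form the stem, the units digit is the leaf.
        groups[v // 10].append(v % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


# Hypothetical sample data.
display = stem_and_leaf([23, 25, 31, 38, 38, 42, 47, 59])
print("\n".join(display))
```

Every original value can be read back from the display, which is the main advantage of this method over a histogram.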
This document provides an overview of key concepts in statistics including:
- Descriptive statistics such as frequency distributions which organize and summarize data
- Inferential statistics which make estimates or predictions about populations based on samples
- Types of variables including quantitative, qualitative, discrete and continuous
- Levels of measurement including nominal, ordinal, interval and ratio
- Common measures of central tendency (mean, median, mode) and dispersion (range, standard deviation)
This chapter discusses various methods for describing and exploring data, including dot plots, percentiles, box plots, and scatter diagrams. Dot plots display each data point along a number line and are useful for small data sets. Percentiles divide a data set into equal percentages and are used to calculate quartiles. Box plots graphically depict the center, spread, and outliers of a data set. Scatter diagrams show the relationship between two variables by plotting one on the x-axis and one on the y-axis. Contingency tables organize counts of observations into categories to study relationships between nominal or ordinal variables.
TSTD 6251 Fall 2014SPSS Exercise and Assignment 120 PointsI.docx nanamonkton
TSTD 6251 Fall 2014
SPSS Exercise and Assignment 1
20 Points
In this class, we are going to study descriptive summary statistics and learn how to construct a box plot. We are still working with a single (univariate) variable for this exercise.
Practice Example:
Admission receipts (in millions of dollars) for a recent season are given below for the n = 30 major league baseball teams:
19.4 26.6 22.9 44.5 24.4 19.0 27.5 19.9 22.8 19.0 16.9 15.2 25.7 19.0 15.5 17.1 15.6 10.6 16.2 15.6 15.4 18.2 15.5 14.2 9.5 9.9 10.7 11.9 26.7 17.5
Required:
a. Compute the mean, variance and standard deviation.
b. Find the sample median, first quartile, and third quartile.
c. Construct a boxplot and interpret the distribution of the data.
d. Discuss the distribution of this set of data by examining the kurtosis and skewness statistics: whether the distribution is skewed to one side, and whether it shows a peaked/skinny curve or a spread out/flat curve.
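Parts (a) and (b) can also be checked outside SPSS. The sketch below uses Python's standard `statistics` module on the 30 receipts listed above; the "exclusive" quantile method is chosen on the assumption that it matches the (n + 1)p position rule behind the SPSS percentiles.

```python
import statistics

# Admission receipts ($ millions) for the 30 teams, as listed in the exercise.
receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]

mean = statistics.mean(receipts)
variance = statistics.variance(receipts)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(receipts)       # sample standard deviation
q1, median, q3 = statistics.quantiles(receipts, n=4, method="exclusive")

print(f"mean={mean:.5f} variance={variance:.6f} sd={std_dev:.6f}")
print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f}")
```

The printed values should agree with the Mean, Variance, Std. Deviation and the 25th, 50th and 75th percentiles in the SPSS output for this exercise.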
SPSS Procedures for Computing Summary Statistics:
Enter the 30 data values in the first column of SPSS Data View
Click the Variable View tab and name this variable receipts
Adjust Decimals to 3 decimal points
Type Admission Receipts ($ mn) in the Label column for output viewer
Return to Data View and click Analyze on the menu bar
Click the second menu, Descriptive Statistics
Click Frequencies …
Move Admission Receipts to the Variable(s) list by clicking the arrow button
Click the Statistics … button at the top of the dialog box
Now, you can select the descriptive statistics according to what the question requires. This practice question requires central tendency, dispersion, percentile and distribution statistics, so we click all the boxes except for Percentile(s): and Values are group midpoints.
Click Continue to return to the Frequencies dialog box
Click OK to generate the descriptive statistics output, which is pasted below:
The first table provides summary statistics and the second table lists frequencies, relative frequencies and cumulative frequencies. The statistics required for solving this problem are highlighted in red.
Statistics: Admission Receipts
N (Valid)                 30
N (Missing)               0
Mean                      18.76333
Std. Error of Mean        1.278590
Median                    17.30000
Mode                      19.000
Std. Deviation            7.003127
Variance                  49.043782
Skewness                  1.734
Std. Error of Skewness    .427
Kurtosis                  5.160
Std. Error of Kurtosis    .833
Range                     35.000
Minimum                   9.500
Maximum                   44.500
Sum                       562.900
Percentiles    10         10.61000
               20         14.40000
               25         15.35000
               30         15.50000
               40         15.84000
               50         17.30000
               60         19.00000
               70         19.75000
               75         22.82500
               80         24.10000
               90         26.69000
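For part (d), the skewness and kurtosis figures can be approximated by hand. The sketch below assumes SPSS reports the bias-adjusted sample skewness and excess kurtosis (the same formulas many spreadsheets use); the function names are illustrative only.

```python
from statistics import mean, stdev

receipts = [
    19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
    16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
    15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5,
]


def adjusted_skewness(data):
    """Bias-adjusted sample skewness; positive means a longer right tail."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def adjusted_excess_kurtosis(data):
    """Bias-adjusted excess kurtosis; positive means heavier tails than normal."""
    n = len(data)
    m, s = mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


skew = adjusted_skewness(receipts)
kurt = adjusted_excess_kurtosis(receipts)
print(f"skewness={skew:.3f} kurtosis={kurt:.3f}")
```

Both statistics are positive here, which is consistent with a right-skewed, peaked distribution driven largely by the single large value of 44.5.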
Admission Receipts
Valid      Frequency   Percent   Valid Percent   Cumulative Percent
9.500      1           3.3       3.3             3.3
9.900      1           3.3       3.3             6.7
10.600     1           3.3       3.3             10.0
10.700     1           3.3       3.3             13.3
11.900     1           3.3       3.3             16.7
14.200     1           3.3       3.3             20.0
15.2.
This document provides an overview of descriptive statistics techniques for summarizing categorical and quantitative data. It discusses frequency distributions, measures of central tendency (mean, median, mode), measures of variability (range, variance, standard deviation), and methods for visualizing data through charts, graphs, and other displays. The goal of descriptive statistics is to organize and describe the characteristics of data through counts, averages, and other summaries.
This document provides an overview of descriptive statistics concepts and methods. It discusses numerical summaries of data like measures of central tendency (mean, median, mode) and variability (standard deviation, variance, range). It explains how to calculate and interpret these measures. Examples are provided to demonstrate calculating measures for sample data and interpreting what they say about the data distribution. Frequency distributions and histograms are also introduced as ways to visually summarize and understand the characteristics of data.
This document provides an introduction to statistics, including what statistics is, who uses it, and different types of variables and data presentation. Statistics is defined as collecting, organizing, analyzing, and interpreting numerical data to assist with decision making. Descriptive statistics organizes and summarizes data, while inferential statistics makes estimates or predictions about populations based on samples. Variables can be qualitative or quantitative, and quantitative variables can be discrete or continuous. Data can be presented through frequency tables, graphs like histograms and polygons, and cumulative frequency distributions.
This document provides an overview of key numerical measures used to describe data, including measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation). It defines each measure, provides examples of calculating them, and discusses their characteristics, uses, and advantages/disadvantages. The document also covers weighted means, geometric means, Chebyshev's theorem, and calculating measures for grouped data.
This document provides an overview of statistics concepts and tasks. It includes 5 tasks covering topics like data collection methods, graphing data, measures of central tendency, and variance. The document also defines key statistical terms and graphs. It aims to introduce students to fundamental statistical concepts and how statistics are used across various domains like weather, health, business and more.
1. This document discusses various quantitative techniques used in business, including measures of central tendency (mean, median, mode), cumulative frequency distributions, different types of graphs (pie charts, bar charts, histograms, frequency polygons), and methods for determining trends in time series data.
2. Measures of central tendency include the mean, median, and mode. Different measures are more appropriate depending on the data. The document also defines the arithmetic mean, geometric mean, median, and mode.
3. Graphs covered include pie charts, single/grouped/stacked bar charts, histograms, and frequency polygons. Trend analysis discusses using the method of least squares to fit a straight line trend to time series data.
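The least squares trend line mentioned in item 3 can be sketched in a few lines. The yearly sales figures are hypothetical, time is coded 1, 2, 3, ... by assumption, and `linear_trend` is an illustrative name.

```python
def linear_trend(y):
    """Fit y = intercept + slope * t by least squares, with t = 1, 2, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Slope is the sum of cross-deviations divided by the sum of squared t deviations.
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


# Hypothetical sales for five consecutive periods.
sales = [2, 3, 5, 6, 9]
intercept, slope = linear_trend(sales)
print(f"trend: y = {intercept:.1f} + {slope:.1f} t")
```

The slope estimates the average change per period, so a forecast for period 6 would be intercept + 6 * slope.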
Statistics involves collecting, organizing, and analyzing data. There are several ways to present data including lists, frequency charts, histograms, percentage charts, and pie charts. Central tendency refers to averages that describe the center of a data set. The three main measures of central tendency are the mean, median, and mode. The mean is calculated by adding all values and dividing by the total number. The median is the middle value when data is arranged from lowest to highest. The mode is the most frequent value. A weighted mean assigns different weights or importance to values before calculating the average.
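A weighted mean can be illustrated with a short sketch. The prices and quantities below are invented for the example; each value is multiplied by its weight and the total is divided by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical unit prices and the number of units sold at each price.
prices = [2.0, 3.0, 4.0]
units = [3, 2, 5]
average_price = weighted_mean(prices, units)
print(average_price)
```

With equal weights the result reduces to the ordinary arithmetic mean.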
The document discusses various steps and techniques for preparing and analyzing data, including:
1) Steps for data preparation such as questionnaire checking, editing, coding, transcribing, cleaning, statistical adjustments, and selecting an analysis strategy.
2) Common graphical presentations like bar charts, pie charts, and frequency tables to visualize categorical data.
3) Cross tabulation as a method to analyze relationships between multiple variables by grouping them in contingency tables to identify patterns and correlations.
4) The chi-square test as a statistical hypothesis test to determine if sample data matches a population distribution or to compare two variables in a contingency table to assess independence.
This document discusses various methods for describing and exploring data, including dot plots, box plots, and measures of position such as percentiles and quartiles. Dot plots display each observation along a number line to show the distribution of values. Box plots use the minimum, quartiles, median and maximum to graphically depict a dataset. Percentiles and quartiles split a sorted dataset into equal groups to indicate location within the data. The interquartile range and coefficients of skewness are also introduced to analyze the shape, spread and outliers of distributions. Examples are provided to demonstrate computing and interpreting these descriptive statistics.
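The quantities behind a box plot can be sketched as follows. The data are hypothetical, and the 1.5 x IQR fences used to flag outliers are a common convention assumed here rather than something stated in the summary above.

```python
import statistics


def box_plot_summary(data):
    """Five-number summary plus values outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    five_numbers = (min(data), q1, median, q3, max(data))
    return five_numbers, iqr, outliers


# Hypothetical observations with one unusually large value.
values = [2, 4, 5, 7, 8, 9, 11, 12, 30]
five_numbers, iqr, outliers = box_plot_summary(values)
print(five_numbers, iqr, outliers)
```

Here the box would span the first to the third quartile, and the value 30 would be drawn as a separate point beyond the upper whisker.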
The document contains an outline of the table of contents for a textbook on general statistics. It covers topics such as preliminary concepts, data collection and presentation, measures of central tendency, measures of dispersion and skewness, and permutations and combinations. Sample chapters discuss introduction to statistics, variables and data, methods of presenting data through tables, graphs and diagrams, computing the mean, median and mode, and other statistical measures.
This document discusses various measures of variability and dispersion in statistics, including the range, quartiles, interquartile range, percentiles, and five number summary. It provides definitions and examples of each measure. The range is defined as the difference between the highest and lowest values in a data set. Quartiles split a data set into four equal parts, with the first (Q1) and third (Q3) quartiles used to calculate the interquartile range. Percentiles indicate the percentage of values below a given score. The five number summary encapsulates the minimum, first quartile, median, third quartile, and maximum.
In our increasingly data-driven world, it's more important than ever to have accessible ways to view and understand data.
After all, the demand for data skills among employees increases steadily each year.
Employees and business owners at every level need to understand data and its impact.
That's where data visualization comes in handy.
To make data more accessible and understandable, data visualization in dashboards is the go-to tool for many businesses to analyze and share information.
Principal Component Analysis and Clustering Usha Vijay
Identifying the borrower segments from the given bank data set, which has 27,000 rows and 77 variables, using PROC PRINCOMP. With this many variables, it is important to reduce the data set to a smaller set of variables to derive a feasible conclusion. Because of multicollinearity, two or more variables can share the same plane in the n dimensions. Each row of the data can be envisioned as a point in a 77-dimensional graph, and when we project the data onto orthonormal axes, certain characteristics of the data are expected to cluster together as principal components. To identify these principal components, PROC PRINCOMP is executed with all the variables except the constant variables (recoveries and collection fees), and we derive a plot of the eigenvalues of all the principal components.
Similar to Chap 04 - Describing Data_Displaying and Exploring Data.pdf
2. Review of Descriptive Statistics
◼ Chapter 02 ⇒ descriptive statistics ⇒ transform raw or
ungrouped data into a meaningful form.
Organization of data into a frequency distribution.
Presentation of frequency distribution in graphic form as a
histogram or a frequency polygon.
Visualization of where data tends to cluster, the largest
and the smallest values, and general shape of the data.
◼ Chapter 03 ⇒ computed several measures of location,
such as mean and median.
Reporting of a typical value in the set of observations.
Computation of several measures of dispersion, such as
range and standard deviation.
Describing variation or spread in a set of observations.
3. Content
◼ Dot plot
◼ Stem-and-leaf display
◼ Measures of position
◼ Box plot
◼ Coefficient of skewness
◼ Scatterplot
◼ Contingency table
◼ Providing additional insight into where the values are
concentrated as well as the general shape of the data.
◼ Consideration of bivariate data:
Two variables for each individual or observation selected.
Examples:
◼ Number of studying hours and the points earned on an
examination.
◼ Whether a sampled product is acceptable or not and the
shift on which it is manufactured.
5. Disadvantages of Frequency
Distribution
◼ Two disadvantages to organizing the data
into a frequency distribution:
❑The exact identity of each value is lost.
❑Difficult to tell how the values within each
class are distributed.
6. Dot Plots
◼ A dot plot groups the data as little as
possible and the identity of an individual
observation is not lost.
◼ To develop a dot plot, each observation is
simply displayed as a dot along a horizontal
number line indicating the possible values of
the data.
◼ If there are identical observations or the
observations are too close to be shown
individually, the dots are “piled” on top of each
other.
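The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `dot_plot` and the sample values are hypothetical, and it assumes the observations are non-negative integers so that each value gets its own column on the number line.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one column per integer value, dots piled upward."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # horizontal number line
    width = len(str(max(values)))                # column width for alignment
    rows = []
    # Build the plot from the tallest pile down to the first dot.
    for level in range(max(counts.values()), 0, -1):
        cells = [("*" if counts[v] >= level else " ").rjust(width) for v in axis]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).rjust(width) for v in axis))
    return "\n".join(rows)

# Hypothetical observations, not taken from the slides.
sample = [3, 5, 5, 6, 6, 6, 8]
print(dot_plot(sample))
```

Identical observations end up stacked in the same column, so no individual value is lost.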
8. Stem-and-Leaf
◼ Stem-and-leaf display is a statistical technique
to present a set of data.
Each numerical value is divided into two
parts.
The leading digit(s) become the stem and
the trailing digit becomes the leaf.
The stems are located along the vertical
axis, and the leaf values are stacked
against each other along the horizontal
axis.
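As a rough sketch of the splitting rule above, the following Python groups values by stem and lists the leaves in order. The function name `stem_and_leaf` and the data are hypothetical (they are not the values of Table 4–1), and the sketch assumes non-negative integers with the last digit used as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 96 -> stem 9, leaf 6
        display[stem].append(leaf)
    return dict(display)

# Hypothetical data for illustration only.
spots = [88, 93, 94, 96, 96, 103, 108, 113, 117, 127]
for stem, leaves in stem_and_leaf(spots).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Stems with no observations are simply omitted in this sketch; a fuller display would print them with an empty leaf row.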
9. Stem-and-Leaf
EXAMPLE
Listed in Table 4–1 is the number of 30-second radio
advertising spots purchased by each of the 45 members
of the Greater Buffalo Automobile Dealers Association last
year. Organize the data into a stem-and-leaf display.
Around what values do the number of advertising spots
tend to cluster? What is the fewest number of spots
purchased by a dealer? The largest number purchased?
11. Quartiles and Percentiles
◼ The standard deviation is the most widely used
measure of dispersion.
◼ Alternative ways of describing spread of data
include determining the location of values that
divide a set of observations into equal parts.
◼ These measures include quartiles and
percentiles.
12. Percentiles and Quartiles
◼ To formalize the computational procedure, let Lp
refer to the location of a desired percentile. So, if
we wanted to find the 33rd percentile we would use
L33 and if we wanted the median, the 50th
percentile, then L50.
◼ The number of observations is n, so if we want to
locate the median, its position is at (n + 1)/2. In general,
Lp = (n + 1)(P/100), where P is the desired percentile;
for the median, P = 50.
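The location formula can be checked with a short Python sketch. The function name `percentile_location` is hypothetical; the result is a 1-based position in the sorted data.

```python
def percentile_location(n, p):
    """Location L_p = (n + 1) * P / 100 of the P-th percentile in sorted data."""
    return (n + 1) * p / 100

# With n = 15 observations, the median (P = 50) sits at the 8th position.
print(percentile_location(15, 50))
```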
13. Percentiles - Example
EXAMPLE
Listed below are the commissions earned last
month by a sample of 15 brokers at XYZ
Securities Ltd.
$2,038 $1,758 $1,721 $1,637 $2,097 $2,047
$2,205 $1,787 $2,287 $1,940 $2,311 $2,054
$2,406 $1,471 $1,460
Locate the median, the first quartile, and the
third quartile for the commissions earned.
15. Percentiles - Example
Step 1: Organize the data from smallest to largest value.
$1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940 $2,038 $2,047
$2,054 $2,097 $2,205 $2,287 $2,311 $2,406
Step 2: Compute the first and third quartiles. Locate L25 and L75 using Lp = (n + 1)(P/100):
\[ L_{25} = (15 + 1)\frac{25}{100} = 4 \qquad L_{75} = (15 + 1)\frac{75}{100} = 12 \]
Therefore, the first and third quartiles are located at the 4th
and 12th positions, respectively:
Q1 = $1,721
Q3 = $2,205
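The same steps can be reproduced in Python using the commissions listed in the example. The function name `percentile` is hypothetical, and the linear interpolation used when the location is not a whole number is an assumption: the example only involves whole-number locations.

```python
def percentile(values, p):
    """P-th percentile using the location L_p = (n + 1) * P / 100."""
    data = sorted(values)                    # Step 1: order the data
    location = (len(data) + 1) * p / 100     # Step 2: 1-based position
    lower = int(location)                    # whole part of the position
    fraction = location - lower              # fractional part, if any
    if lower < 1:
        return data[0]
    if lower >= len(data):
        return data[-1]
    # Assumed rule: interpolate between neighbours for fractional locations.
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
               2287, 1940, 2311, 2054, 2406, 1471, 1460]

print("Q1:", percentile(commissions, 25))
print("Median:", percentile(commissions, 50))
print("Q3:", percentile(commissions, 75))
```

The median falls at position (15 + 1)(50/100) = 8 in the sorted list, which is $2,038.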
17. Box Plots
◼ A box plot is a graphical display, based on
quartiles, that helps us picture a set of data.
◼ To construct a box plot, we need only five
statistics:
Minimum value,
Q1 (the first quartile),
Median,
Q3 (the third quartile), and
Maximum value.
19. Boxplot - Example
Step 1: Create an appropriate scale along
horizontal axis.
Step 2: Draw a box that starts at Q1 (15 mins)
and ends at Q3 (22 mins).
Inside the box we place a vertical line to
represent the median (18 mins).
Step 3: Extend horizontal lines from the box out
to minimum value (13 mins) and maximum value
(30 mins).
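The three steps can be imitated with a one-line text drawing. This is a rough sketch only: the function name `text_box_plot`, the `scale` parameter, and the characters used for the box and whiskers are all hypothetical choices. The five statistics are the ones quoted in the example.

```python
def text_box_plot(minimum, q1, median, q3, maximum, scale=1):
    """Draw a one-line box plot; `scale` characters represent one unit."""
    def col(value):
        # Step 1: map a value onto the horizontal scale.
        return round((value - minimum) * scale)

    line = [" "] * (col(maximum) + 1)
    # Step 3: whiskers from the box out to the minimum and maximum.
    for i in range(col(minimum), col(q1)):
        line[i] = "-"
    for i in range(col(q3) + 1, col(maximum) + 1):
        line[i] = "-"
    # Step 2: the box from Q1 to Q3, with the median marked inside it.
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="
    line[col(minimum)] = "|"
    line[col(maximum)] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(median)] = "#"
    return "".join(line)

# Minimum 13, Q1 15, median 18, Q3 22, maximum 30 (minutes).
print(text_box_plot(13, 15, 18, 22, 30))
```

The longer right-hand whisker makes it easy to see that the values above Q3 are more spread out than those below Q1.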
21. Skewness
◼ Another characteristic of a set of data is the shape.
◼ Commonly observed shapes: symmetric,
positively skewed, negatively skewed.
22. Skewness
◼ The coefficient of skewness can range from -3
up to 3.
A value near -3 indicates considerable
negative skewness.
A value such as 1.63 indicates moderate
positive skewness.
A value of 0, which occurs when the mean
and median are equal, indicates that the
distribution is symmetrical and that there is
no skewness present.
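The slide's formula is not reproduced in this transcript. The sketch below assumes Pearson's coefficient of skewness, sk = 3(mean - median)/s, which is consistent with the stated range of -3 to 3 and with a value of 0 when the mean and median are equal. The function name and both data sets are hypothetical.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: sk = 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)   # sample standard deviation
    return 3 * (mean - median) / s

symmetric = [1, 2, 3, 4, 5]        # mean equals median
right_tail = [1, 2, 2, 3, 12]      # one large value pulls the mean upward

print(pearson_skewness(symmetric))
print(pearson_skewness(right_tail))
```

The second data set gives a positive coefficient because the mean (4) exceeds the median (2).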
23. Kurtosis
◼ Like skewness, kurtosis is a statistical measure
that is used to describe distribution.
◼ Kurtosis refers to a measure of the degree to
which a given distribution is more or less
‘peaked’, relative to the normal distribution.
25. Kurtosis
◼ Leptokurtic
More peaked than the normal distribution.
The higher peak results from clustering of
data points around the mean.
The coefficient of kurtosis is greater
than 3.
26. Kurtosis
◼ Platykurtic
Has widely dispersed points along the X-axis,
resulting in a lower peak when compared to the
normal distribution.
The distribution’s shape is wide and flat.
The points are less clustered around the mean.
The coefficient of kurtosis is less than 3.
◼ Mesokurtic
Has a curve similar to that of the normal distribution, with a coefficient of kurtosis close to 3.
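The slides do not show how the coefficient of kurtosis is computed. The sketch below assumes the moment-based coefficient m4 / m2², which equals 3 for the normal distribution and so matches the threshold used above. The function names, the `tolerance` band around 3, and the data sets are hypothetical.

```python
def kurtosis_coefficient(values):
    """Moment-based kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

def classify_kurtosis(values, tolerance=0.1):
    """Label a data set relative to the normal-distribution value of 3."""
    k = kurtosis_coefficient(values)
    if k > 3 + tolerance:
        return "leptokurtic"
    if k < 3 - tolerance:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5, 6]                      # evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]       # most values at the mean

print(classify_kurtosis(flat))
print(classify_kurtosis(peaked))
```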
27. Scatter Diagram
Describing Relationship between Two Variables:
◼ When we study the relationship between two
variables, we refer to the data as bivariate.
◼ One graphical technique we use to show the
relationship between variables is called a scatter
diagram.
◼ To draw a scatter diagram, we need two variables.
◼ We scale one variable along the horizontal axis (X-
axis) of a graph and the other variable along the
vertical axis (Y-axis).
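A scatter diagram can be approximated on a character grid, as in the sketch below. The function name `text_scatter`, the grid size, and the data (study hours against exam points) are hypothetical; the sketch assumes each variable takes at least two distinct values so that both axes have a non-zero range.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Plot (x, y) pairs on a character grid: x left to right, y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each variable onto its axis.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"      # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line).rstrip() for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical bivariate data: one (hours, points) pair per student.
hours = [1, 2, 3, 4, 5, 6]
points = [52, 60, 58, 71, 75, 88]
print(text_scatter(hours, points))
```

Points that rise from the lower left to the upper right suggest a positive relationship between the two variables.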
31. Describing Relationship between Two
Variables – Scatter Diagram Example
In Chapter 2, data from Auto USA was presented. We
gathered information concerning several variables,
including the profit earned from the sale of 180
vehicles sold last month. In addition to the amount
of profit on each sale, one of the other variables is
the age of the purchaser.
Is there a relationship between the profit earned on a
vehicle sale and the age of the purchaser?
Would it be reasonable to conclude that the more
expensive vehicles are purchased by older
buyers?
33. Describing Relationship between Two
Variables – Scatter Diagram Example
◼ The scatter diagram shows a rather weak positive relationship between the two variables.
◼ It does not appear there is much relationship between the vehicle profit and the age of the buyer.
34. Contingency Tables
◼ A scatter diagram requires that both of the
variables be at least interval scale.
◼ What if we wish to study the relationship
between two variables when one or both are
nominal or ordinal scale?
◼ In this case, we tally the results in a
contingency table.
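Tallying two categorical variables into a contingency table can be sketched as follows. The function name `contingency_table` and the records (shift and inspection result for a handful of products) are hypothetical.

```python
from collections import Counter

def contingency_table(records):
    """Tally (row_category, column_category) pairs into a nested dict of counts."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    # Every row/column combination gets a cell, even when its count is zero.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical observations: (shift, inspection result) for each product.
records = [
    ("day", "acceptable"), ("day", "acceptable"), ("day", "unacceptable"),
    ("afternoon", "acceptable"),
    ("night", "unacceptable"), ("night", "acceptable"),
]

for shift, cells in contingency_table(records).items():
    print(shift, cells)
```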
35. Contingency Tables
Examples:
1. Students at a university are classified by
gender and class rank.
2. A product is classified as acceptable or
unacceptable and by the shift (day, afternoon,
or night) on which it is manufactured.
36. Contingency Tables – An Example
There are four dealerships in the Applewood Auto
Group.
Suppose we want to compare the profit earned on
each vehicle sold by dealership. To
put it another way, is there a relationship between the
amount of profit earned and the dealership? The
table on the next slide is the cross-tabulation of the raw
data of the two variables.
38. Contingency Tables – An Example
From the contingency table, we observe the following:
1. From the Total column on the right, 90 of the 180 cars
sold had a profit above the median and the other 90 had a profit below it.
This is expected from the definition of the median.
39. Contingency Tables – An Example
From the contingency table, we observe the following:
2. For the Kane dealership, 25 of the 52 cars sold, or 48 percent,
were sold for a profit above the median.
3. The percentages of cars sold for a profit above the median at the other
dealerships are 50 percent for Olean, 42 percent for Sheffield,
and 60 percent for Tionesta.
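The row percentages quoted above can be recomputed with a short Python sketch. Only the Kane counts (25 of 52), the overall totals (90 of 180), and the percentages appear in the slides; the counts for Olean, Sheffield, and Tionesta below are assumed values chosen to be consistent with those figures.

```python
# Cars sold for a profit above the median, by dealership.
# Kane's count is from the slides; the other three are assumptions.
above_median = {"Kane": 25, "Olean": 20, "Sheffield": 19, "Tionesta": 26}

# Total cars sold by each dealership (Kane from the slides, others assumed).
cars_sold = {"Kane": 52, "Olean": 40, "Sheffield": 45, "Tionesta": 43}

percent_above = {
    dealer: round(100 * above_median[dealer] / cars_sold[dealer])
    for dealer in cars_sold
}

for dealer, pct in percent_above.items():
    print(f"{dealer}: {pct} percent above the median")
```

Comparing these percentages across rows is what reveals whether profit and dealership appear to be related.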
40. Learning Objectives
LO1 Construct and interpret a dot plot.
LO2 Construct and describe a stem-and-leaf display.
LO3 Identify and compute measures of position.
LO4 Construct and analyze a box plot.
LO5 Compute and describe the coefficient of skewness.
LO6 Create and interpret a scatterplot.
LO7 Develop and explain a contingency table.