Frequencies, Proportion, Graphs
February 8th, 2016
Frequency, Percentages, and Proportions
Frequency: the number of participants or cases.
Denoted by the symbol f.
N can also denote a frequency.
f = 50 or N = 50 for a score of 80 both mean that 50 people had a score of 80.
Percentage: the number per 100 who have a certain characteristic.
If 64% of registered voters are Democrats, then out of every 100 registered voters, 64 are Democrats.
To determine how many are Democrats, multiply the total number of registered voters by .64; if we have 2,200 registered voters, .64 × 2,200 = 1,408 are Democrats.
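The arithmetic above can be sketched in a couple of lines of Python, using the figures from the voter example:

```python
# Figures from the example above: 2,200 registered voters, 64% Democrats.
total_voters = 2200
pct_democrat = 0.64

# A percentage is "per 100", so the decimal form of the percentage
# times the total gives the expected count.
n_democrats = total_voters * pct_democrat
print(round(n_democrats))  # 1408
```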
Percentages and Proportions
Percentages
32 of 96 children reported that a dog is their favorite animal; 32/96 = .33, and .33 × 100 = 33%, so 33% of these children like dogs best.
Interpretation: Based on this sample, out of 100 participants from the same population, we can expect about 33 of them to report that dogs are their favorite animal.
A proportion is a part of one (1).
The proportion of children who like dogs is .33.
That is, thirty-three hundredths of the children like dogs.
Percentages are easier to interpret.
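The frequency-to-proportion-to-percentage conversion from the dogs example can be sketched in Python:

```python
# Frequency and sample size from the example above.
f, n = 32, 96

proportion = f / n          # part of 1
percentage = proportion * 100

print(round(proportion, 2))  # 0.33
print(round(percentage))     # 33
```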
Percentages Cont’d
It is good practice to report the sample size along with the frequency.
Percentages can help us understand differences between groups of individuals
                              College A       College B
Number of Education Majors    N = 500         N = 800
Early Childhood Education     N = 400 (80%)   N = 600 (75%)
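The table makes the point that raw counts and percentages can rank the colleges differently: College B has more early childhood majors, but College A has the higher percentage. A short sketch:

```python
# (early childhood majors, total education majors) from the table above.
majors = {"College A": (400, 500), "College B": (600, 800)}

for college, (early_childhood, total) in majors.items():
    pct = early_childhood / total * 100
    print(f"{college}: {pct:.0f}%")
# College A: 80%
# College B: 75%
```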
Shapes of Distributions
Frequency distribution
The number of participants who have each score.
Remember that we are describing our data
X (Score)   f
25          2
24          4
23          5
21          10
20          7
19          4
18          1
17          1
N = 34
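A frequency distribution like the one above can be tallied from raw scores with `collections.Counter`. The raw score list here is hypothetical, constructed to match the tabled frequencies:

```python
from collections import Counter

# Hypothetical raw scores whose tally matches the table above (N = 34).
scores = [25, 25, 24, 24, 24, 24, 23, 23, 23, 23, 23] \
         + [21] * 10 + [20] * 7 + [19] * 4 + [18, 17]

freq = Counter(scores)
for score in sorted(freq, reverse=True):
    print(score, freq[score])
print("N =", sum(freq.values()))  # N = 34
```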
Frequency Polygon
Histogram
Shapes of Distributions Cont’d
Normal Distribution
The most important shape (a shape found in nature).
Example: heights of 10-year-old boys in a large population.
Bell-shaped curve.
Used for inferential statistics.
Skewed Distributions
Skew: the most frequent scores are clustered at one end of the distribution.
Skew describes the asymmetry of the distribution.
Positive skew: scores bunched at low values, with the tail pointing toward high values.
Negative skew: scores bunched at high values, with the tail pointing toward low values.
Consider how groups differ depending on their standard deviation.
In a normal distribution, about 68% of cases lie within one standard deviation of the mean.
About 95% of cases lie within two standard deviations of the mean.
About 99.7% of cases (often rounded to 99%) lie within three standard deviations of the mean.
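The 68-95-99.7 rule can be checked empirically with simulated normal data. This sketch uses only the standard library, with a hypothetical IQ-like scale (mean 100, SD 15) and a fixed seed for reproducibility:

```python
import random
from statistics import mean, stdev

# Simulate 10,000 draws from a normal distribution (mean 100, SD 15).
random.seed(42)
data = [random.gauss(100, 15) for _ in range(10_000)]
m, s = mean(data), stdev(data)

# Fraction of cases within 1, 2, and 3 sample standard deviations.
coverage = {}
for k in (1, 2, 3):
    inside = sum(m - k * s <= x <= m + k * s for x in data)
    coverage[k] = inside / len(data)
    print(f"within {k} SD: {coverage[k]:.1%}")
```

The printed fractions land close to 68%, 95%, and 99.7%, as the rule predicts.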
Standard Deviation and the Normal Distribution
February 1, 2016
Descriptive Statistics
Level of Measurement Examples:
Number of Children in families
Order of finish in the Boston Marathon
Grading System (A, B, C, D, F)
Level of Blood Sugar
Time required to complete a maze
Political Party Affiliation
Amount of gasoline consumed
Majors in College
IQ scores
Number of Fatal Accidents
Types of Statistics
We use descriptive statistics to summarize data
Think about measures of central tendency and variability
We use correlational statistics to describe the relationship between two variables
Considered as a ...
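A correlational statistic summarizes the relationship between two variables in a single number. As an illustration, Pearson's r computed by hand on hypothetical paired data (hours studied vs. test score; both the data and the variable names are invented for this sketch):

```python
from statistics import mean

# Hypothetical paired data: hours studied and test scores.
hours  = [1, 2, 3, 4, 5, 6]
scores = [55, 60, 62, 70, 71, 80]

# Pearson r: covariance of the pairs divided by the product of the
# standard deviations (computed here via sums of squared deviations).
mx, my = mean(hours), mean(scores)
sxy = sum((x - mx) * (y - my) for x, y in zip(hours, scores))
sxx = sum((x - mx) ** 2 for x in hours)
syy = sum((y - my) ** 2 for y in scores)
r = sxy / (sxx * syy) ** 0.5
print(f"r = {r:.3f}")
```

A value of r near +1 indicates a strong positive relationship; for these made-up data, r comes out close to 1.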
TSTD 6251 Fall 2014
SPSS Exercise and Assignment 1
20 Points
In this class, we are going to study descriptive summary statistics and learn how to construct a box plot. We are still working with a univariate variable for this exercise.
Practice Example:
Admission receipts (in millions of dollars) for a recent season are given below for the n = 30 major league baseball teams:
19.4 26.6 22.9 44.5 24.4 19.0 27.5 19.9 22.8 19.0 16.9 15.2 25.7 19.0 15.5 17.1 15.6 10.6 16.2 15.6 15.4 18.2 15.5 14.2 9.5 9.9 10.7 11.9 26.7 17.5
Required:
a. Compute the mean, variance, and standard deviation.
b. Find the sample median, first quartile, and third quartile.
c. Construct a boxplot and interpret the distribution of the data.
d. Discuss the distribution of this data set by examining the kurtosis and skewness statistics: for example, whether the distribution is skewed to one side, and whether it shows a peaked/skinny curve or a spread-out/flat curve.
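Before running the SPSS procedure, the core numbers for parts (a) and (b) can be cross-checked with Python's standard `statistics` module. This is not SPSS, just an independent sketch; its default quantile method happens to agree with the SPSS percentile output shown later:

```python
from statistics import mean, median, quantiles, stdev, variance

# The 30 admission receipts (in millions of dollars) from the exercise.
receipts = [19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
            16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
            15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5]

print(f"mean     = {mean(receipts):.5f}")      # 18.76333
print(f"variance = {variance(receipts):.6f}")  # sample variance (n - 1)
print(f"std dev  = {stdev(receipts):.6f}")
print(f"median   = {median(receipts)}")        # 17.3
q1, q2, q3 = quantiles(receipts, n=4)          # quartiles
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}")
```

The results match the SPSS output pasted below (mean 18.76333, variance 49.043782, standard deviation 7.003127, median 17.3, Q1 = 15.35, Q3 = 22.825).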
SPSS Procedures for Computing Summary Statistics:
1. Enter the 30 data values in the first column of the SPSS Data View tab.
2. Switch to the Variable View tab and name this variable receipts. Adjust Decimals to 3 decimal places, and type Admission Receipts ($ mn) in the Label column so the label appears in the output viewer.
3. Return to Data View and click Analyze on the menu bar.
4. Click the second menu, Descriptive Statistics, then click Frequencies ….
5. Move Admission Receipts to the Variable(s) list by clicking the arrow button.
6. Click the Statistics … button at the top of the dialog box.
7. Select the descriptive statistics the question requires. This practice question asks for central tendency, dispersion, percentile, and distribution statistics, so check all the boxes except Percentile(s): and Values are group midpoints.
8. Click Continue to return to the Frequencies dialog box.
9. Click OK to generate the descriptive statistics output, which is pasted below.
The first table provides summary statistics and the second table lists frequencies, relative frequencies and cumulative frequencies. The statistics required for solving this problem are highlighted in red.
Statistics: Admission Receipts

N (Valid)                 30
N (Missing)               0
Mean                      18.76333
Std. Error of Mean        1.278590
Median                    17.30000
Mode                      19.000
Std. Deviation            7.003127
Variance                  49.043782
Skewness                  1.734
Std. Error of Skewness    .427
Kurtosis                  5.160
Std. Error of Kurtosis    .833
Range                     35.000
Minimum                   9.500
Maximum                   44.500
Sum                       562.900

Percentiles
10    10.61000
20    14.40000
25    15.35000
30    15.50000
40    15.84000
50    17.30000
60    19.00000
70    19.75000
75    22.82500
80    24.10000
90    26.69000
Admission Receipts

Value     Frequency   Percent   Valid Percent   Cumulative Percent
9.500     1           3.3       3.3             3.3
9.900     1           3.3       3.3             6.7
10.600    1           3.3       3.3             10.0
10.700    1           3.3       3.3             13.3
11.900    1           3.3       3.3             16.7
14.200    1           3.3       3.3             20.0
15.2…     (remaining rows truncated in the source)
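A sketch of how a frequency table like the one above is built: tally each distinct value, express its frequency as a percent of N, and keep a running cumulative percent. Run on the full 30-value data set, this reproduces the rows shown (and the rows truncated in the source):

```python
from collections import Counter

# The 30 admission receipts from the exercise.
receipts = [19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
            16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
            15.4, 18.2, 15.5, 14.2, 9.5, 9.9, 10.7, 11.9, 26.7, 17.5]

counts = Counter(receipts)
n = len(receipts)
cumulative = 0.0
for value in sorted(counts):
    pct = counts[value] / n * 100      # percent of N for this value
    cumulative += pct                  # running cumulative percent
    print(f"{value:6.3f}  f={counts[value]}  {pct:4.1f}%  cumulative={cumulative:5.1f}%")
```

Each value occurring once contributes 3.3% (1/30); 19.0, which occurs three times, contributes 10.0%, and the cumulative column ends at 100%.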
SAMPLING MEAN:
DEFINITION:
The term sampling mean is a statistical term used to describe the properties of statistical distributions. In statistical terms, the sample mean from a group of observations is an estimate of the population mean μ. Given a sample of size n, consider n independent random variables X1, X2, ..., Xn, each corresponding to one randomly selected observation. Each of these variables has the distribution of the population, with mean μ and standard deviation σ. The sample mean is defined to be

x̄ = (X1 + X2 + ... + Xn) / n
WHAT IT IS USED FOR:
It is used to measure the central tendency of the numbers in a data set. It can also be described as a balance point between the high numbers and the low numbers.
HOW TO CALCULATE IT:
To calculate this, just add up all the numbers, then divide by how many numbers there are.
Example: what is the mean of 2, 7, and 9?
Add the numbers: 2 + 7 + 9 = 18
Divide by how many numbers (i.e., we added 3 numbers): 18 ÷ 3 = 6
So the Mean is 6
SAMPLE VARIANCE:
DEFINITION:
The sample variance, s², is used to measure how varied a sample is. A sample is a select number of items taken from a population. For example, if you are measuring American people's weights, it wouldn't be feasible (from either a time or a monetary standpoint) to measure the weight of every person in the population. The solution is to take a sample of the population, say 1,000 people, and use that sample to estimate the weights of the whole population.
WHAT IT IS USED FOR:
The sample variance helps you figure out the spread in the data you have collected or are going to analyze. In statistical terminology, it is (approximately) the average of the squared differences from the mean.
HOW TO CALCULATE IT:
Given below are steps of how a sample variance is calculated:
• Determine the mean
• Then for each number: subtract the Mean and square the result
• Then work out the mean of those squared differences.
To work out the mean, add up all the values then divide by the number of data points.
First add up all the values from the previous step.
But how do we say "add them all up" in mathematics? We use the Roman letter Sigma: Σ
The handy Sigma Notation says to sum up as many terms as we want.
• Next we need to divide by the number of data points, which is simply done by
multiplying by "1/N":
Statistically it can be stated by the following:
•
http://www.statisticshowto.com/find-sample-size-statistics/
http://www.mathsisfun.com/algebra/sigma-notation.html
• This value is the variance
EXAMPLE:
Sam has 20 rose bushes. The number of flowers on each bush is:
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Work out the sample variance.
Step 1. Work out the mean, x̄ (some handouts write μ), of all our values.
For this example, the mean is (9 + 2 + 5 + ... + 4) / 20 = 140 / 20 = 7.
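The rose-bush example can be worked through step by step in Python. Since the handout's wording is ambiguous about dividing by N or N − 1, this sketch prints both the population variance and the sample variance for the same data:

```python
from statistics import mean, pvariance, variance

# Sam's 20 rose bushes from the example above.
flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

m = mean(flowers)                          # step 1: the mean (7.0)
sq_diffs = [(x - m) ** 2 for x in flowers] # step 2: squared deviations
print("mean =", m)
print("population variance (divide by N)  =", sum(sq_diffs) / len(flowers))
print("sample variance (divide by N - 1)  =", sum(sq_diffs) / (len(flowers) - 1))

# The statistics module provides both directly:
print(pvariance(flowers), variance(flowers))
```

The sum of squared deviations is 178, giving a population variance of 178/20 = 8.9 and a sample variance of 178/19 ≈ 9.37.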
This document provides an overview of descriptive statistics concepts and methods. It discusses numerical summaries of data like measures of central tendency (mean, median, mode) and variability (standard deviation, variance, range). It explains how to calculate and interpret these measures. Examples are provided to demonstrate calculating measures for sample data and interpreting what they say about the data distribution. Frequency distributions and histograms are also introduced as ways to visually summarize and understand the characteristics of data.
Jiawei Han, Micheline Kamber and Jian Pei
Data Mining: Concepts and Techniques, 3rd ed.
The Morgan Kaufmann Series in Data Management Systems
Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791
This document provides information about discrete and continuous probability distributions. It defines discrete and continuous random variables and gives examples of each. It describes how to calculate the mean and variance of discrete distributions. It also introduces the binomial, Poisson, and normal distributions and provides the key properties and formulas to describe and calculate probabilities for each distribution.
The document discusses measures of variation used to describe how scores in a data set are distributed. It introduces the concept of variation and explains that just as the mean describes the central point, measures of variation describe how scores deviate from the mean. Two main types are described: those based on the distance between lowest and highest scores, and those based on deviation from the mean. The range, interquartile range, and standard deviation are discussed as examples of measures of variation.
This document provides information about the normal distribution and calculating z-scores. It includes examples of calculating z-scores based on given means, standard deviations, and individual scores. It also provides examples calculating the mean and standard deviation from raw data and frequency tables. Worked examples are provided to demonstrate how to calculate z-scores in different contexts like test scores, physical attributes, and manufacturing data.
This chapter discusses numerical measures used to describe data, including measures of center (mean, median, mode), location (percentiles, quartiles), and variation (range, variance, standard deviation, coefficient of variation). It defines these terms and how to calculate and interpret them, as well as how to construct and use box and whisker plots to graphically display data distributions.
Statistical data analysis helps achieve scientific goals of description, prediction, explanation, and control. There are descriptive statistics like measures of central tendency (mean, median, mode) and variability (range, variance, standard deviation) to describe data. Inferential statistics allow inferences about populations from samples using hypothesis testing, estimation, and considerations of sampling error, assumptions, and spatial autocorrelation. Key challenges include accounting for spatial dependencies in geographic data and issues like the modifiable areal unit problem.
This document provides examples and explanations of various graphical methods for describing data, including frequency distributions, bar charts, pie charts, stem-and-leaf diagrams, histograms, and cumulative relative frequency plots. It demonstrates how to construct these graphs using sample data on student weights, grades, ages, and other examples. The goal is to help readers understand different ways to visually represent data distributions and patterns.
The document discusses various measures of central tendency, dispersion, and shape used to describe data numerically. It defines terms like mean, median, mode, variance, standard deviation, coefficient of variation, range, interquartile range, skewness, and quartiles. It provides formulas and examples of how to calculate these measures from data sets. The document also discusses concepts like normal distribution, empirical rule, and how measures of central tendency and dispersion do not provide information about the shape or symmetry of a distribution.
The document discusses various measures of central tendency, dispersion, and shape used to describe data numerically. It defines terms like mean, median, mode, variance, standard deviation, coefficient of variation, range, interquartile range, skewness, and quartiles. It provides formulas and examples of how to calculate these measures from data sets. The document also discusses concepts like normal distribution, empirical rule, and how measures of central tendency and dispersion do not provide information about the shape or symmetry of a distribution.
SAMPLING MEAN DEFINITION The term sampling mean .docxanhlodge
SAMPLING MEAN:
DEFINITION:
The term sampling mean is a statistical term used to describe the properties of statistical
distributions. In statistical terms, the sample mean from a group of observations is an
estimate of the population mean . Given a sample of size n, consider n independent random
variables X1, X2... Xn, each corresponding to one randomly selected observation. Each of these
variables has the distribution of the population, with mean and standard deviation . The
sample mean is defined to be
WHAT IT IS USED FOR:
It is also used to measure central tendency of the numbers in a database. It can also be said that
it is nothing more than a balance point between the number and the low numbers.
HOW TO CALCULATE IT:
To calculate this, just add up all the numbers, then divide by how many numbers there are.
Example: what is the mean of 2, 7, and 9?
Add the numbers: 2 + 7 + 9 = 18
Divide by how many numbers (i.e., we added 3 numbers): 18 ÷ 3 = 6
So the Mean is 6
SAMPLE VARIANCE:
DEFINITION:
The sample variance, s2, is used to calculate how varied a sample is. A sample is a select number
of items taken from a population. For example, if you are measuring American people’s weights,
it wouldn’t be feasible (from either a time or a monetary standpoint) for you to measure the
weights of every person in the population. The solution is to take a sample of the population, say
1000 people, and use that sample size to estimate the actual weights of the whole population.
WHAT IT IS USED FOR:
The sample variance helps you to figure out the spread out in the data you have collected or are
going to analyze. In statistical terminology, it can be defined as the average of the squared
differences from the mean.
HOW TO CALCULATE IT:
Given below are steps of how a sample variance is calculated:
• Determine the mean
• Then for each number: subtract the Mean and square the result
• Then work out the mean of those squared differences.
To work out the mean, add up all the values then divide by the number of data points.
First add up all the values from the previous step.
But how do we say "add them all up" in mathematics? We use the Roman letter Sigma: Σ
The handy Sigma Notation says to sum up as many terms as we want.
• Next we need to divide by the number of data points, which is simply done by
multiplying by "1/N":
Statistically it can be stated by the following:
•
http://www.statisticshowto.com/find-sample-size-statistics/
http://www.mathsisfun.com/algebra/sigma-notation.html
• This value is the variance
EXAMPLE:
Sam has 20 Rose Bushes.
The number of flowers on each bush is
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Work out the sample variance
Step 1. Work out the mean
In the formula above, μ (the Greek letter "mu") is the mean of all our values.
For this example, the data points are: 9, 2, 5, 4, 12, 7, 8,.
This document provides an introduction to inferential statistics and statistical significance. It discusses key concepts like standard error of the mean, confidence intervals, and comparing means from two samples using a t-test. The document explains how inferential statistics allow researchers to make inferences about populations based on samples and determine if observed differences are likely due to chance or a real effect.
This document discusses statistical concepts like frequency, mean, standard deviation, normal distribution, z-scores, and outliers. It provides the descriptive statistics of a survey measuring job satisfaction on a 5-point scale. The mean was 4.048, with most responses between 3-5. However, some outliers with extremely low values of 1.17 were identified, questioning the normality of the distribution. Various graphs and tests were used to further analyze the data distribution and identify outliers.
Lecture 2 Descriptive statistics.pptx
1. Part II
Each slide has its own narration in an audio file.
For the explanation of any slide, click the audio icon to start it.
Professor Friedman's Statistics Course by H & L Friedman is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
2. A third important property of data, after location and dispersion, is its shape.
Shape can be described by its degree of asymmetry (i.e., skewness).
◦ mean > median: positive or right-skewness
◦ mean = median: symmetric or zero-skewness
◦ mean < median: negative or left-skewness
Positive skewness can arise when the mean is increased by some unusually high values.
Negative skewness can arise when the mean is decreased by some unusually low values.
Descriptive Statistics II 2
3. [Figure: examples of left-skewed, right-skewed, and symmetric distributions.]
Source: Levine et al., Business Statistics, Pearson, 2013.
4. Data (hours to complete a task, for n=12 employees):
2 3 8 ┋ 8 9 10 ┋ 10 12 15 ┋ 18 22 63
X̄ = 180/12 = 15 hours
Median = 10 hours
The (extremely slow) employee who took 63 hours to complete the task skewed the entire distribution to the right.
s² = 2868/11 = 260.73
s = 16.15 hours
CV = 107.7%
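The figures on this slide can be reproduced with Python's standard `statistics` module. This is an illustrative sketch, not part of the original deck:

```python
import statistics

# Hours to complete the task, n = 12 employees (data from the slide)
hours = [2, 3, 8, 8, 9, 10, 10, 12, 15, 18, 22, 63]

mean = statistics.mean(hours)      # 180/12 = 15
median = statistics.median(hours)  # 10
s2 = statistics.variance(hours)    # sample variance, 2868/11
s = statistics.stdev(hours)        # sample standard deviation
cv = s / mean * 100                # coefficient of variation, s as a % of the mean
```

Note how far the mean (15) sits above the median (10): one extreme value pulls the mean, but not the median, toward the right tail.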
5. Scores of 17 students on a national calculus exam. Data:
0, 0, 10, 12, 15, 18, 20, 25, 30, 33, 34, 41, 56, 87, 92, 94, 95
Open MS Excel. Go to Data Analysis > Analysis Tools > Descriptive Statistics.
If you do not have Data Analysis / Analysis Tools, you have to use the Add-in feature and add it to MS Excel.
Make sure to check the Summary Statistics box once you are in Descriptive Statistics.
See MS Excel output on next slide.
6. MS Excel uses a formula, the Pearson Coefficient of Skewness, to calculate skewness. You do not have to know the formula. If the coefficient is 0 or very close to it, you have a symmetric distribution.

Column1
Mean                38.94117647
Standard Error      8.111117365
Median              30
Mode                0
Standard Deviation  33.44299364
Sample Variance     1118.433824
Kurtosis            -0.82259021
Skewness            0.782252352
Range               95
Minimum             0
Maximum             95
Sum                 662
Count               17

From the output:
• mean is 38.94
• median is 30
• mode is 0
• standard deviation is 33.44
• variance is 1118.43
• skewness is .78 (positive)
• range is 95
• n is 17
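Excel's SKEW() function computes the adjusted moment-based (Fisher-Pearson) skewness coefficient. Assuming that is the formula behind the output above, it can be reproduced in Python (a sketch added for illustration, not from the slides):

```python
import statistics

def excel_skew(data):
    """Adjusted Fisher-Pearson skewness, as computed by Excel's SKEW()."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in data)

# Exam scores from slide 5
scores = [0, 0, 10, 12, 15, 18, 20, 25, 30, 33, 34, 41, 56, 87, 92, 94, 95]
skew = excel_skew(scores)  # positive: the long tail points to high values
```

The positive sign agrees with the mean (38.94) sitting above the median (30), the right-skew pattern from slide 2.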
7. We can convert the original scores to new scores with X̄ = 0 and s = 1.
This will give us a pure number with no units of measurement.
Any score below the mean will now be negative.
Any score at the mean will be 0.
Any score above the mean will be positive.
8. To compute the Z-scores:
Z = (X − X̄) / s
Example. Data: 0, 2, 4, 6, 8, 10
X̄ = 30/6 = 5; s = 3.74

X      Z
0      (0 - 5)/3.74 = -1.34
2      (2 - 5)/3.74 = -.80
4      (4 - 5)/3.74 = -.27
6      (6 - 5)/3.74 = .27
8      (8 - 5)/3.74 = .80
10     (10 - 5)/3.74 = 1.34
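The standardization above is one line per score; a Python sketch (illustrative, not part of the deck):

```python
import statistics

def z_scores(data):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - xbar) / s for x in data]

z = z_scores([0, 2, 4, 6, 8, 10])  # mean 5, s = sqrt(14) ≈ 3.74
```

By construction the standardized scores have mean 0 and sample standard deviation 1, and scores symmetric about the mean get Z-scores of equal size and opposite sign.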
9. Data: Exam Scores
Original data Change 7 to 97 Change 23 to 93
X Z X Z X Z
65 -0.45 65 -0.81 65 -1.40
73 -0.11 73 -0.38 73 -0.79
78 0.10 78 -0.10 78 -0.40
69 -0.28 69 -0.60 69 -1.09
78 0.10 78 -0.10 78 -0.40
7 -2.89 <= 97 0.94 97 1.07
23 -2.21 23 -3.12 <= 93 0.76
98 0.94 98 0.99 98 1.14
99 0.99 99 1.05 99 1.22
99 0.99 99 1.05 99 1.22
97 0.90 97 0.94 97 1.07
99 0.99 99 1.05 99 1.22
75 -0.02 75 -0.27 75 -0.63
79 0.14 79 -0.05 79 -0.32
85 0.40 85 0.28 85 0.14
63 -0.53 63 -0.92 63 -1.56
67 -0.36 67 -0.70 67 -1.25
72 -0.15 72 -0.43 72 -0.86
73 -0.11 73 -0.38 73 -0.79
93 0.73 93 0.72 93 0.76
95 0.82 95 0.83 95 0.91
Mean 75.57 Mean 79.86 Mean 83.19
s 23.75 s 18.24 s 12.96
10. No matter what you are measuring, a Z-score of more than +5 or less than -5 would indicate a very, very unusual score.
For standardized data, if it is normally distributed, 95% of the data will be between ±2 standard deviations about the mean.
If the data follows a normal distribution:
◦ 95% of the data will be between -1.96 and +1.96.
◦ 99.7% of the data will fall between -3 and +3.
◦ 99.99% of the data will fall between -4 and +4.
Worst-case scenario: at least 75% of the data are within 2 standard deviations of the mean [Chebyshev's inequality].
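The 95%-within-±1.96 figure for normal data can be checked by simulation. A sketch; the seed and sample size here are arbitrary choices, not from the slides:

```python
import random

random.seed(0)
# 100,000 draws from a standard normal distribution (mean 0, sd 1)
draws = [random.gauss(0, 1) for _ in range(100_000)]

# Fraction of draws within ±1.96 standard deviations of the mean
within = sum(1 for z in draws if -1.96 <= z <= 1.96) / len(draws)
# `within` comes out very close to 0.95
```

Chebyshev's inequality, by contrast, makes no distributional assumption, which is why its 2-standard-deviation guarantee is only 75%.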
11. When examining a distribution for shape, sometimes the five-number summary is useful:
Smallest | Q1 | Median | Q3 | Largest
Example. Data (sorted): 2 3 8 8 9 10 10 12 15 18 22 63
X̄ = 15
5-number summary: 2 | 8 | 10 | 16.5 | 63
This data is right-skewed. In right-skewed distributions, the distance from Q3 to X_largest (16.5 to 63) is significantly greater than the distance from X_smallest to Q1 (2 to 8).
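The slide's quartiles (Q1 = 8, Q3 = 16.5) follow the "median of each half" convention (Tukey's hinges). A Python sketch under that assumption; note that library routines such as `statistics.quantiles()` use interpolation rules that can give slightly different quartiles for the same data:

```python
import statistics

def five_number_summary(data):
    """Smallest | Q1 | Median | Q3 | Largest, with quartiles taken as the
    medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[-half:]  # middle value excluded when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

hours = [2, 3, 8, 8, 9, 10, 10, 12, 15, 18, 22, 63]
summary = five_number_summary(hours)  # (2, 8, 10, 16.5, 63)
```

The asymmetry is visible directly in the summary: the upper gap (63 - 16.5) dwarfs the lower gap (8 - 2).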
12. The boxplot is a way to graphically portray a distribution of data by means of its five-number summary.
A boxplot can be drawn horizontally or vertically. For a horizontal boxplot:
◦ the vertical line drawn within the box is the median
◦ the vertical line at the left side of the box is Q1
◦ the vertical line at the right side of the box is Q3
◦ the line on the left connects the left side of the box with X_smallest (lower 25% of data)
◦ the line on the right connects the right side of the box with X_largest (upper 25% of data)
13. A “bell-shaped” symmetric data distribution would look like this:
[Figure: bell-shaped curve.]
14. We summarize categorical data using frequencies and graphical methods.
15. A frequency distribution records data grouped into classes and the number of observations that fell into each class.
A frequency distribution can be used for:
◦ categorical data
◦ numerical data that can be grouped into intervals
◦ numerical data with repeated observations
A percentage distribution records the percent of the observations that fell into each class.
16. Example. A sample was taken of 200 professors at a (fictitious) local college. Each was asked for his or her (take-home) weekly salary. The responses ranged from about $520 to $590. If we wanted to display the data in, say, 7 equal intervals, we would use an interval width of $10.

Width of interval = Range / Number of classes = $70 / 7 = $10 per class.

The Frequency / Percentage Distribution:

Take-home pay          Frequency   Percentage
$520 and under $530        6           3%
$530 and under $540       30          15%
$540 and under $550       38          19%
$550 and under $560       52          26%
$560 and under $570       42          21%
$570 and under $580       24          12%
$580 to $590               8           4%
Total                    200         100%
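The grouping step itself is mechanical. A Python sketch; the salary values in the demo list below are hypothetical, chosen only so each lands in a known class:

```python
from collections import Counter

def bin_frequencies(data, low, width, n_classes):
    """Count observations per class [low + k*width, low + (k+1)*width);
    a value exactly on the top boundary goes in the last class."""
    counts = Counter()
    for x in data:
        k = min(int((x - low) // width), n_classes - 1)
        counts[k] += 1
    return [(low + k * width, low + (k + 1) * width, counts[k])
            for k in range(n_classes)]

# Hypothetical weekly take-home salaries in the slide's $520-$590 range
salaries = [521, 534, 535, 555, 560, 589]
table = bin_frequencies(salaries, low=520, width=10, n_classes=7)
```

Dividing each count by the sample size then yields the percentage distribution.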
17. A Cumulative Distribution focuses on the number or percentage of cases that lie below or above specified values rather than within intervals.

Take-home pay      Frequency   Percentage
less than $520         0            0
less than $530         6            3
less than $540        36           18
less than $550        74           37
less than $560       126           63
less than $570       168           84
less than $580       192           96
less than $590       200          100
21. Categorical data: graphical representation
◦ Contingency Table
◦ Side-by-Side Bar Chart
Numerical data: looking for relationships in bivariate data
◦ Scatter Plot
◦ Correlation
◦ The Regression Line
22. Two categorical variables are most easily displayed in a contingency table. This is a table of two-way frequencies. This also works for two-way percentages.
Example: “Who would you vote for in the next election?”

                       Male   Female   Total
Republican Candidate    250      250     500
Democrat Candidate      150      350     500
Total                   400      600    1000
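A contingency table is a count over pairs of category labels. A Python sketch; the respondent records below are reconstructed to match the slide's totals, not real survey data:

```python
from collections import Counter

# Hypothetical (gender, candidate) records matching the slide's cell counts
responses = ([("Male", "Republican")] * 250 + [("Female", "Republican")] * 250
             + [("Male", "Democrat")] * 150 + [("Female", "Democrat")] * 350)

table = Counter(responses)                                   # cell counts
row_totals = Counter(candidate for _, candidate in responses)  # by candidate
col_totals = Counter(gender for gender, _ in responses)        # by gender
```

Dividing each cell by a row, column, or grand total turns this into the two-way percentage table mentioned on the slide.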
24. What can we do with 2 numerical variables? We can graph them.
Example: Grade and Height (in inches)

Y (Grade)   100  95  90  80  70  65  60  40  30  20
X (Height)   73  79  62  69  74  77  81  63  68  74
25. The correlation coefficient is r = .12.
The coefficient of determination is r² = .01.
We will learn about the above measures, as well as more about scatter plots, in the topic on CORRELATION.
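The r = .12 figure can be reproduced from the slide-24 data by writing out the Pearson definition directly (a sketch; Python 3.10+ also ships this as `statistics.correlation`):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

grade = [100, 95, 90, 80, 70, 65, 60, 40, 30, 20]
height = [73, 79, 62, 69, 74, 77, 81, 63, 68, 74]
r = pearson_r(height, grade)  # ≈ .12, so r**2 ≈ .01
```

An r this close to 0 says there is essentially no linear relationship between height and grade, which is what the flat scatter on the slide shows.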
26. Practice, practice, practice.
◦ As always, do lots and lots of problems. You can
find these in the online lecture notes and
homework assignments.