3.4.-variance-and-stndard-deviation.pdf


DebarpanHaldar1


Measures of Variability
(DATA SCIENCE BASIC)
Variance and
Standard Deviation

Measures of Variability
Another measure of the variability in a
data set uses the deviations from the
mean, $(x - \bar{x})$.

Remember the sample of 6 fish that we caught from
the lake . . .
They were the following lengths:
3”, 4”, 5”, 6”, 8”, 10”
The mean length was 6 inches. Recall that we
calculated the deviations from the mean. What was the
sum of these deviations?
Can we find an average deviation?
What can we do to the deviations so that
we could find an average?
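
The questions above can be checked numerically. The following is a minimal Python sketch (the variable names are illustrative and not from the slides). It suggests why a plain average of the deviations is not useful: the positive and negative deviations cancel, so they need to be squared before averaging.

```python
# Fish lengths in inches, taken from the slide.
lengths = [3, 4, 5, 6, 8, 10]

mean_length = sum(lengths) / len(lengths)

# Deviation of each observation from the mean.
deviations = [x - mean_length for x in lengths]

# Positive and negative deviations cancel, so their sum is zero.
sum_of_deviations = sum(deviations)

# Squaring removes the signs, giving a sum that can be averaged.
squared_deviations = [d ** 2 for d in deviations]

print(mean_length, deviations, sum_of_deviations, sum(squared_deviations))
```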

The estimated average of the deviations squared
is called the variance.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Standard Deviation
- is the square root of the variance.
- is, roughly, the typical distance of the observations from the
center (mean).
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
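
As a rough illustration of the two population formulas, here is a short Python sketch. The function names are hypothetical, and treating the six fish as an entire population is an assumption made only for this example (the slides treat them as a sample).

```python
import math

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

# Assumption for illustration: the six fish are the entire population.
fish = [3, 4, 5, 6, 8, 10]
print(population_variance(fish))
print(population_std(fish))
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same calculation.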

Notations
$\sigma$ for population standard deviation
$s$ for sample standard deviation

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Degrees of freedom: $n - 1$ (the denominator)
When calculating sample variance, we use the degrees of freedom (n – 1)
in the denominator instead of n because dividing by n tends to
underestimate the population variance, whereas dividing by n – 1
gives an unbiased estimate.
Degrees of freedom will be revisited in Chapter 8.
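
The effect of the n – 1 denominator can be seen in a small simulation. This is only a sketch under stated assumptions: the die-face population, the sample size of 5, and the number of trials are all hypothetical choices, not part of the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: the faces of a fair six-sided die.
population = [1, 2, 3, 4, 5, 6]
true_variance = statistics.pvariance(population)  # 35/12, about 2.917

n = 5            # sample size
trials = 20_000  # number of repeated samples

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = random.choices(population, k=n)  # sampling with replacement
    sum_div_n += statistics.pvariance(sample)         # divides by n
    sum_div_n_minus_1 += statistics.variance(sample)  # divides by n - 1

avg_div_n = sum_div_n / trials
avg_div_n_minus_1 = sum_div_n_minus_1 / trials

print(true_variance, avg_div_n, avg_div_n_minus_1)
```

Averaged over many samples, the n – 1 version should land close to the true variance, while the version that divides by n should come out smaller by a factor of about (n – 1)/n.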

Remember the sample of 6 fish that we caught from the lake . . .
Find the variance of the length of fish.

First square the deviations. What is the sum of the deviations
squared? Divide this by 5.

$x$     $(x - \bar{x})$     $(x - \bar{x})^2$
3       -3                  9
4       -2                  4
5       -1                  1
6        0                  0
8        2                  4
10       4                  16
Sum      0                  34

$s^2 = \frac{34}{5} = 6.8$
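
The worked table can be reproduced step by step. The sketch below is one possible way to do it in Python; the variable names and the print layout are illustrative.

```python
lengths = [3, 4, 5, 6, 8, 10]  # fish lengths in inches, from the slide
n = len(lengths)
mean_length = sum(lengths) / n

print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
sum_sq = 0.0
for x in lengths:
    dev = x - mean_length
    sum_sq += dev ** 2
    print(f"{x:>4} {dev:>9.0f} {dev ** 2:>13.0f}")

sample_variance = sum_sq / (n - 1)  # divide by degrees of freedom, n - 1
print(f"sum of squared deviations = {sum_sq:.0f}")
print(f"s^2 = {sample_variance:.1f}")
```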

A typical deviation from the mean is the
standard deviation.
$s^2 = 6.8$ inches$^2$, so $s = \sqrt{6.8} \approx 2.608$ inches
The fish in our sample deviate from the mean of
6 by an average of 2.608 inches.
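
As a cross-check, Python's standard `statistics` module gives the same figures; `statistics.variance` and `statistics.stdev` both use the n – 1 denominator.

```python
import statistics

lengths = [3, 4, 5, 6, 8, 10]  # fish lengths in inches

s_squared = statistics.variance(lengths)  # sample variance (n - 1 denominator)
s = statistics.stdev(lengths)             # sample standard deviation

print(round(s_squared, 1), round(s, 3))
```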

The most commonly used measures of
center and variability are the mean
and standard deviation, respectively.

Choosing Measures of Center and Spread
- Mean and Standard Deviation
- Median and Interquartile Range

• The median and IQR are usually better than
the mean and standard deviation for
describing a skewed distribution or a
distribution with outliers.
• Use mean and standard deviation only for
reasonably symmetric distributions that don’t
have outliers.
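
To see why outliers matter, the sketch below adds one hypothetical 40-inch fish to the sample and compares the two pairs of measures. The outlier value and the `summarize` helper are assumptions for illustration. Quartile conventions also differ between textbooks and software, so the IQR printed here (from `statistics.quantiles` with its default method) may not match a hand calculation exactly.

```python
import statistics

def summarize(values):
    """Return (mean, sample SD, median, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return (
        statistics.mean(values),
        statistics.stdev(values),
        statistics.median(values),
        q3 - q1,
    )

fish = [3, 4, 5, 6, 8, 10]
fish_with_outlier = fish + [40]  # hypothetical 40-inch fish added as an outlier

for label, data in [("original", fish), ("with outlier", fish_with_outlier)]:
    mean, sd, median, iqr = summarize(data)
    print(f"{label:>12}: mean={mean:.2f} sd={sd:.2f} median={median} iqr={iqr:.2f}")
```

In this example the mean and standard deviation move much more than the median and IQR when the single outlier is added.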

Rule of Thumb
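For roughly bell-shaped data, the range is approximately 4 times
the standard deviation, so range / 4 gives a quick rough estimate of
the standard deviation.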
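
The rule of thumb can be compared with the actual standard deviation. This is a small sketch: the helper name is hypothetical, and the two data sets are the fish sample and the 16-value symmetric data set that appears in this deck.

```python
import statistics

def range_rule_estimate(values):
    """Rough SD estimate from the range rule of thumb: range / 4."""
    return (max(values) - min(values)) / 4

fish = [3, 4, 5, 6, 8, 10]
symmetric = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]

for label, data in [("fish", fish), ("symmetric", symmetric)]:
    estimate = range_rule_estimate(data)
    actual = statistics.stdev(data)
    print(f"{label}: range/4 = {estimate:.2f}, s = {actual:.2f}")
```

The estimate is close for the symmetric, mound-shaped data set and noticeably off for the small fish sample, so the rule is best treated as a quick sanity check.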

Symmetrical Distribution of Data
Consider the following data set:
4 5 6 6 6 7 7 7 7 7 7 8 8 8 9 10
This data set produces the histogram shown below. Each interval has width one and each
value is located in the middle of an interval. The histogram displays
a symmetrical distribution of data.
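
Since the histogram image is not reproduced here, the following Python sketch prints a simple text version and checks the symmetry. The layout and the symmetry test are illustrative choices, not the slide's own method.

```python
from collections import Counter

data = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]
counts = Counter(data)

# Text histogram: one '#' per observation in each width-one interval.
for value in range(min(data), max(data) + 1):
    print(f"{value:>2} | {'#' * counts[value]}")

# Symmetry check: counts mirrored about the centre value should match.
center = (min(data) + max(data)) / 2
is_symmetric = all(counts[v] == counts[int(2 * center - v)] for v in counts)
print("center:", center, "symmetric:", is_symmetric)
```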

Skewness
• Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A
distribution, or data set, is symmetric if it looks the same to the left and right of the
center point.
• The skewness for a normal distribution is zero, and any symmetric data should have
a skewness near zero. Negative values for the skewness indicate data that are
skewed left and positive values for the skewness indicate data that are skewed right.
By skewed left, we mean that the left tail is long relative to the right tail. Similarly,
skewed right means that the right tail is long relative to the left tail.
• [Ref: https://en.wikipedia.org/wiki/Skewness] Skewness in a data series may
sometimes be observed not only graphically but by simple inspection of the
values. For instance, consider the numeric sequence (49, 50, 51), whose values are
evenly distributed around a central value of 50. We can transform this sequence
into a negatively skewed distribution by adding a value far below the mean, e.g.
(40, 49, 50, 51). Similarly, we can make the sequence positively skewed by adding
a value far above the mean, e.g. (49, 50, 51, 60).
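
The three sequences above can be checked with a short calculation. The slides do not give a formula for skewness, so the sketch below assumes one common definition, the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5). Other definitions may give different magnitudes.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

for seq in [(49, 50, 51), (40, 49, 50, 51), (49, 50, 51, 60)]:
    print(seq, round(skewness(seq), 3))
```

Under this definition the first sequence gives zero, the second a negative value, and the third a positive value of the same size.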

Skewness
Denoted by Sk
Sk = 0 Symmetric
Sk > 0 Positively skewed
Sk < 0 Negatively skewed

[Figure: three distribution curves. Panels: negative skew; positive skew; symmetrical distribution (Sk = 0), where the mean, median and mode lie on the same line.]


