Statistics 
Deepu Krishnan R
PREFACE 
Mathematics forms an integral part of everyday life. We have to 
teach it with freshness and variety to make it meaningfully 
applicable to life. Statistics helps you interpret data in your daily 
lives and make good decisions! For example, is it possible to eat 
too much grapefruit? Is it safer for your brain cells to use a 
headset when you talk on the phone? Can an online profile help 
you get a job? What steps can you take during college to increase 
your future salary? I cannot claim that all the materials I have 
written in this book are mine. I have learned the subject from 
many excellent books. This textbook is designed to meet the
everyday requirements of students at school and the general 
readers of mathematics. 
Suggestions for improvement are welcome. 
The Author
Contents 
1. Measures of Central Tendency
1.1 Mean 
1.2 Median 
1.3 Mode 
2. Measures of Dispersion 
2.1 Variance 
2.2 Standard deviation 
3. Central Moments 
3.1 Skewness 
3.2 Kurtosis
Unit 1 
Measures of Central Tendency 
Introduction 
Measures of central tendency are statistical
measures which describe the position of a distribution.
They are also called statistics of location, and are the
complement of statistics of dispersion, which provide
information concerning the spread of the
observations. In the univariate context, the mean, median
and mode are the most commonly used measures of
central tendency. They are computable values that
describe the behavior of the center of a distribution.
Measures of Central Tendency 
The value or figure which represents the whole
series is neither the lowest value in the series nor the
highest; it lies somewhere between these two extremes.
The average represents all the measurements made on
a group, and gives a concise description of the group as
a whole.
When two or more groups are measured, the central
tendency provides the basis of comparison between
them.
1.1 Mean
In mathematics, mean has several different definitions 
depending on the context. 
In probability and statistics, mean and expected value are 
used synonymously to refer to one measure of the central
tendency either of a probability distribution or of 
the random variable characterized by that distribution.[1] In 
the case of a discrete probability distribution of a random 
variable X, the mean is equal to the sum over every 
possible value weighted by the probability of that value; 
that is, it is computed by taking the product of each 
possible value x of X and its probability P(x), and then 
adding all these products together, giving $\mu = \sum x P(x)$. An
analogous formula applies to the case of a continuous 
probability distribution. Not every probability distribution 
has a defined mean; see the Cauchy distribution for an 
example. Moreover, for some distributions the mean is 
infinite: for example, when the probability of the value
$2^n$ is $\tfrac{1}{2^n}$ for n = 1, 2, 3, …
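As a minimal sketch of the weighted sum just described, the following Python snippet computes the mean of a discrete distribution. The fair die and the function name `discrete_mean` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def discrete_mean(distribution):
    """Mean of a discrete distribution given as {value: probability}."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    # Multiply each possible value by its probability and add the products.
    return sum(value * prob for value, prob in distribution.items())


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = discrete_mean(fair_die)
print(die_mean)  # 7/2
```

Exact fractions are used here so that the check on the total probability is not disturbed by floating-point rounding.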
For a data set, the terms arithmetic mean, mathematical 
expectation, and sometimes average are used 
synonymously to refer to a central value of a discrete set 
of numbers: specifically, the sum of the values divided by 
the number of values. The arithmetic mean of a set of 
numbers x1, x2, ..., xn is typically denoted by $\bar{x}$, pronounced
"x bar". If the data set is based on a series of
observations obtained by sampling from a statistical
population, the arithmetic mean is termed the sample
mean (denoted $\bar{x}$) to distinguish it from the population
mean (denoted $\mu$ or $\mu_x$).
Types of mean 
In mathematics, the three classical Pythagorean 
means are the arithmetic mean (A), the geometric 
mean (G), and the harmonic mean (H). 
They are defined by: 
Arithmetic mean 
The most common type of average is the arithmetic 
mean. If n numbers are given, each number denoted
by ai, where i = 1, ..., n, the arithmetic mean is the sum of
the ai's divided by n, or
$AM = \frac{1}{n}\sum_{i=1}^{n} a_i = \frac{a_1 + a_2 + \cdots + a_n}{n}$
The arithmetic mean, often simply called the mean, of 
two numbers, such as 2 and 8, is obtained by finding a 
value A such that 2 + 8 = A + A. One may find
that A = (2 + 8)/2 = 5. Switching the order of 2 and 8 to 
read 8 and 2 does not change the resulting value 
obtained for A. The mean 5 is not less than the 
minimum 2 or greater than the maximum 8. If we 
increase the number of terms in the list to 2, 8, and 11, 
the arithmetic mean is found by solving for the value 
of A in the equation 2 + 8 + 11 = A + A + A. One finds 
that A= (2 + 8 + 11)/3 = 7. 
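Methods of calculating the arithmetic mean of a frequency distribution:
• Direct method: $\bar{x} = \frac{\sum f x}{\sum f}$
• Short-cut method: $\bar{x} = x_0 + \frac{\sum f d}{\sum f}$, where $x_0$ is an assumed mean and $d = x - x_0$
• Step-deviation method: $\bar{x} = x_0 + \frac{\sum f d'}{\sum f} \times c$, where $d' = (x - x_0)/c$ and c is a common factor of the deviations, usually the class width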
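The three methods named above give the same answer when applied to the same frequency table. The sketch below assumes their usual textbook forms: the direct method divides the sum of f·x by the sum of f, the short-cut method adds the mean deviation from an assumed mean, and the step-deviation method does the same with deviations divided by a common step. The data and function names are hypothetical.

```python
def mean_direct(values, freqs):
    """Direct method: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_shortcut(values, freqs, assumed_mean):
    """Short-cut method: assumed mean plus the mean deviation from it."""
    total = sum(freqs)
    deviation_sum = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + deviation_sum / total


def mean_step_deviation(values, freqs, assumed_mean, step):
    """Step-deviation method: deviations are first divided by a common step."""
    total = sum(freqs)
    step_sum = sum(f * ((x - assumed_mean) / step) for x, f in zip(values, freqs))
    return assumed_mean + step * step_sum / total


# Hypothetical frequency table.
values = [10, 20, 30, 40, 50]
freqs = [2, 3, 5, 3, 2]

direct = mean_direct(values, freqs)
shortcut = mean_shortcut(values, freqs, assumed_mean=20)
step_dev = mean_step_deviation(values, freqs, assumed_mean=20, step=10)
print(direct, shortcut, step_dev)  # all three agree (30 for this table)
```

The short-cut and step-deviation forms exist only to simplify hand calculation; the choice of assumed mean does not change the result.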
Geometric mean 
The geometric mean of n non-negative numbers is
obtained by multiplying them all together and then taking
the nth root. In algebraic terms, the geometric mean
of a1, a2, …, an is defined as
$GM = \sqrt[n]{a_1 a_2 \cdots a_n}$
The geometric mean can be thought of as the antilog of the
arithmetic mean of the logs of the numbers.
Example: the geometric mean of 2 and 8 is $\sqrt{2 \times 8} = 4$.
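Both descriptions of the geometric mean, the nth root of the product and the antilog of the mean of the logs, can be checked against each other. The following is a small sketch; the function names are illustrative, and the log version assumes strictly positive inputs.

```python
import math


def geometric_mean_root(numbers):
    """nth root of the product of the numbers."""
    return math.prod(numbers) ** (1 / len(numbers))


def geometric_mean_logs(numbers):
    """Antilog of the arithmetic mean of the logs (positive numbers only)."""
    return math.exp(sum(math.log(a) for a in numbers) / len(numbers))


gm_root = geometric_mean_root([2, 8])
gm_logs = geometric_mean_logs([2, 8])
print(gm_root, gm_logs)  # both are 4, up to floating-point rounding
```

The log form is usually preferred for long lists, because the raw product can overflow or underflow.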
Harmonic mean 
The harmonic mean of a non-empty collection of
numbers a1, a2, …, an, all different from 0, is defined as
the reciprocal of the arithmetic mean of the reciprocals
of the ai's:
$HM = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}$
One example where the harmonic mean is useful is 
when examining the speed for a number of fixed-distance 
trips. For example, if the speed for going 
from point A to B was 60 km/h, and the speed for 
returning from B to A was 40 km/h, then the harmonic 
mean speed is given by $\frac{2}{\frac{1}{60} + \frac{1}{40}} = 48$ km/h.
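The round-trip example can be verified numerically. This sketch computes the harmonic mean of the two speeds and compares it with total distance divided by total time; the one-way distance of 120 km is an arbitrary assumption and cancels out of the result.

```python
def harmonic_mean(numbers):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(numbers) / sum(1 / a for a in numbers)


speeds = [60, 40]  # km/h out and back
average_speed = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time.
distance = 120                                     # km each way (assumed)
total_time = distance / 60 + distance / 40         # 2 h + 3 h
round_trip_speed = 2 * distance / total_time

print(average_speed, round_trip_speed)  # both about 48 km/h
```

The arithmetic mean (50 km/h) would overstate the average speed, because more time is spent travelling at the slower speed.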
Inequality concerning AM, GM, and HM 
A well-known inequality concerning arithmetic,
geometric, and harmonic means for any set of
positive numbers is
$AM \ge GM \ge HM$
It is easy to remember by noting that the
alphabetical order of the letters A, G, and H is
preserved in the inequality. See the inequality of
arithmetic and geometric means.
Thus for the above harmonic mean example: AM
= 50, GM ≈ 49, and HM = 48 km/h.
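The ordering of the three means for the speeds 60 and 40 can be confirmed with Python's standard `statistics` module (its `geometric_mean` function requires Python 3.8 or later). This is only a numerical check for one data set, not a proof of the inequality.

```python
from statistics import mean, geometric_mean, harmonic_mean

speeds = [60, 40]  # km/h, from the round-trip example

am = mean(speeds)
gm = geometric_mean(speeds)
hm = harmonic_mean(speeds)

# AM is 50, GM is about 48.99 (rounded to 49 in the text), HM is 48.
print(am, round(gm, 2), hm)
print(am >= gm >= hm)  # True
```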
Problems 
1. Calculate the AM, GM, and HM of the following frequency distribution.
x   15  12  15  23  14  17  18  19  20  16
f    4   3   2   3   5   4   1   2   7   8
1.2 Median
The median is a central value of the distribution: the
value which divides the distribution into two equal parts,
each part containing an equal number of items. Thus it is
the central value of the variable when the values are
arranged in order of magnitude.
Connor defined it as follows: “The median is that value of
the variable which divides the group into two equal
parts, one part comprising all values greater, and
the other all values less than the median.”
Calculation of median – Discrete series:
• Arrange the data in ascending or descending
order.
• Calculate the cumulative frequencies.
• Apply the formula: Median = size of the ((N + 1)/2)th item, where N is the total frequency.
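The steps above can be sketched in Python, assuming the formula referred to is the usual rule that the median is the value of the ((N + 1)/2)th item, located through the cumulative frequencies. The frequency table and the function name are hypothetical.

```python
from statistics import median


def median_discrete(values, freqs):
    """Median of a discrete frequency table via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))          # step 1: arrange in order
    total = sum(f for _, f in pairs)
    position = (total + 1) / 2                  # rank of the middle item
    cumulative = 0
    for value, freq in pairs:
        cumulative += freq                      # step 2: cumulative frequency
        if cumulative >= position:              # step 3: first class reaching the rank
            return value
    raise ValueError("empty frequency table")


# Hypothetical table with N = 19 items, so the middle item is the 10th.
values = [10, 20, 30, 40]
freqs = [3, 7, 5, 4]
table_median = median_discrete(values, freqs)

# Cross-check by expanding the table into raw observations.
raw = [x for x, f in zip(values, freqs) for _ in range(f)]
print(table_median, median(raw))  # 20 20
```

When N is even and the two middle items differ, this rule returns the upper of the two rather than their average, so it is best read as the textbook convention rather than a general-purpose median.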
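Calculation of median – Continuous series
For calculation of the median in a continuous frequency
distribution, the following formula is employed.
Algebraically,
$\text{Median} = L + \frac{\frac{N}{2} - cf}{f} \times c$
where L is the lower limit of the median class, N is the total frequency, cf is the cumulative frequency of the classes before the median class, f is the frequency of the median class, and c is the class width.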
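For a continuous series, a minimal sketch follows. It assumes the usual grouped-data interpolation formula, Median = L + ((N/2 − cf) / f) × c, with L the lower limit of the median class, cf the cumulative frequency before that class, f its frequency, and c the class width. The class limits and frequencies are hypothetical.

```python
def median_grouped(lower_limits, width, freqs):
    """Median of equal-width classes by interpolation inside the median class."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("empty frequency table")
    half = total / 2
    cumulative = 0                                  # cf: frequency before the class
    for lower, freq in zip(lower_limits, freqs):
        if cumulative + freq >= half:               # this is the median class
            return lower + (half - cumulative) / freq * width
        cumulative += freq


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]
grouped_median = median_grouped(lower_limits, 10, freqs)

# N = 40, N/2 = 20, median class 20-30 with cf = 13 and f = 12,
# so the median is 20 + (7 / 12) * 10, about 25.83.
print(round(grouped_median, 2))
```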
Advantages of Median:
• The median can be calculated for all distributions.
• The median can be understood even by common people.
• The median can be ascertained even when extreme
items are present.
• It can be located graphically.
• It is most useful when dealing with qualitative data that can be ranked.
Disadvantages of Median:
• It is not based on all the values.
• It is not capable of further mathematical treatment.
• It is affected by fluctuations of sampling.
• With an even number of values, it may not be a value from
the data.
1.3 Mode
The mode is the most frequent value or score in the
distribution. It is defined as the value of the item that
occurs most often in a series. For example, in the list
1, 2, 2, 3, 4, 7, 9 the mode is 2.
The median, by contrast, depends on position rather than
frequency. To find it, order the list
according to its elements' magnitude and then
repeatedly remove the pair consisting of the highest
and lowest values until either one or two values are
left. If exactly one value is left, it is the median; if two
values, the median is the arithmetic mean of these
two. This method takes the list 1, 7, 3, 13 and orders
it to read 1, 3, 7, 13. Then the 1 and 13 are
removed to obtain the list 3, 7. Since there are two
elements in this remaining list, the median is their
arithmetic mean, (3 + 7)/2 = 5.
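The pair-removal procedure for the median and the definition of the mode as the most frequent value can be sketched as follows. The function name `median_by_trimming` and the second list are illustrative assumptions; `multimode` comes from Python's standard `statistics` module (Python 3.8 or later) and returns every value tied for most frequent.

```python
from statistics import multimode


def median_by_trimming(numbers):
    """Order the list, then strip the lowest/highest pair until 1 or 2 remain."""
    ordered = sorted(numbers)
    if not ordered:
        raise ValueError("median of an empty list is undefined")
    while len(ordered) > 2:
        ordered = ordered[1:-1]          # remove the lowest and highest values
    return sum(ordered) / len(ordered)   # the value itself, or the mean of two


trimmed_median = median_by_trimming([1, 7, 3, 13])
print(trimmed_median)  # 5.0

modes = multimode([1, 2, 2, 3, 4, 7, 9])
print(modes)  # [2]
```

A list can have more than one mode, which is why `multimode` returns a list rather than a single value.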
Advantages of Mode:
• The mode is readily comprehensible and easily calculated.
• It is often the most typical value and so a good representative of the data.
• It is not affected by extreme values.
• The value of the mode can also be determined
graphically.
• It is usually an actual value of an important part of the
series.
Disadvantages of Mode:
• It is not based on all observations.
• It is not capable of further mathematical manipulation.
• The mode is affected to a great extent by sampling
fluctuations.
• The choice of grouping has a great influence on the value of
the mode.
Unit 2 
Measures of Dispersion 
Introduction 
Measures of dispersion are descriptive statistics that
describe how similar a set of scores are to each other.
The more similar the scores are to each other, the
lower the measure of dispersion will be.
The less similar the scores are to each other, the
higher the measure of dispersion will be.
In general, the more spread out a distribution is, the
larger the measure of dispersion will be.
There are three main measures of dispersion:
1. The range
2. The semi-interquartile range (SIR)
3. Variance / standard deviation
2.1 Variance
The average of the squared deviations from
the mean (as opposed to the average of the absolute
deviations) is called the variance.
The variance is the usual measure of dispersion in
statistical theory, but it has a drawback when researchers
want to describe the dispersion in data in a practical way:
it is expressed in squared units of the original data.
To calculate the variance:
1. Find the mean of the data.
Hint – the mean is the average, so add up the values and
divide by the number of items.
2. Subtract the mean from each value – the result is called
the deviation from the mean.
3. Square each deviation from the mean.
4. Add up the squared deviations.
5. Divide the total by the number of items.
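The variance formula includes the Sigma Notation, $\sum$,
which represents the sum of all the items to the right of
Sigma.
$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
where $\mu$ is the mean of the data and n is the number of items.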
2.2 Standard deviation
The standard deviation shows the variation in data. If the
data is close together, the standard deviation will be small.
If the data is spread out, the standard deviation will be
large.
The standard deviation is the square root of the variance, so
it is expressed in the same units as the data. It is often
denoted by the lowercase Greek letter sigma, $\sigma$.
The standard deviation formula can be represented
using Sigma Notation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$
Problem: The math test scores of five students are 92, 88, 80, 68 and 52.
Find the variance and standard deviation.
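One way to work through this problem is to follow the steps for the variance literally: find the mean, take the deviations, square them, and divide the total by the number of items. The sketch below divides by n, as the text does; a sample variance that divides by n − 1 would give a different figure.

```python
import math

scores = [92, 88, 80, 68, 52]
n = len(scores)

mean = sum(scores) / n                      # 380 / 5 = 76
deviations = [x - mean for x in scores]     # 16, 12, 4, -8, -24
squared = [d ** 2 for d in deviations]      # 256, 144, 16, 64, 576
variance = sum(squared) / n                 # 1056 / 5 = 211.2
std_dev = math.sqrt(variance)               # about 14.53

print(mean, variance, round(std_dev, 2))
```

The standard deviation of about 14.5 marks is easier to interpret than the variance of 211.2, because it is in the same units as the scores.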
Unit 3 
Central moments 
Introduction 
Central moments – the r-th central moment is the average of the deviations of all
observations in a dataset from the mean of the
observations, each raised to the power r:
$M_r = \frac{\sum (X - m)^r}{n}$
In this equation, n is the number of observations,
X is the value of each individual observation, m is the
arithmetic mean of the observations, and r is a positive
integer.
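The definition translates directly into a short function. In this sketch, the name `central_moment` and the data set are hypothetical; the variables n, m and r match the symbols defined in the text.

```python
def central_moment(data, r):
    """Average of (X - m)**r over all observations, where m is the mean."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** r for x in data) / n


# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

moment1 = central_moment(data, 1)   # deviations from the mean cancel out
moment2 = central_moment(data, 2)   # the variance
moment3 = central_moment(data, 3)
moment4 = central_moment(data, 4)

print(moment1, moment2, moment3, moment4)  # 0.0 4.0 5.25 44.5
```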
Four central moments are commonly used:
• The first central moment, r = 1, is the average of the
differences of each observation from the sample
average (arithmetic mean), which always equals 0.
• The second central moment, r = 2, is the variance.
• The third central moment, r = 3, is used to measure skewness.
• The fourth central moment, r = 4, is used to measure kurtosis.
3.1 Skewness
Skewness describes how the sample differs in shape
from a symmetrical distribution. A normal distribution
has a skewness of 0; a right-skewed distribution has a
skewness greater than 0 and a left-skewed distribution has a
skewness less than 0.
Negatively skewed
distributions, skewed to the left, occur when most
of the scores are toward the high end of the
distribution. In a normal distribution, where skewness is
0, the mean, median and mode are equal. In a
negatively skewed distribution, typically mode > median >
mean.
Positively skewed distributions occur when most of
the scores are toward the low end of the distribution.
In a positively skewed distribution, typically mode < median <
mean.
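The sign convention can be illustrated with a moment-based coefficient. The sketch below assumes the common definition of skewness as the third central moment divided by the second central moment raised to the power 3/2; the text itself only states the signs, so this particular coefficient and the data are assumptions.

```python
from statistics import mean, median, mode


def central_moment(data, r):
    """Average of (X - m)**r over all observations, where m is the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)


def skewness(data):
    """Third central moment scaled by the variance to the power 3/2 (assumed form)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


# Hypothetical data with most scores toward the low end (a long right tail).
right_skewed = [2, 4, 4, 4, 5, 5, 7, 9]
left_skewed = [-x for x in right_skewed]    # mirror image: long left tail

print(skewness(right_skewed))   # positive
print(skewness(left_skewed))    # negative

# For the right-skewed data: mode 4 < median 4.5 < mean 5.
print(mode(right_skewed), median(right_skewed), mean(right_skewed))
```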
3.2 Kurtosis
Kurtosis is based on the 4th central moment.
It describes the “peakedness” of a distribution:
it measures the extent to which the data are
distributed in the tails versus the center of the
distribution.
There are three types of peakedness:
• Leptokurtic – very peaked
• Platykurtic – relatively flat
• Mesokurtic – in between
Measured as excess kurtosis (kurtosis minus 3, the value for a normal distribution):
• Mesokurtic has a kurtosis of 0.
• Leptokurtic has a positive kurtosis.
• Platykurtic has a negative kurtosis.
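The three labels can be attached to data with a short calculation. This sketch assumes excess kurtosis, that is, the fourth central moment divided by the square of the second, minus 3, which is the convention under which a mesokurtic distribution scores 0. The function names, the tolerance and both data sets are hypothetical.

```python
def central_moment(data, r):
    """Average of (X - m)**r over all observations, where m is the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)


def excess_kurtosis(data):
    """Fourth central moment over squared variance, minus 3 (assumed convention)."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(data, 4) / m2 ** 2 - 3


def describe_peakedness(data, tolerance=1e-9):
    k = excess_kurtosis(data)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]   # sharp centre with two far-out values
flat = [1, 2, 3, 4, 5, 6]                  # evenly spread values

print(excess_kurtosis(peaked), describe_peakedness(peaked))  # 2.0 leptokurtic
print(describe_peakedness(flat))                             # platykurtic
```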