Variance and Standard Deviation

Whenever we study a data set, or a sample drawn from a population, there are certain characteristics that almost every data set shares. There are basically three of them. The first is central tendency: the data tend to cluster around some central value. The second is variation: in every data set the values are scattered, to some extent, away from each other. The last is skewness: a data set may depart, to a greater or lesser extent, from the symmetric shape of a standard distribution called the normal distribution.
Central Value: The central value is a single value, within the range of the data set, that is used to represent the entire data set. For example, if someone asks "What is the height of Indians?", we generally answer "5 feet 8 inches". That doesn't mean that every Indian has the same height; in India we find heights of 4, 5, 6 or even 7 feet. But 5 feet 8 inches is an average that represents the height of the whole Indian population.
We have certain measures that let us find the central tendency and interpret our data. The most common measures of central tendency are the mean, the median and the mode; in practice the mean and the median are used more often than the mode.
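As a minimal sketch, Python's standard `statistics` module computes all three measures of central tendency. The data values below are hypothetical, chosen only so that one value repeats and a mode exists.

```python
import statistics

# Hypothetical sample; the value 5 repeats so that a mode exists.
data = [5, 6, 5, 7, 5, 8]

mean = statistics.mean(data)      # sum / count = 36 / 6
median = statistics.median(data)  # even count: average of the two middle values of [5, 5, 5, 6, 7, 8]
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

Here the mean (6) and the median (5.5) differ slightly, and the mode (5) is simply the most frequent value.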
Variation
Variation means difference; for example, the difference between the expected output and the observed output is called variation. The first property of a data set is its central tendency, or central value. Whenever you want to study a data set, you first need to know its central value, that is, the value around which most of the data points cluster.
But the central value alone cannot describe a data set clearly and precisely; it gives only half the information. For the complete picture you also need information about variation. So, just as we use the mean, median or mode to measure central tendency, we measure variation with: 1) Range, 2) Quartiles, 3) Inter Quartile Range, 4) Variance, 5) Standard Deviation and 6) Stability Factor.

Additionally, you may want to visit and subscribe to the Advance Innovation Group Blog (http://advanceinnovationgroup.com/blog) for more Lean Six Sigma projects, case studies, videos, discussions, jobs, etc.

  1. Overview of Basic Statistics: Central Tendency (Mean, Median, Mode); Variation (Range, Quartiles, Inter Quartile Range (IQR), Stability Factor, Variance, Standard Deviation).
  2. Variation - Range. One of the measures of variation in the data is the RANGE. It is the simplest measure of variation: the difference between the highest (maximum) and the lowest (minimum) value, $\text{Range} = \text{Maximum} - \text{Minimum}$. It represents the end-to-end spread of the data, i.e. how far we travel, in the units of the data, when we move from the lowest value to the highest value or from the highest to the lowest. For example, in the data set 2, 12, 14, 5, 4, 9, 10, 21 the maximum is 21 and the minimum is 2, hence the range is 21 - 2 = 19. That means that moving from the lowest to the highest value, or back, spans 19 units. The range should be kept as low as possible.
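The range calculation above can be sketched in a few lines of Python; the function name `data_range` is hypothetical, and the data set is the one from the slide.

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)

# Data set from the Range slide.
data = [2, 12, 14, 5, 4, 9, 10, 21]
print(data_range(data))  # 21 - 2 = 19
```

Because it uses only the two extreme values, the range is easy to compute but is driven entirely by the most extreme points.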
  3. Variation - Quartiles. Quartiles, as the name says, divide the data into 4 parts, each covering 25% of the data. Dividing the data into 4 parts takes 3 quartiles: Q1, Q2 and Q3. To calculate quartiles, we first arrange (rank) the data in ascending or descending order. Once the data is ranked, the first quartile from the lowest-value side is Q1: 25% of the data lies below it and 75% above it. Q2 is the midpoint of the ranked data, which is also the median: 50% of the data lies below Q2 and 50% above it. Q3 is the last quartile from the lowest-value side, or the first quartile from the highest-value side: 75% of the data lies below Q3 and 25% above it. [Figure: ranked data from MIN to MAX, divided by Q1, Q2 (MEDIAN) and Q3 into four 25% segments.] For ranked data with N data points, the positions of the quartiles are: $Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}}$ data point; $Q_2 = \left(\frac{2(N+1)}{4}\right)^{\text{th}}$ data point; $Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}}$ data point. NOTE: These formulae give the positions of the quartiles; we still need to read the values from the data.
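The position formulae above, $k(N+1)/4$ for $k = 1, 2, 3$, translate directly into Python. This is a sketch; the function name `quartile_positions` is hypothetical, and it returns 1-based positions in the ranked data, not the quartile values themselves.

```python
def quartile_positions(n):
    """1-based positions of Q1, Q2 and Q3 in ranked data of size n: k(n + 1)/4."""
    return tuple(k * (n + 1) / 4 for k in (1, 2, 3))

print(quartile_positions(5))   # (1.5, 3.0, 4.5)
print(quartile_positions(11))  # (3.0, 6.0, 9.0)
```

A fractional position such as 1.5 means the quartile lies between two ranked data points.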
  4. Variation - Quartiles. First Quartile (designated Q1) = lower quartile = splits off the lowest 25% of the data = 25th percentile. Second Quartile (designated Q2) = median = cuts the data set in half = 50th percentile. Third Quartile (designated Q3) = upper quartile = splits off the highest 25% of the data, or the lowest 75% = 75th percentile.
  5. Variation - Quartiles, Example. Let's consider a simple example: a data set with the data points 4, 2, 8, 6, 10, for which we need to calculate the quartiles. First we arrange the data in ascending (or descending) order. Then we use the formulae to find the positions of the quartiles. There are 5 data points, so $Q_1 = \left(\frac{5+1}{4}\right)^{\text{th}} = \left(\frac{6}{4}\right)^{\text{th}} = 1.5^{\text{th}}$ data point. Similarly, the positions of Q2 and Q3 come to the 3rd and the 4.5th data point respectively.
  6. Variation - Quartiles, Example (continued). Ranked in ascending order, the data set is 2, 4, 6, 8, 10. Q1, the 1.5th data point, lies halfway between the 1st value (2) and the 2nd value (4), so Q1 = 3. Q2 (the median), the 3rd data point, is 6. Q3, the 4.5th data point, lies halfway between the 4th value (8) and the 5th value (10), so Q3 = 9.
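Going from a fractional position to a quartile value, as in the example above, amounts to linear interpolation between the neighbouring ranked values. The sketch below assumes that interpretation; the function name `quartile` is hypothetical, and clamping out-of-range positions for very small N is an added assumption the slides do not cover. For this data set, Python's `statistics.quantiles` with `method="exclusive"` (which also positions quantiles on an N + 1 basis) returns the same three values.

```python
import statistics

def quartile(values, k):
    """Value of quartile Qk (k = 1, 2, 3) using the k(N + 1)/4 position rule,
    interpolating linearly between neighbouring ranked values."""
    ranked = sorted(values)            # rank the data in ascending order
    n = len(ranked)
    if n == 0:
        raise ValueError("quartiles are undefined for an empty data set")
    pos = k * (n + 1) / 4              # 1-based position, e.g. 1.5 for Q1 when N = 5
    pos = min(max(pos, 1), n)          # assumption: clamp positions outside the data
    lower = int(pos)                   # whole part of the position
    frac = pos - lower                 # fractional part drives the interpolation
    if frac == 0:
        return ranked[lower - 1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

data = [4, 2, 8, 6, 10]
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
print(q1, q2, q3)

# Cross-check against the standard library for this data set.
print(statistics.quantiles(data, n=4, method="exclusive"))
```

Other tools use different quartile conventions, so for other data sets their results can differ slightly from this (N + 1) rule.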
  7. Variation - Inter Quartile Range (IQR). The IQR is the spread of the MIDDLE 50% of the data when the data is ranked in ascending or descending order. It is the difference between the third and the first quartile: $\text{IQR} = Q_3 - Q_1$. [Figure: ranked data from MIN to MAX, divided by Q1, Q2 (MEDIAN) and Q3 into four 25% segments, with the IQR spanning Q1 to Q3.]
  8. Variation - Inter Quartile Range (IQR), Example. Continuing with the data set we used to calculate the quartiles, we found Q1 = 3 and Q3 = 9. Hence the Inter Quartile Range is $\text{IQR} = Q_3 - Q_1 = 9 - 3 = 6$, which means that the difference between the third and the first quartile is 6, i.e. the spread of the middle 50% of the data is 6.
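A minimal IQR sketch in Python, assuming the quartiles come from `statistics.quantiles(..., method="exclusive")`, which reproduces Q1 = 3 and Q3 = 9 for the example data. The function name `iqr` is hypothetical.

```python
import statistics

def iqr(values):
    """Inter Quartile Range = Q3 - Q1, the spread of the middle 50% of ranked data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

print(iqr([4, 2, 8, 6, 10]))  # 9.0 - 3.0 = 6.0
```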
  9. Variation - Stability Factor. The Stability Factor (Sf), as the name suggests, reflects the stability of the process: the closer its value is to 1, the more stable the process. $\text{Stability Factor} = \frac{Q_1}{Q_3}$. A Stability Factor close to 1 means that Q1 and Q3 are close to each other; exactly 1 means that Q1 and Q3 are equal; close to 0 (zero) means that Q1 and Q3 are far apart. NOTE: Like the IQR, it is a measure of the middle 50% of the ranked data.
  10. Variation - Stability Factor, Example. Using the Q1 and Q3 calculated in the previous example, Q1 = 3 and Q3 = 9: $\text{Stability Factor} = \frac{3}{9} \approx 0.333$. A Stability Factor this far from 1 means that Q1 and Q3 are far apart, i.e. there is large variation within the middle 50% of the data.
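The Stability Factor Q1/Q3 can be sketched the same way. This assumes positive-valued data (the ratio is not meaningful when Q3 is zero or the quartiles have mixed signs) and reuses `statistics.quantiles` with `method="exclusive"` for the quartiles. The function name `stability_factor` is hypothetical.

```python
import statistics

def stability_factor(values):
    """Stability Factor Sf = Q1 / Q3; values near 1 mean Q1 and Q3 are close together.
    Assumes positive-valued data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    if q3 == 0:
        raise ZeroDivisionError("Sf is undefined when Q3 is 0")
    return q1 / q3

sf = stability_factor([4, 2, 8, 6, 10])
print(round(sf, 3))  # 3.0 / 9.0, about 0.333
```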
  11. Variation - Variance. Let's assume we have a data set with the data points $x_1, x_2, x_3, \ldots, x_n$, where $\bar{x}$ is the mean and n is the total number of data points. Variance is a measure of the dispersion (spread) of the data points around the MEAN. To calculate it, we take the squared distance of every data point from the mean $\bar{x}$, add these up, and divide the sum by (n - 1), which is the number of degrees of freedom: $\text{Variance} = \frac{\text{Sum of Squares}}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}$. VARIANCE is therefore the average squared distance of the data points from the MEAN (with n - 1 in place of n, as is usual for a sample).
  12. Variation - Variance, Calculation Steps. (1) Calculate the mean of all the data points (Xbar). (2) Calculate the difference between each data point and the mean (Xi - Xbar). (3) Square these differences for all data points. (4) Add the squared values together (a value called the sum of squares in statistics). (5) Divide that total by n - 1 (the number of data values minus 1).
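The five calculation steps map one-to-one onto the sketch below. It uses the quartile example's data set (4, 2, 8, 6, 10) as an illustration and cross-checks against `statistics.variance`, which also divides by n - 1. The function name `sample_variance` is hypothetical.

```python
import statistics

def sample_variance(values):
    """Sample variance, following the five calculation steps."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points (n - 1 must be > 0)")
    x_bar = sum(values) / n                   # step 1: mean
    deviations = [x - x_bar for x in values]  # step 2: Xi - Xbar
    squared = [d ** 2 for d in deviations]    # step 3: square each difference
    sum_of_squares = sum(squared)             # step 4: sum of squares
    return sum_of_squares / (n - 1)           # step 5: divide by n - 1

data = [4, 2, 8, 6, 10]
print(sample_variance(data))      # 40 / 4 = 10.0
print(statistics.variance(data))  # standard-library sample variance, same result
```

Here the mean is 6, the deviations are -2, -4, 2, 0, 4, and the sum of squares is 4 + 16 + 4 + 0 + 16 = 40.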
  13. Variation - Standard Deviation. Standard Deviation is the most preferred measure of dispersion, as it takes contributions from all the data points. To calculate the standard deviation, we take the square root of the variance. This undoes the effect of squaring, which inflates large deviations and expresses the variance in squared units: $\text{Standard Deviation} = \sqrt{\frac{\text{Sum of Squares}}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}}$. STANDARD DEVIATION can be read as a typical distance of the data points from the MEAN, in the same units as the data (strictly, it is the square root of the average squared distance, not the plain average distance).
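As a sketch, the standard deviation is the square root of the (n - 1) variance. This again uses the illustrative data set 4, 2, 8, 6, 10, whose variance is 10, and compares against `statistics.stdev`. The function name `sample_std_dev` is hypothetical.

```python
import math
import statistics

def sample_std_dev(values):
    """Sample standard deviation: square root of the (n - 1) sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = sum(values) / n
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))

data = [4, 2, 8, 6, 10]
s = sample_std_dev(data)
print(round(s, 4))                              # sqrt(10), about 3.1623
print(math.isclose(s, statistics.stdev(data)))  # True
```

The result (about 3.16) is larger than the mean absolute distance from the mean for the same data ((2 + 4 + 2 + 0 + 4) / 5 = 2.4). This illustrates why the standard deviation is only loosely an "average distance".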
  14. Advance Innovation Group, www.advanceinnovationgroup.com, E-26, Sector 8, Noida, UP 201301, India. Advance Innovation Group: 3 continents. One team. AIG is headquartered in Boston, Massachusetts and maintains several consulting and training delivery centers across Asia Pacific, including India. Asia Pacific operations are headquartered at Noida, India, with several offices and training facilities. Global offices allow us closer client contact to better serve your needs, while enriching our services with a global perspective and experience.
