Applied Statistics and DOE
Mayank
Applied Statistics
Measures of central tendency (central position of data)
  Mean (population: µ; sample: x̄)
  Median
  Mode
Measures of dispersion (spread of data)
  Variance (population: σ²; sample: s²)
  Standard deviation (population: σ; sample: s)
  Coefficient of variation
Measures of central tendency
Data: 34, 43, 81, 106, 106 and 115
  Mean (average): Σx/n = 485/6 = 80.83
  Mode (highest frequency): 106
  Median (middle score): (81 + 106)/2 = 93.5
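The three measures can be checked with Python's standard library. This is a minimal sketch using the six data values from the slide; the variable names are illustrative only.

```python
from statistics import mean, median, mode

data = [34, 43, 81, 106, 106, 115]

data_mean = mean(data)      # sum of the values divided by n
data_median = median(data)  # average of the two middle values when n is even
data_mode = mode(data)      # most frequent value

print(f"mean   = {data_mean:.2f}")
print(f"median = {data_median}")
print(f"mode   = {data_mode}")
```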
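Measures of dispersion
Variance and standard deviation
  Sum of squares: $SS = \sum (x - \bar{x})^2$
  Variance (mean square): $MS = s^2 = SS/(n-1)$
  Standard deviation: $s = \sqrt{MS}$
Most of the data lies within 44.5 ± 4.57, i.e. between about 40 and 49.

Measures of dispersion
Coefficient of variation
  $CV = (s/\bar{x}) \times 100\%$
  4.57/44.5 × 100% ≈ 10.27%
The standard deviation is about 10.27% of the mean.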
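The raw values behind the 44.5 ± 4.57 example are not listed on the slide, so the sketch below reuses the six values from the central tendency slide instead. It follows the same steps (sum of squares, mean square, square root, coefficient of variation); the variable names are assumptions made for illustration.

```python
from math import sqrt

data = [34, 43, 81, 106, 106, 115]  # values from the central tendency slide
n = len(data)
x_bar = sum(data) / n

ss = sum((x - x_bar) ** 2 for x in data)  # sum of squares
ms = ss / (n - 1)                          # sample variance (mean square)
sd = sqrt(ms)                              # sample standard deviation
cv = sd / x_bar * 100                      # coefficient of variation in percent

print(f"SS = {ss:.2f}, variance = {ms:.2f}, sd = {sd:.2f}, CV = {cv:.1f}%")
```

Dividing by n - 1 gives the sample variance; dividing by n instead would give the population variance.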
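Measures of dispersion
Normal distribution
Example: IQ score

Measures of dispersion
Normal distribution
[Figure: histogram of IQ scores. x-axis: Score (<55, 55, 70, 85, 100, 115, 130, 145, >145); y-axis: Count]

Measures of dispersion
Normal distribution
[Figure: normal curve, probability vs. score, with the x-axis marked in standard deviations from the mean μ (-6σ to +6σ)]

  Band (on each side of μ)    Probability
  0 to 1σ                     34.13%
  1σ to 2σ                    13.59%
  2σ to 3σ                    2.14%
  3σ to 4σ                    0.13%
  4σ to 5σ                    0.0031%
  5σ to 6σ                    0.000028%

  Range      Share of data within the range
  μ ± 1σ     68.2689%
  μ ± 2σ     95.4499%
  μ ± 3σ     99.7300%
  μ ± 4σ     99.9936%
  μ ± 5σ     99.999942669%
  μ ± 6σ     99.999999802%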
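The coverage percentages for μ ± kσ follow from the normal distribution: the share of values within k standard deviations of the mean is erf(k/√2). A minimal sketch using only the standard library; the function name `coverage` is an illustrative choice.

```python
from math import erf, sqrt

def coverage(k: float) -> float:
    """Probability that a normally distributed value lies within k sd of the mean."""
    return erf(k / sqrt(2))

for k in range(1, 7):
    print(f"within ±{k}σ: {coverage(k) * 100:.9f}%")
```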
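Measures of dispersion
Normal distribution: Six Sigma
  DPMO (defects per million opportunities), DPHO
  LSL: lower specification limit
  USL: upper specification limit
[Figure: normal curve from -6σ to +6σ around μ with LSL and USL marked; 99.999999802% of the data lies within ±6σ]

Measures of dispersion
Normal distribution
[Figure: normal curves positioned against the LSL and USL]

Measures of dispersion
Normal distribution
[Figure: normal curve shifted by 1.5σ between LSL and USL, x-axis from -6σ to +6σ around μ; 3.4 DPMO]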
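The 3.4 DPMO figure can be reproduced under the usual Six Sigma convention, which is assumed here: the process mean drifts 1.5σ towards one specification limit, so the nearer limit sits 4.5σ away, and only the tail beyond that limit is counted (the far tail is negligible). The function and variable names are illustrative.

```python
from math import erfc, sqrt

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

sigma_level = 6.0  # distance from the target to each specification limit, in sd
shift = 1.5        # assumed long-term drift of the process mean, in sd

# Defects per million opportunities with the shifted mean (nearer tail only)
dpmo_shifted = upper_tail(sigma_level - shift) * 1_000_000

# Without any shift, both tails beyond ±6σ count as defects
dpmo_centered = 2 * upper_tail(sigma_level) * 1_000_000

print(f"shifted by 1.5σ: {dpmo_shifted:.2f} DPMO")
print(f"centered:        {dpmo_centered:.5f} DPMO")
```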
Statistical significance tests
Significance tests
  Z-test
  t-test
  F-test
  ANOVA
Statistical significance tests
Z-test
Z-value: how many standard deviations away from the mean?
  Positive z: values are above the mean
  Negative z: values are below the mean
  Population: group compared to population
  Sample: 1 point compared to population
Statistical significance tests
Z-test, sample: BMI
  Mean ($\bar{x}$) = 26.20
  Standard deviation (s) = 6.57
How many standard deviations below or above the mean is a person with a BMI of 19.2?
A person with a BMI of 19.2 has a z-score of:
  $z = (x - \bar{x})/s = (19.2 - 26.20)/6.57 = -1.07$
So this person has a BMI 1.07 standard deviations below the mean.

Statistical significance tests
Z-test, sample
[Figure: normal curve with a standard deviation axis (-1σ, μ) and a z-score axis (-1, 0). A BMI of 19.6 lies one standard deviation below the mean; 16% of the probability lies below 19.6 and 84% above]
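The z-score and the matching tail probability can be sketched as follows. The mean and standard deviation come from the slide; treating BMI as normally distributed is an assumption, and `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

bmi_mean, bmi_sd = 26.20, 6.57

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(19.2, bmi_mean, bmi_sd)
p_below = NormalDist().cdf(z)            # share of people expected below BMI 19.2
p_below_one_sd = NormalDist().cdf(-1.0)  # share below one sd under the mean

print(f"z = {z:.2f}")
print(f"P(BMI < 19.2) = {p_below:.1%}")
print(f"P(z < -1)     = {p_below_one_sd:.1%}")
```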
Statistical significance tests
Z-test, population
  Test group: employees with a two-wheeler
  Test: commuting time from home to Biocon
  Claim: average commuting time is less than 24 min
At the 0.01 level of significance (α = 0.01): is there enough evidence to support the research claim?
Samples: 30
  18  16  23  19  25  48  13  17  20  23
  16  21  18  16  29  15   8  19  20   7
  15  16  24  15   6  11  14  23  18  12

Statistical significance tests
Z-test, population
Assumption: the population is normally distributed.
[Figure: normal curve, probability vs. score, with the mean 24 and the sample mean x̄ marked]
Statistical significance tests
Z-test, population: hypothesis testing
Test group vs population, comparison of means (µ = 24):
  Null hypothesis H0: no difference (claim not true). H0: x̄ ≥ µ
  Alternate hypothesis H1: it is different (claim is true). H1: x̄ < µ

Statistical significance tests
Z-test, population
[Figure: normal curve with a score axis (mean 24, sample mean x̄) and a z-value axis (0); level of significance α = 0.01, critical value z = -2.33]

Statistical significance tests
Z-test, population
  Rejection region: z_test < z_critical
  Acceptance region: z_test > z_critical
  z_critical = -2.33
  x̄ = 18.2, s = 7.7, µ = 24, n = 30
  $z_{test} = (\bar{x} - \mu)/(s/\sqrt{n}) = (18.2 - 24)/(7.7/\sqrt{30}) = -4.13$

Statistical significance tests
Z-test, population
[Figure: z axis with the rejection region below -2.33; z_test = -4.13 falls inside it]
Is the test value significantly different from (lower than) the mean? Yes: there is significant evidence to reject the null hypothesis.
  H0: x̄ ≥ 24 is rejected.
  H1: x̄ < 24 is significantly supported, and therefore the claim is accepted.
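The whole test can be reproduced from the 30 commuting times listed earlier. This is a sketch under the slide's assumptions (normal population, sample standard deviation used in place of the population value, one-tailed test at α = 0.01); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

times = [18, 16, 23, 19, 25, 48, 13, 17, 20, 23,
         16, 21, 18, 16, 29, 15, 8, 19, 20, 7,
         15, 16, 24, 15, 6, 11, 14, 23, 18, 12]

mu0 = 24      # mean under the null hypothesis (minutes)
alpha = 0.01  # level of significance

n = len(times)
x_bar = mean(times)
s = stdev(times)  # sample standard deviation (n - 1 in the denominator)

z_test = (x_bar - mu0) / (s / sqrt(n))
z_critical = NormalDist().inv_cdf(alpha)  # left-tail critical value
reject_h0 = z_test < z_critical

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.1f}")
print(f"z_test = {z_test:.2f}, z_critical = {z_critical:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```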
Statistical significance tests
t-test
Comparison of means between two groups
  H0: $\mu_1 = \mu_2$
  H1: $\mu_1 \neq \mu_2$
  The null hypothesis will be rejected if t_test > t_critical
  The null hypothesis will not be rejected if t_test < t_critical

Statistical significance tests
t-test
Comparison of means between two groups
  t = signal / noise = (difference between group means) / (variability of groups)
Statistical significance tests
t-test: effect of fertilizer on plant height
Case 1
[Figure: plant height distributions with fertilizer and without fertilizer]
  t_test = (27.15 - 17.9) / (variability of groups) = 2.4
  df = 2n - 2 = 38
  t_critical with 38 df at the 0.05 significance level = 2.03
  t_test > t_critical, so the two group means are significantly different.
  H0: rejected. H1: supported.

Statistical significance tests
t-test
Case 2
[Figure: plant height distributions with fertilizer and without fertilizer]
  t_critical = 2.03
  t_test = 1.3
  t_test < t_critical, so the two group means are not significantly different.
  H0: not rejected. H1: not supported.
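The slides do not show the plant height measurements or the exact noise term, so the following is only a sketch of one common form of the test: the two-sample t-test with pooled variance, which matches df = 2n - 2 for equal group sizes. The data values and the name `pooled_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Noise: pooled variance of both groups, weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Signal: difference between the group means
    return (mean(group_a) - mean(group_b)) / std_err, df

# Hypothetical plant heights, not taken from the slides
fertilized = [27, 30, 24, 29, 26, 31]
control = [19, 17, 22, 16, 18, 20]

t, df = pooled_t(fertilized, control)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t is then compared with the tabulated critical value for the same degrees of freedom and significance level.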
Statistical significance tests
t-test
Overview
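Statistical significance tests
F-test
Comparison of variances
  $F = s_1^2 / s_2^2$, where $s_1^2$ and $s_2^2$ are the sample variances
The F hypothesis test is defined as:
  H0: $\sigma_1^2 = \sigma_2^2$
  Ha: $\sigma_1^2 < \sigma_2^2$, $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 \neq \sigma_2^2$
  H0 is rejected if F_test > F_critical (at the chosen significance level)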
Statistical significance tests
ANOVA (ANalysis Of VAriance)
  One way: effect of one factor (variable)
  Two way: effect of two factors (variables) and effect of interaction

Statistical significance tests
One way ANOVA
Strategy: compare the variability within groups (MS_wg) to the variability between groups (MS_bg)
  $F = MS_{bg} / MS_{wg}$
[Figure: distributions of Group 1 and Group 2 showing the between-groups and within-groups variability]
Statistical significance tests
One way ANOVA
Is there any impact of exam room temperature on student performance?
  Factor (independent variable): temperature (cold, optimum, hot)
  Effect (dependent variable): score (marks obtained)
  Null hypothesis (H0): no effect ($\mu_1 = \mu_2 = \mu_3$)
  Alternate hypothesis (H1): there is an effect (at least one mean differs from the others)

Statistical significance tests
One way ANOVA
[Table: scores for the cold (C), optimum (O) and hot (H) rooms, with the number of attendees, the mean X̄ and the sum of squares SS for each group]

Statistical significance tests
One way ANOVA
  $F = MS_{bg} / MS_{wg} = 6.40$
  F_critical for numerator degrees of freedom 2 and denominator degrees of freedom 33 at significance level α = 0.05: ≈ 3.28
  F_test > F_critical, so there is enough evidence to reject the null hypothesis.
  H0: all means are the same (no effect of temperature): rejected.
At the 95% confidence level we can say that the variation between the means is not just by chance: examination room temperature matters significantly.
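The individual scores are not recoverable from the slides, so the sketch below uses hypothetical scores for three small groups to show how MS_bg, MS_wg and F are obtained. The function name `one_way_anova` and the data are assumptions for illustration.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between groups: spread of the group means around the grand mean
    ss_bg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each observation around its own group mean
    ss_wg = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_bg = len(groups) - 1
    df_wg = len(all_values) - len(groups)
    ms_bg = ss_bg / df_bg
    ms_wg = ss_wg / df_wg
    return ms_bg / ms_wg, df_bg, df_wg

# Hypothetical exam scores, not taken from the slides
cold = [62, 58, 65, 60]
optimum = [75, 80, 72, 78]
hot = [55, 60, 52, 58]

f_value, df_bg, df_wg = one_way_anova([cold, optimum, hot])
print(f"F = {f_value:.2f} with ({df_bg}, {df_wg}) degrees of freedom")
```

The F value is then compared with the tabulated F_critical for the same pair of degrees of freedom.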
Statistical significance tests
Two way ANOVA
  Factors (independent variables): 1) Gender: man, woman  2) Type of sport: indoor, outdoor
  Effect (dependent variable): number of participants
What is the relative impact of gender or type of sport? Is there any interaction between gender and type of sport?
  Null hypothesis (H0a): no effect of gender
  Null hypothesis (H0b): no effect of type of sport
  Null hypothesis (H0c): no interaction
  Alternate hypothesis (H1): there is an effect

Statistical significance tests
Two way ANOVA
[Table: number of participants by gender (g: man, woman) and type of sport (s: indoor, outdoor)]

Statistical significance tests
Two way ANOVA
[Figure: number of participants for indoor and outdoor sports]
  Null hypothesis (H0a): no effect of gender: rejected
  Null hypothesis (H0b): no effect of type of sport: rejected
  Null hypothesis (H0c): no interaction: rejected

Statistical significance tests
Two way ANOVA
  Factors (independent variables): 1) Temperature: 30 °C, 35 °C  2) pH: 5, 7
  Effect (dependent variable): total product (g)
[Figure: total product at 30 °C and 35 °C, with one line for pH 5 and one for pH 7]
Regression and correlation
Regression analysis: investigation of the relationship between variables

Regression and correlation
Regression analysis: investigation of the relationship between variables
  Simple linear regression (one independent variable): y = ax + b
[Figure: scatter plot with fitted line y = -0.951x + 50.49, R² = 0.955]

Regression and correlation
Regression analysis
  Simple linear regression: $y = ax + b$
  Multiple linear regression
    Linear: $y = a_1 x_1 + a_2 x_2 + a_3 x_3 + b$
    Non linear: $y = a_1 x_1 + a_2 x_2 + a_{11} x_1^2 + a_{12} x_1 x_2 + b$
Regression and correlation
Correlation analysis: to find how well (or badly) a line fits the observations
  What is the strength of this relationship? r² (coefficient of determination) or adjusted r²
  Is the relationship we have described statistically significant? Significance tests

Regression and correlation
Correlation analysis
  $\hat{y} = ax + b$, with slope a and intercept b
  $\hat{y}$ = predicted value; $y_i$ = true value
  ε = residual error = $y - \hat{y}$
  The values of a and b are calculated so that they minimize the sum of squares (SS) of the residuals: $\sum (y - \hat{y})^2$ is a minimum
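For a single predictor, the values of a and b that minimize the sum of squared residuals have a closed form: a = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - a·x̄. A minimal sketch with hypothetical data points; `fit_line` is an illustrative name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b

# Hypothetical observations, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
ss_residual = sum(e ** 2 for e in residuals)

print(f"y = {a:.2f}x + {b:.2f}, SS of residuals = {ss_residual:.3f}")
```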
Regression and correlation
Correlation analysis
r²: coefficient of determination
  $SS_{Total} = \sum (y_i - \bar{y})^2$
  $SS_{Error} = \sum (y - \hat{y})^2$
  $r^2 = 1 - SS_{Error} / SS_{Total}$
  Always between 0 and 1
  Increases with the number of predictors
Adjusted r²
  $r^2_{adj} = 1 - \frac{SS_{Error}/(n-p-1)}{SS_{Total}/(n-1)}$
  It can be negative also
  True representative of relationship strength
  n = total observations; p = number of predictors
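Both quantities can be computed directly from the true and predicted values. The sketch below uses made-up numbers; the second example predicts the mean for every point, which gives r² = 0 and shows how the adjusted value drops below zero.

```python
def r_squared(y_true, y_pred, n_predictors):
    """Return (r2, adjusted_r2) for a fitted model with n_predictors predictors."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_total = sum((y - y_bar) ** 2 for y in y_true)
    ss_error = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    r2 = 1 - ss_error / ss_total
    adj_r2 = 1 - (ss_error / (n - n_predictors - 1)) / (ss_total / (n - 1))
    return r2, adj_r2

y = [1, 2, 3, 4]  # hypothetical observations

perfect = r_squared(y, [1, 2, 3, 4], n_predictors=1)
useless = r_squared(y, [2.5, 2.5, 2.5, 2.5], n_predictors=1)

print(f"perfect fit: r2 = {perfect[0]:.2f}, adjusted r2 = {perfect[1]:.2f}")
print(f"mean only:   r2 = {useless[0]:.2f}, adjusted r2 = {useless[1]:.2f}")
```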
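Regression and correlation
Correlation analysis: statistical significance of the relationship
  ANOVA: $F = MS_{bg} / MS_{wg}$
  Regression: $F = MS_{Model} / MS_{Error}$
[Figure: Group 1 and Group 2 variability (ANOVA) alongside model and error variability (regression)]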
Design of experiment
  Traditional method: one factor at a time (OFAT)
  Statistical method: multiple factors at a time (MFAT)

Design of experiment

Design of experiment
How to select a design?

Design of experiment: terminology
Factors: independent variable(s)
  Continuous: numeric, any value between the lower and upper value. E.g. temperature, pH, concentration
  Categorical: numeric or non-numeric, only characters or levels. E.g. gender, operator, type, temperature
Levels: range of a factor
  -1 (lower), 0 (middle), +1 (higher)
Effects: dependent variable(s), the response
  Main effects: effects due to individual factors
  Interaction effects: effects due to the interaction of multiple factors
Confounding/aliasing: when two or more effects cannot be distinguished
  E.g. a main effect is confounded with interaction effects; main effects and interaction effects are aliased

Design of experiment
  Resolution of a design
  Power of a design
  Higher order interactions are less significant than lower order interactions
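Design of experiment
Factorial design
Full factorial: $L^f$ experiments, where L = number of levels and f = number of factors

Design of experiment
Factorial design $2^2$ (factors a, b): 4 experiments

Design of experiment
Factorial design $2^3$ (factors a, b, c): 8 experiments

Design of experiment
Factorial design $3^2$ (factors a, b): 9 experiments

Design of experiment
Factorial design $3^3$ (factors a, b, c): 27 experiments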
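The run counts follow from enumerating every combination of levels. A small sketch using coded levels (-1, 0, +1) as in the terminology slide; `full_factorial` is an illustrative helper name.

```python
from itertools import product

def full_factorial(levels, n_factors):
    """All combinations of the given levels across n_factors factors."""
    return list(product(levels, repeat=n_factors))

two_level = (-1, +1)
three_level = (-1, 0, +1)

for levels, n_factors in [(two_level, 2), (two_level, 3),
                          (three_level, 2), (three_level, 3)]:
    runs = full_factorial(levels, n_factors)
    print(f"{len(levels)}^{n_factors}: {len(runs)} experiments")

# The four runs of the 2^2 design, as (a, b) settings
print(full_factorial(two_level, 2))
```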
Design of experiment
Fractional factorial design
  Full factorial $2^3$: 8 experiments
  Fractional factorial $2^{3-1}$: 4 experiments
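One common way to build the half fraction, assumed here because the slide does not name a generator, is to run a full 2² design in factors a and b and set the third factor to c = a·b. The sketch shows that this yields 4 of the 8 full-factorial runs, and that the main effect of c cannot be separated from the a×b interaction (aliasing).

```python
from itertools import product

levels = (-1, +1)

# Full 2^3 design: every combination of a, b and c
full = list(product(levels, repeat=3))

# Half fraction 2^(3-1): full design in a and b, with c set by the generator c = a*b
half = [(a, b, a * b) for a, b in product(levels, repeat=2)]

print(f"full factorial: {len(full)} experiments")
print(f"half fraction:  {len(half)} experiments")
for run in half:
    print(run)

# In the half fraction the c column equals the a*b column, so the two are aliased
c_column = [c for _, _, c in half]
ab_column = [a * b for a, b, _ in half]
print("c aliased with a*b:", c_column == ab_column)
```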
Design of experiment
Response surface methodology

Design of experiment
Geometry of some important response surface designs
Box-Behnken, e.g. 3 factors, 3 levels: 12 experiments

Design of experiment
Geometry of some important response surface designs
Central composite design, e.g. 2 factors, 2 levels
[Figure: factorial points + axial points = central composite design]

Design of experiment
Geometry of some important response surface designs
Taguchi design
  Inner array (signal): controllable variables during production, e.g. media, pH, feed rate
  Outer array (noise): uncontrollable variables during production, e.g. temperature, DO
