Statistics For Management

1. Introduction
2. Statistical Survey
3. Classification, Tabulation & Presentation of data
4. Measures used to summarise data
5. Probabilities
6. Theoretical Distributions
7. Sampling & Sampling Distributions
8. Estimation
9. Testing of Hypothesis in case of large & small samples
10. Chi-Square
11. F-Distribution and Analysis of variance (ANOVA)
12. Simple correlation and Regression
13. Business Forecasting
14. Time Series Analysis
15. Index Numbers

Q: What's the definition of Statistics?
A: Statistics are usually defined as:
1. A collection of numerical data that measure something.
2. The science of recording, organising, analysing and reporting quantitative information.

Q: What are the different components of statistics?
A: There are four components, as per Croxton & Cowden:
1. Collection of Data
2. Presentation of Data
3. Analysis of Data
4. Interpretation of Data

Q: What's the use of Correlation & Regression?
A: Correlation and regression are statistical tools used to measure the strength of the relationship between two variables.

Q: What is the need for Statistics?
A: Statistics gives us techniques to obtain, condense, analyze and relate numerical data. Statistical methods are of great value in education and psychology.

Q: How is statistics used in everyday life?
A: Statistics are everywhere. Election predictions are statistics. Any food product that claims x% more or less of a certain ingredient is quoting a statistic. Life expectancy is a statistic. If you play card games, card counting is using statistics.

Statistical Survey

1. What is a statistical survey?
Statistical surveys are used to collect quantitative information about items in a population. A survey may focus on opinions or factual information depending on its purpose, and many surveys involve administering questions to individuals. When the questions are administered by a researcher, the survey is called a structured interview or a researcher-administered survey. When the questions are completed by the respondent without a researcher, the survey is referred to as a questionnaire or a self-administered survey.

2. What are the advantages of a survey?
- Efficient way of collecting information
- Wide range of information can be collected
- Easy to administer
- Cheaper to run

3. What are the disadvantages of a survey?
- Responses may be subjective
- Motivation to answer may be low
- Errors due to sampling
- If a question is not specific, it may lead to vague data

4. What are the various modes of data collection?
- Telephone
- Mail
- Online surveys
- Personal survey
- Mall intercept survey

5. What is sampling?
"Sampling" basically means selecting people or objects from a "population" in order to test the population for something. For example, we might want to find out how people are going to vote at the next election. Obviously we can't ask everyone in the country, so we ask a sample.

Classification, Tabulation & Presentation of data

1. What are the types of data collected?
Qualitative data:
- Nominal, attribute or categorical data
- Ordinal or ranked data
Quantitative or interval data:
- Discrete data
- Continuous measurements

2. What is tabulation of data?
Tabulation refers to the systematic arrangement of information in rows and columns. Rows are the horizontal arrangement and columns are the vertical arrangement. In simple words, tabulation is a layout of figures in rectangular form with appropriate headings to explain the different rows and columns. The main purpose of a table is to simplify the presentation and to facilitate comparisons.

3. What is presentation of data?
Descriptive statistics can be illustrated in an understandable fashion by presenting them graphically using statistical and data presentation tools.

4. What are the different elements of tabulation?
- Table number
- Title
- Captions and stubs
- Headnotes
- Body
- Source

5. What are the forms of presentation of the data?
Grouped and ungrouped data may be presented as:
- Pie charts
- Frequency histograms
- Frequency polygons
- Ogives
- Boxplots

Measures used to summarise data

1. What are the measures of summarizing data?
- Measures of central tendency: mean, median, mode
- Measures of dispersion: range, variance, standard deviation

2. Define mean, median, and mode?
Mean: The mean value is what we typically call the "average." You calculate the mean by adding up all of the measurements in a group and then dividing by the number of measurements.
Median: The median is the middle value in a series when it is arranged in ascending or descending order.
Mode: The most frequently occurring value in a series.

3. Which measure of central tendency is to be used?
The measure to be used differs in different contexts. If your results involve categories instead of continuous numbers, then the best measure of central tendency will probably be the most frequent outcome (the mode). On the other hand, sometimes it is an advantage to have a measure of central tendency that is less sensitive to changes in the extremes of the data; in that case the median is preferable to the mean.

4. Define range, variance and standard deviation?
Range: The range is the difference between the largest and smallest data values in the set.
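The measures defined so far (mean, median, mode and range) can be sketched in Python as follows. This is a minimal illustration using only the standard library; the sample values and variable names are hypothetical and chosen for the example.

```python
from statistics import mean, median, multimode

# Hypothetical sample of measurements (illustrative values only).
data = [2, 4, 4, 5, 7, 9, 11]

data_mean = mean(data)              # sum of the values divided by their count
data_median = median(data)          # middle value of the sorted series
data_modes = multimode(data)        # most frequent value(s); a list, since ties are possible
data_range = max(data) - min(data)  # largest value minus smallest value

print(data_mean, data_median, data_modes, data_range)
```

`multimode` is used rather than `mode` because a series can have more than one most frequent value.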
Variance: The variance ($\sigma^2$) is a measure of how far the values in the data set are from the mean. It is the average of the squared deviations from the mean.
Standard Deviation: It is the square root of the variance.

5. How can standard deviation be used?
The standard deviation has proven to be an extremely useful measure of spread, in part because it is mathematically tractable.

Probabilities

1. What is Probability?
Probability is a way of expressing knowledge or belief that an event will occur or has occurred.

2. What is a random experiment?
An experiment is said to be a random experiment if its outcome cannot be predicted with certainty.

3. What is a sample space?
The set of all possible outcomes of an experiment is called the sample space. It is denoted by 'S' and its number of elements is n(S).
Example: In throwing a die, the number that appears on top is any one of 1, 2, 3, 4, 5, 6. So here S = {1,2,3,4,5,6} and n(S) = 6.
Similarly, in the case of a coin, S = {Head, Tail} or {H, T} and n(S) = 2.

4. What is an event? What are the different kinds of event?
Event: Every subset of a sample space is an event. It is denoted by 'E'.
Example: In throwing a die, S = {1,2,3,4,5,6}, the appearance of an even number is the event E = {2,4,6}. Clearly E is a subset of S.
Simple event: An event consisting of a single sample point is called a simple event.
Example: In throwing a die, S = {1,2,3,4,5,6}, so each of {1}, {2}, {3}, {4}, {5} and {6} is a simple event.
Compound event: A subset of the sample space which has more than one element is called a compound event.
Example: In throwing a die, the appearance of an odd number is a compound event, because E = {1,3,5}, which has 3 elements.

5. What is the definition of probability?
If 'S' is the sample space and all of its outcomes are equally likely, then the probability of occurrence of an event 'E' is defined as:
P(E) = n(E)/n(S) = (number of elements in 'E') / (number of elements in sample space 'S')

Theoretical Distributions

1. What are theoretical distributions?
Theoretical distributions are based on mathematical formulae and logic. They are used in statistics to describe how data or a statistic is expected to behave. When empirical and
theoretical distributions correspond, you can use the theoretical one to determine the probability of an outcome, which leads to inferential statistics.

2. What are the various types of theoretical distributions?
- Rectangular distribution (or uniform distribution)
- Binomial distribution
- Normal distribution

3. Define rectangular distribution and binomial distribution?
Rectangular distribution: A distribution in which all possible scores have the same probability of occurrence.
Binomial distribution: The distribution of the number of times an event occurs in a fixed number of independent trials, each of which has only two possible outcomes.

4. What is normal distribution?
The normal distribution is a bell-shaped theoretical distribution that predicts the frequency of occurrence of chance events. The probability of an event or a group of events corresponds to the area of the theoretical distribution associated with the event or group of events. The distribution is asymptotic: its curve continually approaches but never reaches the horizontal axis. The curve is symmetrical: half of the total area is to the left of the mean and the other half to the right.

5. What is the central limit theorem?
This theorem states that when an infinite number of successive random samples are taken from a population, the sampling
distribution of the means of those samples will become approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{N}$ as the sample size (N) becomes larger, irrespective of the shape of the population distribution.

Sampling & Sampling Distributions

1. What is sampling distribution?
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution.

2. What is variability of a sampling distribution?
The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:
- N: the number of observations in the population
- n: the number of observations in the sample
- The way that the random sample is chosen

3. How to create the sampling distribution of the mean?
Suppose that we draw all possible samples of size n from a population of size N. Suppose further that we compute a mean score for each sample. In this way we create the sampling distribution of the mean.
We know the following. The mean of the population ($\mu$) is equal to the mean of the sampling distribution ($\mu_{\bar{x}}$). And the standard error of the sampling distribution ($\sigma_{\bar{x}}$) is determined by the standard deviation of
the population ($\sigma$), the population size, and the sample size. These relationships are shown in the equations below:
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma \sqrt{1/n - 1/N}$

4. What is the sampling distribution of the proportion?
In a population of size N, suppose that the probability of the occurrence of an event (dubbed a "success") is P, and the probability of the event's non-occurrence (dubbed a "failure") is Q. From this population, suppose that we draw all possible samples of size n. And finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.

5. Show the mathematical expression of the sampling distribution of the proportion.
We find that the mean of the sampling distribution of the proportion ($\mu_p$) is equal to the probability of success in the population (P). And the standard error of the sampling distribution ($\sigma_p$) is determined by the standard deviation of the population ($\sigma$), the population size, and the sample size. These relationships are shown in the equations below:
$\mu_p = P$ and $\sigma_p = \sigma \sqrt{1/n - 1/N} = \sqrt{PQ/n - PQ/N}$
where $\sigma = \sqrt{PQ}$.

Estimation

1. When will the sampling distribution be normally distributed?
Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.
- The population distribution is normal.
- The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
- The sample size is greater than 40, without outliers.

2. Get the variability of the sample mean.
Suppose k possible samples of size n can be selected from a population of size N. The standard deviation of the sampling distribution is the "average" deviation between the k sample means and the true population mean, $\mu$. The standard deviation of the sample mean $\sigma_{\bar{x}}$ is:
$\sigma_{\bar{x}} = \sigma \sqrt{(1/n)(1 - n/N)[N/(N-1)]}$
where $\sigma$ is the standard deviation of the population, N is the population size, and n is the sample size. When the population size is much larger (at least 10 times larger) than the sample size, the standard deviation can be approximated by:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$

3. How can the standard error be calculated?
When the standard deviation of the population $\sigma$ is unknown, the standard deviation of the sampling distribution cannot be calculated. Under these circumstances, use the standard error. The standard error (SE) provides an estimate of the standard deviation of the sampling distribution. It can be calculated from the equation below.
$SE_{\bar{x}} = s \sqrt{(1/n)(1 - n/N)[N/(N-1)]}$
where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 10 times larger) than the sample size, the standard error can be approximated by:
$SE_{\bar{x}} = s / \sqrt{n}$

4. How to find the confidence interval of the mean?
- Identify a sample statistic. Use the sample mean to estimate the population mean.
- Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels, but any percentage can be used.
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error, and the uncertainty is denoted by the confidence level.

Testing of Hypothesis in case of large & small samples

1. What is a statistical hypothesis?
A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true.

2. What are the types of statistical hypothesis?
There are two types of statistical hypotheses.
- Null hypothesis. The null hypothesis, denoted by $H_0$, is usually the hypothesis that sample observations result purely from chance.
- Alternative hypothesis. The alternative hypothesis, denoted by $H_1$ or $H_a$, is the hypothesis that sample observations are influenced by some non-random cause.

3. What is hypothesis testing?
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process is called hypothesis testing.

4. Define the steps of hypothesis testing?
Hypothesis testing consists of four steps.
- State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.
- Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation often focuses on a single test statistic.
- Analyze sample data. Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.
- Interpret results. Apply the decision rule described in the analysis plan. If the value of the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.

5. What are decision errors?
Two types of errors can result from a hypothesis test.
- Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α.
- Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of not committing a Type II error is called the power of the test.

6. How to arrive at a decision on hypothesis?
The decision rule can be stated in two ways: with reference to a P-value or with reference to a region of acceptance.
- P-value. The strength of evidence in support of a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
- Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.
The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.

7. Explain one-tailed and two-tailed tests?
A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis where the region of rejection is on both sides of the sampling distribution is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of the sampling distribution;
that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.

Chi-Square

What is Chi-Square in Statistics?
Suppose Sachin plays 100 tests, and 20 times he made 50. Is he a good player?
In statistics, the chi-square test calculates how well a series of numbers fits a distribution. In this module, we only test whether results fit an even distribution. The test doesn't simply say "yes" or "no". Instead, it gives upper and lower bounds on the likelihood that the variation in your data is due to chance.
There are basically two types of random variables, and they yield two types of data: numerical and categorical.
A chi-square ($\chi^2$) statistic is used to investigate whether distributions of categorical variables differ from one another. Basically, categorical variables yield data in categories and numerical variables yield data in numerical form.
Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no." In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous.

Data type | Question type | Possible answer
Categorical | Where are you from? | India / USA / UK / Any country
Numerical | How tall are you? | 70 inches

F-Distribution and Analysis of variance (ANOVA)

1. What is ANOVA?
Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures in which the observed variance is partitioned into components due to different sources of variation. ANOVA provides a statistical test of whether or not the means of several groups are all equal.

2. What are the assumptions in ANOVA?
The following assumptions are made to perform ANOVA:
- Independence of cases: this is an assumption of the model that simplifies the statistical analysis.
- Normality: the distributions of the residuals are normal.
- Equality (or "homogeneity") of variances, called homoscedasticity: the variance of data in groups should be the same. Model-based approaches usually assume that the variance is constant. The constant-variance property also appears in the randomization (design-based) analysis of randomized experiments, where it is a necessary consequence of the randomized design and the assumption of unit treatment additivity (Hinkelmann and Kempthorne): if the responses of a randomized balanced experiment fail to have constant variance, then the assumption of unit treatment additivity is necessarily violated. It has been shown, however, that the F-test is robust to violations of this assumption.

3. What is the logic of ANOVA?
Partitioning of the sum of squares
The fundamental technique is a partitioning of the total sum of squares (abbreviated SS) into components related to the effects used in the model. For example, for a simplified ANOVA with one type of treatment at different levels:
$SS_{Total} = SS_{Error} + SS_{Treatments}$
The number of degrees of freedom (abbreviated df) can be partitioned in a similar way, and specifies the chi-square distribution which describes the associated sums of squares:
$df_{Total} = df_{Error} + df_{Treatments}$

4. What is the F-test?
The F-test is used for comparisons of the components of the total deviation. For example, in one-way, or single-factor, ANOVA, statistical significance is tested for by comparing the F test statistic
$F = \frac{MS_{Treatments}}{MS_{Error}} = \frac{SS_{Treatments}/(I-1)}{SS_{Error}/(n_T-I)}$
where I = number of treatments
and $n_T$ = total number of cases, to the F-distribution with $I - 1$, $n_T - I$ degrees of freedom. The F-distribution is a natural candidate because the test statistic is the quotient of two mean sums of squares, each of which has a (scaled) chi-square distribution.

5. Why is ANOVA helpful?
ANOVAs are helpful because they possess a certain advantage over a two-sample t-test. Doing multiple two-sample t-tests would result in a greatly increased chance of committing a Type I error. For this reason, ANOVAs are useful in comparing three or more means.

Simple correlation and Regression

1. What is correlation?
Correlation is a measure of association between two variables. The variables are not designated as dependent or independent.

2. What can be the values for correlation coefficient?
The value of a correlation coefficient can vary from -1 to +1. A -1 indicates a perfect negative correlation and a +1 indicates a perfect positive correlation. A correlation coefficient of zero means there is no linear relationship between the two variables.

3. What is the interpretation of the correlation coefficient values?
When there is a negative correlation between two variables, as the value of one variable increases, the value of the other variable
decreases, and vice versa. In other words, for a negative correlation, the variables work opposite each other. When there is a positive correlation between two variables, as the value of one variable increases, the value of the other variable also increases. The variables move together.

4. What is simple regression?
Simple regression is used to examine the relationship between one dependent and one independent variable. After performing an analysis, the regression statistics can be used to predict the dependent variable when the independent variable is known. Regression goes beyond correlation by adding prediction capabilities.

5. Explain the mathematical analysis of regression?
In the regression equation, y is always the dependent variable and x is always the independent variable. Here are three equivalent ways to mathematically describe a linear regression model.
y = intercept + (slope × x) + error
y = constant + (coefficient × x) + error
y = a + bx + e
The significance of the slope of the regression line is determined from the t-statistic. The significance is the probability that the observed correlation coefficient occurred by chance if the true correlation is zero. Some
researchers prefer to report the F-ratio instead of the t-statistic. The F-ratio is equal to the t-statistic squared.

Business Forecasting

1. What is forecasting?
Forecasting is a prediction of what will occur in the future, and it is an uncertain process. Because of the uncertainty, the accuracy of a forecast is as important as the outcome predicted by the forecast.

2. What are the various business forecasting techniques?

3. How to model the Causal time series?
With multiple regression, we can use more than one predictor. It is always best, however, to be parsimonious, that is, to use as few variables as predictors as necessary to get a reasonably accurate forecast. Multiple regressions are best modeled with a commercial package such as SAS or SPSS. The forecast takes the form:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
where $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_n$ are coefficients representing the contribution of the independent variables $X_1, X_2, \dots, X_n$.

4. What are the various smoothing techniques?
Simple Moving Average: The best-known forecasting method is the moving average, which simply takes a certain number of past periods
and adds them together, then divides by the number of periods. The Simple Moving Average (MA) is an effective and efficient approach provided the time series is stationary in both mean and variance. The following formula is used in finding the moving average of order n, MA(n), for a period t+1:
$MA_{t+1} = [D_t + D_{t-1} + \dots + D_{t-n+1}] / n$
where n is the number of observations used in the calculation.
Weighted Moving Average: Very powerful and economical. Weighted moving averages are widely used where repeated forecasts are required, using methods like sum-of-the-digits and trend adjustment. As an example, a weighted moving average is:
Weighted MA(3) = $w_1 D_t + w_2 D_{t-1} + w_3 D_{t-2}$
where the weights are any positive numbers such that $w_1 + w_2 + w_3 = 1$.

5. Explain exponential smoothing techniques?
Single Exponential Smoothing: It calculates the smoothed series as a damping coefficient times the actual series plus 1 minus the damping coefficient times the lagged value of the smoothed series. The extrapolated smoothed series is a constant, equal to the last value of the smoothed series during the period when actual data on the underlying series are available.
$F_{t+1} = \alpha D_t + (1 - \alpha) F_t$
where:
- $D_t$ is the actual value
- $F_t$ is the forecasted value
- $\alpha$ is the weighting factor, which ranges from 0 to 1
- t is the current time period
Double Exponential Smoothing: It applies the process described above twice to account for a linear trend. The extrapolated series has a constant growth rate, equal to the growth of the smoothed series at the end of the data period.

6. What are time series models?
A time series is a set of numbers that measures the status of some activity over time. It is the historical record of some activity, with measurements taken at equally spaced intervals (exception: monthly) with a consistency in the activity and the method of measurement.

Time Series Analysis

1. What is time series forecasting?
A time series can be represented as a curve that evolves over time. Forecasting the time series means that we extend the historical values into the future, where measurements are not available yet.

2. What are the different models in time series forecasting?
- Simple moving average
- Weighted moving average
- Simple exponential smoothing
- Holt's double exponential smoothing
- Winters' triple exponential smoothing
- Forecast by linear regression

3. Explain simple moving average and weighted moving average models?
Simple Moving Average: The best-known forecasting method is the moving average, which simply takes a certain number of past periods and adds them together, then divides by the number of periods. The Simple Moving Average (MA) is an effective and efficient approach provided the time series is stationary in both mean and variance. The following formula is used in finding the moving average of order n, MA(n), for a period t+1:
$MA_{t+1} = [D_t + D_{t-1} + \dots + D_{t-n+1}] / n$
where n is the number of observations used in the calculation.
Weighted Moving Average: Very powerful and economical. Weighted moving averages are widely used where repeated forecasts are required, using methods like sum-of-the-digits and trend adjustment. As an example, a weighted moving average is:
Weighted MA(3) = $w_1 D_t + w_2 D_{t-1} + w_3 D_{t-2}$
where the weights are any positive numbers such that $w_1 + w_2 + w_3 = 1$.

4. Explain the exponential smoothing techniques?
Single Exponential Smoothing: It calculates the smoothed series as a damping coefficient times the actual series plus 1 minus the damping coefficient times the lagged value of the smoothed series. The extrapolated smoothed series is a constant, equal to the last value of the smoothed series during the period when actual data on the underlying series are available.
$F_{t+1} = \alpha D_t + (1 - \alpha) F_t$
where:
- $D_t$ is the actual value
- $F_t$ is the forecasted value
- $\alpha$ is the weighting factor, which ranges from 0 to 1
- t is the current time period
Double Exponential Smoothing: It applies the process described above twice to account for a linear trend. The extrapolated series has a constant growth rate, equal to the growth of the smoothed series at the end of the data period.
Triple Exponential Smoothing: It applies the process described above three times to account for a nonlinear trend.

5. How should one forecast by linear regression?
Regression is the study of relationships among variables, a principal purpose of which is to predict, or estimate, the value of one variable from known or assumed values of other variables related to it.
Types of Analysis
Simple Linear Regression: A regression using only one predictor is called a simple regression.
Multiple Regression: Where there are two or more predictors, multiple regression analysis is employed.

Index Numbers

1. What are index numbers?
Index numbers are used to measure changes in some quantity which we cannot observe directly, e.g. changes in business activity.

2. Describe the classification of index numbers?
Index numbers are classified in terms of the variables that they are intended to measure. In business, different groups of variables in the
measurement of which index number techniques are commonly used are: i) price, ii) quantity, iii) value, iv) business activity.

3. What are simple and composite index numbers?
Simple index numbers: A simple index number is a number that measures a relative change in a single variable with respect to a base.
Composite index numbers: A composite index number is a number that measures an average relative change in a group of related variables with respect to a base.

4. What are price index numbers?
Price index numbers measure the relative changes in the prices of commodities between two periods. Prices can be retail or wholesale.

5. What are quantity index numbers?
These index numbers measure changes in the physical quantity of goods produced, consumed, or sold, for an item or a group of items.

About Arindam Chakraborty: Asp.Net Developer & Architect. http://etgconsultancy.com/
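The simple and composite index numbers defined in the Index Numbers section can be sketched in Python as follows. This is a minimal illustration, assuming the base period is scaled to 100 and the composite index is taken as the unweighted average of the individual price relatives (weighted schemes also exist); the function names and prices are hypothetical.

```python
def simple_index(current, base):
    """Relative change of a single variable, with the base period set to 100."""
    return current / base * 100.0


def composite_index(current, base):
    """Unweighted average of the simple indices of a group of items.

    Assumption: an unweighted mean of relatives; other definitions use weights.
    """
    relatives = [simple_index(current[item], base[item]) for item in base]
    return sum(relatives) / len(relatives)


# Hypothetical prices, for illustration only.
base_prices = {"rice": 40.0, "milk": 20.0, "sugar": 50.0}
current_prices = {"rice": 50.0, "milk": 22.0, "sugar": 55.0}

rice_index = simple_index(current_prices["rice"], base_prices["rice"])
group_index = composite_index(current_prices, base_prices)
print(round(rice_index, 2), round(group_index, 2))
```

An index above 100 means the variable has risen relative to the base period; below 100 means it has fallen.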