ADVANCED STATISTICS PREVIOUS YEAR QUESTIONS.docx

1
ADVANCED STATISTICS PREVIOUS YEAR QUESTIONS
QUESTION : probability distribution and its features
ANS: Probability Distribution :Probability distribution is a function that is used to give the probability of all the possible values that
a random variable can take. A discrete probability distribution can be described by a probability distribution function and a probability
mass function. Similarly, a probability distribution function and a probability density function are used to describe a continuous
probability distribution. Binomial, Bernoulli, normal, and geometric distributions are examples of probability distributions.
1. DISCRETE DISTRIBUTIONS:
Discrete distributions have finite number of different possible outcomes.
Characteristics of Discrete Distribution
 We can add up individual values to find out the probability of an interval
 Discrete distributions can be expressed with a graph, piece-wise function or table
 In discrete distributions, graph consists of bars lined up one after the other
 Expected values might not be achievable
 P(Y≤y) = P(Y < y + 1)
Examples of Discrete Distributions:
1. Bernoulli Distribution
2. Binomial Distribution
3. Uniform Distribution
4. Poisson Distribution
1.1 Bernoulli Distribution In Bernoulli distribution there is only one trial and only two possible outcomes i.e. success or failure. It is
denoted by y ~Bern(p).
Characteristics of Bernoulli distributions
 It consists of a single trial
 Two possible outcomes
 E(Y) = p
 Var(Y) = p × (1 – p)
1.2 Binomial Distribution A sequence of identical Bernoulli events is called Binomial and follows a Binomial distribution. It is
denoted by Y ~B(n, p).
Characteristics of Binomial distribution
 Over the n trials, it measures the frequency of occurrence of one of the possible result.
 E(Y) = n × p
 P(Y = y) = C(y, n) × py
× (1 – p)n-y
 Var(Y) = n × p × (1 – p)
1.3 Uniform Distribution In uniform distribution all the outcomes are equally likely. It is denoted by Y ~U(a, b). If the values are
categorical, we simply indicate the number of categories, like Y ~U(a).
Characteristics of Uniform Distribution
 In uniform distribution all the outcomes are equally likely.
 In graph, all the bars are equally tall
 The expected value and variance have no predictive power

2
1.4 Poisson Distribution Poisson distribution is used to determine how likelihood a certain event occur over a given interval of time
or distance. It is denoted by Y ~ Po( λ ).
Characteristics of poisson distribution
 It measures the frequency over an interval of time or distance.
2. CONTINUOUS DISTRIBUTIONS: Continuous distributions have infinite many consecutive possible values.
Characteristics of Continuous Distributions
 We cannot add up individual values to find out the probability of an interval because there are many of them
 Continuous distributions can be expressed with a continuous function or graph
 In continuous distributions, graph consists of a smooth curve
 To calculate the chance of an interval, we required integrals
 P(Y = y) = 0 for any distinct value y.
 P(Y<y) = P(Y ≤ y)
2.1 Normal Distribution It shows a distribution that most natural events follow. It is denoted by Y ~ (µ, σ2
). The main characteristics
of normal distribution are:
Characteristics of normal distribution
 Graph obtained from normal distribution is bell-shaped curve, symmetric and has shrill tails.
 68% of all its all values should fall in the interval, i.e. (µ – σ , µ+ σ )
 E(Y) = µ
 Var(Y) = σ2
2.2 Chi-Squared Distribution Chi-Squared distribution is frequently being used. It is mostly used to test wow of fit. It is denoted by
Y ~ X2
(k).
Characteristics of Chi-Squared distribution
 The graph obtained from Chi-Squared distribution is asymmetric and skewed to the right.
 It is square of the t-distribution.
 E(Y) = k
 Var(Y) = 2k
2.3 Exponential Distribution It is usually observed in events which considerably change early on. It is denoted by Y ~ Exp(λ).
Characteristics of exponential distribution
 Probability and Cumulative Distributed Functions (PDF & CDF) plateau after a certain point.
 We do not have a table to known the values like the Normal or Chi-Squared Distributions, therefore, we mostly used natural
logarithm to change the values of exponential distributions.

3
2.4 Logistic Distribution It is used to observe how continuous variable inputs can affect the probability of a binary result. It is
denoted by Y ~ Logistic(µ, s).
Characteristics of logistic distribution
 The Cumulative Distributed Function picks up when we reach values near the mean.
 The lesser the scale parameter, the faster it reaches values close to 1.
2.5 Students’ T Distribution :Students’ T Distribution or simply called T Distribution is used to estimate population limitation when
the sample size is small and population variance is not known. It is denoted by Y~ t(k).
Characteristics of Students’ T Distribution
 A small sample size estimation of a normal distribution
 Its graph is symmetric and bell-shaped curve, however, it has large tails.
QUESTION : WHEN WE CAN APPLY BINOMIAL DISTRIBUTION OVER POISSON DISTRIBUTION?
ANS : You should only use the binomial in fitting data when there is an upper limit to the number of possible successes. When N� is
large and p� is small, so that the probability of getting N successes is small, the binomial approaches the Poisson distribution. The
Poisson distribution has no upper limit, although values much larger than the mean value are highly improbable. This characteristic
provides a rule for choosing between the binomial and Poisson. If you expect to observe a “ceiling” on the number of counts, you
should use the binomial.
QUESTION :DIFFERENTIATE BETWEEN ONE WAY TWO WAY CLASSIFICATION OF VARIANCE
ANS : Key Differences Between One-Way and Two-Way ANOVA.The differences between one- way and two-way ANOVA can be
drawn clearly on the following grounds:
1. A hypothesis test that enables us to test the equality of three or more means simultaneously using variance is called One way
ANOVA. A statistical technique in which the interrelationship between factors, influencing variable can be studied for
effective decision making, is called Two-way ANOVA.
2. There is only one factor or independent variable in one way ANOVA whereas in the case of two-way ANOVA there are two
independent variables.
3. One-way ANOVA compares three or more levels (conditions) of one factor. On the other hand, two-way ANOVA compares
the effect of multiple levels of two factors.
4. In one-way ANOVA, the number of observations need not be same in each group whereas it should be same in the case of
two-way ANOVA.
5. One-way ANOVA need to satisfy only two principles of design of experiments, i.e. replication and randomization. As
opposed to Two-way ANOVA, which meets all three principles of design of experiments which are replication,
randomization, and local control.
QUESTION :What is sampling? methods of probability sampling and non probability sampling?
ANS : Probability sampling means that every member of the population has a chance of being selected. It is mainly used
in quantitative research. If you want to produce results that are representative of the whole population, probability sampling
techniques are the most valid choice.
There are four main types of probability sample

4
.1. Simple random sampling: In a simple random sample, every member of the population has an equal chance of being selected.
Your sampling frame should include the whole population.To conduct this type of sampling, you can use tools like random number
generators or other techniques that are based entirely on chance.
2. Systematic sampling Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every
member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular
intervals. For example, if the HR database groups employees by team, and team members are listed in order of seniority, there is a risk
that your interval might skip over people in junior roles, resulting in a sample that is skewed towards senior employees.
3. Stratified sampling Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It
allows you draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.To use this sampling
method, you divide the population into subgroups (called strata) based on the relevant characteristic (e.g., gender identity, age range,
income bracket, job role).Based on the overall proportions of the population, you calculate how many people should be sampled from
each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.
4. Cluster sampling: Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar
characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire subgroups.If it is
practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also
sample individuals from within each cluster using one of the techniques above. This is called multistage sampling.
Non-probability sampling methods
In a non-probability sample, individuals are selected based on non-random criteria, and not every individual has a chance of being
included.This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias. That means the inferences you
can make about the population are weaker than with probability samples, and your conclusions may be more limited. If you use a non-
probability sample, you should still aim to make it as representative of the population as possible.
Non probability sampling
1. Convenience sampling :A convenience sample simply includes the individuals who happen to be most accessible to the
researcher.This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the
population, so it can’t produce generalizable results. Convenience samples are at risk for both sampling bias and selection bias.
3. Purposive sampling : This type of sampling, also known as judgement sampling, involves the researcher using their expertise to
select a sample that is most useful to the purposes of the research.It is often used in qualitative research, where the researcher wants to
gain detailed knowledge about a specific phenomenon rather than make statistical inferences, or where the population is very small
and specific. An effective purposive sample must have clear criteria and rationale for inclusion. Always make sure to describe your
inclusion and exclusion criteria and beware of observer bias affecting your arguments.
4. Snowball sampling : If the population is hard to access, snowball sampling can be used to recruit participants via other
participants. The number of people you have access to “snowballs” as you get in contact with more people. The downside here is also
representativeness, as you have no way of knowing how representative your sample is due to the reliance on participants recruiting
others. This can lead to sampling bias.
5. Quota sampling : Quota sampling relies on the non-random selection of a predetermined number or proportion of units. This is
called a quota.You first divide the population into mutually exclusive subgroups (called strata) and then recruit sample units until you
reach your quota. These units share specific characteristics, determined by you prior to forming your strata. The aim of quota sampling
is to control what or who makes up your sample.
Question : define simple random sampling and state its properties.
Ans : A simple random sample is a randomly selected subset of a population. In this sampling method, each member of the population
has an exactly equal chance of being selected.This method is the most straightforward of all the probability sampling methods, since it
only involves a single random selection and requires little advance knowledge about the population.there are two important
properties of a sampling distribution of the mean:
1. Sampling distribution’s mean (μ¯X) = Population mean (μ)

5
2. Sampling distribution’s standard deviation (Standard error) = σ√n, where σ is the population’s standard deviation and n is the
sample size
3. Question :mention principle of stratification in statistics
PRINCIPLES OF STRATIFICATION The principles to be kept in mind while stratifying a population are given below:
1,The strata should not be overlapping and should together comprise the whole population
2.The strata should be homogeneous within themselves and heterogeneous between themselves with respect to characteristic under
study.
3.If a investigator is facing difficulties in stratifying a population with respect to the characteristic under study, then he/she has to
consider the administrative convenience as the basis for stratification.
4.If the limit of precision is given for certain sub-population then it should be treated as stratum.
Question : define multiple linear regression model and write down its assumptions
Ans:Multiple linear regression models are a type of regression model that deals with one dependent variable and several independent
variables. Regression analysis is a statistical method or technique used for determining relationships between variables that have a
cause-and-effect relationship. Regressions can also reveal how close and well one can determine a relationship.
Assumptions
The calculation of Multiple linear regression requires several assumptions, and a few of them are as follows:
1.Linearity : One can model the linear (straight-line) relationship between Y and the X’s using multiple regression. Any curvilinear
relationship is not taken into account. This can be analyzed by scatter plots on the primary stages. At the same time, non-linear
patterns may be found in the residual plots.
2.Constant variance : For all values of the X’s, the variance of the ε is constant. To detect this, the residual plots of X’s can be used.
It is also easy to assume constant variance if the residual plots have a rectangular shape. In addition, non-constant variance exists and
must be addressed if a residual plot reveals a changing wedge shape.
3.Special Occasions : The presumption is that the data is eliminated from all special clauses resulting from one-time events.
Accordingly, the regression model may have non-constant variance, non-normality, or other issues if they don’t.
4.Normality : When one uses hypothesis tests and confidence limits, the assumption is that there is a normal distribution of ε’s.
5.Multi co-linearity : The presence of near-linear connections among the set of independent variables is co-linearity or multi-co-
linearity. Here, since multi-co-linearity causes plenty of difficulties with regression analysis, the assumption is that the data isn’t
multi-co-linear.
Question: what is equation of regression line? Which predictors are significant?
Ans: A regression line depicts the relationship between two variables. It is applied in scenarios where the change in the value of the
independent variable causes changes in the value of the dependent variable.
Question : explain the prerequisites of testing hypothesis
In the field of statistics, a hypothesis is a claim about some aspect of a population. A hypothesis test allows us to test the claim about
the population and find out how likely it is to be true. The hypothesis test consists of several components; two statements, the null
hypothesis and the alternative hypothesis, the test statistic and the critical value, which in turn give us the P-value and the rejection

6
region (𝛼), respectively. The null hypothesis, denoted as 𝐻0 is the statement that the value of the parameter is, in fact, equal to the
claimed value. We assume that the null hypothesis is true until we prove that it is not. The alternative hypothesis, denoted as 𝐻1 is the
statement that the value of the parameter differs in some way from the null hypothesis. The alternative hypothesis can use the symbols
, 𝑜𝑟 ≠. The test statistic is the tool we use to decide whether or not to reject the null hypothesis. It is obtained by taking the observed
value (the sample statistic) and converting it into a standard score under the assumption that the null hypothesis is true. The P-value
for any given hypothesis test is the probability of getting a sample statistic at least as extreme as the observed value. That is to say, it is
the area to the left or right of the test statistic. The critical value is the standard score that separates the rejection region (𝛼) from the
rest of a given curve.
Question : explain the situation where the null hypothesis is true
Ans: : Failing to Reject the Null Hypothesis : Conversely, when the p-value is greater than your significance level, you fail to reject
the null hypothesis. The sample data provides insufficient data to conclude that the effect exists in the population. When the p-value
is high, the null must fly!
Question : define statistical quality control and its significance in manufacturing.
Ans: Statistical Quality Control refers to the use of data and statistical analysis in order to identify the reasons for the variations in
quality in industrial processes such as manufacturing.
Advantages of Statistical Quality Control:
1. It helps in the control, maintainance and improvement of the quality standards.
2. Trying to identify assignable causes of variation can help us find out the sources of many production errors. It can also help us to
improve the production process by reducing the source of errors.
3. Since it is a systematic method, we do not constantly need to keep making changes. There are fixed criteria to tell us when we
should take remedial action to correct the production process.
4. If the process in control is not giving good enough results, then the quality control standards can be updated to bring them up to
the mark.
5. It allows us to be certain of the quality of the end product as long as the process is in control. When buying from a particular
supplier if the previous lots are good then we can be sure of the quality of the new lots as long as the quality control process
continues.
6. Sometimes the testing process is destructive. For example, this happens when checking the lifetimes of bulbs. Since we cannot test
the entire lot we can only check a sample. The SQC technique assures us of the quality of the entire lot on the basis of the sample.
7. The SQC process is efficient in the sense that it reduces the inspection cost by finding the optimal size of the sample that need to
be randomly tested.
Question : why is chi square called a nonparametric approach?
The chi-square test is one of the most important non-parametric statistics that can be used to determine whether observed frequencies
are significantly different from expected frequencies. It is a non-parametric statistics because it involves no assumption regarding the
normally of distribution or homogeneity of the variance.
QUESTIONS : mention the preconditions of child square test
Ans : Conditions for the Validity of Chi-Square Test: The Chi-square test statistic can be used if the following conditions are satisfied:
1. N, the total frequency, should be reasonably large, say greater than 50.
2. The sample observations should be independent. This implies that no individual item should be included twice or more in the
sample.

7
3. The constraints on the cell frequencies, if any, should be linear (i.e., they should not involve square and higher powers of the
frequencies) such as ∑fo = ∑fe = N.
4. No theoretical frequency should be small. Small is a relative term. Preferably each theoretical frequency should be larger than 10
but in any case not less than 5.If any theoretical frequency is less than 5 then we cannot apply χ2 -test as such. In that case we use the
technique of “pooling” which consists in adding the frequencies which are less than 5 with the preceding or succeeding frequency
(frequencies) so that the resulting sum is greater than 5 and adjust for the degrees of freedom accordingly.
5. The given distribution should not be replaced by relative frequencies or proportions but the data should be given in original units.
6. Yates’ correction should be applied in special circumstances when df = 1 (i.e. in 2 x 2 tables) and when the cell entries are small.
7. χ2-test is mostly used as a non-directional test (i.e. we make a two-tailed test.). However, there may be cases when χ2 tests can be
employed in making a one-tailed test.In one-tailed test we double the P-value. For example with df = 1, the critical value of χ2 at 05
level is 2.706 (2.706 is the value written under. 10 level) and the critical value of; χ2 at .01 level is 5.412 (the value is written under
the .02 level).
Question: Explain the following terms with examples
 Equality likely event
 Sure event
 Mutually exclusive event exhaustive event
 Independent event
Ans : Impossible and Sure Events : If the probability of occurrence of an event is 0, such an event is called an impossible
event and if the probability of occurrence of an event is 1, it is called a sure event. In other words, the empty set ϕ is an
impossible event and the sample space S is a sure event.
Independent Events and Dependent Events : If the occurrence of any event is completely unaffected by the occurrence of any other
event, such events are known as an independent event in probability and the events which are affected by other events are known
as dependent events.
Mutually Exclusive Events : If the occurrence of one event excludes the occurrence of another event, such events are
mutually exclusive events i.e. two events don’t have any common point. For example, if S = {1 , 2 , 3 , 4 , 5 , 6} and E1, E2 are two
events such that E1 consists of numbers less than 3 and E2 consists of numbers greater than 4.
So, E1 = {1,2} and E2 = {5,6} .Then, E1 and E2 are mutually exclusive.
Exhaustive Events : A set of events is called exhaustive if all the events together consume the entire sample space.5. Equally likely
EventsWhen the outcomes of an experiment are equally likely to happen, they are called equally likely events. Like during a coin toss you
are equally likely to get heads or tails.
Question : what are the different approaches of defining probability
Ans: Approaches There are three ways to assign probabilities to events: classical approach, relative-frequency approach, subjective
approach.

8
Classical Approach : If an experiment has n simple outcomes, this method would assign a probability of 1/n to each outcome. In
other words, each outcome is assumed to have an equal probability of occurrence. This method is also called the axiomatic approach.
Example 1: Roll of a Die S = {1, 2, · · · , 6} Probabilities: Each simple event has a 1/6 chance of occurring. Example 2: Two Rolls of
a Die S = {(1, 1), (1, 2), · · · , (6, 6)} Assumption: The two rolls are “independent.” Probabilities: Each simple event has a (1/6) · (1/6)
= 1/36 chance of occurring. 5
Relative-Frequency Approach: Probabilities are assigned on the basis of experimentation or historical data. Formally, Let A be an
event of interest, and assume that you have performed the same experiment n times so that n is the number of times A could have
occurred. Further, let nA be the number of times that A did occur. Now, consider the relative frequency nA/n. Then, in this method,
we “attempt” to define P(A) as: P(A) = lim n→∞ nA n . The above can only be viewed as an attempt because it is not physically
feasible to repeat an experiment an infinite number of times. Another important issue with this definition is that two sets of n
experiments will typically result in two different ratios. However, we expect the discrepancy to converge to 0 for large n. Hence, for
large n, the ratio nA/n may be taken as a reasonable approximation for P(A).
Subjective approach Subjective research is generally referred to as phenomenological research. This is because it is concerned with
the study of experiences from the perspective of an individual, and emphasises the importance of personal perspectives and
interpretations.
Question : define poisson distributions and its example
Ans : Poisson Distribution Definition The Poisson distribution is a discrete probability function that means the variable can only take
specific values in a given list of numbers, probably infinite. A Poisson distribution measures how many times an event is likely to
occur within “x” period of time.
Poisson Distribution Examples An example to find the probability using the Poisson distribution is given below:
Example 1:A random variable X has a Poisson distribution with parameter λ such that P (X = 1) = (0.2) P (X = 2). Find P (X = 0).
Solution: For the Poisson distribution, the probability function is defined as:
P (X =x) = (e– λ
λx
)/x!, where λ is a parameter.
Given that, P (x = 1) = (0.2) P (X = 2)
(e– λ
λ1
)/1! = (0.2)(e– λ
λ2
)/2!
⇒λ = λ2
/ 10
⇒λ = 10
Now, substitute λ = 10, in the formula, we get:
P (X =0 ) = (e– λ
λ0
)/0!

9
P (X =0) = e-10
= 0.0000454
Thus, P (X= 0) = 0.0000454
Question : mention important characteristics of normal distribution
Ans: Characteristics of a Normal Distribution : In our earlier discussion of descriptive statistics, we introduced the mean as a
measure of central tendency and variance and standard deviation as measures of variability. We can now use these parameters to
answer questions related to probability.For a normally distributed variable in a population the mean is the best measure of central
tendency, and the standard deviation(s) provides a measure of variability.
The notation for a sample from a population is slightly different:
We can use the mean and standard deviation to get a handle on probability. It turns out that, as demonstrated in the figure below,
 Approximately 68% of values in the distribution are within 1 SD of the mean, i.e., above or below.
P (µ - σ < X < µ + σ) = 0.68
 Approximately 95% of values in the distribution are within 2 SD of the mean.
P (µ - 2σ < X < µ + 2σ) = 0.95
 Approximately 99% of values in the distribution are within 3 SD of the mean.
P (µ - 3σ < X < µ + 3σ) = 0.99
There are many variables that are normally distributed and can be modeled based on the mean and standard deviation. For example,
 BMI: µ=25.5, σ=4.0
 Systolic BP: µ=133, σ=22.5
 Birth Wgt. (gms) µ=3300, σ=500
 Birth Wgt. (lbs.) µ=7.3, σ=1.1
The ability to address probability is complicated by having many distributions with different means and different standard deviations.
The solution to this problem is to project these distributions onto a standard normal distribution that will make it easy to compute
probabilities.
Question : state central limit theorem

10
Ans: The central limit theorem, which is a statistical theory, states that when a large sample size has a finite variance, the samples will
be normally distributed, and the mean of samples will be approximately equal to the mean of the whole population.
Question : define estimation ,interval estimate and sampling distribution
Ans: Estimation :Estimation is the process of determining a likely value for a population parameter (eg, the true population mean or
proportion) based on a random sample.
Interval estimation Interval estimation (or set estimation) is a kind of statistical inference in which we search for an interval of values
that contains the true parameter with high probability. Such an interval is called a confidence interval.
sampling distribution A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of
a specific population.
Questions : statistical hypothesis,level of significance and steps of hypothesis testing
Ans : Statistical hypothesis: A statement about the nature of a population. It is often stated in terms of a population parameter. Null
hypothesis: A statistical hypothesis that is to be tested. Alternative hypothesis: The alternative to the null hypothesis. Test statistic: A
function of the sample data.
The level of significance :The level of significance is defined as the fixed probability of wrong elimination of null hypothesis when in
fact, it is true. The level of significance is stated to be the probability of type I error and is preset by the researcher with the outcomes
of error.
Hypothesis Testing
There are 5 main steps in hypothesis testing:
1. State your research hypothesis as a null hypothesis and alternate hypothesis (Ho) and (Ha or H1).
2. Collect data in a way designed to test the hypothesis.
3. Perform an appropriate statistical test.
4. Decide whether to reject or fail to reject your null hypothesis.
5. Present the findings in your results and discussion section.
Step 1: State your null and alternate hypothesis : After developing your initial research hypothesis (the prediction that you want to
investigate), it is important to restate it as a null (Ho) and alternate (Ha) hypothesis so that you can test it mathematically.The alternate
hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no
relationship between the variables you are interested in.

11
Step 2: Collect data : For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to
test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are
interested in.
Step 3: Perform a statistical test : There are a variety of statistical tests available, but they are all based on the comparison of within-
group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from
one another)
Step 4: Decide whether to reject or fail to reject your null hypothesis: Based on the outcome of your statistical test, you will have
to decide whether to reject or fail to reject your null hypothesis.In most cases you will use the p-value generated by your statistical
test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 –
that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.
Step 5: Present your findings
The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.
Question : define conditional probability
Ans: Conditional probability is The probability of occurrence of any event A when another event B in relation to A has already
occurred is known as conditional probability.
Question : define Mathematical expectation
Ans: Mathematical expectation, also known as the expected value, is the summation or integration of a possible values from a random
variable. It is also known as the product of the probability of an event occurring, denoted P(x), and the value corresponding with the
actual observed occurrence of the event. The binomial distribution describes the behavior of a count variable X if the following
conditions apply:
Question : conditions of binomial distribution.
Ans: The binomial distribution describes the behavior of a count variable X if the following conditions apply:
1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes ("success" or "failure").
4: The probability of "success" p is the same for each outcome.
Question :properties of normal distribution
Properties
All forms of (normal) distribution share the following characteristics:

12
1. It is symmetric : A normal distribution comes with a perfectly symmetrical shape. This means that the distribution curve can be
divided in the middle to produce two equal halves. The symmetric shape occurs when one-half of the observations fall on each side of
the curve.
2. The mean, median, and mode are equal :The middle point of a normal distribution is the point with the maximum frequency,
which means that it possesses the most observations of the variable. The midpoint is also the point where these three measures fall.
The measures are usually equal in a perfectly (normal) distribution.
3. Empirical rule: In normally distributed data, there is a constant proportion of distance lying under the curve between the mean and
specific number of standard deviations from the mean
4. Skewness and kurtosis : Skewness and kurtosis are coefficients that measure how different a distribution is from a normal
distribution. Skewness measures the symmetry of a normal distribution while kurtosis measures the thickness of the tail ends relative
to the tails of a normal distribution.
Question : deifine probability,event and experiment
Ans: probability : The probability is the measure of the likelihood of an event to happen. It measures the certainty of the event. The
formula for probability is given by; P(E) = Number of Favourable Outcomes/Number of total outcomes.
Event : In probability theory, an event is an outcome or defined collection of outcomes of a random experiment. Since the collection
of all possible outcomes to a random experiment is called the sample space, another definiton of event is any subset of a sample space.
Experience : Statistical experiments are designed to compare the outcomes of applying one or more treatments to experimental units,
then comparing the results to a control group that does not receive a treatment
Question :define Sampling and its size
Ans:The sample size is defined as the number of observations used for determining the estimations of a given population. The size of
the sample has been drawn from the population. Sampling is the process of selection of a subset of individuals from the population to
estimate the characteristics of the whole population.
Question: Assumption of anova
Ans:Assumptions of the Factorial ANOVA
The factorial ANOVA has a several assumptions that need to be fulfilled – (1) interval data of the dependent variable, (2) normality,
(3) homoscedasticity, and (4) no multicollinearity. Furthermore similar to all tests that are based on variation (e.g. t-test, regression
analysis, and correlation analyses) the quality of results is stronger when the sample contains a lot of variation – i.e., the variation is
unrestricted and not truncated.Firstly, the factorial ANOVA requires the dependent variable in the analysis to be of metric
measurement level (that is ratio or interval data) the independent variables can be nominal or better. If the independent variables are
not nominal or ordinal they need to be grouped first before the factorial ANOVA can be done.Secondly, the factorial analysis of
variance assumes that the dependent variable approximates a multivariate normal distribution. The assumption needs can be verified
by checking graphically (either a histogram with normal distribution curve, or with a Q-Q-Plot) or tested with a goodness of fit test
against normal distribution (Chi-Square or Kolmogorov-Smirnov test, the later being preferable for interval or ratio scaled data).

13
Questions : basic principle of sample survey
Ans : Target population :The total (finite) population of individuals about which we require information; issues such as content,
location, and time may need to be considered.
Study Variables:These are the aspects of the population that we wish to measure, and will define the type(s) of measurement to be
taken; usually some aggregate feature of the population, e.g. total amount of debt, average age, proportion of target population that
have more than one car.
Sampling Units:Entities that could potentially be included in the sample. May not necessarily be the same as the individuals in the
population. For example, could take a sample of addresses in a certain area in order to gain access to information on families that may
be living there: thus theaddresses are the sampling units, whereas the families are the individuals in the populationthat we are really
interested in.
Sampling Frame: This consists of the set of all sampling units.
Selection Process :In general, we do not examine the entire population, but look at a sample, bearing in mind considerations of
accuracy, speed, cost etc. Moreover, we may have a huge population or destructive measurements (e.g. measuring lifetime of batteries,
cooking time of a packaged meal etc.): thus selection of survey material needs to be considered carefully.
Questions : Main steps involved in a sample survey
Ans:1. Stage 1: Clearly Define Target Population : The first stage in the sampling process is to clearly define target population.
Population is commonly related to the number of people living in a particular country, or in particular, a group or number of elements
that researcher plans to study among.
2. Stage2: Select Sampling Frame : A sample frame is drawn from the identified population/target population. A sampling frame is a
list of the actual cases from which sample will be drawn. Thus, the sampling frame must be representative of the population. In brief,
sampling frame is a list of all the items in the research population.
3. Stage 3: Choose Sampling Technique : Prior to examining the various types of sampling method, it is worth noting what is meant
by sampling, along with reasons why researchers are likely to select a sample. Taking a subset from chosen sampling frame or entire
population is called sampling. In general, sampling techniques can be divided into two types:
 Probability or random sampling
 Non- probability or non- random sampling
Before choosing specific type of sampling technique, it is needed to decide broad sampling technique. Figure 1 shows the various types
of sampling techniques.
Figure 1. Sampling techniques

14
4. Stage 4: Determine Sample Size : There are numerous approaches, incorporating a number of different formulas, for calculating
the sample size for categorical data.
n= p (100-p)z2
/E2
n is the required sample size
P is the percentage occurrence of a state or condition
E is the percentage maximum error required
Z is the value corresponding to level of confidence required
There are two key factors to this formula[1]
. First, there are considerations relating to the estimation of the levels of precision and risk
that the researcher is willing to accept:
5. Stage 5: Collect Data :Once target population, sampling frame, sampling technique and sample size have been established, the next
step is to collect data.
6. Stage 6: Assess Response Rate
Response rate is the number of cases agreeing to take part in the study. These cases are taken from original sample. In reality, most
researchers never achieve a 100 percent response rate. Reasons for this might include refusal to respond, ineligibility to respond, inability
to respond, or the respondent has been located but researchers are unable to make contact.
Question:define control chart : The control chart is a graph used to study how a process changes over time. Data are plotted in time
order. A control chart always has a central line for the average, an upper line for the upper control limit, and a lower line for the lower
control limit. These lines are determined from historical data.
Question: define sample space and random variable
Ans : A set of all possible outcomes of an experiment is called a sample space. We shall denote the sample space by S. Hence, that the
sample space is simply the set of all possible sample points of a given experiment.A random variable is a variable whose value is
unknown or a function that assigns values to each of an experiment's outcomes.
Question : state and Prove the addition law of probabilities for two events
Ans:Addition Theorem on Probability: If two occurrences are denoted by the letters A and B, then the probability that at least one of
the events will occur can be calculated as follows: P(AUB) = P(A) + P(B) – P(AB).
Proof: Considering that occurrences are nothing more than sets,The following is derived from set theory:
n(AUB) = n(A) + n(B)- n(A∩B).
When the above equation is divided by n(S), we get: (where S is the sample space)
n(AUB)/ n(S) = n(A)/ n(S) + n(B)/ n(S)- n(A∩B)/ n (S)
Therefore, according to the accepted definition of probability,
P(AUB) is equal to P(A) + P(B) – P(AB).

15
Question : define sample,population,cencus
Ans: Sample : A sample refers to a smaller, manageable version of a larger group. It is a subset containing the characteristics of a
larger population.
Population: In statistics, a population is the pool of individuals from which a statistical sample is drawn for a study.
Cencus: A census is a study of every unit, everyone or everything, in a population.
Question : define simple random sampling
A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member
has an equal probability of being chosen.
Advantages of simple random sampling
1. Ensures equal chance of selection : The main advantage of SRS is that it ensures that each member of the population has an equal
chance of being selected, which leads to a representative sample of the population. This is important because it allows for accurate
estimation of population characteristics and unbiased inference.
2. Easy to understand and implement : Another advantage of SRS is that it is easy to understand and implement. This makes it a
popular choice for researchers who are new to sampling or who have limited resources.
3. Versatile method: Additionally, SRS is a versatile method that can be used for both large and small populations, and it can be used
to sample from both homogeneous and heterogeneous populations.
Disadvantages of simple random sampling
1. Time-consuming : One of the main disadvantages of SRS is that it can be time-consuming and costly to identify and contact every
member of the population. This is especially true for large populations or populations that are spread out geographically.
2. Difficult to achieve a good sample : Another disadvantage of SRS is that it can be difficult to achieve a good sample size, especially
for small populations. This can lead to low precision and unreliable estimates.
3. Do not consider heterogeneous population : Another disadvantage is that SRS assumes that the population is homogeneous.
However, if the population is heterogeneous, the sample may not be representative of the population.
4. Difficult to achieve a good sample size : Another potential disadvantage of SRS is that it can be difficult to achieve a good sample
size, especially for small populations. This can lead to low precision and unreliable estimates.
5. Difficult to obtain a response : Another disadvantage is that it can be difficult to obtain a response from every member of the
population. This can lead to non-response bias and make it difficult to estimate population characteristics.
6. Affected by non-response bias : Another limitation of SRS is that it can be affected by non-response bias. This occurs when some
individuals in the population are not included in the sample because they cannot or will not respond to the survey.
Question : definition of alternative hypothesis and null hypothesis

16
Ans:The null-hypothesis is considered an accepted truth. It assumes that the research is false, that the observations are caused by
random factors. Researchers must prove the null-hypothesis wrong to prove their alternate hypothesis.
Question : what do you mean by test of significance ?important steps in test of significance
The alternative hypothesis (Ha) is the other answer to your research question. It claims that there's an effect in the population. Often,
your alternative hypothesis is the same as your research hypothesis.In Statistics, tests of significance are the method of reaching a
conclusion to reject or support the claims based on sample data. In the process of testing for statistical significance, there are the
following steps:
1. Stating a Hypothesis for Research
2. Stating a Null Hypothesis
3. Selecting a Probability of Error Level
4. Selecting and Computing a Statistical Significance Test
5. Interpreting the results
Question : state the assumptions of hypothesis and steps of testing a hypothesis
Ans: ASSUMPTIONS OF STATISTICAL HYPOTHESIS TESTING
Statistical hypothesis testing requires several assumptions. These assumptions include considerations of the level of measurement of
the variable, the method of sampling, the shape of the population distribution, and the sample size. The specific assumptions may
vary, depending on the test or the conditions of testing. The test we are considering meets these conditions: 1. The sample of
California gas stations was randomly selected. 2. The variable price per gallon is measured at the interval-ratio level. 3. We cannot
assume that the population is normally distributed. However, because our sample size is sufficiently large (N > 50), we know, based
on the central limit theorem, that the sampling distribution of the mean will be approximately normal.
What Are The Stages of Hypothesis Testing?
To successfully confirm or refute an assumption, the researcher goes through five (5) stages of hypothesis testing;
1. Determine the null hypothesis
2. Specify the alternative hypothesis
3. Set the significance level
4. Calculate the test statistics and corresponding P-value
5. Draw your conclusion
 Determine the Null Hypothesis : Like we mentioned earlier, hypothesis testing starts with creating a null hypothesis which
stands as an assumption that a certain statement is false or implausible. For example, the null hypothesis (H0) could
suggest that different subgroups in the research population react to a variable in the same way.
 Specify the Alternative Hypothesis : Once you know the variables for the null hypothesis, the next step is to determine
the alternative hypothesis. The alternative hypothesis counters the null assumption by suggesting the statement or
assertion is true. Depending on the purpose of your research, the alternative hypothesis can be one-sided or two-sided.
 Set the Significance Level : Many researchers create a 5% allowance for accepting the value of an alternative hypothesis,
even if the value is untrue. This means that there is a 0.05 chance that one would go with the value of the alternative
hypothesis, despite the truth of the null hypothesis.
.
 Calculate the Test Statistics and Corresponding P-Value : Test statistics in hypothesis testing allow you to compare
different groups between variables while the p-value accounts for the probability of obtaining sample statistics if your null
hypothesis is true. In this case, your test statistics can be the mean, median and similar parameters.

17
 Draw Your Conclusions : After conducting a series of tests, you should be able to agree or refute the hypothesis based on
feedback and insights from your sample data.
Objectives of Anova Question:
Ans: : 12-1: Understand differences between the single-factor research design and factorial research designs.
12-2: Understand advantages and disadvantages of factorial research designs compared with the single-factor research design.
12-3: Define an interaction effect between independent variables.
12-4: Identify the presence or absence of an interaction effect between independent variables.
12-5: Understand the relationship between main effects and interaction effects.Inferential statistics: Two-way analysis of variance
(ANOVA)
12-6: Understand what the letters A and B represent in an “A x B factorial research design”.
12-7: Understand what between-group variance and within-group variance are comprised of in the two-factor research design.
12-8: Understand the three sets of statistical hypotheses in a two-factor research design.
12-9: Understand why is the word ‘Error’ is included in an ANOVA summary table.
12-10: Calculate and interpret between-group variance, within-group variance, and the F-ratios (F) for the two-way ANOVA, create
an ANOVA summary table, and calculate measures of effect size (R2).Investigating a significant A x B interaction effect: Analysis of
simple effects
12-11: Understand what conclusion can be drawn (and not drawn) from a significant A x B interaction.
12-12: Understand the difference between a main effect and a simple effect.
12-13: Understand the purpose of analyzing simple effects in the A x B research design.
Question: one way anova ,layout and find anova table
Ans: : A two-way ANOVA test is a statistical analysis tool that determines the effect of two variables on an outcome, as well as testing
how altering the variables will affect the outcome.A one-way layout consists of a single factor with several levels and multiple
observations at each level. With this kind of layout we can calculate the mean of the observations within each level of our factor. The
residuals will tell us about the variation within each level. We can also average the means of each level to obtain a grand mean. We
can then look at the deviation of the mean of each level from the grand mean to understand something about the level effects. Finally,
we can compare the variation within levels to the variation across levels. Hence the name analysis of variance.One-Way ANOVA
("analysis of variance") compares the means of two or more independent groups in order to determine whether there is statistical
evidence that the associated population means are significantly different. One-Way ANOVA is a parametric test. This test is also
known as: One-Factor ANOVA.
Question: define child square distribution and application
Ans: A chi-square distribution is a continuous probability distribution. The shape of a chi-square distribution depends on its degrees of
freedom, k. The mean of a chi-square distribution is equal to its degrees of freedom (k) and the variance is 2k.
Applications of chi-square distribution
 To test the variance of the normal population, using the statistic in note (ii)
 To test the independence of attributes.

18
 To test the goodness of fit of a distribution.
 The sampling distributions of the test statistics used in the last two applications are approximately chi-square distributions.
Question: Determining UCL & LCL of control chart
Ans: The upper control limit is calculated from the data that is plotted on the control chart. It is placed 3 sigma (of the data being
plotted) away from the average line.The upper control limit is used to mark the point beyond which a sample value is considered a
special cause of variation. It is also used to define the upper limit of the common cause variation. On a control chart, the lower control
limit is a line below the centerline that indicates the number below which any individual data point would be considered out
of statistical control due to special cause variation.
Question: why x bar and are charged should be used simultaneously? find the control limit of x bar and Sigma charts when
standards are not given
X-bar and R charts must be analyzed together because they provide complementary information about the process. The X-bar chart
monitors the process mean, while the R chart monitors the process variation.The X bar chart control limits are derived from the values
of S bar (average standard deviation). If the values are out of control in the S chart, the X bar chart control limits are inaccurate. If the
points are out of control in the S chart, then stop the process. Identify the special cause and address the issue.
Question : explain the justification for using the three Sigma limits in control chart
Control limits on a control chart are commonly drawn at 3σ from the center line because 3-sigma limits are a good balance point
between two types of errors:Type I or alpha errors occur when a point falls outside the control limits even though no special cause is
operating. The result is a witch-hunt for special causes and adjustment of things here and there. The tampering usually distorts a
stable process as well as wasting time and energy.Type II or beta errors occur when you miss a special cause because the chart isn’t
sensitive enough to detect it. In this case, you will go along unaware that the problem exists and thus unable to root it out.
Question: define type 1 and type 2 errors, define size of a test, define power of a test,Z test
Ans: A type I error (false-positive) occurs if an investigator rejects a null hypothesis that is actually true in the population; a type II
error (false-negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
The size of a test is the probability of incorrectly rejecting the null hypothesis if it is true. The power of a test is the probability of
correctly rejecting the null hypothesis if it is false.
The power of a test is the probability of rejecting the null hypothesis when it is false; in other words, it is the probability of avoiding a
type II error.
A z-test is a statistical test to determine whether two population means are different when the variances are known and the sample size
is large. A z-test is a hypothesis test in which the z-statistic follows a normal distribution. A z-statistic, or z-score, is a number
representing the result from the z-test.
Question :when T test is used for testing population mean why it is called small sample test
Ans : A small sample is generally regarded as one of size n<30. A t-test is necessary for small samples because their distributions are
not normal. If the sample is large (n>=30) then statistical theory says that the sample mean is normally distributed and a z test for a
single mean can be used.
Various Types of Statistical Software Used in Social Sciences:
1. SPSS (Statistical Package for Social Sciences)
2. Stata

19
3. R
4. SAS (Statistical Analysis Software)
5. MATLAB (MATrix LABoratory)
1. SPSS (Statistical Package for Social Sciences)
 SPSS is the most widely used powerful software for complex statistical data analysis.
 It easily compiles descriptive statistics, parametric and non- parametric analysis as well as delivers graphs and
presentation ready reports to easily communicate the results.
 More accurate reports are achieved here through estimation and uncovering of missing values in the data sets.
 SPSS is used for quantitative data analysis.
2. Stata
 Stata is also a widely used software that enables to analyze, manage, store and produce graphical visualization of data.
 Coding knowledge is not necessary to use it.
 Presence of both command line and graphical user interface makes its use more intuitive.
 It is generally used by researchers in the field of economics, social sciences and bio-medicine to examine the data patterns.
 Stata is used for quantitative data analysis.
Question :relation among economics statistics and accounting
Ans:
 1.Economics- economics is use for making rational dicisions . it is study of how scarce resources are allocated to satisfy
unlimited wants of human being. Accounting is system which provides data that is helpful for judgement & decisions in
economics
 2. Statistic- In accounting various financial ratios are based on accounting ,statistics is used for making cost or price
estimation in accounting . statistics is used for long term view in accounting.statistic is concerned with typical value,
behaviour, trend.statistical method is applied when need arises for broad classification
Question : Differences Between Type I and Type II Error
The points given below are substantial so far as the differences between type I and type II error is concerned:
1. Type I error is an error that takes place when the outcome is a rejection of null hypothesis which is, in fact, true. Type II error
occurs when the sample results in the acceptance of null hypothesis, which is actually false.
2. Type I error or otherwise known as false positives, in essence, the positive result is equivalent to the refusal of the null
hypothesis. In contrast, Type II error is also known as false negatives, i.e. negative result, leads to the acceptance of the null
hypothesis.
3. When the null hypothesis is true but mistakenly rejected, it is type I error. As against this, when the null hypothesis is false
but erroneously accepted, it is type II error.
4. Type I error tends to assert something that is not really present, i.e. it is a false hit. On the contrary, type II error fails in
identifying something, that is present, i.e. it is a miss.
5. The probability of committing type I error is the sample as the level of significance. Conversely, the likelihood of committing
type II error is same as the power of the test.
6. Greek letter ‘α’ indicates type I error. Unlike, type II error which is denoted by Greek letter ‘β’.
Question: Differences Between One-tailed and Two-tailed Test
The fundamental differences between one-tailed and two-tailed test, is explained below in points:
1. One-tailed test, as the name suggest is the statistical hypothesis test, in which the alternative hypothesis has a single end. On
the other hand, two-tailed test implies the hypothesis test; wherein the alternative hypothesis has dual ends.

20
2. In the one-tailed test, the alternative hypothesis is represented directionally. Conversely, the two-tailed test is a non-
directional hypothesis test.
3. In a one-tailed test, the region of rejection is either on the left or right of the sampling distribution. On the contrary, the region
of rejection is on both the sides of the sampling distribution.
4. A one-tailed test is used to ascertain if there is any relationship between variables in a single direction, i.e. left or right. As
against this, the two-tailed test is used to identify whether or not there is any relationship between variables in either
direction.
5. In a one-tailed test, the test parameter calculated is more or less than the critical value. Unlike, two-tailed test, the result
obtained is within or outside critical value.

ADVANCED STATISTICS PREVIOUS YEAR QUESTIONS.docx

Recommended

Recommended

More Related Content

Similar to ADVANCED STATISTICS PREVIOUS YEAR QUESTIONS.docx

Similar to ADVANCED STATISTICS PREVIOUS YEAR QUESTIONS.docx (20)

Recently uploaded

Recently uploaded (20)

ADVANCED STATISTICS PREVIOUS YEAR QUESTIONS.docx