Business Statistics and Research Methodology

Chanderprabhu Jain College of Higher Studies & School of Law
Plot No. OCF, Sector A-8, Narela, New Delhi – 110040
(Affiliated to Guru Gobind Singh Indraprastha University and Approved by Govt of NCT of Delhi & Bar Council of India)
Semester: Second Semester
Subject Code: 106
Name of the Subject: Business Statistics and Research Methodology

Measures of Central Tendency
What Is Central Tendency?
A score that indicates where the center of the distribution tends
to be located.
The following are the five measures of average or central
tendency in common use:
(i) Arithmetic average or arithmetic mean or simple mean
(ii) Median
(iii) Mode
(iv) Geometric mean
(v) Harmonic mean

ARITHMETIC MEAN
• To find the arithmetic mean, add the values of
all terms and then divide the sum by the number
of terms; the quotient is the arithmetic mean.
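A minimal Python sketch of the rule above (add all terms, divide by the number of terms). The function name `arithmetic_mean` and the sample values are illustrative only.

```python
def arithmetic_mean(values):
    """Sum of all terms divided by the number of terms."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([4, 8, 15, 16, 23, 42]))  # 108 / 6 = 18.0
```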

Median
• If the scores in a distribution are listed in order from smallest to largest,
the median is defined as the midpoint of the list.
• The median divides the scores so that 50% of the scores in the distribution
have values that are equal to or less than the median.
• Computation of the median requires scores that can be placed in rank
order (smallest to largest) and are measured on an ordinal, interval, or
ratio scale.
• Usually, the median can be found by a simple counting procedure:
– 1. With an odd number of scores, list the values in order, and the median is the middle
score in the list.
– 2. With an even number of scores, list the values in order, and the median is half-way
between the middle two scores.
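The counting procedure can be sketched in Python as follows; `median` is a hypothetical helper name, and the scores are made up to show one odd-n and one even-n case.

```python
def median(scores):
    """Sort the scores, then take the middle score (odd n) or the point
    halfway between the two middle scores (even n)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 9, 1, 7, 5]))      # odd n: 1 3 5 7 9 -> 5
print(median([3, 9, 1, 7, 5, 11]))  # even n: halfway between 5 and 7 -> 6.0
```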

Mode
• The mode is defined as the most frequently
occurring category or score in the distribution.
• In a frequency distribution graph, the mode is
the category or score corresponding to the
peak or high point of the distribution.
• The mode can be determined for data
measured on any scale of measurement:
nominal, ordinal, interval, or ratio.
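Because the mode only needs counts, it works for nominal labels as well as numbers. A small sketch, assuming we want every most-frequent value returned (a distribution can have more than one mode):

```python
from collections import Counter


def modes(observations):
    """Return every category or score that occurs most frequently."""
    counts = Counter(observations)
    if not counts:
        raise ValueError("mode of an empty list is undefined")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 2, 2, 3, 3]))                  # [2, 3] (bimodal)
```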

Geometric Mean
• Given a set of n positive numbers x1, …, xn, the
geometric mean is given by the following
formula:
$$G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$$
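A sketch of the geometric mean G = (x1 · x2 · … · xn)^(1/n). It computes the n-th root of the product as exp(mean of logarithms), which gives the same value but avoids overflow when the product is large; the helper name is our own.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers, via exp(mean(log x))."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([2, 8]), 10))     # square root of 16 = 4.0
print(round(geometric_mean([1, 3, 9]), 10))  # cube root of 27 = 3.0
```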

Harmonic Mean
• If X1, X2, …, Xn are the given (non-zero) values, then the
Harmonic Mean is given by
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
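A sketch of H = n / Σ(1/Xi). This version assumes positive values, which is the usual setting (for example, averaging speeds over equal distances); the example figures are hypothetical.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive numbers")
    return len(values) / sum(1 / v for v in values)


# Average speed over two equal distances driven at 40 and 60 km/h:
print(round(harmonic_mean([40, 60]), 6))  # 2 / (1/40 + 1/60) = 48.0
```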

Measuring Variability
• Variability can be measured with
– The range
– The standard deviation/variance
• In both cases, variability is determined by
measuring distance.

The Range
• The range is the total distance covered by the
distribution, from the highest score to the
lowest score (using the upper real limit of the
highest score and the lower real limit of the lowest score).
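A sketch of the range. It assumes that, for scores recorded as whole numbers, the real limits lie 0.5 below and 0.5 above each score, so using real limits adds 1 to the simple "highest minus lowest" figure. The function name and flag are hypothetical.

```python
def score_range(scores, whole_numbers=False):
    """Highest score minus lowest score, optionally using real limits."""
    high, low = max(scores), min(scores)
    if whole_numbers:
        # upper real limit of the highest score minus lower real limit of the lowest
        return (high + 0.5) - (low - 0.5)
    return high - low


data = [3, 7, 12, 5, 8]
print(score_range(data))                      # 12 - 3 = 9
print(score_range(data, whole_numbers=True))  # 12.5 - 2.5 = 10.0
```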

The Standard Deviation
• Standard deviation measures the standard (average) distance between a score
and the mean. The calculation of standard deviation can be summarized as a
four-step process:
1. Compute the deviation (distance from the mean) for each score.
2. Square each deviation.
3. Compute the mean of the squared deviations. For a population, this involves
summing the squared deviations (sum of squares, SS) and then dividing by N.
The resulting value is called the variance or mean square and measures the
average squared distance from the mean. For samples, variance is computed
by dividing the sum of the squared deviations (SS) by n - 1, rather than N. The
value, n - 1, is known as degrees of freedom (df) and is used so that the sample
variance will provide an unbiased estimate of the population variance.
4. Finally, take the square root of the variance to obtain the standard deviation.
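The four steps map directly onto a few lines of Python. In this sketch the `sample` flag (a hypothetical parameter) switches the divisor from N to n - 1; the scores are made up so that the sum of squares comes out to 40.

```python
import math


def standard_deviation(scores, sample=False):
    """Standard deviation computed with the four-step process."""
    n = len(scores)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough scores")
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # step 1: distance from the mean
    ss = sum(d * d for d in deviations)       # step 2: square, then sum (SS)
    df = n - 1 if sample else n               # N for a population, n - 1 for a sample
    variance = ss / df                        # step 3: mean square
    return math.sqrt(variance)                # step 4: square root


scores = [1, 9, 5, 8, 7]
print(standard_deviation(scores))               # population: sqrt(40 / 5) ≈ 2.828
print(standard_deviation(scores, sample=True))  # sample: sqrt(40 / 4) ≈ 3.162
```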

Sampling: Basic Terms
1. Population
In sampling, a population signifies the units that we are interested in
studying. These units could be people, cases and pieces of data.
2. Sample
The items taken from the population for analysis (for testing a
hypothesis or arriving at a conclusion) form the sample.
3. Sample Size
The sample size is simply the number of units in your sample.
4. Sampling Error
The error that occurs because a sample is not a perfect
representative of the population is known as Sampling Error.

SAMPLING
Sampling may be defined as the procedure by which a sample is selected from a
population (individuals or groups of a certain kind) for research purposes. In sampling,
the population is divided into a number of parts called sampling units.
Advantages of sampling
Sampling offers convenience, allows intensive and exhaustive data collection,
suits situations with limited resources, and helps build better rapport.

DISADVANTAGES OF SAMPLING
Chances of bias
Difficulty in selecting a truly representative sample
Need for subject-specific knowledge
Changeability of sampling units
Impossibility of sampling in some situations.

STEPS IN SAMPLING PROCESS
Step 1: Define the population
Step 2: Identify the sampling frame
Step 3: Specify the sampling unit
Step 4: Specify the sampling method
Step 5: Determine the sample size
Step 6: Specify the sampling plan
Step 7: Select the sample

TYPES OF SAMPLING

PROBABILITY SAMPLING
In probability sampling it is possible to determine both which
sampling units belong to which sample and the probability that each
sample will be selected.
The following sampling methods are examples of probability
sampling:
Simple Random Sampling (SRS)
Stratified Sampling
Cluster Sampling
Systematic Sampling
Multistage Sampling (in which some of the methods above are
combined in stages)
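Three of the probability methods listed above can be sketched with Python's `random` module. The sampling frame of 100 unit IDs, the urban/rural strata, and the equal allocation per stratum are all hypothetical choices for illustration (proportional allocation is also common); cluster and multistage sampling are not shown.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n has the same chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Random start within the first interval, then every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


rng = random.Random(42)        # fixed seed so the sketch is repeatable
frame = list(range(1, 101))    # hypothetical sampling frame of 100 unit IDs
print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))   # sampling interval k = 20
print(stratified_sample({"urban": frame[:60], "rural": frame[60:]}, 3, rng))
```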

ADVANTAGES
Cluster sampling: convenience and ease of use.
Simple random sampling: creates samples that are highly representative
of the population.
Stratified random sampling: creates strata or layers that are highly
representative of strata or layers in the population.
Systematic sampling: creates samples that are highly representative of the
population, without the need for a random number generator.

Disadvantages
Cluster sampling: might not work well if unit members are
not homogeneous (i.e. if they are different from each other).
Simple random sampling: tedious and time consuming, especially when
creating larger samples.
Stratified random sampling: tedious and time consuming, especially
when creating larger samples.
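Systematic sampling: not as random as simple random sampling, and it can
give a biased sample if the list has a periodic pattern that matches the sampling interval.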

Non-Probability Sampling
1. Convenience Sampling
Convenience sampling is probably the most common of all sampling techniques.
With convenience sampling, the samples are selected because they are accessible to
the researcher. Subjects are chosen simply because they are easy to recruit. This
technique is considered easiest, cheapest and least time consuming.
2. Consecutive Sampling
Consecutive sampling is very similar to convenience sampling except that it seeks to
include ALL accessible subjects as part of the sample. This non-probability sampling
technique can be considered the best of all non-probability samples because including
all available subjects makes the sample a better representation of the entire population.

3. Quota Sampling
Quota sampling is a non-probability sampling technique wherein the researcher
ensures equal or proportionate representation of subjects depending on which trait is
considered as the basis of the quota.
4. Judgmental Sampling
Judgmental sampling is more commonly known as purposive sampling. In this type
of sampling, subjects are chosen to be part of the sample with a specific purpose in
mind. With judgmental sampling, the researcher believes that some subjects are more
fit for the research compared to other individuals. This is the reason why they are
purposively chosen as subjects.

5. Snowball Sampling
Snowball sampling is usually done when there is a very small population
size. In this type of sampling, the researcher asks the initial subject to
identify another potential subject who also meets the criteria of the research.
The downside of using a snowball sample is that it is hardly representative
of the population.

ERRORS
Sampling error is one which occurs due to
unrepresentativeness of the sample selected for observation.
Non-sampling error is an error that arises from human error, such as
an error in problem identification or in the method or procedure used.

CENTRAL LIMIT THEOREM
The Central Limit Theorem states that the sampling distribution of the sample means
approaches a normal distribution as the sample size gets larger, no matter what the
shape of the population distribution (provided the population has a finite variance).
As a rule of thumb, the approximation is usually good for sample sizes over 30. In other
words, if you repeatedly draw samples of the same size and graph their means, the larger
each sample is, the more closely that graph resembles a normal distribution.
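A small simulation can make the theorem concrete. This sketch assumes a strongly right-skewed population (exponential with mean 1) and compares sample sizes n = 2 and n = 30; the helper names and the 5,000 repetitions are arbitrary choices, not part of the theorem.

```python
import random
import statistics


def sample_means(draw, n, repetitions, rng):
    """Draw `repetitions` samples of size n and return the mean of each."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repetitions)]


def skewness(xs):
    """Moment coefficient of skewness; 0 for a symmetric (e.g. normal) shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


def draw_exponential(rng):
    """One value from a skewed population (exponential, mean 1)."""
    return rng.expovariate(1.0)


rng = random.Random(0)   # fixed seed so the run is repeatable
for n in (2, 30):
    means = sample_means(draw_exponential, n, 5000, rng)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 2))
# Theory for this population: the mean of the sample means is 1 for every n,
# and their skewness is 2 / sqrt(n), about 1.41 for n = 2 and 0.37 for n = 30,
# so the distribution of means becomes more symmetric and normal-looking.
```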

Probability
• The probability of an event is the proportion
of times that the event occurs in a large
number of trials of the experiment.
• It is the “long-run relative frequency of the
event.”

Some important terms
• Experiment: Draw a card from a standard deck
of 52.
• Sample space: The set of all possible
distinct outcomes, S (e.g., 52 cards).
• Elementary event or sample point: a member
of the sample space (e.g., the ace of hearts).
• Event (or event class): any set of elementary
events, e.g., Suit (Hearts), Color (Red), or
Number (Ace).

• A joint event is when you consider two (or more)
events at a time, e.g., A = heads on penny, B = heads
on quarter, and the joint event is heads on both coins.
• Intersection: (A ∩ B) = A and B occur at the same
time.
• Union: (A ∪ B) = A or B occurs, i.e., one of:
– Only A occurs.
– Only B occurs.
– A and B occur.

• Complement of an event is that the event did not occur: Ā ≡
not A, e.g., if A = red card, then Ā is a black card (not a red
card).
• Mutually exclusive events are events that cannot occur at the
same time. Events have no elementary events in common,
e.g., A = heart and B = club.
• Mutually exclusive and exhaustive events are a complete
partition of the sample space, e.g.,
– Suits (hearts, diamonds, clubs, spades)
– Numbers (A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K)

Bayes Theorem
• P(A ∩ B) = P(A, B) = P(A|B) P(B)
• P(A ∩ B) = P(A, B) = P(B|A) P(A)
• Bayes Theorem (equate the two expressions and divide by P(B) > 0):
P(A|B) = P(B|A) P(A) / P(B)
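A worked example with the card deck used earlier: let A = "the card is an ace" and B = "the card is a heart". Using exact fractions, Bayes' theorem should give P(ace | heart) = 1/13, which matches a direct count (one ace among 13 hearts). The `bayes` helper is our own name.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B), assuming P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


p_ace = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_heart_given_ace = Fraction(1, 4)   # one of the four aces is a heart

print(bayes(p_heart_given_ace, p_ace, p_heart))  # 1/13
```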

Probability Distributions
• From Hayes:
• “Any statement of a function
associating each of a set of mutually exclusive
and exhaustive events with its probability is a
probability distribution”
• “Let X represent a function that associates a
Real number with each and every elementary
event in some sample space S. Then X is called
a random variable on the sample space S.”

Random Variables
• If a random variable can take only a finite (or countably
infinite) number of values, it is a discrete random
variable. Its probability distribution is known as a
“probability mass function”.
• If a random variable can take any value in an interval
(an uncountably infinite number of values), then it is
a continuous random variable. Its probability
distribution is known as a “probability density
function”.

Characteristics of Distributions
• Discrete or continuous
• Shape
• Central tendency
• Dispersion (variability)

Binomial Distribution B(n,p)
• Consider the independent and identically
distributed random variables X1,…,Xn, which
are the results of n Bernoulli trials, each with success
probability p. The number of successes X among the n trials,
which is the sum of the 0’s and 1’s resulting from the
individual trials, is described by a Binomial distribution and has
the probability
$$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$
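A direct translation of the binomial probability C(n, k) p^k (1 − p)^(n − k) using `math.comb` (Python 3.8+). The coin-tossing numbers are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))                          # 6 / 16 = 0.375
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))  # probabilities sum to 1.0
```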

Poisson Distribution P(λt)
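• The random variable
X = number of events in an interval of width t
is described by the probability mass function
$$P(X = k) = \frac{(\lambda t)^{k} e^{-\lambda t}}{k!}, \quad k = 0, 1, 2, \ldots$$
where λ is the average number of events per unit interval.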
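A sketch of the Poisson probability (λt)^k e^(−λt) / k!. The call-centre rate of 3 calls per hour is a hypothetical figure chosen for illustration.

```python
import math


def poisson_pmf(k, rate, t=1.0):
    """P(X = k) when events occur at `rate` per unit interval over width t."""
    mu = rate * t   # expected number of events in the interval (λt)
    return mu**k * math.exp(-mu) / math.factorial(k)


# Hypothetical: calls arrive at 3 per hour; chance of exactly 2 calls in 1 hour.
print(round(poisson_pmf(2, 3.0), 4))   # 4.5 * e^-3 ≈ 0.224
```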

The Normal Distribution N(μ, σ²)
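For reference, the density of N(μ, σ²) is f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)). The sketch below writes this density by hand and checks it against `statistics.NormalDist` from Python's standard library (3.8+). The values μ = 100 and σ = 15 are illustrative only.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


dist = NormalDist(mu=100, sigma=15)
print(normal_pdf(115, 100, 15), dist.pdf(115))   # same value from both
# About 68% of the area lies within one standard deviation of the mean:
print(round(dist.cdf(115) - dist.cdf(85), 4))    # 0.6827
```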

HYPOTHESIS
A hypothesis is a specific, testable prediction. It
describes in concrete terms what you expect will
happen in a certain circumstance.
A hypothesis is used in an experiment to define the
relationship between two variables.

Hypothesis Testing
1. State the hypotheses of interest
2. Determine the suitable test statistic
3. State the level of statistical significance
4. State the decision rule for rejecting / not rejecting the
null hypothesis
5. Collect the data and complete the needed calculations
6. Decide to reject / not reject the null hypothesis

Errors in Testing
It is common to make two types of errors while drawing
conclusions in research:
Type I: We accept the research hypothesis (reject the null
hypothesis) when the null hypothesis is actually true.
Type II: We reject the research hypothesis (fail to reject the null
hypothesis) when the null hypothesis is actually false.

Hypothesis Testing Steps
1. Set up a null hypothesis and alternative hypothesis.
2. Decide about the test criterion to be used.
3. Calculate the test statistic using the given values from the sample.
4. Find the critical value at the required level of significance and degrees
of freedom.
5. Decide whether to accept or reject the null hypothesis. If the absolute value of
the calculated test statistic is less than the critical value (two-tailed test), we do
not reject (accept) the null hypothesis; otherwise we reject it.

Different Types of Hypothesis
1) Simple Hypothesis
If a hypothesis specifies the population completely, i.e., both the
functional form and the values of the parameters, it is called a simple hypothesis.
2) Composite Hypothesis or Multiple Hypothesis
If the hypothesis does not completely specify the population (for example,
it states only that μ > 50), it is a composite hypothesis or multiple
hypothesis.

3) Parametric Hypothesis
A hypothesis which specifies only the parameters of the probability density function is
called a parametric hypothesis.
4) Non-Parametric Hypothesis
If a hypothesis specifies only the form of the density function in the population, it is called
a non-parametric hypothesis.
5) Null and Alternative Hypothesis
A null hypothesis is the statistical hypothesis that is stated first and tested for possible rejection.
It is the original hypothesis (often one of no difference or no effect). Any hypothesis other than the
null hypothesis is called an alternative hypothesis. When the null hypothesis is rejected, we accept the
alternative hypothesis. The null hypothesis is denoted by H0 and the alternative hypothesis by H1.

Level Of Significance
The level of significance is defined as the probability of rejecting a null
hypothesis by the test when it is really true, which is denoted as α. That
is, P (Type I error) = α.
Confidence level:
Confidence level refers to the probability that a specified range of values
(the confidence interval) contains the parameter, and is denoted by c. The
confidence level is connected with the level of significance by
c = 1 − α.
For example, α = 0.05 corresponds to a 95% confidence level.
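The relationship c = 1 − α, together with the matching two-tailed critical value of the standard normal distribution, can be tabulated with `statistics.NormalDist`. The helper name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_z_critical(alpha):
    """Critical value z such that P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    c = 1 - alpha   # confidence level
    print(f"alpha={alpha:.2f}  c={c:.2f}  z={two_tailed_z_critical(alpha):.3f}")
# alpha=0.10  c=0.90  z=1.645
# alpha=0.05  c=0.95  z=1.960
# alpha=0.01  c=0.99  z=2.576
```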

Z-TEST
A z-test is a statistical test used to determine whether two
population means are different when the variances are
known and the sample size is large. The test statistic is
assumed to have a normal distribution, and nuisance
parameters such as standard deviation should be known for
an accurate z-test to be performed.
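A sketch of a two-sample z-test for H0: μ1 = μ2 when both population standard deviations are known, using z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2) and a two-tailed p-value. The summary figures are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic and two-tailed p-value for H0: mu1 = mu2 with known sigmas."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of the difference
    z = (mean1 - mean2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = two_sample_z_test(54, 50, sigma1=10, sigma2=10, n1=50, n2=50)
print(round(z, 3), round(p, 4))   # 2.0 0.0455
alpha = 0.05
print("reject H0" if p < alpha else "do not reject H0")   # reject H0
```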

T-TEST
A t-test compares two population means through statistical
examination; a two-sample t-test is commonly used with small sample
sizes to test the difference between the samples when the variances of
two normal distributions are not known.
A t-test uses the t-statistic, the t-distribution and degrees of freedom to
determine the probability of a difference between populations; the test statistic
in the test is known as the t-statistic. To compare the means of three or more
groups, an analysis of variance (ANOVA) is used.
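A sketch of Student's two-sample t statistic, assuming the two unknown population variances are equal (pooled variance). Welch's version would be used if they differ. Python's standard library has no t-distribution, so the decision compares |t| with a t-table value. The scores are hypothetical.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Return (t, degrees_of_freedom) for H0: equal means, pooled variance."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)   # sample variances (n - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, n1 + n2 - 2


group_a = [12, 14, 11, 15, 13]
group_b = [10, 9, 12, 11, 8]
t, df = pooled_t_test(group_a, group_b)
print(round(t, 3), df)   # 3.0 8
# The two-tailed 5% critical value of t with 8 df is 2.306, so |t| = 3.0
# exceeds it and H0 (equal means) is rejected.
```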

CHI-SQUARE TEST
There are two types of chi-square tests. Both use the chi-square statistic
and distribution for different purposes:
A chi-square goodness of fit test determines whether sample data match a
hypothesized population distribution.
A chi-square test for independence compares two variables in a
contingency table to see if they are related. In a more general sense, it tests
whether distributions of categorical variables differ from each other.
In a goodness of fit test, a very small chi-square statistic means that the observed
data fit the expected data very well, while a very large statistic means a poor fit.
In a test for independence, the expected frequencies are computed assuming the
variables are unrelated, so a small statistic gives no evidence of a relationship,
while a large statistic suggests that the variables are related.
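A sketch of the chi-square statistic Σ (O − E)² / E. The goodness-of-fit example uses hypothetical die-roll counts. The `expected_counts` helper shows how expected frequencies are usually formed for a test of independence (row total × column total / grand total). Only the statistic is computed, so the decision uses a table value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table):
    """Expected frequencies under independence for a contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


# Goodness of fit: is a die fair? 60 rolls, so 10 expected per face.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1
print(round(stat, 2), df)   # 4.2 5
# The 5% critical value of chi-square with 5 df is 11.07, so 4.2 gives
# no evidence that the die is unfair.
```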


ANOVA
Analysis of variance (ANOVA) is a collection of statistical models and their
associated estimation procedures (such as the "variation" among and between
groups) used to analyze the differences among group means in a sample. ANOVA
was developed by statistician and evolutionary biologist Ronald Fisher. In the
ANOVA setting, the observed variance in a particular variable is partitioned into
components attributable to different sources of variation. In its simplest form,
ANOVA provides a statistical test of whether the population means of several groups
are equal, and therefore generalizes the t-test to more than two groups. ANOVA is
useful for comparing (testing) three or more group means for statistical significance.
It is conceptually similar to multiple two-sample t-tests, but is more conservative
(results in less type I error) and is therefore suited to a wide range of practical
problems.
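In the simplest (one-way) form, the variation is partitioned into a between-groups and a within-groups sum of squares, and F is the ratio of their mean squares. A sketch with hypothetical data:

```python
import statistics


def one_way_anova(*groups):
    """F statistic for H0: all group means are equal.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df1, df2 = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df1, df2)   # 27.0 2 6
# The 5% critical value of F with (2, 6) df is about 5.14, so H0 is rejected.
```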

THEORY OF ESTIMATION
Estimation theory is a branch of statistics that deals with estimating the
values of parameters based on measured empirical data that has a random
component. The parameters describe an underlying physical setting in such
a way that their value affects the distribution of the measured data.
An estimator attempts to approximate the unknown parameters using the
measurements. When the data consist of multiple variables and one is
estimating the relationship between them, estimation is known as regression
analysis.
In estimation theory, two approaches are generally considered: point estimation and interval estimation.

POINT ESTIMATION
In statistics, point estimation involves the use of sample data to calculate a
single value (known as a point estimate or statistic) which is to serve as a
"best guess" or "best estimate" of an unknown population parameter (for
example, the population mean).
More formally, it is the application of a point estimator to the data to obtain a
point estimate.

INTERVAL ESTIMATION
In statistics, interval estimation is the use of sample data to calculate an
interval of possible (or probable) values of an unknown population
parameter, in contrast to point estimation, which is a single number. Neyman
(1937) identified interval estimation ("estimation by interval") as distinct
from point estimation ("estimation by unique estimate"). In doing so, he
recognised that then-recent work quoting results in the form of an estimate
plus-or-minus a standard deviation indicated that interval estimation was
actually the problem statisticians really had in mind.
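A sketch of both ideas for a population mean: the sample mean serves as the point estimate, and the interval is that estimate plus or minus z · σ / √n. It assumes the population standard deviation σ is known (otherwise a t-based interval would be used). The measurements and σ = 2.0 are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def z_confidence_interval(data, sigma, confidence=0.95):
    """Interval for the population mean when sigma is known."""
    n = len(data)
    mean = statistics.fmean(data)                       # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin


sample = [48, 52, 50, 47, 53, 49, 51, 50, 52, 48]
low, high = z_confidence_interval(sample, sigma=2.0)
print(round(low, 2), round(high, 2))   # 48.76 51.24
```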

CORRELATION
Correlation addresses the relationship between two different
factors (variables). The statistic is called a correlation
coefficient. A correlation coefficient can be calculated when
there are two (or more) sets of scores for the same individuals
or matched groups.

Correlation in general answers the following questions:
• What is the chance or likelihood that as one variable moves the
other will move too? (i.e. STRENGTH OR MAGNITUDE of
the relationship)
• How will the variables move in relation to each other? As one
variable’s value increases what is the chance that the other
variable’s value will also increase i.e. positive correlation?
What is the chance that the other variable’s value will decrease
instead i.e. negative correlation? (i.e. the DIRECTION of the
relationship)

IMPORTANCE OF CORRELATION:
• Correlation is very important in the field of Psychology and
Education as a measure of relationship between test scores and
other measures of performance.
• With the help of correlation, it is possible to form a more accurate
idea of the working capacity of a person.
• With its help, it is also possible to gain knowledge of
the various qualities of an individual.
• After finding the correlation between two or more
qualities of an individual, it is also possible to
provide vocational guidance to that individual.

KARL PEARSON’S COEFFICIENT OF CORRELATION
• The most important algebraic method of measuring
correlation is Karl Pearson’s coefficient of correlation, or
the Pearsonian coefficient of correlation. It is widely used
in statistics. It is denoted by r.

Interpretation of Karl Pearson’s Coefficient of correlation
Karl Pearson’s Coefficient of correlation, denoted by r, measures the
degree of correlation between two variables. r takes values
between –1 and 1 (inclusive).
• When r is –1, we say there is perfect negative correlation.
• When r is a value between –1 and 0, we say that there is a
negative correlation.
• When r is 0, we say there is no (linear) correlation.
• When r is a value between 0 and 1, we say there is a positive
correlation.
• When r is 1, we say there is a perfect positive
correlation.
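A sketch of the product-moment formula r = Σ dx·dy / √(Σ dx² · Σ dy²), where dx and dy are deviations from the respective means. The data pairs are illustrative, chosen to hit the perfect-correlation cases.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]   # deviations from the mean of x
    dy = [b - my for b in y]   # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0 (perfect negative)
```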

SPEARMAN'S CORRELATION
• There are some situations in Education and Psychology where
the objects or individuals may be ranked and arranged in order
of merit or proficiency on two variables and when these 2 sets
of ranks covary or have agreement between them, we measure
the degrees of relationship by rank correlation.
• Again, there are problems in which the relationship among the
measurements made is non-linear, and cannot be described by
the product-moment r.
• This coefficient of correlation is denoted by Greek letter ρ
(called Rho)

ASSUMPTIONS OF RHO (ρ):
• N is small or the data are badly skewed.
• They are free, or independent, of some characteristics of the
population distribution.
• In many situations Ranking methods are used, where
quantitative measurements are not available.
• Though quantitative measurements are available, ranks are
substituted to reduce arithmetical labour.
• Such tests are described as non-parametric.
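A sketch of ρ using the common shortcut ρ = 1 − 6 Σd² / (n(n² − 1)), where d is the difference between the two ranks of each individual. This formula assumes no tied values, so the sketch rejects ties rather than assigning average ranks. The judges' rankings are hypothetical.

```python
def spearman_rho(x, y):
    """Rank correlation via the no-ties shortcut formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("tied values need average ranks; not handled in this sketch")

    def ranks(values):
        order = sorted(values)
        return [order.index(v) + 1 for v in values]   # rank 1 = smallest value

    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Ranks of five students given by two judges:
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_rho(judge_1, judge_2))   # 1 - 6*4/120 = 0.8
```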

Regression analysis
• Regression analysis is used when you want to predict a
continuous dependent variable from a number of independent
variables.
• If the dependent variable is dichotomous, then logistic
regression should be used. (If the split between the two levels
of the dependent variable is close to 50-50, then both logistic
and linear regression will end up giving you similar results.)
• The independent variables used in regression can be either
continuous or dichotomous. Categorical independent variables with more
than two levels can also be used in regression analyses, but
they first must be converted into variables that have only two
levels (dummy variables).

TWO REGRESSION LINES
• When there is a reasonable amount of scatter, we can draw two
different regression lines depending upon which variable we
consider to be the most accurate. The first is a line of
regression of y on x, which can be used to estimate y given
x. The other is a line of regression of x on y, used to estimate
x given y.
• If there is a perfect correlation between the data (in other
words, if all the points lie on a straight line), then the two
regression lines will be the same.
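A sketch computing both least-squares lines: y on x with slope b_yx = Sxy / Sxx, and x on y with slope b_xy = Sxy / Syy. The product of the two slopes equals r², so the lines coincide only when r = ±1. The data are hypothetical.

```python
def regression_lines(x, y):
    """Return (a_yx, b_yx, a_xy, b_xy) for the lines
    y = a_yx + b_yx * x  (regression of y on x) and
    x = a_xy + b_xy * y  (regression of x on y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b_yx = sxy / sxx
    b_xy = sxy / syy
    return my - b_yx * mx, b_yx, mx - b_xy * my, b_xy


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a_yx, b_yx, a_xy, b_xy = regression_lines(x, y)
print(round(a_yx, 2), round(b_yx, 2))   # y = 0.6 + 0.8x
print(round(a_xy, 2), round(b_xy, 2))   # x = 0.6 + 0.8y, i.e. y = 1.25x - 0.75
print(round(b_yx * b_xy, 2))            # r^2 = 0.64
```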

STANDARD ERROR OF THE ESTIMATE
The standard error of the estimate is a measure of the accuracy
of predictions. Recall that the regression line is the line that
minimizes the sum of squared deviations of prediction (also
called the sum of squares error).
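A sketch of the standard error of the estimate for the line of y on x. It assumes the sample version √(SSE / (n − 2)); some texts divide by N when the data are treated as a whole population. The data are hypothetical.

```python
import math


def standard_error_of_estimate(x, y):
    """sqrt(SSE / (n - 2)) for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    sse = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))   # sum of squares error
    return math.sqrt(sse / (n - 2))


print(round(standard_error_of_estimate([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 4))   # sqrt(3.6 / 3) ≈ 1.0954
```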

TIME SERIES
Time series is a sequence of data points in chronological
sequence, most often gathered in regular intervals. Time series
analysis can be applied to any variable that changes over time,
and, generally speaking, data points that are closer
together are more similar than those further apart.

OBJECTIVES OF TIME SERIES ANALYSIS
There are many objectives related to time series
analysis; they may be classified
as
• Description
• Explanation
• Prediction
• Control

COMPONENTS OF A TIME SERIES
The factors that are responsible for bringing about changes in a
time series, also called the components of time series, are as
follows:
• Secular Trends (or General Trends)
• Seasonal Movements
• Cyclical Movements
• Irregular Fluctuations

MEASUREMENT OF SECULAR TREND
• Free hand graphic method
• Arbitrary average method
• Semi average method
• Moving average method
• Straight line method of least square
• Parabolic method of least square
• Geometric method of least square
• Exponential method of least square
• Growth curve method of least square
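As an illustration of the moving average method, here is a sketch of centred moving averages for an odd period. Even periods need an extra centring step, which this sketch does not handle. The sales figures are hypothetical.

```python
def moving_average(series, period):
    """Centred moving averages of an odd period (e.g. 3 or 5).

    The first and last (period // 2) positions have no trend value (None).
    """
    if period % 2 == 0 or period < 1:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(window) / period
    return trend


sales = [12, 15, 14, 18, 20, 19, 23]
print([None if v is None else round(v, 2) for v in moving_average(sales, 3)])
# [None, 13.67, 15.67, 17.33, 19.0, 20.67, None]
```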

LEAST SQUARES METHOD
The least squares method is a form of mathematical regression
analysis that finds the line of best fit for a dataset, providing a
visual demonstration of the relationship between the data
points. Each point of data is representative of the relationship
between a known independent variable and an unknown
dependent variable.
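A sketch of the straight-line method of least squares applied to a trend, y = a + b·t. It assumes time is coded t = 0, 1, 2, …; any equally spaced coding gives the same fitted values. The sales series is hypothetical.

```python
def least_squares_trend(values):
    """Fit y = a + b*t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t = list(range(n))
    mt, my = sum(t) / n, sum(values) / n
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, values)) / sum((ti - mt) ** 2 for ti in t)
    a = my - b * mt
    return a, b


sales = [12, 15, 14, 18, 20, 19, 23]
a, b = least_squares_trend(sales)
print(round(a, 3), round(b, 3))   # 12.25 1.679  (b = 47 / 28)
```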
