2. SYLLABUS
• INTRODUCTION TO BUSINESS ANALYTICS
• MATHEMATICAL MODELLING
• UNDERSTANDING DATA TYPES AND EXPLORATORY ANALYTICS
• DATA PREDICTION TECHNIQUES, SAMPLING AND TESTING OF HYPOTHESIS
• SIMPLE LINEAR AND MULTIPLE LINEAR REGRESSION
• LOGISTIC REGRESSION
• FORECASTING TECHNIQUES
• CLUSTER ANALYSIS
• DECISION TREES
4. Reference textbooks
• PREDICTIVE ANALYTICS FOR BUSINESS STRATEGY- REASONING FROM
DATA TO ACTIONABLE KNOWLEDGE – JEFFREY T. PRINCE &
AMARANTH BOSE
• APPLIED PREDICTIVE ANALYTICS: PRINCIPLES AND TECHNIQUES FOR
THE PROFESSIONAL DATA ANALYST – DEAN ABBOTT
5. Introduction
• A key aspect of solving real business problems is dealing appropriately
with uncertainty.
• This involves recognizing explicitly that uncertainty exists and using
quantitative methods to model uncertainty.
• In many situations, the uncertain quantity is a numerical quantity. In
the language of probability, it is called a random variable.
• A probability distribution lists all of the possible values of the
random variable and their corresponding probabilities.
7. • Probability is the likelihood that an outcome occurs.
• An experiment is the process that results in an outcome.
• The outcome of an experiment is a result that we observe.
• The sample space is the collection of all possible outcomes of an
experiment.
• An event is a collection of one or more outcomes from a sample
space.
Basic Concepts of Probability
8. Probabilities may be defined from one of three perspectives:
Classical definition: probabilities can be deduced from theoretical
arguments
Relative frequency definition: probabilities are based on empirical
data
Subjective definition: probabilities are based on judgment and
experience
Definitions of Probability
9. Probability Essentials
• A probability is a number between 0 and 1 that measures the
likelihood that some event will occur.
• An event with probability 0 cannot occur, whereas an event with probability 1
is certain to occur.
• An event with probability greater than 0 and less than 1 involves uncertainty,
and the closer its probability is to 1, the more likely it is to occur.
• Probabilities are sometimes expressed as percentages or odds, but
these can be easily converted to probabilities on a 0-to-1 scale.
10. Addition Rule
• Events are mutually exclusive if at most one of them
can occur—that is, if one of them occurs, then none
of the others can occur.
• Events are exhaustive if they exhaust all
possibilities—one of the events must occur.
• The addition rule of probability involves the
probability that at least one of the events will occur.
• When the events are mutually exclusive, the probability
that at least one of the events will occur is the sum of their
individual probabilities:
P(A1 or A2 or ... or An) = P(A1) + P(A2) + ... + P(An)
11. • The union of two events contains all outcomes that belong to either
of the two events.
• If A and B are two events, the probability that some outcome in either A or B
(that is, the union of A and B) occurs is denoted as P(A or B).
• Two events are mutually exclusive if they have no outcomes in
common.
• Rule 3. If events A and B are mutually exclusive, then P(A or B) = P(A)
+ P(B).
Union of Events
12. • The notation (A and B) represents the intersection of events A and B, that
is, all outcomes belonging to both A and B.
• Rule 4. If two events A and B are not mutually exclusive, then P(A or B) =
P(A) + P(B) - P(A and B).
Non-Mutually Exclusive Events
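Rules 3 and 4 can be checked by direct counting. The sketch below assumes a single roll of a fair six-sided die; the events A, B, and C are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favorable outcomes divided by all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3 (overlaps with A)
C = {1, 3}      # roll is 1 or 3 (mutually exclusive with A)

# Rule 4: A and B overlap, so the joint probability is subtracted once.
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Rule 3: A and C share no outcomes, so the probabilities simply add.
p_a_or_c = prob(A) + prob(C)

# Both results agree with counting the outcomes in the union directly.
print(p_a_or_b, prob(A | B))
print(p_a_or_c, prob(A | C))
```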
13. • The probability of the intersection of two events is called a joint
probability.
• The probability of an event, irrespective of the outcome of the other
joint event, is called a marginal probability.
Joint and Marginal Probability
14. Conditional Probability and the Multiplication
Rule
• A formal way to revise probabilities on the basis of new information is
to use conditional probabilities.
• Let A and B be any events with probabilities P(A) and P(B). If you are
told that B has occurred, then the probability of A might change.
• The new probability of A is called the conditional probability of A given B, or
P(A|B).
• It can be calculated with the following formula, provided P(B) > 0:
P(A|B) = P(A and B) / P(B)
15. • Conditional probability is the probability of occurrence of one event
A, given that another event B is known to be true or has already
occurred.
Conditional Probability
16. • If two events are independent, then we can simplify the
multiplication law of probability in equation (5.4)
by substituting P(A) for P(A | B):
P(A and B) = P(A) P(B)
Multiplication Law for Independent Events
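A small cross-tabulation shows how joint, marginal, and conditional probabilities relate, and how to check independence by comparing P(A and B) with P(A) P(B). The survey counts below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Hypothetical survey of 100 customers (illustrative counts only).
counts = {
    ("female", "likes"): 35, ("female", "dislikes"): 15,
    ("male", "likes"): 25, ("male", "dislikes"): 25,
}
total = sum(counts.values())

def joint(gender, opinion):
    """Joint probability P(gender and opinion)."""
    return Fraction(counts[(gender, opinion)], total)

# Marginal probabilities: sum the joint probabilities over the other variable.
p_female = joint("female", "likes") + joint("female", "dislikes")
p_likes = joint("female", "likes") + joint("male", "likes")

# Conditional probability: P(likes | female) = P(likes and female) / P(female).
p_likes_given_female = joint("female", "likes") / p_female

# Independence holds only if P(likes and female) = P(likes) * P(female).
independent = joint("female", "likes") == p_likes * p_female

print(p_likes, p_likes_given_female, independent)
```

Here knowing that the customer is female revises the probability of liking the product upward, so the two events are not independent.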
17. • A random variable is a numerical description of the outcome of an
experiment.
• A discrete random variable is one for which the number of possible
outcomes can be counted.
• A continuous random variable has outcomes over one or more
continuous intervals of real numbers.
Random Variables
18. Probability Distribution of a Single Random
Variable
• A discrete random variable has a finite or countably infinite number of
possible values.
• A continuous random variable has a continuum of possible values.
• Usually a discrete distribution results from a count, whereas a
continuous distribution results from a measurement.
• This distinction between counts and measurements is not always clear-cut.
• Mathematically, there is an important difference between discrete
and continuous probability distributions.
• Specifically, a proper treatment of continuous distributions requires calculus.
19. Examples of discrete random variables:
• outcomes of dice rolls
• whether a customer likes or dislikes a product
• number of hits on a Web site link today
Examples of continuous random variables:
• weekly change in DJIA
• daily temperature
• time between machine failures
Discrete and Continuous Random Variables
20. • A probability distribution is a characterization of the possible values
that a random variable may assume along with the probability of
assuming these values.
• We may develop a probability distribution using any one of the three
perspectives of probability: classical, relative frequency, and
subjective.
Probability Distributions
21. • The expected value of a random variable corresponds to the notion
of the mean, or average, for a sample.
• For a discrete random variable X, the expected value, denoted E[X], is
the weighted average of all possible outcomes, where the weights are
the probabilities:
E[X] = Σ x_i P(X = x_i), summed over all possible values x_i
Expected Value of a Discrete Random Variable
22. • The expected value is a “long-run average” and is appropriate for
decisions that occur on a repeated basis.
• For one-time decisions, however, you need to consider the downside
risk and the upside potential of the decision.
Expected Value and Decision Making
23. • The variance, Var[X], of a discrete random variable X is a weighted
average of the squared deviations from the expected value:
Var[X] = Σ (x_i - E[X])² P(X = x_i), summed over all possible values x_i
Variance of a Discrete Random Variable
24. Summary Measures of a Probability Distribution
The mean, often denoted μ, is a weighted sum of
the possible values, weighted by their
probabilities: μ = Σ x_i P(X = x_i)
• It is also called the expected value of X and denoted E(X).
To measure the variability in a distribution, we
calculate its variance or standard deviation.
• The variance, denoted by σ² or Var(X), is a weighted sum of the
squared deviations of the possible values from the mean, where
the weights are again the probabilities.
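As a minimal sketch of these summary measures, the mean is the probability-weighted sum of the values and the variance is the probability-weighted sum of squared deviations from that mean. The distribution of daily units sold below is hypothetical.

```python
import math

# Hypothetical distribution: units sold per day -> probability.
distribution = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# A valid probability distribution must sum to 1.
assert math.isclose(sum(distribution.values()), 1.0)

# Mean (expected value): values weighted by their probabilities.
mean = sum(x * p for x, p in distribution.items())

# Variance: squared deviations from the mean, weighted by the same probabilities.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance, in the units of X.
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```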
25. Discrete random variables
• Bernoulli distribution
• Binomial distribution
• Poisson distribution
• Hypergeometric distribution
26. Continuous random variables
• Uniform
• Normal
• Exponential
• Lognormal - when the probability of zero is very low, but the most likely value is just greater than
zero
28. Sampling is the foundation of statistical analysis.
Sampling plan - a description of the approach that is used to
obtain samples from a population prior to any data collection
activity.
A sampling plan states:
- its objectives
- target population
- population frame (the list from which the sample is
selected)
- operational procedures for collecting data
- statistical tools for data analysis
Statistical Sampling
29. • A company wants to understand how golfers might respond to a
membership program that provides discounts at golf courses.
• Objective - estimate the proportion of golfers who would join the program
• Target population - golfers over 25 years old
• Population frame - golfers who purchased equipment at particular stores
• Operational procedures - e-mail link to survey or direct-mail questionnaire
• Statistical tools - PivotTables to summarize data by demographic groups and estimate
likelihood of joining the program
A Sampling Plan for a Market Research Study
30. Subjective Methods
Judgment sampling – expert judgment is used to select the sample
Convenience sampling – samples are selected based on the ease
with which the data can be collected
Probabilistic Sampling
Simple random sampling involves selecting items from a
population so that every subset of a given size has an equal chance
of being selected
Sampling Methods
31. • Systematic (periodic) sampling – a sampling plan that selects every nth item
from the population.
• Stratified sampling – applies to populations that are divided into natural
subsets (called strata) and allocates the appropriate proportion of samples
to each stratum.
• Cluster sampling - based on dividing a population into subgroups (clusters),
sampling a set of clusters, and (usually) conducting a complete census
within the clusters sampled
• Sampling from a continuous process
• Select a time at random; then select the next n items produced after that time.
• Select n times at random; then select the next item produced after each of these
times.
Additional Probabilistic Sampling Methods
https://www.youtube.com/watch?v=1XFU1d9XIWM
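The probabilistic sampling methods can be sketched with Python's standard `random` module. The population frame, the `region` stratum label, and the function names below are hypothetical, chosen only to illustrate simple random, systematic, and stratified sampling.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population frame: 100 customers, 60 in the north and 40 in the south.
population = [
    {"id": i, "region": "north" if i < 60 else "south"} for i in range(100)
]

def simple_random_sample(frame, n):
    """Every subset of size n has the same chance of being selected."""
    return random.sample(frame, n)

def systematic_sample(frame, step):
    """Pick a random starting point, then take every step-th item."""
    start = random.randrange(step)
    return frame[start::step]

def stratified_sample(frame, n, key):
    """Allocate the sample to each stratum in proportion to its size."""
    strata = {}
    for item in frame:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n in general.
        share = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, share))
    return sample

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(population, 10, "region")
print(len(srs), len(systematic), len(stratified))
```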
32. • Estimation involves assessing the value of an unknown population parameter
using sample data
• Estimators are the measures used to estimate population parameters
• E.g., sample mean (x̄), sample variance (s²), sample proportion (p̂)
• A point estimate is a single number derived from sample data that is used to
estimate the value of a population parameter.
• If the expected value of an estimator equals the population parameter it is
intended to estimate, the estimator is said to be unbiased.
Estimating Population Parameters
33. • Sampling (statistical) error occurs because samples
are only a subset of the total population
• Sampling error is inherent in any sampling process, and
although it can be minimized, it cannot be totally avoided.
• Nonsampling error occurs when the sample does not
represent the target population adequately.
• Nonsampling error usually results from a poor sample
design or inadequate data reliability.
Sampling Error
34. • The sampling distribution of the mean is the distribution of the means of all
possible samples of a fixed size n from some population.
• The standard deviation of the sampling distribution of the mean is called the
standard error of the mean: σ/√n, where σ is the population standard deviation.
• As n increases, the standard error decreases.
• Larger sample sizes have less sampling error.
Sampling Distributions
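A short simulation can illustrate that the spread of sample means shrinks like σ/√n. The population below is hypothetical (10,000 draws from a normal distribution), and the function name is an illustrative choice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values with mean near 50 and std dev near 10.
population = [random.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

def simulated_standard_error(n, trials=2000):
    """Standard deviation of the means of many random samples of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

# Compare the simulated value with the theoretical sigma / sqrt(n).
for n in (10, 40, 160):
    print(n, round(simulated_standard_error(n), 3), round(sigma / n ** 0.5, 3))
```

Each fourfold increase in n should roughly halve the standard error.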
35. • An interval estimate provides a range for a population
characteristic based on a sample.
• Intervals specify a range of plausible values for the characteristic of
interest and a way of assessing “how plausible” they are.
• In general, a 100(1 - α)% probability interval is any interval [A,
B] such that the probability of falling between A and B is 1 - α.
• Probability intervals are often centered on the mean or median.
• Example: in a normal distribution, the mean plus or minus 1 standard
deviation describes an approximate 68% probability interval around
the mean.
Interval Estimates
36. • A confidence interval is a range of values between which the
value of the population parameter is believed to be, along
with a probability that the interval correctly estimates the
true (unknown) population parameter.
• This probability is called the level of confidence, denoted by 1 - α,
where α is a number between 0 and 1.
• The level of confidence is usually expressed as a percent; common
values are 90%, 95%, or 99%.
• For a 95% confidence interval, if we chose 100 different
samples, leading to 100 different interval estimates, we
would expect that 95% of them would contain the true
population mean.
Confidence Intervals
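The repeated-sampling interpretation of a 95% confidence interval can be illustrated by simulation. This sketch assumes a normal population with a known standard deviation, so it uses the z-based interval x̄ ± z(α/2) σ/√n rather than the t-based one; the population parameters and names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Interval for the mean, assuming the population std dev sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper-tail critical value
    margin = z * sigma / math.sqrt(len(sample))
    center = fmean(sample)
    return center - margin, center + margin

random.seed(1)  # fixed seed so the sketch is reproducible
TRUE_MEAN, SIGMA, N, TRIALS = 100.0, 15.0, 30, 1000  # hypothetical values

# Draw many samples and count how often the interval contains the true mean.
covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    low, high = z_confidence_interval(sample, SIGMA)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The observed share should land close to, but not exactly at, 95%.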
37. • The t-distribution is a family of probability distributions with a shape
similar to the standard normal distribution. Different t-distributions are
distinguished by an additional parameter, degrees of freedom (df).
• As the number of degrees of freedom increases, the t-distribution converges to the
standard normal distribution
• The t-distribution has a larger variance than the standard normal distribution.
The t-Distribution
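38. A 100(1 – α)% confidence interval for the mean when the population
standard deviation is unknown is x̄ ± tα/2 (s/√n),
where tα/2 is the value of the t-distribution with
df = n − 1 for an upper tail area of α/2, s is the sample standard
deviation, and n is the sample size.
t values are found in Table 2 of Appendix A or with the Excel
function T.INV(1 – α/2, n – 1).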
The Excel function
=CONFIDENCE.T(alpha, standard_deviation, size)
can be used to compute the margin of error
Confidence Interval for the Mean with Unknown Population
Standard Deviation
39. • A prediction interval is one that provides a range for
predicting the value of a new observation from the
same population.
• A confidence interval is associated with the sampling
distribution of a statistic, but a prediction interval is
associated with the distribution of the random variable
itself.
• A 100(1 – α)% prediction interval for a new
observation is x̄ ± tα/2 s √(1 + 1/n), where tα/2 has n − 1 degrees of freedom.
Prediction Intervals
40. Confidence Intervals and Sample Size
• We can determine the appropriate sample size needed to
estimate the population parameter within a specified level of
precision (± E).
• Sample size for the mean: n ≥ (zα/2)² σ² / E²
• Sample size for the proportion: n ≥ (zα/2)² p(1 − p) / E²
• Use the sample proportion from a preliminary sample as an estimate
of p or set p = 0.5 for a conservative estimate to guarantee the
required precision.
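A minimal sketch of the sample-size calculations, assuming the standard formulas n ≥ z² σ² / E² for the mean and n ≥ z² p(1 − p) / E² for the proportion, where z is the standard normal value for an upper tail area of α/2. The function names and inputs are illustrative.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Standard normal value with upper tail area alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n that estimates the mean to within +/- margin."""
    z = z_value(confidence)
    return math.ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(margin, p=0.5, confidence=0.95):
    """Smallest n for a proportion; p = 0.5 is the conservative choice."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical inputs: sigma = 15 with E = 2, and a proportion with E = 0.03.
print(sample_size_for_mean(sigma=15, margin=2))
print(sample_size_for_proportion(margin=0.03))
```

Because p(1 − p) is largest at p = 0.5, any other estimate of p gives a smaller required sample.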
41. • Statistical inference focuses on drawing conclusions about
populations from samples.
• Statistical inference includes estimation of population parameters and
hypothesis testing, which involves drawing conclusions about the value of the
parameters of one or more populations.
Statistical Inference
42. Hypothesis testing involves drawing inferences about two
contrasting propositions (each called a hypothesis) relating to
the value of one or more population parameters.
H0: Null hypothesis: describes an existing theory
H1: Alternative hypothesis: the complement of H0
Using sample data, we either:
- reject H0 and conclude the sample data provides
sufficient evidence to support H1, or
- fail to reject H0 and conclude the sample data
does not support H1.
Hypothesis Testing
43. Steps in conducting a hypothesis test:
1. Identify the population parameter and formulate the hypotheses to
test.
2. Select a level of significance (the risk of drawing an incorrect
conclusion).
3. Determine the decision rule on which to base a conclusion.
4. Collect data and calculate a test statistic.
5. Apply the decision rule and draw a conclusion.
Hypothesis Testing Procedure
44. • Three types of one sample tests:
1. H0: parameter ≤ constant
H1: parameter > constant
2. H0: parameter ≥ constant
H1: parameter < constant
3. H0: parameter = constant
H1: parameter ≠ constant
• It is not correct to formulate a null hypothesis using >, <, or ≠.
One-Sample Hypothesis Tests
45. • Hypothesis testing always assumes that H0 is true and uses
sample data to determine whether H1 is more likely to be true.
• Statistically, we cannot “prove” that H0 is true; we can only fail to reject
it.
• Rejecting the null hypothesis provides strong evidence (in a
statistical sense) that the null hypothesis is not true and that
the alternative hypothesis is true.
• Therefore, what we wish to provide evidence for statistically
should be identified as the alternative hypothesis.
Determining the Proper Form of Hypotheses
46. • Hypothesis testing can result in one of four different
outcomes:
1. H0 is true and the test correctly fails to reject H0
2. H0 is false and the test correctly rejects H0
3. H0 is true and the test incorrectly rejects H0
(called Type I error)
4. H0 is false and the test incorrectly fails to reject H0
(called Type II error)
Understanding Potential Errors in Hypothesis Testing
47. • The probability of making a Type I error = α (level of significance) =
P(rejecting H0 | H0 is true)
• The value of 1 – α is called the confidence coefficient
= P(not rejecting H0 | H0 is true),
• The value of α can be controlled. Common values are 0.01, 0.05, or 0.10.
• The probability of making a Type II error = β = P(not rejecting H0 | H0
is false)
• The value of 1 - β is called the power of the test
= P(rejecting H0 | H0 is false).
• The value of β cannot be specified in advance and depends on the value of
the (unknown) population parameter.
How β Depends on the True Population Mean
The further away the true mean is from the hypothesized value, the smaller the
value of β.
• Generally, as α decreases, β increases.
Terminology
48. • We would like the power of the test to be high (equivalently, we
would like the probability of a Type II error to be low) to allow us
to make a valid conclusion.
• The power of the test is sensitive to the sample size; small
sample sizes generally result in a low value of 1 - β.
• The power of the test can be increased by taking larger samples,
which enable us to detect small differences between the sample
statistics and population parameters with more accuracy.
• If you choose a small level of significance, you should try to
compensate by having a large sample size.
Improving the Power of the Test
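The effect of sample size, significance level, and the true mean on power can be sketched for an upper-tailed z-test (H0: μ ≤ μ0, H1: μ > μ0). This assumes a known population standard deviation; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper-tailed z-test when the true mean is mu_true."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    # Reject H0 when the sample mean exceeds this cutoff.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta: chance the sample mean falls below the cutoff under the true mean.
    beta = NormalDist(mu_true, se).cdf(critical_mean)
    return 1 - beta

# Larger samples raise the power for the same true difference.
for n in (25, 100, 400):
    print(n, round(power_upper_tail_z(50, 52, 10, n), 3))

# A smaller alpha lowers the power unless n is increased to compensate.
print(round(power_upper_tail_z(50, 52, 10, 25, alpha=0.01), 3))
```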
49. • The decision to reject or fail to reject a null hypothesis is based on computing a
test statistic from the sample data.
• The test statistic used depends on the type of hypothesis test.
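• Test statistics for one-sample hypothesis tests for means (μ0 is the hypothesized value):
• Population standard deviation σ known: z = (x̄ − μ0) / (σ/√n)
• Population standard deviation unknown: t = (x̄ − μ0) / (s/√n), with n − 1 degrees of freedom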
Selecting the Test Statistic
50. • The conclusion to reject or fail to reject H0 is based on comparing the value of the
test statistic to a “critical value” from the sampling distribution of the test statistic
when the null hypothesis is true and the chosen level of significance, α.
• The sampling distribution of the test statistic is usually the normal distribution, t-distribution,
or some other well-known distribution.
• The critical value divides the sampling distribution into two parts, a rejection
region and a non-rejection region. If the test statistic falls into the rejection
region, we reject the null hypothesis; otherwise, we fail to reject it.
Drawing a Conclusion
51. Rejection Regions
H0: parameter = constant
H1: parameter ≠ constant
H0: parameter ≤ constant
H1: parameter > constant
H0: parameter ≥ constant
H1: parameter < constant
For a one-tailed test, if H1 is stated as <, the
rejection region is in the lower tail; if H1 is stated as
>, the rejection region is in the upper tail (just
think of the inequality as an arrow pointing to the
proper tail direction).
52. • A p-value (observed significance level) is the
probability of obtaining a test statistic value equal
to or more extreme than that obtained from the
sample data when the null hypothesis is true.
An alternative approach uses the p-value rather than the
critical value:
Reject H0 if the p-value < α
p-Values
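The p-value decision rule can be sketched with a one-sample z-test, which assumes the population standard deviation is known so that the standard library's normal distribution suffices. The function name, the `tail` argument, and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, tail="two"):
    """Return (z, p_value) for a one-sample z-test with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    cdf = NormalDist().cdf(z)
    if tail == "lower":      # H1: mu < mu0, area to the left of z
        p_value = cdf
    elif tail == "upper":    # H1: mu > mu0, area to the right of z
        p_value = 1 - cdf
    else:                    # H1: mu != mu0, both tails
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

# Hypothetical data: 25 observations with sample mean 52, testing mu0 = 50.
sample = [52.0] * 25
alpha = 0.05
z, p_value = one_sample_z_test(sample, mu0=50, sigma=5, tail="two")
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```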
53. • For a lower one-tailed test, the p-value is the probability to the left of the test
statistic t in the t-distribution, and is found using the Excel function:
• =T.DIST(t, n-1, TRUE).
• For an upper one-tailed test, the p-value is the probability to the right of the test
statistic t, and is found using the Excel function:
• 1 - T.DIST(t, n-1, TRUE).
• For a two-tailed test, the p-value is found using the Excel function:
• T.DIST.2T(t, n-1), if t > 0
• T.DIST.2T(-t, n-1), if t < 0
Finding p-Values
55. Population variances are known:
z-Test: Two-Sample for Means
Population variances are unknown and assumed unequal:
t-Test: Two-Sample Assuming Unequal Variances
Population variances are unknown but assumed equal:
t-Test: Two-Sample Assuming Equal Variances
• These tools calculate the test statistic, the p-value for both a one-tail and two-tail
test, and the critical values for one-tail and two-tail tests.
Selecting the Proper Excel Procedure
56. • In many situations, data from two samples are naturally paired or matched.
• When paired samples are used, a paired t-test is more accurate than assuming that
the data come from independent populations.
• Hypotheses (μD is the mean difference between the paired samples):
H0: μD {≥, ≤, or =} 0
H1: μD {<, >, or ≠} 0
• Excel Data Analysis tool: t-Test: Paired Two-Sample for Means
Two-Sample Test for Means with Paired
Samples
57. Test for Equality of Variances
• Test for equality of variances between two samples using a
new type of test, the F-test.
• To use this test, we must assume that both samples are drawn from
normal populations.
• Hypotheses: H0: σ1² = σ2² versus H1: σ1² ≠ σ2²
• F-test statistic: F = s1² / s2², with df1 = n1 − 1 and df2 = n2 − 1
• Excel tool: F-test for Equality of Variances
58. • Although the hypothesis test is really a two-tailed test, we will simplify it as an
upper-tailed, one-tailed test to make it easy to use tables of the F-distribution
and interpret the results of the Excel tool.
• We do this by ensuring that when we compute F, we take the ratio of the larger sample
variance to the smaller sample variance.
• Find the critical value Fα/2, df1, df2 of the F-distribution, and then we reject the null
hypothesis if the F-test statistic exceeds the critical value.
• Note that we are using α/2 to find the critical value, not α. This is because we
are using only the upper tail information on which to base our conclusion.
Conducting the F-Test
59. • Used to compare the means of two or more population groups.
• ANOVA derives its name from the fact that we are analyzing
variances in the data.
• ANOVA measures variation between groups relative to
variation within groups.
• Each of the population groups is assumed to come from a
normally distributed population.
Analysis of Variance (ANOVA)
60. • The m groups or factor levels being studied represent populations
whose outcome measures
1. are randomly and independently obtained,
2. are normally distributed, and
3. have equal variances.
• If these assumptions are violated, then the level of significance and
the power of the test can be affected.
Assumptions of ANOVA
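To make "variation between groups relative to variation within groups" concrete, the one-way ANOVA F statistic can be computed by hand as the ratio of the between-group mean square to the within-group mean square. The three groups below are hypothetical, and the function name is an illustrative choice.

```python
from statistics import fmean

def anova_f_statistic(groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    m = len(groups)        # number of groups (factor levels)
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (m - 1)
    ms_within = ss_within / (n - m)
    return ms_between / ms_within

# Hypothetical outcome measures for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = anova_f_statistic(groups)
print(f_stat)
```

The statistic is compared with a critical value of the F-distribution with m − 1 and n − m degrees of freedom; a large F suggests that the group means differ.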
61. Chi-Square Test for Independence
Test for independence of two categorical
variables.
◦ H0: two categorical variables are independent
◦ H1: two categorical variables are dependent
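A minimal sketch of the chi-square statistic for independence: under H0 the expected count in each cell is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The 2×2 table is hypothetical; 3.841 is the standard chi-square critical value for 1 degree of freedom at α = 0.05.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two customer groups, columns are two preferences.
observed = [[30, 20],
            [20, 30]]
statistic, df = chi_square_statistic(observed)

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
reject_h0 = statistic > CRITICAL_VALUE_DF1_ALPHA_05
print(statistic, df, reject_h0)
```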