PREDICTIVE ANALYTICS
K.MOHANASUNDARAM
MGT 482
CREDITS 3
Class 2
SYLLABUS
• INTRODUCTION TO BUSINESS ANALYTICS
• MATHEMATICAL MODELLING
• UNDERSTANDING DATA TYPES AND EXPLORATORY ANALYTICS
• DATA PREDICTION TECHNIQUES, SAMPLING AND TESTING OF HYPOTHESIS
• SIMPLE LINEAR AND MULTIPLE LINEAR REGRESSION
• LOGISTIC REGRESSION
• FORECASTING TECHNIQUES
• CLUSTER ANALYSIS
• DECISION TREES
Recommended textbook
• Business Analytics- Methods, models and decisions
JAMES R EVANS Pearson
Reference textbooks
• PREDICTIVE ANALYTICS FOR BUSINESS STRATEGY- REASONING FROM
DATA TO ACTIONABLE KNOWLEDGE – JEFFREY T. PRINCE &
AMARANTH BOSE
• APPLIED PREDICTIVE ANALYTICS: PRINCIPLES AND TECHNIQUES FOR
THE PROFESSIONAL DATA ANALYST – DEAN ABBOTT
Introduction
• A key aspect of solving real business problems is dealing appropriately
with uncertainty.
• This involves recognizing explicitly that uncertainty exists and using
quantitative methods to model uncertainty.
• In many situations, the uncertain quantity is a numerical quantity. In
the language of probability, it is called a random variable.
• A probability distribution lists all of the possible values of the
random variable and their corresponding probabilities.
Flow Chart for Modeling Uncertainty
• Probability is the likelihood that an outcome occurs.
• An experiment is the process that results in an outcome.
• The outcome of an experiment is a result that we observe.
• The sample space is the collection of all possible outcomes of an
experiment.
• An event is a collection of one or more outcomes from a sample
space.
Basic Concepts of Probability
Probabilities may be defined from one of three perspectives:
• Classical definition: probabilities can be deduced from theoretical
arguments
• Relative frequency definition: probabilities are based on empirical
data
• Subjective definition: probabilities are based on judgment and
experience
Definitions of Probability
Probability Essentials
• A probability is a number between 0 and 1 that measures the
likelihood that some event will occur.
• An event with probability 0 cannot occur, whereas an event with probability 1
is certain to occur.
• An event with probability greater than 0 and less than 1 involves uncertainty,
and the closer its probability is to 1, the more likely it is to occur.
• Probabilities are sometimes expressed as percentages or odds, but
these can be easily converted to probabilities on a 0-to-1 scale.
Addition Rule
• Events are mutually exclusive if at most one of them
can occur—that is, if one of them occurs, then none
of the others can occur.
• Events are exhaustive if they exhaust all
possibilities—one of the events must occur.
• The addition rule of probability involves the
probability that at least one of the events will occur.
• When the events are mutually exclusive, the probability
that at least one of the events will occur is the sum of their
individual probabilities: P(A1 or A2 or … or An) = P(A1) + P(A2) + … + P(An).
• The union of two events contains all outcomes that belong to either
of the two events.
• If A and B are two events, the probability that some outcome in either A or B
(that is, the union of A and B) occurs is denoted as P(A or B).
• Two events are mutually exclusive if they have no outcomes in
common.
• Rule 3. If events A and B are mutually exclusive, then P(A or B) = P(A)
+ P(B).
Union of Events
• The notation (A and B) represents the intersection of events A and B – that
is, all outcomes belonging to both A and B .
• Rule 4. If two events A and B are not mutually exclusive, then P(A or B) =
P(A)+ P(B) - P(A and B).
Non-Mutually Exclusive Events
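Rules 3 and 4 can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die; the events and the helper `prob` are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favorable outcomes / equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}            # event A
greater_than_3 = {4, 5, 6}  # event B, shares outcomes 4 and 6 with A
odd = {1, 3, 5}             # mutually exclusive with A

# Rule 4: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

# Rule 3: with no outcomes in common, the intersection term is zero
p_union_exclusive = prob(even) + prob(odd)

print(p_union, p_union_exclusive)
```

Counting the union directly, `prob(even | greater_than_3)`, gives the same value as Rule 4, which is the point of subtracting the intersection.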
• The probability of the intersection of two events is called a joint
probability.
• The probability of an event, irrespective of the outcome of the other
joint event, is called a marginal probability.
Joint and Marginal Probability
Conditional Probability and the Multiplication
Rule
• A formal way to revise probabilities on the basis of new information is
to use conditional probabilities.
• Let A and B be any events with probabilities P(A) and P(B). If you are
told that B has occurred, then the probability of A might change.
• The new probability of A is called the conditional probability of A given B, or
P(A|B).
• It can be calculated with the following formula: P(A|B) = P(A and B) / P(B), provided P(B) > 0.
• Conditional probability is the probability of occurrence of one event
A, given that another event B is known to be true or has already
occurred.
Conditional Probability
• If two events are independent, then we can simplify the
multiplication law of probability in equation (5.4), P(A and B) = P(A | B) P(B),
by substituting P(A) for P(A | B): P(A and B) = P(A) P(B).
Multiplication Law for Independent Events
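A minimal sketch of conditional probability and the independence check, assuming two fair dice. The events and helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (assumed example).
outcomes = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(outcomes))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

first_is_six = {o for o in outcomes if o[0] == 6}
second_is_six = {o for o in outcomes if o[1] == 6}
sum_is_twelve = {o for o in outcomes if sum(o) == 12}

# Knowing the second die does not change the probability for the first die.
p_first_given_second = conditional(first_is_six, second_is_six)

# Knowing the first die is a six revises P(sum = 12) from 1/36 upward.
p_sum_given_first = conditional(sum_is_twelve, first_is_six)

# Independent events satisfy P(A and B) = P(A) * P(B).
independent = prob(first_is_six & second_is_six) == prob(first_is_six) * prob(second_is_six)
```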
• A random variable is a numerical description of the outcome of an
experiment.
• A discrete random variable is one for which the number of possible
outcomes can be counted.
• A continuous random variable has outcomes over one or more
continuous intervals of real numbers.
Random Variables
Probability Distribution of a Single Random
Variable
• A discrete random variable has a countable number of possible
values (often finitely many).
• A continuous random variable has a continuum of possible values.
• Usually a discrete distribution results from a count, whereas a
continuous distribution results from a measurement.
• This distinction between counts and measurements is not always clear-cut.
• Mathematically, there is an important difference between discrete
and continuous probability distributions.
• Specifically, a proper treatment of continuous distributions requires calculus.
Examples of discrete random variables:
• outcomes of dice rolls
• whether a customer likes or dislikes a product
• number of hits on a Web site link today
Examples of continuous random variables:
• weekly change in DJIA
• daily temperature
• time between machine failures
Discrete and Continuous Random Variables
• A probability distribution is a characterization of the possible values
that a random variable may assume along with the probability of
assuming these values.
• We may develop a probability distribution using any one of the three
perspectives of probability: classical, relative frequency, and
subjective.
Probability Distributions
• The expected value of a random variable corresponds to the notion
of the mean, or average, for a sample.
• For a discrete random variable X, the expected value, denoted E[X], is
the weighted average of all possible outcomes, where the weights are
the probabilities: E[X] = Σ x_i f(x_i), where f(x_i) is the probability of outcome x_i.
Expected Value of a Discrete Random Variable
• The expected value is a “long-run average” and is appropriate for
decisions that occur on a repeated basis.
• For one-time decisions, however, you need to consider the downside
risk and the upside potential of the decision.
Expected Value and Decision Making
• The variance, Var[X], of a discrete random variable X is a weighted
average of the squared deviations from the expected value:
Var[X] = Σ (x_i – E[X])² f(x_i), where f(x_i) is the probability of outcome x_i.
Variance of a Discrete Random Variable
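The two weighted averages can be computed directly from a table of values and probabilities. This is a small sketch; the distribution below is made up for illustration.

```python
# Hypothetical distribution: number of items a customer buys, with probabilities.
distribution = {0: 0.2, 1: 0.5, 2: 0.3}

def expected_value(dist):
    """E[X]: each value weighted by its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X]: squared deviations from E[X], weighted by probability."""
    mean = expected_value(dist)
    return sum((x - mean) ** 2 * p for x, p in dist.items())

mean = expected_value(distribution)
var = variance(distribution)
std_dev = var ** 0.5
print(mean, var, std_dev)
```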
Summary Measures of a Probability Distribution
• The mean, often denoted μ, is a weighted sum of
the possible values, weighted by their
probabilities: μ = Σ x_i f(x_i), where f(x_i) is the probability of the value x_i.
◦ It is also called the expected value of X and denoted E(X).
• To measure the variability in a distribution, we
calculate its variance or standard deviation.
◦ The variance, denoted by σ² or Var(X), is a weighted sum of the
squared deviations of the possible values from the mean, where
the weights are again the probabilities.
Discrete random variables
• Bernoulli distribution
• Binomial distribution
• Poisson distribution
• Hypergeometric distribution
Continuous random variables
• Uniform
• Normal
• Exponential
• Lognormal - when the probability of zero is very low, but the most likely value is just greater than
zero
Sampling and estimation
• Sampling is the foundation of statistical analysis.
• Sampling plan - a description of the approach that is used to
obtain samples from a population prior to any data collection
activity.
• A sampling plan states:
- its objectives
- target population
- population frame (the list from which the sample is
selected)
- operational procedures for collecting data
- statistical tools for data analysis
Statistical Sampling
• A company wants to understand how golfers might respond to a
membership program that provides discounts at golf courses.
• Objective - estimate the proportion of golfers who would join the program
• Target population - golfers over 25 years old
• Population frame - golfers who purchased equipment at particular stores
• Operational procedures - e-mail link to survey or direct-mail questionnaire
• Statistical tools - PivotTables to summarize data by demographic groups and estimate
likelihood of joining the program
A Sampling Plan for a Market Research Study
• Subjective Methods
◦ Judgment sampling – expert judgment is used to select the sample
◦ Convenience sampling – samples are selected based on the ease
with which the data can be collected
• Probabilistic Sampling
◦ Simple random sampling involves selecting items from a
population so that every subset of a given size has an equal chance
of being selected
Sampling Methods
• Systematic (periodic) sampling – a sampling plan that selects every nth item
from the population.
• Stratified sampling – applies to populations that are divided into natural
subsets (called strata) and allocates the appropriate proportion of samples
to each stratum.
• Cluster sampling - based on dividing a population into subgroups (clusters),
sampling a set of clusters, and (usually) conducting a complete census
within the clusters sampled
• Sampling from a continuous process
• Select a time at random; then select the next n items produced after that time.
• Select n times at random; then select the next item produced after each of these
times.
Additional Probabilistic Sampling Methods
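The probabilistic sampling methods can be sketched with Python's standard `random` module. The population of 100 item IDs, the two strata, and the five clusters are all hypothetical.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical frame of 100 item IDs

# Simple random sampling: every subset of size 10 is equally likely.
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every nth item (n = 10 here).
step = 10
start = rng.randrange(step)
systematic = population[start::step]

# Stratified sampling: sample each stratum in proportion to its size.
strata = {"low": population[:60], "high": population[60:]}
stratified = {
    name: rng.sample(items, round(10 * len(items) / len(population)))
    for name, items in strata.items()
}

# Cluster sampling: choose whole clusters at random, then take a census of each.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
chosen = rng.sample(clusters, 2)
cluster_sample = [item for cluster in chosen for item in cluster]
```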
https://www.youtube.com/watch?v=1XFU1d9XIWM
• Estimation involves assessing the value of an unknown population parameter
using sample data
• Estimators are the measures used to estimate population parameters
• E.g., sample mean (x̄), sample variance (s²), sample proportion (p̂)
• A point estimate is a single number derived from sample data that is used to
estimate the value of a population parameter.
• If the expected value of an estimator equals the population parameter it is
intended to estimate, the estimator is said to be unbiased.
Estimating Population Parameters
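As a small illustration, the three point estimates named above can be computed with the standard `statistics` module. The data values are invented for the example.

```python
import statistics

# Hypothetical sample: order values, and whether each customer joined a program.
order_values = [12.0, 15.0, 9.0, 14.0, 10.0]
joined = [True, False, True, True, False]

sample_mean = statistics.mean(order_values)          # estimates the population mean
sample_variance = statistics.variance(order_values)  # divides by n - 1, not n
sample_proportion = sum(joined) / len(joined)        # estimates the population proportion
```

`statistics.variance` divides by n - 1, which is what makes the sample variance an unbiased estimator of the population variance.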
• Sampling (statistical) error occurs because samples
are only a subset of the total population
• Sampling error is inherent in any sampling process, and
although it can be minimized, it cannot be totally avoided.
• Nonsampling error occurs when the sample does not
represent the target population adequately .
• Nonsampling error usually results from a poor sample
design or inadequate data reliability.
Sampling Error
• The sampling distribution of the mean is the distribution of the means of all
possible samples of a fixed size n from some population.
• The standard deviation of the sampling distribution of the mean is called the
standard error of the mean: σ / √n, where σ is the population standard deviation.
• As n increases, the standard error decreases.
• Larger sample sizes have less sampling error.
Sampling Distributions
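A rough simulation can show the standard error shrinking as n grows. The sketch assumes a normal population with mean 50 and standard deviation 10; these values, the seed, and the number of trials are arbitrary.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulated_standard_error(mu, sigma, n, trials=2000, seed=1):
    """Standard deviation of many sample means drawn from Normal(mu, sigma)."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials)]
    return statistics.stdev(means)

# Compare the formula with the simulated spread of sample means.
for n in (4, 25, 100):
    print(n, standard_error(10, n), round(simulated_standard_error(50, 10, n), 2))
```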
• An interval estimate provides a range for a population
characteristic based on a sample.
• Intervals specify a range of plausible values for the characteristic of
interest and a way of assessing “how plausible” they are.
• In general, a 100(1 – α)% probability interval is any interval [A,
B] such that the probability of falling between A and B is 1 – α.
• Probability intervals are often centered on the mean or median.
• Example: in a normal distribution, the mean plus or minus 1 standard
deviation describes an approximate 68% probability interval around
the mean.
Interval Estimates
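The 68% figure can be verified with `statistics.NormalDist`. The helper `probability_interval` and the values μ = 100, σ = 15 are assumptions made for this sketch.

```python
from statistics import NormalDist

def probability_interval(mu, sigma, alpha):
    """Central 100(1 - alpha)% probability interval of a Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)

standard = NormalDist(0, 1)
within_one_sd = standard.cdf(1) - standard.cdf(-1)  # mean plus or minus 1 sd

low, high = probability_interval(100, 15, 0.05)     # hypothetical mu and sigma
print(round(within_one_sd, 4), round(low, 1), round(high, 1))
```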
• A confidence interval is a range of values between which the
value of the population parameter is believed to be, along
with a probability that the interval correctly estimates the
true (unknown) population parameter.
• This probability is called the level of confidence, denoted by 1 – α,
where α is a number between 0 and 1.
• The level of confidence is usually expressed as a percent; common
values are 90%, 95%, or 99%.
• For a 95% confidence interval, if we chose 100 different
samples, leading to 100 different interval estimates, we
would expect that 95% of them would contain the true
population mean.
Confidence Intervals
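The "95 out of 100 intervals" interpretation can be illustrated by simulation. This sketch assumes a normal population with a known standard deviation, so it uses a z-based interval; the population values and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, alpha=0.05):
    """Confidence interval for the mean when sigma is known (an assumption here)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

rng = random.Random(7)
true_mean, sigma, n, trials = 50.0, 10.0, 30, 2000

hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    hits += low <= true_mean <= high  # does this interval cover the true mean?

coverage = hits / trials
print(coverage)
```

The fraction of intervals that cover the true mean should land near 0.95, though any single run will differ slightly.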
• The t-distribution is a family of probability distributions with a shape
similar to the standard normal distribution. Different t-distributions are
distinguished by an additional parameter, degrees of freedom (df).
• As the number of degrees of freedom increases, the t-distribution converges to the
standard normal distribution
• The t-distribution has a larger variance than the standard normal distribution.
The t-Distribution
• A 100(1 – α)% confidence interval for the mean when the population standard
deviation is unknown is x̄ ± tα/2,n–1 (s / √n),
where tα/2,n–1 is the value of the t-distribution with
df = n − 1 for an upper tail area of α/2.
• t values are found in Table 2 of Appendix A or with the Excel
function T.INV(1 – α/2, n – 1).
• The Excel function
=CONFIDENCE.T(alpha, standard_deviation, size)
can be used to compute the margin of error
Confidence Interval for the Mean with Unknown Population
Standard Deviation
• A prediction interval is one that provides a range for
predicting the value of a new observation from the
same population.
• A confidence interval is associated with the sampling
distribution of a statistic, but a prediction interval is
associated with the distribution of the random variable
itself.
• A 100(1 – α)% prediction interval for a new
observation is x̄ ± tα/2,n–1 · s · √(1 + 1/n).
Prediction Intervals
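The difference between a confidence interval for the mean and a prediction interval for a new observation shows up when both are computed from the same sample. Python's standard library has no t-distribution, so this sketch approximates the t quantile numerically (Simpson's rule plus bisection); `t_cdf`, `t_critical`, and the data are illustrative assumptions, not the Excel functions named in the slides.

```python
import math
import statistics

def t_cdf(x, df, steps=2000):
    """Approximate CDF of Student's t via Simpson's rule on its density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = abs(x) / steps
    area = pdf(0) + pdf(abs(x))
    area += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    half = area * h / 3  # probability between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half

def t_critical(alpha, df):
    """Value t with upper-tail area alpha / 2, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_intervals(sample, alpha=0.05):
    """Return (confidence interval for the mean, prediction interval for a new value)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    t = t_critical(alpha, n - 1)
    ci_margin = t * s / math.sqrt(n)
    pi_margin = t * s * math.sqrt(1 + 1 / n)
    return (xbar - ci_margin, xbar + ci_margin), (xbar - pi_margin, xbar + pi_margin)

sample = [52, 48, 55, 51, 49, 53, 54, 50, 56, 47]  # hypothetical measurements
ci, pi = mean_intervals(sample)
print(ci, pi)
```

The prediction interval is always wider because it reflects the variability of a single observation, not just the variability of the sample mean.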
Confidence Intervals and Sample Size
• We can determine the appropriate sample size needed to
estimate the population parameter within a specified level of
precision (± E).
• Sample size for the mean: n ≥ (zα/2)² σ² / E²
• Sample size for the proportion: n ≥ (zα/2)² p(1 – p) / E²
• Use the sample proportion from a preliminary sample as an estimate
of p or set p = 0.5 for a conservative estimate to guarantee the
required precision.
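A short sketch of the two sample-size calculations, rounding up to the next whole observation. The function names and the example inputs (σ = 15, E = 2, E = 0.03) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, error, alpha=0.05):
    """Smallest n with z * sigma / sqrt(n) <= error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * sigma ** 2 / error ** 2)

def sample_size_proportion(error, p=0.5, alpha=0.05):
    """Smallest n with z * sqrt(p(1 - p) / n) <= error; p = 0.5 is the conservative choice."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

print(sample_size_mean(sigma=15, error=2))
print(sample_size_proportion(error=0.03))
print(sample_size_proportion(error=0.03, p=0.2))
```

Because p(1 – p) is largest at p = 0.5, that setting never understates the sample size needed.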
• Statistical inference focuses on drawing conclusions about
populations from samples.
• Statistical inference includes estimation of population parameters and
hypothesis testing, which involves drawing conclusions about the value of the
parameters of one or more populations.
Statistical Inference
• Hypothesis testing involves drawing inferences about two
contrasting propositions (each called a hypothesis) relating to
the value of one or more population parameters.
• H0: Null hypothesis: describes an existing theory
• H1: Alternative hypothesis: the complement of H0
• Using sample data, we either:
- reject H0 and conclude the sample data provides
sufficient evidence to support H1, or
- fail to reject H0 and conclude the sample data
does not support H1.
Hypothesis Testing
Steps in conducting a hypothesis test:
1. Identify the population parameter and formulate the hypotheses to
test.
2. Select a level of significance (the risk of drawing an incorrect
conclusion).
3. Determine the decision rule on which to base a conclusion.
4. Collect data and calculate a test statistic.
5. Apply the decision rule and draw a conclusion.
Hypothesis Testing Procedure
• Three types of one sample tests:
1. H0: parameter ≤ constant
H1: parameter > constant
2. H0: parameter ≥ constant
H1: parameter < constant
3. H0: parameter = constant
H1: parameter ≠ constant
• It is not correct to formulate a null hypothesis using >, <, or ≠.
One-Sample Hypothesis Tests
• Hypothesis testing always assumes that H0 is true and uses
sample data to determine whether H1 is more likely to be true.
• Statistically, we cannot “prove” that H0 is true; we can only fail to reject
it.
• Rejecting the null hypothesis provides strong evidence (in a
statistical sense) that the null hypothesis is not true and that
the alternative hypothesis is true.
• Therefore, what we wish to provide evidence for statistically
should be identified as the alternative hypothesis.
Determining the Proper Form of Hypotheses
• Hypothesis testing can result in one of four different
outcomes:
1. H0 is true and the test correctly fails to reject H0
2. H0 is false and the test correctly rejects H0
3. H0 is true and the test incorrectly rejects H0
(called Type I error)
4. H0 is false and the test incorrectly fails to reject H0
(called Type II error)
Understanding Potential Errors in Hypothesis Testing
• The probability of making a Type I error = α (level of significance) =
P(rejecting H0 | H0 is true)
• The value of 1 – α is called the confidence coefficient
= P(not rejecting H0 | H0 is true),
• The value of α can be controlled. Common values are 0.01, 0.05, or 0.10.
• The probability of making a Type II error = β = P(not rejecting H0 | H0
is false)
• The value of 1 - β is called the power of the test
= P(rejecting H0 | H0 is false).
• The value of β cannot be specified in advance and depends on the value of
the (unknown) population parameter.
How β Depends on the True Population Mean
• The further away the true mean is from the hypothesized value, the smaller the
value of β.
• Generally, as α decreases, β increases.
Terminology
• We would like the power of the test to be high (equivalently, we
would like the probability of a Type II error to be low) to allow us
to make a valid conclusion.
• The power of the test is sensitive to the sample size; small
sample sizes generally result in a low value of 1 – β.
• The power of the test can be increased by taking larger samples,
which enable us to detect small differences between the sample
statistics and population parameters with more accuracy.
• If you choose a small level of significance, you should try to
compensate by having a large sample size.
Improving the Power of the Test
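The relationships among α, β, sample size, and the true mean can be made concrete for one simple case. The sketch below assumes an upper-tailed z-test of H0: μ ≤ μ0 with known σ; the function name and all numbers are illustrative.

```python
import math
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)            # reject H0 when z > z_crit
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))  # mean of z under mu_true
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical setting: mu0 = 100, sigma = 15.
for n in (10, 30, 100):
    print(n, round(power_upper_tail(100, 105, 15, n), 3))
```

When the true mean equals μ0 the rejection probability is exactly α; power rises as n grows or as the true mean moves further from μ0, and falls when α is made smaller.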
• The decision to reject or fail to reject a null hypothesis is based on computing a
test statistic from the sample data.
• The test statistic used depends on the type of hypothesis test.
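• Test statistics for one-sample hypothesis tests for means:
• Population standard deviation σ known: z = (x̄ – μ0) / (σ / √n)
• Population standard deviation unknown: t = (x̄ – μ0) / (s / √n), with n – 1 degrees of freedom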
Selecting the Test Statistic
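A minimal sketch of the two one-sample test statistics for a mean: z when the population standard deviation is known, t when it is estimated from the sample. The data, the hypothesized mean of 50, and σ = 3 are made up.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """z = (xbar - mu0) / (sigma / sqrt(n)); sigma is the known population value."""
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)); s is the sample standard deviation."""
    s = statistics.stdev(sample)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(len(sample)))

sample = [52, 48, 55, 51, 49, 53, 54, 50, 56]  # hypothetical data, n = 9
z = z_statistic(sample, mu0=50, sigma=3)
t = t_statistic(sample, mu0=50)                # compare with t, df = n - 1 = 8
print(z, t)
```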
• The conclusion to reject or fail to reject H0 is based on comparing the value of the
test statistic to a “critical value” from the sampling distribution of the test statistic
when the null hypothesis is true, at the chosen level of significance, α.
• The sampling distribution of the test statistic is usually the normal distribution, t-distribution,
or some other well-known distribution.
• The critical value divides the sampling distribution into two parts, a rejection
region and a non-rejection region. If the test statistic falls into the rejection
region, we reject the null hypothesis; otherwise, we fail to reject it.
Drawing a Conclusion
Rejection Regions
H0: parameter = constant
H1: parameter ≠ constant
H0: parameter ≤ constant
H1: parameter > constant
H0: parameter ≥ constant
H1: parameter < constant
For a one-tailed test, if H1 is stated as <, the
rejection region is in the lower tail; if H1 is stated as
>, the rejection region is in the upper tail (just
think of the inequality as an arrow pointing to the
proper tail direction).
• A p-value (observed significance level) is the
probability of obtaining a test statistic value equal
to or more extreme than that obtained from the
sample data when the null hypothesis is true.
• An alternative approach uses the p-value rather than the
critical value:
Reject H0 if the p-value < α
p-Values
• For a lower one-tailed test, the p-value is the probability to the left of the test
statistic t in the t-distribution, and is found using the Excel function:
• =T.DIST(t, n-1, TRUE).
• For an upper one-tailed test, the p-value is the probability to the right of the test
statistic t, and is found using the Excel function:
• 1 - T.DIST(t, n-1, TRUE).
• For a two-tailed test, the p-value is found using the Excel function:
• T.DIST.2T(t, n-1), if t > 0
• T.DIST.2T(-t, n-1), if t < 0
Finding p-Values
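Outside Excel, the same three p-values follow from the t-distribution's cumulative probability. Python's standard library has no t-distribution, so this sketch approximates the CDF with Simpson's rule; `t_cdf` and `p_value` are hypothetical helpers, not equivalents guaranteed to match Excel to every digit.

```python
import math

def t_cdf(x, df, steps=2000):
    """Approximate CDF of Student's t via Simpson's rule on its density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = abs(x) / steps
    area = pdf(0) + pdf(abs(x))
    area += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    half = area * h / 3  # probability between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half

def p_value(t, df, tail):
    """p-value for a lower, upper, or two-tailed t-test."""
    if tail == "lower":
        return t_cdf(t, df)                 # area to the left of t
    if tail == "upper":
        return 1 - t_cdf(t, df)             # area to the right of t
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))  # both tails beyond |t|
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

alpha = 0.05
p = p_value(2.5, df=9, tail="upper")        # hypothetical test statistic
reject_h0 = p < alpha
```

Taking the absolute value in the two-tailed case plays the same role as the t > 0 and t < 0 branches listed for T.DIST.2T.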
Excel Analysis Toolpak Procedures for Two-Sample Hypothesis
Tests
• Population variances are known:
◦ z-Test: Two-Sample for Means
• Population variances are unknown and assumed unequal:
◦ t-Test: Two-Sample Assuming Unequal Variances
• Population variances are unknown but assumed equal:
◦ t-Test: Two-Sample Assuming Equal Variances
• These tools calculate the test statistic, the p-value for both a one-tail and two-tail
test, and the critical values for one-tail and two-tail tests.
Selecting the Proper Excel Procedure
• In many situations, data from two samples are naturally paired or matched.
• When paired samples are used, a paired t-test is more accurate than assuming that
the data come from independent populations.
• Hypotheses (μD is the mean difference between the paired samples):
H0: μD {≥, ≤, or =} 0
H1: μD {<, >, or ≠} 0
• Excel Data Analysis tool: t-Test: Paired Two-Sample for Means
Two-Sample Test for Means with Paired
Samples
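A paired t-test reduces to a one-sample test on the differences. The sketch below computes only the test statistic and its degrees of freedom; the before/after values are invented for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after, hypothesized_diff=0.0):
    """t statistic and degrees of freedom for the mean of paired differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)  # mean difference
    s_d = statistics.stdev(diffs)   # standard deviation of the differences
    t = (d_bar - hypothesized_diff) / (s_d / math.sqrt(n))
    return t, n - 1

before = [10, 12, 9, 11]  # hypothetical scores before a change
after = [12, 15, 10, 12]  # the same subjects afterward
t, df = paired_t_statistic(before, after)
print(t, df)
```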
Test for Equality of Variances
• Test for equality of variances between two samples using a
new type of test, the F-test.
• To use this test, we must assume that both samples are drawn from
normal populations.
• Hypotheses: H0: σ1² = σ2², H1: σ1² ≠ σ2²
• F-test statistic: F = s1² / s2²
• Excel tool: F-test for Equality of Variances
• Although the hypothesis test is really a two-tailed test, we will simplify it as an
upper-tailed, one-tailed test to make it easy to use tables of the F-distribution
and interpret the results of the Excel tool.
• We do this by ensuring that when we compute F, we take the ratio of the larger sample
variance to the smaller sample variance.
• Find the critical value Fα/2,df1,df2 of the F-distribution, and then we reject the null
hypothesis if the F-test statistic exceeds the critical value.
• Note that we are using α/2 to find the critical value, not α. This is because we
are using only the upper tail information on which to base our conclusion.
Conducting the F-Test
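The convention of putting the larger sample variance in the numerator can be sketched as follows. The function name and data are hypothetical, and the critical value Fα/2,df1,df2 would still come from a table or from Excel.

```python
import statistics

def f_statistic(sample1, sample2):
    """Return (F, df1, df2) with the larger sample variance in the numerator."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

a = [1, 2, 3, 4, 5]  # hypothetical sample 1
b = [2, 4, 6, 8]     # hypothetical sample 2, more spread out
f, df1, df2 = f_statistic(a, b)
print(f, df1, df2)
```

Because the ratio is arranged to be at least 1, only the upper tail of the F-distribution is needed.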
• Used to compare the means of two or more population groups.
• ANOVA derives its name from the fact that we are analyzing
variances in the data.
• ANOVA measures variation between groups relative to
variation within groups.
• Each of the population groups is assumed to come from a
normally distributed population.
Analysis of Variance (ANOVA)
• The m groups or factor levels being studied represent populations
whose outcome measures
1. are randomly and independently obtained,
2. are normally distributed, and
3. have equal variances.
• If these assumptions are violated, then the level of significance and
the power of the test can be affected.
Assumptions of ANOVA
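"Variation between groups relative to variation within groups" can be written out as a ratio of mean squares. This is a sketch of a one-way ANOVA F statistic on made-up data, not the output of Excel's ANOVA tool.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    m, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (m - 1)
    ms_within = ss_within / (n - m)
    return ms_between / ms_within, m - 1, n - m

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical outcomes for 3 factor levels
f, df_between, df_within = one_way_anova_f(groups)
print(f, df_between, df_within)
```

A large F means the group means differ by more than the within-group variation would suggest.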
Chi-Square Test for Independence
• Test for independence of two categorical
variables.
◦ H0: two categorical variables are independent
◦ H1: two categorical variables are dependent
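The chi-square statistic compares observed counts in a contingency table with the counts expected if the two variables were independent. The table below is hypothetical, and the sketch stops at the statistic and its degrees of freedom.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[10, 20], [20, 10]]  # hypothetical 2 x 2 table of counts
chi2, df = chi_square_statistic(observed)
print(chi2, df)
```

A statistic near zero is consistent with H0; larger values are evidence that the variables are dependent.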
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 

PA_EPGDM_2_2023.pptx

  • 2. SYLLABUS • INTRODUCTION TO BUSINESS ANALYTICS • MATHEMATICAL MODELLING • UNDERSTANDING DATA TYPES AND EXPLORATORY ANALYTICS • DATA PREDICTION TECHNIQUES, SAMPLING AND TESTING OF HYPOTHESIS • SIMPLE LINEAR AND MULTIPLE LINEAR REGRESSION • LOGISTIC REGRESSION • FORECASTING TECHNIQUES • CLUSTER ANALYSIS • DECISION TREES
  • 3. Recommended textbook • Business Analytics- Methods, models and decisions JAMES R EVANS Pearson
  • 4. Reference textbooks • PREDICTIVE ANALYTICS FOR BUSINESS STRATEGY- REASONING FROM DATA TO ACTIONABLE KNOWLEDGE – JEFFREY T. PRINCE & AMARANTH BOSE • APPLIED PREDICTIVE ANALYTICS: PRINCIPLES AND TECHNIQUES FOR THE PROFESSIONAL DATA ANALYSTS – DEAN ABBOTT
  • 5. Introduction • A key aspect of solving real business problems is dealing appropriately with uncertainty. • This involves recognizing explicitly that uncertainty exists and using quantitative methods to model uncertainty. • In many situations, the uncertain quantity is a numerical quantity. In the language of probability, it is called a random variable. • A probability distribution lists all of the possible values of the random variable and their corresponding probabilities.
  • 6. Flow Chart for Modeling Uncertainty
  • 7. Basic Concepts of Probability • Probability is the likelihood that an outcome occurs. • An experiment is the process that results in an outcome. • The outcome of an experiment is a result that we observe. • The sample space is the collection of all possible outcomes of an experiment. • An event is a collection of one or more outcomes from a sample space.
  • 8. Definitions of Probability • Probabilities may be defined from one of three perspectives: • Classical definition: probabilities can be deduced from theoretical arguments • Relative frequency definition: probabilities are based on empirical data • Subjective definition: probabilities are based on judgment and experience
  • 9. Probability Essentials • A probability is a number between 0 and 1 that measures the likelihood that some event will occur. • An event with probability 0 cannot occur, whereas an event with probability 1 is certain to occur. • An event with probability greater than 0 and less than 1 involves uncertainty, and the closer its probability is to 1, the more likely it is to occur. • Probabilities are sometimes expressed as percentages or odds, but these can be easily converted to probabilities on a 0-to-1 scale.
  • 10. Addition Rule • Events are mutually exclusive if at most one of them can occur; that is, if one of them occurs, then none of the others can occur. • Events are exhaustive if they exhaust all possibilities, so that one of the events must occur. • The addition rule of probability involves the probability that at least one of the events will occur. • When the events A1, A2, ..., An are mutually exclusive, the probability that at least one of them will occur is the sum of their individual probabilities: P(A1 or A2 or ... or An) = P(A1) + P(A2) + ... + P(An).
  • 11. Union of Events • The union of two events contains all outcomes that belong to either of the two events. • If A and B are two events, the probability that some outcome in either A or B (that is, the union of A and B) occurs is denoted as P(A or B). • Two events are mutually exclusive if they have no outcomes in common. • Rule 3. If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B).
  • 12. Non-Mutually Exclusive Events • The notation (A and B) represents the intersection of events A and B, that is, all outcomes belonging to both A and B. • Rule 4. If two events A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B).
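Rules 3 and 4 can be checked by enumerating a small sample space. The sketch below assumes two fair dice and three illustrative events (A, B, C) that are not part of the original slides; it uses the classical definition of probability.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

# Hypothetical events chosen only for illustration.
A = {o for o in sample_space if o[0] + o[1] == 7}  # the sum is 7
B = {o for o in sample_space if o[0] == 6}         # the first die shows 6
C = {o for o in sample_space if o[0] + o[1] == 2}  # the sum is 2

# Rule 4: A and B overlap in the single outcome (6, 1).
p_a_or_b = prob(A) + prob(B) - prob(A & B)
assert p_a_or_b == prob(A | B)

# Rule 3: A and C have no outcomes in common, so the probabilities add.
assert A & C == set()
p_a_or_c = prob(A) + prob(C)
assert p_a_or_c == prob(A | C)

print(p_a_or_b)  # 11/36
print(p_a_or_c)  # 7/36
```

Using exact fractions instead of floats keeps the equality checks free of rounding error.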
  • 13. Joint and Marginal Probability • The probability of the intersection of two events is called a joint probability. • The probability of an event, irrespective of the outcome of the other joint event, is called a marginal probability.
  • 14. Conditional Probability and the Multiplication Rule • A formal way to revise probabilities on the basis of new information is to use conditional probabilities. • Let A and B be any events with probabilities P(A) and P(B). If you are told that B has occurred, then the probability of A might change. • The new probability of A is called the conditional probability of A given B, or P(A|B). • It can be calculated with the following formula: P(A|B) = P(A and B) / P(B), provided that P(B) > 0.
  • 15. Conditional Probability • Conditional probability is the probability of occurrence of one event A, given that another event B is known to be true or has already occurred.
  • 16. Multiplication Law for Independent Events • If two events A and B are independent, then P(A | B) = P(A), and the multiplication law of probability, P(A and B) = P(A | B) P(B), simplifies to P(A and B) = P(A) P(B).
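As a minimal sketch of conditional probability and the independence check, the example below again assumes two fair dice; the events A, B, and C are hypothetical and chosen only to show one dependent pair and one independent pair.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability over equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if not b:
        raise ValueError("P(B) must be greater than 0")
    return prob(a & b) / prob(b)

# Hypothetical events for illustration.
A = {o for o in sample_space if o[0] + o[1] >= 10}  # the sum is at least 10
B = {o for o in sample_space if o[0] == 6}          # the first die shows 6
C = {o for o in sample_space if o[1] % 2 == 0}      # the second die is even

p_a = prob(A)                  # probability before any information
p_a_given_b = cond_prob(A, B)  # revised probability once B is known

# B and C concern different dice, so they are independent:
# P(B and C) equals P(B) * P(C), and P(B | C) equals P(B).
independent = prob(B & C) == prob(B) * prob(C)

print(p_a, p_a_given_b, independent)  # 1/6 1/2 True
```

Learning that the first die shows 6 raises the probability of a high sum from 1/6 to 1/2, which is exactly the kind of revision the conditional probability formula describes.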
  • 17. Random Variables • A random variable is a numerical description of the outcome of an experiment. • A discrete random variable is one for which the number of possible outcomes can be counted. • A continuous random variable has outcomes over one or more continuous intervals of real numbers.
  • 18. Probability Distribution of a Single Random Variable • A discrete random variable has a finite or countably infinite number of possible values. • A continuous random variable has a continuum of possible values. • Usually a discrete distribution results from a count, whereas a continuous distribution results from a measurement. • This distinction between counts and measurements is not always clear-cut. • Mathematically, there is an important difference between discrete and continuous probability distributions. • Specifically, a proper treatment of continuous distributions requires calculus.
  • 19. Discrete and Continuous Random Variables • Examples of discrete random variables: • outcomes of dice rolls • whether a customer likes or dislikes a product • number of hits on a Web site link today • Examples of continuous random variables: • weekly change in DJIA • daily temperature • time between machine failures
  • 20. Probability Distributions • A probability distribution is a characterization of the possible values that a random variable may assume along with the probability of assuming these values. • We may develop a probability distribution using any one of the three perspectives of probability: classical, relative frequency, and subjective.
  • 21. Expected Value of a Discrete Random Variable • The expected value of a random variable corresponds to the notion of the mean, or average, for a sample. • For a discrete random variable X, the expected value, denoted E[X], is the weighted average of all possible outcomes, where the weights are the probabilities: E[X] = Σ x_i P(X = x_i), summed over all possible outcomes x_i.
  • 22. Expected Value and Decision Making • The expected value is a “long-run average” and is appropriate for decisions that occur on a repeated basis. • For one-time decisions, however, you need to consider the downside risk and the upside potential of the decision.
  • 23. Variance of a Discrete Random Variable • The variance, Var[X], of a discrete random variable X is a weighted average of the squared deviations from the expected value: Var[X] = Σ (x_i - E[X])² P(X = x_i), summed over all possible outcomes x_i.
  • 24. Summary Measures of a Probability Distribution • The mean, often denoted μ, is a weighted sum of the possible values, weighted by their probabilities: μ = Σ x_i P(X = x_i). • It is also called the expected value of X and denoted E(X). • To measure the variability in a distribution, we calculate its variance or standard deviation. • The variance, denoted by σ² or Var(X), is a weighted sum of the squared deviations of the possible values from the mean, where the weights are again the probabilities: σ² = Σ (x_i - μ)² P(X = x_i). • The standard deviation, σ, is the square root of the variance.
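The mean, variance, and standard deviation of a discrete distribution follow directly from the weighted sums above. The sketch below assumes a made-up distribution (units sold per day); the values and probabilities are illustrative only.

```python
from math import isclose, sqrt

# Hypothetical distribution: value -> probability (must sum to 1).
distribution = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
assert isclose(sum(distribution.values()), 1.0)

# E[X]: each value weighted by its probability.
expected_value = sum(x * p for x, p in distribution.items())

# Var[X]: squared deviations from E[X], weighted by probability.
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(round(expected_value, 4), round(variance, 4), round(std_dev, 4))
# 1.7 0.81 0.9
```

The expected value of 1.7 is never an actual outcome here, which illustrates why it is best read as a long-run average rather than a prediction for a single day.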
  • 25. Discrete random variables • Bernoulli distribution • Binomial distribution • Poisson distribution • Hypergeometric distribution
  • 26. Continuous random variables • Uniform • Normal • Exponential • Lognormal - when the probability of zero is very low, but the most likely value is just greater than zero
  • 28. Statistical Sampling • Sampling is the foundation of statistical analysis. • Sampling plan - a description of the approach that is used to obtain samples from a population prior to any data collection activity. • A sampling plan states: - its objectives - target population - population frame (the list from which the sample is selected) - operational procedures for collecting data - statistical tools for data analysis
  • 29. A Sampling Plan for a Market Research Study • A company wants to understand how golfers might respond to a membership program that provides discounts at golf courses. • Objective - estimate the proportion of golfers who would join the program • Target population - golfers over 25 years old • Population frame - golfers who purchased equipment at particular stores • Operational procedures - e-mail link to survey or direct-mail questionnaire • Statistical tools - PivotTables to summarize data by demographic groups and estimate likelihood of joining the program
  • 30. Sampling Methods • Subjective Methods • Judgment sampling – expert judgment is used to select the sample • Convenience sampling – samples are selected based on the ease with which the data can be collected • Probabilistic Sampling • Simple random sampling involves selecting items from a population so that every subset of a given size has an equal chance of being selected
  • 31. Additional Probabilistic Sampling Methods • Systematic (periodic) sampling – a sampling plan that selects every nth item from the population. • Stratified sampling – applies to populations that are divided into natural subsets (called strata) and allocates the appropriate proportion of samples to each stratum. • Cluster sampling - based on dividing a population into subgroups (clusters), sampling a set of clusters, and (usually) conducting a complete census within the clusters sampled • Sampling from a continuous process • Select a time at random; then select the next n items produced after that time. • Select n times at random; then select the next item produced after each of these times. https://www.youtube.com/watch?v=1XFU1d9XIWM
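The probabilistic sampling methods above can be sketched with Python's standard `random` module. The population frame, the `region` field, and the helper function names are all hypothetical; the stratified helper uses simple proportional allocation with rounding, which is only one of several possible allocation schemes.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population frame: 100 customers, 60 north and 40 south.
frame = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]

def simple_random_sample(items, n):
    """Every subset of size n has the same chance of being selected."""
    return random.sample(items, n)

def systematic_sample(items, step):
    """Pick a random starting point, then take every step-th item."""
    start = random.randrange(step)
    return items[start::step]

def stratified_sample(items, key, n):
    """Allocate the sample to each stratum in proportion to its size.

    Rounding can make the total differ slightly from n when the
    proportions are not whole numbers.
    """
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(items))
        sample.extend(random.sample(members, share))
    return sample

srs = simple_random_sample(frame, 10)
periodic = systematic_sample(frame, 10)
stratified = stratified_sample(frame, "region", 10)

north_count = sum(1 for c in stratified if c["region"] == "north")
print(len(srs), len(periodic), len(stratified), north_count)  # 10 10 10 6
```

The stratified sample always contains 6 north and 4 south customers, matching the 60/40 split of the frame, whereas a simple random sample only matches that split on average.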
  • 32. Estimating Population Parameters • Estimation involves assessing the value of an unknown population parameter using sample data. • Estimators are the measures used to estimate population parameters, e.g., the sample mean (x̄), sample variance (s²), and sample proportion (p̂). • A point estimate is a single number derived from sample data that is used to estimate the value of a population parameter. • If the expected value of an estimator equals the population parameter it is intended to estimate, the estimator is said to be unbiased.
  • 33. Sampling Error • Sampling (statistical) error occurs because samples are only a subset of the total population. • Sampling error is inherent in any sampling process, and although it can be minimized, it cannot be totally avoided. • Nonsampling error occurs when the sample does not represent the target population adequately. • Nonsampling error usually results from a poor sample design or inadequate data reliability.
  • 34. Sampling Distributions • The sampling distribution of the mean is the distribution of the means of all possible samples of a fixed size n from some population. • The standard deviation of the sampling distribution of the mean is called the standard error of the mean: σ/√n, where σ is the population standard deviation. • As n increases, the standard error decreases. • Larger sample sizes have less sampling error.
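For a very small population the sampling distribution of the mean can be enumerated exactly. The sketch below assumes a hypothetical population of six values and sampling with replacement, and checks that the standard deviation of all possible sample means equals σ/√n.

```python
from itertools import product
from math import isclose, sqrt
from statistics import pstdev

# Hypothetical population (for example, the faces of a die).
population = [1, 2, 3, 4, 5, 6]
sigma = pstdev(population)  # population standard deviation

def standard_error_by_enumeration(pop, n):
    """Standard deviation of the means of ALL ordered samples of size n,
    drawn with replacement from pop."""
    sample_means = [sum(sample) / n for sample in product(pop, repeat=n)]
    return pstdev(sample_means)

se_n2 = standard_error_by_enumeration(population, 2)
se_n3 = standard_error_by_enumeration(population, 3)

# The enumerated value matches the formula sigma / sqrt(n).
assert isclose(se_n2, sigma / sqrt(2), rel_tol=1e-9)
assert isclose(se_n3, sigma / sqrt(3), rel_tol=1e-9)

# A larger sample size gives a smaller standard error.
assert se_n3 < se_n2
```

Enumeration is only practical for tiny populations; it is used here because it makes the definition of the sampling distribution concrete without relying on simulation.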
  • 35. Interval Estimates • An interval estimate provides a range for a population characteristic based on a sample. • Intervals specify a range of plausible values for the characteristic of interest and a way of assessing “how plausible” they are. • In general, a 100(1 - α)% probability interval is any interval [A, B] such that the probability of falling between A and B is 1 - α. • Probability intervals are often centered on the mean or median. • Example: in a normal distribution, the mean plus or minus 1 standard deviation describes an approximate 68% probability interval around the mean.
  • 36. Confidence Intervals • A confidence interval is a range of values between which the value of the population parameter is believed to be, along with a probability that the interval correctly estimates the true (unknown) population parameter. • This probability is called the level of confidence, denoted by 1 - α, where α is a number between 0 and 1. • The level of confidence is usually expressed as a percent; common values are 90%, 95%, or 99%. • For a 95% confidence interval, if we chose 100 different samples, leading to 100 different interval estimates, we would expect about 95 of them to contain the true population mean.
  • 37. The t-Distribution • The t-distribution is a family of probability distributions with a shape similar to the standard normal distribution. Different t-distributions are distinguished by an additional parameter, degrees of freedom (df). • As the number of degrees of freedom increases, the t-distribution converges to the standard normal distribution. • The t-distribution has a larger variance than the standard normal distribution.
  • 38. Confidence Interval for the Mean with Unknown Population Standard Deviation • A 100(1 - α)% confidence interval for the population mean is x̄ ± t_{α/2, n-1} (s/√n), where x̄ is the sample mean, s is the sample standard deviation, and t_{α/2, n-1} is the value of the t-distribution with df = n - 1 for an upper tail area of α/2. • t values are found in Table 2 of Appendix A or with the Excel function T.INV(1 - α/2, n - 1). • The Excel function =CONFIDENCE.T(alpha, standard_deviation, size) can be used to compute the margin of error.
  • 39. Prediction Intervals • A prediction interval is one that provides a range for predicting the value of a new observation from the same population. • A confidence interval is associated with the sampling distribution of a statistic, but a prediction interval is associated with the distribution of the random variable itself. • A 100(1 - α)% prediction interval for a new observation is x̄ ± t_{α/2, n-1} s √(1 + 1/n).
  • 40. Confidence Intervals and Sample Size • We can determine the appropriate sample size needed to estimate the population parameter within a specified level of precision (± E). • Sample size for the mean: n ≥ (z_{α/2})² σ² / E² • Sample size for the proportion: n ≥ (z_{α/2})² p(1 - p) / E² • Use the sample proportion from a preliminary sample as an estimate of p or set p = 0.5 for a conservative estimate to guarantee the required precision.
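The two sample-size formulas can be evaluated with `statistics.NormalDist` from the Python standard library. This is a sketch under the usual normal-approximation assumptions; the function names and the example inputs (σ = 15, E = 3, E = 0.05) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_upper(confidence):
    """z value with upper-tail area alpha/2, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    z = z_upper(confidence)
    return ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(margin, p=0.5, confidence=0.95):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin.
    p = 0.5 is the conservative choice because it maximises p * (1 - p)."""
    z = z_upper(confidence)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_mean = sample_size_for_mean(sigma=15, margin=3)
n_prop_conservative = sample_size_for_proportion(margin=0.05)
n_prop_prior = sample_size_for_proportion(margin=0.05, p=0.2)

print(n_mean, n_prop_conservative, n_prop_prior)  # 97 385 246
```

The results are rounded up because the sample size must be a whole number that meets or exceeds the bound. Using a preliminary estimate of p = 0.2 instead of 0.5 reduces the required sample noticeably.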
  • 41. Statistical Inference • Statistical inference focuses on drawing conclusions about populations from samples. • Statistical inference includes estimation of population parameters and hypothesis testing, which involves drawing conclusions about the value of the parameters of one or more populations.
  • 42. Hypothesis Testing • Hypothesis testing involves drawing inferences about two contrasting propositions (each called a hypothesis) relating to the value of one or more population parameters. • H0: Null hypothesis: describes an existing theory • H1: Alternative hypothesis: the complement of H0 • Using sample data, we either: - reject H0 and conclude the sample data provides sufficient evidence to support H1, or - fail to reject H0 and conclude the sample data does not support H1.
  • 43. Hypothesis Testing Procedure • Steps in conducting a hypothesis test: 1. Identify the population parameter and formulate the hypotheses to test. 2. Select a level of significance (the risk of drawing an incorrect conclusion). 3. Determine the decision rule on which to base a conclusion. 4. Collect data and calculate a test statistic. 5. Apply the decision rule and draw a conclusion.
  • 44. One-Sample Hypothesis Tests • Three types of one-sample tests: 1. H0: parameter ≤ constant H1: parameter > constant 2. H0: parameter ≥ constant H1: parameter < constant 3. H0: parameter = constant H1: parameter ≠ constant • It is not correct to formulate a null hypothesis using >, <, or ≠.
  • 45. Determining the Proper Form of Hypotheses • Hypothesis testing always assumes that H0 is true and uses sample data to determine whether H1 is more likely to be true. • Statistically, we cannot “prove” that H0 is true; we can only fail to reject it. • Rejecting the null hypothesis provides strong evidence (in a statistical sense) that the null hypothesis is not true and that the alternative hypothesis is true. • Therefore, what we wish to provide evidence for statistically should be identified as the alternative hypothesis.
  • 46. Understanding Potential Errors in Hypothesis Testing • Hypothesis testing can result in one of four different outcomes: 1. H0 is true and the test correctly fails to reject H0 2. H0 is false and the test correctly rejects H0 3. H0 is true and the test incorrectly rejects H0 (called Type I error) 4. H0 is false and the test incorrectly fails to reject H0 (called Type II error)
  • 47. Terminology • The probability of making a Type I error = α (level of significance) = P(rejecting H0 | H0 is true). • The value of 1 - α is called the confidence coefficient = P(not rejecting H0 | H0 is true). • The value of α can be controlled. Common values are 0.01, 0.05, or 0.10. • The probability of making a Type II error = β = P(not rejecting H0 | H0 is false). • The value of 1 - β is called the power of the test = P(rejecting H0 | H0 is false). • The value of β cannot be specified in advance and depends on the value of the (unknown) population parameter. • How β depends on the true population mean: the further away the true mean is from the hypothesized value, the smaller the value of β. • Generally, as α decreases, β increases.
  • 48. Improving the Power of the Test • We would like the power of the test to be high (equivalently, we would like the probability of a Type II error to be low) to allow us to make a valid conclusion. • The power of the test is sensitive to the sample size; small sample sizes generally result in a low value of 1 - β. • The power of the test can be increased by taking larger samples, which enable us to detect small differences between the sample statistics and population parameters with more accuracy. • If you choose a small level of significance, you should try to compensate by having a large sample size.
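The relationships among α, β, sample size, and the true mean can be made concrete for one simple case. The sketch below assumes an upper-tailed one-sample z-test (H0: μ ≤ μ0 against H1: μ > μ0) with a known population standard deviation; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper-tailed z-test with known sigma."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta: probability the sample mean stays below the cutoff
    # when the true mean is mu_true.
    beta = NormalDist(mu_true, se).cdf(cutoff)
    return 1 - beta

base = power_upper_tail_z(mu0=100, mu_true=105, sigma=15, n=25)
larger_sample = power_upper_tail_z(mu0=100, mu_true=105, sigma=15, n=100)
farther_mean = power_upper_tail_z(mu0=100, mu_true=110, sigma=15, n=25)
stricter_alpha = power_upper_tail_z(mu0=100, mu_true=105, sigma=15, n=25, alpha=0.01)

assert larger_sample > base   # larger n raises the power
assert farther_mean > base    # a true mean farther from mu0 lowers beta
assert stricter_alpha < base  # a smaller alpha raises beta

print(round(base, 3), round(larger_sample, 3))
```

When the true mean equals μ0, the same function returns α, since rejecting H0 in that case is a Type I error.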
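  • 49. Selecting the Test Statistic • The decision to reject or fail to reject a null hypothesis is based on computing a test statistic from the sample data. • The test statistic used depends on the type of hypothesis test. • Test statistics for one-sample hypothesis tests for means: z = (x̄ - μ0) / (σ/√n) when the population standard deviation σ is known, and t = (x̄ - μ0) / (s/√n) with n - 1 degrees of freedom when σ is unknown and estimated by the sample standard deviation s.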
  • 50. Drawing a Conclusion • The conclusion to reject or fail to reject H0 is based on comparing the value of the test statistic to a “critical value” from the sampling distribution of the test statistic when the null hypothesis is true and the chosen level of significance, α. • The sampling distribution of the test statistic is usually the normal distribution, t-distribution, or some other well-known distribution. • The critical value divides the sampling distribution into two parts, a rejection region and a non-rejection region. If the test statistic falls into the rejection region, we reject the null hypothesis; otherwise, we fail to reject it.
  • 51. Rejection Regions H0: parameter = constant H1: parameter ≠ constant H0: parameter ≤ constant H1: parameter > constant H0: parameter ≥ constant H1: parameter < constant For a one-tailed test, if H1 is stated as <, the rejection region is in the lower tail; if H1 is stated as >, the rejection region is in the upper tail (just think of the inequality as an arrow pointing to the proper tail direction).
  • 52. p-Values • A p-value (observed significance level) is the probability of obtaining a test statistic value equal to or more extreme than that obtained from the sample data when the null hypothesis is true. • An alternative approach uses the p-value rather than the critical value: reject H0 if the p-value < α.
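The p-value decision rule can be sketched for the simplest case. The example below assumes a one-sample z-test with a known population standard deviation, because the Python standard library provides the normal distribution but not the t-distribution; the sample data, μ0, and σ are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, tail="two"):
    """Return (z, p_value) for a one-sample z-test with known sigma.

    tail: "lower" for H1: mu < mu0, "upper" for H1: mu > mu0,
          "two" for H1: mu != mu0.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "lower":
        p_value = std_normal.cdf(z)        # area to the left of z
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)    # area to the right of z
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    return z, p_value

# Hypothetical sample of 10 observations with mean 52.
sample = [52, 48, 55, 51, 53, 50, 54, 49, 56, 52]
alpha = 0.05

z, p_two = one_sample_z_test(sample, mu0=50, sigma=3, tail="two")
_, p_upper = one_sample_z_test(sample, mu0=50, sigma=3, tail="upper")

reject_h0 = p_two < alpha
print(round(z, 3), round(p_two, 3), reject_h0)  # 2.108 0.035 True
```

Because z is positive here, the upper-tailed p-value is exactly half of the two-tailed one. For the t-based tests described in the slides, the same decision rule applies with the t-distribution in place of the normal.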
  • 53. Finding p-Values • For a lower one-tailed test, the p-value is the probability to the left of the test statistic t in the t-distribution, and is found using the Excel function: • =T.DIST(t, n-1, TRUE) • For an upper one-tailed test, the p-value is the probability to the right of the test statistic t, and is found using the Excel function: • =1 - T.DIST(t, n-1, TRUE) • For a two-tailed test, the p-value is found using the Excel function: • =T.DIST.2T(t, n-1), if t > 0 • =T.DIST.2T(-t, n-1), if t < 0
  • 54. Excel Analysis Toolpak Procedures for Two-Sample Hypothesis Tests
  • 55. Selecting the Proper Excel Procedure • Population variances are known: z-Test: Two-Sample for Means • Population variances are unknown and assumed unequal: t-Test: Two-Sample Assuming Unequal Variances • Population variances are unknown but assumed equal: t-Test: Two-Sample Assuming Equal Variances • These tools calculate the test statistic, the p-value for both a one-tail and two-tail test, and the critical values for one-tail and two-tail tests.
  • 56. Two-Sample Test for Means with Paired Samples • In many situations, data from two samples are naturally paired or matched. • When paired samples are used, a paired t-test is more accurate than assuming that the data come from independent populations. • Hypotheses (μD is the mean difference between the paired samples): H0: μD {≥, ≤, or =} 0; H1: μD {<, >, or ≠} 0 • Excel Data Analysis tool: t-Test: Paired Two-Sample for Means
  • 57. Test for Equality of Variances • Test for equality of variances between two samples using a new type of test, the F-test. • To use this test, we must assume that both samples are drawn from normal populations. • Hypotheses: H0: σ1² - σ2² = 0; H1: σ1² - σ2² ≠ 0 • F-test statistic: F = s1² / s2², the ratio of the two sample variances • Excel tool: F-Test Two-Sample for Variances
  • 58. Conducting the F-Test • Although the hypothesis test is really a two-tailed test, we will simplify it as an upper-tailed, one-tailed test to make it easy to use tables of the F-distribution and interpret the results of the Excel tool. • We do this by ensuring that when we compute F, we take the ratio of the larger sample variance to the smaller sample variance. • Find the critical value F_{α/2, df1, df2} of the F-distribution, and then we reject the null hypothesis if the F-test statistic exceeds the critical value. • Note that we are using α/2 to find the critical value, not α. This is because we are using only the upper tail information on which to base our conclusion.
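Computing the F statistic with the larger variance in the numerator can be sketched as follows. The two samples and the function name are hypothetical; the critical value F_{α/2, df1, df2} still has to come from an F table or a statistics package, since the Python standard library does not include the F-distribution.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """Return (F, df1, df2) with the larger sample variance in the
    numerator, so that F >= 1 and only the upper tail is needed.
    df1 belongs to the numerator sample, df2 to the denominator sample."""
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

# Hypothetical samples: the second one is clearly more spread out.
sample_a = [10.0, 12.0, 11.0, 13.0, 14.0]      # sample variance 2.5
sample_b = [8.0, 15.0, 11.0, 19.0, 7.0, 12.0]  # sample variance 20.0

f_value, df1, df2 = f_statistic(sample_a, sample_b)
print(f_value, df1, df2)  # 8.0 5 4
```

Note that the degrees of freedom swap along with the variances: because sample_b has the larger variance, its n - 1 = 5 becomes df1.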
  • 59. Analysis of Variance (ANOVA) • Used to compare the means of two or more population groups. • ANOVA derives its name from the fact that we are analyzing variances in the data. • ANOVA measures variation between groups relative to variation within groups. • Each of the population groups is assumed to come from a normally distributed population.
  • 60. Assumptions of ANOVA • The m groups or factor levels being studied represent populations whose outcome measures 1. are randomly and independently obtained, 2. are normally distributed, and 3. have equal variances. • If these assumptions are violated, then the level of significance and the power of the test can be affected.
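The idea of comparing variation between groups with variation within groups can be sketched by computing the one-way ANOVA F statistic by hand. The three groups below are made-up data, and the function name is hypothetical; judging significance still requires an F critical value or p-value from a table or statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical data: three groups with means 5, 7, and 9.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]

f_value, df_between, df_within = one_way_anova_f(groups)
print(f_value, df_between, df_within)  # 12.0 2 6
```

A large F means the group means differ by more than the scatter inside each group would explain, which is evidence against equal population means.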
  • 61. Chi-Square Test for Independence • Test for independence of two categorical variables. ◦ H0: two categorical variables are independent ◦ H1: two categorical variables are dependent
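A minimal sketch of the chi-square test for independence, restricted to a 2×2 table so that it needs only the standard library: with 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so the p-value can be obtained from `statistics.NormalDist`. The contingency table and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_independence_2x2(table):
    """Return (chi2, p_value) for a 2x2 contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # df = (2 - 1) * (2 - 1) = 1, so P(chi2_1 > c) = 2 * (1 - Phi(sqrt(c))).
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p_value

# Hypothetical counts: rows = gender, columns = prefers product A or B.
table = [[30, 20],
         [20, 30]]

chi2, p_value = chi_square_independence_2x2(table)
reject_h0 = p_value < 0.05
print(round(chi2, 3), round(p_value, 4), reject_h0)  # 4.0 0.0455 True
```

For larger tables the statistic is computed the same way, with (rows - 1)(columns - 1) degrees of freedom, but the p-value then needs a chi-square table or a statistics package.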