Alpha University College
5/11/2022 1
Business Research Methods
Part VI (Sub-part II)
Data Analysis, Interpretation
and Reporting
Chapter Six: Data Analysis, Interpretation and Reporting
Data Management and Support Software
Descriptive Analysis
Inferential Analysis
Hypothesis Testing
Interpretation, scientific writing and reporting
Data Analysis: Introduction
• Once the data are ready for processing, the next step is to choose an appropriate analysis method and conduct the analysis.
• Data analysis depends on the nature of the variables, the type of data and the purpose of the analysis. The following issues will affect the data analysis part of your research:
  - The type of data you have gathered (i.e., nominal, ordinal, interval or ratio)
  - Whether the data are paired, such as measurements before and after a treatment
  - Whether the data are parametric or non-parametric:
    - Ranks, scores or categories are generally non-parametric data.
    - Measurements that come from a normally distributed population can usually be treated as parametric.
  - What you are looking for: differences, correlations, etc.
Data Analysis: Introduction
• Simply put, data analysis is the process of making meaning from the data.
• Broadly classified, data analysis involves:
  - Quantitative analysis
  - Qualitative analysis
• Quantitative analysis uses numeric expressions/representations and manipulations of the collected data.
• The analysis can take a descriptive or an inferential form.
• Based on the number of variables involved, quantitative analysis can be univariate, bivariate and/or multivariate.
Quantitative analysis
• Descriptive vs. inferential analysis:
  - Descriptive analysis: statistically describing, aggregating and presenting the constructs of interest or the associations between these constructs.
  - Inferential analysis: the statistical estimation of parameter values and the testing of hypotheses (theory testing).
• With respect to the number of variables:
  - Univariate analysis: only one variable is analyzed
  - Bivariate analysis: two variables are analyzed
  - Multivariate analysis: more than two variables are included in the analysis
• The choice of analysis also depends on the four scales of measurement (nominal, ordinal, interval, ratio).
Scales of Measurement & Descriptive Statistics
Reliability Analysis/Test (SPSS)
• It measures the consistency of an instrument.
• Internal consistency (commonly reported as Cronbach's alpha) is the most commonly used measure of reliability.
• Factors that increase reliability:
  - A larger number of items
  - High variation among the individuals being tested
  - Clear instructions
  - An optimal testing situation
• In SPSS: Analyze → Scale → Reliability Analysis → select items → Statistics → choose the statistics → Continue → choose "Alpha" from the Model list → OK
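As an illustration of the internal-consistency measure that SPSS reports as "Alpha", the following minimal Python sketch computes Cronbach's alpha as α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name, the items-by-respondents data layout and the scores are hypothetical assumptions for illustration, not part of SPSS.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items[j][i] is the score of respondent i on item j (hypothetical layout).
    Population variances are used throughout; using sample variances instead
    gives the same ratio because the n/(n-1) factor cancels.
    """
    k = len(items)                 # number of items
    n = len(items[0])              # number of respondents
    # Each respondent's total score across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

# Hypothetical 3-item scale answered by 5 respondents
scores = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [3, 5, 2, 4, 1],
]
print(round(cronbach_alpha(scores), 3))
```

Values closer to 1 indicate that the items vary together, i.e., higher internal consistency.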
Univariate analysis (Descriptive analysis)
• The following categories of descriptive analysis are usually used:
  - Frequency distributions
  - Measures of central tendency
  - Measures of dispersion
  - Shape of distribution
1) Frequency distributions (tables, bar graphs, pie charts, histograms)
a) Frequency table: a table summarizing the values of a variable and the number of times the variable takes each value. It has:
  • A descriptive title
  • Clear labels for columns and rows
  • Appropriate categories
  • Frequencies and their corresponding percentages
Univariate analysis…
b) Pie charts and bar charts: used when the data are nominal or ordinal. A pie chart shows only one variable, whereas a bar chart can show more than one.
c) Histogram: used for interval-level (and ratio-level) data.
• Line graphs can also be used to explore the variable(s).
Univariate analysis…
• Example: Frequency table (leisure time preference)
Preference         Frequency   Percentage   Cumulative percentage
With friends            9          9.0              9.0
Sport activities       30         30.0             39.0
With family            40         40.0             79.0
Reading                21         21.0            100.0
Total                 100        100.0
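The frequency table above can be reproduced with a short Python sketch. The function name `frequency_table` and the step of rebuilding the raw responses from the reported counts are illustrative assumptions.

```python
from collections import Counter

def frequency_table(values, order=None):
    """Return (category, frequency, percent, cumulative percent) rows and the total."""
    counts = Counter(values)
    total = sum(counts.values())
    categories = order if order is not None else sorted(counts)
    rows, cumulative = [], 0.0
    for cat in categories:
        pct = 100.0 * counts[cat] / total
        cumulative += pct
        rows.append((cat, counts[cat], round(pct, 1), round(cumulative, 1)))
    return rows, total

# Rebuild the raw responses from the counts in the leisure-time example
order = ["With friends", "Sport activities", "With family", "Reading"]
data = (["With friends"] * 9 + ["Sport activities"] * 30
        + ["With family"] * 40 + ["Reading"] * 21)
rows, total = frequency_table(data, order)
for row in rows:
    print(*row, sep="\t")
print("Total", total, 100.0, sep="\t")
```

Passing an explicit `order` keeps the categories in the sequence used in the table rather than alphabetical order, which matters for the cumulative column.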
Example:
Bar Diagram: Lists the categories and presents the percent or
count of individuals who fall in each category.
Example:
Pie Chart: Lists the categories and presents the percent or
count of individuals who fall in each category.
Example:
Histogram: The overall pattern can be described by its shape, center and spread. The age distribution shown is right-skewed, its center lies between 80 and 100, and there are no outliers.
Frequency distributions in SPSS
• Frequency tables are found under the "Analyze" menu (Analyze → Descriptive statistics → Frequencies).
• Then select the variables, move them to the "Variable(s)" box, choose from the options, tick "Display frequency tables" and click OK.
• Charts and graphs: two options
  - Analyze → Descriptive statistics → Frequencies → Charts
  - Graphs → Legacy Dialogs → (choose the chart/graph type and options)
Frequency distributions in SPSS
Analyze → Descriptive statistics → Frequencies
Univariate analysis…
2) Measures of central tendency
• Central tendency is an estimate of the center of a distribution of values.
• There are three major estimates of central tendency: the mean, the median and the mode.
Measures of central tendency…
1. Mean
• For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄, pronounced "x bar": $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. This is the arithmetic mean; it describes the central location of the data and is the "standard" average, often simply called the "mean".
• It is also called the average.
• Used mainly for interval/ratio variables
• Very widely used and intuitively appealing
Measures of central tendency…
2. Median
• It is the middle value of the distribution when all items are arranged in ascending or descending order of value.
• To find this mid-point, arrange the data from lowest to highest and identify the middle value; if there are two middle values (n even), take their average.
• The mean is sensitive to outliers, but the median is robust to them.
$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation in the ordered data}$$
Measures of central tendency…
3. Mode
• It is the value that occurs most frequently in the data set.
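A minimal sketch using Python's standard `statistics` module shows the three measures side by side. The income figures are hypothetical and include one extreme value, illustrating why the median is described as robust while the mean is not.

```python
import statistics

# Hypothetical monthly incomes with one extreme value (200)
incomes = [20, 22, 25, 25, 28, 30, 200]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # 4th of the 7 sorted values
mode_income = statistics.mode(incomes)      # most frequent value

print(mean_income, median_income, mode_income)
```

Here the mean is 50, more than any value except the outlier, while the median and the mode are both 25.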
3) Measures of dispersion
• They measure the amount of scatter or variation in a data set.
• In other words, dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely the values are clustered around the mean.
• Similar measures of central tendency may come from very different distributions.
Measures of dispersion...
[Figure: two distributions that have the same mean but different dispersions]
Measures of dispersion…
• Common measures of dispersion include the minimum, maximum, range, variance and standard deviation.
• The most frequently used in analysis are the range and the standard deviation.
• Range = Maximum value − Minimum value
• The range is sensitive to outliers.
Measures of dispersion…
• Variance:
  - The variance measures how far a set of numbers is spread out, that is, how far the numbers lie from the mean (expected value). It is one of several descriptors of a probability distribution; in particular, it is the second central moment of the distribution.
$$\mathrm{Var}(x) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$$
(For sample data, the denominator n − 1 is normally used instead of n.)
Measures of dispersion…
• Standard deviation:
  - It is a widely used measure of variability or diversity in statistics and probability theory. It shows how much variation or "dispersion" there is from the average (mean or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values. The standard deviation of X is given by:
$$SD(x) = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$
A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data.
Measures of dispersion
• Coefficient of variation (CV):
  - In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of the dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. It is defined as the ratio of the standard deviation to the mean:
$$CV = \frac{SD}{\text{Mean}}$$
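A minimal Python sketch of the dispersion measures above, using hypothetical data. It reports the variance with the denominator n used in the formula alongside the usual n − 1 sample version; taking the standard deviation and CV from the n-denominator variance is an assumption made here for consistency with the formulas, and software may use n − 1 instead.

```python
import math

def dispersion(xs):
    """Range, variance (n and n - 1 versions), SD and CV of a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    variance = ss / n                       # denominator n, as in the formula
    sample_variance = ss / (n - 1)          # usual sample estimate
    sd = math.sqrt(variance)
    return {
        "range": max(xs) - min(xs),
        "variance": variance,
        "sample_variance": sample_variance,
        "sd": sd,
        "cv": sd / mean,                    # SD relative to the mean (mean must be non-zero)
    }

# Two hypothetical data sets with the same mean (5) but different spread
print(dispersion([4, 5, 5, 5, 6]))
print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))
```

Because the CV is unit-free, it can compare spread across variables measured in different units, which the SD alone cannot.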
Measures of shape of distribution
4) Measures of shape of distribution
• Skewness and kurtosis are the commonly used measures of the shape of a data set's distribution.
• Skewness:
  - It refers to the symmetry or asymmetry of the distribution.
  - The skewness value can be positive, negative or even undefined.
Measures of shape of distribution…
• Skewness:
  - Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (possibly including the median) lie to the right of the mean.
  - A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution.
Measures of shape of distribution…
• The skewness of a random variable X is its third standardized moment.
• The coefficient of skewness, a measure of the degree of symmetry in the variable's distribution, is defined as:
$$SK = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^3}{(n-1)\,S^3}$$
Measures of shape of distribution…
• Kurtosis:
  - It refers to the peakedness of the distribution, that is, the "peakedness" of the probability distribution of a real-valued random variable.
  - Higher kurtosis means that more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.
$$KU = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^4}{(n-1)\,S^4}$$
Measures of shape of distribution…
• The coefficient of kurtosis is a measure of the degree of peakedness/flatness in the variable's distribution.
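A sketch of the two shape coefficients exactly as written above, assuming S is the sample standard deviation (denominator n − 1). Packages such as SPSS typically apply bias-adjusted formulas and report excess kurtosis (kurtosis minus 3), so their figures will not match these raw coefficients. The data are hypothetical.

```python
import math

def skewness_kurtosis(xs):
    """SK and KU as defined above, with S the sample SD (denominator n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    sk = sum((x - mean) ** 3 for x in xs) / ((n - 1) * s ** 3)
    ku = sum((x - mean) ** 4 for x in xs) / ((n - 1) * s ** 4)
    return sk, ku

# Hypothetical data: one symmetric set, one with a long right tail
print(skewness_kurtosis([1, 2, 3, 4, 5]))   # SK = 0 for symmetric data
print(skewness_kurtosis([1, 1, 1, 2, 10]))  # SK > 0 (right skew)
```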
Central tendency, dispersion and shape in SPSS
Analyze → Descriptive statistics → Descriptives → Options (select the statistics of interest)
The Normal Distribution Assumption
The normal distribution is a distribution in which the cases cluster symmetrically around the mean, with equal numbers on either side. It is the most useful distribution in statistics and has the following important properties:
1. It is symmetric and bell-shaped.
2. The mode, median and mean coincide.
3. As a corollary to (1), a fixed proportion of observations lies between the mean and a given number of standard deviations from it (for example, about 68% within ±1 SD and 95% within ±1.96 SD).
Normal distribution…
• Z-score (standard normal curve): the standard normal curve is a normal curve with mean = 0 and standard deviation S = 1. Z-scores are used to compare scores in two or more distributions that have different means and standard deviations.
  $z = (x - \bar{x})/s$, where z is the number of standard deviations by which x lies above (or below) the mean x̄, and s is the standard deviation.
• If the data are normally distributed, we employ parametric tests.
• If the data are categorical or the assumption of normality does not hold, we use non-parametric tests.
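The z-score formula can be illustrated with hypothetical marks from two classes whose means and standard deviations differ.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical marks in two classes with different means and SDs
z_maths = z_score(70, mean=60, sd=5)     # 2.0
z_english = z_score(80, mean=75, sd=10)  # 0.5
print(z_maths, z_english)
```

Although 80 is the higher raw mark, the Maths score lies further above its class mean once both are expressed in standard-deviation units.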
Using a histogram to check the normality of the data
Checking for normality with a Q-Q plot
Analyze → Descriptive Statistics → Explore…
Bivariate analysis
How do we analyze the relationship between two variables?
Bivariate analysis is the analysis of two variables to examine whether they are correlated or whether there are differences between their values.
Remember: co-variation does not always imply causation.
Bivariate analysis
• Examples:
  - Do men earn more income than women?
  - Does educational level affect attitudes toward participation in labour unions?
  - Is income level correlated with life expectancy?
  - Is parental educational level correlated with student performance?
• We need to conduct hypothesis tests to arrive at conclusive results on issues like these.
Hypothesis Testing
• The steps in hypothesis testing are:
1. State the null hypothesis.
2. Choose an appropriate statistical test.
3. Specify the level of statistical significance, known as the α-level (usually 0.10, 0.05 or 0.01).
4. Decide, based on the findings, whether to reject or not to reject (retain) the null hypothesis.
• Different tests are used depending on the nature of the dependent and independent variables and on the distribution of the data.
• During hypothesis testing, there is a possibility of committing decision errors. There are two types of errors.
Hypothesis…
• "Type I error"
  - A Type I error is a false positive result: we reject a null hypothesis that is actually true.
  - If you use a parametric test on non-parametric data, this could trick the test into seeing a significant effect when there isn't one.
  - The probability of committing a Type I error is the significance level (α); the p-value of a test is compared against this level.
  - This error requires more attention and is important to avoid.
Hypothesis…
• "Type II error"
  - It occurs when we fail to reject (i.e., accept) a null hypothesis that is false.
  - This can occur if you use a non-parametric test on parametric data, which could reduce the chance of seeing a significant effect when there is one.
  - A Type II error is a missed opportunity, i.e., we have failed to detect a significant effect that truly exists.
  - It is generally considered the less dangerous of the two errors.
• Summary: using a parametric test in the wrong context may lead to a Type I error, a false positive. Using a non-parametric test in the wrong context may lead to a Type II error, a missed opportunity.
Hypothesis…
• Reading the p-value
  - It is the basis for deciding whether or not to reject the null hypothesis.
  - It is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true.
  - P-values do not simply provide a yes-or-no answer; they indicate the strength of the evidence against the null hypothesis.
  - The lower the p-value, the stronger the evidence. When the p-value is below the chosen α-level (usually 0.05 or 0.01), the null hypothesis is rejected.
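To connect the p-value to the α-level decision rule, the sketch below computes a two-sided p-value for a standard normal (z) test statistic with `math.erfc` and compares it with α. The function names are hypothetical, and a z statistic is assumed only because its p-value needs nothing beyond the standard library; other tests (t, chi-square, F) use their own reference distributions.

```python
import math

def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 when the p-value does not exceed alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

for z in (1.0, 1.96, 2.58):
    p = two_sided_p_from_z(z)
    print(f"z = {z:.2f}  p = {p:.4f}  ->  {decide(p)}")
```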
Hypothesis…
Parametric tests
• t-test (one-sample, independent-samples, paired)
• One-way ANOVA
• Repeated-measures ANOVA (for paired data)
• Pearson correlation
Non-parametric tests (there are many techniques):
• Chi-square test for independence
• Mann-Whitney test
• Wilcoxon signed-rank test
• Kruskal-Wallis test
• Friedman test
• Spearman rank-order correlation
Hypothesis…
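Choice of statistic by the level of measurement of the two variables (row variable with column variable; columns: Nominal, Ordinal, Interval/Ratio, Dichotomous):
• Nominal
  - with Nominal, Ordinal or Dichotomous: contingency table, chi-square, Cramer's V
  - with Interval/Ratio: Z-test, t-test or F-test (if the DV is interval/ratio)
• Ordinal
  - with Nominal: contingency table, chi-square, Cramer's V
  - with Ordinal, Interval/Ratio or Dichotomous: Spearman's rho (ρ)
• Interval/Ratio
  - with Nominal: Z-test, t-test or F-test (if the interval/ratio variable is the DV)
  - with Ordinal or Dichotomous: Spearman's rho (ρ)
  - with Interval/Ratio: Pearson's r
• Dichotomous
  - with Nominal: contingency table, chi-square, Cramer's V
  - with Ordinal or Interval/Ratio: Spearman's rho (ρ)
  - with Dichotomous: Phi (φ)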
Hypothesis…
Requirement | Example of situation | Test to be used
Compare to a target | Is the average age of employees more than 40 years? | One-sample t-test
Compare two groups | Do men earn more income than women? | Independent-samples t-test
Compare two groups with one controlled intervention | Test scores before and after training | Paired t-test
Compare more than two groups | Compare the amount of income between four categories of educational level | One-way ANOVA (F-test)
Association between two categorical variables | Is there an association between gender and job grade? | Contingency table, chi-square
Association between two quantitative variables | Is there an association between advertising and sales? | Pearson's r
Hypothesis…
Contingency table analysis (cross-tabulation):
• We look for differences among the categories (hence nominal or ordinal level of measurement) of the independent variable. That is, does the IV influence the DV?
• Contingency table (cross-tabulation): a table of percentage distributions with the DV in rows and the IV in columns.
• It is a bivariate frequency distribution showing the number of cases that fall into each possible pairing of the values or categories of the two variables.
Chi-square Test
• Chi-square test (chi is pronounced "ky", as in "sky"):
  - It is employed to test relationships between two variables when the data are measured at the nominal or ordinal level.
  - The chi-square test for independence can be used in situations where you have two categorical variables.
  - It works with the "simplest" form of data: data such as gender or country, or data that have been placed in categories, such as age group.
Chi-square Test
• The chi-square statistic is calculated as follows, summing over all cells of the contingency table:
$$\chi^2_c = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
• If the calculated chi-square is greater than the critical chi-square value from the table (for the chosen α-level and degrees of freedom), we conclude that there is a relationship (that is, we reject H0).
• Remember that, as in all hypothesis testing, the null hypothesis of the chi-square test is that there is no relationship between the DV and the IV.
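The formula can be checked by hand with a small Python sketch. The 2 × 2 job-grade-by-gender counts are hypothetical; expected counts are computed as (row total × column total) / grand total, and no continuity correction is applied.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = job grade (DV: senior, junior),
# columns = gender (IV: male, female)
table = [[30, 10],
         [20, 40]]
chi2, df = chi_square(table)
print(round(chi2, 3), df)
```

With df = 1, the 5% critical value of chi-square is 3.841, so a statistic of about 16.67 would lead to rejecting H0 of no relationship.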
Contingency Table and Chi-square in SPSS
• Analyze → Custom Tables → Custom Tables → OK → Row and Column → Test Statistics → Tests of independence (Chi-square) → OK
• Or
• Analyze → Descriptive statistics → Crosstabs → move the DV into Rows and the IV into Columns → Statistics → Chi-square → OK
Comparing two groups: T-tests
• A t-test is a statistical hypothesis test in which the test statistic follows Student's t-distribution if the null hypothesis is true. The t-statistic was introduced by W. S. Gosset under the pen name "Student".
• It is the most frequently used procedure for determining whether the means of two independent groups could conceivably have come from the same population.
• If you compute means for two samples, they will almost always differ to some degree. The job of the t-test is to determine whether they differ by chance or whether the difference is real and reliable.
• For the one-sample case, the t-statistic is given by:
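$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
where μ is the hypothesized population mean, s the sample standard deviation and n the sample size.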
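A minimal sketch of the one-sample t-statistic using the standard `statistics` module; the function name and the employee ages are hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(xs, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(xs)
    t = (mean(xs) - mu0) / (stdev(xs) / math.sqrt(n))  # stdev uses n - 1
    return t, n - 1

# Hypothetical ages of 8 employees, tested against a target of 40 years
ages = [42, 45, 38, 50, 41, 47, 39, 44]
t, df = one_sample_t(ages, 40)
print(round(t, 3), df)
```

The resulting t is then compared with the critical value from a t-table with n − 1 degrees of freedom.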
T-test in SPSS
• Parametric
  - Analyze → Compare Means → One-Sample T Test, Independent-Samples T Test or Paired-Samples T Test
• Non-parametric
  - Analyze → Nonparametric Tests → One Sample, Independent Samples or Related Samples → "Automatically compare observed data to hypothesized"
Comparing more than two groups: ANOVA
• ANOVA (similar to the difference-of-means test) is used to examine variation among groups (and among members within a group) with respect to some behavior, and to see whether the variation is statistically significant.
• Groups may be, for example: male/female; economically developed/economically developing; smokers/non-smokers; dry-lands/wet-lands; religious/non-religious; high, medium, low; etc.
• In ANOVA, the DV is measured at the interval/ratio (scale) level, while the IV is measured at the nominal or ordinal level.
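ANOVA test
We use the F-test in ANOVA, given by:
$$F_{calculated} = \frac{\text{between-group variance}}{\text{within-group variance}} = \frac{MS_{between}}{MS_{within}}$$
Now, if F_calculated > F_table, reject H0.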
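The F ratio can be sketched directly from its definition: the between-group mean square SS_between/(k − 1) over the within-group mean square SS_within/(n − k), where k is the number of groups and n the total number of observations. The income groups are hypothetical.

```python
def one_way_anova_f(groups):
    """F = MS_between / MS_within for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical incomes for three educational-level groups
f, dfs = one_way_anova_f([[12, 15, 14], [18, 20, 17], [25, 22, 27]])
print(round(f, 2), dfs)
```

The returned degrees of freedom (k − 1, n − k) are the ones needed to look up F_table.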
ANOVA in SPSS
• Analyze → Compare Means → One-Way ANOVA... (parametric test)
• Analyze → Nonparametric Tests, for example the Kruskal-Wallis one-way non-parametric ANOVA
• For pairwise group comparisons: Post Hoc... → Post Hoc Tests → choose Tukey
Scatterplots/diagrams
• Scatter plot/diagram:
  - The values of the two variables are plotted on each axis.
  - Strong relationships can be identified by scatter diagrams.
• Four relationships can be identified:
  - Positive linear
  - Negative linear
  - Non-linear
  - No relationship at all
[Figure: Scatter plot of a positive association: income (x-axis) vs. livestock ownership (y-axis)]
[Figure: Scatter plot of a negative association: income (x-axis) vs. rate of illiteracy in % (y-axis)]
[Figure: Scatter plot of no association: income (x-axis) vs. household size (y-axis)]
[Figure: Scatter and line graphs: positive linear relationship, negative linear relationship, relationship not linear, no relationship]
Scatter plot in SPSS
• Graphs → Legacy Dialogs → Scatter/Dot
Covariance and Correlations
• The interest is in the association/relationship between two variables, or whether they vary together.
• Examples:
  - Does the income of individuals increase as age increases?
  - Is the amount of sales associated with advertising expenditure?
  - Is crime related to socio-economic background?
  - Is student academic achievement associated with parents' educational level?
Covariance
• Covariance between X and Y is a measure of how much the two variables change together.
• Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means they are inversely related. The formula for calculating the covariance of sample data is shown below.
$$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n}$$
(The unbiased sample estimate divides by n − 1 instead of n.)
Correlation Analysis
• Correlation:
  - It is concerned with the existence, direction and strength of the relationship/association between variables.
  - Correlation coefficients can be calculated to see the direction and strength of the relationship.
  - The choice of coefficient depends on the nature of the variables (parametric vs. non-parametric, or numeric vs. non-numeric).
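$$r(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$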
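The covariance and correlation formulas, together with Spearman's rank correlation, can be sketched in plain Python. Spearman's rho is computed here as Pearson's r applied to ranks, with tied values given their average rank; the function names and data are illustrative.

```python
import math

def covariance(xs, ys):
    """Covariance with denominator n, as in the formula above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def pearson_r(xs, ys):
    """Pearson's r: co-variation divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but non-linear relationship: rho = 1, while r is below 1
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```

The example also illustrates that Pearson's r captures only linear association.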
Correlation...
• The most commonly used is Pearson's correlation coefficient, Pearson's r, or simply the correlation coefficient:
  - It captures linear relationships between variables; non-linear relationships are not captured.
  - It lies between −1 and 1.
  - r = 0: no linear relationship
  - r = 1: perfect positive relationship
  - r = −1: perfect negative relationship
• Spearman's rho/rank correlation coefficient (ρ)
  - mainly for ordinal variables (non-parametric)
• Phi (Φ): correlation between two dichotomous variables
Correlations and Covariance in SPSS
• Correlation
  - Analyze → Correlate → Bivariate → correlation coefficients (choose depending on parametric/non-parametric data)
• Covariance
  - Analyze → Correlate → Bivariate → Options → Cross-product deviations and covariances
Regression Analysis
• Regression analysis is a set of statistical techniques that use past observations to find (or estimate) the equation that best summarizes the relationships among key economic variables.
• The method requires that analysts:
  (1) collect data on the variables in question,
  (2) specify the form of the equation relating the variables,
  (3) estimate the equation coefficients, and
  (4) evaluate the accuracy of the equation.
• Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
Regression…
• Regression analysis is used primarily to model causality and provide prediction:
  - Predict the values of a dependent (response) variable based on the values of at least one independent (explanatory) variable
  - Explain the effect of the independent variables on the dependent variable
• The relationship between X and Y can be shown on a scatter diagram.
Simple Linear Regression Model
• Only one independent variable, x
• The relationship between x and y is described by a linear function.
• Changes in y are assumed to be caused by changes in x.
• Regression analysis serves three major purposes:
1. Description
2. Control
3. Prediction
Population Linear Regression
The population regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where y is the dependent variable, x the independent variable, β0 the population y-intercept, β1 the population slope coefficient and ε the random error term (residual). β0 + β1x is the linear component and ε is the random error component.
Regression…
• The explanatory and response variables are numeric.
• The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line).
• Model:
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
  - β1 > 0 → positive association
  - β1 < 0 → negative association
  - β1 = 0 → no association
Critical Assumptions
• The error term is normally distributed (normality).
• The error term has zero expected value (mean).
• The error term has constant variance in each time period and for all values of X (homoscedasticity).
• The error term's value in one time period is unrelated to its value in any other period (no autocorrelation).
• The underlying relationship between the x variable and the y variable is linear (linearity).
Ordinary Least Squares (OLS) Estimations
• β0 → mean response when x = 0 (y-intercept)
• β1 → change in mean response when x increases by 1 unit (slope)
• β0 and β1 are unknown population parameters.
• β0 + β1x → mean response when the explanatory variable takes on the value x
Estimated Regression Model
The sample regression line provides an estimate of the population regression line:
$$\hat{y}_i = b_0 + b_1 x_i$$
where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope and xi is the independent variable.
The individual random error terms ei are random variables with a mean of zero.
Interpretation of the Slope and the Intercept
• b0 is the estimated average value of y when the value of x is zero.
• b1 is the estimated change in the average value of y as a result of a one-unit change in x.
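For the simple model the OLS estimates have a closed form: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. A minimal sketch with hypothetical advertising and sales figures:

```python
def ols_simple(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1

# Hypothetical data: advertising spend (x) and sales (y)
advertising = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = ols_simple(advertising, sales)
predicted = [b0 + b1 * x for x in advertising]          # y-hat values
residuals = [y - p for y, p in zip(sales, predicted)]   # e_i = y_i - y-hat_i
print(round(b0, 3), round(b1, 3))
```

With an intercept in the model, the OLS residuals always sum to zero, which matches the zero-mean property of the error terms stated above.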
Multiple Linear Regression
• In simple linear regression we studied the relationship between one explanatory variable and one response variable.
• Now we look at situations where several explanatory variables work together to explain the response variable.
Formal Statement of the Model
• General regression model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
  - β0, β1, …, βk are parameters
  - X1, X2, …, Xk are known constants
  - ε, the error terms, are independent N(0, σ²)
Estimating the parameters of the model
• The values of the regression parameters βi are not known; we estimate them from data.
• As in the simple linear regression case, we use the least-squares method to fit a linear function to the data:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
• The least-squares method chooses the b's that make the sum of squares of the residuals as small as possible.
Testing for Overall Significance
• Shows whether Y depends linearly on all of the X variables together as a group
• Uses the F test statistic
• Hypotheses:
  - H0: β1 = β2 = … = βk = 0 (no linear relationship)
  - H1: at least one βi ≠ 0 (at least one independent variable affects Y)
• The null hypothesis is a very strong statement, so it is almost always rejected.
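Model Fitness Tests
Analysis of Variance and F Statistic
$$F = \frac{\text{Explained variation}/(k-1)}{\text{Unexplained variation}/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{MSR}{MSE}$$
where n is the number of observations and k is the number of estimated parameters (including the intercept).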
Test for Overall Significance
ANOVA
Source       df    SS          MS          F          Significance F
Regression    2    228014.6    114007.3    168.4712   1.65411E-09
Residual     12    8120.603    676.7169
Total        14    236135.2
Regression df = k − 1 = 2 (k = 3, the number of parameters); total df = n − 1; "Significance F" is the p-value.
The Coefficient of Determination – R²
• The coefficient of determination is the proportion of the total variance that is explained by the regression.
• It is the ratio of the explained sum of squares to the total sum of squares.
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}, \qquad 0 \le R^2 \le 1$$
• The higher R² is, the more closely the estimated regression equation fits the sample data.
• Since TSS, RSS and ESS are all non-negative (being sums of squared deviations), and since ESS ≤ TSS, R² must lie in the interval [0, 1].
• A value of R² close to one shows a "good" overall fit, whereas a value near zero shows a failure of the estimated regression equation to explain the variation in Y.
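Using the figures from the ANOVA table (n = 15 observations, since total df = 14, and k = 3 parameters), a short sketch confirms that R² and the two forms of the F statistic agree, apart from rounding in the reported sums of squares. The function names are illustrative.

```python
def r_squared(ess, tss):
    """Coefficient of determination: explained SS over total SS."""
    return ess / tss

def f_from_ss(ess, rss, n, k):
    """F = [ESS/(k-1)] / [RSS/(n-k)]; k counts parameters incl. the intercept."""
    return (ess / (k - 1)) / (rss / (n - k))

def f_from_r2(r2, n, k):
    """Equivalent F written in terms of R-squared."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Sums of squares from the ANOVA table: n = 15 (total df 14), k = 3
ess, rss, tss = 228014.6, 8120.603, 236135.2
r2 = r_squared(ess, tss)
print(round(r2, 4), round(f_from_ss(ess, rss, 15, 3), 4), round(f_from_r2(r2, 15, 3), 4))
```

An R² of about 0.966 means the regression explains roughly 97% of the variation in Y in this example.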
Multiple regression model building
• Often we have many explanatory variables, and our goal is to use them to explain the variation in the response variable.
• A model using just a few of the variables often predicts about as well as the model using all the explanatory variables.
Linear Regression in SPSS
• Analyze → Regression → Linear → select the desired options
Limited Dependent Variables
• Dichotomous variables
• Ordered choice
• Intensity measurement
Logistic regression
• There are many important research topics for which the dependent variable is "limited".
• For example, voting, morbidity or mortality, and participation data are neither continuous nor normally distributed.
• Binary logistic regression is a type of regression analysis in which the dependent variable is a dummy variable, coded 0 (did not vote) or 1 (did vote).
• Binary models
• Discrete choice models, etc.
The Linear Probability Model
The linear probability model (LPM) can be written as:
• Y = α + βX + e, where Y ∈ {0, 1}, or
• P(y = 1 | x) = β0 + xβ
• But:
  - The error terms are heteroskedastic.
  - e is not normally distributed because Y takes on only two values.
  - The predicted probabilities can be greater than 1 or less than 0.
• An alternative is to model the probability as a function G(β0 + xβ), where 0 < G(z) < 1.
The Logit Model
• A common choice for G(z) is the logistic function, which is the cdf of a standard logistic random variable:
  G(z) = exp(z)/[1 + exp(z)] = Λ(z)
• This case is referred to as a logit model, or logistic regression.
• The estimated probability is given by:
  ln[p/(1 − p)] = α + bX, or
  p = 1/[1 + exp(−α − bX)]
The Logit Model
• Where:
  - p is the probability that the event Y occurs, p(Y = 1)
  - p/(1 − p) is the "odds"
  - ln[p/(1 − p)] is the log odds, or "logit"
• The logistic distribution constrains the estimated probabilities to lie between 0 and 1:
  - if you let α + bX = 0, then p = 0.50
  - as α + bX gets very large, p approaches 1
  - as α + bX gets very small (large and negative), p approaches 0
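A sketch of the logistic function with hypothetical coefficients α and b, showing that p = 0.5 when α + bX = 0, that p stays strictly between 0 and 1, and that a one-unit increase in X multiplies the odds p/(1 − p) by exp(b).

```python
import math

def logistic_p(alpha, b, x):
    """p = 1 / (1 + exp(-(alpha + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + b * x)))

def odds(p):
    """Odds p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients
alpha, b = -1.0, 0.7
for x in (-5, 0, 5, 10):
    print(x, round(logistic_p(alpha, b, x), 3))

# Ratio of the odds at X = 3 and X = 2 equals exp(b)
odds_ratio = odds(logistic_p(alpha, b, 3)) / odds(logistic_p(alpha, b, 2))
print(round(odds_ratio, 4), round(math.exp(b), 4))
```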
The Probit Model
• Another choice for G(z) is the standard normal cumulative distribution function (cdf):
$$G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv, \qquad \phi(z) = (2\pi)^{-1/2}\exp(-z^2/2)$$
  where φ(z) is the standard normal density.
• This case is referred to as a probit model.
• Since discrete choice models are nonlinear, they cannot be estimated by the OLS method; we use maximum likelihood estimation instead.
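The standard normal cdf used by the probit model has no closed form, but it can be written with the error function as Φ(z) = ½[1 + erf(z/√2)]. A minimal standard-library sketch that evaluates it next to the logistic G(z):

```python
import math

def std_normal_pdf(z):
    """phi(z) = (2*pi)^(-1/2) * exp(-z^2 / 2)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic G(z) = exp(z) / (1 + exp(z)), for comparison with the probit."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2, -1, 0, 1, 2):
    print(z, round(std_normal_cdf(z), 3), round(logistic_cdf(z), 3))
```

Both curves equal 0.5 at z = 0 and rise most steeply there; the logistic curve is flatter, which is why logit and probit coefficients differ in scale.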
Probits and Logits
• Both the probit and the logit are nonlinear and require maximum likelihood estimation.
• There is no real reason to prefer one over the other.
• Both functions have similar shapes: they are increasing in z, most quickly around 0.
• Traditionally the logit was used more, mainly because the logistic function was easier to compute.
• Today the probit is easy to compute with standard packages, so it is also popular.
Interpreting Coefficients
• In general we care about the effect of x on P(y = 1 | x), that is, about ∂p/∂x.
• For the linear case, this is simply the coefficient on x.
• In the case of the logit, since
  p/(1 − p) = exp(α + bX) = exp(α)·[exp(b)]^X:
  - the slope coefficient b is interpreted as the change in the "log odds" for a one-unit change in X;
  - exp(b) is the odds ratio, i.e., the factor by which the odds are multiplied for a one-unit increase in X.
The Likelihood Ratio Test
• Unlike the LPM, where we can compute F statistics to test exclusion restrictions, we need a new type of test.
• Maximum likelihood estimation (MLE) always produces a log-likelihood, L.
• Just as in an F test, you estimate the restricted and unrestricted models, then form
$$LR = 2(L_{ur} - L_r) \sim \chi^2_q$$
  where q is the number of exclusion restrictions.
Goodness of Fit
• Unlike the LPM, where we can compute an R² to judge goodness of fit, we need new measures of goodness of fit.
• One possibility is a pseudo R² based on the log-likelihoods, defined as 1 − L_ur/L_r.
• We can also look at the percentage correctly predicted.
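The LR statistic and the pseudo R² need only the two log-likelihoods reported by the estimation software. The values below are hypothetical; for this pseudo R² (McFadden's version) the restricted model is the intercept-only model, so here the full model is assumed to have q = 2 slope coefficients.

```python
def likelihood_ratio(ll_unrestricted, ll_restricted):
    """LR = 2 (L_ur - L_r); compare with a chi-square with q degrees of freedom."""
    return 2.0 * (ll_unrestricted - ll_restricted)

def mcfadden_pseudo_r2(ll_model, ll_null):
    """Pseudo R-squared = 1 - L_ur / L_r, with L_r from the intercept-only model."""
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods: full model (2 slopes) and intercept-only model
ll_ur, ll_r = -40.0, -50.0
print(likelihood_ratio(ll_ur, ll_r), round(mcfadden_pseudo_r2(ll_ur, ll_r), 3))
```

With q = 2 restrictions, the 5% critical value of χ² is 5.991, so LR = 20 would reject the hypothesis that both slope coefficients are zero.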
Extensions
• Unordered multiple (j > 2) choices, such as travel mode or treatment choice, should be analyzed with the multinomial logit model.
• Ordered multiple (j > 2) choices, such as opinion/attitude surveys and rankings, should be analyzed with the ordered logit model.
• The Tobit model is used when the dependent variable is censored:
  - y* = xβ + u, u | x ~ Normal(0, σ²)
  - we only observe y = max(0, y*)
Limited dependent variable models in SPSS
• Analyze → Regression → choose the model of interest from the list (other than "Linear")
Analyzing Qualitative Data
• Qualitative studies typically produce a considerable amount of interview, focus group discussion, text-based and image data that require analysis.
• Creswell (2003) suggests that it is useful to look at the codes that have emerged according to:
  • codes readers would expect to find;
  • codes that are surprising; and
  • codes that address a larger theoretical perspective in the research.
• Then follow these steps:
  • Identifying themes
  • Coding data (reducing the data to a manageable size)
  • Developing a description from the data
  • Defining themes from the data
  • Connecting and interrelating themes
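As a rough illustration of coding and theme building, the sketch below attaches codes to interview segments by keyword matching and then rolls codes up into themes. The codebook, themes and interview excerpts are all hypothetical, and keyword matching is only a crude mechanical first pass: in practice the researcher assigns codes by reading the material, often with qualitative data analysis software.

```python
from collections import Counter, defaultdict

# Hypothetical codebook: code -> keywords that flag a segment for that code.
CODEBOOK = {
    "workload": ["overtime", "deadline"],
    "pay": ["salary", "bonus", "wage"],
    "recognition": ["praise", "appreciated", "recognized"],
}

# Hypothetical grouping of codes into broader themes.
THEMES = {
    "working conditions": ["workload"],
    "rewards": ["pay", "recognition"],
}

def code_segments(segments, codebook):
    """Return (segment, codes) pairs; a code applies if any keyword occurs in the segment."""
    coded = []
    for seg in segments:
        text = seg.lower()
        codes = [code for code, words in codebook.items() if any(w in text for w in words)]
        coded.append((seg, codes))
    return coded

def segments_by_code(coded):
    """Collect the segments filed under each code, for retrieving supporting quotes."""
    by_code = defaultdict(list)
    for seg, codes in coded:
        for code in codes:
            by_code[code].append(seg)
    return dict(by_code)

def theme_counts(coded, themes):
    """Count coded segments per code, then sum those counts per theme."""
    code_counts = Counter(code for _, codes in coded for code in codes)
    return {theme: sum(code_counts[c] for c in codes) for theme, codes in themes.items()}

# Hypothetical interview excerpts.
interviews = [
    "I work overtime almost every week to meet the deadline.",
    "The salary is fair, but nobody says we are appreciated.",
    "My manager's praise keeps me motivated.",
]
coded = code_segments(interviews, CODEBOOK)
for seg, codes in coded:
    print(codes, "<-", seg)
print(theme_counts(coded, THEMES))
```

Segments that receive no code, or codes that keep co-occurring (as pay and recognition do in the second excerpt), are natural places to revisit the codebook and look for connections between themes.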
Analyzing Qualitative Data (continued)
• Further activities:
  • Noting reflections in the margins
  • Sorting and sifting through the materials to identify similar phrases, relationships, patterns, themes, commonalities and differences
  • Isolating patterns, processes, commonalities and differences, and exploring them further in the next wave of data collection
  • Gradually developing a small set of generalizations about what consistently appears in the data
  • Confronting those generalizations with a formalized body of knowledge in the form of constructs or theories
BRM_Data Analysis, Interpretation and Reporting Part II.ppt

  • 1. Alpha University College 5/11/2022 1 Business Research Methods
  • 2. Part VI (Sub-part II) Data Analysis, Interpretation and Reporting 5/11/2022 2
  • 3. Chapter Six: Data Analysis, Interpretation and Reporting Data Management and Support Software Descriptive Analysis Inferential Analysis Hypothesis Testing Interpretation, scientific writing and reporting 5/11/2022 3
  • 4. Data Analysis: Introduction  Once the data is ready for processing, the next step is to choose appropriate analysis method and conduct the analysis.  Data analysis depends on the nature of the variable, the type of data and the purpose of the analysis. The following issues will affect the data analysis part of your research endeavor.  The type of data you have gathered, (i.e. Nominal/Ordinal/Interval/Ratio)  Are the data paired such as before and after treatment?  Are they parametric or non-parametric?  Ranks, scores, or categories are generally non-parametric data.  Measurements that come from a population that is normally distributed can usually be treated as parametric.  What are you looking for? differences, correlation etc? 5/11/2022 4
  • 5. Data Analysis: Introduction  Simply put: Data analysis is the process of making meaning from the data  Broadly classified, data analysis involves:  Quantitative analysis  Qualitative analysis  The quantitative analysis uses numeric expressions/representations and manipulations of the collected data.  The analysis could take descriptive or inferential form.  Based on number of variables involved, quantitative analysis could be univariate, bivariate and/ or multivariate analysis. 5/11/2022 5
  • 6. Quantitative analysis  Descriptive vs Inferential analysis:  Descriptive analysis: refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs.  Inferential analysis: refers to the statistical estimation of parameter values and testing of hypotheses (theory testing).  With respect to the number of variables:  Univariate analysis: only one variable is analyzed  Bivariate analysis: two variables are analyzed  Multivariate analysis: more than two variables are included in the analysis process  It also varies with the four scales of measurement 5/11/2022 6
  • 7. Scales of Measurement & Descriptive Statistics
  • 8. Reliability Analysis/Test (SPSS)  It helps measure consistency of an instrument.  Internal consistency is the most commonly used measure of reliability  Factors that increase reliability  Number of items  High variation among individuals being tested  Clear instructions  Optimal testing situation  Analyze  Scale  Reliability Analysis  select items  Statistics  choose statistical tests  Continue  choose from Alpha list  OK 5/11/2022 8
  • 9. Univariate analysis (Descriptive analysis) • The following categories of the descriptive analysis are usually used. • Frequency distributions • Measures of central tendency • Measures of dispersion • Shape of distribution 1) Frequency distributions (tables, bar graph, pie chart, histogram) a) Frequency table- a table of a summary of the values of a variable and the number of times the variable assumes an given value. It has: • Descriptive tile • Clear labels for columns and rows • Appropriate categories • Presentation of frequencies and corresponding percentages 5/11/2022 9
  • 10. Univariate analysis… b) Pie charts and Bar charts- when data is nominal or ordinal, we use pie chart or bar chart. However, only one variable in pie chart and possibly more than one in bar charts. c) Histogram –Histograms are used when it is an interval level data measurement.  We can also have line graphs to explore the variable(s). 5/11/2022 10
  • 11. Univariate analysis… 5/11/2022 11 • Example: Frequency table (Leisure time preference) Preference Frequency Percentage Cumulative With friends 9 9.0 9.0 Sport activities 30 30.0 39.0 With family 40 40.0 79.0 Reading 21 21.0 100.0 Total 100 100.0
  • 12. Example: Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in each category. 5/11/2022 12
  • 13. Example: Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each category. 5/11/2022 13
  • 14. Example: Histogram: Overall pattern can be described by its shape, center, and spread. The following age distribution is right skewed. The center lies between 80 to 100. No outliers 5/11/2022 14
  • 15. Frequency distributions in SPSS  Frequency tables: are found under the ‘analyze’ menu bar (Analyze ---- Descriptive statistics ---- Frequencies)  Then, select variables and move them to ‘variable(s)’ dialog box, choose from the options, display frequency tables, OK  Charts and graphs: two options  Analyze ---- Descriptive statistics --- Frequencies --- charts  Graphs --- Legacy dialogs --- charts/graphs (options) 5/11/2022 15
  • 16. 16 Analyze Descriptive statistics Frequency Frequency distributions in SPSS
  • 17. 17
  • 18. Univatiate analysis… 2) Measures of central tendency  Central tendency is an estimate of the center of a distribution of values.  There are three major estimates of central tendency: mean, median, and mode. 5/11/2022 18
  • 19. Measures of central tendency… 1. Mean  For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers x1, x2... xn is typically denoted by , pronounced "x bar". This mean is a type of arithmetic mean. The mean describes the central location of the data; the arithmetic mean is the "standard" average, often simply called the "mean".  The other name is average  mainly for interval variables  very widely used and intuitively appealing 5/11/2022 19
  • 20. Measures of central tendency… 2. Median  It is the middle value of the distribution when all items are arranged in either ascending or descending order in terms of value  mid-point value; arrange data from lowest to highest to identify mid value; if two mid values, take the average  mean is sensitive to outliers but median is robust 5/11/2022 20 1 2 th n Med value        
  • 21. Measures of central tendency… 3. Mode  It is the value that occurs most frequently in the data set 3) Measures of dispersion • It measures the amount of scatter or variationin a dataset • Or it refers to the way values are spread around the central tendency, for example, how tightly or how widely are the values clustered around the mean. • similar measures of central tendency may come from very different distributions 5/11/2022 21
  • 22. Measures of dispersion... Have the same mean But different dispersions
  • 23. Measures of dispersion…  Common measures of dispersion include minimum, maximum, range, variance and standard deviation.  But, the most frequently used in analysis are range and standard deviation  Range = Maximum value – Minimum value  Range is sensitive to outliers 5/11/2022 23
  • 24. Measures of dispersion…  Variance:  The variance is used as a measure of how far a set of numbers are spread out from each other. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. 5/11/2022 24 2 1 ( ) ( ) n i i x x Var x n    
  • 25. Measures of dispersion…  Standard deviation:  It is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or “dispersion" there is from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data are spread out over a large range of values. The standard deviation of X is given by: 5/11/2022 25 A useful property of standard deviation is that, unlike variance, it is expressed in the same units as the data. 2 1 ( ) ( ) n i i x x SE x n    
  • 26. Measures of dispersion  Coefficient of variation (CV):  In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean : 5/11/2022 26 SD CV Mean       
  • 27. Measures of shape of distribution 4) Measures of shape of distribution  skewness and kurtosis are the commonly used measures of shape of distribution of a dataset.  Skweness:  It refers to symmetry or asymmetry of the distribution.  The skewness value can be positive or negative, or even undefined. 5/11/2022 27
  • 28. Measures of shape of distribution…  Skewness:  Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (possibly including the median) lie to the right of the mean.  A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution. 5/11/2022 28
  • 29. Measures of shape of distribution…  The skewness of a random variable X is the third standardized moment and defined as  The coefficient of Skewness is a measure for the degree of symmetry in the variable distribution. 5/11/2022 29 3 1 3 ( ) ( 1) n i i x x SK n S     
  • 30. Measures of shape of distribution…  Kurtosis:  It refers to peakedness of the distribution.  It is a measure of the "peakedness" of the probability distribution of a real-valued random variable.  Higher kurtosis means more of the variance is the result of infrequent extreme deviation, as opposed to frequent modestly sized deviations. 5/11/2022 30 4 1 4 ( ) ( 1) n i i x x KU n S     
  • 31. Measures of shape of distribution…  The coefficient of Kurtosis is a measure for the degree of peakedness/flatness in the variable distribution. 5/11/2022 31
  • 32. 32 Analyze Descriptive statistics Descriptives Options (select your interest of analysis) Central tendency, dispersion and shape in SPSS
  • 33. The Normal Distribution Assumption The Normal distribution – is a distribution that has equal number of cases clustered around the mean. It is the most useful distribution in statistics, and has the following important properties: 1. Symmetry and bell-shaped 2. Mode, median, and mean coincide 3. As a corollary to (1), a fixed proportion of observations lies between the mean and fixed units of standard deviation. 5/11/2022 33
  • 34. Normal distribution…  Z-Score (Standard Normal Curve) – is a normal curve with mean = 0 and standard deviation, S = 1. It is used to compare scores in two or more distributions that have different means and standard deviations. z = (x – x (Bar))/s, where z = number of standard deviations, ….  If the data is normally distributed, we employ parametric tests  If the data is categorical or if the assumption of normality does not hold, we use non-parametric tests 5/11/2022 34
  • 35. Using histogram to test the normality of the data 5/11/2022 35
  • 36. Checking for normality with a Q-Q plot 5/11/2022 36
  • 37. Analyze, Descriptive Statistics, Explore… 5/11/2022 37
  • 38. Bivariate analysis How do we analyze relationships between the two? Bivariate analysis is analysis of two variables to examine if they are correlated or if there is differences between values analyzing relationships between two variables. Remember co-variation does not always imply causation 5/11/2022 38
  • 39. Bivariate analysis • Examples: • Do men earn more income than women? • Does educational level affect attitudes toward participation in labour union? • Is income level correlated with life expectancy? • Is parental educational level correlated with student performance?  We need to conduct hypothesis testing to arrive at conclusive results on issues like this. 5/11/2022 39
  • 40. Hypotheses Testing  The following are the steps in hypothesis testing: 1. state the null hypothesis 2. choose an appropriate statistical test, 3. specify the level of statistical significance. (usually this is o.1, 0.05 or 0.01) --- known as the α–level. 4. Decide to accept or to reject the null hypothesis based on the findings.  We use different tests based on the nature of the dependent and independent variables and nature of distribution of the data.  During hypothesis testing, there is a possibility of committing decision errors. The are two types of errors. 5/11/2022 40
  • 41. Hypothesis…  "Type I error"  A type one error is a false positive (true) result.  If you use a parametric test on nonparametric data then this could trick the test into seeing a significant effect when there isn't one.  Or , it is a situation where we reject the null hypothesis that is true.  The probability of committing Type I error is called significance level (P-value).  This error requires more attention and important to avoid 5/11/2022 41
  • 42. Hypothesis…  “Type II error”  It occurs when we accept a null hypothesis that is false.  However, this occurs if you use a nonparametric test on parametric data then this could reduce the chance of seeing a significant effect when there is one.  A type two error is a missed opportunity, i.e. we have failed to detect a significant effect that truly does exist  This is least dangerous.  Summary; Using a parametric test in the wrong context may lead to a type one error, a false positive.  Using a nonparametric test in the wrong context may lead to a type two error, a missed opportunity. 5/11/2022 42
  • 43. Hypothesis…  Reading P-value  It is the basis for deciding whether or not to reject the null hypothesis.  P-values do not simply provide you with a Yes or No answer, they provide a sense of the strength of the evidence against the null hypothesis.  The lower the p-value, the stronger the evidence, usually less than 0.05 or 0.01, the null hypothesis is rejected..  It is the probability that a statistical result as extreme as the one observed would occur if the null hypothesis were true. 5/11/2022 43
  • 45. Hypothesis… Parametric tests  T-test (one sample, independent sample, paired)  One-way ANOVA  Repeated ANOVA (for paired data)  Pearson correlation There are many techniques of non-parametric tests  Chi-square for independence  Mann-Whitney Test  Wilcoxon Signed Rank Test  Kruskal-Wallis Test  Friedman Test  Spearman Rank Order Correlation 5/11/2022 45
  • 46. Hypothesis… Nominal Ordinal Interval/Ratio Dichotomous Nominal Contingency table Chi-square Cramer’s V Contingency table Chi-square Cramer’s V Z-test; T-test or F-test (If DV is interval/ratio) Contingency table Chi-square Cramer’s V Ordinal Contingency table Chi-square Cramer’s V Spearman’s rho (ƿ) Spearman’s rho (ƿ) Spearman’s rho (ƿ) Interval/ Ratio Z-test; T-test; or F-test (If DV) Spearman’s rho (ƿ) Pearson’s r Spearman’s rho (ƿ) Dichoto mous Contingency table Chi-square Cramer’s V Spearman’s rho (ƿ) Spearman’s rho (ƿ) Phi (ɸ)
• 47. Hypothesis…
Requirement | Example of situation | Test to be used
Compare to a target | Is the average age of employees more than 40 years? | One-sample t-test
Compare two groups | Do men earn more income than women? | Independent-samples t-test
Compare two related measurements (before and after one controlled intervention) | Test scores before and after training | Paired t-test
Compare more than two groups | Compare income across four categories of educational level | One-way ANOVA (F-test)
Association between two categorical variables | Is there an association between gender and job grade? | Contingency table, Chi-square
Association between two quantitative variables | Is there an association between advertising and sales? | Pearson's r
• 48. Hypothesis… Contingency table analysis (cross-tabulation): ▪ We look for differences among the categories of the independent variable (hence nominal or ordinal level of measurement); that is, does the IV influence the DV? ▪ Contingency table (cross-tabulation): a table of percentage distributions with the DV in rows and the IV in columns. ▪ It is a bivariate frequency distribution showing the number of cases that fall into each possible pairing of the values or categories of the two variables.
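A cross-tabulation like the one described above can be sketched in a few lines of Python. This is an illustration only, assuming raw records are held in two parallel lists; the function name `crosstab` and the example data are hypothetical. Following the slide, the DV goes in rows and the IV in columns, and percentages are computed within each IV column so that columns can be compared.

```python
from collections import Counter

def crosstab(dv, iv):
    """Return (counts, column_percents) with DV categories as rows, IV as columns."""
    if len(dv) != len(iv):
        raise ValueError("dv and iv must have the same length")
    counts = Counter(zip(dv, iv))        # (row category, column category) -> frequency
    rows, cols = sorted(set(dv)), sorted(set(iv))
    col_totals = Counter(iv)             # number of cases in each IV category
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    percents = {
        r: {c: 100.0 * counts[(r, c)] / col_totals[c] for c in cols}
        for r in rows
    }
    return table, percents

if __name__ == "__main__":
    # Hypothetical records: job grade (DV) and gender (IV)
    grade = ["junior", "senior", "junior", "junior", "senior", "senior", "junior", "senior"]
    gender = ["F", "F", "F", "M", "M", "M", "F", "M"]
    counts, pct = crosstab(grade, gender)
    for r in counts:
        print(r, counts[r], {c: round(v, 1) for c, v in pct[r].items()})
```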
• 49. Chi-square Test ▪ The chi-square test (chi is pronounced “ky”, as in “sky”) is employed to test relationships between two variables when the data are measured at the nominal or ordinal level. ▪ The chi-square test for independence can be used in situations where you have two categorical variables. ▪ It works with the simplest form of data: ▪ data such as gender or country, or data that have been placed in categories, such as age group.
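• 50. Chi-square Test ▪ Chi-square can be calculated as follows: $\chi^2_c = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$ ▪ If the calculated chi-square is greater than the critical chi-square value obtained from the table (for the chosen significance level and (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of rows and columns), then we conclude there is a relationship (that is, we reject H0). ▪ Remember that, as in all hypothesis testing, the null hypothesis of the chi-square test is that there is no relationship between the DV and the IV.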
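The chi-square formula above can be computed directly from a table of observed counts. In this minimal sketch, each expected count is row total × column total / grand total, which is what H0 (no relationship) implies. The function name and the 2 × 2 counts are hypothetical. The critical value 3.841 is the standard tabulated value for α = 0.05 and df = 1; other table sizes need the matching df.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            chi2 += (table[i][j] - expected) ** 2 / expected
    df = (n_rows - 1) * (n_cols - 1)
    return chi2, df

CRITICAL_5PCT_DF1 = 3.841  # tabulated chi-square value for alpha = 0.05, df = 1

if __name__ == "__main__":
    observed = [[10, 20],
                [20, 10]]  # hypothetical 2 x 2 counts
    stat, df = chi_square_independence(observed)
    print(f"chi-square = {stat:.3f}, df = {df}")
    if df == 1 and stat > CRITICAL_5PCT_DF1:
        print("Reject H0: the two variables appear to be related")
```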
• 51. Contingency Table and Chi-square in SPSS ▪ Analyze → Custom Tables → Custom Tables → OK → Row and Column → Test Statistics → Tests of independence (Chi-square) → OK ▪ Or ▪ Analyze → Descriptive Statistics → Crosstabs → move the DV into Rows and the IV into Columns → Statistics → Chi-square → OK
• 52. Comparing two groups: T-tests ▪ A t-test is a statistical hypothesis test in which the test statistic follows Student's t-distribution if the null hypothesis is true. The t-statistic was introduced by W. S. Gosset under the pen name “Student”. ▪ It is one of the most frequently used procedures for determining whether or not the means of two independent groups could conceivably have come from the same population. ▪ If you compute means for two samples, they will almost always differ to some degree. The job of the t-test is to determine whether they differ by chance or whether the difference is real and reliable. ▪ For a single sample with mean $\bar{x}$, standard deviation $s$ and size $n$, tested against a hypothesized mean $\mu$, it is given by: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
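The t statistics behind the three t-tests on slide 47 can be sketched with Python's standard library. The one-sample function follows the formula above. The two-sample version shown here is the pooled (equal-variance) form, which is an assumption: Welch's unequal-variance version is not covered. The paired test is treated as a one-sample test on the differences. All function names and data are hypothetical. The resulting t is compared with a tabulated critical value at the stated degrees of freedom.

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(sample, mu0):
    """One-sample t = (x_bar - mu0) / (s / sqrt(n)); df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

def pooled_two_sample_t(a, b):
    """Independent-samples t assuming equal population variances; df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_t(before, after):
    """Paired t: a one-sample t-test of the differences against 0; df = n - 1."""
    return one_sample_t([y - x for x, y in zip(before, after)], 0)

if __name__ == "__main__":
    ages = [38, 45, 41, 52, 47, 39, 44]        # hypothetical employee ages
    print(f"one-sample t vs 40: {one_sample_t(ages, 40):.3f}")
    men = [5200, 4800, 6100, 5500]             # hypothetical incomes
    women = [4900, 4700, 5300, 5000]
    print(f"independent-samples t: {pooled_two_sample_t(men, women):.3f}")
    print(f"paired t: {paired_t([60, 55, 70], [66, 58, 75]):.3f}")  # hypothetical scores
```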
• 53. T-test in SPSS ▪ Parametric: Analyze → Compare Means → One-Sample T Test, Independent-Samples T Test or Paired-Samples T Test • Non-parametric: Analyze → Nonparametric Tests → Related Samples, Independent Samples or One Sample ▪ SPSS automatically compares the observed data with the hypothesized values
• 54. Comparing more than two groups: ANOVA ▪ ANOVA (similar to the difference-of-means test) is used to examine variation among groups (and within the members of each group) with respect to some behavior, and to see whether the differences are statistically significant. ▪ Groups may be, for example: male/female; economically developed/economically developing; smokers/non-smokers; drylands/wetlands; religious/non-religious; high/medium/low; etc. ▪ In ANOVA, the DV has an interval/ratio (scale) measure, while the IV has a nominal or ordinal measure.
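• 55. ANOVA test ▪ We use the F-test in ANOVA, given by $F_{calc} = \frac{MS_{between}}{MS_{within}}$, the ratio of the between-group mean square to the within-group mean square (with k − 1 and n − k degrees of freedom for k groups and n observations). ▪ If $F_{calc} > F_{table}$, then reject H0.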
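The F ratio can be built up step by step from the sums of squares. This is a minimal sketch, assuming the groups are given as plain Python lists; the function name and the income figures are hypothetical. The returned degrees of freedom are the ones needed to look up F_table.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return F = MS_between / MS_within and (df_between, df_within) for k groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variation: group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

if __name__ == "__main__":
    # Hypothetical monthly incomes for three educational levels
    primary = [1200, 1350, 1100, 1250]
    secondary = [1500, 1650, 1400, 1550]
    tertiary = [2100, 1900, 2300, 2000]
    f, dfs = one_way_anova_f(primary, secondary, tertiary)
    print(f"F = {f:.2f} with df = {dfs}")
```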
• 56. ANOVA in SPSS ▪ Analyze → Compare Means → One-Way ANOVA... (parametric test) ▪ Analyze → Nonparametric Tests, e.g. the Kruskal-Wallis one-way non-parametric ANOVA ▪ Post Hoc... → Post Hoc Tests → choose Tukey
• 57. Scatterplots/diagrams ▪ Scatter plot/diagram: ▪ the values of the two variables are plotted on each axis ▪ strong relationships can be identified by scatter diagrams ▪ Four patterns can be identified: ▪ Positive linear ▪ Negative linear ▪ Non-linear ▪ No relationship at all
• 58. Scatter plot of a positive association [Figure: income and livestock ownership; axes: Income, Livestock]
• 59. Scatter plot of a negative association [Figure: income and illiteracy rates (%); axes: Income, Rate of illiteracy (%)]
• 60. Scatter plot of no association [Figure: income and household size; axes: Income, Household size]
• 61. Scatter and line graphs [Figure panels: Positive linear relationship; Negative linear relationship; Relationship not linear; No relationship]
• 62. Scatter plot in SPSS ▪ Graphs → Legacy Dialogs → Scatter/Dot
• 63. Covariance and Correlations ▪ The interest is in the association/relationship between two variables, i.e. whether they vary together. ▪ Examples: ▪ Does the income of individuals increase as age increases? ▪ Is the amount of sales associated with advertising expenditure? ▪ Is crime related to socio-economic background? ▪ Is student academic achievement associated with parents' educational level?
• 64. Covariance ▪ Covariance: ▪ The covariance between X and Y is a measure of how much the two variables change together. ▪ Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related. The formula for calculating the covariance of sample data is: $Cov(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
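The sample covariance formula translates directly into code. This minimal sketch uses the n − 1 denominator; the function name and the advertising/sales figures are hypothetical. On Python 3.10 or later, the standard library's `statistics.covariance` also returns this n − 1 sample covariance.

```python
def sample_covariance(x, y):
    """Cov(x, y) = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Products are positive when both values sit on the same side of their means
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

if __name__ == "__main__":
    advertising = [10, 12, 15, 18, 20]   # hypothetical spending
    sales = [110, 118, 130, 142, 150]    # hypothetical sales
    print(sample_covariance(advertising, sales))  # positive: they move together
```

Because covariance is expressed in the product of the two variables' units, its size is hard to compare across studies. Correlation rescales it to lie between −1 and 1.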
• 65. Correlation Analysis ▪ Correlation: ▪ Is concerned with the existence, direction and strength of the relationship/association between variables. ▪ Correlation coefficients can be calculated to see the direction and strength of the relationship. ▪ The choice of coefficient depends on the nature of the variables (parametric vs non-parametric, or numeric vs non-numeric). ▪ Pearson's correlation coefficient: $r(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
• 66. Correlation... ▪ The most commonly used is Pearson's correlation coefficient, also called Pearson's r or simply the correlation coefficient ▪ It captures linear relationships between variables; non-linear relationships are not captured ▪ It lies between −1 and 1 ▪ r = 0: no linear relationship ▪ r = 1: perfect positive relationship ▪ r = −1: perfect negative relationship ▪ Spearman's rho/rank correlation coefficient (ρ): mainly for ordinal variables (non-parametric) ▪ Phi (Φ): correlation between two dichotomous variables
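The contrast between Pearson's r and Spearman's rho can be illustrated with a short sketch. Spearman's rho is computed here as Pearson's r applied to the ranks, with tied values given the average of the ranks they occupy. The function names and data are hypothetical. For a perfectly monotonic but curved relationship (y = x³), rho is 1 while r is below 1, because r only measures linear association.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks 1..n; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        avg = (i + j) / 2 + 1           # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [1, 8, 27, 64]                  # monotonic but not linear
    print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```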
• 67. Correlations and Covariance in SPSS ▪ Correlation: Analyze → Correlate → Bivariate → Correlation Coefficients (choose depending on parametric/non-parametric) ▪ Covariance: Analyze → Correlate → Options → Cross-product deviations and covariances
• 68. Regression Analysis ▪ Regression analysis is a set of statistical techniques that use past observations to find (or estimate) the equation that best summarizes the relationships among key economic variables. ▪ The method requires that analysts: (1) collect data on the variables in question, (2) specify the form of the equation relating the variables, (3) estimate the equation coefficients, and (4) evaluate the accuracy of the equation. ▪ Regression analysis is used to: ▪ Predict the value of a dependent variable based on the value of at least one independent variable ▪ Explain the impact of changes in an independent variable on the dependent variable
• 69. Regression… ▪ Regression analysis is used primarily to model causal relationships and to provide predictions: ▪ Predict the values of a dependent (response) variable based on the values of at least one independent (explanatory) variable ▪ Explain the effect of the independent variables on the dependent variable ▪ The relationship between X and Y can be shown on a scatter diagram
• 70. Simple Linear Regression Model ▪ Only one independent variable, x ▪ The relationship between x and y is described by a linear function ▪ Changes in y are assumed to be caused by changes in x ▪ Regression analysis serves three major purposes: 1. Description 2. Control 3. Prediction
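• 71. Population Linear Regression ▪ The population regression model: $y = \beta_0 + \beta_1 x + \varepsilon$ ▪ where y is the dependent variable, x is the independent variable, $\beta_0$ is the population y-intercept, $\beta_1$ is the population slope coefficient, and $\varepsilon$ is the random error term, or residual ▪ $\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random error component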
• 72. Regression… ▪ The explanatory and response variables are numeric ▪ The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line) ▪ Model: $Y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$ • $\beta_1 > 0$: positive association • $\beta_1 < 0$: negative association • $\beta_1 = 0$: no association
• 73. Critical Assumptions ▪ The error term is normally distributed (normality). ▪ The error term has zero expected value, or mean. ▪ The error term has constant variance in each time period and for all values of X (homoscedasticity). ▪ The error term's value in one time period is unrelated to its value in any other period (no autocorrelation). ▪ The underlying relationship between the x variable and the y variable is linear (linearity).
• 74. Ordinary Least Squares (OLS) Estimation ▪ $\beta_0$: mean response when x = 0 (y-intercept) ▪ $\beta_1$: change in mean response when x increases by 1 unit (slope) ▪ $\beta_0$ and $\beta_1$ are unknown parameters (like μ) ▪ $\beta_0 + \beta_1 x$: mean response when the explanatory variable takes on the value x
• 75. Estimated Regression Model ▪ The sample regression line provides an estimate of the population regression line: $\hat{y}_i = b_0 + b_1 x_i$ ▪ where $\hat{y}_i$ is the estimated (or predicted) y value, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x_i$ is the independent variable ▪ The individual random error terms $e_i$ are random variables with a mean of zero
• 76. Interpretation of the Slope and the Intercept ▪ $b_0$ is the estimated average value of y when the value of x is zero ▪ $b_1$ is the estimated change in the average value of y as a result of a one-unit change in x
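The least-squares estimates for the simple model have a closed form: $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below implements these two lines. The function names and data are hypothetical. It also shows that, when the model includes an intercept, the fitted residuals sum to zero.

```python
def simple_ols(x, y):
    """Least-squares estimates (b0, b1) for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in mean y per one-unit change in x
    b0 = y_bar - b1 * x_bar   # intercept: estimated mean y when x = 0
    return b0, b1

def residuals(x, y, b0, b1):
    """e_i = y_i - y_hat_i; with an intercept in the model these sum to zero."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

if __name__ == "__main__":
    income = [200, 400, 600, 800, 1000]   # hypothetical x values
    livestock = [8, 15, 21, 30, 36]       # hypothetical y values
    b0, b1 = simple_ols(income, livestock)
    print(f"y_hat = {b0:.3f} + {b1:.4f} x")
    print("sum of residuals:", round(sum(residuals(income, livestock, b0, b1)), 10))
```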
• 77. Multiple Linear Regression ▪ In simple linear regression we studied the relationship between one explanatory variable and one response variable. ▪ Now, we look at situations where several explanatory variables work together to explain the response variable.
• 78. Formal Statement of the Model ▪ General regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$ • $\beta_0, \beta_1, \ldots, \beta_k$ are parameters • $X_1, X_2, \ldots, X_k$ are known constants • $\varepsilon$, the error terms, are independent $N(0, \sigma^2)$
• 79. Estimating the parameters of the model ▪ The values of the regression parameters $\beta_i$ are not known. We estimate them from data. ▪ As in the simple linear regression case, we use the least-squares method to fit a linear function to the data: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$ ▪ The least-squares method chooses the b's that make the sum of squares of the residuals as small as possible.
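Choosing the b's that minimize the sum of squared residuals amounts to solving the normal equations $(X'X)b = X'y$. The sketch below does this in pure Python with Gaussian elimination, assuming the explanatory variables are supplied row by row and an intercept column is added automatically. The function name and data are hypothetical. Statistical packages typically use more numerically stable decompositions (e.g. QR) rather than forming X'X directly, so treat this as an illustration of the method only.

```python
def fit_least_squares(X, y):
    """Solve (X'X) b = X'y for b = [b0, b1, ..., bk]; each row of X is [x1, ..., xk]."""
    rows = [[1.0] + list(r) for r in X]      # prepend the intercept column
    p = len(rows[0])
    # Normal equations A b = c with A = X'X and c = X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for i in range(col + 1, p):
            factor = A[i][col] / A[col][col]
            for k in range(col, p):
                A[i][k] -= factor * A[col][k]
            c[i] -= factor * c[col]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return b

if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]   # hypothetical x1, x2 values
    y = [1.1, 2.9, 4.2, 5.8, 8.1]                  # hypothetical responses
    print([round(v, 3) for v in fit_least_squares(X, y)])
```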
• 80. Testing for Overall Significance ▪ Shows whether Y depends linearly on all of the X variables together as a group ▪ Uses the F test statistic ▪ Hypotheses: ▪ H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship) ▪ H1: at least one $\beta_i \neq 0$ (at least one independent variable affects Y) ▪ The null hypothesis is a very strong statement ▪ The null hypothesis is almost always rejected
• 81. Model Fitness Tests ▪ Analysis of variance and the F statistic: $F = \frac{\text{Explained variation}/(k - 1)}{\text{Unexplained variation}/(n - k)} = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} = \frac{MSR}{MSE}$
• 82. Test for Overall Significance
ANOVA | df | SS | MS | F | Significance F
Regression | 2 | 228014.6 | 114007.3 | 168.4712 | 1.65411E-09
Residual | 12 | 8120.603 | 676.7169 | |
Total | 14 | 236135.2 | | |
Regression df = k − 1 = 2; total df = n − 1; Significance F is the p-value; k = 3 is the number of parameters.
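The entries of the ANOVA table on slide 82 can be reproduced from its two sums of squares. The sketch below computes MSR, MSE and F, and shows that the R²-based form of F from slide 81 gives the same value. It assumes n = 15 (total df = 14) and k = 3 parameters, as stated on the slide; the function name is hypothetical. R² here is taken as SSR / (SSR + SSE).

```python
def overall_f_test(ss_reg, ss_res, n, k):
    """Overall F test; k = number of parameters including the intercept."""
    df_reg, df_res = k - 1, n - k
    msr = ss_reg / df_reg               # mean square regression
    mse = ss_res / df_res               # mean square error (residual)
    r2 = ss_reg / (ss_reg + ss_res)     # share of total variation explained
    f_from_ms = msr / mse
    f_from_r2 = (r2 / df_reg) / ((1 - r2) / df_res)  # algebraically identical
    return f_from_ms, f_from_r2, r2

if __name__ == "__main__":
    # Sums of squares taken from the slide's ANOVA table
    f_ms, f_r2, r2 = overall_f_test(228014.6, 8120.603, n=15, k=3)
    print(f"F (MSR/MSE) = {f_ms:.4f}, F (from R^2) = {f_r2:.4f}, R^2 = {r2:.4f}")
```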
• 83. The Coefficient of Determination – R² ▪ The coefficient of determination is the proportion of the total variance that is explained by the regression. ▪ It is the ratio of the explained sum of squares to the total sum of squares.
• 84. The Coefficient of Determination – R² ▪ $R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$ ▪ Since TSS, RSS and ESS are all non-negative (being sums of squared deviations), and since ESS ≤ TSS, R² must lie in the interval $0 \le R^2 \le 1$. ▪ The higher R² is, the more closely the estimated regression equation fits the sample data. ▪ A value of R² close to one shows a “good” overall fit, whereas a value near zero shows a failure of the estimated regression equation to explain the variation in Y.
  • 85. Multiple regression model building  Often we have many explanatory variables, and our goal is to use these to explain the variation in the response variable.  A model using just a few of the variables often predicts about as well as the model using all the explanatory variables.
• 86. Linear Regression in SPSS ▪ Analyze → Regression → Linear → select the required options
• 87. Limited Dependent Variables ▪ Dichotomous variables ▪ Ordered choice ▪ Intensity measurement
• 88. Logistic regression ▪ There are many important research topics for which the dependent variable is “limited”. ▪ For example, voting, morbidity or mortality, and participation data are neither continuous nor normally distributed. ▪ Binary logistic regression is a type of regression analysis in which the dependent variable is a dummy variable, coded 0 (did not vote) or 1 (did vote). ▪ Binary models ▪ Discrete choice models, etc.
• 89. The Linear Probability Model ▪ The linear probability model can be written as: ▪ $Y = \alpha + \beta X + e$, where Y ∈ {0, 1}, or ▪ $P(y = 1 \mid x) = \beta_0 + x\beta$ ▪ But: ▪ the error terms are heteroskedastic ▪ e is not normally distributed because Y takes on only two values ▪ the predicted probabilities can be greater than 1 or less than 0 ▪ An alternative is to model the probability as a function $G(\beta_0 + x\beta)$, where $0 < G(z) < 1$
• 90. The Logit Model ▪ A common choice for G(z) is the logistic function, which is the cdf of a standard logistic random variable: ▪ $G(z) = \frac{\exp(z)}{1 + \exp(z)} = \Lambda(z)$ ▪ This case is referred to as a logit model, or a logistic regression ▪ The estimated probability is given by: $\ln\left[\frac{p}{1 - p}\right] = \alpha + \beta X$, or equivalently $p = \frac{1}{1 + \exp(-\alpha - \beta X)}$
• 91. The Logit Model ▪ Where: ▪ p is the probability that the event Y occurs, p(Y = 1) ▪ p/(1 − p) is the odds ▪ ln[p/(1 − p)] is the log odds, or “logit” ▪ The logistic distribution constrains the estimated probabilities to lie between 0 and 1: ▪ if α + βX = 0, then p = 0.50 ▪ as α + βX gets very large, p approaches 1 ▪ as α + βX gets very small (large and negative), p approaches 0
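The link between the linear index α + βX, the probability p, the odds and the log odds can be checked numerically. This is a minimal sketch with hypothetical coefficients α = −2 and β = 0.5 (not estimates from any data). It shows p = 0.5 when α + βX = 0, and that a one-unit increase in X multiplies the odds by exp(β). The logistic function is written in two branches so that very large or very small z does not overflow.

```python
import math

def logistic(z: float) -> float:
    """Logistic cdf G(z) = exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit_probability(alpha: float, beta: float, x: float) -> float:
    """p = P(Y = 1 | x) = 1 / (1 + exp(-alpha - beta * x))."""
    return logistic(alpha + beta * x)

def odds(p: float) -> float:
    """Odds p / (1 - p); its natural log is the logit."""
    return p / (1.0 - p)

if __name__ == "__main__":
    alpha, beta = -2.0, 0.5              # hypothetical coefficients
    for x in (0, 4, 8, 20):
        p = logit_probability(alpha, beta, x)
        print(f"x = {x:>2}: p = {p:.3f}, log odds = {math.log(odds(p)):.3f}")
    # A one-unit increase in x multiplies the odds by exp(beta)
    ratio = odds(logit_probability(alpha, beta, 5)) / odds(logit_probability(alpha, beta, 4))
    print(f"odds ratio = {ratio:.4f}, exp(beta) = {math.exp(beta):.4f}")
```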
• 93. The Probit Model ▪ Another choice for G(z) is the standard normal cumulative distribution function (cdf): ▪ $G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv$, where $\phi(z)$ is the standard normal density, $\phi(z) = (2\pi)^{-1/2}\exp(-z^2/2)$ ▪ This case is referred to as a probit model ▪ Since discrete choice models are nonlinear, they cannot be estimated by the OLS method; ▪ we use maximum likelihood estimation
• 94. Probits and Logits ▪ Both the probit and the logit are nonlinear and require maximum likelihood estimation ▪ There is no strong reason to prefer one over the other ▪ Both functions have similar shapes: they are increasing in z, most quickly around 0 ▪ Traditionally the logit was used more, mainly because the logistic function was easier to compute. ▪ Today, the probit is easy to compute with standard packages, so it is also popular
• 95. Interpreting Coefficients ▪ In general we care about the effect of x on P(y = 1|x), that is, we care about ∂p/∂x ▪ For the linear case, this is simply the coefficient on x ▪ In the case of the logit, since $\frac{p}{1 - p} = \exp(\alpha + \beta X) = \exp(\alpha)\exp(\beta X)$: ▪ the slope coefficient (β) is interpreted as the rate of change in the “log odds” as X changes ▪ exp(β) is the factor by which the odds are multiplied when X increases by one unit (the odds ratio)
• 96. The Likelihood Ratio Test ▪ Unlike in the LPM, where we can compute F statistics to test exclusion restrictions, we need a new type of test ▪ Maximum likelihood estimation (MLE) always produces a log-likelihood, L ▪ Just as in an F test, you estimate the restricted and unrestricted models, then form ▪ $LR = 2(L_{ur} - L_r) \sim \chi^2_q$, where q is the number of restrictions
• 97. Goodness of Fit ▪ Unlike in the LPM, where we can compute an R² to judge goodness of fit, we need new measures of goodness of fit ▪ One possibility is a pseudo R² based on the log-likelihoods, defined as $1 - L_{ur}/L_r$ ▪ We can also look at the percentage correctly predicted
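Given the log-likelihoods of an unrestricted and a restricted model, both the likelihood ratio test (slide 96) and the pseudo R² (slide 97, often called McFadden's pseudo R²) are simple arithmetic. This sketch assumes the log-likelihoods have already been obtained from some estimation routine. The values −120.5 and −125.0 and all function names are hypothetical. The chi-square upper-tail probability is computed with a textbook series for the regularized incomplete gamma function. That series is adequate for moderate statistics; for very large LR values it simply returns a p-value near 0.

```python
import math

def lr_statistic(ll_ur, ll_r):
    """LR = 2 (L_ur - L_r), with L the log-likelihoods of the two models."""
    return 2.0 * (ll_ur - ll_r)

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:          # add terms until they become negligible
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def pseudo_r2(ll_ur, ll_r):
    """Pseudo R^2 = 1 - L_ur / L_r, using log-likelihoods."""
    return 1.0 - ll_ur / ll_r

if __name__ == "__main__":
    ll_ur, ll_r, q = -120.5, -125.0, 2   # hypothetical log-likelihoods, 2 restrictions
    lr = lr_statistic(ll_ur, ll_r)
    print(f"LR = {lr:.2f}, p-value = {chi2_sf(lr, q):.4f}")
    print(f"pseudo R^2 = {pseudo_r2(ll_ur, ll_r):.3f}")
```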
• 98. Extensions ▪ Unordered multiple (j > 2) choices, such as travel mode or treatment choice, should be analyzed with the multinomial logit model ▪ Ordered multiple (j > 2) choices, such as opinion/attitude surveys or rankings, should be analyzed with the ordered logit model ▪ The Tobit model is used when the dependent variable is censored: ▪ $y^* = x\beta + u$, $u \mid x \sim \text{Normal}(0, \sigma^2)$ ▪ we only observe $y = \max(0, y^*)$
• 99. Limited dependent variable models in SPSS ▪ Analyze → Regression → choose the model of interest from the list, other than ‘Linear’
• 100. Analyzing qualitative Data • Research often produces a considerable amount of interview, focus group discussion and/or text-based data and images that require analysis. • Creswell (2003) suggests that it is useful to look at the codes that have emerged according to: ▪ Codes readers would expect to find; ▪ Codes that are surprising; and ▪ Codes that address a larger theoretical perspective in the research. ▪ Then follow these steps: ▪ Identifying themes ▪ Coding data (reducing the data to a manageable size) ▪ Developing a description from the data ▪ Defining themes from the data ▪ Connecting and interrelating themes
• 101. Analyzing qualitative… ▪ Further activities: ▪ Noting reflections in the margins ▪ Sorting and sifting through the materials to identify similar phrases, relationships, patterns, themes, commonalities, & differences ▪ Isolating patterns, processes, commonalities, & differences and incorporating methods to explore them further into the next wave of data collection ▪ Gradually developing a small set of generalizations about what consistently appears in the data ▪ Confronting those generalizations with a formalized body of knowledge in the form of constructs or theories