GLM_2020_21.pptx
1. GENERALIZED LINEAR MODELS
Ph.D. Programme in Psychology, Linguistics and Cognitive Neurosciences
Ph.D. School - University of Milano-Bicocca Prof. Franca Crippa
https://www.vox.com/future-perfect/21504366/science-replication-crisis-peer-review-statistics
2. PLAN OF THE LESSON
Part I
Icebreakers: review of the General Linear Model
Part II
The Generalized Linear Model: extension to data that are not normally distributed:
fractions (logistic regression),
counts (Poisson regression, log-linear models),
ordinal data (threshold models).
Overview of specific topics (overdispersion, (quasi-)maximum likelihood)
Part III
Overview of software for GLIMs: SPSS and R, plus jamovi and JASP (both user-friendly and built on R). R is still the Linus' blanket for its breadth and its up-to-date modelling tools, even if it is a bit rough and not fluffy.
3. PART I
Icebreaker on the General Linear Model
4. GENERAL LINEAR MODELS AS MODELS
Our idea is that data are generated as specified in our model
plus a random error
DATA = MODEL + ERROR
Very general form of the model:
$Y = f(X_1, X_2, X_3) + \varepsilon$
Linear models are models of the form:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
Beware: notation may vary from one author to another, from one professor to another, from one journal to another.
Then:
• focus on the meaning of the symbol;
• pay attention to the requirements of the journal
5. HOW DO WE MODEL DATA?
Objective
Model structure (e.g. variables, formula, equation)
Model assumptions
Parameter estimates and interpretation
Model fit (e.g. goodness-of-fit tests and statistics)
Model selection
6. PSYCHOLOGISTS’ STATISTICAL WORKHORSE:
THE GENERAL LINEAR MODEL
Response: quantitative continuous
One or more between-subjects predictors:
• Quantitative predictors: linear regression (simple or multiple)
• Categorical (qualitative) predictors: ANOVA
• Quantitative and categorical predictors (both): ANCOVA
At least one within-subjects predictor
7. PSYCHOLOGISTS’ STATISTICAL WORKHORSE:
THE GENERAL LINEAR MODEL
Response: quantitative continuous
One or more dichotomous or continuous between-subjects predictors
One predictor
• Independent-samples t-test
• Simple regression
Two or more predictors
• Multiple regression
• Statistical control (covariates)
• Mediation
Two or more predictors plus interaction
• Interactions
• Moderated mediation
• Other types of linear model (polynomial)
8. THE GENERAL LINEAR MODEL
At least one within-subjects predictor
One or more categorical within-subjects predictors:
• Paired-samples t-test
• Within-subjects ANOVA
At least one continuous within-subjects predictor:
• Linear Mixed-Effects Models (LMEM): an additional random term
9. GENERAL LINEAR MODEL: AN OUTLOOK ON THE ASSUMPTIONS
Predictors:
1. on any scale: categorical or quantitative
2. measured without error (deterministic); the random component is expressed by the error
Response variable: (continuous) quantitative only
Errors are iid and normally distributed. For all subjects i = 1, 2, ..., n, the errors $\varepsilon_i$ are:
i) identically and normally distributed with zero mean and equal variance (homoscedasticity)
ii) uncorrelated (independent)
Objective:
the response $y_i$, i = 1, ..., n, is modelled by a linear (additive) function of the predictors/explanatory variables $x_j$, j = 1, ..., p, plus an error term
The model is linear in the parameters
10. ESTIMATION WHEN ASSUMPTIONS ARE MET
The general linear model makes the assumptions below. When these assumptions are met, OLS regression coefficients are MVUE (Minimum Variance Unbiased Estimators) and BLUE (Best Linear Unbiased Estimators).
1. Exact X: The IVs are assumed to be known exactly (i.e., without measurement error), deterministic.
2. Independence: Residuals are independently distributed (the probability of obtaining a specific observation does not depend on other observations)
3. Normality: All residual distributions are normally distributed
4. Constant variance: All residual distributions have a constant variance
5. Linearity: All residual distributions (i.e., for each Y') are assumed to have means equal to zero
11. PARAMETER INTERPRETATION
Regression:
• $b_0$ is the estimate of the intercept $\beta_0$
• $b_i$ is the estimate of the slope $\beta_i$, i.e. the increase in the response due to a one-unit increase in the i-th predictor, holding the other predictors constant
Anova
•General mean
•Difference between the group mean and the general mean
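As a hedged illustration of the two readings above, the sketch below uses R's built-in mtcars data, which is an assumption made for illustration and not part of the lecture material. With sum-to-zero contrasts the ANOVA intercept is the unweighted mean of the group means, which coincides with the general mean of the observations only in balanced designs.

```r
# Illustrative sketch only: mtcars ships with R and is not the lecture's data.
fit_reg <- lm(mpg ~ wt, data = mtcars)            # simple regression
b <- coef(fit_reg)                                # b[1] estimates beta_0, b[2] estimates beta_1

# The slope is the change in the predicted response for a one-unit increase in wt
pred_diff <- predict(fit_reg, data.frame(wt = 4)) -
  predict(fit_reg, data.frame(wt = 3))

# One-way ANOVA written as a linear model with sum-to-zero contrasts
d <- transform(mtcars, cyl = factor(cyl))
fit_aov <- lm(mpg ~ cyl, data = d, contrasts = list(cyl = "contr.sum"))
group_means   <- tapply(d$mpg, d$cyl, mean)
general_mean  <- mean(group_means)                # unweighted mean of the group means
group_effects <- group_means - general_mean       # group mean minus general mean
```

Here `coef(fit_aov)[1]` reproduces `general_mean`, and the remaining coefficients reproduce the first two entries of `group_effects`.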
Model selection: which explanatory variables to include?
Principle of parsimony (Occam’s razor): all relevant predictors
are included, no irrelevant one is.
12. NOT ALL MISSING DATA ARE THE SAME
Missing by design
Values are missing by definition of the population of interest
Missing completely at random (MCAR)
Missing values are randomly distributed
Missing at random (MAR)
After accounting for one or more other variables, missing values are
randomly distributed
Non-ignorable (NI)
Missing values are functions of the variables themselves
13. BETTER METHODS OF HANDLING MISSING DATA
Full information maximum likelihood (FIML) methods
Can handle data that are MCAR and MAR (NI data require an explicit model of the missingness mechanism)
Implemented as part of particular statistical models
Missing data handled during analysis
Multiple imputation
Can also handle data that are MCAR and MAR
Simulation-based approach
Missing data are handled separately from analysis
14. RESTRICTIONS OF GENERAL LINEAR MODELS
Although a very useful framework, there are some situations
where general linear models are not appropriate
1. The range of Y is restricted
categorical variables (binary, ordered or unordered categories), counts
2. Other violations of assumptions
Heteroscedasticity
Non-normality
Non-linearity (in the IVs and/or in the parameters)
Variance depending on the mean
15. Anscombe’s quartet
MESSY DATA
Anscombe, Francis J. (1973) Graphs in statistical analysis. American
Statistician, 27, 17–21
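Anscombe's four data sets are available in R as the built-in data frame `anscombe` (columns x1 to x4 and y1 to y4). The sketch below, offered as an illustration rather than as part of the original slides, fits the same simple regression to each set; the estimates come out almost identical even though the scatterplots look completely different.

```r
# anscombe is a built-in R data set with columns x1..x4 and y1..y4
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})
coefs <- t(sapply(fits, coef))                          # one row per data set: intercept, slope
r2 <- sapply(fits, function(f) summary(f)$r.squared)    # R-squared of each fit
```

Plotting each pair, for example with `plot(anscombe$x1, anscombe$y1)`, shows why the numerical summaries alone are not enough.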
16. A GLANCE AT DECISION AND POWER (NEXT LESSON)
                                        Reality: NO EFFECT    Reality: EFFECT EXISTS
Researcher concludes:
FAIL TO REJECT (FTR) NULL; NO EFFECT    CORRECT FTR           TYPE 2 ERROR (β)
Researcher concludes:
REJECT NULL; EFFECT EXISTS              TYPE 1 ERROR (α)      CORRECT REJECT (1 − β)
17. PART II
Generalised Linear Models (GLIMs)
18. EXTENSION TO GENERALIZED LINEAR MODELS (GLIM OR GLM)
GLIMs are a family of models that:
Extend linear regression to a broader family of outcome variables while keeping the basic structure of linear regression equations.
Allow us to extend the linear modelling framework to variables that are not normally distributed.
Allow us to look at models that seem different from a unifying perspective.
Two major additions to the linear function framework:
link function: when the response has a nonlinear relationship with the predictors, a transformation of the (expected) response is expressed as a linear regression
error structures beyond the normal, for instance binomial or Poisson.
19. THREE COMPONENTS OF A GLIM
1. Systematic part: relation between the dependent variable Y and
the independent variables in the model.
2. Random part: error distribution of the outcome variable
3. Link function g(⋅): a transformation of the expected response, such that the transformed value is expressed through the well-known linear relation (identity, logit, log, ...)
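The three components can be written compactly as follows. This is a standard textbook formulation rather than a formula taken from the slides; the symbols $\mu_i$ and $\eta_i$ are introduced here for illustration only.

```latex
\begin{aligned}
\text{Random part:} \quad & Y_i \sim \text{a distribution with mean } \mu_i = E(Y_i \mid x_{1i}, \dots, x_{pi}) \\
\text{Systematic part:} \quad & \eta_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} \\
\text{Link function:} \quad & g(\mu_i) = \eta_i
\end{aligned}
```

With the identity link $g(\mu) = \mu$ and normally distributed errors, this reduces to the general linear model.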
How to approach GLIM:
Understanding the common underlying linear structure
Exploring the reasons for different estimation techniques
20. GLIM FIXED MODELS WITH RESPONSES AND PREDICTORS OF ANY TYPE
Predictors measured on any scale.
Response:
• continuous: General linear model
• dichotomous: Logit (accuracy, yes or no)
• categorical: Logistic (ordinal or nominal categorical)
• count: Poisson regression (count variables, frequencies)
21. GLIM AS A SOLUTION FOR SOME VIOLATIONS OF GENERAL LINEAR MODEL ASSUMPTIONS
(Independence: Inaccurate standard errors, degrees of freedom and significance tests. Use linear mixed-effects models; see my colleague's lessons)
Normality: Inefficient (with large N). Use transformations, generalized linear models
Constant variance: Inefficient and inaccurate standard errors. Use transformations, generalized linear models
Linearity: Biased parameter estimates. Use transformations, generalized linear models
22. DISTRIBUTION OF ERRORS IN PROBIT AND LOGIT MODELS
23. Family Default Link Function
binomial (link = "logit")
gaussian (link = "identity")
Gamma (link = "inverse")
inverse.gaussian (link = "1/mu^2")
poisson (link = "log")
quasibinomial (link = "logit")
quasipoisson (link = "log")
glm(formula, family=familytype(link=linkfunction), data=)
24. LINK FUNCTION AND ERROR DISTRIBUTIONS
Model                              Error distribution    Link function
Regression                         Normal                $g = E(Y|X)$
Binary logistic regression         Binomial              $g = \ln\frac{p}{1-p}$
Ordinal logistic regression        Binomial              $g = \ln\frac{p}{1-p}$
Multinomial logistic regression    Multinomial           $g = \ln\frac{p}{1-p}$
Poisson regression                 Poisson               $g = \ln[E(Y|X)]$
Beta regression                    Beta                  $g = \ln[E(Y|X)]$
Gamma regression                   Gamma                 $g = \ln[E(Y|X)]$
Negative binomial regression       Negative binomial     $g = \ln[E(Y|X)]$
25. Family Default Link Function
1. binomial (link = "logit")
FIRST CASE: MODELLING THE PROBABILITY FOR A DICHOTOMOUS VARIABLE
In a binomial variable (the random component is the error, which is binomial) our interest is in the probability of 'success'. We have two outcomes, success and failure (1 and 0). When we know the probability of success p, we derive the probability of failure as 1 − p.
Our response would be the probability, with range 0 to 1, so we cannot use the General Linear Model. How do we solve the problem?
- We transform the response. Instead of the probability, we consider the logit $g = \ln\frac{p}{1-p}$. The symbol g stands for our 'transformed' response. This transformed response is now continuous and unbounded.
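A small numerical sketch of the logit transformation may help; the probabilities below are arbitrary values chosen for illustration. Base R provides `qlogis()` and `plogis()` for the logit and its inverse, so the hand-written versions are only there to make the formula visible.

```r
p <- c(0.1, 0.5, 0.9)                        # probabilities, bounded between 0 and 1
logit <- log(p / (1 - p))                    # the link g; qlogis(p) gives the same values
p_back <- exp(logit) / (1 + exp(logit))      # inverse transformation; same as plogis(logit)
```

A probability of 0.5 maps to a logit of 0, and probabilities symmetric around 0.5 map to logits of equal size and opposite sign.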
26. Family Default Link Function
2. gaussian (link = "identity")
SECOND CASE: MODELLING THE PROBABILITY FOR A CONTINUOUS VARIABLE
In a continuous variable with a normal (Gaussian) random component, the response can take any real value.
Our response is the variable as it stands
- Do we need to transform the response?
- No.
- How do we express the transform g, i.e. the link function?
- As the identity function.
- Are General Linear Models part of Generalized Linear Models?
- Yes, when the link function, also denoted as g, is identity and when the
error terms are normal.
27. ESTIMATION: MAXIMUM LIKELIHOOD (ML)
The likelihood function (LF) expresses the likelihood of observing the
data under the model
The LF is maximized by the best fitting parameter estimates
Any model estimated with ML methods will produce a deviance value
for the model, which can be used to assess fit of the model (for the
special case of linear regression model with normal errors, the
deviance is equal to the residual SS).
The deviance for a model can be used to calculate analogues of the multiple R² for GLIMs
These notions are meant only to convey the logic of the models and of their 'assessment'.
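The special case mentioned above can be checked numerically. The sketch below is an illustration on R's built-in mtcars data (an assumption, not the lecture's data): the same model is fitted with `lm` and with `glm` using the Gaussian family and identity link, and the deviance is compared with the residual sum of squares.

```r
# Illustrative data: mtcars ships with R
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

rss <- sum(residuals(fit_lm)^2)                        # residual sum of squares
dev <- deviance(fit_glm)                               # deviance of the Gaussian GLIM
r2_dev <- 1 - fit_glm$deviance / fit_glm$null.deviance # deviance-based analogue of R-squared
```

For this Gaussian case `dev` equals `rss`, and `r2_dev` equals the ordinary multiple R² reported by `summary(fit_lm)`.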
28. MODEL STRUCTURE: BACK TO THE LINEAR STRUCTURE
The binary logistic model has the GLIM structure:
$\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
where:
p is the probability of 1 (or the proportion of 1s);
$\ln\frac{p}{1-p}$ is the logit, the link function;
$\frac{p}{1-p}$ is the odds, i.e. the probability of presence over the probability of absence of the response.
0 < p < 1, whereas the logit ranges over $(-\infty, +\infty)$
29. GOODNESS OF FIT
Wald test on logit regression coefficients:
· Large-sample test (Wald test), in truth a z-test:
· $H_0: \beta_j = 0$, $H_A: \beta_j \neq 0$
$W_j = \frac{B_j}{SE_{B_j}}$
The model with intercept and predictors is compared to an intercept-only model with the test
$\chi^2 = 2[LL(B) - LL(0)]$, where LL indicates the log likelihood
Analogues of the R² value in linear regression:
Hosmer & Lemeshow: $R^2_{HL} = 1 - \frac{-2LL(B)}{-2LL(0)}$
Cox & Snell: $R^2_{CS} = 1 - \exp\left\{-\frac{2}{n}[LL(B) - LL(0)]\right\}$
Nagelkerke: $R^2_{N} = \frac{R^2_{CS}}{R^2_{MAX}}$, where $R^2_{MAX} = 1 - \exp[2 n^{-1} LL(0)]$
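The likelihood-ratio statistic and the three pseudo-R² measures can be computed directly from the two log likelihoods. The sketch below is a minimal illustration on R's built-in mtcars data (an assumption, not the lecture's data), using the standard definitions $R^2_{HL} = 1 - LL(B)/LL(0)$, $R^2_{CS} = 1 - \exp\{-\frac{2}{n}[LL(B) - LL(0)]\}$ and $R^2_{N} = R^2_{CS}/R^2_{MAX}$ with $R^2_{MAX} = 1 - \exp[\frac{2}{n} LL(0)]$; the variable names are hypothetical.

```r
# Illustrative data: am (0/1) predicted by wt in the built-in mtcars data
fit_b <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
fit_0 <- update(fit_b, . ~ 1)                 # intercept-only model

ll_b <- as.numeric(logLik(fit_b))             # LL(B)
ll_0 <- as.numeric(logLik(fit_0))             # LL(0)
n    <- nobs(fit_b)

chi2   <- 2 * (ll_b - ll_0)                   # likelihood-ratio chi-square
r2_hl  <- 1 - ll_b / ll_0                     # Hosmer & Lemeshow
r2_cs  <- 1 - exp(-(2 / n) * (ll_b - ll_0))   # Cox & Snell
r2_max <- 1 - exp((2 / n) * ll_0)             # upper bound of Cox & Snell
r2_n   <- r2_cs / r2_max                      # Nagelkerke
```

Because `r2_max` is below 1, the Nagelkerke value is always at least as large as the Cox & Snell value.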
30. IF WE PLOT A DICHOTOMOUS RESPONSE
31. BINARY LOGISTIC REGRESSION
The response variable is dichotomous.
Predictor variables may be categorical or
continuous.
If predictors are all continuous and nicely
distributed, may use discriminant
function analysis.
If predictors are all categorical, may use
logit analysis.
32. PSEUDO-R² MEASURES
Hosmer & Lemeshow is not computed in SPSS
Cox & Snell: unfortunately, it cannot reach 1
Nagelkerke has been adjusted so that it can reach 1, so it is the most widely used
33. PARAMETER INTERPRETATION
Holding all other predictors constant:
• $\beta = 0$: P(Presence) is the same at each level of x
• $\beta > (<)\ 0$: P(Presence) increases (decreases) as x increases
Interpretation in terms of probability:
$\hat{P} = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
$\ln(ODDS) = \ln\frac{\hat{P}}{1 - \hat{P}} = \alpha + \beta X$
Response: vote in favour of cats as research subjects
Sample size: 315
Null (empty) model
In favour: 128; against: 187
$\exp(-.379) = .684$
P(in favour) = 128/315 = 40.6%
P(against) = 187/315 = 59.4%
Odds = 40.6/59.4 = .684
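The null-model arithmetic can be reproduced in a few lines of R. The counts are the ones given on the slide; the variable names are hypothetical, and the intercept-only `glm` call on aggregated counts is only one possible way of obtaining the same intercept.

```r
n_favour  <- 128                       # votes in favour (from the slide)
n_against <- 187                       # votes against (from the slide)
n <- n_favour + n_against              # sample size

p_favour <- n_favour / n               # proportion in favour
odds <- p_favour / (1 - p_favour)      # same as n_favour / n_against
b0 <- log(odds)                        # intercept of the null model on the logit scale

# The same intercept from an intercept-only logistic model on the aggregated counts
fit_null <- glm(cbind(n_favour, n_against) ~ 1, family = binomial)
```

Exponentiating the intercept of `fit_null` returns the odds again, which is the relation exp(−.379) = .684 reported above.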
34. ADDING GENDER AS A PREDICTOR (IV)
We add gender as a predictor, male = 1, female = 0
$Odds = e^{a + b \cdot Gender}$
female: $e^{-0.847} = 0.429$
male: $e^{-0.847 + 1.217} = 1.448$
Odds ratio: $\frac{Odds_{male}}{Odds_{female}} = \frac{1.448}{.429} = 3.376$
Clearly these are not probabilities: note that they can be > 1! They are odds, i.e. the ratio of the chance in favour to the chance against, for females only and for males only respectively.
The odds ratio is the ratio between the two odds.
A woman is only .429 times as likely to be in favour of the research as against it.
A man is 1.448 times more likely to be in favour of continuing the research than against it.
The odds of voting to continue the research, i.e. of being in favour rather than against, are 3.376 times higher for men than for women.
35. FROM ODDS TO PROBABILITIES
$\hat{P}_{women} = \frac{Odds}{1 + Odds} = \frac{0.429}{1.429} = 0.30$
$\hat{P}_{men} = \frac{Odds}{1 + Odds} = \frac{1.448}{2.448} = 0.59$
For a woman, the probability of voting in favour of cats in experiments is 30%
For a man, the probability of voting in favour of cats in experiments is 59%, almost double the probability for a woman.
We can NOW draw our conclusions in terms of probability
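The chain from coefficients to odds to probabilities can be retraced numerically. The sketch below assumes the estimates reported for this example, an intercept of −0.847 and a gender slope of 1.217 with male = 1 and female = 0; the variable names are hypothetical.

```r
a <- -0.847                       # intercept (log odds for females, Gender = 0)
b <-  1.217                       # slope for Gender (male = 1)

odds_female <- exp(a)             # odds of voting in favour, females
odds_male   <- exp(a + b)         # odds of voting in favour, males
odds_ratio  <- exp(b)             # same as odds_male / odds_female

p_female <- odds_female / (1 + odds_female)   # odds converted to a probability
p_male   <- odds_male / (1 + odds_male)       # same as plogis(a + b)
```

Up to rounding, these reproduce the odds of .429 and 1.448, the odds ratio of about 3.38, and the probabilities of about 30% and 59%.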
36. POISSON REGRESSION
Count response variable (frequencies) in a fixed period of time, with a Poisson distribution
Poisson distribution: probability of 0, 1, 2, ... events; the mean of the distribution is equal to the variance
In the Poisson regression model, predictor variables may be categorical or continuous.
Rare events
When the mean is > 10, the distribution is similar to the normal
38. MODEL STRUCTURE
The Poisson model has the structure:
$\ln(y) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
where the link function is ln
Goodness of fit
Wald test on regression coefficients
$R^2_{deviance} = 1 - \frac{deviance(model)}{deviance(null)}$: overall fit
$R^2_{deviance} = 1 - \frac{deviance(model)}{deviance(model\ minus\ one\ covariate)}$: gain in prediction
39. INTERPRETATION OF PARAMETERS
$\ln(y) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$
A one-unit increase in $x_1$ results in a $\beta_1$ increase in ln(y)
For direct interpretation of the effect on the count variable, we rewrite the regression as:
$y = e^{\beta_0} e^{\beta_1 x_{1i}} e^{\beta_2 x_{2i}}$
A change in the value of a predictor results in a multiplicative change in the predicted count.
Remember that in linear regression a change in the predictor results in an additive change in the predicted value
40. INTERPRETATION OF PARAMETERS/2
If $\beta = 0$, then $\exp(\beta) = 1$: Y and X are not related.
If $\beta > 0$, then $\exp(\beta) > 1$, and the expected Y at X = 1 is $\exp(\beta)$ times larger than when X = 0
If $\beta < 0$, then $\exp(\beta) < 1$, and the expected count at X = 1 is multiplied by $\exp(\beta)$, i.e. it is smaller than when X = 0
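The multiplicative reading of the coefficients can be verified on a fitted model. The sketch below uses the built-in warpbreaks data (counts of warp breaks by wool type and tension), which is an assumption made for illustration and not the lecture's data.

```r
# Illustrative data: warpbreaks ships with R; wool has levels A, B; tension has L, M, H
fit_pois <- glm(breaks ~ wool + tension, family = poisson(link = "log"),
                data = warpbreaks)
rate_ratios <- exp(coef(fit_pois))     # multiplicative effects on the expected count

# Predicted counts for wool A and wool B at the same (low) tension
new_a <- data.frame(wool = factor("A", levels = levels(warpbreaks$wool)),
                    tension = factor("L", levels = levels(warpbreaks$tension)))
new_b <- data.frame(wool = factor("B", levels = levels(warpbreaks$wool)),
                    tension = factor("L", levels = levels(warpbreaks$tension)))
count_a <- predict(fit_pois, new_a, type = "response")
count_b <- predict(fit_pois, new_b, type = "response")
```

The ratio `count_b / count_a` equals `rate_ratios["woolB"]`: moving from the reference category (X = 0) to wool B (X = 1) multiplies the expected count by exp(β).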
41. ESTIMATION WITH ML
The deviance for a model can be used to calculate analogues of the multiple R² of linear regression
Equidispersion: several GLIMs have error structures based on distributions in which the variance is a function of the mean (in the Poisson distribution, the variance equals the mean).
Actual data are usually overdispersed.
As with the comments on the estimation of logistic regression, these remarks only sketch some general ideas. The subject is vast, and at this point we just need to see the logic, the analogies and the differences between extensions of models.
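One common way of checking for overdispersion is to divide the Pearson chi-square by the residual degrees of freedom; values well above 1 suggest that the variance exceeds the mean. The sketch below uses the built-in warpbreaks data as an illustrative assumption and also shows the quasi-Poisson family as one possible remedy.

```r
# Illustrative data: warpbreaks ships with R
fit_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Deviance-based analogue of R-squared
r2_dev <- 1 - deviance(fit_pois) / fit_pois$null.deviance

# Dispersion estimate: Pearson chi-square over residual degrees of freedom
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Quasi-Poisson: same coefficients, standard errors scaled by the dispersion
fit_quasi <- update(fit_pois, family = quasipoisson)
```

The quasi-Poisson fit leaves the coefficients unchanged and reports the same dispersion estimate in `summary(fit_quasi)$dispersion`.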
42. OTHER GLIM MODELS/1
Two-part models or joint models:
used when a single outcome variable has multiple facets that are modelled simultaneously, or when multiple outcome variables are conceptually closely related.
Hurdle regression models
Hurdle regression models (Long, 1997; Mullahy, 1986) are often used to model human decision-making processes. They have been used in Italy in migration studies.
43. OTHER GLIM MODELS/2
Zero-inflated regression models
Individuals come from two different populations: those who have no probability of displaying the behaviour of interest and therefore always respond with a zero, and those who produce zeros with some probability.
Alcohol example: zeros will come from individuals who never drink for religious, health, or other reasons and thereby produce structural zeros that must always occur.
In practice: more zeros than expected under a Poisson (or negative binomial) distribution.
Consequences: estimated parameters and SEs may be distorted; the excess of zeros can cause overdispersion
Solutions: mixture models or hurdle models
44. OTHER MEASURES
Akaike Information Criterion (AIC): you can look at AIC as the counterpart of the adjusted R² in multiple regression. The smaller, the better.
Null deviance and residual deviance: the null deviance is calculated from the model with no features, i.e. intercept only. The residual deviance is calculated from the model having all the features.
Receiver Operating Characteristic (ROC) curve
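AIC and the two deviances are stored in every fitted `glm` object. The sketch below compares an intercept-only logistic model with a one-predictor model on R's built-in mtcars data, an assumption made purely for illustration.

```r
# Illustrative data: am (0/1) in the built-in mtcars data
fit0 <- glm(am ~ 1,  family = binomial, data = mtcars)   # intercept only
fit1 <- glm(am ~ wt, family = binomial, data = mtcars)   # one predictor

aic_table <- AIC(fit0, fit1)          # data frame with df and AIC; smaller is better
null_dev  <- fit1$null.deviance       # deviance of the intercept-only model
resid_dev <- deviance(fit1)           # deviance of the model with the predictor
```

The null deviance stored in `fit1` is the same as the residual deviance of `fit0`, so the drop from `null_dev` to `resid_dev` measures what the predictor adds.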
45. EXAMPLES IN THE LITERATURE
Parker, M. A., & Anthony, J. C. (2019). Underage drinking, alcohol dependence, and young people starting to use prescription pain relievers extra-medically: A zero-inflated Poisson regression model. Experimental and Clinical Psychopharmacology, 27(1), 87.
DeLisi, M., Caudill, J. W., Trulson, C. R., Marquart, J. W., Vaughn, M. G., & Beaver, K. M. (2010). Angry inmates are violent inmates: A Poisson regression approach to youthful offenders. Journal of Forensic Psychology Practice, 10(5), 419-439.
46. EXAMPLES IN THE LITERATURE/LOGISTIC
Adwere-Boamah, J., & Hufstedler, S. (2015). Predicting Social Trust with Binary Logistic Regression. Research in Higher Education Journal, 27.
Adwere-Boamah, J. (2011). Multiple Logistic Regression Analysis of Cigarette Use among High School Students. Journal of Case Studies in Education, 1.
47. PART II - 1
A bridge from Generalised Linear Models to Generalised Linear Mixed
Models
48. GENERALIZED LINEAR MIXED MODELS - GLMM
• GLMMs are an extension of GLIMs for when the assumption of uncorrelated errors is violated.
• They are suitable for the analysis of normal and non-normal data with a clustered (grouped) structure
• Added complexity: random effects (different from random errors)
49. GLMM PARAMETERS
fixed regression effects and variance component parameters, common to all clusters
cluster-specific parameters, assumed to be randomly drawn from a population distribution
Example: experimental psychology, where the experimental design contains within-subject variables
the variance components of the population distribution are estimated together with the fixed effects
50. POWER AND RELIABILITY OF ESTIMATES
Often the limiting factor is the sample size at the highest unit of
analysis.
For example, having 500 patients from each of ten doctors would
give one a reasonable total number of observations, but not
enough to get stable estimates of doctor effects nor of the
doctor-to-doctor variation.
10 patients from each of 500 doctors (leading to the same total
number of observations) would be preferable.
51. CLASSES OF GENERALIZED LINEAR MODELS
General Linear Models (linear regression, ANOVA, ANCOVA)
$Y = X\beta + \varepsilon$
Responses independent
Generalized Linear Models (logistic regression, Poisson regression, etc.)
$g(Y) = X\beta + \varepsilon$
Responses independent
Linear Mixed Models
$Y = X\beta + Zb + \varepsilon$
Responses correlated; correlation modelled in part by "random effects"
Generalized Linear Mixed Models
$g(Y \mid b) = X\beta + Zb + \varepsilon$
Responses correlated; correlation modelled in part by "random effects"
52. PART III
An overview on software
53. IBM SPSS
SPSS allows the estimation of several GLIMs.
This menu is comprehensive and a bit more complicated than the following one.
54. IBM SPSS
Addressing GLIMs from the Regression menu is more straightforward.
The model here is multinomial regression, where:
Response: categorical (nominal) with more than 2 categories
Predictors: any scale
55. SAS - STATA
These packages share a long-standing tradition and a widespread scientific community. Some advances are 'translated' directly into their procedures.
They are rather costly and not too user-friendly. Unfortunately, so far there has been a trend towards an inverse correlation between user-friendliness and scientific advancement.
They are included in our university's campus software.
56. OPEN SOURCE SOFTWARE
Open source software has been engaging the scientific community for quite some time.
Among the extremely user-friendly (and psychologist-friendly) packages developed for the psychology community, we will take a glance at:
jamovi (with collaborators from our university)
JASP
R is the open source environment. When methodological or statistical advances are published in the literature, the corresponding R package is often published in the related scientific journals and then made available to the community, with online documentation and user communities.
The strength of R, SAS and Stata is that their methods are subject to a vast open discussion in the scientific community and that they provide copious documentation to the public. See IDRE from UCLA.
57. JAMOVI
Some generalised linear models can be retrieved in the two menus
Anova and Regression.
GAMLj: module for GLM, LME and GZLMs in jamovi
59. R - THE INELUDIBLE, THE INEVITABLE
Why is R so important in the scientific community?
Why is not-so-friendly software (where we need to use a programming language) so relevant for applying statistical methods?
Estimation can have a domino effect
60. THE GLM FUNCTION IN R
Generalized linear models can be fitted in R using the glm function.
The glm function is similar to the lm function for fitting linear models.
The arguments to a glm call are as follows:
glm (formula, family = ………..)
61. R - THE INELUDIBLE, THE INEVITABLE
The formula is specified to glm as, e.g.
y ~ x1 + x2
where x1, x2 are the names of
numeric vectors (continuous variables)
factors (categorical variables)
All specified variables must be in the workspace or in the data frame passed to the data argument.
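A minimal sketch of such a call follows, using the built-in ToothGrowth data (an assumption for illustration; it is not the lecture's data). The response `len` and the predictor `dose` are numeric vectors, and `supp` is a factor with levels OJ and VC. The second call shows the alternative of taking the variables from the workspace rather than from a data frame.

```r
# Variables taken from the data frame passed to the data argument
fit <- glm(len ~ dose + supp, family = gaussian(link = "identity"),
           data = ToothGrowth)
coef(fit)     # intercept, slope for dose, contrast for supp = VC versus OJ

# Same model with the variables found in the workspace instead
len  <- ToothGrowth$len
dose <- ToothGrowth$dose
supp <- ToothGrowth$supp
fit_ws <- glm(len ~ dose + supp, family = gaussian)
```

Passing a data frame is usually the safer habit, since it avoids accidentally picking up an unrelated workspace object with the same name.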
62. FAMILY ARGUMENT
The family argument takes (the name of) a family function, which specifies
the link function
the variance function and other components, e.g. linkinv
The family (exponential family) functions available in R include
binomial (link = "logit")
poisson (link = "log")
gaussian (link = "identity")
inverse.gaussian (link = "1/mu^2")
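A family object can be inspected directly, which makes its components concrete. The short sketch below calls the link, inverse link and variance functions stored in the binomial and Poisson families; the input values are arbitrary choices for illustration.

```r
fam <- binomial(link = "logit")
link_half <- fam$linkfun(0.5)      # logit of 0.5
inv_zero  <- fam$linkinv(0)        # inverse logit of 0
var_half  <- fam$variance(0.5)     # binomial variance function: mu * (1 - mu)

pois_var  <- poisson()$variance(3) # Poisson variance function: equal to the mean

# Default links of some families
default_links <- c(binomial = binomial()$link,
                   poisson = poisson()$link,
                   gaussian = gaussian()$link,
                   Gamma = Gamma()$link,
                   inverse.gaussian = inverse.gaussian()$link)
```

Note that the family functions are case-sensitive in R: `binomial`, `poisson` and `gaussian` are lowercase, while `Gamma` is capitalised.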
63. EXTRACTOR FUNCTIONS
There are several glm or lm methods available for accessing/displaying
components of the glm object, including:
residuals()
fitted()
predict()
coef()
deviance()
formula()
summary()
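The extractor functions listed above can be tried on any fitted model. The sketch below applies them to a Poisson regression on the built-in warpbreaks data, which is an illustrative assumption and not the lecture's data.

```r
# Illustrative data: warpbreaks ships with R
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

coef(fit)                                       # estimated coefficients (log scale)
deviance(fit)                                   # residual deviance
formula(fit)                                    # the model formula
mu_hat  <- fitted(fit)                          # fitted means on the count scale
eta_hat <- predict(fit, type = "link")          # linear predictor on the log scale
dev_res <- residuals(fit, type = "deviance")    # deviance residuals
summary(fit)                                    # coefficient table, deviances, AIC
```

For the log link, exponentiating `eta_hat` gives `mu_hat`, and the squared deviance residuals add up to the residual deviance.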
65. SOME USEFUL WEBSITES
An excellent introduction to R, with full information and instructions:
https://stats.idre.ucla.edu/r/
Some detailed lessons on the General Linear Model and an introduction to the Generalized Linear Model, with applications in R (data not provided, some lessons are not freely available to the public):
https://arc.psych.wisc.edu/courses/610-710/
66. Some references
Coxe, S., West, S. G., & Aiken, L. S. (2009). The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91(2), 121-136.
Coxe, S., West, S. G., & Aiken, L. S. (2013). Generalized linear models. Oxford handbook of quantitative methods, 26-51.
Dobson, A. J., & Barnett, A. (2008). An introduction to generalized linear models. CRC Press.
Fox, J. (2015). Applied regression analysis and generalized linear models. Sage Publications.
Hedeker, D., Flay, B. R., & Petraitis, J. (1996). Estimating individual influences of behavioral intentions: An application of random-effects modeling to the theory of reasoned action. Journal of Consulting and Clinical Psychology, 64, 109-120.
Hirotsu, C. (2017). Advanced Analysis of Variance (Vol. 384). John Wiley & Sons.
67. Some references
Osborne, J. W. (2014). Best practices in logistic regression. Sage Publications.
http://core.ecu.edu/psyc/wuenschk/MV/Multreg/Logistic-SPSS.PDF
Tuerlinckx, F., Rijmen, F., Verbeke, G., & Boeck, P. (2006). Statistical inference in generalized linear mixed models: A review. British Journal of Mathematical and Statistical Psychology, 59(2), 225-255.
Vandekerckhove, J., Matzke, D., & Wagenmakers, E. J. (2015). Model comparison and the principle of parsimony. Oxford handbook of computational and mathematical psychology, 300-319.