9/13/2010
Categorical Models
Presented by: Jeff Skinner, M.S.
Biostatistics Specialist
Bioinformatics and Computational Biosciences Branch
National Institute of Allergy and Infectious Diseases 
Office of Cyber Infrastructure and Computational Biology
Introduction
Many biological experiments include categorical response
variables, which need to be analyzed with unfamiliar tests
• Simple contingency table methods
– Pearson vs. Fisher tests, odds ratios & relative risks, sensitivity & specificity
– M x N tables, McNemar’s test for paired data, MHC tests for confounding
• Logistic regression methods
– Odds ratios, estimating LD50, Wald and Likelihood Ratio Tests, …
• Generalized linear model (GLIM) methods
– Choosing distribution and link functions, overdispersion statistics, ...
Contingency Tables
• Used to display relationships among categorical variables
– Responses in the columns
– Predictors in the rows
• Statistical significance tested using Pearson chi‐square or Fisher’s exact tests
• Results interpreted using an odds ratio or relative risk

                     Pregnant?
Pregnancy Test?      Yes    No    Row Totals
Positive              27     3        30
Negative               4    26        30
Column Totals         31    29        60
Pearson’s Chi‐Squared Test
• Pearson’s chi‐square test tests the null hypothesis that columns and rows are independent
– Computation of expected values (Expij) assumes independence
• Chi‐square tests require large sample sizes with no empty cells & few small cell counts
• P‐values computed from the chi‐square distribution

                     Pregnant?
Pregnancy Test?      Yes      No       Row Totals
Positive             Obs11    Obs12    R1.
Negative             Obs21    Obs22    R2.
Column Totals        C.1      C.2      N..
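The computation on this slide can be sketched in a few lines of Python. Under independence the expected count for each cell is Expij = Ri. × C.j / N.., and the statistic sums (Obsij − Expij)² / Expij over all cells. This is only an illustration: the function name `pearson_chi2_2x2` is hypothetical, and the use of Yates' continuity correction is an assumption, chosen because it reproduces the X2 value reported for this table on the Review Contingency Table Results slide.

```python
import math

def pearson_chi2_2x2(table, yates=True):
    """Pearson chi-square test for a 2 x 2 table of observed counts.

    Returns (statistic, p_value, expected_counts). The p-value uses the
    chi-square distribution with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected counts under independence: Exp_ij = R_i * C_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            if yates:
                # Yates' continuity correction shrinks each deviation by 0.5
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / exp

    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected

# Pregnancy test example: rows = test result, columns = pregnant yes/no
stat, p_value, expected = pearson_chi2_2x2([[27, 3], [4, 26]])
print(f"X2 = {stat:.4f}, p = {p_value:.3e}")
```

Passing `yates=False` gives the uncorrected statistic, which is larger for the same table.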
Fisher’s Exact Test
• Also tests the independence of columns and rows
• Fisher’s test is valid for all sample sizes and cell counts
• Fisher’s test assumes column and row totals are fixed
– Fisher’s exact test may be inappropriate for some tables
• P‐values computed using the hypergeometric distribution:
  P(table) = (a+b)! (c+d)! (a+c)! (b+d)! / (a! b! c! d! n!)
• The p‐value is the probability of observing this specific table, or one at least as extreme, among all possible tables with the same row and column totals and sample size n = a + b + c + d

                     Pregnant?
Pregnancy Test?      Yes    No     Row Totals
Positive             a      b      a+b
Negative             c      d      c+d
Column Totals        a+c    b+d    n
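As a minimal sketch of how this p‐value can be computed, the Python below enumerates every table with the observed row and column totals and sums the hypergeometric probabilities of those that are no more likely than the observed table. The function name `fisher_exact_2x2` is hypothetical, and the "no more likely than observed" rule is one common two‐sided convention, assumed here rather than taken from the slides.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With row and column totals held fixed, the top-left count follows a
    hypergeometric distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)  # number of ways to fill the first column

    def weight(x):
        # Number of arrangements that put x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Integer arithmetic keeps the comparison with the observed table exact
    total = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return total / denom

p = fisher_exact_2x2(27, 3, 4, 26)
print(f"Fisher's exact test: p = {p:.3e}")
```

For the pregnancy test table this agrees with the Fisher p‐value quoted on the results slide.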
Odds Ratios and Relative Risk
• Pearson’s chi‐square and Fisher’s exact tests indicate whether a relationship is statistically significant
– Did the results occur by chance?
• Odds ratios and relative risk indicate the magnitude of a relationship or its effect size
– Was there a large difference in the odds or risks among rows?

                     Pregnant?
Pregnancy Test?      Yes    No     Row Totals
Positive             a      b      a+b
Negative             c      d      c+d
Column Totals        a+c    b+d    n
Interpreting OR and RR
• The odds of pregnancy are OR = 58.5 times higher for women who tested positive than the odds of pregnancy for women who tested negative
• The risk of pregnancy is RR = 6.75 times higher for women who tested positive than the risk of pregnancy for women who tested negative
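The two effect sizes quoted above follow directly from the a, b, c, d cells of the 2 x 2 table: OR = (a/b) / (c/d) and RR = [a/(a+b)] / [c/(c+d)]. A short Python sketch, with hypothetical function names, applied to the pregnancy test counts:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]: (a / b) / (c / d)."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Relative risk: risk of the outcome in row 1 divided by risk in row 2."""
    return (a / (a + b)) / (c / (c + d))

# Positive test: 27 pregnant, 3 not; negative test: 4 pregnant, 26 not
print(f"OR = {odds_ratio(27, 3, 4, 26):.2f}")
print(f"RR = {relative_risk(27, 3, 4, 26):.2f}")
```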
Sensitivity and Specificity
• Sensitivity and specificity represent the performance of diagnostic tests
• Sensitivity is the proportion of actual positives correctly identified by the diagnostic
• Specificity is the proportion of actual negatives correctly identified by the diagnostic

                     Pregnant?
Pregnancy Test?      Yes    No
Positive             TP     FP
Negative             FN     TN
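In terms of the TP, FP, FN and TN cells, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below uses hypothetical function names and the pregnancy test counts from the earlier slides.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives that the test flags as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives that the test flags as negative."""
    return tn / (tn + fp)

# Pregnancy test example: TP = 27, FP = 3, FN = 4, TN = 26
print(f"sensitivity = {sensitivity(tp=27, fn=4):.2%}")
print(f"specificity = {specificity(tn=26, fp=3):.2%}")
```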
Table Formats
Contingency Table format

                     Pregnant?
Pregnancy Test?      Yes    No    Row Totals
Positive              27     3        30
Negative               4    26        30
Column Totals         31    29        60

Summarized Table format

Pregnancy Test    Pregnant?    Count
Positive          Yes             27
Positive          No               3
Negative          Yes              4
Negative          No              26

• You may need to reformat your data table for some software
– Contingency table format for analysis in GraphPad Prism
– Summarized table format for analysis in JMP
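Moving between the two layouts is mechanical. As a small illustration, assuming the contingency table is held as a list of rows, the hypothetical helper `to_summarized` below emits one (row label, column label, count) record per cell.

```python
def to_summarized(table, row_labels, col_labels):
    """Flatten a contingency table into (row label, column label, count) rows."""
    return [
        (row_label, col_label, count)
        for row_label, row in zip(row_labels, table)
        for col_label, count in zip(col_labels, row)
    ]

rows = to_summarized([[27, 3], [4, 26]], ["Positive", "Negative"], ["Yes", "No"])
for test_result, pregnant, count in rows:
    print(f"{test_result:<10}{pregnant:<5}{count:>3}")
```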
Review Contingency Table Results
                     Pregnant?
Pregnancy Test?      Yes    No    Row Totals
Positive              27     3        30
Negative               4    26        30
Column Totals         31    29        60

Pearson Chi‐Square: X² = 32.3026, p = 1.319e‐08
Fisher’s Exact Test: p = 1.975e‐09
Odds of pregnancy are OR = 58.5 times higher after positive pregnancy test
Risk of pregnancy is RR = 6.75 times higher after positive pregnancy test
Pregnancy test has 87.1% sensitivity and 89.66% specificity
More Complicated Models
• What if your contingency table is larger than 2 x 2?
– Pearson chi‐square and Fisher’s exact test for M x N tables
• What if your table contains paired data?
– McNemar’s Test for paired data
• What if your table has three variables?
– Mantel‐Haenszel‐Cochran (MHC) test
• What if you have a continuous predictor variable?
– Logistic regression models
• What about really complicated models?
– Generalized Linear Models (GLIM)
M x N Contingency Tables
                 Blood Types
Ethnicity        A     B     AB    O     Row Totals
Bambara           7     8     5    20        40
Peul             12     3     3    12        30
Tuareg           11    13     2     4        30
Column Totals    30    24    10    36       100
• Pearson chi‐square tests work the same for larger M x N tables, but 
researchers need to remember the assumptions about cell counts
• Fisher’s exact test is difficult to compute for M x N tables, but it 
can be computed using simulations in R or other software
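To illustrate both points for the blood type table, the sketch below computes the Pearson statistic for any M x N table and counts the cells whose expected count falls below 5. The threshold of 5 is a common rule of thumb assumed here, not something stated on the slide, and the function names are hypothetical. The closed‐form tail probability is valid only when the degrees of freedom are even, which happens to hold for a 3 x 4 table.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts
    for an M x N table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Rows: Bambara, Peul, Tuareg; columns: blood types A, B, AB, O
blood = [[7, 8, 5, 20], [12, 3, 3, 12], [11, 13, 2, 4]]
stat, df, expected = pearson_chi2(blood)
small_cells = sum(exp < 5 for row in expected for exp in row)
print(f"X2 = {stat:.4f}, df = {df}, p = {chi2_sf_even_df(stat, df):.4f}")
print(f"cells with expected count below 5: {small_cells}")
```

The AB column produces all of the small expected counts, which is the kind of table where a simulated or exact test may be preferable.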
Ordinal vs. Nominal Variables
• Ordinal variables have outcomes that are ordered
– Drug Dosages: 0 mg, 5 mg, 10 mg and 15 mg
– Symptom Severity: Mild, Moderate and Severe
• Nominal variables have outcomes that are unordered
– Blood Types: A, B, AB and O
– Ethnicity: Bambara, Peul and Tuareg
• Most tests assume nominal variables by default
– Ordinal variables require fewer odds ratio estimates
– Ordinal variables may allow for a simpler model
– E.g. compute odds ratios to compare Mild vs. Moderate and Moderate 
vs. Severe, but do not compare Mild vs. Severe
McNemar’s Test
• McNemar’s test should be used if the table represents a matched pairs design experiment
– E.g. Some matched pairs designs arise from repeated sampling of patients pre‐ and post‐treatment
– E.g. Case‐control experiments may use McNemar’s test because case and control patients have been “matched” using key demographic variables like age, gender, race, ...

               Test 2
Test 1         Pos    Neg    Row Totals
Positive       a      b      a+b
Negative       c      d      c+d
Column Totals  a+c    b+d    n
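McNemar’s test uses only the discordant pairs, the b and c cells of the paired table. A minimal sketch of the usual chi‐square form, (b − c)² / (b + c) with 1 degree of freedom and no continuity correction, is shown below. The function name and the example counts are invented for illustration and do not come from the slides.

```python
import math

def mcnemar_test(b, c):
    """McNemar's chi-square test (no continuity correction) from the two
    discordant pair counts b and c of a paired 2 x 2 table."""
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical discordant counts, invented for illustration only
stat, p_value = mcnemar_test(b=15, c=5)
print(f"McNemar X2 = {stat:.2f}, p = {p_value:.4f}")
```

The concordant cells a and d do not enter the statistic, which is why an ordinary Pearson test on the same table answers a different question.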
Mantel‐Haenszel‐Cochran Test
• Mantel‐Haenszel‐Cochran test determines if the relationship between two table variables remains the same if the table is “paneled” or split by a third table variable
• Often used to investigate Simpson’s Paradox

All Ages
                   Heart Attack?
Birth Control?     Yes    No
Yes                 16    34
No                  34    16

Age < 40
                   Heart Attack?
Birth Control?     Yes    No
Yes                  8    32
No                   2     8

Age > 40
                   Heart Attack?
Birth Control?     Yes    No
Yes                  8     2
No                  32     8
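The heart attack example can be checked numerically. The sketch below does not compute the test statistic itself; it only compares the crude odds ratio from the All Ages table with the odds ratio in each age stratum and with the Mantel‐Haenszel pooled odds ratio, Σ(a·d/n) / Σ(b·c/n). Function names are hypothetical.

```python
def odds_ratio(table):
    """Odds ratio (a * d) / (b * c) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across a list of 2 x 2 strata."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    return num / den

# Rows: birth control yes/no; columns: heart attack yes/no
under_40 = [[8, 32], [2, 8]]
over_40 = [[8, 2], [32, 8]]
all_ages = [[16, 34], [34, 16]]

print(f"crude OR (all ages):  {odds_ratio(all_ages):.3f}")
print(f"OR, age < 40:         {odds_ratio(under_40):.3f}")
print(f"OR, age > 40:         {odds_ratio(over_40):.3f}")
print(f"Mantel-Haenszel OR:   {mantel_haenszel_or([under_40, over_40]):.3f}")
```

The pooled table suggests an association while neither stratum does, which is the Simpson’s Paradox pattern the slide refers to.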
Logistic Regression
• Logistic regression fits the relationship between a continuous predictor and a categorical response variable
– E.g. predict the gender of an unknown person based on their height
– E.g. predict whether an animal will live or die based on the dose of a drug
• The logistic regression curve represents a constant change in the log odds for each one‐unit increase in the predictor variable
– E.g. If an unknown person is 61 inches tall, their odds of being male are near zero
– E.g. If an unknown person is 68 inches tall, their odds of being male are about 50‐50
“Long” Data Format
• Each row of data represents one patient, animal or subject
• Raw data format is useful when continuous covariates are unique to each subject or patient
– E.g. Exact weight of each patient
– E.g. Exact blood pressure, ...
“Wide” Data Format
• If each value of the continuous variable 
has been replicated, the data can be 
formatted as a summarized table
• Summarized tables require less space and can be used in multiple models
– Logistic regression models
– Log‐linear models
– Probit analysis
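When values of the predictor are replicated, raw one‐row‐per‐subject data can be collapsed into a summarized table by counting identical rows. The sketch below assumes each record is a (dose, outcome) pair; the records and the helper name `summarize` are hypothetical.

```python
from collections import Counter

# Hypothetical long-format records: one (dose, outcome) pair per animal
long_rows = [
    (5, "died"), (5, "survived"), (5, "survived"),
    (10, "died"), (10, "died"), (10, "survived"),
]

def summarize(rows):
    """Collapse long-format rows into sorted (dose, outcome, count) rows."""
    counts = Counter(rows)
    return [(dose, outcome, n) for (dose, outcome), n in sorted(counts.items())]

for dose, outcome, n in summarize(long_rows):
    print(f"{dose:>4}  {outcome:<9}{n:>3}")
```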
Results from Logistic Regression
• Whole model results
– Likelihood Ratio Test (LRT)
– Model fit diagnostics
• Parameter estimates
– Regression coefficients
– Wald tests
• Odds ratios
– The odds of survival are 1.107 times higher after every one unit increase in log(dose)
– Odds of survival are 12.794 times higher after every one unit increase in dose
Why Use Both Wald and LRT?
• Likelihood Ratio tests compare the fit of two statistical models
– Most statistical models can be described with a likelihood function
– A likelihood ratio test (LRT) computes the log‐likelihood function under a full 
model (dose and intercept) and reduced model (intercept) to test model fit
• Wald tests evaluate the statistical significance of model parameters
– Wald test statistics are constructed much like Student’s t‐tests: an estimate divided by its standard error
– Results from Wald test should be consistent with LRT results
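To make the comparison concrete, the sketch below fits a simple logistic regression by Newton‐Raphson and then computes both tests for the slope. Everything here is an assumption made for illustration: the grouped dose‐response data are invented, the response modeled is death rather than survival, and the helper names are hypothetical. Real software such as JMP or SAS reports these quantities directly.

```python
import math

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def fit_logistic(x, y, n, iterations=25):
    """Newton-Raphson fit of log(p / (1 - p)) = b0 + b1 * x to grouped counts.

    x: predictor values, y: events per group, n: group sizes.
    Returns (b0, b1, se_b1, log_likelihood).
    """
    b0 = b1 = 0.0
    for step in range(iterations + 1):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        w = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]
        # Fisher information matrix entries
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        if step == iterations:
            break  # final pass only refreshes p and the information matrix
        # Score vector (gradient of the log-likelihood)
        u0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        u1 = sum(xi * (yi - ni * pi) for xi, yi, ni, pi in zip(x, y, n, p))
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    se_b1 = math.sqrt(i00 / det)
    loglik = sum(yi * math.log(pi) + (ni - yi) * math.log(1 - pi)
                 for yi, ni, pi in zip(y, n, p))
    return b0, b1, se_b1, loglik

# Hypothetical data: 10 animals per group at five log10(dose) levels
x = [0, 1, 2, 3, 4]
deaths = [1, 3, 5, 7, 9]
n = [10] * 5

b0, b1, se_b1, ll_full = fit_logistic(x, deaths, n)

# Reduced (intercept-only) model: one common probability for every group
p0 = sum(deaths) / sum(n)
ll_reduced = sum(deaths) * math.log(p0) + (sum(n) - sum(deaths)) * math.log(1 - p0)

# Likelihood ratio test: twice the gain in log-likelihood, chi-square with 1 df
lrt = 2 * (ll_full - ll_reduced)
p_lrt = math.erfc(math.sqrt(lrt / 2))

# Wald test: estimate divided by its standard error, compared to a normal
wald_z = b1 / se_b1
p_wald = math.erfc(abs(wald_z) / math.sqrt(2))

print(f"b0 = {b0:.3f}, b1 = {b1:.3f} (SE {se_b1:.3f})")
print(f"LRT chi-square = {lrt:.2f}, p = {p_lrt:.2e}")
print(f"Wald z = {wald_z:.2f}, p = {p_wald:.2e}")
```

The two p‐values differ numerically but lead to the same conclusion here, which is the consistency the slide describes.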
Estimate LD50 from Logistic Regression
• You can use interpolated values or inverse prediction to estimate LD50 from a logistic regression
• Open the Inverse Prediction menu 
and enter Prob = 0.500 to estimate 
LD50 by finding X at Y = 0.500
– Enter Prob = 0.90 for LD90, ...
• You may need to antilog your LD50 
estimate if your predictor is on the 
log scale (e.g. log10(dose))
Compute LD50 from Parameter Estimates
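• Simple logistic regression is defined by the equation log(p / (1 – p)) = B0 + B1·X, where p is the probability of death at dose X; at the LD50, p = 0.5 and the log odds equal 0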
• Therefore, by simple algebra, we find LD50 = ‐B0 / B1
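The same algebra extends to any lethal dose: solving log(p / (1 − p)) = B0 + B1·X for X gives X = (log(p / (1 − p)) − B0) / B1, which reduces to −B0 / B1 at p = 0.5. The sketch below uses a hypothetical function name and made‐up coefficients, and includes the antilog step for a predictor on the log10 scale.

```python
import math

def lethal_dose(b0, b1, prob=0.5, log10_scale=False):
    """Solve log(p / (1 - p)) = b0 + b1 * x for x at the given probability.

    If the predictor was log10(dose), set log10_scale=True to return the
    antilog, i.e. the dose on its original scale.
    """
    x = (math.log(prob / (1 - prob)) - b0) / b1
    return 10 ** x if log10_scale else x

# Hypothetical coefficients, chosen only to make the arithmetic easy to follow
b0, b1 = -3.2, 1.6
print(lethal_dose(b0, b1))                    # LD50 on the log10 scale
print(lethal_dose(b0, b1, log10_scale=True))  # LD50 as a dose
print(lethal_dose(b0, b1, prob=0.90))         # LD90 on the log10 scale
```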
Reed‐Muench Method
• Graphical estimate of 
LD50 from survival data
• Plot total number of 
survivors and total 
number dead against 
dilution or concentration
• Intersection represents 
best estimate of LD50
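A numerical version of the graphical method can be sketched as follows. It assumes the usual Reed‐Muench accumulation (deaths accumulate from the lowest dose upward, survivors from the highest dose downward) and locates 50% cumulative mortality by linear interpolation on the log dose scale, which approximates the crossing point of the two plotted curves. The data and the function name are invented for illustration.

```python
def reed_muench_log_ld50(log_doses, dead, survived):
    """Reed-Muench estimate of log(LD50); doses must be in increasing order."""
    k = len(log_doses)
    # Animals that died at a dose are assumed to die at every higher dose,
    # so deaths accumulate from the lowest dose upward...
    cum_dead = [sum(dead[: i + 1]) for i in range(k)]
    # ...and survivors accumulate from the highest dose downward.
    cum_surv = [sum(survived[i:]) for i in range(k)]
    mortality = [d / (d + s) for d, s in zip(cum_dead, cum_surv)]

    for i in range(k - 1):
        below, above = mortality[i], mortality[i + 1]
        if below <= 0.5 <= above and above > below:
            # Linear interpolation between the doses bracketing 50% mortality
            distance = (0.5 - below) / (above - below)
            return log_doses[i] + distance * (log_doses[i + 1] - log_doses[i])
    raise ValueError("50% mortality is not bracketed by the tested doses")

# Hypothetical experiment: 6 animals per dose at log10(dose) = 0, 1, 2, 3
log_ld50 = reed_muench_log_ld50([0, 1, 2, 3], dead=[0, 2, 4, 6], survived=[6, 4, 2, 0])
print(f"log10(LD50) = {log_ld50:.2f}, LD50 = {10 ** log_ld50:.1f}")
```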
Generalized Linear Models
• Logistic regression, extensions of Pearson chi‐square tests and other models can be defined as generalized linear models (GLIM)
• Each GLIM model is coerced into the form of a linear equation by choosing the correct statistical distribution and link function
• Excluding logistic regression, most multifactor categorical models must be specified using the GLIM procedures in your software
• GLIM procedures typically allow analysts to test for overdispersion, where real data have more variance than expected from the model
Distribution Choices
• Modeling categorical responses directly
– Binomial and multinomial distributions
– Negative binomial distribution
• Modeling contingency table cell counts
– Poisson distribution models all cell counts as rare events
– Normal distribution models cell counts as common events
Link Functions
• Link functions are mathematical transformations 
used to coerce models into linear equations
– The identity link function g(y) = y for linear models
– The log link function g(y) = log(y) for log‐linear models
– The logit link function g(y) = log(y / (1 – y)) for logistic regression models
– The probit link function g(y) = Φ⁻¹(y), the inverse of the standard normal CDF, for probit analysis models
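The four links listed above, together with the inverses of the logit and probit, can be written in a few lines using only the Python standard library. The function names are hypothetical; `statistics.NormalDist` supplies the standard normal CDF and its inverse.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def identity(y):
    """Identity link for linear models."""
    return y

def log_link(y):
    """Log link for log-linear models (y must be positive)."""
    return math.log(y)

def logit(p):
    """Logit link: log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def probit(p):
    """Probit link: inverse standard normal CDF of p in (0, 1)."""
    return _STD_NORMAL.inv_cdf(p)

def inv_logit(eta):
    """Map a linear predictor back to a probability (logit link)."""
    return 1 / (1 + math.exp(-eta))

def inv_probit(eta):
    """Map a linear predictor back to a probability (probit link)."""
    return _STD_NORMAL.cdf(eta)

for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: logit = {logit(p):+.3f}, probit = {probit(p):+.3f}")
```

Both links map probabilities onto the whole real line and are zero at p = 0.5; they differ mainly in how quickly they grow in the tails.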
Historic Models as GLIM
• Logistic regression
– Binomial distribution with logit link function
• Probit analysis
– Binomial distribution with probit link function
• Log‐linear models
– Poisson distribution with log link function
• Negative Binomial regression
– Negative binomial distribution with log link function
Overdispersion Parameters
• Traditional linear models, like linear regression, use independent parameters to estimate the variance of the response data
– E.g. linear regression has independent mean μ = Xβ and variance σ²
• Many GLIM models, like logistic regression, have fixed relationships between the variance and other model parameters
– E.g. logistic regression has mean μ = np and variance σ² = np(1 – p)
– E.g. log‐linear models have μ = σ² = λ = np for rare events with small p
• Overdispersion parameters are used to account for extra variability in the responses, which cannot be explained by the model
– E.g. logistic regression modeled with variance σ² = φnp(1 – p)
– Want to know if multiplier φ > 2 to determine significance or importance
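One common way to estimate the multiplier φ for grouped binomial data, assumed here rather than taken from the slides, is the Pearson chi‐square statistic divided by its residual degrees of freedom. The sketch below applies it to invented replicate groups fitted with a single common probability; the function name is hypothetical.

```python
def pearson_dispersion(successes, trials, fitted_p, n_params):
    """Estimate phi as Pearson chi-square / residual degrees of freedom.

    Each group contributes (y - n*p)^2 / (n*p*(1 - p)), i.e. its squared
    deviation from the fitted mean divided by the binomial variance.
    """
    chi2 = sum(
        (y - n * p) ** 2 / (n * p * (1 - p))
        for y, n, p in zip(successes, trials, fitted_p)
    )
    residual_df = len(successes) - n_params
    return chi2 / residual_df

# Hypothetical replicate groups of 10 subjects, intercept-only model
successes = [2, 8, 3, 9, 1, 7]
trials = [10] * 6
p_hat = sum(successes) / sum(trials)  # common fitted probability

phi = pearson_dispersion(successes, trials, [p_hat] * 6, n_params=1)
print(f"estimated phi = {phi:.2f}")
```

Values near 1 suggest the binomial variance is adequate, while values well above 1 point to extra variability of the kind described above.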
Generalized Linear Mixed Models
• Generalized linear models can be advanced further by 
including random effect variables
– These models are called generalized linear mixed models (GLMM)
– Random effect variables are included to account for paired designs, 
repeated measures designs, split‐plot designs and other effects
– GLMM are typically fit using linearization (pseudo‐likelihood) techniques (e.g. SAS PROC GLIMMIX); generalized estimating equations (GEE) are a related approach for correlated data
• Sometimes complicated GLM and GLMM must be fit 
using nonlinear modeling procedures in your software
– Probit model with binomial errors or Poisson loss function models in JMP
– Probit‐Normal models and Poisson‐Normal models in SAS PROC NLMIXED
Random vs. Fixed Effects
[Figure panels: Subject effects are random; Gender effects are fixed]
• Subject effects are random because the subjects in an experiment are a sample from the population of all possible subjects
• Gender effects are fixed because the experiment includes every level of interest (here, two genders) rather than a random sample of levels
Split‐plot Design
12 mice: 6 infected, 6 uninfected
3 infected males, 3 infected females, …
4 samples taken from each mouse
Each sample treated with one of 2 different drugs
Whole plot (mouse) EUs: infection, gender
Subplot (sample) EUs: drug treatment
• Split‐plot design experiments model experiments where whole plots and subplots represent different experimental units (EUs)
– Whole plots are often locations, subjects, objects or factors that are difficult to change (e.g. temperature in an incubator)
– Subplot effects are typically the effects of highest interest
– Subplot effects are tested with higher power than whole plot effects
References
• Agresti A.  2002.  Categorical Data Analysis.  Second Ed.  Wiley‐Interscience.
• Reed LJ and H Muench.  1938.  A Simple Method of Estimating Fifty Percent Endpoints.  The American Journal of Hygiene.  27(3):493‐497.
• SAS Institute Inc.  2007.  SAS 9.1.3 Documentation.  Cary, NC.  SAS Institute Inc.
• SAS Institute Inc.  2010.  JMP Statistics and Graphics Guide.  Cary, NC.  SAS Institute Inc.

E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Categorical Models: An Introduction to Contingency Tables, Logistic Regression and Generalized Linear Models

  • 1. 9/13/2010
    Categorical Models
    Presented by: Jeff Skinner, M.S., Biostatistics Specialist
    Bioinformatics and Computational Biosciences Branch
    National Institute of Allergy and Infectious Diseases
    Office of Cyber Infrastructure and Computational Biology
    Introduction
    Many biological experiments include categorical response variables, which need to be analyzed with unfamiliar tests
    • Simple contingency table methods
      – Pearson vs. Fisher tests, odds ratios & relative risks, sensitivity & specificity
      – M x N tables, McNemar’s test for paired data, MHC tests for confounding
    • Logistic regression methods
      – Odds ratios, estimating LD50, Wald and Likelihood Ratio Tests, …
    • Generalized linear model (GLIM) methods
      – Choosing distribution and link functions, overdispersion statistics, ...
  • 2. Contingency Tables
    • Used to display relationships among categorical variables
      – Responses in the columns
      – Predictors in the rows
    • Statistical significance tested using Pearson chi‐square or Fisher’s exact tests
    • Results interpreted using an odds ratio or relative risk
                                  Pregnant?
                                  Yes    No     Row totals
      Pregnancy test?  Positive   27     3      30
                       Negative   4      26     30
      Column totals               31     29     60
    Pearson’s Chi‐Squared Test
    • Pearson’s chi‐square test tests whether columns and rows are independent
      – Expected values (Exp_ij) are computed assuming independence: Exp_ij = R_i. × C_.j / N..
    • Chi‐square tests require large sample sizes with no empty cells & few small cell counts
    • P‐values computed from the chi‐square distribution
                                  Pregnant?
                                  Yes    No
      Pregnancy test?  Positive   Obs11  Obs12  R1.
                       Negative   Obs21  Obs22  R2.
                                  C.1    C.2    N..
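To make the expected-value step concrete, here is a minimal sketch in plain Python (standard library only) applied to the pregnancy test table. The function names are illustrative, not taken from the talk. The continuity-corrected (Yates) statistic is included because it agrees with the X2 = 32.3026 reported in the results review later in the talk; whether the original software applied that correction is an inference from that agreement.

```python
import math

# Observed 2 x 2 counts (rows: test positive/negative, columns: pregnant yes/no)
observed = [[27, 3],
            [4, 26]]

def pearson_chi_square(table, yates=False):
    """Return (statistic, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction, intended for 2 x 2 tables
            stat += diff ** 2 / exp
    return stat, expected

def chi_square_p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

chi2_plain, expected = pearson_chi_square(observed)
chi2_yates, _ = pearson_chi_square(observed, yates=True)
p_yates = chi_square_p_value_df1(chi2_yates)

print("expected counts:", expected)
print("chi-square (uncorrected):", round(chi2_plain, 4))
print("chi-square (Yates):", round(chi2_yates, 4), "p =", p_yates)
```

All four expected counts are well above 5 here, so the large-sample requirement on the slide is met for this table.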
  • 3. Fisher’s Exact Test
    • Also tests the independence of columns and rows
    • Fisher’s test is valid for all sample sizes and cell counts
    • Fisher’s test assumes column and row totals are fixed
      – Fisher’s exact test may be inappropriate for some tables
    • P‐values computed using the hypergeometric distribution: P(table) = C(a+b, a) × C(c+d, c) / C(n, a+c), where C(n, k) is the binomial coefficient
    • P‐value represents the probability of finding this specific table vs. all possible tables with the same row and column totals and sample size n = a + b + c + d
                                  Pregnant?
                                  Yes    No
      Pregnancy test?  Positive   a      b      a+b
                       Negative   c      d      c+d
                                  a+c    b+d    n
    Odds Ratios and Relative Risk
    • Pearson’s chi‐square and Fisher’s exact tests indicate whether a relationship is statistically significant
      – Did the results occur by chance?
    • Odds ratios and relative risk indicate the magnitude of a relationship or its effect size
      – Was there a large difference in the odds or risks among rows?
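The hypergeometric calculation behind Fisher's exact test can be sketched with integer arithmetic from the standard library. This sketch assumes one common two-sided convention: sum the probabilities of every table (with the same margins) that is no more probable than the observed one. Other software may define the two-sided p-value differently, and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)  # number of ways to fill the first column with margins fixed

    def weight(x):
        # Unnormalised hypergeometric probability of the table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed table
    extreme = sum(weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed)
    return extreme / total

p_fisher = fisher_exact_two_sided(27, 3, 4, 26)
print("Fisher exact p-value:", p_fisher)
```

Comparing integer weights rather than floating-point probabilities avoids rounding problems when deciding which tables count as "at least as extreme".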
  • 4. Interpreting OR and RR
    • The odds of pregnancy are OR = 58.5 times higher for women who tested positive than the odds of pregnancy for women who tested negative
    • The risk of pregnancy is RR = 6.75 times higher for women who tested positive than the risk of pregnancy for women who tested negative
    Sensitivity and Specificity
    • Sensitivity and specificity represent the performance of diagnostic tests
    • Sensitivity is the proportion of actual positives correctly identified by the diagnostic
    • Specificity is the proportion of actual negatives correctly identified by the diagnostic
                                  Pregnant?
                                  Yes    No
      Pregnancy test?  Positive   TP     FP
                       Negative   FN     TN
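The four summary measures follow directly from the cell labels a, b, c, d (equivalently TP, FP, FN, TN) used in the tables. The sketch below uses the standard textbook formulas and the counts from the pregnancy test example; the function name is illustrative.

```python
def two_by_two_summary(a, b, c, d):
    """Effect sizes and diagnostic accuracy for the table [[a, b], [c, d]].

    Rows are test result (positive, negative); columns are true status (yes, no),
    so a = TP, b = FP, c = FN, d = TN.
    """
    return {
        "odds_ratio": (a * d) / (b * c),                 # odds in row 1 / odds in row 2
        "relative_risk": (a / (a + b)) / (c / (c + d)),  # risk in row 1 / risk in row 2
        "sensitivity": a / (a + c),                      # TP / (TP + FN)
        "specificity": d / (b + d),                      # TN / (TN + FP)
    }

summary = two_by_two_summary(27, 3, 4, 26)
for name, value in summary.items():
    print(name, round(value, 4))
```

Note that the odds ratio and relative risk compare rows, while sensitivity and specificity are proportions within columns.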
  • 5. Table Formats
    Contingency Table format
                                  Pregnant?
                                  Yes    No
      Pregnancy test?  Positive   27     3      30
                       Negative   4      26     30
                                  31     29     60
    Summarized Table format
      Test       Pregnant?   Count
      Positive   Yes         27
      Positive   No          3
      Negative   Yes         4
      Negative   No          26
    • You may need to reformat your data table for some software
      – Contingency table format for analysis in GraphPad Prism
      – Summarized table format for analysis in JMP
    Review Contingency Table Results
                                  Pregnant?
                                  Yes    No
      Pregnancy test?  Positive   27     3      30
                       Negative   4      26     30
                                  31     29     60
    Pearson Chi‐Square: X2 = 32.3026, p = 1.319e‐08
    Fisher’s Exact Test: p = 1.975e‐09
    Odds of pregnancy are OR = 58.5 times higher after positive pregnancy test
    Risk of pregnancy is RR = 6.75 times higher after positive pregnancy test
    Pregnancy test has 87.1% sensitivity and 89.66% specificity
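Converting between the two layouts is mechanical. A minimal sketch, assuming the data are held as plain Python tuples and dictionaries (the helper names are hypothetical, and the counts are those of the contingency table):

```python
# Summarized format: one row per combination of categories, plus a count
summarized = [
    ("Positive", "Yes", 27),
    ("Positive", "No", 3),
    ("Negative", "Yes", 4),
    ("Negative", "No", 26),
]

def to_contingency(rows):
    """Summarized rows -> nested dict {test result: {pregnant: count}}."""
    table = {}
    for test, pregnant, count in rows:
        cell = table.setdefault(test, {})
        cell[pregnant] = cell.get(pregnant, 0) + count
    return table

def to_summarized(table):
    """Nested contingency dict -> list of (test result, pregnant, count) rows."""
    return [(test, pregnant, count)
            for test, cells in table.items()
            for pregnant, count in cells.items()]

contingency = to_contingency(summarized)
print(contingency)
print(to_summarized(contingency))
```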
  • 6. More Complicated Models
    • What if your contingency table is larger than 2 x 2?
      – Pearson chi‐square and Fisher’s exact test for M x N tables
    • What if your table contains paired data?
      – McNemar’s Test for paired data
    • What if your table has three variables?
      – Mantel‐Haenszel‐Cochran (MHC) test
    • What if you have a continuous predictor variable?
      – Logistic regression models
    • What about really complicated models?
      – Generalized Linear Models (GLIM)
    M x N Contingency Tables
                            Blood Types
                            A      B      AB     O      Total
      Ethnicity  Bambara    7      8      5      20     40
                 Peul       12     3      3      12     30
                 Tuareg     11     13     2      4      30
                 Total      30     24     10     36     100
    • Pearson chi‐square tests work the same for larger M x N tables, but researchers need to remember the assumptions about cell counts
    • Fisher’s exact test is difficult to compute for M x N tables, but it can be computed using simulations in R or other software
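The same expected-count calculation extends to the 3 x 4 blood type table. The sketch below is plain Python; the p-value helper uses a closed form that is only valid for even degrees of freedom, which happens to fit this table (df = 6). The function names are illustrative, and none of the printed values appear in the talk.

```python
import math

blood_types = [
    [7, 8, 5, 20],    # Bambara: A, B, AB, O
    [12, 3, 3, 12],   # Peul
    [11, 13, 2, 4],   # Tuareg
]

def chi_square_mxn(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def chi_square_sf_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

stat, df, expected = chi_square_mxn(blood_types)
p_value = chi_square_sf_even_df(stat, df)
smallest_expected = min(min(row) for row in expected)

print("chi-square:", round(stat, 4), "df:", df, "p:", round(p_value, 4))
print("smallest expected count:", smallest_expected)
```

The smallest expected counts fall in the AB column, which is exactly the kind of small-cell situation the slide warns about.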
  • 7. Ordinal vs. Nominal Variables
    • Ordinal variables have outcomes that are ordered
      – Drug Dosages: 0 mg, 5 mg, 10 mg and 15 mg
      – Symptom Severity: Mild, Moderate and Severe
    • Nominal variables have outcomes that are unordered
      – Blood Types: A, B, AB and O
      – Ethnicity: Bambara, Peul and Tuareg
    • Most tests assume nominal variables by default
      – Ordinal variables require fewer odds ratio estimates
      – Ordinal variables may allow for a simpler model
      – E.g. compute odds ratios to compare Mild vs. Moderate and Moderate vs. Severe, but do not compare Mild vs. Severe
    McNemar’s Test
    • McNemar’s test should be used if the table represents a matched pairs design experiment
      – E.g. Some matched pairs designs arise from repeated sampling of patients pre‐ and post‐treatment
      – E.g. Case‐control experiments may use McNemar’s test because case and control patients have been “matched” using key demographic variables like age, gender, race, ...
                            Test 2
                            Pos    Neg
      Test 1    Positive    a      b      a+b
                Negative    c      d      c+d
                            a+c    b+d    n
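The slide does not give the McNemar formula, so the following is a sketch of the standard textbook version: only the discordant cells b and c enter the test, with statistic (b - c)^2 / (b + c) on 1 degree of freedom, or an exact binomial p-value when b + c is small. The counts are hypothetical, chosen only for illustration.

```python
from math import comb

def mcnemar(b, c):
    """McNemar chi-square statistic (1 df) and exact two-sided binomial p-value.

    b and c are the discordant pair counts from the matched-pairs table.
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    stat = (b - c) ** 2 / n
    # Under the null hypothesis each discordant pair falls in cell b with probability 1/2
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return stat, p_exact

# Hypothetical discordant counts: 10 pairs changed one way, 2 the other way
stat, p_exact = mcnemar(b=10, c=2)
print("McNemar statistic:", round(stat, 4), "exact p:", round(p_exact, 4))
```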
  • 8. Mantel‐Haenszel‐Cochran Test
                               All Ages        Age < 40        Age > 40
                               Heart Attack?   Heart Attack?   Heart Attack?
                               Yes    No       Yes    No       Yes    No
      Birth Control?   Yes     16     34       8      32       8      2
                       No      34     16       2      8        32     8
    • Mantel‐Haenszel‐Cochran test determines if the relationship between two table variables remains the same if the table is “paneled” or split by a third table variable
    • Often used to investigate Simpson’s Paradox
    Logistic Regression
    • Logistic regression fits the relationship between a continuous predictor and a categorical response variable
      – E.g. predict the gender of an unknown person based on their height
      – E.g. predict whether an animal will live or die based on the dose of a drug
    • The logistic regression curve reflects a change in the log odds for each one‐unit increase in the predictor variable
      – E.g. If an unknown person is 61 inches tall, their odds of being male are near zero
      – E.g. If an unknown person is 68 inches tall, their odds of being male are about 50‐50
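For the birth control and heart attack tables on this page, the effect of paneling by age can be checked numerically. This sketch compares the pooled odds ratio with the per-stratum odds ratios and the Mantel-Haenszel common odds ratio (the standard estimator, sum of a·d/n over sum of b·c/n). The assignment of the two panels to age groups is read from the flattened slide text and is an assumption; the odds ratios do not depend on it.

```python
# Each stratum is (a, b, c, d): birth control yes/no in rows, heart attack yes/no in columns
strata = {
    "Age < 40": (8, 32, 2, 8),
    "Age > 40": (8, 2, 32, 8),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Common odds ratio across strata, weighting each stratum by its size."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Collapsing over age reproduces the "All Ages" table
pooled = tuple(sum(cells) for cells in zip(*strata.values()))
pooled_or = odds_ratio(*pooled)
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
mh_or = mantel_haenszel_or(strata.values())

print("pooled table:", pooled, "pooled OR:", round(pooled_or, 4))
print("stratum ORs:", stratum_ors)
print("Mantel-Haenszel OR:", mh_or)
```

The pooled table suggests a strong association, while within each age group the odds ratio is 1: the apparent effect comes entirely from age, which is the Simpson's Paradox pattern the slide mentions.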
  • 9. “Long” Data Format
    • Each row of data represents one patient, animal or subject
    • Raw data format is useful when continuous covariates are unique to each subject or patient
      – E.g. Exact weight of each patient
      – E.g. Exact blood pressure, ...
    “Wide” Data Format
    • If each value of the continuous variable has been replicated, the data can be formatted as a summarized table
    • Summarized tables require less space and can be used in multiple models
      – Logistic regression models
      – Log‐linear models
      – Probit analysis
  • 10. Results from Logistic Regression
    • Whole model results
      – Likelihood Ratio Test (LRT)
      – Model fit diagnostics
    • Parameter estimates
      – Regression coefficients
      – Wald tests
    • Odds ratios
      – The odds of survival are 1.107 times higher after every one unit increase in log(dose)
      – Odds of survival are 12.794 times higher after every one unit increase in dose
    Why Use Both Wald and LRT?
    • Likelihood Ratio tests compare the fit of two statistical models
      – Most statistical models can be described with a likelihood function
      – A likelihood ratio test (LRT) computes the log‐likelihood function under a full model (dose and intercept) and reduced model (intercept) to test model fit
    • Wald tests evaluate the statistical significance of model parameters
      – Wald test statistics are constructed very similarly to Student’s t‐tests
      – Results from Wald test should be consistent with LRT results
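To show how the whole-model LRT and the Wald test relate, here is a minimal sketch that fits a one-predictor logistic regression by Newton-Raphson in plain Python. The grouped dose-response data are hypothetical (not from the talk), and real analyses would normally use a statistics package rather than hand-written fitting code.

```python
import math

# Hypothetical grouped data: log-dose, animals per group, deaths per group
log_dose = [0.0, 1.0, 2.0, 3.0, 4.0]
n_animals = [10, 10, 10, 10, 10]
n_died = [1, 3, 5, 7, 9]

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1):
    """Binomial log-likelihood (constant terms omitted) for logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, n, y in zip(log_dose, n_animals, n_died):
        p = inv_logit(b0 + b1 * x)
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total

def fit_logistic(iterations=25):
    """Newton-Raphson fit; returns (b0, b1, covariance matrix of the estimates)."""
    b0 = b1 = 0.0
    cov = None
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, y in zip(log_dose, n_animals, n_died):
            p = inv_logit(b0 + b1 * x)
            w = n * p * (1.0 - p)
            g0 += y - n * p            # score for the intercept
            g1 += (y - n * p) * x      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Inverse information matrix (evaluated just before this update; equal at convergence)
        cov = [[h11 / det, -h01 / det], [-h01 / det, h00 / det]]
        b0 += cov[0][0] * g0 + cov[0][1] * g1
        b1 += cov[1][0] * g0 + cov[1][1] * g1
    return b0, b1, cov

b0, b1, cov = fit_logistic()

# Likelihood ratio test: full model (intercept + dose) vs reduced model (intercept only)
p_null = sum(n_died) / sum(n_animals)
ll_null = log_likelihood(math.log(p_null / (1.0 - p_null)), 0.0)
lrt = 2.0 * (log_likelihood(b0, b1) - ll_null)
p_lrt = math.erfc(math.sqrt(lrt / 2.0))          # chi-square, 1 df

# Wald test for the slope: estimate divided by its standard error
wald_z = b1 / math.sqrt(cov[1][1])
p_wald = math.erfc(abs(wald_z) / math.sqrt(2.0))  # two-sided normal p-value

print("b0:", round(b0, 4), "b1:", round(b1, 4), "odds ratio per unit:", round(math.exp(b1), 4))
print("LRT:", round(lrt, 3), "p:", p_lrt)
print("Wald z:", round(wald_z, 3), "p:", p_wald)
```

With data like these both tests point the same way, which is the consistency the slide describes; they tend to diverge when samples are small or estimates are extreme.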
  • 11. Estimate LD50 from Logistic Regression
    • You can use interpolated values or inverse prediction to estimate LD50 from a logistic regression
    • Open the Inverse Prediction menu and enter Prob = 0.500 to estimate LD50 by finding X at Y = 0.500
      – Enter Prob = 0.90 for LD90, ...
    • You may need to antilog your LD50 estimate if your predictor is on the log scale (e.g. log10(dose))
    Compute LD50 from Parameter Estimates
    • Simple logistic regression is defined by the equation log(p / (1 – p)) = B0 + B1 × X
    • Therefore, by simple algebra, we find LD50 = -B0 / B1, because p = 0.5 makes log(p / (1 – p)) = 0
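The inverse prediction step can be written out in a few lines. This is a sketch with hypothetical coefficients B0 = -4 and B1 = 2 for a model fitted on log10(dose); the function name is illustrative.

```python
import math

def dose_at_probability(b0, b1, prob=0.5):
    """Inverse prediction: the X at which the fitted probability equals prob."""
    logit = math.log(prob / (1.0 - prob))
    return (logit - b0) / b1

# Hypothetical coefficients for a model fitted on log10(dose)
b0, b1 = -4.0, 2.0

log10_ld50 = dose_at_probability(b0, b1)        # reduces to -b0 / b1 when prob = 0.5
ld50 = 10 ** log10_ld50                         # antilog back to the original dose scale
log10_ld90 = dose_at_probability(b0, b1, 0.90)

print("log10(LD50):", log10_ld50, "LD50:", ld50)
print("log10(LD90):", round(log10_ld90, 4))
```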
  • 12. Reed‐Muench Method
    • Graphical estimate of LD50 from survival data
    • Plot total number of survivors and total number dead against dilution or concentration
    • Intersection represents best estimate of LD50
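A numerical counterpart of the graphical method can be sketched as follows. It assumes the usual Reed-Muench accumulation (an animal that died at a low dose is counted as dead at all higher doses, and a survivor at a high dose as alive at all lower doses) and linear interpolation of cumulative mortality between the two doses that bracket 50%. The data are hypothetical and the function name is illustrative.

```python
def reed_muench_log_ld50(log_doses, dead, survived):
    """Reed-Muench 50% endpoint on the log-dose scale (doses sorted ascending)."""
    k = len(log_doses)
    # Deaths accumulate from the lowest dose upward
    cum_dead = [sum(dead[: i + 1]) for i in range(k)]
    # Survivors accumulate from the highest dose downward
    cum_alive = [sum(survived[i:]) for i in range(k)]
    mortality = [d / (d + a) for d, a in zip(cum_dead, cum_alive)]
    for i in range(k - 1):
        below, above = mortality[i], mortality[i + 1]
        if below <= 0.5 <= above and above > below:
            # Proportionate distance between the two bracketing doses
            distance = (0.5 - below) / (above - below)
            return log_doses[i] + distance * (log_doses[i + 1] - log_doses[i])
    raise ValueError("50% mortality is not bracketed by the tested doses")

# Hypothetical experiment: 6 animals at each of four log10 doses
log_doses = [1.0, 2.0, 3.0, 4.0]
dead = [0, 2, 4, 6]
survived = [6, 4, 2, 0]

log_ld50 = reed_muench_log_ld50(log_doses, dead, survived)
print("log10(LD50):", log_ld50, "LD50:", 10 ** log_ld50)
```

The 50% cumulative mortality point is where the cumulative dead and cumulative survivor totals are equal, which corresponds to the intersection described on the slide.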
  • 13. Generalized Linear Models
    • Logistic regression, extensions of Pearson chi‐square tests and other models can be defined as generalized linear models (GLIM)
    • Each GLIM model is coerced into the form of a linear equation by choosing the correct statistical distribution and link function
    • Excluding logistic regression, most multifactor categorical models must be specified using the GLIM procedures in your software
    • GLIM procedures typically allow analysts to test for overdispersion, where real data has more variance than expected from the model
    Distribution Choices
    • Modeling categorical responses directly
      – Binomial and multinomial distributions
      – Negative binomial distribution
    • Modeling contingency table cell counts
      – Poisson distribution models all cell counts as rare events
      – Normal distribution models cell counts as common events
  • 14. Link Functions
    • Link functions are mathematical transformations used to coerce models into linear equations
      – The identity link function g(y) = y for linear models
      – The log link function g(y) = log(y) for log‐linear models
      – The logit link function g(y) = log(y / (1 – y)) for logistic regression models
      – The probit link function g(y) = Φ⁻¹(y), the inverse standard normal CDF, for probit analysis models
    Historic Models as GLIM
    • Logistic regression
      – Binomial distribution with logit link function
    • Probit analysis
      – Binomial distribution with probit link function
    • Log‐linear models
      – Poisson distribution with log link function
    • Negative Binomial regression
      – Negative binomial distribution with log link function
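The four link functions and their inverses can be written with the Python standard library alone. This is a sketch for illustration; the `LINKS` dictionary and its layout are hypothetical, not an API from any statistics package.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1

# Each entry maps a link name to (link, inverse link)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}

# The link maps a mean to the linear-predictor scale; the inverse maps it back
for name, (link, inverse) in LINKS.items():
    eta = link(0.25)
    print(name, "g(0.25) =", round(eta, 4), "back-transformed:", round(inverse(eta), 4))
```

The logit and probit links both map probabilities in (0, 1) onto the whole real line, which is why either can pair with the binomial distribution.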
  • 15. Overdispersion Parameters
    • Traditional linear models, like linear regression, use independent parameters to estimate the variance of the response data
      – E.g. linear regression has independent mean μ = Xβ and variance σ²
    • Many GLIM models, like logistic regression, have fixed relationships between the variance and other model parameters
      – E.g. logistic regression has mean μ = np and variance σ² = np(1 – p)
      – E.g. log‐linear models have μ = σ² = λ = np for rare event with small p
    • Overdispersion parameters are used to account for extra variability in the responses, which cannot be explained by the model
      – E.g. logistic regression modeled with variance σ² = φnp(1 – p)
      – Want to know if multiplier φ > 2 to determine significance or importance
    Generalized Linear Mixed Models
    • Generalized linear models can be advanced further by including random effect variables
      – These models are called generalized linear mixed models (GLMM)
      – Random effect variables are included to account for paired designs, repeated measures designs, split‐plot designs and other effects
      – GLMM are typically fit using generalized estimating equations (GEE), often using linearization techniques (e.g. SAS PROC GLIMMIX)
    • Sometimes complicated GLM and GLMM must be fit using nonlinear modeling procedures in your software
      – Probit model with binomial errors or Poisson loss function models in JMP
      – Probit‐Normal models and Poisson‐Normal models in SAS PROC NLMIXED
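One common way to estimate the multiplier φ for binomial data is the Pearson chi-square statistic divided by its residual degrees of freedom. The sketch below assumes that estimator (the talk does not say which one its software uses) and hypothetical fitted probabilities from a two-parameter model.

```python
def pearson_dispersion(n_trials, successes, fitted_p, n_params):
    """Estimate phi as Pearson chi-square / (number of groups - number of parameters)."""
    chi2 = sum((y - n * p) ** 2 / (n * p * (1.0 - p))   # (observed - mean)^2 / binomial variance
               for n, y, p in zip(n_trials, successes, fitted_p))
    residual_df = len(n_trials) - n_params
    return chi2 / residual_df

# Hypothetical grouped data and fitted probabilities
n_trials = [10, 10, 10]
successes = [0, 5, 10]
fitted_p = [0.2, 0.5, 0.8]

phi = pearson_dispersion(n_trials, successes, fitted_p, n_params=2)
print("estimated overdispersion multiplier:", round(phi, 4))
```

A value near 1 means the data vary about as much as the binomial model expects; here the estimate is well above the φ > 2 rule of thumb on the slide.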
  • 16. Random vs. Fixed Effects
    • Subject effects are random because the subjects in an experiment are a sample from the population of all possible subjects
    • Gender effects are fixed because there are only two genders
    Split‐plot Design
    • Example: 12 mice: 6 infected, 6 uninfected (3 infected males, 3 infected females, …)
      – 4 samples taken from each mouse
      – Each sample treated with one of 2 different drugs
      – Whole plot (mouse) EUs: infection, gender
      – Subplot (sample) EUs: drug treatment
    • Split‐plot design experiments model experiments where whole plots and subplots represent different experimental units (EUs)
      – Whole plots are often locations, subjects, objects or factors that are difficult to change (e.g. temperature in an incubator)
      – Subplot effects are typically the effects of highest interest
      – Subplot effects are tested with higher power than whole plot effects
  • 17. References
    • Agresti A. 2002. Categorical Data Analysis. Second Ed. Wiley‐Interscience.
    • Reed LJ and H Muench. 1938. A Simple Method of Estimating Fifty Percent Endpoints. The American Journal of Hygiene. 27(3):493‐497.
    • SAS Institute Inc. 2007. SAS 9.1.3 Documentation. Cary, NC. SAS Institute Inc.
    • SAS Institute Inc. 2010. JMP Statistics and Graphics Guide. Cary, NC. SAS Institute Inc.