- 1. Logistic Regression in Case-Control Studies Using R: A Statistical Tool (Satish Gupta)
- 2. What is R? The R statistical programming language is free, open-source software. The language is well suited to writing programs, and many statistical functions are already built in. Contributed packages extend its functionality to cutting-edge research methods.
- 3. Getting Started: go to www.r-project.org and open the download page on CRAN (Comprehensive R Archive Network). Set your mirror to a location close to you, then select the build for your platform (Windows, macOS or Linux/UNIX).
- 4. Getting Started
- 5. Basic operators and calculations: comparison operators
  - equal: `==`
  - not equal: `!=`
  - greater/less than: `>`, `<`
  - greater/less than or equal: `>=`, `<=`
  - Example: `1 == 1` returns `TRUE`
- 6. Basic operators and calculations: logical operators
  - AND (`&`): `x <- 1:10; y <- 10:1` creates the sample vectors `x` and `y`; `x > y & x > 5` returns `TRUE` where both comparisons are `TRUE`.
  - OR (`|`): `x == y | x != y` returns `TRUE` where at least one comparison is `TRUE`.
  - NOT (`!`): `!(x > y)` returns the negation (opposite) of a logical vector. Because `!` has lower precedence than `>`, writing `!x > y` gives the same result.
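The operators on slides 5 and 6 can be tried directly on the slide's vectors. This is a minimal sketch; the names `eq`, `gt`, `both`, `either` and `not_greater` are illustrative only. It also shows that `!x > y` is parsed as `!(x > y)`.

```r
# Sample vectors from the slide
x <- 1:10
y <- 10:1

# Comparison operators work element by element and return logical vectors
eq <- x == y          # all FALSE: x and y never hold the same value at the same position
gt <- x > y           # TRUE for positions 6 to 10

# Element-wise AND and OR
both   <- x > y & x > 5     # TRUE where both comparisons are TRUE
either <- x == y | x != y   # TRUE everywhere: each pair is either equal or not equal

# '!' has lower precedence than '>', so !x > y is parsed as !(x > y)
not_greater <- !(x > y)
print(identical(!x > y, not_greater))   # TRUE
```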
- 7. Basic operators and calculations: calculations
  - The four basic arithmetic operations: `1 + 1; 1 - 1; 1 * 1; 1 / 1` return the results of addition, subtraction, multiplication and division.
  - Calculations on vectors: `x <- 1:10; sum(x); mean(x); sd(x); sqrt(x)` computes the sum, mean and standard deviation of `x`, and the square root of each element of `x`.
  - `x <- 1:10; y <- 1:10; x + y` adds the two vectors element by element.
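A short sketch of the vector calculations on slide 7, storing each result in an illustrative variable so the values can be inspected. Statements sharing one line must be separated by `;`.

```r
x <- 1:10
y <- 1:10

# Several statements on one line are separated by ';' (a ',' would be a syntax error)
total <- sum(x); avg <- mean(x); s <- sd(x)
total   # 55
avg     # 5.5
s       # sample standard deviation (denominator n - 1), about 3.03

roots <- sqrt(x)   # element-wise: one square root per element, 10 values
xy    <- x + y     # element-wise sum: 2 4 6 ... 20
```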
- 8. R-Graphics: R provides comprehensive graphics utilities for visualizing and exploring scientific data, including scatter plots, line plots, bar plots, pie charts, heatmaps, Venn diagrams, density plots and box plots.
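A minimal base-R sketch of a few of these plot types. The data are simulated and hypothetical (a measurement in 20 controls and 20 cases), and the plots are written to a PDF file in a temporary folder rather than shown on screen, which is an assumption made so the example also runs non-interactively.

```r
# Hypothetical data: a continuous measurement in 20 controls and 20 cases
set.seed(1)
status <- factor(rep(c("Control", "Case"), each = 20))
value  <- c(rnorm(20, mean = 80, sd = 10), rnorm(20, mean = 70, sd = 10))

# Write the plots to a PDF file in a temporary folder instead of the screen
out <- file.path(tempdir(), "plots.pdf")
pdf(out)
boxplot(value ~ status, ylab = "Measurement")                        # box plot by group
plot(density(value), main = "Density of measurement")                # density plot
barplot(table(status), ylab = "Count")                               # bar plot of group sizes
plot(seq_along(value), value, xlab = "Index", ylab = "Measurement")  # scatter plot
invisible(dev.off())                                                 # close the PDF device
```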
- 9. Data handling in R
  - Load data: `mydata = read.csv("/path/mydata.csv")`
  - Display the data on screen: `print(mydata)` or `View(mydata)`
  - See the top part of the data: `head(mydata)`
  - Specific rows and columns of the data: `mydata[1:10, 1:3]`
  - Get the class of the data: `class(mydata)`
  - Change the class of the data: `newdata = as.matrix(mydata)`
  - Summary of the data: `summary(mydata)`
  - Selecting (KEEPING) variables (columns): `newdata = mydata[c(1, 3:5)]`
- 10. Data handling in R
  - Selecting observations: `newdata = subset(mydata, age >= 20 | age < 10, select = c(ID, weight))` and `newdata = subset(mydata, sex == "Male" & age > 25, select = weight:income)`
  - Excluding (DROPPING) variables (columns): `newdata = mydata[c(-3, -5)]` or `mydata$v3 = NULL`
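The data-handling commands on slides 9 and 10 can be run end to end on a small made-up data set. In this sketch the columns `ID`, `sex`, `age`, `weight` and `income` and their values are hypothetical; the CSV file is written to a temporary folder because `/path/mydata.csv` is only a placeholder.

```r
# Hypothetical example data (the slide's /path/mydata.csv is not available here)
mydata <- data.frame(
  ID     = 1:6,
  sex    = c("Male", "Female", "Male", "Female", "Male", "Female"),
  age    = c(8, 22, 30, 45, 19, 27),
  weight = c(25, 60, 80, 70, 65, 58),
  income = c(0, 20, 35, 50, 5, 30)
)

# Round trip through a CSV file in a temporary folder
path <- file.path(tempdir(), "mydata.csv")
write.csv(mydata, path, row.names = FALSE)
mydata <- read.csv(path)

head(mydata)        # first rows
mydata[1:3, 1:3]    # rows 1-3, columns 1-3
class(mydata)       # "data.frame"
summary(mydata)     # per-column summaries

# Keep columns 1 and 3 to 5
kept <- mydata[c(1, 3:5)]

# Rows with age >= 20 or age < 10, keeping only ID and weight
sel1 <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# Males older than 25, keeping the columns from weight to income
sel2 <- subset(mydata, sex == "Male" & age > 25, select = weight:income)

# Drop columns 3 and 5 by position, or one column by name
dropped <- mydata[c(-3, -5)]
mydata$income <- NULL
```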
- 11. R-Library: many tools for different kinds of analysis, including analysis of genetic and genomic data, are distributed as packages. Depending on where a package is hosted, it can be installed from one of two sources:
  - CRAN (Comprehensive R Archive Network): `install.packages("package_name")`
  - Bioconductor: `source("http://bioconductor.org/biocLite.R")` followed by `biocLite("package_name")`; current Bioconductor releases replace this with `install.packages("BiocManager")` and `BiocManager::install("package_name")`.
- 12. R-Library: loading a package
  - `library()` lists all libraries/packages that are available on the system.
  - `library(genetics)` loads the package for genetics data analysis.
  - `library(help = genetics)` lists all functions/objects of the "genetics" package.
  - `?function` opens the documentation of a function.
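A common pattern is to install a package only when it is missing and then load it. This sketch uses `survival` as the example package because it ships with standard R installations and provides the `clogit()` function used later in the slides; the variable `pkg` is illustrative.

```r
# Install the package only if it is not already available, then load it
pkg <- "survival"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}
library(pkg, character.only = TRUE)

# First few functions/objects exported by the package
head(ls("package:survival"))

# Bioconductor packages (current releases) go through BiocManager instead, e.g.:
# install.packages("BiocManager"); BiocManager::install("package_name")
```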
- 13. What is Logistic Regression? Logistic regression describes the relationship between a dichotomous (two-category) response variable and a set of explanatory variables. It is used because the relationship between a discrete dependent variable (DV) and a predictor is non-linear on the probability scale; the model is instead linear in the log-odds (logit) of the outcome.
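- 14. A General Model: Logistic Regression

  $$\operatorname{logit}(p_{\text{disease}}) = \log\left(\frac{p_{\text{disease}}}{1 - p_{\text{disease}}}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_J X_J$$

  where:
  - $p_{\text{disease}}$ is the probability that an individual has a particular disease
  - $\beta_0$ is the intercept
  - $\beta_1, \beta_2, \dots, \beta_J$ are the coefficients (effects) of the genetic factors
  - $X_1, X_2, \dots, X_J$ are the variables for the genetic factors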
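The model on slide 14 can be checked numerically with base R's `plogis()` (inverse logit) and `qlogis()` (logit). The coefficients below are hypothetical values for a single binary genetic factor; the sketch shows that the odds ratio between $X_1 = 1$ and $X_1 = 0$ equals $\exp(\beta_1)$.

```r
# Hypothetical coefficients for one binary genetic factor X1 (0 = absent, 1 = present)
beta0 <- -1.0
beta1 <- 0.7

eta <- beta0 + beta1 * c(0, 1)   # linear predictor = log-odds for X1 = 0 and X1 = 1
p   <- plogis(eta)               # inverse logit: exp(eta) / (1 + exp(eta))
qlogis(p)                        # logit, log(p / (1 - p)), recovers eta

# Odds ratio comparing X1 = 1 with X1 = 0 equals exp(beta1)
odds <- p / (1 - p)
or   <- odds[2] / odds[1]
c(or = or, exp_beta1 = exp(beta1))
```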
- 15. Assumptions: logistic regression does not assume that the independent variables are normally distributed, linearly related to the outcome, or of equal variance across groups (it does assume a linear relationship between continuous predictors and the log-odds of the outcome). Because it does not impose these requirements, it is preferred over discriminant analysis when the data do not satisfy them.
- 16. Questions
  - What is the relative importance of each predictor variable?
  - How does each predictor variable affect the outcome?
  - Does a predictor variable make the solution better or worse, or have no effect?
  - Are there interactions among predictors?
  - Does adding interactions among predictors (continuous or categorical) improve the model?
  - What is the strength of association between the outcome variable and a set of predictors? In model comparison a non-significant difference is often the desired result, so strength of association is reported even for non-significant effects.
- 17. Types of Logistic Regression: unconditional logistic regression and conditional logistic regression
  - Rule of thumb: use conditional logistic regression if matching has been done, and unconditional logistic regression if there has been no matching.
  - When in doubt, use conditional logistic regression, because it gives valid (unbiased) estimates for matched data; the unconditional method is said to overestimate the odds ratio when it is not appropriate.
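The overestimation mentioned on slide 17 can be reproduced with a constructed 1:1 matched data set. For matched pairs with a binary exposure, conditional logistic regression estimates the OR as (pairs with only the case exposed) / (pairs with only the control exposed), while unconditional logistic regression with one dummy variable per matched set is known to give the square of that value. The pair counts below are hypothetical; `clogit()` comes from the survival package.

```r
library(survival)   # provides clogit()

# Hypothetical 1:1 matched pairs with one binary exposure x.
# Pair types: both exposed (10), only case exposed (30), only control exposed (10), neither (20)
case_x    <- c(rep(1, 10), rep(1, 30), rep(0, 10), rep(0, 20))
control_x <- c(rep(1, 10), rep(0, 30), rep(1, 10), rep(0, 20))
n_pairs   <- length(case_x)   # 70 matched sets

d <- data.frame(
  matset = rep(seq_len(n_pairs), times = 2),
  status = rep(c(1, 0), each = n_pairs),   # 1 = case, 0 = control
  x      = c(case_x, control_x)
)

# Conditional logistic regression: conditions on the matched sets
fit_c <- clogit(status ~ x + strata(matset), data = d)
# Unconditional logistic regression with one dummy per matched set
fit_u <- glm(status ~ x + factor(matset), family = binomial, data = d)
# Unconditional logistic regression that ignores the matching
fit_crude <- glm(status ~ x, family = binomial, data = d)

exp(coef(fit_c)[["x"]])       # 30 / 10 = 3
exp(coef(fit_u)[["x"]])       # about 3^2 = 9
exp(coef(fit_crude)[["x"]])   # crude OR from the pooled 2 x 2 table, about 3.33
```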
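- 18. Data Format (Status: 1 = case, 0 = control; Matset: matched-set ID)

  | Status | Matset | Se_Quartiles | GPX1 | GPX4 | SEP15 | TXN2 |
  |---|---|---|---|---|---|---|
  | 1 | 1 | <60 | CT | TT | AG | AG |
  | 0 | 1 | >60–70 | CC | CC | GG | GG |
  | 1 | 2 | <60 | TT | CC | AG | AA |
  | 0 | 2 | >70–80 | CC | CT | GG | GG |
  | 1 | 3 | >80 | CC | CC | AA | AA |
  | 0 | 3 | >60–70 | CT | TT | GG | GG |
  | 1 | 4 | <60 | CC | CC | AA | AG |
  | 0 | 4 | >70–80 | TT | TT | GG | GG |
  | 1 | 5 | >80 | CC | CC | AG | AA |
  | 0 | 5 | <60 | CC | CC | GG | GG |
  | 1 | 6 | >70–80 | CT | TT | AA | AA |
  | 0 | 6 | >80 | CC | CC | GG | AG |
  | 1 | 7 | >60–70 | TT | CC | AA | AG |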
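The rows shown on slide 18 can be typed into R as a data frame to check the layout. In this sketch the en dashes in the quartile labels are replaced by hyphens, and the factor level order (`<60` and `CC` first, so they act as reference categories) is an assumption chosen to match the reference levels in the later output.

```r
# The rows shown on the slide, typed in as a data frame
dat <- data.frame(
  Status = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Matset = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7),
  Se_Quartiles = c("<60", ">60-70", "<60", ">70-80", ">80", ">60-70", "<60",
                   ">70-80", ">80", "<60", ">70-80", ">80", ">60-70"),
  GPX1  = c("CT", "CC", "TT", "CC", "CC", "CT", "CC", "TT", "CC", "CC", "CT", "CC", "TT"),
  GPX4  = c("TT", "CC", "CC", "CT", "CC", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "CC"),
  SEP15 = c("AG", "GG", "AG", "GG", "AA", "GG", "AA", "GG", "AG", "GG", "AA", "GG", "AA"),
  TXN2  = c("AG", "GG", "AA", "GG", "AA", "GG", "AG", "GG", "AA", "GG", "AA", "AG", "AG")
)

# Fix the category order so that the first level is the reference category
dat$Se_Quartiles <- factor(dat$Se_Quartiles, levels = c("<60", ">60-70", ">70-80", ">80"))
dat$GPX1 <- factor(dat$GPX1, levels = c("CC", "CT", "TT"))

str(dat)
tab <- table(Status = dat$Status, GPX1 = dat$GPX1)   # genotype counts in cases (1) and controls (0)
tab
```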
- 19. Data and Library loading
  - Load the data into R (lung cancer data from PLoS One 2013, 8(3):e59051): `lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)`
  - Load the library and make the data available for analysis: `library(epicalc)`; `use(lung)`
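Because the file on slide 19 is tab-separated, the separator must be written `"\t"`; a bare `"t"` would split fields on the letter t. This sketch writes a tiny hypothetical file (`lung_small` is made up, not the PLoS One data) to a temporary folder and reads it back both with `read.csv(sep = "\t")` and with `read.delim()`, which uses a tab separator and a header row by default.

```r
# Hypothetical tiny tab-separated file with the same kind of columns
lung_small <- data.frame(Status = c(1, 0, 1, 0),
                         Matset = c(1, 1, 2, 2),
                         GPX1   = c("CT", "CC", "TT", "CC"))
path <- file.path(tempdir(), "lung.tsv")
write.table(lung_small, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back: a tab is written "\t"
lung <- read.csv(path, sep = "\t", header = TRUE)
# read.delim() is the same reader with sep = "\t" and header = TRUE as defaults
lung2 <- read.delim(path)
identical(lung, lung2)   # TRUE
str(lung)
```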
- 20. Data Analysis: performing conditional logistic regression (case vs. control)
  - `clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)`
  - `clogistic.display(clogit_lung)`

  | Variable | OR (95% CI) | P (Wald's test) | P (LR-test) |
  |---|---|---|---|
  | Quartiles: ref. = <60 | | | <0.001 |
  | >60–70 | 0.4 (0.15–1.09) | 0.074 | |
  | >70–80 | 0.11 (0.03–0.33) | <0.001 | |
  | >80 | 0.10 (0.03–0.34) | <0.001 | |
- 21. Data Analysis: performing conditional logistic regression (case vs. control)
  - `clogit_lung = clogit(Status ~ GPX1 + strata(Matset), data = .data)`
  - `clogistic.display(clogit_lung)`

  | Variable | OR (95% CI) | P (Wald's test) | P (LR-test) |
  |---|---|---|---|
  | GPX1: ref. = CC | | | 0.032 |
  | CT | 0.44 (0.22–0.86) | 0.017 | |
  | TT | 0.42 (0.13–1.38) | 0.151 | |
- 22. Data Analysis: performing conditional logistic regression (case vs. control) with an environmental factor (Se_Quartiles) and a genetic factor (GPX1)
  - `clogit_lung = clogit(Status ~ Se_Quartiles + GPX1 + strata(Matset), data = .data)`
  - `clogistic.display(clogit_lung)`

  | Variable | crude OR (95% CI) | adj. OR (95% CI) | P (Wald's test) | P (LR-test) |
  |---|---|---|---|---|
  | Quartiles: ref. = <60 | | | | <0.001 |
  | >60–70 | 0.4 (0.15–1.09) | 0.32 (0.11–0.96) | 0.042 | |
  | >70–80 | 0.11 (0.03–0.33) | 0.09 (0.02–0.3) | <0.001 | |
  | >80 | 0.1 (0.03–0.34) | 0.05 (0.01–0.23) | <0.001 | |
  | GPX1: ref. = CC | | | | 0.006 |
  | CT | 0.44 (0.22–0.86) | 0.26 (0.11–0.65) | 0.004 | |
  | TT | 0.42 (0.13–1.38) | 0.44 (0.09–2.18) | 0.313 | |
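The crude and adjusted ORs on slides 20 to 22 can also be computed with `clogit()` from the survival package alone, without the epicalc display functions. This is a sketch on simulated, hypothetical 1:1 matched data (not the lung cancer data), so its numbers will not match the slides. `or_table()` is a hypothetical helper that computes Wald 95% CIs and p-values from `coef()` and `vcov()`; the LR test for GPX1 compares nested models fitted on the same data.

```r
library(survival)   # clogit() is provided by the survival package

# Hypothetical simulated 1:1 matched data (one case and one control per set)
set.seed(42)
n_sets <- 150
d <- data.frame(
  Matset = rep(seq_len(n_sets), each = 2),
  Status = rep(c(1, 0), times = n_sets),           # 1 = case, 0 = control
  GPX1 = factor(sample(c("CC", "CT", "TT"), 2 * n_sets, replace = TRUE,
                       prob = c(0.5, 0.4, 0.1)),
                levels = c("CC", "CT", "TT")),
  Se_Quartiles = factor(sample(c("<60", ">60-70", ">70-80", ">80"), 2 * n_sets, replace = TRUE),
                        levels = c("<60", ">60-70", ">70-80", ">80"))
)

# Hypothetical helper: ORs, Wald 95% CIs and Wald p-values from coef() and vcov()
or_table <- function(fit) {
  b  <- coef(fit)
  se <- sqrt(diag(vcov(fit)))
  z  <- qnorm(0.975)
  data.frame(OR = exp(b), lower = exp(b - z * se), upper = exp(b + z * se),
             p_wald = 2 * pnorm(-abs(b / se)))
}

crude    <- clogit(Status ~ GPX1 + strata(Matset), data = d)
adjusted <- clogit(Status ~ Se_Quartiles + GPX1 + strata(Matset), data = d)
reduced  <- clogit(Status ~ Se_Quartiles + strata(Matset), data = d)

round(or_table(crude), 3)      # crude ORs for GPX1 (reference: CC)
round(or_table(adjusted), 3)   # ORs adjusted for Se_Quartiles

# Likelihood-ratio test for GPX1 as a whole: adjusted model vs. the model without GPX1
lr   <- 2 * (adjusted$loglik[2] - reduced$loglik[2])
p_lr <- pchisq(lr, df = 2, lower.tail = FALSE)
c(LR = lr, p = p_lr)
```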
- 23. Data Analysis: performing unconditional logistic regression (case vs. control)
  - `ulogit_lung = glm(Status ~ Se_Quartiles, family = binomial, data = .data)`
  - `logistic.display(ulogit_lung)`

  | Variable | OR (95% CI) | P (Wald's test) | P (LR-test) |
  |---|---|---|---|
  | Quartiles: ref. = <60 | | | <0.001 |
  | >60–70 | 0.41 (0.17–1.02) | 0.054 | |
  | >70–80 | 0.13 (0.05–0.34) | <0.001 | |
  | >80 | 0.17 (0.07–0.42) | <0.001 | |
- 24. Data Analysis: performing unconditional logistic regression (case vs. control)
  - `ulogit_lung = glm(Status ~ GPX1, family = binomial, data = .data)`
  - `logistic.display(ulogit_lung)`

  | Variable | OR (95% CI) | P (Wald's test) | P (LR-test) |
  |---|---|---|---|
  | GPX1: ref. = CC | | | 0.034 |
  | CT | 0.45 (0.24–0.85) | 0.014 | |
  | TT | 0.44 (0.14–1.36) | 0.156 | |
- 25. Data Analysis: performing unconditional logistic regression (case vs. control) with both factors
  - `ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family = binomial, data = .data)`
  - `logistic.display(ulogit_lung)`

  | Variable | crude OR (95% CI) | adj. OR (95% CI) | P (Wald's test) | P (LR-test) |
  |---|---|---|---|---|
  | Quartiles: ref. = <60 | | | | <0.001 |
  | >60–70 | 0.41 (0.17–1.02) | 0.43 (0.17–1.08) | 0.074 | |
  | >70–80 | 0.13 (0.05–0.34) | 0.13 (0.05–0.34) | <0.001 | |
  | >80 | 0.17 (0.07–0.42) | 0.15 (0.06–0.39) | <0.001 | |
  | GPX1: ref. = CC | | | | 0.024 |
  | CT | 0.45 (0.24–0.85) | 0.40 (0.20–0.80) | 0.01 | |
  | TT | 0.44 (0.14–1.36) | 0.42 (0.12–1.41) | 0.161 | |
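The quantities reported by `logistic.display()` on slides 23 to 25 can be obtained from a base-R `glm()` fit: exponentiated coefficients give the ORs, `confint.default()` gives Wald intervals on the log-odds scale, the coefficient table gives Wald p-values, and `drop1(test = "LRT")` gives one likelihood-ratio test per variable. The unmatched data below are simulated and hypothetical (including an assumed true OR of 0.5 for CT carriers), so the numbers will not match the slides.

```r
# Hypothetical simulated unmatched case-control data
set.seed(7)
n <- 400
d <- data.frame(
  GPX1 = factor(sample(c("CC", "CT", "TT"), n, replace = TRUE, prob = c(0.5, 0.4, 0.1)),
                levels = c("CC", "CT", "TT")),
  Se_Quartiles = factor(sample(c("<60", ">60-70", ">70-80", ">80"), n, replace = TRUE),
                        levels = c("<60", ">60-70", ">70-80", ">80"))
)
# Simulated case status; the lower risk for CT carriers (OR = 0.5) is an assumption
lin <- -0.2 + log(0.5) * (d$GPX1 == "CT")
d$Status <- rbinom(n, size = 1, prob = plogis(lin))

fit <- glm(Status ~ Se_Quartiles + GPX1, family = binomial, data = d)

# Adjusted ORs with Wald 95% CIs (the intercept row is not an OR and is dropped)
or_ci <- exp(cbind(OR = coef(fit), confint.default(fit)))[-1, ]
round(or_ci, 2)

# Wald p-values for each coefficient
summary(fit)$coefficients[-1, "Pr(>|z|)"]

# Likelihood-ratio test for each variable as a whole (all of its levels together)
lrt <- drop1(fit, test = "LRT")
lrt
```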
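- 26. Something More
  - Changing the default reference category: `GPX1 = relevel(GPX1, ref = "TT")`, then `pack()` so that the modified variable is stored back in `.data`.
  - Saving the result: `result = clogistic.display(clogit_lung)`, then `write.csv(result$table, file = "path/result.csv")` for a comma-separated file (`write.csv()` ignores a `sep` argument), or `write.table(result$table, file = "path/result.xls", sep = "\t")` for a tab-separated text file that spreadsheet programs can open (it is not a native Excel workbook).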
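A base-R sketch of both steps on slide 26. The genotype vector is hypothetical, the small `result` table reuses the two crude ORs from slide 21 as placeholder values, and the files go to a temporary folder. By default R orders factor levels alphabetically, so `CC` is the reference until `relevel()` moves `TT` to the front.

```r
# Hypothetical genotype vector
GPX1 <- factor(c("CC", "CT", "TT", "CC", "CT"))
levels(GPX1)                        # "CC" "CT" "TT": CC is the default reference
GPX1 <- relevel(GPX1, ref = "TT")
levels(GPX1)                        # "TT" "CC" "CT": TT is now the reference

# Placeholder results table (ORs copied from slide 21)
result <- data.frame(OR = c(0.44, 0.42), row.names = c("CT", "TT"))

out_csv <- file.path(tempdir(), "result.csv")
out_txt <- file.path(tempdir(), "result.txt")
write.csv(result, file = out_csv)   # always comma-separated
# Tab-separated; col.names = NA adds a blank header above the row-name column
write.table(result, file = out_txt, sep = "\t", quote = FALSE, col.names = NA)
```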
- 27. Summary: regression models
  - Regression models can describe the average effect of predictors on outcomes in your data set.
  - They can indicate how likely it is that an observed effect is due to chance alone.
  - They can look at each predictor "adjusting for" the others (estimating what would happen if all the others were held constant).
- 28. Thanks to Prof. Virasakdi Chongsuvivatwong, Epidemiology Unit, Faculty of Medicine, Prince of Songkla University, Thailand
