- 1. Analysis of Time-series Data: Case-crossover Study. Jinseob Kim, July 17, 2015
- 2. Contents
  1. Concepts: Individual data; Design
  2. Conditional logistic regression: Review: Basic linear regression; Logistic regression; Conditional logistic regression
  3. Practice: Issues; In R
- 3. Objective
  1. Individual risk vs. population risk
  2. The concept of the case-crossover design
  3. Caveats
  4. Application: the season package in R
- 4. Concepts
- 5. Two approaches to studying the relationship between weather and health outcomes
  - Population-based study
    - Y: number of events (daily death counts or number of hospital admissions)
    - X: temperature
    - Estimates population risk (% change in daily death counts corresponding to a change in temperature)
  - Individual-based study
    - Y: 1 if an event occurs, 0 otherwise
    - X: temperature
    - Estimates individual risk (% change in an individual's probability of an event, or the odds ratio, corresponding to a change in temperature)
- 6. Data structure change
  - Aggregated form (year, week, case): (2006, 1, 20) is a single record holding 20 cases.
  - Individual form (year, week, event): (2006, 1, 1), (2006, 1, 1), ..., (2006, 1, 1), that is, 20 case records.
  - Controls: (2005, 53, 0), ..., (2005, 53, 0), (2006, 2, 0), ..., (2006, 2, 0).
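The reshaping described on this slide can be sketched in R. The data frame `weekly` and its values are hypothetical and only illustrate the expansion of counts into one row per event; control rows with `event = 0` from neighbouring weeks would be appended in the same way.

```r
# Hypothetical aggregated data: one row per (year, week) with a case count.
weekly <- data.frame(year = c(2005, 2006, 2006),
                     week = c(53, 1, 2),
                     case = c(3, 20, 5))

# Expand to one row per individual event: repeat each row `case` times.
individual <- weekly[rep(seq_len(nrow(weekly)), times = weekly$case),
                     c("year", "week")]
individual$event <- 1L        # every expanded row is one observed event
rownames(individual) <- NULL

nrow(individual)              # total rows equals the total number of cases
```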
- 7. Case + Crossover
  - Case: only patients (cases) are used.
  - Crossover: the same patient at other time points serves as the control.
- 8. If the average air pollution on control days is lower than the average on case days, we conclude that the event is associated with higher values of air pollution.
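As a toy illustration of this comparison, the sketch below contrasts exposure on one person's case day with the mean over that person's control days. All numbers are made up, and a real analysis would use conditional logistic regression rather than a raw difference.

```r
# Hypothetical air-pollution readings (arbitrary units) for one person:
# the day of the event and that person's own control days.
case_day_exposure     <- 42
control_day_exposures <- c(30, 35, 28, 33)

# Crude comparison from the slide: is exposure higher on the case day
# than on the control days, on average?
difference <- case_day_exposure - mean(control_day_exposures)
difference > 0
```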
- 9. Various control-day schemes: correcting for bias caused by time trends.
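One common scheme is time-stratified selection, where controls are taken from the same fixed time window as the case day. The function below is a minimal sketch of that idea, not the season package's implementation; `control_days`, `origin`, and the choice of matching on day of the week are assumptions made for illustration.

```r
# Sketch of time-stratified control selection.
# Assumptions: strata are consecutive 28-day blocks counted from `origin`,
# and controls are the other days in the case day's stratum that fall on
# the same day of the week.
control_days <- function(case_date, origin, stratum_length = 28) {
  offset        <- as.integer(case_date - origin)           # days since origin
  stratum_start <- origin + (offset %/% stratum_length) * stratum_length
  candidates    <- stratum_start + 0:(stratum_length - 1)   # all days in stratum
  same_weekday  <- as.integer(candidates - case_date) %% 7 == 0
  candidates[same_weekday & candidates != case_date]
}

origin <- as.Date("1987-01-01")
control_days(as.Date("1987-01-15"), origin)
```

Because case and control days share a short window, slow time trends affect both in roughly the same way and largely cancel.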
- 10. Conditional logistic regression
- 11. Recall how β is estimated in linear regression
  1. Ordinary least squares (OLS): semi-parametric
  2. Maximum likelihood estimator (MLE): parametric
- 12. Least squares: minimize the sum of squares. No normality assumption on y is needed. Figure: OLS fitting.
- 13. Likelihood vs. probability
  - Discrete: likelihood = probability. The probability of rolling a 1 with a die is 1/6.
  - Continuous: likelihood ≠ probability. The probability of drawing exactly 0.7 from the interval 0 to 1 is 0.
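The distinction can be checked numerically. The sketch below assumes a fair die and a Uniform(0, 1) draw, which is one reading of "pick a number between 0 and 1".

```r
# Discrete case: the likelihood of an outcome equals its probability.
p_one <- 1 / 6                          # P(fair die shows 1)

# Continuous case: a Uniform(0, 1) draw has density 1 at 0.7,
# yet the probability of hitting exactly 0.7 is 0.
density_at_07    <- dunif(0.7, min = 0, max = 1)
prob_exactly_07  <- punif(0.7) - punif(0.7)   # P(0.7 <= X <= 0.7)

c(density = density_at_07, probability = prob_exactly_07)
```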
- 14. Maximum likelihood estimator (MLE). Assume observations 1, ..., n are mutually independent.
  1. Find the likelihood function of each observation.
  2. Multiply all the likelihoods to get the likelihood of the whole data set (valid because of independence).
  3. Find the β that maximizes the likelihood.
- 15. MLE: maximize the likelihood of the observed data. A distributional assumption on y is required.
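The three MLE steps, and the contrast with least squares, can be sketched on simulated data. The sketch assumes normally distributed errors, under which the MLE of β coincides with the OLS estimate; the data and starting values are arbitrary.

```r
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)    # simulated data, true beta = (2, 3)

# Steps 1-2: per-observation normal log-likelihoods, summed over
# independent observations (sum of logs = log of the product).
neg_loglik <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])                  # log scale keeps sigma positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Step 3: maximise the likelihood (minimise its negative) over beta.
mle <- optim(c(0, 0, 0), neg_loglik, method = "BFGS")
ols <- lm(y ~ x)                        # least squares needs no dnorm() at all

rbind(MLE = mle$par[1:2], OLS = unname(coef(ols)))
```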
- 16. Logistic function: MLE. Figure: fitting a logistic function.
- 17. LRT? Wald? Score? Likelihood ratio test vs. Wald test vs. score test
  1. All three are methods for judging statistical significance.
  2. They compare, respectively, likelihoods, β values, and slopes.
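For a logistic model, all three tests are available in base R. The sketch below uses simulated data; the coefficients and sample size are arbitrary choices for illustration.

```r
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 0.8 * x))  # simulated data

fit0 <- glm(y ~ 1, family = binomial)   # null model (beta1 = 0)
fit1 <- glm(y ~ x, family = binomial)   # model with x

# Wald test: compares the estimate of beta1 with its standard error.
p_wald  <- coef(summary(fit1))["x", "Pr(>|z|)"]
# Likelihood ratio test: compares the two maximised likelihoods.
p_lrt   <- anova(fit0, fit1, test = "LRT")[2, "Pr(>Chi)"]
# Score (Rao) test: uses the slope of the log-likelihood at the null.
p_score <- anova(fit0, fit1, test = "Rao")[2, "Pr(>Chi)"]

c(Wald = p_wald, LRT = p_lrt, Score = p_score)
```

The three p-values usually agree closely in large samples and can diverge in small ones.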
- 18. Comparison. Figure: Comparison.
- 19. Logistic regression model

  $$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1}$$

  $$p_i = P(Y_i = 1) = \frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$

  $$P(Y_i = 0) = \frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$

  $$P(Y_i = y_i) = \left(\frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{1 - y_i}$$
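The model equations can be checked numerically. The coefficients below are hypothetical; `plogis()` is base R's inverse logit.

```r
beta0 <- -1; beta1 <- 0.5     # hypothetical coefficients
x <- c(0, 1, 2)

# p_i written out exactly as in the model equations.
p_manual <- exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))
p_plogis <- plogis(beta0 + beta1 * x)          # built-in inverse logit

# Taking the log odds recovers the linear predictor beta0 + beta1 * x.
log_odds <- log(p_manual / (1 - p_manual))

cbind(x, p_manual, p_plogis, log_odds)
```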
- 20. Likelihood

  $$\text{Likelihood} = \prod_{i=1}^{n} P(Y_i = y_i) = \prod_{i=1}^{n} \left(\frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{1 - y_i}$$

  - Each individual contributes a likelihood (the probability of observing that individual's data).
  - Multiplying them all together gives the likelihood; we look for the β that maximizes it.
  - The likelihood is computed separately for each individual, whether case or control.
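A minimal sketch of this likelihood in R, on simulated data, shows that maximizing it directly gives the same β as `glm()`. The data, seed, and starting values are arbitrary.

```r
set.seed(3)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.3 + 1.2 * x))

# Log of the product over i of P(Y_i = y_i), written as a sum of logs.
loglik <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# optim() minimises by default, so fnscale = -1 turns it into a maximiser.
fit_optim <- optim(c(0, 0), loglik, method = "BFGS",
                   control = list(fnscale = -1))
fit_glm   <- glm(y ~ x, family = binomial)

rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```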
- 21. Conditional likelihood
  - Matched case-control sets: a case and its controls (1:1 or 1:N) form one set.
  - A likelihood is computed for each set: the likelihood of observing our data within that set.
  - Multiplying over all sets gives the total likelihood.
- 22. Definition
  - In the $i$-th stratum ($1 \le i \le N$), suppose there is 1 case (call this person A) and $n_i$ controls.
  - Conditional likelihood of the $i$-th stratum: $L_i = P(\text{A is the case and the rest are controls} \mid \text{1 case and } n_i \text{ controls})$
  - Total likelihood: $\prod_{i=1}^{N} L_i$
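Under a logistic model with a single covariate $x$, the conditional probability above reduces to $L_i = \exp(\beta x_A) / \sum_j \exp(\beta x_j)$, where the sum runs over all $n_i + 1$ members of the stratum and the intercept cancels. The sketch below codes this on a small hypothetical data set and compares the result with `clogit()` from the survival package (a recommended package bundled with most R installations); the data frame `d` is invented for illustration.

```r
library(survival)   # provides clogit() and strata()

# Hypothetical data: 3 matched sets, each with 1 case and 2 controls.
d <- data.frame(
  set  = rep(1:3, each = 3),
  case = rep(c(1, 0, 0), times = 3),
  x    = c(2.0, 1.0, 1.5,   0.5, 1.0, 0.2,   3.0, 2.5, 1.0)
)

# Conditional log-likelihood: for each set, log of
#   exp(beta * x_case) / sum over set members of exp(beta * x_j).
cond_loglik <- function(beta) {
  per_set <- sapply(split(d, d$set), function(s) {
    beta * s$x[s$case == 1] - log(sum(exp(beta * s$x)))
  })
  sum(per_set)      # product of the L_i, on the log scale
}

beta_hat <- optimize(cond_loglik, interval = c(-10, 10), maximum = TRUE)$maximum
fit <- clogit(case ~ x + strata(set), data = d)
c(manual = beta_hat, clogit = unname(coef(fit)))
```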
- 23. Practice
- 24. Issues: are the controls valid?
  - Are days 7 or 14 before and after the event truly valid control days?
  - The time from exposure to disease must be short.
  - Exposure must not accumulate.
  - The design suits acute diseases and transient effects of exposure (e.g., heat waves and death).
- 25. season package

      > library(season)
      > data(CVDdaily) # cardiovascular disease data
      > CVDdaily=subset(CVDdaily,date<=as.Date('1987-12-31')) # subset for example
      > head(CVDdaily)
               date cvd      dow  tmpd   o3mean   o3tmean Mon Tue Wed Thu Fri Sat
      3  1987-01-01  55 Thursday 54.50 -16.0073 -15.89619   0   0   0   1   0   0
      5  1987-01-02  73   Friday 58.50 -11.6595 -11.19102   0   0   0   0   1   0
      9  1987-01-03  64 Saturday 55.25 -10.3241 -10.51787   0   0   0   0   0   1
      12 1987-01-04  57   Sunday 54.75 -18.6471 -18.27014   0   0   0   0   0   0
      15 1987-01-05  56   Monday 54.50 -17.5291 -17.13201   1   0   0   0   0   0
      18 1987-01-06  65  Tuesday 49.75 -22.7846 -22.74711   0   1   0   0   0   0
         month winter spring summer autumn
      3      1      1      0      0      0
      5      1      1      0      0      0
      9      1      1      0      0      0
      12     1      1      0      0      0
      15     1      1      0      0      0
      18     1      1      0      0      0
- 26. casecross()

      > # Effect of ozone on CVD death
      > model1 = casecross(cvd ~ o3mean+tmpd+Mon+Tue+Wed+Thu+Fri+Sat, data=CVDdaily)
      > # match on day of the week
      > model2 = casecross(cvd ~ o3mean+tmpd, matchdow=TRUE, data=CVDdaily)
      > # match on temperature to within a degree
      > model3 = casecross(cvd ~ o3mean+Mon+Tue+Wed+Thu+Fri+Sat, data=CVDdaily,
      +                    matchconf='tmpd', confrange=1)
- 27. Model 1 output

      casecross(formula = cvd ~ o3mean + tmpd + Mon + Tue + Wed + Thu + Fri + Sat,
          data = CVDdaily, exclusion = 2, stratalength = 28, matchdow = FALSE,
          usefinalwindow = FALSE, matchconf = "", confrange = 0, stratamonth = FALSE)
      Time-stratified case-crossover with a stratum length of 28 days
      Total number of cases 17502
      Number of case days with available control days 364
      Average number of control days per case day 23.2

      Parameter Estimates:
                     coef exp(coef)    se(coef)           z   Pr(>|z|)
      o3mean -0.002882613 0.9971215 0.001128975 -2.55330077 0.01067073
      tmpd    0.001461400 1.0014625 0.001981047  0.73769030 0.46070267
      Mon     0.042733425 1.0436596 0.028942815  1.47647783 0.13981566
      Tue     0.057910712 1.0596204 0.028772745  2.01269332 0.04414690
      Wed    -0.010008025 0.9900419 0.029171937 -0.34307029 0.73154558
      Thu    -0.016790296 0.9833499 0.029455877 -0.57001513 0.56866744
      Fri     0.027247952 1.0276226 0.029173235  0.93400517 0.35030123
      Sat     0.001855841 1.0018576 0.028900116  0.06421568 0.94879849
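The printed coefficients can be turned into an odds ratio for a larger exposure step. The sketch below copies the `o3mean` row from the output above; the 10-unit step and the Wald-type 95% interval are illustrative choices, not part of the original slides.

```r
# Values copied from the printed output for o3mean.
coef_o3 <- -0.002882613
se_o3   <- 0.001128975

# Odds ratio and Wald 95% confidence interval for a 10-unit increase
# in o3mean (the 10-unit step is an illustrative choice).
step  <- 10
or    <- exp(step * coef_o3)
ci    <- exp(step * (coef_o3 + c(-1, 1) * qnorm(0.975) * se_o3))

# z statistic and two-sided p-value, matching the z and Pr(>|z|) columns.
z     <- coef_o3 / se_o3
p_val <- 2 * pnorm(-abs(z))

c(OR = or, lower = ci[1], upper = ci[2], z = z, p = p_val)
```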
- 28. Model 2 output

      casecross(formula = cvd ~ o3mean + tmpd, data = CVDdaily, matchdow = TRUE,
          exclusion = 2, stratalength = 28, usefinalwindow = FALSE,
          matchconf = "", confrange = 0, stratamonth = FALSE)
      Time-stratified case-crossover with a stratum length of 28 days
      Matched on day of the week
      Total number of cases 17502
      Number of case days with available control days 364
      Average number of control days per case day 3

      Parameter Estimates:
                      coef exp(coef)    se(coef)          z    Pr(>|z|)
      o3mean -0.0030752572 0.9969295 0.001188540 -2.5874238 0.009669658
      tmpd   -0.0004095116 0.9995906 0.002131744 -0.1921017 0.847662557
- 29. Model 3 output

      casecross(formula = cvd ~ o3mean + Mon + Tue + Wed + Thu + Fri + Sat,
          data = CVDdaily, matchconf = "tmpd", confrange = 1, exclusion = 2,
          stratalength = 28, matchdow = FALSE, usefinalwindow = FALSE,
          stratamonth = FALSE)
      Time-stratified case-crossover with a stratum length of 28 days
      Matched on tmpd plus/minus 1
      Total number of cases 15180
      Number of case days with available control days 318
      Average number of control days per case day 4.9

      Parameter Estimates:
                     coef exp(coef)   se(coef)          z     Pr(>|z|)
      o3mean -0.003238583 0.9967667 0.00131839 -2.4564691 1.403099e-02
      Mon     0.182058170 1.1996840 0.03577818  5.0885255 3.608582e-07
      Tue     0.144181049 1.1550932 0.03563272  4.0463108 5.203115e-05
      Wed     0.099443480 1.1045560 0.03554924  2.7973451 5.152447e-03
      Thu     0.088518237 1.0925542 0.03459482  2.5587140 1.050601e-02
      Fri     0.108107305 1.1141673 0.03437323  3.1451022 1.660288e-03
      Sat     0.023660066 1.0239422 0.03525152  0.6711786 5.021068e-01
- 30. END. Email: secondmath85@gmail.com