1. Analysis of Time-series Data
Case-crossover Study
Jinseob Kim
July 17, 2015
Jinseob Kim Analysis of Time-series Data July 17, 2015 1 / 30
2. Contents
1 Concepts
Individual data
Design
2 Conditional logistic regression
Review Basic linear regression
Logistic regression
Conditional logistic regression
3 Practice
Issues
In R
3. Objective
1 Individual risk vs. population risk
2 Concept of the case-crossover design
3 Caveats
4 Application: the season package in R
4. Concepts
5. Concepts Individual data
Two approaches to studying the relationship between weather
and health outcomes
Population-based study
Y: number of events (daily death counts or number of hospital admissions)
X: temperature
Estimates population risk (% change in daily death counts for a given
change in temperature)
Individual-based study
Y: 1 if an event occurs, 0 otherwise
X: temperature
Estimates individual risk (% change in the individual probability of an event,
or the odds ratio, for a given change in temperature)
6. Concepts Individual data
Data structure change
Aggregated form: (Year, week, number of cases)
(2006,1,20): a single record holding 20 cases
Individual form: (Year, week, event)
(2006,1,1), (2006,1,1), · · · , (2006,1,1): 20 case records
(2005,53,0), · · · , (2005,53,0), (2006,2,0), · · · , (2006,2,0): control records
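The restructuring above can be sketched in base R. This is only an illustration under stated assumptions: the object names (`case_week`, `control_weeks`, `individual`, `stratum`) are hypothetical, and each case is assumed to receive exactly one control record from each adjacent week, which the slide does not specify.

```r
# Aggregated record from the slide: 20 cases in week 1 of 2006
case_week <- data.frame(year = 2006, week = 1, cases = 20)

# Adjacent weeks used as control periods (assumption: one control record per week per case)
control_weeks <- data.frame(year = c(2005, 2006), week = c(53, 2))

# Template for one stratum: the case record followed by its control records
one_stratum <- rbind(
  data.frame(year = case_week$year,     week = case_week$week,     event = 1),
  data.frame(year = control_weeks$year, week = control_weeks$week, event = 0)
)

# Repeat the template once per case and label each copy with a stratum id
n_cases    <- case_week$cases
individual <- one_stratum[rep(seq_len(nrow(one_stratum)), times = n_cases), ]
individual$stratum <- rep(seq_len(n_cases), each = nrow(one_stratum))
rownames(individual) <- NULL

head(individual, 6)
```

The `stratum` column is what a conditional model would later condition on: every stratum holds one event row and its own control rows.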
7. Concepts Design
Case + Crossover
Case: only patients (cases) are used.
Crossover: the same patient at other time points serves as the control.
8. Concepts Design
If the average air pollution on control days is lower than the average
air pollution on case days,
we conclude that the event is associated with higher levels of air
pollution.
10. Conditional logistic regression
11. Conditional logistic regression Review Basic linear regression
Reminder
β estimation in linear regression
1 Ordinary Least Squares (OLS): semi-parametric
2 Maximum Likelihood Estimator (MLE): parametric
12. Conditional logistic regression Review Basic linear regression
Least Squares
Minimize the sum of squared residuals: no normality assumption on y is needed.
Figure: OLS Fitting
13. Conditional logistic regression Review Basic linear regression
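Likelihood?
Likelihood vs. probability
Discrete: likelihood = probability. The probability of rolling a 1 with a die is 1/6.
Continuous: likelihood ≠ probability. When one number is drawn from the interval 0∼1,
the probability that it is exactly 0.7 is 0, even though the density at 0.7 is not.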
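The distinction can be checked numerically in base R. This is a small sketch; the variable names are illustrative, and the uniform and normal distributions are chosen only as convenient examples.

```r
# Discrete case: the probability mass is itself the likelihood
p_one <- 1 / 6                           # P(a fair die shows 1)

# Continuous case, X ~ Uniform(0, 1)
prob_exact <- punif(0.7) - punif(0.7)    # P(X = 0.7): a single point has probability 0
dens_at    <- dunif(0.7)                 # density (likelihood) at 0.7 is non-zero

# A density is not a probability: it can even exceed 1
dens_narrow <- dnorm(0, mean = 0, sd = 0.1)

c(p_one = p_one, prob_exact = prob_exact, dens_at = dens_at, dens_narrow = dens_narrow)
```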
14. Conditional logistic regression Review Basic linear regression
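Maximum likelihood estimator (MLE)
Suppose observations 1, · · · , n are mutually independent.
1 Write down the likelihood function of each observation.
2 Multiply them all together to get the likelihood of the whole data set (valid because of independence).
3 Find the β that maximizes this likelihood.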
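The three steps can be sketched for a simple linear regression. The data are simulated, and the normal error model, the starting values and the use of `optim()` with BFGS are assumptions made for this illustration rather than anything prescribed by the slides.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)            # simulated data with true beta = (1, 2)

# Steps 1-2: per-observation log-likelihoods, summed (the log of the product)
neg_loglik <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])               # optimise log(sigma) so that sigma stays positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Step 3: maximise the likelihood, i.e. minimise the negative log-likelihood
fit_mle  <- optim(c(0, 0, 0), neg_loglik, method = "BFGS")
beta_mle <- fit_mle$par[1:2]

# For comparison: ordinary least squares on the same data
beta_ols <- unname(coef(lm(y ~ x)))

rbind(MLE = beta_mle, OLS = beta_ols)
```

Under normal errors the two sets of estimates should agree up to the optimiser's numerical tolerance.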
15. Conditional logistic regression Review Basic linear regression
MLE: maximum likelihood estimator
Maximize the likelihood of the observed data: a distributional assumption (on y) is required.
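As a worked example of why the distributional assumption matters, assume (for illustration only) independent normal responses with a single covariate. The log-likelihood then depends on β only through the sum of squared residuals, so the MLE of β coincides with the least-squares estimate:

```latex
\begin{align*}
y_i &\sim N(\beta_0 + \beta_1 x_i,\ \sigma^2), \qquad i = 1, \dots, n \ \text{(independent)} \\
\log L(\beta, \sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2 \\
\hat\beta_{\mathrm{MLE}} &= \arg\max_{\beta}\ \log L(\beta, \sigma^2)
  = \arg\min_{\beta}\ \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2
  = \hat\beta_{\mathrm{OLS}}
\end{align*}
```

With a different assumed distribution (for example Bernoulli, as in logistic regression) the two estimators no longer coincide.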
16. Conditional logistic regression Review Basic linear regression
Logistic function: MLE
Figure: Fitting Logistic Function
17. Conditional logistic regression Review Basic linear regression
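LRT? Wald? Score?
Likelihood ratio test vs. Wald test vs. score test
1 All three are methods for assessing statistical significance.
2 Comparing likelihoods vs. comparing β values vs. comparing slopes (of the log-likelihood).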
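The three tests can be compared on a small simulated logistic regression. This is a sketch: the data and object names are made up for illustration, and it relies on base R's `glm()`, `summary()` and `anova()` (whose `test = "Rao"` option gives the score test).

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))   # simulated binary outcome

fit0 <- glm(y ~ 1, family = binomial)        # null model (beta1 = 0)
fit1 <- glm(y ~ x, family = binomial)        # model with the covariate

# Wald test: is the estimated beta far from 0 relative to its standard error?
wald_p  <- summary(fit1)$coefficients["x", "Pr(>|z|)"]

# Likelihood ratio test: compares the maximised likelihoods of the two models
lrt_p   <- anova(fit0, fit1, test = "LRT")[2, "Pr(>Chi)"]

# Score (Rao) test: uses the slope of the log-likelihood at the null model
score_p <- anova(fit0, fit1, test = "Rao")[2, "Pr(>Chi)"]

c(Wald = wald_p, LRT = lrt_p, Score = score_p)
```

In large samples the three p-values are usually close; they can differ noticeably in small samples.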
18. Conditional logistic regression Review Basic linear regression
Comparison
Figure: Comparison
19. Conditional logistic regression Logistic regression
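Model
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1}$$
$$p_i = P(Y_i = 1) = \frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$
$$P(Y_i = 0) = \frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$
$$P(Y_i = y_i) = \left(\frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{1 - y_i}$$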
20. Conditional logistic regression Logistic regression
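Likelihood
$$\text{Likelihood} = \prod_{i=1}^{n} P(Y_i = y_i) = \prod_{i=1}^{n} \left(\frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{1 - y_i}$$
Each individual contributes a likelihood (the probability of observing that individual's data).
Multiplying all of them gives the likelihood.
We look for the β that maximizes it.
Cases and controls each contribute their own likelihood term separately.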
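The likelihood above can be maximised directly and compared with `glm()`. This is a sketch on simulated data; the names and the use of `optim()` with `fnscale = -1` (which turns minimisation into maximisation) are choices made for this illustration.

```r
set.seed(2)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 0.9 * x))   # simulated binary outcome

# Log of the product of p_i^y_i * (1 - p_i)^(1 - y_i) over all individuals
loglik <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)         # P(Y_i = 1) under the logistic model
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Maximise the log-likelihood numerically
fit_optim <- optim(c(0, 0), loglik, method = "BFGS",
                   control = list(fnscale = -1))

# glm() maximises the same likelihood
fit_glm <- glm(y ~ x, family = binomial)

rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```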
21. Conditional logistic regression Conditional logistic regression
Conditional likelihood
Matched case-control set
A case and its controls (1:1 or 1:N) form one set.
Each set yields one likelihood.
For each set, compute the likelihood of observing our data.
Multiplying over all sets gives the total likelihood.
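To make the per-set likelihood concrete, here is a worked derivation under one common formulation, stated as an assumption: each matched set i has its own intercept β_{0i} and a shared slope β_1, and the members of the set are indexed j = 0 (the case), 1, ..., n_i (the controls).

```latex
\begin{align*}
\log\frac{p_{ij}}{1 - p_{ij}} &= \beta_{0i} + \beta_1 x_{ij} \\
L_i &= P(\text{member } 0 \text{ is the case} \mid \text{exactly one case in set } i)
     = \frac{p_{i0}\prod_{k \neq 0}(1 - p_{ik})}{\sum_{j=0}^{n_i} p_{ij}\prod_{k \neq j}(1 - p_{ik})} \\
    &= \frac{\exp(\beta_{0i} + \beta_1 x_{i0})}{\sum_{j=0}^{n_i}\exp(\beta_{0i} + \beta_1 x_{ij})}
     = \frac{\exp(\beta_1 x_{i0})}{\sum_{j=0}^{n_i}\exp(\beta_1 x_{ij})} \\
\text{1:1 matching:}\quad L_i &= \frac{\exp(\beta_1 x_{i0})}{\exp(\beta_1 x_{i0}) + \exp(\beta_1 x_{i1})}
\end{align*}
```

The set-specific intercept cancels, so anything constant within a set (in a case-crossover study, the person) drops out of the estimation.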
22. Conditional logistic regression Conditional logistic regression
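Definition
Let the i-th stratum (1 ≤ i ≤ N) contain 1 case (call this person A) and n_i controls.
Conditional likelihood of the i-th stratum:
$$L_i = P(\text{A is the case and the others are controls} \mid \text{1 case and } n_i \text{ controls})$$
Total likelihood:
$$\prod_{i=1}^{N} L_i$$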
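A minimal numerical sketch of this definition, assuming the usual form in which each stratum contributes exp(β·x_case) / Σ exp(β·x) over its members. The data are simulated, all object names are hypothetical, and `clogit()` from the `survival` package (a recommended package normally installed with R) is used only as a cross-check.

```r
library(survival)

set.seed(1)
N    <- 200                      # number of strata
m    <- 4                        # members per stratum: 1 case + 3 controls
beta <- 0.7                      # true effect used in the simulation

# Exposure matrix: one row per stratum, one column per member
X <- matrix(rnorm(N * m), nrow = N)

# Within each stratum, pick the case with probability proportional to exp(beta * x)
case_col <- apply(X, 1, function(xi) sample(m, 1, prob = exp(beta * xi)))

# Log of the total likelihood: sum over strata of log L_i
cond_loglik <- function(b) {
  eta <- b * X
  sum(eta[cbind(seq_len(N), case_col)] - log(rowSums(exp(eta))))
}
b_hand <- optimize(cond_loglik, interval = c(-5, 5),
                   maximum = TRUE, tol = 1e-8)$maximum

# Same data in long format, fitted with conditional logistic regression
d <- data.frame(
  stratum = rep(seq_len(N), each = m),
  x       = as.vector(t(X)),
  event   = as.vector(t(col(X) == case_col)) * 1
)
fit <- clogit(event ~ x + strata(stratum), data = d)

c(by_hand = b_hand, clogit = unname(coef(fit)))
```

The two estimates should agree up to numerical tolerance, since with one case per stratum the hand-written function is the likelihood that the conditional model maximises.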
23. Practice
24. Practice Issues
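Are the controls valid?
Control days taken 7 or 14 days before and after the case day: are they really valid controls?
The time from exposure to disease onset should be short.
The exposure should not accumulate over time.
Best suited to acute diseases and transient exposure effects (e.g., heat waves and death).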
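Picking control days at fixed offsets around a case day can be sketched with base R dates. The case date and the offsets below are hypothetical values chosen for illustration; whether such days are valid controls still depends on the caveats listed above.

```r
case_day <- as.Date("2015-07-17")        # hypothetical event date
offsets  <- c(-14, -7, 7, 14)            # days before and after the case day

control_days <- case_day + offsets

# Offsets that are multiples of 7 keep the day of the week fixed,
# so weekday effects are controlled by design
same_weekday <- format(control_days, "%u") == format(case_day, "%u")

data.frame(control_day = control_days, same_weekday = same_weekday)
```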
26. Practice In R
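casecross()
> library(season)
> data(CVDdaily)
> # The outputs on the following slides report 364 case days, which suggests a one-year subset such as this one
> CVDdaily = subset(CVDdaily, date <= as.Date('1987-12-31'))
> # Effect of ozone on CVD death
> model1 = casecross(cvd ~ o3mean+tmpd+Mon+Tue+Wed+Thu+Fri+Sat, data=CVDdaily)
> # match on day of the week
> model2 = casecross(cvd ~ o3mean+tmpd, matchdow=TRUE, data=CVDdaily)
> # match on temperature to within a degree
> model3 = casecross(cvd ~ o3mean+Mon+Tue+Wed+Thu+Fri+Sat, data=CVDdaily, matchconf='tmpd', confrange=1)
> summary(model1)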
27. Practice In R
casecross(formula = cvd ~ o3mean + tmpd + Mon + Tue + Wed + Thu +
Fri + Sat, data = CVDdaily, exclusion = 2, stratalength = 28,
matchdow = FALSE, usefinalwindow = FALSE, matchconf = "",
confrange = 0, stratamonth = FALSE)
Time-stratified case-crossover with a stratum length of 28 days
Total number of cases 17502
Number of case days with available control days 364
Average number of control days per case day 23.2
Parameter Estimates:
               coef exp(coef)    se(coef)           z   Pr(>|z|)
o3mean -0.002882613 0.9971215 0.001128975 -2.55330077 0.01067073
tmpd    0.001461400 1.0014625 0.001981047  0.73769030 0.46070267
Mon     0.042733425 1.0436596 0.028942815  1.47647783 0.13981566
Tue     0.057910712 1.0596204 0.028772745  2.01269332 0.04414690
Wed    -0.010008025 0.9900419 0.029171937 -0.34307029 0.73154558
Thu    -0.016790296 0.9833499 0.029455877 -0.57001513 0.56866744
Fri     0.027247952 1.0276226 0.029173235  0.93400517 0.35030123
Sat     0.001855841 1.0018576 0.028900116  0.06421568 0.94879849
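The `coef` and `se(coef)` columns can be turned into an odds ratio with a confidence interval. This sketch copies the `o3mean` row from the output above; the 95% level, the normal approximation and the 10-unit rescaling are choices made for illustration, not part of the package output.

```r
# o3mean row of the output above
coef_o3 <- -0.002882613
se_o3   <-  0.001128975

# Odds ratio per one-unit increase in o3mean (matches the exp(coef) column)
or_o3 <- exp(coef_o3)

# Approximate 95% confidence interval on the odds-ratio scale
ci_o3 <- exp(coef_o3 + c(-1, 1) * qnorm(0.975) * se_o3)

# Percent change in odds for a 10-unit increase in o3mean
pct_10 <- (exp(10 * coef_o3) - 1) * 100

list(odds_ratio = or_o3, ci_95 = ci_o3, pct_change_per_10 = pct_10)
```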
28. Practice In R
casecross(formula = cvd ~ o3mean + tmpd, data = CVDdaily, matchdow = TRUE,
exclusion = 2, stratalength = 28, usefinalwindow = FALSE,
matchconf = "", confrange = 0, stratamonth = FALSE)
Time-stratified case-crossover with a stratum length of 28 days
Matched on day of the week
Total number of cases 17502
Number of case days with available control days 364
Average number of control days per case day 3
Parameter Estimates:
                coef exp(coef)    se(coef)          z    Pr(>|z|)
o3mean -0.0030752572 0.9969295 0.001188540 -2.5874238 0.009669658
tmpd   -0.0004095116 0.9995906 0.002131744 -0.1921017 0.847662557
29. Practice In R
casecross(formula = cvd ~ o3mean + Mon + Tue + Wed + Thu + Fri +
Sat, data = CVDdaily, matchconf = "tmpd", confrange = 1,
exclusion = 2, stratalength = 28, matchdow = FALSE, usefinalwindow = FA
stratamonth = FALSE)
Time-stratified case-crossover with a stratum length of 28 days
Matched on tmpd plus/minus 1
Total number of cases 15180
Number of case days with available control days 318
Average number of control days per case day 4.9
Parameter Estimates:
               coef exp(coef)   se(coef)          z     Pr(>|z|)
o3mean -0.003238583 0.9967667 0.00131839 -2.4564691 1.403099e-02
Mon     0.182058170 1.1996840 0.03577818  5.0885255 3.608582e-07
Tue     0.144181049 1.1550932 0.03563272  4.0463108 5.203115e-05
Wed     0.099443480 1.1045560 0.03554924  2.7973451 5.152447e-03
Thu     0.088518237 1.0925542 0.03459482  2.5587140 1.050601e-02
Fri     0.108107305 1.1141673 0.03437323  3.1451022 1.660288e-03
Sat     0.023660066 1.0239422 0.03525152  0.6711786 5.021068e-01
30. Practice In R
END
Email : secondmath85@gmail.com