1. Analysis of Time-series Data
Generalized Additive Model
Jinseob Kim
July 17, 2015
Jinseob Kim Analysis of Time-series Data July 17, 2015 1 / 45
2. Contents
1 Non-linear Issues
Distribution of Y
Estimate of Beta
2 GAM Theory
Various Spline
Model selection
3 Descriptive Analysis of Time-series data
Time series plot
4 Analysis using GAM
4. Non-linear Issues
5. Non-linear Issues Distribution of Y
Count data
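Daily/weekly/monthly counts of incidence or deaths
We look at trends in the population, not in individuals!!
What is the probability of incidence or death in the population?
Probability distribution
Normal distribution
Poisson distribution
Others: quasipoisson, Gamma, Negbin, ZIP, ZINB...
Very important!!! The choice changes the standard errors, and therefore the p-values!!!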
6. Non-linear Issues Distribution of Y
Compare Distribution
http://resources.esri.com/help/9.3/arcgisdesktop/com/gp_toolref/process_simulations_sensitivity_analysis_and_error_analysis_modeling/distributions_for_assigning_random_values.htm
7. Non-linear Issues Distribution of Y
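Rules of thumb
For a common disease, consider the normal distribution: the analysis becomes easier.
For a rare disease, use the Poisson distribution.
Mean < variance (overdispersion)? → quasipoisson
The others are rarely used.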
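The rule of thumb above compares the mean with the variance. The sketch below shows a rough first check in plain Python (the slides themselves use R): divide the sample variance of the raw counts by their sample mean. The counts are hypothetical. Because this ignores covariates such as season, it is only a screening step and not a substitute for checking the fitted model.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance / sample mean of a count series.

    A Poisson variable has variance equal to its mean, so a ratio well
    above 1 is a rough sign of overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical weekly case counts (not the Seoul data from the slides).
weekly_counts = [0, 2, 1, 5, 0, 0, 7, 1, 3, 0, 9, 2]
ratio = dispersion_ratio(weekly_counts)
print(f"variance / mean = {ratio:.2f}")
```

A ratio near 1 is consistent with Poisson. A ratio well above 1 points toward quasipoisson.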
8. Non-linear Issues Distribution of Y
Poisson VS quasipoisson
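Poisson
$E(Y_i) = \mu_i$, $\mathrm{Var}(Y_i) = \mu_i$
quasipoisson
$E(Y_i) = \mu_i$, $\mathrm{Var}(Y_i) = \phi \times \mu_i$, where $\phi$ is the dispersion parameter ($\phi > 1$: overdispersion)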
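This minimal sketch shows why the p-values change. It assumes the common moment estimator of φ: the sum of squared Pearson residuals divided by n - p, which is what R's `summary.glm` uses for quasi families. Quasipoisson keeps the Poisson point estimates but multiplies every standard error by √φ. The counts, fitted means and the 0.10 standard error are hypothetical.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Moment estimate of phi: sum of squared Pearson residuals / (n - p)."""
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def quasi_se(poisson_se, phi):
    """Quasipoisson keeps the Poisson estimate of beta but scales its SE by sqrt(phi)."""
    return poisson_se * math.sqrt(phi)


# Hypothetical observed counts and fitted means from an intercept-only model (p = 1).
y = [1, 4, 0, 6, 2, 5]
mu = [3.0] * 6
phi = pearson_dispersion(y, mu, n_params=1)
print(f"phi = {phi:.3f}; a Poisson SE of 0.10 becomes {quasi_se(0.10, phi):.3f}")
```

With φ > 1 the standard errors grow, so z (or t) statistics shrink and p-values get larger.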
9. Non-linear Issues Estimate of Beta
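Meaning of Beta
The meaning of β depends on the distribution.
Normal distribution: linear relationship (β is the change in Y per unit increase in x)
Binomial distribution: log(OR), a linear relationship on the logit scale
Poisson distribution: log(RR), a linear relationship on the log scale
In any case, let's call them all linear relationships (on the scale of the link function).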
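The sketch below shows how a fitted β is usually reported under each family. The function name `effect_measure` and the coefficient 0.05 are hypothetical. The point is that the same β, linear on the link scale, is reported as a difference, an odds ratio or a rate ratio depending on the family.

```python
import math


def effect_measure(beta, family):
    """Put a coefficient on the scale it is usually reported on.

    gaussian (identity link): beta = change in mean Y per unit of x
    binomial (logit link):    exp(beta) = odds ratio (OR)
    poisson  (log link):      exp(beta) = rate ratio (RR)
    """
    if family == "gaussian":
        return beta
    if family in ("binomial", "poisson"):
        return math.exp(beta)
    raise ValueError(f"unsupported family: {family!r}")


# Hypothetical coefficient of 0.05 per unit of x under each family.
for fam in ("gaussian", "binomial", "poisson"):
    print(fam, round(effect_measure(0.05, fam), 4))
```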
10. Non-linear Issues Estimate of Beta
Non-linear
A linear relationship is easy to interpret, but..
Is it really true?
Weather, pollutants, etc. may not be linearly related to the outcome.
U shape, threshold, etc..
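A linear term can miss a real effect. For a symmetric U-shaped relationship, the least-squares slope is exactly zero, so a linear model would report "no association". The data below are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical U-shaped exposure-response: lowest risk at x = 5.
x = list(range(11))  # 0, 1, ..., 10
y = [(xi - 5) ** 2 for xi in x]
print(ols_slope(x, y))  # 0.0: the linear term sees "no effect"
```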
11. GAM Theory
12. GAM Theory Various Spline
Additive Model
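$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \epsilon \quad (1)$
$Y = \beta_0 + f(x_1) + \beta_2 x_2 + \cdots + \epsilon \quad (2)$
Terms of the form $f(x_1, x_2)$ are also possible, but they are not covered in this lecture.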
13. GAM Theory Various Spline
Determine f
Types
Loess
(Natural) cubic spline
Smoothing spline
The methods differ, but in practice the results are almost the same.
14. GAM Theory Various Spline
Loess
Locally weighted scatterplot smoothing
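This is a simplified sketch of the idea in plain Python. For each target point, take the nearest fraction `span` of the observations, weight them with the tricube kernel and fit a weighted straight line. Real loess implementations (for example R's `loess()`) add options such as local quadratic fits and robust re-weighting, which this sketch omits. It also assumes distinct x values. The data are hypothetical.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside |u| < 1, zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x0, x, y, span=0.5):
    """Degree-1 local regression at x0 (a simplified loess, no robustness steps).

    The neighbourhood is the nearest span * n points; weights are tricube in
    the distance scaled by the distance to the farthest of those points.
    """
    n = len(x)
    k = min(n, max(3, round(span * n)))            # neighbourhood size
    h = sorted(abs(xi - x0) for xi in x)[k - 1]    # neighbourhood radius
    if h == 0:
        raise ValueError("need distinct x values")
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)              # weighted line evaluated at x0


def loess_curve(x, y, span=0.5):
    """Smoothed value at every observed x."""
    return [loess_point(xi, x, y, span) for xi in x]


# Hypothetical noisy U-shaped data.
x = list(range(21))
y = [(xi - 10) ** 2 / 10 + (1.0 if xi % 2 else -1.0) for xi in x]
print([round(v, 2) for v in loess_curve(x, y, span=0.5)])
```

A smaller `span` follows the data more closely. A larger one gives a smoother curve.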
15. GAM Theory Various Spline
Example: Loess
16. GAM Theory Various Spline
Cubic spline
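Cubic = third-degree polynomial
Split the range of x into several intervals; the cut points are the knots.
Model each interval with a cubic polynomial.
Make the pieces join smoothly at the knots (continuous value, slope and curvature).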
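One common way to write a cubic spline is the truncated power basis: a global cubic plus one term (x - κ)³₊ per knot κ. Each term switches on at its knot with zero value, slope and curvature, which keeps the pieces joined smoothly. Below is a minimal least-squares sketch with hypothetical knots and data. R users would typically use `splines::bs()` instead, which uses a better-conditioned B-spline basis.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def cubic_spline_row(x, knots):
    """Truncated power basis: 1, x, x^2, x^3 and (x - knot)^3_+ for each knot."""
    return [1.0, x, x ** 2, x ** 3] + [max(0.0, x - k) ** 3 for k in knots]


def fit_cubic_spline(x, y, knots):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    rows = [cubic_spline_row(xi, knots) for xi in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(b, x, knots):
    """Evaluate the fitted piecewise cubic at x."""
    return sum(bi * ri for bi, ri in zip(b, cubic_spline_row(x, knots)))


# Hypothetical data on [0, 2] with knots at 0.6 and 1.4.
knots = [0.6, 1.4]
x = [i / 10 for i in range(21)]
y = [xi ** 2 - 2.0 * max(0.0, xi - 1.0) ** 2 + 0.05 * (-1) ** i for i, xi in enumerate(x)]
b = fit_cubic_spline(x, y, knots)
print([round(predict(b, xi, knots), 3) for xi in x[::5]])
```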
17. GAM Theory Various Spline
Example: Cubic spline
18. GAM Theory Various Spline
Example: Cubic Spline(2)
19. GAM Theory Various Spline
Natural cubic spline: ns
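Cubic in between, but linear before the first knot and after the last knot.
This gives conservative estimates below the smallest and above the largest value (values not in the data),
because a linear function changes less than a cubic one.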
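The linear tails can be checked numerically. The sketch below builds the natural cubic spline basis in truncated power form, following the construction in Hastie, Tibshirani and Friedman's *Elements of Statistical Learning*. R's `splines::ns()` uses a B-spline representation instead. The check shows that the second difference of every column vanishes beyond the last knot, while an ordinary truncated cubic does not. Knot positions are hypothetical.

```python
def natural_spline_row(x, knots):
    """Natural cubic spline basis (truncated power form; K knots give K columns).

    N1 = 1, N2 = x, N_{k+2} = d_k(x) - d_{K-1}(x) for k = 1..K-2, with
    d_k(x) = [(x - xi_k)^3_+ - (x - xi_K)^3_+] / (xi_K - xi_k).
    The cubic and quadratic parts cancel beyond the last knot, and all truncated
    terms are zero before the first knot, so every column is linear outside.
    Assumes distinct, increasing knots.
    """
    K = len(knots)
    if K < 2:
        raise ValueError("need at least two knots")

    def pos3(t):
        return max(0.0, t) ** 3

    def d(k):
        return (pos3(x - knots[k]) - pos3(x - knots[-1])) / (knots[-1] - knots[k])

    return [1.0, x] + [d(k) - d(K - 2) for k in range(K - 2)]


def second_difference(basis, x, h=1.0):
    """basis(x - h) - 2 basis(x) + basis(x + h), column by column (0 where linear)."""
    lo, mid, hi = basis(x - h), basis(x), basis(x + h)
    return [a - 2.0 * b + c for a, b, c in zip(lo, mid, hi)]


KNOTS = [1.0, 2.0, 3.0, 4.0]  # hypothetical knot positions


def natural_basis(x):
    return natural_spline_row(x, KNOTS)


def plain_cubic_terms(x):
    """The truncated cubics (x - knot)^3_+ of an ordinary cubic spline."""
    return [max(0.0, x - k) ** 3 for k in KNOTS]


print("natural spline, x = 6:", second_difference(natural_basis, 6.0))
print("plain cubic terms, x = 6:", second_difference(plain_cubic_terms, 6.0))
```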
20. GAM Theory Various Spline
Smoothing Splines, a.k.a. Penalized Splines
Loess, cubic spline
Span and knots are fixed in advance: the local intervals are specified beforehand.
Penalized spline
The amount of smoothing is chosen automatically, as the data dictate.
The default approach of the mgcv R package.
21. GAM Theory Various Spline
Penalized regression: Smoothing
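Minimize $\|Y - X\beta\|^2 + \lambda \int f''(x)^2 \, dx$
$\lambda \to 0$: no penalty, so the curve follows every data point (overfitting).
The larger $\lambda$, the smoother the fit; as $\lambda \to \infty$, $f$ tends to a straight line.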
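A discrete stand-in for the penalty makes the role of λ concrete. Assuming equally spaced x, the integral of f''(x)² can be approximated by the sum of squared second differences of the fitted values. Minimizing ||y - f||² + λ Σ (f_{i-1} - 2f_i + f_{i+1})² then gives (I + λP) f = y, with P = DᵀD. This is a Whittaker-type smoother, not what mgcv does internally (it uses penalized regression spline bases). It only illustrates the same trade-off. The data are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def difference_penalty(n):
    """P = D'D, where D is the (n-2) x n second-difference matrix."""
    p = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        coef = ((i, 1.0), (i + 1, -2.0), (i + 2, 1.0))
        for r, cr in coef:
            for c, cc in coef:
                p[r][c] += cr * cc
    return p


def penalized_smooth(y, lam):
    """Minimise sum (y_i - f_i)^2 + lam * sum (second differences of f)^2.

    Setting the gradient to zero gives (I + lam * P) f = y.
    """
    n = len(y)
    p = difference_penalty(n)
    a = [[(1.0 if i == j else 0.0) + lam * p[i][j] for j in range(n)] for i in range(n)]
    return solve(a, list(y))


# Hypothetical data: an upward trend with a zig-zag.
y = [0.0, 2.1, 0.9, 3.2, 1.8, 4.1, 3.0, 5.2]
for lam in (0.0, 1.0, 1e6):
    print(lam, [round(v, 2) for v in penalized_smooth(y, lam)])
```

With λ = 0 the fit reproduces y. With a huge λ it is essentially the least-squares straight line, because straight lines have zero second differences and are not penalized.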
22. GAM Theory Various Spline
Example: Smoothing spline
23. GAM Theory Model selection
Choose ฮป
1 CV (cross validation)
2 GCV (generalized)
3 UBRE (unbiased risk estimator)
4 Mallows' Cp
Whichever criterion is used, choose the λ that minimizes it!!
24. GAM Theory Model selection
Cross validation
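Minimize $\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{f}^{[-i]}(x_i)\right)^2$, where $\hat{f}^{[-i]}$ is fitted without the $i$-th observation
Leave out the 1st observation, predict it, and take the difference from the actual 1st value..
Leave out the 2nd observation, predict it, and take the difference from the actual 2nd value..
..
Leave out the n-th observation, predict it, and take the difference from the actual n-th value..
GCV: reduces the computational burden of CV.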
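Brute-force leave-one-out CV follows the recipe above literally: give observation i zero weight, refit, predict it and average the squared errors. GCV replaces the n refits with one fit plus the trace of the smoother matrix S: GCV(λ) = n·RSS/(n - tr S)². As a stand-in for a penalized spline, the sketch uses a discrete second-difference penalty (a Whittaker-type smoother) defined inside the block. The data and the λ grid are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def difference_penalty(n):
    """P = D'D, where D is the (n-2) x n second-difference matrix."""
    p = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        coef = ((i, 1.0), (i + 1, -2.0), (i + 2, 1.0))
        for r, cr in coef:
            for c, cc in coef:
                p[r][c] += cr * cc
    return p


def weighted_smooth(y, lam, w):
    """Minimise sum w_i (y_i - f_i)^2 + lam * f'Pf by solving (W + lam P) f = W y."""
    n = len(y)
    p = difference_penalty(n)
    a = [[(w[i] if i == j else 0.0) + lam * p[i][j] for j in range(n)] for i in range(n)]
    return solve(a, [wi * yi for wi, yi in zip(w, y)])


def loo_cv(y, lam):
    """Leave each point out (weight 0), predict it from the rest, average squared errors."""
    n = len(y)
    total = 0.0
    for i in range(n):
        w = [0.0 if j == i else 1.0 for j in range(n)]
        total += (y[i] - weighted_smooth(y, lam, w)[i]) ** 2
    return total / n


def gcv(y, lam):
    """GCV(lam) = n * RSS / (n - tr S)^2, where f = S y is the full-data fit."""
    n = len(y)
    ones = [1.0] * n
    f = weighted_smooth(y, lam, ones)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    # S[i][i] is the i-th fitted value when the data are the i-th unit vector.
    tr_s = sum(
        weighted_smooth([1.0 if j == i else 0.0 for j in range(n)], lam, ones)[i]
        for i in range(n)
    )
    return n * rss / (n - tr_s) ** 2


# Hypothetical data: a smooth signal plus an alternating +/-0.3 wiggle.
y = [math.sin(i / 2) + (0.3 if i % 2 else -0.3) for i in range(15)]
grid = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
cv_scores = {lam: loo_cv(y, lam) for lam in grid}
gcv_scores = {lam: gcv(y, lam) for lam in grid}
best_cv = min(cv_scores, key=cv_scores.get)
best_gcv = min(gcv_scores, key=gcv_scores.get)
print("lambda chosen by CV:", best_cv, "| by GCV:", best_gcv)
```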
25. GAM Theory Model selection
Example: 10-fold CV
26. GAM Theory Model selection
Example: GCV
27. GAM Theory Model selection
In practice
poisson: UBRE (the scale parameter is known, φ = 1)
quasipoisson: GCV (the scale parameter φ must be estimated)
28. GAM Theory Model selection
AIC
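Let L be the likelihood of the fitted model.
1 AIC = -2 × log(L) + 2 × k
2 k: number of estimated parameters (e.g., one per explanatory variable: sex, age, salary...)
3 The smaller, the better the model!!!
A model with a larger likelihood is preferred, but too many explanatory variables are penalized!!!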
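Below is a sketch of the AIC arithmetic for a Poisson model, using hypothetical counts and fitted means. AIC needs a real likelihood. Quasipoisson only specifies a mean-variance relationship, so R reports NA for its AIC.

```python
import math


def poisson_loglik(y, mu):
    """log L = sum[y log(mu) - mu - log(y!)] for independent Poisson counts."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def aic(loglik, k):
    """AIC = -2 log(L) + 2k, where k is the number of estimated parameters."""
    return -2.0 * loglik + 2.0 * k


# Hypothetical counts and fitted means from two candidate models.
y = [2, 0, 3, 1, 4, 2]
mu_small = [2.0] * 6                      # intercept only: k = 1
mu_big = [1.8, 0.6, 2.9, 1.2, 3.6, 1.9]   # hypothetical 3-parameter fit: k = 3
print("small model AIC:", round(aic(poisson_loglik(y, mu_small), 1), 3))
print("big model AIC:  ", round(aic(poisson_loglik(y, mu_big), 3), 3))
```

The bigger model is preferred only if its gain in log-likelihood outweighs the extra 2 × 2 penalty.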
29. Descriptive Analysis of Time-series data
30. Descriptive Analysis of Time-series data Time series plot
Time series plot
[Figure: Seoul, time series of incidence, population, temp and pcp against Time (2002-2010)]
31. Descriptive Analysis of Time-series data Time series plot
Serial Correlation
32. Descriptive Analysis of Time-series data Time series plot
[Figure: Autocorrelation plot: Seoul (ACF vs. Lag) and Partial Autocorrelation plot: Seoul (partial ACF vs. Lag)]
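The sample autocorrelation at lag k is r_k = Σ (x_t - x̄)(x_{t+k} - x̄) / Σ (x_t - x̄)². The partial autocorrelations can be obtained from the r_k with the Durbin-Levinson recursion. The plain-Python sketch below computes both (R's `acf()` and `pacf()` do this for you). The example series is hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (mean-centred, scaled by the lag-0 sum)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [
        sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) ones.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical weekly series with a yearly cycle (3 years of 52 weeks).
series = [math.sin(2 * math.pi * t / 52) + 0.1 * (t % 3) for t in range(156)]
print([round(v, 2) for v in acf(series, 5)])
print([round(v, 2) for v in pacf(series, 5)])
```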
33. Descriptive Analysis of Time-series data Time series plot
Decompose plot
[Figure: Decomposition of multiplicative time series, with observed, trend, seasonal and random components against Time (2002-2010)]
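A plot like this comes from a classical multiplicative decomposition (R's `decompose(type = "multiplicative")` produces one). The method can be sketched in three steps:

1. Estimate the trend with a centred moving average over one period.
2. Average observed/trend within each season, then rescale these seasonal indices to average 1.
3. Take random = observed / (trend × seasonal).

The sketch assumes an even period, a series that starts at the first season and at least two full periods of data. The example series is hypothetical.

```python
def decompose_multiplicative(x, period):
    """Classical multiplicative decomposition: observed = trend * seasonal * random.

    Trend is a centred moving average (weights 0.5, 1, ..., 1, 0.5 over period + 1
    points) and is None where the window is incomplete.
    """
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    n = len(x)
    half = period // 2
    weights = [0.5] + [1.0] * (period - 1) + [0.5]
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half: t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window)) / period
    # Average the observed/trend ratios separately for each season.
    ratios = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            ratios[t % period].append(x[t] / trend[t])
    index = [sum(r) / len(r) for r in ratios]
    scale = sum(index) / period              # rescale so the indices average 1
    index = [v / scale for v in index]
    seasonal = [index[t % period] for t in range(n)]
    random = [None if trend[t] is None else x[t] / (trend[t] * seasonal[t]) for t in range(n)]
    return trend, seasonal, random


# Hypothetical quarterly-like series: rising level times a fixed seasonal pattern.
x = [(20 + t) * s for t, s in enumerate([0.8, 1.2, 1.1, 0.9] * 4)]
trend, seasonal, random = decompose_multiplicative(x, 4)
print("seasonal indices:", [round(v, 3) for v in seasonal[:4]])
```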
34. Analysis using GAM
35. Analysis using GAM
Seoul example: poisson (1)
Family: poisson
Link function: log
Formula:
incidence ~ offset(log(population)) + temp + pcp + s(week, k = 53) +
s(year, k = 9)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.702e+01 2.411e-01 -70.597 <2e-16 ***
temp -5.465e-03 1.776e-02 -0.308 0.758
pcp -3.751e-04 1.332e-03 -0.282 0.778
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(week) 3.038 3.997 13.33 0.00975 **
s(year) 7.568 7.942 31.79 9.93e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.123 Deviance explained = 14.3%
UBRE = -0.029349 Scale est. = 1 n = 477
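With offset(log(population)), the model is log E[Y] = log(population) + η. It therefore describes a rate per person, and exp(β) is a rate ratio per one-unit increase in the covariate. The sketch below uses the temp row above (Estimate -5.465e-03, Std. Error 1.776e-02) with a Wald 95% interval, exp(β ± 1.96·SE). The population and η in the second example are hypothetical.

```python
import math


def expected_count(population, eta):
    """Offset model: log E[Y] = log(population) + eta, so E[Y] = population * exp(eta)."""
    return population * math.exp(eta)


def rate_ratio_ci(beta, se, z=1.96):
    """Rate ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# temp row of the "Seoul example: poisson (1)" summary: Estimate and Std. Error.
rr, lo, hi = rate_ratio_ci(-5.465e-03, 1.776e-02)
print(f"RR per one-unit increase in temp: {rr:.4f} (95% CI {lo:.4f} to {hi:.4f})")

# Hypothetical: population of 10,250,000 and linear predictor eta = -17.
print(f"expected count: {expected_count(10_250_000, -17.0):.3f}")
```

The interval contains 1, which matches the non-significant p-value (0.758) reported for temp.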
36. Analysis using GAM
[Figure: estimated smooth terms s(week,3.04) against week and s(year,7.57) against year]
37. Analysis using GAM
Seoul example: poisson (2)
Family: poisson
Link function: log
Formula:
incidence ~ offset(log(population)) + s(temp) + s(pcp) + s(week,
k = 53) + s(year, k = 9)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -17.07888 0.07856 -217.4 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(temp) 1.000 1.000 0.538 0.46313
s(pcp) 3.312 4.142 7.036 0.14440
s(week) 3.063 4.030 14.319 0.00654 **
s(year) 1.798 2.236 6.634 0.04593 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.0834 Deviance explained = 11.5%
UBRE = -0.014142 Scale est. = 1 n = 477
38. Analysis using GAM
[Figure: estimated smooth terms s(temp,1) against temp, s(pcp,3.31) against pcp, s(week,3.06) and s(year,1.8)]
39. Analysis using GAM
Seoul example: quasipoisson(1)
Family: quasipoisson
Link function: log
Formula:
incidence ~ offset(log(population)) + temp + pcp + s(week, k = 53) +
s(year, k = 9)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.012052 0.252254 -67.440 <2e-16 ***
temp -0.006425 0.018615 -0.345 0.730
pcp -0.000377 0.001378 -0.274 0.785
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(week) 3.126 4.110 3.072 0.015470 *
s(year) 7.595 7.949 3.746 0.000303 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.124 Deviance explained = 14.3%
GCV = 0.96803 Scale est. = 1.068 n = 477
40. Analysis using GAM
[Figure: estimated smooth terms s(week,3.13) against week and s(year,7.59) against year]
41. Analysis using GAM
Seoul example: quasipoisson(2)
Family: quasipoisson
Link function: log
Formula:
incidence ~ offset(log(population)) + s(temp) + s(pcp) + s(week,
k = 53) + s(year, k = 9)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.08040 0.08055 -212 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(temp) 1.000 1.000 0.543 0.46143
s(pcp) 3.356 4.193 1.616 0.16537
s(week) 3.109 4.088 3.412 0.00873 **
s(year) 1.872 2.329 2.748 0.05679 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.0838 Deviance explained = 11.6%
GCV = 0.98475 Scale est. = 1.0457 n = 477
42. Analysis using GAM
[Figure: estimated smooth terms s(temp,1) against temp, s(pcp,3.36) against pcp, s(week,3.11) and s(year,1.87)]
43. Analysis using GAM
Compare AIC
> model_gam$aic
[1] 809.8845
> model_gam2$aic
[1] 817.1379
> model_gam3$aic
[1] NA
> model_gam4$aic
[1] NA
AIC is NA for quasipoisson fits: a quasi-likelihood is not a true likelihood, so AIC is undefined.
44. Analysis using GAM
Good reference
Using R for Time Series Analysis
http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/
45. Analysis using GAM
END
Email : secondmath85@gmail.com