Change Point Analysis (CPA)

The CPA method can be used to detect subtle changes in time-series data and to determine the direction of movement (i.e., up, down, or stable) between change points, for example in disease trends such as influenza-like illness.

Transcript

BioSense 2.0 Analytics
Change Point Analysis
June 22, 2012
By Taha Kass-Hout and Zhiheng Xu

BioSense is a national public health surveillance system for early detection and rapid assessment of potential bioterrorism-related illness. It integrates current health data shared by health departments from a variety of sources to provide insight on the health of communities and the country. Using statistical aberration detection methods, public health officials are able to identify and investigate both temporal and spatial anomalies. In the first iteration of statistical tools for inclusion in the BioSense 2.0 redesign, we introduced the Early Aberration Reporting System (EARS), which has been used extensively in BioSense for disease anomaly detection. As a complementary tool to EARS, Change Point Analysis (CPA) has been implemented in BioSense to address the limitation of EARS in detecting subtle changes and characterizing disease trends. In this paper, we describe how to implement Taylor's cumulative sum (CUSUM) CPA method.

CUSUM CPA

Taylor [1] developed a change point analysis method that iteratively applies cumulative sum (CUSUM) charts and bootstrapping to detect changes in time series and to draw inferences about them. This approach is based on the mean-shift model and assumes that residuals are independent and identically distributed (iid) with a mean of zero. For time-series data $Y_i$ with $i = 1, \ldots, N$, the mean-shift model is written as $Y_i = \mu + \varepsilon_i$, where $\mu$ is the sample average, $\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$, and $\varepsilon_i$ is the residual term, defined as $\varepsilon_i = Y_i - \mu$ for the $i$th observation. The cumulative sums of residuals are calculated as $S_i = S_{i-1} + \varepsilon_i$ for $i = 1, \ldots, N$, where $S_0 = 0$. The change point at location $m$ is detected by searching for the maximum absolute CUSUM of residuals, where $|S_m| = \max_{i=1,\ldots,N} |S_i|$. The time-series data is split into two segments, one on each side of the change point, and the analysis is repeated for each segment. One thousand bootstrap samples are generated to calculate the significance level and 95% confidence interval (CI) of change points.

The following steps summarize how to implement Taylor's CUSUM CPA to detect change points:

 1. Prepare the initial time-series data.
 2. Calculate the cumulative sum of residuals $S_i$.
 3. Find the location $m$ with the maximum absolute CUSUM of residuals $|S_m|$, which is defined as the change point.
 4. Calculate the difference between the maximum and minimum CUSUM of residuals as $S_{diff} = S_{max} - S_{min}$, where $S_{max} = \max_i S_i$ and $S_{min} = \min_i S_i$.
 5. Determine whether this change point is significant via bootstrapping:
    a. Generate a bootstrap sample of size N, denoted as $Y^0_1, \ldots, Y^0_N$, by reshuffling the original N values.
    b. Calculate the CUSUM of residuals from the bootstrap sample, denoted as $S^0_i$.
    c. Calculate the maximum, minimum, and difference of the CUSUM of residuals, denoted as $S^0_{max}$, $S^0_{min}$, and $S^0_{diff}$, where $S^0_{diff} = S^0_{max} - S^0_{min}$.
    d. Determine whether the difference of CUSUM from the bootstrap sample is less than the original difference, that is, whether $S^0_{diff} < S_{diff}$.
    e. Repeat steps a-d 1000 times and record the number of bootstrap samples for which $S^0_{diff} < S_{diff}$, denoted as X.
    f. The significance level is defined as X/1000, expressed as a percentage.
 6. If the significance level is ≥95%, the detected change point is statistically significant, and the dataset is split into two subsets at this change point. If the significance level is <95%, the detected change point is not statistically significant, and the splitting stops.
 7. Repeat steps 2-6 in each of the two subsets until no more significant change points are detected.

Data Example

The following data were created to illustrate the detection of change points using the CUSUM CPA method. The µ column is displayed rounded to three decimals; the residuals εi use the unrounded sample mean.

MMWR week (i)   Percent of visit (Yi)   µ       εi          Si          |Si|
 1              0.001                   0.036   -0.03483    -0.03483    0.034827
 2              0.002                   0.036   -0.03383    -0.06865    0.068654
 3              0.003                   0.036   -0.03283    -0.10148    0.101481
 4              0.002                   0.036   -0.03383    -0.13531    0.135308
 5              0.008                   0.036   -0.02783    -0.16313    0.163135
 6              0.009                   0.036   -0.02683    -0.18996    0.189962
 7              0.012                   0.036   -0.02383    -0.21379    0.213788
 8              0.011                   0.036   -0.02483    -0.23862    0.238615
 9              0.009                   0.036   -0.02683    -0.26544    0.265442
10              0.011                   0.036   -0.02483    -0.29027    0.290269
11              0.021                   0.036   -0.01483    -0.3051     0.305096
12              0.012                   0.036   -0.02383    -0.32892    0.328923
13              0.01                    0.036   -0.02583    -0.35475    0.35475
14              0.008                   0.036   -0.02783    -0.38258    0.382577
15              0.01                    0.036   -0.02583    -0.4084     0.408404
16              0.028                   0.036   -0.00783    -0.41623    0.416231
17              0.023                   0.036   -0.01283    -0.42906    0.429058
18              0.015                   0.036   -0.02083    -0.44988    0.449885
19              0.014                   0.036   -0.02183    -0.47171    0.471712
20              0.052                   0.036   0.016173    -0.45554    0.455538
21              0.079                   0.036   0.043173    -0.41237    0.412365
22              0.064                   0.036   0.028173    -0.38419    0.384192
23              0.079                   0.036   0.043173    -0.34102    0.341019
24              0.085                   0.036   0.049173    -0.29185    0.291846
25              0.072                   0.036   0.036173    -0.25567    0.255673
26              0.099                   0.036   0.063173    -0.1925     0.1925
27              0.036                   0.036   0.000173    -0.19233    0.192327
28              0.07                    0.036   0.034173    -0.15815    0.158154
29              0.077                   0.036   0.041173    -0.11698    0.116981
30              0.092                   0.036   0.056173    -0.06081    0.060808
31              0.111                   0.036   0.075173    0.014365    0.014365
32              0.083                   0.036   0.047173    0.061538    0.061538
33              0.095                   0.036   0.059173    0.120712    0.120712
34              0.072                   0.036   0.036173    0.156885    0.156885
35              0.092                   0.036   0.056173    0.213058    0.213058
36              0.019                   0.036   -0.01683    0.196231    0.196231
37              0.012                   0.036   -0.02383    0.172404    0.172404
38              0.023                   0.036   -0.01283    0.159577    0.159577
39              0.022                   0.036   -0.01383    0.14575     0.14575
40              0.024                   0.036   -0.01183    0.133923    0.133923
41              0.012                   0.036   -0.02383    0.110096    0.110096
42              0.03                    0.036   -0.00583    0.104269    0.104269
43              0.021                   0.036   -0.01483    0.089442    0.089442
44              0.026                   0.036   -0.00983    0.079615    0.079615
45              0.025                   0.036   -0.01083    0.068788    0.068788
46              0.02                    0.036   -0.01583    0.052962    0.052962
47              0.026                   0.036   -0.00983    0.043135    0.043135
48              0.02                    0.036   -0.01583    0.027308    0.027308
49              0.036                   0.036   0.000173    0.027481    0.027481
50              0.03                    0.036   -0.00583    0.021654    0.021654
51              0.03                    0.036   -0.00583    0.015827    0.015827
52              0.02                    0.036   -0.01583    6.94E-17    6.94E-17
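The first pass of the procedure (steps 1 to 5) can be sketched in Python on the example data. This is a minimal illustration under stated assumptions, not the BioSense implementation: the function names `cusum`, `change_point`, and `significance`, and the fixed random seed, are choices made here for the sketch. The bootstrap reshuffles the values without replacement, following step 5a.

```python
import random

# Percent of visit (Yi) for MMWR weeks 1-52, copied from the example table.
Y = [0.001, 0.002, 0.003, 0.002, 0.008, 0.009, 0.012, 0.011, 0.009, 0.011,
     0.021, 0.012, 0.010, 0.008, 0.010, 0.028, 0.023, 0.015, 0.014, 0.052,
     0.079, 0.064, 0.079, 0.085, 0.072, 0.099, 0.036, 0.070, 0.077, 0.092,
     0.111, 0.083, 0.095, 0.072, 0.092, 0.019, 0.012, 0.023, 0.022, 0.024,
     0.012, 0.030, 0.021, 0.026, 0.025, 0.020, 0.026, 0.020, 0.036, 0.030,
     0.030, 0.020]


def cusum(values):
    """Return S_1..S_N, the cumulative sums of residuals about the sample mean."""
    mu = sum(values) / len(values)
    sums, running = [], 0.0
    for y in values:
        running += y - mu  # S_i = S_{i-1} + (Y_i - mu), starting from S_0 = 0
        sums.append(running)
    return sums


def change_point(values):
    """Return (m, S_diff): the 1-based location of max |S_i| and S_max - S_min."""
    s = cusum(values)
    m = max(range(len(s)), key=lambda i: abs(s[i])) + 1
    return m, max(s) - min(s)


def significance(values, n_boot=1000, seed=0):
    """Return X / n_boot, the share of reshuffled samples with S0_diff < S_diff."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    _, s_diff = change_point(values)
    shuffled = list(values)
    count = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reshuffle the original N values (step 5a)
        _, s0_diff = change_point(shuffled)
        if s0_diff < s_diff:
            count += 1
    return count / n_boot


m, s_diff = change_point(Y)
level = significance(Y)
print(m, round(s_diff, 6), level)
```

On the example data, `change_point(Y)` locates the change point at week 19, where |Si| reaches 0.471712 in the table, with a difference of about 0.6848 between the maximum (week 35) and minimum (week 19) CUSUM values.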
[Figure: time-series plot of percent of visit (Yi)]

The significance level and 95% CI of change points are calculated from 1000 bootstrap samples. After the first significant change point is detected (MMWR week 19, highlighted in yellow in the original table), the time-series data for MMWR weeks 1-52 is split into two segments: MMWR weeks 1-19 and weeks 20-52. The analysis is then repeated on each of the two segments to determine their change points.

References

 1. Taylor, W. Change-Point Analysis: A Powerful New Tool for Detecting Changes. 2010. Available from: http://www.variation.com/anonftp/pub/changepoint.pdf.
 2. Barker, N. A Practical Introduction to the Bootstrap Using the SAS System. 2010. Available from: http://www.lexjansen.com/phuse/2005/pk/pk02.pdf.
 3. Efron, B. and Tibshirani, R. An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.
 4. Kass-Hout TA, Xu Z, McMurray P, Park S, Buckeridge DL, Brownstein JS, Finelli L, Groseclose SL. Application of change point analysis to daily influenza-like-illness (ILI) emergency department visits. Journal of the American Medical Informatics Association. 2012; in press.
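As a supplementary sketch, the repeated splitting of segments (steps 6 and 7, for example weeks 1-19 and weeks 20-52) could be written recursively in Python. The names `s_diff_of` and `find_change_points`, the `min_len` cutoff for very short segments, and the shared seeded generator are assumptions introduced for this illustration; none of them comes from the source. The series used at the end is made up to show the behaviour and is not BioSense data.

```python
import random


def s_diff_of(values):
    """Return (0-based index of max |S_i|, S_max - S_min) for the CUSUM of residuals."""
    mu = sum(values) / len(values)
    s, running = [], 0.0
    for y in values:
        running += y - mu
        s.append(running)
    m = max(range(len(s)), key=lambda i: abs(s[i]))
    return m, max(s) - min(s)


def find_change_points(values, offset=0, n_boot=1000, level=0.95,
                       min_len=4, rng=None):
    """Return 1-based positions of significant change points, in ascending order.

    A position p means the series is split into ..p and p+1.. (as in weeks
    1-19 and 20-52). min_len is a hypothetical guard against tiny segments.
    """
    rng = rng or random.Random(0)
    if len(values) < min_len:
        return []
    m, observed = s_diff_of(values)
    if m + 1 == len(values):  # no room to split; also prevents endless recursion
        return []
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if s_diff_of(shuffled)[1] < observed:
            below += 1
    if below / n_boot < level:  # not significant: stop splitting this segment
        return []
    left = find_change_points(values[:m + 1], offset, n_boot, level, min_len, rng)
    right = find_change_points(values[m + 1:], offset + m + 1, n_boot, level,
                               min_len, rng)
    return left + [offset + m + 1] + right


# Made-up series with a single obvious mean shift after the 10th value.
demo = [1.0] * 10 + [5.0] * 10
points = find_change_points(demo)
print(points)  # [10]
```

Returning positions relative to the full series (through `offset`) keeps the results comparable across recursion levels, so the change points found inside a later segment are reported as week numbers of the original series.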