Change Point Analysis



Prospective anomaly detection methods such as the Modified EARS C2 are commonly adapted and used in public health syndromic surveillance systems. These methods, however, can produce an excessive false alert rate. We present a combined use of retrospective (e.g., Change Point Analysis, or CPA) and prospective (e.g., C2) anomaly detection methods. This combined approach helps detect sudden aberrations as well as subtle changes in local trends, helps rule out alarm investigations, and assists with retrospective follow-ups. Examples of the utility of this combined approach, developed in collaboration with the scientific community, are applied to BioSense emergency department visits due to influenza-like illness (ILI). Methods, limitations, future work, and an invitation to the scientific community to collaborate with us will be discussed in this talk.

Published in: Health & Medicine

  1. Change Point Analysis
     Zhiheng (Roy) Xu, MS (PhD Candidate), Senior Research Scientist
     Taha A. Kass-Hout, MD, MS, Deputy Director for Information Science (Acting) and BioSense Program Manager
     Division of Healthcare Information (DHI), Public Health Surveillance Program Office (PHSPO), Office of Surveillance, Epidemiology, and Laboratory Services (OSELS), Centers for Disease Control & Prevention (CDC)
     Any views or opinions expressed here do not necessarily represent the views of the CDC, HHS, or any other entity of the United States government. Furthermore, the use of any product names, trade names, images, or commercial sources is for identification purposes only, and does not imply endorsement or government sanction by the U.S. Department of Health and Human Services.
  2. Change point
       • HIV/AIDS Mortality Rate
       • Breast Cancer Screening
       • Quality Control
           • e.g., Cereal Packaging
       • Social Network Change Detection (SNCD)
           • e.g., An open source social network of the Al-Qaeda terrorist organization*
     * McCulloh, I., Webb, M., Carley, K.M. (2007). Social Network Monitoring of Al-Qaeda. Network Science Report, Vol 1, pp 25–30.
  3. Change point analysis (CPA)
       • Purpose
           • CPA aims to detect any change in the mean of a process (e.g., a time series)
       • Use CPA to answer:
           • Did a change occur?
           • Did more than one change occur?
           • When did the changes occur?
           • With what confidence did the changes occur?
  4. Time-series data
       • A sequence of data points, typically measured at successive times spaced at uniform intervals, e.g., stock prices, mortgage rates, interest rates.
     Source: Google, Inc.
  5. Control Chart
       • Invented by Walter A. Shewhart in the 1920s to improve the reliability of telephony transmission systems at Bell Labs.
     Walter A. Shewhart, Ph.D. (1891-1967)
  6. Control Chart
     Upper Control Limit (UCL) = µ + 3σ
     Lower Control Limit (LCL) = µ − 3σ
     where µ is the sample mean (center line) and σ is the sample standard deviation.
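The control limits above can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the function names, the `baseline` values, and the choice to estimate µ and σ from a separate baseline window are all assumptions made for the example.

```python
from statistics import mean, stdev

def control_limits(values, k=3.0):
    """Return (LCL, center, UCL) = (mu - k*sigma, mu, mu + k*sigma)."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return mu - k * sigma, mu, mu + k * sigma

def out_of_control(values, baseline, k=3.0):
    """Indices of points in `values` outside the limits estimated from `baseline`."""
    lcl, _, ucl = control_limits(baseline, k)
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]

# Hypothetical baseline observations (e.g., % of visits), for illustration only.
baseline = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9]
print(control_limits(baseline))
print(out_of_control([2.0, 2.1, 3.5, 1.9], baseline))
```

A point is flagged only when it falls outside µ ± 3σ, which is why a control chart tends to catch large excursions and miss small, sustained shifts in the mean.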
  7. Six Sigma
     A six-sigma process is one in which 99.99966% of the products manufactured are free of defects.
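As a quick arithmetic check, a 99.99966% defect-free rate corresponds to about 3.4 defects per million. The sketch below also compares that figure with a one-sided normal tail at 4.5σ, which assumes the conventional 1.5σ long-term shift used in Six Sigma practice; that shift is not stated on the slide, and the function names are illustrative.

```python
from statistics import NormalDist

def defects_per_million(yield_fraction):
    """Convert a defect-free fraction into defects per million units."""
    return (1.0 - yield_fraction) * 1_000_000

def one_sided_tail_dpmo(z):
    """Defects per million beyond z standard deviations (one-sided normal tail)."""
    return (1.0 - NormalDist().cdf(z)) * 1_000_000

print(round(defects_per_million(0.9999966), 2))  # figure quoted on the slide
print(round(one_sided_tail_dpmo(4.5), 2))        # 6 sigma minus an assumed 1.5 sigma shift
```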
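  8. CPA vs. Control Charts

     |                 | CPA                       | Control Charts            |
     |-----------------|---------------------------|---------------------------|
     | Data type       | Any                       | Normally distributed data |
     | Type of changes | Major and subtle changes  | Major changes only        |
     | Mean            | Mean-shift                | Stable mean               |
     | Computation     | Depends on the algorithms | Simple and fast           |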
  9. CPA Benefits
       • Detect changes in historic data
       • Investigate what caused the changes
       • Real-time trend analysis
           • When was the last change in the percentage of ED visits due to ILI?
       • Forecasting
           • Since the last change, is influenza activity going up, down, or stable?
  10. CPA method 1
        • Cumulative Sum (CUSUM)
            • Based on a mean-shift model
            • Maximizes the absolute cumulative sum of residuals
            • Data assumption: independent and identically distributed (iid)
            • Statistical inference through bootstrapping
  11. CUSUM*
        Step 1: compute the sample mean.
        Step 2: compute the residuals ε1, …, εn.
        Step 3: compute the cumulative sum (CUSUM) of the residuals: 0, ε1, ε1 + ε2, ε1 + ε2 + ε3, …, ε1 + ε2 + … + εn.
      * Kass-Hout et al., The Joint Statistical Meeting, Vancouver, CA. August, 2010.
  12. CUSUM*
        Level 1: Find a change point maximizing |S|.
        Step 4: plot the CUSUM and find where the absolute CUSUM reaches its maximum.
      * Kass-Hout et al., The Joint Statistical Meeting, Vancouver, CA. August, 2010.
  13. CUSUM*
        Level 2: Find a change point on each sub-series.
        Step 5: break the time series into two segments and repeat steps 1-5 on each segment.
      * Kass-Hout et al., The Joint Statistical Meeting, Vancouver, CA. August, 2010.
  14. CUSUM*
        Level n: Final result
      * Kass-Hout et al., The Joint Statistical Meeting, Vancouver, CA. August, 2010.
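The five CUSUM steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: the bootstrap statistic (the range of the CUSUM, compared against shuffled copies of the series), the 0.95 confidence threshold, the minimum segment length, and all function names are choices made here for the example.

```python
import random

def cusum(values):
    """Steps 1-3: S_0 = 0, S_i = S_{i-1} + (x_i - sample mean)."""
    m = sum(values) / len(values)
    sums = [0.0]
    for x in values:
        sums.append(sums[-1] + (x - m))
    return sums

def bootstrap_confidence(values, n_boot=1000, rng=None):
    """Share of shuffled series whose CUSUM range is smaller than the observed one.

    Assumption: the CUSUM range (max S - min S) is used as the test statistic.
    """
    rng = rng or random.Random(0)

    def cusum_range(v):
        s = cusum(v)
        return max(s) - min(s)

    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / n_boot

def find_change_points(values, offset=0, min_conf=0.95, min_len=5, rng=None):
    """Steps 4-5: split at the point maximizing |S_i|, then recurse on each segment.

    Returns 0-based indices of the first observation after each change.
    """
    rng = rng or random.Random(0)
    if len(values) < 2 * min_len:
        return []
    if bootstrap_confidence(values, rng=rng) < min_conf:
        return []
    s = cusum(values)
    # k observations fall before the change (1 <= k <= n - 1).
    k = max(range(1, len(values)), key=lambda i: abs(s[i]))
    left = find_change_points(values[:k], offset, min_conf, min_len, rng)
    right = find_change_points(values[k:], offset + k, min_conf, min_len, rng)
    return left + [offset + k] + right

# Synthetic series with one level shift after the 20th observation.
series = [10.0] * 20 + [20.0] * 20
print(find_change_points(series))
```

Shuffling destroys any ordering in the data, so a CUSUM range that exceeds almost all shuffled ranges suggests the observed ordering contains a real shift. This is also why the method leans on the iid assumption.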
  15. CPA method 2
        • Structural Change Model (SCM)
            • Based on a mean-shift model
            • Minimizes the sum of squared residuals
            • Pros:
                • Allows for autoregressive data
                • Incorporates independent covariates
                • Asymptotic distribution for change points
            • Cons:
                • Assumes a stationary process
                • Mathematical complexity
  16. Structural Change Model
        • Time-series data Xi, i = 1, …, N
        • Break the series into two segments at any location k
        • Compute the sum of squared residuals (SSR) for each candidate k
        • The change point is located at the k that minimizes the SSR
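A single-break version of this search can be sketched as below. It assumes the simplest mean-shift form, in which each segment is fitted by its own mean, so SSR(k) is the sum over i ≤ k of (Xi − mean of segment 1)² plus the sum over i > k of (Xi − mean of segment 2)². The full structural change model also handles autoregressive terms, covariates, and multiple breaks, none of which are covered here. Function names are illustrative.

```python
def ssr(values):
    """Sum of squared residuals around the segment mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def best_single_break(values, min_len=1):
    """Return (k, SSR(k)) for the split minimizing the total SSR.

    k is the number of observations in the first segment, so the second
    segment starts at 0-based index k.
    """
    n = len(values)
    candidates = range(min_len, n - min_len + 1)
    k = min(candidates, key=lambda j: ssr(values[:j]) + ssr(values[j:]))
    return k, ssr(values[:k]) + ssr(values[k:])

# Synthetic series with a level shift after the 4th observation.
series = [1.0, 1.2, 0.8, 1.0, 5.0, 5.2, 4.8, 5.0]
print(best_single_break(series))
```

This brute-force search recomputes each segment's SSR from scratch, which is fine for short series; longer series would call for incremental updates or dynamic programming.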
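  17. Change points and statistical inference

      | Level | CUSUM change point | CUSUM 95% CI* lb* | CUSUM 95% CI* ub* | CUSUM P value | SCM change point | SCM 95% CI* lb* | SCM 95% CI* ub* | SCM P value |
      |-------|--------------------|-------------------|-------------------|---------------|------------------|-----------------|-----------------|-------------|
      | 1     | 11/27/2008         | 2/4/2006          | 11/1/2009         | 0             | 12/13/2008       | 10/14/2008      | 12/17/2008      | 0           |
      | 2     | 6/23/2009          | 1/20/2009         | 4/4/2010          | 0             | 6/22/2009        | 6/21/2009       | 7/31/2009       | 0           |
      | 3     | 10/4/2009          | 7/28/2009         | 4/28/2010         | 0             | 9/18/2009        | 9/6/2009        | 9/21/2009       | 0           |
      | 4     | 1/20/2010          | 11/3/2009         | 5/7/2010          | 0             | 1/5/2010         | 1/4/2010        | 1/10/2010       | 0           |
      | 5     | 3/1/2010           | 2/3/2010          | 5/21/2010         | 0             | 2/16/2010        | 2/14/2010       | 2/18/2010       | 0           |
      | 6     | 4/6/2010           | 3/12/2010         | 5/24/2010         | 0             | 4/5/2010         | 4/1/2010        | 4/14/2010       | 0           |

      *: 95% CI = 95% confidence interval; lb = lower bound; ub = upper bound.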
  18. CUSUM vs. SCM
      "I have long given up on CUSUM type procedures (and any of the variants). The tests are plagued with problems of non-monotonic power and to get a date and confidence interval for the break date is not trivial and most methods don't work well."
      "The main difference is that I do not use asymptotic results, but instead employ the computer intensive bootstrapping approach to determine confidence levels and intervals so as to make the procedure nonparametric."
      Wayne Taylor, Ph.D.; Pierre Perron, Ph.D.
  19. CPA Method 3
        • Bayesian CPA
            • Weak prior
            • Posterior distributions of the change points
      Thomas Bayes (1702-1761)
  20. Bayesian CPA
      Q: What is the probability that a change occurred?

      | Order | Time       | Posterior probability |
      |-------|------------|-----------------------|
      | 1     | 4/25/2009  | 1                     |
      | 2     | 6/14/2009  | 1                     |
      | 3     | 5/18/2009  | 0.99                  |
      | 4     | 5/22/2009  | 0.982                 |
      | 5     | 5/25/2009  | 0.982                 |
      | 6     | 5/14/2009  | 0.98                  |
      | 7     | 6/2/2009   | 0.936                 |
      | 8     | 1/25/2008  | 0.92                  |
      | 9     | 2/24/2008  | 0.868                 |
      | 10    | 5/15/2009  | 0.85                  |
      | 11    | 12/24/2008 | 0.846                 |
      | 12    | 5/3/2009   | 0.818                 |
      | 13    | 2/22/2009  | 0.806                 |
      | 14    | 11/25/2009 | 0.764                 |
      | 15    | 7/5/2009   | 0.748                 |
      | 16    | 6/21/2009  | 0.714                 |
      | 17    | 11/6/2009  | 0.64                  |
      | 18    | 6/7/2009   | 0.624                 |
      | 19    | 1/3/2010   | 0.61                  |
      | 20    | 11/30/2009 | 0.608                 |
      | 21    | 10/16/2009 | 0.538                 |
      | 22    | 12/24/2009 | 0.532                 |
  21. Autocorrelation Simulation
        • Autocorrelation in biosurveillance data
        • CUSUM assumption
            • Identical
            • Independent
  22. Simulation (cont’d)
        • Purpose:
            • Check CPA robustness and accuracy.
            • Based on a first-order autoregressive model:
            • X1 = µ
            • Xi = ρ Xi-1 + εi, εi ~ N(0, σ²)
            • where i = 2, …, 100 and ρ is the autocorrelation coefficient, with ρ = -1, -.8, -.5, -.2, 0, .2, .5, .8, 1
  23. Simulation (cont’d)
        • CPρ: change point at autocorrelation level ρ
            • CP0: change point at ρ = 0 (treated as an iid sample)
            • For ρ ≠ 0, if CPρ = CP0, it is a match; otherwise, it is a mismatch
        • Run 1000 simulations
        • Report the % of matches in the 1000 simulations
  24. Simulation (cont’d)
      Conclusion: Taylor’s CUSUM method is robust in detecting change points in autocorrelated data, with ≥80% matching probability at |ρ| ≤ 0.2.
      [Figure: panels for CPρ = CP0 and CPρ = CP0 ± 3; horizontal axis ρ]
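The simulation design described on these slides can be sketched as follows. Several details are assumptions made for this example rather than facts from the talk: the same noise draws are reused for the ρ and ρ = 0 series, the change point is taken as the single location maximizing the absolute CUSUM, and µ = 0 and σ = 1 are defaults chosen here. Function names are illustrative.

```python
import random

def ar1_series(rho, noise, mu=0.0):
    """X_1 = mu, X_i = rho * X_{i-1} + eps_i for i >= 2."""
    xs = [mu]
    for eps in noise:
        xs.append(rho * xs[-1] + eps)
    return xs

def cusum_change_point(values):
    """1-based index i maximizing |S_i|, where S_i is the CUSUM of residuals."""
    m = sum(values) / len(values)
    s, best_i, best_abs = 0.0, 1, -1.0
    # The last point is skipped because S_n is always 0.
    for i, x in enumerate(values[:-1], start=1):
        s += x - m
        if abs(s) > best_abs:
            best_i, best_abs = i, abs(s)
    return best_i

def match_rate(rho, n=100, n_sims=1000, tol=0, sigma=1.0, seed=0):
    """Percentage of simulations in which |CP_rho - CP_0| <= tol."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_sims):
        noise = [rng.gauss(0.0, sigma) for _ in range(n - 1)]
        cp_rho = cusum_change_point(ar1_series(rho, noise))
        cp_0 = cusum_change_point(ar1_series(0.0, noise))
        if abs(cp_rho - cp_0) <= tol:
            matches += 1
    return 100.0 * matches / n_sims

for rho in (-0.5, -0.2, 0.0, 0.2, 0.5):
    print(rho, match_rate(rho), match_rate(rho, tol=3))
```

Setting `tol=3` corresponds to counting CPρ = CP0 ± 3 as a match. The numbers this sketch prints depend on the assumptions above and should not be read as a reproduction of the reported results.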
  25. Real Time Trend Analysis
      [Figure labels: Moderately Up, Slightly Up, Slightly Down, Moderately Down, Forecast]
  26. Forecasting
        • Historic data since the last change point
        • Forecasting model:
            • First-order autoregressive (AR) model
            • Xi = ρ Xi-1 + εi, εi ~ N(0, σ²)
            • Forecast two weeks of influenza activity
        • Change point analysis:
            • Are there any changes since the last change point?
            • Is influenza activity going up, down, or stable since the last change point?
  27. Forecasting (cont’d)
      Since the last detected change, no additional significant changes have been detected; influenza activity is stable.
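A first-order autoregressive forecast on the data since the last change point can be sketched as below. The least-squares estimate of ρ, the centering of the series on its segment mean, and the 14-step horizon (two weeks of daily values) are assumptions made for this example; the slides do not say how the model was fitted. Function names are illustrative.

```python
def fit_ar1(values):
    """Estimate (mean, rho) for X_i - mean = rho * (X_{i-1} - mean) + eps_i."""
    m = sum(values) / len(values)
    centered = [x - m for x in values]
    den = sum(a * a for a in centered[:-1])
    num = sum(a * b for a, b in zip(centered[:-1], centered[1:]))
    rho = num / den if den > 0 else 0.0
    return m, rho

def forecast_ar1(values, horizon=14):
    """Point forecasts for the next `horizon` steps from the fitted AR(1) model."""
    m, rho = fit_ar1(values)
    deviation = values[-1] - m
    forecasts = []
    for _ in range(horizon):
        deviation = rho * deviation  # each step shrinks toward the mean when |rho| < 1
        forecasts.append(m + deviation)
    return forecasts

# Hypothetical daily values observed since the last change point.
recent = [2.4, 2.6, 2.5, 2.7, 2.6, 2.8, 2.7, 2.9]
print(forecast_ar1(recent, horizon=14))
```

Fitting only on data after the last change point keeps earlier regimes, with different means, from biasing the estimates.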
  28. Conclusions
        • CPA is a very useful tool for analyzing surveillance data
        • Use CPA and control chart/aberration detection methods in a complementary fashion
        • Real-time trend analysis
        • Integrate CPA into the forecasting model
  29. Open-Access Scientific Collaboration
      58 collaborators, > 100 users from 46 cities
  30. Future work
        • CPA on counts of ED visits due to ILI
        • CPA and forecasting
        • Open source programs in R
        • Manuscripts
  31. References
        • Kass-Hout, T., Park, S., Xu, R., McMurray, P. BioSense Program: Scientific Collaboration. The Joint Statistical Meeting, Vancouver, CA. August, 2010.
        • Bai, J. Estimation of a change point in multiple regression models. Review of Economics and Statistics, 79: 551-563, 1997.
        • Bai, J. and Perron, P. Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18: 1-22, 2003.
        • Erdman, C. and Emerson, J.W. bcp: An R package for performing a Bayesian analysis of change point problems. Journal of Statistical Software, 23 (3): 1-13, 2007.
        • Taylor, W.A. Change-Point Analysis: A Powerful New Tool for Detecting Changes. Retrieved from
  32. Acknowledgement
      CDC
        • Sam Groseclose, DVM, MPH
        • Paul McMurray, MDS
        • Soyoun Park, MS
      Others
        • Rafal Raciborski, Ph.D., Econometrician, StataCorp, College Station, TX
        • Wayne Taylor, Ph.D., President of Taylor Enterprises, Inc.
        • Pierre Perron, Ph.D., Professor of Economics, Boston University
        • Yajun Mei, Ph.D., Asst. Professor of Statistics, Georgia Tech
        • Elena Pesavento, Ph.D., Assoc. Professor of Economics, Emory University