Dynamically Evolving Systems: Cluster Analysis Using Time

David Corliss, PhD originally presented "Dynamically Evolving Systems: Cluster Analysis Using Time" at the SAS Global Forum in 2012.

Dynamically Evolving Systems: Cluster Analysis Using Time

  1. 1. SAS Global Forum, April 24, 2012. Dynamically Evolving Systems: Cluster Analysis Using Time. David J. Corliss, PhD, Magnify Analytic Solutions. Winner, Best Contributed Stats Paper, 2012 SAS Global Forum.
  2. 2. Outline: Overview and History; A Basic Example; Plotting the Results; Standardization; Non-Periodic Events; Summary
  3. 3. A Typical Cluster Analysis in SAS
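  4. 4. An Example of PROC FASTCLUS

        title2 'Preliminary Analysis by FASTCLUS';
        /* Up to 10 preliminary clusters; cluster means go to MEAN,   */
        /* per-observation assignments go to PRELIM (variable PRECLUS) */
        proc fastclus data=iris summary maxc=10 maxiter=99 converge=0
                      mean=mean out=prelim cluster=preclus;
           var petal: sepal:;
        run;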
  10. 10. An Example of PROC FASTCLUS
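
One simple way to inspect the preliminary clusters is to cross-tabulate them against a known grouping. The sketch below assumes the `iris` data set carries a `species` variable, which the slide's VAR statement does not show; `prelim` and `preclus` are the OUT= data set and CLUSTER= variable named on the slide.

```sas
/* Sketch only: compare preliminary clusters with a known class.     */
/* Assumes a variable named SPECIES exists in IRIS and is therefore  */
/* carried into the OUT= data set PRELIM.                            */
proc freq data=prelim;
   tables preclus*species / norow nocol nopercent;
run;
```

Each cell counts the observations from one species that fall in one preliminary cluster, so clusters dominated by a single species show up as rows with one large cell.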
  11. 11. Cluster Analysis Applied to Time Series Data

        **** NOAA Precipitation Data ****;
        data work.noaa;
           infile "/home/sas/NESUG/noaa_mi_1950_2009_tab.txt"
                  dsd dlm='09'x lrecl=1500 truncover firstobs=2;
           input state_code :3.0
                 division   :3.0
                 year_month :$6.
                 pcp        :6.2;
           length year 8 month 8;
           /* Split the YYYYMM string into numeric year and month */
           year  = input(substr(year_month, 1, 4), 4.);
           month = input(substr(year_month, 5, 2), 2.);
        run;
  12. 12. Plotting the Results

        goptions device=png;
        symbol1 font=marker value=u height=0.6 c=blue;
        symbol2 font=marker value=u height=0.6 c=red;
        symbol3 font=marker value=u height=0.6 c=yellow;
        symbol4 font=marker value=u height=0.6 c=green;
        legend1 frame cframe=ligr label=none cborder=black
                position=center value=(justify=center);
        axis1 label=(angle=90 rotate=0) minor=none;
        axis2 minor=none;
        proc gplot data=work.cluster3;
           plot year * month = cluster / frame cframe=ligr legend=legend1
                                         vaxis=axis1 haxis=axis2;
        run;
  16. 16. Cluster Analysis Applied to Time Series Data
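  17. 17. Rate of Change Variables

        **** interval percent change ****;
        proc sort data=work.doe;
           by year week;
        run;
        data work.doe;
           set work.doe;
           by year week;
           retain pw_price_index;
           /* OUTPUT runs before the assignment, so each row carries the    */
           /* index retained from an earlier row (the previous week when    */
           /* there is one row per week)                                    */
           output;
           if first.week then pw_price_index = annualized_price_index;
        run;
        data work.doe;
           set work.doe;
           by year week;
           /* Change from the prior week's index, scaled by 100 */
           weekly_pct_change = (annualized_price_index - pw_price_index) * 100;
        run;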
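
The prior-period value can also be obtained with the LAG function. The following is a minimal sketch, not the author's code: it assumes `work.doe` holds one row per year and week and is already sorted, and `work.doe_lag` is a hypothetical output name. Unlike the slide's formula, it divides by the prior value to give a relative change in percent.

```sas
/* Sketch only: prior-week value via LAG.                              */
/* Assumes one row per YEAR/WEEK in WORK.DOE, sorted by YEAR and WEEK. */
/* WORK.DOE_LAG is a hypothetical output data set name.                */
data work.doe_lag;
   set work.doe;
   /* LAG is called on every row so its queue stays aligned */
   pw_price_index = lag(annualized_price_index);
   /* Relative change in percent; left missing on the first row */
   /* or when the prior value is zero                           */
   if pw_price_index not in (., 0) then
      weekly_pct_change = (annualized_price_index - pw_price_index)
                          / pw_price_index * 100;
run;
```

Calling LAG unconditionally matters: inside an IF block it would return the value from the last row on which the function ran, not the last row read.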
  21. 21. Standardization of Variables

        proc standard data=work.doe mean=0 std=1 out=work.doe_stan;
           var week annualized_price_index weekly_pct_change
               supply supply_pct_change;
        run;
        proc fastclus data=work.doe_stan maxc=6 maxiter=20 out=work.cluster1;
           var week annualized_price_index weekly_pct_change
               supply supply_pct_change;
        run;
  22. 22. Use of PROC STANDARD
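
PROC FASTCLUS works with Euclidean distances, so variables with large variances tend to dominate the clusters; that is the reason for standardizing first. A quick check, sketched here using the data set and variable names from the slide, is to confirm that every standardized variable has a mean near 0 and a standard deviation near 1.

```sas
/* Sketch only: verify the result of PROC STANDARD.                   */
/* Each listed variable should show mean close to 0 and std close to 1 */
proc means data=work.doe_stan n mean std min max;
   var week annualized_price_index weekly_pct_change
       supply supply_pct_change;
run;
```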
  24. 24. Standardization of Volatile Data
  25. 25. Identification of Seasons. Blue: post-holiday lull; Purple: winter season; Red: spring run-up; Amber: summer driving season; Yellow: holiday spikes; Green: fall season
  26. 26. Non-Periodic Events of Variable Duration

        **** Time series of high velocity absorption events ****;
        data work.first_date;
           set work.hva;
           by event_id jd_minus_24e5;
           /* Keep only the earliest observation of each event */
           if first.event_id;
           first_date = jd_minus_24e5;
           keep event_id first_date;
        run;

      (and similarly for the last date of each event, which is merged with the first date)

        data work.hva;
           merge work.hva work.first_last;
           by event_id;
           /* Position within the event as a percentage of its duration */
           percent_duration = (day_of_event / duration) * 100;
        run;
  27. 27. One-Time Events of Variable Duration
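
The slides do not show how `day_of_event` and `duration` are derived. One possible approach, offered only as a sketch, computes everything in a single PROC SQL step. It assumes `work.hva` contains `event_id` and `jd_minus_24e5` as on the slide; the output name `work.hva_rel` and the convention of counting the first day as day 1 are assumptions, not the author's definitions.

```sas
/* Sketch only: relative time within each event.                       */
/* Assumptions: WORK.HVA_REL is a hypothetical output name, and the    */
/* first observed day of an event counts as day 1.                     */
/* PROC SQL remerges the per-event MIN/MAX back onto every row.        */
proc sql;
   create table work.hva_rel as
   select event_id,
          jd_minus_24e5,
          min(jd_minus_24e5) as first_date,
          max(jd_minus_24e5) as last_date,
          jd_minus_24e5 - min(jd_minus_24e5) + 1 as day_of_event,
          max(jd_minus_24e5) - min(jd_minus_24e5) + 1 as duration,
          (jd_minus_24e5 - min(jd_minus_24e5) + 1)
             / (max(jd_minus_24e5) - min(jd_minus_24e5) + 1) * 100
             as percent_duration
   from work.hva
   group by event_id
   order by event_id, jd_minus_24e5;
quit;
```

Expressing time as a percentage of each event's duration puts events of different lengths on a common 0 to 100 scale, which is what lets them be clustered together.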
  30. 30. Absolute versus Relative Measures
  31. 31. Time Series Clustering of Non-Periodic Events
  32. 32. Summary: The SAS procedures developed for cluster analysis can be applied to events that change over time to identify distinct, successive stages in dynamically evolving systems. Both cyclical and non-repeating events may be analyzed. Normalization of fields through the use of PROC STANDARD may be necessary prior to cluster analysis to obtain the best results.
  33. 33. References: Fisher, R. A., 1936, Annals of Eugenics, 7, 2, 179. Tryon, R. C., 1939, Cluster Analysis. Ann Arbor: Edwards Brothers. The SAS Institute, Cary, N.C., www.support.sas.com
  34. 34. Questions: David J. Corliss, dcorliss@marketingassociates.com, david.corliss@rockets.utoledo.edu, http://astro1.panet.utoledo.edu/~dcorliss
