Time Series Analysis and Mining with R

Published in Technology, Economy & Finance
Transcript

  • 1. Time Series Analysis and Mining with R
      Yanchang Zhao
      RDataMining.com
      http://www.rdatamining.com/
      18 July 2011
  • 2. Outline
      1 R and Time Series Data
      2 Time Series Decomposition
      3 Time Series Forecasting
      4 Time Series Clustering
      5 Time Series Classification
      6 R Functions & Packages for Time Series
      7 Conclusions
  • 3. R
      - a free software environment for statistical computing and graphics
      - runs on Windows, Linux and MacOS
      - widely used in academia and research, as well as industrial applications
      - over 3,000 packages
      - CRAN Task View: Time Series Analysis
        http://cran.r-project.org/web/views/TimeSeries.html
  • 4. Time Series Data in R
      - class ts
      - represents data which has been sampled at equispaced points in time
      - frequency is the number of observations per cycle:
          - frequency=7: a weekly series (daily observations, 7 per week)
          - frequency=12: a monthly series (12 observations per year)
          - frequency=4: a quarterly series (4 observations per year)
  • 5. Time Series Data in R
      > a <- ts(1:20, frequency=12, start=c(2011,3))
      > print(a)
           Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
      2011           1   2   3   4   5   6   7   8   9  10
      2012  11  12  13  14  15  16  17  18  19  20
      > str(a)
       Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8
      > attributes(a)
      $tsp
      [1] 2011.167 2012.750   12.000

      $class
      [1] "ts"
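
The ts object built on slide 5 can be inspected and subset with a few base R helpers. The following is a minimal sketch using only the stats package; the variable names first_half_2012 and months_2012 are illustrative and do not come from the slides.

```r
# Same series as on the slide: 20 monthly values starting in March 2011.
a <- ts(1:20, frequency = 12, start = c(2011, 3))

start(a)       # first observation: year 2011, period 3 (March)
end(a)         # last observation: year 2012, period 10 (October)
frequency(a)   # observations per year

# Extract the observations from January to June 2012.
first_half_2012 <- window(a, start = c(2012, 1), end = c(2012, 6))
print(first_half_2012)

# cycle() gives the position of each observation within its year.
months_2012 <- cycle(first_half_2012)
```

window() keeps the time attributes, so the result is again a ts object that plot() and decompose() can work with.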
  • 7. What is Time Series Decomposition
      To decompose a time series into components:
      - Trend component: long term trend
      - Seasonal component: seasonal variation
      - Cyclical component: repeated but non-periodic fluctuations
      - Irregular component: the residuals
  • 8. Data AirPassengers
      Data AirPassengers: monthly totals of international airline passengers (the Box & Jenkins airline data), 1949 to 1960. It has 144 (=12×12) values.
      > plot(AirPassengers)
      [Figure: line plot of AirPassengers against time, 1949 to 1960]
  • 9. Decomposition
      > apts <- ts(AirPassengers, frequency = 12)
      > f <- decompose(apts)
      > # seasonal figures
      > plot(f$figure, type="b")
      [Figure: the 12 seasonal figures f$figure plotted against Index (1 to 12)]
  • 10. Decomposition
      > plot(f)
      [Figure: "Decomposition of additive time series", with panels observed, trend, seasonal and random plotted against Time]
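
To make the additive model behind decompose() concrete, the sketch below rebuilds the series from its three components and checks that nothing is lost. It uses only base R; the names rebuilt, ok and fm are illustrative. The multiplicative fit at the end is an alternative that the slides do not use, shown only for comparison.

```r
apts <- ts(AirPassengers, frequency = 12)
f <- decompose(apts)                 # type = "additive" is the default

# Additive model: observed = trend + seasonal + random.
rebuilt <- f$trend + f$seasonal + f$random

# The centred moving-average trend is undefined at both ends of the series,
# so the first and last 6 months are NA.
ok <- !is.na(rebuilt)
sum(!ok)                             # number of months without a trend value
max(abs(rebuilt[ok] - apts[ok]))     # reconstruction error, numerically zero

# The additive seasonal figures are centred, so they sum to about zero.
sum(f$figure)

# Alternative (not on the slides): multiplicative seasonal figures,
# which are scaled to average one instead.
fm <- decompose(apts, type = "multiplicative")
mean(fm$figure)
```

A multiplicative model is often suggested when the seasonal swings grow with the level of the series, as they appear to do in the AirPassengers plot.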
  • 12. Time Series Forecasting
      - To forecast future events based on known past data
      - E.g., to predict the opening price of a stock based on its past performance
      - Popular models
          - Autoregressive moving average (ARMA)
          - Autoregressive integrated moving average (ARIMA)
  • 13. Forecasting
      > # build a seasonal ARIMA model: AR(1) plus a seasonal part with period 12
      > fit <- arima(AirPassengers, order=c(1,0,0),
      +              seasonal=list(order=c(2,1,0), period=12))
      > fore <- predict(fit, n.ahead=24)
      > # approximate error bounds at 95% confidence level
      > # (forecast +/- 2 standard errors)
      > U <- fore$pred + 2*fore$se
      > L <- fore$pred - 2*fore$se
      > ts.plot(AirPassengers, fore$pred, U, L,
      +         col=c(1,2,4,4), lty = c(1,1,2,2))
      > legend("topleft", col=c(1,2,4), lty=c(1,1,2),
      +        c("Actual", "Forecast",
      +          "Error Bounds (95% Confidence)"))
  • 14. Forecasting
      [Figure: AirPassengers followed by the 24-month forecast and its error bounds; legend: Actual, Forecast, Error Bounds (95% Confidence); x-axis: Time, 1950 to 1962]
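
The factor 2 in the error bounds on slide 13 is a rounded normal quantile. A minimal sketch of the same forecast with the quantile taken from qnorm() follows; it assumes normally distributed forecast errors, and the names z, upper and lower are illustrative rather than taken from the slides.

```r
# Same model as on the slide, with the seasonal argument named explicitly.
fit <- arima(AirPassengers, order = c(1, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 12))
fore <- predict(fit, n.ahead = 24)

# Two-sided 95% interval under a normal approximation.
z <- qnorm(0.975)                    # close to 1.96
upper <- fore$pred + z * fore$se
lower <- fore$pred - z * fore$se

# The forecast is itself a ts object that continues the original series.
start(fore$pred)                     # first forecast month
length(fore$pred)                    # number of forecast months
```

Because the forecast keeps the time attributes, ts.plot() can draw it on the same axis as the observed data without further alignment.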
  • 16. Time Series Clustering
      - To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar
      - Measure of distance/dissimilarity
          - Euclidean distance
          - Manhattan distance
          - Maximum norm
          - Hamming distance
          - The angle between two vectors (inner product)
          - Dynamic Time Warping (DTW) distance
          - ...
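
Several of the measures listed on slide 16 can be written in one line of base R each. The sketch below computes them for two short made-up vectors x and y (illustrative values, not data from the slides) and cross-checks three of them against stats::dist().

```r
x <- c(1, 2, 3, 4)
y <- c(2, 2, 5, 0)

euclidean <- sqrt(sum((x - y)^2))   # square root of the summed squared differences
manhattan <- sum(abs(x - y))        # sum of absolute differences
maximum   <- max(abs(x - y))        # largest absolute difference
hamming   <- sum(x != y)            # number of positions that differ

# Angle between the two vectors, from the normalised inner product.
cosine <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
angle  <- acos(cosine)

# dist() works on the rows of a matrix and supports the first three measures.
m <- rbind(x, y)
dist(m, method = "euclidean")
dist(m, method = "manhattan")
dist(m, method = "maximum")
```

All of these compare the two series point by point, which is why they are sensitive to shifts along the time axis; DTW relaxes exactly that restriction.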
  • 17. Dynamic Time Warping (DTW)
      DTW finds optimal alignment between two time series.
      > library(dtw)
      > idx <- seq(0, 2*pi, len=100)
      > a <- sin(idx) + runif(100)/10
      > b <- cos(idx)
      > align <- dtw(a, b, step=asymmetricP1, keep=TRUE)
      > dtwPlotTwoWay(align)
      [Figure: two-way DTW alignment plot of the two series; y-axis: Query value, x-axis: Index (0 to 100)]
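
The idea behind DTW can be sketched in a few lines of base R as a dynamic programme over a cumulative cost matrix. The function dtw_distance below is a hypothetical helper written for illustration: it uses the simplest unweighted recursion with absolute differences as local cost, which is not the asymmetricP1 step pattern from slide 17, and its values may differ from those of the dtw package.

```r
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # cost[i + 1, j + 1] is the cheapest cumulative cost of aligning
  # x[1..i] with y[1..j]; the extra first row and column are padding.
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(x[i] - y[j])                           # local cost
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1],   # advance in x only
                                    cost[i + 1, j],   # advance in y only
                                    cost[i, j])       # advance in both
    }
  }
  cost[n + 1, m + 1]
}

# A repeated value in y is absorbed by the warping.
dtw_distance(c(1, 2, 3), c(1, 2, 2, 3))

# A shifted copy costs less than the point-by-point sum of differences.
dtw_distance(c(1, 2, 3), c(2, 3, 4))
sum(abs(c(1, 2, 3) - c(2, 3, 4)))
```

The double loop is quadratic in the series length, which is acceptable for 60-point control charts but is one reason to prefer the compiled implementation in the dtw package for larger data.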
  • 18. Synthetic Control Chart Time Series
      - The dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999).
      - Each control chart is a time series with 60 values.
      - Six classes:
          1-100    Normal
          101-200  Cyclic
          201-300  Increasing trend
          301-400  Decreasing trend
          401-500  Upward shift
          501-600  Downward shift
      http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.html
  • 19. Synthetic Control Chart Time Series
      > # read data into R
      > # sep="": the separator is white space, i.e., one
      > # or more spaces, tabs, newlines or carriage returns
      > sc <- read.table("synthetic_control.data",
      +                  header=FALSE, sep="")
      > # show one sample from each class
      > idx <- c(1,101,201,301,401,501)
      > sample1 <- t(sc[idx,])
      > plot.ts(sample1, main="")
  • 20. Six Classes
      [Figure: six panels, one sample series per class (series 1, 101, 201, 301, 401 and 501), each plotted against Time (0 to 60)]
  • 21. Hierarchical Clustering with Euclidean distance
      > # sample n cases from every class
      > n <- 10
      > s <- sample(1:100, n)
      > idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
      > sample2 <- sc[idx,]
      > observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),
      +                     rep(4,n), rep(5,n), rep(6,n))
      > # hierarchical clustering with Euclidean distance
      > hc <- hclust(dist(sample2), method="ave")
      > plot(hc, labels=observedLabels, main="")
  • 22. Hierarchical Clustering with Euclidean distance
      [Figure: dendrogram of the 60 sampled series; y-axis: Height; leaves labelled with class IDs 1 to 6]
  • 23. Hierarchical Clustering with Euclidean distance
      > # cut tree to get 8 clusters
      > memb <- cutree(hc, k=8)
      > table(observedLabels, memb)
                    memb
      observedLabels  1  2  3  4  5  6  7  8
                   1 10  0  0  0  0  0  0  0
                   2  0  3  1  1  3  2  0  0
                   3  0  0  0  0  0  0 10  0
                   4  0  0  0  0  0  0  0 10
                   5  0  0  0  0  0  0 10  0
                   6  0  0  0  0  0  0  0 10
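
The workflow on slides 21 to 23 (distance matrix, hclust, cutree, cross-tabulation) can be tried without downloading the control chart file. The sketch below is a self-contained stand-in that assumes two made-up classes, an upward and a downward trend with Gaussian noise; all data and names in it are illustrative and are not the synthetic control data.

```r
set.seed(1)
time_idx <- 1:60

# Five noisy series per class, one series per row (hypothetical data).
up   <- t(replicate(5, 30 + 0.4 * time_idx + rnorm(60)))
down <- t(replicate(5, 30 - 0.4 * time_idx + rnorm(60)))
series <- rbind(up, down)
labels <- rep(c("up", "down"), each = 5)

# Same steps as on the slides: Euclidean distance, average linkage, cut the tree.
hc <- hclust(dist(series), method = "average")
memb <- cutree(hc, k = 2)

# Rows are the known classes, columns the clusters that were found.
tab <- table(labels, memb)
print(tab)
```

With classes this far apart each cluster contains a single class. The control chart classes are harder to separate, which is what the mixed columns in the table on slide 23 show.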
  • 24. Hierarchical Clustering with DTW Distance
      > # method="DTW" is available once library(dtw) has been loaded (slide 17)
      > myDist <- dist(sample2, method="DTW")
      > hc <- hclust(myDist, method="average")
      > plot(hc, labels=observedLabels, main="")
      > # cut tree to get 8 clusters
      > memb <- cutree(hc, k=8)
      > table(observedLabels, memb)
                    memb
      observedLabels  1  2  3  4  5  6  7  8
                   1 10  0  0  0  0  0  0  0
                   2  0  4  3  2  1  0  0  0
                   3  0  0  0  0  0  6  4  0
                   4  0  0  0  0  0  0  0 10
                   5  0  0  0  0  0  0 10  0
                   6  0  0  0  0  0  0  0 10
  • 25. Hierarchical Clustering with DTW Distance
      [Figure: dendrogram of the 60 sampled series under DTW distance; y-axis: Height; leaves labelled with class IDs 1 to 6]
  • 27. Time Series Classification
      Time Series Classification
      - To build a classification model based on labelled time series
      - and then use the model to predict the label of unlabelled time series
      Feature Extraction
      - Singular Value Decomposition (SVD)
      - Discrete Fourier Transform (DFT)
      - Discrete Wavelet Transform (DWT)
      - Piecewise Aggregate Approximation (PAA)
      - Perceptually Important Points (PIP)
      - Piecewise Linear Representation
      - Symbolic Representation
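
Of the feature extraction methods on slide 27, Piecewise Aggregate Approximation is the easiest to write down: cut the series into equal-length segments and keep the mean of each. The helper paa below is a hypothetical sketch in base R, assuming the series length is a multiple of the number of segments; the input values are made up.

```r
paa <- function(x, segments) {
  # This simple version requires equal-length segments.
  stopifnot(length(x) %% segments == 0)
  # matrix() fills column by column, so each column holds one segment.
  colMeans(matrix(x, ncol = segments))
}

x <- c(1, 3, 2, 4, 10, 12, 7, 9)
features <- paa(x, segments = 4)   # means of (1,3), (2,4), (10,12), (7,9)
print(features)

# A 60-point series, like one control chart, reduced to 6 features.
chart <- sin(seq(0, 2 * pi, length.out = 60))
length(paa(chart, segments = 6))
```

The resulting feature vectors can be passed to any standard classifier in place of the raw series, which is the approach the following slides take with DWT coefficients.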
  • 28. Decision Tree (ctree)
      ctree from package party
      > classId <- c(rep("1",100), rep("2",100),
      +              rep("3",100), rep("4",100),
      +              rep("5",100), rep("6",100))
      > # the class label must be a factor; since R 4.0 data.frame() no longer
      > # converts character columns automatically, so convert it explicitly
      > newSc <- data.frame(classId=factor(classId), sc)
      > library(party)
      > ct <- ctree(classId ~ ., data=newSc,
      +             controls = ctree_control(minsplit=20,
      +                                      minbucket=5, maxdepth=5))
  • 29. Decision Tree
      > pClassId <- predict(ct)
      > table(classId, pClassId)
             pClassId
      classId   1   2   3   4   5   6
            1 100   0   0   0   0   0
            2   1  97   2   0   0   0
            3   0   0  99   0   1   0
            4   0   0   0 100   0   0
            5   4   0   8   0  88   0
            6   0   3   0  90   0   7
      > # accuracy
      > (sum(classId==pClassId)) / nrow(sc)
      [1] 0.8183333
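
The overall accuracy on slide 29 hides where the errors are. The sketch below re-enters the confusion matrix from that slide by hand and derives the accuracy and the per-class hit rate from it; the names conf, accuracy and recall are illustrative.

```r
# Confusion matrix copied from slide 29: rows are true classes,
# columns are predicted classes.
conf <- matrix(c(100,  0,  0,   0,  0, 0,
                   1, 97,  2,   0,  0, 0,
                   0,  0, 99,   0,  1, 0,
                   0,  0,  0, 100,  0, 0,
                   4,  0,  8,   0, 88, 0,
                   0,  3,  0,  90,  0, 7),
               nrow = 6, byrow = TRUE,
               dimnames = list(classId = 1:6, pClassId = 1:6))

# Overall accuracy: correctly classified cases over all cases.
accuracy <- sum(diag(conf)) / sum(conf)
print(accuracy)

# Share of each true class that was predicted correctly.
recall <- diag(conf) / rowSums(conf)
print(recall)
```

Almost all of the loss comes from class 6 (downward shift), of which 90 of the 100 series are predicted as class 4 (decreasing trend).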
  • 30. DWT (Discrete Wavelet Transform)
      - Wavelet transform provides a multi-resolution representation using wavelets.
      - Haar Wavelet Transform: the simplest DWT
        http://dmr.ath.cx/gfx/haar/
      - DFT (Discrete Fourier Transform): another popular feature extraction technique
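
One level of the Haar transform replaces each pair of neighbouring values by a scaled sum and a scaled difference. The function haar_step below is a hypothetical base R sketch of that single step, assuming an even-length input; the sign and ordering conventions may differ from those of the wavelets package used on the next slide.

```r
haar_step <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd  <- x[seq(1, length(x), by = 2)]   # 1st, 3rd, 5th, ... value
  even <- x[seq(2, length(x), by = 2)]   # 2nd, 4th, 6th, ... value
  list(V = (odd + even) / sqrt(2),       # scaling (smooth) coefficients
       W = (even - odd) / sqrt(2))       # wavelet (detail) coefficients
}

x <- c(4, 6, 10, 12, 8, 8, 0, 2)
h <- haar_step(x)

# The scaling by sqrt(2) makes the transform preserve the sum of squares.
sum(h$V^2) + sum(h$W^2)
sum(x^2)

# The original values can be recovered exactly from V and W.
odd_back  <- (h$V - h$W) / sqrt(2)
even_back <- (h$V + h$W) / sqrt(2)
```

Applying the same step again to V gives the next, coarser level, which is how the multi-resolution representation is built up.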
  • 31. DWT (Discrete Wavelet Transform)
      > # extract DWT (with Haar filter) coefficients
      > library(wavelets)
      > wtData <- NULL
      > for (i in 1:nrow(sc)) {
      +   a <- t(sc[i,])
      +   wt <- dwt(a, filter="haar", boundary="periodic")
      +   # keep all wavelet coefficients plus the coarsest scaling coefficients
      +   wtData <- rbind(wtData,
      +                   unlist(c(wt@W, wt@V[[wt@level]])))
      + }
      > wtData <- as.data.frame(wtData)
      > # class label as a factor (needed for ctree on R 4.0 and later)
      > wtSc <- data.frame(classId=factor(classId), wtData)
  • 32. Decision Tree with DWT
      > ct <- ctree(classId ~ ., data=wtSc, controls =
      +             ctree_control(minsplit=20,
      +                           minbucket=5, maxdepth=5))
      > pClassId <- predict(ct)
      > table(classId, pClassId)
             pClassId
      classId  1  2  3  4  5  6
            1 98  2  0  0  0  0
            2  1 99  0  0  0  0
            3  0  0 81  0 19  0
            4  0  0  0 74  0 26
            5  0  0 16  0 84  0
            6  0  0  0  3  0 97
      > (sum(classId==pClassId)) / nrow(wtSc)
      [1] 0.8883333
  • 33. > plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))
      [Figure: the fitted tree, with splits on V57, W43, W5, W31 and W22, and a bar chart of class proportions (classes 1 to 6) in each terminal node]
      Terminal nodes: Node 4 (n = 68), Node 5 (n = 6), Node 7 (n = 9), Node 8 (n = 86), Node 10 (n = 31), Node 13 (n = 80), Node 15 (n = 9), Node 16 (n = 99), Node 18 (n = 12), Node 20 (n = 103), Node 21 (n = 97)
  • 34. k-NN Classification
      - find the k nearest neighbours of a new instance
      - label it by majority voting
      - needs an efficient indexing structure for large datasets
      > k <- 20
      > # a new series: series 501 plus uniform noise, one value per time point
      > newTS <- sc[501,] + runif(ncol(sc))*15
      > distances <- dist(newTS, sc, method="DTW")
      > s <- sort(as.vector(distances), index.return=TRUE)
      > # class IDs of k nearest neighbours
      > table(classId[s$ix[1:k]])

       4  6
       3 17
      Results of majority voting: label of newTS ← class 6
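
The two steps on slide 34, finding the k nearest neighbours and taking a majority vote, fit into one small function. The sketch below is a hypothetical base R version that uses Euclidean distance instead of the DTW distance from the slide, and runs on made-up trend data rather than the control charts; knn_label and all data in it are illustrative.

```r
knn_label <- function(train, labels, new_x, k) {
  # Euclidean distance from new_x to every row of train.
  d <- sqrt(rowSums(sweep(train, 2, new_x)^2))
  # Indices of the k closest training series.
  nearest <- order(d)[seq_len(k)]
  # Majority vote among their labels.
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

set.seed(42)
time_idx <- 1:60
up   <- t(replicate(20, 30 + 0.4 * time_idx + rnorm(60)))
down <- t(replicate(20, 30 - 0.4 * time_idx + rnorm(60)))
train  <- rbind(up, down)
labels <- rep(c("up", "down"), each = 20)

# A new noisy downward series to classify.
new_x <- 30 - 0.4 * time_idx + rnorm(60)
predicted <- knn_label(train, labels, new_x, k = 5)
print(predicted)
```

This version scans every training series for each query, which is the cost the slide refers to when it asks for an efficient indexing structure on large datasets.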
  • 36. Functions - Construction, Plot & Smoothing
      Construction
      - ts(): create time-series objects (stats)
      Plot
      - plot.ts(): plot time-series objects (stats)
      Smoothing & Filtering
      - smoothts(): time series smoothing (ast)
      - sfilter(): remove seasonal fluctuation using moving average (ast)
  • 37. Functions - Decomposition & Forecasting
      Decomposition
      - decomp(): time series decomposition by square-root filter (timsac)
      - decompose(): classical seasonal decomposition by moving averages (stats)
      - stl(): seasonal decomposition of time series by loess (stats)
      - tsr(): time series decomposition (ast)
      - ardec(): time series autoregressive decomposition (ArDec)
      Forecasting
      - arima(): fit an ARIMA model to a univariate time series (stats)
      - predict.Arima(): forecast from models fitted by arima (stats)
  • 38. Packages
      - timsac: time series analysis and control program
      - ast: time series analysis
      - ArDec: time series autoregressive-based decomposition
      - ares: a toolbox for time series analyses using generalized additive models
      - dse: tools for multivariate, linear, time-invariant, time series models
      - forecast: displaying and analysing univariate time series forecasts
      - dtw: Dynamic Time Warping, find optimal alignment between two time series
      - wavelets: wavelet filters, wavelet transforms and multiresolution analyses
  • 39. Online Resources
      - An R Time Series Tutorial
        http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm
      - Time Series Analysis with R
        http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
      - Using R (with applications in Time Series Analysis)
        http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf
      - CRAN Task View: Time Series Analysis
        http://cran.r-project.org/web/views/TimeSeries.html
      - R Functions for Time Series Analysis
        http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
      - R Reference Card for Data Mining; R and Data Mining: Examples and Case Studies
        http://www.rdatamining.com/
      - Time Series Analysis for Business Forecasting
        http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm
  • 41. Conclusions
      - Time series decomposition and forecasting: many R functions and packages available
      - Time series classification and clustering: no R functions or packages specially for this purpose; have to work it out by yourself
      - Time series classification: extract and build features, and then apply existing classification techniques, such as SVM, k-NN, neural networks, regression and decision trees
      - Time series clustering: work out your own distance/similarity metrics, and then use existing clustering techniques, such as k-means and hierarchical clustering
      - Techniques specially for classifying/clustering time series data: a lot of research publications, but no R implementations (as far as I know)
  • 42. The End
      Email: yanchangzhao@gmail.com
      RDataMining: http://www.rdatamining.com
      Twitter: http://twitter.com/rdatamining
      Group on LinkedIn: http://group.rdatamining.com
      Group on Google: http://group2.rdatamining.com