
# Time Series at Scale for Weak Memory Systems

1,203 views

Published by Francois Belletti

Published in: Data & Analytics

### Time Series at Scale for Weak Memory Systems

1. Time series at scale for weak memory systems. Francois Belletti, Evan Sparks, Michael Franklin, Alexandre M. Bayen (UC Berkeley).
2. Time series analysis: canonical time series analysis. Figure: time series analysis in the old days.
3. Time series analysis: new systems to analyze. Figure: analyzing the El Nino pattern with wavelets.
4. Time series analysis: life is a time series. Figure: today, time series analysis ranges from financial markets to smart advertising.
5. Outline: embarrassingly parallel analysis of time series; weak memory time series analysis; the overlapping block framework; applications; extensions.
6. Second order stationary models (weak memory time series analysis). Observed process $(X_t)_{t \in \mathbb{Z}} \in \mathbb{R}^d$. The process is ergodic and, more specifically:
   - $\mathbb{E}(X_t) = \mu_X \in \mathbb{R}^d$ (constant),
   - $\gamma_X(t, h) = \mathrm{Cov}(X_t, X_{t+h})$ is a function of $h$ only,
   - $h \mapsto \gamma_X(h) \in \mathbb{R}^{d \times d}$ is the autocovariance function.

   Example: multidimensional white noise:
   - $\forall t \in \mathbb{Z},\ \mathbb{E}(\varepsilon_t) = 0$,
   - $\forall t \in \mathbb{Z},\ \mathbb{E}(\varepsilon_t \varepsilon_t^T) = \Sigma_\varepsilon$,
   - $\forall t, s \in \mathbb{Z}$ with $t \neq s$, $\mathbb{E}(\varepsilon_t \varepsilon_s^T) = 0$.
7. The importance of autocorrelation.
8. Different levels of memory in time series (1/3): Brownian motions, order 1 integrated processes.
9. Different levels of memory in time series (2/3): trending processes with increasing amplitude and seasonality.
10. Different levels of memory in time series (3/3): controversial partially integrated time series.
11. How to erase memory? By differencing the time series: $(\Delta X_t)_{t \in \mathbb{Z}} = (X_t - X_{t-1})_{t \in \mathbb{Z}}$.
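The differencing step above can be sketched in a few lines of plain Scala (standard collections only, no Spark; names are illustrative):

```scala
// First-order differencing: turns an order 1 integrated process
// (e.g. a random walk) into its stationary increments.
def difference(xs: Vector[Double]): Vector[Double] =
  xs.sliding(2).map { case Vector(prev, cur) => cur - prev }.toVector

// A deterministic "walk" built from increasing steps...
val walk = Vector(0.0, 1.0, 3.0, 6.0, 10.0)
// ...differences back to the underlying increments.
val increments = difference(walk) // Vector(1.0, 2.0, 3.0, 4.0)
```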
12. Essential statistics for second order stationary models. Frequentist estimates:
    - Mean: $\hat{\mu}_X\left((X_t)_{t \in \{1 \dots N\}}\right) = \frac{1}{N} \sum_{k=1}^{N} X_k$,
    - Autocovariance: $\hat{\gamma}_X(h)\left((X_t)_{t \in \{1 \dots N\}}\right) = \frac{1}{N-h-1} \sum_{k=1}^{N-h} X_k X_{k+h}^T$,
    - Autocorrelation: $\hat{\rho}_X(h) = \widehat{\mathrm{Cor}}(X_t, X_{t+h}) = \mathrm{diag}\left(\hat{\gamma}_X(0)\right)^{-\frac{1}{2}} \hat{\gamma}_X(h) \, \mathrm{diag}\left(\hat{\gamma}_X(0)\right)^{-\frac{1}{2}}$,
    - Partial autocorrelation: solve the Yule–Walker system
    $$\begin{bmatrix} \gamma_X(0) & \cdots & \gamma_X(-(p-1)) \\ \vdots & \ddots & \vdots \\ \gamma_X(p-1) & \cdots & \gamma_X(0) \end{bmatrix} \begin{bmatrix} \left(U_1^{(p)}\right)^T \\ \vdots \\ \left(U_p^{(p)}\right)^T \end{bmatrix} = \begin{bmatrix} \gamma_X(1) \\ \vdots \\ \gamma_X(p) \end{bmatrix}$$
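For intuition, here is a minimal univariate sketch of these estimates in plain Scala (no Spark). Note one deliberate simplification: it centers the data and uses the conventional $1/N$ normalization rather than the slide's normalization, which only changes a constant factor:

```scala
// Sample mean of a univariate series.
def mean(xs: Vector[Double]): Double = xs.sum / xs.length

// Sample autocovariance at lag h (centered, 1/N normalization).
def autoCov(xs: Vector[Double], h: Int): Double = {
  val m = mean(xs)
  (0 until xs.length - h).map(k => (xs(k) - m) * (xs(k + h) - m)).sum / xs.length
}

// Sample autocorrelation: autocovariance normalized by the variance.
def autoCorr(xs: Vector[Double], h: Int): Double =
  autoCov(xs, h) / autoCov(xs, 0)
```

The multivariate versions on the slide replace the scalar products with outer products $X_k X_{k+h}^T$ and the variance normalization with the diagonal scaling shown above.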
13. Common computational structure: map-reduce computation of M-estimators.
14. The VARMA family of models.
    - Vector autoregressive (VAR) models, a linear predictor with iid noise: $X_t = A_1 X_{t-1} + \dots + A_p X_{t-p} + \varepsilon_t$.
    - Vector moving average (VMA) models, autocorrelated noise: $X_t = \varepsilon_t + B_1 \varepsilon_{t-1} + \dots + B_q \varepsilon_{t-q}$.
    - Vector autoregressive moving average (VARMA) models, a linear predictor with autocorrelated noise: $X_t = A_1 X_{t-1} + \dots + A_p X_{t-p} + \varepsilon_t + B_1 \varepsilon_{t-1} + \dots + B_q \varepsilon_{t-q}$.

    To estimate the parameters, compute $\hat{\gamma}_X(h)\left((X_t)_{t \in \{1 \dots N\}}\right)$ for $h = 1 \dots p + q$.
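In the simplest scalar AR(1) case, the link between autocovariances and model parameters is the Yule–Walker relation $\hat{a}_1 = \hat{\gamma}(1) / \hat{\gamma}(0)$. A minimal Scala sketch (assuming centered data for brevity; names are illustrative):

```scala
// Uncentered sample autocovariance at lag h (data assumed centered).
def gamma(xs: Vector[Double], h: Int): Double =
  (0 until xs.length - h).map(k => xs(k) * xs(k + h)).sum / xs.length

// Yule-Walker estimate of the AR(1) coefficient: a1 = gamma(1) / gamma(0).
def fitAR1(xs: Vector[Double]): Double = gamma(xs, 1) / gamma(xs, 0)
```

The vector case on the slide generalizes this to a linear system in the matrices $A_1, \dots, A_p$, which is exactly why the autocovariance estimates for lags $1 \dots p+q$ are the quantities the system needs to compute at scale.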
15. The issue of naive partitioning (the overlapping block framework). Data is partitioned along the time axis. How do we compute $\frac{1}{N-h-1} \sum_{k=1}^{N-h} X_k X_{k+h}^T$ with partitioned data? How do we enable some look-ahead or look-back with partitioned data?
16. From informational structure to memory layout. Data is partitioned; joins and communication in general are avoided. There are only short-range dependencies in M-estimation and Z-estimation, so data replication enables embarrassingly parallel computations.
17. Computational accounting and the target system. A kernel is computed on a target. Only genuine data points (partition index == origin index) can be targets, which guarantees there are no redundant computations.
18. A simple programming paradigm. Second order essential statistics trait:
    - `def kernelWidth: IntervalSize`
    - `def zero: ResultT`
    - `def kernel(slice: Array[(IndexT, ValueT)]): ResultT = ???` (your kernel)
    - `def reducer(r1: ResultT, r2: ResultT): ResultT = ???` (your reducing operation)
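The overlapping-block layout with genuine versus replicated points can be sketched in plain Scala over an in-memory vector (Spark partitions are simulated by sub-vectors; all names are illustrative, not the SparkGeoTS API):

```scala
// A point carries its time index, its value, and whether it is "genuine"
// in this block (owned by the partition) or replicated padding from a
// neighbouring block.
case class Point(index: Int, value: Double, genuine: Boolean)

// Split a series into blocks of `blockSize`, each extended by `padding`
// replicated points on both sides so lagged kernels never need a join.
def overlappingBlocks(
    xs: Vector[Double], blockSize: Int, padding: Int): Vector[Vector[Point]] =
  xs.indices.grouped(blockSize).toVector.map { own =>
    val lo = math.max(0, own.head - padding)
    val hi = math.min(xs.length - 1, own.last + padding)
    (lo to hi).toVector.map(i => Point(i, xs(i), genuine = own.contains(i)))
  }
```

Because each original index is genuine in exactly one block, kernels evaluated only at genuine targets are never computed twice, which is the accounting guarantee the slide describes.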
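A concrete instance of this kernel/reducer paradigm, simplified to `Double` data with no index type (the trait mirrors the slide's shape; the windowed runner is an assumption standing in for the framework's block machinery):

```scala
// Simplified version of the second order essential statistics trait.
trait SecondOrderStat[ResultT] {
  def kernelWidth: Int                        // points each kernel call sees
  def zero: ResultT                           // identity for the reduction
  def kernel(slice: Vector[Double]): ResultT  // local computation
  def reducer(r1: ResultT, r2: ResultT): ResultT
}

// Example statistic: the sum of lag-1 cross products X_k * X_{k+1},
// the building block of an autocovariance estimate.
object Lag1CrossProduct extends SecondOrderStat[Double] {
  val kernelWidth = 2
  val zero = 0.0
  def kernel(slice: Vector[Double]): Double = slice(0) * slice(1)
  def reducer(r1: Double, r2: Double): Double = r1 + r2
}

// Apply the kernel to every window, then reduce. With overlapping blocks
// this map-reduce runs embarrassingly parallel, one block at a time.
def run[R](stat: SecondOrderStat[R], xs: Vector[Double]): R =
  xs.sliding(stat.kernelWidth)
    .map(w => stat.kernel(w.toVector))
    .foldLeft(stat.zero)(stat.reducer)
```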
19. R-like API. Create an overlapping block RDD:
    - `SingleAxisBlockRDD((paddingMillis, paddingMillis), nPartitions, inSampleData)`

    API calls for exploratory data analysis:
    - `val mean = MeanEstimator(timeSeriesRDD)`
    - `val meanProfile = MeanProfileEstimator(timeSeriesRDD, hashFunction)`
    - `val (correlations, _) = CrossCorrelation(timeSeriesRDD, h)`

    API calls for modeling:
    - `val (estVARMatrices, _) = VARModel(timeSeriesRDD, p)`
    - `val residualVAR = VARPredictor(timeSeriesRDD, estVARMatrices, Some(mean))`
20. High dimensional, data-intensive system identification (applications): prediction of Uber demand in New York from Uber ride requests.
21. Statistical properties of Uber demand in New York: 40,515 samples, 314 dimensions. Demand for Uber rides in New York, April 2014.
22. Seasonality analysis of Uber rides: compute and subtract the weekly average profile.
23. VAR coefficients (AR1, Uber rides). Once we identify the matrix $A_1$, we can predict demand $X_t$ (at time $t$) from demand $X_{t-1}$ (at time $t-1$): the best predictor for $X_t$ given $X_{t-1}$ is $A_1 X_{t-1}$.
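The one-step VAR(1) prediction is just a matrix-vector product; a minimal sketch in $d = 2$ dimensions with plain arrays standing in for a linear algebra library (the coefficient values are made up for illustration):

```scala
// One-step VAR(1) forecast: Xhat_t = A1 * X_{t-1}.
def predict(a1: Array[Array[Double]], xPrev: Array[Double]): Array[Double] =
  a1.map(row => row.zip(xPrev).map { case (a, x) => a * x }.sum)

// Hypothetical 2x2 coefficient matrix and previous observation.
val a1 = Array(Array(0.5, 0.1), Array(0.0, 0.3))
val xPrev = Array(2.0, 1.0)
val xHat = predict(a1, xPrev) // Array(1.1, 0.3)
```

The residuals on the following slides are then $X_t - A_1 X_{t-1}$, whose covariance measures how much structure the model leaves unexplained.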
24. Univariate residuals (Uber rides): covariance of univariate residuals.
25. Multivariate residuals (Uber rides): covariance of multivariate residuals.
26. What any-scale time series analysis enables (1/3), further steps: the GDELT data set, interactions between news providers.
27. What any-scale time series analysis enables (2/3): climate studies, geophysical systems.
28. What any-scale time series analysis enables (3/3): large scale cyber-physical systems.
29. Concluding remarks and questions (SparkGeoTS). Packages such as Thunder and SparkTS were optimized only for univariate time series analysis: partitioning was done only with respect to sensing dimensions. We enable time axis partitioning, and with overlapping blocks we can calibrate all models of the ARMA family. This scheme will now be extended to FARIMA models.