Ashfaq Munshi, ML7 Fellow, Pepperdata

Classifying Multivariate Time Series at Scale
Characterizing and understanding the runtime behavior of large-scale Big Data production systems is extremely important. Typical systems consist of hundreds to thousands of machines in a cluster with hundreds of terabytes of storage, cost millions of dollars, and solve business-critical problems. By instrumenting each running process and recording its resource utilization (CPU, memory, I/O, network, etc.) as time series, it is possible to understand and characterize the workload on these massive clusters. Each time series contains anywhere from tens to tens of thousands of data points, all of which must be ingested and then classified. At Pepperdata, our instrumentation collects over three hundred metrics from each task every five seconds, resulting in tens of millions of data points per minute on a large production cluster. At this scale, the data are comparable to the largest IoT data sets in the world. Our objective is to classify this collection of time series into a set of classes that represent different workload types. In other words, our problem is the problem of classifying multivariate time series.
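As a rough check on the data volume, the arithmetic below multiplies out the example production deployment shown in the slides (570 nodes, 20 tasks per node, 300 metrics per task, 5-second sampling). It is a sketch that assumes every task reports every metric at every sample; real clusters will vary.

```python
# Back-of-the-envelope telemetry volume for the example deployment in the
# slides. Assumes every task reports all metrics at every sample.
NODES = 570
TASKS_PER_NODE = 20
METRICS_PER_TASK = 300
SAMPLE_INTERVAL_S = 5

samples_per_minute = 60 // SAMPLE_INTERVAL_S  # 12 samples per metric per minute
points_per_minute = NODES * TASKS_PER_NODE * METRICS_PER_TASK * samples_per_minute
points_per_hour = points_per_minute * 60

print(f"{points_per_minute:,} points/minute")  # 41,040,000 points/minute
print(f"{points_per_hour:,} points/hour")      # 2,462,400,000 points/hour
```

The per-minute figure matches the slide's "41 Million Points / Minute".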

In this talk, we propose an off-the-shelf approach to classifying time series that achieves near best-in-class accuracy on univariate series and generalizes to multivariate time series. Our technique maps each time series to a Gramian Angular Difference Field (GADF), interprets the GADF as an image, uses Google’s pre-trained Inception v3 CNN to map each GADF image to a 2048-dimensional vector, and then classifies that vector with a small MLP (two hidden layers of fifty nodes each) followed by a softmax output layer. Our work is not domain specific: it achieves accuracies competitive with published results on both the univariate UCR data set and the multivariate UCI data set.
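A minimal NumPy sketch of two stages of this pipeline follows: the GADF transform (using the Wang & Oates definition, GADF(i, j) = sin(φ_i − φ_j), where φ_i is the arccos of the series rescaled to [-1, 1]) and the forward pass of the MLP head. The Inception v3 embedding is not reproduced here. In Keras, one way to get a 2048-dimensional vector is `tf.keras.applications.InceptionV3(include_top=False, weights="imagenet", pooling="avg")`, but the talk does not say which framework was used, and resizing the GADF to the network's input size is left out. The helper names, the weight initialization, and the assignment of dropout rates 0.1 and 0.2 to the first and second hidden layers are assumptions.

```python
import numpy as np

def gadf(series):
    """Gramian Angular Difference Field: sin(phi_i - phi_j), where
    phi = arccos of the series rescaled to [-1, 1]."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        scaled = np.zeros_like(x)  # constant series: map to 0 (phi = pi/2)
    else:
        scaled = ((x - hi) + (x - lo)) / (hi - lo)
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))  # clip guards rounding error
    return np.sin(phi[:, None] - phi[None, :])

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_mlp(n_classes, n_in=2048, n_hidden=50, seed=0):
    """Hypothetical helper: random He-style weights for
    2048 -> 50 -> 50 -> n_classes."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_hidden, n_hidden, n_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x, dropout=(0.1, 0.2), train=False, rng=None):
    """Two ReLU hidden layers with inverted dropout (training only),
    then a softmax output layer."""
    rng = rng if rng is not None else np.random.default_rng()
    h = x
    for (W, b), p in zip(params[:-1], dropout):
        h = relu(h @ W + b)
        if train:
            keep = rng.random(h.shape) >= p
            h = h * keep / (1.0 - p)
    W, b = params[-1]
    return softmax(h @ W + b)

# Usage: a random vector stands in for the Inception v3 embedding.
image = gadf(np.sin(np.linspace(0, 4 * np.pi, 64)))
embedding = np.random.default_rng(1).normal(size=(1, 2048))
probs = mlp_forward(init_mlp(n_classes=5), embedding)
print(image.shape, probs.shape)  # (64, 64) (1, 5)
```

Because sin(φ_i − φ_j) = −sin(φ_j − φ_i), every GADF is antisymmetric with a zero diagonal, which is a quick sanity check on any implementation. Training the head (loss, optimizer, epochs) is not described in the talk and is omitted.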

Bio: Before joining Pepperdata, Ash was executive chairman of Marianas Labs, a deep learning startup sold in December 2015. Prior to that, he was CEO of Graphite Systems, a big data storage startup that was sold to EMC DSSD in August 2015. He also served as CTO of Yahoo and as CEO of both public and private companies, and he sits on the boards of several technology startups.

  1. Classifying Multivariate Time Series Scalably
     Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi
     November 10, 2017
  2. Overview
     • Background and Motivation
     • Univariate Time Series (UTS)
     • Multivariate Time Series (MTS)
     • Conclusion
     © Pepperdata, Inc.
  3. Background
  4. Pepperdata Telemetry Data Scale
     Example production deployment:
     • 570 nodes
     • 20 tasks/node
     • 300 metrics/task
     • 5-second sampling
     • 41 million points/minute
  5. Our Big Data About Production Big Data
     • 300 trillion performance data points collected
     • 22 thousand production nodes
     • 50 million jobs/year
  6. Example Time Series
  7. Characteristics of Our TS
     • Highly variable in length
       • 10 data points to 10K+ data points
     • Missing data
     • Extremely noisy
  8. Problem
     Classify this collection of time series to give operators a better understanding of resource utilization on their clusters and to enable a scheduler to better optimize cluster resources.
  9. Univariate Time Series
  10. Approaches and Data Set
      • Two recent approaches from the literature
        • Transform the TS into an image, then use a tiled CNN [Wang & Oates 2015]
        • Transform the TS into a bag of patterns [Schäfer & Leser 2017]
      • Dataset is the UCR data set
        • 82 time series data sets
        • Number of series < 10K
        • Data points per series < 2K
  11. Time Series and Images [Wang & Oates 2015]
      • Map the time series into
        • Gramian Angular Summation Fields
        • Gramian Angular Difference Fields
        • Markov Transition Fields
      • Feed the images into a tiled CNN for classification
  12. Gramian Angular Fields [Wang & Oates 2015]
      • Normalize the time series into [-1, 1]
      • Transform to polar coordinates: φ_i = arccos(x̃_i)
      • Form the fields: GASF(i, j) = cos(φ_i + φ_j), GADF(i, j) = sin(φ_i − φ_j)
  13. Example GADF Image [Wang & Oates 2015]
  14. Time Series and Bag of Patterns [Schäfer & Leser 2017]
      • Divide the TS into windows
      • Fourier-transform the TS in each window
      • Apply a low-pass filter
      • Quantize the Fourier coefficients
      • Map each window to a word
      • Extract features from the resulting sentences
      • Use a logistic regression classifier
  15. Our “Off-the-Shelf” Approach (PD)
      • Convert the TS into an image (GADF)
      • Use Google’s pre-trained Inception v3 CNN
      • Embed into a 2,048-dimensional vector space
      • Train an MLP
        • 2 hidden layers (50 nodes each)
        • ReLU activation
        • Dropout for regularization (0.1, 0.2)
        • Softmax final layer
  16. Accuracies for a subset of UCR
      [Bar chart, accuracy: BOSS 91.1%, PD 89.8%, GADF+GASF+MTF 86.4%]
  17. Accuracy on a subset of UCR
      [Bar chart, accuracy axis 68% to 86%, comparing WEASEL, 1-NN DTW CV, 1-NN DTW, BOSS, Learning Shapelet (LS), TSBF, ST, EE (PROP), COTE (ensemble), and PD; per-method values not recoverable]
  18. Training Time Comparison
      [Chart including PD; values not recoverable]
  19. Multivariate Time Series
  20. Approaches and Data Set
      • Two recent approaches from the literature
        • Use an ESN (“Echo State Network”) to map the MTS into state clouds [Wang, Wang, Liu 2015]
        • Use dynamic time warping with a Mahalanobis distance metric [Mei, Liu, Wang, Gao 2016]
      • Dataset is from UCI, a small subset of UCR, and others
        • Number of series ~10K
        • Data points per series ~200
  21. Our “Off-the-Shelf” Approach (PD)
      • Zero-pad the TS for each variable so all have the same length
      • Convert each TS into a GADF image
      • Fill any missing data points in each image by linear interpolation
      • Stack the images for the five variables
      • Apply the same process as before for univariate time series
  22. 5-Fold Cross-Validation Error
      [Bar chart, error (axis 0 to 30) on Robot failure LP1 to LP5: MDDTW Best vs. PD 5-fold]
  23. 10-Fold Cross-Validation Error
      [Bar chart, error (axis 0 to 30) on Robot failure LP1 to LP5: Echo Network Best vs. PD 10-fold]
  24. PD Data
      • Four variables:
        • CPU, Virtual Memory, HDFS reads, Network Ops
      • Each time series collected over one week
      • 10 data points to 10K+ data points
      • Missing data
      • Extremely noisy
      • For periods longer than a week, data is much larger
      • Sampling rate is the same for all TS
  25. Accuracy per Label on PD Dataset G
      [Bar chart: accuracy (%) for labels 1 to 17]
      • Number of TS = 3,092
      • Lengths per TS = 5 to 8,500
      • Average accuracy = 78.14%
  26. Accuracy per Label on PD Dataset R
      [Bar chart: accuracy (%) for labels 1 to 48]
      • Number of TS = 6,715
      • Lengths per TS = 5 to 9,400
      • Average accuracy = 75.95%
  27. Summary
      Our “off-the-shelf” approach is comparable to the best approaches for both UTS and MTS, and the methodology is the same for both types of TS.
  28. Thank You
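The bag-of-patterns steps listed above [Schäfer & Leser 2017] can be illustrated with a simplified NumPy sketch: sliding windows, a per-window real FFT, a low-pass filter that keeps the first few non-DC coefficients, equi-depth quantization into letters, and a histogram of the resulting words. This is not the exact BOSS or WEASEL algorithm (it skips window normalization, numerosity reduction, multiple window sizes, and feature selection), and the function names and parameters are hypothetical.

```python
import numpy as np
from collections import Counter

def window_dft_features(series, window, n_coeffs):
    """FFT each sliding window and keep the real and imaginary parts of the
    first n_coeffs non-DC coefficients (a crude low-pass filter)."""
    x = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    spectra = np.fft.rfft(windows, axis=1)[:, 1:n_coeffs + 1]  # drop DC term
    return np.concatenate([spectra.real, spectra.imag], axis=1)

def learn_bins(features, n_bins):
    """Equi-depth bin edges per feature column, learned from training windows."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(features, qs, axis=0).T  # shape (n_features, n_bins - 1)

def to_words(features, edges):
    """Quantize each window's coefficients into letters and join them into a word."""
    words = []
    for row in features:
        idx = [int(np.searchsorted(e, v, side="right")) for v, e in zip(row, edges)]
        words.append("".join(chr(ord("a") + i) for i in idx))
    return words

def bag_of_patterns(series, window, n_coeffs, edges):
    """Histogram of words over all windows of one series."""
    return Counter(to_words(window_dft_features(series, window, n_coeffs), edges))

# Usage on synthetic data.
rng = np.random.default_rng(0)
train = [np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.normal(size=200)
         for _ in range(5)]
all_feats = np.vstack([window_dft_features(s, 32, 2) for s in train])
edges = learn_bins(all_feats, n_bins=4)
hist = bag_of_patterns(train[0], 32, 2, edges)
print(hist.most_common(3))
```

Each series becomes a word-count histogram; in the approach described on the slide, features extracted from these "sentences" go to a logistic regression classifier, which is not shown here.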
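A minimal NumPy sketch of the multivariate preprocessing listed above (zero padding, one GADF per variable, linear interpolation of missing pixels, stacking) follows. It assumes missing samples arrive as NaN, pads at the end of each series, and reads "linear interpolation on the image" as filling NaN pixels row by row and then column by column; these choices and the function names are assumptions, not the authors' code. How the multi-channel stack is then fed to the three-channel Inception v3 network is not specified in the slides.

```python
import numpy as np

def gadf(series):
    """GADF that tolerates NaN (missing) samples: their rows/columns stay NaN."""
    x = np.asarray(series, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    if hi == lo:
        scaled = np.where(np.isnan(x), np.nan, 0.0)
    else:
        scaled = ((x - hi) + (x - lo)) / (hi - lo)
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))  # NaN propagates through
    return np.sin(phi[:, None] - phi[None, :])

def fill_nan_linear(v):
    """Linearly interpolate NaNs in a 1-D array (edge values are held constant)."""
    v = v.copy()
    missing = np.isnan(v)
    if missing.all() or not missing.any():
        return v
    idx = np.arange(v.size)
    v[missing] = np.interp(idx[missing], idx[~missing], v[~missing])
    return v

def interpolate_image(img):
    """Fill NaN pixels row by row, then column by column for rows that were all NaN."""
    img = np.apply_along_axis(fill_nan_linear, 1, img)
    return np.apply_along_axis(fill_nan_linear, 0, img)

def mts_to_gadf_stack(variables):
    """Zero-pad each variable to a common length, convert each to a GADF,
    fill missing pixels, and stack into an array of shape (n_vars, L, L)."""
    length = max(len(v) for v in variables)
    padded = [np.pad(np.asarray(v, dtype=float), (0, length - len(v)))
              for v in variables]
    return np.stack([interpolate_image(gadf(v)) for v in padded])

# Usage: two hypothetical variables of different lengths, one with a gap.
cpu = [0.2, 0.5, np.nan, 0.9, 0.4]
mem = [1.0, 1.1, 1.3]
stack = mts_to_gadf_stack([cpu, mem])
print(stack.shape)  # (2, 5, 5)
```

Note that zero padding happens before the [-1, 1] rescaling, so the padded zeros can shift a variable's min/max; padding after rescaling would be an alternative design choice.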
