Scalable AutoML for Time Series
Forecasting using Ray
Shengsheng Huang
Intel Corporation
Jason Dai
Intel Corporation
Agenda
Background
Analytics Zoo, Time Series, AutoML, Ray
Scalable AutoML for Time Series
Architecture, workflow, etc.
Use Case Sharing & Learnings
Use case study, what we learn from early
users, future work
Background
AI on Big Data
Accelerating Data Analytics + AI Solutions At Scale
Distributed, High-Performance
Deep Learning Framework
for Apache Spark*
https://github.com/intel-analytics/bigdl
Unified Analytics + AI Platform
for TensorFlow*, PyTorch*, Keras*, BigDL, Ray* and
Apache Spark*
https://github.com/intel-analytics/analytics-zoo
Analytics Zoo
https://github.com/intel-analytics/analytics-zoo
Unified Data Analytics and AI Platform
Time Series In a Nutshell
Time Series data
▪ A series of data that is observed sequentially in time.
▪ stock prices, sales volume, CPU/IO monitoring metrics,
KPIs in telecom networks ...
Time Series Analysis
▪ Time Series Forecasting
▪ Anomaly Detection
▪ Time Series Classification, clustering, etc.
Applications
▪ Demand forecasting
▪ network quality management
▪ predictive maintenance
▪ AIOps
Total volume of taxi passengers in NYC from 2014/07-2015/02 ( source :
https://github.com/intel-analytics/analytics-zoo/blob/master/apps/anomaly-
detection/anomaly-detection-nyc-taxi.ipynb)
Time Series Forecasting
Problem Definition
▪ Given all history t observations 𝑦_1,…, 𝑦_𝐭 , Predict
values of next 𝐡 steps, 𝑦_(𝐭+𝟏),…, 𝑦_(𝐭+𝒉)
▪ Usually only lookback 𝐤 steps, 𝑦_(𝐭−𝒌+𝟏),…, 𝑦_𝐭
Forecasting Methods
▪ Autoregression, Exponential Smoothing, ARIMA, …
▪ Machine Learning and Deep Learning methods
𝒚 𝟏 𝒚 𝟐 𝒚 𝒕 𝒚 𝒕$𝟏 𝒚 𝒕$𝒉𝒚 𝒕&𝒌$𝟏
lookback k steps forecast h steps forward
AutoML
Source: Yao, Q., Wang, et. al Taking the Human out of Learning Applications : A Survey on Automated Machine Learning.
Ray and RayOnSpark
Ray
▪ A distributed framework for emerging AI applications
Ray Tune
▪ A library on Ray for experiment execution and
hyperparameter tuning
RayOnSpark
▪ a feature in Analytics Zoo
▪ Directly run Ray programs on big data cluster
▪ Seamlessly integrate ray into spark data processing
pipeline
https://ray.readthedocs.io/en/latest/
https://analytics-zoo.github.io/master/#ProgrammingGuide/rayonspark/
Scalable AutoML for Time Series
Time Series Solution In Analytics Zoo
Rich algorithms
statistical, neural-networks, hybrid
state-of-art models, etc.
AutoML
for automatic feature generation,
model selection, hyper-parameter
tuning, etc.
Seamless scaling
with integrated analytics and AI
pipelines
Software Stack
AutoML Framework
▪ FeatureTransformer
▪ Model
▪ SearchEngine
▪ Pipeline
Time Series upon AutoML
▪ TimeSequencePredictor
▪ TimeSequencePipeline
https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-
zoo-b79a6fd08139
Training at Runtime
A glimpse of API
Training a Pipeline
▪ fit (w/ automl)
▪ recipe
▪ distributed mode
Using a Pipeline
▪ save/load
▪ evaluate/predict
▪ fit (incremental)
Project Zouwu
Use case
▪ reference time series use cases for Telco (such as
network traffic forecasting, etc.)
Models
▪ built-in models for time series analysis
(such as LSTM and MTNet)
“AutoTS”
▪ AutoML support for building E2E time series analysis
pipelines (including automatic feature generation,
model selection and hyperparameter tuning)
Project
Zouwu
Built-in Models
ML Workflow AutoML Workflow
Integrated Analytics & AI Pipelines
use-case
autots model
https://github.com/intel-analytics/analytics-
zoo/tree/master/pyzoo/zoo/zouwu
Use Case Sharing & Learnings
Network Traffic KPI Forecasting in Telco
Usage Scenario
▪ KPI/metrics forecasting is widely used in
Telco applications (e.g., energy saving,
network slicing, etc.)
▪ aggregated traffic KPI’s (i.e. total bytes,
average rate in Mbps/Gbps) in the past
week to forecast the KPI in the next two
hours.
2 ways to solve this
problem using Zouwu
▪ Use built-in “Forecaster” models for
training, and forecasting (notebook link)
▪ Use “AutoTS” (with built-in AutoML
support) to train an E2E Time Series
Analysis Pipeline, and forecast
(notebook link) Example result of network traffic average rate forecasting on the test period
Network Quality Prediction in SK Telecom
Spark-SQL
Data Loading
Data
Loader
Data Source APIs
File, HTTP, Kafka
forked.
DRAM Store
customized.
Flash
Store
tiering
Preprocess RDD of Tensor Model Code of TF
DL Training & Inferencing
Data Model
SIMD Acceleration
https://databricks.com/session_eu19/apache-spark-ai-use-case-in-
telco-network-quality-analysis-and-prediction-with-geospatial-
visualization
Forecast-based Analysis for AIOps at Neusoft
https://platform.neusoft.com/2020/01/17/xw-intel.html
https://platform.neusoft.com/2017/03/04/qt-baomaqiche.html
Takeaways from Early Users
Highlights
▪ Additional features allowed
▪ Less efforts in tuning
▪ Satisfactory accuracy
Data quality matters
▪ Missing values, outliers, etc.
Scale, scale, scale
▪ hundreds of thousands of cells X KPIs => millions of time series
▪ millions of servers/containers X metrics => hundreds of millions of time series
Future Work
Extremely high dimensional time series
Search algo, meta-learning, ensemble, …
More models, features, …
Automatic data preprocessing (e.g. missing data & outliers
Accelerate Your Data Analytics & AI Journey with Intel
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
Scalable AutoML for Time Series Forecasting using Ray

Scalable AutoML for Time Series Forecasting using Ray

  • 2.
    Scalable AutoML forTime Series Forecasting using Ray Shengsheng Huang Intel Corporation Jason Dai Intel Corporation
  • 3.
    Agenda Background Analytics Zoo, TimeSeries, AutoML, Ray Scalable AutoML for Time Series Architecture, workflow, etc. Use Case Sharing & Learnings Use case study, what we learn from early users, future work
  • 4.
  • 5.
    AI on BigData Accelerating Data Analytics + AI Solutions At Scale Distributed, High-Performance Deep Learning Framework for Apache Spark* https://github.com/intel-analytics/bigdl Unified Analytics + AI Platform for TensorFlow*, PyTorch*, Keras*, BigDL, Ray* and Apache Spark* https://github.com/intel-analytics/analytics-zoo
  • 6.
  • 7.
    Time Series Ina Nutshell Time Series data ▪ A series of data that is observed sequentially in time. ▪ stock prices, sales volume, CPU/IO monitoring metrics, KPIs in telecom networks ... Time Series Analysis ▪ Time Series Forecasting ▪ Anomaly Detection ▪ Time Series Classification, clustering, etc. Applications ▪ Demand forecasting ▪ network quality management ▪ predictive maintenance ▪ AIOps Total volume of taxi passengers in NYC from 2014/07-2015/02 ( source : https://github.com/intel-analytics/analytics-zoo/blob/master/apps/anomaly- detection/anomaly-detection-nyc-taxi.ipynb)
  • 8.
    Time Series Forecasting ProblemDefinition ▪ Given all history t observations 𝑦_1,…, 𝑦_𝐭 , Predict values of next 𝐡 steps, 𝑦_(𝐭+𝟏),…, 𝑦_(𝐭+𝒉) ▪ Usually only lookback 𝐤 steps, 𝑦_(𝐭−𝒌+𝟏),…, 𝑦_𝐭 Forecasting Methods ▪ Autoregression, Exponential Smoothing, ARIMA, … ▪ Machine Learning and Deep Learning methods 𝒚 𝟏 𝒚 𝟐 𝒚 𝒕 𝒚 𝒕$𝟏 𝒚 𝒕$𝒉𝒚 𝒕&𝒌$𝟏 lookback k steps forecast h steps forward
  • 9.
    AutoML Source: Yao, Q.,Wang, et. al Taking the Human out of Learning Applications : A Survey on Automated Machine Learning.
  • 10.
    Ray and RayOnSpark Ray ▪A distributed framework for emerging AI applications Ray Tune ▪ A library on Ray for experiment execution and hyperparameter tuning RayOnSpark ▪ a feature in Analytics Zoo ▪ Directly run Ray programs on big data cluster ▪ Seamlessly integrate ray into spark data processing pipeline https://ray.readthedocs.io/en/latest/ https://analytics-zoo.github.io/master/#ProgrammingGuide/rayonspark/
  • 11.
  • 12.
    Time Series SolutionIn Analytics Zoo Rich algorithms statistical, neural-networks, hybrid state-of-art models, etc. AutoML for automatic feature generation, model selection, hyper-parameter tuning, etc. Seamless scaling with integrated analytics and AI pipelines
  • 13.
    Software Stack AutoML Framework ▪FeatureTransformer ▪ Model ▪ SearchEngine ▪ Pipeline Time Series upon AutoML ▪ TimeSequencePredictor ▪ TimeSequencePipeline https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics- zoo-b79a6fd08139
  • 14.
  • 15.
    A glimpse ofAPI Training a Pipeline ▪ fit (w/ automl) ▪ recipe ▪ distributed mode Using a Pipeline ▪ save/load ▪ evaluate/predict ▪ fit (incremental)
  • 16.
    Project Zouwu Use case ▪reference time series use cases for Telco (such as network traffic forecasting, etc.) Models ▪ built-in models for time series analysis (such as LSTM and MTNet) “AutoTS” ▪ AutoML support for building E2E time series analysis pipelines (including automatic feature generation, model selection and hyperparameter tuning) Project Zouwu Built-in Models ML Workflow AutoML Workflow Integrated Analytics & AI Pipelines use-case autots model https://github.com/intel-analytics/analytics- zoo/tree/master/pyzoo/zoo/zouwu
  • 17.
    Use Case Sharing& Learnings
  • 18.
    Network Traffic KPIForecasting in Telco Usage Scenario ▪ KPI/metrics forecasting is widely used in Telco applications (e.g., energy saving, network slicing, etc.) ▪ aggregated traffic KPI’s (i.e. total bytes, average rate in Mbps/Gbps) in the past week to forecast the KPI in the next two hours. 2 ways to solve this problem using Zouwu ▪ Use built-in “Forecaster” models for training, and forecasting (notebook link) ▪ Use “AutoTS” (with built-in AutoML support) to train an E2E Time Series Analysis Pipeline, and forecast (notebook link) Example result of network traffic average rate forecasting on the test period
  • 19.
    Network Quality Predictionin SK Telecom Spark-SQL Data Loading Data Loader Data Source APIs File, HTTP, Kafka forked. DRAM Store customized. Flash Store tiering Preprocess RDD of Tensor Model Code of TF DL Training & Inferencing Data Model SIMD Acceleration https://databricks.com/session_eu19/apache-spark-ai-use-case-in- telco-network-quality-analysis-and-prediction-with-geospatial- visualization
  • 20.
    Forecast-based Analysis forAIOps at Neusoft https://platform.neusoft.com/2020/01/17/xw-intel.html https://platform.neusoft.com/2017/03/04/qt-baomaqiche.html
  • 21.
    Takeaways from EarlyUsers Highlights ▪ Additional features allowed ▪ Less efforts in tuning ▪ Satisfactory accuracy Data quality matters ▪ Missing values, outliers, etc. Scale, scale, scale ▪ hundreds of thousands of cells X KPIs => millions of time series ▪ millions of servers/containers X metrics => hundreds of millions of time series
  • 22.
    Future Work Extremely highdimensional time series Search algo, meta-learning, ensemble, … More models, features, … Automatic data preprocessing (e.g. missing data & outliers
  • 23.
    Accelerate Your DataAnalytics & AI Journey with Intel
  • 24.
    Feedback Your feedback isimportant to us. Don’t forget to rate and review the sessions.