Time series modeling workshop, AMLD 2018 Lausanne

AMLD 2018
TimeSeries and Sequence Modeling
Sunil Mallya | Sr. AI Solutions Architect, ML Solutions Lab
Amazon Web Services
@sunilmallya

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
https://github.com/sunilmallya/timeseries
TimeSeries Workshop

Agenda
AWS Machine Learning Stack
Introduction to Apache MXNet and Gluon
Introduction to Time Series Analysis
ML Environment for Data Scientists – Amazon SageMaker & DL AMI
Time Series with Deep Learning
Sequence to Sequence Modeling (seq2seq)
Self-Paced Tutorials

CUSTOMERS RUNNING ML ON AWS

Media & Entertainment
Pricing and Product Recommendation
Healthcare & Life Sciences
Financial Services/Trading
Customer Experience
•  Content Commissioning
•  Content Creation
•  Promotion and Marketing
•  Copyright infringement
•  Content Auto tagging
•  Auto subtitling
•  Rights negotiation
•  Enhanced customer service through voice and chatbots
•  Call center optimization
•  Personal financial management
•  Ecommerce
•  Product recommendations
•  Credit assessments
•  Ad/Search relevance
•  Personalization
•  Patient health from clinical data
•  Predicting hospital stay length and re-admittance
•  Drug discovery
•  Radiology image recognition
•  Portfolio management/robo-advising
•  Algorithmic trading
•  Sentiment/news analysis
•  Geospatial image analysis
•  Predictive grid computing capacity management
Impact of AI/ML is enterprise-wide

Put machine learning in the hands of every
developer and data scientist
ML @ AWS: Our mission

ML in the Hands of Every Developer
Services
Platforms
Frameworks
Infrastructure

AWS ML Stack
Application Services
Vision: Rekognition Image, Rekognition Video
Speech: Polly, Transcribe
Language: Lex, Translate, Comprehend
Platform Services
Amazon SageMaker, AWS DeepLens, Amazon Machine Learning, Mechanical Turk, Spark & EMR
Frameworks & Infrastructure
Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
AWS Deep Learning AMI
GPU (P3 Instances), Mobile, CPU, IoT (Greengrass)

Amazon SageMaker
The quickest and easiest way to get ML models from idea to production
End-to-End Machine Learning Platform
Zero Setup
Flexible Model Training
Pay by the second

AWS DeepLens: Deep Learning-Enabled Video Camera for Developers (Limited Preview)
Fully programmable video camera
Optimized for deep learning on the device with Apache MXNet, Caffe, TensorFlow
Tutorials, sample code, examples and pre-built models
Integrated with Amazon SageMaker for custom models

Gluon (October 2017)
•  Simplifies development of deep learning models
•  Provides greater flexibility in building neural networks
•  Offers high performance
•  Developed in collaboration with Microsoft
•  Available now in MXNet, coming soon to CNTK
Simple, Easy-to-Understand Code
Flexible, Imperative Structure
Dynamic Graphs
High Performance

Open Neural Network Exchange (ONNX)

•  Developers can choose the framework that best fits their needs
•  More customers can take advantage of MXNet’s performance and
scalability
•  MXNet users can run their models on various mobile and edge devices (Qualcomm, Huawei, Intel, and ARM announced support for ONNX)

Amazon ML Lab
Lots of companies doing machine learning are unable to unlock its business potential because they lack ML expertise.
Amazon ML Lab provides the missing ML expertise: brainstorming, modeling, teaching.
Leverage Amazon experts with decades of ML experience with technologies like Amazon Echo, Amazon Alexa, Prime Air and Amazon Go.

Let's get a quick introduction to deep learning: neural network architecture, stochastic gradient descent and other terms
Deep Learning Basics

• Classification: output is a class label
• Regression: output takes continuous values
Types of Machine Learning Problems

Linear and non-linear models

https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/linear_classification.html
Good Ol’ Models – Linear Classifiers

Why is non-linearity important?

Deep Neural Network
[Figure: input layer → hidden layers → output]
Add non-linearity to the output of each hidden layer
Transform the output into a continuous range
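To see why the hidden-layer non-linearity matters, here is a minimal NumPy sketch of a two-unit ReLU network that computes XOR, a function no linear model can represent. The weights are hand-picked for illustration (not learned); running the same weights without the ReLU collapses the network into a single linear function of the inputs.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z) applied element-wise."""
    return np.maximum(0.0, z)

def identity(z):
    """No activation: the network stays a composition of linear maps."""
    return z

# The four XOR inputs (one row per example) and their targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked weights, for illustration only
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # input -> 2 hidden units
b1 = np.array([0.0, -1.0])    # hidden biases
w2 = np.array([1.0, -2.0])    # hidden -> output

def forward(X, activation):
    hidden = activation(X @ W1 + b1)
    return hidden @ w2

with_relu = forward(X, relu)
without_relu = forward(X, identity)

print(with_relu)     # matches the XOR targets
print(without_relu)  # 2 - (x1 + x2): linear, so it cannot be XOR
```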

The “Learning” in Deep Learning
[Figure: an input with label Y passes through the network weights (e.g. 0.4, 0.3, 0.2, 0.9, ...) to produce the prediction Y1; since Y1 != Y, backpropagation (gradient descent) adjusts each weight by ±𝛿, giving new weights.]
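The loop on this slide (predict, compare with the label, adjust the weights) can be sketched for the simplest possible network: a single linear neuron trained by gradient descent on mean squared error. The data, learning rate and number of steps below are hypothetical choices for illustration; real networks apply the same update to every layer via backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data generated from y = 2*x1 - 3*x2 + 1 (no noise)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0]) + 1.0

w = np.zeros(2)   # weights, start at zero
b = 0.0           # bias
lr = 0.1          # learning rate

for step in range(200):
    y_hat = X @ w + b            # forward pass: prediction
    error = y_hat - y            # difference between prediction and label
    loss = np.mean(error ** 2)   # mean squared error
    # Backward pass: gradients of the MSE with respect to w and b
    grad_w = 2.0 * X.T @ error / len(y)
    grad_b = 2.0 * np.mean(error)
    # Update: move each parameter a small step against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, loss)  # w close to [2, -3], b close to 1
```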

Loss Function
•  Objective function defines what success looks like when an
algorithm learns.
•  It is a measure of the difference between a neural net’s guess and
the ground truth; that is, the error.
•  Error resulting from the loss function is fed into backpropagation
in order to update the weights & biases
•  Common loss functions
•  Cross entropy
•  L1 (linear), L2 (quadratic)
•  Mean squared error (MSE)
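The common loss functions listed above can be written in a few lines of NumPy. This is a minimal sketch (the function names and the clipping constant `eps` are our own choices); frameworks such as MXNet ship their own loss implementations.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the L2 (quadratic) loss."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error: average of the L1 (linear) loss."""
    return np.mean(np.abs(y_true - y_pred))

def cross_entropy(labels, probs, eps=1e-12):
    """Average negative log-probability assigned to the true class.

    labels: integer class indices, shape (n,)
    probs:  predicted class probabilities, shape (n, num_classes)
    """
    p_true = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(np.clip(p_true, eps, 1.0)))

# Regression example
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(mse(y_true, y_pred))   # (0.25 + 0 + 1) / 3
print(mae(y_true, y_pred))   # (0.5 + 0 + 1) / 3 = 0.5

# Classification example: two samples, three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
print(cross_entropy(labels, probs))  # -(log 0.7 + log 0.8) / 2
```

The L2 loss penalizes large errors much more heavily than L1, which makes L1 more robust to outliers; cross entropy is the usual choice for classification.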

Gradient Descent in 1D
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/model_optimization.html
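Gradient descent in one dimension fits in a few lines. This sketch minimizes f(x) = x², whose derivative is 2x, so each step multiplies x by (1 - 2·lr). The starting point and the three learning rates are arbitrary illustrative values: a tiny rate barely moves, a moderate one converges quickly, and one above 1.0 overshoots further on every step and diverges.

```python
def gradient_descent_1d(grad, x0, lr, steps):
    """Plain gradient descent on a function of one variable; returns the path."""
    x = x0
    path = [x]
    for _ in range(steps):
        x = x - lr * grad(x)   # step against the slope
        path.append(x)
    return path

def grad(x):
    """Derivative of f(x) = x**2, whose minimum is at x = 0."""
    return 2 * x

for lr in (0.01, 0.4, 1.1):
    path = gradient_descent_1d(grad, x0=5.0, lr=lr, steps=20)
    print(f"lr={lr}: x after 20 steps = {path[-1]:.4g}")
```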

Training, Validation Set and Overfitting
[Figure: training vs. validation error, with the best model marked]

Learning Rates and SGD Visualization
Source: https://twitter.com/alecrad

Activation Functions
Adds non-linearity
ReLU is the most commonly used activation today
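A NumPy sketch of three common activation functions. Sigmoid and tanh squash their inputs into a bounded range and flatten out (saturate) for large |z|, while ReLU passes positive inputs through unchanged, so its gradient there is 1; this, together with its low cost, is a commonly cited reason for its popularity.

```python
import numpy as np

def sigmoid(z):
    """Squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squashes inputs into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Passes positive inputs unchanged and zeroes out negative ones."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, np.round(fn(z), 3))
```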

Let's get familiar with the basics of the framework
Apache MXNet and Gluon

[Figure: scaling efficiency vs. number of GPUs (1, 2, 4, ... 256) for Inception v3, ResNet and AlexNet, compared with ideal scaling; 88% efficiency annotated]
•  AWS CloudFormation with the Deep Learning AMI
•  16x P2.16xlarge instances, mounted on EFS
•  Inception and ResNet: batch size 32; AlexNet: batch size 512
•  ImageNet, 1.2M images, 1K classes
•  152-layer ResNet, 5.4 days on 4x K80s (1.2 h per epoch), 0.22 top-1 error
Scaling with MXNet
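The efficiency axis in the figure is presumably the standard scaling efficiency (our assumption; the slide does not define it): measured throughput divided by ideal linear scaling from a single GPU.

```latex
E(n) = \frac{T(n)}{n \, T(1)}
```

Here T(n) is the training throughput (for example, images per second) on n GPUs, so E(n) = 1 means ideal scaling. For instance, E = 0.88 on n = 256 GPUs would mean T(256) = 0.88 × 256 × T(1) ≈ 225 × T(1).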

Deep Learning Framework Comparison
| | Apache MXNet | TensorFlow | Cognitive Toolkit |
|---|---|---|---|
| Industry owner | N/A (Apache Community) | Google | Microsoft |
| Programmability | Imperative and declarative | Declarative only | Declarative only |
| Language support | R, Python, Scala, Julia, C++, JavaScript, Go, MATLAB and more | Python, C++; experimental Go and Java | Python, C++, BrainScript |
| Code length, AlexNet (Python) | 44 sloc | 107 sloc using TF.Slim | 214 sloc |
| Memory footprint (LSTM) | 2.6 GB | 7.2 GB | N/A |
*sloc: source lines of code

Imperative Programming
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1
Easy to tweak with Python code
PROS
•  Straightforward and flexible
•  Takes advantage of native language features (loops, conditions, debugger)
•  E.g. NumPy, MATLAB, Torch, …
CONS
•  Hard to optimize

Declarative Programming
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10) * 2)
[Figure: computation graph in which A and B feed a multiplication node (C), and C and the constant 1 feed an addition node (D)]
C can share memory with D because C is deleted later
PROS
•  More chances for optimization
•  Works across different languages
•  E.g. TensorFlow, Theano, Caffe
CONS
•  Less flexible
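The declarative pseudocode above only makes sense if `Variable`, `*` and `+` record a graph instead of computing values. The toy sketch below is our own illustration, not MXNet, Theano or TensorFlow code; `compile_graph` is a hypothetical stand-in for the slide's `compile` (renamed to avoid shadowing Python's built-in). A real framework would use the complete graph known at compile time to optimize, for example by reusing C's memory for D.

```python
import numpy as np

class Node:
    """A node in a toy computation graph; building it computes nothing."""
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op, self.inputs, self.name, self.value = op, inputs, name, value

    def __mul__(self, other):
        return Node("mul", (self, _as_node(other)))

    def __add__(self, other):
        return Node("add", (self, _as_node(other)))

def _as_node(x):
    """Wrap plain numbers as constant nodes."""
    return x if isinstance(x, Node) else Node("const", value=x)

def Variable(name):
    """A placeholder whose value is supplied only when the graph runs."""
    return Node("var", name=name)

def compile_graph(output):
    """Return a function that evaluates the graph rooted at `output`."""
    def run(**feeds):
        cache = {}
        def evaluate(node):
            if id(node) in cache:
                return cache[id(node)]
            if node.op == "var":
                result = feeds[node.name]
            elif node.op == "const":
                result = node.value
            else:
                a, b = (evaluate(n) for n in node.inputs)
                result = a * b if node.op == "mul" else a + b
            cache[id(node)] = result
            return result
        return evaluate(output)
    return run

A = Variable('A')
B = Variable('B')
C = B * A               # records a multiplication node; no arithmetic yet
D = C + 1               # records an addition node
f = compile_graph(D)    # the whole graph is known at this point
d = f(A=np.ones(10), B=np.ones(10) * 2)
print(d)                # ten values of 3.0
```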

IMPERATIVE: NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)
DECLARATIVE: SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidde
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()
NDArray can be set as input to the graph
MXNet: Mixed programming paradigm

Embed symbolic expressions into imperative programming
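texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)    # symbolic forward pass
    texec.backward()        # symbolic backward pass: compute gradients
    # the SGD update is ordinary imperative Python (learning rate 0.2)
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad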
MXNet: Mixed programming paradigm

Apache MXNet Cheat Sheet
Cheat sheet: bit.ly/2xTIwuj

Gluon Introduction
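from mxnet import gluon
# X (features with 2 columns) and y (targets) are assumed to be prepared earlier
batch_size = 4
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                   batch_size=batch_size, shuffle=True)
net = gluon.nn.Dense(1, in_units=2)  # a single linear unit with 2 inputs
net.collect_params()
net.initialize()
Deferred Initialization
When we call initialize, Gluon associates each parameter with an initializer. If a parameter's shape is not yet known (for example, when in_units is omitted), the actual initialization is deferred until the first forward pass, when the input shape can be inferred.
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})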

Let's solve churn prediction with linear regression
Crash Course Tutorial

Time Series data
Time series data has very distinct features
•  Temporal component: the usual assumption that data points are independent does not hold
•  Trend: deterministic or stochastic
•  Seasonality: a pattern that reflects periodicity or regular fluctuations

Stationary and non-stationary distributions
Image src: https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
The mean of the series should not be a function of time, and the variance of the series should not be a function of time either.
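One way to check this definition visually is to compare rolling statistics. A minimal pandas sketch on synthetic data (not the workshop dataset; the length, window and seed are arbitrary): white noise is stationary, so its rolling mean stays near one level, while a random walk wanders and its rolling mean drifts over a much wider range.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 240  # hypothetical: 20 years of monthly observations

# Synthetic examples: white noise is stationary, a random walk is not
white_noise = pd.Series(rng.normal(size=n))
random_walk = pd.Series(rng.normal(size=n).cumsum())

def rolling_stats(series, window=12):
    """Rolling mean and standard deviation, as used for a visual stationarity check."""
    return (series.rolling(window=window).mean(),
            series.rolling(window=window).std())

ranges = {}
for name, s in [("white noise", white_noise), ("random walk", random_walk)]:
    mean, std = rolling_stats(s)
    # For a stationary series the rolling mean stays close to a constant level
    ranges[name] = mean.max() - mean.min()
    print(f"{name}: rolling mean spans {ranges[name]:.2f}")
```

In practice you would plot `mean` and `std` alongside the series rather than summarize them with a single number.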

Stationary Time Series data
•  Analyzing a stationary series is simpler
•  Most existing models assume a stationary distribution, and those models are more mature
•  A time series is (weakly) stationary if
•  its mean and variance remain constant over time (and its autocovariance depends only on the lag)
•  In practice it may be hard to convert a series into a strictly stationary one, so
•  Apply transformations such as log scale, root, square, aggregation, rolling mean subtraction, exponential smoothing
•  Eliminate trend/seasonality by differencing or decomposition
•  Stationarity tests
•  Plot rolling statistics (moving average, standard deviation)
•  Dickey-Fuller test
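A minimal NumPy sketch of two items from this list, on synthetic data: first differencing, and the basic (non-augmented) Dickey-Fuller regression Δy_t = α + γ·y_{t-1} + ε_t. A unit root means γ = 0; the t-statistic of γ must be compared with Dickey-Fuller critical values (about -2.86 at the 5% level for this constant-only form in large samples), not the usual normal table. In practice you would use `statsmodels.tsa.stattools.adfuller`, which adds lagged differences (the augmented test) and reports p-values.

```python
import numpy as np

def dickey_fuller(y):
    """Return (t-statistic, estimate) of gamma in  diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # Δy_t
    y_lag = y[:-1]                                    # y_{t-1}
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)     # OLS fit
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1]), coef[1]

CRITICAL_5PCT = -2.86  # approximate large-sample DF critical value, constant-only model

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=1000))   # has a unit root: non-stationary
differenced = np.diff(random_walk)               # first difference: white noise

for name, series in [("random walk", random_walk), ("first difference", differenced)]:
    t, gamma = dickey_fuller(series)
    verdict = "reject unit root" if t < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t:.2f}, gamma = {gamma:.4f} -> {verdict}")
```

The random walk will typically fail to reject the unit root, while its first difference rejects it decisively.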

Trend, Seasonality and Residuals
Trend
Deterministic: the effects of perturbations in the time series die out, i.e. the series reverts to the trend in the long run.
Stochastic: the effects of perturbations are never eliminated, i.e. they permanently change the level of the time series.
In this example, there is a trend component which grows year by year
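The difference between the two kinds of trend shows up in how a one-off shock propagates. A NumPy sketch on synthetic data (the AR coefficient, slope, drift and shock size are arbitrary illustrative values): both series are generated twice from the same noise, once with an extra shock at `shock_time`, and the difference between the two runs is the shock's effect over time.

```python
import numpy as np

rng = np.random.default_rng(7)
n, shock_time, shock = 200, 50, 10.0
noise = rng.normal(size=n)

def trend_stationary(noise, phi=0.8, slope=0.1):
    """Deterministic trend: y_t = slope*t + u_t, with AR(1) deviations u_t = phi*u_{t-1} + e_t."""
    u = np.zeros_like(noise)
    for t in range(1, len(noise)):
        u[t] = phi * u[t - 1] + noise[t]
    return slope * np.arange(len(noise)) + u

def random_walk_with_drift(noise, drift=0.1):
    """Stochastic trend: y_t = y_{t-1} + drift + e_t."""
    return np.cumsum(drift + noise)

# Same noise, plus a one-off shock at shock_time in the second copy
shocked_noise = noise.copy()
shocked_noise[shock_time] += shock

effects = {}
for name, model in [("deterministic", trend_stationary), ("stochastic", random_walk_with_drift)]:
    effects[name] = model(shocked_noise) - model(noise)   # how far the shock moved the series
    print(f"{name}: effect at t={shock_time}: {effects[name][shock_time]:.2f}, "
          f"at the end: {effects[name][-1]:.4f}")
```

With the deterministic trend the effect decays as shock·φ^k and the series returns to its trend; with the stochastic trend the level stays shifted by the full shock.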

There appears to be a seasonal component with a cycle shorter than 12 months
Seasonality: a pattern that reflects regular fluctuations.
Some sources of seasonality are:
•  Calendar (daily, holidays, weekends)
•  Climate
•  Social habits and practices
The variance of the data keeps increasing with time
Residuals/Irregular component
•  The unpredictable component of a time series: what remains when the trend and seasonality components are removed
•  It can't be explained by any other component; it is the random component of the series
•  Usually short-term fluctuations that are not systematic and are irregular in pattern

Decomposition: visualize trend, seasonality and residuals
from statsmodels.tsa.seasonal import seasonal_decompose
# timeseries: a pandas Series with a DatetimeIndex of known frequency (e.g. monthly)
decomposition = seasonal_decompose(timeseries)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid  # free of trend and seasonality
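By default `seasonal_decompose` performs a classical additive decomposition built on moving averages. The hand-rolled pandas sketch below shows the same idea, assuming an even period of 12 (monthly data with a yearly cycle) and a synthetic noise-free series; `classical_decompose` is our own illustrative helper, not a library function.

```python
import numpy as np
import pandas as pd

def classical_decompose(series, period=12):
    """Additive decomposition: series = trend + seasonal + residual (even period assumed)."""
    # Trend: centered 2 x period moving average, weights (0.5, 1, ..., 1, 0.5) / period
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = series.rolling(window=period + 1, center=True).apply(
        lambda w: np.dot(w, weights), raw=True)
    detrended = series - trend
    # Seasonal: average detrended value at each position in the cycle, centered to sum to zero
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.values[position], index=series.index)
    residual = series - trend - seasonal
    return trend, seasonal, residual

# Synthetic monthly series: linear trend plus a yearly cycle, no noise
t = np.arange(120)
series = pd.Series(0.5 * t + 3 * np.sin(2 * np.pi * t / 12))
trend, seasonal, residual = classical_decompose(series)
print(residual.abs().max())  # essentially zero; the first and last 6 values are NaN
```

The moving average reproduces a linear trend exactly and averages a full cycle to zero, which is why the residual vanishes here; on real data the residual holds the irregular component.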

Differencing: removing trend and seasonality
# In pandas (timeseries is a pandas Series)
timeseries_diff = timeseries - timeseries.shift()  # first difference, same as timeseries.diff()
moving_avg = timeseries.rolling(window=12, center=False).mean()
moving_avg_diff = timeseries - moving_avg  # rolling mean subtraction
rolling_std = timeseries.rolling(window=12, center=False).std()
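Differencing at different lags removes different components. A short pandas sketch on a synthetic monthly series (slope 0.5 plus a 12-month sine cycle, both arbitrary choices): a lag-12 seasonal difference cancels the cycle and leaves the constant 12 × 0.5 = 6 contributed by the trend, and a further first difference removes that constant as well.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend (slope 0.5) plus a 12-month cycle
t = np.arange(60)
series = pd.Series(0.5 * t + 3 * np.sin(2 * np.pi * t / 12))

seasonal_diff = series.diff(12)      # y_t - y_{t-12}: cancels the 12-month cycle
both_removed = seasonal_diff.diff()  # then a first difference removes the constant left by the trend

print(seasonal_diff.dropna().round(6).unique())  # a single value: 12 * slope
print(both_removed.dropna().abs().max())         # essentially zero
```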

Recurrent neural networks
[Figure: RNN cell with input, output and a recurrent state]
DEEP LEARNING IN SEQUENTIAL DATA
Variable length input and output sequences
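The recurrence in the figure can be sketched as a vanilla (Elman) RNN forward pass in NumPy. The sizes and the random, untrained weights are hypothetical; the point is that one set of parameters, plus a state carried from step to step, processes input sequences of any length and emits one output per time step.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 5, 2

# Randomly initialized, untrained parameters (illustration only)
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(hidden_size, output_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_forward(inputs):
    """Run the RNN over a sequence of shape (time_steps, input_size)."""
    h = np.zeros(hidden_size)                      # initial state
    outputs = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)   # new state mixes input and previous state
        outputs.append(h @ W_hy + b_y)             # one output per time step
    return np.array(outputs), h

# The same weights handle sequences of different lengths
short_out, _ = rnn_forward(rng.normal(size=(4, input_size)))
long_out, final_state = rnn_forward(rng.normal(size=(9, input_size)))
print(short_out.shape, long_out.shape, final_state.shape)  # (4, 2) (9, 2) (5,)
```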

Recurrent neural networks
DEEP LEARNING IN NLP/SEQUENCE DATA
