- POSTECH EECE695J, "Deep Learning Basics and Applications to Steel Processes" (딥러닝 기초 및 철강공정에의 활용), 2017-11-10
- Contents: introduction to recurrent neural networks, LSTM, variants of RNN, implementation of RNN, case studies
- Video: https://youtu.be/pgqiEPb4pV8
Lecture 7: Recurrent Neural Networks
1. Recurrent Neural Networks
Sang Jun Lee
Ph.D. candidate, POSTECH
Email: lsj4u0208@postech.ac.kr
EECE695J Special Topics in Electronic and Electrical Engineering J (Deep Learning Basics and Applications to Steel Processes) – LECTURE 7 (2017. 11. 10)
2. 2
▣ Lecture 6: Convolutional Neural Network
1-page Review
[Figure] Convolution layer: a 5x5x3 filter convolves (slides) over all spatial locations of a 32x32x3 image, producing activation maps.
[Figure] Pooling layer: max pooling with 2x2 filters and stride 2, applied to each depth slice.
"Parameters are shared on spatial domain"
3. 3
Introduction to recurrent neural network
Vanilla neural network: input x, hidden layer h, output y, weights W
A naive idea for handling sequential data: concatenate all time steps into one input,
x: concatenated data of x_1, x_2, x_3, ⋯ with W = [W_1; W_2; W_3; ⋯]
However, we usually want to predict a vector at each time step for time-domain data x_t
4. 4
Introduction to recurrent neural network
Recurrent neural network (RNN)
[Figure] The network unrolled over time: each step t takes x_t and the previous hidden state h_{t-1}, and produces h_t and y_t, reusing the same weights at every step.
Assume that the relation between x_t and x_{t+1} is similar to the relation between x_{t+1} and x_{t+2}
→ parameter sharing for the recurrent weights W_hh
Identical feature extraction from the inputs
→ parameter sharing for the input weights W_xh
5. 5
Introduction to recurrent neural network
Recurrent neural network (RNN)
[Figure] Input layer x_t → RNN cell (hidden state h_t) → fully-connected output layer y_t (the RNN feature).
Multiple copies of the same network (same function and same parameters)
h_t: a hidden state that consists of a vector
h_t = f(h_{t-1}, x_t)
h_t = tanh(W_hh ⋅ h_{t-1} + W_xh ⋅ x_t)
y_t = W_hy ⋅ h_t
h_0 is usually set to 0
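To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass; the dimensions, random weights, and sequence length are assumptions for illustration, not from the slides.

import numpy as np

input_dim, hidden_dim, output_dim = 3, 5, 2   # assumed sizes
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.1

def rnn_step(h_prev, x_t):
    """One step: h_t = tanh(W_hh . h_{t-1} + W_xh . x_t), y_t = W_hy . h_t"""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_dim)                   # h_0 is usually set to 0
xs = rng.standard_normal((4, input_dim))   # a length-4 input sequence
for x_t in xs:                             # the same parameters are reused at every step
    h, y = rnn_step(h, x_t)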
6. 6
Introduction to recurrent neural network
Various architectures of RNN
Flexibility for handling various types of data
Vanilla neural network (one-to-one)
7. 7
Introduction to recurrent neural network
Various architectures of RNN
Flexibility for handling various types of data
e.g. machine translation (sequence of words → sequence of words)
8. 8
Introduction to recurrent neural network
Limitations of the vanilla RNN
The vanilla RNN works well for small time steps
However, the sensitivity to earlier inputs decays over time in a standard RNN
Short-term dependency: "the clouds are in the sky" (the relevant context is nearby)
Long-term dependency: "I grew up in France … I speak fluent French." (the relevant context is many steps back)
9. 9
LSTM (long short-term memory)
A standard RNN contains a single layer in the repeating module
10. 10
LSTM (long short-term memory)
A special kind of RNN for learning long-term dependencies
Introduced by Hochreiter & Schmidhuber (1997)
11. 11
LSTM (long short-term memory)
The key idea of LSTMs: the cell state
The cell state is kind of like a conveyor belt that carries information along the sequence
12. 12
LSTM (long short-term memory)
Forget gate
LSTMs have the ability to remove or add information to the cell state, carefully regulated by structures called gates
The decision about what information to throw away from the cell state is made by a sigmoid layer called the forget gate layer
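Concretely, the forget gate in the standard LSTM formulation (not written out on the slide), where [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input:

f_t = σ(W_f ⋅ [h_{t-1}, x_t] + b_f)

Each component of f_t lies in (0, 1): a 1 keeps the corresponding cell-state entry completely, a 0 forgets it.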
13. 13
LSTM (long short-term memory)
Input gate layer
Decide what new information we're going to store in the cell state
First, the input gate layer decides which values we'll update
Next, a tanh layer creates a vector of new candidate values
Finally, the two are combined to create an update to the state
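In the same notation, the standard equations for this step:

i_t = σ(W_i ⋅ [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C ⋅ [h_{t-1}, x_t] + b_C)

where i_t selects which values to update and C̃_t holds the candidate values.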
14. 14
LSTM (long short-term memory)
Update: forget previous information and add new information to the cell state
Output: the output is based on the cell state
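In equations (standard formulation; ⊙ denotes the element-wise product):

C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
o_t = σ(W_o ⋅ [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)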
16. 16
Variants of RNN
Gated Recurrent Unit (GRU)
Combine the forget and input gates into a single update gate
Merge the cell state and hidden state
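For reference, the standard GRU equations in the same notation (not written out on the slide):

z_t = σ(W_z ⋅ [h_{t-1}, x_t])          (update gate)
r_t = σ(W_r ⋅ [h_{t-1}, x_t])          (reset gate)
h̃_t = tanh(W ⋅ [r_t ⊙ h_{t-1}, x_t])
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t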
17. 17
Implementation of RNN
Manipulation of time series data
Split raw data into train, validation, and test dataset
def split_data(data, val_size=0.2, test_size=0.2):
    ntest = int(round(len(data) * (1 - test_size)))
    nval = int(round(len(data.iloc[:ntest]) * (1 - val_size)))
    df_train, df_val, df_test = data.iloc[:nval], data.iloc[nval:ntest], data.iloc[ntest:]
    return df_train, df_val, df_test

train, val, test = split_data(raw_data, val_size=0.2, test_size=0.2)
[Figure] Raw data (100%) → Test (last 20%), with the remaining 80% split into Train (80% of it) and Validation (20% of it); e.g. with 10,000 rows, train/val/test = 6,400/1,600/2,000 rows.
18. 18
Implementation of RNN
Manipulation of time series data
Generate sequence pair (x, y)
import numpy as np

def rnn_data(data, time_steps, labels=False):
    """
    creates new data frame based on previous observation
    * example:
    l = [1, 2, 3, 4, 5]
    time_steps = 2
    -> labels == False [[1, 2], [2, 3], [3, 4]]
    -> labels == True [3, 4, 5]
    """
    rnn_df = []
    for i in range(len(data) - time_steps):
        if labels:
            try:
                rnn_df.append(data.iloc[i + time_steps].as_matrix())
            except AttributeError:
                rnn_df.append(data.iloc[i + time_steps])
        else:
            data_ = data.iloc[i: i + time_steps].as_matrix()
            rnn_df.append(data_ if len(data_.shape) > 1 else [[i] for i in data_])
    return np.array(rnn_df)
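For example, applying rnn_data to the toy series from the docstring (note that Series.as_matrix() belongs to the pandas of this era; in pandas ≥ 1.0 use .to_numpy() instead):

import pandas as pd

toy = pd.Series([1, 2, 3, 4, 5])
x = rnn_data(toy, time_steps=2, labels=False)  # shape (3, 2, 1): windows [1,2], [2,3], [3,4]
y = rnn_data(toy, time_steps=2, labels=True)   # array([3, 4, 5])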
19. 19
Implementation of RNN
Manipulation of time series data
Generate sequence pair (x, y)
time_steps = 10
train_x = rnn_data(df_train, time_steps, labels=False)
train_y = rnn_data(df_train, time_steps, labels=True)
Training data [1:10000] → sequence pairs (train_x holds the x sequences, train_y the y targets):
x #01 = [1, 2, 3, …, 10], y #01 = 11
x #02 = [2, 3, 4, …, 11], y #02 = 12
…
x #9990 = [9990, 9991, 9992, …, 9999], y #9990 = 10000
20. 20
Implementation of RNN
Manipulation of time series data
Split each sample along the time axis
time_steps = 10
x_split = tf.unpack(x_data, time_steps, 1)
[Figure] tf.unpack splits the placeholder x #01 = [1, 2, 3, …, 10] into the per-step inputs x_1, x_2, x_3, …, x_10.
21. 21
Implementation of RNN
Choose an RNN cell
Connect the input and recurrent layers

import tensorflow as tf

num_units = 100
# choose one of the available cells:
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units)
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
rnn_cell = tf.nn.rnn_cell.GRUCell(num_units)

output, state = tf.nn.rnn(rnn_cell, x_split)
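Putting slides 17–21 together, a minimal end-to-end sketch of the regression setup, written against the same pre-1.0 TensorFlow API as the slides (tf.unpack and tf.nn.rnn were renamed tf.unstack and tf.nn.static_rnn in TF ≥ 1.0); the output-layer shapes and learning rate are assumptions:

import tensorflow as tf

time_steps, input_dim, num_units = 10, 1, 100

# placeholders for a batch of sequences and their targets
x_data = tf.placeholder(tf.float32, [None, time_steps, input_dim])
y_data = tf.placeholder(tf.float32, [None, 1])

x_split = tf.unpack(x_data, time_steps, 1)   # list of time_steps [batch, input_dim] tensors
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
output, state = tf.nn.rnn(rnn_cell, x_split, dtype=tf.float32)

# fully-connected output layer on the last RNN feature
W_out = tf.Variable(tf.random_normal([num_units, 1]))
b_out = tf.Variable(tf.zeros([1]))
prediction = tf.matmul(output[-1], W_out) + b_out

loss = tf.reduce_mean(tf.square(prediction - y_data))   # MSE for regression
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)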
22. 22
Case study 1: MNIST classification
Hyperparameters for implementing an RNN
Learning rate, training iterations, batch size, etc.
Time step, the number of RNN neurons
Placeholder and variable tensor preparation
One-hot encoded labels
"Sequential processing of non-sequence data"
23. 23
Case study 1: MNIST classification
Constructing the RNN cell
Split each 28x28 sample into 28 28-dimensional vectors
Vanilla RNN: rnn.rnn_cell.BasicRNNCell
Constructing the output layer
Input dimension: the number of neurons in the RNN cell
Output: the estimated probability of belonging to each category
24. 24
Case study 1: MNIST classification
Define the loss and training operation
tf.Session()
Open a session and run train_op!
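A minimal sketch of the MNIST classifier described on slides 22–24, again using the pre-1.0 TensorFlow API of the slides; the number of units and learning rate are assumed values:

import tensorflow as tf

time_steps, input_dim = 28, 28        # each 28x28 image = 28 steps of 28-d vectors
num_units, num_classes = 128, 10      # num_units is an assumed hyperparameter

x = tf.placeholder(tf.float32, [None, time_steps, input_dim])
y = tf.placeholder(tf.float32, [None, num_classes])      # one-hot encoded labels

x_split = tf.unpack(x, time_steps, 1)                    # 28 tensors of shape [batch, 28]
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units)        # vanilla RNN, as on slide 23
outputs, state = tf.nn.rnn(rnn_cell, x_split, dtype=tf.float32)

# output layer: last RNN feature -> class scores
W = tf.Variable(tf.random_normal([num_units, num_classes]))
b = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(outputs[-1], W) + b

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # for each batch: sess.run(train_op, feed_dict={x: batch_x, y: batch_y})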
25. 25
Case study 2 (2017 summer peak electricity demand forecasting, Korean Institute of Electrical Engineers)
Forecasting strategy
• Forecast the summer peak electricity demand by predicting the daily peak demand
Algorithm overview
• Build a representative daily temperature series for Korea as a weighted sum of the average temperatures of the major metropolitan cities, weighted by their shares of electricity demand
• Predict the daily peak demand and temperature with a combined RNN/CNN model using past electricity/temperature data
• Develop a deep learning algorithm that captures the day-of-week and seasonal periodicity characteristic of electricity demand data
26. 26
Case study 2 (2017 summer peak electricity demand forecasting, Korean Institute of Electrical Engineers)
Constructing the training data for the RNN
• Use the past 28 days of electricity/temperature data to predict the next 28 days of electricity/temperature
[Figure] Vanilla RNN model: at each time step, the input layer takes electricity (E) and temperature (T); the RNN cell feeds a fully-connected output layer that produces the prediction for the next step (→ RNN feature). Training samples and their label data are sliding windows over the training/test data (time step = input window, output dimension = prediction window).
27. 27
Case study 2 (2017 summer peak electricity demand forecasting, Korean Institute of Electrical Engineers)
Seasonal data
• Data construction to reflect seasonality in training
CNN model for capturing seasonality
[Figure] Electricity (E) and temperature (T) windows of length ts (the time step) taken at t − T, t − 2T, and t − 3T are stacked from k × ts samples into a 2k × ts input; a convolution layer (2 × ts × 1 × CNN depth) produces a k × (CNN depth) feature map, followed by a fully-connected layer that outputs the CNN feature.
28. 28
Case study 2 (2017 summer peak electricity demand forecasting, Korean Institute of Electrical Engineers)
Combined RNN/CNN model
Training
• Total loss = Loss_electricity + Loss_temperature
• Backpropagation via the Adam optimizer
[Figure] Electricity & temperature sequences → RNN cell (200) → RNN feature (200); seasonal data → convolution layers (2x28x1x200, then 5x1x200x50) → CNN feature (50); the combined features pass through a fully-connected layer (100) to two output layers producing the predicted electricity (Loss_electricity) and the predicted temperature (Loss_temperature).
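A minimal sketch of this training objective in the same TensorFlow style (the feature dimension, target length, and learning rate are assumptions; MSE is assumed for the two losses):

import tensorflow as tf

feature = tf.placeholder(tf.float32, [None, 100])   # combined RNN+CNN feature (assumed size)
true_e = tf.placeholder(tf.float32, [None, 28])     # 28-day electricity targets
true_t = tf.placeholder(tf.float32, [None, 28])     # 28-day temperature targets

# two output heads on the shared feature
W_e = tf.Variable(tf.random_normal([100, 28])); b_e = tf.Variable(tf.zeros([28]))
W_t = tf.Variable(tf.random_normal([100, 28])); b_t = tf.Variable(tf.zeros([28]))
pred_e = tf.matmul(feature, W_e) + b_e
pred_t = tf.matmul(feature, W_t) + b_t

# total loss = Loss_electricity + Loss_temperature, minimized jointly with Adam
loss_e = tf.reduce_mean(tf.square(pred_e - true_e))
loss_t = tf.reduce_mean(tf.square(pred_t - true_t))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss_e + loss_t)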
29. 29
Case study 2 (2017 summer peak electricity demand forecasting, Korean Institute of Electrical Engineers)
Predicted 2017 summer peak electricity demand: 86,477 MW
(actual 2017 summer peak demand: 86,298 MW; error: 0.21%)
Back testing
• Trained on data before 2016-05-31 and tested on data from 2016-06-01 onward
• Average error rate: 2.37% / 2.81% (28-day / 56-day prediction)
30. 30
Summary
Introduction to recurrent neural network
- Properties of RNN: parameter sharing
- Various architectures
- Limitation
LSTM (long short-term memory)
- Components of LSTM
- Forget gate, input gate, update, output
Implementation of RNN
Case studies
- MNIST classification
- 2017 summer peak electricity demand forecasting