1. Min-Seo Kim
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: kms39273@naver.com
2.
Previous work
• A deep learning structure designed for analyzing sequential data.
• The output o(2) reflects both past and current information, since h(2) is computed from h(1) and the current input x(2).
• U: Input layer to hidden layer.
• W: Hidden layer at time t to hidden layer at time t+1.
• V: Hidden layer to output layer.
Recurrent Neural Networks (RNN)
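A minimal sketch of one vanilla RNN step under these definitions (all sizes here are made up for illustration):

import torch

# One step of a vanilla RNN: h_t = tanh(U x_t + W h_prev), o_t = V h_t
input_size, hidden_size, output_size = 4, 8, 3
U = torch.randn(hidden_size, input_size)   # input layer -> hidden layer
W = torch.randn(hidden_size, hidden_size)  # hidden at t -> hidden at t+1
V = torch.randn(output_size, hidden_size)  # hidden layer -> output layer

x_t = torch.randn(input_size)              # current input, e.g. x(2)
h_prev = torch.zeros(hidden_size)          # previous hidden state, e.g. h(1)

h_t = torch.tanh(U @ x_t + W @ h_prev)     # mixes past and current information
o_t = V @ h_t                              # output reflects both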
3.
Previous work
• RNNs are flexible in the length of their inputs and outputs, so they can be built in various structures (one-to-many, many-to-one, many-to-many) depending on the form of the input and output, as sketched below.
Recurrent Neural Networks (RNN)
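As a sketch of this flexibility (sizes are hypothetical), PyTorch's nn.RNN returns both a per-step output sequence and a final hidden state, from which many-to-many and many-to-one setups follow directly:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)       # batch of 2 sequences, each of length 5

out, h_n = rnn(x)              # out: (2, 5, 8); h_n: (1, 2, 8)
many_to_many = out             # one output per time step
many_to_one = out[:, -1, :]    # only the last time step's output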
5.
Previous work
• As the number of time steps in a vanilla RNN grows, the long-term dependency problem arises: information from early time steps is not sufficiently propagated to later ones.
• If information that is important for the prediction appears at the beginning of the sequence, the model cannot predict effectively.
• Example 1: "I grew up in France and want to be a plumber who is the best in the world and I speak fluent French." (predicting the final word "French" requires remembering "France" from the very start)
Long Short-Term Memory (LSTM)
6.
Previous work
Long Short-Term Memory (LSTM)
• The core idea of LSTM is to store the information from previous steps in a memory cell and pass it forward.
• Based on the current input, it decides how much of the past information to forget, scales the memory cell accordingly, and then adds the current information to the result before passing it on to the next time step, as sketched below.
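A sketch of that cell-state update with placeholder gate values; in a real LSTM the gates are learned functions of the previous hidden state and the current input:

import torch

c_prev = torch.randn(8)              # memory cell from the previous step
f_t = torch.sigmoid(torch.randn(8))  # forget gate: how much past to keep
i_t = torch.sigmoid(torch.randn(8))  # input gate: how much new info to add
g_t = torch.tanh(torch.randn(8))     # candidate info from the current input

c_t = f_t * c_prev + i_t * g_t       # forget part of the past, add the present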
17.
Methodology
Log Softmax
Applying Log Softmax in PyTorch
• Applies the logarithm to the softmax result
• Mitigates the vanishing-gradient problem and improves numerical stability, since log(softmax(x)) is computed in a single, stable step
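A minimal example of applying it (tensor sizes are made up):

import torch
import torch.nn as nn

log_softmax = nn.LogSoftmax(dim=1)
logits = torch.randn(2, 5)       # hypothetical batch of 2, 5 classes
log_probs = log_softmax(logits)  # log(softmax(logits)), computed stably
probs = log_probs.exp()          # each row sums to 1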
18.
Methodology
NLLLoss (Negative Log Likelihood Loss)
• Calculates the loss from the log-softmax output
• Encourages the model to assign high probability to the correct labels
Cross Entropy Loss
• nn.LogSoftmax + nn.NLLLoss = nn.CrossEntropyLoss
• Takes logits directly and internally computes log-softmax and NLLLoss, as the sketch below demonstrates
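A short check of this equivalence (sample values are hypothetical):

import torch
import torch.nn as nn

logits = torch.randn(3, 5)          # 3 samples, 5 classes
targets = torch.tensor([1, 0, 4])   # correct class indices

# Route 1: LogSoftmax followed by NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
loss1 = nn.NLLLoss()(log_probs, targets)

# Route 2: CrossEntropyLoss on raw logits (does both steps internally)
loss2 = nn.CrossEntropyLoss()(logits, targets)

assert torch.allclose(loss1, loss2)  # equal up to floating-point error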
Editor's Notes
A deep learning architecture for analyzing sequential data. o(2) reflects both past and current information using h(1) and x(2). U: input layer → hidden layer; W: hidden layer at time t → hidden layer at time t+1; V: hidden layer → output layer.
Because RNNs are flexible in the length of their inputs and outputs, they can be composed in various structures depending on the form of the input and output.
As the time steps of a vanilla RNN grow longer, the problem of long-term dependencies arises, where earlier information is not sufficiently passed to later stages; prediction becomes impossible when information important for the prediction is at the beginning.
The core idea of LSTM is to store information from the previous step in a memory cell and pass it along.
Based on the information at the current time step, it multiplies the past content by how much should be forgotten, adds the current information to that result, and passes the information on to the next time step.