 Detecting Misleading
Headlines in Online News 

Hands-on Experiences on Attention-based RNN
Kunwoo Park
24th June 2019
IBS deep learning summer school
Who am I
• Kunwoo Park (박건우)
• Post doc, Data Analytics, QCRI (2018 - present)
• PhD, School of Computing, KAIST (2018) 

with outstanding dissertation award
• Research interest
• Computational social science using machine learning
• Text style transfer using RNN and RL
2
This talk will..
• Help the audience understand the attention mechanism for text
• Introduce a recent research effort on detecting misleading
news headlines using deep neural networks
• Explain the building blocks of the state-of-the-art model and
show how they are implemented in TensorFlow (1.x)
• Give hands-on experience in implementing a text classifier
using the attention mechanism
3
clickbait
4
Target problem
• Detect incongruity between a news headline and its body text:

a headline that does not correctly represent the story
5
Overall model architecture
Deep Neural Net for
Encoding Headline
Deep Neural Net for
Encoding Body Text
Embedding
Layer
Output
Layer
Input
Layer
Goal: Detecting headline incongruity
from the textual relationship between body text and headline
6
Overall model architecture
Deep Neural Net for
Encoding Headline
Deep Neural Net for
Encoding Body Text
Embedding
Layer
Output
Layer
Input
Layer
7
Input data
• Transform words into vocabulary indices
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
8
Define input layer in TF
• Using tf.placeholder (sketch below)
• Parameters
• data type: tf.int32
• shape: [None, self.max_words]
• name: used for debugging
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
9
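The placeholder code on this slide was shown as an image; below is a minimal TF 1.x sketch under the same parameters (the variable names and maximum-length values are illustrative assumptions, not the author's exact code):

```python
import tensorflow as tf

# Assumed maximum lengths; the real model pads/truncates to self.max_words.
max_words_headline = 25
max_words_bodytext = 1000

# dtype tf.int32 holds vocabulary indices; the first dimension None allows
# any batch size. The name argument labels the tensor for debugging.
input_headline = tf.placeholder(
    tf.int32, shape=[None, max_words_headline], name="input_headline")
input_bodytext = tf.placeholder(
    tf.int32, shape=[None, max_words_bodytext], name="input_bodytext")
```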
Feed data into placeholders
• At the very end of the computation graph: usually at the optimizer (see sketch below)
headline:
[1, 30, 5, …, 9951, 2]
body text:
[ 875, 22, 39, …, 2481, 2,
9, 93, 9593, …, 431, 77,
1, 30, 5, …, 9951, 2, … ]
10
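A self-contained sketch of feeding a batch at session-run time; the trivial slicing op here merely stands in for the optimizer's training op, which is where the feed usually happens:

```python
import numpy as np
import tensorflow as tf

input_headline = tf.placeholder(tf.int32, [None, 5], name="input_headline")
first_tokens = input_headline[:, 0]  # stand-in op; really this would be train_op

with tf.Session() as sess:
    headline_batch = np.array([[1, 30, 5, 9951, 2]], dtype=np.int32)
    # Placeholders receive concrete values only here, at graph execution.
    print(sess.run(first_tokens, feed_dict={input_headline: headline_batch}))
```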
One-hot encoding
{“believe”: 0, “do”: 1, “you”: 2, “happens”: 3,
“if”: 4,“what”: 5, “wouldn't”: 6, “yoga”: 7}
Vocabulary
11
[[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 1, 0, … ],
[1, 0, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 1, 0, 0, … ],
[0, 0, 0, 1, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 1, 0, 0, 0, … ],
[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 1, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 0, 1, … ]]
Drawbacks of one-hot
[[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 1, 0, … ],
[1, 0, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 1, 0, 0, … ],
[0, 0, 0, 1, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 1, 0, 0, 0, … ],
[0, 0, 1, 0, 0, 0, 0, 0, … ],
[0, 1, 0, 0, 0, 0, 0, 0, … ],
[0, 0, 0, 0, 0, 0, 0, 1, … ]]
{“believe”: 0, “do”: 1, “you”: 2, “happens”: 3,“if”: 4,
“what”: 5, “wouldn't”: 6, “yoga”: 7, … “a”:1000000000}
Vocabulary
12
Word embedding
• A mapping of a discrete variable for each word to a fixed-
dimensional vector of continuous numbers
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
13
Sequence length
×
Vocab size
Sequence length
×
Embedding size
• A mapping of a discrete variable for each word to a fixed-
dimensional vector of continuous numbers
Word embedding
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
14
?
Embedding matrix
Sequence length
×
Vocab size
Sequence length
×
Embedding size
Training from scratch
[[0.01, 0.07],
[0.33, 0.68],
[0.23, 0.51],
[0.41, 0.38],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.01, 0.07],
[0.72, 0.13],
[0.14, 0.22]]
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
One-hot input Embedding matrix
15
Vocab size
×
Embedding size
Sequence length
×
Vocab size
Sequence length
×
Embedding size
Embedded input
Training from scratch
16
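A sketch of the from-scratch option in TF 1.x: the embedding matrix is a trainable variable learned jointly with the rest of the model (names and sizes are assumptions):

```python
import tensorflow as tf

vocab_size, embedding_size = 10000, 300  # assumed sizes

# Trainable embedding matrix: [Vocab size x Embedding size]
embedding = tf.get_variable(
    "embedding", [vocab_size, embedding_size],
    initializer=tf.random_uniform_initializer(-0.1, 0.1))

input_ids = tf.placeholder(tf.int32, [None, None])  # [batch, sequence length]
# Row lookup is equivalent to multiplying one-hot inputs by the matrix,
# without ever materializing the sparse one-hot tensors.
embedded_input = tf.nn.embedding_lookup(embedding, input_ids)
# embedded_input: [batch, Sequence length, Embedding size]
```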
Load pre-trained matrix
[[0.01, 0.07],
[0.33, 0.68],
[0.23, 0.51],
[0.41, 0.38],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.01, 0.07],
[0.72, 0.13],
[0.14, 0.22]]
[[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]]
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
One-hot input Embedding matrix Embedded input
word2vec
GloVe
BERT
….
17
Vocab size
×
Embedding size
Sequence length
×
Vocab size
Sequence length
×
Embedding size
Load pre-trained matrix
18
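One common TF 1.x pattern for loading a pre-trained matrix (word2vec, GloVe, …); the random array below only stands in for vectors read from an embedding file keyed by the same vocabulary:

```python
import numpy as np
import tensorflow as tf

vocab_size, embedding_size = 10000, 300  # assumed sizes
pretrained = np.random.rand(vocab_size, embedding_size).astype(np.float32)
# In practice `pretrained` comes from word2vec/GloVe files, row-aligned
# with the vocabulary used to index the inputs.

embedding = tf.get_variable("embedding", [vocab_size, embedding_size],
                            trainable=True)  # set False to freeze the vectors
embedding_ph = tf.placeholder(tf.float32, [vocab_size, embedding_size])
embedding_init = embedding.assign(embedding_ph)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(embedding_init, feed_dict={embedding_ph: pretrained})
```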
Overall model architecture
Deep Neural Net for
Encoding Headline
Deep Neural Net for
Encoding Body Text
Embedding
Layer
Output
Layer
Input
Layer
19
Deep encoder
Deep neural network
20
[[0.23, 0.51],
[0.72, 0.13],
[0.01, 0.07],
[0.18, 0.77],
[0.04, 0.05],
[0.87, 0.92],
[0.41, 0.38],
[0.33, 0.68],
[0.14, 0.22]]
Embedded input
Sequence length
×
Embedding size
[[0.752, 0.757, 0.587],
[0.645, 0.397, 0.618],
[0.777, 0.099, 0.938],
[0.367, 0.139, 0.150],
[0.341, 0.069, 0.398],
[0.415, 0.655, 0.467],
[0.935, 0.659, 0.321],
[0.875, 0.699, 0.967],
[0.734, 0.966, 0.205]]
Hidden representation
Sequence length
×
Hidden size
Which neural net can we use?
• Feedforward neural network
• Convolutional network
• Recurrent neural network
21
Recurrent neural network
• Efficient in modeling inputs with sequential dependencies 

(e.g., text, time-series, …)
• To make an output at each step, RNNs combine the current
input with what has been learned so far
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
[Figure: unrolled RNN with inputs x_1 … x_t and hidden states h_1 … h_t]
22
Long-term dependencies
• “the clouds are in the sky”
• “I grew up in France … I speak fluent French”
23
LSTM
[Figure: vanilla recurrent unit vs. LSTM cell]
24
Cell state
• A kind of memory unit that keeps past information
• The LSTM can add or remove information from the cell state
through special structures called gates
25
Forget gate layer
• Decide what information we’re going to throw away from the
cell state
• 1: “completely keep this”. 0: “completely get rid of this”
26
Taking input
• Decide what new information we’re going to store in the cell state
• Input gate layer: sigmoid decides which values we’ll update
• tanh layer: creates a vector of candidate values
27
Update cell state
• Combine the old cell state with the new candidate values
through f_t and i_t: C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t
28
Decide output
• Output is a filtered version of the cell state C_t
29
GRU
• Update gate: combination of forget gate and input gate
• Merge cell state and hidden state
30
Bi-directional RNN
• Combining two RNNs together: 

One RNN reads inputs from left to right and 

another RNN reads inputs from right to left
• Able to understand context better
https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
31
How to build RNN in TF
1. Decide which cell you use for RNN
2. Decide the number of layers in RNN
3. Decide whether the RNN is uni- or bi-directional
32
LSTM or GRU
33
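The cell-construction code was an image in the original deck; a minimal TF 1.x sketch of step 1, choosing the recurrent cell (hidden size is an assumed value):

```python
import tensorflow as tf

hidden_size = 300  # assumed

# Choose one recurrent unit for the encoder:
lstm_cell = tf.nn.rnn_cell.LSTMCell(num_units=hidden_size)
gru_cell = tf.nn.rnn_cell.GRUCell(num_units=hidden_size)
```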
Stacked RNN
34
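Likewise, a sketch of step 2, stacking cells into a deeper RNN:

```python
import tensorflow as tf

hidden_size, num_layers = 300, 2  # assumed

# Stack identical cells; the output of each layer feeds the next.
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.GRUCell(hidden_size) for _ in range(num_layers)])
```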
Uni-directional RNN
• tf.nn.dynamic_rnn()
• outputs: the sequence of hidden states 

[batch_size, max_sequences, output_size]
• state: the final state 

[batch_size, output_size]
35
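A sketch of running the cell with tf.nn.dynamic_rnn() as described above (placeholder shapes are assumptions):

```python
import tensorflow as tf

hidden_size, embedding_size = 300, 300  # assumed
embedded_input = tf.placeholder(tf.float32, [None, None, embedding_size])
seq_len = tf.placeholder(tf.int32, [None])  # true lengths before padding

cell = tf.nn.rnn_cell.GRUCell(hidden_size)
# outputs: [batch_size, max_sequences, output_size]
# state:   [batch_size, output_size] (a (c, h) tuple for LSTM cells)
outputs, state = tf.nn.dynamic_rnn(
    cell, embedded_input, sequence_length=seq_len, dtype=tf.float32)
```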
Bi-directional RNN
• outputs, states = (output_fw, output_bw), (state_fw, state_bw)
36
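And a sketch of the bi-directional variant, matching the output structure on the slide:

```python
import tensorflow as tf

hidden_size, embedding_size = 300, 300  # assumed
embedded_input = tf.placeholder(tf.float32, [None, None, embedding_size])

cell_fw = tf.nn.rnn_cell.GRUCell(hidden_size)  # reads left to right
cell_bw = tf.nn.rnn_cell.GRUCell(hidden_size)  # reads right to left
# outputs = (output_fw, output_bw), states = (state_fw, state_bw)
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, embedded_input, dtype=tf.float32)
combined = tf.concat(outputs, axis=2)  # common choice: concatenate directions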
Some body text is too long..
• h_t should contain all necessary information from the past,
over a thousand steps back
[Figure: unrolled RNN with inputs x_1 … x_t and final hidden state h_t]
37
37
A news article is hierarchical
38
Hierarchical RNN
• Word-level RNN: h_t^p = f(h_{t−1}^p, x_t^p; θ_f)
• Paragraph-level RNN: u_p = g(u_{p−1}, h_t^p; θ_g)
[Figure: a word-level RNN runs over each paragraph’s words x_1^p … x_t^p to produce h_t^p; a paragraph-level RNN over h_t^1, h_t^2, …, h_t^p produces u_1, u_2, …, u_p]
39
Hierarchical RNN
• Word-level RNN: h_t^p = f(h_{t−1}^p, x_t^p; θ_f)
• Paragraph-level RNN: u_p = g(u_{p−1}, h_t^p; θ_g)
[Figure: same hierarchical structure as above]
The maximum sequence length of each RNN can be reduced
significantly, so we can effectively train models with fewer parameters
40
Word-level RNN
41
Paragraph-level RNN
42
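The word- and paragraph-level RNN code was shown as screenshots; below is one possible hierarchical encoder as a hedged sketch (the fixed shapes and all names are illustrative assumptions, not the released implementation):

```python
import tensorflow as tf

num_para, num_words, emb_size, hidden_size = 30, 50, 300, 300  # assumed

# Embedded body text: [batch, paragraphs, words per paragraph, embedding size]
embedded = tf.placeholder(tf.float32, [None, num_para, num_words, emb_size])

# Word-level RNN: merge batch and paragraph dims so each paragraph is a sequence.
words = tf.reshape(embedded, [-1, num_words, emb_size])
with tf.variable_scope("word_rnn"):
    _, para_vec = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(hidden_size), words, dtype=tf.float32)

# Paragraph-level RNN: one step per paragraph representation h_t^p.
para_seq = tf.reshape(para_vec, [-1, num_para, hidden_size])
with tf.variable_scope("para_rnn"):
    para_outputs, body_state = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.GRUCell(hidden_size), para_seq, dtype=tf.float32)
# para_outputs: [batch, num_para, hidden_size] -> the u_p sequence
```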
What’s more?
• Across the body text, some paragraphs carry a strong signal
43
Neural Machine Translation
• RNN-based encoder-decoder architecture, known as seq2seq
44 Sutskever et al., 2014; Cho et al., 2014
Attention mechanism in NMT
45
[Figure: attention weights between source (German) and target (English) words]
https://aws.amazon.com/ko/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
Attention mechanism in NMT
46
https://aws.amazon.com/ko/blogs/machine-learning/train-neural-machine-translation-models-with-sockeye/
[Figure: attention weights between source (German) and target (English) words]
Attention mechanism
47
• In detecting incongruity, we can pay a different amount of
attention to each paragraph
Attention mechanism
• In detecting incongruity, we can pay a different amount of
attention to each paragraph
[Figure: the body-text RNN (source) yields paragraph vectors u_1^B … u_p^B, and the headline RNN (target) yields u^H; an alignment model scores each pair, and a weighted sum gives u^B]
48
Alignment model
• Calculate attention weights between each paragraph (source)
and headline (target)

a_H(s) = align(u^H, u^B_s) = exp(score(u^H, u^B_s)) / ∑_{s′} exp(score(u^H, u^B_{s′}))

[Figure: alignment models score each paragraph vector u_1^B … u_p^B against the headline vector u^H; their weighted sum gives u^B]
49
Alignment model
• Score is a content-based function (Luong et al., 2015)
[Figure: same alignment structure as above]
50
Context vector
• Represents the body text with different attention weights
across paragraphs

u^B = ∑_s a_H(s) u^B_s

[Figure: the attention-weighted sum of paragraph vectors u_1^B … u_p^B forms the context vector u^B]
51
Attention in TF
• Using dot-product similarity
• bodytext_outputs: sequence of the hidden states
• headline_states: the last hidden state
52
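The slide’s code was an image; a sketch of dot-product attention over paragraph states, using the tensor names described above (shapes assumed):

```python
import tensorflow as tf

hidden_size = 300  # assumed
# bodytext_outputs: sequence of hidden states from the body-text RNN
bodytext_outputs = tf.placeholder(tf.float32, [None, None, hidden_size])
# headline_states: the last hidden state of the headline RNN
headline_states = tf.placeholder(tf.float32, [None, hidden_size])

# Dot-product score between the headline vector and every paragraph vector
scores = tf.reduce_sum(
    bodytext_outputs * tf.expand_dims(headline_states, 1), axis=2)  # [batch, p]
attn_weights = tf.nn.softmax(scores)  # a_H(s): softmax over paragraphs
# Context vector u^B: attention-weighted sum of the paragraph vectors
context = tf.reduce_sum(
    tf.expand_dims(attn_weights, 2) * bodytext_outputs, axis=1)
```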
Overall model architecture
Deep Neural Net for
Encoding Headline
Deep Neural Net for
Encoding Body Text
Embedding
Layer
Output
Layer
Input
Layer
53
Measure similarity
• u^H: last hidden state of the RNN for encoding the headline
• u^B: context vector that encodes the body text
• M: learnable similarity matrix, b: bias term
• σ: sigmoid function

p(label) = σ((u^H)^⊤ M u^B + b)
54
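A sketch of the bilinear output layer defined by the formula above (shapes are assumptions):

```python
import tensorflow as tf

hidden_size = 300  # assumed
u_h = tf.placeholder(tf.float32, [None, hidden_size])  # u^H: headline encoding
u_b = tf.placeholder(tf.float32, [None, hidden_size])  # u^B: body context vector

M = tf.get_variable("M", [hidden_size, hidden_size])   # learnable similarity matrix
b = tf.get_variable("b", [], initializer=tf.zeros_initializer())  # bias term

# (u^H)^T M u^B + b, computed for the whole batch at once
logits = tf.reduce_sum(tf.matmul(u_h, M) * u_b, axis=1) + b
p_label = tf.sigmoid(logits)  # σ(...): probability of an incongruent headline
```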
Measure similarity
p(label) = σ((u^H)^⊤ M u^B + b)
55
Define loss function
• Cross-entropy: standard loss function for classification

y: ground truth (0/1), p(y): model output
56
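A sketch of the cross-entropy loss in TF 1.x; working on logits rather than hand-applied sigmoids is the numerically stable route:

```python
import tensorflow as tf

labels = tf.placeholder(tf.float32, [None])  # y: ground truth (0/1)
logits = tf.placeholder(tf.float32, [None])  # pre-sigmoid model output

# Binary cross-entropy, averaged over the batch
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
```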
Optimizer
• Gradient clipping to prevent exploding gradients
57
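A sketch of gradient clipping wrapped around an optimizer; the toy loss and the hyperparameter values are assumptions:

```python
import tensorflow as tf

learning_rate, max_grad_norm = 1e-3, 5.0  # assumed hyperparameters
w = tf.get_variable("w", [10])
loss = tf.reduce_sum(tf.square(w))  # stand-in for the model's loss

optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# Rescale all gradients together if their global norm exceeds max_grad_norm
clipped, _ = tf.clip_by_global_norm(grads, max_grad_norm)
train_op = optimizer.apply_gradients(list(zip(clipped, variables)))
```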
Overfitting
[Figure: error vs. model complexity; underfitting on the left, overfitting on the right]
58
How to prevent overfitting?
• Add more data! (most effective if possible)
• Data augmentation: add noise to inputs for better generalization
• Regularization: L1/L2, dropout, early stopping (see the sketch after this list)
• Reduce architecture complexity
59
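As one concrete example of the regularization options above, a sketch of dropout around a recurrent cell (the keep probability is an assumed value):

```python
import tensorflow as tf

hidden_size = 300  # assumed
keep_prob = tf.placeholder(tf.float32)  # e.g., 0.7 while training, 1.0 at test

# DropoutWrapper applies dropout to the cell's outputs at each time step
cell = tf.nn.rnn_cell.DropoutWrapper(
    tf.nn.rnn_cell.GRUCell(hidden_size), output_keep_prob=keep_prob)
```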
Evaluation results
60
Demo
61 Credit: Taegyun Kim
Dataset/code/paper
• https://github.com/david-yoon/detecting-incongruity
62
Attention for text classification
• Giving different weights over word sequences (Zhou et al., ACL 2016)
63
H = [h_1, h_2, ⋯, h_T]
M = tanh(H)
α = softmax(w^⊤ M)
r = H α^⊤
Attention for text classification
• Focusing on important sentence representations, each of which
pays a different amount of attention to words (Yang et al., NAACL 2016)
64
Attention for text classification
• Transfer learning on Transformer language models, trained with
multi-head attention (Vaswani et al., NIPS 2017; Devlin et al., NAACL 2019)
65
Hands-on experience
• Target problem: sentiment analysis on IMDB review dataset
66
Link: https://bit.ly/2xbelke
Thank you
Kunwoo Park
@ IBS deep learning summer school
