Vdcnn daniele meetup_27_june

VDCNN
STUDY AND COMPARISON
Daniele D’Armiento
Physicist
Harman/Samsung

Very Deep Convolutional Network

Dataset
https://github.com/LC-John/Yahoo-Answers-Topic-Classification-Dataset

Dataset
- AG’s news
- AG is a collection of more than 1 million news articles, gathered from more than 2000 news
sources by ComeToMyHead, an academic news search engine. The dataset is provided by the
academic community for research purposes.
For more information, please refer to the link
http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html
- The AG's news topic classification dataset is constructed by Xiang Zhang from the dataset
above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo
Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification

Dataset
- The AG's news topic classification dataset is constructed by choosing the 4 largest
classes from the original corpus:
  - World
  - Sports
  - Business
  - Sci/Tech
Each class contains 30,000 training samples and 1,900 testing samples.
The total number of training samples is 120,000 and of testing samples 7,600.

Dataset
Some examples
- "4", "Calif. Aims to Limit Farm-Related Smog (AP)", "AP - Southern California's smog-fighting agency went
after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure."
- "1", "Panama-Cuba 'pardon' row worsens", "Panama recalls its Havana ambassador after Cuba threatened
to cut ties if jailed anti-Castro activists are pardoned."
- "2", "Dickens Departs", "Tony Dickens resigns as head coach at Northwestern six months after leading
the Wildcats to the Maryland 4A boys basketball title."
- "3", "Tata CS celebrates profits uplift", "Indian software giant Tata CS unveils sharply higher profits in its
first set of results since its stock market launch."
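The rows above follow a three-column CSV layout (class index, title, description). A minimal sketch of how such rows could be read is shown here. The label-to-name mapping and the choice to join title and description into one string are assumptions made for illustration, not something stated on the slides.

```python
import csv
import io

# Assumed mapping from class index to topic name; it is consistent with the
# examples on the slide (e.g. "2" is a sports story), but it is not stated there.
CLASS_NAMES = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}

# Two of the example rows, written as they would appear in a CSV file.
SAMPLE_CSV = '''"2","Dickens Departs","Tony Dickens resigns as head coach at Northwestern six months after leading the Wildcats to the Maryland 4A boys basketball title."
"3","Tata CS celebrates profits uplift","Indian software giant Tata CS unveils sharply higher profits in its first set of results since its stock market launch."
'''


def read_samples(handle):
    """Yield (class_name, text) pairs; title and description are joined."""
    for label, title, description in csv.reader(handle):
        yield CLASS_NAMES[int(label)], f"{title} {description}"


samples = list(read_samples(io.StringIO(SAMPLE_CSV)))
for class_name, text in samples:
    print(class_name, "|", text[:40])
```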

Dataset
- Bixby NLU
- The set of sentences used to train Bixby to understand the user's
intent
- As many classes as there are «Capsules»,
the analogue of Alexa Skills or Google Actions
Examples:
- «ci sono eventi a parco sempione?» (are there any events at Parco Sempione?)
- «aggiungi un appuntamento in ufficio» (add an appointment at the office)
- «qual è la tua squadra di calcio preferita» (what is your favourite football team)
- «cancella le email di oggi» (delete today's emails)
- «aggiungi questo brano alla lista dei favoriti» (add this track to the favourites list)

Dataset
- Several versions of the dataset:
  - Original: 88 classes
  - Merging of the most similar action classes: 61 classes
  - Further merging and selection of the most representative classes: 20 classes
The dataset is not balanced: some capsules have more sentences than others
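Merging classes and checking how unbalanced the result is can be sketched in a few lines. The capsule names and the merge map below are hypothetical, invented only to show the mechanics; the real Bixby class names are not given on the slides.

```python
from collections import Counter

# Hypothetical merge map: fine-grained action classes -> merged class.
MERGE_MAP = {
    "calendar.add_event": "calendar",
    "calendar.delete_event": "calendar",
    "email.delete": "email",
}


def merge_labels(labels, merge_map):
    """Replace each label by its merged class; unmapped labels are kept."""
    return [merge_map.get(label, label) for label in labels]


# Hypothetical labels for five sentences.
labels = [
    "calendar.add_event",
    "calendar.add_event",
    "calendar.delete_event",
    "email.delete",
    "music.add_favourite",
]

merged = merge_labels(labels, MERGE_MAP)
counts = Counter(merged)

# Ratio between the largest and the smallest class: 1.0 means perfectly balanced.
imbalance = max(counts.values()) / min(counts.values())
print(counts, imbalance)
```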

Model
- The input text is converted, character by character, into vectors of
dimension 69...
”abcdefghijklmnopqrstuvwxyz0123456789
-,;.!?:’"/| #$%ˆ&*˜‘+=<>()[]{}”
- ...and immediately reduced to 16 dimensions by an embedding
- Input no longer than 1024 characters (padding)
~ more than 100 words!
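The input pipeline described above (character lookup, padding to 1024, embedding to 16 dimensions) can be sketched in plain Python. This is only an illustration under stated assumptions: the alphabet is an ASCII approximation of the one on the slide and does not necessarily reproduce the exact 69-token vocabulary, the ids for padding and unknown characters are chosen arbitrarily, lower-casing is assumed, and the embedding table is random here whereas in the model it is learned.

```python
import random

# Assumption: ASCII approximation of the slide's alphabet (space included).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/| #$%^&*~`+=<>()[]{}"
PAD_ID = 0   # assumed id for padding
UNK_ID = 1   # assumed id for characters outside the alphabet
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 2

MAX_LEN = 1024   # from the slide
EMBED_DIM = 16   # from the slide


def encode(text, max_len=MAX_LEN):
    """Map characters to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in text.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


# Random embedding table standing in for the learned one.
rng = random.Random(0)
embedding_table = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)
]

ids = encode("cancella le email di oggi")
embedded = [embedding_table[i] for i in ids]   # 1024 rows of 16 values
print(len(embedded), len(embedded[0]))
```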

Model
- 64 initial filters (kernels) of size 3
- Convolutional blocks whose number of filters doubles
after each halving caused by pooling, to keep
the size of each layer's feature map constant
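if depth == 9:
    self.num_conv_block = [1, 1, 1, 1]
elif depth == 17:
    ...  # value not shown on the slide
elif depth == 29:
    ...  # value not shown on the slide
elif depth == 49:
    ...  # value not shown on the slide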
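The relation between the depth name and the number of blocks per stage, and the "filters double when the length halves" rule, can be checked with a short sketch. Only the depth-9 entry comes from the slide. The entries for 17, 29 and 49 are recalled from the architecture table of the VDCNN paper and are an assumption with respect to the implementation shown here, as is the convention of two convolutional layers per block.

```python
# Blocks per stage (stages with 64, 128, 256, 512 filters).
# Depth 9 is from the slide; the other rows are assumed from the VDCNN paper.
NUM_CONV_BLOCKS = {
    9: [1, 1, 1, 1],
    17: [2, 2, 2, 2],
    29: [5, 5, 2, 2],
    49: [8, 8, 5, 3],
}
CONVS_PER_BLOCK = 2  # assumption: each block holds two convolutional layers


def conv_layer_count(blocks):
    """First convolution plus CONVS_PER_BLOCK convolutions per block."""
    return 1 + CONVS_PER_BLOCK * sum(blocks)


def stage_shapes(input_length=1024, initial_filters=64, stages=4):
    """(sequence length, filters) per stage: pooling halves the length,
    and the number of filters doubles to compensate."""
    shapes = []
    length, filters = input_length, initial_filters
    for _ in range(stages):
        shapes.append((length, filters))
        length, filters = length // 2, filters * 2
    return shapes


# The depth name matches the number of convolutional layers.
for depth, blocks in NUM_CONV_BLOCKS.items():
    assert conv_layer_count(blocks) == depth

# length * filters stays the same at every stage.
for length, filters in stage_shapes():
    print(length, filters, length * filters)
```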

Model
- Most applications of convolutional networks to NLP use fairly “shallow” architectures
with filters of various sizes
(3, 5, 7), in order to extract N-gram-like features, with N large enough to
capture relations between distant words
- This paper shows that these long-range relations can be modelled with
small convolutions (filters of size 3) stacked over a large number of layers, thus exploiting
depth…
- Stacking 4 layers yields a span of 9 elements, and the network learns on its own how
best to combine 3-gram-like features hierarchically.
- This architecture can be seen as the adaptation to text of the well-known VGG network,
which gave excellent results in image recognition
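The "4 layers span 9 elements" figure follows from simple arithmetic: each extra layer with a filter of size 3 widens the span by 2. A small sketch, assuming stride-1 convolutions with no pooling in between (the function name is illustrative):

```python
def receptive_field(num_layers, kernel_size=3):
    """Number of input positions seen by one output of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


for n in (1, 2, 3, 4):
    print(n, receptive_field(n))   # 4 layers of size-3 filters -> 9
```

Under the same assumptions, a single layer would need a filter of size 9 to cover the same span.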

Training
- Code taken from https://github.com/vietnguyen91/Very-deep-cnn-tensorflow
which faithfully implements the paper
- Paper's implementation:
learning rate: 0.01
batch size: 128
claims to reach training convergence after 10-15 epochs.
A K40 GPU was used, at least 24 minutes per epoch
- Our implementation
learning rate: 0.0005,
batch size: 256 (where possible)
We used a Tesla V100 GPU, 2 and a half minutes per epoch (AG news)
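The two setups can be written down side by side as plain configuration objects. The class and field names below are illustrative, not taken from the repository. The per-epoch ratio is only a rough indication: the 24 minutes are a lower bound, the batch sizes differ, and the slide does not say which dataset the K40 timing refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the hyperparameters listed on the slide."""
    name: str
    learning_rate: float
    batch_size: int
    gpu: str
    minutes_per_epoch: float


# Values from the slide; 24 minutes is stated as "at least".
PAPER = TrainingSetup("paper", 0.01, 128, "K40", 24.0)
# Batch size 256 was used "where possible"; timing refers to AG news.
OURS = TrainingSetup("ours", 0.0005, 256, "Tesla V100", 2.5)

epoch_time_ratio = PAPER.minutes_per_epoch / OURS.minutes_per_epoch
print(f"per-epoch time ratio: {epoch_time_ratio:.1f}x")
```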

Training: AG news results
[Plots: 9 layers, 17 layers, 29 layers]

Training: Bixby NLU results
88 classes, 61 classes
Learning rate
0.001
29 layers

Learning rate
0.0005
29 layers
61 classes

Bixby
20 classes
29 layers, 9 layers

Bixby
20 classes
49 layers: Batch 16, Batch 128

Bixby
20 classes
49 layers
Batch 128
The VDCNN reaches
~93% accuracy
(zoom)
