1.
(Deep) Neural Network & Text Mining
Piji Li
lipiji.pz@gmail.com
2.
•Deep Learning
-Story
•DL for NLP & Text Mining
-Words
-Sentences
-Documents
9/3/2014 2
Outline
4.
BP
D.E. Rumelhart, G.E. Hinton, R.J. Williams
Learning representations by back-propagating errors. Nature, 323 (1986), pp. 533–536
Solved learning problem
Biological system
…
Hard to train (non-convex, tricks)
Hard to do theoretical analysis
Small training sets...
1986
6.
CD-DNN-HMM
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." Audio, Speech, and Language Processing, IEEE Transactions on 20, no. 1 (2012): 30-42.
Winner of the 2013 IEEE Signal Processing Society Best Paper Award
2011
PhD candidate of Prof. Hinton, intern at MSR
GPU
7.
ImageNet Classification
[Figure: ImageNet Classification Error by year (2012, 2013, 2014); y-axis from 0 to 0.4; annotations: Deeper, Network in Network, Deep, DNN, First Blood]
http://image-net.org/
•1000 categories and 1.2 million training images
Li Fei-Fei: ImageNet Large Scale Visual Recognition Challenge, 2014
8.
Nowadays
People: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng
Industry: Google, Baidu, Facebook, IBM
Papers: ICCV, CVPR, ECCV; ICML, NIPS, AAAI, ICLR; ACL, EMNLP, COLING
Startup Companies using (Deep) Machine Learning:
9.
•Deep Learning
-Story
•DL for NLP & Text Mining
-Words
-Sentences
-Documents
Outline
10.
•Distributed Representation
-A Neural Probabilistic Language Model
-Word2Vec
•Application
-Semantic Hierarchies of Words
-Machine Translation
NN & Words
V(king) - V(queen) + V(woman) ≈ V(man)
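The vector arithmetic above can be sketched in a few lines of plain Python. This is only an illustration: the two-dimensional `toy_vectors` are hypothetical hand-made values (not trained embeddings), and `cosine` and `analogy` are helper names introduced here, not part of any library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(vectors, a, b, c):
    """Return the word closest (by cosine) to V(a) - V(b) + V(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "king": [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.05],
}

print(analogy(toy_vectors, "king", "queen", "woman"))
```

With real embeddings the query words are usually excluded from the candidates, as done here, because the nearest neighbour of the target is otherwise often one of the inputs.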
11.
•Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A Neural Probabilistic Language Model." Journal of Machine Learning Research 3 (2003): 1137-1155.
Neural Probabilistic Language Model
[Figure: model architecture; labels: H, d; U, b; W, b; |V|; concatenate]
-Maximizes the penalized log-likelihood:
-Softmax
-where
-Training: BP
Better than n-gram model
1, Word vectors;
2, No smoothing needed: p(w|context) ∈ (0,1)
Time-Consuming
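A minimal sketch of the forward pass may help here. It assumes the form given in Bengio et al. (2003): y = b + Wx + U tanh(d + Hx), followed by a softmax over the vocabulary. Only H, d, U, W, b and the concatenation come from the slide; the embedding table `C`, the sizes, the random initialisation, and the helper names are assumptions for illustration. Training by BP is not shown.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nplm_forward(context_ids, C, H, d, U, W, b):
    """p(w | context) for every word w in the vocabulary."""
    # x: concatenation of the context words' feature vectors
    x = [value for i in context_ids for value in C[i]]
    hidden = [math.tanh(a + bias) for a, bias in zip(matvec(H, x), d)]
    # y = b + W x + U tanh(d + H x)
    y = [bi + wx + uh for bi, wx, uh in zip(b, matvec(W, x), matvec(U, hidden))]
    return softmax(y)

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: |V| = 6 words, 3-dim word vectors, 2 context words, 4 hidden units.
rng = random.Random(0)
vocab_size, embed_dim, context_len, hidden_dim = 6, 3, 2, 4
C = rand_matrix(vocab_size, embed_dim, rng)
H = rand_matrix(hidden_dim, embed_dim * context_len, rng)
d = [0.0] * hidden_dim
U = rand_matrix(vocab_size, hidden_dim, rng)
W = rand_matrix(vocab_size, embed_dim * context_len, rng)
b = [0.0] * vocab_size

probs = nplm_forward([1, 4], C, H, d, U, W, b)
```

Every entry of `probs` is strictly between 0 and 1, which is why no smoothing is needed. The sum over all |V| scores in the softmax is also the step that makes the model time-consuming.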
20.
•Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among languages for machine translation." arXiv preprint arXiv:1309.4168 (2013).
Machine Translation
Translation Matrix
English (left) Spanish (right).
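The translation matrix idea can be sketched as follows: given pairs of source and target word vectors (x, z), fit a linear map W that makes Wx close to z, then project unseen source vectors with it. The data below is a hypothetical toy dictionary in which the target space is just the source space rotated by 90 degrees; the function names, learning rate, and epoch count are assumptions for illustration.

```python
def train_translation_matrix(pairs, dim, lr=0.1, epochs=200):
    """Fit W so that W x is close to z for each (x, z) pair, using per-example SGD."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, z in pairs:
            pred = [sum(w * xi for w, xi in zip(row, x)) for row in W]
            for r in range(dim):
                err = pred[r] - z[r]
                for c in range(dim):
                    # gradient of ||W x - z||^2 with respect to W[r][c] is 2 * err * x[c]
                    W[r][c] -= lr * 2.0 * err * x[c]
    return W

def translate(W, x):
    """Project a source-language vector into the target-language space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical toy dictionary: each target vector is the source vector rotated by 90 degrees.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0], [-1.0, 0.0]),
    ([1.0, 1.0], [-1.0, 1.0]),
]
W = train_translation_matrix(pairs, dim=2)
projected = translate(W, [2.0, 1.0])  # a source vector not seen during training
```

In practice the projected vector would then be matched to its nearest neighbours in the target-language embedding space to produce candidate translations.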
•Devlin, Jacob, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. "Fast and Robust Neural Network Joint Models for Statistical Machine Translation." ACL 2014 Best Long Paper.
22.
Semantic Hierarchies
•Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. "Learning semantic hierarchies via word embeddings." ACL, 2014.
hypernym–hyponym
More Relations?
Clustering y - x, and then for each cluster:
23.
•Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. "Relation Classification via Convolutional Deep Neural Network." COLING 2014 Best Paper
Relation Classification
Cause-Effect relationship: “The [fire]e1 inside WTC was caused by exploding [fuel]e2”
Steps: 1, word vectors; 2, lexical-level features; 3, sentence-level features via CNN; 4, concatenate the features and feed them into a softmax classifier
SENNA
24.
•Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
•Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
•Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013).
•Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv (2014).
•Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
NN & Sentences
Phil Blunsom
Oxford
http://www.cs.ox.ac.uk/people/phil.blunsom/
26.
•Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
CNN for Modelling Sentences
•Supervision depends on the task
•Dynamic k-Max Pooling
•Folding
-Sums every two rows
•Multiple Feature Maps
-Different features
-Word order, n-gram
-Sentence Feature
-Many tasks
•Word vectors: learned during training
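The pooling and folding operations listed above can be sketched in plain Python. The dynamic k rule used here, k_l = max(k_top, ceil((L - l) / L * s)), follows the formula in the cited paper as best recalled. The function names and the list-of-rows layout of a feature map are assumptions for illustration.

```python
import math

def k_max_pooling(row, k):
    """Keep the k largest values of a sequence, preserving their original order."""
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in sorted(top)]

def dynamic_k(layer, total_layers, sentence_len, k_top):
    """Pooling size for a given conv layer: larger for early layers and long sentences."""
    return max(k_top, math.ceil((total_layers - layer) * sentence_len / total_layers))

def fold(feature_map):
    """Sum every two adjacent rows, halving the number of rows."""
    return [
        [a + b for a, b in zip(feature_map[i], feature_map[i + 1])]
        for i in range(0, len(feature_map) - 1, 2)
    ]

pooled = k_max_pooling([3, 1, 5, 2, 4], 3)              # keeps 3, 5, 4 in sentence order
k_first = dynamic_k(layer=1, total_layers=3, sentence_len=18, k_top=3)
folded = fold([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Unlike plain max pooling, k-max pooling keeps the relative order of the selected values, so some word-order information survives into the sentence feature.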
27.
•Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv (2014).
CNN for Sentence Classification
New: Multi-Channel
A model with two sets of word vectors. Each set of vectors is treated as a ‘channel’ and each filter is applied to both channels.
word2vec
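A minimal sketch of the multi-channel idea: the same filter slides over both channels and the two responses are added before the non-linearity. The ReLU activation, the toy vectors, and the names `static`, `tuned`, and `feature_map` are assumptions for illustration, not details taken from the slide.

```python
def feature_map(static, tuned, filt, bias=0.0):
    """Apply one filter (h rows of d weights) to both channels and add the results.

    static, tuned: one word vector per token, one list per channel.
    Returns one activation per window of h consecutive words.
    """
    h = len(filt)
    activations = []
    for start in range(len(static) - h + 1):
        total = bias
        for channel in (static, tuned):
            for offset in range(h):
                word = channel[start + offset]
                total += sum(w * x for w, x in zip(filt[offset], word))
        activations.append(max(0.0, total))  # ReLU (assumed non-linearity)
    return activations

# Hypothetical 3-word sentence with 2-dim vectors in each channel.
static = [[1, 0], [0, 1], [1, 1]]   # e.g. pre-trained vectors kept fixed
tuned = [[0, 1], [1, 0], [1, 1]]    # e.g. a copy that is fine-tuned during training
filt = [[1, 0], [0, 1]]             # a filter spanning h = 2 words

c = feature_map(static, tuned, filt)
sentence_feature = max(c)           # max-over-time pooling for this filter
```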
28.
•Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
Multilingual Models
Sentences
Documents
Word vectors: learned during training
Objective:
CVM: SUM or
29.
•Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Word2Vec Extension
Sentiment Analysis and Document Classification
30.
SENNA
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011)
Word embeddings
ConvNet
31.
•Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
•Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network." arXiv preprint arXiv:1406.3830 (2014).
NN & Documents
[Figure: visualizations for Layer 1, Layer 4, Layer 5]
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014
How about Text?
-Word salience? (word/entity/topic extraction)
-Sentence salience?