(Deep) Neural Network & Text Mining
Piji Li 
lipiji.pz@gmail.com

A Summary of (Deep) Neural Networks in NLP and Text Mining


A summary of papers on (deep) neural networks in NLP and text mining.


  1. (Deep) Neural Network & Text Mining. Piji Li, lipiji.pz@gmail.com
  2. Outline (9/3/2014)
     • Deep Learning
       - Story
     • DL for NLP & Text Mining
       - Words
       - Sentences
       - Documents
  3. Outline
     • Deep Learning
       - Story
     • DL for NLP & Text Mining
       - Words
       - Sentences
       - Documents
  4. BP (1986)
     D. E. Rumelhart, G. E. Hinton, R. J. Williams. "Learning representations by back-propagating errors." Nature 323 (1986), pp. 533–536.
     • Solved the learning problem
     • Biological system
     • …
     • Hard to train (non-convex, tricks)
     • Hard to do theoretical analysis
     • Small training sets
     • …
  5. DBN (2006)
     Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554.
     Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313, no. 5786 (2006): 504-507.
     • Unsupervised layer-wise pre-training
     • Supervised fine-tuning
     • Feature learning & distributed representations
     • Large-scale datasets
     • Tricks (learning rate, mini-batch size, momentum, etc.)
     • https://github.com/lipiji/PG_DEEP
  6. CD-DNN-HMM (2011)
     Dahl, George E., Dong Yu, Li Deng, and Alex Acero. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 1 (2012): 30-42.
     • Winner of the 2013 IEEE Signal Processing Society Best Paper Award
     • PhD candidate of Prof. Hinton, intern at MSR
     • GPU
  7. ImageNet Classification
     [Figure: ImageNet classification error by year (2012, 2013, 2014), error axis from 0 to 0.4; annotations: "Deeper", "Network in Network", "Deep DNN", "First Blood"]
     • http://image-net.org/
     • 1000 categories and 1.2 million training images
     • Li Fei-Fei: ImageNet Large Scale Visual Recognition Challenge, 2014
  8. Nowadays
     • People: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng
     • Industry: Google, Baidu, Facebook, IBM
     • Paper: ICCV, CVPR, ECCV; ICML, NIPS, AAAI, ICLR; ACL, EMNLP, COLING
     • Startup companies using (deep) machine learning:
  9. Outline
     • Deep Learning
       - Story
     • DL for NLP & Text Mining
       - Words
       - Sentences
       - Documents
  10. NN & Words
      • Distributed Representation
        - A Neural Probabilistic Language Model
        - Word2Vec
      • Application
        - Semantic Hierarchies of Words
        - Machine Translation
      • V(king) – V(queen) + V(woman) ≈ V(man)
  11. Neural Probabilistic Language Model
      • Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A Neural Probabilistic Language Model." Journal of Machine Learning Research 3 (2003): 1137-1155.
      [Figure: network diagram; the context word vectors are concatenated, with parameters (H, d), (U, b), (W, b) and an output layer of size |V|]
      • Maximizes the penalized log-likelihood
      • Softmax output
      • Training: BP
      • Better than the n-gram model: 1, word vectors; 2, no smoothing needed, p(w|context) ∈ (0, 1)
      • Time-consuming
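The forward pass on slide 11 can be sketched in a few lines. This is a minimal sketch, assuming the score form y = b + Wx + U tanh(d + Hx) from Bengio et al. (2003), where x is the concatenation of the context word vectors; the lookup table `C`, the toy sizes, and the random initialization are illustrative assumptions, not values from the slides.

```python
import math
import random

def nplm_forward(context_ids, C, H, d, U, W, b):
    """Return p(w | context) for every word w in the vocabulary."""
    # x: concatenation of the context words' feature vectors
    x = [value for i in context_ids for value in C[i]]
    # hidden layer: tanh(d + Hx)
    hidden = [math.tanh(d[j] + sum(H[j][k] * x[k] for k in range(len(x))))
              for j in range(len(d))]
    # output scores: y = b + Wx + U tanh(d + Hx)
    y = [b[i]
         + sum(W[i][k] * x[k] for k in range(len(x)))
         + sum(U[i][j] * hidden[j] for j in range(len(hidden)))
         for i in range(len(b))]
    # softmax over the whole vocabulary (the O(|V|) cost the slide calls time-consuming)
    top = max(y)
    exps = [math.exp(v - top) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
vocab, dim, n_context, n_hidden = 6, 3, 2, 4   # hypothetical toy sizes
C = rand_matrix(vocab, dim, rng)               # word feature vectors
H = rand_matrix(n_hidden, n_context * dim, rng)
d = [0.0] * n_hidden
U = rand_matrix(vocab, n_hidden, rng)
W = rand_matrix(vocab, n_context * dim, rng)
b = [0.0] * vocab
probs = nplm_forward([1, 4], C, H, d, U, W, b)
```

Because the softmax normalizes over all |V| words, every probability is strictly between 0 and 1 without any smoothing, which is the second advantage listed on the slide.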
  12. Word2Vec
      • Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector space." ICLR (2013).
        - Large improvements in accuracy, lower computational cost.
        - It takes less than a day to train on a 1.6 billion word dataset.
  13. Word2Vec - Hierarchical Softmax
      • Morin, Frederic, and Yoshua Bengio. "Hierarchical probabilistic neural network language model." In AISTATS, vol. 5, pp. 246-252. 2005.
      • Mnih, Andriy, and Geoffrey E. Hinton. "A scalable hierarchical distributed language model." In Advances in Neural Information Processing Systems, pp. 1081-1088. 2009.
      [Figure: binary tree with node counts 5, 3, 2, 2, 1, inner-node parameters θ1, θ2, and leaf words w1, w2, w3]
      • Hierarchical strategies: O(|V|) → O(log |V|)
      • Word2Vec: Huffman tree - short codes for frequent words
      • Time-consuming
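The "short codes for frequent words" point on slide 13 can be illustrated by building Huffman codes from word counts. This is a sketch, not the word2vec implementation; the vocabulary and counts are made up for illustration.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Map each word to a binary code; frequent words get shorter codes."""
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # merge the two least frequent subtrees under a new inner node
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in left.items()}
        merged.update({w: "1" + code for w, code in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

# hypothetical word counts
freqs = {"the": 50, "of": 30, "network": 10, "huffman": 6, "softmax": 4}
codes = huffman_codes(freqs)
```

The code length of a word is the number of binary decisions needed to reach its leaf, so a frequent word such as "the" costs one decision while a rare word costs several.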
  14. Word2vec - CBOW
      • Log-likelihood
      • Conditional probability
      • SGD
      • Hierarchical Softmax
      • Blog: http://blog.csdn.net/itplus/article/details/37969979
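The conditional probability on slide 14 can be sketched for a tiny tree. The formulas themselves are not preserved in the transcript, so this assumes the usual hierarchical-softmax form: the context vectors are averaged, and p(w | context) is a product of sigmoid decisions along the path from the root to w. The tree shape, the node names, and all vectors below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cbow_hs_probability(context_vectors, path, theta):
    """p(word | context) as a product of binary decisions along the word's tree path."""
    # CBOW projection: average the context word vectors
    dim = len(context_vectors[0])
    x = [sum(v[k] for v in context_vectors) / len(context_vectors) for k in range(dim)]
    p = 1.0
    for node, bit in path:
        s = sigmoid(dot(x, theta[node]))
        # assumed convention: bit 0 takes probability s, bit 1 takes 1 - s
        p *= s if bit == 0 else (1.0 - s)
    return p

# hypothetical inner-node parameters and Huffman paths for three words
theta = {"root": [0.5, -0.2], "inner": [-0.3, 0.8]}
paths = {
    "w1": [("root", 0)],
    "w2": [("root", 1), ("inner", 0)],
    "w3": [("root", 1), ("inner", 1)],
}
context = [[0.1, 0.4], [0.3, -0.2]]
probs = {w: cbow_hs_probability(context, path, theta) for w, path in paths.items()}
log_likelihood = math.log(probs["w2"])  # the per-example term that SGD would maximize
```

The leaf probabilities sum to 1 by construction, so no normalization over the vocabulary is needed.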
  15. Word2vec - Skip-gram
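Skip-gram predicts the surrounding words from the center word, so its training data is a list of (center, context) pairs. The sketch below only generates those pairs; the sentence and window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window):
    """List (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["deep", "learning", "for", "text"], window=1)
```

CBOW uses the same windows in the opposite direction: the context words are the input and the center word is the target.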
  16. CBOW vs Skip-gram
      • Semantic relation
      • Syntactic relation
  17. Word2Vec - Negative Sampling
      Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS (2013).
      • CBOW & Skip-gram
      • Negative sampling
      • Subsampling
      • Phrase Skip-Gram results
      • Blog: http://blog.csdn.net/itplus/article/details/37969979
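The formulas on slide 17 are not preserved in the transcript. The sketch below assumes the forms given in Mikolov et al. (2013): the negative-sampling objective log σ(u_o·v_c) + Σ_k log σ(−u_k·v_c), a noise distribution proportional to the unigram count raised to the power 3/4, and the subsampling rule P(discard w) = 1 − sqrt(t / f(w)). All vectors and counts are toy values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_sampling_loss(center, positive, negatives):
    """Negated objective for one (center, context) pair and k sampled noise words."""
    loss = -math.log(sigmoid(dot(positive, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss

def noise_distribution(counts, power=0.75):
    """Noise distribution proportional to count ** 0.75."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

def discard_probability(freq, t=1e-5):
    """Subsampling of frequent words: 1 - sqrt(t / f(w)), clipped at 0."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

good = negative_sampling_loss([1.0, 0.0], [2.0, 0.0], [[-2.0, 0.0]])
bad = negative_sampling_loss([1.0, 0.0], [-2.0, 0.0], [[2.0, 0.0]])
noise = noise_distribution({"the": 16, "cat": 1})
```

Each update touches only the positive word and k noise words instead of the whole vocabulary, which is where the speedup over a full softmax comes from.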
  18. Word2Vec Properties
      • Vector spaces
        - Mapping different vector spaces
        - Machine Translation
        - …
      • Embedding offsets
        - V(king) – V(queen) + V(woman) ≈ V(man)
        - Complicated semantic relations: hierarchies
        - …
      • ?
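The embedding-offset property can be checked with a nearest-neighbour search by cosine similarity. The three-dimensional vectors below are hand-made so that the offsets line up exactly; trained word2vec vectors only satisfy the relation approximately.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def analogy(vectors, a, b, c):
    """Return the word whose vector is closest to V(a) - V(b) + V(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # exclude the three query words, as is common practice
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# hypothetical toy embeddings with axes (royal, male, female)
vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.2, 0.1, 0.1],
}
answer = analogy(vectors, "king", "queen", "woman")
```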
  19. Word2Vec Properties
      • Vector spaces
        - Mapping different vector spaces
        - Machine Translation
        - …
      • Embedding offsets
        - V(king) – V(queen) + V(woman) ≈ V(man)
        - Complicated semantic relations: hierarchies
        - …
      • ?
  20. Machine Translation
      • Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among languages for machine translation." arXiv preprint arXiv:1309.4168 (2013).
        - Translation matrix
      [Figure: word vectors in English (left) and Spanish (right)]
      • Devlin, Jacob, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. "Fast and Robust Neural Network Joint Models for Statistical Machine Translation." ACL 2014 Best Long Paper.
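The translation matrix on slide 20 is a linear map W between two embedding spaces. A minimal sketch follows, assuming the least-squares objective Σ ||W x_i − z_i||² from Mikolov et al. (2013) and plain full-batch gradient descent; the 2-D "source" and "target" vectors are synthetic, generated from a known rotation so the result can be checked.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fit_translation_matrix(pairs, dim_src, dim_tgt, lr=0.05, epochs=500):
    """Minimize sum ||W x - z||^2 over (x, z) pairs by full-batch gradient descent."""
    W = [[0.0] * dim_src for _ in range(dim_tgt)]
    for _ in range(epochs):
        grad = [[0.0] * dim_src for _ in range(dim_tgt)]
        for x, z in pairs:
            err = [p - t for p, t in zip(matvec(W, x), z)]
            for i in range(dim_tgt):
                for j in range(dim_src):
                    grad[i][j] += 2.0 * err[i] * x[j]
        for i in range(dim_tgt):
            for j in range(dim_src):
                W[i][j] -= lr * grad[i][j]
    return W

# synthetic dictionary: target vectors are a 90-degree rotation of the source vectors
true_map = [[0.0, 1.0], [-1.0, 0.0]]
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
pairs = [(x, matvec(true_map, x)) for x in sources]
W = fit_translation_matrix(pairs, dim_src=2, dim_tgt=2)
translated = matvec(W, [3.0, 2.0])  # map an unseen source vector into the target space
```

At translation time the mapped vector is compared with the target-language word vectors, and the nearest one is taken as the translation.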
  21. Word2Vec Properties
      • Vector spaces
        - Mapping different vector spaces
        - Machine Translation
        - …
      • Embedding offsets
        - V(king) – V(queen) + V(woman) ≈ V(man)
        - Complicated semantic relations: hierarchies
        - …
      • ?
  22. Semantic Hierarchies
      • Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. "Learning semantic hierarchies via word embeddings." ACL, 2014.
      • Hypernym–hyponym
      • More relations?
      • Clustering y - x, and then for each cluster:
  23. Relation Classification
      • Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. "Relation Classification via Convolutional Deep Neural Network." COLING 2014 Best Paper.
      • Cause-Effect relationship: "The [fire]e1 inside WTC was caused by exploding [fuel]e2"
      • Steps: 1, word vectors; 2, lexical-level features; 3, sentence features via CNN; 4, concatenate the features and feed them into a softmax classifier
      • SENNA
  24. NN & Sentences
      • Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
      • Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
      • Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013).
      • Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014.
      • Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
      • Phil Blunsom, Oxford: http://www.cs.ox.ac.uk/people/phil.blunsom/
  25. Convolutional Neural Networks
      • Best results in computer vision (LeNet5, ImageNet 2012~now)
      • Shared weights: convolutional filter, feature map
      • Invariance: pooling (mean-, max-, stochastic)
      • http://yann.lecun.com/exdb/lenet/
      • C1: (5*5+1)*6 = 156 parameters; 156*(28*28) = 122,304 connections
      • Convolutional windows in sentences
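The C1 arithmetic on slide 25 follows from weight sharing: each of the 6 feature maps has one 5×5 filter plus a bias, and that filter is reused at all 28×28 output positions. The sketch below reproduces the two numbers and adds a 1-D pooling helper; the function names are illustrative, and the 32×32 input size is the standard LeNet-5 input rather than something stated on the slide.

```python
def conv_layer_counts(kernel, in_maps, out_maps, out_size):
    """Trainable parameters and connections of a weight-sharing convolutional layer."""
    params = (kernel * kernel * in_maps + 1) * out_maps  # +1 bias per feature map
    connections = params * out_size * out_size           # each filter reused at every position
    return params, connections

def pool(values, size, reducer):
    """Non-overlapping 1-D pooling with the given reducer (e.g. max or a mean function)."""
    return [reducer(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def mean(xs):
    return sum(xs) / len(xs)

out_size = 32 - 5 + 1  # assumed 32x32 input, 5x5 filter, no padding
params, connections = conv_layer_counts(kernel=5, in_maps=1, out_maps=6, out_size=out_size)
max_pooled = pool([1, 3, 2, 8], 2, max)
mean_pooled = pool([1, 3, 2, 8], 2, mean)
```

Without weight sharing, the same 122,304 connections would each need their own parameter, which is the contrast the slide draws.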
  26. CNN for Modelling Sentences
      • Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
      • Supervision depends on the task
      • Dynamic k-max pooling
      • Folding
        - Sums every two rows
      • Multiple feature maps
        - Different features
        - Word order, n-gram
        - Sentence feature
        - Many tasks
      • Word vectors: learned
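Two operations named on slide 26 are easy to show on plain lists: k-max pooling keeps the k largest activations in their original order, and folding sums every two adjacent rows. The dynamic choice of k is assumed here to follow the rule in Kalchbrenner et al. (2014), k_l = max(k_top, ceil((L − l) / L · s)), for layer l of L and sentence length s; the input values are arbitrary.

```python
import math

def k_max_pooling(seq, k):
    """Keep the k largest values of seq, preserving their original order."""
    top = sorted(range(len(seq)), key=lambda i: seq[i], reverse=True)[:k]
    return [seq[i] for i in sorted(top)]

def dynamic_k(layer, total_layers, sent_len, k_top):
    """Pooling size for a given layer; shrinks toward k_top at the top layer."""
    return max(k_top, math.ceil((total_layers - layer) * sent_len / total_layers))

def fold(rows):
    """Sum every two adjacent rows component-wise, halving the number of rows."""
    return [[a + b for a, b in zip(rows[i], rows[i + 1])]
            for i in range(0, len(rows) - 1, 2)]

pooled = k_max_pooling([5, 1, 9, 2, 7], 3)
ks = [dynamic_k(layer, 3, 18, 3) for layer in (1, 2, 3)]
folded = fold([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Because the order of the selected values is preserved, the pooled sequence still carries word-order information, unlike a plain max over the sentence.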
  27. CNN for Sentence Classification
      • Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014.
      • New: multi-channel. A model with two sets of word vectors. Each set of vectors is treated as a 'channel' and each filter is applied to both channels.
      • word2vec
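The multi-channel idea on slide 27 can be sketched with one filter that slides over windows of words in both channels, with the two responses added before the nonlinearity. This is a simplified sketch: the ReLU and the max-over-time pooling are assumptions taken from the usual description of the model, and the word vectors and filter weights are made-up numbers.

```python
def multichannel_feature_map(channels, filt, bias):
    """Apply one filter to every channel over each window of words and add the results."""
    width = len(filt)
    n_words = len(channels[0])
    feats = []
    for start in range(n_words - width + 1):
        total = bias
        for sentence in channels:            # same filter weights for each channel
            for offset in range(width):
                word_vec = sentence[start + offset]
                total += sum(w * x for w, x in zip(filt[offset], word_vec))
        feats.append(max(0.0, total))        # ReLU (assumed nonlinearity)
    return feats

# two channels for the same 3-word sentence, e.g. fixed and fine-tuned vectors (toy values)
static = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tuned = [[0.5, 0.0], [0.0, 0.5], [1.0, 0.0]]
filt = [[1.0, -1.0], [0.5, 0.5]]             # window of 2 words, 2-D word vectors
feature_map = multichannel_feature_map([static, tuned], filt, bias=0.0)
sentence_feature = max(feature_map)          # max-over-time pooling
```

One such pooled value is produced per filter, and the values from all filters together form the sentence representation fed to the classifier.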
  28. Multilingual Models
      • Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
      • Sentences, documents
      • Word vectors: learned
      • Objective:
      • CVM: SUM or
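The objective on slide 28 is not preserved in the transcript. The sketch below shows the SUM composition named on the slide together with one plausible training signal, a margin-based hinge loss that pulls a sentence toward its translation and away from a random noise sentence. The loss form, the margin, and every vector here are assumptions for illustration.

```python
def compose_sum(word_vectors):
    """SUM composition: the sentence vector is the sum of its word vectors."""
    dim = len(word_vectors[0])
    return [sum(v[k] for v in word_vectors) for k in range(dim)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hinge_loss(source, target, noise, margin=1.0):
    """Zero once the translation is closer than the noise sentence by at least `margin`."""
    return max(0.0, margin + sq_dist(source, target) - sq_dist(source, noise))

english = compose_sum([[1.0, 0.0], [0.0, 1.0]])
german = compose_sum([[0.5, 0.5], [0.5, 0.5]])   # hypothetical parallel sentence
random_sentence = compose_sum([[3.0, 0.0], [0.0, -2.0]])
loss_good = hinge_loss(english, german, random_sentence)
loss_bad = hinge_loss(english, random_sentence, german)
```

Because the loss only compares sentence-level vectors, no word alignment between the two languages is required.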
  29. Word2Vec Extension
      • Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
      • Sentiment analysis and document classification
  30. SENNA
      Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011).
      • Word embeddings
      • ConvNet
  31. NN & Documents
      • Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
      • Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network." arXiv preprint arXiv:1406.3830 (2014).
      [Figure: feature visualizations for Layer 1, Layer 4, and Layer 5. Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014.]
      • How about text?
        - Word salience? Word/entity/topic extraction
        - Sentence salience?
  32. Web Search | Learning to Rank
      • Huang, Po-Sen, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. "Learning deep structured semantic models for web search using clickthrough data." CIKM, 2013.
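The model cited on slide 32 represents queries and documents with letter-trigram "word hashing" before a deep network and ranks documents by cosine similarity. The sketch below covers only the hashing and the cosine step, with the network left out entirely; the query and documents are invented examples.

```python
import math
from collections import Counter

def letter_trigrams(word):
    """Letter trigrams of a word with '#' boundary markers, e.g. good -> #go, goo, ood, od#."""
    marked = "#" + word + "#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def hash_text(text):
    """Bag of letter trigrams over all words of the text."""
    bag = Counter()
    for word in text.lower().split():
        bag.update(letter_trigrams(word))
    return bag

def cosine(a, b):
    num = sum(a[k] * b[k] for k in a if k in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

query = hash_text("deep learning")
docs = ["deep learning tutorial", "cooking recipes"]
ranked = sorted(docs, key=lambda d: cosine(query, hash_text(d)), reverse=True)
```

Hashing into trigrams keeps the input dimension far smaller than a word-level vocabulary and lets unseen or misspelled words share features with known ones.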
  33. NN & Topic Model
      • Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS. 2012.
      • Larochelle, Hugo, and Stanislas Lauly. "A neural autoregressive topic model." NIPS. 2012.
  34. Hybrid NN & Topic Model
      • Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS. 2012.
  35. Tuning
      • #layers, #hidden units
      • Learning rate
      • Momentum
      • Weight decay
      • Mini-batch size
      • …
      TO BE A GOOD TUNER
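Four of the knobs on slide 35 (learning rate, momentum, weight decay, mini-batch size) appear together in a single SGD update. The sketch below uses one common formulation, v ← μv − lr·(g + λw), w ← w + v, on a toy one-parameter regression; the data and all hyperparameter values are illustrative assumptions, not recommendations.

```python
import random

def sgd_momentum_step(w, v, grad, lr, momentum, weight_decay):
    """One update: weight decay adds lambda * w to the gradient, momentum smooths the step."""
    v_new = [momentum * vi - lr * (gi + weight_decay * wi)
             for wi, vi, gi in zip(w, v, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new

def minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches covering the data once."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]

# toy data: y = 3x, fitted with a single weight
data = [(i / 8.0, 3.0 * i / 8.0) for i in range(1, 9)]
rng = random.Random(0)
w, v = [0.0], [0.0]
for _ in range(300):                                  # epochs
    for batch in minibatches(data, batch_size=4, rng=rng):
        # gradient of the mean squared error over the mini-batch
        grad = [sum(2.0 * (w[0] * x - y) * x for x, y in batch) / len(batch)]
        w, v = sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9, weight_decay=1e-4)
```

With weight decay switched on, the fitted weight settles slightly below 3, which shows how the regularizer trades a little training error for smaller weights.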
  36. Thanks! Q&A
