(Deep) Neural Networks for NLP and Text Mining: A Summary

(Deep) Neural Networks & Text Mining
Piji Li
lipiji.pz@gmail.com
Outline
•Deep Learning
-Story
•DL for NLP & Text Mining
-Words
-Sentences
-Documents
BP (1986)
D.E. Rumelhart, G.E. Hinton, R.J. Williams. "Learning representations by back-propagating errors." Nature, 323 (1986), pp. 533–536.
Promise: solved the learning problem for multi-layer networks; loosely inspired by biological systems; …
Problems: hard to train (non-convex, needs many tricks); hard to do theoretical analysis; small training sets at the time...
DBN (2006)
Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313, no. 5786 (2006): 504-507.
•Unsupervised layer-wise pre-training
•Supervised fine-tuning
•Feature learning & distributed representations
•Large-scale datasets
•Tricks (learning rate, mini-batch size, momentum, etc.)
•https://github.com/lipiji/PG_DEEP
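A minimal sketch of this recipe (not the PG_DEEP code): greedy unsupervised layer-wise pre-training with RBMs, then a supervised classifier on top. In a full DBN the pre-trained weights would initialize a deep network that is fine-tuned end-to-end with backprop; that step is omitted in this toy example.

# Sketch only: layer-wise RBM pre-training, then a supervised model on the learned features.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((500, 64))          # toy data scaled to [0, 1]
y = rng.integers(0, 2, size=500)   # toy binary labels

# Greedy layer-wise pre-training: each RBM models the activities of the layer below.
rbm1 = BernoulliRBM(n_components=128, learning_rate=0.05, batch_size=32, n_iter=10, random_state=0)
h1 = rbm1.fit_transform(X)
rbm2 = BernoulliRBM(n_components=64, learning_rate=0.05, batch_size=32, n_iter=10, random_state=0)
h2 = rbm2.fit_transform(h1)

# Supervised stage: here only the top classifier is trained on the pre-trained features.
clf = LogisticRegression(max_iter=1000).fit(h2, y)
print("train accuracy:", clf.score(h2, y))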
CD-DNN-HMM (2011)
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 1 (2012): 30-42.
Winner of the 2013 IEEE Signal Processing Society Best Paper Award.
First author: a PhD candidate of Prof. Hinton, intern at MSR; trained on GPUs.
ImageNet Classification
[Chart: ImageNet classification error, 2012-2014 - the "first blood" DNN in 2012, then deep and deeper models (Network in Network), with the error dropping each year.]
http://image-net.org/
•1000 categories and 1.2 million training images
Li Fei-Fei: ImageNet Large Scale Visual Recognition Challenge, 2014
Nowadays
People: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng
Industry: Google, Baidu, Facebook, IBM
Papers: ICCV, CVPR, ECCV; ICML, NIPS, AAAI, ICLR; ACL, EMNLP, COLING
Startup companies using (deep) machine learning.
Outline
•Deep Learning
-Story
•DL for NLP & Text Mining
-Words
-Sentences
-Documents
NN & Words
•Distributed Representation
-A Neural Probabilistic Language Model
-Word2Vec
•Applications
-Semantic Hierarchies of Words
-Machine Translation
V(king) – V(queen) + V(woman) ≈ V(man)
Neural Probabilistic Language Model
•Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A Neural Probabilistic Language Model." Journal of Machine Learning Research 3 (2003): 1137-1155.
[Architecture: the embeddings of the previous n-1 words are concatenated into x; a tanh hidden layer (H, d) plus direct connections (W, b) feed an output layer (U, b) of size |V|.]
-Maximizes the penalized log-likelihood: L = (1/T) Σ_t log f(w_t, w_{t-1}, …, w_{t-n+1}; θ) + R(θ)
-Softmax output: P(w_t | w_{t-1}, …, w_{t-n+1}) = exp(y_{w_t}) / Σ_i exp(y_i)
-where y = b + Wx + U tanh(d + Hx)
-Training: BP
Better than an n-gram model:
1. Gives word vectors;
2. No smoothing needed, since p(w | context) ∈ (0, 1)
But: time-consuming
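A toy numpy sketch of the forward pass just described (random parameters and made-up sizes; it only illustrates the shapes and the softmax):

# y = b + W x + U tanh(d + H x), followed by a softmax over the vocabulary.
import numpy as np

rng = np.random.default_rng(0)
V, m, n, h = 1000, 50, 4, 60                      # vocab size, embedding dim, context length, hidden units

C = 0.01 * rng.standard_normal((V, m))            # word embedding matrix (the "word vectors")
H = 0.01 * rng.standard_normal((h, (n - 1) * m))  # input -> hidden
d = np.zeros(h)
U = 0.01 * rng.standard_normal((V, h))            # hidden -> output
W = 0.01 * rng.standard_normal((V, (n - 1) * m))  # direct input -> output connections
b = np.zeros(V)

context = [12, 7, 404]                            # indices of the previous n-1 words
x = np.concatenate([C[i] for i in context])       # concatenated context embeddings
y = b + W @ x + U @ np.tanh(d + H @ x)            # one unnormalized score per word
p = np.exp(y - y.max()); p /= p.sum()             # softmax: p(w | context), sums to 1
print(p.shape, p.sum())                           # (1000,) 1.0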
Word2Vec
•Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector space." ICLR (2013).
-Large improvements in accuracy at much lower computational cost.
-It takes less than a day to train on a 1.6-billion-word dataset.
Word2Vec - Hierarchical Softmax
•Morin, Frederic, and Yoshua Bengio. "Hierarchical probabilistic neural network language model." In AISTATS, vol. 5, pp. 246-252. 2005.
•Mnih, Andriy, and Geoffrey E. Hinton. "A scalable hierarchical distributed language model." In Advances in Neural Information Processing Systems, pp. 1081-1088. 2009.
[Figure: a toy Huffman tree with word frequencies at the leaves (w1, w2, w3) and parameter vectors θ1, θ2 at the internal nodes.]
•Hierarchical strategies: the flat softmax is time-consuming; a tree-structured output reduces the cost per prediction from O(|V|) to O(log |V|)
•Word2Vec: a Huffman tree assigns short codes to frequent words
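A small sketch of the idea: p(w | h) becomes a product of binary sigmoid decisions along w's Huffman path, so only O(log |V|) inner products are needed. The tree, codes, and parameters below are toy assumptions, not word2vec's actual data structures.

import numpy as np

rng = np.random.default_rng(0)
dim = 50
h = rng.standard_normal(dim)                          # hidden/context vector

# Toy Huffman structure: each word lists (internal node id, code bit) along its path.
path = {"w1": [(0, 1), (1, 0)],                       # frequent words get short codes
        "w2": [(0, 1), (1, 1)],
        "w3": [(0, 0)]}
theta = {node: rng.standard_normal(dim) for node in (0, 1)}  # one vector per internal node

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_word(word):
    prob = 1.0
    for node, bit in path[word]:
        s = theta[node] @ h
        prob *= sigmoid(s) if bit == 1 else sigmoid(-s)   # one binary decision per node
    return prob

print({w: round(p_word(w), 4) for w in path})
print(sum(p_word(w) for w in path))                   # -> 1.0: a proper distribution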
Word2Vec - CBOW
•Log-likelihood: L = Σ_w log p(w | Context(w))
•Conditional probability: p(w | Context(w)) is computed with hierarchical softmax over the Huffman path of w
•Training: SGD
Blog: http://blog.csdn.net/itplus/article/details/37969979
Word2Vec - Skip-gram
Skip-gram reverses CBOW: the current word is used to predict its surrounding context words.
CBOW vs Skip-gram
[Results table: accuracy of the two models on semantic relation and syntactic relation questions.]
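For a hands-on comparison, both architectures can be trained with gensim's Word2Vec (a third-party library, not part of the original toolkit; parameter names follow gensim 4.x, and the corpus here is a made-up toy):

# sg=0 selects CBOW, sg=1 selects skip-gram; hs=1 enables hierarchical softmax
# (negative=0 turns negative sampling off).
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["a", "man", "and", "a", "woman", "walk"]]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, hs=1, negative=0, epochs=50)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, hs=1, negative=0, epochs=50)

print(cbow.wv.most_similar("king", topn=3))
print(skipgram.wv.most_similar("king", topn=3))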
Word2Vec - Negative Sampling
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS (2013).
Applies to both CBOW & Skip-gram.
Blog: http://blog.csdn.net/itplus/article/details/37969979
Phrase skip-gram results
Negative sampling objective: log σ(v'_{w_O}·v_{w_I}) + Σ_{i=1..k} E_{w_i~P_n(w)} [log σ(−v'_{w_i}·v_{w_I})]
Subsampling of frequent words: each word w_i is discarded with probability P(w_i) = 1 − sqrt(t / f(w_i))
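A numpy sketch of one skip-gram update with negative sampling, maximizing the objective above; the noise samples, learning rate, and vector sizes are arbitrary toy choices (the paper draws negatives from the unigram distribution raised to the 3/4 power).

import numpy as np

rng = np.random.default_rng(0)
V, dim, lr, K = 100, 20, 0.05, 5
W_in = 0.01 * rng.standard_normal((V, dim))     # "input" (center-word) vectors
W_out = 0.01 * rng.standard_normal((V, dim))    # "output" (context-word) vectors
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

center, context = 3, 17                          # one observed (center, context) pair
negatives = rng.integers(0, V, size=K)           # toy noise samples

v_c = W_in[center]
grad_c = np.zeros(dim)
for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
    score = sigmoid(W_out[idx] @ v_c)
    g = score - label                            # gradient of the logistic loss
    grad_c += g * W_out[idx]
    W_out[idx] -= lr * g * v_c                   # update the output vector
W_in[center] -= lr * grad_c                      # update the center-word vector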
Word2Vec Properties
•Vector spaces
-Mapping between different vector spaces
-Machine Translation
-…
•Embedding offsets
-V(king) – V(queen) + V(woman) ≈ V(man)
-Complicated semantic relations: hierarchies
-…
•?
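The offset property can be checked with nothing more than cosine similarity; the tiny 3-d vectors below are invented purely to illustrate the lookup.

# Nearest neighbour of V(king) - V(queen) + V(woman) by cosine similarity.
import numpy as np

emb = {"king":  np.array([0.8, 0.7, 0.1]),
       "queen": np.array([0.8, 0.1, 0.7]),
       "man":   np.array([0.3, 0.9, 0.1]),
       "woman": np.array([0.3, 0.3, 0.7]),
       "apple": np.array([0.1, 0.1, 0.1])}

target = emb["king"] - emb["queen"] + emb["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

ranked = sorted((w for w in emb if w not in {"king", "queen", "woman"}),
                key=lambda w: cosine(emb[w], target), reverse=True)
print(ranked[0])   # -> "man" with these toy vectors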
Machine Translation
•Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among languages for machine translation." arXiv preprint arXiv:1309.4168 (2013).
Translation matrix: a linear mapping learned between the English (left) and Spanish (right) embedding spaces.
•Devlin, Jacob, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. "Fast and Robust Neural Network Joint Models for Statistical Machine Translation." ACL 2014 Best Long Paper.
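A sketch of the translation-matrix idea: given seed pairs of source- and target-language embeddings, learn a linear map from the source space to the target space by least squares, then translate a new word by the nearest target-language vector to its mapped embedding. The data here are random placeholders.

import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n_pairs = 100, 80, 500
X = rng.standard_normal((n_pairs, d_src))   # source-language embeddings (e.g. English)
Z = rng.standard_normal((n_pairs, d_tgt))   # target-language embeddings (e.g. Spanish)

# Least-squares solution of min_W ||X W - Z||_F^2 (W maps source -> target space).
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

x_new = rng.standard_normal(d_src)          # embedding of an unseen source word
z_hat = x_new @ W                           # its predicted position in the target space
# translation = nearest target-language word vector to z_hat (e.g. by cosine distance)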
Semantic Hierarchies
•Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. "Learning semantic hierarchies via word embeddings." ACL, 2014.
hypernym–hyponym relations - can offsets capture more relations than the simple analogies above?
Approach: cluster the offsets y - x, and then learn a separate linear projection for each cluster.
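My reading of that piecewise strategy as a sketch (random placeholder data): k-means on the offsets y − x of hypernym–hyponym pairs, then one least-squares projection per cluster.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dim, n_pairs, k = 50, 300, 4
X = rng.standard_normal((n_pairs, dim))      # hyponym embeddings x
Y = rng.standard_normal((n_pairs, dim))      # hypernym embeddings y

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y - X)

projections = {}
for c in range(k):
    idx = labels == c
    # per-cluster projection Phi_c with X[idx] @ Phi_c ~= Y[idx]
    Phi, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
    projections[c] = Phi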
Relation Classification
•Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. "Relation Classification via Convolutional Deep Neural Network." COLING 2014 Best Paper.
Cause-Effect relationship: "The [fire]e1 inside WTC was caused by exploding [fuel]e2"
Steps: 1) word vectors (SENNA embeddings); 2) lexical-level features; 3) sentence-level features via a CNN; 4) concatenate the features and feed them into a softmax classifier.
NN & Sentences
•Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
•Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
•Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013).
•Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014.
•Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Phil Blunsom, Oxford: http://www.cs.ox.ac.uk/people/phil.blunsom/
Convolutional Neural Networks
•Best results in computer vision (LeNet-5; ImageNet 2012 to now)
•Shared weights: each convolutional filter produces a feature map
•Invariance: pooling (mean-, max-, stochastic)
http://yann.lecun.com/exdb/lenet/
C1 layer of LeNet-5: (5*5+1)*6 = 156 parameters; 156*(28*28) = 122,304 connections
Convolutional windows over sentences (see the sketch below)
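To make "convolutional windows over sentences" concrete, here is a toy 1-D convolution: one filter slides over every window of three consecutive word embeddings and produces a feature map, which is then max-pooled; all values are random placeholders.

import numpy as np

rng = np.random.default_rng(0)
sent_len, dim, width = 7, 5, 3                        # 7 words, 5-d embeddings, trigram filter
S = rng.standard_normal((sent_len, dim))              # sentence matrix: one row per word
F = rng.standard_normal((width, dim))                 # one convolutional filter (shared weights)

# Each feature-map entry is the filter applied to one window of 3 consecutive words.
feature_map = np.array([np.sum(S[i:i + width] * F)    # narrow convolution
                        for i in range(sent_len - width + 1)])
pooled = feature_map.max()                            # max pooling -> one feature per filter
print(feature_map.shape, pooled)                      # (5,) ...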
CNN for Modelling Sentences
•Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
•Supervision depends on the task
•Dynamic k-max pooling
•Folding
-Sums every two rows
•Multiple feature maps
-Different features
-Word order, n-grams
-Sentence features
-Many tasks
•Word vectors: learned during training
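A sketch of k-max pooling: keep the k largest values of a feature map in their original order; the dynamic variant picks k from the sentence length and the layer depth (the formula below is my reading of the paper).

import math
import numpy as np

def kmax_pooling(row, k):
    """Keep the k largest values of a 1-D feature map, in their original order."""
    idx = np.sort(np.argsort(row)[-k:])        # indices of the k largest, re-sorted by position
    return row[idx]

def dynamic_k(sent_len, layer, total_layers, k_top):
    """k for a given conv layer: shrinks with depth but never drops below k_top."""
    return max(k_top, math.ceil((total_layers - layer) / total_layers * sent_len))

row = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7])
print(kmax_pooling(row, k=3))                               # [0.9 0.8 0.7]
print(dynamic_k(sent_len=18, layer=1, total_layers=3, k_top=4))   # 12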
CNN for Sentence Classification
•Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014.
New: multi-channel. A model with two sets of word vectors (both initialized from word2vec); each set of vectors is treated as a 'channel', and each filter is applied to both channels.
Multilingual Models
•Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
Composes representations for sentences and documents; word vectors are learned.
Objective: representations of aligned (parallel) sentence pairs should be close, trained with a noise-contrastive margin loss.
CVM (composition vector model): SUM of the word vectors, or a non-linear bigram composition.
Word2Vec Extension
•Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Applications: sentiment analysis and document classification.
SENNA
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011).
Word embeddings + a ConvNet over the sentence.
NN & Documents
•Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
•Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network." arXiv preprint arXiv:1406.3830 (2014).
[Figure: feature visualizations of layers 1, 4, and 5, from Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014.]
How about text?
-Word salience? - word/entity/topic extraction
-Sentence salience?
Web Search | Learning to Rank
•Huang, Po-Sen, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. "Learning deep structured semantic models for web search using clickthrough data." CIKM, 2013.
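The scoring step at the heart of this model family: query and documents are mapped into a shared semantic space, scored by cosine similarity, and the clicked document competes with sampled unclicked ones through a smoothed softmax. The "semantic vectors" below are random stand-ins for the network outputs, and γ is the smoothing factor.

import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, gamma = 128, 5, 10.0
q = rng.standard_normal(dim)                 # semantic vector of the query
D = rng.standard_normal((n_docs, dim))       # semantic vectors of candidate documents
                                             # (index 0 = the clicked document)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(q, d) for d in D])
probs = np.exp(gamma * scores) / np.exp(gamma * scores).sum()
loss = -np.log(probs[0])                     # maximize likelihood of the clicked document
print(probs.round(3), float(loss))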
NN & Topic Model
•Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS, 2012.
•Larochelle, Hugo, and Stanislas Lauly. "A neural autoregressive topic model." NIPS, 2012.
Hybrid NN & Topic Model
•Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS, 2012.
Tuning
•#layers, #hidden units
•Learning rate
•Momentum
•Weight decay
•Mini-batch size
•…
TO BE A GOOD TUNER
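How most of these knobs enter a single training step (a generic sketch, not any particular framework): the mini-batch size controls how much data each gradient sees, while learning rate, momentum, and weight decay shape the update.

import numpy as np

rng = np.random.default_rng(0)
lr, momentum, weight_decay, batch_size = 0.01, 0.9, 1e-4, 32

W = 0.01 * rng.standard_normal((100, 10))    # some layer's weights (#hidden units = 10 here)
velocity = np.zeros_like(W)

def grad_on_batch(W, batch):
    """Placeholder for backprop on one mini-batch; returns a random 'gradient' here."""
    return rng.standard_normal(W.shape) / len(batch)

data = np.arange(10_000)
for start in range(0, len(data), batch_size):
    batch = data[start:start + batch_size]
    g = grad_on_batch(W, batch) + weight_decay * W      # L2 weight decay
    velocity = momentum * velocity - lr * g             # momentum accumulates past gradients
    W += velocity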
Thanks! 
Q&A