SlideShare a Scribd company logo
Hendrik Heuer

Stockholm NLP Meetup
word2vec 

From theory to practice
About me
Hendrik Heuer
hendrikheuer@gmail.com
http://hen-drik.de
@hen_drik
–J. R. Firth 1957
“You shall know a word 

by the company it keeps”
–J. R. Firth 1957
“You shall know a word 

by the company it keeps”
Quoted after Socher
Quoted after Socher
Quoted after Socher
Vectors are directions in space
Vectors can encode relationships
man is to woman as king is to ?
word2vec
• by Mikolov, Sutskever, Chen, 

Corrado and Dean at Google
• NAACL 2013
• takes a text corpus as input 

and produces the 

word vectors as output
Sweden
Similar words
Harvard
Similar words
word2vec
• word meaning and relationships
between words are encoded spatially
• two main learning algorithms in
word2vec: 

continuous bag-of-words and 

continuous skip-gram
Goal
continuous
bag-of-words
• predicting the current
word based on the
context
• order of words in the
history does not
influence the
projection
• faster & more
appropriate for
larger corpora
Mikolov et al.
continuous
skip-gram
• maximize
classification of a
word based on
another word in the
same sentence
• better word vectors
for frequent words,
but slower to train
Mikolov et al.
Why it is awesome
• there is a fast open-source implementation
• can be used as features 

for natural language processing tasks and
machine learning algorithms
Machine Translation
Quoted after Mikolov
Sentiment Analysis
Quoted after SocherRecursive Neural Tensor Network
Image Descriptions
Quoted after Vinyals
Using word2vec
• Original: http://word2vec.googlecode.com/svn/trunk/
• C++11 version: https://github.com/jdeng/word2vec
• Python: http://radimrehurek.com/gensim/models/
word2vec.html
• Java: https://github.com/ansjsun/word2vec_java
• Parallel java: https://github.com/siegfang/word2vec
• CUDAversion: https://github.com/whatupbiatch/cuda-word2vec
Quoted after Wang
Using it in Python
Usage
Training a model
Quoted after Řehůřek
Training a model with iterator
Quoted after Řehůřek
Doing it in C
• Download the code: 

git clone https://github.com/h10r/word2vec-macosx-maverics.git
!
• Run 'make' to compile word2vec tool
!
• Run the demo scripts: 



./demo-word.sh 



and 



./demo-phrases.sh
Doing it in C
• Download the code: 

git clone https://github.com/h10r/word2vec-macosx-maverics.git
!
• Run 'make' to compile word2vec tool
!
• Run the demo scripts: 



./demo-word.sh 



and 



./demo-phrases.sh
Testing a model
Quoted after Řehůřek
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
Should be in Macports
py27-scikit-learn @0.15.2 (python, science)
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
word2vec 

From theory to practice
Hendrik Heuer

Stockholm NLP Meetup
!
Discussion:
Can anybody here think of ways
this might help her or him?
Further Reading
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey
Dean. Efficient Estimation of Word Representations in
Vector Space. In Proceedings of Workshop at ICLR, 2013.
• Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado,
and Jeffrey Dean. Distributed Representations of Words
and Phrases and their Compositionality. In Proceedings of
NIPS, 2013.
• Tomas Mikolov, Wen-tau Yih,and Geoffrey Zweig.
Linguistic Regularities in Continuous Space Word
Representations. In Proceedings of NAACL HLT, 2013.
Further Reading
• Richard Socher - Deep Learning for NLP (without Magic)

http://lxmls.it.pt/2014/?page_id=5
• Wang - Introduction to Word2vec and its application to find
predominant word senses

http://compling.hss.ntu.edu.sg/courses/hg7017/pdf/
word2vec%20and%20its%20application%20to%20wsd.pdf
• Exploiting Similarities among Languages for Machine
Translation, Tomas Mikolov, Quoc V. Le, Ilya Sutskever,
http://arxiv.org/abs/1309.4168
• Title Image by Hans Arp
Further Coding
• word2vec

https://code.google.com/p/word2vec/
• word2vec for MacOSX Maverics

https://github.com/h10r/word2vec-macosx-maverics
• Gensim Python Library

https://radimrehurek.com/gensim/index.html
• Gensim Tutorials

https://radimrehurek.com/gensim/tutorial.html
• Scikit-Learn TSNE

http://scikit-learn.org/stable/modules/generated/
sklearn.manifold.TSNE.html
Hendrik Heuer

Stockholm NLP Meetup
word2vec 

From theory to practice

More Related Content

What's hot

Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
Christian Perone
 
공간정보연구원 PostGIS 강의교재
공간정보연구원 PostGIS 강의교재공간정보연구원 PostGIS 강의교재
공간정보연구원 PostGIS 강의교재
JungHwan Yun
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
Leiden University
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learning
Roberto Pereira Silveira
 
Word embedding
Word embedding Word embedding
Word embedding
ShivaniChoudhary74
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Deep Learning Italia
 
지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기
지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기
지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기
Byeong-Hyeok Yu
 
The Duet model
The Duet modelThe Duet model
The Duet model
Bhaskar Mitra
 
공간정보거점대학 1.geo server_고급과정
공간정보거점대학 1.geo server_고급과정공간정보거점대학 1.geo server_고급과정
공간정보거점대학 1.geo server_고급과정
BJ Jang
 
LX 공간정보아카데미 PostGIS 강의자료
LX 공간정보아카데미 PostGIS 강의자료LX 공간정보아카데미 PostGIS 강의자료
LX 공간정보아카데미 PostGIS 강의자료
JungHwan Yun
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and Terminology
Steven Miller
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and Induction
Leon Derczynski
 
지리정보체계(GIS) - [2] 좌표계 이해하기
지리정보체계(GIS) - [2] 좌표계 이해하기지리정보체계(GIS) - [2] 좌표계 이해하기
지리정보체계(GIS) - [2] 좌표계 이해하기
Byeong-Hyeok Yu
 
[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우
NAVER D2
 
Ontology Building and its Application using Hozo
Ontology Building and its Application using HozoOntology Building and its Application using Hozo
Ontology Building and its Application using Hozo
Kouji Kozaki
 
Cross language information retrieval (clir)slide
Cross language information retrieval (clir)slideCross language information retrieval (clir)slide
Cross language information retrieval (clir)slide
Mohd Iqbal Al-farabi
 
Ontology development in protégé-آنتولوژی در پروتوغه
Ontology development in protégé-آنتولوژی در پروتوغهOntology development in protégé-آنتولوژی در پروتوغه
Ontology development in protégé-آنتولوژی در پروتوغه
sadegh salehi
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
Hady Elsahar
 
오픈소스 GIS 교육 - PostGIS
오픈소스 GIS 교육 - PostGIS오픈소스 GIS 교육 - PostGIS
오픈소스 GIS 교육 - PostGIS
JungHwan Yun
 
Image super-resolution via iterative refinement.pptx
Image super-resolution via iterative refinement.pptxImage super-resolution via iterative refinement.pptx
Image super-resolution via iterative refinement.pptx
xiaoyiWang32
 

What's hot (20)

Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
공간정보연구원 PostGIS 강의교재
공간정보연구원 PostGIS 강의교재공간정보연구원 PostGIS 강의교재
공간정보연구원 PostGIS 강의교재
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learning
 
Word embedding
Word embedding Word embedding
Word embedding
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기
지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기
지리정보체계(GIS) - [1] GIS 데이터 유형, 구조 알기
 
The Duet model
The Duet modelThe Duet model
The Duet model
 
공간정보거점대학 1.geo server_고급과정
공간정보거점대학 1.geo server_고급과정공간정보거점대학 1.geo server_고급과정
공간정보거점대학 1.geo server_고급과정
 
LX 공간정보아카데미 PostGIS 강의자료
LX 공간정보아카데미 PostGIS 강의자료LX 공간정보아카데미 PostGIS 강의자료
LX 공간정보아카데미 PostGIS 강의자료
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and Terminology
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and Induction
 
지리정보체계(GIS) - [2] 좌표계 이해하기
지리정보체계(GIS) - [2] 좌표계 이해하기지리정보체계(GIS) - [2] 좌표계 이해하기
지리정보체계(GIS) - [2] 좌표계 이해하기
 
[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우
 
Ontology Building and its Application using Hozo
Ontology Building and its Application using HozoOntology Building and its Application using Hozo
Ontology Building and its Application using Hozo
 
Cross language information retrieval (clir)slide
Cross language information retrieval (clir)slideCross language information retrieval (clir)slide
Cross language information retrieval (clir)slide
 
Ontology development in protégé-آنتولوژی در پروتوغه
Ontology development in protégé-آنتولوژی در پروتوغهOntology development in protégé-آنتولوژی در پروتوغه
Ontology development in protégé-آنتولوژی در پروتوغه
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 
오픈소스 GIS 교육 - PostGIS
오픈소스 GIS 교육 - PostGIS오픈소스 GIS 교육 - PostGIS
오픈소스 GIS 교육 - PostGIS
 
Image super-resolution via iterative refinement.pptx
Image super-resolution via iterative refinement.pptxImage super-resolution via iterative refinement.pptx
Image super-resolution via iterative refinement.pptx
 

Viewers also liked

Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vec
Kai Sasaki
 
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Daniele Di Mitri
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
Felipe Moraes
 
Textrank algorithm
Textrank algorithmTextrank algorithm
Textrank algorithm
Andrew Koo
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)
Jinpyo Lee
 
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vecword2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
👋 Christopher Moody
 
Document Summarization
Document SummarizationDocument Summarization
Document Summarization
Pratik Kumar
 
Word2vec 4 all
Word2vec 4 allWord2vec 4 all
Word2vec 4 all
Óscar García Peinado
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
Abdullah Khan Zehady
 
Text summarization
Text summarizationText summarization
Text summarization
kareemhashem
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
Edgar Marca
 
Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~
Yuya Unno
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Roelof Pieters
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
Roelof Pieters
 
Hacking Human Language (PyData London)
Hacking Human Language (PyData London)Hacking Human Language (PyData London)
Hacking Human Language (PyData London)
hen_drik
 
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Yujiro Terazawa
 
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Camille Pradel
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
Sharvil Katariya
 
Text mining, word embeddings, & wikipedia
Text mining, word embeddings, & wikipediaText mining, word embeddings, & wikipedia
Text mining, word embeddings, & wikipedia
M. Atif Qureshi
 
Deep learning - Chatbot
Deep learning - ChatbotDeep learning - Chatbot
Deep learning - Chatbot
Liam Bui
 

Viewers also liked (20)

Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vec
 
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
Textrank algorithm
Textrank algorithmTextrank algorithm
Textrank algorithm
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)
 
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vecword2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
 
Document Summarization
Document SummarizationDocument Summarization
Document Summarization
 
Word2vec 4 all
Word2vec 4 allWord2vec 4 all
Word2vec 4 all
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Text summarization
Text summarizationText summarization
Text summarization
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Hacking Human Language (PyData London)
Hacking Human Language (PyData London)Hacking Human Language (PyData London)
Hacking Human Language (PyData London)
 
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
 
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
 
Text mining, word embeddings, & wikipedia
Text mining, word embeddings, & wikipediaText mining, word embeddings, & wikipedia
Text mining, word embeddings, & wikipedia
 
Deep learning - Chatbot
Deep learning - ChatbotDeep learning - Chatbot
Deep learning - Chatbot
 

Similar to word2vec - From theory to practice

Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfWord2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Sease
 
Subword tokenizers
Subword tokenizersSubword tokenizers
Subword tokenizers
Ha Loc Do
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
Simon Hughes
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Lucidworks
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
Ted Xiao
 
From Semantics to Self-supervised Learning for Speech and Beyond (Opening Ke...
From Semantics to Self-supervised Learning  for Speech and Beyond (Opening Ke...From Semantics to Self-supervised Learning  for Speech and Beyond (Opening Ke...
From Semantics to Self-supervised Learning for Speech and Beyond (Opening Ke...
linshanleearchive
 
Applying Data Science to Move Beyond Keywords for Social Analysis
Applying Data Science to Move Beyond Keywords for Social Analysis Applying Data Science to Move Beyond Keywords for Social Analysis
Applying Data Science to Move Beyond Keywords for Social Analysis
DataSift
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
source{d}
 
Devops: A History
Devops: A HistoryDevops: A History
Devops: A History
Nell Shamrell-Harrington
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
OpenSource Connections
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
Simon Hughes
 
StackEngine Problem Space Demo
StackEngine Problem Space DemoStackEngine Problem Space Demo
StackEngine Problem Space Demo
Boyd Hemphill
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
Roelof Pieters
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
Roelof Pieters
 
Jwt == insecurity?
Jwt == insecurity?Jwt == insecurity?
Jwt == insecurity?
snyff
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
kelbedweihy
 
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
Andrey Kutuzov and  Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...Andrey Kutuzov and  Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
AIST
 
NLP with H2O
NLP with H2ONLP with H2O
NLP with H2O
Sri Ambati
 
Open Source Design at Ignite lightning talk
Open Source Design at Ignite lightning talkOpen Source Design at Ignite lightning talk
Open Source Design at Ignite lightning talk
Mushon Zer-Aviv
 

Similar to word2vec - From theory to practice (20)

Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfWord2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
 
Subword tokenizers
Subword tokenizersSubword tokenizers
Subword tokenizers
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
From Semantics to Self-supervised Learning for Speech and Beyond (Opening Ke...
From Semantics to Self-supervised Learning  for Speech and Beyond (Opening Ke...From Semantics to Self-supervised Learning  for Speech and Beyond (Opening Ke...
From Semantics to Self-supervised Learning for Speech and Beyond (Opening Ke...
 
Applying Data Science to Move Beyond Keywords for Social Analysis
Applying Data Science to Move Beyond Keywords for Social Analysis Applying Data Science to Move Beyond Keywords for Social Analysis
Applying Data Science to Move Beyond Keywords for Social Analysis
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
 
Devops: A History
Devops: A HistoryDevops: A History
Devops: A History
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
StackEngine Problem Space Demo
StackEngine Problem Space DemoStackEngine Problem Space Demo
StackEngine Problem Space Demo
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
Jwt == insecurity?
Jwt == insecurity?Jwt == insecurity?
Jwt == insecurity?
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
Andrey Kutuzov and  Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...Andrey Kutuzov and  Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
 
NLP with H2O
NLP with H2ONLP with H2O
NLP with H2O
 
Open Source Design at Ignite lightning talk
Open Source Design at Ignite lightning talkOpen Source Design at Ignite lightning talk
Open Source Design at Ignite lightning talk
 

More from hen_drik

ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and moreifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
hen_drik
 
Game Design - Black and White
Game Design - Black and WhiteGame Design - Black and White
Game Design - Black and White
hen_drik
 
Hacking Human Language (PyCon Sweden 2015)
Hacking Human Language (PyCon Sweden 2015)Hacking Human Language (PyCon Sweden 2015)
Hacking Human Language (PyCon Sweden 2015)
hen_drik
 
Actuated workbench
Actuated workbenchActuated workbench
Actuated workbenchhen_drik
 
Ruby on Rails - Kurzvortrag
Ruby on Rails - KurzvortragRuby on Rails - Kurzvortrag
Ruby on Rails - Kurzvortraghen_drik
 
Jugendmedienschutz
JugendmedienschutzJugendmedienschutz
Jugendmedienschutzhen_drik
 
Farbformprozess
FarbformprozessFarbformprozess
Farbformprozesshen_drik
 
Defining video games
Defining video gamesDefining video games
Defining video gameshen_drik
 

More from hen_drik (9)

ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and moreifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
 
Game Design - Black and White
Game Design - Black and WhiteGame Design - Black and White
Game Design - Black and White
 
Hacking Human Language (PyCon Sweden 2015)
Hacking Human Language (PyCon Sweden 2015)Hacking Human Language (PyCon Sweden 2015)
Hacking Human Language (PyCon Sweden 2015)
 
Actuated workbench
Actuated workbenchActuated workbench
Actuated workbench
 
Sport
SportSport
Sport
 
Ruby on Rails - Kurzvortrag
Ruby on Rails - KurzvortragRuby on Rails - Kurzvortrag
Ruby on Rails - Kurzvortrag
 
Jugendmedienschutz
JugendmedienschutzJugendmedienschutz
Jugendmedienschutz
 
Farbformprozess
FarbformprozessFarbformprozess
Farbformprozess
 
Defining video games
Defining video gamesDefining video games
Defining video games
 

Recently uploaded

Proposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP IncProposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP Inc
Raheem Muhammad
 
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
OECD Directorate for Financial and Enterprise Affairs
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
OECD Directorate for Financial and Enterprise Affairs
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
gpww3sf4
 
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Ben Linders
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
RIDHIMAGARG21
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
Claudio Gallicchio
 
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussionArtificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
Robin Haunschild
 
Carrer goals.pptx and their importance in real life
Carrer goals.pptx  and their importance in real lifeCarrer goals.pptx  and their importance in real life
Carrer goals.pptx and their importance in real life
artemacademy2
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
Gamify it until you make it Improving Agile Development and Operations with ...
Gamify it until you make it  Improving Agile Development and Operations with ...Gamify it until you make it  Improving Agile Development and Operations with ...
Gamify it until you make it Improving Agile Development and Operations with ...
Ben Linders
 
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
OECD Directorate for Financial and Enterprise Affairs
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
SkillCertProExams
 
Using-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptxUsing-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptx
kainatfatyma9
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
kekzed
 
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussionPro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 

Recently uploaded (20)

Proposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP IncProposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP Inc
 
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
 
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
 
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
 
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussionArtificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
 
Carrer goals.pptx and their importance in real life
Carrer goals.pptx  and their importance in real lifeCarrer goals.pptx  and their importance in real life
Carrer goals.pptx and their importance in real life
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
 
Gamify it until you make it Improving Agile Development and Operations with ...
Gamify it until you make it  Improving Agile Development and Operations with ...Gamify it until you make it  Improving Agile Development and Operations with ...
Gamify it until you make it Improving Agile Development and Operations with ...
 
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
 
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
 
Using-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptxUsing-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptx
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
 
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussionPro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
 

word2vec - From theory to practice

  • 1. Hendrik Heuer Stockholm NLP Meetup word2vec 
 From theory to practice
  • 3. –J. R. Firth 1957 “You shall know a word 
 by the company it keeps”
  • 4. –J. R. Firth 1957 “You shall know a word 
 by the company it keeps” Quoted after Socher
  • 7.
  • 8. Vectors are directions in space Vectors can encode relationships
  • 9. man is to woman as king is to ?
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. word2vec • by Mikolov, Sutskever, Chen, 
 Corrado and Dean at Google • NAACL 2013 • takes a text corpus as input 
 and produces the 
 word vectors as output
  • 17. word2vec • word meaning and relationships between words are encoded spatially • two main learning algorithms in word2vec: 
 continuous bag-of-words and 
 continuous skip-gram
  • 18. Goal
  • 19. continuous bag-of-words • predicting the current word based on the context • order of words in the history does not influence the projection • faster & more appropriate for larger corpora Mikolov et al.
  • 20. continuous skip-gram • maximize classification of a word based on another word in the same sentence • better word vectors for frequent words, but slower to train Mikolov et al.
  • 21. Why it is awesome • there is a fast open-source implementation • can be used as features 
 for natural language processing tasks and machine learning algorithms
  • 23. Sentiment Analysis Quoted after SocherRecursive Neural Tensor Network
  • 25. Using word2vec • Original: http://word2vec.googlecode.com/svn/trunk/ • C++11 version: https://github.com/jdeng/word2vec • Python: http://radimrehurek.com/gensim/models/ word2vec.html • Java: https://github.com/ansjsun/word2vec_java • Parallel java: https://github.com/siegfang/word2vec • CUDAversion: https://github.com/whatupbiatch/cuda-word2vec Quoted after Wang
  • 26. Using it in Python
  • 27.
  • 28. Usage
  • 29. Training a model Quoted after Řehůřek
  • 30. Training a model with iterator Quoted after Řehůřek
  • 31. Doing it in C • Download the code: 
 git clone https://github.com/h10r/word2vec-macosx-maverics.git ! • Run 'make' to compile word2vec tool ! • Run the demo scripts: 
 
 ./demo-word.sh 
 
 and 
 
 ./demo-phrases.sh
  • 32. Doing it in C • Download the code: 
 git clone https://github.com/h10r/word2vec-macosx-maverics.git ! • Run 'make' to compile word2vec tool ! • Run the demo scripts: 
 
 ./demo-word.sh 
 
 and 
 
 ./demo-phrases.sh
  • 33.
  • 34. Testing a model Quoted after Řehůřek
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 41. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 42.
  • 43.
  • 44. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 45.
  • 46. Should be in Macports py27-scikit-learn @0.15.2 (python, science)
  • 47.
  • 48. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 49.
  • 50.
  • 51.
  • 52.
  • 53. word2vec 
 From theory to practice Hendrik Heuer Stockholm NLP Meetup ! Discussion: Can anybody here think of ways this might help her or him?
  • 54. Further Reading • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013. • Tomas Mikolov, Wen-tau Yih,and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
  • 55. Further Reading • Richard Socher - Deep Learning for NLP (without Magic)
 http://lxmls.it.pt/2014/?page_id=5 • Wang - Introduction to Word2vec and its application to find predominant word senses
 http://compling.hss.ntu.edu.sg/courses/hg7017/pdf/ word2vec%20and%20its%20application%20to%20wsd.pdf • Exploiting Similarities among Languages for Machine Translation, Tomas Mikolov, Quoc V. Le, Ilya Sutskever, http://arxiv.org/abs/1309.4168 • Title Image by Hans Arp
  • 56. Further Coding • word2vec
 https://code.google.com/p/word2vec/ • word2vec for MacOSX Maverics
 https://github.com/h10r/word2vec-macosx-maverics • Gensim Python Library
 https://radimrehurek.com/gensim/index.html • Gensim Tutorials
 https://radimrehurek.com/gensim/tutorial.html • Scikit-Learn TSNE
 http://scikit-learn.org/stable/modules/generated/ sklearn.manifold.TSNE.html
  • 57. Hendrik Heuer Stockholm NLP Meetup word2vec 
 From theory to practice