This document summarizes the lda2vec model, which combines aspects of word2vec and LDA. Word2vec learns word embeddings from local context, while LDA learns document-level topic mixtures. lda2vec predicts words from both their local context and the global topic mixture of the document they appear in, leveraging both approaches. To stay interpretable like LDA, it represents each document as a sparse mixture over topic vectors.
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
Christopher Moody
(Data Day 2016)
Standard natural language processing (NLP) is a messy and difficult affair. It requires teaching a computer about English-specific word ambiguities as well as the hierarchical, sparse nature of words in sentences. At Stitch Fix, word vectors help computers learn from the raw text in customer notes. Our systems need to identify a medical professional when she writes that she 'used to wear scrubs to work', and distill 'taking a trip' into a Fix for vacation clothing. Applied appropriately, word vectors are dramatically more meaningful and more flexible than current techniques and let computers peer into text in a fundamentally new way. I'll try to convince you that word vectors give us a simple and flexible platform for understanding text as I discuss word2vec and LDA and introduce our hybrid algorithm, lda2vec.
1. A word is worth a thousand vectors
(word2vec, LDA, and introducing lda2vec)
Christopher Moody
@ Stitch Fix
2. About
@chrisemoody
Caltech Physics
PhD in astrostats & supercomputing
sklearn t-SNE contributor
Data Labs at Stitch Fix
github.com/cemoody
Gaussian Processes
t-SNE
chainer
deep learning
Tensor Decomposition
3. Credit
Large swathes of this talk are from
previous presentations by:
• Tomas Mikolov
• David Blei
• Christopher Olah
• Radim Rehurek
• Omer Levy & Yoav Goldberg
• Richard Socher
• Xin Rong
• Tim Hopper
8. word2vec
“The fox jumped over the lazy dog”
Maximize the likelihood of seeing the context words given the word “over”.
P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
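A minimal Python sketch of how the (input, output) pairs on this slide can be generated, assuming lowercase whitespace tokenization and a fixed symmetric context window (`skipgram_pairs` and `window` are illustrative names, not from the talk). With a window wide enough to cover the whole sentence, the input word "over" yields exactly the six pairs listed above.

```python
def skipgram_pairs(tokens, window=2):
    """Return (input, output) pairs: each word predicts its neighbours within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the fox jumped over the lazy dog".split()

# Keep only the pairs whose input word is "over"; window=6 covers the whole sentence.
over_pairs = [p for p in skipgram_pairs(sentence, window=6) if p[0] == "over"]
print(over_pairs)
```

Each pair becomes one training example for P(output | input); the "the" pair appears twice because "the" occurs twice in the sentence.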
11. word2vec
Twist: we have two vectors for every word.
Which vector we use depends on whether the word is the input or the output.
Also, there is a context window around every input word.
“The fox jumped over the lazy dog”
P(vOUT|vIN)
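To make the "two vectors per word" twist concrete, here is a sketch that keeps separate input and output tables and scores a pair by the dot product of the input vector of one word with the output vector of the other. The dimension, the random initialization range, and the helper names are assumptions for illustration, not word2vec's actual settings.

```python
import random

def init_embeddings(vocab, dim, seed=0):
    """Give each word two vectors: one used when it is the input, one when it is the output."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    v_in = {w: rand_vec() for w in vocab}
    v_out = {w: rand_vec() for w in vocab}
    return v_in, v_out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

vocab = sorted(set("the fox jumped over the lazy dog".split()))
v_in, v_out = init_embeddings(vocab, dim=8)

# Unnormalized compatibility of "fox" as output given "over" as input.
# The score is not symmetric: swapping the roles uses different vectors.
score_fox_given_over = dot(v_in["over"], v_out["fox"])
score_over_given_fox = dot(v_in["fox"], v_out["over"])
```

One commonly given reason for the split: with a single shared table, a word's vector has a large dot product with itself, which would push the model toward predicting a word in its own context, something that rarely happens in real text.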
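32. word2vec (objective)
But we'd like to measure a probability.
$$\mathrm{softmax}(v_{in} \cdot v_{out}) \in [0, 1], \qquad v_{in} \cdot v_{out} \in [-1, 1] \text{ (for unit-length vectors)}$$

33. word2vec (objective)
But we'd like to measure a probability.
$$\mathrm{softmax}(v_{in} \cdot v_{out}), \qquad v_{in} \cdot v_{out} \in [-1, 1]$$
Probability of choosing 1 of N discrete items.
Mapping from vector space to a multinomial over words.

34. word2vec (objective)
But we'd like to measure a probability.
$$\mathrm{softmax} \sim \exp(v_{in} \cdot v_{out}), \qquad v_{in} \cdot v_{out} \in [-1, 1]$$

35. word2vec (objective)
But we'd like to measure a probability.
$$\mathrm{softmax} = \frac{\exp(v_{in} \cdot v_{out})}{\sum_{k \in V} \exp(v_{in} \cdot v_k)}$$
The denominator is a normalization term over all words $k$ in the vocabulary $V$.

36. word2vec (objective)
But we'd like to measure a probability.
$$\mathrm{softmax} = \frac{\exp(v_{in} \cdot v_{out})}{\sum_{k \in V} \exp(v_{in} \cdot v_k)} = P(v_{out} \mid v_{in})$$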
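The full softmax on these slides can be sketched directly in Python. The two-dimensional unit vectors below are hand-picked (an assumption, not trained values), so every dot product falls in [-1, 1] as the slides state. Subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import math

def softmax_prob(v_in_word, v_out, vocab):
    """P(v_out = k | v_in) for every word k in the vocabulary (full softmax)."""
    scores = {k: sum(a * b for a, b in zip(v_in_word, v_out[k])) for k in vocab}
    m = max(scores.values())  # shift by the max score for numerical stability
    exps = {k: math.exp(s - m) for k, s in scores.items()}
    z = sum(exps.values())    # normalization term over all k in V (up to a shared exp(-m) factor)
    return {k: e / z for k, e in exps.items()}

# Hand-picked unit vectors (assumed), so each dot product lies in [-1, 1].
vocab = ["fox", "dog", "lazy"]
v_in = {"over": [1.0, 0.0]}
v_out = {"fox": [1.0, 0.0], "dog": [0.0, 1.0], "lazy": [-1.0, 0.0]}

probs = softmax_prob(v_in["over"], v_out, vocab)
```

The word whose output vector points the same way as v_in("over") gets the largest share, and the probabilities always sum to 1, which is what makes the softmax a multinomial over words.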
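37. word2vec (objective)
Learn by gradient descent on the negative log of the softmax probability (equivalently, gradient ascent on $\log P(v_{out} \mid v_{in})$).
For every example we see, update $v_{in}$ and $v_{out}$ with learning rate $\eta$:
$$v_{in} := v_{in} + \eta \, \frac{\partial \log P(v_{out} \mid v_{in})}{\partial v_{in}}$$
$$v_{out} := v_{out} + \eta \, \frac{\partial \log P(v_{out} \mid v_{in})}{\partial v_{out}}$$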
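The update rule can be made concrete with a small full-softmax skip-gram trainer. This is a sketch, not the word2vec reference implementation: the learning rate, dimension, random initialization and number of steps are assumptions. With a full softmax the normalization term involves every word, so each step moves all output vectors, not only the one for the observed word; this O(|V|) cost per example is what hierarchical softmax and negative sampling in word2vec are designed to avoid.

```python
import math
import random

def probs(vi, v_out, vocab):
    """Full-softmax P(k | input) for every word k."""
    scores = [sum(a * b for a, b in zip(vi, v_out[k])) for k in vocab]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {k: e / z for k, e in zip(vocab, exps)}

def sgd_step(center, context, v_in, v_out, vocab, lr=0.1):
    """One gradient-ascent step on log P(context | center)."""
    vi = v_in[center]
    p = probs(vi, v_out, vocab)
    dim = len(vi)
    # d log P / d v_in = v_out[context] - sum_k P(k | center) * v_out[k]
    grad_in = [v_out[context][d] - sum(p[k] * v_out[k][d] for k in vocab)
               for d in range(dim)]
    # d log P / d v_out[k] = (1[k == context] - P(k | center)) * v_in
    for k in vocab:
        coef = (1.0 if k == context else 0.0) - p[k]
        v_out[k] = [u + lr * coef * x for u, x in zip(v_out[k], vi)]
    v_in[center] = [x + lr * g for x, g in zip(vi, grad_in)]

rng = random.Random(0)
vocab = sorted(set("the fox jumped over the lazy dog".split()))
v_in = {w: [rng.gauss(0.0, 0.1) for _ in range(4)] for w in vocab}
v_out = {w: [rng.gauss(0.0, 0.1) for _ in range(4)] for w in vocab}

before = probs(v_in["over"], v_out, vocab)["fox"]
for _ in range(50):
    sgd_step("over", "fox", v_in, v_out, vocab)
after = probs(v_in["over"], v_out, vocab)["fox"]
```

Repeating the step on the pair ("over", "fox") raises P(fox | over); a real trainer would loop over all pairs from the corpus instead of one.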
79. The goal:
Use all of this context to learn interpretable topics.
word2vec: P(vOUT | vIN)

80. word2vec
LDA: P(vOUT | vDOC)
The goal:
Use all of this context to learn interpretable topics.
this document is 80% high fashion
this document is 60% style

81. word2vec
LDA
The goal:
Use all of this context to learn interpretable topics.
this zip code is 80% hot climate
this zip code is 60% outdoor wear

82. word2vec
LDA
The goal:
Use all of this context to learn interpretable topics.
this client is 80% sporty
this client is 60% casual wear
86. lda2vec
“PS! Thank you for such an awesome top”doc_id=1846
vIN vOUT
vDOC
can we predict a word both locally and globally ?
P(vOUT |vIN+ vDOC)
87. lda2vec
*Very similar to Paragraph Vectors / doc2vec
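A minimal sketch of P(vOUT | vIN + vDOC): the context vector is the sum of the input word's vector and a per-document vector, scored against every output vector with the full softmax. The matrix names, toy sizes, and the dense random document vectors are assumptions for illustration; summation is the combination shown on the slide.

```python
import numpy as np


def lda2vec_prob(W_in, W_out, D, word, doc):
    """P(v_out | v_in + v_doc) over the whole vocabulary (full softmax)."""
    context = W_in[word] + D[doc]      # local word vector plus global document vector
    scores = W_out @ context
    e = np.exp(scores - scores.max())
    return e / e.sum()


rng = np.random.default_rng(0)
V, n_docs, d = 7, 3, 4
W_in = rng.normal(size=(V, d))
W_out = rng.normal(size=(V, d))
D = rng.normal(size=(n_docs, d))       # one dense vector per document (for now)
p = lda2vec_prob(W_in, W_out, D, word=1, doc=2)
```

With an all-zero document vector this reduces to the plain word2vec softmax, which is a handy sanity check.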
102. lda2vec
Let’s make vDOC sparse:
{a, b, c, …} ~ Dirichlet(α)
vDOC = a·vreligion + b·vpolitics + …
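A sketch of the sparse document vector: topic weights {a, b, c, …} drawn from a Dirichlet with small α, then used to mix topic vectors. The five random topic vectors stand in for hypothetical vreligion, vpolitics, and so on. In training the weights are learned rather than sampled; the sketch assumes, without confirming it against the lda2vec code, that this is done by adding a Dirichlet log-density term to the objective, which with α < 1 scores sparse weight vectors higher than uniform ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, d = 5, 4
topic_vectors = rng.normal(size=(n_topics, d))  # hypothetical rows: v_religion, v_politics, ...

# {a, b, c, ...} ~ Dirichlet(alpha); alpha < 1 favours a few large weights.
weights = rng.dirichlet(np.full(n_topics, 0.1))
v_doc = weights @ topic_vectors                 # a * v_religion + b * v_politics + ...


def dirichlet_log_prior(weights, alpha):
    """Unnormalized log Dirichlet density, sum_k (alpha - 1) * log(w_k).

    Assumed training-time use: add it to the objective so that, with alpha < 1,
    the learned document weights are pushed toward sparsity.
    """
    return (alpha - 1.0) * np.sum(np.log(weights))


sparse = np.array([0.97, 0.01, 0.01, 0.01])
uniform = np.full(4, 0.25)
print(dirichlet_log_prior(sparse, 0.5) > dirichlet_log_prior(uniform, 0.5))  # True
```

Because vDOC is a weighted sum of a handful of topic vectors, each document can still be read LDA-style as "x% of topic k".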
104. lda2vec (word2vec + LDA): P(vOUT | vIN + vDOC)
this document is 80% high fashion
this document is 60% style
105. lda2vec: P(vOUT | vIN + vDOC + vZIP)
this zip code is 80% hot climate
this zip code is 60% outdoor wear
107. lda2vec: P(vOUT | vIN + vDOC + vZIP + vCLIENTS)
this client is 80% sporty
this client is 60% casual wear
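A sketch of how extra contexts could stack: each kind of context (document, zip code, client) gets its own topic weights and topic vectors, and their mixtures are simply added to the input word vector. Using a separate topic matrix with its own number of topics for each context is an assumption for illustration, as are all sizes and random values.

```python
import numpy as np


def mixture(weights, topics):
    """Context vector for one source: its topic weights times its topic vectors."""
    return weights @ topics


rng = np.random.default_rng(0)
d = 4
v_in = rng.normal(size=d)
# Hypothetical: (topic weights, topic vectors) for each kind of context.
sources = {
    "doc": (rng.dirichlet(np.full(5, 0.1)), rng.normal(size=(5, d))),
    "zip": (rng.dirichlet(np.full(3, 0.1)), rng.normal(size=(3, d))),
    "client": (rng.dirichlet(np.full(4, 0.1)), rng.normal(size=(4, d))),
}
# v_IN + v_DOC + v_ZIP + v_CLIENTS
context = v_in + sum(mixture(w, t) for w, t in sources.values())

W_out = rng.normal(size=(7, d))
scores = W_out @ context
p = np.exp(scores - scores.max())
p /= p.sum()                          # P(v_OUT | v_IN + v_DOC + v_ZIP + v_CLIENTS)
```

Each source keeps its own interpretable weights ("this zip code is 80% hot climate"), while the word prediction uses their sum.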
108. lda2vec: P(vOUT | vIN + vDOC + vZIP + vCLIENTS)
P(sold | vCLIENTS)
We can also make the topics supervised so that they predict an outcome, such as sold.
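One way the supervised term P(sold | vCLIENTS) could look, sketched as logistic regression on a client's topic weights; the function names, coefficients, and the two-topic client are hypothetical, and the talk does not specify the form of the supervised head. Its log-loss would be added to the word-prediction objective so the client topics are also shaped by the outcome.

```python
import numpy as np


def p_sold(client_weights, coef, bias):
    """Hypothetical supervised head: logistic regression on a client's topic weights."""
    return 1.0 / (1.0 + np.exp(-(client_weights @ coef + bias)))


def log_loss(p, sold):
    """Binary cross-entropy for one observation (sold is 0 or 1)."""
    return -(sold * np.log(p) + (1 - sold) * np.log(1 - p))


client = np.array([0.8, 0.2])   # hypothetical: 80% "sporty", 20% another topic
coef = np.array([2.0, -1.0])
p = p_sold(client, coef, bias=0.0)
loss = log_loss(p, sold=1)
```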
110. lda2lstm
“PS! Thank you for such an awesome idea” (doc_id=1846)
Can we apply topic modeling to sentences?
111. Can we represent the internal LSTM states as a Dirichlet mixture?
112. lda2ae
Can we apply topic modeling to images?
TJ Torres
119. Crazy Approaches
Paragraph Vectors
(Just extend the context window)
Context dependency
(Change the window grammatically)
Social word2vec (deepwalk)
(Sentence is a walk on the graph)
Spotify
(Sentence is a playlist of song_ids)
Stitch Fix
(Sentence is a shipment of five items)
121. CBOW
“The fox jumped over the lazy dog”
Guess the word given the context: the surrounding words (vIN) predict the center word (vOUT).
~20x faster.
(This is the alternative.)
SkipGram
“The fox jumped over the lazy dog”
Guess the context given the word: the center word (vIN) predicts each surrounding word (vOUT).
Better at syntax.
(This is the one we went over.)
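A sketch of how CBOW frames the same sentence: one (context words, center word) example per position, with the context words' input vectors averaged into a single hidden vector before the softmax (averaging is what the original word2vec CBOW does by default). With a window of 2 this gives 7 examples for the sentence, versus 22 skip-gram pairs, which is one reason CBOW is cheaper to train. Names, window size, and random vectors are illustrative.

```python
import numpy as np


def cbow_examples(tokens, window=2):
    """One (context words, center word) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, center))
    return examples


tokens = "the fox jumped over the lazy dog".split()
examples = cbow_examples(tokens)
print(examples[3])  # (['fox', 'jumped', 'the', 'lazy'], 'over')

# CBOW's hidden vector: the average of the context words' input vectors.
vocab = sorted(set(tokens))
index = {w: k for k, w in enumerate(vocab)}
rng = np.random.default_rng(0)
W_in = rng.normal(size=(len(vocab), 4))
context, center = examples[3]
h = W_in[[index[w] for w in context]].mean(axis=0)  # then score h against every output vector
```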
127. What I didn’t mention
A lot of text (only if you have a specialized vocabulary)
Cleaning the text
Memory & performance
Traditional databases aren’t well-suited
False positives
129. All of the following ideas will change what
‘words’ and ‘context’ represent.
130. paragraph vector
What about summarizing documents?
On the day he took office, President Obama reached out to America’s enemies,
offering in his first inaugural address to extend a hand if you are willing to unclench
your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide
the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist
Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
131. paragraph vector
Normal skipgram extends the context C words before and C words after the input word (IN → OUT, OUT).
132. paragraph vector
A document vector simply extends the context to the whole document: the document (doc_1347) joins the local IN → OUT context of every word in it.
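A minimal sketch of "extending the context to the whole document": besides the usual local skip-gram pairs, a document token is paired with every word in the document, so its vector is trained to predict all of them. This resembles doc2vec's PV-DBOW variant; PV-DM instead combines the document vector with the local context vectors. The function name and window size are illustrative, and `doc_1347` is just the id shown on the slide.

```python
def paragraph_vector_pairs(doc_id, tokens, window=2):
    """Skip-gram pairs plus a document token that is paired with every word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))  # local context: C words each side
        pairs.append((doc_id, center))             # global context: the whole document
    return pairs


tokens = "the framework nuclear agreement he reached with iran".split()
pairs = paragraph_vector_pairs("doc_1347", tokens)
doc_pairs = [p for p in pairs if p[0] == "doc_1347"]
print(len(doc_pairs) == len(tokens))  # True
```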
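139. context dependent context
Levy & Goldberg 2014
They also show that SGNS (skip-gram with negative sampling) is simply factorizing a word-context matrix:
$$\vec{w} \cdot \vec{c} = \mathrm{PMI}(w, c) - \log k$$
where $k$ is the number of negative samples.
This is completely amazing!
Intuition: positive associations (canada, snow) are stronger in humans than negative associations (what is the opposite of Canada?).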
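The identity above can be made tangible by building the shifted PMI matrix directly from co-occurrence counts. This sketch clips negative values to zero (the "shifted positive PMI" matrix Levy & Goldberg also study, which matches the intuition that positive associations carry the signal) and then factorizes it with a truncated SVD. That is an explicit-factorization analogue, not SGNS itself, and the function names and toy counts are illustrative.

```python
import numpy as np


def shifted_ppmi(counts, k=1):
    """max(PMI(w, c) - log k, 0), where counts[w, c] counts word w seen with context c."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)   # marginal over contexts
    p_c = p_wc.sum(axis=0, keepdims=True)   # marginal over words
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))    # -inf where a pair never co-occurs
    return np.maximum(pmi - np.log(k), 0.0)


def factorize_sppmi(counts, k=1, dim=2):
    """Word and context vectors whose dot products approximate the SPPMI matrix."""
    m = shifted_ppmi(counts, k)
    u, s, vt = np.linalg.svd(m)
    w = u[:, :dim] * np.sqrt(s[:dim])       # split singular values between the two sides
    c = vt[:dim].T * np.sqrt(s[:dim])
    return w, c


counts = np.array([[10, 0], [0, 10]])       # toy counts: each word seen with one context
W, C = factorize_sppmi(counts, k=1, dim=2)
```

With `dim` equal to the matrix rank, `W @ C.T` reproduces the SPPMI matrix exactly; a smaller `dim` gives the low-rank approximation used as embeddings.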
140. deepwalk
Perozzi et al. 2014
word2vec learns word vectors from sentences (“The fox jumped over the lazy dog”).
In deepwalk, ‘words’ are graph vertices and ‘sentences’ are random walks on the graph.
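A sketch of the deepwalk idea: generate uniform random walks over a graph and treat each walk as a sentence of vertex "words". The adjacency-list graph, walk counts, and walk length are illustrative choices; the original method uses many more and longer walks.

```python
import random


def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Uniform random walks; each walk is a 'sentence' whose 'words' are vertices."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = list(graph)
        rng.shuffle(starts)               # visit start vertices in a random order each pass
        for start in starts:
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))  # step to a random neighbour
            walks.append(walk)
    return walks


# A small hypothetical undirected graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(graph)
```

The walks can then be handed to any skip-gram trainer as sentences, for instance gensim's `Word2Vec(sentences=walks, sg=1)`, so vertices that co-occur on walks end up with similar vectors.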