The concept of unsupervised universal sentence encoders has gained traction recently, wherein pre-trained models generate effective, task-agnostic, fixed-dimensional representations for phrases, sentences and paragraphs. Such methods vary in complexity, from simple weighted averages of word vectors to complex language models based on bidirectional transformers. In this work we propose a novel technique to generate sentence embeddings in an unsupervised fashion by projecting the sentences onto a fixed-dimensional manifold with the objective of preserving local neighbourhoods in the original space. To delineate such neighbourhoods we experiment with several set-distance metrics, including the recently proposed Word Mover's distance, while the fixed-dimensional projection is achieved with a scalable and efficient manifold approximation method rooted in topological data analysis. We test our approach, which we term EMAP, or Embeddings by Manifold Approximation and Projection, on six publicly available text-classification datasets of varying size and complexity. Empirical results show that our method consistently performs on par with or better than several state-of-the-art approaches.
9. Pretrained model
Setting the tone
In these cases we need universal sentence encoders
Who are you?
Where is this?
This is Amsterdam.
...
[0.2 0.3 -0.01 0.4...]
[0.8 0.1 -0.5 0.4...]
[0.5 0.9 0.9 0.3 ...]
...
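The mapping sketched on this slide — each sentence to one fixed-dimensional vector — can be illustrated with the simplest encoder the abstract mentions, a plain average of word vectors. The vocabulary below uses random placeholder vectors, not real pretrained embeddings:

```python
import numpy as np

# toy vocabulary of word vectors (placeholders for real pretrained embeddings)
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in "who are you where is this amsterdam".split()}

def encode(sentence):
    """Simplest universal sentence encoder: average the word vectors,
    yielding one fixed-dimensional vector per sentence."""
    vecs = [vocab[w] for w in sentence.lower().rstrip("?.").split() if w in vocab]
    return np.mean(vecs, axis=0)

for s in ["Who are you?", "Where is this?", "This is Amsterdam."]:
    print(s, "->", np.round(encode(s)[:4], 2))
```

Note that averaging discards word order entirely, which is one motivation for the richer encoders discussed in the abstract.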
18. Contributions of this work
Observation: Word Mover's distance is one of many ways to
compute distance between sets of words
Contribution 1:
Test and compare other common set-distance metrics
- WMD
- Hausdorff distance
- Energy distance
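Two of the set-distance metrics listed above can be sketched in a few lines of numpy: the Hausdorff distance and the energy distance between two sentences, each represented as a set of word vectors. The word vectors here are random placeholders, and WMD is omitted since it requires an optimal-transport solver:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two sets of word vectors
    (one vector per row of A and B)."""
    # pairwise Euclidean distances, shape (len(A), len(B))
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def energy(A, B):
    """Empirical energy distance between the point clouds A and B."""
    mean_dist = lambda X, Y: np.linalg.norm(
        X[:, None, :] - Y[None, :, :], axis=-1).mean()
    return 2 * mean_dist(A, B) - mean_dist(A, A) - mean_dist(B, B)

# toy "word vectors" for two sentences (placeholders, not real embeddings)
rng = np.random.default_rng(0)
s1 = rng.normal(size=(4, 50))   # 4 words, 50-dimensional vectors
s2 = rng.normal(size=(6, 50))
print(hausdorff(s1, s2), energy(s1, s2))
```

Both metrics handle sentences of different lengths naturally, which is why they are usable as drop-in alternatives to WMD here.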
21. Contributions of this work
Observation: Using a set-distance metric, we can construct a
neighbourhood graph using sentences and these distances
Contribution 2:
Generate fixed-dimensional embeddings such that they preserve the
above neighbourhood graph
- Uniform Manifold Approximation and Projection (UMAP)
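The neighbourhood graph at the heart of this step can be built directly from a precomputed pairwise set-distance matrix. Below is a minimal numpy sketch of that construction with toy placeholder distances; in practice one would hand such a matrix to umap-learn via UMAP(metric="precomputed") to obtain the low-dimensional embeddings:

```python
import numpy as np

def knn_graph(D, k):
    """Boolean k-nearest-neighbour adjacency matrix from a precomputed
    pairwise distance matrix D (n x n)."""
    n = D.shape[0]
    A = np.zeros_like(D, dtype=bool)
    for i in range(n):
        # indices of the k nearest neighbours of i, excluding i itself
        order = np.argsort(D[i])
        nbrs = [j for j in order if j != i][:k]
        A[i, nbrs] = True
    return A | A.T   # symmetrise: union of directed kNN edges

# toy distance matrix for 5 "sentences"
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
A = knn_graph(D, k=2)
```

The key point is that only the distance matrix is needed, so any of the set-distance metrics above can drive the projection without ever embedding sentences first.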
30. Experimental Settings
First test:
- Use kNN with the set-distances to classify sentences directly
- Versus, our method of generating embeddings using the
neighbourhood graph
- We use a linear SVM with the generated embeddings
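The shape of this first test can be mimicked with scikit-learn: a kNN classifier over a precomputed distance matrix versus a linear SVM on fixed-dimensional vectors. The data below is synthetic two-class noise, and plain Euclidean quantities stand in for both the set distances and the projected embeddings; this is a sketch of the experimental protocol only, not the paper's setup:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
# stand-ins: 40 "sentences" as random vectors, two well-separated classes
X = rng.normal(size=(40, 10)) + np.repeat([[0.0], [2.0]], 20, axis=0)
y = np.repeat([0, 1], 20)

# baseline: kNN directly on a precomputed pairwise distance matrix
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed").fit(D, y)
knn_acc = knn.score(D, y)

# embedding-based pipeline: linear SVM on the fixed-dimensional vectors
svm = LinearSVC(dual=False).fit(X, y)  # X stands in for projected embeddings
svm_acc = svm.score(X, y)
print(knn_acc, svm_acc)
```

On real data one would of course report held-out accuracy rather than training accuracy as done in this toy sketch.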
31. Experimental Settings
Second test:
- Test 6 other popular approaches to produce sentence
embeddings
- Versus, our method of generating embeddings using the
neighbourhood graph
38. Takeaways
- We propose a novel sentence embedding mechanism
- Using set distances
- And neighbourhood graph approximation
- The embeddings are better at capturing information than the
distance metric alone
- The embeddings perform favourably compared to various
other efficient mechanisms