Joerg Viechtbauer
joerg.viechtbauer@qaware.de
QAware
from grep to BERT
December 10th, 2020
A cloud-native’s questions
How many requests were handled yesterday?
Where is this value set?
Why doesn't the microservice start?
It did start? How is that even possible?
We have all used grep to get that important information!
QAware 2
THAT TYPICAL MOTIVATIONAL SLIDE
A cloud-native’s questions
How many requests were handled yesterday?
Where is this value set?
Why doesn't the microservice start?
It did start? How is that even possible?
We have all used grep to get that important information!
And now I just can't think of this stupid song name…
QAware 3
THAT TYPICAL MOTIVATIONAL SLIDE
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
QAware 4
SO WHY NOT USE OUR TRUSTY GREP…?
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
QAware 5
…COMBINE IT WITH WGET -R
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
Problem solved!
QAware 6
= SEMANTIC INTERNET SEARCH IN ONE LINE
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
Problem solved!
Thank you!
QAware 7
= SEMANTIC INTERNET SEARCH IN ONE LINE
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
Problem solved!
Thank you!
Questions?
QAware 8
= SEMANTIC INTERNET SEARCH IN ONE LINE
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
Problem solved!
Thank you!
Questions?
QAware 9
= SEMANTIC INTERNET SEARCH IN ONE LINE
from grep to BERT
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
QAware 11
LARGE SCALE SEMANTIC SEARCH
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
How to make that work fast?
QAware 12
LARGE SCALE SEMANTIC SEARCH
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
How to make that work fast? How to make that work at all?
QAware 13
LARGE SCALE SEMANTIC SEARCH
wget -r -q -O- http://the-internet | grep "austrian rap song about mozart"
How to make that work fast? How to make that work at all?
QAware 14
TODAY'S TOPIC…
QAware 15
DISCLAIMER – MINOR SIMPLIFICATIONS AHEAD
Takeaway (https://commons.wikimedia.org/wiki/File:Kaeng_phet_mu.jpg), „Kaeng phet mu“,
https://creativecommons.org/licenses/by-sa/3.0/legalcode
Recipe for Red Thai Curry
Cut stuff into small pieces
Throw everything together
Heat
Bon Appetit!
vector space model
a super fast introduction
Document
https://en.wikipedia.org/wiki/Cat
Text
A cat can either be a house cat, a farm cat or a feral cat.
Normalization = lowercase + stemming/lemmatization (optional)
a cat can either be a house cat a farm cat or a feral cat
Document Vector
{"a":4, "be":1, "can":1, "cat":4, "either":1, "farm":1, "feral":1, "house":1, "or":1}
QAware 17
DOCUMENT VECTOR
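A minimal sketch of that counting step (plain Python, not the talk's code):

from collections import Counter

text = "a cat can either be a house cat a farm cat or a feral cat"
document_vector = Counter(text.split())
print(document_vector)   # a: 4, cat: 4, can: 1, either: 1, be: 1, house: 1, farm: 1, or: 1, feral: 1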
Document vector
{"a":4, "be":1, "can":1, "cat":4, "either":1, "farm":1, "feral":1, "house":1, "or":1}
Document matrix
QAware 18
DOCUMENT VECTOR
DOC      a      an     be     cat    curry  dog    either farm   feral  house  or
wikicat  1      1      1      4                    1      1      1      1      1
wikidog  1                                  1
curry                                10
Cosine Similarity
Transform document d_i into document vector $\vec{d}_i$
Create query vector $\vec{q}$
Score for document d_i and query q:
$\mathrm{sim}(d_i, q) = \frac{\vec{d}_i}{\|\vec{d}_i\|_2} \cdot \frac{\vec{q}}{\|\vec{q}\|_2}$
QAware
VECTOR SPACE MODEL SEARCH
Better formula for vector values
$d_i(w) = \log\left(1 + \mathrm{tf}_i(w)\right) + \log\frac{|D|}{\mathrm{df}(w)}$
(same formula for q)
Super simple
Very fast implementation possible
Pretty good search quality! (baseline for all retrieval systems)
QAware 21
TF*IDF Vectors
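A minimal sketch of vector space search with TF-IDF weighting and cosine similarity, here via scikit-learn (whose exact weighting formula differs slightly from the one above); documents and query are made up:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["a cat can either be a house cat a farm cat or a feral cat",
        "a dog is a loyal companion",
        "red thai curry with chicken and rice"]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)            # TF-IDF weighted document matrix
query_vector = vectorizer.transform(["feral cat"])      # query in the same vector space
print(cosine_similarity(query_vector, doc_vectors))     # highest score for the cat document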
Find information not by matching keywords but by matching intent
Woah! Let’s start with synonyms!
Implementations
Manually crafted thesauri (= synonyms, associations, hierarchies)
Vector space dimensionality reduction
Find good approximation of the term-vectors in a low-dimensional space (100000 -> 300)
Condensed representation captures the essence of the data (=> is that meaning…?)
QAware 22
SEMANTIC SEARCH
Decompose the document-term-matrix A into smaller matrices
n = topic space (much smaller than t), typically n = 100-500
QAware 23
MATRIX FACTORIZATION
[Figure: the document-term matrix A (d × t) factorized into a product of three smaller matrices whose inner dimension is n]
Decompose the document-term-matrix A into smaller matrices
n = topic space (much smaller than t), typically n = 100-500
QAware 24
MATRIX FACTORIZATION
[Figure: the same factorization, with U = topics and V = document embeddings]
Reconstruction of values
QAware 25
MATRIX FACTORIZATION
[Figure: multiplying the three factors back together reconstructs (approximates) the original matrix values]
Uses singular value decomposition
All of you did this in the first semester (=> "Linear algebra"). (Do you remember?)
Unfortunately (tested on TREC7-SDR)
topic 0 = world news london boston eddie mair lisa mullins radio public international cnn pri npr washington back next ahead bbc coming …
Uhhh.
And it is slow.
And uhhh…
QAware 26
LSI - LATENT SEMANTIC INDEXING
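For the curious, LSI in a few lines (a sketch with scikit-learn's TruncatedSVD and a toy-sized n; not the talk's code):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["a cat can either be a house cat a farm cat or a feral cat",
        "a dog is a loyal companion",
        "red thai curry with chicken and rice"]
tfidf = TfidfVectorizer()
A = tfidf.fit_transform(docs)                 # document-term matrix
svd = TruncatedSVD(n_components=2)            # n = topic space (100-500 for real corpora)
doc_topics = svd.fit_transform(A)             # document embeddings in the latent space
query_topics = svd.transform(tfidf.transform(["feral cat"]))
print(cosine_similarity(query_topics, doc_topics))
print(svd.components_)                        # the "topics" (term loadings per latent dimension)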
Same idea: matrix decomposition
But different approach: expectation-maximization algorithm (EM)
Will learn co-occurrences of words in documents
All values are positive
Easy to implement (next slide)
Fast
QAware 27
PLSA - PROBABILISTIC LATENT SEMANTIC
ANALYSIS
for step in range(20):  # <-- this is way way way too simple to work in "reality(tm)"
    #-- RESET ACCUMULATORS ----------------------------------------------------
    padn = [[0] * aspects for j in range(docs )]
    pawn = [[0] * aspects for j in range(words)]
    pan  = [0] * aspects
    #-- PROCESS ALL DOCUMENTS -------------------------------------------------
    for docId in range(docs):
        #-- ITERATE OVER ALL WORDS --------------------------------------------
        for wordId in range(words):
            #-- E-STEP (THIS CAN BE DONE MUCH MORE EFFICIENTLY) ---------------
            pzdw = [pad[docId][a] * paw[wordId][a] / pa[a] for a in range(aspects)]
            norm(pzdw)                            # normalize to P(aspect | doc, word)
            scale(pzdw, arrDoc[docId][wordId])    # weight by the word count in the document
            #-- M-STEP --------------------------------------------------------
            add(padn[docId ], pzdw)
            add(pawn[wordId], pzdw)
            add(pan        , pzdw)
    #-- SAVE ACCUMULATORS -----------------------------------------------------
    pad = padn
    paw = pawn
    pa  = pan
QAware 28
PLSA - PROBABILISTIC LATENT SEMANTIC
ANALYSIS
This actually works and is not very far away from a practical implementation!
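The snippet leaves a few names undefined: pad, paw and pa have to start out as random positive (normalized) values, arrDoc holds the word counts per document, and norm, scale and add are small vector helpers. Minimal versions of the helpers might look like this (an assumption, not the original code):

def norm(v):                       # scale a list in place so its entries sum to 1
    s = sum(v)
    if s > 0:
        for i in range(len(v)):
            v[i] /= s

def scale(v, factor):              # multiply every entry of a list in place by a scalar
    for i in range(len(v)):
        v[i] *= factor

def add(acc, v):                   # element-wise accumulate v into acc
    for i in range(len(v)):
        acc[i] += v[i]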
topic0
0.01928 kaczynski
0.01246 israel
0.00936 arafat
0.00910 israeli
0.00800 palestinian
0.00613 netanyahu
0.00607 minister
0.00584 peace
0.00536 judge
0.00506 suicide
0.00500 albright
0.00479 himself
0.00478 unabomber
0.00475 prime
0.00463 trial
0.00462 theodore
0.00450 said
0.00447 even
...
topic11
0.02723 hong
0.02691 kong
0.00811 human
0.00640 health
0.00614 flu
0.00611 those
0.00572 virus
0.00537 said
0.00524 any
0.00523 genetic
0.00487 government
0.00424 right
0.00420 them
0.00414 may
0.00414 don
0.00409 want
0.00405 china
0.00401 million
...
topic36
0.03553 space
0.02133 mir
0.01431 station
0.01076 mission
0.00927 crew
0.00926 russian
0.00924 nasa
0.00904 mars
0.00782 shuttle
0.00573 craft
0.00536 launch
0.00516 foale
0.00516 astronaut
0.00515 earth
0.00488 its
0.00479 tomorrow
0.00467 pathfinder
0.00466 off
...
QAware 29
PLSA ON TREC7-SDR (STANDARD CORPUS)
n=50
Project document into latent semantic space (= matrix multiplication)
Project query into latent semantic space (= matrix multiplication)
Calculate cosine similarity
Result
Works well (10-15% better search quality*)
*mean average precision (measurement of search quality)
QAware 30
SEMANTIC SEARCH USING PLSA
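A minimal sketch of that scoring step (numpy, assuming a learned term-topic matrix like the paw array from the PLSA loop; the toy numbers are made up):

import numpy as np

def embed(tf_vector, term_topics):
    # project a term-frequency vector into the latent topic space (a matrix multiplication)
    return tf_vector @ term_topics

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

term_topics = np.random.rand(5, 3)               # toy term-topic matrix: 5 terms, 3 aspects
doc = np.array([4, 0, 1, 1, 0], dtype=float)     # toy term counts of a document
query = np.array([1, 0, 0, 1, 0], dtype=float)   # toy term counts of a query
print(cosine(embed(doc, term_topics), embed(query, term_topics)))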
linear to non-linear
32
REDUCE THIS TO ONE DIMENSION
33
REDUCE THIS TO ONE DIMENSION
34
LINEAR PROJECTION TO ONE DIMENSION
35
LINEAR PROJECTION TO ONE DIMENSION
36
LINEAR PROJECTION TO ONE DIMENSION
QAware 37
NON-LINEAR PROJECTION TO ONE DIMENSION
QAware 38
NON-LINEAR PROJECTION TO ONE DIMENSION
QAware 39
NON-LINEAR PROJECTION TO ONE DIMENSION
QAware 40
NON-LINEAR PROJECTION TO ONE DIMENSION
neural networks
a not so gentle introduction
QAware 42
NEURAL NETWORKS
Motivation
Brains are large networks of neurons connected by axons (and somewhat successful)
Can approximate any input-output data function (universal approximation theorem)
Potentially massively parallel execution (= fast, if you are Google, Microsoft, Amazon)
Very successful with many highly complex problems
Paradigm shift
You do not try to find an algorithm that solves the problem
You only need to provide enough examples (training data)
QAware 43
NEURAL NETWORKS - COMPONENTS
Inputs/Outputs
Number: in_i = temperature, count, color value…; out = value (~ log-probability)
Neurons
Apply activation function to the sum of inputs
Bias (fixed input set to 1)
Activation function f (non-linear, monotonic, smooth, differentiable, f(0) = 0, f'(0) = 1)
Connections
Between neurons
Each connection has a weight (w_i, b)
$out = f\left(b + \sum_i w_i \cdot in_i\right)$
[Figure: a neuron with inputs in_1, in_2, in_3, weights w_1, w_2, w_3, bias input 1 with weight b, activation f, output out]
<= They define the „algorithm“
„Magically“ trained from examples!
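As a tiny illustration (not from the slides), the neuron formula in plain Python, with tanh as one activation that satisfies f(0) = 0 and f'(0) = 1:

import math

def neuron(inputs, weights, bias, f=math.tanh):
    # out = f(b + sum_i(w_i * in_i))
    return f(bias + sum(w * x for w, x in zip(weights, inputs)))

print(neuron([0.5, 1.0, -0.2], weights=[0.1, 0.4, 0.3], bias=0.0))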
QAware 44
NEURAL NETWORKS – ACTIVATION
FUNCTIONS
QAware 45
EXAMPLE
[Figure: a neuron with inputs x and y, both weights 1, activation f, output out]
QAware 46
EXAMPLE
[Figure: a neuron with inputs x and y, both weights 1, activation f, output out]
x  y  weighted-sum  f(w-sum) = out
0  0
1  0
0  1
1  1
QAware 47
EXAMPLE
[Figure: a neuron with inputs x and y, both weights 1, activation f, output out]
x  y  weighted-sum  f(w-sum) = out
0  0  0             0
1  0  1             1
0  1  1             1
1  1  2             1
QAware 48
EXAMPLE
[Figure: a neuron with inputs x and y, both weights 1, activation f, output out]
x  y  weighted-sum  f(w-sum) = out
0  0  0             0
1  0  1             1
0  1  1             1
1  1  2             1
=> That's the Boolean OR
QAware 49
EXAMPLE
[Figure: a neuron with inputs x and y, both weights 1, activation f, output out]
x  y  weighted-sum  f(w-sum) = out
0  0  0             0
1  0  1             1
0  1  1             1
1  1  2             1
=> That's the Boolean OR
(at least pretend to be impressed)
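The OR neuron from the table in a few lines of Python (the activation here simply clamps the weighted sum at 1, which matches the f values in the table):

def or_neuron(x, y):
    weighted_sum = 1 * x + 1 * y     # both weights are 1, bias is 0
    return min(weighted_sum, 1)      # f(0) = 0, f(1) = 1, f(2) = 1

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", or_neuron(x, y))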
QAware 50
USUALLY MUCH MORE COMPLEX
QAware 51
LAYER NAMES
Input layer
Hidden layer
Output layer
QAware 52
AND MANY HIDDEN LAYERS (DEEP LEARNING)
Simple features
Complex features
QAware 53
NEURAL NETWORKS FOR IMAGE RECOGNITION
probability for cat
probability for dog
probability for thai curry
Told you this would not be gentle!
QAware 54
AND NOW WHAT?
probability for cat
probability for dog
probability for thai curry
QAware 55
SHOW EXAMPLE…
probability for cat
probability for dog
probability for thai curry
QAware 56
…AND EXPECTATION
probability for cat = 1
probability for dog = 0
probability for thai curry = 0
QAware 57
…AND UPDATE THE WEIGHTS
probability for cat = 1
probability for dog = 0
probability for thai curry = 0
And that is the beauty of neural networks…
Automagically learn weights
QAware 58
UNDER THE HOOD
CIFAR-10 - Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009
Initialization
Set all weights to random values
Training
Show a training example
Adjust weights a bit into the direction of the correct answer (=> gradient descent)
Repeat (until „happy“)
QAware 59
TRAIN A NEURAL NETWORK
CIFAR-10 - Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009
Python (Keras)
model.fit(images,
          expectedClasses,
          epochs=50,
          batch_size=32)
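For context, a slightly fuller (still minimal) Keras sketch around that model.fit call; the layer sizes are illustrative guesses, only the 32x32x3 input shape comes from CIFAR-10:

import tensorflow as tf

(images, expectedClasses), _ = tf.keras.datasets.cifar10.load_data()
images = images / 255.0                                    # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),      # CIFAR-10 images
    tf.keras.layers.Dense(128, activation="relu"),         # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),       # one output per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(images, expectedClasses, epochs=50, batch_size=32)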
QAware 60
IT IS NOT QUITE THAT SIMPLE
In theory (but only there)
The one-hidden-all-dense-layer model approach can handle every problem
QAware 61
IT IS NOT QUITE THAT SIMPLE
In theory (but only there)
The one-hidden-all-dense-layer model approach can handle every problem
In practice
Training (such a model) can take ages (and probably will not be good)
Much better: configuration specifically tailored to the problem
Very difficult to find if you need to start from scratch (research)
Creating good training data can be hard
QAware 62
IT IS NOT QUITE THAT SIMPLE
In theory (but only there)
The one-hidden-all-dense-layer model approach can handle every problem
In practice
Training (such a model) can take ages (and probably will not be good)
Much better: configuration specifically tailored to the problem
Very difficult to find if you need to start from scratch (research)
Creating good training data can be hard
Good news
Many well-proven configurations
Many pre-trained and ready-to-use models
Adapt a pre-trained model to your problem (=> transfer learning)
QAware 63
IT IS NOT QUITE THAT SIMPLE
semantic search with neural networks
QAware 65
HOW TO HANDLE TEXT WITH A NN?
Text
The king and the queen live in the castle.
One hot encoding
One input for each word in the fixed vocabulary.
[Figure: one input node per vocabulary word: queen, the, live, and, castle, king]
QAware 66
INFORMATION FUNNEL
Input format = output format
And a neural network in between
Why? (And why that model?)
We’ll see!
[Figure: one-hot input nodes (queen, the, live, and, castle, king), a neural network in between, and matching output nodes]
QAware 67
INFORMATION FUNNEL
Text
The king and the queen live in the castle.
Training
For all sentences in Wikipedia…
Input: one word of the sentence
Output: all words in the sentence
[Figure: one-hot input nodes and output nodes for queen, the, live, and, castle, king]
QAware 68
INFORMATION FUNNEL
The king and the queen live in the castle.
[Figure: one-hot input nodes and output nodes for queen, the, live, and, castle, king]
QAware 69
INFORMATION FUNNEL
[Figure: one-hot input nodes and output nodes for queen, the, live, and, castle, king]
The king and the queen live in the castle.
QAware 70
INFORMATION FUNNEL
[Figure: one-hot input nodes and output nodes for queen, the, live, and, castle, king]
The king and the queen live in the castle.
QAware 71
INFORMATION FUNNEL
We forced the neural network to pass the information through a funnel.
In order to reconstruct the input it needs to learn relations between words.
[Figure: one-hot input nodes and output nodes for queen, the, live, and, castle, king]
QAware 72
INFORMATION FUNNEL
Embeddings
The output of the neural network after the funnel.
[Figure: input nodes (queen, the, live, and, castle, king) feeding the funnel; the activations after the funnel are the embedding]
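A minimal Keras sketch of such a funnel (not the talk's code; vocabulary and funnel sizes are made up): one-hot word in, bag of words of the sentence out, with the narrow layer in the middle providing the embedding.

import tensorflow as tf

vocab_size, funnel_size = 10000, 300
model = tf.keras.Sequential([
    tf.keras.layers.Dense(funnel_size, input_shape=(vocab_size,), name="funnel"),
    tf.keras.layers.Dense(vocab_size, activation="sigmoid"),   # reconstruct the sentence's words
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# After training, the embedding of word i is row i of the funnel layer's weight matrix
# (equivalently, the funnel activations for the one-hot vector of word i).
embeddings = model.get_layer("funnel").get_weights()[0]        # shape: (vocab_size, funnel_size)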
Word embeddings
Trained on a large number of input sentences
Not all use a neural network to generate the embedding
Freely available, ready for usage
(http://nlp.stanford.edu/data/glove.840B.300d.zip)
Search with word embeddings
Instead of the PLSA embeddings, we can use the GloVe embeddings
As the vector, use the average of the word vectors of the document or the query
Cosine similarity
QAware 73
GloVe/Word2Vec/fastText
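A minimal sketch of that recipe (assuming the unzipped glove.840B.300d.txt from the link above; only the first 100000 lines are loaded to keep it quick):

import numpy as np

def load_glove(path, limit=100000):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            parts = line.rstrip().split(" ")
            word, values = " ".join(parts[:-300]), parts[-300:]   # 300-dimensional vectors
            vectors[word] = np.array(values, dtype=float)
    return vectors

def embed(text, vectors):
    words = [vectors[w] for w in text.lower().split() if w in vectors]
    return np.mean(words, axis=0)                                 # average word vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

glove = load_glove("glove.840B.300d.txt")
print(cosine(embed("the cat sat on the mat", glove),
             embed("a kitten sleeps on the carpet", glove)))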
QAware 74
zcat glove.840B.300d.txt.gz | grep cat
cat -0.15067 -0.024468 -0.23368 -0.23378 -0.18382 0.32711 -0.22084 -0.28777 0.12759 1.1656 -0.64163 -0.098455 -0.62397
0.010431 -0.25653 0.31799 0.037779 1.1904 -0.17714 -0.2595 -0.31461 0.038825 -0.15713 -0.13484 0.36936 -0.30562 -0.40619 -
0.38965 0.3686 0.013963 -0.6895 0.004066 -0.1367 0.32564 0.24688 -0.14011 0.53889 -0.80441 -0.1777 -0.12922 0.16303 0.14917 -
0.068429 -0.33922 0.18495 -0.082544 -0.46892 0.39581 -0.13742 -0.35132 0.22223 -0.144 -0.048287 0.3379 -0.31916 0.20526
0.098624 -0.23877 0.045338 0.43941 0.030385 -0.013821 -0.093273 -0.18178 0.19438 -0.3782 0.70144 0.16236 0.0059111 0.024898
-0.13613 -0.11425 -0.31598 -0.14209 0.028194 0.5419 -0.42413 -0.599 0.24976 -0.27003 0.14964 0.29287 -0.31281 0.16543 -0.21045
-0.4408 1.2174 0.51236 0.56209 0.14131 0.092514 0.71396 -0.021051 -0.33704 -0.20275 -0.36181 0.22055 -0.25665 0.28425 -
0.16968 0.058029 0.61182 0.31576 -0.079185 0.35538 -0.51236 0.4235 -0.30033 -0.22376 0.15223 -0.048292 0.23532 0.46507 -
0.67579 -0.32905 0.08446 -0.22123 -0.045333 0.34463 -0.1455 -0.18047 -0.17887 0.96879 -1.0028 -0.47343 0.28542 0.56382 -
0.33211 -0.38275 -0.2749 -0.22955 -0.24265 -0.37689 0.24822 0.36941 0.14651 -0.37864 0.31134 -0.28449 0.36948 -2.8174 -0.38319
-0.022373 0.56376 0.40131 -0.42131 -0.11311 -0.17317 0.1411 -0.13194 0.18494 0.097692 -0.097341 -0.23987 0.16631 -0.28556
0.0038654 0.53292 -0.32367 -0.38744 0.27011 -0.34181 -0.27702 -0.67279 -0.10771 -0.062189 -0.24783 -0.070884 -0.20898 0.062404
0.022372 0.13408 0.1305 -0.19546 -0.46849 0.77731 -0.043978 0.3827 -0.23376 1.0457 -0.14371 -0.3565 -0.080713 -0.31047 -
0.57822 -0.28067 -0.069678 0.068929 -0.16227 -0.63934 -0.62149 0.11222 -0.16969 -0.54637 0.49661 0.46565 0.088294 -0.48496
0.69263 -0.068977 -0.53709 0.20802 -0.42987 -0.11921 0.1174 -0.18443 0.43797 -0.1236 0.3607 -0.19608 -0.35366 0.18808 -0.5061
0.14455 -0.024368 -0.10772 -0.0115 0.58634 -0.054461 0.0076487 -0.056297 0.27193 0.23096 -0.29296 -0.24325 0.10317 -0.10014
0.7089 0.17402 -0.0037509 -0.46304 0.11806 -0.16457 -0.38609 0.14524 0.098122 -0.12352 -0.1047 0.39047 -0.3063 -0.65375 -
0.0044248 -0.033876 0.037114 -0.27472 0.0053147 0.30737 0.12528 -0.19527 -0.16461 0.087518 -0.051107 -0.16323 0.521 0.10822 -
0.060379 -0.71735 -0.064327 0.37043 -0.41054 -0.2728 -0.30217 0.015771 -0.43056 0.35647 0.17188 -0.54598 -0.21541 -0.044889 -
0.10597 -0.54391 0.53908 0.070938 0.097839 0.097908 0.17805 0.18995 0.49962 -0.18529 0.051234 0.019574 0.24805 0.3144 -
0.29304 0.54235 0.46672 0.26017 -0.44705 0.28287 -0.033345 -0.33181 -0.10902 -0.023324 0.2106 -0.29633 0.81506 0.038524
0.46004 0.17187 -0.29804
QAware 75
GloVe – most similar vectors
dist castle
22.639 | castles
25.825 | fortress
33.900 | manor
37.224 | palace
38.010 | medieval
38.579 | chateau
40.544 | citadel
40.544 | mansion
41.449 | tower
42.379 | abbey
42.604 | knights
42.699 | ruins
43.927 | knight
44.924 | fort
46.041 | hill
46.250 | haunted
46.495 | prince
46.921 | royal
48.447 | ruined
48.455 | towers
dist curry
26.513 | chilli
29.316 | curries
30.726 | soup
30.958 | chili
33.781 | gravy
33.958 | curried
34.262 | chicken
34.400 | sauce
35.007 | stew
35.989 | fried
36.195 | noodle
36.445 | fry
36.482 | spicy
37.188 | rice
37.604 | ginger
37.865 | cooked
37.996 | onion
38.693 | potato
39.988 | salad
41.579 | dish
dist cat
14.867 | cats
16.691 | kitten
20.210 | dog
20.405 | kitty
22.314 | pet
25.260 | feline
26.246 | ferret
27.326 | kittens
29.543 | dogs
29.871 | puppy
30.774 | rabbit
31.646 | pets
32.985 | animal
41.642 | bear
42.115 | animals
43.053 | one
44.076 | sure
44.170 | kind
44.207 | when
44.628 | put
Word embeddings are context-free
Embedding of a sentence from word embeddings
sentence embedding = average of term embeddings
Each term always has the same embedding
But the meaning of a word depends on the context
mouse (rodent, trap, computer, eye, garlic …)
cell (phone, prison, blood/skin, solar, = some people, hermitage…)
Sentence embeddings
Embedding of term depends on the context
QAware 76
BERT – WHY SENTENCE EMBEDDINGS?
Use word embeddings for every position in a sentence (word => sentence)
Take a gigantic neural network of a special type (a Transformer)
Input: the sentence where one word has been blanked out
Output: the complete sentence
The king and the queen live in the castle.
The ____ and the queen live in the castle.
The king and the _____ live in the castle.
The king and the queen live in the ______.
Finally, let a gazillion TensorFlow units burn on absurd amounts of data
QAware 77
BERT – THE ROUGH IDEA
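To get a feel for that training objective, a few lines with a ready-made masked-language model (a sketch using the Hugging Face transformers library; bert-base-uncased is just one commonly available choice, not necessarily the model from the talk):

from transformers import pipeline

unmask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmask("The king and the [MASK] live in the castle."):
    print(prediction["token_str"], round(prediction["score"], 3))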
!pip install -U sentence-transformers
!pip install scipy
import scipy
from sentence_transformers import SentenceTransformer
sentences = ["the sun shines",
"the sky is blue",
"we have good weather",
"bert is amazing",
"sentence embeddings rock",
"it is raining",
"uhh i need a rain coat",
"that's pretty bad weather"]
model = SentenceTransformer("roberta-large-nli-mean-tokens")
sentence_embeddings = model.encode(sentences)
distances = scipy.spatial.distance.cdist(sentence_embeddings, sentence_embeddings, "cosine")
print(distances)
QAware 78
BERT – CODE
!pip install -U sentence-transformers
!pip install scipy
import scipy
from sentence_transformers import SentenceTransformer
sentences = ["the sun shines",
"the sky is blue",
"we have good weather",
"bert is amazing",
"sentence embeddings rock",
"it is raining",
"uhh i need a rain coat",
"that's pretty bad weather"]
model = SentenceTransformer("roberta-large-nli-mean-tokens")  # <== plenty to choose from
sentence_embeddings = model.encode(sentences)
distances = scipy.spatial.distance.cdist(sentence_embeddings, sentence_embeddings, "cosine")
print(distances)
QAware 79
BERT – CODE
                           |  0   1   2   3   4   5   6   7
---------------------------+--------------------------------
the sun shines           0 |  .   .   .   .   .   .   .   .
the sky is blue          1 | 76   .   .   .   .   .   .   .
we have good weather     2 | 81  75   .   .   .   .   .   .
                           |  .   .   .   .   .   .   .   .
bert is amazing          3 | 52  43  61   .   .   .   .   .
sentence embeddings rock 4 | 46  40  51  69   .   .   .   .
QAware 82
BERT – EXAMPLES
QAware 83
Semantic search summary
Name             | Latent semantic indexing     | Probabilistic latent semantic indexing | Word2vec, GloVe, FastText… | BERT + Variations
Approach         | Matrix decomposition via SVD | Matrix decomposition via EM-algorithm  | Neural network             | Neural network
Interpretability | ?                            | very good                              | good                       | okay
Level            | word                         | word                                   | word                       | sentence
Ready-to-use?    | no and difficult             | nope, feasible                         | yes, easy                  | yes, very easy
Type             | linear                       | linear                                 | non-linear                 | non-linear
Quality          | meh                          | good                                   | good                       | yihaaa!
Joerg Viechtbauer
joerg.viechtbauer@qaware.de
Thank you
December 10th, 2020
