Natural Language Processing and Search Intent Understanding C3 Conductor 2019 Dawn Anderson

“What Happens in
Vagueness Stays in
Vagueness”
Dawn Anderson

If I came into
your hardware
store…
Image Attribution: Acabashi [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)]

And asked for “fork handles”

It kind of sounds like “four candles”

What about if I said “Got
any Os”?

It reads like O’s but it sounds like
other things as well as … 0 (zero)’s …
for the gate

Homophones - Examples ‘four candles’ and ‘fork handles’

What about, if, in the very next sentence, I asked “Got any P’s?”

You’d
presume I
meant “P’s”

Because of the
context of the
previous question

The Two Ronnies – ‘British Comedians’
Name Droppers
The Confusing Library
Four Candles
Crossed Lines
Mastermind

Almost every
other word in the
English language
has multiple
meanings

“The meaning of a word is
its use in a language”
(Ludwig Wittgenstein, 1953)
Image attribution: Moritz Nähr [Public domain]

The most important thing to remember
is I have a Pomeranian called Bert

Disclaimer: I am NOT a data
scientist

But I will be talking about some concepts
covering:
• Data Science
• Information Retrieval
• Algorithms
• Linguistics
• Information Architecture
• Library Science
• Category theory

Since… These are all areas
connected to how search
engines (try to) find the
right information, for the
right informational need at
the right time for the right
user

‘information retrieval’ in web search
To extract informational resources to meet a
search engine user’s information need at time
of query.

Let us first take a very
simplistic look at how we
know search engines work

It’s just like gathering & organizing
books in a library system or using an
old card index system

But instead we are taking
words (or phrases) and
recording where they live

EXAMPLE
Inverted Index:
Text to Doc ID
Mapping

And then picking one (or some) of
these ‘word homes’ (documents) to
meet a query
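
The 'words and where they live' idea can be sketched in a few lines. This is only a minimal illustration of an inverted index, not how any real search engine is built; the three documents and the function names `build_inverted_index` and `search` are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document IDs it appears in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical mini collection: doc ID -> text
docs = {
    1: "four candles for the hardware store",
    2: "fork handles for garden forks",
    3: "candles and handles",
}
index = build_inverted_index(docs)
```

Calling `search(index, "fork handles")` returns only document 2, whilst `search(index, "candles")` returns documents 1 and 3. The index finds the 'word homes'; it says nothing yet about which home is the right one, or in which order.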

The hard part is knowing how to
choose the right documents, in
the right order, at the right time

Since ‘relevance’ to one user is not ‘relevance’ to another

For some queries there
can be only one answer

And even - Zero-query
queries - The user is the
query

Where the user
might be looking for
a restaurant whilst
travelling at 60 mph
on a highway?

So…
Just what is the right information need for the
right user at the right time?

Relevance Matching to Query Requires:
Understanding meaning of words in content & query (What?)
Understanding meaning of word's context in content & query (What?)
Understanding of user’s context (Who / Where / When / Why?)
Understanding of collaboration (Past queries / popularity /
reinforcement / learning to rank)

Matching ‘content’ with
‘intent’ requires increasing
precision

A lot of content is kind of unfocused

Each document (page) is largely just a ‘stream of words’

Every day there are huge volumes of new indexable data

Since every Single Tweet is a
new web Page

Many websites
(and
webpages) are
not logically
organized at all
Unstructured data is voluminous
Filled with irrelevance
Lacks focus
Riddled with nuance
Lots of meaningless text and further
ambiguating jabber

Most text-filled web pages
could be considered
unstructured, noisy data
Blog == Blah Blah

Structured versus unstructured data
• Structured data – high
degree of organization
• Readily searchable by
simple search engine
algorithms or known search
operators (e.g. SQL)
• Logically organized
• Often stored in a relational
database

When we compare them with highly organized relational database systems

A form of structured (& semi-structured) data – Entities, Knowledge
Graphs, Knowledge Bases & Knowledge Repositories

“Entities help to bridge the
gap between structured
and unstructured data”
(Krisztian Balog, ECIR 2019
Keynote)

Author of Entity-Oriented Search
– Free on Open Access

Using structured data is an obvious way to
disambiguate in both content & query
understanding

Two things (entities) are similar if
they have a not so distant common
ancestor
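
A rough sketch of the 'common ancestor' idea, assuming a tiny hand-made IsA hierarchy. The `IS_A` mapping and the function names are hypothetical, and real knowledge bases use far richer similarity measures than counting steps.

```python
# Hypothetical IsA hierarchy: child -> parent
IS_A = {
    "pomeranian": "dog",
    "dog": "mammal",
    "jaguar": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "pancake": "food",
}


def ancestors(entity):
    """Return the chain [entity, parent, grandparent, ...] up to the root."""
    chain = [entity]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def common_ancestor_distance(a, b):
    """Steps from a plus steps from b to their nearest shared ancestor.

    Returns None when the two entities share no ancestor at all.
    """
    chain_a = ancestors(a)
    chain_b = ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    return None
```

Here a Pomeranian and a jaguar meet at 'mammal' after four steps in total, a Pomeranian and a dog are one step apart, and a Pomeranian and a pancake never meet. The smaller the number, the 'less distant' the common ancestor.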

Knowledge
Graphs using
triples
(subject,
predicate,
object)
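
A minimal sketch of how (subject, predicate, object) triples might be stored and queried. The `Triple` type, the predicate names and the `match` helper are assumptions for illustration only, not the schema of any real knowledge graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative facts only; predicate names are made up for this example
TRIPLES = [
    Triple("Bert", "isA", "Pomeranian"),
    Triple("Pomeranian", "isA", "Dog"),
    Triple("Bert", "ownedBy", "Dawn Anderson"),
    Triple("Mozart", "isA", "Composer"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

Asking `match(TRIPLES, subject="Bert", predicate="isA")` returns the single fact that Bert is a Pomeranian. Leaving a field as `None` acts as a wildcard, which is the basic pattern behind most triple queries.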

IsA Concepts in entities
& their relationships
can be mapped &
categorised

A well organized
website can
resemble a
knowledge graph

Since a website
is NOT ALL
unstructured
data, even
before
structured
data markup
It can have a hierarchy
It can have weighted sections
It can have metadata
It (often) has a tree like structure

As long as there is
understanding of
notions of
categorical
‘inheritance’

Semi-structured data
• Hierarchical nature of a website
• Tree structure
• Well sectioned and including clear containers and meta headings
• An ontology map between semi-structured and structured data

Internal linking can be
as much about
ontology mapping as
crawl optimisation

And many pages lack the things that emphasise important topics and structure

Ontology Driven Natural Language Processing
Image credit: IBM
https://www.ibm.com/developerworks/community/blogs/nlp/entry/ontology_driven_nlp

But even named
entities can be
polysemic

Did you mean?
•Amadeus Mozart
(composer)
•Mozart Street
•Mozart Cafe

And verbally…Who
(what) are you talking
about?
“Lyndsey Doyle” or
“Linseed Oil”?

And not
everyone or
thing is mapped
to the
knowledge
graph

On their own single words
have no semantic meaning

Even if we understand the
entity (thing) itself we need
to understand the word’s context

Semantic context matters
•He kicked the bucket
•I have yet to cross that off my
bucket list
•The bucket was filled with
water

Unfortunately… when things lack topical focus and relevance

How can search
engines fill in the
gaps between
named entities?

When they can’t even tell the difference between Pomeranians and pancakes

They need
‘Text
cohesion’
Cohesion is the grammatical and
lexical linking within a text
or sentence that holds a text
together and gives it meaning.
Without surrounding words the
word bucket could mean
anything in a sentence

If I said to you…
“I’ve got a new
jaguar”

“It’s in the garage”
(sidenote: this is not my garage)

You probably
wouldn’t expect
to see this

Because garage and car go together

The ‘jaguar’ (cat) is the
odd one out

Garage and car and jaguar ‘co-occur’ together in common
language - ‘garage’ added context to ‘jaguar’ the ‘car’

But if we understood a topic is about felines we
might be more confident of a jaguar ‘cat’

“You shall know a word by
the company it keeps”
(John Rupert Firth, 1957)

Natural Language
Disambiguation

Probabilistic ‘Guesstimation’

Teaching machines to
understand which words
live near each other in
context

Then we can disambiguate
through co-occurrence

Using ‘Distributional
Similarity’
(Relatedness)

Nearest Neighbours (Similarity) Evaluations
KNN – K-Nearest-Neighbour

2 words are similar if they
co-occur with similar words

2 words are similar if they occur in a given
grammatical relation with the same words
Harvest Peel Eat Slice

First Level Relatedness – Words
that appear together in the same
sentence

Second Level Relatedness – words
that co-occur with the same
‘other’ words

Coast and
Shore
Example
Coast and shore have a similar
meaning
They co-occur in first and second
level relatedness documents in a
collection
They would receive a high score in
similarity
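
The coast and shore example can be sketched with simple co-occurrence counts. This is a toy illustration with a made-up five-sentence collection, using cosine similarity over raw counts; it is an assumption for the sake of the example, not the scoring any particular search engine uses.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """For each word, count the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini collection
sentences = [
    "waves hit the coast",
    "waves hit the shore",
    "boats left the coast",
    "boats left the shore",
    "he kicked the bucket",
]
vectors = cooccurrence_vectors(sentences)
```

In this toy collection 'coast' and 'shore' never appear in the same sentence, yet they keep exactly the same company (waves, hit, boats, left), so `cosine(vectors["coast"], vectors["shore"])` comes out far higher than `cosine(vectors["coast"], vectors["bucket"])`. That is second level relatedness in miniature.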

Language models are trained on
very large text corpora or
collections (loads of words) to
learn distributional similarity

Vector representations of words (Word Vectors)

Models learn the
weights of the
similarity and
relatedness distances

An important
part of this is
‘Part of
Speech’ (POS)
tagging

Continuous Bag of Words (CBoW)
(Method) or Skip-gram (the reverse of
CBoW)
Continuous Bag of Words -
Takes the words inside a context
window of size n around a target
word, ignores their order, and
predicts the target word. Skip-gram
does the reverse and predicts the
surrounding words from the target.
Both build vector models and word
embeddings in which similar or
related words sit close together
(measured with e.g. Euclidean distance)

A Moving Word ‘Context Window’
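
A small sketch of the moving context window, assuming whitespace tokens and a window counted in words on each side of the target. The function names are made up for the example; real Word2Vec training adds sampling and a neural network on top of pairs like these.

```python
def context_windows(tokens, window=2):
    """Yield (target, context_words) as the window slides over the tokens."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right


def skipgram_pairs(tokens, window=2):
    """Flatten the windows into (target, context_word) training pairs."""
    return [
        (target, context_word)
        for target, context in context_windows(tokens, window)
        for context_word in context
    ]


tokens = "he kicked the bucket".split()
windows = list(context_windows(tokens, window=1))
```

With a window of 1, the target 'kicked' sees 'he' and 'the', whilst 'bucket' only sees 'the'. CBoW would use each context list to predict its target; Skip-gram would use the flattened pairs to predict each context word from the target.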

And build vector
space models for
word
embeddings
king - man +
woman = queen
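
The 'king - man + woman' arithmetic can be sketched with hand-made two-dimensional vectors. These numbers are invented purely for illustration (think of the two dimensions as 'royalty' and 'gender'); real embeddings have hundreds of learned dimensions and the analogy only holds approximately.

```python
import math

# Hypothetical 2-D vectors: (royalty, gender). Not learned from any corpus.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "bucket": (-1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(
        x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    candidates = [word for word in VECTORS if word not in {a, b, c}]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))
```

`analogy("king", "man", "woman")` lands on 'queen' because subtracting 'man' removes the gender component and adding 'woman' puts the opposite one back, leaving the royalty component untouched.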

TensorFlow (tool)
& e.g. Word2Vec
or GloVe
(language models)

Concept2Vec
Ontological
concepts

Google’s Topic Layer is a new
Layer in the Knowledge Graph

Example Microsoft Concept Distribution Layer

Past language models
(e.g. Word2Vec &
GloVe) built
context-free word
embeddings

Did you mean “bank”?
Or did you mean “bank”?

Most language models are uni-directional
Source Text
Writing a list of random sentences is harder than I initially thought it would be
They can traverse over the word’s context window from only left to right or
right to left. Only in one direction, but not both at the same time

They can only look at words in the context
window before and not the words in the rest of
the sentence. Nor the sentence that follows

Often the next
sentence REALLY
matters

I remember the last words my Grandpa
said before he kicked the bucket…
…How far do you reckon I could kick this
bucket?

BERT
(Bidirectional
Encoder
Representations
from
Transformers)

BERT is different. BERT uses bi-directional
language modelling. The FIRST to do this
Source Text
BERT can see both the left and the right hand side of the target word

BERT has been open sourced
by Google AI

Google’s move to
open source BERT
may change natural
language processing
forever

BERT uses ‘Transformers’ &
‘Masked Language Modelling’

Masked Language
Modelling Stops
The Target Word
From Seeing Itself
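
A minimal sketch of the masking step only, with no model behind it. The `[MASK]` token matches the placeholder BERT uses, but the helper names and the fixed mask position are assumptions for illustration; BERT itself picks positions at random during pre-training.

```python
MASK = "[MASK]"


def mask_token(tokens, position):
    """Hide the token at `position`; return the masked tokens and the hidden label."""
    masked = list(tokens)
    label = masked[position]
    masked[position] = MASK
    return masked, label


def visible_context(masked_tokens, position):
    """Return the words to the left and to the right of the masked position."""
    return masked_tokens[:position], masked_tokens[position + 1:]


tokens = "he kicked the bucket yesterday".split()
masked, label = mask_token(tokens, 3)
left, right = visible_context(masked, 3)
```

The model has to guess the hidden label ('bucket') from `left` and `right` together. Because the target word is replaced before the model sees the sentence, looking in both directions at once does not let the word 'see itself'.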

BERT can see the WHOLE
sentence on either side of a
word (contextual language
modelling) and all of the
words almost at once

BERT has been pre-trained on a
lot of words … on the whole of
the English Wikipedia (2,500
million words), plus the
BooksCorpus (800 million words)

BERT can identify which sentence
likely comes next from two choices

The ML & NLP Community are very excited about
BERT

Vanilla BERT provides a pre-trained
starting point layer for Neural
Networks in machine learning &
diverse natural language tasks

Everybody wants to ‘Build-a-BERT’.
Now there are loads of
algorithms built on BERT

Whilst BERT has been pre-trained
on Wikipedia it is fine-tuned
on ‘question and answer
datasets’

Researchers compete over Natural Language Understanding with e.g.
SQuAD (Stanford Question Answering Dataset)

BERT now even beats the human
reasoning benchmark on SQuAD

Not to be outdone – Microsoft also extends on BERT with MT-DNN

In GLUE – It’s
Humans, MT-DNN,
then BERT

It’s not just words in content
that need to be
disambiguated though

How can search
engines
understand
intents?

Query Classifications -
There are some we know
of already

We need to understand how
queries have been classified
by search engines

Google’s
Quality Raters
Guidelines
simplify &
extend these
Know query == Informational
Website query == Navigational
Do query == Transactional
Visit in person == Local intent

There are also
several types of
queries
(Krisztian Balog,
ECIR, 2019)
Keyword queries (Normal keyword queries)
Keyword++ queries (Faceted / filtered
queries)
Zero-Query queries (User is the query)
Natural language queries
Structured queries (e.g. SQL)

‘Dresses’ is clearly classified
as a transactional query

Whilst keyword research
tools are useful... The
SERPs tell us some
secrets on ‘initial intent’
detection

But if I searched for “fork handles”

Would I mean
“Handles for
forks”?

The organic results are NOT for fork handles

The Two Ronnies – ‘Four Candles’

No high organic ranking candles or forks

Apart from… eBay selling an actual fork handle in position 8

Almost completely
‘informational & video
results’ (not
transactional)

The overwhelming ‘intent’
was detected

Even in voice search & assistant

Temporal Dynamic Intent (Burstiness) is a huge factor for intent

At certain times far more intents will be transactional

“dresses”, “shoes”, “bags”
Really means
“buy dresses”, “buy shoes”, “buy bags”, “dress sales”, “shoe sales”

And sometimes temporal
queries spike for reasons
only a particular
audience would
understand

Sometimes it is other events which trigger unexpected queries

[Four candles] interest over time

[Fork Handles] interest over time

Often intents can be
modelled according to
predicted intent shifts

Google Trends will only show interest, not intent

The exact same queries
have different intent at
different times &
different locations

Let’s Take The Query [Easter]

What did you really mean when you searched for ‘Easter’?
When did you search for ‘Easter’? | What you mostly meant
A few weeks before Easter | When is Easter?
A few days before Easter | Things to do at Easter
During Easter | What is the meaning of Easter?
Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and
Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and
prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16.
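
As a rough sketch, the Easter intent shift could be modelled as a lookup on how far away Easter is. The function name and the 7-day cutoff standing in for 'a few days' are hypothetical; the three intents are the ones listed above, and a real system would learn such shifts from query logs rather than hard-code them.

```python
def dominant_easter_intent(days_until_easter):
    """Guess the dominant intent behind the query [Easter].

    The 7-day cutoff is a hypothetical stand-in for 'a few days before'.
    Zero or negative values are treated as 'during Easter'.
    """
    if days_until_easter > 7:
        return "When is Easter?"
    if days_until_easter > 0:
        return "Things to do at Easter"
    return "What is the meaning of Easter?"
```

The same query string goes in every time; only the date changes, and with it the results that would best meet the need.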

“Easter” Query Intent Shift

Predicting the future
with Web Dynamics
• The journey to predict the future: Kira
Radinsky at TEDxHiriya

This is ‘Query
Intent Shift’

“When users’ information
needs change over time, the
ranking of results should also
change to accommodate
these needs.” (Radinsky,
2013)

Your ranking flux might well be shifting query intents at scale

The passage of time adds new meaning
sometimes too

Another Great ‘Ronnies’ Sketch BTW

The rise and fall
of the
Blackberry?

In query understanding sometimes users don’t know what they want

Sometimes the
searcher query is
a ‘cold start’
query

Broad queries might call for
result diversification due to
lack of intent detection

Search
engines may
return a
blend of
results to
match these
Freshness
Serendipity
Novelty
Diversity

The searcher has to click
around to provide
feedback on their intent
or reformulate the query
by entering something
else (‘query refinement’)

To then deliver sequential
queries with greater intent
understanding

Query Refinement says… “Your move next”

Sometimes there
are not enough
precise results
either

And result precision is not possible

So the query may be expanded or relaxed, which increases recall

Precision versus recall in search results
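
Precision and recall are simple ratios, sketched here with made-up document IDs. Precision is the share of returned results that are relevant; recall is the share of relevant documents that were returned. The 'strict' and 'relaxed' result sets are invented to show the trade-off.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents are truly relevant to the query
relevant = {1, 2, 3, 4}
strict_results = [1, 2]                # exact match on the query
relaxed_results = [1, 2, 3, 5, 6, 7]   # after query expansion / relaxation
```

The strict results score a precision of 1.0 but a recall of only 0.5. The relaxed results find one more relevant document (recall 0.75) at the cost of three irrelevant ones (precision 0.5).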

The intent tied to the
page type matters too

Different features matter more to
users depending on the domain
News (freshness)
Jobs (salary, job title, location)
Restaurants (location, cuisine)
Shopping (price)

In theory… a consolidated page should rank higher… but…

Mixing ‘intent’
on target pages
can be like oil
and water

So watch out for
random informational
blurb on ecommerce
pages

Watch out for both topical &
intent drift

And watch out you don’t lose a featured snippet by changing intent

One oar on topic – the other on intent

To keep the boat going straight

But wait… Understanding a word’s context
better is NOT understanding ‘The Whole
Context’

Where the
user is truly
‘the query’

Since humans are unique individuals

Truly PERSONAL AI is not
possible without a
PERSONAL KNOWLEDGE
GRAPH (Krisztian Balog,
ECIR 2019)

Assistant + Home + Discover
+ Search App + Desktop

A Recent Microsoft Personal Knowledge Graph Patent

Semantic
Query
Understanding
Example
Source & Image Attribution
NTent

That is a whole different
‘kettle of fish’

And that is for another
time…

In the
meantime…
remember…

Sources,
References,
further reading

• Balog, K., 2019. Entity-Oriented Search | SpringerLink. [ONLINE] Available at:
https://link.springer.com/book/10.1007/978-3-319-93935-3. [Accessed 06 May 2019].
• Boyd-Graber, J., Hu, Y. and Mimno, D., 2017. Applications of topic
models. Foundations and Trends® in Information Retrieval, 11(2-3), pp.143-296.
• ECIR 2019. 2019. Proceedings. [ONLINE] Available
at: http://ecir2019.org/proceedings/. [Accessed 06 May 2019].
• Gabrilovich, E. and Markovitch, S., 2007, January. Computing semantic relatedness
using Wikipedia-based explicit semantic analysis. In IJCAI (Vol. 7, pp. 1606-1611).
• Hakkani-Tur, D., Tur, G., Li, X. and Li, Q., Microsoft Technology Licensing LLC,
2017. Personal knowledge graph population from declarative user utterances. U.S.
Patent Application 14/809,243.
• Lim, Y.J., Linn, J., Liang, Y., Steinebach, C., Lu, W.L., Kim, D.H., Kunz, J., Koepnick, L.
and Yang, M., Google LLC, 2018. Predicting intent of a search for a particular
context. U.S. Patent Application 15/598,580.
• Lotfi, A., Bouchachia, H., Gegov, A., Langensiepen, C. and McGinnity, M., 2018.
Advances in Computational Intelligence Systems. Intelligence.

• Lohar, P., Ganguly, D., Afli, H., Way, A. and Jones, G.J., 2016. FaDA: Fast
document aligner using word embedding. The Prague Bulletin of
Mathematical Linguistics, 106(1), pp.169-179.
• McDonald, R., Brokos, G.I. and Androutsopoulos, I., 2018. Deep relevance
ranking using enhanced document-query interactions. arXiv preprint
arXiv:1809.01682.
• NTENT. 2019. Query Understanding - NTENT. [ONLINE] Available
at: https://ntent.com/technology/query-understanding/. [Accessed 09 May
2019].
• Plank, Barbara | Keynote - Natural Language Processing: -
https://www.youtube.com/watch?v=Wl6c0OpF6Ho
• Radinsky, Kira - Tedx Talk -
https://www.youtube.com/watch?v=gAifa_CVGCY
• Radinsky, K., 2012, December. Learning to predict the future using Web
knowledge and dynamics. In ACM SIGIR Forum (Vol. 46, No. 2, pp. 114-115).
ACM.

• https://www.youtube.com/watch?v=Ozpek_FrOPs
• Sherkat, E. and Milios, E.E., 2017, June. Vector embedding of
Wikipedia concepts and entities. In International conference on
applications of natural language to information systems
(pp. 418-428). Springer, Cham.
• Syed, U., Slivkins, A. and Mishra, N., 2009. Adapting to the shifting
intent of search queries. In Advances in Neural Information Processing
Systems (pp. 1829-1837).

• https://github.com/Hironsan/awesome-embedding-models
• https://nlp.stanford.edu/IR-book/html/htmledition/document-representations-and-measures-of-relatedness-in-vector-spaces-1.html
• https://www.youtube.com/watch?time_continue=790&v=wI5O-lYLBCw
• https://en.wikipedia.org/wiki/Euclidean_distance
• https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
• https://www.microsoft.com/en-us/research/blog/towards-universal-language-embeddings/

• https://nlp.stanford.edu/projects/glove/
• https://blog.acolyer.org/2018/02/22/dynamic-word-embeddings-for-evolving-semantic-discovery/
• https://pdfs.semanticscholar.org/811d/fb83f77ccb803e3202887af1030b9e77e772.pdf
• https://webmasters.googleblog.com/2019/04/search-console-reporting-for-your-sites.html
• https://www.searchenginejournal.com/google-dont-blindly-stuff-text-into-ecommerce-category-pages/299003/

• https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
• https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
• https://arxiv.org/pdf/1810.04805.pdf
• https://nlp.stanford.edu/seminar/details/jdevlin.pdf
• https://www.analyticsindiamag.com/googles-move-to-open-source-bert-may-change-nlp-forever/
• https://towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b

• http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
• Semantic similarity and relatedness as scaffolding for natural
language processing ->
https://www.youtube.com/watch?v=YTBVfQ8iBSo
• gensim: models.word2vec – Word2vec embeddings. 2019. gensim:
models.word2vec – Word2vec embeddings. [ONLINE] Available
at: https://radimrehurek.com/gensim/models/word2vec.html.
[Accessed 09 May 2019].
