Vespa tour - an introduction to building search and recommendation applications

Me
Matt Overstreet
OpenSource Connections
Stuff I do:
* Solr/Elasticsearch/Searchy-stuff
* DataStax Cassandra
* Software Development

What is it?
"Big data. Real time.
The open big data serving engine:
Store, search, rank and organize big data
at user serving time." [1]
[1] http://docs.vespa.ai/documentation/reference/nativerank.html

What does it do?
Use Vespa to build:
• Search applications
• Personalized recommendation
• Navigation pages computed on demand
• Realtime data displays - tag clouds, maps, graphs

Application Packages
A Vespa application package is the set of
configuration files and Java plugins that
together define the behavior of a Vespa
system.

Services.xml
Primary config file for an application package.
• <search> sets up the search endpoint
for Vespa queries. The default port is
8080.
• <nodes> defines the nodes required per
service. (See the reference for more on
container cluster setup.)
• <content> defines how documents are
stored and searched
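
A minimal sketch of how these three elements might nest in a single-node services.xml. The layout is modeled on Vespa's sample applications and is an assumption, not taken from the slides: the ids, the `music` document type and the `node1` host alias are hypothetical, and older Vespa releases name the container element differently. Python's standard XML parser is used only to list the structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-node services.xml; ids, document type and host alias
# are illustrative assumptions, not values from the slides.
SERVICES_XML = """
<services version="1.0">
  <container id="default" version="1.0">
    <search/>
    <nodes>
      <node hostalias="node1"/>
    </nodes>
  </container>
  <content id="music" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="music" mode="index"/>
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0"/>
    </nodes>
  </content>
</services>
"""


def top_level_layout(xml_text):
    """Map each top-level service element to the tags of its direct children."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: [grandchild.tag for grandchild in child] for child in root}


layout = top_level_layout(SERVICES_XML)
for service, children in layout.items():
    print(service, "->", children)
```

Here `<search>` and `<nodes>` sit inside the container cluster, while `<content>` carries its own `<nodes>` plus the document types it stores.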

Search Definition
Field Definitions:
• index: Create a search index for this
field
• attribute: Store this field in memory as
an attribute, for sorting, searching and
grouping
• summary: Let this field be part of the
document summary in the result set

Stopwords, Synonyms and Query Rewriting
[stopword] -> ; # (Replace them by nothing)
[stopword] :- and, or, the, be;
lotr -> lord of the rings;
[brand] -> company:[brand];
[brand] :- sony, dell, ibm, hp;
[category] +> $category:[category];
[category] :- laptop, digital camera, camera;
[destination] (in, by, at, on) [place] +> $name:[destination]
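
A toy Python sketch of what two of these rules do to a query: dropping stopwords and expanding `lotr`. It is not Vespa's semantic rule engine and handles only single-term conditions; the function and variable names are hypothetical.

```python
# [stopword] :- and, or, the, be;
STOPWORDS = {"and", "or", "the", "be"}
# lotr -> lord of the rings;
REPLACEMENTS = {"lotr": "lord of the rings"}


def rewrite(query):
    """Apply the stopword and replacement rules to a whitespace-split query."""
    rewritten = []
    for term in query.lower().split():
        if term in STOPWORDS:
            continue  # [stopword] -> ;  (replace by nothing)
        # Replacement text is emitted as-is and not re-checked for stopwords.
        rewritten.extend(REPLACEMENTS.get(term, term).split())
    return " ".join(rewritten)


print(rewrite("the lotr and hobbit"))
```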

Default Linguistics
• Tokenization on whitespace
• Kstemmer for stemming
• Changing linguistics means writing code
• Only English for stemming, waiting for community support to
extend
See http://docs.vespa.ai/documentation/linguistics.html for more
information.

Custom Linguistics
Start here: https://github.com/vespa-engine/vespa/tree/master/linguistics/src/main/java/com/yahoo/language/simple

First, Querying and a Little YQL
http://localhost:8080/search/
?yql=select * from sources * where userQuery()
&query=trees
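
The same request can be assembled in code so the YQL is URL-encoded correctly. A minimal sketch using only the Python standard library; the helper name is hypothetical, and the host and port are the local defaults shown above. It builds the URL without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(user_query, host="localhost", port=8080):
    """Return a search URL carrying the YQL statement and the user's query text."""
    params = {
        "yql": "select * from sources * where userQuery()",
        "query": user_query,
    }
    return f"http://{host}:{port}/search/?{urlencode(params)}"


url = build_search_url("trees")
print(url)
# Decoding the query string gives back the original YQL statement.
print(parse_qs(urlsplit(url).query)["yql"][0])
```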

Other YQL Examples
Numerics
select * from sources * where 500 >= price;
Grouping and aggregates
select * from sources * where sddocname contains 'purchase' |
all(group(customer) each(output(sum(price))));
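
To make the grouping expression concrete, here is a plain Python equivalent of `all(group(customer) each(output(sum(price))))`, run over made-up purchase documents. The data and function name are illustrative assumptions; Vespa computes this inside the content nodes.

```python
from collections import defaultdict

# Made-up purchase documents, for illustration only.
PURCHASES = [
    {"customer": "Smith", "price": 100},
    {"customer": "Jones", "price": 250},
    {"customer": "Smith", "price": 50},
]


def sum_price_by_customer(docs):
    """Group documents by customer and sum the price within each group."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["customer"]] += doc["price"]
    return dict(totals)


print(sum_price_by_customer(PURCHASES))
```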

NativeRank
"Out of the box" ranking for Vespa combines [1]:
• Field/Attribute Match
• Proximity
Good for text ranking, but should be combined with other features
for even better relevancy.
[1] http://docs.vespa.ai/documentation/reference/nativerank.html

Ranking Expressions
Built with query features:
nativeRank + query(deservesFreshness) * freshness(timestamp)
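
A worked example of the arithmetic in this expression, with invented feature values. The numbers are assumptions chosen for illustration, and the function name is hypothetical.

```python
def first_phase_score(native_rank, deserves_freshness, freshness):
    """nativeRank + query(deservesFreshness) * freshness(timestamp)"""
    return native_rank + deserves_freshness * freshness


# The query passes deservesFreshness; the other two values come from the document match.
print(first_phase_score(native_rank=0.5, deserves_freshness=2.0, freshness=0.25))
# With deservesFreshness set to 0 the score falls back to nativeRank alone.
print(first_phase_score(native_rank=0.5, deserves_freshness=0.0, freshness=0.25))
```

The query-side value acts as a dial: a news-like query can send a high `deservesFreshness` to favor recent documents, while other queries leave it at zero.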

More Features
Feature                             Description
term(n).significance                normalized number (between 0.0 and 1.0) describing the significance of the term
term(n).connectedness               normalized strength with which this term is connected to the previous term
queryTermCount                      number of terms in this query
fieldLength(name)                   number of terms in this field
fieldMatch(name)                    normalized measure of degree to which query and field matched
fieldMatch(name).queryCompleteness  normalized ratio of query tokens matched in the field
fieldMatch(name).fieldCompleteness  normalized ratio of query tokens which were matched
distanceToPath(name).distance       Euclidean distance from a path through 2D space
Full list: http://docs.vespa.ai/documentation/reference/rank-features.html

Two Phase Ranking
search myapp {
    …
    rank-profile default inherits default {
        first-phase {
            expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
        }
        second-phase {
            expression {
                0.7 * ( 0.7*fieldMatch(title) + 0.2*fieldMatch(description) + 0.1*fieldMatch(body) ) +
                0.3 * attributeMatch(keywords)
            }
            rerank-count: 200
        }
    }
}
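
The idea behind the two phases, sketched in Python: score every match with a cheap function, then spend the expensive function only on the top `rerank_count` hits. This is a simplified illustration, not Vespa's implementation; the documents and scores are invented, and the sketch simply keeps the reranked head ahead of the remaining hits.

```python
def two_phase_rank(docs, first_phase, second_phase, rerank_count):
    """Order docs by first_phase, then reorder only the top rerank_count by second_phase."""
    ranked = sorted(docs, key=first_phase, reverse=True)
    head, tail = ranked[:rerank_count], ranked[rerank_count:]
    head = sorted(head, key=second_phase, reverse=True)
    return head + tail


# Invented documents with precomputed scores standing in for the rank expressions.
DOCS = [
    {"id": "a", "cheap": 0.9, "costly": 0.1},
    {"id": "b", "cheap": 0.8, "costly": 0.7},
    {"id": "c", "cheap": 0.2, "costly": 0.99},
]

result = two_phase_rank(
    DOCS,
    first_phase=lambda d: d["cheap"],
    second_phase=lambda d: d["costly"],
    rerank_count=2,
)
print([d["id"] for d in result])
```

Document `c` would win on the expensive score, but it never reaches the second phase because the first phase ranked it outside the rerank window. That is the trade-off `rerank-count` controls.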

Side Note: Literal boosting
Vespa stems by default, but allows access to the literal value.
field title type string {
    indexing: index
    rank: literal
}
You can write this ranking expression:
0.9*fieldMatch(title) + 0.1*fieldMatch(title_literal)

Tensors
Multi-dimensional arrays of values.
{
    "user_id": 270,
    "user_item_cf": {
        "user_item_cf:0": -1.750116e-05,
        "user_item_cf:1": 9.730623e-05,
        "user_item_cf:2": 8.515047e-05,
        "user_item_cf:3": 6.9297894e-05,
        "user_item_cf:4": 7.343942e-05,
        "user_item_cf:5": -0.00017635927,
        "user_item_cf:6": 5.7642872e-05,
        "user_item_cf:7": -6.6685796e-05,
        "user_item_cf:8": 8.5506894e-05,
        "user_item_cf:9": -1.7209566e-05
    }
}

Searching With Tensors
rank-profile tensor {
    first-phase {
        expression: sum(query(user_item_cf) * attribute(user_item_cf))
    }
}
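
For two tensors over the same dimension, `sum(query(...) * attribute(...))` amounts to a dot product: multiply the cells that share a label, then add up the products. A small Python sketch with tensors held as label-to-value dicts; the values and the function name are made up for illustration.

```python
def tensor_dot(query_tensor, doc_tensor):
    """Multiply cells with matching labels and sum the products."""
    return sum(
        value * doc_tensor[label]
        for label, value in query_tensor.items()
        if label in doc_tensor
    )


# Made-up user (query) and item (document attribute) vectors.
user = {"0": 1.0, "1": 2.0, "2": 0.5}
item = {"0": 0.5, "1": 0.25, "3": 4.0}

# Only labels "0" and "1" appear in both: 1.0*0.5 + 2.0*0.25
print(tensor_dot(user, item))
```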

Ranking with TensorFlow models
search tf {
    document tf {
        field document_tensor type tensor(d0[1],d1[784]) {
            indexing: attribute | summary
            attribute: tensor(d0[1],d1[784])
        }
    }
    rank-profile default inherits default {
        macro input_tensor() {
            expression: attribute(document_tensor)
        }
        first-phase {
            expression: sum(tensorflow("my_model/saved", "serving_default", "output"))
        }
    }
}

With Docker
git clone https://github.com/vespa-engine/sample-apps.git
export VESPA_SAMPLE_APPS=`pwd`/sample-apps
docker run --detach --name vespa --hostname vespa-container \
  --privileged --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps \
  --publish 8080:8080 vespaengine/vespa
http://docs.vespa.ai/documentation/vespa-quick-start.html
