Vespa, a tour
Me
Matt Overstreet
OpenSource Connections
Stuff I do:
* Solr/Elasticsearch/Searchy-stuff
* DataStax Cassandra
* Software Development
What is it?
"Big data. Real time.
The open big data serving engine:
Store, search, rank and organize big data
at user serving time."1
1 http://docs.vespa.ai/documentation/reference/nativerank.html
What does it do?
Use Vespa to build:
• Search applications
• Personalized recommendation
• Navigation pages computed on demand
• Realtime data displays - tag clouds, maps, graphs
Configuring
Application Packages
A Vespa application package is the set of
configuration files and Java plugins that
together define the behavior of a Vespa
system.
Services.xml
Primary config file for an application
package.
• <search> sets up the search endpoint
for Vespa queries. The default port is
8080.
• <nodes> defines the nodes required per
service. (See the reference for more on
container cluster setup.)
• <content> defines how documents are
stored and searched
Search Definition
Field Definitions:
• index: Create a search index for this
field
• attribute: Store this field in memory as
an attribute, for sorting, searching and
grouping
• summary: Let this field be part of the
document summary in the result set
Stopwords, Synonyms and Query Rewriting
[stopword] -> ; # (Replace them by nothing)
[stopword] :- and, or, the, be;
lotr -> lord of the rings;
[brand] -> company:[brand];
[brand] :- sony, dell, ibm, hp;
[category] +> $category:[category];
[category] :- laptop, digital camera, camera;
[destination] (in, by, at, on) [place] +> $name:[destination]
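The effect of the first few rules can be illustrated with a toy model in plain Python. This is only a sketch of the idea (drop stopwords, expand a synonym), not how Vespa's rule engine is implemented. The names `STOPWORDS`, `SYNONYMS` and `rewrite` are made up for this illustration, and it assumes that expanded terms are not filtered again.

```python
from typing import List

# Hypothetical stand-ins for the [stopword] condition and the lotr rule above.
STOPWORDS = {"and", "or", "the", "be"}
SYNONYMS = {"lotr": ["lord", "of", "the", "rings"]}


def rewrite(query: str) -> List[str]:
    """Toy query rewriter: expand synonyms, drop stopwords."""
    rewritten = []
    for token in query.lower().split():
        if token in SYNONYMS:
            # Replace the term by its expansion (assumption: no re-filtering).
            rewritten.extend(SYNONYMS[token])
        elif token not in STOPWORDS:
            rewritten.append(token)
    return rewritten


print(rewrite("the lotr and hobbit"))
```

The leading "the" and the "and" are dropped, while "lotr" becomes four terms.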
Linguistics
Default Linguistics
• Tokenization on whitespace
• Kstemmer for stemming
• Changing linguistics means writing code
• Only English for stemming, waiting for community support to
extend
See http://docs.vespa.ai/documentation/linguistics.html for more
information.
Custom Linguistics
Start here: https://github.com/vespa-engine/vespa/tree/master/linguistics/src/main/java/com/yahoo/language/simple
Ranking
First, Querying and a Little YQL
http://localhost:8080/search/
?yql=select * from sources * where userQuery()
&query=trees
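When the request is sent from code rather than typed into a browser, the YQL string has to be URL-encoded. A minimal Python sketch using only the standard library is shown here. The helper name `build_search_url` is made up for this example, and the host and port are the ones from the slide.

```python
from urllib.parse import urlencode


def build_search_url(yql: str, query: str,
                     host: str = "http://localhost:8080") -> str:
    """Build a /search/ URL with the yql and query parameters URL-encoded."""
    params = {"yql": yql, "query": query}
    return f"{host}/search/?{urlencode(params)}"


url = build_search_url("select * from sources * where userQuery()", "trees")
print(url)
```

Spaces, asterisks and parentheses in the YQL are percent-encoded, so the string survives the trip to the search endpoint intact.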
Other YQL Examples
Numerics
select * from sources * where 500 >= price;
Grouping and aggregates
select * from sources * where sddocname contains 'purchase' |
all(group(customer) each(output(sum(price))));
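Conceptually, the grouping expression buckets the matching documents by customer and outputs the sum of price for each bucket. The same computation in plain Python, over made-up purchase documents, looks roughly like this. It mirrors the result, not how Vespa executes it.

```python
from collections import defaultdict

# Hypothetical purchase documents, invented for this illustration.
purchases = [
    {"customer": "Smith", "price": 100},
    {"customer": "Jones", "price": 250},
    {"customer": "Smith", "price": 50},
]


def sum_price_by_customer(docs):
    """Group documents by customer and sum the price within each group."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["customer"]] += doc["price"]
    return dict(totals)


totals = sum_price_by_customer(purchases)
print(totals)
```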
NativeRank
"Out of the box" ranking for Vespa combines:1
• Field/Attribute Match
• Proximity
Good for text ranking, but should be combined with other features
for even better relevancy.
1 http://docs.vespa.ai/documentation/reference/nativerank.html
Ranking Expressions
Built with query features:
nativeRank + query(deservesFreshness) * freshness(timestamp)
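Read as arithmetic, the expression adds a freshness bonus to the text score, and the query decides how much that bonus counts. A small Python sketch with made-up numbers is shown here. The function and argument names are hypothetical, and the three inputs are assumed to be already computed.

```python
def first_phase_score(native_rank: float,
                      deserves_freshness: float,
                      freshness: float) -> float:
    """Evaluate nativeRank + query(deservesFreshness) * freshness(timestamp)."""
    return native_rank + deserves_freshness * freshness


# A query that cares about freshness boosts a recent document...
print(first_phase_score(native_rank=0.5, deserves_freshness=1.0, freshness=0.25))
# ...while a query with deservesFreshness = 0 ranks on nativeRank alone.
print(first_phase_score(native_rank=0.5, deserves_freshness=0.0, freshness=0.25))
```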
More Features
Feature                              Description
term(n).significance                 normalized number (between 0.0 and 1.0) describing the significance of the term
term(n).connectedness                normalized strength with which this term is connected to the previous term
queryTermCount                       number of terms in this query
fieldLength(name)                    number of terms in this field
fieldMatch(name)                     normalized measure of degree to which query and field matched
fieldMatch(name).queryCompleteness   normalized ratio of query tokens matched in the field
fieldMatch(name).fieldCompleteness   normalized ratio of query tokens which were matched
distanceToPath(name).distance        Euclidean distance from a path through 2d space
Full list: http://docs.vespa.ai/documentation/reference/rank-features.html
Two Phase Ranking
search myapp {
    …
    rank-profile default inherits default {
        first-phase {
            expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
        }
        second-phase {
            expression {
                0.7 * ( 0.7*fieldMatch(title) + 0.2*fieldMatch(description) + 0.1*fieldMatch(body) ) +
                0.3 * attributeMatch(keywords)
            }
            rerank-count: 200
        }
    }
}
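The idea behind two phases is to score every match with a cheap expression and spend the expensive expression only on the best candidates. A simplified Python sketch of that control flow is shown here. The function name, the `cheap` and `costly` fields and the sample documents are all invented, and it assumes reranked hits simply stay ahead of the rest, which glosses over how Vespa actually merges the two score ranges.

```python
def two_phase_rank(docs, first_phase, second_phase, rerank_count):
    """Order docs by first_phase, then rerank only the top rerank_count."""
    ranked = sorted(docs, key=first_phase, reverse=True)
    head, tail = ranked[:rerank_count], ranked[rerank_count:]
    # Only the head pays for the expensive second-phase expression.
    head = sorted(head, key=second_phase, reverse=True)
    return head + tail


# Hypothetical documents with precomputed cheap and costly scores.
docs = [
    {"id": "a", "cheap": 0.9, "costly": 0.1},
    {"id": "b", "cheap": 0.8, "costly": 0.9},
    {"id": "c", "cheap": 0.7, "costly": 0.99},
    {"id": "d", "cheap": 0.1, "costly": 0.5},
]

result = two_phase_rank(
    docs,
    first_phase=lambda d: d["cheap"],
    second_phase=lambda d: d["costly"],
    rerank_count=2,
)
print([d["id"] for d in result])
```

Document c has the best second-phase score but never reaches the second phase, which is the trade-off that rerank-count controls.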
Side Note: Literal boosting
Vespa stems by default, but allows access to the literal value.
field title type string {
    indexing: index
    rank: literal
}
You can write this ranking expression:
0.9*fieldMatch(title) + 0.1*fieldMatch(title_literal)
Tensors
Multi-dimensional arrays of values.
{
    "user_id": 270,
    "user_item_cf": {
        "user_item_cf:0": -1.750116e-05,
        "user_item_cf:1": 9.730623e-05,
        "user_item_cf:2": 8.515047e-05,
        "user_item_cf:3": 6.9297894e-05,
        "user_item_cf:4": 7.343942e-05,
        "user_item_cf:5": -0.00017635927,
        "user_item_cf:6": 5.7642872e-05,
        "user_item_cf:7": -6.6685796e-05,
        "user_item_cf:8": 8.5506894e-05,
        "user_item_cf:9": -1.7209566e-05
    }
}
Searching With Tensors
rank-profile tensor {
    first-phase {
        expression: sum(query(user_item_cf) * attribute(user_item_cf))
    }
}
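For two tensors over the same dimension, multiplying and then summing amounts to a dot product over the cells both tensors have. A plain-Python sketch with dictionaries standing in for sparse tensors is shown here. The function name and the sample values are made up.

```python
def sparse_dot(query_tensor: dict, doc_tensor: dict) -> float:
    """Multiply cells with matching keys and sum the products."""
    return sum(
        value * doc_tensor[key]
        for key, value in query_tensor.items()
        if key in doc_tensor
    )


# Hypothetical user (query) and item (document) vectors keyed by dimension label.
user = {"0": 1.0, "1": 2.0, "3": 4.0}
item = {"0": 0.5, "1": 0.25, "2": 8.0}

score = sparse_dot(user, item)
print(score)
```

Cells present in only one of the tensors ("2" and "3" here) contribute nothing to the score.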
Ranking with TensorFlow models
search tf {
    document tf {
        field document_tensor type tensor(d0[1],d1[784]) {
            indexing: attribute | summary
            attribute: tensor(d0[1],d1[784])
        }
    }
    rank-profile default inherits default {
        macro input_tensor() {
            expression: attribute(document_tensor)
        }
        first-phase {
            expression: sum(tensorflow("my_model/saved", "serving_default", "output"))
        }
    }
}
Try it
With Docker
git clone https://github.com/vespa-engine/sample-apps.git
export VESPA_SAMPLE_APPS=`pwd`/sample-apps
docker run --detach --name vespa --hostname vespa-container \
  --privileged --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps \
  --publish 8080:8080 vespaengine/vespa
http://docs.vespa.ai/documentation/vespa-quick-start.html
