3. What is it?
"Big data. Real time.
The open big data serving engine:
Store, search, rank and organize big data
at user serving time."[1]
[1] http://docs.vespa.ai/documentation/reference/nativerank.html
4. What does it do?
Use Vespa to build:
• Search applications
• Personalized recommendation
• Navigation pages computed on demand
• Real-time data displays: tag clouds, maps, graphs
6. Application Packages
A Vespa application package is the set of
configuration files and Java plugins that
together define the behavior of a Vespa
system.
7. Services.xml
The primary configuration file of an application
package.
• <search> sets up the search endpoint
for Vespa queries. The default port is
8080.
• <nodes> defines the nodes required per
service. (See the reference for more on
container cluster setup.)
• <content> defines how documents are
stored and searched
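To make these elements concrete, here is a minimal sketch that assembles a single-node services.xml with Python's standard xml.etree.ElementTree and so guarantees well-formed XML. The layout follows the Vespa sample applications as we recall them; the document type and content cluster id "music", the container id "default" and the host alias "node1" are hypothetical, <document-api> (not covered on this slide) is assumed for feeding, and host aliases are normally resolved by a separate hosts.xml.

```python
import xml.etree.ElementTree as ET


def build_services(doc_type="music", host="node1"):
    """Return a minimal single-node services.xml as a string (hypothetical names)."""
    services = ET.Element("services", version="1.0")

    # Stateless container cluster: <search> exposes the query endpoint
    # (port 8080 by default), <document-api> accepts document feeds.
    container = ET.SubElement(services, "container", id="default", version="1.0")
    ET.SubElement(container, "search")
    ET.SubElement(container, "document-api")
    container_nodes = ET.SubElement(container, "nodes")
    ET.SubElement(container_nodes, "node", hostalias=host)

    # Content cluster: stores and indexes documents of one type.
    content = ET.SubElement(services, "content", id=doc_type, version="1.0")
    ET.SubElement(content, "redundancy").text = "1"
    documents = ET.SubElement(content, "documents")
    ET.SubElement(documents, "document", type=doc_type, mode="index")
    content_nodes = ET.SubElement(content, "nodes")
    # "distribution-key" is not a valid Python identifier, so pass an attrib dict.
    ET.SubElement(content_nodes, "node", {"hostalias": host, "distribution-key": "0"})

    return ET.tostring(services, encoding="unicode")


if __name__ == "__main__":
    print(build_services())
```

Generating the file from code is optional; the point is the nesting: <search> and <nodes> live inside the container cluster, while <documents> and its own <nodes> live inside the content cluster.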
8. Search Definition
Field Definitions:
• index: Create a search index for this
field
• attribute: Store this field in memory as
an attribute, for sorting, searching and
grouping
• summary: Let this field be part of the
document summary in the result set
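A search definition (.sd file) declares a document type and, per field, an indexing pipeline that chains the statements above with "|". The sketch below renders one from a small field table; the document type "music" and its fields are hypothetical, and the output uses the top-level "search" block with a nested "document" block of the search-definition syntax these slides refer to.

```python
# Hypothetical field table: name -> (type, indexing statements in pipeline order).
FIELDS = {
    "title": ("string", ["summary", "index"]),
    "artist": ("string", ["summary", "index"]),
    "year": ("int", ["summary", "attribute"]),
}


def render_search_definition(name, fields):
    """Render a .sd file body for one document type."""
    lines = [f"search {name} {{", f"    document {name} {{"]
    for field, (ftype, indexing) in fields.items():
        lines.append(f"        field {field} type {ftype} {{")
        # Indexing statements are chained with "|" into one pipeline.
        lines.append(f"            indexing: {' | '.join(indexing)}")
        lines.append("        }")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_search_definition("music", FIELDS))
```

Here "title" and "artist" are searchable text that also appears in results, while "year" is kept in memory as an attribute for sorting and grouping.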
9. Stopwords, Synonyms and Query Rewriting
[stopword] -> ; # (Replace them by nothing)
[stopword] :- and, or, the, be;
lotr -> lord of the rings;
[brand] -> company:[brand];
[brand] :- sony, dell, ibm, hp;
[category] +> $category:[category];
[category] :- laptop, digital camera, camera;
[destination] (in, by, at, on) [place] +> $name:[destination];
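To see what the replace ("->") rules do to a query, the sketch below applies the stopword, "lotr" and brand rules to a tokenized query in plain Python. It is a simplification, not Vespa's semantic-rules engine: each rule runs once, in file order, on single tokens, and the "+>" (add) rules and multi-word conditions are left out.

```python
STOPWORDS = {"and", "or", "the", "be"}   # [stopword] :- and, or, the, be;
BRANDS = {"sony", "dell", "ibm", "hp"}   # [brand] :- sony, dell, ibm, hp;


def rewrite(query):
    """Apply the three single-token "->" rules, once each, in file order."""
    tokens = query.lower().split()

    # [stopword] -> ;   remove every stopword
    tokens = [t for t in tokens if t not in STOPWORDS]

    # lotr -> lord of the rings;   expand the abbreviation
    expanded = []
    for t in tokens:
        expanded.extend("lord of the rings".split() if t == "lotr" else [t])

    # [brand] -> company:[brand];   restrict brand names to the company field
    return " ".join("company:" + t if t in BRANDS else t for t in expanded)


if __name__ == "__main__":
    print(rewrite("the lotr dell laptop"))
```

For example, rewrite("the lotr dell laptop") gives "lord of the rings company:dell laptop". Because stopword removal runs before the expansion in this sketch, the "the" inside "lord of the rings" survives; the real engine's evaluation order may differ.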
11. Default Linguistics
• Tokenization on whitespace
• Kstemmer for stemming
• Changing linguistics means writing code
• Stemming supports only English; other languages await community
contributions
See http://docs.vespa.ai/documentation/linguistics.html for more
information.
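Stemming matters for matching because the same normalization is applied to documents at indexing time and to queries, so different inflections meet on one term. The toy sketch below uses whitespace tokenization (plus lowercasing, an added assumption) and a deliberately naive, hypothetical suffix-stripping stemmer as a stand-in for Kstem; it is not Kstem, which relies on a dictionary and many more rules.

```python
def tokenize(text):
    # Split on whitespace and lowercase (lowercasing is an assumption here).
    return text.lower().split()


def naive_stem(token):
    # Hypothetical toy rules only; Kstem is far more careful.
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"          # berries -> berry
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]                # trees -> tree
    return token                         # glass -> glass


def matches(query, document):
    # The same stemming runs on both sides, so a document matches when
    # every stemmed query term is among its stemmed terms.
    doc_terms = {naive_stem(t) for t in tokenize(document)}
    return all(naive_stem(t) in doc_terms for t in tokenize(query))
```

With this sketch, the query "trees" matches a document containing only "tree", which is also why the literal boosting trick later in these slides is useful.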
14. First, Querying and a Little YQL
http://localhost:8080/search/
?yql=select * from sources * where userQuery();
&query=trees
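Browsers tolerate a raw URL like the one above, but from code it is safer to percent-encode the parameters. A minimal sketch with Python's urllib.parse.urlencode, assuming the local endpoint from the slide; the trailing ";" follows the YQL style of the other examples in these slides.

```python
from urllib.parse import urlencode


def search_url(user_query, base="http://localhost:8080"):
    """Return a /search/ URL that runs userQuery() over all sources."""
    params = {
        # The YQL says what to search; userQuery() pulls in the "query" parameter.
        "yql": "select * from sources * where userQuery();",
        "query": user_query,
    }
    # urlencode percent-encodes "*", "(", ")" and ";" and turns spaces into "+".
    return base + "/search/?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("trees"))
```

Against a running instance, the resulting URL can be fetched with urllib.request.urlopen; results come back as JSON by default.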
15. Other YQL Examples
Numerics
select * from sources * where 500 >= price;
Grouping and aggregates
select * from sources * where sddocname contains 'purchase' |
all(group(customer) each(output(sum(price))));
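To make the semantics of these two queries concrete, the sketch below computes the same results locally over a handful of hypothetical purchase documents with "customer" and "price" fields: a numeric filter, and a grouping that outputs the sum of price per customer. In Vespa the grouping result is returned alongside the hits; here it is simply a dict.

```python
from collections import defaultdict

# Hypothetical "purchase" documents; field names follow the queries above.
PURCHASES = [
    {"customer": "Smith", "price": 1000},
    {"customer": "Jones", "price": 300},
    {"customer": "Smith", "price": 250},
    {"customer": "Brown", "price": 600},
]


def cheap(docs, limit=500):
    """Local equivalent of: where 500 >= price"""
    return [d for d in docs if limit >= d["price"]]


def sum_price_by_customer(docs):
    """Local equivalent of: all(group(customer) each(output(sum(price))))"""
    totals = defaultdict(int)
    for d in docs:
        totals[d["customer"]] += d["price"]
    return dict(totals)


if __name__ == "__main__":
    print(cheap(PURCHASES))
    print(sum_price_by_customer(PURCHASES))
```

Note that the grouping query filters only on document type, so every purchase contributes to its customer's sum.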
16. NativeRank
"Out of the box" ranking for Vespa combines[1]:
• Field/Attribute Match
• Proximity
Good for text ranking, but should be combined with other features
for even better relevance.
[1] http://docs.vespa.ai/documentation/reference/nativerank.html
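One common way to combine nativeRank with other signals is a weighted sum in a rank profile's ranking expression, for example nativeRank(title) + 0.5 * attribute(popularity), where "popularity" is a hypothetical attribute field. The Python sketch below mimics that blend on made-up scores to show how a non-text signal can reorder text matches.

```python
# Made-up scores for three documents; "popularity" is a hypothetical attribute.
DOCS = [
    {"id": "a", "native_rank": 0.30, "popularity": 0.9},
    {"id": "b", "native_rank": 0.45, "popularity": 0.1},
    {"id": "c", "native_rank": 0.40, "popularity": 0.5},
]


def blended(doc, weight=0.5):
    # Mirrors: nativeRank(title) + weight * attribute(popularity)
    return doc["native_rank"] + weight * doc["popularity"]


def rank(docs, weight=0.5):
    """Return document ids ordered by blended score, best first."""
    ordered = sorted(docs, key=lambda d: blended(d, weight), reverse=True)
    return [d["id"] for d in ordered]


if __name__ == "__main__":
    print(rank(DOCS, weight=0.0))  # text relevance only
    print(rank(DOCS))              # text relevance plus popularity
```

With weight 0 the order is b, c, a (text relevance alone); with weight 0.5 it becomes a, c, b. Choosing that weight is the actual tuning work.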
18. More Features
Feature: Description
• term(n).significance: normalized number (between 0.0 and 1.0) describing the significance of the term
• term(n).connectedness: normalized strength with which this term is connected to the previous term
• queryTermCount: number of terms in this query
• fieldLength(name): number of terms in this field
• fieldMatch(name): normalized measure of the degree to which query and field matched
• fieldMatch(name).queryCompleteness: normalized ratio of query tokens matched in the field
• fieldMatch(name).fieldCompleteness: normalized ratio of field tokens matched by the query
• distanceToPath(name).distance: Euclidean distance from a path through 2D space
Full list: http://docs.vespa.ai/documentation/reference/rank-features.html
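The two completeness features differ only in what they divide by. The sketch below is a simplified version that assumes exact token equality and ignores order and proximity; Vespa's fieldMatch aligns query and field tokens with a more involved algorithm, so real values can differ.

```python
def query_completeness(query_tokens, field_tokens):
    """Share of query tokens that occur in the field (simplified)."""
    if not query_tokens:
        return 0.0
    field = set(field_tokens)
    return sum(t in field for t in query_tokens) / len(query_tokens)


def field_completeness(query_tokens, field_tokens):
    """Share of field tokens that are matched by some query token (simplified)."""
    if not field_tokens:
        return 0.0
    query = set(query_tokens)
    return sum(t in query for t in field_tokens) / len(field_tokens)
```

For the query "lord rings" against the field "lord of the rings", queryCompleteness is 1.0 (both query tokens found) while fieldCompleteness is 0.5 (two of four field tokens matched).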
20. Side Note: Literal boosting
Vespa stems by default, but allows access to the literal value.
field title type string {
indexing: index
rank: literal
}
The unstemmed index is then available as title_literal, so a ranking expression can reward exact matches on top of stemmed ones:
0.9*fieldMatch(title) + 0.1*fieldMatch(title_literal)
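A toy illustration of why this expression boosts exact matches: the stemmed index matches both "tree" and "trees", while the literal index matches only the form as written. The sketch assumes a 0/1 stand-in for fieldMatch and a naive trailing-"s" stemmer (not Kstem); only the 0.9/0.1 weights come from the expression above.

```python
def stem(token):
    # Naive stand-in for Kstem: strip one trailing "s" (trees -> tree).
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def field_match(term, index_terms):
    # 0/1 stand-in for fieldMatch: does the term occur in this index?
    return 1.0 if term in index_terms else 0.0


def score(query_term, title):
    """Toy version of 0.9*fieldMatch(title) + 0.1*fieldMatch(title_literal)."""
    tokens = title.lower().split()
    stemmed_index = {stem(t) for t in tokens}   # plays the role of "title"
    literal_index = set(tokens)                 # plays the role of "title_literal"
    q = query_term.lower()
    return 0.9 * field_match(stem(q), stemmed_index) + 0.1 * field_match(q, literal_index)
```

A title containing "trees" scores 1.0 for the query "trees", while one containing only "tree" scores 0.9, so the exact form ranks first without losing the stemmed match.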