Vespa, A Tour

Me
Matt Overstreet
OpenSource Connections
Stuff I do:
* Solr/ElasticSearch/Searchy-stuff
* DataStax Cassandra
* Software Development
What is it?
"Big data. Real time.
The open big data serving engine:
Store, search, rank and organize big data
at user serving time." [1]
[1] http://docs.vespa.ai/documentation/reference/nativerank.html
What does it do?
Use Vespa to build:
• Search applications
• Personalized recommendation
• Navigation pages computed on demand
• Realtime data displays - tag clouds, maps, graphs
Configuring
Application Packages
A Vespa application package is the set of
configuration files and Java plugins that
together define the behavior of a Vespa
system
Services.xml
Primary config file for an application
package.
• <search> sets up the search endpoint
for Vespa queries. The default port is
8080.
• <nodes> defines the nodes required per
service. (See the reference for more on
container cluster setup.)
• <content> defines how documents are
stored and searched
Search Definition
Field Definitions:
• index: Create a search index for this
field
• attribute: Store this field in memory as
an attribute, for sorting, searching and
grouping
• summary: Let this field be part of the
document summary in the result set
Stopwords, Synonyms and Query Rewriting
[stopword] -> ; # (Replace them by nothing)
[stopword] :- and, or, the, be;
lotr -> lord of the rings;
[brand] -> company:[brand];
[brand] :- sony, dell, ibm, hp;
[category] +> $category:[category];
[category] :- laptop, digital camera, camera;
[destination] (in, by, at, on) [place] +> $name:[destination]
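As a rough illustration of what the stopword and replacement rules above do to a query, the Python sketch below applies two of them by hand. It is not Vespa's semantic rule engine: the function name `rewrite` and the two lookup structures are invented for this example, and the rule order (stopwords first, then replacements) is an assumption.

```python
# Hypothetical illustration only: mimics the effect of two of the rules
# above. It does not use or reproduce Vespa's semantic rule engine.
STOPWORDS = {"and", "or", "the", "be"}        # [stopword] :- and, or, the, be;
REPLACEMENTS = {"lotr": "lord of the rings"}  # lotr -> lord of the rings;


def rewrite(query: str) -> str:
    """Drop stopwords, then expand known abbreviations."""
    kept = [t for t in query.lower().split() if t not in STOPWORDS]
    expanded = [REPLACEMENTS.get(t, t) for t in kept]
    return " ".join(expanded)


print(rewrite("the lotr and hobbit"))  # prints "lord of the rings hobbit"
```

Because the stopword pass runs before the replacement, the "the" inside the expanded phrase survives in this sketch.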
Linguistics
Default Linguistics
• Tokenization on whitespace
• Kstemmer for stemming
• Changing linguistics means writing code
• Only English for stemming, waiting for community support to
extend
See http://docs.vespa.ai/documentation/linguistics.html for more
information.
Custom Linguistics
Start here: https://github.com/vespa-engine/vespa/tree/master/linguistics/src/main/java/com/yahoo/language/simple
Ranking
First, Querying and a Little YQL
http://localhost:8080/search/
?yql=select * from sources * where userQuery()
&query=trees
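The URL above is shown unencoded for readability. A minimal Python sketch of building the same request with proper URL encoding follows; the helper name `build_search_url` is made up here, and the endpoint is the local one from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Local endpoint as shown on the slide above.
BASE_URL = "http://localhost:8080/search/"


def build_search_url(yql: str, user_query: str) -> str:
    """Return a search URL with the YQL statement and user query URL-encoded."""
    params = {"yql": yql, "query": user_query}
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("select * from sources * where userQuery()", "trees")
print(url)

# Decoding the query string recovers the original parameters.
print(parse_qs(urlparse(url).query))
```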
Other YQL Examples
Numerics
select * from sources * where 500 >= price;
Grouping and aggregates
select * from sources * where sddocname contains 'purchase' |
all(group(customer) each(output(sum(price))));
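To make the grouping expression concrete, here is a plain Python sketch of the result it describes: one group per customer, each carrying the sum of price. The sample documents and the function name are invented for illustration, and this says nothing about how Vespa executes grouping internally.

```python
from collections import defaultdict

# Made-up purchase documents, for illustration only.
purchases = [
    {"customer": "Smith", "price": 100},
    {"customer": "Jones", "price": 50},
    {"customer": "Smith", "price": 25},
]


def sum_price_by_customer(docs):
    """Group documents by customer and sum price within each group."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["customer"]] += doc["price"]
    return dict(totals)


print(sum_price_by_customer(purchases))  # {'Smith': 125, 'Jones': 50}
```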
NativeRank
"Out of the box" ranking for Vespa combines [1]:
• Field/Attribute Match
• Proximity
Good for text ranking, but should be combined with other features
for even better relevancy.
[1] http://docs.vespa.ai/documentation/reference/nativerank.html
Ranking Expressions
Built with query features:
nativeRank + query(deservesFreshness) * freshness(timestamp)
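A small Python sketch of how such an expression combines its inputs. The linear decay used for `freshness` and the three-month maximum age are assumptions made for this example and may differ from Vespa's actual feature; `first_phase` is a hypothetical name.

```python
# Assumption: freshness decays linearly from 1 (brand new) to 0 (at or
# past max_age). The three-month max_age is an assumed value for this sketch.
MAX_AGE_SECONDS = 90 * 24 * 3600


def freshness(age_seconds: float, max_age: float = MAX_AGE_SECONDS) -> float:
    """Return a value in [0, 1] that shrinks as the document gets older."""
    return max(0.0, 1.0 - age_seconds / max_age)


def first_phase(native_rank: float, deserves_freshness: float, age_seconds: float) -> float:
    """nativeRank + query(deservesFreshness) * freshness(timestamp)"""
    return native_rank + deserves_freshness * freshness(age_seconds)


# A query that does not care about freshness ignores document age entirely.
print(first_phase(0.5, 0.0, age_seconds=0))
# A freshness-sensitive query boosts a document that is half-way to max_age.
print(first_phase(0.5, 1.0, age_seconds=MAX_AGE_SECONDS / 2))
```

The point of the `query(deservesFreshness)` term is that the caller decides per query how much recency should matter.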
More Features
Feature                             Description
term(n).significance                normalized number (between 0.0 and 1.0) describing the significance of the term
term(n).connectedness               normalized strength with which this term is connected to the previous term
queryTermCount                      number of terms in this query
fieldLength(name)                   number of terms in this field
fieldMatch(name)                    normalized measure of degree to which query and field matched
fieldMatch(name).queryCompleteness  normalized ratio of query tokens matched in the field
fieldMatch(name).fieldCompleteness  normalized ratio of query tokens which was matched
distanceToPath(name).distance       Euclidean distance from a path through 2d space
Full list: http://docs.vespa.ai/documentation/reference/rank-features.html
Two Phase Ranking
search myapp {
    …
    rank-profile default inherits default {
        first-phase {
            expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
        }
        second-phase {
            expression {
                0.7 * ( 0.7*fieldMatch(title) + 0.2*fieldMatch(description) + 0.1*fieldMatch(body) ) +
                0.3 * attributeMatch(keywords)
            }
            rerank-count: 200
        }
    }
}
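The idea behind two phases is to score everything with a cheap expression and spend the expensive one only on the best candidates. A minimal Python sketch of that control flow follows; the function and field names are hypothetical, and keeping the re-ranked head ahead of the untouched tail is a simplification chosen for this sketch.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]


def two_phase_rank(
    docs: List[Doc],
    first_phase: Callable[[Doc], float],
    second_phase: Callable[[Doc], float],
    rerank_count: int,
) -> List[Doc]:
    """Order docs by first_phase, then re-score only the top rerank_count."""
    by_first = sorted(docs, key=first_phase, reverse=True)
    head, tail = by_first[:rerank_count], by_first[rerank_count:]
    reranked = sorted(head, key=second_phase, reverse=True)
    # Simplification: the re-ranked head always stays ahead of the tail.
    return reranked + tail


# Made-up documents with a cheap and a costly score each.
docs = [
    {"id": 1, "cheap": 0.9, "costly": 0.1},
    {"id": 2, "cheap": 0.8, "costly": 0.7},
    {"id": 3, "cheap": 0.1, "costly": 1.0},
]
ranked = two_phase_rank(docs, lambda d: d["cheap"], lambda d: d["costly"], rerank_count=2)
print([d["id"] for d in ranked])  # [2, 1, 3]
```

Document 3 has the best costly score but never reaches the second phase, which is the trade-off `rerank-count` controls.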
Side Note: Literal boosting
Vespa stems by default, but allows access to the literal value.
field title type string {
    indexing: index
    rank: literal
}
You can write this ranking expression:
0.9*fieldMatch(title) + 0.1*fieldMatch(title_literal)
Tensors
Multi-dimensional arrays of values.
{
"user_id": 270,
"user_item_cf": {
"user_item_cf:0": -1.750116e-05,
"user_item_cf:1": 9.730623e-05,
"user_item_cf:2": 8.515047e-05,
"user_item_cf:3": 6.9297894e-05,
"user_item_cf:4": 7.343942e-05,
"user_item_cf:5": -0.00017635927,
"user_item_cf:6": 5.7642872e-05,
"user_item_cf:7": -6.6685796e-05,
"user_item_cf:8": 8.5506894e-05,
"user_item_cf:9": -1.7209566e-05
}
}
Searching With Tensors
rank-profile tensor {
    first-phase {
        expression: sum(query(user_item_cf) * attribute(user_item_cf))
    }
}
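Assuming the product pairs up cells that share a label and the sum adds the resulting values, the expression above amounts to a dot product between the user tensor and the document tensor. A Python sketch with plain dicts standing in for one-dimensional sparse tensors (the names and sample values are invented):

```python
from typing import Dict


def sparse_dot(query: Dict[str, float], attribute: Dict[str, float]) -> float:
    """Multiply cells with matching labels and sum the products."""
    return sum(
        value * attribute[label]
        for label, value in query.items()
        if label in attribute
    )


# Made-up cell values keyed by label, for illustration only.
user = {"0": 1.0, "1": 2.0, "2": 0.5}
item = {"0": 3.0, "1": -1.0, "3": 4.0}
print(sparse_dot(user, item))  # 1.0
```

Labels present on only one side ("2" and "3" here) contribute nothing to the score.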
Ranking with TensorFlow models
search tf {
    document tf {
        field document_tensor type tensor(d0[1],d1[784]) {
            indexing: attribute | summary
            attribute: tensor(d0[1],d1[784])
        }
    }
    rank-profile default inherits default {
        macro input_tensor() {
            expression: attribute(document_tensor)
        }
        first-phase {
            expression: sum(tensorflow("my_model/saved", "serving_default", "output"))
        }
    }
}
Try it
With Docker
git clone https://github.com/vespa-engine/sample-apps.git
export VESPA_SAMPLE_APPS=`pwd`/sample-apps
docker run --detach --name vespa --hostname vespa-container \
  --privileged --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps \
  --publish 8080:8080 vespaengine/vespa
http://docs.vespa.ai/documentation/vespa-quick-start.html