Learning To Rank For Solr
Michael Nilsson – Software Engineer
Diego Ceccarelli – Software Engineer
Joshua Pantony – Software Engineer
Bloomberg LP
OUTLINE
●  Search at Bloomberg
●  Why do we need machine learning for search?
●  Learning to Rank
●  Solr Learning to Rank Plugin
8 million searches PER DAY
1 million PER DAY
400 million stories in the index
SOLR IN BLOOMBERG
●  Search engine of choice at Bloomberg
─  Large community / Well distributed committers
─  Open source Apache Project
─  Used within many commercial products
─  Large feature set and rapid growth
●  Committed to open-source
─  Ability to contribute to core engine
─  Ability to fix bugs ourselves
─  Contributions in almost every Solr release since 4.5.0
PROBLEM SETUP
score: 30
score: 1.0
PROBLEM SETUP
Score = 100 * scoreOnTitle + 10 * scoreOnDescription
score: 52.2
score: 30.8
PROBLEM SETUP
Score = 150 * scoreOnTitle + 3.14 * scoreOnDescription + 42 * clicks
PROBLEM SETUP
Score = 99.9 * scoreOnTitle + 3.1114 * scoreOnDescription + 42.42 * clicks + 5 * timeElapsedFromLastUpdate
●  It’s hard to manually tweak the ranking
─  You must be an expert in the domain
─  … or a magician
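The hand-tuned formula on this slide is a weighted sum of feature values. A minimal Python sketch of that idea, using the weights from the slide; the feature values for the example document are made up for illustration and do not come from the talk:

```python
# Hand-tuned linear scoring, with the weights copied from the slide.
WEIGHTS = {
    "scoreOnTitle": 99.9,
    "scoreOnDescription": 3.1114,
    "clicks": 42.42,
    "timeElapsedFromLastUpdate": 5.0,
}


def linear_score(features, weights=WEIGHTS):
    """Return the weighted sum of feature values; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Hypothetical feature values for one document (assumption, not from the talk).
doc = {"scoreOnTitle": 0.5, "scoreOnDescription": 2.0, "clicks": 1.0}
print(linear_score(doc))
```

Every new signal adds another weight to tune by hand, which is the difficulty the slide points at.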
PROBLEM SETUP
Score = 99.9 * scoreOnTitle + 3.1114 * scoreOnDescription + 42.42 * clicks + 5 * timeElapsedFromLastUpdate
query = solr, query = lucene, query = austin, query = bloomberg, query = …
PROBLEM SETUP
It’s easier with Machine Learning
●  2,000+ parameters (non-linear, factorially larger than linear form)
●  8,000+ queries that are regularly tuned
●  Early on we spent many days hand tuning…
SEARCH PIPELINE (ONLINE)
[Diagram: User Query → Index (People, Commodities, News, Other Sources) → Top-x retrieval → ReRanking Model → Top-k reranked, where x >> k]
TRAINING PIPELINE (OFFLINE)
[Diagram: Training Query-Document Pairs and Index (People, Commodities, News, Other Sources) → Feature Extraction → Learning Algorithm → Ranking Model; Metrics]
TRAINING DATA: IMPLICIT VS EXPLICIT
What is explicit data?
●  A set of judges will assess the search results manually given a query
─  Experts
─  Crowd
Pros:
─  Data is very clean
Cons:
─  Can be very expensive!
What is implicit data?
●  Infer user preferences based on user behavior
─  Aggregated results clicks
─  Query reformulation
─  Dwell time
Pros:
─  A lot of data!
Cons:
─  Extremely noisy
─  Privacy concerns
FEATURES
●  A feature is an individual measurable property
●  Given a query and a collection, we can produce many features for each document in the collection
─  Does the query match the title?
─  Length of the document
─  Number of views
─  How old is it?
─  Can it be displayed on a mobile device?
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
Query: APPL US
─  Was the result a cofounder? 0
─  Does the query match the document title? 0
─  Does the document have an exec. position? 1
─  Popularity (%) 0.9
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
─  Was the result a cofounder? 0
─  Does the query match the document title? 1
─  Does the document have an exec. position? 0
─  Popularity (%) 0.6
METRICS
How do we know if our model is doing better?
●  Offline metrics
─  Precision/Recall/F1 score
─  nDCG (Normalized Discounted Cumulative Gain)
─  Other metrics (e.g., ERR, MAP, …)
●  Online Metrics
─  Click-through rates → higher
─  Time to first click → lower
─  Interleaving [1]
[1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems, 30(1), 2012.
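As an illustration of the offline metrics listed on this slide, here is a minimal Python sketch of nDCG. It assumes the common exponential-gain variant, (2^rel - 1) / log2(rank + 1) with ranks starting at 1; other variants exist, and the talk does not say which one was used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    # enumerate() is 0-based, so rank 1 gets the discount log2(2) = 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering, cut at k."""
    k = len(relevances) if k is None else k
    ideal = sorted(relevances, reverse=True)
    best = dcg(ideal[:k])
    return dcg(relevances[:k]) / best if best > 0 else 0.0


# Labels of the returned documents, top result first (hypothetical judgments).
print(ndcg([3, 2, 1]))  # already in ideal order
print(ndcg([1, 2, 3]))  # best document ranked last
```

A perfectly ordered list scores 1.0; pushing relevant documents down the list lowers the score.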
LEARNING TO RANK
●  Learn how to combine the features for optimizing one or more metrics
●  Many learning algorithms
─  RankSVM [1]
─  LambdaMART [2]
─  …
[1] T. Joachims. Optimizing Search Engines Using Clickthrough Data. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.
[2] C.J.C. Burges. From RankNet to LambdaRank to LambdaMART: An Overview. Microsoft Research Technical Report MSR-TR-2010-82, 2010.
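To make "learn how to combine the features" concrete, here is a small Python sketch of the pairwise idea that RankSVM-style learners build on: for every pair of documents of the same query where one is labeled more relevant, push the linear score of the better one above the other. This is not RankSVM itself (no regularization, plain subgradient steps on a pairwise hinge loss), and the toy features and labels are invented for illustration.

```python
def score(w, x):
    """Linear score: dot product of weights and feature values."""
    return sum(wk * xk for wk, xk in zip(w, x))


def train_pairwise(queries, epochs=100, lr=0.1):
    """queries: one list per query of (feature_vector, relevance_label) tuples."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in queries:
            for xi, yi in docs:
                for xj, yj in docs:
                    if yi <= yj:
                        continue  # only pairs where doc i should outrank doc j
                    diff = [a - b for a, b in zip(xi, xj)]
                    if score(w, diff) < 1.0:  # hinge loss active for this pair
                        w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


# Hypothetical training data: features are [matchTitle, popularity].
TOY_QUERIES = [
    [([1.0, 0.9], 2), ([1.0, 0.2], 1), ([0.0, 0.5], 0)],
    [([1.0, 0.6], 1), ([0.0, 0.9], 0)],
]

weights = train_pairwise(TOY_QUERIES)
print(weights)
```

Sorting each query's documents by `score(weights, features)` then reproduces the labeled order on this toy set.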
SEARCH PIPELINE: STANDARD
[Diagram: User Query → Solr (Index over People, Commodities, News, Other Sources) → Top-k retrieval]
SEARCH PIPELINE: STANDARD
[Diagram: User Query → Solr (Index over People, Commodities, News, Other Sources) → Top-k retrieval; offline: Training Data → Learning Algorithm → Ranking Model]
SEARCH PIPELINE: STANDARD
[Diagram: User Query → Solr (Index over People, Commodities, News, Other Sources) → Top-k retrieval → Ranking Model (online) → Top-x reranked]
SEARCH PIPELINE: SOLR INTEGRATION
[Diagram: User Query → Solr (Index over People, Commodities, News, Other Sources) → Top-k retrieval → Ranking Model (online) → Top-x reranked]
SOLR RELEVANCY
●  Pros
─  Simple and quick scoring computation
─  Phrase matching
─  Function query boosting on time, distance, popularity, etc.
─  Customized fields for stemming, synonyms, etc.
●  Cons
─  Lots of manual time needed to create a well-tuned query
─  Weights are brittle and may stop working well as more documents or fields are added
LTR PLUGIN: GOALS
●  Don’t tune the relevancy manually!
─  Uses machine learning to power automatic relevancy tuning
●  Significant relevancy improvements
●  Allow comparable scores across collections
─  Collections of different sizes
●  Maintaining low latency
─  Re-use the vast Solr search functionality that is already built-in
─  Less data transport
●  Makes it simple to use domain knowledge to rapidly create features
─  Features are no longer coded but rather scripted
STANDARD SOLR SEARCH REQUEST
[Diagram: User Query → Index (People, Commodities, News, Other Sources) → Top-k retrieval]
STANDARD SOLR SEARCH REQUEST
[Diagram: User Query → Solr Query → Index [10 Million] (People, Commodities, News, Other Sources) → Matches [10k] → Score [10k] → Top-10 retrieval]
LTR SOLR SEARCH REQUEST
[Diagram: User Query → Solr Query → Index [10 Million] (People, Commodities, News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → LTR Query → Ranking Model → Top-10 reranked]
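The two-stage request on this slide (retrieve the top 1000 with the regular Solr score, then rerank and return the top 10) can be sketched in Python as follows. The function and parameter names are hypothetical, and keeping documents below the rerank window in their first-pass order is an assumption of this sketch rather than something the slides state.

```python
def rerank(first_pass, model_score, rerank_docs=1000, rows=10):
    """Rerank the head of a first-pass result list with a model.

    first_pass: list of (doc_id, features), already sorted by first-pass score.
    model_score: callable mapping a feature dict to a model score.
    """
    head, tail = first_pass[:rerank_docs], first_pass[rerank_docs:]
    reranked = sorted(head, key=lambda item: model_score(item[1]), reverse=True)
    # Documents outside the rerank window keep their first-pass order.
    return [doc_id for doc_id, _ in (reranked + tail)[:rows]]


# Hypothetical first-pass results and a toy "model" that only looks at clicks.
results = [("a", {"clicks": 1}), ("b", {"clicks": 5}),
           ("c", {"clicks": 3}), ("d", {"clicks": 9})]
print(rerank(results, lambda f: f["clicks"], rerank_docs=3, rows=4))
```

Only the documents inside the rerank window pay the cost of the model, which is how the approach keeps latency low.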
LTR PLUGIN: RERANKING
<!-- Query parser used to rerank top docs with a provided model -->
<queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
●  LTRQuery extends Solr’s RankQuery
─  Wraps main query to fetch initial results
─  Returns custom TopDocsCollector for reranked ordered results
●  Solr rerank request parameter
rq={!ltr model=myModel1 reRankDocs=100 efi.user_query='james' efi.my_var=123}
─  !ltr – name used in the solrconfig.xml for the LTRQParserPlugin
─  model – name of deployed model to use for reranking
─  reRankDocs – total number of documents to rerank
─  efi.* – custom parameters used to pass external feature information for your
features to use
•  Query intent
•  Personalization
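A small Python sketch of how a client might assemble the rq parameter described on this slide. The helper name ltr_rq is hypothetical, and the quoting rule (strings in single quotes, numbers bare, no escaping) is an assumption based only on the example shown here.

```python
from urllib.parse import urlencode, parse_qs


def ltr_rq(model, rerank_docs, efi):
    """Build the {!ltr ...} local-params string for the rq request parameter."""
    parts = [f"model={model}", f"reRankDocs={rerank_docs}"]
    for key, value in efi.items():
        # Assumption: strings are single-quoted, other values are passed bare.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"efi.{key}={rendered}")
    return "{!ltr " + " ".join(parts) + "}"


params = {
    "q": "james",
    "rq": ltr_rq("myModel1", 100, {"user_query": "james", "my_var": 123}),
}
query_string = urlencode(params)  # URL-encoded and ready to append to /select?
print(query_string)
```

Keeping the efi.* values in a dictionary makes it easy to pass per-request signals such as query intent or personalization data.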
SEARCH PIPELINE (ONLINE)
[Diagram: User Query → Index [10 Million] (People, Commodities, News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → Feature Extraction → Ranking Model → Top-10 reranked]
{
    "name": "Tim Cook",
    "primary_position": "ceo",
    "category": "person",
    …
}
  
FEATURES
Extract “features”
Features are signals that give an indication of a result’s importance
─  Was the result a cofounder? 0
─  Does the query match the document title? 0
─  Does the document have an exec. position? 1
─  Popularity (%) 0.9
LTR PLUGIN: FEATURES BEFORE
[
    {
        "name": "isPersonAndExecutive",
        "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
        "params": {
            "fq": [
                "{!terms f=category}person",
                "{!terms f=primary_position}ceo, cto, cfo, president"
            ]
        }
    },
    …
]
  
LTR PLUGIN: FEATURES AFTER
LTR PLUGIN: FUNCTION QUERIES
[
    {
        "name": "documentRecency",
        "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
        "params": {
            "q": "{!func}recip(ms(NOW,publish_date), 3.16e-11, 1, 1)"
        }
    },
    …
]

1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated 2 years ago, etc.
See http://wiki.apache.org/solr/FunctionQuery#Date_Boosting
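The arithmetic behind the documentRecency feature can be checked with a few lines of Python. Solr's recip(x, m, a, b) computes a / (m * x + b); with m = 3.16e-11 and x in milliseconds, m * x is roughly the document age in years. The constant MS_PER_YEAR and the helper names are introduced here for illustration only.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.156e10 milliseconds


def recip(x, m, a, b):
    """Same formula as Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)


def document_recency(age_ms):
    """Value of recip(ms(NOW,publish_date), 3.16e-11, 1, 1) for a given age."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2):
    print(years, round(document_recency(years * MS_PER_YEAR), 3))
```

The printed values are 1.0 for a document dated now and close to 1/2 and 1/3 for documents one and two years old.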
  
LTR PLUGIN: FEATURE STORE
●  FeatureStore is a Solr Managed Resource
─  REST API endpoint for performing CRUD operations on Solr objects
─  Stored and maintained in ZooKeeper
●  Deploy
─  curl -XPUT 'http://yoursolrserver/solr/collection/config/fstore'
--data-binary @./features.json -H 'Content-type:application/json'
●  View
─  http://yoursolrserver/solr/collection/config/fstore
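The same deploy call can be prepared from Python with the standard library. This sketch only builds the PUT request and does not send it; the host name is the placeholder from the slide, the helper name is hypothetical, and the feature definition is the documentRecency example from the function-query slide.

```python
import json
import urllib.request


def feature_store_request(base_url, features):
    """Build (but do not send) a PUT request that uploads feature definitions."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/config/fstore",
        data=body,
        headers={"Content-type": "application/json"},
        method="PUT",
    )


features = [{
    "name": "documentRecency",
    "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "params": {"q": "{!func}recip(ms(NOW,publish_date), 3.16e-11, 1, 1)"},
}]
req = feature_store_request("http://yoursolrserver/solr/collection", features)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.get_method(), req.full_url)
```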
LTR PLUGIN: FEATURES
●  Simplifies feature engineering through a configuration file
●  Utilizes rich search functionality built into Solr
─  Phrase matching
─  Synonyms, Stemming, etc
●  Inherit the Feature class for specialized features
TRAINING PIPELINE (OFFLINE)
[Diagram: Training Queries → Index [10 Million] (People, Commodities, News, Other Sources) → Matches [10k] → Score [10k] → Top-1000 retrieval → Feature Extraction → Learning Algorithm → Ranking Model]
LTR PLUGIN: FEATURE EXTRACTION
<!-- Document transformer adding feature vectors with each retrieved document -->
<transformer name="fv" class="org.apache.solr.ltr.ranking.LTRFeatureTransformer" />
●  Feature extraction uses Solr’s TransformerFactory
─  Returns a custom field with each document
●  fl = *, [fv]
{
    "name": "Tim Cook",
    "primary_position": "ceo",
    "category": "person",
    …
    "[fv]": "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9"
}
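For offline training, the [fv] string returned with each document has to be turned back into numbers. A minimal Python sketch, assuming the comma-separated name:value format shown in the example response; the function name parse_fv is hypothetical.

```python
def parse_fv(fv):
    """Parse a feature-vector string like 'a:0.0, b:1.0' into a dict of floats."""
    features = {}
    for pair in fv.split(","):
        name, _, value = pair.strip().partition(":")
        features[name] = float(value)
    return features


fv = "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9"
print(parse_fv(fv))
```

Each parsed dictionary, paired with a relevance label for the query-document pair, becomes one training example for the learning algorithm.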
  
LTR PLUGIN: MODEL
{
    "type": "org.apache.solr.ltr.ranking.LambdaMARTModel",
    "name": "mymodel1",
    "features": [
        { "name": "matchedTitle" },
        { "name": "isPersonAndExecutive" }
    ],
    "params": {
        "trees": [
            {
                "weight": 1,
                "tree": {
                    "feature": "matchedTitle",
                    "threshold": 0.5,
                    "left": { "value": -100 },
                    "right": {
                        "feature": "isPersonAndExecutive",
                        "threshold": 0.5,
                        "left": { "value": 50 },
                        "right": { "value": 75 }
                    }
                }
            }
        ]
    }
}
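How such a tree model turns a feature vector into a score can be sketched in Python. The traversal rule used here (go left when the feature value is at or below the threshold, otherwise right, then sum the weighted leaf values over all trees) is an assumption of this sketch; the plugin's actual scoring code is not shown in the slides.

```python
# The model from the slide, reduced to the part needed for scoring.
MODEL = {
    "params": {
        "trees": [{
            "weight": 1,
            "tree": {
                "feature": "matchedTitle", "threshold": 0.5,
                "left": {"value": -100},
                "right": {
                    "feature": "isPersonAndExecutive", "threshold": 0.5,
                    "left": {"value": 50},
                    "right": {"value": 75},
                },
            },
        }]
    }
}


def score_tree(node, features):
    """Walk one tree: leaves carry 'value', inner nodes split on a feature."""
    if "value" in node:
        return node["value"]
    # Assumption: value <= threshold goes left, otherwise right.
    go_left = features.get(node["feature"], 0.0) <= node["threshold"]
    return score_tree(node["left" if go_left else "right"], features)


def score_model(model, features):
    """Sum of weight * tree output over all trees in the ensemble."""
    return sum(t["weight"] * score_tree(t["tree"], features)
               for t in model["params"]["trees"])


print(score_model(MODEL, {"matchedTitle": 1.0, "isPersonAndExecutive": 1.0}))
```

Under that rule, a document that does not match the title scores -100, and a title match scores 50 or 75 depending on the executive feature.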
  
LTR PLUGIN: MODEL
●  ModelStore is also a Solr Managed Resource
●  Deploy
─  curl -XPUT 'http://yoursolrserver/solr/collection/config/mstore'
--data-binary @./model.json -H 'Content-type:application/json'
●  View
─  http://yoursolrserver/solr/collection/config/mstore
●  Inherit from the model class for new scoring algorithms
─  score()
─  explain()
LTR PLUGIN: EVALUATION
●  Offline Metrics
─  nDCG increased approximately 10% after reranking
●  Online Metrics
─  Clicks @ 1 up by approximately 10%
BEFORE AND AFTER
Query: “unemployment”
[Figure: side-by-side results, Solr Ranking vs. Machine Learned Reranking]
LTR PLUGIN: EVALUATION
●  Performance
─  About 30% faster than previous external ranking system
•  10 million documents in collection
•  100k queries
•  1k features
•  1k documents/query reranked
LTR PLUGIN: BENEFITS
●  Simpler feature engineering, without compiling
●  Access to rich internal Solr search functionality for feature building
●  Search result relevancy improvements vs regular Solr relevance
●  Automatic relevancy tuning
●  Compatible scores across collections
●  Performance benefits vs external ranking system
FUTURE WORK
●  Continue work to open source the plugin
●  Support pipelining multiple reranking models
●  Allow a simple ranking model to be used in the first pass
QUESTIONS?

More Related Content

What's hot

Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Lucidworks
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Lucidworks (Archived)
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Databricks
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
Spark Summit
 
Neural Search Comes to Apache Solr
Neural Search Comes to Apache SolrNeural Search Comes to Apache Solr
Neural Search Comes to Apache Solr
Sease
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...
Lucidworks
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
Databricks
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
confluent
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
pmanvi
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Databricks
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
Crossing Minds
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
OpenSource Connections
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
confluent
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using Solr
Pradeep Pujari
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
Sease
 
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Databricks
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
Neil Baker
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale
 

What's hot (20)

Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Neural Search Comes to Apache Solr
Neural Search Comes to Apache SolrNeural Search Comes to Apache Solr
Neural Search Comes to Apache Solr
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza...
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using Solr
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
 
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 

Similar to Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP

Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
Trey Grainger
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
CareerBuilder.com
 
Building multi billion ( dollars, users, documents ) search engines on open ...
Building multi billion ( dollars, users, documents ) search engines  on open ...Building multi billion ( dollars, users, documents ) search engines  on open ...
Building multi billion ( dollars, users, documents ) search engines on open ...
Andrei Lopatenko
 
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Lucidworks
 
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
TYPO3 CertiFUNcation
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
Trey Grainger
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
Trey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
AI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsAI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analytics
DataWorks Summit
 
Solr Architecture
Solr ArchitectureSolr Architecture
Solr Architecture
Ramez Al-Fayez
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
Trey Grainger
 
From Academic Papers To Production : A Learning To Rank Story
From Academic Papers To Production : A Learning To Rank StoryFrom Academic Papers To Production : A Learning To Rank Story
From Academic Papers To Production : A Learning To Rank Story
Alessandro Benedetti
 
Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ Signal
Joachim Draeger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
Trey Grainger
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
Splunk
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng
 
Webinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment PerformanceWebinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Webinar: Lucidworks + Thomson Reuters for Improved Investment Performance
Lucidworks
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
markgrover
 
SplunkLive Oslo/Stockholm Beginner Workshop
SplunkLive Oslo/Stockholm Beginner WorkshopSplunkLive Oslo/Stockholm Beginner Workshop
SplunkLive Oslo/Stockholm Beginner Workshopjenny_splunk
 

Similar to Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP (20)

Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
Building multi billion ( dollars, users, documents ) search engines on open ...
Building multi billion ( dollars, users, documents ) search engines  on open ...Building multi billion ( dollars, users, documents ) search engines  on open ...
Building multi billion ( dollars, users, documents ) search engines on open ...
 
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
 
kdd2015
kdd2015kdd2015
kdd2015
 
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
 
Building Search & Recommendation Engines
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP

  • 1. Learning To Rank For Solr Michael Nilsson – Software Engineer Diego Ceccarelli – Software Engineer Joshua Pantony – Software Engineer Bloomberg LP
  • 2. OUTLINE ●  Search at Bloomberg ●  Why do we need machine learning for search? ●  Learning to Rank ●  Solr Learning to Rank Plugin
  • 3. 8 million searches PER DAY 1 million PER DAY 400 million stories in the index
  • 4. SOLR IN BLOOMBERG ●  Search engine of choice at Bloomberg ─  Large community / Well distributed committers ─  Open source Apache Project ─  Used within many commercial products ─  Large feature set and rapid growth ●  Committed to open-source ─  Ability to contribute to core engine ─  Ability to fix bugs ourselves ─  Contributions in almost every Solr release since 4.5.0
  • 6. PROBLEM SETUP Score = 100 * scoreOnTitle + 10 * scoreOnDescription score: 52.2 score: 30.8
  • 7. PROBLEM SETUP Score = 100 * scoreOnTitle + 10 * scoreOnDescription
  • 8. PROBLEM SETUP Score = 150 * scoreOnTitle + 3.14 * scoreOnDescription + 42 * clicks
  • 9. PROBLEM SETUP Score = 99.9 * scoreOnTitle + 3.1114 * scoreOnDescription + 42.42 * clicks + 5 * timeElapsedFromLastUpdate
  • 10. PROBLEM SETUP ●  It’s hard to manually tweak the ranking ─  You must be an expert in the domain ─  … or a magician Score = 99.9 * scoreOnTitle + 3.1114 * scoreOnDescription + 42.42 * clicks + 5 * timeElapsedFromLastUpdate query = solr query = lucene query = austin query = bloomberg query = …
  • 11. PROBLEM SETUP It’s easier with Machine Learning ●  2,000+ parameters (non-linear, factorially larger than linear form) ●  8,000+ queries that are regularly tuned ●  Early on we spent many days hand tuning…
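The hand-tuned scoring in the PROBLEM SETUP slides is a weighted sum of per-document signals. A minimal Python sketch of that idea follows; the weights 100 and 10 come from the slides, while the function name and the feature values for the two documents are made up for illustration.

```python
def linear_score(features, weights):
    """Weighted sum of feature values; a feature missing from the document counts as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


# Weights as on the slide: Score = 100 * scoreOnTitle + 10 * scoreOnDescription
weights = {"scoreOnTitle": 100.0, "scoreOnDescription": 10.0}

# Hypothetical per-document feature values (not from the talk).
doc_a = {"scoreOnTitle": 0.5, "scoreOnDescription": 0.25}
doc_b = {"scoreOnTitle": 0.25, "scoreOnDescription": 1.0}

score_a = linear_score(doc_a, weights)  # 100 * 0.5  + 10 * 0.25
score_b = linear_score(doc_b, weights)  # 100 * 0.25 + 10 * 1.0

# Rank documents by descending score.
ranking = sorted([("doc_a", score_a), ("doc_b", score_b)], key=lambda pair: pair[1], reverse=True)
```

Every new signal (clicks, recency) adds another weight to tune by hand for every query, which is the maintenance burden the slides describe.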
  • 12. SEARCH PIPELINE (ONLINE) Index Top-k retrieval User Query People Commodities News Other Sources ReRanking Model Top-k reranked Top-x retrieval x >> k
  • 15. TRAINING DATA: IMPLICIT VS EXPLICIT What is explicit data? ●  A set of judges manually assesses the search results for a given query ─  Experts ─  Crowd Pros: ─  Data is very clean Cons: ─  Can be very expensive! What is implicit data? ●  Infer user preferences based on user behavior ─  Aggregated result clicks ─  Query reformulation ─  Dwell time Pros: ─  A lot of data! Cons: ─  Extremely noisy ─  Privacy concerns
  • 17. FEATURES ●  A feature is an individual measurable property ●  Given a query and a collection, we can produce many features for each document in the collection ─  Does the query match the title? ─  Length of the document ─  Number of views ─  How old is it? ─  Can it be displayed on a mobile device?
  • 18. FEATURES Extract “features” Was the result a cofounder? 0 Features are signals that give an indication of a result’s importance
  • 19. FEATURES Extract “features” Features are signals that give an indication of a result’s importance Was the result a cofounder? 0 Does the document have an exec. position? 1 Query : APPL US
  • 20. FEATURES Extract “features” Features are signals that give an indication of a result’s importance Was the result a cofounder? 0 Does the query match the document title? 0 Does the document have an exec. position? 1
  • 21. FEATURES Extract “features” Features are signals that give an indication of a result’s importance Was the result a cofounder? 0 Does the query match the document title? 0 Does the document have an exec. position? 1 Popularity (%) 0.9
  • 22. FEATURES Extract “features” Features are signals that give an indication of a result’s importance Was the result a cofounder? 0 Does the query match the document title? 1 Does the document have an exec. position? 0 Popularity (%) 0.6
  • 24. METRICS How do we know if our model is doing better? ●  Offline metrics ─  Precision/Recall/F1 score ─  nDCG (Normalized Discounted Cumulative Gain) ─  Other metrics (e.g., ERR, MAP, …) ●  Online metrics ─  Click-through rates → higher ─  Time to first click → lower ─  Interleaving [1] [1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue. Large scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems, 30(1), 2012.
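As an illustration of the offline metrics listed on the METRICS slide, here is a small Python sketch of nDCG. It assumes graded relevance labels (0 = irrelevant, higher = better) and the common exponential-gain formulation; the talk does not say which variant was used, so treat the gain and discount choices as assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(position + 2)
               for position, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the ranking as given, divided by the DCG of the ideal ordering."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance labels of the results in the order the ranker returned them (made-up data).
perfect_order = [3, 2, 1, 0]
reversed_order = [0, 1, 2, 3]
```

A ranking that already lists results in descending relevance scores 1.0; any other ordering of the same labels scores lower.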
  • 26. LEARNING TO RANK ●  Learn how to combine the features for optimizing one or more metrics ●  Many learning algorithms ─  RankSVM [1] ─  LambdaMART [2] ─  … [1] T. Joachims, Optimizing Search Engines Using Clickthrough Data, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002. [2] C.J.C. Burges, "From RankNet to LambdaRank to LambdaMART: An Overview", Microsoft Research Technical Report MSR-TR-2010-82, 2010.
  • 30. SEARCH PIPELINE: SOLR INTEGRATION Index Top-k retrieval User Query Solr Ranking ModelOnline Top-x reranked People Commodities News Other Sources
  • 31. SOLR RELEVANCY ●  Pros ─  Simple and quick scoring computation ─  Phrase matching ─  Function query boosting on time, distance, popularity, etc ─  Customized fields for stemming, synonyms, etc ●  Cons ─  Lots of manual time for creating a well tuned query ─  Weights are brittle, and may not be compatible in the future with more documents or fields added
  • 32. LTR PLUGIN: GOALS ●  Don’t tune the relevancy manually! ─  Uses machine learning to power automatic relevancy tuning ●  Significant relevancy improvements ●  Allow comparable scores across collections ─  Collections of different sizes ●  Maintaining low latency ─  Re-use the vast Solr search functionality that is already built-in ─  Less data transport ●  Makes it simple to use domain knowledge to rapidly create features ─  Features are no longer coded but rather scripted
  • 33. STANDARD SOLR SEARCH REQUEST Index Top-k retrieval User Query People Commodities News Other Sources
  • 34. Index STANDARD SOLR SEARCH REQUEST Index [10 Million] Top-10 retrieval User Query Matches [10k] Score [10k] Solr Query People Commodities News Other Sources
  • 35. LTR SOLR SEARCH REQUEST Index [10 Million] Top-1000 retrieval User Query Matches [10k] Score [10k] Ranking Model Top-10 reranked Solr Query LTR Query People Commodities News Other Sources
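The two-pass flow on the LTR SOLR SEARCH REQUEST slide (retrieve a large candidate set with the regular Solr query, rescore only that window with the model, return the top few) can be sketched in Python as below. This is not the plugin's code; `rerank`, `model_score` and the sample hits are hypothetical names and data used only to show the control flow.

```python
def rerank(hits, model_score, rerank_docs, rows):
    """Rescore the first `rerank_docs` hits with the model and return the top `rows`.

    `hits` must already be sorted by the first-pass (Solr) score. Hits outside
    the rerank window keep their original relative order behind the window.
    """
    window = sorted(hits[:rerank_docs], key=model_score, reverse=True)
    return (window + hits[rerank_docs:])[:rows]


# Hypothetical first-pass results, ordered by Solr score.
hits = [
    {"id": "a", "solr_score": 3.0, "popularity": 0.1},
    {"id": "b", "solr_score": 2.0, "popularity": 0.9},
    {"id": "c", "solr_score": 1.0, "popularity": 0.5},
    {"id": "d", "solr_score": 0.5, "popularity": 1.0},
]

# Stand-in for a learned model: here it simply scores by popularity.
top = rerank(hits, model_score=lambda hit: hit["popularity"], rerank_docs=3, rows=2)
```

Document "d" would win under the model but sits outside the rerank window, which shows why the first-pass retrieval depth (1000 on the slide) matters.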
  • 36. LTR PLUGIN: RERANKING <!-- Query parser used to rerank top docs with a provided model --> <queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" /> ●  LTRQuery extends Solr’s RankQuery ─  Wraps main query to fetch initial results ─  Returns custom TopDocsCollector for reranked ordered results ●  Solr rerank request parameter rq={!ltr model=myModel1 reRankDocs=100 efi.user_query='james' efi.my_var=123} ─  !ltr – name used in the solrconfig.xml for the LTRQParserPlugin ─  model – name of deployed model to use for reranking ─  reRankDocs – total number of documents to rerank ─  efi.* – custom parameters used to pass external feature information for your features to use •  Query intent •  Personalization
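A client could assemble the `rq` parameter described on the RERANKING slide roughly as follows. This Python sketch only builds the request URL and sends nothing; the helper name `build_rq`, the `/select` handler path, and the `q` and `fl` values are assumptions, and the host and collection are the placeholders used elsewhere in the slides. Escaping of quotes inside string values is deliberately left out.

```python
from urllib.parse import urlencode


def build_rq(model, rerank_docs, efi=None):
    """Build the {!ltr ...} local-params string for Solr's rq parameter."""
    parts = [f"model={model}", f"reRankDocs={rerank_docs}"]
    for name, value in (efi or {}).items():
        if isinstance(value, str):
            value = "'" + value + "'"  # quote strings so values with spaces stay intact
        parts.append(f"efi.{name}={value}")
    return "{!ltr " + " ".join(parts) + "}"


rq = build_rq("myModel1", 100, {"user_query": "james", "my_var": 123})

# q and fl are illustrative; only rq is taken from the slide.
params = {"q": "james", "rq": rq, "fl": "*,score"}
url = "http://yoursolrserver/solr/collection/select?" + urlencode(params)
```

Letting `urlencode` handle the braces, exclamation mark and spaces avoids hand-escaping the local-params syntax.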
  • 37. SEARCH PIPELINE (ONLINE) Index [10 Million] Top-1000 retrieval User Query Matches [10k] Score [10k] Ranking Model Top-10 reranked Feature Extraction People Commodities News Other Sources
  • 38. FEATURES { "name": "Tim Cook", "primary_position": "ceo", "category": "person", … } Extract “features” Features are signals that give an indication of a result’s importance Was the result a cofounder? 0 Does the query match the document title? 0 Does the document have an exec. position? 1 Popularity (%) 0.9
  • 40. LTR PLUGIN: FEATURES AFTER [ { "name": "isPersonAndExecutive", "type": "org.apache.solr.ltr.feature.impl.SolrFeature", "params": { "fq": [ "{!terms f=category}person", "{!terms f=primary_position}ceo, cto, cfo, president" ] } }, … ]
  • 41. LTR PLUGIN: FUNCTION QUERIES [ { "name": "documentRecency", "type": "org.apache.solr.ltr.feature.impl.SolrFeature", "params": { "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)" } }, … ] 1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated 2 years ago, etc. See http://wiki.apache.org/solr/FunctionQuery#Date_Boosting
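To see why the `documentRecency` feature yields 1, 1/2, 1/3 and so on, recall that Solr's `recip(x, m, a, b)` computes `a / (m*x + b)` and that `ms(NOW,publish_date)` is the document age in milliseconds. The Python sketch below reproduces the arithmetic outside Solr; the helper names and the 365.25-day year are assumptions made for the example.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.16e10 ms, so 3.16e-11 is roughly 1 / year


def recip(x, m, a, b):
    """Same formula as Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


def document_recency(age_ms):
    """The slide's feature: recip(ms(NOW,publish_date), 3.16e-11, 1, 1)."""
    return recip(age_ms, 3.16e-11, 1, 1)


now = document_recency(0)
one_year = document_recency(MS_PER_YEAR)
two_years = document_recency(2 * MS_PER_YEAR)
```

With `m` set to about one over the number of milliseconds in a year, the denominator is simply 1 plus the age in years, which gives the decay the slide describes.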
  • 42. LTR PLUGIN: FEATURE STORE ●  FeatureStore is a Solr Managed Resource ─  REST API endpoint for performing CRUD operations on Solr objects ─  Stored and maintained in ZooKeeper ●  Deploy ─  curl -XPUT 'http://yoursolrserver/solr/collection/config/fstore' --data-binary @./features.json -H 'Content-type:application/json' ●  View ─  http://yoursolrserver/solr/collection/config/fstore
  • 43. LTR PLUGIN: FEATURES ●  Simplifies feature engineering through configuration file ●  Utilizes rich search functionality built-in to Solr ─  Phrase matching ─  Synonyms, Stemming, etc ●  Inherit the Feature class for specialized features
  • 44. SEARCH PIPELINE (ONLINE) Index [10 Million] Top-1000 retrieval User Query Matches [10k] Score [10k] Ranking Model Top-10 reranked Feature Extraction People Commodities News Other Sources
  • 45. TRAINING PIPELINE (OFFLINE) Index [10 Million] Top-1000 retrieval Training Queries Matches [10k] Score [10k] Feature Extraction Learning Algorithm Ranking Model People Commodities News Other Sources
  • 46. FEATURES { "name": "Tim Cook", "primary_position": "ceo", "category": "person", … } Extract “features” Features are signals that give an indication of a result’s importance Was the result a cofounder? 0 Does the query match the document title? 0 Does the document have an exec. position? 1 Popularity (%) 0.9
  • 47. LTR PLUGIN: FEATURE EXTRACTION <!-- Document transformer adding feature vectors with each retrieved document --> <transformer name="fv" class="org.apache.solr.ltr.ranking.LTRFeatureTransformer" /> ●  Feature extraction uses Solr’s TransformerFactory ─  Returns a custom field with each document ●  fl = *, [fv] { "name": "Tim Cook", "primary_position": "ceo", "category": "person", … "[fv]": "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9" }
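For offline training, the `[fv]` string returned with each document has to be turned back into numbers. A minimal Python sketch, assuming the comma-separated `name:value` layout shown on the FEATURE EXTRACTION slide (the exact format may differ between plugin versions) and using a hypothetical helper name:

```python
def parse_feature_vector(fv):
    """Parse 'name:value, name:value, ...' into a dict of floats."""
    features = {}
    for pair in fv.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        name, _, value = pair.rpartition(":")
        features[name] = float(value)
    return features


# The "[fv]" value from the slide's example document.
doc = {
    "name": "Tim Cook",
    "[fv]": "isCofounder:0.0, isPersonAndExecutive:1.0, matchTitle:0.0, popularity:0.9",
}
features = parse_feature_vector(doc["[fv]"])
```

Each parsed dict, paired with a relevance label for the query-document pair, becomes one training row for the learning algorithm.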
  • 48. LTR PLUGIN: MODEL { "type": "org.apache.solr.ltr.ranking.LambdaMARTModel", "name": "mymodel1", "features": [ { "name": "matchedTitle" }, { "name": "isPersonAndExecutive" } ], "params": { "trees": [ { "weight": 1, "tree": { "feature": "matchedTitle", "threshold": 0.5, "left": { "value": -100 }, "right": { "feature": "isPersonAndExecutive", "threshold": 0.5, "left": { "value": 50 }, "right": { "value": 75 } } } } ] } }
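The model JSON on the MODEL slide describes an ensemble of weighted regression trees. The Python sketch below walks such a structure to score one document. It assumes that a feature value less than or equal to the threshold goes left and anything larger goes right, which is consistent with the slide's example (no title match leads to -100) but is not stated in the talk; the function names are hypothetical.

```python
def score_tree(node, features):
    """Descend from the root until a leaf (a node carrying "value") is reached."""
    while "value" not in node:
        if features.get(node["feature"], 0.0) <= node["threshold"]:
            node = node["left"]   # assumed convention: value <= threshold goes left
        else:
            node = node["right"]
    return node["value"]


def score_model(model, features):
    """Sum of weight * leaf value over all trees in the ensemble."""
    return sum(entry["weight"] * score_tree(entry["tree"], features)
               for entry in model["params"]["trees"])


# The single-tree model from the slide.
model = {
    "params": {
        "trees": [{
            "weight": 1,
            "tree": {
                "feature": "matchedTitle", "threshold": 0.5,
                "left": {"value": -100},
                "right": {
                    "feature": "isPersonAndExecutive", "threshold": 0.5,
                    "left": {"value": 50},
                    "right": {"value": 75},
                },
            },
        }]
    }
}

no_match = score_model(model, {"matchedTitle": 0.0, "isPersonAndExecutive": 1.0})
match_only = score_model(model, {"matchedTitle": 1.0, "isPersonAndExecutive": 0.0})
match_and_exec = score_model(model, {"matchedTitle": 1.0, "isPersonAndExecutive": 1.0})
```

Unlike the hand-tuned linear formula, the tree can express interactions: the executive feature only matters once the title has matched.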
  • 49. LTR PLUGIN: MODEL ●  ModelStore is also a Solr Managed Resource ●  Deploy ─  curl -XPUT 'http://yoursolrserver/solr/collection/config/mstore' --data-binary @./model.json -H 'Content-type:application/json' ●  View ─  http://yoursolrserver/solr/collection/config/mstore ●  Inherit from the model class for new scoring algorithms ─  score() ─  explain()
  • 50. LTR PLUGIN: EVALUATION ●  Offline Metrics ─  nDCG increased approximately 10% after reranking ●  Online Metrics ─  Clicks @ 1 up by approximately 10%
  • 51. BEFORE AND AFTER Query: “unemployment” Solr Ranking Machine Learned Reranking
  • 52. LTR PLUGIN: EVALUATION ●  Offline Metrics ─  nDCG increased approximately 10% after reranking ●  Online Metrics ─  Clicks @ 1 up by approximately 10% ●  Performance ─  About 30% faster than previous external ranking system ─  Test setup: 10 million documents in collection, 100k queries, 1k features, 1k documents/query reranked
  • 53. LTR PLUGIN: BENEFITS ●  Simpler feature engineering, without compiling ●  Access to rich internal Solr search functionality for feature building ●  Search result relevancy improvements vs regular Solr relevance ●  Automatic relevancy tuning ●  Compatible scores across collections ●  Performance benefits vs external ranking system
  • 54. FUTURE WORK ●  Continue work to open source the plugin ●  Support pipelining multiple reranking models ●  Allow a simple ranking model to be used in the first pass