SlideShare a Scribd company logo
LEAN RANKING INFRASTRUCTURE WITH SOLR
Ranking infrastructure and using Solr for lean prototyping
sergii.khomenko@stylight.com, @lc0d3r

Slide 1 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
AGENDA

1.

Problem definition

2.

Boosting

3.

Lean approach to ranking infrastructure

4.

Real-word examples

Slide 2 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LUCENE, SOLR, ELASTICSEARCH
SOLR USERS

Slide 4 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
STYLIGHT – THE BEST PLACE TO DISCOVER FASHION

Slide 5 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
GET INSPIRED BY LOOKS CREATED BY COMMUNITY

Slide 6 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
DISCOVER THOUSANDS OF BRANDS AND MILLIONS OF PRODUCTS

Slide 7 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
STYLIGHT – INTERNATIONAL COMMUNITY
Live in 13 countries

Slide 8 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
PROBLEM DEFINITION

Slide 9 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
PROBLEM DEFINITION

Slide 10 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
PROBLEM DEFINITION

Ranking specifics:
• Seasonal influence
• Trends
• Cold start of new countries, shops
• Multiple dimensions of ranking model

Slide 11 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
IMPROVING RELEVANCE

Slide 12 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
IMPROVING RELEVANCE
TF-IDF - default scoring model in Lucene/Solr

•
•
•

matching more query terms is better
more occurrences of a query term is better
more novel terms increase doc score more than common terms

Slide 13 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
IMPROVING RELEVANCE

• Stages to improve relevance in Solr
• Editorial voting (QueryEvaluationComponent)
• Indexing time
(analyzing content, text analysis)

• Query-time
(function queries, boosting)

Slide 14 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
IMPROVING RELEVANCE
Solr queries
q = +brand:adidas shop:monshowroom^3
q = +adidas monshowroom
defType = dismax
qf = brand shop^3
sort = user_ratings desc, score desc
qq = adidas
q = {!boost b=$b defType=dismax v=$qq}
b = prod(popularity, clicks)

Slide 15 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
IMPROVING RELEVANCE
Definition of solr.ExternalFileField
<types>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="file_delta2" class="solr.ExternalFileField"
keyField="id" defVal="1.0" indexed="false" stored="false" valType="float"
/>
</types>
<fields>
<field name="delta2" type="file_delta2"/>
</fields>

Slide 16 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
IMPROVING RELEVANCE
Example of external file with boosting
coresde_DEproductsexternal_delta2.txt
15062471=0.5
15062479=0.2
15062507=0.41

Slide 17 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LEAN APPROACH TO RANKING

Slide 18 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LEAN APPROACH TO RANKING

Lean manufacturing, lean enterprise, or lean production, often simply,
"lean", is a production practice that considers the expenditure of resources for any goal
other than the creation of value for the end customer to be wasteful, and thus a target
for elimination.

Essentially, lean is centered on preserving value with less work.

Slide 19 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LEAN APPROACH TO RANKING

Requirements:
• Decreasing time to implement new ranking model
• Possibility to use more dynamic ranking models
• Keeping working infrastructure alive
• A/B testing without changing entire infrastructure
• Performance level - “still fast” and “transparent”

Slide 20 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LEAN APPROACH TO RANKING
Python benchmark
-h, --help
show this help message and exit
--gaid gaid, -g gaid Google analytics site id.
--gadate gadate a date to fetch the most popular pages from Google Analytics
–-solr solr, -s solr Solr server to benchmark performance.
--pages number, -p number a number of top pages from Google Analytics.
--repeats number, -r number a number of repeats for an every page.
--compare compare, -c compare Different rankings algorithms to compare.
--cmpmode CMPMODE run benchmark in comparison mode

python solr-benchmarkbenchmark.py -c RankingClassical,RankingDelta2
python solr-benchmarkbenchmark.py -c RankingClassical,RankingDelta2 --cmpmode 1

Slide 21 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LEAN APPROACH TO RANKING
Common search infrastructure

nginx

Solr-loadbalancer

Slide 22 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng

nginx

Solr

nginx

Jboss

Solr

Solr
LEAN APPROACH TO RANKING
Updated

nginx

Solr-loadbalancer

Front-end loadbalancer

Jboss

Slide 23 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng

Solr-loadbalancer

nginx

Solr

nginx

Jboss

Solr

Solr

nginx

Solr
LEAN APPROACH TO RANKING
nginx / templates / conf / solr-rewrites.conf.erb
include nginx
nginx::config { "solr_dev": }
nginx::solr-ranking { "delta2":
urls => [
"/search.action?gender=women&brand=2271&tag=1161&tag=877&tag=468",
"/search.action?gender=men&brand=11235&tag=10203&tag=10299&tag=10326"
],

Slide 24 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
LEAN APPROACH TO RANKING
nginx / templates / conf / solr-rewrites.conf.erb
<% urls.each do |url| -%>
if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end %><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if
url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) {
set $orig $args;
set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*";
rewrite ^(.*)$ "$1?$orig" break;
}
<% end -%>

Slide 25 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
REAL-WORLD EXAMPLES

Slide 26 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
ELEPHANT-DRIVEN ARCHITECTURE
Multiple pieces to perform simple task

Slide 27 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
SIMPLIFIED VERSION
Less code – less bugs

Slide 28 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
REAL-WORLD EXAMPLES

Slide 29 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
REAL-WORLD EXAMPLES

Slide 30 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
REAL-WORLD EXAMPLES
Multiple points to evaluate

Stages to evaluate the model:
• R ranking model
• Independent Solr-node
• For internal use-cases
• Testing for some of pages
• A/B roll out for % of users
• Production roll out

Slide 31 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
THANKS FOR YOUR ATTENTION!
Questions?

Slide 32 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
Sergii Khomenko
Data Scientist
STYLIGHT GmbH
sergii.khomenko@stylight.com
@lc0d3r
http://www.stylight.com
Nymphenburger Straße 86
80636 Munich, Germany

Slide 33 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
REFERENCE LIST

• Stack Overflow Tag Trends
http://hewgill.com/~greg/stackoverflow/stack_overflow/tags/#!l
ucene+solr+elasticsearch+sphinx
• Public websites using Solr
http://wiki.apache.org/solr/PublicServers
• CommonQueryParameters
http://wiki.apache.org/solr/CommonQueryParameters
• Thoughts in plain text http://lc0.github.io/
• STYLIGHT Engineering http://www.stylight.com/Engineering/
FASHION FRIDAY

Slide 35 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng

More Related Content

Similar to Lean Ranking infrastructure with Solr

Introduction to SQL Server Graph DB
Introduction to SQL Server Graph DBIntroduction to SQL Server Graph DB
Introduction to SQL Server Graph DB
Greg McMurray
 
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Lucian Precup
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 
Extending OutSystems with Javascript
Extending OutSystems with JavascriptExtending OutSystems with Javascript
Extending OutSystems with Javascript
RitaDias72
 
UI Realigning & Refactoring by Jina Bolton
UI Realigning & Refactoring by Jina BoltonUI Realigning & Refactoring by Jina Bolton
UI Realigning & Refactoring by Jina Bolton
Codemotion
 
Html5
Html5Html5
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
NoSQLmatters
 
R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?
Villu Ruusmann
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
Tobias Lindaaker
 
Web API Design 2013
Web API Design 2013Web API Design 2013
Web API Design 2013gidgreen
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
Bhavani Akunuri
 
Data Visualization with R
Data Visualization with RData Visualization with R
Data Visualization with R
Sergii Khomenko
 
Solid Works
Solid WorksSolid Works
Solid Works
Ashwin Shaji
 
Feed your inner data scientist. JS Visualization Tools
Feed your inner data scientist.  JS Visualization ToolsFeed your inner data scientist.  JS Visualization Tools
Feed your inner data scientist. JS Visualization Tools
Doug Mair
 
Lighting talk neo4j fosdem 2011
Lighting talk neo4j fosdem 2011Lighting talk neo4j fosdem 2011
Lighting talk neo4j fosdem 2011
Jordi Valverde
 
Rails MVC by Sergiy Koshovyi
Rails MVC by Sergiy KoshovyiRails MVC by Sergiy Koshovyi
Rails MVC by Sergiy Koshovyi
Pivorak MeetUp
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
MongoDB
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
Johann de Boer
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Julian Hyde
 
Design patterns
Design patternsDesign patterns
Design patterns
nisheesh
 

Similar to Lean Ranking infrastructure with Solr (20)

Introduction to SQL Server Graph DB
Introduction to SQL Server Graph DBIntroduction to SQL Server Graph DB
Introduction to SQL Server Graph DB
 
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Extending OutSystems with Javascript
Extending OutSystems with JavascriptExtending OutSystems with Javascript
Extending OutSystems with Javascript
 
UI Realigning & Refactoring by Jina Bolton
UI Realigning & Refactoring by Jina BoltonUI Realigning & Refactoring by Jina Bolton
UI Realigning & Refactoring by Jina Bolton
 
Html5
Html5Html5
Html5
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
 
R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
 
Web API Design 2013
Web API Design 2013Web API Design 2013
Web API Design 2013
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
 
Data Visualization with R
Data Visualization with RData Visualization with R
Data Visualization with R
 
Solid Works
Solid WorksSolid Works
Solid Works
 
Feed your inner data scientist. JS Visualization Tools
Feed your inner data scientist.  JS Visualization ToolsFeed your inner data scientist.  JS Visualization Tools
Feed your inner data scientist. JS Visualization Tools
 
Lighting talk neo4j fosdem 2011
Lighting talk neo4j fosdem 2011Lighting talk neo4j fosdem 2011
Lighting talk neo4j fosdem 2011
 
Rails MVC by Sergiy Koshovyi
Rails MVC by Sergiy KoshovyiRails MVC by Sergiy Koshovyi
Rails MVC by Sergiy Koshovyi
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Design patterns
Design patternsDesign patterns
Design patterns
 

More from Sergii Khomenko

Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...
Sergii Khomenko
 
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
Sergii Khomenko
 
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Sergii Khomenko
 
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Sergii Khomenko
 
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Sergii Khomenko
 
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Sergii Khomenko
 
From simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with TableauFrom simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with Tableau
Sergii Khomenko
 
Crunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-casesCrunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-cases
Sergii Khomenko
 

More from Sergii Khomenko (8)

Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...
 
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
 
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
 
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
 
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
 
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
 
From simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with TableauFrom simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with Tableau
 
Crunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-casesCrunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-cases
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 

Lean Ranking infrastructure with Solr

  • 1. LEAN RANKING INFRASTRUCTURE WITH SOLR Ranking infrastructure and using Solr for lean prototyping sergii.khomenko@stylight.com, @lc0d3r Slide 1 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 2. AGENDA 1. Problem definition 2. Boosting 3. Lean approach to ranking infrastructure 4. Real-word examples Slide 2 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 4. SOLR USERS Slide 4 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 5. STYLIGHT – THE BEST PLACE TO DISCOVER FASHION Slide 5 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 6. GET INSPIRED BY LOOKS CREATED BY COMMUNITY Slide 6 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 7. DISCOVER THOUSANDS OF BRANDS AND MILLIONS OF PRODUCTS Slide 7 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 8. STYLIGHT – INTERNATIONAL COMMUNITY Live in 13 countries Slide 8 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 9. PROBLEM DEFINITION Slide 9 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 10. PROBLEM DEFINITION Slide 10 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 11. PROBLEM DEFINITION Ranking specifics: • Seasonal influence • Trends • Cold start of new countries, shops • Multiple dimensions of ranking model Slide 11 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 12. IMPROVING RELEVANCE Slide 12 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 13. IMPROVING RELEVANCE TF-IDF - default scoring model in Lucene/Solr • • • matching more query terms is better more occurrences of a query term is better more novel terms increase doc score more than common terms Slide 13 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 14. IMPROVING RELEVANCE • Stages to improve relevance in Solr • Editorial voting (QueryEvaluationComponent) • Indexing time (analyzing content, text analysis) • Query-time (function queries, boosting) Slide 14 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 15. IMPROVING RELEVANCE Solr queries q = +brand:adidas shop:monshowroom^3 q = +adidas monshowroom defType = dismax qf = brand shop^3 sort = user_ratings desc, score desc qq = adidas q = {!boost b=$b defType=dismax v=$qq} b = prod(popularity, clicks) Slide 15 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 16. IMPROVING RELEVANCE Definition of solr.ExternalFileField <types> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="file_delta2" class="solr.ExternalFileField" keyField="id" defVal="1.0" indexed="false" stored="false" valType="float" /> </types> <fields> <field name="delta2" type="file_delta2"/> </fields> Slide 16 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 17. IMPROVING RELEVANCE Example of external file with boosting coresde_DEproductsexternal_delta2.txt 15062471=0.5 15062479=0.2 15062507=0.41 Slide 17 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 18. LEAN APPROACH TO RANKING Slide 18 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 19. LEAN APPROACH TO RANKING Lean manufacturing, lean enterprise, or lean production, often simply, "lean", is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful, and thus a target for elimination. Essentially, lean is centered on preserving value with less work. Slide 19 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 20. LEAN APPROACH TO RANKING Requirements: • Decreasing time to implement new ranking model • Possibility to use more dynamic ranking models • Keeping working infrastructure alive • A/B testing without changing entire infrastructure • Performance level - “still fast” and “transparent” Slide 20 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 21. LEAN APPROACH TO RANKING Python benchmark -h, --help show this help message and exit --gaid gaid, -g gaid Google analytics site id. --gadate gadate a date to fetch the most popular pages from Google Analytics –-solr solr, -s solr Solr server to benchmark performance. --pages number, -p number a number of top pages from Google Analytics. --repeats number, -r number a number of repeats for an every page. --compare compare, -c compare Different rankings algorithms to compare. --cmpmode CMPMODE run benchmark in comparison mode python solr-benchmarkbenchmark.py -c RankingClassical,RankingDelta2 python solr-benchmarkbenchmark.py -c RankingClassical,RankingDelta2 --cmpmode 1 Slide 21 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 22. LEAN APPROACH TO RANKING Common search infrastructure nginx Solr-loadbalancer Slide 22 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng nginx Solr nginx Jboss Solr Solr
  • 23. LEAN APPROACH TO RANKING Updated nginx Solr-loadbalancer Front-end loadbalancer Jboss Slide 23 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng Solr-loadbalancer nginx Solr nginx Jboss Solr Solr nginx Solr
  • 24. LEAN APPROACH TO RANKING nginx / templates / conf / solr-rewrites.conf.erb include nginx nginx::config { "solr_dev": } nginx::solr-ranking { "delta2": urls => [ "/search.action?gender=women&brand=2271&tag=1161&tag=877&tag=468", "/search.action?gender=men&brand=11235&tag=10203&tag=10299&tag=10326" ], Slide 24 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 25. LEAN APPROACH TO RANKING nginx / templates / conf / solr-rewrites.conf.erb <% urls.each do |url| -%> if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end %><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) { set $orig $args; set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*"; rewrite ^(.*)$ "$1?$orig" break; } <% end -%> Slide 25 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 26. REAL-WORLD EXAMPLES Slide 26 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 27. ELEPHANT-DRIVEN ARCHITECTURE Multiple pieces to perform simple task Slide 27 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 28. SIMPLIFIED VERSION Less code – less bugs Slide 28 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 29. REAL-WORLD EXAMPLES Slide 29 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 30. REAL-WORLD EXAMPLES Slide 30 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 31. REAL-WORLD EXAMPLES Multiple points to evaluate Stages to evaluate the model: • R ranking model • Independent Solr-node • For internal use-cases • Testing for some of pages • A/B roll out for % of users • Production roll out Slide 31 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 32. THANKS FOR YOUR ATTENTION! Questions? Slide 32 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 33. Sergii Khomenko Data Scientist STYLIGHT GmbH sergii.khomenko@stylight.com @lc0d3r http://www.stylight.com Nymphenburger Straße 86 80636 Munich, Germany Slide 33 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng
  • 34. REFERENCE LIST • Stack Overflow Tag Trends http://hewgill.com/~greg/stackoverflow/stack_overflow/tags/#!l ucene+solr+elasticsearch+sphinx • Public websites using Solr http://wiki.apache.org/solr/PublicServers • CommonQueryParameters http://wiki.apache.org/solr/CommonQueryParameters • Thoughts in plain text http://lc0.github.io/ • STYLIGHT Engineering http://www.stylight.com/Engineering/
  • 35. FASHION FRIDAY Slide 35 | STYLIGHT | Proud to bleed purple | @lc0d3r and @stylight_eng