Solr Payloads for Ranking Data
Soubhik
Search Quality, BloomReach, SNAP
Outline
● Ranking Data in External File
● Issues
● Payloads
● Benefits
Primary Contributors:
● Renuka Khandelwal, Ricardo Shih, Parag Agrawal
Ranking Data
● computed offline
● a score is attached to terms in a product
● stored in an external data file
● a Solr FunctionQuery [2] is used to read the score from the file and apply it in the ranking equation
○ loads the data from the file to a Java HashMap<String, Float>
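The lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the Solr implementation: the `term=score` line format is assumed, the names `load_ranking_data` and `ranked_score` are hypothetical, a plain dict stands in for the Java HashMap<String, Float>, and multiplying the two scores is an assumption because the slides do not give the ranking equation.

```python
def load_ranking_data(lines):
    """Parse 'term=score' lines into a dict (stand-in for HashMap<String, Float>)."""
    scores = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        term, _, value = line.rpartition("=")
        scores[term] = float(value)
    return scores


def ranked_score(base_score, term, scores, default=0.0):
    """Combine a base relevance score with the offline score for a term.

    Multiplication is an assumption made for illustration only.
    """
    return base_score * scores.get(term, default)


ranking = load_ranking_data(["fashion=2.2", "nike=0.366"])
print(ranked_score(1.5, "fashion", ranking))
```

Every term in the file occupies a String key and a map entry on the heap, whether or not the term occurs in the local index.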
Issues
● ranking data is collection-specific
○ 500 MB of memory for one collection
● 6 data centers × 6 replicas: 18 GB for one collection
● in a multi-sharded collection: multiplied by the number of shards
● multiplied by the number of merchants (~75)
● during a reload: 2×
● what about A/B tests?
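The arithmetic behind these figures can be written out as a short sketch. It assumes that each replica of each shard holds its own full copy of the data, that each merchant has a separate collection, and that 1 GB = 1000 MB (which reproduces the 18 GB on the slide); the function name and parameters are hypothetical.

```python
def ranking_data_memory_gb(per_copy_mb=500, data_centers=6, replicas=6,
                           shards=1, merchants=1, reloading=False):
    """Estimate total memory used by in-memory ranking data, in GB."""
    # One full copy per replica, per data center, per shard, per merchant.
    copies = data_centers * replicas * shards * merchants
    total_mb = per_copy_mb * copies
    if reloading:
        # Old and new copies coexist while a reload is in progress.
        total_mb *= 2
    return total_mb / 1000


print(ranking_data_memory_gb())                # 18.0, matching the slide
print(ranking_data_memory_gb(reloading=True))  # doubled during a reload
```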
Payloads
A payload is arbitrary data that can be attached to an indexed term. It is stored as a byte array.
● the precomputed ranking score is stored as a payload
● during scoring, the stored payload is used instead of the TF-IDF score
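To make the byte-array idea concrete, here is a minimal sketch of packing a float score into a 4-byte payload and reading it back. The big-endian IEEE 754 single-precision layout and the `term|score` input convention with `|` as delimiter are illustrative assumptions, not details taken from the slides, and the helper names are hypothetical.

```python
import struct


def encode_payload(score):
    """Pack a float score into 4 bytes (big-endian single precision)."""
    return struct.pack(">f", score)


def decode_payload(payload):
    """Recover the float score from a 4-byte payload."""
    return struct.unpack(">f", payload)[0]


def split_term_and_payload(token, delimiter="|"):
    """Split a 'term|score' token into the term and its encoded payload."""
    term, _, raw = token.partition(delimiter)
    return term, encode_payload(float(raw))


term, payload = split_term_and_payload("nike|0.366")
print(term, len(payload), decode_payload(payload))
```

Because the payload is single precision, the decoded value is only approximately equal to the score that was written.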
Benefits
● better memory utilization
○ indexed terms are not replicated in memory
○ saves the overhead of the Java HashMap: the String and Map.Entry objects
■ for English words, the average overhead of String is 3X
○ only terms that are present in the index of a given shard are stored
○ 60% reduction in JVM memory utilization
● no additional reloads
● A/B Testability
● use ranking data from field1 or field2 depending on the request parameters
[Figure: index fields field1 and field2 alongside ranking algorithms Algo1 and Algo2]
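The field selection for an A/B test might look like the sketch below. The request parameter name `ranking_variant`, the mapping of Algo1 to field1 and Algo2 to field2, and the fallback to field1 are all assumptions made for illustration; the slides only say that the field is chosen from the request parameters.

```python
# Hypothetical mapping from A/B variant to the field holding its payload scores.
VARIANT_FIELDS = {"algo1": "field1", "algo2": "field2"}


def select_ranking_field(params, default_field="field1"):
    """Pick the payload field to score against, based on request parameters."""
    variant = params.get("ranking_variant", "").lower()
    return VARIANT_FIELDS.get(variant, default_field)


print(select_ranking_field({"q": "shoes", "ranking_variant": "algo2"}))
```

Both variants read from the same index, so switching between them needs no extra in-memory copy of the ranking data.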
References
1. Working with External Files and Processes
2. FunctionQuery
3. Payloads
EFF (External File Field)
● a schema fieldType that allows the data to be read from a file outside of the index [1]
e.g. the value of the field is read from a file external_category_rank.txt
<fieldType name="rankingData" keyField="category" defVal="0" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/>
<field name="category_rank" type="rankingData" />
fashion=2.2
nike=0.366

Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 

Solr Payloads for Ranking Data

  • 1. Solr Payloads for Ranking Data Soubhik Search Quality, BloomReach, SNAP
  • 2. Outline ● Ranking Data in External File ● Issues ● Payloads ● Benefits Primary Contributors: ● Renuka Khandelwal, Ricardo Shih, Parag Agrawal
  • 4. Ranking Data ● computed offline ● a score is attached to terms in a product ● stored in an external data file ● a Solr FunctionQuery [2] reads the score from the file and applies it in the ranking equation ○ it loads the data from the file into a Java HashMap<String, Float>
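The external-file approach on this slide could look roughly like the sketch below. It is an illustration only, not the presenters' code: the class name `RankingDataLoader`, the method `parse`, and the `term=score` line format (borrowed from the EFF example later in the deck) are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical loader: turns "term=score" lines into a HashMap<String, Float>. */
final class RankingDataLoader {

    /** Parses one "term=score" entry per line; blank or malformed lines are skipped. */
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> scores = new HashMap<>();
        for (String line : lines) {
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue; // no '=' on the line, or nothing before it
            }
            String term = line.substring(0, eq).trim();
            if (term.isEmpty()) {
                continue; // key was only whitespace
            }
            try {
                scores.put(term, Float.parseFloat(line.substring(eq + 1).trim()));
            } catch (NumberFormatException e) {
                // value is missing or not a float: skip the line
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // In practice the lines would come from the external file,
        // for example via java.nio.file.Files.readAllLines.
        Map<String, Float> scores = parse(Arrays.asList("fashion=2.2", "nike=0.366"));
        System.out.println(scores.get("nike"));
    }
}
```

Every term string and every boxed `Float` lives on the JVM heap in this design, which is the cost the following slides quantify.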
  • 11. Issues ● collection-specific ranking data ○ 500 MB of memory for one collection ● 6 data centers, 6 replicas: 18 GB for one collection ● in a multi-sharded collection: × number of shards ● × number of merchants (~75) ● reload: 2X ● what about A/B tests?
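The arithmetic behind these figures can be checked with a small sketch. It assumes "6 data centers, 6 replicas" means 6 replicas in each data center (36 copies of the map), and that "reload: 2X" means the old and new maps coexist on the heap during a reload; neither reading is stated explicitly on the slide. The class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate of heap used by the ranking map. */
final class RankingMemoryEstimate {

    /** Total MB when every replica in every data center holds its own copy. */
    static long steadyStateMb(long perCopyMb, int dataCenters, int replicasPerDataCenter) {
        return perCopyMb * dataCenters * replicasPerDataCenter;
    }

    /** Peak MB during a reload, assuming the old and new maps coexist (2X). */
    static long reloadPeakMb(long steadyStateMb) {
        return 2 * steadyStateMb;
    }

    public static void main(String[] args) {
        long steady = steadyStateMb(500, 6, 6); // 500 MB x 36 copies
        System.out.println("steady state MB: " + steady);
        System.out.println("reload peak MB:  " + reloadPeakMb(steady));
        // Assumption: every merchant's collection is about the same size.
        System.out.println("75 merchants MB: " + steady * 75);
    }
}
```

The shard multiplier is left out because the slide gives no shard count.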
  • 13. Payloads A payload is arbitrary data that can be attached to an indexed term. It is stored as a byte array. ● the precomputed ranking score is stored as a payload ● during scoring, the stored payload is used instead of TF-IDF
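Since a payload is just a byte array, a float score has to be encoded before indexing and decoded at scoring time. The sketch below shows one way to do that with the JDK alone; the class name `FloatPayload` is hypothetical. Lucene provides `PayloadHelper.encodeFloat` and `PayloadHelper.decodeFloat` for the same job, and the deck does not say which encoding the presenters used.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: stores a ranking score as a 4-byte payload. */
final class FloatPayload {

    /** Encodes the score as 4 bytes, big-endian IEEE 754 (ByteBuffer's default order). */
    static byte[] encode(float score) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(score).array();
    }

    /** Decodes the first 4 bytes of a payload produced by encode. */
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encode(0.366f);
        System.out.println(payload.length + " bytes, score " + decode(payload));
    }
}
```

At four bytes per term occurrence and no per-entry Java objects, the score costs far less than a `HashMap` entry holding a `String` key and a boxed `Float`.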
  • 21. Benefits ● better memory utilization ○ indexed terms are not replicated in memory ○ saves the overhead of the Java HashMap: the String and Map.Entry objects ■ for English words, the average overhead of String is 3X ○ only terms present in the index of a given shard are stored ○ 60% reduction in JVM memory utilization ● no additional reloads
  • 22. ● A/B testability ● use ranking data from field1 or field2 depending on the request parameters [Figure: field1 and field2, shown for Algo1 and for Algo2]
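Choosing the payload field per request might look like the sketch below. The slide does not name the request parameter or say which algorithm reads which field, so the parameter `rankAlgo`, its values, and the mapping (Algo1 to `field1`, Algo2 to `field2`) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical selector: picks the payload field that backs the requested algorithm. */
final class RankingFieldSelector {

    /** Assumed request parameter name; not taken from the slides. */
    static final String PARAM = "rankAlgo";

    /** Returns "field2" when the request asks for algo2, otherwise "field1". */
    static String fieldFor(Map<String, String> requestParams) {
        String algo = requestParams.getOrDefault(PARAM, "algo1");
        return "algo2".equalsIgnoreCase(algo) ? "field2" : "field1";
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(PARAM, "algo2");
        System.out.println(fieldFor(params));
    }
}
```

Because both fields are indexed side by side, switching variants is a per-request decision and needs no reload of ranking data.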
  • 24. References: [1] Working with External Files and Processes [2] FunctionQuery [3] Payloads
  • 25. EFF
  • 26. EFF (External File Field) ● a schema fieldType that allows the data to be read from a file outside of the index [1], e.g. the value of the field is read from the file external_category_rank.txt
      <fieldType name="rankingData" keyField="category" defVal="0" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/>
      <field name="category_rank" type="rankingData" />
    external_category_rank.txt:
      fashion=2.2
      nike=0.366
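The lookup semantics of this configuration can be modeled in a few lines: the document's `keyField` value is the key into the file, and `defVal` is returned when the file has no entry for it. This is a simplified model for illustration, not Solr's implementation, and the class name `ExternalFileLookup` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of an external-file field: key lookup with a default value. */
final class ExternalFileLookup {
    private final Map<String, Float> values;
    private final float defVal;

    ExternalFileLookup(Map<String, Float> values, float defVal) {
        this.values = values;
        this.defVal = defVal;
    }

    /** Returns the file's value for the key, or defVal when the key is absent. */
    float valueFor(String keyFieldValue) {
        return values.getOrDefault(keyFieldValue, defVal);
    }

    public static void main(String[] args) {
        Map<String, Float> file = new HashMap<>();
        file.put("fashion", 2.2f);
        file.put("nike", 0.366f);
        ExternalFileLookup categoryRank = new ExternalFileLookup(file, 0f);
        System.out.println(categoryRank.valueFor("fashion"));
        System.out.println(categoryRank.valueFor("adidas")); // absent key falls back to defVal
    }
}
```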