Elasticsearch
@ ShopWiki
What is ShopWiki?
• ShopWiki is the retail division of Oversee.net.
• We run a collection of retail websites,
including the Comparison Shopping Engines (CSEs):
– ShopWiki.com
– Compare.com
How do we use Elasticsearch?
• You know, for search (not logging).
• We index millions of products, offered from
hundreds of thousands of stores, and allow
users to search them.
Why Elasticsearch?
• ShopWiki was built using a proprietary search
server written in C++.
• Served us well for many years, but it needed
improvements, especially for non-English
language search.
• What about Lucene-based solutions?
Solr3
• We tried out Solr3 when building
CouponFinder.com.
• Solr worked well (for English & French), but
the coupon dataset is small in comparison to
our product dataset.
• The setup was simple master-slave replication.
How do we scale?
• To use Solr for our product data we needed to
shard the data across multiple machines.
• But Solr3’s sharding capabilities were clunky
and difficult to use.
• Enter Elasticsearch!
• Designed to scale out-of-the-box.
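Elasticsearch splits each index into a fixed number of primary shards. Conceptually, it places a document with `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID, so spreading data across machines requires no client-side setup. The sketch below only illustrates that idea. It uses `zlib.crc32` as a stand-in for the Murmur3 hash that Elasticsearch actually uses, so its shard numbers will not match a real cluster. The shard count and product IDs are hypothetical.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 5  # hypothetical; fixed when the index is created


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard for a document: conceptually hash(routing) % num_shards.

    zlib.crc32 stands in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Hypothetical product IDs; a decent hash spreads them roughly evenly.
    counts = Counter(shard_for(f"product-{i}") for i in range(10_000))
    for shard, n in sorted(counts.items()):
        print(f"shard {shard}: {n} docs")
```

Because the mapping depends on the number of primary shards, that number cannot change without reindexing. Adding capacity therefore means adding nodes, and the cluster rebalances existing shards onto them.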
Compare.com
• Compare.com was built using Elasticsearch
from the start.
• Allowed us to get up & running very quickly.
• Allowed us to scale up very quickly.
– 60 million products and growing.
• Allows us to iterate on new features quickly.
Other Languages
• ShopWiki search is being gradually ported to
Elasticsearch.
• Allows us to have better non-English search
right out-of-the-box.
– French
– German
– Dutch
– Spanish
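Elasticsearch ships with built-in language analyzers (`english`, `french`, `german`, `dutch` and `spanish`, among others). These handle stemming and stop words for each language, which is one way to get language-aware search without custom code. Below is a minimal sketch of per-site index settings. It assumes one index per country site, using the domains from the speaker notes, and a hypothetical `title` field. It also assumes English for the .com and .co.uk sites. It is written in the typeless mapping syntax of current Elasticsearch versions, not the 2014 one.

```python
import json

# Built-in Elasticsearch language analyzers, keyed by ShopWiki country site.
# The domain-to-analyzer choice and the "title" field are illustrative assumptions.
ANALYZER_BY_DOMAIN = {
    "shopwiki.com": "english",
    "shopwiki.co.uk": "english",
    "shopwiki.fr": "french",
    "shopwiki.de": "german",
    "shopwiki.nl": "dutch",
    "shopwiki.es": "spanish",
}


def index_body(domain: str) -> dict:
    """Create-index body for one site's index, analyzing titles in its language."""
    return {
        "mappings": {
            "properties": {
                # Stemming and stop words follow the site's language.
                "title": {"type": "text", "analyzer": ANALYZER_BY_DOMAIN[domain]},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_body("shopwiki.de"), indent=2))
```

Each body would be sent once when that site's index is created (`PUT /<index-name>`).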
Our Elasticsearch Cluster
• 12 indices, one for each website.
• 3 replicas per shard.
• 3 master nodes (quorum of 2).
• 6 data nodes.
• Plan to add more data nodes as we proceed with
our migration of ShopWiki (500 million products).
• Expect to need less hardware than the C++
cluster (which uses 50+ machines).
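The numbers on this slide fit together as follows. A master quorum is a strict majority of master-eligible nodes, so 3 masters give floor(3/2) + 1 = 2, and the cluster can still elect a master with one master down. Before 7.0, Elasticsearch required this value to be set by hand as `discovery.zen.minimum_master_nodes`. With 3 replicas, each shard has 4 copies. Elasticsearch never places two copies of the same shard on one node, so at least 4 data nodes are needed to allocate them all. The sketch only does this arithmetic; `ClusterPlan` is a hypothetical helper, not an Elasticsearch API.

```python
from dataclasses import dataclass


def master_quorum(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


@dataclass
class ClusterPlan:
    master_nodes: int
    data_nodes: int
    replicas: int

    def copies_per_shard(self) -> int:
        # One primary plus its replicas.
        return 1 + self.replicas

    def fully_allocatable(self) -> bool:
        # Copies of the same shard must live on distinct data nodes.
        return self.copies_per_shard() <= self.data_nodes


plan = ClusterPlan(master_nodes=3, data_nodes=6, replicas=3)
print(master_quorum(plan.master_nodes))  # 2
print(plan.copies_per_shard())           # 4
print(plan.fully_allocatable())          # True
```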
Elasticsearch Head
Realtime Updates
• C++ search servers need to have the entire
dataset re-indexed and swapped out all at
once.
• Could only do this once a day, at night (affects
performance).
• With Elasticsearch, we can update our data all
the time (it’s not even a limiting factor).
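With Elasticsearch, changed products can be sent in small batches through the `_bulk` endpoint instead of rebuilding the whole index. They become searchable after the next refresh, which happens every second by default. The sketch below only builds the newline-delimited JSON body. The index name and document fields are hypothetical, and it uses the typeless metadata of current Elasticsearch versions. Sending it (`POST /_bulk` with `Content-Type: application/x-ndjson`) is left to your HTTP client.

```python
import json
from typing import Iterable, Mapping


def bulk_payload(index: str, docs: Mapping[str, dict], deletes: Iterable[str] = ()) -> str:
    """Build an NDJSON body for Elasticsearch's _bulk endpoint."""
    lines = []
    for doc_id, source in docs.items():
        # "index" creates the document or fully replaces an existing one.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    for doc_id in deletes:
        # "delete" actions carry no source line.
        lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = bulk_payload(
        "compare-products",  # hypothetical index name
        {"p1": {"title": "Stainless steel mixing bowl", "price": 12.99}},
        deletes=["p2"],
    )
    print(payload, end="")
```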
Challenges
• Use TermsFacet to suggest filters to the user.
• E.g. filter by stores or brands.
• Using the 10 most frequent brands from a
search can produce bad results.
– A single brand may have lots of products that are
all weakly relevant.
Top-N Faceting
• The solution in Solr is to limit facets to the
top-N results.
• Elasticsearch doesn’t have this feature (as
mentioned at the last Meetup).
• Solution: TermsStatsFacet (done with aggregations as of 1.0)
• Allows us to get the brands/stores with the
most relevant results.
• E.g. rank by $\sum_i \mathrm{score}_i^N$ over each brand’s matching products; the exponent N allows us to tune facet results to our liking
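The ranking rule on this slide can be checked with a few lines of client-side Python. This only illustrates the arithmetic, using made-up brands and scores; ShopWiki computed the sums server-side with TermsStatsFacet. With N = 0, every hit contributes 1, so the ranking is a plain count.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_facet(hits: Iterable[Tuple[str, float]], n: float, top: int = 10) -> List[Tuple[str, float]]:
    """Rank facet terms by the sum of score**n over the matching hits."""
    totals = defaultdict(float)
    for term, score in hits:
        totals[term] += score ** n  # n = 0 adds 1 per hit, i.e. a count
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


# Hypothetical hits for one query: (brand, relevance score).
hits = [("BrandA", 0.3)] * 8 + [("BrandB", 2.0)] * 3

print(rank_facet(hits, n=0))  # BrandA first: 8 weak hits beat 3 strong ones
print(rank_facet(hits, n=4))  # BrandB first: 3 * 2.0**4 = 48.0
```

A larger N lets a few highly relevant products outweigh many weakly relevant ones, which addresses the problem described on the Challenges slide.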
TermsStatsFacet for Brands
[Chart: brand facets for the query “mixing bowl”, ranked by $\sum_i \mathrm{score}_i^N$ with N = 0 (same as count) and with N = 4]
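The aggregations API was introduced in 1.0 and later replaced facets. In it, a similar ranking can be expressed as a `terms` aggregation ordered by a `sum` sub-aggregation over a script. Below is a sketch of the request body. It assumes hypothetical `title` and `brand` fields. It also assumes the Painless scripting language of later Elasticsearch versions, where aggregation scripts can read `_score`. It is not the 2014 facet request.

```python
import json


def brand_facet_request(query_text: str, n: float, size: int = 10) -> dict:
    """Search body: top `size` brands ranked by sum(_score ** n) over matches."""
    return {
        "query": {"match": {"title": query_text}},  # hypothetical field
        "aggs": {
            "brands": {
                "terms": {
                    "field": "brand",  # hypothetical keyword field
                    "size": size,
                    # Order buckets by the score-weighted sum, not doc count.
                    "order": {"score_weight": "desc"},
                },
                "aggs": {
                    "score_weight": {
                        "sum": {
                            "script": {
                                "source": "Math.pow(_score, params.n)",
                                "params": {"n": n},
                            }
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(brand_facet_request("mixing bowl", n=4), indent=2))
```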
De-duping Products
• Use the “more_like_this” query to find similar
products.
• If a result’s score is “high enough”, it’s likely the
same product from a different store.
• “High enough” is defined as a fraction of the
identity match’s score (the product matched against itself).
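Below is a sketch of the de-duplication check. It assumes a hypothetical `title` field and an illustrative fraction of 0.8; the slide does not give the actual value. The `include` option tells more_like_this to return the input product as well, which supplies the identity match’s score. `min_term_freq` is lowered to 1 because the default of 2 would drop most terms from a short product title.

```python
from typing import List, Sequence, Tuple


def mlt_request(index: str, doc_id: str) -> dict:
    """more_like_this search for products similar to one product."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["title"],  # hypothetical field
                "like": [{"_index": index, "_id": doc_id}],
                "include": True,  # also return the product itself
                "min_term_freq": 1,
            }
        }
    }


def likely_duplicates(doc_id: str, hits: Sequence[Tuple[str, float]], fraction: float = 0.8) -> List[str]:
    """IDs scoring at least `fraction` of the identity match's score."""
    scores = dict(hits)
    if doc_id not in scores:
        return []  # no identity match to compare against
    threshold = fraction * scores[doc_id]
    return [hid for hid, score in hits if hid != doc_id and score >= threshold]


hits = [("p1", 10.0), ("p7", 9.1), ("p3", 4.2)]  # hypothetical (id, score) pairs
print(likely_duplicates("p1", hits))  # ['p7']
```

Scaling the threshold to the identity match keeps it meaningful across queries, because raw more_like_this scores are not comparable from one product to the next.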
• Questions?
• Rob Stewart
• Lead Software Engineer
• rstewart@shopwiki.com

Elasticsearch @ ShopWiki 2014-03-20


Editor's Notes

  1. Similar functionality. Different business models (SEO vs. SEM). ShopWiki.com was first.
  2. Long-tail shopping.
  3. CouponFinder.com is a coupon search website.
  4. Compare.com launched in September 2012.
  5. shopwiki.com, shopwiki.co.uk, shopwiki.fr, shopwiki.de, shopwiki.nl, shopwiki.es