Elasticsearch @ ShopWiki 2014-03-20
Slides from the NY Elasticsearch Meetup on May 20, 2014.
http://www.meetup.com/Elasticsearch-NY/events/170714812/
http://vimeo.com/90124531

  • Similar functionality. Different business models (SEO vs SEM). ShopWiki.com was first.
  • Long tail shopping.
  • CouponFinder.com is a coupon search website.
  • Compare.com launched in September 2012.
  • shopwiki.com, shopwiki.co.uk, shopwiki.fr, shopwiki.de, shopwiki.nl, shopwiki.es

    1. Elasticsearch @ ShopWiki
    2. What is ShopWiki?
       • ShopWiki is the retail division of Oversee.net.
       • We run a collection of retail websites, including the Comparison Shopping Engines (CSE):
         – ShopWiki.com
         – Compare.com
    3. How do we use Elasticsearch?
       • You know, for search (not logging).
       • We index millions of products, offered by hundreds of thousands of stores, and allow users to search them.
    4. Why Elasticsearch?
       • ShopWiki was built using a proprietary search server written in C++.
       • It served us well for many years, but it needed improvements, especially for non-English language search.
       • What about Lucene-based solutions?
    5. Solr3
       • We tried out Solr3 when building CouponFinder.com.
       • Solr worked well (for English & French), but the coupon dataset is small in comparison to our product dataset.
       • The setup was simple master-slave replication.
    6. How do we scale?
       • To use Solr for our product data we needed to shard the data across multiple machines.
       • But Solr3’s sharding capabilities were clunky and difficult to use.
       • Enter Elasticsearch!
       • Designed to scale out-of-the-box.
    7. Compare.com
       • Compare.com was built using Elasticsearch from the start.
       • Allowed us to get up & running very quickly.
       • Allowed us to scale up very quickly.
         – 60 million products and growing.
       • Allows us to iterate on new features quickly.
    8. Other Languages
       • ShopWiki search is being gradually ported to Elasticsearch.
       • Allows us to have better non-English search right out-of-the-box.
         – French
         – German
         – Dutch
         – Spanish
    9. Our Elasticsearch Cluster
       • 12 indices, one for each website.
       • 3 replicas per shard.
       • 3 master nodes (quorum of 2).
       • 6 data nodes.
       • Plan to add more data nodes as we proceed with our migration of ShopWiki (500 million products).
       • Expect to need less hardware than the C++ cluster (which uses 50+ machines).
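The "3 master nodes (quorum of 2)" figure on the cluster slide follows from the usual majority rule. A minimal sketch of that arithmetic is below; the function name is hypothetical, and the slides do not say how the quorum was actually configured. In Elasticsearch 1.x a value like this is typically supplied through the `discovery.zen.minimum_master_nodes` setting.

```python
def master_quorum(master_eligible_nodes: int) -> int:
    """Return the smallest majority of the master-eligible nodes.

    Hypothetical helper for illustration only; it is not part of any
    Elasticsearch client library.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # A majority is more than half, so integer-divide by two and add one.
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    # The cluster on the slide has 3 master nodes.
    print(master_quorum(3))
```

Requiring a majority means that two halves of a partitioned cluster cannot both elect a master at the same time.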
    10. Elasticsearch Head
    11. Realtime Updates
       • C++ search servers need to have the entire dataset re-indexed and swapped out all at once.
       • We could only do this once a day, at night, because it affects performance.
       • With Elasticsearch, we can update our data all the time (it’s not even a limiting factor).
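One common way to push continuous updates like those described on the Realtime Updates slide is the Elasticsearch bulk API, which accepts newline-delimited JSON: an action line followed by a document line. The sketch below only builds such a request body. The slides do not say which indexing API ShopWiki uses, and the index name, type name, and product fields here are assumptions made for illustration.

```python
import json


def build_bulk_body(index, doc_type, products):
    """Build a newline-delimited bulk request body that indexes each product.

    `index`, `doc_type`, and the "id" key on each product are hypothetical
    names chosen for this sketch.
    """
    lines = []
    for product in products:
        # Action line: tells Elasticsearch where to index the next document.
        action = {"index": {"_index": index, "_type": doc_type, "_id": product["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(product))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    changed = [
        {"id": "p1", "title": "Steel mixing bowl", "brand": "Acme"},
        {"id": "p2", "title": "Glass mixing bowl", "brand": "Bolt"},
    ]
    print(build_bulk_body("compare", "product", changed), end="")
```

Sending only the changed products in small batches is what removes the need for a nightly full re-index and swap.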
    12. Challenges
       • Use TermsFacet to suggest filters to the user.
       • E.g. filter by stores or brands.
       • Using the 10 most frequent brands from a search can produce bad results.
         – A single brand may have lots of products that are all weakly relevant.
    13. Top-N Faceting
       • The solution in Solr is to limit facets to the top-N results.
       • Elasticsearch doesn’t have this feature (as mentioned at the last Meetup).
       • Solution: TermsStatsFacet (AKA aggregations in 1.0).
       • Allows us to get the brands/stores with the most relevant results.
       • E.g. Σ(score^n), where the exponent n allows us to tune facet results to our liking.
    14. TermsStatsFacet for Brands (chart)
       • Query: “mixing bowl”
       • Facet value: Σ(score^N)
       • Panels: N = 0 (same as count) and N = 4
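The effect of the exponent in Σ(score^n) can be sketched outside Elasticsearch. The snippet below is a client-side illustration of the arithmetic only, not the TermsStatsFacet API. The function name, the hit layout, and the sample scores are all made up for the example.

```python
from collections import defaultdict


def score_weighted_facet(hits, field, n, size=10):
    """Rank the values of `field` by the sum of score**n over matching hits.

    With n = 0 every hit contributes 1, so the ranking equals a plain count.
    Larger n lets a few highly relevant hits outweigh many weak ones.
    """
    totals = defaultdict(float)
    for hit in hits:
        totals[hit[field]] += hit["score"] ** n
    # Highest total first; keep only the top `size` facet values.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:size]


if __name__ == "__main__":
    # Hypothetical hits: brand "A" has many weak matches, "B" has few strong ones.
    hits = [{"brand": "A", "score": 0.1} for _ in range(5)]
    hits += [{"brand": "B", "score": 0.9} for _ in range(2)]
    print(score_weighted_facet(hits, "brand", n=0))  # ranked by count
    print(score_weighted_facet(hits, "brand", n=4))  # ranked by relevance
```

With these sample scores, brand "A" leads when n = 0 because it simply has more products, while brand "B" leads when n = 4 because its products match the query far better.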
    15. De-duping Products
       • Use “more_like_this” query to find similar products.
       • If a result’s score is “high enough”, it’s likely the same product from a different store.
       • “High enough” is defined as a fraction of the identity match’s score.
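The "high enough" rule on the de-duping slide can be sketched as a simple threshold filter. This assumes the score of the product matched against itself (the identity match) and the scores of the other more_like_this results are already available. The function name, the hit layout, and the default fraction of 0.8 are hypothetical; the slides do not state the fraction actually used.

```python
def likely_duplicates(identity_score, hits, fraction=0.8):
    """Return hits whose score is at least `fraction` of the identity score.

    `hits` are the similar products returned for one source product, with
    the source product itself excluded.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * identity_score
    return [hit for hit in hits if hit["score"] >= threshold]


if __name__ == "__main__":
    similar = [
        {"id": "store-b/123", "score": 9.0},
        {"id": "store-c/456", "score": 5.0},
    ]
    for hit in likely_duplicates(10.0, similar):
        print(hit["id"])
```

Expressing the cutoff relative to the identity score keeps it comparable across products, since raw scores vary with the length and wording of each product description.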
    16. Questions?
       • Rob Stewart
       • Lead Software Engineer
       • rstewart@shopwiki.com
