The webinar discussed how Apache Solr can help businesses scale by providing search capabilities beyond what MySQL offers. It covered how Solr is more than a search engine and is key to scalability. Features such as the data import handler, replication, sharding, and tailing the MySQL binlog were presented as ways to support growing catalogs, rising traffic, and real-time indexing. The webinar gave examples of how these capabilities address the issues a business faces at different stages: initial catalog growth, growing traffic, substantial catalog growth, and a real-time catalog.
Non-interactive big-data analysis discourages experimentation and can interrupt the analyst’s train of thought, yet analyzing and drawing insights in real time is no easy task, with jobs often taking minutes or hours to complete. What if you want to put an interactive interface in front of that data that allows iterative insights? What if you need that interactive experience to be sub-second?
Traditional SQL and most MPP/NoSQL databases cannot run complex calculations over large data performantly. Popular distributed systems such as Hadoop or Spark can execute such jobs, but their job overhead prohibits sub-second response times. Learn how an in-memory computing framework enabled us to perform complex analysis jobs on massive datasets with sub-second response times, allowing us to plug it into a simple, drag-and-drop web 2.0 interface.
Collaborate 2018: How to Get Cross Functional Reporting with an Enterprise Da... - Datavail
Many organizations not only lack the ability to look at their data across the organization as a whole, but often have no lens into the metrics they need to report against or use to manage their own departments.
How beneficial would it be to have a central data information repository – we call it an Enterprise Data Warehouse – from which to retrieve accurate data from across all aspects of your business? This presentation explains how this, and more, can be a reality for your business, in a relatively short amount of time.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution - Dmitry Anoshin
This session covers building a modern data warehouse by migrating from a traditional DW platform into the cloud, using Amazon Redshift and the cloud ETL tool Matillion to provide self-service BI for a business audience. It covers the technical migration path from a DW with PL/SQL ETL to Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, the talk works backward through the process, starting from the business audience and the needs that drive changes in the old DW. Finally, it covers the idea of self-service BI, and the author shares a step-by-step plan for building an efficient self-service environment using the modern BI platform Tableau.
Solr Under the Hood at S&P Global - Sumit Vadhera, S&P Global - Lucidworks
This document summarizes S&P Global's use of Solr for search capabilities across their large datasets. It discusses how S&P Global indexes over 50 million documents into Solr monthly and handles over 5 million queries per week. It outlines challenges faced with an on-premise Solr deployment and how migrating to Solr Cloud helped address issues like performance, availability, and scalability. Next steps discussed include improving relevancy through data science, continuing to leverage new Solr features, and exploring ways to integrate machine learning into search capabilities.
Webinar: Transforming Customer Experience Through an Always-On Data Platform - DataStax
According to Forrester Research, leaders in customer experience drive 5.1X revenue growth over laggards. And although 84% of companies aspire to be leaders in this space, only 1 in 5 successfully delivers good or great customer experience. Join us for our next webinar, where Mike Gualtieri, VP and Principal Analyst at Forrester Research, and Rajay Rai, Head of Digital Engineering at Macquarie Bank, will share how customer experience can drive business results such as faster revenue growth, longer customer retention, greater employee engagement, and improved profit margins.
View webinar recording: https://youtu.be/eEc5tx-nHvI
Explore past DataStax webinars: http://www.datastax.com/resources/webinars
This document discusses how ClearStory Data uses Spark and Shark to enable fast cycle analysis on diverse data sources. Spark and Shark allow ClearStory to perform iterative and interactive computations across structured and unstructured data at large scale. ClearStory leverages Spark and Shark's RDDs, SQL support, and machine learning libraries to power its platform for interactive visualization and analysis of blended internal and external data.
A data warehouse is an organized collection of integrated, subject-oriented databases designed to aid decision support. It supports business reporting and data mining by providing a consolidated view of cleaned and organized corporate data. Key design considerations include being subject-oriented, integrated, time-variant, nonvolatile, summarized, not normalized, including metadata, and being near-real-time or right-time. Data is loaded into the warehouse through extract, transform, and load (ETL) processes from sources such as operational data, specialized applications, and external syndicated data.
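The extract, transform, and load flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not any specific warehouse's pipeline: the table name, columns, and sample rows are all hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from an operational source (hypothetical data)
source_rows = [
    {"order_id": 1, "amount": "19.99", "country": "de"},
    {"order_id": 2, "amount": "5.00",  "country": "US"},
    {"order_id": 2, "amount": "5.00",  "country": "US"},  # duplicate to be cleaned
]

# Transform: deduplicate and normalize types and codes
seen, clean_rows = set(), []
for row in source_rows:
    key = row["order_id"]
    if key in seen:
        continue
    seen.add(key)
    clean_rows.append((key, float(row["amount"]), row["country"].upper()))

# Load: insert the consolidated rows into the warehouse table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, country TEXT)")
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean_rows)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone()
print(total[0], round(total[1], 2))  # 2 24.99
```

Real ETL adds source connectors, incremental loads, and error handling, but the extract → clean → load shape stays the same.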
SQL Analytics for Search Engineers - Timothy Potter, Lucidworks
This document discusses how SQL can be used in Lucidworks Fusion for various purposes like aggregating signals to compute relevance scores, ingesting and transforming data from various sources using Spark SQL, enabling self-service analytics through tools like Tableau and PowerBI, and running experiments to compare variants. It provides examples of using SQL for tasks like sessionization with window functions, joining multiple data sources, hiding complex logic in user-defined functions, and powering recommendations. The document recommends SQL in Fusion for tasks like analytics, data ingestion, machine learning, and experimentation.
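The sessionization-with-window-functions task mentioned above can be sketched in plain SQL, here run through Python's built-in sqlite3 module (SQLite 3.25+ is assumed for window-function support). The table, data, and 1800-second gap threshold are hypothetical, not Fusion's actual schema — the point is the LAG-then-cumulative-SUM pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", [
    ("u1", 0), ("u1", 100), ("u1", 2500),   # the 2400s gap starts a new session
    ("u2", 50), ("u2", 60),
])

# A new session starts when the gap since the previous event for the
# same user exceeds 1800 seconds; a running sum of the "new session"
# flags then numbers the sessions per user.
rows = conn.execute("""
    WITH gaps AS (
        SELECT user_id, ts,
               CASE WHEN ts - LAG(ts) OVER w > 1800 THEN 1 ELSE 0 END AS new_session
        FROM clicks
        WINDOW w AS (PARTITION BY user_id ORDER BY ts)
    )
    SELECT user_id, ts,
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
    FROM gaps
    ORDER BY user_id, ts
""").fetchall()
for r in rows:
    print(r)
# ('u1', 0, 0), ('u1', 100, 0), ('u1', 2500, 1), ('u2', 50, 0), ('u2', 60, 0)
```

The same two-step shape (flag session starts, then cumulatively sum the flags) carries over to Spark SQL and most other engines with window functions.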
This document provides an overview of Azure Synapse Analytics and its key capabilities. Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and big data analytics. It allows querying data on-demand or at scale using serverless or provisioned resources. The document outlines Synapse's integrated data platform capabilities for business intelligence, artificial intelligence and continuous intelligence. It also describes the different types of analytics workloads that Synapse supports and key architectural components like the dedicated SQL pool and massively parallel processing concepts.
How Big Data Can Help Marketers Improve Customer Relationships - Cloudera, Inc.
As consumer and business buying behaviors continually evolve, it’s become more challenging than ever to acquire, retain, and create happy customers. The good news is that the key to building successful customer relationships exists within the customer interactions that are captured in the form of big data.
This presentation explores how Tableau paired with Cloudera's distribution of Hadoop will help marketers reveal customer insights that traditional technologies miss. Learn:
-How Hadoop enables organizations to build a richer customer 360 profile
-How Cloudera and Tableau are empowering marketers to become data-driven
-How to use Tableau to reveal unknown leading indicators of customer churn
The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Fl... - Lucidworks
1) SAS migrated their enterprise search from Google and another solution to Lucidworks Fusion to have a single search platform.
2) They encountered issues with the out-of-the-box configuration and content that impacted search relevance and ranking.
3) Through multiple iterations of configuration changes, indexing adjustments, and using AI/SAS tools to evaluate search terms, they improved relevance and the types of results returned for key search terms.
4) Future plans for the search include immediate indexing of new content, auto-suggest, spellcheck, and integrating search data into analytics dashboards. Lucidworks was chosen for its data analytics abilities, easy administration, and connectors.
AWS User Group: Building Cloud Analytics Solution with AWS - Dmitry Anoshin
Abebooks is an Amazon subsidiary that treats data as an asset. It is always looking for ways to improve its existing analytics solution and extract information from terabytes of data.
One of the recent initiatives was the migration from a legacy DW platform to AWS Redshift. During this journey, our data engineers met lots of challenges and sometimes tried to reinvent the wheel.
This talk will cover Abebooks' journey toward a cloud DW. Moreover, we will cover the ETL tool selection process for the cloud as well as the adoption process for end users. This talk will help you understand the potential of the modern cloud DW, learn about our use case, and save time on future projects.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... - DataStax
This webinar covered graph databases and how they can solve problems that were previously difficult for traditional databases. It included presentations on why graph databases are useful, common use cases like recommendations and network analysis, different types of graph databases, and a demonstration of the DataStax Enterprise graph database. There was also a question and answer session where attendees could ask about graph databases and DataStax Enterprise graph.
From Data to Services at the Speed of Business - Ali Hodroj
From Data to Services at the Speed of Business: applying a cloud-native paradigm to combine fast data analytics with a microservices architecture for hybrid workloads.
How Leroy Merlin Uses Elasticsearch to Drive Sales and Increase Revenue on their E-commerce
Nowadays, the success of any e-commerce business depends on many moving parts. Learn how Leroy Merlin improved product promotion on its e-commerce website using Elasticsearch and implemented a content search approach to increase annual sales turnover.
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal... - Lucidworks
This document discusses how Walmart uses Apache Solr as a "not-so-evil twin" to complement its source-of-truth database and help scale its data infrastructure. It describes how Walmart abstracts away the complexity of managing databases, caches, search queries, and messaging to provide scalable querying across database shards. The use of Solr has allowed Walmart to offload queries, recurring reads, and analytics from its source-of-truth databases.
How did it go? The first large enterprise search project in Europe using Shar... - Petter Skodvin-Hvammen
This document summarizes a presentation about implementing a large enterprise search project in Europe using SharePoint 2013. It describes the background of the global oil services company undertaking a knowledge initiative. It details the key pains they faced, content sources indexed, and search strategy. It outlines the infrastructure needs, customizations made, performance considerations, and efforts to improve relevancy. In conclusion, it provides the current status and outcomes of the project.
MariaDB AX: Analytics solution with ColumnStore - MariaDB plc
MariaDB ColumnStore is a high performance columnar storage engine that provides fast and efficient analytics on large datasets in distributed environments. It stores data column-by-column for high compression and read performance. Queries are processed in parallel across nodes for scalability. MariaDB ColumnStore is used for real-time analytics use cases in industries like healthcare, life sciences, and telecommunications to gain insights from large datasets for applications like customer behavior analysis, genome research, and call data monitoring.
MariaDB AX: Analytics with MariaDB ColumnStore - MariaDB plc
MariaDB ColumnStore is a high performance columnar storage engine that provides fast and efficient analytics on large datasets in distributed environments. It stores data column-by-column for high compression and read performance. Queries are processed in parallel across nodes for scalability. MariaDB ColumnStore is used for real-time analytics use cases in industries like healthcare, life sciences, and telecommunications to gain insights from large datasets.
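The column-by-column layout described above can be illustrated with a toy comparison in pure Python (the data is hypothetical, and this models the idea only — ColumnStore's actual on-disk format and compression are far more sophisticated): a single-column scan touches only one contiguous array, and low-cardinality columns compress well with run-length encoding.

```python
# Row-oriented: each record stored together; a one-column scan walks every record.
rows = [{"region": "EU", "amount": i % 7} for i in range(1000)]
row_total = sum(r["amount"] for r in rows)

# Column-oriented: each column stored contiguously; the same scan touches one array.
columns = {"region": [r["region"] for r in rows],
           "amount": [r["amount"] for r in rows]}
col_total = sum(columns["amount"])
assert row_total == col_total  # same answer, much less data touched per column

# Contiguous low-cardinality columns compress well, e.g. run-length encoding:
def rle(values):
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

print(rle(columns["region"]))  # [['EU', 1000]] -- 1000 values stored as one run
```

The same two properties — scanning only the referenced columns and compressing each column independently — are what make columnar engines fast for analytics over large tables.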
This document discusses how to take an agile approach to data warehouse projects. It introduces agile practices like iterative development, minimal inventory, and frequent delivery that can be applied. It proposes using both a normalized and dimensional data model to validate understanding of the data and business domains. Visualization tools like kanban boards and thermometers are recommended. Version control is key to integrate the data model with the rest of the project. The "Spock approach" combines relational and dimensional modeling in a hybrid method.
Just the Job: Employing Solr for Recruitment Search - Charlie Hull, lucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Using a case study on a major European executive recruitment company, we will show how we used Apache Lucene/Solr to build powerful, flexible, accurate and scalable search services over tens of millions of CVs and candidate records, allowing the company to completely restructure their IT provision for both local and national offices.
Analytics in Search
Many companies, including Lucidworks, have embraced the open-source Kibana code to add visualization and analytics that enhance search management. Ravi Krishnamurthy, VP of Professional Services at Lucidworks, will show Silk, Lucidworks' implementation of Kibana, which provides all the capabilities of the open-source code but adds enterprise-critical capabilities like authentication and security to protect restricted content.
Types of database processing; OLTP vs. data warehouses (OLAP); data warehouse characteristics:
-Subject-oriented
-Integrated
-Time-variant
-Non-volatile
Functionalities of a data warehouse:
-Roll-up (consolidation)
-Drill-down
-Slicing
-Dicing
-Pivot
The KDD process; applications of data mining
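The warehouse operations listed above — roll-up, drill-down, slicing, dicing, and pivoting — can be demonstrated on a toy sales cube in pure Python. The dimensions and figures here are hypothetical, chosen only to make each operation visible.

```python
from collections import defaultdict

# A toy sales cube: (year, quarter, region) -> units sold (hypothetical data)
cube = {
    (2023, "Q1", "EU"): 10, (2023, "Q2", "EU"): 12,
    (2023, "Q1", "US"): 7,  (2023, "Q2", "US"): 9,
    (2024, "Q1", "EU"): 11, (2024, "Q1", "US"): 8,
}

# Roll-up (consolidation): aggregate away the quarter dimension.
# Drill-down is the inverse: returning from this summary to the
# quarter-level detail still held in `cube`.
rollup = defaultdict(int)
for (year, quarter, region), units in cube.items():
    rollup[(year, region)] += units
print(dict(rollup))
# {(2023, 'EU'): 22, (2023, 'US'): 16, (2024, 'EU'): 11, (2024, 'US'): 8}

# Slice: fix one dimension (region == 'EU') to get a lower-dimensional view
eu_slice = {(y, q): u for (y, q, r), u in cube.items() if r == "EU"}

# Dice: keep a sub-cube by restricting several dimensions at once
dice = {k: u for k, u in cube.items() if k[0] == 2023 and k[1] == "Q1"}

# Pivot: reorient the slice so quarters become columns of a per-year table
pivot = defaultdict(dict)
for (year, quarter), units in eu_slice.items():
    pivot[year][quarter] = units
print(dict(pivot))  # {2023: {'Q1': 10, 'Q2': 12}, 2024: {'Q1': 11}}
```

In a real OLAP engine these operations are expressed as GROUP BY and WHERE clauses (or MDX), but the set manipulations are the same.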
This document discusses an agile approach to developing a data warehouse. It advocates using an Agile Enterprise Data Model to provide vision and guidance. The "Spock Approach" is described, which uses an operational data store, dimensional data warehouse, and iterative development of data marts. Data visualization techniques like data hexes are recommended to improve planning and visibility. Leadership, version control, adaptability, refinement, and refactoring are identified as important ongoing processes for an agile data warehouse project.
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau - DATAVERSITY
This document discusses amaysim's implementation of Amazon Redshift, Alteryx, and Tableau for data analytics. It provides an overview of each tool and how amaysim uses them together in their business intelligence stack. Key points include:
- Amaysim uses Redshift for data warehousing, Alteryx to prepare and blend data, and Tableau for visualization and self-service analytics. This allows for analysis within hours rather than weeks.
- With a small analytics team, the tools empower line of business users to solve their own problems quickly. This increases workforce productivity.
- Lessons learned include democratizing analytics, making tools relevant to different stakeholders, and celebrating successes to drive cultural change.
Webinar: Personalized Retail Search & Recommendations with Fusion - Lucidworks
Fusion provides personalized retail search and recommendations through machine learning capabilities like clickstream auto-tuning, query intent classification, and recommendation models. It ingests diverse data sources, processes signals like clicks and purchases, and leverages these to improve relevancy, boost performance, and drive higher conversion. The document demonstrates how Fusion has helped large travel and home improvement retailers by increasing engagement, conversion, and revenues through personalized search experiences.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C... - Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
When Solr Is Best - M. Hausenblas, Lucene/Solr Revolution Dublin (2013-11-07) - lucenerevolution
This document discusses when Solr is a good tool to use versus other options. It provides an overview of Solr in the big data ecosystem and the concept of polyglot persistence, where different data stores serve different needs. Common Solr use cases such as search-based recommendations and log analysis are described. A checklist is presented for determining whether Solr is a good fit based on factors like data volume, query characteristics, throughput needs, and data type. The document concludes by listing some red flags where Solr may not be suitable, such as when strong consistency, transactions, or graph support are required.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C...Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
2013 11-07 lsr-dublin_m_hausenblas_when solr is bestlucenerevolution
This document discusses when Solr is a good tool to use versus other options. It provides an overview of Solr in the big data ecosystem and the concept of polyglot persistence, where different data stores are used for different needs. Common use cases for Solr like search-based recommendations and log analysis are described. A checklist is presented for determining if Solr is a good fit based on factors like data volume, query characteristics, throughput needs, and data type. The document concludes by listing some red flags where Solr may not be suitable, such as if strong consistency, transactions, or graphs are needed requirements.
The document provides guidance on leveling up a company's data infrastructure and analytics capabilities. It recommends starting by acquiring and storing data from various sources in a data warehouse. The data should then be transformed into a usable shape before performing analytics. When setting up the infrastructure, the document emphasizes collecting user requirements, designing the data warehouse around key data aspects, and choosing technology that supports iteration, extensibility and prevents data loss. It also provides tips for creating effective dashboards and exploratory analysis. Examples of implementing this approach for two sample companies, MESI and SalesGenomics, are discussed.
Prepare for Peak Holiday Season with MongoDBMongoDB
This document discusses preparing for the holiday season by providing a seamless customer experience. It covers expected trends for the 2014 holiday season including increased spending and an extended shopping window. The opportunity is to provide personalized and relevant experiences for customers. The document then provides an overview of how MongoDB can be used to power various retail functions like product catalogs, real-time inventory and orders, and consolidated customer views to enable a modern seamless retail experience. Technical details are discussed for implementing product catalogs and real-time inventory using MongoDB.
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
This document discusses Elasticsearch and how it can be used to search, analyze, and make sense of large amounts of data. It provides examples of how Elasticsearch is being used by large companies to handle petabytes of data and gain insights. Implementations in France are highlighted. The document concludes by demonstrating how easily Elasticsearch can be deployed and used to ingest and search sample data.
Frank Bien, CEO of Looker - along with Amazon, Google and other data disrupters - discuss how innovators are deeply integrating analytics into every aspect of their businesses, from mobile to warehouse to cloud.
Frank shares Looker’s vision for the future of business intelligence and data analytics and reveal pivotal product and partnership updates.
Frank Bien, CEO of Looker - along with Amazon, Google and other data disrupters - discuss how innovators are deeply integrating analytics into every aspect of their businesses, from mobile to warehouse to cloud.
Frank shares Looker’s vision for the future of business intelligence and data analytics and reveal pivotal product and partnership updates.
Similar to UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013 (20)
1. Magento Expert Consulting Group Webinar | July 31, 2013
Thinking Beyond Search with Solr
Understanding How Solr Can Help Your Business Scale
2. The presenters
Magento Expert Consulting Group

Udi Shamay
Head, Expert Consulting Group
udi@ebay.com

Steve Kukla
Business Solution Architect, Expert Consulting Group
skukla@ebay.com

Kirill Morozov
Application Architect, Expert Consulting Group
kmorozov@ebay.com

Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale July 31, 2013 | 2
3. Today’s agenda

What is Apache Solr?
Business Use Cases for Scale:
• Supporting Initial Catalog Growth
• Supporting Growing Traffic
• Supporting Substantial Catalog Growth
• Supporting A Real-Time Catalog
Key Points to Remember
Q&A
4. What is Apache Solr?
5. What is Apache Solr?
General Solr Overview

• Solr is a separate application – installed on its own server, or on an existing server in the environment, depending on business needs
• Solr uses schema configuration files, which can be found in Magento/lib/Apache
• Magento communicates with Solr via HTTP/XML
• Search options are configured via the Magento admin panel
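Because the Magento–Solr integration is plain HTTP, the same requests can be reproduced with any HTTP client. A minimal sketch of building such a select URL – the host, port, and core name ("magento_en") are illustrative assumptions, not values from this deck:

```python
from urllib.parse import urlencode

# Build the kind of select URL Magento sends to Solr over HTTP.
# Base URL and core name are hypothetical examples.
def solr_select_url(base_url, core, query, rows=10):
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return f"{base_url}/solr/{core}/select?{params}"

url = solr_select_url("http://localhost:8983", "magento_en", "name:shirt")
# -> http://localhost:8983/solr/magento_en/select?q=name%3Ashirt&rows=10&wt=xml
```

The `wt=xml` parameter asks Solr for an XML response, matching the HTTP/XML exchange described above.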
6. What is Apache Solr?
Solr the Search Platform

Better text-based searching provides a better customer experience:
• More relevant “fuzzy” searching*
• Faceted searches
• Search corrections
• Out-of-the-box type-ahead*
• Response caching for better performance

*Requires customization to leverage fully
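Faceting is, at its core, grouped value counts over the matching result set. A toy sketch of the idea, with invented sample data rather than Magento’s real schema:

```python
from collections import Counter

# Toy product "index" -- invented sample data, not Magento's schema.
docs = [
    {"name": "red shirt", "color": "red"},
    {"name": "blue shirt", "color": "blue"},
    {"name": "red hat", "color": "red"},
]

# A facet on "color" counts each value across the documents matching a query.
hits = [d for d in docs if "shirt" in d["name"]]
color_facet = Counter(d["color"] for d in hits)
# the "shirt" query matches two documents: one red, one blue
```

Solr computes these counts inside the index (via `facet.field`), which is what makes faceted navigation cheap even on large catalogs.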
7. What is Apache Solr?
What Makes Solr Powerful

Solr is more than a search engine because…
• Most of the data customers see is handled by Solr instead of MySQL
8. What is Apache Solr?
What Makes Solr Powerful

Solr is more than a search engine because…
• Most of the data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure

[Diagram: in MySQL’s EAV model, a product’s attributes are spread across separate tables linked by product_id and attribute_id, with attribute_name and attribute_value resolved through joins; the Solr document is flat – product_id, attribute_name, attribute_value in a single record (no EAV).]
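The difference is easy to see in miniature. A hedged sketch contrasting an EAV lookup, which must resolve ids across tables, with a flat Solr-style document – all table contents are invented for illustration:

```python
# EAV: attribute metadata and values live in separate tables keyed by ids.
attributes = {10: "color"}      # attribute_id -> attribute_name
values = [(1, 10, "blue")]      # (product_id, attribute_id, attribute_value)

def eav_lookup(product_id, attribute_name):
    # Answering one question requires resolving ids across tables
    # (a multi-table join in SQL).
    for pid, aid, val in values:
        if pid == product_id and attributes[aid] == attribute_name:
            return val
    return None

# Flat, Solr-style: one denormalized document per product -- a direct lookup.
solr_docs = {1: {"color": "blue"}}

assert eav_lookup(1, "color") == solr_docs[1]["color"] == "blue"
```

The flat document trades storage for read speed, which suits a read-heavy storefront.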
9. What is Apache Solr?
What Makes Solr Powerful

Solr is more than a search engine because…
• Most of the data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure
• Solr supports replication, which allows it to truly scale for growth

[Diagram: Magento reading from a pool of replicated Solr nodes.]
10. Supporting Initial Catalog Growth
11. Business Use Case
Supporting Initial Catalog Growth

Business Background
• Growing catalog – from 10K to 100K SKUs
• From 1 to 2 stores
• From 1 to 2+ web nodes / 1 database node
• Using native Solr search

Problems
• Increased indexing time
• Outdated information on the front end
12. Supporting Initial Catalog Growth
Problem – Increasing Index Footprint

Expected indexing time:

                                     10,000 SKUs   50,000 SKUs   100,000 SKUs
  Control (1 website, 1 store view)  1.75 min      10 min        17.5 min
  Year 2 (2 websites, 2 store views) 3.5 min       17.5 min      35 min

→ Slow indexing
13. Supporting Initial Catalog Growth
Solution – Custom Data Import Handler

Concept
• Connects to the database using JDBC
• Extra data transformations must be written in Java/JavaScript
• Uses a prepared XML configuration
14. Supporting Initial Catalog Growth
Data Import Handler – Results

Results
• 10 times faster indexing
• Supports delta-indexing

Things to keep in mind
• Solr knows about its data source
• May require extra development effort
• Extra data transformations must be written in Java/JavaScript
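Delta-indexing is what keeps the Data Import Handler fast as the catalog grows: only rows changed since the last import are re-read, instead of the whole catalog. A sketch of the idea – the rows and timestamps are illustrative; the real DIH expresses this as a deltaQuery in its XML configuration:

```python
# Sketch of delta-indexing: reindex only rows modified since the last run.
# Rows are (sku, updated_at) tuples with invented timestamps.
rows = [
    ("SKU-1", 100),
    ("SKU-2", 205),
    ("SKU-3", 210),
]

def delta_rows(rows, last_index_time):
    # DIH does this in SQL via a deltaQuery, e.g.
    # SELECT id FROM products WHERE updated_at > '${dataimporter.last_index_time}'
    return [sku for sku, updated_at in rows if updated_at > last_index_time]

changed = delta_rows(rows, last_index_time=200)  # only SKU-2 and SKU-3 changed
```

Because the work is proportional to the number of changed rows rather than catalog size, delta runs stay short even at 100K+ SKUs.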
15. Supporting Growing Traffic
16. Business Use Case
Supporting Growing Traffic

Business Background
• Growing catalog – 1,000,000 SKUs
• Growing traffic – up to 100 requests/second
• 3 stores
• 3+ web nodes / 1 database node
• Using the Data Import Handler

Problem
• Solr can’t handle increasing user concurrency
17. Supporting Growing Traffic
Increasing Index Footprint – OK

Expected indexing time:

                                     100,000 SKUs   500,000 SKUs   1,000,000 SKUs
  Control (2 websites, 2 store views) 3.5 min       17.5 min       35 min
  Year 3 (3 websites, 3 store views)  4.75 min      23.75 min      47.5 min

→ At < 1,000 updates/sec, indexing delta data handles updates
18. Supporting Growing Traffic
Problem – Increased Response Time

Expected average response time:

                                      100,000 SKUs   500,000 SKUs   1,000,000 SKUs
                                      30 RPS         60 RPS         100 RPS
  Control (2 websites, 2 store views) 75 msec        95 msec        105 msec
  Year 3 (3 websites, 3 store views)  80 msec        100 msec       120 msec

→ Solr CPU is maxed out
19. Supporting Growing Traffic
Solution – Solr Replication

Concept
• Separate reading requests
• Replicate the index across multiple nodes
• Read from multiple servers
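With master-slave replication, index updates go to a single master while reads are spread across the replicas. A minimal sketch of the read-side routing the middleware would perform – the node names are invented for illustration:

```python
from itertools import cycle

# Invented replica pool: the master receives index updates,
# replicas serve read (select) traffic.
master = "solr-master:8983"
replicas = cycle(["solr-replica-1:8983", "solr-replica-2:8983", "solr-replica-3:8983"])

def route(request_type):
    # Writes must hit the master; reads round-robin across replicas.
    return master if request_type == "update" else next(replicas)

reads = [route("select") for _ in range(4)]  # cycles through the three replicas
```

Adding a replica then adds read capacity linearly, which is what lets Solr absorb the growing query concurrency.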
20. Supporting Growing Traffic
Solr Replication – Results

Results
• Allows Solr to handle read traffic
• Introduces fail-over

Things to keep in mind
• Requires middleware or Magento customization
• Possible heavy data duplication
• Extra changes in infrastructure
22. Business Use Case
Supporting Substantial Catalog Growth

Business Background
• Growing catalog – 5,000,000 SKUs
• 4 stores
• 4+ web nodes / 1 database node
• Using the Data Import Handler
• Using Solr replication

Problems
• Delta-indexing delays
• Slow response time
23. Supporting Substantial Catalog Growth
Problem – Increasing Index Footprint

Expected indexing time:

                                      1,000,000 SKUs   2,500,000 SKUs   5,000,000 SKUs
  Control (3 websites, 3 store views) 47.5 min         118.75 min       237.5 min
  Year 4 (4 websites, 4 store views)  63.5 min         158.75 min       317.5 min

→ At > 1,000 updates/sec, delta indexing delays
24. Supporting Substantial Catalog Growth
Problem – Increased Response Time

Expected average response time:

                                      1,000,000 SKUs   2,500,000 SKUs   5,000,000 SKUs
                                      100 RPS          200 RPS          400 RPS
  Control (3 websites, 3 store views) 120 msec         230 msec         300 msec
  Year 4 (4 websites, 4 store views)  150 msec         270 msec         400 msec

→ Slow response time
25. Supporting Substantial Catalog Growth
Solution – Index Sharding

Concept
• Distributed search
• Distributed + replication (SolrCloud)
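Sharding splits the index across nodes: each document is routed to one shard by a stable hash of its key, and a query is scattered to every shard with the partial results merged. A toy sketch of both halves – the shard count and data are illustrative, not a Solr implementation:

```python
# Toy sketch of index sharding: hash-route documents, scatter-gather queries.
NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one Solr core

def shard_for(sku):
    # Stable routing: the same SKU always lands on the same shard.
    return sum(ord(c) for c in sku) % NUM_SHARDS

def index(sku, doc):
    shards[shard_for(sku)][sku] = doc

def search(predicate):
    # Scatter the query to every shard, then merge the partial results.
    hits = []
    for shard in shards:
        hits.extend(doc for doc in shard.values() if predicate(doc))
    return hits

for i in range(9):
    index(f"SKU-{i}", {"name": f"product {i}"})

results = search(lambda d: "product" in d["name"])  # merged from all shards
```

Because each shard indexes only its slice of the catalog, shards can index in parallel, which is where the "50 times faster with 5 shards" result on the next slide comes from.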
26. Supporting Substantial Catalog Growth
Index Sharding – Results

Results
• Distributed search for faster response time
• 50 times faster indexing with 5 shards

[Diagram: Magento reads from MySQL and queries a set of Solr shards, each holding a slice (A–I) of the index.]

Things to keep in mind
• Custom solution
• Requires Magento customization or the introduction of middleware
• Extra changes in infrastructure
27. Supporting A Real-Time Catalog
28. Business Use Case
Supporting A Real-Time Catalog

Business Background
• Growing catalog – 10,000,000 SKUs
• 5 stores
• 5+ web nodes / 1 database node
• Using the Data Import Handler
• Using SolrCloud and distributed search

Business Requirement
• Always up-to-date index
29. Supporting A Real-Time Catalog
Solution – Listen To The MySQL Bin Log

Concept
• Connect via the MySQL replication protocol
• Listen to data-related events

[Diagram: the listener connects to MySQL over the replication protocol and consumes the binlog as if it were a MySQL slave.]
30. Supporting A Real-Time Catalog
Solution – Listen To The MySQL Bin Log

Concept
• Connect via the MySQL replication protocol
• Listen to data-related events
• Extract information from the events
• Manipulate documents in the Lucene index

[Diagram: MySQL binlog → replication listener → log parser → Solr.]
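The listener behaves like a MySQL slave: it consumes row events from the binlog and applies each one to the search index immediately, so the index never waits for a batch re-import. A sketch of the event-handling loop – the event shapes are invented; a real listener would decode MySQL's binary format via a library such as open-replicator:

```python
# Sketch of applying binlog row events to a search index as they arrive.
# Event shapes are invented; a real listener decodes MySQL's binary format.
solr_index = {}  # stands in for the Lucene index behind Solr

def apply_event(event):
    if event["type"] in ("insert", "update"):
        solr_index[event["id"]] = event["row"]   # upsert the document
    elif event["type"] == "delete":
        solr_index.pop(event["id"], None)        # remove the document

binlog = [
    {"type": "insert", "id": 1, "row": {"name": "shirt"}},
    {"type": "update", "id": 1, "row": {"name": "red shirt"}},
    {"type": "delete", "id": 1},
]
for event in binlog:
    apply_event(event)  # the index is up to date after every event
```

This is why the approach yields an always-current index: updates flow continuously instead of arriving in delta-import batches.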
31. Supporting A Real-Time Catalog
Listening To The MySQL Bin Log – Results

Results
• Replication-like connection
• Indexes are always up-to-date

Things to keep in mind
• Relatively complex implementation

[Diagram: Magento writes to MySQL; the bin log feeds the Solr shards (A–I) directly.]
32. Key Points to Remember
33. Key Points to Remember
Solr helps businesses scale

• Solr’s search capabilities provide a better site experience than MySQL LIKE or full-text search
• Solr is more than a search platform – it is key to scalability and growth
• Solr’s Data Import Handler keeps Solr performing well as your catalog grows
• Solr replication helps accommodate growing traffic
• Solr shards keep indexing times and search response times low for very large catalogs
• Listening to the MySQL bin log can help facilitate a continuously updating catalog
34. References

Scaling Solr
• Solr Wiki – http://wiki.apache.org/solr/
• Type-Ahead – http://wiki.apache.org/solr/Suggester
• Data Import Handler (DIH) – http://wiki.apache.org/solr/DataImportHandler
• Replication – http://wiki.apache.org/solr/SolrReplication
• Sharding – http://wiki.apache.org/solr/SolrCloud
• Distributed Search – http://wiki.apache.org/solr/DistributedSearch

MySQL Replication Listening
• Change Data Capture – http://www.slideshare.net/mkindahl/binary-log-api-presentation-oscon-2011
• Replication Listener (C) – https://launchpad.net/mysql-replication-listener
• Open-Replicator (Java) – http://code.google.com/p/open-replicator/
35. Q&A
36. The presenters
Magento Expert Consulting Group

Udi Shamay
Head, Expert Consulting Group
udi@ebay.com

Steve Kukla
Business Solution Architect, Expert Consulting Group
skukla@ebay.com

Kirill Morozov
Application Architect, Expert Consulting Group
kmorozov@ebay.com