Magento Expert Consulting Group Webinar | July 31, 2013
Thinking Beyond Search with Solr
Understanding How Solr Can Help Your Business Scale
Udi Shamay
Head, Expert Consulting Group
udi@ebay.com
Steve Kukla
Business Solution Architect, Expert Consulting Group
skukla@ebay.com
Kirill Morozov
Application Architect, Expert Consulting Group
kmorozov@ebay.com
Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale July 31, 2013 | 2
The presenters
Magento Expert Consulting Group
Today’s agenda
• What is Apache Solr?
• Business Use Cases for Scale
• Supporting Initial Catalog Growth
• Supporting Growing Traffic
• Supporting Substantial Catalog Growth
• Supporting A Real-Time Catalog
• Key Points to Remember
• Q&A
What is Apache Solr?
General Solr Overview
• Solr is a separate application, installed on its own server or on an existing server in the environment, depending on business needs
• Solr uses schema configuration files, which can be found in Magento/lib/Apache
• Magento communicates with Solr via HTTP/XML
• Search options are configured via the Magento admin panel
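As a rough illustration of the HTTP communication described above, the sketch below builds a Solr select URL that requests an XML response. The host, port, path, and the `store_id` field are assumptions made for illustration, not Magento's actual configuration; `q`, `fq`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr endpoint; a real installation supplies its own host and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def build_search_url(term, rows=10, filters=()):
    """Build a Solr select URL that asks for an XML response."""
    params = [("q", term), ("rows", str(rows)), ("wt", "xml")]
    # Each filter query narrows the result set without affecting scoring.
    params.extend(("fq", f) for f in filters)
    return SOLR_SELECT_URL + "?" + urlencode(params)

# "store_id" is a made-up field name used only for this example.
url = build_search_url("red shoes", rows=5, filters=["store_id:1"])
query = parse_qs(urlsplit(url).query)
```

The URL is only constructed here, not requested, so the sketch runs without a Solr server.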
Solr the Search Platform
Better text-based searching provides a better customer experience
• More relevant “fuzzy” searching*
• Faceted searches
• Search corrections
• Out of the box type-ahead*
• Response caching for better performance
*Requires customization to take full advantage
What Makes Solr Powerful
Solr is more than a search engine because…
• Most data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure
[Diagram: MySQL (EAV) spreads product_id, attribute_id, attribute_name, and attribute_value across several linked tables; Solr (no EAV) keeps product_id, attribute_name, and attribute_value together]
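To make the EAV versus flat-document contrast concrete, here is a minimal sketch that pivots EAV-style rows into one flat document per product. The rows, attribute IDs, and attribute names are invented for illustration and do not reflect Magento's real table layout.

```python
from collections import defaultdict

# Hypothetical EAV rows: (product_id, attribute_id, value).
eav_values = [
    (1, 10, "Trail Shoe"),
    (1, 11, "red"),
    (2, 10, "Road Shoe"),
    (2, 11, "blue"),
]
# Hypothetical attribute lookup table: attribute_id -> attribute_name.
attribute_names = {10: "name", 11: "color"}

def flatten(rows, names):
    """Pivot EAV rows into one flat document per product."""
    docs = defaultdict(dict)
    for product_id, attribute_id, value in rows:
        docs[product_id]["product_id"] = product_id
        # Resolving the attribute name is the lookup a SQL join would perform.
        docs[product_id][names[attribute_id]] = value
    return [docs[pid] for pid in sorted(docs)]

documents = flatten(eav_values, attribute_names)
```

Reading a flat document needs no joins, which is the simplification the slide points to.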
What Makes Solr Powerful
Solr is more than a search engine because…
• Most data customers see is handled by Solr instead of MySQL
• Solr uses a simpler data structure
• Solr supports replication, which allows it to truly scale for growth
[Diagram: Magento connected to five Solr instances]
Supporting Initial Catalog Growth
Business Use Case
Business Background
• Growing catalog – from 10K to 100K SKUs
• From 1 to 2 stores
• From 1 to 2+ web nodes / 1 database node
• Using native Solr Search
Problems
• Increased indexing time
• Outdated information on the front-end
Problem – Increasing Index Footprint
Expected indexing time (minutes)
                                      10,000 SKUs   50,000 SKUs   100,000 SKUs
Control (1 website, 1 store view)     1.75          10            17.5
Year 2 (2 websites, 2 store views)    3.5           17.5          35
Result: slow indexing
Solution – Custom Data Import Handler
Concept
• Connects to the database using JDBC
• Extra data transformations must be written in Java/JavaScript
• Uses a prepared XML configuration
Data Import Handler – Results
Results
• 10 times faster indexing
• Supports delta-indexing
Things to keep in mind
• Solr knows about its data source
• May require extra development effort
• Extra data transformations must be written in Java/JavaScript
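The following sketch shows roughly what a Data Import Handler configuration with delta-indexing could look like, wrapped in Python only to check that the XML is well formed. The table name `product_flat`, its columns, and the connection details are assumptions for illustration. `JdbcDataSource`, `query`, `deltaQuery`, `deltaImportQuery`, and the `${dataimporter...}` variables are the DIH constructs described on the Solr wiki.

```python
import xml.etree.ElementTree as ET

# Hypothetical DIH data-config; table, columns, and credentials are made up.
DATA_CONFIG = """
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/magento"
              user="solr" password="secret"/>
  <document>
    <entity name="product" pk="product_id"
            query="SELECT product_id, name, price FROM product_flat"
            deltaQuery="SELECT product_id FROM product_flat
                        WHERE updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT product_id, name, price FROM product_flat
                              WHERE product_id = '${dataimporter.delta.product_id}'">
      <field column="product_id" name="id"/>
      <field column="name" name="name"/>
      <field column="price" name="price"/>
    </entity>
  </document>
</dataConfig>
""".strip()

# Parse the config to confirm it is well-formed XML and list the mapped columns.
root = ET.fromstring(DATA_CONFIG)
entity = root.find("document/entity")
columns = [field.get("column") for field in entity.findall("field")]
```

In this sketch, `deltaQuery` selects only the rows changed since the last import and `deltaImportQuery` reloads each of them, which is what lets an import skip the unchanged catalog.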
Supporting Growing Traffic
Business Use Case
Business Background
• Growing catalog – 1,000,000 SKUs
• Growing traffic: up to 100 requests / second
• 3 stores
• 3+ web nodes / 1 database node
• Using Data Import Handler
Problem
• Solr can’t handle increasing user concurrency
Increasing Index Footprint – OK
Expected indexing time (minutes)
                                      100,000 SKUs   500,000 SKUs   1,000,000 SKUs
Control (2 websites, 2 store views)   3.5            17.5           35
Year 3 (3 websites, 3 store views)    4.75           23.75          47.5
< 1000 updates/sec: indexing delta data handles the updates
Problem – Increased Response Time
Expected average response time (msec)
                                      100,000 SKUs   500,000 SKUs   1,000,000 SKUs
                                      30 RPS         60 RPS         100 RPS
Control (2 websites, 2 store views)   75             95             105
Year 3 (3 websites, 3 store views)    80             100            120
Solr CPU is maxed out
Solution – Solr Replication
Concept
• Separate read requests
• Replicate the index across multiple nodes
• Read from multiple servers
Solr Replication – Results
Results
• Allows Solr to handle read traffic
• Introduces fail-over
Things to keep in mind
• Requires middleware or Magento customization
• Possible heavy data duplication
• Extra changes in infrastructure
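The middleware mentioned above could, in its simplest form, send index updates to one master node and rotate read requests across replicas. This is a minimal sketch under that assumption; the class name and host names are hypothetical, and a real deployment would also need health checks to provide fail-over.

```python
from itertools import cycle

class SolrRouter:
    """Send writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = cycle(replicas)  # simple round-robin iterator

    def node_for(self, operation):
        # Index updates go to the master; replicas copy the index from it.
        if operation == "update":
            return self.master
        return next(self._replicas)

# Hypothetical node addresses.
router = SolrRouter(
    master="http://solr-master:8983/solr",
    replicas=[
        "http://solr-replica1:8983/solr",
        "http://solr-replica2:8983/solr",
    ],
)
reads = [router.node_for("select") for _ in range(3)]
```

Round-robin is the simplest policy; a load balancer in front of the replicas would serve the same purpose.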
Supporting Substantial Catalog Growth
Business Use Case
Business Background
• Growing catalog – 5,000,000 SKUs
• 4 stores
• 4+ web nodes / 1 database node
• Using Data Import Handler
• Using Solr replication
Problems
• Delta-indexing delays
• Slow response time
Problem – Increasing Index Footprint
Expected indexing time (minutes)
                                      1,000,000 SKUs   2,500,000 SKUs   5,000,000 SKUs
Control (3 websites, 3 store views)   47.5             118.75           237.5
Year 4 (4 websites, 4 store views)    63.5             158.75           317.5
> 1000 updates/sec: delta indexing delays
Problem – Increased Response Time
Expected average response time (msec)
                                      1,000,000 SKUs   2,500,000 SKUs   5,000,000 SKUs
                                      100 RPS          200 RPS          400 RPS
Control (3 websites, 3 store views)   120              230              300
Year 4 (4 websites, 4 store views)    150              270              400
Slow response time
Solution – Index Sharding
Concept
• Distributed search
• Distributed + Replication (SolrCloud)
Index Sharding – Results
Results
• Distributed search for faster response time
• 50 times faster indexing with 5 shards
Things to keep in mind…
• Custom solution
• Requires Magento customization or middleware introduction
• Extra changes in infrastructure
[Diagram: Magento, MySQL, and Solr shards; catalog records A–I held in MySQL are partitioned across the Solr shards]
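One way to picture manual sharding is to assign each product to a shard with a stable hash, then list every shard in the `shards` request parameter that Solr's distributed search accepts. The sketch below assumes three hypothetical shard addresses; SolrCloud can handle document routing itself, so this only illustrates the hand-rolled variant.

```python
import zlib

# Hypothetical shard addresses.
SHARDS = [
    "solr-shard1:8983/solr",
    "solr-shard2:8983/solr",
    "solr-shard3:8983/solr",
]

def shard_for(product_id, shards=SHARDS):
    """Pick a shard with a stable hash so a product always lands on the same node."""
    index = zlib.crc32(str(product_id).encode("utf-8")) % len(shards)
    return shards[index]

def shards_param(shards=SHARDS):
    """Value for the `shards` request parameter: comma-separated shard addresses."""
    return ",".join(shards)

placement = {pid: shard_for(pid) for pid in range(1, 7)}
```

Because each shard indexes only its own slice of the catalog, indexing work can proceed on all shards in parallel.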
Supporting A Real-Time Catalog
Business Use Case
Business Background
• Growing catalog – 10,000,000 SKUs
• 5 stores
• 5+ web nodes / 1 database node
• Data Import Handler
• SolrCloud and distributed search
Business Requirement
• Always up-to-date index
Solution – Listen To The MySQL Bin Log
Concept
• Connect via the MySQL replication protocol
• Listen to data-related events
[Diagram: MySQL → binlog → replication → MySQL slave]
Solution – Listen To The MySQL Bin Log
Concept
• Connect via the MySQL replication protocol
• Listen to data-related events
• Extract information from events
• Manipulate documents in the Lucene index
[Diagram: MySQL → binlog → replication listener → log parser → Solr]
Listening To The MySQL Bin Log – Results
Results
• Replication-like connection
• Indexes are always up-to-date
Things to keep in mind
• Relatively complex implementation
[Diagram: Magento, MySQL, and Solr shards; changes to records A–I flow from MySQL to the Solr shards through the bin log]
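The listener idea can be sketched as a function that turns row-level change events into index operations. The event shape and table names below are invented for illustration; a real listener would receive events from a replication client such as the ones listed in the references, and would send the resulting operations to Solr.

```python
# Hypothetical, simplified row events as a binlog listener might emit them.
events = [
    {"type": "write", "table": "product", "row": {"id": 7, "name": "Trail Shoe"}},
    {"type": "update", "table": "product", "row": {"id": 7, "name": "Trail Shoe v2"}},
    {"type": "write", "table": "admin_log", "row": {"id": 1}},
    {"type": "delete", "table": "product", "row": {"id": 7}},
]

INDEXED_TABLES = {"product"}  # tables whose changes affect the search index

def to_index_operations(row_events):
    """Translate row events into index operations, skipping unrelated tables."""
    operations = []
    for event in row_events:
        if event["table"] not in INDEXED_TABLES:
            continue
        if event["type"] == "delete":
            operations.append(("delete", event["row"]["id"]))
        else:
            # Inserts and updates both become an add; re-adding a document
            # with the same unique key replaces the earlier version.
            operations.append(("add", event["row"]))
    return operations

operations = to_index_operations(events)
```

Filtering by table keeps unrelated database writes from touching the index.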
Key Points to Remember
Solr helps businesses scale
• Solr’s search capabilities provide a better site experience than MySQL LIKE or full-text search
• Solr is more than a search platform – it is key to scalability and growth
• Solr’s Data Import Handler keeps Solr performing well as your catalog grows
• Solr replication helps accommodate growing traffic
• Solr shards help keep indexing execution time and search response times low for very large catalogs
• Listening to the MySQL bin log can help facilitate a continuously updating catalog
References
Scaling Solr
• Solr Wiki: http://wiki.apache.org/solr/
• Type-Ahead: http://wiki.apache.org/solr/Suggester
• Data Import Handler (DIH): http://wiki.apache.org/solr/DataImportHandler
• Replication: http://wiki.apache.org/solr/SolrReplication
• Shard: http://wiki.apache.org/solr/SolrCloud
• Distributed Search: http://wiki.apache.org/solr/DistributedSearch
MySQL Replication listening
• Change Data Capture: http://www.slideshare.net/mkindahl/binary-log-api-presentation-oscon-2011
• Replication Listener (C): https://launchpad.net/mysql-replication-listener
• Open-Replicator (Java): http://code.google.com/p/open-replicator/
Q&A

UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013

  • 1. Magento Expert Consulting Group Webinar | July 31, 2013 Thinking Beyond Search with Solr Understanding How Solr Can Help Your Business Scale
  • 2. Udi Shamay Head, Expert Consulting Group udi@ebay.com Steve Kukla Business Solution Architect, Expert Consulting Group skukla@ebay.com Kirill Morozov Application Architect, Expert Consulting Group kmorozov@ebay.com Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale July 31, 2013 | 2 The presenters Magento Expert Consulting Group
  • 3. What is Apache Solr? Business Use Cases for Scale Supporting Initial Catalog Growth Supporting Growing Traffic Supporting Substantial Catalog Growth Supporting A Real-Time Catalog Key Points to Remember Q&A Today’s agenda July 31, 2013 | 3Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 4. What is Apache Solr? July 31, 2013 | 4Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 5. Solr • Separate application – installed on its own server, or on an existing server in the environment depending on business needs. • Solr uses schema configuration files which can be found in Magentto/lib/Apache • Magento communicates with Solr via HTTP/XML • Searching options configured via the Magento admin panel July 31, 2013 | 5Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale What is Apache Solr? General Solr Overview
  • 6. Better text-based searching provides a better customer experience • More relevant “fuzzy” searching* • Faceted searches • Search corrections • Out of the box type-ahead* • Response caching for better performance July 31, 2013 | 6Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale *Requires customization to leverage at 100% What is Apache Solr? Solr the Search Platform
  • 7. Solr is more than a search engine because… • Most data customers see is handled by Solr instead of MySQL July 31, 2013 | 7Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale What is Apache Solr? What Makes Solr Powerful
  • 8. Solr is more than a search engine because… • Most data customers see is handled by Solr instead of MySQL • Solr uses a simpler data structure July 31, 2013 | 8Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale What is Apache Solr? What Makes Solr Powerful product_id attribute_id product_id attribute_name attribute_id product_id attribute_value product_id attribute_name attribute_value MySQL (EAV) Solr (No EAV)
  • 9. Solr is more than a search engine because… • Most data customers see is handled by Solr instead of MySQL • Solr uses a simpler data structure • Solr supports replication which allows it to truly scale for growth July 31, 2013 | 9Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale What is Apache Solr? What Makes Solr Powerful Solr Solr Solr Solr Solr Magento
  • 10. Supporting Initial Catalog Growth July 31, 2013 | 10Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 11. Business Background • Growing catalog – from 10K to 100K SKUs • From 1 to 2 stores • From 1 to 2+ web nodes / 1 database node • Using native Solr Search July 31, 2013 | 11Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale Problems • Increased indexing time • Out-dated information on the front-end Business Use Case Supporting Initial Catalog Growth
  • 12. Supporting Initial Catalog Growth Problem – Increasing Index Footprint *Expected indexing time July 31, 2013 | 12 35 Min*17.5 min* 3.5 min* Year 2 2 websites 2 store views 17.5 min* 10 Min* 1.75 Min* Control 1 website 1 store view 10,000 SKUs 50,000 SKUs 100,000 SKUs Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale Slow Indexing
  • 13. July 31, 2013 | 13 Concept • Connects to the database using JDBC • Extra data transformations must be written in Java/JavaScript. • Uses a prepared xml configuration Supporting Initial Catalog Growth Solution – Custom Data Import Handler Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 14. Results • 10 times faster indexing • Supports delta-indexing Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale July 31, 2013 | 14 Supporting Initial Catalog Growth Data Import Handler – Results Things to keep in mind • Solr knows about its data source • May require extra development efforts • Extra data transformations must be written in Java/JavaScript
  • 15. Supporting Growing Traffic July 31, 2013 | 15Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 16. Business Background • Growing catalog – 1,000,000 SKUs • Growing traffic: up to 100 requests / second • 3 stores • 3+ web nodes/ 1 database node • Using Data Import Handler July 31, 2013 | 16Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale Problem • Solr can’t handle increasing user concurrency Business Use Case Supporting Growing Traffic
  • 17. 47.5 Min*23.75 min* 35 min* 17.5 Min* 3.5 Min* Control 2 website 2 store view 500,000 SKUs 1,000,000 SKUs *Expected indexing time July 31, 2013 | 17 4.75 min* Year 3 3 websites 3 store views 100,000 SKUs < 1000 updates/sec Indexing delta data handles updates Supporting Growing Traffic Increasing Index Footprint – OK Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 18. 120 msec*100 msec*80 msec* Year 3 3 websites 3 store views 105 msec* 95 msec* 75 msec* Control 2 website 2 store view 100,000 SKUs 30 RPS 500,000 SKUs 60 RPS 1,000,000 SKUs 100 RPS *Expected average response time July 31, 2013 | 18 Solr CPU is maxed out Supporting Growing Traffic Problem – Increased Response Time Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 19. July 31, 2013 | 19 Supporting Growing Traffic Solution – Solr Replication Concept • Separate reading requests • Replicate index across multiple nodes • Read from multiple servers Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 20. Results • Allows Solr to handle read traffic • Introduces fail-over Things to keep in mind • Requires middle-ware or Magento customization • Possible heavy data duplication • Extra changes in infrastructure July 31, 2013 | 20 Supporting Initial Catalog Growth Solr Replication – Results Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 21. Supporting Substantial Catalog Growth July 31, 2013 | 21Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale
  • 22. Business Background • Growing catalog – 5,000,000 SKUs • 4 stores • 4+ web nodes / 1 database node • Using Data Import Handler • Using Solr replication July 31, 2013 | 22Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale Problems • Delta-indexing delays • Slow response time Business Use Case Supporting Substantial Catalog Growth
  • 23. 317.5 Min* 158.75 min* 237.5 min* 118.75 Min* 47.5 Min* Control 3 website 3 store view 2,500,000 SKUs 5,000,000 SKUs *Expected indexing time July 31, 2013 | 23 63.5 min* Year 4 4 websites 4 store views 1,000,000 SKUs > 1000 updates/sec Supporting Substantial Catalog Growth Problem – Increasing Index Footprint Thinking Beyond Search with Solr – Understanding How Solr Can Help Your Business Scale Delta indexing delays
  • 24. Supporting Substantial Catalog Growth: Problem – Increased Response Time. Slow response time. Expected average response time, Year 4 (4 websites, 4 store views): 150 msec at 1,000,000 SKUs / 100 RPS, 270 msec at 2,500,000 SKUs / 200 RPS, 400 msec at 5,000,000 SKUs / 400 RPS. Control (3 websites, 3 store views): 120 msec, 230 msec, and 300 msec at the same three loads.
  • 25. Supporting Substantial Catalog Growth: Solution – Index Sharding. Concept: • Distributed search • Distributed + Replication (SolrCloud)
  • 26. Supporting Substantial Catalog Growth: Index Sharding – Results. Results: • Distributed search for faster response time • 50 times faster indexing with 5 shards. Things to keep in mind: • Custom solution • Requires Magento customization or middleware introduction • Extra changes in infrastructure. [Diagram: Magento, MySQL, and documents A–I distributed across Solr shards]
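Manual sharding has two parts: deciding which shard indexes each document, and querying all shards at once. The sketch below illustrates both under stated assumptions. The CRC32 routing and the host names are hypothetical choices for illustration, while `shards` is Solr's request parameter for distributed search. SolrCloud handles document routing itself, so this applies only to a hand-rolled setup.

```python
import zlib
from urllib.parse import urlencode


def shard_for(doc_id, shard_count):
    """Pick a shard index from a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count


def distributed_query_url(entry_node, shard_nodes, query):
    """Build a select URL that fans the query out to every shard."""
    params = {"q": query, "shards": ",".join(shard_nodes), "wt": "json"}
    # Keep ',', ':' and '/' readable; they are valid in a query string.
    return "http://" + entry_node + "/select?" + urlencode(params, safe=",:/")


shard_nodes = ["solr-s1:8983/solr", "solr-s2:8983/solr"]  # hypothetical hosts
url = distributed_query_url(shard_nodes[0], shard_nodes, "name:shirt")
```

Any shard can act as the entry node: it forwards the query to every node listed in `shards` and merges the results. Because the routing hash is stable, updates for a given SKU always reach the same shard.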
  • 27. Supporting A Real-Time Catalog
  • 28. Supporting A Real-Time Catalog: Business Use Case. Business Background: • Growing catalog – 10,000,000 SKUs • 5 stores • 5+ web nodes / 1 database node • Data Import Handler • SolrCloud and distributed search. Business Requirement: • Always up-to-date index
  • 29. Supporting A Real-Time Catalog: Solution – Listen To The MySQL Bin Log. Concept: • Connect via the MySQL replication protocol • Listen to data-related events. [Diagram: MySQL, binlog, replication, MySQL slave]
  • 30. Supporting A Real-Time Catalog: Solution – Listen To The MySQL Bin Log. Concept: • Connect via the MySQL replication protocol • Listen to data-related events • Extract information from the events • Update the corresponding document in the Lucene index. [Diagram: MySQL, binlog, replication listener, log parser, Solr]
  • 31. Supporting A Real-Time Catalog: Listening To The MySQL Bin Log – Results. Results: • Replication-like connection • Indexes are always up-to-date. Things to keep in mind: • Relatively complex implementation. [Diagram: Magento, MySQL, bin log, and documents A–I distributed across Solr shards]
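The "extract information from events, then update the document" step can be sketched as a pure translation from row events to Solr update commands. The example assumes a replication listener has already decoded each binlog event into a simplified dictionary. The event shape and the column names (`entity_id`, `sku`, `name`) are hypothetical, while the `add` and `delete` structures follow Solr's JSON update format.

```python
import json


def to_solr_command(event):
    """Translate one simplified row event into a Solr JSON update command."""
    kind, row = event["type"], event["row"]
    if kind in ("insert", "update"):
        # Adding a document with an existing id replaces the old version.
        return {"add": {"doc": {"id": str(row["entity_id"]),
                                "sku": row["sku"],
                                "name": row["name"]}}}
    if kind == "delete":
        return {"delete": {"id": str(row["entity_id"])}}
    raise ValueError("unsupported event type: " + kind)


# Hypothetical events, as a listener might hand them over after parsing.
events = [
    {"type": "insert", "row": {"entity_id": 42, "sku": "TS-1", "name": "Shirt"}},
    {"type": "delete", "row": {"entity_id": 7, "sku": "OLD-7", "name": "Hat"}},
]
payloads = [json.dumps(to_solr_command(e)) for e in events]
```

Each payload would then be posted to the Solr update handler, followed by a commit. The listener itself is the harder part: it must connect to MySQL, decode the events, and track its position in the log, which is why the slides call the implementation relatively complex.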
  • 32. Key Points to Remember
  • 33. Key Points to Remember: Solr helps businesses scale. • Solr’s search capabilities provide a better site experience than MySQL LIKE or Full-text • Solr is more than a search platform – it is a key for scalability and growth • Solr’s data import handler keeps Solr performing well as your catalog grows • Solr replication helps accommodate growing traffic • Solr shards help keep indexing execution time and search response times low for very large catalogs • Listening to the MySQL bin log can help facilitate a continuously updating catalog
  • 34. References. Scaling Solr: • Solr Wiki: http://wiki.apache.org/solr/ • Type-Ahead: http://wiki.apache.org/solr/Suggester • Data Import Handler (DIH): http://wiki.apache.org/solr/DataImportHandler • Replication: http://wiki.apache.org/solr/SolrReplication • Shard: http://wiki.apache.org/solr/SolrCloud • Distributed Search: http://wiki.apache.org/solr/DistributedSearch. MySQL replication listening: • Change Data Capture: http://www.slideshare.net/mkindahl/binary-log-api-presentation-oscon-2011 • Replication Listener (C): https://launchpad.net/mysql-replication-listener • Open-Replicator (Java): http://code.google.com/p/open-replicator/
  • 35. Q&A
  • 36. The presenters, Magento Expert Consulting Group: • Udi Shamay, Head, Expert Consulting Group, udi@ebay.com • Steve Kukla, Business Solution Architect, Expert Consulting Group, skukla@ebay.com • Kirill Morozov, Application Architect, Expert Consulting Group, kmorozov@ebay.com