How to make a simple cheap high availability self-healing solr cluster

Lucene Revolution 2013
SIMPLE & “CHEAP” SOLR CLUSTER
Stéphane Gamard
Searchbox CTO

BOOK GIVE-AWAY
Mail to: stephane.gamard@searchbox.com
Subject: [book-away]

Searchbox - Search as a Service
“We are in the business of providing
search engines on demand”

Solr Provisioning
High Availability
• Redundancy
• Sustained QPS
• Monitoring
• Recovery
Index Provisioning
• Collection creation
• Cluster resizing
• Node distribution

Solr Clustering
Before 4.x:
Master/Slave
Custom Routing
Complex Provisioning
[Diagram: load balancers (LB) in front of master/slave groups with backups, plus monitoring]

Solr Clustering
After 4.x:
Nodes
Automatic Routing
Simple Provisioning
[Diagram: load balancers (LB) in front of interchangeable nodes coordinated by a ZooKeeper (ZK) ensemble, plus monitoring]
Thank you to the SolrCloud Team !!!

What is SolrCloud?
Backward compatibility
• Plain old Solr (with Lucene 4.x)
• Same schema
• Same solrconfig
• Same plugins
Some plugins might need an update (distrib)

What is SolrCloud?
Centralized configuration
• /conf
• /conf/schema.xml
• /conf/solrconfig.xml
• numShards
• replicationFactor
• ...
[Diagram: SolrCloud cluster (LB, nodes, ZK ensemble, monitoring)]

What is SolrCloud?
Configuration & Architecture Agnostic Nodes
[Diagram: SolrCloud cluster (LB, nodes, ZK ensemble, monitoring)]
• ZK driven configuration
• Shard (1 core)
• ZK driven role:
  • Leader
  • Replica
• Peer & Replication
• Disposable

What is SolrCloud?
Automatic Routing
[Diagram: SolrCloud cluster (LB, nodes, ZK ensemble, monitoring)]
• Smart clients connect to ZK
• Any node can forward a request to a node that can process it

What is SolrCloud?
Collection API
• Abstraction level
• An index is a collection
• A collection is a set of shards
• A shard is a set of cores
• CRUD API for collection
“Collections represents a set of cores with
identical configuration. The set of cores of
a collection covers the entire index”
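
The "CRUD API for collection" bullet refers to Solr's Collections API, which is exposed over HTTP. As a minimal sketch, the helper below only builds the URL of a CREATE request; the host, the collection name "pubmed", and the helper name are illustrative assumptions, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (hypothetical helper; sends nothing)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed local Solr node and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "pubmed", 3, 2)
print(url)
```

Any node of the cluster can receive such a request, which is what makes provisioning a single call rather than a per-machine setup.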

What is SolrCloud?
Collection: Abstraction level of interaction & config
Shard: Scaling factor for collection size (numShards)
Core: Scaling factor for QPS (replicationFactor)
Node: Scaling factor for cluster size (liveNodes)
=> SolrCloud is highly geared toward horizontal scaling
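
The scaling factors combine with simple arithmetic: a collection needs numShards × replicationFactor cores, and the node count follows from how many cores one node can host. The sketch below assumes a hypothetical `cores_per_node` capacity, which the slides do not define at this point.

```python
import math


def cores_needed(num_shards, replication_factor):
    """Total cores for one collection: every shard exists replication_factor times."""
    return num_shards * replication_factor


def nodes_needed(num_shards, replication_factor, cores_per_node):
    """Nodes required if each node hosts at most cores_per_node cores (assumed capacity)."""
    return math.ceil(cores_needed(num_shards, replication_factor) / cores_per_node)


print(cores_needed(3, 2))
print(nodes_needed(3, 2, cores_per_node=5))
```

This counts capacity only; for real redundancy the replicas of a shard should also land on different nodes.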

nodes => Single effort for scalability
That’s SolrCloud
High Availability
• Redundancy
• Sustained QPS
• Monitoring
• Recovery
# replicas
ZK (clusterstatus, livenodes)
peer & replication
# replicas & # shards

SolrCloud - Design
[Diagram: Collection, Shards, Cores, Nodes]
Key metrics
• Collection size & complexity
• JVM requirement
• Node requirement

SolrCloud - Collection Metrics
Pubmed Index
• ~12M documents
• 7 indexed fields
• 2 TF fields
• 3 sorted fields
• 5 stored fields

A note on sharding “The magic sauce of webscale”
RAM requirement effect
[Chart: RAM (y-axis, 0 to 6000) vs. # shards (x-axis, 0 to 12)]

A note on sharding “The magic sauce of webscale”
Disk requirement effect
[Chart: disk space (y-axis, 0 to 50) vs. # shards (x-axis, 0 to 16)]
“hidden quote for the book”

SolrCloud - Collection Configuration
Pubmed Index
• ~12M documents
• 7 indexed fields
• 2 TF fields
• 3 sorted fields
• 5 stored fields
Configuration
• numShards: 3
• replicationFactor: 2
• JVM RAM: ~3G
• Disk: ~15G

SolrCloud - Core Sizing
Heuristically inferred from “experience”
• Size on shard, not collection
• Do NOT starve resources on nodes
• Settle for JVM/Disk sizing
• Large amount of spare disk (optimize)
RAM    Disk
3 G    60 G

SolrCloud - Cluster Availability
Depends on the nodes!!!
Instance       RAM (GB)  Disk (GB)  $/h   Nodes  Min  Size  $/core/m
m1.medium      3.75      410        0.12  1      6    6     87
m1.large       7.5       850        0.24  2      6    12    87
m1.xlarge      15        1690       0.48  5      6    30    70
m2.xlarge      17.1      420        0.41  5      6    30    60
m2.2xlarge     34.2      850        0.82  11     6    66    54
m1.medium      3.75      410        0.12  3      6    18    28
CCtrl (paas)   1.02      420        -     1      6    6     75
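
Most rows of the table appear consistent with the following arithmetic. This is one reading of the numbers, not a method stated on the slide: the Nodes column looks like the number of ~3 GB cores that fit in an instance's RAM, and $/core/m looks like the hourly price over a 30-day month divided by that number. The second m1.medium row (3 cores on 3.75 GB) does not follow the RAM rule, so the sketch should be taken as an approximation.

```python
import math

JVM_RAM_GB = 3.0            # per-core JVM size taken from the core sizing slide
HOURS_PER_MONTH = 24 * 30   # assumption: 30-day month


def cores_per_instance(ram_gb, jvm_ram_gb=JVM_RAM_GB):
    """How many cores of jvm_ram_gb each fit into an instance's RAM (assumed rule)."""
    return math.floor(ram_gb / jvm_ram_gb)


def cost_per_core_month(price_per_hour, cores):
    """Monthly instance price spread evenly over the cores it hosts."""
    return price_per_hour * HOURS_PER_MONTH / cores


# (instance, RAM in GB, $/h) copied from the table above.
for name, ram, price in [("m1.medium", 3.75, 0.12),
                         ("m1.xlarge", 15, 0.48),
                         ("m2.2xlarge", 34.2, 0.82)]:
    cores = cores_per_instance(ram)
    print(name, cores, round(cost_per_core_month(price, cores), 2))
```

Under this reading, larger instances are cheaper per core because more cores share one hourly price.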

SolrCloud - Monitoring
Solr Monitoring
• clusterstate.json
• /livenodes
Node Monitoring *
• load average
• core-to-resource consumption (Core to CPU)
• collection-to-node consumption (LB logs)
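
The Solr side of this monitoring can be sketched as a check over the cluster state and the list of live nodes. The sample data below is made up and only mimics the general layout of a Solr 4.x clusterstate.json (collection, shards, replicas with `state` and `node_name`); reading the real values from ZooKeeper is left out, and the function name is hypothetical.

```python
import json

# Made-up sample in the general shape of clusterstate.json.
SAMPLE_CLUSTERSTATE = json.loads("""
{"pubmed": {"shards": {
  "shard1": {"replicas": {
    "core_node1": {"state": "active", "node_name": "10.0.0.1:8983_solr"},
    "core_node2": {"state": "down",   "node_name": "10.0.0.2:8983_solr"}}},
  "shard2": {"replicas": {
    "core_node3": {"state": "active", "node_name": "10.0.0.2:8983_solr"},
    "core_node4": {"state": "active", "node_name": "10.0.0.3:8983_solr"}}}
}}}
""")

# Made-up set of node names currently registered as live in ZooKeeper.
SAMPLE_LIVE_NODES = {"10.0.0.1:8983_solr", "10.0.0.3:8983_solr"}


def under_replicated(clusterstate, live_nodes, collection, wanted):
    """Return {shard: healthy_count} for shards with fewer than `wanted` healthy replicas.

    A replica counts as healthy only if it is marked active AND its node is live.
    """
    result = {}
    for shard, info in clusterstate[collection]["shards"].items():
        healthy = sum(
            1
            for replica in info["replicas"].values()
            if replica["state"] == "active" and replica["node_name"] in live_nodes
        )
        if healthy < wanted:
            result[shard] = healthy
    return result


print(under_replicated(SAMPLE_CLUSTERSTATE, SAMPLE_LIVE_NODES, "pubmed", wanted=2))
```

Cross-checking against the live nodes matters because a replica can still be listed as active in the cluster state after its node has disappeared.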

SolrCloud - Provisioning
Stand-by nodes
• Automatically assigned as replica
• Provides a metric of HA
Node addition * (self healing)
• Scheduled check on cluster congestion
• Automatically spawn new nodes per need
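
The scheduled congestion check could be sketched as a pure decision function that a cron job or scheduler calls. Everything here is a hypothetical policy for illustration: the threshold, the parameter names, and the rule of one new node per congested node are assumptions, and actually spawning instances is left out.

```python
def nodes_to_add(load_averages, cpus_per_node, max_load_per_cpu=0.8, standby_nodes=0):
    """Return how many nodes to spawn under an assumed policy.

    load_averages: {node_name: load average} as reported by node monitoring.
    A node is congested when its load per CPU exceeds max_load_per_cpu.
    Stand-by nodes already available are used before new ones are spawned.
    """
    congested = sum(
        1 for load in load_averages.values()
        if load / cpus_per_node > max_load_per_cpu
    )
    return max(0, congested - standby_nodes)


# Made-up load averages for three nodes with 4 CPUs each.
loads = {"node-a": 0.5, "node-b": 3.9, "node-c": 4.2}
print(nodes_to_add(loads, cpus_per_node=4, standby_nodes=1))
```

Keeping the decision separate from the provisioning call makes the policy easy to test before it is allowed to start machines.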

SolrCloud - Conclusion
Using SolrCloud is like juggling
• Gets better with practice
• There is always some magic left
• Could become very overwhelming
• When it fails you lose your balls
Test -> Test -> Test -> some more Tests -> Test

Next Steps
What would make our current SolrCloud cluster
even more awesome:
• Balance/distribute cores based on machine load
• Standby cores (replicas not serving requests and auto-shutting down)

Further Information
Requirement for SolrCloud:
• Solr mailing list: solr-user@lucene.apache.org
Further information:
• Blogs & feed: http://www.searchbox.com/blog/
• Searchbox email: contact@searchbox.com

CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets
you in the door
TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30
CONTACT
Stephane Gamard
stephane.gamard@searchbox.com
