Lucene Revolution 2013
SIMPLE & “CHEAP” SOLR CLUSTER
Stéphane Gamard
Searchbox CTO <stephane.gamard@searchbox.com>

Searchbox - Search as a Service
“We are in the business of providing search engines on demand”

Solr Provisioning
High Availability
• Redundancy
• Sustained QPS
• Monitoring
• Recovery
Index Provisioning
• Collection creation
• Cluster resizing
• Node distribution

Solr Clustering
Before 4.x:
Master/Slave
Custom Routing
Complex Provisioning
[Diagram: LB, Master, Slave, Backup, Monitoring]

Solr Clustering
After 4.x:
Nodes
Automatic Routing
Simple Provisioning
[Diagram: LB, Node, ZK, Monitoring]
Thank you to the SolrCloud Team!!!

What is SolrCloud?
Backward compatibility
• Plain old Solr (with Lucene 4.x)
• Same schema
• Same solrconfig
• Same plugins
Some plugins might need an update (distrib)

What is SolrCloud?
Centralized configuration
• /conf
• /conf/schema.xml
• /conf/solrconfig.xml
• numShards
• replicationFactor
• ...
[Diagram: LB, Node, ZK, Monitoring]

What is SolrCloud?
Configuration & Architecture Agnostic Nodes
[Diagram: LB, Node, ZK, Monitoring]
• ZK driven configuration
• Shard (1 core)
• ZK driven role:
  • Leader
  • Replica
• Peer & Replication
• Disposable

What is SolrCloud?
Automatic Routing
[Diagram: LB, Node, ZK, Monitoring]
• Smart clients connect to ZK
• Any node can forward a request to a node that can process it
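
As a rough illustration of the second bullet, a client that knows nothing about the cluster layout could simply rotate over the nodes and let SolrCloud forward each request. The sketch below only builds the request URLs; the host names, the collection name `pubmed` and the query are hypothetical, and this is not the ZK-aware smart client mentioned in the first bullet.

```python
from itertools import cycle
from urllib.parse import urlencode


def query_urls(base_urls, collection, query):
    """Yield select URLs for `query`, rotating over the given nodes.

    base_urls: Solr base URLs such as "http://host1:8983/solr" (hypothetical).
    Any node may receive the request; forwarding is left to SolrCloud.
    """
    params = urlencode({"q": query, "wt": "json"})
    for base in cycle(base_urls):
        yield f"{base}/{collection}/select?{params}"


urls = query_urls(
    ["http://host1:8983/solr", "http://host2:8983/solr"],  # hypothetical nodes
    "pubmed",                                              # hypothetical collection
    "title:cancer",
)
first, second, third = next(urls), next(urls), next(urls)
print(first)
print(second)
```

A load balancer in front of the nodes plays the same role as the rotation here.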

What is SolrCloud?
Collection API
• Abstraction level
• An index is a collection
• A collection is a set of shards
• A shard is a set of cores
• CRUD API for collection
“Collections represents a set of cores with identical configuration. The set of cores of a collection covers the entire index”
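
A minimal sketch of what a call to the Collections API looks like, assuming a node reachable at `http://localhost:8983/solr` and a collection named `pubmed` (both hypothetical). It only builds the URLs for the `CREATE` and `DELETE` actions and does not contact a server.

```python
from urllib.parse import urlencode


def collection_url(base_url, action, **params):
    """Build a Collections API URL, e.g. action=CREATE with name, numShards, ..."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


BASE = "http://localhost:8983/solr"  # hypothetical node address

create = collection_url(
    BASE,
    "CREATE",
    name="pubmed",          # hypothetical collection name
    numShards=3,
    replicationFactor=2,
)
delete = collection_url(BASE, "DELETE", name="pubmed")
print(create)
print(delete)
```

Sending either URL with any HTTP client to a running SolrCloud node performs the action; the same request can go to any node of the cluster.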

What is SolrCloud?
Collection: Abstraction level of interaction & config
Shard: Scaling factor for collection size (numShards)
Core: Scaling factor for QPS (replicationFactor)
Node: Scaling factor for cluster size (liveNodes)
=> SolrCloud is highly geared toward horizontal scaling
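
To make the hierarchy concrete, the sketch below spreads the cores of one collection (numShards x replicationFactor of them) over a list of live nodes. The round-robin placement and the core names are illustrative assumptions; the real placement is decided by SolrCloud, not by this function.

```python
def layout(num_shards, replication_factor, live_nodes):
    """Assign every core (one per shard replica) to a node, round-robin.

    Returns {node: [core_name, ...]}. Replicas of a shard land on
    consecutive nodes, so they stay on distinct nodes as long as
    replication_factor <= len(live_nodes).
    """
    placement = {node: [] for node in live_nodes}
    cores = [
        f"shard{s}_replica{r}"
        for s in range(1, num_shards + 1)
        for r in range(1, replication_factor + 1)
    ]
    for i, core in enumerate(cores):
        placement[live_nodes[i % len(live_nodes)]].append(core)
    return placement


plan = layout(num_shards=3, replication_factor=2,
              live_nodes=["node1", "node2", "node3"])
for node, cores in plan.items():
    print(node, cores)
```

Raising `num_shards` splits the index into smaller pieces, raising `replication_factor` adds copies that can serve queries, and adding entries to `live_nodes` gives the cores more machines to land on.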

That’s SolrCloud
High Availability
• Redundancy: # replicas
• Sustained QPS: # replicas & # shards
• Monitoring: ZK (clusterstatus, livenodes)
• Recovery: peer & replication
nodes => Single effort for scalability

SolrCloud - Design
[Diagram: Collection, Shards, Cores, Nodes]
Key metrics
• Collection size & complexity
• JVM requirement
• Node requirement

SolrCloud - Collection Metrics
Pubmed Index
• ~12M documents
• 7 indexed fields
• 2 TF fields
• 3 sorted fields
• 5 stored fields

A note on sharding
“The magic sauce of webscale”
Ram requirement effect
[Chart: RAM / Shard, with ram (0 to 6000) plotted against # shards (0 to 12)]

A note on sharding
“The magic sauce of webscale”
Disk requirement effect
[Chart: Disk / shard, with diskspace (0 to 50) plotted against # shards (0 to 16)]
“hidden quote for the book”

SolrCloud - Collection Configuration
Pubmed Index
• ~12M documents
• 7 indexed fields
• 2 TF fields
• 3 sorted fields
• 5 stored fields
Configuration
• numShards: 3
• replicationFactor: 2
• JVM ram: ~3G
• Disk: ~15G

SolrCloud - Core Sizing
Heuristically inferred from “experience”
• Size on shard, not collection
• Do NOT starve resources on nodes
• Settle for JVM/Disk sizing
• Large amount of spare disk (optimize)
RAM    Disk
3 G    60 G

SolrCloud - Cluster Availability
Depends on the nodes!!!
Instance       ram    disk   $/h    Nodes  Min  Size  $/core/m
m1.medium      3.75   410    0.12   1      6    6     87
m1.large       7.5    850    0.24   2      6    12    87
m1.xlarge      15     1690   0.48   5      6    30    70
m2.xlarge      17.1   420    0.41   5      6    30    60
m2.2xlarge     34.2   850    0.82   11     6    66    54
m1.medium      3.75   410    0.12   3      6    18    28
CCtrl (paas)   1.02   420    -      1      6    6     75( )
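
The Nodes and $/core/m columns of the first five rows appear consistent with the arithmetic sketched below. This is an inference from the numbers, not something the slide states: it assumes about 3 G of RAM per core, a 30-day month, and rounding the cost up. The second m1.medium row and the CCtrl row do not follow this pattern.

```python
import math

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month
RAM_PER_CORE_GB = 3.0      # assumption: roughly 3 G of JVM RAM per core


def cores_per_instance(ram_gb, ram_per_core_gb=RAM_PER_CORE_GB):
    """How many ~3 G cores fit in the instance's RAM."""
    return math.floor(ram_gb / ram_per_core_gb)


def monthly_cost_per_core(price_per_hour, cores):
    """Monthly instance price divided by the cores it hosts, rounded up."""
    return math.ceil(price_per_hour * HOURS_PER_MONTH / cores)


# name: (ram, $/h), copied from the first five table rows
instances = {
    "m1.medium": (3.75, 0.12),
    "m1.large": (7.5, 0.24),
    "m1.xlarge": (15, 0.48),
    "m2.xlarge": (17.1, 0.41),
    "m2.2xlarge": (34.2, 0.82),
}
for name, (ram, price) in instances.items():
    cores = cores_per_instance(ram)
    print(name, cores, monthly_cost_per_core(price, cores))
```

Under these assumptions, larger instances pack more cores per machine and bring the monthly cost per core down, which is the trend the table shows.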

SolrCloud - Monitoring
Solr Monitoring
• clusterstate.json
• /live_nodes
Node Monitoring *
• load average
• core-to-resource consumption (Core to CPU)
• collection-to-node consumption (LB logs)
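
A minimal sketch of the Solr-side check, assuming the cluster state has already been read from ZooKeeper. The sample below is hand-written in the general shape of a Solr 4.x clusterstate.json and the host names are hypothetical. A replica is reported when its state is not `active` or when its node is missing from the list of live nodes (the `/live_nodes` znode), since the recorded state can lag behind a node that has disappeared.

```python
import json

# Hand-written sample shaped like a Solr 4.x clusterstate.json (hypothetical data).
CLUSTER_STATE = json.loads("""
{
  "pubmed": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"state": "active", "node_name": "host1:8983_solr", "leader": "true"},
          "core_node2": {"state": "down", "node_name": "host2:8983_solr"}
        }
      },
      "shard2": {
        "replicas": {
          "core_node3": {"state": "active", "node_name": "host3:8983_solr", "leader": "true"}
        }
      }
    }
  }
}
""")
LIVE_NODES = {"host1:8983_solr", "host2:8983_solr"}  # hypothetical live nodes


def unhealthy_replicas(cluster_state, live_nodes):
    """Return (collection, shard, replica) for replicas not active on a live node."""
    problems = []
    for collection, cdata in cluster_state.items():
        for shard, sdata in cdata["shards"].items():
            for replica, rdata in sdata["replicas"].items():
                active = rdata.get("state") == "active"
                alive = rdata.get("node_name") in live_nodes
                if not (active and alive):
                    problems.append((collection, shard, replica))
    return problems


print(unhealthy_replicas(CLUSTER_STATE, LIVE_NODES))
```

Here `core_node2` is flagged because its state is `down`, and `core_node3` because its node is not live even though its recorded state is still `active`.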

SolrCloud - Provisioning
Stand-by nodes
• Automatically assigned as replica
• Provides a metric of HA
Node addition * (self healing)
• Scheduled check on cluster congestion
• Automatically spawn new nodes per need
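
One way the scheduled congestion check could be expressed is sketched below. The function name, the load-per-CPU threshold and the way stand-by nodes are counted are all hypothetical; the slides do not describe the actual rule. Spawning the nodes is left out.

```python
import math


def nodes_to_add(load_by_node, cpus_per_node, max_load_per_cpu=0.75, standby=0):
    """Return how many nodes to spawn so the load fits under the limit.

    load_by_node: {node_name: load average} for the serving nodes.
    standby: idle nodes already available, used before spawning new ones.
    All thresholds are hypothetical.
    """
    total_load = sum(load_by_node.values())
    capacity_per_node = cpus_per_node * max_load_per_cpu
    needed = math.ceil(total_load / capacity_per_node)
    missing = needed - len(load_by_node) - standby
    return max(0, missing)


busy = {"node1": 5.0, "node2": 4.0, "node3": 4.5}
calm = {"node1": 1.0, "node2": 1.0, "node3": 1.0}
print(nodes_to_add(busy, cpus_per_node=4))             # congested cluster
print(nodes_to_add(busy, cpus_per_node=4, standby=1))  # one stand-by node absorbs part of it
print(nodes_to_add(calm, cpus_per_node=4))             # nothing to do
```

A scheduler such as cron would call this periodically with fresh load averages and hand the result to whatever provisions machines.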

SolrCloud - Conclusion
Using SolrCloud is like juggling
• Gets better with practice
• There is always some magic left
• Could become very overwhelming
• When it fails you lose your balls
Test -> Test -> Test -> some more Tests -> Test

Next Steps
What would make our current SolrCloud cluster even more awesome:
• Balance/distribute cores based on machine load
• Standby cores (replicas not serving requests and auto-shutting down)

Further Information
Requirement for SolrCloud:
• Solr mailing list: solr-user@lucene.apache.org
Further information
• Blogs & feed: http://www.searchbox.com/blog/
• Searchbox email: contact@searchbox.com
CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets
you in the door
TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30
CONTACT
Stephane Gamard
stephane.gamard@searchbox.com

FIDO Alliance
 

Solr cluster with SolrCloud at lucenerevolution (tutorial)

  • 1. Lucene Revolution 2013: SIMPLE & “CHEAP” SOLR CLUSTER
       Stéphane Gamard, Searchbox CTO <stephane.gamard@searchbox.com>
  • 2. Searchbox - Search as a Service
       “We are in the business of providing search engines on demand”
  • 3. Solr Provisioning
       High Availability
         • Redundancy
         • Sustained QPS
         • Monitoring
         • Recovery
       Index Provisioning
         • Collection creation
         • Cluster resizing
         • Node distribution
  • 4. Solr Clustering
       Before 4.x:
         • Master/Slave
         • Custom Routing
         • Complex Provisioning
       [Diagram: load balancers (LB) in front of master/slave groups, with backups and monitoring]
  • 5. Solr Clustering
       After 4.x:
         • Nodes
         • Automatic Routing
         • Simple Provisioning
       [Diagram: load balancers (LB) in front of interchangeable nodes, three ZooKeeper (ZK) instances, and monitoring]
       Thank you to the SolrCloud Team!!!
  • 6. What is SolrCloud?
       Backward compatibility
         • Plain old Solr (with Lucene 4.x)
         • Same schema
         • Same solrconfig
         • Same plugins
       Some plugins might need an update (distrib)
  • 7. What is SolrCloud?
       Centralized configuration
         • /conf
         • /conf/schema.xml
         • /conf/solrconfig.xml
         • numShards
         • replicationFactor
         • ...
       [Diagram: nodes, ZK instances, load balancers (LB), and monitoring]
  • 8. What is SolrCloud?
       Configuration & Architecture Agnostic Nodes
         • ZK driven configuration
         • Shard (1 core)
         • ZK driven role:
             • Leader
             • Replica
         • Peer & Replication
         • Disposable
       [Diagram: nodes, ZK instances, load balancers (LB), and monitoring]
  • 9. What is SolrCloud?
       Automatic Routing
         • Smart clients connect to ZK
         • Any node can forward a request to a node that can process it
       [Diagram: nodes, ZK instances, load balancers (LB), and monitoring]
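
The routing idea on this slide can be sketched in a few lines. This is a simplified illustration, not how Solr implements it: `pick_node` and its arguments are hypothetical names, and the live-node and hosting-node sets are assumed to have been read from ZK beforehand. A smart client prefers a live node that hosts the collection; otherwise it sends the request to any live node and relies on that node to forward it.

```python
import random


def pick_node(live_nodes, hosting_nodes, rng=random):
    """Return (node, needs_forwarding) for one request.

    live_nodes: nodes currently registered as alive (assumed read from ZK).
    hosting_nodes: nodes holding a core of the target collection.
    """
    candidates = sorted(set(live_nodes) & set(hosting_nodes))
    if candidates:
        # A live node hosts the collection, so it can answer directly.
        return rng.choice(candidates), False
    # No direct match: any live node accepts the request and forwards it.
    return rng.choice(sorted(live_nodes)), True


print(pick_node({"node-a", "node-b"}, {"node-b", "node-c"}))
```

Sorting before choosing keeps the candidate order stable, so a seeded `rng` gives reproducible picks in tests.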
  • 10. What is SolrCloud?
       Collection API
         • Abstraction level
         • An index is a collection
         • A collection is a set of shards
         • A shard is a set of cores
         • CRUD API for collections
       “Collections represent a set of cores with identical configuration. The set of cores of a collection covers the entire index”
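
As a minimal sketch of the Collection API in use, the snippet below builds the URL for a Collections API CREATE call. The host, port, and the collection name `pubmed` are placeholder assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL (nothing is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder base URL and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "pubmed", 3, 2)
print(url)
```

The same endpoint takes other `action` values for the remaining CRUD operations, which is what makes the collection the unit of interaction rather than the individual core.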
  • 11. What is SolrCloud?
         • Collection: abstraction level of interaction & config
         • Shard: scaling factor for collection size (numShards)
         • Core: scaling factor for QPS (replicationFactor)
         • Node: scaling factor for cluster size (liveNodes)
       => SolrCloud is highly geared toward horizontal scaling
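
The three scaling factors combine with simple arithmetic. The sketch below assumes every shard is served by `replicationFactor` cores and that a node can host a fixed number of cores; `CollectionLayout` and `cores_per_node` are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CollectionLayout:
    num_shards: int          # scaling factor for collection size
    replication_factor: int  # scaling factor for QPS

    def total_cores(self):
        # Each shard is served by replication_factor cores.
        return self.num_shards * self.replication_factor

    def nodes_needed(self, cores_per_node):
        # Scaling factor for cluster size: enough nodes to host every core.
        return math.ceil(self.total_cores() / cores_per_node)


layout = CollectionLayout(num_shards=3, replication_factor=2)
print(layout.total_cores())    # 6
print(layout.nodes_needed(2))  # 3
```

Raising `num_shards` shrinks each core, raising `replication_factor` adds serving capacity, and both increase the number of cores the live nodes must absorb.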
  • 12. That’s SolrCloud
       High Availability
         • Redundancy: # replicas
         • Sustained QPS: # replicas & # shards
         • Monitoring: ZK (clusterstatus, livenodes)
         • Recovery: peer & replication
       nodes => Single effort for scalability
  • 13. SolrCloud - Design
       Key metrics
         • Collection size & complexity
         • JVM requirement
         • Node requirement
       [Diagram: Collection, Shards, Cores, Nodes]
  • 14. SolrCloud - Collection Metrics
       Pubmed Index
         • ~12M documents
         • 7 indexed fields
         • 2 TF fields
         • 3 sorted fields
         • 5 stored fields
  • 15. A note on sharding
       “The magic sauce of webscale”
       RAM requirement effect
       [Chart: RAM / shard (y-axis) vs. # shards (x-axis)]
  • 16. A note on sharding
       “The magic sauce of webscale”
       Disk requirement effect
       [Chart: Disk / shard (y-axis) vs. # shards (x-axis)]
       “hidden quote for the book”
  • 17. SolrCloud - Collection Configuration
       Pubmed Index
         • ~12M documents
         • 7 indexed fields
         • 2 TF fields
         • 3 sorted fields
         • 5 stored fields
       Configuration
         • numShards: 3
         • replicationFactor: 2
         • JVM RAM: ~3G
         • Disk: ~15G
  • 18. SolrCloud - Core Sizing
       Heuristically inferred from “experience”
         • Size on shard, not collection
         • Do NOT starve resources on nodes
         • Settle for JVM/Disk sizing
         • Large amount of spare disk (optimize)
       RAM: 3 G
       Disk: 60 G
  • 19. SolrCloud - Cluster Availability
       Depends on the nodes!!!

       Instance       ram    disk   $/h    Nodes  Min  Size  $/core/m
       m1.medium      3.75   410    0.12   1      6    6     87
       m1.large       7.5    850    0.24   2      6    12    87
       m1.xlarge      15     1690   0.48   5      6    30    70
       m2.xlarge      17.1   420    0.41   5      6    30    60
       m2.2xlarge     34.2   850    0.82   11     6    66    54
       m1.medium      3.75   410    0.12   3      6    18    28
       CCtrl (paas)   1.02   420    -      1      6    6     75
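
The cost comparison can be approximated with a short calculation. This is a sketch under stated assumptions, not the author's method: it reads the "Nodes" column as the number of cores that fit on one instance at 3 GB of RAM per core (the core sizing figure), and it assumes roughly 730 billable hours per month. With those assumptions the results land close to the $/core/m column for the first five rows.

```python
import math

RAM_PER_CORE_GB = 3    # per-core JVM sizing, taken from the core sizing slide
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def cores_per_node(ram_gb, ram_per_core_gb=RAM_PER_CORE_GB):
    """How many cores fit on an instance when sizing by RAM only."""
    return math.floor(ram_gb / ram_per_core_gb)


def cost_per_core_month(price_per_hour, ram_gb):
    """Approximate monthly price of one core on a given instance type."""
    return price_per_hour * HOURS_PER_MONTH / cores_per_node(ram_gb)


# (ram, $/h) pairs copied from the table above.
INSTANCES = {
    "m1.medium": (3.75, 0.12),
    "m1.large": (7.5, 0.24),
    "m1.xlarge": (15, 0.48),
    "m2.xlarge": (17.1, 0.41),
    "m2.2xlarge": (34.2, 0.82),
}

for name, (ram_gb, price) in INSTANCES.items():
    cores = cores_per_node(ram_gb)
    monthly = cost_per_core_month(price, ram_gb)
    print(f"{name}: {cores} cores, ~${monthly:.0f}/core/month")
```

Larger instances come out cheaper per core here because the fixed 3 GB slice wastes less of their RAM.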
  • 20. SolrCloud - Monitoring
       Solr Monitoring
         • clusterstate.json
         • /live_nodes
       Node Monitoring *
         • load average
         • core-to-resource consumption (Core to CPU)
         • collection-to-node consumption (LB logs)
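
A monitoring check built on these two sources might look like the sketch below. The sample dictionary is made-up data shaped like a Solr 4.x clusterstate.json, the node names are placeholders, and both inputs are assumed to have been fetched from ZK already. The function counts, per shard, the replicas that are marked active and sit on a live node.

```python
# Made-up sample shaped like clusterstate.json; not real cluster output.
CLUSTERSTATE = {
    "pubmed": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "node_name": "10.0.0.1:8983_solr"},
                "core_node2": {"state": "active", "node_name": "10.0.0.3:8983_solr"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"state": "recovering", "node_name": "10.0.0.2:8983_solr"},
                "core_node4": {"state": "active", "node_name": "10.0.0.3:8983_solr"},
            }},
        }
    }
}
# Placeholder live-node list: 10.0.0.3 has dropped out.
LIVE_NODES = {"10.0.0.1:8983_solr", "10.0.0.2:8983_solr"}


def shard_health(clusterstate, live_nodes):
    """Return {collection: {shard: count of active replicas on live nodes}}."""
    report = {}
    for collection, cdata in clusterstate.items():
        report[collection] = {}
        for shard, sdata in cdata["shards"].items():
            report[collection][shard] = sum(
                1
                for replica in sdata["replicas"].values()
                if replica["state"] == "active"
                and replica["node_name"] in live_nodes
            )
    return report


print(shard_health(CLUSTERSTATE, LIVE_NODES))
```

A shard whose count reaches zero cannot serve queries, and a count of one means the shard has lost its redundancy, so both are natural alert thresholds.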
  • 21. SolrCloud - Provisioning
       Stand-by nodes
         • Automatically assigned as replica
         • Provides a metric of HA
       Node addition * (self healing)
         • Scheduled check on cluster congestion
         • Automatically spawn new nodes per need
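
The scheduled congestion check could be sketched as a small decision function. Everything here is hypothetical: the function name, the load-per-CPU threshold of 0.8, and the rule that stand-by nodes absorb congestion before new nodes are spawned are illustrative assumptions, not values from the talk.

```python
def nodes_to_add(load_averages, cpu_count, standby_nodes, max_load_per_cpu=0.8):
    """Decide how many new nodes a scheduled check should request.

    load_averages: one load average per live node.
    cpu_count: CPUs per node (assumed identical across nodes).
    standby_nodes: idle nodes that can take over before spawning more.
    max_load_per_cpu: hypothetical congestion threshold.
    """
    congested = [
        load for load in load_averages if load / cpu_count > max_load_per_cpu
    ]
    # Stand-by nodes are used first; spawn only for the shortfall.
    return max(0, len(congested) - standby_nodes)


print(nodes_to_add([0.5, 3.9, 3.6], cpu_count=4, standby_nodes=1))  # 1
```

Keeping the decision in a pure function makes it easy to test different thresholds before wiring it to a scheduler or a cloud provisioning call.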
  • 22. SolrCloud - Conclusion
       Using SolrCloud is like juggling
         • Gets better with practice
         • There is always some magic left
         • Could become very overwhelming
         • When it fails you lose your balls
       Test -> Test -> Test -> some more Tests -> Test
  • 23. Next Steps
       What would make our current SolrCloud cluster even more awesome:
         • Balance/distribute cores based on machine load
         • Standby cores (replicas not serving requests and auto-shutting down)
  • 24. Further Information
       Requirement for SolrCloud:
         • Solr mailing list: solr-user@lucene.apache.org
       Further information
         • Blogs & feed: http://www.searchbox.com/blog/
         • Searchbox email: contact@searchbox.com
  • 25. CONFERENCE PARTY
         • The Tipsy Crow: 770 5th Ave
         • Starts after Stump The Chump
         • Your conference badge gets you in the door
       TOMORROW
         • Breakfast starts at 7:30
         • Keynotes start at 8:30
       CONTACT
         • Stephane Gamard, stephane.gamard@searchbox.com