Solr cluster with SolrCloud at lucenerevolution (tutorial)


In this presentation we show how to build a high-availability SolrCloud cluster on Solr 4.1 using only Solr and a few bash scripts. The goal is to present a self-healing infrastructure that uses only cheap instances backed by ephemeral storage. We start with a comprehensive overview of the relationship between collections, Solr cores, shards, and cluster nodes. We continue with an introduction to Solr 4.x clustering using ZooKeeper, with particular emphasis on cluster state monitoring and Solr collection configuration. The core of the presentation is demonstrated on a live cluster. We show how to use cron and bash to monitor the state of the cluster and of its nodes. We then show how to extend this monitoring to automatically spawn new nodes, attach them to the cluster, and assign them shards (choosing between missing shards or replication for HA). We show that, with a high replication factor, shards can live on ephemeral storage without risk of data loss, greatly reducing the cost and management burden of the architecture. Future work, which might be taken up as an open source effort, includes monitoring the activity of individual nodes so as to scale the cluster according to traffic and usage.



  1. Lucene Revolution 2013
     SIMPLE & "CHEAP" SOLR CLUSTER
     Stéphane Gamard, Searchbox CTO <>
  2. Searchbox - Search as a Service
     "We are in the business of providing search engines on demand"
  3. Solr Provisioning
     High Availability
     • Redundancy
     • Sustained QPS
     • Monitoring
     • Recovery
     Index Provisioning
     • Collection creation
     • Cluster resizing
     • Node distribution
  4. Solr Clustering
     Before 4.x: Master/Slave
     • Custom routing
     • Complex provisioning
     [Diagram: load balancers in front of master/slave replication trees, with backups and monitoring]
  5. Solr Clustering
     After 4.x: Nodes
     • Automatic routing
     • Simple provisioning
     [Diagram: load balancers in front of interchangeable nodes coordinated by a ZooKeeper ensemble, with monitoring]
     Thank you to the SolrCloud team!!!
  6. What is SolrCloud?
     Backward compatibility
     • Plain old Solr (with Lucene 4.x)
     • Same schema
     • Same solrconfig
     • Same plugins
     Some plugins might need an update (distrib)
  7. What is SolrCloud?
     Centralized configuration
     • /conf
     • /conf/schema.xml
     • /conf/solrconfig.xml
     • numShards
     • replicationFactor
     • ...
  8. What is SolrCloud?
     Configuration & architecture agnostic nodes
     • ZK-driven configuration
     • Shard (1 core)
     • ZK-driven role: Leader / Replica
     • Peer & replication
     • Disposable
  9. What is SolrCloud?
     Automatic routing
     • Smart clients connect to ZK
     • Any node can forward a request to a node that can process it
 10. What is SolrCloud?
     Collection API
     • Abstraction level
     • An index is a collection
     • A collection is a set of shards
     • A shard is a set of cores
     • CRUD API for collections
     "Collections represent a set of cores with identical configuration. The set of cores of a collection covers the entire index."
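The collection CRUD API mentioned on this slide is exposed over HTTP. A minimal sketch of what a CREATE call looks like, assuming a node at localhost:8983 and a hypothetical collection name (the numShards/replicationFactor values match the configuration slide later in the deck):

```shell
# Assumptions: a SolrCloud node on localhost:8983, collection name "pubmed".
SOLR_HOST="localhost:8983"
COLLECTION="pubmed"
# Collections API CREATE request (Solr 4.x style).
URL="http://${SOLR_HOST}/solr/admin/collections?action=CREATE&name=${COLLECTION}&numShards=3&replicationFactor=2"
echo "$URL"
# Against a live cluster you would run:  curl "$URL"
```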
 11. What is SolrCloud?
     Collection > Shard > Core > Node
     • Collection: abstraction level of interaction & config
     • Shard: scaling factor for collection size (numShards)
     • Core: scaling factor for QPS (replicationFactor)
     • Node: scaling factor for cluster size (liveNodes)
     => SolrCloud is highly geared toward horizontal scaling
 12. That's SolrCloud: nodes => a single effort for scalability
     High Availability
     • Redundancy: # replicas
     • Sustained QPS: # replicas & # shards
     • Monitoring: ZK (clusterstate, livenodes)
     • Recovery: peer & replication
 13. SolrCloud - Design
     Collection -> Shards -> Cores -> Nodes
     Key metrics
     • Collection size & complexity
     • JVM requirement
     • Node requirement
 14. SolrCloud - Collection Metrics
     Pubmed index
     • ~12M documents
     • 7 indexed fields
     • 2 TF fields
     • 3 sorted fields
     • 5 stored fields
 15. A note on sharding: "The magic sauce of webscale"
     RAM requirement effect
     [Chart: RAM per shard vs. number of shards]
 16. A note on sharding: "The magic sauce of webscale"
     Disk requirement effect
     [Chart: disk space per shard vs. number of shards]
     "hidden quote for the book"
 17. SolrCloud - Collection Configuration
     Pubmed index
     • ~12M documents
     • 7 indexed fields
     • 2 TF fields
     • 3 sorted fields
     • 5 stored fields
     Configuration
     • numShards: 3
     • replicationFactor: 2
     • JVM RAM: ~3G
     • Disk: ~15G
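The configuration above fixes the cluster's core count directly: a quick sketch of the arithmetic (numShards × replicationFactor), with the per-core JVM/disk figures from the slide repeated for reference:

```shell
# Core count implied by the slide's collection configuration.
NUM_SHARDS=3
REPLICATION_FACTOR=2
CORES=$((NUM_SHARDS * REPLICATION_FACTOR))
echo "total cores: $CORES"        # prints: total cores: 6
# Per the slide, each core wants ~3G of JVM RAM and ~15G of disk.
```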
 18. SolrCloud - Core Sizing
     Heuristically inferred from "experience"
     • Size on the shard, not the collection
     • Do NOT starve resources on nodes
     • Settle on a JVM/disk sizing
     • Large amount of spare disk (optimize)
     RAM: 3 G, Disk: 60 G
 19. SolrCloud - Cluster Availability
     Depends on the nodes!!!
     Instance      RAM    Disk   $/h    Nodes  Min  Size  $/core/m
     m1.medium     3.75   410    0.12   1      6    6     87
     m1.large      7.5    850    0.24   2      6    12    87
     m1.xlarge     15     1690   0.48   5      6    30    70
     m2.xlarge     17.1   420    0.41   5      6    30    60
     m2.2xlarge    34.2   850    0.82   11     6    66    54
     m1.medium     3.75   410    0.12   3      6    18    28
     CCtrl (paas)  1.02   420    -      1      6    6     75
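The $/core/m column appears consistent with hourly price × 720 hours / cores per node, matching the table to within rounding; a quick check for the m1.large row (this derivation is inferred from the numbers, not stated on the slide):

```shell
# Inferred: monthly cost per core = hourly price * 720 h / cores per node.
# m1.large: $0.24/h, 2 cores per node -> ~86.4, which the table rounds to 87.
awk 'BEGIN { printf "m1.large: %.1f $/core/month\n", 0.24 * 720 / 2 }'
# prints: m1.large: 86.4 $/core/month
```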
 20. SolrCloud - Monitoring
     Solr monitoring
     • clusterstate.json
     • /live_nodes
     Node monitoring *
     • load average
     • core-to-resource consumption (core to CPU)
     • collection-to-node consumption (LB logs)
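A cron-able health check in the spirit of this slide can be as simple as grepping a fetched copy of clusterstate.json for replicas that are not active. A minimal sketch, using a hypothetical trimmed snapshot written to a temp file instead of a live fetch from ZooKeeper:

```shell
# Hypothetical, trimmed clusterstate.json snapshot (normally fetched from ZK).
cat > /tmp/clusterstate.json <<'EOF'
{"pubmed":{"shards":{
  "shard1":{"replicas":{"core_node1":{"state":"active"},"core_node2":{"state":"active"}}},
  "shard2":{"replicas":{"core_node3":{"state":"active"},"core_node4":{"state":"down"}}}}}}
EOF
# Count lines mentioning a "down" replica; non-zero means the cluster degraded.
DOWN=$(grep -c '"state":"down"' /tmp/clusterstate.json)
echo "replicas down: $DOWN"       # prints: replicas down: 1
```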
 21. SolrCloud - Provisioning
     Stand-by nodes
     • Automatically assigned as replicas
     • Provide a metric of HA
     Node addition * (self-healing)
     • Scheduled check on cluster congestion
     • Automatically spawn new nodes as needed
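The scheduled self-healing check can be sketched as a comparison of the live node count against a desired minimum. Here LIVE is a hypothetical value that would normally come from counting children of ZooKeeper's /live_nodes, and the script path in the cron line is an assumption:

```shell
# Desired cluster size; in practice LIVE would be read from ZK /live_nodes.
MIN_NODES=6
LIVE=4                            # hypothetical current live-node count
if [ "$LIVE" -lt "$MIN_NODES" ]; then
  echo "spawn $((MIN_NODES - LIVE)) node(s)"   # prints: spawn 2 node(s)
fi
# Example cron entry (hypothetical script path), run every 5 minutes:
#   */5 * * * * /opt/searchbox/check_cluster.sh
```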
 22. SolrCloud - Conclusion
     Using SolrCloud is like juggling
     • Gets better with practice
     • There is always some magic left
     • Could become very overwhelming
     • When it fails you lose your balls
     Test -> Test -> Test -> some more tests -> Test
 23. Next Steps
     What would make our current SolrCloud cluster even more awesome:
     • Balance/distribute cores based on machine load
     • Standby cores (replicas not serving requests and auto-shutting down)
 24. Further Information
     • Solr mailing list: solr-user@lucene.apache.org
     • Blogs & feed: http://
     • Searchbox email: contact@searchbox.com
 25. CONFERENCE PARTY
     The Tipsy Crow: 770 5th Ave
     Starts after Stump The Chump
     Your conference badge gets you in the door
     TOMORROW
     Breakfast starts at 7:30
     Keynotes start at 8:30
     CONTACT
     Stephane Gamard
     stephane.gamard@searchbox.com