Solr cluster with SolrCloud at lucenerevolution (tutorial)
 


In this presentation we show how to build a high-availability SolrCloud cluster with Solr 4.1 using only Solr and a few bash scripts. The goal is a self-healing infrastructure built on cheap instances with ephemeral storage. We start with a comprehensive overview of the relationship between collections, Solr cores, shards, and cluster nodes. We continue with an introduction to Solr 4.x clustering with ZooKeeper, with particular emphasis on cluster state monitoring and Solr collection configuration. The core of the presentation is demonstrated on a live cluster: we show how to use cron and bash to monitor the state of the cluster and of its nodes, and how to extend that monitoring to automatically spawn new nodes, attach them to the cluster, and assign them shards (choosing between filling missing shards and adding replicas for HA). We show that with a high replication factor it is possible to keep shards on ephemeral storage without risking data loss, greatly reducing the cost and management overhead of the architecture. Future work, possibly as an open-source effort, includes monitoring the activity of individual nodes so as to scale the cluster according to traffic and usage.
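
As a concrete illustration of the cron-and-bash loop described above, here is a minimal sketch of a scheduled cluster check. It is not the script used in the talk: the ZooKeeper address, the expected node count, the paths, and the script name are all assumptions, and zkCli.sh is the stock ZooKeeper command-line client.

    #!/usr/bin/env bash
    # check_cluster.sh -- hypothetical cron-driven cluster check, e.g. scheduled with:
    #   */5 * * * * /opt/solr-ops/check_cluster.sh >> /var/log/solr-check.log 2>&1
    # Assumptions: ZooKeeper reachable at zk1:2181, a 6-node cluster, and
    # ZooKeeper's zkCli.sh on the PATH.

    ZKHOST="zk1:2181"
    EXPECTED_NODES=6

    # /live_nodes holds one ephemeral znode per healthy Solr node.
    live_nodes=$(zkCli.sh -server "$ZKHOST" ls /live_nodes 2>/dev/null \
                 | tail -n 1 | tr -d '[] ' | tr ',' '\n' | grep -c '_solr')

    echo "$(date -u +%FT%TZ) live nodes: ${live_nodes}/${EXPECTED_NODES}"

    if [ "${live_nodes}" -lt "${EXPECTED_NODES}" ]; then
      # Placeholder: spawning a replacement instance is cloud-provider specific
      # (e.g. an EC2 API call); a new node self-registers once Solr starts on it.
      echo "cluster degraded: $((EXPECTED_NODES - live_nodes)) node(s) missing"
    fi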

Presentation Transcript

    • Slide 1: Simple & “Cheap” Solr Cluster. Stéphane Gamard, Searchbox CTO <stephane.gamard@searchbox.com>. Lucene Revolution 2013.
    • Slide 2: Searchbox, Search as a Service. “We are in the business of providing search engines on demand.”
    • Slide 3: Solr Provisioning. High availability: redundancy, sustained QPS, monitoring, recovery. Index provisioning: collection creation, cluster resizing, node distribution.
    • Slide 4: Solr Clustering before 4.x: master/slave topology (load balancers, masters, slaves, backups, dedicated monitoring), custom routing, complex provisioning.
    • Slide 5: Solr Clustering after 4.x: plain nodes behind load balancers, coordinated through a 3-node ZooKeeper ensemble, with automatic routing and simple provisioning. Thank you to the SolrCloud team!
    • Slide 6: What is SolrCloud? Backward compatibility: plain old Solr (with Lucene 4.x), same schema, same solrconfig, same plugins. Some plugins might need an update (distrib).
    • Slide 7: What is SolrCloud? Centralized configuration in ZooKeeper: /conf, /conf/schema.xml, /conf/solrconfig.xml, numShards, replicationFactor, ... (a configuration-upload sketch follows after the transcript).
    • Slide 8: What is SolrCloud? Configuration- and architecture-agnostic nodes: ZK-driven configuration, shard (1 core), ZK-driven role (leader or replica), peer & replication, disposable (a node-startup sketch follows after the transcript).
    • Slide 9: What is SolrCloud? Automatic routing: a smart client connects to ZK; any node can forward a request to a node that can process it.
    • Slide 10: What is SolrCloud? Collection API: an abstraction level; an index is a collection, a collection is a set of shards, a shard is a set of cores; a CRUD API for collections. “A collection represents a set of cores with identical configuration. The set of cores of a collection covers the entire index.” (A Collections API sketch follows after the transcript.)
    • Slide 11: What is SolrCloud? Collection: the abstraction level of interaction & config. Shard: scaling factor for collection size (numShards). Core: scaling factor for QPS (replicationFactor). Node: scaling factor for cluster size (liveNodes). SolrCloud is highly geared toward horizontal scaling.
    • Slide 12: That's SolrCloud: nodes mean a single effort for scalability. High availability revisited: redundancy via # replicas, sustained QPS via # replicas & # shards, monitoring via ZK (clusterstate, live nodes), recovery via peer & replication.
    • Slide 13: SolrCloud design (collection, shards, cores, nodes). Key metrics: collection size & complexity, JVM requirement, node requirement.
    • Slide 14: SolrCloud collection metrics. Pubmed index: ~12M documents, 7 indexed fields, 2 TF fields, 3 sorted fields, 5 stored fields.
    • Slide 15: A note on sharding, “the magic sauce of webscale”. RAM requirement effect: [chart of RAM per shard versus number of shards].
    • Slide 16: A note on sharding, “the magic sauce of webscale”. Disk requirement effect: [chart of disk space per shard versus number of shards]. “Hidden quote for the book.”
    • Slide 17: SolrCloud collection configuration. Pubmed index (~12M documents, 7 indexed fields, 2 TF fields, 3 sorted fields, 5 stored fields). Configuration: numShards: 3, replicationFactor: 2, JVM RAM: ~3G, disk: ~15G.
    • Slide 18: SolrCloud core sizing, heuristically inferred from “experience”: size on the shard, not the collection; do NOT starve resources on nodes; settle on a JVM/disk sizing; keep a large amount of spare disk (for optimize). Chosen sizing: 3 G RAM, 60 G disk.
    • Slide 19: SolrCloud cluster availability depends on the nodes! Instance comparison (RAM and disk in GB, price per hour, cores per node, minimum node count, total cores, cost per core per month):

          Instance      RAM    Disk   $/h    Cores/node  Min nodes  Total cores  $/core/month
          m1.medium     3.75   410    0.12   1           6          6            87
          m1.large      7.5    850    0.24   2           6          12           87
          m1.xlarge     15     1690   0.48   5           6          30           70
          m2.xlarge     17.1   420    0.41   5           6          30           60
          m2.2xlarge    34.2   850    0.82   11          6          66           54
          m1.medium     3.75   410    0.12   3           6          18           28
          CCtrl (PaaS)  1.02   420    -      1           6          6            75
    • Slide 20: SolrCloud monitoring. Solr monitoring: clusterstate.json, /live_nodes. Node monitoring: load average, core-to-resource consumption (core to CPU), collection-to-node consumption (LB logs). (A monitoring sketch follows after the transcript.)
    • Slide 21: SolrCloud provisioning. Stand-by nodes: automatically assigned as replicas, provide a metric of HA. Node addition (self-healing): scheduled check on cluster congestion, automatically spawn new nodes as needed. (A replica-attachment sketch follows after the transcript.)
    • Slide 22: Conclusion. Using SolrCloud is like juggling: it gets better with practice, there is always some magic left, it can become very overwhelming, and when it fails you lose your balls. Test -> Test -> Test -> some more tests -> Test.
    • Slide 23: Next steps. What would make our current SolrCloud cluster even more awesome: balance/distribute cores based on machine load; standby cores (replicas not serving requests and auto-shutting down).
    • Slide 24: Further information. For SolrCloud questions: Solr mailing list, solr-user@lucene.apache.org. Blog & feed: http://www.searchbox.com/blog/. Searchbox email: contact@searchbox.com.
    • Slide 25: Conference party at The Tipsy Crow, 770 5th Ave, starting after Stump The Chump; your conference badge gets you in the door. Tomorrow: breakfast starts at 7:30, keynotes start at 8:30. Contact: Stephane Gamard, stephane.gamard@searchbox.com.
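
The sketches below expand on a few of the slides above. First, the centralized configuration from slide 7 lives in ZooKeeper; in Solr 4.x it can be uploaded with the zkcli.sh helper shipped under example/cloud-scripts. The paths, the config name pubmedconf, and the ZooKeeper ensemble are assumptions.

    # Upload a config directory (schema.xml, solrconfig.xml, ...) to ZooKeeper
    # so that every node reads the same configuration.
    cd /opt/solr/example/cloud-scripts
    ./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
               -cmd upconfig \
               -confdir /opt/solr/example/solr/collection1/conf \
               -confname pubmedconf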
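
Slide 8 describes nodes as disposable and configuration-agnostic: a node is just Solr started with a pointer to the ZooKeeper ensemble, from which it receives its configuration and role. A minimal sketch using the Jetty-based example distribution of Solr 4.x; host names are assumptions.

    # Start a SolrCloud node. No collection-specific files are needed locally:
    # schema, solrconfig, cluster state and the node's role all come from ZooKeeper.
    cd /opt/solr/example
    java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar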
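
The Collection API from slide 10 is exposed over HTTP. Below is a sketch of creating the Pubmed collection with the sizing from slide 17 (numShards=3, replicationFactor=2); the host names and the config name are assumptions, and exact parameter support varies slightly across 4.x releases.

    # Create a collection of 3 shards with 2 replicas each, using the uploaded config set.
    curl "http://node1:8983/solr/admin/collections?action=CREATE&name=pubmed&numShards=3&replicationFactor=2&collection.configName=pubmedconf"

    # Automatic routing (slide 9): the request can be sent to any live node;
    # it is forwarded to a node that can actually serve it.
    curl "http://node4:8983/solr/pubmed/select?q=*:*&rows=0&wt=json"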
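
For the Solr-side monitoring listed on slide 20, both clusterstate.json and /live_nodes are plain znodes and can be inspected with the standard ZooKeeper CLI. A sketch, with the ensemble address assumed; the grep pattern is a rough heuristic and may need adjusting per release.

    ZKHOST="zk1:2181"

    # One ephemeral znode per live Solr node.
    zkCli.sh -server "$ZKHOST" ls /live_nodes

    # Full layout: collections, shards, replicas, their states and leaders.
    zkCli.sh -server "$ZKHOST" get /clusterstate.json

    # Crude health signal: count replicas whose state is not "active".
    zkCli.sh -server "$ZKHOST" get /clusterstate.json \
      | grep -o '"state":"[a-z]*"' | grep -vc '"state":"active"'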
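
For the node-addition path on slide 21: once a freshly spawned node has registered itself in /live_nodes, it can be pointed at a specific shard through the CoreAdmin API; in Solr 4.x, creating a core with collection and shard parameters registers it with that shard. A sketch; the host name, core name, and shard choice are assumptions.

    # Attach the new node to shard2 of the "pubmed" collection. If shard2 already
    # has a leader the core becomes a replica and replicates the index from it;
    # if the shard has no live replicas the core becomes its leader.
    curl "http://newnode:8983/solr/admin/cores?action=CREATE&name=pubmed_shard2_replica3&collection=pubmed&shard=shard2"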