First Oslo Solr Community Meetup lightning talk (janhoy)
Lightning talk by Jan Høydahl on SolrCloud

  • 1. The program is starting... Sponsors:
  • 2. 2nd MeetUp, May 8 2011
       – Welcome; background for the MeetUp
       – (Commercial break)
       – Round of introductions
       – Wishes for the MeetUp group (discussion)
       – Lightning talks, ~10 min each (approx. 18:30-19:00)
         • Sture Svensson: "Querying Solr in various ways"
         • Jan Høydahl: "What can I do with SolrCloud today"
         • NN?
       – Formal end (approx. 19:15)
       – Mingling...
  • 3. Scaling & HA (redundancy)
       – Index up to 25-100 million documents on a single server*
         • Scale linearly by adding servers (shards)
       – Query up to 50-1000 QPS on a single server
         • Scale linearly by adding servers (replicas)
       – Add redundancy or backup through extra replicas
       – Built-in software load balancer, auto failover
       – Indexing redundancy not out of the box
         • But possible to have every row do index+search
       – High availability for config/admin using Apache ZooKeeper (TRUNK)
  • 4. Solr scaling example
  • 5. Replication
       – Goals:
         • Increase QPS capacity
         • High availability of search
       – Replication adds another "search row"
       – Done as a PULL from the slave
       – ReplicationHandler is configured in solrconfig.xml
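As a minimal sketch of the last point, a ReplicationHandler section in solrconfig.xml for this pre-SolrCloud master/slave setup might look as follows. The master host name, poll interval, and file list are illustrative assumptions, not values from the talk:

```xml
<!-- Master side: replicate the index (and listed config files) after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- Slave side: PULL from the master, polling every 60 seconds -->
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

In practice a node would enable only the master or the slave list, matching its role in the "search row".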
  • 6. Sharding
       – Goals:
         • Split an index too large for one box into smaller chunks
         • Lower HW footprint by smart partitioning of data
           – News search: one shard for the last month, one shard per year
         • Lower latency by having a smaller index per node
       – A shard is a core which participates in a collection
         • Shards A and B may thus be on different or the same host
         • Shards A and B should, but do not need to, share a schema
       – Shard distribution must be done by the client application, adding documents to the correct shard based on some policy
         • The most common policy is hash-based distribution
         • May also be date-based, or whatever the client chooses
       – Work is under way to add shard distribution natively to Solr; see SOLR-2358
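The hash-based policy mentioned above can be sketched in a few lines of client code. The shard URLs and the choice of CRC32 as the hash are assumptions for illustration, not part of the talk:

```python
import zlib

# Hypothetical shard list for a two-shard collection (hosts/paths are made up).
SHARD_URLS = [
    "http://host-a:8983/solr/shardA",
    "http://host-b:8983/solr/shardB",
]

def shard_for(doc_id: str) -> str:
    """Map a document id to one shard by hashing it into a bucket.

    The same id always hashes to the same shard, which is what lets the
    client route both adds and deletes consistently.
    """
    bucket = zlib.crc32(doc_id.encode("utf-8")) % len(SHARD_URLS)
    return SHARD_URLS[bucket]
```

A date-based policy would replace the hash with a lookup on the document's timestamp field (e.g. last month vs. older years, as in the news-search example).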
  • 7. Solr Cloud
       – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
       – Enables centralized configuration and cluster status monitoring
       – Solr TRUNK contains the first features
         • Apache ZooKeeper support, including built-in ZK
         • Support for easy distrib=true queries (by means of ZK)
         • NOTE: still experimental, work in progress
       – Expected features to come
         • Automatic index shard distribution using ZK
         • Tools to manage the config in ZK
         • Easy addition of a row/shard through an API
       – NOTE: we do not know when SolrCloud will be included in a released version of Solr. If you need it, use TRUNK
  • 8. Solr Cloud...
       – Setting up SolrCloud for our YP example
         • We'll set up a 4-node cluster on our laptops using four instances of Jetty, on different ports
         • We'll have 2 shards, each with one replica
         • We'll index 5000 listings to each shard
         • And finally do distributed queries
         • For convenience, we'll use the ZK shipped with Solr
       – Bootstrapping ZooKeeper to create a config "yp-conf"
         • java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=yp-conf -DzkRun -jar start.jar
       – Starting the other Jetty nodes
         • java -Djetty.port=<port> -DhostPort=<port> -DzkHost=localhost:9983 -jar start.jar
       – ZooKeeper admin
         • http://localhost:8983/solr/yp/admin/zookeeper.jsp
  • 9. Solr Cloud...
       – Solr Cloud will resolve all shards and replicas in a collection based on what is configured in solr.xml
       – Querying /solr/yp/select?q=foo&distrib=true on this core will cause SolrCloud to resolve the core name to "yp-cloud" and then distribute the request to each of the shards which are members of the same collection
       – Often, the core name and the collection name will be the same
       – SolrCloud will load-balance between replicas within the same shard
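As a sketch of the first point, a node's solr.xml might declare its shard and collection membership like this. This assumes the TRUNK-era syntax where shard and collection are attributes on the core element; exact attribute names may vary between snapshots:

```xml
<!-- Hypothetical solr.xml for the Shard A master node in the 2x2 example -->
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="yp">
    <!-- The core name ("yp") and the collection name ("yp-collection") differ here,
         matching the core-name-to-collection resolution described above -->
    <core name="yp" instanceDir="." shard="shardA" collection="yp-collection"/>
  </cores>
</solr>
```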
  • 10. Solr Cloud, 2x2 setup
       – localhost:8983 | Run ZK: localhost:9983 | Core: yp | Shard: A (master) | Collection: yp-collection
       – localhost:7973 | Run ZK: no, -DzkHost=localhost:9983 | Core: yp | Shard: B (master) | Collection: yp-collection
       – localhost:6963 | Run ZK: no, -DzkHost=localhost:9983 | Core: yp | Shard: A (replica) | Collection: yp-collection
       – localhost:5953 | Run ZK: N/A, -DzkHost=localhost:9983 | Core: yp | Shard: B (replica) | Collection: yp-collection