Scaling search with Solr Cloud

Enterprise search can grow big, really big, and it keeps growing. Tens or even hundreds of servers may be involved, on premises or in the cloud. Managing this has been complex and time-consuming, until now :)

SolrCloud to the rescue
Using the world's most popular open source search engine, Apache Solr™, we will show you how the upcoming version 4.0 makes scaling search in the cloud simple and robust. A new feature called SolrCloud adds centralized configuration, distributed indexing and searching, automatic failover, recovery and leader election. Scaling is now as simple as adding a new server to your cluster: it finds the role where it is most needed and starts serving searches.


Scaling search with Solr Cloud

  1. Scaling search with SolrCloud (Jan Høydahl, Cominvent AS)
  2. Jan Høydahl
      • 1995: Developer telecom
      • 1998: Java developer
      • 2000: Search - FAST
      • 2006: Lucene
      • 2007: new Cominvent()
      • 2009: Lucene/Solr
      • 2011: Lucene committer
      • 2012: Lucene PMC
      • > 100 projects
  4. About Cominvent
      • Business critical search
      • Domain knowledge & best practices: Consulting, Training, Support
  5. SolrTraining.com: next course in Oslo, September 2012 (calendar from www.calendar-of-2012.com)
  6. http://www.meetup.com/Oslo-Solr-Community
      • CommunityZone talk: «Solr 101», Thursday 14:20
  7. http://www.meetup.com/Oslo-Solr-Community next MeetUp:
  8. ApacheCon Europe 2012
      • Sinsheim, Germany
      • Lucene/Solr track
      • November 5-8
      • www.apachecon.eu
  9. Agenda
      • Intro to Solr
      • Scaling search - before
      • Introduction to SolrCloud
      • Demo with Wikipedia data
      • Plans for Solr going forward
      • Q&A
  10. Intro to Solr
  11. Apache Solr Search Server
  12. Completely HTTP based
  14. Areas of use
  15. Example: e-commerce (www.libris.no)
  16. Boosting by function
      Boosting on review popularity and sales numbers: log(sum(popularity,numsold))
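
The boost expression on the slide above can be worked through numerically. The following is a minimal sketch, assuming Solr's `log` function query is base 10 and that the expression is passed through the edismax `boost` parameter (the slide does not say which parameter the site actually uses). The field values and query text are made up for illustration.

```python
import math
from urllib.parse import urlencode


def boost_value(popularity: float, numsold: float) -> float:
    """Evaluate log(sum(popularity,numsold)), assuming Solr's log() is base 10."""
    return math.log10(popularity + numsold)


def boosted_query_params(user_query: str) -> str:
    """Build a query string whose relevance score is multiplied by the boost.

    Choosing edismax and its `boost` parameter is an assumption made for
    this sketch; the slide only shows the function expression itself.
    """
    return urlencode({
        "q": user_query,
        "defType": "edismax",
        "boost": "log(sum(popularity,numsold))",
    })
```

With these assumptions, a product with popularity 40 and 60 units sold gets a boost factor of log10(100) = 2, while one with popularity 400 and 600 units sold gets 3: the logarithm keeps best-sellers from drowning out textual relevance.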
  17. Auto suggest & phonetic normalization
  18. Example: classifieds/auctions (www.finn.no)
  20. Who uses Apache Lucene/Solr™?
      ...and many more: http://wiki.apache.org/solr/PublicServers
  21. Versions
      • Current stable = 3.6.1
      • Latest release = 4.0-beta
      • Next release = 4.0-FINAL --- «soon» :-)
      • Release timeline: v1.1 (01/2007), v1.3 (09/2008), v1.4 (11/2009), v3.1 (03/2011), v3.3 (06/2011), v3.6 (04/2012), v4.0a (06/2012), v3.6.1 (07/2012), v4.0ß (08/2012), v4.0 (??/2012)
  22. Scaling search
  23. Why scale?
      • One single Solr server handles...
        – millions of documents (per shard)
        – hundreds of queries per second (per replica)
      • We need to scale if...
        – data volume increases
        – query volume increases
        – we need high availability / fault tolerance
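
The two scaling axes on the slide above (shards for data volume, replicas for query volume and availability) can be turned into a back-of-the-envelope sizing calculation. This is only a sketch: the function name and all capacity figures are hypothetical, and real per-node limits depend on hardware, schema and query mix.

```python
import math


def cluster_size(total_docs, docs_per_shard, peak_qps, qps_per_replica, min_copies=1):
    """Return (shards, copies_per_shard, total_nodes) for a rough estimate.

    shards:  how many slices the index is split into so each fits one server
    copies:  how many servers hold each slice, driven by query load and by
             the minimum number of copies wanted for fault tolerance
    """
    shards = max(1, math.ceil(total_docs / docs_per_shard))
    copies = max(min_copies, math.ceil(peak_qps / qps_per_replica))
    return shards, copies, shards * copies
```

For example, 50 million documents at an assumed 20 million per shard, and 900 queries per second at an assumed 300 per server, gives 3 shards with 3 copies each, or 9 nodes in total.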
  26. Scaling search - before
      • Solr shard 1 and Solr shard 2, each with its own config, schema and synonyms
        – Add shard node
        – Manually copy config
        – Manually index to right shard
        – Manually set the shards query parameter
      • Solr 1 replica and Solr 2 replica, each with its own config, schema and synonyms
        – Add replica node
        – Copy config
        – Set up poll based replication
        – No indexing failover
        – Monitor every node
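
To make the manual steps above concrete, here is a minimal sketch of what client code had to do before SolrCloud: choose the target shard for each document itself, and list every shard in the `shards` request parameter when querying. The host names, the CRC32-modulo routing rule and the helper names are assumptions made for illustration, not taken from the slides.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses; a real deployment would list its own hosts.
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def shard_for(doc_id: str, shards=SHARDS) -> str:
    """Pick the shard a document must be indexed to (client-side routing).

    CRC32 modulo the shard count is just one simple, stable choice.
    """
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query_url(q: str, shards=SHARDS) -> str:
    """Build a select URL that fans the query out via the `shards` parameter."""
    query = urlencode({"q": q, "shards": ",".join(shards)})
    return "http://" + shards[0] + "/select?" + query
```

Every client needs the same shard list and the same routing rule, so adding a shard means updating all of them. With a modulo rule it also changes the target shard for many existing documents, which is part of why this setup was laborious to grow.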
  27. Solr Cloud
  28. What is SolrCloud?
      • New in Solr 4.0
      • Easier scaling
      • Centralized config
      • Fault tolerant indexing and querying
      • Using Apache ZooKeeper as «registry»
      ZooKeeper: «Because coordinating distributed systems is a Zoo»
  33. What is SolrCloud
      • Logical collection
      • Soft commit
      • Transaction log
  36. Scaling search - with SolrCloud
      • Solr master 1 and Solr master 2 (ZK aware), connected to Apache ZooKeeper
        – Add shard node, point it to ZK
        – It assumes the role of shard 2
        – Automatic document distribution
        – Automatic querying across cluster
        – Centralized config & monitoring
      • Solr replica 1 and Solr replica 2 (ZK aware)
        – Add replica node(s)
        – Auto role assignment
        – Push based replication
        – Indexing failover
        – Leader election through ZK
  38. Scaling search - with SolrCloud: same diagram, with Solr replica 2 relabelled «master»
  39. Scaling search - with SolrCloud: same diagram, with Solr master 2 relabelled «replica» and Solr replica 2 relabelled «master»
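
The role swap in the diagrams above is what "leader election through ZK" produces. Below is a toy, in-memory model of the general ZooKeeper leader-election recipe: each candidate takes a sequence number and the lowest live number leads. It is a sketch only. No ZooKeeper client is involved, the class and node names are hypothetical, and the slides do not spell out how SolrCloud implements its election.

```python
import itertools


class ShardElection:
    """In-memory stand-in for one shard's election queue."""

    def __init__(self):
        self._seq = itertools.count()
        self._candidates = {}  # node name -> sequence number

    def join(self, node: str) -> None:
        """A node registers as a candidate and gets the next sequence number."""
        self._candidates[node] = next(self._seq)

    def leave(self, node: str) -> None:
        """A node crashes or disconnects; its entry disappears."""
        self._candidates.pop(node, None)

    def leader(self):
        """The live candidate with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return min(self._candidates, key=self._candidates.get)
```

If "master 2" and then "replica 2" join, "master 2" leads. When "master 2" leaves, "replica 2" takes over, and if "master 2" later rejoins it queues behind the current leader and serves as a replica, matching the last diagram.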
  42. Configuration
      • Solr master 1 (ZK aware): -DzkRun -Dcollection.configName=jz -DnumShards=2 -Dbootstrap_confdir=./solr/coll/conf
      • Solr master 2 (ZK aware): -DzkHost=localhost:xxxx
      • Solr replica 1 (ZK aware): -DzkHost=localhost:xxxx
      • Solr replica 2 (ZK aware): -DzkHost=localhost:xxxx
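
The system properties on the configuration slide can be assembled into start commands. This sketch only builds the argument lists and does not launch anything. It assumes the nodes are started with `java ... -jar start.jar` and given distinct ports via `-Djetty.port`, neither of which appears on the slide; the function names are hypothetical, and the ZooKeeper address is left as a parameter because the slide elides the port.

```python
def first_node_args(config_name="jz", num_shards=2, confdir="./solr/coll/conf"):
    """Arguments for the first node, which runs ZooKeeper (-DzkRun),
    uploads the config directory and fixes the number of shards."""
    return [
        "java",
        "-DzkRun",
        f"-Dcollection.configName={config_name}",
        f"-DnumShards={num_shards}",
        f"-Dbootstrap_confdir={confdir}",
        "-jar", "start.jar",  # assumed launcher, not shown on the slide
    ]


def joining_node_args(zk_host, jetty_port):
    """Arguments for every later node: it only needs to know where ZK is."""
    return [
        "java",
        f"-DzkHost={zk_host}",
        f"-Djetty.port={jetty_port}",  # assumed way to avoid port clashes
        "-jar", "start.jar",
    ]
```

The asymmetry is the point: only the first node carries configuration, while shard 2 and both replicas are started with the same single `-DzkHost` setting and receive their roles from the cluster.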
  43. Demo: indexing & querying
  44. Solr 4.0 and beyond
      • Other news in v4.0 FINAL (expected later this autumn)
        – NRT
        – Real-time GET
        – Smaller index & memory footprint
        – New «modern» Admin GUI
        – Incremental updates
        – Pseudo-join
      • Future plans
        – More shard distribution mechanisms
        – Re-balancing cluster (split shards)
        – ...
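
As an illustration of the "Incremental updates" item above, the sketch below builds the JSON body for a partial document update using the `set` and `inc` modifiers of Solr 4.0's update format. It assumes that this is the feature the slide refers to and that the unique key field is called `id`; the helper name and field names are hypothetical.

```python
import json


def incremental_update(doc_id, set_fields=None, inc_fields=None):
    """Return a JSON update body that changes only the listed fields.

    set_fields: field -> new value (replaces the stored value)
    inc_fields: field -> amount to add to a numeric field
    """
    doc = {"id": doc_id}  # assumes the unique key field is named "id"
    for field, value in (set_fields or {}).items():
        doc[field] = {"set": value}
    for field, amount in (inc_fields or {}).items():
        doc[field] = {"inc": amount}
    return json.dumps([doc])
```

Calling `incremental_update("book1", inc_fields={"numsold": 1})` yields a body that bumps the sales counter without resending the rest of the document.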
  45. Recap
      • Apache Solr open source enterprise search
      • Scaling Solr was hard
      • Solr 4.0 with SolrCloud makes it easy :)
        – Centralized config
        – Effortless scaling of cluster
        – Fault tolerant indexing & querying
      • Download the 4.0-beta today, 4.0-FINAL soon
  46. Remember
      • Next Solr course in Oslo: September 2012 (www.solrkurs.no; calendar from www.calendar-of-2012.com)
      • CommunityZone talk: «Solr 101», Thursday 14:20
  47. ? Jan Høydahl, Cominvent AS, @cominvent
