First Oslo Solr Community Meetup: lightning talk by janhoy
Lightning talk by Jan Høydahl on SolrCloud

Presentation Transcript

  • 1. The program is starting... Sponsors:
  • 2. 2nd MeetUp, May 8 2011
    – Welcome; background for the meetup
    – (Commercial break)
    – Round of introductions
    – Wishes for the meetup group (discussion)
    – Lightning talks, 10 min each (approx. 18:30-19:00)
      • Sture Svensson: "Querying Solr in various ways"
      • Jan Høydahl: "What can I do with SolrCloud today"
      • NN?
    – Formal end (approx. 19:15)
    – Mingling...
  • 3. Scaling & HA (redundancy)
    – Index up to 25-100 million documents on a single server*
      • Scale linearly by adding servers (shards)
    – Query up to 50-1000 QPS on a single server
      • Scale linearly by adding servers (replicas)
    – Add redundancy or backup through extra replicas
    – Built-in software load balancer, auto failover
    – Indexing redundancy not out of the box
      • But possible to have every row do index+search
    – High availability for config/admin using Apache ZooKeeper (TRUNK)
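The per-server rules of thumb above lend themselves to a back-of-the-envelope sizing calculation. The function and the mid-range constants below are illustrative assumptions, not part of Solr:

```python
import math

def cluster_size(total_docs, peak_qps,
                 docs_per_server=50_000_000, qps_per_server=500):
    """Estimate cluster dimensions from per-server rules of thumb.

    docs_per_server and qps_per_server are assumed mid-range values
    picked from the slide's 25-100M docs and 50-1000 QPS ranges.
    """
    shards = math.ceil(total_docs / docs_per_server)  # scale doc count by sharding
    rows = math.ceil(peak_qps / qps_per_server)       # scale QPS by adding replica rows
    return shards, rows, shards * rows                # total servers = shards x rows

print(cluster_size(200_000_000, 1200))  # (4, 3, 12): 4 shards, 3 rows, 12 servers
```

The point of the sketch is the shape of the scaling: shards multiply with replica rows, so doubling both capacity axes quadruples the server count.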
  • 4. Solr scaling example
  • 5. Replication
    – Goals:
      • Increase QPS capacity
      • High availability of search
    – Replication adds another "search row"
    – Done as a PULL from the slave
    – ReplicationHandler is configured in solrconfig.xml
    http://wiki.apache.org/solr/SolrReplication
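The slave-side pull described above is driven over HTTP against the ReplicationHandler. As a sketch, the request a slave's poll corresponds to can be built like this (host and core name are assumptions from the YP example later in the deck; command names such as details and fetchindex are per the SolrReplication wiki page):

```python
from urllib.parse import urlencode

def replication_url(base, core, command):
    """Build a ReplicationHandler request URL.

    'details' reports replication status; 'fetchindex' asks a
    slave to pull the index from its master immediately.
    """
    return f"{base}/solr/{core}/replication?{urlencode({'command': command})}"

print(replication_url("http://localhost:8983", "yp", "details"))
# http://localhost:8983/solr/yp/replication?command=details
```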
  • 6. Sharding
    – Goals:
      • Split an index too large for one box into smaller chunks
      • Lower HW footprint by smart partitioning of data
        – News search: one shard for the last month, one shard per year
      • Lower latency by having a smaller index per node
    – A shard is a core which participates in a collection
      • Shards A and B may thus be on different or the same host
      • Shards A and B should, but do not need to, share a schema
    – Shard distribution must be done by the client application, adding documents to the correct shard based on some policy
      • The most common policy is hash-based distribution
      • May also be date-based, or whatever the client chooses
    – Work is under way to add shard distribution natively to Solr, see SOLR-2358
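The client-side hash-based policy mentioned above can be sketched in a few lines. This is an illustrative example, not Solr's own routing code; the shard URLs are assumptions matching the two-shard layout used later in the deck:

```python
import hashlib

SHARD_URLS = [  # assumed 2-shard layout
    "http://localhost:8983/solr",
    "http://localhost:7973/solr",
]

def shard_for(doc_id, shards=SHARD_URLS):
    """Route a document to a shard by hashing its unique id.

    md5 is used instead of Python's built-in hash() so the
    id-to-shard mapping stays stable across processes and runs,
    which matters because queries must find the doc where it
    was indexed.
    """
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return shards[h % len(shards)]

# The same id always lands on the same shard:
assert shard_for("listing-42") == shard_for("listing-42")
```

A date-based policy would replace the hash with a lookup from the document's timestamp to a shard, as in the news-search example above.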
  • 7. Solr Cloud
    – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and manageable in a distributed world
    – Enables centralized configuration and cluster status monitoring
    – Solr TRUNK contains the first features
      • Apache ZooKeeper support, including built-in ZK
      • Support for easy distrib=true queries (by means of ZK)
      • NOTE: Still experimental, work in progress
    – Expected features to come
      • Automatic index shard distribution using ZK
      • Tools to manage the config in ZK
      • Easy addition of a row/shard through an API
    – NOTE: We do not know when SolrCloud will be included in a released version of Solr. If you need it, use TRUNK
    http://wiki.apache.org/solr/SolrCloud
  • 8. Solr Cloud...
    – Setting up SolrCloud for our YP example
      • We'll set up a 4-node cluster on our laptops using four instances of Jetty, on different ports
      • We'll have 2 shards, each with one replica
      • We'll index 5000 listings to each shard
      • And finally do distributed queries
      • For convenience, we'll use the ZK that ships with Solr
    – Bootstrapping ZooKeeper to create a config "yp-conf"
      • java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=yp-conf -DzkRun -jar start.jar
    – Starting the other Jetty nodes
      • java -Djetty.port=<port> -DhostPort=<port> -DzkHost=localhost:9983 -jar start.jar
    – ZooKeeper admin
      • http://localhost:8983/solr/yp/admin/zookeeper.jsp
    http://wiki.apache.org/solr/SolrCloud
  • 9. Solr Cloud...
    – Solr Cloud will resolve all shards and replicas in a collection based on what is configured in solr.xml
    – Querying /solr/yp/select?q=foo&distrib=true on this core will cause SolrCloud to resolve the core name to "yp-cloud" and then distribute the request to each of the shards which are members of the same collection
    – Often, the core name and the collection name will be the same
    – SolrCloud will load balance between replicas within the same shard
    http://wiki.apache.org/solr/SolrCloud
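A distributed request like the one described above is just an ordinary select URL with distrib=true appended; with that flag, Solr resolves the collection membership from ZooKeeper and fans the query out to one replica of each shard. A small sketch of building that URL with the standard library (host and core name taken from the slide's example):

```python
from urllib.parse import urlencode

def distrib_query_url(base, core, q):
    """Build a SolrCloud distributed query URL.

    The node receiving this request resolves the core's collection
    via ZooKeeper and distributes the query across its shards.
    """
    params = urlencode({"q": q, "distrib": "true"})
    return f"{base}/solr/{core}/select?{params}"

print(distrib_query_url("http://localhost:8983", "yp", "foo"))
# http://localhost:8983/solr/yp/select?q=foo&distrib=true
```

Note the contrast with plain sharding on the previous slides, where the client had to enumerate every shard itself in a shards= parameter.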
  • 10. Solr Cloud, 2x2 setup
    – localhost:8983: runs ZK at localhost:9983; Core: yp; Shard: A (master); Collection: yp-collection
    – localhost:7973: runs ZK: no; -DzkHost=localhost:9983; Core: yp; Shard: B (master); Collection: yp-collection
    – localhost:6963: runs ZK: no; -DzkHost=localhost:9983; Core: yp; Shard: A (replica); Collection: yp-collection
    – localhost:5953: runs ZK: N/A; -DzkHost=localhost:9983; Core: yp; Shard: B (replica); Collection: yp-collection