2 th MeetUp May 8 2011 ()– Velkommen Bakgrunnen for MeetUpen– (Reklamepause)– Presentasjonsrunde– Ønsker for MeetUp-gruppen (diskusjon)– Lyn-taler á 10min (ca kl 18:30-19:00) • Sture Svensson ""Querying Solr in various ways" • Jan Høydahl ""What can I do with SolrCloud today" • NN?– Formelt slutt (ca 19:15)– Mingling...
3Scaling & HA (redundancy) ()– Index up to 25-100 million documents on a single server* • Scale linearly by adding servers (shards)– Query up to 50-1000 QPS on a single server • Scale linearly by adding servers (replicas)– Add redundancy or backup through extra replicas– Built-in software Load Balancer, auto failover– Indexing redundancy not out of the box • But possible to have every row do index+search– High Availability for config/admin using Apache ZooKeeper (TRUNK)
5 Replication () – Goals: • Increase QPS capacity • High availability of search – Replication adds another "search row" – Done as a PULL from slave – ReplicationHandler is configured in solrconfig.xmlhttp://wiki.apache.org/solr/SolrReplication
6Sharding ()– Goals: • Split an index too large for one box into smaller chunks • Lower HW footprint by smart partitioning of data – News search: One shard for last month, one shard per year • Lower latency by having smaller index per node– A shard is a core which participates in a collection • Shards A and B may thus be on different or same host • Shards A and B should but do not need to share schema– Shard distribution must be done by client application, adding documents to correct shard based on some policy • Most common policy is hash-based distribution • May also be date based or whatever client chooses– Work under way to add shard distribution natively to Solr, see SOLR-2358
7 Solr Cloud () – Solr Cloud is the popular name for an initiative to make Solr more easily scalable and managable in a distributed world – Enables centralized configuration and cluster status monitoring – Solr TRUNK contains the first features • Apache ZooKeeper support, including built-in ZK • Support for easy distrib=true query (by means of ZK) • NOTE: Still experimental, work in progress – Expected features to come • Auto index shard distribution using ZK • Tools to manage the config in ZK • Easy addition of row/shard through API – NOTE: We do not know when SolrCloud will be included in a released version of Solr. If you need it, use TRUNKhttp://wiki.apache.org/solr/SolrCloud
8 Solr Cloud... () – Setting up SolrCloud for our YP example • Well setup a 4-node cluster on our laptops using four instances of Jetty, on different ports • Well have 2 shards, each with one replica • Well index 5000 listings to each shard • And finally do distributed queries • For convenience, well use the ZK shipping with Solr – Bootstrapping ZooKeeper to create a config "yp-conf" • java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=yp-conf -DzkRun -jar start.jar – Starting the other Jetty nodes • java -Djetty.port=<port> -DhostPort=<port> -DzkHost=localhost:9983 -jar start.jar – Zookeeper admin • http://localhost:8983/solr/yp/admin/zookeeper.jsphttp://wiki.apache.org/solr/SolrCloud
9 Solr Cloud... () – Solr Cloud will resolve all shards and replicas in a collection based on what is configured in solr.xml – Querying /solr/yp/select?q=foo&distrib=true on this core will cause SolrCloud to resolve the core name to "yp-cloud" and then distribute the request to each of the shards which are members of the same collection – Often, the core name and collection name will be the same – SolrCloud will load balance between replicas within the same shardhttp://wiki.apache.org/solr/SolrCloud
10Solr Cloud, 2x2 setup () localhost:8983 localhost:7973 Run ZK: localhost:9983 Run ZK: no -DzkHost=localhost:9983 Core: yp Core: yp Shard: A (master) Shard: B (master) Colleciton: yp-collection Colleciton: yp-collection localhost:6963 localhost:5953 Run ZK: no Run ZK: N/A -DzkHost=localhost:9983 -DzkHost=localhost:9983 Core: yp Core: yp Shard: A (replica) Shard: B (replica) Colleciton: yp-collection Colleciton: yp-collection