Configure your Solr cluster to handle hundreds of millions of documents without even noticing, handle queries in milliseconds, use Near Real Time indexing and searching with document versioning. Scale your cluster both horizontally and vertically by using shards and replicas. In this session you'll learn how to make your indexing process blazing fast and make your queries efficient even with large amounts of data in your collections. You'll also see how to optimize your queries to leverage caches as much as your deployment allows and how to observe your cluster with Solr administration panel, JMX, and third party tools. Finally, learn how to make changes to already deployed collections —split their shards and alter their schema by using Solr API.
3. Solr History
Solr 4.1 and counting
Solr 4.0 released
Lucene / Solr merge
Solr 1.4 released
Solr 1.3 released
Incubator graduation
Solr donated to ASF
Y. Seeley creates Solr
17. Collection with Replica
Solr Server
Solr Server
$ curl
'http://solr1:8983/solr/admin/collections?action=CREATE&
name=revolution&numShards=2&replicationFactor=2'
Solr Server
ZooKeeper
ZooKeeper
ZooKeeper
Solr Server
21. Shard and Replica Number
How your data looks
Expected data growth
Target performance
Target node number
Max number of nodes = number of
shards * (number of replicas + 1)
22. What should I go for?
More data?
Shard
Shard
Shard
More queries ?
Replica
Replica
Replica
Replica
Replica
Replica
38. Column oriented fields - DocValues
NRT compatible
Better compression than field cache
Can store data outside of JVM heap
Can improve things for dynamic indices
<field name="categories" type="string" indexed="false"
stored="false" multiValued="true" docValues="true"/>
<field name="categories" type="string" indexed="false"
stored="false" multiValued="true" docValues="true"
docValuesFormat="Disk"/>
56. We Are Hiring !
Dig Search ?
Dig Analytics ?
Dig Big Data ?
Dig Performance ?
Dig working with and in open – source ?
We’re hiring world – wide !
http://sematext.com/about/jobs.html