Optimize Is (Not) Bad For You
Deep Dive Into The Segment Merge Abyss
Rafał Kuć
Sematext Group, Inc.
Agenda
•  Segments – where, what & how
•  Writing segments
•  Modifying segments
•  Segment merging – what, where, how, why
•  Force merging
•  Force merging & SolrCloud
•  Performance considerations
•  Specialized merge policies
https://github.com/sematext/lr/tree/master/2017/optimize
Sematext & I
cloud, metrics & logs
Solr Collection Architecture
Zookeeper
SOLR
shard shard
SOLR
shard shard
SOLR
shard shard
SOLR
shard shard
Solr Shard Architecture
TLOG
Segment Segment Segment
Segment
Lucene Segment
Segment Info
Field Names
Stored Field Values
Point Values
Term Dictionary
Term Frequency
Term Proximity
Normalization
Per-Document Values
Live Documents
Inside the Segment – Term Dictionary
Doc1 Title: Lucene Revolution Washington, City: Washington D.C.
Doc2 Title: Lucene Revolution Boston, City: Boston
TERM        DOCID
lucene      <1>, <2>
revolution  <1>, <2>
washington  <1>
boston      <2>
Files: _1.tim, _1.tip
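A minimal Python sketch of the mapping the term dictionary slide illustrates: each term points to the IDs of the documents that contain it. This is a toy in-memory inverted index over the two example titles, not Lucene's on-disk format; the function name `build_term_dictionary` and the lowercase/whitespace tokenization are assumptions made for illustration.

```python
from collections import defaultdict


def build_term_dictionary(docs):
    """Map each lowercased title term to the sorted list of doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, title in docs.items():
        # Hypothetical analysis chain: lowercase, then split on whitespace.
        for term in title.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {1: "Lucene Revolution Washington", 2: "Lucene Revolution Boston"}
term_dictionary = build_term_dictionary(docs)
for term, doc_ids in term_dictionary.items():
    print(term, doc_ids)
```

Terms shared by both titles end up with two document IDs, while "washington" and "boston" each point to a single document, matching the table on the slide.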
Inside the Segment – Doc Values
Doc1 Title: Lucene Revolution Washington, City: Washington D.C.
Doc2 Title: Lucene Revolution Boston, City: Boston
DOCID  FIELD  VALUE
1      Title  Lucene Revolution Washington
1      City   Washington D.C.
2      Title  Lucene Revolution Boston
2      City   Boston
Files: _1.dvd, _1.dvm
Inside the Segment – Stored Fields
Doc1 Title: Lucene Revolution Washington, City: Washington D.C.
Doc2 Title: Lucene Revolution Boston, City: Boston
DOCID  VALUE
1      Title: Lucene Revolution Washington; City: Washington D.C.
2      Title: Lucene Revolution Boston; City: Boston
Files: _1.fdx, _1.fdt
Inside the Segment – Compound File System
_1.fdt
_1.fdx
_1.fnm
_1.nvd
_1.nvm
_1.si
_1.Lucene50_0.doc
_1.Lucene50_0.pos
_1.Lucene50_0.tim
_1.Lucene50_0.tip
_1.Lucene50_0.dvd
_1.Lucene50_0.dvm
_2.cfs
_2.cfe
Indexing
level/tier
Deletes
Deletes – After Merge
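The two slides above are diagrams. As a rough Python sketch of the idea, a delete only marks a document as not live inside its (immutable) segment, and a merge writes a new segment that copies live documents only. The dict layout and the function name `merge_segments` are hypothetical and chosen for illustration; real segments are on-disk structures.

```python
def merge_segments(segments):
    """Merge segments into one new segment, copying only live documents.

    Each segment is a dict with 'docs' (a list of documents) and 'live'
    (a parallel list of booleans; False marks a deleted document).
    """
    merged_docs = []
    for segment in segments:
        for doc, is_live in zip(segment["docs"], segment["live"]):
            if is_live:
                merged_docs.append(doc)
    # The merged segment starts with every remaining document live.
    return {"docs": merged_docs, "live": [True] * len(merged_docs)}


seg_a = {"docs": [{"id": 1}, {"id": 2}], "live": [True, False]}  # doc 2 was deleted
seg_b = {"docs": [{"id": 3}], "live": [True]}
merged = merge_segments([seg_a, seg_b])
print([doc["id"] for doc in merged["docs"]])
```

Until such a merge happens, the deleted document still occupies space in its original segment.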
Atomic Updates
$ curl -XPOST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
  {
    "id" : "3",
    "tags" : {
      "add" : [ "solr" ]
    }
  }
]'
1. Retrieve document
{
  "id" : 3,
  "tags" : [ "lucene" ],
  "awesome" : true
}
2. Apply changes
{
  "id" : 3,
  "tags" : [ "lucene", "solr" ],
  "awesome" : true
}
3. Delete old document
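The atomic update flow from these slides can be sketched in Python against a toy in-memory index. The dict-based index and the function name `atomic_add` are hypothetical; the final "index the updated document" step is not labeled on the slides and is included here as an assumption about how the flow completes.

```python
import copy


def atomic_add(index, doc_id, field, values):
    """Toy atomic 'add' modifier: retrieve, apply changes, delete old, index new."""
    old_doc = index[doc_id]                       # 1. retrieve document
    new_doc = copy.deepcopy(old_doc)
    new_doc.setdefault(field, []).extend(values)  # 2. apply changes
    del index[doc_id]                             # 3. delete old document
    index[doc_id] = new_doc                       # 4. index updated document (assumed step)
    return new_doc


index = {"3": {"id": "3", "tags": ["lucene"], "awesome": True}}
updated = atomic_add(index, "3", "tags", ["solr"])
print(updated)
```

The point of the sketch is that a regular atomic update rewrites the whole document, so the untouched fields must be retrievable for it to work.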
Atomic Updates – In Place
Works on top of numeric, doc values based fields
Fields must be non-indexed and non-stored
Doesn’t require delete/index
Supports only the inc and set modifiers
$ curl -XPOST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
  {
    "id" : "3",
    "views" : {
      "inc" : 100
    }
  }
]'
1. Retrieve document
{
  "id" : 3,
  "tags" : [ "lucene", "solr" ],
  "awesome" : true
}
2. Apply changes
{
  "id" : 3,
  "tags" : [ "lucene", "solr" ],
  "awesome" : true,
  "views" : 100
}
3. Update doc values
Search – Importance of Segments
Immutable – write once, read many
More segments – slower search speed
Fewer segments – faster searches
Fewer segments – smaller shard size
Rapid segment changes – worse I/O cache usage
Taking Control
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
Segment Warmer
<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
Taking Control – Default Indexing Throughput
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
throughput < 5k/sec @ ~14GB
Taking Control – Max Merged Segment Size
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Lower value: higher indexing throughput, smaller segments
Higher value: better search latency (depends), more merges
Taking Control – Lowering Max Merged Size
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">512</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
throughput < 5k/sec @ ~15.5GB
11% throughput increase
Taking Control – Merge At Once
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Lower value: better search latency (depends)
Higher value: higher indexing throughput
Taking Control – Lowering Merge At Once
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
throughput < 5k/sec @ ~13GB
8% throughput decrease
Taking Control – Merge At Once Explicit
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls number of segments merged at once during force merge
Taking Control – Segments Per Tier
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Lower value means more merging, but fewer segments
Along with maxMergeAtOnce can smooth out I/O spikes
For better indexing throughput set maxMergeAtOnce < segmentsPerTier
Taking Control – Combined Together
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">30</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">30</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">512</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
throughput < 5k/sec @ ~15GB
but look at read difference
Taking Control – Default vs Combined Read/Write
[Charts: default settings vs. combined changes settings]
Taking Control – Reclaim Deletes Weight
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls importance of merging segments with deleted documents
Increase to put priority on merging segments with deleted documents
Taking Control – No CFS Ratio
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls compound file system segments ratio
To completely disable CFS set to 0.0
Taking Control – Merge Scheduler
maxMergeCount: controls maximum number of concurrent merges
maxThreadCount: controls number of threads dedicated to merging
For spinning drives set maxThreadCount to 1
For SSD set maxThreadCount to min(4, #CPUs / 2)
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">4</int>
  <int name="maxThreadCount">4</int>
</mergeScheduler>
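The sizing rule on this slide can be written as a small Python helper. This is only a sketch of the rule of thumb as stated (1 thread for spinning drives, min(4, #CPUs / 2) for SSDs); the function name `suggested_max_thread_count`, the integer division, and the lower bound of 1 thread on single-core machines are assumptions added for illustration.

```python
def suggested_max_thread_count(cpu_count, spinning_disk):
    """Suggest a maxThreadCount value following the slide's rule of thumb."""
    if spinning_disk:
        # Spinning drives: a single merge thread avoids competing disk seeks.
        return 1
    # SSD: min(4, #CPUs / 2); the floor of 1 is an assumption for tiny machines.
    return max(1, min(4, cpu_count // 2))


for cpus in (1, 4, 16):
    print(cpus, suggested_max_thread_count(cpus, spinning_disk=False))
```

The result would then be placed in the `maxThreadCount` element of the merge scheduler configuration.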
Optimize aka Force Merge
Forces segment merge – usually very expensive
Desired number of segments can be specified
Done on all shards at the same time (by default)
Can be very bad or very good – depending on the use case
$ curl 'http://solr:8983/solr/lr/update?optimize=true&maxSegments=1&waitFlush=false'
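As a small Python sketch, the force merge request can be assembled programmatically so the desired segment count is easy to vary. The helper name `optimize_url` is hypothetical. It uses `maxSegments`, the parameter name Solr's update handler documents for the target segment count, and leaves out the `waitFlush` flag from the slide; host and collection follow the example above.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments=1):
    """Build the update-handler URL that triggers a force merge (optimize)."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/solr/{collection}/update?{params}"


url = optimize_url("http://solr:8983", "lr")
print(url)
```

Requesting more than one segment makes the operation cheaper than merging everything down to a single segment.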
Force Merge – The Good
Improves search speed (fewer segments)
Removes deleted documents
Shrinks the index by pruning duplicated data
Reduces number of used files
Force Merge – The Bad
Invalidates operating system I/O cache
Very expensive to perform – rewrites all segments
Not efficient on changing data
May cause performance issues
Will cause temporary increase of disk usage (up to 3x)
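The disk usage point lends itself to a pre-flight check before triggering a force merge. The Python sketch below reads "up to 3x" as total usage peaking at three times the current index size, so the extra space needed is twice the index size; that reading, the function name `has_force_merge_headroom`, and the `peak_factor` parameter are assumptions for illustration.

```python
def has_force_merge_headroom(index_bytes, free_bytes, peak_factor=3.0):
    """Return True if free space covers the assumed temporary growth."""
    # Extra space on top of what the index already occupies.
    extra_needed = index_bytes * (peak_factor - 1.0)
    return free_bytes >= extra_needed


GB = 1024 ** 3
print(has_force_merge_headroom(index_bytes=10 * GB, free_bytes=25 * GB))
print(has_force_merge_headroom(index_bytes=10 * GB, free_bytes=15 * GB))
```

In practice `free_bytes` could come from `shutil.disk_usage(path).free` for the volume holding the index.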
Force Merge – SolrCloud Performance Example
Force Merge – Legacy
Index on the master server
Force merge on the master server
Replicate after optimize is done
Solr Master
Solr Slave
Solr Slave
Solr Slave
pull after optimize
Force Merge – SolrCloud (Solr 7 – pull replicas)
Create collection
Force merge
Solr will do the rest
Solr Solr
Solr Solr
Primary 1
Primary 2 Pull Replica 2
Pull Replica 1
Force Merge – SolrCloud (NRT replicas, pre 7.0)
Ask yourself if you really need force merge
Create collection on part of the nodes
Index
Force merge
Create replicas
Solr Solr
Solr Solr
Primary 1
Primary 2 Replica 2
Replica 1
Specialized Merge Policy Example – Sorting
Sorting Merge Policy Factory Example
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapper.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
  <double name="inner.noCFSRatio">0.1</double>
</mergePolicyFactory>
Pre-sorts data during merge for:
- faster range queries
- faster data retrieval
- possibility of early query termination
- convenience for time-based data
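To illustrate the early termination point, here is a toy Python sketch: when a segment is already sorted by `timestamp desc`, a "newest N" query can stop after reading N documents instead of scanning the whole segment. The list-based segment and the function name `top_n_newest` are hypothetical; this is not how Solr implements it internally.

```python
def top_n_newest(segment_docs, n):
    """Collect the n newest docs from a segment pre-sorted by timestamp descending.

    Returns the hits and the number of documents that were scanned.
    """
    hits = []
    scanned = 0
    for doc in segment_docs:
        scanned += 1
        hits.append(doc)
        if len(hits) == n:
            break  # early termination: the remaining docs are all older
    return hits, scanned


timestamps = [5, 9, 1, 7, 3]
segment = sorted(
    ({"id": i, "timestamp": ts} for i, ts in enumerate(timestamps)),
    key=lambda doc: doc["timestamp"],
    reverse=True,  # mimics sort="timestamp desc"
)
hits, scanned = top_n_newest(segment, 2)
print([doc["timestamp"] for doc in hits], scanned)
```

Without the pre-sorted order, every document in the segment would have to be visited to find the newest ones.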
http://sematext.com/jobs
You love like we do?
You want to work with ?
Want to work with open source?
You want to do fun stuff?
Get in touch
Rafał
rafal.kuc@sematext.com
@kucrafal
http://sematext.com
@sematext http://sematext.com/jobs
Come talk to us
at the booth
Thank You

More Related Content

What's hot

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
From zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchFrom zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchRafał Kuć
 
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Yonik Seeley
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2Rafał Kuć
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersBen van Mol
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBertrand Delacretaz
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL AzureIke Ellis
 
Faster Data Analytics with Apache Spark using Apache Solr
Faster Data Analytics with Apache Spark using Apache SolrFaster Data Analytics with Apache Spark using Apache Solr
Faster Data Analytics with Apache Spark using Apache SolrChitturi Kiran
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solrpittaya
 
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...Lucidworks
 
Multithreading on iOS
Multithreading on iOSMultithreading on iOS
Multithreading on iOSMake School
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
Codemotion 2013: Feliz 15 aniversario, SQL Injection
Codemotion 2013: Feliz 15 aniversario, SQL InjectionCodemotion 2013: Feliz 15 aniversario, SQL Injection
Codemotion 2013: Feliz 15 aniversario, SQL InjectionChema Alonso
 

What's hot (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
From zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchFrom zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and Elasticsearch
 
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL Azure
 
Oak Lucene Indexes
Oak Lucene IndexesOak Lucene Indexes
Oak Lucene Indexes
 
Faster Data Analytics with Apache Spark using Apache Solr
Faster Data Analytics with Apache Spark using Apache SolrFaster Data Analytics with Apache Spark using Apache Solr
Faster Data Analytics with Apache Spark using Apache Solr
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James...
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Multithreading on iOS
Multithreading on iOSMultithreading on iOS
Multithreading on iOS
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Codemotion 2013: Feliz 15 aniversario, SQL Injection
Codemotion 2013: Feliz 15 aniversario, SQL InjectionCodemotion 2013: Feliz 15 aniversario, SQL Injection
Codemotion 2013: Feliz 15 aniversario, SQL Injection
 







Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.

  • 1. Optimize Is (Not) Bad For You Deep Dive Into The Segment Merge Abyss Rafał Kuć Sematext Group, Inc.
  • 2. Agenda: Segments (where, what & how); Writing segments; Modifying segments; Segment merging (what, where, how, why); Force merging; Force merging & SolrCloud; Performance considerations; Specialized merge policies. https://github.com/sematext/lr/tree/master/2017/optimize
  • 6. Solr Collection Architecture: ZooKeeper plus four Solr nodes, each hosting two shards.
  • 8. Solr Shard Architecture: a transaction log (TLOG) and four segments.
  • 9. Lucene Segment: Segment Info, Field Names, Stored Field Values, Point Values, Term Dictionary, Term Frequency, Term Proximity, Normalization, Per-Document Values, Live Documents.
  • 10. Inside the Segment – Term Dictionary (files: _1.tim, _1.tip)
      Doc1 Title: Lucene Revolution Washington, City: Washington D.C
      Doc2 Title: Lucene Revolution Boston, City: Boston
      TERM        DOCID
      lucene      <1>, <2>
      revolution  <1>, <2>
      washington  <1>
      boston      <2>
  • 11. Inside the Segment – Doc Values (files: _1.dvd, _1.dvm)
      Doc1 Title: Lucene Revolution Washington, City: Washington D.C
      Doc2 Title: Lucene Revolution Boston, City: Boston
      DOCID  FIELD  VALUE
      1      Title  Lucene Revolution Washington
      1      City   Washington D.C.
      2      Title  Lucene Revolution Boston
      2      City   Boston
  • 12. Inside the Segment – Stored Fields (files: _1.fdx, _1.fdt)
      Doc1 Title: Lucene Revolution Washington, City: Washington D.C
      Doc2 Title: Lucene Revolution Boston, City: Boston
      DOCID  VALUE
      1      Title: Lucene Revolution Washington
             City: Washington D.C
      2      Title: Lucene Revolution Boston
             City: Boston
  • 13. Inside the Segment – Compound File System: _1.fdt _1.fdx _1.fnm _1.nvd _1.nvm _1.si _1.Lucene50_0.doc _1.Lucene50_0.pos _1.Lucene50_0.tim _1.Lucene50_0.tip _1.Lucene50_0.dvd _1.Lucene50_0.dvm
  • 15. Inside the Segment – Compound File System: _1.fdt _1.fdx _1.fnm _1.nvd _1.nvm _1.si _1.Lucene50_0.doc _1.Lucene50_0.pos _1.Lucene50_0.tim _1.Lucene50_0.tip _1.Lucene50_0.dvd _1.Lucene50_0.dvm _2.cfs _2.cfe
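The term dictionary on slide 10 can be pictured as a map from each term to the IDs of the documents that contain it. The following is a minimal Python sketch of that idea, built from the two example titles. It assumes lowercasing and whitespace tokenization, and the function name `build_term_dictionary` is made up for illustration. It models the logical structure only, not the on-disk layout of the `.tim` and `.tip` files.

```python
from collections import defaultdict

# The two example documents from slide 10 (Title field only).
docs = {
    1: "Lucene Revolution Washington",
    2: "Lucene Revolution Boston",
}


def build_term_dictionary(docs):
    """Map each lowercased term to the sorted list of doc IDs containing it.

    Hypothetical helper: tokenization is a plain whitespace split.
    """
    postings = defaultdict(set)
    for doc_id, title in docs.items():
        for term in title.lower().split():
            postings[term].add(doc_id)
    # Sort terms and doc IDs so the result is deterministic.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


term_dictionary = build_term_dictionary(docs)
for term, doc_ids in term_dictionary.items():
    print(term, doc_ids)
```

Looking up `term_dictionary["washington"]` then gives the documents to fetch, which is the lookup direction a search needs; doc values and stored fields go the other way, from document ID to values.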
  • 29. Atomic Updates (step: retrieve document)
      $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "tags" : { "add" : [ "solr" ] } } ]'
      { "id" : 3, "tags" : [ "lucene" ], "awesome" : true }
  • 30. Atomic Updates (step: apply changes), same request
      { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true }
  • 31. Atomic Updates (step: delete old document), same request
      { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true }
  • 32. Atomic Updates, same request
      { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true }
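The atomic update steps on slides 29 to 32 can be sketched as a small Python model: retrieve the stored document, apply the modifiers, delete the old copy, then index the new one. This is an illustration under stated assumptions, not Solr code. The `index` dictionary stands in for the stored documents, and `atomic_update` is a hypothetical helper that handles only the `add`, `set` and `inc` modifiers.

```python
import copy


def atomic_update(index, doc_id, changes):
    """Toy model of a regular (not in-place) atomic update.

    `index` maps document id -> stored document (hypothetical structure).
    """
    # 1. Retrieve the current stored document.
    doc = copy.deepcopy(index[doc_id])

    # 2. Apply the modifiers from the request body.
    for field, modifier in changes.items():
        for op, value in modifier.items():
            if op == "add":
                values = value if isinstance(value, list) else [value]
                doc.setdefault(field, []).extend(values)
            elif op == "set":
                doc[field] = value
            elif op == "inc":
                doc[field] = doc.get(field, 0) + value
            else:
                raise ValueError(f"unsupported modifier: {op}")

    # 3. Delete the old document, then 4. index the new version.
    del index[doc_id]
    index[doc_id] = doc
    return doc


index = {"3": {"id": "3", "tags": ["lucene"], "awesome": True}}
updated = atomic_update(index, "3", {"tags": {"add": ["solr"]}})
print(updated)
```

In a real index the delete does not rewrite the old segment, since segments are immutable: the old copy is only marked as deleted and the new version is written to a new segment, which is why updates add work for later merges.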
  • 33. Atomic Updates – In Place
      Works on numeric, docValues-based fields.
      Fields must be neither indexed nor stored.
      Doesn't require delete/index.
      Supports only the inc and set modifiers.
      $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "views" : { "inc" : 100 } } ]'
  • 34. Atomic Updates – In Place (step: retrieve document), same request
      { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true }
  • 35. Atomic Updates – In Place (step: apply changes), same request
      { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true, "views" : 100 }
  • 36. Atomic Updates – In Place (step: update doc values), same request
      { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true, "views" : 100 }
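The conditions listed on slide 33 can be expressed as a single check. The sketch below is an assumption-laden illustration: the field description is a plain dictionary with made-up keys (`type`, `docValues`, `indexed`, `stored`), the set of numeric type names is hypothetical, and `can_update_in_place` is not a Solr API. It encodes only the rules stated on the slide.

```python
# Hypothetical names for numeric field types; real schemas use their own type names.
NUMERIC_TYPES = {"int", "long", "float", "double"}

# Slide 33: only these modifiers are supported for in-place updates.
IN_PLACE_MODIFIERS = {"inc", "set"}


def can_update_in_place(field, modifier):
    """Return True if the slide-33 conditions for an in-place update hold."""
    return (
        field["type"] in NUMERIC_TYPES
        and field["docValues"]
        and not field["indexed"]
        and not field["stored"]
        and modifier in IN_PLACE_MODIFIERS
    )


views = {"type": "long", "docValues": True, "indexed": False, "stored": False}
tags = {"type": "string", "docValues": False, "indexed": True, "stored": True}

print(can_update_in_place(views, "inc"))  # True
print(can_update_in_place(tags, "add"))   # False
```

When the check fails, the update falls back to the regular path described on slides 29 to 32, with its delete and re-index cost.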
  • 37–41. Search – Importance of Segments
      Immutable: write once, read many
      More segments: slower searches
      Fewer segments: faster searches
      Fewer segments: smaller shard size
      Rapid segment changes: worse I/O cache usage
  • 42. 01 Taking Control Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>  
  • 43. 01 Taking Control Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>   Merge Scheduler <mergeScheduler  class="org.apache.lucene.index.ConcurrentMergeScheduler"  />                  
  • 44. 01 Taking Control Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>   Merge Scheduler <mergeScheduler  class="org.apache.lucene.index.ConcurrentMergeScheduler"  />                   Segment Warmer <mergedSegmentWarmer                                            class="org.apache.lucene.index.SimpleMergedSegmentWarmer"  />                  
  • 45. 01 Taking Control – Default Indexing Throughput Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>  
  • 46. 01 Taking Control – Default Indexing Throughput throughput < 5k/sec @ ~14GB
  • 47. 01 Taking Control – Max Merged Segment Size Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Lower value: higher indexing throughput (smaller segments) Higher value: better search latency (depends), more merges
  • 48. 01 Taking Control – Lowering Max Merged Size Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">512</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>  
  • 49. 01 Taking Control – Lowering Max Segment Size throughput < 5k/sec @ ~15.5GB 11% throughput increase
  • 50. 01 Taking Control – Merge At Once Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Lower value: better search latency (depends) Higher value: higher indexing throughput
  • 51. 01 Taking Control – Lowering Merge At Once Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">2</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>  
  • 52. 01 Taking Control – Lowering Merge At Once throughput < 5k/sec @ ~13GB 8% throughput decrease
  • 53. 01 Taking Control – Merge At Once Explicit Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>   Controls number of segments merged at once during force merge
  • 54. 01 Taking Control – Segments Per Tier Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Lower value means more merging, but fewer segments Along with maxMergeAtOnce can smooth out I/O spikes For better indexing throughput set maxMergeAtOnce < segmentsPerTier
  • 55. 01 Taking Control – Combined Together Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">30</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">30</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">512</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>  
  • 56. 01 Taking Control – Combined Together throughput < 5k/sec @ ~15GB but look at read difference
  • 57. 01 Taking Control – Default vs Combined Read/Write default settings
  • 58. 01 Taking Control – Default vs Combined Read/Write default settings combined changes settings
  • 59. 01 Taking Control – Reclaim Deletes Weight Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>   Controls importance of merging segments with deleted documents Increase to put priority on merging segments with deleted documents
  • 60. 01 Taking Control – No CFS Ratio Merge Policy Factory <mergePolicyFactory  class="org.apache.solr.index.TieredMergePolicyFactory">                            <int  name="maxMergeAtOnce">10</int>        <int  name="maxMergeAtOnceExplicit">30</int>                            <int  name="segmentsPerTier">10</int>                  <int  name="floorSegmentMB">2048</int>        <int  name="maxMergedSegmentMB">5120</int>    <double  name="noCFSRatio">0.1</double>        <int  name="maxCFSSegmentSizeMB">2048</int>        <double  name="reclaimDeletesWeight">2.0</double>        <double  name="forceMergeDeletesPctAllowed">10.0</double>    </mergePolicyFactory>   Controls compound file system segments ratio To completely disable CFS set to 0.0
  • 61. 01 Taking Control – Merge Scheduler Controls maximum number of concurrent merges Merge Scheduler <mergeScheduler  class="org.apache.lucene.index.ConcurrentMergeScheduler">        <int  name="maxMergeCount">4</int>        <int  name="maxThreadCount">4</int>    </mergeScheduler>                  
  • 62. 01 Taking Control – Merge Scheduler Controls number of threads dedicated to merging Merge Scheduler <mergeScheduler  class="org.apache.lucene.index.ConcurrentMergeScheduler">        <int  name="maxMergeCount">4</int>        <int  name="maxThreadCount">4</int>    </mergeScheduler>                  
  • 63. 01 Taking Control – Merge Scheduler Controls number of threads dedicated to merging For spinning drives set maxThreadCount to 1 Merge Scheduler <mergeScheduler  class="org.apache.lucene.index.ConcurrentMergeScheduler">        <int  name="maxMergeCount">4</int>        <int  name="maxThreadCount">4</int>    </mergeScheduler>                  
  • 64. 01 Taking Control – Merge Scheduler Controls number of threads dedicated to merging For spinning drives set maxThreadCount to 1 For SSD set maxThreadCount to min(4, #CPUs / 2) Merge Scheduler <mergeScheduler  class="org.apache.lucene.index.ConcurrentMergeScheduler">        <int  name="maxMergeCount">4</int>        <int  name="maxThreadCount">4</int>    </mergeScheduler>                  
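The maxThreadCount rule of thumb from the merge scheduler slides (1 for spinning drives, min(4, #CPUs / 2) for SSDs) can be written as a small helper. This is a sketch only: `suggested_max_thread_count` is a hypothetical name, integer division is used for "#CPUs / 2", and the floor of 1 thread on single-core machines is an added assumption, not something the slides state.

```python
import os


def suggested_max_thread_count(is_ssd, cpu_count=None):
    """Apply the slide's heuristic for ConcurrentMergeScheduler maxThreadCount."""
    if cpu_count is None:
        # os.cpu_count() may return None; fall back to 1 in that case.
        cpu_count = os.cpu_count() or 1
    if not is_ssd:
        # Spinning drives: one merge thread to avoid seek contention.
        return 1
    # SSD: min(4, #CPUs / 2), never below 1 (assumption for 1-core hosts).
    return max(1, min(4, cpu_count // 2))


print(suggested_max_thread_count(is_ssd=False, cpu_count=16))
print(suggested_max_thread_count(is_ssd=True, cpu_count=16))
print(suggested_max_thread_count(is_ssd=True, cpu_count=4))
```

The returned number would go into the `maxThreadCount` element of the `mergeScheduler` configuration shown on the slides.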
  • 65. 01 Optimize aka Force Merge Forces segment merge – usually very expensive
  • 66. 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified
  • 67. 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified Done on all shards at the same time (by default)
  • 68. 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified Done on all shards at the same time (by default) Can be very bad or very good – depending on the use case
  • 69. 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified Done on all shards at the same time (by default) Can be very bad or very good – depending on the use case $ curl 'http://solr:8983/solr/lr/update?optimize=true&numSegments=1&waitFlush=false'
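The optimize request from the slide can also be assembled programmatically, which avoids quoting and whitespace mistakes in the query string. A minimal sketch: `optimize_url` is a hypothetical helper, the host, collection and parameter names are copied from the slide as-is, and they should be checked against the reference guide for the Solr version in use.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, **params):
    """Build an update URL that triggers optimize (force merge)."""
    # optimize=true comes first, extra parameters follow in call order.
    query = urlencode({"optimize": "true", **params})
    return f"{base_url}/solr/{collection}/update?{query}"


# Values taken from the slide's curl example.
url = optimize_url("http://solr:8983", "lr", numSegments=1, waitFlush="false")
print(url)
```

The function only builds the string; sending it (for example with curl, as on the slide) starts the merge on the target collection.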
  • 70. 01 Force Merge – The Good Improves search speed (fewer segments)
  • 71. 01 Force Merge – The Good Improves search speed (fewer segments) Removes deleted documents
  • 72. 01 Force Merge – The Good Improves search speed (fewer segments) Removes deleted documents Shrinks the index by pruning duplicated data
  • 73. 01 Force Merge – The Good Improves search speed (fewer segments) Removes deleted documents Shrinks the index by pruning duplicated data Reduces number of used files
  • 74. 01 Force Merge – The Bad Invalidates operating system I/O cache
  • 75. 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments
  • 76. 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments Not efficient on changing data
  • 77. 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments Not efficient on changing data May cause performance issues
  • 78. 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments Not efficient on changing data May cause performance issues Will cause temporary increase of disk usage (up to 3x)
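The "up to 3x" disk usage warning can be turned into a rough pre-flight check. The sketch below assumes that 3x refers to peak total size relative to the current index, so the extra space needed is twice the index size; that reading, the function name `force_merge_headroom` and the example sizes are assumptions, not figures from the talk.

```python
GIB = 1024 ** 3


def force_merge_headroom(index_bytes, free_bytes, peak_factor=3.0):
    """Return (enough_space, extra_bytes_needed) for a force merge."""
    # Peak usage under the assumed interpretation of "up to 3x".
    peak_bytes = index_bytes * peak_factor
    # The index already occupies index_bytes, so only the rest is extra.
    extra_needed = peak_bytes - index_bytes
    return free_bytes >= extra_needed, extra_needed


ok, extra = force_merge_headroom(index_bytes=14 * GIB, free_bytes=20 * GIB)
print(ok, extra / GIB)
```

In practice `free_bytes` could come from `shutil.disk_usage(path).free` for the volume that holds the index directory.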
  • 79. 01 Force Merge – SolrCloud Performance Example
  • 80. 01 Force Merge – SolrCloud Performance Example
  • 81. 01 Force Merge – Legacy Index on the master server Solr Master Solr Slave Solr Slave Solr Slave index Documents
  • 82. 01 Force Merge – Legacy Index on the master server Force merge on the master server Solr Master Solr Slave Solr Slave Solr Slave force merge
  • 83. 01 Force Merge – Legacy Index on the master server Force merge on the master server Replicate after optimize is done Solr Master Solr Slave Solr Slave Solr Slave pull after optimize
  • 84. 01 Force Merge – SolrCloud (Solr 7 – pull replicas) Create collection Force merge Solr will do the rest Solr Solr Solr Solr Primary 1 Primary 2 Pull Replica 2 Pull Replica 1
  • 85. 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Solr Solr Solr Solr
  • 86. 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Solr Solr Solr Solr Primary 1 Primary 2
  • 87. 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Index Solr Solr Solr Solr Primary 1 Primary 2 index Documents Documents Documents Documents
  • 88. 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Index Force merge Solr Solr Solr Solr Primary 1 Primary 2 optimize
  • 89. 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Index Force merge Create replicas Solr Solr Solr Solr Primary 1 Primary 2 Replica 2 Replica 1
  • 90. 01 Specialized Merge Policy Example – Sorting Sorting Merge Policy Factory Example <mergePolicyFactory  class="org.apache.solr.index.SortingMergePolicyFactory">        <str  name="sort">timestamp  desc</str>              <str  name="wrapper.prefix">inner</str>                <str  name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>        <int  name="inner.maxMergeAtOnce">10</int>                            <int  name="inner.segmentsPerTier">10</int>                            <double  name="inner.noCFSRatio">0.1</double>                    </mergePolicyFactory>  
  • 91. 01 Specialized Merge Policy Example – Sorting Sorting Merge Policy Factory Example <mergePolicyFactory  class="org.apache.solr.index.SortingMergePolicyFactory">        <str  name="sort">timestamp  desc</str>              <str  name="wrapper.prefix">inner</str>                <str  name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>        <int  name="inner.maxMergeAtOnce">10</int>                            <int  name="inner.segmentsPerTier">10</int>                            <double  name="inner.noCFSRatio">0.1</double>                    </mergePolicyFactory>   Pre-sorts data during merge for: - faster range queries - faster data retrieval - possibility of early query termination - convenient for time based data
  • 92. 01 http://sematext.com/jobs You love like we do? You want to work with ? Want to work with open source? You want to do fun stuff?
  • 93. 01 Get in touch Rafał rafal.kuc@sematext.com @kucrafal http://sematext.com @sematext http://sematext.com/jobs Come talk to us at the booth