9. 01
Lucene Segment
Segment Info
Field Names
Stored Field Values
Point Values
Term Dictionary
Term Frequency
Term Proximity
Normalization
Per-Document Values
Live Documents
10. 01
Inside the Segment – Term Dictionary
TERM          DOCID
lucene        <1>, <2>
revolution    <1>, <2>
washington    <1>
boston        <2>
Doc1 Title: Lucene Revolution Washington, City: Washington D.C.
Doc2 Title: Lucene Revolution Boston, City: Boston
Files: _1.tim, _1.tip
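The term-to-document mapping shown above can be sketched in a few lines. This is only an illustration of the idea, not how Lucene encodes the .tim/.tip files; the function name `build_term_dictionary` and the whitespace tokenizer are assumptions made for the example.

```python
from collections import defaultdict


def build_term_dictionary(docs):
    """Map each lowercased term to the sorted list of doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer (assumption); real analyzers do much more.
        for token in text.lower().split():
            postings[token].add(doc_id)
    # Keep terms in sorted order, the way a term dictionary stores them.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Title field of the two documents from the slide.
titles = {
    1: "Lucene Revolution Washington",
    2: "Lucene Revolution Boston",
}
term_dict = build_term_dictionary(titles)
for term, doc_ids in term_dict.items():
    print(term, doc_ids)
```

Looking up a term is then a single dictionary access, which is why term queries are cheap while "give me every field of document 1" is not.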
11. 01
Inside the Segment – Doc Values
Doc1 Title: Lucene Revolution Washington, City: Washington D.C
Doc2 Title: Lucene Revolution Boston, City: Boston
DOCID   FIELD   VALUE
1       Title   Lucene Revolution Washington
1       City    Washington D.C.
2       Title   Lucene Revolution Boston
2       City    Boston
Files: _1.dvd, _1.dvm
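Doc values invert the direction of the term dictionary: the lookup goes from document ID to value, one column per field. A minimal sketch of that layout follows; `build_doc_values` is a hypothetical helper, and the on-disk .dvd/.dvm encoding is not modeled.

```python
def build_doc_values(docs):
    """Pivot row-oriented documents into one column per field, keyed by doc ID."""
    columns = {}
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            columns.setdefault(field, {})[doc_id] = value
    return columns


# The two documents from the slide.
docs = {
    1: {"Title": "Lucene Revolution Washington", "City": "Washington D.C."},
    2: {"Title": "Lucene Revolution Boston", "City": "Boston"},
}
doc_values = build_doc_values(docs)

# Sorting by a field only touches that field's column.
city_column = doc_values["City"]
doc_ids_by_city = sorted(city_column, key=city_column.get)
print(doc_ids_by_city)
```

This column-per-field access pattern is what makes sorting, faceting and in-place numeric updates practical.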
33. 01
Atomic Updates – In Place
Works on numeric fields backed by doc values
Fields must be neither indexed nor stored
Doesn’t require deleting and reindexing the document
Supports only the inc and set modifiers
$ curl -XPOST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"views" : {
"inc" : 100
}
}
]'
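The same request body can be built programmatically. The sketch below only constructs the JSON payload and does not send it; `build_in_place_update` is a hypothetical helper name, and the check simply mirrors the slide's statement that only inc and set are supported.

```python
import json


def build_in_place_update(doc_id, field, modifier, value):
    """Return the JSON body for a single in-place atomic update."""
    # Per the slide, in-place updates support only these two modifiers.
    if modifier not in ("inc", "set"):
        raise ValueError("in-place updates support only 'inc' and 'set'")
    return json.dumps([{"id": doc_id, field: {modifier: value}}])


# Same update as the curl example: increment "views" of document 3 by 100.
payload = build_in_place_update("3", "views", "inc", 100)
print(payload)
```

The resulting string can be posted to the collection's update handler with any HTTP client, using the same Content-Type header as the curl call.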
52. 01
Taking Control – Lowering Merge At Once
throughput < 5k/sec @ ~13GB
8% throughput decrease
53. 01
Taking Control – Merge At Once Explicit
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls the maximum number of segments merged at once during a force merge
54. 01
Taking Control – Segments Per Tier
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
A lower value means more merging but fewer segments
Together with maxMergeAtOnce, it can smooth out I/O spikes
For better indexing throughput, set maxMergeAtOnce < segmentsPerTier
58. 01
Taking Control – Default vs Combined Read/Write
[Charts: default settings vs. combined changed settings]
59. 01
Taking Control – Reclaim Deletes Weight
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls importance of merging segments with deleted documents
Increase to put priority on merging segments with deleted documents
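To get a feel for what the weight does, here is a toy model, not Lucene's actual scoring code. It assumes a candidate merge has some base cost (lower is better) and that the cost is multiplied by the live-data fraction raised to reclaimDeletesWeight; the function name and the base cost of 1.0 are made up for illustration.

```python
def delete_adjusted_score(base_score, live_bytes, total_bytes, reclaim_deletes_weight):
    """Toy merge score: lower is better, so deletes make a merge more attractive."""
    live_ratio = live_bytes / total_bytes  # 1.0 means no deleted documents
    return base_score * live_ratio ** reclaim_deletes_weight


# A merge whose inputs are 50% deleted documents vs. one with no deletes.
half_deleted = delete_adjusted_score(1.0, 50, 100, reclaim_deletes_weight=2.0)
no_deletes = delete_adjusted_score(1.0, 100, 100, reclaim_deletes_weight=2.0)
higher_weight = delete_adjusted_score(1.0, 50, 100, reclaim_deletes_weight=3.0)
print(half_deleted, no_deletes, higher_weight)
```

In this model a larger weight widens the gap between merges that reclaim deletes and merges that do not, which is the behavior the slide describes.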
60. 01
Taking Control – No CFS Ratio
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">2048</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="reclaimDeletesWeight">2.0</double>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls which merged segments use the compound file format (CFS): segments larger than this fraction of the total index size stay non-compound
To completely disable CFS set to 0.0
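The decision can be sketched as a small predicate. This is a simplified reading of how noCFSRatio and maxCFSSegmentSizeMB interact, written as an assumption-laden illustration rather than a copy of the merge policy code; the function name and the MB-based arguments are hypothetical.

```python
def use_compound_file(segment_mb, index_mb, no_cfs_ratio=0.1, max_cfs_segment_mb=2048):
    """Decide whether a merged segment would be written as a compound file."""
    if no_cfs_ratio == 0.0:
        return False  # CFS completely disabled
    if segment_mb > max_cfs_segment_mb:
        return False  # too large for CFS regardless of the ratio
    if no_cfs_ratio >= 1.0:
        return True  # always use CFS for segments under the size cap
    # Otherwise only segments that are a small share of the index use CFS.
    return segment_mb <= no_cfs_ratio * index_mb


print(use_compound_file(50, 1000))   # small segment in a 1000 MB index
print(use_compound_file(200, 1000))  # 20% of the index, above the 0.1 ratio
```

With the values from the config above (0.1 and 2048), only segments that are at most 10% of the index and no larger than 2 GB end up in compound files.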
61. 01
Taking Control – Merge Scheduler
Controls maximum number of concurrent merges
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">4</int>
  <int name="maxThreadCount">4</int>
</mergeScheduler>
64. 01
Taking Control – Merge Scheduler
Controls number of threads dedicated to merging
For spinning drives set maxThreadCount to 1
For SSD set maxThreadCount to min(4, #CPUs / 2)
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">4</int>
  <int name="maxThreadCount">4</int>
</mergeScheduler>
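The two rules of thumb above can be written down as a small helper. The function name is hypothetical, and the lower bound of 1 for machines with a single CPU is an added assumption so the result is always a usable thread count.

```python
import os


def recommended_max_thread_count(spinning_disk, cpu_count=None):
    """Suggest maxThreadCount following the slide's rules of thumb."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if spinning_disk:
        return 1  # concurrent merges cause seek contention on spinning drives
    # SSD: min(4, #CPUs / 2), never below 1 (the floor is an assumption).
    return max(1, min(4, cpu_count // 2))


print(recommended_max_thread_count(spinning_disk=True, cpu_count=16))
print(recommended_max_thread_count(spinning_disk=False, cpu_count=16))
print(recommended_max_thread_count(spinning_disk=False, cpu_count=4))
```

Whatever value comes out, keep maxMergeCount at least as large as maxThreadCount in the scheduler configuration.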
69. 01
Optimize aka Force Merge
Forces segment merge – usually very expensive
Desired number of segments can be specified
Done on all shards at the same time (by default)
Can be very bad or very good – depending on the use case
$ curl 'http://solr:8983/solr/lr/update?optimize=true&maxSegments=1&waitFlush=false'
Force Merge – The Good
Improves search speed (fewer segments)
Removes deleted documents
Shrinks the index by pruning duplicated data
Reduces number of used files
78. 01
Force Merge – The Bad
Invalidates operating system I/O cache
Very expensive to perform – rewrites all segments
Not efficient on changing data
May cause performance issues
Will cause temporary increase of disk usage (up to 3x)
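A quick back-of-the-envelope check before forcing a merge can save a full disk. The sketch assumes "up to 3x" means peak usage of three times the current index size, so the extra space needed is twice the index size; the function name and that interpretation are assumptions.

```python
def force_merge_missing_space_gb(index_gb, free_gb, peak_factor=3.0):
    """Return how many GB are missing for a worst-case force merge (0.0 if none)."""
    # Extra space needed on top of the index that is already on disk.
    extra_needed_gb = index_gb * (peak_factor - 1.0)
    return max(0.0, extra_needed_gb - free_gb)


# A 100 GB index: 250 GB free is enough, 150 GB free is 50 GB short.
print(force_merge_missing_space_gb(100, 250))
print(force_merge_missing_space_gb(100, 150))
```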
83. 01
Force Merge – Legacy
Index on the master server
Force merge on the master server
Replicate after optimize is done
[Diagram: a Solr master and three Solr slaves; the slaves pull the index after optimize]
84. 01
Force Merge – SolrCloud (Solr 7 – pull replicas)
Create collection
Force merge
Solr will do the rest
[Diagram: four Solr nodes hosting Primary 1, Primary 2, Pull Replica 1 and Pull Replica 2]
89. 01
Force Merge – SolrCloud (NRT replicas, pre 7.0)
Ask yourself if you really need force merge
Create collection on part of the nodes
Index
Force merge
Create replicas
[Diagram: four Solr nodes hosting Primary 1, Primary 2, Replica 1 and Replica 2]
91. 01
Specialized Merge Policy Example – Sorting
Sorting Merge Policy Factory Example
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapper.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
  <double name="inner.noCFSRatio">0.1</double>
</mergePolicyFactory>
Pre-sorts data during merge for:
- faster range queries
- faster data retrieval
- possibility of early query termination
- convenient handling of time-based data
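Early termination is the least obvious of these benefits, so here is a sketch of the idea on a single segment. It assumes the segment is already sorted by timestamp descending and the query asks for the newest k matches; the function name and the sample documents are made up for the example.

```python
def top_k_newest(segment, k, matches):
    """Collect the k newest matching docs from a segment sorted by timestamp desc."""
    hits = []
    scanned = 0
    for doc in segment:
        scanned += 1
        if matches(doc):
            hits.append(doc)
            if len(hits) == k:
                break  # everything after this point is older, so stop early
    return hits, scanned


# Ten sample docs; even IDs are errors. Pre-sort like the merge policy would.
docs = [
    {"id": i, "timestamp": 1000 + i, "level": "error" if i % 2 == 0 else "info"}
    for i in range(10)
]
segment = sorted(docs, key=lambda d: d["timestamp"], reverse=True)

hits, scanned = top_k_newest(segment, 2, lambda d: d["level"] == "error")
print([d["id"] for d in hits], scanned)
```

Only four of the ten documents are visited here. With several segments, each one can be cut off the same way and the per-segment results merged afterwards.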