SlideShare a Scribd company logo
1 of 94
Optimize Is (Not) Bad For You
Deep Dive Into The Segment Merge Abyss
Rafał Kuć
Sematext Group, Inc.
Agenda
• Segments – where, what & how
• Writing segments
• Modifying segments
• Segment merging – what, where, how, why
• Force merging
• Force merging & SolrCloud
• Performance considerations
• Specialized merge policies
https://github.com/sematext/lr/tree/master/2017/optimize
3
01
Sematext & I
cloud
metrics
logs
&
4
01
Solr Collection Architecture
Zookeeper
5
01
Solr Collection Architecture
Zookeeper
SOLR
SOLR
SOLR
SOLR
6
01
Solr Collection Architecture
Zookeeper
SOLR
shard shard
SOLR
shard shard
SOLR
shard shard
SOLR
shard shard
7
01
Solr Shard Architecture
TLOG
8
01
Solr Shard Architecture
TLOG
Segment Segment Segment
Segment
9
01
Lucene Segment
Segment Info
Field Names
Stored Field Values
Point Values
Term Dictionary
Term Frequency
Term Proximity
Normalization
Per Document Vals
Live Documents
1
01
Inside the Segment – Term Dictionary
TERM DOCID
lucene <1>, <2>
revolution <1>, <2>
washington <1>
boston <2>
_1.tim
Doc1 Title: Lucene Revolution Washington, City: Washington D.C
Doc2 Title: Lucene Revolution Boston, City: Boston
_1.tip
1
01
Inside the Segment – Doc Values
Doc1 Title: Lucene Revolution Washington, City: Washington D.C
Doc2 Title: Lucene Revolution Boston, City: Boston
DOCID FIELD VALUE
1 Title Lucene Revolution Washington
1 City Washington D.C.
2 Title Lucene Revolution Boston
2 City Boston
_1.dvd
_1.dvm
1
01
Inside the Segment – Stored Fields
Doc1 Title: Lucene Revolution Washington, City: Washington D.C
Doc2 Title: Lucene Revolution Boston, City: Boston
DOCID VALUE
1 Title: Lucene Revolution Washington
City: Washington D.C
2 Title: Lucene Revolution Boston
City: Boston
_1.fdx
_1.fdt
1
01
Inside the Segment – Compound File System
_1.fdt
_1.fdx
_1.fnm
_1.nvd
_1.nvm
_1.si
_1.Lucene50_0.doc
_1.Lucene50_0.pos
_1.Lucene50_0.tim
_1.Lucene50_0.tip
_1.Lucene50_0.dvd
_1.Lucene50_0.dvm
1
01
Inside the Segment – Compound File System
_1.fdt
_1.fdx
_1.fnm
_1.nvd
_1.nvm
_1.si
_1.Lucene50_0.doc
_1.Lucene50_0.pos
_1.Lucene50_0.tim
_1.Lucene50_0.tip
_1.Lucene50_0.dvd
_1.Lucene50_0.dvm
1
01
Inside the Segment – Compound File System
_1.fdt
_1.fdx
_1.fnm
_1.nvd
_1.nvm
_1.si
_1.Lucene50_0.doc
_1.Lucene50_0.pos
_1.Lucene50_0.tim
_1.Lucene50_0.tip
_1.Lucene50_0.dvd
_1.Lucene50_0.dvm
_2.cfs
_2.cfe
1
01
Indexing
1
01
Indexing
1
01
Indexing
1
01
Indexing
level/tier
2
01
Indexing
2
01
Indexing
2
01
Indexing
2
01
Indexing
2
01
Indexing
2
01
Indexing
2
01
Indexing
2
01
Deletes
2
01
Deletes – After Merge
2
01
Atomic Updates
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"tags" : {
"add" : [ "solr" ]
}
}
]'
retrieve document
{
"id" : 3,
"tags" : [ "lucene" ],
"awesome" : true
}
3
01
Atomic Updates
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"tags" : {
"add" : [ "solr" ]
}
}
]'
{
"id" : 3,
"tags" : [ "lucene", "solr" ],
"awesome" : true
}
apply changes
3
01
Atomic Updates
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"tags" : {
"add" : [ "solr" ]
}
}
]'
{
"id" : 3,
"tags" : [ "lucene", "solr" ],
"awesome" : true
}
delete old document
3
01
Atomic Updates
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"tags" : {
"add" : [ "solr" ]
}
}
]'
{
"id" : 3,
"tags" : [ "lucene", "solr" ],
"awesome" : true
}
3
01
Atomic Updates – In Place
Works on top of numeric, doc values based fields
Fields need to be not indexed and not stored
Doesn’t require delete/index
Support only inc and set modifers
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"views" : {
"inc" : 100
}
}
]'
3
01
Atomic Updates – In Place
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"views" : {
"inc" : 100
}
}
]'
retrieve document
{
"id" : 3,
"tags" : [ "lucene", "solr" ],
"awesome" : true
}
3
01
Atomic Updates – In Place
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"views" : {
"inc" : 100
}
}
]'
{
"id" : 3,
"tags" : [ "lucene", "solr" ],
"awesome" : true,
"views" : 100
}
apply changes
3
01
Atomic Updates – In Place
$ curl -XPOST -H 'Content-Type: application/json'
'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[
{
"id" : "3",
"views" : {
"inc" : 100
}
}
]'
{
"id" : 3,
"tags" : [ "lucene", "solr" ],
"awesome" : true,
"views" : 100
}
update doc values
3
01
Search – Importance of Segments
Immutable – write once read many
3
01
Search – Importance of Segments
Immutable – write once read many
More segments – slower search speed
3
01
Search – Importance of Segments
Immutable – write once read many
More segments – slower search speed
Fewer segments – faster searches
4
01
Search – Importance of Segments
Immutable – write once read many
More segments – slower search speed
Fewer segments – faster searches
Fewer segments – smaller shard size
4
01
Search – Importance of Segments
Immutable – write once read many
More segments – slower search speed
Fewer segments – faster searches
Fewer segments – smaller shard size
Rapid segment changes – worse I/O cache usage
4
01
Taking Control
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
4
01
Taking Control
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
4
01
Taking Control
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
Segment Warmer
<mergedSegmentWarmer
class="org.apache.lucene.index.SimpleMergedSegmentWarmer" />
4
01
Taking Control – Default Indexing Throughput
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
4
01
Taking Control – Default Indexing Throughput
throughput < 5k/sec @ ~14GB
4
01
Taking Control – Max Merged Segment Size
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Lower higher indexing throughput – smaller segments
Higher better search latency (depends) – more merges
4
01
Taking Control – Lowering Max Merged Size
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">512</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
4
01
Taking Control – Lowering Max Segment Size
throughput < 5k/sec @ ~15.5GB
11% throughput increase
5
01
Taking Control – Merge At Once
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Lower better search latency (depends)
Higher higher indexing throughput
5
01
Taking Control – Lowering Merge At Once
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">2</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
5
01
Taking Control – Lowering Merge At Once
throughput < 5k/sec @ ~13GB
8% throughput decrease
5
01
Taking Control – Merge At Once Explicit
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls number of segments merged at once during force merge
5
01
Taking Control – Segments Per Tier
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Lower value means more merging, but less segments
Along with maxMergeAtOnce can smoothen I/O spikes
For better indexing throughput set maxMergeAtOnce <
segmentsPerTier
5
01
Taking Control – Combined Together
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">30</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">30</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">512</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
5
01
Taking Control – Combined Together
throughput < 5k/sec @ ~15GB
but look at read difference
5
01
Taking Control – Default vs Combined Read/Write
default settings
5
01
Taking Control – Default vs Combined Read/Write
default settings combined changes settings
5
01
Taking Control – Reclaim Deletes Weight
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls importance of merging segments with deleted documents
Increase to put priority on merging segments with deleted documents
6
01
Taking Control – No CFS Ratio
Merge Policy Factory
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="maxMergeAtOnceExplicit">30</int>
<int name="segmentsPerTier">10</int>
<int name="floorSegmentMB">2048</int>
<int name="maxMergedSegmentMB">5120</int>
<double name="noCFSRatio">0.1</double>
<int name="maxCFSSegmentSizeMB">2048</int>
<double name="reclaimDeletesWeight">2.0</double>
<double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>
Controls compound file system segments ratio
To completely disable CFS set to 0.0
6
01
Taking Control – Merge Scheduler
Controls maximum number of concurrent merges
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
<int name="maxMergeCount">4</int>
<int name="maxThreadCount">4</int>
</mergeScheduler>
6
01
Taking Control – Merge Scheduler
Controls number of threads dedicated to merging
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
<int name="maxMergeCount">4</int>
<int name="maxThreadCount">4</int>
</mergeScheduler>
6
01
Taking Control – Merge Scheduler
Controls number of threads dedicated to merging
For spinning drives set maxThreadCount to 1
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
<int name="maxMergeCount">4</int>
<int name="maxThreadCount">4</int>
</mergeScheduler>
6
01
Taking Control – Merge Scheduler
Controls number of threads dedicated to merging
For spinning drives set maxThreadCount to 1
For SSD set maxThreadCount to min(4, #CPUs / 2)
Merge Scheduler
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
<int name="maxMergeCount">4</int>
<int name="maxThreadCount">4</int>
</mergeScheduler>
6
01
Optimize aka Force Merge
Forces segment merge – usually very expensive
6
01
Optimize aka Force Merge
Forces segment merge – usually very expensive
Desired number of segments can be specified
6
01
Optimize aka Force Merge
Forces segment merge – usually very expensive
Desired number of segments can be specified
Done on all shards at the same time (by default)
6
01
Optimize aka Force Merge
Forces segment merge – usually very expensive
Desired number of segments can be specified
Done on all shards at the same time (by default)
Can be very bad or very good – depending on the use case
6
01
Optimize aka Force Merge
Forces segment merge – usually very expensive
Desired number of segments can be specified
Done on all shards at the same time (by default)
Can be very bad or very good – depending on the use case
$ curl
'http://solr:8983/solr/lr/update?optimize=true&numSegments=1&waitFlush=false'
7
01
Force Merge – The Good
Improves search speed (fewer segments)
7
01
Force Merge – The Good
Improves search speed (fewer segments)
Removes deleted documents
7
01
Force Merge – The Good
Improves search speed (fewer segments)
Removes deleted documents
Shrinks the index by pruning duplicated data
7
01
Force Merge – The Good
Improves search speed (fewer segments)
Removes deleted documents
Shrinks the index by pruning duplicated data
Reduces number of used files
7
01
Force Merge – The Bad
Invalidates operating system I/O cache
7
01
Force Merge – The Bad
Invalidates operating system I/O cache
Very expensive to perform – rewrites all segments
7
01
Force Merge – The Bad
Invalidates operating system I/O cache
Very expensive to perform – rewrites all segments
Not efficient on changing data
7
01
Force Merge – The Bad
Invalidates operating system I/O cache
Very expensive to perform – rewrites all segments
Not efficient on changing data
May cause performance issues
7
01
Force Merge – The Bad
Invalidates operating system I/O cache
Very expensive to perform – rewrites all segments
Not efficient on changing data
May cause performance issues
Will cause temporary increase of disk usage (up to 3x)
7
01
Force Merge – SolrCloud Performance Example
8
01
Force Merge – SolrCloud Performance Example
8
01
Force Merge – Legacy
Index on the master server
Solr Master
Solr Slave
Solr Slave
Solr Slave
index
Documents
8
01
Force Merge – Legacy
Index on the master server
Force merge on the master server
Solr Master
Solr Slave
Solr Slave
Solr Slave
force merge
8
01
Force Merge – Legacy
Index on the master server
Force merge on the master server
Replicate after optimize is done
Solr Master
Solr Slave
Solr Slave
Solr Slave
pull after optimize
8
01
Force Merge – SolrCloud (Solr 7 – pull replicas)
Create collection
Force merge
Solr will do the rest
Solr Solr
Solr Solr
Primary 1
Primary 2 Pull Replica 2
Pull Replica 1
8
01
Force Merge – SolrCloud (NRT, pre 7.0)
Ask yourself if you really need force merge
Solr Solr
Solr Solr
8
01
Force Merge – SolrCloud (NRT replicas, pre 7.0)
Ask yourself if you really need force merge
Create collection on part of the nodes
Solr Solr
Solr Solr
Primary 1
Primary 2
8
01
Force Merge – SolrCloud (NRT replicas, pre 7.0)
Ask yourself if you really need force merge
Create collection on part of the nodes
Index
Solr Solr
Solr Solr
Primary 1
Primary 2
DocumentsDocuments
Documents
Documents
8
01
Force Merge – SolrCloud (NRT replicas, pre 7.0)
Ask yourself if you really need force merge
Create collection on part of the nodes
Index
Force merge
Solr Solr
Solr Solr
Primary 1
Primary 2optimize
8
01
Force Merge – SolrCloud (NRT replicas, pre 7.0)
Ask yourself if you really need force merge
Create collection on part of the nodes
Index
Force merge
Create replicas
Solr Solr
Solr Solr
Primary 1
Primary 2 Replica 2
Replica 1
9
01
Specialized Merge Policy Example – Sorting
Sorting Merge Policy Factory Example
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
<str name="sort">timestamp desc</str>
<str name="wrapper.prefix">inner</str>
<str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
<int name="inner.maxMergeAtOnce">10</int>
<int name="inner.segmentsPerTier">10</int>
<double name="inner.noCFSRatio">0.1</double>
</mergePolicyFactory>
9
01
Specialized Merge Policy Example – Sorting
Sorting Merge Policy Factory Example
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
<str name="sort">timestamp desc</str>
<str name="wrapper.prefix">inner</str>
<str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
<int name="inner.maxMergeAtOnce">10</int>
<int name="inner.segmentsPerTier">10</int>
<double name="inner.noCFSRatio">0.1</double>
</mergePolicyFactory>
Pre-sorts data during merge for:
- faster range queries
- faster data retrieval
- possibility of early query termination
- convenient for time based data
9
01
http://sematext.com/jobs
You love like we do?
You want to work with ?
Want to work with open source?
You want to do fun stuff?
9
01
Get in touch
Rafał
rafal.kuc@sematext.com
@kucrafal
http://sematext.com
@sematext http://sematext.com/jobs
Come talk to us
at the booth
Thank You

More Related Content

What's hot

Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps
 
aclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundaclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundDirkjanMollema
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Codemotion
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestratorYoungHeon (Roy) Kim
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Luceneotisg
 
Red Team Revenge - Attacking Microsoft ATA
Red Team Revenge - Attacking Microsoft ATARed Team Revenge - Attacking Microsoft ATA
Red Team Revenge - Attacking Microsoft ATANikhil Mittal
 
MindMap - Forensics Windows Registry Cheat Sheet
MindMap - Forensics Windows Registry Cheat SheetMindMap - Forensics Windows Registry Cheat Sheet
MindMap - Forensics Windows Registry Cheat SheetJuan F. Padilla
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1Maruf Hassan
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Upfoundsearch
 
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Markus Michalewicz
 
Dawid Weiss- Finite state automata in lucene
 Dawid Weiss- Finite state automata in lucene Dawid Weiss- Finite state automata in lucene
Dawid Weiss- Finite state automata in luceneLucidworks (Archived)
 
"It can always get worse!" – Lessons Learned in over 20 years working with Or...
"It can always get worse!" – Lessons Learned in over 20 years working with Or..."It can always get worse!" – Lessons Learned in over 20 years working with Or...
"It can always get worse!" – Lessons Learned in over 20 years working with Or...Markus Michalewicz
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningAdrian Cockcroft
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchRuslan Zavacky
 

What's hot (20)

Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
 
aclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundaclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHound
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestrator
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Lucene
 
Red Team Revenge - Attacking Microsoft ATA
Red Team Revenge - Attacking Microsoft ATARed Team Revenge - Attacking Microsoft ATA
Red Team Revenge - Attacking Microsoft ATA
 
MindMap - Forensics Windows Registry Cheat Sheet
MindMap - Forensics Windows Registry Cheat SheetMindMap - Forensics Windows Registry Cheat Sheet
MindMap - Forensics Windows Registry Cheat Sheet
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Elk
Elk Elk
Elk
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
 
Blacklist3r
Blacklist3rBlacklist3r
Blacklist3r
 
Dawid Weiss- Finite state automata in lucene
 Dawid Weiss- Finite state automata in lucene Dawid Weiss- Finite state automata in lucene
Dawid Weiss- Finite state automata in lucene
 
"It can always get worse!" – Lessons Learned in over 20 years working with Or...
"It can always get worse!" – Lessons Learned in over 20 years working with Or..."It can always get worse!" – Lessons Learned in over 20 years working with Or...
"It can always get worse!" – Lessons Learned in over 20 years working with Or...
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and Tuning
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 

Viewers also liked

Effective Hive Queries
Effective Hive QueriesEffective Hive Queries
Effective Hive QueriesQubole
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Shalin Shekhar Mangar
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySematext Group, Inc.
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 

Viewers also liked (6)

Effective Hive Queries
Effective Hive QueriesEffective Hive Queries
Effective Hive Queries
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 

Similar to Solr Search Engine: Optimize Is (Not) Bad for You

Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.
Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.
Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.Lucidworks
 
04 data accesstechnologies
04 data accesstechnologies04 data accesstechnologies
04 data accesstechnologiesBat Programmer
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBertrand Delacretaz
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupSease
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdutionXuan-Chao Huang
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupJustin Borgman
 
Letting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentLetting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentJay Luker
 
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...Amazon Web Services
 
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...Amazon Web Services
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineDavid Keener
 
Rapid prototyping search applications with solr
Rapid prototyping search applications with solrRapid prototyping search applications with solr
Rapid prototyping search applications with solrLucidworks (Archived)
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL AzureIke Ellis
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedDainius Jocas
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational DatabasesAmazon Web Services
 

Similar to Solr Search Engine: Optimize Is (Not) Bad for You (20)

Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.
Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.
Optimize Is (Not) Bad For You - Rafał Kuć, Sematext Group, Inc.
 
04 data accesstechnologies
04 data accesstechnologies04 data accesstechnologies
04 data accesstechnologies
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdution
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
 
Letting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentLetting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search Component
 
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
 
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search Engine
 
Rapid prototyping search applications with solr
Rapid prototyping search applications with solrRapid prototyping search applications with solr
Rapid prototyping search applications with solr
 
Log Analysis At Scale
Log Analysis At ScaleLog Analysis At Scale
Log Analysis At Scale
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL Azure
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at Vinted
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational Databases
 

More from Sematext Group, Inc.

Tweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedTweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedSematext Group, Inc.
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsSematext Group, Inc.
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?Sematext Group, Inc.
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization Sematext Group, Inc.
 
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaBuilding Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaSematext Group, Inc.
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveSematext Group, Inc.
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Sematext Group, Inc.
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...Sematext Group, Inc.
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleSematext Group, Inc.
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsSematext Group, Inc.
 

More from Sematext Group, Inc. (20)

Tweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedTweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities Explained
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaBuilding Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
 
Elasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep diveElasticsearch for Logs & Metrics - a deep dive
Elasticsearch for Logs & Metrics - a deep dive
 
Tuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for LogsTuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for Logs
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Docker Logging Webinar
Docker Logging  WebinarDocker Logging  Webinar
Docker Logging Webinar
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Solr Search Engine: Optimize Is (Not) Bad for You

  • 1. Optimize Is (Not) Bad For You Deep Dive Into The Segment Merge Abyss Rafał Kuć Sematext Group, Inc.
  • 2. Agenda • Segments – where, what & how • Writing segments • Modifying segments • Segment merging – what, where, how, why • Force merging • Force merging & SolrCloud • Performance considerations • Specialized merge policies https://github.com/sematext/lr/tree/master/2017/optimize
  • 6. 6 01 Solr Collection Architecture Zookeeper SOLR shard shard SOLR shard shard SOLR shard shard SOLR shard shard
  • 9. 9 01 Lucene Segment Segment Info Field Names Stored Field Values Point Values Term Dictionary Term Frequency Term Proximity Normalization Per Document Vals Live Documents
  • 10. 1 01 Inside the Segment – Term Dictionary TERM DOCID lucene <1>, <2> revolution <1>, <2> washington <1> boston <2> _1.tim Doc1 Title: Lucene Revolution Washington, City: Washington D.C Doc2 Title: Lucene Revolution Boston, City: Boston _1.tip
  • 11. 1 01 Inside the Segment – Doc Values Doc1 Title: Lucene Revolution Washington, City: Washington D.C Doc2 Title: Lucene Revolution Boston, City: Boston DOCID FIELD VALUE 1 Title Lucene Revolution Washington 1 City Washington D.C. 2 Title Lucene Revolution Boston 2 City Boston _1.dvd _1.dvm
  • 12. 1 01 Inside the Segment – Stored Fields Doc1 Title: Lucene Revolution Washington, City: Washington D.C Doc2 Title: Lucene Revolution Boston, City: Boston DOCID VALUE 1 Title: Lucene Revolution Washington City: Washington D.C 2 Title: Lucene Revolution Boston City: Boston _1.fdx _1.fdt
  • 13. 1 01 Inside the Segment – Compound File System _1.fdt _1.fdx _1.fnm _1.nvd _1.nvm _1.si _1.Lucene50_0.doc _1.Lucene50_0.pos _1.Lucene50_0.tim _1.Lucene50_0.tip _1.Lucene50_0.dvd _1.Lucene50_0.dvm
  • 14. 1 01 Inside the Segment – Compound File System _1.fdt _1.fdx _1.fnm _1.nvd _1.nvm _1.si _1.Lucene50_0.doc _1.Lucene50_0.pos _1.Lucene50_0.tim _1.Lucene50_0.tip _1.Lucene50_0.dvd _1.Lucene50_0.dvm
  • 15. 1 01 Inside the Segment – Compound File System _1.fdt _1.fdx _1.fnm _1.nvd _1.nvm _1.si _1.Lucene50_0.doc _1.Lucene50_0.pos _1.Lucene50_0.tim _1.Lucene50_0.tip _1.Lucene50_0.dvd _1.Lucene50_0.dvm _2.cfs _2.cfe
  • 29. 2 01 Atomic Updates $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "tags" : { "add" : [ "solr" ] } } ]' retrieve document { "id" : 3, "tags" : [ "lucene" ], "awesome" : true }
  • 30. 3 01 Atomic Updates $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "tags" : { "add" : [ "solr" ] } } ]' { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true } apply changes
  • 31. 3 01 Atomic Updates $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "tags" : { "add" : [ "solr" ] } } ]' { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true } delete old document
  • 32. 3 01 Atomic Updates $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "tags" : { "add" : [ "solr" ] } } ]' { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true }
  • 33. 3 01 Atomic Updates – In Place Works on top of numeric, doc values based fields Fields need to be not indexed and not stored Doesn’t require delete/index Support only inc and set modifers $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "views" : { "inc" : 100 } } ]'
  • 34. 3 01 Atomic Updates – In Place $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "views" : { "inc" : 100 } } ]' retrieve document { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true }
  • 35. 3 01 Atomic Updates – In Place $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "views" : { "inc" : 100 } } ]' { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true, "views" : 100 } apply changes
  • 36. 3 01 Atomic Updates – In Place $ curl -XPOST -H 'Content-Type: application/json' 'http://localhost:8983/solr/lr/update?commit=true' --data-binary '[ { "id" : "3", "views" : { "inc" : 100 } } ]' { "id" : 3, "tags" : [ "lucene", "solr" ], "awesome" : true, "views" : 100 } update doc values
  • 37. 3 01 Search – Importance of Segments Immutable – write once read many
  • 38. 3 01 Search – Importance of Segments Immutable – write once read many More segments – slower search speed
  • 39. 3 01 Search – Importance of Segments Immutable – write once read many More segments – slower search speed Fewer segments – faster searches
  • 40. 4 01 Search – Importance of Segments Immutable – write once read many More segments – slower search speed Fewer segments – faster searches Fewer segments – smaller shard size
  • 41. 4 01 Search – Importance of Segments Immutable – write once read many More segments – slower search speed Fewer segments – faster searches Fewer segments – smaller shard size Rapid segment changes – worse I/O cache usage
  • 42. 4 01 Taking Control Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory>
  • 43. 4 01 Taking Control Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Merge Scheduler <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
  • 44. 4 01 Taking Control Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Merge Scheduler <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" /> Segment Warmer <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer" />
  • 45. 4 01 Taking Control – Default Indexing Throughput Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory>
  • 46. 4 01 Taking Control – Default Indexing Throughput throughput < 5k/sec @ ~14GB
  • 47. 4 01 Taking Control – Max Merged Segment Size Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Lower higher indexing throughput – smaller segments Higher better search latency (depends) – more merges
  • 48. 4 01 Taking Control – Lowering Max Merged Size Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">512</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory>
  • 49. 4 01 Taking Control – Lowering Max Segment Size throughput < 5k/sec @ ~15.5GB 11% throughput increase
  • 50. 5 01 Taking Control – Merge At Once Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Lower better search latency (depends) Higher higher indexing throughput
  • 51. 5 01 Taking Control – Lowering Merge At Once Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">2</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory>
  • 52. 5 01 Taking Control – Lowering Merge At Once throughput < 5k/sec @ ~13GB 8% throughput decrease
  • 53. 5 01 Taking Control – Merge At Once Explicit Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Controls number of segments merged at once during force merge
  • 54. 5 01 Taking Control – Segments Per Tier Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Lower value means more merging, but less segments Along with maxMergeAtOnce can smoothen I/O spikes For better indexing throughput set maxMergeAtOnce < segmentsPerTier
  • 55. 5 01 Taking Control – Combined Together Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">30</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">30</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">512</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory>
  • 56. 5 01 Taking Control – Combined Together throughput < 5k/sec @ ~15GB but look at read difference
  • 57. 5 01 Taking Control – Default vs Combined Read/Write default settings
  • 58. 5 01 Taking Control – Default vs Combined Read/Write default settings combined changes settings
  • 59. 5 01 Taking Control – Reclaim Deletes Weight Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Controls importance of merging segments with deleted documents Increase to put priority on merging segments with deleted documents
  • 60. 6 01 Taking Control – No CFS Ratio Merge Policy Factory <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> <int name="maxMergeAtOnce">10</int> <int name="maxMergeAtOnceExplicit">30</int> <int name="segmentsPerTier">10</int> <int name="floorSegmentMB">2048</int> <int name="maxMergedSegmentMB">5120</int> <double name="noCFSRatio">0.1</double> <int name="maxCFSSegmentSizeMB">2048</int> <double name="reclaimDeletesWeight">2.0</double> <double name="forceMergeDeletesPctAllowed">10.0</double> </mergePolicyFactory> Controls compound file system segments ratio To completely disable CFS set to 0.0
  • 61. 6 01 Taking Control – Merge Scheduler Controls maximum number of concurrent merges Merge Scheduler <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"> <int name="maxMergeCount">4</int> <int name="maxThreadCount">4</int> </mergeScheduler>
  • 62. 6 01 Taking Control – Merge Scheduler Controls number of threads dedicated to merging Merge Scheduler <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"> <int name="maxMergeCount">4</int> <int name="maxThreadCount">4</int> </mergeScheduler>
  • 63. 6 01 Taking Control – Merge Scheduler Controls number of threads dedicated to merging For spinning drives set maxThreadCount to 1 Merge Scheduler <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"> <int name="maxMergeCount">4</int> <int name="maxThreadCount">4</int> </mergeScheduler>
  • 64. 6 01 Taking Control – Merge Scheduler Controls number of threads dedicated to merging For spinning drives set maxThreadCount to 1 For SSD set maxThreadCount to min(4, #CPUs / 2) Merge Scheduler <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"> <int name="maxMergeCount">4</int> <int name="maxThreadCount">4</int> </mergeScheduler>
  • 65. 6 01 Optimize aka Force Merge Forces segment merge – usually very expensive
  • 66. 6 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified
  • 67. 6 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified Done on all shards at the same time (by default)
  • 68. 6 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified Done on all shards at the same time (by default) Can be very bad or very good – depending on the use case
  • 69. 6 01 Optimize aka Force Merge Forces segment merge – usually very expensive Desired number of segments can be specified Done on all shards at the same time (by default) Can be very bad or very good – depending on the use case $ curl 'http://solr:8983/solr/lr/update?optimize=true&numSegments=1&waitFlush=false'
  • 70. 7 01 Force Merge – The Good Improves search speed (fewer segments)
  • 71. 7 01 Force Merge – The Good Improves search speed (fewer segments) Removes deleted documents
  • 72. 7 01 Force Merge – The Good Improves search speed (fewer segments) Removes deleted documents Shrinks the index by pruning duplicated data
  • 73. 7 01 Force Merge – The Good Improves search speed (fewer segments) Removes deleted documents Shrinks the index by pruning duplicated data Reduces number of used files
  • 74. 7 01 Force Merge – The Bad Invalidates operating system I/O cache
  • 75. 7 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments
  • 76. 7 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments Not efficient on changing data
  • 77. 7 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments Not efficient on changing data May cause performance issues
  • 78. 7 01 Force Merge – The Bad Invalidates operating system I/O cache Very expensive to perform – rewrites all segments Not efficient on changing data May cause performance issues Will cause temporary increase of disk usage (up to 3x)
  • 79. 7 01 Force Merge – SolrCloud Performance Example
  • 80. 8 01 Force Merge – SolrCloud Performance Example
  • 81. 8 01 Force Merge – Legacy Index on the master server Solr Master Solr Slave Solr Slave Solr Slave index Documents
  • 82. 8 01 Force Merge – Legacy Index on the master server Force merge on the master server Solr Master Solr Slave Solr Slave Solr Slave force merge
  • 83. 8 01 Force Merge – Legacy Index on the master server Force merge on the master server Replicate after optimize is done Solr Master Solr Slave Solr Slave Solr Slave pull after optimize
  • 84. 8 01 Force Merge – SolrCloud (Solr 7 – pull replicas) Create collection Force merge Solr will do the rest Solr Solr Solr Solr Primary 1 Primary 2 Pull Replica 2 Pull Replica 1
  • 85. 8 01 Force Merge – SolrCloud (NRT, pre 7.0) Ask yourself if you really need force merge Solr Solr Solr Solr
  • 86. 8 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Solr Solr Solr Solr Primary 1 Primary 2
  • 87. 8 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Index Solr Solr Solr Solr Primary 1 Primary 2 DocumentsDocuments Documents Documents
  • 88. 8 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Index Force merge Solr Solr Solr Solr Primary 1 Primary 2optimize
  • 89. 8 01 Force Merge – SolrCloud (NRT replicas, pre 7.0) Ask yourself if you really need force merge Create collection on part of the nodes Index Force merge Create replicas Solr Solr Solr Solr Primary 1 Primary 2 Replica 2 Replica 1
  • 90. 9 01 Specialized Merge Policy Example – Sorting Sorting Merge Policy Factory Example <mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory"> <str name="sort">timestamp desc</str> <str name="wrapper.prefix">inner</str> <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str> <int name="inner.maxMergeAtOnce">10</int> <int name="inner.segmentsPerTier">10</int> <double name="inner.noCFSRatio">0.1</double> </mergePolicyFactory>
  • 91. 9 01 Specialized Merge Policy Example – Sorting Sorting Merge Policy Factory Example <mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory"> <str name="sort">timestamp desc</str> <str name="wrapper.prefix">inner</str> <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str> <int name="inner.maxMergeAtOnce">10</int> <int name="inner.segmentsPerTier">10</int> <double name="inner.noCFSRatio">0.1</double> </mergePolicyFactory> Pre-sorts data during merge for: - faster range queries - faster data retrieval - possibility of early query termination - convenient for time based data
  • 92. 9 01 http://sematext.com/jobs You love like we do? You want to work with ? Want to work with open source? You want to do fun stuff?