Writing a TSDB from scratch
performance optimizations
Roman Khavronenko | github.com/hagen1778
Roman Khavronenko
Co-founder of VictoriaMetrics
Software engineer with experience in distributed systems,
monitoring and high-performance services.
https://github.com/hagen1778
https://twitter.com/hagen1778
What is a metric?
What is a metric? Scrape target
> curl http://service:port/metrics
Collecting metrics
Delivering collected metrics to TSDB
Workload pattern for TSDB: writes
Workload pattern for TSDB: reads
Workload pattern for TSDB
● TSDBs process tremendous amounts of data
● They are usually write-heavy applications, optimized for ingestion
● Read load is usually much lower than write load
● Read queries are sporadic and unpredictable
How to deal with such workload?
System design oriented for time series data:
1. Log Structured Merge (LSM) data structure
2. Data for each column is stored separately
3. Append-only writes
How to deal with such workload?
And some more optimizations that are not specific to the storage design:
1. String interning
2. Function results caching
3. Concurrency limiting for CPU-bound operations
4. sync.Pool for CPU-bound operations
String interning
Store only one unique copy of each string in memory!
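String interning: naive implementation
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// internStringsMap holds one canonical copy of every interned string.
var internStringsMap = make(map[string]string)

// intern returns the canonical copy of s, storing s on first use.
func intern(s string) string {
	m := internStringsMap
	if v, ok := m[s]; ok {
		return v
	}
	m[s] = s
	return s
}

// ptr returns the address of the string's backing bytes.
// reflect.StringHeader is deprecated since Go 1.20 (see unsafe.StringData).
func ptr(s string) uintptr {
	return (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
}

func main() {
	s1 := intern("42")
	s2 := intern(fmt.Sprintf("%d", 42))
	fmt.Println(ptr(s1) == ptr(s2)) // true
}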
String interning: naive implementation
1. A plain Go map isn't safe for concurrent use
2. A map guarded by a lock doesn't scale with the number of CPUs
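A minimal sketch of the "map with lock" variant mentioned above may help to see where the contention comes from. The type lockedInterner and its methods are hypothetical names used only for illustration; they do not come from the slides.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedInterner is a hypothetical type for illustration:
// a plain map guarded by a single mutex.
type lockedInterner struct {
	mu sync.Mutex
	m  map[string]string
}

func newLockedInterner() *lockedInterner {
	return &lockedInterner{m: make(map[string]string)}
}

// intern returns the canonical copy of s. Every call takes the same
// mutex, so all goroutines serialize on it regardless of CPU count.
func (li *lockedInterner) intern(s string) string {
	li.mu.Lock()
	defer li.mu.Unlock()
	if v, ok := li.m[s]; ok {
		return v
	}
	li.m[s] = s
	return s
}

// size reports the number of distinct strings stored so far.
func (li *lockedInterner) size() int {
	li.mu.Lock()
	defer li.mu.Unlock()
	return len(li.m)
}

func main() {
	li := newLockedInterner()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				li.intern(fmt.Sprintf("label_%d", i%10))
			}
		}()
	}
	wg.Wait()
	fmt.Println(li.size()) // 10
}
```

This version is correct under concurrency, but even pure lookups of already-interned strings have to wait for the lock, which is the scaling problem named in point 2.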
String interning: sync.Map
var internStringsMap = sync.Map{}

func intern(s string) string {
	m := &internStringsMap
	// LoadOrStore returns the existing value if present; otherwise it stores s.
	interned, _ := m.LoadOrStore(s, s)
	return interned.(string)
}
String interning: sync.Map
sync.Map is optimized for two common use cases:
1. When the entry for a given key is only ever written once but read
many times
2. When multiple goroutines read, write, and overwrite entries for
disjoint sets of keys.
In these two cases, use of a Map reduces lock contention
and improves performance compared to a Go map paired with a
separate Mutex or RWMutex.
String interning: gotchas
1. Map will grow over time:
a. Rotate maps once in a while
b. Add TTL logic to purge cold entries
2. Sanity check of arguments:
a. At some point, someone will try to intern a string built from a byte slice without copying, or a substring of a larger string:
*(*string)(unsafe.Pointer(&b)) or str[:n]
The first aliases a buffer that may later be overwritten; the second keeps the whole parent string alive.
b. Make sure to clone received strings before storing them:
strings.Clone(s)
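The gotchas above can be sketched together as follows. This is only an illustration under stated assumptions: the names current, rotate and dataPtr are hypothetical, and swapping the map through an atomic pointer is one possible way to "rotate maps", not necessarily what VictoriaMetrics does. It needs Go 1.20 or newer for unsafe.StringData.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
	"unsafe"
)

// current holds the active interning map (hypothetical name).
var current atomic.Pointer[sync.Map]

func init() {
	current.Store(&sync.Map{})
}

// intern returns the canonical copy of s.
func intern(s string) string {
	m := current.Load()
	if v, ok := m.Load(s); ok {
		return v.(string)
	}
	// Clone before storing, so the map never retains memory owned by
	// the caller (a reused byte buffer or a large parent string).
	c := strings.Clone(s)
	v, _ := m.LoadOrStore(c, c)
	return v.(string)
}

// rotate drops all interned strings by swapping in a fresh map.
// Old entries are garbage collected once nothing references them.
func rotate() {
	current.Store(&sync.Map{})
}

// dataPtr returns the address of the string's backing bytes.
func dataPtr(s string) *byte {
	return unsafe.StringData(s)
}

func main() {
	parent := strings.Repeat("x", 1<<20) + "instance"
	sub := parent[len(parent)-8:] // shares parent's 1 MiB backing array

	got := intern(sub)
	fmt.Println(got)                                         // instance
	fmt.Println(dataPtr(got) == dataPtr(sub))                // false
	fmt.Println(dataPtr(got) == dataPtr(intern("instance"))) // true

	rotate() // e.g. called periodically from a time.Ticker loop
}
```

Rotation is the cheaper of the two growth controls: it forgets hot entries too, but they are re-interned on the next lookup. TTL-based purging keeps hot entries at the cost of tracking a last-access time per entry.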
String interning: summary
● We use string interning for storing time series metadata (aka labels).
● It helps to reduce memory usage during metadata parsing.
● Interning works best for read-intensive workloads with a limited
number of distinct strings and a high hit rate.
Function results caching
Function results caching: relabeling
Function results caching: caching Transformer
type Transformer struct {
	m             sync.Map
	transformFunc func(s string) string
}

func (t *Transformer) Transform(s string) string {
	v, ok := t.m.Load(s)
	if ok {
		// Fast path - the transformed s is found in the cache.
		return v.(string)
	}
	// Slow path - transform s and store it in the cache.
	sTransformed := t.transformFunc(s)
	t.m.Store(s, sTransformed)
	return sTransformed
}
Function results caching: example
// SanitizeName replaces chars unsupported by Prometheus
// in metric names and label names with _.
func SanitizeName(name string) string {
	return promSanitizer.Transform(name)
}

var promSanitizer = NewTransformer(func(s string) string {
	return unsupportedPromChars.ReplaceAllString(s, "_")
})
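The slides use NewTransformer and unsupportedPromChars without showing their definitions. The sketch below fills them in so the example can be run end to end. Both definitions are assumptions: the constructor is the simplest one that fits the struct, the regular expression is a guess at "anything outside the usual Prometheus name characters", and slowPathCalls is a hypothetical counter added only to make cache hits visible.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
	"sync/atomic"
)

// Transformer caches the results of transformFunc, as on the slides.
type Transformer struct {
	m             sync.Map
	transformFunc func(s string) string
}

// NewTransformer is an assumed constructor; the slides call it
// but do not show its body.
func NewTransformer(transformFunc func(s string) string) *Transformer {
	return &Transformer{transformFunc: transformFunc}
}

// Transform returns the cached result for s, computing it on a miss.
func (t *Transformer) Transform(s string) string {
	if v, ok := t.m.Load(s); ok {
		return v.(string)
	}
	sTransformed := t.transformFunc(s)
	t.m.Store(s, sTransformed)
	return sTransformed
}

// unsupportedPromChars is an assumed pattern, not taken from the slides.
var unsupportedPromChars = regexp.MustCompile(`[^a-zA-Z0-9_:]`)

// slowPathCalls counts how often the regex actually runs (illustration only).
var slowPathCalls atomic.Int64

var promSanitizer = NewTransformer(func(s string) string {
	slowPathCalls.Add(1)
	return unsupportedPromChars.ReplaceAllString(s, "_")
})

// SanitizeName replaces unsupported chars with _ using the cache.
func SanitizeName(name string) string {
	return promSanitizer.Transform(name)
}

func main() {
	fmt.Println(SanitizeName("http.requests-total")) // http_requests_total
	fmt.Println(SanitizeName("http.requests-total")) // http_requests_total
	fmt.Println(slowPathCalls.Load())                // 1
}
```

Two goroutines that miss on the same key at the same time may both run transformFunc. That is harmless for a pure function like this one, since both compute the same value.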
Function results caching: summary
● Helps to save CPU time at the cost of increased memory usage
● Works best for heavy usage of string transforms, regex matching, etc.
● And when the number of arguments and their variants is limited
● Doesn't work well when the number of transformations is unbounded or
inconsistent, as in query processing
Limiting concurrency for CPU-bound load
Volatile number of scrape targets
Spikes in ingestion stream
Limiting concurrency for CPU intensive operations
+ Makes system more stable and efficient
+ Helps to control memory usage on load spikes (which are expected in
monitoring)
+ Improves the processing speed of each goroutine by reducing the number
of context switches
- The downside is complexity: it is easy to make a mistake and end up with
a deadlock or inefficient resource utilization.
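Limited concurrency: workers
var concurrencyLimit = runtime.NumCPU()

func main() {
	workCh := make(chan work, concurrencyLimit*2)
	for i := 0; i < concurrencyLimit; i++ {
		go func() {
			// Each worker processes items until workCh is closed.
			for w := range workCh {
				processData(w)
			}
		}()
	}
	// Producers send work to workCh here; closing it stops the workers.
}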
Limited concurrency: workers
+ Workers could have scoped buffers, metrics, etc.
- Code becomes complicated: start and stop procedures for workers
- Additional synchronization to distribute work via channels
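The "start and stop procedures" mentioned above can be sketched as below. The work type, the processData body and the startWorkers helper are all hypothetical stand-ins chosen for this example; the slides do not define them.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// work is a hypothetical unit of work.
type work struct{ n int }

// processed accumulates results so the example has something to print.
var processed atomic.Int64

// processData stands in for the CPU-bound processing.
func processData(w work) {
	processed.Add(int64(w.n))
}

// startWorkers starts limit workers. It returns the channel to send
// work to and a stop function that closes the channel and waits until
// all queued work has been processed.
func startWorkers(limit int) (chan<- work, func()) {
	workCh := make(chan work, limit*2)
	var wg sync.WaitGroup
	for i := 0; i < limit; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range workCh {
				processData(w)
			}
		}()
	}
	stop := func() {
		close(workCh)
		wg.Wait()
	}
	return workCh, stop
}

func main() {
	workCh, stop := startWorkers(runtime.NumCPU())
	for i := 1; i <= 100; i++ {
		workCh <- work{n: i}
	}
	stop()
	fmt.Println(processed.Load()) // 5050
}
```

The extra moving parts (a WaitGroup, a close protocol, a rule that nobody sends after stop) are the complexity cost listed above.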
Limited concurrency: channel
var concurrencyLimitCh = make(chan struct{}, runtime.NumCPU())

// processData is CPU-bound and may allocate a lot of memory.
// We limit the number of concurrent calls to limit memory
// usage under high load without sacrificing performance.
func processData(src, dst []byte) error {
	// Acquire a slot; blocks while runtime.NumCPU() calls are in flight.
	concurrencyLimitCh <- struct{}{}
	defer func() {
		<-concurrencyLimitCh // release the slot
	}()
	// heavy processing...
	return nil
}
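To check that the channel really caps the number of concurrent calls, the pattern can be exercised as in the sketch below. The fixed limit of 4, the inFlight and maxInFlight counters, the run helper and the time.Sleep stand-in for "heavy processing" are all assumptions made for the demo; the slides size the channel with runtime.NumCPU().

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limit is fixed here to keep the demo reproducible.
const limit = 4

var concurrencyLimitCh = make(chan struct{}, limit)

var (
	inFlight    atomic.Int64 // calls currently inside the limited section
	maxInFlight atomic.Int64 // highest value of inFlight observed
)

// processData acquires a slot, does the work, then releases the slot.
func processData(src []byte) int {
	concurrencyLimitCh <- struct{}{}
	defer func() { <-concurrencyLimitCh }()

	n := inFlight.Add(1)
	defer inFlight.Add(-1) // runs before the slot is released
	for {
		old := maxInFlight.Load()
		if n <= old || maxInFlight.CompareAndSwap(old, n) {
			break
		}
	}
	time.Sleep(time.Millisecond) // stand-in for heavy processing
	return len(src)
}

// run starts the given number of concurrent callers and returns the
// peak number of calls that were in the limited section at once.
func run(callers int) int64 {
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			processData([]byte("sample"))
		}()
	}
	wg.Wait()
	return maxInFlight.Load()
}

func main() {
	peak := run(64)
	fmt.Println(peak <= limit) // true
}
```

Compared with the worker pool, callers keep their ordinary function-call shape, and there is nothing to start or stop.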
Limited concurrency: summary
● Works best for CPU-bound operations
● Bounds resource usage: work beyond the limit waits its turn instead of
wasting CPU on context switches
● Helps to prevent excessive memory usage during load spikes
● Do not apply limiting to IO-bound (disk, network) operations
sync.Pool for CPU bound operations
sync.Pool is widely used in VictoriaMetrics (VM)
grep -r "sync.Pool" ./app ./lib | wc -l
118
grep -r "bytesutil.ByteBufferPool" ./app ./lib | wc -l
34
sync.Pool for CPU bound operations in one thread
● All processed on a single CPU core
● No object stealing
● Lower number of objects allocated, better pool utilization
● Lower GC pressure
sync.Pool for synchronous processing
● Object is retrieved, used and released by different goroutines
● High chance that goroutines are scheduled onto different threads
● High chance of object stealing
sync.Pool for IO bound operations
● An object retrieved from sync.Pool is held for the duration of an IO operation
● IO operations are slow and sporadic
● So sync.Pool can allocate a large number of objects, resulting in uncontrolled
memory usage
● Higher pressure on GC
sync.Pool - lib/bytesbuffer
type ByteBufferPool struct {
	p sync.Pool
}

// Verify ByteBuffer implements the given interfaces.
var (
	_ io.Writer           = &ByteBuffer{}
	_ fs.MustReadAtCloser = &ByteBuffer{}
	_ io.ReaderFrom       = &ByteBuffer{}
)
sync.Pool - lib/bytesbuffer
func (bbp *ByteBufferPool) Get() *ByteBuffer {
	bbv := bbp.p.Get()
	if bbv == nil {
		return &ByteBuffer{}
	}
	return bbv.(*ByteBuffer)
}

func (bbp *ByteBufferPool) Put(bb *ByteBuffer) {
	bb.Reset()
	bbp.p.Put(bb)
}
sync.Pool - lib/bytesbuffer
bb := bbPool.Get() // acquire from pool
bb.B, err = DecompressZSTD(bb.B[:0], src)
if err != nil {
	return nil, fmt.Errorf("cannot decompress: %w", err)
}
// unmarshal from buffer to dst
dst, err = unmarshalInt64NearestDelta(dst, bb.B)
bbPool.Put(bb) // release to pool
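The fragments above depend on a ByteBuffer type that the slides do not show. A self-contained sketch of the same acquire, use, release cycle might look like this. The ByteBuffer fields and methods and the encode function are assumptions made for the example, not the VictoriaMetrics implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// ByteBuffer is a minimal stand-in: a reusable byte slice.
type ByteBuffer struct {
	B []byte
}

// Reset empties the buffer but keeps its capacity for reuse.
func (bb *ByteBuffer) Reset() {
	bb.B = bb.B[:0]
}

// Write appends p to the buffer, implementing io.Writer.
func (bb *ByteBuffer) Write(p []byte) (int, error) {
	bb.B = append(bb.B, p...)
	return len(p), nil
}

// ByteBufferPool wraps sync.Pool with typed Get and Put, as on the slides.
type ByteBufferPool struct {
	p sync.Pool
}

func (bbp *ByteBufferPool) Get() *ByteBuffer {
	bbv := bbp.p.Get()
	if bbv == nil {
		return &ByteBuffer{}
	}
	return bbv.(*ByteBuffer)
}

func (bbp *ByteBufferPool) Put(bb *ByteBuffer) {
	bb.Reset()
	bbp.p.Put(bb)
}

var bbPool ByteBufferPool

// encode is a hypothetical CPU-bound function: it builds its result in
// a pooled scratch buffer and copies it to dst before releasing the buffer.
func encode(dst []byte, items []string) []byte {
	bb := bbPool.Get()
	defer bbPool.Put(bb)
	for _, it := range items {
		bb.Write([]byte(it))
		bb.Write([]byte{'\n'})
	}
	return append(dst, bb.B...)
}

func main() {
	out := encode(nil, []string{"a", "b"})
	fmt.Printf("%q\n", out) // "a\nb\n"
}
```

The key rule is that nothing may keep a reference to bb.B after Put, which is why encode copies the bytes into dst instead of returning the pooled slice.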
Bytebuffer pool issues
1. sync.Pool assumes all entries it contains are "the same"
2. In the real world, byte buffers usually have different sizes
3. Mixing big and small byte buffers in a single pool can result in:
a. Excessive memory usage
b. Suboptimal object reuse
Leveled (bucketized) bytebuffer pool
// pools contains pools for byte slices of various capacities.
//
// pools[0] is for capacities from 0 to 8
// pools[1] is for capacities from 9 to 16
// pools[2] is for capacities from 17 to 32
// ...
// pools[n] is for capacities from 2^(n+2)+1 to 2^(n+3)
//
// Limit the maximum capacity to 2^18, since there are no performance benefits
// in caching byte slices with bigger capacities.
var pools [17]sync.Pool
Leveled (bucketized) bytebuffer pool
func (sw *scrapeWork) scrape() {
	body := leveledbytebufferpool.Get(sw.prevBodyLen)
	body.B = sw.ReadData(body.B[:0])
	sw.processScrapedData(body)
	leveledbytebufferpool.Put(body)
}
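How a requested size maps to a bucket is not shown on the slides. One way to do it, assuming power-of-two buckets starting at 8 bytes as in the comment above, is sketched below. The function poolIDAndCapacity and the simplified Get and Put are illustrative assumptions, not the actual leveledbytebufferpool code.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// ByteBuffer is a minimal stand-in for the pooled buffer type.
type ByteBuffer struct {
	B []byte
}

// pools[i] serves byte slices with capacities up to 8<<i.
var pools [17]sync.Pool

// poolIDAndCapacity maps a requested size to a bucket index and to the
// largest capacity that bucket holds (hypothetical helper).
func poolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		size = 0
	}
	size >>= 3 // the smallest bucket covers 0..8
	id := bits.Len(uint(size))
	if id >= len(pools) {
		id = len(pools) - 1
	}
	return id, 1 << (id + 3)
}

// Get returns a buffer from the bucket matching the expected capacity,
// allocating a new one sized to the bucket on a miss.
func Get(capacity int) *ByteBuffer {
	id, bucketCap := poolIDAndCapacity(capacity)
	if v := pools[id].Get(); v != nil {
		return v.(*ByteBuffer)
	}
	return &ByteBuffer{B: make([]byte, 0, bucketCap)}
}

// Put returns bb to the bucket matching its current capacity.
// Buffers larger than the biggest bucket are dropped for the GC.
func Put(bb *ByteBuffer) {
	capacity := cap(bb.B)
	id, bucketCap := poolIDAndCapacity(capacity)
	if capacity <= bucketCap {
		bb.B = bb.B[:0]
		pools[id].Put(bb)
	}
}

func main() {
	for _, n := range []int{0, 8, 9, 16, 17, 1000} {
		id, c := poolIDAndCapacity(n)
		fmt.Println(n, id, c)
	}
	// Prints:
	// 0 0 8
	// 8 0 8
	// 9 1 16
	// 16 1 16
	// 17 2 32
	// 1000 7 1024
	bb := Get(100)
	bb.B = append(bb.B, "scraped body"...)
	Put(bb)
}
```

In the scrape example, passing the previous body length as the size hint means a target that usually returns a small response keeps drawing from a small bucket and never pins an oversized buffer.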
Ingestion of 100Mil samples/s benchmark
Summary
1. String interning reduces GC pressure and memory usage for
read-intensive workloads
2. Function results caching reduces CPU usage during string
transformations
3. Concurrency limiting gives better performance and predictable
memory usage
4. sync.Pool reduces GC pressure and improves performance of
CPU-bound operations
Questions?
● VictoriaMetrics scaling to 100M samples/s
● https://github.com/VictoriaMetrics
● https://github.com/hagen1778

Writing a TSDB from scratch_ performance optimizations.pdf

  • 1. Writing a TSDB from scratch performance optimizations Roman Khavronenko | github.com/hagen1778
  • 2. Roman Khavronenko Co-founder of VictoriaMetrics Software engineer with experience in distributed systems, monitoring and high-performance services. https://github.com/hagen1778 https://twitter.com/hagen1778
  • 3. What is a metric?
  • 4. What is a metric? Scrape target > curl http://service:port/metrics
  • 8. Workload pattern for TSDB: writes
  • 9. Workload pattern for TSDB: reads
  • 10. Workload pattern for TSDB ● TSDBs process tremendous amounts of data ● They are usually write-heavy applications, optimized for ingestion ● Read load is usually much lower than write load ● Read queries are sporadic and unpredictable
  • 11. How to deal with such workload? System design oriented for time series data: 1. Log Structured Merge (LSM) data structure 2. Data for each column is stored separately 3. Append-only writes
  • 12. How to deal with such workload? And some more non-design-specific optimizations: 1. Strings interning 2. Function results caching 3. Concurrency limiting for CPU-bound operations 4. Sync pool for CPU-bound operations
  • 15. Store only one unique copy in memory!
  • 17. String interning: naive implementation var internStringsMap = make(map[string]string) func intern(s string) string { m := internStringsMap if v, ok := m[s]; ok { return v } m[s] = s return s }
  • 18. func ptr (s string) uintptr { return (*reflect.StringHeader)(unsafe.Pointer(&s)).Data } func main() { s1 := intern("42") s2 := intern(fmt.Sprintf("%d", 42)) fmt.Println(ptr(s1) == ptr(s2)) // true } String interning: naive implementation
  • 19. 1. Map isn't thread safe String interning: naive implementation
  • 20. 1. Map isn't thread safe 2. Map with lock doesn't scale with number of CPUs String interning: naive implementation
  • 21. String interning: sync.Map var internStringsMap = sync.Map{} func intern(s string) string { m := &internStringsMap interned, _ := m.LoadOrStore(s, s) return interned.(string) }
  • 22. String interning: sync.Map sync.Map is optimized for two common use cases: 1. When the entry for a given key is only ever written once but read many times
  • 23. String interning: sync.Map sync.Map is optimized for two common use cases: 1. When the entry for a given key is only ever written once but read many times 2. When multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map reduces lock contention and improves performance compared to a Go map paired with a separate Mutex or RWMutex.
  • 24. String interning: gotchas 1. Map will grow over time: a. Rotate maps once in a while
  • 25. String interning: gotchas 1. Map will grow over time: a. Rotate maps once in a while b. Add TTL logic to purge cold entries
  • 26. String interning: gotchas 1. Map will grow over time: a. Rotate maps once in a while b. Add TTL logic to purge cold entries 2. Sanity check of arguments: a. At some point, someone will try to intern byte slice or substring: *(*string)(unsafe.Pointer(&b)) or str[:n]
  • 27. String interning: gotchas 1. Map will grow over time: a. Rotate maps once in a while b. Add TTL logic to purge cold entries 2. Sanity check of arguments: a. At some point, someone will try to intern byte slice or substring: *(*string)(unsafe.Pointer(&b)) or str[:n] b. Make sure to clone received strings: strings.Clone(s)
  • 28. String interning: summary ● We use string interning for storing time series metadata (aka labels). ● It helps to reduce memory usage during metadata parsing ● Interning works the best for read-intensive workload with limited number of variants with high hit rate
  • 31. Function results caching: caching Transformer type Transformer struct { m sync.Map transformFunc func(s string) string }
  • 32. Function results caching: caching Transformer func (t *Transformer) Transform(s string) string { v, ok := t.m.Load(s) if ok { // Fast path - the transformed s is found in the cache. return v.(string) } // Slow path - transform s and store it in the cache. sTransformed := t.transformFunc(s) t.m.Store(s, sTransformed) return sTransformed }
  • 33. Function results caching: example // SanitizeName replaces chars unsupported by Prometheus // in metric names and label names with _. func SanitizeName(name string) string { return promSanitizer.Transform(name) } var promSanitizer = NewTransformer(func(s string) string { return unsupportedPromChars.ReplaceAllString(s, "_") })
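The slides do not show `NewTransformer` or `unsupportedPromChars`, so the pieces can be assembled into a runnable program only under assumptions. In the sketch below, the constructor body and the regular expression are guesses made for illustration, and the `calls` counter is a hypothetical addition that shows how often the slow path runs.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Transformer caches the results of transformFunc per input string.
type Transformer struct {
	m             sync.Map
	transformFunc func(s string) string
}

// NewTransformer is not shown on the slides; this body is an assumption.
func NewTransformer(f func(s string) string) *Transformer {
	return &Transformer{transformFunc: f}
}

func (t *Transformer) Transform(s string) string {
	if v, ok := t.m.Load(s); ok {
		// Fast path: the transformed s is found in the cache.
		return v.(string)
	}
	// Slow path: transform s and store it in the cache.
	sTransformed := t.transformFunc(s)
	t.m.Store(s, sTransformed)
	return sTransformed
}

// unsupportedPromChars is an assumed pattern, not taken from the slides.
var unsupportedPromChars = regexp.MustCompile(`[^a-zA-Z0-9_:]`)

// calls counts regex executions. Demo only: it is not goroutine-safe.
var calls int

var promSanitizer = NewTransformer(func(s string) string {
	calls++
	return unsupportedPromChars.ReplaceAllString(s, "_")
})

// SanitizeName replaces unsupported chars in a name with _.
func SanitizeName(name string) string {
	return promSanitizer.Transform(name)
}

func main() {
	fmt.Println(SanitizeName("http.requests-total")) // http_requests_total
	fmt.Println(SanitizeName("http.requests-total")) // served from the cache
	fmt.Println(calls)                               // 1
}
```

Two goroutines that miss on the same key at the same time will both run the transform; that is harmless here because the function is pure, and it keeps the fast path free of locks.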
  • 34. Function results caching: summary ● Helps to save CPU time at the cost of increased memory usage ● Works best for heavy use of string transforms, regex matching, etc. ● And when the number of distinct arguments is limited ● Doesn't work well when the set of inputs is unbounded or inconsistent, as in query processing
  • 35. Limiting concurrency for CPU-bound load
  • 36. Volatile number of scrape targets
  • 38. Limiting concurrency for CPU intensive operations + Makes the system more stable and efficient + Helps to control memory usage during load spikes (which are expected in monitoring) + Improves the processing speed of each goroutine by reducing the number of context switches - The downside is complexity: it is easy to make a mistake and end up with a deadlock or inefficient resource utilization.
  • 39. Limited concurrency: workers var concurrencyLimit = runtime.NumCPU() func main() { workCh := make(chan work, concurrencyLimit*2) for i := 0; i < concurrencyLimit; i++ { go func() { for { processData(<-workCh) } }() } }
  • 40. Limited concurrency: workers + Workers could have scoped buffers, metrics, etc. - Code becomes complicated: start and stop procedures for workers - Additional synchronization to distribute work via channels
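The start and stop procedures mentioned above are the part the worker slide leaves out. A minimal sketch of one way to add them follows; the `work` type, the `startWorkers` helper, and the summing `processData` are hypothetical stand-ins, and `atomic.Int64` requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// work is a hypothetical unit of work.
type work struct{ n int }

// processed accumulates results so the demo has something to print.
var processed atomic.Int64

// processData stands in for a CPU-bound function.
func processData(w work) {
	processed.Add(int64(w.n))
}

// startWorkers launches n workers that drain workCh until it is closed.
// The returned WaitGroup is done once every worker has exited.
func startWorkers(n int, workCh <-chan work) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range workCh {
				processData(w)
			}
		}()
	}
	return wg
}

func main() {
	concurrencyLimit := runtime.NumCPU()
	workCh := make(chan work, concurrencyLimit*2)
	wg := startWorkers(concurrencyLimit, workCh)
	for i := 1; i <= 100; i++ {
		workCh <- work{n: i}
	}
	close(workCh) // stop procedure: no more work will arrive
	wg.Wait()     // wait until the queued work is finished
	fmt.Println(processed.Load()) // 5050
}
```

Closing the channel doubles as the shutdown signal, so no separate stop channel is needed as long as there is a single place that knows when the work ends.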
  • 41. Limited concurrency: channel var concurrencyLimitCh = make(chan struct{}, runtime.NumCPU()) // This function is CPU-bound and may allocate a lot of memory. // We limit the number of concurrent calls to limit memory // usage under high load without sacrificing the performance. func processData(src, dst []byte) error { concurrencyLimitCh <- struct{}{} defer func() { <-concurrencyLimitCh }() // heavy processing...
  • 42. Limited concurrency: summary ● Works best for CPU-bound operations ● Bounds resource usage: work is processed by a limited number of goroutines at optimal speed instead of wasting resources on context switches ● Helps to prevent excessive memory usage during load spikes ● Do not apply limiting to IO-bound (disk, network) operations
  • 43. sync.Pool for CPU bound operations
  • 44. sync.Pool is widely used in VictoriaMetrics grep -r "sync.Pool" ./app ./lib | wc -l 118 grep -r "bytesutil.ByteBufferPool" ./app ./lib | wc -l 34
  • 45. sync.Pool for CPU bound operations in one thread ● All processing happens on a single CPU core ● No object stealing ● Fewer objects allocated, better pool utilization ● Lower GC pressure
  • 46. sync.Pool for synchronous processing ● An object is retrieved, used and released by different goroutines ● High chance that the goroutines are scheduled to different threads ● High chance of object stealing
  • 47. sync.Pool for IO bound operations ● An object retrieved from sync.Pool is used for IO operations ● IO operations are slow and sporadic ● So sync.Pool can end up allocating a large number of objects, resulting in uncontrolled memory usage ● Higher pressure on GC
  • 48. sync.Pool - lib/bytesbuffer type ByteBufferPool struct { p sync.Pool } // Verify ByteBuffer implements the given interfaces. var ( _ io.Writer = &ByteBuffer{} _ fs.MustReadAtCloser = &ByteBuffer{} _ io.ReaderFrom = &ByteBuffer{} )
  • 49. sync.Pool - lib/bytesbuffer func (bbp *ByteBufferPool) Get() *ByteBuffer { bbv := bbp.p.Get() if bbv == nil { return &ByteBuffer{} } return bbv.(*ByteBuffer) } func (bbp *ByteBufferPool) Put(bb *ByteBuffer) { bb.Reset() bbp.p.Put(bb) }
  • 50. sync.Pool - lib/bytesbuffer bb := bbPool.Get() // acquire from pool bb.B, err = DecompressZSTD(bb.B[:0], src) if err != nil { return nil, fmt.Errorf("cannot decompress: %w", err) } // unmarshal from buffer to dst dst, err = unmarshalInt64NearestDelta(dst, bb.B) bbPool.Put(bb) // release to pool
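The acquire, use, release cycle can be reproduced without the VictoriaMetrics packages. The following is a reduced sketch: `ByteBuffer` is cut down to a single field, and `hexEncode` and `process` are hypothetical stand-ins for the decompress and unmarshal steps, chosen only because they append to a destination slice in the same way.

```go
package main

import (
	"fmt"
	"sync"
)

// ByteBuffer is a reduced stand-in for the type used on the slides.
type ByteBuffer struct {
	B []byte
}

// Reset makes the buffer empty while keeping its capacity.
func (bb *ByteBuffer) Reset() { bb.B = bb.B[:0] }

type ByteBufferPool struct {
	p sync.Pool
}

func (bbp *ByteBufferPool) Get() *ByteBuffer {
	bbv := bbp.p.Get()
	if bbv == nil {
		return &ByteBuffer{}
	}
	return bbv.(*ByteBuffer)
}

func (bbp *ByteBufferPool) Put(bb *ByteBuffer) {
	bb.Reset()
	bbp.p.Put(bb)
}

var bbPool ByteBufferPool

// hexEncode appends the hex form of src to dst (hypothetical workload).
func hexEncode(dst, src []byte) []byte {
	const digits = "0123456789abcdef"
	for _, c := range src {
		dst = append(dst, digits[c>>4], digits[c&0x0f])
	}
	return dst
}

// process uses a pooled buffer as scratch space.
func process(src []byte) string {
	bb := bbPool.Get()
	defer bbPool.Put(bb) // released on every return path
	bb.B = hexEncode(bb.B[:0], src)
	// Copy the result out before the buffer goes back to the pool.
	return string(bb.B)
}

func main() {
	fmt.Println(process([]byte("tsdb"))) // 74736462
}
```

The important rule is that nothing may keep a reference to `bb.B` after `Put`: the next caller of `Get` is free to overwrite those bytes, which is why `process` converts the result to a string first.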
  • 51. Bytebuffer pool issues 1. sync.Pool assumes that all the entries it contains are "the same" 2. In the real world, byte buffers usually have different sizes 3. Mixing big and small byte buffers in a single pool can result in: a. Excessive memory usage b. Suboptimal object reuse
  • 53. Leveled (bucketized) bytebuffer pool // pools contains pools for byte slices of various capacities. // // pools[0] is for capacities from 0 to 8 // pools[1] is for capacities from 9 to 16 // pools[2] is for capacities from 17 to 32 // ... // pools[n] is for capacities from 2^(n+2)+1 to 2^(n+3) // // Limit the maximum capacity to 2^18, since there are no performance benefits // in caching byte slices with bigger capacities. var pools [17]sync.Pool
  • 54. Leveled (bucketized) bytebuffer pool func (sw *scrapeWork) scrape() { body := leveledbytebufferpool.Get(sw.prevBodyLen) body.B = sw.ReadData(body.B[:0]) sw.processScrapedData(body) leveledbytebufferpool.Put(body) }
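The bucket boundaries in the comment above imply a mapping from capacity to pool index. The sketch below is one way to compute it with `math/bits`; it is a simplified assumption, not the `leveledbytebufferpool` source, and `poolIndex`, `Get`, and `Put` here are illustrative names with reduced behavior.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// ByteBuffer is a reduced stand-in for the type used on the slides.
type ByteBuffer struct {
	B []byte
}

// pools[n] holds buffers with capacities from 2^(n+2)+1 to 2^(n+3);
// pools[0] also covers capacities 0 to 8.
var pools [17]sync.Pool

// poolIndex maps a capacity to its bucket: 0..8 -> 0, 9..16 -> 1,
// 17..32 -> 2, and so on. Oversized values fall into the last bucket.
func poolIndex(size int) int {
	if size <= 8 {
		return 0
	}
	idx := bits.Len(uint(size-1)) - 3
	if idx >= len(pools) {
		idx = len(pools) - 1
	}
	return idx
}

// Get returns a buffer from the bucket matching the expected size.
// size must be non-negative.
func Get(size int) *ByteBuffer {
	if v := pools[poolIndex(size)].Get(); v != nil {
		return v.(*ByteBuffer)
	}
	return &ByteBuffer{B: make([]byte, 0, size)}
}

// Put returns the buffer to the bucket matching its actual capacity.
func Put(bb *ByteBuffer) {
	bb.B = bb.B[:0]
	pools[poolIndex(cap(bb.B))].Put(bb)
}

func main() {
	fmt.Println(poolIndex(8), poolIndex(9), poolIndex(16), poolIndex(17), poolIndex(1<<18))
	// Prints: 0 1 1 2 15

	bb := Get(100)
	bb.B = append(bb.B, "scraped body"...)
	Put(bb)
}
```

`Put` indexes by the capacity the buffer ended up with, not the size that was requested, so a buffer that grew during use moves to a larger bucket and small requests never pick it up.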
  • 55. Ingestion of 100Mil samples/s benchmark
  • 56. Summary 1. String interning for reducing GC pressure and memory usage for read-intensive workloads 2. Function results caching for reducing CPU usage during string transformations 3. Concurrency limiting for better performance and predictable memory usage 4. sync.Pool for reducing GC pressure and improving performance of CPU-bound operations.
  • 57. Questions? ● VictoriaMetrics scaling to 100M samples/s ● https://github.com/VictoriaMetrics ● https://github.com/hagen1778