High performance, measured in sub-millisecond response times for queries, is a key characteristic of Redis and one of the main reasons it is the most popular key-value database in the world.
To keep improving performance across all of the different Redis components, we developed a framework that automatically triggers performance tests, telemetry gathering, profiling, and data visualization on every code commit.
In this talk, we describe how this automation and “zero-touch” profiling scaled our ability to chase performance regressions and find opportunities to improve the efficiency of our code, helping us, as a company, shift from a reactive to a more proactive performance mindset.
End-To-End Performance Testing, Profiling, and Analysis at Redis
1. Brought to you by
E2E Performance Testing, Profiling, and Analysis at Redis
Filipe Oliveira
Senior Performance Engineer at Redis
2. > whoami
■ Working on continuous performance analysis
■ ~3 years working for Redis Ltd
■ Open Source Contributor (C, Go): github.com/filipecosta90
● improve/develop open source performance/observability tools
■ https://github.com/HdrHistogram/hdrhistogram-go
■ https://github.com/RedisBloom/t-digest-c
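As a flavor of the latency telemetry these tools gather, here is a minimal sketch (mine, not from the talk) that records request latencies into an hdrhistogram-go histogram and reads back the percentiles that a sub-millisecond claim is judged on:

```go
package main

import (
	"fmt"
	"time"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

func main() {
	// Track latencies between 1µs and 1s with 3 significant figures.
	h := hdrhistogram.New(1, int64(time.Second/time.Microsecond), 3)

	for i := 0; i < 1000; i++ {
		start := time.Now()
		// Stand-in for issuing a Redis command and awaiting the reply.
		time.Sleep(100 * time.Microsecond)
		_ = h.RecordValue(time.Since(start).Microseconds())
	}

	// Percentiles, not averages, are what performance gates compare.
	fmt.Printf("p50=%dµs p99=%dµs p99.9=%dµs\n",
		h.ValueAtQuantile(50), h.ValueAtQuantile(99), h.ValueAtQuantile(99.9))
}
```

HDR histograms keep a bounded relative error at fixed memory cost, which is why they are a common choice for continuous latency telemetry.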
3. Agenda
■ Performance @Redis Ltd
■ The “old behaviour”
■ The dos and don'ts
■ Our approach
■ What we’ve gained and what’s next
4. Performance @Redis
Vanilla Redis (a purely OSS project) and Redis Ltd
1. foster benchmark and observability standards across community and vendors
2. support the contributions of other members to the OSS projects
a. performance numbers
b. performance how-tos
c. or the means to properly assess the performance impact of the change they’re proposing.
3. optimize an industry-leading solution
5. Ordinarily, on Our Company's Core Products
We have...
● extensive automated tests to catch functional failures
...but when
● we accidentally commit a performance regression, nothing intercepts it!
7. A Real Case From 2019
Simple request
1. RediSearch minor version bump
2. Required multiple patches
a. Feedback cycle took at least 1 day
b. prioritized over other projects
c. Siloed
d. Jul. 30, Nov. 27, 2019
You can relate to...
● your team runs performance tests before releasing
8. Ordinarily, on Our Company's Core Products
You can state...
● your team runs performance tests before releasing
...but solving slowdowns just before releasing is...
● dangerous
● time-consuming
● one of the most difficult tasks to estimate time for
...is just buffering potential issues!
9. Goal: Reduce Feedback Cycle. Avoid Silos
Requirements for valid tests
- Stable testing environment
- Deterministic testing tools
- Deterministic outcomes
- Reduced testing/probing overhead
- Reduce tested changes to the minimum
Requirements for acceptance in products
- Acceptable duration
- No manual work
- Actionable items
- Well-defined key performance indicators
[Pipeline diagrams: before, CODE REVIEW → PREVIEW/UNSTABLE → RELEASE with a single MANUAL PERF CHECK at release time; after, the same pipeline with a ZERO TOUCH PERF CHECK at every stage.]
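To make the zero-touch perf check concrete: the core of such a gate is a comparison of well-defined KPIs against a stored baseline, with a tolerance wide enough to absorb environment noise. A minimal sketch in Go, with all names, numbers, and the schema purely hypothetical rather than the framework's actual format:

```go
package main

import (
	"fmt"
	"os"
)

// kpi is one well-defined key performance indicator from a test run.
type kpi struct {
	Name     string
	Baseline float64 // e.g. p99 latency (µs) on the last accepted commit
	Current  float64 // same metric on the commit under review
}

// checkRegression fails when any KPI degrades beyond the allowed tolerance,
// turning the manual perf check into an actionable pass/fail CI gate.
func checkRegression(kpis []kpi, tolerance float64) bool {
	ok := true
	for _, k := range kpis {
		if k.Current > k.Baseline*(1+tolerance) {
			fmt.Printf("REGRESSION %s: %.1f -> %.1f (limit %.0f%%)\n",
				k.Name, k.Baseline, k.Current, tolerance*100)
			ok = false
		}
	}
	return ok
}

func main() {
	// Hypothetical numbers; a real run would load these from benchmark output.
	results := []kpi{
		{Name: "GET p99 latency (µs)", Baseline: 750, Current: 980},
		{Name: "SET p99 latency (µs)", Baseline: 800, Current: 810},
	}
	if !checkRegression(results, 0.10) { // allow a 10% noise margin
		os.Exit(1) // a non-zero exit blocks the pipeline, no human in the loop
	}
}
```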
10. This is Not New/Disruptive
Elastic
https://elasticsearch-benchmarks.elastic.co/#
Lucene
https://home.apache.org/~mikemccand/lucenebench/
12. Our Approach (Lane B)
[1] https://github.com/RedisTimeSeries/RedisTimeSeries/tree/master/tests/benchmarks
[2] https://github.com/RedisLabsModules/redisbench-admin
Redis Ltd
1. Started with the small-scale projects
a. Redis Modules
2. Initial focus on OSS deployments
3. Local and remote triggers
4. Used for testing and profiling
a. Regression analysis (and fixes)
b. Approval of features
c. Proactive optimization
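The benchmark definitions in [1] describe workloads declaratively, and redisbench-admin [2] drives load generators against them. For intuition only, here is a stripped-down sketch (using the go-redis client, not the project's actual tooling) of the kind of deterministic workload such a definition encodes:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	// Assumes a Redis (or module-enabled Redis) instance on the default port.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer rdb.Close()

	const requests = 10000
	start := time.Now()
	for i := 0; i < requests; i++ {
		// A deterministic key pattern keeps runs comparable across commits.
		if err := rdb.Set(ctx, fmt.Sprintf("key:%d", i), "value", 0).Err(); err != nil {
			panic(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%d requests in %v (%.0f ops/sec)\n",
		requests, elapsed, float64(requests)/elapsed.Seconds())
}
```

A real run would record per-command latencies into a histogram as well; throughput alone hides tail-latency regressions.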
13. Our Approach
[Charts: scalability analysis by branch and by version]
15. Our Approach
Benchmark triggers: nightly runs, plus feature* / perf* / v* branches (see the sketch below)
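A minimal sketch of the branch-pattern matching those triggers imply; the patterns come from the slide, while the surrounding code is illustrative rather than the actual CI configuration:

```go
package main

import (
	"fmt"
	"path"
)

// Branch name patterns that trigger a benchmark run, per the slide.
// (Nightly runs are time-based rather than branch-based, so they are
// scheduled separately and not modeled here.)
var triggers = []string{"feature*", "perf*", "v*"}

// shouldBenchmark reports whether a pushed branch matches any trigger pattern.
func shouldBenchmark(branch string) bool {
	for _, pattern := range triggers {
		if matched, _ := path.Match(pattern, branch); matched {
			return true
		}
	}
	return false
}

func main() {
	for _, b := range []string{"feature-compaction", "perf-pipelining", "v1.6", "docs-typo-fix"} {
		fmt.Printf("%-20s triggers benchmark: %v\n", b, shouldBenchmark(b))
	}
}
```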
17. Our Approach
1. Full-process flame graph + main-thread flame graph
2. perf report per dso
3. perf report per dso,sym (with/without call graph)
4. perf report per dso,sym,srcline (with/without call graph)
5. Identical stacks collapsed
6. Hot-path call graph
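Artifacts 2-4 map directly onto Linux perf's --sort keys. Here is a hedged sketch of how a profiler daemon could emit them; the perf flags are real, but the daemon scaffolding is illustrative, not Redis's actual code:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	pid := os.Args[1] // PID of the redis-server under test

	// Sample on-CPU stacks with call graphs for 60 seconds
	// (the raw input behind the flame graphs in artifact 1).
	record := exec.Command("perf", "record", "-g", "-p", pid, "--", "sleep", "60")
	record.Stdout, record.Stderr = os.Stdout, os.Stderr
	if err := record.Run(); err != nil {
		panic(err)
	}

	// One textual report per aggregation level, mirroring artifacts 2-4.
	for _, sortKeys := range []string{"dso", "dso,symbol", "dso,symbol,srcline"} {
		out, err := exec.Command("perf", "report", "--stdio", "--sort", sortKeys).Output()
		if err != nil {
			panic(err)
		}
		name := fmt.Sprintf("perf-report-%s.txt", sortKeys)
		if err := os.WriteFile(name, out, 0o644); err != nil {
			panic(err)
		}
	}
}
```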
18. Our Approach (Lane B)
Analysis:
https://github.com/RedisTimeSeries/RedisTimeSeries/issues/793
PR:
https://github.com/RedisTimeSeries/RedisTimeSeries/pull/794
Live in progress:
https://github.com/RedisTimeSeries/RedisTimeSeries/issues/907
19. What We've Gained
● Dramatically reduced the feedback cycle (days → 1 hour)
● Devs can easily add tests (243 full suites)
● Scaled up, to more and more challenging tests!
● Finding performance problems and points of improvement is now everyone's power and responsibility
20. What We've Gained
● A/B testing of new tech and state-of-the-art HW/SW components
● Continuous, up-to-date numbers for the use cases that matter
● Fostering open, unbiased community and cross-company efforts
21. What's Next
● Feature parity between the OSS platform and the company platform
● Extend the profiler daemon to eBPF tooling and VTune
○ off-CPU analysis
○ threading/locking
○ vectorization reports
VISIBILITY for Points of Improvement
22. What's Next
● Improve anomaly/regression detection
● Increase OSS and company adoption
○ expose data in the docs
23. Brought to you by
@fcosta_oliveira
Thank you!
We're hiring!
performance@redis.com