SlideShare a Scribd company logo
1 of 31
Download to read offline
Brought to you by
Where Did All These
Cycles Go?
Thomas Dullien
CEO at
Thomas Dullien
CEO at optimyze.cloud
■ 20+ year career in low-level security tools and research:
● Patch diffing (BinDiff), Binary Code Search (VxClass) - acquired by
Google
● DRAM flaws (Rowhammer), allocator analysis
■ Came to performance work because the skillset was similar, but results
are CO2-friendly.
■ Known as @halvarflake on Twitter, where I don’t self-censor enough
■ Like to be in a hammock by an ocean and read math.
Where have all the cycles gone?
This talk presents interesting places where CPU cycles are “lost” or “wasted” in
production systems.
The insights were obtained by deploying fleet- and system-wide continuous
profiling to large compute clusters across different companies.
All examples are from real-life deployments of nontrivial size.
Where have all the cycles gone?
The results can be roughly grouped into a few buckets:
■ Misconfiguration of the underlying OS vs. software assumptions
■ Surprising behavior of dependencies
■ Suboptimal choice of dependencies
■ Paying “too much” for serialization / deserialisation
The “classic” issue: Clock sources
on AWS Xen instances (m4, i3)
Clock sources on AWS Xen instances
Symptom: Excessive time spent … looking up the time in kernel space?
Understanding vDSO fast path
Userspace application
libc vDSO
Kernel
gettimeofday()
Can I return time without kernel
transition? If yes, do it.
If no, call the kernel to get it,
return result
Fast path for gettimeofday() is assumed!
Most software assumes gettimeofday() is very cheap.
This is particularly true of databases (timestamping of rows etc.)
But:
■ Old AWS instances are Xen by default, and configured with xen time source.
■ Xen time source disables usermode-only gettimeofday()
Fast path for gettimeofday() is assumed!
Configuring TSC-based timekeeping prevents the CPU from doing excessive
userspace-kernelspace transitions when doing gettimeofday().
More details:
https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/
https://heap.io/blog/clocksource-aws-ec2-vdso
No longer an issue on Nitro instances
“Modern” instances (m5, m6i etc.) are no longer Xen-based -- the issue goes away.
Important: i3 instances (instances with fast local SSDs) are still on Xen!
Many distributed databases are largely run on i3 ...
Kubelet walking directories
Kubelet walking directories
Kubelet
Symptom: Kubelet eating significant quantities of CPU time
Flamegraphs to the rescue
Clear slide for diagram with caption
The underlying issue: cadvisor
k8s.io/kubernetes/vendor/github.com/google/cadvisor/fs.GetDirUsage()
In order to track per-container usage of ephemeral storage, cadvisor does a
recursive directory walk of the ephemeral layer of every running container every
minute.
Lesson: Whatever you do, do not have a container create an excessive number of
files in the ephemeral storage: Can cost CPU and even exhaust IOPS on EBS
volumes.
Throwing and catching exceptions
Throwing exceptions
Setup:
■ Large K8s cluster.
■ Machine learning workload using Python bindings to xgboost 0.81, doing
linear regression.
■ No GPUs on the instances.
Symptom: The most heavy function where most CPU time was spent was…
libc-2.28.so:determine_info
Following the stack trace
3) dmlc::StackTrace()
4) dmlc::LogMessageFatal::~LogMessageFatal()
5) dh::ThrowOnCudaError(cudaError, char const*, int)
6) xgboost::AllVisibleImpl::AllVisible()
7) xgboost::obj::RegLossObj<xgboost::obj::LogisticClassification>::Configure
(std::vector<std::pair<std::string, std::string>,
std::allocator<std::pair<std::string, std::string> > > const&)
8) void xgboost::ObjFunction::Configure<
std::_Rb_tree_iterator<std::pair<std::string const, std::string> >>
(std::_Rb_tree_iterator<std::pair<std::string const, std::string> >,
std::_Rb_tree_iterator<std::pair<std::string const, std::string> >)
9) xgboost::LearnerImpl::Load(dmlc::Stream*)
10) XGBoosterLoadModelFromBuffer
<use consolas for font when displaying code>
Throwing exceptions
Root cause:
■ Xgboost lower than 1.0 always tries to talk to the GPU first
■ If no GPU is found, an exception is thrown
■ Only after this is done, the code falls back to CPU
■ determine_info was called during the creation of the backtrace to log
Solution:
■ Recompile xgboost without CUDA support (or upgrade)
zlib vs. zlib-ng
Zlib vs. zlib-ng vs. zlib-cloudflare
■ Most deployments spend the majority of their time in dependencies
■ In any given large deployment, your own code is likely the “minority” of CPU
consumption
■ ⇒ Careful choice of dependencies can make a big difference
■ Compression is often a surprisingly heavy component of any workload.
■ We’ve seen 10%-20% of cycles in 2000-core clusters go into zlib
Drop-in replacement vs. other algorithm?
■ Proper choice of compression means picking an algorithm on the pareto
frontier
■ Zstd explicitly aims to outperform zlib everywhere, and usually succeeds
■ Dependent on data distribution etc.
■ If change of algorithm is not possible, change of zlib fork will help!
Serialization and Deserialization
Ser / Deserialisation code is often 20-30%+
Examples:
■ A large PySpark computational Physics job from CERN: 20% Java-to-Python
de/serialization, 10% Python-to-Java de/serialization
■ Customer deployment: Code falling back to pure-Python msgpack
implementation. Replacing with ormsgpack (Rust-implemented Python
msgpack extension) reduced whole-cluster CPU load by 40%.
Summary
Where have all the cycles gone?
■ Fleet-wide continuous profiling finds many surprising “sinks” for CPU cycles
■ Fleet-wide continuous profiling is becoming common outside hyperscalers
■ Prediction: By 2026, the majority of large infrastructures will be doing
continuous fleet-wide profiling
List of whole-system continuous profilers
■ Pixie Labs (link): C++, Go, Rust -- requires symbols for C++ and Rust to unwind
■ Pyroscope (link): Likely possible to do whole-system, requires symbols on
machines
■ Prodfiler (link): Our own continuous profiler,
C/C++/Rust/Go/JVM/Python/PHP/Perl/Kernel, no symbols on-prod, no
recompile for framepointers.
Questions?
Brought to you by
Thomas Dullien / Halvar Flake
thomasdullien@optimyze.cloud
@halvarflake on Twitter

More Related Content

What's hot

Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthScyllaDB
 
Rust, Wright's Law, and the Future of Low-Latency Systems
Rust, Wright's Law, and the Future of Low-Latency SystemsRust, Wright's Law, and the Future of Low-Latency Systems
Rust, Wright's Law, and the Future of Low-Latency SystemsScyllaDB
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScyllaDB
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsScyllaDB
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...ScyllaDB
 
Continuous Performance Regression Testing with JfrUnit
Continuous Performance Regression Testing with JfrUnitContinuous Performance Regression Testing with JfrUnit
Continuous Performance Regression Testing with JfrUnitScyllaDB
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and BeyondScyllaDB
 
Whoops! I Rewrote It in Rust
Whoops! I Rewrote It in RustWhoops! I Rewrote It in Rust
Whoops! I Rewrote It in RustScyllaDB
 
OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...
OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...
OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...ScyllaDB
 
How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadScyllaDB
 
Unikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy WayUnikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy WayScyllaDB
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 
OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?ScyllaDB
 
Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?ScyllaDB
 
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-VRISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-VScyllaDB
 
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeKernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeAnne Nicolas
 
P99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent StorageP99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent StorageScyllaDB
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingScyllaDB
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingLubomir Rintel
 

What's hot (20)

Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Rust, Wright's Law, and the Future of Low-Latency Systems
Rust, Wright's Law, and the Future of Low-Latency SystemsRust, Wright's Law, and the Future of Low-Latency Systems
Rust, Wright's Law, and the Future of Low-Latency Systems
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java Applications
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 
Continuous Performance Regression Testing with JfrUnit
Continuous Performance Regression Testing with JfrUnitContinuous Performance Regression Testing with JfrUnit
Continuous Performance Regression Testing with JfrUnit
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and Beyond
 
Whoops! I Rewrote It in Rust
Whoops! I Rewrote It in RustWhoops! I Rewrote It in Rust
Whoops! I Rewrote It in Rust
 
OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...
OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...
OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in t...
 
How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another Workload
 
Unikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy WayUnikraft: Fast, Specialized Unikernels the Easy Way
Unikraft: Fast, Specialized Unikernels the Easy Way
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?OSNoise Tracer: Who Is Stealing My CPU Time?
OSNoise Tracer: Who Is Stealing My CPU Time?
 
Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?
 
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-VRISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
 
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeKernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
 
P99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent StorageP99CONF — What We Need to Unlearn About Persistent Storage
P99CONF — What We Need to Unlearn About Persistent Storage
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profiling
 

Similar to Where Cycles Go: Insights from Continuous Profiling at Scale

Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleMayaData
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentDoKC
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0ScyllaDB
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operationniallmilton
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Availability in a cloud native world v1.6 (Feb 2019)
Availability in a cloud native world v1.6 (Feb 2019)Availability in a cloud native world v1.6 (Feb 2019)
Availability in a cloud native world v1.6 (Feb 2019)Haytham Elkhoja
 
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Amazon Web Services
 
Nelson: Rigorous Deployment for a Functional World
Nelson: Rigorous Deployment for a Functional WorldNelson: Rigorous Deployment for a Functional World
Nelson: Rigorous Deployment for a Functional WorldTimothy Perrett
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)Casey Bisson
 
Parallel Computing: Perspectives for more efficient hydrological modeling
Parallel Computing: Perspectives for more efficient hydrological modelingParallel Computing: Perspectives for more efficient hydrological modeling
Parallel Computing: Perspectives for more efficient hydrological modelingGrigoris Anagnostopoulos
 
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed_Hat_Storage
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performanceinside-BigData.com
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programmingkhstandrews
 
From data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloudFrom data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloudFogGuru MSCA Project
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudThe UberCloud
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudWolfgang Gentzsch
 

Similar to Where Cycles Go: Insights from Continuous Profiling at Scale (20)

Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps style
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Availability in a cloud native world v1.6 (Feb 2019)
Availability in a cloud native world v1.6 (Feb 2019)Availability in a cloud native world v1.6 (Feb 2019)
Availability in a cloud native world v1.6 (Feb 2019)
 
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
 
Nelson: Rigorous Deployment for a Functional World
Nelson: Rigorous Deployment for a Functional WorldNelson: Rigorous Deployment for a Functional World
Nelson: Rigorous Deployment for a Functional World
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)
 
Parallel Computing: Perspectives for more efficient hydrological modeling
Parallel Computing: Perspectives for more efficient hydrological modelingParallel Computing: Perspectives for more efficient hydrological modeling
Parallel Computing: Perspectives for more efficient hydrological modeling
 
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performance
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programming
 
From data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloudFrom data centers to fog computing: the evaporating cloud
From data centers to fog computing: the evaporating cloud
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the Cloud
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the Cloud
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Where Cycles Go: Insights from Continuous Profiling at Scale

  • 1. Brought to you by Where Did All These Cycles Go? Thomas Dullien CEO at
  • 2. Thomas Dullien CEO at optimyze.cloud ■ 20+ year career in low-level security tools and research: ● Patch diffing (BinDiff), Binary Code Search (VxClass) - acquired by Google ● DRAM flaws (Rowhammer), allocator analysis ■ Came to performance work because the skillset was similar, but results are CO2-friendly. ■ Known as @halvarflake on Twitter, where I don’t self-censor enough ■ Like to be in a hammock by an ocean and read math.
  • 3. Where have all the cycles gone? This talk presents interesting places where CPU cycles are “lost” or “wasted” in production systems. The insights were obtained by deploying fleet- and system-wide continuous profiling to large compute clusters across different companies. All examples are from real-life deployments of nontrivial size.
  • 4. Where have all the cycles gone? The results can be roughly grouped into a few buckets: ■ Misconfiguration of the underlying OS vs. software assumptions ■ Surprising behavior of dependencies ■ Suboptimal choice of dependencies ■ Paying “too much” for serialization / deserialisation
  • 5. The “classic” issue: Clock sources on AWS Xen instances (m4, i3)
  • 6. Clock sources on AWS Xen instances Symptom: Excessive time spent … looking up the time in kernel space?
  • 7. Understanding vDSO fast path Userspace application libc vDSO Kernel gettimeofday() Can I return time without kernel transition? If yes, do it. If no, call the kernel to get it, return result
  • 8. Fast path for gettimeofday() is assumed! Most software assumes gettimeofday() is very cheap. This is particularly true of databases (timestamping of rows etc.) But: ■ Old AWS instances are Xen by default, and configured with xen time source. ■ Xen time source disables usermode-only gettimeofday()
  • 9. Fast path for gettimeofday() is assumed! Configuring TSC-based timekeeping prevents the CPU from doing excessive userspace-kernelspace transitions when doing gettimeofday(). More details: https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ https://heap.io/blog/clocksource-aws-ec2-vdso
  • 10. No longer an issue on Nitro instances “Modern” instances (m5, m6i etc.) are no longer Xen-based -- the issue goes away. Important: i3 instances (instances with fast local SSDs) are still on Xen! Many distributed databases are largely run on i3 ...
  • 12. Kubelet walking directories Kubelet Symptom: Kubelet eating significant quantities of CPU time
  • 14. Clear slide for diagram with caption
  • 15. The underlying issue: cadvisor k8s.io/kubernetes/vendor/github.com/google/cadvisor/fs.GetDirUsage() In order to track per-container usage of ephemeral storage, cadvisor does a recursive directory walk of the ephemeral layer of every running container every minute. Lesson: Whatever you do, do not have a container create an excessive number of files in the ephemeral storage: Can cost CPU and even exhaust IOPS on EBS volumes.
  • 16. Throwing and catching exceptions
  • 17. Throwing exceptions Setup: ■ Large K8s cluster. ■ Machine learning workload using Python bindings to xgboost 0.81, doing linear regression. ■ No GPUs on the instances. Symptom: The most heavy function where most CPU time was spent was… libc-2.28.so:determine_info
  • 18. Following the stack trace 3) dmlc::StackTrace() 4) dmlc::LogMessageFatal::~LogMessageFatal() 5) dh::ThrowOnCudaError(cudaError, char const*, int) 6) xgboost::AllVisibleImpl::AllVisible() 7) xgboost::obj::RegLossObj<xgboost::obj::LogisticClassification>::Configure (std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > > const&) 8) void xgboost::ObjFunction::Configure< std::_Rb_tree_iterator<std::pair<std::string const, std::string> >> (std::_Rb_tree_iterator<std::pair<std::string const, std::string> >, std::_Rb_tree_iterator<std::pair<std::string const, std::string> >) 9) xgboost::LearnerImpl::Load(dmlc::Stream*) 10) XGBoosterLoadModelFromBuffer <use consolas for font when displaying code>
  • 19. Throwing exceptions Root cause: ■ Xgboost lower than 1.0 always tries to talk to the GPU first ■ If no GPU is found, an exception is thrown ■ Only after this is done, the code falls back to CPU ■ determine_info was called during the creation of the backtrace to log Solution: ■ Recompile xgboost without CUDA support (or upgrade)
  • 21. Zlib vs. zlib-ng vs. zlib-cloudflare ■ Most deployments spend the majority of their time in dependencies ■ In any given large deployment, your own code is likely the “minority” of CPU consumption ■ ⇒ Careful choice of dependencies can make a big difference ■ Compression is often a surprisingly heavy component of any workload. ■ We’ve seen 10%-20% of cycles in 2000-core clusters go into zlib
  • 22. Drop-in replacement vs. other algorithm? ■ Proper choice of compression means picking an algorithm on the pareto frontier ■ Zstd explicitly aims to outperform zlib everywhere, and usually succeeds ■ Dependent on data distribution etc. ■ If change of algorithm is not possible, change of zlib fork will help!
  • 23.
  • 24.
  • 26. Ser / Deserialisation code is often 20-30%+ Examples: ■ A large PySpark computational Physics job from CERN: 20% Java-to-Python de/serialization, 10% Python-to-Java de/serialization ■ Customer deployment: Code falling back to pure-Python msgpack implementation. Replacing with ormsgpack (Rust-implemented Python msgpack extension) reduced whole-cluster CPU load by 40%.
  • 28. Where have all the cycles gone? ■ Fleet-wide continuous profiling finds many surprising “sinks” for CPU cycles ■ Fleet-wide continuous profiling is becoming common outside hyperscalers ■ Prediction: By 2026, the majority of large infrastructures will be doing continuous fleet-wide profiling
  • 29. List of whole-system continuous profilers ■ Pixie Labs (link): C++, Go, Rust -- requires symbols for C++ and Rust to unwind ■ Pyroscope (link): Likely possible to do whole-system, requires symbols on machines ■ Prodfiler (link): Our own continuous profiler, C/C++/Rust/Go/JVM/Python/PHP/Perl/Kernel, no symbols on-prod, no recompile for framepointers.
  • 31. Brought to you by Thomas Dullien / Halvar Flake thomasdullien@optimyze.cloud @halvarflake on Twitter