Adventures in Thread-per-core Async with Redpanda & Seastar
Travis Downs (He/Him)
Software Engineer at Redpanda
■ I love going deep on performance – all the way to assembly, if necessary
■ I’ve held principal staff positions at Salesforce & architect roles at SAP and Business Objects
■ I had hobbies like writing a software performance blog, but now I’m a parent, so…
Redpanda in 60 seconds
Redpanda is a streaming storage engine
Clients speak the Apache Kafka API to Redpanda nodes to produce to and consume from topic partitions.
Partitions are logs (~10,000s per cluster)
Each partition is a Raft group (~3 members)
Scale up and scale out should be ~equivalent
Thread-per-core
What is thread-per-core?
One thread per core and pinned: make scheduling decisions in userspace.
This thread must not block.
Question: how do we replace blocking calls?
Answer: …
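A minimal sketch of the shape of the answer (read_line_async, connection, and process are hypothetical names, not Seastar API): every call that could block is replaced by one that returns a future immediately.
// Illustrative sketch: the call returns at once; the continuation runs
// on this same core when the data is ready, so the thread never blocks.
ss::future<> handle_request(connection& conn) {
    return read_line_async(conn).then([](std::string line) {
        process(line);   // hypothetical synchronous work
    });
}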
Seastar
Seastar was created by the ScyllaDB project.
Redpanda is built on Seastar. We 😍 it.
Shared nothing architecture made up of “shards”:
■ A CPU core
■ A pool of memory NUMA-local to that core
■ All-to-all mesh of SPSC message queues (sketch below)
■ Cooperative multitasking
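Shards talk to each other only by message passing over those SPSC queues. In Seastar this surfaces as smp::submit_to, which runs a function on another shard and returns its result as a future; a minimal sketch (the lambda body is illustrative):
// Run a lambda on shard 2 and get its result back as a future
seastar::future<int> ask_shard_two() {
    return seastar::smp::submit_to(2, [] {
        return 42;   // executes on shard 2, touching only shard-2-local memory
    });
}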
Async C++ with coroutines
Continuation style
ss::future<> consensus::stop() {
    return _event_manager.stop()
      .then([this] { return _append_requests_buffer.stop(); })
      .then([this] { return _batcher.stop(); })
      .then([this] { return _bg.close(); })
      .then([this] {
          if (likely(!_snapshot_writer)) {
              return ss::now();
          }
          return _snapshot_writer->close().then(
            [this] { _snapshot_writer.reset(); });
      });
}
C++ coroutines
seastar::future<std::string> my_coroutine() {
    co_await seastar::sleep(100ms); // returns future<>
    co_return "hello world";
}
New in C++20: three new keywords
co_await
co_yield
co_return
The language provides a future concept but not an implementation: Seastar still defines the future/promise type.
When the compiler sees a co_* keyword, the function is rewritten to stash stack variables on the heap as needed to support suspension and resumption of execution.
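For intuition, the coroutine above behaves like this hand-written continuation chain (a sketch of the effect, not the compiler's actual rewriting):
seastar::future<std::string> my_coroutine_cont() {
    return seastar::sleep(100ms).then([] {
        return seastar::make_ready_future<std::string>("hello world");
    });
}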
C++20 coroutines: after
ss::future<> consensus::stop() {
    …
    co_await _event_manager.stop();
    co_await _append_requests_buffer.stop();
    co_await _batcher.stop();
    _op_lock.broken();
    co_await _bg.close();
    if (unlikely(_snapshot_writer)) {
        co_await _snapshot_writer->close();
        _snapshot_writer.reset();
    }
}
New vs old
New (coroutines):
ss::future<> consensus::stop() {
    …
    co_await _event_manager.stop();
    co_await _append_requests_buffer.stop();
    co_await _batcher.stop();
    _op_lock.broken();
    co_await _bg.close();
    if (unlikely(_snapshot_writer)) {
        co_await _snapshot_writer->close();
        _snapshot_writer.reset();
    }
}

Old (continuations):
ss::future<> consensus::stop() {
    …
    return _event_manager.stop()
      .then([this] { return _append_requests_buffer.stop(); })
      .then([this] { return _batcher.stop(); })
      .then([this] { return _bg.close(); })
      .then([this] {
          if (likely(!_snapshot_writer)) {
              return ss::now();
          }
          return _snapshot_writer->close().then(
            [this] { _snapshot_writer.reset(); });
      });
}
Coroutine Performance
Coroutine performance depends on both the framework implementing the promise type and the compiler.
Here we talk about Seastar’s implementation and clang++.
Preview: coroutines are not transparent when it comes to performance
Frame allocations
Observation: almost every coroutine allocates
Exception: if the compiler can statically prove the coro never suspends
- No suspension points (co_await or co_yield) in the function
- A suspension point exists but is never reachable
- Suspension point is reachable but never suspends
Frame allocations 2
This coroutine:
- Never suspends
- Never even executes co_await
- ~200 instructions and ~80 cycles
- Always allocates
seastar::future<> empty_coro() {
    if (always_false) {
        co_await make_ready_future<>();
    }
}
Case study: varint decode
Let’s look at a case study drawn from Redpanda code
Decode an unsigned 32-bit varint
Encoded in 1-5 bytes; an MSB of 0 indicates the final byte
Widely used in Kafka protocol (and other places)
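detail::var_decoder is Redpanda-internal; as a rough sketch of the contract the decoders below rely on, accept() folds in one byte and reports whether it was the final one (assuming standard LEB128-style decoding; error and length-bound handling omitted):
struct var_decoder {
    uint32_t value = 0;
    int shift = 0;
    // Fold in one byte; true means the final byte (MSB == 0) was consumed
    bool accept(char c) {
        value |= uint32_t(uint8_t(c) & 0x7f) << shift;
        shift += 7;
        return (uint8_t(c) & 0x80) == 0;
    }
    uint32_t result() const { return value; }
};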
Case study: coroutine decoder
read1() is async
Almost the same as the synchronous version
Allocates once per decode
ss::future<result_type> coro_decode(input_stream& s) {
    detail::var_decoder decoder;
    while (true) {
        char c = co_await s.read1();
        if (decoder.accept(c)) {
            break;
        }
    }
    co_return decoder.result();
}
~680 instructions
~220 cycles
176 bytes allocated
Case study: continuation decoder
Much harder to read (and write)
Does not allocate
Recursion is bounded by the decoder
ss::future<result_type> cont_recurse(iobuf_reader& s, var_decoder decoder) {
    return s.read1().then([&s, decoder](char c) mutable {
        if (decoder.accept(c)) {
            return ss::make_ready_future<result_type>(decoder.result());
        }
        return cont_recurse(s, decoder);
    });
}
So is it faster?
Case study: runtime comparison
Case study: mystery method 1
Optimistic approach
Avoid any async machinery if possible
Doubles the amount of code
auto cont_tricky(iobuf_reader& s, var_decoder decoder) {
    auto f = s.read1();
    while (f.available()) {
        if (decoder.accept(f.get())) {
            return decoder.result_as_future();
        }
        f = s.read1();
    }
    return std::move(f).then([&s, decoder](char c) mutable {
        if (decoder.accept(c)) {
            return decoder.result_as_future();
        }
        return cont_tricky(s, decoder);
    });
}
Case study: mystery method 2
Synchronous version
Almost identical to the coro version
Speedup varies from 4x to 9x
auto sync_decode(input_stream& s) {
    detail::var_decoder decoder;
    while (true) {
        char c = s.read1_sync();
        if (decoder.accept(c)) {
            break;
        }
    }
    return decoder.result();
}
Sync with async fallback
So how should we really do this?
Use sync with async fallback.
Peek at 5 bytes; fall back if they are not available.
The fallback must be in its own method, so the fast path never becomes a coroutine!
auto decode_fallback(iobuf_reader& s) {
    auto [buf, filled] = s.peek<5>();
    if (filled) {
        auto result = decode_u32(buf.data());
        s.skip(result.second);
        return ss::make_ready_future<result_type>(result);
    }
    return coro_decode(s);
}
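Why the fallback must be its own method: if the co_await lived in decode_fallback itself, the whole function would become a coroutine, and even the peeked fast path would pay for a frame allocation. A sketch of that anti-pattern (decode_fused is hypothetical):
// Anti-pattern: fusing the fast path and the co_await into one function
// makes the entire function a coroutine, so the fast path allocates too.
ss::future<result_type> decode_fused(iobuf_reader& s) {
    auto [buf, filled] = s.peek<5>();
    if (filled) {
        auto result = decode_u32(buf.data());
        s.skip(result.second);
        co_return result;   // still pays the coroutine frame allocation
    }
    detail::var_decoder decoder;
    while (!decoder.accept(co_await s.read1())) {
    }
    co_return decoder.result();
}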
Performance Bottom Line
Async is still cheap in the large
- Context switches are 1,000s of cycles, large cache impact
Very short coroutines may be expensive: consider continuations
Continuations have a per-continuation cost: consider coroutines
Consider sync with async fallback
Drive the above decisions via profiling
Summary: are coroutines “async made easy”?
⚠️ C++ is not memory safe, and async makes it (even) easier to write a segfault with a careless reference. Sometimes coroutines help with this.
ℹ️ Compiler bugs: LLVM is great and things get fixed fast, but coroutines are at the “early adopter” stage. Use the latest release! (e.g. llvm/llvm-project#51843)
ℹ️ Performance: it’s complicated.
✅ Net win for maintainability and robust use of RAII, and it opens the door to future compiler optimization of async code.
Travis Downs
travis.downs@redpanda.com
@trav_downs
travisdowns
Thank you! Let’s connect.
Trade offs
Alternatives
Seastar is not the only option for writing fast async code:
■ C++/asio
■ Rust/tokio
■ Various GC language options (goroutines, Java lightweight threads)
Main difference: these do not adopt Seastar’s strict share-nothing model, do not avoid atomics, and tend to only softly bind tasks to a core (e.g. tokio does work stealing).
Possibility of hybrid approaches (e.g. use Biased Reference Counting to avoid atomics while avoiding pinning all memory to cores).
Seastar also has “alien threads” for mixing in non-async code (Redpanda uses this for Kerberos libs).
Trade offs
Using C++20 & Seastar is clear net benefit for Redpanda.
It might be right for you too if you can answer yes to one or more of these:
■ Are you starting a new project where high throughput and low latency are important?
■ Does your work decompose into shard-affine units?
■ Do you need to scale to more than a few cores?
■ Is C++ your language of choice?
What makes good high-throughput software?
Keep the disk/network fed with I/Os
Conform to the system’s topology
Not just high throughput: reliably low latency
Primary success metric: P99.9 latency
Why Redpanda?
Fast
● 10x lower tail latency vs Apache Kafka
● 6x faster transactions
● Written in C++ with async, shared nothing design
● No page cache, no virtual memory
Easy
● Fully Kafka API-compatible
● Single binary
● No JVM, no ZooKeeper
● Auto tuning & balancing
● Prometheus metrics
Efficient
● Thread-per-core architecture
● Saturates your infrastructure
● Extreme throughput
● Scales both vertically and horizontally
Cost-Effective
● Reduces Kafka infra costs by 6X
● Lower admin overhead
● Limitless data ingestion and retention without local disk
Coroutines and lifetimes: example 1
A real example: a helper function for constructing and writing a message batch, from PR #9154
ss::future<std::error_code> metadata::mark_clean(model::offset clean_offset) {
    // Construct a batch builder
    auto builder = batch_start();
    // Add one message
    builder.mark_clean(clean_offset);
    // Replicate using raft, return future for replication complete
    return builder.replicate();
}
Coroutines and lifetimes: example 1
ss::future<std::error_code> metadata::mark_clean(model::offset clean_offset) {
    auto builder = batch_start();
    builder.mark_clean(clean_offset);
    return builder.replicate();
    // … builder falls out of scope here, the returned future still references it
}

// Imagine replicate() might generate a future that captures `this`
ss::future<> batch_builder::replicate() {
    return something.then([this] {
        // update some member variable here
    });
}
Coroutines and lifetimes: example 1
ss::future<std::error_code> metadata::mark_clean(model::offset clean_offset) {
    auto builder = batch_start();
    builder.mark_clean(clean_offset);
    co_return co_await builder.replicate();
}
co_awaiting the future inline ensures it completes before the referenced object falls out of scope.
Thank you coroutines! 🎉
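The general rule this illustrates, as a hedged sketch (make_object and work are hypothetical): locals live in the coroutine frame, so they survive across the co_await and the awaited future can safely reference them.
ss::future<int> use_local() {
    auto obj = make_object();        // hypothetical local object
    co_return co_await obj.work();   // obj stays alive until work() completes
}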
Coroutines and lifetimes: example 2
// Print a string after a delay
seastar::future<> delayed_print(const std::string& msg) {
    co_await seastar::sleep(100ms);
    std::cout << "delayed_print: " << msg << std::endl;
}

// Print hello world after a delay
seastar::future<> delayed_hello_world() {
    return delayed_print(std::string("hello world!"));
}
Coroutines and lifetimes: example 2
// Print a string after a delay
seastar::future<> delayed_print(std::string msg) {
    co_await seastar::sleep(100ms);
    std::cout << "delayed_print: " << msg << std::endl;
}

// Print hello world after a delay
seastar::future<> delayed_hello_world() {
    return delayed_print(std::string("hello world!"));
}

Pass by value is not expensive in this case: temporaries are rvalues, so the argument is moved, not copied.
Always pass by value if you can, to avoid this class of issue.
Hardware evolution
Not just CPUs:
■ Disk (SSD -> NVMe)
■ Network (100Gbps, 400Gbps)
Usually partitioned for virtualized workloads
What if we want to run one high throughput application on the whole machine?