SlideShare a Scribd company logo
Mathematics and development of fast
TLS handshakes
Alexander Krizhanovsky
Tempesta Technologies, Inc.
ak@tempesta-tech.com
Web content delivery & protection
2013: WAF development by request of Positive Technologies
“Visionar” from Gartner magic quadrant’15
●
Web attacks
●
L7 HTTP/HTTPS DDoS attacks
Nginx, HAProxy, etc. - perfect HTTP proxies,
not HTTP filters
Netfilter works in TCP/IP stack (softirq)
=> HTTP(S)/TCP/IP stack
Tempesta FW:
●
hybrid of HTTP accelerator & firewall
●
embedded into the Linux TCP/IP stack
Tempesta TLS
https://netdevconf.info/0x12/session.html?kernel-tls-handhakes-for-https-ddos-mitigation
Part of Tempesta FW,
an open source
Application Delivery Controller
Open source alternative to
F5 BIG-IP or Fortinet ADC
“TLS CPS/TPS” is a
common specification for
network security appliances
& ADCs
Linux kernel TLS handshaks
Very fast light-weight Linux kernel implementation
●
...even for session resumption
●
there is modern research in the field
Resistant against DDoS on TLS handshakes (asymmetric DDoS)
Privileged address space for sensitive security data
●
Varnish: TLS is processed in separate process Hitch
http://varnish-cache.org/docs/trunk/phk/ssl.html
●
Resistance against attacks like CloudBleed
https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/
Why NIST p256?
ECDSA
●
RSA and NIST curves p256, p384, and p521 are the only allowed for
CA certificates
https://cabforum.org/baseline-requirements-documents/
●
P256 is the fastest NIST curve
●
P521 isn’t recommended by IANA
https://www.iana.org/assignments/tls-parameters/tls-parameters.xml#tls-parameters-8
●
RSA is slow and vulnerable to asymmetric DDoS
https://vincent.bernat.im/en/blog/2011-ssl-dos-mitigation
Curve25519
●
Much faster than NIST p256
●
In practice for ECDHE only
TLS libraries performance issues
Copies, memory
initializations/erasing,
memory comparisons
memcpy(), memset(),
memcmp() and their
constant-time analogs
Many dynamic allocations
Large data structures
Some math is outdated
12.79% libc-2.24.so _int_malloc
9.34% nginx(openssl) __ecp_nistz256_mul_montx
7.40% nginx(openssl) __ecp_nistz256_sqr_montx
3.54% nginx(openssl) sha256_block_data_order_avx2
2.87% nginx(openssl) ecp_nistz256_avx2_gather_w7
2.79% libc-2.24.so _int_free
2.49% libc-2.24.so malloc_consolidate
2.30% nginx(openssl) OPENSSL_cleanse
1.82% libc-2.24.so malloc
1.57% [kernel.kallsyms] do_syscall_64
1.45% libc-2.24.so free
1.32% nginx(openssl) ecp_nistz256_ord_sqr_montx
1.18% nginx(openssl) ecp_nistz256_point_doublex
1.12% nginx(openssl) __ecp_nistz256_sub_fromx
0.93% libc-2.24.so __memmove_avx_unaligned_erms
0.81% nginx(openssl) __ecp_nistz256_mul_by_2x
0.75% libc-2.24.so __memset_avx2_unaligned_erms
0.57% nginx(openssl) aesni_ecb_encrypt
0.54% nginx(openssl) ecp_nistz256_point_addx
0.54% nginx(openssl) EVP_MD_CTX_reset
0.50% [kernel.kallsyms] entry_SYSCALL_64
The source code
https://github.com/tempesta-tech/tempesta/tree/master/tls
Still in-progress: we implement some of the algorithms on our own
Initially the fork of mbed TLS 2.8.0 (https://tls.mbed.org/) - x40 faster!
●
very portable and easy to move into the kernel
●
cutting edge security
●
too many memory allocations (https://github.com/tempesta-tech/tempesta/issues/614)
●
big integer abstractions (https://github.com/tempesta-tech/tempesta/issues/1064)
●
inefficient algorithms, no architecture-specific implementations, ...
We also take parts from WolfSSL (https://github.com/wolfSSL/wolfssl/)
●
very fast, but not portable
●
security https://github.com/wolfSSL/wolfssl/issues/3184
ECDSA & ECDHE mathematics:
Tempesta TLS, OpenSSL, WolfSSL
OpenSSL 1.1.1h
256 bits ecdsa (nistp256) 36473 sign/s
256 bits ecdh (nistp256) 16620 op/s
WolfSSL (current master)
ECDSA 256 sign 43260 ops/sec (+19%)
ECDHE 256 agree 40878 ops/sec (+146%)
Tempesta TLS (full TLS handshake operation)
ECDSA sign (nistp256): ops/s=38393
ECDHE srv (nistp256): ops/s=13418
OpenSSL & WolfSSL don’t include ephemeral keys generation
(one more m * G operation)
Demo!
Tempesta TLS, Nginx-1.14.2/OpenSSL-1.1.1d, Nginx-1.17.8/WolfSSL
TLS 1.2
●
full handshakes
●
abbreviated handshakes
tls-perf
https://github.com/tempesta-tech/tls-perf
●
establish & drop many TLS connections in parallel
●
like TLS-THC-DOS, but faster, more flexible, more options
Data for proprietary vendors
BIG-IP is only 30-50% faster than Nginx/OpenSSL/DPDK
https://www.youtube.com/watch?v=Plv87h8GtLc
Avi Vantage (VMware) makes ~2000 handshakes/second per 1CPU
https://avinetworks.com/docs/latest/ssl-performance/
Why faster?
No memory allocations in run time
No context switches
No copies on socket I/O
Less message queues
Zero-copy handshakes state machine
https://netdevconf.info/0x12/session.html?kernel-tls-handhakes-for-https-ddos-mitigation
State of the art cryptography mathematics
Elliptic curve cryptography
Secp256r1: y2
= x3
- 3x + b defined over the field GF(p)
p256 = 2256
– 2224
+ 2192
+ 296
- 1
The group law
● negatives: if P = (x, y),
then -P = (x, -y)
● addition: R = P + Q
● doubling: R = P + P = 2*P
ECDSA: k – secure random, G – known point
k * G is used for the signature
ECDH: d – private key, Q – public key
shared secret: d * Q
Point multiplication
OpenSSL: “Fast prime field elliptic-curve cryptography with 256-bit primes" by Gueron and Krasnov
Q = m * P - the most expensive elliptic curve operation
for i in bits(m):
Q ← point_double(Q)
if mi == 1:
Q ← point_add(Q, P)
Point multiplications in TLS handshake:
●
known point multiplication: precompute the table for doubled G
●
perfect forward secrecy ECDHE: generate keys G * d (d – random)
●
handshake: 2 known & 1 unknown point multiplications
Point representation and coordinate systems
http://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html
“Analysis and optimization of elliptic-curve single-scalar multiplication”, Bernstein & Lange,
2007
Jacobian coordinates (rough estimations)
● conversion overhead: 39*M + 4*S + 3*I (for w(-indow) = 4)
● point addition (mixed) - 8*M + 3*S, doubling - 2*M + 4*S
Affine coordinates (rough estimations)
● point addition - 13*M + 4*S, doubling - 4*M + 5*S
NIST 256 bits, D = 256 / w = 64 Comb rounds (addition & doubling):
64 * (10*M + 7*S) << 64 * (17*M + 9*S)
Point addition
http://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html#addition-add-2007-bl
ex. addition in Jacobian coordinates (cost: 11M + 5S)
A = (x1, y1, z1), B = (x2, y2, z2), then C = A + B = (x3, y3, z3) is
U1 = X1Z2
2
U2 = X2Z1
2
S1 = Y1Z2
3
S2 = Y2Z1
3
H = U2 - U1
R = S2 - S1
z3 = HZ1Z2
x3 = R2
- H3
– 2U1H2
y3 = (U1H2
- X3)R - S1H3
The cost
Modular multiplication (M) is the most expensive basic scalar operation
Modular squaring (S) faster than M, usually 0.8M (Montgomery)
(0.9 for optimized FIPS due to more expensive modular reduction)
Modular inversion (I) is very expensive, about 100M
Modular arithmetics
ex. prime field F(29)
addition: 17 + 20 = 8 since 37 mod 29 = 8
subtraction: 17 − 20 = 26 since −3 mod 29 = 26
multiplication: 17 * 20 = 21 since 340 mod 29 = 21
inversion: 17−1
= 12 since 17 · 12 mod 29 = 1
Montgomery reduction (the most used)
●
there is some overhead, but each modular operation is cheaper
FIPS reduction
●
Can be faster if small number of modular operations is used
●
There are optimization techniques,
e.g.“Low-Latency Elliptic Curve Scalar Multiplication” Bos, 2012
●
But still about 65% slower than Montgomery reduction
Montgomery multiplication in P256
“Montgomery Multiplication”, Henry S. Warren, Jr.
Fast 256-bit integer multiplication with modular reduction on P256
a, b < m (m - modulus P256)
Set n = 2256
Transform multipliers to Montgomery domain (overhead):
a’ = an mod m b’ = bn mod m
Fast multiplication with reduction: u = a’b’/n mod m
● compute only 256 bits of (a’b’ + (-m-1
a’b’ mod n)m)/n
● if u > m, then u ← u – m (unconditionally, carry as a mask)
Convert to ordinary number: v = un -1
mod m
The math layers
Different multiplication algorithms for fixed and unknown point
●
”Efficient Fixed Base Exponentiation and Scalar Multiplication based on a Multiplicative
Splitting Exponent Recoding”, Robert et al 2019
Point doubling and addition - everyone seems use the same algorithms
Jacobian coordinates: different modular inversion algorithms
●
“Fast constant-time gcd computation and modular inversion”, Bernstein et al, 2019
Modular reduction for scalar multiplications:
●
Montgomery has overhead vs FIPS speed: if we use less multiplcations it could make
sense to use different reduction method FIPS (seems deadend)
●
“Low-Latency Elliptic Curve Scalar Multiplication” Bos, 2012
=> Balance between all the layers
Example of math layers balancing
For w=5 we need 52 point additions for an unknown point multiplication
Jacobian coordinates addition takes 11M + 5S
Affine-Jacobian coordinates addition takes 8M + 3S
● about 4.4M cheaper if S = 0.8M
●
requires 2 coordinates normalizations for Comba precomputation
● coordinates normalization: 2w - 1
* (6M + 1S) + 1I
Almost the same for S = 0.9M and fast inversion I < 100M
Montgomery arithmetics (S = 0.8M):
●
ECDHE +28% and ECDSA +6% performance
Side channel attacks (SCA) resistance
Timing attacks, simple power analysis, differential power analysis etc.
Protections against SCAs:
●
Constant time algorithms
●
Dummy operations
●
Point randomization
●
e.g. modular inversion 741 vs 266 iterations
RDRAND allows to write faster non-constant time algorithms
●
SRBDS mitigation costs about 97% performance
https://software.intel.com/security-software-guidance/insights/processors-affected-special-register-buffe
r-data-sampling?fbclid=IwAR1ifj3ZuAtNOabKkj3vFItBLSvOnMqlxH2I-QeN5KB-aji54J1BCJa9lLk
https://www.phoronix.com/scan.php?page=news_item&px=RdRand-3-Percent&fbclid=IwAR2vmmR_Lir
oekUuw7KMRaHB7KThpqz0tIr1fX2GCW3HAvwt5Kb1p9xpLKo
Memory usage & SCA
ex. ECDSA precomputed table for fixed point multiplication
●
mbed TLS: ~8KB dynamically precomputed table, point
randomization, constant-time algorithm, full table scan
●
OpenSSL: ~150KB static table, full scan
●
WolfSSL: ~150KB, direct index access (fixed in the new version)
https://github.com/wolfSSL/wolfssl/issues/3184
=> 150KB is far larger than L1d cache size, so many cache misses:
Big Integers (aka MPIs)
“BigNum Math: Implementing Cryptographic Multiple Precision Arithmetic”, by Tom St Denis
All the libraries use them (not in hot paths), mbed TLS overuses them
linux/lib/mpi/, linux/include/linux/mpi.h
typedef unsigned long int mpi_limb_t;
struct gcry_mpi {
int alloced; /* array size (# of allocated limbs) */
int nlimbs; /* number of valid limbs */
int nbits; /* the real number of valid bits (info only) */
int sign; /* indicates a negative number */
unsigned flags;
mpi_limb_t *d; /* array with the limbs */
};
Need to manage variable-size integers
=> size-specific assembly implemetations
Easy assembly
// a := a + b
// x[0] is the less significant limb,
// x[1] is the most significant limb.
void s_mp_add(unsigned long *a, unsigned long *b) {
unsigned long carry;
a[0] += b[0];
carry = (a[0] < b[0]);
a[1] += b[1] + carry;
}
// Pointer to a is in %RDI, pointer to b is in %RSI
movq (%rdi), %r8
movq 8(%rdi), %r9
addq (%rsi), %r8 // add with carry
addc 8(%rsi), %r9 // use the carry in the next addition
movq (%r8), (%rdi)
movq (%r9), 8(%rdi)
Open questions and further research
Ice Lake CPUs have negligeable downclocking on AVX-512
https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html
Parallel Montgomery computations
J.W.Bos, "Montgomery Arithmetic from a Software Perspective", 2017
●
SIMD multiplications & squarings of two and more products
●
Interleaved Montgomery multiplications
Better methods for point multiplications
Going to the Linux kernel upstream
Details and the discussion
●
https://netdevconf.info/0x14/session.html?talk-performance-study-of-kernel-TLS-handshakes
●
https://github.com/tempesta-tech/tempesta/issues/1433
Server-side only
The full TLS handshake is in softirq (just like TCP)
Fallback to a user-space TLS library on ClientHello
Batches of handshakes in 1 FPU context
TODO
More cryptography mathematics performance optimizations
https://github.com/tempesta-tech/tempesta/issues/1064
https://github.com/tempesta-tech/tempesta/issues/1335
TLS 1.3
https://github.com/tempesta-tech/tempesta/issues/1031
Moving to the kernel asymmetric keys API
https://github.com/tempesta-tech/tempesta/issues/1332
The Linux kernel /crypto API performance issues
●
SHA-256 (crucial for TLS handshake) 30-100% slower than OpenSSL
https://github.com/tempesta-tech/tempesta/issues/1483
●
Extra copying and memory allocations in kTLS
https://github.com/tempesta-tech/tempesta/issues/1064
Netdev papers about Tempesta TLS
“Kernel HTTP/TCP/IP stack for HTTP DDoS mitigation”, Netdev 2.1,
https://netdevconf.info/2.1/session.html?krizhanovsky
“Kernel TLS handshakes for HTTPS DDoS mitigation”, Netdev 0x12,
https://netdevconf.info/0x12/session.html?kernel-tls-handhakes-for-https-ddos-mitigation
“Performance study of kernel TLS handshakes”, Netdev 014,
https://netdevconf.info/0x14/session.html?talk-performance-study-of-kernel-TLS-handsh
akes
Thanks!
Contact us if you’re interested in fast Linux kernel TLS!
https://github.com/tempesta-tech/tempesta
ak@tempesta-tech.com

More Related Content

What's hot

【優秀修了】GCI Winter 2022 最終課題
【優秀修了】GCI Winter 2022 最終課題【優秀修了】GCI Winter 2022 最終課題
【優秀修了】GCI Winter 2022 最終課題
Takeru Abe
 
ゼロから作るDeepLearning 2~3章 輪読
ゼロから作るDeepLearning 2~3章 輪読ゼロから作るDeepLearning 2~3章 輪読
ゼロから作るDeepLearning 2~3章 輪読
KCS Keio Computer Society
 
逆説のスタートアップ思考的「逆張りワークショップ」手順書
逆説のスタートアップ思考的「逆張りワークショップ」手順書逆説のスタートアップ思考的「逆張りワークショップ」手順書
逆説のスタートアップ思考的「逆張りワークショップ」手順書
Takaaki Umada
 
GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)
GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)
GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)
智啓 出川
 
200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司
200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司
200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司
schoowebcampus
 
【BERT】自然言語処理を用いたレビュー分析
【BERT】自然言語処理を用いたレビュー分析【BERT】自然言語処理を用いたレビュー分析
【BERT】自然言語処理を用いたレビュー分析
KazuyaYagihashi
 
Design Sprint Process / デザインスプリントの実際のプロセスについて
Design Sprint Process / デザインスプリントの実際のプロセスについてDesign Sprint Process / デザインスプリントの実際のプロセスについて
Design Sprint Process / デザインスプリントの実際のプロセスについて
Takaaki Umada
 
on log messages
on log messageson log messages
on log messages
Laurence Chen
 
NLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライド
NLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライドNLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライド
NLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライド
JunSuzuki21
 
変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす
変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす
変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす
Takaaki Umada
 
GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜
GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜
GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜
Megagon Labs
 
ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)
ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)
ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)
YoshiakiSeigenji
 
世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc
世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc
世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc
Yahoo!デベロッパーネットワーク
 
Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み )
Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み)Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み)
Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み )
Takaaki Umada
 
変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する
変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する
変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する
Takaaki Umada
 
Design Sprint 概要 / デザインスプリント概要
Design Sprint 概要 / デザインスプリント概要Design Sprint 概要 / デザインスプリント概要
Design Sprint 概要 / デザインスプリント概要
Takaaki Umada
 
Vision driven workshop-ビジョンを具現化する技法
Vision driven workshop-ビジョンを具現化する技法Vision driven workshop-ビジョンを具現化する技法
Vision driven workshop-ビジョンを具現化する技法
kunitake saso
 
機械学習品質管理・保証の動向と取り組み
機械学習品質管理・保証の動向と取り組み機械学習品質管理・保証の動向と取り組み
機械学習品質管理・保証の動向と取り組み
Shintaro Fukushima
 
ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活
Takaaki Umada
 
漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)
漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)
漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)
Hideo Hirose
 

What's hot (20)

【優秀修了】GCI Winter 2022 最終課題
【優秀修了】GCI Winter 2022 最終課題【優秀修了】GCI Winter 2022 最終課題
【優秀修了】GCI Winter 2022 最終課題
 
ゼロから作るDeepLearning 2~3章 輪読
ゼロから作るDeepLearning 2~3章 輪読ゼロから作るDeepLearning 2~3章 輪読
ゼロから作るDeepLearning 2~3章 輪読
 
逆説のスタートアップ思考的「逆張りワークショップ」手順書
逆説のスタートアップ思考的「逆張りワークショップ」手順書逆説のスタートアップ思考的「逆張りワークショップ」手順書
逆説のスタートアップ思考的「逆張りワークショップ」手順書
 
GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)
GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)
GPGPU Seminar (Accelerataion of Lattice Boltzmann Method using CUDA Fortran)
 
200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司
200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司
200以上のwebサービス事例から見えてきた鉄板グロースハック ~傾向と対策~ 先生:須藤 憲司
 
【BERT】自然言語処理を用いたレビュー分析
【BERT】自然言語処理を用いたレビュー分析【BERT】自然言語処理を用いたレビュー分析
【BERT】自然言語処理を用いたレビュー分析
 
Design Sprint Process / デザインスプリントの実際のプロセスについて
Design Sprint Process / デザインスプリントの実際のプロセスについてDesign Sprint Process / デザインスプリントの実際のプロセスについて
Design Sprint Process / デザインスプリントの実際のプロセスについて
 
on log messages
on log messageson log messages
on log messages
 
NLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライド
NLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライドNLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライド
NLP2023 緊急パネル:ChatGPTで自然言語処理は終わるのか? 説明スライド
 
変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす
変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす
変革のためのスタートアップ思考 (2) / スタートアップの考え方を新規事業創出とスタートアップ連携に活かす
 
GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜
GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜
GiNZAで始める日本語依存構造解析 〜CaboCha, UDPipe, Stanford NLPとの比較〜
 
ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)
ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)
ChatGPTプロンプトエンジニアリング試験の話 (230619 ChatGPT触りまくってるBizのためのMeetup#1)
 
世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc
世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc
世界最速クラスのベクトル近傍検索NGTと深層学習によるファッション検索への応用事例 / YJTC19 in Shibuya B-3 #yjtc
 
Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み )
Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み)Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み)
Y Combinator に学ぶスタートアップ強化プログラム (3 か月間でスタートアップを成長させる Accelerator Program の仕組み )
 
変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する
変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する
変革のためのスタートアップ思考 (1) / スタートアップの考え方を理解する
 
Design Sprint 概要 / デザインスプリント概要
Design Sprint 概要 / デザインスプリント概要Design Sprint 概要 / デザインスプリント概要
Design Sprint 概要 / デザインスプリント概要
 
Vision driven workshop-ビジョンを具現化する技法
Vision driven workshop-ビジョンを具現化する技法Vision driven workshop-ビジョンを具現化する技法
Vision driven workshop-ビジョンを具現化する技法
 
機械学習品質管理・保証の動向と取り組み
機械学習品質管理・保証の動向と取り組み機械学習品質管理・保証の動向と取り組み
機械学習品質管理・保証の動向と取り組み
 
ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活ゼロからはじめるプロダクトマネージャー生活
ゼロからはじめるプロダクトマネージャー生活
 
漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)
漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)
漸近理論をスライド1枚で(フォローアッププログラムクラス講義07132016)
 

Similar to Mathematics and development of fast TLS handshakes

Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Intel® Software
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Intel® Software
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
Facultad de Informática UCM
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
prithan
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
Intel® Software
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
Afanasov14flynet slides
Afanasov14flynet slidesAfanasov14flynet slides
Afanasov14flynet slides
Mikhail Afanasov
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
Gabriela Agustini
 
Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?
Davide Carboni
 
Js2517181724
Js2517181724Js2517181724
Js2517181724
IJERA Editor
 
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
RSIS International
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Naoki (Neo) SATO
 
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemAccelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Shuai Yuan
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone? DVClub
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 

Similar to Mathematics and development of fast TLS handshakes (20)

Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Afanasov14flynet slides
Afanasov14flynet slidesAfanasov14flynet slides
Afanasov14flynet slides
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?
 
Js2517181724
Js2517181724Js2517181724
Js2517181724
 
Js2517181724
Js2517181724Js2517181724
Js2517181724
 
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemAccelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone?
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 

Recently uploaded

Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
vrstrong314
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 

Recently uploaded (20)

Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 

Mathematics and development of fast TLS handshakes

  • 1. Mathematics and development of fast TLS handshakes Alexander Krizhanovsky Tempesta Technologies, Inc. ak@tempesta-tech.com
  • 2. Web content delivery & protection 2013: WAF development by request of Positive Technologies “Visionar” from Gartner magic quadrant’15 ● Web attacks ● L7 HTTP/HTTPS DDoS attacks Nginx, HAProxy, etc. - perfect HTTP proxies, not HTTP filters Netfilter works in TCP/IP stack (softirq) => HTTP(S)/TCP/IP stack Tempesta FW: ● hybrid of HTTP accelerator & firewall ● embedded into the Linux TCP/IP stack
  • 3. Tempesta TLS https://netdevconf.info/0x12/session.html?kernel-tls-handhakes-for-https-ddos-mitigation Part of Tempesta FW, an open source Application Delivery Controller Open source alternative to F5 BIG-IP or Fortinet ADC “TLS CPS/TPS” is a common specification for network security appliances & ADCs
  • 4. Linux kernel TLS handshaks Very fast light-weight Linux kernel implementation ● ...even for session resumption ● there is modern research in the field Resistant against DDoS on TLS handshakes (asymmetric DDoS) Privileged address space for sensitive security data ● Varnish: TLS is processed in separate process Hitch http://varnish-cache.org/docs/trunk/phk/ssl.html ● Resistance against attacks like CloudBleed https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/
  • 5. Why NIST p256? ECDSA ● RSA and NIST curves p256, p384, and p521 are the only allowed for CA certificates https://cabforum.org/baseline-requirements-documents/ ● P256 is the fastest NIST curve ● P521 isn’t recommended by IANA https://www.iana.org/assignments/tls-parameters/tls-parameters.xml#tls-parameters-8 ● RSA is slow and vulnerable to asymmetric DDoS https://vincent.bernat.im/en/blog/2011-ssl-dos-mitigation Curve25519 ● Much faster than NIST p256 ● In practice for ECDHE only
  • 6. TLS libraries performance issues Copies, memory initializations/erasing, memory comparisons memcpy(), memset(), memcmp() and their constant-time analogs Many dynamic allocations Large data structures Some math is outdated 12.79% libc-2.24.so _int_malloc 9.34% nginx(openssl) __ecp_nistz256_mul_montx 7.40% nginx(openssl) __ecp_nistz256_sqr_montx 3.54% nginx(openssl) sha256_block_data_order_avx2 2.87% nginx(openssl) ecp_nistz256_avx2_gather_w7 2.79% libc-2.24.so _int_free 2.49% libc-2.24.so malloc_consolidate 2.30% nginx(openssl) OPENSSL_cleanse 1.82% libc-2.24.so malloc 1.57% [kernel.kallsyms] do_syscall_64 1.45% libc-2.24.so free 1.32% nginx(openssl) ecp_nistz256_ord_sqr_montx 1.18% nginx(openssl) ecp_nistz256_point_doublex 1.12% nginx(openssl) __ecp_nistz256_sub_fromx 0.93% libc-2.24.so __memmove_avx_unaligned_erms 0.81% nginx(openssl) __ecp_nistz256_mul_by_2x 0.75% libc-2.24.so __memset_avx2_unaligned_erms 0.57% nginx(openssl) aesni_ecb_encrypt 0.54% nginx(openssl) ecp_nistz256_point_addx 0.54% nginx(openssl) EVP_MD_CTX_reset 0.50% [kernel.kallsyms] entry_SYSCALL_64
  • 7. The source code https://github.com/tempesta-tech/tempesta/tree/master/tls Still in-progress: we implement some of the algorithms on our own Initially the fork of mbed TLS 2.8.0 (https://tls.mbed.org/) - x40 faster! ● very portable and easy to move into the kernel ● cutting edge security ● too many memory allocations (https://github.com/tempesta-tech/tempesta/issues/614) ● big integer abstractions (https://github.com/tempesta-tech/tempesta/issues/1064) ● inefficient algorithms, no architecture-specific implementations, ... We also take parts from WolfSSL (https://github.com/wolfSSL/wolfssl/) ● very fast, but not portable ● security https://github.com/wolfSSL/wolfssl/issues/3184
  • 8. ECDSA & ECDHE mathematics: Tempesta TLS, OpenSSL, WolfSSL OpenSSL 1.1.1h 256 bits ecdsa (nistp256) 36473 sign/s 256 bits ecdh (nistp256) 16620 op/s WolfSSL (current master) ECDSA 256 sign 43260 ops/sec (+19%) ECDHE 256 agree 40878 ops/sec (+146%) Tempesta TLS (full TLS handshake operation) ECDSA sign (nistp256): ops/s=38393 ECDHE srv (nistp256): ops/s=13418 OpenSSL & WolfSSL don’t include ephemeral keys generation (one more m * G operation)
  • 9. Demo! Tempesta TLS, Nginx-1.14.2/OpenSSL-1.1.1d, Nginx-1.17.8/WolfSSL TLS 1.2 ● full handshakes ● abbreviated handshakes tls-perf https://github.com/tempesta-tech/tls-perf ● establish & drop many TLS connections in parallel ● like TLS-THC-DOS, but faster, more flexible, more options
  • 10. Data for proprietary vendors BIG-IP is only 30-50% faster than Nginx/OpenSSL/DPDK https://www.youtube.com/watch?v=Plv87h8GtLc Avi Vantage (VMware) makes ~2000 handshakes/second per 1CPU https://avinetworks.com/docs/latest/ssl-performance/
  • 11. Why faster? No memory allocations in run time No context switches No copies on socket I/O Less message queues Zero-copy handshakes state machine https://netdevconf.info/0x12/session.html?kernel-tls-handhakes-for-https-ddos-mitigation State of the art cryptography mathematics
  • 12. Elliptic curve cryptography Secp256r1: y2 = x3 - 3x + b defined over the field GF(p) p256 = 2256 – 2224 + 2192 + 296 - 1 The group law ● negatives: if P = (x, y), then -P = (x, -y) ● addition: R = P + Q ● doubling: R = P + P = 2*P ECDSA: k – secure random, G – known point k * G is used for the signature ECDH: d – private key, Q – public key shared secret: d * Q
  • 13. Point multiplication OpenSSL: “Fast prime field elliptic-curve cryptography with 256-bit primes" by Gueron and Krasnov Q = m * P - the most expensive elliptic curve operation for i in bits(m): Q ← point_double(Q) if mi == 1: Q ← point_add(Q, P) Point multiplications in TLS handshake: ● known point multiplication: precompute the table for doubled G ● perfect forward secrecy ECDHE: generate keys G * d (d – random) ● handshake: 2 known & 1 unknown point multiplications
  • 14. Point representation and coordinate systems http://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html “Analysis and optimization of elliptic-curve single-scalar multiplication”, Bernstein & Lange, 2007 Jacobian coordinates (rough estimations) ● conversion overhead: 39*M + 4*S + 3*I (for w(-indow) = 4) ● point addition (mixed) - 8*M + 3*S, doubling - 2*M + 4*S Affine coordinates (rough estimations) ● point addition - 13*M + 4*S, doubling - 4*M + 5*S NIST 256 bits, D = 256 / w = 64 Comb rounds (addition & doubling): 64 * (10*M + 7*S) << 64 * (17*M + 9*S)
  • 15. Point addition http://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html#addition-add-2007-bl ex. addition in Jacobian coordinates (cost: 11M + 5S) A = (x1, y1, z1), B = (x2, y2, z2), then C = A + B = (x3, y3, z3) is U1 = X1Z2 2 U2 = X2Z1 2 S1 = Y1Z2 3 S2 = Y2Z1 3 H = U2 - U1 R = S2 - S1 z3 = HZ1Z2 x3 = R2 - H3 – 2U1H2 y3 = (U1H2 - X3)R - S1H3
  • 16. The cost Modular multiplication (M) is the most expensive basic scalar operation Modular squaring (S) faster than M, usually 0.8M (Montgomery) (0.9 for optimized FIPS due to more expensive modular reduction) Modular inversion (I) is very expensive, about 100M
  • 17. Modular arithmetics ex. prime field F(29) addition: 17 + 20 = 8 since 37 mod 29 = 8 subtraction: 17 − 20 = 26 since −3 mod 29 = 26 multiplication: 17 * 20 = 21 since 340 mod 29 = 21 inversion: 17−1 = 12 since 17 · 12 mod 29 = 1 Montgomery reduction (the most used) ● there is some overhead, but each modular operation is cheaper FIPS reduction ● Can be faster if small number of modular operations is used ● There are optimization techniques, e.g.“Low-Latency Elliptic Curve Scalar Multiplication” Bos, 2012 ● But still about 65% slower than Montgomery reduction
  • 18. Montgomery multiplication in P256 “Montgomery Multiplication”, Henry S. Warren, Jr. Fast 256-bit integer multiplication with modular reduction on P256 a, b < m (m - modulus P256) Set n = 2256 Transform multipliers to Montgomery domain (overhead): a’ = an mod m b’ = bn mod m Fast multiplication with reduction: u = a’b’/n mod m ● compute only 256 bits of (a’b’ + (-m-1 a’b’ mod n)m)/n ● if u > m, then u ← u – m (unconditionally, carry as a mask) Convert to ordinary number: v = un -1 mod m
  • 19. The math layers Different multiplication algorithms for fixed and unknown point ● ”Efficient Fixed Base Exponentiation and Scalar Multiplication based on a Multiplicative Splitting Exponent Recoding”, Robert et al 2019 Point doubling and addition - everyone seems use the same algorithms Jacobian coordinates: different modular inversion algorithms ● “Fast constant-time gcd computation and modular inversion”, Bernstein et al, 2019 Modular reduction for scalar multiplications: ● Montgomery has overhead vs FIPS speed: if we use less multiplcations it could make sense to use different reduction method FIPS (seems deadend) ● “Low-Latency Elliptic Curve Scalar Multiplication” Bos, 2012 => Balance between all the layers
  • 20. Example of math layers balancing For w=5 we need 52 point additions for an unknown point multiplication Jacobian coordinates addition takes 11M + 5S Affine-Jacobian coordinates addition takes 8M + 3S ● about 4.4M cheaper if S = 0.8M ● requires 2 coordinates normalizations for Comba precomputation ● coordinates normalization: 2w - 1 * (6M + 1S) + 1I Almost the same for S = 0.9M and fast inversion I < 100M Montgomery arithmetics (S = 0.8M): ● ECDHE +28% and ECDSA +6% performance
  • 21. Side channel attacks (SCA) resistance Timing attacks, simple power analysis, differential power analysis etc. Protections against SCAs: ● Constant time algorithms ● Dummy operations ● Point randomization ● e.g. modular inversion 741 vs 266 iterations RDRAND allows to write faster non-constant time algorithms ● SRBDS mitigation costs about 97% performance https://software.intel.com/security-software-guidance/insights/processors-affected-special-register-buffe r-data-sampling?fbclid=IwAR1ifj3ZuAtNOabKkj3vFItBLSvOnMqlxH2I-QeN5KB-aji54J1BCJa9lLk https://www.phoronix.com/scan.php?page=news_item&px=RdRand-3-Percent&fbclid=IwAR2vmmR_Lir oekUuw7KMRaHB7KThpqz0tIr1fX2GCW3HAvwt5Kb1p9xpLKo
  • 22. Memory usage & SCA ex. ECDSA precomputed table for fixed point multiplication ● mbed TLS: ~8KB dynamically precomputed table, point randomization, constant-time algorithm, full table scan ● OpenSSL: ~150KB static table, full scan ● WolfSSL: ~150KB, direct index access (fixed in the new version) https://github.com/wolfSSL/wolfssl/issues/3184 => 150KB is far larger than L1d cache size, so many cache misses:
  • 23. Big Integers (aka MPIs) “BigNum Math: Implementing Cryptographic Multiple Precision Arithmetic”, by Tom St Denis All the libraries use them (not in hot paths), mbed TLS overuses them linux/lib/mpi/, linux/include/linux/mpi.h typedef unsigned long int mpi_limb_t; struct gcry_mpi { int alloced; /* array size (# of allocated limbs) */ int nlimbs; /* number of valid limbs */ int nbits; /* the real number of valid bits (info only) */ int sign; /* indicates a negative number */ unsigned flags; mpi_limb_t *d; /* array with the limbs */ }; Need to manage variable-size integers => size-specific assembly implemetations
  • 24. Easy assembly // a := a + b // x[0] is the less significant limb, // x[1] is the most significant limb. void s_mp_add(unsigned long *a, unsigned long *b) { unsigned long carry; a[0] += b[0]; carry = (a[0] < b[0]); a[1] += b[1] + carry; } // Pointer to a is in %RDI, pointer to b is in %RSI movq (%rdi), %r8 movq 8(%rdi), %r9 addq (%rsi), %r8 // add with carry addc 8(%rsi), %r9 // use the carry in the next addition movq (%r8), (%rdi) movq (%r9), 8(%rdi)
  • 25. Open questions and further research Ice Lake CPUs have negligeable downclocking on AVX-512 https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html Parallel Montgomery computations J.W.Bos, "Montgomery Arithmetic from a Software Perspective", 2017 ● SIMD multiplications & squarings of two and more products ● Interleaved Montgomery multiplications Better methods for point multiplications
  • 26. Going to the Linux kernel upstream Details and the discussion ● https://netdevconf.info/0x14/session.html?talk-performance-study-of-kernel-TLS-handshakes ● https://github.com/tempesta-tech/tempesta/issues/1433 Server-side only The full TLS handshake is in softirq (just like TCP) Fallback to a user-space TLS library on ClientHello Batches of handshakes in 1 FPU context
  • 27. TODO More cryptography mathematics performance optimizations https://github.com/tempesta-tech/tempesta/issues/1064 https://github.com/tempesta-tech/tempesta/issues/1335 TLS 1.3 https://github.com/tempesta-tech/tempesta/issues/1031 Moving to the kernel asymmetric keys API https://github.com/tempesta-tech/tempesta/issues/1332 The Linux kernel /crypto API performance issues ● SHA-256 (crucial for TLS handshake) 30-100% slower than OpenSSL https://github.com/tempesta-tech/tempesta/issues/1483 ● Extra copying and memory allocations in kTLS https://github.com/tempesta-tech/tempesta/issues/1064
  • 28. Netdev papers about Tempesta TLS “Kernel HTTP/TCP/IP stack for HTTP DDoS mitigation”, Netdev 2.1, https://netdevconf.info/2.1/session.html?krizhanovsky “Kernel TLS handshakes for HTTPS DDoS mitigation”, Netdev 0x12, https://netdevconf.info/0x12/session.html?kernel-tls-handhakes-for-https-ddos-mitigation “Performance study of kernel TLS handshakes”, Netdev 014, https://netdevconf.info/0x14/session.html?talk-performance-study-of-kernel-TLS-handsh akes
  • 29. Thanks! Contact us if you’re interested in fast Linux kernel TLS! https://github.com/tempesta-tech/tempesta ak@tempesta-tech.com