Arnaud Bouchez - Synopse
Frameworks
Tuning
Arnaud Bouchez
• Open Source Founder
mORMot 2
SynPDF, dmustache
• Modern Delphi and FPC
DDD, SOA, ORM, MVC
Performance, SOLID
• Synopse
https://synopse.info
Arnaud Bouchez
dev @ https://tranquil.it/wapt
SW deployment Windows updates
IT Inventory
+7000 sw packages
+400 customers
+1 300 000 PCs equipped
Frameworks Tuning
not expressiveness (yesterday)
not exhaustive
not Delphi centric
Languages & Frameworks
Performance Bottlenecks
• The Web Server
• The Database layer
• The threading/execution Model
• The RTL (JSON, heap…)
Disclaimer:
slide borrowed from yesterday’s session
Menu du jour
• The TFB Challenge
• Async Web Server
• Async Database Access
• JSON, RTTI, Mustache
The TFB Challenge
https://www.techempower.com/benchmarks
The TFB Challenge
Web Frameworks Benchmarks
• Since 2013, a collaborative project
• Now one official round per year
• Hundreds of frameworks tested
• Seven web /endpoints tested
• 24/7 continuous tests on dedicated HW
The TFB Challenge
One official round per year
• Round 22 is just finished
• mORMot is #12 out of 301 frameworks!
(with current weighting)
The TFB Challenge
Hundreds of frameworks tested
The TFB Challenge
Seven web /endpoints tested
/plaintext
/json
/db
/query?queries=###
/cached_queries?count=###
/update?queries=###
/fortunes
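To make the two simplest endpoints concrete, here is a minimal Python sketch of what /plaintext and /json are expected to return, as far as I understand the TFB requirements. The `handle` function is a hypothetical name for illustration only; it is not part of mORMot or of the TFB tooling.

```python
import json


def handle(path: str) -> tuple[str, bytes]:
    """Return (content_type, body) for the two simplest TFB endpoints."""
    if path == "/plaintext":
        return "text/plain", b"Hello, World!"
    if path == "/json":
        # The object is serialized again on every request, not cached.
        body = json.dumps({"message": "Hello, World!"}).encode()
        return "application/json", body
    raise KeyError(path)
```

The remaining endpoints add database reads, updates and HTML templating on top of this, which is where the frameworks start to differ.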
The TFB Challenge
Seven web /endpoints tested
• Reproduce on my dev laptop
• Compiled on Linux with Lazarus
• Run locally on the very same machine
The TFB Challenge
Gimme Numbers: Requests Per Sec
The TFB Challenge
endpoint                          rps % vs #1
/plaintext                        99.4%
/json                             91.8%
/db (single SELECT potential)     59.7%
/query?queries=###                89.7%
/cached_queries?count=###         99.8%
/update?queries=###               81.5%
/fortunes                         59.5%
The TFB Challenge
24/7 continuous tests on big HW
Latest runs are available at
https://tfb-status.techempower.com
The TFB Challenge
24/7 continuous tests on big HW
• Several strategies to fill all CPU cores
pin to CPU, several bound servers, threads
• Several strategies for database access
ORM, direct, async
• A lot of tests & trials
guessing is usually wrong
no definitive/logical rules
The TFB Challenge
Demanding but not realistic
• Measured with wrk over a single endpoint
• Such perfect/optimal network does not exist
• Very small dataset
• No long-running tests
• No memory consumption measured
Async Web Server
Async Web Server
Several Web Servers
• THttpServer
1 thread per HTTP/1.1 client
thread pool for HTTP/1.0 requests (proxy)
• THttpApiServer
Windows-specific – using http.sys API
• THttpAsyncServer
thread pool for all requests
Async Web Server
Several WebSockets Servers
• TWebSocketServerRest
1 thread per WebSockets client
• THttpApiWebSocketServer
Windows-specific – using http.sys API
• TWebSocketAsyncServerRest
thread pool for all requests
Async Web Server
THttpAsyncServer
• Non-blocking read/write state machine
Using epoll API on Linux
• Based on abstract THttpAsyncConnections
HTTP-over-TCP per-connection architecture
• Optional TLS layer
with OpenSSL or Windows SSPI
and ACME/Let’s Encrypt built-in support
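The idea of a non-blocking per-connection state machine can be sketched in a few lines of Python. This is only an illustration of the technique, not the THttpAsyncServer implementation: the `Connection` class and its `feed` method are hypothetical names, and a real server would call `feed` from its epoll loop each time a socket becomes readable.

```python
from enum import Enum, auto


class State(Enum):
    HEADERS = auto()
    BODY = auto()
    DONE = auto()


class Connection:
    """Per-connection parser: consumes whatever bytes arrived, never waits."""

    def __init__(self):
        self.state = State.HEADERS
        self.buf = bytearray()
        self.request_line = b""
        self.headers = {}
        self.body = b""
        self._need = 0

    def feed(self, data: bytes) -> State:
        self.buf += data
        if self.state is State.HEADERS:
            end = self.buf.find(b"\r\n\r\n")
            if end < 0:
                return self.state  # headers incomplete: return to the event loop
            lines = bytes(self.buf[:end]).split(b"\r\n")
            del self.buf[:end + 4]
            self.request_line = lines[0]
            for line in lines[1:]:
                name, _, value = line.partition(b":")
                self.headers[name.strip().lower()] = value.strip()
            self._need = int(self.headers.get(b"content-length", b"0"))
            self.state = State.BODY
        if self.state is State.BODY and len(self.buf) >= self._need:
            self.body = bytes(self.buf[:self._need])
            del self.buf[:self._need]
            self.state = State.DONE
        return self.state
```

Because all parsing progress lives in the connection object, one thread can interleave thousands of half-received requests.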
Async Web Server
THttpAsyncServer
• Can sustain thousands of concurrent clients
• With minimal memory/cpu consumption
• Also exists in a TWebSocketAsyncServerRest flavor
• THttpServer may be more reactive
for a few connections
Async Web Server
THttpAsyncServer
• Responses are computed in a thread pool
with regular blocking end-user code
• Optionally returned out-of-order
e.g. for asynchronous DB operations
• Only wake up threads if needed
to avoid syscalls on small queries
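A rough Python sketch of the same split: handlers are plain blocking functions run in a thread pool, and responses are collected in completion order rather than request order. `blocking_handler` and `serve` are hypothetical names, and the sleep only stands in for real work such as a database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def blocking_handler(request: str) -> str:
    time.sleep(0.01)  # stands in for regular blocking end-user code
    return request.upper()


def serve(requests, workers=4):
    """Yield (request_index, response) pairs as soon as each one is ready."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(blocking_handler, r): i
                   for i, r in enumerate(requests)}
        for fut in as_completed(futures):
            # Completion order may differ from submission order.
            yield futures[fut], fut.result()
```

Keeping the request index with each response is what lets the caller match out-of-order answers to their requests.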
Async Web Server
THttpAsyncServer
• Avoid memory allocation
e.g. HTTP buffers reuse between connections
optional string interning of HTTP headers
• Non-blocking process
smallest possible granularity locks
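Both allocation-avoidance ideas can be sketched in Python, assuming a single-threaded caller for simplicity. `BufferPool` and `intern_header` are hypothetical names for illustration; a real server would add its own locking or keep one pool per thread.

```python
import sys


class BufferPool:
    """Hands out reusable buffers instead of allocating one per connection."""

    def __init__(self, size: int = 8192):
        self._size = size
        self._free = []

    def acquire(self) -> bytearray:
        # Reuse a released buffer when one is available.
        return self._free.pop() if self._free else bytearray(self._size)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)


def intern_header(name: str) -> str:
    """Return one shared string object per distinct header name."""
    return sys.intern(name.lower())
```

With interning, the thousandth "content-type" header costs a lookup instead of a new string allocation.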
Async Web Server
THttpAsyncServer
• Minimized syscalls
profiled with strace on Linux
- libc vDSO for clock_gettime()
- our own light locks with no mutex
- wakeup threads using eventfd() – only if needed
- per-second cache of date/time or timeout ticks
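The per-second cache is the easiest of these to illustrate. In this Python sketch the HTTP `Date` header text is formatted only when the clock moves to a new second. `DateCache` is a hypothetical class, and the injectable `clock` parameter exists only to make the behaviour easy to check.

```python
import time
from email.utils import formatdate


class DateCache:
    """Formats the HTTP Date header at most once per second."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._second = -1
        self._text = ""
        self.formats = 0  # how many times the text was actually rebuilt

    def http_date(self) -> str:
        now = int(self._clock())
        if now != self._second:
            # Only reached on the first call of each new second.
            self._second = now
            self._text = formatdate(now, usegmt=True)
            self.formats += 1
        return self._text
```

At tens of thousands of requests per second, almost every call returns the cached string.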
Async Web Server
THttpAsyncServer
• Minimized syscalls
profiled with strace on Linux
• Less optimized on Windows
Async Web Server
THttpAsyncServer
• Very efficient URI routing
with O(1) parsing of routes and parameters
• Can use the RTTI over methods
to define the endpoints
• Can redirect/rewrite URI before a REST layer
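One way to get route lookups whose cost does not grow with the number of registered routes is a tree keyed by URI segment. The Python sketch below is an assumption about the general technique, not the mORMot router: `Router`, `Node` and the `<name>` parameter syntax are hypothetical, and there is no backtracking between static and parameter branches.

```python
class Node:
    __slots__ = ("children", "param_name", "param_child", "handler")

    def __init__(self):
        self.children = {}       # static segment -> Node
        self.param_name = None   # name of the "<param>" segment, if any
        self.param_child = None
        self.handler = None


class Router:
    def __init__(self):
        self.root = Node()

    def add(self, pattern: str, handler) -> None:
        node = self.root
        for seg in pattern.strip("/").split("/"):
            if seg.startswith("<") and seg.endswith(">"):
                if node.param_child is None:
                    node.param_child = Node()
                    node.param_name = seg[1:-1]
                node = node.param_child
            else:
                node = node.children.setdefault(seg, Node())
        node.handler = handler

    def match(self, path: str):
        node, params = self.root, {}
        for seg in path.strip("/").split("/"):
            child = node.children.get(seg)
            if child is None:
                if node.param_child is None:
                    return None
                # No static match: treat this segment as a parameter value.
                params[node.param_name] = seg
                child = node.param_child
            node = child
        if node.handler is None:
            return None
        return node.handler, params
```

Matching walks one node per URI segment and extracts parameters on the way, so the work depends on the length of the URI rather than on how many routes exist.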
Async Web Server
THttpAsyncServer
• Perfect for the TFB use-case
top of /plaintext or /cached_queries
• Has been tuned to increase TFB numbers
also noticeable on other HW or OS
Async Web Server
mORMot 2 TFB Sample
https://github.com/synopse/mORMot2/tree/master/ex/techempower-bench
Async Database Access
Async Database Access
PostgreSQL
• The Database of choice for TFB
• The Database of choice for most projects
Async Database Access
mormot.db.sql.postgres.pas
• Direct libpq client access
• Written from scratch
• Using mormot.db.sql.pas simplified design
cached statements with no TDataSet overhead
• Optional array binding
• Optional pipelined mode support
• Optional asynchronous / non-blocking API
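Why pipelining matters can be shown with a simplified cost model in Python. It assumes every statement costs one network round trip plus a fixed execution time, and that a pipelined batch pays the round trip only once. The function names and the numbers in the usage note are illustrative, not measurements.

```python
def sequential_cost(n: int, rtt_us: int, exec_us: int) -> int:
    """Each statement waits for its own round trip before the next is sent."""
    return n * (rtt_us + exec_us)


def pipelined_cost(n: int, rtt_us: int, exec_us: int) -> int:
    """All statements are sent back to back; the round trip is paid once."""
    return rtt_us + n * exec_us
```

With 20 statements, a 100 µs round trip and 20 µs of execution each, the model gives 2400 µs sequentially against 500 µs pipelined. Real gains depend on the server, the network and the statement mix.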
Async Database Access
mORMot 2 TFB Sample
https://github.com/synopse/mORMot2/tree/master/ex/techempower-bench
JSON, RTTI, Mustache
JSON, RTTI, Mustache
JSON
• mORMot is natively UTF-8 and JSON
on all platforms and compilers (even Delphi 7)
• mORMot 2 JSON core has been rewritten
for performance
JSON, RTTI, Mustache
Several JSON parsers
• SAX approach
• TDocVariant
• TOrmTableJson
• TDynArray
JSON, RTTI, Mustache
Several JSON parsers Delphi XE8, Win32
• SAX approach 725 MB/s
• TDocVariant 117 MB/s
• TOrmTableJson 496 MB/s
• TDynArray 332 MB/s
• Delphi JSON 6 MB/s
• JsonDataObjects 103 MB/s
• SuperObject 35 MB/s
• Grijjy 54 MB/s
• dwsJSON 97 MB/s
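For readers who want to produce a comparable MB/s figure for their own parser, here is a minimal Python sketch of the measurement itself: bytes parsed divided by elapsed time. It uses Python's standard `json` module only as a stand-in; the function name and the sample payload are assumptions, and the result says nothing about the Delphi libraries listed above.

```python
import json
import time


def parse_throughput_mb_s(payload: bytes, repeat: int = 20) -> float:
    """Parse the same payload `repeat` times and return megabytes per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        json.loads(payload)
    elapsed = time.perf_counter() - start
    return len(payload) * repeat / elapsed / 1_000_000


# Hypothetical sample: an array of small objects, similar in shape to DB rows.
sample = json.dumps(
    [{"id": i, "randomNumber": i * 7 % 10000} for i in range(10000)]
).encode()
```

Calling `parse_throughput_mb_s(sample)` several times and keeping the best run reduces noise from the first, cold iteration.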
JSON, RTTI, Mustache
RTTI
• mORMot has its own RTTI cache
on all platforms and compilers
• mORMot 2 RTTI core has been rewritten
for performance and maintainability
(mORMot 1 had duplicated logic)
JSON, RTTI, Mustache
Mustache
• mORMot has its own Mustache renderer
• mORMot 2 Mustache
can work directly on in-memory data
instead of TDocVariant containers
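To show what a Mustache renderer does with in-memory data, here is a deliberately tiny Python sketch that supports only `{{name}}` variables (HTML-escaped) and `{{#name}}...{{/name}}` sections over lists of dictionaries. It is a hypothetical subset for illustration, far from the full Mustache specification and unrelated to the mORMot implementation.

```python
import html
import re

# One pattern for both token kinds, so already rendered output is never rescanned.
_TOKEN = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}|\{\{(\w+)\}\}", re.S)


def render(template: str, data: dict) -> str:
    def repl(match):
        if match.group(1):
            # Section: render the inner template once per list item.
            items = data.get(match.group(1)) or []
            return "".join(render(match.group(2), item) for item in items)
        # Variable: missing keys render as an empty string.
        return html.escape(str(data.get(match.group(3), "")))

    return _TOKEN.sub(repl, template)
```

For example, `render("<ul>{{#rows}}<li>{{id}}: {{message}}</li>{{/rows}}</ul>", {"rows": [{"id": 1, "message": "<b>hi</b>"}]})` returns `<ul><li>1: &lt;b&gt;hi&lt;/b&gt;</li></ul>`, which is the kind of escaped list output the /fortunes endpoint asks for.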
JSON, RTTI, Mustache
mORMot 2 TFB Sample
https://github.com/synopse/mORMot2/tree/master/ex/techempower-bench
Frameworks Tuning
Questions? Wishes?
Opinions? Reactions?
No Marmots Were Harmed in the Making of This Session

EKON27-FrameworksTuning.pdf
