SlideShare a Scribd company logo
PERFORMANCE
IMPROVEMENTS REPORT
Goal was:
Deliver short term tactical performance
improvements
● Fix common performance bottlenecks
● Introduce incremental architectural improvements
based on Proof-of-Concept
● => No business logic or radical architecture changes
● => Update software to incorporate modern
programming and architectural standards
Where did we start?
Version 1.9.5 running 8 PG and 8 L
This version cannot run 20PG and 15L
Mean latency: 70+ms (with long tail)
% Messages over 100ms: 49%
20 Currency pairs meant 40 processes running
JVM overload
Inefficient processor utilization – only 40%
Message latency characteristics
● Sub 10ms: 9.64%
● Sub 20ms: 31.69%
● Sub 40ms: 64.17%
● Sub 80ms: 85.34%
Long tail on distribution
Where are we now?
Version PERF running 20 PG and 15 L
Mean latency: 16ms
% Messages over 100ms: 0.029%
20 Currency pairs means 20 processes running
JVM not taxed
All processors utilized: 80%
Message latency characteristics
● Sub 10ms: 15.75%
● Sub 20ms: 77.53%
● Sub 40ms: 96.48%
● Sub 80ms: 99.89%
Comparare to http://www.lmax.com/execution-performance (can’t
guarantee latency 100%)
How did we test?
Instrumented code with Fixprotocol Inter Party Latency
LMP's, and recorded timing info
Run simulated price feed with constant and live-like rates
19 currency pairs
20 price groups (PG) and 15 layers (L) per pair for PERF
branch
8 PG and 8 L for 1.9.5.56 branch
100 spot updates/sec
20 fwd updates/sec
Latency Distribution (ms)
Distribution of throughput(msgs/s)
Performance Improvements
Common Improvements
● Eliminate sources of latency common to many
applications
● While some may have seemed trivial, they had
significant impact
Improvements based on the PoC
● Apply PoC architecture principles in key areas where
latency was measured
● Only tactical changes, not strategic
● Required careful measurements: bottlenecks turned out
to be in different places than previously thought
Common Performance Bottlenecks:
Price Object Marshalling
Replaced object marshalling
● Significant source of latency, large message sizes, and
garbage (object) creation
● Serialize-Deserialize cycle was performed at least three
times for every price
● Previously based on JDK serialization, replaced with
custom code
● Removed one cycle (more on that later)
Optimized the Price Object data structure
Common Performance Bottlenecks:
Logging
Price Engine logging levels were insane
● INFO level logging was performed over 1,500 times
● Most INFO level logging was redundant
Significant performance bottlenecks
● Disk writes, thread contention, object creation (GC)
● Logs could grow to GB size in minutes
Removed all but necessary logging
● Logs will need further work short term...
Code Review and Optimization
All PE code was reviewed for efficiency
● Re-work (tactical) but not re-write (strategic)
Improvements
● Timer scheduling replaced with a more efficient
approach
● Replace synchronization locks with CAS operations when
possible to reduce contention
● Replace inefficient cache access
● Numerous code tweaks
PoC Architectural Principles
Only distribute components when absolutely
necessary
● Challenge the myth that distributed components improve
throughput and latency
Parellelism (threads) may dramatically slow a
system down
● Contrary to old conventional wisdom
● Mechanical Sympathy has challenged this assumption
● Data contention, context switching often leads to data
duplication and GC
● A lot can be done in a single thread
Reduced Parallelism
Significant contention was eliminated in the Broadcast module
Excessive use of “in memory” producer/consumer queues
● Price objects put on queues for margining, forward calculation, and plugin delivery
● Multiple worker consumer threads pull from those queues and process prices
Queues written using synchronization primitives
● Very inefficient
● Contention between producers and consumers (put and take operations)
● Large number of worker threads lead to context switching
Queues were replaced with a highly efficient lock-free buffer
● Uses CAS operations instead of synchronization to dramatically reduce contention
● Only one consumer thread to reduce context switching
We attempted to eliminate buffers and queues altogether
● Make processing synchronous (and therefore remove contention)
● Turned out to be higher latency than using the lock-free buffers
– Likely because business logic is not optimised
Reduced Distribution
Everyone thought the bottleneck was Broadcast
● It turned out that bottlenecks existed in Broadcast, but there were
other equally significant sources of latency...
...Validator and TW
● One Validator process and one TW process per currency pair --> 20
currency pairs = 40 processes!
● Context switching, JMS latency, serialization overhead
Combined Validator and TW into a single processes
● Halved the number of processes
● Removed one serialization cycle
● Greatly simplified system management

More Related Content

What's hot

Chilinet
ChilinetChilinet
Chilinet
hjkim0
 
Topic and schema management-meetupberlin
Topic and schema management-meetupberlinTopic and schema management-meetupberlin
Topic and schema management-meetupberlin
confluent
 
Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...
Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...
Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...
Daniel Fireman
 
Large-Scale Training with GPUs at Facebook
Large-Scale Training with GPUs at FacebookLarge-Scale Training with GPUs at Facebook
Large-Scale Training with GPUs at Facebook
Faisal Siddiqi
 
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association
 
Going Microserverless on Google Cloud @ mabl
Going Microserverless on Google Cloud @ mablGoing Microserverless on Google Cloud @ mabl
Going Microserverless on Google Cloud @ mabl
Joseph Lust
 
Going Microserverless on Google Cloud
Going Microserverless on Google CloudGoing Microserverless on Google Cloud
Going Microserverless on Google Cloud
Joseph Lust
 
What we learnt at carousell tw for golang gathering #31
What we learnt at carousell tw for golang gathering #31What we learnt at carousell tw for golang gathering #31
What we learnt at carousell tw for golang gathering #31
Ronald Hsu
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functions
Mohsiur Rahman
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Autonomous workload rebalancing in kafka
Autonomous workload rebalancing in kafkaAutonomous workload rebalancing in kafka
Autonomous workload rebalancing in kafka
Indrajeet Kumar
 
Monolithic to microservices
Monolithic to microservicesMonolithic to microservices
Monolithic to microservices
Ronald Hsu
 
202104 technical challenging and our solutions - golang taipei
202104   technical challenging and our solutions - golang taipei202104   technical challenging and our solutions - golang taipei
202104 technical challenging and our solutions - golang taipei
Ronald Hsu
 
Introduction to GraalVM
Introduction to GraalVMIntroduction to GraalVM
Introduction to GraalVM
SHASHI KUMAR
 
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Till Rohrmann
 
Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?
ScyllaDB
 
JBoss Developer Webinar jBPM5
JBoss Developer Webinar jBPM5JBoss Developer Webinar jBPM5
JBoss Developer Webinar jBPM5
Kris Verlaenen
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVM
Sylvain Wallez
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Ankur Bansal
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
George Teo
 

What's hot (20)

Chilinet
ChilinetChilinet
Chilinet
 
Topic and schema management-meetupberlin
Topic and schema management-meetupberlinTopic and schema management-meetupberlin
Topic and schema management-meetupberlin
 
Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...
Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...
Improving Tail Latency of Stateful Cloud Services via GC Control and Load She...
 
Large-Scale Training with GPUs at Facebook
Large-Scale Training with GPUs at FacebookLarge-Scale Training with GPUs at Facebook
Large-Scale Training with GPUs at Facebook
 
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
 
Going Microserverless on Google Cloud @ mabl
Going Microserverless on Google Cloud @ mablGoing Microserverless on Google Cloud @ mabl
Going Microserverless on Google Cloud @ mabl
 
Going Microserverless on Google Cloud
Going Microserverless on Google CloudGoing Microserverless on Google Cloud
Going Microserverless on Google Cloud
 
What we learnt at carousell tw for golang gathering #31
What we learnt at carousell tw for golang gathering #31What we learnt at carousell tw for golang gathering #31
What we learnt at carousell tw for golang gathering #31
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functions
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Autonomous workload rebalancing in kafka
Autonomous workload rebalancing in kafkaAutonomous workload rebalancing in kafka
Autonomous workload rebalancing in kafka
 
Monolithic to microservices
Monolithic to microservicesMonolithic to microservices
Monolithic to microservices
 
202104 technical challenging and our solutions - golang taipei
202104   technical challenging and our solutions - golang taipei202104   technical challenging and our solutions - golang taipei
202104 technical challenging and our solutions - golang taipei
 
Introduction to GraalVM
Introduction to GraalVMIntroduction to GraalVM
Introduction to GraalVM
 
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
 
Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?Rust Is Safe. But Is It Fast?
Rust Is Safe. But Is It Fast?
 
JBoss Developer Webinar jBPM5
JBoss Developer Webinar jBPM5JBoss Developer Webinar jBPM5
JBoss Developer Webinar jBPM5
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVM
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
 

Similar to BAXTER phase 1b

Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
Jimmy Angelakos
 
IBM MQ - better application performance
IBM MQ - better application performanceIBM MQ - better application performance
IBM MQ - better application performance
MarkTaylorIBM
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
LibbySchulze
 
Code Optimization
Code OptimizationCode Optimization
Code Optimization
Akhil Kaushik
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
Roberto Agostino Vitillo
 
Rate limits and Performance
Rate limits and PerformanceRate limits and Performance
Rate limits and Performance
supergigas
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
vanphp
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
Alexander Penev
 
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
HostedbyConfluent
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Erik Krogen
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
Timothy McCormick
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
Shubham Tagra
 
Tokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdfTokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdf
ssuser2ae721
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
inside-BigData.com
 
Gatling
Gatling Gatling
Gatling
Gaurav Shukla
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With Gatling
Knoldus Inc.
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
InfluxData
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
Luciano Mammino
 
Parallel Batch Performance Considerations
Parallel Batch Performance ConsiderationsParallel Batch Performance Considerations
Parallel Batch Performance Considerations
Martin Packer
 

Similar to BAXTER phase 1b (20)

Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
IBM MQ - better application performance
IBM MQ - better application performanceIBM MQ - better application performance
IBM MQ - better application performance
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Code Optimization
Code OptimizationCode Optimization
Code Optimization
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Rate limits and Performance
Rate limits and PerformanceRate limits and Performance
Rate limits and Performance
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
 
Tokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdfTokyo AK Meetup Speedtest - Share.pdf
Tokyo AK Meetup Speedtest - Share.pdf
 
Application Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance CenterApplication Profiling at the HPCAC High Performance Center
Application Profiling at the HPCAC High Performance Center
 
Gatling
Gatling Gatling
Gatling
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With Gatling
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
 
Parallel Batch Performance Considerations
Parallel Batch Performance ConsiderationsParallel Batch Performance Considerations
Parallel Batch Performance Considerations
 

Recently uploaded

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 

Recently uploaded (20)

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 

BAXTER phase 1b

  • 2. Goal was: Deliver short term tactical performance improvements ● Fix common performance bottlenecks ● Introduce incremental architectural improvements based on Proof-of-Concept ● => No business logic or radical architecture changes ● => Update software to incorporate modern programming and architectural standards
  • 3. Where did we start? Version 1.9.5 running 8 PG and 8 L This version cannot run 20PG and 15L Mean latency: 70+ms (with long tail) % Messages over 100ms: 49% 20 Currency pairs meant 40 processes running JVM overload Inefficient processor utilization – only 40% Message latency characteristics ● Sub 10ms: 9.64% ● Sub 20ms: 31.69% ● Sub 40ms: 64.17% ● Sub 80ms: 85.34% Long tail on distribution
  • 4. Where are we now? Version PERF running 20 PG and 15 L Mean latency: 16ms % Messages over 100ms: 0.029% 20 Currency pairs means 20 processes running JVM not taxed All processors utilized: 80% Message latency characteristics ● Sub 10ms: 15.75% ● Sub 20ms: 77.53% ● Sub 40ms: 96.48% ● Sub 80ms: 99.89% Comparare to http://www.lmax.com/execution-performance (can’t guarantee latency 100%)
  • 5. How did we test? Instrumented code with Fixprotocol Inter Party Latency LMP's, and recorded timing info Run simulated price feed with constant and live-like rates 19 currency pairs 20 price groups (PG) and 15 layers (L) per pair for PERF branch 8 PG and 8 L for 1.9.5.56 branch 100 spot updates/sec 20 fwd updates/sec
  • 8. Performance Improvements Common Improvements ● Eliminate sources of latency common to many applications ● While some may have seemed trivial, they had significant impact Improvements based on the PoC ● Apply PoC architecture principles in key areas where latency was measured ● Only tactical changes, not strategic ● Required careful measurements: bottlenecks turned out to be in different places than previously thought
  • 9. Common Performance Bottlenecks: Price Object Marshalling Replaced object marshalling ● Significant source of latency, large message sizes, and garbage (object) creation ● Serialize-Deserialize cycle was performed at least three times for every price ● Previously based on JDK serialization, replaced with custom code ● Removed one cycle (more on that later) Optimized the Price Object data structure
  • 10. Common Performance Bottlenecks: Logging Price Engine logging levels were insane ● INFO level logging was performed over 1,500 times ● Most INFO level logging was redundant Significant performance bottlenecks ● Disk writes, thread contention, object creation (GC) ● Logs could grow to GB size in minutes Removed all but necessary logging ● Logs will need further work short term...
  • 11. Code Review and Optimization All PE code was reviewed for efficiency ● Re-work (tactical) but not re-write (strategic) Improvements ● Timer scheduling replaced with a more efficient approach ● Replace synchronization locks with CAS operations when possible to reduce contention ● Replace inefficient cache access ● Numerous code tweaks
  • 12. PoC Architectural Principles Only distribute components when absolutely necessary ● Challenge the myth that distributed components improve throughput and latency Parellelism (threads) may dramatically slow a system down ● Contrary to old conventional wisdom ● Mechanical Sympathy has challenged this assumption ● Data contention, context switching often leads to data duplication and GC ● A lot can be done in a single thread
  • 13. Reduced Parallelism Significant contention was eliminated in the Broadcast module Excessive use of “in memory” producer/consumer queues ● Price objects put on queues for margining, forward calculation, and plugin delivery ● Multiple worker consumer threads pull from those queues and process prices Queues written using synchronization primitives ● Very inefficient ● Contention between producers and consumers (put and take operations) ● Large number of worker threads lead to context switching Queues were replaced with a highly efficient lock-free buffer ● Uses CAS operations instead of synchronization to dramatically reduce contention ● Only one consumer thread to reduce context switching We attempted to eliminate buffers and queues altogether ● Make processing synchronous (and therefore remove contention) ● Turned out to be higher latency than using the lock-free buffers – Likely because business logic is not optimised
  • 14. Reduced Distribution Everyone thought the bottleneck was Broadcast ● It turned out that bottlenecks existed in Broadcast, but there were other equally significant sources of latency... ...Validator and TW ● One Validator process and one TW process per currency pair --> 20 currency pairs = 40 processes! ● Context switching, JMS latency, serialization overhead Combined Validator and TW into a single processes ● Halved the number of processes ● Removed one serialization cycle ● Greatly simplified system management