1. Academics and Certificates
➢ TRIUM Global Executive MBA - Sep 2015 to Feb 2017
Degree Jointly Issued by
▪ NYU Stern School of Business, New York City, USA
▪ London School of Economics and Political Science,
London, UK
▪ HEC Paris School of Management, Paris, France
➢ Indian Institute of Technology Kanpur (India)
Master of Technology in Aerospace Engineering
Jul 1998 to Mar 2000
➢ Indian Institute of Technology Kanpur (India)
Bachelor of Technology in Aerospace Engineering
Jul 1993 to May 1997
➢ WorldQuant University
MSc in Financial Engineering (Quantitative Finance)
Apr 2017 to present
➢ Project Management Professional (PMP) - 2014
➢ PMI Agile Certified Practitioner (PMI-ACP) - 2017
Barun Sharma
A dynamic professional with 18+ years of high-caliber industry exposure, having worked at
leading multinational firms such as Standard Chartered Bank, Deutsche Bank, Credit Suisse,
Trianz US, BlackMagic Design, Amdocs, Verizon, Thomson Financial, Persistent Systems
Ltd., and Parametric Technology Corporation.
2. Low Latency Trading System
➢ What is Low Latency Trading System
➢ Why Low Latency Trading System
➢ Measurement of Trading System Latency
➢ Wire Time Latency
➢ Network Processing Latency
➢ Operating System Latency Consideration
➢ Application Processing Latency Consideration
➢ Direct Market Access in Low Latency Trading
➢ FPGA in High-Frequency Trading
➢ Overview – InfiniBand
➢ Low Latency Trading System - Recent Trend
3. What is Low Latency Trading System
“In capital markets, low latency is the use of algorithmic
trading to react to market events faster than the
competition to increase profitability of trades.”
A Traditional Trading System
4. Why Low Latency Trading System
LOW LATENCY TRADING: STAYING
COMPETITIVE BY OPTIMIZING SPEED
The upsurge of investor interest in high-frequency trading
(HFT) has its origins in the computer
networking/systems industry, which is to be expected
given that HFT is based on extremely fast computer
architecture and state-of-the-art software.
High frequency trading is entirely automated, with trades
executed based on the processing of algorithms that
optimize trades based on the changing market prices.
Capital markets firms compete on the speed with which
these algorithms can be processed and trades executed.
Role of High Performance Computing
High-performance computing has to deliver two things:
➢ 1. a high-throughput system able to process millions of messages per second, extensible to tens of millions
in the future, and
➢ 2. a low-latency system with response times ranging from a few microseconds down to nanoseconds.
5. Measurement of Trading System Latency
Wire Time
The time it takes for a
message to be transmitted
on a physical medium (e.g.,
Ethernet)
Network Processing
The time spent in networking
equipment (load balancers,
firewalls and routers)
Operating System Latency
The time spent within
operating system calls and
hardware network interfaces
Application Processing
The time spent within the
application business rules
and logic
Cumulative Factor
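The four components above add up to the end-to-end latency. A minimal Python sketch of that cumulative view follows. The class name `LatencyBudget`, the helper `time_call_ns`, and the example figures are illustrative assumptions, not measurements from any real system.

```python
import time
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LatencyBudget:
    """Hypothetical per-component latencies, all in nanoseconds."""
    wire_ns: int
    network_ns: int
    os_ns: int
    application_ns: int

    def total_ns(self) -> int:
        # End-to-end latency is the sum of the four components.
        return self.wire_ns + self.network_ns + self.os_ns + self.application_ns

    def shares(self) -> dict:
        # Fraction of the total contributed by each component.
        total = self.total_ns()
        return {name: value / total for name, value in asdict(self).items()}


def time_call_ns(fn, *args):
    """Time one call with the monotonic nanosecond clock.

    This only captures the application-processing component; wire,
    network and OS time need external measurement points.
    """
    start = time.perf_counter_ns()
    result = fn(*args)
    return result, time.perf_counter_ns() - start


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    budget = LatencyBudget(wire_ns=5_000, network_ns=2_000,
                           os_ns=3_000, application_ns=10_000)
    print(budget.total_ns(), budget.shares())
```

Breaking the total into shares shows which component dominates and is therefore worth optimizing first.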
6. Wire Time Latency
Three common sources of delay in optical transport:
➢ Fiber delay: the length of the actual fiber cables
➢ Proximity delay: the physical distance to the fiber connection
➢ Equipment delay: the processing speed of the network
equipment
Figure: Optical networking functions that can increase latency
7. Network Processing Latency
Network processing latency is the time it takes routers to process the packet
header. Network processing delay is a key component of network latency.
Processing Delay Computation:
Different delay types that occur in the network between sending and receiving a packet:
➢ Propagation delay (∆t-prop): The propagation delay denotes the delay of the signal in the physical
medium (twisted pair copper wires in our experiments). We assume a signal speed of two thirds of the
speed of light, i.e., 2/3 × 3 × 10^8 m/s = 2 × 10^8 m/s
➢ Transmission delay (∆t-trans): The transmission delay is the time to “put the data on the wire”. This
delay depends on the frame size s and the data rate r of the link (1 Gbps in our experiments):
∆t-trans = s/r
➢ Queuing delay (∆t-queue): The queuing delay denotes the time a packet spends in a queue of the
switch, e.g., if there are multiple packets waiting to be transmitted over the same outgoing port.
➢ Processing delay (∆t-proc): The processing delay is the time it takes the switch to make a
forwarding decision, i.e., to decide which outgoing port to forward a packet to.
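The propagation and transmission delays above can be computed directly from the definitions. The Python sketch below assumes the signal speed from the slide (two thirds of the speed of light). The function names, the 100 m cable and the 1500-byte frame are illustrative assumptions. Queuing and processing delays depend on the switch and its load, so they are passed in as measured values.

```python
# Signal speed assumed on the slide: 2/3 of the speed of light.
SIGNAL_SPEED_M_PER_S = 2.0e8


def propagation_delay_s(distance_m: float) -> float:
    """Delta-t-prop: time for the signal to travel the length of the medium."""
    return distance_m / SIGNAL_SPEED_M_PER_S


def transmission_delay_s(frame_bytes: int, rate_bps: float) -> float:
    """Delta-t-trans = s / r, with the frame size s converted to bits."""
    return (frame_bytes * 8) / rate_bps


def total_delay_s(distance_m: float, frame_bytes: int, rate_bps: float,
                  queue_s: float = 0.0, proc_s: float = 0.0) -> float:
    """Sum of the four delay types for a single hop."""
    return (propagation_delay_s(distance_m)
            + transmission_delay_s(frame_bytes, rate_bps)
            + queue_s
            + proc_s)


if __name__ == "__main__":
    # Illustrative inputs: 100 m of cable, 1500-byte frame, 1 Gbps link.
    print(propagation_delay_s(100.0))          # 0.5 microseconds
    print(transmission_delay_s(1500, 1.0e9))   # 12 microseconds
    print(total_delay_s(100.0, 1500, 1.0e9))   # 12.5 microseconds, idle switch
```

With these inputs the transmission delay dominates the propagation delay, which is one reason smaller frames and faster links matter on short, co-located paths.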
8. Operating System Latency Consideration
Best practices to consider
➢ Minimize propagation delay by using high-end co-located servers
➢ Minimize queuing delay
➢ Rationalize the transport-layer implementation
➢ Make the best use of the available middleware techniques
➢ Understand the applications and their different communication
channels
➢ Know the high-end servers and operating systems in use
➢ Meet security/compliance guidelines
Operating system performance tuning
➢ Follow hardware manufacturers' guidelines for low latency BIOS
tuning
➢ PCI Locality
➢ NUMA topology (Non Uniform Memory Access)
➢ Use the network-latency tuned profile
➢ Use low-latency kernel bypass or zero-copy networking techniques
9. Application Processing Latency Consideration
Best practice for low latency system development
➢ Choose the right language (C, C++, Java, Scala, Haskell or others)
➢ Keep it all in memory
➢ Keep data and processing collocated
➢ Keep the system underutilized
➢ Keep context switches to a minimum
➢ Keep reads sequential
➢ Batch writes
➢ Respect cache
➢ Non-blocking as much as possible
➢ Async as much as possible
➢ Parallelize as much as possible
➢ Optimize packet size; small packets are preferred
➢ Use optimized data structures and algorithms (mostly O(1) complexity)
➢ Minimize network hops
➢ Use the messaging middleware techniques with the lowest latency
➢ Make the best use of the IP Multicast, TCP and UDP protocols
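Several of the practices above (keep it all in memory, respect cache, non-blocking, O(1) data structures) come together in a preallocated ring buffer, which is often used to pass messages between processing stages. The sketch below is a minimal single-threaded illustration in Python. The class name `RingBuffer` and its methods are assumptions made for this example, and a production system would typically implement it in C or C++ with the memory ordering needed for concurrent producers and consumers.

```python
class RingBuffer:
    """Fixed-capacity FIFO queue with O(1) push and pop.

    All slots are allocated up front, so no memory is allocated
    while messages flow through the buffer.
    """

    def __init__(self, capacity: int):
        # A power-of-two capacity lets us wrap indices with a bit mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # index of the next item to read
        self._tail = 0  # index of the next free slot to write

    def push(self, item) -> bool:
        """Store an item; return False instead of blocking when full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1
        return True

    def pop(self):
        """Return the oldest item, or None when the buffer is empty."""
        if self._head == self._tail:
            return None
        index = self._head & self._mask
        item = self._slots[index]
        self._slots[index] = None  # release the reference
        self._head += 1
        return item


if __name__ == "__main__":
    buf = RingBuffer(4)
    for tick in (101, 102, 103, 104):
        buf.push(tick)
    print(buf.push(105))  # False: buffer is full, caller decides what to do
    print(buf.pop())      # 101: oldest item comes out first
```

Returning `False` on a full buffer rather than waiting keeps the producer non-blocking, leaving the back-pressure policy (drop, retry, alert) to the caller.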
10. Direct Market Access in Low Latency Trading
Ultra-low-latency direct market access is a set of technologies used as part of
modern trading strategies, where speed of execution is critical. Direct market
access (DMA), often combined with algorithmic trading, is a means of executing
trading flow on a selected trading venue by bypassing the brokers' discretionary
methods.
Benefits of DMA
➢ Latency Reduction
➢ Minimal Slippage
➢ Anonymity
➢ Price Improvements
After the advent of DMA:
➢ Latency between event
occurrence and order generation
is reduced to microseconds or milliseconds.
➢ Order management must be
robust enough to handle thousands
of orders per second
➢ Risk management must run in real
time without human intervention
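Real-time risk management without human intervention usually means every order passes an automated pre-trade check before it reaches the venue. The Python sketch below shows one possible shape for such a check. The types `Order` and `RiskLimits`, the function `check_order`, and the three limits are hypothetical choices for illustration, not the rules of any particular broker or exchange.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    symbol: str
    side: str        # "BUY" or "SELL"
    quantity: int
    price: float


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical limits; real limit sets are firm- and venue-specific."""
    max_order_quantity: int
    max_order_notional: float
    max_abs_position: int


def check_order(order: Order, limits: RiskLimits,
                current_position: int) -> Optional[str]:
    """Return None if the order passes, else a short rejection reason."""
    if order.side not in ("BUY", "SELL"):
        return "unknown side"
    if order.quantity <= 0:
        return "non-positive quantity"
    if order.quantity > limits.max_order_quantity:
        return "order quantity limit exceeded"
    if order.quantity * order.price > limits.max_order_notional:
        return "order notional limit exceeded"
    # Position after the order fills completely.
    signed_qty = order.quantity if order.side == "BUY" else -order.quantity
    if abs(current_position + signed_qty) > limits.max_abs_position:
        return "position limit exceeded"
    return None


if __name__ == "__main__":
    limits = RiskLimits(max_order_quantity=1_000,
                        max_order_notional=100_000.0,
                        max_abs_position=5_000)
    order = Order("ABC", "BUY", 100, 50.0)
    print(check_order(order, limits, current_position=0))  # None: accepted
```

Each check is a constant-time comparison, so the function adds a fixed, predictable cost to the order path.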
11. FPGA in High-Frequency Trading
FPGA cards enable low latency because
they compute in parallel using logic
gates. Latency in an FPGA solution
is significantly lower and more
deterministic than in software.
The use of FPGA platforms in
high-frequency trading enables
companies to collect, cleanse,
enrich, and disseminate the
burgeoning array of rapidly
changing financial data in a short
time. FPGA hardware can execute
various trading tasks quickly
without loading the CPU.
12. Overview - InfiniBand
InfiniBand (IB) is a computer-networking communications standard used in high-
performance computing that features very high throughput and very low latency.
IB offers a better ROI, with higher throughput and CPU efficiency at competitive
pricing, equalling higher productivity with a lower cost per endpoint.
IB creates a private, protected channel directly
between the nodes via switches, and facilitates
data and message movement without CPU
involvement with Remote Direct Memory
Access (RDMA) and Send/Receive offloads
that are managed and performed by InfiniBand
adapters.
Figure: Traditional Interconnect vs. RDMA Zero-Copy Interconnect
Advantages:
➢ Higher throughput
➢ Lower latency
➢ Enhanced scalability
➢ Higher CPU efficiency
➢ Reduced management overhead
➢ Simplicity
13. Low Latency Trading System - Recent Trend
➢ In 2017, Nasdaq decreased the time to disseminate SIP (Securities Information Processor, or consolidated market
data) from 480 µs to 50 µs, then 20 µs
✓ Almost all trading firms actively use the SIP
✓ Trading firms are bound to optimize infrastructure for quicker access to SIP
➢ More firms and algo applications are being upgraded to single-digit-microsecond tick-to-trade latencies
➢ Speed enhancements in FPGAs, GPUs, servers, caches and middleware continue to lower trading latencies
✓ FIX engines and market data order books in FPGA NICs
✓ Simple algos, along with FIX engines, SOR and risk checks, are 100% FPGA ready
➢ More competition among vendors in the ultra-low-latency (ULL) appliance space, with parallelized processing
using two or more of
✓ FPGAs, GPUs, Intel cores, ULL switch capabilities
➢ Some ULL switches now transmit market data to subscribers in 5 ns
➢ 48-port ULL switches are inexpensive (under $20K), making it easier to meet ROI
➢ Kernel bypass is now ubiquitous; I/O latencies are approx. 1 µs, down from 12 µs
✓ Increased use of FPGA-based kernel bypass for sub-microsecond I/O latencies