Architectural Differences in High-Frequency Trading Systems
Jim Wang, CFA
Program in Financial Engineering
Stevens Institute of Technology
Hoboken, NJ 07030, USA
Trading Engine Location
• Based on trading engine location:
Desktop – Broker – Exchange.
• 10 milliseconds or more: engine on your
desktop.
• Between 1 and 10 milliseconds: engine collocated
with your broker.
• 1 millisecond or less: engine collocated
with the exchange matching engine.
Engine on Desktop: Architecture
[Diagram: trading engine and GUI on the desktop, connected through the broker's API to the exchange matching engine.]
Broker Collocation: Cost
• Collocation fee: > $200 per month.
• Market data fee: Level I and II, professional: > $600 per month.
• Commission: USD 0.0005–0.001 per share, depending on volume.
• Island exchange fees: remove $0.0030; add ($0.0025).
• ARCA exchange fees: remove $0.0030; add ($0.0021).
Exchange Collocation: Architecture
[Diagram: engine collocated at the exchange, consuming exchange data on site; orders reach the exchange matching engine either via sponsored access through the broker's risk-check API, or via naked access directly.]
Source of Latency
• Propagation latency: speed of light in fiber ≈ 5 µs/km;
Mahwah – Weehawken ≈ 40 km.
• Transmission latency: high-speed communication
links run at 1–10 Gbps; roughly 1 µs per 1 kb to
serialize and transmit.
• Processing latency: dedicated CPUs for critical
threads, kernel bypass, hardware acceleration.
• Collocation within 200 m: ≈ 1 µs of propagation delay.
• Dark fiber: optical fiber infrastructure that
is already in place but not yet in use.
• End-to-end direct connectivity over the WAN.
• Dedicated use of a wavelength (not the whole
cable) via wavelength-division multiplexing.
• Cut-through switching: the switch starts
forwarding a frame before the whole frame
has been received.
• Kernel bypass: the application communicates
with the NIC through a special vendor-supplied
library, without going through standard system calls.
• RDMA (remote direct memory access):
one computer places data directly
into the memory of another.
• The parallel problem: with ~10k symbols and 6
major exchanges, tasks are relatively independent
and can be split into parallel streams.
• Through software optimization: flexibility, and
the ability to benefit from general-purpose
CPU improvements over time.
• Through hardware acceleration: specialized
hardware improves consistency by reducing jitter.
• Separation of high speed vs. high complexity:
keep latency-sensitive tasks on the critical path;
offload computation-intensive tasks to a
separate thread or process.
• Memory caching: pin the critical decision thread
to a dedicated CPU so its working set stays hot in cache.
• Inlining vs. function calls: C, C++, Java.