1 © Hortonworks Inc. 2011–2018. All rights reserved
Machine Learning Driven
Trading Bots
Diego Baez
General Manager Financial Services
Hortonworks
Agenda
• Background
• A Brief History
• What is a Trading Bot
• Machine Learning driven Approach
• Implementation
• Lessons so far
The US Equity Market
• US$40 Trillion traded every year
• Over 6 Billion shares trade every day
Genesis of
Electronic Trading
• The majority of trades interact with the exchange without a human in the middle
• Automated Trading Bots account for half of the total volume
• 3 Billion shares per day
• Algorithms generate large amounts of orders for each execution
A Brief History
Two Forces drive Market Evolution
Technology
• Enables new business models
• Reduces Cost
• Becomes a Competitive advantage
Regulation
• Creates New Opportunities
• Ends Business Models
Electronic Trading: Both Forces Converging
Regulation
1998, Reg. ATS: adopted by the SEC in order to
restrict the monopoly enjoyed by NYSE and
NASDAQ
2001, U.S. stock exchanges began quoting
prices in decimals instead of fractions,
bringing down the minimum spread
between the bid and ask prices from 1/16th
of a dollar (6.25 cents) to one cent
2005, Reg. NMS: Trade-through Rule, to
promote transparency and competition
between markets, requiring trade
orders to be posted nationally and not at
individual exchanges.
No Action Letter from the SEC to list ETFs
(Exchange Traded Funds)
Technology
Cost of computation rapidly decreases: Lowers
barriers to entry to set up an ECN
From 2000–2010, storage costs dropped by 500x
Emergence of ECNs:
alternative electronic
trading platforms
Narrowing Spreads:
Harder to make money
Inter-Exchange
Arbitrage
Profit from any small
price difference of a
security between two
different exchanges
Growth of ETFs
Arbitrage
Opportunities Multiply
exponentially
US Markets After Electronic Trading
Fragmented
• 12 Exchanges
• Nearly 50 “Dark Pools”
Electronic
• Over half of Volume is Algorithm Generated
ETFs are rising
• Fast and Dramatic rise of ETFs
Trading Bots Require Electronic Markets
Need Direct fast access to Market
Activity (Market Data)
• Orders and Trades are indicators of
Supply and Demand
• Delayed information is Stale
Information
• Exchange duopoly was maintained
by restricting or delaying Market
Data
Need Direct Market Access to Buy and Sell
directly on the exchange (DMA)
• Ability to quickly act based on trigger
conditions
• Any delay could miss the price
• Prevent chasing the price
• Human in the loop = information
leakage
The most significant evolution over the last 30 years in the financial markets is the transition of
trading from a manual to an automated process. This has affected everyone: exchanges,
market centers, regulators, brokers and investors.
Automated trading has reshaped our markets and is an intrinsic part of our markets moving
forward.
Impact on Financial Services Industry
• Fewer Brokers: FINRA reports that it had 3,816 registered securities firms in February 2017,
down from 5,005 a decade earlier in 2007.
• More Venues: Equity trading can occur on any of 12 registered public exchanges, over 30 ATSs
• Dark Liquidity: The Tabb Group reports that in Q2-2016, equity market volume was split between
56.9% on lit venues and 43.1% on dark venues
• Dark liquidity and dark trading have always existed on U.S. equity markets. In prior years, NYSE floor
brokers were a source of dark liquidity, either leaving large customer orders with the specialist (passive
participation) or working them over time as a member of the trading crowd (active participation).
Market Failures
• Market breaks are not a new phenomenon. An entire chapter (Chapter XIII) of the original Special
Study focused on the market break of May 28, 1962. As these two passages from the study illustrate,
many of the issues in 1962 are still relevant today:
The avalanche of orders which came into the market during this period subjected the market mechanisms to extraordinary
strain, and in many respects they did not function in a normal way. Particularly significant were the lateness of the tape and the
consequent inability of investors to predict accurately the prices at which market orders would be executed. (p. 859) […] The
history of the May 28 market break reveals that a complex interaction of causes and effects--including rational and emotional
motivations as well as a variety of mechanisms and pressures--may suddenly create a downward spiral of great velocity and
force. (p. 861)
The Trading Bot
Automation, Obfuscation, Optimization
What is a Trading Bot?
• Computer program which interacts
directly with the market
• Buys or Sells directly on the Market without a human
in the loop
• Can be used for:
• Cost Reduction
• Optimize Operations
• Defensive mechanism
• Maximize profits
• Utilizes Electronic Market Access to
Buy and Sell
• Not possible until Regulation and
Technology converged in the late 1990s
[Diagram: Trading Bot, shown in relation to Algorithm Trading, HFT (High Frequency Trading), Automated Trading, Markets, and Direct Market Access]
Intersection of Disciplines
Market Knowledge
• How each market behaves
• Buying/Selling Dynamics
• What strategies work
Mathematical Models
• Moving Average
• Models which signal Divergence
• Complex Multivariate models
Technology Infrastructure
• Vast Data Storage
• Analytics Platform
• Back-Testing Platform
• Interfaces to interact with the market
When is it Most Useful
• Fast Markets
• Market information changes fast
• Market execution moves fast
• Many variables are changing fast
• US Equity Markets generate 40,000,000 Tick Events per day
• Many Instruments to trade
• US Equity markets have over 10,000 instruments which trade everyday
• Each is an opportunity
• Add Options, FX, Futures, Indexes and the options multiply exponentially
• Many opportunities occur at the same time
• Even with a successful strategy, a single trader can only make markets manually in 1-5 stocks at the
same time
• While opportunities are appearing across many instruments at the same time
• Not applying that strategy to ALL opportunities is wasted P&L
[Diagram: Opportunities in a Market (rule of thumb). Factors: Number of Instruments which trade, Number of Trades, Volatility, Time the Market is Open]
The majority of the trading strategies fall into one of these
categories:
1. Momentum -> Follow the market
2. Mean Reversion -> Go against the market
3. Statistical Arbitrage -> Take advantage of Market disequilibrium
Primary types of Strategies
Strategies
• Methodology:
• Detect when the market has an Up or Down
bias, as early in the move as possible
• Enter a position when the move begins
• Detect the move has completed
• Exit the position
• Common indicators used to detect
Momentum
• Stochastics
• MACD
• ROC
• RSI
• Momentum Indicator
Arrive early, stay until the move is over
Momentum
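Two of the momentum indicators listed above can be sketched in a few lines. This is a minimal illustration, assuming a plain list of closing prices; the RSI here is the simple-average (Cutler) variant rather than Wilder's exponentially smoothed original, and `momentum_signal` with its `threshold` is a hypothetical entry rule, not the author's.

```python
from typing import List, Optional


def rate_of_change(prices: List[float], n: int) -> List[Optional[float]]:
    """ROC in percent: 100 * (P_t - P_{t-n}) / P_{t-n}; None until n earlier prices exist."""
    out: List[Optional[float]] = []
    for t, price in enumerate(prices):
        if t < n:
            out.append(None)
        else:
            out.append(100.0 * (price - prices[t - n]) / prices[t - n])
    return out


def simple_rsi(prices: List[float], n: int = 14) -> List[Optional[float]]:
    """RSI from simple averages of the last n gains and losses.

    Simple-average (Cutler) variant; Wilder's original RSI smooths
    gains and losses exponentially instead.
    """
    out: List[Optional[float]] = [None] * len(prices)
    changes = [prices[i] - prices[i - 1] for i in range(1, len(prices))]
    for t in range(n, len(prices)):
        window = changes[t - n:t]  # the n price changes ending at bar t
        avg_gain = sum(c for c in window if c > 0) / n
        avg_loss = sum(-c for c in window if c < 0) / n
        if avg_loss == 0:
            out[t] = 100.0
        else:
            out[t] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


def momentum_signal(roc: Optional[float], threshold: float) -> int:
    """Hypothetical rule: +1 (enter long) on an up bias, -1 (enter short) on a down bias, else 0."""
    if roc is None:
        return 0
    if roc > threshold:
        return 1
    if roc < -threshold:
        return -1
    return 0
```

For example, `rate_of_change([10.0, 11.0, 12.0], 2)[2]` is 20.0, which clears a 5% threshold and gives a long signal. Stochastics and MACD follow the same pattern of rolling windows over the price series.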
• Methodology:
• We are looking for one of three possible situations:
1. The market/instrument is overbought or oversold
2. The market/instrument is above or below our theoretical price
3. We detect extreme Optimism or Pessimism in the market/instrument
• Common indicators:
• Moving Average
• Ichimoku cloud
• Relative Strength Index (RSI)
• ConnorsRSI
• %b (Bollinger Bands)
• Moving average stretch
• Rate of change
• The number of days down
Look for imbalances
Mean Reversion
Mean Reversion
[Chart: price vs. Indicator. Price is below our Indicator: it is cheap, so we buy. Price returns to the Theoretical Price: the divergence is over, so we sell.]
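The buy-cheap, sell-at-theoretical-price loop in the chart can be sketched with two of the listed indicators. This is a minimal sketch, assuming the n-bar simple moving average plays the role of the "theoretical price" and that "cheap" means Bollinger %b below 0 (price under the lower band); the band width `k` and the population standard deviation are common conventions, not choices taken from the slides.

```python
import statistics
from typing import List, Optional, Tuple


def sma_and_percent_b(
    prices: List[float], n: int = 20, k: float = 2.0
) -> List[Tuple[Optional[float], Optional[float]]]:
    """Per bar: (n-bar simple moving average, Bollinger %b).

    Bands are SMA +/- k population standard deviations and
    %b = (P - lower) / (upper - lower), so %b < 0 means the price is
    below the lower band. Both values are None until n prices exist.
    """
    out: List[Tuple[Optional[float], Optional[float]]] = []
    for t in range(len(prices)):
        if t + 1 < n:
            out.append((None, None))
            continue
        window = prices[t + 1 - n:t + 1]
        mid = statistics.fmean(window)
        sd = statistics.pstdev(window)
        if sd == 0:
            out.append((mid, None))  # flat window: bands collapse, %b undefined
            continue
        lower, upper = mid - k * sd, mid + k * sd
        out.append((mid, (prices[t] - lower) / (upper - lower)))
    return out


def mean_reversion_trades(
    prices: List[float], n: int = 20, k: float = 2.0
) -> List[Tuple[int, int, float]]:
    """Buy when the price is 'cheap' (%b < 0); sell once it returns to the SMA.

    Returns (entry_bar, exit_bar, pnl_per_share) for each completed round trip.
    """
    trades: List[Tuple[int, int, float]] = []
    entry: Optional[Tuple[int, float]] = None
    for t, (mid, pct_b) in enumerate(sma_and_percent_b(prices, n, k)):
        if mid is None:
            continue
        if entry is None and pct_b is not None and pct_b < 0:
            entry = (t, prices[t])
        elif entry is not None and prices[t] >= mid:
            trades.append((entry[0], t, prices[t] - entry[1]))
            entry = None
    return trades
```

With `prices = [10.0, 10.0, 10.0, 7.0, 9.0, 11.0]`, `n=3` and `k=1.0`, the drop to 7.0 falls below the lower band and the position is closed at 9.0 once the price is back above its 3-bar average.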
Statistical Arbitrage
• Methodology:
• Find a mathematical relationship between two
or more instruments which holds the majority
of the time, based on historical analysis
• Fix the relationship between the instruments
as a constant by putting one in the numerator
and another in the denominator.
• If the current value of the constant calculated
with the latest prices is greater than the value
of the constant, then SELL the numerator AND
BUY the denominator. If it is less than the
constant value, do the opposite.
• Exit both trades when the current value of the ratio
returns to the historical value.
• Many variations:
• Pairs Trading
• Index Arbitrage
• ETF Arbitrage
• Multi-market Arbitrage
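The statistical arbitrage methodology above maps onto three small functions. This is a sketch under stated assumptions: the "historical constant" is taken to be the mean of the A/B price ratio over the lookback sample (one of several reasonable choices), and `band` is a hypothetical tolerance added so that tiny deviations do not trigger trades. With `band=0.0` it follows the slide's rule literally.

```python
from statistics import fmean
from typing import Optional, Sequence, Tuple

Legs = Tuple[str, str]  # (action on numerator A, action on denominator B)


def historical_constant(prices_a: Sequence[float], prices_b: Sequence[float]) -> float:
    """Fix the relationship as a constant: the mean of A/B over the historical sample."""
    return fmean(a / b for a, b in zip(prices_a, prices_b))


def entry_signal(
    price_a: float, price_b: float, constant: float, band: float = 0.0
) -> Optional[Legs]:
    """Ratio above the constant: SELL A (numerator) and BUY B (denominator).

    Ratio below the constant: the opposite. `band` is a hypothetical
    tolerance, as a fraction of the constant.
    """
    ratio = price_a / price_b
    if ratio > constant * (1 + band):
        return ("SELL", "BUY")
    if ratio < constant * (1 - band):
        return ("BUY", "SELL")
    return None


def should_exit(price_a: float, price_b: float, constant: float, legs: Legs) -> bool:
    """Exit both legs once the ratio has returned to the historical constant."""
    ratio = price_a / price_b
    if legs == ("SELL", "BUY"):
        return ratio <= constant
    return ratio >= constant
```

If A has historically traded at 4x B, a ratio of 4.5 sells A and buys B, and both legs are closed when the ratio is back at 4.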
Pairs Trading
[Chart: annotations "Constant is 4, >1", "Sell A & Buy B", "Flipped", "Buy A & Sell B"; two P&L panels; axis mark 1.00]
Evolution
Evolution
Algorithms have developed on a Path of: Automation->Obfuscation->Optimization
1. Automation:
• Began as simple automation of repetitive tasks, or of ones which required monitoring the market at all times
(VWAP)
• Worked very well, except for two major flaws: (1) they were very predictable, and (2) they were oblivious to changing
market conditions
2. Obfuscation:
• Market participants took advantage of the predictability of the algorithms to trade against, or to infer
information
• Solution: randomize order placement, more intelligent order placement, show partial orders, in short
Obfuscate your intentions
3. Optimization:
• Draw individual Instrument profile based on historical data
• Adjust to market conditions
• Generate signals to help guide Algorithms
• Proactive vs Reactive strategies
Illustrated Case Study - VWAP
• VWAP – Volume weighted Average Price
• Measure of fairness of an Execution
• Not necessarily the best execution
• Nobody argues with a large order executed at the VWAP
• First family of Algorithms to be implemented
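The VWAP benchmark itself is a one-line formula: total traded notional divided by total traded volume over the window. A minimal sketch, assuming market prints arrive as (price, volume) pairs; `vs_vwap_bps` is a hypothetical helper that expresses how an execution compares with that benchmark.

```python
from typing import Iterable, Tuple


def vwap(prints: Iterable[Tuple[float, float]]) -> float:
    """Volume Weighted Average Price: sum(price * volume) / sum(volume)."""
    notional = 0.0
    volume = 0.0
    for price, qty in prints:
        notional += price * qty
        volume += qty
    if volume == 0:
        raise ValueError("no volume traded in the window")
    return notional / volume


def vs_vwap_bps(avg_fill: float, market_vwap: float, side: str) -> float:
    """Execution vs. VWAP in basis points; positive = better than VWAP.

    A buy is better when filled below VWAP, a sell when filled above it.
    """
    sign = 1.0 if side == "BUY" else -1.0
    return sign * (market_vwap - avg_fill) / market_vwap * 10_000
```

For prints of 100 shares at $10.00 and 300 shares at $11.00, the VWAP is $10.75; a buy order averaging $10.70 beat it by roughly 46 bps.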
Automation
• Split Order Qty into 15 minute intervals and send 1/25 of the order every 15 minutes
• Benefits: Freed up traders; Execute more VWAP orders; Lowered costs
• Results: Executions not very good; Almost always worse than VWAP
• Problem: Brokers trade against the predictable flow and move prices ahead of each 15 minute interval (Algorithms Taken for a Ride)
Obfuscation
• Randomize Order Placement; Show Partial; Hidden Orders
• Results: Improved Execution; Match VWAP
• Challenges: Hard to get liquidity; Fast moving markets; Low Order/Execution Ratios
Optimization
• Optimize Volume Curves; Optimize Executions; Algos using Algos
• Proactively smoke out Algos: Poke-Poke-Slap, Crazy Joy-Stick
• Results: Beat VWAP Consistently
• Outcome: Competing Algos dry out Liquidity; New Level Playing field
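The Automation and Obfuscation approaches in the VWAP case study differ mainly in how the parent order is sliced. The sketch below contrasts them under stated assumptions: the even split stands in for "send a fixed fraction every 15 minutes", and the randomized version is one hypothetical way to "randomize order placement" by jittering both slice sizes and send times. Neither function is the author's implementation, and `jitter` is an illustrative parameter.

```python
import random
from typing import List, Optional, Tuple


def equal_slices(total_qty: int, n_slices: int) -> List[int]:
    """Automation: split the parent order evenly; any remainder goes to the earliest slices."""
    base, rem = divmod(total_qty, n_slices)
    return [base + (1 if i < rem else 0) for i in range(n_slices)]


def randomized_schedule(
    total_qty: int,
    n_slices: int,
    interval_min: float = 15.0,
    jitter: float = 0.3,
    seed: Optional[int] = None,
) -> List[Tuple[float, int]]:
    """Obfuscation: vary each slice size by up to +/- jitter and send it at a
    random minute inside its own interval. Returns (minute offset, quantity) pairs."""
    rng = random.Random(seed)
    weights = [1.0 + rng.uniform(-jitter, jitter) for _ in range(n_slices)]
    scale = total_qty / sum(weights)
    sizes = [int(w * scale) for w in weights]
    sizes[-1] += total_qty - sum(sizes)  # rounding residue goes to the last slice
    times = [i * interval_min + rng.uniform(0.0, interval_min) for i in range(n_slices)]
    return list(zip(times, sizes))
```

`n_slices` is left as a parameter; the regular US session runs 390 minutes, which is 26 fifteen-minute intervals. The even schedule is exactly the predictable flow the case study says brokers traded against, while the randomized one still sums to the full order but no longer prints the same size at the same minute.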
So What’s wrong?
• All Algorithms up to now have the same IF-DO overarching pattern: IF this happens DO this
• They are a snapshot of the author’s ideas, biases and shortcomings
• The creator HAS to forecast ALL possibilities and program all possible scenarios
• If markets have an atypical event, algorithms either shut off or have unpredictable consequences. Markets
ALWAYS have unpredictable events; coupled with fast execution, the potential risk is dramatic
• Models have the same issue: no matter how sophisticated the analysis, unless it is coded, it is not going
to react
• Algorithms and models don’t adjust and evolve on their own; they are static.
• The industry is littered with examples of algorithms going bad, their effects compounded by the ability to
execute large volumes in a very short time
• The Markets are not static, why have a static approach to Algorithms?
• ENTER The Machine Learning Driven Trading Bot…
How one bad algorithm cost traders $440m. The Register
Limitations
Machine Learning
Driven Approach
• Artificial Intelligence is focused on creating programs which
can extend their actions beyond following a strict set of
instructions.
• Machine Learning provides systems the ability to
automatically learn and improve from experience without
being explicitly programmed. It focuses on the development
of computer programs that can access data and use it to learn
for themselves
• Deep Learning involves feeding a computer system a lot of
data, which it can use to make decisions about other
data. The core of deep learning is that we now have fast
enough computers and enough data to actually train large
neural networks. As we construct larger neural
networks and train them with more and more data, their
performance continues to increase. This is generally
different from other machine learning techniques, which reach a
plateau in performance.
AI-ML-DL
Machine Learning Applied
[Diagram: Artificial Intelligence, Machine Learning, Deep Learning]
The Pillars of Machine Learning
Machine learning algorithms have been around since 1950 or earlier. The breakthrough is the ability to perform complex mathematical computations in a short amount of time over huge sets of data (Big Data), over and over, faster and faster.
[Diagram: the Machine Learning Renaissance, resting on the pillars below]
• Mathematical Models / AI Algorithms: Off-the-shelf basic models; Adapt and train models to my data; Run models in parallel
• Parallel Computing: Run complex models in useful time. Fast parallel computing is possible due to fast, cheap computation hardware: Run models in parallel; Use fast GPUs; Use fast RAM
• BigData: Data is the food for models. BigData is possible due to cheap storage and Hadoop: Store large amounts of diverse data; Scalable; Cost efficient; Fast
• Cheap Storage & CPU
• Simple Parallel Processing Platform & Implementation
• Closed System
• Clearly defined boundaries
• Single Reward – P&L
• Manageable set of predictors
• All data is labeled
• Time Scales typically short
• Every day a new additional set of training Data
The Domain
How appropriate is ML for Trading?
• Analysis which used to take hours or days can now be done in seconds
• Back-testing over a longer period of time with fuller data is now possible
• More data sources are available that can be used to build richer, more accurate
models
• Almost all the value of deep learning today comes through supervised learning, or learning
from labeled data
Why?
Machine Learning in Trading
Implementation
The Goal
Build an Algorithm which can:
1. Learn from a training set
2. Optimize Risk Adjusted P&L
3. Automatically adjust to changing market conditions
4. All within my acceptable Risk Directive
Methodology
• Define the Domain
• Pick the Predictor Variables & normalize
• Define constraints (trading times, Max Loss, Min/Max holding time, etc.)
• Train Models
• Predict using the Models and Rank
• Deploy
• Continuously Train Models
Events & Training
[Diagram: a single Tick (10:00 S 100 AAPL @$150 NYSE) propagates into a cascade of updates: Update Position (5,000 AAPL $750,000); Update Risk; Update Total Volume (AAPL 1,251,600); Update Position Price (5,000 AAPL $750,000); Update Sector Move (Tech Sector up 0.2%); Update Unrealized P&L (5,000 AAPL ($2,500)); Calculate Exposure (AAPL 5% of Total Exposure); Calculate Relative Value (AAPL +$0.12); Recalculate Avg Price (AAPL 5,000 @ $151.12); Recalculate Tendency (UP +0.3); Recalculate Dev Sector (AAPL (0.01%)); plus many further updates]
• Core Events:
• Quotes
• Trades
• Core Events Frequency
• AAPL
• 30,000 Trades per day
• 100,000 Quote Events per Day
• Market
• 40,000,000 Tick Events per day
• External Events:
• Fed Announcements
• Earnings
• Economic Indicators
• 3rd Party Indicators
• Each event propagates and
updates/creates predictors
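The fan-out from one tick to many predictors can be sketched as a small event handler. This is a minimal illustration, assuming a hypothetical `EventBook` that tracks our position and a few per-symbol predictors (total volume, position value, unrealized P&L, share of gross exposure); a production system would update many more, as the diagram suggests.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SymbolState:
    position: int = 0          # our shares (negative = short)
    avg_price: float = 0.0     # average cost of our position
    last_price: float = 0.0    # last trade price seen
    total_volume: int = 0      # cumulative market volume seen today


@dataclass
class EventBook:
    """Hypothetical per-symbol state that every incoming event updates."""

    symbols: Dict[str, SymbolState] = field(default_factory=dict)

    def on_fill(self, symbol: str, price: float, qty: int) -> None:
        """Our own execution (qty > 0 buy, qty < 0 sell): update position and average cost."""
        if qty == 0:
            return
        s = self.symbols.setdefault(symbol, SymbolState())
        new_pos = s.position + qty
        if s.position == 0 or (s.position > 0) == (qty > 0):
            # opening or adding in the same direction: re-weight the average cost
            s.avg_price = (s.avg_price * s.position + price * qty) / new_pos
        elif new_pos == 0:
            s.avg_price = 0.0          # flat again
        elif (new_pos > 0) != (s.position > 0):
            s.avg_price = price        # flipped through zero: remainder opened at this price
        # a partial reduction keeps the existing average cost
        s.position = new_pos
        s.last_price = price

    def on_trade_tick(self, symbol: str, price: float, size: int) -> Dict[str, float]:
        """A market trade print: propagate it into the derived predictors for that symbol."""
        s = self.symbols.setdefault(symbol, SymbolState())
        s.total_volume += size
        s.last_price = price
        return {
            "total_volume": float(s.total_volume),
            "position_value": s.position * price,
            "unrealized_pnl": s.position * (price - s.avg_price),
            "exposure_pct": self.exposure_pct(symbol),
        }

    def exposure_pct(self, symbol: str) -> float:
        """This symbol's share of gross exposure across all symbols, in percent."""
        gross = sum(abs(st.position * st.last_price) for st in self.symbols.values())
        s = self.symbols[symbol]
        return 0.0 if gross == 0 else 100.0 * abs(s.position * s.last_price) / gross
```

Holding 5,000 AAPL bought at $150, a market print at $149.50 returns a position value of $747,500 and an unrealized P&L of ($2,500).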
Instruments & Predictors
• Target Instruments
• US Exchange Listed Equities
• > $5 Last Closing Price (Stocks < $5 are subject to different Trading restrictions)
• > 100,000 Average Daily Volume
• Approximately 3,007 Equities
• Predictors/Features are endless:
• Technical Indicators, Price, sectors, volume, indexes, etc
• Careful determination needs to be made to pick the best predictors
• If it does not make sense, do not include it!
• Simple is good
• Keep predictors to the least amount necessary for a good result
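The target-instrument rules above reduce to a simple filter. A minimal sketch, assuming a hypothetical `Listing` record per symbol with its last close and average daily volume already computed; the thresholds are the slide's ($5, 100,000 shares).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Listing:
    symbol: str
    exchange_listed: bool      # listed on a US exchange
    last_close: float          # last closing price, USD
    avg_daily_volume: float    # average daily volume, shares


def target_universe(
    listings: Iterable[Listing], min_price: float = 5.0, min_adv: float = 100_000
) -> List[str]:
    """Keep exchange-listed names with close > min_price and ADV > min_adv."""
    return [
        x.symbol
        for x in listings
        if x.exchange_listed and x.last_close > min_price and x.avg_daily_volume > min_adv
    ]
```

A $3.20 stock is excluded however liquid it is, and a $20 stock trading 50,000 shares a day is excluded however high its price.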
Train – Validate - Train
• Reward/Target variable is positive P&L
• Two Simple Questions need to be answered:
1. If I Buy Stock x when Features (A=x, B=y, C=z,..) and Sell when Features (A=r, B=s, C=t,..), how much
money do I make over period P, as long as my negative P&L is never greater than L at any point in time?
2. If I Sell Short Stock x when Features (A=x, B=y, C=z,..) and Buy when Features (A=r, B=s, C=t,..), how much
money do I make over period P, as long as my negative P&L is never greater than L at any point in time?
(This can also be deduced by taking the opposite of (1).)
• Permutations get very large, since each tick is an opportunity to Buy or Sell
and we have 40,000,000 Ticks in a day:
• Buy at first Tick, Sell at second Tick * each permutation of predictors at Buy time and at Sell Time
• Buy at first Tick Sell at third Tick * each permutation of predictors at Buy time and at Sell Time
• For Buy at x Tick Sell at y Tick * each permutation of predictors at Buy time and at Sell Time
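The two labeling questions can be sketched as an enumeration of entry/exit tick pairs, keeping only pairs whose running P&L never breaches the loss limit L. This is a minimal sketch under stated assumptions: it labels price pairs only (the feature values at entry and exit would be joined on separately), `max_hold` is a hypothetical cap on holding time in ticks, and `n_entry_exit_pairs` shows why the search space explodes.

```python
from typing import List, Optional, Sequence, Tuple


def label_trades(
    prices: Sequence[float],
    max_loss: float,
    side: str = "LONG",
    qty: int = 1,
    max_hold: Optional[int] = None,
) -> List[Tuple[int, int, float]]:
    """Enumerate every (entry tick i, exit tick j > i) pair and its P&L.

    A pair is kept only if the running P&L from i to j never falls below
    -max_loss; once the limit is breached, every later exit for that entry
    is invalid too.
    """
    sign = 1 if side == "LONG" else -1   # SHORT: profit when the price falls
    n = len(prices)
    labels: List[Tuple[int, int, float]] = []
    for i in range(n):
        last = n if max_hold is None else min(n, i + max_hold + 1)
        for j in range(i + 1, last):
            pnl = sign * qty * (prices[j] - prices[i])
            if pnl < -max_loss:
                break
            labels.append((i, j, pnl))
    return labels


def n_entry_exit_pairs(n_ticks: int) -> int:
    """Buy-before-sell tick pairs in one day, before any feature permutations."""
    return n_ticks * (n_ticks - 1) // 2
```

For 40,000,000 ticks, `n_entry_exit_pairs` gives 799,999,980,000,000 pairs before multiplying by feature permutations. The short side is computed directly rather than by negating the long labels, because the loss limit applies to the short's own running P&L.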
Train
• Since data is labeled and we want to optimize the output (P&L) given the input (features) ->
Supervised Machine Learning with Logistic Regression works well
• Optimize for Maximum P&L within the boundaries of Risk. The result:
• Entry and Exit parameters, Feature settings, Take profit level, Stop loss level, Position size, …
• We are looking for both ends of the result rank: those with the highest probability of positive P&L
and those with the lowest probability (Why?)
[Diagram: Raw Tick Data -> Calculate Derived Indicators -> Enriched Data, combined with Economic & Non-Tick Data -> Feature Selection -> Train, Tune, Test Models -> Rank -> Deploy]
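The train-then-rank step can be sketched with a plain logistic regression. This is a minimal stdlib sketch, assuming each labeled setup has already been turned into a normalized feature vector with label 1 for positive P&L and 0 otherwise (the binary label is an assumption; the slides optimize P&L itself). It uses batch gradient descent rather than any particular library's solver, and `rank_both_ends` returns both ends of the ranking as the slide describes.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 500
) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; y[i] = 1 if that setup had positive P&L."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that a setup with features x ends with positive P&L."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rank_both_ends(
    candidates: Sequence[Tuple[str, Sequence[float]]], w: Sequence[float], b: float, k: int
) -> Tuple[List[str], List[str]]:
    """Return the k highest-probability and the k lowest-probability candidates."""
    scored = sorted(((predict_proba(w, b, x), name) for name, x in candidates), reverse=True)
    top = [name for _, name in scored[:k]]
    bottom = [name for _, name in scored[::-1][:k]]
    return top, bottom
```

Trained on a single feature where positive values led to winning trades, the model ranks a candidate at +2.5 first and one at -2.5 last.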
…A few twists
• Need to take into consideration Execution related variables:
• Slippage: Can I trade at the price displayed?
• Liquidity: Can I get the full size that I want to trade?
• Market Impact: Will my actions cause the market to move?
• Strategies to deal with Execution:
• Train only with completed trades, or
• Run Analysis to calculate slippage variable (as accurate as possible, per symbol,
size, etc.)
• Run Analysis to calculate Liquidity Probability (as accurate as possible)
• Limit volume to a maximum % which has a low chance of causing market impact
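The three execution analyses above can each start from a simple statistic. A minimal sketch, assuming a history of our own fills recorded as (side, displayed price, executed price) and of requested vs. filled quantities; the 5% participation cap is an illustrative default, not a figure from the slides.

```python
from statistics import fmean
from typing import Iterable, Sequence, Tuple

Fill = Tuple[str, float, float]  # (side, displayed price, executed price)


def avg_slippage_bps(fills: Iterable[Fill]) -> float:
    """Mean slippage in basis points; positive means we traded worse than displayed."""
    costs = []
    for side, displayed, executed in fills:
        sign = 1.0 if side == "BUY" else -1.0
        costs.append(sign * (executed - displayed) / displayed * 10_000)
    return fmean(costs)


def fill_ratio(requested: Sequence[int], filled: Sequence[int]) -> float:
    """Share of requested quantity that actually executed: a crude liquidity probability."""
    return sum(filled) / sum(requested)


def capped_quantity(desired: int, expected_volume: int, max_participation: float = 0.05) -> int:
    """Never send more than max_participation of the volume expected in the interval."""
    return min(desired, int(expected_volume * max_participation))
```

In practice these would be bucketed per symbol and order size, as the slide suggests, and fed to the model as features or used to haircut the expected P&L of each setup.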
Retraining the Model
• Models need to be retrained so that they are dynamic and adjust to changing market conditions.
• We want a model which:
• Takes into account Recent Trades when making a decision
• Is updatable, and “adjusts” as data streams through
• There are Five basic strategies:
1. Periodic retraining: If monitoring for data in real-time has high overhead, or the model takes too long to
retrain
2. Micro-Batching: Retraining with smaller sets
3. Sliding Window: Continuously retrain the model with a smaller set of ordered data including the latest
observations
4. Incremental Algorithms: The model is updated each time a new observation arrives. There are
incremental versions of Support Vector Machines and Neural networks. Bayesian Networks can be
made to learn incrementally.
5. Online learning: Stochastic Gradient Descent, computationally cheap method for adaptive supervised
learning in an online environment
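Strategies 3 and 5 can be sketched together: an online logistic model that takes one stochastic gradient descent step per observation, and a sliding window that retrains from scratch on only the latest observations. Both classes are hypothetical illustrations, assuming labels are 1 for positive P&L and 0 otherwise; `lr` and the window size are illustrative.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple


class OnlineLogit:
    """Online learning: one stochastic gradient descent step on the log-loss per observation."""

    def __init__(self, n_features: int, lr: float = 0.05) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        z = sum(wj * xj for wj, xj in zip(self.w, x)) + self.b
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: Sequence[float], y: int) -> None:
        err = self.predict_proba(x) - y  # gradient of the log-loss w.r.t. the logit
        self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, x)]
        self.b -= self.lr * err


class SlidingWindow:
    """Sliding window: keep only the latest `size` labeled observations and retrain on them."""

    def __init__(self, size: int) -> None:
        self.buf: Deque[Tuple[Sequence[float], int]] = deque(maxlen=size)

    def add(self, x: Sequence[float], y: int) -> None:
        self.buf.append((x, y))  # the oldest observation drops out automatically

    def retrain(self, n_features: int, epochs: int = 20, lr: float = 0.05) -> OnlineLogit:
        model = OnlineLogit(n_features, lr)
        for _ in range(epochs):
            for x, y in self.buf:
                model.update(x, y)
        return model
```

The design trade-off is the one the list describes: the online model adapts gradually and cheaply on every tick, while the window forgets older regimes outright at the cost of a periodic retrain.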
• Find the optimal mix of values which will yield the desired return within the required
risk limitations
• Search for the optimal mix of Risk & P&L
• Some of the considerations:
• Size of each position
• Total Open positions
• Stop-loss
• Stop-Gain
• Max Draw-down
• Max losing days vs winning days
My appetite for Risk
Risk Adjusted Optimization
Optimization Example
Actual Trading P&L
Approach
• Ran trade by trade P&L on each day traded and calculated the total P&L at
each trade (Realized + unrealized).
• Took 90,000 possible combinations of a Stop-Loss/Max-Gain by:
• Taking Max-Loss from $1,000 to $300,000 of floor P&L, in increments of $1,000
• Taking Max-Gain from ($300,000) to ($1,000) of floor P&L, in increments of $1,000
• From each pair of Max-Loss/Max-Gain, I calculated at which point we would
have exited the day, or if the limits were not hit I took the End of Day P&L.
• Finally I calculated the Sharpe Ratio for each set of Max-Loss/Max-Gain pair,
as well as number of positive days vs negative days.
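The grid search described above can be sketched as follows. This is a minimal sketch under stated assumptions: each day is a list of cumulative (realized + unrealized) P&L values after each trade, expressed as the strategy's own P&L with positive stop thresholds. The slides quote their limits in floor-P&L terms "at 100% reverse"; reading that as negating the floor P&L first is an assumption. The Sharpe ratio here is mean over sample standard deviation of daily P&L, not annualized and without a risk-free rate, since the slides do not state their convention.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def day_result(path: Sequence[float], stop_loss: float, stop_gain: float) -> float:
    """Exit at the first trade whose cumulative P&L breaches either limit, else end of day."""
    for pnl in path:
        if pnl <= -stop_loss or pnl >= stop_gain:
            return pnl
    return path[-1] if path else 0.0


def sharpe(daily: Sequence[float]) -> float:
    """Mean / sample standard deviation of daily P&L (NaN if undefined)."""
    if len(daily) < 2:
        return float("nan")
    sd = stdev(daily)
    return float("nan") if sd == 0 else mean(daily) / sd


def grid_search(
    days: Sequence[Sequence[float]], step: int = 1_000, max_level: int = 300_000
) -> Dict[Tuple[int, int], Tuple[float, float, int, int]]:
    """For every (stop_loss, stop_gain) pair: (total P&L, Sharpe, winning days, losing days)."""
    results: Dict[Tuple[int, int], Tuple[float, float, int, int]] = {}
    levels = range(step, max_level + step, step)
    for sl in levels:
        for sg in levels:
            daily: List[float] = [day_result(p, sl, sg) for p in days]
            wins = sum(1 for d in daily if d > 0)
            losses = sum(1 for d in daily if d < 0)
            results[(sl, sg)] = (sum(daily), sharpe(daily), wins, losses)
    return results
```

With the defaults, 300 stop-loss levels times 300 stop-gain levels gives the 90,000 combinations described; sorting the results by Sharpe ratio or by total P&L produces the kind of comparison reported on the following slides.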
Extreme Risk Aversion
Win a little and Exit
All Winning
Extremely Low Volatility Tolerance
• Safest:
• Stop-Loss of $21,000 floor P&L, Stop-Gain ($2,000)
• Total September P&L of $56,934 at 100% reverse
• Highest Sharpe Ratio of 3.88
• 100% winning days.
• Best Combination of P&L and Risk Adjusted
Return:
• Stop-Loss of $28,000 floor P&L, Stop-Gain ($60,000)
• Total September P&L of $622,822 at 100% reverse
• Sharpe Ratio of 0.72
• 13/7 winning/losing days.
Goldilocks Zone
Results
Highest P&L:
Stop-Loss of $100,000, Stop-Gain of ($1,000,000)
Total September P&L of $1,201,545 at 100% reverse
Sharpe Ratio of 0.44
12/8 Winning/Losing days
Second Highest P&L:
Stop-Loss of $64,000, Stop-Gain of ($299,000)
Total September P&L of $1,121,497 at 100% reverse
Sharpe Ratio of 0.44
12/8 Winning/Losing days.
The higher the Sharpe Ratio, the better. Not surprisingly, the Sharpe Ratio was directly
correlated with the number of winning days vs losing days
Safest Middle Ground
All Winning
Best Combination of P&L and Risk Adjusted Return
Highest P&L
Second Highest P&L:
• Monzo, a British banking startup, built a model quick enough to stop would-be fraudsters from completing a
transaction, bringing the fraud rate on its pre-paid cards down from 0.85% in June 2016 to less than 0.1% by
January 2017
• In June 2016 JPMorgan Chase deployed software that can sift through 12,000 commercial-loan contracts in
seconds, compared with the 360,000 hours it used to take lawyers and loan officers to review the contracts
• The quantitative-investment strategies division at Goldman Sachs uses language processing driven by
machine-learning to go through thousands of analysts’ reports on companies. It compiles an aggregate
“sentiment score” based on the balance of positive to negative words. This score is then used to help pick
stocks.
• Castle Ridge Asset Management, a Toronto-based upstart, has achieved annual average returns of 32% since
its founding in 2013. It uses a sophisticated machine-learning system, like those used to model evolutionary
biology, to make investment decisions. It is so sensitive, claims the firm’s chief executive, Adrian de Valois-
Franklin, that it picked up 24 acquisitions before they were even announced (because of telltale signals
suggesting a small amount of insider trading)
Public Statements
Does it Work?
The Economist, May 25th 2017
Lessons so far
From the battlefield
Lessons
1. Algorithms always need to be bound by
reason
2. Trading Bots must always be bound by
Risk Constraints
3. Complexity is your enemy: if you don’t
understand it, don’t do it
4. Best defense is offense
5. Execution is a big part of performance
6. Continuously adjust your models and
verify your assumptions
7. Have a PANIC button
8. Predicting is bad, arbitrage is good
9. On a risk adjusted basis, taking lots of
small bets is better than taking a large one
10. Controlling risk is the key to long term
positive returns
11. Speed does not beat intelligence
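Lessons 2 and 7 (risk constraints and a PANIC button) can be made concrete as a pre-trade gate that every order must pass. This is a hypothetical sketch: `RiskGate`, its two limits and the method names are illustrative and not taken from the talk; a real gate would cover many more constraints (open positions, drawdown, order rates).

```python
from dataclasses import dataclass


@dataclass
class RiskGate:
    """Hypothetical pre-trade gate: every order passes through check_order() first."""

    max_position_value: float   # largest absolute position value allowed per symbol
    max_daily_loss: float       # daily P&L floor; breaching it trips the PANIC switch
    halted: bool = False
    reason: str = ""

    def panic(self, reason: str = "manual PANIC button") -> None:
        """Stop all new orders until a human resets the gate."""
        self.halted = True
        self.reason = reason

    def reset(self) -> None:
        self.halted, self.reason = False, ""

    def check_order(self, position_value: float, order_value: float, daily_pnl: float) -> bool:
        if self.halted:
            return False
        if daily_pnl <= -self.max_daily_loss:
            self.panic("daily loss limit breached")
            return False
        # reject (without halting) orders that would push the position past its limit
        return abs(position_value + order_value) <= self.max_position_value
```

The key design choice is that a breached loss limit halts everything until a human intervenes, while an oversized order is merely rejected.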
So What About HFT?
“Have no fear for atomic energy, 'Cause none of them can stop the time.” Bob Marley, Redemption Song
All About Speed
• Latencies associated with HFT are now well under one millisecond. Data produced by the SEC show that order
interaction times can be as low as 50 microseconds. That is, once an order is placed on the books of an
exchange, it can either be traded against or canceled, in whole or in part, within 50 millionths of a second
• Data show the typical trade-to-order submission ratios are between 2% and 4% on the major exchanges. That
is, between 25 and 50 orders are generated for every execution. These submission ratios are even lower for
exchange traded products such as ETFs, running well under 1%.
• The lifetime of these orders can be very short, as the governing algorithms implement their designated
strategies by continuously canceling and replacing orders. For example, about 8% of orders are fully canceled
in 500 microseconds, and almost half of orders are canceled in less than a second.
• My Opinion: Competing based on speed is a model of diminishing returns; it takes increasingly more money to
keep a speed advantage for an ever-decreasing rate of return.
• Speed advantage will disappear either because of regulation, or because a faster competitor will arrive
Opportunities
Bountiful
Expanding Opportunities
• ETF Arbitrage (Poor man’s index arbitrage)
• Ratio Trading
• Passive Market Making
• ADR Arbitrage
• Multi-Leg Arbitrage (most large companies are listed on multiple exchanges)
Thank You!
Diego Baez
dbaez@hortonworks.com

 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Machine Learning trading bots

  • 1. Machine Learning Driven Trading Bots. Diego Baez, General Manager Financial Services, Hortonworks. © Hortonworks Inc. 2011–2018. All rights reserved.
  • 2. Agenda
    • Background
    • A Brief History
    • What Is a Trading Bot
    • Machine Learning Driven Approach
    • Implementation
    • Lessons So Far
  • 3. The US Equity Market
    • USD $40 trillion traded every year
    • Over 6 billion shares trade every day
    Genesis of Electronic Trading
    • The majority of trades interact with the exchange without a human in the middle
    • Automated trading bots account for half of the total volume, about 3 billion shares per day
    • Algorithms generate large numbers of orders for each execution
  • 4. A Brief History
  • 5. Two Forces Drive Market Evolution
    Technology
    • Enables new business models
    • Reduces cost
    • Becomes a competitive advantage
    Regulation
    • Creates new opportunities
    • Ends business models
  • 6. Electronic Trading: Both Forces Converging
    Regulation
    • 1998, Reg. ATS: passed by the SEC to restrict the monopoly enjoyed by NYSE and NASDAQ
    • 2001: U.S. stock exchanges began quoting prices in decimals instead of fractions, bringing the minimum spread between the bid and ask prices down from 1/16th of a dollar (6.25 cents) to one cent
    • 2005, Reg. NMS: the Trade-Through Rule promotes transparency and competition between markets and requires trade orders to be posted nationally rather than at individual exchanges
    • No-action letter from the SEC to list ETFs (exchange-traded funds)
    Technology
    • The cost of computation rapidly decreases, lowering the barriers to entry for setting up an ECN
    • From 2000 to 2010, storage costs dropped by 500x
    • Emergence of ECNs: alternative electronic trading platforms
    • Narrowing spreads: harder to make money
    • Inter-exchange arbitrage: profit from any small price difference of a security between two different exchanges
    • Growth of ETFs: arbitrage opportunities multiply exponentially
  • 7. US Markets After Electronic Trading
    Fragmented
    • 12 exchanges
    • Nearly 50 “dark pools”
    Electronic
    • Over half of the volume is algorithm-generated
    ETFs Are Rising
    • Fast and dramatic rise of ETFs
  • 8. Trading Bots Require Electronic Markets
    Need direct, fast access to market activity (market data)
    • Orders and trades are indicators of supply and demand
    • Delayed information is stale information
    • The exchange duopoly was maintained by restricting or delaying market data
    Need direct market access (DMA) to buy and sell directly on the exchange
    • Ability to act quickly based on trigger conditions
    • Any delay could miss the price
    • Prevents chasing the price
    • Human in the loop = information leakage
    The most significant evolution in the financial markets over the last 30 years is the transition of trading from a manual to an automated process. This has affected everyone: exchanges, market centers, regulators, brokers and investors. Automated trading has reshaped our markets and is an intrinsic part of them moving forward.
  • 9. Impact on the Financial Services Industry
    • Fewer brokers: FINRA reports 3,816 registered securities firms in February 2017, down from 5,005 a decade earlier in 2007
    • More venues: equity trading can occur on any of 12 registered public exchanges and over 30 ATSs
    • Dark liquidity: the Tabb Group reports that in Q2 2016, equity market volume was split between 56.9% on lit venues and 43.1% on dark venues
    • Dark liquidity and dark trading have always existed on U.S. equity markets. In prior years, NYSE floor brokers were a source of dark liquidity, either leaving large customer orders with the specialist (passive participation) or working them over time as a member of the trading crowd (active participation).
  • 10. Market Failures
    • Market breaks are not a new phenomenon. An entire chapter (Chapter XIII) of the original Special Study focused on the market break of May 28, 1962. As these two passages from the study illustrate, many of the issues in 1962 are still relevant today:
      “The avalanche of orders which came into the market during this period subjected the market mechanisms to extraordinary strain, and in many respects they did not function in a normal way. Particularly significant were the lateness of the tape and the consequent inability of investors to predict accurately the prices at which market orders would be executed.” (p. 859) […]
      “The history of the May 28 market break reveals that a complex interaction of causes and effects--including rational and emotional motivations as well as a variety of mechanisms and pressures--may suddenly create a downward spiral of great velocity and force.” (p. 861)
  • 11. The Trading Bot: Automation, Obfuscation, Optimization
  • 12. What Is a Trading Bot?
    • A computer program which interacts directly with the market
    • Buys or sells directly with the market, without a human in the loop
    • Can be used for:
      • Cost reduction
      • Optimizing operations
      • Defensive mechanisms
      • Maximizing profits
    • Uses electronic market access to buy and sell
    • Not possible until regulation and technology converged in the late '90s
    (Diagram: Trading Bot, Algorithmic Trading, HFT (High Frequency Trading), Automated Trading, Markets, Direct Market Access)
  • 13. Intersection of Disciplines
    Market Knowledge
    • How each market behaves
    • Buying/selling dynamics
    • What strategies work
    Mathematical Models
    • Moving average models which signal divergence
    • Complex multivariate models
    Technology Infrastructure
    • Vast data storage
    • Analytics platform
    • Back-testing platform
    • Interfaces to interact with the market
  • 14. When Is It Most Useful?
    • Fast markets
      • Market information changes fast
      • Market execution moves fast
      • Many variables are changing fast
      • US equity markets generate 40,000,000 tick events per day
    • Many instruments to trade
      • US equity markets have over 10,000 instruments which trade every day
      • Each is an opportunity
      • Add options, FX, futures and indexes, and the possibilities multiply exponentially
    • Many opportunities occur at the same time
      • Even with a successful strategy, a single trader can only make markets manually in 1–5 stocks at the same time
      • Meanwhile, opportunities appear across many instruments at once
      • Not applying that strategy to ALL opportunities is wasted P&L
  • 15. Opportunities in a Market: Rule of Thumb
    Opportunities grow with:
    • Number of instruments which trade
    • Number of trades
    • Volatility
    • Time the market is open
  • 16. Strategies: Primary Types
    The majority of trading strategies fall into one of these categories:
    1. Momentum: follow the market
    2. Mean reversion: go against the market
    3. Statistical arbitrage: take advantage of market disequilibrium
  • 17. Momentum: Arrive Early, Stay Until It Closes
    • Methodology:
      • Detect when the market has an up or down bias, as early in the move as possible
      • Enter a position when the move begins
      • Detect that the move has completed
      • Exit the position
    • Common indicators used to detect momentum:
      • Stochastics
      • MACD
      • ROC
      • RSI
      • Momentum indicator
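The enter/exit methodology above can be sketched with one of the listed indicators, the Rate of Change (ROC). This is a minimal illustration, not the author's strategy: the threshold rule (go long when ROC rises above `entry` percent, go flat when it falls back to `exit_level` or below) and all function names and parameter values are assumptions chosen for the example.

```python
from typing import List, Optional


def rate_of_change(prices: List[float], n: int) -> List[Optional[float]]:
    """ROC over n bars, in percent: (P_t - P_{t-n}) / P_{t-n} * 100.

    Returns None for the first n bars, where no lookback price exists.
    """
    roc: List[Optional[float]] = []
    for t, price in enumerate(prices):
        if t < n:
            roc.append(None)
        else:
            roc.append((price - prices[t - n]) / prices[t - n] * 100.0)
    return roc


def momentum_positions(
    prices: List[float], n: int, entry: float = 1.5, exit_level: float = 0.0
) -> List[int]:
    """Hypothetical rule returning 1 (long) or 0 (flat) for each bar.

    Enter when ROC rises above `entry` (the move has begun);
    exit when ROC drops to `exit_level` or below (the move has completed).
    """
    position = 0
    positions: List[int] = []
    for r in rate_of_change(prices, n):
        if r is not None:
            if position == 0 and r > entry:
                position = 1
            elif position == 1 and r <= exit_level:
                position = 0
        positions.append(position)
    return positions


prices = [100, 100, 101, 103, 106, 106, 105, 103]
print(momentum_positions(prices, n=2))  # [0, 0, 0, 1, 1, 1, 0, 0]
```

The same skeleton accepts any of the other listed indicators (MACD, RSI, stochastics) by swapping the signal function; only the entry and exit conditions change.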
  • 18. Mean Reversion: Look for Imbalances
    • Methodology: we are looking for one of three possible situations:
      1. The market/instrument is overbought or oversold
      2. The market/instrument is above or below our theoretical price
      3. We detect extreme optimism or pessimism in the market/instrument
    • Common indicators:
      • Moving average
      • Ichimoku cloud
      • Relative Strength Index (RSI)
      • ConnorsRSI
      • %b (Bollinger Bands)
      • Moving average stretch
      • Rate of change
      • The number of days down
  • 19. Mean Reversion Indicator
    (Chart: the price falls below our indicator, so it is cheap and we buy; the price returns to the theoretical price; the divergence is over, so we sell.)
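The buy-below, sell-at-the-mean cycle in the chart above can be sketched as follows, assuming a simple moving average (SMA) plays the role of the "theoretical price" and using Bollinger %b from the indicator list as a companion measure. The `entry_pct` threshold, the band multiplier `k` and the function names are illustrative assumptions, not values from the presentation.

```python
from statistics import mean, pstdev
from typing import List, Optional


def sma(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    return [
        mean(prices[t - window + 1 : t + 1]) if t >= window - 1 else None
        for t in range(len(prices))
    ]


def percent_b(prices: List[float], window: int, k: float = 2.0) -> List[Optional[float]]:
    """Bollinger %b = (P - lower) / (upper - lower), bands = SMA +/- k * stdev.

    None when there is not enough history or the bands have zero width.
    """
    out: List[Optional[float]] = []
    for t, price in enumerate(prices):
        if t < window - 1:
            out.append(None)
            continue
        win = prices[t - window + 1 : t + 1]
        mid, sd = mean(win), pstdev(win)
        if sd == 0:
            out.append(None)
            continue
        lower, upper = mid - k * sd, mid + k * sd
        out.append((price - lower) / (upper - lower))
    return out


def mean_reversion_positions(
    prices: List[float], window: int, entry_pct: float = 0.02
) -> List[int]:
    """Hypothetical rule: buy when price is `entry_pct` below the SMA (cheap),
    sell when price returns to the SMA (divergence over). 1 = long, 0 = flat."""
    position = 0
    positions: List[int] = []
    for price, avg in zip(prices, sma(prices, window)):
        if avg is not None:
            if position == 0 and price < avg * (1 - entry_pct):
                position = 1
            elif position == 1 and price >= avg:
                position = 0
        positions.append(position)
    return positions
```

A %b below 0 means the price is under the lower band, which is one common way to quantify "cheap"; the SMA-distance rule is used for the position logic here only because it maps directly onto the chart.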
  • 20. Statistical Arbitrage
    • Methodology:
      • Find a mathematical relationship between two or more instruments which holds the majority of the time, based on historical analysis
      • Fix the relationship between the instruments as a constant by putting one in the numerator and another in the denominator
      • If the current value of the ratio, calculated with the latest prices, is greater than the constant, SELL the numerator AND BUY the denominator. If it is less than the constant, do the opposite
      • Exit both trades when the current value of the ratio returns to the historical value
    • Many variations:
      • Pairs trading
      • Index arbitrage
      • ETF arbitrage
      • Multi-market arbitrage
  • 21. Pairs Trading
    (Chart: the constant is 4; above it, sell A & buy B; when the relationship flips, buy A & sell B; P&L is marked on each leg.)
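A minimal sketch of the ratio rule from the statistical arbitrage methodology, using the chart's constant of 4 in the example. Two details are assumptions, not from the slides: the historical constant is taken as the average A/B ratio, and a small hypothetical `band` around the constant keeps the bot from entering on noise.

```python
from statistics import mean
from typing import List, Sequence


def historical_constant(a_hist: Sequence[float], b_hist: Sequence[float]) -> float:
    """Assumption: the 'constant' is the average historical ratio A/B."""
    return mean(a / b for a, b in zip(a_hist, b_hist))


def pair_states(
    a_prices: Sequence[float],
    b_prices: Sequence[float],
    constant: float,
    band: float = 0.01,
) -> List[int]:
    """Position per tick: -1 = short A / long B, +1 = long A / short B, 0 = flat.

    Enter when the ratio A/B leaves a +/- `band` corridor around the constant;
    exit both legs when the ratio returns to the constant.
    """
    state = 0
    states: List[int] = []
    for a, b in zip(a_prices, b_prices):
        ratio = a / b
        if state == 0:
            if ratio > constant * (1 + band):
                state = -1  # ratio too high: SELL numerator A, BUY denominator B
            elif ratio < constant * (1 - band):
                state = 1  # ratio too low: BUY A, SELL B
        elif state == -1 and ratio <= constant:
            state = 0  # divergence closed: exit both legs
        elif state == 1 and ratio >= constant:
            state = 0
        states.append(state)
    return states


print(pair_states([40, 42, 41, 40, 38, 40], [10] * 6, constant=4.0))
# [0, -1, -1, 0, 1, 0]
```

Index and ETF arbitrage follow the same pattern with a basket in the numerator or denominator; the ratio just becomes a weighted combination of prices.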
  • 22. Evolution
  • 23. Evolution
    Algorithms have developed along a path of Automation -> Obfuscation -> Optimization:
    1. Automation:
      • Began as simple automation of repetitive tasks, or of tasks which required monitoring the market at all times (VWAP)
      • Worked very well, except for two major flaws: (1) the algorithms were very predictable, and (2) they were oblivious to changing market conditions
    2. Obfuscation:
      • Market participants took advantage of the predictability of the algorithms to trade against them, or to infer information
      • Solution: randomize order placement, place orders more intelligently, show partial orders; in short, obfuscate your intentions
    3. Optimization:
      • Draw individual instrument profiles based on historical data
      • Adjust to market conditions
      • Generate signals to help guide algorithms
      • Proactive vs. reactive strategies
  • 24. Illustrated Case Study: VWAP
    • VWAP: Volume Weighted Average Price
    • A measure of the fairness of an execution
    • Not necessarily the best execution
    • Nobody argues with a large order executed at the VWAP
    • First family of algorithms to be implemented
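VWAP is the total traded notional divided by the total traded volume, VWAP = Σ(pᵢ·qᵢ) / Σqᵢ over the trades in the period. The sketch below computes it from (price, quantity) prints and adds a hypothetical helper that expresses an execution's slippage against VWAP in basis points, which is one way to use VWAP as the "fairness" benchmark described above. The helper's name and sign convention are assumptions.

```python
from typing import Iterable, Tuple


def vwap(trades: Iterable[Tuple[float, float]]) -> float:
    """VWAP = sum(price * qty) / sum(qty) over (price, qty) trades."""
    notional = 0.0
    volume = 0.0
    for price, qty in trades:
        notional += price * qty
        volume += qty
    if volume == 0:
        raise ValueError("no volume traded")
    return notional / volume


def slippage_vs_vwap_bps(side: str, avg_fill: float, market_vwap: float) -> float:
    """Hypothetical benchmark: positive = worse than VWAP, in basis points."""
    sign = 1.0 if side == "BUY" else -1.0
    return sign * (avg_fill - market_vwap) / market_vwap * 10_000


print(vwap([(100.0, 100), (101.0, 300)]))  # 100.75
```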
  • 25. Automation, Obfuscation, Optimization
    Automation
    • Split the order quantity into 15-minute intervals and send 1/25 of the order every 15 minutes
    • Freed up traders, executed more VWAP orders, lowered costs
    • But executions were not very good, almost always worse than VWAP: brokers traded against the predictable flow and moved prices ahead of each 15-minute interval
    Obfuscation
    • Randomize order placement; show partial and hidden orders
    • Improved execution: matched VWAP
    • But algorithms were taken for a ride: other participants proactively smoked out algos ("Poke-Poke-Slap", "Crazy Joy-Stick")
    Optimization
    • Optimize volume curves, optimize executions, algos using algos
    • Beat VWAP consistently
    • But liquidity is hard to get: fast-moving markets, low order/execution ratios, competing algos dry out liquidity; a new level playing field
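The three stages of the VWAP algorithm can be sketched as three ways of splitting a parent order into child orders: equal slices (automation; the slide's 1/25 per interval corresponds to `n_intervals=25`), randomly perturbed slices (obfuscation), and slices proportional to a historical volume profile (optimization). This is an illustrative sketch under those assumptions; the function names, the `jitter` parameter and the largest-remainder rounding are choices made here, not the presenter's implementation.

```python
import random
from typing import List, Optional, Sequence


def allocate(total_qty: int, weights: Sequence[float]) -> List[int]:
    """Split total_qty into integer child orders proportional to weights.

    Largest-remainder rounding keeps the slices summing exactly to total_qty.
    """
    weight_sum = sum(weights)
    raw = [total_qty * w / weight_sum for w in weights]
    slices = [int(x) for x in raw]  # floor, since all values are >= 0
    shortfall = total_qty - sum(slices)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - slices[i], reverse=True)
    for i in by_remainder[:shortfall]:
        slices[i] += 1
    return slices


def even_slices(total_qty: int, n_intervals: int) -> List[int]:
    """Automation: the same share of the order in every interval (predictable)."""
    return allocate(total_qty, [1.0] * n_intervals)


def randomized_slices(
    total_qty: int, n_intervals: int, jitter: float = 0.3, seed: Optional[int] = None
) -> List[int]:
    """Obfuscation: perturb each interval's share by up to +/- jitter (keep jitter < 1)."""
    rng = random.Random(seed)
    weights = [1.0 + rng.uniform(-jitter, jitter) for _ in range(n_intervals)]
    return allocate(total_qty, weights)


def volume_curve_slices(total_qty: int, volume_profile: Sequence[float]) -> List[int]:
    """Optimization: follow a historical intraday volume profile."""
    return allocate(total_qty, volume_profile)


print(even_slices(100, 25))  # 25 slices of 4
print(sum(randomized_slices(100, 25, seed=1)))  # 100
```

The even schedule is exactly what made early VWAP flow easy to front-run: every child order has the same size at a known time. Randomizing the weights hides that pattern while still completing the full quantity.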
  • 26. So What’s Wrong?
  • 27. Limitations
    • All algorithms up to now share the same overarching IF-DO pattern: IF this happens, DO this
    • They are a snapshot of the author's ideas, biases and shortcomings
    • The creator HAS to forecast ALL possibilities and program every possible scenario
    • If markets have an atypical event, the algorithms either shut off or have unpredictable consequences. Markets ALWAYS have unpredictable events; coupled with fast execution, the potential risk is dramatic
    • Models have the same issue: no matter how sophisticated the analysis, unless a scenario is coded, the model is not going to react
    • Algorithms and models don't adjust and evolve on their own; they are static
    • The industry is littered with examples of algorithms going bad, with the effects compounded by the ability to execute large volumes in a very short time (The Register: "How one bad algorithm cost traders $440m")
    • The markets are not static, so why take a static approach to algorithms?
    • ENTER the Machine Learning Driven Trading Bot…
  • 28. Machine Learning Driven Approach
  • 29. AI, ML, DL: Machine Learning Applied
    • Artificial intelligence is focused on creating programs which can extend their actions beyond following a strict set of instructions.
    • Machine learning gives systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.
    • Deep learning involves feeding a computer system a lot of data, which it can use to make decisions about other data. The core of deep learning is that we now have fast enough computers and enough data to actually train large neural networks. As we construct larger neural networks and train them with more and more data, their performance continues to increase. This is generally different from other machine learning techniques, which reach a plateau in performance.
    (Diagram: Artificial Intelligence, Machine Learning, Deep Learning)
  • 30. Machine Learning Renaissance: The Pillars of Machine Learning
    • Machine learning algorithms have been around since the 1950s or earlier
    • The breakthrough is the ability to perform complex mathematical computations in a short amount of time over huge sets of data (big data), over and over, faster and faster
    Pillars:
    • Mathematical models / AI algorithms: off-the-shelf basic models; adapt and train models to my data
    • Parallel computing: run complex models in useful time; run models in parallel; simple parallel processing
    • Big data: data is the food for models; big data is possible due to cheap storage and Hadoop; store large amounts of diverse data; scalable and cost-efficient
    • Cheap storage & CPU: fast parallel computing is possible due to fast, cheap computation hardware; use fast GPUs and fast RAM
    (Rows: Platform & Implementation)
  • 31. The Domain: How Appropriate Is ML for Trading?
    • Closed system
    • Clearly defined boundaries
    • Single reward: P&L
    • Manageable set of predictors
    • All data is labeled
    • Time scales are typically short
    • Every day brings a new additional set of training data
  • 32. Machine Learning in Trading: Why?
    • Analysis which used to take hours or days can now be done in seconds
    • Back-testing over a longer period of time with fuller data is now possible
    • More data sources are available that can be used to build richer, more accurate models
    • Almost all the value of deep learning today comes through supervised learning, i.e. learning from labeled data
  • 33. Implementation
  • 34. The Goal
    Build an algorithm which can:
    1. Learn from a training set
    2. Optimize risk-adjusted P&L
    3. Automatically adjust to changing market conditions
    4. Do all of this within my acceptable risk directive
  • 35. Methodology
    • Define the domain
    • Pick the predictor variables and normalize them
    • Define constraints (trading times, max loss, min/max holding time, etc.)
    • Train models
    • Predict using the models and rank
    • Deploy
    • Continuously train models
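The "normalize" step could look like the following minimal sketch, assuming z-score scaling (the slide does not name a method). The mean and standard deviation are fitted on the training window only and then reused on later data, so that no future information leaks into the features. Function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_zscore(train_values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population stdev) estimated on training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        raise ValueError("constant feature: cannot normalize")
    return mu, sigma


def apply_zscore(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Scale new observations with parameters fitted on the training set."""
    return [(v - mu) / sigma for v in values]


mu, sigma = fit_zscore([1.0, 2.0, 3.0, 4.0, 5.0])
print(apply_zscore([3.0], mu, sigma))  # [0.0]
```

Fitting the scaler on the full history before splitting into train and test sets is a common back-testing mistake: it lets tomorrow's volatility shape today's features.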
  • 36. Events & Training
    (Diagram: a single tick, "10:00 S 100 AAPL @$150 NYSE", fans out into updates: position 5,000 AAPL $750,000; risk; total volume AAPL 1,251,600; position price 5,000 AAPL $750,000; sector move, tech sector up 0.2%; unrealized P&L 5,000 AAPL ($2,500); exposure, AAPL 5% of total exposure; relative value AAPL +$0.12; average price AAPL 5,000 @ $151.12; tendency UP +0.3; deviation from sector AAPL (0.01%); and further updates …)
    • Core events:
      • Quotes
      • Trades
    • Core event frequency:
      • AAPL: 30,000 trades per day; 100,000 quote events per day
      • Market: 40,000,000 tick events per day
    • External events:
      • Fed announcements
      • Earnings
      • Economic indicators
      • 3rd-party indicators
    • Each event propagates and updates or creates predictors
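The fan-out in the diagram, where one event updates many derived values, can be sketched as a per-symbol state object with event handlers. This is a minimal sketch under loud assumptions: `SymbolState` and `exposure_shares` are hypothetical names, the position logic is long-only, and only a handful of the diagram's predictors (position value, unrealized P&L, total volume, exposure share) are modelled. The numbers in the usage lines are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SymbolState:
    """Hypothetical per-symbol state updated by each incoming event (long-only)."""

    symbol: str
    qty: int = 0
    avg_price: float = 0.0
    last_price: float = 0.0
    total_volume: int = 0

    def on_fill(self, side: str, qty: int, price: float) -> None:
        """Our own execution: update position and average price."""
        if side == "BUY":
            cost = self.avg_price * self.qty + price * qty
            self.qty += qty
            self.avg_price = cost / self.qty
        elif side == "SELL":
            if qty > self.qty:
                raise ValueError("sketch is long-only; short selling not modelled")
            self.qty -= qty  # average price is unchanged by a partial sale
            if self.qty == 0:
                self.avg_price = 0.0
        else:
            raise ValueError(f"unknown side {side!r}")

    def on_trade_tick(self, price: float, size: int) -> None:
        """Market trade print: refresh last price and cumulative volume."""
        self.last_price = price
        self.total_volume += size

    @property
    def market_value(self) -> float:
        return self.qty * self.last_price

    @property
    def unrealized_pnl(self) -> float:
        return self.qty * (self.last_price - self.avg_price)


def exposure_shares(states: Dict[str, SymbolState]) -> Dict[str, float]:
    """Each symbol's share of total gross exposure (a derived predictor)."""
    total = sum(abs(s.market_value) for s in states.values())
    return {sym: (abs(s.market_value) / total if total else 0.0) for sym, s in states.items()}


aapl = SymbolState("AAPL")
aapl.on_fill("BUY", 5000, 150.0)
aapl.on_trade_tick(149.5, 100)
print(aapl.market_value, aapl.unrealized_pnl)  # 747500.0 -2500.0
```

In a streaming platform each handler would also emit the changed predictors downstream, so that models see the updated feature vector on every tick.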
  • 37. Instruments & Predictors
    • Target instruments:
      • US exchange-listed equities
      • > $5 last closing price (stocks < $5 are subject to different trading restrictions)
      • > 100,000 average daily volume
      • Approximately 3,007 equities
    • Predictors/features are endless:
      • Technical indicators, price, sectors, volume, indexes, etc.
      • Careful determination needs to be made to pick the best predictors
      • If it does not make sense, do not include it!
      • Simple is good
      • Keep predictors to the smallest number necessary for a good result
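The target-instrument filter above translates directly into code. The sketch assumes a hypothetical `Listing` record with the last closing price and average daily volume already computed; it applies the slide's strict thresholds (> $5 and > 100,000 shares).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Listing:
    """Hypothetical record for one US exchange-listed equity."""

    symbol: str
    last_close: float
    avg_daily_volume: float


def target_universe(
    listings: Iterable[Listing], min_price: float = 5.0, min_adv: float = 100_000
) -> List[str]:
    """Keep symbols with last close > min_price and ADV > min_adv (strict, as on the slide)."""
    return [
        item.symbol
        for item in listings
        if item.last_close > min_price and item.avg_daily_volume > min_adv
    ]


sample = [
    Listing("AAA", 12.0, 2_000_000),
    Listing("BBB", 4.5, 5_000_000),  # below the $5 price floor
    Listing("CCC", 30.0, 50_000),  # too illiquid
]
print(target_universe(sample))  # ['AAA']
```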
  • 38. Train – Validate – Train
    • The reward/target variable is positive P&L
    • Two simple questions need to be answered:
      1. If I buy stock X when the features are (A=x, B=y, C=z, ...) and sell when the features are (A=r, B=s, C=t, ...), how much money do I make over period P, as long as my negative P&L is never greater than L at any point in time?
      2. If I sell short stock X when the features are (A=x, B=y, C=z, ...) and buy when the features are (A=r, B=s, C=t, ...), how much money do I make over period P, as long as my negative P&L is never greater than L at any point in time? (This can also be deduced by taking the opposite of (1).)
    • Permutations get very large, since each tick is an opportunity to buy or sell and we have 40,000,000 ticks in a day:
      • Buy at the first tick, sell at the second tick × each permutation of predictors at buy time and at sell time
      • Buy at the first tick, sell at the third tick × each permutation of predictors at buy time and at sell time
      • Buy at tick x, sell at tick y × each permutation of predictors at buy time and at sell time
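The two labelling questions can be sketched as an enumeration of entry/exit tick pairs. Assumptions: period P is expressed as a hypothetical `max_hold` in ticks, L is `max_loss` in currency, and feature permutations are left out. Negating the long P&L gives the short P&L for a given pair, but the max-loss cut-off has to be re-applied on the short side, since a long's winning path is a short's losing path; that is why the function takes a `side` argument instead of flipping signs afterwards. Without `max_hold`, a symbol with n ticks yields n(n−1)/2 pairs; for 130,000 events per day that is already about 8.4 × 10⁹.

```python
from typing import List, Sequence, Tuple

Label = Tuple[int, int, float]  # (entry tick, exit tick, P&L)


def trade_labels(
    prices: Sequence[float],
    qty: int,
    max_loss: float,
    max_hold: int,
    side: int = 1,
) -> List[Label]:
    """Enumerate entry/exit tick pairs and their P&L under a max-loss constraint.

    side = +1 answers question (1) (buy, then sell); side = -1 answers (2)
    (sell short, then buy back). For entry tick i, exits j = i+1 .. i+max_hold
    are scanned in order; once the mark-to-market loss exceeds max_loss (L),
    the scan stops, because every later exit would have breached L on the way.
    """
    labels: List[Label] = []
    n = len(prices)
    for i in range(n):
        entry = prices[i]
        for j in range(i + 1, min(n, i + max_hold + 1)):
            pnl = side * (prices[j] - entry) * qty
            if pnl < -max_loss:
                break
            labels.append((i, j, pnl))
    return labels


print(trade_labels([10, 11, 9, 12], qty=100, max_loss=150, max_hold=3))
# [(0, 1, 100), (0, 2, -100), (0, 3, 200), (2, 3, 300)]
```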
  • 39. Train
    • Since the data is labeled and we want to optimize the output (P&L) given the input (features), supervised machine learning with logistic regression works well
    • Optimize for maximum P&L within the boundaries of risk
    • Result: entry and exit parameters, feature settings, take-profit level, stop-loss level, position size, …
    • We are looking for both ends of the result ranking: those with the highest probability of positive P&L and those with the lowest probability (Why?)
    (Pipeline: Raw Tick Data -> Calculate Derived Indicators -> Enriched Data, plus Economic & Non-Tick Data -> Feature Selection -> Train -> Tune -> Test Models -> Rank -> Deploy)
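A minimal sketch of the train-and-rank step, assuming each labelled trade is turned into a binary target (y = 1 when its P&L was positive), which is what makes logistic regression applicable here. For self-containment it uses plain batch gradient descent instead of a library; in practice scikit-learn's `LogisticRegression` would be the usual choice. Function names, learning rate and epoch count are hypothetical. `rank_both_ends` returns both the most and the least promising candidates, matching the slide's "both ends of the result ranking".

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(P&L > 0 | features x) under the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 500
) -> Tuple[List[float], float]:
    """Batch gradient descent on the log-loss; y[i] = 1 if trade i had positive P&L."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def rank_both_ends(candidates, w, b, k):
    """Score (name, features) candidates; return the k highest and k lowest P(P&L > 0)."""
    scored = sorted(
        ((predict_proba(w, b, x), name) for name, x in candidates), reverse=True
    )
    return scored[:k], scored[-k:]
```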
  • 40. ...A Few Twists
    • Execution-related variables need to be taken into consideration:
      • Slippage: Can I trade at the displayed price?
      • Liquidity: Can I get the full size that I want to trade?
      • Market impact: Will my actions cause the market to move?
    • Strategies to deal with execution:
      • Train only with completed trades, or
      • Run analysis to calculate a slippage variable (as accurately as possible, per symbol, size, etc.)
      • Run analysis to calculate a liquidity probability (as accurately as possible)
      • Limit volume to a maximum percentage that has a low chance of causing market impact
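A minimal sketch of the three execution estimates, assuming a history of fills recorded as (symbol, side, displayed price, fill price) and of orders as (requested size, filled size). The function names are hypothetical, and the 1% participation cap is an illustrative assumption; the slide does not give a number.

```python
def max_order_size(avg_daily_volume, max_participation=0.01):
    """Cap an order at a fraction of average daily volume to limit market impact.

    The 1% default is a hypothetical placeholder, not a value from the deck.
    """
    return int(avg_daily_volume * max_participation)


def slippage_bps(side, quoted_px, fill_px):
    """Slippage vs. the displayed price in basis points; positive means a worse fill.

    side=+1 for a buy, -1 for a sell.
    """
    return side * (fill_px - quoted_px) / quoted_px * 10_000


def avg_slippage_by_symbol(fills):
    """fills: iterable of (symbol, side, quoted_px, fill_px). Returns {symbol: mean bps}."""
    sums = {}
    for symbol, side, quoted, filled in fills:
        total, count = sums.get(symbol, (0.0, 0))
        sums[symbol] = (total + slippage_bps(side, quoted, filled), count + 1)
    return {sym: total / count for sym, (total, count) in sums.items()}


def full_fill_rate(orders):
    """Empirical probability of getting the full size; orders is [(requested, filled), ...]."""
    return sum(filled >= requested for requested, filled in orders) / len(orders)


print(max_order_size(1_000_000))                    # 10000
print(full_fill_rate([(100, 100), (200, 150)]))     # 0.5
print(avg_slippage_by_symbol([("AAA", 1, 100.0, 100.05), ("AAA", -1, 50.0, 49.99)]))
```

Per-symbol averages are the simplest version of "per symbol, size, etc."; bucketing by order size as well would be a natural refinement.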
  • 41. Retraining the Model
    • Models need to be retrained so that they stay dynamic and adjust to changing market conditions
    • We want a model which:
      • Takes recent trades into account when making a decision
      • Is updatable and "adjusts" as data streams through
    • There are five basic strategies:
      1. Periodic retraining: used if monitoring data in real time has high overhead, or if the model takes too long to retrain
      2. Micro-batching: retraining with smaller sets
      3. Sliding window: continuously retrain the model on a smaller, ordered set of data that includes the latest observations
      4. Incremental algorithms: the model is updated each time a new observation arrives. Incremental versions of support vector machines and neural networks exist, and Bayesian networks can be made to learn incrementally.
      5. Online learning: stochastic gradient descent, a computationally cheap method for adaptive supervised learning in an online environment
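A minimal sketch of strategies 3 and 5, assuming a logistic-regression model like the one used for training. `OnlineLogistic` and `SlidingWindow` are hypothetical names: the first applies one stochastic-gradient step per new observation, the second keeps only the latest observations so a periodic refit sees recent data.

```python
import math
from collections import deque


class OnlineLogistic:
    """Logistic regression updated one observation at a time with SGD (strategy 5)."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x, y):
        # One gradient step on the log-loss of this single observation.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


class SlidingWindow:
    """Keeps only the latest `size` observations for periodic refits (strategy 3)."""

    def __init__(self, size):
        self.buf = deque(maxlen=size)  # oldest entries drop off automatically

    def add(self, x, y):
        self.buf.append((x, y))

    def snapshot(self):
        return list(self.buf)


# Hypothetical stream: feature +1.0 tends to precede a win, -1.0 a loss.
model = OnlineLogistic(n_features=1)
window = SlidingWindow(size=3)
for _ in range(200):
    for x, y in (([1.0], 1), ([-1.0], 0)):
        model.update(x, y)
        window.add(x, y)
print(model.predict_proba([1.0]) > 0.5)  # True
print(len(window.snapshot()))            # 3
```

The learning rate controls how quickly the online model forgets old regimes; the window size plays the same role for the sliding-window refit.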
  • 42. Risk-Adjusted Optimization
    • Find the optimal mix of values that will yield the desired return within the required risk limits
    • Search for the optimal mix of risk and P&L ("my appetite for risk")
    • Some of the considerations:
      • Size of each position
      • Total open positions
      • Stop-loss
      • Stop-gain
      • Max drawdown
      • Max losing days vs. winning days
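A minimal sketch of how the considerations listed above could become checks inside a bot, assuming limits are expressed in dollars of P&L. `RiskLimits` and the helper functions are hypothetical, and every numeric value in the usage lines is a placeholder; the slide names the categories, not the values.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    # All limits are hypothetical placeholders; the slide lists categories, not values.
    max_position_notional: float  # size of each position, in USD
    max_open_positions: int       # total open positions
    stop_loss: float              # flatten when the day's P&L <= -stop_loss
    stop_gain: float              # flatten when the day's P&L >= stop_gain
    max_drawdown: float           # flatten when P&L falls this far below its intraday peak


def can_open(limits, order_notional, open_positions):
    """Pre-trade check: is a new position allowed under the size and count limits?"""
    return (order_notional <= limits.max_position_notional
            and open_positions < limits.max_open_positions)


def should_flatten(limits, day_pnl, peak_pnl):
    """Intraday check: should the bot exit all positions for the rest of the day?"""
    return (day_pnl <= -limits.stop_loss
            or day_pnl >= limits.stop_gain
            or peak_pnl - day_pnl >= limits.max_drawdown)


def losing_to_winning_ratio(daily_pnl):
    """Losing days per winning day over a backtest (infinite if there are no wins)."""
    losses = sum(p < 0 for p in daily_pnl)
    wins = sum(p > 0 for p in daily_pnl)
    return losses / wins if wins else float("inf")


limits = RiskLimits(50_000, 10, 20_000, 60_000, 30_000)
print(can_open(limits, 40_000, open_positions=3))            # True
print(should_flatten(limits, day_pnl=5_000, peak_pnl=40_000))  # True: 35,000 off the peak
```

The optimization on the next slides can then be read as searching over the `stop_loss` and `stop_gain` fields while scoring each setting on risk-adjusted return.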
  • 43. Optimization Example
  • 44. Actual Trading P&L
  • 45. Approach
    • Ran trade-by-trade P&L on each day traded and calculated the total P&L (realized + unrealized) at each trade.
    • Took 90,000 possible Stop-Loss/Max-Gain combinations (300 × 300) by:
      • Taking Max-Loss from $1,000 to $300,000 of floor P&L, in increments of $1,000
      • Taking Max-Gain from ($300,000) to ($1,000) of floor P&L, in increments of $1,000
    • For each Max-Loss/Max-Gain pair, I calculated the point at which we would have exited the day or, if the limits were not hit, took the end-of-day P&L.
    • Finally, I calculated the Sharpe ratio for each Max-Loss/Max-Gain pair, as well as the number of positive days vs. negative days.
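A minimal sketch of that grid search, assuming each trading day is available as a list of running total P&L values (realized + unrealized) after each trade. We read Max-Gain, written in parentheses on the slide, as a profit-taking threshold on the same running P&L; the Sharpe ratio here is the unannualized mean of daily P&L over its sample standard deviation with a zero risk-free rate, which is also an assumption. The "100% reverse" adjustment in the results is not modeled, and all function names are hypothetical.

```python
import math
import statistics


def day_exit_pnl(running_pnl, max_loss, max_gain):
    """Return the day's P&L at the first breach of -max_loss or +max_gain,
    or the end-of-day value if neither limit is hit.

    running_pnl: total (realized + unrealized) P&L after each trade of the day.
    """
    for pnl in running_pnl:
        if pnl <= -max_loss or pnl >= max_gain:
            return pnl
    return running_pnl[-1]


def sharpe(daily_pnl):
    """Unannualized Sharpe ratio of daily P&L (needs at least two days)."""
    mean = statistics.mean(daily_pnl)
    sd = statistics.stdev(daily_pnl)
    if sd == 0:
        return math.copysign(math.inf, mean) if mean else 0.0
    return mean / sd


def grid_search(days, step=1_000, upper=300_000):
    """Score every (max_loss, max_gain) pair; the defaults give 300 x 300 = 90,000 pairs."""
    limits = range(step, upper + step, step)
    results = []
    for max_loss in limits:
        for max_gain in limits:
            daily = [day_exit_pnl(day, max_loss, max_gain) for day in days]
            results.append({
                "max_loss": max_loss,
                "max_gain": max_gain,
                "total_pnl": sum(daily),
                "sharpe": sharpe(daily),
                "win_days": sum(p > 0 for p in daily),
                "loss_days": sum(p < 0 for p in daily),
            })
    return sorted(results, key=lambda r: r["sharpe"], reverse=True)


# Three hypothetical days of running P&L, searched on a coarse 3 x 3 grid.
days = [[500, 1500, -200], [-800, -2500, 1000], [300, 700, 2200]]
print(grid_search(days, step=1_000, upper=3_000)[0])
```

Sorting by Sharpe ratio finds the "safest" corner; sorting the same results by `total_pnl` finds the highest-P&L corner, and the win/loss day counts support the comparison made on the results slide.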
  • 46. Extreme Risk Aversion
  • 47. Win a little and Exit All Winning
  • 48. Extremely Low Volatility Tolerance
  • 49. Results
    • Safest:
      • Stop-Loss of $21,000 floor P&L, Stop-Gain ($2,000)
      • Total September P&L of $56,934 at 100% reverse
      • Highest Sharpe ratio: 3.88
      • 100% winning days
    • Best combination of P&L and risk-adjusted return (Goldilocks zone):
      • Stop-Loss of $28,000 floor P&L, Stop-Gain ($60,000)
      • Total September P&L of $622,822 at 100% reverse
      • Sharpe ratio of 0.72
      • 13/7 winning/losing days
    • Highest P&L:
      • Stop-Loss of $100,000, Stop-Gain of ($1,000,000)
      • Total September P&L of $1,201,545 at 100% reverse
      • Sharpe ratio of 0.44
      • 12/8 winning/losing days
    • Second highest P&L:
      • Stop-Loss of $64,000, Stop-Gain of ($299,000)
      • Total September P&L of $1,121,497 at 100% reverse
      • Sharpe ratio of 0.44
      • 12/8 winning/losing days
    • The higher the Sharpe ratio, the better. Not surprisingly, the Sharpe ratio was directly correlated with the number of winning days vs. losing days.
  • 50. Safest Middle Ground All Winning
  • 51. Best Combination of P&L and Risk Adjusted Return
  • 52. Highest P&L
  • 53. Second Highest P&L
  • 54. Does It Work? Public Statements
    • Monzo, a British banking startup, built a model quick enough to stop would-be fraudsters from completing a transaction, bringing the fraud rate on its pre-paid cards down from 0.85% in June 2016 to less than 0.1% by January 2017.
    • In June 2016, JPMorgan Chase deployed software that can sift through 12,000 commercial-loan contracts in seconds, compared with the 360,000 hours it used to take lawyers and loan officers to review the contracts.
    • The quantitative-investment strategies division at Goldman Sachs uses machine-learning-driven language processing to go through thousands of analysts’ reports on companies. It compiles an aggregate “sentiment score” based on the balance of positive to negative words, and this score is then used to help pick stocks.
    • Castle Ridge Asset Management, a Toronto-based upstart, has achieved annual average returns of 32% since its founding in 2013. It uses a sophisticated machine-learning system, like those used to model evolutionary biology, to make investment decisions. It is so sensitive, claims the firm’s chief executive, Adrian de Valois-Franklin, that it picked up 24 acquisitions before they were even announced (because of telltale signals suggesting a small amount of insider trading).
    • Source: The Economist, May 25th 2017
  • 55. Lessons so far: From the battlefield
  • 56. Lessons
    1. Algorithms always need to be bound by reason
    2. Trading bots must always be bound by risk constraints
    3. Complexity is your enemy: if you don’t understand it, don’t do it
    4. The best defense is offense
    5. Execution is a big part of performance
    6. Continuously adjust your models and verify your assumptions
    7. Have a PANIC button
    8. Predicting bad, arbitrage good
    9. On a risk-adjusted basis, taking lots of small bets is better than taking one large one
    10. Controlling risk is the key to long-term positive returns
    11. Speed does not beat intelligence
  • 57. So What About HFT?
    • “Have no fear for atomic energy, 'Cause none of them can stop the time.” Bob Marley, Redemption Song
  • 58. All About Speed
    • Latencies associated with HFT are now well under one millisecond. Data produced by the SEC show that order interaction times can be as low as 50 microseconds. That is, once an order is placed on the books of an exchange, it can be traded against or canceled, in whole or in part, within 50 millionths of a second.
    • Data show that typical trade-to-order submission ratios are between 2% and 4% on the major exchanges. That is, between 25 and 50 orders are generated for every execution. These ratios are even lower for exchange-traded products such as ETFs, running well under 1%.
    • The lifetime of these orders can be very short, as the governing algorithms implement their designated strategies by continuously canceling and replacing orders. For example, about 8% of orders are fully canceled within 500 microseconds, and almost half of orders are canceled in less than a second.
    • My opinion: competing on speed is a model of diminishing returns; it takes ever more money to keep a speed advantage for an ever-decreasing rate of return.
    • The speed advantage will disappear, either because of regulation or because a faster competitor will arrive.
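The step from "trade-to-order ratio" to "orders per execution" is just a reciprocal, which the small sketch below makes explicit. The helper name is hypothetical; the ratios are the ones quoted on the slide, and the last line shows why ETF ratios "well under 1%" imply well over 100 orders per execution.

```python
def orders_per_execution(trade_to_order_ratio):
    """Invert a trade-to-order ratio: 0.02 (2%) means 50 orders per execution."""
    return 1.0 / trade_to_order_ratio


for ratio in (0.02, 0.04, 0.01):
    print(f"{ratio:.0%} trade-to-order -> {orders_per_execution(ratio):.0f} orders per execution")
# 2% trade-to-order -> 50 orders per execution
# 4% trade-to-order -> 25 orders per execution
# 1% trade-to-order -> 100 orders per execution
```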
  • 59. Opportunities Bountiful
  • 60. Expanding Opportunities
    • ETF Arbitrage (poor man’s index arbitrage)
    • Ratio Trading
    • Passive Market Making
    • ADR Arbitrage
    • Multi-Leg Arbitrage (most large companies are listed on multiple exchanges)
  • 62. Thank You! Diego Baez dbaez@hortonworks.com