Building a Quant Research Pipeline from Scratch
SOME LESSONS LEARNED IN BATTLE
GEORGI KIROV, CASI-SOFIA UNIVERSITY
GEORGI@KIROV.EU
Goals of this talk
* Depends on ICE, Ezpada requirements
Systematic Trading Research
Something happens somewhere in the world → Trading Decision → Profit
Systematic Trading Research
ML & Data Science?
Profit?
We are not interested in the big spike
Rather, in everything else
1. Infrastructure Matters
Live Data Feed
• Data Handler, ICE Europe l1 → Signal 1 and Signal 2, Brent Front Month → Optimizer, Brent Front Month → Executor
• Data Handler, ICE US l1 → Signal 1, WTI → Optimizer, WTI → Executor
• Data Handler, ICE US l2 → Signal 2, WTI
Backtest according to your infrastructure
Backfill
• Data Format, ICE Europe l1 → Signal 1 and Signal 2, Brent Front Month → Optimizer, Brent Front Month → Backtester
• Data Format, ICE US l1 → Signal 1, WTI → Optimizer, WTI → Backtester
• Data Format, ICE US l2 → Signal 2, WTI
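One way to read the two diagrams above is that signals and optimizer stay identical, and only the data source and the final stage (Executor or Backtester) are swapped. A minimal Python sketch of that idea follows. All names here (`Tick`, `Sink`, `run_pipeline`, the toy signals) are hypothetical and chosen for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol, Tuple


@dataclass
class Tick:
    """One top-of-book quote (hypothetical schema, not from the talk)."""
    symbol: str
    bid: float
    ask: float


class Sink(Protocol):
    """Final stage of the pipeline: a live executor or a backtester."""
    def submit(self, symbol: str, target_position: float) -> None: ...


Signal = Callable[[Tick], float]            # tick -> score
Optimizer = Callable[[List[float]], float]  # scores -> target position


class Backtester:
    """Records target positions instead of sending orders."""
    def __init__(self) -> None:
        self.targets: List[Tuple[str, float]] = []

    def submit(self, symbol: str, target_position: float) -> None:
        self.targets.append((symbol, target_position))


def run_pipeline(ticks: Iterable[Tick], signals: List[Signal],
                 optimizer: Optimizer, sink: Sink) -> None:
    """Same code path for live and backfilled data; only ticks and sink differ."""
    for tick in ticks:
        scores = [signal(tick) for signal in signals]
        sink.submit(tick.symbol, optimizer(scores))


# Toy components, for illustration only.
def spread_signal(tick: Tick) -> float:
    return tick.ask - tick.bid


def mid_signal(tick: Tick) -> float:
    return (tick.bid + tick.ask) / 2.0


def mean_optimizer(scores: List[float]) -> float:
    return sum(scores) / len(scores)


backfill = [Tick("BRENT", 80.0, 80.5), Tick("BRENT", 81.0, 81.5)]
backtester = Backtester()
run_pipeline(backfill, [spread_signal, mid_signal], mean_optimizer, backtester)
print(backtester.targets)
```

A live setup would pass a feed iterator and an executor object with the same `submit` method, so research code and production code cannot drift apart silently.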
MODELING CAN BE USEFUL HERE
CORE DEVELOPMENT
Low-level programming
System architecture
Model Integration
Model Recalibration
Data Storage
Startup, Drop copy, Order Gateways
Data cleaning
Research environment
Strategy research
ML & Data Science
Optimization
Risk Management
Reconciliation
Quant Funds: an overview
• Back Office:
• Monitoring
• Management
• Monitoring
• Networking/Hardware
• Legal/Compliance
• Brokers
• Data Service Providers
STRATEGY RESEARCH OPERATIONS
CORE DEVELOPMENT
Low-level programming
System architecture
Model Integration
Model Recalibration
Data Storage
Startup, Drop copy, Order Gateways
Data cleaning
Research environment
Strategy research
ML & Data Science
Optimization
Risk Management
Reconciliation
The field has leveled a bit
• Networking/Hardware
• Optiver/Cloud Infra
• Brokers
• Interactive Brokers
• Lime
• Data Service Providers
• Quandl, AlgoSeek
• Vela, CQG, TickData
• IDS, TREP
STRATEGY RESEARCH OPERATIONS
2. Good Baselines > Fancy Models
Consider a simple equity model regressed on market, sector, industry:
$\varepsilon_t^{MSI} = \beta_t^{MSI} x_t - a - y_t$
Simple quadratic risk function:
$U(\omega) = \omega - b\omega^2$
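A baseline of this kind can be sketched in a few lines of Python. The sketch below makes several assumptions that are not in the slides: the data are synthetic, the betas are constant rather than time-varying, and the residual uses the usual OLS convention (actual minus fitted), so its sign may be flipped relative to the formula above. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic per-period returns for three factors: market, sector, industry.
factors = rng.normal(0.0, 0.01, size=(n, 3))
true_beta = np.array([1.0, 0.5, 0.3])
stock = 0.0001 + factors @ true_beta + rng.normal(0.0, 0.002, size=n)


def msi_residual(stock_returns, factor_returns):
    """OLS of stock returns on the factors with an intercept.

    Returns (coefficients, residual), where coefficients[0] is the intercept
    and the residual is actual minus fitted.
    """
    X = np.column_stack([np.ones(len(stock_returns)), factor_returns])
    coef, *_ = np.linalg.lstsq(X, stock_returns, rcond=None)
    residual = stock_returns - X @ coef
    return coef, residual


def quadratic_utility(omega, b):
    """U(omega) = omega - b * omega**2, maximised at omega = 1 / (2 * b)."""
    return omega - b * omega ** 2


coef, residual = msi_residual(stock, factors)
print("estimated betas:", coef[1:])
print("utility at omega=0.5, b=1:", quadratic_utility(0.5, 1.0))
```

The residual series is the object whose mean reversion the strategy would trade; with real data the regression would normally be refitted on a rolling window.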
SPEND A FEW MONTHS TO DOUBLE THE PERFORMANCE OF THIS TOY MODEL:
• Investigate liquidity-making vs liquidity-taking versions
• When does $\varepsilon_t^{MSI}$ mean-revert best?
• What are the best performance measures?
• PnL
• Sharpe
• Maximum Drawdown
• Risk Profile
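Three of the performance measures named on this slide (PnL, Sharpe, maximum drawdown) can be computed from a per-period PnL series as sketched below. The assumptions are mine, not the talk's: a zero risk-free rate, 252 periods per year for annualisation, and drawdown measured in PnL units from a starting equity of zero.

```python
import numpy as np


def total_pnl(pnl):
    """Sum of per-period PnL."""
    return float(np.sum(pnl))


def sharpe_ratio(pnl, periods_per_year=252):
    """Annualised mean/std of per-period PnL, risk-free rate assumed zero."""
    pnl = np.asarray(pnl, dtype=float)
    sd = pnl.std(ddof=1)
    if sd == 0.0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * pnl.mean() / sd)


def max_drawdown(pnl):
    """Largest peak-to-trough drop of the cumulative PnL curve."""
    equity = np.concatenate([[0.0], np.cumsum(pnl)])
    running_peak = np.maximum.accumulate(equity)
    return float(np.max(running_peak - equity))


daily_pnl = [1.0, -2.0, 3.0, -1.0, -1.0, 2.0]
print(total_pnl(daily_pnl), sharpe_ratio(daily_pnl), max_drawdown(daily_pnl))
```

Fixing these definitions before improving the toy model keeps later comparisons between model versions honest.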
GOOGLE’S ML RULES ARE CURIOUSLY APPLICABLE HERE
High Frequency Trading
OPTIMIZED FOR EXECUTION
• Model less than 1 tick
• Many sophisticated competitors: huge barriers to entry
OPTIMIZED FOR STATISTICS
• Statistical models predict market direction under a single tick
• Go through the spread on the other side to do execution
• Past and present market data drive trade decisions
• Heavily optimized code; firms own radio tower links between exchanges
Statistical Arbitrage
• Stat Arb does not depend tightly on market microstructure:
• Slower ‘time to payoff’
• Make time work for you:
• Invest in assets with positive carry (‘sail with the wind’)
• Hedge partially by holding related assets.
• Minimize risks in order to survive intermediate shocks.
3. Backtesting should be forward-looking
• Only actual forward performance matters
• Realistic Backtesting is a sequential problem:
• $pos_t$, the desired position at time $t$, depends on transaction costs, the bid/ask spread and $pos_{t-1}$
• Even with perfect execution, summing up requires a massive convolution: unstable and difficult to compute efficiently
• Slippage…
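The sequential nature of the problem can be illustrated with a toy loop in which the position at each step depends on the previous position and on trading costs. This is a sketch under simplifying assumptions of my own: a single instrument, mid-price marking, a fixed cost per unit traded standing in for half the bid/ask spread plus fees, and a hypothetical rule that only moves the position when the signal exceeds that cost. The input series is assumed to be non-empty.

```python
import numpy as np


def sequential_backtest(mid, signal, cost_per_unit, max_pos=1.0):
    """Step through time; pos_t depends on signal_t, costs and pos_{t-1}."""
    mid = np.asarray(mid, dtype=float)
    signal = np.asarray(signal, dtype=float)
    positions = np.zeros(len(mid))
    pnl = np.zeros(len(mid))
    pos_prev = 0.0
    for t in range(len(mid) - 1):
        if abs(signal[t]) > cost_per_unit:
            # Expected move covers the cost: go to the signed target.
            pos = max_pos * float(np.sign(signal[t]))
        else:
            # Not worth paying the spread: keep the previous position.
            pos = pos_prev
        trade_cost = cost_per_unit * abs(pos - pos_prev)
        pnl[t + 1] = pos * (mid[t + 1] - mid[t]) - trade_cost
        positions[t] = pos
        pos_prev = pos
    positions[-1] = pos_prev
    return positions, pnl


mid = [100.0, 101.0, 100.5, 102.0]
signal = [1.0, -0.05, 1.5, 0.0]

positions, pnl = sequential_backtest(mid, signal, cost_per_unit=0.1)

# Naive vectorised version: follow the sign of every signal, ignore costs.
naive_pnl = float(np.sum(np.sign(signal[:-1]) * np.diff(mid)))

print("sequential PnL:", pnl.sum())
print("naive PnL:", naive_pnl)
```

On this toy data the cost-free vectorised version reports a noticeably higher PnL than the sequential one, which is the kind of gap a realistic backtest is meant to expose.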
Source: SMBC Comics
Backtesting should stop early
At the very least, use a fixed cross-validation cycle: monthly, weekly, daily retraining
[Figure: training performance vs. validation performance]
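A fixed retraining cycle can be expressed as rolling train/validation windows over a time-ordered sample. The generator below is a minimal sketch; the function name and the choice of a rolling (rather than expanding) training window are assumptions, and the window sizes would correspond to the monthly, weekly or daily cycle chosen.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_samples: int, train_size: int,
                        test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, validation) index ranges that only move forward in time.

    Each validation window starts right after its training window, and the
    whole pair advances by test_size per cycle, so no future data leaks
    into training.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Tracking validation performance per cycle, next to training performance, gives a natural point at which to stop tuning a backtest.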
Wrapping up
1. INFRASTRUCTURE MATTERS
2. GOOD BASELINES > FANCY MODELS
3. FORWARD-LOOKING BACKTESTING
Source: XKCD

[Data Meetup] Data Science in Finance - Building a Quant ML pipeline

  • 1. Building a Quant Research Pipeline from Scratch: Some Lessons Learned in Battle. Georgi Kirov, CASI-Sofia University, GEORGI@KIROV.EU
  • 2. Goals of this talk * Depends on ICE, Ezpada requirements
  • 3. Systematic Trading Research: something happens somewhere in the world → trading decision → profit
  • 4. Systematic Trading Research: ML & Data Science? Profit?
  • 5. We are not interested in the big spike. Rather, in everything else.
  • 6. 1. Infrastructure Matters. Live Data Feed. Data Handler (ICE Europe l1): Signal 1, Brent Front Month; Signal 2, Brent Front Month; Optimizer, Brent Front Month; Executor. Data Handler (ICE US l1): Signal 1, WTI; Optimizer, WTI; Executor. Data Handler (ICE US l2): Signal 2, WTI.
  • 7. Backtest according to your infrastructure. Backfill. Data Format (ICE Europe l1): Signal 1, Brent Front Month; Signal 2, Brent Front Month; Optimizer, Brent Front Month; Backtester. Data Format (ICE US l1): Signal 1, WTI; Optimizer, WTI; Backtester. Data Format (ICE US l2): Signal 2, WTI. MODELING CAN BE USEFUL HERE.
  • 8. Quant Funds: an overview (CORE DEVELOPMENT, STRATEGY RESEARCH, OPERATIONS): Low-level programming; System architecture; Model Integration; Model Recalibration; Data Storage; Startup, Drop copy, Order Gateways; Data cleaning; Research environment; Strategy research; ML & Data Science; Optimization; Risk Management; Reconciliation • Back Office: • Monitoring • Management • Monitoring • Networking/Hardware • Legal/Compliance • Brokers • Data Service Providers
  • 9. The field has leveled a bit (CORE DEVELOPMENT, STRATEGY RESEARCH, OPERATIONS): Low-level programming; System architecture; Model Integration; Model Recalibration; Data Storage; Startup, Drop copy, Order Gateways; Data cleaning; Research environment; Strategy research; ML & Data Science; Optimization; Risk Management; Reconciliation • Networking/Hardware: Optiver/Cloud Infra • Brokers: Interactive Brokers, Lime • Data Service Providers: Quandl, AlgoSeek; Vela, CQG, TickData; IDS, TREP
  • 10. 2. Good Baselines > Fancy Models. Consider a simple equity model regressed on market, sector, industry: $\varepsilon_t^{MSI} = \beta_t^{MSI} x_t - a - y_t$. Simple quadratic risk function: $U(\omega) = \omega - b\omega^2$. • Investigate liquidity-making vs liquidity-taking versions • When does $\varepsilon_t^{MSI}$ mean-revert best? • What are the best performance measures? • PnL • Sharpe • Maximum Drawdown • Risk Profile. SPEND A FEW MONTHS TO DOUBLE THE PERFORMANCE OF THIS TOY MODEL: GOOGLE’S ML RULES ARE CURIOUSLY APPLICABLE HERE
  • 11. High Frequency Trading. OPTIMIZED FOR EXECUTION: • Model less than 1 tick • Many sophisticated competitors: huge barriers to entry. OPTIMIZED FOR STATISTICS: • Statistical models predict market direction under a single tick • Go through the spread on the other side to do execution • Past and present market data drive trade decisions • Really optimized code; firms own radio tower links between exchanges
  • 12. Statistical Arbitrage • Stat Arb does not depend tightly on market microstructure: • Slower ‘time to payoff’ • Make time work for you: • Invest in assets with positive carry (‘sail with the wind’) • Hedge partially by holding related assets. • Minimize risks in order to survive intermediate shocks.
  • 13. 3. Backtesting should be forward-looking • Only actual forward performance matters • Realistic backtesting is a sequential problem: • $\mathrm{pos}_t$, the desired position at time $t$, depends on transaction costs, the bid/ask spread and $\mathrm{pos}_{t-1}$ • Even with perfect execution, summing up requires a massive convolution: unstable and difficult to compute efficiently • Slippage… Source: SMBC Comics
  • 14. Backtesting should stop early. At the very least, use a fixed cross-validation cycle: monthly, weekly, daily retraining. [Figure: training performance vs. validation performance]
  • 15. Wrapping up: 1. INFRASTRUCTURE MATTERS 2. GOOD BASELINES > FANCY MODELS 3. FORWARD-LOOKING BACKTESTING. Source: XKCD
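Slide 10 names a residual, a quadratic risk function and three performance measures (PnL, Sharpe, maximum drawdown). A minimal sketch of how these could be computed follows. The function names, the per-period PnL input and the 252-period annualization are assumptions for illustration, not taken from the talk.

```python
import numpy as np

def residual(x, y, beta, a):
    # Residual exactly as written on slide 10: eps_t = beta_t * x_t - a - y_t.
    return beta * x - a - y

def quadratic_utility(w, b):
    # Quadratic risk function from slide 10: U(w) = w - b * w^2.
    return w - b * w ** 2

def total_pnl(pnl):
    # Sum of per-period PnL.
    return float(np.sum(pnl))

def sharpe(pnl, periods_per_year=252):
    # Annualized Sharpe ratio of per-period PnL (assumed: zero risk-free rate).
    pnl = np.asarray(pnl, dtype=float)
    sd = pnl.std(ddof=1)
    if sd == 0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * pnl.mean() / sd)

def max_drawdown(pnl):
    # Largest peak-to-trough drop of the cumulative PnL curve, starting at 0.
    equity = np.concatenate(([0.0], np.cumsum(pnl)))
    peak = np.maximum.accumulate(equity)
    return float((peak - equity).max())
```

Running the same three measures on the toy baseline and on every later variant keeps comparisons honest: an improvement that only shows up in one metric is usually worth a second look.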
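Slide 13 describes realistic backtesting as a sequential problem, where the position at time t depends on costs, the spread and the position at t-1. The loop below is one hedged way to sketch that dependency. The no-trade band rule, the cost model (half spread plus a per-unit fee) and all parameter names are assumptions made for illustration only.

```python
import numpy as np

def sequential_backtest(mid, target, half_spread, fee, band):
    """Step through time; each position depends on the previous one.

    mid: mid prices; target: desired position before costs (hypothetical
    signal output); band: do not trade if the gap to target is this small.
    """
    mid = np.asarray(mid, dtype=float)
    target = np.asarray(target, dtype=float)
    pos = np.zeros(len(mid))
    pnl = np.zeros(len(mid))
    prev = 0.0
    for t in range(len(mid)):
        if t > 0:
            # Mark to market the position carried over from t-1.
            pnl[t] = prev * (mid[t] - mid[t - 1])
        trade = target[t] - prev
        if abs(trade) <= band:
            # Gap too small to justify paying costs: keep pos_{t-1}.
            trade = 0.0
        # Pay half the bid/ask spread plus a fee per unit traded.
        pnl[t] -= abs(trade) * (half_spread + fee)
        prev += trade
        pos[t] = prev
    return pos, pnl
```

Because each step reads the previous position, the loop cannot be replaced by a single vectorized expression without changing the cost logic, which is the point the slide makes.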
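Slide 14 recommends a fixed cross-validation cycle with periodic retraining. A minimal sketch of a rolling walk-forward split is shown here. The function name and the choice of a rolling (rather than expanding) training window are assumptions, and `test_size` stands in for the retraining interval (a month, week or day of samples).

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges, retraining every test_size samples."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        # The test window always lies strictly after the training window.
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

# Usage: 10 samples, train on 4, validate on the next 2, then roll forward.
for train, test in walk_forward_splits(10, 4, 2):
    print(list(train), list(test))
```

Every validation window comes after its training window, so no future data leaks into the fit. Fixing the cycle up front also limits the temptation to keep re-running the backtest until it looks good.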