DeepValue

Hadoop Summit

June 2013

DeepValue, Inc.

Outline of talk

• Who are we
• What do we do
• What is HFT
• What is the structure of our technology effort
• How we use Hadoop
• Focus on what we've built at top level and lessons learned
• Next steps? Open source with founding team

DeepValue

• Started in 2006 to provide high-performance execution algorithms on a “paid for performance” basis.
• Execution algorithms take large client orders and split them into small pieces to execute through the day.
• Routinely trade 0.5–1% of US stock market volume. Highest day in 2012 was ~4%, and ~3% this year.
• Exchange-sponsored execution algorithms for NYSE floor brokers.
• 45 people based in the US and India.

What do we do

• Utilize sophisticated math and statistics to see patterns in the data and come up with trading tactics
• Use simulation to understand whether trading ideas in fact work.
• Core business is providing tools (algos) to mutual funds and others so they avoid being gamed by pure HFT traders
• Ability to harness compute resources is a key determinant of success: Hadoop
• All compute resources are now cluster-based and need a grid platform to utilize: Hadoop

What is HFT?

• Look at every order in the market and make real-time decisions on what to do next
• Looking to receive rebates by providing liquidity when sensible to do so
  – Citibank was a favourite for many years due to its low share price and thus large % spread
• Some amount of “sniffing out” of large orders
• Often a speed game: faster routers, shorter wires, FPGAs
• We use smarts to try not to show our hand

Trading Systems

• Order management systems (OMS) / execution management systems (EMS)
• Take in market data representing every order placed in every market
• Send out orders to market, manipulate those orders (replace/cancel) and receive fills
  – Via a name-value protocol called FIX
• Fills represent actual trades
• Log what they are doing via structured logging
Cloe

Lessons from building grid

• Cluster-wide locks are the problem
  – Focus on these in design
  – Batch changes and get the lock once
• Build for the performance case, and let the failure case be potentially slower / more complex
  – Regular message processing doesn't take cluster locks
• Hybrid of message passing & centralized control

Questions to solve: Hadoop

• What is the algorithm actually doing?
  – Complexity, e.g. feedback loops
  – Testing against intentions
• Can we do better next time?
  – Back-testing
  – Improved research process
• Log and historical market data management

DV Research Process

• Want to be able to look at “raw” market data to prove ideas
  – Typically non-programmers with statistical backgrounds
  – R-project, including R-Hadoop
• Want to be able to make changes to production code and test via simulation whether they work better
  – Does it work better? How? When?
• Roll out code to production easily

Hadoop-ifying Cloe

• Realized we could run Cloe under Hadoop
• Drive “orders” into Cloe via Hadoop
• Pass in market data quote files via HBase
• Store simulation results in Hadoop/HBase
• Market Simulation Framework outputs fills
• Cascading to allow complex analysis by senior coders

Lessons learned – Hadoop

• EC2 costs can mount quickly
  – Had a hybrid plan (either own or EC2)
  – Built our own 50-node cluster. See DV blog.
• Smaller files should go in HBase, not Hadoop (HDFS), because of a NameNode limitation
  – The NameNode holds all file pointers in memory
• Different tasks with different resource requirements don't play nicely in a single cluster
  – YARN should solve this.
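To see why small files hurt, a commonly quoted rule of thumb is that every file, directory and block is held in NameNode heap as an object of roughly 150 bytes. Taking that figure as an assumption (it is approximate and varies by Hadoop version), the sketch compares roughly 10 TB stored as 100 KB files versus 128 MB files, assuming each file occupies a single block and ignoring directories.

```python
BYTES_PER_OBJECT = 150  # commonly quoted rule of thumb, not an exact figure


def namenode_heap_estimate(num_files: int, blocks_per_file: int = 1) -> int:
    """Rough NameNode heap in bytes for file and block objects (directories ignored)."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Roughly 10 TB as 100 million 100 KB files vs 80,000 files of 128 MB.
small = namenode_heap_estimate(100_000_000)
large = namenode_heap_estimate(80_000)
print(f"small files: ~{small / 1e9:.0f} GB of heap")
print(f"large files: ~{large / 1e6:.0f} MB of heap")
```

Storing the small records as rows in HBase keeps them inside a modest number of large HDFS files, which is the point of the lesson.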

Lessons learned – Hadoop (continued)

• Make developer machine setup turn-key
  – We use extensive scripting to make getting the dev environment running a one-step process
  – Dev environment was kept close to the cluster environment
• Cascading is great for complex analysis
• Importance of cluster configuration
  – Memory, threads, cores for your jobs
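Part of "memory, threads, cores for your jobs" is simple arithmetic: how many task containers fit on a node, and how much JVM heap each task gets. In Hadoop 2/YARN the inputs correspond to properties such as `yarn.nodemanager.resource.memory-mb`, `yarn.nodemanager.resource.cpu-vcores`, `mapreduce.map.memory.mb` and `mapreduce.map.java.opts`. The node sizes and the 0.8 heap fraction below are hypothetical, and the min over memory and CPU assumes a scheduler that accounts for both.

```python
def containers_per_node(node_mem_mb: int, node_vcores: int,
                        task_mem_mb: int, task_vcores: int = 1) -> int:
    """How many task containers fit on one node.

    Assumes the scheduler accounts for both memory and CPU; with a
    memory-only resource calculator, only the memory term applies.
    """
    by_memory = node_mem_mb // task_mem_mb
    by_cpu = node_vcores // task_vcores
    return min(by_memory, by_cpu)


def heap_for_container(task_mem_mb: int, fraction: float = 0.8) -> str:
    """JVM -Xmx value leaving headroom for non-heap memory (0.8 is a rule of thumb)."""
    return f"-Xmx{int(task_mem_mb * fraction)}m"


# Hypothetical worker: 48 GB and 12 vcores offered to YARN, 2 GB / 1-vcore map tasks.
n = containers_per_node(48 * 1024, 12, 2048)
print(n, "containers per node; map heap", heap_for_container(2048))
```

Here CPU, not memory, is the binding limit (24 containers fit by memory, only 12 by vcores), which is the kind of mismatch that surfaces when jobs with different resource profiles share one cluster.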

Next steps

• Considering open-sourcing under the Apache license
• Bring some sanity to the traditional execution technology space
• Looking for a founding team
• Please talk to me afterward if you're interested in investigating further
End

Haefele june27 1150am_room212_v2
