Haefele june27 1150am_room212_v2


Deep Value has been using Hadoop to run simulations of trading strategies that trade over 3.5% of the US stock market. We provide both high-frequency market making and execution strategies. Our largest customer is the NYSE, where we provide execution services to the floor broker community. We have taken our high-performance, fault-tolerant Java trading engine and adapted it to run as a MapReduce job. Our execution-engine Mapper pulls out the order-by-order data of all orders going into the US stock market and replays them against our production algorithmic logic. We do this to understand whether changes to the algorithmic logic improve the overall performance of our trading. This approach, while solving one set of issues ("is this approach better than that one?"), creates a new set of challenges: not blowing our compute budget (EC2 costs add up, so we built our own 50-server base cluster) and managing the escalating volume of data these simulations generate. Luckily, these are first-world problems that Hadoop itself can help us address. We will describe how we converted our execution engine to use Hadoop and what components are needed to build a suitable trading simulation environment. We will also examine the types of analysis we have built on top of the trading data that have helped us understand what we are doing.
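The core replay idea from the abstract can be sketched in miniature: run the same recorded orders through a scoring function so that two versions of the trading logic can be compared on identical input. Everything below (Order, Strategy, the scoring) is an illustrative stand-in, not Deep Value's actual classes.

```java
import java.util.List;

// Hedged sketch of order-by-order replay: feed recorded orders through a
// strategy and accumulate a performance score, so two versions of the logic
// can be compared deterministically on the same tape.
public class Replay {
    static final class Order {
        final String symbol; final long qty; final double price;
        Order(String symbol, long qty, double price) {
            this.symbol = symbol; this.qty = qty; this.price = price;
        }
    }

    interface Strategy {
        double score(Order o); // e.g. negative slippage vs. arrival price
    }

    static double replay(List<Order> tape, Strategy s) {
        double total = 0.0;
        for (Order o : tape) total += s.score(o); // one deterministic pass over the tape
        return total;
    }
}
```

Because the pass over the tape is deterministic, any difference in total score between two Strategy implementations is attributable to the logic change, which is the property the simulations rely on.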



  1. DeepValue - Hadoop Summit, June 2013 - DeepValue, Inc.
  2. Outline of talk
     - Who are we
     - What do we do
     - What is HFT
     - What is the structure of our technology effort
     - How we use Hadoop
     - Focus on what we've built at the top level, and lessons learned
     - Next steps? Open source with a founding team
  3. DeepValue
     - Started in 2006 to provide high-performance execution algorithms on a "paid for performance" basis
     - Execution algorithms take large client orders and split them into small pieces to execute through the day
     - Routinely trade 0.5-1% of US stock market volumes; highest day in 2012 was ~4%, and ~3% this year
     - Exchange-sponsored execution algorithms for NYSE floor brokers
     - 45 people based in the US and India
  4. What do we do
     - Utilize sophisticated math and statistics to find patterns in the data and devise trading tactics
     - Use simulation to understand whether trading ideas in fact work
     - Core business is providing tools (algos) to mutual funds and others to avoid being gamed by pure HFT traders
     - Ability to harness compute resources is a key determinant of success - Hadoop
     - All compute resources are now cluster-based and need a grid platform to utilize them - Hadoop
  5. What is HFT?
     - Look at every order in the market and make real-time decisions on what to do next
     - Look to receive rebates by providing liquidity when it is sensible to do so
       - Citibank was a favourite for many years due to its low price and thus large percentage spread
     - Some amount of "sniffing out" of large orders
     - Often a speed game: faster routers, shorter wires, FPGAs
     - We use smarts to try not to show our hand
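The "low price, large percentage spread" point is simple arithmetic: with the minimum tick at $0.01, a stock quoted one tick wide has a spread that is a much larger fraction of a low share price than of a high one. The numbers below are hypothetical, purely to illustrate the effect.

```java
// Hedged illustration: the one-cent minimum tick is a far larger fraction
// of a low-priced stock, making its quoted spread more attractive to capture.
public class SpreadEconomics {
    // spread expressed as a percentage of the share price
    static double spreadPct(double price, double spreadDollars) {
        return spreadDollars / price * 100.0;
    }
}
```

A $4 stock one tick wide carries a 0.25% spread; a $40 stock one tick wide carries only 0.025%, ten times less per dollar traded.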
  6. Trading Systems
     - Order Management Systems (OMS) / Execution Management Systems (EMS)
     - Take in market data representing every order placed in every market
     - Send out orders to market, manipulate those orders (replace/cancel), and receive fills
       - Via a name-value protocol called FIX
     - Fills represent actual trades
     - Log what they are doing via structured logging
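FIX really is a name-value protocol on the wire: each field is an integer tag and a string value, with fields separated by the SOH (0x01) control character. A minimal parsing sketch (not Deep Value's code) makes the format concrete:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of FIX field extraction. A message is tag=value pairs joined
// by SOH (0x01), e.g. "35=D<SOH>55=IBM<SOH>38=100" where 35 = MsgType
// (D = new order), 55 = Symbol, 38 = OrderQty.
public class FixFields {
    static Map<Integer, String> parse(String msg) {
        Map<Integer, String> fields = new HashMap<>();
        for (String field : msg.split("\u0001")) {
            int eq = field.indexOf('=');
            if (eq > 0) {
                fields.put(Integer.parseInt(field.substring(0, eq)),
                           field.substring(eq + 1));
            }
        }
        return fields;
    }
}
```

Production engines avoid this kind of per-message allocation and string splitting for latency reasons; the sketch only shows the wire format, not a performant parser.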
  7. Cloe
  8. Lessons from building a grid
     - Cluster-wide locks are the problem
       - Focus on these in design
       - Batch changes and acquire the lock once
     - Build for the performance case, and let the failure case be potentially slower / more complex
       - Regular message processing doesn't take cluster locks
     - Hybrid of message passing and centralized control
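The "batch changes and acquire the lock once" lesson can be sketched as follows. A local ReentrantLock stands in for a far more expensive cluster-wide lock, and the counter makes the saving in lock round-trips visible; all names are illustrative, not the grid's real API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch: accumulate updates locally, then apply them all under a
// single acquisition of the (simulated) cluster-wide lock.
public class BatchedUpdates {
    private final ReentrantLock clusterLock = new ReentrantLock(); // stand-in for a cluster-wide lock
    private final Map<String, Long> state = new HashMap<>();
    public int lockAcquisitions = 0; // exposed so the saving is measurable

    // Naive path: one lock round-trip per change
    public void applyOne(String key, long delta) {
        clusterLock.lock();
        try { lockAcquisitions++; state.merge(key, delta, Long::sum); }
        finally { clusterLock.unlock(); }
    }

    // Batched path: many changes, one lock round-trip
    public void applyBatch(Map<String, Long> deltas) {
        clusterLock.lock();
        try {
            lockAcquisitions++;
            deltas.forEach((k, v) -> state.merge(k, v, Long::sum));
        } finally { clusterLock.unlock(); }
    }

    public long get(String key) { return state.getOrDefault(key, 0L); }
}
```

With a genuinely distributed lock each acquisition is a network round-trip, so collapsing N acquisitions into one is the difference that matters at message-processing rates.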
  9. Questions to solve: Hadoop
     - What is the algorithm actually doing?
       - Complexity, e.g. feedback loops
       - Testing against intentions
     - Can we do better next time?
       - Back-testing
       - Improved research process
     - Log and historical market data management
  10. DV Research Process
     - Want to be able to look at "raw" market data to prove ideas
       - Typically non-programmers with a statistical background
       - R project, including RHadoop
     - Want to be able to make a change to production code and test whether it works better via simulation
       - Does it work better? How? When?
     - Roll out code to production easily
  11. Hadoop-ifying Cloe
     - Realized we could run Cloe under Hadoop
     - Drive "orders" into Cloe via Hadoop
     - Pass in market data quote files via HBase
     - Store simulation results in Hadoop/HBase
     - Market Simulation Framework outputs fills
     - Cascading to allow complex analysis by senior coders
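The shape of the pipeline above is classic MapReduce: the map side replays orders through the engine and emits simulated fills, and the reduce side aggregates them, e.g. filled quantity per symbol. The sketch below shows only that reduce-side shape with plain Java, without Hadoop dependencies; Fill and the aggregation are stand-ins for the real job, not Deep Value's classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of the simulation job's reduce step: total the simulated
// filled quantity keyed by symbol, as a reducer over emitted fills would.
public class SimJobShape {
    static final class Fill {
        final String symbol; final long qty;
        Fill(String symbol, long qty) { this.symbol = symbol; this.qty = qty; }
    }

    static Map<String, Long> aggregate(List<Fill> fills) {
        Map<String, Long> totals = new HashMap<>();
        for (Fill f : fills) totals.merge(f.symbol, f.qty, Long::sum);
        return totals;
    }
}
```

In the real job the grouping by key is what Hadoop's shuffle provides for free, which is why a fault-tolerant engine plus MapReduce was a natural fit for whole-market replays.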
  12. Lessons learned - Hadoop
     - EC2 costs can mount quickly
       - Had a hybrid plan (either our own hardware or EC2)
       - Built our own 50-node cluster; see the DV blog
     - Smaller files should go in HBase, not HDFS: the NameNode keeps all file pointers in memory, which limits the number of files
     - Different tasks with different resource requirements don't play nicely in a single cluster
       - YARN should solve this
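The small-files limitation follows from the NameNode keeping metadata for every file, directory, and block in heap. A commonly cited rule of thumb is roughly 150 bytes of heap per object; the exact figure varies by Hadoop version, so treat this as a hedged back-of-envelope estimate only.

```java
// Back-of-envelope NameNode heap estimate. The ~150 bytes/object figure is a
// widely quoted approximation, not an exact number; each small file costs at
// least one file (inode) object plus one block object in NameNode memory.
public class NameNodeHeap {
    static final long APPROX_BYTES_PER_OBJECT = 150;

    static long heapBytesForSmallFiles(long fileCount) {
        long objects = fileCount * 2; // one inode + one block per small file
        return objects * APPROX_BYTES_PER_OBJECT;
    }
}
```

Under this estimate, 100 million small files need on the order of 30 GB of NameNode heap, which is why packing small records into HBase (or container formats like SequenceFiles) beats storing them as individual HDFS files.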
  13. Lessons learned - Hadoop (continued)
     - Make developer machine setup turn-key
       - We use extensive scripting to make getting a dev environment running a one-step process
       - The dev environment was controlled to stay close to the cluster environment
     - Cascading is great for complex analysis
     - Importance of cluster configuration
       - Memory, threads, cores for your jobs
  14. Next steps
     - Considering open-sourcing via the Apache license
     - Bring some sanity to the traditional execution technology space
     - Looking for a founding team
     - Please talk to me afterward if you're interested in investigating further
  15. End