Haefele june27 1150am_room212_v2

DeepValue

Hadoop Summit

June 2013

DeepValue, Inc.

Outline of talk

l  Who are we

l  What do we do

l  What is HFT

l  What is the structure of our technology effort

l  How we use Hadoop

l  Focus on what we've built at top level and lessons learned

l  Next steps? Open source with founding team

DeepValue

l  Started in 2006 to provide high performance execution
algorithms on a “paid for performance” basis.

l  Execution algorithms take large client orders and split into
small pieces to execute through the day

l  Routinely trade 0.5 – 1% of US stock market volumes.
Highest date in 2012 was ~4% and ~3% this year

l  Exchange sponsored execution algorithms to NYSE ﬂoor
brokers.

l  45 people based in US and India

What do we do

l  Utilize sophisticated math and statistics to see patterns in
the data to come up with trading tactics

l  Use simulation to understand if trading ideas in-fact work.

l  Core business is providing tools (algos) to mutual funds and
others to avoid being gamed by pure HFT-traders

l  Ability to harness compute resources is a key determinant
of success - Hadoop

l  All compute resources are now cluster based and need a
grid platform to utilize - Hadoop

What is HFT?

l  Look at every order in the market and make real-time
decisions on what to do next

l  Looking to receive rebates by providing liquidity when
sensible to do so

–  Citibank stock was a favourite for many years due to its low
price and thus large % spread

l  Some amount of “snifﬁng out” of large orders

l  Often a speed game – faster routers, shorter wires, FPGA

l  We use smarts to try and not show our hand

Trading Systems

l  Order management systems (OMS) / Execution
Management Systems (EMS)

l  Takes in market data representing every order placed in
every market

l  Sends out orders to market, manipulates those orders
(replace/cancel) and receives ﬁlls

–  Via name-value protocol call FIX

l  Fills represent actual trades

l  Logs what it is doing via structured logging

Lessons from building grid

l  Cluster wide locks is the problem

– Focus on these in design

– Batch changes and get lock once

l  Build for performance case, and have failure case be
potentially slower / more complex

– Regular message processing doesn't get cluster locks

l  Hybrid of message passing & centralized control

Questions to solve: Hadoop

l  What is the algorithm actually doing?

– Complexity e.g. feedback loops

– Testing against intentions

l  Can we do better next time

– Back-testing

– Improved research process

l  Log and historical market data management

DV Research Process

l  What to be able to look at “raw” market data to be able to
prove ideas

– Typically non-programmers with statistical background

– R-project including R-Hadoop

l  Want to be able to make change to production code, and
test if this works better via simulation

– Does it work better, how, when?

l  Roll out code to production easily

Hadoop-ifying Cloe

l  Realized we could run Cloe under Hadoop

l  Drive “orders” into Cloe via Hadoop

l  Pass in market data quote ﬁles via HBase

l  Store simulation results in Hadoop/HBase

l  Market Simulation Framework outputs ﬁlls

l  Cascading to allow complex analysis by senior coders

Lessons learned - Hadoop

l  EC2 costs can mount quickly

–  Had hybrid plan (either own or EC2)

–  Built our own 50-node cluster. See DV blog.

l  Smaller ﬁles should be in Hbase not Hadoop has a
NameNode limitation

–  All ﬁle pointers in memory

l  Different tasks with different resource requirements don't
play nicely in single cluster

–  YARN should solve this.
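
The NameNode limitation can be estimated with back-of-envelope arithmetic. The sketch below assumes roughly 150 bytes of NameNode heap per namespace object (file or block), a commonly quoted rule of thumb rather than a figure from this talk, and a hypothetical file count.

```python
BYTES_PER_OBJECT = 150  # assumption: commonly quoted rough figure per file or block


def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode heap, counting one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Hypothetical: 100 million small files, each fitting in a single block.
small_files_heap = namenode_heap_bytes(100_000_000)
small_files_heap_gb = small_files_heap / 1e9  # 30.0 GB under these assumptions
```

The cost depends on the number of files, not their size, which is why many small files are a poor fit for HDFS.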

Lessons learned - Hadoop (continued)

l Make developer machine setup turn-key

–  We use extensive scripting to make getting dev
environment running a one step process

–  Dev environment is kept close to the cluster
environment

l Cascading is great for complex analysis

l Importance of conﬁguration of cluster

–  Memory, threads, cores for your jobs

Next steps

l  Considering open-sourcing via Apache license

l  Bring some sanity to traditional execution technology space

l  Looking for a founding team

l  Please talk to me afterward if you're interested in
investigating further
