Using MongoDB As a Tick Database

Sr. Solution Architect, MongoDB
Matt Kalan
How Capital Markets Firms
Use MongoDB as a Tick
Database

Agenda
• MongoDB One Slide Overview
• FS Use Cases
• Writing/Capturing Market Data
• Reading/Analyzing Market Data
• Performance, Scalability, & High Availability
• Q&A

MongoDB Technical Benefits
Horizontally Scalable
-Sharding
Agile &
Flexible
High
Performance
-Indexes
-RAM
Application
Highly
Available
-Replica Sets
{ name: “John Smith”,
date: “2013-08-01”),
address: “10 3rd St.”,
phone: [
{ home: 1234567890},
{ mobile: 1234568138} ]
}
db.cust.insert({…})
db.cust.find({
name:”John Smith”})

Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. RiskAnalysis & Reporting
4. Trade Repository
5. Portfolio Reporting

Writing and Capturing Tick
Data

Tick Data Capture & Analysis
Requirements
• Capture real-time market data (multi-asset, top of
book, depth of book, even news)
• Load historical data
• Aggregate data into bars, daily, monthly intervals
• Enable queries & analysis on raw ticks or
aggregates
• Drive backtesting or automated signals

Tick Data Capture & Analysis –
Why MongoDB?
• High throughput => can capturereal-timefeeds for all products/assetclasses
needed
• High scalability=> all data and depth for all historical time periods can be
captured
• Flexible & Range-basedindexing => fast querying on time rangesand any
fields
• Aggregation Framework => can shape raw data into aggregates (e.g. ticks to
bars)
• Map-reduce capability(Native MR or Hadoop Connector) => batch analysis
looking for patternsand opportunities
• Easy to use => native language drivers and JSON expressionsthat you can

Trades/metrics
High Level Trading Architecture
Feed Handler
Exchanges/Mark
ets/Brokers
Capturing
Application
Low Latency
Applications
Higher Latency
Trading
Applications
Backtesting and
Analysis
Applications
Market Data
Cached Static &
Aggregated Data
News & social
networking
sources
Orders
Orders

Trades/metrics
High Level Trading Architecture
Feed Handler
Exchanges/Mark
ets/Brokers
Capturing
Application
Low Latency
Applications
Higher Latency
Trading
Applications
Backtesting and
Analysis
Applications
Market Data
Cached Static &
Aggregated Data
News & social
networking
sources
Orders
Orders
Data Types
• Top of book
• Depth of book
• Multi-asset
• Derivatives (e.g. strips)
• News (text, video)
• Social Networking

{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bidPrice: 55.37,
offerPrice: 55.58,
bidQuantity: 500,
offerQuantity: 700
}
> db.ticks.find( {symbol: "DIS",
bidPrice: {$gt: 55.36} } )
Top of Book [e.g. equities]

{
symbol : "DIS",
bidPrices: [55.37, 55.36, 55.35],
offerPrices: [55.58, 55.59, 55.60],
bidQuantities: [500, 1000, 2000],
offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find( {bidPrices: {$gt: 55.36} } )
Depth of Book

{
symbol : "DIS",
bids: [
{price: 55.37, amount: 500},
{price: 55.37, amount: 1000},
{price: 55.37, amount: 2000} ],
offers: [
{price: 55.58, amount: 1000},
{price: 55.58, amount: 2000},
{price: 55.59, amount: 3000} ]
}
> db.ticks.find( {"bids.price": {$gt: 55.36} } )
Or However Your App Uses It

{
symbol : "DIS",
spreadPrice: 0.58
leg1: {symbol: “CLM13, price: 97.34}
leg2: {symbol: “CLK13, price: 96.92}
}
db.ticks.find( { “leg1” : “CLM13” },
{ “leg2” : “CLK13” },
{ “spreadPrice” : {$gt: 0.50 } } )
Synthetic Spreads

{
symbol : "DIS",
title: “Disney Earnings…”
body: “Walt Disney Company reported…”,
tags: [“earnings”, “media”, “walt disney”]
}
News

{
twitterHandle: “jdoe”,
tweet: “Heard @DisneyPictures is releasing…”,
usernamesIncluded: [“DisneyPictures”],
hashTags: [“movierumors”, “disney”]
}
Social Networking

{
symbol : "DIS”,
openTS: Date("2013-02-15 10:00"),
closeTS: Date("2013-02-15 10:05"),
open: 55.36,
high: 55.80,
low: 55.20,
close: 55.70
}
Aggregates (bars, daily, etc)

Architecture for Querying Data
Higher Latency
Trading
Applications
Backtesting
Applications
• Ticks
• Bars
• Other analysis
Research &
Analysis
Applications

// Compound indexes
> db.ticks.ensureIndex({symbol: 1, timestamp:1})
// Index on arrays
>db.ticks.ensureIndex( {bidPrices: -1})
// Index on any depth
> db.ticks.ensureIndex( {“bids.price”: 1} )
// Full text search
> db.ticks.ensureIndex ( {tweet: “text”} )
Index Any Fields: Arrays, Nested,
etc.

// Ticks for last month for media companies
> db.ticks.find({
symbol: {$in: ["DIS", “VIA“, “CBS"]},
timestamp: {$gt: new ISODate("2013-01-01")},
timestamp: {$lte: new ISODate("2013-01-31")}})
// Ticks when Disney’s bid breached 55.50 this month
> db.ticks.find({
symbol: "DIS",
bidPrice: {$gt: 55.50},
timestamp: {$gt: new ISODate("2013-02-01")}})
Query for ticks by time; price
threshold

Analyzing/Aggregating Options
• Custom application code
– Run your queries, compute your results
• Aggregation framework
– Declarative, pipeline-based approach
• Native Map/Reduce in MongoDB
– Javascript functions distributed across cluster
• Hadoop Connector
– Offline batch processing/computation

//Aggregate minute bars for Disney for February
db.ticks.aggregate(
{ $match: {symbol: "DIS”, timestamp: {$gt: new ISODate("2013-02-01")}}},
{ $project: {
year: {$year: "$timestamp"},
month: {$month: "$timestamp"},
day: {$dayOfMonth: "$timestamp"},
hour: {$hour: "$timestamp"},
minute: {$minute: "$timestamp"},
second: {$second: "$timestamp"},
timestamp: 1,
price: 1}},
{ $sort: { timestamp: 1}},
{ $group :
{ _id : {year: "$year", month: "$month", day: "$day", hour: "$hour", minute:
"$minute"},
open: {$first: "$price"},
high: {$max: "$price"},
low: {$min: "$price"},
close: {$last: "$price"} }} )
Aggregate into min bars

…
//then count the number of down bars
{ $project: {
downBar: {$lt: [“$close”, “$open”] },
timestamp: 1,
open: 1, high: 1, low: 1, close: 1}},
{ $group: {
_id: “$downBar”,
sum: {$sum: 1}}} })
Add Analysis on the Bars

var mapFunction = function () {
emit(this.symbol, this.bidPrice);
}
var reduceFunction = function (symbol, priceList) {
return Array.sum(priceList);
}
> db.ticks.mapReduce(
map, reduceFunction, {out: ”tickSums"})
MapReduce Example: Sum

Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs
– No need to go through HDFS
• Leverage power of Hadoop ecosystem against
operational data in MongoDB

Performance, Scalability, and High
Availability

Why MongoDB Is Fast and Scalable
Better data locality
Relational MongoDB
In-Memory
Caching
Auto-Sharding
Read/write scaling

Auto-sharding for Horizontal Scale
mongod
Read/Write Scalability
Key Range
Symbol: A…Z

Auto-sharding for Horizontal Scale
mongod mongod
Key Range
Symbol: A…J
Key Range
Symbol: K…Z

Sharding
mongod mongod
mongod mongod
Key Range
Symbol: A…F
Key Range
Symbol: G…J
Key Range
Symbol: K…O
Key Range
Symbol: P…Z

Primary
Secondar
y
Secondar
y
Primary
Secondar
y
Secondar
y
Primary
Secondar
y
Secondar
y
Primary
Secondar
y
Secondar
y
MongoS MongoS MongoS
Key Range
Symbol: A…F,
Time
Key Range
Symbol: G…J,
Time
Key Range
Symbol: K…O,
Time
Key Range
Symbol: P…Z,
Time
Application

Summary
• MongoDB is high performance for tick data
• Scales horizontally automatically by auto-sharding
• Fast, flexible querying, analysis, & aggregation
• Dynamic schema can handle any data types
• MongoDB has all these features with low TCO
• We can support you with anything discussed

Sr. Solution Architect, MongoDB
Matt Kalan
#ConferenceHashtag
Thank You

Using MongoDB As a Tick Database

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Using MongoDB As a Tick Database

Similar to Using MongoDB As a Tick Database (20)

More from MongoDB

More from MongoDB (20)

Recently uploaded

Recently uploaded (20)

Using MongoDB As a Tick Database