Learn why MongoDB is spreading rapidly across capital markets (and indeed every industry), then focus on how financial firms benefit from the developer productivity, low TCO, and horizontal scalability of MongoDB as a tick database for capturing and analyzing tick data and acting on the opportunities it reveals.
Webinar: How Banks Use MongoDB as a Tick Database
1. How Capital Markets Firms Use MongoDB as a Tick Database
Antoine Girbal, Technical Account Manager
Email: antoine@10gen.com
Twitter: @antoinegirbal
2.
Agenda
• MongoDB Introduction
• Financial Services (FS) Use Cases
• Writing/Capturing Market Data
• Reading/Analyzing Market Data
• Performance, Scalability, & High Availability
• Q&A
3.
Introduction
10gen is the company behind MongoDB, the leading next-generation database:
• Document-oriented
• Open-source
• General-purpose
4.
10gen Overview
• 200+ employees
• 500+ customers
• Over $81 million in funding
• Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona, and Sydney
8.
Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. Risk Analysis & Reporting
4. Trade Repository
5. Portfolio Reporting
9.
Tick Data Capture & Analysis: Requirements
• Capture real-time market data (multi-asset, top of book, depth of book, even news)
• Load historical data
• Aggregate data into bars at various intervals (e.g., daily, monthly)
• Enable queries & analysis on raw ticks or aggregates
• Drive backtesting or automated signals
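One way to meet the top-of-book and depth-of-book requirements is to store each tick as a single document. The sketch below is a minimal, hypothetical model, not a prescribed schema: the field names `symbol`, `timestamp`, `bidPrice`, `bidPrices`, and `bids` are the ones this deck's index and query examples use, while the `makeTick` helper, the `askPrice`/`asks` fields, and the `{ price, size }` level shape are assumptions made for illustration.

```javascript
// Hypothetical helper: builds one tick document from an order-book snapshot.
// bids and asks are arrays of { price, size } levels (an assumed shape).
function makeTick(symbol, timestamp, bids, asks) {
  // Sort copies so the best level comes first: highest bid, lowest ask.
  const sortedBids = [...bids].sort((a, b) => b.price - a.price);
  const sortedAsks = [...asks].sort((a, b) => a.price - b.price);
  return {
    symbol,
    timestamp, // a Date, stored by the drivers as a BSON date
    bidPrice: sortedBids.length > 0 ? sortedBids[0].price : null, // top of book
    askPrice: sortedAsks.length > 0 ? sortedAsks[0].price : null, // assumed field
    bidPrices: sortedBids.map((level) => level.price), // array of bid prices
    bids: sortedBids, // depth of book as embedded documents
    asks: sortedAsks, // assumed field
  };
}

// Example snapshot for DIS (illustrative numbers, not market data).
const tick = makeTick(
  "DIS",
  new Date(Date.UTC(2013, 1, 4, 14, 30, 0)),
  [{ price: 55.48, size: 300 }, { price: 55.5, size: 100 }],
  [{ price: 55.53, size: 200 }, { price: 55.52, size: 400 }]
);
console.log(JSON.stringify(tick, null, 2));
// In the mongo shell, such a document could be stored with db.ticks.insertOne(tick)
```

Keeping top of book and depth of book in one document means a single read returns the whole snapshot, at the cost of larger documents when the book is deep.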
10.
Tick Data Capture & Analysis: Why MongoDB?
• High throughput => can capture real-time feeds for all needed products and asset classes
• High scalability => all data and depth for all historical time periods can be captured
• Flexible, range-based indexing => fast querying on time ranges and on any fields
• Aggregation Framework => can shape raw data into aggregates (e.g., ticks to bars)
• Map-reduce capability (native MR or Hadoop Connector) => batch analysis looking for patterns and opportunities
• Easy to use => native language drivers and JSON expressions that you can also apply to most operational database needs
• Low TCO => low software license cost and commodity hardware
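As an illustration of shaping ticks into bars with the Aggregation Framework, here is a hypothetical helper that builds a pipeline for one-minute bid-price bars. It only constructs the pipeline as plain data, so it runs without a database. It assumes ticks carry `symbol`, `timestamp` (a date), and `bidPrice` fields, and the name `minuteBarsPipeline` is illustrative. The date operators (`$year`, `$minute`, and so on) evaluate in UTC by default.

```javascript
// Hypothetical helper: returns an aggregation pipeline (plain data) that turns
// raw ticks into one-minute OHLC bars of the bid price for one symbol.
function minuteBarsPipeline(symbol, from, to) {
  return [
    // Restrict to one symbol and the half-open time range [from, to).
    { $match: { symbol: symbol, timestamp: { $gte: from, $lt: to } } },
    // $first and $last below depend on input order, so sort by time first.
    { $sort: { timestamp: 1 } },
    // One group per calendar minute (UTC date parts).
    { $group: {
        _id: {
          year: { $year: "$timestamp" },
          month: { $month: "$timestamp" },
          day: { $dayOfMonth: "$timestamp" },
          hour: { $hour: "$timestamp" },
          minute: { $minute: "$timestamp" },
        },
        open: { $first: "$bidPrice" },
        high: { $max: "$bidPrice" },
        low: { $min: "$bidPrice" },
        close: { $last: "$bidPrice" },
        ticks: { $sum: 1 },
    } },
    // $group does not guarantee output order, so sort the bars chronologically.
    { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1, "_id.hour": 1, "_id.minute": 1 } },
  ];
}

const pipeline = minuteBarsPipeline(
  "DIS",
  new Date(Date.UTC(2013, 0, 1)),
  new Date(Date.UTC(2013, 1, 1))
);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell, this would be run as: db.ticks.aggregate(pipeline)
```

The `$sort` before `$group` matters: `$first` and `$last` only yield a meaningful open and close when documents reach the group stage in time order.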
22.
Architecture for Querying Data
[Diagram (axis labeled "Higher Latency"): Trading Applications, Backtesting Applications, and Research & Analysis Applications query Ticks, Bars, and Other analysis]
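23.
Index any fields: arrays, nested, etc.
// createIndex() replaces ensureIndex(), deprecated since MongoDB 3.0
// Compound index: symbol, then timestamp (per-symbol time-range queries)
> db.ticks.createIndex({symbol: 1, timestamp: 1})
// Index on an array field (multikey index: one entry per element)
> db.ticks.createIndex({bidPrices: -1})
// Index on a field nested at any depth, using dot notation
> db.ticks.createIndex({"bids.price": 1})
// Full-text search (a collection can have at most one text index)
> db.ticks.createIndex({tweet: "text"})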
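A multikey index, such as one on `bidPrices` or `bids.price`, stores one index entry per array element, so a tick with five bid levels contributes five keys. The sketch below is a simplified in-memory model of that behavior, not MongoDB's implementation: it ignores index types, collation, and B-tree layout, flattens nested arrays, and assumes comparable numeric keys. The helpers `valuesAtPath` and `multikeyEntries` are hypothetical names.

```javascript
// Simplified model of multikey indexing (not MongoDB's implementation):
// collect every value at a dotted path, descending into arrays.
function valuesAtPath(value, parts) {
  if (Array.isArray(value)) {
    // Arrays contribute one value per element.
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) {
    // A missing field is indexed as null.
    return [value === undefined ? null : value];
  }
  if (value === null || value === undefined || typeof value !== "object") {
    return [null];
  }
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Hypothetical helper: the (key, _id) entries an ascending index on `path`
// would hold, in key order. Duplicate values within one document count once.
function multikeyEntries(docs, path) {
  const parts = path.split(".");
  const entries = [];
  for (const doc of docs) {
    for (const key of new Set(valuesAtPath(doc, parts))) {
      entries.push({ key, _id: doc._id });
    }
  }
  // null sorts before numbers, matching MongoDB's comparison order.
  entries.sort((a, b) => {
    if (a.key === b.key) return 0;
    if (a.key === null) return -1;
    if (b.key === null) return 1;
    return a.key < b.key ? -1 : 1;
  });
  return entries;
}

const docs = [
  { _id: 1, symbol: "DIS", bidPrices: [55.5, 55.48],
    bids: [{ price: 55.5, size: 100 }, { price: 55.48, size: 300 }] },
  { _id: 2, symbol: "CBS", bidPrices: [46.1],
    bids: [{ price: 46.1, size: 500 }] },
];
console.log(multikeyEntries(docs, "bids.price"));
```

Because each bid level has its own entry, a query such as `{"bids.price": {$gt: 55}}` can locate matching ticks by scanning a key range instead of every document.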
24.
Query for ticks by time range and price threshold
// Ticks for last month (January 2013) for media companies.
// Both bounds go in one sub-document: a repeated "timestamp" key
// would overwrite the first condition.
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")}})
// Ticks when Disney's bid breached 55.50 this month (February 2013)
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gte: new ISODate("2013-02-01")}})
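Listing `timestamp` twice in one query object, as in `{timestamp: {$gt: a}, timestamp: {$lte: b}}`, does not combine the two conditions: in a JavaScript object literal a repeated key silently overwrites the earlier one, so only the last bound reaches the server. Both operators belong in one sub-document. The plain-JavaScript sketch below (no database needed) demonstrates the pitfall and adds a hypothetical `monthRange` helper that builds a half-open UTC range. A half-open `$lt` bound also avoids dropping the last day's ticks, which an upper bound such as `$lte: ISODate("2013-01-31")` (midnight on January 31) would do.

```javascript
// Pitfall: a repeated key in an object literal keeps only the last value.
const broken = {
  symbol: "DIS",
  timestamp: { $gt: new Date("2013-01-01T00:00:00Z") },
  timestamp: { $lte: new Date("2013-01-31T00:00:00Z") },
};
console.log(Object.keys(broken.timestamp)); // only $lte remains; the $gt bound is gone

// Hypothetical helper: [first instant of month, first instant of next month) in UTC.
// month is 1-12; Date.UTC rolls month index 12 over into January of the next year.
function monthRange(year, month) {
  return {
    $gte: new Date(Date.UTC(year, month - 1, 1)),
    $lt: new Date(Date.UTC(year, month, 1)),
  };
}

const januaryFilter = {
  symbol: { $in: ["DIS", "VIA", "CBS"] },
  timestamp: monthRange(2013, 1),
};
console.log(januaryFilter.timestamp.$gte.toISOString(), januaryFilter.timestamp.$lt.toISOString());
// prints: 2013-01-01T00:00:00.000Z 2013-02-01T00:00:00.000Z
// In the mongo shell: db.ticks.find(januaryFilter)
```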
25.
Analyzing/Aggregating Options
• Custom application code
– Run your queries, compute your results
• Aggregation framework
– Declarative, pipeline-based approach
• Native Map/Reduce in MongoDB
– JavaScript functions distributed across the cluster
• Hadoop Connector
– Offline batch processing/computation
27.
Add analysis on the bars
…
// then flag each down bar (close below open) and count the down bars
{ $project: {
    downBar: {$lt: ["$close", "$open"]},
    timestamp: 1,
    open: 1, high: 1, low: 1, close: 1}},
{ $group: {
    _id: "$downBar",
    sum: {$sum: 1}}}
])
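To check what the `$project` and `$group` stages on this slide compute, here is a plain-JavaScript equivalent run on a hypothetical in-memory array of bars. A bar counts as a down bar when its close is below its open (the same test as `$lt: ["$close", "$open"]`), and bars are then counted per flag value. The `countDownBars` name and the sample bars are illustrative only.

```javascript
// Plain-JavaScript equivalent of:
//   { $project: { downBar: { $lt: ["$close", "$open"] }, ... } }
//   { $group: { _id: "$downBar", sum: { $sum: 1 } } }
function countDownBars(bars) {
  const groups = new Map(); // downBar flag -> count
  for (const bar of bars) {
    const downBar = bar.close < bar.open; // $lt: ["$close", "$open"]
    groups.set(downBar, (groups.get(downBar) || 0) + 1); // $sum: 1
  }
  // Shape the result like the pipeline's output documents.
  return [...groups].map(([flag, sum]) => ({ _id: flag, sum }));
}

// Illustrative bars, not market data.
const bars = [
  { open: 55.2, high: 55.6, low: 55.1, close: 55.5 }, // up bar
  { open: 55.5, high: 55.5, low: 54.9, close: 55.0 }, // down bar
  { open: 55.0, high: 55.3, low: 54.8, close: 54.9 }, // down bar
  { open: 54.9, high: 55.0, low: 54.9, close: 54.9 }, // flat: not a down bar
];
console.log(countDownBars(bars));
```

Note that a flat bar (close equal to open) lands in the `false` group, exactly as it would with `$lt` in the pipeline.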
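28.
Map-Reduce Example: Sum
// Emit each tick's bid price, keyed by symbol
var mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
// Sum all bid prices emitted for one symbol
var reduceFunction = function (symbol, priceList) {
  return Array.sum(priceList);
};
// Note: map-reduce is deprecated as of MongoDB 5.0 in favor of the aggregation pipeline
> db.ticks.mapReduce(
    mapFunction, reduceFunction, {out: "tickSums"})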
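The map-reduce contract is easier to see outside the database. The sketch below is a simplified single-process model (the `runMapReduce` driver is hypothetical, not MongoDB's engine): `emit` groups values by key, and reduce runs per key. MongoDB may call reduce several times for the same key and feed earlier results back in, so reduce must return the same type as the emitted values and must be associative and commutative; a plain sum satisfies this. The model imitates re-reduce by reducing in small batches. `Array.sum` is available in MongoDB's server-side JavaScript but not in standard JavaScript, so a plain `reduce()` stands in for it here.

```javascript
// In MongoDB, emit() is provided by the server; here it is a stand-in that
// records (key, value) pairs into the current run's table.
let emitted = null;
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Same logic as the slide's map and reduce functions.
const mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
const reduceFunction = function (symbol, priceList) {
  return priceList.reduce((total, price) => total + price, 0);
};

// Hypothetical driver: map every document, then reduce each key's values in
// batches and re-reduce the partial results, as MongoDB may do.
function runMapReduce(docs, map, reduce, batchSize = 2) {
  emitted = new Map();
  for (const doc of docs) map.call(doc); // `this` is the document
  const results = [];
  for (const [key, values] of emitted) {
    let partials = values;
    // A key with a single emitted value is never passed to reduce.
    while (partials.length > 1) {
      const next = [];
      for (let i = 0; i < partials.length; i += batchSize) {
        const batch = partials.slice(i, i + batchSize);
        next.push(batch.length === 1 ? batch[0] : reduce(key, batch));
      }
      partials = next;
    }
    results.push({ _id: key, value: partials[0] });
  }
  return results;
}

// Illustrative ticks, not market data.
const ticks = [
  { symbol: "DIS", bidPrice: 55.5 },
  { symbol: "CBS", bidPrice: 46.5 },
  { symbol: "DIS", bidPrice: 55.25 },
  { symbol: "DIS", bidPrice: 55.75 },
];
console.log(runMapReduce(ticks, mapFunction, reduceFunction));
```

If reduce returned, say, an object while map emitted numbers, the re-reduce step would mix types and produce wrong results, which is why the type rule matters.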
29.
Process Data on Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs
– No need to go through HDFS
• Leverage the power of the Hadoop ecosystem against operational data in MongoDB
36.
10gen Products and Services
• Subscriptions: Professional support, Enterprise Edition, and commercial license
• Consulting: Expert resources for all phases of MongoDB implementations
• Training: Online and in-person, for developers and administrators
37.
Summary
• MongoDB delivers high performance for tick data
• Scales horizontally and automatically through auto-sharding
• Fast, flexible querying, analysis, & aggregation
• Dynamic schema handles any data type
• MongoDB provides all these features with low TCO
• 10gen can support you with everything discussed here
38.
For More Information
• MongoDB Downloads: www.mongodb.org/download
• Free Online Training: education.10gen.com
• Webinars and Events: www.10gen.com/events
• White Papers: www.10gen.com/white-papers
• Customer Case Studies: www.10gen.com/customers
• Presentations: www.10gen.com/presentations
• Documentation: docs.mongodb.org
• Additional Info: info@10gen.com
39. How Capital Markets Firms Use MongoDB as a Tick Database
Matt Kalan, Sr. Solution Architect
Email: Matt.kalan@10gen.com
Twitter: @matthewkalan
Editor's Notes
Mention tick databases
JSON document: contains key-value pairs of different types; values can also be arrays and other documents.
Because MongoDB updates a single document atomically, we can be sure the totals and the list of voters stay in sync.
comments is an array of JSON documents. We can query by fields inside embedded documents as well as by array members.
Secondary indexes, compound indexes, multikey indexes. Why is it important to have all of the document together? Data locality.
Fewer reads because the data is stored together; memory-mapped files, with caching handled by the OS, naturally leave the most frequently accessed data in RAM (have enough RAM to fit the indexes and the working data set for best performance); horizontal scaling is built into the product by design from the start.
Full deployment: run as many mongos processes as you have app servers (for example). The config servers are small but hold the critical metadata about which shards hold which ranges of data.
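The note about config servers describes range-based routing: the cluster metadata maps contiguous ranges of the shard key to shards, and a mongos uses that table to send a query only to the shards whose ranges it touches. The sketch below is a simplified model of that lookup, not MongoDB's implementation. The chunk table, the use of `symbol` strings as the shard key, and the `shardFor` / `shardsForRange` names are all hypothetical.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
// null stands in for MongoDB's MinKey / MaxKey (unbounded ends).
const chunks = [
  { min: null, max: "DIS", shard: "shard0" },
  { min: "DIS", max: "MSFT", shard: "shard1" },
  { min: "MSFT", max: null, shard: "shard2" },
];

// Route one shard-key value: binary search for the last chunk whose min <= key.
function shardFor(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // mid >= 1, so chunks[mid].min is not null
    if (key >= chunks[mid].min) lo = mid;
    else hi = mid - 1;
  }
  return chunks[lo].shard;
}

// Route a range query [from, to): collect the shards whose chunks overlap it.
function shardsForRange(from, to) {
  const shards = new Set();
  for (const c of chunks) {
    const startsBeforeEnd = c.min === null || c.min < to;
    const endsAfterStart = c.max === null || c.max > from;
    if (startsBeforeEnd && endsAfterStart) shards.add(c.shard);
  }
  return [...shards];
}

// An $in query on the shard key is sent only to the shards owning those values.
console.log(["DIS", "VIA", "CBS"].map(shardFor));
console.log(shardsForRange("E", "F"));
```

A query that does not include the shard key cannot be narrowed this way and has to be sent to every shard.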