Using MongoDB As a Tick Database

Learn how you can enjoy the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data. This presentation illustrates how MongoDB can easily and quickly store variable data formats, like top and depth of book, multiple asset classes, and even news and social networking feeds. It explores aggregating and analyzing tick data in real time for automated trading or in batch for research and analysis, and how auto-sharding enables MongoDB to scale on commodity hardware to satisfy unlimited storage and performance requirements.


  1. How Capital Markets Firms Use MongoDB as a Tick Database. Matt Kalan, Sr. Solution Architect, MongoDB
  2. Agenda • MongoDB One Slide Overview • FS Use Cases • Writing/Capturing Market Data • Reading/Analyzing Market Data • Performance, Scalability, & High Availability • Q&A
  3. MongoDB Technical Benefits • Horizontally Scalable: Sharding • Agile & Flexible • High Performance: Indexes, RAM • Highly Available: Replica Sets
     Application:
     { name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: [ { home: 1234567890 }, { mobile: 1234568138 } ] }
     db.cust.insert({…})
     db.cust.find({ name: "John Smith" })
  4. Most Common FS Use Cases 1. Tick Data Capture & Analysis 2. Reference Data Management 3. Risk Analysis & Reporting 4. Trade Repository 5. Portfolio Reporting
  5. Writing and Capturing Tick Data
  6. Tick Data Capture & Analysis Requirements • Capture real-time market data (multi-asset, top of book, depth of book, even news) • Load historical data • Aggregate data into bars, daily, monthly intervals • Enable queries & analysis on raw ticks or aggregates • Drive backtesting or automated signals
  7. Tick Data Capture & Analysis – Why MongoDB? • High throughput => can capture real-time feeds for all products/asset classes needed • High scalability => all data and depth for all historical time periods can be captured • Flexible & range-based indexing => fast querying on time ranges and any fields • Aggregation Framework => can shape raw data into aggregates (e.g. ticks to bars) • Map-reduce capability (native MR or Hadoop Connector) => batch analysis looking for patterns and opportunities • Easy to use => native language drivers and JSON expressions that you can
  8. High Level Trading Architecture (diagram labels) • Exchanges/Markets/Brokers • Feed Handler • Capturing Application • Low Latency Applications • Higher Latency Trading Applications • Backtesting and Analysis Applications • Market Data • Cached Static & Aggregated Data • News & social networking sources • Orders • Trades/metrics
  9. High Level Trading Architecture (diagram labels) • Exchanges/Markets/Brokers • Feed Handler • Capturing Application • Low Latency Applications • Higher Latency Trading Applications • Backtesting and Analysis Applications • Market Data • Cached Static & Aggregated Data • News & social networking sources • Orders • Trades/metrics
     Data Types • Top of book • Depth of book • Multi-asset • Derivatives (e.g. strips) • News (text, video) • Social Networking
  10. Top of Book [e.g. equities]
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        symbol : "DIS",
        timestamp: ISODate("2013-02-15 10:00"),
        bidPrice: 55.37, offerPrice: 55.58,
        bidQuantity: 500, offerQuantity: 700 }
      > db.ticks.find( {symbol: "DIS", bidPrice: {$gt: 55.36} } )
  11. Depth of Book
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        symbol : "DIS",
        timestamp: ISODate("2013-02-15 10:00"),
        bidPrices: [55.37, 55.36, 55.35],
        offerPrices: [55.58, 55.59, 55.60],
        bidQuantities: [500, 1000, 2000],
        offerQuantities: [1000, 2000, 3000] }
      > db.ticks.find( {bidPrices: {$gt: 55.36} } )
  12. Or However Your App Uses It
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        symbol : "DIS",
        timestamp: ISODate("2013-02-15 10:00"),
        bids: [ {price: 55.37, amount: 500}, {price: 55.36, amount: 1000}, {price: 55.35, amount: 2000} ],
        offers: [ {price: 55.58, amount: 1000}, {price: 55.59, amount: 2000}, {price: 55.60, amount: 3000} ] }
      > db.ticks.find( {"bids.price": {$gt: 55.36} } )
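
The parallel-array layout (slide 11) and the nested layout (slide 12) carry the same information, so an application can convert between them. The following is a minimal sketch in plain JavaScript, not code from the presentation; the helper name `toNestedBook` is hypothetical, and it assumes the price and quantity arrays have equal length.

```javascript
// Hypothetical helper (not from the slides): convert the parallel-array
// depth-of-book layout into the nested {price, amount} layout.
function toNestedBook(tick) {
  // Pair each price with the quantity at the same array position.
  const zip = (prices, quantities) =>
    prices.map((price, i) => ({ price, amount: quantities[i] }));
  return {
    symbol: tick.symbol,
    timestamp: tick.timestamp,
    bids: zip(tick.bidPrices, tick.bidQuantities),
    offers: zip(tick.offerPrices, tick.offerQuantities),
  };
}

// Sample document using the values shown on the depth-of-book slide.
const parallelTick = {
  symbol: "DIS",
  timestamp: new Date("2013-02-15T10:00:00Z"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000],
};

const nestedTick = toNestedBook(parallelTick);
console.log(JSON.stringify(nestedTick.bids));
```

Which layout is preferable depends on the queries: the nested form keeps each level's price and amount together, while the parallel arrays are more compact.
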
  13. Synthetic Spreads
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        symbol : "DIS",
        timestamp: ISODate("2013-02-15 10:00"),
        spreadPrice: 0.58,
        leg1: {symbol: "CLM13", price: 97.34},
        leg2: {symbol: "CLK13", price: 96.92} }
      > db.ticks.find( { "leg1.symbol": "CLM13", "leg2.symbol": "CLK13", spreadPrice: {$gt: 0.50} } )
  14. News
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        symbol : "DIS",
        timestamp: ISODate("2013-02-15 10:00"),
        title: "Disney Earnings…",
        body: "Walt Disney Company reported…",
        tags: ["earnings", "media", "walt disney"] }
  15. Social Networking
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        timestamp: ISODate("2013-02-15 10:00"),
        twitterHandle: "jdoe",
        tweet: "Heard @DisneyPictures is releasing…",
        usernamesIncluded: ["DisneyPictures"],
        hashTags: ["movierumors", "disney"] }
  16. Aggregates (bars, daily, etc)
      { _id : ObjectId("4e2e3f92268cdda473b628f6"),
        symbol : "DIS",
        openTS: ISODate("2013-02-15 10:00"),
        closeTS: ISODate("2013-02-15 10:05"),
        open: 55.36, high: 55.80, low: 55.20, close: 55.70 }
  17. Querying/Analyzing Tick Data
  18. Architecture for Querying Data Higher Latency Trading Applications Backtesting Applications • Ticks • Bars • Other analysis Research & Analysis Applications
  19. Index Any Fields: Arrays, Nested, etc.
      // Compound indexes
      > db.ticks.ensureIndex( {symbol: 1, timestamp: 1} )
      // Index on arrays
      > db.ticks.ensureIndex( {bidPrices: -1} )
      // Index on any depth
      > db.ticks.ensureIndex( {"bids.price": 1} )
      // Full text search
      > db.ticks.ensureIndex( {tweet: "text"} )
  20. Query for ticks by time; price threshold
      // Ticks for last month for media companies
      > db.ticks.find({ symbol: {$in: ["DIS", "VIA", "CBS"]},
                        timestamp: {$gte: ISODate("2013-01-01"), $lt: ISODate("2013-02-01")} })
      // Ticks when Disney's bid breached 55.50 this month
      > db.ticks.find({ symbol: "DIS", bidPrice: {$gt: 55.50}, timestamp: {$gte: ISODate("2013-02-01")} })
  21. Analyzing/Aggregating Options • Custom application code – Run your queries, compute your results • Aggregation framework – Declarative, pipeline-based approach • Native Map/Reduce in MongoDB – JavaScript functions distributed across cluster • Hadoop Connector – Offline batch processing/computation
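
The first option, custom application code, might look roughly like the sketch below: the application has already fetched ticks with a query and rolls them up into one-minute bars itself. This is an illustration in plain JavaScript rather than code from the presentation. The function name `toMinuteBars` and the sample data are hypothetical, and it assumes each tick carries a `timestamp` (a `Date`) and a numeric `price`.

```javascript
// Group ticks into one-minute open/high/low/close bars in application code.
// Assumption: each tick is { timestamp: Date, price: number }.
function toMinuteBars(ticks) {
  // Sort a copy by time so "open" and "close" are the first and last tick.
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map(); // minute start (ms since epoch) -> bar
  for (const { timestamp, price } of sorted) {
    const minuteStart = Math.floor(timestamp.getTime() / 60000) * 60000;
    const bar = bars.get(minuteStart);
    if (!bar) {
      bars.set(minuteStart, {
        openTS: new Date(minuteStart),
        open: price, high: price, low: price, close: price,
      });
    } else {
      bar.high = Math.max(bar.high, price);
      bar.low = Math.min(bar.low, price);
      bar.close = price;
    }
  }
  // A Map iterates in insertion order, so bars come out in time order.
  return [...bars.values()];
}

// Hypothetical sample ticks spanning two minutes.
const sampleTicks = [
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.36 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.80 },
  { timestamp: new Date("2013-02-15T10:00:59Z"), price: 55.70 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.20 },
];
const minuteBars = toMinuteBars(sampleTicks);
console.log(minuteBars);
```

The trade-off is that every raw tick has to travel to the application, whereas the aggregation framework and map-reduce options do the work inside the database.
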
  22. Aggregate into min bars
      // Aggregate minute bars for Disney for February
      db.ticks.aggregate([
        { $match: { symbol: "DIS", timestamp: {$gt: ISODate("2013-02-01")} } },
        { $project: { year: {$year: "$timestamp"}, month: {$month: "$timestamp"}, day: {$dayOfMonth: "$timestamp"},
                      hour: {$hour: "$timestamp"}, minute: {$minute: "$timestamp"}, second: {$second: "$timestamp"},
                      timestamp: 1, price: 1 } },
        { $sort: { timestamp: 1 } },
        { $group: { _id: {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
                    open: {$first: "$price"}, high: {$max: "$price"}, low: {$min: "$price"}, close: {$last: "$price"} } }
      ])
  23. Add Analysis on the Bars
      …
      // then count the number of down bars
        { $project: { downBar: {$lt: ["$close", "$open"]}, timestamp: 1, open: 1, high: 1, low: 1, close: 1 } },
        { $group: { _id: "$downBar", sum: {$sum: 1} } }
      ])
  24. MapReduce Example: Sum
      var mapFunction = function () { emit(this.symbol, this.bidPrice); }
      var reduceFunction = function (symbol, priceList) { return Array.sum(priceList); }
      > db.ticks.mapReduce( mapFunction, reduceFunction, {out: "tickSums"} )
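
To make the data flow of the sum example concrete, here is a small in-memory emulation in plain JavaScript. It is a sketch of the map-reduce idea only, not how MongoDB executes it: the helper `runMapReduce` and the sample ticks are hypothetical, `emit` is passed in as an argument instead of being a global, and the sum uses `reduce` because standard JavaScript has no `Array.sum`.

```javascript
// In-memory emulation of map-reduce: map emits (key, value) pairs,
// values are grouped by key, and reduce folds each group to one result.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Call map with `this` bound to the document, as in the slide's map function.
  for (const doc of docs) mapFn.call(doc, emit);
  const out = {};
  for (const [key, values] of groups) out[key] = reduceFn(key, values);
  return out;
}

const mapFunction = function (emit) { emit(this.symbol, this.bidPrice); };
const reduceFunction = (symbol, priceList) =>
  priceList.reduce((total, price) => total + price, 0);

// Hypothetical sample ticks.
const ticks = [
  { symbol: "DIS", bidPrice: 55.5 },
  { symbol: "CBS", bidPrice: 43.5 },
  { symbol: "DIS", bidPrice: 55.25 },
];

const tickSums = runMapReduce(ticks, mapFunction, reduceFunction);
console.log(tickSums);
```
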
  25. Process Data in Hadoop • MongoDB’s Hadoop Connector • Supports Map/Reduce, Streaming, Pig • MongoDB as input/output storage for Hadoop jobs – No need to go through HDFS • Leverage power of Hadoop ecosystem against operational data in MongoDB
  26. Performance, Scalability, and High Availability
  27. Why MongoDB Is Fast and Scalable • Better data locality (diagram: Relational, MongoDB) • In-Memory Caching • Auto-Sharding: Read/write scaling
  28. Auto-sharding for Horizontal Scale • Read/Write Scalability • One mongod • Key Range Symbol: A…Z
  29. Auto-sharding for Horizontal Scale • Read/Write Scalability • Two mongod • Key Range Symbol: A…J • Key Range Symbol: K…Z
  30. Sharding • Read/Write Scalability • Four mongod • Key Range Symbol: A…F • Key Range Symbol: G…J • Key Range Symbol: K…O • Key Range Symbol: P…Z
  31. Application • MongoS, MongoS, MongoS • Four shards, each with a Primary and two Secondaries • Key Range Symbol: A…F, Time • Key Range Symbol: G…J, Time • Key Range Symbol: K…O, Time • Key Range Symbol: P…Z, Time
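
The key-range idea in the sharding slides can be illustrated with a toy router that maps a symbol to the shard owning its range. This is a simplified sketch in plain JavaScript, not MongoDB's routing code: the shard names and the `shardForSymbol` function are hypothetical, it looks only at the first letter of the symbol, and it ignores the time component of the key. In a real cluster, chunk ranges are half-open intervals over the full shard key and are managed by the cluster itself.

```javascript
// Hypothetical chunk table mirroring the four symbol ranges on the slide.
const chunks = [
  { shard: "shard0", min: "A", max: "F" },
  { shard: "shard1", min: "G", max: "J" },
  { shard: "shard2", min: "K", max: "O" },
  { shard: "shard3", min: "P", max: "Z" },
];

// Route a symbol to a shard by the first letter of the symbol.
function shardForSymbol(symbol) {
  const first = symbol[0].toUpperCase();
  const chunk = chunks.find((c) => first >= c.min && first <= c.max);
  if (!chunk) throw new Error(`no chunk covers symbol ${symbol}`);
  return chunk.shard;
}

for (const symbol of ["DIS", "IBM", "MSFT", "VIA"]) {
  console.log(symbol, "->", shardForSymbol(symbol));
}
```

Because each symbol lives on exactly one shard, a query that filters on symbol can be sent to a single shard, while a query without the symbol has to be sent to all of them.
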
  32. Summary • MongoDB is high performance for tick data • Scales horizontally automatically by auto-sharding • Fast, flexible querying, analysis, & aggregation • Dynamic schema can handle any data types • MongoDB has all these features with low TCO • We can support you with anything discussed
  33. Questions?
  34. Thank You. Matt Kalan, Sr. Solution Architect, MongoDB
