MongoDB Tick Data Presentation

  • MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more
    ----- Meeting Notes (11/02/2014 12:00) -----
  • The dotted line is the natural boundary of what is possible today. E.g., ORCL lives far out on the right and does things NoSQL vendors will never do. These things come at the expense of some degree of scale and performance.
    NoSQL was born out of wanting greater scalability and performance, but we think it overreacted by giving up too much. E.g., caching layers give up many things; key-value stores are very fast but give up a rich data model and a rich query model.
    MongoDB gives up only some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (about 80%) with much better scalability and performance.
    Start with an RDBMS and ask what we could do to make it scale: take out complex transactions and joins. How? Change the data model. >> segue to data model section.
    May need to revise the graphic: either remove the line, or put all points on the line.
    To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how the data relates. Another, more intuitive way is to use a document data model. It is more intuitive because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, Node.js, etc.
    Document data model is good segue to next section >> Data Model
  • Makes MongoDB a Hadoop-enabled file system
    Read and write to live data, in-place
    Copy data between Hadoop and MongoDB
    Uses MongoDB indexes to filter data
    Full support for data processing
    Hive
    MapReduce
    Pig
    Streaming
  • Good for regulatory reporting, e.g. KYC
  • MongoDB Tick Data Presentation

    1. MongoDB as a Tick Store
    2. MongoDB World, New York City, June 23-25 (#MongoDBWorld). See what’s next in MongoDB, including: • MongoDB 2.6 • Sharding • Replication • Aggregation. http://world.mongodb.com. Save $200 with discount code THANKYOU.
    3. Agenda • What is MongoDB: The Company, The Product • MongoDB for Tick Data • Case Study
    4. MongoDB Overview • 350+ employees • 1,000+ customers • Over $231 million in funding • 13 offices around the world
    5. Global Community • 7,000,000+ MongoDB downloads • 150,000+ online education registrants • 35,000+ MongoDB Management Service (MMS) users • 30,000+ MongoDB User Group members • 20,000+ MongoDB Days attendees
    6. Agenda • What is MongoDB: The Company, The Product • MongoDB for Tick Data • Case Study
    7. MongoDB: a NoSQL, document-based database designed to build today's applications. • Fast to build • Quick to adapt • Easy to scale • Lessons learned from 40 years of RDBMS
    8. Relational Model
        EmpID | Name | Dept | Title | Manager | Payband
          9950 | Dunham, Justin | 500 | 1500 | 6531 | C
        DeptID | Department
          500 | Marketing
        TitleID | Title
          1500 | Product Manager
        EmpBenPlanID | EmpFK | PlanFK
          1 | 9950 | 100
          2 | 9950 | 200
        PlanID | BenFK | Plan
          100 | 1 | PPO Plus
          200 | 2 | Standard
        BenID | Benefit
          1 | Health
          2 | Dental
    9. Document Model
        EmpID | Name | Dept | Title | Manager | Payband | Benefits
          9950 | Dunham, Justin | Marketing | Product Manager | 6531 | C | Health: PPO Plus; Dental: Standard
        EmpBenPlanID | EmpFK | PlanFK
          1 | 9950 | 100
          2 | 9950 | 200
        PlanID | BenFK | Plan
          100 | Health | PPO Plus
          200 | Dental | Standard
    10. MongoDB - Agility: Dynamic Schemas
        V 1.0: EmpID 9950 | Name Dunham, Justin | Dept Marketing | Title Product Manager | Manager 6531 | Payband C | Benefits: Health PPO Plus, Dental Standard
        V 1.1: EmpID 9952 | Name Joe White | Title CEO | Payband E | Bonus 20,000
        V 2.0: EmpID 9531 | Name Nearey, Graham | Dept Marketing | Title Director | Manager 9952 | Payband D | Shares 5000
    11. MongoDB - Usability
        Shell: command-line shell for interacting directly with the database
          > db.collection.insert({product: "MongoDB", type: "Document Database"})
          > db.collection.findOne()
          { "_id" : ObjectId("5106c1c2fc629bfe52792e86"), "product" : "MongoDB", "type" : "Document Database" }
        Drivers: drivers for most popular programming languages and frameworks (Java, Python, Perl, Ruby, Haskell, JavaScript)
    12. MongoDB - Utility
        • Complex indexed queries, e.g. Age > 65 AND Male AND living near Lyon
        • Aggregation, e.g. profit margin by age band: 1-17: 0 • 18-35: 20 • 36-50: 80 • 51-65: 50 • 66+: 5
    13. MongoDB - Scalability • High Availability • Auto Sharding • Enterprise Monitoring • Grid file storage
    14. Options for building an operational database • Column Family • Key/Value Store • Relational • Document Store
    15. MongoDB & Hadoop (MongoDB Connector for Hadoop)
        Operational (MongoDB): • Online, real-time • High concurrency & HA • Live analytics
        Post processing (Hadoop): • Multi-source analytics • Interactive & batch • Data lake
    16. Agenda • What is MongoDB: The Company, The Product • MongoDB for Tick Data • Case Study
    17. Tick Data – Why MongoDB?
        • Flexible Data Model – Easy Onboarding
        • Flexible Querying and Indexing – Primary, Secondary & Index Intersection
        • Aggregation Framework – Native to MongoDB
        • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object”
        • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
        • High Throughput & Linear Scalability
    18. Flexible Data Model: Easy Onboarding – e.g. Equities
        { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), bidPrice: 55.37, offerPrice: 55.58, bidQuantity: 500, offerQuantity: 700 }
        > db.ticks.find( {symbol: "DIS", bidPrice: {$gt: 55.36} } )
    19. Flexible Data Model: Easy Onboarding – e.g. Depth of Book
        { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), bidPrices: [55.37, 55.36, 55.35], offerPrices: [55.58, 55.59, 55.60], bidQuantities: [500, 1000, 2000], offerQuantities: [1000, 2000, 3000] }
        > db.ticks.find( {bidPrices: {$gt: 55.36} } )
    20. Flexible Data Model: Easy Onboarding – e.g. News
        { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), title: "Disney Earnings…", body: "Walt Disney Company reported…", tags: ["earnings", "media", "walt disney"] }
    21. Flexible Data Model: Easy Onboarding – e.g. Social Networking
        { _id : ObjectId("4e2e3f92268cdda473b628f6"), timestamp: ISODate("2013-02-15 10:00"), twitterHandle: "jdoe", tweet: "Heard @DisneyPictures is releasing…", usernamesIncluded: ["DisneyPictures"], hashTags: ["movierumors", "disney"] }
    22. Tick Data – Why MongoDB?
        • Flexible Data Model – Easy Onboarding
        • Flexible Querying and Indexing – Primary, Secondary & Index Intersection
        • Aggregation Framework & Map-Reduce – Native to MongoDB
        • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object”
        • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
        • High Throughput & Linear Scalability
    23. Architecture for Querying Data • Higher Latency Trading Applications • Backtesting Applications • Research & Analysis Applications
    24. Flexible Querying and Indexing: Index any field [or arrays]
        // Compound indexes
        > db.ticks.ensureIndex( {symbol: 1, timestamp: 1} )
        // Index on arrays (multikey index)
        > db.ticks.ensureIndex( {bidPrices: -1} )
        // Index on a nested field at any depth (dot notation)
        > db.ticks.ensureIndex( {"bids.price": 1} )
        // Full text search
        > db.ticks.ensureIndex( {tweet: "text"} )
    25. Flexible Querying and Indexing: Rich Query Language
        // Ticks for last month (January 2013) for media companies.
        // Both bounds go in one timestamp document; a repeated "timestamp" key would overwrite the first.
        > db.ticks.find({ symbol: {$in: ["DIS", "VIA", "CBS"]}, timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")} })
        // Ticks when Disney's bid breached 55.50 this month
        > db.ticks.find({ symbol: "DIS", bidPrice: {$gt: 55.50}, timestamp: {$gte: new ISODate("2013-02-01")} })
    26. Tick Data – Why MongoDB?
        • Flexible Data Model – Easy Onboarding
        • Flexible Querying and Indexing – Primary, Secondary & Index Intersection
        • Aggregation Framework & Map-Reduce – Native to MongoDB
        • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object”
        • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
        • High Throughput & Linear Scalability
    27. Aggregation Framework: Parallel execution across cluster
        // Aggregate minute bars for Disney for February 2013.
        // Bars are built from bidPrice, since the tick documents above carry no "price" field.
        db.ticks.aggregate([
          { $match: { symbol: "DIS", timestamp: { $gte: new ISODate("2013-02-01"), $lt: new ISODate("2013-03-01") } } },
          { $sort: { timestamp: 1 } },  // $first/$last below depend on this order
          { $project: { year: { $year: "$timestamp" }, month: { $month: "$timestamp" }, day: { $dayOfMonth: "$timestamp" },
                        hour: { $hour: "$timestamp" }, minute: { $minute: "$timestamp" }, price: "$bidPrice" } },
          { $group: { _id: { year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute" },
                      open: { $first: "$price" }, high: { $max: "$price" }, low: { $min: "$price" }, close: { $last: "$price" } } },
          { $sort: { _id: 1 } }
        ])
    28. Tick Data – Why MongoDB?
        • Flexible Data Model – Easy Onboarding
        • Flexible Querying and Indexing – Primary, Secondary & Index Intersection
        • Aggregation Framework & Map-Reduce – Native to MongoDB
        • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object”
        • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
    29. Pre-aggregation pattern: Real-time and continuous state
        All Ticks collection:
          { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), bidPrices: [55.37, 55.36, 55.35], … }
          { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 … }
        Pre-aggregated State:
          { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", Daily_high: 66.1, Daily_low: 57.1, Daily_volume: 100222 }
    30. Tick Data – Why MongoDB?
        • Flexible Data Model – Easy Onboarding
        • Flexible Querying and Indexing – Primary, Secondary & Index Intersection
        • Aggregation Framework & Map-Reduce – Native to MongoDB
        • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object”
        • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
    31. Process Data in Hadoop
        • MongoDB’s Hadoop Connector
        • Supports Map/Reduce, Streaming, Pig
        • MongoDB as input/output storage for Hadoop jobs – no need to go through HDFS
        • Leverage the power of the Hadoop ecosystem against operational data in MongoDB
    32. Tick Data – Why MongoDB?
        • Flexible Data Model – Easy Onboarding
        • Flexible Querying and Indexing – Primary, Secondary & Index Intersection
        • Aggregation Framework & Map-Reduce – Native to MongoDB
        • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object”
        • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
        • High Throughput & High Scalability
    33. Why MongoDB Is Fast and Scalable • Better data locality (relational vs. MongoDB) • In-memory caching • Auto-sharding: read/write scaling
    34. Agenda • What is MongoDB: The Company, The Product • MongoDB for Tick Data • Case Study
    35. Case Study: Easy On-boarding of all Financial Data
        Problem
        • Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms such as Bloomberg and Reuters.
          Shapes: time series; news, events, sentiment
          Sizes: 1MB daily price data; 1GB x 1000s data matrices; 40GB 1-minute data; 30TB tick data; options data even bigger
        • On-boarding can take weeks in a relational model with complex schema designs and ETL; an FX option can be an 80+ table schema.
        • Relational technology is a scale-up architecture and did not meet AHL's performance requirements.
        Why MongoDB
        • Dynamic schema: can on-board data of any shape or size almost instantly, without going through a typical "ETL" lifecycle
        • Performance: quant researchers want data rendered in <1s for up to 20 years of historical data for back-testing trading strategies
        • Replication: a team of 40 quant researchers relies on this system being up
        • Sharding: can scale seamlessly and accommodate data of any shape and size
    36. Easy On-boarding: Results
        Low latency:
        • 1xDay data: 4ms for 10,000 rows (vs. 2,210ms from SQL)
        • OneMinute / Tick data: 1s for 3.5M rows in Python (vs. 15s – 40s+ from OtherTick); 1s for 15M rows in Java
        Parallel access:
        • Cluster with 256+ concurrent data accesses
        • Consistent throughput – little load on the Mongo server
        Efficient:
        • 10-15x reduction in network load
        • Negligible decompression cost (lz4: 1.8Gb/s)
    39. James (AHL) Presentation Links
        • Slides: http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform
        • YouTube: James Blackburn - Python and MongoDB as a Platform for Financial Market Data
    40. Q&A
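Slides 8 and 9 contrast a normalized employee schema with a single embedded document. The sketch below shows that transformation in plain JavaScript. It resolves each foreign key once, at write time, so that a read becomes one document lookup instead of a multi-table join. The rows copy the slide's values. The sub-document field names `benefit` and `plan` and the function `toEmployeeDocument` are hypothetical illustrations, not part of the deck.

```javascript
// Rows mirroring the relational tables on slide 8.
const employees = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: 500, Title: 1500, Manager: 6531, Payband: "C" },
];
const departments = [{ DeptID: 500, Department: "Marketing" }];
const titles = [{ TitleID: 1500, Title: "Product Manager" }];
const benefits = [
  { BenID: 1, Benefit: "Health" },
  { BenID: 2, Benefit: "Dental" },
];
const plans = [
  { PlanID: 100, BenFK: 1, Plan: "PPO Plus" },
  { PlanID: 200, BenFK: 2, Plan: "Standard" },
];
const empBenPlans = [
  { EmpBenPlanID: 1, EmpFK: 9950, PlanFK: 100 },
  { EmpBenPlanID: 2, EmpFK: 9950, PlanFK: 200 },
];

// Hypothetical helper: resolve every foreign key once and embed the results,
// producing the single document shape of slide 9.
function toEmployeeDocument(emp) {
  const byKey = (rows, key, value) => rows.find((r) => r[key] === value);
  const benefitList = empBenPlans
    .filter((link) => link.EmpFK === emp.EmpID)
    .map((link) => {
      const plan = byKey(plans, "PlanID", link.PlanFK);
      const benefit = byKey(benefits, "BenID", plan.BenFK);
      return { benefit: benefit.Benefit, plan: plan.Plan };
    });
  return {
    EmpID: emp.EmpID,
    Name: emp.Name,
    Dept: byKey(departments, "DeptID", emp.Dept).Department,
    Title: byKey(titles, "TitleID", emp.Title).Title,
    Manager: emp.Manager,
    Payband: emp.Payband,
    Benefits: benefitList, // embedded array replaces the EmpBenPlan join table
  };
}

console.log(JSON.stringify(toEmployeeDocument(employees[0]), null, 2));
```

The trade-off is the one the speaker notes describe. Denormalizing into one document removes the cross-table coordination that makes horizontal scaling hard. In exchange, the department or title name is copied into each employee document.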
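Slide 10 shows V 1.0, V 1.1 and V 2.0 employee records living side by side, each with a different set of fields. The sketch below is a minimal illustration, assuming the documents look like the slide's records. Application code reads optional fields with defaults instead of depending on a migration that back-fills every record. The function `summarize` and its output shape are hypothetical.

```javascript
// Documents shaped like the three schema versions on slide 10.
const staff = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: "Marketing", Title: "Product Manager",
    Manager: 6531, Payband: "C",
    Benefits: [{ benefit: "Health", plan: "PPO Plus" }, { benefit: "Dental", plan: "Standard" }] },
  { EmpID: 9952, Name: "Joe White", Title: "CEO", Payband: "E", Bonus: 20000 },
  { EmpID: 9531, Name: "Nearey, Graham", Dept: "Marketing", Title: "Director",
    Manager: 9952, Payband: "D", Shares: 5000 },
];

// Hypothetical reader: tolerate fields that only some versions carry.
function summarize(emp) {
  return {
    id: emp.EmpID,
    dept: emp.Dept ?? "(none)",                 // V 1.1 has no Dept
    bonus: emp.Bonus ?? 0,                      // only V 1.1 has Bonus
    shares: emp.Shares ?? 0,                    // only V 2.0 has Shares
    benefitCount: (emp.Benefits ?? []).length,  // only V 1.0 embeds Benefits
  };
}

staff.map(summarize).forEach((s) => console.log(s));
```

On the server side, the same idea applies to queries. In the shell, `db.employees.find({Bonus: {$exists: true}})` selects only the documents that carry a `Bonus` field. The collection name `employees` is assumed here.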
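The depth-of-book query on slide 19, `{bidPrices: {$gt: 55.36}}`, relies on MongoDB's array semantics: a condition on an array field matches if any element satisfies it. The sketch below is a simplified local model of that rule. It does not cover MongoDB's type ordering or nested paths, and `matchesGt` and `matchesRange` are hypothetical names. It also shows why range conditions on price levels usually need `$elemMatch`. Without it, each bound may be satisfied by a different element.

```javascript
// Simplified model: {field: {$gt: value}} on an array matches if ANY element matches.
function matchesGt(doc, field, value) {
  const stored = doc[field];
  if (Array.isArray(stored)) {
    return stored.some((element) => element > value);
  }
  return stored > value; // a missing field (undefined) never matches
}

// Without $elemMatch, each bound of {$gt: lo, $lt: hi} may be met by a different
// element; with $elemMatch a single element must satisfy both bounds.
function matchesRange(values, lo, hi, elemMatch) {
  if (elemMatch) return values.some((v) => v > lo && v < hi);
  return values.some((v) => v > lo) && values.some((v) => v < hi);
}

const book = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.6],
};

console.log(matchesGt(book, "bidPrices", 55.36)); // true: 55.37 > 55.36
console.log(matchesGt(book, "bidPrices", 55.37)); // false: no level above 55.37
// No bid level lies strictly between 55.355 and 55.356, yet the plain range matches:
console.log(matchesRange(book.bidPrices, 55.355, 55.356, false)); // true
console.log(matchesRange(book.bidPrices, 55.355, 55.356, true));  // false
```

In the shell, the strict form is `db.ticks.find({bidPrices: {$elemMatch: {$gt: 55.355, $lt: 55.356}}})`.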
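Slide 24 creates the compound index `{symbol: 1, timestamp: 1}`. A rough rule of thumb is that a compound index can narrow a query on the leading fields of its key pattern. This index therefore helps queries on `symbol`, or on `symbol` plus `timestamp`, but it cannot bound a query on `timestamp` alone. The sketch below encodes only that prefix rule. It is an approximation: real query planning also considers sort support, index intersection and plan competition. `usableIndexPrefix` is a hypothetical helper.

```javascript
// Rough prefix rule: walk the key pattern in order and stop at the first
// field the query does not constrain; the fields before it can bound the scan.
function usableIndexPrefix(keyPattern, queryFields) {
  const prefix = [];
  for (const field of Object.keys(keyPattern)) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const ticksIndex = { symbol: 1, timestamp: 1 };
console.log(usableIndexPrefix(ticksIndex, ["symbol", "timestamp"])); // [ 'symbol', 'timestamp' ]
console.log(usableIndexPrefix(ticksIndex, ["symbol", "bidPrice"]));  // [ 'symbol' ]
console.log(usableIndexPrefix(ticksIndex, ["timestamp"]));           // []
```

This is why the symbol-first ordering suits the queries on slides 25 and 27. Both always fix `symbol` and then constrain `timestamp`.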
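As originally written, the first query on slide 25 repeats the `timestamp` key inside one object literal. Shell queries are JavaScript objects, so only the last value survives and the lower bound is silently dropped. The sketch below shows the effect in plain JavaScript. It then builds a half-open month range that puts both bounds in one operator document. `monthRange` is a hypothetical helper, and it assumes UTC month boundaries.

```javascript
// A repeated key in an object literal keeps only the LAST value.
const broken = {
  symbol: { $in: ["DIS", "VIA", "CBS"] },
  timestamp: { $gt: new Date("2013-01-01") },
  timestamp: { $lte: new Date("2013-01-31") }, // overwrites the line above
};
console.log(Object.keys(broken.timestamp)); // [ '$lte' ]

// Hypothetical helper: half-open UTC range [first day of month, first day of next month).
// month is 1-12; Date.UTC rolls month 12 over into January of the next year.
function monthRange(year, month) {
  return {
    $gte: new Date(Date.UTC(year, month - 1, 1)),
    $lt: new Date(Date.UTC(year, month, 1)),
  };
}

const fixed = {
  symbol: { $in: ["DIS", "VIA", "CBS"] },
  timestamp: monthRange(2013, 1), // both bounds in ONE operator document
};
console.log(fixed.timestamp.$gte.toISOString(), fixed.timestamp.$lt.toISOString());
```

The half-open form also picks up ticks after midnight on January 31, which `$lte: ISODate("2013-01-31")` would miss. An object like `fixed` can be passed to `db.ticks.find(...)` as-is.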
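The minute-bar pipeline on slide 27 sorts ticks by time and groups them by UTC minute. It then takes `$first`, `$max`, `$min` and `$last` of the price. The sketch below is an in-memory equivalent that can serve as a reference when checking bar output. It assumes each tick carries a `timestamp` Date and a numeric `price`. `minuteBars` is a hypothetical name, and the sample prices are made-up inputs.

```javascript
// In-memory OHLC bars: sort by time, bucket by UTC minute, then first/max/min/last.
function minuteBars(ticks) {
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map(); // insertion order = chronological order of minutes
  for (const { timestamp, price } of sorted) {
    const minute = new Date(timestamp);
    minute.setUTCSeconds(0, 0); // truncate to the minute, like $year..$minute in UTC
    const key = minute.toISOString();
    const bar = bars.get(key);
    if (!bar) {
      bars.set(key, { minute: key, open: price, high: price, low: price, close: price });
    } else {
      bar.high = Math.max(bar.high, price);
      bar.low = Math.min(bar.low, price);
      bar.close = price; // latest tick so far, because input is time-sorted
    }
  }
  return [...bars.values()];
}

// Made-up sample ticks, deliberately out of order.
const ticks = [
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.37 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.41 },
  { timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.30 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.45 },
];
console.log(minuteBars(ticks));
```

The explicit sort matters for the same reason it does in the pipeline: `$first` and `$last` (here `open` and `close`) are only meaningful when the input arrives in time order.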
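Slide 29 keeps a pre-aggregated state document (`Daily_high`, `Daily_low`, `Daily_volume`) next to the raw ticks. One way to keep that snapshot current is to fold each incoming tick into the state with a single upsert using the `$max`, `$min` and `$inc` update operators, which are available since MongoDB 2.6. The sketch below builds that update and applies it to a local stand-in that mimics the operators' semantics, for illustration only. The `dailyState` collection, the `day` key and the tick fields `price` and `quantity` are all hypothetical. The quantities are made-up sample values.

```javascript
// Build the filter/update/options for one tick (hypothetical dailyState collection).
function stateUpdateFor(tick) {
  const day = tick.timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return {
    filter: { symbol: tick.symbol, day },
    update: {
      $max: { Daily_high: tick.price },
      $min: { Daily_low: tick.price },
      $inc: { Daily_volume: tick.quantity },
    },
    options: { upsert: true },
  };
}

// Local stand-in for the server: an upsert starts from the filter's equality
// fields; $max/$min set a field when it is missing or beaten; $inc starts at 0.
function applyUpsert(store, { filter, update }) {
  const key = JSON.stringify(filter);
  const doc = store.get(key) ?? { ...filter };
  for (const [field, v] of Object.entries(update.$max)) {
    if (doc[field] === undefined || v > doc[field]) doc[field] = v;
  }
  for (const [field, v] of Object.entries(update.$min)) {
    if (doc[field] === undefined || v < doc[field]) doc[field] = v;
  }
  for (const [field, v] of Object.entries(update.$inc)) {
    doc[field] = (doc[field] ?? 0) + v;
  }
  store.set(key, doc);
  return doc;
}

const store = new Map();
const incoming = [
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:00Z"), price: 57.1, quantity: 500 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T14:30:00Z"), price: 66.1, quantity: 700 },
];
incoming.forEach((t) => applyUpsert(store, stateUpdateFor(t)));
console.log([...store.values()]);
```

Against a real server, the same three objects would go to `db.dailyState.update(filter, update, {upsert: true})` in the 2.6-era shell, or to `updateOne` in later shells and drivers. The server applies the whole update to one document atomically.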
