MongoDB Tick Data Presentation

  • MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
    ----- Meeting Notes (11/02/2014 12:00) -----
  • Dotted line is the natural boundary of what is possible today. E.g., ORCL lives far out on the right and does things NoSQL vendors will never do. These things come at the expense of some degree of scale and performance.
    NoSQL was born out of wanting greater scalability and performance, but we think they overreacted by giving up some things. E.g., caching layers give up many things; key-value stores are super fast but give up the rich data model and rich query model.
    MongoDB tries to give up some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (80%) with much better scalability and performance.
    Start with an RDBMS and ask what we could do to scale: take out complex transactions and joins. How? Change the data model. >> segue to data model section.
    May need to revise the graphic – either remove the line or all points should be on the line.
    To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how data relates to one another. Another, more intuitive way is to use a document data model. It is more intuitive because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, Node.js, etc.
    The document data model is a good segue to the next section >> Data Model
  • Makes MongoDB a Hadoop-enabled file system
    Read and write to live data, in-place
    Copy data between Hadoop and MongoDB
    Uses MongoDB indexes to filter data
    Full support for data processing
    Hive
    MapReduce
    Pig
    Streaming
  • Good for regulatory reporting, e.g. KYC

Transcript

  • 1. MongoDB as a Tick Store
  • 2. MongoDB World New York City, June 23-25 #MongoDBWorld See what’s next in MongoDB including •MongoDB 2.6 •Sharding •Replication •Aggregation http://world.mongodb.com Save $200 with discount code THANKYOU
  • 3. 3 • What is MongoDB - The Company - The Product • MongoDB for Tick Data • Case Study Agenda
  • 4. 4 MongoDB Overview • 350+ employees • 1,000+ customers • Over $231 million in funding • 13 offices around the world
  • 5. 5 Global Community • 7,000,000+ MongoDB Downloads • 150,000+ Online Education Registrants • 35,000+ MongoDB Management Service (MMS) Users • 30,000+ MongoDB User Group Members • 20,000+ MongoDB Days Attendees
  • 6. 6 • What is MongoDB - The Company - The Product • MongoDB for Tick Data • Case Study Agenda
  • 7. 7 MongoDB. NoSQL document-based database. Designed to build today's applications. • Fast to build. • Quick to adapt. • Easy to scale. • Lessons learned from 40 years of RDBMS.
  • 8. 8 Relational Model
    EmpID | Name | Dept | Title | Manage | Payband
    9950 | Dunham, Justin | 500 | 1500 | 6531 | C
    EmpBenPlanID | EmpFK | PlanFK
    1 | 9950 | 100
    2 | 9950 | 200
    PlanID | BenFK | Plan
    100 | 1 | PPO Plus
    200 | 2 | Standard
    BenID | Benefit
    1 | Health
    2 | Dental
    DeptID | Department
    500 | Marketing
    TitleID | Title
    1500 | Product Manager
  • 9. 9 Document Model
    EmpID | Name | Dept | Title | Manage | Payband | Benefits
    9950 | Dunham, Justin | Marketing | Product Manager | 6531 | C | Health: PPO Plus; Dental: Standard
    Relational tables folded into the Benefits field:
    EmpBenPlanID | EmpFK | PlanFK
    1 | 9950 | 100
    2 | 9950 | 200
    PlanID | BenFK | Plan
    100 | Health | PPO Plus
    200 | Dental | Standard
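The step from the relational slide to the document slide can be sketched in plain JavaScript. This is only an illustration with no database involved: the table values are copied from the slides, while the array names and the `toDocument` helper are hypothetical, and the truncated `Manage` column is spelled `Manager` as on the Dynamic Schemas slide.

```javascript
// Rows copied from the Relational Model slide, held as plain arrays.
const employees = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: 500, Title: 1500, Manager: 6531, Payband: "C" }
];
const departments = [{ DeptID: 500, Department: "Marketing" }];
const titles = [{ TitleID: 1500, Title: "Product Manager" }];
const benefits = [{ BenID: 1, Benefit: "Health" }, { BenID: 2, Benefit: "Dental" }];
const plans = [
  { PlanID: 100, BenFK: 1, Plan: "PPO Plus" },
  { PlanID: 200, BenFK: 2, Plan: "Standard" }
];
const empBenPlans = [
  { EmpBenPlanID: 1, EmpFK: 9950, PlanFK: 100 },
  { EmpBenPlanID: 2, EmpFK: 9950, PlanFK: 200 }
];

// Hypothetical helper: resolve every foreign key once and embed the result,
// so that reading the employee later needs no joins.
function toDocument(emp) {
  const dept = departments.find(d => d.DeptID === emp.Dept);
  const title = titles.find(t => t.TitleID === emp.Title);
  const embeddedBenefits = empBenPlans
    .filter(link => link.EmpFK === emp.EmpID)
    .map(link => {
      const plan = plans.find(p => p.PlanID === link.PlanFK);
      const benefit = benefits.find(b => b.BenID === plan.BenFK);
      return { Benefit: benefit.Benefit, Plan: plan.Plan };
    });
  return {
    EmpID: emp.EmpID,
    Name: emp.Name,
    Dept: dept.Department,
    Title: title.Title,
    Manager: emp.Manager,
    Payband: emp.Payband,
    Benefits: embeddedBenefits
  };
}

const employeeDoc = toDocument(employees[0]);
console.log(JSON.stringify(employeeDoc, null, 2));
```

The six lookups happen once, at write time; the resulting object is what a single read of the document would return.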
  • 10. 10 MongoDB - Agility: Dynamic Schemas (V 1.0, V 1.1, V 2.0)
    EmpID: 9950 | Name: Dunham, Justin | Dept: Marketing | Title: Product Manager | Manager: 6531 | Payband: C | Benefits: Health PPO Plus, Dental Standard
    EmpID: 9952 | Name: Joe White | Title: CEO | Payband: E | Bonus: 20,000
    EmpID: 9531 | Name: Nearey, Graham | Dept: Marketing | Title: Director | Manager: 9952 | Payband: D | Shares: 5000
  • 11. 11 MongoDB - Usability
    Shell: command-line shell for interacting directly with the database
    > db.collection.insert({product: "MongoDB", type: "Document Database"})
    > db.collection.findOne()
    { "_id" : ObjectId("5106c1c2fc629bfe52792e86"), "product" : "MongoDB", "type" : "Document Database" }
    Drivers: drivers for most popular programming languages and frameworks (Java, Python, Perl, Ruby, Haskell, JavaScript)
  • 12. 12 MongoDB - Utility
    • Complex Indexed Queries, e.g. Age > 65 AND Male living near Lyon
    • Aggregation, e.g. Profit Margin by Age
      Age | Profit Margin
      1-17 | 0
      18-35 | 20
      36-50 | 80
      51-65 | 50
      66+ | 5
  • 13. 13 MongoDB - Scalability • High Availability • Auto Sharding • Enterprise Monitoring • Grid file storage
  • 14. 14 Options for building an Operational Database • Column Family • Key/Value Store • Relational • Document Store
  • 15. 15 MongoDB & Hadoop • Multi-source analytics • Interactive & Batch • Data lake • Online, Real-time • High concurrency & HA • Live analytics • Operational and Post Processing • MongoDB Connector for Hadoop
  • 16. 16 • What is MongoDB - The Company - The Product • MongoDB for Tick Data • Case Study Agenda
  • 17. 17 Tick Data – Why MongoDB? • Flexible Data Model – Easy Onboarding • Flexible Querying and Indexing – Primary, Secondary & Index Intersection • Aggregation Framework – Native to MongoDB • Pre-aggregation pattern – Continuous and up-to-date snapshot of "object" • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab • High Throughput & Linear Scalability
  • 18. 18 Flexible Data Model: Easy Onboarding – e.g. Equities
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), bidPrice: 55.37, offerPrice: 55.58, bidQuantity: 500, offerQuantity: 700 }
    > db.ticks.find( {symbol: "DIS", bidPrice: {$gt: 55.36} } )
  • 19. 19 Flexible Data Model: Easy Onboarding – e.g. Depth of Book
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), bidPrices: [55.37, 55.36, 55.35], offerPrices: [55.58, 55.59, 55.60], bidQuantities: [500, 1000, 2000], offerQuantities: [1000, 2000, 3000] }
    > db.ticks.find( {bidPrices: {$gt: 55.36} } )
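A comparison such as `{bidPrices: {$gt: 55.36}}` against an array field matches a document when at least one array element satisfies it. The plain JavaScript below is a rough in-memory stand-in for that rule, not MongoDB's implementation; `matchesAnyGreaterThan` is a hypothetical helper, and the sample documents reuse values from the Equities and Depth of Book slides.

```javascript
// Hypothetical stand-in for how {field: {$gt: threshold}} treats scalars and arrays:
// a scalar is compared directly, an array matches if any element passes.
function matchesAnyGreaterThan(doc, field, threshold) {
  const value = doc[field];
  const candidates = Array.isArray(value) ? value : [value];
  return candidates.some(v => typeof v === "number" && v > threshold);
}

const topOfBook = { symbol: "DIS", bidPrice: 55.37 };
const depthOfBook = { symbol: "DIS", bidPrices: [55.37, 55.36, 55.35] };

console.log(matchesAnyGreaterThan(topOfBook, "bidPrice", 55.36));    // true
console.log(matchesAnyGreaterThan(depthOfBook, "bidPrices", 55.36)); // true: 55.37 passes
console.log(matchesAnyGreaterThan(depthOfBook, "bidPrices", 55.40)); // false: no level passes
```

This is why the same query shape works for both the single-price and the multi-level documents.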
  • 20. 20 Flexible Data Model: Easy Onboarding – e.g. News
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), title: "Disney Earnings…", body: "Walt Disney Company reported…", tags: ["earnings", "media", "walt disney"] }
  • 21. 21 Flexible Data Model: Easy Onboarding – e.g. Social Networking
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), timestamp: ISODate("2013-02-15 10:00"), twitterHandle: "jdoe", tweet: "Heard @DisneyPictures is releasing…", usernamesIncluded: ["DisneyPictures"], hashTags: ["movierumors", "disney"] }
  • 22. 22 Tick Data – Why MongoDB? • Flexible Data Model – Easy Onboarding • Flexible Querying and Indexing – Primary, Secondary & Index Intersection • Aggregation Framework & Map-Reduce – Native to MongoDB • Pre-aggregation pattern – Continuous and up-to-date snapshot of "object" • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab • High Throughput & Linear Scalability
  • 23. 23 Architecture for Querying Data • Higher Latency Trading Applications • Backtesting Applications • Research & Analysis Applications
  • 24. 24 Flexible Querying and Indexing: Index any field [or arrays]
    // Compound indexes
    > db.ticks.ensureIndex({symbol: 1, timestamp: 1})
    // Index on arrays
    > db.ticks.ensureIndex({bidPrices: -1})
    // Index on any depth
    > db.ticks.ensureIndex({"bids.price": 1})
    // Full text search
    > db.ticks.ensureIndex({tweet: "text"})
  • 25. 25 Flexible Querying and Indexing: Rich Query Language
    // Ticks for last month for media companies
    > db.ticks.find({ symbol: {$in: ["DIS", "VIA", "CBS"]}, timestamp: {$gt: new ISODate("2013-01-01"), $lte: new ISODate("2013-01-31")} })
    // Ticks when Disney's bid breached 55.50 this month
    > db.ticks.find({ symbol: "DIS", bidPrice: {$gt: 55.50}, timestamp: {$gt: new ISODate("2013-02-01")} })
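A range on one field needs both bounds inside a single sub-document, because a JavaScript object literal keeps only the last value given for a repeated key. The sketch below shows that behaviour and a combined filter in plain JavaScript; `inJanuary` is a hypothetical in-memory check, native `Date` stands in for the shell's `ISODate`, and the sample ticks are made up.

```javascript
// A repeated key is not an error: the later value silently replaces the earlier one.
const lostLowerBound = {
  timestamp: { $gt: new Date("2013-01-01T00:00:00Z") },
  timestamp: { $lte: new Date("2013-01-31T00:00:00Z") }
};
console.log(Object.keys(lostLowerBound.timestamp)); // only "$lte" survives

// Both bounds in one sub-document, so neither is dropped.
const januaryFilter = {
  symbol: { $in: ["DIS", "VIA", "CBS"] },
  timestamp: {
    $gt: new Date("2013-01-01T00:00:00Z"),
    $lte: new Date("2013-01-31T00:00:00Z")
  }
};

// Hypothetical in-memory evaluation of the same filter, for illustration only.
function inJanuary(tick) {
  const range = januaryFilter.timestamp;
  return januaryFilter.symbol.$in.includes(tick.symbol) &&
    tick.timestamp > range.$gt &&
    tick.timestamp <= range.$lte;
}

const decemberTick = { symbol: "DIS", timestamp: new Date("2012-12-15T10:00:00Z") };
const januaryTick = { symbol: "DIS", timestamp: new Date("2013-01-15T10:00:00Z") };
console.log(inJanuary(decemberTick), inJanuary(januaryTick));
```

With only the upper bound left, the December tick would also have matched.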
  • 26. 26 Tick Data – Why MongoDB? • Flexible Data Model – Easy Onboarding • Flexible Querying and Indexing – Primary, Secondary & Index Intersection • Aggregation Framework & Map-Reduce – Native to MongoDB • Pre-aggregation pattern – Continuous and up-to-date snapshot of "object" • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab • High Throughput & Linear Scalability
  • 27. 27 Aggregation Framework: Parallel execution across cluster
    // Aggregate minute bars for Disney for February
    db.ticks.aggregate([
      // Keep only Disney ticks from 1 February onwards
      { $match: {symbol: "DIS", timestamp: {$gt: new ISODate("2013-02-01")}}},
      // Break the timestamp into its calendar parts
      { $project: {
          year: {$year: "$timestamp"}, month: {$month: "$timestamp"}, day: {$dayOfMonth: "$timestamp"},
          hour: {$hour: "$timestamp"}, minute: {$minute: "$timestamp"}, second: {$second: "$timestamp"},
          timestamp: 1, price: 1}},
      // Order by time so that $first and $last give open and close
      { $sort: {timestamp: 1}},
      // One group per minute
      { $group: {
          _id: {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
          open: {$first: "$price"}, high: {$max: "$price"}, low: {$min: "$price"}, close: {$last: "$price"}}}
    ])
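What the sort and group stages compute can be sketched on an in-memory array, without a database. This is a plain JavaScript illustration, not the aggregation framework itself: `minuteBars` is a hypothetical helper, the `price` field follows the slide, and the sample ticks are made-up values.

```javascript
// Group ticks into one open/high/low/close bar per minute.
// Sorting first mirrors the $sort stage, so the first and last tick
// seen in each minute are its open and close.
function minuteBars(ticks) {
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map();
  for (const tick of sorted) {
    // "YYYY-MM-DDTHH:MM" in UTC identifies the minute, like the $group _id.
    const minuteKey = tick.timestamp.toISOString().slice(0, 16);
    const bar = bars.get(minuteKey);
    if (bar === undefined) {
      bars.set(minuteKey, { open: tick.price, high: tick.price, low: tick.price, close: tick.price });
    } else {
      bar.high = Math.max(bar.high, tick.price);
      bar.low = Math.min(bar.low, tick.price);
      bar.close = tick.price;
    }
  }
  return bars;
}

// Made-up sample data, deliberately out of order.
const sampleTicks = [
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.41 },
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.37 },
  { timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.58 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.30 }
];

const bars = minuteBars(sampleTicks);
for (const [minute, bar] of bars) {
  console.log(minute, bar);
}
```

Without the sort, open and close would depend on arrival order rather than on time.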
  • 28. 28 Tick Data – Why MongoDB? • Flexible Data Model – Easy Onboarding • Flexible Querying and Indexing – Primary, Secondary & Index Intersection • Aggregation Framework & Map-Reduce – Native to MongoDB • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object” • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
  • 29. 29 Pre-aggregation pattern: Real-time and continuous state
    All Ticks Collection:
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 10:00"), bidPrices: [55.37, 55.36, 55.35], … }
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", timestamp: ISODate("2013-02-15 … }
    Pre-aggregated State:
    { _id : ObjectId("4e2e3f92268cdda473b628f6"), symbol : "DIS", Daily_high: 66.1, Daily_low: 57.1, Daily_volume: 100222 }
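The pre-aggregation idea is to fold each incoming tick into a small summary document as it arrives, so that the daily state is always ready to read. The sketch below shows that fold in plain JavaScript. The `Daily_high`, `Daily_low` and `Daily_volume` names come from the slide; the `applyTick` helper, the `price` and `quantity` tick fields, and the sample values are assumptions for illustration. In MongoDB itself a similar effect could presumably be had with one upsert per tick using the `$max`, `$min` and `$inc` update operators.

```javascript
// Fold one tick into the running daily summary and return the new summary.
// An undefined summary means this is the first tick of the day.
function applyTick(summary, tick) {
  if (summary === undefined) {
    return {
      symbol: tick.symbol,
      Daily_high: tick.price,
      Daily_low: tick.price,
      Daily_volume: tick.quantity
    };
  }
  return {
    symbol: summary.symbol,
    Daily_high: Math.max(summary.Daily_high, tick.price),
    Daily_low: Math.min(summary.Daily_low, tick.price),
    Daily_volume: summary.Daily_volume + tick.quantity
  };
}

// Made-up ticks; the tick shape is an assumption, not taken from the slides.
const incomingTicks = [
  { symbol: "DIS", price: 60.0, quantity: 1000 },
  { symbol: "DIS", price: 66.1, quantity: 700 },
  { symbol: "DIS", price: 57.1, quantity: 500 }
];

let dailySummary;
for (const tick of incomingTicks) {
  dailySummary = applyTick(dailySummary, tick);
}
console.log(dailySummary);
```

Readers of the summary never scan the tick collection; the cost is one extra small write per tick.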
  • 30. 30 Tick Data – Why MongoDB? • Flexible Data Model – Easy Onboarding • Flexible Querying and Indexing – Primary, Secondary & Index Intersection • Aggregation Framework & Map-Reduce – Native to MongoDB • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object” • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab
  • 31. 31 Process Data in Hadoop • MongoDB’s Hadoop Connector • Supports Map/Reduce, Streaming, Pig • MongoDB as input/output storage for Hadoop jobs – No need to go through HDFS • Leverage power of Hadoop ecosystem against operational data in MongoDB
  • 32. 32 Tick Data – Why MongoDB? • Flexible Data Model – Easy Onboarding • Flexible Querying and Indexing – Primary, Secondary & Index Intersection • Aggregation Framework & Map-Reduce – Native to MongoDB • Pre-aggregation pattern – Continuous and up-to-date snapshot of “object” • Language Drivers & Hadoop Connector – Java, Python, Scala, R, Matlab • High Throughput & High Scalability
  • 33. 33 Why MongoDB Is Fast and Scalable • Better data locality (Relational, MongoDB) • In-Memory Caching • Auto-Sharding • Read/write scaling
  • 34. 34 • What is MongoDB - The Company - The Product • MongoDB for Tick Data • Case Study Agenda
  • 35. 35 Easy On-boarding – Easy On-boarding of all Financial Data
    Problem
    • Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms like Bloomberg and Reuters
      Shapes: Time Series News, Event, Sentiment
      Sizes: 1MB 1x a day price data; 1GB x 1000s data matrices; 40GB 1-minute data; 30TB Tick data; even bigger << options data
    • On-boarding can take weeks in a relational model with complex schema designs and ETL
    • An FX Option can be an 80+ table schema
    • Relational technology is a scale-up architecture and did not meet the performance requirements of AHL
    Why MongoDB
    • Dynamic schema: can on-board data of any shape or size almost instantly, without having to go through a typical "ETL" lifecycle
    • Performance: Quant researchers want data rendered in <1s for up to 20 years of historical data for back-testing trading strategies
    • Replication: a team of 40 Quant researchers relies on this system being up
    • Sharding: can scale seamlessly and accommodate data of any shape and size
  • 36. 36 Easy On-boarding – Results
    Low latency:
    - 1xDay data: 4ms for 10,000 rows (vs. 2,210ms from SQL)
    - OneMinute / Tick data: 1s for 3.5M rows Python (vs. 15s – 40s+ from OtherTick)
    - 1s for 15M rows Java
    Parallel Access:
    - Cluster with 256+ concurrent data access
    - Consistent throughput – little load on the Mongo server
    Efficient:
    - 10-15x reduction in network load
    - Negligible decompression cost (lz4: 1.8Gb/s)
  • 37. 37
  • 38. 38
  • 39. 39 James (AHL) Presentation Links • Slides: http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform • YouTube: James Blackburn - Python and MongoDB as a Platform for Financial Market Data
  • 40. Q&A