MongoDB as a Tick Store
MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB, including:
• MongoDB 2.6
• Sharding
• Replication
• Aggregation
http://world.mongodb.com
Save $200 with discount code THANKYOU
3
• What is MongoDB
- The Company
- The Product
• MongoDB for Tick Data
• Case Study
Agenda
4
MongoDB Overview
• 350+ employees
• 1,000+ customers
• Over $231 million in funding
• 13 offices around the world
5
Global Community
• 7,000,000+ MongoDB Downloads
• 150,000+ Online Education Registrants
• 35,000+ MongoDB Management Service (MMS) Users
• 30,000+ MongoDB User Group Members
• 20,000+ MongoDB Days Attendees
6
• What is MongoDB
- The Company
- The Product
• MongoDB for Tick Data
• Case Study
Agenda
7
MongoDB
NoSQL document-based database, designed for building today's applications:
• Fast to build
• Quick to adapt
• Easy to scale
• Lessons learned from 40 years of RDBMS
8
Relational Model

PlanID | BenFK | Plan
100 | 1 | PPO Plus
200 | 2 | Standard

EmpID | Name | Dept | Title | Manager | Payband
9950 | Dunham, Justin | 500 | 1500 | 6531 | C

EmpBenPlanID | EmpFK | PlanFK
1 | 9950 | 100
2 | 9950 | 200

BenID | Benefit
1 | Health
2 | Dental

DeptID | Department
500 | Marketing

TitleID | Title
1500 | Product Manager
9
Document Model

EmpID | Name | Dept | Title | Manager | Payband | Benefits
9950 | Dunham, Justin | Marketing | Product Manager | 6531 | C | Health: PPO Plus; Dental: Standard

EmpBenPlanID | EmpFK | PlanFK
1 | 9950 | 100
2 | 9950 | 200

PlanID | BenFK | Plan
100 | Health | PPO Plus
200 | Dental | Standard
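To make the relational-to-document step concrete, here is a minimal plain-JavaScript sketch (no database involved) that resolves the foreign keys from the relational tables once and embeds the results in a single employee document. The lookup tables mirror the slide's rows; the lower-case `benefit`/`plan` keys inside `Benefits` and the helper name `toDocument` are this sketch's own, not something the slides specify.

```javascript
// Rows from the relational model (column names follow the slide).
const employees = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: 500, Title: 1500, Manager: 6531, Payband: "C" },
];
const departments = { 500: "Marketing" };
const titles = { 1500: "Product Manager" };
const benefits = { 1: "Health", 2: "Dental" };
const plans = { 100: { BenFK: 1, Plan: "PPO Plus" }, 200: { BenFK: 2, Plan: "Standard" } };
const empBenPlans = [
  { EmpBenPlanID: 1, EmpFK: 9950, PlanFK: 100 },
  { EmpBenPlanID: 2, EmpFK: 9950, PlanFK: 200 },
];

// Hypothetical helper: resolve every foreign key once, at write time, and
// embed the results so that reading an employee needs no joins.
function toDocument(emp) {
  const benefitsList = empBenPlans
    .filter((link) => link.EmpFK === emp.EmpID)
    .map((link) => {
      const plan = plans[link.PlanFK];
      return { benefit: benefits[plan.BenFK], plan: plan.Plan };
    });
  return {
    EmpID: emp.EmpID,
    Name: emp.Name,
    Dept: departments[emp.Dept],
    Title: titles[emp.Title],
    Manager: emp.Manager,
    Payband: emp.Payband,
    Benefits: benefitsList,
  };
}

const doc = toDocument(employees[0]);
console.log(JSON.stringify(doc, null, 2));
```

In MongoDB the resulting object would be stored as one document, so reading an employee together with their benefits becomes a single lookup instead of a multi-table join.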
10
MongoDB - Agility
Dynamic Schemas

V 1.0
EmpID | Name | Dept | Title | Manager | Payband | Benefits
9950 | Dunham, Justin | Marketing | Product Manager | 6531 | C | Health: PPO Plus; Dental: Standard

V 1.1
EmpID | Name | Title | Payband | Bonus
9952 | Joe White | CEO | E | 20,000

V 2.0
EmpID | Name | Dept | Title | Manager | Payband | Shares
9531 | Nearey, Graham | Marketing | Director | 9952 | D | 5000
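A short plain-JavaScript sketch of the same idea: the three versions coexist in one array standing in for a collection, and filtering on field presence behaves roughly like `find({Bonus: {$exists: true}})` in the shell. Documents that lack a field are simply skipped rather than causing an error. The array and the helper name `withField` are illustrative only.

```javascript
// Three "versions" of an employee document stored side by side, as in one collection.
const employees = [
  { EmpID: 9950, Name: "Dunham, Justin", Dept: "Marketing", Title: "Product Manager",
    Manager: 6531, Payband: "C",
    Benefits: [{ benefit: "Health", plan: "PPO Plus" }, { benefit: "Dental", plan: "Standard" }] }, // V 1.0
  { EmpID: 9952, Name: "Joe White", Title: "CEO", Payband: "E", Bonus: 20000 },                    // V 1.1
  { EmpID: 9531, Name: "Nearey, Graham", Dept: "Marketing", Title: "Director",
    Manager: 9952, Payband: "D", Shares: 5000 },                                                    // V 2.0
];

// Rough analogue of {field: {$exists: true}}: keep only documents that have the field.
const withField = (docs, field) =>
  docs.filter((d) => Object.prototype.hasOwnProperty.call(d, field));

console.log(withField(employees, "Bonus").map((d) => d.Name)); // [ 'Joe White' ]
console.log(withField(employees, "Dept").map((d) => d.EmpID)); // [ 9950, 9531 ]
```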
11
MongoDB - Usability
Shell
Command-line shell for interacting directly with the database
> db.collection.insert({product: "MongoDB", type: "Document Database"})
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "product" : "MongoDB",
  "type" : "Document Database"
}
Drivers
Drivers for most popular programming languages and frameworks:
Java, Python, Perl, Ruby, Haskell, JavaScript
12
MongoDB - Utility
• Complex Indexed Queries, e.g. Age > 65 AND Male living near Lyon
• Aggregation, e.g. profit margin by age band:
Age | Profit Margin
1-17 | 0
18-35 | 20
36-50 | 80
51-65 | 50
66+ | 5
13
MongoDB - Scalability
• High Availability
• Auto Sharding
• Enterprise Monitoring
• Grid file storage
14
Options for Building an Operational Database
• Relational
• Key/Value Store
• Column Family
• Document Store
15
MongoDB & Hadoop
Hadoop (Post Processing)
• Multi-source analytics
• Interactive & batch
• Data lake
MongoDB (Operational)
• Online, real-time
• High concurrency & HA
• Live analytics
MongoDB Connector for Hadoop
16
• What is MongoDB
- The Company
- The Product
• MongoDB for Tick Data
• Case Study
Agenda
17
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability
18
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bidPrice: 55.37,
offerPrice: 55.58,
bidQuantity: 500,
offerQuantity: 700
}
> db.ticks.find( {symbol: "DIS",
bidPrice: {$gt: 55.36} } )
Flexible Data Model
Easy Onboarding – e.g. Equities
19
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
bidPrices: [55.37, 55.36, 55.35],
offerPrices: [55.58, 55.59, 55.60],
bidQuantities: [500, 1000, 2000],
offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find( {bidPrices: {$gt: 55.36} } )
Flexible Data Model
Easy Onboarding – e.g. Depth of Book
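One subtlety of the depth-of-book query: when the queried field holds an array, a condition like `{bidPrices: {$gt: 55.36}}` matches the document if any element satisfies it, not only the top of book. The plain-JavaScript sketch below models just that rule for numeric `$gt`; it is a simplification, not MongoDB's full matching logic, and `matchesGt` is a hypothetical helper. To test only the best bid, a shell query can address the first element with dot notation, e.g. `{"bidPrices.0": {$gt: 55.36}}`.

```javascript
// Simplified model of {field: {$gt: value}} for numbers only.
function matchesGt(doc, field, value) {
  const v = doc[field];
  if (Array.isArray(v)) {
    // An array field matches if ANY element satisfies the condition.
    return v.some((el) => typeof el === "number" && el > value);
  }
  // A scalar field must satisfy it directly; a missing field never matches.
  return typeof v === "number" && v > value;
}

const ticks = [
  { symbol: "DIS", bidPrices: [55.37, 55.36, 55.35] }, // top level 55.37 > 55.36: match
  { symbol: "DIS", bidPrices: [55.30, 55.29, 55.28] }, // no level above 55.36
  { symbol: "DIS", bidPrice: 55.37 },                  // no bidPrices field at all
];

const hits = ticks.filter((t) => matchesGt(t, "bidPrices", 55.36));
console.log(hits.length); // 1
```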
20
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
symbol : "DIS",
timestamp: ISODate("2013-02-15 10:00"),
title: "Disney Earnings…",
body: "Walt Disney Company reported…",
tags: ["earnings", "media", "walt disney"]
}
Flexible Data Model
Easy Onboarding – e.g. News
21
{
_id : ObjectId("4e2e3f92268cdda473b628f6"),
timestamp: ISODate("2013-02-15 10:00"),
twitterHandle: "jdoe",
tweet: "Heard @DisneyPictures is releasing…",
usernamesIncluded: ["DisneyPictures"],
hashTags: ["movierumors", "disney"]
}
Flexible Data Model
Easy Onboarding – e.g. Social Networking
22
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability
23
Architecture for Querying Data
• Higher Latency Trading Applications
• Backtesting Applications
• Research & Analysis Applications
24
// Compound index
> db.ticks.ensureIndex({symbol: 1, timestamp: 1})
// Multikey index on array elements
> db.ticks.ensureIndex({bidPrices: -1})
// Index on a nested field (any depth)
> db.ticks.ensureIndex({"bids.price": 1})
// Full text search
> db.ticks.ensureIndex({tweet: "text"})
// (ensureIndex() is the MongoDB 2.x shell helper; from 3.0 on, createIndex() is preferred.)
Flexible Querying and Indexing
Index any field [or arrays]
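Why a compound index on `{symbol: 1, timestamp: 1}` suits tick queries can be sketched with a sorted array standing in for the index. MongoDB actually uses B-trees; this only models the key ordering. Because entries are ordered by symbol first and timestamp second, every tick for one symbol in a time window sits in one contiguous run, located with a binary search and then read sequentially. The helpers `lowerBound` and `rangeScan` are hypothetical names.

```javascript
// Order keys by symbol, then by timestamp (ms since epoch), like {symbol: 1, timestamp: 1}.
function compareKeys(a, b) {
  if (a.symbol !== b.symbol) return a.symbol < b.symbol ? -1 : 1;
  return a.timestamp - b.timestamp;
}

// Binary search: first position whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(index[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Scan the contiguous run for one symbol with from <= timestamp < to.
function rangeScan(index, symbol, from, to) {
  const out = [];
  for (let i = lowerBound(index, { symbol, timestamp: from }); i < index.length; i++) {
    const k = index[i];
    if (k.symbol !== symbol || k.timestamp >= to) break; // left the run: stop early
    out.push(k);
  }
  return out;
}

const index = [
  { symbol: "VIA", timestamp: Date.UTC(2013, 1, 15, 10, 0) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 1, 15, 10, 1) },
  { symbol: "CBS", timestamp: Date.UTC(2013, 1, 15, 10, 0) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 0, 31, 15, 0) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 1, 15, 10, 0) },
].sort(compareKeys);

// All DIS ticks in February 2013 (Date.UTC months are 0-based).
const feb = rangeScan(index, "DIS", Date.UTC(2013, 1, 1), Date.UTC(2013, 2, 1));
console.log(feb.length); // 2
```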
25
// Ticks for last month (January 2013) for media companies
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    // Both bounds go in ONE operator document: a repeated "timestamp"
    // key would silently keep only the last condition.
    timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")}})
// Ticks when Disney's bid breached 55.50 this month (February 2013)
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gte: new ISODate("2013-02-01")}})
Flexible Querying and Indexing
Rich Query Language
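A query document that repeats a key, such as `{timestamp: {$gt: a}, timestamp: {$lte: b}}`, does not express a range: in JavaScript, and therefore in the mongo shell, an object literal keeps only the last value for a duplicate key. The plain-JavaScript sketch below, with `Date` standing in for the shell's `ISODate`, shows the lost bound and the combined form that keeps both.

```javascript
// Repeated key: the first timestamp condition is overwritten before any query runs.
const asWritten = {
  symbol: { $in: ["DIS", "VIA", "CBS"] },
  timestamp: { $gt: new Date("2013-01-01") },
  timestamp: { $lte: new Date("2013-01-31") },
};
console.log(Object.keys(asWritten.timestamp)); // [ '$lte' ]

// Combined form: both bounds live in one operator document.
// Using $lt on the first of the next month also keeps all of January 31.
const corrected = {
  symbol: { $in: ["DIS", "VIA", "CBS"] },
  timestamp: { $gte: new Date("2013-01-01"), $lt: new Date("2013-02-01") },
};
console.log(Object.keys(corrected.timestamp)); // [ '$gte', '$lt' ]
```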
26
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & Linear Scalability
27
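// Aggregate one-minute OHLC bars for Disney for February 2013
// (assumes each tick document carries a trade "price" field)
db.ticks.aggregate([
  { $match: {symbol: "DIS", timestamp: {$gte: new ISODate("2013-02-01"), $lt: new ISODate("2013-03-01")}}},
  // Sort right after $match so an index such as {symbol: 1, timestamp: 1} can serve it;
  // $first/$last below rely on this time order
  { $sort: {timestamp: 1}},
  { $project: {
      year: {$year: "$timestamp"},
      month: {$month: "$timestamp"},
      day: {$dayOfMonth: "$timestamp"},
      hour: {$hour: "$timestamp"},
      minute: {$minute: "$timestamp"},
      price: 1}},
  { $group: {
      _id: {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
      open: {$first: "$price"},
      high: {$max: "$price"},
      low: {$min: "$price"},
      close: {$last: "$price"}}},
  // $group does not guarantee output order, so sort the bars chronologically
  { $sort: {"_id.year": 1, "_id.month": 1, "_id.day": 1, "_id.hour": 1, "_id.minute": 1}}
])
Aggregation Framework
Parallel execution across cluster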
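For checking the pipeline's logic, here is a plain-JavaScript sketch of the same computation over an in-memory array: filter, sort by time, bucket by minute, then take the first/max/min/last price per bucket. Like the pipeline, it assumes each tick has a `price` field; the function name `minuteBars` and the sample ticks are hypothetical.

```javascript
// In-memory equivalent of $match -> $sort -> $group (by minute) -> OHLC accumulators.
function minuteBars(ticks, symbol, from, to) {
  const bars = new Map(); // minute start (ms) -> bar; insertion order is chronological
  ticks
    .filter((t) => t.symbol === symbol && t.timestamp >= from && t.timestamp < to) // $match
    .sort((a, b) => a.timestamp - b.timestamp)                                      // $sort
    .forEach((t) => {
      // Truncate to the minute (UTC), like grouping on year/month/day/hour/minute.
      const key = Math.floor(t.timestamp.getTime() / 60000) * 60000;
      const bar = bars.get(key);
      if (!bar) {
        // First tick of the minute sets open ($first) and seeds high/low/close.
        bars.set(key, { minute: new Date(key), open: t.price, high: t.price, low: t.price, close: t.price });
      } else {
        bar.high = Math.max(bar.high, t.price); // $max
        bar.low = Math.min(bar.low, t.price);   // $min
        bar.close = t.price;                    // $last: latest tick in time order
      }
    });
  return [...bars.values()];
}

const ticks = [
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.40 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:01Z"), price: 55.37 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.30 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.45 },
  { symbol: "CBS", timestamp: new Date("2013-02-15T10:00:02Z"), price: 40.00 },
];
const bars = minuteBars(ticks, "DIS", new Date("2013-02-01T00:00:00Z"), new Date("2013-03-01T00:00:00Z"));
console.log(bars);
```

The out-of-order input is deliberate: without the sort, `open` and `close` would depend on arrival order, which is exactly why the pipeline sorts before grouping.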
28
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
29
Pre-aggregation pattern
Real-time and continuous state

All Ticks Collection
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  …
}
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15
  …
}

Pre-aggregated State
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  Daily_high: 66.1,
  Daily_low: 57.1,
  Daily_volume: 100222
}
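The pre-aggregated state can be maintained tick by tick. In the shell this would be a single upsert per tick along the lines of `db.daily.update({symbol: "DIS", day: "2013-02-15"}, {$max: {Daily_high: p}, $min: {Daily_low: p}, $inc: {Daily_volume: q}}, {upsert: true})`; the `$min`/`$max` update operators arrived in MongoDB 2.6, while the `daily` collection, the `day` field, `p` and `q` are hypothetical. The plain-JavaScript sketch below mirrors those operator semantics on an in-memory map; the tick fields `price` and `size` are assumptions.

```javascript
// Keep one state document per (symbol, UTC day), updated as each tick arrives.
function applyTick(stateBySymbolDay, tick) {
  const day = tick.timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD" (UTC)
  const key = `${tick.symbol}|${day}`;
  let state = stateBySymbolDay.get(key);
  if (!state) {
    // Upsert: the first tick of the day creates the state document.
    state = { symbol: tick.symbol, day, Daily_high: tick.price, Daily_low: tick.price, Daily_volume: 0 };
    stateBySymbolDay.set(key, state);
  }
  state.Daily_high = Math.max(state.Daily_high, tick.price); // like $max
  state.Daily_low = Math.min(state.Daily_low, tick.price);   // like $min
  state.Daily_volume += tick.size;                            // like $inc
  return state;
}

const daily = new Map();
[
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:00:00Z"), price: 55.37, size: 500 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T10:05:00Z"), price: 55.58, size: 700 },
  { symbol: "DIS", timestamp: new Date("2013-02-15T11:00:00Z"), price: 55.10, size: 300 },
].forEach((t) => applyTick(daily, t));

console.log(daily.get("DIS|2013-02-15"));
```

The design trade-off is one extra write per tick in exchange for reading the current daily state with a single document lookup, instead of aggregating the full tick history on every read.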
30
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
31
Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs
– No need to go through HDFS
• Leverage the power of the Hadoop ecosystem against operational data in MongoDB
32
Tick Data – Why MongoDB?
• Flexible Data Model
– Easy Onboarding
• Flexible Querying and Indexing
– Primary, Secondary & Index Intersection
• Aggregation Framework & Map-Reduce
– Native to MongoDB
• Pre-aggregation pattern
– Continuous and up-to-date snapshot of “object”
• Language Drivers & Hadoop Connector
– Java, Python, Scala, R, Matlab
• High Throughput & High Scalability
33
Why MongoDB Is Fast and Scalable
• Better data locality (Relational vs. MongoDB)
• In-Memory Caching
• Auto-Sharding: read/write scaling
34
• What is MongoDB
- The Company
- The Product
• MongoDB for Tick Data
• Case Study
Agenda
35
Easy On-boarding
Easy On-boarding of all Financial Data

Problem
• Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms such as Bloomberg and Reuters
Shapes
- Time series
- News
- Event
- Sentiment
Sizes
- 1MB: once-a-day price data
- 1GB x 1000s data matrices
- 40GB: 1-minute data
- 30TB: tick data
- Options data: even bigger
• On-boarding can take weeks in a relational model, with complex schema designs and ETL
• An FX option can require an 80+ table schema
• Relational technology is a scale-up architecture and did not meet AHL's performance requirements

Why MongoDB
• Dynamic schema: can on-board data of any shape or size almost instantly, without going through a typical "ETL" lifecycle
• Performance: quant researchers want data rendered in <1s for up to 20 years of historical data when back-testing trading strategies
• Replication: a team of 40 quant researchers relies on this system being up
• Sharding: can scale seamlessly and accommodate data of any shape and size
36
Easy On-boarding: Results
Low latency:
- 1x-a-day data: 4ms for 10,000 rows (vs. 2,210ms from SQL)
- One-minute / tick data: 1s for 3.5M rows in Python (vs. 15s to 40s+ from OtherTick)
- 1s for 15M rows in Java
Parallel access:
- Cluster with 256+ concurrent data accesses
- Consistent throughput, with little load on the MongoDB server
Efficient:
- 10-15x reduction in network load
- Negligible decompression cost (lz4: 1.8Gb/s)
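Taking the quoted figures at face value (the slide does not state the measurement setup), the implied speedups and read throughputs work out as follows:

```latex
\begin{aligned}
\frac{2210\ \text{ms}}{4\ \text{ms}} &= 552.5\times && \text{(1x-a-day data, 10{,}000 rows, vs. SQL)}\\
\frac{15\ \text{s}}{1\ \text{s}} \text{ to } \frac{40\ \text{s}}{1\ \text{s}} &= 15\times \text{ to } 40\times\text{+} && \text{(3.5M one-minute/tick rows in Python, vs. OtherTick)}\\
\frac{3.5\times 10^{6}\ \text{rows}}{1\ \text{s}} &= 3.5\times 10^{6}\ \text{rows/s} && \text{(Python)}\\
\frac{15\times 10^{6}\ \text{rows}}{1\ \text{s}} &= 1.5\times 10^{7}\ \text{rows/s} && \text{(Java)}
\end{aligned}
```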
37
38
39
James (AHL) Presentation Links
• Slides:
• http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform
• YouTube:
• James Blackburn - Python and MongoDB as a
Platform for Financial Market Data
Q&A


Editor's Notes

  • #8 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
  • #9 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
  • #10 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
  • #11 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
  • #13 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more. ----- Meeting Notes (11/02/2014 12:00) -----
  • #14 MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more.
  • #15 The dotted line is the natural boundary of what is possible today. E.g., ORCL lives far out on the right and does things NoSQL vendors will never do. These things come at the expense of some degree of scale and performance. NoSQL was born out of wanting greater scalability and performance, but we think it overreacted by giving up some things. E.g., caching layers give up many things; key/value stores are super fast but give up a rich data model and a rich query model. MongoDB gives up some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (80%) with much better scalability and performance. Start with an RDBMS and ask what could be done to scale: take out complex transactions and joins. How? Change the data model. >> segue to data model section. May need to revise the graphic: either remove the line, or all points should be on the line. To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how data relates to one another. Another, more intuitive way is to use a document data model. It is more intuitive because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, Node.js, etc. The document data model is a good segue to the next section >> Data Model
  • #16 Makes MongoDB a Hadoop-enabled file system: read and write live data in place; copy data between Hadoop and MongoDB; use MongoDB indexes to filter data; full support for data processing (Hive, MapReduce, Pig, Streaming).
  • #36 Good for regulatory reporting, e.g. KYC