The Weather of the Century Part 2: High Performance
Presentation Transcript

    • André Spiegel, Consulting Engineer, MongoDB #MongoDBWorld The Weather of the Century Part II: High Performance
    • What was the weather when you were born?
    • Data Format: Raw and in MongoDB
      Raw ISH record:
      0303725053947282013060322517+40779-073969FM-15+0048KNYC V0309999C00005030485MN0080475N5+02115+02005100975 ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999 GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...
      As a MongoDB document:
      { "st" : "u725053", "ts" : ISODate("2013-06-03T22:51:00Z"), "airTemperature" : { "value" : 21.1, "quality" : "5" }, "atmosphericPressure" : { "value" : 1009.7, "quality" : "5" } }
      Station Identifier: "u725053" (»NYC Central Park«)
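The mapping from the fixed-width raw record to the document shape above can be sketched as follows. This is a minimal illustration: `make_doc` and its parameter names are inventions for this sketch, and the actual loader (and the ISH field offsets it would parse) is not shown in the talk.

```python
from datetime import datetime, timezone

def make_doc(station, ts, temp_c, temp_q, pressure_hpa, pressure_q):
    # Build a document in the shape shown on the slide: each measurement
    # is paired with its quality flag in a small sub-document.
    return {
        "st": "u" + station,
        "ts": ts,
        "airTemperature": {"value": temp_c, "quality": temp_q},
        "atmosphericPressure": {"value": pressure_hpa, "quality": pressure_q},
    }

doc = make_doc("725053", datetime(2013, 6, 3, 22, 51, tzinfo=timezone.utc),
               21.1, "5", 1009.7, "5")
```

Keeping the quality flag next to each value is what later lets a single `$match` stage filter out untrusted readings without a join.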
    • How Big Is It? • 2.5 billion data points • 4 terabytes (1.6 kB per document) • “moderately big”
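The size figures are mutually consistent, as a quick check shows:

```python
docs = 2_500_000_000           # 2.5 billion data points
bytes_per_doc = 1_600          # "1.6k per document"
total_tb = docs * bytes_per_doc / 1e12
# 2.5e9 documents x 1.6 kB each = 4e12 bytes = 4 TB
```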
    • How to do this with MongoDB?
    • First Deployment • A single server with a really big disk: Application (c3.8xlarge) → mongod (i2.8xlarge: 251 GB RAM, 6 TB SSD)
    • Second Deployment • A really big cluster where everything is in RAM: Application / mongos (c3.8xlarge) → 100 x r3.2xlarge mongod (61 GB RAM, 100 GB disk each)
    • Now... how much would you pay? Single server: $60,000 / yr. Cluster: $700,000 / yr.
    • Use Cases • Bulk loading – getting all data into the system • Latency and throughput for queries – point in space-time – one station, one year – the whole world, once upon a time • Aggregation and Exploration – warmest and coldest day ever, etc.
    • Bulk Loading: Principles • On the application side: – batch size – number of client threads – use unordered bulk writes • On the server side: – Journaling off (temporarily!) – Index later – In cluster: pre-split, no balancing
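The application-side principles can be sketched as a batching loop. `batches` is an illustrative helper, not code from the talk; the PyMongo call in the comment is an assumption about the driver (the talk does not name one), but `bulk_write(..., ordered=False)` is the standard way to issue unordered bulk writes, which lets the server apply inserts in parallel and keep going past individual errors.

```python
def batches(docs, batch_size):
    # Split a stream of documents into fixed-size batches; each batch is
    # what one client thread would submit as a single unordered bulk write.
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

chunks = list(batches(range(250), 100))
# With PyMongo, each chunk would then be sent roughly as:
#   collection.bulk_write([InsertOne(d) for d in chunk], ordered=False)
# with several client threads each working through their own chunks.
```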
    • Bulk Loading: Single Server (chart: throughput vs. batch size and number of threads) 8 threads, batch size 100 → 85,000 doc/s
    • Bulk Loading: Single Server • Settings: 8 threads batch size 100 • Total loading time: 10 h 20 min • Documents per second: 70,000 • Index build time: 7 h 40 min (ts_1_st_1)
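The reported numbers line up: at a sustained 70,000 doc/s, 2.5 billion documents take just under ten hours of pure insert time, close to the measured 10 h 20 min once startup and batching overhead are included.

```python
docs = 2_500_000_000
rate = 70_000                 # sustained documents per second
hours = docs / rate / 3600    # pure insert time, roughly 9.9 hours
```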
    • Bulk Loading: Cluster 144 threads, batch size 200 → 220,000 doc/s
    • Bulk Loading: Cluster • Shard Key: Station ID, hashed • Settings: 10 mongos @ 144 threads, batch size 200 • Total loading time: 3 h 10 min • Documents per second: 228,000 • Index build time: 5 min (ts_1_st_1)
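A toy model shows why hashing the station ID works well as a shard key. The `shard_for` function below is only a stand-in (MongoDB's real hashed index uses a different hash function), but it illustrates the routing behaviour: many distinct station IDs spread evenly across shards during the bulk load, while any query that includes the shard key still maps to exactly one shard.

```python
import hashlib

def shard_for(station_id, num_shards):
    # Toy stand-in for MongoDB's hashed shard key routing: hash the key
    # value, then map the hash onto one of the shards.
    h = int(hashlib.md5(station_id.encode()).hexdigest(), 16)
    return h % num_shards

# Bulk loading 1,000 different stations touches many shards in parallel...
spread = {shard_for("u%06d" % n, 100) for n in range(1000)}
# ...while a query that includes "st" is routed to a single shard.
target = shard_for("u747940", 100)
```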
    • Queries: Point in Space-Time db.data.find({"st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z")})
    • Queries: Point in Space-Time (chart: latency in ms, avg / 95th / 99th percentile, single server vs. cluster) max. throughput: 40,000/s single server, 610,000/s cluster (10 mongos) db.data.find({"st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z")})
    • Queries: One Station, One Year db.data.find({"st" : "u103840", "ts" : {"$gte": ISODate("1989-01-01"), "$lt" : ISODate("1990-01-01")}})
    • Queries: One Station, One Year (chart: latency in ms, avg / 95th / 99th percentile, single server vs. cluster) targeted query; max. throughput: 20/s single server, 430/s cluster (10 mongos) db.data.find({"st" : "u103840", "ts" : {"$gte": ISODate("1989-01-01"), "$lt" : ISODate("1990-01-01")}})
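The same filter can be written in PyMongo-style form (an assumption: the talk shows only the shell version). Two details matter: the half-open interval on `ts` cleanly captures one calendar year, and because the filter includes the shard key `st`, mongos can route it to a single shard, which is what makes this a targeted query.

```python
from datetime import datetime, timezone

def station_year_filter(station, year):
    # Build the find() filter from the slide: one station, one calendar
    # year, expressed as a half-open interval on the timestamp.
    return {
        "st": station,
        "ts": {
            "$gte": datetime(year, 1, 1, tzinfo=timezone.utc),
            "$lt": datetime(year + 1, 1, 1, tzinfo=timezone.utc),
        },
    }

q = station_year_filter("u103840", 1989)
```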
    • Queries: The Whole World, Once Upon... db.data.find({"ts" : ISODate("2000-01-01T00:00:00Z")})
    • Queries: The Whole World, Once Upon... (chart: latency in ms, avg / 95th / 99th percentile, single server vs. cluster) scatter/gather query; max. throughput: 8/s single server, 310/s cluster (10 mongos) db.data.find({"ts" : ISODate("2000-01-01T00:00:00Z")})
    • Analytics and Exploration • Analytics means ad-hoc queries for which we do not have an index – Find all tornadoes – Maximum reported temperature • We cannot just index everything – memory – write performance
    • Analytics: Find All Tornadoes db.data.find({ "presentWeatherObservation.condition" : "99" }) Single Server: 1 h 28 min; Cluster: 47 s
    • Analytics: Maximum Temperature db.data.aggregate([ { "$match" : { "airTemperature.quality" : { "$in" : [ "1", "5" ] } } }, { "$group" : { "_id" : null, "maxTemp" : { "$max" : "$airTemperature.value" } } } ]) Result: 61.8 °C = 143 °F. Single Server: 4 h 45 min; Cluster: 2 min
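The pipeline in PyMongo-style form (an assumption: the talk shows the shell version): `$match` keeps only observations whose quality flag is "1" or "5", then `$group` over the whole collection takes the maximum temperature value. The Celsius-to-Fahrenheit arithmetic confirms the two reported figures agree.

```python
# Same two-stage aggregation as on the slide, as a Python list of stages.
pipeline = [
    {"$match": {"airTemperature.quality": {"$in": ["1", "5"]}}},
    {"$group": {"_id": None, "maxTemp": {"$max": "$airTemperature.value"}}},
]
# It would run as: db.data.aggregate(pipeline)

max_c = 61.8                  # reported maximum temperature
max_f = max_c * 9 / 5 + 32    # 143.24, matching the slide's "143 °F"
```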
    • Summary: Single Server Pro • Cost-effective • Very good latency for single queries Con • Some operations are prohibitive: – Indexing – Table Scans
    • Summary: Cluster Pro • High throughput • Very good latency for single queries • Scatter-gather yields significant speed-up • Analytics are possible Con • High cost
    • Thank you.