MongoDB 2.4 Time Series Data - Brignoli

Slides by Massimo Brignoli, presented at Codemotion Roma 2014.

Published in: Technology

Transcript

  1. Time Series Data in MongoDB - Massimo Brignoli, Senior Solutions Architect, MongoDB Inc. #mongodb
  2. Agenda • What is time series data? • Schema design considerations • Broader use case: operational intelligence • MMS Monitoring schema design • Thinking ahead • Questions
  3. What is time series data?
  4. Time Series Data is Everywhere • Financial markets pricing (stock ticks) • Sensors (temperature, pressure, proximity) • Industrial fleets (location, velocity, operational) • Social networks (status updates) • Mobile devices (calls, texts) • Systems (server logs, application logs)
  5. Time Series Data at a Higher Level • Widely applicable data model • Applies to several different “data use cases” • Various schema and modeling options • Application requirements drive schema design
  6. Time Series Data Considerations • Resolution of raw events • Resolution needed to support – Applications – Analysis – Reporting • Data retention policies – Data ages out – Retention
  7. Schema Design Considerations
  8. Designing For Writing and Reading • Document per event • Document per minute (average) • Document per minute (by second) • Document per hour
  9. Document Per Event { server: "server1", load: 92, ts: ISODate("2013-10-16T22:07:38.000-0500") } • Relational-centric approach • Insert-driven workload • Aggregations computed at application-level
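
A minimal sketch of the insert-driven write this model implies, in the mongo shell (the collection name "events" is an assumption, not from the slides):

    // Hypothetical "events" collection: one insert per raw sample
    db.events.insert({
      server: "server1",
      load: 92,
      ts: ISODate("2013-10-16T22:07:38.000-0500")
    })
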
  10. Document Per Minute (Average) { server: "server1", load_num: 92, load_sum: 4500, ts: ISODate("2013-10-16T22:07:00.000-0500") } • Pre-aggregate to compute average per minute more easily • Update-driven workload • Resolution at the minute-level
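
A rough sketch of the corresponding update-driven write, assuming a "metrics" collection and an upsert that creates the minute bucket on the first sample (names are illustrative):

    // Each incoming sample bumps the per-minute counters in place;
    // the upsert creates the minute document the first time it is seen
    db.metrics.update(
      { server: "server1", ts: ISODate("2013-10-16T22:07:00.000-0500") },
      { $inc: { load_num: 1, load_sum: 92 } },
      { upsert: true }
    )
    // average load for the minute = load_sum / load_num, computed on read
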
  11. Document Per Minute (By Second) { server: "server1", load: { 0: 15, 1: 20, …, 58: 45, 59: 40 }, ts: ISODate("2013-10-16T22:07:00.000-0500") } • Store per-second data at the minute level • Update-driven workload • Pre-allocate structure to avoid document moves
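
A sketch of the two steps this model implies: a pre-allocation pass that zero-fills the 60 slots (an assumption about how the pre-allocation might be done), followed by in-place $set updates as samples arrive:

    // Pre-allocate the minute document so later updates do not grow
    // and relocate it on disk
    var slots = {};
    for (var s = 0; s < 60; s++) { slots[s] = 0; }
    db.metrics.insert({
      server: "server1",
      ts: ISODate("2013-10-16T22:07:00.000-0500"),
      load: slots
    })

    // Record the reading for second 38 of that minute in place
    db.metrics.update(
      { server: "server1", ts: ISODate("2013-10-16T22:07:00.000-0500") },
      { $set: { "load.38": 45 } }
    )
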
  12. Document Per Hour (By Second) { server: "server1", load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 }, ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 3599 steps
  13. Document Per Hour (By Second) { server: "server1", load: { 0: { 0: 15, …, 59: 45 }, …, 59: { 0: 25, …, 59: 75 } }, ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level with nesting • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 59+59 steps
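
With the nested layout, writing one sample is still a single in-place $set; the dot path addresses minute 59, second 45 (collection name assumed, structure follows the slide):

    // The server walks ~59 minute keys plus ~45 second keys to reach the field,
    // instead of ~3599 flat keys in the non-nested hourly document
    db.metrics.update(
      { server: "server1", ts: ISODate("2013-10-16T22:00:00.000-0500") },
      { $set: { "load.59.45": 40 } }
    )
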
  14. Characterizing Write Differences • Example: data generated every second • Capturing data per minute requires: – Document per event: 60 writes – Document per minute: 1 write, 59 updates • Transition from insert-driven to update-driven – Individual writes are smaller – Performance and concurrency benefits
  15. Characterizing Read Differences • Example: data generated every second • Reading data for a single hour requires: – Document per event: 3600 reads – Document per minute: 60 reads • Read performance is greatly improved – Optimal with tuned block sizes and read ahead – Fewer disk seeks
  16. MMS Monitoring Schema Design
  17. MMS Monitoring • MongoDB Management System Monitoring • Available in two flavors – Free cloud-hosted monitoring – On-premise with MongoDB Enterprise • Monitor single node, replica set, or sharded cluster deployments • Metric dashboards and custom alert triggers
  18. MMS Monitoring
  19. MMS Monitoring
  20. MMS Application Requirements • Resolution defines granularity of stored data • Range controls the retention policy, e.g. after 24 hours only 5-minute resolution • Display dictates the stored pre-aggregations, e.g. total and count
  21. Monitoring Schema Design • Per-minute document model • Documents store individual metrics and counts • Supports "total" and "avg/sec" display { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), num_samples: 58, total_samples: 108000000, type: "memory_used", values: { 0: 999999, …, 59: 1800000 } }
  22. Monitoring Data Updates • Single update required to add new data and increment associated counts db.metrics.update( { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used" }, { $set: { "values.59": 2000000 }, $inc: { num_samples: 1, total_samples: 2000000 } } )
  23. Monitoring Data Management • Data stored at different granularity levels for read performance • Collections are organized into specific intervals • Retention is managed by simply dropping collections as they age out • Document structure is pre-created to maximize write performance
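
One way this retention scheme could look in the shell, assuming metric collections are named per day as metrics_YYYYMMDD (the naming convention and the 30-day window are assumptions, not from the slides):

    // Keep only the newest 30 daily collections and drop the rest;
    // dropping a collection is far cheaper than deleting documents one by one
    var keepDays = 30;
    var names = db.getCollectionNames()
                  .filter(function (n) { return /^metrics_\d{8}$/.test(n); })
                  .sort();
    names.slice(0, Math.max(0, names.length - keepDays))
         .forEach(function (n) { db.getCollection(n).drop(); });
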
  24. Use Case: Operational Intelligence
  25. What is Operational Intelligence • Storing log data – Capturing application and/or server generated events • Hierarchical aggregation – Rolling approach to generate rollups – e.g. hourly > daily > weekly > monthly • Pre-aggregated reports – Processing data to generate reporting from raw events
  26. Storing Log Data { _id: ObjectId('4f442120eb03305789000000'), host: "127.0.0.1", user: 'frank', time: ISODate("2000-10-10T20:55:36Z"), path: "/apache_pb.gif", request: "GET /apache_pb.gif HTTP/1.0", status: 200, response_size: 2326, referrer: "http://www.example.com/start.html", user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" } 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
  27. Pre-Aggregation • Analytics across raw events can involve many reads • Alternative schemas can improve read and write performance • Data can be organized into more coarse buckets • Transition from insert-driven to update-driven workloads
  28. Pre-Aggregated Log Data { timestamp_minute: ISODate("2000-10-10T20:55:00Z"), resource: "/index.html", page_views: { 0: 50, …, 59: 250 } } • Leverage time-series style bucketing • Track individual metrics (ex. page views) • Improve performance for reads/writes • Minimal processing overhead
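
A sketch of how one page view could feed such a bucket with a single upsert (the "pagehits" collection name and the second-of-minute slot are illustrative):

    // Increment the counter for second 36 of this minute;
    // the upsert creates the minute bucket if it does not exist yet
    db.pagehits.update(
      { timestamp_minute: ISODate("2000-10-10T20:55:00Z"), resource: "/index.html" },
      { $inc: { "page_views.36": 1 } },
      { upsert: true }
    )
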
  29. Hierarchical Aggregation • Analytical approach as opposed to schema approach – Leverage built-in Aggregation Framework or MapReduce • Execute multiple tasks sequentially to aggregate at varying levels • Raw events → Hourly → Weekly → Monthly • Rolling approach distributes the aggregation workload
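
A minimal sketch of one rollup step with the Aggregation Framework, grouping raw log events into hourly page-view counts (collection and field names are assumptions; results are written back from the shell because $out is not available in MongoDB 2.4):

    // Roll raw events up to per-hour view counts
    var hourly = db.events.aggregate([
      { $project: {
          path: 1,
          y: { $year: "$time" }, m: { $month: "$time" },
          d: { $dayOfMonth: "$time" }, h: { $hour: "$time" }
      } },
      { $group: {
          _id: { path: "$path", y: "$y", m: "$m", d: "$d", h: "$h" },
          views: { $sum: 1 }
      } }
    ]);
    // In the 2.4 shell, aggregate() returns { result: [ ... ], ok: 1 }
    hourly.result.forEach(function (doc) { db.hourly_views.insert(doc); });
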
  30. Thinking Ahead
  31. Before You Start • What are the application requirements? • Is pre-aggregation useful for your application? • What are your retention and age-out policies? • What are the gotchas? – Pre-create document structure to avoid fragmentation and performance problems – Organize your data for growth – time series data grows fast!
  32. Down The Road • Scale-out considerations – Vertical vs. horizontal (with sharding) • Understanding the data – Aggregation – Analytics – Reporting • Deeper data analysis – Patterns – Predictions
  33. Scaling Time Series Data in MongoDB • Vertical growth – Larger instances with more CPU and memory – Increased storage capacity • Horizontal growth – Partitioning data across many machines – Dividing and distributing the workload
  34. Time Series Sharding Considerations • What are the application requirements? – Primarily collecting data – Primarily reporting data – Both • Map those back to – Write performance needs – Read/write query distribution – Collection organization (see MMS Monitoring) • Example: {metric name, coarse timestamp}
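
The example shard key from the slide could be declared like this (the "monitoring" database name and field names are placeholders; this assumes a sharded deployment is already running):

    // Compound key: metric name first, coarse timestamp second, so writes for
    // different metrics spread across shards while range scans over one metric
    // stay targeted to a few chunks
    sh.enableSharding("monitoring")
    sh.shardCollection("monitoring.metrics", { type: 1, timestamp_minute: 1 })
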
  35. Aggregates, Analytics, Reporting • Aggregation Framework can be used for analysis – Does it work with the chosen schema design? – What sorts of aggregations are needed? • Reporting can be done on a predictable, rolling basis – See "Hierarchical Aggregation" • Consider secondary reads for analytical operations – Minimize load on production primaries
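
A small sketch of routing an analytical read to a secondary from the shell (read preference modes are standard; the collection and query are illustrative):

    // Prefer a secondary for this query so heavy analytical scans
    // do not compete with writes on the production primary
    db.metrics.find({ type: "memory_used" }).readPref("secondaryPreferred")
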
  36. Deeper Data Analysis • Leverage MongoDB-Hadoop connector – Bi-directional support for reading/writing – Works with online and offline data (e.g. backup files) • Compute using MapReduce – Patterns – Recommendations – Etc. • Explore data – Pig – Hive
  37. Questions?
  38. Resources • Schema Design for Time Series Data in MongoDB http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb • Operational Intelligence Use Case http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence • Data Modeling in MongoDB http://docs.mongodb.org/manual/data-modeling/ • Schema Design (webinar) http://www.mongodb.com/events/webinar/schema-design-oct2013
