Making Sense of Time Series Data in MongoDB

MongoDB

Matrix for MongoDB

Making Sense of
Time Series Data in
MongoDB
Matan Zohar
CTO, Matrix Open Source

2
Whoami?
• Helping customers with MongoDB and data-intensive architectures for the past 3 years:
anything from MongoDB, Kafka, Spark, NiFi, and Hadoop to Docker, Kubernetes / OpenShift,
and mixes of them.
• Experienced in designing and running production data-driven, messaging, and
event-driven architectures for more than 10 years.
• Constantly looking for new software projects to make our work life smarter and
easier.
• Loves chatting about martial arts, dogs, and philosophical thoughts.

3
Agenda
• What is time series data?
• Why MongoDB for time series?
• Time series challenges and patterns
• Querying time series data in MongoDB
• Tools of the trade

5
Actually it’s quite simple…
• Timestamp
• Value
• Data tags (optional)
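
As a minimal sketch of the three parts listed above, the snippet below builds one data point as a plain JavaScript object of the kind that could be stored as a MongoDB document. The field names `ts`, `value`, and `tags`, and the sample sensor tags, are assumptions made for illustration; the slides do not prescribe a schema.

```javascript
// Build one time series data point: a timestamp, a value, and optional tags.
// Field names (ts, value, tags) are illustrative assumptions, not from the talk.
function makeDataPoint(timestamp, value, tags) {
  const point = { ts: new Date(timestamp), value: value };
  if (tags && Object.keys(tags).length > 0) {
    point.tags = { ...tags }; // tags are optional, so the field is omitted when empty
  }
  return point;
}

// Hypothetical temperature reading from a hypothetical sensor "s-1".
const samplePoint = makeDataPoint("2019-01-01T00:00:05Z", 21.5, {
  sensorId: "s-1",
  unit: "C",
});
console.log(JSON.stringify(samplePoint));
```

Keeping tags in a sub-document is one option among several; flat top-level tag fields would work just as well with the flexible document model.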

6
Why MongoDB?
• Flexible document model
• Transactional
• Easily horizontally scalable
• Optimized indexes
• Native analytics + reporting tools
• Mobile, On-prem, SaaS
• Massive community: 30M+ downloads

7
Somebody did that?
https://www.mongodb.com/customers/bosch

What is the right pattern for time series?

13
Key to a successful time series app: know the application
• Writes per second?
• Read patterns?
• Read intensive?
• Data retention?
• Security requirements?
14
General structure of time series apps
1. Write fast
2. Aggregate
   1. Load to memory
   2. Optimize aggregation for query pattern
   3. Save to file / store
3. Query
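
The "Aggregate" step above can be sketched as an in-memory roll-up: raw points are grouped into per-minute buckets and the statistics that the read path is assumed to need are computed ahead of time. The one-minute granularity, the point shape (`sensorId`, `ts`, `value`), and the chosen statistics are assumptions for illustration only.

```javascript
// Group raw points into per-minute buckets in memory and pre-compute
// count, min, max, and sum so later queries do not need to rescan raw points.
function aggregateByMinute(points) {
  const buckets = new Map();
  for (const p of points) {
    // Truncate the timestamp to the start of its minute.
    const minuteStart = Math.floor(p.ts.getTime() / 60000) * 60000;
    const key = p.sensorId + "|" + minuteStart;
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = {
        sensorId: p.sensorId,
        bucketStart: new Date(minuteStart),
        count: 0,
        min: Infinity,
        max: -Infinity,
        sum: 0,
      };
      buckets.set(key, bucket);
    }
    bucket.count += 1;
    bucket.min = Math.min(bucket.min, p.value);
    bucket.max = Math.max(bucket.max, p.value);
    bucket.sum += p.value;
  }
  return Array.from(buckets.values());
}

// Three hypothetical readings: two fall in the same minute, one in the next.
const rolledUp = aggregateByMinute([
  { sensorId: "s-1", ts: new Date("2019-01-01T00:00:01Z"), value: 10 },
  { sensorId: "s-1", ts: new Date("2019-01-01T00:00:30Z"), value: 20 },
  { sensorId: "s-1", ts: new Date("2019-01-01T00:01:02Z"), value: 30 },
]);
console.log(rolledUp.length, rolledUp[0].count, rolledUp[1].count);
```

The resulting bucket objects could then be saved to a file or a store, which is the last sub-step of the list.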

15
Pattern #1 : One document per data point
Tick per second

16
Pattern #2 : Time-based bucketing
Bucket per minute
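
One way to sketch a per-minute bucket is as an upsert: every incoming point is pushed into the document for its minute, and that document is created on the first write. The function below only builds the filter and update objects; the collection name `readings` in the comment and all field names are hypothetical.

```javascript
// Build the filter and update for a time-based (per-minute) bucket upsert.
// Field names (sensorId, bucketStart, samples, count, sum, min, max) are assumptions.
function minuteBucketUpsert(sensorId, ts, value) {
  const bucketStart = new Date(Math.floor(ts.getTime() / 60000) * 60000);
  return {
    // Equality fields in the filter are copied into the document on insert.
    filter: { sensorId: sensorId, bucketStart: bucketStart },
    update: {
      $push: { samples: { ts: ts, value: value } },
      $inc: { count: 1, sum: value },
      $min: { min: value },
      $max: { max: value },
    },
  };
}

const minuteOp = minuteBucketUpsert("s-1", new Date("2019-01-01T00:00:42.500Z"), 21.5);
console.log(JSON.stringify(minuteOp));
// In the mongo shell this could be applied as:
//   db.readings.updateOne(minuteOp.filter, minuteOp.update, { upsert: true })
```

With one tick per second, a bucket built this way holds up to 60 samples, so the document count drops by roughly that factor compared with one document per data point.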

17
Pattern #3 : Size-based bucketing
Update last point index
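
A size-based bucket can be sketched with the same upsert idea, except that the filter selects a bucket that still has room rather than a fixed time window. The slide does not spell out how the "last point index" is stored; in this sketch a `count` field plays that role, and the limit of 200 points per bucket is an arbitrary assumption.

```javascript
// Assumed bucket capacity; tune to the application's document size budget.
const MAX_POINTS_PER_BUCKET = 200;

// Build the filter and update for a size-based bucket upsert.
// `count` doubles as the index of the last point written (an assumption).
function sizeBucketUpsert(sensorId, ts, value, maxPoints = MAX_POINTS_PER_BUCKET) {
  return {
    // Match a bucket for this sensor that is not yet full; if none exists,
    // an upsert creates a fresh one.
    filter: { sensorId: sensorId, count: { $lt: maxPoints } },
    update: {
      $push: { samples: { ts: ts, value: value } },
      $inc: { count: 1 },
      $min: { first: ts }, // earliest timestamp in the bucket
      $max: { last: ts }, // latest timestamp in the bucket
    },
  };
}

const sizeOp = sizeBucketUpsert("s-1", new Date("2019-01-01T00:00:42Z"), 21.5);
console.log(JSON.stringify(sizeOp));
// In the mongo shell this could be applied as:
//   db.readings.updateOne(sizeOp.filter, sizeOp.update, { upsert: true })
```

Tracking `first` and `last` keeps time-range queries possible even though the buckets no longer align to fixed time boundaries.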

18
Bucketing impact example
Document count (number of documents)

Days   Document per sec   Document per min
7      1,675,678          28,399
14     4,608,156          78,102
21     7,542,468          127,836
28     10,475,596         177,550
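
As a quick check on the document counts in this chart, the sketch below divides the per-second count by the per-minute count for each day mark. The numbers are taken from the slide; the variable names are only illustrative.

```javascript
// Document counts from the slide at 7, 14, 21, and 28 days.
const dayMarks = [7, 14, 21, 28];
const docsPerSecondPattern = [1675678, 4608156, 7542468, 10475596];
const docsPerMinutePattern = [28399, 78102, 127836, 177550];

// Ratio of documents stored by the two patterns at each day mark.
const reductionFactors = dayMarks.map((days, i) => ({
  days: days,
  factor: docsPerSecondPattern[i] / docsPerMinutePattern[i],
}));

for (const r of reductionFactors) {
  console.log(r.days + " days: about " + r.factor.toFixed(1) + "x fewer documents");
}
```

The ratio comes out at roughly 59 at every mark, which is in line with packing about a minute of per-second ticks into each bucket.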

19
Bucketing impact example
Collection size (storage), in MB

Days   Data size (per sec)   Disk storage (per sec)   Data size (per min)   Disk storage (per min)
7      96.84                 29.83                    20.46                 8.10
14     266.32                83.90                    56.28                 23.18
21     435.90                135.64                   92.12                 37.48
28     605.42                190.64                   127.94                52.71

20
Bucketing impact example
Index size (memory), in MB

Days   Document per second   Document per minute
7      25.37                 0.40
14     46.66                 0.91
21     73.95                 1.37
28     103.03                1.86

I’ve heard that LSM is just better
than that B+ tree…

22
Is LSM better than B+ tree?
• It depends on whether you only write and store data, or actually query it…
• LSM trees are better for fast writes but worse for reads, which means you will usually serve the reads elsewhere…
• B+ trees are better for reads but trade off write performance.

But I want fast writes + reads, flexibility,
and rich queries!

24
Take the best for every step of the way
1. Fast ingest
2. Aggregate / bucket
3. Store and query

27
Not that complicated
SQL:
    SELECT cust_id, SUM(amount) AS total
    FROM orders
    WHERE status = 'A'
    GROUP BY cust_id
MongoDB:
    db.orders.aggregate([
      { $match: { status: "A" } },
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
    ])
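
The same match-then-group shape carries over to time series. As a sketch, the function below builds a pipeline that averages readings per hour for one sensor over a time range. It assumes one document per data point with hypothetical fields `sensorId`, `ts`, and `value`, and a hypothetical collection named `readings`.

```javascript
// Build an aggregation pipeline that returns the hourly average for one sensor.
// Assumes documents shaped like { sensorId, ts: Date, value: Number }.
function hourlyAveragePipeline(sensorId, from, to) {
  return [
    // Narrow to one sensor and a half-open time range [from, to).
    { $match: { sensorId: sensorId, ts: { $gte: from, $lt: to } } },
    // Group by the hour of the timestamp, formatted as a sortable string.
    {
      $group: {
        _id: { $dateToString: { format: "%Y-%m-%dT%H:00", date: "$ts" } },
        avgValue: { $avg: "$value" },
        points: { $sum: 1 },
      },
    },
    // Return the hours in chronological order.
    { $sort: { _id: 1 } },
  ];
}

const hourlyPipeline = hourlyAveragePipeline(
  "s-1",
  new Date("2019-01-01T00:00:00Z"),
  new Date("2019-01-02T00:00:00Z")
);
console.log(JSON.stringify(hourlyPipeline, null, 2));
// In the mongo shell: db.readings.aggregate(hourlyPipeline)
```

For bucketed documents the pipeline would need an extra step, such as unwinding the samples array or combining pre-computed sums and counts, before grouping.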

Preview every step change in sample documents

But what about heavy lifting
data science?

39
Analytics at scale with Spark connector

44
Key takeaways
• Think about your end goal (business need)
• Know your application requirements
• Choose the right pattern (bucket time / size)
• Plan for retention and cleanup
• Write intensive? Buffer with Kafka
• Use aggregation and all the tools (Compass, Charts, BI Connector)
• Don’t forget to monitor and tune it as you go
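
For the retention and cleanup takeaway, one option MongoDB offers is a TTL index, which removes documents once a date field is older than a configured age. The sketch below only computes the index definition; the field name `bucketStart`, the 28-day retention, and the collection name `readings` are assumptions.

```javascript
// Build the keys and options for a TTL index on a single date field.
// Documents become eligible for deletion once the field is older than
// expireAfterSeconds.
function ttlIndexSpec(dateField, retentionDays) {
  return {
    keys: { [dateField]: 1 },
    options: { expireAfterSeconds: retentionDays * 24 * 60 * 60 },
  };
}

const retentionIndex = ttlIndexSpec("bucketStart", 28);
console.log(JSON.stringify(retentionIndex));
// In the mongo shell:
//   db.readings.createIndex(retentionIndex.keys, retentionIndex.options)
```

TTL deletion runs in the background, so expired documents can linger briefly; workloads that need exact cutoffs or archiving would call for a different cleanup approach.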

Thank you for your
time!
open@matrix.co.il

Making Sense of Time Series Data in MongoDB

4. What is time series data?

8. Connected Cow Pedometer Sensor

9. Connected Cow

10. Rich query functionality

11. Transactions are here

25. So how will I query time series?

26. Aggregation pipeline

28. See statistics on documents

29. Run explain on queries

30. Build & Debug aggregation pipeline

31. Add new steps to pipeline

35. Export to selected language

36. MongoDB Connector for BI

37. MongoDB Charts (beta)

40. MongoDB R driver

41. Monitor, monitor, monitor!

43. Can you sum it up?
