Imagine that self-driving cars now exist and are becoming widespread around the world. To facilitate the transition, it's necessary to set up a central service that monitors traffic conditions nationwide and to deploy sensors throughout the interstate system that track car speeds, pavement and weather conditions, as well as accidents, construction, and other sources of traffic tie-ups.
MongoDB has been selected as the database for this application. In this webinar, we will walk through designing an application schema that supports both the high update and read volumes and the data aggregation and analytics queries.
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin (DataStax Academy)
Video: http://youtu.be/B-bTPSwhsDY
Abstract
Patrick McFadin (@PatrickMcFadin), Chief Evangelist for Apache Cassandra at DataStax, will present an introduction to Cassandra as a key player in database technologies. Large and small companies alike choose Apache Cassandra as their database solution, and Patrick will explain why they made that choice.
Patrick will also discuss Cassandra's architecture, including data modeling, time-series storage, and replication strategies, providing a holistic overview of how Cassandra works and the best way to get started.
About Patrick McFadin
Prior to working for DataStax, Patrick was the Chief Architect at Hobsons, an education services company, where his responsibilities included ensuring product availability and scaling for all higher education products. Before that, he was the Director of Engineering at Hobsons, a position he came to after they acquired his company, Link-11 Systems, a software services company. While at Link-11 Systems, he built the first widely popular CRM system for universities, Connect. He obtained a BS in Computer Engineering from Cal Poly, San Luis Obispo and holds the distinction of being the only recipient of a medal (as anyone can find out) for hacking while serving in the US Navy.
Using Time Window Compaction Strategy For Time Series Workloads (Jeff Jirsa)
Cassandra is a great fit for high-write use cases, which makes it a popular choice for storing time series and sensor-collection workloads. At Crowdstrike, we've been using Cassandra for just that purpose, collecting petabytes of expiring time series data. In this talk, I'll discuss compaction in time series workloads, and the TimeWindowCompactionStrategy we developed specifically for this purpose. I'll detail TWCS-specific configuration properties, some lesser-known compaction sub-properties that apply to all compaction strategies, and also cover other general tricks and tuning that are useful for very large time-series workloads.
My talk on NoSQL at OGF29. [Updated with the OSCON '10 presentation!] Updates do not work reliably on SlideShare, so the latest version is also available on my blog.
Cassandra Day Atlanta 2015: Diagnosing Problems in Production (DataStax Academy)
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We'll also give a crash course in basic JVM garbage collection tuning. Attendees will leave with a better understanding of what to look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra; experience running it in production is not required.
Webinar: Diagnosing Apache Cassandra Problems in Production (DataStax Academy)
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We'll also give a crash course in basic JVM garbage collection tuning. Viewers will leave with a better understanding of what to look for when they encounter problems with their in-production Cassandra cluster.
Presentation Material for NoSQL Indonesia "October MeetUp".
This deck covers basic schema design and some examples from applications already in production.
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts (MongoDB)
Huge upheaval in the finance industry has put a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity, and variability of data. This, coupled with cost pressures from the business, has led these institutions to seek alternatives. In this session, learn how financial services companies are using MongoDB to solve their problems. The use cases are specific to financial services, but the patterns of usage (agility, scale, global distribution) are applicable across many industries.
How Financial Services Organizations Use MongoDB (MongoDB)
MongoDB is the alternative that allows you to efficiently create and consume data, rapidly and securely, no matter how it is structured across channels and products. It makes it easy to aggregate data from multiple systems, while lowering TCO and delivering applications faster.
Learn how financial services organizations are using MongoDB with this presentation.
MongoDB Solution for Internet of Things and Big Data (Stefano Dindo)
The Internet of Things is one of the most important market scenarios to invest in by 2020.
The Internet of Things brings people's real-world lives onto the web through interaction with physical objects and spaces, exchanging large volumes of data.
The lab described the architecture needed to support Internet of Things projects, with a focus on how data is organized inside MongoDB, the market-leading NoSQL database, to collect and analyze large volumes of data efficiently and in real time.
Practical lab on designing MongoDB solutions for the Internet of Things (festival ICT 2016)
Companies must respond quickly to market changes as new scenarios take hold (Internet of Things, social analysis, Industry 4.0, etc.) that increasingly require integration with new technologies. Because these are innovative projects that affect business processes and contexts, it is essential to have flexible solutions for collecting and analyzing data.
MongoDB is a NoSQL database that offers flexibility, scalability, and simplified development. The lab illustrates how to build MongoDB architectures and carry out schema design for managing IoT and big data, with reference to real-world cases based on the cloud technologies needed to serve an increasingly global market.
Codemotion Milano 2014 - MongoDB and the Internet of Things (Massimo Brignoli)
Time series are a classic example of the flexibility of the document approach. In this presentation you will see how to shape documents into a schema optimized for time series.
Big Data Analytics: Finding diamonds in the rough with Azure (Christos Charmatzis)
This session presents the main workflows and technologies for getting value from big data stored in your enterprise using Azure:
- Recognizing when you have a big data problem
- Finding the best solution for your big data
- Working inside the data team
- Extracting the true value of your data
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries (MongoDB)
In this session we will dive into some of the use-cases companies are currently deploying MongoDB for in the energy space. It is becoming more important for companies to make data driven decisions, and MongoDB can often be the right tool for analyzing the massive amounts of data coming in. Whether tracking oil well site statistics, power meter data, or feeds from sensors, MongoDB can be a great fit for tracking and analyzing that data, using it to make smart, informed business decisions.
Upgrading an application’s database can be daunting. Doing this for tens of thousands of apps at a time is downright scary. New bugs combined with unique edge cases can result in reduced performance, downtime, and plenty of frustration. Learn how Parse is working to avoid these issues as we upgrade to 2.6 with advanced benchmarking tools and aggressive troubleshooting.
If there is one crucial thing in building ML models, it is data preparation: the process of transforming raw data into a state where machine learning algorithms can be run to uncover insights and make predictions. Data preparation involves analysis and depends on the nature of the problem and the particular algorithms. Because knowledge and experience are involved, it cannot be fully automated, which makes the role of the data scientist the key to success.
ML is trendy, and Microsoft already has more than 10 services to support it. We will focus on tools like Azure ML Workbench and Python for data preparation, review some common tricks for approaching data, and experiment in Azure ML Studio.
Webinar: Best Practices for Getting Started with MongoDB (MongoDB)
MongoDB adoption continues to grow at a record pace due to the significant enhancements in developer productivity and scalability that the database provides. Occasionally, however, organizations new to the technology make mistakes that limit their ability to leverage the significant advantages MongoDB provides. This webinar will discuss some of the common mistakes made by users when they first start working with MongoDB, how to identify when you've made those mistakes, and how to resolve them.
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics (Yahoo Developer Network)
In the analysis of big data there are problematic queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most frequent items, joins, matrix computations, and graph analysis. If approximate results are acceptable, there is a class of sub-linear, stochastic streaming algorithms, called "sketches", that can produce results orders of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of extracting results for these problem queries in real time, sketches are the only known solution. For any analysis system that requires these problematic queries from big data, sketches are a required toolkit that should be tightly integrated into the system's analysis capabilities. This technology has helped Yahoo successfully reduce data processing times from days to hours, or minutes to seconds, on a number of its internal platforms. This talk covers the current state of our open source DataSketches.github.io library, which includes adaptations and example code for Pig, Hive, Spark, and Druid, and gives architectural examples of use and a case study.
Speakers:
Jon Malkin is a scientist at Yahoo working to extend the DataSketches library. His previous roles have involved large scale data processing for sponsored search, display advertising, user counting, ad targeting, and cross-device user identity modeling.
Alexander Saydakov is a senior software engineer at Yahoo working on the open source Data Sketches project. In his previous roles he has been involved in building large-scale back-end data processing systems and frameworks for data analytics and experimentation based on Torque, Hadoop, Pig, Hive, and Druid. Alexander’s educational background is in applied mathematics.
Building Your First MongoDB Application (Rick Copeland)
This talk will introduce the features of MongoDB by walking through how to build a simple location-based check-in application using MongoDB. The talk will cover the basics of MongoDB's document model, query language, map-reduce framework, and deployment architecture.
Rapid and Scalable Development with MongoDB, PyMongo, and Ming (Rick Copeland)
This intermediate-level talk will teach you techniques using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer from basic PyMongo queries to high-level object-document mapping setups in Ming.
DevOps is the new rage among system administrators, applying agile software development techniques to infrastructure configuration management. In the center of the DevOps movement is the open-source Chef tool, implemented in Ruby atop CouchDB. Unsatisfied with the performance of the open-source and/or hosted Chef server and needing better integration with our Python web application, we set out to build a new implementation in Python atop MongoDB. This talk will give you an overview of Chef, reasons for doing a new implementation, and lots of code examples of how we made it all work together to get a chef server that screams.
This talk is updated with the latest version of MongoPyChef, ported to run on Pyramid and open sourced at https://github.com/rick446/MongoPyChef
MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of MongoDB's early adopters.
Rapid and Scalable Development with MongoDB, PyMongo, and Ming (Rick Copeland)
This talk, given at PyGotham 2011, will teach you techniques using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer from basic PyMongo queries to high-level object-document mapping setups in Ming.
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ (Rick Copeland)
With over 180,000 projects and over 2 million users, SourceForge has tons of data about people developing and downloading open source projects. Until recently, however, that data didn't translate into usable information, so Zarkov was born. Zarkov is a system that captures user events, logs them to a MongoDB collection, and aggregates them into useful data about user behavior and project statistics. This talk will discuss the components of Zarkov, including its use of Gevent asynchronous programming, ZeroMQ sockets, and the pymongo/bson driver.
Rapid, Scalable Web Development with MongoDB, Ming, and Python (Rick Copeland)
In 2009, SourceForge embarked on a quest to modernize our websites, converting a site written for a hodge-podge of relational databases in PHP to a MongoDB and Python-powered site, with a small development team and a tight deadline. We have now completely rewritten both the consumer and producer parts of the site with better usability, more functionality, and better performance. This talk focuses on how we're using MongoDB, the pymongo driver, and Ming, an ORM-like library implemented at SourceForge, to continually improve and expand our offerings, with a special focus on how anyone can quickly become productive with Ming and pymongo without having to apologize for poor performance.
2. Who am I?
• Now a consultant/trainer, but formerly...
• Software engineer at SourceForge
• Author of Essential SQLAlchemy
• Author of MongoDB with Python and Ming
• Primarily code Python
3. The Inspiration
• MongoDB Monitoring Service (MMS)
• Free to all MongoDB users
• Minute-by-minute stats on all your servers
• Hardware cost is important; use it efficiently (remember, it’s a free service!)
4. Our Experiment
• Similar to MMS but not identical
• Collection of 100 metrics, each with per-minute values
• “Simulation time” is 300x real time
• Run on 2x AWS small instances
  • one MongoDB server (2.0.2)
  • one “load generator”
5. Load Generator
• Increment each metric as many times as possible during the course of a simulated minute
• Record the number of updates per second
• Occasionally call getLastError to prevent disconnects
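The 300x “simulation time” from slide 4 means one simulated minute elapses every 0.2 wall-clock seconds. A minimal sketch of how a load generator might map wall-clock time to a minute-of-day index; the helper name and constant are mine, not from the deck:

```python
TIME_SCALE = 300  # simulated seconds per wall-clock second (slide 4's 300x)


def simulated_minute(elapsed_wall_seconds):
    """Minute-of-day index (0..1439) for the current simulated time.

    A full simulated day wraps after 86400 / 300 = 288 wall-clock seconds.
    """
    sim_seconds = elapsed_wall_seconds * TIME_SCALE
    return int(sim_seconds // 60) % 1440
```

The generator would loop, incrementing each metric for the current `simulated_minute` until the index advances.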
6. Schema v1
• One document per metric (per server) per day
• Per-hour/minute statistics stored as subdocuments

{
  _id: "20101010/metric-1",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    metric: "metric-1" },
  daily: 5468426,
  hourly: {
    "00": 227850,
    "01": 210231,
    ...
    "23": 20457 },
  minute: {
    "0000": 3612,
    "0001": 3241,
    ...
    "1439": 2819 }
}
7. Update v1
• Use $inc to update fields in-place
• Use upsert to create the document if it’s missing
• Easy, correct, seems like a good idea....

increment = { daily: 1 }
increment['hourly.' + hour] = 1
increment['minute.' + minute] = 1

db.stats.update(
  { _id: id, metadata: metadata },
  { $inc: increment },
  true) // upsert
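The update above can be sketched in Python, the language the deck uses elsewhere; `make_increment` is a hypothetical helper name, and the zero-padded key format follows Schema v1:

```python
def make_increment(hour, minute):
    """Build the $inc spec for one observed event.

    `hour` is a zero-padded "00".."23" key and `minute` a zero-padded
    "0000".."1439" minute-of-day key, matching the Schema v1 layout.
    """
    inc = {"daily": 1}
    inc["hourly." + hour] = 1
    inc["minute." + minute] = 1
    return inc
```

With pymongo this would be applied as an upsert, along the lines of `stats.update_one({"_id": doc_id, "metadata": metadata}, {"$inc": make_increment(h, m)}, upsert=True)`.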
11. Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
12. Document movement problem
• MongoDB in-place updates are fast
• ... except when they’re not in place
• MongoDB adaptively pads documents
• ... but it’s better to know your doc size ahead of time
13. Midnight problem
• Upserts are convenient, but what’s our key?
• date/metric
• At midnight, you get a huge spike in inserts
14. Fixing the document movement problem
• Preallocate documents with zeros
• Crontab (?)
• NO! (makes the midnight problem even worse)

db.stats.update(
  { _id: id, metadata: metadata },
  { $inc: {
      daily: 0,
      'hourly.0': 0,
      'hourly.1': 0,
      ...
      'minute.0': 0,
      'minute.1': 0,
      ... } },
  true) // upsert
16. Fixing the midnight problem
• Could schedule preallocation for different metrics, staggered through the day
• Observation: Preallocation isn’t required for correct operation
• Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
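A sketch of that randomized preallocation, assuming the padded key scheme from Schema v1; the function names and the probability value are illustrative, not taken from MMS:

```python
import random

PREALLOC_PROBABILITY = 1.0 / 2000  # assumed tuning value, not from the deck


def zero_fill_spec():
    """All-zero $inc spec; upserting it creates a fully sized document
    (1 daily + 24 hourly + 1440 minute counters) without changing any
    existing counts."""
    spec = {"daily": 0}
    for h in range(24):
        spec["hourly.%02d" % h] = 0
    for m in range(1440):
        spec["minute.%04d" % m] = 0
    return spec


def maybe_preallocate(stats, tomorrow_id, metadata):
    """With low probability per recorded stat, upsert tomorrow's zeroed
    document so the insert load is spread across the whole day instead
    of spiking at midnight. `stats` is a pymongo collection."""
    if random.random() < PREALLOC_PROBABILITY:
        stats.update_one(
            {"_id": tomorrow_id, "metadata": metadata},
            {"$inc": zero_fill_spec()},
            upsert=True)
```

Because `$inc` by zero is a no-op on existing fields, the preallocating upsert is idempotent and safe to fire more than once for the same day.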
20. Performance with Preallocation
• Well, it’s better
• Still have decreasing performance through the day... WTF?

[Chart: updates per second, from experiment startup through the simulated day]
23. Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
24. End-of-day problem

[Diagram: a flat document with one key per minute, “0000” ... “1439”, each holding a value]

• BSON stores documents as an association list
• MongoDB must check each key for a match
• Load increases significantly at the end of the day
  (MongoDB must scan 1439 keys to find the right minute!)
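One way to shrink those scans, hinted at by the conclusion’s “hierarchy can help”: nest the flat 1440-key minute map as 24 hourly subdocuments of 60 keys each, so no single association list exceeds 60 keys. The key scheme below is illustrative, not necessarily the one MMS shipped:

```python
def minute_key(minute_of_day):
    """Map a flat minute-of-day (0..1439) to a hierarchical
    'minute.<hour>.<minute>' dotted path. Finding a value then scans
    at most 24 hour keys plus 60 minute keys instead of up to 1439."""
    hour, minute = divmod(minute_of_day, 60)
    return "minute.%02d.%02d" % (hour, minute)
```

An `$inc` on `minute_key(m)` works exactly like the flat version, since MongoDB treats dotted paths as nested fields.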
29. Historical Query Problem
• Intra-day queries are great
• What about “performance year to date”?
• Now you’re hitting a lot of “cold” documents and causing page faults
30. Fixing the historical query problem
• Store multiple levels of granularity in different collections
• 2 updates rather than 1, but historical queries much faster
• Preallocate along with daily docs (only infrequently upserted)

{
  _id: "201010/metric-1",
  metadata: {
    date: ISODate("2000-10-01T00:00:00Z"),
    metric: "metric-1" },
  daily: {
    "0": 5468426,
    "1": ...,
    ...
    "31": ... }
}
31. Queries
• Updates are by _id, so no index needed there
• Chart queries are by metadata
• Your range/sort should be last in the compound index

db.stats.daily.find(
  { "metadata.date": { $gte: dt1, $lte: dt2 },
    "metadata.metric": "metric-1" },
  { "metadata.date": 1, "hourly": 1 },
  sort=[("metadata.date", 1)])

db.stats.daily.ensureIndex({
  'metadata.metric': 1,
  'metadata.date': 1 })
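The chart query's filter can be built in Python, with the equality match on the metric listed before the date range to mirror the compound index order. A minimal sketch; the helper name is mine:

```python
def chart_filter(metric, start, end):
    """Filter for chart queries, ordered to match the compound index
    (metadata.metric, metadata.date): equality on the metric first,
    then the date range, since the range/sort field comes last."""
    return {"metadata.metric": metric,
            "metadata.date": {"$gte": start, "$lte": end}}
```

With pymongo this might be used as `stats_daily.find(chart_filter("metric-1", dt1, dt2), {"metadata.date": 1, "hourly": 1}).sort("metadata.date", 1)`.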
32. Conclusion
• Monitor your performance. Watch out for spikes.
• Preallocate to prevent document copying
• Pay attention to the number of keys in your documents (hierarchy can help)
• Make sure your index is optimized for your sorts
33. Questions?
MongoDB Monitoring Service
http://www.10gen.com/mongodb-monitoring-service
Rick Copeland
@rick446
http://arborian.com
MongoDB Consulting & Training