Sailthru has been using MongoDB for 4 years, pushing the system to scale. Maintaining a high degree of client-side customizability while growing aggressively has posed unique challenges to our infrastructure. We have maintained high uptime and performance by using monitoring that covers expected use patterns as well as monitoring that catches edge cases for new and unexpected access to the database. In this session, we will talk about Sailthru's use of MongoDB Management Service (MMS), as well as areas in which we have implemented custom monitoring and alerting tools. I will also discuss our transition from a hybrid backup solution using on-premise hardware and AWS snapshots, to using backups with MMS, and how this has benefited Sailthru.
Automate MongoDB with MongoDB Management Service (MongoDB)
MongoDB Management Service makes operations effortless, reducing complicated tasks to a single click. You can now provision machines, configure replica sets and sharded clusters, and upgrade your MongoDB deployment all through the MMS interface. We'll walk through demos of all the new MMS features, including provisioning, expanding and contracting a cluster, resizing the oplog, and managing users.
In this webinar, we'll discuss the different ways to back up and restore your MongoDB databases in case of a disaster scenario. We'll review manual approaches as well as premium solutions - using MongoDB Management Service (MMS) for managed backup to our cloud, or using Ops Manager at your own cloud/data centers.
For the first time this year, 10gen will be offering a track completely dedicated to Operations at MongoSV, 10gen's annual MongoDB user conference on December 4. Learn more at MongoSV.com
Come learn about the different ways to back up your single servers, replica sets, and sharded clusters
In this webinar, we'll discuss the different ways to back up and restore your single servers, replica sets, and sharded clusters in case of a disaster scenario. We'll review various approaches, including taking filesystem snapshots, using mongodump and mongorestore, or leveraging MongoDB Management Service to back up and restore.
"MySQL on AWS RDS and its Myth" by Kabilesh (Co-founder, Mydbops). RDS has pros and cons; Kabilesh shares his experience with RDS and busts a few myths.
A backup and recovery strategy is necessary to protect your mission-critical data against the risk of catastrophic failure or human error. In this session, we'll discuss the different strategies for backing up and restoring your MongoDB clusters in case of a disaster scenario. We'll review the benefits and drawbacks of various approaches, including taking filesystem snapshots, using mongodump, or using MongoDB Management Service.
Introducing MongoDB in a multi-site HA environment (Sebastian Geib)
This presentation was given by us at Mongo Munich on 10th of October 2011. It covers the introduction and mostly the durability and robustness testing of MongoDB at AutoScout24 before launching a new site.
MongoDB World 2015 - A Technical Introduction to WiredTiger (WiredTiger)
MongoDB 3.0 introduces a new pluggable storage engine API and a new storage engine called WiredTiger. The engineering team behind WiredTiger has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database. In this talk we will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
Back to Basics Webinar 6: Production Deployment (MongoDB)
This is the final webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will guide you through production deployment.
Optimizing MongoDB: Lessons Learned at Localytics (andrew311)
Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
A technology-sharing slide deck about caching. With these slides you will be able to answer questions such as:
- What is caching?
- How do you use and build a caching system?
- Why does caching speed up applications by tens or hundreds of times?
- How do the large systems at Facebook, Twitter, ... use caching?
- ...
In this webinar, we will be covering general best practices for running MongoDB on AWS.
Topics will range from instance selection to storage selection and service distribution to ensure service availability. We will also look at any specific best practices related to using WiredTiger. We will then shift gears and explore recommended strategies for managing your MongoDB instance on AWS.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Development to Production with Sharded MongoDB Clusters (Severalnines)
Severalnines presentation at MongoDB Stockholm Conference.
Presentation covers:
- mongoDB sharding/clustering concepts
- recommended dev/test/prod setups
- how to verify your deployment
- how to avoid downtime
- what MongoDB metrics to watch
- when to scale
As we increasingly build applications to reach global audiences, the scalability and availability of your database across geographic regions become critical considerations in systems selection and design.
Redis Developers Day 2014 - Redis Labs Talks (Redis Labs)
These are the slides that the Redis Labs team used to accompany the session we gave during the first ever Redis Developers Day on October 2nd, 2014, in London. They include some of the ideas we've come up with to tackle operational challenges in the hyper-dense, multi-tenant Redis deployments that make up our service, Redis Cloud.
Understanding how memory is managed with MongoDB is instrumental in maximizing database performance and hardware utilisation. This talk covers the workings of low-level operating system components like the page cache and memory-mapped files. We will examine the differences between RAM, SSD and hard disk drives to help you choose the right hardware configuration. Finally, we will learn how to monitor and analyze memory and disk usage using the MongoDB Management Service, Linux administration commands and MongoDB commands.
David Mytton is a MongoDB master and the founder of Server Density. In this presentation David delves deeper into what's discussed in our how to monitor MongoDB tutorial (https://blog.serverdensity.com/monitor-mongodb/), with the aim of taking you through:
Key MongoDB metrics to monitor.
Non-critical MongoDB metrics to monitor.
Alerts to set for MongoDB on production.
Tools for monitoring MongoDB.
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine (MongoDB)
An Update on MongoDB's WiredTiger Storage Engine
Keith Bostic, Senior Staff Engineer, MongoDB
MongoDB Evenings Boston
Brightcove Offices
September 29, 2016
This slide deck was presented at Mydbops Database Meetup 4 by Bajranj (Zenefits). ZFS as a filesystem has features that can benefit MySQL, including compression and quick snapshots.
MongoDB: Advantages of an Open Source NoSQL Database (FITC)
Save 10% off ANY FITC event with discount code 'slideshare'
See our upcoming events at www.fitc.ca
OVERVIEW
This presentation gives an overview of the MongoDB NoSQL database, its history, and its current status as the leading NoSQL database. It focuses on how NoSQL, and in particular MongoDB, benefits developers building big data or web-scale applications. It also discusses the community around MongoDB, compares it to commercial alternatives, and provides an introduction to installing, configuring, and maintaining standalone instances and replica sets.
Presented live at FITC's Spotlight:MEAN Stack on March 28th, 2014.
More info at FITC.ca
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg... (Imperva Incapsula)
Mondrian, MySQL, Mongo, Cassandra, Lucene. You name it, we tried it. As a startup looking for cost-efficient and scalable solutions to power our event processing and statistics backend, we gave almost every Big Data technology out there a go. What we learned from these experiences is that doing it yourself is better than using plug-and-play black box solutions.
This presentation details the building of Incapsula’s Big Data system as a case study, examining the requirements and the different evolutionary phases it went through before becoming what it is today.
A walk-through of Joyent's Manta platform on SmartOS that explains how the illumos innovations of zones, ZFS and Node.js led to the development of the Manta Object Store. Examples, primary Manta commands and simple use cases are provided to help you start using Manta to analyze Big Data with any arbitrary Unix/POSIX code without moving the data.
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ... (Hernan Costante)
Nowadays, in an increasingly complex and dynamic network, it's not enough to be a regex ninja and store only the logs you think you might need. From network traffic to custom logs, you won't know which logs will be crucial to stopping the next attacker, and if you are not planning to spend half of your security budget on a commercial solution, we will show you a way to build your own SIEM with open source. The talk will go from how to build a powerful logging environment for your organization to scaling in the cloud and storing everything forever. We will walk through how to build such a system with open source solutions such as Elasticsearch and Hadoop, and how to create your own custom monitoring rules to monitor everything you need. The talk will also cover how to secure the environment and allow restricted access to other teams, as well as avoiding common pitfalls and ensuring compliance with standards.
Internal presentation on Docker, lightweight virtualization, and Linux containers at Spotify's NYC offices, featuring engineers from Yandex, LinkedIn, Criteo, and NASA!
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's... (Glenn K. Lockwood)
Comparing the burst buffers of today, such as the Cray DataWarp-based burst buffer implemented on NERSC Cori, to the proto-burst buffer deployed on SDSC's Gordon supercomputer in 2012.
In the current arsenal of offensive and defensive techniques, memory analysis applied to volatile memory is far from being the most explored channel. It is more common to hear about input validation attacks or attacks against protocols and cryptography, while keys, passphrases, credit card numbers and other precious artifacts are kept unsafely in memory. This analysis is a mine waiting to be explored, since it relies on one of the most vulnerable and unavoidable resources of any system: memory. From Java to Stuxnet, as well as Windows, and without forgetting the cloud, I will try to show some scenarios where these techniques can be applied and their impact as a threat, and bring an important and fun subject not just to those who work in forensics but also to penetration testers like myself. Finally, I will also try to show how this can be used in defensive technologies, as tools for monitoring and protection in networks with systems in production.
Taboola's experience with Apache Spark (presentation @ Reversim 2014) (tsliwowicz)
At Taboola we get a constant feed of data (many billions of user events a day) and use Apache Spark together with Cassandra for both real-time data stream processing and offline data processing. We'd like to share our experience with these cutting-edge technologies.
Apache Spark is an open source project: a Hadoop-compatible computing engine that makes big data analysis drastically faster, through in-memory computing, and simpler to write, through easy APIs in Java, Scala and Python. The project was born as part of PhD work at UC Berkeley's AMPLab (part of the BDAS, pronounced "Bad Ass") and turned into an incubating Apache project with more active contributors than Hadoop. Surprisingly, Yahoo! is one of the biggest contributors to the project and already has large production clusters of Spark on YARN.
Spark can run as a standalone cluster, or using either Apache Mesos and ZooKeeper or YARN, and can run side by side with Hadoop/Hive on the same data.
One of the biggest benefits of Spark is that the API is very simple and the same analytics code can be used for both streaming data and offline data processing.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
The MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
Query performance should be the unsung hero of an application, but without proper configuration, it can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices for building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app... (MongoDB)
to Core Data, valued by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning... (MongoDB)
It has never been easier to order online and get delivery in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the supply chain world (routes, goods information, customs, ...), but the value of this operational data remains largely untapped. By combining domain expertise and data science, Upply is redefining the fundamentals of the supply chain by helping each player overcome market volatility and inefficiency.
5. Scale The Universe!
Flat FRW Metric for isotropic cosmological geometry
Scale Factor
Related to the Hubble constant for an expanding universe, this does a great job
of actually scaling our universe.
In fact the rate of expansion is continuing to grow and accelerate!
12. Sailthru
● Extremely early adopter of MongoDB ~2009
● 4 Clusters and 9 Stand-Alone RS
● Largest is 32 shards and 5.5TB with ~1.5 billion profiles
● All production systems are housed in a colo data center
on hardware owned and operated by Sailthru
13. Sailthru
● 4 DB Team Members
○ Me
○ Dr. Joshua Wickman
○ Chandrakant Gopalan
○ Tim Burrington
14. Sailthru
● Our systems are
composed of replica
sets of 2 live nodes
and 1 arbiter
● Many of our systems
are ‘microsharded’
PRIMARY
ARBITER
SECONDARY
PRIMARY
ARBITER
SECONDARY
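The layout on this slide (two data-bearing nodes plus one arbiter per replica set) can be sketched as a replica set configuration document. This is a minimal illustration, not Sailthru's actual configuration: the set name, hostnames, and ports are hypothetical, and the helper `build_replset_config` is invented for this example. The document shape (`_id`, `members`, `host`, `arbiterOnly`) follows the standard replica set configuration format.

```python
from typing import Any


def build_replset_config(
    set_name: str, data_hosts: list[str], arbiter_host: str
) -> dict[str, Any]:
    """Build a replica set config with data-bearing members plus one arbiter."""
    members: list[dict[str, Any]] = [
        {"_id": i, "host": host} for i, host in enumerate(data_hosts)
    ]
    # The arbiter votes in elections but holds no data.
    members.append(
        {"_id": len(data_hosts), "host": arbiter_host, "arbiterOnly": True}
    )
    return {"_id": set_name, "members": members}


# Hypothetical hosts: two live nodes and one arbiter, as on the slide.
config = build_replset_config(
    "shard01",
    ["db1.example.net:27018", "db2.example.net:27018"],
    "arb1.example.net:27018",
)
voters = len(config["members"])
data_bearing = sum(1 for m in config["members"] if not m.get("arbiterOnly", False))
print(f"{config['_id']}: {voters} voting members, {data_bearing} data-bearing")
```

With only two data-bearing nodes, losing one leaves a single copy of the data until the node is restored, which is one reason the monitoring and backup slides that follow matter for this topology.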
15. Two tales of DBA struggle
No DBEs to DB team
What do you do when you join an
organization which has been using
MongoDB without any DBA
oversight?
Mass Migration
What do you do if you have to
move data from one data center to
another, while moving 17 replica
sets into a single sharded cluster
with no (minimal) downtime?
16. Welcome to the DB team
What are the most important things for a DB to set
up?
MONITORING BACKUPS
17. Monitoring
Microsharded systems are not easy to monitor!
● Multiple replica sets on a
single machine
● Primaries and Secondaries
often sharing hardware
● Monitoring systems for
Mongo work at the instance
level, not the server level
SHARD 1
PRIMARY
SHARD 2
SECONDARY
MEMORY
DISK IO
NETWORK IO
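One way to bridge the gap between instance-level metrics and server-level resources is to roll per-instance numbers up by machine. The sketch below assumes each instance has already been polled and reduced to a small record; the field names (`host`, `shard`, `role`, `resident_mb`) and all values are hypothetical. In a live system the memory figure could come from the `mem.resident` field of `serverStatus`.

```python
from collections import defaultdict
from typing import Any


def aggregate_by_server(instances: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Sum per-instance resident memory for every physical server."""
    totals: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"instances": 0, "resident_mb": 0, "roles": []}
    )
    for inst in instances:
        # "host:port" identifies an instance; the part before ":" is the machine.
        server = inst["host"].split(":", 1)[0]
        totals[server]["instances"] += 1
        totals[server]["resident_mb"] += inst["resident_mb"]
        totals[server]["roles"].append(f"{inst['shard']}/{inst['role']}")
    return dict(totals)


# Hypothetical microsharded layout: each machine runs one primary and one secondary.
polled = [
    {"host": "db1.example.net:27018", "shard": "shard1", "role": "PRIMARY", "resident_mb": 20000},
    {"host": "db1.example.net:27019", "shard": "shard2", "role": "SECONDARY", "resident_mb": 18000},
    {"host": "db2.example.net:27018", "shard": "shard1", "role": "SECONDARY", "resident_mb": 19000},
    {"host": "db2.example.net:27019", "shard": "shard2", "role": "PRIMARY", "resident_mb": 21000},
]
per_server = aggregate_by_server(polled)
for server, stats in sorted(per_server.items()):
    print(server, stats["instances"], stats["resident_mb"], stats["roles"])
```

An alert threshold can then be applied to the per-server total rather than to each instance, since memory, disk IO, and network IO are shared by every instance on the machine.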
18. Monitoring - MMS
MMS is a great tool for all Mongo deployments
● Built in user level permissions
● Automatic topology discovery
● Graphs and time series data
● Breakdown by replica set for
clusters
● Pulls a wealth of data
19. Monitoring - MMS
● Built in alerting
● Many variable alerting
criteria
● Integration with email,
SMS, Pagerduty and
more
20. Monitoring - MMS
MMS is our backup monitoring system
● Alerting time sometimes lags
behind issue time
● Organizational decision not to
host MMS ourselves, so we need an
internal monitoring system as our
main monitor
21. Monitoring - MMS
What we are looking forward to:
● Proactive Support has some great features coming through MMS
● Enhanced monitoring and alerting options
● Logging long queries? Non-indexed queries?
● Perhaps we can run custom scripts and checks against the system eventually!
23. Monitoring - Zabbix
Monitoring mongo with Zabbix
https://github.com/sailthru/mongodb-zabbix
● Number of voting members
● Long query logging
● Chunk distribution in a sharded cluster
● Fsync lock status
● Failover notification
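Two of the checks above can be sketched as pure functions over documents MongoDB already exposes. This is a minimal illustration, not the code in the mongodb-zabbix repository: it assumes the caller has already fetched a replica set configuration document (with its `members` array) and the documents from `config.chunks` (each carrying a `shard` field).

```python
from collections import Counter


def count_voting_members(rs_config):
    """Count members that vote in elections.

    A member document without a 'votes' field is treated as having
    the default of 1 vote.
    """
    return sum(1 for m in rs_config["members"] if m.get("votes", 1) > 0)


def chunk_counts(chunk_docs):
    """Count chunks per shard from config.chunks-style documents."""
    return Counter(c["shard"] for c in chunk_docs)


def chunk_imbalance(chunk_docs):
    """Difference between the most and least loaded shards.

    Shards that own no chunks at all do not appear in chunk_docs,
    so they are not reflected in this number.
    """
    counts = chunk_counts(chunk_docs)
    if not counts:
        return 0
    return max(counts.values()) - min(counts.values())
```

A monitoring item could then alert when the voting count is even or lower than expected, or when the imbalance stays above a threshold chosen for the cluster.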
26. Monitoring - Zabbix
Zabbix does not have any automated MongoDB topology discovery!
Sailthru has created its own MongoDB topology discovery tool: DB Map
● Python process
● Automatically discovers nodes or config changes
● Outputs all servers and information to a Mongo collection
27. Admin Tool - DB Map
Useful for many processes in our system
● Management scripts
● Execute aggregation queries to pull specific systems
● Keep Zabbix in sync using it as a source of truth
● Exportable for Ansible inventory files or other management software
● Soon to be open sourced
Built by: Dr. Joshua Wickman
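As an illustration of the Ansible export idea, the sketch below turns a list of node documents into an INI-style inventory with one group per replica set. DB Map's actual schema is not shown in the slides, so the `host` and `replica_set` fields here are assumptions.

```python
from collections import defaultdict


def to_ansible_inventory(nodes):
    """Render node documents as an INI-style Ansible inventory.

    nodes: iterable of dicts with the hypothetical keys 'host' and
    'replica_set'. Each replica set becomes one inventory group.
    """
    groups = defaultdict(set)
    for node in nodes:
        groups[node["replica_set"]].add(node["host"])

    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        lines.extend(sorted(groups[group]))
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


nodes = [
    {"host": "db1", "replica_set": "rs0"},
    {"host": "db2", "replica_set": "rs0"},
    {"host": "db2", "replica_set": "rs1"},
    {"host": "db3", "replica_set": "rs1"},
]
inventory_text = to_ansible_inventory(nodes)
```

Because microsharded hosts run several replica sets, the same host can legitimately appear in more than one group.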
28. Backups
Many ways to skin a… cluster....?
● Volume snapshots (within our Datacenter)
● Snapshots of cloud secondaries (Hybrid Cloud)
● MMS Backups
29. Backups - Hybrid Cloud
Sailthru had a hybrid cloud-physical topology.
[Diagram: PRIMARY and SECONDARY in the DATACENTER, with a hidden SECONDARY in the CLOUD]
30. Backups - Hybrid Cloud
There are benefits to a hybrid setup
● Disaster recovery is immediate
● Backups can be taken care of by EC2 snapshotting
32. Backups - Hybrid Cloud
[Diagram: two replica sets, each with a PRIMARY, a SECONDARY, and a hidden SECONDARY]
● Are these secondaries on hardware provisioned equally to the others?
● Is there enough bandwidth?
● Can the disks keep up with bursts of write activity?
● Are the oplogs on these secondaries long enough?
● Is the connection to the cloud secure and stable?
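The oplog question can be made concrete by computing the oplog window, the time span between the oldest and newest entries in `local.oplog.rs`. The sketch below assumes the caller has already read the `ts` values of the first and last oplog entries and converted them to seconds since the epoch; the safety factor is an arbitrary illustrative choice.

```python
def oplog_window_hours(first_ts_seconds, last_ts_seconds):
    """Hours of history currently held in the oplog."""
    return (last_ts_seconds - first_ts_seconds) / 3600.0


def oplog_covers_outage(first_ts_seconds, last_ts_seconds,
                        outage_hours, safety_factor=2.0):
    """True if the window exceeds the expected outage with headroom.

    safety_factor is a hypothetical margin: write bursts shrink the
    window, so a bare match against outage_hours is not enough.
    """
    window = oplog_window_hours(first_ts_seconds, last_ts_seconds)
    return window >= outage_hours * safety_factor


# A 72-hour window comfortably covers a 24-hour loss of connectivity
# to the cloud secondary, but not a 48-hour one at a 2x margin.
window = oplog_window_hours(0, 72 * 3600)
ok_for_a_day = oplog_covers_outage(0, 72 * 3600, outage_hours=24)
ok_for_two_days = oplog_covers_outage(0, 72 * 3600, outage_hours=48)
```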
33. Backups - Hybrid Cloud
DO YOU HAVE THE TIME AND RESOURCES TO DO ALL OF THAT WORK??
We all just want backups that are fire-and-forget!
34. Backups - MMS
● Save on your team’s time
● Save on the provisioned hardware
● Much cheaper than the hybrid cloud solution
Sailthru has saved almost 1 million dollars year over year
35. Backups - MMS
● UI is easy to use and great for small/individual sets
● Need automation in order to bring up a cluster of any reasonable size
○ Automation tools not yet available out of the box
● Pulls your data across the internet, so make sure you allocate time for this!
36. The Power is Turning Off...
During 2014 Sailthru was forced to move data centers.
Additionally we made the infrastructure decision to move from 17+ separate replica sets to a sharded cluster.
38. Data Migrations - Dumps
[Diagram: mongodump in DC1 sent to DC2, either over netcat or by writing to file, followed by mongorestore]
● Lots of combinations; none ended up being fast enough.
● Hampered by disk writes and reads.
● If you touch disk you lose! The floor is lava!
39. Data Migrations - Mongopipe
Custom multiprocessing Python process to insert without hitting disk
● Using Python, multiprocessing, ZMQ, and some custom C objects
● Got around 2.4 bulk insert issue by sorting on shard key
● Never touches disk, all processing is done in memory
● Directly insert into many local mongos instances
● Open source coming soon!
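The slides do not show how Mongopipe sorts on the shard key, so the following is only a guess at the general idea: sort documents by shard key and cut batches at chunk boundaries, so that each bulk insert targets a single chunk. It assumes a ranged (not hashed) shard key and a sorted list of chunk split points supplied by the caller; all names are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby


def batches_by_chunk(docs, split_points, key="_id", max_batch=1000):
    """Yield (chunk_index, batch) pairs ordered by shard key.

    split_points: sorted lower bounds of every chunk after the first.
    Chunk i covers [split_points[i-1], split_points[i]), matching
    MongoDB's inclusive-lower, exclusive-upper chunk ranges.
    """
    ordered = sorted(docs, key=lambda d: d[key])

    def chunk_index(doc):
        return bisect_right(split_points, doc[key])

    # Sorting first guarantees each chunk's docs are contiguous,
    # which is what groupby requires.
    for index, group in groupby(ordered, key=chunk_index):
        group = list(group)
        for start in range(0, len(group), max_batch):
            yield index, group[start:start + max_batch]


docs = [{"_id": k} for k in (5, 1, 12, 7, 20, 10)]
plan = list(batches_by_chunk(docs, split_points=[10, 20], max_batch=2))
```

Each yielded batch could then be handed to a worker that performs one bulk insert through a local mongos.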
42. Data Migrations - Mongo Connector
● Mongo Connector is a way to mirror MongoDB operations, creating almost a virtual secondary without adding it to a replica set
● Great for data migrations without downtime
https://github.com/10gen-labs/mongo-connector
44. Access Patterns - Keystore
● What if I want to do a lot of findOnes on a cluster?
● On many unique fields?
● Am I doomed to many scatter gathers?
Assume sharded on _id: hashed
[Diagram: a MONGOS in front of four SHARDs receiving findOne({"ssn": X}), findOne({"cell_phone": X}), and findOne({"_id": X})]
Created by: Ian White
45. Access Patterns - Keystore
Find by SSN
● Keystore sharded collection, sharded on: {_id: hashed}
○ Doc: { _id: SSN, sid: ObjectId() }
○ Query on _id (shard key)
○ Return an ObjectId
● Main sharded collection, sharded on: {_id: hashed}
○ Use the sid that was found to query the _id in the main collection
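The two-step lookup can be sketched as follows. The function only relies on a `find_one(filter)` method, which pymongo collections provide; the in-memory `FakeCollection` is a stand-in added here so the example runs without a database, and the sample values are made up.

```python
def find_by_keystore(keystore, main, value):
    """Resolve a secondary key through the keystore collection.

    Both queries filter on _id, the shard key, so each can be routed
    to a single shard instead of being scattered to all of them.
    """
    entry = keystore.find_one({"_id": value})
    if entry is None:
        return None
    return main.find_one({"_id": entry["sid"]})


class FakeCollection:
    """Minimal in-memory stand-in that supports find_one on _id only."""

    def __init__(self, docs):
        self._by_id = {d["_id"]: d for d in docs}
        self.queries = 0

    def find_one(self, query):
        self.queries += 1
        return self._by_id.get(query["_id"])


ssn_keystore = FakeCollection([{"_id": "123-45-6789", "sid": 42}])
profiles = FakeCollection([{"_id": 42, "name": "Ada"}])
profile = find_by_keystore(ssn_keystore, profiles, "123-45-6789")
```

The cost is one extra collection per lookup field and the need to keep each keystore in step with the main collection on writes.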
46. Access Patterns - Keystore
2 queries rather than n, where n is your number of shards
** Not useful unless you are sharded out very far **
● Time averaged by keystore: ~30 seconds
● Time averaged by direct lookup: ~170 seconds
** Tests done on a 32-shard cluster **
47. Other Tools - Mongoexup
Business need to regularly execute mongoexport and uploads
● Cron jobs are unreliable
● Any ‘prototype’ inevitably becomes production
● Constructed a Python scheduler daemon to execute these tasks
● Looking to open source in the future
Built by: Chandrakant Gopalan
48. Other Tools - Mongoexup
[Diagram: Mongo, MongoExUp, and S3 connected by greenlets, with job status information]
49. What are we doing next?
● Open source even more of our tools
● Ansible Automation
● Building API layers around all our DBs
○ Tornado - ASYNC RULES
● MongoDB + Other Data Stores
○ Enhancing the Keystore concept
● Upgrading
○ WiredTiger (WT)
○ RocksDB