MongoDB.local Austin 2018: Workload Isolation: Are You Doing it Wrong? (MongoDB)
Workload isolation sounds like a good idea. But what does that mean, are you currently doing it, and what are the pitfalls of not doing it (or doing it incorrectly)?
In this practical talk [for all levels] we will look at ways to isolate different workloads from each other, as well as some disaster stories (both real-life and hypothetical) that were a result of "doing it wrong."
This document describes the architecture and design of OpenDNS's DNS query logging and analytics system. Key points:
- Billions of DNS queries are processed daily and stored in distributed databases and analytics systems.
- A map-reduce style processing system ingests logs, aggregates data by network, and stores results.
- Data is partitioned by network to keep tables small and optimize performance.
- A multi-stage system processes raw logs, calculates statistics, and prunes old data to optimize storage. The results are accessed via API and dashboard.
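The aggregate-by-network step above can be sketched as a tiny map-reduce in Python. This is a toy illustration only; the field names and log format are invented, not OpenDNS's actual schema.

```python
# Toy map-reduce over DNS query logs: map each log entry to its network,
# then reduce by summing counts per network (invented field names).
from collections import Counter

raw_logs = [
    {"network": "net-a", "domain": "example.com"},
    {"network": "net-a", "domain": "example.org"},
    {"network": "net-b", "domain": "example.com"},
]

# Map: emit one (network, 1) pair per query; Reduce: sum counts per network.
queries_per_network = Counter(entry["network"] for entry in raw_logs)
print(dict(queries_per_network))   # {'net-a': 2, 'net-b': 1}
```

Partitioning the aggregated results by network, as described above, keeps each per-network table small enough to query quickly.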
The email discusses an issue with the "cluster-fork" command in Rocks, which is used to run commands on multiple nodes in a cluster. The original poster gets an error indicating a problem importing the "gmon.encoder" module. Others respond that this may be due to a corrupt ".pyc" file for this module. Suggestions are made to remove the ".pyc" file to regenerate it, and to check that the file's MD5 checksum matches expected values. It is discovered that regenerating the file does not solve the problem, and discussion continues on how to resolve the underlying issue with the Python module.
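The suggested fix from the thread can be sketched in shell. The paths here are illustrative stand-ins (a temp directory, not a real Rocks install): the idea is that deleting a corrupt `.pyc` forces Python to regenerate the bytecode from the `.py` source on the next import, and the checksum lets you compare the source against a known-good node.

```shell
# Toy reproduction of the suggested fix (paths are illustrative):
# a corrupt .pyc shadows its .py source; deleting it forces Python
# to regenerate the bytecode on the next import.
SITEPKG=$(mktemp -d)                      # stand-in for the real site-packages
mkdir -p "$SITEPKG/gmon"
touch "$SITEPKG/gmon/encoder.py" "$SITEPKG/gmon/encoder.pyc"
find "$SITEPKG" -name '*.pyc' -delete     # remove the suspect bytecode
md5sum "$SITEPKG/gmon/encoder.py"         # compare against a known-good node
ls "$SITEPKG/gmon"                        # encoder.py remains; encoder.pyc is gone
```

As the thread notes, if regenerating the `.pyc` does not help, the problem likely lies in the `.py` source or the Python environment itself, not the bytecode.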
Building Data Driven Products With Ruby - RubyConf 2012Ryan Weald
Description
Slides from RubyConf 2012 talk:
"Big data and data science have become hot topics in the developer community during the past year. This talk will show how Ruby is used to build real data driven products at scale.
Data scientist Ryan Weald walks through the building of data driven products at Sharethrough, from exploratory analysis to production systems, with an emphasis on the role Ruby plays in each phase of the data driven product cycle.
He discusses how Ruby interacts with other data analysis tools -- such as Hadoop, Cascading, Python, and JavaScript -- with a constructive look at Ruby's weaknesses, and presents suggestions on how Ruby can contribute more to data science in the areas of visualization and machine learning."
This document provides an introduction to Cassandra including:
1) An overview of Cassandra's key architecture including its linear scalability, continuous availability across data centers, and operational simplicity.
2) A discussion of Cassandra's data model including its use of Last Write Wins for conflict resolution and examples of modeling one-to-many relationships using clustered tables.
3) Details on Cassandra's consistency levels and how they impact availability and durability of writes and reads.
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study (MongoDB)
The document summarizes an IoT ETL performance case study where the author collected water and electric meter data and loaded it into a database. The initial load of over 90 million documents from a 10GB file into a MongoDB database took over 4 hours. The author then redesigned the data schema, splitting it into hourly documents to improve query performance. This reduced the processing time to just 3 minutes and the data size to 13MB. The key lessons were that changing the data schema and using batch writes with multiple workers can dramatically improve ETL and query performance.
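The hourly-document redesign described above can be sketched in a few lines of Python. The reading fields and bucket shape here are invented for illustration; real meter data and the talk's actual schema will differ.

```python
# Sketch of bucketing minute-level meter readings into one document per
# meter per hour, so queries touch far fewer (and smaller) documents.
from collections import defaultdict
from datetime import datetime

readings = [
    {"meter": "w-1", "ts": datetime(2018, 5, 1, 10, 5), "value": 1.2},
    {"meter": "w-1", "ts": datetime(2018, 5, 1, 10, 35), "value": 1.4},
    {"meter": "w-1", "ts": datetime(2018, 5, 1, 11, 2), "value": 1.1},
]

# Group raw readings by (meter, hour); each group becomes one document.
buckets = defaultdict(list)
for r in readings:
    hour = r["ts"].replace(minute=0, second=0, microsecond=0)
    buckets[(r["meter"], hour)].append(r["value"])

docs = [{"meter": m, "hour": h, "values": v}
        for (m, h), v in sorted(buckets.items())]
print(len(docs))   # 2 hourly documents instead of 3 raw inserts
```

Pre-built hourly documents like these would then be written with batched inserts (e.g. `insert_many`) across multiple workers, which is the other lever the case study credits for the speedup.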
The document summarizes new features and improvements in Apache Cassandra 2.1, including enhanced performance, lightweight transactions, collection indexing, improved counters, incremental repair, and a new row cache. It also discusses Cassandra's use at eBay to power mission-critical features for hundreds of millions of users daily.
The document contains notes from a networking class, including discussions of scheduling web servers, network overload issues, solutions like adding more servers, and network routing concepts like traceroute. It explains how traceroute works: the sender issues probes with increasing time-to-live (TTL) values; each router along the path decrements the TTL and, when it reaches 0, drops the packet and returns a Time Exceeded message from its own address, revealing the route one hop at a time.
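The TTL mechanism behind traceroute can be modeled without any real networking. This is a toy simulation of the class notes' description, with made-up router addresses.

```python
# Toy model of traceroute: the sender probes with TTL = 1, 2, 3, ...;
# each router decrements the TTL and, when it hits 0, drops the packet
# and reports its own address back (the ICMP "Time Exceeded" message).
def probe(path, ttl):
    """Return the router where a packet with this TTL dies,
    or 'destination' if it survives the whole path."""
    for router in path:
        ttl -= 1
        if ttl == 0:
            return router          # this hop's address is revealed
    return "destination"

def traceroute(path):
    hops, ttl = [], 1
    while True:
        hop = probe(path, ttl)
        hops.append(hop)
        if hop == "destination":
            return hops
        ttl += 1

print(traceroute(["10.0.0.1", "172.16.0.1", "203.0.113.9"]))
# each router is revealed in order, then the destination answers
```

Each successive probe travels one hop further before its TTL expires, which is exactly how the real tool maps out the route.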
MongoDB.local DC 2018: Workload Isolation: Are You Doing It Wrong? (MongoDB)
Workload isolation sounds like a good idea. But what does that mean, are you currently doing it, and what are the pitfalls of not doing it (or doing it incorrectly)?
In this practical talk [for all levels] we will look at ways to isolate different workloads from each other, as well as some disaster stories (both real-life and hypothetical) that were a result of "doing it wrong."
Big Data Analytics: Finding diamonds in the rough with Azure (Christos Charmatzis)
This session presents the main workflows and technologies for getting value from Big Data stored in the enterprise using Azure:
- When we have a Big Data problem
- Finding the best solution for our Big Data
- Working inside the Data Team
- Extracting the true value of our data.
Outrageous ideas for Graph Databases
Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current graph databases are terrible and need a lot of work. There, I said it. It's the ugly truth in our little niche industry. That's why, despite waiting for over a decade for the "Year of the Graph" to come, we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, and their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzled veterans. They optimize for sales, not solutions. Come listen to a rant by an industry OG on where we could go from here if we took the time to listen to the users that haven't given up on us yet.
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters... (NoSQLmatters)
More than two years ago we faced the decision whether to run our MongoDB database on Amazon's EC2 ourselves or to rely on a Database as a Service provider. Common wisdom told us that a well known provider, focusing all its knowledge and energy on running MongoDB, would be a better choice than us trying it on the side. Well, this talk describes what can go wrong, since we have seen a lot of interesting minor and major hiccups — including stopped instances, broken backups, a major security incident, and more broken backups. Additionally, we discuss some reasons why a hosted solution is not always the better choice and which new challenges arise from it.
The document discusses the hype around NoSQL databases and provides guidance on selecting the right database solution. It summarizes different database types and evaluates databases based on characteristics like concurrency control, data storage, replication, and transaction support. The document advises profiling applications carefully before selecting a database and avoiding premature decoupling of data.
Leo Kim from Foursquare presented several hacks and tools they developed to help monitor and maintain their MongoDB deployment at scale. Some of the hacks discussed include ODash and Mongolyzer for improved MongoDB monitoring, Mackinac for automated repair of fragmented data, a shard key checker to detect queries missing shard keys, and Chunksanity to verify data integrity by checking documents are on the correct shards. Foursquare uses these internal tools to help optimize their extensive use of MongoDB sharding and replica sets supporting over 5 million check-ins per day.
Presented by Ger Hartnett, Manager, Technical Services, MongoDB
Experience level: Advanced
Ger will take you on a ride through some memorable customer stories. Get to hear about some more unusual MongoDB use cases, the idiosyncratic choices behind them, and their path to success. You'll laugh, you'll cry, and you'll learn never to shard collections on booleans again.
What You Need To Know About The Top Database Trends (Dell World)
The last 5 years have seen transformative changes in both personal and enterprise technologies. Many of these changes have been driven by or are driving paradigm shifts in database technologies and information systems. These include trends such as engineered systems including Exadata, "Big Data" technologies such as Hadoop, "NoSQL" databases, SSDs, in-memory and columnar technologies. In this presentation we’ll review these big trends and describe how they are changing the database landscape and influencing the career prospects for database professionals.
This document contains frequently asked questions (FAQs) about big data technologies like Hadoop, MongoDB, and related topics. Key topics covered include using Hadoop for processing large datasets, MongoDB features and administration, optimizing web crawlers, performing clustering on large datasets, and comparing algorithms like logistic regression, decision trees, and neural networks. Configuration parameters for Hadoop like dfs.name.dir and dfs.data.dir are also discussed.
The document provides an overview of key concepts in system design including:
1) Breaking problems into modules using a top-down approach and discussing trade-offs.
2) Architectural components like load balancing, databases, caching, and data partitioning that are important to consider in system design.
3) Database types like SQL and NoSQL and when each is best suited based on factors like data structure, scalability needs, and development agility.
MongoDB: Optimising for Performance, Scale & Analytics (Server Density)
MongoDB is easy to download and run locally but requires some thought and further understanding when deploying to production. At scale, schema design, indexes and query patterns really matter. So does data structure on disk, sharding, replication and data centre awareness. This talk will examine these factors in the context of analytics, and more generally, to help you optimise MongoDB for any scale.
Presented at MongoDB Days London 2013 by David Mytton.
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv... (MongoDB)
This will cover what to consider for high write throughput performance from hardware configuration through to the use of replica sets, multi-data centre deployments, monitoring and sharding to ensure your database is fast and stays online.
Optimizing MongoDB: Lessons Learned at Localytics (andrew311)
Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
1) Databases have been around for a long time but newer non-relational databases are gaining popularity. However, databases still have issues that can "kill you" like fragile replication and poor failover support.
2) The CAP theorem states that a distributed system cannot simultaneously provide consistency, availability, and partition tolerance. This introduced needed realism that different data stores solve different problems depending on their focus on two of these areas.
3) New non-relational databases like MongoDB provide better solutions for issues like replication and failover that were weaknesses of relational databases. Document stores in particular are good alternatives when data is not truly relational.
The NameNode was experiencing high load and instability after being restarted. Graphs showed unknown high load between checkpoints on the NameNode. DataNode logs showed repeated 60000 millisecond timeouts in communication with the NameNode. Thread dumps revealed NameNode server handlers waiting on the same lock, indicating a bottleneck. Source code analysis pointed to repeated block reports from DataNodes to the NameNode as the likely cause of the high load.
Use Your MySQL Knowledge to Become a MongoDB Guru (Tim Callaghan)
Leverage all of your MySQL knowledge and experience to get up to speed quickly with MongoDB.
Presented at Percona Live London 2013 with Robert Hodges of Continuent.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replica sets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real-time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
MongoDB Kubernetes operator is ready for prime time. Learn about how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
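A rough sketch of why the E-S-R ordering works, using a sorted Python list as a stand-in for a B-tree index. The field names (`status`, `order_date`, `total`) and the data are invented for illustration:

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for a compound index on (status, order_date, total):
# a B-tree keeps entries sorted by the compound key, like this list.
docs = [
    {"status": "A", "order_date": 1, "total": 50},
    {"status": "A", "order_date": 2, "total": 500},
    {"status": "A", "order_date": 3, "total": 80},
    {"status": "B", "order_date": 1, "total": 10},
    {"status": "B", "order_date": 2, "total": 70},
]
index = sorted((d["status"], d["order_date"], d["total"]) for d in docs)

# Query: status == "A" (Equality), sort by order_date (Sort), total < 100 (Range).
statuses = [k[0] for k in index]
lo, hi = bisect_left(statuses, "A"), bisect_right(statuses, "A")
hits = [k for k in index[lo:hi] if k[2] < 100]

# Equality first => the matching entries form ONE contiguous slice.
# Sort second => within that slice they are already ordered by order_date,
# so the sort is non-blocking. The trailing range only trims entries.
assert hits == [("A", 1, 50), ("A", 3, 80)]
assert [h[1] for h in hits] == sorted(h[1] for h in hits)
```

Putting the range field first instead would scatter the equality matches across the key space and force an in-memory sort, which is exactly what the E-S-R guideline avoids.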
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
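To give a flavor of what a pipeline expresses, here is a hypothetical $match/$group pipeline shown as a comment, with the equivalent computation carried out in plain Python (the collection and field names are invented; a real pipeline runs server-side):

```python
# Hypothetical sales data; the logic below mirrors MongoDB's
# $match -> $group stages, evaluated here in plain Python.
sales = [
    {"store": "SF", "item": "coffee", "qty": 2},
    {"store": "SF", "item": "tea", "qty": 1},
    {"store": "NY", "item": "coffee", "qty": 5},
]

# Roughly equivalent to:
# db.sales.aggregate([
#     {"$match": {"item": "coffee"}},
#     {"$group": {"_id": "$store", "total": {"$sum": "$qty"}}},
# ])
matched = [d for d in sales if d["item"] == "coffee"]   # $match
totals = {}
for d in matched:                                       # $group + $sum
    totals[d["store"]] = totals.get(d["store"], 0) + d["qty"]

assert totals == {"SF": 2, "NY": 5}
```

The 4.2 additions the talk mentions ($merge for writing results to existing collections, pipeline-style updates) extend this same stage-by-stage shape to ETL and materialized-view workflows.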
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app...
...to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications, faster.
MongoDB .local Paris 2020: Upply @MongoDB: When Machine Learning...
It has never been easier to order online and get delivery in under 48 hours, very often for free. This simplicity of use hides a complex market worth more than $8 trillion.
Data is nothing new to the supply chain world (routes, goods information, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise with data science, Upply is redefining the fundamentals of the supply chain, enabling every player to overcome market volatility and inefficiency.
13. These Stories Are True
or they are based on stories that are true
• Based* on real cases filed with MongoDB Support
• All names changed
• Some details* may have been omitted
* some cases may have been combined and/or embellished to make a point
* just boring ones, not the really embarrassing ones
15. simple typo
"We accidentally remove()ed an entire collection. Is there a way to undo?"
• a replica set
• had 45 days in the oplog
• but the oldest backup was 90+ days old
• but ... they saved the DB files (all of them)
16. recovery not for the faint of heart ... all data was recovered
Conclusion:
Replication ≠ Backups
Do regular backups.
Don't do production operations "ad hoc"
noSQL ≠ noDBA
We accidentally remove()ed an entire collection.
Is there a way to undo?
52. Query Scaling Rate Comparison
Number of shards vs. number of queries, if your application sends 10,000 queries:

Shards | Each query targets ONE shard | Each query targets ALL shards
       | (per shard / system total)   | (per shard / system total)
   1   |       10,000 / 10K           |        10K / 10K
   2   |        5,000 / 10K           |        10K / 20K
   5   |        2,000 / 10K           |        10K / 50K
  10   |        1,000 / 10K           |        10K / 100K
53. Query Scaling Rate Comparison
Number of shards vs. total query capacity, if each shard can process 10,000 queries:

Shards | Each query targets ONE shard | Each query targets ALL shards
       | (per shard / system total)   | (per shard / system total)
   1   |        10K / 10K             |        10K / 10K
   2   |        10K / 20K             |        10K / 10K
   5   |        10K / 50K             |        10K / 10K
  10   |        10K / 100K            |        10K / 10K
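The arithmetic behind both tables can be checked in a few lines (a sketch, assuming queries are uniformly distributed across shards):

```python
# Targeted queries divide work across shards;
# scatter-gather queries duplicate it on every shard.

def per_shard_load(total_queries, shards, targeted):
    """Queries each individual shard must handle."""
    return total_queries // shards if targeted else total_queries

def system_load(total_queries, shards, targeted):
    """Total work performed across the whole cluster."""
    return total_queries if targeted else total_queries * shards

# Slide 52: the application sends 10,000 queries.
for n in (1, 2, 5, 10):
    assert per_shard_load(10_000, n, targeted=True) == 10_000 // n  # shrinks
    assert system_load(10_000, n, targeted=False) == 10_000 * n     # grows

# Slide 53, read the other way: each shard can process 10,000 queries.
def targeted_capacity(shards):
    return 10_000 * shards          # capacity grows linearly

def scatter_capacity(shards):
    return 10_000                   # every query touches every shard

assert targeted_capacity(10) == 100_000
assert scatter_capacity(10) == 10_000
```

This is the core of the scaling argument: adding shards only buys you throughput if most queries include the shard key and therefore target a single shard.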
62. Operational Testing Built-in
Most important criteria?
• user-facing latency
• linear scaling of resources
• generates a realistic, real-life-scale workload (comparable to Twitter, etc.)
• confirms the architecture scales linearly, without loss of responsiveness
73. Accidentally deleted DB directory
Restored recent backup... but mongod won't start
Backups were not good: taken incorrectly
Unable to start mongod process
74. Unable to start mongod process
Conclusion:
Most Important Part of Backups:
RESTORES
Test your backups
noSQL ≠ noDBA
Unable to start mongod process
77. January 13:
"DBA" adds a new shard
"DBA" observes that data does not seem to be migrating to new shard
"DBA" sets out to "fix" the "problem"
By "re-sharding" the database/collection in question
Which doesn't work (because it's already sharded)
"Simple" solution: remove the config DB metadata for chunks!
Try resharding again!
Force it!
Sequence of Events
78. How did we help them fix it?
My colleague
The Operation
82. Sequence of Events
At 4:30 PM on a Friday, an alpha page comes in.
Two senior support engineers work the ticket till 10 PM.
The details:
• a 33-node, 17-terabyte sharded cluster (11 shards, 3-node replica sets)
• single data center
• no journaling
• no backups
Add to that:
• power failure in the data center
• no UPS
Result: unreadable data on every node
noSQL ≠ noDBA
86. Dec 29 2013 10:35:00 AM: db.stats() is showing dataSize > fileSize
Dec 29 2013 04:44:00 PM: "there are data files viewed as missing by `mongod`"
Dec 30 2013 02:12:00 AM: Seeing incorrect fileSize on numerous servers
you can see a drop in fileSize on 12/28 in MMS with
no corresponding drop in the other size metrics.
Dec 30 2013 02:19:00 AM: "[do] these databases have anything in common,
especially with the xxxx DB from yesterday?"
Dec 30 2013 02:22:00 AM: Nothing comes to mind ... that DBs have in common
Sequence of Events
Dec 30 2013 02:28:00 AM: We notice this all happening at the same time.
We think something might be deleting data files.
Dec 30 2013 02:31:00 AM: "Something is deleting data files outside mongod?"
87. Dec 29 2013 04:44:00 PM: "there are data files viewed as missing by `mongod`"
Dec 30 2013 02:12:00 AM: Seeing incorrect fileSize on numerous servers
you can see a drop in fileSize on 12/28 in MMS with
no corresponding drop in the other size metrics.
Dec 30 2013 02:19:00 AM: "[do] these databases have anything in common,
especially with the xxxx DB from yesterday?"
Dec 30 2013 02:22:00 AM: Nothing comes to mind ... that DBs have in common
Sequence of Events
Dec 30 2013 02:28:00 AM: We notice this all happening at the same time.
We think something might be deleting data files.
Dec 30 2013 02:31:00 AM: "Something is deleting data files outside mongod?"
Dec 30 2013 02:57:00 AM: Yes. We deleted actual db files on both the primaries
and secondaries on the 28th.
88. Dec 29 2013 04:44:00 PM: "there are data files viewed as missing by `mongod`"
Dec 30 2013 02:12:00 AM: Seeing incorrect fileSize on numerous servers
you can see a drop in fileSize on 12/28 in MMS with
no corresponding drop in the other size metrics.
Dec 30 2013 02:19:00 AM: "[do] these databases have anything in common,
especially with the xxxx DB from yesterday?"
Dec 30 2013 02:22:00 AM: Nothing comes to mind ... that DBs have in common
Sequence of Events
Dec 28 2013: Someone notices that they are low on disk space and as a solution
writes a shell script that finds every file on every disk on every
server that's bigger than 1GB in size and which hasn't been
accessed in >3 days.
And it then deletes it.
This script ran on every server deleting every database file bigger than 1GB
which hasn't been accessed in the previous few days...
Dec 30 2013 02:28:00 AM: We notice this all happening at the same time.
We think something might be deleting data files.
Dec 30 2013 02:31:00 AM: "Something is deleting data files outside mongod?"
Dec 30 2013 02:57:00 AM: Yes. We deleted actual db files on both the primaries
and secondaries on the 28th.
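The cleanup script itself was not shown in the ticket; the sketch below reconstructs its selection logic from the slide's description (files over 1GB, not accessed in 3+ days), but only reports candidates rather than deleting anything:

```python
import os
import time

def stale_big_files(root, min_bytes=1 << 30, min_idle_days=3):
    """Return files under root larger than min_bytes whose last access
    time is older than min_idle_days. This mirrors the selection logic
    of the script described on the slide -- it REPORTS, it never deletes."""
    cutoff = time.time() - min_idle_days * 86400
    hits = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or unreadable; skip it
            if st.st_size > min_bytes and st.st_atime < cutoff:
                hits.append(path)
    return hits

# The original script deleted every hit. On these servers that meant
# every database file over 1GB that mongod held memory-mapped but had
# not "accessed" (by atime) in the previous few days.
```

The failure mode is worth spelling out: atime is a terrible proxy for "unused" on a database host, since a memory-mapped file can be hot for days without its access time ever being updated.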
89. Guess the Outcome!
Ultimately, NO data was lost.
• The running `mongod` processes held the "deleted" files open, so the OS never actually reclaimed them.
• A running `mongod` can recreate all data files via db.repair().
• BUT... there was no disk space left for db.repair().
• Luckily, an extra server or two were "found", allowing a rotating re-sync of a new secondary in each replica set.
Again: no data was lost. All data was fully and successfully recovered.