MongoDB World 2018: Enterprise Security in the Cloud (MongoDB)
This document discusses enterprise security in the cloud. It covers identity and access controls, auditing, and encryption. For identity and access, it describes secure access controls like multi-factor authentication, role-based access controls, and dedicated virtual private clouds (VPCs). For auditing, it outlines activity logs, monitoring and alerts, and a real-time activity panel. For encryption, it discusses key management, different encryption service levels, and key service differences between AWS, GCP and Azure.
MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB... (MongoDB)
MongoDB can be configured to meet the requirements of active-active applications across multiple data centers. There are three main deployment patterns: 1) active-passive with one data center as primary, 2) partitioned databases with each data center owning a partition, and 3) multi-master with each data center acting as a master. The document discusses how to tune MongoDB for performance, consistency, availability, and durability using features like sharding, read preference, write concern, and causal consistency.
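A minimal sketch (PyMongo, assuming a replica set named "rs0" reachable on localhost) of the tuning knobs this abstract mentions: write concern controls durability, read preference controls which member serves reads, and a causally consistent session orders reads after acknowledged writes.

```python
from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.get_database("app")

# Durable writes: wait for a majority of replica-set members to acknowledge.
orders = db.get_collection(
    "orders", write_concern=WriteConcern(w="majority", j=True))
orders.insert_one({"sku": "abc", "qty": 1})

# Latency-tolerant reads: let the nearest member (often a secondary) answer.
nearby = db.get_collection("orders", read_preference=ReadPreference.NEAREST)
print(nearby.count_documents({"sku": "abc"}))

# Causal consistency: reads in the session observe the session's own writes.
with client.start_session(causal_consistency=True) as session:
    orders.insert_one({"sku": "def", "qty": 2}, session=session)
    assert orders.find_one({"sku": "def"}, session=session) is not None
```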
Streaming in Practice - Putting Apache Kafka in Production (confluent)
This presentation focuses on how to integrate Apache Kafka and its surrounding components into an enterprise environment, and what you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production
The document discusses strategies for managing replication latency in a distributed database system. It provides examples of average and maximum replication latencies between different database nodes. It also summarizes different approaches tried to reliably clear caches when data is updated, including using a multicast notification bus, database queues, and splitting data functionally across nodes.
This document summarizes the key aspects of a public cloud archive storage solution. It offers affordable and unlimited storage using standard transfer protocols. Data is stored using erasure coding for redundancy and fault tolerance. Accessing archived data takes 10 minutes to 12 hours depending on previous access patterns, with faster access for inactive archives. The solution uses middleware to handle sealing and unsealing archives along with tracking access patterns to regulate retrieval times.
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ... (confluent)
We recently learned about “Fault Tree Analysis” and decided to apply the technique to bulletproof our Apache Kafka deployments. In this talk, learn about fault tree analysis and what you should focus on to make your Apache Kafka clusters resilient. This talk should provide a framework for answering the following common questions a Kafka operator or user might have (a configuration sketch follows the list):
- What guarantees can I promise my users?
- What should my replication factor be?
- What should the ISR setting be?
- Should I use RAID or not?
- Should I use external storage such as EBS, or local disks?
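A hedged sketch (confluent-kafka's AdminClient, assuming a broker on localhost:9092) of settings the replication and ISR questions above lead to: replication_factor=3 tolerates the loss of one broker while another is under maintenance, and min.insync.replicas=2 lets acks=all writes survive one failure without becoming unavailable. The topic name is illustrative.

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
topic = NewTopic(
    "payments",                      # hypothetical topic name
    num_partitions=6,
    replication_factor=3,
    config={"min.insync.replicas": "2"},
)
for name, future in admin.create_topics([topic]).items():
    future.result()                  # raises if creation failed
    print(f"created topic {name}")
```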
Instrumenting and Scaling Databases with Envoy (Daniel Hochman)
Every request to a database at Lyft is proxied by Envoy, providing complete visibility into the L3/L4 aspects of database interactions. This allows engineers to easily visualize changes to a database's load profile and pinpoint the root cause if necessary. Lyft has also open-sourced codecs for MongoDB, DynamoDB, and Redis. Protocol codecs in combination with custom filters yield benefits ranging from operation-level observability to horizontal scalability via sharding. Using Envoy for this purpose means that enhancements are implemented once and usable across a polyglot stack. The talk demonstrates Envoy's utility beyond traditional RPC service interactions in the network.
The document describes the Google File System (GFS), which was developed by Google to handle its large-scale distributed data and storage needs. GFS uses a master-slave architecture with the master managing metadata and chunk servers storing file data in 64MB chunks that are replicated across machines. It is designed for high reliability and scalability handling failures through replication and fast recovery. Measurements show it can deliver high throughput to many concurrent readers and writers.
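A minimal sketch of the client-side arithmetic the GFS design implies: with fixed 64 MB chunks, a byte offset within a file maps to a chunk index, which the master then resolves to replica locations. The function name is illustrative, not Google's API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS chunk size

def chunk_coordinates(offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset in chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# Reading byte 200_000_000 touches chunk 2, about 62.7 MB into that chunk.
idx, within = chunk_coordinates(200_000_000)
print(idx, within)  # -> 2 65782272
```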
ProxySQL provides native support for high availability solutions like PXC, InnoDB Cluster, and regular MySQL replication. It can monitor the health of nodes and redirect traffic away from unavailable or stale nodes, improving availability. It supports various topologies out of the box through host groups, health checks, and failure detection. ProxySQL helps implement robust HA architectures by integrating these functions and allowing automatic traffic redirection based on node status.
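A hedged sketch of the host-group mechanism described above: ProxySQL is configured through its MySQL-protocol admin interface (port 6032 by default), so plain SQL over any MySQL client works. Hostnames and credentials below are illustrative.

```python
import pymysql

# Connect to ProxySQL's admin interface, not to MySQL itself.
admin = pymysql.connect(host="127.0.0.1", port=6032,
                        user="admin", password="admin", autocommit=True)
with admin.cursor() as cur:
    # Writer hostgroup 0, reader hostgroup 1: a common HA layout.
    cur.execute("INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
                "VALUES (0, 'db-primary', 3306)")
    cur.execute("INSERT INTO mysql_servers (hostgroup_id, hostname, port) "
                "VALUES (1, 'db-replica-1', 3306)")
    # Route SELECTs to the reader hostgroup.
    cur.execute("INSERT INTO mysql_query_rules "
                "(rule_id, active, match_digest, destination_hostgroup, apply) "
                "VALUES (1, 1, '^SELECT', 1, 1)")
    cur.execute("LOAD MYSQL SERVERS TO RUNTIME")
    cur.execute("LOAD MYSQL QUERY RULES TO RUNTIME")
    cur.execute("SAVE MYSQL SERVERS TO DISK")
```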
Dataservices: Processing (Big) Data the Microservice Way (QAware GmbH)
Apache Big Data 2017, Miami (Florida/USA): Talk by Josef Adersberger (@adersberger, CTO at QAware)
Abstract:
We see a big data processing pattern emerging that uses the microservice approach to build an integrated, flexible, and distributed system of data processing tasks. We call this the Dataservice pattern. In this presentation we'll introduce Dataservices: their basic concepts, the technologies typically in use (like Kubernetes, Kafka, Cassandra and Spring), and some architectures from real life.
Exactly-once Data Processing with Kafka Streams - July 27, 2017 (confluent)
This document discusses exactly-once processing in stream processing systems. It begins by defining exactly-once processing and describing some of the challenges in achieving it. It then outlines three options for achieving exactly-once processing with Kafka: at-least-once processing with deduplication, using Kafka's idempotent producer and transactions, and using Kafka Streams. The document focuses on Kafka Streams, describing how it provides exactly-once guarantees through transactional processing of data in batches across the processing topology.
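A minimal sketch (confluent-kafka, assuming a broker on localhost:9092) of the transactional-producer building block that the idempotent producer/transactions option rests on: messages commit atomically, and downstream read_committed consumers never see aborted writes. Topic and transactional id are illustrative.

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "order-processor-1",  # stable id enables fencing
    "enable.idempotence": True,               # no duplicates on retry
})
producer.init_transactions()
producer.begin_transaction()
try:
    producer.produce("orders-enriched", key="42", value=b"...")
    producer.commit_transaction()             # flushes and commits atomically
except Exception:
    producer.abort_transaction()
    raise
```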
Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn... (HostedbyConfluent)
Deploying Kafka to support multiple teams or even an entire company has many benefits. It reduces operational costs, simplifies onboarding of new applications as your adoption grows, and consolidates all your data in one place. However, it also makes the applications sharing the cluster vulnerable to any one of them (or a few) taking all cluster resources. The combined cluster load also becomes less predictable, increasing the risk of overloading the cluster and of data unavailability.
In this talk, we will describe how to use the quota framework in Apache Kafka to ensure that a misconfigured client or an unexpected increase in client load does not monopolize broker resources. You will get a deeper understanding of bandwidth and request quotas, how they get enforced, and gain intuition for setting the limits for your use cases.
While quotas limit individual applications, there must be enough cluster capacity to support the combined application load. Onboarding new applications or scaling the usage of existing applications may require manual quota adjustments and upfront capacity planning to ensure high availability.
We will describe the steps we took toward solving this problem in Confluent Cloud, where we must immediately support unpredictable load with high availability. We implemented a custom broker quota plugin (KIP-257) to replace static per-broker quota allocation with dynamic and self-tuning quotas based on the available capacity (which we also detect dynamically). By learning from our journey, you will gain more insight into the relevant problems and techniques to address them.
Lifting the Blinds: Monitoring Windows Server 2012 (Datadog)
Operating systems monitor resources continuously in order to effectively schedule processes.
In this webinar, Evan Mouzakitis (Datadog) discusses how to get operational data from Windows Server 2012 using a variety of native tools.
Lyft has open-sourced several infrastructure projects, including Confidant for securely storing secrets, Discovery for service registration and lookup, Ratelimit for rate limiting requests, and Envoy as an edge and internal proxy. Envoy handles all service-to-service communication at Lyft and provides observability, load balancing, and integration with other services. These projects help Lyft build and operate a microservices architecture at large scale.
Red Hat Storage server replication: past, present, & future (Taline Felix)
Red Hat Storage provides synchronous and asynchronous replication capabilities both locally and remotely. Local replication uses a leader-based approach called Near Sync Replication (NSR) to improve bandwidth usage and avoid split-brain scenarios. Remote asynchronous geo-replication continuously and incrementally replicates data across sites using distributed change detection via consumable journals and configurable data synchronization methods. Future plans include replicating snapshots, supporting multi-master replication, and integrating with object storage targets. Red Hat Storage also features erasure coding, snapshots, deduplication, compression, checksums, and tiering to different storage media.
MongoDB .local Bengaluru 2019: Becoming an Ops Manager Backup Superhero! (MongoDB)
Oh no! My backups aren't progressing! If something happens in production now, and I don't have current backups, I'll be out of a job for sure!
If these words resonate with you, don’t worry; you’re not the only one! Backup issues are one of the most common topics we deal with in Technical Services. In this talk, we will go through the backup flow, talk about where things might go wrong, and the symptoms you will see in the logs and the UI. We will also talk about other commands you can run to confirm the diagnosis, and how support can assist if you’re still stuck. Finally, we will talk about the new backup architecture in 4.2 and how it simplifies some of these concerns. This session is suitable for those with all levels of Ops Manager experience, but attendees should have a basic understanding of MongoDB’s replication process before attending this session.
After this talk, you will have leveled up your backup superpowers, and can swoop in to save your job (and the day)!
In the world of big data we need to build services that can collect massive amounts of data, store it, and pass it on for processing and analysis. However, building manageable, reliable services that are scalable and cost-effective is not an easy task. The choice of ecosystem, frameworks, and programming language, as well as applying solid engineering principles, is also crucial to achieving this goal.
I will share our journey and insights from rebuilding a cloud service in the Linux ecosystem using Scala, Akka Actors, and Aerospike DB, at the end of which we gained a 10-fold improvement in server utilization with a much lighter, more stable, and more reliable system that handles tens of millions of requests per hour.
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog (Redis Labs)
Think you have big data? What about high-availability requirements? At Datadog we process billions of data points every day, including metrics and events, as we help the world monitor their applications and infrastructure. Being the world's monitoring system is a big responsibility, and thanks to Redis we are up to the task. Join us as we discuss how the Datadog team monitors and scales Redis to power our SaaS-based monitoring offering. We will discuss our usage and deployment patterns, as well as dive into monitoring best practices for production Redis workloads.
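A minimal sketch (redis-py, assuming a Redis server on localhost:6379) of the kind of production health check such monitoring builds on: pull a few key INFO fields and flag memory pressure or a poor cache hit rate.

```python
import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()  # one round trip; safe to poll frequently

used = info["used_memory"]
maxmem = info.get("maxmemory", 0)
if maxmem and used / maxmem > 0.9:
    print(f"warning: using {used}/{maxmem} bytes of memory")

hits, misses = info["keyspace_hits"], info["keyspace_misses"]
if hits + misses:
    print(f"cache hit rate: {hits / (hits + misses):.1%}")
```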
Our new product (Clicktale Experience Cloud) requires processing up to half a million messages per second, sessionizing each user's journey through a web page. In this talk we'll discuss how we achieved that using Spark's stateful streaming capabilities with only a few servers in production, the challenges we faced, and how we solved them. We'll also take a look at Spark 2.2 (the brand-new version) and its new stateful aggregation, and talk about how we used it to improve performance significantly.
Apache Incubator Samza: Stream Processing at LinkedIn (Chris Riccomini)
This is the slide deck that was presented at the Hadoop Users Group at LinkedIn on November 5, 2013.
The presentation covers what Samza is, why we built it, and how it works.
This document discusses the benefits of using Python and Cassandra together. It provides an overview of using virtual environments in Python to isolate projects. It also summarizes the Python driver for Cassandra, which allows connecting to Cassandra clusters and executing queries, and cqlengine, an object mapper that simplifies working with Cassandra from Python.
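A minimal sketch (DataStax cassandra-driver, assuming a node on localhost and an existing keyspace "demo") of the two layers this summary mentions: the core driver for raw CQL, and cqlengine as an object mapper on top. The model and keyspace names are illustrative.

```python
import os

from cassandra.cluster import Cluster
from cassandra.cqlengine import columns, connection, management
from cassandra.cqlengine.models import Model

# Core driver: connect and run a query.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")
rows = session.execute("SELECT release_version FROM system.local")
print(rows.one().release_version)

# cqlengine: declare a model and sync it to a table.
class User(Model):
    __keyspace__ = "demo"
    username = columns.Text(primary_key=True)
    email = columns.Text()

os.environ.setdefault("CQLENG_ALLOW_SCHEMA_MANAGEMENT", "1")
connection.setup(["127.0.0.1"], "demo")
management.sync_table(User)
User.create(username="ada", email="ada@example.com")
print(User.get(username="ada").email)
```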
The document describes Google File System (GFS), which was designed by Google to store and manage large amounts of data across thousands of commodity servers. GFS consists of a master server that manages metadata and namespace, and chunkservers that store file data blocks. The master monitors chunkservers and maintains replication of data blocks for fault tolerance. GFS uses a simple design to allow it to scale incrementally with growth while providing high reliability and availability through replication and fast recovery from failures.
Best practice-high availability-solution-geo-distributed-final (Marco Tusa)
Nowadays, implementing different grades of business continuity for the data layer is a common requirement. When designing architectures that include MySQL as a data layer, we have different options to cover the required target. Nevertheless, we still see a lot of confusion when it comes to properly covering concepts such as High Availability and Disaster Recovery, confusion that often leads to improper architecture design and wrong solution implementation. This presentation aims to remove that confusion and provide clear guidelines for designing a robust, flexible, resilient architecture for your data layer.
Kafka and Storm - event processing in realtime (Guido Schmutz)
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
MongoDB .local Bengaluru 2019: Distributed Transactions: With Great Power Com... (MongoDB)
A year ago we launched replica-set transactions in MongoDB 4.0. We've now expanded transactions to span across shards, making development against MongoDB even easier. Snapshot isolation, write atomicity, distributed commit – we'll touch on it all. You'll learn all you need to know about distributed transactions before you push to prod.
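A minimal sketch (PyMongo, assuming a replica set or sharded cluster on localhost that supports transactions, i.e. MongoDB 4.0+/4.2+) of the multi-document transactions the talk covers: both writes commit atomically with snapshot isolation, or neither does. Collection and field names are illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client.bank.accounts

with client.start_session() as session:
    with session.start_transaction():
        accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}},
                            session=session)
        accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}},
                            session=session)
        # Leaving the block commits; an exception aborts the transaction.
```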
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise (Patrick McFadin)
Wait! Back away from the Cassandra secondary index. It's OK for some use cases, but it's not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can't model that in C*, even after watching all of Patrick McFadin's videos. What do I do?" The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart's content. Take our hand. We will show you how.
This document provides an overview of the Google File System (GFS). It describes the key components of GFS including the master server, chunkservers, and clients. The master manages metadata like file namespaces and chunk mappings. Chunkservers store file data in 64MB chunks that are replicated across servers. Clients read and write chunks through the master and chunkservers. GFS provides high throughput and fault tolerance for Google's massive data storage and analysis needs.
The Easiest Way to Configure Security for Clients AND Servers (Dani Traphagen... (confluent)
In this baller talk, we will be addressing the elephant in the room that no one ever wants to look at or talk about: security. We generally never want to talk about configuring security, because doing so exposes our risk of penetration and potential for exploitation. However, this leads to a lot of confusion around proper Kafka security best practices and how to appropriately lock down a cluster when you are starting out. In this talk we will demystify the elephant in the room without deconstructing it limb by limb. We will give you a notion of how to configure the following for BOTH clients and servers:
- TLS or Kerberos authentication
- Network traffic encryption via TLS
- Authorization via access control lists (ACLs)
We will also demonstrate the above with a GitHub repo you can try out for yourself. Lastly, we will present a reference implementation of OAuth if that suits your fancy. All in all, you should walk away with a pretty decent understanding of the necessary aspects of a secure Kafka environment.
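A hedged sketch (confluent-kafka) of the client side of the checklist above: encrypt traffic with TLS and authenticate with SASL/Kerberos. Broker address, principal, topic, and file paths are illustrative.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker1:9093",
    "group.id": "audit-reader",
    "security.protocol": "SASL_SSL",          # TLS transport + SASL auth
    "ssl.ca.location": "/etc/kafka/ca.pem",   # CA that signed the brokers
    "sasl.mechanism": "GSSAPI",               # Kerberos
    "sasl.kerberos.service.name": "kafka",
    "sasl.kerberos.principal": "reader@EXAMPLE.COM",
    "sasl.kerberos.keytab": "/etc/kafka/reader.keytab",
})
consumer.subscribe(["audit-events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```

Server-side, the matching pieces are broker listeners on SASL_SSL and ACLs granting this principal read access to the topic and group.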
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In... (Amazon Web Services)
“Infrastructure as Code” has changed not only how we think about configuring infrastructure, but the infrastructure itself. AWS has been at the core of this movement, enabling your infrastructure teams to benefit from software engineering best practices such as CI/CD, automated testing, and repeatable deployments. Now that you have mastered the art of managing your infrastructure as code, it's time to leverage these same lessons for monitoring and metrics. In this session, we dive into how you can leverage tooling such as AWS, Terraform, and Datadog to programmatically define your monitoring, so that you can scale your organizational observability along with your infrastructure and attain consistency from local development all the way through production.
Session sponsored by Datadog, Inc.
The document describes a proposed grid computing framework that aims to make grid computing easier to deploy, use, and maintain. The framework would accept computational problems from users, distribute tasks to client machines based on dependencies and load balancing, collect and compile results from clients, and present outputs to the user. The framework is intended to address concerns with existing grid middleware being complicated and not accessible to all, and will be open source, Linux-based, and work on a moderately sized local area network.
Talk Python To Me: Stream Processing in your favourite Language with Beam on ... (Aljoscha Krettek)
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
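A minimal sketch (Apache Beam's Python SDK) of a pipeline that can run on Flink through Beam's portability layer; the word-count logic is illustrative, and the Flink master address is an assumption.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",
    "--environment_type=LOOPBACK",   # run workers in this process for dev
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["to be or not to be"])
     | beam.FlatMap(str.split)
     | beam.Map(lambda w: (w, 1))
     | beam.CombinePerKey(sum)
     | beam.Map(print))
```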
Learn the concepts of PSR-7 middleware with Zend Expressive and how your application could be developed from scratch by adapting those concepts with a new mindset. You'll see the different approaches, their advantages and disadvantages, and how this paradigm contrasts with other, more conventional paradigms.
Hunting for APT in network logs workshop presentation (OlehLevytskyi1)
Nonamecon 2021 presentation.
Network logs are one of the most efficient sources for hunting adversaries, but building good analytics capabilities requires a deep understanding of benign activity and attacker behavior. This training focuses on detecting real-world attacks, tools, and scenarios from the past year.
The training is highly interactive and strikes a good balance between theory and plenty of hands-on exercises, helping students get used to the detection-engineering methodology and preparing them to start implementing it at their organizations.
Presentation topics:
- Netflow Mitre Matrix view
- Full packet captures vs Netflow
- Zeek
- Zeek packages
- RDP initial compromise
- Empire PowerShell and Cobalt Strike, or what to expect after initial loader execution
- Empire PowerShell initial connection
- Beaconing: RITA
- Scanning detection
- Internal enumeration detection
- Lateral movement techniques widely used
- Kerberos attacks
- PSExec and fileless ways of delivering payloads in the network
- Zerologon detection
- Data exfiltration
- Data exfiltration over C2 channel
- Data exfiltration using time size limits (data chunks)
- DNS exfiltration
- Detecting ransomware in your network
- Real incident investigation
Authors:
Oleh Levytskyi (https://twitter.com/LeOleg97)
Bogdan Vennyk (https://twitter.com/bogdanvennyk)
Serverless computing is a cloud-native paradigm where developers build and run applications without managing infrastructure. It involves short-running, stateless functions that are triggered by events. With serverless, applications automatically scale up or down based on usage, and customers only pay for the compute time used. The document discusses serverless offerings from various cloud providers, demos serverless architectures using Docker containers, and notes serverless is well-suited for event-driven workloads like mobile backends and IoT but not long-running stateful processes.
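A minimal sketch of the programming model this paragraph describes: a short-lived, stateless function invoked once per event, in the style of an AWS Lambda handler. The event shape below is an assumption.

```python
import json

def handler(event, context):
    """Triggered per request; holds no state between invocations."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local smoke test; in production the platform invokes handler() directly.
if __name__ == "__main__":
    print(handler({"name": "serverless"}, context=None))
```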
This chapter discusses Spark Streaming and provides an overview of its key concepts. It describes the architecture and abstractions in Spark Streaming including transformations on data streams. It also covers input sources, output operations, fault tolerance mechanisms, and performance considerations for Spark Streaming applications. The chapter concludes by noting how knowledge from Spark can be applied to streaming and real-time applications.
The document provides an overview of the Mastering Node.js course from Edureka. The course objectives include understanding Node.js development basics, using Node's package manager npm, developing server-side applications, creating RESTful APIs, and testing and debugging code. The document also discusses use cases of Node.js in areas like server-side web applications, high scalability, and low memory consumption. It covers basics of Node.js like building a simple web server and using Socket.io for real-time communication. Node.js developers can create RESTful APIs and must learn to debug and test their code.
Node.js uses JavaScript - a language known to millions of developers worldwide - thus giving it a much lower learning curve even for complete beginners. Using Node.js you can build simple Command Line programs or complex enterprise level web applications with equal ease. Node.js is an event-driven, server-side, asynchronous development platform with lightning speed execution. Node.js helps you to code the most complex functionalities in just a few lines of code.
NodeJS: Communication and Round Robin Way (Edureka!)
The document provides an overview of the Mastering Node.js course offered by Edureka. It outlines the course objectives which include introducing Node.js, NPM, use cases, network communication, two-way communication using Socket.io, and cluster round robin load balancing. It also lists topics that will be covered in the course modules and highlights features like live online classes, class recordings, 24/7 support, quizzes, projects, and a verifiable certificate.
This document outlines the topics covered in an Edureka course on MongoDB. The course contains 8 modules that cover MongoDB concepts like NoSQL, CRUD operations, schema design, administration, scaling, and interfacing MongoDB with other languages. Each module is further broken down into specific topics. The document provides examples of questions and answers from the course related to MongoDB concepts like typical use cases, caching, differences between mongo and mongos, write concerns, and more. Slide examples are included to illustrate MongoDB concepts like CRUD operations, queries, indexes, and distributed architectures.
Presented by: Jason Mimick
Technical Director, MongoDB
MongoDB Ops Manager is an enterprise-grade end-to-end database management, monitoring, and backup solution. Kubernetes has clearly won the orchestration-platform "wars". In this session we'll take a deep dive on how you can leverage both these technologies to host your MongoDB deployments within your Kubernetes infrastructure whether that's OpenShift, PKS, Azure AKS, or just upstream. This talk will review the core technologies, such as containers, Kubernetes, and MongoDB Ops Manager. You'll also have a chance to see real-live demos of MongoDB running on Kubernetes and managed with MongoDB Ops Manager with the MongoDB Enterprise Kubernetes Operator.
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce... (Flink Forward)
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
This document summarizes a knowledge sharing session on Javascript sourcemaps and Angular compilation. It discusses how sourcemaps allow minified code to be mapped back to original source code for debugging purposes. It also explains the different stages of Angular compilation including initialization, analysis, resolution, type checking and emitting. The key differences between just-in-time (JIT) compilation and ahead-of-time (AOT) compilation are outlined, noting that AOT produces smaller bundles but requires compilation during the build. The advantages of sourcemaps and AOT for production use are highlighted.
The document describes an application with a pipe-and-filter architecture pattern where sensor data flows through multiple components that each transform the data before passing it to the next component and finally to a modeling and visualization unit. It then asks questions about software architecture patterns and styles like pipe-and-filter, lambda architecture, decorator pattern, Conway's law, architecture drift, REST, event sourcing, and recommends architecture refactoring when dependency analysis finds numerous cycles and tangles.
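A minimal sketch of the pipe-and-filter style this scenario describes: each filter is a generator that transforms a stream of sensor readings and hands it to the next stage, ending at the modeling/visualization sink. All names and data are illustrative.

```python
def read_sensor():                      # source
    for raw in [3.0, -1.0, 4.5, 12.0]:
        yield raw

def drop_invalid(stream):               # filter 1: discard bad readings
    return (x for x in stream if x >= 0)

def smooth(stream, alpha=0.5):          # filter 2: exponential smoothing
    prev = None
    for x in stream:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        yield prev

def visualize(stream):                  # sink: stand-in for the viz unit
    for x in stream:
        print(f"value: {x:.2f}")

# Filters compose like pipes; each stage only knows its input stream.
visualize(smooth(drop_invalid(read_sensor())))
```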
- Spark Streaming allows processing of live data streams using Spark's batch processing engine by dividing streams into micro-batches.
- A Spark Streaming application consists of input streams, transformations on those streams such as maps and filters, and output operations. The application runs continuously processing each micro-batch.
- Key aspects of operationalizing Spark Streaming jobs include checkpointing to ensure fault tolerance, optimizing throughput by increasing parallelism, and debugging using the Spark UI; a minimal sketch of such an application follows below.
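A minimal sketch (PySpark's classic DStream API) of the structure the bullets above describe: a socket input stream, transformations, an output operation, and checkpointing for fault tolerance. Host, port, and paths are assumptions.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "WordCount")
ssc = StreamingContext(sc, batchDuration=5)   # 5-second micro-batches
ssc.checkpoint("/tmp/spark-checkpoint")       # fault-tolerance metadata

lines = ssc.socketTextStream("localhost", 9999)        # input stream
counts = (lines.flatMap(lambda l: l.split())           # transformations
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()                                        # output operation

ssc.start()                 # runs continuously, micro-batch by micro-batch
ssc.awaitTermination()
```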
This is a presentation I gave in Helsinki Node.js meetup (check http://helnode.io).
I have been implementing a realtime communication service with Ruby during my previous assignment. I used Rails and lower-level Ruby frameworks such as Sinatra and Resque workers.
I especially like Rack, since it enables building an efficient server stack. You can throw in middleware for throttling, authentication, and other tasks quite easily.
Ruby was a strong candidate for my current project as well. I consider Ruby code more readable than JavaScript. However, once I understood what ECMAScript 6 brings, I was sold on Node.js. Generators will enable implementations very similar to Ruby's Rack stack. In my opinion, JavaScript will finally become mature with JS1.7, as the "callback spaghetti" will soon be history.
Similar to MongoDB World 2018: What's Next? The Path to Sharded Transactions
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replica sets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migration from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases is briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an external service using the Open Service Broker API for MongoDB.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real-time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors, or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from working with regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factorsors in application performance (a bucket-pattern sketch follows this list).
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
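A hedged sketch (PyMongo) of one common time-series schema choice such talks compare: the bucket pattern, where one document holds an hour of readings per device instead of one document per reading, trading a slightly more complex write for far fewer documents and index entries. Collection and field names are illustrative.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
buckets = client.iot.sensor_buckets

def record(device_id: str, value: float) -> None:
    now = datetime.now(timezone.utc)
    hour = now.replace(minute=0, second=0, microsecond=0)
    # Upsert one bucket per (device, hour); append readings to its array.
    buckets.update_one(
        {"device_id": device_id, "hour": hour},
        {"$push": {"samples": {"t": now, "v": value}},
         "$inc": {"count": 1}},
        upsert=True,
    )

record("thermo-7", 21.4)
```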
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
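A hedged sketch (PyMongo with the pymongo[encryption] extra) of client-side field level encryption: the driver encrypts "ssn" before it leaves the application, using a data key from a local KMS. Key material and namespace names are illustrative and demo-only.

```python
import os

from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import Algorithm, ClientEncryption

kms_providers = {"local": {"key": os.urandom(96)}}  # demo-only master key
key_vault_ns = "encryption.__keyVault"

client = MongoClient("mongodb://localhost:27017")
ce = ClientEncryption(kms_providers, key_vault_ns, client, CodecOptions())
key_id = ce.create_data_key("local", key_alt_names=["demo-key"])

# Deterministic encryption keeps the field queryable by exact match.
encrypted_ssn = ce.encrypt(
    "123-45-6789",
    Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    key_id=key_id,
)
client.people.records.insert_one({"name": "Ada", "ssn": encrypted_ssn})
print(ce.decrypt(encrypted_ssn))  # -> 123-45-6789
ce.close()
```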
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
MongoDB Kubernetes Operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
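A minimal sketch (PyMongo) of the E-S-R guideline just described: equality field first, sort field second, range field last. Collection and field names are illustrative.

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client.shop.orders

# Equality (status), then Sort (order_date), then Range (qty).
orders.create_index([("status", ASCENDING),
                     ("order_date", ASCENDING),
                     ("qty", ASCENDING)])

# The equality match narrows the index bounds, the sort comes free from
# index order (non-blocking), and the range is scanned last.
cursor = (orders.find({"status": "shipped", "qty": {"$gte": 5}})
                .sort("order_date", ASCENDING))
```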
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
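A minimal sketch (PyMongo, MongoDB 4.2+) of the output capability mentioned above: $merge writes pipeline results into an existing collection, which is the building block for roll-ups and materialized views. Collection and field names are illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
client.shop.orders.aggregate([
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$customer_id",
                "total": {"$sum": "$amount"},
                "orders": {"$sum": 1}}},
    # Upsert per-customer roll-ups; re-running refreshes the view.
    {"$merge": {"into": "customer_totals",
                "on": "_id",
                "whenMatched": "replace",
                "whenNotMatched": "insert"}},
])
```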
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app...MongoDB
… to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: Upply: When Machine Learning...MongoDB
It has never been easier to order online and get delivery in under 48 hours, very often free of charge. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the supply chain world (routes, freight information, customs, etc.), but the value of this operational data remains largely untapped. By combining domain expertise with data science, Upply is redefining the fundamentals of the supply chain, enabling every player to overcome the market's volatility and inefficiency.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
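As a toy illustration of the underlying idea (a sketch of the concept, not DIAR's actual algorithm): a seed byte is "uninteresting" if removing it leaves the target's observed behavior unchanged, so it can be dropped. Here get_coverage is a stand-in for instrumented execution; the pretend target only reacts to the token b"MAGIC":

```python
# Toy stand-in for running the target under coverage instrumentation.
def get_coverage(data: bytes) -> frozenset:
    return frozenset(tok for tok in (b"MAGIC",) if tok in data)

def trim_seed(seed: bytes) -> bytes:
    baseline = get_coverage(seed)
    trimmed = bytearray(seed)
    i = 0
    while i < len(trimmed):
        candidate = trimmed[:i] + trimmed[i + 1:]
        if get_coverage(bytes(candidate)) == baseline:
            trimmed = candidate   # byte i was uninteresting: drop it
        else:
            i += 1                # byte i changed behavior: keep it
    return bytes(trimmed)

print(trim_seed(b"xx__MAGIC__yy"))  # -> b'MAGIC'
```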
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
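A hedged sketch of that flow is below; the bucket path, collection name, and schema are hypothetical, and it assumes pymilvus 2.x with a pre-created "documents" collection (an id field plus a vector field):

```python
from pyspark.sql import SparkSession
from pymilvus import connections, Collection

spark = SparkSession.builder.appName("embed-and-ingest").getOrCreate()

# Assume an upstream Spark job already attached an "embedding" column
# (e.g. via a model UDF); here we only read the results and ship them.
df = spark.read.parquet("s3://bucket/docs-with-embeddings/")
rows = df.select("doc_id", "embedding").collect()  # fine for small batches

connections.connect(host="localhost", port="19530")
collection = Collection("documents")
collection.insert([
    [r["doc_id"] for r in rows],     # primary keys
    [r["embedding"] for r in rows],  # float vectors
])
collection.flush()
```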
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT stylesheets and schemas such as XSD and Schematron. We will address the techniques and strategies used to create prompts for generating, explaining, or refactoring code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides from Nordic Testing Days, 6 June 2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their share of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We will explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course, we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to stay on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
9. [Diagram: the application and driver issue s.commit_transaction() through the query routers to Shards 1-3, each a replica set with a primary (P) and secondaries (S). Step 1: Select Coordinator. The transaction's write is still uncommitted on the shards.]
10. [Diagram, continued: Step 2: Prepare. The coordinator sends prepare to the participating shards, turning the uncommitted write into a prepared write.]
11. [Diagram, continued: Step 3: Commit. Once every shard has prepared, the coordinator commits and the prepared write becomes a committed write.]
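A hedged sketch of what drives this flow from the application side is below, using PyMongo; the cluster addresses, database, and documents are invented for illustration, and cross-shard transactions require MongoDB 4.2+. The single commit_transaction() call corresponds to the three pictured steps:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1,mongos2")  # via the query routers
accounts = client.bank.accounts

with client.start_session() as s:
    s.start_transaction()
    accounts.update_one({"_id": "alice"},
                        {"$inc": {"balance": -100}}, session=s)
    accounts.update_one({"_id": "bob"},
                        {"$inc": {"balance": 100}}, session=s)
    # Until here the writes sit uncommitted on their shards (slide 9).
    # commit_transaction() triggers the pictured sequence: select a
    # coordinator, prepare on every write shard, then commit.
    s.commit_transaction()
```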
13. [Diagram: Failure case 2: shard primary failure after Prepare. Uncommitted and prepared writes remain on the shards.]
14. [Diagram: Failure case 3: coordinator failure after Prepare. Uncommitted and prepared writes remain on the shards.]
15. Two Phase Commit Support: Coming in 4.2 [Diagram of replica-set primaries/secondaries and a query router, truncated.]