MongoDB World 2018: Replatforming: Switching to MongoDB for Flexibility, Scalability, Performance, and Options


Bazaarvoice switched from a legacy MySQL platform to MongoDB to gain flexibility, scalability, performance and simplicity. The document discusses the considerations and process of selecting MongoDB, prototyping advantages, and iteratively optimizing the new platform over time. Key issues addressed included high database connection counts, Lambda functions impacting performance, and inefficient rule executions. MongoDB and Atlas provided capabilities like point-in-time recovery that helped during data corruption issues. The replatforming effort reduced costs while improving agility through MongoDB's flexible schema and ability to iteratively optimize as needed.


Confidential and Proprietary. © 2018 Bazaarvoice, Inc.
Replatforming: Switching to MongoDB
For Flexibility, Scalability, Performance, and Simplicity
June 27, 2018
Ani Hammond
Sr Staff Software Engineer, Bazaarvoice

whoami
• Senior Staff Software Engineer and Tech Lead at Bazaarvoice
• Currently excited about serverless applications and distributed services
• Always excited about simple, intuitive products with a clear mission
GitHub: aniham
Email: ani.popova@gmail.com

What is Bazaarvoice?
• Based in Austin, TX; 700 employees worldwide; recently taken private
• Black Friday: 530M
• Cyber Monday: 470M
• 6000 pageviews / sec

What is Curations?
• Social collection
• Content enrichment
• Social outreach
• Targeted display

Legacy Platform
• $60,000/mo; each client adds a few hundred/month
• Monolithic stack: Python/Django, MySQL database
• Single-tenant: cluster per client, ~400 clusters
• Multi-tenant services: Social outreach, Display

Legacy Platform: Issues
• Maintainability
• Debugging
• Patching
• Releasing
• Managing data
• Cost
• Single-tenant clusters (RDS, EC2)
• Elasticsearch cluster
• ETL and eventual consistency
• Elasticsearch usability
• MySQL usability

Picking a new DB: Considerations
• Support different access patterns
• Able to scale as the client base and content volume grow
• Be our own database administrator
Service | Read Volume | Write Volume | Query Complexity | Fault Tolerance
Collect
Enrich
Display

Picking a new DB: Early dev and experiments
• Prototype advantages
  • Easy to use
  • Flexible schema
  • Easy to export and share
• Some early numbers
• A note about indexes
  • No indexes to start
  • Added as needed

New Platform
• $6,500/mo
• All services multi-tenant
• Display Service: constant high reads
• Enrichment Service: constant complex reads, simple updates
• Management Service: low complex reads, simple updates
• Collection Service: bursty high writes

DevOps
• First iteration: provision by hand
  • Cheap
  • Laborious
  • Not viable long term
• Second iteration: Cloud Manager
  • Cheap
  • Fast
  • Totally reasonable option
• Third (current) iteration: Atlas
  • Cheaper than dedicated DevOps
  • Fast
  • Insights into indexes, long-running queries, performance glitches, and more
  • Push-button upgrades and scaling

Indexing and Optimizations
• Start from zero
• Best guess on what works
• Iterate
• Kill the unused
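
The "kill the unused" step can be sketched with the per-index usage counters that MongoDB's `$indexStats` aggregation stage reports. This is a minimal sketch, assuming the stats documents have already been fetched (for example with `collection.aggregate([{"$indexStats": {}}])` in PyMongo). The helper name `unused_indexes`, the thresholds, and the sample index names are hypothetical and not from the talk.

```python
from datetime import datetime, timedelta


def unused_indexes(index_stats, now, min_age=timedelta(days=7), max_ops=0):
    """Return the names of indexes that look unused.

    index_stats: documents shaped like $indexStats output, i.e.
        {"name": str, "accesses": {"ops": int, "since": datetime}}
    An index is a drop candidate when it has served at most max_ops
    operations and its counter has been running for at least min_age.
    """
    candidates = []
    for stat in index_stats:
        if stat["name"] == "_id_":
            continue  # the mandatory _id index is never a candidate
        accesses = stat["accesses"]
        old_enough = now - accesses["since"] >= min_age
        if accesses["ops"] <= max_ops and old_enough:
            candidates.append(stat["name"])
    return sorted(candidates)


# Hypothetical sample data, for illustration only.
NOW = datetime(2018, 6, 27)
SAMPLE_STATS = [
    {"name": "_id_", "accesses": {"ops": 0, "since": datetime(2018, 6, 1)}},
    {"name": "client_id_1_created_at_-1",
     "accesses": {"ops": 91234, "since": datetime(2018, 6, 1)}},
    {"name": "legacy_tag_1", "accesses": {"ops": 0, "since": datetime(2018, 6, 1)}},
    {"name": "new_status_1", "accesses": {"ops": 0, "since": datetime(2018, 6, 26)}},
]

print(unused_indexes(SAMPLE_STATS, NOW))
```

The counters are kept per node and reset when a node restarts, so a real check would probably collect stats from every replica set member and require a reasonably long observation window before dropping anything.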

Problem 1: ...
• Manifestation (what happened): Database kept failing over; not responsive for long periods of time
• Detection (how did we find out): Board metrics indicated high response time; further digging indicated >30K DB connections
• Solution (how did we solve things): Connection pools
• Lesson (what did we learn): Failover is expensive
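
One way to reason about the connection-pool fix is as a budget: a long-lived client keeps at most `maxPoolSize` connections to each server it talks to, so capping the pool and sharing one client per process bounds the total. The sketch below is a rough estimate under that assumption, using the standard `maxPoolSize` and `maxIdleTimeMS` connection-string options. The host name, the process counts, and both helper names are hypothetical, and the estimate ignores the driver's monitoring connections.

```python
from urllib.parse import urlencode


def pooled_uri(hosts, max_pool_size, max_idle_ms):
    """Build a MongoDB connection string that caps the per-server pool."""
    options = urlencode({
        "maxPoolSize": max_pool_size,   # upper bound on pooled connections per server
        "maxIdleTimeMS": max_idle_ms,   # close connections idle longer than this
    })
    return f"mongodb://{','.join(hosts)}/?{options}"


def worst_case_connections(processes, max_pool_size, servers=1):
    """Upper bound on open connections if every pool fills up."""
    return processes * max_pool_size * servers


# Hypothetical deployment: 50 service processes, 3-member replica set.
uri = pooled_uri(["db0.example.net:27017"], max_pool_size=20, max_idle_ms=60000)
print(uri)
print(worst_case_connections(processes=50, max_pool_size=20, servers=3))
```

Comparing the bound against the cluster's connection limit before deploying is cheaper than discovering the limit through a failover.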

Problem 2: ...
• Manifestation (what happened): Display response time > 6 seconds
• Detection (how did we find out): Board metrics indicated DB queries taking 5 seconds; Atlas was indicating queries taking < 100ms
• Solution (how did we solve things): Discrepancy due to Lambdas’ connections to MongoDB; switched from Lambdas to Dockerized services
• Lesson (what did we learn): Don’t use Lambdas for constant workload
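
A plausible reading of the discrepancy is that the application-side timing included connection setup that the database never counted as query time. The sketch below uses a stand-in client (no real driver) to show how a long-lived process pays that setup once, while a handler that connects on every invocation pays it every time. `FakeClient`, `handle_per_call`, `handle_shared`, and `count_connects` are hypothetical names for illustration only.

```python
class FakeClient:
    """Stand-in for a database client; counts how many were created."""

    created = 0

    def __init__(self):
        # A real client would do DNS lookup, TLS handshake, and auth here.
        FakeClient.created += 1

    def query(self, key):
        return {"key": key}


def handle_per_call(key):
    """Connects on every invocation, like a handler with no reusable state."""
    return FakeClient().query(key)


_shared_client = None


def handle_shared(key):
    """Reuses one client for the life of the process."""
    global _shared_client
    if _shared_client is None:
        _shared_client = FakeClient()
    return _shared_client.query(key)


def count_connects(handler, calls):
    """Run the handler `calls` times and report how many clients it created."""
    FakeClient.created = 0
    for i in range(calls):
        handler(i)
    return FakeClient.created


print(count_connects(handle_per_call, 100))
```

In a containerized service the shared client lives as long as the process, which fits the slide's lesson about constant workloads.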

Problem 3: ...
• Manifestation (what happened): Rules taking 30 min to execute despite multiple indexes; DB ops taking minutes to complete
• Detection (how did we find out): Board metrics indicated poor rule execution time
• Solution (how did we solve things): Rules perform actions on matching content, but unmatched content was still scanned in subsequent executions; exclude scanning previously unmatched content
• Lesson (what did we learn): Don’t rescan if you don’t have to; don’t let your DB do all of your work for you
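
The talk does not say how previously scanned content was excluded. One common approach is a per-rule watermark: remember when the rule last ran and add a "newer than that" condition to the rule's query. The sketch below builds such a MongoDB filter document. The field name `updated_at`, the example criteria, and the helper name `incremental_filter` are assumptions, and the approach only works if content that changes also gets a fresh `updated_at`.

```python
from datetime import datetime


def incremental_filter(rule_criteria, last_run):
    """Combine a rule's criteria with a watermark on `updated_at`.

    rule_criteria: the rule's own MongoDB filter document.
    last_run: datetime of the rule's previous execution, or None for a
        first run, in which case everything is scanned.
    Returns a new filter; the rule's criteria are not modified.
    """
    query = dict(rule_criteria)
    if last_run is not None:
        # Skip content this rule has already evaluated.
        query["updated_at"] = {"$gt": last_run}
    return query


# Hypothetical rule: act on Instagram content with a given hashtag.
criteria = {"source": "instagram", "hashtags": "summer"}
print(incremental_filter(criteria, datetime(2018, 6, 26, 12, 0)))
```

An index that ends in `updated_at` would let the database jump straight to the new documents instead of walking everything the rule has already rejected.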

Problem 4: ...
• Manifestation (what happened): Bad code caused data corruption
• Detection (how did we find out): Client complaints hit us like a wet mop
• Solution (how did we solve things): Atlas point-in-time recovery; cherry-pick client enrichment actions since recovery (~12 hours); aggregations proved helpful to cross-reference what was changed when
• Lesson (what did we learn): Keep audits; have a solid recovery plan
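
The cross-referencing step might look like an aggregation over an audit trail: select everything a client did after the restore point, order it by time, and group it per content item so the actions can be reviewed and replayed. The sketch below only builds the pipeline. The collection layout (`client_id`, `content_id`, `action`, `at`), the sample values, and the helper name are hypothetical and not taken from the talk.

```python
from datetime import datetime


def actions_since(restore_point, client_id):
    """Aggregation pipeline listing a client's audited actions after a restore point."""
    return [
        # Only this client's actions recorded at or after the restore point.
        {"$match": {"client_id": client_id, "at": {"$gte": restore_point}}},
        # Oldest first, so the per-item action lists are built in time order.
        {"$sort": {"at": 1}},
        # One result document per content item, with its actions and a count.
        {"$group": {
            "_id": "$content_id",
            "actions": {"$push": {"action": "$action", "at": "$at"}},
            "count": {"$sum": 1},
        }},
    ]


pipeline = actions_since(datetime(2018, 5, 1, 3, 0), client_id="acme")
for stage in pipeline:
    print(stage)
```

With PyMongo the pipeline would be passed to `aggregate()` on the audit collection. This only works if audits are written in the first place, which is the slide's lesson.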

Anticipated Future Issues
• Scale and size/cost
• How we’ll address
  • Clean up unused content
  • Partial indexes
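
A partial index covers only the documents that match a filter expression, which keeps the index smaller than one over the whole collection. The sketch below builds the index specification and the `createIndexes` command document. The collection name `content`, the `status` field, and the helper names are hypothetical examples rather than the team's actual schema.

```python
def partial_index_spec(keys, name, filter_expression):
    """Index spec that only covers documents matching filter_expression."""
    return {
        "key": dict(keys),
        "name": name,
        "partialFilterExpression": filter_expression,
    }


def create_indexes_command(collection, specs):
    """Document for MongoDB's createIndexes database command."""
    return {"createIndexes": collection, "indexes": list(specs)}


# Hypothetical example: index only content that is still displayable.
spec = partial_index_spec(
    keys=[("client_id", 1), ("created_at", -1)],
    name="active_by_client",
    filter_expression={"status": "active"},
)
print(create_indexes_command("content", [spec]))
```

With PyMongo the same index could be created through `create_index(..., partialFilterExpression=...)`. Queries use a partial index only when their own filter guarantees a match on the filter expression, so in this example they would need to include `status: "active"`.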

Nice Side Effects
• Ability to give a read-only view to our services team
• An accidental test case for the rest of the company
• Many teams are using MongoDB that they provision and manage themselves
• No maintenance

Mentions that don’t need a separate slide
• The text index is not for everyone
• Hint is good
  • Even when you think MongoDB will pick the right index to use, it sometimes doesn’t
  • Doesn’t work with updates :(
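
Forcing an index on a read can be sketched as a `find` command document with a `hint` field, which accepts either an index name or its key pattern. The collection, filter, and index names below are hypothetical, and the helper is an illustration rather than anything shown in the talk.

```python
def find_with_hint(collection, query_filter, index):
    """Document for MongoDB's find command that forces a specific index.

    index: an index name (str) or an index key pattern (dict).
    """
    return {"find": collection, "filter": query_filter, "hint": index}


# Hypothetical query pinned to a hypothetical compound index.
command = find_with_hint(
    "content",
    {"client_id": "acme", "status": "active"},
    {"client_id": 1, "created_at": -1},
)
print(command)
```

In PyMongo the equivalent is calling `.hint(...)` on the cursor returned by `find()`. Comparing `explain()` output with and without the hint is a reasonable way to confirm the planner really was picking a worse index.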

Final thoughts
• Bottlenecks happen, services break, requirements change, products evolve
• What makes a good datastore is not infallibility, but the tools and ability to
  • Detect issues fast
  • Diagnose
  • Develop fast and recover
• Agility! Iteration!

Thanks
• Sebastian Wong, Kenney Wong, Frank Licea, Paul Durivage
• Praveen Kalamegham

Q & A




Principal: How Principal takes monitoring into the future to face new technol...

Dynatrace

Ryan Heard presented on Principal's enterprise monitoring strategy and transition from Dynatrace AppMon to Dynatrace. Principal has used AppMon since 2016 but is moving to Dynatrace to handle new technologies like containers and microservices, ease of maintenance with OneAgent, and advanced features like AI, Davis and automation. The plan is to start with select applications, address security needs, understand licensing, and provide training during the transition to leverage Dynatrace's full capabilities for Principal's evolving environment.

(SPOT205) 5 Lessons for Managing Massive IT Transformation Projects

Amazon Web Services

Choice Hotels is undertaking a multiyear, $20 million project to recreate our core business engines on AWS. In trying to approach this complex undertaking, we determined that the project itself is a system too. You can apply principles of good architecture and design work in how you approach the project structure and management. Come to this talk by Choice Hotels’ CTO to learn five key lessons and 20 concrete takeaways that you can implement today to help your AWS projects succeed.

Similar to MongoDB World 2018: Replatforming: Switching to MongoDB for Flexibility, Scalability, Performance, and Options (20)

Transforming Product Development in the Cloud (ENT306) - AWS re:Invent 2018

Cbt storage at scale use case deck ppt pdf

Cbt storage@scale use case deck (cl) (6.8.18)

ENT206 Product Development in the Cloud

Product Development in the Cloud

Product Development in the Cloud - ENT206 - Chicago AWS Summit

Enterprise DevOps: Begin with Production-Ready Migration (ENT217-R1) - AWS re...

An Agile Approach to Cloud Adoption

Webinar: How Partners Can Benefit from our New Program (EMEA)

MongoDB World 2019: From Transformation to Innovation: Lean-teams, Continuous...

Cbt storage at scale use case deck ppt

What’s New in OpenText Media Management 16.3?

Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018

Praxistaugliche notes strategien 4 cloud

Integrated Agile and DevOps: DevOps 2.0 and Beyond

Innovation and Startups Today

Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018

Leveraging the AWS Cloud Adoption Framework to Build Your Cloud Action Plan (...

Principal: How Principal takes monitoring into the future to face new technol...

(SPOT205) 5 Lessons for Managing Massive IT Transformation Projects

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas

MongoDB

This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replicasets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.

MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!

MongoDB

MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...

MongoDB

MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB

MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB

MongoDB

MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...

MongoDB

MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data

MongoDB

Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe. This talk covers: Common components of an IoT solution The challenges involved with managing time-series data in IoT applications Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance. How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.

MongoDB SoCal 2020: MongoDB Atlas Jump Start

MongoDB

MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]

MongoDB

Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.

MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2

MongoDB

Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch". This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.

MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...

MongoDB

MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!

MongoDB

MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset

MongoDB

When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.

MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart

MongoDB

MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...

MongoDB

The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++

MongoDB

MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...

MongoDB

The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.

MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive

MongoDB

MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business. This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.

MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang

MongoDB

Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms. How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms? In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.

MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...

MongoDB

MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

MongoDB

Il n’a jamais été aussi facile de commander en ligne et de se faire livrer en moins de 48h très souvent gratuitement. Cette simplicité d’usage cache un marché complexe de plus de 8000 milliards de $. La data est bien connu du monde de la Supply Chain (itinéraires, informations sur les marchandises, douanes,…), mais la valeur de ces données opérationnelles reste peu exploitée. En alliant expertise métier et Data Science, Upply redéfinit les fondamentaux de la Supply Chain en proposant à chacun des acteurs de surmonter la volatilité et l’inefficacité du marché.

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas

MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!

MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...

MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB

MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...

MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data

MongoDB SoCal 2020: MongoDB Atlas Jump Start

MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]

MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2

MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...

MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!

MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset

MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart

MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++

MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...

MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive

MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang

MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...

MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

Recently uploaded

AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf

Techgropse Pvt.Ltd.

In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.

National Security Agency - NSA mobile device best practices

Quotidiano Piemontese

Full-RAG: A modern architecture for hyper-personalization

Zilliz

Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.

Monitoring and Managing Anomaly Detection on OpenShift.pdf

Tosin Akinosho

Monitoring and Managing Anomaly Detection on OpenShift Overview Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices. Key Topics Covered 1. Introduction to Anomaly Detection - Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems. 2. Understanding Edge (IoT) - Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source. 3. What is ArgoCD? - Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices. 4. Deployment Using ArgoCD for Edge Devices - Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD. 5. Introduction to Apache Kafka and S3 - Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions. 6. Viewing Kafka Messages in the Data Lake - Learn how to view and analyze Kafka messages stored in a data lake for better insights. 7. What is Prometheus? - Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices. 8. Monitoring Application Metrics with Prometheus - Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system. 9. What is Camel K? - Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes. 10. Configuring Camel K Integrations for Data Pipelines - Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow. 11. What is a Jupyter Notebook? 
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text. 12. Jupyter Notebooks with Code Examples - Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.

Essentials of Automations: The Art of Triggers and Actions in FME

Safe Software

In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation. We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios. Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!

20240605 QFM017 Machine Intelligence Reading List May 2024

Matthew Sinclair

UI5 Controls simplified - UI5con2024 presentation

Wouter Lemaire

Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf

Malak Abu Hammad

Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers: * What is Vector Search? * Importance and benefits of vector search * Practical use cases across various industries * Step-by-step implementation guide * Live demos with code snippets * Enhancing LLM capabilities with vector search * Best practices and optimization strategies Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications. #MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology

Uni Systems Copilot event_05062024_C.Vlachos.pdf

Uni Systems S.M.S.A.

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers

akankshawande

Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack

shyamraj55

Removing Uninteresting Bytes in Software Fuzzing

Aftab Hussain

Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process. In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds. - These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/ Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit. In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing. van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.

20240609 QFM020 Irresponsible AI Reading List May 2024

Matthew Sinclair

Building Production Ready Search Pipelines with Spark and Milvus

Zilliz

Infrastructure Challenges in Scaling RAG with Custom AI models

Zilliz

Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.

Mariano G Tinti - Decoding SpaceX

Mariano Tinti

Presentation of the OECD Artificial Intelligence Review of Germany

innovationoecd

“I’m still / I’m still / Chaining from the Block”

Claudio Di Ciccio

みなさんこんにちはこれ何文字まで入るの？40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの？えこ...

名前です男

Recently uploaded (20)

AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf

National Security Agency - NSA mobile device best practices

Full-RAG: A modern architecture for hyper-personalization

Monitoring and Managing Anomaly Detection on OpenShift.pdf

Essentials of Automations: The Art of Triggers and Actions in FME

20240605 QFM017 Machine Intelligence Reading List May 2024

UI5 Controls simplified - UI5con2024 presentation

Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf

Uni Systems Copilot event_05062024_C.Vlachos.pdf

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers

Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack

Removing Uninteresting Bytes in Software Fuzzing

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...

20240609 QFM020 Irresponsible AI Reading List May 2024

Building Production Ready Search Pipelines with Spark and Milvus

Infrastructure Challenges in Scaling RAG with Custom AI models

Mariano G Tinti - Decoding SpaceX

Presentation of the OECD Artificial Intelligence Review of Germany

“I’m still / I’m still / Chaining from the Block”

MongoDB World 2018: Replatforming: Switching to MongoDB for Flexibility, Scalability, Performance, and Options

1. Confidential and Proprietary. © 2018 Bazaarvoice, Inc.
Replatforming: Switching to MongoDB for Flexibility, Scalability, Performance, and Simplicity
June 27, 2018
Ani Hammond, Sr Staff Software Engineer, Bazaarvoice

2. whoami
• Senior Staff Software Engineer and Tech Lead at Bazaarvoice
• Currently excited about serverless applications and distributed services
• Always excited about simple, intuitive products with a clear mission
Github: aniham
Email: ani.popova@gmail.com

3. What is Bazaarvoice?
• Based in Austin, TX; 700 employees worldwide; recently taken private
• Black Friday: 530M
• Cyber Monday: 470M
• 6000 pageviews / sec

5. Legacy Platform
• $60,000/mo; each client adds a few hundred/month
• Monolithic stack: Python/Django, MySQL database
• Single-tenant: cluster per client, ~400 clusters
• Multi-tenant services: social outreach, display

6. Legacy Platform: Issues
• Maintainability
• Debugging
• Patching
• Releasing
• Managing data
• Cost
• Single-tenant clusters (RDS, EC2)
• Elasticsearch cluster
• ETL and eventual consistency
• Elasticsearch usability
• MySQL usability

7. Picking a new DB: Considerations
• Support different access patterns
• Able to scale as the client base and content volume grow
• Be our own database administrator
Table (cell values not preserved): columns Service, Read Volume, Write Volume, Query Complexity, Fault Tolerance; rows Collect, Enrich, Display

8. Picking a new DB: Early dev and experiments
• Prototype advantages: easy to use, flexible schema, easy to export and share
• Some early numbers
• A note about indexes: no indexes to start, added as needed

9. New Platform
• $6,500/mo
• All services multi-tenant
• Display Service: constant high reads
• Enrichment Service: constant complex reads, simple updates
• Management Service: low complex reads, simple updates
• Collection Service: bursty high writes

10. DevOps
• First iteration, provision by hand: cheap, laborious, not viable long term
• Second iteration, Cloud Manager: cheap, fast, totally reasonable option
• Third (current) iteration, Atlas: cheaper than dedicated DevOps; fast; insights into indexes, long running queries, performance glitches, and more; push button upgrades and scaling

12. Problem 1: ...
• Manifestation (what happened): database kept failing over; not responsive for long periods of time
• Detection (how did we find out): board metrics indicated high response time; further digging indicated >30K DB connections
• Solution (how did we solve things): connection pools
• Lesson (what did we learn): failover is expensive
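The slide names connection pools as the fix but does not show how they were configured. As a rough illustration only, the sketch below shows the general idea in plain Python: a fixed number of connections is created once and borrowed per request, so the total connection count stays bounded no matter how many requests arrive. `FakeConnection` and `ConnectionPool` are hypothetical names invented for this sketch, not the team's code. Real MongoDB drivers generally ship their own pooling (PyMongo, for example, exposes a `maxPoolSize` option on `MongoClient`).

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # class-wide count of connections ever created

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, text):
        return f"result of {text}"


class ConnectionPool:
    """Fixed-size pool: callers borrow a connection and hand it back."""

    def __init__(self, max_size):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks (up to `timeout` seconds) when every connection is in use,
        # instead of opening yet another one against the database.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


pool = ConnectionPool(max_size=4)
results = []
for i in range(1000):
    with pool.connection() as conn:
        results.append(conn.query(f"request {i}"))

print(FakeConnection.opened, len(results))
```

A thousand requests are served here with only four connections ever opened, which is the property that keeps a database away from connection counts like the >30K mentioned on the slide.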

13. Problem 2: ...
• Manifestation (what happened): Display response time > 6 seconds
• Detection (how did we find out): board metrics indicated DB queries taking 5 seconds; Atlas was indicating queries taking < 100ms
• Solution (how did we solve things): discrepancy due to Lambdas’ connections to MongoDB; switched from Lambdas to Dockerized services
• Lesson (what did we learn): don’t use Lambdas for constant workload

14. Problem 3: ...
• Manifestation (what happened): rules taking 30 min to execute despite multiple indexes; DB ops taking minutes to complete
• Detection (how did we find out): board metrics indicated poor rule execution time
• Solution (how did we solve things): rules perform actions on matching content, but unmatched content was still scanned in subsequent executions; exclude scanning previously unmatched content
• Lesson (what did we learn): don’t rescan if you don’t have to; don’t let your DB do all of your work for you
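One way to picture "exclude scanning previously unmatched content" is to record, per content item, which rules have already evaluated it, and to skip those items on later runs. The sketch below is an in-memory illustration under that assumption. The `evaluated_by` field, the `run_rule` helper, and the sample rule are all hypothetical, and the slides do not say how the exclusion was actually implemented. In a database the same check would presumably move into the query filter, so already-evaluated documents are never fetched at all.

```python
def run_rule(rule_id, predicate, action, content):
    """Run one rule, skipping items this rule has already evaluated.

    Returns the number of items actually examined in this execution.
    """
    examined = 0
    for item in content:
        if rule_id in item["evaluated_by"]:
            continue  # matched or not, this rule has seen the item before
        examined += 1
        item["evaluated_by"].add(rule_id)
        if predicate(item):
            action(item)
    return examined


def has_profanity(item):
    return "profanity" in item["text"]


def reject(item):
    item["status"] = "rejected"


content = [
    {"id": 1, "text": "love these boots", "evaluated_by": set(), "status": "new"},
    {"id": 2, "text": "some profanity here", "evaluated_by": set(), "status": "new"},
]

first = run_rule("reject-profanity", has_profanity, reject, content)

# New content arrives; only it should be examined on the next execution.
content.append({"id": 3, "text": "cute picture", "evaluated_by": set(), "status": "new"})
second = run_rule("reject-profanity", has_profanity, reject, content)

print(first, second)
```

The second execution examines one item instead of three, because the unmatched item from the first run is no longer rescanned.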

15. Problem 4: ...
• Manifestation (what happened): bad code caused data corruption
• Detection (how did we find out): client complaints hit us like a wet mop
• Solution (how did we solve things): Atlas point in time recovery; cherry pick client enrichment actions since recovery (~12 hours); aggregations proved helpful to cross-reference what was changed when
• Lesson (what did we learn): keep audits; have a solid recovery plan
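To make "keep audits" and "cherry pick client enrichment actions since recovery" concrete, here is a small hypothetical sketch: given an audit log of who changed what and when, select only the trusted client actions recorded after the restore point so they can be replayed on top of the recovered data. The record shape, the `source` field, the `actions_to_replay` helper, and the timestamps are all invented for illustration. The slides do not describe the actual audit format or the aggregations used.

```python
from datetime import datetime, timedelta

# Invented restore point; stands in for the point-in-time recovery target.
restore_point = datetime(2018, 1, 1, 0, 0)

# Hypothetical audit records: one entry per change to a piece of content.
audit_log = [
    {"at": restore_point - timedelta(hours=1), "client": "a",
     "content_id": 1, "action": "approve", "source": "client"},
    {"at": restore_point + timedelta(hours=3), "client": "b",
     "content_id": 3, "action": "reject", "source": "bad_code"},
    {"at": restore_point + timedelta(hours=2), "client": "a",
     "content_id": 2, "action": "approve", "source": "client"},
]


def actions_to_replay(log, since, trusted_sources=("client",)):
    """Pick trusted actions recorded after `since`, oldest first."""
    picked = [
        entry for entry in log
        if entry["at"] > since and entry["source"] in trusted_sources
    ]
    return sorted(picked, key=lambda entry: entry["at"])


replay = actions_to_replay(audit_log, restore_point)
print([entry["content_id"] for entry in replay])
```

Only the client action made after the restore point is selected; the change attributed to the bad code and the action already captured by the restored snapshot are left out.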

17. Nice Side Effects
• Ability to give read only view to our services team
• An accidental test case for the rest of the company
• Many teams are using MongoDB they provision and manage themselves
• No maintenance

18. Mentions that don’t need a separate slide
• The text index is not for everyone
• Hint is good: even when you think MongoDB will pick the right index to use, it sometimes doesn’t; doesn’t work with updates :(

19. Final thoughts
• Bottlenecks happen, services break, requirements change, products evolve
• What makes a good datastore is not infallibility, but the tools and ability to detect issues fast, diagnose, and develop fast and recover
• Agility! Iteration!

Editor's Notes

Hello everyone, I’m so glad to be here. My name is Ani Hammond. Today I’m going to talk to you about my team's journey replatforming and the important role that MongoDB played in it. I’ll show you guys what our old stack looked like, what our new (and much better) stack looks like now, and, obviously, how Mongo fits into it. Then I’ll go over some interesting issues we encountered with our new platform and the solutions we came up with.
Who am I? I’m a Software Engineer and Tech Lead at Bazaarvoice. Spoiled Westerner: I like it when things are easy for me and I only get to do things I like to do, so my passions change over time and I get excited about different things. But currently I’m interested in serverless applications and distributed services, and I’m always excited about simple, intuitive products with a clear mission. I know it sounds like a stretch to call a database technology a simple product, but I think MongoDB fits my description perfectly because of how easy it is to develop with. I’m going to make that clearer later on in the presentation. Software engineer at Bazaarvoice… [next slide]
What is Bazaarvoice? Our mission at Bazaarvoice is to connect brands and retailers to consumers. What that means in non-marketing speak is that most of the user-generated content on brand and retailer sites is flowing through our network. By user-generated content, I mean ratings & reviews or Q&A or social content. Here’s a random collection of logos that Marketing said I could show, but to give you a better idea of our prevalence, if you’re shopping online anywhere other than Amazon and reading a review, it’s probably powered by us. To give you an idea of the scale we deal with, here are some stats from last year's Black Friday and Cyber Monday. On Black Friday we had 530 million total page views on our network, which is over 6000/second. On Cyber Monday we had 470 million total page views, which is just under 5500/second. That’s a total of a billion page views from just those 2 days. What does a pageview imply? Each one implies multiple API calls fanning out to dozens of services. My team, Curations, built and supports some of those services.
What is Curations? In short, the Curations platform allows a brand or retailer to display relevant social content in the path of purchase on their e-commerce site. Let me walk you through the flow. [CLICK] Someone posts a cute picture of their child wearing Gymboree rain boots on their Instagram. Using Curations, Gymboree is watching for content that mentions certain hashtags about their brand. The Curations social collection service picks up that post [CLICK] and shows it to Gymboree in the Curations application. Once in Curations, the post can be enriched in various automatic and manual ways. Enrichment means things like moderation approval and product identification. It can be done manually by the client, or by a set of automatic rules that define their needs. An example of an automatic rule would be to reject all content that includes profanity. [CLICK] Once the content is moderation approved, the Curations platform reaches out on behalf of Gymboree to request permission from the author of the post to use their cute picture on Gymboree’s e-commerce site. You probably can’t make it out, but here you’d see a comment from Gymboree followed by approval from the user. [CLICK] Finally, now that we have the author’s permission, the post is shown in a Curations-powered display on Gymboree’s site. Are there any questions? Ok good. So basically we collect the data, we enrich it, and we display it.
How does this work? All of our infrastructure is in AWS. [CLICK] In our legacy platform, every client (in the previous example, Gymboree) had their own cluster, which consisted of a MySQL RDS instance, one or more EC2 instances running a Python/Django stack, and a load balancer. So for roughly 400 Curations clients, we needed 400 clusters [CLICK]. [CLICK] Outside of these clusters, we had a couple of multi-tenant services: a social outreach service responsible for requesting author permissions, and a display service responsible for returning enriched content to all of our clients’ sites. To meet display-level scale, which we talked about before, we would ETL our enriched content to a Bazaarvoice-wide Cassandra ring before indexing it into an Elasticsearch cluster for efficient querying. Clearly this is a challenging stack to manage. Just look at how many different types of datastores we’re dealing with: we have MySQL, we have Cassandra, we have Elasticsearch. And it’s not cheap either. [CLICK] This came out to $60k/mo. And each additional client would add a few hundred dollars per month. And this doesn’t even include the cost of all the beer we had to drink to be able to put up with this nightmare.
So! Why a nightmare? [CLICK] Well, for one, there’s not much satisfaction in maintaining a platform that’s so obviously ineffective in terms of cost. When your every solution to scale is “throw more hardware slash money at it” it’s hard to feel innovative, especially when you know better solutions exist. I already mentioned the cost of adding a single client EC2/RDS cluster, and that only becomes more expensive as this data gets ETL’d and re-indexed in Elasticsearch and so on. [CLICK] Then there’s the issue of maintainability. Imagine a scenario where a team member gets paged in the middle of the night that some client’s RDS volume is running out of space. Now, for some of my teammates that meant waking up in the middle of the night and handling it. For me, as a lazier and less conscientious person, it meant turning off my phone, sleeping through the night, and handling it after a leisurely breakfast. Regardless though, it had to be handled, and it involved someone logging into AWS and manually resizing a single database instance. Not to bring up this point again too much, but any resize also meant more money spent on a particular database. Debugging was hard, patching and releasing anything to 400 systems was just a nightmare, and managing data (GDPR!) was a huge pain. A lot of effort was spent on maintenance when none of us really wanted to do that. You guys saw our product. I think it’s so cool and it’s great at what it does, and we wanted to work on making it better. But instead we were all dealing with devops AND we had a designated devops engineer that would babysit clusters and run “ansible scripts” (whatever the hell those are). [CLICK] Any system that relies on an ETL is also bound to have lag, so that’s yet another can of worms. [CLICK] And then usability. I know a lot of people love Elasticsearch and it’s really awesome at what it does. But, personally, I find the query language super verbose and non-intuitive. Plenty of this could be lack of experience and expertise in Elasticsearch, but I knew ten times as much in half the time when I started using Mongo, so I think that speaks volumes about its ease of use. [CLICK] And I’m not going to go into SQL; there are probably half a dozen talks about it going on right now (but mine is better!). Knowing what we knew about SQL and Elasticsearch, as we started talking about replatforming, we also started considering different options for our next database.
So what considerations did we have when picking the new database? [CLICK] If you remember from my earlier slide, the Curations platform does three main things: collect, enrich, and display. They each have different access patterns. COLLECT is high volume writes, but it’s more fault tolerant. If you don’t collect for a few minutes or even a couple of hours, it’s not the end of the world and usually nobody’s the wiser. ENRICH is complex querying and moderate volume read/write. And the queries can be as complex as the user chooses to make them. We often see things like “get me all Twitter content from this geolocation mentioning #babyclothes and send it for human moderation”. The third component, DISPLAY, is high volume reads (about 300 requests per second) with no tolerance for latency or outage. Content is displayed on retailer e-commerce sites in the path of purchase. If it doesn’t show up, it can’t influence and less stuff gets sold :) So we needed to be able to support all those different access patterns. [CLICK] Next, we need to be able to support a growing number of clients and volume of content. Not only does each of our clients see organic growth of 10-20%, but since this is a newer product the number of clients is also growing every quarter. [CLICK] Finally, our team must be able to self-manage (i.e., be our own DBA). But honestly, we didn’t want to have to think about this at all (or we wanted to think about it as little as possible). We had already settled on Node JS as our language and we knew we’d be using AWS Lambda, Elastic Beanstalk, and a few other AWS services that the team had previously had positive experience with. We had several options when deciding on a database. Mongo wasn’t very widely used within the company, which, like our previous stack, favored a combination of SQL and Cassandra indexed by Elasticsearch. There was also some pushback from the designated DevOps team at the time, indicating they’d have a hard time supporting Mongo. The question of scale also often came up, usually backed by anecdotal evidence. But the dedicated DevOps team essentially said “you’re on your own” if you choose Mongo.
At the time our development started, we were still not decided on a database; there were strong pushes for both Cassandra and Mongo. In retrospect I see that as a positive, as it allowed us to design a fully database-agnostic platform. [CLICK] However, prototyping with Mongo is just very very easy - it’s easy to boot up a Mongo instance locally, connect to it through a simple Node JS driver, and do anything you need to do for your testing. Without having our schema fully worked out, Mongo’s schema flexibility made it very easy to change things quickly as needed. It’s also easy for someone working on the collection piece to run a Mongo export and airdrop a bunch of data for someone else who is working on the enrichment piece to test with. So even in our proof of concept phases we very naturally gravitated toward using Mongo. [CLICK] Some numbers we tested with initially: Our collection services ran every 15 minutes and would write about 80,000 documents as fast as possible. It usually took a few seconds, and the time was limited more by the social APIs than anything else. In production now we write close to a thousand documents every time we collect. Enrichment services or rule execution: We tested with about 4,000 rules over 7 million documents. Execution took a few minutes with no indexes. In production now we have about 4,000 rules over 20 million documents. Aaaand we did no display testing until later. [CLICK] A quick side note - I’m going to speak more about indexes later on, but I just want to touch on it for a second here - we consciously used no indexes up front and added them as needed. We did this in part because we didn’t know beforehand what indexes would be helpful and in part because we wanted to prove to ourselves that everything would scale.
What did we end up with? [CLICK] Here we have the collection service, which is a bunch of Lambda functions triggered off of a Kinesis stream (Kinesis being the AWS real-time streaming platform). They hit up the social channel APIs every 10-15 minutes. That’s our bursty high write traffic. This service and all the other ones you’re about to see are written in Node JS. [CLICK] Here is our enrichment service, which is part of an autoscaling group. These are our constant complex reads and simple updates. [CLICK] With the same access pattern as our enrichment service, our management service allows users to directly log in and approve content or identify products. [CLICK] And last but not least, our display autoscaling group, which is obviously constant high volume simple reads. What datastore is in the middle of all these pieces? Well, you guessed it, it’s a convoluted combination of SQL, Cassandra, and Elasticsearch. No, I’m just kidding, all those other conferences turned me down, so we decided to use Mongo instead! And thank god. So here is our new platform and it now comes with a price tag of [CLICK] $6,500/month. If you’ll remember from the earlier slide, our original cost was $60,000/month, so this new platform is running at roughly a tenth of our earlier cost. Massive cost savings, huge performance gains, transactional consistency instead of waiting for stuff to propagate to display, a handful of services instead of hundreds of clusters to maintain. Great stuff; not without its challenges - but we’ll get into those next. It’s worth pointing out that this architecture is completely serverless or containerized. We have Lambdas and a few dockerized autoscaling services. I’ll speak more about Atlas in the next few slides, but in terms of its place here, it fits great into this architecture where we just want to code and not worry about infrastructure.
Let’s talk about our DevOps decisions. [CLICK] In its first iteration our cluster was some EC2 instances we provisioned by hand. We put together a few memory optimized instances as a replica set, figured out what ports to have access to what, and installed Mongo. It’s cheap, but not super easy to set up, and it wasn’t going to be a viable long-term solution. We actually had an old cluster that was set up by hand and running a side job in production. We never saw any issues with it, but we weren’t going to risk it this time. [CLICK] We did much better on the second iteration. We decided to use Cloud Manager, which is still an option I would recommend to people on a budget. It was again cheap, the installation was easy and fast, and it allowed us to upgrade Mongo versions quickly and scale with the push of a button. A few kinks that we saw (and those are somewhat unique to our setup) had to do with dealing with our own VPCs within Amazon (Cloud Manager didn’t have a seamless integration at the time). However, it allowed us to code fast and forget about our database for the most part. Again, a totally reasonable option. Now, our third iteration was Atlas [CLICK]. What was great about it? Well, it was much cheaper than having a dedicated DevOps engineer. It is super fast to set up and can be set up to scale automatically. It gave us insights into our indexes, long running queries, performance glitches, and more. And updates took no time and no stress on our part whatsoever. We recognize that some of the cooler things about Atlas, like performance analytics and such, can be done by hand. But it’s just tedious, less graphable, and of course, for things like showing database load and so on, Atlas just kills it. So, like I said earlier, between serverless and Atlas, our infrastructure basically manages itself and leaves our hands free to make great products, which is what most of us are passionate about.
How did we decide on our indexes? I already mentioned that we started from zero. We really just wanted to see what works and what doesn’t. In our experience, if we could get an index to narrow down the scan size to thousands of documents, then it struck the balance between index size and performance gain. I can obviously create an index that gets us down to a single document, but the cost of doing that is not worth it. Once we started creating indexes we kind of went with our best guesses on what works. For example, for display we knew tags, client name, and timestamp were going to be in every query - easy! For the more complex enrichment rules, we really just ballparked our guesses. Sometimes we were right, sometimes we weren’t. A big part of our philosophy is to do what makes sense at the time and build stuff that we can easily iterate on. That applied to our indexes as well. Once we started using Atlas, we realized we weren’t using some of our indexes nearly as often as we thought we would be. We were able to make smart decisions on which ones to kill. An index killed is as valuable as an index added. Why? We want all our indexes to fit in memory. Unused indexes obviously work against that goal.
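The index pruning described above can be sketched in code. Atlas shows index usage in its UI; MongoDB also exposes per-index usage counters through the `$indexStats` aggregation stage, which returns one document per index with an `accesses.ops` count. The helper below is a minimal sketch and not the team's actual tooling: `findUnusedIndexes`, the `minOps` threshold, and the sample index names are all hypothetical. It takes the kind of array that `collection.aggregate([{ $indexStats: {} }]).toArray()` would return and lists the indexes that look like candidates to kill.

```javascript
// Hypothetical helper: given $indexStats-shaped documents, list the indexes
// that were used fewer than `minOps` times since the counters started.
function findUnusedIndexes(indexStats, minOps = 1) {
  return indexStats
    // The _id index is mandatory, so never suggest dropping it.
    .filter((stat) => stat.name !== '_id_')
    .filter((stat) => Number(stat.accesses.ops) < minOps)
    .map((stat) => stat.name);
}

// Made-up sample data, reduced to the two fields the helper reads.
const sampleStats = [
  { name: '_id_', accesses: { ops: 0 } },
  { name: 'client_1_tags_1_timestamp_-1', accesses: { ops: 48213 } },
  { name: 'author_1', accesses: { ops: 0 } },
];

console.log(findUnusedIndexes(sampleStats));
```

The counters restart when a node restarts and are tracked per node, so it seems safer to look at a representative window on every replica set member before dropping anything.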
Shifting gears a little bit, I wanted to talk about some of the problems we’ve encountered in our new platform over the last year, and how we tackled them. We all know, nothing in life is easy, we have this shiny new product we built, we got this amazing tool (our database), so we can just cruise from here on out, right? Uhhh actually yes, pretty much, but not quite. The first problem had to do with [CLICK] a random day when our database started just failing over again and again. During failover it would be unresponsive for minutes at a time and the pattern would repeat every hour or so [CLICK] How did we detect it? Our board metrics indicated high response times. Further digging indicated that we had over 30,000 open database connections at the time of failover (for those of you taking notes, how much does it take to bring down the Primary node? About 30,000 open connections) [CLICK] Tools we used to root cause. Datadog and the Mongo console. [CLICK] And the solution? Once we realized each request to our database was opening a new connection and those connections weren’t being closed fast enough, we switched to using connection pools. The lesson? Failover is not seamless and it’s not cheap. It’s great that it’s there, but it’s better when it doesn’t happen.
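The fix described here, one shared pool instead of a new connection per request, can be sketched as follows. This is an illustration under assumptions rather than the team's code: `createSharedClient` and the `connect` callback are hypothetical names, and `connect` stands in for whatever opens the real driver client. The official Node driver keeps a connection pool inside a single `MongoClient`, so the important part is creating that client once per process and reusing it.

```javascript
// Hypothetical helper: wrap an async connect function so that it runs at most
// once per process, no matter how many requests ask for a client.
function createSharedClient(connect) {
  let clientPromise = null;
  return function getClient() {
    if (clientPromise === null) {
      // The first caller opens the connection; everyone else awaits the same promise.
      clientPromise = connect().catch((err) => {
        clientPromise = null; // allow a later request to retry after a failed connect
        throw err;
      });
    }
    return clientPromise;
  };
}

// Stand-in for a real connect call; it only counts how often it is invoked.
let connectCalls = 0;
const getClient = createSharedClient(async () => {
  connectCalls += 1;
  return { name: 'fake-client' };
});

// Three "requests" arrive at once, but only one connection attempt is made.
const pending = [getClient(), getClient(), getClient()];
Promise.all(pending).then(() => console.log('connect calls:', connectCalls));
```

Caching the promise rather than the resolved client means that requests arriving while the first connect is still in flight also share it, which is the burst case that tends to pile up connections.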
Another problem happened in the very early days of our new platform launch. [CLICK] Shortly after onboarding our first live client, we realized that the displays on their site were taking around 6 seconds to load. [CLICK] How did we detect it? Well, for this one the Datadog board was obvious. What’s interesting is that it also indicated our database queries were taking more than 5 seconds; at the same time Atlas was telling us that database queries (for the same request) were taking less than 100 milliseconds. So what gives? [CLICK][CLICK] As it turns out, the discrepancy in the request times had to do with our Lambdas connecting to Mongo. On cold start, a Lambda would take about 5 seconds to connect to Mongo, then the Mongo query would take 100 milliseconds, and it would all get recorded in Datadog as a single transaction. Our solution was a quick switch from running display off of Lambda (which is what we were doing at the time) to the dockerized autoscaling service you guys saw in the earlier diagram.
This next problem has to do with the execution of our complex rules. If you’ll remember from earlier, rules are a set of filters coupled with a set of actions. So for example, your filter is “everything that says rain boots and is moderation approved” and the action is “ask author for permission”. [CLICK] As our rules started growing in complexity, we noticed that for all of them to execute it was sometimes taking 30 minutes or more. Individual database operations were taking minutes to complete despite multiple complex indexes. [CLICK] How did we detect it? Our Atlas board metrics indicated poor rule execution time and [CLICK] obviously Atlas was our tool to root cause the issue. [CLICK] And how did we solve it? Well, we realized that our rules were performing actions on matching content, but unmatched content was still being scanned in subsequent executions. Our solution was to exclude scanning of previously unmatched content. And to do that we included a timestamp in our queries that only scanned content updated since the last time a rule ran. The lesson I took from this is don’t rescan content you don’t have to. This is a great example of an issue where someone might say our database isn’t scaling and it’s not able to perform complex queries in reasonable time. Well guess what. Fix your code. Hardware is great, tools are great, but they can only carry you so far. I think we sometimes tend to be sloppier than we should be because hardware is so cheap and easy, but we have to write code responsibly too. Example if needed: Say we have 10000 documents in a collection, and each has a color. I run a query every 15 min to find all the red ones and take some action. First time I run, I find 1000 and take some action. We tag these so they aren’t scanned next time. But the next time I run, the other 9000 docs that aren’t red still need to be scanned.
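The "only scan what changed since the last run" idea can be sketched as a small query builder. This is a minimal sketch under assumptions, not the team's implementation: `buildRuleQuery`, the `updatedAt` field, and the sample rule fields are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: combine a rule's own filter with a lower bound on the
// document's last-update time, so unchanged documents are not rescanned.
function buildRuleQuery(ruleFilter, lastRunAt) {
  if (!lastRunAt) {
    return ruleFilter; // first run: there is no earlier run to skip past
  }
  // $gte rather than $gt so that a document updated at exactly lastRunAt is kept.
  return { $and: [ruleFilter, { updatedAt: { $gte: lastRunAt } }] };
}

// Made-up rule: approved content that mentions rain boots.
const rainBootsRule = { text: { $regex: 'rain boots', $options: 'i' }, moderation: 'approved' };
const lastRunAt = new Date('2018-06-27T12:00:00Z');

console.log(JSON.stringify(buildRuleQuery(rainBootsRule, lastRunAt), null, 2));
```

For this to reduce scanning, `updatedAt` would need to be covered by an index, and `lastRunAt` is best taken as the start time of the previous run so that updates arriving during that run are picked up by the next one.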
And speaking of bad code, the last issue I’ll talk about today is when [CLICK] some bad code caused major data corruption in our database across all clients and most of our content. (It wasn’t me!! Actually it was :() How did we detect it? Well, we didn’t need the boards this time because client complaints started pouring in fast. [CLICK] [CLICK] Our solution? An Atlas point-in-time recovery. Because we depend on social data, we can actually tolerate data loss pretty reasonably. We rolled back our database to a backup less than 12 hours old, and cherry-picked client enrichment actions since recovery. Aggregations proved very helpful to cross-reference what was changed when. This was a very bad day. It was on a Friday of course, because those things always happen on a Friday. Yet, somehow, Atlas made recovery super easy and as pain-free as we could have hoped for, considering. The lesson - keep an audit and have a solid backup path to recovery. Before this happened, we kept talking about how we need to do a dry run on recovery and we kept saying we’ll do it, but we didn’t until we had to. I’m sure many people in the audience are thinking the same thing now. I really encourage you to do it. You don’t want the first time to be a production escalation.
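As an illustration of how an aggregation can cross-reference what was changed when, here is a minimal sketch. Everything about the data model is an assumption: the talk does not describe the audit schema, so the fields `actor`, `at`, `client`, `contentId`, and `action`, and the helper name `buildChangeLogPipeline`, are hypothetical. The pipeline selects client-initiated actions between the restore point and a cutoff and groups them per piece of content, which is the list one would replay after a rollback.

```javascript
// Hypothetical helper: build an aggregation pipeline over an audit collection
// that groups client actions per content item within a time window.
function buildChangeLogPipeline(restorePoint, until) {
  return [
    // Only client-initiated actions recorded after the restored backup.
    { $match: { actor: 'client', at: { $gte: restorePoint, $lt: until } } },
    // Oldest first, so actions can be replayed in the order they happened.
    { $sort: { at: 1 } },
    {
      $group: {
        _id: { client: '$client', contentId: '$contentId' },
        actions: { $push: { action: '$action', at: '$at' } },
      },
    },
  ];
}

const pipeline = buildChangeLogPipeline(
  new Date('2018-06-22T02:00:00Z'), // made-up restore point
  new Date('2018-06-22T14:00:00Z')  // made-up cutoff, about 12 hours later
);
console.log(JSON.stringify(pipeline, null, 2));
```

With the official Node driver the result would be passed to `collection.aggregate(pipeline)`; a sketch like this only helps if an audit trail is being written before the incident, which is the lesson the notes draw.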
Some issues we anticipate we’ll encounter in the future: scale, size, and cost, obviously. How do we plan to address these? One is to clean up unused content. I really feel like most people use a lot less data than they think they use. This is a good opportunity to evaluate and clean up. As our dataset grows, I see us utilizing more sparse indexes. For us, recent content is valuable; older content, not so much. For better or worse, no one cares what someone posted on Instagram two years ago. If you can reduce the size of your indexes by making them sparser, by all means do it.
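One way to keep an index small by covering only recent content is sketched below. A caveat on terminology: in MongoDB a sparse index only skips documents that lack the indexed field, so limiting an index to recent documents is done with a partial index (`partialFilterExpression`). Using a partial index here is my assumption about what "sparser" would mean in practice, not something the talk confirms. The helper name, index name, and field names are hypothetical.

```javascript
// Hypothetical helper: describe an index that only covers content newer
// than a fixed cutoff date.
function recentContentIndex(cutoff) {
  return {
    keys: { client: 1, timestamp: -1 },
    options: {
      name: 'recent_content_by_client',
      // Documents older than the cutoff are left out of the index entirely.
      partialFilterExpression: { timestamp: { $gte: cutoff } },
    },
  };
}

const spec = recentContentIndex(new Date('2018-01-01T00:00:00Z'));
// With the official Node driver this would be passed as
// collection.createIndex(spec.keys, spec.options).
console.log(spec);
```

Two limits of this sketch: the cutoff is fixed when the index is built, so the index would need to be rebuilt periodically to keep "recent" meaningful, and a query can only use a partial index when its own filter guarantees it stays within the indexed range (here, a `timestamp` bound at or after the cutoff).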
Some surprising side effects arose from being our own devops engineers. Our services team can log in and get a read-only view of all kinds of data that’s not available in standard analytics screens. Unlike with a relational SQL database, there is no need for a deep understanding of a complex schema. It just allows for very intuitive querying that doesn’t take very deep domain knowledge to get your work done. According to our product manager, there’s been an 80% reduction in tickets since the switch, definitely in part due to people being able to get the information they need without developers being involved. Another positive (this one specifically has to do with Atlas) is that other teams are now considering going the hosted route. It’s easier for others to walk a beaten path, there’s less uncertainty, and there are more successful examples to speak of next time someone mentions “scale”. And I can’t stress this enough. We expected to have to do a little bit of maintenance; we haven’t had to do any. It’s harder to put a number on things like this. You can say our hosting costs went from 60,000 to 6,500, but it’s harder to gauge how much money we’ve saved by not having to worry about our database. Old platform: 60,000, new platform: 6,500, getting to focus all my time on just development: priceless.
A couple of things that I wanted to mention that didn’t really fit in their own slide. Of course it depends on your use case, but we haven’t been impressed by the text index. It’s huge, it can’t be compounded, and the search doesn’t always behave predictably. I would recommend narrowing your queries down and doing a regex search, if you can. Hint is great. Even when you think Mongo will pick the right index to use, it sometimes doesn’t. So, if you can add hint in your code, do it. Unfortunately it doesn’t work with updates, but since I’m speaking here, Mongo, this is an official request, please fix.
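The two suggestions above, narrowing a query before doing a regex search and pinning the index with a hint, can be combined in one small sketch. It assumes a compound index on `{ client: 1, timestamp: -1 }` already exists; that index, the field names, and the helper names are hypothetical. The returned `filter` and `options` are shaped for the Node driver's `collection.find(filter, options)`.

```javascript
// Escape user input so it is matched literally inside a regular expression.
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: narrow by client and time first (fields an index can
// serve), then apply a case-insensitive regex to what is left.
function buildNarrowedSearch(client, since, term) {
  return {
    filter: {
      client,
      timestamp: { $gte: since },
      text: { $regex: escapeRegex(term), $options: 'i' },
    },
    // Tell the server which index to use instead of letting the planner choose.
    options: { hint: { client: 1, timestamp: -1 } },
  };
}

const search = buildNarrowedSearch('example-client', new Date('2018-06-01T00:00:00Z'), '#rain+boots');
console.log(JSON.stringify(search, null, 2));
```

A hint that names an index that does not exist makes the query fail rather than fall back, so the key pattern here has to match a real index exactly.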
Some final thoughts. Bottlenecks happen, services break, requirements change, products evolve. What makes a good datastore is not infallibility, but the tools and ability to detect issues fast, diagnose, develop fast, and recover. I think that the value of a great datastore or any good tool really is that it allows you to be agile and iterate. And really to do what you’re passionate about, which in our case is code.
Why Cassandra? It's a bit of a Bazaarvoice domain requirement, as that is how our single source of truth datastore works at scale. They picked Cassandra years ago to handle the globally fault-tolerant, high write volume access pattern that we see for ratings and reviews across all our clients.
