Breaking the Mold - Redesigning Dell’s E-Commerce Platform
Prepared by: Scott Hilleque and Lee Chuen Ooi
2. Agenda
• Dell Digital IT Introduction
• Advancing from Service-Oriented to Resource-Oriented Architecture
• Taking Advantage of Pre-Computation
• Command Query Responsibility Separation (CQRS)
• Why Document Stores?
• Success story - the DCQO application
  – Legacy System using RDBMS and its gap
  – “Proof of concept” testing result summary
  – MongoDB Architecture using cluster (sharding)
  – Key notes during design & development
  – Data migration strategy from RDBMS to MongoDB
  – Ops Manager - operation, administration, backup/restore tool
3. Dell Digital IT
“We moved Dell’s U.S. business off of Dell Order Management (DOMS), a mainframe more than 30 years old, to processes and tools that better serve our customers. Using the new processes and tools:
• 90 percent of Dell’s business now on a single global order management system
• 30 percent fewer consumer orders result in cancellation
• Commercial orders need 40 percent less intervention
• Once placed, orders reach factories 35 percent faster”
[Diagram: Product, Price, Payment, Cart, Quote, Order]
4. Typical Service-Oriented Architecture
[Diagram: at run time, a Client calls the Commerce Service; the Commerce Service calls multiple Dependency Services, each backed by its own Database, and assembles the Data returned to the Client]
5. Enhanced SOA
[Diagram: the Commerce Service reads from caches and a Denormalized Database; Dependency Services add caches and Read-Only Replica Databases; a Service Bus carries Change Notifications, and a Replication Job copies data from the Source Database at run time]
6. Resource Oriented Architecture
Resources:
GET http://quotes/quote1
GET http://quotes/quote2
POST http://quotes
PATCH http://quotes/quote3
GET http://quotes/quote3
7. Resource Oriented Architecture
[Diagram: at authoring time, the Commerce Service and Dependency Services push state through Notification Services into a precomputed Resources store; at run time, the Commerce Service simply GETs those Resources]
8. Resource Oriented Architecture
[Diagram: an Item resource linking to Product, Price, Inventory, Fulfillment, Discount, Accessories, and Tax resources]
9. Taking Advantage of Permutations
10. Pre-Computation
11. Don’t Boil the Ocean
12. Find the Balance
13. Why Document Store?
14. Why Document Store?
15. CAP Theorem
17. Validation
18. Business Justification
19. CQRS - Command Query Responsibility Separation
[Diagram: the Client sends Commands to a Write API backed by the Domain Model and a Transactional Database on the Server; Queries go to a Read API backed by a Non-Transactional Database; Replication keeps the read side in sync]
20. Strategies for Managing Schema Versions
21. What is the DCQO application?
• Dell online Cart Quote Order
• Part of Dell’s digital transformation strategy
• Read/write intensive
• 24x7 mission-critical application
• Key revenue system for Dell Sales
• Current MongoDB size is 5.4 TB (growing 40% YoY); one of the large collections holds 243 million records
22. Legacy System using RDBMS and its gap
• Uses an RDBMS to store XML as a LOB (large object)
• Create a quote: serialize to XML -> compress -> save as a LOB
• Retrieve a quote: decompress the LOB -> deserialize the XML
• Concern - how far can it scale vertically with the current design?
23. Proof of Concept testing result summary
• MongoDB version: 3.2.10 (back in Dec 2016)

Test case 1 - RDBMS LOB vs MongoDB standalone on the same hardware spec:
  ±5% (no significant difference)
Test case 2 - RDBMS LOB vs MongoDB cluster (stress test until breakpoint):
  the RDBMS errored out, while MongoDB was still performing consistently
Test case 3 - MongoDB replica set vs MongoDB cluster:
  ~10% overhead using the MongoDB cluster
24. MongoDB Journey continues… Design and Development
25. DCQO Application Architecture
[Diagram: Quote microservices (Quote, Item, Price, Contact) write synchronously to MongoDB; changes are replicated asynchronously to Elasticsearch and, with eventual consistency, to the Data Warehouse]
26. MongoDB Architecture using cluster (sharding)
• Cluster/sharding is used because we need to support high IOPS and it is easy to scale horizontally
• Total size of data-bearing nodes (primary) = 5.4 terabytes (still growing)
• Plan to change the Arbiter to a Secondary when DC4 is ready
27. Shard Key
• Choose the right shard key to ensure even distribution
• The shard key cannot be changed (be cautious)
• Generate the shard key as a GUID in the application layer when the application logic needs the key before saving to MongoDB; this saves a round trip
• Sample GUID: wqw5cQRGD0yxyEnIs4i4nQ
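A minimal mongo-shell sketch of this approach (assuming a recent shell where UUID() with no argument generates a random UUID; sharding on _id and the "quote" collection name are illustrative):

function newQuoteId() {
  // UUID() returns BinData subtype 4; base64() yields 24 chars including "==" padding
  return UUID().base64()
    .replace(/\+/g, "-")     // make the value URL-safe
    .replace(/\//g, "_")
    .replace(/=+$/, "");     // strip padding -> 22 characters, e.g. wqw5cQRGD0yxyEnIs4i4nQ
}

var quoteId = newQuoteId();            // the key is known before the write,
db.quote.insertOne({ _id: quoteId });  // so no extra round trip to MongoDB is needed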
28. Sizing
• Snappy compression is enabled by default in WiredTiger (WT)
• Include WT compression when calculating the storage required
• Data is uncompressed in memory

DB    Storage size (GB)   Actual size (GB)   Count        WT compression
Db1   499                 1,912              26,684,160   74%
Db2   17                  38                 6,216,016    55%
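Figures like the above can be reproduced per collection from db.collection.stats(); a hedged sketch (the "quote" collection name is illustrative):

var s = db.quote.stats(1024 * 1024 * 1024);   // scale factor: report sizes in GB
print("actual size (GB):  " + s.size);        // uncompressed BSON size
print("storage size (GB): " + s.storageSize); // Snappy-compressed size on disk
print("documents:         " + s.count);
print("WT compression:    " + Math.round((1 - s.storageSize / s.size) * 100) + "%");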
29. Schema design
• The $lookup function doesn’t support sharded collections
• Not a 1:1 table-to-collection mapping
• Documents are limited to 16 MB each
  – Performance result: the get/save elapsed time increased 200% when the JSON doc size increased from 67 KB to 148 KB
• “Schema less”, not “schemaless”
• A de-normalized schema is good for a high read-to-write ratio, but MongoDB 3.6 can’t achieve an atomic update across different documents
• 30 tables (quote_info, export_info, contact, enduser, fulfillment, promotion, phone, salesrep, …) became 1 collection (quote)
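As an illustration of the 30-tables-to-1-collection mapping, a quote document might look roughly like this (all field names and values are invented for the example; the schemaVersion field follows the versioning advice earlier in the deck):

{
  _id: "wqw5cQRGD0yxyEnIs4i4nQ",       // GUID shard key
  schemaVersion: 1,
  quote_info:  { status: "DRAFT", currency: "USD",
                 last_modify_date: ISODate("2018-06-20T00:00:00Z") },
  contact:     { name: "...", phone: [ { type: "work", number: "..." } ] },
  enduser:     { company: "..." },
  fulfillment: { method: "GROUND", address: { country: "US" } },
  promotion:   [ { code: "...", discount: 0.1 } ],
  salesrep:    { id: "...", email: "..." },
  items:       [ { sku: "...", qty: 2, unitPrice: 1499.00 } ]
}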
30. Query/Update/Delete
• Check the MongoDB log to identify problematic queries/transactions
• Use explain() to review the execution path
• Create indexes
• Use $indexStats to find and remove unused indexes (>5% performance improvement)
• Use partial document updates (~20% performance improvement)
• Simplify queries; avoid complex and repeated conditions:

db.collection.find({ $and: [ { "field1": { $ne: null } },   // redundant: covered by the next clause, "field1" = "19"
                             { "field1": "19" } ] })

db.collection.find({ $or: [ { "field1": { $in: [ "vf1", "vf2", "vf3", …, "vf100" ] },    // this first $or clause can be removed:
                              "field2": { $in: [ "v1", "v2", "v3", "v100" ] } },         // it is a subset of the clause below
                            { "field1": { $in: [ "vf1", "vf2", "vf3", …, "vf100" ] } } ] })

• Pre-load data into the performance database before performance testing
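Hedged mongo-shell examples of the tuning steps above (collection and field names are illustrative):

// 1) Review the execution path; avoid COLLSCAN on large collections
db.quote.find({ "quote_info.status": "DRAFT" }).explain("executionStats");

// 2) Find rarely used indexes before dropping them
db.quote.aggregate([ { $indexStats: {} } ]).forEach(function (ix) {
  print(ix.name + " ops=" + ix.accesses.ops);
});

// 3) Partial update: touch only the changed fields instead of
//    replacing the whole (potentially large) document
db.quote.updateOne(
  { _id: "wqw5cQRGD0yxyEnIs4i4nQ" },
  { $set: { "quote_info.status": "SUBMITTED" },
    $currentDate: { "quote_info.last_modify_date": true } }
);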
31. Security/User Management
• Remember to enable authentication
• Use LDAP to ease user management
• Use Kerberos to avoid specifying an encrypted password in the application connection string
• Each application uses a unique account to log in to MongoDB
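A hedged sketch of the Kerberos setup (the principal, realm, and host names are invented): each application gets its own principal, created against the $external database, and the connection string then carries no password:

db.getSiblingDB("$external").createUser({
  user: "quote-app@EXAMPLE.COM",                  // one unique principal per application
  roles: [ { role: "readWrite", db: "quote" } ]
});

// Connection string: GSSAPI (Kerberos) authentication, no password embedded
// mongodb://quote-app%40EXAMPLE.COM@mongo1.example.com:27017/?authMechanism=GSSAPI&authSource=$external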
32. MongoDB Journey continues… Data Migration from the legacy system
33. Migration strategy to make MongoDB live
1) Populate a last_modify_date column in the legacy RDBMS, if it does not exist
2) Set up MongoDB and the collections
3) Complete the initial loading; the application sends data to both the RDBMS and MongoDB based on last_modify_date
4) One-month stabilization period
5) Stop the application; make sure the last data sync-up completes
6) Switch the application to connect to MongoDB
7) Start the application - MongoDB is LIVE!
8) The application continues to send data to both MongoDB and the legacy RDBMS
34. Ops Manager
• Monitoring and alerting - integrated with our IT operations tool, ServiceNow, via SNMP (Simple Network Management Protocol) to automatically create incident tickets for the operations team
35. Ops Manager (continued)
• Administration tasks (version upgrades, startup, shutdown, configuring clusters/replica sets, …)
• Backup
  – Take a snapshot every 12 hours
• Restore
  – Testing result: the network transfer ran at 25 MBps; a full automated restore of a 3-shard cluster with 2 TB of data completed in under 4 hours
36. Question & Answer
scott_hilleque@dell.com
lee_chuen_ooi@dell.com
Editor's Notes
1. [Scott] Good morning, my name is Scott Hilleque and I'm a senior architect in the Digital Commerce Services Platform at Dell Technologies, and today happens to be my 7-year anniversary with Dell. For pretty much as long as I’ve been in this industry I’ve been a product engineer. That means taking things that are purpose-built for one client or scenario, and making them scale and adapt to work for many. [LC] My name is Lee Chuen. I am a senior principal database engineer at Dell. I work on RDBMS and NoSQL databases. [Scott] So, we have recently gone through a journey of making MongoDB our system-of-record document store; and we are here today to share with you some of what we learned, some of the choices we made, and maybe a couple of stumbles along the way that we hope we can help you avoid.
  2. [Scott] We’ll kick off with an introduction to our domain, then I’ll discuss some common architectural problems Mongo is helping us solve, and then LC is going to go into specifics of how we implemented our repositories and completed our migration.
3. [Scott] I’d like to start by giving you a bit of insight into our "problem space”. Dell Digital IT Services is the suite of tools that we use to manage products, prices, carts, quotes, payments, and orders across our customer experiences. This is pretty much everything that we use to transact business-to-consumer and business-to-business e-commerce. You may also have heard Dell is in the process of merging with EMC, so we were dually tasked with keeping all this working AND converging two large enterprises on “one unified global commerce platform”. We knew we’d have to be MUCH more aggressive with regard to performance and availability, and we especially wanted to take advantage of cloud-type horizontal scalability. Ultimately, the overall approach we landed on was mixed: a bit of tried and tested combined with bleeding edge. We were certain from the outset, though, that our basic application architecture needed a completely new vision. So, to start, we made our list of architectural imperatives; things like micro-services, strong versioning, and continuous integration. But there were two key patterns that made the whole thing click into place, and those are resource orientation and pre-computation.
4. [Scott] To explain resource orientation, we need to take a step back and understand how the existing application designs were hurting us. Here you see a typical (legacy) service-oriented architecture flow diagram. Incoming requests are validated, data is retrieved from databases and other services, business rules get run, and a response is assembled and sent back to the client. So what’s wrong with this? First, each dependency service call costs some number of milliseconds and, in our case, those fractions of a second added up to the majority of overall transaction time. Modern network infrastructure provides super low latency and high bandwidth, but serialization and raw time on the wire were making outward performance much slower than the individual parts. The second problem is reliability. I expect most of you have heard of the concept of 5-nines availability, sometimes referred to as six sigma: the goal of having only about 5 1/2 minutes of unscheduled downtime per year. When you serially connect services like this, your outward availability can never be better than your weakest link. That’s the primary reason the average SOA system doesn’t get close to 5-nines; one intermittent crash or a race condition downstream can ruin it for everyone up the line.
5. [Scott] The shortcomings of real-time service dependencies are pretty well understood now. Typically, the next step that architects and developers take is to employ caches, read-only replicas, and sometimes service buses to improve responsiveness. Unfortunately, our experience has been that the benefits of those additions never really materialized in performance metrics. Also, with more moving parts, we frequently found an actual reduction in availability. One more thing: caches introduce the problem of cache expiration, which makes it possible to get two different results between load-balanced servers at runtime. A favorite quote of Martin Fowler, the godfather of the Agile methodology, is that the two toughest problems remaining in computer science are naming things and cache expiration. I can certainly vouch for the latter.
6. [Scott] So the next-generation answer to these problems is resource orientation, which you may have heard referred to as REST or WebAPIs. Personally, I like the term Resource-Oriented Architecture better because we’re talking about more than just a new protocol or an access method. It’s a new way of solving business problems which de-emphasizes behavior and focuses instead on state. So why is that important? Well, if you look at Internet-scale applications, like the web (www) or email, you’ll notice it’s very rare to see behavior changes out at the edge. For example, HTTP has been stable on version 1.1 since 1996. It’s not likely that a line-of-business application is ever going to be so future-proof that new features aren’t needed for 22 years; or at least I hope not. But that frequency of change, or surface area exposed to changing business requirements, is what makes such applications hard to make reliable.
  7. [Scott] Then another feature of ROA is that it naturally guides you to separate “authoring-time” concerns from “user-time” concerns. This has a huge impact on scalability and performance. We’re going to go into that more in a bit.
  8. [Scott] Resources also help with your domain separation of concerns. Instead of services carrying copies of all the data from their dependency systems, they can just link to those resources with URLs. This is nice because it allows clients to selectively hydrate the details they are interested in and only pay that portion of transport cost when they need it. This is basically how modern responsive web pages work; HTML contains links to style sheets, JavaScript, and image resources, which are then opportunistically downloaded only when needed.
9. [Scott] Our second big architecture change is pre-computation. That means pushing your data forward, all the way to the client-ready state, so you don’t need to make real-time service calls at runtime. In resource-oriented architectures you still need to validate source data and run business rules, but your focus shifts to preparing the end state for fast retrieval. In web experiences, customers expect page loads and navigation to respond instantly, so that is not the time to be running your complex business logic. In effect, the idea here is to have answers to customers’ questions before the question is asked. Changes that originate with source data are expected to take a bit longer, but retrieval always has to be as fast as possible.
10. [Scott] An example I like to use to demonstrate how to think of this is an item placed in a shopping cart; we’ll use this Precision 5520 laptop. The item resource for it consists of product data, some configuration options, prices, and other details like how long it takes to go from ordered to shipped and shipped to delivered. In the classic SOA model we would have all that information spread across several systems-of-record and go retrieve it at runtime. Now, we’ve figured out all the valid choices a user can make and saved them in a precomputed store as separate resources.
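To make the idea concrete, a precomputed, client-ready item resource might look roughly like this; every field name and URL below is invented for illustration. Valid choices are resolved ahead of time, and related resources are linked by URL rather than copied:

{
  _id: "item-5520-001",
  product: { name: "Precision 5520", href: "http://products/precision-5520" },
  price:   { amount: 1899.00, currency: "USD" },
  options: [ { group: "memory", choices: [ "16GB", "32GB" ] } ],
  fulfillment: { orderToShipDays: 3, shipToDeliverDays: 2 },
  links: {
    accessories: "http://accessories?item=item-5520-001",
    tax:         "http://tax?item=item-5520-001"
  }
}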
  11. [Scott] You do have to keep an eye on how far you go with your precomputations. Too much and it'll feel like you're boiling an ocean. For example, if we were to precompute every possible permutation of Dell enterprise server products that could easily result in tens of millions of product documents; most of which would never be of interest to an actual user.
12. [Scott] On the other hand, if you achieve too little precomputation, you’re back where you started, having to do too much work while the user is waiting. Our advice is to aim for "no real-time calls to dependency services at runtime". If you still need to run some real-time business logic, that’s OK; just make sure you have all the state you need to run that logic in your resource.
  13. [Scott] Ok, so why use a document store to solve these problems? We have lots of other options; relational, graph, key-value pair. Relational is great where structure, type, and referential integrity are top priority, which typically occurs with source data. Graphs are good for asking dynamic and nested questions. Key-value pair, well, I haven’t actually figured out a really good use for them yet.
  14. [Scott] The answer is that documents are simply the best form for holding complex and hierarchical precomputed state. That can include a mix of strongly-typed data fields, one-to-many relationships, internal references, links to external resources, and anything else that we’ve needed to model inside an atomic resource.
15. [Scott] When making a decision on which document store, we first need to understand a bit about the "CAP theorem". CAP is a formally proven conjecture that says any distributed database has to make a sacrifice between consistency, availability, and partition tolerance. Since giving up partition tolerance isn’t really an option, all distributed databases favor either consistency or availability. High availability is great for read-only data, but things get complicated when you look closely at what happens during a failure event. Regardless of marketing claims, all AP document stores (AP means they compromise on consistency) will lose data under at least some failure modes. Sometimes you can manually recover that data and sometimes you can’t, but they all share the very real possibility of data loss. CP, or high-consistency, stores on the other hand have the possibility of being offline when a quorum of nodes can’t be reached, but they can be proven to never lose data under any circumstances.
16. [Scott] Note: data loss due to a node failure, lost drive, or hung VM is really not what we’re talking about here; that’s actually become a non-interesting event. The kinds of failure we have to consider are a data center power outage, a core switch failure, or a misconfigured firewall/router. We have to plan for these types of events as well, and when they happen we have a choice: favor availability or consistency. There are no magic solutions that provide both at the same time.
  17. [Scott] Some of the more complicated failure modes, like unidirectional routing, can be difficult to simulate and test for. There is a tool called Jepsen though, that can thoroughly evaluate CP consistency claims. This wasn’t available yet when we made our decision to adopt Mongo, but the fact that Mongo received a clean bill of health from Jepsen gave us confidence we made the right choice.
18. [Scott] When describing all this to your business partners, when you advise them of a risk of potential data loss, of course they’re going to say it’s never acceptable to lose data. Sometimes that’s true and sometimes not so much. The better question to ask is: what are the consequences of accepting a transaction, confirming it back to the user, and then subsequently losing the transaction forever? If we’re talking about a wish list or shopping cart and the occurrence is rare, nobody’s likely to get bent out of shape. On the other hand, if you accept a $1M order for hardware and services and then, poof, it disappears; that could be a career-limiting situation. If you think you’re dealing with the latter, you’re probably best sticking to high-consistency stores.
19. [Scott] There’s another related architectural pattern I’d like to mention quickly, and that’s Command Query Responsibility Separation (CQRS). Document stores are ideal for the command side of CQRS, the part where you’re updating and retrieving one resource at a time. The reason CQRS works well here is that it separates record-level access (i.e. row locks) from much broader page- and table-level locks.
20. [Scott] Before I hand off, some advice on handling schema versioning. First, put schema version numbers in your documents from the beginning. This gives you options later on how to handle simultaneous versions and data migration. Next, plan out how you are going to handle backwards-incompatible changes to your document’s structure. Option #1 is mass migration. This is how most shrink-wrapped commercial products work: when deploying a new version, all historical data is translated to the new format. This keeps things simple, but it gets impractical at large volumes and doesn’t allow running both versions for a while. Option #2 is to migrate on demand: just save the documents in the last-used schema. I like this option a lot because it’s simple and scales well, but it does affect performance. You also have to write both forward AND backward converters. Option #3 is explicit migration, which is like moving between two similar systems made by different vendors. The downside with #3 is you will have some of the same data stored in both versions until you completely shut down the old one.
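A minimal sketch of option #2 (migrate on demand) in mongo-shell JavaScript; the collection name, version numbers, and converter logic are all invented for illustration:

var CURRENT_VERSION = 2;

function upgrade(doc) {
  if (doc.schemaVersion === 1) {
    // hypothetical change: v1 stored one phone string, v2 stores an array of objects
    doc.contact.phone = [ { type: "work", number: doc.contact.phone } ];
    doc.schemaVersion = 2;
  }
  return doc;
}

function getQuote(id) {
  var doc = db.quote.findOne({ _id: id });
  if (doc && doc.schemaVersion < CURRENT_VERSION) {
    doc = upgrade(doc);
    db.quote.replaceOne({ _id: id }, doc);   // save back in the last-used schema
  }
  return doc;
}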
21. [Scott] OK, now LC is going to take over and provide some more specific examples. [LC] Thanks Scott. Let me go through with you how we started with MongoDB. We will also share our key notes from the design and development phases and how we migrated RDBMS data to MongoDB, using the DCQO application as an example. DCQO stands for Dell Online Cart Quote and Order. It is part of Dell’s digital transformation. It is a read/write-intensive, mission-critical application and one of the key applications that generate revenue for Dell. Our current production MongoDB size is 5.4 TB with a yearly growth rate of 40%. One of our large collections holds 243 million records.
22. [LC] Here was our problem statement in the legacy system. When a quote is saved, the application serializes it into XML format, compresses it, and stores it in the RDBMS as a LOB (large object). When a quote is retrieved, the application decompresses the LOB from the RDBMS and de-serializes it to get the quote information. With transaction volume increasing every year, we started to look for an alternative that could support higher concurrency without compromising performance.
23. [LC] We started the “Proof of Concept” using MongoDB Enterprise version 3.2.10 back in Dec 2016. We spent around one month to complete it, running more than 50 rounds; each test case was repeated a few rounds to ensure consistency. Here is our key testing result summary from Dec 2016: The 1st test case compared RDBMS (LOB) vs a MongoDB standalone hosted on the same hardware spec. The performance result in terms of throughput (transactions per second) was within ±5%; there was no obvious performance difference between the RDBMS and MongoDB. The 2nd test case compared RDBMS (LOB) vs a MongoDB cluster. We stress-tested the databases until they hit a breakpoint. With the same workload, the RDBMS errored out while MongoDB was still performing consistently; it scales horizontally. The 3rd test case compared a MongoDB replica set vs a MongoDB cluster. There is about 10% overhead in response time when using the MongoDB cluster. We presented this performance result to the stakeholders and concluded that, since it was only a 10% overhead (a millisecond-level difference), the decision was to go with the MongoDB cluster.
  24. [LC] Our MongoDB journey continues with Design and Development.
25. [Scott] This is the microservice architecture of our quotation system. MongoDB is the transactional data store, what we call a Class 1 system-of-record, which means it holds real-time, business-critical data with verified fault tolerance. Asynchronously, data changes are then replicated and indexed into Elasticsearch (following CQRS). Elastic is where users run queries to find individual quotes and lists of quotes. Finally, there is an eventually consistent feed to our data warehouse where complex analysis and reporting jobs run. Separating these workloads helps us prevent complex and batch processes from adversely impacting transactional users.
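The deck does not say how this replication is built; one way to implement it on MongoDB 3.6+ is a change stream consumer, sketched here (indexIntoElasticsearch() is a placeholder for the code that pushes documents into Elasticsearch):

var cursor = db.quote.watch([], { fullDocument: "updateLookup" });
while (cursor.hasNext()) {
  var change = cursor.next();
  // feed the CQRS read model asynchronously, off the transactional path
  indexIntoElasticsearch(change.fullDocument);
}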
26. [LC] Here is our MongoDB architecture. We use a cluster (sharding) because we need to support high IOPS and it is easy to scale horizontally. There are three data centers, and each box represents one server. The cluster has three shards, each a replica set with three members. The data-bearing nodes (the brown boxes) are hosted in DC1 and DC2; the arbiters are hosted in DC3. We use arbiters because DC3 is physically far from DC1 and DC2, there is network latency, and this is a read/write-intensive database. We do have a plan to upgrade the arbiters to secondaries when DC4 is ready.
27. [LC] If you are using a MongoDB cluster, it is very important to choose the right shard key to ensure even distribution. Once a collection is sharded, we cannot select a different shard key, and we cannot update the value of the shard key field. For our application, since the shard key is needed at the business level before saving the document into MongoDB, we use a GUID as the shard key, generated in the application layer. This saves a round trip between MongoDB and the application server. A GUID is a 128-bit globally unique identifier; we use a 22-character alphanumeric encoding, which is safe to use as part of a URL.
28. [LC] When estimating MongoDB sizing, note that Snappy WiredTiger compression is enabled by default. This means the actual storage size is smaller than the actual data size. As shown in the table, our MongoDB has a good compression rate of 55%-74%, which saves storage cost. Do take note that data is uncompressed when it is loaded into memory.
29. [LC] Most of us come from an RDBMS background and are used to JOINing tables in the relational model. MongoDB’s $lookup function doesn’t support sharded collections, so we store all the quote information in one collection. When de-normalizing and consolidating all the information into one JSON document, make sure that the size of the document stays under 16 MB. MongoDB is “schema less”, not “schemaless”: a JSON document is flexible in, for example, data length and type, but we still need to spend time coming up with a good schema design. A de-normalized schema is good for a high read-to-write ratio, but MongoDB 3.6 can’t achieve an atomic update across different documents. The good news is that MongoDB 4.0 will support updating multiple documents in one transaction. If you are on the MongoDB 4.0 beta, please let me know what you think of this new feature; I will be glad to hear from you.
30. [LC] Most of the time, the DBA doesn’t have visibility into what queries the application uses. One of the most direct ways to identify a problematic query is checking the MongoDB logfile during performance testing; by default, MongoDB logs any transaction/query that takes more than 100 ms. Use the explain() function to see the winningPlan, and avoid COLLSCAN on a large collection (the equivalent of a full table scan in an RDBMS). If you think that is too much work, you can alternatively use Ops Manager 3.6 and refer to the Performance Advisor section, which will recommend the indexes required; this saves a lot of time looking at the MongoDB logfile. It is also good practice to drop any unused index to optimize insert/update transactions. Use $indexStats -> accesses.ops to find out how many operations have used an index. There was a 5% performance improvement after we cleaned up the indexes. We also found that using partial updates rather than updating the whole document improves performance; we have seen a 20% performance improvement since using partial updates. We should also simplify queries to avoid complex and repeated conditions. I have given two examples on the slides. The first example is the AND condition: the query wants to retrieve documents where field1 = 19, so it is not necessary to also check that field1 is NOT NULL. The second example is the OR condition: the two OR clauses overlap, and the 1st OR clause can be removed because it is a subset of the 2nd OR clause. A few weeks ago, my colleagues came to me and said, “Hi LC, we faced a performance issue in PROD. We tested in PERF and everything was fine. We even have a higher hardware spec in PROD than in PERF.” After I looked at the statistics, the PROD volume was 150 times higher than the volume in the PERF environment, which masked the potential issue during performance testing. Hence, please load the expected volume into the PERF environment to simulate the PROD load.
31. [LC] Every database needs to be protected, so remember to enable authentication in MongoDB. We find that LDAP authentication simplifies user management; for example, we can grant all DBAs a set of roles/privileges without granting each individual privilege. We use Kerberos to avoid specifying an encrypted password in the application connection string. Our applications run on service accounts, which are managed by the security team. Each application should have a unique account to log in to MongoDB. This is not only a good guideline for segregating access; it also helps in troubleshooting, because the login information is recorded in the logfile.
32. [LC] Next, we will share with you how we migrated data from the legacy RDBMS to MongoDB.
33. [LC] Here is our data migration strategy for making MongoDB live: Populate last_modify_date in the legacy RDBMS if this column doesn’t exist. Set up the database and collections in MongoDB, and make sure there is a last_modify_date field in all documents during the schema design. Then we start the initial data loading; in our case, we had to backload data from 2014 onwards, and it took us a week to complete the initial loading. Once the data is caught up between the legacy RDBMS and MongoDB, the application sends data to both the RDBMS and MongoDB. Let MongoDB stabilize for one month; the testing team can also run any tests against MongoDB during this stabilization period. After the stabilization period, stop the application so that no data goes into the legacy RDBMS, complete the last data sync-up, and switch the application connection to the new MongoDB. In our case, this cutover happened within a few minutes. Start the application: MongoDB takes live traffic. Continue to send data to both MongoDB and the legacy RDBMS. This strategy buys us insurance: if there is any issue, we still have the choice to switch back to the legacy RDBMS since the data is in sync, which minimizes the risk. We originally targeted bringing down the data migration tool after 3 months; in fact, it ran for more than half a year. Anyone want to guess why? The reason is “no news is good news”: it was so stable that nobody remembered we still had the migration tool running. We only realized it and stopped the data migration tool after the server team sent us a notification that we needed to decommission the legacy RDBMS.
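A hedged sketch of the catch-up step on the MongoDB side; fetchChangedRows() and toQuoteDoc() stand in for the RDBMS extract (WHERE last_modify_date > checkpoint) and the relational-to-document mapping, and the sync_state bookkeeping collection is invented for the example:

var state = db.sync_state.findOne({ _id: "quote" }) || { last_sync: new Date(0) };

fetchChangedRows(state.last_sync).forEach(function (row) {
  // upsert keyed on the quote id, so re-runs are idempotent
  db.quote.replaceOne({ _id: row.quote_id }, toQuoteDoc(row), { upsert: true });
});

db.sync_state.updateOne({ _id: "quote" },
                        { $set: { last_sync: new Date() } },
                        { upsert: true });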
34. [LC] We use Ops Manager to monitor and send alerts. We have integrated the alerts with our IT operations tool, ServiceNow, via SNMP (Simple Network Management Protocol) to automatically create incident tickets for the operations team.
35. [LC] We also use Ops Manager to perform administration tasks, for example version upgrades, startup, shutdown, and configuring clusters/replica sets. It increases the productivity of the DBAs. It is also used to back up and restore our MongoDB. We use a blockstore database as storage to perform backups, and we schedule a snapshot every 12 hours. Full DB recovery is pretty fast: the network transfer was 25 MBps, and it took less than 4 hours to complete a full automated restore of a 3-shard cluster with 2 TB of data via Ops Manager.
36. That’s all for our presentation today. We have shared how MongoDB helps us solve common architectural problems, our key notes from the design and development phases, and how we completed the data migration from the RDBMS to MongoDB. We hope you enjoyed the session and can benefit from it. We are now open for questions and answers.