Letters from the Trenches:

Lessons Learned Taking MongoDB to Production
October 17, 2013

Rick Warren

rick.warren@eharmony.com
Traditional Internet Dating Service
Unidirectional User-Defined Criteria
eHarmony Matching
Bidirectional User-Defined Criteria
eHarmony Matching: 3 Parts

1. Bidirectional User-Defined Criteria
2. Research-Based Compatibility Models
3. Machine-Learned Affinity Models

Photo Credits

Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/
Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/
Application: Find Potential Matches
As fast as possible:
1. Find people who meet each other’s preferences (Bidirectional User-Defined Criteria)
2. Discard combos that violate Compatibility Models
Application: Find Potential Matches
• User attributes in MongoDB
  – Replicated
  – Sharded
• Data access pattern:
  – Read-heavy
  – Complex queries
• Java application
Application: Find Potential Matches
• In full production > 6 months
  – Following several months of limited production
  – Following several months of intensive dev + testing
• No production outages
• MongoDB is no longer the thing we worry about most

Lesson: Provision for Success
✓ Fit all data & indexes in memory
  – MongoDB storage is implemented using memory-mapped files
  – Beware under-provisioned VMs
✓ Minimize field names to keep data as small as possible
  – “Schema-less records” == “schema repeated millions of times”
  – Morphia Java library can help with mapping
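A rough sketch of why field names matter: BSON stores every field name inside every document as a null-terminated UTF-8 string, so the name cost is paid once per record. The field names and the document count below are hypothetical, chosen only for illustration, and the arithmetic covers top-level names only.

```python
def field_name_bytes(field_names):
    """Bytes spent on field names in one document.

    Each name is stored as its UTF-8 bytes plus a one-byte null terminator.
    """
    return sum(len(name.encode("utf-8")) + 1 for name in field_names)


def savings_bytes(long_names, short_names, document_count):
    """Total bytes saved across a collection by using the shorter names."""
    per_document = field_name_bytes(long_names) - field_name_bytes(short_names)
    return per_document * document_count


# Hypothetical field names; not taken from the talk.
LONG_NAMES = ["minimumPreferredAge", "maximumPreferredAge", "maximumDistanceMiles"]
SHORT_NAMES = ["minA", "maxA", "maxD"]

saved = savings_bytes(LONG_NAMES, SHORT_NAMES, 10_000_000)
print(f"{saved / 2**20:.0f} MiB saved on field names alone")
```

A mapping layer can keep the readable names in application code while writing the short ones to the database, which is the role the slide suggests for Morphia.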
Lesson: Provision for Success
Scale write ops & data volume by adding shards
Scale read ops by adding secondaries
[Diagram: two “Shard / RS” boxes, each with one Primary and two Secondaries; ellipses indicate further shards and secondaries]
Lesson: Be Ready to Tinker
• Many processes:
  – mongod on each node, primary or secondary
  – 2 MMS agents
  – Plus, if sharding:
    • mongos for each app instance
    • 3 config servers
• …Each configured separately & differently
  – Configuration file
  – Manual commands to set up
• Less likely to have DBA support
  – …and relational Best Practices may not transfer

✓ Use Puppet, Chef, or similar
  – Helps with config files, command-line arguments
  – Insufficient for adding secondaries, configuring indexes, etc.
✓ If scripting, use a real client driver, not the mongo shell
  – The shell doesn’t handle output or errors consistently
  – Can’t wait in JavaScript
✓ Train your DB/Ops team(s)
  – And expect to do more yourself
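One thing a driver-based script can do that the slide says the shell cannot is wait for a condition. A minimal sketch of such a polling helper follows. The `wait_until` name and its parameters are hypothetical, and the check is stubbed out here; in a real script it would wrap a driver call, for example one that runs the `replSetGetStatus` command and inspects the new member's state.

```python
import time


def wait_until(check, timeout_s=60.0, interval_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns a truthy value or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stub standing in for "has the new secondary caught up yet?".
attempts = iter([False, False, True])
ok = wait_until(lambda: next(attempts), timeout_s=5.0, interval_s=0.01)
print("secondary ready:", ok)
```

Returning a boolean rather than raising lets the calling script decide whether a timeout is fatal or merely worth logging.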
Lesson: Shadow Mode Is Your Friend
✓ Test with real production data, conditions, and queries
✓ Measure everything (MMS is a good start, but insufficient)
✓ Kill mongod instances to verify resiliency
[Diagram: Real Events & Requests go to both the Real Application and the “Shadow” Application; an X is drawn next to the “Shadow” Application]
Primary school enrollment, Armenia: http://data.worldbank.org/country/armenia
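The shadow-mode idea can be sketched as a small wrapper: every real request is also sent to the shadow code path, the shadow result is thrown away, and timings and errors are recorded for both. The function and handler names are hypothetical and this is not the author's implementation. A production version would most likely run the shadow call asynchronously so that it cannot slow down real responses.

```python
import time


def handle_with_shadow(request, real_handler, shadow_handler, metrics):
    """Serve the request from the real handler and mirror it to the shadow.

    Appends (label, elapsed_seconds, error_or_None) tuples to metrics.
    Shadow failures are recorded but never reach the caller.
    """
    start = time.perf_counter()
    response = real_handler(request)
    metrics.append(("real", time.perf_counter() - start, None))

    start = time.perf_counter()
    error = None
    try:
        shadow_handler(request)  # result intentionally discarded
    except Exception as exc:
        error = repr(exc)
    metrics.append(("shadow", time.perf_counter() - start, error))

    return response


def real(request):
    return request.upper()


def broken_shadow(request):
    raise RuntimeError("shadow cluster unavailable")


metrics = []
result = handle_with_shadow("find matches", real, broken_shadow, metrics)
print(result, [(label, err) for label, _, err in metrics])
```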
Lesson: Be Ready to Restore Your Data
• Schemas will change
• Shard key(s) will change
  – More on this later…
• You’ll experience MongoDB bugs

✓ Maintain 2nd copy in another format
  – Backing source of truth?
  – Backup in standard format?
  – Second cluster with different version of MongoDB?
✓ Increment DB name with each reload
✓ Automate reload process, and use it

Image credit: http://tutorialphotoshopcs-putradom.blogspot.com/2012/11/create-dramatic-meteor-and-burning-city.html
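"Increment DB name with each reload" could be automated with a helper like the one below. The `_vN` suffix convention and the database name are assumptions made for illustration; the slides do not say which naming scheme was used.

```python
import re


def next_db_name(current):
    """Return the database name to use for the next full reload.

    Assumes a hypothetical "<base>_v<number>" convention; a name without
    a version suffix is treated as version 0.
    """
    match = re.fullmatch(r"(.*?)_v(\d+)", current)
    if match is None:
        return f"{current}_v1"
    base, version = match.group(1), int(match.group(2))
    return f"{base}_v{version + 1}"


print(next_db_name("matching_v7"))
```

Loading into a fresh name leaves the previous database untouched, so the application can be pointed at the new one only after the reload has been verified.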
Lesson: Pick a Good Shard Key

1. Distribute Data Volume Evenly
– This is what auto-balancing does for you.

2. Multiply Query Performance
– Isolate queries to 1 shard to multiply read
capacity by # of shards.

3. Distribute Workload Evenly
– Conflicts with above!
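A toy model of goal 2: a query that includes the shard key can be routed to a single shard, while a query without it has to be sent to every shard. This sketch uses a simple hash-modulo placement and a hypothetical `region` shard key; it illustrates the routing idea only and is not MongoDB's actual chunk-based routing.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical cluster size


def shard_for(key_value, shard_count=SHARD_COUNT):
    """Map a shard-key value to a shard index (toy hash-modulo placement)."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shards_touched(query, shard_key_field, shard_count=SHARD_COUNT):
    """Return the set of shards a query must visit.

    With the shard key present the query is targeted at one shard;
    without it the query is scattered to all shards.
    """
    if shard_key_field in query:
        return {shard_for(query[shard_key_field], shard_count)}
    return set(range(shard_count))


targeted = shards_touched({"region": "CA", "age": 30}, "region")
scattered = shards_touched({"age": 30}, "region")
print(len(targeted), len(scattered))
```

The conflict with goal 3 shows up in the same model: if most queries carry the same `region` value, they all land on one shard no matter how evenly the data itself is spread.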
Lesson: Pick a Good Shard Key
[Diagram: mongos in front of Shard 1 and Shard 2, shown alongside the same three goals as the previous slide]

Image credits:
Jessica Rabbit: http://disney.wikia.com/wiki/Jessica_Rabbit
Steve Urkel: http://celebratingtvandfilmgeeks.wordpress.com/2010/04/25/steve-urkel-the-
Lesson: Pick a Good Shard Key
DO These Things
✓ Use fields appearing in every query
✓ Choose combo that finely partitions data
✓ Measure relative load across shards
  – Consider adding secondaries to loaded shard(s) ONLY

BEWARE These Things
• Include serial numbers (or similar)
• Hash fields when reads might be a problem
• Mutable fields in shard key: remove and add
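"Measure relative load across shards" can be as simple as comparing each shard's operation count to the cluster mean. The sketch below assumes per-shard operation counts have already been collected from whatever monitoring is in place. The shard names, the counts, and the 1.5 threshold are all hypothetical.

```python
def relative_load(ops_per_shard):
    """Return each shard's load as a multiple of the mean load."""
    mean = sum(ops_per_shard.values()) / len(ops_per_shard)
    if mean == 0:
        return {shard: 0.0 for shard in ops_per_shard}
    return {shard: ops / mean for shard, ops in ops_per_shard.items()}


def overloaded_shards(ops_per_shard, threshold=1.5):
    """List shards whose load exceeds threshold times the mean."""
    ratios = relative_load(ops_per_shard)
    return sorted(shard for shard, ratio in ratios.items() if ratio > threshold)


# Hypothetical read counts over some sampling window.
reads = {"shard1": 900, "shard2": 100, "shard3": 200}
print(relative_load(reads))
print(overloaded_shards(reads))
```

The shards this flags are the candidates for extra secondaries, in line with the slide's advice to add them to the loaded shards only.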
Summary

1. Provision for Success
2. Be Ready to Tinker

3. Shadow Mode Is Your Friend
4. Be Ready to Restore Your Data

5. Pick a Good Shard Key
We’re Hiring

http://www.eharmony.com/about/careers

rick.warren@eharmony.com


Editor's Notes

  1. Specifically, we’ll be talking about 5 lessons. It should take about 30 minutes.
  2. At some point, you’ll realize the data in your cluster isn’t what you need, or isn’t stored how you need it. You’ll need to reconstruct it. In the first two cases, you could dump and reload a single cluster. What about production changes in the meantime?
  3. The idea is for the breakdown of data across shards to reflect the same natural divisions of data you’re likely to query against.