MongoDB Replication
Fundamentals
Desert Code Camp – Oct 2014
By
Avinash Ramineni
avinash@clairvoyantsoft.com
Agenda
• Introduction to MongoDB
• MongoDB Replication
• Understanding Oplog
• Stream data from Oplog
• Demo
• Gotchas
• Questions
Why use a NoSQL Database?
• NoSQL describes a horizontally scalable, non-relational database
with built-in replication support
• One Size does not Fit All
– RDBMS
• Horizontal or Vertical Scalability?
– Key-Value stores
– Column
– Document and Graph
• High Availability and Scalability
• CAP Theorem
– Choose any two of Consistency, Availability, and Partition Tolerance
• Availability and Partition Tolerance
Why use a NoSQL Database? -2
• NoSQL’s primary goal is to achieve horizontal scalability. It attains
this by reducing transactional semantics and referential integrity.
MongoDB -1
• Document Oriented Database
– Bridges the gap between RDBMS and Key-Value Stores
– Atomicity
– Indexing
– Sharding - horizontal Scalability
• BSON format
– Binary encoded JSON representation
• No Joins
• Complex Queries /Indices
• Locking: database-level in MongoDB 2.6; document-level with the WiredTiger engine from 3.0
MongoDB -2
• MongoDB Cluster
– Master - Slave
• Slave can become Master in case of fail-over
• Only Master is allowed to commit changes to Store
– Master – Master in limited capacity
• Inserts/Queries/Deletions are done by Id
• Does not work if the use case requires concurrent updates
to the same object
– Replica Sets
Replication
• Why Replication ?
– Failover Scenarios
• Hot Backups
– Disaster Recovery
• Provides Redundancy and Increases Data
Availability
• Increases Read Capacity
• Different uses of data
• Normal processing
• DR / Backup
• Reporting
MongoDB Terminology
• Database
– Collection (RDBMS – table)
– Document (RDBMS – row)
• Cluster Node Types
– Primary
– Secondary
– Arbiter
– Hidden
MongoDB Replication
Replica Sets
• Primary
– Primary accepts all write operations
– Only one Primary
– Strict Consistency for reads
– Logs all the changes in data to the “oplog”
• Secondary
– Replicate by reading Primary’s “oplog”
– Reads might return stale data
– Can become primary
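
The primary/secondary flow above can be sketched as a toy model in plain JavaScript. This is not MongoDB's implementation: the `Member` class and the `insertOnPrimary` and `syncFrom` helpers are hypothetical names, and the oplog is reduced to an array of insert operations. It only illustrates why a secondary read can be stale until the oplog has been replayed.

```javascript
// Toy model of oplog-based replication; names are illustrative assumptions.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.oplog = [];       // ordered list of operations
    this.applied = 0;      // number of primary oplog entries replayed so far
  }
}

// The primary applies the write locally and records it in its oplog.
function insertOnPrimary(primary, doc) {
  primary.data.set(doc._id, doc);
  primary.oplog.push({ op: "i", o: doc });
}

// A secondary pulls the entries it has not seen yet and replays them in order.
function syncFrom(secondary, primary) {
  while (secondary.applied < primary.oplog.length) {
    const entry = primary.oplog[secondary.applied];
    if (entry.op === "i") secondary.data.set(entry.o._id, entry.o);
    secondary.oplog.push(entry);
    secondary.applied += 1;
  }
}

const primary = new Member("primary");
const secondary = new Member("secondary");
insertOnPrimary(primary, { _id: 1, item: "envelopes" });
const staleRead = secondary.data.has(1); // read before replication
syncFrom(secondary, primary);
const freshRead = secondary.data.has(1); // read after catching up
```

Here `staleRead` is false and `freshRead` is true, which is the "reads might return stale data" point in miniature.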
Cluster
Primary Election
Read Preference
• Routes Read operations to Replica set Members
• Increase Read throughputs
• Reduce Latency
• Secondary reads might be stale
• Modes
– Primary
– Primary Preferred (secondary if primary unavailable)
– Secondary
– Secondary Preferred
– Nearest (read from member with least network
latency)
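
As a rough illustration of how the modes differ, the sketch below routes a read to one member of a made-up replica set. The `pickMember` function and the `role`, `healthy`, and `latencyMs` fields are assumptions for this example. Real drivers apply more rules, so picking the single lowest-latency member is a simplification.

```javascript
// Choose a member for a read under a given read preference mode.
// `members` is a hypothetical list of { name, role, healthy, latencyMs }.
function pickMember(members, mode) {
  const up = members.filter((m) => m.healthy);
  const primary = up.find((m) => m.role === "primary");
  const secondaries = up.filter((m) => m.role === "secondary");
  // Lowest-latency member of a list, or undefined for an empty list.
  const nearest = (list) =>
    list.length === 0
      ? undefined
      : list.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a));

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || nearest(secondaries);
    case "secondary":
      return nearest(secondaries);
    case "secondaryPreferred":
      return nearest(secondaries) || primary;
    case "nearest":
      return nearest(up);
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}

const members = [
  { name: "a", role: "primary", healthy: true, latencyMs: 40 },
  { name: "b", role: "secondary", healthy: true, latencyMs: 5 },
  { name: "c", role: "secondary", healthy: true, latencyMs: 20 },
];
const primaryDown = members.map((m) =>
  m.role === "primary" ? { ...m, healthy: false } : m
);
```

With these inputs, `pickMember(members, "nearest")` returns member `b`, and `pickMember(primaryDown, "primaryPreferred")` falls back to a secondary.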
Write Concern
• Writes go only to the Primary (default)
• Optionally wait until N replica set members have acknowledged the write
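// Insert one document and wait until 2 members (the primary plus one
// secondary) acknowledge it, giving up on the wait after 5000 ms.
db.products.insert(
  { item: "envelopes", qty: 100, type: "Clasp" },
  { writeConcern: { w: 2, wtimeout: 5000 } }
)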
WriteConcern: Unacknowledged (w: 0)
WriteConcern: Acknowledged (w: 1)
WriteConcern: Journaled (w: 1, j: true)
WriteConcern: Replica Acknowledged (w: 2)
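
The four write concern levels can be compared with a small predicate. This is a sketch under stated assumptions: `satisfies` and the `state` object (`members` that have applied the write, and whether the primary has `journaled` it) are hypothetical. Only numeric `w` values are handled; `wtimeout` and `w: "majority"` are left out.

```javascript
// Decide whether a write has met its write concern.
// state.members counts members (primary included) that have applied the write.
function satisfies(writeConcern, state) {
  const w = writeConcern.w === undefined ? 1 : writeConcern.w;
  if (w === 0) return true; // unacknowledged: fire and forget
  if (writeConcern.j && !state.journaled) return false; // wait for the journal
  return state.members >= w; // w: N counts the primary plus N - 1 secondaries
}

const onPrimaryOnly = { members: 1, journaled: false };
const journaledOnPrimary = { members: 1, journaled: true };
const onTwoMembers = { members: 2, journaled: true };
```

With these states, `{ w: 2 }` is not satisfied by `journaledOnPrimary` but is satisfied by `onTwoMembers`. That is the gap a `wtimeout` bounds in a real deployment.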
Stream data from MongoDB
Oplog (Operation Log)
• Similar to Oracle Redo log
– Rolling record of all operations that modify the
data
– All writes (insert/update/delete) get an entry in
the Oplog
• Every replica set member keeps its own oplog collection
– local.oplog.rs
– The oplog is yet another collection in the database
Oplog in Action - Demo
Dissecting Oplog
Dissecting Oplog ..
• Oplog Contents
– ts: the time this operation occurred.
– h: a unique ID for this operation. Each operation will
have a different value in this field.
– op: the write operation that should be applied to the
secondary
– ns: the database and collection affected by this
operation.
– o: the actual document representing the operation
– v: Version of the oplog.
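
To make the field list concrete, here is an illustrative entry written as a plain JavaScript object. Every value is made up for this example: the `ts` object stands in for a BSON timestamp, and `h` is shown as a string placeholder. `splitNamespace` is a hypothetical helper showing how `ns` breaks down into database and collection.

```javascript
// Illustrative oplog-style entry; all values are invented for this sketch.
const entry = {
  ts: { t: 1413590400, i: 1 }, // seconds since the epoch plus a counter within that second
  h: "5647301894227328172",    // placeholder for the per-operation ID
  v: 2,
  op: "i",
  ns: "test.products",
  o: { _id: 1, item: "envelopes", qty: 100 },
};

// Split "database.collection" at the first dot; collection names may contain dots.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot < 0) return { db: ns, collection: "" };
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

const target = splitNamespace(entry.ns);
```

For this entry, `target.db` is `"test"` and `target.collection` is `"products"`.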
Oplog - op
• Op – Operation
– i inserts
– u updates
– d deletes
– n no-op
• Updates have an extra field
– o2
• o has the update information
• o2 has the _id of the document that was updated
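
A minimal sketch of how a consumer might replay the four operation types against an in-memory store. `applyEntry` is a hypothetical function, the store is a `Map` keyed by `_id`, and only `$set` updates and whole-document replacements are handled. The real oplog carries more update forms than this.

```javascript
// Apply one oplog-style entry to an in-memory store (a Map keyed by _id).
function applyEntry(store, entry) {
  switch (entry.op) {
    case "i": // insert: o is the new document
      store.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: o2 identifies the document, o describes the change
      const id = entry.o2._id;
      const current = store.get(id);
      if (current === undefined) break; // nothing to update
      if (entry.o.$set) {
        store.set(id, { ...current, ...entry.o.$set });
      } else {
        store.set(id, { ...entry.o, _id: id }); // whole-document replacement
      }
      break;
    }
    case "d": // delete: o carries the _id to remove
      store.delete(entry.o._id);
      break;
    case "n": // no-op: nothing to apply
      break;
    default:
      throw new Error("unknown op: " + entry.op);
  }
}

const store = new Map();
applyEntry(store, { op: "i", o: { _id: 1, item: "envelopes", qty: 100 } });
applyEntry(store, { op: "u", o2: { _id: 1 }, o: { $set: { qty: 50 } } });
const afterUpdate = store.get(1);
applyEntry(store, { op: "n", o: { msg: "heartbeat" } });
applyEntry(store, { op: "d", o: { _id: 1 } });
```

After the update, `afterUpdate` still has `item: "envelopes"` but `qty` is 50. After the delete, the store is empty.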
Triggers?
• Does MongoDB have triggers?
– Tailable cursors
• tail -f oplog
• Notice any issues with the oplog?
– Aren't we doubling the size of the database?
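
The "tail -f" idea can be sketched without a server. The code below simulates a trigger-like consumer that remembers the last timestamp it saw and only processes newer entries for one namespace. `makeTailer` and `readNewEntries` are hypothetical names, the oplog is a plain array, and timestamps are plain numbers. A real client would instead keep a tailable cursor open on `local.oplog.rs`.

```javascript
// Return entries newer than lastTs, the way a tailing reader resumes.
function readNewEntries(oplog, lastTs) {
  return oplog.filter((e) => e.ts > lastTs);
}

// Build a poll function that calls handler for each new entry in namespace ns.
function makeTailer(oplog, ns, handler) {
  let lastTs = 0;
  return function poll() {
    for (const e of readNewEntries(oplog, lastTs)) {
      lastTs = e.ts; // remember the position even for entries we skip
      if (e.ns === ns) handler(e);
    }
  };
}

const oplog = [];
const seen = [];
const poll = makeTailer(oplog, "test.products", (e) => seen.push(e.op));

oplog.push({ ts: 1, op: "i", ns: "test.products", o: { _id: 1 } });
oplog.push({ ts: 2, op: "i", ns: "test.orders", o: { _id: 7 } });
poll();
oplog.push({ ts: 3, op: "d", ns: "test.products", o: { _id: 1 } });
poll();
// seen is now ["i", "d"]
```

Tracking the last timestamp is what lets the consumer restart without replaying entries it has already handled.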
Oplog ..
• Capped Collection (fixed Size collection)
– Circular Queue
– Default oplog size depends on the OS (on 64-bit Linux, about 5% of free disk space)
– Oldest entries get overwritten
• What if a secondary falls so far behind that the oplog
entries it needs have been overwritten?
– Full Resync
• The initial copy (copyDatabase) finishes, then the secondary starts streaming from the oplog
– What if the oplog rolls over while the secondary is
still completing the copy?
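
The rollover problem above can be illustrated with a small circular log. This is a sketch, not how MongoDB stores the oplog: `CappedLog` is a hypothetical class capped by entry count (real capped collections are capped by size in bytes), and sequence numbers stand in for oplog timestamps.

```javascript
// Fixed-capacity log where the oldest entries are dropped first.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
    this.nextSeq = 1;
  }

  append(op) {
    this.entries.push({ seq: this.nextSeq++, op });
    if (this.entries.length > this.capacity) this.entries.shift(); // overwrite the oldest
  }

  oldestSeq() {
    return this.entries.length ? this.entries[0].seq : this.nextSeq;
  }

  // Entries after lastApplied, or null when some of them were already overwritten.
  readAfter(lastApplied) {
    if (lastApplied + 1 < this.oldestSeq()) return null; // too stale: full resync needed
    return this.entries.filter((e) => e.seq > lastApplied);
  }
}

const log = new CappedLog(3);
for (const op of ["a", "b", "c", "d", "e"]) log.append(op);

const caughtUpEnough = log.readAfter(2); // entries 3, 4, 5 are still available
const tooFarBehind = log.readAfter(1);   // entry 2 is gone, so this is null
```

A `null` result corresponds to the full resync case. The same check explains why an initial copy that takes longer than the oplog window cannot be completed from the oplog alone.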
Non-Replicated Collections
• local database
– Collections in local don’t get replicated
– Changes to the collections in local database don’t
show up in the oplog
Questions

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 

MongoDB Replication fundamentals - Desert Code Camp - October 2014

  • 1. MongoDB Replication Fundamentals Desert Code Camp – Oct 2014 By Avinash Ramineni avinash@clairvoyantsoft.com
  • 2. Agenda
      • Introduction to MongoDB
      • MongoDB Replication
      • Understanding Oplog
      • Stream data from Oplog
      • Demo
      • Gotchas
      • Questions
  • 3. Why use a NoSQL Database?
      • NoSQL describes a horizontally scalable, non-relational database with built-in replication support
      • One Size does not Fit All
        – RDBMS
          • Horizontal or Vertical Scalability?
        – Key-Value stores
        – Column
        – Document and Graph
      • High Availability and Scalability
      • CAP Theorem
        – Choose any two from (Consistency, Availability, Partition Tolerance)
          • Availability and Partition Tolerance
  • 4. Why use a NoSQL Database? -2 • NoSQL’s primary goal is to achieve horizontal scalability. It attains this by reducing transactional semantics and referential integrity.
  • 5. MongoDB -1
      • Document Oriented Database
        – Bridges the gap between RDBMS and Key-Value Stores
        – Atomicity
        – Indexing
        – Sharding - horizontal Scalability
      • BSON format
        – Binary encoded JSON representation
      • No Joins
      • Complex Queries / Indices
      • Row Level Locking
  • 6. MongoDB -2
      • MongoDB Cluster
        – Master - Slave
          • Slave can become Master in case of fail-over
          • Only Master is allowed to commit changes to Store
        – Master - Master in limited capacity
          • Inserts/Queries/Deletions are done by Id
          • Does not work if the use case expects that the same object can be updated concurrently
        – ReplicaSets
  • 8. Replication
      • Why Replication?
        – Failover Scenarios
          • Hot Backups
        – Disaster Recovery
      • Provides Redundancy and Increases Data Availability
      • Increases Read Capacity
      • Different uses of data
          • Normal processing
          • DR / Backup
          • Reporting
  • 9. MongoDB Terminology
      • Database
        – Collection (RDBMS – table)
        – Document (RDBMS – row)
      • Cluster Node Types
        – Primary
        – Secondary
        – Arbiter
        – Hidden
  • 11. Replicasets
      • Primary
        – Primary accepts all write operations
        – Only one Primary
        – Strict Consistency for reads
        – Logs all the changes in data to the “oplog”
      • Secondary
        – Replicate by reading Primary’s “oplog”
        – Reads might return stale data
        – Can become primary
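The editor's notes for this deck state that a secondary becomes primary once it receives a majority of the votes. A minimal sketch of that majority arithmetic in plain JavaScript follows. `votesNeeded` and `canElectPrimary` are hypothetical helper names, not MongoDB APIs, and the sketch deliberately ignores member priorities and other election details.

```javascript
// Smallest number of votes that forms a strict majority of the voting members.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A new primary can be elected only if a majority of voting members can reach each other.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= votesNeeded(votingMembers);
}

console.log(votesNeeded(3)); // 2
console.log(canElectPrimary(3, 1)); // false
```

With three voting members the set survives the loss of one. A four-member set needs three votes, so it tolerates no more failures than a three-member set, which is one reason an arbiter is sometimes added to keep the number of voters odd.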
  • 14. Read Preference
      • Routes Read operations to Replica set Members
      • Increase Read throughputs
      • Reduce Latency
      • Secondary reads might be stale
      • Modes
        – Primary
        – Primary Preferred (secondary if primary unavailable)
        – Secondary
        – Secondary Preferred
        – Nearest (read from member with least network latency)
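The five read preference modes can be illustrated with a small routing function. This is a sketch in plain JavaScript, not driver code: `pickMember`, `lowestLatency`, the `replicaSetMembers` array, and its `pingMs` field are all hypothetical names chosen for illustration. As a simplification, it always picks the lowest-latency candidate, whereas a real driver may spread reads across several eligible members.

```javascript
// Hypothetical in-memory model of a replica set; not a MongoDB driver API.
const replicaSetMembers = [
  { host: "db1:27017", state: "PRIMARY", pingMs: 12 },
  { host: "db2:27017", state: "SECONDARY", pingMs: 3 },
  { host: "db3:27017", state: "SECONDARY", pingMs: 25 },
];

// Return the member with the lowest round-trip time, or null for an empty list.
function lowestLatency(list) {
  return list.reduce(
    (best, m) => (best === null || m.pingMs < best.pingMs ? m : best),
    null
  );
}

// Choose the member a read would be routed to under each read preference mode.
function pickMember(mode, set) {
  const primary = set.find((m) => m.state === "PRIMARY") || null;
  const secondary = lowestLatency(set.filter((m) => m.state === "SECONDARY"));
  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || secondary; // secondary only if no primary is available
    case "secondary":
      return secondary;
    case "secondaryPreferred":
      return secondary || primary; // primary only if no secondary is available
    case "nearest":
      return lowestLatency(set); // ignores member state entirely
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}

console.log(pickMember("nearest", replicaSetMembers).host); // db2:27017
```

Any mode that can return a secondary trades consistency for throughput or latency, since the slide notes that secondary reads might be stale.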
  • 15. Write Preference (Write Concern)
      • Write only on Primary (Default)
      • Write to N replica set members
          db.products.insert(
            { item: "envelopes", qty: 100, type: "Clasp" },
            { writeConcern: { w: 2, wtimeout: 5000 } }
          )
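The `w` and `wtimeout` options in the insert above can be read as a simple decision rule: the call returns once enough members have acknowledged the write, or reports a timeout if they have not done so in time. The sketch below models only that rule in plain JavaScript. `checkWriteConcern` and its arguments are hypothetical names for illustration, not part of the mongo shell or any driver.

```javascript
// Decide the outcome of a write given how many members have acknowledged it so far.
// writeConcern: { w: <number of members>, wtimeout: <milliseconds, 0 means wait forever> }
function checkWriteConcern(writeConcern, ackCount, elapsedMs) {
  if (ackCount >= writeConcern.w) {
    return "acknowledged";
  }
  if (writeConcern.wtimeout > 0 && elapsedMs >= writeConcern.wtimeout) {
    return "timed out";
  }
  return "waiting";
}

const concern = { w: 2, wtimeout: 5000 };
console.log(checkWriteConcern(concern, 1, 100));  // waiting
console.log(checkWriteConcern(concern, 2, 100));  // acknowledged
console.log(checkWriteConcern(concern, 1, 5000)); // timed out
```

A timeout only means the requested number of acknowledgements did not arrive in time. It does not undo a write that the primary has already applied.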
  • 20. Stream data from MongoDB
  • 21. Oplog (Operation Log)
      • Similar to Oracle Redo log
        – Rolling record of all operations that modify the data
        – All writes (insert/update/delete) get an entry in the Oplog
      • Replicaset members have oplog collection
        – local.oplog.rs
        – Oplog is yet another collection in the database
  • 22. Oplog in Action - Demo
  • 24. Dissecting Oplog
      • Oplog Contents
        – ts: the time this operation occurred.
        – h: a unique ID for this operation. Each operation will have a different value in this field.
        – op: the write operation that should be applied to the slave.
        – ns: the database and collection affected by this operation.
        – o: the actual document representing the operation.
        – v: version of the oplog.
  • 25. Oplog - op
      • Op – Operation
        – i: inserts
        – u: updates
        – d: deletes
        – n: no-op
      • Updates have an extra field, o2
          • o has the update information
          • o2 has the id that was updated
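To make the fields concrete, the sketch below replays a handful of oplog-shaped entries against an in-memory copy of the data, which is roughly what a secondary does when it reads the primary's oplog. Everything here is an assumption for illustration: the entries in `sampleOplog` are invented, `ts` is simplified to an integer (a real oplog uses a BSON timestamp), the `h` and `v` fields are omitted, and `applyEntry` handles only the `$set` form of an update.

```javascript
// Hypothetical oplog entries using the fields described above (ts, op, ns, o, o2).
const sampleOplog = [
  { ts: 1, op: "i", ns: "shop.products", o: { _id: 1, item: "envelopes", qty: 100 } },
  { ts: 2, op: "u", ns: "shop.products", o2: { _id: 1 }, o: { $set: { qty: 90 } } },
  { ts: 3, op: "n", ns: "", o: { msg: "periodic noop" } },
  { ts: 4, op: "i", ns: "shop.products", o: { _id: 2, item: "stamps", qty: 5 } },
  { ts: 5, op: "d", ns: "shop.products", o: { _id: 2 } },
];

// Apply one entry to the data, modelled as Map(ns -> Map(_id -> document)).
function applyEntry(data, entry) {
  if (entry.op === "n") return; // no-op: nothing to change
  if (!data.has(entry.ns)) data.set(entry.ns, new Map());
  const collection = data.get(entry.ns);
  switch (entry.op) {
    case "i": // o is the full document to insert
      collection.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // o2 identifies the document, o describes the change
      const doc = collection.get(entry.o2._id);
      if (doc) Object.assign(doc, entry.o.$set);
      break;
    }
    case "d": // o carries the _id of the document to delete
      collection.delete(entry.o._id);
      break;
    default:
      throw new Error("unsupported op: " + entry.op);
  }
}

// Replay entries in order and return the resulting data.
function replay(entries) {
  const data = new Map();
  for (const entry of entries) applyEntry(data, entry);
  return data;
}

console.log(replay(sampleOplog).get("shop.products").get(1).qty); // 90
```

Replaying the entries in `ts` order leaves one document in `shop.products`, with `qty` updated to 90. The second document was inserted and then deleted.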
  • 26. Triggers?
      • Does mongoDB have triggers?
        – Tailable cursors
          • tail -f oplog
      • Notice any issues with oplog
        – Aren't we doubling the size of the database?
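The "tail -f oplog" idea can be sketched without a database: remember the last timestamp seen, fetch only newer entries, and hand each one to a handler registered for its namespace and operation. The code below is a simulation in plain JavaScript over an in-memory array, not a real tailable cursor. `liveOplog`, `pollOplog`, `dispatch`, and the handler key format are hypothetical, and `ts` is again simplified to an integer.

```javascript
// Hypothetical oplog contents; a real consumer would read local.oplog.rs instead.
const liveOplog = [
  { ts: 1, op: "i", ns: "shop.products", o: { _id: 1 } },
  { ts: 2, op: "u", ns: "shop.products", o2: { _id: 1 }, o: { $set: { qty: 5 } } },
];

// Return every entry newer than lastSeenTs, plus the timestamp to resume from.
function pollOplog(oplog, lastSeenTs) {
  const fresh = oplog.filter((entry) => entry.ts > lastSeenTs);
  const resumeTs = fresh.length > 0 ? fresh[fresh.length - 1].ts : lastSeenTs;
  return { fresh, resumeTs };
}

// Trigger-like dispatch: call the handler registered for "<ns>:<op>", if any.
function dispatch(entries, handlers) {
  for (const entry of entries) {
    const handler = handlers[entry.ns + ":" + entry.op];
    if (handler) handler(entry);
  }
}

// Usage: react to inserts into shop.products, then resume where we stopped.
const insertedIds = [];
const handlers = { "shop.products:i": (entry) => insertedIds.push(entry.o._id) };

let { fresh, resumeTs } = pollOplog(liveOplog, 0);
dispatch(fresh, handlers); // handles the insert of _id 1

liveOplog.push({ ts: 3, op: "i", ns: "shop.products", o: { _id: 2 } });
({ fresh, resumeTs } = pollOplog(liveOplog, resumeTs));
dispatch(fresh, handlers); // handles only the new entry, _id 2

console.log(insertedIds); // [ 1, 2 ]
```

Keeping `resumeTs` outside the process (for example in a file) would let a consumer restart without reprocessing entries, provided the oplog has not rolled past that point in the meantime.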
  • 27. Oplog
      • Capped Collection (fixed-size collection)
        – Circular Queue
        – Default Oplog size depends on the OS
        – Oldest entries get overwritten
      • What if a slave node falls so far behind that the oplog entries it needs have been overwritten?
        – Full Resync
          • copyDatabase starts streaming from oplog
        – What if the oplog rolls over while the slaves are completing the copy?
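The circular-queue behaviour and the "slave fell too far behind" case can be sketched with a small ring buffer. This is an illustration under stated assumptions, not MongoDB's implementation: `CappedOplog` is a hypothetical class, its capacity is counted in entries (a real capped collection is sized in bytes), and `ts` is an integer.

```javascript
// A fixed-capacity ring buffer: once full, each append overwrites the oldest entry.
class CappedOplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0;  // index of the slot the next append will write
    this.count = 0; // number of filled slots
  }

  append(entry) {
    this.slots[this.next] = entry;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Oldest entry still present, or null if nothing has been written yet.
  oldest() {
    if (this.count === 0) return null;
    const start = (this.next - this.count + this.capacity) % this.capacity;
    return this.slots[start];
  }

  // A slave can catch up only if its last applied entry has not been overwritten.
  canCatchUp(lastAppliedTs) {
    const first = this.oldest();
    return first === null || lastAppliedTs >= first.ts;
  }
}

const oplog = new CappedOplog(3);
for (let ts = 1; ts <= 5; ts++) oplog.append({ ts, op: "n" });

console.log(oplog.oldest().ts);   // 3 (entries 1 and 2 were overwritten)
console.log(oplog.canCatchUp(4)); // true
console.log(oplog.canCatchUp(1)); // false, so a full resync is needed
```

The same check explains the last bullet on the slide: if the oplog rolls past the point where a copy started before the copy finishes, the slave cannot catch up from the oplog afterwards.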
  • 28. Non-Replicated Collections
      • local database
        – Collections in local don’t get replicated
        – Changes to the collections in local database don’t show up in the oplog

Editor's Notes

  1. One size does not fit all: the ability of the system to store, analyze, and manipulate the data without losing availability, performance, and throughput as the data increases; enforce data integrity and enforce schema rules.
     Enable high-performance queries on complex, connected data. Easily represent the complex, connected data stored in today’s applications.
     The type of NoSQL database you choose depends on what type of data you need to store and how you want to access it. A graph database, for instance, models real-world connections better than other NoSQL databases.
     Column family: a powerful way to capture semi-structured data, but it often sacrifices consistency for availability.
     A document database contains a collection of key-value pairs stored in documents. While it is good at storing documents, it was not designed with enterprise-strength transactions and durability in mind. Document databases are the most flexible of the key-value style stores, perfect for storing a large collection of unrelated, discrete documents.
     Relational DBs scale by adding more processor / memory / disk space -> loading data from disk?? Try adding a new column to a very large relational database.
     Oracle RAC: multiple computers with access to the same database, using shared storage facilities that do not scale out.
     Availability and Consistency: a single database with all your data. Might not work, because all the data needs to be in a single instance of the database.
     Partition tolerance and Consistency: 2-phase commits across databases.
     Availability and Partition tolerance: the NoSQL way.
     NoSQL storage is highly replicated (a commit doesn’t occur until the data is successfully written to at least two separate storage devices) and the file systems are optimized for write-only commits.
     • Consistency (all nodes see the same data at the same time)
     • Availability (node failures don’t prevent survivors from continuing to operate)
     • Partition tolerance (no failures less than total network failures cause the system to fail)
     Don’t be stubborn; neither NoSQL nor traditional databases apply to all cases.
     • Apply the CAP Theorem to your use cases to determine feasibility
  2. Relational DBs scale by adding more processor / memory / disk space -> loading data from disk?? Try adding a new column to a very large relational database.
     Oracle RAC: multiple computers with access to the same database, using shared storage facilities that do not scale out.
     Availability and Consistency: a single database with all your data. Might not work, because all the data needs to be in a single instance of the database.
     Partition tolerance and Consistency: 2-phase commits across databases.
     Availability and Partition tolerance: the NoSQL way. To scale horizontally, you need strong network partition tolerance, which requires giving up either consistency or availability.
     NoSQL storage is highly replicated (a commit doesn’t occur until the data is successfully written to at least two separate storage devices) and the file systems are optimized for write-only commits.
     Consistency means that each client always has the same view of the data. Availability means that all clients can always read and write. Partition tolerance means that the system works well across physical network partitions.
     Don’t be stubborn; neither NoSQL nor traditional databases apply to all cases.
     • Apply the CAP Theorem to your use cases to determine feasibility
  3. mongoDB is a document-based NoSQL database that bridges the gap between scalable key-value stores like Datastore and Memcache DB, and RDBMS’s querying and robustness capabilities. Document-oriented storage - data is manipulated as JSON-like documents • Querying - uses JavaScript and has APIs for submitting queries in every major programming language • In-place updates - atomicity • Indexing - any attribute in a document may be used for indexing and query optimization • Auto-sharding - enables horizontal scalability • Map/reduce - the mongoDB cluster may run smaller MapReduce jobs than a Hadoop cluster with significant cost and efficiency improvements
  6. A replica set is a group of mongod instances that host the same data set. One mongod, the primary, receives all write operations. All other instances, secondaries, apply operations from the primary so that they have the same data set.
  7. When a primary does not communicate with the other members of the set for more than 10 seconds, the replica set will attempt to select another member to become the new primary. The first secondary that receives a majority of the votes becomes primary.
  8. With request association, a thread keeps reading from the same member until one of the following happens: the application performs a read with a different read preference, the thread terminates, or the client receives a socket exception, as is the case when there’s a network error or when the mongod closes connections during a failover. A socket exception triggers a retry, which may be transparent to the application. When using request association, if the client detects that the set has elected a new primary, the driver will discard all associations between threads and members.
  9. You can override the default write concern, for example to confirm write operations on a specified number of the replica set members. MongoDB does not provide any multi-document transactions or isolation.
  10. The following method includes a write concern that specifies that the method returns only after the write propagates to the primary and at least one secondary, or after it times out at 5 seconds.
  11. Acknowledged: with a receipt acknowledged write concern, the mongod confirms that it received the write operation and applied the change to the in-memory view of data. Acknowledged write concern allows clients to catch network, duplicate key, and other errors. MongoDB drivers use the acknowledged write concern by default. Acknowledged write concern does not confirm that the write operation has persisted to the disk system.
  12. With a journaled write concern, MongoDB acknowledges the write operation only after committing the data to the journal. This write concern ensures that MongoDB can recover the data following a shutdown or power interruption. You must have journaling enabled to use this write concern. With a journaled write concern, write operations must wait for the next journal commit. To reduce latency for these operations, MongoDB also increases the frequency with which it commits operations to the journal.
  14. A fixed-sized collection that automatically overwrites its oldest entries when it reaches its maximum size. The MongoDB oplog that is used in replication is a capped collection
  15. GO OVER other possible updates
  16. Oplog is the reason why capped collections were invented