SlideShare a Scribd company logo
1 of 44
Download to read offline
MongoDB 在盛大大数据
                    量下的应用

               郭理靖 @sn da ; gu ol ijing@ gm a il .c om ;




11年10月19日星期三
Sections


               Basic introduction to MongoDB

               Monitor MongoDB

               Backup and Rollback of MongoDB

               Case Study




11年10月19日星期三
What’s MongoDB


               MongoDB (from "humongous") is a
               scalable, high-performance, open
               source, document-oriented
               database.
               Current version:2.0.0


11年10月19日星期三
What’s MongoDB


               MongoDB=
               JSON + Indexes

11年10月19日星期三
Philosophy Of MongoDB




11年10月19日星期三
Features

               Document-oriented storage (JSON)

               Full Index Support (Indexes)

               Replication & High availability

               Rich Document-based Queries and Updates

               Map/Reduce

               Auto-Sharding and GridFS


11年10月19日星期三
Terminology
                 RDBMS              MongoDB
                  Table             Collection
               View/Row(s)       JSON Document
               Column name         Field name
                  Index               Index
                   Join        Embedding & Linking
                 Partition            Shard
               Partition Key        Shard Key


11年10月19日星期三
SELECT
                          mySQL                                        db.runCommand({
                                                                                         MongoDB
               Dim1, Dim2,                                 1           mapreduce: "DenormAggCollection",
               SUM(Measure1) AS MSum,                                  query: {
                                                           2
               COUNT(*) AS RecordCount,                                     filter1: { '$in': [ 'A', 'B' ] },
               AVG(Measure2) AS MAvg,                      3                filter2: 'C',
               MIN(Measure1) AS MMin                                        filter3: { '$gt': 123 }
               MAX(CASE                                                  },
                  WHEN Measure2 < 100                      4           map: function() { emit(
                  THEN Measure2                                             { d1: this.Dim1, d2: this.Dim2 },
               END) AS MMax                                                 { msum: this.measure1, recs: 1, mmin: this.measure1,
           FROM DenormAggTable                                                mmax: this.measure2 < 100 ? this.measure2 : 0 }
           WHERE (Filter1 IN (’A’,’B’))                                  );},
               AND (Filter2 = ‘C’)                         5           reduce: function(key, vals) {
               AND (Filter3 > 123)                                          var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
           GROUP BY Dim1, Dim2                             1                for(var i = 0; i < vals.length; i++) {
           HAVING (MMin > 0)                                                  ret.msum += vals[i].msum;
           ORDER BY RecordCount DESC                                          ret.recs += vals[i].recs;
           LIMIT 4, 8                                                         if(vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
                                                                              if((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
                                                                                ret.mmax = vals[i].mmax;
                                                                            }
               1   Grouped dimension columns are pulled                     return ret;
                   out as keys in the map function,                      },
                   reducing the size of the working set.               finalize: function(key, val) {
                                                                   6
               2   Measures must be manually aggregated.       7            val.mavg = val.msum / val.recs;
                                                                            return val;
               3 Aggregates depending on record counts
                                                                         },




                                                                                                                               Revision 4, Created 2010-03-06
                                                                                                                               Rick Osborne, rickosborne.org
                 must wait until finalization.
               4 Measures can use procedural logic.
                                                                       out: 'result1',
                                                                       verbose: true
               5 Filters have an ORM/ActiveRecord-                     });
                 looking style.
                                                                       db.result1.
               6 Aggregate filtering must be applied to
                 the result set, not in the map/reduce.
                                                                         find({ mmin: { '$gt': 0 } }).
               7 Ascending: 1; Descending: -1
                                                                         sort({ recs: -1 }).
                                                                         skip(4).
                                                                         limit(8);




11年10月19日星期三
Indexes

               Unique Indexes or Duplicate Values Indexes

               Index of Single Key(Embedded key, Document
               key) .Default index, ‘_id’: MongoID(GUID)

               Index of Compound
               Keys(db.user.ensureIndex({credit : -1, name: 1}))

               Sparse Index. (A sparse index can only have one
               field) //after 1.75


11年10月19日星期三
Geo Indexes

               Geospatial index supported (good news for LBS)

               db.c.find( {a:[50,50]} ) using index {a:’2d’}

               db.c.find( {a:{$near:[50,50]}} ) using index
               {a:’2d’}

               Results are sorted closest - farthest

               db.c.find( {a:{$within:{$box:[[40,40],
               [60,60]]}}} ) using index {a:’2d’}


11年10月19日星期三
Atomic Operations
               $set $unset $inc

               $push - append a value to an array

               $pushAll - append several values to an array

               $pull - remove a value(s) from an existing array

               $pullAll - remove several value(s) from an
               existing array

               $bit - bitwise operations

11年10月19日星期三
Tips

               Update requires a write lock.

               Write lock is “greedy” right now

               Update in big collection is slow and block other
               writes during the update

               Query it first then update it will reduce the lock
               time



11年10月19日星期三
Replication




11年10月19日星期三
Replica set

               Replica sets are basically master/slave
               replication, adding automatic failover and
               automatic recovery of member nodes.

               A Replica Set consists of two or more nodes that
               are copies of each other. (i.e.: replicas)

               The Replica Set automatically elects a Primary
               (master) if there is no primary currently
               available.


11年10月19日星期三
Replica set

               Automated fail-over

               Distribute read load

               Simplified maintenance

               A Replica Set can have passive members that
               cannot become primary (for reading and backup)

               A Replica Set can have delayed nodes (in case of
               user error)


11年10月19日星期三
step down primary

               If you step down primary manually when
               primary is heavy, the replica set may can not
               elect new primary. (it is fixed in 1.8.0)

               May lose data when step down primary
               manually. (sometimes, not test in 1.8.0 yet)

               Clients may not know what happened.



11年10月19日星期三
What’s happened when
                  primary is down?
               Replica set will elect new primary, but how?

               what’s arbiter?

               operator time

               votes

               priority

               may lose data when primary is down


11年10月19日星期三
Auto-Sharding




11年10月19日星期三
Monitoring and Diagnostics

               Query Profiler

               Http Console

               mongostat

               db.stat()

               db.serverStatus()



11年10月19日星期三
Database Profiler


               >db.setProfilingLevel(1,20)

                { "was" : 0, "slowms" : 100, "ok" : 1 }

               > db.getProfilingStatus() //after1.7x

                { "was" : 1, "slowms" : 20 }



11年10月19日星期三
Database Profiler


               db.system.profile.find()

               {"ts" : "Thu Jan 29 2009 15:19:32 GMT-0500
               (EST)" , "info" : "query test.$cmd ntoreturn:1
               reslen:66 nscanned:0 <br>query: { profile: 2 }
               nreturned:1 bytes:50" , "millis" : 0}

               db.system.profile.find().sort({$natural:-1})//To
               see newest information first


11年10月19日星期三
mongostat




11年10月19日星期三
mongostat

               inserts/s - # of inserts per second

                query/s     - # of queries per second

                update/s - # of updates per second

                delete/s    - # of deletes per second

                getmore/s     - # of get mores (cursor batch)
               per second


11年10月19日星期三
mongostat

               command/s      - # of commands per second

                flushes/s - # of fsync flushes per second

                 mapped - amount of data mmaped (total data
               size) megabytes

                visze   - virtual size of process in megabytes

                res     - resident size of process in megabytes


11年10月19日星期三
mongostat

                faults/s    - # of pages faults/sec (linux only)

                locked      - percent of time in global write lock

                 idx miss   - percent of btree page misses
               (sampled)

                q t|r|w     - lock queue lengths (total|read|
               write)

                conn        - number of open connections


11年10月19日星期三
Admin UIs




11年10月19日星期三
MongoDB with Numa

               WARNING: You are running on a NUMA
               machine.

          We suggest launching mongod like this
          to avoid performance problems:
          numactl --interleave=all mongod
          [other options]




11年10月19日星期三
Security


               Auth is supported in Replica set/Master-Slaves,
               Auto-Sharding

               bind_ip

               start auth in RS may have bug.(use http to check
               the status and try restart mongods)




11年10月19日星期三
Memory

               Memory-mapping

               Keep indexes in memory

               db.test.stats()

               db.test.totalIndexSize()

               RAM > indexes + hot data = better performance



11年10月19日星期三
some tips

               No transactions

               Fast-N-Loose by default (no safe/w/GLE)

               Indexing order matters; query optimizer helps

               One write thread,many query threads

               Memory Mapped Data Files

               BSON everywhere


11年10月19日星期三
Oplog of MongoDB

               Oplog in Replica set:

                local.system.replset: store config of replica set,
                use rs.conf() to detail

                local.oplog.rs: capped collection, use --
                oplogSize to set size

                local.replset.minvalid: for sync status

                NO INDEX


11年10月19日星期三
Oplog of MongoDB

               > db.test.insert({'name': 'guolijing', ‘fat’:‘false’})

               > db.oplog.rs.find().sort({$natural:-1})

               { "ts" : { "t" : 1318772440000, "i" : 1 }, "h" :
               NumberLong( "1503388658822904667" ), "op" :
               "i", "ns" : "test.test", "o" : { "_id" :
               ObjectId("4e9aded8bbf25c4665f212fc"),
               "name" : "guolijing" , ‘fat’:‘false’} }



11年10月19日星期三
Oplog of MongoDB

               {ts:{},h{}, op:{},ns:{},o:{},o2:{} }
               Ts: 8 bytes of time stamp by 4 bytes Unix timestamp + 4 bytes since the count said.

          This value is very important, in the elections (such as master is down unit), new primary would
          choose the biggest change as the ts new primary.
          Op: 1 bytes of operation types, such as I said insert, d said the delete.
          In the namespace ns: the operation.
          O: the operation is corresponding to the document, that is, the content of the current operation
          (such as the update operation to update when the fields and value)
          O2: perform update operation in the condition is limited to the update, where to have this property
          Among them, can be the op several situations:
          h:// hash mongodb use to make sure we are reading the right flow of ops and aren't on an out-of-
          date "fork"




11年10月19日星期三
Oplog of MongoDB

               > db.test.update({'name':'guolijing'},{$set:
               {'fat':'ture'}})

               > db.oplog.rs.find().sort({$natural:-1})

               { "ts" : { "t" : 1318775290000, "i" : 1 }, "h" :
               NumberLong( "-5204953982587889486" ),
               "op" : "u", "ns" : "test.test", "o2" : { "_id" :
               ObjectId("4e9aded8bbf25c4665f212fc") }, "o" :
               { "$set" : { "fat" : "ture" } } }


11年10月19日星期三
Feature of Oplog


               Replay same oplog is harmless

               Replay old oplogs is harmless if the ts in last one
               is newer than current database




11年10月19日星期三
Back up

               mongoexport mongoimport

               mongodump mongorestore

               data consistency? use --oplog

               use incremental back up

               read oplog and replay (wordnik tools)



11年10月19日星期三
Rollback MongoDB


               Use snapshot + oplog
               Use delayed secondary +
               oplog


11年10月19日星期三
Vertical Scala


               Old server 8G RAM +200G dis

               New server 64G RAM + 2T disk

               What can I do if my service should always keep
               online ?




11年10月19日星期三
Vertical Scala

               You’re luck if using Replica set

               move everything(include ip config) from
               secondary to new service

               step down primary, do the same thing above

               well, problem solved



11年10月19日星期三
Build New Index


               Build a index is quite easy.....

               Build a index in a huge collections, such as
               500GB ,and database is very buy, is it easy?




11年10月19日星期三
Build New Index


               Take a snapshot of current Mongodb service

               Build new index in snapshot

               replay oplog of working Mongod

               Replace the working Mongod




11年10月19日星期三
Best practices



               Replica set: one for primary, one for secondary,
               one for incremental back up with a arbiter.




11年10月19日星期三
Q&A



               www.mongoic.com is online !
               We’re hiring...




11年10月19日星期三
MongoDB 在盛大大数据量下的应用

More Related Content

What's hot

MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
Takahiro Inoue
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
MongoDB
 
Building a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDBBuilding a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDB
MongoDB
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
Jeff Yemin
 
Java Development with MongoDB
Java Development with MongoDBJava Development with MongoDB
Java Development with MongoDB
Scott Hernandez
 

What's hot (20)

MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
 
Introduction to Tokyo Products
Introduction to Tokyo ProductsIntroduction to Tokyo Products
Introduction to Tokyo Products
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
NoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love StoryNoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love Story
 
Java development with MongoDB
Java development with MongoDBJava development with MongoDB
Java development with MongoDB
 
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
 
CMUデータベース輪読会第8回
CMUデータベース輪読会第8回CMUデータベース輪読会第8回
CMUデータベース輪読会第8回
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Building a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDBBuilding a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDB
 
Postgresql search demystified
Postgresql search demystifiedPostgresql search demystified
Postgresql search demystified
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDBMap/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDB
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in Spark
 
아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Mongodb replication
Mongodb replicationMongodb replication
Mongodb replication
 
Zend Server Data Caching
Zend Server Data CachingZend Server Data Caching
Zend Server Data Caching
 
Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into Cassandra
 
Java Development with MongoDB
Java Development with MongoDBJava Development with MongoDB
Java Development with MongoDB
 

Similar to MongoDB 在盛大大数据量下的应用

Sql to-mongo db
Sql to-mongo dbSql to-mongo db
Sql to-mongo db
Louis liu
 
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
ajhannan
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
MongoDB
 
Mongoseattle indexing-2010-07-27
Mongoseattle indexing-2010-07-27Mongoseattle indexing-2010-07-27
Mongoseattle indexing-2010-07-27
MongoDB
 

Similar to MongoDB 在盛大大数据量下的应用 (20)

Sql to Mongodb
Sql to MongodbSql to Mongodb
Sql to Mongodb
 
Sql to-mongo db
Sql to-mongo dbSql to-mongo db
Sql to-mongo db
 
Sql to-mongo db
Sql to-mongo dbSql to-mongo db
Sql to-mongo db
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
Mongoskin - Guilin
Mongoskin - GuilinMongoskin - Guilin
Mongoskin - Guilin
 
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)
 
R Get Started II
R Get Started IIR Get Started II
R Get Started II
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
Experiment no 05
Experiment no 05Experiment no 05
Experiment no 05
 
Indexing and Query Optimizer (Richard Kreuter)
Indexing and Query Optimizer (Richard Kreuter)Indexing and Query Optimizer (Richard Kreuter)
Indexing and Query Optimizer (Richard Kreuter)
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Mongoseattle indexing-2010-07-27
Mongoseattle indexing-2010-07-27Mongoseattle indexing-2010-07-27
Mongoseattle indexing-2010-07-27
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
 
Javascript & SQL within database management system
Javascript & SQL within database management systemJavascript & SQL within database management system
Javascript & SQL within database management system
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
Stata Programming Cheat Sheet
Stata Programming Cheat SheetStata Programming Cheat Sheet
Stata Programming Cheat Sheet
 
Oct.22nd.Presentation.Final
Oct.22nd.Presentation.FinalOct.22nd.Presentation.Final
Oct.22nd.Presentation.Final
 

More from iammutex

Realtime hadoopsigmod2011
Realtime hadoopsigmod2011Realtime hadoopsigmod2011
Realtime hadoopsigmod2011
iammutex
 

More from iammutex (20)

Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagram
 
Redis深入浅出
Redis深入浅出Redis深入浅出
Redis深入浅出
 
深入了解Redis
深入了解Redis深入了解Redis
深入了解Redis
 
NoSQL误用和常见陷阱分析
NoSQL误用和常见陷阱分析NoSQL误用和常见陷阱分析
NoSQL误用和常见陷阱分析
 
8 minute MongoDB tutorial slide
8 minute MongoDB tutorial slide8 minute MongoDB tutorial slide
8 minute MongoDB tutorial slide
 
skip list
skip listskip list
skip list
 
Thoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency ModelsThoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency Models
 
Rethink db&tokudb调研测试报告
Rethink db&tokudb调研测试报告Rethink db&tokudb调研测试报告
Rethink db&tokudb调研测试报告
 
redis 适用场景与实现
redis 适用场景与实现redis 适用场景与实现
redis 适用场景与实现
 
Introduction to couchdb
Introduction to couchdbIntroduction to couchdb
Introduction to couchdb
 
What every data programmer needs to know about disks
What every data programmer needs to know about disksWhat every data programmer needs to know about disks
What every data programmer needs to know about disks
 
Ooredis
OoredisOoredis
Ooredis
 
Ooredis
OoredisOoredis
Ooredis
 
redis运维之道
redis运维之道redis运维之道
redis运维之道
 
Realtime hadoopsigmod2011
Realtime hadoopsigmod2011Realtime hadoopsigmod2011
Realtime hadoopsigmod2011
 
[译]No sql生态系统
[译]No sql生态系统[译]No sql生态系统
[译]No sql生态系统
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbase
 
Redis cluster
Redis clusterRedis cluster
Redis cluster
 
Redis cluster
Redis clusterRedis cluster
Redis cluster
 
Hadoop introduction berlin buzzwords 2011
Hadoop introduction   berlin buzzwords 2011Hadoop introduction   berlin buzzwords 2011
Hadoop introduction berlin buzzwords 2011
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

MongoDB 在盛大大数据量下的应用

  • 1. MongoDB 在盛大大数据 量下的应用 郭理靖 @sn da ; gu ol ijing@ gm a il .c om ; 11年10月19日星期三
  • 2. Sections Basic introduction to MongoDB Monitor MongoDB Backup and Rollback of MongoDB Case Study 11年10月19日星期三
  • 3. What’s MongoDB MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented database. Current version:2.0.0 11年10月19日星期三
  • 4. What’s MongoDB MongoDB= JSON + Indexes 11年10月19日星期三
  • 6. Features Document-oriented storage (JSON) Full Index Support (Indexes) Replication & High availability Rich Document-based Queries and Updates Map/Reduce Auto-Sharding and GridFS 11年10月19日星期三
  • 7. Terminology RDBMS MongoDB Table Collection View/Row(s) JSON Document Column name Field name Index Index Join Embedding & Linking Partition Shard Partition Key Shard Key 11年10月19日星期三
  • 8. SELECT mySQL db.runCommand({ MongoDB Dim1, Dim2, 1 mapreduce: "DenormAggCollection", SUM(Measure1) AS MSum, query: { 2 COUNT(*) AS RecordCount, filter1: { '$in': [ 'A', 'B' ] }, AVG(Measure2) AS MAvg, 3 filter2: 'C', MIN(Measure1) AS MMin filter3: { '$gt': 123 } MAX(CASE }, WHEN Measure2 < 100 4 map: function() { emit( THEN Measure2 { d1: this.Dim1, d2: this.Dim2 }, END) AS MMax { msum: this.measure1, recs: 1, mmin: this.measure1, FROM DenormAggTable mmax: this.measure2 < 100 ? this.measure2 : 0 } WHERE (Filter1 IN (’A’,’B’)) );}, AND (Filter2 = ‘C’) 5 reduce: function(key, vals) { AND (Filter3 > 123) var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 }; GROUP BY Dim1, Dim2 1 for(var i = 0; i < vals.length; i++) { HAVING (MMin > 0) ret.msum += vals[i].msum; ORDER BY RecordCount DESC ret.recs += vals[i].recs; LIMIT 4, 8 if(vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin; if((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax)) ret.mmax = vals[i].mmax; } 1 Grouped dimension columns are pulled return ret; out as keys in the map function, }, reducing the size of the working set. finalize: function(key, val) { 6 2 Measures must be manually aggregated. 7 val.mavg = val.msum / val.recs; return val; 3 Aggregates depending on record counts }, Revision 4, Created 2010-03-06 Rick Osborne, rickosborne.org must wait until finalization. 4 Measures can use procedural logic. out: 'result1', verbose: true 5 Filters have an ORM/ActiveRecord- }); looking style. db.result1. 6 Aggregate filtering must be applied to the result set, not in the map/reduce. find({ mmin: { '$gt': 0 } }). 7 Ascending: 1; Descending: -1 sort({ recs: -1 }). skip(4). limit(8); 11年10月19日星期三
  • 9. Indexes Unique Indexes or Duplicate Values Indexes Index of Single Key(Embedded key, Document key) .Default index, ‘_id’: MongoID(GUID) Index of Compound Keys(db.user.ensureIndex({credit : -1, name: 1})) Sparse Index. (A sparse index can only have one field) //after 1.75 11年10月19日星期三
  • 10. Geo Indexes Geospatial index supported (good news for LBS) db.c.find( {a:[50,50]} ) using index {a:’2d’} db.c.find( {a:{$near:[50,50]}} ) using index {a:’2d’} Results are sorted closest - farthest db.c.find( {a:{$within:{$box:[[40,40], [60,60]]}}} ) using index {a:’2d’} 11年10月19日星期三
  • 11. Atomic Operations $set $unset $inc $push - append a value to an array $pushAll - append several values to an array $pull - remove a value(s) from an existing array $pullAll - remove several value(s) from an existing array $bit - bitwise operations 11年10月19日星期三
  • 12. Tips Update requires a write lock. Write lock is “greedy” right now Update in big collection is slow and block other writes during the update Query it first then update it will reduce the lock time 11年10月19日星期三
  • 14. Replica set Replica sets are basically master/slave replication, adding automatic failover and automatic recovery of member nodes. A Replica Set consists of two or more nodes that are copies of each other. (i.e.: replicas) The Replica Set automatically elects a Primary (master) if there is no primary currently available. 11年10月19日星期三
  • 15. Replica set Automated fail-over Distribute read load Simplified maintenance A Replica Set can have passive members that cannot become primary (for reading and backup) A Replica Set can have delayed nodes (in case of user error) 11年10月19日星期三
  • 16. step down primary If you step down primary manually when primary is heavy, the replica set may can not elect new primary. (it is fixed in 1.8.0) May lose data when step down primary manually. (sometimes, not test in 1.8.0 yet) Clients may not know what happened. 11年10月19日星期三
  • 17. What’s happened when primary is down? Replica set will elect new primary, but how? what’s arbiter? operator time votes priority may lose data when primary is down 11年10月19日星期三
  • 19. Monitoring and Diagnostics Query Profiler Http Console mongostat db.stat() db.serverStatus() 11年10月19日星期三
  • 20. Database Profiler >db.setProfilingLevel(1,20) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.getProfilingStatus() //after1.7x { "was" : 1, "slowms" : 20 } 11年10月19日星期三
  • 21. Database Profiler db.system.profile.find() {"ts" : "Thu Jan 29 2009 15:19:32 GMT-0500 (EST)" , "info" : "query test.$cmd ntoreturn:1 reslen:66 nscanned:0 <br>query: { profile: 2 } nreturned:1 bytes:50" , "millis" : 0} db.system.profile.find().sort({$natural:-1})//To see newest information first 11年10月19日星期三
  • 23. mongostat inserts/s - # of inserts per second query/s - # of queries per second update/s - # of updates per second delete/s - # of deletes per second getmore/s - # of get mores (cursor batch) per second 11年10月19日星期三
  • 24. mongostat command/s - # of commands per second flushes/s - # of fsync flushes per second mapped - amount of data mmaped (total data size) megabytes visze - virtual size of process in megabytes res - resident size of process in megabytes 11年10月19日星期三
  • 25. mongostat faults/s - # of pages faults/sec (linux only) locked - percent of time in global write lock idx miss - percent of btree page misses (sampled) q t|r|w - lock queue lengths (total|read| write) conn - number of open connections 11年10月19日星期三
  • 27. MongoDB with Numa WARNING: You are running on a NUMA machine. We suggest launching mongod like this to avoid performance problems: numactl --interleave=all mongod [other options] 11年10月19日星期三
  • 28. Security Auth is supported in Replica set/Master-Slaves, Auto-Sharding bind_ip start auth in RS may have bug.(use http to check the status and try restart mongods) 11年10月19日星期三
  • 29. Memory Memory-mapping Keep indexes in memory db.test.stats() db.test.totalIndexSize() RAM > indexes + hot data = better performance 11年10月19日星期三
  • 30. some tips No transactions Fast-N-Loose by default (no safe/w/GLE) Indexing order matters; query optimizer helps One write thread,many query threads Memory Mapped Data Files BSON everywhere 11年10月19日星期三
  • 31. Oplog of MongoDB Oplog in Replica set: local.system.replset: store config of replica set, use rs.conf() to detail local.oplog.rs: capped collection, use -- oplogSize to set size local.replset.minvalid: for sync status NO INDEX 11年10月19日星期三
  • 32. Oplog of MongoDB > db.test.insert({'name': 'guolijing', ‘fat’:‘false’}) > db.oplog.rs.find().sort({$natural:-1}) { "ts" : { "t" : 1318772440000, "i" : 1 }, "h" : NumberLong( "1503388658822904667" ), "op" : "i", "ns" : "test.test", "o" : { "_id" : ObjectId("4e9aded8bbf25c4665f212fc"), "name" : "guolijing" , ‘fat’:‘false’} } 11年10月19日星期三
  • 33. Oplog of MongoDB {ts:{},h{}, op:{},ns:{},o:{},o2:{} } Ts: 8 bytes of time stamp by 4 bytes Unix timestamp + 4 bytes since the count said. This value is very important, in the elections (such as master is down unit), new primary would choose the biggest change as the ts new primary. Op: 1 bytes of operation types, such as I said insert, d said the delete. In the namespace ns: the operation. O: the operation is corresponding to the document, that is, the content of the current operation (such as the update operation to update when the fields and value) O2: perform update operation in the condition is limited to the update, where to have this property Among them, can be the op several situations: h:// hash mongodb use to make sure we are reading the right flow of ops and aren't on an out-of- date "fork" 11年10月19日星期三
  • 34. Oplog of MongoDB > db.test.update({'name':'guolijing'},{$set: {'fat':'ture'}}) > db.oplog.rs.find().sort({$natural:-1}) { "ts" : { "t" : 1318775290000, "i" : 1 }, "h" : NumberLong( "-5204953982587889486" ), "op" : "u", "ns" : "test.test", "o2" : { "_id" : ObjectId("4e9aded8bbf25c4665f212fc") }, "o" : { "$set" : { "fat" : "ture" } } } 11年10月19日星期三
  • 35. Feature of Oplog Replay same oplog is harmless Replay old oplogs is harmless if the ts in last one is newer than current database 11年10月19日星期三
  • 36. Back up mongoexport mongoimport mongodump mongorestore data consistency? use --oplog use incremental back up read oplog and replay (wordnik tools) 11年10月19日星期三
  • 37. Rollback MongoDB Use snapshot + oplog Use delayed secondary + oplog 11年10月19日星期三
  • 38. Vertical Scala Old server 8G RAM +200G dis New server 64G RAM + 2T disk What can I do if my service should always keep online ? 11年10月19日星期三
  • 39. Vertical Scala You’re luck if using Replica set move everything(include ip config) from secondary to new service step down primary, do the same thing above well, problem solved 11年10月19日星期三
  • 40. Build New Index Build a index is quite easy..... Build a index in a huge collections, such as 500GB ,and database is very buy, is it easy? 11年10月19日星期三
  • 41. Build New Index Take a snapshot of current Mongodb service Build new index in snapshot replay oplog of working Mongod Replace the working Mongod 11年10月19日星期三
  • 42. Best practices Replica set: one for primary, one for secondary, one for incremental back up with a arbiter. 11年10月19日星期三
  • 43. Q&A www.mongoic.com is online ! We’re hiring... 11年10月19日星期三