MongoTokyo
Mongo Tokyo 2012 session
© All Rights Reserved
Presentation Transcript

    • KVS performance, RDBMS indexes, and MapReduce combined: an all-in-one NoSQL. Rakuten, Inc. DU Architect Group, Hiroaki Kubota | 2011/1/18 1
    • Introduction 2
    • Who am I ? 3
    • Introduction / Profile. Name: 窪田 博昭 (Hiroaki Kubota). Company: Rakuten Inc. Unit: ACT = Development Unit Architect Group. Mail: hiroaki.kubota@mail.rakuten.com. Hobby: futsal, golf. Recent: my physical strength has gradually declined... twitter: crumbjp github: crumbjp 4
    • Introduction / Agenda: Introduction; Mongo's characteristics; How to take advantage of Mongo for our service (our new system "Cockatoo", MapReduce); Structure & performance; Performance example (on EC2 large); Major problems (indexing, STALE, disk space, PHP client); Closing 5
    • Mongo’s characteristics 6
    • Mongo's characteristics. Mongo's READ performance is extremely good! WRITE performance is so-so, and cannot be scaled. Reading data immediately after it is written works poorly. Very high availability! Still under development: maintenance tools are poor, and some operations are awkward. 7
    • How to take advantage of Mongo for Infoseek News 8
    • Our new system "Cockatoo" (used to be called "Albatross") 9
    • An example of one of our pages 10
    • Page structure 11
    • Layout / Components 12
    • Generic WEB structure (diagram): Internet → WEB servers → (call APIs) → API servers → (retrieve data) → DB 13
    • Cockatoo structure (diagram): Internet → WEB servers, which get the page layout from the LayoutDB / SessionDB (MongoDB ReplSet) and components via Memcache, then call API servers, which retrieve data from the ContentsDB (MongoDB ReplSet) 14
    • Cockatoo structure (diagram, annotated): Mongo's READ performance is enough to cope with WEB page views, but its WRITE performance is not enough 15
    • Cockatoo structure (diagram, as above) 16
    • Cockatoo structure (diagram): API servers coordinated with Zookeeper 17
    • Cockatoo structure (diagram): plus Solr for search 18
    • Cockatoo structure (diagram): developers set the page layout via HTML markup and the CMS (LayoutDB, MongoDB ReplSet) and deploy API settings; batch servers and API servers insert data and set static contents in the ContentsDB (MongoDB ReplSet) 19
    • CMS / Layout editor 20
    • CMS 21
    • CMS 22
    • MapReduce 23
    • MapReduce / Our usage. We have never used MapReduce as a regular operation, but we have used it for some irregular cases: to find invalid articles that had to be removed because of someone's mistakes; to analyze the number of new articles posted per day; to analyze the number of updates per article. We are starting to consider using it regularly for social-data analysis before long... 24
    • Structure & Performance 25
    • Structure. We are using very modest (virtual) machines!! Intel(R) Xeon(R) CPU X5650 2.67GHz, 1 core!! 4GB memory, 50GB disk space (iSCSI), CentOS 5.5 64bit, mongodb 1.8.0, ReplicaSet of 5 nodes (+ 1 arbiter), oplog size 1.2GB, average object size 1KB 26
    • Structure / Researched environments. We have also researched the following environments: virtual machine, 1 core (1KB data, 6,000,000 documents; 8KB data, 200,000 documents); virtual machine, 3 cores (same data sets); EC2 large instance (2KB data, 60,000,000 documents, 100GB) 27
    • Performance. I found a formula for a rough estimation of QPS, for 1 to 8 KB documents with 1 unique index. Let C = number of CPU cores (Xeon 2.67GHz), DD = score of the 'dd' command (bytes/sec), S = document size (bytes). GET qps = 4500 × C. SET(fsync) bytes/s = 0.05 × DD ÷ S. SET(nsync) qps = 4500, BUT with a chance of STALE 28
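The estimation formulas above can be turned into a quick calculator. This is a sketch: the constants 4500 and 0.05 are the slide's empirical values, not universal, and the SET(fsync) line is read here as documents per second (0.05 × DD bytes/sec divided by document size S).

```javascript
// Quick calculator for the rough-estimation formulas above.
// 4500 and 0.05 are the slide's empirical constants; treat the
// results as order-of-magnitude estimates only.
function getQps(cores) {
  return 4500 * cores;                       // GET qps = 4500 x C
}
function setFsyncBytesPerSec(ddBytesPerSec) {
  return 0.05 * ddBytesPerSec;               // usable fsync write bandwidth
}
function setFsyncDocsPerSec(ddBytesPerSec, docSizeBytes) {
  // SET(fsync) = 0.05 x DD / S (read here as documents per second)
  return setFsyncBytesPerSec(ddBytesPerSec) / docSizeBytes;
}

// Example: 1-core VM, dd score of ~100 MB/s, 1 KB documents
var readQps  = getQps(1);                                   // 4500
var writeQps = setFsyncDocsPerSec(100 * 1024 * 1024, 1024); // ~5120
```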
    • Performance example (on EC2 large) 29
    • Performance example (on EC2 large) / Environment and amount of data. EC2 large instance: 2KB data, 60,000,000 documents (100GB), 1 unique index. Data type: { shop: someone, item: something, description: 'item explanation sentences...' } 30
    • Performance example (on EC2 large) / Batch insert (1000 documents per batch, fsync=true): 17906 sec (= 289 min) (= 3358 docs/sec). Ensure index (background=false): 4049 sec (= 67 min); 1. primary 2101 sec (= 35 min), 2. secondary 1948 sec (= 32 min) 31
    • Performance example (on EC2 large) / Add one node: 5833 sec (= 97 min); 1. get files 2GB × 48: 2120 sec (= 35 min), 2. _id indexing: 1406 sec (= 23 min), 3. unique indexing: 2251 sec (= 38 min), 4. other processes: 56 sec (= 1 min) 32
    • Performance example (on EC2 large) / Group by. Reduce by unique index & map & reduce: 368 msec. db.data.group({ key: { shop: 1 }, cond: { shop: "someone" }, reduce: function (o, p) { p.sum++; }, initial: { sum: 0 } }); 33
    • Performance example (on EC2 large) / MapReduce. Scan all data: 3116 sec (= 52 min); number of keys = 39092. db.data.mapReduce( function () { emit(this.shop, 1); }, function (k, v) { var ret = 0; v.forEach(function (value) { ret += value; }); return ret; }, { query: {}, out: { inline: 1 } } ); 34
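What the mapReduce above computes (the number of documents per shop) can be simulated in plain JavaScript. This is a sketch for illustration: the sample documents are made up, and emit is reimplemented as a local helper rather than the shell's built-in.

```javascript
// Plain-JS simulation of the mapReduce on the slide: count documents
// per shop. The emit/reduce pair mirrors the shell functions above;
// the data below is hypothetical.
var docs = [
  { shop: "A", item: "x" },
  { shop: "A", item: "y" },
  { shop: "B", item: "z" }
];

var emitted = {}; // key -> array of emitted values
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// map phase: emit(this.shop, 1) for every document
docs.forEach(function (doc) { emit(doc.shop, 1); });

// reduce phase: sum the emitted values for each key
function reduce(k, v) {
  var ret = 0;
  v.forEach(function (value) { ret += value; });
  return ret;
}

var counts = {};
Object.keys(emitted).forEach(function (k) {
  counts[k] = reduce(k, emitted[k]);
});
// counts -> { A: 2, B: 1 }
```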
    • Major problems... 35
    • Indexing 36
    • Index problem. Online indexing is effectively useless even in the latest version (2.0.2). Indexing is a locking operation by default. An indexing operation can run in the background on the primary, but it CANNOT run in the background on a secondary. Moreover, all the secondaries run their index builds at the same time!! The result: all slaves freeze! orz... 37
    • Present indexing (default) 38
    • Index problem / Present indexing (default), diagram: the batch saves to the Primary; the Secondaries replicate; clients read from them 39
    • Index problem / Present indexing (default), diagram: ensureIndex locks the Primary while it indexes; the batch cannot write 40
    • Index problem / Present indexing (default), diagram: the Primary finishes; the build syncs to all Secondaries, which lock and index at the same time; clients cannot read!! 41
    • Index problem / Present indexing (default), diagram: indexing completes on every node 42
    • Present indexing (background) 43
    • Index problem / Present indexing (background), diagram: the batch saves to the Primary 44
    • Index problem / Present indexing (background), diagram: ensureIndex(background) on the Primary; the batch merely slows down while the Primary indexes 45
    • Index problem / Present indexing (background), diagram: the Primary finishes; the build syncs to the Secondaries, which lock and index; clients cannot read!! 46
    • Index problem / Present indexing (background), diagram: background indexing does not work on the secondaries 47
    • Index problem / Present indexing (background), diagram: the Secondaries stay locked while indexing; clients cannot read!! 48
    • Index problem / Present indexing (background), diagram: indexing completes on every node 49
    • Probable 2.1.X indexing 50
    • Index problem. According to mongodb.org this problem will be fixed in 2.1.0, which has not been formally released yet. So I checked out the up-to-date source code: certainly, it will be fixed! Moreover, it sounds like the build will run in the foreground when the slave's status isn't SECONDARY (does that mean RECOVERING?) 51
    • Index problem / Probable 2.1.X indexing, diagram: the batch saves to the Primary 52
    • Index problem / Probable 2.1.X indexing, diagram: ensureIndex(background) on the Primary; the batch merely slows down 53
    • Index problem / Probable 2.1.X indexing, diagram: the Primary finishes; the build syncs to the Secondaries, which only slow down while indexing instead of locking 54
    • Index problem / Probable 2.1.X indexing, diagram: indexing completes on every node 55
    • Index problem / Background indexing in 2.1.X. But I think it's still not enough: it can bring failure to our system when all the secondaries slow down at the same time!! So... 56
    • Ideal indexing 57
    • Index problem / Ideal indexing, diagram: the batch saves to the Primary 58
    • Index problem / Ideal indexing, diagram: ensureIndex(background) on the Primary; the batch merely slows down 59
    • Index problem / Ideal indexing, diagram: the Primary finishes; one Secondary drops to Recovering and builds the index 60
    • Index problem / Ideal indexing, diagram: the first Secondary completes; the next drops to Recovering and indexes 61
    • Index problem / Ideal indexing, diagram: the last Secondary drops to Recovering and indexes 62
    • Index problem / Ideal indexing, diagram: indexing completes on every node 63
    • Index problem. But... I can easily guess it's difficult to apply this to the current Oplog. It would be great if I could run indexing manually on each secondary 64
    • I suggest Manual indexing 65
    • Index problem / Manual indexing, diagram: the batch saves to the Primary 66
    • Index problem / Manual indexing, diagram: ensureIndex(manual, background) on the Primary; the batch merely slows down 67
    • Index problem / Manual indexing, diagram: the Primary finishes 68
    • Index problem / Manual indexing, diagram: the secondaries do not sync the index build automatically 69
    • Index problem / Manual indexing, diagram: the secondaries remain without the index until told otherwise 70
    • Index problem / Manual indexing, diagram: ensureIndex(manual) on one Secondary, which drops to Recovering while indexing 71
    • Index problem / Manual indexing, diagram: the first Secondary completes; the next drops to Recovering and indexes 72
    • Index problem / Manual indexing, diagram: ensureIndex(manual, background) on the last Secondary, which merely slows down 73
    • Index problem / Manual indexing, diagram: it needs to support background operation, just in case the ReplSet has only one Secondary 74
    • Index problem / Manual indexing, diagram: the last Secondary finishes its background build 75
    • Index problem / Manual indexing, diagram: indexing completes on every node 76
    • That’s all about Indexing problem 77
    • Struggle to control the sync 78
    • STALE 79
    • Unknown log & the ReplSet out of control. We often suffered from the Secondaries going out of control: they changed status repeatedly, flipping in a moment between Secondary and Recovering (1.8.0). Then we found a strange line in the log: [rsSync] replSet error RS102 too stale to catch up 80
    • What's Stale? stale [stéil] (definition powered by goo.ne.jp): (of food or drink) not fresh (⇔ fresh); flat, (of coffee) having lost its aroma; (of bread) dried out, gone hard; (of air or a smell) musty, foul-smelling 81
    • What's Stale? (the same definition again) ...So apparently this is something very bad indeed... 82
    • Mechanism of becoming stale 83
    • ReplicaSet, diagram: a Primary mongod and a Secondary mongod, each with a Database and an Oplog, serving their own clients 84
    • Replication (simple case) 85
    • ReplicaSet, diagram: (as above) 86
    • Insert & replication 1, diagram: a client inserts A; the Primary writes A to its database and records 'Insert A' in its Oplog 87
    • Insert & replication 1, diagram: the Secondary syncs, replays 'Insert A', and records it in its own Oplog 88
    • Replication (busy case) 89
    • Diagram: both nodes hold A, with 'Insert A' in both Oplogs 90
    • Insert & replication 2, diagram: a client inserts B; the Primary's Oplog holds Insert B, Insert A 91
    • Insert & replication 2, diagram: a client inserts C; the Primary's Oplog holds Insert C, Insert B, Insert A 92
    • Insert & replication 2, diagram: a client updates A; the Primary's Oplog holds Update A, Insert C, Insert B, Insert A 93
    • Insert & replication 2, diagram: the Secondary checks the Primary's Oplog and finds its own last applied entry, 'Insert A' 94
    • Insert & replication 2, diagram: the Secondary replays Update A, Insert C, Insert B and catches up 95
    • Replication (more busy) 96
    • Stale, diagram: both nodes hold A, with 'Insert A' in both Oplogs 97
    • Stale, diagram: a client inserts B; the Primary's Oplog holds Insert B, Insert A 98
    • Stale, diagram: a client inserts C; the Primary's Oplog holds Insert C, Insert B, Insert A 99
    • Stale, diagram: a client updates A; the Primary's Oplog holds Update A, Insert C, Insert B, Insert A 100
    • Stale, diagram: a client updates C; the oldest entries begin rolling off the Primary's capped Oplog 101
    • Stale, diagram: a client inserts D; 'Insert A' has now rolled off the Primary's Oplog 102
    • Stale, diagram: the Secondary checks the Primary's Oplog, but 'Insert A' is not found!! 103
    • Stale, diagram: the Secondary cannot get information about 'Insert B', so it cannot sync!! This is called STALE; the node drops to Recovering 104
    • Stale. We have to understand the importance of adjusting the oplog size. We can specify the oplog size as a command-line option, but only the first time for a given dbpath (which is also specified on the command line). We cannot change the oplog size afterwards without clearing the dbpath. Be careful! 105
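The sequence above can be condensed into a toy model: the oplog is a capped ring buffer, and a secondary goes stale once its sync point has rolled off the primary's oplog. This is a sketch of the mechanism, not mongod's actual implementation.

```javascript
// Toy model of why a small oplog causes STALE. The primary's oplog is
// a capped ring buffer; once the secondary's last-applied entry has
// been overwritten, the secondary cannot find its sync point.
function Oplog(capacity) {
  this.capacity = capacity;
  this.entries = []; // oldest first
}
Oplog.prototype.append = function (op) {
  this.entries.push(op);
  if (this.entries.length > this.capacity) this.entries.shift(); // roll off
};
Oplog.prototype.contains = function (op) {
  return this.entries.indexOf(op) !== -1;
};

var primary = new Oplog(3);            // tiny oplog: 3 entries
var secondaryLastApplied = "Insert A"; // the secondary stalled after this op

// meanwhile the primary keeps taking writes
["Insert A", "Insert B", "Insert C", "Update A"].forEach(function (op) {
  primary.append(op);
});

// "Insert A" has rolled off, so the secondary cannot locate its sync
// point in the primary's oplog: this is the RS102 "too stale" state.
var stale = !primary.contains(secondaryLastApplied); // true
```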
    • Replication (Join as a new node) 106
    • InitialSync, diagram: a Primary whose Oplog holds Insert D, Update C, Update A, Insert C 107
    • InitialSync, diagram: a new, empty mongod starts up 108
    • InitialSync, diagram: the new node (Recovering) gets the Primary's last Oplog entry, 'Insert D' 109
    • InitialSync, diagram: the new node starts cloning the database (D, C, B, A) from the Primary 110
    • InitialSync, diagram: cloning continues while the Primary keeps serving writes 111
    • InitialSync, diagram: a client inserts E during the clone; the Primary's Oplog advances 112
    • InitialSync, diagram: a client updates B; cloning completes 113
    • InitialSync, diagram: the new node checks the Primary's Oplog for its recorded sync point 114
    • InitialSync, diagram: the new node replays the missed entries (Update B, Insert E) and becomes a Secondary 115
    • Additional information, from the source code (I've never examined this behavior myself): a Secondary will try to sync from another Secondary when it cannot reach the Primary or might be stale against the Primary. So there is a small chance that the sync problem does not occur, if that other secondary still has the older Oplog entries or a larger Oplog space than the Primary 116
    • Sync from another secondary, diagram: a Primary and two Secondaries; one Secondary's Oplog is larger and still holds Insert A 117
    • Sync from another secondary, diagram: the lagging Secondary checks the Primary's Oplog; 'Insert A' is not found!! 118
    • Sync from another secondary, diagram: but 'Insert A' is found on the other Secondary, so it is able to sync from there 119
    • Sync from the other secondary, diagram: the lagging Secondary replays the missing entries from the other Secondary and catches up 120
    • That’s all about sync 121
    • Others... 122
    • Disk space 123
    • Disk space. Data fragments sparsely across the DB files... We met an unfavorable circumstance in our DBs: it appeared in some of our collections around 3 months after we launched the services. db.ourcol.storageSize() = 16200727264 (15GB); db.ourcol.totalSize() = 16200809184; db.ourcol.totalIndexSize() = 81920; db.ourcol.dataSize() = 2032300 (2MB). What has happened to them!! 124
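Plugging the slide's shell output into a quick ratio check shows how extreme the fragmentation was (the values below are copied from the slide):

```javascript
// Fragmentation check using the slide's numbers: ~2 MB of live data
// inside ~15 GB of allocated storage.
var storageSize = 16200727264; // db.ourcol.storageSize()
var dataSize    = 2032300;     // db.ourcol.dataSize()

var wasted = storageSize - dataSize;      // bytes allocated but unused
var utilization = dataSize / storageSize; // fraction actually used

// utilization is about 0.000125: over 99.98% of the space is wasted.
```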
    • Disk space. Data fragments sparsely across the DB files... It seems to be caused by specific access patterns that insert, update and delete over and over. In any case, we have to shrink the used disk space regularly, just like PostgreSQL's vacuum. But how? 125
    • Disk space / Shrinking the used disk space. MongoDB offers some functions for this case, but we couldn't use them: repairDatabase is only runnable on the Primary, takes a long time and BLOCKS all operations!! compact is only runnable on a Secondary and zero-fills the blank space instead of shrinking it, so the files do not shrink... 126
    • Disk space / Our measures. For temporary collections: issue a drop command regularly. For other collections: 1. Remove one secondary from the ReplSet. 2. Shut it down. 3. Remove all its DB files. 4. Rejoin it to the ReplSet. 5. Repeat these operations node after node. 6. Step down the Primary (change the Primary node). 7. Finally, run steps 1-4 on the prior Primary. 127
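The rolling procedure above can be sketched as a plan generator, assuming a 3-node ReplSet. The node names and the helper function are hypothetical; each "resync" step stands for the shutdown, DBPATH removal, and restart described in the slides.

```javascript
// Sketch of the rolling-resync order: every secondary is resynced
// first, then the primary steps down and is resynced last. This only
// generates the plan; the real operations are mongod shutdown,
// deleting the DBPATH, restarting, and rs.stepDown() on the primary.
function rollingShrinkPlan(nodes, primary) {
  var plan = [];
  nodes
    .filter(function (n) { return n !== primary; })
    .forEach(function (n) { plan.push("resync " + n); });
  plan.push("stepDown " + primary); // hand off the primary role
  plan.push("resync " + primary);   // finally resync the old primary
  return plan;
}

var plan = rollingShrinkPlan(["node1", "node2", "node3"], "node1");
// -> ["resync node2", "resync node3", "stepDown node1", "resync node1"]
```

The key design point the slides make is ordering: at every moment a majority of nodes stays available, and the primary is only touched after every secondary has already been rebuilt.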
    • Disk space / Shrink operation, diagram: Primary and two Secondaries, all bloated 128
    • Disk space / Shrink operation, diagram: shut down one secondary's mongod (kill -15) 129
    • Disk space / Shrink operation, diagram: delete that node's DBPATH; it now holds nothing 130
    • Disk space / Shrink operation, diagram: start mongod again; the node resyncs from scratch 131
    • Disk space / Shrink operation, diagram: the resynced Secondary is shrunk; the others remain bloated 132
    • Disk space / Shrink operation, diagram: repeat shutdown / delete DBPATH / startup on the next Secondary 133
    • Disk space / Shrink operation, diagram: step down the bloated Primary so a shrunk node takes over 134
    • Disk space / Shrink operation, diagram: shutdown / delete DBPATH / startup on the old Primary; all nodes are now shrunk 135
    • PHP client 136
    • PHP client. We tried 1.1.4 and 1.2.2. 1.1.4: there are some critical bugs around the connection pool; we struggled to invalidate broken connections. I think you should use 1.2.X instead of 1.1.X. 1.2.2: the connection pool seems to be fixed, but there are 2 critical bugs: a socket handle leak, and a useless sleep. However, this version is relatively stable as long as you fix these bugs 137
    • PHP client / Patches: https://github.com/crumbjp/Personal (mongo1.2.2.non-wait.patch, mongo1.2.2.sock-leak.patch) 138
    • PHP client 139
    • Closing 140
    • Closing / What's MongoDB? It has very good READ performance: we can use Mongo instead of memcached, if we can accept the limited write performance. Die hard! MongoDB keeps high availability even under severe stress. It can be used easily without deep consideration: we can manage to do anything after we start using it. Let's forget the awkward trivial things that have bothered us: how to handle huge data, how to put in a cache system, how to keep availability, and so on... 141
    • Closing / Keep in mind. Sharding is challenging... it's a last resort! It's hard to operate, in particular maintaining the config servers; mongos is also difficult to keep alive (I want a way to fail over mongos). Mongo can run in a poor environment, but you should still set aside plenty of disk space. Heavy writes are sensitive: adjust the oplog size carefully. The indexing feature is unfinished: indexes cannot be applied online 142
    • All right, Have fun !! 143
    • All right, Have fun !! ...with us at Rakuten 144
    • All right, have fun!! ...with us at Rakuten. Please come join Rakuten for cool work! 145
    • Thank you for listening 146