SlideShare a Scribd company logo
1 of 46
Download to read offline
MongoDB:
Scaling write performance
                            Junegunn Choi
MongoDB

• Document    data store

 • JSON-like   document

• Secondary   indexes

• Automatic   failover

• Automatic   sharding
First impression:
                     Easy

• Easy   installation

• Easy   data model

• No   prior schema design

• Native   support for secondary indexes
Second thought:
                Not so easy

• No   SQL

• Coping    with massive data growth

• Setting   up and operating sharded cluster

• Scaling   write performance
Today we’ll talk about
 insert performance
Insert throughput
  on a replica set
Steady 5k inserts/sec




              * 1kB record. ObjectId as PK
              * WriteConcern: Journal sync on Majority
Insert throughput
    on a replica set
with a secondary index
Culprit:
               B+Tree index

• Good    at sequential insert

 • e.g. ObjectId, Sequence       #, Timestamp

• Poor   at random insert

 • Indexes   on randomly-distributed data
Sequential vs. Random insert
  1                                           55
  2                                           75
  3                                           78
  4                                            1
  5                                           99
  6                                           36
  7                                           80
  8                                           91
  9                                           52
 10          B+Tree                           63      B+Tree
 11                                           56
 12                                           33
                            working set               working set




 Sequential insert ➔ Small working set    Random insert ➔ Large working set
   ➔ Fits in RAM ➔ Sequential I/O         ➔ Cannot fit in RAM ➔ Random I/O
         (bandwidth-bound)                          (IOPS-bound)
So, what do we do now?
1. Partitioning

                        Aug 2012   Sep 2012     Oct 2012




      B+Tree

                                              fits in memory

does not fit in memory
1. Partitioning

• MongoDB       doesn’t support partitioning

• Partitioning   at application-level

• e.g. Daily   log collection

  • logs_20121012
Switch collection every hour
2. Better H/W

• More   RAM

• More   IOPS

 • RAID   striping

 • SSD

 • AWS    Provisioned IOPS (1k ~ 10k)
3. More H/W: Sharding
• Automatic   partitioning across nodes

   SHARD1             SHARD2              SHARD3




                      mongos router
3 shards (3x3)
3 shards (3x3)
on RAID 1+0
There’s no free lunch
•   Manual partitioning

    •   Incidental complexity

•   Better H/W

    •   $

•   Sharding

    •   $$

    •   Operational complexity
“Do you really need that index?”
Scaling insert performance
       with sharding
=
Choosing the right shard key
Shard key example:
          year_of_birth
                                 64MB chunk

    ~ 1950    1971 ~ 1990     1951 ~ 1970

1991 ~ 2005   2006 ~ 2010

                2010 ~ ∞

  USERS         USERS           USERS
 SHARD1        SHARD2          SHARD3



              mongos router
5k inserts/sec w/o sharding
Sequential key


• ObjectId   as shard key

• Sequence   #

• Timestamp
Worse throughput with 3x H/W.
Sequential key

                                               1000 ~ 2000
• All   inserts into one chunk
                                               5000 ~ 7500
• Cannot    scale insert performance
                                                  9000 ~ ∞
• Chunk    migration overhead                     USERS
                                                SHARD-x

                                 9001, 9002, 9003, 9004, ...
Sequential key
Hash key


• e.g. SHA1(_id)   = 9f2feb0f1ef425b292f2f94 ...

• Distributes   inserts evenly across all chunks
Hash key


• Performance   drops as collection grows

 • Why?   Mandatory index on shard key

     • B+Tree   problem again!
Sequential key
  Hash key
Sequential + hash key
• Coarse-grained    sequential prefix

• e.g. Year-month   + hash value

  • 201210_24c3a5b9

                                          B+Tree



                               201208_*   201209_*   201210_*
But what if...


                         B+Tree




                             large working set

   201208_*   201209_*            201210_*
Sequential + hash key

• Can   you predict data growth rate?

• Balancer   not clever enough

  • Only   considers # of chunks

  • Migration   slow during heavy-writes
Sequential key
     Hash key
Sequential + hash key
Low-cardinality hash key
• Small   portion of hash value                   Shard key range: A ~ D


  • e.g. A~Z, 00~FF

• Alleviates   B+Tree problem
                                               Local
  • Sequential   access on fixed #             B+Tree
   of parts

    • Cardinality   / # of shards
                                    A A   A   B     B   B   C C C
Low-cardinality hash key

• Limits   the # of possible chunks

  • e.g. 00   ~ FF ➔ 256 chunks

  • Chunk     grows past 64MB

    • Balancing   becomes difficult
Sequential key
       Hash key
 Sequential + hash key
Low-cardinality hash key
Low-cardinality hash prefix
      + sequential part
                                         Shard key range: A000 ~ C999


• e.g. Short   hash prefix + timestamp

• Nice   index access pattern              Local
                                          B+Tree
• Unlimited    number of chunks


                                A000 A123 B000 B123 C000 C123
Finally, 2x throughput
Lessons learned
• Know     the performance impact of secondary index

• Choose     the right shard key

• Test   with large data sets

• Linear   scalability is hard

  • If   you really need it, consider HBase or Cassandra

  • SSD
Thank you. Questions?

         유응섭 rspeed@daumcorp.com
         최준건 gunn@daumcorp.com

More Related Content

What's hot

Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
Fadel Chafai
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
Morgan Tocker
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 

What's hot (20)

Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
Introduction à ElasticSearch
Introduction à ElasticSearchIntroduction à ElasticSearch
Introduction à ElasticSearch
 
오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
17-NoSQL.pptx
17-NoSQL.pptx17-NoSQL.pptx
17-NoSQL.pptx
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of data
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Automated YCSB Benchmarking
Automated YCSB BenchmarkingAutomated YCSB Benchmarking
Automated YCSB Benchmarking
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Influxdb and time series data
Influxdb and time series dataInfluxdb and time series data
Influxdb and time series data
 

Similar to Mongodb - Scaling write performance

MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012
Daum DNA
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
jbellis
 
NickKallen_DataArchitectureAtTwitterScale
NickKallen_DataArchitectureAtTwitterScaleNickKallen_DataArchitectureAtTwitterScale
NickKallen_DataArchitectureAtTwitterScale
Kostas Mavridis
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Boris Yen
 

Similar to Mongodb - Scaling write performance (20)

MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012MongoDB: Scaling write performance | Devon 2012
MongoDB: Scaling write performance | Devon 2012
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
 
Accelerating NoSQL
Accelerating NoSQLAccelerating NoSQL
Accelerating NoSQL
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
NickKallen_DataArchitectureAtTwitterScale
NickKallen_DataArchitectureAtTwitterScaleNickKallen_DataArchitectureAtTwitterScale
NickKallen_DataArchitectureAtTwitterScale
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
Bayesian Counters
Bayesian CountersBayesian Counters
Bayesian Counters
 
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Mongodb lab
Mongodb labMongodb lab
Mongodb lab
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
 

More from Daum DNA

Daum devday 13 [bap]
Daum devday 13  [bap]Daum devday 13  [bap]
Daum devday 13 [bap]
Daum DNA
 
Daum 키노트 | Devon 2012
Daum 키노트 | Devon 2012Daum 키노트 | Devon 2012
Daum 키노트 | Devon 2012
Daum DNA
 

More from Daum DNA (20)

Daum의 개방형 기술 전략 및 자바 기술 로드맵(2007)
Daum의 개방형 기술 전략 및 자바 기술 로드맵(2007)Daum의 개방형 기술 전략 및 자바 기술 로드맵(2007)
Daum의 개방형 기술 전략 및 자바 기술 로드맵(2007)
 
Daum OAuth 2.0
Daum OAuth 2.0Daum OAuth 2.0
Daum OAuth 2.0
 
Daum 음성인식 API (김한샘)
Daum 음성인식 API (김한샘)Daum 음성인식 API (김한샘)
Daum 음성인식 API (김한샘)
 
Daum 검색/지도 API (이정주)
Daum 검색/지도 API (이정주)Daum 검색/지도 API (이정주)
Daum 검색/지도 API (이정주)
 
오픈 API 활용방법(Daum 사례 중심, 윤석찬)
오픈 API 활용방법(Daum 사례 중심, 윤석찬)오픈 API 활용방법(Daum 사례 중심, 윤석찬)
오픈 API 활용방법(Daum 사례 중심, 윤석찬)
 
Daum 티스토리 API (천정환)
Daum 티스토리 API (천정환)Daum 티스토리 API (천정환)
Daum 티스토리 API (천정환)
 
Daum 로그인 API (함태윤)
Daum 로그인 API (함태윤)Daum 로그인 API (함태윤)
Daum 로그인 API (함태윤)
 
FT직군의 현재와 미래 - 홍윤표
FT직군의 현재와 미래 - 홍윤표FT직군의 현재와 미래 - 홍윤표
FT직군의 현재와 미래 - 홍윤표
 
웹접근성과 장애인 차별 금지법 - 장성민
웹접근성과 장애인 차별 금지법 - 장성민웹접근성과 장애인 차별 금지법 - 장성민
웹접근성과 장애인 차별 금지법 - 장성민
 
반응형 웹 디자인은 만능인가? - 신현석
반응형 웹 디자인은 만능인가? - 신현석반응형 웹 디자인은 만능인가? - 신현석
반응형 웹 디자인은 만능인가? - 신현석
 
Daum devday 13 [bap]
Daum devday 13  [bap]Daum devday 13  [bap]
Daum devday 13 [bap]
 
Daum DevDay 13-힐링이 필요해
Daum DevDay 13-힐링이 필요해Daum DevDay 13-힐링이 필요해
Daum DevDay 13-힐링이 필요해
 
Daum DevDay 13 - 마음의 소리
Daum DevDay 13 - 마음의 소리Daum DevDay 13 - 마음의 소리
Daum DevDay 13 - 마음의 소리
 
Daum DevDay 13 - OpenBrace
Daum DevDay 13 - OpenBraceDaum DevDay 13 - OpenBrace
Daum DevDay 13 - OpenBrace
 
Daum DevDay 13 - Ogangjang
Daum DevDay 13 - OgangjangDaum DevDay 13 - Ogangjang
Daum DevDay 13 - Ogangjang
 
Daum DevDay 13 - Mook
Daum DevDay 13 - MookDaum DevDay 13 - Mook
Daum DevDay 13 - Mook
 
Daum DevDay 13 - Moonlight
Daum DevDay 13 - MoonlightDaum DevDay 13 - Moonlight
Daum DevDay 13 - Moonlight
 
Daum DevDay 13 - In-N-Out
Daum DevDay 13 - In-N-OutDaum DevDay 13 - In-N-Out
Daum DevDay 13 - In-N-Out
 
Daum DevDay 13 - i-DF
Daum DevDay 13 - i-DFDaum DevDay 13 - i-DF
Daum DevDay 13 - i-DF
 
Daum 키노트 | Devon 2012
Daum 키노트 | Devon 2012Daum 키노트 | Devon 2012
Daum 키노트 | Devon 2012
 

Mongodb - Scaling write performance

  • 2. MongoDB • Document data store • JSON-like document • Secondary indexes • Automatic failover • Automatic sharding
  • 3. First impression: Easy • Easy installation • Easy data model • No prior schema design • Native support for secondary indexes
  • 4. Second thought: Not so easy • No SQL • Coping with massive data growth • Setting up and operating sharded cluster • Scaling write performance
  • 5. Today we’ll talk about insert performance
  • 6. Insert throughput on a replica set
  • 7. Steady 5k inserts/sec * 1kB record. ObjectId as PK * WriteConcern: Journal sync on Majority
  • 8. Insert throughput on a replica set with a secondary index
  • 9.
  • 10. Culprit: B+Tree index • Good at sequential insert • e.g. ObjectId, Sequence #, Timestamp • Poor at random insert • Indexes on randomly-distributed data
  • 11. Sequential vs. Random insert 1 55 2 75 3 78 4 1 5 99 6 36 7 80 8 91 9 52 10 B+Tree 63 B+Tree 11 56 12 33 working set working set Sequential insert ➔ Small working set Random insert ➔ Large working set ➔ Fits in RAM ➔ Sequential I/O ➔ Cannot fit in RAM ➔ Random I/O (bandwidth-bound) (IOPS-bound)
  • 12. So, what do we do now?
  • 13. 1. Partitioning Aug 2012 Sep 2012 Oct 2012 B+Tree fits in memory does not fit in memory
  • 14. 1. Partitioning • MongoDB doesn’t support partitioning • Partitioning at application-level • e.g. Daily log collection • logs_20121012
  • 16. 2. Better H/W • More RAM • More IOPS • RAID striping • SSD • AWS Provisioned IOPS (1k ~ 10k)
  • 17.
  • 18. 3. More H/W: Sharding • Automatic partitioning across nodes SHARD1 SHARD2 SHARD3 mongos router
  • 20. 3 shards (3x3) on RAID 1+0
  • 21. There’s no free lunch • Manual partitioning • Incidental complexity • Better H/W • $ • Sharding • $$ • Operational complexity
  • 22. “Do you really need that index?”
  • 23. Scaling insert performance with sharding
  • 24. = Choosing the right shard key
  • 25. Shard key example: year_of_birth 64MB chunk ~ 1950 1971 ~ 1990 1951 ~ 1970 1991 ~ 2005 2006 ~ 2010 2010 ~ ∞ USERS USERS USERS SHARD1 SHARD2 SHARD3 mongos router
  • 26. 5k inserts/sec w/o sharding
  • 27. Sequential key • ObjectId as shard key • Sequence # • Timestamp
  • 29. Sequential key 1000 ~ 2000 • All inserts into one chunk 5000 ~ 7500 • Cannot scale insert performance 9000 ~ ∞ • Chunk migration overhead USERS SHARD-x 9001, 9002, 9003, 9004, ...
  • 31. Hash key • e.g. SHA1(_id) = 9f2feb0f1ef425b292f2f94 ... • Distributes inserts evenly across all chunks
  • 32.
  • 33. Hash key • Performance drops as collection grows • Why? Mandatory index on shard key • B+Tree problem again!
  • 34. Sequential key Hash key
  • 35. Sequential + hash key • Coarse-grained sequential prefix • e.g. Year-month + hash value • 201210_24c3a5b9 B+Tree 201208_* 201209_* 201210_*
  • 36. But what if... B+Tree large working set 201208_* 201209_* 201210_*
  • 37. Sequential + hash key • Can you predict data growth rate? • Balancer not clever enough • Only considers # of chunks • Migration slow during heavy-writes
  • 38. Sequential key Hash key Sequential + hash key
  • 39. Low-cardinality hash key • Small portion of hash value Shard key range: A ~ D • e.g. A~Z, 00~FF • Alleviates B+Tree problem Local • Sequential access on fixed # B+Tree of parts • Cardinality / # of shards A A A B B B C C C
  • 40.
  • 41. Low-cardinality hash key • Limits the # of possible chunks • e.g. 00 ~ FF ➔ 256 chunks • Chunk grows past 64MB • Balancing becomes difficult
  • 42. Sequential key Hash key Sequential + hash key Low-cardinality hash key
  • 43. Low-cardinality hash prefix + sequential part Shard key range: A000 ~ C999 • e.g. Short hash prefix + timestamp • Nice index access pattern Local B+Tree • Unlimited number of chunks A000 A123 B000 B123 C000 C123
  • 45. Lessons learned • Know the performance impact of secondary index • Choose the right shard key • Test with large data sets • Linear scalability is hard • If you really need it, consider HBase or Cassandra • SSD
  • 46. Thank you. Questions? 유응섭 rspeed@daumcorp.com 최준건 gunn@daumcorp.com