MongoDB: Scaling write performance
Junegunn Choi

MongoDB
• Document data store
  • JSON-like document
• Secondary indexes
• Automatic failover
• Automatic sharding

First impression: Easy
• Easy installation
• Easy data model
• No prior schema design
• Native support for secondary indexes

Second thought: Not so easy
• No SQL
• Coping with massive data growth
• Setting up and operating sharded cluster
• Scaling write performance

Today we’ll talk about insert performance

Insert throughput on a replica set

Steady 5k inserts/sec
  * 1kB record. ObjectId as PK
  * WriteConcern: Journal sync on Majority

Insert throughput on a replica set with a secondary index

Culprit: B+Tree index
• Good at sequential insert
  • e.g. ObjectId, Sequence #, Timestamp
• Poor at random insert
  • Indexes on randomly-distributed data

Sequential vs. Random insert

[Figure: two B+Trees with the working set highlighted. Left: keys inserted in order 1, 2, 3, ..., 12. Right: keys inserted in random order 55, 75, 78, 1, 99, 36, 80, 91, 52, 63, 56, 33.]

• Sequential insert ➔ Small working set ➔ Fits in RAM ➔ Sequential I/O (bandwidth-bound)
• Random insert ➔ Large working set ➔ Cannot fit in RAM ➔ Random I/O (IOPS-bound)

So, what do we do now?

1. Partitioning

[Figure: a single large B+Tree does not fit in memory; split by month (Aug 2012, Sep 2012, Oct 2012), the B+Tree of one partition fits in memory.]

• MongoDB doesn’t support partitioning
• Partitioning at application-level
• e.g. Daily log collection
  • logs_20121012

Switch collection every hour
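
Application-level partitioning mostly comes down to deriving the collection name from the write time. A minimal sketch, assuming the daily name follows the `logs_20121012` example from the slide; the hourly suffix format and the function name `log_collection_name` are assumptions made for illustration.

```python
from datetime import datetime


def log_collection_name(ts, hourly=False):
    """Build the name of the collection (partition) that `ts` falls into."""
    # The daily format matches the slide's example (logs_20121012).
    # The hourly format is an assumption, not taken from the slides.
    fmt = "logs_%Y%m%d%H" if hourly else "logs_%Y%m%d"
    return ts.strftime(fmt)


ts = datetime(2012, 10, 12, 9, 30)
print(log_collection_name(ts))               # logs_20121012
print(log_collection_name(ts, hourly=True))  # logs_2012101209
```

The application would write to the collection returned for the current time, so each collection's indexes stay small; queries spanning several periods then have to visit several collections, which is part of the complexity this approach adds.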
2. Better H/W
• More RAM
• More IOPS
  • RAID striping
  • SSD
  • AWS Provisioned IOPS (1k ~ 10k)

3. More H/W: Sharding
• Automatic partitioning across nodes

[Figure: SHARD1, SHARD2 and SHARD3 behind a mongos router]

3 shards (3x3)
3 shards (3x3) on RAID 1+0

There’s no free lunch
• Manual partitioning
  • Incidental complexity
• Better H/W
  • $
• Sharding
  • $$
  • Operational complexity

“Do you really need that index?”

Scaling insert performance with sharding = Choosing the right shard key

Shard key example: year_of_birth

[Figure: USERS collection split into 64MB chunks across three shards behind a mongos router]

  SHARD1         SHARD2         SHARD3
  ~ 1950         1971 ~ 1990    1951 ~ 1970
  1991 ~ 2005    2006 ~ 2010
                 2010 ~ ∞

5k inserts/sec w/o sharding

Sequential key
• ObjectId as shard key
• Sequence #
• Timestamp

Worse throughput with 3x H/W.

Sequential key
• All inserts into one chunk
• Cannot scale insert performance
• Chunk migration overhead

[Figure: USERS chunks on SHARD-x: 1000 ~ 2000, 5000 ~ 7500, 9000 ~ ∞. Incoming keys 9001, 9002, 9003, 9004, ... all fall into the 9000 ~ ∞ chunk.]
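
The hot-chunk effect can be reproduced by routing keys to range-based chunks. This is a sketch, not MongoDB's routing code: the chunk boundaries extend the ranges shown on the slide with hypothetical values (the slide only shows three of the ranges), and `chunk_index` is a made-up helper.

```python
import bisect
from collections import Counter

# Lower bounds of consecutive chunk ranges; the last chunk is 9000 ~ infinity.
# The boundaries 2000 and 7500 are assumed so that the ranges are contiguous.
CHUNK_LOWER_BOUNDS = [1000, 2000, 5000, 7500, 9000]


def chunk_index(key):
    """Return the index of the chunk whose range contains `key`."""
    i = bisect.bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    if i < 0:
        raise ValueError("key is below the first chunk range")
    return i


new_keys = range(9001, 9101)  # the next 100 sequential keys
hits = Counter(chunk_index(k) for k in new_keys)
print(hits)  # Counter({4: 100})
```

Every new key lands in the last chunk, so a single shard absorbs the whole insert load regardless of how many shards exist.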
[Chart legend: Sequential key]

Hash key
• e.g. SHA1(_id) = 9f2feb0f1ef425b292f2f94 ...
• Distributes inserts evenly across all chunks
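
A hashed key spreads consecutive ids over the whole key range. The sketch below assumes the key is the SHA-1 hex digest of the id and that the hash space is split into `NUM_CHUNKS` equal ranges; both the chunk count and the helper names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 4  # hypothetical number of equal-sized hash ranges


def hash_key(doc_id):
    """Return the SHA-1 hex digest of the id, used here as the shard key."""
    return hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()


def chunk_index(key_hex):
    """Map a 160-bit hash value to one of NUM_CHUNKS equal ranges."""
    return (int(key_hex, 16) * NUM_CHUNKS) >> 160


# 4,000 consecutive ids, which would all hit one chunk with a sequential key.
hits = Counter(chunk_index(hash_key(i)) for i in range(9001, 13001))
for chunk in sorted(hits):
    print(chunk, hits[chunk])
```

Each of the four ranges receives roughly a quarter of the inserts, so the write load is shared by all chunks.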
Hash key
• Performance drops as collection grows
  • Why? Mandatory index on shard key
    • B+Tree problem again!

[Chart legend: Sequential key, Hash key]

Sequential + hash key
• Coarse-grained sequential prefix
• e.g. Year-month + hash value
  • 201210_24c3a5b9

[Figure: B+Tree over key ranges 201208_*, 201209_*, 201210_*]
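
A key of the form `201210_24c3a5b9` can be built as sketched below. The slide does not say which hash is used or how the suffix is derived, so taking the first 8 hex digits of SHA-1 over the id is an assumption, as is the function name `seq_hash_key`.

```python
import hashlib
from datetime import datetime


def seq_hash_key(doc_id, ts):
    """Build a key with a year-month prefix followed by 8 hex digits of a hash."""
    digest = hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()
    return f"{ts:%Y%m}_{digest[:8]}"


sep_key = seq_hash_key(1, datetime(2012, 9, 30))
oct_key = seq_hash_key(2, datetime(2012, 10, 12))
print(oct_key)
print(sep_key < oct_key)  # True
```

Keys from an earlier month always sort before keys from a later month, so inserts are random only within the index range of the current month.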
But what if...

[Figure: B+Tree over key ranges 201208_*, 201209_*, 201210_*, with a large working set under 201210_*]

Sequential + hash key
• Can you predict data growth rate?
• Balancer not clever enough
  • Only considers # of chunks
  • Migration slow during heavy writes

[Chart legend: Sequential key, Hash key, Sequential + hash key]

Low-cardinality hash key
• Small portion of hash value
  • e.g. A~Z, 00~FF
• Alleviates B+Tree problem
  • Sequential access on fixed # of parts
    • Cardinality / # of shards

[Figure: shard key range A ~ D; local B+Tree with entries A A A B B B C C C]

Low-cardinality hash key
• Limits the # of possible chunks
  • e.g. 00 ~ FF ➔ 256 chunks
  • Chunk grows past 64MB
    • Balancing becomes difficult
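
The 00 ~ FF case can be sketched by keeping only the first byte of a hash. Using SHA-1 over the id is an assumption; `NUM_SHARDS = 3` follows the 3-shard setup mentioned in the slides, and `low_cardinality_key` is a made-up name.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # the slides use a 3-shard cluster


def low_cardinality_key(doc_id):
    """Return the first byte of the SHA-1 digest as two hex digits (00 ~ FF)."""
    return hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()[:2].upper()


counts = Counter(low_cardinality_key(i) for i in range(100_000))
print(len(counts))               # number of distinct key values, at most 256
print(len(counts) / NUM_SHARDS)  # rough number of key values per shard
```

However many documents are inserted, the key takes at most 256 distinct values. Documents that share a key value cannot be split into separate chunks, so the chunk count is capped and individual chunks keep growing.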
[Chart legend: Sequential key, Hash key, Sequential + hash key, Low-cardinality hash key]

Low-cardinality hash prefix + sequential part
• e.g. Short hash prefix + timestamp
• Nice index access pattern
• Unlimited number of chunks

[Figure: shard key range A000 ~ C999; local B+Tree with entries A000 A123 B000 B123 C000 C123]
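
One way to build such a key is sketched below. The slides do not specify what is hashed or how the two parts are joined, so the `source` argument, the two-digit SHA-1 prefix, the underscore separator and the second-resolution timestamp are all assumptions for illustration.

```python
import hashlib
from datetime import datetime


def prefixed_key(source, ts):
    """Build a key from a short hash prefix followed by a sortable timestamp."""
    prefix = hashlib.sha1(str(source).encode("utf-8")).hexdigest()[:2].upper()
    return f"{prefix}_{ts:%Y%m%d%H%M%S}"


earlier = prefixed_key("user-1", datetime(2012, 10, 12, 9, 0, 0))
later = prefixed_key("user-1", datetime(2012, 10, 12, 9, 0, 1))
print(earlier)
print(earlier < later)  # True
```

Within one prefix the keys grow with time, so each of the (at most 256) prefixes has a single sequential insert point in the index, while the timestamp part keeps adding new key values that chunks can be split on.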
Finally, 2x throughput
Lessons learned
• Know the performance impact of secondary index
• Choose the right shard key
• Test with large data sets
• Linear scalability is hard
  • If you really need it, consider HBase or Cassandra
  • SSD

Thank you. Questions?

유응섭 rspeed@daumcorp.com
최준건 gunn@daumcorp.com
