SlideShare a Scribd company logo
感谢您参加本次Ar h u
         c S mmi全球架构师峰会!
               t
大会官方网站与资料下载地址:
www. c um m i . om
   ar hs    tc
分享产品与技术 分享腾讯与互联网



更多精彩内容

   QQ基础数据库架构演变之路

   QQ空间技术架构之峥嵘岁月

   架构之美:开放环境下的网络架构

   查看更多
Sharding




               Marty Weiner              Evrhet Milam
               Metropolis                Batcave




12年8月10⽇日星期五
·   Amazon EC2 + S3 + CloudFront
                  ·   2 NGinX, 16 Web Engines + 2 API Engines
                                      Page Views / Day

                  ·   5 Functionally Sharded MySQL DB + 9 read slaves
                  ·   4 Cassandra Nodes
                  ·   15 Membase Nodes (3 separate clusters)
                  ·   8 Memcache Nodes
                  ·   10 Redis Nodes
          Mar 2010·
           Mar 2010
                      3 Task Routers + 4 Task Processors
                                    Jan 2011
                                               Jan 2011
                                                           Jan 2012
                                                                      Jan 2012
                                                                         May 2012

                  ·   4 Elastic Search Nodes
                  ·   3 Mongo Clusters
                  ·   3 Engineers
  Scaling Pinterest

12年8月10⽇日星期五
Clustering



                           ·   Data distributed automatically
                           ·   Data can move
                           ·   Rebalances to distribute capacity
                           ·   Nodes communicate with each othe



                Sharding
  Scaling Pinterest

12年8月10⽇日星期五
Clustering



                           ·   Data distributed manually
                           ·   Data does not move
                           ·   Split data to distribute load
                           ·   Nodes are not aware of each other



                Sharding
  Scaling Pinterest

12年8月10⽇日星期五
Our Transition
                          1 DB + Foreign Keys + Joins
                          1 DB + Denormalized +
                          Cache + Read slaves +
                           1 DB
                      Cache
         Several functionally sharded DBs + Read slaves +
         Cache ID sharded DBs + Backup slaves +

                      Cache


  Scaling Pinterest

12年8月10⽇日星期五
How we sharded




  Scaling Pinterest

12年8月10⽇日星期五
Sharded Server Topology




                      db00001     db00513      db03072       db03584
                      db00002     db00514      db03073       db03585
                        .......     .......      .......       .......
                      db00512     db01024      db03583       db04096


                      Initially, 8 physical servers, each with 512
                      DBs
  Scaling Pinterest

12年8月10⽇日星期五
High Availability




                      db00001        db00513     db03072     db03584
                      db00002        db00514     db03073     db03585
                        .......        .......     .......     .......
                      db00512        db01024     db03583     db04096

                         Multi Master replication for hot swapping
                         All writes/reads go to only 1 master
  Scaling Pinterest

12年8月10⽇日星期五
Configuration
         · Initially 8 masters, each with 512 DBs
         · Now at 80 masters, some 32 DBs and some 64
              DBs
              · Over 10 TB
         · All tables are InnoDB



  Scaling Pinterest

12年8月10⽇日星期五
Configuration
         · Ubuntu 11.10 on AWS managed by Puppet
         · m1.xlarge using 4 stripe-raided ephemeral
              drives
         · cc2.8xlarge using 4 stripe-raided ephemeral
              drives
         · Soon: hi1.4xlarge (SSD)



  Scaling Pinterest

12年8月10⽇日星期五
ID Structure
                                    64 bits


                      Shard ID   Type         Local ID
         · A lookup data structure has physical server to
              shard ID range (cached by each app server
              process)
         · Shard ID denotes which shard
         · Type denotes object type (e.g., pins)
         · Local ID denotes position in table
  Scaling Pinterest

12年8月10⽇日星期五
Lookup Structure

                                    {“sharddb001a”:   (   1, 512),
                                     “sharddb002b”:   ( 513, 1024),
                                     “sharddb003a”:   (1025, 1536),
                                      ...
                                     “sharddb008b”:   (3585, 4096)}




                      sharddb003a                      DB01025            users


                                                        users         1    json
                                                  user_has_boards     2    json
                                                       boards         3    json



  Scaling Pinterest

12年8月10⽇日星期五
ID Generation
         · New users are randomly distributed across
              shards
         · Boards, pins, etc. try to be collocated with user
         · Local ID’s are assigned by auto-increment
              mysql_query(“USE pbdata%d” % shard_id)
              local_id = mysql_query(“INSERT INTO users (body) VALUES (%s)”, json)
              full_id = shard_id << 46 | USER_TYPE << 10 | local_id


         · Enough ID space for 65536 shards, but only
              first 4096 opened initially. Can expand
              horizontally.
  Scaling Pinterest

12年8月10⽇日星期五
· Object tables (e.g., pin, board, user, comment)
                  Objects and Mappings
               · Local ID    MySQL blob (JSON / Serialized
                      thrift)
             · Mapping tables (e.g., user has boards, pin has
                  likes)
                  · Full ID     Full ID (+ sequence ID /
                      timestamp)
                  · Naming schema is noun_verb_noun
             ·    Queries are PK or index lookups (no joins)
             ·    Data DOES NOT MOVE
             ·    All tables exist on all shards
  Scaling Pinterest

12年8月10⽇日星期五
             ·    No schema changes required (index = new
Loading a Page
    · Rendering user profile
         SELECT       body FROM users WHERE id=<local_user_id>
         SELECT       board_id FROM user_has_boards WHERE user_id=<user_id>
         SELECT       body FROM boards WHERE id IN (<board_ids>)
         SELECT       pin_id FROM board_has_pins WHERE board_id=<board_id>
         SELECT       body FROM pins WHERE id IN (pin_ids)
    · Most of these calls will be a cache hit
    · Omitting offset/limits and mapping sequence
         id sort

  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 4),                  db00004
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),
           ...
          “sharddb008b”:   ( 29, 32)}




                                         sharddb001b




                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 4),                  db00004   sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),                                         db00001
           ...                                                                 db00002
          “sharddb008b”:   ( 29, 32)}                                          db00003
                                                                               db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 2),                  db00004   sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),                                         db00001
           ...                                                                 db00002
          “sharddb008b”:   ( 29, 32),                                          db00003
          “sharddb009a”:   ( 3, 4)}                                            db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 2),                  db00004   sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),                                         db00001
           ...                                                                 db00002
          “sharddb008b”:   ( 29, 32),                                          db00003
          “sharddb009a”:   ( 3, 4)}                                            db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002

         {“sharddb001a”:   (   1, 2),                            sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),
           ...
          “sharddb008b”:   ( 29, 32),                                          db00003
          “sharddb009a”:   ( 3, 4)}                                            db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Scripting
         · Must get old data into your shiny new shard
         · 500M pins, 1.6B follower rows, etc
         · Build a scripting farm with Pyres + Redis
           · Spawn more workers and complete the task
                  faster
              · Can push millions of jobs onto the queue


  Scaling Pinterest

12年8月10⽇日星期五
Sharding Alternatives




  Scaling Pinterest

12年8月10⽇日星期五
· ID generated by Generation
                      ID shard + type + auto-
              increment
              ·   Pinterest, Facebook
              ·   Pro: ID generation scales with size of shard
              ·   Pro: No routing services which can fail
              ·   Pro: No holes in ID space
              ·   Pro: Type verification built-in
              ·   Con: No global ordering (requires sequence
                  ID)
              · Con: Can’t move parts of databases around
  Scaling Pinterest

12年8月10⽇日星期五
ID Generation
         · ID contains timestamp + shard
           · Instagram
           · Pro: Global ordering
           · Pro: No routing services which can fail
           · Con: No type verification
           · Con: Can’t move parts of databases around
           · Con: Not much space for shard ID’s


  Scaling Pinterest

12年8月10⽇日星期五
ID Generation
         ·    ID generated by service
              ·   Twitter, Etsy
              ·   Requires: ID routing service
              ·   Pro: Possible global ordering
              ·   Pro: Can move individual data elements
              ·   Pro: Can perform type-checking
              ·   Con: ID routing service can fail
              ·   Con: ID routing service can be unwieldy and
                  complicated
  Scaling Pinterest

12年8月10⽇日星期五
Problems




  Scaling Pinterest

12年8月10⽇日星期五
Hotspots
        · Our board_followedby_users tables were
             unbalanced
            · Rearchitected to have
                 board_followedby_user_shard map boards to
                 shards containing subsets of the
                 board_followedby_users data.
            · Pro: Data evenly distributed
            · Con: May have to hit multiple shards

  Scaling Pinterest

12年8月10⽇日星期五
No Joins
          · Home is user_follows_board JOIN board_has_pins
            · Each user has a home feed redis list
            · New pin ID placed in each following user’s list

                      for user_id in board_followedby_users:
                         redisconn.lpush(“home:%s” % user_id, new_pin_id)




  Scaling Pinterest

12年8月10⽇日星期五
No Automatic Failover
             · Some fraction of users see timeouts if a DB fails
             · Manually change settings and re-deploy
             · Needs to be automated, especially with 128+
                  DBs
             · Moving to automation with Zookeeper




  Scaling Pinterest

12年8月10⽇日星期五
Caching




  Scaling Pinterest

12年8月10⽇日星期五
Caching
               · Redis lists for persistently caching homefeeds
                 · lpush homefeed:82363 7233494
                 · lrange homefeed:82363 0 49
               · Memcache for caching lists
                 · MessagePack to serialize data
                 · Append to add new data
                 · Never delete from middle
               · Memcache for caching objects
  Scaling Pinterest

12年8月10⽇日星期五
Caching
                 · Caches separated by pools
                   · Example: pin objects, board_has_pins, etc
                   · Better stats
                   · Easier to scale and manage




  Scaling Pinterest

12年8月10⽇日星期五
Caching with decorators
                @mc_objects(USER_MC_CONN,
                      format = “user:%d”,
                     version=1,
                     serialization=simplejson
                     expire_time=0)
                def user_get_many(self, user_ids):
                       cursor = get_mysql_conn().cursor
                       cursor.execute(“SELECT * FROM users WHERE id in (%s)”,
                       user_ids)
                       return cursor.fetchall()




  Scaling Pinterest

12年8月10⽇日星期五
Caching with decorators

                @paged_list(BOARD_MC_CONN,
                     format = “b:p:%d”,
                    version=1,
                    expire_time=24*60*60)
                def get_board_pins(self, board_id, offset, limit):
                       cursor = get_mysql_conn().cursor
                       cursor.execute(“SELECT pin_id FROM bp WHERE id =%s OFFSET
                                           %d LIMIT %d”, board_id, offset, limit)
                       return cursor.fetchall()




  Scaling Pinterest

12年8月10⽇日星期五
We are Hiring!
                       jobs@pinterest.com




  Scaling Pinterest

12年8月10⽇日星期五
Questions?

                      marty@pinterest.com   evrhet@pinterest.com




  Scaling Pinterest

12年8月10⽇日星期五
杭州站·2012年 10月 25日 ~27日
大会官网:www.c n a g h uc m
        q o h n z o .o

More Related Content

What's hot

ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
Horacio Gonzalez
 
Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2
won min jang
 
Querydsl
QuerydslQuerydsl
Querydsl
Younghan Kim
 
hibernate
hibernatehibernate
hibernate
Arjun Shanka
 
Jndi (1)
Jndi (1)Jndi (1)
Jndi (1)
laraibnaeem
 
Simplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaSimplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with Morphia
MongoDB
 
Morphia, Spring Data & Co.
Morphia, Spring Data & Co.Morphia, Spring Data & Co.
Morphia, Spring Data & Co.
Tobias Trelle
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
Jeff Yemin
 
OrientDB the graph database
OrientDB the graph databaseOrientDB the graph database
OrientDB the graph database
Artem Orobets
 
Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425
guest4f2eea
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
Norberto Leite
 
Storekit diagram
Storekit diagramStorekit diagram
Storekit diagram
直樹 筒井
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
MongoDB
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
David Coallier
 
09 Data
09 Data09 Data
09 Data
Mahmoud
 
Scala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogueScala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogue
Konrad Malawski
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
Ynon Perek
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with Morphia
Scott Hernandez
 
03 Custom Classes
03 Custom Classes03 Custom Classes
03 Custom Classes
Mahmoud
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
robosonia Mar
 

What's hot (20)

ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
 
Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2
 
Querydsl
QuerydslQuerydsl
Querydsl
 
hibernate
hibernatehibernate
hibernate
 
Jndi (1)
Jndi (1)Jndi (1)
Jndi (1)
 
Simplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaSimplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with Morphia
 
Morphia, Spring Data & Co.
Morphia, Spring Data & Co.Morphia, Spring Data & Co.
Morphia, Spring Data & Co.
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
 
OrientDB the graph database
OrientDB the graph databaseOrientDB the graph database
OrientDB the graph database
 
Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
Storekit diagram
Storekit diagramStorekit diagram
Storekit diagram
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
 
09 Data
09 Data09 Data
09 Data
 
Scala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogueScala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogue
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with Morphia
 
03 Custom Classes
03 Custom Classes03 Custom Classes
03 Custom Classes
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
 

Viewers also liked

B2B Video 101
B2B Video 101B2B Video 101
B2B Video 101
Leslie Drate
 
Electronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RMElectronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RM
Plus Technologies
 
Dockerの導入
Dockerの導入Dockerの導入
Dockerの導入
regret raym
 
Printer Load Balancing Systems and Benefits
Printer Load Balancing Systems and BenefitsPrinter Load Balancing Systems and Benefits
Printer Load Balancing Systems and Benefits
Plus Technologies
 
Drawing and printmaking
Drawing and printmaking Drawing and printmaking
Drawing and printmaking
Carmel Torres
 
Sarah and trevor hall
Sarah and trevor hallSarah and trevor hall
Sarah and trevor hall
587554
 
Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1
Magaly Caba
 
Florida-John
Florida-JohnFlorida-John
Florida-John
klei8103
 
Gypsy Girl
Gypsy Girl Gypsy Girl
Gypsy Girl
Makala (D)
 
Tennessee-Alize and Tesnim
Tennessee-Alize and TesnimTennessee-Alize and Tesnim
Tennessee-Alize and Tesnim
klei8103
 
Bbfc ratings
Bbfc ratingsBbfc ratings
Bbfc ratings
CallumBrown6032
 
Researchss
ResearchssResearchss
Researchss
r-jones123
 
Kentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and KassidyKentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and Kassidy
klei8103
 
"Сокровищницы Прибыльных Идей"
 "Сокровищницы Прибыльных Идей" "Сокровищницы Прибыльных Идей"
"Сокровищницы Прибыльных Идей"Marina Kiselevich
 
The Industrial Revolution 20
The  Industrial  Revolution 20The  Industrial  Revolution 20
The Industrial Revolution 20
Grace Pangan
 
Mississippi-Evan and Teddy
Mississippi-Evan and Teddy Mississippi-Evan and Teddy
Mississippi-Evan and Teddy
klei8103
 
Rahul network resume copy
Rahul  network resume   copyRahul  network resume   copy
Rahul network resume copy
Rahul patil
 
7. susret 17.11.2011. konkretno lice boga oca
7. susret 17.11.2011.   konkretno lice boga oca7. susret 17.11.2011.   konkretno lice boga oca
7. susret 17.11.2011. konkretno lice boga ocaMeri-Lucijeta
 
Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1
Sachin Singh
 

Viewers also liked (20)

B2B Video 101
B2B Video 101B2B Video 101
B2B Video 101
 
Electronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RMElectronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RM
 
Dockerの導入
Dockerの導入Dockerの導入
Dockerの導入
 
Printer Load Balancing Systems and Benefits
Printer Load Balancing Systems and BenefitsPrinter Load Balancing Systems and Benefits
Printer Load Balancing Systems and Benefits
 
Drawing and printmaking
Drawing and printmaking Drawing and printmaking
Drawing and printmaking
 
Sarah and trevor hall
Sarah and trevor hallSarah and trevor hall
Sarah and trevor hall
 
Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1
 
Florida-John
Florida-JohnFlorida-John
Florida-John
 
Pinball1
Pinball1Pinball1
Pinball1
 
Gypsy Girl
Gypsy Girl Gypsy Girl
Gypsy Girl
 
Tennessee-Alize and Tesnim
Tennessee-Alize and TesnimTennessee-Alize and Tesnim
Tennessee-Alize and Tesnim
 
Bbfc ratings
Bbfc ratingsBbfc ratings
Bbfc ratings
 
Researchss
ResearchssResearchss
Researchss
 
Kentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and KassidyKentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and Kassidy
 
"Сокровищницы Прибыльных Идей"
 "Сокровищницы Прибыльных Идей" "Сокровищницы Прибыльных Идей"
"Сокровищницы Прибыльных Идей"
 
The Industrial Revolution 20
The  Industrial  Revolution 20The  Industrial  Revolution 20
The Industrial Revolution 20
 
Mississippi-Evan and Teddy
Mississippi-Evan and Teddy Mississippi-Evan and Teddy
Mississippi-Evan and Teddy
 
Rahul network resume copy
Rahul  network resume   copyRahul  network resume   copy
Rahul network resume copy
 
7. susret 17.11.2011. konkretno lice boga oca
7. susret 17.11.2011.   konkretno lice boga oca7. susret 17.11.2011.   konkretno lice boga oca
7. susret 17.11.2011. konkretno lice boga oca
 
Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1
 

Similar to Pinterest的数据库分片架构

Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterest
Mohit Jain
 
Pinterest arch summit august 2012 - scaling pinterest
Pinterest arch summit   august 2012 - scaling pinterestPinterest arch summit   august 2012 - scaling pinterest
Pinterest arch summit august 2012 - scaling pinterest
drewz lin
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
Eric Evans
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
sunnygleason
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
Server Density
 
Morning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMorning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et Introductions
MongoDB
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjava
srisatish ambati
 
NoSQL
NoSQLNoSQL
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
Jon Haddad
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Data Con LA
 
Everyday - mongodb
Everyday - mongodbEveryday - mongodb
Everyday - mongodb
elliando dias
 
No Sql
No SqlNo Sql
Databases for Storage Engineers
Databases for Storage EngineersDatabases for Storage Engineers
Databases for Storage Engineers
Thomas Kejser
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
Tony Rogerson
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
Christos Charmatzis
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
MongoDB
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
Roger Xia
 
Brisk hadoop june2011
Brisk hadoop june2011Brisk hadoop june2011
Brisk hadoop june2011
srisatish ambati
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
Daum DNA
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
DataWorks Summit
 

Similar to Pinterest的数据库分片架构 (20)

Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterest
 
Pinterest arch summit august 2012 - scaling pinterest
Pinterest arch summit   august 2012 - scaling pinterestPinterest arch summit   august 2012 - scaling pinterest
Pinterest arch summit august 2012 - scaling pinterest
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Morning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMorning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et Introductions
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjava
 
NoSQL
NoSQLNoSQL
NoSQL
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
Everyday - mongodb
Everyday - mongodbEveryday - mongodb
Everyday - mongodb
 
No Sql
No SqlNo Sql
No Sql
 
Databases for Storage Engineers
Databases for Storage EngineersDatabases for Storage Engineers
Databases for Storage Engineers
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Brisk hadoop june2011
Brisk hadoop june2011Brisk hadoop june2011
Brisk hadoop june2011
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 

Recently uploaded

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 

Recently uploaded (20)

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 

Pinterest的数据库分片架构

  • 1. 感谢您参加本次Ar h u c S mmi全球架构师峰会! t 大会官方网站与资料下载地址: www. c um m i . om ar hs tc
  • 2. 分享产品与技术 分享腾讯与互联网 更多精彩内容 QQ基础数据库架构演变之路 QQ空间技术架构之峥嵘岁月 架构之美:开放环境下的网络架构 查看更多
  • 3. Sharding Marty Weiner Evrhet Milam Metropolis Batcave 12年8月10⽇日星期五
  • 4. · Amazon EC2 + S3 + CloudFront · 2 NGinX, 16 Web Engines + 2 API Engines Page Views / Day · 5 Functionally Sharded MySQL DB + 9 read slaves · 4 Cassandra Nodes · 15 Membase Nodes (3 separate clusters) · 8 Memcache Nodes · 10 Redis Nodes Mar 2010· Mar 2010 3 Task Routers + 4 Task Processors Jan 2011 Jan 2011 Jan 2012 Jan 2012 May 2012 · 4 Elastic Search Nodes · 3 Mongo Clusters · 3 Engineers Scaling Pinterest 12年8月10⽇日星期五
  • 5. Clustering · Data distributed automatically · Data can move · Rebalances to distribute capacity · Nodes communicate with each othe Sharding Scaling Pinterest 12年8月10⽇日星期五
  • 6. Clustering · Data distributed manually · Data does not move · Split data to distribute load · Nodes are not aware of each other Sharding Scaling Pinterest 12年8月10⽇日星期五
  • 7. Our Transition 1 DB + Foreign Keys + Joins 1 DB + Denormalized + Cache + Read slaves + 1 DB Cache Several functionally sharded DBs + Read slaves + Cache ID sharded DBs + Backup slaves + Cache Scaling Pinterest 12年8月10⽇日星期五
  • 8. How we sharded Scaling Pinterest 12年8月10⽇日星期五
  • 9. Sharded Server Topology db00001 db00513 db03072 db03584 db00002 db00514 db03073 db03585 ....... ....... ....... ....... db00512 db01024 db03583 db04096 Initially, 8 physical servers, each with 512 DBs Scaling Pinterest 12年8月10⽇日星期五
  • 10. High Availability db00001 db00513 db03072 db03584 db00002 db00514 db03073 db03585 ....... ....... ....... ....... db00512 db01024 db03583 db04096 Multi Master replication for hot swapping All writes/reads go to only 1 master Scaling Pinterest 12年8月10⽇日星期五
  • 11. Configuration · Initially 8 masters, each with 512 DBs · Now at 80 masters, some 32 DBs and some 64 DBs · Over 10 TB · All tables are InnoDB Scaling Pinterest 12年8月10⽇日星期五
  • 12. Configuration · Ubuntu 11.10 on AWS managed by Puppet · m1.xlarge using 4 stripe-raided ephemeral drives · cc2.8xlarge using 4 stripe-raided ephemeral drives · Soon: hi1.4xlarge (SSD) Scaling Pinterest 12年8月10⽇日星期五
  • 13. ID Structure 64 bits Shard ID Type Local ID · A lookup data structure has physical server to shard ID range (cached by each app server process) · Shard ID denotes which shard · Type denotes object type (e.g., pins) · Local ID denotes position in table Scaling Pinterest 12年8月10⽇日星期五
  • 14. Lookup Structure {“sharddb001a”: ( 1, 512), “sharddb002b”: ( 513, 1024), “sharddb003a”: (1025, 1536), ... “sharddb008b”: (3585, 4096)} sharddb003a DB01025 users users 1 json user_has_boards 2 json boards 3 json Scaling Pinterest 12年8月10⽇日星期五
  • 15. ID Generation · New users are randomly distributed across shards · Boards, pins, etc. try to be collocated with user · Local ID’s are assigned by auto-increment mysql_query(“USE pbdata%d” % shard_id) local_id = mysql_query(“INSERT INTO users (body) VALUES (%s)”, json) full_id = shard_id << 46 | USER_TYPE << 10 | local_id · Enough ID space for 65536 shards, but only first 4096 opened initially. Can expand horizontally. Scaling Pinterest 12年8月10⽇日星期五
  • 16. · Object tables (e.g., pin, board, user, comment) Objects and Mappings · Local ID MySQL blob (JSON / Serialized thrift) · Mapping tables (e.g., user has boards, pin has likes) · Full ID Full ID (+ sequence ID / timestamp) · Naming schema is noun_verb_noun · Queries are PK or index lookups (no joins) · Data DOES NOT MOVE · All tables exist on all shards Scaling Pinterest 12年8月10⽇日星期五 · No schema changes required (index = new
  • 17. Loading a Page · Rendering user profile SELECT body FROM users WHERE id=<local_user_id> SELECT board_id FROM user_has_boards WHERE user_id=<user_id> SELECT body FROM boards WHERE id IN (<board_ids>) SELECT pin_id FROM board_has_pins WHERE board_id=<board_id> SELECT body FROM pins WHERE id IN (pin_ids) · Most of these calls will be a cache hit · Omitting offset/limits and mapping sequence id sort Scaling Pinterest 12年8月10⽇日星期五
  • 18. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 4), db00004 “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32)} sharddb001b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 19. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 4), db00004 sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), db00001 ... db00002 “sharddb008b”: ( 29, 32)} db00003 db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 20. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 2), db00004 sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), db00001 ... db00002 “sharddb008b”: ( 29, 32), db00003 “sharddb009a”: ( 3, 4)} db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 21. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 2), db00004 sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), db00001 ... db00002 “sharddb008b”: ( 29, 32), db00003 “sharddb009a”: ( 3, 4)} db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 22. Increased load on DB? sharddb001a db00001 db00002 {“sharddb001a”: ( 1, 2), sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32), db00003 “sharddb009a”: ( 3, 4)} db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 23. Scripting · Must get old data into your shiny new shard · 500M pins, 1.6B follower rows, etc · Build a scripting farm with Pyres + Redis · Spawn more workers and complete the task faster · Can push millions of jobs onto the queue Scaling Pinterest 12年8月10⽇日星期五
  • 24. Sharding Alternatives Scaling Pinterest 12年8月10⽇日星期五
  • 25. · ID generated by Generation ID shard + type + auto- increment · Pinterest, Facebook · Pro: ID generation scales with size of shard · Pro: No routing services which can fail · Pro: No holes in ID space · Pro: Type verification built-in · Con: No global ordering (requires sequence ID) · Con: Can’t move parts of databases around Scaling Pinterest 12年8月10⽇日星期五
  • 26. ID Generation · ID contains timestamp + shard · Instagram · Pro: Global ordering · Pro: No routing services which can fail · Con: No type verification · Con: Can’t move parts of databases around · Con: Not much space for shard ID’s Scaling Pinterest 12年8月10⽇日星期五
  • 27. ID Generation · ID generated by service · Twitter, Etsy · Requires: ID routing service · Pro: Possible global ordering · Pro: Can move individual data elements · Pro: Can perform type-checking · Con: ID routing service can fail · Con: ID routing service can be unwieldy and complicated Scaling Pinterest 12年8月10⽇日星期五
  • 28. Problems Scaling Pinterest 12年8月10⽇日星期五
  • 29. Hotspots · Our board_followedby_users tables were unbalanced · Rearchitected to have board_followedby_user_shard map boards to shards containing subsets of the board_followedby_users data. · Pro: Data evenly distributed · Con: May have to hit multiple shards Scaling Pinterest 12年8月10⽇日星期五
  • 30. No Joins · Home is user_follows_board JOIN board_has_pins · Each user has a home feed redis list · New pin ID placed in each following user’s list for user_id in board_followedby_users: redisconn.lpush(“home:%s” % user_id, new_pin_id) Scaling Pinterest 12年8月10⽇日星期五
  • 31. No Automatic Failover · Some fraction of users see timeouts if a DB fails · Manually change settings and re-deploy · Needs to be automated, especially with 128+ DBs · Moving to automation with Zookeeper Scaling Pinterest 12年8月10⽇日星期五
  • 32. Caching Scaling Pinterest 12年8月10⽇日星期五
  • 33. Caching · Redis lists for persistently caching homefeeds · lpush homefeed:82363 7233494 · lrange homefeed:82363 0 49 · Memcache for caching lists · MessagePack to serialize data · Append to add new data · Never delete from middle · Memcache for caching objects Scaling Pinterest 12年8月10⽇日星期五
  • 34. Caching · Caches separated by pools · Example: pin objects, board_has_pins, etc · Better stats · Easier to scale and manage Scaling Pinterest 12年8月10⽇日星期五
  • 35. Caching with decorators @mc_objects(USER_MC_CONN, format = “user:%d”, version=1, serialization=simplejson expire_time=0) def user_get_many(self, user_ids): cursor = get_mysql_conn().cursor cursor.execute(“SELECT * FROM users WHERE id in (%s)”, user_ids) return cursor.fetchall() Scaling Pinterest 12年8月10⽇日星期五
  • 36. Caching with decorators @paged_list(BOARD_MC_CONN, format = “b:p:%d”, version=1, expire_time=24*60*60) def get_board_pins(self, board_id, offset, limit): cursor = get_mysql_conn().cursor cursor.execute(“SELECT pin_id FROM bp WHERE id =%s OFFSET %d LIMIT %d”, board_id, offset, limit) return cursor.fetchall() Scaling Pinterest 12年8月10⽇日星期五
  • 37. We are Hiring! jobs@pinterest.com Scaling Pinterest 12年8月10⽇日星期五
  • 38. Questions? marty@pinterest.com evrhet@pinterest.com Scaling Pinterest 12年8月10⽇日星期五
  • 39. 杭州站·2012年 10月 25日 ~27日 大会官网:www.c n a g h uc m q o h n z o .o