The Etsy Shard Architecture: Starts With S and Ends With Hard

jgoulah@etsy.com / @johngoulah
1.5B page views / mo.
$525MM in sales in 2011
40MM unique visitors/mo.
800K shops / 150 countries
25K+ queries/sec avg
3TB InnoDB buffer pool
15TB+ data stored
99.99% queries under 1ms
50+ MySQL servers

Server Spec
HP DL380 G7
96GB RAM
16 spindles / 1TB RAID 10
24 cores
Ross Snyder
Scaling Etsy - What Went Wrong, What Went Right
http://bit.ly/rpcxtP

Matt Graham
Migrating From PG to MySQL Without Downtime
http://bit.ly/rQpqZG
Architecture
Redundancy
Master - Master

  R/W      R/W

 Side A   Side B
Scalability
shard 1        shard 2                shard N

                               ...
Migrate     Migrate           Migrate


                shard N + 1
Bird’s-Eye View
tickets             index
(Unique IDs)        (Shard Lookup)


shard 1             shard 2           shard N
          (Store/Retrieve Data)
Basics
users_groups

user_id   group_id
  1          A
  1          B
  2          A
  2          C
  3          A
  3          B
  3          C
users_groups

shard 1                    shard 2
user_id   group_id         user_id   group_id
  1          A               3          A
  1          B               3          B
  2          A               3          C
  2          C
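The point of the split is that every row belonging to one user lands on the same shard, so per-user queries touch a single server, and the owning shard comes from an index lookup rather than from a hash. A minimal Python sketch, assuming a plain dict stands in for the index server (the names `user_index` and `split_by_owner` are hypothetical):

```python
from collections import defaultdict

# Hypothetical index contents: user_id -> shard_id (matches the split above).
user_index = {1: 1, 2: 1, 3: 2}

users_groups = [
    (1, "A"), (1, "B"),
    (2, "A"), (2, "C"),
    (3, "A"), (3, "B"), (3, "C"),
]


def split_by_owner(rows, index):
    """Group (user_id, group_id) rows by the shard the index assigns to the user."""
    shards = defaultdict(list)
    for user_id, group_id in rows:
        shards[index[user_id]].append((user_id, group_id))
    return dict(shards)


for shard_id, rows in sorted(split_by_owner(users_groups, user_index).items()):
    print(f"shard {shard_id}: {rows}")
```

Because placement is looked up rather than computed, a user can later be moved by changing one index row.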
Index Servers
Shards are NOT determined by:
•key hashing
•range partitions
•partitioning by function
Look-Up Data
index:    select shard_id from user_index where user_id = X
          returns 1

shard 1:  select join_date from users where user_id = X
          returns 2012-02-05

shard 1   shard 2   shard N
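Every read is therefore two queries: one to the index to find the shard, then the real query against that shard only. A minimal Python sketch, assuming in-memory SQLite databases stand in for the MySQL index and shard servers (`make_db`, `get_join_date` and user 42 are hypothetical):

```python
import sqlite3

USERS_DDL = "CREATE TABLE users (user_id INTEGER PRIMARY KEY, join_date TEXT)"


def make_db(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one MySQL server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical data: user 42 lives on shard 1.
index_db = make_db(
    "CREATE TABLE user_index (user_id INTEGER PRIMARY KEY, shard_id INTEGER)",
    "INSERT INTO user_index VALUES (?, ?)",
    [(42, 1)],
)
shard_dbs = {
    1: make_db(USERS_DDL, "INSERT INTO users VALUES (?, ?)", [(42, "2012-02-05")]),
    2: make_db(USERS_DDL, "INSERT INTO users VALUES (?, ?)", []),
}


def get_join_date(user_id):
    # Step 1: ask the index which shard owns this user.
    row = index_db.execute(
        "SELECT shard_id FROM user_index WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"user {user_id} is not in the index")
    # Step 2: run the real query against that one shard only.
    row = shard_dbs[row[0]].execute(
        "SELECT join_date FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


print(get_join_date(42))  # 2012-02-05
```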
Ticket Servers
Globally Unique ID
CREATE TABLE `tickets` (
 `id` bigint(20) unsigned NOT NULL auto_increment,
 `stub` char(1) NOT NULL default '',
 PRIMARY KEY (`id`),
 UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM
Ticket Generation
REPLACE INTO tickets (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

SELECT * FROM tickets;
id        stub
4589294   a
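REPLACE deletes the existing `'a'` row and inserts a fresh one, so the table never grows past one row while the auto-increment counter keeps advancing. A minimal Python sketch of that behavior, assuming SQLite as a stand-in for the MyISAM table (here `cursor.lastrowid` plays the role of `LAST_INSERT_ID()`, and `AUTOINCREMENT` keeps SQLite from reusing ids):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite analogue of the tickets table; not the MySQL DDL itself.
conn.execute("""
    CREATE TABLE tickets (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        stub TEXT NOT NULL DEFAULT '' UNIQUE
    )
""")


def next_ticket(conn):
    """Hand out the next globally unique id from the single-row table."""
    cur = conn.execute("REPLACE INTO tickets (stub) VALUES ('a')")
    return cur.lastrowid


ids = [next_ticket(conn) for _ in range(3)]
print(ids)  # [1, 2, 3]
print(conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0])  # 1
```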
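tickets A
  auto-increment-increment = 2
  auto-increment-offset = 1

tickets B
  auto-increment-increment = 2
  auto-increment-offset = 2

NOT master-master: A hands out odd IDs (1, 3, 5, ...) and B even IDs (2, 4, 6, ...), so the two servers never collide and never need to replicate.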
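With these settings each server walks its own arithmetic sequence, so either one can serve requests alone if the other is down. A minimal Python sketch of the two sequences, assuming fresh servers and the usual case where offset <= increment (`ticket_sequence` is a hypothetical name):

```python
from itertools import count, islice


def ticket_sequence(increment, offset):
    """IDs a fresh ticket server hands out: offset, offset + increment, ...

    Mirrors auto_increment_increment / auto_increment_offset when
    offset <= increment.
    """
    return count(start=offset, step=increment)


server_a = ticket_sequence(increment=2, offset=1)
server_b = ticket_sequence(increment=2, offset=2)

a_ids = list(islice(server_a, 5))
b_ids = list(islice(server_b, 5))
print(a_ids)  # [1, 3, 5, 7, 9]
print(b_ids)  # [2, 4, 6, 8, 10]

# If A goes down, clients keep drawing from B; IDs stay unique because
# the two sequences can never overlap.
more_b = list(islice(server_b, 3))
print(more_b)  # [12, 14, 16]
```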
Shards
Object Hashing
A                                     B
'etsy_index_A' => 'mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw',
'etsy_index_B' => 'mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw',
'etsy_shard_001_A' => 'mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
'etsy_shard_001_B' => 'mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
'etsy_shard_002_A' => 'mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
'etsy_shard_002_B' => 'mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
'etsy_shard_003_A' => 'mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
'etsy_shard_003_B' => 'mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',




   user_id : 500 % (# active replicants)
A              B

user_id : 500 % (2) == 0  -> side A
user_id : 501 % (2) == 1  -> side B

500 on A                      501 on B
select ...                    select ...
insert ...                    insert ...
update ...                    update ...
Failure
A              B




user_id : 500 % (1) == 0
       user_id : 501 % (1) == 0
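The routing rule is just a modulus over the sides that are currently active: with both up, a given user always goes to the same side, and when one side is pulled out the modulus drops from 2 to 1 and everything falls onto the survivor. A minimal Python sketch, assuming the active sides are kept as a list and that B is the side that failed (the slides do not say which); `choose_side` and `connection_key` are hypothetical helpers:

```python
def choose_side(user_id, active_sides):
    """Route an object to a master-master side: user_id % (# active replicants)."""
    if not active_sides:
        raise RuntimeError("no active replicants for this shard")
    return active_sides[user_id % len(active_sides)]


def connection_key(shard_id, side):
    """Hypothetical helper producing keys in the style of 'etsy_shard_001_A'."""
    return f"etsy_shard_{shard_id:03d}_{side}"


both = ["A", "B"]
print(choose_side(500, both), choose_side(501, both))      # A B

only_a = ["A"]  # assumed: side B has been removed from the active list
print(choose_side(500, only_a), choose_side(501, only_a))  # A A

print(connection_key(1, choose_side(501, both)))           # etsy_shard_001_B
```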
ORM
•connection handling
•shard lookup
•replicant selection
•CRUD
•cache handling
•data validation
•data abstraction
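Shard Selection
Non-Writable Shards
$config["non_writable_shards"] = array(1, 2, 3, 4);

// Known shards minus the ones closed to new objects.
// array_diff() preserves the original keys, so array_values()
// re-indexes the result from 0 for a rand()-based pick.
public static function getKnownWritableShards() {
    return array_values(
        array_diff(
            self::getKnownShards(),
            self::getNonwritableShards()
        )
    );
}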
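The same filter in a minimal Python sketch, assuming shards 1 to 8 exist (that list is hypothetical; the non-writable list is the one from the config above). A list comprehension keeps the order and is already densely indexed, which is the job `array_values()` does in the PHP:

```python
def get_known_writable_shards(known_shards, non_writable_shards):
    """Known shards minus non-writable ones, in their original order."""
    blocked = set(non_writable_shards)
    return [shard for shard in known_shards if shard not in blocked]


known = list(range(1, 9))        # hypothetical: shards 1..8 exist
non_writable = [1, 2, 3, 4]      # $config["non_writable_shards"]
print(get_known_writable_shards(known, non_writable))  # [5, 6, 7, 8]
```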
Initial Selection
$shards = EtsyORM::getKnownWritableShards();

$user_shard = $shards[rand(0, count($shards) - 1)];

user_id   shard_id
  500        2
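The random pick happens once, when the object is created, and the result is written to the index; every later access reads it back from there. A minimal Python sketch, assuming a dict stands in for `user_index` (`assign_shard` and its early return for already-placed users are assumptions added for illustration):

```python
import random


def assign_shard(user_id, writable_shards, user_index, rng=random):
    """Place a new user on a random writable shard and record it in the index."""
    if user_id in user_index:
        return user_index[user_id]       # already placed; never re-pick
    # Uniform over the list, like $shards[rand(0, count($shards) - 1)].
    shard_id = rng.choice(writable_shards)
    user_index[user_id] = shard_id
    return shard_id


user_index = {}
first = assign_shard(500, [2, 5, 6], user_index)
again = assign_shard(500, [2, 5, 6], user_index)
print(user_index)  # e.g. {500: 2}; the shard is random
```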
Later...
index:  select shard_id from user_index where user_id = X

shard 1   shard 2   shard N
Variants
shard 1                      shard 2
user_id   group_id           user_id   group_id
  1          A                 3          A
  1          B                 3          B
  2          A                 4          A
  2          C                 5          C

SELECT user_id FROM users_groups WHERE group_id = 'A'
Broken! Group A's rows (users 1, 2, 3, 4) are spread across both shards, so no single shard can answer the query, and there is no JOIN across shards.
users_groups         groups_users
user_id   group_id   group_id   user_id
  1          A          A         1
  1          B          A         3
  2          A          A         2
  2          C          B         3
  3          A          B         1
  3          B          C         2
  3          C          C         3
index
users_groups_index        groups_users_index
user_id   shard_id        group_id   shard_id
  1          1               A          1
  2          1               B          2
  3          2               C          2
  4          3               D          3

separate indexes for different slices of data

shard 3
user_id   group_id
  4          A
  4          B
  4          C
  4          D
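Storing the relationship twice, once keyed by user and once keyed by group, turns the broken cross-shard query into a single-shard read. A minimal Python sketch, assuming dicts stand in for the two index tables and the shards (`add_membership` and `users_in_group` are hypothetical). The slides do not say how the two copies are kept consistent; this sketch simply writes both, with no atomicity across shards:

```python
from collections import defaultdict

users_groups_index = {1: 1, 2: 1, 3: 2, 4: 3}          # user_id -> shard_id
groups_users_index = {"A": 1, "B": 2, "C": 2, "D": 3}  # group_id -> shard_id

# Each shard holds both tables; a row lives on the shard of its owning key.
shards = defaultdict(lambda: {"users_groups": [], "groups_users": []})


def add_membership(user_id, group_id):
    """Write the relationship on the user's shard and on the group's shard."""
    shards[users_groups_index[user_id]]["users_groups"].append((user_id, group_id))
    shards[groups_users_index[group_id]]["groups_users"].append((group_id, user_id))


def users_in_group(group_id):
    """Single-shard read: SELECT user_id FROM groups_users WHERE group_id = ?"""
    rows = shards[groups_users_index[group_id]]["groups_users"]
    return [user for group, user in rows if group == group_id]


for user_id, group_id in [(1, "A"), (1, "B"), (2, "A"), (2, "C"),
                          (3, "A"), (3, "B"), (3, "C")]:
    add_membership(user_id, group_id)

print(users_in_group("A"))  # [1, 2, 3]
```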
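Schema Changes
shard 1   shard 2   shard N

Schemanator
shard 1             shard 2             shard N

SET SQL_LOG_BIN = 0; ALTER TABLE user ....
(SQL_LOG_BIN = 0 keeps the ALTER out of the binary log for that session, so it is not replicated to the other master and has to be applied to each side separately.)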
shard migration
Why?
•Prevent disk from filling
•High traffic objects (shops, users)
•Shard rebalancing

When?
•Balance
•Added Shards
per object migration
  <object type> <object id> <shard>
  # migrate_object User 5307827 2

percentage migration
  <object type> <percent> <old shard> <new shard>
  # migrate_pct User 25 3 6
index
step                        user_id   shard_id   migration_lock   old_shard_id
(before)                       1          1             0               0
•Lock                          1          1             1               0
•Migrate                       1          1             1               0
•Checksum                      1          1             1               0
•Unlock                        1          2             0               1
•Delete (from old shard)       1          2             0               1

shard 1          shard 2          shard N
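The index row drives the whole sequence: it is locked while rows are copied and verified, repointed and unlocked in one step, and only then are the old rows deleted. A minimal Python sketch of that order, assuming in-memory lists stand in for shard tables and that writers honor `migration_lock` (the SHA-256 checksum and the rollback on mismatch are choices made for the sketch, not details from the slides):

```python
import hashlib


def checksum(rows):
    """Order-independent checksum of a list of row tuples."""
    h = hashlib.sha256()
    for row in sorted(rows):
        h.update(repr(row).encode())
    return h.hexdigest()


def migrate_user(user_id, new_shard, index, shards):
    entry = index[user_id]
    old_shard = entry["shard_id"]

    entry["migration_lock"] = 1                                     # Lock
    rows = [r for r in shards[old_shard] if r[0] == user_id]
    shards[new_shard].extend(rows)                                  # Migrate (copy)
    copied = [r for r in shards[new_shard] if r[0] == user_id]
    if checksum(rows) != checksum(copied):                          # Checksum
        shards[new_shard][:] = [r for r in shards[new_shard] if r[0] != user_id]
        entry["migration_lock"] = 0
        raise RuntimeError(f"checksum mismatch for user {user_id}")
    entry.update(shard_id=new_shard, migration_lock=0, old_shard_id=old_shard)  # Unlock
    shards[old_shard][:] = [r for r in shards[old_shard] if r[0] != user_id]    # Delete


index = {1: {"shard_id": 1, "migration_lock": 0, "old_shard_id": 0}}
shards = {1: [(1, "A"), (1, "B"), (2, "A")], 2: []}
migrate_user(1, 2, index, shards)
print(index[1])  # {'shard_id': 2, 'migration_lock': 0, 'old_shard_id': 1}
print(shards)    # {1: [(2, 'A')], 2: [(1, 'A'), (1, 'B')]}
```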
Usage Patterns
Arbitrary Key Hash
tag1      tag2      co_occurrence_count
"red"     "cloth"   666

tag1        tag2        shard_id            hash_bucket   shard_id
"red"       "cloth"        1                     1            2
"vintage"   "doll"         3                     2            3
"antique"   "radio"        5                     3            1
"gift"      "vinyl"        2          OR         4            2
"toy"       "car"          1                     5            3
"wool"      "felt"         2
"floral"    "wreath"       5
"wood"      "table"        8
"box"       "wood"         4
"doll"      "happy"        5
"smile"     "clown"        3
"radio"     "vintage"     10
"blue"      "luggage"      8
"shoes"     "green"       12
  ...         ...          ...
1. provide some key
2. compute corresponding hash bucket
3. lookup hash bucket on index to find shard
1,000,000 'buckets', each with a row in
arbitrary_key_index which points to a shard

hash_bucket   shard_id
     1           2
     2           3
     3           1
     4           2
     5           3

hash_bucket = hash('red', 'cloth') % BUCKETS
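The three steps in a minimal Python sketch, assuming MD5 as the hash (the slides do not name one) and a list standing in for `arbitrary_key_index`, filled with the same cycling pattern as the five rows shown. `hash_bucket` and `shard_for` are hypothetical names. Python's built-in `hash()` is avoided because it is randomized per process for strings:

```python
import hashlib

BUCKETS = 1_000_000


def hash_bucket(*key_parts):
    """Step 2: map an arbitrary composite key, e.g. ('red', 'cloth'), to a bucket."""
    joined = "\x00".join(key_parts).encode("utf-8")  # separator: ('ab','c') != ('a','bc')
    digest = hashlib.md5(joined).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def shard_for(key_parts, arbitrary_key_index):
    """Step 3: look the bucket up in the index to find the owning shard."""
    return arbitrary_key_index[hash_bucket(*key_parts)]


# Hypothetical index contents: bucket b -> shard (b % 3) + 1, i.e. 1->2, 2->3, 3->1, ...
arbitrary_key_index = [(bucket % 3) + 1 for bucket in range(BUCKETS)]

bucket = hash_bucket("red", "cloth")           # Step 1: the key is ('red', 'cloth')
print(bucket, shard_for(("red", "cloth"), arbitrary_key_index))
```

Because the number of buckets never changes, the hash of a key never changes either; rebalancing means repointing a bucket's index row and moving that bucket's rows, not rehashing everything.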
Partitions
PARTITION BY RANGE (reference_timestamp)(
 PARTITION P5 VALUES LESS THAN (1317441600),
 PARTITION P6 VALUES LESS THAN (1320120000),
 PARTITION P7 VALUES LESS THAN (1322715600),
 PARTITION P8 VALUES LESS THAN (1325394000));
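Under RANGE partitioning a row goes to the first partition whose `VALUES LESS THAN` bound exceeds its value, and with no `MAXVALUE` partition, values at or above the last bound are rejected. The bounds look like monthly boundaries (1317441600 is 2011-10-01 04:00 UTC, midnight US Eastern). A minimal Python sketch of the routing rule (`partition_for` is a hypothetical name):

```python
import bisect

# Exclusive upper bounds from the PARTITION BY RANGE definition above.
PARTITIONS = [
    ("P5", 1317441600),
    ("P6", 1320120000),
    ("P7", 1322715600),
    ("P8", 1325394000),
]
_BOUNDS = [bound for _, bound in PARTITIONS]


def partition_for(reference_timestamp):
    """Return the partition a row lands in under VALUES LESS THAN semantics."""
    i = bisect.bisect_right(_BOUNDS, reference_timestamp)
    if i == len(PARTITIONS):
        raise ValueError("no partition for this value (no MAXVALUE partition)")
    return PARTITIONS[i][0]


print(partition_for(1317441599))  # P5
print(partition_for(1317441600))  # P6: the bound itself belongs to the next partition
```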
Deleting a large partition: few hours, tons of disk IO
Dropping a 2G partition with 2M rows: < 1s
# file="shop_stats_syndication_hourly#P#P1345867200.ibd"
# ln "$file" "$file.remove"

# stat "shop_stats_syndication_hourly#P#P1345867200.ibd"
 File: `shop_stats_syndication_hourly#P#P1345867200.ibd'
 Size: 65536 Blocks: 136 IO Block: 4096 regular file
Device: 6804h/26628d Inode: 41321163 Links: 2
Access: (0660/-rw-rw----) Uid: ( 104/ mysql) Gid: ( 106/ mysql)
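With a second hard link in place, removing the partition's file only drops one name (link count 2 to 1); the data blocks are not freed, so that step is cheap, and the `.remove` file can be cleaned up later. A minimal Python sketch on a throwaway file, assuming a POSIX-style filesystem; the gradual `os.truncate` cleanup is a hypothetical off-peak step that the slides do not show:

```python
import os
import tempfile


def drop_with_hard_link(directory, size=65536, step=16384):
    """Simulate the hard-link trick; returns (links before, links after, bytes kept)."""
    path = os.path.join(directory, "shop_stats_syndication_hourly#P#P1345867200.ibd")
    with open(path, "wb") as f:
        f.write(b"\0" * size)                  # stand-in for the partition's data

    keep = path + ".remove"
    os.link(path, keep)                        # ln "$file" "$file.remove"
    links_before = os.stat(path).st_nlink      # 2, like "Links: 2" above

    os.unlink(path)                            # dropping removes one name only
    links_after = os.stat(keep).st_nlink       # blocks still referenced by keep
    size_after = os.path.getsize(keep)

    # Hypothetical off-peak cleanup: shrink the leftover file in steps, then unlink it.
    remaining = size_after
    while remaining > 0:
        remaining = max(0, remaining - step)
        os.truncate(keep, remaining)
    os.unlink(keep)
    return links_before, links_after, size_after


with tempfile.TemporaryDirectory() as tmp:
    print(drop_with_hard_link(tmp))  # (2, 1, 65536)
```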
tickets             index




shard 1             shard 2           shard N
Thank you
etsy.com/jobs


MySQL under the siege
MySQL under the siegeMySQL under the siege
MySQL under the siege
 
From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)
 
Mac authentication amigopod radius
Mac authentication amigopod radiusMac authentication amigopod radius
Mac authentication amigopod radius
 
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
 
Outrageous Performance: RageDB's Experience with the Seastar Framework
Outrageous Performance: RageDB's Experience with the Seastar FrameworkOutrageous Performance: RageDB's Experience with the Seastar Framework
Outrageous Performance: RageDB's Experience with the Seastar Framework
 
Mysqlnd Async Ipc2008
Mysqlnd Async Ipc2008Mysqlnd Async Ipc2008
Mysqlnd Async Ipc2008
 
My sql查询优化实践
My sql查询优化实践My sql查询优化实践
My sql查询优化实践
 
Introduction to Active Record at MySQL Conference 2007
Introduction to Active Record at MySQL Conference 2007Introduction to Active Record at MySQL Conference 2007
Introduction to Active Record at MySQL Conference 2007
 
Undrop for InnoDB
Undrop for InnoDBUndrop for InnoDB
Undrop for InnoDB
 
Kicking ass with redis
Kicking ass with redisKicking ass with redis
Kicking ass with redis
 
ROS2勉強会@別府 第7章Pythonクライアントライブラリrclpy
ROS2勉強会@別府 第7章PythonクライアントライブラリrclpyROS2勉強会@別府 第7章Pythonクライアントライブラリrclpy
ROS2勉強会@別府 第7章Pythonクライアントライブラリrclpy
 
Extending Moose
Extending MooseExtending Moose
Extending Moose
 
Tame Accidental Complexity with Ruby and MongoMapper
Tame Accidental Complexity with Ruby and MongoMapperTame Accidental Complexity with Ruby and MongoMapper
Tame Accidental Complexity with Ruby and MongoMapper
 
Web security
Web securityWeb security
Web security
 
Fraud Detection and Neo4j
Fraud Detection and Neo4j Fraud Detection and Neo4j
Fraud Detection and Neo4j
 
Mongodb workshop
Mongodb workshopMongodb workshop
Mongodb workshop
 
Mongodb index 讀書心得
Mongodb index 讀書心得Mongodb index 讀書心得
Mongodb index 讀書心得
 
はじめてのMongoDB
はじめてのMongoDBはじめてのMongoDB
はじめてのMongoDB
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
gumiStudy#2 実践 memcached
gumiStudy#2 実践 memcachedgumiStudy#2 実践 memcached
gumiStudy#2 実践 memcached
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

The Etsy Shard Architecture: Starts With S and Ends With Hard

  • 1. The Etsy Shard Architecture: Starts With S and Ends With Hard (jgoulah@etsy.com / @johngoulah)
  • 3. 1.5B page views/mo.; 525MM in sales in 2011; 40MM unique visitors/mo.; 800K shops in 150 countries
  • 6. 25K+ queries/sec on average; 3TB InnoDB buffer pool; 15TB+ of data stored; 99.99% of queries under 1ms
  • 7. 50+ MySQL servers. Server spec: HP DL380 G7, 96GB RAM, 16 spindles / 1TB RAID 10, 24 cores
  • 9. See also: Ross Snyder, "Scaling Etsy: What Went Wrong, What Went Right" (http://bit.ly/rpcxtP); Matt Graham, "Migrating From PG to MySQL Without Downtime" (http://bit.ly/rQpqZG)
  • 14. Master-master pair: Side A (R/W) and Side B (R/W)
  • 18. shard 1, shard 2, ..., shard N: migrate data from the existing shards to the new shard N + 1
  • 20-23. Bird's-eye view: tickets (unique IDs), index (shard lookup), shard 1, shard 2, ..., shard N (store/retrieve data)
  • 25. users_groups (user_id, group_id): (1, A), (1, B), (2, A), (2, C), (3, A), (3, B), (3, C)
  • 28. users_groups split across shards by user_id: shard 1 holds (1, A), (1, B), (2, A), (2, C); shard 2 holds (3, A), (3, B), (3, C)
  • 30. Shards are NOT determined by key hashing, range partitions, or partitioning by function
  • 33-34. On the index: select shard_id from user_index where user_id = X returns 1
  • 35-36. On shard 1: select join_date from users where user_id = X returns 2012-02-05
  • 39. CREATE TABLE `tickets` ( `id` bigint(20) unsigned NOT NULL auto_increment, `stub` char(1) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`) ) ENGINE=MyISAM
  • 40-41. Ticket Generation: REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID(); afterwards SELECT * FROM tickets returns a single row: id 4589294, stub a
  • 42-43. tickets A: auto-increment-increment = 2, auto-increment-offset = 1; tickets B: auto-increment-increment = 2, auto-increment-offset = 2. The two ticket servers are NOT master-master.
  • 46-47. Sides A and B: the side for user_id 500 is chosen by 500 % (# active replicants)
  • 48. DSN configuration, one entry per side of each master-master pair:
      'etsy_index_A' => 'mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw',
      'etsy_index_B' => 'mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw',
      'etsy_shard_001_A' => 'mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
      'etsy_shard_001_B' => 'mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
      'etsy_shard_002_A' => 'mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
      'etsy_shard_002_B' => 'mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
      'etsy_shard_003_A' => 'mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
      'etsy_shard_003_B' => 'mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
  • 50-54. Two active replicants: user_id 500 % (2) == 0, so every select, insert, and update for user 500 goes to A; user_id 501 % (2) == 1, so every select, insert, and update for user 501 goes to B
  • 61. One active replicant: user_id 500 % (1) == 0 and user_id 501 % (1) == 0, so both users go to the remaining side
  • 62. ORM
  • 63. connection handling, shard lookup, replicant selection
  • 64. CRUD, cache handling, data validation, data abstraction
  • 66. Non-Writable Shards
      $config["non_writable_shards"] = array(1, 2, 3, 4);
      public static function getKnownWritableShards(){
          return array_values(
              array_diff(self::getKnownShards(), self::getNonwritableShards())
          );
      }
  • 67-68. Initial Selection
      $shards = EtsyORM::getKnownWritableShards();
      $user_shard = $shards[rand(0, count($shards) - 1)];
    Result on the index: user_id 500 → shard_id 2
  • 69. Later: select shard_id from user_index where user_id = X (on the index), then go to that shard
  • 71-74. shard 1 holds users_groups rows (1, A), (1, B), (2, A), (2, C); shard 2 holds (3, A), (3, B), (4, A), (5, C). SELECT user_id FROM users_groups WHERE group_id = 'A' is broken: the matching rows are spread across shards, and there is no JOIN across them.
  • 75. users_groups (user_id, group_id): (1, A), (1, B), (2, A), (2, C), (3, A), (3, B), (3, C); groups_users (group_id, user_id): (A, 1), (A, 3), (A, 2), (B, 3), (B, 1), (C, 2), (C, 3)
  • 76-77. Separate indexes for different slices of data: users_groups_index (user_id → shard_id): 1 → 1, 2 → 1, 3 → 2, 4 → 3; groups_users_index (group_id → shard_id): A → 1, B → 2, C → 2, D → 3. Shard 3 then holds user 4's users_groups rows: (4, A), (4, B), (4, C), (4, D)
  • 85. On shard 1, shard 2, ..., shard N: SET SQL_LOG_BIN = 0; ALTER TABLE user ....
  • 87-90. Why? Prevent disks from filling; high-traffic objects (shops, users); shard rebalancing
  • 91. When?
  • 95. Per-object migration: <object type> <object id> <shard>, e.g. # migrate_object User 5307827 2
  • 96. Percentage migration: <object type> <percent> <old shard> <new shard>, e.g. # migrate_pct User 25 3 6
  • 97-103. Migrating one user, tracked on its index row (user_id, shard_id, migration_lock, old_shard_id):
      start: (1, 1, 0, 0)
      Lock: (1, 1, 1, 0)
      Migrate, then Checksum (row unchanged)
      Unlock: (1, 2, 0, 1), so the index now points at shard 2 and remembers shard 1
      Delete (from old shard)
  • 106. tag1, tag2, co_occurrence_count: "red", "cloth", 666
  • 107. Either an index row per key (tag1, tag2, shard_id), e.g. ("red", "cloth") → 1, ("vintage", "doll") → 3, ("antique", "radio") → 5, ("gift", "vinyl") → 2, ... OR a fixed set of hash buckets (hash_bucket, shard_id): 1 → 2, 2 → 3, 3 → 1, 4 → 2, 5 → 3, ...
  • 109-110. 1. Provide some key. 2. Compute the corresponding hash bucket. 3. Look up the hash bucket on the index to find the shard.
  • 111. 1,000,000 'buckets', each with a row in arbitrary_key_index that points to a shard: (hash_bucket, shard_id) = (1, 2), (2, 3), (3, 1), (4, 2), (5, 3), ...; hash_bucket == hash('red', 'cloth') % BUCKETS
  • 116. PARTITION BY RANGE (reference_timestamp)( PARTITION P5 VALUES LESS THAN (1317441600), PARTITION P6 VALUES LESS THAN (1320120000), PARTITION P7 VALUES LESS THAN (1322715600), PARTITION P8 VALUES LESS THAN (1325394000));
  • 117-119. Deleting a large partition: a few hours, tons of disk IO. Dropping a 2G partition with 2M rows: < 1s
  • 121. Hard-linking a partition's .ibd file:
      # file="shop_stats_syndication_hourly#P#P1345867200.ibd"
      # ln "$file" "$file.remove"
      # stat "shop_stats_syndication_hourly#P#P1345867200.ibd"
        File: `shop_stats_syndication_hourly#P#P1345867200.ibd'
        Size: 65536  Blocks: 136  IO Block: 4096  regular file
        Device: 6804h/26628d  Inode: 41321163  Links: 2
        Access: (0660/-rw-rw----)  Uid: ( 104/ mysql)  Gid: ( 106/ mysql)
  • 122. tickets index shard 1 shard 2 shard N
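
Slides 32-36 describe a two-step read: ask the index for the user's shard_id, then query only that shard. Below is a minimal sketch of that flow. It assumes in-memory SQLite databases as stand-ins for the MySQL index and shard servers. The place_user and get_join_date helpers and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite databases stand in for the MySQL index and shard servers.
index_db = sqlite3.connect(":memory:")
index_db.execute(
    "CREATE TABLE user_index (user_id INTEGER PRIMARY KEY, shard_id INTEGER NOT NULL)"
)
shards = {shard_id: sqlite3.connect(":memory:") for shard_id in (1, 2)}
for conn in shards.values():
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, join_date TEXT)")


def place_user(user_id, shard_id, join_date):
    """Hypothetical helper: record the user's shard on the index, store the row there."""
    index_db.execute("INSERT INTO user_index VALUES (?, ?)", (user_id, shard_id))
    shards[shard_id].execute("INSERT INTO users VALUES (?, ?)", (user_id, join_date))


def get_join_date(user_id):
    """Step 1: ask the index for the shard. Step 2: query only that shard."""
    row = index_db.execute(
        "SELECT shard_id FROM user_index WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"user {user_id} is not on the index")
    shard_id = row[0]
    join_date = shards[shard_id].execute(
        "SELECT join_date FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    return shard_id, join_date


# Hypothetical sample data.
place_user(500, 1, "2012-02-05")
place_user(501, 2, "2011-07-19")
print(get_join_date(500))  # (1, '2012-02-05')
```

Because placement is recorded rather than computed, a user can later be moved by changing one index row. No hash function has to change.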
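
The ticket table on slides 39-41 hands out IDs by REPLACE-ing a single row and reading back the new auto-increment value. The sketch below is a rough, runnable approximation that uses Python's sqlite3 instead of MySQL/MyISAM:

- REPLACE INTO behaves the same way in SQLite.
- AUTOINCREMENT stands in for auto_increment.
- Cursor.lastrowid plays the role of LAST_INSERT_ID().

The next_ticket helper is hypothetical.

```python
import sqlite3

# SQLite stand-in for the MySQL tickets table. AUTOINCREMENT keeps ids
# strictly increasing even though REPLACE deletes the previous row.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tickets ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " stub CHAR(1) NOT NULL DEFAULT '' UNIQUE)"
)


def next_ticket():
    """REPLACE the single 'a' row and return the freshly assigned id."""
    cur = db.execute("REPLACE INTO tickets (stub) VALUES ('a')")
    return cur.lastrowid


ids = [next_ticket() for _ in range(3)]
print(ids)  # [1, 2, 3]
print(db.execute("SELECT COUNT(*) FROM tickets").fetchone()[0])  # 1
```

The unique stub forces REPLACE to delete the previous row, so the table never grows beyond one row while the counter keeps climbing.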
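
Slides 42-43 run two independent ticket servers, both with auto-increment-increment = 2 and with offsets 1 and 2. Below is a toy model of the resulting ID sequences. It assumes each server simply steps from its offset by the increment, which simplifies MySQL's real auto-increment bookkeeping. TicketServer is a hypothetical name.

```python
class TicketServer:
    """Toy model of one ticket server configured with
    auto_increment_increment / auto_increment_offset.

    It hands out offset, offset + increment, offset + 2 * increment, ...
    MySQL ignores an offset larger than the increment, so that is rejected.
    """

    def __init__(self, increment, offset):
        if not 1 <= offset <= increment:
            raise ValueError("offset must be between 1 and increment")
        self.increment = increment
        self.next_id = offset

    def ticket(self):
        value = self.next_id
        self.next_id += self.increment
        return value


a = TicketServer(increment=2, offset=1)
b = TicketServer(increment=2, offset=2)
print([a.ticket() for _ in range(3)])  # [1, 3, 5]
print([b.ticket() for _ in range(3)])  # [2, 4, 6]
```

Odd IDs come from A and even IDs from B. The servers never need to coordinate, which is presumably why they are not set up as master-master.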
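
Slides 46-61 pick a side of each master-master pair with user_id % (# active replicants). Below is a minimal sketch of that rule. It assumes the caller already knows the list of active sides; how the ORM tracks that list isn't shown. pick_side is a hypothetical helper.

```python
def pick_side(user_id, active_sides):
    """Choose which side of a master-master pair serves this user.

    Rule from the slides: user_id % (# active replicants). While the set of
    active sides is stable, all selects, inserts and updates for a given
    user land on the same side.
    """
    if not active_sides:
        raise RuntimeError("no active replicants")
    return active_sides[user_id % len(active_sides)]


print(pick_side(500, ["A", "B"]))  # A
print(pick_side(501, ["A", "B"]))  # B
# Side B out of rotation: % 1 sends every user to the remaining side.
print(pick_side(501, ["A"]))       # A
```

The mapping depends only on the current count of active sides. When B comes back, users with odd ids move back to it.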
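
Slides 66-68 choose a new user's shard at random among the writable shards and then record it on the index. Below is a Python sketch of the same logic. It assumes a hypothetical shard list of 1-6. The function names mirror the PHP on the slides but are not Etsy's code.

```python
import random


def known_writable_shards(known, non_writable):
    """Analogue of getKnownWritableShards(): known shards minus non-writable ones,
    keeping the original order (as array_diff + array_values does)."""
    blocked = set(non_writable)
    return [s for s in known if s not in blocked]


def initial_shard(known, non_writable, rng=random):
    """Pick a random writable shard for a brand-new object.

    The choice is made once; afterwards the shard is stored on the index
    and looked up there, never recomputed.
    """
    shards = known_writable_shards(known, non_writable)
    if not shards:
        raise RuntimeError("no writable shards")
    return rng.choice(shards)


known = [1, 2, 3, 4, 5, 6]   # hypothetical shard list
non_writable = [1, 2, 3, 4]  # as in the slide's config
print(known_writable_shards(known, non_writable))  # [5, 6]
user_index = {500: initial_shard(known, non_writable)}
```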
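
Slides 71-77 fix the cross-shard SELECT ... WHERE group_id = 'A' by storing each membership twice. users_groups lives on the user's shard and groups_users on the group's shard, and each table has its own index. Below is a toy in-memory model of that dual write, using the sample indexes and rows from slides 75-76. The ShardedMemberships class is hypothetical.

```python
from collections import defaultdict


class ShardedMemberships:
    """Toy model of the two denormalized tables and their separate indexes."""

    def __init__(self, users_groups_index, groups_users_index):
        self.users_groups_index = users_groups_index  # user_id -> shard_id
        self.groups_users_index = groups_users_index  # group_id -> shard_id
        # shard_id -> table name -> set of rows
        self.shards = defaultdict(lambda: defaultdict(set))

    def add_membership(self, user_id, group_id):
        # Two writes, possibly on different shards; a real system must also
        # deal with one of them failing.
        u_shard = self.users_groups_index[user_id]
        g_shard = self.groups_users_index[group_id]
        self.shards[u_shard]["users_groups"].add((user_id, group_id))
        self.shards[g_shard]["groups_users"].add((group_id, user_id))

    def groups_of(self, user_id):
        shard = self.shards[self.users_groups_index[user_id]]
        return sorted(g for u, g in shard["users_groups"] if u == user_id)

    def users_in(self, group_id):
        # Answered from a single shard: no cross-shard scan or JOIN.
        shard = self.shards[self.groups_users_index[group_id]]
        return sorted(u for g, u in shard["groups_users"] if g == group_id)


m = ShardedMemberships(
    users_groups_index={1: 1, 2: 1, 3: 2, 4: 3},
    groups_users_index={"A": 1, "B": 2, "C": 2, "D": 3},
)
for user, group in [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "A"), (3, "B"), (3, "C")]:
    m.add_membership(user, group)
print(m.users_in("A"))  # [1, 2, 3]
```

The trade-off is extra storage and two writes per change in exchange for single-shard reads in both directions.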
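
Slide 85 runs SET SQL_LOG_BIN = 0 before ALTER TABLE on the shards. That setting turns off binary logging for the session, so the change isn't replicated to the other side of a master-master pair and has to be applied to each host separately. Below is a sketch that builds such a per-host plan, using the host names from the DSN config on slide 48. The ALTER statement is a hypothetical placeholder, since the slide's own is elided.

```python
def schema_change_plan(hosts, alter_sql):
    """Return (host, statements) pairs applying alter_sql on every host.

    SQL_LOG_BIN = 0 keeps the ALTER out of the binary log, so it will not
    replicate; every host therefore gets its own copy of the statements.
    """
    return [(host, ["SET SQL_LOG_BIN = 0;", alter_sql]) for host in hosts]


# Host names from the slides' DSN config; the column is a hypothetical placeholder.
hosts = [f"dbshard{n:02d}.ny4.etsy.com" for n in range(1, 7)]
plan = schema_change_plan(hosts, "ALTER TABLE user ADD COLUMN example_col INT;")
for host, statements in plan:
    print(host, " ".join(statements))
```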
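
Slides 97-103 walk an index row through Lock, Migrate, Checksum, Unlock, and Delete. Below is a minimal in-memory sketch of that sequence, with these assumptions:

- Rows are (user_id, value) tuples.
- The checksum is SHA-256 over the sorted rows.
- The behavior of readers and writers while migration_lock = 1 is not shown on the slides and is left out here.

migrate_user and checksum are hypothetical helpers.

```python
import hashlib
import json


def checksum(rows):
    """Order-independent checksum: SHA-256 over the JSON of the sorted rows."""
    return hashlib.sha256(json.dumps(sorted(rows)).encode()).hexdigest()


def migrate_user(index, shards, user_id, new_shard):
    """Lock, Migrate, Checksum, Unlock, Delete for one user.

    index:  user_id -> {"shard_id", "migration_lock", "old_shard_id"}
    shards: shard_id -> list of (user_id, value) rows
    """
    entry = index[user_id]
    old_shard = entry["shard_id"]
    entry["migration_lock"] = 1  # Lock
    rows = [r for r in shards[old_shard] if r[0] == user_id]
    shards[new_shard].extend(rows)  # Migrate: copy rows to the new shard
    copied = [r for r in shards[new_shard] if r[0] == user_id]
    if checksum(rows) != checksum(copied):  # Checksum both copies
        entry["migration_lock"] = 0
        raise RuntimeError("checksum mismatch; index still points at old shard")
    # Unlock: point the index at the new shard and remember the old one.
    entry.update(shard_id=new_shard, migration_lock=0, old_shard_id=old_shard)
    # Delete (from old shard).
    shards[old_shard][:] = [r for r in shards[old_shard] if r[0] != user_id]


index = {1: {"shard_id": 1, "migration_lock": 0, "old_shard_id": 0}}
shards = {1: [(1, "A"), (1, "B"), (2, "A")], 2: []}
migrate_user(index, shards, 1, 2)
print(index[1])  # {'shard_id': 2, 'migration_lock': 0, 'old_shard_id': 1}
```

Deleting from the old shard only after the index has switched means a failure mid-migration leaves the data readable where the index says it is.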
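
Slides 106-114 route arbitrary keys, such as tag pairs, through a fixed set of 1,000,000 hash buckets, with one arbitrary_key_index row per bucket naming a shard. Below is a sketch of the three steps. It assumes MD5 as the hash, since the slides don't say which hash is used, and a round-robin initial bucket-to-shard assignment.

```python
import hashlib

BUCKETS = 1_000_000  # bucket count from the slides


def hash_bucket(*key_parts):
    """Map a composite key to a bucket in [0, BUCKETS).

    MD5 is an assumption; any hash that is stable across processes works.
    Python's built-in hash() is not, because str hashing is randomized.
    """
    raw = "\x00".join(key_parts).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest(), "big") % BUCKETS


# arbitrary_key_index: one entry per bucket pointing at a shard.
# Seeded round-robin purely for illustration.
SHARDS = [1, 2, 3]
arbitrary_key_index = [SHARDS[b % len(SHARDS)] for b in range(BUCKETS)]


def shard_for(*key_parts):
    # 1. provide a key, 2. compute its bucket, 3. look the bucket up on the index
    return arbitrary_key_index[hash_bucket(*key_parts)]


bucket = hash_bucket("red", "cloth")
print(bucket, shard_for("red", "cloth"))
```

The number of buckets never changes, so a key's bucket is stable. Moving data means copying one bucket's keys and updating that bucket's single index row, without rehashing anything else.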
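
Slide 116 range-partitions a table on reference_timestamp. Its bounds (1317441600, 1320120000, 1322715600, 1325394000) fall on midnight US Eastern time on the first of each month from October 2011 to January 2012. Below is a sketch that renders such a clause and rotates the window by dropping the oldest partition. It assumes a hypothetical table name and a next bound of 1328072400 (midnight US Eastern on 2012-02-01).

```python
def range_partition_clause(column, partitions):
    """Render PARTITION BY RANGE from (name, upper_bound) pairs."""
    bounds = [b for _, b in partitions]
    if bounds != sorted(bounds) or len(set(bounds)) != len(bounds):
        raise ValueError("RANGE bounds must be strictly increasing")
    body = ",\n  ".join(
        f"PARTITION {name} VALUES LESS THAN ({bound})" for name, bound in partitions
    )
    return f"PARTITION BY RANGE ({column}) (\n  {body}\n);"


def rotate_window(table, partitions, new_name, new_bound):
    """Drop the oldest partition and add a new one at the high end.

    For RANGE partitioning, MySQL only lets ADD PARTITION extend past the
    current highest bound.
    """
    if new_bound <= partitions[-1][1]:
        raise ValueError("new bound must exceed the current highest bound")
    oldest, _ = partitions[0]
    ddl = [
        f"ALTER TABLE {table} DROP PARTITION {oldest};",
        f"ALTER TABLE {table} ADD PARTITION "
        f"(PARTITION {new_name} VALUES LESS THAN ({new_bound}));",
    ]
    return partitions[1:] + [(new_name, new_bound)], ddl


parts = [("P5", 1317441600), ("P6", 1320120000), ("P7", 1322715600), ("P8", 1325394000)]
print(range_partition_clause("reference_timestamp", parts))
parts, ddl = rotate_window("example_stats", parts, "P9", 1328072400)  # hypothetical table
print("\n".join(ddl))
```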
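
Slide 121 hard-links a partition's .ibd file before it is dropped. The file then shows Links: 2, and removing the original name only deletes a directory entry. Below is a Python sketch of the idea on an ordinary temporary file. The final step, shrinking the spare link in chunks with os.truncate before unlinking it, is an assumption about how the space is reclaimed and is not shown on the slides.

```python
import os
import tempfile


def drop_with_hardlink(path, chunk=1 << 20):
    """Hard-link trick: returns the link count seen after linking."""
    spare = path + ".remove"
    os.link(path, spare)              # like: ln "$file" "$file.remove"
    links = os.stat(spare).st_nlink   # 2, as in the stat output on the slide
    os.remove(path)                   # stands in for DROP PARTITION's unlink
    # Assumption: reclaim space gradually rather than in one large unlink.
    size = os.path.getsize(spare)
    while size > 0:
        size = max(0, size - chunk)
        os.truncate(spare, size)
    os.remove(spare)
    return links


with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "example#P#P1.ibd")
    with open(f, "wb") as fh:
        fh.write(b"\0" * (3 << 20))
    print(drop_with_hardlink(f))  # 2
    print(os.listdir(d))          # []
```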