Cloud Storage Achieving Both Read and Write Performance (SACSIS2011-A6-1)

Presentation slides for SACSIS2011 (http://sacsis.hpcc.jp/2011/).
The results have improved compared with the earlier version.


Transcript of "Cloud Storage Achieving Both Read and Write Performance (SACSIS2011-A6-1)"

  1. MyCassandra
  2. NoSQL: Key-Value Store (KVS), Document-Oriented DB, GraphDB. Examples: memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, Velocity, … (about 100 systems). Trade-off: … ↔ join, transaction.
  3. Design space: key/value vs. multi-dimensional map vs. document vs. graph; … vs. …; … vs. … (fsync); … vs. … (snapshot); … vs. …; strong vs. weak; row vs. column; master/slave vs. decentralized.
  4. Design space (build step of slide 3): key/value vs. multi-dimensional map vs. document vs. graph; … vs. …; … vs. …; … vs. … (snapshot); … vs. …; strong vs. weak; row vs. column; master/slave vs. decentralized.
  5. Write-optimized vs. read-optimized storage structures. Log-Structured Merge Tree [P. O'Neil '96] (Bigtable, Cassandra, HBase): a write is appended to disk (buffering); a read costs n random I/Os + merge. B-Trees [R. Bayer '70] (MySQL, Sherpa): a write is a random disk I/O; a read costs 1 random I/O.
  6. Write-Heavy workload: read-optimized vs. write-optimized stores [chart]. Source: Yahoo! Cloud Serving Benchmark, SoCC '10.
  7. Read-Heavy workload: write-optimized vs. read-optimized stores [chart]. Source: Yahoo! Cloud Serving Benchmark, SoCC '10.
  8. Approach: 1. MyCassandra, read-optimized or write-optimized; 2. MyCassandra Cluster, read- and write-optimized.
  9. Apache Cassandra: [diagram] data centers dc1, dc2 and dc3; rack/dc; region.
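  10. Apache Cassandra: Consistent Hashing over a ring of node IDs (A~Z), with N := 3 replicas. For a key with hash(key) = Q: the node that receives the request acts as the proxy; the primary node is Q; secondary nodes (e.g., V) follow it on the ring. The key's values are stored on the primary and the secondaries, N nodes in total.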
  11. Google Bigtable-style write path (LSM-Tree [P. O'Neil '96]): a write costs O(1) with sequential write I/O, and the store is always writable (write-lock …). Each write is appended synchronously to the Commit Log on disk (<k1, v1>, <k1, v2>, sequential write) and merged in memory into the Memtable (<k1, obj (v1+v2)>); the Memtable is flushed asynchronously to immutable SSTables on disk (<k1,obj1> in SSTable 1, <k1,obj2> in SSTable 2, <k1,obj3> in SSTable 3).
  12. Google Bigtable-style read path: for a key, read its value in the Memtable (memory) and its value in each SSTable (disk I/O), merge them (<k1,obj> from the Memtable with <k1,obj1>…<k1,obj3> from SSTables 1-3 gives <k1,obj+obj1~3>), and return the result to the client.
  13. Cassandra latency (average / 99.9th percentile; histogram of number of queries vs. latency in ms): write avg. 0.69 ms, read avg. 6.16 ms, so write latency is about 1/9 of read latency; 99.9th percentile: write 2.0 ms, read 86.9 ms.
  14. 1. MyCassandra: read-optimized or write-optimized (11.4.14)
  15. MyCassandra: based on Cassandra. It keeps Cassandra's distribution layer (Consistent Hashing, Gossip Protocol) and makes the storage engine replaceable: Bigtable, MySQL (InnoDB, MyISAM, Memory, …), Redis, …
  16. MyCassandra (build step of slide 15): Consistent Hashing, Gossip Protocol; storage engines Bigtable, MySQL (InnoDB, MyISAM, Memory, …), Redis, …
  17. MyCassandra
  18. Cassandra: …; … via JDBC API / stored procedure; … key-value store; …
  19. 2. MyCassandra Cluster: read- and write-optimized
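  20. Node types: W (write-optimized), R (read-optimized), RW (read/write). Write queries are propagated synchronously or asynchronously. Quorum Protocol: (write quorum) + (read quorum) > (number of replicas), so every read quorum overlaps every write quorum. [Diagram: write and read paths over W, RW and R nodes.]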
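  21. MyCassandra Cluster: nodes of type W / R / RW; membership changes (join/dead) are propagated by the gossip protocol. Replica placement: 1. the primary node is determined by the key; 2. … × N-1. [Diagram with N=3: a proxy and a ring of W, RW and R nodes; one primary and two secondaries.]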
  22. Write path (replicas = 3, quorum = 2, W:RW:R = 1:1:1): 1) the client sends the write to a proxy; 2) … W and RW return ACKs; 3a) … W, RW, R; 3b) R returns an ACK. Write latency: max(W, RW).
  23. Read path (replicas = 3, quorum = 2, W:RW:R = 1:1:1): 1) the client sends the read to a proxy; 2) the proxy reads from R and RW; 3a) …; 3b) … or W; 4) … Read latency: max(R, RW). (Inconsistencies are handled as in Cassandra's read repair.)
  24. Experimental setup. MyCassandra Cluster: 6×3 = 18 nodes / 6 hosts (W:R:RW = 6:6:6); Cassandra: 6 nodes / 6 hosts. Replicas = 3; write quorum = read quorum = 2. Storage engines: Bigtable (W), MySQL / InnoDB (R), Redis (RW). Benchmark: YCSB (Yahoo! Cloud Serving Benchmark) [SoCC '10]. Steps: 1. MyCassandra/Cassandra ×6, YCSB client ×1; 2. 1 KB values (100 bytes × 10 columns) + key, 1,000 …; 3. …; 4. YCSB …; 5. YCSB statistics.
  25. YCSB workloads (4 workloads; record selection: Zipfian):
      Workload      Application example   Operation ratio
      Write-Only    Log                   Read 0%, Write 100%
      Write-Heavy   Session store         Read 50%, Write 50%
      Read-Heavy    Photo tagging         Read 95%, Write 5%
      Read-Only     Cache                 Read 100%, Write 0%
  26. [Figure: average write latency (ms) and average read latency (ms) of Cassandra and MyCassandra Cluster for Write-Only (write:100%, read:0%), Write-Heavy (write:50%, read:50%), Read-Heavy (write:5%, read:95%) and Read-Only (write:0%, read:100%); lower is better. Annotated values, write latency: Cassandra 0.36 ms; MyCassandra Cluster 9.3%, 26.2%, 46.2% (MySQL + Redis). Read latency: Cassandra 8.59 ms; MyCassandra Cluster 35.7%, 82.6%, 84.9%.]
  27. [Figure: maximum throughput (query/sec) with 40 clients, Cassandra vs. MyCassandra Cluster, for write:read = 100:0 (Write-Only), 50:50 (Write-Heavy), 5:95 (Read-Heavy) and 0:100 (Read-Only); higher is better. Annotated ratios: 0.90, 0.93, 1.54, 6.49; 6.49 is highlighted.]
  28. Discussion 1: Cassandra vs. MyCassandra Cluster. Cassandra: writes and reads go to N replicas (quorums R, W). MyCassandra Cluster: writes complete on W and RW; reads complete on RW and R.
  29. Discussion 2 (Q&A): Q. … A. LRU-like cache; swap; read repair. Q. … A. 1) …; 2) Redis fsync (…).
  30. Read-Heavy workload: read latency reduced by 84.9% and throughput 6.49 times that of Cassandra, + …
  31. Related work. Index algorithms: FD-Tree: Tree Indexing on Flash Disks (VLDB '10), B+tree + LSM-tree for SSDs; Fractal-Tree / TokuDB (MySQL …). MySQL: RDBMS …. Anvil (SOSP '09): …. Cloudy (VLDB '10): …. Dynamo (SOSP '07): … vs. …. MyCassandra (…): … vs. … + ….
  32. 1. … 2. … (MySQL + memcached) : MyCassandra Cluster …
      Example Web table:
      movie-id    name     thumb-name    tag                    count
      704122313   movieA   EY37lHk5bgU   sport, soccer, FIFA    169,374
      704122314   movieB   Zk3BSYMWjzQ   music, jazz, …         472,803
  33. … (…) 5 6. twitter: @MyCassandraJP
  34. Comparison: Cassandra vs. MyCassandra / MyCassandra Cluster (highlighted: throughput, latency)
                          Cassandra   1. MyCassandra    2. MyCassandra Cluster
      throughput          write       write or read     write and read
      latency             low         lower in case …   lower
      persistence         yes         yes or no         yes
      Common to all three: data model multi-dimensional map (Column Family); consistency weak (eventual, quorum); replication sync / async; data partition by row; node organization decentralized.
  35. Node configurations compared on fault tolerance (FT) and space: (1) 1 storage / 1 node / 1 host; (2) 1 node / host, k storages / node; (3) virtual nodes (node IDs … [Amazon Dynamo, SOSP '07]): k nodes / host, 1 storage / node. (Table cells marked ☓ under FT and space.)
  36. HDD vs. SSD. [Figure: throughput (qps) of Cassandra and MyCassandra Cluster on HDD and on SSD; higher is better.] IOZone benchmark (HDD: Western Digital; SSD: Crucial):
                     HDD            SSD
      seq. write     86,277 qps     96,401 qps
      seq. read      108,914 qps    216,099 qps
      random write   2,485 qps      29,045 qps
      random read    926 qps        21,751 qps
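Slide 10 describes how Cassandra places a key on a consistent-hashing ring: the first node at or after hash(key) is the primary, and the following nodes hold the remaining N-1 replicas. The sketch below illustrates that placement rule in Python. It is a simplification under stated assumptions: one token per node, MD5 as the hash, and hypothetical names (`ConsistentHashRing`, `ring_position`); Cassandra's real partitioners and replication strategies differ in detail.

```python
import bisect
import hashlib


def ring_position(key: str, ring_size: int) -> int:
    """Hash a key onto a ring of `ring_size` positions (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % ring_size


class ConsistentHashRing:
    """One token per node; each key is stored on N consecutive nodes clockwise."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by their token (position on the ring).
        self._entries = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def replicas(self, position: int, n: int) -> list[str]:
        """Primary node first, then the n-1 secondary nodes that follow it."""
        n = min(n, len(self._entries))
        # Primary: first node whose token is >= the key's position, wrapping past the end.
        start = bisect.bisect_left(self._tokens, position) % len(self._entries)
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(n)]


# Hypothetical ring mirroring the slide: positions 0..25 stand for IDs A..Z.
ring = ConsistentHashRing({"A": 0, "F": 5, "Q": 16, "V": 21, "Z": 25})
print(ring.replicas(16, 3))  # a key with hash(key) = Q
print(ring.replicas(ring_position("user42", 26), 3))
```

For a key hashing to position 16 (Q), the first call returns `['Q', 'V', 'Z']`: primary Q, then secondaries V and Z. Because only the neighbours of a joining or leaving node change ownership, membership changes move a small fraction of the keys.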
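Slide 11 outlines the Bigtable-style (LSM-Tree) write path: a synchronous sequential append to the commit log, a merge into the in-memory Memtable, and a later flush of the Memtable into an immutable, sorted SSTable. The following is a minimal Python sketch of those three steps, assuming a hypothetical tab-separated file format, a flush threshold counted in keys, and an inline (synchronous) flush; Cassandra's real on-disk formats and flush triggers are different, and it flushes in the background.

```python
import os
import tempfile


class LSMWritePath:
    """Bigtable-style write path: commit log append + memtable merge + SSTable flush."""

    def __init__(self, directory: str, memtable_limit: int = 2):
        self.directory = directory
        self.memtable_limit = memtable_limit  # flush after this many keys (illustrative)
        self.memtable: dict[str, dict[str, str]] = {}
        self.sstables: list[str] = []  # flushed SSTable paths, oldest first
        self._log = open(os.path.join(directory, "commit.log"), "a", encoding="utf-8")

    def write(self, key: str, column: str, value: str) -> None:
        # 1) Sequential append to the commit log, forced to disk with fsync.
        self._log.write(f"{key}\t{column}\t{value}\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2) Merge into the in-memory memtable: <k1, v1>, <k1, v2> -> <k1, obj(v1+v2)>.
        self.memtable.setdefault(key, {})[column] = value
        # 3) Flush when full. Real systems flush in the background; this sketch does it inline.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable as a new immutable SSTable, sorted by key and column.
        path = os.path.join(self.directory, f"sstable-{len(self.sstables) + 1}.tsv")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):
                for column, value in sorted(self.memtable[key].items()):
                    f.write(f"{key}\t{column}\t{value}\n")
        self.sstables.append(path)
        self.memtable = {}

    def close(self) -> None:
        self._log.close()


with tempfile.TemporaryDirectory() as tmp:
    store = LSMWritePath(tmp, memtable_limit=2)
    store.write("k1", "c1", "v1")
    store.write("k1", "c2", "v2")  # same key: merged into one memtable row
    store.write("k2", "c1", "v3")  # second key reaches the limit: flush to SSTable 1
    store.close()
    with open(os.path.join(tmp, "commit.log"), encoding="utf-8") as f:
        log_lines = f.read().splitlines()
    with open(store.sstables[0], encoding="utf-8") as f:
        sstable_lines = f.read().splitlines()
```

No write touches existing on-disk data in place, which is why the write cost stays O(1) and sequential regardless of how much data is already stored.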
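Slides 5 and 12 explain why LSM reads are expensive: a key's columns may be spread over the Memtable and several SSTables, so a read costs up to n random I/Os plus a merge, whereas a B-tree read costs one random I/O. This hypothetical Python sketch merges a row from in-memory stand-ins for SSTables (oldest first) and the Memtable, counting one "disk read" per SSTable. It assumes newer tables simply override older ones and ignores Bloom filters and caches, which real systems such as Cassandra use to skip most SSTables and which resolve versions with per-column timestamps.

```python
from typing import Optional

Row = dict[str, str]  # column -> value


def lsm_read(key: str, memtable: dict[str, Row], sstables: list[dict[str, Row]]) -> tuple[Optional[Row], int]:
    """Merge a key's columns from SSTables (oldest first) and the memtable.

    Returns the merged row (None if the key is absent) and the number of SSTables
    consulted, each of which stands for one random disk I/O in this sketch.
    """
    merged: Row = {}
    disk_reads = 0
    for table in sstables:  # oldest -> newest, so newer values overwrite older ones
        disk_reads += 1  # no Bloom filters or caches here: every SSTable is checked
        merged.update(table.get(key, {}))
    merged.update(memtable.get(key, {}))  # the memtable holds the newest data
    return (merged or None), disk_reads


memtable = {"k1": {"c1": "obj"}}
sstables = [
    {"k1": {"c1": "obj1", "c2": "old"}},  # SSTable 1 (oldest)
    {"k1": {"c2": "obj2"}},               # SSTable 2
    {"k1": {"c3": "obj3"}},               # SSTable 3 (newest)
]
row, disk_reads = lsm_read("k1", memtable, sstables)
print(row, disk_reads)  # {'c1': 'obj', 'c2': 'obj2', 'c3': 'obj3'} 3
```

The read cost grows with the number of SSTables, which is the gap that a read-optimized engine such as MySQL/InnoDB avoids.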
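Slide 13 reports latency both as an average and as a 99.9th percentile (for the averages on that slide, 6.16 / 0.69 ≈ 8.9, matching the "about 1/9" remark). The sketch below shows how the two statistics are computed from raw latency samples and why the percentile exposes a long tail that the average hides. It uses the nearest-rank percentile definition; YCSB itself computes percentiles from latency histograms, so its figures may differ slightly. The sample data are hypothetical.

```python
import math
from fractions import Fraction


def average(samples: list[float]) -> float:
    return sum(samples) / len(samples)


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    # Exact arithmetic avoids float rounding in ranks such as 99.9% of 1,000.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies (ms): 998 fast queries and 2 slow outliers.
latencies = [1.0] * 998 + [80.0, 90.0]
print(average(latencies))           # 1.168
print(percentile(latencies, 99.9))  # 80.0
```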
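Slide 20 states the quorum condition: write quorum + read quorum > number of replicas. The reason it guarantees consistent reads is the pigeonhole principle: any write set and any read set must then share at least one replica. The sketch checks the inequality against a brute-force enumeration of all write and read sets. The parameter names are hypothetical and deliberately avoid W and R, which the slides use for node types; the example values assume slide 24's setting reads as 3 replicas with write and read quorums of 2.

```python
from itertools import combinations


def quorum_rule_holds(replicas: int, write_quorum: int, read_quorum: int) -> bool:
    """The slide's condition: write quorum + read quorum > number of replicas."""
    return write_quorum + read_quorum > replicas


def quorums_always_intersect(replicas: int, write_quorum: int, read_quorum: int) -> bool:
    """Brute force: does every possible write set share a replica with every read set?"""
    nodes = range(replicas)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, write_quorum)
        for read_set in combinations(nodes, read_quorum)
    )


# 3 replicas, write quorum = read quorum = 2.
print(quorum_rule_holds(3, 2, 2), quorums_always_intersect(3, 2, 2))  # True True
print(quorum_rule_holds(3, 1, 2), quorums_always_intersect(3, 1, 2))  # False False
```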
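Slides 22 and 23 give the cluster's latencies as max(W, RW) for writes and max(R, RW) for reads. One way to see where such formulas come from: if the proxy sends a request to all replicas in parallel and answers after a quorum of 2 acknowledgements, the response time is the 2nd-fastest replica's latency. The hypothetical sketch below models exactly that. The per-replica latencies are assumed values chosen so that R is slowest on writes and W slowest on reads, not measurements from the slides, and MyCassandra Cluster's actual routing may differ.

```python
def quorum_wait(replica_latencies: dict[str, float], quorum: int) -> tuple[float, list[str]]:
    """Requests go to all replicas in parallel; the proxy answers after `quorum` ACKs.

    Returns the response time (the quorum-th fastest replica) and the acknowledging replicas.
    """
    if not 1 <= quorum <= len(replica_latencies):
        raise ValueError("quorum must be between 1 and the number of replicas")
    ranked = sorted(replica_latencies.items(), key=lambda item: item[1])
    acked = ranked[:quorum]
    return acked[-1][1], sorted(name for name, _ in acked)


# Hypothetical per-replica latencies in ms (assumed, not taken from the slides).
write_latency = {"W": 0.4, "RW": 0.3, "R": 5.0}  # assumed: R is the slowest writer
read_latency = {"W": 6.0, "RW": 0.3, "R": 1.0}   # assumed: W is the slowest reader
print(quorum_wait(write_latency, 2))  # (0.4, ['RW', 'W'])  i.e. max(W, RW)
print(quorum_wait(read_latency, 2))   # (1.0, ['R', 'RW'])  i.e. max(R, RW)
```

The slowest replica for each operation type drops out of the critical path, which is how mixing write-optimized and read-optimized engines can serve both kinds of query quickly.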
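Slide 23 notes that stale replicas are handled as in Cassandra's read repair: the coordinator compares the versions it receives, returns the newest, and writes it back to replicas that are behind. This is a simplified, hypothetical Python sketch of that idea. It assumes in-memory dicts as replicas, last-write-wins timestamps, and repairs only the contacted replicas; it is not MyCassandra's or Cassandra's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Versioned:
    value: str
    timestamp: int  # write timestamp; the newest timestamp wins (last-write-wins)


def read_with_repair(
    replicas: dict[str, dict[str, Versioned]], key: str, contacted: list[str]
) -> tuple[Optional[str], list[str]]:
    """Read `key` from the contacted replicas, return the newest value, and
    overwrite stale or missing copies on those replicas (a simplified read repair)."""
    responses = {name: replicas[name].get(key) for name in contacted}
    present = [v for v in responses.values() if v is not None]
    if not present:
        return None, []
    newest = max(present, key=lambda v: v.timestamp)
    repaired = []
    for name, version in responses.items():
        if version is None or version.timestamp < newest.timestamp:
            replicas[name][key] = newest  # push the newest version back to the stale replica
            repaired.append(name)
    return newest.value, repaired


# Hypothetical state: R has not yet applied the latest write.
replicas = {
    "W": {"k1": Versioned("new", 2)},
    "RW": {"k1": Versioned("new", 2)},
    "R": {"k1": Versioned("old", 1)},
}
print(read_with_repair(replicas, "k1", ["R", "RW"]))  # ('new', ['R'])
```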
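Slide 25 defines the four YCSB workloads by their read:write ratio, with records chosen from a Zipfian distribution, so a few keys receive most of the traffic. The sketch below generates such an operation stream in Python. It is an illustration, not YCSB's implementation: YCSB's CoreWorkload is configured through properties such as `readproportion`, `updateproportion` and `requestdistribution`, and its ZipfianGenerator (default constant 0.99) uses a different sampling algorithm. The function names and the `user{k}` key format here are hypothetical.

```python
import random
from collections import Counter

# Read proportion of each workload on the slide; the remaining operations are writes.
READ_PROPORTION = {
    "Write-Only": 0.00,
    "Write-Heavy": 0.50,
    "Read-Heavy": 0.95,
    "Read-Only": 1.00,
}


def zipf_weights(num_records: int, s: float = 0.99) -> list[float]:
    """Zipfian popularity: the record of rank i gets weight 1 / i**s."""
    return [1.0 / rank**s for rank in range(1, num_records + 1)]


def generate_ops(workload: str, num_ops: int, num_records: int, seed: int = 0) -> list[tuple[str, str]]:
    rng = random.Random(seed)  # seeded for reproducible runs
    read_proportion = READ_PROPORTION[workload]
    keys = rng.choices(range(num_records), weights=zipf_weights(num_records), k=num_ops)
    return [("read" if rng.random() < read_proportion else "write", f"user{k}") for k in keys]


ops = generate_ops("Read-Heavy", 10_000, 1_000)
print(Counter(op for op, _ in ops))                   # roughly 95% reads, 5% writes
print(Counter(key for _, key in ops).most_common(3))  # the most popular keys dominate
```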
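Slide 35 compares node configurations, including virtual nodes as in Amazon Dynamo, where each physical host owns several tokens on the ring. The Python sketch below shows the two consequences that matter: the replica list must skip tokens of hosts already chosen so that N replicas land on N distinct hosts (Dynamo builds its preference list the same way), and each host's share of the hash space is the sum of several smaller arcs. The token naming scheme (`host#i`), MD5 hashing and class name are hypothetical choices for illustration.

```python
import bisect
import hashlib

HASH_SPACE = 2**128  # MD5 output range


def token(label: str) -> int:
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class VirtualNodeRing:
    """Each physical host owns several tokens (virtual nodes) on the ring."""

    def __init__(self, hosts: list[str], vnodes_per_host: int):
        # Hypothetical token scheme: hash "host#i" for i in 0..vnodes_per_host-1.
        self._ring = sorted(
            (token(f"{host}#{i}"), host) for host in hosts for i in range(vnodes_per_host)
        )
        self._tokens = [t for t, _ in self._ring]
        self._num_hosts = len(set(hosts))

    def replicas(self, key: str, n: int) -> list[str]:
        """Walk clockwise from hash(key), skipping virtual nodes of hosts already chosen,
        so the n replicas land on n distinct physical hosts."""
        n = min(n, self._num_hosts)
        if n <= 0:
            return []
        start = bisect.bisect_left(self._tokens, token(key))
        chosen: list[str] = []
        for step in range(len(self._ring)):
            host = self._ring[(start + step) % len(self._ring)][1]
            if host not in chosen:
                chosen.append(host)
                if len(chosen) == n:
                    break
        return chosen

    def ownership(self) -> dict[str, float]:
        """Fraction of the hash space for which each host is the primary replica."""
        owned: dict[str, int] = {}
        for i, (tok, host) in enumerate(self._ring):
            previous = self._ring[i - 1][0]  # for i == 0 this wraps to the last token
            owned[host] = owned.get(host, 0) + (tok - previous) % HASH_SPACE
        return {host: amount / HASH_SPACE for host, amount in owned.items()}


ring = VirtualNodeRing(["host1", "host2", "host3"], vnodes_per_host=8)
print(ring.replicas("user42", 3))  # three distinct hosts
print(ring.ownership())            # shares sum to 1.0
```

Raising `vnodes_per_host` tends to even out each host's share, and when a host fails its load is spread over many other hosts rather than a single neighbour.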