Cloud Storage That Achieves Both Read and Write Performance (SACSIS2011-A6-1)

Presentation slides from SACSIS2011 (http://sacsis.hpcc.jp/2011/).
The results have improved since the earlier version.

Presentation Transcript

  • MyCassandra
  • NoSQL (Key-Value Store (KVS), Document-Oriented DB, GraphDB): memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, Velocity, …; 100 …; ↔ join, transaction
  • data model: key/value vs. multi-dimensional map vs. document vs. graph; … vs. … vs. … (fsync); … vs. … (snapshot); … vs. …; consistency: strong vs. weak; data partition: row vs. column; node organization: master/slave vs. decentralized
  • Write vs. read performance of storage engines.
    Bigtable, Cassandra, HBase: Log-Structured Merge Tree [P. O'Neil '96]; writes are appended to disk (buffering); reads need n random I/Os + merge.
    MySQL, Sherpa: B-Trees [R. Bayer '70]; writes go to random disk locations; reads need 1 random I/O.
  • Write-Heavy workload: read-optimized vs. write-optimized (figure; source: Yahoo! Cloud Serving Benchmark, SOCC '10)
  • Read-Heavy workload: write-optimized vs. read-optimized (figure; source: Yahoo! Cloud Serving Benchmark, SOCC '10)
  • +   /   1.  2.  1.MyCassandra 2.MyCassandra Cluster read-optimized read and write-optimized write-optimizedMyCassandra
  • Apache Cassandra (figure: data centers dc1, dc2, dc3; rack/dc; region)
  • Apache Cassandra: Consistent Hashing. Node IDs are placed on a ring (A~Z), N := 3. A request arrives at a proxy; hash(key) = Q selects the primary node, and secondary 1 and secondary 2 complete the N replicas holding the key's values. (Figure labels: A, F, Q, V, Z.)
  • Google Bigtable storage engine, write path (LSM-Tree [P. O'Neil '96]): O(1), sequential write I/O; always writable (write-lock …). Sync: writes <k1, v1>, <k1, v2> are appended sequentially to the Commit Log on disk and merged in the in-memory Memtable as <k1, obj (v1+v2)>. Async: the Memtable is flushed to SSTables on disk (SSTable 1: <k1,obj1>, SSTable 2: <k1,obj2>, SSTable 3: <k1,obj3>).
  • Google Bigtable storage engine, read path: for a key, the value in the Memtable (memory, <k1,obj>) and the values in the SSTables (disk I/O; SSTable 1: <k1,obj1>, SSTable 2: <k1,obj2>, SSTable 3: <k1,obj3>) are merged into <k1,obj+obj1~3> and returned to the client.
  • Cassandra latency (average / 99.9 percentile): write avg. 0.69 ms, read avg. 6.16 ms (about 1/9); 99.9 percentile: write 2.0 ms, read 86.9 ms. (Figure: Number of queries vs. Latency (ms).)
  • 1. MyCassandra: read-optimized or write-optimized (11.4.14)
  • MyCassandra: Cassandra with a selectable storage engine. Cassandra: Consistent Hashing, Gossip Protocol, Bigtable storage engine. MyCassandra: Consistent Hashing, Gossip Protocol, and a storage engine chosen from Bigtable, MySQL (InnoDB, MyISAM, Memory, …), Redis, …
  • MyCassandra
  • Cassandra; JDBC API / stored procedure; key-value store; …
  • 2. MyCassandra Cluster: read and write-optimized
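  • Node types: W (write-optimized), R (read-optimized), RW (read and write-optimized). Write and read queries are sent sync or async depending on the node type (table: write / read × W, RW, R). Quorum Protocol: (write replicas) + (read replicas) > (total replicas)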
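  • MyCassandra Cluster: nodes of type (W) / (R) / (RW); node state (join/dead) is shared via the gossip protocol. Replica selection: 1. primary node (by key); 2. secondary nodes × N-1 … (Figure: Proxy, N=3, gossip, ring of RW, W and R nodes with one primary and two secondary nodes.)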
  • Write path (N=3, quorum of 2; W:RW:R = 1:1:1): 1) Client → Proxy …; 2) W, RW: ACK; 3a) …; 3b) R: ACK. Write latency: max (W, RW)
  • Read path (N=3, quorum of 2; W:RW:R = 1:1:1): 1) Client → Proxy …; 2) R, RW; 3a) …; 3b) … or W; 4) …. Read latency: max (R, RW). (Cassandra read repair …)
  • Evaluation setup.
    MyCassandra Cluster: 6×3 = 18 nodes / 6 hosts (W:R:RW = 6 : 6 : 6)
    Cassandra: 6 nodes / 6 hosts
    Replicas: 3; write and read quorum: 2
    Storage engines: Bigtable (W), MySQL / InnoDB (R), Redis (RW)
    Benchmark: YCSB (Yahoo! Cloud Serving Benchmark) [SOCC '10]
    Procedure: 1. MyCassandra/Cassandra ×6, YCSB Client ×1; 2. 1KB values (100 [Bytes] × 10 [columns]) + key, 1,000 …; 3. …; 4. YCSB …; 5. YCSB Stat …
  • YCSB: 4 workloads (Workload: Application Example; Operation Ratio; Record Selection).
    Write-Only: Log; Read: 0%, Write: 100%; Zipfian
    Write-Heavy: Session Store; Read: 50%, Write: 50%; Zipfian
    Read-Heavy: Photo tagging; Read: 95%, Write: 5%; Zipfian
    Read-Only: Cache; Read: 100%, Write: 0%; Zipfian
  • Figure: avg. write-latency and avg. read-latency (ms), Cassandra vs. MyCassandra Cluster, for Write-Only (write:100%), Write-Heavy (write:50%), Read-Heavy (write:5%) and Read-Only (write:0%). Labels on the write-latency panel: 0.36ms; 9.3%, 26.2%, 46.2%; MySQL + Redis. Labels on the read-latency panel: 8.59ms; 84.9%, 82.6%, 84.9%, 35.7%.
  • Figure: max. qps for 40 clients (query/sec), Cassandra vs. MyCassandra Cluster, for [write:read] = [100:0] Write-Only, [50:50] Write-Heavy, [5:95] Read-Heavy, [0:100] Read-Only. Ratio labels: 0.90, 6.49, 1.54, 0.93. Write Heavy / Read Heavy: 6.49 …
  • 1: Cassandra: … N …; MyCassandra Cluster: … (Figure: Cassandra serves write and read on N nodes (R,W); MyCassandra Cluster serves write and read on W, RW and R nodes.)
  • 2: Q. … A. LRU like cache, Swap, read repair. Q. … A. 1) … 2) Redis fsync (…)
  • Read-Heavy: 84.9% (read latency); 6.49 (throughput)
  • index algorithm: FD-Tree: Tree Indexing on Flash Disks, VLDB '10 (B+tree + LSM-tree, SSD); Fractal-Tree / TokuDB (MySQL …). MySQL: RDBMS …; Anvil, SOSP '09: …; Cloudy, VLDB '10: …; Dynamo, SOSP '07: … vs. …; MyCassandra (…): … vs. … + …
  • + 32   : 1.  2.  (MySQL + memcached)   : MyCassandra Cluster     Web Table movie-id name thumb-name tag count 704122313 movieA EY37lHk5bgU sport, succer, FIFA, 169,374 704122314 movieB Zk3BSYMWjzQ music, jazz, … 472,803- mycassandra -
  • twitter: @MyCassandraJP
  • MyCassandra/MyCassandra Cluster compared with Cassandra (Cassandra / 1. MyCassandra / 2. MyCassandra Cluster).
    data model: multi-dimensional map (Column Family)
    throughput: write / write or read / write and read
    latency: low / lower in case / lower
    persistence: yes / yes or no / yes
    consistency: weak (eventual, quorum)
    replication: sync / async
    data partition: row
    node organization: decentralized
    …: throughput, latency
  • Hosts, nodes and storages, configurations (1), (2), (3): 1 storage / 1 node / 1 host; 1 node / host with k storages / node; k nodes / host with 1 storage / node; virtual node IDs [Amazon Dynamo, SOSP '07]. Fault Tolerance (FT) space for each configuration: …
  • HDD vs. SSD: throughput (qps) of Cassandra and MyCassandra Cluster on HDD and on SSD (figure).
    IOZone benchmark (HDD: Western Digital; SSD: Crucial):
    seq. write: HDD 86,277 qps; SSD 96,401 qps
    seq. read: HDD 108,914 qps; SSD 216,099 qps
    random write: HDD 2,485 qps; SSD 29,045 qps
    random read: HDD 926 qps; SSD 21,751 qps
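
The Quorum Protocol slide and the two request-path slides (latency "max (W, RW)" and "max (R, RW)") can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `is_quorum_consistent`, `sync_targets` and `latency` are hypothetical, the per-node latencies are made-up illustrative numbers, and the code assumes that a write waits for ACKs from the W and RW replicas while a read waits for the R and RW replicas.

```python
from typing import Dict, List

def is_quorum_consistent(n: int, w: int, r: int) -> bool:
    """Quorum condition: write quorum + read quorum must exceed the replica count."""
    return w + r > n

# Replica types the proxy waits on synchronously for each operation.
# Assumption inferred from the "max (W, RW)" / "max (R, RW)" slide annotations.
SYNC_TYPES = {"write": ("W", "RW"), "read": ("R", "RW")}

def sync_targets(op: str, replicas: List[str]) -> List[str]:
    """Return the replica types that must answer before the client gets a reply."""
    return [t for t in replicas if t in SYNC_TYPES[op]]

def latency(op: str, replicas: List[str], cost: Dict[str, Dict[str, float]]) -> float:
    """Client-visible latency is that of the slowest synchronous replica."""
    return max(cost[op][t] for t in sync_targets(op, replicas))

if __name__ == "__main__":
    replicas = ["W", "RW", "R"]  # one replica of each type, W:RW:R = 1:1:1
    # Illustrative per-node latencies in ms (not measurements from the slides).
    cost = {
        "write": {"W": 0.7, "RW": 0.5, "R": 5.0},
        "read": {"W": 6.0, "RW": 0.5, "R": 1.0},
    }
    print(is_quorum_consistent(3, 2, 2))      # True
    print(latency("write", replicas, cost))   # 0.7
    print(latency("read", replicas, cost))    # 1.0
```

Under these assumptions the slow node for each operation (R for writes, W for reads) stays off the synchronous path, and the RW replica belongs to both the write set and the read set, so the two sets always overlap.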