Cloud Storage That Achieves Both Read and Write Performance (OS-117-24)

Slides presented at the April OS/ARC research meeting, 2011/04/14.

Presentation Transcript

  • 11.4.14 - mycassandra - 1
  • NoSQL, Key-Value Store (KVS), Document-oriented DB, GraphDB: memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Cabinet/Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, Velocity, … (join, transaction)
  • Data model: key/value vs. multi-dimensional map vs. document vs. graph; consistency: strong vs. weak; data partition: row vs. column; node organization: master/slave vs. decentralized
  • Write vs. read performance by index structure: Bigtable, Cassandra, HBase use a Log-Structured Merge Tree [P. O'Neil '96]; MySQL, Sherpa use a B+-Tree [R. Bayer '72]
  • [Figure: write-optimized vs. read-optimized stores under Write-Heavy and Read-Heavy workloads. Source: Yahoo! Cloud Serving Benchmark, SOCC '10]
  • 1. MyCassandra: read-optimized or write-optimized. 2. MyCassandra Cluster: read/write-optimized
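  • Apache Cassandra: node IDs are placed on a ring by Consistent Hashing; N = 3 replicas. A request reaches a proxy node, which forwards it to the primary node for the key (hash(key) = Q) and to the secondary nodes (secondary 1, secondary 2). [Figure: ring with nodes A, F, N, Q, V, Z holding key values]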
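The replica placement on this slide can be sketched as follows. This is a minimal illustration, not Cassandra's implementation: the MD5-based `ring_position` function, the `Ring` class, and the rule "primary = first node at or after the key's position, secondaries = the next nodes clockwise" are assumptions made for the example. Only the node names and N = 3 come from the slide.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node ID or key to a ring position (MD5 is an assumption)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring with N replicas per key."""

    def __init__(self, node_ids, replicas=3):
        self.replicas = replicas
        # Sort nodes by ring position so lookups can use binary search.
        self.ring = sorted((ring_position(n), n) for n in node_ids)
        self.positions = [pos for pos, _ in self.ring]

    def replica_nodes(self, key):
        """Return [primary, secondary 1, secondary 2, ...] for a key."""
        start = bisect.bisect_left(self.positions, ring_position(key))
        count = min(self.replicas, len(self.ring))
        # Walk clockwise from the primary, wrapping around the ring.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["A", "F", "N", "Q", "V", "Z"], replicas=3)
primary, *secondaries = ring.replica_nodes("some-key")
print(primary, secondaries)
```

Adding or removing a node only changes ownership for keys adjacent to that node on the ring, which is the usual reason for choosing consistent hashing over `hash(key) % num_nodes`.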
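  • Storage engine derived from Google Bigtable (write path): sequential write I/O; always writable, without a write-lock. A write is appended to the Commit Log on disk and applied to the Memtable in memory (a map of <key, ColumnFamily>); the Memtable is flushed asynchronously to an SSTable on disk. [Figure: writes <k1, cf1> and <k1, cf2> combined as <k1, cf1+cf2>]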
  • Storage engine derived from Google Bigtable (read path): for a key, values are collected from the Memtable and from the SSTables, and reading the SSTables costs disk I/O. [Figure: read of <k1, CF4> from the Memtable in memory and of <key, CF1>, <key, CF2>, <key, CF3> from SSTables on disk]
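The write and read paths on the two slides above can be sketched with a toy store. This is a simplified illustration under stated assumptions, not Cassandra code: `TinyStore`, `memtable_limit`, and the synchronous `flush` are hypothetical (the slide describes the flush as asynchronous), and plain Python lists and dicts stand in for on-disk files.

```python
class TinyStore:
    """Toy write/read path: commit log + Memtable + immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # stands in for the sequential on-disk log
        self.memtable = {}        # key -> {column: value}, held in memory
        self.sstables = []        # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, columns):
        # Append to the log first, then update memory; no read is needed.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the Memtable as a key-sorted table and start a fresh one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        # Merge columns from every SSTable (old to new), then the Memtable,
        # so newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table.get(key, {}))
        merged.update(self.memtable.get(key, {}))
        return merged or None


store = TinyStore(memtable_limit=2)
store.write("k1", {"cf1": "a"})
store.write("k2", {"cf1": "x"})   # second key triggers a flush
store.write("k1", {"cf2": "b"})   # lands in the new Memtable
print(store.read("k1"))           # {'cf1': 'a', 'cf2': 'b'}
```

The sketch shows why this layout favors writes: a write touches only the log and memory, while a read may have to consult every flushed table.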
  • 1. MyCassandra: read-optimized / write-optimized
  • Modular Cassandra: the distributed layer (Consistent Hashing, Gossip Protocol) is kept, and the storage engine is selectable: Bigtable, MySQL (InnoDB, MyISAM, Memory, …), Redis, …
  • MyCassandra: based on Cassandra; storage engines are accessed through the JDBC API / stored procedures and used as a key-value store. [Figure: MyCassandra node × 6]
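One way to picture a selectable storage engine is a small key-value interface with interchangeable back ends. The sketch below is an assumption-laden illustration, not MyCassandra's code: `StorageEngine`, `DictEngine`, and `SqlEngine` are hypothetical names, and Python's built-in `sqlite3` stands in for MySQL accessed over JDBC.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical key-value interface a node would program against."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str): ...


class DictEngine(StorageEngine):
    """In-memory engine (a stand-in for a memory-based store)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SqlEngine(StorageEngine):
    """Relational engine used as a key-value store (sqlite3 stands in for MySQL)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

    def put(self, key, value):
        self._db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._db.commit()

    def get(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


# The caller does not need to know which engine is behind the interface.
for engine in (DictEngine(), SqlEngine()):
    engine.put("k1", b"value")
    print(type(engine).__name__, engine.get("k1"))
```

Keeping the interface to `put` and `get` is what lets a relational engine and a memory store sit behind the same node logic.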
  • 2. MyCassandra Cluster: read/write-optimized
  • Replication: sync and async. Quorum Protocol: (number of write replicas) + (number of read replicas) > (number of replicas)
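The quorum condition can be checked directly. The sketch below assumes the standard reading of the inequality (every write set must intersect every read set); the names `n`, `w`, `r` and both function names are chosen for the example and do not come from the slide.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every write set of size w intersects every read set of size r."""
    return w + r > n


def overlap_by_enumeration(n: int, w: int, r: int) -> bool:
    """Brute-force check of the same property over all replica subsets."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False
```

With 3 replicas and quorums of 2, any read contacts at least one replica that took part in the latest successful write.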
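  • Node types: W (write-optimized), R (read-optimized), RW (read/write-optimized). Each MyCassandra node has storage type (W) / (R) / (RW), and the types are shared through the gossip protocol. Replica selection with N=3: 1. primary node (by key); 2. secondary nodes × N-1. [Figure: Consistent Hashing ID ring of R, RW, and W nodes exchanging gossip]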
  • Mapping of storages, nodes, and hosts, and the Fault Tolerance (FT) space: (1) k storages / node, 1 node / host; (2) 1 storage / node, k nodes / host (virtual nodes with their own IDs [Amazon Dynamo, SOSP '07]); (3) 1 storage / 1 node / 1 host
  • Write path (3 replicas, 2 write replicas; W:RW:R = 1:1:1): 1) the Client sends the write to a Proxy; 2) the Proxy waits for ACKs from the W and RW nodes; 3a) W; 3b) the R node ACKs afterwards. Write latency: max (W, RW)
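  • Read path (3 replicas, 2 read replicas; W:RW:R = 1:1:1): 1) the Client sends the read to a Proxy; 2) the Proxy reads from the R and RW nodes; 3a) or 3b) the W node is consulted; 4) the Proxy returns the result (Cassandra read repair). Read latency: max (R, RW)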
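The latency expressions max (W, RW) and max (R, RW) on the two slides above follow from which replicas the proxy waits on. The sketch below is one hedged reading of that idea: the `PREFERRED` ordering, both function names, and every latency value are assumptions for illustration (the latencies are made-up numbers, not measurements from the slides).

```python
# Assumed preference order: which storage types to wait on for each operation.
PREFERRED = {"write": ("W", "RW", "R"), "read": ("R", "RW", "W")}


def sync_targets(replicas, op, quorum):
    """Pick the `quorum` replicas whose storage type suits `op` best.

    `replicas` is a list of (node_name, storage_type) pairs.
    """
    order = PREFERRED[op]
    ranked = sorted(replicas, key=lambda r: order.index(r[1]))
    return ranked[:quorum]


def expected_latency(replicas, op, quorum, latency_ms):
    """Latency seen by the client: the slowest replica it waits on."""
    return max(latency_ms[(stype, op)] for _, stype in sync_targets(replicas, op, quorum))


replicas = [("n1", "W"), ("n2", "RW"), ("n3", "R")]
# Made-up per-type latencies in ms, for illustration only.
latency_ms = {
    ("W", "write"): 0.3, ("RW", "write"): 0.2, ("R", "write"): 4.0,
    ("W", "read"): 8.0, ("RW", "read"): 0.5, ("R", "read"): 1.0,
}
print(sync_targets(replicas, "write", 2))                  # [('n1', 'W'), ('n2', 'RW')]
print(expected_latency(replicas, "write", 2, latency_ms))  # 0.3
print(expected_latency(replicas, "read", 2, latency_ms))   # 1.0
```

In this reading the slow replica for each operation (R for writes, W for reads) stays off the critical path as long as the two preferred replicas respond.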
  • MyCassandra Cluster: 6×3 = 18 nodes / 6 hosts (W:R:RW = 6:6:6)
      • Cassandra: 6 nodes / 6 hosts
      • Replicas: 3; write replicas = read replicas = 2
      • Storage engines: Bigtable (W), MySQL / InnoDB (R), Redis (RW)
      • Benchmark: YCSB (Yahoo! Cloud Serving Benchmark) [SOCC '10]
      • Procedure: 1. MyCassandra/Cassandra × 6, YCSB Client × 1; 2. 1KB values (100 [Bytes] × 10 [columns]) + key, 1,000; 4. YCSB; 5. YCSB Stat
  • YCSB: 4 workloads (record selection: Zipfian)
      • Write-Only (application example: Log): Read 0%, Write 100%
      • Write-Heavy (application example: Session Store): Read 50%, Write 50%
      • Read-Heavy (application example: Photo tagging): Read 95%, Write 5%
      • Read-Only (application example: Cache): Read 100%, Write 0%
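The four workload mixes can be imitated with a small generator. This is a simplified sketch, not YCSB itself: the read ratios come from the slide, while `generate_ops`, `zipfian_weights`, the exponent of 1.0, and the seed handling are assumptions made for the example.

```python
import random

# Read ratios taken from the workload list; the rest is a simplification.
WORKLOADS = {
    "Write-Only": 0.0,
    "Write-Heavy": 0.5,
    "Read-Heavy": 0.95,
    "Read-Only": 1.0,
}


def zipfian_weights(num_records, exponent=1.0):
    """Weight of the record at rank i is proportional to 1 / i**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_records + 1)]


def generate_ops(workload, num_ops, num_records, seed=0):
    """Yield (operation, record_index) pairs for the named workload."""
    rng = random.Random(seed)
    read_ratio = WORKLOADS[workload]
    weights = zipfian_weights(num_records)
    # Skewed record selection: low indexes are picked far more often.
    keys = rng.choices(range(num_records), weights=weights, k=num_ops)
    for key in keys:
        op = "read" if rng.random() < read_ratio else "write"
        yield op, key


ops = list(generate_ops("Read-Heavy", num_ops=1000, num_records=100))
reads = sum(1 for op, _ in ops if op == "read")
print(f"reads: {reads}, writes: {len(ops) - reads}")
```

A skewed selection like this concentrates traffic on a few hot records, which is why caching and read-optimized storage matter for the read-heavy mixes.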
  • [Figure: avg. write-latency (ms) and avg. read-latency (ms) of Cassandra and MyCassandra Cluster (MySQL + Redis) for Write-Only (write:100%, read:0%), Write-Heavy (write:50%, read:50%), Read-Heavy (write:5%, read:95%), and Read-Only (write:0%, read:100%); lower is better. Annotations: 11.5~23.5% on write latency; 88.5%, 85.2%, 88.5%, 49.7% on read latency]
  • [Figure: max. qps for 40 clients (query/sec) of Cassandra and MyCassandra Cluster for [write:read] = [100:0] Write-Only, [50:50] Write-Heavy, [5:95] Read-Heavy, [0:100] Read-Only; higher is better. Annotations: 0.99, 0.62, 1.49, 6.53]
  • HDD vs. SSD
      • [Figure: throughput (qps) of Cassandra and MyCassandra Cluster on HDD and SSD, with callouts (1) to (3)]
      • HDD/SSD benchmark with IOZone (HDD: Western Digital; SSD: Crucial):
          • sequential write: HDD 86,277 qps; SSD 96,401 qps
          • sequential read: HDD 108,914 qps; SSD 216,099 qps
          • random write: HDD 2,485 qps; SSD 29,045 qps
          • random read: HDD 926 qps; SSD 21,751 qps
  • Read-Heavy: 88.5% (read latency); 6.53 (throughput). Write-Heavy: Cassandra
  • (1/2) Write-Heavy: MySQL; write-optimized storage for write-heavy workloads. [Figure: write latency, read latency, and throughput of Cassandra and MyCassandra cluster]
  • (2/2) Amazon EC2
  • FD-Tree: Tree Indexing on Flash Disks, VLDB '10: B+tree + LSM-tree, for SSD
      • MySQL: RDBMS
      • Anvil, SOSP '09
      • Cloudy, VLDB '10
      • Dynamo, SOSP '07
      • MyCassandra
  • MyCassandra / MyCassandra Cluster compared with Cassandra
      • data model: multi-dimensional map (Column Family)
      • throughput: Cassandra: write; 1. MyCassandra: write or read; 2. MyCassandra Cluster: write and read
      • latency: Cassandra: low; 1. MyCassandra: lower in case; 2. MyCassandra Cluster: lower
      • persistence: Cassandra: yes; 1. MyCassandra: yes or no (memory); 2. MyCassandra Cluster: yes
      • consistency: weak (eventual, quorum)
      • replication: sync / async
      • data partition: row
      • node organization: decentralized
  • 1), 2) MySQL + memcached; MyCassandra Cluster. Example table with Read-Heavy and Write-Heavy columns:
      • Columns: movie-id, name, thumb-name, tag, count
      • 704122313, movieA, EY37lHk5bgU, "sport, soccer, FIFA, …", 169,374
      • 704122314, movieB, Zk3BSYMWjzQ, "music, jazz, …", 472,803