+




    MyCassandra
+

     NoSQL, Key-Value Store (KVS), Document-Oriented DB, GraphDB
        : memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra,
      Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis,
      LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM
      ObjectGrid, Oracle Coherence, Velocity, …     100

                                         :               ↔
       

          join, transaction
                              /




+

     
             key/value vs. multi-dimensional map vs. document vs. graph
     
                        vs.
                  vs.           –                        fsync

     
                 vs.          (snapshot)
     
                 vs.
     
             strong vs. weak
     
             row vs. column
     
             master/slave vs. decentralized

+
                       write-optimized              vs.    read-optimized

              write/read
                       Bigtable, Cassandra, HBase          MySQL, Sherpa
                       Log-Structured Merge Tree           B-Trees [R. Bayer ’70]
                       [P. O’Neil ’96]
disk (write)           append                              (buffering) random
disk (read)            n random I/O + merge                1 random I/O

                       Bigtable                            MySQL



+
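    Write-Heavy workload: read-optimized vs. write-optimized
    [Figure: benchmark plot for the Write-Heavy workload with one curve for a read-optimized store and one for a write-optimized store; "Better" marks the preferred direction. Source: Yahoo! Cloud Serving Benchmark, SOCC ’10]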
+
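    Read-Heavy workload: write-optimized vs. read-optimized
    [Figure: benchmark plot for the Read-Heavy workload with one curve for a write-optimized store and one for a read-optimized store; "Better" marks the preferred direction. Source: Yahoo! Cloud Serving Benchmark, SOCC ’10]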
+

     
                                            /

     
         1. 
         2. 
               1.MyCassandra            2.MyCassandra Cluster


    read-optimized


                                         read and write-optimized
                      write-optimized




+
    Apache Cassandra


     
     

                         dc1          dc2



                           rack/dc
                       region
                                 dc3
+
    Apache Cassandra

    Consistent Hashing
    [Figure: ring of node IDs (A~Z); N := 3; hash(key) = Q; the key's values are stored on a primary node and on two secondary nodes (secondary 1, secondary 2)]
      •  request proxy
      •  primary node
      •  secondary node
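The replica placement on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming the common successor rule (the primary is the first node at or after hash(key) on the ring, and the N - 1 secondaries are the next nodes clockwise). The `Ring` class, the single-letter node IDs, and the already-hashed key position "Q" are illustrative assumptions; they are not taken from Cassandra's code, and the exact node roles in the slide's figure are not recoverable from the text.

```python
import bisect


class Ring:
    """Toy consistent-hashing ring over single-letter IDs (hypothetical helper)."""

    def __init__(self, node_ids, replicas=3):
        self.node_ids = sorted(node_ids)
        self.replicas = replicas  # N on the slide

    def replica_nodes(self, key_hash):
        """Return [primary, secondary 1, ...] for a key already hashed onto the ring."""
        size = len(self.node_ids)
        # First node whose ID is >= the key's position; wrap around past the last node.
        start = bisect.bisect_left(self.node_ids, key_hash) % size
        count = min(self.replicas, size)
        return [self.node_ids[(start + i) % size] for i in range(count)]


ring = Ring(["A", "F", "N", "V", "Z"], replicas=3)
print(ring.replica_nodes("Q"))  # ['V', 'Z', 'A']
```

Any node can act as the request proxy here, because every node can compute the same replica list from the key alone.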
+
    Google Bigtable: write path
      LSM-Tree [P. O’Neil ’96]
      O(1): sequential write I/O
      Always writable
      write-lock
    [Figure: a write is appended synchronously to the Commit Log on disk (<k1, v1>, <k1, v2>; sequential write) and applied to the Memtable in memory (<k1, obj (v1+v2)>); the Memtable is flushed asynchronously to SSTables on disk (SSTable 1: <k1,obj1>, SSTable 2: <k1,obj2>, SSTable 3: <k1,obj3>)]
+
    Google Bigtable

      Key
           Memtable: value
           SSTable: value (disk I/O)
    [Figure: the client receives <k1,obj+obj1~3>, the merge of <k1,obj> from the Memtable (memory) with <k1,obj1>, <k1,obj2>, <k1,obj3> from SSTable 1, SSTable 2, and SSTable 3 (disk); the Commit Log is also on disk]
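The write and read paths on the two Bigtable slides can be sketched together as a toy log-structured store. This is a minimal sketch under stated assumptions: `ToyLSMStore`, `memtable_limit`, and the in-process lists standing in for disk files are hypothetical, the flush runs inline rather than asynchronously, and compaction is left out.

```python
class ToyLSMStore:
    """Toy LSM-style store: commit log + memtable + immutable SSTables (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk commit log
        self.memtable = {}     # in-memory map: key -> {column: value}
        self.sstables = []     # flushed tables, oldest first; never modified after flush
        self.memtable_limit = memtable_limit  # assumed flush threshold (number of keys)

    def write(self, key, columns):
        # Sequential append first, then an in-memory update: no random disk I/O.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A real system flushes in the background; this sketch does it inline.
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def read(self, key):
        # A read may touch every SSTable plus the memtable and merge the pieces.
        merged = {}
        for table in self.sstables:          # oldest to newest
            merged.update(table.get(key, {}))
        merged.update(self.memtable.get(key, {}))  # newest data wins
        return merged


store = ToyLSMStore(memtable_limit=2)
store.write("k1", {"a": 1})
store.write("k2", {"x": 9})   # second key reaches the limit and triggers a flush
store.write("k1", {"b": 2})
store.write("k1", {"a": 3})
print(store.read("k1"))       # {'a': 3, 'b': 2}
```

The sketch shows why this design favors writes: each write costs one append plus a memory update, while each read has to consult several tables and merge them.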
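+ Cassandra: read/write latency (avg. / 99.9 percentile)
    [Figure: histograms of number of queries vs. latency (ms) for read and for write; lower latency is better]

                         write        read
    avg.                 0.69 ms      6.16 ms
    99.9 percentile      2.0 ms       86.9 ms

    Average write latency is about 1/9 of average read latency.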
1.
           +
                        1.MyCassandra


               read-optimized



                                write-optimized



11.4.14                                           14
+ MyCassandra:
      Cassandra
    [Figure: MySQL offers interchangeable storage engines (InnoDB, MyISAM, Memory, …). Cassandra combines Consistent Hashing and the Gossip Protocol with a Bigtable storage layer. In MyCassandra the same distributed layer (Consistent Hashing, Gossip Protocol) sits on a selectable storage layer: Bigtable, MySQL, Redis, …]
+ MyCassandra –




MyCassandra
:          Cassandra
                  :          . JDBC API / stored procedure
              :           key-value store
    •  ….




MyCassandra
2.
          +
              2.MyCassandra Cluster




               read and write-optimized



•  W: write-optimized node
•  R: read-optimized node
•  RW: read and write-optimized node

 

 
                                        write query

                                    sync              async


                                   W                  R

Quorum Protocol: (write agreements) + (read agreements) > (number of replicas)
      
                                    write             read



                                    W        RW       R
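The quorum condition can be checked mechanically. The sketch below is illustrative: the function names are hypothetical, and it assumes the standard reading of the rule, namely that the number of replicas acknowledging a write plus the number consulted on a read must exceed the replica count. The second function brute-forces every choice of replicas to confirm that the rule guarantees at least one replica in common.

```python
from itertools import combinations


def is_quorum(write_acks, read_acks, replicas):
    """Quorum rule: write agreements + read agreements > number of replicas."""
    return write_acks + read_acks > replicas


def always_overlaps(write_acks, read_acks, replicas):
    """Check by enumeration that every write set shares a replica with every read set."""
    nodes = range(replicas)
    return all(
        set(written) & set(read)
        for written in combinations(nodes, write_acks)
        for read in combinations(nodes, read_acks)
    )


print(is_quorum(2, 2, 3), always_overlaps(2, 2, 3))  # True True
print(is_quorum(1, 1, 3), always_overlaps(1, 1, 3))  # False False
```

With three replicas and two agreements on each side, any read is therefore guaranteed to reach at least one replica that holds the latest acknowledged write.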


                                    MyCassandra
                         (W) /                  (R) /               (RW)



                         (join/dead)          gossip protocol



 
     1.                                 (key                )
     2.                                                          × N-1

           1              3
                                                                             Proxy
               N=3

                      gossip
                                                RW


           W         RW        R          W                  W       RW     RW        R
                                                         secondary
                                                                          secondary primary
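•  W: write-optimized node
•  R: read-optimized node
•  RW: read and write-optimized node

replicas = 3, quorum = 2
W:RW:R = 1:1:1
[Figure: write path. Client → Proxy (step 1); Proxy → W, RW (step 2), each returning an ACK; step 3a; step 3b → R, returning an ACK]
Latency: max (W, RW)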
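The write path can be sketched as follows. This is a minimal sketch under assumptions drawn from the slide labels only: the proxy waits for the W and RW nodes, so the client sees max (W, RW), and the R node is updated later. The `Node` and `Proxy` classes, the logical clock, and the latency figures are hypothetical, and the deferred write is modeled as a queue rather than a real background task.

```python
class Node:
    """One replica; `kind` is 'W', 'RW', or 'R' (hypothetical model)."""

    def __init__(self, kind, write_ms):
        self.kind = kind
        self.write_ms = write_ms  # assumed per-write latency in ms
        self.data = {}            # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        self.data[key] = (timestamp, value)
        return self.write_ms      # latency the caller would observe


class Proxy:
    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = []         # writes deferred to R nodes
        self.clock = 0            # logical timestamp for versions

    def write(self, key, value):
        self.clock += 1
        sync_latencies = []
        for node in self.nodes:
            if node.kind in ("W", "RW"):
                # Synchronous part: wait for the ACK from W and RW.
                sync_latencies.append(node.put(key, value, self.clock))
            else:
                # Asynchronous part: the R node is written after the client returns.
                self.pending.append((node, key, value, self.clock))
        return max(sync_latencies)  # client latency = max(W, RW)

    def drain(self):
        """Apply the deferred writes (stands in for the background write)."""
        for node, key, value, timestamp in self.pending:
            node.put(key, value, timestamp)
        self.pending.clear()


w, rw, r = Node("W", 0.4), Node("RW", 0.1), Node("R", 5.0)
proxy = Proxy([w, rw, r])
latency = proxy.write("k1", "v1")
print(latency, "k1" in r.data)   # 0.4 False
proxy.drain()
print(r.data["k1"])              # (1, 'v1')
```

The slow write to the read-optimized node never appears in the client's latency, which is the point of choosing W and RW for the synchronous quorum.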
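•  W: write-optimized node
•  R: read-optimized node
•  RW: read and write-optimized node

replicas = 3, quorum = 2
W:RW:R = 1:1:1
[Figure: read path. Client → Proxy (step 1); Proxy → R, RW (step 2); step 3a; step 3b involves W; step 4 corresponds to Cassandra read repair]
Latency: max (R, RW)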
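One plausible reading of the read path is sketched below. The text of steps 3a, 3b, and 4 did not survive extraction, so the behavior here is an assumption: the proxy first asks the R and RW nodes, returns at once if they agree, and otherwise also asks the W node, returns the newest version, and repairs stale copies in the manner of Cassandra read repair. The function name and the (timestamp, value) representation are hypothetical.

```python
def cluster_read(key, replicas):
    """Read `key` from a dict mapping 'W', 'RW', 'R' to {key: (timestamp, value)}.

    Returns (value, list of node kinds consulted).
    """
    # Step 2: ask the read-optimized and the read/write-optimized nodes.
    seen = {kind: replicas[kind].get(key) for kind in ("R", "RW")}
    if seen["R"] is not None and seen["R"] == seen["RW"]:
        # Assumed step 3a: the two answers agree, so return immediately.
        return seen["R"][1], list(seen)

    # Assumed step 3b: the answers differ, so also consult the W node.
    seen["W"] = replicas["W"].get(key)
    present = [version for version in seen.values() if version is not None]
    if not present:
        return None, list(seen)
    newest = max(present, key=lambda version: version[0])

    # Assumed step 4: repair every consulted replica that holds an older copy.
    for kind, version in seen.items():
        if version != newest:
            replicas[kind][key] = newest
    return newest[1], list(seen)


replicas = {
    "W": {"k1": (2, "new")},
    "RW": {"k1": (2, "new")},
    "R": {"k1": (1, "old")},   # the asynchronous write has not arrived yet
}
print(cluster_read("k1", replicas))  # ('new', ['R', 'RW', 'W'])
print(cluster_read("k1", replicas))  # ('new', ['R', 'RW'])
```

In the common case only R and RW are consulted, which matches the latency of max (R, RW) stated on the slide; the W node is touched only when the fast replicas disagree.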
+




                      /
     
              MyCassandra Cluster: 6×3 = 18           /6        (W:R:RW = 6 : 6 : 6)
              Cassandra: 6     /6
     
                         :     = 3,                        :    =   =2
                                      : Bigtable (W), MySQL / InnoDB (R), Redis (RW)

                    : YCSB (Yahoo! Cloud Serving Benchmark) [SOCC ’10]
     
              1.    MyCassandra/Cassandra×6      YCSB Client×1
              2.    1 KB values (100 bytes × 10 columns) + key                   1,000
              3. 
              4.    YCSB
              5.    YCSB Stat




+

    YCSB: 4 workloads

                  Workload       Application Example   Operation Ratio            Record Selection
    Write Heavy   Write-Only     Log                   Read: 0%,   Write: 100%    Zipfian (*)
                  Write-Heavy    Session Store         Read: 50%,  Write: 50%
    Read Heavy    Read-Heavy     Photo tagging         Read: 95%,  Write: 5%
                  Read-Only      Cache                 Read: 100%, Write: 0%

    (*) Zipfian: a skewed distribution in which a few records are accessed very frequently and most records are accessed rarely.
[Figure: average write latency (top) and average read latency (bottom), in ms, for Cassandra and MyCassandra Cluster under the Write-Only, Write-Heavy, Read-Heavy, and Read-Only workloads; lower is better]
  Values annotated on the plots:
    avg. write-latency: 0.36 ms; 9.3% (write:100%), 26.2% (write:50%), 46.2% (write:5%); "MySQL + Redis" (write:0%)
    avg. read-latency:  8.59 ms; 35.7% (read:50%), 82.6% (read:95%), 84.9% (read:100%)
[Figure: max. qps for 40 clients (query/sec), Cassandra vs. MyCassandra Cluster; higher is better]

    [write:read]   Workload       annotated ratio
    [100:0]        Write-Only     0.90
    [50:50]        Write-Heavy    0.93
    [5:95]         Read-Heavy     1.54
    [0:100]        Read-Only      6.49

    Write Heavy
    Read Heavy: 6.49
+

                         1:
               Cassandra
                    N
               MyCassandra Cluster
                                      :
                                            :


                                                    MyCassandra
                                 Cassandra            Cluster

                              write        read   write        read
N                                                                     R,W

                                                  W       RW   R

+
               2:
    Q.

    A.                  LRU like cache
      Swap                            read repair


    Q.

    A.    1)


          2) Redis                   fsync
                    (                           )




+




   Read-Heavy
                        84.9%
                 6.49


                          +







                                                   index algorithm
             FD-Tree: Tree Indexing on Flash Disks, VLDB ’10
               

                  B+tree        + LSM-tree
                       SSD
             Fractal-Tree / TokuDB (MySQL                     )

     
             MySQL: RDBMS
           Anvil, SOSP ’09: 1
           Cloudy, VLDB ’10:
           Dynamo, SOSP ‘07:          vs.
             MyCassandra (       ):          vs.         +


+



                 :
        1. 


        2.            (MySQL + memcached)


                 : MyCassandra Cluster
         
         

                                 Web             Table
          movie-id      name     thumb-name    tag                    count
          704122313     movieA   EY37lHk5bgU   sport, soccer, FIFA,   169,374

          704122314     movieB   Zk3BSYMWjzQ   music, jazz, …         472,803


+

                  :       (       )




                      5       6

                      twitter: @MyCassandraJP





                   : MyCassandra/MyCassandra Cluster

                       Cassandra    1. MyCassandra    2. MyCassandra Cluster
    data model         multi-dimensional map (Column Family)
    throughput         write        write or read     write and read
    latency            low          lower in case     lower
    persistence        yes          yes or no         yes
    consistency        weak (eventual, quorum)
    replication        sync / async
    data partition     row
    node organization  decentralized

                                     throughput, latency
host
(1) 1             /1
                                                                        node
   ☓
   ☓                                                                    storage
(2) 1             /k
   ID                          [Amazon Dynamo, SOSP ’07]
   ☓
(3) 1

Fault
                                    FT         space               FT           space
Tolerance (FT)         space

1storage / 1node / 1 host
                                         (2)                             (3)
            (1)
                                               virtual node

                                               1 node / host
                                               k storages / node
                                                                   k nodes / host
                                                                   1 storage / node
HDD vs. SSD
[Figure: throughput (qps) of Cassandra and of MyCassandra Cluster, each measured on HDD and on SSD; higher is better]

    IOZone benchmark   HDD: Western Digital   SSD: Crucial
    seq. write         86,277 qps             96,401 qps
    seq. read          108,914 qps            216,099 qps
    random write       2,485 qps              29,045 qps
    random read        926 qps                21,751 qps

Cloud Storage That Achieves Both Read Performance and Write Performance (SACSIS2011-A6-1)

  • 1. MyCassandra
  • 2. NoSQL, Key-Value Store (KVS), Document-Oriented DB, GraphDB: memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, Velocity, … 100 : ↔ join, transaction /
  • 3. +     key/value vs. multi-dimensional map vs. document vs. graph     vs.   vs. – fsync     vs. (snapshot)     vs.     strong vs. weak     row vs. column     master/slave vs. decentralized MyCassandra
  • 4. +     key/value vs. multi-dimensional map vs. document vs. graph     vs.   vs.     vs. (snapshot)     vs.     strong vs. weak     row vs. column     master/slave vs. decentralized MyCassandra
  • 5. Write-optimized vs. read-optimized storage (write/read)
        Systems:     Bigtable, Cassandra, HBase                    | MySQL, Sherpa
        Index:       Log-Structured Merge Tree [P. O’Neil ’96]     | B-Trees [R. Bayer ’70]
        Disk write:  append                                        | (buffering) random
        Disk read:   n random I/O + merge                          | 1 random I/O
        Example:     Bigtable                                      | MySQL
  • 6. [Figure: read-optimized vs. write-optimized storage under a Write-Heavy workload. Source: Yahoo! Cloud Serving Benchmark, SOCC ’10]
  • 7. [Figure: write-optimized vs. read-optimized storage under a Read-Heavy workload. Source: Yahoo! Cloud Serving Benchmark, SOCC ’10]
  • 8. 1. MyCassandra: read-optimized / write-optimized. 2. MyCassandra Cluster: read and write-optimized
  • 9. Apache Cassandra [Figure: data centers dc1, dc2, dc3; rack/dc, region]
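  • 10. Apache Cassandra: Consistent Hashing. Node IDs are placed on a ring (A~Z); number of replicas N := 3. A request is received by a proxy node; hash(key) = Q selects the primary node Q, and the remaining replicas go to secondary nodes (secondary 1, secondary 2). [Figure: ring with nodes A, F, Q, V, Z; key, values]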
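The replica placement on slide 10 can be sketched roughly as follows. This is a minimal illustration, not Cassandra's code: the `Ring` class, the `token` helper, the use of MD5, and the rule that secondaries are simply the next nodes clockwise after the primary are all assumptions made for the example. Only the ring, the node IDs, and N := 3 come from the slide.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring with N replicas per key."""

    def __init__(self, node_ids, n_replicas=3):
        self.n_replicas = n_replicas
        # Sort nodes by their position on the ring.
        self._ring = sorted((token(n), n) for n in node_ids)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key):
        """Return [primary, secondary 1, secondary 2, ...] for a key."""
        # Primary: first node at or after hash(key), wrapping around the ring.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(self.n_replicas, len(self._ring))
        # Secondaries: the next nodes clockwise (an assumption of this sketch).
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["A", "F", "Q", "V", "Z"], n_replicas=3)
print(ring.replicas("user:42"))  # three distinct node IDs, primary first
```

Because only ring positions matter, adding or removing one node moves just the keys between that node and its neighbour, which is the usual reason for choosing consistent hashing.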
  • 11. Write path (Google Bigtable style, LSM-Tree [P. O’Neil ’96]): write cost O(1); sequential write I/O only; always writable (no write-lock). Synchronously, <k1, v1>, <k1, v2> are appended to the Commit Log on disk (sequential write) and merged into the Memtable in memory as <k1, obj (v1+v2)>. Asynchronously, the Memtable is flushed to SSTables on disk: <k1,obj1> SSTable 1, <k1,obj2> SSTable 2, <k1,obj3> SSTable 3
  • 12. Read path (Google Bigtable style): for a key, the value is read from the Memtable (memory) and from the SSTables (disk I/O), then merged and returned to the client: <k1,obj> in the Memtable plus <k1,obj1> SSTable 1, <k1,obj2> SSTable 2, <k1,obj3> SSTable 3 give <k1,obj+obj1~3>
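The write and read paths on slides 11 and 12 can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `TinyLSM`, `memtable_limit`, and the column-dictionary layout are hypothetical names, the "disk" structures are plain Python lists, and the flush runs inline rather than asynchronously.

```python
class TinyLSM:
    """Toy Commit Log + Memtable + SSTable store (illustration only)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stand-in for the append-only Commit Log on disk
        self.memtable = {}     # in-memory table: key -> {column: value}
        self.sstables = []     # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, columns):
        # 1) Sequential append to the commit log.
        self.commit_log.append((key, dict(columns)))
        # 2) Merge into the memtable: <k1, v1>, <k1, v2> -> <k1, obj(v1+v2)>.
        self.memtable.setdefault(key, {}).update(columns)
        # 3) Flush when the memtable is full (asynchronous in a real system).
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)  # becomes a new SSTable
            self.memtable = {}
            self.commit_log.clear()              # logged data is now durable

    def read(self, key):
        # Merge every SSTable and the memtable, oldest to newest,
        # so that newer columns overwrite older ones.
        merged = {}
        for table in self.sstables + [self.memtable]:
            merged.update(table.get(key, {}))
        return merged


db = TinyLSM()
db.write("k1", {"a": "v1"})
db.write("k1", {"b": "v2"})
db.write("k2", {"a": "x"})   # second key fills the memtable and triggers a flush
db.write("k1", {"a": "v3"})
print(db.read("k1"))
```

The sketch makes the asymmetry visible: a write touches one log entry and one in-memory table, while a read may have to consult every SSTable before it can answer.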
  • 13. Cassandra latency (average / 99.9 percentile): average write 0.69 ms vs. read 6.16 ms (write is about 1/9 of read); 99.9 percentile write 2.0 ms vs. read 86.9 ms. [Figure: number of queries vs. latency (ms) for read and write]
  • 14. 1. MyCassandra: read-optimized / write-optimized (11.4.14)
  • 15. MyCassandra: Cassandra with a choice of storage engines. From Cassandra: Consistent Hashing, Gossip Protocol. Storage engines: Bigtable, MySQL (InnoDB, MyISAM, Memory, …), Redis, …
  • 16. MyCassandra: Cassandra with a choice of storage engines. From Cassandra: Consistent Hashing, Gossip Protocol. Storage engines: Bigtable, MySQL (InnoDB, MyISAM, Memory, …), Redis, …
  • 18. : Cassandra : . JDBC API / stored procedure : key-value store •  …. MyCassandra
  • 19. 2. MyCassandra Cluster: read and write-optimized
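  • 20. Node types: W: write-optimized; R: read-optimized; RW: read and write-optimized. A write query is applied sync or async depending on the node type. Quorum Protocol: (number of write replicas) + (number of read replicas) > (number of replicas). [Figure: write and read paths over W, RW, R nodes]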
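The quorum condition on slide 20 can be checked mechanically. The sketch below uses hypothetical function names and the values that appear later in the deck (3 replicas, quorum of 2); the brute-force overlap test is added only to show why the inequality matters.

```python
from itertools import combinations


def is_strict_quorum(n_replicas, write_quorum, read_quorum):
    """Quorum condition: write replicas + read replicas > total replicas."""
    return write_quorum + read_quorum > n_replicas


def always_overlaps(n_replicas, write_quorum, read_quorum):
    """Brute force: does every write set share a node with every read set?"""
    nodes = range(n_replicas)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, write_quorum)
        for rs in combinations(nodes, read_quorum)
    )


print(is_strict_quorum(3, 2, 2))   # True
print(always_overlaps(3, 2, 2))    # True
print(is_strict_quorum(3, 1, 1))   # False
```

When the inequality holds, any read quorum contains at least one node that acknowledged the latest write, which is what lets a reader find the newest value.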
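  • 21. MyCassandra Cluster: nodes are write-optimized (W) / read-optimized (R) / read and write-optimized (RW); node state (join/dead) is exchanged by the gossip protocol. Replica selection: 1. primary node (by key); 2. secondary nodes × N-1. [Figure: ring with N=3, Proxy, gossip, W / RW / R nodes, one primary and two secondaries]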
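  • 22. Write (replicas = 3, quorum = 2, W:RW:R = 1:1:1): 1) the Client sends the write to the Proxy; 2) the Proxy writes to the W and RW nodes and waits for their ACKs; 3a) the Proxy returns ACK to the Client; 3b) the R node is written asynchronously and returns its ACK later. Write latency: max (W, RW)
  • 23. Read (replicas = 3, quorum = 2, W:RW:R = 1:1:1): 1) the Client sends the read to the Proxy; 2) the Proxy reads from the R and RW nodes; 3a) the result is returned, or 3b) the W node is also read; 4) read latency: max (R, RW). (Uses Cassandra read repair)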
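The write and read flows on slides 22 and 23 can be sketched as a small simulation. Everything here is hypothetical: the `Node` class, the latency numbers, the explicit version counter, and in particular the rule that the W node is consulted only when the R and RW answers disagree, which is one reading of steps 3a/3b rather than something the slides state in full.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str        # "W", "RW", or "R"
    write_ms: float  # hypothetical per-node write latency
    read_ms: float   # hypothetical per-node read latency
    data: dict = field(default_factory=dict)  # key -> (version, value)


def write(replicas, key, version, value):
    """Write synchronously to W and RW; return (latency, deferred R writes)."""
    sync = [n for n in replicas if n.kind in ("W", "RW")]
    for node in sync:
        node.data[key] = (version, value)
    # Writes to R nodes are deferred; the caller applies them later.
    deferred = [
        (lambda n=n: n.data.__setitem__(key, (version, value)))
        for n in replicas if n.kind == "R"
    ]
    return max(n.write_ms for n in sync), deferred


def read(replicas, key):
    """Read from R and RW; consult W only if the two answers disagree."""
    fast = [n for n in replicas if n.kind in ("R", "RW")]
    answers = [n.data.get(key) for n in fast]
    latency = max(n.read_ms for n in fast)
    if len(set(answers)) > 1:
        slow = [n for n in replicas if n.kind == "W"]
        answers += [n.data.get(key) for n in slow]
        latency = max([latency] + [n.read_ms for n in slow])
    found = [a for a in answers if a is not None]
    newest = max(found, key=lambda a: a[0], default=None)
    return (newest[1] if newest else None), latency


replicas = [Node("W", 0.5, 8.0), Node("RW", 0.2, 0.2), Node("R", 3.0, 1.0)]
latency, deferred = write(replicas, "k1", 1, "v1")
print(latency)               # max(W, RW) write latency
for apply_write in deferred:
    apply_write()            # the asynchronous write to R completes
print(read(replicas, "k1"))  # value and max(R, RW) read latency
```

The point of the split is that the slow operation of each engine stays off the critical path: the client never waits for the R node on a write, and normally never waits for the W node on a read.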
  • 24. Evaluation setup. MyCassandra Cluster: 6×3 = 18 nodes / 6 hosts (W:R:RW = 6 : 6 : 6). Cassandra: 6 nodes / 6 hosts. Replicas = 3; write quorum = read quorum = 2. Storage engines: Bigtable (W), MySQL / InnoDB (R), Redis (RW). Benchmark: YCSB (Yahoo! Cloud Serving Benchmark) [SOCC ’10]. Procedure: 1. MyCassandra/Cassandra × 6, YCSB Client × 1; 2. 1KB values (100 [Bytes] × 10 [columns]) + key, 1,000; 3.; 4. YCSB; 5. YCSB Stat
  • 25. YCSB: 4 workloads (Record Selection: Zipfian)
        Workload      Application Example   Operation Ratio
        Write-Only    Log                   Read: 0%, Write: 100%
        Write-Heavy   Session Store         Read: 50%, Write: 50%
        Read-Heavy    Photo tagging         Read: 95%, Write: 5%
        Read-Only     Cache                 Read: 100%, Write: 0%
  • 26. [Figure: avg. write-latency and avg. read-latency (ms), Cassandra vs. MyCassandra Cluster (MySQL + Redis), for Write-Only (write:100%), Write-Heavy (write:50%), Read-Heavy (write:5%), Read-Only (write:0%). Labels on the write chart: 0.36ms, 9.3%, 26.2%, 46.2%. Labels on the read chart: 8.59ms, 84.9%, 82.6%, 35.7%]
  • 27. [Figure: max. qps for 40 clients (query/sec), Cassandra vs. MyCassandra Cluster, for [write:read] = [100:0] Write-Only, [50:50] Write-Heavy, [5:95] Read-Heavy, [0:100] Read-Only. Ratio labels: 0.90, 6.49, 1.54, 0.93; highlighted: 6.49]
  • 28. 1: Cassandra, N; MyCassandra Cluster. [Figure: Cassandra vs. MyCassandra Cluster; write and read over N nodes (R,W) vs. W, RW, R nodes]
  • 29. 2: Q. A. LRU like cache, Swap, read repair. Q. A. 1) 2) Redis fsync ( )
  • 30. Read-Heavy: 84.9%; 6.49
  • 31. Related work. Index algorithm: FD-Tree: Tree Indexing on Flash Disks, VLDB ’10 (B+tree + LSM-tree, SSD); Fractal-Tree / TokuDB (MySQL). Systems: MySQL: RDBMS; Anvil, SOSP ’09; Cloudy, VLDB ’10; Dynamo, SOSP ’07; MyCassandra
  • 32. : 1. 2. (MySQL + memcached) : MyCassandra Cluster. Web Table:
        movie-id    name     thumb-name    tag                     count
        704122313   movieA   EY37lHk5bgU   sport, soccer, FIFA,    169,374
        704122314   movieB   Zk3BSYMWjzQ   music, jazz, …          472,803
  • 33.
  • 34. : ( ) 5 6 twitter: @MyCassandraJP
  • 35. MyCassandra/MyCassandra Cluster (throughput, latency)
                             Cassandra                1. MyCassandra    2. MyCassandra Cluster
        data model           multi-dimensional map (Column Family)
        throughput           write                    write or read     write and read
        latency              low                      lower in case     lower
        persistence          yes                      yes or no         yes
        consistency          weak (eventual, quorum)
        replication          sync / async
        data partition       row
        node organization    decentralized
  • 36. host (1) 1 /1 node ☓ ☓ storage (2) 1 /k ID [Amazon Dynamo, SOSP ’07] ☓ (3) 1 Fault Tolerance (FT) space, FT space, FT space. 1 storage / 1 node / 1 host (2) (3) (1) virtual node; 1 node / host, k storages / node; k nodes / host, 1 storage / node
  • 37. HDD vs. SSD. [Figure: throughput (qps) of Cassandra and MyCassandra Cluster on HDD and SSD]
        IOZone benchmark   HDD: Western Digital   SSD: Crucial
        seq. write         86,277 qps             96,401 qps
        seq. read          108,914 qps            216,099 qps
        random write       2,485 qps              29,045 qps
        random read        926 qps                21,751 qps