A tunably consistent, highly available, distributed database

     Tom Wilkie @tom_wilkie
  Founder & VP Engineering, Acunu




•   Overview
•   Distribution
•   Storage
•   Datamodel
•   Usecases




•   A distributed database for Big Data
    •   Scale out on commodity servers
    •   Best-of-breed performance
    •   Multi-master architecture, no SPOF
    •   Powerful multi data centre support



BigTable, 2006 + Dynamo, 2007
    → Open sourced, 2008
    → Apache Incubator, 2009
    → Apache top-level project (TLP), 2010
    → v1.0, 2011

BigTable: ...
    •   Simple but powerful datamodel
    •   Write-optimised storage system
    •   Consistent, available but not partition tolerant
    •   Master-slave distribution system, SPOF




                                            http://goo.gl/7T1Ej

Dynamo: ...
    •   Sophisticated distribution system with a tunable trade-off
        between consistency and availability
    •   Overly simple (key-value) datamodel




                                          http://goo.gl/Q80b4

•   Overview
•   Distribution
•   Storage
•   Datamodel
•   Usecases




Distribution: Consistent Hashing
     r1, c1 → v1
     r2, c2 → v2
     r3, c3 → v3

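A minimal sketch of the consistent-hashing idea above, assuming Python. Each row key is hashed to a token and belongs to the first node clockwise from it. Ring, token and owner are hypothetical names; for simplicity node tokens come from hashing node names, whereas Cassandra assigns tokens per node (its RandomPartitioner hashes row keys with MD5, which this sketch borrows, truncated to 32 bits).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a string onto a 32-bit ring (first 4 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hashing ring; node tokens come from hashing node names."""

    def __init__(self, nodes):
        self._tokens = sorted((token(n), n) for n in nodes)

    def owner(self, row_key: str) -> str:
        """Return the first node at or clockwise after the key's token."""
        positions = [t for t, _ in self._tokens]
        i = bisect.bisect_left(positions, token(row_key))
        return self._tokens[i % len(self._tokens)][1]  # wrap past the top

    def add(self, node: str) -> None:
        bisect.insort(self._tokens, (token(node), node))


ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"r{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.owner(k) != before[k]]
# Only keys in node-d's new arc change owner, and they all move to node-d.
assert all(ring.owner(k) == "node-d" for k in moved)
```

Because inserting a token only splits one existing arc, scaling out moves a fraction of the keys instead of reshuffling all of them.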
Distribution: Scaling




Distribution: Replication
     r1, c1 → v1

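Replica placement can be sketched assuming the simple scheme of Cassandra's SimpleStrategy: the first replica goes to the key's owner on the ring and the remaining ones to the next nodes clockwise. The replicas and token helpers are hypothetical, and production clusters often use rack- and data-centre-aware placement (NetworkTopologyStrategy) instead.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a string onto a 32-bit ring (first 4 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def replicas(nodes, row_key: str, n: int):
    """Return the nodes holding row_key: its owner, then the next n-1 clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    positions = [t for t, _ in ring]
    start = bisect.bisect_left(positions, token(row_key))
    n = min(n, len(ring))  # cannot place more replicas than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(n)]


print(replicas(["node-a", "node-b", "node-c", "node-d"], "r1", 3))
```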
Distribution: Consistency
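     Tunable, per-operation consistency
     Timestamped values; with R + W > N, every set of R replicas read
     overlaps every set of W replicas written

     (Diagram: a write acknowledged by W replicas, a read answered by R replicas)
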
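Why R + W > N works can be checked by brute force: if every possible read set shares at least one replica with every possible write set, a read always sees the latest timestamped value. This Python sketch is illustrative only; quorums_overlap and resolve are hypothetical helpers, not Cassandra APIs.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Check by brute force that every R-replica read set meets every W-replica write set."""
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(range(n), r)
        for write_set in combinations(range(n), w)
    )


def resolve(copies):
    """Pick the value with the newest timestamp from (timestamp, value) pairs."""
    return max(copies, key=lambda tv: tv[0])[1]


# The brute-force check agrees with the rule R + W > N for small clusters.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_overlap(n, r, w) == (r + w > n)

# N = 3, W = 2: replicas 0 and 1 took the write, replica 2 is stale.
copies = {0: (2, "new"), 1: (2, "new"), 2: (1, "old")}
# Any R = 2 read includes at least one fresh copy, so "new" wins.
assert all(resolve([copies[i] for i in rs]) == "new" for rs in combinations(range(3), 2))
```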
Distribution: Read Repair




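Read repair can be sketched as: read the key from the replicas, return the copy with the newest timestamp, and write that copy back to any replica holding an older or missing version. This Python toy is an assumption-laden simplification; Cassandra compares digests from the replicas it contacts and may repair in the background, while this hypothetical read_with_repair reads full values from every replica synchronously.

```python
def read_with_repair(replicas, key):
    """Read key from every replica, return the newest value, fix stale copies.

    replicas: list of dicts mapping key -> (timestamp, value); a missing key
    means the replica never saw the write.
    """
    copies = [replica.get(key) for replica in replicas]
    present = [c for c in copies if c is not None]
    if not present:
        return None
    newest = max(present, key=lambda tv: tv[0])
    for replica, copy in zip(replicas, copies):
        if copy != newest:
            replica[key] = newest  # write the newest version back
    return newest[1]


replicas = [{"r1": (1, "old")}, {"r1": (2, "new")}, {}]
assert read_with_repair(replicas, "r1") == "new"
assert all(rep["r1"] == (2, "new") for rep in replicas)
```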
•   Overview
•   Distribution
•   Storage
•   Datamodel
•   Usecases




Writing to Cassandra



     Row Key   Column   Column   Column   Column




Writing to Cassandra
     In the JVM:  Memtable (Row | Column | Column | Column | Column)
     On disk:     Commit log

Writing to Cassandra
     In the JVM:  Full Memtable
     On disk:     Commit log

Writing to Cassandra
     In the JVM:  New Memtable
     On disk:     Commit log, SSTable

Writing to Cassandra

     On disk:     Commit log, 6 SSTables

Writing to Cassandra

     On disk:     Commit log, a single SSTable after compaction

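The write path in the slides above (append to the commit log, update the memtable, flush a full memtable to an immutable SSTable, compact many SSTables into one) can be sketched as a toy in-memory model. This is an illustration, not Cassandra's code: Node, memtable_limit and max_sstables are hypothetical, real flushes are triggered by memory thresholds, and real compaction resolves conflicts by column timestamp rather than by SSTable age.

```python
class Node:
    """Toy write path: commit log + memtable, flushed to sorted SSTables."""

    def __init__(self, memtable_limit=4, max_sstables=4):
        self.commit_log = []  # append-only, sequential writes for durability
        self.memtable = {}    # in memory: (row, column) -> value
        self.sstables = []    # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.max_sstables = max_sstables      # hypothetical compaction trigger

    def write(self, row, column, value):
        self.commit_log.append((row, column, value))  # 1. log the mutation
        self.memtable[(row, column)] = value          # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A full memtable is written out as a new sorted, immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying
        if len(self.sstables) >= self.max_sstables:
            self.compact()

    def compact(self):
        # Merge all SSTables into one; later (newer) tables win on conflicts.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())]


node = Node(memtable_limit=2, max_sstables=3)
for i in range(6):
    node.write("r1", f"c{i}", i)
print(len(node.sstables), node.memtable)
```

Every write is a log append plus an in-memory update, and disk writes happen only as large sequential flushes and merges, which is what makes the design write-optimised.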
Reading from Cassandra




     Off-heap (no GC):  Row cache
     In the JVM:        Memtable, Bloom filter, Key cache, SSTable index
     On disk:           Commit log, SSTable

     Read path: row cache → memtable → Bloom filter → key cache → SSTable index → SSTable

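A simplified sketch of the read path above, assuming Python: check the row cache, then the memtable, then each SSTable newest first, using a small Bloom filter to skip SSTables that cannot contain the key. BloomFilter, SSTable and read are hypothetical names; the key cache and SSTable index are left out, and unlike Cassandra, which merges columns from every source by timestamp, this toy returns the first whole row it finds.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: false positives possible, false negatives not."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array[p] = 1

    def might_contain(self, key):
        return all(self.array[p] for p in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)  # stands in for the on-disk data file
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, row_cache, memtable, sstables):
    """Look up a row, newest source first (no column merging)."""
    if key in row_cache:  # row cache hit: no disk access at all
        return row_cache[key]
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):  # newest SSTable first
        if not table.bloom.might_contain(key):
            continue  # definitely absent: skip the disk seek
        if key in table.rows:
            return table.rows[key]
    return None
```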
•   Overview
•   Distribution
•   Storage
•   Datamodel
•   Usecases




SQL                 Cassandra
     Database            Keyspace
     Table               Column Family
     Rows are identified by a row key and hold columns (row/key: col_1, col_2, ...)

           col1  col2  col3  col4  col5  col6  col7
     row1        x                 x     x
     row2  x     x     x     x     x
     row3        x     x           x     x     x
     row4        x     x     x           x
     row5        x           x     x     x
     row6        x
     row7  x     x           x

alice: {
    m2: {
        Sender: bob,
        Subject: 'paper!', ...
    }
}

bob: {
    m1: {
        Sender: alice,
        Subject: 'rock?', ...
    }
}

charlie: {
    m1: {
        Sender: alice,
        Subject: 'rock?', ...
    },
    m2: {
        Sender: bob,
        Subject: 'paper!', ...
    }
}

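The inbox example above can be modelled as a column family whose rows are sparse maps from column name to value: one row per user, one column per message, with the message's Sender and Subject grouped under it (roughly what the deck calls super-columns). This Python sketch is hypothetical throughout (ColumnFamily, insert, get, deliver); it also assumes plain string ordering in place of a column family's comparator.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name -> value}.

    Rows are sparse (each stores only the columns written to it); get()
    returns columns in sorted name order, standing in for the comparator.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, columns):
        self.rows[row_key].update(columns)

    def get(self, row_key):
        return dict(sorted(self.rows.get(row_key, {}).items()))


def deliver(inbox, message_id, sender, subject, recipients):
    """Denormalised write: copy the message into every recipient's row."""
    for user in recipients:
        inbox.insert(user, {message_id: {"Sender": sender, "Subject": subject}})


inbox = ColumnFamily()
deliver(inbox, "m1", "alice", "rock?", ["bob", "charlie"])
deliver(inbox, "m2", "bob", "paper!", ["alice", "charlie"])
print(inbox.get("charlie"))
```

Writing each message once per recipient trades extra (cheap) writes for reads that touch a single row, which suits Cassandra's write-optimised storage.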
•   Overview
•   Distribution
•   Storage
•   Datamodel
•   Usecases




Perfect for high-velocity data
     •   Web, SCM, Retail
     •   Location Services
     •   Cloud Monitoring
     •   Social Gaming
     •   Social Media
     •   Ad Marketplaces
     •   Fraud Detection
     •   Smart Metering
     •   Oil/Gas Sensors

Wednesday, 25 April 12

Not Covered...
     •   Distribution: Hinted Handoff, Anti-entropy repair,
         Counter distribution
     •   Storage: Counter storage, different compaction
         strategies, partitioning, etc.
     •   Datamodel: de-normalisation, TTLs, secondary
         indexes, CQL, super-columns, optional schemas
     •   Operations: backup, nodetool, performance tuning
     •   Integration: Hadoop, client libraries, etc.

• Distributed, scalable database
• Open source, widely used
• Tunably consistent
• Highly available
• Partition tolerant
• Write-optimised
• Schema-optional

Data Platform
     Data-driven applications       Web UI
     Acunu Analytics                Control Center
     Apache Cassandra
     Acunu Storage Engine
     Configured and tuned OS
     Commodity Hardware

Control Center




“I've had the EC2 instance running for a little while and I
have to say, I'm impressed. You guys have done well with
                      this product.”
                                           - Lloyd, JustDevelopIt
Control Center




“The new UI has been critical in helping us work out
           what is wrong in our code”
                                       - Matt, TellyBug
Castle: Built for Big Data
     •   Storage engine optimised for large slow disks,
         many cores, Big Data workloads
     •   Enterprise density on commodity hardware
     •   Lightning-fast disk rebuilds: 10x faster than RAID
     •   Open source (GPLv2, MIT for user libraries)
         http://goo.gl/gzihe
     •   http://bitbucket.org/acunu
     •   Loadable Kernel Module, targeting CentOS’s 2.6.18 kernel
     •   http://www.acunu.com/blogs/andy-twigg/why-

     Figure: Castle architecture, top to bottom. Userspace: shared memory interface passing keys and values over an async, shared memory ring and shared buffers. Acunu Kernel: kernelspace streaming interface (range queries, key insert, buffered value insert, key get, buffered value get), also used by in-kernel workloads; doubling array mapping layer (insert queues, arrays management, merges, Bloom filters); modlist btree mapping layer (btree, value arrays, version tree); block mapping & caching layer, the "Extent" layer (cache, prefetcher, extent block cache, extent allocator, freespace manager, flusher).

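The "Doubling Arrays" layer in the figure refers to a write-optimised structure of sorted arrays whose sizes grow in powers of two. Below is a generic sketch of that idea in Python, not Castle's implementation: DoublingArray, insert and get are hypothetical names, and the per-level Bloom filters and on-disk layout from the figure are omitted.

```python
import bisect
import heapq
from operator import itemgetter


class DoublingArray:
    """Generic doubling array: level i is empty (None) or a sorted run of 2**i entries.

    Inserting works like binary addition with carries: a new one-entry run is
    merged with each occupied level in turn until it lands on an empty one.
    Lower levels always hold newer data than higher levels.
    """

    def __init__(self):
        self.levels = []

    def insert(self, key, value):
        run = [(key, value)]
        i = 0
        while i < len(self.levels) and self.levels[i] is not None:
            # heapq.merge keeps ties in argument order: older entries first.
            run = list(heapq.merge(self.levels[i], run, key=itemgetter(0)))
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = run

    def get(self, key):
        for run in self.levels:  # newest level first
            if run is None:
                continue
            keys = [k for k, _ in run]  # a real store would keep an index
            j = bisect.bisect_right(keys, key)
            if j and keys[j - 1] == key:
                return run[j - 1][1]  # last duplicate is the newest version
        return None
```

Each entry is merged O(log n) times, always as part of a sequential pass over sorted runs, which is why this family of structures favours large, slow disks over random-write B-trees.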
Rebuild time
     Figure: rebuild time in hours (axis 0 to 5) for RAID10 with 8 disks, RAID5 with 8 disks, RDA with 8 disks and RDA with 15 disks.

Analytics

     Click stream, sensor data, etc. → events → Acunu Analytics → counter updates

     •   Simple, real-time, incremental analytics
     •   Push processing into the ingest phase

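Pushing processing into the ingest phase can be sketched as updating pre-aggregated counters as each event arrives, so a query reads one counter instead of scanning raw events. The sketch assumes Python and uses collections.Counter in place of Cassandra counter columns; the Rollups class and the GRANULARITIES buckets are hypothetical and are not Acunu Analytics' API.

```python
from collections import Counter
from datetime import datetime

# Hypothetical time buckets: each event updates one counter per granularity.
GRANULARITIES = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
}


class Rollups:
    """Pre-aggregate at ingest so queries become counter lookups."""

    def __init__(self):
        self.counters = Counter()  # stands in for Cassandra counter columns

    def ingest(self, timestamp: datetime, event_type: str) -> None:
        for granularity, fmt in GRANULARITIES.items():
            bucket = timestamp.strftime(fmt)
            self.counters[(granularity, bucket, event_type)] += 1

    def count(self, granularity: str, bucket: str, event_type: str) -> int:
        return self.counters[(granularity, bucket, event_type)]


rollups = Rollups()
rollups.ingest(datetime(2012, 4, 25, 10, 15), "click")
rollups.ingest(datetime(2012, 4, 25, 10, 45), "click")
print(rollups.count("hour", "2012-04-25 10", "click"))
```

Rolling up means reading a coarser bucket and drilling down means reading the finer ones; both are maintained at write time, at the cost of one counter increment per granularity per event.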
Questions?
 tom@acunu.com
   @tom_wilkie
 www.acunu.com




Introduction



     Live & historical
       aggregates...




Realtime trends...




Drill downs
     and roll ups


Cons of each solution
     •   Scalability; $$$
     •   Not realtime; inefficient recomputation
     •   Spartan query semantics => complex, DIY solutions


 
Cassandra EU 2012 - Data modelling workshop by Richard Low
Cassandra EU 2012 - Data modelling workshop by Richard LowCassandra EU 2012 - Data modelling workshop by Richard Low
Cassandra EU 2012 - Data modelling workshop by Richard Low
 

Recently uploaded

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Progressive NOSQL: Cassandra

  • 1. A tunably consistent, highly available, distributed database. Tom Wilkie @tom_wilkie, Founder & VP Engineering, Acunu
  • 2. Overview • Distribution • Storage • Datamodel • Usecases
  • 3. Overview • Distribution • Storage • Datamodel • Usecases
  • 4. A distributed database for Big Data • Scale out on commodity servers • Best-of-breed performance • Multi-master architecture, no SPOF • Powerful multi-data-centre support
  • 6. Timeline: BigTable paper, 2006 • Dynamo paper, 2007 • Cassandra open sourced, 2008 • Apache Incubator, 2009 • Apache top-level project (TLP), 2010 • v1.0, 2011
  • 7. BigTable: • Simple but powerful datamodel • Write-optimised storage system • Consistent, available but not partition tolerant • Master-slave distribution system, SPOF http://goo.gl/7T1Ej
  • 8. Dynamo: • Sophisticated distribution system with tradable consistency and availability • Over-simple datamodel http://goo.gl/Q80b4
  • 9. Overview • Distribution • Storage • Datamodel • Usecases
  • 10. Distribution: Consistent Hashing • (r1, c1) → v1 • (r2, c2) → v2 • (r3, c3) → v3
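A minimal sketch of the consistent-hashing lookup this slide illustrates, assuming MD5 tokens (in the style of Cassandra's RandomPartitioner) and a hypothetical `Ring` class in which each node owns the arc of the ring that ends at its token. It is an illustration of the technique, not Cassandra's code.

```python
from __future__ import annotations

import bisect
import hashlib


def token(row_key: str) -> int:
    """Hash a row key onto the ring (MD5 is an assumption for this sketch)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring: each node owns the arc that ends at its token."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        # Keep (token, node) pairs sorted so lookups can binary-search.
        self._entries = sorted((t, node) for node, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def owner_of_token(self, t: int) -> str:
        # First node whose token is >= t; past the last token, wrap to the first.
        i = bisect.bisect_left(self._tokens, t)
        return self._entries[i % len(self._entries)][1]

    def owner(self, row_key: str) -> str:
        return self.owner_of_token(token(row_key))


ring = Ring({"node-a": 2**126, "node-b": 2**127, "node-c": 3 * 2**126})
print(ring.owner("r1"), ring.owner("r2"), ring.owner("r3"))
```

Only the row key is hashed, so every column of a row (r1's c1, and any others) lands on the same node.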
  • 13. Distribution: Scaling [diagram]
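To illustrate why scaling out is cheap under consistent hashing, the sketch below (toy integer tokens, hypothetical helpers `owner` and `moved_keys`) counts which keys change owner when a node joins: only the keys in the arc the new node takes over from its neighbour move.

```python
from __future__ import annotations

import bisect


def owner(ring: dict[int, str], key_token: int) -> str:
    """Node owning key_token: first node token >= key_token, wrapping around."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, key_token)
    return ring[tokens[i % len(tokens)]]


def moved_keys(before: dict[int, str], after: dict[int, str], key_tokens) -> list[int]:
    """Key tokens whose owning node differs between two ring layouts."""
    return [k for k in key_tokens if owner(before, k) != owner(after, k)]


# Toy ring with integer tokens 0..299; node "d" joins halfway between "b" and "c".
before = {0: "a", 100: "b", 200: "c"}
after = {**before, 150: "d"}
moved = moved_keys(before, after, range(300))
print(len(moved), "of 300 keys move, all from c to d")
```

In this toy, 50 of 300 keys move. With naive `hash(key) % num_nodes` placement, going from three nodes to four would move most keys.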
  • 16. Distribution: Replication • (r1, c1) → v1
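Replication extends the same ring walk. The sketch below picks N replicas for a key as the owning node plus the next distinct nodes clockwise, roughly the placement rule of Cassandra's SimpleStrategy. The ring layout and the `replicas` helper are assumptions for illustration; rack- and data-centre-aware placement is not modelled.

```python
from __future__ import annotations

import bisect


def replicas(ring: dict[int, str], key_token: int, n: int) -> list[str]:
    """The owning node plus the next distinct nodes clockwise, up to n in total."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, key_token)
    chosen: list[str] = []
    for step in range(len(tokens)):
        node = ring[tokens[(start + step) % len(tokens)]]
        if node not in chosen:  # skip repeats if a node holds several tokens
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


ring = {0: "a", 100: "b", 200: "c", 300: "d"}
print(replicas(ring, 150, 3))  # ['c', 'd', 'a']
```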
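  • 18. Distribution: Consistency • Tunable, per-operation consistency • Timestamped values • R + W > N, where N is the replication factor, W the number of replicas that must acknowledge a write and R the number that must answer a read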
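The R + W > N rule is a pigeonhole argument: if R + W > N, any R replicas that answer a read must include at least one of the W replicas that acknowledged the latest write, and the timestamps let the reader pick that newest value. The brute-force check below is a small sketch of that argument (hypothetical helpers `always_overlap` and `resolve`, timestamps assumed to be comparable integers), not how Cassandra evaluates consistency levels.

```python
from __future__ import annotations

from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every set of r read replicas shares a node with every set of w write replicas."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def resolve(responses: list[tuple[int, str]]) -> str:
    """Return the value carrying the newest timestamp among (timestamp, value) replies."""
    return max(responses, key=lambda reply: reply[0])[1]


# Brute-force check for small clusters: overlap is guaranteed exactly when R + W > N.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert always_overlap(n, r, w) == (r + w > n)

print(resolve([(1, "old"), (5, "new"), (3, "stale")]))  # new
```

For example, with N = 3, QUORUM reads and writes (R = W = 2) satisfy the rule, while ONE for both (R = W = 1) does not.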
  • 23. Overview • Distribution • Storage • Datamodel • Usecases
  • 24. Writing to Cassandra: a row key followed by its columns
  • 25. Writing to Cassandra: the row is appended to the commit log on disk and written into the memtable in the JVM
  • 26. Writing to Cassandra: eventually the memtable in the JVM is full
  • 27. Writing to Cassandra: the full memtable is flushed to disk as an SSTable and a new memtable takes its place
  • 28. Writing to Cassandra: over time, SSTables accumulate on disk
  • 29. Writing to Cassandra: compaction merges the SSTables on disk into one
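The write path on these slides can be modelled with a toy, single-process store: append to a commit log, update the memtable, flush a full memtable to an immutable sorted SSTable, and compact SSTables by merging them. Everything here (`ToyStore`, the entry-count flush threshold, newest-SSTable-wins resolution) is an assumption for illustration; real Cassandra flushes by memory size, resolves conflicts by column timestamp, and keeps the log and SSTables on disk.

```python
from __future__ import annotations


class ToyStore:
    """Toy model of the commit log / memtable / SSTable write path (not Cassandra's code)."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: list[tuple[str, object]] = []  # stands in for the on-disk log
        self.memtable: dict[str, object] = {}  # mutable, in memory
        self.sstables: list[list[tuple[str, object]]] = []  # immutable, sorted, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: object) -> None:
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value  # 2. then update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the full memtable out as a sorted, immutable SSTable and start afresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Every logged write is now in an SSTable, so this toy can discard the log.
        self.commit_log = []

    def compact(self) -> None:
        # Merge all SSTables into one; later (newer) tables overwrite older values.
        merged: dict[str, object] = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())]

    def read(self, key: str) -> object | None:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:  # linear scan for brevity; real SSTables are indexed
                if k == key:
                    return v
        return None


store = ToyStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    store.write(key, value)
print(len(store.sstables), store.memtable)  # 2 {'d': 5}
store.compact()
print(store.sstables)  # [[('a', 3), ('b', 2), ('c', 4)]]
```

Every write is a sequential append plus an in-memory update, which is why this design is write-optimised; the cost moves to reads, which may consult several SSTables until compaction merges them.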
  • 31. [Diagram: the read path, numbered steps 1 to 6, through the memtable in the JVM, the off-heap row cache (no GC), and each SSTable's Bloom filter, key cache and index, down to the SSTable on disk; commit log on disk]
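Of the read-path structures on this slide, the Bloom filter is the one that lets a read skip SSTables that cannot contain the requested key. The sketch below is a generic Bloom filter (the bit count, hash count and SHA-256-derived positions are arbitrary assumptions, not Cassandra's parameters) plus a hypothetical `read` helper that consults it before touching each SSTable.

```python
from __future__ import annotations

import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting one hash function with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p] for p in self._positions(key))


def read(key: str, sstables: list[tuple[BloomFilter, dict[str, object]]]) -> object | None:
    """Search (filter, data) pairs newest first, skipping tables the filter rules out."""
    for bloom, data in sstables:
        if not bloom.might_contain(key):
            continue  # in a real store this saves a disk seek
        if key in data:  # may still miss: the filter can give false positives
            return data[key]
    return None


older, newer = BloomFilter(), BloomFilter()
older.add("alice")
newer.add("alice")
newer.add("bob")
tables = [(newer, {"alice": "v2", "bob": "v1"}), (older, {"alice": "v1"})]
print(read("alice", tables), read("carol", tables))  # v2 None
```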
  • 32. Overview • Distribution • Storage • Datamodel • Usecases
  • 33. SQL vs Cassandra: Database → Keyspace • Table → Column Family • Each row is identified by its key and holds its own set of columns
  • 34. [Diagram: sparse grid of rows row1 to row7 against columns col1 to col7; each row holds values (x) in only some of the columns]
  • 35. alice: { m2: { Sender: bob, Subject: ‘paper!’, ... } } bob: { m1: { Sender: alice, Subject: ‘rock?’, ... } } charlie: { m1: { Sender: alice, Subject: ‘rock?’, ... }, m2: { Sender: bob, Subject: ‘paper!’, ... } }
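The mailbox example on slide 35 is a denormalised design: each message is copied into the row of every recipient, so reading a user's inbox is a single-row lookup. The sketch below models the keyspace → column family → row → column nesting with plain Python dicts; the `inbox` column family name and the `send` helper are hypothetical, and no Cassandra client API is implied.

```python
from __future__ import annotations

from collections import defaultdict

# keyspace -> column family -> row key -> {column name: value}
keyspace: dict[str, defaultdict] = {"inbox": defaultdict(dict)}


def send(message_id: str, sender: str, subject: str, recipients: list[str]) -> None:
    """Denormalise on write: copy the message into every recipient's row."""
    for user in recipients:
        keyspace["inbox"][user][message_id] = {"Sender": sender, "Subject": subject}


send("m1", "alice", "rock?", ["bob", "charlie"])
send("m2", "bob", "paper!", ["alice", "charlie"])

# Each row carries only the columns it needs, as in the sparse grid on slide 34.
print(sorted(keyspace["inbox"]["charlie"]))  # ['m1', 'm2']
```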
  • 36. Overview • Distribution • Storage • Datamodel • Usecases
  • 37. Perfect for high-velocity data: Web, SCM, Retail • Location Services • Cloud Monitoring • Social Gaming • Social Media • Ad Marketplaces • Fraud Detection • Smart Metering • Oil/Gas Sensors (Confidential, Wednesday, 25 April 12)
  • 38. Not Covered... • Distribution: Hinted Handoff, Anti-entropy repair, Counter distribution • Storage: Counter storage, different compaction strategies, partitioning etc • Datamodel: de-normalisation, TTLs, secondary indexes, CQL, super-columns, schema optional • Operations: backup, nodetool, performance tuning • Integration: Hadoop, Client Libraries etc
  • 39. • Distributed, scalable database • Open source, widely used • Tunably consistent • Highly available • Partition tolerant • Write-optimised • Schema-optional
  • 41. Data Platform: Data-driven applications • Web UI • Acunu Analytics • Control Center • Apache Cassandra • Acunu Storage Engine • Configured and tuned OS • Commodity Hardware
  • 42. Control Center “I've had the EC2 instance running for a little while and I have to say, I'm impressed. You guys have done well with this product.” - Lloyd, JustDevelopIt
  • 43. Control Center “The new UI has been critical in helping us work out what is wrong in our code” - Matt, TellyBug
  • 44. Castle: Built for Big Data • Storage engine optimized for large slow disks, many cores, Big Data workloads • Enterprise density on commodity hardware • Lightning disk rebuilds: 10x faster than RAID • Open source (GPLv2, MIT for user libraries) • Loadable Kernel Module, targeting CentOS’s 2.6.18 kernel • Links: http://goo.gl/gzihe • http://bitbucket.org/acunu • http://www.acunu.com/blogs/andy-twigg/why- [Architecture diagram: userspace (user libraries, shared memory interface, streaming interface, key/value insert and get, range queries); kernelspace (in-kernel async shared-memory ring, shared buffers, doubling arrays with Bloom filters, merges and queues, modlist btree mapping layer, version tree, block mapping & caching layer with prefetcher and flusher, "extent" layer with freespace allocator and extent manager)]
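The Castle diagram names doubling arrays as a core index structure. The sketch below shows the general idea under stated assumptions: level k is either empty or one sorted array of 2**k entries, an insert merges full levels downward like carries in a binary counter, and a lookup checks the newest (smallest) levels first. It omits Castle's Bloom filters, versioning and on-disk layout and is not Acunu's implementation; `DoublingArray` is a hypothetical name.

```python
from __future__ import annotations

import bisect
from heapq import merge


class DoublingArray:
    """Toy doubling array: levels[k] is None or a sorted list of 2**k (key, value) pairs."""

    def __init__(self) -> None:
        self.levels: list[list[tuple[str, object]] | None] = []

    def insert(self, key: str, value: object) -> None:
        carry = [(key, value)]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append(None)
            if self.levels[k] is None:
                self.levels[k] = carry  # empty slot: the carry settles here
                return
            # Occupied: merge with the (older) resident array and carry the result down.
            # heapq.merge is stable, so for equal keys the newer carry entry comes first.
            carry = list(merge(carry, self.levels[k], key=lambda kv: kv[0]))
            self.levels[k] = None
            k += 1

    def get(self, key: str) -> object | None:
        for level in self.levels:  # smaller levels hold newer entries
            if not level:
                continue
            # (key,) sorts just before any (key, value) pair, so this finds the first match.
            i = bisect.bisect_left(level, (key,))
            if i < len(level) and level[i][0] == key:
                return level[i][1]
        return None


da = DoublingArray()
for i in range(10):
    da.insert(f"k{i}", i)
print([len(level) if level else 0 for level in da.levels])  # [0, 2, 0, 8]
da.insert("k3", 99)  # an overwrite just lands in level 0
print(da.get("k3"), da.get("k5"), da.get("nope"))  # 99 5 None
```

Overwritten entries are never discarded in this toy, which keeps every level at exactly 2**k entries; a real store would drop shadowed versions during merges.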
  • 46. Rebuild time [Chart: Rebuild Time (Hours), axis 0 to 5, for RAID10, 8 Disks; RAID5, 8 Disks; RDA, 8 Disks; RDA, 15 Disks]
  • 47. Analytics [Diagram: click stream events, sensor data, etc. → Acunu Analytics → counter updates] • Simple, real-time, incremental analytics • Push processing into the ingest phase
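"Push processing into the ingest phase" means updating every aggregate a query will need at the moment each event arrives, so reads just fetch precomputed counters. The sketch below keeps hourly and daily counters per page in a Python `Counter`, standing in for what would be counter columns in Cassandra; the bucket formats, the `ingest` helper and the per-page dimension are assumptions, not Acunu Analytics' schema.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

counters: Counter = Counter()


def ingest(event_time: datetime, page: str) -> None:
    """Increment every (time bucket, dimension) counter this event contributes to."""
    for bucket in (event_time.strftime("%Y-%m-%d"), event_time.strftime("%Y-%m-%dT%H")):
        counters[(bucket, "all")] += 1  # total across all pages
        counters[(bucket, page)] += 1  # drill-down by page


ingest(datetime(2012, 4, 25, 9, 15), "/home")
ingest(datetime(2012, 4, 25, 9, 40), "/about")
ingest(datetime(2012, 4, 25, 10, 5), "/home")

# Queries are now single lookups; hourly buckets roll up to the daily total.
print(counters[("2012-04-25", "all")], counters[("2012-04-25T09", "all")])  # 3 2
```

The trade-off is more writes per event (four counter increments here) in exchange for reads that need no recomputation, which suits a write-optimised store.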
  • 48. Questions? tom@acunu.com • @tom_wilkie • www.acunu.com
  • 49. Introduction: Live & historical aggregates...
  • 51. Drill downs and roll ups
  • 52. Solution vs. Con: Scalability, $$$ • Not realtime • Inefficient recomputation • Spartan query semantics => complex, DIY solutions