Conceptual Modeling Differences From A RDBMS
   Matthew F. Dennis, DataStax // @mdennis


Austin MySQL User Group
January 11, 2012
Cassandra Is Not Relational
get out of the relational mindset when working
  with Cassandra (or really any NoSQL DB)
Work Backwards From Queries
   Think in terms of queries, not in terms of
normalizing the data; in fact, you often want to
  denormalize (already common in the data
    warehousing world, even in RDBMS)
OK great, but how do I do that?
Well, you need to know how Cassandra models
          data (it is modeled on Google Bigtable)

   research.google.com/archive/bigtable-osdi06.pdf



   Go Read It!
In Cassandra:

data is organized into Keyspaces (usually one per app)
➔




each Keyspace can have multiple Column Families
➔




each Column Family can have many Rows
➔




each Row has a Row Key and a variable number of Columns
➔




each Column consists of a Name, Value and Timestamp
➔
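The hierarchy above can be pictured as nested maps. The following is a minimal sketch only, not how Cassandra stores anything internally; the keyspace name "MyApp", the column family name "Users" and the helper get_column are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the client at write time


# keyspace -> column family -> row key -> {column name: Column}
# "MyApp" and "Users" are hypothetical names for this sketch.
cluster = {
    "MyApp": {
        "Users": {
            "thepaul": {
                "office": Column("office", "Austin", 1),
                "OS": Column("OS", "OSX", 1),
                "twitter": Column("twitter", "thepaul0", 1),
            },
            # Rows in the same column family need not have the same columns.
            "thobbs": {
                "office": Column("office", "Austin", 1),
                "twitter": Column("twitter", "tylhobbs", 1),
            },
        }
    }
}


def get_column(keyspace, column_family, row_key, name):
    """Walk the hierarchy down to a single column."""
    return cluster[keyspace][column_family][row_key][name]


print(get_column("MyApp", "Users", "thepaul", "OS").value)  # prints OSX
```

Note that the two rows carry a different number of columns, which is the "variable number of Columns" point from the list above.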
In Cassandra, Keyspaces:
are similar in concept to a “database” in some RDBMSs
➔




are stored in separate directories on disk
➔




are usually one-to-one with applications
➔




are usually the administrative unit for things related to ops
➔




contain multiple column families
➔
In Cassandra, In Keyspaces, Column Families:
   ➔ are similar in concept to a “table” in most RDBMSs

   ➔ are stored in separate files on disk (many per CF)

   ➔ are usually approximately one-to-one with query type

   ➔ are usually the administrative unit for things related to your data

   ➔ can contain many (~billion* per node) rows




* for a good sized node
(you can always add nodes)
In Cassandra, In Keyspaces, In Column Families ...
Rows

 thepaul   office: Austin      OS: OSX          twitter: thepaul0


 mdennis    office: UA         OS: Linux        twitter: mdennis


  thobbs   office: Austin   twitter: tylhobbs




Row Keys
thepaul   office: Austin       OS: OSX          twitter: thepaul0


mdennis    office: UA          OS: Linux        twitter: mdennis


thobbs    office: Austin    twitter: tylhobbs




                           Columns
Column Names

thepaul   office: Austin      OS: OSX          twitter: thepaul0


mdennis    office: UA         OS: Linux        twitter: mdennis


thobbs    office: Austin   twitter: tylhobbs
Column Values

thepaul   office: Austin      OS: OSX          twitter: thepaul0


mdennis    office: UA         OS: Linux        twitter: mdennis


thobbs    office: Austin   twitter: tylhobbs
thepaul   office: Austin       OS: OSX          twitter: thepaul0


mdennis    office: UA          OS: Linux        twitter: mdennis


thobbs    office: Austin    twitter: tylhobbs




                           Rows Are Randomly Ordered
                             (if using the RandomPartitioner)
thepaul   office: Austin           OS: OSX          twitter: thepaul0


mdennis    office: UA              OS: Linux        twitter: mdennis


thobbs    office: Austin        twitter: tylhobbs




                  Columns Are Ordered by Name
                           (by a configurable comparator)
Columns are ordered because
 doing so allows very efficient
implementations of useful and
     common operations

        (e.g. merge joins)
In particular, within a row I can
find given columns by name very
quickly (ordered names => O(log n)
           binary search).
More importantly, I can query for a
      slice between a start and end

                 Row Key

RK   ts0   ts1   ...   ...   tsM ...   ...   ...   ...   tsN ...   ...   ...   ...   ...


 start                                                                         end
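Both operations, lookup by name and a slice between a start and an end, fall out of keeping the column names sorted. A minimal in-memory sketch, assuming integer timestamps as column names and made-up prices; the Row class is hypothetical and only stands in for the ordering Cassandra maintains:

```python
import bisect


class Row:
    """One row whose columns are kept sorted by column name."""

    def __init__(self):
        self.names = []   # sorted column names
        self.values = []  # values, parallel to self.names

    def insert(self, name, value):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            self.values[i] = value  # overwrite an existing column
        else:
            self.names.insert(i, name)
            self.values.insert(i, value)

    def get(self, name):
        # Binary search: O(log n) comparisons because names are ordered.
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.values[i]
        return None

    def slice(self, start, end):
        # Two binary searches find the bounds; everything between is contiguous.
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, end)
        return list(zip(self.names[lo:hi], self.values[lo:hi]))


row = Row()
for ts, price in [(3, "25.30"), (0, "25.20"), (2, "25.27"), (1, "25.25")]:
    row.insert(ts, price)  # arrival order does not matter

print(row.slice(1, 2))  # prints [(1, '25.25'), (2, '25.27')]
```

The slice touches only the columns between the two bounds, which is the property the rest of the talk builds on.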
Why does that matter?
Because columns within a row aren't static!
The Column Name Can Be Part of Your Data

  INTC     ts0: $25.20         ts1: $25.25             ...


  AMR       ts0: $6.20          ts9: $0.26             ...


  CRDS      ts0: $1.05          ts5: $6.82             ...




                  Columns Are Ordered by Name
                   (in this case by a TimeUUID Comparator)
Turns Out That Pattern Comes Up A Lot
  ➔ stock ticks
  ➔ event logs

  ➔ ad clicks/views

  ➔ sensor records

  ➔ access/error logs

  ➔ plane/truck/person/“entity” locations

  ➔…
OK, but I can do that in SQL
Not efficiently at scale, at least not easily ...
How it Looks In a RDBMS
                    ticker   timestamp   bid   ask   ...
                    AMR      ts0         ...   ...   ...
                    ...      ...         ...   ...   ...
                    CRDS     ts0         ...   ...   ...
                    ...      ...         ...   ...   ...
Data I Care About   ...      ts0         ...   ...   ...
                    AMR      ts1         ...   ...   ...
                    ...      ...         ...   ...   ...
                    ...      ...         ...   ...   ...
                    …        ts1         ...   ...   ...
                    AMR      ts2         ...   ...   ...
                    ...      ts2         ...   ...   ...
How it Looks In a RDBMS
             ticker     timestamp   bid   ask   ...
             AMR        ts0         ...   ...   ...



                      Larger Than Your Page Size
Disk Seeks
             AMR        ts1         ...   ...   ...


                      Larger Than Your Page Size

             AMR        ts2         ...   ...   ...
             ...        ts2         ...   ...   ...
OK, but what about ...
PostgreSQL Cluster Command?
➔




MySQL Clustered Indexes?
➔




Oracle Index Organized Tables?
➔




SQLServer Clustered Index?
➔
OK, but what about ...
PostgreSQL Cluster Using?
➔




         Meh ...
MySQL [InnoDB] Clustered Indexes?
➔




Oracle Index Organized Table?
➔




SQLServer Clustered Index?
➔

    (seriously, who uses SQLServer?!)
The on-disk management of that
        clustering results in tons of IO …

In the case of PostgreSQL:

clustering is a one time operation
➔

    (implies you must periodically rewrite the entire table)

new data is *not* written in clustered order
➔

    (which is often the data you care most about)
OK, so just partition the tables ...
Not a bad idea, except in MySQL there is a limit of
 1024 partitions and generally fewer if using NDB

 (you should probably still do it if using MySQL though)

  http://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html
OK fine, I agree that storing data that is queried
       together on disk together is a good thing, but
          what does that have to do with modeling?


        Seek To Here


 RK    ts0   ts1   ...   ...   tsM ...   ...   ...   ...   tsN ...   ...   ...   ...   ...



                                  Read Precisely My Data *



* more on some caveats later
Well, that's what is meant by “work backwards
from your queries” or “think in terms of queries”

(NB: this concept, in general, applies to RDBMS
 at scale as well; it is not specific to Cassandra)
An Example From Fraud Detection
  To calculate risk it is common to need to know all the
 emails, destinations, origins, devices, locations, phone
numbers, et cetera ever used for the account in question
In a normalized model that usually translates to a
          table for each type of entity being tracked

                id          name         ...           id          device         ...
                1           guy          ...           1000        0xdead         ...
                2           gal          ...           2000        0xb33f         ...
                ...         ...          ...           ...         ...            ...


id       dest         ...          id          email         ...            id          origin    ...
15       USA          ...          100         guy@          ...            150         USA       ...
25       Finland      ...          200         gal@          ...            250         Nigeria   ...
...      ...          ...          ...         ...           ...            ...         ...       ...
The problem is that at scale that also means
        a disk seek for each one …
    (even for perfect IOT et al if across multiple tables)




➔Previous emails? That's a seek …
➔Previous devices? That's a seek …

➔Previous destinations? That's a seek ...
But In Cassandra I Store The Data I Query
           Together On Disk Together
               (remember, column names need not be static)


  Data I Care About

acctY    ...          ...          ...       ...        ...      ...         ...
acctX    dest21       dev2         dev7        email3   email9   orig4       ...
acctZ    ...          ...          ...       ...        ...      ...         ...



                            email:cassandra@mailinator.com = dateEmailWasLastUsed




                            Column Name                                  Column Value
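One way to picture the account row above: the entity type becomes a prefix of the column name, so every email (or device, or destination) for an account sits in one contiguous run of the same row. This is a sketch under assumptions: the "type:value" naming follows the email example on the slide, while the other column names, the dates and the helper columns_with_prefix are invented for illustration.

```python
import bisect

# All columns of one account row (hypothetical data).
# Column name = "<entity type>:<entity>", column value = date last used.
columns = {
    "dest:Finland": "2011-11-02",
    "dev:0xdead": "2011-12-24",
    "email:cassandra@mailinator.com": "2012-01-05",
    "email:other@example.com": "2011-10-30",
    "orig:USA": "2011-11-02",
}
names = sorted(columns)  # stand-in for the comparator's on-disk ordering


def columns_with_prefix(prefix):
    """Return (entity, last_used) for every column whose name starts with prefix."""
    start = bisect.bisect_left(names, prefix)  # jump to the first candidate
    result = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # sorted order: once the prefix stops matching, we are done
        result.append((name[len(prefix):], columns[name]))
    return result


print(columns_with_prefix("email:"))
```

Every entity type for the account is answered from the same row, so the emails, devices and destinations no longer each cost a trip to a separate table.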
Don't treat Cassandra (or any DB) as a black box
  ➔Understand how your DBs (and data structures) work

  ➔Understand the building blocks they provide

  ➔Understand the work complexity (“big O”) of queries

  ➔For data sets > memory, goal is to minimize seeks *




* on a related note, SSDs are awesome
Q?
(then brief intermission)
Availability Has Many Levels
➔   Component Failure (disk)

➔   Machine Failure (NIC, cpu, power supply)

➔   Site Failure (UPS, power grid, tornado)

➔   Political Failure (war, coup)
The Common Theme In The Solutions?

            Replication
Replication In Cassandra Follows The
           Dynamo Model *
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html




      Read It!
Every Node Has A Token
       0 - 2^127
                t0




t3   t0 < t1 < t2 < t3 < 2^127   t1




                t2
Row Key Determines Node(s)
MD5(RK) => T
                               t0


                    t3 < T < 2^127


               t3                    t1




                               t2
Row Key Determines Node
MD5(RK) => T                         First Replica
                               t0


                    t3 < T < 2^127


               t3                       t1




                               t2
Walk The Ring To Find Subsequent Replicas *
   MD5(RK) => T                         First Replica
                                  t0


                       t3 < T < 2^127


                  t3                       t1




                                                 Second Replica


                                  t2
* by default
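The ring walk can be sketched in a few lines. This is a simplified illustration, not Cassandra's code: reducing the MD5 digest modulo 2^127 is an assumption made to keep the token in range, and the function names token_for and replicas_for are hypothetical. It covers only the default "walk the ring" placement shown above.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127


def token_for(row_key):
    """MD5(RK) => T, reduced into the ring's range (a simplification)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(token, node_tokens, replication_factor):
    """First replica is the first node token >= T, wrapping; then walk the ring."""
    ring = sorted(node_tokens)
    first = bisect.bisect_left(ring, token) % len(ring)  # wrap past the last node
    count = min(replication_factor, len(ring))
    return [ring[(first + i) % len(ring)] for i in range(count)]


# Four nodes t0 < t1 < t2 < t3, evenly spaced around the ring.
t0, t1, t2, t3 = [i * RING_SIZE // 4 for i in range(4)]

# A token with t3 < T < 2^127 wraps around to t0, then t1, as on the slides.
T = t3 + 1
print(replicas_for(T, [t0, t1, t2, t3], 2) == [t0, t1])  # prints True
```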
Writes Happen In Parallel To All Replicas
                                          First Replica
    client                       t0

                 RK= ...
                                RK= ...



                           t3                t1
                                RK= ...




                                                   Second Replica


Coordinator                      t2
(not a master)
Some Or All Replicas Respond
                                                 First Replica
    client                                 t0

             RK= ...
                                   “ok”


                               X
                        t3                          t1
                                          “ok”




                                                          Second Replica


Coordinator Waits For Ack(s)               t2
 From Destination Node(s)
The Coordinator Responds To Client
                                                 First Replica
    client                                 t0

             “ok”
                                   “ok”


                               X
                        t3                          t1
                                          “ok”




                                                          Second Replica


Coordinator Waits For Ack(s)               t2
 From Destination Node(s)
What Nodes Can Be A Coordinator?

The coordinator for any given read or
write is really just whatever node the
 client connected to for that request

any node for any request at any time
How Many Replicas Does The
       Coordinator Wait For?


configurable, per query
➔




ONE / QUORUM are the most common
➔

(more on this in a moment)
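As a rough sketch of what "how many replicas" means per consistency level: the function names below are hypothetical, and ALL is included only for comparison. QUORUM is a majority of the replicas, which is what makes a quorum read overlap a quorum write.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica responses the coordinator waits for."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency_level}")


def request_succeeds(acks_received, consistency_level, replication_factor):
    return acks_received >= required_acks(consistency_level, replication_factor)


def read_overlaps_write(write_cl, read_cl, replication_factor):
    """True when every read set must share at least one replica with every write set."""
    w = required_acks(write_cl, replication_factor)
    r = required_acks(read_cl, replication_factor)
    return w + r > replication_factor


# With three replicas, QUORUM tolerates one unavailable replica.
print(request_succeeds(2, "QUORUM", 3))            # prints True
print(read_overlaps_write("QUORUM", "QUORUM", 3))  # prints True
print(read_overlaps_write("ONE", "ONE", 3))        # prints False
```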
Writing At CL.One


                                First Replica
client                    t0




             t3                  t1


                      X
                                       Second Replica
                          t2          Third
                                      Replica

   Wait For At Least One Node
         (eventually all nodes get updates)
Writing At CL.One


                                 First Replica
client                    t0

         “ok”

                          “ok”
                t3                t1


                      X
                                        Second Replica
                          t2           Third
                                       Replica

   Wait For At Least One Node
         (eventually all nodes get updates)
Reading At CL.One


                              First Replica
client                  t0




           t3                  t1


                    X
                                     Second Replica
                        t2          Third
                                    Replica

   Wait For At Least One Node
         (so you might read stale data)
Reading At CL.One


                                   First Replica
client                        t0

         “old”
                      “old”

                 t3                 t1


                          X
                                          Second Replica
                              t2         Third
                                         Replica

   Wait For At Least One Node
          (so you might read stale data)
Writing At CL.Quorum


                                First Replica
client                    t0




             t3                  t1


                      X
                                       Second Replica
                          t2          Third
                                      Replica

    Wait For Majority Of Nodes
         (eventually all nodes get updates)
Writing At CL.Quorum


                                       First Replica
client                          t0

         “ok”        “ok”

                                “ok”
                t3                      t1


                            X
                                              Second Replica
                                t2           Third
                                             Replica

    Wait For Majority Of Nodes
         (eventually all nodes get updates)
Reading At CL.Quorum


                                First Replica
client                   t0


                     X

              t3                 t1



                                       Second Replica
                         t2           Third
                                      Replica

    Wait For Majority Of Nodes
         (majority => overlap => consistent)
Reading At CL.Quorum


                                                  First Replica
              client                       t0

                       “ok”
                                      X
                                           “ok”
                              t3                   t1


                                   “old”
coordinator chooses the client response based on
 the client-supplied per-column timestamp (TS)
                                                         Second Replica
                                           t2           Third
                                                        Replica

                  Wait For Majority Of Nodes
                       (majority => overlap => consistent)
Reading At CL.Quorum


                                               First Replica
               client                t0


                               X
Already Has
 Response                 t3                    t1

                                   “current”

                                                      Second Replica
                                     t2              Third
                                                     Replica

              Read Repair Updates Stale Nodes
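The last two slides can be sketched as one small function: pick the response with the newest client-supplied timestamp, and note which replicas answered with something older so they can be repaired. The function name resolve, the replica labels and the timestamps are assumptions for illustration only.

```python
def resolve(responses):
    """responses maps replica -> (value, timestamp).

    Returns the winning (value, timestamp) and the replicas that returned
    an older timestamp and therefore need a read repair.
    """
    winner = max(responses.values(), key=lambda vt: vt[1])
    stale = [replica for replica, vt in responses.items() if vt[1] < winner[1]]
    return winner, stale


# Two of three replicas answered; the third is down or slow.
responses = {"t0": ("current", 20), "t2": ("old", 10)}

winner, stale = resolve(responses)
print(winner[0])  # prints current
print(stale)      # prints ['t2']

# Read repair: write the winning column back to each stale replica.
repairs = {replica: winner for replica in stale}
```

The client gets its answer as soon as the winner is known; in this sketch the repair writes are simply collected rather than sent.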
On A Side Note, A Lost Response

                    t0


             “ok”


         X
    t3
Is The Same As A Lost Request

                                             t0



                                         X
                              RK = ...


                       t3




* With Regard To Meeting Consistency
Which Is The Same As A Failed/Slow Node


                                         X
                                         t0




                              RK = ...


                       t3




* With Regard To Meeting Consistency
In fact, it is actually impossible for the originator
       to reliably distinguish among the three
One More Important Piece:

                writes are idempotent *



* except with the counter API, but if you want that it can be done
Why is that important?
    It means we can replay/retry writes, even late
     and/or out of order, and get the same results

After/during node failures
➔




After/during network partitions
➔




After/during upgrades
➔
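A small sketch of why timestamped writes can be replayed safely: each write carries its own timestamp, and a column keeps whichever write has the highest one. Applying the same writes again, late, or in a different order converges to the same row. The apply_write and replay helpers are hypothetical, and breaking timestamp ties by comparing values is an assumption of this sketch made only to keep the outcome deterministic.

```python
import itertools


def apply_write(row, name, value, timestamp):
    """Keep the write with the highest timestamp (ties broken by value)."""
    current = row.get(name)
    if current is None or (timestamp, value) > (current[1], current[0]):
        row[name] = (value, timestamp)


def replay(sequence):
    row = {}
    for name, value, timestamp in sequence:
        apply_write(row, name, value, timestamp)
    return row


writes = (
    ("office", "Austin", 1),
    ("office", "UA", 2),  # conflicting update to the same column, newer timestamp
    ("OS", "Linux", 1),
)

# Every ordering, each with its first write retried at the end as a duplicate.
results = [replay(p + p[:1]) for p in itertools.permutations(writes)]

print(all(r == results[0] for r in results))  # prints True
print(results[0]["office"])                   # prints ('UA', 2)
```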
In other words you can concurrently issue
  conflicting updates to two different nodes while
those nodes have no communication between them
Which is important because ...
Availability Has Many Levels
➔   Component Failure (disk)

➔   Machine Failure (NIC, cpu, power supply)

➔   Site Failure (UPS, power grid, tornado)

➔   Political Failure (war, coup)
If you care about global availability you must
serve reads and writes from multiple data centers

           There is no way around this
Q?
A Brief Rant On Query Planners, Garbage
  Collectors, Virtual Memory, Automatic
   Transmissions and Data Structures

More Related Content

What's hot

Senten500.c
Senten500.cSenten500.c
Senten500.c
albertinous
 
Introduction to Rust
Introduction to RustIntroduction to Rust
Introduction to Rust
Jean Carlo Machado
 
Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)
nikomatsakis
 
8 - OOP - Syntax & Messages
8 - OOP - Syntax & Messages8 - OOP - Syntax & Messages
8 - OOP - Syntax & Messages
The World of Smalltalk
 
11 bytecode
11 bytecode11 bytecode
Better Web Clients with Mantle and AFNetworking
Better Web Clients with Mantle and AFNetworkingBetter Web Clients with Mantle and AFNetworking
Better Web Clients with Mantle and AFNetworking
Guillermo Gonzalez
 
Cassandra data structures and algorithms
Cassandra data structures and algorithmsCassandra data structures and algorithms
Cassandra data structures and algorithms
Duyhai Doan
 
Windows 10 Nt Heap Exploitation (Chinese version)
Windows 10 Nt Heap Exploitation (Chinese version)Windows 10 Nt Heap Exploitation (Chinese version)
Windows 10 Nt Heap Exploitation (Chinese version)
Angel Boy
 
Apache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and PerformanceApache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and Performance
aaronmorton
 
Apache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryApache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machinery
Andrey Lomakin
 
Rust "Hot or Not" at Sioux
Rust "Hot or Not" at SiouxRust "Hot or Not" at Sioux
Rust "Hot or Not" at Sioux
nikomatsakis
 
Dynamic C++ ACCU 2013
Dynamic C++ ACCU 2013Dynamic C++ ACCU 2013
Dynamic C++ ACCU 2013
aleks-f
 
Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! 
aleks-f
 
4 - OOP - Taste of Smalltalk (Tamagoshi)
4 - OOP - Taste of Smalltalk (Tamagoshi)4 - OOP - Taste of Smalltalk (Tamagoshi)
4 - OOP - Taste of Smalltalk (Tamagoshi)
The World of Smalltalk
 
12 virtualmachine
12 virtualmachine12 virtualmachine
12 virtualmachine
The World of Smalltalk
 
07 bestpractice
07 bestpractice07 bestpractice
07 bestpractice
The World of Smalltalk
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data model
Duyhai Doan
 
Stoop ed-class forreuse
Stoop ed-class forreuseStoop ed-class forreuse
Stoop ed-class forreuse
The World of Smalltalk
 
Introduction aux Macros
Introduction aux MacrosIntroduction aux Macros
Introduction aux Macros
univalence
 
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
JAX London
 

What's hot (20)

Senten500.c
Senten500.cSenten500.c
Senten500.c
 
Introduction to Rust
Introduction to RustIntroduction to Rust
Introduction to Rust
 
Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)Rust: Reach Further (from QCon Sao Paolo 2018)
Rust: Reach Further (from QCon Sao Paolo 2018)
 
8 - OOP - Syntax & Messages
8 - OOP - Syntax & Messages8 - OOP - Syntax & Messages
8 - OOP - Syntax & Messages
 
11 bytecode
11 bytecode11 bytecode
11 bytecode
 
Better Web Clients with Mantle and AFNetworking
Better Web Clients with Mantle and AFNetworkingBetter Web Clients with Mantle and AFNetworking
Better Web Clients with Mantle and AFNetworking
 
Cassandra data structures and algorithms
Cassandra data structures and algorithmsCassandra data structures and algorithms
Cassandra data structures and algorithms
 
Windows 10 Nt Heap Exploitation (Chinese version)
Windows 10 Nt Heap Exploitation (Chinese version)Windows 10 Nt Heap Exploitation (Chinese version)
Windows 10 Nt Heap Exploitation (Chinese version)
 
Apache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and PerformanceApache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and Performance
 
Apache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryApache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machinery
 
Rust "Hot or Not" at Sioux
Rust "Hot or Not" at SiouxRust "Hot or Not" at Sioux
Rust "Hot or Not" at Sioux
 
Dynamic C++ ACCU 2013
Dynamic C++ ACCU 2013Dynamic C++ ACCU 2013
Dynamic C++ ACCU 2013
 
Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! Look Ma, “update DB to HTML5 using C++”, no hands! 
Look Ma, “update DB to HTML5 using C++”, no hands! 
 
4 - OOP - Taste of Smalltalk (Tamagoshi)
4 - OOP - Taste of Smalltalk (Tamagoshi)4 - OOP - Taste of Smalltalk (Tamagoshi)
4 - OOP - Taste of Smalltalk (Tamagoshi)
 
12 virtualmachine
12 virtualmachine12 virtualmachine
12 virtualmachine
 
07 bestpractice
07 bestpractice07 bestpractice
07 bestpractice
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data model
 
Stoop ed-class forreuse
Stoop ed-class forreuseStoop ed-class forreuse
Stoop ed-class forreuse
 
Introduction aux Macros
Introduction aux MacrosIntroduction aux Macros
Introduction aux Macros
 
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
 

Viewers also liked

BigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsBigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current Trends
Matthew Dennis
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
Matthew Dennis
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-Patterns
Matthew Dennis
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
Matthew Dennis
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
Matthew Dennis
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
Matthew Dennis
 
Cassandra On EC2
Cassandra On EC2Cassandra On EC2
Cassandra On EC2
Matthew Dennis
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13
Dave Gardner
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
Dave Gardner
 
Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)
Dave Gardner
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Dave Gardner
 
Cabs, Cassandra, and Hailo
Cabs, Cassandra, and HailoCabs, Cassandra, and Hailo
Cabs, Cassandra, and Hailo
Dave Gardner
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
Dave Gardner
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
ebenhewitt
 
Learning Cassandra
Learning CassandraLearning Cassandra
Learning Cassandra
Dave Gardner
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
Dave Gardner
 

Viewers also liked (16)

BigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsBigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current Trends
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-Patterns
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 
Cassandra On EC2
Cassandra On EC2Cassandra On EC2
Cassandra On EC2
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache Cassandra
 
Cabs, Cassandra, and Hailo
Cabs, Cassandra, and HailoCabs, Cassandra, and Hailo
Cabs, Cassandra, and Hailo
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Learning Cassandra
Learning CassandraLearning Cassandra
Learning Cassandra
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 

Similar to Cassandra, Modeling and Availability at AMUG

Apache Cassandra Opinion and Fact
Apache Cassandra Opinion and FactApache Cassandra Opinion and Fact
Apache Cassandra Opinion and Fact
mediumdata
 
#GDC15 Code Clinic
#GDC15 Code Clinic#GDC15 Code Clinic
#GDC15 Code Clinic
Mike Acton
 
Taming Cassandra
Taming CassandraTaming Cassandra
Taming Cassandra
Dmitry Buzdin
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++
Mike Acton
 
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
DataStax
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
aaronmorton
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...

Cassandra, Modeling and Availability at AMUG

  • 1. Conceptual Modeling Differences From A RDBMS Matthew F. Dennis, DataStax // @mdennis Austin MySQL User Group January 11, 2012
  • 2. Cassandra Is Not Relational get out of the relational mindset when working with Cassandra (or really any NoSQL DB)
  • 3. Work Backwards From Queries Think in terms of queries, not in terms of normalizing the data; in fact, you often want to denormalize (already common in the data warehousing world, even in RDBMS)
  • 4. OK great, but how do I do that? Well, you need to know how Cassandra Models Data (e.g. Google Big Table) research.google.com/archive/bigtable-osdi06.pdf Go Read It!
  • 5. In Cassandra: ➔ data is organized into Keyspaces (usually one per app) ➔ each Keyspace can have multiple Column Families ➔ each Column Family can have many Rows ➔ each Row has a Row Key and a variable number of Columns ➔ each Column consists of a Name, Value and Timestamp
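To make the hierarchy on slide 5 concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the logical model (Keyspace, Column Family, Row, Column), not how Cassandra stores anything. The keyspace name `app`, the column family name `Users`, the `put` helper and the timestamps are assumptions made up for this example; the row data comes from the slides that follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # write time; the value used here is hypothetical


# keyspace -> column family -> row key -> {column name: Column}
# "app" and "Users" are hypothetical names chosen for this sketch.
keyspaces = {"app": {"Users": {}}}


def put(column_family, row_key, name, value, timestamp):
    """Add or overwrite one column in the row identified by row_key."""
    row = column_family.setdefault(row_key, {})
    row[name] = Column(name, value, timestamp)


users = keyspaces["app"]["Users"]
put(users, "thepaul", "office", "Austin", 1)
put(users, "thepaul", "OS", "OSX", 1)
put(users, "thepaul", "twitter", "thepaul0", 1)
put(users, "thobbs", "office", "Austin", 1)
put(users, "thobbs", "twitter", "tylhobbs", 1)

# Rows in the same column family need not have the same columns.
for row_key, row in users.items():
    print(row_key, sorted(row))
```

Note that `thobbs` simply has no `OS` column; there is no NULL placeholder for it.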
  • 6. In Cassandra, Keyspaces: ➔ are similar in concept to a “database” in some RDBMSs ➔ are stored in separate directories on disk ➔ are usually one-to-one with applications ➔ are usually the administrative unit for things related to ops ➔ contain multiple column families
  • 7. In Cassandra, In Keyspaces, Column Families: ➔ are similar in concept to a “table” in most RDBMSs ➔ are stored in separate files on disk (many per CF) ➔ are usually approximately one-to-one with query type ➔ are usually the administrative unit for things related to your data ➔ can contain many (~billion* per node) rows * for a good sized node (you can always add nodes)
  • 8. In Cassandra, In Keyspaces, In Column Families ...
  • 9. Rows thepaul office: Austin OS: OSX twitter: thepaul0 mdennis office: UA OS: Linux twitter: mdennis thobbs office: Austin twitter: tylhobbs Row Keys
  • 10. thepaul office: Austin OS: OSX twitter: thepaul0 mdennis office: UA OS: Linux twitter: mdennis thobbs office: Austin twitter: tylhobbs Columns
  • 11. Column Names thepaul office: Austin OS: OSX twitter: thepaul0 mdennis office: UA OS: Linux twitter: mdennis thobbs office: Austin twitter: tylhobbs
  • 12. Column Values thepaul office: Austin OS: OSX twitter: thepaul0 mdennis office: UA OS: Linux twitter: mdennis thobbs office: Austin twitter: tylhobbs
  • 13. thepaul office: Austin OS: OSX twitter: thepaul0 mdennis office: UA OS: Linux twitter: mdennis thobbs office: Austin twitter: tylhobbs Rows Are Randomly Ordered (if using the RandomPartitioner)
  • 14. thepaul office: Austin OS: OSX twitter: thepaul0 mdennis office: UA OS: Linux twitter: mdennis thobbs office: Austin twitter: tylhobbs Columns Are Ordered by Name (by a configurable comparator)
  • 15. Columns are ordered because doing so allows very efficient implementations of useful and common operations (e.g. merge joins)
  • 16. In particular, within a row I can find given columns by name very quickly (ordered names => log(n) binary search).
  • 17. More importantly, I can query for a slice between a start and end [diagram: a row with Row Key RK and ordered columns ts0, ts1, ..., tsM, ..., tsN, ...; the slice runs from "start" to "end"]
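Slides 16 and 17 can be sketched with Python's `bisect` module. This is a toy model of a single row whose columns are kept sorted by name, assuming names are comparable values such as integers or strings; the `Row` class and its method names are hypothetical and are not a Cassandra API.

```python
from bisect import bisect_left, bisect_right


class Row:
    """A single row whose columns are kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = []  # values, parallel to _names

    def insert(self, name, value):
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value          # overwrite an existing column
        else:
            self._names.insert(i, name)      # keep names ordered
            self._values.insert(i, value)

    def get(self, name):
        """Find one column by name with a binary search."""
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return self._values[i]
        return None

    def slice(self, start, end):
        """Return all columns with start <= name <= end, in name order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return list(zip(self._names[lo:hi], self._values[lo:hi]))


row = Row()
for ts in [5, 1, 9, 3, 7, 0, 2, 8, 4, 6]:   # arrival order does not matter
    row.insert(ts, f"tick-{ts}")

print(row.get(4))
print(row.slice(3, 5))
```

Because the names are ordered, both the point lookup and the two ends of the slice are found by binary search, and everything in between is contiguous.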
  • 18. Why does that matter? Because columns within a row aren't static!
  • 19. The Column Name Can Be Part of Your Data INTC ts0: $25.20 ts1: $25.25 ... AMR ts0: $6.20 ts9: $0.26 ... CRDS ts0: $1.05 ts5: $6.82 ... Columns Are Ordered by Name (in this case by a TimeUUID Comparator)
  • 20. Turns Out That Pattern Comes Up A Lot ➔ stock ticks ➔ event logs ➔ ad clicks/views ➔ sensor records ➔ access/error logs ➔ plane/truck/person/“entity” locations ➔ …
  • 21. OK, but I can do that in SQL Not efficiently at scale, at least not easily ...
  • 22. How it Looks In a RDBMS ticker timestamp bid ask ... AMR ts0 ... ... ... ... ... ... ... ... CRDS ts0 ... ... ... ... ... ... ... ... Data I Care About ... ts0 ... ... ... AMR ts1 ... ... ... ... ... ... ... ... ... ... ... ... ... … ts1 ... ... ... AMR ts2 ... ... ... ... ts2 ... ... ...
  • 23. How it Looks In a RDBMS ticker timestamp bid ask ... AMR ts0 ... ... ... Larger Than Your Page Size Disk Seeks AMR ts1 ... ... ... Larger Than Your Page Size AMR ts2 ... ... ... ... ts2 ... ... ...
  • 24. OK, but what about ... ➔ PostgreSQL Cluster Command? ➔ MySQL Cluster Indexes? ➔ Oracle Index Organized Tables? ➔ SQLServer Clustered Index?
  • 25. OK, but what about ... ➔ PostgreSQL Cluster Using? ➔ MySQL [InnoDB] Cluster Indexes? ➔ Oracle Index Organized Table? ➔ SQLServer Clustered Index? (seriously, who uses SQLServer?!) Meh ...
  • 26. The on-disk management of that clustering results in tons of IO … In the case of PostgreSQL: ➔ clustering is a one time operation (implies you must periodically rewrite the entire table) ➔ new data is *not* written in clustered order (which is often the data you care most about)
  • 27. OK, so just partition the tables ...
  • 28. Not a bad idea, except in MySQL there is a limit of 1024 partitions and generally less if using NDB (you should probably still do it if using MySQL though) http://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html
  • 29. OK fine, I agree storing data that is queried together on disk together is a good thing but what's that have to do with modeling? Seek To Here RK ts0 ts1 ... ... tsM ... ... ... ... tsN ... ... ... ... ... Read Precisely My Data * * more on some caveats later
  • 30. Well, that's what is meant by “work backwards from your queries” or “think in terms of queries” (NB: this concept, in general, applies to RDBMS at scale as well; it is not specific to Cassandra)
  • 31. An Example From Fraud Detection To calculate risk it is common to need to know all the emails, destinations, origins, devices, locations, phone numbers, et cetera ever used for the account in question
  • 32. In a normalized model that usually translates to a table for each type of entity being tracked id name ... id device ... 1 guy ... 1000 0xdead ... 2 gal ... 2000 0xb33f ... ... ... ... ... ... ... id dest ... id email ... id origin ... 15 USA ... 100 guy@ ... 150 USA ... 25 Finland ... 200 gal@ ... 250 Nigeria ... ... ... ... ... ... ... ... ... ...
  • 33. The problem is that at scale that also means a disk seek for each one … (even for perfect IOT et al if across multiple tables) ➔Previous emails? That's a seek … ➔Previous devices? That's a seek … ➔Previous destinations? That's a seek ...
  • 34. But In Cassandra I Store The Data I Query Together On Disk Together (remember, column names need not be static) Data I Care About acctY ... ... ... ... ... ... ... acctX dest21 dev2 dev7 email3 email9 orig4 ... acctZ ... ... ... ... ... ... ... email:cassandra@mailinator.com = dateEmailWasLastUsed Column Name Column Value
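The wide-row layout on slide 34 can be sketched as one sorted list of columns per account, with the entity type as a prefix of the column name. The account contents, the `type:value` naming scheme beyond the slide's `email:` example, and the `columns_with_prefix` helper are assumptions for illustration; the values stand in for the "date last used" shown on the slide.

```python
from bisect import bisect_left

# One row for one account: (column name, date last used), sorted by name.
# All names and dates below are made-up sample data.
acct_x = sorted([
    ("dest:USA", "2012-01-02"),
    ("dest:Finland", "2011-12-24"),
    ("dev:0xdead", "2012-01-03"),
    ("dev:0xb33f", "2011-11-30"),
    ("email:guy@example.com", "2012-01-05"),
    ("email:gal@example.com", "2011-10-10"),
    ("orig:USA", "2012-01-02"),
    ("orig:Nigeria", "2011-09-01"),
])


def columns_with_prefix(sorted_columns, prefix):
    """Return the contiguous run of columns whose name starts with prefix."""
    names = [name for name, _ in sorted_columns]
    i = bisect_left(names, prefix)           # one "seek" to the start
    found = []
    while i < len(names) and names[i].startswith(prefix):
        found.append(sorted_columns[i])      # then a sequential read
        i += 1
    return found


print(columns_with_prefix(acct_x, "email:"))
print(len(acct_x))  # everything ever used by the account sits in one row
```

Reading every email, device, destination and origin for the account is then a single sequential read of one row rather than one lookup per normalized table.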
  • 35. Don't treat Cassandra (or any DB) as a black box ➔Understand how your DBs (and data structures) work ➔Understand the building blocks they provide ➔Understand the work complexity (“big O”) of queries ➔For data sets > memory, goal is to minimize seeks * * on a related note, SSDs are awesome
  • 37. Availability Has Many Levels ➔ Component Failure (disk) ➔ Machine Failure (NIC, cpu, power supply) ➔ Site Failure (UPS, power grid, tornado) ➔ Political Failure (war, coup)
  • 38. The Common Theme In The Solutions? Replication
  • 39. Replication In Cassandra Follows The Dynamo Model * http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html Read It!
  • 40. Every Node Has A Token (0 - 2^127), with t0 < t1 < t2 < t3 < 2^127 [ring diagram: nodes t0, t1, t2, t3]
  • 41. Row Key Determines Node(s): MD5(RK) => T, here with t3 < T < 2^127 [ring diagram: nodes t0, t1, t2, t3]
  • 42. Row Key Determines Node MD5(RK) => T First Replica t0 t3 < T < 2^127 t3 t1 t2
  • 43. Walk The Ring To Find Subsequent Replicas * MD5(RK) => T First Replica t0 t3 < T < 2^127 t3 t1 Second Replica t2 * by default
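Slides 40 to 43 describe placement on a token ring, which can be sketched as follows. This is a simplified model: reducing the MD5 digest modulo 2^127 is an assumption made to keep tokens in the slide's range and is not claimed to match Cassandra's exact token computation. The function names and the evenly spaced node tokens are also hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127


def token_for(row_key: str) -> int:
    """Map a row key to a token in [0, 2^127). Simplified for illustration."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(token, node_tokens, replication_factor):
    """First replica is the first node whose token is >= the key's token,
    wrapping to the lowest token; later replicas are the next nodes
    found by walking the ring (the default behaviour noted on slide 43)."""
    ring = sorted(node_tokens)
    first = bisect_left(ring, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(first + i) % len(ring)] for i in range(count)]


# Four nodes with evenly spaced tokens t0 < t1 < t2 < t3.
ring = [i * RING_SIZE // 4 for i in range(4)]
t0, t1, t2, t3 = ring

# A token between t3 and 2^127 wraps around: first replica t0, second t1.
print(replicas_for(t3 + 1, ring, 2) == [t0, t1])
print(replicas_for(token_for("thepaul"), ring, 2))
```

The wrap-around case is the one drawn on the slides, where t3 < T < 2^127 and the first replica is the node holding t0.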
  • 44. Writes Happen In Parallel To All Replicas First Replica client t0 RK= ... RK= ... t3 t1 RK= ... Second Replica Coordinator t2 (not a master)
  • 45. Some Or All Replicas Respond First Replica client t0 RK= ... “ok” X t3 t1 “ok” Second Replica Coordinator Waits For Ack(s) t2 From Destination Node(s)
  • 46. The Coordinator Responds To Client First Replica client t0 “ok” “ok” X t3 t1 “ok” Second Replica Coordinator Waits For Ack(s) t2 From Destination Node(s)
  • 47. What Nodes Can Be A Coordinator? The coordinator for any given read or write is really just whatever node the client connected to for that request any node for any request at any time
  • 48. How Many Replicas Does The Coordinator Wait For? ➔ configurable, per query ➔ ONE / QUORUM are the most common (more on this in a moment)
  • 49. Writing At CL.One First Replica client t0 t3 t1 X Second Replica t2 Third Replica Wait For At Least One Node (eventually all nodes get updates)
  • 50. Writing At CL.One First Replica client t0 “ok” “ok” t3 t1 X Second Replica t2 Third Replica Wait For At Least One Node (eventually all nodes get updates)
  • 51. Reading At CL.One First Replica client t0 t3 t1 X Second Replica t2 Third Replica Wait For At Least One Node (so you might read stale data)
  • 52. Reading At CL.One First Replica client t0 “old” “old” t3 t1 X Second Replica t2 Third Replica Wait For At Least One Node (so you might read stale data)
  • 53. Writing At CL.Quorum First Replica client t0 t3 t1 X Second Replica t2 Third Replica Wait For Majority Of Nodes (eventually all nodes get updates)
  • 54. Writing At CL.Quorum First Replica client t0 “ok” “ok” “ok” t3 t1 X Second Replica t2 Third Replica Wait For Majority Of Nodes (eventually all nodes get updates)
  • 55. Reading At CL.Quorum First Replica client t0 X t3 t1 Second Replica t2 Third Replica Wait For Majority Of Nodes (majority => overlap => consistent)
  • 56. Reading At CL.Quorum First Replica client t0 “ok” X “ok” t3 t1 “old” coordinator chooses client response based on client Second Replica supplied per column TS t2 Third Replica Wait For Majority Of Nodes (majority => overlap => consistent)
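The "majority => overlap => consistent" argument can be checked by brute force for small replica counts. The sketch below assumes a quorum is a strict majority of the replicas for a key; the function names are hypothetical.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def always_overlap(n: int, writes: int, reads: int) -> bool:
    """True if every possible set of `writes` acked replicas shares at least
    one node with every possible set of `reads` responding replicas."""
    nodes = range(n)
    return all(
        set(w) & set(r)
        for w in combinations(nodes, writes)
        for r in combinations(nodes, reads)
    )


n = 3
q = quorum(n)
print(q)                        # replicas to wait for at QUORUM with 3 replicas
print(always_overlap(n, q, q))  # quorum write + quorum read
print(always_overlap(n, 1, 1))  # ONE write + ONE read
```

With three replicas, any two that acknowledged a write and any two that answer a read must share a node, so at least one response carries the latest write. At ONE/ONE the two sets can be disjoint, which is why a stale read is possible there.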
  • 57. Reading At CL.Quorum First Replica client t0 X Already Has Response t3 t1 “current” Second Replica t2 Third Replica Read Repair Updates Stale Nodes
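A rough sketch of slides 56 and 57: the coordinator picks the response with the newest client-supplied timestamp, then read repair brings stale replicas up to date. The data layout (`{node: {row_key: (value, timestamp)}}`), the node names and the function names are assumptions for this example, and the repair is done inline here, whereas on the slide the client already has its response by the time repair happens.

```python
def resolve(responses):
    """Pick the (value, timestamp) pair with the highest timestamp.

    responses: {node name: (value, timestamp)}
    """
    return max(responses.values(), key=lambda pair: pair[1])


def read_repair(replicas, row_key, responses):
    """Overwrite any responding replica that returned an older timestamp."""
    newest = resolve(responses)
    for node, (_, timestamp) in responses.items():
        if timestamp < newest[1]:
            replicas[node][row_key] = newest
    return newest


# Three replicas; the second one missed the latest write.
replicas = {
    "first": {"RK": ("current", 2)},
    "second": {"RK": ("old", 1)},
    "third": {"RK": ("current", 2)},
}

# A quorum read that happened to reach the first and second replicas.
responses = {node: replicas[node]["RK"] for node in ("first", "second")}
answer = read_repair(replicas, "RK", responses)

print(answer)
print(replicas["second"]["RK"])
```

The client sees the newest value even though one of the two replicas it reached was stale, and that replica is corrected as a side effect of the read.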
  • 58. On A Side Note, A Lost Response t0 “ok” X t3
  • 59. Is The Same As A Lost Request t0 X RK = ... t3 * In Regards To Meeting Consistency
  • 60. Which Is The Same As A Failed/Slow Node X t0 RK = ... t3 * In Regards To Meeting Consistency
  • 61. In fact, it is actually impossible for the originator to reliably distinguish between the 3
  • 62. One More Important Piece: writes are idempotent * * except with the counter API, but if you want that it can be done
  • 63. Why is that important? It means we can replay/retry writes, even late and/or out of order, and get the same results ➔ After/during node failures ➔ After/during network partitions ➔ After/during upgrades
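The replay property on slides 62 and 63 can be illustrated with a timestamped "newest wins" merge. This is a sketch, not Cassandra's implementation: the `(value, timestamp)` layout and function names are assumptions, and breaking timestamp ties by comparing values is a choice made here only to keep the result deterministic.

```python
from itertools import permutations


def apply_write(row, name, value, timestamp):
    """Keep the write with the highest timestamp for each column name.

    Ties on timestamp are broken by comparing values (a choice made for
    this sketch so that the outcome never depends on arrival order).
    """
    current = row.get(name)
    if current is None or (timestamp, value) > (current[1], current[0]):
        row[name] = (value, timestamp)


def replay(writes):
    """Apply a sequence of (name, value, timestamp) writes to an empty row."""
    row = {}
    for name, value, timestamp in writes:
        apply_write(row, name, value, timestamp)
    return row


writes = [
    ("office", "Austin", 1),
    ("office", "UA", 2),
    ("OS", "Linux", 1),
    ("office", "UA", 2),   # a retried duplicate of the second write
]

expected = replay(writes)
print(expected)
print(all(replay(order) == expected for order in permutations(writes)))
```

Every ordering of the writes, including the duplicate, converges to the same row, which is what makes retries after timeouts or partitions safe.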
  • 64. In other words you can concurrently issue conflicting updates to two different nodes while those nodes have no communication between them
  • 65. Which is important because ...
  • 66. Availability Has Many Levels ➔ Component Failure (disk) ➔ Machine Failure (NIC, cpu, power supply) ➔ Site Failure (UPS, power grid, tornado) ➔ Political Failure (war, coup)
  • 67. If you care about global availability you must serve reads and writes from multiple data centers There is no way around this
  • 68. Q? Conceptual Modeling Differences From A RDBMS Matthew F. Dennis, DataStax // @mdennis
  • 69. A Brief Rant On Query Planners, Garbage Collectors, Virtual Memory, Automatic Transmissions and Data Structures