SlideShare a Scribd company logo
1 of 85
Download to read offline
HOW DO I
CASSANDRA?
Rick Branson, DataStax




Memphis Python Users Group
November 21st, 2011
WHO?
•   This is my first day at DataStax, halp!
•   Coroutine for 18 months
•   American Roamer for 3 years
•   FedEx for 4 years
OCCUPY SQL
Why is it that 99% of the applications
 go into 1% of database softwares?
HOW DO I NOSQL?
1. Anvil transitions. Deal with it.
2. Find data store that doesn’t use SQL
3. ANYTHING
4. Cram all the things into it
5. Triumphantly blog this success
6. Complain a month later when it bursts
   into flames
SEEMS LEGIT
... it really is legit. There are real data sets that
 don’t make sense in the relational model, nor
               modern ACID databases.
THE GREAT HAMMER
... but, the typical developer only has a relational
       database; everything looks like a nail.
SNAKE OIL
Typically, developers pick data structures suited to
 each use case. The purveyors of relational would
  have you believe everything fits into one mold.
FIT WHAT INTO WHERE?
 Trees, Semi-Structured Data, Web Content,
     Multi-Dimensional Cubes, Graphs
HAMMERS GONNA HAMMER
 It does work – so does COBOL. Just like a motion
    picture film, the speed of modern computers
                  creates an illusion.
MAKE A WHAT OUTTA WHO?
     ASSUMPTIONS MADE ON YOUR BEHALF


• Data must be synchronized everywhere,
  instantaneously
• The structure of data doesn’t change over time
• Hardware never fails and networks are perfect
• Humans always do the right thing
• No upgrading or patches are ever needed
• Planned outages are good for the soul
THE MOVEMENT


“No one tool is appropriate for
   solving every problem.”
                     ~ Cliff McKinney
                     Gettysburg Address, cir. 1863
SRS BZNS
10gen, Basho, DataStax, Cloudant, Cloudera,
       HortonWorks, VMware, Oracle



                         Guess the number of
                         craps Russ here gives
                           about databases?
MOVING ON
OH: “The last DBA in the room went dataway”
AN DEFINITION
  Cassandra – affectionately “C*” – is an open
source, distributed store for structured data that
 scales-out on cheap, commodity hardware and
   stays up even when things get really bad.
FUH-GET ABOUT IT
                     Ruins availability

  • Strict Consistency
  • Transactions Deadlocks & Not scalable
  • Ad-hoc Queries
  • Indexes aka “Dumbass” Queries
Scumbag index covers every
column; ignored by planner
THE REWARDS
•   Simplicity of Operations
•   Transparency
•   Very High Availability
•   Painless Scale-Out
•   Solid, Predictable Performance on
    Commodity and Cloud Servers
Cassandra is about trading some
developer time for operational pain –
 or 3am phone calls as some of us
              know it.
DEM GOOD GENES
 Put simply: the scalable, efficient storage model
 from Google’s BigTable atop the incredibly fault
resistant distributed design of Amazon’s Dynamo

            resisting, not just tolerating!
VS OTHER NoSQL
•   Truly Distributed Design
•   No Special Servers, No SPOF
•   2D Data Model
•   Rocks on Spinning Disks
•   Single-Server Durability
HORRIBLE IDEAS
     THINGS NOT TO USE CASSANDRA FOR


•   Prototyping
•   Temporal Data
•   Distributed Locks
•   Message Queues
•   ... anywhere availability doesn’t matter
UNBOUNDED HUMANS
 ... OR DATA THAT THRIVES ON CASSANDRA

• Social
  Voluntary, human-generated data: text messages, comments, check-
  ins, real-time collaboration


• Activity
  Involuntary or indirectly human-generated data: news feeds,
  timelines, clickstreams, system + application logs


• Images
  Smaller files < ~10MB fit well into columns


• ... anywhere availability is paramount,
     (not necessarily big data or scale either)
TO START
• easy_install pycassa
• Pretty much ignore any info pre-0.7
• Use DataStax Community Edition
• Unfortunately, the “official” wiki sucks,
  so use DataStax docs & tutorials
• IRC, Mailing List, StackOverflow, Quora
  all monitored regularly and kick ass
WHY CASSANDRA +
          PYTHON?

• pycassa is one of the best client
• Healthy Cassandra+Python Community
• Many examples of real, solid,
  production Python applications built on-
  top of Cassandra
PYTHON + CASSANDRA
NOTABLE IMPLEMENTATIONS

• Reddit
  Caches generated comment trees in Cassandra

• UrbanAirship
  Delivers 20M push notifications per day; bursts to 100k/sec+

• GameFly behind their social network
  The database

• Digg
  Pretty much the whole site
DATA MODEL
“I gotcha column family ratttt heeeeyyyaahhh!”
AN COLUMN FAMILY

 jim     age: 36   car: camaro   gender: M

 carol   age: 37   car: subaru   gender: F

johnny   age:12    gender: M

 suzy    age:10     gender: F
AN COLUMN FAMILY

  • Don’t think in relational database terms
  • Columns are { name, value } tuples
  • It’s dictionaries all the way down...
    { “key”: { “name”: “value” } }

Column                  Column
Family     Row
Row Key



  jim     age: 36   car: camaro   gender: M

 carol    age: 37   car: subaru   gender: F

 johnny   age:12    gender: M

  suzy    age:10     gender: F
Row Key



  jim     age: 36   car: camaro   gender: M

 carol    age: 37   car: subaru   gender: F

 johnny   age:12    gender: M

  suzy    age:10     gender: F



  Rows are randomly ordered
Columns


 jim     age: 36    car: camaro   gender: M

carol    age: 37    car: subaru   gender: F

johnny   age:12     gender: M

suzy     age:10      gender: F
Column Names
 jim     age: 36   car: camaro   gender: M

carol    age: 37   car: subaru   gender: F

johnny   age: 12   gender: M

suzy     age: 10    gender: F
Column Names
 jim       age: 36   car: camaro   gender: M

carol      age: 37   car: subaru   gender: F

johnny     age: 12   gender: M

suzy       age: 10    gender: F


         Column Name dictates order
Column Values
 jim     age: 36   car: camaro   gender: M

carol    age: 37   car: subaru   gender: F

johnny   age: 12   gender: M

suzy     age: 10    gender: F
PARTITIONING
Cassandra divides the data set into ranges and
 distributes rows among multiple servers to
              achieve scale-out.
Row Key determines node placement



  jim     age: 36   car: camaro   gender: M

  carol   age: 37   car: subaru   gender: F

 johnny    age:12   gender: M

  suzy     age:10    gender: F
Row Key   MD5 Hash



  jim     5e02739678...
                             MD5 hash
 carol    a9a0198010...   operation yields a
                           128-bit number
 johnny   f4eb27cea7...       for keys
                             of any size.
  suzy    78b421309e...
Conceptually, hashed keys are placed
along a ring representing the keyspace.




              Keyspace
The ring is divided up into ranges, one
     for each node in the cluster.


           Node A     Node B




          Node D      Node C
In a 4-node cluster, each node holds
      roughly a quarter of the rows.


               End (Token)                                Start
Node A   0x00000000000000000000000000000000   0xc0000000000000000000000000000001



Node B   0x40000000000000000000000000000000   0x00000000000000000000000000000001



Node C   0x80000000000000000000000000000000   0x40000000000000000000000000000001



Node D   0xc0000000000000000000000000000000   0x80000000000000000000000000000001
Start            End
A   0xc000000000..1 0x0000000000..0

B   0x0000000000..1 0x4000000000..0

C   0x4000000000..1 0x8000000000..0

D   0x8000000000..1 0xc000000000..0




     jim          5e02739678...


    carol         a9a0198010...


    johnny        f4eb27cea7...


    suzy          78b421309e...
Start            End
A   0xc000000000..1 0x0000000000..0

B   0x0000000000..1 0x4000000000..0

C   0x4000000000..1 0x8000000000..0

D   0x8000000000..1 0xc000000000..0




     jim          5e02739678...


    carol         a9a0198010...


    johnny        f4eb27cea7...


    suzy          78b421309e...
Start            End
A   0xc000000000..1 0x0000000000..0

B   0x0000000000..1 0x4000000000..0

C   0x4000000000..1 0x8000000000..0

D   0x8000000000..1 0xc000000000..0




     jim          5e02739678...


    carol         a9a0198010...


    johnny        f4eb27cea7...


    suzy          78b421309e...
Start            End
A   0xc000000000..1 0x0000000000..0

B   0x0000000000..1 0x4000000000..0

C   0x4000000000..1 0x8000000000..0

D   0x8000000000..1 0xc000000000..0




     jim          5e02739678...


    carol         a9a0198010...


    johnny        f4eb27cea7...


    suzy          78b421309e...
Start            End
A   0xc000000000..1 0x0000000000..0

B   0x0000000000..1 0x4000000000..0

C   0x4000000000..1 0x8000000000..0

D   0x8000000000..1 0xc000000000..0




     jim          5e02739678...


    carol         a9a0198010...


    johnny        f4eb27cea7...


    suzy          78b421309e...
REPLICATION
Cassandra creates multiple copies of rows so that
 if any one node is behaving badly, all of the data
       stays available and fully operational.
REPLICATION
•   Column Family has “Replication Factor”
•   RF = total number of copies of each row
•   Don’t be fail; NEVER use 1
•   Use 3+ in the “cloud”
•   Use 2+ on redundant hardware
•   Write operations on any replica
Replicas are chosen by jumping to the
        next node(s) on the ring


                        Node A   Node B




                        Node D   Node C


carol   a9a0198010...
Replication Factor = 2



                        Node A   Node B




                        Node D   Node C


carol   a9a0198010...
Replication Factor = 3



                        Node A   Node B




                        Node D   Node C


carol   a9a0198010...
DEEPER: TIMESTAMPS

• I lied earlier
• Columns are actually tuples of:
  { name, value, timestamp }
• Distributed conflict resolution
• Highest timestamp value wins!
• Provided by client for every write
jim     age: 36   car: camaro   gender: M

carol    age: 37   car: subaru   gender: F

johnny   age:12    gender: M

suzy     age:10     gender: F
jim     age: 36   car: camaro   gender: M

carol    age: 37   car: subaru   gender: F

johnny   age:12    gender: M

suzy     age:10     gender: F
jim       age: 36          car: camaro        gender: M
         2011/01/01 12:35   2011/01/01 12:35   2011/01/01 12:35


carol      age: 37          car: subaru         gender: F
         2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


johnny      age:12          gender: M
         2011/05/15 07:35   2011/05/15 07:35


suzy        age:10           gender: F
         2011/09/17 19:13   2011/09/17 19:13
Per-Column Timestamp

      jim       age: 36          car: camaro        gender: M
              2011/01/01 12:35   2011/01/01 12:35   2011/01/01 12:35


      carol     age: 37          car: subaru         gender: F
              2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


     johnny      age:12          gender: M
              2011/05/15 07:35   2011/05/15 07:35


      suzy       age:10           gender: F
              2011/09/17 19:13   2011/09/17 19:13
jim       age: 36          car: camaro        gender: M
         2011/01/01 12:35   2011/01/01 12:35   2011/01/01 12:35


carol      age: 37          car: subaru         gender: F
         2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


johnny      age:12          gender: M
         2011/05/15 07:35   2011/05/15 07:35


suzy        age:10           gender: F
         2011/09/17 19:13   2011/09/17 19:13
carol¹     age: 37          car: subaru         gender: F
         2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


carol²     age: 37          car: subaru         gender: F
         2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


carol³     age: 37          car: subaru         gender: F
         2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39
insert(“carol”,
       { “car”: “daewoo”, 2011/11/15 15:00 })


    carol¹     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


    carol²     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


    carol³     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39
insert(“carol”,
       { “car”: “daewoo”, 2011/11/15 15:00 })


    carol¹     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


    carol²     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


    carol³     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39
insert(“carol”,
       { “car”: “daewoo”, 2011/11/15 15:00 })


    carol¹     age: 37          car: subaru             gender: F
             2011/01/01 12:38       2011/01/01 12:38   2011/01/01 12:39


    carol²     age: 37          car: subaru             gender: F
             2011/01/01 12:38       2011/01/01 12:38   2011/01/01 12:39


    carol³     age: 37          car: subaru             gender: F
             2011/01/01 12:38       2011/01/01 12:38   2011/01/01 12:39




 2011/11/15 15:00               >     2011/01/01 12:38
insert(“carol”,
       { “car”: “daewoo”, 2011/11/15 15:00 })


    carol¹     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


    carol²     age: 37          car: daewoo         gender: F
             2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


    carol³     age: 37          car: subaru         gender: F
             2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39
insert(“carol”,
       { “car”: “daewoo”, 2011/11/15 15:00 })


    carol¹     age: 37          car: daewoo         gender: F
             2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


    carol²     age: 37          car: daewoo         gender: F
             2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


    carol³     age: 37          car: daewoo         gender: F
             2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39




  Cassandra replicates data automagically
DEEPER: CONSISTENCY
• I lied again left something out
• All ops also have a consistency level
• Controls number of replicas involved in
  an operation (read or write)
• Eventual Consistency > Full Consistency
• Use lowest you can get away with
• Use quorum in place of “all” consistency
WRITE CONSISTENCY LEVEL:
                      ONE


carol¹       age: 37          car: subaru         gender: F
           2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


carol²       age: 37          car: daewoo         gender: F
           2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


carol³       age: 37          car: subaru         gender: F
           2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39



                      Writes to any replica and
                       acknowledges to client,
                      other nodes eventually get
                             the update
WRITE CONSISTENCY LEVEL:
               QUORUM


carol¹       age: 37          car: daewoo         gender: F
           2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


carol²       age: 37          car: daewoo         gender: F
           2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


carol³       age: 37          car: subaru         gender: F
           2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


                    Writes to majority replicas,
                    other nodes eventually get
                            the update
READ CONSISTENCY LEVEL:
                      ONE


carol¹       age: 37          car: subaru         gender: F
           2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


carol²       age: 37          car: daewoo         gender: F
           2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


carol³       age: 37          car: subaru         gender: F
           2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39



                     Reads from first replica
                     to respond to query, so
                      could be inconsistent
READ CONSISTENCY LEVEL:
                    QUORUM
Timestamp resolves conflict, “daewoo” wins

     carol¹       age: 37          car: subaru         gender: F
                2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


     carol²       age: 37          car: daewoo         gender: F
                2011/01/01 12:38   2011/11/15 15:00   2011/01/01 12:39


     carol³       age: 37          car: subaru         gender: F
                2011/01/01 12:38   2011/01/01 12:38   2011/01/01 12:39


                 Reads from majority replicas,
                   so if paired with quorum
                 writes, guarantees consistency
QUICKIE LIST-AS-COLUMNS
TIMELINE COLUMN FAMILY


         time-uuid:   time-uuid:   time-uuid:
  jim
         “hello!”      “echo!”     “anyone!”
         time-uuid:   time-uuid:
 carol
          “first!”    “second!”

          Time-sortable globally unique
              IDs orders columns
             by their insertion time
THINK IN QUERIES

•   Break your normalization habit
•   Roughly ~one CF per query
•   Denormalize!
•   Use in-column entity caching
QUESTIONS YET?
PYCASSA
           CREATE A CONNECTION




>>> pool = pycassa.ConnectionPool(“Keyspace”,
                           server_list=[“192.168.1.1”])

 Moar servers here will distribute
  this client’s load over multiple
       nodes in the cluster.
PYCASSA
          OPEN A COLUMN FAMILY




>>> cf = pycassa.ColumnFamily(pool, “People”)




                               Column Family
PYCASSA
                    GET A ROW
          Row Key


>>> cf.get(“carol”)
{ “age”: “37”, “car”: “subaru”, “gender”: “F” }



All columns returned as a dictionary
PYCASSA
           GET SPECIFIC COLUMNS
         Row Key           Column Names


>>> cf.get(“carol”, columns=[“age”, “gender”])
{ “age”: “37”, “gender”: “F” }
PYCASSA
           GET SLICE OF COLUMNS
         Row Key              Column Name Range


>>> cf.get(“carol”, column_start=”a”, column_end=”d”)
{ “age”: “37”, “car”: “subaru” }
PYCASSA
             “UPSERT” A COLUMN

                   Column Name


>>> cf.insert(“carol”, { “car”: “mercedes” })




            Row Key          Column Value
PYCASSA
                 DELETE A ROW




>>> cf.remove(“carol”)




            Row Key
PYCASSA
              DELETE A COLUMN

                         Column Names


>>> cf.remove(“carol”, columns=[“car”])




            Row Key
PYCASSA
            CUSTOM TIMESTAMPS
    • Default is GMT timestamp
    • Overridden for inserts & removes
    • Per-CF generator function can be
       specified.            Insert beats delete!

>>> cf.insert(“carol”, { “car”: “subaru” },
              timestamp=10000001)

>>> cf.remove(“carol”, columns=[“car”],
              timestamp=10000000)
PYCASSA
             CONSISTENCY LEVEL

    • Can be overridden on all operations
    • Global default read & write is ONE
    • Per-CF override for default
>>> cf.insert(“carol”, { “car”: “subaru” },
       write_consistency_level=ConsistencyLevel.QUORUM)

>>> cf.get(“carol”,
       read_consistency_level=ConsistencyLevel.QUORUM)
PYCASSA
                      UUIDv4



>>> import uuid
>>> cf.insert(uuid.uuid4().bytes_le, { “foo”: “bar” })
>>> cf.insert(“joe”, { uuid.uuid4(): “bar” })
PYCASSA
            TimeUUID (aka UUIDv1)




>>> import uuid
>>> timeline.insert(“joe”, { uuid.uuid1(): “hello!” })
DIDN’T COVER
•   CQL
•   Batch Operations
•   Schema Definition
•   SuperColumns (don’t use them)
•   Key Range Queries (don’t use them)
•   Secondary Indexes
•   ... and the list goes on
SHAMELESS
• DataStax is the leading provider of
  Apache Cassandra training, support,
  and consulting services
• DataStax Enterprise is Hadoop on top of
  Cassandra; really nice for operations
• Seriously smart peeps, no lie
• www.datastax.com
QUESTIONS?
THANKS
http://www.datastax.com/docs/1.0/
http://www.datastax.com/dev/tutorials/
http://www.datastax.com/products/community/
http://pycassa.github.com/
http://apache.cassandra.org/


Twitter: @rbranson

More Related Content

What's hot

Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data modelPatrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 
CassandraMeetup-0225-updated
CassandraMeetup-0225-updatedCassandraMeetup-0225-updated
CassandraMeetup-0225-updatedWei Zhu
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityColin Charles
 
RIPE64 - DNS and DNSSEC in the .se Zone
RIPE64 - DNS and DNSSEC in the .se ZoneRIPE64 - DNS and DNSSEC in the .se Zone
RIPE64 - DNS and DNSSEC in the .se Zonepawal
 
MariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityMariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityColin Charles
 

What's hot (11)

Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 
NoSQL Smackdown!
NoSQL Smackdown!NoSQL Smackdown!
NoSQL Smackdown!
 
CassandraMeetup-0225-updated
CassandraMeetup-0225-updatedCassandraMeetup-0225-updated
CassandraMeetup-0225-updated
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra Interoperability
 
RIPE64 - DNS and DNSSEC in the .se Zone
RIPE64 - DNS and DNSSEC in the .se ZoneRIPE64 - DNS and DNSSEC in the .se Zone
RIPE64 - DNS and DNSSEC in the .se Zone
 
MariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityMariaDB and Cassandra Interoperability
MariaDB and Cassandra Interoperability
 

Viewers also liked

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Markus Höfer
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in CassandraEd Anuff
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraRobert Stupp
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Cassandra under the hood
Cassandra under the hoodCassandra under the hood
Cassandra under the hoodAndriy Rymar
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
 
Introduction to Data Modeling in Cassandra
Introduction to Data Modeling in CassandraIntroduction to Data Modeling in Cassandra
Introduction to Data Modeling in CassandraJim Hatcher
 
C*ollege Credit: Data Modeling for Apache Cassandra
C*ollege Credit: Data Modeling for Apache CassandraC*ollege Credit: Data Modeling for Apache Cassandra
C*ollege Credit: Data Modeling for Apache CassandraDataStax
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandrajbellis
 
Cassandra Meetup Boston - How Table "Shape" Affects Performance
Cassandra Meetup Boston - How Table "Shape" Affects PerformanceCassandra Meetup Boston - How Table "Shape" Affects Performance
Cassandra Meetup Boston - How Table "Shape" Affects PerformanceDan Foody
 
Cassandra Data Modelling with CQL (OSCON 2015)
Cassandra Data Modelling with CQL (OSCON 2015)Cassandra Data Modelling with CQL (OSCON 2015)
Cassandra Data Modelling with CQL (OSCON 2015)twentyideas
 
C*ollege Credit: An Introduction to Apache Cassandra
C*ollege Credit: An Introduction to Apache CassandraC*ollege Credit: An Introduction to Apache Cassandra
C*ollege Credit: An Introduction to Apache CassandraDataStax
 
durability, durability, durability
durability, durability, durabilitydurability, durability, durability
durability, durability, durabilityMatthew Dennis
 

Viewers also liked (20)

Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in Cassandra
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Cassandra under the hood
Cassandra under the hoodCassandra under the hood
Cassandra under the hood
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Introduction to Data Modeling in Cassandra
Introduction to Data Modeling in CassandraIntroduction to Data Modeling in Cassandra
Introduction to Data Modeling in Cassandra
 
C*ollege Credit: Data Modeling for Apache Cassandra
C*ollege Credit: Data Modeling for Apache CassandraC*ollege Credit: Data Modeling for Apache Cassandra
C*ollege Credit: Data Modeling for Apache Cassandra
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra Meetup Boston - How Table "Shape" Affects Performance
Cassandra Meetup Boston - How Table "Shape" Affects PerformanceCassandra Meetup Boston - How Table "Shape" Affects Performance
Cassandra Meetup Boston - How Table "Shape" Affects Performance
 
Cassandra Data Modelling with CQL (OSCON 2015)
Cassandra Data Modelling with CQL (OSCON 2015)Cassandra Data Modelling with CQL (OSCON 2015)
Cassandra Data Modelling with CQL (OSCON 2015)
 
C*ollege Credit: An Introduction to Apache Cassandra
C*ollege Credit: An Introduction to Apache CassandraC*ollege Credit: An Introduction to Apache Cassandra
C*ollege Credit: An Introduction to Apache Cassandra
 
durability, durability, durability
durability, durability, durabilitydurability, durability, durability
durability, durability, durability
 

Similar to How Do I Cassandra?

Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...it-people
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Flink Forward
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingVasia Kalavri
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++Mike Acton
 
Getting bad ideas out of the way with keynote
Getting bad ideas out of the way with keynoteGetting bad ideas out of the way with keynote
Getting bad ideas out of the way with keynoteTravis Isaacs
 
What Is Good DataViz Design?
What Is Good DataViz Design?What Is Good DataViz Design?
What Is Good DataViz Design?Randy Krum
 
Cassandra Summit 2014: Cassandra at Instagram 2014
Cassandra Summit 2014: Cassandra at Instagram 2014Cassandra Summit 2014: Cassandra at Instagram 2014
Cassandra Summit 2014: Cassandra at Instagram 2014DataStax Academy
 
Bournemouth 10/13
Bournemouth 10/13Bournemouth 10/13
Bournemouth 10/13moongolfer
 
Defcon 23 - program
Defcon 23 - programDefcon 23 - program
Defcon 23 - programFelipe Prado
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
 
BSidesDC 2015 CryptKids Crypto Challenge
BSidesDC 2015 CryptKids Crypto ChallengeBSidesDC 2015 CryptKids Crypto Challenge
BSidesDC 2015 CryptKids Crypto ChallengeAndrew Shumate
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseCloudera, Inc.
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesDuyhai Doan
 
History of database monitoring
History of database monitoringHistory of database monitoring
History of database monitoringKyle Hailey
 
A Bizarre Way to do Real-Time Lighting
A Bizarre Way to do Real-Time LightingA Bizarre Way to do Real-Time Lighting
A Bizarre Way to do Real-Time LightingSteven Tovey
 
Cassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data ModelingCassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data ModelingBrian Enochson
 

Similar to How Do I Cassandra? (20)

Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
Self Managed and Automatically Reconfigurable Stream Processing - Vasiliki Ka...
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processing
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++
 
Getting bad ideas out of the way with keynote
Getting bad ideas out of the way with keynoteGetting bad ideas out of the way with keynote
Getting bad ideas out of the way with keynote
 
What Is Good DataViz Design?
What Is Good DataViz Design?What Is Good DataViz Design?
What Is Good DataViz Design?
 
Cassandra Summit 2014: Cassandra at Instagram 2014
Cassandra Summit 2014: Cassandra at Instagram 2014Cassandra Summit 2014: Cassandra at Instagram 2014
Cassandra Summit 2014: Cassandra at Instagram 2014
 
Bournemouth 10/13
Bournemouth 10/13Bournemouth 10/13
Bournemouth 10/13
 
Defcon 23 - program
Defcon 23 - programDefcon 23 - program
Defcon 23 - program
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
 
BSidesDC 2015 CryptKids Crypto Challenge
BSidesDC 2015 CryptKids Crypto ChallengeBSidesDC 2015 CryptKids Crypto Challenge
BSidesDC 2015 CryptKids Crypto Challenge
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBase
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 
History of database monitoring
History of database monitoringHistory of database monitoring
History of database monitoring
 
A Bizarre Way to do Real-Time Lighting
A Bizarre Way to do Real-Time LightingA Bizarre Way to do Real-Time Lighting
A Bizarre Way to do Real-Time Lighting
 
Cassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data ModelingCassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data Modeling
 
Basic Graphics with R
Basic Graphics with RBasic Graphics with R
Basic Graphics with R
 

Recently uploaded

MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

How Do I Cassandra?

  • 1. HOW DO I CASSANDRA? Rick Branson, DataStax Memphis Python Users Group November 21st, 2011
  • 2. WHO? • This is my first day at DataStax, halp! • Coroutine for 18 months • American Roamer for 3 years • FedEx for 4 years
  • 3. OCCUPY SQL Why is it that 99% of the applications go into 1% of database softwares?
  • 4. HOW DO I NOSQL? 1. Anvil transitions. Deal with it. 2. Find data store that doesn’t use SQL 3. ANYTHING 4. Cram all the things into it 5. Triumphantly blog this success 6. Complain a month later when it bursts into flames
  • 5. SEEMS LEGIT ... it really is legit. There are real data sets that don’t make sense in the relational model, nor modern ACID databases.
  • 6. THE GREAT HAMMER ... but, the typical developer only has a relational database; everything looks like a nail.
  • 7. SNAKE OIL Typically, developers pick data structures suited to each use case. The purveyors of relational would have you believe everything fits into one mold.
  • 8. FIT WHAT INTO WHERE? Trees, Semi-Structured Data, Web Content, Multi-Dimensional Cubes, Graphs
  • 9. HAMMERS GONNA HAMMER It does work – so does COBOL. Just like a motion picture film, the speed of modern computers creates an illusion.
  • 10. MAKE A WHAT OUTTA WHO? ASSUMPTIONS MADE ON YOUR BEHALF • Data must be synchronized everywhere, instantaneously • The structure of data doesn’t change over time • Hardware never fails and networks are perfect • Humans always do the right thing • No upgrading or patches are ever needed • Planned outages are good for the soul
  • 11. THE MOVEMENT “No one tool is appropriate for solving every problem.” ~ Cliff McKinney Gettysburg Address, cir. 1863
  • 12. SRS BZNS 10gen, Basho, DataStax, Cloudant, Cloudera, HortonWorks, VMware, Oracle Guess the number of craps Russ here gives about databases?
  • 13. MOVING ON OH: “The last DBA in the room went dataway”
  • 14. AN DEFINITION Cassandra – affectionately “C*” – is an open source, distributed store for structured data that scales-out on cheap, commodity hardware and stays up even when things get really bad.
  • 15. FUH-GET ABOUT IT Ruins availability • Strict Consistency • Transactions Deadlocks & Not scalable • Ad-hoc Queries • Indexes aka “Dumbass” Queries Scumbag index covers every column; ignored by planner
  • 16. THE REWARDS • Simplicity of Operations • Transparency • Very High Availability • Painless Scale-Out • Solid, Predictable Performance on Commodity and Cloud Servers
  • 17. Cassandra is about trading some developer time for operational pain – or 3am phone calls as some of us know it.
  • 18. DEM GOOD GENES Put simply: the scalable, efficient storage model from Google’s BigTable atop the incredibly fault resistant distributed design of Amazon’s Dynamo resisting, not just tolerating!
  • 19. VS OTHER NoSQL • Truly Distributed Design • No Special Servers, No SPOF • 2D Data Model • Rocks on Spinning Disks • Single-Server Durability
  • 20. HORRIBLE IDEAS THINGS NOT TO USE CASSANDRA FOR • Prototyping • Temporal Data • Distributed Locks • Message Queues • ... anywhere availability doesn’t matter
  • 21. UNBOUNDED HUMANS ... OR DATA THAT THRIVES ON CASSANDRA • Social Voluntary, human-generated data: text messages, comments, check- ins, real-time collaboration • Activity Involuntary or indirectly human-generated data: news feeds, timelines, clickstreams, system + application logs • Images Smaller files < ~10MB fit well into columns • ... anywhere availability is paramount, (not necessarily big data or scale either)
  • 22. TO START • easy_install pycassa • Pretty much ignore any info pre-0.7 • Use DataStax Community Edition • Unfortunately, the “official” wiki sucks, so use DataStax docs & tutorials • IRC, Mailing List, StackOverflow, Quora all monitored regularly and kick ass
  • 23. WHY CASSANDRA + PYTHON? • pycassa is one of the best client • Healthy Cassandra+Python Community • Many examples of real, solid, production Python applications built on- top of Cassandra
  • 24. PYTHON + CASSANDRA NOTABLE IMPLEMENTATIONS • Reddit Caches generated comment trees in Cassandra • UrbanAirship Delivers 20M push notifications per day; bursts to 100k/sec+ • GameFly behind their social network The database • Digg Pretty much the whole site
  • 25. DATA MODEL “I gotcha column family ratttt heeeeyyyaahhh!”
  • 26. AN COLUMN FAMILY jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 27. AN COLUMN FAMILY • Don’t think in relational database terms • Columns are { name, value } tuples • It’s dictionaries all the way down... { “key”: { “name”: “value” } } Column Column Family Row
  • 28. Row Key jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 29. Row Key jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F Rows are randomly ordered
  • 30. Columns jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 31. Column Names jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age: 12 gender: M suzy age: 10 gender: F
  • 32. Column Names jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age: 12 gender: M suzy age: 10 gender: F Column Name dictates order
  • 33. Column Values jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age: 12 gender: M suzy age: 10 gender: F
  • 34. PARTITIONING Cassandra divides the data set into ranges and distributes rows among multiple servers to achieve scale-out.
  • 35. Row Key determines node placement jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 36. Row Key MD5 Hash jim 5e02739678... MD5 hash carol a9a0198010... operation yields a 128-bit number johnny f4eb27cea7... for keys of any size. suzy 78b421309e...
  • 37. Conceptually, hashed keys are placed along a ring representing the keyspace. Keyspace
  • 38. The ring is divided up into ranges, one for each node in the cluster. Node A Node B Node D Node C
  • 39. In a 4-node cluster, each node holds roughly a quarter of the rows. End (Token) Start Node A 0x00000000000000000000000000000000 0xc0000000000000000000000000000001 Node B 0x40000000000000000000000000000000 0x00000000000000000000000000000001 Node C 0x80000000000000000000000000000000 0x40000000000000000000000000000001 Node D 0xc0000000000000000000000000000000 0x80000000000000000000000000000001
  • 40. Start End A 0xc000000000..1 0x0000000000..0 B 0x0000000000..1 0x4000000000..0 C 0x4000000000..1 0x8000000000..0 D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  • 41. Start End A 0xc000000000..1 0x0000000000..0 B 0x0000000000..1 0x4000000000..0 C 0x4000000000..1 0x8000000000..0 D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  • 42. Start End A 0xc000000000..1 0x0000000000..0 B 0x0000000000..1 0x4000000000..0 C 0x4000000000..1 0x8000000000..0 D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  • 43. Start End A 0xc000000000..1 0x0000000000..0 B 0x0000000000..1 0x4000000000..0 C 0x4000000000..1 0x8000000000..0 D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  • 44. Start End A 0xc000000000..1 0x0000000000..0 B 0x0000000000..1 0x4000000000..0 C 0x4000000000..1 0x8000000000..0 D 0x8000000000..1 0xc000000000..0 jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e...
  • 45. REPLICATION Cassandra creates multiple copies of rows so that if any one node is behaving badly, all of the data stays available and fully operational.
  • 46. REPLICATION • Column Family has “Replication Factor” • RF = total number of copies of each row • Don’t be fail; NEVER use 1 • Use 3+ in the “cloud” • Use 2+ on redundant hardware • Write operations on any replica
  • 47. Replicas are chosen by jumping to the next node(s) on the ring Node A Node B Node D Node C carol a9a0198010...
  • 48. Replication Factor = 2 Node A Node B Node D Node C carol a9a0198010...
  • 49. Replication Factor = 3 Node A Node B Node D Node C carol a9a0198010...
  • 50. DEEPER: TIMESTAMPS • I lied earlier • Columns are actually tuples of: { name, value, timestamp } • Distributed conflict resolution • Highest timestamp value wins! • Provided by client for every write
  • 51. jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 52. jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 53. jim age: 36 car: camaro gender: M 2011/01/01 12:35 2011/01/01 12:35 2011/01/01 12:35 carol age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 johnny age:12 gender: M 2011/05/15 07:35 2011/05/15 07:35 suzy age:10 gender: F 2011/09/17 19:13 2011/09/17 19:13
  • 54. Per-Column Timestamp jim age: 36 car: camaro gender: M 2011/01/01 12:35 2011/01/01 12:35 2011/01/01 12:35 carol age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 johnny age:12 gender: M 2011/05/15 07:35 2011/05/15 07:35 suzy age:10 gender: F 2011/09/17 19:13 2011/09/17 19:13
  • 55. jim age: 36 car: camaro gender: M 2011/01/01 12:35 2011/01/01 12:35 2011/01/01 12:35 carol age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 johnny age:12 gender: M 2011/05/15 07:35 2011/05/15 07:35 suzy age:10 gender: F 2011/09/17 19:13 2011/09/17 19:13
  • 56. carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  • 57. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  • 58. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  • 59. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 2011/11/15 15:00 > 2011/01/01 12:38
  • 60. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39
  • 61. insert(“carol”, { “car”: “daewoo”, 2011/11/15 15:00 }) carol¹ age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 Cassandra replicates data automagically
  • 62. DEEPER: CONSISTENCY • I lied again left something out • All ops also have a consistency level • Controls number of replicas involved in an operation (read or write) • Eventual Consistency > Full Consistency • Use lowest you can get away with • Use quorum in place of “all” consistency
  • 63. WRITE CONSISTENCY LEVEL: ONE carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Writes to any replica and acknowledges to client, other nodes eventually get the update
  • 64. WRITE CONSISTENCY LEVEL: QUORUM carol¹ age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Writes to majority replicas, other nodes eventually get the update
  • 65. READ CONSISTENCY LEVEL: ONE carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Reads from first replica to respond to query, so could be inconsistent
  • 66. READ CONSISTENCY LEVEL: QUORUM Timestamp resolves conflict, “daewoo” wins carol¹ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 carol² age: 37 car: daewoo gender: F 2011/01/01 12:38 2011/11/15 15:00 2011/01/01 12:39 carol³ age: 37 car: subaru gender: F 2011/01/01 12:38 2011/01/01 12:38 2011/01/01 12:39 Reads from majority replicas, so if paired with quorum writes, guarantees consistency
  • 67. QUICKIE LIST-AS-COLUMNS TIMELINE COLUMN FAMILY time-uuid: time-uuid: time-uuid: jim “hello!” “echo!” “anyone!” time-uuid: time-uuid: carol “first!” “second!” Time-sortable globally unique IDs orders columns by their insertion time
  • 68. THINK IN QUERIES • Break your normalization habit • Roughly ~one CF per query • Denormalize! • Use in-column entity caching
  • 70. PYCASSA CREATE A CONNECTION >>> pool = pycassa.ConnectionPool(“Keyspace”, server_list=[“192.168.1.1”]) Moar servers here will distribute this client’s load over multiple nodes in the cluster.
  • 71. PYCASSA OPEN A COLUMN FAMILY >>> cf = pycassa.ColumnFamily(pool, “People”) Column Family
  • 72. PYCASSA GET A ROW Row Key >>> cf.get(“carol”) { “age”: “37”, “car”: “subaru”, “gender”: “F” } All columns returned as a dictionary
  • 73. PYCASSA GET SPECIFIC COLUMNS Row Key Column Names >>> cf.get(“carol”, columns=[“age”, “gender”]) { “age”: “37”, “gender”: “F” }
  • 74. PYCASSA GET SLICE OF COLUMNS Row Key Column Name Range >>> cf.get(“carol”, column_start=”a”, column_end=”d”) { “age”: “37”, “car”: “subaru” }
  • 75. PYCASSA “UPSERT” A COLUMN Column Name >>> cf.insert(“carol”, { “car”: “mercedes” }) Row Key Column Value
  • 76. PYCASSA DELETE A ROW >>> cf.remove(“carol”) Row Key
  • 77. PYCASSA DELETE A COLUMN Column Names >>> cf.remove(“carol”, columns=[“car”]) Row Key
  • 78. PYCASSA CUSTOM TIMESTAMPS • Default is GMT timestamp • Overridden for inserts & removes • Per-CF generator function can be specified. Insert beats delete! >>> cf.insert(“carol”, { “car”: “subaru” }, timestamp=10000001) >>> cf.remove(“carol”, columns=[“car”], timestamp=10000000)
  • 79. PYCASSA CONSISTENCY LEVEL • Can be overridden on all operations • Global default read & write is ONE • Per-CF override for default >>> cf.insert(“carol”, { “car”: “subaru” }, write_consistency_level=ConsistencyLevel.QUORUM) >>> cf.get(“carol”, read_consistency_level=ConsistencyLevel.QUORUM)
  • 80. PYCASSA UUIDv4 >>> import uuid >>> cf.insert(uuid.uuid4().bytes_le, { “foo”: “bar” }) >>> cf.insert(“joe”, { uuid.uuid4(): “bar” })
  • 81. PYCASSA TimeUUID (aka UUIDv1) >>> import uuid >>> timeline.insert(“joe”, { uuid.uuid1(): “hello!” })
  • 82. DIDN’T COVER • CQL • Batch Operations • Schema Definition • SuperColumns (don’t use them) • Key Range Queries (don’t use them) • Secondary Indexes • ... and the list goes on
  • 83. SHAMELESS • DataStax is the leading provider of Apache Cassandra training, support, and consulting services • DataStax Enterprise is Hadoop on top of Cassandra; really nice for operations • Seriously smart peeps, no lie • www.datastax.com