SlideShare a Scribd company logo
Apache Cassandra in Action




 Jonathan Ellis
 jbellis@datastax.com / @spyced
Why Cassandra?
•
    Relational databases are not designed to
    scale
•
    B-trees are slow
    –
        and require read-before-write
(“The eBay Architecture,” Randy Shoup and Dan Pritchett)
Reader
                    Memtable
     Writer




    Commitlog




The Log-Structured Merge-Tree,
Bigtable: A Distributed Storage
System for Structured Data
Dynamo, 2007
Bigtable, 2006




                           OSS, 2008




         Incubator, 2009       TLP, 2010
Cassandra in production
•
    Digital Reasoning: NLP + entity analytics
•
    OpenWave: enterprise messaging
•
    OpenX: largest publisher-side ad network in the
    world
•
    Cloudkick: performance data & aggregation
•
    SimpleGEO: location-as-API
•
    Ooyala: video analytics and business intelligence
•
    ngmoco: massively multiplayer game worlds
FUD?
•
    “Cassandra is only appropriate for
    unimportant data.”
Durabilty
•
    Write to commitlog
    –
        fsync is cheap since it’s append-only
•
    Write to memtable
•
    [amortized] flush memtable to sstable
SSTable format, briefly


       <key 127>
       <key 255>             <row data 0>
       ...                   <row data 1>
                             ...
                             <row data 127>
                             ...
                             <row data 255>
                             ...


                   Sorted [clustered] by row key
Scaling
W   A




T
        L
W   A




            F


T
        L
W           A




                    F
        (A-L]



T
                L
W           A




        (A-F]       F


T
        (F-L]   L
Key “C”
              W   A




                      F


          T
                  L
Reliability
•
    No single points of failure
•
    Multiple datacenters
•
    Monitorable
Some headlines
•
    “Resyncing Broken MySQL Replication”
•
    “How To Repair MySQL Replication”
•
    “Fixing Broken MySQL Database Replication”
•
    “Replication on Linux broken after db restore”
•
    “MySQL :: Repairing broken replication”
Good architecture solves multiple
problems at once
•
    Availability in single datacenter
•
    Availability in multiple datacenters
Y
                        Key “C”
            A
    W



U
                    F



    T
                L
        P
Y
                           Key “C”
               A
    W



U
                       F




               X
    T   hint
                   L
        P
Y
            A
    W



U
                    F



    T
                L
        P
Y
                        Key “C”
            A
    W



U
                    F



    T
                L
        P
Y
                    Key “C”
            A
    W



U
                F


    T
            L
        P
Tuneable consistency
•
    ONE, QUORUM, ALL
•
    R+W>N
•
    Choose availability vs consistency (and latency)
Monitorable
JMX
OpsCenter
When do you need Cassandra?
•
    Ian Eure: “If you’re deploying memcache on top of your
    database, you’re inventing your own ad-hoc, difficult to
    maintain NoSQL data store”
Not Only SQL
•
    Curt Monash: “ACID-compliant transaction integrity
    commonly costs more in terms of DBMS licenses and many other
    components of TCO (Total Cost of Ownership) than [scalable
    NoSQL]. Worse, it can actually hurt application uptime,
    by forcing your system to pull in its horns and stop functioning in the
    face of failures that a non-transactional system might smoothly work
    around. Other flavors of “complexity can be a bad thing” apply as
    well. Thus, transaction integrity can be more trouble
    than it’s worth.” [Curt’s emphasis]
Keyspaces & ColumnFamilies
•
    Conceptually, like “schemas” and “tables”
Inside CFs, columns are dynamic
•
    Twitter: “Fifteen months ago, it took two
    weeks to perform ALTER TABLE on the
    statuses [tweets] table.”
ColumnFamilies
•
    Static
    –
        Object data
•
    Dynamic
    –
        Precalculated query results
“static” columnfamilies

                      Users
   zznate    Password: *    Name: Nate

   driftx    Password: *   Name: Brandon

   thobbs    Password: *    Name: Tyler

   jbellis   Password: *   Name: Jonathan   Site: riptano.com
“dynamic” columnfamilies

                     Following
zznate    driftx:   thobbs:

driftx

thobbs    zznate:

jbellis   driftx:   mdennis:   pcmanus   thobbs:   xedin:   zznate
Inserting
•
    Really “insert or update”
•
    Not a key/value store – update as much of
    the row as you want
Example: twissandra
•
    http://twissandra.com
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username VARCHAR(64),
    password VARCHAR(64)
);

CREATE TABLE following (
    user INTEGER REFERENCES user(id),
    followed INTEGER REFERENCES user(id)
);

CREATE TABLE tweets (
    id INTEGER,
    user INTEGER REFERENCES user(id),
    body VARCHAR(140),
    timestamp TIMESTAMP
);
Cassandrified
create column family users with comparator = UTF8Type
and column_metadata = [{column_name: password,
validation_class: UTF8Type}]

create column family tweets with comparator = UTF8Type
and column_metadata = [{column_name: body, validation_class:
UTF8Type}, {column_name: username, validation_class:
UTF8Type}]

create column family friends with comparator = UTF8Type
create column family followers with comparator = UTF8Type

create column family userline with comparator = LongType and
default_validation_class = UUIDType
create column family timeline with comparator = LongType and
default_validation_class = UUIDType
Connecting
CLIENT = pycassa.connect_thread_local('Twissandra')

USER = pycassa.ColumnFamily(CLIENT, 'User')
User
RowKey: ericflo
=> (column=password, value=****,
timestamp=1289446382541473)

-------------------
RowKey: jbellis
=> (column=password, value=****,
timestamp=1289446438490709)


uname = 'jericevans'
password = '**********'

columns = {'password': password}

USER.insert(uname, columns)
Natural keys vs surrogate
Friends and Followers
RowKey: ericflo

=> (column=jbellis, value=1289446467611029,
timestamp=1289446467611064)

=> (column=b6n, value=1289446467611031,
timestamp=1289446467611080)

to_uname = 'ericflo'

FRIENDS.insert(uname, {to_uname: time.time()})
FOLLOWERS.insert(to_uname, {uname: time.time()})
zznate    driftx:   thobbs:

driftx

thobbs    zznate:

jbellis   driftx:   mdenni    pcmanu   thobbs:   xedin:   zznat
                      s:        s:                          e:
Tweets
RowKey: 92dbeb50-ed45-11df-a6d0-000c29864c4f

=> (column=body, value=Four score and seven years ago,
timestamp=1289446891681799)

=> (column=username, value=alincoln,
timestamp=1289446891681799)

-------------------
RowKey: d418a66e-edc5-11df-ae6c-000c29864c4f

=> (column=body, value=Do geese see God?,
timestamp=1289501976713199)

=> (column=username, value=pdrome,
timestamp=1289501976713199)
Userline
RowKey: ericflo

=> (column=1289446393708810, value=6a0b4834-ed44-11df-
bc31-000c29864c4f, timestamp=1289446393710212)

=> (column=1289446397693831, value=6c6b5916-ed44-11df-
bc31-000c29864c4f, timestamp=1289446397694646)

=> (column=1289446891681780, value=92dbeb50-ed45-11df-
a6d0-000c29864c4f, timestamp=1289446891685065)

=> (column=1289446897315887, value=96379f92-ed45-11df-
a6d0-000c29864c4f, timestamp=1289446897317676)
Userline


zznate    1289847840615: 3f19757a-c89d...   1289847887086: a20fcf52-595c...


driftx

thobbs    1289847887086: a20fcf52-595c...


jbellis   1289847840615: 3f19757a-c89d...   128984784425: 844e75e2-b546...
Timeline
RowKey: ericflo

=> (column=1289446393708810, value=6a0b4834-ed44-11df-
bc31-000c29864c4f, timestamp=1289446393710212)

=> (column=1289446397693831, value=6c6b5916-ed44-11df-
bc31-000c29864c4f, timestamp=1289446397694646)

=> (column=1289446891681780, value=92dbeb50-ed45-11df-
a6d0-000c29864c4f, timestamp=1289446891685065)

=> (column=1289446897315887, value=96379f92-ed45-11df-
a6d0-000c29864c4f, timestamp=1289446897317676)
Adding a tweet
tweet_id = str(uuid())
body = '@ericflo thanks for Twissandra, it helps!'
timestamp = long(time.time() * 1e6)

columns = {'uname': useruuid, 'body': body}
TWEET.insert(tweet_id, columns)

columns = {ts: tweet_id}
USERLINE.insert(uname, columns)

TIMELINE.insert(uname, columns)
for follower_uname in FOLLOWERS.get(uname, 5000):
    TIMELINE.insert(follower_uname, columns)
Reads
timeline = USERLINE.get(uname, column_reversed=True)
tweets = TWEET.multiget(timeline.values())


start = request.GET.get('start')
limit = NUM_PER_PAGE

timeline = TIMELINE.get(uname, column_start=start,
column_count=limit, column_reversed=True)
tweets = TWEET.multiget(timeline.values())
Programatically
•
    Don't use thrift directly
•
    Higher level clients have a lot of features you
    want
    –
        Knowledge about data types
    –
        Connection pooling
    –
        Automatic retries
    –
        Logging
Raw thrift API: Connecting
def get_client(host='127.0.0.1', port=9170):
    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)
    transport.open()
    protocol =
TBinaryProtocol.TBinaryProtocolAccelerated(transport)
    client = Cassandra.Client(protocol)
    return client
Raw thrift API: Inserting
data = {'id': useruuid, ...}
columns = [Column(k, v, time.time())
           for (k, v) in data.items()]
mutations = [Mutation(ColumnOrSuperColumn(column=c))
             for c in columns]
rows = {useruuid: {'User': mutations}}

client.batch_mutate('Twissandra', rows,
ConsistencyLevel.ONE)
API layers
•
    libpq    •
                 Thrift
•
    JDBC     •
                 Hector
•
    JPA      •
                 Hector object-
                 mapper
Running twissandra
•
    Login: notroot/notroot
    –
        (root/riptano)


•
    cd twissandra
•
    python manage.py runserver &
•
    Navigate to http://127.0.0.1:8000
•
    Login as jim/jim, tom/tom, or create your own
One more thing
•
    !PUBLIC! userline
Exercise 1
•
    $ cassandra-cli --host localhost
•
    ] use twissandra;
    ] help;
    ] help list;
    ] help get;
    ] help del;
•
    Delete the most recent tweet
    –
        How would you find this w/o looking at the UI?
Exercise 2
•
    User jim is following user tom, but
    twissandra doesn't populate Timeline with
    tweets from before the follow action.
•
    Insert a tweet from tom before the follow
    action into jim's timeline
Secondary (column) indexes
Exercise 3
•
    Add a state column to the Tweet column
    family definition, with an index (index_type
    KEYS).
    –
        Hint: a no-op update column family on Tweet would be
        update column family Tweet with
        column_metadata=[{column_name:body,
        validation_class:UTF8Type}, {column_name:username,
        validation_class:UTF8Type}]
•
    Set the state column on several tweets to TX.
    Select them using get … where.
Language support
•
    Python
    –
        pycassa
    –
        telephus
•
    Ruby
    –
        Speed is a negative
•
    Java
    –
        Hector
•
    PHP
    –
        phpcassa
Done yet?
•
    Still doing 1+N queries per page
•
    Solution: Supercolumns
Applying SuperColumns to Twissandra

jbellis   1289847840615
            1289847844275      1289847844275     1289847887086
                                                 1289847844275
                 Id:
                  Id:               Id:
                                     Id:                Id:
                                                      Id:
          3f19757a-c89d...
              3f19757a-       844e75e2-b546...
                                 3f19757a-       a20fcf52-595c...
                                                  3f19757a-
               c89d...             c89d...          c89d...
              uname:
               uname:             uname:
                                   uname:          uname:
                                                    uname:
              zznate
               zznate              driftx
                                   zznate           zznate
                                                     zznate

                body:
                 body:            body:
                                    body:             body:
                                                       body:
          O Do geese see
            stone be not so   Rise geese see
                               Do to vote sir    Do Igeese see
                                                      prefer pi
                  ...                ...                ...
Supercolumns: limitations
•
    Requires reading an entire SC (not the entire
    row) from disk even if you just want one
    subcolumn
UUIDs
•
    Column names should be uuids, not longs,
    to avoid collisions
•
    Version 1 UUIDs can be sorted by time
    (“TimeUUID”)
•
    Any UUID can be sorted by its raw bytes
    (“LexicalUUID”)
    –
        Usually Version 4
    –
        Slightly less overhead
Lucandra
•
    What documents contain term X?
    –
        … and term Y?
    –
        … or start with Z?
Fields and Terms

<doc>
  <field name=”title”>apache talk</field>
  <field name=”date”>20110201</field>
</doc>


   feld      term       freq     position
   title    apache       1          0
   title      talk       1          1
   date    20110201      1          0
Lucandra ColumnFamilies
create column family documents with comparator = BytesType;

Create column family terminfo with column_type = Super and
comparator = BytesType and subcomparator = BytesType;
Lucandra data
Document Key      col name         value
"documentId" => { fieldName , value }

Term Key          col name         value
"field/term" => { documentId , position vector }
Lucandra queries
•
    get_slice
•
    get_range_slices
•
    No silver bullet
FAQ: counting
•
    UUIDs + batch process
•
    column-per-app-server
•
    counter API (after 1.0 is out)
Locking
•
    Zookeeper
•
    Cages: http://code.google.com/p/cages/
•
    Not suitable for multi-DC
UUIDs

counter1   672e34a2-ba33...   b681a0b1-58f2...


counter2   3f19757a-c89d...   844e75e2-b546...   a20fcf52-595c...




counter1    aggregated: 27


counter2    aggregated: 42
Column per appserver

counter1   672e34a2-ba33: 12    b681a0b1-58f2: 4   1872c1c2-38f1: 9


counter2   3f19757a-c89d: 7    844e75e2-b546: 11
Counter API

 key   counter1: (14, 13, 9)   counter2: (11, 15, 17)
General Tips
●
    Start with queries, work backwards
●
    Avoid storing extra “timestamp” columns
●
    Insert instead of check-then-insert
●
    Use client-side clock to your advantage
●
    use TTL
●
    Learn to love wide rows
Cassandra Tutorial

More Related Content

What's hot

Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
Gokhan Atil
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
SoftwareMill
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
DataStax
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
Folio3 Software
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
Dave Gardner
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
Aaron Ploetz
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
Apache Cassandra overview
Apache Cassandra overviewApache Cassandra overview
Apache Cassandra overview
ElifTech
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
Phil Peace
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
nickmbailey
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
Julien Anguenot
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
Benjamin Black
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
Eric Evans
 
Managing Objects and Data in Apache Cassandra
Managing Objects and Data in Apache CassandraManaging Objects and Data in Apache Cassandra
Managing Objects and Data in Apache Cassandra
DataStax
 
Learn Cassandra at edureka!
Learn Cassandra at edureka!Learn Cassandra at edureka!
Learn Cassandra at edureka!
Edureka!
 
Learning Cassandra
Learning CassandraLearning Cassandra
Learning Cassandra
Dave Gardner
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
DataStax
 
Apache cassandra architecture internals
Apache cassandra architecture internalsApache cassandra architecture internals
Apache cassandra architecture internals
Bhuvan Rawal
 

What's hot (20)

Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Apache Cassandra overview
Apache Cassandra overviewApache Cassandra overview
Apache Cassandra overview
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
Managing Objects and Data in Apache Cassandra
Managing Objects and Data in Apache CassandraManaging Objects and Data in Apache Cassandra
Managing Objects and Data in Apache Cassandra
 
Learn Cassandra at edureka!
Learn Cassandra at edureka!Learn Cassandra at edureka!
Learn Cassandra at edureka!
 
Learning Cassandra
Learning CassandraLearning Cassandra
Learning Cassandra
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Apache cassandra architecture internals
Apache cassandra architecture internalsApache cassandra architecture internals
Apache cassandra architecture internals
 

Similar to Cassandra Tutorial

Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
Morningstar Tech Talks
 
Alternator webinar september 2019
Alternator webinar   september 2019Alternator webinar   september 2019
Alternator webinar september 2019
Nadav Har'El
 
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
ScyllaDB
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_uploadRajini Ramesh
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Kinetica
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
Omid Vahdaty
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
Qureshi Tehmina
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester Meetup
Christopher Batey
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistency
ScyllaDB
 
Exploring KSQL Patterns
Exploring KSQL PatternsExploring KSQL Patterns
Exploring KSQL Patterns
confluent
 
Cassandra Day Chicago 2015: Building Java Applications with Apache Cassandra
Cassandra Day Chicago 2015: Building Java Applications with Apache CassandraCassandra Day Chicago 2015: Building Java Applications with Apache Cassandra
Cassandra Day Chicago 2015: Building Java Applications with Apache Cassandra
DataStax Academy
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
Bernd Ocklin
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
Patricia Gorla
 
Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Tricks
siculars
 
Data stores: beyond relational databases
Data stores: beyond relational databasesData stores: beyond relational databases
Data stores: beyond relational databases
Javier García Magna
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
Hanborq Inc.
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Tim Callaghan
 

Similar to Cassandra Tutorial (20)

Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
Alternator webinar september 2019
Alternator webinar   september 2019Alternator webinar   september 2019
Alternator webinar september 2019
 
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_upload
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Linq
LinqLinq
Linq
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester Meetup
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistency
 
Exploring KSQL Patterns
Exploring KSQL PatternsExploring KSQL Patterns
Exploring KSQL Patterns
 
Cassandra Day Chicago 2015: Building Java Applications with Apache Cassandra
Cassandra Day Chicago 2015: Building Java Applications with Apache CassandraCassandra Day Chicago 2015: Building Java Applications with Apache Cassandra
Cassandra Day Chicago 2015: Building Java Applications with Apache Cassandra
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Tricks
 
Data stores: beyond relational databases
Data stores: beyond relational databasesData stores: beyond relational databases
Data stores: beyond relational databases
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Master tuning
Master   tuningMaster   tuning
Master tuning
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
 

Recently uploaded

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Cassandra Tutorial

  • 1. Apache Cassandra in Action Jonathan Ellis jbellis@datastax.com / @spyced
  • 2. Why Cassandra? • Relational databases are not designed to scale • B-trees are slow – and require read-before-write
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. (“The eBay Architecture,” Randy Shoup and Dan Pritchett)
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. Reader Memtable Writer Commitlog The Log-Structured Merge-Tree, Bigtable: A Distributed Storage System for Structured Data
  • 15. Dynamo, 2007 Bigtable, 2006 OSS, 2008 Incubator, 2009 TLP, 2010
  • 16. Cassandra in production • Digital Reasoning: NLP + entity analytics • OpenWave: enterprise messaging • OpenX: largest publisher-side ad network in the world • Cloudkick: performance data & aggregation • SimpleGEO: location-as-API • Ooyala: video analytics and business intelligence • ngmoco: massively multiplayer game worlds
  • 17. FUD? • “Cassandra is only appropriate for unimportant data.”
  • 18. Durabilty • Write to commitlog – fsync is cheap since it’s append-only • Write to memtable • [amortized] flush memtable to sstable
  • 19. SSTable format, briefly <key 127> <key 255> <row data 0> ... <row data 1> ... <row data 127> ... <row data 255> ... Sorted [clustered] by row key
  • 21. W A T L
  • 22. W A F T L
  • 23. W A F (A-L] T L
  • 24. W A (A-F] F T (F-L] L
  • 25. Key “C” W A F T L
  • 26. Reliability • No single points of failure • Multiple datacenters • Monitorable
  • 27. Some headlines • “Resyncing Broken MySQL Replication” • “How To Repair MySQL Replication” • “Fixing Broken MySQL Database Replication” • “Replication on Linux broken after db restore” • “MySQL :: Repairing broken replication”
  • 28.
  • 29.
  • 30. Good architecture solves multiple problems at once • Availability in single datacenter • Availability in multiple datacenters
  • 31. Y Key “C” A W U F T L P
  • 32. Y Key “C” A W U F X T hint L P
  • 33. Y A W U F T L P
  • 34.
  • 35. Y Key “C” A W U F T L P
  • 36. Y Key “C” A W U F T L P
  • 37. Tuneable consistency • ONE, QUORUM, ALL • R+W>N • Choose availability vs consistency (and latency)
  • 39. JMX
  • 41. When do you need Cassandra? • Ian Eure: “If you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL data store”
  • 42. Not Only SQL • Curt Monash: “ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than [scalable NoSQL]. Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around. Other flavors of “complexity can be a bad thing” apply as well. Thus, transaction integrity can be more trouble than it’s worth.” [Curt’s emphasis]
  • 43.
  • 44. Keyspaces & ColumnFamilies • Conceptually, like “schemas” and “tables”
  • 45. Inside CFs, columns are dynamic • Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.”
  • 46. ColumnFamilies • Static – Object data • Dynamic – Precalculated query results
  • 47. “static” columnfamilies Users zznate Password: * Name: Nate driftx Password: * Name: Brandon thobbs Password: * Name: Tyler jbellis Password: * Name: Jonathan Site: riptano.com
  • 48. “dynamic” columnfamilies Following zznate driftx: thobbs: driftx thobbs zznate: jbellis driftx: mdennis: pcmanus thobbs: xedin: zznate
  • 49. Inserting • Really “insert or update” • Not a key/value store – update as much of the row as you want
  • 50. Example: twissandra • http://twissandra.com
  • 51. CREATE TABLE users ( id INTEGER PRIMARY KEY, username VARCHAR(64), password VARCHAR(64) ); CREATE TABLE following ( user INTEGER REFERENCES user(id), followed INTEGER REFERENCES user(id) ); CREATE TABLE tweets ( id INTEGER, user INTEGER REFERENCES user(id), body VARCHAR(140), timestamp TIMESTAMP );
  • 52. Cassandrified create column family users with comparator = UTF8Type and column_metadata = [{column_name: password, validation_class: UTF8Type}] create column family tweets with comparator = UTF8Type and column_metadata = [{column_name: body, validation_class: UTF8Type}, {column_name: username, validation_class: UTF8Type}] create column family friends with comparator = UTF8Type create column family followers with comparator = UTF8Type create column family userline with comparator = LongType and default_validation_class = UUIDType create column family timeline with comparator = LongType and default_validation_class = UUIDType
  • 54. User RowKey: ericflo => (column=password, value=****, timestamp=1289446382541473) ------------------- RowKey: jbellis => (column=password, value=****, timestamp=1289446438490709) uname = 'jericevans' password = '**********' columns = {'password': password} USER.insert(uname, columns)
  • 55. Natural keys vs surrogate
  • 56. Friends and Followers RowKey: ericflo => (column=jbellis, value=1289446467611029, timestamp=1289446467611064) => (column=b6n, value=1289446467611031, timestamp=1289446467611080) to_uname = 'ericflo' FRIENDS.insert(uname, {to_uname: time.time()}) FOLLOWERS.insert(to_uname, {uname: time.time()})
  • 57. zznate driftx: thobbs: driftx thobbs zznate: jbellis driftx: mdenni pcmanu thobbs: xedin: zznat s: s: e:
  • 58. Tweets RowKey: 92dbeb50-ed45-11df-a6d0-000c29864c4f => (column=body, value=Four score and seven years ago, timestamp=1289446891681799) => (column=username, value=alincoln, timestamp=1289446891681799) ------------------- RowKey: d418a66e-edc5-11df-ae6c-000c29864c4f => (column=body, value=Do geese see God?, timestamp=1289501976713199) => (column=username, value=pdrome, timestamp=1289501976713199)
  • 59. Userline RowKey: ericflo => (column=1289446393708810, value=6a0b4834-ed44-11df- bc31-000c29864c4f, timestamp=1289446393710212) => (column=1289446397693831, value=6c6b5916-ed44-11df- bc31-000c29864c4f, timestamp=1289446397694646) => (column=1289446891681780, value=92dbeb50-ed45-11df- a6d0-000c29864c4f, timestamp=1289446891685065) => (column=1289446897315887, value=96379f92-ed45-11df- a6d0-000c29864c4f, timestamp=1289446897317676)
  • 60. Userline zznate 1289847840615: 3f19757a-c89d... 1289847887086: a20fcf52-595c... driftx thobbs 1289847887086: a20fcf52-595c... jbellis 1289847840615: 3f19757a-c89d... 128984784425: 844e75e2-b546...
  • 61.
  • 62. Timeline RowKey: ericflo => (column=1289446393708810, value=6a0b4834-ed44-11df- bc31-000c29864c4f, timestamp=1289446393710212) => (column=1289446397693831, value=6c6b5916-ed44-11df- bc31-000c29864c4f, timestamp=1289446397694646) => (column=1289446891681780, value=92dbeb50-ed45-11df- a6d0-000c29864c4f, timestamp=1289446891685065) => (column=1289446897315887, value=96379f92-ed45-11df- a6d0-000c29864c4f, timestamp=1289446897317676)
  • 63. Adding a tweet tweet_id = str(uuid()) body = '@ericflo thanks for Twissandra, it helps!' timestamp = long(time.time() * 1e6) columns = {'uname': useruuid, 'body': body} TWEET.insert(tweet_id, columns) columns = {ts: tweet_id} USERLINE.insert(uname, columns) TIMELINE.insert(uname, columns) for follower_uname in FOLLOWERS.get(uname, 5000): TIMELINE.insert(follower_uname, columns)
  • 64. Reads timeline = USERLINE.get(uname, column_reversed=True) tweets = TWEET.multiget(timeline.values()) start = request.GET.get('start') limit = NUM_PER_PAGE timeline = TIMELINE.get(uname, column_start=start, column_count=limit, column_reversed=True) tweets = TWEET.multiget(timeline.values())
  • 65. Programatically • Don't use thrift directly • Higher level clients have a lot of features you want – Knowledge about data types – Connection pooling – Automatic retries – Logging
  • 66. Raw thrift API: Connecting def get_client(host='127.0.0.1', port=9170): socket = TSocket.TSocket(host, port) transport = TTransport.TBufferedTransport(socket) transport.open() protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport) client = Cassandra.Client(protocol) return client
  • 67. Raw thrift API: Inserting data = {'id': useruuid, ...} columns = [Column(k, v, time.time()) for (k, v) in data.items()] mutations = [Mutation(ColumnOrSuperColumn(column=c)) for c in columns] rows = {useruuid: {'User': mutations}} client.batch_mutate('Twissandra', rows, ConsistencyLevel.ONE)
  • 68. API layers • libpq • Thrift • JDBC • Hector • JPA • Hector object- mapper
  • 69. Running twissandra • Login: notroot/notroot – (root/riptano) • cd twissandra • python manage.py runserver & • Navigate to http://127.0.0.1:8000 • Login as jim/jim, tom/tom, or create your own
  • 70. One more thing • !PUBLIC! userline
  • 71. Exercise 1 • $ cassandra-cli --host localhost • ] use twissandra; ] help; ] help list; ] help get; ] help del; • Delete the most recent tweet – How would you find this w/o looking at the UI?
  • 72. Exercise 2 • User jim is following user tom, but twissandra doesn't populate Timeline with tweets from before the follow action. • Insert a tweet from tom before the follow action into jim's timeline
  • 74. Exercise 3 • Add a state column to the Tweet column family definition, with an index (index_type KEYS). – Hint: a no-op update column family on Tweet would be update column family Tweet with column_metadata=[{column_name:body, validation_class:UTF8Type}, {column_name:username, validation_class:UTF8Type}] • Set the state column on several tweets to TX. Select them using get … where.
  • 75. Language support • Python – pycassa – telephus • Ruby – Speed is a negative • Java – Hector • PHP – phpcassa
  • 76. Done yet? • Still doing 1+N queries per page • Solution: Supercolumns
  • 77. Applying SuperColumns to Twissandra jbellis 1289847840615 1289847844275 1289847844275 1289847887086 1289847844275 Id: Id: Id: Id: Id: Id: 3f19757a-c89d... 3f19757a- 844e75e2-b546... 3f19757a- a20fcf52-595c... 3f19757a- c89d... c89d... c89d... uname: uname: uname: uname: uname: uname: zznate zznate driftx zznate zznate zznate body: body: body: body: body: body: O Do geese see stone be not so Rise geese see Do to vote sir Do Igeese see prefer pi ... ... ...
  • 78. Supercolumns: limitations • Requires reading an entire SC (not the entire row) from disk even if you just want one subcolumn
  • 79. UUIDs • Column names should be uuids, not longs, to avoid collisions • Version 1 UUIDs can be sorted by time (“TimeUUID”) • Any UUID can be sorted by its raw bytes (“LexicalUUID”) – Usually Version 4 – Slightly less overhead
  • 80. Lucandra • What documents contain term X? – … and term Y? – … or start with Z?
  • 81. Fields and Terms <doc> <field name=”title”>apache talk</field> <field name=”date”>20110201</field> </doc> feld term freq position title apache 1 0 title talk 1 1 date 20110201 1 0
  • 82. Lucandra ColumnFamilies create column family documents with comparator = BytesType; Create column family terminfo with column_type = Super and comparator = BytesType and subcomparator = BytesType;
  • 83. Lucandra data Document Key col name value "documentId" => { fieldName , value } Term Key col name value "field/term" => { documentId , position vector }
  • 84. Lucandra queries • get_slice • get_range_slices • No silver bullet
  • 85. FAQ: counting • UUIDs + batch process • column-per-app-server • counter API (after 1.0 is out)
  • 86. Locking • Zookeeper • Cages: http://code.google.com/p/cages/ • Not suitable for multi-DC
  • 87. UUIDs counter1 672e34a2-ba33... b681a0b1-58f2... counter2 3f19757a-c89d... 844e75e2-b546... a20fcf52-595c... counter1 aggregated: 27 counter2 aggregated: 42
  • 88. Column per appserver counter1 672e34a2-ba33: 12 b681a0b1-58f2: 4 1872c1c2-38f1: 9 counter2 3f19757a-c89d: 7 844e75e2-b546: 11
  • 89. Counter API key counter1: (14, 13, 9) counter2: (11, 15, 17)
  • 90. General Tips ● Start with queries, work backwards ● Avoid storing extra “timestamp” columns ● Insert instead of check-then-insert ● Use client-side clock to your advantage ● use TTL ● Learn to love wide rows