Introduction to apache_cassandra_for_developers-lhg
 

    Presentation Transcript

      • Introduction to Apache Cassandra (for Java Developers!)
        Nate McCall @zznate
      • Overview 
        Apache Cassandra is NOT a "key/value store"
        Columns are dynamic inside a column family (but they don't have to be)
        Gain an understanding of concepts in Apache Cassandra that have a particular effect on application development
      • Brief Intro - Storage 
        SSTables are immutable. SSTables are merged on reads.
      • Brief Intro - Compaction 
        • Combine columns
        • Keep SSTable count down
        • Discard tombstones (more on this later)
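The storage and compaction slides can be pictured with a toy model. The sketch below is not Cassandra's code: the `SSTableMergeSketch` class, its `Cell` type, and the in-memory maps are all hypothetical stand-ins. It assumes a read merges every SSTable and keeps the newest timestamp per column, and that compaction does the same merge and then drops tombstones. Real Cassandra keeps tombstones until a grace period has passed, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of read-time merging and compaction (hypothetical, not Cassandra's code). */
class SSTableMergeSketch {

    /** One version of a column: a value (null marks a tombstone) and a write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        static Cell tombstone(long timestamp) {
            return new Cell(null, timestamp);
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Read path: merge immutable tables (column name -> cell), newest timestamp wins. */
    static Map<String, Cell> merge(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (a, b) -> b.timestamp > a.timestamp ? b : a);
            }
        }
        return merged;
    }

    /** Compaction: one merged table, with tombstoned columns discarded. */
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables) {
        Map<String, Cell> compacted = merge(sstables);
        compacted.values().removeIf(Cell::isTombstone);
        return compacted;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = new HashMap<>();
        older.put("city", new Cell("Austin", 1L));
        older.put("state", new Cell("TX", 1L));

        Map<String, Cell> newer = new HashMap<>();
        newer.put("city", Cell.tombstone(2L));   // a later delete of "city"
        newer.put("lat", new Cell("30.27", 2L));

        List<Map<String, Cell>> tables = List.of(older, newer);
        System.out.println(merge(tables).keySet());   // [city, lat, state]
        System.out.println(compact(tables).keySet()); // [lat, state]
    }
}
```

Before compaction the tombstone for `city` still has to be read and merged; afterwards there is one table and nothing left to reconcile.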
      • Brief Intro - The Ring
        All nodes share the same role:
        • No single point of failure
        • Easy to scale
        • Simplified operations
      • Brief Intro - Consistency Level - ONE
        Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor).
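The R + W > N rule is plain arithmetic, so a few lines are enough to check a given combination. This is a minimal sketch: `ConsistencyMath` and its method names are made up for illustration, and the quorum size is taken to be a majority of the replication factor.

```java
/** Arithmetic behind R + W > N (hypothetical helper, not part of any client). */
class ConsistencyMath {

    /** Number of replicas that form a majority for replication factor n. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** True when every read set of size r must overlap every write set of size w. */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor
        System.out.println("ONE / ONE:       " + overlaps(1, 1, n));                 // false
        System.out.println("QUORUM / QUORUM: " + overlaps(quorum(n), quorum(n), n)); // true
        System.out.println("ONE / ALL:       " + overlaps(1, n, n));                 // true
    }
}
```

With a replication factor of 3, reading and writing at ONE leaves room for a read to miss the latest write, while QUORUM on both sides (2 + 2 > 3) guarantees at least one replica in common.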
      • Brief Intro - Consistency Level – QUORUM
      • Brief Intro – Read Repair
      • vs. RDBMS - Consistency Level
        *** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK ***
        Idempotent: an operation can be applied multiple times without changing the result (except counters!)
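Because a failed consistency level is not a rollback, the usual response is to retry, and retrying is only safe when the write is idempotent. The sketch below is a toy in-memory row, not a client API: `IdempotentWriteSketch` and its methods are hypothetical. It assumes a column write carries a client-supplied timestamp and that the newest timestamp wins.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy row showing why timestamped writes can be retried and counters cannot. */
class IdempotentWriteSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> timestamps = new HashMap<>();
    private final Map<String, Long> counters = new HashMap<>();

    /** Apply a column write; an older timestamp never replaces a newer one. */
    void write(String name, String value, long timestamp) {
        Long current = timestamps.get(name);
        if (current == null || timestamp >= current) {
            values.put(name, value);
            timestamps.put(name, timestamp);
        }
    }

    /** Apply a counter increment; every application changes the total. */
    void increment(String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }

    String read(String name) {
        return values.get(name);
    }

    long readCounter(String name) {
        return counters.getOrDefault(name, 0L);
    }

    public static void main(String[] args) {
        IdempotentWriteSketch row = new IdempotentWriteSketch();

        row.write("city", "Austin", 100L);
        row.write("city", "Austin", 100L); // retry after a timeout: same result
        System.out.println(row.read("city")); // Austin

        row.increment("hits", 1L);
        row.increment("hits", 1L); // retry after a timeout: counted twice
        System.out.println(row.readCounter("hits")); // 2
    }
}
```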
      • vs. RDBMS - Append Only
        Proper data modeling will minimize seeks. No read before write. (Go to Matt's presentation for more!)
      • How does this impact development?
        Substantially.
        • For operations affecting the same data, that data will become consistent eventually, as determined by the timestamps.
        • Trade availability for consistency.
        • Store whatever you want. It's all just bytes.
        • Think about how you will query the data before you write it.
      • Neat. So Now What?
        Like any database, you need a client!
        • Python:
          • Telephus:  http://github.com/driftx/Telephus  (Twisted)
          • Pycassa:  http://github.com/pycassa/pycassa
        • Java:
          • Hector:  http://github.com/rantav/hector  (Examples:  https://github.com/zznate/hector-examples)
          • Pelops:  http://github.com/s7/scale7-pelops
          • Kundera:  http://code.google.com/p/kundera/
          • Datanucleus JDO:  http://github.com/tnine/Datanucleus-Cassandra-Plugin
        • Grails:
          • grails-cassandra:  https://github.com/wolpert/grails-cassandra
        • .NET:
          • FluentCassandra:  http://github.com/managedfusion/fluentcassandra
          • Aquiles:  http://aquiles.codeplex.com/
        • Ruby:
          • Cassandra:  http://github.com/fauna/cassandra
        • PHP:
          • phpcassa:  http://github.com/thobbs/phpcassa
          • SimpleCassie:  http://code.google.com/p/simpletools-php/wiki/SimpleCassie
        ... but do not roll your own
      • Thrift
        • Fast, efficient serialization and network IO. 
        • Lots of clients available (you can probably use it in other places as well)
        Why you don't want to work with the Thrift API directly:
        • SuperColumn
        • ColumnOrSuperColumn (don't forget Counters!)
        • ColumnParent.super_column
        • ColumnPath.super_column
        • Map<ByteBuffer,Map<String,List<Mutation>>> mutationMap 
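To get a feel for the last item, here is what filling a map of that shape looks like with only the JDK. This is a sketch: the `Mutation` class below is a hypothetical two-field stand-in for the Thrift-generated type (which wraps a ColumnOrSuperColumn or a Deletion), and `add` and `bytes` are made-up helpers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Builds a row key -> column family -> mutations map by hand. */
class MutationMapSketch {

    /** Hypothetical stand-in for the Thrift-generated Mutation class. */
    static final class Mutation {
        final String columnName;
        final String value;

        Mutation(String columnName, String value) {
            this.columnName = columnName;
            this.value = value;
        }
    }

    /** Row keys travel as bytes, so every String key has to be wrapped. */
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Add one mutation, creating the two nested levels on demand. */
    static void add(Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap,
                    String rowKey, String columnFamily, Mutation mutation) {
        mutationMap
                .computeIfAbsent(bytes(rowKey), k -> new HashMap<>())
                .computeIfAbsent(columnFamily, cf -> new ArrayList<>())
                .add(mutation);
    }

    public static void main(String[] args) {
        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<>();
        add(mutationMap, "650222", "Npanxx", new Mutation("city", "Burlingame"));
        add(mutationMap, "650222", "Npanxx", new Mutation("state", "CA"));
        add(mutationMap, "CA Burlingame", "StateCity", new Mutation("650", "37.57x122.34"));

        System.out.println(mutationMap.size());                                     // 2
        System.out.println(mutationMap.get(bytes("650222")).get("Npanxx").size()); // 2
    }
}
```

A higher-level client hides this bookkeeping behind a call per column, which is the point of the next slide.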
      • Higher Level Clients
        Hector
        • JMX Counters
        • Add/remove hosts:
          • automatically 
          • programmatically
          • via JMX
        • Pluggable load balancing
        • Complete encapsulation of Thrift API
        • Type-safe approach to dealing with Apache Cassandra
        • Lightweight ORM (supports JPA 1.0 annotations)
        • JPA support: https://github.com/riptano/hector-jpa
        • Mavenized!  http://repo2.maven.org/maven2/me/prettyprint/
      • “CQL”
        • Viable alternative as of 0.8.0 
        • JDBC Driver implementation means lots of possibilities
        • Encapsulate API changes
        • In-tree support on the way for:
          • DataSource
          • Pooling
      • Avro, etc??
        Gone. Added too much complexity after Thrift caught up. “None of the libraries distinguished themselves as being a particularly crappy choice for serialization.” (See CASSANDRA-1765)
      • Thrift API Methods
        Five general categories:
        • Retrieving
        • Writing/Updating/Removing (all the same op!)
          • Increment counters
        • Meta Information
        • Schema Manipulation
        • CQL Execution
      • On to the Code...
        https://github.com/zznate/cassandra-tutorial
        • Uses Maven.
        • Really basic.
        • Modify/abuse/alter as needed.
        • Descriptions of what is going on and how to run each example are in the Javadoc comments.
        • Sample data is based on the North American Numbering Plan (easy to find thanks to InfoChimps): http://infochimps.com/datasets/area-code-and-exchange-to-location-north-america-npanxx
      • Data Shape
        512 202 30.27 097.74 W TX Austin
        512 203 30.27 097.74 L TX Austin
        512 204 30.32 097.73 W TX Austin
        512 205 30.32 097.73 W TX Austin
        512 206 30.32 097.73 L TX Austin
      • Get a Single Column for a Key
        GetCityForNpanxx.java
        columnQuery.setColumnFamily("Npanxx");
        columnQuery.setKey("512204");
        columnQuery.setName("city");
      • Get the Contents of a Row
        GetSliceForNpanxx.java
        sliceQuery.setColumnFamily("Npanxx");
        sliceQuery.setKey("512202");
        sliceQuery.setColumnNames("city", "state", "lat", "lng");
      • Get the (sorted!) Columns of a Row 
        GetSliceForStateCity.java
        sliceQuery.setColumnFamily("StateCity");
        sliceQuery.setKey("TX Austin");
        sliceQuery.setRange(202L, 204L, false, 5);
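The range call above reads as (start, finish, reversed, count) over column names that are kept sorted by the comparator. A `TreeMap` gives a rough local picture of what comes back. This is only a sketch: `SortedSliceSketch` and `slice` are hypothetical, both range ends are treated as inclusive, and a reversed slice is assumed to take the larger name as its start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Local imitation of slicing a row whose column names are sorted longs. */
class SortedSliceSketch {

    /** Return up to count column names between start and finish, in slice order. */
    static List<Long> slice(NavigableMap<Long, String> row,
                            long start, long finish, boolean reversed, int count) {
        NavigableMap<Long, String> range = reversed
                ? row.subMap(finish, true, start, true).descendingMap()
                : row.subMap(start, true, finish, true);
        List<Long> names = new ArrayList<>();
        for (Long name : range.keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Column names are exchanges, values are "lat x lng" strings from the sample data.
        NavigableMap<Long, String> txAustin = new TreeMap<>();
        txAustin.put(202L, "30.27x097.74");
        txAustin.put(203L, "30.27x097.74");
        txAustin.put(204L, "30.32x097.73");
        txAustin.put(205L, "30.32x097.73");
        txAustin.put(206L, "30.32x097.73");

        System.out.println(slice(txAustin, 202L, 204L, false, 5)); // [202, 203, 204]
        System.out.println(slice(txAustin, 204L, 202L, true, 2));  // [204, 203]
    }
}
```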
      • Get the Same Slice from Several Rows
        MultigetSliceForNpanxx.java
        multigetSlicesQuery.setColumnFamily("Npanxx");
        multigetSlicesQuery.setColumnNames("city", "state", "lat", "lng");
        multigetSlicesQuery.setKeys("512202", "512203", "512205", "512206");
      • Get Slices From a Range of Rows
        GetRangeSlicesForStateCity.java
        The results of this query will be significantly more meaningful with OrderPreservingPartitioner (try this at home!)
        rangeSlicesQuery.setColumnFamily("Npanxx");
        rangeSlicesQuery.setColumnNames("city", "state", "lat", "lng");
        rangeSlicesQuery.setKeys("512202", "512205");
        rangeSlicesQuery.setRowCount(5);
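The remark about OrderPreservingPartitioner is easier to see side by side. With a hash-based partitioner such as RandomPartitioner, rows are ordered by a token derived from the key rather than by the key itself, so a key range like "512202" to "512205" does not walk the keys in between in order. The sketch below only approximates that: `PartitionerOrderSketch` is hypothetical, and using the absolute value of the MD5 digest as the token is a simplifying assumption.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Compares key order with hashed-token order for the sample row keys. */
class PartitionerOrderSketch {

    /** Assumed token function: the MD5 digest of the key as a non-negative integer. */
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key.getBytes(StandardCharsets.UTF_8))).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
    }

    /** Order an order-preserving partitioner would give: the keys themselves. */
    static List<String> keyOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    /** Order a hash-based partitioner would give: by token, unrelated to key order. */
    static List<String> tokenOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(PartitionerOrderSketch::md5Token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("512205", "512202", "512206", "512203");
        System.out.println("by key:   " + keyOrder(keys));
        System.out.println("by token: " + tokenOrder(keys));
    }
}
```

Running it prints the same four keys in two orders, which is why a row range is mainly useful for paging under a hashing partitioner.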
      • Get Slices From a Range of Rows - 2
        GetSliceForAreaCodeCity.java
        sliceQuery.setKey("512");
        sliceQuery.setRange("Austin", "Austin__204", false, 5);
        Bonus: DynamicComparator and DynamicComposite (Ed's talk)
      • Get Slices from Indexed Columns
        GetIndexedSlicesForCityState.java
        isq.setColumnFamily("Npanxx");
        isq.setColumnNames("city", "lat", "lng");
        isq.addEqualsExpression("state", "TX");
        isq.addEqualsExpression("city", "Austin");
        isq.addGteExpression("lat", "30.30");
        You only need to index a single column to apply clauses on other columns
      • Insert, Update and Delete
        ... are effectively the same operation:
        Application of columns to a row
      • Insertion
        InsertRowsForColumnFamilies.java
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));
        Add insertions to the other two column families to the same mutation:
        mutator.addInsertion("CA Burlingame", "StateCity",
            HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
        mutator.addInsertion("650", "AreaCode",
            HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
      • Deletion
        DeleteRowsForColumnFamily.java
        Record level:
        mutator.addDeletion("650222", "Npanxx", "city", stringSerializer);
        Or row level:
        mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
        mutator.addDeletion("650", "AreaCode", null, stringSerializer);
        mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
      • Deletion
        [default@Tutorial] list StateCity;
        Using default limit of 100
        -------------------
        RowKey: CA Burlingame
        => (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
        -------------------
        RowKey: TX Austin
        => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
        => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
        => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
        => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
        => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
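The values in this listing are the stored bytes printed as hex, presumably because the column family has no value validator telling the CLI how to render them. Decoding them as UTF-8 gives back the "lat x lng" strings that were inserted. A minimal sketch, with `HexValueDecoder` as a made-up helper name:

```java
import java.nio.charset.StandardCharsets;

/** Decodes the hex strings shown by the CLI back into the text that was stored. */
class HexValueDecoder {

    /** Turn an even-length hex string into bytes and read them as UTF-8. */
    static String decodeUtf8(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8("33372e3537783132322e3334")); // 37.57x122.34
        System.out.println(decodeUtf8("33302e3237783039372e3734")); // 30.27x097.74
        System.out.println(decodeUtf8("33302e3332783039372e3733")); // 30.32x097.73
    }
}
```

The first value is the one written for "CA Burlingame" on the insertion slide, and the other two match the Austin rows in the sample data.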
      • Deletion
        [default@Tutorial] list StateCity;
        Using default limit of 100
        -------------------
        RowKey: CA Burlingame
        -------------------
        RowKey: TX Austin
        => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
        => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
        => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
        => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
        => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
      • Deletion - FYI
        Sending a deletion for a non-existing row:
        mutator.addDeletion("202230", "Npanxx", "city", stringSerializer);
        [default@Tutorial] list Npanxx;
        Using default limit of 100
        . . .
        -------------------
        RowKey: 202230
        -------------------
        . . .
        You just inserted a tombstone!
      • ColumnFamilyTemplate
        ColumnFamilyUpdater<String,String> updater = template.createUpdater("cskey1");
        updater.setString("stringval", "value1");
        updater.setDate("curdate", date);
        updater.setLong("longval", 5L);
        template.update(updater);
        template.addColumn("stringval", se);
        template.addColumn("curdate", DateSerializer.get());
        template.addColumn("longval", LongSerializer.get());
        ColumnFamilyResult wrapper = template.queryColumns("cskey1");
        Template method design pattern
        https://github.com/rantav/hector/wiki/Getting-started-%285-minutes%29
      • Development Resources
        • Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/
        • CCM (localhost Cassandra cluster): https://github.com/pcmanus/ccm
        • OpsCenter: http://www.datastax.com/products/opscenter
        • Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
      • Stuff I Punted on for the Sake of Brevity
        • meta_* methods: CassandraClusterTest.java: L43-81 @ hector
        • system_* methods: SchemaManipulation.java @ hector-examples, CassandraClusterTest.java: L84-157 @ hector
        • ORM (it works and is in production): https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
        • Multiple nodes and failure scenarios
        • Data modeling (go see Matt's presentation)
      • Things to Remember
        • deletes and timestamp granularity
        • “range ghosts” and “tombstones”
        • using the wrong column comparator, key/default validators and InvalidRequestException
        • “Schema-less” -> “Schema Optional”
        • use column-level TTL to automate deletion
        • "how do I iterate over all the rows in a column family"?
          • get_range_slices, but don't do that
          • a good sign your data model is wrong
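The TTL item can be pictured with a small model: a column is written together with a time-to-live in seconds and simply stops being returned once that time has passed, with no explicit delete from the application. Everything in the sketch is hypothetical, including `TtlColumnSketch`, the explicit clock argument, and the choice to treat a column as expired at exactly its expiry second.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy row whose columns expire on their own after a time-to-live. */
class TtlColumnSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAtSeconds = new HashMap<>();

    /** Write a column that should disappear ttlSeconds after nowSeconds. */
    void insert(String name, String value, long nowSeconds, int ttlSeconds) {
        values.put(name, value);
        expiresAtSeconds.put(name, nowSeconds + ttlSeconds);
    }

    /** Read a column as of nowSeconds; expired columns look deleted. */
    Optional<String> read(String name, long nowSeconds) {
        Long expiresAt = expiresAtSeconds.get(name);
        if (expiresAt == null || nowSeconds >= expiresAt) {
            return Optional.empty();
        }
        return Optional.of(values.get(name));
    }

    public static void main(String[] args) {
        TtlColumnSketch row = new TtlColumnSketch();
        row.insert("session", "abc123", 1_000L, 60);

        System.out.println(row.read("session", 1_030L).isPresent()); // true
        System.out.println(row.read("session", 1_060L).isPresent()); // false
    }
}
```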
      • Questions?