Introduction to apache_cassandra_for_developers-lhg
Presentation Transcript

    • Introduction to  Apache Cassandra (for Java Developers!)
      Nate McCall [email_address] @zznate
    • Overview 
      Apache Cassandra is NOT a "key/value store"
      Columns are dynamic inside a column family (but they don't have to be)
      Gain an understanding of concepts in Apache Cassandra that have a particular effect on application development
    • Brief Intro - Storage 
      SSTables are immutable
      SSTables merged on reads
    • Brief Intro - Compaction 
      Combine columns
      Keep SSTable count down
      Discard tombstones (more on this later)
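      The compaction points above can be pictured with a minimal sketch. It is not Cassandra's implementation: `CompactionSketch` and `Cell` are hypothetical names, a null value stands in for a tombstone, and tombstones are dropped unconditionally, whereas real compaction keeps them until the configured grace period has passed.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: merge two immutable "SSTables" holding columns of one row.
final class CompactionSketch {
    /** One stored cell: a value (null marks a tombstone) plus the write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Combines columns (highest timestamp wins, ties go to `second`), then discards tombstones. */
    static TreeMap<String, Cell> compact(Map<String, Cell> first, Map<String, Cell> second) {
        TreeMap<String, Cell> merged = new TreeMap<>(first);
        for (Map.Entry<String, Cell> e : second.entrySet()) {
            Cell existing = merged.get(e.getKey());
            if (existing == null || e.getValue().timestamp >= existing.timestamp) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        // Simplification: a real system only drops tombstones that are old enough.
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> first = new TreeMap<>();
        first.put("city", new Cell("Austin", 1L));
        Map<String, Cell> second = new TreeMap<>();
        second.put("city", new Cell(null, 2L)); // a later delete of "city"
        System.out.println(compact(first, second).keySet());
    }
}
```

      Two inputs become one output, so later reads have fewer files to merge.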
    • Brief Intro - The Ring
      All nodes share the same role:
    • No single point of failure
    • Easy to scale
    • Simplified operations
    • Brief Intro - Consistency Level - ONE
      Cassandra provides consistency when R + W > N
    • (read replica count + write replica count > replication factor).
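      The R + W > N rule is plain arithmetic, so it can be checked in a few lines. This is a minimal sketch; the class and method names are made up for illustration, and quorum is taken as a majority of the replicas.

```java
// Hypothetical helper for reasoning about the R + W > N rule.
final class ConsistencyRule {
    private ConsistencyRule() {}

    /** Number of replicas that form a majority for the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when the replicas read and the replicas written must overlap in at least one node. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println("QUORUM reads + QUORUM writes: "
                + readsSeeLatestWrite(quorum(n), quorum(n), n));
        System.out.println("ONE read + ONE write: " + readsSeeLatestWrite(1, 1, n));
    }
}
```

      With a replication factor of 3, QUORUM is 2, so QUORUM reads plus QUORUM writes satisfy the rule (2 + 2 > 3) while ONE plus ONE does not (1 + 1 = 2).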
    • Brief Intro - Consistency Level – QUORUM
    • Brief Intro – Read Repair
    • vs. RDBMS - Consistency Level
      *** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK ***
      Idempotent: an operation can be applied multiple times without changing the result (except counters!)
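      Why retrying a failed write is safe for regular columns but not for counters can be shown with a small in-memory model. This is only a sketch under stated assumptions: `RetrySketch` is a hypothetical class, not part of any client, and it models a single replica.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-replica model of timestamped columns and counters.
final class RetrySketch {
    final Map<String, String> values = new HashMap<>();
    final Map<String, Long> timestamps = new HashMap<>();
    final Map<String, Long> counters = new HashMap<>();

    /** Timestamped column write: replaying the same write leaves the row unchanged. */
    void write(String column, String value, long timestamp) {
        Long current = timestamps.get(column);
        if (current == null || timestamp >= current) {
            values.put(column, value);
            timestamps.put(column, timestamp);
        }
    }

    /** Counter increment: every replay adds again, so a blind retry can over-count. */
    void increment(String counter, long delta) {
        counters.merge(counter, delta, Long::sum);
    }

    public static void main(String[] args) {
        RetrySketch row = new RetrySketch();
        row.write("city", "Austin", 10L);
        row.write("city", "Austin", 10L); // retry after a timeout: same end state
        row.increment("hits", 1L);
        row.increment("hits", 1L);        // retry after a timeout: counted twice
        System.out.println(row.values + " " + row.counters);
    }
}
```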
    • vs. RDBMS - Append Only
      Proper data modeling will minimize seeks
      No read before write
      (Go to Matt's presentation for more!)
    • How does this impact development?
      Substantially.
      For operations affecting the same data, that data will become consistent eventually, as determined by the timestamps.
      Trade availability for consistency.
      Store whatever you want. It's all just bytes.
      Think about how you will query the data before you write it.
    • Neat. So Now What?
      Like any database, you need a client!
      • Python:
        • Telephus:  http://github.com/driftx/Telephus  (Twisted)
        • Pycassa:  http://github.com/pycassa/pycassa
      • Java:
        • Hector:  http://github.com/rantav/hector  (Examples  https://github.com/zznate/hector-examples  )
        • Pelops:  http://github.com/s7/scale7-pelops
        • Kundera:  http://code.google.com/p/kundera/
        • Datanucleus JDO:  http://github.com/tnine/Datanucleus-Cassandra-Plugin
      • Grails:
        • grails-cassandra:  https://github.com/wolpert/grails-cassandra
      • .NET:
        • FluentCassandra:  http://github.com/managedfusion/fluentcassandra
        • Aquiles:  http://aquiles.codeplex.com/
      • Ruby:
        • Cassandra:  http://github.com/fauna/cassandra
      • PHP:
        • phpcassa:  http://github.com/thobbs/phpcassa
        • SimpleCassie:  http://code.google.com/p/simpletools-php/wiki/SimpleCassie
    • ... but do not roll your own
    • Thrift
      • Fast, efficient serialization and network IO. 
      • Lots of clients available (you can probably use it in other places as well)
      Why you don't want to work with the Thrift API directly:
      • SuperColumn
      • ColumnOrSuperColumn (don't forget Counters!)
      • ColumnParent.super_column
      • ColumnPath.super_column
      • Map<ByteBuffer,Map<String,List<Mutation>>> mutationMap 
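      To give a feel for the nested mutation map listed above, here is a minimal sketch of filling one by hand. The `Mutation` class below is a hypothetical stand-in, not the Thrift-generated type, and `MutationMapSketch` is an invented name; only the map shape comes from the slide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the bookkeeping behind row key -> column family -> mutations.
final class MutationMapSketch {
    /** Stand-in for the generated Mutation type (an assumption for illustration). */
    static final class Mutation {
        final String columnName;
        final String value;

        Mutation(String columnName, String value) {
            this.columnName = columnName;
            this.value = value;
        }
    }

    /** Row keys travel as bytes, so every key has to be wrapped first. */
    static ByteBuffer key(String rowKey) {
        return ByteBuffer.wrap(rowKey.getBytes(StandardCharsets.UTF_8));
    }

    /** Adds one mutation, creating the intermediate map and list when they are missing. */
    static void add(Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap,
                    String rowKey, String columnFamily, Mutation mutation) {
        mutationMap
                .computeIfAbsent(key(rowKey), k -> new HashMap<>())
                .computeIfAbsent(columnFamily, cf -> new ArrayList<>())
                .add(mutation);
    }

    public static void main(String[] args) {
        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<>();
        add(mutationMap, "650222", "Npanxx", new Mutation("lat", "37.57"));
        add(mutationMap, "650222", "Npanxx", new Mutation("lng", "122.34"));
        System.out.println(mutationMap.size() + " row(s) queued");
    }
}
```

      A higher-level client hides exactly this kind of plumbing.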
    • Higher Level Clients
      Hector
      • JMX Counters
      • Add/remove hosts:
        • automatically 
        • programmatically
        • via JMX
      • Pluggable load balancing
      • Complete encapsulation of Thrift API
      • Type-safe approach to dealing with Apache Cassandra
      • Lightweight ORM (supports JPA 1.0 annotations)
      • JPA support: https://github.com/riptano/hector-jpa
      • Mavenized!  http://repo2.maven.org/maven2/me/prettyprint/
    • “CQL”
      • Viable alternative as of 0.8.0 
      • JDBC Driver implementation means lots of possibilities
      • Encapsulate API changes
      • In-tree support on the way for:
        • DataSource
        • Pooling
    • Avro, etc??
      Gone. Added too much complexity after Thrift caught up.
      “None of the libraries distinguished themselves as being a particularly crappy choice for serialization.” (See CASSANDRA-1765)
    • Thrift API Methods
      Five general categories
    • Retrieving
    • Writing/Updating/Removing (all the same op!)
      • Increment counters
    • Meta Information
    • Schema Manipulation
    • CQL Execution
    • On to the Code...
      https://github.com/zznate/cassandra-tutorial
      Uses Maven. Really basic. Modify/abuse/alter as needed.
      Descriptions of what is going on and how to run each example are in the Javadoc comments.
      Sample data is based on the North American Numbering Plan (easy to find thanks to InfoChimps):
      http://infochimps.com/datasets/area-code-and-exchange-to-location-north-america-npanxx
    • Data Shape
      512 202 30.27 097.74 W TX Austin
      512 203 30.27 097.74 L TX Austin
      512 204 30.32 097.73 W TX Austin
      512 205 30.32 097.73 W TX Austin
      512 206 30.32 097.73 L TX Austin
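      One way to see how a sample record turns into row keys for the three column families used in the examples (Npanxx, StateCity, AreaCode) is the sketch below. `NpanxxRecord` is a hypothetical class, not code from the tutorial; the key formats are inferred from the keys that appear in the queries ("512204", "TX Austin", "512"), and the fifth field (W or L in the sample) is not interpreted.

```java
// Hypothetical parser for one line of the sample data.
final class NpanxxRecord {
    final String npa;   // area code, e.g. 512
    final String nxx;   // exchange, e.g. 204
    final String lat;
    final String lng;
    final String state;
    final String city;

    private NpanxxRecord(String npa, String nxx, String lat, String lng, String state, String city) {
        this.npa = npa;
        this.nxx = nxx;
        this.lat = lat;
        this.lng = lng;
        this.state = state;
        this.city = city;
    }

    /** Splits on whitespace; the limit of 7 keeps multi-word city names in one field. */
    static NpanxxRecord parse(String line) {
        String[] f = line.trim().split("\\s+", 7);
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields: " + line);
        }
        // f[4] is the W/L flag from the sample; its meaning is not assumed here.
        return new NpanxxRecord(f[0], f[1], f[2], f[3], f[5], f[6]);
    }

    String npanxxKey() { return npa + nxx; }
    String stateCityKey() { return state + " " + city; }
    String areaCodeKey() { return npa; }

    public static void main(String[] args) {
        NpanxxRecord r = parse("512 204 30.32 097.73 W TX Austin");
        System.out.println(r.npanxxKey() + " | " + r.stateCityKey() + " | " + r.areaCodeKey());
    }
}
```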
    • Get a Single Column for a Key
      GetCityForNpanxx.java
      columnQuery.setColumnFamily("Npanxx");
      columnQuery.setKey("512204");
      columnQuery.setName("city");
    • Get the Contents of a Row
      GetSliceForNpanxx.java
      sliceQuery.setColumnFamily("Npanxx");
      sliceQuery.setKey("512202");
      sliceQuery.setColumnNames("city","state","lat","lng");
    • Get the (sorted!) Columns of a Row 
      GetSliceForStateCity.java
      sliceQuery.setColumnFamily("StateCity");
      sliceQuery.setKey("TX Austin");
      sliceQuery.setRange(202L, 204L, false, 5);
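      What a range slice over sorted columns returns can be modeled with a `TreeMap`. This is a minimal sketch and does not call Hector: `SliceSketch` is a hypothetical name, and it assumes both range ends are inclusive and that a reversed slice is given its start above its finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a column range query over one sorted row.
final class SliceSketch {
    /** Up to `count` columns between start and finish (both inclusive), optionally reversed. */
    static <N, V> List<Map.Entry<N, V>> slice(
            NavigableMap<N, V> row, N start, N finish, boolean reversed, int count) {
        NavigableMap<N, V> range = reversed
                ? row.subMap(finish, true, start, true).descendingMap()
                : row.subMap(start, true, finish, true);
        List<Map.Entry<N, V>> out = new ArrayList<>();
        for (Map.Entry<N, V> e : range.entrySet()) {
            if (out.size() == count) {
                break;
            }
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Columns of the "TX Austin" row: exchange -> "lat x lng".
        TreeMap<Long, String> row = new TreeMap<>();
        row.put(202L, "30.27x097.74");
        row.put(203L, "30.27x097.74");
        row.put(204L, "30.32x097.73");
        row.put(205L, "30.32x097.73");
        row.put(206L, "30.32x097.73");
        System.out.println(slice(row, 202L, 204L, false, 5));
    }
}
```

      Because the columns are kept sorted by name, the range costs one seek plus a short scan, which is why the comparator choice matters.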
    • Get the Same Slice from Several Rows
      MultigetSliceForNpanxx.java
      multigetSlicesQuery.setColumnFamily("Npanxx");
      multigetSlicesQuery.setColumnNames("city","state","lat","lng");
      multigetSlicesQuery.setKeys("512202","512203","512205","512206");
    • Get Slices From a Range of Rows
      GetRangeSlicesForStateCity.java
      The results of this query will be significantly more meaningful with OrderPreservingPartitioner (try this at home!)
      rangeSlicesQuery.setColumnFamily("Npanxx");
      rangeSlicesQuery.setColumnNames("city","state","lat","lng");
      rangeSlicesQuery.setKeys("512202", "512205");
      rangeSlicesQuery.setRowCount(5);
    • Get Slices From a Range of Rows - 2
      GetSliceForAreaCodeCity.java
      Bonus: DynamicComparator and DynamicComposite (Ed's talk)
      sliceQuery.setKey("512");
      sliceQuery.setRange("Austin", "Austin__204", false, 5);
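      The string range above relies on column names that pack two fields into one sortable string. The following is a rough sketch of that naming trick, assuming names of the form city + "__" + exchange and plain string ordering; `CompositeNameSketch` is a hypothetical class and nothing here uses Hector's composite types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of hand-built composite column names in one area-code row.
final class CompositeNameSketch {
    static final String SEPARATOR = "__";

    static String columnName(String city, String exchange) {
        return city + SEPARATOR + exchange;
    }

    /** Column names between start and finish (both inclusive), at most `count`, in string order. */
    static List<String> names(TreeMap<String, String> row, String start, String finish, int count) {
        List<String> out = new ArrayList<>();
        for (String name : row.subMap(start, true, finish, true).keySet()) {
            if (out.size() == count) {
                break;
            }
            out.add(name);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> row = new TreeMap<>(); // the "512" row
        for (String nxx : new String[] {"202", "203", "204", "205", "206"}) {
            row.put(columnName("Austin", nxx), "");
        }
        System.out.println(names(row, "Austin", "Austin__204", 5));
    }
}
```

      One caveat of string-packed names: ordering is lexicographic, so "Austin__1000" would sort before "Austin__204". Fixed-width fields, or a real composite comparator, avoid that.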
    • Get Slices from Indexed Columns
      GetIndexedSlicesForCityState.java
      You only need to index a single column to apply clauses on other columns
      isq.setColumnFamily("Npanxx");
      isq.setColumnNames("city","lat","lng");
      isq.addEqualsExpression("state", "TX");
      isq.addEqualsExpression("city", "Austin");
      isq.addGteExpression("lat", "30.30");
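      The idea that one indexed column is enough can be sketched as "use the index to find candidate rows, then filter the rest in place". The model below is an illustration only, not how Cassandra implements secondary indexes: `IndexedSliceSketch` is a hypothetical class, the index covers just the "state" column, and re-inserting a row with a different state is not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical model: an equality lookup on one indexed column plus per-row filters.
final class IndexedSliceSketch {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();
    /** Index on the "state" column: column value -> row keys. */
    private final Map<String, TreeSet<String>> stateIndex = new HashMap<>();

    void insert(String key, Map<String, String> columns) {
        rows.put(key, columns);
        String state = columns.get("state");
        if (state != null) {
            stateIndex.computeIfAbsent(state, s -> new TreeSet<>()).add(key);
        }
    }

    /** The index narrows the candidates; the remaining clauses are checked row by row. */
    List<String> query(String state, List<Predicate<Map<String, String>>> otherClauses) {
        List<String> matches = new ArrayList<>();
        for (String key : stateIndex.getOrDefault(state, new TreeSet<>())) {
            Map<String, String> columns = rows.get(key);
            boolean ok = true;
            for (Predicate<Map<String, String>> clause : otherClauses) {
                if (!clause.test(columns)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        IndexedSliceSketch cf = new IndexedSliceSketch();
        Map<String, String> columns = new HashMap<>();
        columns.put("state", "TX");
        columns.put("city", "Austin");
        columns.put("lat", "30.32");
        cf.insert("512204", columns);
        List<Predicate<Map<String, String>>> clauses = new ArrayList<>();
        clauses.add(c -> "Austin".equals(c.get("city")));
        System.out.println(cf.query("TX", clauses));
    }
}
```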
    • Insert, Update and Delete
      ... are effectively the same operation:
    • Application of columns to a row
    • Insertion
      InsertRowsForColumnFamilies.java
      mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
      mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
      mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
      mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));
      Add insertions to the other two column families to the same mutation:
      mutator.addInsertion("CA Burlingame", "StateCity",
          HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
      mutator.addInsertion("650", "AreaCode",
          HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
    • Deletion
      DeleteRowsForColumnFamily.java
      mutator.addDeletion("650222", "Npanxx", "city", stringSerializer);
      mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
      mutator.addDeletion("650", "AreaCode", null, stringSerializer);
      mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
      Or row level
      Record Level
    • Deletion
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      => (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
    • Deletion
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
    • Deletion - FYI
      Sending a deletion for a non-existing row:
      mutator.addDeletion("202230", "Npanxx", "city", stringSerializer);
      [default@Tutorial] list Npanxx;
      Using default limit of 100
      . . .
      -------------------
      RowKey: 202230
      -------------------
      . . .
      You just inserted a tombstone!
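      Why a delete for a row that never existed still shows up as an empty row key can be illustrated with a toy model. This is a sketch under stated assumptions, not Cassandra's storage engine: `TombstoneSketch` is a hypothetical class and a null value stands in for a tombstone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: deletes are writes that store a tombstone marker.
final class TombstoneSketch {
    /** Row key -> column name -> value; a null value stands for a tombstone. */
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void insert(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    /** Records a tombstone without reading first, even if the row never existed. */
    void delete(String key, String column) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, null);
    }

    /** Range scan: every key with a stored cell is returned, but only live columns are listed. */
    Map<String, List<String>> list() {
        Map<String, List<String>> out = new TreeMap<>();
        for (Map.Entry<String, TreeMap<String, String>> row : rows.entrySet()) {
            List<String> live = new ArrayList<>();
            for (Map.Entry<String, String> c : row.getValue().entrySet()) {
                if (c.getValue() != null) {
                    live.add(c.getKey());
                }
            }
            out.put(row.getKey(), live);
        }
        return out;
    }

    public static void main(String[] args) {
        TombstoneSketch cf = new TombstoneSketch();
        cf.delete("202230", "city"); // no such row was ever written
        System.out.println(cf.list());
    }
}
```

      Client code that iterates a range should therefore expect keys with no columns and skip them.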
    • ColumnFamilyTemplate
      ColumnFamilyUpdater<String,String> updater =
          template.createUpdater("cskey1");
      updater.setString("stringval", "value1");
      updater.setDate("curdate", date);
      updater.setLong("longval", 5L);
      template.update(updater);
      template.addColumn("stringval", se);
      template.addColumn("curdate", DateSerializer.get());
      template.addColumn("longval", LongSerializer.get());
      ColumnFamilyResult wrapper = template.queryColumns("cskey1");
      Template method design pattern
      https://github.com/rantav/hector/wiki/Getting-started-%285-minutes%29
    • Development Resources
      Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/
      CCM localhost cassandra cluster: https://github.com/pcmanus/ccm
      OpsCenter: http://www.datastax.com/products/opscenter
      Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
    • Stuff I Punted on for the Sake of Brevity
      meta_* methods: CassandraClusterTest.java: L43-81 @hector
      system_* methods: SchemaManipulation.java @ hector-examples, CassandraClusterTest.java: L84-157 @hector
      ORM (it works and is in production): https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
      multiple nodes and failure scenarios
      Data modeling (go see Matt's presentation)
    • Things to Remember
      • deletes and timestamp granularity
      • “range ghosts” and “tombstones”
      • using the wrong column comparator, key/default validators and InvalidRequestException
      • “Schema-less” -> “Schema Optional”
      • use column-level TTL to automate deletion
      • &quot;how do I iterate over all the rows in a column family&quot;?
        • get_range_slices, but don't do that
        • a good sign your data model is wrong
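      The column-level TTL item in the list above can be pictured as each column carrying its own expiry time. The sketch below is a rough model with hypothetical names (`TtlSketch`, `put`, `get`), and the clock is passed in explicitly so the behavior is easy to follow; it is not a client API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-column time-to-live.
final class TtlSketch {
    private static final class Cell {
        final String value;
        final long expiresAtSeconds;

        Cell(String value, long expiresAtSeconds) {
            this.value = value;
            this.expiresAtSeconds = expiresAtSeconds;
        }
    }

    private final Map<String, Cell> columns = new HashMap<>();

    /** Stores a column that should disappear ttlSeconds after nowSeconds. */
    void put(String name, String value, long nowSeconds, int ttlSeconds) {
        columns.put(name, new Cell(value, nowSeconds + ttlSeconds));
    }

    /** Returns the value, or null once the TTL has elapsed; no explicit delete is needed. */
    String get(String name, long nowSeconds) {
        Cell cell = columns.get(name);
        if (cell == null || nowSeconds >= cell.expiresAtSeconds) {
            return null;
        }
        return cell.value;
    }

    public static void main(String[] args) {
        TtlSketch row = new TtlSketch();
        row.put("session", "abc", 100L, 60);
        System.out.println(row.get("session", 120L));
        System.out.println(row.get("session", 160L));
    }
}
```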
    • Questions?