Introduction to apache_cassandra_for_developers-lhg

Published in: Technology

  1. Introduction to Apache Cassandra (for Java Developers!)
     Nate McCall [email_address] @zznate
  2. Overview
     - Apache Cassandra is NOT a "key/value store"
     - Columns are dynamic inside a column family (but they don't have to be)
     - Gain an understanding of concepts in Apache Cassandra that have a particular effect on application development
  3. Brief Intro - Storage
     - SSTables are immutable
     - SSTables are merged on reads
  4. Brief Intro - Compaction
     - Combine columns
     - Keep SSTable count down
     - Discard tombstones (more on this later)
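The storage and compaction points above can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions: `CompactionSketch` and `Cell` are hypothetical names, each SSTable is modeled as an in-memory map, and a null value stands in for a tombstone. It is not how Cassandra implements compaction internally.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of compaction: several immutable, sorted tables are merged into one. */
final class CompactionSketch {

    /** One version of a column. A null value marks a tombstone (a recorded delete). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merges the tables, keeps the newest cell per column name, then drops tombstones. */
    static TreeMap<String, Cell> compact(List<Map<String, Cell>> tables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : tables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                // On a name collision, the cell with the higher timestamp wins.
                merged.merge(entry.getKey(), entry.getValue(),
                        (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        // Simplification: a real node only drops tombstones after a grace period.
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }
}
```

The same "newest timestamp wins" merge is what a read has to do across several SSTables; compaction simply does it ahead of time so fewer tables need to be consulted.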
  5. Brief Intro - The Ring
     All nodes share the same role:
     - No single point of failure
     - Easy to scale
     - Simplified operations
  8. Brief Intro - Consistency Level - ONE
     - Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor).
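The R + W > N rule is easy to check by hand, but a small sketch makes the ONE versus QUORUM trade-off concrete. `ConsistencySketch` and its method names are hypothetical and only do the arithmetic from the slide; nothing here talks to a cluster.

```java
/** Arithmetic behind the R + W > N rule; no cluster involved. */
final class ConsistencySketch {

    /** A quorum is a strict majority of the replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must overlap every write set on at least one replica. */
    static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor
        System.out.println("ONE/ONE overlaps: " + overlaps(1, 1, n));
        System.out.println("QUORUM/QUORUM overlaps: " + overlaps(quorum(n), quorum(n), n));
        System.out.println("read ONE, write ALL overlaps: " + overlaps(1, n, n));
    }
}
```

With a replication factor of 3, reading and writing at ONE gives 1 + 1, which is not greater than 3, while QUORUM on both sides gives 2 + 2.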
  9. Brief Intro - Consistency Level - QUORUM
  10. Brief Intro - Read Repair
  11. vs. RDBMS - Consistency Level
      - *** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK ***
      - Idempotent: an operation can be applied multiple times without changing the result (except counters!)
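Because a failed consistency level does not roll anything back, a client will often just retry. The sketch below illustrates why that is safe for timestamped column writes and not for counters. `RetrySketch` is a hypothetical in-memory stand-in for a single replica, not a Cassandra or Hector class.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for one replica, used to compare retry behaviour. */
final class RetrySketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> timestamps = new HashMap<>();
    private long counter = 0;

    /** Timestamped write: applied only if it is not older than the stored version. */
    void write(String column, String value, long timestamp) {
        Long current = timestamps.get(column);
        if (current == null || timestamp >= current) {
            values.put(column, value);
            timestamps.put(column, timestamp);
        }
    }

    /** Counter increment: every application changes the result. */
    void increment(long delta) {
        counter += delta;
    }

    String read(String column) {
        return values.get(column);
    }

    long counter() {
        return counter;
    }
}
```

Replaying `write("city", "Austin", 100L)` any number of times leaves the column unchanged, while replaying `increment(1L)` after an ambiguous failure can over-count.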
  12. vs. RDBMS - Append Only
      - Proper data modeling minimizes seeks
      - No read before write
      (Go to Matt's presentation for more!)
  13. How does this impact development?
      - Substantially.
      - For operations affecting the same data, that data will become consistent eventually, as determined by the timestamps.
      - Trade availability for consistency
      - Store whatever you want. It's all just bytes.
      - Think about how you will query the data before you write it.
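"It's all just bytes" means the application decides how values are encoded and decoded. A minimal sketch using only the JDK is shown below. `BytesSketch` and its method names are hypothetical; the encodings chosen (UTF-8 for strings, 8-byte big-endian for longs) are assumptions for illustration and are not taken from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Round-trips application values through raw bytes. */
final class BytesSketch {

    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static byte[] encodeLong(long v) {
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decodeLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}
```

Reading a value back with the wrong decoder does not fail loudly, it just produces nonsense, which is one reason higher-level clients pair every column with an explicit serializer.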
  14. Neat. So Now What?
      Like any database, you need a client!
      - Python:
        - Telephus: http://github.com/driftx/Telephus (Twisted)
        - Pycassa: http://github.com/pycassa/pycassa
      - Java:
        - Hector: http://github.com/rantav/hector (Examples: https://github.com/zznate/hector-examples)
        - Pelops: http://github.com/s7/scale7-pelops
        - Kundera: http://code.google.com/p/kundera/
        - Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
      - Grails:
        - grails-cassandra: https://github.com/wolpert/grails-cassandra
      - .NET:
        - FluentCassandra: http://github.com/managedfusion/fluentcassandra
        - Aquiles: http://aquiles.codeplex.com/
      - Ruby:
        - Cassandra: http://github.com/fauna/cassandra
      - PHP:
        - phpcassa: http://github.com/thobbs/phpcassa
        - SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
  21. ... but do not roll your own
  22. Thrift
      - Fast, efficient serialization and network IO.
      - Lots of clients available (you can probably use it in other places as well)
      Why you don't want to work with the Thrift API directly:
      - SuperColumn
      - ColumnOrSuperColumn (don't forget Counters!)
      - ColumnParent.super_column
      - ColumnPath.super_column
      - Map<ByteBuffer,Map<String,List<Mutation>>> mutationMap
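To see why the last item in that list is awkward to build by hand, here is a sketch of the nested shape: row key, then column family name, then a list of mutations. The `Mutation` class below is a hypothetical two-field stand-in, not Thrift's generated `Mutation`, and `MutationMapSketch` is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Builds the row key -> column family -> mutations structure by hand. */
final class MutationMapSketch {

    /** Hypothetical stand-in for a mutation: just a column name and a value. */
    static final class Mutation {
        final String name;
        final String value;

        Mutation(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<>();

    void add(String rowKey, String columnFamily, Mutation mutation) {
        // ByteBuffer equality is content-based, so equal keys land in the same bucket.
        ByteBuffer key = ByteBuffer.wrap(rowKey.getBytes(StandardCharsets.UTF_8));
        mutationMap.computeIfAbsent(key, k -> new HashMap<>())
                   .computeIfAbsent(columnFamily, cf -> new ArrayList<>())
                   .add(mutation);
    }

    Map<ByteBuffer, Map<String, List<Mutation>>> asMap() {
        return mutationMap;
    }
}
```

A higher-level client hides this bookkeeping behind calls that take a key, a column family, and a column.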
  28. Higher Level Clients
      Hector
      - JMX Counters
      - Add/remove hosts:
        - automatically
        - programmatically
        - via JMX
      - Pluggable load balancing
      - Complete encapsulation of Thrift API
      - Type-safe approach to dealing with Apache Cassandra
      - Lightweight ORM (supports JPA 1.0 annotations)
      - JPA support: https://github.com/riptano/hector-jpa
      - Mavenized! http://repo2.maven.org/maven2/me/prettyprint/
  37. "CQL"
      - Viable alternative as of 0.8.0
      - JDBC Driver implementation means lots of possibilities
      - Encapsulate API changes
      - In-tree support on the way for:
        - DataSource
        - Pooling
  42. Avro, etc??
      - Gone. Added too much complexity after Thrift caught up.
      - "None of the libraries distinguished themselves as being a particularly crappy choice for serialization." (See CASSANDRA-1765)
  43. Thrift API Methods
      Five general categories:
      - Retrieving
      - Writing/Updating/Removing (all the same op!)
        - Increment counters
      - Meta Information
      - Schema Manipulation
      - CQL Execution
  47. On to the Code...
      - https://github.com/zznate/cassandra-tutorial
      - Uses Maven.
      - Really basic. Modify/abuse/alter as needed.
      - Descriptions of what is going on and how to run each example are in the Javadoc comments.
      - Sample data is based on the North American Numbering Plan (easy to find thanks to InfoChimps): http://infochimps.com/datasets/area-code-and-exchange-to-location-north-america-npanxx
  48. Data Shape
      512 202 30.27 097.74 W TX Austin
      512 203 30.27 097.74 L TX Austin
      512 204 30.32 097.73 W TX Austin
      512 205 30.32 097.73 W TX Austin
      512 206 30.32 097.73 L TX Austin
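A sketch of turning one of these lines into fields may help when reading the queries that follow. The field names are assumptions inferred from the column names and row keys used later in the deck ("city", "state", "lat", "lng", and keys such as "512204"); the meaning of the W/L field is not stated in the slides, so it is kept as an opaque `flag`. `NpanxxRow` is a hypothetical class, not part of the tutorial code.

```java
/** Parses one whitespace-separated sample line into named fields. */
final class NpanxxRow {
    final String npa;   // area code, e.g. "512" (assumed meaning)
    final String nxx;   // exchange, e.g. "202" (assumed meaning)
    final String lat;
    final String lng;
    final String flag;  // "W" or "L"; meaning not given in the slides
    final String state;
    final String city;

    private NpanxxRow(String[] f) {
        npa = f[0];
        nxx = f[1];
        lat = f[2];
        lng = f[3];
        flag = f[4];
        state = f[5];
        city = f[6];
    }

    static NpanxxRow parse(String line) {
        // Limit of 7 keeps a multi-word city name together in the last field.
        String[] fields = line.trim().split("\\s+", 7);
        if (fields.length != 7) {
            throw new IllegalArgumentException("expected 7 fields: " + line);
        }
        return new NpanxxRow(fields);
    }

    /** Row key in the style used by the examples: area code followed by exchange. */
    String rowKey() {
        return npa + nxx;
    }
}
```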
  49. Get a Single Column for a Key
      GetCityForNpanxx.java
        columnQuery.setColumnFamily("Npanxx");
        columnQuery.setKey("512204");
        columnQuery.setName("city");
  51. Get the Contents of a Row
      GetSliceForNpanxx.java
        sliceQuery.setColumnFamily("Npanxx");
        sliceQuery.setKey("512202");
        sliceQuery.setColumnNames("city","state","lat","lng");
  53. Get the (sorted!) Columns of a Row
      GetSliceForStateCity.java
        sliceQuery.setColumnFamily("StateCity");
        sliceQuery.setKey("TX Austin");
        sliceQuery.setRange(202L, 204L, false, 5);
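The range arguments above read as (start, finish, reversed, count). Since columns within a row are kept sorted by name, a slice behaves much like a bounded view over a sorted map. The sketch below models that with a `TreeMap`; `SliceSketch` is a hypothetical name, and treating both bounds as inclusive and requiring start >= finish when reversed are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Models a column slice as a bounded, limited view over a sorted map. */
final class SliceSketch {

    static <V> List<Map.Entry<Long, V>> slice(
            NavigableMap<Long, V> row, long start, long finish, boolean reversed, int count) {
        // When reversed, iteration runs from the higher bound down to the lower one.
        NavigableMap<Long, V> range = reversed
                ? row.subMap(finish, true, start, true).descendingMap()
                : row.subMap(start, true, finish, true);
        List<Map.Entry<Long, V>> result = new ArrayList<>();
        for (Map.Entry<Long, V> entry : range.entrySet()) {
            if (result.size() == count) {
                break; // count caps the number of columns returned
            }
            result.add(entry);
        }
        return result;
    }
}
```

For a row holding columns 202 through 206, `slice(row, 202L, 204L, false, 5)` yields the three columns 202, 203 and 204: the range runs out before the count of 5 is reached.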
  55. Get the Same Slice from Several Rows
      MultigetSliceForNpanxx.java
        multigetSlicesQuery.setColumnFamily("Npanxx");
        multigetSlicesQuery.setColumnNames("city","state","lat","lng");
        multigetSlicesQuery.setKeys("512202","512203","512205","512206");
  57. Get Slices From a Range of Rows
      GetRangeSlicesForStateCity.java
      The results of this query will be significantly more meaningful with OrderPreservingPartitioner (try this at home!)
        rangeSlicesQuery.setColumnFamily("Npanxx");
        rangeSlicesQuery.setColumnNames("city","state","lat","lng");
        rangeSlicesQuery.setKeys("512202", "512205");
        rangeSlicesQuery.setRowCount(5);
  60. Get Slices From a Range of Rows - 2
      GetSliceForAreaCodeCity.java
      Bonus: DynamicComparator and DynamicComposite (Ed's talk)
        sliceQuery.setKey("512");
        sliceQuery.setRange("Austin", "Austin__204", false, 5);
  62. Get Slices from Indexed Columns
      GetIndexedSlicesForCityState.java
      You only need to index a single column to apply clauses on other columns
        isq.setColumnFamily("Npanxx");
        isq.setColumnNames("city","lat","lng");
        isq.addEqualsExpression("state", "TX");
        isq.addEqualsExpression("city", "Austin");
        isq.addGteExpression("lat", "30.30");
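One way to picture "index a single column, filter on the rest" is a lookup on the indexed column followed by an in-memory filter over the candidate rows. The sketch below is an assumption-laden toy, not how secondary indexes are implemented: `IndexedSliceSketch` is a hypothetical class, only `state` is indexed, each key is assumed to be inserted once, and `lat` is compared as a string because the example stores it as one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy index on "state"; the other clauses are applied as filters. */
final class IndexedSliceSketch {
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    private final Map<String, List<String>> stateIndex = new HashMap<>();

    void insert(String key, String city, String state, String lat) {
        Map<String, String> columns = new HashMap<>();
        columns.put("city", city);
        columns.put("state", state);
        columns.put("lat", lat);
        rows.put(key, columns);
        stateIndex.computeIfAbsent(state, s -> new ArrayList<>()).add(key);
    }

    /** Equality on the indexed column narrows the candidates; the rest is filtering. */
    List<String> keysWhere(String state, String city, String minLat) {
        List<String> matches = new ArrayList<>();
        for (String key : stateIndex.getOrDefault(state, Collections.<String>emptyList())) {
            Map<String, String> columns = rows.get(key);
            boolean cityMatches = city.equals(columns.get("city"));
            boolean latMatches = columns.get("lat").compareTo(minLat) >= 0;
            if (cityMatches && latMatches) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

Note the string comparison on `lat`: it orders "30.27" before "30.30" only because the values share a fixed format, which is worth remembering when choosing validators and comparators.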
  67. Insert, Update and Delete
      ... are effectively the same operation:
      - Application of columns to a row
  68. Insertion
      InsertRowsForColumnFamilies.java
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));
      Add insertions to the other two column families to the same mutation:
        mutator.addInsertion("CA Burlingame", "StateCity",
            HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
        mutator.addInsertion("650", "AreaCode",
            HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
  73. Deletion
      DeleteRowsForColumnFamily.java
        mutator.addDeletion("650222", "Npanxx", "city", stringSerializer);
        mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
        mutator.addDeletion("650", "AreaCode", null, stringSerializer);
        mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
      Or row level
      Record Level
  75. Deletion
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      => (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
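The `value=` fields in this listing are hex dumps of the stored bytes. A short sketch for decoding them, assuming the bytes are UTF-8 text (which matches the string values written in the insertion example), is below. `HexValueSketch` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

/** Decodes a hex dump, as printed by the CLI, into the UTF-8 text it encodes. */
final class HexValueSketch {

    static String decodeUtf8Hex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Two hex digits make one byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Decoding `33372e3537783132322e3334` gives `37.57x122.34`, the lat/lng string written for Burlingame.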
  86. Deletion
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
  96. Deletion - FYI
      Sending a deletion for a non-existing row:
        mutator.addDeletion("202230", "Npanxx", "city", stringSerializer);
      You just inserted a tombstone!
        [default@Tutorial] list Npanxx;
        Using default limit of 100
        . . .
        -------------------
        RowKey: 202230
        -------------------
        . . .
  102. ColumnFamilyTemplate
         ColumnFamilyUpdater<String,String> updater = template.createUpdater("cskey1");
         updater.setString("stringval","value1");
         updater.setDate("curdate", date);
         updater.setLong("longval", 5L);
         template.update(updater);
         template.addColumn("stringval", se);
         template.addColumn("curdate", DateSerializer.get());
         template.addColumn("longval", LongSerializer.get());
         ColumnFamilyResult wrapper = template.queryColumns("cskey1");
       Template method design pattern
       https://github.com/rantav/hector/wiki/Getting-started-%285-minutes%29
  111. Development Resources
       - Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/
       - CCM (localhost Cassandra cluster): https://github.com/pcmanus/ccm
       - OpsCenter: http://www.datastax.com/products/opscenter
       - Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
  112. Stuff I Punted on for the Sake of Brevity
       - meta_* methods: CassandraClusterTest.java: L43-81 @hector
       - system_* methods: SchemaManipulation.java @ hector-examples; CassandraClusterTest.java: L84-157 @hector
       - ORM (it works and is in production): https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
       - multiple nodes and failure scenarios
       - Data modeling (go see Matt's presentation)
  113. Things to Remember
       - deletes and timestamp granularity
       - "range ghosts" and "tombstones"
       - using the wrong column comparator, key/default validators and InvalidRequestException
       - "Schema-less" -> "Schema Optional"
       - use column-level TTL to automate deletion
       - "how do I iterate over all the rows in a column family?"
         - get_range_slices, but don't do that
         - a good sign your data model is wrong
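The column-level TTL point in the list above can be pictured as each column carrying its own expiry, so no explicit delete has to be sent. The sketch below is an illustration only: `TtlSketch` and its fields are hypothetical, time is passed in explicitly to keep the example deterministic, and treating a TTL of 0 as "never expires" is an assumption of this sketch.

```java
/** A column that expires on its own once its time-to-live has elapsed. */
final class TtlSketch {
    final String value;
    final long writtenAtMillis;
    final int ttlSeconds; // 0 means the column never expires (assumption of this sketch)

    TtlSketch(String value, long writtenAtMillis, int ttlSeconds) {
        this.value = value;
        this.writtenAtMillis = writtenAtMillis;
        this.ttlSeconds = ttlSeconds;
    }

    /** A column is visible to reads only while it has not yet expired. */
    boolean isLive(long nowMillis) {
        return ttlSeconds == 0 || nowMillis < writtenAtMillis + ttlSeconds * 1000L;
    }
}
```

A column written at time 0 with a 60-second TTL is live at 59,999 ms and gone at 60,000 ms, without the application ever issuing a delete.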
  120. Questions?
