Introduction to apache_cassandra_for_developers-lhg
Published in: Technology
Transcript

  • 1. Introduction to Apache Cassandra (for Java Developers!)
      Nate McCall [email_address] @zznate
  • 2. Overview
      • Apache Cassandra is NOT a "key/value store"
      • Columns are dynamic inside a column family (but they don't have to be)
      • Gain an understanding of the concepts in Apache Cassandra that have a particular effect on application development
  • 3. Brief Intro - Storage
      • SSTables are immutable
      • SSTables merged on reads
  • 4. Brief Intro - Compaction
      • Combine columns
      • Keep SSTable count down
      • Discard tombstones (more on this later)
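The compaction steps above can be sketched roughly as follows. This is an illustration only, not Cassandra's implementation: each SSTable is modeled as an in-memory map, and `CompactionSketch` and `Cell` are hypothetical names. Real compaction also waits for a grace period before discarding tombstones, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompactionSketch {
    /** One column value; a null value marks a tombstone (a recorded delete). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merges "SSTables" into a new map; the inputs are never modified. */
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                Cell current = merged.get(e.getKey());
                // Combine columns: the cell with the newest timestamp wins.
                if (current == null || e.getValue().timestamp > current.timestamp) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        // Discard tombstones after they have shadowed any older values.
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = new TreeMap<>();
        older.put("city", new Cell("Austin", 1L));
        older.put("state", new Cell("TX", 1L));

        Map<String, Cell> newer = new TreeMap<>();
        newer.put("city", new Cell(null, 2L)); // a later delete of "city"
        newer.put("lat", new Cell("30.27", 2L));

        List<Map<String, Cell>> sstables = new ArrayList<>();
        sstables.add(older);
        sstables.add(newer);
        System.out.println(compact(sstables).keySet());
    }
}
```

Two input maps become one, which is the "keep SSTable count down" part; the deleted `city` column disappears entirely once its tombstone is dropped.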
  • 5. Brief Intro - The Ring
      All nodes share the same role:
      • No single point of failure
      • Easy to scale
      • Simplified operations
  • 8. Brief Intro - Consistency Level - ONE
      Cassandra provides consistency when R + W > N
      (read replica count + write replica count > replication factor).
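The R + W > N rule above is plain arithmetic, so a small check makes it concrete. The class and method names below are hypothetical; the only outside fact assumed is that a quorum is a majority of the replicas (N / 2 + 1 with integer division).

```java
class ConsistencySketch {
    /** Quorum size for a replication factor: a majority of the replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every set of read replicas must overlap every set of write replicas. */
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor
        System.out.println(readsSeeLatestWrite(1, 1, n));                 // read ONE, write ONE
        System.out.println(readsSeeLatestWrite(quorum(n), quorum(n), n)); // read QUORUM, write QUORUM
        System.out.println(readsSeeLatestWrite(1, n, n));                 // read ONE, write ALL
    }
}
```

With a replication factor of 3, ONE/ONE gives 1 + 1 = 2, which is not greater than 3, so a read may miss the latest write; QUORUM/QUORUM gives 2 + 2 = 4, so the replica sets always overlap.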
  • 9. Brief Intro - Consistency Level – QUORUM
  • 10. Brief Intro – Read Repair
  • 11. vs. RDBMS - Consistency Level
      *** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK ***
      Idempotent: an operation can be applied multiple times without changing the result (except counters!)
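Because a failed consistency level is not a rollback, a client will often retry the same write. The sketch below, using a hypothetical in-memory `IdempotentWriteSketch` row rather than any real client API, shows why a timestamped column write is safe to retry while a counter increment is not. Treating an equal timestamp as "apply again" is a simplification made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentWriteSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> timestamps = new HashMap<>();
    private long counter = 0;

    /** Timestamped write: applied only if it is at least as new as the stored cell. */
    void write(String column, String value, long timestamp) {
        Long stored = timestamps.get(column);
        if (stored == null || timestamp >= stored) {
            values.put(column, value);
            timestamps.put(column, timestamp);
        }
    }

    /** Counter increment: every application changes the result. */
    void increment(long delta) {
        counter += delta;
    }

    String get(String column) {
        return values.get(column);
    }

    long counter() {
        return counter;
    }

    public static void main(String[] args) {
        IdempotentWriteSketch row = new IdempotentWriteSketch();
        row.write("city", "Austin", 10L);
        row.write("city", "Austin", 10L); // retry after a timeout: same end state
        row.increment(1);
        row.increment(1);                 // the same retry on a counter counts twice
        System.out.println(row.get("city") + " " + row.counter());
    }
}
```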
  • 12. vs. RDBMS - Append Only
      • Proper data modeling will minimize seeks
      • No read before write
      (Go to Matt's presentation for more!)
  • 13. How does this impact development?
      Substantially.
      • For operations affecting the same data, that data will become consistent eventually, as determined by the timestamps.
      • Trade availability for consistency
      • Store whatever you want. It's all just bytes.
      • Think about how you will query the data before you write it.
  • 14. Neat. So Now What?
      Like any database, you need a client!
      • Python:
          • Telephus: http://github.com/driftx/Telephus (Twisted)
          • Pycassa: http://github.com/pycassa/pycassa
      • Java:
          • Hector: http://github.com/rantav/hector (Examples: https://github.com/zznate/hector-examples)
          • Pelops: http://github.com/s7/scale7-pelops
          • Kundera: http://code.google.com/p/kundera/
          • Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
      • Grails:
          • grails-cassandra: https://github.com/wolpert/grails-cassandra
      • .NET:
          • FluentCassandra: http://github.com/managedfusion/fluentcassandra
          • Aquiles: http://aquiles.codeplex.com/
      • Ruby:
          • Cassandra: http://github.com/fauna/cassandra
      • PHP:
          • phpcassa: http://github.com/thobbs/phpcassa
          • SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
  • 21. ... but do not roll your own
  • 22. Thrift
      • Fast, efficient serialization and network IO.
      • Lots of clients available (you can probably use it in other places as well)
      Why you don't want to work with the Thrift API directly:
      • SuperColumn
      • ColumnOrSuperColumn (don't forget Counters!)
      • ColumnParent.super_column
      • ColumnPath.super_column
      • Map<ByteBuffer,Map<String,List<Mutation>>> mutationMap
  • 28. Higher Level Clients
      Hector
      • JMX Counters
      • Add/remove hosts:
          • automatically
          • programmatically
          • via JMX
      • Pluggable load balancing
      • Complete encapsulation of Thrift API
      • Type-safe approach to dealing with Apache Cassandra
      • Lightweight ORM (supports JPA 1.0 annotations)
      • JPA support: https://github.com/riptano/hector-jpa
      • Mavenized! http://repo2.maven.org/maven2/me/prettyprint/
  • 37. “CQL”
      • Viable alternative as of 0.8.0
      • JDBC Driver implementation means lots of possibilities
      • Encapsulate API changes
      • In-tree support on the way for:
          • DataSource
          • Pooling
  • 42. Avro, etc??
      Gone. Added too much complexity after Thrift caught up.
      “None of the libraries distinguished themselves as being a particularly crappy choice for serialization.” (See CASSANDRA-1765)
  • 43. Thrift API Methods
      Five general categories:
      • Retrieving
      • Writing/Updating/Removing (all the same op!)
          • Increment counters
      • Meta Information
      • Schema Manipulation
      • CQL Execution
  • 47. On to the Code...
      https://github.com/zznate/cassandra-tutorial
      • Uses Maven.
      • Really basic.
      • Modify/abuse/alter as needed.
      • Descriptions of what is going on and how to run each example are in the Javadoc comments.
      • Sample data is based on the North American Numbering Plan (easy to find thanks to InfoChimps): http://infochimps.com/datasets/area-code-and-exchange-to-location-north-america-npanxx
  • 48. Data Shape
      512 202 30.27 097.74 W TX Austin
      512 203 30.27 097.74 L TX Austin
      512 204 30.32 097.73 W TX Austin
      512 205 30.32 097.73 W TX Austin
      512 206 30.32 097.73 L TX Austin
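As a rough illustration of how one line of this sample data relates to the keys and column names used in the examples of this deck ("512204" as an Npanxx row key, "TX Austin" as a StateCity row key, and columns "city", "state", "lat", "lng"), here is a small parsing sketch. `NpanxxRecordSketch` and its methods are hypothetical and are not taken from the tutorial repository; the fifth field ("W" or "L") is not explained in the slides and is left uninterpreted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NpanxxRecordSketch {
    /** Splits one sample line into 7 fields; the last field (city) may contain spaces. */
    static String[] fields(String line) {
        return line.trim().split("\\s+", 7);
    }

    /** Row key for the Npanxx column family: area code followed by exchange. */
    static String npanxxKey(String[] f) {
        return f[0] + f[1];
    }

    /** Row key for the StateCity column family: state, a space, then city. */
    static String stateCityKey(String[] f) {
        return f[5] + " " + f[6];
    }

    /** Columns stored under the Npanxx row key. */
    static Map<String, String> npanxxColumns(String[] f) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("city", f[6]);
        columns.put("state", f[5]);
        columns.put("lat", f[2]);
        columns.put("lng", f[3]);
        return columns;
    }

    public static void main(String[] args) {
        String[] f = fields("512 204 30.32 097.73 W TX Austin");
        System.out.println(npanxxKey(f));
        System.out.println(stateCityKey(f));
        System.out.println(npanxxColumns(f));
    }
}
```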
  • 49. Get a Single Column for a Key
      GetCityForNpanxx.java
          columnQuery.setColumnFamily("Npanxx");
          columnQuery.setKey("512204");
          columnQuery.setName("city");
  • 51. Get the Contents of a Row
      GetSliceForNpanxx.java
          sliceQuery.setColumnFamily("Npanxx");
          sliceQuery.setKey("512202");
          sliceQuery.setColumnNames("city","state","lat","lng");
  • 53. Get the (sorted!) Columns of a Row
      GetSliceForStateCity.java
          sliceQuery.setColumnFamily("StateCity");
          sliceQuery.setKey("TX Austin");
          sliceQuery.setRange(202L, 204L, false, 5);
  • 55. Get the Same Slice from Several Rows
      MultigetSliceForNpanxx.java
          multigetSlicesQuery.setColumnFamily("Npanxx");
          multigetSlicesQuery.setColumnNames("city","state","lat","lng");
          multigetSlicesQuery.setKeys("512202","512203","512205","512206");
  • 57. Get Slices From a Range of Rows
      GetRangeSlicesForStateCity.java
      The results of this query will be significantly more meaningful with OrderPreservingPartitioner (try this at home!)
          rangeSlicesQuery.setColumnFamily("Npanxx");
          rangeSlicesQuery.setColumnNames("city","state","lat","lng");
          rangeSlicesQuery.setKeys("512202", "512205");
          rangeSlicesQuery.setRowCount(5);
  • 60. Get Slices From a Range of Rows - 2
      GetSliceForAreaCodeCity.java
      Bonus: DynamicComparator and DynamicComposite (Ed's talk)
          sliceQuery.setKey("512");
          sliceQuery.setRange("Austin", "Austin__204", false, 5);
  • 62. Get Slices from Indexed Columns
      GetIndexedSlicesForCityState.java
      You only need to index a single column to apply clauses on other columns
          isq.setColumnFamily("Npanxx");
          isq.setColumnNames("city","lat","lng");
          isq.addEqualsExpression("state", "TX");
          isq.addEqualsExpression("city", "Austin");
          isq.addGteExpression("lat", "30.30");
  • 67. Insert, Update and Delete
      ... are effectively the same operation:
      • Application of columns to a row
  • 68. Insertion
      InsertRowsForColumnFamilies.java
          mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
          mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
          mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
          mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));
      Add insertions to the other two column families to the same mutation:
          mutator.addInsertion("CA Burlingame", "StateCity",
              HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
          mutator.addInsertion("650", "AreaCode",
              HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
  • 73. Deletion
      DeleteRowsForColumnFamily.java
      Record level:
          mutator.addDeletion("650222", "Npanxx", "city", stringSerializer);
      Or row level:
          mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
          mutator.addDeletion("650", "AreaCode", null, stringSerializer);
          mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
  • 75. Deletion
          [default@Tutorial] list StateCity;
          Using default limit of 100
          -------------------
          RowKey: CA Burlingame
          => (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
          -------------------
          RowKey: TX Austin
          => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
          => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
          => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
          => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
          => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
  • 86. Deletion
          [default@Tutorial] list StateCity;
          Using default limit of 100
          -------------------
          RowKey: CA Burlingame
          -------------------
          RowKey: TX Austin
          => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
          => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
          => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
          => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
          => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
  • 96. Deletion - FYI
      Sending a deletion for a non-existing row:
          mutator.addDeletion("202230", "Npanxx", "city", stringSerializer);
      You just inserted a tombstone!
          [default@Tutorial] list Npanxx;
          Using default limit of 100
          . . .
          -------------------
          RowKey: 202230
          -------------------
          . . .
  • 102. ColumnFamilyTemplate
          ColumnFamilyUpdater<String,String> updater = template.createUpdater("cskey1");
          updater.setString("stringval","value1");
          updater.setDate("curdate", date);
          updater.setLong("longval", 5L);
          template.update(updater);
          template.addColumn("stringval", se);
          template.addColumn("curdate", DateSerializer.get());
          template.addColumn("longval", LongSerializer.get());
          ColumnFamilyResult wrapper = template.queryColumns("cskey1");
      Template method design pattern
      https://github.com/rantav/hector/wiki/Getting-started-%285-minutes%29
  • 111. Development Resources
      • Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/
      • CCM localhost cassandra cluster: https://github.com/pcmanus/ccm
      • OpsCenter: http://www.datastax.com/products/opscenter
      • Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
  • 112. Stuff I Punted on for the Sake of Brevity
      • meta_* methods
          • CassandraClusterTest.java: L43-81 @ hector
      • system_* methods
          • SchemaManipulation.java @ hector-examples
          • CassandraClusterTest.java: L84-157 @ hector
      • ORM (it works and is in production)
          • https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
      • multiple nodes and failure scenarios
      • Data modeling (go see Matt's presentation)
  • 113. Things to Remember
      • deletes and timestamp granularity
      • “range ghosts” and “tombstones”
      • using the wrong column comparator, key/default validators and InvalidRequestException
      • “Schema-less” -> “Schema Optional”
      • use column-level TTL to automate deletion
      • "how do I iterate over all the rows in a column family"?
          • get_range_slices, but don't do that
          • a good sign your data model is wrong
  • 120. Questions?
