Meetup cassandra for_java_cql

Slides from 10/26/2011 Cassandra Austin Meetup group

Transcript

  • 1.
      Building Java Applications with Apache Cassandra
      Nate McCall [email_address] @zznate
  • 2.
      What is Apache Cassandra?
  • 3. CAP Theorem
      Consistency, Availability, Partition Tolerance
      “Thou shalt have but 2”
      - Conjecture made by Eric Brewer in 2000
      - Published as a formal proof in 2002
      - See http://en.wikipedia.org/wiki/CAP_theorem for more
  • 4.
      Apache Cassandra Concepts
      - Explicit choice of partition tolerance and availability; consistency is tunable
      - No read before write
      - Merge on read
      - Idempotent
      - Schema optional
      - All nodes share the same role
      - Still performs well with larger-than-memory data sets
  • 5. Generally complements other systems (not intended to be one-size-fits-all)
      *** You should always use the right tool for the right job anyway
  • 6. How does this differ from an RDBMS?
  • 7. How does this differ from an RDBMS? Substantially.
  • 8. vs. RDBMS - No Joins
      Unless:
      - you do them on the client
      - you do them via Map/Reduce
  • 9. vs. RDBMS - Schema Optional
      (Though you can add meta information for validation and type checking)
      *** Supports secondary indexes too: “... WHERE state = 'TX'”
  • 10. vs. RDBMS - Prematerialized and Transaction-less
      - No ACID transactions
      - Limited support for ad-hoc queries
  • 11. vs. RDBMS - Prematerialized and Transaction-less
      - No ACID transactions
      - Limited support for ad-hoc queries
      *** You are going to give up both of these anyway when you shard an RDBMS ***
  • 12.
      vs. RDBMS - Facilitates Consolidation
      It can be your caching layer
      * Off-heap cache (provided you install JNA)
      It can be your analytics infrastructure
      * true map/reduce
      * Pig driver
      * Hive driver coming soon
  • 13. vs. RDBMS - Shared-Nothing Architecture
      Every node plays the same role: no masters, no slaves, no special nodes
      *** No single point of failure
  • 14.
      vs. RDBMS - Real Linear Scalability
      Want 2x performance? Add 2x nodes.
      *** 'No downtime' included!
  • 15.
      vs. RDBMS - Performance
    Reads on par with writes
  • 16.
      Storage (Briefly)
  • 17.
      Storage (Briefly)
      Understanding the on-disk format is extremely helpful in designing your data model correctly
  • 18.
      Storage - SSTable
      - SSTables are immutable (“Merge on read”)
      - Newest timestamp wins
  • 19.
      Storage – Compaction
      - Merges SSTables, keeping their count down and making merge on read more efficient
      - Discards tombstones (more on this later!)
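The storage behavior on the last two slides (immutable SSTables, no read before write, newest timestamp wins, compaction) can be modeled in a few lines of plain Java. This is a toy under stated assumptions, not Cassandra's code: an SSTable is reduced to an immutable map from column name to a (value, timestamp) cell, a `null` value stands for a tombstone, and the class and method names are hypothetical. A read merges every table and keeps the newest cell per column; compaction performs the same merge once and keeps a single table. Real Cassandra only purges tombstones older than `gc_grace_seconds` (10 days by default), so the sketch takes a `gcBefore` cutoff instead of dropping every tombstone.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of merge on read and compaction; names are hypothetical. */
class SSTableSketch {

    /** A column value with its write timestamp; a null value is a tombstone. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merge one row across SSTables: per column, the newest timestamp wins. */
    static Map<String, Cell> merge(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            // On a timestamp tie the cell merged first is kept (a simplification).
            table.forEach((name, cell) -> merged.merge(name, cell,
                    (seen, candidate) -> candidate.timestamp > seen.timestamp ? candidate : seen));
        }
        return merged;
    }

    /** A read: merge, then hide tombstoned columns from the caller. */
    static Map<String, String> read(List<Map<String, Cell>> sstables) {
        Map<String, String> row = new TreeMap<>();
        merge(sstables).forEach((name, cell) -> {
            if (cell.value != null) {
                row.put(name, cell.value);
            }
        });
        return row;
    }

    /** Compaction: one merged table replaces many; tombstones older than gcBefore are purged. */
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables, long gcBefore) {
        Map<String, Cell> merged = merge(sstables);
        merged.values().removeIf(cell -> cell.value == null && cell.timestamp < gcBefore);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = Map.of("name", new Cell("Apple", 100L),
                                         "price", new Cell("401.76", 100L));
        // The newer table holds only what changed: nothing was read before these writes.
        Map<String, Cell> newer = Map.of("price", new Cell("397.77", 200L),
                                         "name", new Cell(null, 300L));
        List<Map<String, Cell>> tables = List.of(older, newer);

        System.out.println(read(tables));                   // {price=397.77}
        System.out.println(compact(tables, 250L).keySet()); // [name, price]
        System.out.println(compact(tables, 400L).keySet()); // [price]
    }
}
```

Because the merge only compares timestamps, the order in which SSTables are consulted does not change the result, which is what lets writes skip the read entirely.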
  • 20.
      Data Model
  • 21.
      Data Model
      "...sparse, persistent, distributed, multi-dimensional sorted map."
      (The “Bigtable” paper)
  • 22.
      Data Model
      Keyspace
      - Collection of Column Families
      - Controls replication
      Column Family
      - Similar to a table
      - Columns ordered by name
  • 27.
      Data Model – Column Family
      Static Column Family
      - Model my object data
      Dynamic Column Family
      - Pre-calculated query results
      Nothing stopping you from mixing them!
  • 31.
      Data Model – Static CF
      Stocks
      GOOG => name: Google, price: 589.55
      AAPL => name: Apple, price: 401.76
      NFLX => name: Netflix, price: 78.73
      NOK => exchange: NYSE, name: Nokia, price: 6.90
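One way to picture the Stocks column family is as a map of maps. This is only a plain-Java sketch, not a Cassandra or Hector API, and the class and method names are hypothetical. The outer map is a `HashMap` keyed by row key, because with the default `RandomPartitioner` rows are placed by a hash of the key rather than kept in key order; each row is a `TreeMap`, so its columns stay sorted by name the way the comparator keeps them on disk. Rows are sparse: only NOK carries an `exchange` column.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** The Stocks column family as row key -> sorted columns (illustrative only). */
class StocksSketch {

    // Row order follows the partitioner, so a plain HashMap stands in for it here.
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void insert(String rowKey, String column, String value) {
        // Writes never check for an existing row: a missing row is simply created.
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** All columns of one row, sorted by column name; empty if the row is absent. */
    SortedMap<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    static StocksSketch sampleData() {
        StocksSketch stocks = new StocksSketch();
        stocks.insert("GOOG", "price", "589.55");
        stocks.insert("GOOG", "name", "Google");
        stocks.insert("AAPL", "price", "401.76");
        stocks.insert("AAPL", "name", "Apple");
        stocks.insert("NFLX", "price", "78.73");
        stocks.insert("NFLX", "name", "Netflix");
        stocks.insert("NOK", "price", "6.90");
        stocks.insert("NOK", "name", "Nokia");
        stocks.insert("NOK", "exchange", "NYSE");
        return stocks;
    }

    public static void main(String[] args) {
        StocksSketch stocks = sampleData();
        System.out.println(stocks.row("NOK"));  // {exchange=NYSE, name=Nokia, price=6.90}
        System.out.println(stocks.row("GOOG")); // {name=Google, price=589.55}
    }
}
```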
  • 32.
      Data Model – Prematerialized Query
      StockHist
      GOOG => 10/25/2011: 583.16, 10/24/2011: 596.42, 10/21/2011: 590.49
      AAPL => 10/25/2011: 397.77, 10/24/2011: 405.77, 10/21/2011: 392.87
      NFLX => 10/25/2011: 77.37, 10/24/2011: 118.84, 10/21/2011: 117.04
      NOK => 10/25/2011: 6.71, 10/24/2011: 6.76, 10/21/2011: 6.61
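A prematerialized row like StockHist is typically read newest-first with a reversed column slice. The sketch below imitates that with a `NavigableMap` per ticker; the class and method names are hypothetical. It keys columns by `java.time.LocalDate` instead of the “10/25/2011” strings on the slide, and that choice is deliberate: compared as UTF-8 text, MM/dd/yyyy names sort month-first, so 01/03/2012 would land before 10/25/2011. A `LongType` column name holding an epoch timestamp keeps chronological order.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** StockHist as ticker -> (trading day -> closing price); illustrative only. */
class StockHistSketch {

    private final Map<String, NavigableMap<LocalDate, Double>> rows = new HashMap<>();

    void record(String ticker, LocalDate day, double price) {
        rows.computeIfAbsent(ticker, k -> new TreeMap<>()).put(day, price);
    }

    /** The newest `count` prices for a ticker, like a reversed column slice. */
    List<Double> latest(String ticker, int count) {
        List<Double> prices = new ArrayList<>();
        NavigableMap<LocalDate, Double> row = rows.get(ticker);
        if (row == null) {
            return prices;
        }
        for (double price : row.descendingMap().values()) {
            if (prices.size() == count) {
                break;
            }
            prices.add(price);
        }
        return prices;
    }

    public static void main(String[] args) {
        StockHistSketch hist = new StockHistSketch();
        // Inserted out of order on purpose: the sorted columns fix the order.
        hist.record("NFLX", LocalDate.of(2011, 10, 21), 117.04);
        hist.record("NFLX", LocalDate.of(2011, 10, 25), 77.37);
        hist.record("NFLX", LocalDate.of(2011, 10, 24), 118.84);
        System.out.println(hist.latest("NFLX", 2)); // [77.37, 118.84]
    }
}
```

The query result is fully determined by the row layout chosen at write time, which is what “prematerialized” means here: no sorting or filtering happens at read time.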
  • 33.
      API Operations
  • 34. Five general categories
      - Retrieving
      - Writing/Updating/Removing (all the same op!)
        * Increment counters
      - Meta information
      - Schema manipulation
      - CQL execution
  • 35. Using a Client
      Hector client: http://hector-client.org
      - Most popular Java client
      - In use at very large installations
      - A number of tools and utilities built on top
      - Very active community
      - MIT licensed
      *** Like any open source project fully dependent on another open source project, it has its warts
  • 36.
      Sample Project for Experimenting
      https://github.com/zznate/cassandra-tutorial
      https://github.com/zznate/hector-examples
      - Built using Hector
      - Really basic: designed to be beginner level, with very few moving parts
      - Modify/abuse/alter as needed
      *** Descriptions of what is going on and how to run each example are in the Javadoc comments.
  • 37.
      ColumnFamilyTemplate
      Familiar, type-safe approach
      - based on the template-method design pattern
      - generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type)
      ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName,
          StringSerializer.get(), StringSerializer.get());
      *** (no generics for clarity)
  • 38.
      ColumnFamilyTemplate
      new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName,
          StringSerializer.get(),   // key format
          StringSerializer.get());  // column name format
      - Cassandra calls the column name format a “comparator”
      - Remember: it defines column order in the on-disk format
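Why the column name format matters can be shown without a cluster. The sketch below sorts the same column names two ways: as UTF-8 bytes compared unsigned and lexically (roughly how a `UTF8Type` or `BytesType` comparator orders names that a UTF-8 string serializer wrote), and numerically (how `LongType` orders long-encoded names). The class and helper names are hypothetical and this is not Hector or Cassandra code. With string names, “9” sorts after “202”, the kind of surprise that is expensive to fix once data is on disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Same column names, two comparators (illustrative only). */
class ComparatorSketch {

    /** Order names by their UTF-8 bytes, compared unsigned and lexically. */
    static List<String> sortAsUtf8(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    /** Order the same names numerically, as a long-typed comparator would. */
    static List<String> sortAsLong(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> Long.compare(Long.parseLong(a), Long.parseLong(b)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = List.of("9", "202", "10");
        System.out.println(sortAsUtf8(names)); // [10, 202, 9]
        System.out.println(sortAsLong(names)); // [9, 10, 202]
    }
}
```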
  • 39.
      ColumnFamilyTemplate
      ColumnFamilyResult<String, String> res = template.queryColumns("zznate");
      String value = res.getString("email");
      Date startDate = res.getDate("startDate");
      Key Format, Column Name Format: the two type parameters of ColumnFamilyResult<String, String>
  • 40.
      ColumnFamilyTemplate
      Querying multiple rows and iterating over results
      ColumnFamilyResult wrapper = template.queryColumns(Arrays.asList("GOOG", "AAPL", "NFLX"));
      String googName = wrapper.getString("name");
      wrapper.next();
      String aaplName = wrapper.getString("name");
      wrapper.next();
      String nflxName = wrapper.getString("name");
  • 41.
      ColumnFamilyTemplate
      Inserting data with ColumnFamilyUpdater
      ColumnFamilyUpdater updater = template.createUpdater("AAPL");
      updater.setString("exchange", "NASDAQ");
      updater.addKey("GOOG");
      updater.setString("exchange", "NASDAQ");
      template.update(updater);
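As the slide's code reads, the updater batches writes for several rows: `addKey` moves the “current row”, and the following `setString` calls apply to it until `template.update(updater)` sends the batch. The in-memory stand-in below only illustrates that cursor-like behavior; `ToyUpdater` and its methods are hypothetical and are not Hector's `ColumnFamilyUpdater`. It shows why the second `setString("exchange", "NASDAQ")` lands on GOOG rather than AAPL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in that mimics the batching shape of an updater. */
final class ToyUpdater {

    private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();
    private String currentKey;

    ToyUpdater(String firstKey) {
        addKey(firstKey);
    }

    /** Switch the row that subsequent set calls write to. */
    ToyUpdater addKey(String rowKey) {
        currentKey = rowKey;
        pending.computeIfAbsent(rowKey, k -> new LinkedHashMap<>());
        return this;
    }

    ToyUpdater setString(String column, String value) {
        pending.computeIfAbsent(currentKey, k -> new LinkedHashMap<>()).put(column, value);
        return this;
    }

    /** Apply all buffered columns in one pass, clear the batch, return columns written. */
    int applyTo(Map<String, Map<String, String>> columnFamily) {
        int written = 0;
        for (Map.Entry<String, Map<String, String>> row : pending.entrySet()) {
            columnFamily.computeIfAbsent(row.getKey(), k -> new LinkedHashMap<>())
                        .putAll(row.getValue());
            written += row.getValue().size();
        }
        pending.clear();
        return written;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> stocks = new LinkedHashMap<>();
        ToyUpdater updater = new ToyUpdater("AAPL");
        updater.setString("exchange", "NASDAQ");
        updater.addKey("GOOG");
        updater.setString("exchange", "NASDAQ");
        System.out.println(updater.applyTo(stocks)); // 2
        System.out.println(stocks); // {AAPL={exchange=NASDAQ}, GOOG={exchange=NASDAQ}}
    }
}
```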
  • 42.
      ColumnFamilyTemplate
      Deleting data with ColumnFamilyTemplate
      template.deleteColumn("AAPL", "notNeededStuff");
      template.deleteColumn("GOOG", "somethingElse");
      template.deleteColumn("GOOG", "aDifferentColumnName");
      ...
      template.deleteRow("NOK");
      template.executeBatch();
  • 43.
      Deletion
  • 44.
      Deletion
      Again: Every mutation is an insert!
      - Merge on read
      - SSTables are immutable
      - Highest timestamp wins
  • 48.
      Deletion – As Seen by CLI
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      => (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
  • 59.
      Deletion – As Seen by CLI
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
  • 69.
      Deletion – FYI
      Sending a deletion for a non-existing row:
      mutator.addDeletion("202230", "Npanxx", "city", stringSerializer);
      mutator.execute();
      [default@Tutorial] list Npanxx;
      Using default limit of 100
      . . .
      -------------------
      RowKey: 202230
      -------------------
      . . .
      Does not exist? You just inserted a tombstone!
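The CLI listings on the last few slides follow from deletes being writes. In the toy column family below (hypothetical class and method names; the values are the decoded forms of the hex strings in the listing, e.g. 33372e3537783132322e3334 is “37.57x122.34”), a delete stores a tombstone cell like any other column, and a listing prints every stored row key but only its live columns. That reproduces both effects shown: CA Burlingame still appears with no columns after its only column is deleted, and deleting from the never-written row 202230 creates a row holding nothing but a tombstone. Such rows go away only once compaction purges the tombstones.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy column family where deletes are tombstone writes (illustrative only). */
class TombstoneSketch {

    /** A stored cell; a null value marks a tombstone. */
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, TreeMap<String, Cell>> rows = new LinkedHashMap<>();

    void insert(String rowKey, String column, String value, long timestamp) {
        write(rowKey, column, new Cell(value, timestamp));
    }

    /** No existence check: the tombstone is written whether or not the column exists. */
    void delete(String rowKey, String column, long timestamp) {
        write(rowKey, column, new Cell(null, timestamp));
    }

    private void write(String rowKey, String column, Cell cell) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .merge(column, cell, (seen, candidate) ->
                    candidate.timestamp > seen.timestamp ? candidate : seen);
    }

    /** Like the CLI's list: every stored row key, showing only live columns. */
    Map<String, Map<String, String>> list() {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        rows.forEach((rowKey, cells) -> {
            Map<String, String> live = new TreeMap<>();
            cells.forEach((name, cell) -> {
                if (cell.value != null) {
                    live.put(name, cell.value);
                }
            });
            out.put(rowKey, live);
        });
        return out;
    }

    public static void main(String[] args) {
        TombstoneSketch stateCity = new TombstoneSketch();
        stateCity.insert("CA Burlingame", "650", "37.57x122.34", 1L);
        stateCity.insert("TX Austin", "202", "30.27x097.74", 2L);
        stateCity.delete("CA Burlingame", "650", 3L);
        System.out.println(stateCity.list());
        // {CA Burlingame={}, TX Austin={202=30.27x097.74}}

        TombstoneSketch npanxx = new TombstoneSketch();
        npanxx.delete("202230", "city", 4L);
        System.out.println(npanxx.list()); // {202230={}}
    }
}
```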
  • 75.
      Integrating with existing patterns
      <bean id="cassandraHostConfigurator"
            class="me.prettyprint.cassandra.service.CassandraHostConfigurator">
        <constructor-arg value="localhost:9170"/>
      </bean>
      <bean id="cluster" class="me.prettyprint.cassandra.service.ThriftCluster">
        <constructor-arg value="TestCluster"/>
        <constructor-arg ref="cassandraHostConfigurator"/>
      </bean>
      <bean id="consistencyLevelPolicy" class="me.prettyprint.cassandra.model.ConfigurableConsistencyLevel">
        <property name="defaultReadConsistencyLevel" value="ONE"/>
      </bean>
      <bean id="keyspaceOperator" class="me.prettyprint.hector.api.factory.HFactory"
            factory-method="createKeyspace">
        <constructor-arg value="Keyspace1"/>
        <constructor-arg ref="cluster"/>
        <constructor-arg ref="consistencyLevelPolicy"/>
      </bean>
      <bean id="simpleCassandraDao" class="me.prettyprint.cassandra.dao.SimpleCassandraDao">
        <property name="keyspace" ref="keyspaceOperator"/>
        <property name="columnFamilyName" value="Standard1"/>
      </bean>
  • 96.
      Integrating with existing patterns
      Hector Object Mapper:
      https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
      Hector JPA:
      https://github.com/riptano/hector-jpa
  • 100.
      CQL via JDBC
  • 101.
      CQL via JDBC
      - Integrate with existing tools (Spring Framework's JdbcTemplate in this case)
      *** Still some caveats and missing features
  • 103.
      CQL via JDBC
      https://github.com/riptano/jdbc-conn-pool
      - see the portfolio_example subproject
  • 105.
      CQL via JDBC: Components
      - HCQLDataSource (from jdbc-pool)
      - Spring Framework's JdbcTemplate
      - DAO class with associated domain objects
      - JUnit
      - Spring Framework's SpringJUnit4ClassRunner (context setup and injection)
      - EmbeddedServerHelper from hector-test (manages Cassandra lifecycle, directories and configuration)
  • 111.
      CQL via JDBC: Configuration
      Pool Configuration
      - Cluster name, keyspace and at least 1 host required
      - Additional settings for:
        * failover semantics
        * automatic host discovery
        * timeout counters and thresholds
  • 117.
      CQL via JDBC: Configuration (JNDI)
      <Resource name="cassandra/CassandraClientFactory"
                auth="Container"
                type="me.prettyprint.cassandra.api.Keyspace"
                factory="me.prettyprint.cassandra.jndi.CassandraClientJndiResourceFactory"
                hosts="cass1:9160,cass2:9160,cass3:9160"
                user="user"
                password="passwd"
                keyspace="Keyspace1"
                clusterName="Test Cluster"
                maxActive="20"
                maxWaitTimeWhenExhausted="10"
                failoverPolicy="ON_FAIL_TRY_ALL_AVAILABLE"
                autoDiscoverHosts="true"
                runAutoDiscoveryAtStartup="true" />
  • 131.
      CQL via JDBC: Configuration (Spring)
      <bean class="com.datastax.drivers.jdbc.pool.cassandra.jdbc.HCQLDataSource"
            id="ds">
        <property name="clusterName" value="TestCluster" />
        <property name="keyspaceName" value="PortfolioDemo" />
        <property name="hosts" value="127.0.0.1:9170" />
      </bean>
      <bean class="org.springframework.jdbc.core.JdbcTemplate"
            id="jdbcTemplate">
        <constructor-arg ref="ds" />
      </bean>
  • 141.
      CQL via JDBC: Components
      Inserting Test Data
      private static final String PORTFOLIOS_INSERT =
          "BEGIN BATCH "
          + "INSERT INTO Portfolios (KEY, BLU, CJS, DAL) VALUES (168, '19', '7', '38') "
          + "INSERT INTO Portfolios (KEY, BSX, CHK, DNB, MCI, SR) VALUES (236, '32', '27', '7', '8', '3') "
          + "APPLY BATCH";
      ...
      jdbcTemplate.execute(PORTFOLIOS_INSERT);
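Hand-concatenating the batch gets error-prone as test data grows. A small builder that produces the same shape of CQL string is sketched below; `CqlBatchSketch` and its methods are hypothetical helpers, not part of the jdbc-conn-pool project, and the result would still be handed to `jdbcTemplate.execute(...)` as on the slide. Values are rendered as CQL string literals with embedded single quotes doubled; column family and column names are inserted verbatim, so they must come from trusted code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that renders a CQL BEGIN BATCH ... APPLY BATCH string. */
class CqlBatchSketch {

    /** One INSERT for a row: KEY first, then each column in the map's iteration order. */
    static String insert(String columnFamily, long key, Map<String, String> columns) {
        StringBuilder names = new StringBuilder("KEY");
        StringBuilder values = new StringBuilder(Long.toString(key));
        for (Map.Entry<String, String> column : columns.entrySet()) {
            names.append(", ").append(column.getKey()); // trusted identifier, not escaped
            // CQL string literal: single quotes inside the value are doubled
            values.append(", '").append(column.getValue().replace("'", "''")).append('\'');
        }
        return "INSERT INTO " + columnFamily + " (" + names + ") VALUES (" + values + ")";
    }

    /** Wrap already-rendered statements in a single batch, separated by spaces. */
    static String batch(List<String> statements) {
        return "BEGIN BATCH " + String.join(" ", statements) + " APPLY BATCH";
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("BLU", "19");
        first.put("CJS", "7");
        first.put("DAL", "38");
        List<String> inserts = new ArrayList<>();
        inserts.add(insert("Portfolios", 168, first));
        System.out.println(batch(inserts));
        // BEGIN BATCH INSERT INTO Portfolios (KEY, BLU, CJS, DAL) VALUES (168, '19', '7', '38') APPLY BATCH
    }
}
```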
  • 148.
      CQL via JDBC: Components
      Reading Data via RowMapper
      public Stock mapRow(ResultSet rs, int row) throws SQLException {
          CassandraResultSet crs = (CassandraResultSet) rs;
          Stock stock = new Stock();
          // Decode the raw row key explicitly (assumes it was written as UTF-8)
          stock.setTicker(new String(crs.getKey(), StandardCharsets.UTF_8));
          stock.setPrice(crs.getDouble("price"));
          return stock;
      }
      See PortfolioDao#loadStocks
  • 156.
      Development Resources
      - CQL documentation (and CQL shell): http://www.datastax.com/docs/1.0/dml/using_cql
      - Hector documentation: http://hector-client.org
      - Cassandra Maven Plugin (exec-cql goal): http://mojo.codehaus.org/cassandra-maven-plugin/
      - CCM, a localhost Cassandra cluster: https://github.com/pcmanus/ccm
      - OpsCenter: http://www.datastax.com/products/opscenter
      - Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
  • 159.
      Putting it Together
  • 160.
      Take control of consistency
      If you do need a high degree of consistency, use thresholds to trigger different behavior
      - Bank account:
        “On values over $10,000, wait to hear from all replicas”
      - Distributed shopping cart:
        Show a confirmation page to verify order resolution
      *** What is your appetite for risk?
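The bank-account rule above can be expressed as a policy that picks a consistency level per operation. In this sketch the enum mirrors three real Cassandra levels (ONE, QUORUM, ALL), while the class, the method names, and the $10,000 threshold expressed in cents are hypothetical. The `readSeesWrite` check is the usual rule of thumb: a read is guaranteed to overlap the latest write when the replicas read plus the replicas written exceed the replication factor (R + W > RF), where QUORUM means floor(RF / 2) + 1 replicas.

```java
/** Sketch of a threshold-based consistency policy (hypothetical names). */
class ConsistencyByRisk {

    enum Level { ONE, QUORUM, ALL }

    /** How many replicas must respond at a level, for replication factor rf. */
    static int replicasFor(Level level, int rf) {
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return rf / 2 + 1;
            default:
                return rf;
        }
    }

    /** Large amounts wait for every replica; everything else takes the fast path. */
    static Level writeLevelFor(long amountCents) {
        return amountCents > 1_000_000L ? Level.ALL : Level.ONE; // over $10,000
    }

    /** True when every read is guaranteed to overlap the latest write: R + W > RF. */
    static boolean readSeesWrite(Level read, Level write, int rf) {
        return replicasFor(read, rf) + replicasFor(write, rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        Level write = writeLevelFor(1_500_000L); // a $15,000 transfer
        System.out.println(write);                                   // ALL
        System.out.println(readSeesWrite(Level.ONE, write, rf));     // true
        System.out.println(readSeesWrite(Level.ONE, Level.ONE, rf)); // false
    }
}
```

Writing at ALL buys certainty at the cost of availability: the write fails if any replica is down, which is exactly the trade-off behind the slide's question about appetite for risk.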
  • 166. Uniquely identify operations in the application
      - Facilitates idempotent behavior and out-of-order execution
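One way to apply this in Cassandra terms is to use each operation's unique ID as a column name, so a retried write overwrites the same column instead of adding a second entry. The sketch below models one account row in memory; `IdempotentLedger`, its methods, and storing amounts in cents are assumptions for illustration. Because the balance is summed at read time, the order in which operations arrive does not matter.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** One account row: operation id -> signed amount in cents (illustrative only). */
class IdempotentLedger {

    private final Map<String, Long> entries = new TreeMap<>();

    /** Re-applying the same operation id overwrites its column, so retries are harmless. */
    void apply(String operationId, long amountCents) {
        entries.put(operationId, amountCents);
    }

    /** Computed on read, so arrival order is irrelevant. */
    long balanceCents() {
        long total = 0;
        for (long amount : entries.values()) {
            total += amount;
        }
        return total;
    }

    public static void main(String[] args) {
        IdempotentLedger account = new IdempotentLedger();
        // The client generates each id once, before the first attempt.
        String deposit = UUID.randomUUID().toString();
        String withdrawal = UUID.randomUUID().toString();

        account.apply(withdrawal, -2_500L); // arrives before the deposit
        account.apply(deposit, 10_000L);
        account.apply(deposit, 10_000L);    // retried after a timeout
        System.out.println(account.balanceCents()); // 7500
    }
}
```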
  • 167.
      Denormalization
      - The point of normalization is to avoid update anomalies
      *** But in an append-only system, we don't do updates
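Denormalizing in practice means writing each fact once per query that needs it. The sketch below appends every trade to two hypothetical column families, one keyed by portfolio and one keyed by ticker; the names, the sample trades, and the “TICKER:shares” value format are invented for illustration. Since trades are only ever appended under a unique trade ID, there is no later update that could leave the two copies disagreeing, which is the anomaly normalization exists to prevent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Two query-shaped views of the same trades (hypothetical column families). */
class DenormalizedTrades {

    // portfolio id -> (trade id -> "TICKER:shares")
    private final Map<String, SortedMap<String, String>> tradesByPortfolio = new HashMap<>();
    // ticker -> (trade id -> "portfolio:shares")
    private final Map<String, SortedMap<String, String>> tradesByTicker = new HashMap<>();

    /** Append the trade to both views; an existing trade id is only ever rewritten with the same data. */
    void recordTrade(String tradeId, String portfolio, String ticker, int shares) {
        tradesByPortfolio.computeIfAbsent(portfolio, k -> new TreeMap<>())
                         .put(tradeId, ticker + ":" + shares);
        tradesByTicker.computeIfAbsent(ticker, k -> new TreeMap<>())
                      .put(tradeId, portfolio + ":" + shares);
    }

    SortedMap<String, String> forPortfolio(String portfolio) {
        return tradesByPortfolio.getOrDefault(portfolio, new TreeMap<>());
    }

    SortedMap<String, String> forTicker(String ticker) {
        return tradesByTicker.getOrDefault(ticker, new TreeMap<>());
    }

    public static void main(String[] args) {
        DenormalizedTrades trades = new DenormalizedTrades();
        trades.recordTrade("t1", "168", "BLU", 19);
        trades.recordTrade("t2", "168", "DAL", 38);
        trades.recordTrade("t3", "236", "BSX", 32);
        trades.recordTrade("t4", "236", "BLU", 5);
        System.out.println(trades.forPortfolio("168")); // {t1=BLU:19, t2=DAL:38}
        System.out.println(trades.forTicker("BLU"));    // {t1=168:19, t4=236:5}
    }
}
```

If one of the two writes fails, retrying `recordTrade` with the same trade ID is safe, because it rewrites the same columns with the same values.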
  • 169.
      Summary
      - Take advantage of strengths
      - Look for idempotence and asynchronicity in your business processes
      - If it's not in the API, you are probably doing it wrong
      - Seek death is still possible if you model incorrectly
  • 173.
      Questions
      Nate McCall [email_address] @zznate
  • 174.
      Additional Resources
      - DataStax Documentation: http://www.datastax.com/docs/0.8/index
      - Apache Cassandra project wiki: http://wiki.apache.org/cassandra/
      - “The Dynamo Paper”: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
      - P. Helland. Building on Quicksand: http://arxiv.org/pdf/0909.1788
      - P. Helland. Life Beyond Distributed Transactions: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
      - S. Anand. “Netflix's Transition to High-Availability Storage Systems”: http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf
      - “The Megastore Paper”: http://research.google.com/pubs/archive/36971.pdf