Meetup cassandra for_java_cql


Slides from 10/26/2011 Cassandra Austin Meetup group


Transcript

  • 1.
      Building Java Applications with Apache Cassandra
      Nate McCall [email_address] @zznate
  • 2.
      What is Apache Cassandra?
  • 3. CAP Theorem
      Consistency, Availability, Partition Tolerance
      “Thou shalt have but 2”
    - Conjecture made by Eric Brewer in 2000
    - Published as formal proof in 2002
    - See: http://en.wikipedia.org/wiki/CAP_theorem for more
  • 4.
      Apache Cassandra Concepts
    - Explicit choice of partition tolerance and availability. Consistency is tunable.
    - No read before write
    - Merge on read
    - Idempotent
    - Schema optional
    - All nodes share the same role
    - Still performs well with larger-than-memory data sets
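One common way to reason about "Consistency is tunable" is the replica-overlap rule used for Dynamo-style systems: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The slides do not spell this rule out, so the sketch below is only an illustration under that assumption; the class and method names are hypothetical and not part of any Cassandra or Hector API.

```java
/** Toy arithmetic for tunable consistency; names are illustrative, not a Cassandra API. */
class TunableConsistency {

    /** Smallest majority of the given number of replicas. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** True when every read set must intersect every acknowledged write set. */
    static boolean readOverlapsWrite(int replicas, int writeAcks, int readAcks) {
        return readAcks + writeAcks > replicas;
    }

    public static void main(String[] args) {
        int n = 3;
        // Quorum writes plus quorum reads overlap: 2 + 2 > 3
        System.out.println(readOverlapsWrite(n, quorum(n), quorum(n)));
        // One replica for each favors availability, with no overlap guarantee: 1 + 1 <= 3
        System.out.println(readOverlapsWrite(n, 1, 1));
    }
}
```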
  • 5. Generally complements other systems (not intended to be one-size-fits-all) *** You should always use the right tool for the right job anyway
  • 6. How does this differ from an RDBMS?
  • 7. How does this differ from an RDBMS? Substantially.
  • 8. vs. RDBMS - No Joins
    Unless:
    - you do them on the client
    - you do them via Map/Reduce
  • 9. vs. RDBMS - Schema Optional
    (Though you can add meta information for validation and type checking)
    *** Supports secondary indexes too: “… WHERE state = 'TX'”
  • 10. vs. RDBMS - Prematerialized and Transaction-less
    - No ACID transactions
    - Limited support for ad-hoc queries
  • 11. vs. RDBMS - Prematerialized and Transaction-less
    - No ACID transactions
    - Limited support for ad-hoc queries
    *** You are going to give up both of these anyway when you shard an RDBMS ***
  • 12.
      vs. RDBMS - Facilitates Consolidation
    It can be your caching layer
    * Off-heap cache (provided you install JNA)
    It can be your analytics infrastructure
    * true map/reduce
    * pig driver
    * hive driver coming soon
  • 13. vs. RDBMS - Shared-Nothing Architecture
    Every node plays the same role: no masters, no slaves, no special nodes
    *** No single point of failure
  • 14.
      vs. RDBMS - Real Linear Scalability
    Want 2x performance? Add 2x nodes. *** 'No downtime' included!
  • 15.
      vs. RDBMS - Performance
    Reads on par with writes
  • 16.
      Storage (Briefly)
  • 17.
      Storage (Briefly)
      Understanding the on-disk format is extremely helpful in designing your data model correctly
  • 18.
      Storage - SSTable
      - SSTables are immutable (“Merge on read”)
      - Newest timestamp wins
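A minimal sketch of what "merge on read" with "newest timestamp wins" could look like, using plain Java maps as stand-ins for immutable SSTables. The `Cell` type, the sample data, and the method names are hypothetical and chosen for illustration; this is not Cassandra's storage code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of merge on read; this is not Cassandra's storage code. */
class MergeOnRead {

    /** A column value together with the timestamp supplied at write time. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Checks every SSTable for the column and keeps the cell with the newest timestamp. */
    static Cell read(List<Map<String, Cell>> sstables, String column) {
        Cell newest = null;
        for (Map<String, Cell> sstable : sstables) {
            Cell candidate = sstable.get(column);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        return newest;
    }

    /** Two hypothetical SSTables holding different versions of the same column. */
    static List<Map<String, Cell>> sampleTables() {
        Map<String, Cell> older = new HashMap<>();
        older.put("price", new Cell("589.55", 1L));
        Map<String, Cell> newer = new HashMap<>();
        newer.put("price", new Cell("583.16", 2L));
        List<Map<String, Cell>> tables = new ArrayList<>();
        tables.add(newer); // file order does not matter, only timestamps do
        tables.add(older);
        return tables;
    }

    public static void main(String[] args) {
        Cell cell = read(sampleTables(), "price");
        System.out.println(cell.value + " (timestamp " + cell.timestamp + ")");
    }
}
```

Because no file is ever rewritten in place, a read has to consult every SSTable that might hold the column, which is why keeping the SSTable count low matters.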
  • 19.
      Storage – Compaction
      Merges SSTables, keeping their count down and making merge on read more efficient
      Discards tombstones (more on this later!)
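The compaction step can be sketched as a merge of several SSTables into one, followed by dropping tombstones. This is a simplified illustration with hypothetical names: a `null` value stands in for a tombstone, and the sketch discards tombstones immediately, whereas Cassandra keeps them for a configurable grace period (`gc_grace_seconds`) first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy compaction over in-memory maps; not Cassandra's implementation. */
class CompactionSketch {

    static final class Cell {
        final String value; // null marks a tombstone in this sketch
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merges all inputs into one sorted table, newest timestamp winning per column. */
    static TreeMap<String, Cell> compact(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> entry : sstable.entrySet()) {
                Cell existing = merged.get(entry.getKey());
                if (existing == null || entry.getValue().timestamp > existing.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        // Tombstones are dropped only after the merge, so they shadow older values first.
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }

    /** Hypothetical input: the second table deletes a column written in the first. */
    static List<Map<String, Cell>> sampleTables() {
        Map<String, Cell> first = new HashMap<>();
        first.put("price", new Cell("6.90", 1L));
        first.put("exchange", new Cell("NYSE", 1L));
        Map<String, Cell> second = new HashMap<>();
        second.put("exchange", new Cell(null, 2L)); // a later delete
        second.put("name", new Cell("Nokia", 2L));
        List<Map<String, Cell>> tables = new ArrayList<>();
        tables.add(first);
        tables.add(second);
        return tables;
    }

    public static void main(String[] args) {
        System.out.println(compact(sampleTables()).keySet()); // prints [name, price]
    }
}
```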
  • 20.
      Data Model
  • 21.
      Data Model
      "...sparse, persistent, distributed, multi-dimensional sorted map."
    • (The “Bigtable” paper)
  • 22.
      Data Model
      Keyspace
      - Collection of Column Families
      - Controls replication
      Column Family
      - Similar to a table
      - Columns ordered by name
  • 27.
      Data Model – Column Family
      Static Column Family
      - Model my object data
      Dynamic Column Family
      - Pre-calculated query results
      Nothing stopping you from mixing them!
  • 31.
      Data Model – Static CF
      Stocks
      Row key   Columns
      GOOG      name: Google     price: 589.55
      AAPL      name: Apple      price: 401.76
      NFLX      name: Netflix    price: 78.73
      NOK       exchange: NYSE   name: Nokia     price: 6.90
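The "sorted map of sorted maps" idea behind the Stocks layout can be modeled with nested Java maps. This is a rough sketch for intuition only: the class and method names are hypothetical, and a `TreeMap` with natural string ordering stands in for the column comparator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy in-memory model of a keyspace; an illustration, not a client API. */
class DataModelSketch {

    // column family name -> row key -> columns kept sorted by column name
    private final Map<String, Map<String, SortedMap<String, String>>> columnFamilies = new HashMap<>();

    /** Writes one column; rows may hold different columns (the model is sparse). */
    void insert(String columnFamily, String rowKey, String column, String value) {
        columnFamilies
            .computeIfAbsent(columnFamily, k -> new HashMap<>())
            .computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(column, value);
    }

    /** Returns the columns of one row in name order, or an empty map if the row is unknown. */
    SortedMap<String, String> row(String columnFamily, String rowKey) {
        Map<String, SortedMap<String, String>> rows = columnFamilies.get(columnFamily);
        if (rows == null || !rows.containsKey(rowKey)) {
            return new TreeMap<>();
        }
        return rows.get(rowKey);
    }

    public static void main(String[] args) {
        DataModelSketch keyspace = new DataModelSketch();
        keyspace.insert("Stocks", "NOK", "price", "6.90");
        keyspace.insert("Stocks", "NOK", "name", "Nokia");
        keyspace.insert("Stocks", "NOK", "exchange", "NYSE");
        // Columns come back ordered by name regardless of insertion order.
        System.out.println(keyspace.row("Stocks", "NOK")); // prints {exchange=NYSE, name=Nokia, price=6.90}
    }
}
```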
  • 32.
      Data Model – Prematerialized Query
      StockHist
      Row key   Columns (date: price)
      GOOG      10/25/2011: 583.16   10/24/2011: 596.42   10/21/2011: 590.49
      AAPL      10/25/2011: 397.77   10/24/2011: 405.77   10/21/2011: 392.87
      NFLX      10/25/2011: 77.37    10/24/2011: 118.84   10/21/2011: 117.04
      NOK       10/25/2011: 6.71     10/24/2011: 6.76     10/21/2011: 6.61
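In a prematerialized layout like StockHist, the column names (dates) are kept sorted, so a date-range query becomes a contiguous slice of one row. The sketch below models that with a `NavigableMap`; the class and method names are hypothetical, and ascending date order is an assumption about the comparator.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a dynamic column family keyed by ticker, with dates as column names. */
class StockHistSketch {

    private final Map<String, NavigableMap<LocalDate, Double>> rows = new HashMap<>();

    /** Appends one price under its date; the map keeps the columns sorted by date. */
    void record(String ticker, LocalDate day, double price) {
        rows.computeIfAbsent(ticker, k -> new TreeMap<>()).put(day, price);
    }

    /** Returns the columns between two dates (inclusive) as one contiguous slice. */
    NavigableMap<LocalDate, Double> slice(String ticker, LocalDate from, LocalDate to) {
        NavigableMap<LocalDate, Double> row = rows.get(ticker);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        StockHistSketch hist = new StockHistSketch();
        hist.record("NOK", LocalDate.of(2011, 10, 21), 6.61);
        hist.record("NOK", LocalDate.of(2011, 10, 24), 6.76);
        hist.record("NOK", LocalDate.of(2011, 10, 25), 6.71);
        System.out.println(hist.slice("NOK", LocalDate.of(2011, 10, 24), LocalDate.of(2011, 10, 25)));
    }
}
```

The query result is effectively computed at write time: reading a range touches one row and walks adjacent columns instead of joining or scanning.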
  • 33.
      API Operations
  • 34. Five general categories
      - Retrieving
      - Writing/Updating/Removing (all the same op!)
        - Increment counters
      - Meta Information
      - Schema Manipulation
      - CQL Execution
  • 35. Using a Client
    Hector Client: http://hector-client.org
    - Most popular Java client
    - In use at very large installations
    - A number of tools and utilities built on top
    - Very active community
    - MIT Licensed
    *** Like any open source project fully dependent on another open source project, it has its warts
  • 36.
      Sample Project for Experimenting
    https://github.com/zznate/cassandra-tutorial
    https://github.com/zznate/hector-examples
    - Built using Hector
    - Really basic – designed to be beginner level w/ very few moving parts
    - Modify/abuse/alter as needed
    *** Descriptions of what is going on and how to run each example are in the Javadoc comments.
  • 37.
      ColumnFamilyTemplate
    Familiar, type-safe approach
    - based on template-method design pattern
    - generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type)
    ColumnFamilyTemplate template =
        new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName,
            StringSerializer.get(), StringSerializer.get());
    *** (no generics for clarity)
  • 38.
      ColumnFamilyTemplate
    new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName,
        StringSerializer.get(),   // Key Format
        StringSerializer.get());  // Column Name Format
    Column Name Format:
    - Cassandra calls this a “comparator”
    - Remember: defines column order in on-disk format
  • 39.
      ColumnFamilyTemplate
    ColumnFamilyResult<String, String> res = cft.queryColumns("zznate");
    String value = res.getString("email");
    Date startDate = res.getDate("startDate");
    ColumnFamilyResult<String, String>: Key Format, Column Name Format
  • 40.
      ColumnFamilyTemplate
    ColumnFamilyResult wrapper = template.queryColumns("GOOG", "AAPL", "NFLX");
    String googName = wrapper.getString("name");
    wrapper.next();
    String aaplName = wrapper.getString("name");
    wrapper.next();
    String nflxName = wrapper.getString("name");
    Querying multiple rows and iterating over results
  • 41.
      ColumnFamilyTemplate
    ColumnFamilyUpdater updater = template.createUpdater("AAPL");
    updater.setString("exchange", "NASDAQ");
    updater.addKey("GOOG");
    updater.setString("exchange", "NASDAQ");
    template.update(updater);
    Inserting data with ColumnFamilyUpdater
  • 42.
      ColumnFamilyTemplate
    template.deleteColumn("AAPL", "notNeededStuff");
    template.deleteColumn("GOOG", "somethingElse");
    template.deleteColumn("GOOG", "aDifferentColumnName");
    ...
    template.deleteRow("NOK");
    template.executeBatch();
    Deleting Data with ColumnFamilyTemplate
  • 43.
      Deletion
  • 44.
      Deletion
      Again: Every mutation is an insert!
      - Merge on read
      - SSTables are immutable
      - Highest timestamp wins
  • 48.
      Deletion – As Seen by CLI
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      => (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
  • 59.
      Deletion – As Seen by CLI
      [default@Tutorial] list StateCity;
      Using default limit of 100
      -------------------
      RowKey: CA Burlingame
      -------------------
      RowKey: TX Austin
      => (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
      => (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
      => (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
      => (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
      => (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
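The `value=` fields in the CLI listings are hex-encoded bytes. Assuming the tutorial stored them as UTF-8 text (an assumption; the slides do not say), a small helper such as the hypothetical `decodeHex` below makes them readable. It uses only the JDK and is not part of Hector or the Cassandra CLI.

```java
import java.nio.charset.StandardCharsets;

/** Decodes hex strings as printed by the CLI, assuming the bytes are UTF-8 text. */
class CliValueDecoder {

    static String decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Value of column 202 in the "TX Austin" row
        System.out.println(decodeHex("33302e3237783039372e3734")); // prints 30.27x097.74
    }
}
```

Decoded this way, the values look like latitude-by-longitude pairs for each city.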
  • 69.
      Deletion – FYI
      Sending a deletion for a non-existing row:
      mutator.addDeletion("202230", "Npanxx", "city", stringSerializer);
      Does not exist? You just inserted a tombstone!
      [default@Tutorial] list Npanxx;
      Using default limit of 100
      . . .
      -------------------
      RowKey: 202230
      -------------------
      . . .
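Why a row key with no columns shows up after deleting something that never existed can be sketched as follows. In this toy model a delete is just another write that records a tombstone (a `null` value here), and the listing skips tombstoned columns but still reports the row key. All names are hypothetical, and timestamps are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy store in which deletes are writes; an illustration, not Cassandra's behavior in full. */
class TombstoneSketch {

    // row key -> column name -> value (null marks a tombstone)
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** There is no read before write: the tombstone is recorded even if nothing was there. */
    void delete(String rowKey, String column) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, null);
    }

    /** Mimics a listing: every known row key with the names of its live columns only. */
    Map<String, List<String>> list() {
        Map<String, List<String>> listing = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            List<String> live = new ArrayList<>();
            for (Map.Entry<String, String> column : row.getValue().entrySet()) {
                if (column.getValue() != null) {
                    live.add(column.getKey());
                }
            }
            listing.put(row.getKey(), live);
        }
        return listing;
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        store.delete("202230", "city"); // the row never existed
        System.out.println(store.list()); // prints {202230=[]}
    }
}
```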
  • 75.
      Integrating with existing patterns
    <bean id="cassandraHostConfigurator"
          class="me.prettyprint.cassandra.service.CassandraHostConfigurator">
      <constructor-arg value="localhost:9170"/>
    </bean>
    <bean id="cluster" class="me.prettyprint.cassandra.service.ThriftCluster">
      <constructor-arg value="TestCluster"/>
      <constructor-arg ref="cassandraHostConfigurator"/>
    </bean>
    <bean id="consistencyLevelPolicy" class="me.prettyprint.cassandra.model.ConfigurableConsistencyLevel">
      <property name="defaultReadConsistencyLevel" value="ONE"/>
    </bean>
    <bean id="keyspaceOperator" class="me.prettyprint.hector.api.factory.HFactory"
          factory-method="createKeyspace">
      <constructor-arg value="Keyspace1"/>
      <constructor-arg ref="cluster"/>
      <constructor-arg ref="consistencyLevelPolicy"/>
    </bean>
    <bean id="simpleCassandraDao" class="me.prettyprint.cassandra.dao.SimpleCassandraDao">
      <property name="keyspace" ref="keyspaceOperator"/>
      <property name="columnFamilyName" value="Standard1"/>
    </bean>
  • 96.
      Integrating with existing patterns
    Hector Object Mapper:
    https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
    Hector JPA:
    https://github.com/riptano/hector-jpa
  • 100.
      CQL via JDBC
  • 101.
      CQL via JDBC
    - Integrate with existing tools (Spring Framework's JdbcTemplate in this case)
    *** Still some caveats and missing features
  • 103.
      CQL via JDBC
    https://github.com/riptano/jdbc-conn-pool
    - see portfolio_example sub project
  • 105.
      CQL via JDBC: Components
    - HCQLDataSource (from jdbc-pool)
    - Spring Framework's JdbcTemplate
    - DAO class with associated domain objects
    - JUnit
    - Spring Framework's SpringJUnit4ClassRunner (context setup and injection)
    - EmbeddedServerHelper from hector-test (manage Cassandra lifecycle, directories and configuration)
  • 111.
      CQL via JDBC: Configuration
    Pool Configuration
    - Cluster name, keyspace and at least 1 host required
    - Additional settings for:
      * fail over semantics
      * automatic host discovery
      * timeout counters and thresholds
  • 117.
      CQL via JDBC: Configuration (JNDI)
    <Resource name="cassandra/CassandraClientFactory"
              auth="Container"
              type="me.prettyprint.cassandra.api.Keyspace"
              factory="me.prettyprint.cassandra.jndi.CassandraClientJndiResourceFactory"
              hosts="cass1:9160,cass2:9160,cass3:9160"
              user="user"
              password="passwd"
              keyspace="Keyspace1"
              clusterName="Test Cluster"
              maxActive="20"
              maxWaitTimeWhenExhausted="10"
              failoverPolicy="ON_FAIL_TRY_ALL_AVAILABLE"
              autoDiscoverHosts="true"
              runAutoDiscoveryAtStartup="true" />
  • 131.
      CQL via JDBC: Configuration (Spring)
    <bean class="com.datastax.drivers.jdbc.pool.cassandra.jdbc.HCQLDataSource"
          id="ds">
      <property name="clusterName" value="TestCluster" />
      <property name="keyspaceName" value="PortfolioDemo" />
      <property name="hosts" value="127.0.0.1:9170" />
    </bean>
    <bean class="org.springframework.jdbc.core.JdbcTemplate"
          id="jdbcTemplate">
      <constructor-arg ref="ds" />
    </bean>
  • 141.
      CQL via JDBC: Components
    private static final String PORTFOLIOS_INSERT =
        "BEGIN BATCH "
        + "INSERT INTO Portfolios (KEY, BLU, CJS, DAL) VALUES (168,'19', '7', '38') "
        + "INSERT INTO Portfolios (KEY, BSX, CHK, DNB, MCI, SR) VALUES (236,'32', '27', '7','8','3') "
        + "APPLY BATCH";
    ...
    jdbcTemplate.execute(PORTFOLIOS_INSERT);
    Inserting Test Data
  • 148.
      CQL via JDBC: Components
    public Stock mapRow(ResultSet rs, int row) throws SQLException {
        CassandraResultSet crs = (CassandraResultSet) rs;
        Stock stock = new Stock();
        stock.setTicker(new String(crs.getKey()));
        stock.setPrice(crs.getDouble("price"));
        return stock;
    }
    See PortfolioDao#loadStocks
    Reading Data via RowMapper
  • 156.
      Development Resources
    - CQL Documentation (and CQL Shell): http://www.datastax.com/docs/1.0/dml/using_cql
    - Hector Documentation: http://hector-client.org
    - Cassandra Maven Plugin (exec-cql goal): http://mojo.codehaus.org/cassandra-maven-plugin/
    - CCM localhost cassandra cluster: https://github.com/pcmanus/ccm
    - OpsCenter: http://www.datastax.com/products/opscenter
    - Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
  • 159.
      Putting it Together
  • 160.
      Take control of consistency
    If you do need a high degree of consistency, use thresholds to trigger different behavior
    - Bank account:
      “on values over $10,000, wait to hear from all replicas”
    - Distributed Shopping Cart:
      Show a confirmation page to verify order resolution
    *** What is your appetite for risk?
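The bank-account threshold could be sketched as a function that picks a consistency level per operation. The `Level` enum below is a local stand-in whose constants mirror Cassandra's consistency level names; it is not the client's own type. Using `ONE` below the threshold is an assumption made for the example, since the slide only says what to do above it.

```java
import java.math.BigDecimal;

/** Picks a consistency level from the amount at stake; names are illustrative. */
class ConsistencyChooser {

    /** Local stand-in mirroring Cassandra's level names, not a client library type. */
    enum Level { ONE, QUORUM, ALL }

    static final BigDecimal THRESHOLD = new BigDecimal("10000");

    /** Above the threshold, wait to hear from all replicas; otherwise (assumed) one is enough. */
    static Level forAmount(BigDecimal amount) {
        return amount.compareTo(THRESHOLD) > 0 ? Level.ALL : Level.ONE;
    }

    public static void main(String[] args) {
        System.out.println(forAmount(new BigDecimal("25.00")));    // prints ONE
        System.out.println(forAmount(new BigDecimal("12500.00"))); // prints ALL
    }
}
```

The trade-off is explicit: waiting for all replicas makes the operation fail if any replica is down, which is the price paid for the stronger guarantee on large amounts.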
  • 166. Uniquely identify operations in the application
    • Facilitates idempotent behavior and out-of-order execution
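A minimal sketch of why unique operation identifiers help: if each operation is stored under its own id, replaying it or applying operations out of order gives the same result. The ledger class and its methods are hypothetical, and a plain in-memory map stands in for a row with one column per operation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy ledger that stores one entry per operation id, so retries are harmless. */
class IdempotentLedger {

    // operation id -> amount in cents; think of one column per operation
    private final Map<UUID, Long> deposits = new HashMap<>();

    /** Records a deposit under its id; applying the same id again overwrites with the same value. */
    void apply(UUID operationId, long cents) {
        deposits.put(operationId, cents);
    }

    /** The balance is derived on read by summing every recorded operation. */
    long balance() {
        long total = 0;
        for (long cents : deposits.values()) {
            total += cents;
        }
        return total;
    }

    public static void main(String[] args) {
        IdempotentLedger ledger = new IdempotentLedger();
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        ledger.apply(second, 300);
        ledger.apply(first, 500);
        ledger.apply(first, 500); // a retry after a timeout changes nothing
        System.out.println(ledger.balance()); // prints 800
    }
}
```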
  • 167.
      Denormalization
    The point of normalization is to avoid update anomalies
    *** But in an append-only system, we don't do updates
  • 169.
      Summary
    - Take advantage of strengths
    - Look for idempotence and asynchronicity in your business processes
    - If it's not in the API, you are probably doing it wrong
    - Seek death is still possible if you model incorrectly
  • 173.
      Questions
      Nate McCall [email_address] @zznate
  • 174.
      Additional Resources
    - DataStax Documentation: http://www.datastax.com/docs/0.8/index
    - Apache Cassandra project wiki: http://wiki.apache.org/cassandra/
    - “The Dynamo Paper”: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
    - P. Helland. Building on Quicksand: http://arxiv.org/pdf/0909.1788
    - P. Helland. Life Beyond Distributed Transactions: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
    - S. Anand. “Netflix's Transition to High-Availability Storage Systems”: http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf
    - “The Megastore Paper”: http://research.google.com/pubs/archive/36971.pdf