Introduction to Apache Cassandra from Cassandra NYC summit


Java, Big Data and Apache Cassandra
Nate McCall [email_address] @zznate

Apache Cassandra: Origins in big data

But first... the CAP Theorem
Consistency, Availability, Partition Tolerance: "Thou shalt have but 2"
- Conjecture made by Eric Brewer in 2000
- Published as a formal proof in 2002
- See http://en.wikipedia.org/wiki/CAP_theorem for more

CAP Theorem: Cassandra Style
- An explicit choice of partition tolerance and availability
- You can opt for more consistency at the cost of availability
Consistency is tunable per operation

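Tunable consistency comes down to replica arithmetic. A minimal sketch, assuming a replication factor `rf` and the levels ONE, QUORUM (a majority, rf / 2 + 1) and ALL; the `Level` enum and `Consistency` class are illustrative names, not a client library API. A read is guaranteed to overlap the latest acknowledged write when replicas read plus replicas written exceed the replication factor (R + W > N).

```java
/** Illustrative consistency levels (not a client library type). */
enum Level { ONE, QUORUM, ALL }

/** Replica arithmetic behind tunable consistency, for a replication factor rf. */
final class Consistency {
    private Consistency() {}

    /** How many replicas must acknowledge an operation at this level. */
    static int replicas(Level level, int rf) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return rf / 2 + 1;   // a majority of the replicas
            case ALL:    return rf;
            default:     throw new IllegalArgumentException("unknown level " + level);
        }
    }

    /** True when every read set overlaps every write set (R + W > N). */
    static boolean readsSeeLatestWrite(Level read, Level write, int rf) {
        return replicas(read, rf) + replicas(write, rf) > rf;
    }

    /** How many replicas can be down while an operation at this level still succeeds. */
    static int toleratedFailures(Level level, int rf) {
        return rf - replicas(level, rf);
    }
}
```

With a replication factor of 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) stay consistent while tolerating one replica down; ONE/ONE (1 + 1 is not greater than 3) is faster and more available but can return stale data until replicas converge.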
Apache Cassandra Concepts
- No read before write
- Merge on read
- Idempotent
- Schema optional
- All nodes share the same role
- Still performs well with larger-than-memory data sets

Generally complements other systems (not intended to be one-size-fits-all)
*** You should always use the right tool for the right job anyway

How does this differ from an RDBMS? Substantially.

vs. RDBMS - No Joins
Unless:
- you do them on the client
- you do them via Map/Reduce

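Doing a join on the client means fetching both sides and stitching them together in application code. A minimal sketch, assuming two hypothetical column families already read into maps (portfolio id to owner, and user name to email); in a real application both maps would be filled through a client such as Hector, and `ClientJoin` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** A hash join done in the client: probe one fetched side with keys from the other. */
final class ClientJoin {
    private ClientJoin() {}

    /**
     * Joins each portfolio (id -> owner) with its owner's email (user -> email).
     * Portfolios whose owner has no user row are skipped, like an SQL inner join.
     */
    static List<String> portfolioOwnerEmails(Map<String, String> portfolioOwners,
                                             Map<String, String> userEmails) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> p : portfolioOwners.entrySet()) {
            String email = userEmails.get(p.getValue()); // probe the "users" side
            if (email != null) {
                joined.add(p.getKey() + " -> " + email);
            }
        }
        return joined;
    }
}
```

Each side costs a separate round trip, which is why frequently joined data is usually denormalized into one column family instead.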
vs. RDBMS - Schema Optional
(Though you can add meta information for validation and type checking)
*** Supports secondary indexes too: "... WHERE state = 'TX'"

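Conceptually, a secondary index maps each value of one column back to the row keys that hold it, so "WHERE state = 'TX'" becomes a lookup rather than a scan. A minimal in-memory sketch of that idea, assuming a single indexed column; `IndexedRows` is a hypothetical name and this is not how Cassandra physically stores its indexes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Rows of column name -> value, plus an index from one column's value to row keys. */
final class IndexedRows {
    private final String indexedColumn;
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    private final Map<String, Set<String>> index = new HashMap<>();

    IndexedRows(String indexedColumn) { this.indexedColumn = indexedColumn; }

    void put(String rowKey, String column, String value) {
        Map<String, String> row = rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        String old = row.put(column, value);
        if (column.equals(indexedColumn)) {
            if (old != null) {
                Set<String> stale = index.get(old); // drop the entry for the old value
                if (stale != null) stale.remove(rowKey);
            }
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
        }
    }

    /** Equivalent of "... WHERE <indexedColumn> = value": matching row keys, sorted. */
    Set<String> whereEquals(String value) {
        return Collections.unmodifiableSet(index.getOrDefault(value, Collections.emptySet()));
    }
}
```

Keeping the index in step with updates (removing the stale entry) is the part that makes indexes more than a free lookup.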
vs. RDBMS - Prematerialized and Transaction-less
- No ACID transactions
- Limited support for ad-hoc queries
*** You are going to give up both of these anyway when you shard an RDBMS ***

vs. RDBMS - Facilitates Consolidation
It can be your caching layer
* Off-heap cache (provided you install JNA)
It can be your analytics infrastructure
* true map/reduce
* pig driver
* hive driver coming soon

vs. RDBMS - Shared-Nothing Architecture
Every node plays the same role: no masters, no slaves, no special nodes
*** No single point of failure

vs. RDBMS - Real Linear Scalability
Want 2x performance? Add 2x nodes (with no downtime!)

vs. RDBMS - Performance
Reads on par with writes

Clustering
Consistent Hashing FTW:
- No fancy shard logic or tedious management of such required
- Ring ownership continuously "gossiped" between nodes
- Any node can act as a "coordinator" to service client requests for any key
* requests are forwarded to the appropriate nodes by the coordinator, transparently to the client

Single node cluster (easy development setup)
- one node owns the whole hash range

Two node cluster
- Key range divided between nodes

Consistent Hashing: md5("zznate") = "C"

Clustering: The Client's Perspective
Client Read: get("zznate"), md5 = "C"

Clustering – Scale Out

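The ring on these slides can be sketched with a sorted map from token to node. A minimal sketch, assuming one token per node and MD5-derived key tokens in the spirit of the RandomPartitioner; Cassandra's exact token arithmetic differs (its tokens span 0 to 2^127), and `TokenRing`, the node names and the tokens are hypothetical. Replicas follow a SimpleStrategy-style rule: the owning node plus the next nodes clockwise.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A token ring: each node owns the range (previous token, its token]. */
final class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String node) { ring.put(token, node); }

    /** Key token: the MD5 digest read as a non-negative integer (simplified). */
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is available on every Java platform", e);
        }
    }

    /** First node clockwise whose token is >= the key's token, wrapping past the end. */
    String primaryFor(String key) {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(token(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** The primary plus the next rf - 1 nodes clockwise. */
    List<String> replicasFor(String key, int rf) {
        List<String> replicas = new ArrayList<>();
        BigInteger start = token(key);
        for (String node : ring.tailMap(start, true).values()) {
            if (replicas.size() < rf) replicas.add(node);
        }
        for (String node : ring.headMap(start, false).values()) { // wrap around the ring
            if (replicas.size() < rf) replicas.add(node);
        }
        return replicas;
    }
}
```

Because ownership goes to the nearest token clockwise, adding a node only takes keys from the node that follows it on the ring; every other key stays where it was, which is what keeps scale-out cheap.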
Clustering - Multi-DC

Clustering - Reliability

Clustering - Multi-Datacenter

Clustering – Multi-DC Reliability

Storage (Briefly)
Understanding the on-disk format is extremely helpful in designing your data model correctly

Storage - SSTable
- SSTables are immutable ("Merge on read")
- Newest timestamp wins

Storage – Compaction
- Merges SSTables, keeping their count down and making merge on read more efficient
- Discards tombstones (more on this later!)

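Merge on read and compaction can be sketched for a single row. A minimal sketch, assuming each SSTable is an in-memory map from column name to a timestamped `Cell` and that a null value marks a tombstone; `Cell` and `MergeOnRead` are illustrative names. Two simplifications: real Cassandra breaks timestamp ties deterministically, and it only purges tombstones after gc_grace_seconds so that replicas which missed a delete cannot resurrect the data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One column version: a value (null for a tombstone) and its write timestamp. */
final class Cell {
    final String value;   // null marks a deletion (tombstone)
    final long timestamp; // microseconds, as in the CLI listings
    Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    boolean isTombstone() { return value == null; }
}

/** SSTables as immutable column -> cell maps; reads merge them and the newest timestamp wins. */
final class MergeOnRead {
    private MergeOnRead() {}

    /** Keeps the highest-timestamp cell per column (ties keep the first one seen). */
    static TreeMap<String, Cell> merge(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> e : table.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (a, b) -> b.timestamp > a.timestamp ? b : a);
            }
        }
        return merged;
    }

    /** What a client sees: live values only; a newer tombstone hides older values. */
    static Map<String, String> read(List<Map<String, Cell>> sstables) {
        Map<String, String> visible = new TreeMap<>();
        merge(sstables).forEach((col, cell) -> {
            if (!cell.isTombstone()) visible.put(col, cell.value);
        });
        return visible;
    }

    /** Compaction: many tables rewritten as one, tombstones discarded (grace period ignored). */
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables) {
        Map<String, Cell> out = new TreeMap<>();
        merge(sstables).forEach((col, cell) -> {
            if (!cell.isTombstone()) out.put(col, cell);
        });
        return out;
    }
}
```

Because the merge only compares timestamps, the order in which SSTables are read does not change the result.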
Data Model
"...sparse, persistent, distributed, multi-dimensional sorted map."
(The "Bigtable" paper)

Keyspace
- Collection of Column Families
- Controls replication
Column Family
- Similar to a table
- Columns ordered by name

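The "sorted map" description translates almost literally into Java collections. A minimal in-memory sketch, assuming string row keys, column names and values; `ColumnFamilyModel` is a hypothetical name, and the `Comparator` passed in stands in for the column family's comparator, which fixes column order. Rows sit in a hash map because, with the random partitioner, rows are spread by token rather than kept in key order; only columns within a row are sorted.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** column family = row key -> (column name -> value), columns sorted by the comparator. */
final class ColumnFamilyModel {
    private final Comparator<String> comparator; // role of the column family's comparator
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    ColumnFamilyModel(Comparator<String> comparator) { this.comparator = comparator; }

    /** Sparse rows: a row stores only the columns written to it. */
    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>(comparator)).put(column, value);
    }

    /** A column slice: columns of one row between start and finish (inclusive), in order. */
    SortedMap<String, String> slice(String rowKey, String start, String finish) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<String, String>(comparator);
        }
        return row.subMap(start, true, finish, true);
    }
}
```

Since columns are stored in comparator order, a slice is a contiguous read, which is why the choice of comparator matters when designing the model.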
Data Model – Column Family
Static Column Family
- Model my object data
Dynamic Column Family
- Pre-calculated query results
Nothing stopping you from mixing them!

Data Model – Static CF
Stocks (row key → columns)
GOOG → name: Google | price: 589.55
AAPL → name: Apple | price: 401.76
NFLX → name: Netflix | price: 78.73
NOK → name: Nokia | price: 6.90 | exchange: NYSE

Data Model – Prematerialized Query
StockHist (row key → date: price)
GOOG → 10/25/2011: 583.16 | 10/24/2011: 596.42 | 10/21/2011: 590.49
AAPL → 10/25/2011: 397.77 | 10/24/2011: 405.77 | 10/21/2011: 392.87
NFLX → 10/25/2011: 77.37 | 10/24/2011: 118.84 | 10/21/2011: 117.04
NOK → 10/25/2011: 6.71 | 10/24/2011: 6.76 | 10/21/2011: 6.61

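StockHist answers "recent prices for a ticker" ahead of time: one row per ticker, one column per day, so the query is a single slice of one row. A minimal sketch, assuming in-memory storage and a hypothetical `StockHistory` class. One design note: written as text, MM/dd/yyyy column names sort by month first (01/03/2012 would land before 10/25/2011), so this sketch keys columns by `LocalDate`; in Cassandra the equivalent is choosing a comparator or a name format such as yyyy-MM-dd that sorts by time.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Prematerialized price history: one row per ticker, one column per trading day. */
final class StockHistory {
    private final Map<String, NavigableMap<LocalDate, Double>> rows = new HashMap<>();

    /** One column insert per ticker per day; no read is needed before the write. */
    void record(String ticker, LocalDate day, double price) {
        rows.computeIfAbsent(ticker, t -> new TreeMap<>()).put(day, price);
    }

    /** "Last n days" is one reversed slice of one row, newest first. */
    List<Double> latest(String ticker, int n) {
        List<Double> prices = new ArrayList<>();
        NavigableMap<LocalDate, Double> row = rows.get(ticker);
        if (row == null) return prices;
        for (double p : row.descendingMap().values()) {
            if (prices.size() == n) break;
            prices.add(p);
        }
        return prices;
    }
}
```

Using the NOK values from the slide, recording 10/21, 10/25 and 10/24 in any order and asking for the latest two returns 6.71 then 6.76.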
Data Model – Prematerialized Query
Additional examples:
- Timeline of tweets by a user
- Timeline of tweets by all of the people a user is following
- List of comments sorted by score
- List of friends grouped by state

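The "tweets by all of the people a user is following" timeline is typically prematerialized by fanning out on write: each post is inserted into every follower's timeline row, so reading a timeline is one slice instead of a join. A minimal sketch with hypothetical names (`Timelines`, `follow`, `post`, `read`); real schemas usually name these columns with TimeUUIDs so two posts at the same instant do not overwrite each other, while a plain `long` time is used here for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Home timelines prematerialized at write time ("fan-out on write"). */
final class Timelines {
    private final Map<String, Set<String>> followers = new HashMap<>();
    // one timeline row per user: column name = post time, value = tweet
    private final Map<String, TreeMap<Long, String>> homeTimeline = new HashMap<>();

    void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new HashSet<>()).add(follower);
    }

    /** One post becomes one column insert per follower; nothing is read first. */
    void post(String author, long time, String text) {
        for (String f : followers.getOrDefault(author, Collections.emptySet())) {
            homeTimeline.computeIfAbsent(f, k -> new TreeMap<>()).put(time, author + ": " + text);
        }
    }

    /** Reading a timeline is a single newest-first slice of one row. */
    List<String> read(String user, int limit) {
        List<String> out = new ArrayList<>();
        TreeMap<Long, String> row = homeTimeline.getOrDefault(user, new TreeMap<>());
        for (String tweet : row.descendingMap().values()) {
            if (out.size() == limit) break;
            out.add(tweet);
        }
        return out;
    }
}
```

The trade is more writes (and storage) for cheap, predictable reads, the same bargain as the StockHist rows.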
API Operations
Five general categories:
- Retrieving
- Writing/Updating/Removing (all the same op!)
* Increment counters
- Meta Information
- Schema Manipulation
- CQL Execution

Big Data Fun and Hijinks
- Hadoop integration
- Pig integration
- Hive integration
* open source version coming soon
* available in DataStax Enterprise

Big Data: Map/Reduce Integration
Cassandra implementations of:
- InputFormat and OutputFormat
- RecordReader and RecordWriter
- InputSplit for Column Families
*** See the org.apache.cassandra.hadoop package and examples for more

Big Data: Pig Integration

    grunt> name_group = GROUP score_data BY name PARALLEL 3;
    grunt> name_total = FOREACH name_group GENERATE group, COUNT(score_data.name), LongSum(score_data.score) AS total_score;
    grunt> ordered_scores = ORDER name_total BY total_score DESC PARALLEL 3;
    grunt> DUMP ordered_scores;

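To make clear what the Pig script computes, here is the same GROUP / COUNT / SUM / ORDER BY DESC pipeline in plain Java on one machine. A minimal sketch, assuming `score_data` is a list of (name, score) tuples; how score_data is loaded from Cassandra is not shown on the slide and is not assumed here, and `ScoreTotals` is a hypothetical name. Pig spreads the same steps across a cluster (PARALLEL 3 asks for three reducers).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ScoreTotals {
    private ScoreTotals() {}

    /** One tuple of score_data: (name, score). */
    static final class Score {
        final String name;
        final long score;
        Score(String name, long score) { this.name = name; this.score = score; }
    }

    /** One tuple of name_total: (group, count, total_score), printed like Pig's DUMP. */
    static final class Total {
        final String name;
        final long count;
        final long totalScore;
        Total(String name, long count, long totalScore) {
            this.name = name; this.count = count; this.totalScore = totalScore;
        }
        @Override public String toString() { return "(" + name + "," + count + "," + totalScore + ")"; }
    }

    static List<Total> orderedScores(List<Score> scoreData) {
        // GROUP score_data BY name
        Map<String, List<Score>> nameGroup = scoreData.stream()
                .collect(Collectors.groupingBy(s -> s.name));
        // FOREACH name_group GENERATE group, COUNT(score_data.name), LongSum(score_data.score)
        List<Total> nameTotal = new ArrayList<>();
        for (Map.Entry<String, List<Score>> g : nameGroup.entrySet()) {
            long total = g.getValue().stream().mapToLong(s -> s.score).sum();
            nameTotal.add(new Total(g.getKey(), g.getValue().size(), total));
        }
        // ORDER name_total BY total_score DESC
        nameTotal.sort(Comparator.comparingLong((Total t) -> t.totalScore).reversed());
        return nameTotal;
    }
}
```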
Using a Client
Hector Client: http://hector-client.org
- Most popular Java client
- In use at very large installations
- A number of tools and utilities built on top
- Very active community
- MIT Licensed
*** Like any open source project fully dependent on another open source project, it has its warts

Sample Project for Experimenting
https://github.com/zznate/cassandra-tutorial
https://github.com/zznate/hector-examples
- Built using Hector
- Really basic: designed to be beginner level, with very few moving parts
- Modify/abuse/alter as needed
*** Descriptions of what is going on and how to run each example are in the Javadoc comments.

Hector: ColumnFamilyTemplate
Familiar, type-safe approach
- based on the template-method design pattern
- generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type)

    ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(keyspaceName,
        columnFamilyName, StringSerializer.get(), StringSerializer.get());

*** (no generics for clarity)

The two serializers passed to the constructor are:
- the Key Format (first serializer)
- the Column Name Format (second serializer): Cassandra calls this a "comparator"
- Remember: the comparator defines column order in the on-disk format

Reading a row (the <String, String> type parameters are the key format and column name format):

    ColumnFamilyResult<String, String> res = template.queryColumns("zznate");
    String value = res.getString("email");
    Date startDate = res.getDate("startDate");

Querying multiple rows and iterating over the results:

    ColumnFamilyResult wrapper = template.queryColumns("zznate", "patricioe", "thobbs");
    while (wrapper.hasNext()) {
        emails.put(wrapper.getKey(), wrapper.getString("email"));
        ...

Insert: creating an updater for a key, adding multiple rows, invoking batch execution:

    ColumnFamilyUpdater updater = template.createUpdater("zznate"); // updater for one key
    updater.setString("companyName", "DataStax");
    updater.addKey("sergek");                                       // add another row
    updater.setString("companyName", "PrestoSports");
    template.update(updater);                                       // execute the batch

Deleting data: single columns and whole rows:

    template.deleteColumn("zznate", "notNeededStuff");          // single column
    template.deleteColumn("zznate", "somethingElse");
    template.deleteColumn("patricioe", "aDifferentColumnName");
    ...
    template.deleteRow("someuser");                             // whole row
    template.executeBatch();

Deletion
Again: every mutation is an insert!
- Merge on read
- SSTables are immutable
- Highest timestamp wins

Deletion – As Seen by CLI

    [default@Tutorial] list Portfolio;
    Using default limit of 100
    -------------------
    RowKey: 12783
    => (column=GOOG, value=30, timestamp=1310340410528000)
    -------------------
    RowKey: 15736
    => (column=AAPL, value=20, timestamp=1310143852392000)
    => (column=NOK, value=90, timestamp=1310143852444000)
    => (column=IBM, value=50, timestamp=1310143852448000)
    => (column=GOOG, value=5, timestamp=1310143852453000)
    => (column=INTC, value=200, timestamp=1310143852457000)

After deleting column GOOG from row 12783, the row key is still listed, now with no columns:

    [default@Tutorial] list Portfolio;
    Using default limit of 100
    -------------------
    RowKey: 12783
    -------------------
    RowKey: 15736
    => (column=AAPL, value=20, timestamp=1310143852392000)
    => (column=NOK, value=90, timestamp=1310143852444000)
    => (column=IBM, value=50, timestamp=1310143852448000)
    => (column=GOOG, value=5, timestamp=1310143852453000)
    => (column=INTC, value=200, timestamp=1310143852457000)

Deletion – FYI
Sending a deletion for a non-existing row:

    mutator.addDeletion("14100", "INTC", 75, stringSerializer);

    [default@Tutorial] list Portfolio;
    Using default limit of 100
    . . .
    -------------------
    RowKey: 14100
    -------------------
    . . .

Does not exist? You just inserted a tombstone!

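Because there is no read before write, a delete is stored as a tombstone whether or not the data existed, and a range listing shows every stored row key even when all of its columns are deleted. A minimal sketch of that behaviour, assuming an in-memory column family; `TombstoneDemo` and its methods are hypothetical, and rows are listed in key order here for determinism (a real cluster lists them in token order).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A column family where deletes are writes: a deletion stores a tombstone, even for absent rows. */
final class TombstoneDemo {
    /** Column value, or null for a tombstone, with its write timestamp. */
    private static final class Cell {
        final String value;
        final long ts;
        Cell(String value, long ts) { this.value = value; this.ts = ts; }
    }

    private final Map<String, Map<String, Cell>> rows = new TreeMap<>();

    private void write(String row, String column, String value, long ts) {
        rows.computeIfAbsent(row, k -> new TreeMap<>())
            .merge(column, new Cell(value, ts), (old, neu) -> neu.ts > old.ts ? neu : old);
    }

    void insert(String row, String column, String value, long ts) { write(row, column, value, ts); }

    /** No existence check: the tombstone is written either way. */
    void delete(String row, String column, long ts) { write(row, column, null, ts); }

    /** Like "list Portfolio;": every stored row key, followed by its live columns only. */
    List<String> list() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Cell>> row : rows.entrySet()) {
            StringBuilder line = new StringBuilder("RowKey: " + row.getKey());
            for (Map.Entry<String, Cell> col : row.getValue().entrySet()) {
                if (col.getValue().value != null) {
                    line.append(' ').append(col.getKey()).append('=').append(col.getValue().value);
                }
            }
            out.add(line.toString());
        }
        return out;
    }
}
```

Deleting GOOG from row 12783 and deleting INTC from the never-written row 14100 both leave a bare "RowKey:" entry, matching the two CLI listings; these keys stay visible until compaction purges the tombstones.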
Integrating with existing patterns
"Yes."

    <bean id="cassandraHostConfigurator"
          class="me.prettyprint.cassandra.service.CassandraHostConfigurator">
        <constructor-arg value="localhost:9170"/>
    </bean>
    <bean id="cluster" class="me.prettyprint.cassandra.service.ThriftCluster">
        <constructor-arg value="TestCluster"/>
        <constructor-arg ref="cassandraHostConfigurator"/>
    </bean>
    <bean id="consistencyLevelPolicy"
          class="me.prettyprint.cassandra.model.ConfigurableConsistencyLevel">
        <property name="defaultReadConsistencyLevel" value="ONE"/>
    </bean>
    <bean id="keyspaceOperator" class="me.prettyprint.hector.api.factory.HFactory"
          factory-method="createKeyspace">
        <constructor-arg value="Keyspace1"/>
        <constructor-arg ref="cluster"/>
        <constructor-arg ref="consistencyLevelPolicy"/>
    </bean>
    <bean id="simpleCassandraDao" class="me.prettyprint.cassandra.dao.SimpleCassandraDao">
        <property name="keyspace" ref="keyspaceOperator"/>
        <property name="columnFamilyName" value="Standard1"/>
    </bean>

Hector Object Mapper (simple, JPA 1.0-style annotations):
https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
Hector JPA (experimental OpenJPA implementation):
https://github.com/riptano/hector-jpa

CQL through JDBC, using Spring's JdbcTemplate:

    private static final String STOCK_CQL =
        "select price FROM Stocks WHERE KEY = ?";

    jdbcTemplate.query(STOCK_CQL, new Object[] { stockTicker },
        new RowMapper<Stock>() {
            public Stock mapRow(ResultSet rs, int row) throws SQLException {
                CassandraResultSet crs = (CassandraResultSet) rs;
                Stock stock = new Stock();
                stock.setTicker(new String(crs.getKey()));
                stock.setPrice(crs.getDouble("price"));
                return stock;
            }
        });

    private static final String UPDATE_PORTFOLIO_CQL =
        "update Portfolios set ? = ? where KEY = ?";

    jdbcTemplate.update(UPDATE_PORTFOLIO_CQL,
        new Object[] { position.getTicker(),
                       position.getCount(),
                       portfolio.getName() });

    private static final String UPDATE_PORT_CQL =
        "update Portfolios set ? = ? where KEY = ?";

    jdbcTemplate.batchUpdate(UPDATE_PORT_CQL,
        new BatchPreparedStatementSetter() {
            public void setValues(PreparedStatement ps, int index) throws SQLException {
                Position pos = portfolio.getConstituents().get(index);
                ps.setString(1, pos.getTicker());
                ps.setLong(2, pos.getShares());
                ps.setString(3, portfolio.getName());
            }
            public int getBatchSize() {
                return portfolio.getConstituents().size();
            }
        });

Putting it Together

Take control of consistency
If you do need a high degree of consistency, use thresholds to trigger different behavior:
- Bank account:
  "on values over $10,000, wait to hear from all replicas"
- Distributed shopping cart:
  show a confirmation page to verify order resolution
*** What is your appetite for risk?

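The bank-account threshold can be expressed as a tiny policy that picks a consistency level per operation. A minimal sketch: the $10,000 cut-off and "all replicas" come from the slide, while falling back to QUORUM below the threshold is an assumption, and `ConsistencyPolicy`, `forWithdrawal` and the nested `Level` enum are hypothetical names rather than Hector types.

```java
import java.math.BigDecimal;

/** Hypothetical policy: pay for stronger consistency only when the stakes justify it. */
final class ConsistencyPolicy {
    /** Illustrative levels, not a client library type. */
    enum Level { ONE, QUORUM, ALL }

    private static final BigDecimal THRESHOLD = new BigDecimal("10000");

    private ConsistencyPolicy() {}

    /** Amounts over the threshold wait for every replica; others settle for a majority (assumed). */
    static Level forWithdrawal(BigDecimal amount) {
        return amount.compareTo(THRESHOLD) > 0 ? Level.ALL : Level.QUORUM;
    }
}
```

The chosen level would then be handed to whatever the client uses for per-operation consistency; in Hector that is a consistency level policy such as the ConfigurableConsistencyLevel bean wired in the Spring configuration. Note that ALL fails if any replica is down, which is exactly the availability cost the threshold is meant to ration.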
Uniquely identify operations in the application
- Facilitates idempotent behavior and out-of-order execution

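One way to get idempotence from unique operation ids is to store each operation as its own column, named by a client-generated UUID, and derive state from the columns. A minimal sketch, assuming amounts in cents and a hypothetical `Ledger` class: a retried operation rewrites the same column instead of applying twice, and the balance does not depend on arrival order. Contrast this with incrementing a counter, where a retried increment is counted twice.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/**
 * Account ledger where each operation is its own column, named by a client-generated UUID.
 * Re-sending an operation overwrites the same column, so retries cannot double-count,
 * and the balance is the same whatever order the operations arrive in.
 */
final class Ledger {
    private final Map<UUID, Long> entries = new TreeMap<>(); // column name -> amount in cents

    /** An insert, never a read-modify-write. */
    void apply(UUID operationId, long amountCents) {
        entries.put(operationId, amountCents);
    }

    /** Derived on read by summing the operation columns. */
    long balanceCents() {
        long sum = 0;
        for (long amount : entries.values()) sum += amount;
        return sum;
    }
}
```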
Denormalization
- The point of normalization is to avoid update anomalies
*** But in an append-only system, we don't do updates

Summary
- Take advantage of strengths
- Look for idempotence and asynchronicity in your business processes
- If it's not in the API, you are probably doing it wrong
- Seek death is still possible if you model incorrectly

Questions
Nate McCall [email_address] @zznate

Development Resources
- Hector documentation: http://hector-client.org
- Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/
- CCM (localhost Cassandra cluster): https://github.com/pcmanus/ccm
- OpsCenter: http://www.datastax.com/products/opscenter
- Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI

Additional Resources
- DataStax Documentation: http://www.datastax.com/docs/0.8/index
- Apache Cassandra project wiki: http://wiki.apache.org/cassandra/
- "The Dynamo Paper": http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
- P. Helland. "Building on Quicksand": http://arxiv.org/pdf/0909.1788
- P. Helland. "Life Beyond Distributed Transactions": http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
- S. Anand. "Netflix's Transition to High-Availability Storage Systems": http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf
- "The Megastore Paper": http://research.google.com/pubs/archive/36971.pdf