Software Development with Apache Cassandra

Start to finish overview of tools, tips and techniques for developing software for Apache Cassandra. Includes code and configuration examples, build systems and container support.

Published in: Technology

  1. 1. CASSANDRA DAY DALLAS 2015 SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH Nate McCall @zznate Co-Founder & Sr. Technical Consultant http://www.slideshare.net/zznate/soft-dev-withcassandraawalkthrough Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  2. 2. About The Last Pickle. Work with clients to deliver and improve Apache Cassandra based solutions. Based in New Zealand, Australia & USA.
  3. 3. OVERVIEW DATA MODELING WRITING CODE TESTING REVIEWING MANAGING ENVIRONMENTS
  4. 4. Overview: What makes a software development project successful?
  5. 5. Overview: Successful Software Development - it ships - maintainable - good test coverage - check out and build
  6. 6. Overview: Impedance mismatch: distributed systems development on a laptop.
  7. 7. OVERVIEW DATA MODELING WRITING CODE TESTING REVIEWING MANAGING ENVIRONMENTS
  8. 8. Data Modeling: … a topic unto itself. But quickly:
  9. 9. Data Modeling - Quickly • It’s Hard • Do research • #1 performance problem • Don’t “port” your schema!
  10. 10. Data Modeling - Using CQL: • tools support • easy tracing (and trace discovery) • documentation* *Maintained in-tree: https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile
  11. 11. Data Modeling - DevCenter: Tools: DataStax DevCenter http://www.datastax.com/what-we-offer/products-services/devcenter
  12. 12. OVERVIEW DATA MODELING WRITING CODE TESTING REVIEWING MANAGING ENVIRONMENTS
  13. 13. Writing Code: use CQL
  14. 14. Writing Code - Java Driver: Use the Java Driver • Reference implementation • Well written, extensive coverage • Open source • Dedicated development resources https://github.com/datastax/java-driver/
  15. 15. Writing Code - Java Driver: Existing Spring Users: Spring Data Integration http://projects.spring.io/spring-data-cassandra/
  16. 16. Writing Code - Java Driver: Four rules for Writing Code • one Cluster per physical cluster • one Session per app per keyspace • use PreparedStatements • use Batches to reduce network IO
  17. 17. Writing Code - Java Driver: Configuration is Similar to Other DB Drivers (with caveats**) http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html
  18. 18. Writing Code - Java Driver - Configuration: Major Difference: it’s a Cluster!
  19. 19. Writing Code - Java Driver - Configuration: Two groups of configurations • policies • connections
  20. 20. Writing Code - Java Driver - Configuration: Three Policy Types: • load balancing • reconnection • retry
  21. 21. Writing Code - Java Driver - Configuration: Connection Options: • protocol* • pooling** • socket *https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec **https://github.com/datastax/java-driver/tree/2.1/features/pooling
  22. 22. Writing Code - Java Driver - Configuration: Code sample for building a Cluster
  23. 23. https://github.com/datastax/java-driver/tree/2.1/features/compression
  24. 24. https://github.com/datastax/java-driver/tree/2.1/features/logging
  25. 25. Writing Code - Java Driver - Pagination: Simple result iteration CREATE TABLE IF NOT EXISTS transit.vehicle_data ( vehicle_id text, speed double, time timeuuid, PRIMARY KEY ((vehicle_id), time) );
  26. 26. Writing Code - Java Driver - Pagination: Simple result iteration: Java 8 style
  27. 27. Writing Code - Java Driver - Async Async! (not so) Simple result iteration
  28. 28. Writing Code - Java Driver - Pagination: Not much to it: PreparedStatement prepStmt = session.prepare(CQL_STRING); BoundStatement boundStmt = new BoundStatement(prepStmt); boundStmt.setFetchSize(100); https://github.com/datastax/java-driver/tree/2.1/features/paging
  29. 29. Writing Code - Java Driver - Inserts and Updates: About Inserts (and updates)
  30. 30. Writing Code - Java Driver - Inserts and Updates: Batches: three types - logged - unlogged - counter
  31. 31. Writing Code - Java Driver - Inserts and Updates: unlogged batch
  32. 32. Writing Code - Java Driver - Inserts and Updates: LWT: INSERT INTO vehicle (vehicle_id, make, model, vin) VALUES ('VHE-101', 'Toyota','Tercel','1234f') IF NOT EXISTS;
  33. 33. Writing Code - Java Driver - Inserts and Updates: LWT: UPDATE vehicle SET vin = '123fa' WHERE vehicle_id = 'VHE-101' IF vin = '1234f';
  34. 34. Writing Code: ORM? Great for basic CRUD operations http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html
  35. 35. https://github.com/datastax/java-driver/blob/2.1/driver-mapping/src/test/java/com/datastax/driver/mapping/MapperTest.java
  39. 39. Writing Code - Java Driver: A note about User Defined Types (UDTs)
  40. 40. Writing Code - Java Driver - Using UDTs: Wait. - serialized as blobs !!?! - new version already being discussed* - will be a painful migration path * https://issues.apache.org/jira/browse/CASSANDRA-7423
  41. 41. OVERVIEW DATA MODELING WRITING CODE TESTING REVIEWING MANAGING ENVIRONMENTS
  42. 42. Testing: Use a Naming Scheme • *UnitTest.java: no external resources • *ITest.java: uses external resources • *PITest.java: safely parallel “ITest”
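A suffix-based naming scheme like the one above is easy to check mechanically. The following is a minimal sketch, assuming the three suffixes from the slide; the class name `TestNameClassifier`, the `Kind` enum, and the sample file names are hypothetical and not taken from the talk.

```java
import java.util.Arrays;
import java.util.List;

public class TestNameClassifier {

    /** Categories implied by the naming scheme on the slide (enum names are illustrative). */
    public enum Kind { UNIT, INTEGRATION, PARALLEL_INTEGRATION, UNKNOWN }

    /** Classifies a test source file by its suffix. */
    public static Kind classify(String fileName) {
        // Check "PITest" before "ITest": every "*PITest.java" also ends in "ITest.java".
        if (fileName.endsWith("PITest.java")) {
            return Kind.PARALLEL_INTEGRATION;
        }
        if (fileName.endsWith("ITest.java")) {
            return Kind.INTEGRATION;
        }
        if (fileName.endsWith("UnitTest.java")) {
            return Kind.UNIT;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only to exercise the classifier.
        List<String> names = Arrays.asList(
                "VehicleDaoUnitTest.java",
                "VehicleDaoITest.java",
                "VehicleDaoPITest.java",
                "VehicleDaoTests.java");
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

A check like this could run early in a build to flag test files that fall outside the scheme (the `UNKNOWN` case) before they are silently skipped by a profile.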
  43. 43. Testing: Tip: wildcards on the CLI are not a naming scheme.
  44. 44. Testing: Group tests into logical units (“suites”)
  45. 45. Testing - Suites: Benefits of Suites: • share test data • share Cassandra instance(s) • build profiles
  46. 46. <profile> <id>short</id> <properties> <env>default</env> </properties> <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>2.16</version> <configuration> <groups>unit,short</groups> <useFile>false</useFile> <systemPropertyVariables> <cassandra.version>${cassandra.version}</cassandra.version> <ipprefix>${ipprefix}</ipprefix> </systemPropertyVariables> </configuration> </plugin> </plugins> </build> </profile>
  48. 48. Testing - Suites: Using annotations for suites in code
  49. 49. Testing - Suites: Interesting test plumbing • [Before|After]Suite • [Before|After]Group • Listeners
  50. 50. Testing: Use Mocks where possible
  51. 51. Testing: scassandra: not quite integration http://www.scassandra.org/
  52. 52. Testing: Unit Integration Testing
  53. 53. Testing: Verify Assumptions: test failure scenarios explicitly
  54. 54. Testing - Integration: Runtime Integrations: • local • in-process • forked-process
  55. 55. Testing - Integration - Runtime: EmbeddedCassandra https://github.com/jsevellec/cassandra-unit/
  56. 56. Testing - Integration - Runtime: ProcessBuilder to fork Cassandra(s)
  57. 57. Testing - Integration - Runtime: CCMBridge: delegate to CCM https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java
  58. 58. Testing - Integration: Best Practice: Jenkins should be able to manage your cluster
  59. 59. Testing: Load Testing Goals • reproducible metrics • catch regressions • test to breakage point
  60. 60. Testing - LoadTesting: Stress.java (lots of changes recently) https://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
  61. 61. Testing - LoadTesting: Workload recording and playback coming soon one day https://issues.apache.org/jira/browse/CASSANDRA-8929
  62. 62. Testing: Primary testing goal: Don’t let cluster behavior surprise you.
  63. 63. OVERVIEW DATA MODELING WRITING CODE TESTING REVIEWING MANAGING ENVIRONMENTS
  64. 64. Writing Code: Metrics API for your own code https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java https://dropwizard.github.io/metrics/3.1.0/
  65. 65. Writing Code - Instrumentation via Metrics API: Run Riemann locally http://riemann.io/
  66. 66. Reviewing Said Code: Using Trace (and doing so frequently)
  67. 67. Writing Code - Tracing: Trace per query via DevCenter http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
  68. 68. Writing Code - Tracing: Trace per query via cqlsh http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
  69. 69. Writing Code - Tracing: Trace per query via Java Driver http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Statement.html#enableTracing()
  70. 70. cqlsh> tracing on; Now tracing requests. cqlsh> SELECT doc_version FROM data.documents_by_version ... WHERE application_id = myapp ... AND document_id = foo ... AND chunk_index = 0 ... ORDER BY doc_version ASC ... LIMIT 1; doc_version ------------- 65856 Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b
  71. 71. Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337 Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504 Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498 Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872 Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183 Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459 Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675 Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792 Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817 …
  74. 74. … Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605 Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428 Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423 Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753 Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505 Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372 Request complete | 18:05:54,158 | 192.168.1.197 | 9335592 !!?!
  75. 75. … Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605 Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428 Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423 Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753 Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505 Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372 Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
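The step flagged with "!!?!" in the trace above stands out because the elapsed-microseconds column jumps by roughly nine seconds between two consecutive events on the same replica. A minimal sketch of finding that jump automatically follows. The class `TraceGapFinder` and its method are hypothetical helpers, not part of cqlsh or the Java driver; the sample values are copied from the 192.168.1.204 rows of the trace. Elapsed times are only compared within one node, since each node reports its own elapsed counter.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class TraceGapFinder {

    /**
     * Given trace activities in order with their elapsed microseconds (all from one node),
     * returns the activity that followed the largest jump in elapsed time, paired with
     * the size of that jump. Returns null if fewer than two events are supplied.
     */
    public static Map.Entry<String, Long> largestGap(Map<String, Long> elapsedMicrosByActivity) {
        String slowest = null;
        long largest = -1;
        Long previous = null;
        for (Map.Entry<String, Long> event : elapsedMicrosByActivity.entrySet()) {
            if (previous != null) {
                long gap = event.getValue() - previous;
                if (gap > largest) {
                    largest = gap;
                    slowest = event.getKey();
                }
            }
            previous = event.getValue();
        }
        return slowest == null ? null : new AbstractMap.SimpleEntry<>(slowest, largest);
    }

    public static void main(String[] args) {
        // Rows reported by replica 192.168.1.204 in the trace above, in order.
        Map<String, Long> replicaEvents = new LinkedHashMap<>();
        replicaEvents.put("Seeking to partition beginning in data file", 35817L);
        replicaEvents.put("Merging data from memtables and 8 sstables", 38605L);
        replicaEvents.put("Read 1 live and 2667 tombstoned cells", 9282428L);
        replicaEvents.put("Enqueuing response to /192.168.1.197", 9283423L);
        replicaEvents.put("Sending message to /192.168.1.197", 9284753L);

        Map.Entry<String, Long> worst = largestGap(replicaEvents);
        // Prints: Read 1 live and 2667 tombstoned cells: 9243823 us
        System.out.println(worst.getKey() + ": " + worst.getValue() + " us");
    }
}
```

Here the largest gap lands on the tombstone read, which is the line the slide calls out.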
  76. 76. Writing Code - Tracing: Enable traces in the driver http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html
  77. 77. Writing Code - Tracing: `nodetool settraceprobability`
  78. 78. Writing Code - Tracing: …then make sure you try it again with a node down!
  79. 79. Writing Code - Tracing: Final note on tracing: do it sparingly
  80. 80. Writing Code - Tracing: Enable query latency logging https://github.com/datastax/java-driver/tree/2.1/features/logging
  81. 81. Writing Code: Logging verbosity can be changed dynamically** ** since 0.4rc1 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html
  82. 82. Writing Code: nodetool for developers • cfstats • cfhistograms • proxyhistograms
  83. 83. Writing Code - nodetool - cfstats: cfstats: per-table statistics about size and performance (single most useful command)
  84. 84. Writing Code - nodetool - cfhistograms: cfhistograms: column count and partition size vs. latency distribution
  85. 85. Writing Code - nodetool - proxyhistograms: proxyhistograms: performance of inter-node requests
  86. 86. OVERVIEW DATA MODELING WRITING CODE TESTING REVIEWING MANAGING ENVIRONMENTS
  87. 87. Managing Environments: Configuration Management is Essential
  88. 88. Managing Environments: Laptop to Production with NO Manual Modifications!
  89. 89. Managing Environments: Running Cassandra during development
  90. 90. Managing Environments - Running Cassandra: Local Cassandra • easy to set up • you control it • but then you control it!
  91. 91. Managing Environments - Running Cassandra: CCM • supports multiple versions • clusters and datacenters • up/down individual nodes https://github.com/pcmanus/ccm
  92. 92. Managing Environments - Running Cassandra: Docker: • Official image available with excellent docs* • Docker Compose for more granular control** *https://hub.docker.com/_/cassandra/ **https://docs.docker.com/compose/
  93. 93. Managing Environments - Running Cassandra: Vagrant • isolated, controlled environment • configuration mgmt integration • same CM for production! http://www.vagrantup.com/
  94. 94. server_count = 3 network = '192.168.2.' first_ip = 10 servers = [] seeds = [] cassandra_tokens = [] (0..server_count-1).each do |i| name = 'node' + (i + 1).to_s ip = network + (first_ip + i).to_s seeds << ip servers << {'name' => name, 'ip' => ip, 'initial_token' => (2**64 / server_count * i) - 2**63} end
  97. 97. chef.json = { :cassandra => {'cluster_name' => 'VerifyCluster', 'version' => '2.0.8', 'setup_jna' => false, 'max_heap_size' => '512M', 'heap_new_size' => '100M', 'initial_token' => server['initial_token'], 'seeds' => "192.168.2.10", 'listen_address' => server['ip'], 'broadcast_address' => server['ip'], 'rpc_address' => server['ip'], 'concurrent_reads' => "2", 'concurrent_writes' => "2", 'memtable_flush_queue_size' => "2", 'compaction_throughput_mb_per_sec' => "8", 'key_cache_size_in_mb' => "4", 'key_cache_save_period' => "0", 'native_transport_min_threads' => "2", 'native_transport_max_threads' => "4" }, }
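The `initial_token` expression in the Vagrantfile above, `(2**64 / server_count * i) - 2**63`, spaces nodes evenly across a signed 64-bit token range, which matches the range used by Cassandra's Murmur3Partitioner. As a rough sketch of the same arithmetic, the snippet below redoes it with `BigInteger`, since `2**64` overflows a Java `long`. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class InitialTokens {

    /** Mirrors the Ruby expression (2**64 / server_count * i) - 2**63 for i in 0..server_count-1. */
    public static List<BigInteger> tokensFor(int serverCount) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(64); // 2**64
        BigInteger offset = BigInteger.ONE.shiftLeft(63);   // 2**63
        // Integer division first, then multiply, matching Ruby's left-to-right evaluation.
        BigInteger step = ringSize.divide(BigInteger.valueOf(serverCount));
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)).subtract(offset));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // For the three-node layout in the Vagrantfile this prints:
        // -9223372036854775808
        // -3074457345618258603
        // 3074457345618258602
        for (BigInteger token : tokensFor(3)) {
            System.out.println(token);
        }
    }
}
```

The first node always gets the minimum token, and each later node is offset by one even slice of the ring.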
  98. 98. Managing Environments - Running Cassandra: Mesos? Compelling features, but not quite there (though it won't be long) http://mesosphere.github.io/cassandra-mesos/docs/ http://www.datastax.com/2015/08/a-match-made-in-heaven-cassandra-and-mesos
  99. 99. Summary: • Cluster-level defaults, override in queries • Follow existing patterns (it's not that different) • Segment your tests and use build profiles • Monitor and Instrument • Use reference implementation drivers • Control your environments • Verify any assumptions about failures
  100. 100. Thanks.
  101. 101. Nate McCall @zznate Co-Founder & Sr. Technical Consultant www.thelastpickle.com #CassandraDays
