
Apache Cassandra v4.0


Cassandra Meetup in Tokyo February 2019
New features in Apache Cassandra v4.0

Published in: Technology

  1. Apache Cassandra v4.0
       Yuki Morishita, DataStax Japan
       Cassandra Meetup in Tokyo, February 2019 (2019/2/21)
       © DataStax, All Rights Reserved. Confidential
  2. Who am I?
       Yuki Morishita (yuki@datastax.com)
           – Principal Architect at DataStax Japan
           – Apache Cassandra committer
  3. Apache Cassandra v4.0!!!
  4. Apache Cassandra v4.0
       • In development for 1.5 years since the 3.11.0 release
       • Feature freeze on September 1, 2018
       • No fixed release date
       • See NEWS.TXT for upgrade instructions
           – No direct upgrade from pre-3.0 versions
  5. CHANGES.TXT
       • More than 340 changes (new features / bug fixes)
       4.0
        * Fix SimpleStrategy option validation (CASSANDRA-15007)
        * Don't try to cancel 2i compactions when starting anticompaction (CASSANDRA-15024)
        * Avoid NPE in RepairRunnable.recordFailure (CASSANDRA-15025)
        * SSL Cert Hot Reloading should check for sanity of the new keystore/truststore before loading it (CASSANDRA-14991)
        * Avoid leaking threads when failing anticompactions and rate limit anticompactions (CASSANDRA-15002)
        * Validate token() arguments early instead of throwing NPE at execution (CASSANDRA-14989)
        * Add a new tool to dump audit logs (CASSANDRA-14885)
        * Fix generating javadoc with Java11 (CASSANDRA-14988)
        * Only cancel conflicting compactions when starting anticompactions and sub range compactions (CASSANDRA-14935)
        * Use a stub IndexRegistry for non-daemon use cases (CASSANDRA-14938)
        * Don't enable client transports when bootstrap is pending (CASSANDRA-14525)
        * Make antiCompactGroup throw exception on error and anticompaction non cancellable again (CASSANDRA-14936)
        * Catch empty/invalid bounds in SelectStatement (CASSANDRA-14849)
        * Auto-expand replication_factor for NetworkTopologyStrategy (CASSANDRA-14303)
        * Transient Replication: support EACH_QUORUM (CASSANDRA-14727)
        * BufferPool: allocating thread for new chunks should acquire directly (CASSANDRA-14832)
        * Send correct messaging version in internode messaging handshake's third message (CASSANDRA-14896)
        * Make Read and Write Latency columns consistent for proxyhistograms and tablehistograms (CASSANDRA-11939)
        * Make protocol checksum type option case insensitive (CASSANDRA-14716)
        * Forbid re-adding static columns as regular and vice versa (CASSANDRA-14913)
  6. New Features
       • Virtual Tables
       • Transient Replication
       • Audit logging
       • Full query logging
       • Zero-copy SSTable streaming
  7. Virtual Tables
       • Before Virtual Tables…
           – JMX or nodetool
               • Cache information
                   – nodetool info
               • Thread pools
                   – nodetool tpstats
               • Connected clients
                   – nodetool clientstats --all (new in v4.0!)
               • SSTable activities
                   – nodetool compactionstats
           – Config
               • vim cassandra.yaml
  8. Virtual Tables
       • After Virtual Tables… (in addition to the methods on the previous slide)
           – Cache information
               • SELECT * FROM system_views.caches;
               • SELECT * FROM system_views.caches WHERE name = 'chunks';
           – Thread pools
               • SELECT * FROM system_views.thread_pools;
           – Connected clients
               • SELECT * FROM system_views.clients;
           – SSTable activities
               • SELECT * FROM system_views.sstable_tasks;
           – Config
               • SELECT * FROM system_views.settings;
  9. Virtual Tables
       • How about updating config?
           – Not yet supported
       • Keep track of issues
           – https://issues.apache.org/jira/browse/CASSANDRA-14670?jql=labels%20%3D%20virtual-tables
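As a quick cross-reference between slides 7 and 8, the lookup below pairs each nodetool command with the virtual table that the slides present as its counterpart. This is only a sketch: the helper name `virtual_table_query` is hypothetical, and the pairing follows the slides rather than any official mapping.

```python
# Pairs taken from slides 7 and 8: nodetool command -> system_views table.
NODETOOL_TO_VIRTUAL_TABLE = {
    "info": "caches",
    "tpstats": "thread_pools",
    "clientstats": "clients",
    "compactionstats": "sstable_tasks",
}


def virtual_table_query(nodetool_command: str) -> str:
    """Return the CQL SELECT that reads the matching virtual table.

    Hypothetical helper for illustration; raises KeyError for commands
    that the slides do not map to a virtual table.
    """
    table = NODETOOL_TO_VIRTUAL_TABLE[nodetool_command]
    return f"SELECT * FROM system_views.{table};"


if __name__ == "__main__":
    for command in NODETOOL_TO_VIRTUAL_TABLE:
        print(f"nodetool {command:<16} -> {virtual_table_query(command)}")
```

The generated statements can be pasted into cqlsh on a v4.0 node; they only read data, which matches the slide's note that updating config through virtual tables is not yet supported.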
  10. Transient Replication
       • Saves disk space, CPU cycles, and I/O by keeping a temporary copy (transient replica)
       • The temporary copy is replicated to the full replicas by incremental repair and then removed
       • Experimental feature in v4.0
       • Unsupported features
           – Monotonic read (read with blocking read repair)
           – Logged batch
           – LWT
           – Counters
           – Secondary index / materialized view (will never be supported)
  11. Transient Replication
       • Read / Write with RF=3, CL=QUORUM
       [Diagram labels: Write, Read]
  12. Transient Replication
       • Read / Write with RF=3 (with 1 transient replica), CL=QUORUM
       [Diagram labels: Write, Read, Transient Replica]
  13. Transient Replication
       • Read / Write with RF=3 (with 1 transient replica), CL=QUORUM
       • When one node is down
       [Diagram labels: Write, Read, Transient Replica]
  14. Transient Replication
       • Temporary copy is replicated and removed after incremental repair
       [Diagram labels: Transient Replica, Incremental repair]
  15. Transient Replication
       • Temporary copy is replicated and removed after incremental repair
       [Diagram labels: Transient Replica]
  16. Transient Replication
       CREATE KEYSPACE ks WITH replication = {
           'class': 'NetworkTopologyStrategy',
           'replication_factor': '3/1'
       };
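To make the '3/1' notation and the QUORUM diagrams concrete, here is a minimal Python sketch. It assumes '3/1' means three replicas in total, one of which is transient, and that QUORUM is computed over the total. The `write_targets` function is a hypothetical simplification of slides 12 and 13 (full replicas first, a transient replica only to fill a gap), not Cassandra's actual replica selection code.

```python
from typing import List, NamedTuple


class ReplicationFactor(NamedTuple):
    total: int      # all replicas, full and transient
    transient: int  # replicas that keep data only until it is repaired

    @property
    def full(self) -> int:
        return self.total - self.transient


def parse_replication_factor(value: str) -> ReplicationFactor:
    """Parse '3' or '3/1' (total replicas / transient replicas)."""
    total, _, transient = value.partition("/")
    rf = ReplicationFactor(int(total), int(transient) if transient else 0)
    if not 0 <= rf.transient < rf.total:
        raise ValueError(f"invalid replication factor: {value!r}")
    return rf


def quorum(rf: ReplicationFactor) -> int:
    """QUORUM is a majority of all replicas: floor(total / 2) + 1."""
    return rf.total // 2 + 1


def write_targets(full_up: List[str], transient_up: List[str], needed: int) -> List[str]:
    """Hypothetical selection: live full replicas first, then transient
    replicas only if the full ones cannot satisfy the consistency level."""
    targets = list(full_up)
    missing = max(0, needed - len(targets))
    targets += transient_up[:missing]
    if len(targets) < needed:
        raise RuntimeError("not enough live replicas for the consistency level")
    return targets


if __name__ == "__main__":
    rf = parse_replication_factor("3/1")
    print(rf.full, "full replicas,", rf.transient, "transient, quorum =", quorum(rf))
    print(write_targets(["n1", "n2"], ["n3"], quorum(rf)))  # all nodes up
    print(write_targets(["n1"], ["n3"], quorum(rf)))        # full replica n2 down
```

With every node up, the transient replica receives nothing; it is written to only when a full replica is unavailable, which is where the disk and I/O savings come from.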
  17. Audit Logging
       • Logs queries / authentication attempts on a node for auditing
       • When are audit events logged?
           – After executing CQL / authentication, before sending the response
       • Pluggable logger (IAuditLogger interface) to publish audit events anywhere
           – BinAuditLogger (default) – binary log
           – FileAuditLogger – logback
       • Audit log includes:
           – User, client IP, timestamp, category, operation, …
       • See https://cassandra.apache.org/doc/latest/operating/audit_logging.html
  18. Audit Logging
       • Enabling audit logging
           – cassandra.yaml

       # Audit logging - Logs every incoming CQL command request, authentication to a node. See the docs
       # on audit_logging for full details about the various configuration options.
       audit_logging_options:
           enabled: false
           logger: BinAuditLogger
           # audit_logs_dir:
           # included_keyspaces:
           # excluded_keyspaces: system, system_schema, system_virtual_schema
           # included_categories:
           # excluded_categories:
           # included_users:
           # excluded_users:
           # roll_cycle: HOURLY
           # block: true
           # max_queue_weight: 268435456 # 256 MiB
           # max_log_size: 17179869184 # 16 GiB
           ## archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
           # archive_command:
           # max_archive_retries: 10
  19. Audit Logging
       • Enabling audit logging
           – nodetool enableauditlog
               • Can change the same settings using command line options
                   – --included-keyspaces, --logger, etc.
           – nodetool disableauditlog
  20. Audit Logging
       • New tool: auditlogviewer
       • auditlogviewer -f /var/log/cassandra/audit/

       Type: AuditLog
       LogMessage: user:anonymous|host:172.17.0.2:7000|source:/127.0.0.1|port:49644|timestamp:1550558120912|type:USE_KEYSPACE|category:OTHER|ks:tesks|operation:use tesks ;
       Type: AuditLog
       LogMessage: user:anonymous|host:172.17.0.2:7000|source:/127.0.0.1|port:49644|timestamp:1550558120913|type:USE_KEYSPACE|category:OTHER|ks:tesks|operation:USE "tesks"
       Type: AuditLog
       LogMessage: user:anonymous|host:172.17.0.2:7000|source:/127.0.0.1|port:49644|timestamp:1550558139178|type:UPDATE|category:DML|ks:tesks|scope:test|operation:INSERT INTO tesks.test (key, val) VALUES ( 'a', currentTimestamp());
       Type: AuditLog
       LogMessage: user:anonymous|host:172.17.0.2:7000|source:/127.0.0.1|port:49644|timestamp:1550558147633|type:SELECT|category:QUERY|ks:tesks|scope:test|operation:SELECT * FROM tesks.test
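The LogMessage lines printed by auditlogviewer are pipe-delimited key:value pairs, so they are easy to post-process. The sketch below is one possible parser, assuming (from the sample output only) that `operation` is always the last field and that it holds raw CQL. The function name `parse_audit_message` is hypothetical and not part of Cassandra.

```python
from typing import Dict


def parse_audit_message(message: str) -> Dict[str, str]:
    """Split 'key:value|key:value|...|operation:<CQL>' into a dict."""
    # The operation holds raw CQL, which may itself contain '|' or ':',
    # so cut it off before splitting the remaining fields.
    head, sep, operation = message.partition("|operation:")
    fields: Dict[str, str] = {}
    for part in head.split("|"):
        key, _, value = part.partition(":")  # split on the first ':' only
        fields[key] = value
    if sep:
        fields["operation"] = operation.strip()
    return fields


if __name__ == "__main__":
    sample = (
        "user:anonymous|host:172.17.0.2:7000|source:/127.0.0.1|port:49644|"
        "timestamp:1550558147633|type:SELECT|category:QUERY|ks:tesks|"
        "scope:test|operation:SELECT * FROM tesks.test"
    )
    event = parse_audit_message(sample)
    print(event["type"], event["ks"], event["operation"])
```

Splitting each field on the first colon only keeps values such as `172.17.0.2:7000` intact.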
  21. Full Query Logging
       • Enables recording a workload, replaying it, and comparing the results
       • Use case
           – Capture a workload in production. Replay it in a dev environment to test against a different Cassandra version. Compare the results for correctness.
       [Diagram labels: Prod, Capture, Full Query Log, Replay, Dev]
  22. Full Query Logging
       • Use nodetool to start / stop capturing
           – nodetool enablefullquerylog
               • Enable full query logging, defaults for the options are configured in cassandra.yaml
           – nodetool disablefullquerylog
               • Disable the full query log
           – nodetool resetfullquerylog
               • Stop the full query log and clean files in the configured full query log directory from cassandra.yaml as well as JMX
  23. Full Query Logging
       • Use fqltool to replay and compare results
           – fqltool replay
               • Replay full query logs
           – fqltool compare
               • Compare result files generated by fqltool replay
           – fqltool dump
               • Dump the contents of a full query log
  24. Zero-copy SSTable streaming
       • SSTable streaming
       [Diagram: sender node and receiver node; SSTable components Data.db, Index.db, Filter.db, CompressionInfo.db. Message: "Give me partitions between token 0 and 100"]
  25. Zero-copy SSTable streaming
       • SSTable streaming
       [Diagram: same nodes and components. Message: "Here are partitions between token 0 and 100 in this SSTable"]
  26. Zero-copy SSTable streaming
       • SSTable streaming
       [Diagram: same nodes and components. Message: "Here are partitions between token 0 and 100 in this SSTable"]
  27. Zero-copy SSTable streaming
       • SSTable streaming
       [Diagram: sender node and receiver node, each with Data.db, Index.db, Filter.db, CompressionInfo.db. Processing steps shown:]
           • Decompress
           • Deserialize
           • Update Stats / BF / Index
           • Serialize
           • Compress
  28. Zero-copy SSTable streaming
       • Zero-copy SSTable streaming
       [Diagram: sender node and receiver node; SSTable components Data.db, Index.db, Filter.db, CompressionInfo.db. Message: "Give me partitions between token 0 and 100"]
  29. Zero-copy SSTable streaming
       • Zero-copy SSTable streaming
       [Diagram: same nodes and components. Message: "This SSTable contains partitions in token 0 and 100!"]
  30. Zero-copy SSTable streaming
       • Zero-copy SSTable streaming
       [Diagram: Data.db, Index.db, Filter.db, CompressionInfo.db move from sender node to receiver node by "Zero-copy Transfer"]
  31. Zero-copy SSTable streaming
       • Enabled by default
           – Can be turned off in cassandra.yaml (stream_entire_sstables)
       • Only works for SSTables in tables using LeveledCompactionStrategy
           – For now…
       • https://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html
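The conditions on slides 28 to 31 can be summarized as a simple eligibility check: the whole file can be sent as-is only when the feature is enabled, the table uses LeveledCompactionStrategy, and every partition in the SSTable falls inside the requested token range. The Python sketch below illustrates that reading of the slides; `SSTableInfo` and `can_stream_entire_sstable` are hypothetical names, the range is treated as start-exclusive and end-inclusive, and wrap-around ranges are ignored. It is not Cassandra's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableInfo:
    first_token: int          # smallest token stored in the SSTable
    last_token: int           # largest token stored in the SSTable
    compaction_strategy: str  # compaction strategy of the owning table


def can_stream_entire_sstable(
    sstable: SSTableInfo,
    range_start: int,
    range_end: int,
    stream_entire_sstables: bool = True,  # mirrors the cassandra.yaml switch
) -> bool:
    """Return True if the SSTable could be transferred whole (zero-copy)."""
    if not stream_entire_sstables:
        return False
    if sstable.compaction_strategy != "LeveledCompactionStrategy":
        return False
    # Requested range is (range_start, range_end]; the file qualifies only
    # if all of its partitions lie inside that range.
    return range_start < sstable.first_token and sstable.last_token <= range_end


if __name__ == "__main__":
    lcs = "LeveledCompactionStrategy"
    print(can_stream_entire_sstable(SSTableInfo(10, 90, lcs), 0, 100))   # fully contained
    print(can_stream_entire_sstable(SSTableInfo(10, 150, lcs), 0, 100))  # overlaps only
```

When the check fails, the sender falls back to the regular path from slide 27, where the receiver has to decompress, deserialize, and rebuild the SSTable components.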
  32. Other Notable Changes
       • Experimental Java 11 support
           – Can use ZGC
       • Asynchronous internode messaging / streaming
           – More efficient if you have a large cluster
       • Thrift / COMPACT STORAGE removed (finally)
       • Network authorization
           – Can create a role that only has access to certain datacenters
       • CDC improvement
           – Change data is available sooner than in the previous implementation
  33. Other Notable Changes
       • Read repair change
           – dc_read_repair_chance / read_repair_chance are gone
               • No more async read repair
           – Blocking read repair can be turned off
               • ALTER TABLE ... WITH read_repair = NONE;
  34. Apache Cassandra Side-car project
  35. Cassandra side-car
       • Cassandra management process that runs separately from the Cassandra daemon
       • Provides an HTTP API to monitor and manage a Cassandra node
       • Proposal
           – https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224
       • Repository:
           – https://github.com/apache/cassandra-sidecar
           – Currently, the health check API is implemented
  36. Thank you. Questions?
