Published on

May 2013: Cassandra presentation for the HadoopNJ user group by Edward Capriolo

Published in: Technology


  1. Cassandra
  2. FUNdamentals Overview
  3. Main points
       • Structured log storage
       • Columns ordered by name inside key
       • Rows ordered by hash of row key *
       • Column family storage
       • Fully distributed peer-to-peer
       • Partitioned by row key
       • Dynamo consistency
  4. Structured log storage
       • No writes in place for you
       • JVM heap is reserved for memtables
       • Memtables are sorted
       • When memtables reach a specific size they are flushed to disk
           − Creates sstable file
           − Bloom filter file
           − Index file
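The flush cycle on this slide can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Memtable` class, the `write` helper, and the entry-count `flush_threshold` are all hypothetical, and the bloom filter and index files are left out.

```python
import bisect


class Memtable:
    """Toy in-memory write buffer that keeps its entries sorted."""

    def __init__(self, flush_threshold=4):
        # Hypothetical size limit, counted in entries rather than bytes.
        self.flush_threshold = flush_threshold
        self.keys = []    # sorted list of (row_key, column_name)
        self.values = {}  # (row_key, column_name) -> value

    def put(self, row_key, column, value):
        key = (row_key, column)
        if key not in self.values:
            bisect.insort(self.keys, key)  # keep keys sorted on insert
        self.values[key] = value

    def is_full(self):
        return len(self.keys) >= self.flush_threshold

    def flush(self):
        """Return an immutable, sorted 'sstable' and reset the memtable."""
        sstable = tuple((k, self.values[k]) for k in self.keys)
        self.keys, self.values = [], {}
        return sstable


def write(memtable, sstables, row_key, column, value):
    """Writes only touch the memtable; a full memtable is flushed as a new file."""
    memtable.put(row_key, column, value)
    if memtable.is_full():
        sstables.append(memtable.flush())
```

Nothing is ever modified in place: each flush produces a new sorted, read-only table, which is why a later merge step is needed.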
  5. Compaction
       • SSTables merged
       • Deleted columns physically removed
       • Two compaction strategies
           − Size-tiered
           − Leveled (modeled on LevelDB)
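A minimal sketch of what "merged" and "physically removed" mean here, assuming each sstable is a dict of `(row_key, column) -> (timestamp, value)` and that a delete is stored as a `TOMBSTONE` marker. Both representations are assumptions for illustration. The sketch also drops tombstones immediately, whereas a real cluster keeps them for a grace period first.

```python
TOMBSTONE = object()  # assumed marker for a deleted column


def compact(sstables):
    """Merge several sstables into one.

    For each key the entry with the newest timestamp wins, and keys whose
    winning entry is a tombstone are dropped from the output.
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return {
        key: merged[key]
        for key in sorted(merged)
        if merged[key][1] is not TOMBSTONE
    }


older = {("ecapriolo", "a"): (1, "x"), ("ecapriolo", "b"): (1, "y")}
newer = {("ecapriolo", "a"): (2, "z"), ("ecapriolo", "b"): (3, TOMBSTONE)}
print(compact([older, newer]))  # {('ecapriolo', 'a'): (2, 'z')}
```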
  6. Commit logs
       • Every write/delete operation goes to the commit log
       • If a node shuts down with unflushed memtables (every shutdown, really), the commit logs are replayed on restart
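The append-then-replay idea can be sketched as follows. The JSON-lines file format, the function names, and the per-append `fsync` are assumptions made to keep the example short; they are not Cassandra's on-disk format or sync policy. Deletes are replayed as `None` to stand in for a tombstone.

```python
import json
import os


def append_to_commit_log(path, op):
    """Append one operation as a JSON line and force it to disk."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(op) + "\n")
        log.flush()
        os.fsync(log.fileno())


def replay_commit_log(path):
    """Rebuild the in-memory table by re-applying logged operations in order."""
    memtable = {}
    if not os.path.exists(path):
        return memtable
    with open(path, encoding="utf-8") as log:
        for line in log:
            op = json.loads(line)
            key = (op["row"], op["column"])
            # A delete is itself a write: it records a tombstone (None here).
            memtable[key] = None if op["type"] == "delete" else op["value"]
    return memtable
```

After a restart, calling `replay_commit_log` on the same path restores every write that had not yet been flushed to an sstable.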
  7. Columns ordered inside key
       • Cassandra likes wide rows
           − Up to 2 billion columns
           − (but not really; that would be a 32GB row)
       set mystuff[ecapriolo][a]=1
       set mystuff[ecapriolo][b]=2
       set mystuff[ecapriolo][c]=3
       ...
       slice mystuff[ecapriolo] [b] [g]
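Because columns are kept sorted by name inside a row, a slice is a range lookup rather than a scan. A minimal sketch, assuming a row is a name-sorted list of `(name, value)` pairs and that both slice bounds are inclusive; `slice_columns` is a hypothetical helper, not a client API.

```python
import bisect


def slice_columns(row, start, finish):
    """Return the columns with start <= name <= finish from a sorted row."""
    names = [name for name, _ in row]
    lo = bisect.bisect_left(names, start)    # first name >= start
    hi = bisect.bisect_right(names, finish)  # one past the last name <= finish
    return row[lo:hi]


row = [("a", 1), ("b", 2), ("c", 3), ("h", 8)]
print(slice_columns(row, "b", "g"))  # [('b', 2), ('c', 3)]
```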
  8. Rows ordered by hash of row key
       • All columns of row a1 on the same node
       • But the columns of row a2 may be on a different node
       • Reduces hot spots
       • But there is no total ordering based on row keys
  9. Peer to Peer
       • Node list and token ranges are gossiped
       • Each node is responsible for local storage and requests
       • When a new node joins it takes some token range away from other nodes
  10. [Diagram: rows Ed, stacey, and bob, each shown three times; Replication 3]
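Replica placement on a token ring can be sketched as below. The assumptions are loud ones: tokens come from an MD5 hash of the row key, replicas are simply the next distinct nodes clockwise from the key's token, and the four-node ring with evenly spaced tokens is made up for the example.

```python
import bisect
import hashlib


def token(key):
    """Hash a row key to an integer token (MD5 used here for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def replicas(ring, key, replication=3):
    """Return the nodes that store `key`.

    `ring` is a token-sorted list of (token, node). Starting at the first
    node whose token is >= the key's token, walk clockwise and collect
    distinct nodes until `replication` of them have been found.
    """
    tokens = [t for t, _ in ring]
    wanted = min(replication, len({node for _, node in ring}))
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    owners = []
    while len(owners) < wanted:
        node = ring[i % len(ring)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners


# Hypothetical four-node ring with evenly spaced tokens.
ring = [(i * 2**126, f"node{i}") for i in range(4)]
for row_key in ("Ed", "stacey", "bob"):
    print(row_key, replicas(ring, row_key, replication=3))
```

With Replication 3, every row key maps to three nodes, which matches the three copies of each name in the diagram.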
  11. Dynamo consistency
       • Operations have a requested ConsistencyLevel
           − ONE
           − QUORUM
       • CL nodes ack the operation before the user receives ack
       • If an operation fails it is safe to retry *
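The "CL nodes ack" rule is simple arithmetic: a quorum is a majority of the replicas. A small sketch, where `required_acks` and `is_successful` are hypothetical helper names and only the two levels named on the slide are handled:

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acks needed before the client gets its ack."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def is_successful(acks_received, consistency_level, replication_factor):
    """True once enough replicas have acknowledged the operation."""
    return acks_received >= required_acks(consistency_level, replication_factor)


print(required_acks("QUORUM", 3))     # 2
print(is_successful(1, "QUORUM", 3))  # False
print(is_successful(1, "ONE", 3))     # True
```

With Replication 3, QUORUM tolerates one unavailable replica, and a QUORUM write followed by a QUORUM read always shares at least one replica (2 + 2 > 3).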
  12. Fully distributed. The good
       • Highly available
       • Redundant
       • Fault tolerant
  13. Fully distributed! The bad
       • Locks
       • Counters
       • Tombstones
       • Consistency
  14. Hadoop
  15. Hadoop and Cassandra
       • ColumnFamilyInputFormat
           − Takes a ColumnFamily as input
           − Map(ByteBuffer[] key, SortedMap<ByteBuffer,Column>
       • ColumnFamilyOutputFormat
           − Writes out to a column family
           − OutputFormat ByteBuffer, List<Mutation>
  16. Hadoop optimizations
       • Tasks run with locality if c* and Hadoop are on the same node
       • InputFormat can leverage c* secondary indexes
       • OutputFormat can use bulk loader
           − C* writes are helluva fast anyway
  17. Hive and Cassandra
       • Hive support similar to the hbase handler support
       • Create a hive table specifying properties similar to those in map reduce
       hive> CREATE EXTERNAL TABLE Users(userid string, name string, email string, phone string)
       STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
  18. Other support out there
           − Delete UDF
           − Composite splitter/builder UDFs
       • Not very hard to roll your own input format
           − OneRowInputFormat
           − ListOfRowsInputFormat
  19. Pig Cassandra
       • Nice support for pig/cassandra
       • Pygmalion library
       • But I don't use it
           − 'Cause I use hive
           − You should as well
           − And get my book :)
  20. Comparison between c* and “other noSQL”
       • I know you're talking about hbase :)
       • Cassandra does not store multiple versions of a column
           − Last update wins
           − Use UUID as part of column name instead
       • The row keys are not globally ordered *
           − Unless you are using ByteOrderedPartitioner (no one should use this)
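Both sub-points can be illustrated briefly. In this sketch a column value is a `(timestamp, value)` pair, `reconcile` is a hypothetical helper in which ties go to the first argument, and a time-based UUID from Python's `uuid.uuid1()` stands in for the UUID the slide suggests adding to the column name.

```python
import uuid


def reconcile(a, b):
    """Last update wins: keep the (timestamp, value) with the higher timestamp."""
    return a if a[0] >= b[0] else b


def versioned_name(base):
    """Build a column name that is unique per write, so old versions are kept."""
    return (base, uuid.uuid1())


# Same column name written twice: only the newest value survives.
print(reconcile((100, "old@example.com"), (200, "new@example.com")))
# (200, 'new@example.com')

# UUID in the column name: each write lands in its own column.
row = {}
row[versioned_name("email")] = "old@example.com"
row[versioned_name("email")] = "new@example.com"
print(len(row))  # 2
```

The second pattern trades a wider row for retained history, which fits the wide-row model described earlier in the deck.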
  21. Comparison between c* and “other noSQL”
       • Each c* replica actively serves reads & writes
       • Cassandra directly manages its storage
       • Shards are pre-defined tokens (no auto-split)
       • Qualifier/column name can NOT be null
  22. Key Performance tips
  23. Know your data
       • Design for the long tail scenarios
           − With design x our largest customer will have 10000000000000 columns in one row
       • How large will this column family be in 5 months?
       • What is the request rate?
       • How random is the read pattern?
  24. Understanding write-once files
       • Deletes are writes that get compacted away later
       • Can you optimize from blind writes?
       • What percent of your application is update/insert?
  25. Profiling / Dark Launch
       • Compression
       • Compaction strategy
  26. Metrics
       • Collect the JMX information
           − Column family
           − Caches
       • Set milestone alerts (traps)
  27. Hardware
       • Fast disk (you almost always want SSD)
       • RAM
           − Caches, bloom filters, young gen
       • CPU
           − Garbage collector, deserialization + compaction needs cpu to work
  28. Anti patterns
       • Using one row key as a queue
       • Doing N reads to satisfy a request
       • Read before write
       • Using collection support in place of wide rows
       • Encoding
  29. Questions?