
Cassandra - An Introduction


English translation of my slides for the talk held at LinuxTag 2011. I give an overview of Cassandra and talk about the experience we gained using it for real-time analysis at TWIMPACT.


  1. Cassandra – An Introduction
     Mikio L. Braun, Leo Jugel
     TU Berlin, twimpact
     LinuxTag Berlin, 13 May 2011
     LinuxTag Berlin, 13. 5. 2011 (c) 2011 by Mikio L. Braun @mikiobraun, blog.mikiobraun.de
  2. What is NoSQL
     ● For many web applications, “classical databases” are not the right choice:
        ● The database is just used for storing objects.
        ● Consistency is not essential.
        ● There is a lot of concurrent access.
  3. NoSQL in comparison
     Classical databases                                    NoSQL
     Powerful query language                                Very simple query language
     Scales by using larger servers (“scaling up”)          Scales through clustering (“scaling out”)
     Changes of database schema very costly                 No fixed database schema
     ACID: Atomicity, Consistency, Isolation, Durability    Typically only “eventually consistent”
     Transactions, locking, etc.                            Typically no support for transactions etc.
  4. Brewer's CAP Theorem
     ● CAP: Consistency, Availability, Partition Tolerance
        ● Consistency: You never get old data.
        ● Availability: Read/write operations are always possible.
        ● Partition Tolerance: The other guarantees hold even if the network between the servers breaks.
     ● You can only have two of these!
     Gilbert, Lynch: “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services”, ACM SIGACT News, Volume 33, Issue 2, June 2002
  5. Homepage         http://cassandra.apache.org
     Language         Java
     History          ● Developed at Facebook for inbox search, released as Open Source in July 2008
                      ● Apache Incubator since March 2009
                      ● Apache Top-Level project since February 2010
     Main properties  ● structured key value store
                      ● “eventually consistent”
                      ● fully equivalent nodes
                      ● cluster can be modified without restarting
     Support          DataStax (http://datastax.com)
     Licence          Apache 2.0
  6. Version 0.6.x and 0.7.x
     ● Most important changes in 0.7.x:
        ● Config file format changed from XML to YAML
        ● Schema modification (ColumnFamilies) without restart
        ● Initial support for secondary indices
     ● However, there were also stability problems initially.
  7. Inspirations for Cassandra
     ● Amazon Dynamo
        ● Clustering without dedicated master node
        ● Peer-to-peer discovery of nodes, Hinted Handoff, etc.
     ● Google BigTable
        ● Data model
        ● Requires central master node
        ● Provides much more fine-grained control:
           – which data should be stored together
           – on-the-fly compression, etc.
  8. Installation
     ● Download tar.gz from http://cassandra.apache.org/download/
     ● Unpack
     ● ./conf contains config files
     ● ./bin/cassandra -f to start Cassandra, Ctrl-C to stop
  9. Configuration
     ● Database
        ● Version 0.6.x: conf/storage-conf.xml
        ● Version 0.7.x: conf/cassandra.yaml
     ● JVM Parameters
        ● Version 0.6.x: bin/cassandra.in.sh
        ● Version 0.7.x: conf/cassandra-env.sh
  10. Cassandra's Data Model
      Keyspace (= database)
         Column Family (= table)
            row key → {name1: value1, name2: value2, name3: value3, ...}
         Super Column Family
            key → key → {name1: value1, ...}
      ● Each name: value pair is a column. Columns are sorted by name!
      ● Rows are sorted according to the partitioner.
      ● Annotations in the diagram: “strings” and “byte arrays” (column names and values are byte arrays).
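
The nesting on the data model slide can be mimicked with plain Python containers. This is only a sketch of the idea, not Cassandra's implementation: the class `ColumnFamily` and its methods are hypothetical names, rows live in an ordinary dict (so the partitioner's row ordering is not modeled), and only the "columns sorted by name" property is reproduced.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy column family: row key -> columns kept sorted by column name."""

    def __init__(self):
        # row key (bytes) -> (sorted list of column names, dict name -> value)
        self.rows = {}

    def insert(self, row_key, name, value):
        names, values = self.rows.setdefault(row_key, ([], {}))
        if name not in values:
            # Keep the name list sorted so that range queries stay cheap.
            names.insert(bisect_left(names, name), name)
        values[name] = value

    def get_slice(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in name order."""
        names, values = self.rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(n, values[n]) for n in names[lo:hi]]


# A keyspace is modeled as a dict of column families.
keyspace = {"Person": ColumnFamily()}
cf = keyspace["Person"]
cf.insert(b"1", b"name", b"Mikio Braun")
cf.insert(b"1", b"affiliation", b"TU Berlin")
cf.insert(b"1", b"id", b"1")

# All columns of row "1" whose names lie between "a" and "j".
columns = cf.get_slice(b"1", b"a", b"j")
```

Because the names are kept in sorted order, a slice is a contiguous run of columns; the index example later in the deck relies on exactly this property.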
  11. Example: Simple Object Store
         class Person {
             long id;
             String name;
             String affiliation;
         }
      Convert fields to byte arrays:
         Keyspace “MyDatabase”:
            ColumnFamily “Person”:
               “1”: {“id”: “1”, “name”: “Mikio Braun”, “affiliation”: “TU Berlin”}
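
The conversion step from the object store example might look like this in client code. It is a sketch under assumptions: the helper names `to_columns` and `from_columns` are made up for illustration, every field is encoded as UTF-8 text (a real application might prefer a binary encoding), and the nested dict merely stands in for the keyspace.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    id: int
    name: str
    affiliation: str


def to_columns(obj):
    """Encode every field as a column: name bytes -> value bytes."""
    return {
        f.name.encode("utf-8"): str(getattr(obj, f.name)).encode("utf-8")
        for f in fields(obj)
    }


def from_columns(columns):
    """Decode the columns of one row back into a Person."""
    return Person(
        id=int(columns[b"id"].decode("utf-8")),
        name=columns[b"name"].decode("utf-8"),
        affiliation=columns[b"affiliation"].decode("utf-8"),
    )


person = Person(1, "Mikio Braun", "TU Berlin")
row_key = str(person.id).encode("utf-8")

# Stand-in for: Keyspace "MyDatabase" -> ColumnFamily "Person" -> row "1".
store = {"MyDatabase": {"Person": {row_key: to_columns(person)}}}
restored = from_columns(store["MyDatabase"]["Person"][b"1"])
```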
  12. Example: Index
         class Page {
             long id;
             …                    (object data fields)
             List<Links> links;
         }
         class Link {
             long id;
             ...
             int numberOfHits;
         }
      Keyspace “MyDatabase”
         ColumnFamily “Pages”
            “3”: {“id”: 3, …}
            “4”: {“id”: 4, …}
            ...
         ColumnFamily “Links”
            “1”: {“id”: 1, “url”: …}
            “17”: {“id”: 17, “url”: …}
         ColumnFamily “LinksPerPageByNumberOfHits” (used for both linking and indexing!)
            “3”: { “00000132:00000001”: “t”, “00000025:00000017”: … }
            “4”: { “00000044:00000024”: “t”, … }
      Here we exploit that columns are sorted by their names.
      Of course, everything is encoded in byte arrays, not ASCII.
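
One way to get column names that sort by number of hits is a fixed-width binary encoding. The sketch below assumes two big-endian unsigned 32-bit integers (hits, then link id) and that column names are compared as raw bytes; the slide does not specify the actual encoding used, and the third link (4000 hits, id 9) is invented for illustration.

```python
import struct


def index_column_name(number_of_hits, link_id):
    """Pack (hits, link id) as two big-endian unsigned 32-bit integers."""
    return struct.pack(">II", number_of_hits, link_id)


def parse_index_column_name(name):
    """Inverse of index_column_name: returns (number_of_hits, link_id)."""
    return struct.unpack(">II", name)


# Links of page "3" as (number_of_hits, link_id); the last entry is hypothetical.
links = [(132, 1), (25, 17), (4000, 9)]

# One row of "LinksPerPageByNumberOfHits": column name -> dummy value "t".
row = {index_column_name(hits, link_id): b"t" for hits, link_id in links}

# Sorting the raw byte strings orders the links by number of hits, because
# big-endian fixed-width integers compare like the numbers they encode.
by_hits = [parse_index_column_name(name) for name in sorted(row)]
```

A decimal string without zero padding would not work here: "4000" sorts before "25" byte-wise, which is why the slide pads the numbers to a fixed width.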
  13. Are SuperColumnFamilies necessary?
      ● Usually, you can replace a SuperColumnFamily by several ColumnFamilies.
      ● Since SuperColumnFamilies make the implementation and the protocol more complex, some people also advocate removing SuperCFs.
  14. Cassandra's Architecture
      [Diagram]
      ● Memory: MemTable
      ● Disk: Commit Log, SSTable, SSTable, SSTable
      ● Write Operation: goes to the Commit Log and the MemTable
      ● Flush: MemTable is written to disk as an SSTable
      ● Read Operation: reads from the MemTable and the SSTables
      ● Compaction!
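
The interplay of commit log, MemTable, SSTables, and compaction can be illustrated with a toy in-memory model. This is a heavily simplified sketch, not Cassandra's code: `ToyStore` and `memtable_limit` are hypothetical names, "disk" structures are ordinary Python objects, and details such as timestamps, tombstones, and Bloom filters are left out.

```python
class ToyStore:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log of writes not yet flushed
        self.memtable = {}     # in-memory table of recent writes
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append comes first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes a sorted, immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed writes need no log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:
                return table[key]
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, so newer values win
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


store = ToyStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    store.write(key, value)

tables_before = len(store.sstables)   # two flushed SSTables, "d" still in memory
value_before = store.read("a")        # found in the newer SSTable
store.compact()
value_after = store.read("a")
```

In this model a write is an append plus a dict update, while a read may have to look at the MemTable and every SSTable until compaction merges them, which matches the observation later in the deck that reads are the relatively expensive operation.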
  15. Cassandra's API
      ● THRIFT-based API
      Read operations
         get                  single column
         get_slice            range of columns
         multiget_slice       range of columns in several rows
         get_count            column count
         get_range_slice      several columns from range of rows
         get_indexed_slices   range of columns from index
      Write operations
         insert               single column
         batch_mutate         several columns in several rows
         remove               single column
         truncate             whole ColumnFamily
      Other
         login, describe_*, add/drop column family/keyspace since 0.7.x
  16. Cassandra Clustering
      ● Fully equivalent nodes, no master node.
      ● Bootstrapping requires seed node.
      [Diagram: a query arrives at one node, which acts as “Storage Proxy” and performs reads/writes on the other nodes according to the consistency level]
  17. Consistency Level and Replication Factor
      ● Replication factor: On how many nodes is a piece of data stored?
      ● Consistency level:
         ANY            A node has received the operation, even a HintedHandoff node.
         ONE            One node has completed the request.
         QUORUM         Operation has completed on majority of nodes / newest result is returned.
         LOCAL_QUORUM   QUORUM in local data center
         GLOBAL_QUORUM  QUORUM in global data center
         ALL            Wait till all nodes have completed the request
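
How the consistency level translates into a number of replicas can be written down in a few lines. This is a sketch with hypothetical function names; it covers only the levels that depend on the replication factor alone (the data-center-aware levels need per-data-center replication factors). The overlap rule in `read_sees_latest_write` is the commonly cited "R + W > RF" condition, which is not stated on the slide.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def required_replicas(level, replication_factor):
    """Replicas that must have completed the operation for a given level."""
    if level == "ANY":
        return 0  # a hint stored on some node is enough; no replica is guaranteed
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError("level not covered by this sketch: %s" % level)


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True if every read replica set must overlap every write replica set."""
    r = required_replicas(read_level, replication_factor)
    w = required_replicas(write_level, replication_factor)
    return r + w > replication_factor


quorum_of_three = quorum(3)
quorum_both_ways = read_sees_latest_write("QUORUM", "QUORUM", 3)
one_both_ways = read_sees_latest_write("ONE", "ONE", 3)
```

With replication factor 3, QUORUM reads and QUORUM writes always share at least one replica, whereas ONE/ONE does not, which is where "eventually consistent" shows up in practice.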
  18. How to deal with failure
      ● As long as the requirements of the consistency level can be met, everything is fine.
      ● Hinted Handoff:
         ● A write operation for an unavailable node is stored on another node and pushed to the original node once it is available again.
         ● The data won't be readable until the hint has been delivered!
      ● Read Repair:
         ● After a read operation has completed, the data is compared and updated on all nodes in the background.
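
Both repair mechanisms can be played through with a toy model of three replicas. This sketch rests on several assumptions: `Replica`, `write`, `replay_hints`, and `read_with_repair` are hypothetical names, the first live replica holds all hints, conflicts are resolved by the highest timestamp, and the repair runs synchronously instead of in the background.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> (timestamp, value)
        self.hints = []   # (target replica, key, timestamp, value)


def write(replicas, key, timestamp, value):
    """Write to all live replicas; keep a hint for every replica that is down."""
    live = [r for r in replicas if r.up]
    for replica in replicas:
        if replica.up:
            replica.data[key] = (timestamp, value)
        elif live:
            live[0].hints.append((replica, key, timestamp, value))


def replay_hints(holder):
    """Hinted handoff: push stored writes to targets that are up again."""
    remaining = []
    for target, key, timestamp, value in holder.hints:
        if not target.up:
            remaining.append((target, key, timestamp, value))
        elif key not in target.data or target.data[key][0] < timestamp:
            target.data[key] = (timestamp, value)
    holder.hints = remaining


def read_with_repair(replicas, key):
    """Return the newest value and bring stale live replicas up to date."""
    versions = [r.data[key] for r in replicas if r.up and key in r.data]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica in replicas:
        if replica.up and replica.data.get(key) != newest:
            replica.data[key] = newest
    return newest[1]


a, b, c = Replica("a"), Replica("b"), Replica("c")
nodes = [a, b, c]

c.up = False
write(nodes, "k", 1, "v1")                   # c misses the write, a keeps a hint
c.up = True
missing_before_handoff = "k" not in c.data   # not readable on c yet
replay_hints(a)

b.up = False
write(nodes, "k", 2, "v2")                   # b misses the update
b.up = True
value = read_with_repair(nodes, "k")         # b is repaired by the read
```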
  19. Libraries
      Python   Pycassa: http://github.com/pycassa/pycass
               Telephus: http://github.com/driftx/Telephus
      Java     Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
               Hector: http://github.com/rantav/hector
               Kundera: http://code.google.com/p/kundera/
               Pelops: http://github.com/s7/scale7-pelops
      Grails   grails-cassandra: https://github.com/wolpert/grails-cassandra
      .NET     Aquiles: http://aquiles.codeplex.com/
               FluentCassandra: http://github.com/managedfusion/fluentcassandra
      Ruby     Cassandra: http://github.com/fauna/cassandra
      PHP      phpcassa: http://github.com/thobbs/phpcassa
               SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
      Or roll your own based on THRIFT: http://thrift.apache.org/ :)
  20. TWIMPACT: An Application
      ● Real-time analysis of Twitter
      ● Trend analysis based on retweets
      ● Very high data rate (several million tweets per day, about 50 per second)
  21. TWIMPACT: twimpact.jp
  22. TWIMPACT: twimpact.com
  23. Application Profile
      ● Information about tweets, users, and retweets
      ● Text matching for non-API-retweets
      ● Retweet frequency and user impact
      ● Operation profile:
         Operations (in slide order): get_slice, get, get_slice, batch_mutate, insert, batch_mutate, remove
         Qualifiers (in slide order): (all), (range), (one row)
         Fraction:                    50.1%, 6.0%, 0.1%, 14.9%, 21.5%, 6.8%, 0.8%
         Duration:                    1.1 ms, 1.7 ms, 0.8 ms, 0.9 ms, 1.1 ms, 0.8 ms, 1.2 ms
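
From the operation profile one can estimate the average cost of a single operation. The calculation below is a back-of-the-envelope sketch: it takes the fractions and durations from the slide in their given order, weights each duration by its fraction, and assumes (for the throughput figure) a single client thread that issues operations strictly one after another with no other overhead.

```python
# Values from the operation profile, in slide order.
fractions_percent = [50.1, 6.0, 0.1, 14.9, 21.5, 6.8, 0.8]
durations_ms = [1.1, 1.7, 0.8, 0.9, 1.1, 0.8, 1.2]


def weighted_mean(values, weights):
    """Mean of the values, each weighted by the matching weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Average duration of one operation: about 1.09 ms.
mean_ms = weighted_mean(durations_ms, fractions_percent)

# Upper bound for one strictly sequential client thread (assumption, see above).
ops_per_second_single_thread = 1000.0 / mean_ms
```

Dividing by the sum of the fractions rather than by 100 keeps the result correct even though the rounded percentages on the slide add up to slightly more than 100.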
  24. Practical Experiences with Cassandra
      ● Very stable
      ● Read operations relatively expensive
      ● Multithreading leads to a huge performance increase
      ● Requires quite extensive tuning
      ● Clustering doesn't automatically lead to better performance
      ● Compaction leads to performance decrease of up to 50%
  25. Performance through Multithreading
      ● Multithreading leads to much higher throughput
      ● How to achieve multithreading without locking support?
      [Chart: series labeled 1, 2, 4, 8, 16, 32, 64; Core i7, 4 cores (2 + 2 HT)]
  26. Performance through Multithreading
      ● Multithreading leads to much higher throughput
      ● How to achieve multithreading without locking support?
  27. Cassandra Tuning
      ● Tuning opportunities:
         ● Size of memtables, thresholds for flushes
         ● Size of JVM Heap
         ● Frequency and depth of compaction
      ● Where?
         ● MemTableThresholds etc. in conf/cassandra.yaml
         ● JVM Parameters in conf/cassandra-env.sh
  28. Overview of JVM GC
      [Diagram of the JVM heap]
      ● Young Generation (“Eden”, “Survivors”): up to a few hundred MB
      ● Old Generation: dozens of GBs
      ● CMSInitiatingOccupancyFraction
      ● Additional memory usage while GC is running
  29. Cassandra's Memory Usage
      [Chart of memory usage, annotated with “Flush”, “Compaction”, and “Memtables, indexes, etc.”]
      Size of Memtable: 128M, JVM Heap: 3G, #CF: 12
  30. Cassandra's Memory Usage
      ● Memtables may survive for a very long time (up to several hours)
         ● They are placed in the old generation
         ● GC has to process several dozen GBs
         ● Heap too small, GC triggered too late → “GC storm”
      ● Trade-off:
         ● I/O load vs. memory usage
         ● Do not neglect compaction!
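
The settings quoted for the memory usage chart (128M memtables, 3G heap, 12 column families) give a feeling for how much of the heap memtables alone can occupy. This is a rough worst-case estimate under stated assumptions: every column family holds one memtable filled to its threshold at the same time, and object overhead, indexes, and caches are ignored.

```python
memtable_size_mb = 128   # "Size of Memtable: 128M"
column_families = 12     # "#CF: 12"
heap_mb = 3 * 1024       # "JVM Heap: 3G"

# Worst case: all memtables are full at once.
memtables_total_mb = memtable_size_mb * column_families

# Share of the heap taken by memtable payload alone.
heap_fraction = memtables_total_mb / heap_mb
```

Under these assumptions the memtables account for 1536 MB, half of the heap, before any other data structure is counted; this is the I/O load vs. memory usage trade-off in numbers.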
  31. The Effects of GC and Compactions
      [Chart annotated with “Major GC” and “Compaction”]
  32. Cluster vs Single Node
      ● Our set-up:
         ● Single node: six-core CPU and RAID 5 with 6 hard disks
         ● 4-node cluster: each node with six-core CPU and RAID 0 with 2 hard disks
      ● The single node consistently performs 1.5 to 3 times better.
      ● Possible causes:
         ● Overhead through network communication, consistency levels, etc.
         ● Hard disk performance is significant
         ● Cluster still too small
      ● Effectively available disk space:
         ● Single node: 6 * 500 GB = 3 TB, with RAID 5 = 2.5 TB (83%)
         ● 4-node cluster: 4 * 1 TB = 4 TB, with replication factor 2 = 2 TB (50%)
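
The disk space figures follow from two simple rules: RAID 5 gives up the capacity of one disk for parity, and a replication factor of n stores every piece of data n times. A short sketch of the arithmetic, with hypothetical helper names and all sizes in TB:

```python
def raid5_usable_tb(disks, disk_size_tb):
    """RAID 5 stores one disk's worth of parity, so n - 1 disks hold data."""
    return (disks - 1) * disk_size_tb


def replicated_usable_tb(nodes, node_size_tb, replication_factor):
    """Every piece of data is stored replication_factor times."""
    return nodes * node_size_tb / replication_factor


single_raw_tb = 6 * 0.5                              # 6 disks of 500 GB
single_usable_tb = raid5_usable_tb(6, 0.5)           # 2.5 TB
cluster_raw_tb = 4 * 1.0                             # 4 nodes of 1 TB
cluster_usable_tb = replicated_usable_tb(4, 1.0, 2)  # 2.0 TB

single_efficiency = single_usable_tb / single_raw_tb     # about 83%
cluster_efficiency = cluster_usable_tb / cluster_raw_tb  # 50%
```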
  33. Alternatives
      ● MongoDB, CouchDB, redis, even memcached...
      ● Persistence: Disk or RAM?
      ● Replication: Master/Slave or Peer-to-Peer?
      ● Sharding?
      ● Upcoming trend towards more complex query languages (Javascript), map-reduce operations, etc.
  34. Summary: Cassandra
      ● Platform which scales well
      ● Active user and developer community
      ● Read operations quite expensive
      ● For optimal performance, extensive tuning is necessary
      ● Depending on your application, eventual consistency and the lack of transactions/locking might be problematic.
  35. Links
      ● Apache Cassandra: http://cassandra.apache.org
      ● Apache Cassandra Wiki: http://wiki.apache.org/cassandra/FrontPage
      ● DataStax documentation for Cassandra: http://www.datastax.com/docs/0.7/index
      ● My Blog: http://blog.mikiobraun.de
      ● Twimpact: http://beta.twimpact.com