Cassandra

  1. Wide Column Store for Big Data: Apache Cassandra. Kai Spichale
  2. Outline: Motivation; Introduction to Cassandra; Big Data Solution
  3. "Must Haves" for Big Data? What do modern businesses need for big data? A scalable, high-performance database that is easy to use and cost-effective. The four requirements: Scalable, Performance, Cost Effective, Operational Ease.
  4. "Must Haves" for Big Data? "Modern businesses need to be able to manage large volumes of real-time data and run analytic and enterprise search operations on that same data as quickly as possible to make business decisions." Diagram labels: Real-Time Databases; Analytic/Search Databases; Data Movement (ETL Process).
  5. Legacy RDBMS ≠ Big Data. "Big data is comprised of (1) Velocity: how fast the data is coming in; (2) Variety: all types are now being captured; (3) Volume: TBs to PBs of data; (4) Complexity: multi-location, data center, etc." "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis." "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."
  6. Trends & Challenges in Data Management. Trends: exponential data growth; cloud; semi-structured data; more connected data. Data store types: key-value; wide column; document; graph.
  7. Trends & Challenges in Data Management. Same overview as the previous slide, with Apache Cassandra placed among the wide column stores.
  8. Apache Cassandra: a massively scalable, decentralized, structured data store (aka database). Project history:
  9. Cassandra is… an O(1) distributed hash table (sharding, replication); elastic; fault tolerant, with no single point of failure; durable. Diagram: a ring of eight nodes with tokens A = 0, B = 4, C = 8, D = 12, E = 16, F = 20, G = 24, H = 28.
  10. Cassandra is… an AP system (CAP theorem) with eventual consistency, where C = Consistency, A = High Availability, P = Partition Tolerance. Tunable trade-offs: consistency vs. latency; choose between synchronous or asynchronous replication for each update.
  11. Cassandra is… a BigTable clone with no schema, predestined for semi-structured data and sparse data. Data model: a Keyspace contains Column Families; a Column Family contains Rows, each identified by a Key; a Row contains Columns, and rows need not share the same columns. A Column Family may instead hold SuperColumns, each grouping several Columns.
  12. Cassandra-based Big Data Solution: real-time queries with Cassandra; distributed search with Solr; analytics with Hadoop MapReduce. Diagram: one Cassandra cluster (replication) of nodes A to H, where A, B, and C serve real-time queries (Cassandra), D, E, and F serve search (Solr), and G and H run analytics (Hadoop).
  13. Summary: Apache Cassandra is an elastically scalable, fault-tolerant data store; tunable consistency levels; Wide Column: flexible data model without schema; supports real-time queries, analytics through Hadoop integration, and Solr-based full-text search.
  14. Thank you! Q&A
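
The token ring on slide 9 can be illustrated with a small sketch. It uses the eight node tokens from the slide; the token space of 32, the MD5-based `key_to_token` helper, and the "next nodes clockwise" replica placement are assumptions made for illustration, not details taken from the deck or from Cassandra's actual partitioner.

```python
import hashlib
from bisect import bisect_left

# Node tokens as listed on slide 9.
RING = [(0, "A"), (4, "B"), (8, "C"), (12, "D"),
        (16, "E"), (20, "F"), (24, "G"), (28, "H")]
TOKEN_SPACE = 32  # assumption: tokens wrap around after 31


def key_to_token(key: str) -> int:
    """Hypothetical partitioner: hash the row key into the token space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def replicas_for(token: int, replication_factor: int = 3) -> list[str]:
    """Return the nodes holding a token: the first node whose token is
    >= the given token, followed by its clockwise neighbours."""
    tokens = [t for t, _ in RING]
    # Wrap to the first node when the token lies past the last node token.
    start = bisect_left(tokens, token % TOKEN_SPACE) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]
```

With these assumptions, `replicas_for(5)` yields nodes C, D, and E, and `replicas_for(29)` wraps around to A, B, and C. Because every node can compute this mapping locally, a request reaches a responsible node without multi-hop routing, and losing one node still leaves the other replicas reachable.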
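
The consistency vs. latency trade-off on slide 10 can be made concrete with the usual quorum arithmetic for replicated stores: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is a simplified model, not Cassandra's API; the function names are hypothetical, and only the level names `ONE`, `QUORUM`, and `ALL` correspond to Cassandra consistency levels.

```python
def acks_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond before a request is acknowledged."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    w = acks_required(write_level, replication_factor)
    r = acks_required(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, writing and reading at `QUORUM` overlaps in at least one replica, while `ONE`/`ONE` gives the lowest latency but only eventual consistency, since the remaining replicas are updated asynchronously.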
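
The schema-free data model on slide 11 can be pictured as nested maps: keyspace, then column family, then row key, then columns. The sketch below models this with plain Python dictionaries; the column family name `users` and all keys and values are invented for illustration.

```python
from collections import defaultdict


def new_keyspace():
    """A keyspace maps column family -> row key -> {column name: value}."""
    return defaultdict(lambda: defaultdict(dict))


keyspace = new_keyspace()

# Rows in the same column family may carry different columns (sparse data).
keyspace["users"]["alice"]["email"] = "alice@example.org"
keyspace["users"]["alice"]["city"] = "Berlin"
keyspace["users"]["bob"]["email"] = "bob@example.org"

# A super column adds one more level: row key -> super column -> columns.
keyspace["addresses"]["alice"]["home"] = {"city": "Berlin", "zip": "10115"}
```

No column is declared up front: `alice` has a `city` column and `bob` does not, and neither row stores a placeholder for the missing value.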
