Wide Column Store for Big Data: Apache Cassandra
Kai Spichale
Outline Motivation Introduction to Cassandra Big Data Solution
„Must Haves“ for Big Data?   What do modern businesses need for big data?   A scalable high-performance database    that...
„Must Haves“ for Big Data?   „Modern businesses need to be able to manage large    volumes of realtime data and run analy...
Legacy RDBMS ≠ Big Data   „Big data is comprised of (1) Velocity – how fast the data is coming in;    (2) Variety – all t...
Trends & Challenges in Data Mngt.Exponential Data                          Key Value    Growth     Cloud               Wid...
Trends & Challenges in Data Mngt.Exponential Data                          Key Value    Growth                          Ap...
Apache Cassandra   A massively scalable, decentralized, structured    data store (aka database).   Project history:
Nodes   TokenCassandra is…                            A                                         B                         ...
Cassandra is…                                          C   AP-System (CAP Theorem)     Eventual consistency             ...
Cassandra is…                             Keyspace   A BigTable Clone                                        Column Famil...
Cassandra-based Big DataSolution            Analytics            Hadoop                        Real-time                  ...
Summary   Apache Cassandra is a elastic scalable, fault-    tolerant data store   Tunable consistency levels   Wide Col...
Thank you!             Q&A
Upcoming SlideShare
Loading in …5
×

Cassandra

808 views

Wide Column Store for Big Data

Published in: Technology

  1. Wide Column Store for Big Data: Apache Cassandra. Kai Spichale
  2. Outline: Motivation; Introduction to Cassandra; Big Data Solution
  3. "Must Haves" for Big Data? What do modern businesses need for big data? A scalable, high-performance database that is easy to use and cost-effective. Diagram labels: Scalable, Performance, Cost Effective, Operational Ease.
  4. "Must Haves" for Big Data? "Modern businesses need to be able to manage large volumes of real-time data and run analytic and enterprise search operations on that same data as quickly as possible to make business decisions." Diagram labels: Real-Time Databases, Analytic/Search Databases, Data Movement, ETL Process.
  5. Legacy RDBMS ≠ Big Data
     • "Big data is comprised of (1) Velocity: how fast the data is coming in; (2) Variety: all types are now being captured; (3) Volume: TBs to PBs of data; (4) Complexity: multi-location, data center, etc."
     • "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis."
     • "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."
  6. Trends & Challenges in Data Management. Trends: exponential data growth, cloud, semi-structured data, more connected data. Store types: key-value, wide column, document, graph.
  7. Trends & Challenges in Data Management. Trends: exponential data growth, cloud, semi-structured data, more connected data. Store types: key-value, Apache Cassandra (in place of wide column), document, graph.
  8. Apache Cassandra: a massively scalable, decentralized, structured data store (aka database). Project history:
  9. Cassandra is…
     • An O(1) distributed hash table
     • Sharded and replicated
     • Elastic
     • Fault tolerant, with no single point of failure
     • Durable
     Ring diagram, node (token): A (0), B (4), C (8), D (12), E (16), F (20), G (24), H (28)
  10. Cassandra is… an AP system (CAP theorem)
     • Eventual consistency
     • Tunable trade-offs: consistency vs. latency; choose between synchronous or asynchronous replication for each update
     CAP legend: C = Consistency, A = High availability, P = Partition tolerance
  11. Cassandra is… a BigTable clone
     • No schema
     • Well suited to semi-structured data and sparse data
     Data model diagram: a Keyspace contains Column Families; a Column Family contains Rows, each identified by a Key; a Row contains Columns, or SuperColumns that each group several Columns.
  12. Cassandra-based Big Data Solution
     • Real-time queries with Cassandra
     • Distributed search with Solr
     • Analytics with Hadoop MapReduce
     Diagram labels: Cassandra Cluster (Replication), with nodes marked Real-time (Cassandra), Analytics (Cassandra + Hadoop), and Search (Cassandra + Solr).
  13. Summary
     • Apache Cassandra is an elastically scalable, fault-tolerant data store
     • Tunable consistency levels
     • Wide column: flexible data model without a schema
     • Supports real-time queries, analytics through Hadoop integration, and Solr-based full-text search
  14. Thank you! Q&A
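
The token ring on slide 9 can be sketched in a few lines of Python. This is only an illustration, not Cassandra's implementation. The node names and tokens come from the slide. The ring size of 32, the MD5-based `token_for` helper, and the rule of placing replicas on the next nodes clockwise are assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

# Node tokens as listed on slide 9 (node name -> token).
NODE_TOKENS = {"A": 0, "B": 4, "C": 8, "D": 12, "E": 16, "F": 20, "G": 24, "H": 28}
RING_SIZE = 32  # assumption: tokens live in the range 0..31

# Sort once so lookups can use binary search.
_RING = sorted((token, node) for node, token in NODE_TOKENS.items())
_TOKENS = [token for token, _ in _RING]


def token_for(key: str) -> int:
    """Hash a row key onto the ring (illustrative: MD5 reduced modulo RING_SIZE)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for_token(token: int, replication_factor: int = 3) -> list[str]:
    """Return the owner of `token` followed by its clockwise successors."""
    # The owner is the first node whose token is >= the key's token;
    # past the highest token, the search wraps around to the first node.
    start = bisect_left(_TOKENS, token % RING_SIZE) % len(_RING)
    count = min(replication_factor, len(_RING))  # never list a node twice
    return [_RING[(start + i) % len(_RING)][1] for i in range(count)]


def replicas_for_key(key: str, replication_factor: int = 3) -> list[str]:
    """Convenience wrapper: hash the key, then look up its replicas."""
    return replicas_for_token(token_for(key), replication_factor)
```

For example, `replicas_for_token(5)` returns `["C", "D", "E"]`: token 5 falls in the range owned by C (token 8), and the two further copies go to the next nodes on the ring. Because several nodes hold each row, losing one node does not make the data unavailable, which is the point behind "no single point of failure".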

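
The consistency vs. latency trade-off on slide 10 can be made concrete by counting replica acknowledgements. The sketch below is a simplified model, not Cassandra client code. The level names `ONE`, `QUORUM`, and `ALL` are Cassandra consistency levels that the slides do not list, and the helper functions are hypothetical. A read is guaranteed to overlap the latest acknowledged write when the write and read acknowledgement counts together exceed the replication factor.

```python
def acks_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond before an operation succeeds."""
    if level == "ONE":
        return 1  # lowest latency; other replicas are updated asynchronously
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return replication_factor  # fully synchronous, highest latency
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    w = acks_required(write_level, replication_factor)
    r = acks_required(read_level, replication_factor)
    return w + r > replication_factor
```

With three replicas, writing and reading at `QUORUM` needs two acknowledgements each, so the sets always overlap. Writing and reading at `ONE` is faster, but a read may reach a replica that has not yet received the update, which is the eventual consistency case from the slide.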