Wide Column Store for Big Data

Published in: Technology

  1. Wide Column Store for Big Data: Apache Cassandra. Kai Spichale
  2. Outline: Motivation; Introduction to Cassandra; Big Data Solution
  3. “Must Haves” for Big Data? What do modern businesses need for big data? A scalable, high-performance database that is easy to use and cost-effective. Requirements: Scalable; Performance; Cost Effective; Operational Ease
  4. “Must Haves” for Big Data? “Modern businesses need to be able to manage large volumes of real-time data and run analytic and enterprise search operations on that same data as quickly as possible to make business decisions.” Diagram labels: Real-Time Databases; Analytic/Search Databases; Data Movement (ETL Process)
  5. Legacy RDBMS ≠ Big Data. “Big data is comprised of (1) Velocity: how fast the data is coming in; (2) Variety: all types are now being captured; (3) Volume: TBs to PBs of data; (4) Complexity: multi-location, data center, etc.” “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
  6. Trends & Challenges in Data Management. Trends: Exponential Data Growth; Cloud; Semi-Structured Data; More Connected Data. Store types: Key Value; Wide Column; Document; Graph
  7. Trends & Challenges in Data Management. Trends: Exponential Data Growth; Cloud; Semi-Structured Data; More Connected Data. Store types: Key Value; Apache Cassandra; Document; Graph
  8. Apache Cassandra: A massively scalable, decentralized, structured data store (aka database). Project history:
  9. Cassandra is… O(1) Distributed Hash Table; Sharding, Replication; Elastic; Fault tolerant; No Single Point of Failure; Durable. Diagram: ring of nodes A to H with tokens A = 0, B = 4, C = 8, D = 12, E = 16, F = 20, G = 24, H = 28
  10. Cassandra is… AP-System (CAP Theorem): Eventual consistency. Tunable trade-offs: Consistency vs. Latency; choose between synchronous or asynchronous replication for each update. Legend: C = Consistency, A = High Availability, P = Partition Tolerance
  11. Cassandra is… A BigTable Clone. No schema. Predestined for: Semi-structured data; Sparse data. Diagram: a Keyspace contains Column Families; a Column Family contains Rows, each identified by a Key and holding Columns; a Row may instead hold SuperColumns, each grouping Columns
  12. Cassandra-based Big Data Solution: Real-time queries with Cassandra; Distributed Search with Solr; Analytics with Hadoop MapReduce. Diagram: a Cassandra Cluster (Replication) with node groups for Analytics (Hadoop and Cassandra), Real-time (Cassandra), and Search (Solr and Cassandra)
  13. Summary: Apache Cassandra is an elastically scalable, fault-tolerant data store; Tunable consistency levels; Wide Column: flexible data model without schema; Supports: real-time queries, analytics through Hadoop integration, Solr-based full-text search
  14. Thank you! Q&A
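
The token ring on slide 9 and the tunable consistency on slide 10 can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the node names and tokens come from the slide, while the token space of 32, the MD5-based `token_for` helper, the replication factor of 3, and all function names are assumptions made for this example. Placing replicas on the next nodes clockwise is one simple ring-based placement; real clusters may be configured differently.

```python
from bisect import bisect_left
import hashlib

# Node tokens as shown on slide 9.
RING = [(0, "A"), (4, "B"), (8, "C"), (12, "D"),
        (16, "E"), (20, "F"), (24, "G"), (28, "H")]
TOKENS = [token for token, _ in RING]

# Assumption: the slide does not state the size of the token space.
TOKEN_SPACE = 32


def token_for(key):
    """Map a row key to a token (hypothetical helper: MD5 modulo 32)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def replicas_for_token(token, replication_factor=3):
    """Return the nodes that store a token.

    The first node whose token is >= the key's token owns the key;
    past the last token the search wraps around to the first node.
    Further replicas go to the next nodes clockwise on the ring.
    """
    start = bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1]
            for i in range(replication_factor)]


def replicas_for_key(key, replication_factor=3):
    """Hash the key, then look up its replicas on the ring."""
    return replicas_for_token(token_for(key), replication_factor)


def read_sees_latest_write(read_acks, write_acks, replication_factor):
    """True if every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    print(replicas_for_token(5))             # token 5 falls in C's range
    print(replicas_for_token(29))            # wraps around the ring
    print(read_sees_latest_write(2, 2, 3))   # quorum reads and writes
    print(read_sees_latest_write(1, 1, 3))   # fastest, eventually consistent
```

With three replicas, reading from two and writing to two guarantees that a read overlaps the latest write, whereas reading from one and writing to one lowers latency at the cost of only eventual consistency. That is the per-operation trade-off the slide calls tunable.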