
Usage Rights

© All Rights Reserved


    Cassandra Presentation Transcript

    • APACHE CASSANDRA: Wide Column Store for Big Data. Kai Spichale
    • Outline: Motivation; Introduction to Cassandra; Big Data Solution
    • „Must Haves“ for Big Data? What do modern businesses need for big data? A scalable, high-performance database that is easy to use and cost-effective. Diagram labels: Scalable; Performance; Cost Effective; Operational Ease
    • „Must Haves“ for Big Data? „Modern businesses need to be able to manage large volumes of real-time data and run analytic and enterprise search operations on that same data as quickly as possible to make business decisions.“ Diagram labels: Real-Time Databases; Analytic/Search Databases; Data Movement; ETL Process
    • Legacy RDBMS ≠ Big Data „Big data is comprised of (1) Velocity – how fast the data is coming in; (2) Variety – all types are now being captured; (3) Volume – TBs to PBs of data; (4) Complexity – multi-location, data center, etc.“ “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
    • Trends & Challenges in Data Management. Trends: Exponential Data Growth; Cloud; Semi-Structured Data; More Connected Data. Store types: Key Value; Wide Column; Document; Graph
    • Trends & Challenges in Data Management. Trends: Exponential Data Growth; Cloud; Semi-Structured Data; More Connected Data. Store types: Key Value; Wide Column (Apache Cassandra); Document; Graph
    • Apache Cassandra A massively scalable, decentralized, structured data store (aka database). Project history:
    • Cassandra is… an O(1) Distributed Hash Table; Sharding, Replication; Elastic; Fault tolerant; No Single Point of Failure; Durable. Diagram: ring of nodes A to H with tokens A 0, B 4, C 8, D 12, E 16, F 20, G 24, H 28
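
The token ring on this slide can be illustrated with a small lookup routine. This is only a sketch under stated assumptions: the node tokens come from the slide, but the ring size of 32, the MD5-based toy hash, and the "next nodes clockwise" replica placement are assumptions made for illustration and are not Cassandra's actual partitioner or replication strategy code.

```python
import bisect
import hashlib
from typing import List

# Node tokens as shown on the slide.
TOKENS = {"A": 0, "B": 4, "C": 8, "D": 12, "E": 16, "F": 20, "G": 24, "H": 28}
RING_SIZE = 32  # assumption: the slide does not state the size of the token space

_pairs = sorted((token, node) for node, token in TOKENS.items())
_tokens = [token for token, _ in _pairs]
_nodes = [node for _, node in _pairs]


def key_token(key: str) -> int:
    """Map a row key onto the ring (toy hash, assumed for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for_token(token: int, replication_factor: int = 3) -> List[str]:
    """Return the owning node and the following nodes clockwise on the ring.

    A node with token T is assumed to own the range (previous token, T],
    so the owner is the first node whose token is >= the key's token,
    wrapping around to the first node past the largest token.
    """
    start = bisect.bisect_left(_tokens, token) % len(_tokens)
    return [_nodes[(start + i) % len(_nodes)] for i in range(replication_factor)]


def replicas_for_key(key: str, replication_factor: int = 3) -> List[str]:
    return replicas_for_token(key_token(key), replication_factor)
```

With these assumptions, a key hashing to token 5 lands on node C and is replicated to D and E, while a token above 28 wraps around to node A. The lookup itself needs no central directory, which is what makes the ring free of a single point of failure.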
    • Cassandra is… C AP-System (CAP Theorem)  Eventual consistency A P Tunable trade-offs:  Consistency vs. Latency  Choose between synchronous or asynchronous replication for each update C = Consistency A = High Availability P = Partitioning Tolerance
    • Cassandra is… a BigTable clone; no schema; well suited for semi-structured data and sparse data. Diagram: a Keyspace contains Column Families; a Column Family maps each row key to a Row of Columns; in a super Column Family, each Row contains SuperColumns, which in turn contain Columns
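
The nesting on this slide can be pictured as maps within maps. The following is a minimal sketch using plain Python dictionaries; the column family name, row keys, and column values are hypothetical example data, and only the nesting (keyspace, column family, row key, column) mirrors the slide.

```python
# Hypothetical example data: keyspace -> column family -> row key -> columns.
keyspace = {
    "Users": {  # column family
        "alice": {"email": "alice@example.org", "city": "Berlin"},
        "bob": {"email": "bob@example.org"},  # sparse row: no "city" column
    },
}


def get_column(ks, column_family, row_key, column, default=None):
    """Look up one column value; absent rows or columns yield the default."""
    return ks.get(column_family, {}).get(row_key, {}).get(column, default)


def put_column(ks, column_family, row_key, column, value):
    """Insert a column without declaring it beforehand (no fixed schema)."""
    ks.setdefault(column_family, {}).setdefault(row_key, {})[column] = value
```

Rows in the same column family may carry different columns, which is why the model fits sparse and semi-structured data.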
    • Cassandra-based Big Data Solution: real-time queries with Cassandra; distributed search with Solr; analytics with Hadoop MapReduce. Diagram: Cassandra cluster (replication) as a ring of nodes A to H, with A, B, C labeled Real-time (Cassandra), D, E, F labeled Search (Solr), and G, H labeled Analytics (Hadoop)
    • Summary Apache Cassandra is a elastic scalable, fault- tolerant data store Tunable consistency levels Wide Column: flexible datamodel without schema Supports: real-time queries, analytics through Hadoop integration, Solr-based fulltext search
    • Thank you! Q&A