#theedge2012




Practical Introduction to Cassandra

Sonia Margulis
@robosonia
March 2012
Your Application
Gone Viral
Best Hardware Money Can Buy
Improve Reads
Sharding RDBMS – A Nightmare
Cassandra’s Sweet Spot

 » Many concurrent users
 » High volumes of operations
 » Linear scalability
 » Distributed
 » Inherently clustered
The Road to Mastership: Introduction to Cassandra

 » Introduction to Cassandra
 » Running a Server
 » Data Model
 » Communicating with the Server
 » Modeling Data
 » Growing a Cluster
A non-relational database

Values availability

Scales out, not up

Open source

Active community
Always
Available
Who Uses It?
Use Case: Social & Timelines
Use Case: Statistics & Logs




Logs by Rick Payette
The Road to Mastership: Running a Server
The Cassandra Project

 »          Project
 » Runs on:
 » Apache License
 » Current release: 1.0.8
                            You are
                             here



  sonia@hiro:~/apache-cassandra-1.0.8$
Running a Server

  sonia@hiro:~/apache-cassandra-1.0.8$
  bin/cassandra -f




 ....
 Now serving reads.
 localhost/127.0.0.1:9160
Connecting to Our Server

 Cassandra command line interface (CLI) tool
 sonia@hiro:~/apache-cassandra-1.0.8$
 bin/cassandra-cli --host 127.0.0.1 --port 9160

 Connected to: "Test Cluster" on
 localhost/9160
 Welcome to Cassandra CLI version 1.0.8
Creating a Keyspace

 Cassandra’s equivalent of an RDBMS database
 [default@unknown] create keyspace demo;




 Let's start using it
 [default@unknown] use demo;
 [default@demo]
Creating a Column Family

 A column family holds data, much like a table in
 an RDBMS.
 [default@demo] create column family user;



 Start adding data
 [default@demo] set user[1][a]=utf8('foo');

 [default@demo] set user[2][b]=utf8('bar');
 [default@demo] set user[2][c]=utf8('test');
Retrieving Data

 Retrieving columns by user key
 [default@demo] get user[2];



   (column=b, value=bar)
   (column=c, value=test)
 Returned 2 results.
The Road to Mastership: Data Model
Column

 Name   Value
 name   Peter Parker
Row

 Row Id      Columns
             icon   name           residence
 spiderman          Peter Parker   New York



 1                  2
 spiderman              name            Peter Parker
Column Family

 Row id      icon   name      residence
 spiderman          Peter P   New York
 batman             Bruce W   Gotham
 hulk               Bruce B   New York
Column Family

 set user['spiderman']['name'] = 'Peter Parker'

 » user: column family
 » spiderman: row id
 » name: column name
 » Peter Parker: value
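
One way to picture the structure above is as a map of maps: a row id points to a sorted set of named columns. The following is a minimal in-memory sketch of that idea only. The class `ColumnFamilySketch` and its `set`/`get` methods are hypothetical names chosen to mirror the CLI commands; they are not part of any Cassandra API, and real columns also carry a timestamp and live on several nodes.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative model only: row id -> (column name -> value).
// Hypothetical class, not a Cassandra API.
class ColumnFamilySketch {
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Mirrors the CLI form: set user['spiderman']['name'] = 'Peter Parker'
    void set(String rowId, String column, String value) {
        rows.computeIfAbsent(rowId, k -> new TreeMap<>()).put(column, value);
    }

    // Mirrors the CLI form: get user['spiderman']
    // Returns an empty map when the row does not exist.
    SortedMap<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("spiderman", "name", "Peter Parker");
        user.set("spiderman", "residence", "New York");
        user.set("batman", "residence", "Gotham");
        System.out.println(user.get("spiderman"));
    }
}
```

Each row keeps its own set of columns, so two rows in the same column family need not share column names.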
The Allies Column Family

 batman      Robin    Alfred
 spiderman   Iceman   Firestar   Iron Man   Storm
Published Issues Column Family

 spiderman (~2600 columns)   1/8/1962   ...   1/3/2012   8/3/2012
                             ###        ...   ###        ###
 batman (~3800 columns)      1/5/1939   ...   2/3/2012   9/3/2012
                             ###        ...   ###        ###
Model Flexibility




 Flexible
 Data Model
                    Image: photostock / FreeDigitalPhotos.net
Keyspace

 » Like RDBMS database
 » A container for column families
 [default@unknown] create keyspace demo;




 » One keyspace per application, in most cases
Expiring Columns – TTL


 Row id      icon   name      passwd_reminder   residence
 spiderman          Peter P   abcd              New York

  set users['spiderman']['passwd_reminder'] =
  'abcd' with ttl = 7200;

                              7200s = 2 hours
Distributed Counters


 Row id         speakers   sessions
 javaedge.com   1035       3402

 incr page_views['javaedge.com']['speakers'] by 1

 get page_views['javaedge.com']['speakers']
The Road to Mastership: Communicating with the Server (Clients)
Cassandra Query Language

 » Looks a lot like SQL
 INSERT INTO users (KEY, name, universe)
            VALUES ('hulk', 'Bruce', 'marvel')

 » Mostly valid SQL
 SELECT name, universe
 FROM users
 WHERE KEY = 'hulk'
Advantages of using CQL

 » Run ad-hoc queries
 » Very familiar, easier to use
 » Stable interface
   ▪ For library developers
   ▪ For users
CQL Example

 SELECT name, residence FROM users

 SELECT 01/1/2011 .. 1/1/2012
 FROM published_issues
 WHERE KEY = 'spiderman'

 SELECT FIRST 5
 FROM allies
 WHERE KEY = 'spiderman'
Cassandra JDBC Driver

 import java.sql.*;



 Class.forName(
   "org.apache.cassandra.cql.jdbc.CassandraDriver");
 Connection con = DriverManager.getConnection(
   "jdbc:cassandra://localhost:9160/keyspace");
Cassandra JDBC Driver

 Statement stmt = con.createStatement();
 ResultSet rs = stmt.executeQuery(
   "SELECT name, residence " +
   "FROM users " +
   "WHERE KEY = '" + key + "'");
Cassandra JDBC Driver




        JDBC
Hector

 SliceQuery<...> query =
     HFactory.createSliceQuery(keyspace, ...);

 query.setRange(startDate, endDate, false, 100)
     .setColumnFamily("published_issues")
     .setKey("spiderman");

 QueryResult<ColumnSlice<Date, String>> result =
     query.execute();
Hector: Advanced Features

 » Failover support
 » Connection pooling
 » Load balancing
 » JMX counters
 » Object mapper
Maven plugin

 mvn cassandra:start


                  Run your tests


 mvn cassandra:cql-exec

 mvn cassandra:stop
The Road to Mastership: Modeling Data
Queries First

 » Use the same Column Family for data that
   should be fetched together
   ▪ Reduces IO
 » Consider filtering and ordering
Denormalize

 » Fewer seeks, faster reads
 » Storing redundant data
   ▪ Manually handling data integrity
 » Disk space is cheaper than seek time
Secondary Index
 » Requirement:
   Find all superheroes that live in New York

 Row id      icon   name           residence
 spiderman          Peter Parker   New York

 create column family users
 ... and column_metadata=
 [{column_name: residence, index_type: KEYS}];

 » Good for indexes with low cardinality
 SELECT name
 FROM users
 WHERE residence = 'New York'
Manually Managed Index

 » Requirement:
   Find a superhero by name

 Search term   Keys in users CF
 Bruce         hulk, batman
 Peter         spiderman

 » Manually maintain an inverted index
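
The inverted index on this slide can be sketched as a second lookup structure that the application writes to whenever it writes a user. This is a minimal in-memory sketch under that assumption; `NameIndexSketch`, `add` and `lookup` are hypothetical names, and in Cassandra the index would itself be a column family (one row per search term, one column per user key) rather than a Java map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: search term -> row keys in the users column family.
class NameIndexSketch {
    private final Map<String, Set<String>> index = new HashMap<>();

    // The application must call this itself on every user write;
    // nothing keeps the index in sync automatically.
    void add(String firstName, String userKey) {
        index.computeIfAbsent(firstName, k -> new TreeSet<>()).add(userKey);
    }

    // Returns the user keys filed under the search term, or an empty set.
    Set<String> lookup(String firstName) {
        return index.getOrDefault(firstName, Collections.emptySet());
    }

    public static void main(String[] args) {
        NameIndexSketch byName = new NameIndexSketch();
        byName.add("Bruce", "hulk");
        byName.add("Bruce", "batman");
        byName.add("Peter", "spiderman");
        System.out.println(byName.lookup("Bruce"));
    }
}
```

The price of the fast lookup is the extra write: renaming or deleting a user means updating the index entry as well.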
Bucketing

 All issues, bucketed by month:

 hulk_jan_2012   1/1/2012   2/1/2012    4/1/2012
                 Issue-1    Issue-2     Issue-3
 hulk_feb_2012   2/2/2012   28/2/2012   29/2/2012
                 Issue-4    Issue-5     Issue-6
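
Bucketing comes down to deriving the row key from both the entity and a time bucket. A minimal sketch of that derivation follows, assuming the `hero_month_year` key format shown on the slide; `BucketKeySketch` and `rowKey` are hypothetical names.

```java
import java.time.LocalDate;

// Hypothetical helper: builds row keys such as "hulk_jan_2012".
class BucketKeySketch {
    private static final String[] MONTHS = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };

    // One row per hero per calendar month keeps any single row bounded in size.
    static String rowKey(String hero, LocalDate published) {
        String month = MONTHS[published.getMonthValue() - 1];
        return hero + "_" + month + "_" + published.getYear();
    }

    public static void main(String[] args) {
        // Slide dates are written day/month/year: 4/1/2012 is 4 January 2012.
        System.out.println(rowKey("hulk", LocalDate.of(2012, 1, 4)));
        System.out.println(rowKey("hulk", LocalDate.of(2012, 2, 29)));
    }
}
```

Reading a date range then means computing the bucket keys the range touches and querying each of those rows.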
The Road to Mastership: Cassandra Cluster
Virtual Ring

 (ring diagram: five nodes with tokens 10, 40, 60, 75 and 90)
Node Token

 Node   Keys
 10     91-10
 40     11-40
 60     41-60
 75     61-75
 90     76-90
Node Token

 MD5’(hulk) = 20, so hulk is stored on node 40
 MD5’(thor) = 42, so thor is stored on node 60
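
The lookup these slides walk through can be sketched in a few lines: a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around at the end of the ring. The sketch below takes the key's token as given, just as the slides do with MD5’; `TokenRingSketch` and `ownerOf` are hypothetical names, and this is a toy model, not Cassandra's partitioner code.

```java
import java.util.TreeSet;

// Toy model of the slide's five-node ring; hypothetical class.
class TokenRingSketch {
    private final TreeSet<Integer> tokens = new TreeSet<>();

    TokenRingSketch(int... nodeTokens) {
        for (int t : nodeTokens) {
            tokens.add(t);
        }
    }

    // First node whose token is >= the key's token;
    // past the highest token the ring wraps to the lowest one.
    int ownerOf(int keyToken) {
        Integer owner = tokens.ceiling(keyToken);
        return owner != null ? owner : tokens.first();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch(10, 40, 60, 75, 90);
        System.out.println("hulk (token 20) -> node " + ring.ownerOf(20));
        System.out.println("thor (token 42) -> node " + ring.ownerOf(42));
        System.out.println("token 95 -> node " + ring.ownerOf(95));
    }
}
```

With the slide's tokens, token 20 lands on node 40 and token 42 on node 60, and a token of 95 wraps around to node 10, matching the 91-10 range in the table.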
Inter-Node Communication

 » Gossip
 » Failure Detection
Fault Tolerance
» Replication factor
» Hinted Handoff
Replication Factor
» Replication factor
» Hinted Handoff

 Replication factor = 3

 Key    Stored on nodes
 hulk   40, 60, 75
 thor   60, 75, 90
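
The placement in this picture is consistent with the simplest possible rule: store the first copy on the node that owns the key's token, then put the remaining copies on the next nodes clockwise. The sketch below assumes exactly that rule and a non-empty ring; `ReplicaPlacementSketch` and `replicas` are hypothetical names, and other placement strategies are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of "owner plus next nodes clockwise" placement.
class ReplicaPlacementSketch {
    // Assumes the ring holds at least one token.
    static List<Integer> replicas(TreeSet<Integer> ring, int keyToken, int replicationFactor) {
        List<Integer> result = new ArrayList<>();
        int copies = Math.min(replicationFactor, ring.size());
        Integer node = ring.ceiling(keyToken);
        if (node == null) {
            node = ring.first(); // wrap around the ring
        }
        while (result.size() < copies) {
            result.add(node);
            node = ring.higher(node);
            if (node == null) {
                node = ring.first(); // wrap around the ring
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ring = new TreeSet<>(Arrays.asList(10, 40, 60, 75, 90));
        System.out.println("hulk -> " + replicas(ring, 20, 3));
        System.out.println("thor -> " + replicas(ring, 42, 3));
    }
}
```

For the slide's ring and a replication factor of 3 this yields nodes 40, 60, 75 for hulk and 60, 75, 90 for thor.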
Hinted Handoff
» Replication factor
» Hinted Handoff
Client Requests

 (ring diagram: a write request arrives at the coordinator node)
Consistency Level

 » Write request with consistency level = ONE
 » Write request with consistency level = ALL
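
The difference between the two levels is how many replica acknowledgements the coordinator waits for before reporting success to the client. A minimal sketch of that rule, covering only the two levels shown on these slides; `ConsistencySketch`, `requiredAcks` and `writeSucceeds` are hypothetical names, not driver or server API.

```java
// Hypothetical sketch of the acknowledgement rule for ONE and ALL.
class ConsistencySketch {
    // Only the two levels that appear on the slides.
    enum Level { ONE, ALL }

    // ONE waits for a single replica; ALL waits for every replica.
    static int requiredAcks(Level level, int replicationFactor) {
        return level == Level.ONE ? 1 : replicationFactor;
    }

    static boolean writeSucceeds(Level level, int replicationFactor, int acksReceived) {
        return acksReceived >= requiredAcks(level, replicationFactor);
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int acks = 2; // assume one of the three replicas is down
        System.out.println("ONE: " + writeSucceeds(Level.ONE, replicationFactor, acks));
        System.out.println("ALL: " + writeSucceeds(Level.ALL, replicationFactor, acks));
    }
}
```

With one replica down out of three, a write at ONE still succeeds while a write at ALL does not, which is the availability versus consistency trade-off the level controls.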
The Road to Mastership: Summary
Where Do You Sign?

 » Cassandra
   ▪ http://cassandra.apache.org
   ▪ http://www.datastax.com/
      • Docs, tutorials & videos
   ▪ IRC: #cassandra on freenode
 » Hector
   ▪ https://github.com/rantav/hector
   ▪ https://github.com/zznate/hector-examples

Cassandra Intro -- TheEdge2012

  • 1. #theedge2012 Practical Introduction To Sonia Margulis @robosonia March 2012
  • 6. Sharding RDBMS – A Nightmare
  • 7. Cassandra’s Sweet Spot Many Linear concurrent Scalability users Distributed High Volumes Inherently of Operations Clustered
  • 8. The Road to Mastership Introduction to Cassandra Introduction to Cassandra Data Running a Model Server Modeling Data Communicating with the Server Growing a Cluster
  • 9. A non-relational database Values availability Scales out, not up Open source Active community
  • 12. Use Case: Social & Timelines
  • 13. Use Case: Statistics & Logs Logs by Rick Payette
  • 14. The Road to Mastership Introduction to Cassandra Running a Server Data Running a Model Server Modeling Data Communicating with the Server Growing a Cluster
  • 15. The Cassandra Project » Project » Runs on: » Apache License » Current release: 1.0.8 You are here sonia@hiro:~/apache-cassandra-1.0.8$
  • 16. Running a Server sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f .... Now serving reads. localhost/127.0.0.1:9160
  • 17. Connecting to Our Server: Cassandra command line interface (CLI) tool sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli --host 127.0.0.1 --port 9160 Connected to: "Test Cluster" on localhost/9160 Welcome to Cassandra CLI version 1.0.8
  • 18. Creating a Keyspace: Cassandra’s equivalent of an RDBMS database. [default@unknown] create keyspace demo; Let’s start using it: [default@unknown] use demo; [default@demo]
  • 19. Creating a Column Family: A column family holds data, much like a table in an RDBMS. [default@demo] create column family user; Start adding data: [default@demo] set user[1][a]=utf8('foo'); [default@demo] set user[2][b]=utf8('bar'); [default@demo] set user[2][c]=utf8('test');
  • 20. Retrieving Data Retrieving columns by user key [default@demo] get user[2]; (column=b, value=bar) (column=c, value=test) Returned 2 results.
  • 21. The Road to Mastership. Current section: Data Model
  • 22. Column: a column name and a value
  • 23. Column example: name = Peter Parker
  • 24. Row (diagram: icon, name, residence; spiderman, Peter Parker, New York)
  • 25. Row: Row Id and Columns (diagram: icon, name, residence; spiderman, Peter Parker, New York)
  • 26. Column Family: spiderman (icon, name = Peter P, residence = New York); batman (icon, name = Bruce W, residence = Gotham); hulk (icon, name = Bruce B, residence = New York)
  • 27. Column Family: set user['spiderman']['name'] = 'Peter Parker', where user is the column family, spiderman the row id, name the column name, and Peter Parker the value (rows shown: spiderman, batman, hulk)
  • 28. The Allies Column Family: batman (Robin, Alfred); spiderman (Iceman, Firestar, Iron Man, Storm)
  • 29. Published Issues Column Family: spiderman, ~2600 columns (1/8/1962, ..., 1/3/2012, 8/3/2012); batman, ~3800 columns (1/5/1939, ..., 2/3/2012, 9/3/2012); column values shown as ###
  • 30. Model Flexibility Flexible Data Model Image: photostock / FreeDigitalPhotos.net
  • 31. Keyspace » Like RDBMS database » A container for column families [default@unknown] create keyspace demo; » One keyspace per application, in most cases
  • 32. Expiring Columns – TTL: spiderman (icon, name = Peter P, passwd_reminder = abcd, residence = New York). set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200; 7200s = 2 hours
  • 33. Distributed Counters: javaedge.com (speakers = 1035, sessions = 3402). incr page_views['javaedge.com']['speakers'] by 1; get page_views['javaedge.com']['speakers'];
  • 34. The Road to Mastership. Current section: Communication with the Server: Clients
  • 35. Cassandra Query Language » Looks a lot like SQL: INSERT INTO users (KEY, name, universe) VALUES (hulk, Bruce, marvel) » Mostly valid SQL: SELECT name, universe FROM users WHERE KEY = 'hulk'
  • 36. Advantages of using CQL » Run ad-hoc queries » Very familiar, easier to use » Stable interface ▪ For library developers ▪ For users
  • 37. CQL Example: SELECT name, residence FROM users; SELECT 01/1/2011 .. 1/1/2012 FROM published_issues WHERE KEY = 'spiderman'; SELECT FIRST 5 FROM allies WHERE KEY = 'spiderman';
  • 40. Cassandra JDBC Driver import java.sql.*; Class.forName( "org.apache.cassandra.cql.jdbc.CassandraDriver"); Connection con = DriverManager.getConnection( "jdbc:cassandra://localhost:9160/keyspace");
  • 41. Cassandra JDBC Driver Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery( "SELECT name, residence FROM users WHERE KEY = '" + key + "'");
  • 43. Hector SliceQuery<...> query = HFactory.createSliceQuery(keyspace, ...); query.setRange(startDate, endDate, false, 100) .setColumnFamily("published_issues") .setKey("spiderman"); QueryResult<ColumnSlice<Date, String>> result = query.execute();
  • 44. Hector: Advanced Features » Failover support » Connection pooling » Load balancing » JMX counters » Object mapper
  • 45. Maven plugin mvn cassandra:start Run your tests mvn cassandra:cql-exec mvn cassandra:stop
  • 46. The Road to Mastership. Current section: Modeling Data
  • 47. Queries First » Use the same Column Family for data that should be fetched together ▪ Reduces IO » Consider filtering and ordering
  • 48. Denormalize » Fewer seeks, faster reads » Storing redundant data ▪ Manually handling data integrity » Disk space is cheaper than seek time
  • 49. Secondary Index » Requirement: Find all superheroes that live in New York icon name residence spiderman Peter Parker New York
  • 50. Secondary Index » Requirement: Find all superheroes that live in New York (row: spiderman, name = Peter Parker, residence = New York) create column family users ... and column_metadata= [{column_name: residence, index_type: KEYS}]; SELECT name FROM users WHERE residence = 'New York' » Good for indexes with low cardinality
  • 51. Manually Managed Index » Requirement: Find a superhero by name
  • 52. Manually Managed Index » Requirement: Find a superhero by name » Manually maintain an inverted index from search term to keys in the users CF: Bruce → hulk, batman; Peter → spiderman
  • 53. Bucketing all issues by month: hulk_jan_2012 (1/1/2012 = Issue-1, 2/1/2012 = Issue-2, 4/1/2012 = Issue-3); hulk_feb_2012 (2/2/2012 = Issue-4, 28/2/2012 = Issue-5, 29/2/2012 = Issue-6)
  • 54. The Road to Mastership. Current section: Cassandra Cluster
  • 55. Virtual Ring (nodes at tokens 10, 40, 60, 75, 90)
  • 56. Node Token. Node → Keys: 10 → 91-10; 40 → 11-40; 60 → 41-60; 75 → 61-75; 90 → 76-90
  • 57. Node Token: MD5’(hulk) = 20 (ring: 10, 40, 60, 75, 90)
  • 58. Node Token: MD5’(hulk) = 20, so hulk is placed on the ring between nodes 10 and 40
  • 59. Node Token: MD5’(thor) = 42 (ring: 10, 40, 60, 75, 90; hulk already placed)
  • 60. Node Token: MD5’(thor) = 42, so thor is placed on the ring between nodes 40 and 60
  • 61. Inter-Node Communication » Gossip » Failure Detection (ring diagram: nodes 10, 40, 60, 75, 90)
  • 62. Fault Tolerance » Replication factor » Hinted Handoff (ring diagram with keys hulk and thor)
  • 63. Replication Factor » Replication factor = 3: hulk and thor are each stored on three nodes » Hinted Handoff
  • 64. Fault Tolerance » Replication factor » Hinted Handoff (ring diagram)
  • 65. Hinted Handoff » Replication factor » Hinted Handoff (ring diagram)
  • 67. Client Requests: a write request reaches a coordinator node (ring diagram)
  • 68. Consistency Level: consistency level = ONE for a write request (ring diagram)
  • 69. Consistency Level: consistency level = ALL for a write request (ring diagram)
  • 70. The Road to Mastership. Current section: Summary
  • 72. Where Do You Sign? » Cassandra ▪ http://cassandra.apache.org ▪ http://www.datastax.com/ (docs, tutorials & videos) ▪ IRC: #cassandra on freenode » Hector ▪ https://github.com/rantav/hector ▪ https://github.com/zznate/hector-examples
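
Slides 19 to 29 and the note "Sparse nested hashtables" describe a column family as rows keyed by a row id, where each row holds its own sorted set of columns. Below is a minimal in-memory Java sketch of that idea. The class `ColumnFamilySketch` and its `set` and `get` methods are hypothetical names chosen to mirror the CLI commands on slides 19 and 20; they are not part of Cassandra or of any client library, and the sketch ignores persistence, timestamps and distribution.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only: a column family modeled as nested maps.
class ColumnFamilySketch {
    // Outer map: row id -> row. Inner map: column name -> value, kept sorted by column name.
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    // Mirrors "set user[rowKey][column] = value": creates the row on first use.
    void set(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Mirrors "get user[rowKey]": returns the row's columns in sorted order,
    // or an empty map if the row does not exist (rows are sparse).
    SortedMap<String, String> get(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row != null
                ? Collections.unmodifiableSortedMap(row)
                : Collections.emptySortedMap();
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("1", "a", "foo");
        user.set("2", "c", "test");
        user.set("2", "b", "bar");
        // Row 2 has columns b and c only; row 1 has column a only.
        System.out.println(user.get("2"));
    }
}
```

Each row carries only the columns that were written to it, which is what makes the model sparse, and the inner `TreeMap` keeps columns ordered by name, which is what makes range queries such as the date slice on slide 37 natural.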

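Slides 55 to 63 place nodes on a virtual ring by token and assign each key to the node that owns the range containing the key's token. The following Java sketch works through that lookup using the slides' simplified tokens (10, 40, 60, 75, 90). `TokenRing`, `primaryFor` and `replicasFor` are hypothetical names, and the replica rule used here (the owner plus the next nodes clockwise) is an assumption made for illustration; the slides do not state which placement strategy the diagrams use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of the slides' ring; not Cassandra's actual code.
class TokenRing {
    private final TreeSet<Integer> nodeTokens = new TreeSet<>();

    TokenRing(int... tokens) {
        for (int t : tokens) {
            nodeTokens.add(t);
        }
    }

    // The owner is the first node whose token is >= the key's token,
    // wrapping around to the lowest token past the end of the ring.
    int primaryFor(int keyToken) {
        Integer owner = nodeTokens.ceiling(keyToken);
        return owner != null ? owner : nodeTokens.first();
    }

    // Assumed replica rule: the owner plus the next nodes clockwise,
    // capped at the number of nodes in the ring.
    List<Integer> replicasFor(int keyToken, int replicationFactor) {
        List<Integer> replicas = new ArrayList<>();
        int count = Math.min(replicationFactor, nodeTokens.size());
        if (count <= 0) {
            return replicas;
        }
        int current = primaryFor(keyToken);
        while (replicas.size() < count) {
            replicas.add(current);
            Integer next = nodeTokens.higher(current);
            current = next != null ? next : nodeTokens.first();
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing(10, 40, 60, 75, 90);
        System.out.println(ring.primaryFor(20));     // token of hulk on slide 57
        System.out.println(ring.replicasFor(42, 3)); // token of thor on slide 59
    }
}
```

With the table on slide 56, token 20 falls in the range 11-40 and token 42 in the range 41-60, which is the lookup `primaryFor` performs.
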
Editor's Notes

  1. Your application has gone viral; the number of users doubles every week.
  2. Sparse nested hashtables
  3. Keywords: the columns are sorted.
  4. Columns are stored in rows. Rows are indexed by row id; this is the primary index in Cassandra. Keywords: the column as the main tool for storing data. Up to 2 billion columns.
  5. All the imports are java.sql; you only need to make sure that your SQL starts with a C.
  6. 128 bit = 16 byte. Natural sharding of the data.
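
The last note ("128 bit = 16 byte") refers to the size of an MD5 digest, which the ring slides use to turn a row key into a token. Here is a small Java sketch of that step, assuming UTF-8 encoded keys and reading the digest as an unsigned integer. `Md5Token` is a hypothetical helper name, and the way Cassandra's own partitioner turns the digest into a token may differ in detail.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a row key into a 128-bit token.
class Md5Token {
    // Returns the raw MD5 digest of the key: always 16 bytes (128 bits).
    static byte[] digest(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Interprets the digest as a non-negative integer (signum 1 = unsigned).
    static BigInteger token(String rowKey) {
        return new BigInteger(1, digest(rowKey));
    }

    public static void main(String[] args) {
        System.out.println(digest("hulk").length + " bytes");
        System.out.println(token("hulk").toString(16));
    }
}
```

Because the hash spreads keys roughly evenly over the token space, rows are divided among the nodes without manual sharding, which is the "natural sharding" the note mentions.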