Real-Time Big Data in
practice with Cassandra

Michaël Figuière
@mfiguiere
Speaker

                 Michaël Figuière




                      @mfiguiere


©2012 DataStax                      2
Ring Architecture


                        Node     Node




                 Node     Cassandra     Node




                        Node
                                Node




©2012 DataStax                                 3
Ring Architecture


                         Node     Replica




                 Node
                                            Replica




                        Node
                                Replica




©2012 DataStax                                        4
Linear Scalability
                 Client Writes/s by Node Count - Replication Factor = 3




©2012 DataStax                                                            5
Client / Server Communication

          Client   ?           Node     Replica



          Client

                       Node
                                                  Replica
          Client


                              Node
          Client                      Replica




©2012 DataStax                                              6
Client / Server Communication

          Client           Node         Replica



          Client

                   Node
                                                  Replica
          Client


                          Node
          Client                      Replica




                   Coordinator node:
                   Forwards all R/W requests
                   to corresponding replicas
©2012 DataStax                                              7
Tunable Consistency
                                    Time


                        A   A   A


           3 replicas




©2012 DataStax                             8
Tunable Consistency
                                                            Time


                        A   A      A


                        B   A      A


            Write ‘B’




                                    Write and wait for
                                acknowledge from one node
©2012 DataStax                                                     9
Tunable Consistency
                                                                  Time
                               R +W < N
                          A       A       A


                          B       A       A


                          B       A       A




   Read waiting for one node              Write and wait for
          to answer                   acknowledge from one node
©2012 DataStax                                                       10
Tunable Consistency
                                                                    Time
                               R +W = N
                          A       A       A


                          B       B       A


                          B       B       A




   Read waiting for one node               Write and wait for
          to answer                   acknowledges from two nodes
©2012 DataStax                                                         11
Tunable Consistency
                                                                    Time
                               R +W > N
                         A        A       A


                         B        B       A


                         B        B       A




  Read waiting for two nodes               Write and wait for
          to answer                   acknowledges from two nodes
©2012 DataStax                                                         12
Tunable Consistency
                                        Time
                  R = W = QUORUM
                   A      A      A


                   B      B      A


                   B      B      A




                 QUORUM = (N / 2) + 1


©2012 DataStax                             13
Request Path
                       1
          Client                   Node             Replica
                                              2

                                                   3
          Client
                   4                                    2
                           Node
                                                              Replica
                                                        3
          Client
                                                    2
                                          3
                                  Node
          Client                                  Replica




                           Coordinator node

©2012 DataStax                                                          14
Column Family Data Model


                             name        email         address    state
                  jbellis
                            Jonathan   jb@ds.com      123 main    TX
                             name        email         address    state
                  dhutch
                             Daria     dh@ds.com      45 2nd st   CA
                             name        email
                 egilmore
                              Eric     eg@ds.com


                 Row Key                         Columns




©2012 DataStax                                                            15
Column Family Data Model


                            dhutch     egilmore    datastax   mzcassie
                  jbellis

                            egilmore
                  dhutch

                            datastax   mzcassie
                 egilmore



                 Row Key                     Columns




©2012 DataStax                                                           16
CQL3 Data Model
    Timeline Table
         user_id     tweet_id      author                       body
         gmason        1765        phenry      Give me liberty or give me death
         gmason        1742      gwashington   I chopped down the cherry tree
       ahamilton       1797        jadams      A government of laws, not men
       ahamilton       1742      gwashington   I chopped down the cherry tree

       Partition     Remaining
         Key           Key




©2012 DataStax                                                                    17
CQL3 Data Model
    Timeline Table
         user_id     tweet_id     author                       body
         gmason        1765       phenry      Give me liberty or give me death
         gmason        1742     gwashington   I chopped down the cherry tree
       ahamilton       1797       jadams      A government of laws, not men
       ahamilton       1742     gwashington   I chopped down the cherry tree


    CQL
                   CREATE TABLE timeline (
                            user_id varchar,
                            tweet_id uuid,
                            author varchar,
                            body varchar,
                            PRIMARY KEY (user_id, tweet_id));


©2012 DataStax                                                                   18
CQL3 Data Model
    Timeline Table
         user_id         tweet_id        author                            body
         gmason            1765          phenry       Give me liberty or give me death
         gmason            1742        gwashington    I chopped down the cherry tree
       ahamilton           1797          jadams       A government of laws, not men
       ahamilton           1742        gwashington    I chopped down the cherry tree



Timeline Physical Layout
                 [1742, author]       [1742, body]        [1765, author]          [1765, body]
   gmason
                 gwashington      I chopped down the...      phenry        Give me liberty or give...
                 [1742, author]       [1742, body]        [1797, author]          [1797, body]
 ahamilton
                 gwashington      I chopped down the...      jadams        A government of laws...


©2012 DataStax                                                                                          19
Real-Time Analytics

   Google Analytics gives you
   immediate statistics about
         your website traffic




©2012 DataStax                  20
Web Analytics Data Model
   Analytics Table
             url     time      views   from_search   direct   from_referrer
      /index.html    12:00      354       300         20           34
      /index.html    12:01      402       333         25           44
    /contacts.html   12:00      23         3           0           20
    /contacts.html   12:01      20         4           1           15


    CQL
                     CREATE TABLE analytics (
                              url varchar,
                              time timestamp,
                              views counter,
                              from_search counter,
                              direct counter,
                              from_referrer counter,
                              PRIMARY KEY (url, time));
©2012 DataStax                                                                21
Web Analytics Data Model
   Analytics Table
             url     time      views   from_search   direct   from_referrer
      /index.html    12:00     354        300         20           34
      /index.html    12:01     402        333         25           44
    /contacts.html   12:00      23         3           0           20
    /contacts.html   12:01      20         4           1           15


    CQL

                     UPDATE analytics
                     SET views = views + 1,
                         from_search = from_search + 1
                     WHERE url = '/index.html'
                     AND   time = '2012-10-06 12:00';


©2012 DataStax                                                                22
Web Analytics Data Model
   Analytics Table
             url     time         views   from_search    direct   from_referrer
      /index.html    12:00         354       300          20           34
      /index.html    12:01         402       333          25           44
    /contacts.html   12:00         23         3            0           20
    /contacts.html   12:01         20         4            1           15


    CQL


                             SELECT * FROM analytics
                             WHERE url = '/index.html'




©2012 DataStax                                                                    23
Connect and Write

       Cluster cluster = Cluster.builder()
                         .addContactPoints("127.0.0.1", "127.0.0.2")
                         .build();

       Session session = cluster.connect();

       session.execute(
          "INSERT INTO user (user_id, name, email)
           VALUES (12345, 'johndoe', 'john@doe.com')"
       );




©2012 DataStax                                                     24
Read

             ResultSet rs = session.execute("SELECT * FROM user");

             List<CQLRow> rows = rs.fetchAll();

             for (CQLRow row : rows) {

                 String userId = row.getString("user_id");
                 String name = row.getString("name");
                 String email = row.getString("email");
             }




©2012 DataStax                                                       25
Object Mapping

     @Table("user_and_messages")   public enum Gender {
     public class User {
     	                             	   @EnumValue("m")
     	 @Column("user_id")          	   MALE,
     	 private String userId;      	
     	                             	   @EnumValue("f")
     	 private String name;        	   FEMALE;
     	                             }
     	 private String email;
     	
     	 private Gender gender;
     }




©2012 DataStax                                            26
Aggregation
  @Table("user_and_messages")         	 public class Message {
  public class User {
  	                                   	   	 private String title;
  	 @Column("user_id")                	   	
  	 private String userId;            	   	 private String body;
  	                                   	   }
  	 private String name;
  	
  	 private String email;
  	
  	 @GroupBy("user_id")
  	 private List<Message> messages;
  }



©2012 DataStax                                                      27
Inheritance
@Table("catalog")
                                        @InheritanceValue("tv")
@Inheritance({Phone.class, TV.class})
                                        public class TV
@InheritanceColumn("product_type")
                                        extends Product {
public abstract class Product {

                                        	 private float size;
	     @Column("product_id")
                                        }
	     private String productId;
	
	     private float price;
	
	     private String vendor;
	
	     private String model;

}

©2012 DataStax                                                  28
Online Business Intelligence

                      Storage for application            Distributed batch
                          in production                     processing



            Application                      Cassandra                       Hadoop



                          Using results in                 Storage for
                           production                        results




©2012 DataStax                                                                        29
Stay Tuned!

          blog.datastax.com
          @mfiguiere

NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

  • 1.
    Real-Time Big Datain practice with Cassandra Michaël Figuière @mfiguiere
  • 2.
    Speaker Michaël Figuière @mfiguiere ©2012 DataStax 2
  • 3.
    Ring Architecture Node Node Node Cassandra Node Node Node ©2012 DataStax 3
  • 4.
    Ring Architecture Node Replica Node Replica Node Replica ©2012 DataStax 4
  • 5.
    Linear Scalability Client Writes/s by Node Count - Replication Factor = 3 ©2012 DataStax 5
  • 6.
    Client / ServerCommunication Client ? Node Replica Client Node Replica Client Node Client Replica ©2012 DataStax 6
  • 7.
    Client / ServerCommunication Client Node Replica Client Node Replica Client Node Client Replica Coordinator node: Forwards all R/W requests to corresponding replicas ©2012 DataStax 7
  • 8.
    Tunable Consistency Time A A A 3 replicas ©2012 DataStax 8
  • 9.
    Tunable Consistency Time A A A B A A Write ‘B’ Write and wait for acknowledge from one node ©2012 DataStax 9
  • 10.
    Tunable Consistency Time R +W < N A A A B A A B A A Read waiting for one node Write and wait for to answer acknowledge from one node ©2012 DataStax 10
  • 11.
    Tunable Consistency Time R +W = N A A A B B A B B A Read waiting for one node Write and wait for to answer acknowledges from two nodes ©2012 DataStax 11
  • 12.
    Tunable Consistency Time R +W > N A A A B B A B B A Read waiting for two nodes Write and wait for to answer acknowledges from two nodes ©2012 DataStax 12
  • 13.
    Tunable Consistency Time R = W = QUORUM A A A B B A B B A QUORUM = (N / 2) + 1 ©2012 DataStax 13
  • 14.
    Request Path 1 Client Node Replica 2 3 Client 4 2 Node Replica 3 Client 2 3 Node Client Replica Coordinator node ©2012 DataStax 14
  • 15.
    Column Family DataModel name email address state jbellis Jonathan jb@ds.com 123 main TX name email address state dhutch Daria dh@ds.com 45 2nd st CA name email egilmore Eric eg@ds.com Row Key Columns ©2012 DataStax 15
  • 16.
    Column Family DataModel dhutch egilmore datastax mzcassie jbellis egilmore dhutch datastax mzcassie egilmore Row Key Columns ©2012 DataStax 16
  • 17.
    CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree Partition Remaining Key Key ©2012 DataStax 17
  • 18.
    CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree CQL CREATE TABLE timeline ( user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id)); ©2012 DataStax 18
  • 19.
    CQL3 Data Model Timeline Table user_id tweet_id author body gmason 1765 phenry Give me liberty or give me death gmason 1742 gwashington I chopped down the cherry tree ahamilton 1797 jadams A government of laws, not men ahamilton 1742 gwashington I chopped down the cherry tree Timeline Physical Layout [1742, author] [1742, body] [1765, author] [1765, body] gmason gwashington I chopped down the... phenry Give me liberty or give... [1742, author] [1742, body] [1797, author] [1797, body] ahamilton gwashington I chopped down the... jadams A government of laws... ©2012 DataStax 19
  • 20.
    Real-Time Analytics Google Analytics gives you immediate statistics about your website traffic ©2012 DataStax 20
  • 21.
    Web Analytics DataModel Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL CREATE TABLE analytics ( url varchar, time timestamp, views counter, from_search counter, direct counter, from_referrer counter, PRIMARY KEY (url, time)); ©2012 DataStax 21
  • 22.
    Web Analytics DataModel Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL UPDATE analytics SET views = views + 1, from_search = from_search + 1 WHERE url = '/index.html' AND time = '2012-10-06 12:00'; ©2012 DataStax 22
  • 23.
    Web Analytics DataModel Analytics Table url time views from_search direct from_referrer /index.html 12:00 354 300 20 34 /index.html 12:01 402 333 25 44 /contacts.html 12:00 23 3 0 20 /contacts.html 12:01 20 4 1 15 CQL SELECT * FROM analytics WHERE url = '/index.html' ©2012 DataStax 23
  • 24.
    Connect and Write Cluster cluster = Cluster.builder() .addContactPoints("127.0.0.1", "127.0.0.2") .build(); Session session = cluster.connect(); session.execute( "INSERT INTO user (user_id, name, email) VALUES (12345, 'johndoe', 'john@doe.com')" ); ©2012 DataStax 24
  • 25.
    Read ResultSet rs = session.execute("SELECT * FROM user"); List<CQLRow> rows = rs.fetchAll(); for (CQLRow row : rows) { String userId = row.getString("user_id"); String name = row.getString("name"); String email = row.getString("email"); } ©2012 DataStax 25
  • 26.
    Object Mapping @Table("user_and_messages") public enum Gender { public class User { @EnumValue("m") @Column("user_id") MALE, private String userId; @EnumValue("f") private String name; FEMALE; } private String email; private Gender gender; } ©2012 DataStax 26
  • 27.
    Aggregation @Table("user_and_messages") public class Message { public class User { private String title; @Column("user_id") private String userId; private String body; } private String name; private String email; @GroupBy("user_id") private List<Message> messages; } ©2012 DataStax 27
  • 28.
    Inheritance @Table("catalog") @InheritanceValue("tv") @Inheritance({Phone.class, TV.class}) public class TV @InheritanceColumn("product_type") extends Product { public abstract class Product { private float size; @Column("product_id") } private String productId; private float price; private String vendor; private String model; } ©2012 DataStax 28
  • 29.
    Online Business Intelligence Storage for application Distributed batch in production processing Application Cassandra Hadoop Using results in Storage for production results ©2012 DataStax 29
  • 30.
    Stay Tuned! blog.datastax.com @mfiguiere