Real-Time Big Data in
practice with Cassandra

Michaël Figuière
@mfiguiere
Speaker

Michaël Figuière
@mfiguiere

©2012 DataStax                      2
Ring Architecture

[Diagram: six nodes arranged in a ring, forming the Cassandra cluster]

Ring Architecture

[Diagram: the same ring of six nodes, three of which are marked as replicas]

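The replica placement pictured on the ring can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Cassandra's implementation: the class `TokenRing`, the node names, and the evenly spaced integer tokens are all hypothetical, and each node owns exactly one token. A key is stored on the first node at or after its token, then on the next nodes clockwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy token ring (illustrative names, not Cassandra's implementation).
final class TokenRing {
    // token -> node name, kept sorted so the ring can be walked clockwise
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String name, int token) {
        ring.put(token, name);
    }

    // Returns the nodes that store a key with the given token: the first node
    // at or after the token, then the following nodes clockwise.
    List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Integer token = ring.ceilingKey(keyToken);
        while (replicas.size() < wanted) {
            if (token == null) {
                token = ring.firstKey(); // wrap around the ring
            }
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
        }
        return replicas;
    }

    // Six nodes with hypothetical tokens 0, 100, ..., 500.
    static TokenRing sixNodeRing() {
        TokenRing r = new TokenRing();
        for (int token = 0; token < 600; token += 100) {
            r.addNode("n" + token, token);
        }
        return r;
    }

    public static void main(String[] args) {
        TokenRing ring = sixNodeRing();
        System.out.println(ring.replicasFor(150, 3)); // [n200, n300, n400]
        System.out.println(ring.replicasFor(550, 3)); // [n0, n100, n200]
    }
}
```

With a replication factor of 3, every key lands on three consecutive nodes, which matches the three replicas highlighted on the slide.
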
Linear Scalability

[Chart: Client Writes/s by Node Count, Replication Factor = 3]

Client / Server Communication

[Diagram: four clients facing the ring of nodes and replicas; a question mark between the first client and the ring asks which node to contact]

Client / Server Communication

[Diagram: four clients connected to nodes of the ring]

Coordinator node: forwards all R/W requests to the corresponding replicas.

Tunable Consistency

[Diagram: state of the 3 replicas over time]

    A   A   A

Tunable Consistency

[Diagram: state of the 3 replicas over time]

    A   A   A
    B   A   A      <- Write 'B'

Write: wait for an acknowledgement from one node.

Tunable Consistency

R + W < N

    A   A   A
    B   A   A
    B   A   A

Write: wait for an acknowledgement from one node.
Read: wait for one node to answer.

Tunable Consistency

R + W = N

    A   A   A
    B   B   A
    B   B   A

Write: wait for acknowledgements from two nodes.
Read: wait for one node to answer.

Tunable Consistency

R + W > N

    A   A   A
    B   B   A
    B   B   A

Write: wait for acknowledgements from two nodes.
Read: wait for two nodes to answer.

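The three cases (R + W < N, R + W = N, R + W > N) can be checked with a small worst-case simulation. This is a sketch under stated assumptions: the class `ConsistencyCheck` and its method names are hypothetical, the write of 'B' is assumed to have reached only the W replicas that acknowledged it, and a read is assumed to return the newest value among the R replicas it contacts.

```java
import java.util.Arrays;

// Worst-case model of a read that follows a write (illustrative only).
final class ConsistencyCheck {

    // Any R replicas must overlap any W replicas exactly when R + W > N.
    static boolean readSeesLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    // Worst case: the write reached the first w replicas and the read
    // contacts the last r replicas. Returns the value the read observes,
    // assuming 'B' is newer than 'A'.
    static char worstCaseRead(int r, int w, int n) {
        char[] replicas = new char[n];
        Arrays.fill(replicas, 'A');
        for (int i = 0; i < w; i++) {
            replicas[i] = 'B';
        }
        for (int i = n - r; i < n; i++) {
            if (replicas[i] == 'B') {
                return 'B';
            }
        }
        return 'A';
    }

    public static void main(String[] args) {
        System.out.println(worstCaseRead(1, 1, 3)); // A (R + W < N)
        System.out.println(worstCaseRead(1, 2, 3)); // A (R + W = N)
        System.out.println(worstCaseRead(2, 2, 3)); // B (R + W > N)
    }
}
```

Only the last configuration guarantees that the read overlaps the write on at least one replica.
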
Tunable Consistency

R = W = QUORUM

    A   A   A
    B   B   A
    B   B   A

QUORUM = (N / 2) + 1, using integer division

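The quorum formula is easy to get wrong for even replication factors, so here is a minimal Java sketch. The class `Quorum` and the helper `toleratedFailures` are hypothetical names introduced for illustration; the formula itself is the one on the slide, with N / 2 rounded down.

```java
// QUORUM = (N / 2) + 1 with integer division (illustrative helper class).
final class Quorum {

    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1; // Java int division rounds down
    }

    // Number of replicas that can be unavailable while a quorum is still reachable.
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));            // 2
        System.out.println(quorum(4));            // 3
        System.out.println(toleratedFailures(3)); // 1
        System.out.println(toleratedFailures(5)); // 2
    }
}
```

With R = W = QUORUM, R + W is always greater than N, so quorum reads overlap quorum writes.
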
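Request Path

[Diagram: numbered arrows between a client, the coordinator node, and the three replicas]

    1. Client to coordinator node
    2. Coordinator node to each replica
    3. Each replica back to the coordinator node
    4. Coordinator node back to the client
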
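The request path through the coordinator can be sketched as a synchronous, in-memory simulation. Everything here is an assumption made for illustration: the class `CoordinatorSketch`, the `Replica` type with its `up` flag, and the reading of the numbered arrows as request, forward, acknowledge, respond. A real coordinator works asynchronously over the network.

```java
import java.util.Arrays;
import java.util.List;

// In-memory sketch of a coordinator handling a write (illustrative only).
final class CoordinatorSketch {

    // Hypothetical replica; `up` simulates whether the node is reachable.
    static final class Replica {
        final String name;
        boolean up = true;
        String value = "A";

        Replica(String name) {
            this.name = name;
        }

        // Applies the write and acknowledges it, unless the node is down.
        boolean write(String newValue) {
            if (!up) {
                return false;
            }
            value = newValue;
            return true;
        }
    }

    // The client's request arrives here. The coordinator forwards it to all
    // replicas, counts acknowledgements, and reports success to the client
    // only if enough replicas answered.
    static boolean write(List<Replica> replicas, String value, int requiredAcks) {
        int acks = 0;
        for (Replica replica : replicas) {
            if (replica.write(value)) {
                acks++;
            }
        }
        return acks >= requiredAcks;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("r1"), new Replica("r2"), new Replica("r3"));
        replicas.get(2).up = false; // one replica is unreachable
        System.out.println(write(replicas, "B", 2)); // true
        System.out.println(write(replicas, "B", 3)); // false
    }
}
```
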
Column Family Data Model

    Row Key     Columns (name: value)
    jbellis     name: Jonathan   email: jb@ds.com   address: 123 main    state: TX
    dhutch      name: Daria      email: dh@ds.com   address: 45 2nd st   state: CA
    egilmore    name: Eric       email: eg@ds.com

Column Family Data Model

    Row Key     Columns
    jbellis     dhutch   egilmore   datastax   mzcassie
    dhutch      egilmore
    egilmore    datastax   mzcassie

CQL3 Data Model

Timeline Table

    user_id      tweet_id   author        body
    gmason       1765       phenry        Give me liberty or give me death
    gmason       1742       gwashington   I chopped down the cherry tree
    ahamilton    1797       jadams        A government of laws, not men
    ahamilton    1742       gwashington   I chopped down the cherry tree

user_id is the partition key; tweet_id is the remaining key.

CQL3 Data Model

CQL for the timeline table:

    CREATE TABLE timeline (
        user_id varchar,
        tweet_id uuid,
        author varchar,
        body varchar,
        PRIMARY KEY (user_id, tweet_id));

CQL3 Data Model

Timeline Physical Layout

    gmason       [1742, author]   [1742, body]              [1765, author]   [1765, body]
                 gwashington      I chopped down the...     phenry           Give me liberty or give...

    ahamilton    [1742, author]   [1742, body]              [1797, author]   [1797, body]
                 gwashington      I chopped down the...     jadams           A government of laws...

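The mapping from CQL3 rows to the physical layout can be sketched with nested sorted maps. This is an illustrative model, not the storage engine: the class `PhysicalLayout` and its methods are hypothetical, tweet ids are plain integers as in the slide's table, and cell names are compared as strings, which only sorts correctly here because all ids have the same number of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Models one wide row per partition key, with cells named [tweet_id, column].
final class PhysicalLayout {
    // partition key -> sorted cells (cell name -> value)
    private final Map<String, TreeMap<String, String>> partitions = new TreeMap<>();

    // One logical CQL row becomes one cell per non-key column.
    void insert(String userId, int tweetId, String author, String body) {
        TreeMap<String, String> row =
                partitions.computeIfAbsent(userId, k -> new TreeMap<>());
        row.put(cellName(tweetId, "author"), author);
        row.put(cellName(tweetId, "body"), body);
    }

    static String cellName(int tweetId, String column) {
        return "[" + tweetId + ", " + column + "]";
    }

    // Cell names of a partition, in storage order.
    List<String> cellNames(String userId) {
        TreeMap<String, String> row = partitions.get(userId);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        PhysicalLayout layout = new PhysicalLayout();
        layout.insert("gmason", 1765, "phenry", "Give me liberty or give me death");
        layout.insert("gmason", 1742, "gwashington", "I chopped down the cherry tree");
        // [[1742, author], [1742, body], [1765, author], [1765, body]]
        System.out.println(layout.cellNames("gmason"));
    }
}
```

All tweets of one user end up in the same wide row, sorted by tweet id, regardless of insertion order.
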
Real-Time Analytics

Google Analytics gives you immediate statistics about your website traffic.

Web Analytics Data Model

Analytics Table

    url               time    views   from_search   direct   from_referrer
    /index.html       12:00   354     300           20       34
    /index.html       12:01   402     333           25       44
    /contacts.html    12:00   23      3             0        20
    /contacts.html    12:01   20      4             1        15

CQL

    CREATE TABLE analytics (
        url varchar,
        time timestamp,
        views counter,
        from_search counter,
        direct counter,
        from_referrer counter,
        PRIMARY KEY (url, time));

Web Analytics Data Model

CQL to count one page view that came from a search engine:

    UPDATE analytics
    SET views = views + 1,
        from_search = from_search + 1
    WHERE url = '/index.html'
    AND   time = '2012-10-06 12:00';

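The effect of the counter UPDATE can be imitated in memory to see how the per-minute rows fill up. This is only a sketch: the class `PageCounters` and its methods are hypothetical, the key format is an arbitrary choice, and a local `LongAdder` stands in for a distributed counter column.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// In-memory stand-in for the analytics counter table (illustrative only).
final class PageCounters {
    // "url|minute|column" -> counter value
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    private static String key(String url, String minute, String column) {
        return url + "|" + minute + "|" + column;
    }

    // Equivalent in spirit to: SET views = views + 1, <source> = <source> + 1
    void recordView(String url, String minute, String sourceColumn) {
        increment(url, minute, "views");
        increment(url, minute, sourceColumn);
    }

    private void increment(String url, String minute, String column) {
        counters.computeIfAbsent(key(url, minute, column), k -> new LongAdder())
                .increment();
    }

    // A counter that was never incremented reads as 0.
    long get(String url, String minute, String column) {
        LongAdder counter = counters.get(key(url, minute, column));
        return counter == null ? 0L : counter.sum();
    }

    public static void main(String[] args) {
        PageCounters table = new PageCounters();
        table.recordView("/index.html", "12:00", "from_search");
        table.recordView("/index.html", "12:00", "from_search");
        table.recordView("/index.html", "12:00", "direct");
        System.out.println(table.get("/index.html", "12:00", "views"));       // 3
        System.out.println(table.get("/index.html", "12:00", "from_search")); // 2
    }
}
```
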
Web Analytics Data Model

CQL to read all time slots of one page:

    SELECT * FROM analytics
    WHERE url = '/index.html';

Connect and Write

    // Contact points are only used to discover the rest of the cluster
    Cluster cluster = Cluster.builder()
                             .addContactPoints("127.0.0.1", "127.0.0.2")
                             .build();

    Session session = cluster.connect();

    // A Java string literal cannot span lines, so the statement is concatenated
    session.execute(
        "INSERT INTO user (user_id, name, email) " +
        "VALUES (12345, 'johndoe', 'john@doe.com')"
    );

Read

    ResultSet rs = session.execute("SELECT * FROM user");

    List<CQLRow> rows = rs.fetchAll();

    // Columns are read by name from each row
    for (CQLRow row : rows) {
        String userId = row.getString("user_id");
        String name = row.getString("name");
        String email = row.getString("email");
    }

Object Mapping

    @Table("user_and_messages")
    public class User {

        @Column("user_id")
        private String userId;

        private String name;

        private String email;

        private Gender gender;
    }

    public enum Gender {

        @EnumValue("m")
        MALE,

        @EnumValue("f")
        FEMALE;
    }

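How a mapper could resolve column names from such annotations can be sketched with plain Java reflection. This is not the driver's mapping API: the `@Column` annotation below is redefined locally as a hypothetical stand-in, and the rule that unannotated fields map to a column of the same name is an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Reflection-based sketch of annotation-driven column mapping (illustrative only).
final class MappingSketch {

    // Hypothetical local annotation; it must be retained at runtime to be
    // visible through reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String value();
    }

    static final class User {
        @Column("user_id")
        private String userId;
        private String name;
        private String email;
    }

    // Returns field name -> column name. Annotated fields use the annotation
    // value; other fields are assumed to map to a column of the same name.
    static Map<String, String> columnNames(Class<?> type) {
        Map<String, String> result = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            Column column = field.getAnnotation(Column.class);
            result.put(field.getName(), column != null ? column.value() : field.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // {email=email, name=name, userId=user_id}
        System.out.println(columnNames(User.class));
    }
}
```
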
Aggregation

    @Table("user_and_messages")
    public class User {

        @Column("user_id")
        private String userId;

        private String name;

        private String email;

        @GroupBy("user_id")
        private List<Message> messages;
    }

    public class Message {

        private String title;

        private String body;
    }

Inheritance

    @Table("catalog")
    @Inheritance({Phone.class, TV.class})
    @InheritanceColumn("product_type")
    public abstract class Product {

        @Column("product_id")
        private String productId;

        private float price;

        private String vendor;

        private String model;
    }

    @InheritanceValue("tv")
    public class TV extends Product {

        private float size;
    }

Online Business Intelligence

[Diagram: Application <-> Cassandra <-> Hadoop]

    Application to Cassandra: storage for the application in production
    Cassandra to Hadoop: distributed batch processing
    Hadoop to Cassandra: storage for results
    Cassandra to Application: using results in production

Stay Tuned!

          blog.datastax.com
          @mfiguiere

NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra
