Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

- Introduction to CQL3 and DataModeling (Johnny Miller, Cassandra Solutions Architect, DataStax):
Johnny Miller is an experienced developer, architect, team
lead and agile coach with a history of working at Sky, AOL
Broadband and Alcatel-Lucent. Johnny has architected and
delivered a number of platforms using Cassandra as a key
component for achieving high availability and efficient scaling.

Presentation Transcript

  • Introduction to CQL and Data Modeling
    Helsinki Cassandra Meetup, 10th February 2014
    ©2014 DataStax Confidential. Do not distribute without consent.
  • Agenda
    •  Introduction
    •  CQL Basics
    •  Data Modeling
    •  Time Series/Sensor Data
    •  Java Driver
  • About me
    Johnny Miller, DataStax Solutions Architect
    www.datastax.com
    @DataStax, @CyanMiller
    https://www.linkedin.com/in/johnnymiller
    jmiller@datastax.com
  • DataStax
    •  Founded in April 2010
    •  We drive Apache Cassandra™
    •  400+ customers (20 of the Fortune 100)
    •  200+ employees
    •  Home to Apache Cassandra Chair & most committers
    •  Headquartered in San Francisco Bay area
    •  European headquarters established in London
    Our Goal: To be the first and best database choice for online applications
  • DataStax
    •  DataStax supports both the open source community and enterprises.
    Open Source/Community
    •  Apache Cassandra (employ Cassandra chair and 90+% of the committers)
    •  DataStax Community Edition
    •  DataStax OpsCenter
    •  DataStax DevCenter
    •  DataStax Drivers/Connectors
    •  Online Documentation
    •  Online Training
    •  Mailing lists and forums
    Enterprise Software
    •  DataStax Enterprise Edition
    •  Certified Cassandra
    •  Built-in Analytics
    •  Built-in Enterprise Search
    •  Enterprise Security
    •  DataStax OpsCenter
    •  Expert Support
    •  Consultative Help
    •  Professional Training
  • Cassandra Adoption (chart; source: http://db-engines.com/en/ranking, Feb 2014)
  • A sample of Cassandra & DataStax Enterprise users
  • Why Good Data Modeling is Important
    •  Cassandra is a highly available, highly scalable, & highly distributed database, with no single point of failure.
    •  To achieve this, Cassandra is optimized for non-relational data models:
       •  Joins do not function well on distributed databases.
       •  Locking and transactions jam up distributed nodes.
    •  By modeling data properly for Cassandra you can avoid joins, locking, and transactions in your application.
  • CQL Basics: YesCQL
  • CQL Basics
    •  Cassandra Query Language
    •  SQL-like language to query Cassandra
    •  Limited predicates: it attempts to prevent bad queries
       •  but you can still get into trouble!
    •  Keyspace: analogous to a schema
       •  Has various storage attributes.
       •  The keyspace determines the replication factor (RF).
    •  Table: looks like a SQL table
       •  A table must have a primary key.
       •  A table can be fully qualified as <keyspace>.<table>
  • DevCenter
    •  DataStax DevCenter: a free, visual query tool for creating and running CQL statements against Cassandra and DataStax Enterprise.
  • CQL Basics
    •  The usual statements:
       •  CREATE / DROP / ALTER TABLE
       •  SELECT
    •  BUT INSERT and UPDATE are similar to each other:
       •  If a row doesn't exist, UPDATE will insert it, and if it exists, INSERT will replace it.
       •  Think of it as an UPSERT.
       •  Therefore we never get a key violation.
       •  For updates, Cassandra never reads (there is no read-before-write).
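To make the upsert rule concrete, here is a minimal in-memory sketch in Java. UpsertTable is a hypothetical class, not part of Cassandra or the DataStax driver; it only mimics the behaviour described above, where INSERT and UPDATE share one write path that never checks whether the row already exists.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of CQL upsert semantics (not a Cassandra API).
class UpsertTable {
    // primary key -> (column name -> value)
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    // Shared write path: no existence check, so a missing row is created
    // and an existing row simply has the written columns overwritten.
    private void write(String key, Map<String, Object> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    // INSERT never fails with a key violation.
    void insert(String key, Map<String, Object> columns) {
        write(key, columns);
    }

    // UPDATE on a row that does not exist creates it.
    void update(String key, Map<String, Object> columns) {
        write(key, columns);
    }

    Map<String, Object> select(String key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        UpsertTable sportyLeague = new UpsertTable();
        sportyLeague.update("Felix", Map.of("jersey", 90)); // creates the row
        sportyLeague.insert("Felix", Map.of("jersey", 91)); // overwrites, no error
        System.out.println(sportyLeague.select("Felix"));   // {jersey=91}
        System.out.println(sportyLeague.size());            // 1
    }
}
```

The same key written twice leaves one row holding the latest values, which is exactly why Cassandra never reports a primary key violation.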
  • Creating a keyspace - Single Data Centre Consistency
  • Creating a keyspace - Multiple Data Centre Consistency
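As a hedged sketch of what these two slides set up, the Java helpers below build the two usual CREATE KEYSPACE forms as strings: SimpleStrategy with a single replication_factor for one data centre, and NetworkTopologyStrategy with one factor per data centre. The keyspace name demo and the data-centre names DC1/DC2 are placeholders; real DC names must match what your snitch reports. The quorum helper shows the arithmetic behind QUORUM and LOCAL_QUORUM: a strict majority of the relevant replicas, floor(RF / 2) + 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helpers that build CQL strings; all names are placeholders.
class KeyspaceSketch {
    // One data centre: SimpleStrategy with a single replication factor.
    static String singleDc(String keyspace, int rf) {
        return "CREATE KEYSPACE " + keyspace
            + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': "
            + rf + "};";
    }

    // Several data centres: NetworkTopologyStrategy with one factor per DC.
    static String multiDc(String keyspace, Map<String, Integer> rfPerDc) {
        String dcs = rfPerDc.entrySet().stream()
            .map(e -> "'" + e.getKey() + "': " + e.getValue())
            .collect(Collectors.joining(", "));
        return "CREATE KEYSPACE " + keyspace
            + " WITH replication = {'class': 'NetworkTopologyStrategy', " + dcs + "};";
    }

    // Replicas that must respond for a quorum: a strict majority of rf.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(singleDc("demo", 3));
        Map<String, Integer> rfPerDc = new LinkedHashMap<>(); // keeps DC order stable
        rfPerDc.put("DC1", 3);
        rfPerDc.put("DC2", 2);
        System.out.println(multiDc("demo", rfPerDc));
        // LOCAL_QUORUM in DC1 needs quorum(3) = 2 replicas;
        // QUORUM across both DCs needs quorum(3 + 2) = 3 replicas.
        System.out.println(quorum(3) + " " + quorum(3 + 2));
    }
}
```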
  • CQL Basics – creating a table
      CREATE TABLE cities (
        city_name varchar,
        elevation int,
        population int,
        latitude float,
        longitude float,
        PRIMARY KEY (city_name)
      );
    •  city_name is the partition key
    •  In this example, the partition key = primary key
  • CQL Basics – Composite Primary Key
    The Primary Key
    •  The key uniquely identifies a row.
    •  A composite primary key consists of:
       •  A partition key
       •  One or more clustering columns
       e.g. PRIMARY KEY (partition key, cluster columns, ...)
    •  The partition key determines on which node the partition resides.
    •  Data is ordered in cluster column order within the partition.
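A small Java model can make the two roles of the key visible. This is a simplified sketch, not how Cassandra places data: the real partitioner (Murmur3Partitioner by default) hashes the partition key onto a token ring, while here hashCode() modulo a hypothetical node count only shows that one key always maps to one node. A TreeMap per partition stands in for rows kept sorted by the clustering column. The player "Buddy" and the jersey numbers other than Felix's 90 are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of PRIMARY KEY (team_name, player_name).
class PartitionSketch {
    static final int NODES = 3; // hypothetical cluster size

    // Simplification: Cassandra uses a partitioner and token ring, not
    // hashCode() % n; this only shows that a key maps to a fixed node.
    static int nodeFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), NODES);
    }

    // partition key -> rows sorted by clustering column (player_name -> jersey)
    private final Map<String, TreeMap<String, Integer>> partitions = new HashMap<>();

    void insert(String teamName, String playerName, int jersey) {
        partitions.computeIfAbsent(teamName, k -> new TreeMap<>()).put(playerName, jersey);
    }

    // Rows of one partition come back in clustering-column order.
    List<String> playersOf(String teamName) {
        TreeMap<String, Integer> rows = partitions.get(teamName);
        return rows == null ? new ArrayList<>() : new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        PartitionSketch league = new PartitionSketch();
        league.insert("Mighty Mutts", "Lucky", 7);
        league.insert("Mighty Mutts", "Felix", 90);
        league.insert("Mighty Mutts", "Buddy", 12);
        System.out.println(league.playersOf("Mighty Mutts")); // [Buddy, Felix, Lucky]
        System.out.println("node " + nodeFor("Mighty Mutts"));
    }
}
```

Insertion order does not matter: the rows of a partition are always read back sorted by player_name.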
  • CQL Basics – Composite Primary Key
      CREATE TABLE sporty_league (
        team_name varchar,
        player_name varchar,
        jersey int,
        PRIMARY KEY (team_name, player_name)
      );
  • CQL Basics – Simple Select
      SELECT * FROM sporty_league;
    •  More than a few rows can be slow (limited to 10,000 rows by default).
    •  Use the LIMIT keyword to choose fewer or more rows.
  • CQL Basics - Simple Select on Partition Key and Cluster Columns
      SELECT * FROM sporty_league WHERE team_name = 'Mighty Mutts';
      SELECT * FROM sporty_league WHERE team_name = 'Mighty Mutts' AND player_name = 'Lucky';
  • CQL Basics – Insert/Update
      INSERT INTO sporty_league (team_name, player_name, jersey) VALUES ('Mighty Mutts', 'Felix', 90);
  • CQL Basics - Ordering
    •  Partition keys are not ordered, but the cluster columns are.
    •  However, you can only order by a column if it's a cluster column.
    •  Data will be returned by default in the order of the clustering column.
    •  You can also use the ORDER BY keyword, but only on the clustering column!
      SELECT * FROM sporty_league
      WHERE team_name = 'Mighty Mutts'
      ORDER BY player_name DESC;
  • CQL Basics – Group By
    •  We have already done this!
    •  The partition key effectively names the columns for grouping.
    •  The previous table contained all of the players grouped by their team_name.
  • CQL Basics - Predicates
    •  On the partition key: = and IN
    •  On the cluster columns: <, <=, =, >=, >, IN
  • CQL Basics – Composite Partition Key
      CREATE TABLE cities (
        city_name varchar,
        state varchar,
        PRIMARY KEY ((city_name, state))
      );
    •  Each city gets its own partition!
  • CQL Basics – Performance considerations
    •  The best queries are in a single partition, i.e. WHERE partition key = <something>
    •  Each new partition requires a new disk seek.
    •  Queries that span multiple partitions are s-l-o-w
    •  Queries that span multiple cluster columns are fast
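Why is a range over clustering columns cheap? Within one partition the rows are stored sorted by the clustering column, so a predicate such as ck > a AND ck < b selects one contiguous slice, and ORDER BY ... DESC is the same slice read backwards. The Java sketch below uses a NavigableMap as an in-memory stand-in for that sorted layout; the int clustering key (minutes since midnight, so 421 is 07:01) and the temperature values are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of ONE partition whose rows are sorted by an int
// clustering column (minutes since midnight).
class SliceSketch {
    // WHERE pk = ? AND ck > from AND ck < to  ->  one contiguous slice
    static NavigableMap<Integer, String> slice(NavigableMap<Integer, String> partition,
                                               int from, int to) {
        return partition.subMap(from, false, to, false);
    }

    // ORDER BY ck DESC within the partition: the same slice read backwards.
    static NavigableMap<Integer, String> sliceDesc(NavigableMap<Integer, String> partition,
                                                   int from, int to) {
        return slice(partition, from, to).descendingMap();
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> partition = new TreeMap<>();
        partition.put(421, "72F"); // 07:01
        partition.put(422, "73F"); // 07:02
        partition.put(423, "73F"); // 07:03
        partition.put(424, "74F"); // 07:04
        System.out.println(slice(partition, 421, 424));     // {422=73F, 423=73F}
        System.out.println(sliceDesc(partition, 421, 424)); // {423=73F, 422=73F}
    }
}
```

A query that instead touches many partitions has no such single slice to read, which is the intuition behind "multiple partitions are s-l-o-w".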
  • CQL Basics – Authentication and Authorisation
    •  CQL supports creating users and granting them access to tables etc.
    •  You need to enable authentication in the cassandra.yaml config file.
    •  You can create, alter, drop and list users.
    •  You can then GRANT permissions to users accordingly: ALTER, AUTHORIZE, DROP, MODIFY, SELECT.
  • CQL Basics - Tracing
    •  You can turn tracing on or off for queries with the TRACING ON | OFF command.
    •  This can help you understand what Cassandra is doing and identify any performance problems.
    •  http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
  • CQL Basics – TTL
    •  Expiring Columns, or Time to Live (TTL)
      INSERT INTO users (id, first, last) VALUES ('abc123', 'abe', 'lincoln') USING TTL 3600;
      // Expires data in one hour!
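A TTL is simply an expiry time attached to each written cell: the value is readable until write time plus TTL, after which reads no longer return it (Cassandra turns expired cells into tombstones that compaction later purges). The Java sketch below is a hypothetical model of that rule, not a Cassandra API; it takes the current time as an argument so the behaviour is deterministic.

```java
// Hypothetical model of a cell written USING TTL (not a Cassandra API).
class TtlCell {
    final String value;
    final long writeTimeSeconds;
    final int ttlSeconds;

    TtlCell(String value, long writeTimeSeconds, int ttlSeconds) {
        this.value = value;
        this.writeTimeSeconds = writeTimeSeconds;
        this.ttlSeconds = ttlSeconds;
    }

    // The cell is readable until writeTime + ttl; from then on reads skip it.
    boolean isLive(long nowSeconds) {
        return nowSeconds < writeTimeSeconds + ttlSeconds;
    }

    // What a SELECT would return at time 'now': the value, or null once expired.
    String read(long nowSeconds) {
        return isLive(nowSeconds) ? value : null;
    }

    public static void main(String[] args) {
        TtlCell last = new TtlCell("lincoln", 0, 3600); // USING TTL 3600
        System.out.println(last.read(3599)); // lincoln
        System.out.println(last.read(3600)); // null
    }
}
```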
  • CQL Basics – Data Types
  • CQL Basics – Data Types: Collections
    •  CQL supports having columns that contain collections of data.
    •  The collection types include:
       •  Set, List and Map.
      CREATE TABLE collections_example (
        id int PRIMARY KEY,
        set_example set<text>,
        list_example list<text>,
        map_example map<int, text>
      );
    •  These data types are intended to support the type of 1-to-many relationships that can be modeled in a relational DB, e.g. a user has many email addresses.
    •  Some performance considerations around collections:
       •  Requires serialization, so don't go crazy!
       •  Often more efficient to denormalise further rather than use collections if intending to store lots of data.
       •  Favour sets over lists: lists are not very performant.
  • CQL Basics – Data Types: Counters
    •  Stores a number that incrementally counts the occurrences of a particular event or process.
      UPDATE UserActions SET total = total + 2
      WHERE user = 123 AND action = 'xyz';
  • CQL Basics - Lightweight Transactions
    •  Introduced in Cassandra 2.0
       •  DSE 4 will include Cassandra 2.0 (due soon…)
       •  DSE 3.2 (current version) is using Cassandra 1.2
    •  Uses the Paxos consensus protocol to obtain agreement across the cluster.
    •  Example:
      INSERT INTO customer_account (customerID, customer_email)
      VALUES ('LauraS', 'lauras@gmail.com')
      IF NOT EXISTS;

      UPDATE customer_account SET customer_email = 'laurass@gmail.com'
      WHERE customerID = 'LauraS'
      IF customer_email = 'lauras@gmail.com';
    •  Great for 1% of your application, but not recommended to be used too much!
    •  Eventual consistency is your friend: http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
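The semantics of the two conditional statements can be mirrored, in a single JVM only, by ConcurrentHashMap's atomic putIfAbsent and replace(key, expected, new). This is an analogy, not an implementation: Cassandra reaches the same agreement across replicas with Paxos, which the sketch does not model. The boolean results stand in for the [applied] column that Cassandra returns for conditional statements; LwtSketch is a hypothetical class.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Single-process analogy only (hypothetical class): atomic putIfAbsent/replace
// mirror IF NOT EXISTS and IF col = value. No Paxos, no replicas.
class LwtSketch {
    private final ConcurrentMap<String, String> emailByCustomer = new ConcurrentHashMap<>();

    // INSERT ... IF NOT EXISTS: only the first writer for a key is applied.
    boolean insertIfNotExists(String customerId, String email) {
        return emailByCustomer.putIfAbsent(customerId, email) == null;
    }

    // UPDATE ... SET customer_email = newEmail WHERE customerID = ?
    //   IF customer_email = expected: applied only if the current value matches.
    boolean updateIf(String customerId, String expected, String newEmail) {
        return emailByCustomer.replace(customerId, expected, newEmail);
    }

    String emailOf(String customerId) {
        return emailByCustomer.get(customerId);
    }

    public static void main(String[] args) {
        LwtSketch accounts = new LwtSketch();
        System.out.println(accounts.insertIfNotExists("LauraS", "lauras@gmail.com"));  // true
        System.out.println(accounts.insertIfNotExists("LauraS", "other@example.com")); // false
        System.out.println(accounts.updateIf("LauraS", "lauras@gmail.com", "laurass@gmail.com")); // true
        System.out.println(accounts.emailOf("LauraS")); // laurass@gmail.com
    }
}
```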
  • Data Modeling: Query based and denormalised
  • Cassandra is not a relational database
    •  Cassandra doesn't work the same way as an RDBMS
    •  Your data modeling approach won't work the same way either
       •  No foreign keys
       •  No joins
  • Query-Driven Data Modeling
    •  Start by addressing the queries that you will need to answer
    •  Your data should be able to match them directly
    •  Think about:
       •  The actions your application needs to perform
       •  How you want to access the data
       •  What are the use cases?
       •  What does the data look like?
  • Query-Driven Data Modeling contd.
    •  What are you trying to retrieve?
    •  Does it need to be ordered?
    •  Is there any nesting of data?
    •  Do you need to group data?
    •  Do you need to filter data?
    •  Does data expire?
    •  Does data need to be retrieved in chronological order?
  • Denormalisation
    •  Combine table columns into a single view, i.e. a materialized view
       •  We have to create a table that stores all the data that would be in the view.
       •  Remember: no joins in Cassandra!
    Advantage:
    •  Storing the data in this manner greatly improves performance
       •  Less seeking
       •  Less network traffic
    Disadvantage:
    •  Data duplication
       •  Different tables for different queries
       •  You will use more disk space, but disks are cheap!
  • Avoid client-side joins
    •  What is a client-side join?
       •  Querying a table from Cassandra
       •  Using the results from the first query to query a second table
    •  Why avoid?
       •  Degrades performance, i.e. more I/O, seeks and traffic
  • Don't be scared of writes
    •  Cassandra is the fastest DB there is for writes.
    •  Writing to multiple tables is not going to be slow!
    •  3,000-5,000 writes/second/core, e.g. an 8-core server = 24k-40k writes per second!
    •  < 1 ms typical for most writes (varies based on hardware)
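The per-server figure is just the per-core rule of thumb multiplied by the core count. The trivial sketch below makes that range explicit; the inputs are the slide's rough figures, not a measurement, and WriteEstimate is a hypothetical helper.

```java
// Back-of-envelope estimate using the slide's rule-of-thumb per-core figures.
class WriteEstimate {
    // Returns {low, high} writes per second for a whole server.
    static int[] perServer(int cores, int lowPerCore, int highPerCore) {
        return new int[] {cores * lowPerCore, cores * highPerCore};
    }

    public static void main(String[] args) {
        int[] range = perServer(8, 3000, 5000);
        System.out.println(range[0] + "-" + range[1] + " writes/second"); // 24000-40000 writes/second
    }
}
```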
  • Performance
    “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”
    Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2013, p. 10. Benchmark paper presented at the Very Large Database Conference, 2013. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf
    •  Netflix Cloud Benchmark… http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
    •  End Point independent NoSQL Benchmark: http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQLDatabases.pdf
       •  Highest in throughput vs MongoDB and HBase
       •  Lowest in latency vs MongoDB and HBase
  • One-to-many
    •  Relationship without being relational
    •  Example: users have many videos
    •  Wait? Where is the foreign key?
  • One-to-Many
    •  Static table to store videos; UUID for unique video id:
      CREATE TABLE videos (
        videoid uuid,
        videoname varchar,
        username varchar,
        description varchar,
        tags varchar,
        upload_date timestamp,
        PRIMARY KEY (videoid)
      );
    •  Lookup video by username; add username to denormalize:
      CREATE TABLE username_video_index (
        username varchar,
        videoid uuid,
        upload_date timestamp,
        video_name varchar,
        PRIMARY KEY (username, videoid)
      );

      SELECT video_name FROM username_video_index WHERE username = 'tcodd' AND videoid = '99051fe9';
    •  Write in two tables at once for fast lookups
  • Many-to-many
    •  Example: users and videos have many comments.
  • Many-to-many
    •  Model both sides of the view
    •  Insert both when comment is created
    •  Materialized views from either side
      CREATE TABLE comments_by_user (
        username varchar,
        videoid uuid,
        comment_ts timestamp,
        comment varchar,
        PRIMARY KEY (username, videoid)
      );

      CREATE TABLE comments_by_video (
        videoid uuid,
        username varchar,
        comment_ts timestamp,
        comment varchar,
        PRIMARY KEY (videoid, username)
      );
    DON'T BE AFRAID OF WRITES
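The write-both-sides pattern can be sketched in Java with two in-memory maps standing in for comments_by_user and comments_by_video (CommentStore is hypothetical, and plain strings replace the uuid video ids). One logical comment costs two writes, and each read touches a single partition. Note a consequence of the primary keys above that the model reproduces: with PRIMARY KEY (username, videoid), a second comment by the same user on the same video upserts over the first. Against a real cluster the two INSERTs would typically be sent together, for example in a logged BATCH.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-ins for comments_by_user and comments_by_video.
class CommentStore {
    // username -> (videoid -> comment), like PRIMARY KEY (username, videoid)
    private final Map<String, TreeMap<String, String>> byUser = new HashMap<>();
    // videoid -> (username -> comment), like PRIMARY KEY (videoid, username)
    private final Map<String, TreeMap<String, String>> byVideo = new HashMap<>();

    // One logical comment = two writes, one per query table.
    void addComment(String username, String videoId, String comment) {
        byUser.computeIfAbsent(username, k -> new TreeMap<>()).put(videoId, comment);
        byVideo.computeIfAbsent(videoId, k -> new TreeMap<>()).put(username, comment);
    }

    // Each read touches exactly one "partition".
    Map<String, String> commentsByUser(String username) {
        return byUser.getOrDefault(username, new TreeMap<>());
    }

    Map<String, String> commentsByVideo(String videoId) {
        return byVideo.getOrDefault(videoId, new TreeMap<>());
    }

    public static void main(String[] args) {
        CommentStore store = new CommentStore();
        store.addComment("tcodd", "video-1", "Great talk");
        store.addComment("jmiller", "video-1", "Thanks!");
        store.addComment("tcodd", "video-2", "Nice demo");
        System.out.println(store.commentsByVideo("video-1")); // {jmiller=Thanks!, tcodd=Great talk}
        System.out.println(store.commentsByUser("tcodd"));    // {video-1=Great talk, video-2=Nice demo}
    }
}
```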
  • Partition Key is not the same as a Primary Key
    •  Within a table, a row is referenced by a partition key
       •  This is either your primary key or the first part of a compound primary key
    Similarities
    •  Partition key identifies a partition as being separate from other partitions
    •  Must be unique within a table
    Differences
    •  Inserting a new record with a partition key that already exists doesn't do what you're used to in an RDBMS, i.e. no primary key violations
       •  An INSERT using an existing partition key is allowed
       •  As a consequence, INSERT and UPDATE act in the same way, i.e. UPSERT
  • How to avoid UPSERTs
    •  Guarantee that your primary keys are unique from one another
       •  Use an appropriate natural key based on your data
       •  Use a surrogate key for the partition key
    Risks with natural keys
    •  Depending on the type of natural key that is used, there may still be an increased risk of UPSERTs
    •  Changing the datum used for a natural key requires a lot of overhead.
    •  So why not use a sequence to generate a surrogate key?
       •  You can't: Cassandra doesn't provide sequences!
  • What, no sequences?
    •  Sequences are a handy feature in an RDBMS for auto-creation of IDs for your data.
       •  Guaranteed unique
       •  E.g. INSERT INTO user (id, firstName, LastName) VALUES (seq.nextVal(), 'Ted', 'Codd');
    •  Cassandra has no sequences!
       •  Extremely difficult in a masterless distributed system
       •  Requires a lock (perf killer)
    •  What to do?
       •  Use part of the data to create a unique key
       •  Use a UUID
  • UUID
    •  Universally Unique Identifier
    •  128-bit number represented in character form, e.g. 99051fe9-6a9c-46c2-b949-38ef78858dd0
    •  Easily generated on the client
    •  Version 1 has a timestamp component
    •  Version 4 has no timestamp component
       •  Faster to generate
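In Java, client-side UUID generation needs nothing beyond java.util.UUID. The sketch below shows that UUID.randomUUID() produces version 4 (random) values, that the version can be read straight from the string form (the first hex digit of the third group), and that a version 4 UUID carries no timestamp. The standard library has no version 1 generator; for time-based UUIDs (e.g. for timeuuid columns) the DataStax Java driver provides com.datastax.driver.core.utils.UUIDs.timeBased(), which is not used here so the block stays self-contained.

```java
import java.util.UUID;

class UuidSketch {
    public static void main(String[] args) {
        // Version 4: random bits, no timestamp component; generated client-side.
        UUID random = UUID.randomUUID();
        System.out.println(random + " is version " + random.version());

        // The version is the first hex digit of the third group ("46c2" -> 4).
        UUID example = UUID.fromString("99051fe9-6a9c-46c2-b949-38ef78858dd0");
        System.out.println(example.version()); // 4

        // Only time-based (version 1) UUIDs expose a timestamp.
        try {
            example.timestamp();
        } catch (UnsupportedOperationException e) {
            System.out.println("no timestamp in a version 4 UUID");
        }
    }
}
```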
  • Indexing
    •  This gives you fast access to data
    •  Secondary indexes != relational indexes
  • Adding an Index to a table
    •  If you want to query on a column that is not part of your PK, you can create an index:
      CREATE INDEX ON <table>(<column>);
    •  Then you can do a select:
      SELECT * FROM product WHERE type = 'PC';
    •  Avoid doing this
       •  Not great for performance (although improvements are being made)
       •  Much more efficient to model your data around the query, i.e. roll your own indexes!
  • Keyword index example
    •  Now we can define an index for tagging videos
    •  Using the previous video example, users want to tag videos.
    •  Video table defined as:
      CREATE TABLE videos (
        videoid uuid,
        videoname varchar,
        username varchar,
        description varchar,
        tags varchar,
        upload_date timestamp,
        PRIMARY KEY (videoid)
      );
    •  Tag index table (fast, efficient):
      CREATE TABLE video_tag_index (
        tag varchar,
        videoid uuid,
        timestamp timestamp,
        PRIMARY KEY (tag, videoid)
      );
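Rolling your own index means the application emits the index rows itself on write. As a hedged sketch, assume the videos table's tags varchar holds a comma-separated list (the slides do not say how tags are encoded); the hypothetical helper below then yields one (tag, videoid) row per distinct tag for video_tag_index. Lower-casing tags is also an assumption, chosen so that lookups by tag match regardless of case.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: assumes tags are stored as one comma-separated varchar.
class TagIndexSketch {
    // One {tag, videoid} row per distinct tag, matching PRIMARY KEY (tag, videoid).
    static List<String[]> indexRows(String videoId, String tags) {
        Set<String> distinct = new LinkedHashSet<>(); // keeps first-seen order
        for (String raw : tags.split(",")) {
            String tag = raw.trim().toLowerCase(Locale.ROOT); // normalise for lookups
            if (!tag.isEmpty()) {
                distinct.add(tag);
            }
        }
        List<String[]> rows = new ArrayList<>();
        for (String tag : distinct) {
            rows.add(new String[] {tag, videoId});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : indexRows("video-1", "Cassandra, CQL, ,cassandra")) {
            System.out.println(row[0] + " -> " + row[1]); // cassandra -> video-1, cql -> video-1
        }
    }
}
```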
  • Partial word index example
    •  Table:
      CREATE TABLE email_index (
        domain varchar,
        user varchar,
        username varchar,
        PRIMARY KEY (domain, user)
      );
    •  User: jmiller, Email: jmiller@datastax.com
      INSERT INTO email_index (domain, user, username)
      VALUES ('@datastax.com', 'jmiller', 'jmiller');
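The application has to split each address into the two key columns before writing. The hypothetical helper below does that split (everything from the last '@' becomes the domain partition key, the rest the user clustering column) and renders the slide's INSERT. String-building is only for illustration; the quote helper escapes a single quote by doubling it, as CQL string literals require, but in application code a prepared statement with bound values is the safer choice.

```java
// Hypothetical helper: split an email address into the (domain, user)
// key columns of email_index and render the INSERT from the slide.
class EmailIndexKey {
    final String domain; // partition key, e.g. "@datastax.com"
    final String user;   // clustering column, e.g. "jmiller"

    private EmailIndexKey(String domain, String user) {
        this.domain = domain;
        this.user = user;
    }

    static EmailIndexKey of(String email) {
        int at = email.lastIndexOf('@');
        if (at <= 0 || at == email.length() - 1) {
            throw new IllegalArgumentException("not an email address: " + email);
        }
        return new EmailIndexKey(email.substring(at), email.substring(0, at));
    }

    // CQL string literals escape a single quote by doubling it.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    String toInsert(String username) {
        return "INSERT INTO email_index (domain, user, username) VALUES ("
            + quote(domain) + ", " + quote(user) + ", " + quote(username) + ");";
    }

    public static void main(String[] args) {
        System.out.println(of("jmiller@datastax.com").toInsert("jmiller"));
    }
}
```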
  • Bitmap index
    •  Multiple parts to a key
    •  Create a truth table of the various combinations
    •  However, inserts == the number of combinations
  • Bitmap index example
    •  Find a car in a car park by variable combinations
  • Bitmap index example – Table definition
    •  Make a table with three different key combinations
      CREATE TABLE car_location_index (
        make varchar,
        model varchar,
        colour varchar,
        vehicle_id int,
        lot_id int,
        PRIMARY KEY ((make, model, colour), vehicle_id)
      );
  • Bitmap index example – Adding records
    •  We are pre-optimizing for 7 possible queries of the index on insert.
      1. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('Ford', 'Mustang', 'Blue', 1234, 8675309);
      2. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('Ford', 'Mustang', '', 1234, 8675309);
      3. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('Ford', '', 'Blue', 1234, 8675309);
      4. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('Ford', '', '', 1234, 8675309);
      5. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('', 'Mustang', 'Blue', 1234, 8675309);
      6. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('', 'Mustang', '', 1234, 8675309);
      7. INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
         VALUES ('', '', 'Blue', 1234, 8675309);
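The seven inserts are every non-empty subset of {make, model, colour}, with '' standing in for "any value": 2^3 - 1 = 7. A bitmask loop generates them in the same order as the list above. BitmapIndexSketch is a hypothetical helper; on the read side, a query that knows only the make and colour would then ask for make = 'Ford' AND model = '' AND colour = 'Blue'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for the bitmap-index rows of one car.
class BitmapIndexSketch {
    // Each mask bit decides whether a field keeps its value or becomes ''
    // (bit 2 = make, bit 1 = model, bit 0 = colour); mask 0 (all '') is skipped.
    static List<String[]> combinations(String make, String model, String colour) {
        String[] values = {make, model, colour};
        List<String[]> rows = new ArrayList<>();
        for (int mask = 7; mask >= 1; mask--) {
            String[] row = new String[3];
            for (int i = 0; i < 3; i++) {
                boolean keep = (mask & (1 << (2 - i))) != 0;
                row[i] = keep ? values[i] : "";
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] r : combinations("Ford", "Mustang", "Blue")) {
            System.out.printf("INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id) "
                + "VALUES ('%s', '%s', '%s', 1234, 8675309);%n", r[0], r[1], r[2]);
        }
    }
}
```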
  • Bitmap - selecting
    •  Different queries are now possible:
  • Time Series/Sensor Data
  • What is time series data?
    •  Sensors
       •  CPU, Network Card, Electronic Power Meter, Resource Utilization, Weather
    •  Clickstream data
    •  Historical trends
       •  Stock Ticker
       •  Anything that varies on a temporal basis
       •  Top Ten Most Popular Videos
  • Why Cassandra for time series data?
    •  Cassandra is based on the BigTable storage model
    •  One row key and lots of (variable) columns
    •  Single layout on disk
  • Time Series Example
    •  Storing weather data
       •  One weather station
       •  Temperature measurement every minute
  • Time Series Example – query data
    •  Weather station id = Locality of single node
  • Time Series Example - Table
    •  Data partitioned by weather station ID and time
    •  Timestamp goes in the clustered column
    •  Store the measurement as the non-clustered column(s)
    •  Take advantage of partition clustering
      CREATE TABLE temperature (
        weatherstation_id text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY (weatherstation_id, event_time)
      );
  • Time Series Example
    •  Simple to insert:
      INSERT INTO temperature (weatherstation_id, event_time, temperature)
      VALUES ('1234abcd', '2013-12-11 07:01:00', '72F');
    •  Simple to query:
      SELECT temperature FROM temperature
      WHERE weatherstation_id = '1234abcd'
      AND event_time > '2013-04-03 07:01:00'
      AND event_time < '2013-04-03 07:04:00';
  • Time Series Example – Partitioning
    •  With the previous table, you can end up with a very large row on 1 partition, i.e. PRIMARY KEY (weatherstation_id, event_time)
    •  This would have to fit on 1 node.
    •  Cassandra can store 2 billion columns per storage row.
    •  The solution is to have a composite partition key to split things up:
      CREATE TABLE temperature (
        weatherstation_id text,
        date text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY ((weatherstation_id, date), event_time)
      );
  • Time Series Example – reading and writing
    •  Simple to insert:
      INSERT INTO temperature (weatherstation_id, date, event_time, temperature)
      VALUES ('1234abcd', '2013-12-11', '2013-12-11 07:01:00', '72F');
    •  Simple to query:
      SELECT temperature FROM temperature
      WHERE weatherstation_id = '1234abcd'
      AND date = '2013-12-11'
      AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';
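With the composite partition key, the client must derive the date column from each reading's event time, and every writer and reader must derive it the same way. The hypothetical helper below does this with java.time, assuming event times are already expressed in one agreed zone (e.g. UTC). It also shows why daily partitions stay small: one reading per minute gives 24 × 60 = 1,440 rows per station per day, far below the roughly 2 billion columns a storage row can hold.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for PRIMARY KEY ((weatherstation_id, date), event_time).
class DayPartition {
    // Assumes event times are in one agreed zone (e.g. UTC), so every writer
    // and reader derives the same 'yyyy-MM-dd' string for the same reading.
    static String dateKey(LocalDateTime eventTime) {
        return eventTime.toLocalDate().format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Rows per (station, day) partition at a fixed sampling interval.
    static long rowsPerPartition(int intervalSeconds) {
        return 24L * 60 * 60 / intervalSeconds;
    }

    public static void main(String[] args) {
        LocalDateTime reading = LocalDateTime.of(2013, 12, 11, 7, 1, 0);
        System.out.println(dateKey(reading));     // 2013-12-11
        System.out.println(rowsPerPartition(60)); // 1440
    }
}
```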
  • Time Series Example – reverse ordering
    •  A common pattern for time series data is rolling storage.
       •  For example, we only want to show the last 10 temperature readings and older data is no longer needed.
       •  On most DBs you would need some background job to purge the old data.
       •  With Cassandra you can use TTLs!
      CREATE TABLE temperature (
        weatherstation_id text,
        date text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY ((weatherstation_id, date), event_time)
      ) WITH CLUSTERING ORDER BY (event_time DESC);
    •  As part of the table definition, WITH CLUSTERING ORDER BY (event_time DESC) is used to order the data with the most recent first, i.e. the data will be returned in this order.
  • Time Series Example – TTL'ing
    •  Simple to insert:
      INSERT INTO temperature (weatherstation_id, date, event_time, temperature)
      VALUES ('1234abcd', '2013-12-11', '2013-12-11 07:01:00', '72F') USING TTL 20;
    •  This data point will automatically be deleted after 20 seconds.
    •  Eventually you will see all the data disappear.
    •  Simple to query:
      SELECT temperature FROM temperature
      WHERE weatherstation_id = '1234abcd'
      AND date = '2013-12-11'
      AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';
  • Time Series Bucket Example – mitigating spikes in data
    •  In some situations, there might be a risk that you get an unforeseen volume of sensor data for the partition key of your row.
    •  The risk here is that your row will continue to grow and fill up the node.
    •  The workaround is to split your data across multiple nodes:
      CREATE TABLE temperature (
        weatherstation_id text,
        date text,
        bucket_id int,
        event_time timestamp,
        temperature text,
        PRIMARY KEY ((weatherstation_id, date, bucket_id), event_time)
      );
  • Time Series Bucket Example – reading and writing
    •  Not so simple to insert. The client needs to generate a bucket id (often a random number within a certain range):
      INSERT INTO temperature (weatherstation_id, date, bucket_id, event_time, temperature)
      VALUES ('1234abcd', '2013-12-11', 10, '2013-12-11 07:01:00', '72F');
    •  Much more expensive to read. The client will have to iterate through the range of bucket ids, execute a read for each and then merge and order the data in the client:
      SELECT temperature FROM temperature
      WHERE weatherstation_id = '1234abcd' AND date = '2013-12-11'
      AND bucket_id = 10
      AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';
  • Time Series Bucket Example
    •  Only do this as a last resort.
    •  Reads become very expensive, i.e. n reads where n is the number of buckets in the range.
    •  If you're dealing with large volumes of data it can be hard work for the client to merge and re-order things.
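The client-side work for buckets looks roughly like the Java sketch below. BucketSketch, its bucket range of 16 and the epoch-millisecond event times are hypothetical. Writers pick a random bucket_id; readers issue one SELECT per bucket (simulated here by the lists passed in) and merge the results into one time-ordered list. Each bucket's rows already arrive sorted by event_time, so a k-way merge (for example with a PriorityQueue) would be cheaper than the plain sort used here for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical client logic for PRIMARY KEY ((weatherstation_id, date, bucket_id), event_time).
class BucketSketch {
    static final int BUCKETS = 16; // hypothetical bucket range: 0..15

    static final class Reading {
        final long eventTime; // e.g. epoch millis
        final String temperature;

        Reading(long eventTime, String temperature) {
            this.eventTime = eventTime;
            this.temperature = temperature;
        }
    }

    // Write path: each reading goes to a randomly chosen bucket.
    static int pickBucket() {
        return ThreadLocalRandom.current().nextInt(BUCKETS);
    }

    // Read path: one SELECT per bucket_id (BUCKETS queries), then merge
    // all per-bucket results into one list ordered by event time.
    static List<Reading> merge(List<List<Reading>> perBucket) {
        List<Reading> merged = new ArrayList<>();
        for (List<Reading> rows : perBucket) {
            merged.addAll(rows);
        }
        merged.sort(Comparator.comparingLong((Reading r) -> r.eventTime));
        return merged;
    }

    public static void main(String[] args) {
        List<List<Reading>> perBucket = List.of(
            List.of(new Reading(1000, "72F"), new Reading(4000, "74F")),
            List.of(new Reading(2000, "73F")),
            List.of(new Reading(3000, "73F")));
        for (Reading r : merge(perBucket)) {
            System.out.println(r.eventTime + " " + r.temperature);
        }
    }
}
```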
  • DataStax Native Java Driver
  • Features
    •  Provides CQL3 access to Cassandra using Java
    •  Utilizes Cassandra's native protocol
    •  Automatic routing of client requests
    •  Configurable consistency policy
    •  Automatic failover
    •  Tracing support
    •  Tunable policies
       •  Load balancing
       •  Reconnection
       •  Consistency
    •  Queries can be executed synchronously or asynchronously
    •  Supports prepared statements
    •  Non-blocking I/O
  • Cassandra clients - Drivers
    •  DataStax drivers for Cassandra
       •  Python
       •  C++
       •  Java
       •  C#
       •  And more on the way…
    •  http://www.datastax.com/download/clientdrivers
  • Where to get it?
    •  The latest release of the driver is available on Maven Central.
    •  You can install it in your application using the following Maven dependency:
    •  Documentation: http://www.datastax.com/documentation/developer/java-driver
    •  Javadoc: http://www.datastax.com/drivers/java/apidocs/index.html
  • Native Protocol
    •  To use CQL via the client drivers, you must set the property start_native_transport to true in the cassandra.yaml on every node.
    •  This protocol is an extremely efficient way of integrating with Cassandra.
    •  Supports synchronous and asynchronous requests
    •  Use the corresponding native driver in your app.
  • CQL to Java Mappings
      CQL3 Data Type | Java Type
      ---------------+---------------------
      ascii          | java.lang.String
      bigint         | long
      blob           | java.nio.ByteBuffer
      boolean        | boolean
      counter        | long
      decimal        | java.math.BigDecimal
      double         | double
      float          | float
      inet           | java.net.InetAddress
      int            | int
      list           | java.util.List<T>
      map            | java.util.Map<K, V>
      set            | java.util.Set<T>
      text           | java.lang.String
      timeuuid       | java.util.UUID
      uuid           | java.util.UUID
      varchar        | java.lang.String
      varint         | java.math.BigInteger
  • Connecting to a Cluster
    •  The Cluster class is your client app's entry point for connecting to Cassandra and getting back its metadata.
      Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44").build();
    •  You can pass in one or many node addresses to connect to.
    •  Make sure to tidy up your cluster after you're finished:
      cluster.shutdown();
  • Connecting to a Keyspace
    •  After connecting to the cluster, you create a Session on the keyspace you want to interact with.
      Session session = cluster.connect("akeyspace");
    •  Make sure to tidy up after yourself:
      session.shutdown();
  • Inserting Data
      import com.datastax.driver.core.exceptions.NoHostAvailableException;

      try {
          // Note the trailing space before "VALUES" so the concatenated CQL is valid
          session.execute(
              "INSERT INTO user (username, password) " +
              "VALUES ('user1', 'user1password');");
          session.execute(
              "INSERT INTO user (username, password) " +
              "VALUES ('user2', 'user2password');");
      } catch (NoHostAvailableException ex) {
          System.out.println("No Host available");
      }
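  • Reading Data
      import com.datastax.driver.core.ResultSet;
      import com.datastax.driver.core.Row;
      import com.datastax.driver.core.exceptions.NoHostAvailableException;
      import com.datastax.driver.core.exceptions.QueryExecutionException;
      import com.datastax.driver.core.exceptions.QueryValidationException;

      try {
          ResultSet result = session.execute(
              "SELECT password FROM user " +
              "WHERE username = 'user2';");
          if (result.isExhausted()) return;   // no matching row
          Row user = result.one();
          System.out.println("Password is: " + user.getString("password"));
      } catch (NoHostAvailableException ex) {
          System.out.println("No Host Available");
      } catch (QueryExecutionException ex) {
          // Valid query that could not be executed, e.g. not enough replicas
          System.out.println("Requested consistency level not met");
      } catch (QueryValidationException ex) {
          // Syntax error, invalid query or not authorized
          System.out.println("Invalid query");
      }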
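  • Prepared Statements
      import com.datastax.driver.core.BoundStatement;
      import com.datastax.driver.core.PreparedStatement;
      import com.datastax.driver.core.exceptions.NoHostAvailableException;
      import com.datastax.driver.core.exceptions.QueryExecutionException;
      import com.datastax.driver.core.exceptions.QueryValidationException;

      PreparedStatement statement = session.prepare(
          "INSERT INTO user (username, password) " +
          "VALUES (?, ?);");
      BoundStatement boundStatement = new BoundStatement(statement);
      try {
          session.execute(boundStatement.bind("user4", "user4password"));
      } catch (NoHostAvailableException ex) {
          System.out.println("Host Not Available");
      } catch (QueryExecutionException ex) {
          // Valid query that could not be executed, e.g. not enough replicas
          System.out.println("Requested consistency level not met");
      } catch (QueryValidationException ex) {
          // Syntax error, invalid query or not authorized
          System.out.println("Syntax error, invalid or not authorized");
      }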
  • Query Builder
      import com.datastax.driver.core.Query;
      import com.datastax.driver.core.ResultSet;
      import com.datastax.driver.core.Row;
      import com.datastax.driver.core.querybuilder.Insert;
      import com.datastax.driver.core.querybuilder.QueryBuilder;

      Insert insert = QueryBuilder.insertInto("user")
          .value("username", "rcohen")
          .value("password", "mypassword");
      session.execute(insert);

      Query query = QueryBuilder
          .select()
          .all()
          .from("akeyspace", "user");
      ResultSet rs = session.execute(query);
      for (Row row : rs) {
          // Two left-aligned, tab-separated columns
          System.out.println(String.format("%-20s\t%-20s",
              row.getString("username"), row.getString("password")));
      }
  • Consistency Level
    SimpleStatement simpleStatement = new SimpleStatement(
        "SELECT * FROM user WHERE username = 'user2';");
    // This will show the default consistency level of ConsistencyLevel.ONE
    System.out.println("Consistency level for this request: " +
        simpleStatement.getConsistencyLevel());
    // Now change the consistency level
    simpleStatement.setConsistencyLevel(ConsistencyLevel.ALL);
    •  You can also set the consistency level on statements built with the QueryBuilder:
    Insert insert = QueryBuilder.insertInto("user")
        .value("username", "johnny")
        .value("password", "mypassword");
    insert.setConsistencyLevel(ConsistencyLevel.ALL);
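Consistency levels are easier to reason about as replica counts. QUORUM means a majority of replicas, `RF / 2 + 1` with integer division, and a read is guaranteed to reach at least one replica holding the latest acknowledged write when the read and write replica counts satisfy R + W > RF. The sketch below assumes a single data centre and covers only ONE, QUORUM and ALL. The nested `Level` enum is a hypothetical stand-in for the driver's `ConsistencyLevel`, and multi-data-centre levels such as LOCAL_QUORUM are left out.

```java
class ConsistencyMath {
    // Hypothetical stand-in for a subset of the driver's ConsistencyLevel.
    enum Level { ONE, QUORUM, ALL }

    // Replicas that must acknowledge a request at this level,
    // for a single data centre with replication factor rf.
    static int required(Level level, int rf) {
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return rf / 2 + 1; // a majority of the replicas
            case ALL:
                return rf;
            default:
                throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    // The read and write replica sets must overlap: R + W > RF.
    static boolean readsSeeLatestWrite(Level read, Level write, int rf) {
        return required(read, rf) + required(write, rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        for (Level level : Level.values()) {
            System.out.println(level + " with RF=" + rf + " needs "
                    + required(level, rf) + " replica(s)");
        }
        System.out.println("QUORUM reads + QUORUM writes: "
                + readsSeeLatestWrite(Level.QUORUM, Level.QUORUM, rf));
        System.out.println("ONE reads + ONE writes: "
                + readsSeeLatestWrite(Level.ONE, Level.ONE, rf));
    }
}
```

With RF=3, QUORUM needs 2 replicas, so QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap while ONE plus ONE (1 + 1) does not.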
  • Tracing
    •  Tracing can help with debugging or analysing how Cassandra is handling your queries.
    Query insert = QueryBuilder.insertInto("simplex", "songs")
        .value("id", UUID.randomUUID())
        .value("title", "Golden Brown")
        .value("album", "La Folie")
        .value("artist", "The Stranglers")
        .setConsistencyLevel(ConsistencyLevel.ONE).enableTracing();
  • Tracing
    ResultSet results = getSession().execute(insert);
    ExecutionInfo executionInfo = results.getExecutionInfo();
    •  This ExecutionInfo object contains information on the hosts the driver tried to contact, the host it actually queried, and a QueryTrace object:
    QueryTrace queryTrace = executionInfo.getQueryTrace();
    •  With these two objects you can obtain detailed information on how your query performed.
  • Tracing
    Connected to cluster: xerxes
    Simplex keyspace and schema created.
    Host (queried): /127.0.0.1
    Host (tried): /127.0.0.1
    Trace id: 96ac9400-a3a5-11e2-96a9-4db56cdc5fe7
     activity                              | timestamp    | source     | source_elapsed
    ---------------------------------------+--------------+------------+---------------
     Parsing statement                     | 12:17:16.736 | /127.0.0.1 | 28
     Peparing statement                    | 12:17:16.736 | /127.0.0.1 | 199
     Determining replicas for mutation     | 12:17:16.736 | /127.0.0.1 | 348
     Sending message to /127.0.0.3         | 12:17:16.736 | /127.0.0.1 | 788
     Sending message to /127.0.0.2         | 12:17:16.736 | /127.0.0.1 | 805
     Acquiring switchLock read lock        | 12:17:16.736 | /127.0.0.1 | 828
     Appending to commitlog                | 12:17:16.736 | /127.0.0.1 | 848
     Adding to songs memtable              | 12:17:16.736 | /127.0.0.1 | 900
     Message received from /127.0.0.1      | 12:17:16.737 | /127.0.0.2 | 34
     Message received from /127.0.0.1      | 12:17:16.737 | /127.0.0.3 | 25
     Acquiring switchLock read lock        | 12:17:16.737 | /127.0.0.2 | 672
     Acquiring switchLock read lock        | 12:17:16.737 | /127.0.0.3 | 525
     Appending to commitlog                | 12:17:16.737 | /127.0.0.2 | 692
     Appending to commitlog                | 12:17:16.737 | /127.0.0.3 | 541
     Adding to songs memtable              | 12:17:16.737 | /127.0.0.2 | 741
     Adding to songs memtable              | 12:17:16.737 | /127.0.0.3 | 583
     Enqueuing response to /127.0.0.1      | 12:17:16.737 | /127.0.0.3 | 751
     Enqueuing response to /127.0.0.1      | 12:17:16.738 | /127.0.0.2 | 950
     Message received from /127.0.0.3      | 12:17:16.738 | /127.0.0.1 | 178
     Sending message to /127.0.0.1         | 12:17:16.738 | /127.0.0.2 | 1189
     Message received from /127.0.0.2      | 12:17:16.738 | /127.0.0.1 | 249
     Processing response from /127.0.0.3   | 12:17:16.738 | /127.0.0.1 | 345
     Processing response from /127.0.0.2   | 12:17:16.738 | /127.0.0.1 | 377
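Trace output like this can also be summarised programmatically. The sketch below assumes the rows are available as the plain `activity | timestamp | source | source_elapsed` text shown above. With the driver you would more likely read the events from the `QueryTrace` object instead. `TraceSummary` is a hypothetical helper that reports the largest `source_elapsed` value, in microseconds, logged by each node.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TraceSummary {
    // Expects rows shaped like "activity | timestamp | source | source_elapsed".
    // Rows without four columns or without a numeric source_elapsed
    // (the header and the dashed separator) are skipped.
    static Map<String, Integer> maxElapsedBySource(List<String> rows) {
        Map<String, Integer> result = new TreeMap<>();
        for (String row : rows) {
            String[] cols = row.split("\\|");
            if (cols.length != 4) {
                continue;
            }
            int elapsed;
            try {
                elapsed = Integer.parseInt(cols[3].trim());
            } catch (NumberFormatException notANumber) {
                continue;
            }
            // Keep the largest source_elapsed (microseconds) per source node.
            result.merge(cols[2].trim(), elapsed, Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                " activity | timestamp | source | source_elapsed",
                " Parsing statement | 12:17:16.736 | /127.0.0.1 | 28",
                " Adding to songs memtable | 12:17:16.736 | /127.0.0.1 | 900",
                " Message received from /127.0.0.1 | 12:17:16.737 | /127.0.0.2 | 34",
                " Sending message to /127.0.0.1 | 12:17:16.738 | /127.0.0.2 | 1189");
        System.out.println(maxElapsedBySource(rows));
    }
}
```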
  • OpsCenter
  • DataStax OpsCenter •  DataStax OpsCenter is a browser-based, visual management and monitoring solution for Apache Cassandra and DataStax Enterprise. •  Functionality is also exposed via HTTP APIs.
  • Thank You We power the big data apps that transform business.