Toronto Jaspersoft meetup

Toronto Jaspersoft User Group

Move. Faster.
Patrick McFadin, Principal Solution Architect
@PatrickMcFadin
©2012 DataStax

About Me/Moi?

• Principal Solution Architect at DataStax, THE Cassandra company

• Cassandra user since 0.7

• Prior

- Chief Architect at Hobsons

- Started a software services company. Link-11

• Follow me here: @PatrickMcFadin

Who is DataStax?

• We employ most of the Cassandra committers
• 24/7 support
• Consulting
• DataStax Enterprise


And beer!

And cupcakes! (??)


Our Solution
DataStax Enterprise allows you to focus on your Big Data applications instead of battling your underlying infrastructure:

•Velocity
•Volume
•Variety
•Complexity
•Distribution


DataStax Enterprise also includes…

•Log4j application log integration
•A single graphical management tool
•World-class support


Cassandra as real-time foundation

•Continuous availability
•Extreme scale
•Multi-datacenter support
•Cloud enablement
•Operational simplicity


Hadoop in the same system:

•Batch analytics
•Reduced data movement, fewer ETL operations
•No complex architectures
•Integrated Mahout, Sqoop, Hive, Pig, etc.


And we integrate Solr:

•Enterprise search
•Always indexed data
•Scalable performance
•Mission-critical dependability


Can we just talk about Cassandra

... and aliens.


Roots

Dynamo

BigTable


Core concepts Shared Nothing


Core concepts Replicated


Core concepts WAN Replication


Core concepts Scaling

• Need more write throughput? - add nodes
• Need more read throughput? - add nodes
• Cassandra scales in a linear fashion
• Massive number of ops/sec


Core concepts Scaling

Source: Solving big data challenges for enterprise application performance management
Proceedings of the VLDB Endowment, Volume 5 Issue 12, August 2012, Pages 1724-1735

Core concepts CAP Theorem

• Partition tolerance - Nodes can’t see each other but cluster is still up
• Consistency - Eventual, but Cassandra will not lose your data.
• Availability - Max uptime for clients

[Diagram: CAP triangle with two markers, "Cassandra lives here" and "...and sometimes lives here"]

It’s your choice!
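One way to read "your choice" is Cassandra's per-request consistency levels, which the slide does not spell out. The sketch below is an illustration in Python, not a driver API: the function names are made up, and it only shows the usual rule of thumb that a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # example replication factor, chosen for illustration
    q = quorum(rf)
    # Quorum writes plus quorum reads overlap on at least one replica.
    print("quorum size:", q)
    print("quorum/quorum consistent:", read_sees_latest_write(rf, q, q))
    # Writing and reading a single replica favors availability and latency.
    print("one/one consistent:", read_sees_latest_write(rf, 1, 1))
```

Lower replica counts lean toward availability; higher counts lean toward consistency, which is the trade-off the triangle describes.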


Core concepts Availability


Continuous Availability > High Availability

Your infrastructure will fail
...deal with it.


Data Model Basics


Data Model Basics Cluster

Cluster - Multiple Nodes acting together. Even over WAN.

Keyspace - Logical collection of Column Families. Stores
replication strategy.

Column Family (Table) - Stores rows of data


Data Model Basics Rows

• Unique in column family
• Hashed
• Randomly assigned to node*
• Indexed for speed

*You pick the partitioner. Please pick random. Please. Please. Please
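A rough Python sketch of what "hashed and randomly assigned" means with a random partitioner. This is a simplification under stated assumptions: the token space size, the node names, and the evenly spaced tokens are illustrative, and the real partitioner's token arithmetic differs in detail.

```python
import bisect
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(row_key: str) -> int:
    """Hash the row key with MD5 and map it onto the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_ring(nodes: List[str]) -> List[Tuple[int, str]]:
    """Give each node one token, spaced evenly around the ring."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def owner(row_key: str, ring: List[Tuple[int, str]]) -> str:
    """Return the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(row_key))
    return ring[index % len(ring)][1]


if __name__ == "__main__":
    ring = evenly_spaced_ring(["node1", "node2", "node3"])  # hypothetical nodes
    for key in ("ctodd", "pmcfadin"):
        print(key, "->", owner(key, ring))
```

Because the hash scatters keys uniformly, similar row keys land on unrelated nodes, which is why the random partitioner balances load without manual tuning.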

Data Model Basics Columns

• Assigned to a row
• Column Name: byte array, up to 64 KB
• Column Value: byte array, up to 2 GB (!!)
• Timestamp of when set
• Optional: Expire TTL
• Dynamic

[Diagram: a row containing columns; each column holds a column name, column value, timestamp, and TTL]
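To make the column fields concrete, here is a small in-memory Python sketch. It assumes timestamps are plain seconds and that the write with the newest timestamp wins; the class and method names are illustrative, and tie-breaking on equal timestamps is simplified compared with the real server.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: float              # write time in seconds (assumption of this sketch)
    ttl: Optional[float] = None   # optional time-to-live in seconds

    def expired(self, now: float) -> bool:
        """A column with a TTL stops being readable once the TTL has elapsed."""
        return self.ttl is not None and now >= self.timestamp + self.ttl


class Row:
    """A row is a dynamic set of columns; no column needs to be declared up front."""

    def __init__(self) -> None:
        self.columns: Dict[str, Column] = {}

    def set(self, column: Column) -> None:
        current = self.columns.get(column.name)
        # Newest timestamp wins; in this sketch a tie goes to the later call.
        if current is None or column.timestamp >= current.timestamp:
            self.columns[column.name] = column

    def get(self, name: str, now: float) -> Optional[str]:
        column = self.columns.get(name)
        if column is None or column.expired(now):
            return None
        return column.value
```

A write carrying an older timestamp is ignored even if it arrives later, and a column with a TTL simply disappears from reads once it expires.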


Data Model Basics Wide Rows

• How wide? 2 Billion columns!!!
• No schema needed
• Row key, many columns
• Add columns as needed per row


Data Model Basics Data Access

Thrift

• Cassandra's client API is built entirely on top of Thrift*
• Provides for manipulation of Data Model and Data
• Almost all current clients implement this API

CQL

• Cassandra Query Language
• New binary driver as of 1.2
• Extends functionality beyond Thrift



More about CQL

• Rapidly evolving spec
- Version 1 since Cassandra 0.8
- Final cut in 1.2
• Offers more features than Thrift
• DataStax Drivers


Data Model Basics Fixed schema

• Similar to an RDBMS table. Fairly fixed columns
• This example: Row key = username and is unique
• Use secondary indexes on firstname and lastname for lookup
• Adding columns with Cassandra is super easy (no downtime)

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email varchar,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

CREATE INDEX user_firstname ON users (firstname);
CREATE INDEX user_lastname ON users (lastname);


Data Model Basics One-to-many

• Videos have many comments
• Comments have many users
• Order is as inserted (reversible if needed)
• Use getSlice() to pull some or all of the comments

CREATE TABLE comments (
  videoid uuid,
  username varchar,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (videoid, username, comment_ts)
);


Data Model Basics One-to-many pt2

• Underlying storage model is still wide rows
• CQL presents as a table
• username and comment_ts are filterable

Wide row
Time ordered

-- Typically combined with a videoid = ... restriction so that only one wide row is read
SELECT comment
FROM comments
WHERE username = 'ctodd'
AND comment_ts > '2012-07-12 10:30:00';
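A simplified Python picture of how the comments table could sit on top of wide rows: one storage row per videoid, with columns ordered by (username, comment_ts). This is a sketch only. The real composite column names carry more detail, the function names are illustrative, and timestamps are kept as sortable strings for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WideRow = Dict[Tuple[str, str], str]


def to_wide_rows(cql_rows: List[dict]) -> Dict[str, WideRow]:
    """Group CQL rows by partition key; sort each row's columns by clustering key."""
    storage: Dict[str, WideRow] = defaultdict(dict)
    for row in cql_rows:
        column_name = (row["username"], row["comment_ts"])
        storage[row["videoid"]][column_name] = row["comment"]
    return {videoid: dict(sorted(columns.items()))
            for videoid, columns in storage.items()}


def slice_comments(wide_row: WideRow, username: str, after_ts: str) -> List[str]:
    """Return one user's comments newer than after_ts, in stored (time) order."""
    return [comment for (user, ts), comment in wide_row.items()
            if user == username and ts > after_ts]


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    rows = [
        {"videoid": "v1", "username": "ctodd",
         "comment_ts": "2012-07-12 11:00:00", "comment": "second"},
        {"videoid": "v1", "username": "ctodd",
         "comment_ts": "2012-07-12 09:00:00", "comment": "first"},
    ]
    wide = to_wide_rows(rows)
    print(slice_comments(wide["v1"], "ctodd", "2012-07-12 10:30:00"))
```

Because the columns are kept sorted inside each wide row, filtering on username and comment_ts amounts to reading a contiguous slice.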


Data Model Basics Query Tables

• No joins in Cassandra
• Filtering and scans can be expensive
• Tag is unique regardless of video
• Great for “List videos with X tag”
• Tags have to be updated in Video and Tag at the same time
• Index integrity is maintained in app logic

Powerful performance tool!

CREATE TABLE tag_index (
  tag varchar,
  videoid varchar,
  timestamp timestamp,
  PRIMARY KEY (tag, videoid)
);
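"Index integrity is maintained in app logic" can be sketched like this in Python. The two dictionaries stand in for a videos table and the tag_index table; the class and method names are hypothetical, and a real application would issue writes to both tables rather than touch in-memory dicts.

```python
from typing import Dict, Iterable, List, Set


class VideoTags:
    def __init__(self) -> None:
        self.videos: Dict[str, Set[str]] = {}     # videoid -> tags (stand-in for a videos table)
        self.tag_index: Dict[str, Set[str]] = {}  # tag -> videoids (stand-in for tag_index)

    def set_tags(self, videoid: str, tags: Iterable[str]) -> None:
        """Update the video and the tag index together so they never disagree."""
        new_tags = set(tags)
        old_tags = self.videos.get(videoid, set())
        for tag in old_tags - new_tags:           # tags removed from the video
            videoids = self.tag_index[tag]
            videoids.discard(videoid)
            if not videoids:
                del self.tag_index[tag]
        for tag in new_tags - old_tags:           # tags added to the video
            self.tag_index.setdefault(tag, set()).add(videoid)
        self.videos[videoid] = new_tags

    def videos_with_tag(self, tag: str) -> List[str]:
        """The "List videos with X tag" query: a single lookup by tag."""
        return sorted(self.tag_index.get(tag, set()))
```

The lookup never scans videos; the cost is paid at write time, when every tag change touches both structures.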


Data Model Basics Loading data

> 1 Million rows
• BI Tools - Talend, Pentaho, JasperSoft
• Custom code - My personal favorite
• sstableloader - Only for specific file types

sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles

Requires files to be in sstable format


Data Model Basics Loading data

< 1 Million rows
• Everything that worked for 1 million+ rows
• cqlsh COPY command
• Loads a delimited file into a table

COPY customers (Card_ID, Registration_Date, Gender, Birth_Date)
FROM 'Customers_File.txt'
WITH HEADER=true
AND DELIMITER=',';
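For the "custom code" route, the first step usually looks like the Python sketch below: split a delimited file into a header and rows that can then be handed to whatever client writes them to the table. The function name and the sample values are made up for illustration, and this is not how COPY itself is implemented.

```python
import csv
import io
from typing import List, Tuple


def read_delimited(text: str, delimiter: str = ",",
                   header: bool = True) -> Tuple[List[str], List[List[str]]]:
    """Split delimited text into (column names, data rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if header:
        return rows[0], rows[1:]
    return [], rows


if __name__ == "__main__":
    # Invented sample content in the shape of the customers file.
    sample = ("Card_ID,Registration_Date,Gender,Birth_Date\n"
              "1001,2012-01-15,F,1980-04-02\n")
    columns, data = read_delimited(sample)
    print(columns)
    print(data)
```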


Cassandra 1.2 Data Access

•Collections (maps, sets, lists)
•Support for virtual nodes (vnodes)
•Query Profiler
•Atomic batches
•Enhanced JBOD support
•Native binary CQL transport (no Thrift)
•Parallel leveled compactions
•Off-heap bloom filters


Collections

•Structure to column values
•Insert and update
  • Map
  • List
  • Set

cqlsh> CREATE TABLE users (
         user_id text PRIMARY KEY,
         first_name text,
         last_name text,
         emails set<text>
       );

http://www.datastax.com/dev/blog/cql3_collections

Request tracing
•Automatically stored for 24h
•Full path trace
•Includes node info

cqlsh> tracing on;
Now tracing requests.

cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 |              0
 Parsing statement                   | 00:02:37,015 | 127.0.0.1 |             81
 Preparing statement                 | 00:02:37,015 | 127.0.0.1 |            273
 Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
 Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779

 Message received from /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |             63
 Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
 Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
 Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
 Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
 Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
 Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888

 Message received from /127.0.0.2    | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
 Request complete                    | 00:02:37,017 | 127.0.0.1 |           2581

http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

Virtual Nodes (vnodes)
•Many nodes per JVM
•Tokens are auto-assigned (!!!)
•Faster...
✓repair
✓bootstrap
✓decommission

http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
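A toy Python model of the vnodes idea: each physical node picks many random tokens instead of one hand-assigned token, and ownership of the ring evens out on its own. The token space, the node names, and the figure of 256 tokens per node are assumptions used for this sketch, not a description of the server's internals.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def assign_tokens(nodes: List[str], tokens_per_node: int,
                  seed: int = 0) -> List[Tuple[int, str]]:
    """Auto-assign random tokens to every node and return the sorted ring."""
    rng = random.Random(seed)
    ring = [(rng.randrange(RING_SIZE), node)
            for node in nodes
            for _ in range(tokens_per_node)]
    ring.sort()
    return ring


def ownership(ring: List[Tuple[int, str]]) -> Dict[str, float]:
    """Fraction of the ring each node owns; a token owns the range ending at it."""
    owned: Counter = Counter()
    previous = ring[-1][0] - RING_SIZE  # wrap around from the last token
    for token, node in ring:
        owned[node] += token - previous
        previous = token
    return {node: size / RING_SIZE for node, size in owned.items()}


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]  # hypothetical node names
    for count in (1, 256):
        shares = ownership(assign_tokens(nodes, count))
        print(count, {n: round(s, 3) for n, s in sorted(shares.items())})
```

With many small ranges per node, a joining or leaving node exchanges data with many peers at once, which is the intuition behind faster repair, bootstrap, and decommission.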
