Toronto Jaspersoft Meetup

  1. Toronto Jaspersoft User Group. Move. Faster. Patrick McFadin, Principal Solution Architect, @PatrickMcFadin. ©2012 DataStax
  2. About Me/Moi?
      • Principal Solution Architect at DataStax, THE Cassandra company
      • Cassandra user since 0.7
      • Prior: Chief Architect at Hobsons; started a software services company, Link-11
      • Follow me here: @PatrickMcFadin
  3. Who is DataStax?
      • We employ most of the Cassandra committers
      • 24/7 support
      • Consulting
      • DataStax Enterprise
  4. And beer! And cupcakes! (??)
  5. Our Solution: DataStax Enterprise allows you to focus on your Big Data applications instead of battling your underlying infrastructure:
      • Velocity
      • Volume
      • Variety
      • Complexity
      • Distribution
  6. DataStax Enterprise also includes…
      • Log4j application log integration
      • A single graphical management tool
      • World-class support
  7. Cassandra as real-time foundation
      • Continuous availability
      • Extreme scale
      • Multi-datacenter support
      • Cloud enablement
      • Operational simplicity
  8. Hadoop in the same system:
      • Batch analytics
      • Reduced data movement, fewer ETL operations
      • No complex architectures
      • Integrated Mahout, Sqoop, Hive, Pig, etc.
  9. And we integrate Solr:
      • Enterprise search
      • Always indexed data
      • Scalable performance
      • Mission-critical dependability
  10. Can we just talk about Cassandra ... and aliens.
  11. Roots: Dynamo, BigTable
  12. Core concepts: Shared Nothing
  13. Core concepts: Replicated
  14. Core concepts: WAN Replication
  15. Core concepts: Scaling
      • Need more write throughput? Add nodes
      • Need more read throughput? Add nodes
      • Cassandra scales in a linear fashion
      • Massive number of ops/sec
  16. Core concepts: Scaling. Source: "Solving big data challenges for enterprise application performance management", Proceedings of the VLDB Endowment, Volume 5, Issue 12, August 2012, pages 1724-1735.
  17. Core concepts: CAP Theorem
      • Consistency: Eventual, but Cassandra will not lose your data.
      • Availability: Max uptime for clients.
      • Partition tolerance: Nodes can't see each other but cluster is still up.
      • Cassandra lives here ... and sometimes lives here. It's your choice!
  18. Core concepts: Availability. Continuous Availability > High Availability. Your infrastructure will fail ... deal with it.
  19. Data Model Basics
  20. Data Model Basics: Cluster
      • Cluster: Multiple nodes acting together, even over WAN.
      • Keyspace: Logical collection of Column Families. Stores the replication strategy.
      • Column Family (Table): Stores rows of data.
  21. Data Model Basics: Rows
      • Unique in column family
      • Hashed
      • Randomly assigned to node*
      • Indexed for speed
      *You pick the partitioner. Please pick random. Please. Please. Please.
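The "hashed" and "randomly assigned to node" bullets can be illustrated with a small sketch. This is not Cassandra's implementation: it assumes an MD5-based token in the spirit of the random partitioner, a hypothetical three-node cluster with evenly spaced tokens, and invented node names and row keys.

```python
import bisect
import hashlib

# Assumption: a 0 .. 2**127 token space, as used by the random partitioner.
RING_SIZE = 2 ** 127


def token_for(row_key: str) -> int:
    """Hash a row key to a token (simplified: MD5 digest modulo the ring size)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(nodes):
    """Give each node one evenly spaced token and return the ring sorted by token."""
    step = RING_SIZE // len(nodes)
    return sorted((i * step, name) for i, name in enumerate(nodes))


def owner(ring, row_key: str) -> str:
    """Return the node with the first token >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(row_key))
    return ring[idx % len(ring)][1]


# Hypothetical node names and row keys, for illustration only.
ring = build_ring(["node-a", "node-b", "node-c"])
for key in ("pmcfadin", "ctodd", "jsmith"):
    print(key, "->", owner(ring, key))
```

Because placement depends only on the hash of the row key, similar keys do not cluster on one node, which is the usual argument for picking the random partitioner.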
  22. Data Model Basics: Columns
      • Assigned to a row
      • Column Name: 64k ByteArray
      • Column Value: 2G ByteArray (!!)
      • Timestamp of when set
      • Optional: Expire TTL
      • Dynamic
      Diagram: Row → Column Name ... | Column Value | Timestamp | TTL
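As a rough mental model of the column structure on this slide, the sketch below treats a column as a name, a value, a write timestamp, and an optional TTL. The class and function names are hypothetical, the timestamp unit (microseconds) is an assumption, and the "highest timestamp wins" reconciliation reflects how Cassandra resolves conflicting writes rather than anything stated on the slide.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int              # assumed unit: microseconds since the epoch
    ttl: Optional[int] = None   # seconds; None means the column never expires

    def is_live(self, now_us: int) -> bool:
        """A column with a TTL stops being readable once the TTL has elapsed."""
        if self.ttl is None:
            return True
        return now_us < self.timestamp + self.ttl * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


now = int(time.time() * 1_000_000)
old = Column(b"email", b"old@example.com", timestamp=now - 10)
new = Column(b"email", b"new@example.com", timestamp=now)
session = Column(b"token", b"abc123", timestamp=now, ttl=60)

print(reconcile(old, new).value)               # the newer write
print(session.is_live(now + 30 * 1_000_000))   # within the 60 s TTL
print(session.is_live(now + 61 * 1_000_000))   # past the TTL
```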
  23. Data Model Basics: Wide Rows
      • How wide? 2 Billion columns!!!
      • No schema needed
      • Row key, many columns
      • Add columns as needed per row
  24. Data Model Basics: Data Access
      Thrift
      • Cassandra's client API is built entirely on top of Thrift
      • Provides for manipulation of Data Model and Data
      • Almost all current clients implement this API
      CQL
      • Cassandra Query Language
      • New binary driver as of 1.2
      • Extends functionality beyond Thrift
  25. Data Model Basics: Data Access. More about CQL
      • Rapidly evolving spec
          - Version 1 since Cassandra 0.8
          - Version 2 since Cassandra 1.0
          - Version 3 since Cassandra 1.1
          - Final cut in 1.2
      • Offers more features than Thrift
      • DataStax Drivers
  26. Data Model Basics: Fixed schema
      • Similar to an RDBMS table. Fairly fixed columns
      • This example: row key = username and is unique
      • Use secondary indexes on firstname and lastname for lookup
      • Adding columns with Cassandra is super easy (no downtime)

      CREATE TABLE users (
          username varchar,
          firstname varchar,
          lastname varchar,
          email varchar,
          password varchar,
          created_date timestamp,
          PRIMARY KEY (username)
      );
      CREATE INDEX user_firstname ON users (firstname);
      CREATE INDEX user_lastname ON users (lastname);
  27. Data Model Basics: One-to-many
      • Videos have many comments
      • Comments have many users
      • Order is as inserted (reversible if needed)
      • Use getSlice() to pull some or all of the comments

      CREATE TABLE comments (
          videoid uuid,
          username varchar,
          comment_ts timestamp,
          comment varchar,
          PRIMARY KEY (videoid, username, comment_ts)
      );
  28. Data Model Basics: One-to-many pt2
      • Underlying storage model is still wide rows
      • CQL presents as a table
      • username and comment_ts are filterable
      Diagram labels: Wide row, Time ordered

      SELECT comment
      FROM comments
      WHERE username = 'ctodd'
      AND comment_ts > '2012-07-12 10:30:00';
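The relationship between the CQL table and the underlying wide row can be sketched in a few lines. This is a toy in-memory model, not Cassandra's storage engine: it assumes one sorted list per videoid, ordered by (username, comment_ts), and uses string video ids and invented comments for brevity.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# One "wide row" per videoid: a list of ((username, comment_ts), comment) kept sorted.
wide_rows = defaultdict(list)


def insert_comment(videoid, username, comment_ts, comment):
    """Add a column to the video's wide row, keeping the columns sorted."""
    bisect.insort(wide_rows[videoid], ((username, comment_ts), comment))


def slice_comments(videoid, username, after_ts):
    """Return one user's comments on a video that are newer than after_ts."""
    row = wide_rows[videoid]
    keys = [key for key, _ in row]
    start = bisect.bisect_right(keys, (username, after_ts))
    result = []
    for (user, _ts), comment in row[start:]:
        if user != username:
            break  # columns are sorted, so the user's range has ended
        result.append(comment)
    return result


# Hypothetical data, for illustration only.
insert_comment("video-1", "ctodd", datetime(2012, 7, 12, 10, 0), "early comment")
insert_comment("video-1", "ctodd", datetime(2012, 7, 12, 11, 0), "later comment")
insert_comment("video-1", "pmcfadin", datetime(2012, 7, 12, 12, 0), "another user")

print(slice_comments("video-1", "ctodd", datetime(2012, 7, 12, 10, 30)))
```

Because the columns are stored in sorted order, the range lookup is a contiguous slice of the row rather than a scan, which is why the clustering columns are cheap to filter on.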
  29. Data Model Basics: Query Tables
      • No joins in Cassandra
      • Filtering and scans can be expensive
      • Tag is unique regardless of video
      • Great for "List videos with X tag"
      • Tags have to be updated in Video and Tag at the same time
      • Index integrity is maintained in app logic
      Powerful performance tool!

      CREATE TABLE tag_index (
          tag varchar,
          videoid varchar,
          timestamp timestamp,
          PRIMARY KEY (tag, videoid)
      );
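"Index integrity is maintained in app logic" means the application writes to both tables itself. A minimal sketch of that pattern follows, using plain dictionaries as stand-ins for the video and tag_index tables; the function names and the shape of the video record are assumptions, not part of the talk.

```python
from datetime import datetime, timezone

videos = {}      # stand-in for the video table: videoid -> {"tags": set of tags}
tag_index = {}   # stand-in for tag_index: tag -> {videoid: timestamp}


def add_tag(videoid, tag):
    """Write the tag to the video record and to the query table together."""
    now = datetime.now(timezone.utc)
    videos.setdefault(videoid, {"tags": set()})["tags"].add(tag)
    tag_index.setdefault(tag, {})[videoid] = now


def remove_tag(videoid, tag):
    """Remove the tag from both places so the index does not go stale."""
    videos.get(videoid, {"tags": set()})["tags"].discard(tag)
    tag_index.get(tag, {}).pop(videoid, None)


def videos_with_tag(tag):
    """'List videos with X tag' is a single lookup by tag, with no join or scan."""
    return sorted(tag_index.get(tag, {}))


# Hypothetical ids and tags, for illustration only.
add_tag("video-1", "cassandra")
add_tag("video-2", "cassandra")
add_tag("video-2", "nosql")
print(videos_with_tag("cassandra"))
remove_tag("video-1", "cassandra")
print(videos_with_tag("cassandra"))
```

In a real cluster the two writes are separate mutations, so the application has to decide what to do if one of them fails.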
  30. Data Model Basics: Loading data, > 1 Million rows
      • BI Tools: Talend, Pentaho, JasperSoft
      • Custom code: My personal favorite
      • sstableloader: Only for specific file types

      sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles

      Requires files to be in sstable format
  31. Data Model Basics: Loading data, < 1 Million rows
      • Everything that worked for 1 Million +
      • CQL COPY command
      • Loads a delimited file into a table

      COPY customers (Card_ID, Registration_Date, Gender, Birth_Date)
      FROM 'Customers_File.txt'
      WITH HEADER = true AND DELIMITER = ',';
  32. Cassandra 1.2 Data Access
      • Collections (maps, sets, lists)
      • Support for virtual nodes (vnodes)
      • Query Profiler
      • Atomic batches
      • Enhanced JBOD support
      • Native binary CQL transport (no Thrift)
      • Parallel leveled compactions
      • Off-heap bloom filters
  33. Collections
      • Structure to column values
      • Insert and update
          • Map
          • List
          • Set

      cqlsh> CREATE TABLE users (
                 user_id text PRIMARY KEY,
                 first_name text,
                 last_name text,
                 emails set<text>
             );

      http://www.datastax.com/dev/blog/cql3_collections
  34. Request tracing
      • Automatically stored for 24h
      • Full path trace
      • Includes node info

      cqlsh> tracing on;
      Now tracing requests.
      cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
      Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

       activity                            | timestamp    | source    | source_elapsed
      -------------------------------------+--------------+-----------+----------------
       execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 |              0
       Parsing statement                   | 00:02:37,015 | 127.0.0.1 |             81
       Preparing statement                 | 00:02:37,015 | 127.0.0.1 |            273
       Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
       Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779
       Messsage received from /127.0.0.1   | 00:02:37,016 | 127.0.0.2 |             63
       Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
       Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
       Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
       Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
       Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
       Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888
       Messsage received from /127.0.0.2   | 00:02:37,017 | 127.0.0.1 |           2334
       Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
       Request complete                    | 00:02:37,017 | 127.0.0.1 |           2581

      http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
  35. Virtual Nodes (vnodes)
      • Many nodes per JVM
      • Tokens are auto-assigned (!!!)
      • Faster...
          ✓ repair
          ✓ bootstrap
          ✓ decommission
      http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
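A rough way to picture "many nodes per JVM" with auto-assigned tokens: each physical node picks many random tokens, so it owns many small slices of the ring instead of one large one. The sketch below is a simplification under stated assumptions: random token selection with a fixed seed, 256 tokens per node as an example value, and hypothetical node names. It is not Cassandra's allocation code.

```python
import random
from collections import Counter

RING_SIZE = 2 ** 127  # assumed token space


def assign_tokens(nodes, num_tokens, seed=42):
    """Give every node num_tokens random tokens; return the ring sorted by token."""
    rng = random.Random(seed)
    ring = [(rng.randrange(RING_SIZE), node)
            for node in nodes
            for _ in range(num_tokens)]
    return sorted(ring)


def ownership(ring):
    """Fraction of the token space owned by each node.

    A token owns the range from the previous token (exclusive) up to itself,
    and the first token's range wraps around from the last token on the ring.
    """
    owned = Counter()
    previous = ring[-1][0] - RING_SIZE
    for token, node in ring:
        owned[node] += token - previous
        previous = token
    return {node: owned[node] / RING_SIZE for node in owned}


ring = assign_tokens(["node-a", "node-b", "node-c"], num_tokens=256)
for node, share in sorted(ownership(ring).items()):
    print(f"{node}: {share:.1%} of the ring across 256 ranges")
```

With many small ranges per node, a joining or leaving node exchanges data with many peers at once, which is the intuition behind the faster repair, bootstrap, and decommission claims on the slide.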
  36. Data Model Basics: Data Access. DEMO
