
Toronto jaspersoft meetup


  1. Toronto Jaspersoft User Group • Move. Faster. • Patrick McFadin, Principal Solution Architect • @PatrickMcFadin • ©2012 DataStax
  2. About Me/Moi? • Principal Solution Architect at DataStax, THE Cassandra company • Cassandra user since 0.7 • Prior: Chief Architect at Hobsons; started a software services company, Link-11 • Follow me here: @PatrickMcFadin
  3. Who is DataStax? • We employ most of the Cassandra committers • 24/7 support • Consulting • DataStax Enterprise
  4. And beer! And cupcakes! (??)
  5. Our Solution • DataStax Enterprise allows you to focus on your Big Data applications instead of battling your underlying infrastructure: • Velocity • Volume • Variety • Complexity • Distribution
  6. DataStax Enterprise also includes… • Log4j application log integration • A single graphical management tool • World-class support
  7. Cassandra as real-time foundation • Continuous availability • Extreme scale • Multi-datacenter support • Cloud enablement • Operational simplicity
  8. Hadoop in the same system: • Batch analytics • Reduced data movement, fewer ETL operations • No complex architectures • Integrated Mahout, Sqoop, Hive, Pig, etc.
  9. And we integrate Solr: • Enterprise search • Always indexed data • Scalable performance • Mission-critical dependability
  10. Can we just talk about Cassandra ... and aliens.
  11. Roots • Dynamo • BigTable
  12. Core concepts • Shared Nothing
  13. Core concepts • Replicated
  14. Core concepts • WAN Replication
  15. Core concepts: Scaling • Need more write throughput? Add nodes • Need more read throughput? Add nodes • Cassandra scales in a linear fashion • Massive number of ops/sec
  16. Core concepts: Scaling • Source: "Solving big data challenges for enterprise application performance management", Proceedings of the VLDB Endowment, Volume 5, Issue 12, August 2012, pages 1724-1735
  17. Core concepts: CAP Theorem • Consistency: eventual, but Cassandra will not lose your data • Availability: max uptime for clients • Partition tolerance: nodes can't see each other but the cluster is still up • (Diagram labels: "Cassandra lives here", "...and sometimes lives here") • It's your choice!
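The "It's your choice!" note on the CAP slide presumably points at per-request tunable consistency. A minimal sketch, assuming the usual replica-overlap rule (a read is guaranteed to see the latest acknowledged write when write replicas + read replicas exceed the replication factor). The function names are hypothetical and are not Cassandra APIs.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return write_acks + read_acks > replication_factor


rf = 3  # assumed replication factor, for illustration only
# Majority writes + majority reads: 2 + 2 > 3, so the two sets always overlap.
strong = reads_see_latest_write(rf, quorum(rf), quorum(rf))
# One replica for writes and reads: 1 + 1 <= 3, so a read may hit a stale replica.
eventual = reads_see_latest_write(rf, 1, 1)
print(strong, eventual)
```

Waiting on fewer replicas favors availability and latency; waiting on more favors consistency.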
  18. Core concepts: Availability • Continuous Availability > High Availability • Your infrastructure will fail ... deal with it.
  19. Data Model Basics
  20. Data Model Basics • Cluster: multiple nodes acting together, even over WAN • Keyspace: logical collection of column families; stores the replication strategy • Column Family (Table): stores rows of data
  21. Data Model Basics: Rows • Unique in column family • Hashed • Randomly assigned to node* • Indexed for speed • *You pick the partitioner. Please pick random. Please. Please. Please.
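The "hashed" and "randomly assigned to node" bullets can be pictured with a toy token ring. This is a simplified sketch, not Cassandra's partitioner code: it assumes an MD5-based token reduced into a 0 to 2**127 - 1 space and one evenly spaced token per node. The node names and helper functions are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumption: token space of 0 .. 2**127 - 1


def token_for(row_key: str) -> int:
    """Hash a row key to a ring position (simplified MD5-based token)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(node_names):
    """Give each node one evenly spaced token; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(node_names)
    return sorted((i * step, name) for i, name in enumerate(node_names))


def owner_of(ring, row_key: str) -> str:
    """The first node whose token is >= the key's token owns the row (wrapping around)."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(row_key))
    return ring[idx % len(ring)][1]


ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
placement = {key: owner_of(ring, key) for key in ("ctodd", "pmcfadin", "jsmith")}
print(placement)
```

Because the hash spreads keys roughly uniformly over the ring, similar row keys do not pile up on one node, which is the practical argument for picking the random partitioner.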
  22. Data Model Basics: Columns • Assigned to a row • Column Name: 64k ByteArray • Column Value: 2G ByteArray (!!) • Timestamp of when set • Optional: Expire TTL • Dynamic • (Diagram: a Row holding Column Name ..., Column Value, Timestamp, TTL)
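The column layout on this slide (name, value, timestamp, optional TTL) maps naturally onto a small record type. A hedged sketch with hypothetical names: it assumes expiry is measured from the column timestamp and that conflicting versions are resolved by "highest timestamp wins", which simplifies what a real node does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int                      # microseconds since epoch, set by the writer
    ttl_seconds: Optional[int] = None   # None means the column never expires

    def is_expired(self, now_micros: int) -> bool:
        """A column with a TTL expires ttl_seconds after its timestamp (simplification)."""
        if self.ttl_seconds is None:
            return False
        return now_micros >= self.timestamp + self.ttl_seconds * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


old = Column(b"email", b"old@example.com", timestamp=1_000_000)
new = Column(b"email", b"new@example.com", timestamp=2_000_000, ttl_seconds=60)
winner = reconcile(old, new)
print(winner.value, winner.is_expired(now_micros=63_000_000))
```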
  23. Data Model Basics: Wide Rows • How wide? 2 billion columns!!! • No schema needed • Row key, many columns • Add columns as needed per row
  24. Data Model Basics: Data Access • Thrift: Cassandra's client API is built entirely on top of Thrift; provides for manipulation of the data model and the data; almost all current clients implement this API • CQL: Cassandra Query Language; new binary driver as of 1.2; extends functionality beyond Thrift
  25. Data Model Basics: Data Access • More about CQL • Rapidly evolving spec: Version 1 since Cassandra 0.8; Version 2 since Cassandra 1.0; Version 3 since Cassandra 1.1; final cut in 1.2 • Offers more features than Thrift • DataStax Drivers
  26. Data Model Basics: Fixed schema • Similar to an RDBMS table. Fairly fixed columns • This example: row key = username and is unique • Use secondary indexes on firstname and lastname for lookup • Adding columns with Cassandra is super easy (no downtime) • CREATE TABLE users ( username varchar, firstname varchar, lastname varchar, email varchar, password varchar, created_date timestamp, PRIMARY KEY (username) ); CREATE INDEX user_firstname ON users (firstname); CREATE INDEX user_lastname ON users (lastname);
  27. Data Model Basics: One-to-many • Videos have many comments • Comments have many users • Order is as inserted (reversible if needed) • Use getSlice() to pull some or all of the comments • CREATE TABLE comments ( videoid uuid, username varchar, comment_ts timestamp, comment varchar, PRIMARY KEY (videoid, username, comment_ts) );
  28. Data Model Basics: One-to-many pt2 • Underlying storage model is still wide rows • CQL presents as a table • username and comment_ts are filterable • (Diagram labels: Wide row, Time ordered) • SELECT comment FROM comments WHERE username = 'ctodd' AND comment_ts > '2012-07-12 10:30:00';
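To make "CQL presents as a table, storage is still wide rows" concrete, here is a toy in-memory model of the comments table. It is only an illustration under stated assumptions: each videoid is one wide row, each CQL row becomes one cell keyed by (username, comment_ts), and cells are read back in sorted key order. The string video id and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# videoid -> {(username, comment_ts): comment}; one dict per partition (wide row)
wide_rows = defaultdict(dict)


def insert_comment(videoid, username, comment_ts, comment):
    """One CQL row becomes one cell inside the video's wide row."""
    wide_rows[videoid][(username, comment_ts)] = comment


def slice_comments(videoid, username, since):
    """Return one user's comments on one video newer than `since`, in key order."""
    cells = sorted(wide_rows[videoid].items())  # sorted by (username, comment_ts)
    return [comment for (user, ts), comment in cells if user == username and ts > since]


insert_comment("video-1", "ctodd", datetime(2012, 7, 12, 11, 0), "Nice talk")
insert_comment("video-1", "ctodd", datetime(2012, 7, 12, 9, 0), "First!")
insert_comment("video-1", "pmcfadin", datetime(2012, 7, 12, 12, 0), "Thanks")
recent = slice_comments("video-1", "ctodd", datetime(2012, 7, 12, 10, 30))
print(recent)
```

The sketch scans the whole partition for simplicity; the point of the sorted layout is that a real slice can read a contiguous range instead.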
  29. Data Model Basics: Query Tables • No joins in Cassandra • Filtering and scans can be expensive • Tag is unique regardless of video • Great for "List videos with X tag" • Tags have to be updated in Video and Tag at the same time • Index integrity is maintained in app logic • Powerful performance tool! • CREATE TABLE tag_index ( tag varchar, videoid varchar, timestamp timestamp, PRIMARY KEY (tag, videoid) );
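"Index integrity is maintained in app logic" means the application writes to the video record and to the tag index together. A minimal sketch using plain dictionaries as stand-ins for the two tables; the function names and sample data are hypothetical, and a real application would issue the corresponding writes through its Cassandra client.

```python
from collections import defaultdict

videos = {}                    # videoid -> {"title": ..., "tags": set()}
tag_index = defaultdict(set)   # tag -> set of videoids (the query table)


def add_video(videoid, title, tags):
    """Write the video and every matching index entry in the same operation."""
    videos[videoid] = {"title": title, "tags": set(tags)}
    for tag in tags:
        tag_index[tag].add(videoid)


def retag_video(videoid, new_tags):
    """Update the video and the index together; the app keeps them consistent."""
    old_tags = videos[videoid]["tags"]
    new_tags = set(new_tags)
    for tag in old_tags - new_tags:
        tag_index[tag].discard(videoid)
    for tag in new_tags - old_tags:
        tag_index[tag].add(videoid)
    videos[videoid]["tags"] = new_tags


def videos_with_tag(tag):
    """"List videos with X tag" is a single lookup, no join or scan."""
    return sorted(tag_index.get(tag, set()))


add_video("v1", "Cassandra intro", ["cassandra", "nosql"])
add_video("v2", "Hadoop basics", ["hadoop", "nosql"])
retag_video("v1", ["cassandra", "cql"])
print(videos_with_tag("nosql"))
```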
  30. Data Model Basics: Loading data > 1 Million rows • BI Tools: Talend, Pentaho, JasperSoft • Custom code: my personal favorite • sstableloader: only for specific file types; requires files to be in sstable format • sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles
  31. Data Model Basics: Loading data < 1 Million rows • Everything that worked for 1 Million + • CQL COPY command • Loads a delimited file into a table • COPY customers (Card_ID, Registration_Date, Gender, Birth_Date) FROM 'Customers_File.txt' WITH HEADER = true AND DELIMITER = ',';
  32. Cassandra 1.2 Data Access • Collections (maps, sets, lists) • Support for virtual nodes (vnodes) • Query profiler • Atomic batches • Enhanced JBOD support • Native binary CQL transport (no Thrift) • Parallel leveled compactions • Off-heap bloom filters
  33. Collections • Structure to column values • Insert and update • Map • List • Set • cqlsh> CREATE TABLE users ( user_id text PRIMARY KEY, first_name text, last_name text, emails set<text> ); • http://www.datastax.com/dev/blog/cql3_collections
  34. Request tracing • Automatically stored for 24h • Full path trace • Includes node info
      cqlsh> tracing on;
      Now tracing requests.
      cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
      Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
       activity                            | timestamp    | source    | source_elapsed
      -------------------------------------+--------------+-----------+----------------
       execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 | 0
       Parsing statement                   | 00:02:37,015 | 127.0.0.1 | 81
       Preparing statement                 | 00:02:37,015 | 127.0.0.1 | 273
       Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 | 540
       Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 | 779
       Messsage received from /127.0.0.1   | 00:02:37,016 | 127.0.0.2 | 63
       Applying mutation                   | 00:02:37,016 | 127.0.0.2 | 220
       Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 | 250
       Appending to commitlog              | 00:02:37,016 | 127.0.0.2 | 277
       Adding to memtable                  | 00:02:37,016 | 127.0.0.2 | 378
       Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 | 710
       Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 | 888
       Messsage received from /127.0.0.2   | 00:02:37,017 | 127.0.0.1 | 2334
       Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550
       Request complete                    | 00:02:37,017 | 127.0.0.1 | 2581
      http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
  35. Virtual Nodes (vnodes) • Many nodes per JVM • Tokens are auto-assigned (!!!) • Faster... ✓ repair ✓ bootstrap ✓ decommission • http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
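The "many nodes per JVM" and "tokens are auto-assigned" bullets can be sketched as each physical node picking many random tokens instead of one hand-calculated token. This is a toy model under stated assumptions: a 0 to 2**127 - 1 token space, 256 tokens per node chosen for illustration, and a seeded random generator so the sketch is repeatable. The function names are hypothetical.

```python
import random

RING_SIZE = 2 ** 127  # assumption: token space of 0 .. 2**127 - 1


def assign_vnode_tokens(node_names, tokens_per_node=256, seed=42):
    """Give every node `tokens_per_node` random tokens; returns sorted (token, node) pairs."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    ring = [(rng.randrange(RING_SIZE), name)
            for name in node_names
            for _ in range(tokens_per_node)]
    return sorted(ring)


def ownership(ring):
    """Fraction of the token space each node owns (each range ends at the node's token)."""
    owned = {name: 0 for _, name in ring}
    prev_token = ring[-1][0] - RING_SIZE  # wrap around from the last token
    for token, name in ring:
        owned[name] += token - prev_token
        prev_token = token
    return {name: size / RING_SIZE for name, size in owned.items()}


ring = assign_vnode_tokens(["node-a", "node-b", "node-c"])
shares = ownership(ring)
print(shares)
```

With many small ranges per node, a joining or leaving node exchanges data with many peers at once rather than with a single neighbor, which is the intuition behind the faster repair, bootstrap and decommission claims.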
  36. Data Model Basics: Data Access • DEMO
