Toronto Jaspersoft User Group



 Move. Faster.
Patrick McFadin, Principal Solution Architect
@PatrickMcFadin
©2012 DataStax
About Me/Moi?



                 •   Principal Solution Architect at DataStax, THE
                     Cassandra company

                 •   Cassandra user since 0.7

                 •   Prior

                     -   Chief Architect at Hobsons

                     -   Started a software services company. Link-11

                 •   Follow me here: @PatrickMcFadin
Who is DataStax?



                 • We employ most of the Cassandra committers
                 • 24/7 support
                 • Consulting
                 • DataStax Enterprise




And beer!




                 And cupcakes! (??)




Our Solution
DataStax Enterprise allows
you to focus on your Big Data
applications instead of battling
your underlying infrastructure:

•Velocity
•Volume
•Variety
•Complexity
•Distribution


DataStax Enterprise
also includes…

•Log4j application log integration
•A single graphical management
tool
•World-class support




Cassandra as real-time foundation

•Continuous availability
•Extreme scale
•Multi-datacenter support
•Cloud enablement
•Operational simplicity




Hadoop in the
same system:

•Batch analytics
•Reduced data movement,
fewer ETL operations
•No complex architectures
•Integrated Mahout, Sqoop,
Hive, Pig, etc.


And we integrate
Solr:

•Enterprise search
•Always indexed data
•Scalable performance
•Mission-critical dependability




Can we just talk
                 about Cassandra

                  ... and aliens.




Roots

             Dynamo




             BigTable




Core concepts   Shared Nothing




Core concepts   Replicated




Core concepts   WAN Replication




Core concepts                    Scaling


     • Need more write throughput? - add nodes
     • Need more read throughput? - add nodes
     • Cassandra scales in a linear fashion
     • Massive number of ops/sec




Core concepts                                                Scaling




                 Source: Solving big data challenges for enterprise application performance management
                 Proceedings of the VLDB Endowment, Volume 5 Issue 12, August 2012, Pages 1724-1735
Core concepts                               CAP Theorem




     • Partition tolerance - Nodes can’t see each other but cluster is still up
     • Consistency - Eventual, but Cassandra will not lose your data.
     • Availability - Max uptime for clients

     [Diagram: CAP triangle. "Cassandra lives here" marks one position, "...and sometimes lives here" marks another.]

     It’s your choice!
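
The "It's your choice!" trade-off is usually made per request by choosing how many replicas must acknowledge a read or a write. A minimal sketch of that arithmetic, assuming a replication factor `rf` and illustrative function names that are not part of any Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf
```

With `rf = 3`, writing and reading at quorum (2 + 2 > 3) overlaps on at least one replica, while writing and reading a single replica (1 + 1) favors availability and leaves consistency eventual.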

Core concepts                     Availability






Continuous Availability > High Availability

Your infrastructure will fail
    ...deal with it.



Data Model Basics




Data Model Basics                         Cluster



           Cluster - Multiple Nodes acting together. Even over WAN.


                   Keyspace - Logical collection of Column Families. Stores
                    replication strategy.

                                 Column Family (Table) - Stores rows of data




Data Model Basics                          Rows




            • Unique in column family
            • Hashed
            • Randomly assigned to node*
            • Indexed for speed




                            *You pick the partitioner. Please pick random. Please. Please. Please
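
A rough sketch of what "hashed, randomly assigned to node" means with a random partitioner: the row key is hashed to a token, and the node owning that token's range stores the row. The four node names and evenly spaced tokens below are made up for illustration, and this is a simplification rather than Cassandra's actual code.

```python
import bisect
import hashlib

# Hypothetical 4-node ring; node names and token spacing are illustrative only.
RING_SIZE = 2 ** 127
NODES = ["node1", "node2", "node3", "node4"]
# Each node owns the token range that ends at its own token.
TOKENS = [(i + 1) * RING_SIZE // len(NODES) - 1 for i in range(len(NODES))]


def token_for(row_key: str) -> int:
    """Hash a row key onto the ring (MD5 digest reduced to the ring size)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def node_for(row_key: str) -> str:
    """Return the node owning the first token >= the key's token."""
    index = bisect.bisect_left(TOKENS, token_for(row_key))
    return NODES[index % len(NODES)]
```

Because the hash spreads keys evenly, similar keys such as `user1` and `user2` usually land on unrelated nodes, which is why the random partitioner avoids hot spots.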
Data Model Basics                     Columns




            • Assigned to a row
            • Column Name: byte array, up to 64 KB
            • Column Value: byte array, up to 2 GB (!!)
            • Timestamp of when set
            • Optional: Expire TTL
            • Dynamic

                  Row                           Column Name    ...
                                                Column Value

                                                 Timestamp

                                                    TTL
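
The column structure above can be sketched as a small record. This is an illustrative model only: the `Column` class and `reconcile` helper are hypothetical names, expiry is measured from the write timestamp for simplicity, and ties between equal timestamps are not handled the way a real cluster would.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int                      # microseconds, supplied at write time
    ttl_seconds: Optional[int] = None   # None means the column never expires

    def is_live(self, now_micros: int) -> bool:
        """A column with a TTL stops being readable once the TTL has elapsed."""
        if self.ttl_seconds is None:
            return True
        return now_micros < self.timestamp + self.ttl_seconds * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Between two versions of the same column, the newer timestamp wins."""
    return a if a.timestamp >= b.timestamp else b
```

The timestamp is what lets replicas agree on the current value without coordination: whichever version carries the latest timestamp is kept.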


Data Model Basics                   Wide Rows



            • How wide? 2 Billion columns!!!
            • No schema needed
            • Row key, many columns
            • Add columns as needed per row




Data Model Basics                            Data Access



          Thrift

          • Cassandra's client API built entirely on top of Thrift*
          • Provides for manipulation of Data Model and Data
          • Almost all current clients implement this API


             CQL

             • Cassandra Query Language
             • New binary driver as of 1.2
             • Extends functionality beyond Thrift




Data Model Basics                         Data Access


                 More about CQL


                   • Rapidly evolving spec
                   - Version 1 since Cassandra 0.8
                   - Version 2 since Cassandra 1.0
                   - Version 3 since Cassandra 1.1
                   - Final cut in 1.2
                   • Offers more features than Thrift
                   • DataStax Drivers




Data Model Basics                     Fixed schema



  • Similar to an RDBMS table. Fairly fixed columns
  • This example: Row key = username and is unique
  • Use secondary indexes on firstname and lastname for lookup
  • Adding columns with Cassandra is super easy (no downtime)




                 CREATE TABLE users (
                   username varchar,
                   firstname varchar,
                   lastname varchar,
                   email varchar,
                   password varchar,
                   created_date timestamp,
                   PRIMARY KEY (username)
                 );

                 CREATE INDEX user_firstname ON users (firstname);
                 CREATE INDEX user_lastname ON users (lastname);


Data Model Basics                         One-to-many


      • Videos have many comments
      • Comments have many users
      • Order is as inserted (Reversible if needed)
      • Use getSlice() to pull some or all of the comments




                         CREATE TABLE comments (
                            videoid uuid,
                            username varchar,
                            comment_ts timestamp,
                            comment varchar,
                            PRIMARY KEY (videoid,username,comment_ts)
                         );




Data Model Basics                         One-to-many pt2



        • Underlying storage model is still wide rows
        • CQL presents as a table
        • username and comment_ts are filterable




                                                           Wide row
                                                         Time ordered


                     SELECT comment
                     FROM comments
                     WHERE username = 'ctodd'
                     AND comment_ts > '2012-07-12 10:30:00';
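
To see why username and comment_ts are filterable, it may help to picture one video's wide row as a list of cells kept sorted by (username, comment_ts). The filter then becomes a contiguous slice inside that one row, which in practice means videoid is fixed as well. A minimal sketch with made-up sample data:

```python
import bisect
from datetime import datetime

# One wide row for a single videoid: cells sorted by (username, comment_ts).
# The sample comments are invented for illustration.
row = sorted([
    (("ctodd", datetime(2012, 7, 12, 9, 0)), "First!"),
    (("ctodd", datetime(2012, 7, 12, 11, 15)), "Great video"),
    (("ctodd", datetime(2012, 7, 13, 8, 30)), "Watched it again"),
    (("pmcfadin", datetime(2012, 7, 12, 12, 0)), "Thanks"),
])
keys = [key for key, _ in row]


def comments_after(username: str, after: datetime) -> list:
    """Return comments by username with comment_ts > after, as one slice."""
    start = bisect.bisect_right(keys, (username, after))
    end = bisect.bisect_right(keys, (username, datetime.max))
    return [comment for _, comment in row[start:end]]
```

Because the cells are already ordered, the lookup is two binary searches and a sequential read, with no scan of unrelated comments.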




Data Model Basics                        Query Tables

          • No joins in Cassandra
          • Filtering and scans can be expensive
          • Tag is unique regardless of video
          • Great for “List videos with X tag”
          • Tags have to be updated in Video and Tag at the same time
          • Index integrity is maintained in app logic




                     CREATE TABLE tag_index (
                       tag varchar,
                       videoid varchar,
                       timestamp timestamp,
                       PRIMARY KEY (tag, videoid)
                     );

                     Powerful performance tool!
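
"Index integrity is maintained in app logic" means the application writes to both tables whenever tags change. A minimal in-memory sketch of that bookkeeping, where the dictionaries stand in for the video and tag_index tables and the function names are hypothetical:

```python
from collections import defaultdict

# In-memory stand-ins for two tables; names mirror the slide, data is made up.
videos = {}                     # videoid -> set of tags
tag_index = defaultdict(set)    # tag -> set of videoids


def set_tags(videoid: str, tags: set) -> None:
    """Update the video's tags and keep tag_index in step with it."""
    old_tags = videos.get(videoid, set())
    for tag in old_tags - tags:      # tags removed from the video
        tag_index[tag].discard(videoid)
    for tag in tags - old_tags:      # tags newly added to the video
        tag_index[tag].add(videoid)
    videos[videoid] = set(tags)


def videos_with_tag(tag: str) -> list:
    """'List videos with X tag' becomes a single lookup by tag."""
    return sorted(tag_index.get(tag, set()))
```

Against a real cluster the two writes would be separate statements, so the application has to issue both every time or the query table drifts out of date.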



Data Model Basics                             Loading data




                 > 1 Million rows
                 • BI Tools - Talend, Pentaho, JasperSoft
                 • Custom code - My personal favorite
                 • sstableloader - Only for specific file types



                          sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles




                                                  Requires files to be in sstable format



Data Model Basics                          Loading data




                 < 1 Million rows
                 • Everything that worked for 1 Million +
                 • CQL copy command
                 • Loads a delimited file into a table


                  COPY customers(Card_ID, Registration_Date, Gender, Birth_Date)
                  FROM 'Customers_File.txt'
                  WITH HEADER=true
                  AND DELIMITER=',';
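
For the "custom code" route, the same delimited file can be parsed in a few lines before the rows are handed to a client driver. A minimal sketch, assuming the column list from the COPY example above; the `read_rows` helper is hypothetical, and the actual insert call is left out because it depends on the driver in use.

```python
import csv
import io

COLUMNS = ["Card_ID", "Registration_Date", "Gender", "Birth_Date"]


def read_rows(text: str, delimiter: str = ",", header: bool = True) -> list:
    """Parse delimited text into one dict per row, keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if header:
        next(reader, None)  # skip the first line, mirroring HEADER=true
    return [dict(zip(COLUMNS, fields)) for fields in reader if fields]
```

Each returned dict maps a column name to its raw string value, so type conversion (dates, numbers) would still be the caller's job.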




Cassandra 1.2                  Data Access




        •Collections (maps, sets, lists)
        •Support for virtual nodes (vnodes)
        •Query Profiler
        •Atomic batches
        •Enhanced JBOD support
        •Native binary CQL transport (no Thrift)
        •Parallel leveled compactions
        •Off-heap bloom filters




Collections

          •Structure to column values
          •Insert and update
                 • Map
                 • List
                 • Set

          cqlsh> CREATE TABLE users (
                     user_id text PRIMARY KEY,
                     first_name text,
                     last_name text,
                     emails set<text>
                 );
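
The behavior of the `emails set<text>` column can be pictured with an ordinary set: adding an element that is already present changes nothing, and elements come back in sorted order. A minimal in-memory sketch, where the `users` dict and the helper names are hypothetical stand-ins for the table:

```python
users = {}  # user_id -> row dict; in-memory stand-in for the users table


def add_email(user_id: str, email: str) -> None:
    """Add one element to the user's emails set; duplicates are ignored."""
    row = users.setdefault(user_id, {"emails": set()})
    row["emails"].add(email)


def emails_of(user_id: str) -> list:
    """Return the set's elements in sorted order."""
    return sorted(users.get(user_id, {"emails": set()})["emails"])
```

The point of the collection type is that a single element can be added or removed without reading the whole value first.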




                    http://www.datastax.com/dev/blog/cql3_collections
Request tracing
•Automatically stored for 24h
•Full path trace
•Includes node info

cqlsh> tracing on;
Now tracing requests.

cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
                  execute_cql3_query | 00:02:37,015 | 127.0.0.1 |              0
                   Parsing statement | 00:02:37,015 | 127.0.0.1 |             81
                 Preparing statement | 00:02:37,015 | 127.0.0.1 |            273
   Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 |            540
       Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 |            779

   Messsage received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |             63
                   Applying mutation | 00:02:37,016 | 127.0.0.2 |            220
                Acquiring switchLock | 00:02:37,016 | 127.0.0.2 |            250
              Appending to commitlog | 00:02:37,016 | 127.0.0.2 |            277
                  Adding to memtable | 00:02:37,016 | 127.0.0.2 |            378
    Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            710
       Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 |            888

   Messsage received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
                    Request complete | 00:02:37,017 | 127.0.0.1 |           2581




                 http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
Virtual Nodes (vnodes)
•Many nodes per JVM
•Tokens are auto-assigned (!!!)
•Faster...
       ✓repair
       ✓bootstrap
       ✓decommission



                 http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
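
A rough sketch of why many auto-assigned tokens per node work out: with enough random tokens, each node ends up owning close to an equal share of the ring without anyone computing tokens by hand. The node names, the token count of 256, and the fixed seed are assumptions for illustration; this is not Cassandra's allocation code.

```python
import random

RING_SIZE = 2 ** 127


def assign_tokens(nodes, tokens_per_node, seed=42):
    """Give each node random tokens; returns sorted (token, node) pairs."""
    rng = random.Random(seed)
    ring = [(rng.randrange(RING_SIZE), node)
            for node in nodes
            for _ in range(tokens_per_node)]
    return sorted(ring)


def ownership(ring):
    """Fraction of the ring owned per node (a token owns the range ending at it)."""
    owned = {}
    previous = ring[-1][0] - RING_SIZE  # wrap around from the last token
    for token, node in ring:
        owned[node] = owned.get(node, 0) + (token - previous)
        previous = token
    return {node: size / RING_SIZE for node, size in owned.items()}
```

Calling `ownership(assign_tokens(["a", "b", "c"], 256))` gives each node roughly a third of the ring. Since every node holds many small ranges, work such as repair or bootstrap can be spread across many peers rather than concentrated on a few neighbors.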
Data Model Basics    Data Access




                       DEMO





More Related Content

What's hot

Oracle Exadata Version 2
Oracle Exadata Version 2Oracle Exadata Version 2
Oracle Exadata Version 2Jarod Wang
 
Mongo db model relationships with documents
Mongo db model relationships with documentsMongo db model relationships with documents
Mongo db model relationships with documentsDr. Awase Khirni Syed
 
Mongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniMongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniDr. Awase Khirni Syed
 
Hadoop & no sql new generation database systems
Hadoop & no sql   new generation database systemsHadoop & no sql   new generation database systems
Hadoop & no sql new generation database systemsramazan fırın
 
Introduction to NuoDB
Introduction to NuoDBIntroduction to NuoDB
Introduction to NuoDBSandun Perera
 
Enterprise Virtualization with Xen
Enterprise Virtualization with XenEnterprise Virtualization with Xen
Enterprise Virtualization with XenFrank Martin
 
MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)Frazer Clement
 
Big data hadoop-no sql and graph db-final
Big data hadoop-no sql and graph db-finalBig data hadoop-no sql and graph db-final
Big data hadoop-no sql and graph db-finalramazan fırın
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
Indic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndicThreads
 
Modernización del manejo de datos con v fabric
Modernización del manejo de datos con v fabricModernización del manejo de datos con v fabric
Modernización del manejo de datos con v fabricSoftware Guru
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreFilipe Silva
 
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsDeep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsMark Rittman
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureVenu Anuganti
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture OverviewChristopher Foot
 
Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)Chris Richardson
 

What's hot (20)

Oracle Exadata Version 2
Oracle Exadata Version 2Oracle Exadata Version 2
Oracle Exadata Version 2
 
Sql no sql
Sql no sqlSql no sql
Sql no sql
 
Mongo db model relationships with documents
Mongo db model relationships with documentsMongo db model relationships with documents
Mongo db model relationships with documents
 
Mongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniMongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirni
 
Hadoop & no sql new generation database systems
Hadoop & no sql   new generation database systemsHadoop & no sql   new generation database systems
Hadoop & no sql new generation database systems
 
Introduction to NuoDB
Introduction to NuoDBIntroduction to NuoDB
Introduction to NuoDB
 
NoSQL
NoSQLNoSQL
NoSQL
 
Enterprise Virtualization with Xen
Enterprise Virtualization with XenEnterprise Virtualization with Xen
Enterprise Virtualization with Xen
 
MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)MySQL Cluster Schema management (2014)
MySQL Cluster Schema management (2014)
 
Big data hadoop-no sql and graph db-final
Big data hadoop-no sql and graph db-finalBig data hadoop-no sql and graph db-final
Big data hadoop-no sql and graph db-final
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
 
Indic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path ahead
 
Modernización del manejo de datos con v fabric
Modernización del manejo de datos con v fabricModernización del manejo de datos con v fabric
Modernización del manejo de datos con v fabric
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
 
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsDeep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data Connectors
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture Overview
 
Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)Developing polyglot persistence applications (SpringOne China 2012)
Developing polyglot persistence applications (SpringOne China 2012)
 

Viewers also liked

Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, strongerPatrick McFadin
 
Cassandra data modeling talk
Cassandra data modeling talkCassandra data modeling talk
Cassandra data modeling talkPatrick McFadin
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraPatrick McFadin
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015Patrick McFadin
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data modelPatrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 

Viewers also liked (12)

Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, stronger
 
Cassandra data modeling talk
Cassandra data modeling talkCassandra data modeling talk
Cassandra data modeling talk
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache Cassandra
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
Become a super modeler
Become a super modelerBecome a super modeler
Become a super modeler
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 

Similar to Toronto jaspersoft meetup

State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandrajbellis
 
The Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionThe Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionDATAVERSITY
 
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...DataStax Academy
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionDataStax
 
On Cassandra Development: Past, Present and Future
On Cassandra Development: Past, Present and FutureOn Cassandra Development: Past, Present and Future
On Cassandra Development: Past, Present and Futurepcmanus
 
Paris Cassandra Meetup - Overview of New Cassandra Drivers
Paris Cassandra Meetup - Overview of New Cassandra DriversParis Cassandra Meetup - Overview of New Cassandra Drivers
Paris Cassandra Meetup - Overview of New Cassandra DriversMichaël Figuière
 
Getting Big Value from Big Data
Getting Big Value from Big DataGetting Big Value from Big Data
Getting Big Value from Big DataDataStax
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranData Con LA
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraJeff Bollinger
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
Architecture et modèle de données Cassandra
Architecture et modèle de données CassandraArchitecture et modèle de données Cassandra
Architecture et modèle de données CassandraClaude-Alain Glauser
 
Introduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSEIntroduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSEUlises Fasoli
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Sid Anand
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationRamkumar Nottath
 

Similar to Toronto jaspersoft meetup (20)

State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
The Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionThe Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data Solution
 
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
C* Summit 2013: Searching for a Needle in a Big Data Haystack by Jason Ruther...
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data Solution
 
On Cassandra Development: Past, Present and Future
On Cassandra Development: Past, Present and Future
Paris Cassandra Meetup - Overview of New Cassandra Drivers
Getting Big Value from Big Data
Apache Cassandra and The Multi-Cloud by Amanda Moran
Minnebar 2013 - Scaling with Cassandra
BI, Reporting and Analytics on Apache Cassandra
Five Lessons in Distributed Databases
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Navigating NoSQL in cloudy skies
Architecture et modèle de données Cassandra
Introduction to Cassandra and datastax DSE
NoSQL Intro with cassandra
Sql vs nosql
Linked in nosql_atnetflix_2012_v1
Performance tuning - A key to successful cassandra migration

More from Patrick McFadin

Successful Architectures for Fast Data
Open source or proprietary, choose wisely!
An Introduction to time series with Team Apache
Laying down the smack on your data pipelines
Help! I want to contribute to an Open Source project but my boss says no.
Analyzing Time Series Data with Apache Spark and Cassandra
Storing time series data with Apache Cassandra
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
Advanced data modeling with apache cassandra
Introduction to data modeling with apache cassandra
Apache cassandra and spark. you got the the lighter, let's start the fire
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Real data models of silicon valley
Introduction to cassandra 2014
Making money with open source and not losing your soul: A practical guide
Time series with Apache Cassandra - Long version
Time series with apache cassandra strata
Cassandra EU - Data model on fire


Toronto jaspersoft meetup

  • 1. Toronto Jaspersoft User Group Move. Faster. Patrick McFadin, Principal Solution Architect @PatrickMcFadin ©2012 DataStax 1
  • 2. About Me/Moi?
      • Principal Solution Architect at DataStax, THE Cassandra company
      • Cassandra user since 0.7
      • Prior
          - Chief Architect at Hobsons
          - Started a software services company. Link-11
      • Follow me here: @PatrickMcFadin
  • 3. Who is DataStax?
      • We employ most of the Cassandra committers
      • 24/7 support
      • Consulting
      • DataStax Enterprise
  • 4. And beer! And cupcakes! (??)
  • 5. Our Solution: DataStax Enterprise allows you to focus on your Big Data applications instead of battling your underlying infrastructure:
      • Velocity
      • Volume
      • Variety
      • Complexity
      • Distribution
  • 6. DataStax Enterprise also includes…
      • Log4j application log integration
      • A single graphical management tool
      • World-class support
  • 7. Cassandra as real-time foundation
      • Continuous availability
      • Extreme scale
      • Multi-datacenter support
      • Cloud enablement
      • Operational simplicity
  • 8. Hadoop in the same system:
      • Batch analytics
      • Reduced data movement, fewer ETL operations
      • No complex architectures
      • Integrated Mahout, Sqoop, Hive, Pig, etc.
  • 9. And we integrate Solr:
      • Enterprise search
      • Always indexed data
      • Scalable performance
      • Mission-critical dependability
  • 10. Can we just talk about Cassandra ... and aliens.
  • 11. Roots: Dynamo, BigTable
  • 12. Core concepts: Shared Nothing
  • 13. Core concepts: Replicated
  • 14. Core concepts: WAN Replication
  • 15. Core concepts: Scaling
      • Need more write throughput? - add nodes
      • Need more read throughput? - add nodes
      • Cassandra scales in a linear fashion
      • Massive number of ops/sec
  • 16. Core concepts: Scaling
      • Source: "Solving big data challenges for enterprise application performance management", Proceedings of the VLDB Endowment, Volume 5, Issue 12, August 2012, pages 1724-1735
  • 17. Core concepts: CAP Theorem
      • Consistency - Eventual, but Cassandra will not lose your data.
      • Availability - Max uptime for clients
      • Partition tolerance - Nodes can’t see each other but cluster is still up
      • Diagram labels: "Cassandra lives here" and "...and sometimes lives here"
      • It’s your choice!
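The "It's your choice!" note on the CAP slide presumably refers to choosing how many replicas must acknowledge each read and write. The following is a minimal sketch of that trade-off, assuming the usual Dynamo-style rule that a read is guaranteed to overlap the latest acknowledged write when read replicas plus write replicas exceed the replication factor. The function names and the numbers are illustrative and do not come from the slides.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas (assumption: quorum = floor(RF / 2) + 1)."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every set of read replicas must overlap every set of write replicas."""
    for count in (write_replicas, read_replicas):
        if not 1 <= count <= replication_factor:
            raise ValueError("replica counts must be between 1 and the replication factor")
    return write_replicas + read_replicas > replication_factor


# Hypothetical cluster with three replicas per row.
RF = 3
strong = read_sees_latest_write(RF, quorum(RF), quorum(RF))  # favours consistency
fast = read_sees_latest_write(RF, 1, 1)                      # favours availability
```

Writing and reading at quorum gives overlap at the cost of needing a majority of replicas up; reading and writing a single replica stays available through more failures but may return stale data for a while.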
  • 18. Core concepts: Availability
      • Continuous Availability > High Availability
      • Your infrastructure will fail ...deal with it.
  • 20. Data Model Basics: Cluster
      • Cluster - Multiple nodes acting together. Even over WAN.
      • Keyspace - Logical collection of Column Families. Stores replication strategy.
      • Column Family (Table) - Stores rows of data
  • 21. Data Model Basics: Rows
      • Unique in column family
      • Hashed
      • Randomly assigned to node*
      • Indexed for speed
      • *You pick the partitioner. Please pick random. Please. Please. Please
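To make "hashed" and "randomly assigned to node" concrete, here is a rough Python sketch of hash-based row placement on a token ring. It assumes an MD5-based token and a hypothetical four-node ring with evenly spaced tokens; it illustrates the idea behind a random partitioner and is not Cassandra's exact token computation.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # token space used by this sketch (assumption)


def token_for(row_key: str) -> int:
    """Hash the row key to a position on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(row_key: str, ring: list) -> str:
    """Return the node whose token is the first one >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, token_for(row_key))
    return ring[index % len(ring)][1]


# Hypothetical ring: four nodes with evenly spaced tokens, sorted by token.
RING = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]

placement = {key: owner(key, RING) for key in ("ctodd", "pmcfadin", "jsmith")}
```

Because the hash output is effectively uniform, sequential or similar keys land on unrelated nodes, which is why the slide pleads for the random partitioner.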
  • 22. Data Model Basics: Columns
      • Assigned to a row
      • Column Name: 64k ByteArray
      • Column Value: 2G ByteArray (!!)
      • Timestamp of when set
      • Optional: Expire TTL
      • Dynamic
      • Diagram: a row holds columns, each with Column Name, Column Value, Timestamp, TTL
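The column structure above (name, value, timestamp, optional TTL) can be modelled in a few lines of Python. This is a simplified sketch: it assumes the timestamp doubles as the write time for TTL purposes and ignores tie-breaking between equal timestamps. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

MICROS_PER_SECOND = 1_000_000


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int                      # microseconds, set when the column is written
    ttl_seconds: Optional[int] = None   # None means the column never expires

    def is_live(self, now_micros: int) -> bool:
        """A column with a TTL stops being readable once the TTL has elapsed."""
        if self.ttl_seconds is None:
            return True
        return now_micros < self.timestamp + self.ttl_seconds * MICROS_PER_SECOND


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: of two versions of a column, keep the newer timestamp."""
    return a if a.timestamp >= b.timestamp else b


# Hypothetical versions of the same column written at different times.
older = Column(b"email", b"old@example.com", timestamp=1_000_000)
newer = Column(b"email", b"new@example.com", timestamp=2_000_000)
winner = reconcile(older, newer)
```

The timestamp is what lets replicas agree on a value without coordination: whichever replica holds the newer timestamp supplies the answer.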
  • 23. Data Model Basics: Wide Rows
      • How wide? 2 billion columns!!!
      • No schema needed
      • Row key, many columns
      • Add columns as needed per row
  • 24. Data Model Basics: Data Access
      • Thrift
          - Cassandra's client API built entirely on top of Thrift*
          - Provides for manipulation of Data Model and Data
          - Almost all current clients implement this API
      • CQL
          - Cassandra Query Language
          - New binary driver as of 1.2
          - Extends functionality beyond Thrift
  • 25. Data Model Basics: Data Access, more about CQL
      • Rapidly evolving spec
          - Version 1 since Cassandra 0.8
          - Version 2 since Cassandra 1.0
          - Version 3 since Cassandra 1.1
          - Final cut in 1.2
      • Offers more features than Thrift
      • DataStax Drivers
  • 26. Data Model Basics: Fixed schema
      • Similar to an RDBMS table. Fairly fixed columns
      • This example: Row key = username and is unique
      • Use secondary indexes on firstname and lastname for lookup
      • Adding columns with Cassandra is super easy (no downtime)

        CREATE TABLE users (
          username varchar,
          firstname varchar,
          lastname varchar,
          email varchar,
          password varchar,
          created_date timestamp,
          PRIMARY KEY (username)
        );
        CREATE INDEX user_firstname ON users (firstname);
        CREATE INDEX user_lastname ON users (lastname);

  • 27. Data Model Basics: One-to-many
      • Videos have many comments
      • Comments have many users
      • Order is as inserted (reversible if needed)
      • Use getSlice() to pull some or all of the comments

        CREATE TABLE comments (
          videoid uuid,
          username varchar,
          comment_ts timestamp,
          comment varchar,
          PRIMARY KEY (videoid, username, comment_ts)
        );

  • 28. Data Model Basics: One-to-many pt2
      • Underlying storage model is still wide rows
      • CQL presents as a table
      • username and comment_ts are filterable
      • Diagram labels: wide row, time ordered

        SELECT comment
        FROM comments
        WHERE username = 'ctodd'
          AND comment_ts > '2012-07-12 10:30:00';
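The point that the storage model "is still wide rows" while CQL "presents as a table" can be sketched in Python. The function below groups table-style rows by the first primary key column (videoid) and sorts the cells inside each group by the remaining key columns (username, comment_ts). The sample values are hypothetical, and the layout is a simplification of how the comments table might be stored, not the actual on-disk format.

```python
from collections import defaultdict


def to_wide_rows(table_rows: list) -> dict:
    """Fold table rows into one wide row per videoid with sorted composite column names."""
    partitions = defaultdict(dict)
    for row in table_rows:
        # Composite column name: clustering values plus the name of the stored column.
        cell_name = (row["username"], row["comment_ts"], "comment")
        partitions[row["videoid"]][cell_name] = row["comment"]
    # Cells inside a wide row are kept sorted by their composite name.
    return {videoid: dict(sorted(cells.items())) for videoid, cells in partitions.items()}


# Hypothetical sample data shaped like the comments table on the slide.
sample_rows = [
    {"videoid": "v1", "username": "ctodd", "comment_ts": "2012-07-12 11:00:00", "comment": "second"},
    {"videoid": "v1", "username": "ctodd", "comment_ts": "2012-07-12 10:45:00", "comment": "first"},
    {"videoid": "v2", "username": "amy", "comment_ts": "2012-07-13 09:00:00", "comment": "hello"},
]
wide = to_wide_rows(sample_rows)
```

All comments for one video end up in a single sorted structure, so a slice over a range of column names reads a contiguous run rather than scattered rows.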
  • 29. Data Model Basics: Query Tables
      • No joins in Cassandra
      • Filtering and scans can be expensive
      • Tag is unique regardless of video
      • Great for “List videos with X tag”
      • Tags have to be updated in Video and Tag at the same time
      • Index integrity is maintained in app logic
      • Powerful performance tool!

        CREATE TABLE tag_index (
          tag varchar,
          videoid varchar,
          timestamp timestamp,
          PRIMARY KEY (tag, videoid)
        );
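Since the query table is maintained "in app logic", the application has to write to both places on every change. Here is a minimal in-memory Python sketch of that dual write. The VideoStore class and its method names are hypothetical stand-ins for the video table and tag_index; no driver calls are shown.

```python
class VideoStore:
    """In-memory stand-in for two tables: videos (with their tags) and tag_index."""

    def __init__(self) -> None:
        self.videos = {}     # videoid -> set of tags
        self.tag_index = {}  # tag -> set of videoids

    def set_tags(self, videoid: str, tags: list) -> None:
        """Update a video's tags and keep tag_index in step with it."""
        new_tags = set(tags)
        old_tags = self.videos.get(videoid, set())
        for tag in old_tags - new_tags:   # tags removed from the video
            self.tag_index[tag].discard(videoid)
            if not self.tag_index[tag]:
                del self.tag_index[tag]
        for tag in new_tags - old_tags:   # tags added to the video
            self.tag_index.setdefault(tag, set()).add(videoid)
        self.videos[videoid] = new_tags

    def videos_with_tag(self, tag: str) -> list:
        """Answer 'list videos with X tag' from the query table alone, no scan or join."""
        return sorted(self.tag_index.get(tag, set()))


store = VideoStore()
store.set_tags("v1", ["cats", "funny"])
store.set_tags("v2", ["cats"])
```

The read path touches only tag_index, which is the performance win; the cost is that every write path must remember to update both structures.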
  • 30. Data Model Basics: Loading data > 1 million rows
      • BI Tools - Talend, Pentaho, JasperSoft
      • Custom code - My personal favorite
      • sstable loader - Only for specific file types; requires files to be in sstable format

        sstableloader -d 10.0.0.100 /home/pmcfadin/dbfiles

  • 31. Data Model Basics: Loading data < 1 million rows
      • Everything that worked for 1 million +
      • CQL copy command
      • Loads a delimited file into a table

        COPY customers(Card_ID, Registration_Date, Gender, Birth_Date)
        FROM 'Customers_File.txt'
        WITH HEADER=true AND DELIMITER=',';
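For readers unfamiliar with what a header flag and a delimiter option imply, the following Python sketch parses a small delimited file into column-named records, roughly the preparation a loader has to do before inserting rows. It uses only the standard csv module; the function name and the sample contents are invented for illustration and say nothing about how the COPY command is implemented.

```python
import csv
import io


def read_delimited(text: str, columns: list, delimiter: str = ",", header: bool = True) -> list:
    """Parse delimited text into one dict per record, keyed by the given column names."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if header:
        next(reader, None)  # skip the header line instead of loading it as data
    return [dict(zip(columns, record)) for record in reader if record]


# Hypothetical file contents matching the column list on the slide.
SAMPLE = (
    "Card_ID,Registration_Date,Gender,Birth_Date\n"
    "1001,2012-01-15,F,1980-03-02\n"
    "1002,2012-02-20,M,1975-11-30\n"
)
COLUMNS = ["Card_ID", "Registration_Date", "Gender", "Birth_Date"]
records = read_delimited(SAMPLE, COLUMNS)
```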
  • 32. Cassandra 1.2 Data Access
      • Collections (maps, sets, lists)
      • Support for virtual nodes (vnodes)
      • Query Profiler
      • Atomic batches
      • Enhanced JBOD support
      • Native binary CQL transport (no Thrift)
      • Parallel leveled compactions
      • Off-heap bloom filters
  • 33. Collections
      • Structure to column values
      • Insert and update
      • Map
      • List
      • Set

        cqlsh> CREATE TABLE users (
                 user_id text PRIMARY KEY,
                 first_name text,
                 last_name text,
                 emails set<text>
               );

      • http://www.datastax.com/dev/blog/cql3_collections
  • 34. Request tracing
      • Automatically stored for 24h
      • Full path trace
      • Includes node info

        cqlsh> tracing on;
        Now tracing requests.
        cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
        Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

        activity                            | timestamp    | source    | source_elapsed
        ------------------------------------+--------------+-----------+----------------
        execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 | 0
        Parsing statement                   | 00:02:37,015 | 127.0.0.1 | 81
        Preparing statement                 | 00:02:37,015 | 127.0.0.1 | 273
        Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 | 540
        Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 | 779
        Messsage received from /127.0.0.1   | 00:02:37,016 | 127.0.0.2 | 63
        Applying mutation                   | 00:02:37,016 | 127.0.0.2 | 220
        Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 | 250
        Appending to commitlog              | 00:02:37,016 | 127.0.0.2 | 277
        Adding to memtable                  | 00:02:37,016 | 127.0.0.2 | 378
        Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 | 710
        Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 | 888
        Messsage received from /127.0.0.2   | 00:02:37,017 | 127.0.0.1 | 2334
        Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550
        Request complete                    | 00:02:37,017 | 127.0.0.1 | 2581

      • http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
  • 35. Virtual Nodes (vnodes)
      • Many nodes per JVM
      • Tokens are auto-assigned (!!!)
      • Faster...
          ✓ repair
          ✓ bootstrap
          ✓ decommission
      • http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
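A small Python simulation can show why auto-assigned tokens with many virtual nodes per machine still spread data fairly evenly. The sketch assumes tokens are drawn uniformly at random from a 64-bit ring and uses 256 tokens per node as an example setting; the function names, ring size and seed are assumptions made for illustration, not details from the slides.

```python
import random

RING_SIZE = 2 ** 64  # token space used by this sketch (assumption)


def assign_tokens(nodes: list, num_tokens: int, seed: int = 0) -> list:
    """Give every node num_tokens random tokens; return (token, node) pairs sorted by token."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ring = [(rng.randrange(RING_SIZE), node) for node in nodes for _ in range(num_tokens)]
    ring.sort()
    return ring


def ownership(ring: list) -> dict:
    """Fraction of the ring each node owns: a token owns the range back to the previous token."""
    shares = {}
    previous = ring[-1][0] - RING_SIZE  # the first range wraps around from the last token
    for token, node in ring:
        shares[node] = shares.get(node, 0) + (token - previous)
        previous = token
    return {node: share / RING_SIZE for node, share in shares.items()}


# Hypothetical three-node cluster with 256 virtual nodes each.
ring = assign_tokens(["node1", "node2", "node3"], num_tokens=256)
shares = ownership(ring)
```

With many small ranges per machine, a joining or leaving node exchanges data with many peers at once, which is the intuition behind the faster repair, bootstrap and decommission listed above.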
  • 36. Data Model Basics: Data Access DEMO