Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
DILEEP KALIDINDI
23rd February 2015
Explore, Build & Operate NoSQL with Apache Cassandra
Who am I?
 Dileep Varma Kalidindi
 Current: Senior Engineer @Responsys (since Apr’14), Circles Team.
 Fascination: problem solving, distributed & Big Data processing systems.
 Past: 8+ years with VeriSign, Informatica Labs, NTT Data.
 Hobbies: Adventure sports.
Are we good?
4/5/2016
Data
Data
 Data has never had a single structure, and neither have the techniques used to model it.
 Applications have evolved from OLAP and OLTP to web, mobile & social.
 Big Data comes with different characteristics: Volume, Velocity, Variety, Veracity & Value.
 Responsys Data:
 Need for better-suited data models and storage models: but why?
Impedance mismatch: data model vs. storage model
 The SQL relational model is user-oriented: the store itself enforces concurrency, integrity, consistency and data-type validity
 Transactional guarantees, schemas and referential integrity
 Purpose-built applications tend to control integrity and validity themselves (they do not need fancy aggregation in the store)
 The persistent data model differs from the in-memory data structures.
 Data duplication and denormalization are now first-class citizens!
 From scale-up to scale-wide: NoSQL multi-node clusters vs. RDBMS clustering.
Conceptual – ACID, BASE & CAP
Transactions, consistency and availability: can we prioritize?
CAP theorem - consequences
Agenda
 NoSQL
 NoSQL Implementations – for various purposes
 Architecture fit – Polyglot persistence
 Data modelling – concepts in view of NoSQL
 Cassandra – Architecture
 Database Internals
 CQL & DEMO
 Installation, Configuration & tools
 Oracle NoSQL – pitch by Sheetal
# NoSQL
NoSQL
 Non-relational, distributed, open-source & horizontally scalable #nxtGen
NoSQL is an accidental neologism.
Schema-less storage systems built for the 5 V’s of Big Data.
Decentralized: every node in the cluster is identical.
High availability: no single point of failure (SPoF), tolerant of network failures.
Open source, with no-cost licensing (except for enterprise support).
NoSQL: architecture fit
Polyglot persistence: choose the right data store for each data set.
Prefer access through services over direct data access.
Concerns
 Operational concerns like licensing, support, tools, upgrade, auditing.
 Security of the data store, contexts, authorization, etc.
 Integration with ETL and Data transfer utilities.
 Deployment complexity
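Polyglot persistence with service-based access can be pictured as one service that owns two different stores and routes each data set to the one that suits it. This is a minimal, hypothetical sketch: `OrderService` and its methods are illustrative names, and plain Python containers stand in for a real key-value store (shopping carts) and a real document store (order history).

```python
import json
from typing import Dict, List


class OrderService:
    """Hypothetical service facade: callers ask the service, never the stores.

    Two data sets live in two different (stand-in) stores:
      * shopping carts -> a key-value store (fast get/put by key)
      * order history  -> a document store (queryable by field)
    """

    def __init__(self) -> None:
        self._carts: Dict[str, bytes] = {}   # stand-in key-value store
        self._orders: List[dict] = []        # stand-in document store

    def save_cart(self, customer_id: str, items: List[str]) -> None:
        # The key-value store only ever sees an opaque serialized value.
        self._carts[customer_id] = json.dumps(items).encode("utf-8")

    def cart(self, customer_id: str) -> List[str]:
        raw = self._carts.get(customer_id)
        return json.loads(raw) if raw is not None else []

    def checkout(self, customer_id: str) -> dict:
        # Move the cart contents into the order history as a document.
        order = {"customer": customer_id, "items": self.cart(customer_id)}
        self._orders.append(order)
        self._carts.pop(customer_id, None)
        return order

    def orders_for(self, customer_id: str) -> List[dict]:
        # The document store can be filtered on a field inside each document.
        return [o for o in self._orders if o["customer"] == customer_id]
```

Because callers only see `save_cart`, `checkout` and `orders_for`, either store could later be replaced by a different product without touching client code.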
Data models – in view of NoSQL
 NoSQL models are application-specific: “What questions do I have?”
 Relational models are driven by the structure of the data: “What answers do I have?”
Modelling techniques
 Conceptual: denormalization, aggregates & application-side joins
 General: atomic aggregates, enumerable keys, dimensionality reduction, index tables & composite key indexes.
 Hierarchical: tree aggregation, materialized paths, nested sets & batch graph processing.
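Two of the general techniques above, the index table and the composite key index, can be illustrated on a toy sorted key-value store. This is a sketch under simplifying assumptions: `TinyStore` is hypothetical, everything is held in memory, and each user is assumed to be written once (no updates or deletes, which a real index table would also have to maintain).

```python
import bisect
from typing import Dict, List, Set


class TinyStore:
    """Hypothetical in-memory store illustrating two modelling techniques.

    Assumes each user_id is inserted once (no updates or deletes).
    """

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}             # primary data, by key
        self.city_index: Dict[str, Set[str]] = {}   # index table: city -> keys
        self._sorted_keys: List[str] = []           # composite keys, kept sorted

    def put(self, user_id: str, country: str, city: str, name: str) -> None:
        self.rows[user_id] = {"country": country, "city": city, "name": name}
        # Index table: the application maintains it on every write.
        self.city_index.setdefault(city, set()).add(user_id)
        # Composite key index: "country:city:user_id" keeps related keys adjacent.
        bisect.insort(self._sorted_keys, f"{country}:{city}:{user_id}")

    def users_in_city(self, city: str) -> List[str]:
        # Exact-match lookup through the index table.
        return sorted(self.city_index.get(city, set()))

    def users_with_prefix(self, prefix: str) -> List[str]:
        # Range scan over the sorted composite keys that start with `prefix`.
        start = bisect.bisect_left(self._sorted_keys, prefix)
        result = []
        for key in self._sorted_keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key.rsplit(":", 1)[1])
        return result
```

The index table answers an exact-match question (users in one city) at the cost of an extra write per insert; the composite key turns a hierarchical question (all users in a country, or in a country and city) into one range scan over adjacent keys.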
Data models – deep view
 Conceptual: denormalization
 Query data volume (I/O per query) vs. total data volume
 Processing complexity vs. total data volume
 Aggregates:
 Simple
 Atomic
Tree aggregation:
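The denormalization trade-off (I/O per query vs. total data volume) can be made concrete by counting lookups. In this minimal sketch, `CountingStore` is a hypothetical dict-backed store, one "lookup" stands in for one I/O, and the order/customer data is illustrative.

```python
from typing import Dict, List


class CountingStore:
    """Dict-backed stand-in store that counts lookups (a proxy for I/O per query)."""

    def __init__(self, data: Dict[str, dict]) -> None:
        self._data = data
        self.lookups = 0

    def get(self, key: str) -> dict:
        self.lookups += 1
        return self._data[key]


# Normalized: orders reference customers by id.
customers = CountingStore({"c1": {"name": "Asha"}, "c2": {"name": "Ravi"}})
orders = CountingStore({
    "o1": {"customer": "c1", "total": 40},
    "o2": {"customer": "c2", "total": 15},
    "o3": {"customer": "c1", "total": 25},
})


def order_report_normalized(order_ids: List[str]) -> List[tuple]:
    # Application-side join: one extra customer lookup per order.
    report = []
    for oid in order_ids:
        order = orders.get(oid)
        name = customers.get(order["customer"])["name"]
        report.append((oid, name, order["total"]))
    return report


# Denormalized aggregate: customer names are duplicated into one report
# document, so the whole answer is a single key lookup.
reports = CountingStore({
    "report:all": {"rows": [("o1", "Asha", 40), ("o2", "Ravi", 15), ("o3", "Asha", 25)]},
})


def order_report_denormalized() -> List[tuple]:
    return reports.get("report:all")["rows"]
```

The denormalized read costs one lookup instead of six, but each customer name is now stored more than once, so renaming a customer means rewriting every copy.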
NoSQL - implementations
If one implementation fit all, why not just use an RDBMS?
The classification is driven by the application's point of view!
Key-Value
 Strong aggregation which is opaque to the database
 Oracle NoSQL, Windows Azure & Redis
Document database
 Structure in the aggregate
 MongoDB, CouchDB & RavenDB
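The difference between a key-value aggregate that is opaque to the database and a document aggregate whose structure the store understands can be sketched with two toy classes. Both are hypothetical in-memory stand-ins, not the APIs of Oracle NoSQL, Redis, MongoDB or any other product.

```python
import json
from typing import Any, Dict, List


class KeyValueStore:
    """Stand-in key-value store: the value is an opaque blob."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]
    # There is no way to ask "which values have city == X?" here; the
    # application would have to fetch and decode every blob itself.


class DocumentStore:
    """Stand-in document store: the store understands the aggregate's fields."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        self._docs[doc_id] = doc

    def find(self, field: str, value: Any) -> List[str]:
        # The store itself can filter on a field inside the aggregate.
        return sorted(i for i, d in self._docs.items() if d.get(field) == value)


profile = {"name": "Asha", "city": "Hyderabad"}

kv = KeyValueStore()
kv.put("user:1", json.dumps(profile).encode("utf-8"))

docs = DocumentStore()
docs.insert("user:1", profile)
docs.insert("user:2", {"name": "Ravi", "city": "Pune"})
```

Both stores hold the same aggregate; only the document store can answer a question about what is inside it.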
NoSQL - implementations
Column family structures
 Two-level aggregate structure
 A row key plus a row aggregate; the row aggregate is a group of columns.
 Bigtable, HBase & Cassandra
Graph databases
 Neo4j
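The two-level column-family structure (a row key, then a group of columns) can be sketched as a map from row key to columns kept sorted by name. This is an illustrative model only, assuming string column names and values; `ColumnFamily` and `get_slice` are hypothetical names, and timestamps, persistence and distribution are left out.

```python
import bisect
from typing import Dict, List, Tuple


class ColumnFamily:
    """Toy two-level aggregate: row key -> columns kept sorted by column name.

    Roughly Map<RowKey, SortedMap<ColumnName, Value>>; real stores add
    timestamps, an on-disk layout and distribution across nodes.
    """

    def __init__(self) -> None:
        # row key -> (sorted column names, column name -> value)
        self._rows: Dict[str, Tuple[List[str], Dict[str, str]]] = {}

    def insert(self, row_key: str, column: str, value: str) -> None:
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column not in values:
            bisect.insort(names, column)   # keep column names ordered
        values[column] = value

    def get_slice(self, row_key: str, start: str, end: str) -> List[Tuple[str, str]]:
        """Columns of one row whose names fall in [start, end], in name order."""
        names, values = self._rows.get(row_key, ([], {}))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, end)
        return [(n, values[n]) for n in names[lo:hi]]


cf = ColumnFamily()
cf.insert("user:1", "2015-02-23T10:00", "login")
cf.insert("user:1", "2015-02-21T09:30", "signup")
cf.insert("user:1", "2015-02-24T18:45", "purchase")
```

Keeping columns sorted within a row is what makes wide rows useful: with time-ordered column names, a range of events for one key is a single contiguous slice.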
NoSQL – implementations – CAP fit
Apache Cassandra - Continuous availability, linear scalability & operational simplicity
 About
 Column-family (wide-column) NoSQL database.
 Originally developed at Facebook (2007), now an Apache project
 Masterless architecture with all nodes in a ring topology
 Commercial add-ons & support (“enterprise edition”) by DataStax
 Data center replication, wide scalability, fault tolerance & tunable consistency.
 Online load balancing, flexible schema, key-oriented queries & CAP-aware
 Implementation of good Security standards, Operations, Monitoring & utilities.
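Tunable consistency rests on a simple overlap rule: with replication factor RF, a read from R replicas is guaranteed to include at least one replica that acknowledged a write to W replicas whenever R + W > RF. The sketch below checks that rule by brute force over all replica subsets. It models the arithmetic only (the helper names are hypothetical) and ignores failures, hinted handoff, read repair and clock skew.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    # QUORUM means a majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def read_sees_latest_write(rf: int, w: int, r: int) -> bool:
    """True if every possible R-replica read overlaps every possible W-replica write.

    If all read sets intersect all write sets, a read always reaches at least
    one replica that holds the latest acknowledged write.
    """
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With RF = 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) always overlap, while ONE/ONE (1 + 1 is not greater than 3) can return a stale value; choosing R and W per request is the latency-versus-consistency dial.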
Cassandra – data model
 Column: name/value pair (plus a write timestamp)
 Counter column
 Expiring column (TTL)
 Super column
 Column family: collection of rows, Map<RowKey, ordered column collection>
 Dynamic (wide)
 Static (narrow)
 Keyspace: contains column families & super column families
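Cassandra columns carry a write timestamp and replicas reconcile conflicting versions by last-write-wins; expiring columns carry a TTL; counter columns accumulate increments instead of overwriting. The sketch below models those three behaviours in simplified form. `Column`, `write`, `reconcile` and `CounterColumn` are hypothetical names, time is an abstract integer (real Cassandra uses microsecond write timestamps and TTLs in seconds), and ties are resolved arbitrarily here rather than by Cassandra's own tie-break rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Toy column: name/value plus the timestamp of the write.

    `expires_at` models an expiring (TTL) column; None means it never expires.
    """
    name: str
    value: str
    timestamp: int
    expires_at: Optional[int] = None

    def is_live(self, now: int) -> bool:
        return self.expires_at is None or now < self.expires_at


def write(name: str, value: str, timestamp: int, ttl: Optional[int] = None) -> Column:
    expires = timestamp + ttl if ttl is not None else None
    return Column(name, value, timestamp, expires)


def reconcile(a: Column, b: Column) -> Column:
    """Last-write-wins: the version with the higher timestamp survives.

    On equal timestamps this sketch simply keeps `a`.
    """
    return a if a.timestamp >= b.timestamp else b


class CounterColumn:
    """Toy counter: updates are increments, so concurrent increments all count."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, delta: int) -> None:
        self.value += delta
```

Two clients that both read 5 and each write 6 lose an update under last-write-wins, while two increments on a counter column both take effect.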
Cassandra – Architecture
 CAP: AP (availability & partition tolerance). Consistency is eventual; stronger consistency is available at the cost of latency. No row locking (HBase wins here!)
 Linear scaling: throughput grows linearly with the number of nodes.
 Cassandra cluster: the partitioner generates tokens for row keys
 Write in action
 Read in action
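The partitioner turns each row key into a token, and the token's position on the ring decides which nodes store the row. This sketch assumes one token per node, MD5 as the hash (similar in spirit to RandomPartitioner; recent Cassandra versions default to Murmur3Partitioner) and SimpleStrategy-style placement that walks clockwise to the next nodes, ignoring racks, data centers and virtual nodes. `Ring`, `token` and `replicas` are hypothetical names.

```python
import bisect
import hashlib
from typing import Dict, List


def token(row_key: str) -> int:
    """Map a row key to a position on the ring (MD5 used as a stand-in hash)."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens: Dict[str, int]) -> None:
        # node_tokens: node name -> the single token that node owns.
        self._tokens = sorted(node_tokens.values())
        self._owner = {t: n for n, t in node_tokens.items()}

    def replicas(self, row_key: str, replication_factor: int) -> List[str]:
        """Primary replica = first node whose token is >= the key's token
        (wrapping around the ring), then the next RF - 1 nodes clockwise."""
        t = token(row_key)
        start = bisect.bisect_left(self._tokens, t) % len(self._tokens)
        count = min(replication_factor, len(self._tokens))
        return [self._owner[self._tokens[(start + i) % len(self._tokens)]]
                for i in range(count)]


# Four nodes with evenly spaced tokens over the 128-bit MD5 range.
ring = Ring({"A": 0, "B": 2**126, "C": 2**127, "D": 3 * 2**126})
```

Adding a node splits an existing token range, so only a fraction of the keys change owner instead of every key being rehashed; this is what lets the ring grow online.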
Installation & Configuration
 Yum installation is the easiest: add the DataStax repository in /etc/yum.repos.d/datastax.repo
 cassandra.yaml configuration
 cluster_name, data_file_directories, commitlog_directory
 Directory locations
 Start Cassandra: cassandra -f (runs in the foreground)
 Start the CQL shell: cqlsh
 Stop Cassandra: service cassandra stop, or kill the process
Demo
CQL in action
 CQL 3.0 is much like SQL.
 Unquoted names are case-insensitive (double-quoted names keep their case)
 CQL data types:
 Create keyspace: Responsys_Demo
 Create table, index, user
 All the other SQL-like functions!
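The case-insensitivity rule is worth pinning down: unquoted CQL identifiers are folded to lower case, while double-quoted identifiers keep their exact case. The sketch below encodes that rule in simplified form and builds demo DDL as plain strings that could be pasted into cqlsh. The `users` table, its columns, the index name and the replication settings are illustrative assumptions, not the demo's actual schema.

```python
from typing import List


def cql_name(identifier: str) -> str:
    """How CQL resolves an identifier (simplified sketch).

    Unquoted identifiers are case-insensitive: they are folded to lower case.
    Double-quoted identifiers keep their exact case ("" escapes a quote).
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


def create_keyspace(name: str, replication_factor: int) -> str:
    # SimpleStrategy is adequate for a single-data-center demo cluster.
    return (f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};")


# Hypothetical demo schema: keyspace, one table and one secondary index.
DEMO_DDL: List[str] = [
    create_keyspace("Responsys_Demo", 1),
    "CREATE TABLE IF NOT EXISTS responsys_demo.users ("
    "user_id uuid PRIMARY KEY, email text, city text);",
    "CREATE INDEX IF NOT EXISTS users_city_idx ON responsys_demo.users (city);",
]
```

Because `Responsys_Demo` is unquoted, the keyspace is actually created as `responsys_demo`, which is why the later statements refer to it in lower case.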
Cassandra – Monitoring
 JMX Interface – DEMO
 nodetool: command-line front end to the Cassandra JMX interface
 cfstats
 netstats
 ring & other operations
 DataStax OpsCenter
 Nagios monitoring
 Cassandra logging & GC logging
Summary, Conclusions & References
Summary – Quick recap
 Data evolution
 ACID, BASE & CAP
 NoSQL, data models, implementations
 Cassandra & Data model
 Architecture
 Installation & Operations
Links & References
• https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
• http://www.thoughtworks.com/insights/blog/nosql-databases-overview
• http://www.dia.uniroma3.it/~torlone/bigdata/L6-NoSQL.pdf
• http://radar.oreilly.com/2013/03/returning-transactions-to-distributed-data-stores.html
Q & A
Thank you
APPENDIX

Exploring NoSQL and implementing through Cassandra
