NoSQL (Not Only SQL)
Next generation web-scale databases
A brief look at Apache Cassandra
Distributed Database
Who am I
• Joe Alex
– Software Architect / Data Scientist
Loves to code in Java, Scala
– Areas of Interest: Big Data, Data Analytics,
Machine Learning, Hadoop, Cassandra
– Currently working as Team Lead for Managed
Security Services Portal at Verizon
New Face of data
Scale out not up
•Big Data
–user generated; Amazon, Social Networks: Twitter, Facebook, Foursquare
–machine generated; credit cards, RFID, POS, cell phones, GPS,
firewalls, routers
–more and more connected
–less structured
–data sets becoming larger and larger
–joins and relationships are exploding
–cloud computing - scaling and tolerance needs
–backing up is replaced with having multiple active copies
–nodes can crash and applications should survive
–nodes can be added or removed at any point of time
New Face of data
Internet of Things (real-world objects connect to the Internet)
– "The 'Internet of Things' will infuse intelligence into all our systems and
present us with a whole new way to run a home, an enterprise, a
community or an economy. In a 4G world, wireless will connect
everything, and there's really no limit to the number of
connections that can be part of the mobile grid: vehicles,
appliances, buildings, roads, medical monitors."
– recently announced a partnership with American Security
Logistics (ASL) to "wirelessly connect a series of location based
tracking devices that can be used to help keep tabs on an array of
valuables - from people to pets to pallets."
– By 2013, the number of devices connected to the Internet will reach
1 trillion, up from 500 million in 2007.
New Face of data
Scale out not up
•Traditional RDBMS
– neither economical nor capable
– scaling up doesn't work
– scaling out with traditional DB is not easy
• scaling reads to a relational DB is hard
• scaling writes is almost impossible
– when you try to do it, the result is not relational anymore
– sharding scales
• but you lose all features that make RDBMS useful
• operational nightmare
– volumes of data strain commercial RDBMS
– cloud computing
– rethink how we store data: understand your data and find the most efficient model
– de-normalization: normalization strives to remove duplication, but duplication is an
interesting alternative to joins
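To make the de-normalization point concrete, here is a minimal Python sketch contrasting a normalized layout, which needs a lookup (a join) at read time, with a denormalized one that copies the user name into each record. The records and all function names are hypothetical illustrations, not taken from the slides.

```python
# Hypothetical data, for illustration only.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
posts_normalized = [
    {"post_id": 10, "user_id": 1, "text": "hello"},
    {"post_id": 11, "user_id": 2, "text": "scale out"},
]

def read_with_join(posts, users):
    """Normalized read: every post needs a lookup into the users table."""
    return [(users[p["user_id"]]["name"], p["text"]) for p in posts]

def denormalize(posts, users):
    """Write-time duplication: copy the user name into each post."""
    return [dict(p, user_name=users[p["user_id"]]["name"]) for p in posts]

def read_without_join(posts):
    """Denormalized read: one record holds everything the query needs."""
    return [(p["user_name"], p["text"]) for p in posts]

posts_denormalized = denormalize(posts_normalized, users)
```

The trade-off is that a renamed user now has to be updated in every copy, which is the integrity work that moves from the database into application code.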
New Face of data
What is wrong with RDBMS
•Pros
–SQL lets you query all data at once
–enforces data integrity
–minimizes repetition
–proven
–familiar to DBA, users
•Cons
–rigidly schematic
–joins rapidly become a bottleneck
–difficult to scale up
–gets in the way of parallelization
–optimization (e.g. sharding) may negate the benefits of normalization
New Face of data
What is good with NRDBMS
•Pros
–schemaless
–master-master replication
–scales well
–everything runs in parallel
–built for the web
•Cons
–integrity-enforcement migrates to code
–limited ORM tooling
–significant learning curve
–proven only in a sub-set of cases
–Unlearning normalization is difficult
New Face of data
What is good with NRDBMS
– Relational databases do not fit every problem
– if you are stuffing files into an RDBMS, maybe there is something better
– if you are using an RDBMS for caching, perhaps a lighter-weight solution is better
– if you are cramming log data into an RDBMS, perhaps a key-value store is better
– if you are trying to do parallel processing with a DB, maybe Hadoop MapReduce is better
– if a long-running process takes a few hours, maybe MapReduce with
Hadoop/HBase is better and can get it done in minutes
– Despite the hype, RDBMS are not doomed, but
– their role and place will certainly change
– Scaling is a real challenge for relational db
• sharding is a band-aid, not feasible beyond a few nodes
– There is a hit in overcoming the initial learning curve
• it changes how you build applications (jsp, jsf, jpa)
– Drop ACID and think about data
New Face of data
What is good with NRDBMS
–Webapps need
• elastic scalability
• flexible schemas
• geographic distribution
• high availability
• reliable storage
–Webapps can do without
• complicated queries
• strong transactions ( some form of consistency is still desirable)
–DB vs NoSQL
• Strong consistency vs Eventual consistency
• Big dataset vs Huge Datasets
• Scaling is possible vs Scaling is easy
• SQL vs MapReduce, API etc
• Good availability vs Very high availability
CAP Theorem
You can't have it all
–What is ACID
• Atomic
• Consistent
• Isolated
• Durable
–ACID struggles when
• downtime is unacceptable
• reliability requires >= 2 nodes
• it must be maintained over a network
CAP Theorem
You can't have it all
•What is CAP Theorem
– A distributed system can have any two of:
• Consistency (data is correct at all times)
– ACID transactions
• Availability (read and write all the time)
– Total Redundancy
• Partition Tolerance (plug and play nodes)
– Infinite scale out
– CA - corruption is possible if live nodes can't communicate
– CP - completely inaccessible if any nodes are dead
– AP - always available, but reads may not return the most recent write
– Cassandra chooses A and P but allows them to be tunable to have more C
– RDBMS are typically CA
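The "tunable" part can be illustrated with the usual quorum arithmetic: with N replicas, a read that waits for R replicas and a write that waits for W replicas must overlap on at least one replica whenever R + W > N. The Python sketch below only encodes that inequality; the function names are illustrative and are not a Cassandra API.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1

def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

With N = 3, quorum reads and quorum writes (R = W = 2) overlap, while R = W = 1 trades that guarantee for lower latency and higher availability.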
CAP Theorem
You can't have it all
•What is BASE
– ACID Alternative
– Basically Available (appears to work all the time)
– Soft state (doesn't have to be consistent all the time)
– Eventually consistent (but eventually it will be)
–BASE (basically available, soft state, eventually consistent) rather than ACID
(atomicity, consistency, isolation, durability)
NoSQL
It is really Not Only SQL
•What problems does it solve
–Reliable and simple scaling
–No single point of failure (all nodes are identical)
–High write throughput
–Large data sets
–Scale out not up
–Online load balancing, cluster growth
–flexible schema
–key-oriented queries
–CAP aware
NoSQL
It is really Not Only SQL
•Many choices
–Key/Value Stores (distributed hash tables)
Stores entities as key value pairs in large hash tables
– Voldemort, Redis, Riak, SimpleDB, Tokyo Cabinet, Dynomite, MemcacheDB
–Column Oriented (semi-structured)
Stores entities by Column
– Cassandra, Bigtable, HBase, Hypertable, Azure table services
–Document (semi-structured)
stores documents (JSON)
– CouchDB, MongoDB
–Graph (stores entities as nodes and edges)
– Neo4j
Cassandra
Highly scalable distributed database
• Created at Facebook
– Designed by Avinash Lakshman and Prashant Malik
– Open sourced by Facebook in 2008
– Entered the Apache Incubator in March 2009
– Graduated to an Apache top-level project in February 2010
– Dynamo's fully distributed design
– Bigtable's Column Family-based data model
Cassandra
Highly scalable distributed database
– Proven
• largest production cluster has over 100 TB of data in over 150 machines.
– Fault Tolerant
• automatically replicated to multiple nodes for fault-tolerance
• Replication across multiple data centers supported
• Failed nodes can be replaced with no downtime
– Decentralized
• Every node in the cluster is identical
• no network bottlenecks
• no SPOF
– You're in control
• Choose between synchronous or asynchronous replication for each update
• Highly available asynchronous operations are optimized with features like Hinted Handoff
and Read Repair
– Rich Data Model
• Allows efficient use for many applications beyond simple key/value
– Elastic
• Read and write throughput both increase linearly as new machines are added, with no
downtime or interruption to application
– Durable
• Cassandra is suitable for applications that can't afford to lose data, even when an entire
data center goes down
Cassandra
Highly scalable distributed database
–High Availability. Writes never fail.
–Incremental scalability
–Eventually Consistent (Hinted Handoff, Read Repair)
–Tunable tradeoffs between consistency and latency
– partitioning, replication
–Minimal administration
–No Single Point Of Failure (SPOF)
–Key-Value store (with some structure)
–Schemaless
–MapReduce support
–Two read paths available: high-performance weak reads and quorum reads
–Reads and writes atomic within a single Column Family
–Versioning and conflict resolution (last update wins)
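"Last update wins" can be sketched as picking, for a given column, the replica value that carries the highest timestamp. This is a minimal Python sketch: the `Column` tuple and the `reconcile` function are illustrative names, the sample values are made up, and tie-breaking between equal timestamps is deliberately left out.

```python
from typing import Iterable, NamedTuple

class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # supplied with the write

def reconcile(versions: Iterable[Column]) -> Column:
    """Return the version with the highest timestamp (last write wins)."""
    return max(versions, key=lambda c: c.timestamp)

# Hypothetical answers from three replicas for the same column.
replica_answers = [
    Column("emailAddress", "old@bar.com", 100),
    Column("emailAddress", "foo@bar.com", 250),
    Column("emailAddress", "older@bar.com", 50),
]
winner = reconcile(replica_answers)
```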
Cassandra
Who is using it
• Used by
– Twitter
– Facebook
– Digg
– Rackspace
– Reddit
– IBM
– Cisco
– SimpleGeo
– Cloudkick
– Comcast
– Mahalo
– Ooyala
– OpenX
Dynamo architecture & Lookup
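In a Dynamo-style design, lookup works by placing nodes and keys on the same hash ring: a key belongs to the first node at or after its position, and the following nodes clockwise hold the replicas. The Python sketch below assumes MD5-based tokens and one token per node. The `Ring` class and the node names are hypothetical, and real partitioners and replica placement strategies are more involved.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # One token per node, kept sorted so lookups can use binary search.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, n: int = 3):
        """Return the n distinct nodes responsible for key, walking clockwise."""
        n = min(n, len(self._ring))
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(n)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:keyhole", n=3)
```

Any node can run this computation locally, which is why a client can send a request to any node in the cluster.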
Memtable SSTable
Cassandra
Highly scalable distributed database
• Writes
– no reads
– no seeks
– sequential disk access
– atomic within CF
– Fast
– Any node
– Always writable (hinted hand-off)
– Writes go to a commit log and in-memory storage (memtable)
– Memtable is occasionally flushed to disk (SSTable)
– The SSTables are periodically compacted
– Partitioner
– Wait for W responses
– client issues a write request to a random node in the Cassandra cluster; the partitioner
determines the nodes responsible for the data
– No locks in critical path
– always writable - accepts writes during failure scenarios
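The storage side of the write path above (commit log, memtable, flush to SSTable) can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions: the `ToyStore` class, its `flush_threshold` parameter, and the use of Python lists in place of files are all hypothetical, and compaction is omitted.

```python
class ToyStore:
    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # in-memory, latest value per key
        self.sstables = []     # immutable, sorted snapshots; newest last
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        # Sequential append first, then an in-memory update: no read, no seek.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable as a sorted, immutable table and start afresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key: str):
        # Check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

store = ToyStore(flush_threshold=2)
store.write("a", "1")
store.write("b", "2")   # reaches the threshold and triggers a flush
store.write("a", "3")   # newer value lives in the memtable
```

Because a write only appends and updates memory, it never has to read existing data first, which is what keeps writes fast.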
Cassandra
Highly scalable distributed database
• Reads
– Any node
– read repair
– usual cache conventions apply
– Bloom Filters before SSTable
– reads (memtable, sstable)
– Partitioner
– Wait for R responses; the remaining N – R responses are collected in the background for read repair
– Read multiple SSTables
– Slower than writes (but still fast)
– Scales to billions of rows
– Read repair when out of sync
– Row cache avoids SSTable lookup
– Key cache avoids index scan
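A Bloom filter lets a read skip SSTables that definitely do not contain the key: it can return false positives but never false negatives. The Python sketch below assumes a fixed-size bit array and hash positions derived from SHA-256 with a per-hash salt. The class name and default sizes are illustrative choices, not Cassandra's implementation.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for row_key in ("keyhole", "spacer"):
    bf.add(row_key)
```

A read would consult one such filter per SSTable and only open the tables whose filter answers "possibly present".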
Cassandra
Highly scalable distributed database
Messaging service
Gossip
Failure detection
Cluster state
Partitioner
Replication
Commit log
Memtable
SSTable
Indexes
Compaction
Tombstones
Hinted handoff
Read repair
Bootstrap
Monitoring
Admin tools
Compared with MySQL
• MySQL
– 300ms write
– 350ms read
• Cassandra
– 0.12 ms write
– 15ms read
– on 50GB data
Clients
• Most common way to access is via Thrift Interface.
• Other clients for most languages
• http://wiki.apache.org/cassandra/ClientExamples
• Fauna – Twitter’s Ruby client
• Lazyboy - Digg’s Python library
Datamodels
• Cluster: the machines (nodes) in a logical Cassandra instance. Clusters can contain
multiple keyspaces.
• Keyspace: namespace for ColumnFamilies. (Analogous to DB schema)
• ColumnFamilies: contain multiple columns, referenced by row keys. (Analogous to
table)
• SuperColumns: columns that themselves have subcolumns.
Datamodel
<Keyspaces>
  <Keyspace Name="Keyspace1">
    <ColumnFamily CompareWith="BytesType" Name="Standard1" />
    <ColumnFamily CompareWith="UTF8Type" Name="Super1"
                  CompareSubcolumnsWith="UTF8Type" ColumnType="Super" />
  </Keyspace>
</Keyspaces>
Column
• Lowest increment of data. Analogous to Name/Value pairs or Attribute. Key is ID.
• { "name": "emailAddress",
    "value": "foo@bar.com",
    "timestamp": 123456789 }
SuperColumn
• Value is a Map of Columns
• {name: "address",
   value: {
     street: {name: "street", value: "888 anywhere", timestamp: 123456789},
     city: {name: "city", value: "reston", timestamp: 123456789},
     zip: {name: "zip", value: "20190", timestamp: 123456789}
   }
  }
Column Families
• Analogous to Tables. Rows can have different columns. Columns can be created
dynamically. Columns are always sorted in row by Column name.
• User = {
    keyhole: {
      username: "keyhole",
      email: "keyhole@bar.com"},
    spacer: {
      username: "spacer",
      email: "spacer@bar.com",
      phone: "(888) 888-8888"}
  }
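The nested structure above maps naturally onto a dictionary of rows, each holding its own set of columns. The Python sketch below assumes string column names compared as plain text (roughly what a UTF8-style comparator would do) and ignores timestamps. `insert` and `columns_of` are hypothetical helper names.

```python
from collections import defaultdict

# Column family: row key -> {column name -> value}
user_cf = defaultdict(dict)

def insert(cf, row_key, column_name, value):
    """Columns are created on the fly; rows need not share the same columns."""
    cf[row_key][column_name] = value

def columns_of(cf, row_key):
    """Return a row's columns sorted by column name."""
    return sorted(cf[row_key].items())

insert(user_cf, "keyhole", "username", "keyhole")
insert(user_cf, "keyhole", "email", "keyhole@bar.com")
insert(user_cf, "spacer", "username", "spacer")
insert(user_cf, "spacer", "email", "spacer@bar.com")
insert(user_cf, "spacer", "phone", "(888) 888-8888")
```

Note that the `spacer` row gains a `phone` column without any schema change affecting the `keyhole` row.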
Super Column Families
Type of Queries
• Single column
• Slice
• Key range
• Querying: get(), multiget(), get_slice(), multiget_slice(), get_count(), get_range_slice()
• Column comparators - TimeUUID, LexicalUUID, UTF8, Long, Bytes, ...
• Updating - insert(), batch_insert(), remove(), batch_mutate(), remove key range
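Because columns within a row are kept sorted by name, a slice is a range scan over column names. The Python sketch below imitates the idea with `bisect`. The `get_slice` defined here is a toy function with a made-up signature (`start`, `finish`, `count`), not the Thrift call of the same name.

```python
import bisect

def get_slice(row, start="", finish=None, count=100):
    """Return up to count (name, value) pairs with start <= name <= finish.

    row is a dict of column name -> value; finish=None means "to the end".
    """
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = len(names) if finish is None else bisect.bisect_right(names, finish)
    return [(n, row[n]) for n in names[lo:hi][:count]]

# Hypothetical row, reusing the address columns from the SuperColumn example.
row = {"city": "reston", "street": "888 anywhere", "zip": "20190"}
```

Calling `get_slice(row, start="d", finish="t")` returns only the `street` column, since it is the only name that sorts between the two bounds.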
Cassandra
• Conclusions
– You probably do not need an NRDBMS now, but ought to learn one anyway
– It's not just for Twitter and bleeding-edge startups: Amazon, Facebook, Google, IBM,
Microsoft all get this
– Sometimes it is simply the right tool for the job
– if you are in the cloud you are going to use them
– best of both worlds - external mapping layer JPA driver
– Next Big thing - In Memory elastic DB
• memory can be much more efficient than disk
• RAMClouds become much more attractive for apps with high-throughput requirements
More…
•Other articles/videos about Cassandra
–http://wiki.apache.org/cassandra/
–#cassandra on irc.freenode.net
–http://wiki.apache.org/cassandra/ArticlesAndPresentations
Questions
Twitter @joealex
Email joe.m.alex@gmail.com

More Related Content

What's hot

NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
Pooyan Mehrparvar
 
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Clustrix
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
PritamKathar
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
Andraz Tori
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation free
Benoit Perroud
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Clustrix
 
Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13
Daniel Abadi
 
Apache Cassandra
Apache CassandraApache Cassandra
Apache Cassandra
Rutuja Gholap
 
Signal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsSignal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide Rows
DataStax Academy
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
Mohammed Fazuluddin
 
C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?DataStax
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911
Ines Sombra
 
Mongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniMongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirni
Dr. Awase Khirni Syed
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
Jean-Baptiste Poullet
 

What's hot (20)

NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
NoSql
NoSqlNoSql
NoSql
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation free
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
 
Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13
 
Apache Cassandra
Apache CassandraApache Cassandra
Apache Cassandra
 
Signal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsSignal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide Rows
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?C*ollege Credit: Is My App a Good Fit for Cassandra?
C*ollege Credit: Is My App a Good Fit for Cassandra?
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911
 
Mongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirniMongo db groundup-0-nosql-intro-syedawasekhirni
Mongo db groundup-0-nosql-intro-syedawasekhirni
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 

Similar to NoSQL A brief look at Apache Cassandra Distributed Database

Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Dave Anselmi
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Arseny Chernov
 
No sql databases
No sql databasesNo sql databases
No sql databases
swathika rajan
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
DataStax Academy
 
Big data
Big dataBig data
Big data
shrinivasvasala
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
KarthikR780430
 
Cassandra tutorial
Cassandra tutorialCassandra tutorial
Cassandra tutorial
Ramakrishna kapa
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
Venu Anuganti
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
Big Data Joe™ Rossi
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systemselliando dias
 
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentation
Sergey Enin
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
VitsRangannavar
 
NoSQL
NoSQLNoSQL
Mongo db model relationships with documents
Mongo db model relationships with documentsMongo db model relationships with documents
Mongo db model relationships with documents
Dr. Awase Khirni Syed
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
Oleksandr Semenov
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
Sanura Hettiarachchi
 

Similar to NoSQL A brief look at Apache Cassandra Distributed Database (20)

No sql
No sqlNo sql
No sql
 
No sql
No sqlNo sql
No sql
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Big data
Big dataBig data
Big data
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
Cassandra tutorial
Cassandra tutorialCassandra tutorial
Cassandra tutorial
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentation
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
NoSQL
NoSQLNoSQL
NoSQL
 
Mongo db model relationships with documents
Mongo db model relationships with documentsMongo db model relationships with documents
Mongo db model relationships with documents
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 

Recently uploaded

Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Brad Spiegel Macon GA
 
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
cuobya
 
[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024
hackersuli
 
Gen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needsGen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needs
Laura Szabó
 
7 Best Cloud Hosting Services to Try Out in 2024
7 Best Cloud Hosting Services to Try Out in 20247 Best Cloud Hosting Services to Try Out in 2024
7 Best Cloud Hosting Services to Try Out in 2024
Danica Gill
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Arif0071
 
Bài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docxBài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docx
nhiyenphan2005
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
keoku
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
zoowe
 
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
fovkoyb
 
Search Result Showing My Post is Now Buried
Search Result Showing My Post is Now BuriedSearch Result Showing My Post is Now Buried
Search Result Showing My Post is Now Buried
Trish Parr
 
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
zyfovom
 
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaalmanuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
wolfsoftcompanyco
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
Javier Lasa
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
3ipehhoa
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
ufdana
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
3ipehhoa
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
ukwwuq
 
Italy Agriculture Equipment Market Outlook to 2027
Italy Agriculture Equipment Market Outlook to 2027Italy Agriculture Equipment Market Outlook to 2027
Italy Agriculture Equipment Market Outlook to 2027
harveenkaur52
 

Recently uploaded (20)

Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
 
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
可查真实(Monash毕业证)西澳大学毕业证成绩单退学买
 
[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024[HUN][hackersuli] Red Teaming alapok 2024
[HUN][hackersuli] Red Teaming alapok 2024
 
Gen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needsGen Z and the marketplaces - let's translate their needs
Gen Z and the marketplaces - let's translate their needs
 
7 Best Cloud Hosting Services to Try Out in 2024
7 Best Cloud Hosting Services to Try Out in 20247 Best Cloud Hosting Services to Try Out in 2024
7 Best Cloud Hosting Services to Try Out in 2024
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
Bài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docxBài tập unit 1 English in the world.docx
Bài tập unit 1 English in the world.docx
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
国外证书(Lincoln毕业证)新西兰林肯大学毕业证成绩单不能毕业办理
 
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
存档可查的(USC毕业证)南加利福尼亚大学毕业证成绩单制做办理
 
Search Result Showing My Post is Now Buried
Search Result Showing My Post is Now BuriedSearch Result Showing My Post is Now Buried
Search Result Showing My Post is Now Buried
 
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
学位认证网(DU毕业证)迪肯大学毕业证成绩单一比一原版制作
 
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaalmanuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
manuaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
 
Italy Agriculture Equipment Market Outlook to 2027
Italy Agriculture Equipment Market Outlook to 2027Italy Agriculture Equipment Market Outlook to 2027
Italy Agriculture Equipment Market Outlook to 2027
 

NoSQL A brief look at Apache Cassandra Distributed Database

  • 1. NoSQL (Not Only SQL) Next generation web- scale databases A brief look at Apache Cassandra Distributed Database
  • 2. Who am I • Joe Alex – Software Architect / Data Scientist Loves to code in Java, Scala – Areas of Interest: Big Data, Data Analytics, Machine Learning, Hadoop, Cassandra – Currently working as Team Lead for Managed Security Services Portal at Verizon
  • 3. 3 New Face of data Scale out not up •Big Data –user generated; Amazon, Social Networks: Twitter, Facebook, Four Square –machine generated; credit cards, RFID, POS, cell phones, GPS, firewalls, routers –more and more connected –less structured –data sets becoming larger and larger –joins and relationships are exploding –cloud computing - scaling and tolerance needs –backing up is replaced with having multiple active copies –nodes can crash and applications should survive –nodes can be added or removed at any point of time
  • 4. New Face of data Internet of Things (real-world objects connect to the Internet) – "'Internet of Things' will infuse intelligence into all our systems and present us with a whole new way to run a home, an enterprise, a community or an economy. In a 4G world, wireless will connect everything, and there's really no limit to the number of connections that can be part of the mobile grid: vehicles, appliances, buildings, roads, medical monitors." – recently announced a partnership with American Security Logistics (ASL) to "wirelessly connect a series of location based tracking devices that can be used to help keep tabs on an array of valuables - from people to pets to pallets." – By 2013, the number of devices connected to the Internet will reach 1 trillion - up from 500 million in 2007.
  • 5. New Face of data Scale out not up •Traditional RDBMS – neither economical nor capable – scaling up doesn't work – scaling out with a traditional DB is not easy • scaling reads on a relational DB is hard • scaling writes is almost impossible – when you try, it is not relational anymore – sharding scales • but you lose all the features that make an RDBMS useful • operational nightmare – volumes of data strain commercial RDBMS – cloud computing – rethink how we store data. Understand your data, find the most efficient model – de-normalization: normalization strives to remove duplication, but duplication is an interesting alternative to joins
  • 6. New Face of data What is wrong with RDBMS •Pros –SQL lets you query all data at once –enforces data integrity –minimizes repetition –proven –familiar to DBAs, users •Cons –rigidly schematic –joins rapidly become a bottleneck –difficult to scale up –gets in the way of parallelization –optimization may negate the benefits of normalization (sharding)
  • 7. New Face of data What is good with NRDBMS •Pros –schemaless –master-master replication –scales well –everything runs in parallel –built for the web •Cons –integrity enforcement migrates to code –limited ORM tooling –significant learning curve –proven only in a subset of cases –unlearning normalization is difficult
  • 8. New Face of data What is good with NRDBMS – Relational databases do not fit every problem – stuffing files into an RDBMS: maybe there is something better – using an RDBMS for caching: perhaps a lighter-weight solution is better – cramming log data into an RDBMS: perhaps a key-value store is better – trying to do parallel processing with a DB: maybe Hadoop MapReduce is better – executing a long-running process that takes a few hours: maybe MapReduce with Hadoop/HBase is better and gets it done in minutes – Despite the hype, RDBMS are not doomed, but their role and place will certainly change – Scaling is a real challenge for relational DBs • sharding is a band-aid, not feasible beyond a few nodes – There is a hit in overcoming the initial learning curve • it changes how you build applications (JSP, JSF, JPA) – Drop ACID and think about data
  • 9. New Face of data What is good with NRDBMS –Webapps need • elastic scalability • flexible schemas • geographic distribution • high availability • reliable storage –Webapps can do without • complicated queries • strong transactions (some form of consistency is still desirable) –DB vs NoSQL • Strong consistency vs Eventual consistency • Big datasets vs Huge datasets • Scaling is possible vs Scaling is easy • SQL vs MapReduce, API etc. • Good availability vs Very high availability
  • 10. CAP Theorem You can't have it all –What is ACID • Atomic • Consistent • Isolated • Durable –ACID trips when • downtime is unacceptable • reliability requires >= 2 nodes • challenging over networks
  • 11. CAP Theorem You can't have it all •What is CAP Theorem – Distributed systems can have any two of • Consistency (data is correct at all times) – ACID transactions • Availability (read and write all the time) – Total redundancy • Partition Tolerance (plug and play nodes) – Infinite scale out – CA - corruption is possible if live nodes can't communicate – CP - completely inaccessible if any nodes are dead – AP - always available, but reads may not return the most recent write – Cassandra chooses A and P but lets the trade-off be tuned toward more C – RDBMS are typically CA
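One common way to reason about this tunable trade-off is the Dynamo-style quorum rule: with N replicas, a read that waits for R responses is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below only illustrates that arithmetic; the function names are made up for this example and are not Cassandra APIs.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every set of r replicas overlaps every set of w replicas."""
    return r + w > n


# N = 3 replicas: quorum reads and writes overlap, single-replica ones may not.
N = 3
print(read_sees_latest_write(N, quorum(N), quorum(N)))  # True
print(read_sees_latest_write(N, 1, 1))                  # False
```

Lower R and W favor latency and availability; raising them until R + W > N moves the system toward consistency.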
  • 12. CAP Theorem You can't have it all •What is BASE – ACID alternative – Basically Available (appears to work all the time) – Soft state (doesn't have to be consistent all the time) – Eventually consistent (but eventually it will be) –BASE (basically available, soft state, eventually consistent) rather than ACID (atomicity, consistency, isolation, durability)
  • 13. NoSQL It is really Not Only SQL •What problems does it solve –Reliable and simple scaling –No single point of failure (all nodes are identical) –High write throughput –Large data sets –Scale out not up –Online load balancing, cluster growth –flexible schema –key-oriented queries –CAP aware
  • 14. NoSQL It is really Not Only SQL •Many choices –Key/Value Stores (distributed hash tables) Stores entities as key value pairs in large hash tables – Voldemort, Redis, Riak, SimpleDB, Tokyo Cabinet, Dynomite, MemcacheDB –Column Oriented (semi-structured) Stores entities by Column – Cassandra, Bigtable, HBase, Hypertable, Azure table services –Document (semi-structured) stores documents (JSON) – CouchDB, MongoDB –Graph (stores entities as nodes and edges) – Neo4j
  • 15. NoSQL It is really Not Only SQL
  • 16. Cassandra Highly scalable distributed database • Created at Facebook – Designed by Avinash Lakshman and Prashant Malik – Open sourced by Facebook in 2008 – Entered the Apache Incubator in March 2009 – Graduated to a top-level Apache project in February 2010 – Dynamo's fully distributed design – Bigtable's Column Family-based data model
  • 17. Cassandra Highly scalable distributed database – Proven • largest production cluster has over 100 TB of data on over 150 machines – Fault Tolerant • data is automatically replicated to multiple nodes for fault tolerance • Replication across multiple data centers is supported • Failed nodes can be replaced with no downtime – Decentralized • Every node in the cluster is identical • no network bottlenecks • no SPOF – You're in control • Choose between synchronous or asynchronous replication for each update • Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair – Rich Data Model • Allows efficient use for many applications beyond simple key/value – Elastic • Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications – Durable • Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down
  • 18. Cassandra Highly scalable distributed database –High Availability. Writes never fail. –Incremental scalability –Eventually Consistent (Hinted Handoff, Read Repair) –Tunable tradeoffs between consistency and latency – partitioning, replication –Minimal administration –No Single Point Of Failure (SPOF) –Key-Value store (with some structure) –Schemaless –MapReduce support –Two read paths available: high-performance weak reads/quorum reads –Reads and writes atomic within a single Column Family –Versioning and conflict resolution (last update wins)
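The "last update wins" rule can be pictured with the name/value/timestamp column shape that the Column slide uses. This is a minimal sketch, not Cassandra's implementation: the `Column` type and `reconcile` function are hypothetical names, and the tie-break for equal timestamps is an arbitrary choice made for the example.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Last update wins: keep the version with the larger timestamp.

    Ties go to `a` here; that tie-break is an assumption of this sketch.
    """
    return a if a.timestamp >= b.timestamp else b


old = Column("emailAddress", "foo@bar.com", 100)
new = Column("emailAddress", "foo@example.com", 200)
print(reconcile(old, new).value)  # foo@example.com
```

Because the winner depends only on timestamps, replicas can apply the same two versions in any order and still converge on the same value.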
  • 19. Cassandra Who is using it • Used by – Twitter – Facebook – Digg – Rackspace – Reddit – IBM – Cisco – SimpleGeo – Cloudkick – Comcast – Mahalo – Ooyala – OpenX
  • 23. Cassandra Highly scalable distributed database • Writes – no reads – no seeks – sequential disk access – atomic within CF – Fast – Any node – Always writable (hinted handoff) – Writes go to a commit log and in-memory storage (memtable) – Memtable is occasionally flushed to disk (SSTable) – The SSTables are periodically compacted – Partitioner – Wait for W responses – client issues a write request to a random node in the Cassandra cluster; the partitioner determines the nodes responsible for the data – No locks in critical path – always writable: accepts writes during failure scenarios
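The write path on this slide (commit log, memtable, flush to SSTable, compaction) can be sketched on a single node as follows. This is a toy model under stated assumptions: `ToyStore`, its methods, and the flush threshold are invented for illustration, everything lives in memory, and the commit log is simply cleared on flush.

```python
class ToyStore:
    """Single-node toy: append to a log, update a memtable, flush when full."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory view of recent writes
        self.sstables = []     # immutable, sorted runs; oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # No read and no seek: one append plus one in-memory update.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable as a sorted, immutable run.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed data no longer needs the log

    def compact(self):
        # Merge runs oldest to newest so newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


store = ToyStore(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
store.compact()
print(store.sstables)  # [(('a', 3), ('b', 2), ('c', 4))]
```

Writes stay cheap because existing runs are never modified in place; the cost of merging is deferred to compaction.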
  • 24. Cassandra Highly scalable distributed database • Reads – Any node – read repair – usual cache conventions apply – Bloom filters before SSTable – reads (memtable, SSTable) – Partitioner – Wait for N - R responses in the background and perform read repair – Read multiple SSTables – Slower than writes (but still fast) – Scales to billions of rows – Read repair when out of sync – Row cache avoids SSTable lookup – key cache avoids index scan
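The role of the Bloom filter in the read path can be sketched like this: the memtable is checked first, then each SSTable from newest to oldest, and a table is only searched when its filter says the key might be present. The `BloomFilter` class, its sizing, and the `read` helper are assumptions made for illustration, not Cassandra code.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in one Python integer

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def read(key, memtable, sstables):
    """Check the memtable, then SSTables newest first, skipping filtered-out ones."""
    if key in memtable:
        return memtable[key]
    for bloom, data in reversed(sstables):
        if bloom.might_contain(key) and key in data:
            return data[key]
    return None


bloom = BloomFilter()
sstable = {"keyhole": "keyhole@bar.com"}
for k in sstable:
    bloom.add(k)
sstables = [(bloom, sstable)]
memtable = {"spacer": "spacer@bar.com"}

print(read("spacer", memtable, sstables))   # spacer@bar.com
print(read("keyhole", memtable, sstables))  # keyhole@bar.com
print(read("nobody", memtable, sstables))   # None
```

A false positive only costs an unnecessary lookup; the result is still correct because the table itself is consulted before returning.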
  • 25. Cassandra Highly scalable distributed database: Messaging service, Gossip, Failure detection, Cluster state, Partitioner, Replication, Commit log, Memtable, SSTable, Indexes, Compaction, Tombstones, Hinted handoff, Read repair, Bootstrap, Monitoring, Admin tools
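Of the components listed, the partitioner and replication together decide which nodes are responsible for a key. A rough sketch of that idea is a hash ring: hash the key to a token, find the first node at or after that token, and take the next nodes clockwise as replicas. The `Ring` class, the node names, and deriving node tokens by hashing the node name are all assumptions of this example; real clusters assign tokens through configuration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 used as the hash here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class Ring:
    def __init__(self, nodes):
        # Sort nodes by their token so the ring can be searched with bisect.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, replication_factor):
        """First node at or after the key's token, then the next ones clockwise."""
        size = len(self.entries)
        start = bisect.bisect_left(self.tokens, token(key)) % size
        count = min(replication_factor, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42", 3))  # three distinct node names
```

Any node can run this lookup, which is why a client can send a request to a random node and have it forwarded to the right replicas.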
  • 26. Compared with MySQL • MySQL – 300 ms write – 350 ms read • Cassandra – 0.12 ms write – 15 ms read – on 50 GB data
  • 27. Clients • Most common way to access is via the Thrift interface. • Other clients exist for most languages • http://wiki.apache.org/cassandra/ClientExamples • Fauna – Twitter's Ruby client • Lazyboy - Digg's Python library
  • 28. Datamodels • Cluster: machines (nodes) in a logical Cassandra instance. Clusters can contain multiple keyspaces. • Keyspace: namespace for ColumnFamilies. (Analogous to DB schema) • ColumnFamilies: contain multiple columns, referenced by row keys. (Analogous to table) • SuperColumns: columns that themselves have subcolumns.
  • 29. Datamodel
      <Keyspaces>
        <Keyspace Name="Keyspace1">
          <ColumnFamily CompareWith="BytesType" Name="Standard1" />
          <ColumnFamily CompareWith="UTF8Type" Name="Super1" CompareSubcolumnsWith="UTF8Type" ColumnType="Super" />
        </Keyspace>
      </Keyspaces>
  • 30. Column • Lowest increment of data. Analogous to Name/Value pairs or Attribute. Key is ID.
      { "name": "emailAddress", "value": "foo@bar.com", "timestamp": 123456789 }
  • 31. SuperColumn • Value is a Map of Columns
      { name: "address",
        value: {
          street: {name: "street", value: "888 anywhere", timestamp: 123456789},
          city: {name: "city", value: "reston", timestamp: 123456789},
          zip: {name: "zip", value: "20190", timestamp: 123456789}
        }
      }
  • 32. Column Families • Analogous to Tables. Rows can have different columns. Columns can be created dynamically. Columns are always sorted within a row by Column name.
      User = {
        keyhole: { username: "keyhole", email: "keyhole@bar.com" },
        spacer: { username: "spacer", email: "spacer@bar.com", phone: "(888) 888-8888" }
      }
  • 36. Type of Queries • Single column • Slice • Key range • Querying: get(), multiget(), get_slice(), multiget_slice(), get_count(), get_range_slice() • Column comparators - TimeUUID, LexicalUUID, UTF8, Long, Bytes, ... • Updating - insert(), batch_insert(), remove(), batch_mutate(), remove key range
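The three query shapes (single column, slice, key range) can be pictured against the User column family from the earlier slide, modeled here as nested dictionaries. The functions below borrow the names get, get_slice and get_range_slice for readability only; they are toy stand-ins with simplified, made-up signatures, not the Thrift API. The sketch also assumes row keys and column names compare as plain strings, which for row keys corresponds to an order-preserving partitioner.

```python
# Column family as {row_key: {column_name: value}}.
users = {
    "keyhole": {"username": "keyhole", "email": "keyhole@bar.com"},
    "spacer": {"username": "spacer", "email": "spacer@bar.com",
               "phone": "(888) 888-8888"},
}


def get(cf, key, column):
    """Single column: one value from one row."""
    return cf[key][column]


def get_slice(cf, key, start, finish):
    """Slice: columns of one row whose names fall in [start, finish], sorted by name."""
    row = cf[key]
    return [(name, row[name]) for name in sorted(row) if start <= name <= finish]


def get_range_slice(cf, start_key, end_key, start, finish):
    """Key range: the same column slice for every row key in [start_key, end_key]."""
    return {key: get_slice(cf, key, start, finish)
            for key in sorted(cf) if start_key <= key <= end_key}


print(get(users, "spacer", "phone"))               # (888) 888-8888
print(get_slice(users, "spacer", "a", "f"))        # [('email', 'spacer@bar.com')]
print(get_range_slice(users, "k", "l", "u", "v"))  # {'keyhole': [('username', 'keyhole')]}
```

Slices are efficient in this model because columns are kept sorted by name, which is what the column comparator controls.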
  • 37. Cassandra • Conclusions – You probably do not need an NRDBMS now, but ought to learn one anyway – It's not just for Twitter and bleeding-edge startups: Amazon, Facebook, Google, IBM, Microsoft all get this – Sometimes it is simply the right tool for the job – if you are in the cloud you are going to use them – best of both worlds - external mapping layer, JPA driver – Next big thing - in-memory elastic DB • memory can be much more efficient than disk • RAMClouds become much more attractive for apps with high throughput requirements
  • 38. More… •Other articles/videos about Cassandra –http://wiki.apache.org/cassandra/ –#cassandra on irc.freenode.net –http://wiki.apache.org/cassandra/ArticlesAndPresentations