noSQL choices
What is mySQL?
What is noSQL?
Types of noSQL databases
Why noSQL?
Differences between noSQL and mySQL
Aggregated data vs tuples
ACID vs BASE transactions
• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
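A minimal sketch of atomicity, using Python's built-in sqlite3 module as a stand-in for any ACID-compliant relational database. The accounts table, the names, and the amounts are made up for illustration. The second transfer fails halfway through, and the half-finished work is rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # credit works, debit violates CHECK
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, alice's balance does not keep the 500 credit: either both updates happen or neither does.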
Schema vs Schema-less
The 5 main data stores
• Relational Databases
• Key-value Stores
• Document Databases
• Graph Stores
• Column Stores
Relational Databases
AKA RDBMS
Why is it good?
• Super flexible
• Proven to work, dominant in the market for decades
• Robust, Stable
• Very consistent
• Follows ACID transactions, making it the industry
standard
Why is it bad?
• Strongly typed columns
• Inefficient with high volumes of data
• Not designed for clusters
• ONLY EFFICIENT WITH STRUCTURED DATA
• Scales vertically: bigger data means buying a
bigger computer
mySQL
NOSQL databases
Key-value stores
Why is it good?
• Very fast data storage and retrieval
• Good for storing sessions from users
– User profiles on forums
– Shopping carts on websites
Why is it bad?
• Can’t query on fields inside the stored values
• You need to know the key to look anything up
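Both limitations can be seen in a toy in-memory store. This is only a sketch: the KeyValueStore class and the cart keys are hypothetical, and real products such as Redis have far richer APIs.

```python
import json

class KeyValueStore:
    """Toy key-value store: values are opaque bytes as far as the store knows."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def items(self):
        return self._data.items()

store = KeyValueStore()
store.put("cart:42", json.dumps({"user": "ann", "items": ["book", "pen"]}).encode())
store.put("cart:43", json.dumps({"user": "bob", "items": ["pen"]}).encode())

# Fast path: a direct lookup when the key is known.
cart = json.loads(store.get("cart:42"))

# Slow path: "which carts contain a pen?" The store cannot answer this,
# so the client has to fetch and decode every value itself.
carts_with_pen = sorted(
    key for key, raw in store.items() if "pen" in json.loads(raw)["items"]
)
```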
Examples of key-value stores
• CouchDB
• Aerospike
• Hyperdex
• Flare
• Dynamo
• Redis
Most popular key-value store: Redis
• Writes: 114293.71 requests per second
• Reads: 81234.77 requests per second
• https://redis-docs.readthedocs.org/en/latest/Benchmarks.html
Companies that use Redis
• Twitter
• GitHub
• Pinterest
• Snapchat
• Flickr
• Hulu
• Vine
• Imgur
• Craigslist
Document Databases
Why is it good?
• Very easy to write up
• Turn objects directly into JSON files and easily turn JSON
files back into objects
• Easy to store data, documents contain
whatever key and value you want
• No schema
• Documents are independent units, easy to
distribute
• No need for data to be related at all
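A short sketch of the object-to-JSON round trip and of schema-less storage, using only the Python standard library. The Product class and the plain list standing in for a collection are illustrative assumptions, not any particular database's API.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Product:
    name: str
    price: float
    tags: list = field(default_factory=list)

kettle = Product("kettle", 24.5, ["kitchen"])

doc = json.dumps(asdict(kettle))        # object -> JSON document
restored = Product(**json.loads(doc))   # JSON document -> object

# No schema: documents in the same collection may carry different keys.
collection = [
    json.loads(doc),
    {"name": "gift card", "amount": 50, "expires": "2030-01-01"},
]
names = [d["name"] for d in collection]
```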
Why is it good? (cont)
• Very, very programmer friendly
• Good for:
– Event logging
– Content management systems
– E-commerce applications
– Real-time analytics
Why is it bad?
• Tends to struggle when the database gets very large
• Not good at handling highly interrelated data
• Not designed to handle cross-document operations
• Can’t slice data
Examples of document stores
• MongoDB
• Lotus Notes
• Apache CouchDB
Most popular Document Store: MongoDB
Companies that use MongoDB
• Expedia
• The Weather Channel
• Forbes
• Otto
Graph Stores
“If you can whiteboard it, you can
graph it”
Why is it good?
• Well suited for analyzing interconnections
• Very good for data that involve complex
relationships
• High interest in mining social media data
• Used for creating “recommended products”
on sales websites
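The "recommended products" idea can be sketched as a two-hop walk over a purchase graph: from a user to their products, to other buyers of those products, to what those buyers also bought. The data and the recommend function below are invented for illustration. A graph database would express this as a traversal query rather than Python loops.

```python
from collections import Counter

# Edges of a small user -> product purchase graph (made-up data).
purchases = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "lantern"},
    "cy": {"tent", "stove", "lantern"},
    "dee": {"novel"},
}

def recommend(user, purchases):
    """Rank products bought by users who share at least one purchase with `user`."""
    mine = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other != user and mine & items:   # connected through a shared product
            scores.update(items - mine)      # count products the user lacks
    # Highest score first, ties broken alphabetically.
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

for_ann = recommend("ann", purchases)
for_dee = recommend("dee", purchases)
```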
Why is it bad?
• Not good at updating all, or a subset of
entities
• Changing a property on all nodes is not
straightforward
• Some databases may not be able to handle
large amounts of data
Most popular graph database: Neo4j
Companies that use Neo4j
• eBay
• TomTom
• HP
• Walmart
• eHarmony
Column Stores
Row vs Column store
Why is it good?
• Designed for gigantic amounts of data
• Faster than a row store for column-oriented queries:
it doesn’t waste time reading data it does not need
• Example: with 10,000 rows, looking for a value in a
single column doesn’t require reading every row in full
• Good for blogs, forums
• Event logging
• When you want to count and categorize certain
values
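A rough illustration of the row versus column layout, using plain Python lists rather than a real storage engine. The blog-post data is made up. The point is that a per-column aggregate only has to touch one list in the column layout.

```python
from collections import Counter

# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "author": "ann", "views": 120},
    {"id": 2, "author": "bob", "views": 45},
    {"id": 3, "author": "ann", "views": 300},
]

# Column layout: one list per column, values kept in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row store: every record has to be visited to pull out one field.
total_views_rows = sum(row["views"] for row in rows)

# Column store: only the "views" list is read.
total_views_cols = sum(columns["views"])

# Counting and categorizing values works on a single column too.
posts_per_author = Counter(columns["author"])
```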
Why is it bad?
• Not a good fit for systems that require
ACID transactions for writes and reads
• If the data set is small, you are better off using a
relational database
– If you just need to look at whole rows, or a handful
of columns, a relational database is much better
Most popular Column-family store:
Cassandra
Companies that use Cassandra
• Walmart
• VMware
• Unity
• Ubisoft
• Sony
• Reddit
• PayPal
• Netflix
• NASA
• Instagram
• IBM
• FedEx
• eBay
• Call of Duty
Scaling in Cassandra
• Horizontal scaling
• A matter of adding more nodes
• More nodes = the cluster supports more writes
and reads
• Nodes can be added while the cluster is
running
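Cassandra spreads data over its nodes with consistent hashing on a token ring. The sketch below is a simplified illustration of that idea, not Cassandra's implementation: the Ring class, the MD5-based tokens, and the 64 virtual nodes per machine are all assumptions chosen for brevity. It shows why adding a node is cheap: only the keys claimed by the new node move.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._tokens, (token(f"{node}#{i}"), node))

    def owner(self, key):
        # The first node token at or after the key's token owns the key,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._tokens, (token(key), ""))
        return self._tokens[idx % len(self._tokens)][1]

keys = [f"user:{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now lives on node-d, and the rest stay where they were, so the existing nodes keep serving most of the data while the new node joins.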
Benchmark reports
Throughput
• The higher, the better
• A measure of the power of the database engine
Latency guidelines
• Excellent: < 1ms
• Very good: < 5ms
• Good: 5 – 10ms
• Poor: 10 – 20ms
• Bad: 20 – 100ms
• Really bad: 100 – 500ms
• OMG!: > 500ms
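The guideline bands can be written as a small lookup function. The slide's ranges share their endpoints (10 ms appears in both "Good" and "Poor"), so this sketch assumes a shared boundary belongs to the better band. The function name is made up.

```python
def rate_latency(ms: float) -> str:
    """Classify a latency in milliseconds using the guideline bands above."""
    if ms < 1:
        return "Excellent"
    if ms < 5:
        return "Very good"
    if ms <= 10:
        return "Good"
    if ms <= 20:
        return "Poor"
    if ms <= 100:
        return "Bad"
    if ms <= 500:
        return "Really bad"
    return "OMG!"

ratings = [rate_latency(ms) for ms in (0.4, 3, 7, 15, 50, 250, 900)]
```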
The University of Toronto test (2012)
• Cassandra 1.0.0 rc2
• Redis 2.4.2
• HBase v0.90.4
• Voldemort 0.90.1
• MySQL – 5.5.17
The tests
• Workload R (95% reads)
• Workload RW (50% writes, 50% reads)
• Workload W (99% writes)
Conclusion
• Cassandra – Highest scalability, but suffered in
latency
• Redis – Highest initial throughput in read-intensive
workloads. Very low latency
Conclusion (cont.)
• MySQL – Almost the same as Cassandra,
latency is better
• HBase – Lowest throughput. Highest latency
for reading. Lower latency for writing
EndPoint: Benchmarking Top NoSQL
Databases
• Published: April 13, 2015
• Updated: May 27, 2015
• Cassandra (2.1.0)
• Couchbase (3.0.1)
• MongoDB (3.0)
• HBase (0.98.6-1) and Hadoop (2.6.0)
What was updated?
• Cassandra’s and HBase’s results improved
substantially in the updated report
Workload selection
• Workloads selected to be similar to today’s
applications
• Database nodes: (30.5 GB RAM, 4 CPU cores,
and a single volume of 800 GB of SSD local
storage)
• No data was lost in any test
• Used data volumes that exceeded RAM
capacity on each node
Workloads
• Read-mostly: 95% read, 5% update ratio
• Read/write: 50% read, 50% update
• Read-modify-write: 50% read, 50% read-modify-write
• Insert-mostly: 90% insert, 10% read
• 9 million operations per workload
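A sketch of how such operation mixes can be generated for a test driver. The ratios come from the list above. The WORKLOADS table, the generate_ops function, and the fixed seed are illustrative assumptions, not the benchmark's actual tooling.

```python
import random

# Operation ratios per workload, taken from the list above.
WORKLOADS = {
    "read-mostly": {"read": 0.95, "update": 0.05},
    "read/write": {"read": 0.50, "update": 0.50},
    "read-modify-write": {"read": 0.50, "read-modify-write": 0.50},
    "insert-mostly": {"insert": 0.90, "read": 0.10},
}

def generate_ops(workload: str, n: int, seed: int = 0) -> list:
    """Return n operation names drawn at random with the workload's ratios."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so runs are repeatable
    return rng.choices(list(mix), weights=list(mix.values()), k=n)

ops = generate_ops("read-mostly", 10_000)
read_share = ops.count("read") / len(ops)
```

A real run would issue 9 million operations per workload. The sample here is kept small so it finishes instantly.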
Problems
• Couchbase
• HBase
• MongoDB
Conclusion
• Cassandra heavily outperformed everyone in
latency and throughput
• HBase or Couchbase came second
• MongoDB came last in most test cases
Altoros: The NoSQL Technical
Comparison Report
• Published September 2014
• Pretty unbiased
• Couchbase: 2.5.1
• MongoDB: 2.6.1
• Cassandra: 2.0.8
Workload B
• 50% read operations
• 40% update operations
• 5% insert operations
• 5% delete operations
• 50 million 1 KB records
Workload B
• 3 million 10 KB records
Workload C
• 90% read operations
• 8% update operations
• 1% insert operations
• 1% delete operations.
• 3 million 10 KB records (results with 50 million
records are similar to Workload B)
Scalability
Conclusions
• Cassandra has amazing scalability again
• Cassandra is weaker at reading in terms of
latency
• MongoDB has the worst latency results in
almost all fields
Overall conclusion
• No single noSQL type beats all the others
• How about combining?
• POLYGLOT PERSISTENCE
Example: Shopping Site
E-Commerce platform
• Key/value
• RDBMS
• Document
• Graph
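A hedged sketch of what polyglot persistence could look like in the shopping-site example. The four store types come from the slides. The mapping of data kinds to stores is an assumption based on the strengths listed earlier (carts in key/value, recommendations in a graph), and plain dictionaries stand in for real database clients.

```python
# Assumed mapping of data kinds to store types (illustrative only).
STORE_FOR = {
    "session": "key/value",
    "shopping_cart": "key/value",
    "order": "rdbms",
    "product_catalog": "document",
    "recommendation": "graph",
}

class PolyglotRouter:
    """Sends each kind of data to the store type assumed to suit it best."""

    def __init__(self):
        # One dict per store type; a real system would hold client objects here.
        self.stores = {name: {} for name in set(STORE_FOR.values())}

    def save(self, kind, key, value):
        backend = STORE_FOR[kind]
        self.stores[backend][key] = value
        return backend

router = PolyglotRouter()
used = [
    router.save("shopping_cart", "cart:42", ["tent", "stove"]),
    router.save("order", "order:1001", {"user": "ann", "total": 129.0}),
    router.save("product_catalog", "sku:tent", {"name": "tent", "tags": ["camping"]}),
    router.save("recommendation", "ann", ["lantern"]),
]
```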