28. Why is it good?
• Super flexible
• Proven technology, dominant in the market for decades
• Robust, Stable
• Very consistent
• Follows ACID transactions, making it the industry standard
29. Why is it bad?
• Strongly typed columns
• Inefficient with high volumes of data
• Not designed for clusters
• ONLY EFFICIENT WITH STRUCTURED DATA
• Vertical scaling: processing bigger data means buying a bigger machine
35. Why is it good?
• Hyper fast data storing and retrievals
• Good for storing sessions from users
– User profiles on forums
– Shopping carts on websites
36. Why is it bad?
• Can’t query for values inside the stored values
• Need to know the key to query properly
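The limitation above can be sketched with a plain in-memory map standing in for a key-value store like Redis (the session data here is made up for illustration):

```python
# Minimal in-memory key-value store sketch (a stand-in for Redis).
store = {
    "session:alice": {"cart": ["book", "pen"]},
    "session:bob": {"cart": ["lamp"]},
}

# Lookup by key is a single O(1) operation -- but you must know the key.
alice = store["session:alice"]

# Finding "which sessions contain a lamp?" requires scanning every
# value: there is no index on the values themselves.
with_lamp = [k for k, v in store.items() if "lamp" in v["cart"]]
```
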
39. • Able to write 114293.71 requests per second
• Able to read 81234.77 requests per second
• https://redis-docs.readthedocs.org/en/latest/Benchmarks.html
40. Companies that use Redis
• Twitter
• GitHub
• Pinterest
• Snapchat
• Flickr
• Hulu
• Vine
• Imgur
• Craigslist
44. Why is it good?
• Very easy to write up
• Turns objects directly into JSON files and easily turns JSON files back into objects
• Easy to store data: documents contain whatever keys and values you want
• No schema
• Documents are independent units, easy to distribute
• No need for data to be related at all
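The object-to-JSON round trip described above can be sketched with Python's standard json module (the document fields here are made up):

```python
import json

# A schemaless document: whatever keys and values the application wants.
user = {"name": "Ada", "tags": ["admin"], "cart": {"items": 2}}

# Serialize the object directly into a JSON document...
doc = json.dumps(user)

# ...and turn the document straight back into an object.
restored = json.loads(doc)
```
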
45. Why is it good? (cont)
• Very, very programmer friendly
• Good for:
– Event logging
– Content management systems
– E-commerce applications
– Real-time analytics
46. Why is it bad?
• Tends to struggle when the database gets very large
• Not good at handling highly interrelated data
• Not designed to handle cross-document operations
• Can’t slice data
54. Why is it good?
• Well suited for analyzing interconnections
• Very good for data with complex relationships
• High interest in mining social media data
• Used for creating “recommended products”
on sales websites
55. Why is it bad?
• Not good at updating all entities, or a subset of them
• Changing a property on all nodes is not straightforward
• Some graph databases may not be able to handle large amounts of data
61. Why is it good?
• Designed for gigantic amounts of data
• Far better than a row store for column scans: no time wasted searching
• With 10,000 rows, looking for a value in a single column doesn’t require reading every row
• Good for blogs, forums
• Event logging
• When you want to count and categorize certain values
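The 10,000-row point can be sketched: in a columnar layout, counting or categorizing one field touches only that column's array, not every whole row (the data here is illustrative):

```python
# Row store: each record stored together; scanning one field reads every row.
rows = [
    {"id": i, "category": "blog" if i % 2 else "forum", "views": i}
    for i in range(10_000)
]
blog_count_rowstore = sum(1 for r in rows if r["category"] == "blog")

# Column store: each column stored contiguously; the same count
# touches only the "category" column, never ids or view counts.
columns = {
    "id": [r["id"] for r in rows],
    "category": [r["category"] for r in rows],
    "views": [r["views"] for r in rows],
}
blog_count_colstore = columns["category"].count("blog")
```
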
62. Why is it bad?
• Not good for systems that require ACID transactions for writes and reads
• If the data set is small, it is better off in a relational database
– If you just need to read whole rows, or a bunch of columns, a relational database is much better
64. Companies that use Cassandra
• Walmart
• VMWare
• Unity
• Ubisoft
• Sony
• Reddit
• PayPal
• Netflix
• NASA
• Instagram
• IBM
• FedEx
• eBay
• Call of Duty
65. Scaling in Cassandra
• Horizontal scaling
• A matter of adding more nodes
• More nodes = the cluster supports more writes and reads
• You can add more nodes while the cluster keeps running
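The idea of spreading keys across nodes can be sketched with simple hash partitioning (Cassandra actually uses consistent hashing on a token ring; this is a simplified illustration with made-up node names):

```python
import hashlib

def node_for(key, nodes):
    """Pick the node responsible for a key by hashing the key."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

cluster = ["node-1", "node-2", "node-3"]
# Each key deterministically maps to one node in the cluster.
owner = node_for("user:42", cluster)

# Horizontal scaling: add a node and keys spread over four machines
# instead of three (real systems use consistent hashing so that only
# a fraction of the keys move when a node joins).
cluster.append("node-4")
new_owner = node_for("user:42", cluster)
```
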
69. The University of Toronto test (2012)
• Cassandra 1.0.0 rc2
• Redis 2.4.2
• HBase 0.90.4
• Voldemort 0.90.1
• MySQL 5.5.17
70. The tests
• Workload R (95% reads)
• Workload RW (50% writes, 50% reads)
• Workload W (99% writes)
71-76. [Benchmark result charts for each workload]
77. Conclusion
• Cassandra – Highest scalability, but suffered in latency
• Redis – Highest initial throughput in read-intensive workloads; very low latency
78. Conclusion (cont.)
• MySQL – Almost the same as Cassandra, with better latency
• HBase – Lowest throughput; highest read latency, lower write latency
79. EndPoint: Benchmarking Top NoSQL
Databases
• Published: April 13, 2015
• Updated: May 27, 2015
• Cassandra (2.1.0)
• Couchbase (3.0.1)
• MongoDB (3.0)
• HBase (0.98.6-1) with Hadoop (2.6.0)
80. What was updated?
• Cassandra’s and HBase’s performance improved dramatically in the updated results
81. Workload selection
• Workloads selected to be similar to today’s applications
• Database nodes: 30.5 GB RAM, 4 CPU cores, and a single 800 GB SSD local storage volume
• No data loss was allowed
• Data volumes exceeded the RAM capacity on each node
82. Workloads
• Read-mostly: 95% read, 5% update ratio
• Read/write: 50% read, 50% update
• Read-modify-write: 50% read, 50% read-modify-write ratio
• Insert mostly: 90% insert, 10% read
• 9 million operations per workload
99. Workload C
• 90% read operations
• 8% update operations
• 1% insert operations
• 1% delete operations.
• 3 million 10 KB records (50 million records gives results similar to workload B)
105. Conclusions
• Cassandra has amazing scalability again
• Cassandra is weaker on read latency
• MongoDB has the worst latency results in almost all tests
106. Overall conclusion
• Can’t state that a single NoSQL structure beats all the others
• How about combining them?
• POLYGLOT PERSISTENCE