2. INTRODUCTION
WHY DOES THIS MATTER TO US?
▸ Era of big data and Web 2.0 distributed systems
▸ New demand to address performance and scalability issues effectively
▸ To achieve a "loosely coupled" app, the databases need to be
loosely coupled as well.
▸ Distributed system, distributable database
5. INTRODUCTION
WHAT PROBLEMS ARE WE FACING NOW?
Needs | How to solve
Hard to fetch data with complicated queries | Should support various types of indexes
A unique index would be preferable | Handle uniqueness at the application level, or use a database that supports unique indexes
Hard to track down some errors | Detailed logs (audit log, DDL log, ...), accessible only from a specific network
Ready to store large amounts of data | Scale up/out freely and easily
6. INTRODUCTION
PRICING
Product | Price | Free version available | Price can vary with
AWS Aurora, MySQL | $15,000+ (db.r4.xlarge; 1 master and 2 slaves in 1 region) | MySQL | Which machine type is used; how many replicas are used
MongoDB Atlas | $52,210 ~ $68,000 (M40, low CPU; 3 RS, 1 read-only RS, 1 shard in 2 regions) | Community version available | Which Atlas type is used; how many nodes are used in a cluster
DataStax Cassandra | Unknown | Apache Cassandra available | Unknown
7. RDBMS: INTRODUCTION
PROS
▸ No learning curve
▸ Best way to represent Relations
▸ ACID
▸ Reliable data protection
▸ Can handle almost any hard, complex query
11. RDBMS
WHAT MAKES RDBMS HARD TO SCALE?
▸ Designed to run on a single server
SELECT user.*, pn.phone_number, setting.*
FROM user
LEFT OUTER JOIN setting ON setting.signup_at < 2018
LEFT JOIN pn ON user.id = pn.user_id
AND pn.verified = 1
WHERE user.id > 100000
12. MONGODB: INTRODUCTION
PROS
▸ Supports powerful indexing
▸ Supports references
▸ Can avoid impedance mismatch
▸ Easy management using Atlas
▸ Powerful aggregation/MapReduce functions that achieve JOIN-like
operations, as in an RDBMS
▸ Supports multi-document transactions from version 4.0
▸ Well documented
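The JOIN-like operation mentioned above is typically done with the aggregation `$lookup` stage, which behaves like a left outer join against another collection. The sketch below emulates that behavior in plain Python so it runs without a MongoDB server. The `users` and `phone_numbers` data and the `lookup` helper are hypothetical, made up for illustration only.

```python
# Pure-Python emulation of what a $lookup stage does (left outer join).
# The equivalent aggregation stage would look like:
#   {"$lookup": {"from": "phone_numbers", "localField": "_id",
#                "foreignField": "user_id", "as": "phones"}}
# All collection and field names here are hypothetical.

def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each local document the list of matching foreign documents."""
    joined = []
    for doc in local_docs:
        matches = [
            f for f in foreign_docs
            if f.get(foreign_field) == doc.get(local_field)
        ]
        # Documents without a match keep an empty list, as in a left outer join.
        joined.append({**doc, as_field: matches})
    return joined


users = [{"_id": 1, "name": "alice"}, {"_id": 2, "name": "bob"}]
phone_numbers = [{"user_id": 1, "phone_number": "555-0100", "verified": 1}]

result = lookup(users, phone_numbers, "_id", "user_id", "phones")
for doc in result:
    print(doc["name"], [p["phone_number"] for p in doc["phones"]])
```

Every user is kept in the output, and users with no matching phone number get an empty `phones` list.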
13. MONGODB: INTRODUCTION
CONS
▸ A little slower than other NoSQL databases
▸ Has a single point of failure
▸ Supports MapReduce, but it is slow
▸ Very slow on non-indexed queries
▸ Consumes a lot of memory
16. CASSANDRA: INTRODUCTION
PROS
▸ Wide-column store (column families)
▸ Focuses on performance
▸ Avoids impedance mismatch
▸ No single point of failure
▸ Tunable consistency (choose CP or AP behavior through the
consistency level of each read or write)
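The tunable-consistency point can be made concrete with the usual overlap rule: a read is guaranteed to touch at least one replica that holds the latest write when the replicas required for the read plus those required for the write exceed the replication factor. The sketch below is a simplified illustration. It covers only the ONE, TWO, THREE, QUORUM, and ALL levels, ignores datacenter-aware levels, and its helper names are hypothetical.

```python
# Simplified model of Cassandra consistency levels (single datacenter only).
# Function names are hypothetical; only the arithmetic is the point.

def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for read_level, write_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
    print(read_level, write_level, reads_see_latest_write(read_level, write_level, 3))
```

With a replication factor of 3, QUORUM reads paired with QUORUM writes overlap (2 + 2 > 3), which leans toward consistency. ONE paired with ONE does not overlap (1 + 1 is not greater than 3), which favors availability and latency.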
17. CASSANDRA: INTRODUCTION
CONS
▸ Limited index support
▸ Cannot create a unique index (writes do not read existing data first)
▸ Indexes can only be queried by equality (no range, regex,
greater-than, or less-than queries)
▸ Difficult to design the database
▸ Must choose primary keys carefully, because the Partitioner uses
them to pick the first replica node.
▸ Must choose the correct read/write consistency level for each
query, or data might be lost.
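The role of the primary key in replica placement can be sketched as a token ring: the partition key is hashed to a token, the first node whose token is at or after that token becomes the first replica, and with SimpleStrategy the remaining replicas are the next nodes clockwise. This is a minimal sketch under loud assumptions: MD5 from the standard library stands in for the Murmur3 hash, each node owns a single evenly spaced token (no virtual nodes), and all function names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64  # assumed token space for this sketch


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token.

    ASSUMPTION: MD5 is only a stand-in available in the standard library;
    Cassandra's Murmur3Partitioner uses a different hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(node_names):
    """Give each node one evenly spaced token (no virtual nodes)."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def replicas_for(partition_key, ring, replication_factor):
    """First replica is chosen by token; the rest follow clockwise."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


ring = build_ring([f"NODE {i}" for i in range(1, 7)])
replicas = replicas_for("f6dfbe209cb24453859ec542f546c705", ring, 3)
print(replicas)
```

Because placement depends only on the hash of the partition key, a key design that funnels many rows into the same partition sends all of that load to the same few replicas.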
19. CASSANDRA: READ OPERATION
[Figure: ring of six nodes, NODE 1 to NODE 6. Diagram labels: "decided by partitioner", "not found", "found (1 out of 3)", "found (2 out of 3)", "acknowledged".]
Replication Strategy: SimpleStrategy
Replication Factor: 3
Partitioner: Murmur3Partitioner
user_id: f6dfbe209cb24453859ec542f546c705
Read Consistency: QUORUM
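With a replication factor of 3 and a read consistency of QUORUM, the coordinator needs answers from 2 of the 3 replicas before it can acknowledge the read. The sketch below simulates that flow in plain Python under stated assumptions: replicas are modeled as dictionaries mapping a key to a `(timestamp, value)` pair, an unreachable replica is modeled as `None`, the newest timestamp wins, and all names (`quorum_read`, `Unavailable`) are hypothetical.

```python
class Unavailable(Exception):
    """Raised when fewer replicas respond than the consistency level needs."""


def quorum_read(key, replicas, replication_factor):
    """Simulate a QUORUM read across the replicas of one partition.

    Each replica is a dict {key: (timestamp, value)}, or None if it is down.
    """
    required = replication_factor // 2 + 1
    responses = []
    for replica in replicas:
        if replica is None:  # replica is down, no response
            continue
        responses.append(replica.get(key))  # None means "not found" here
        if len(responses) == required:
            break  # enough replicas have answered
    if len(responses) < required:
        raise Unavailable(f"needed {required} responses, got {len(responses)}")
    found = [r for r in responses if r is not None]
    if not found:
        return None
    # Newest timestamp wins when replicas disagree.
    return max(found, key=lambda r: r[0])[1]


user_id = "f6dfbe209cb24453859ec542f546c705"
replicas = [
    {},                          # this replica has not received the row yet
    {user_id: (2, "alice")},
    {user_id: (2, "alice")},
]
value = quorum_read(user_id, replicas, replication_factor=3)
print(value)
```

If two of the three replicas are down, the same call raises `Unavailable` instead of returning possibly stale data, which is the trade-off QUORUM makes compared with a consistency level of ONE.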