System design fundamentals CAP.pdf

SYSTEM DESIGN
FUNDAMENTALS
Parallel & Distributed Computing

INTRODUCTION
• The CAP theorem, also known as Brewer’s theorem, states that a
distributed system can deliver only two of three desired
characteristics at once: consistency, availability, and partition
tolerance (the ‘C,’ ‘A’ and ‘P’ in CAP).
• A distributed system is a collection of computers that work together
and appear to end users as a single computer.
• All of the machines in the system share one state and operate
concurrently.

CAP THEOREM / BREWER’S THEOREM
• Imagine a distributed system consisting of two nodes, A and B:
• The system acts as a simple register that holds the value of a variable X.
• A network failure causes a partition between the two nodes.
• An end user performs a write request and then a read request.
• Suppose a different node processes each request: node A handles the write
and node B handles the read.
• In this case, the system has two options:

• It can fail one of the requests, which breaks the system’s availability.
• It can execute both requests and return a stale value from the read
request, which breaks the system’s consistency.
• The system can’t process both requests successfully while also
ensuring that the read returns the latest value written by the write.
• The reason is that the network partition prevents the result of the
write from being propagated from node A to node B.
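
The two-node scenario above can be sketched in a few lines of Python. This is a toy model, not a real database: the names `TwoNodeRegister`, `Unavailable`, and the `mode` flag are hypothetical and exist only for illustration. In "CP" mode the sketch rejects a write it cannot replicate (giving up availability); in "AP" mode it accepts the write locally and lets the other node serve a stale value (giving up consistency).

```python
class Unavailable(Exception):
    """Raised when the system refuses a request to stay consistent."""


class TwoNodeRegister:
    """Toy register replicated on two nodes, 'A' and 'B' (illustrative only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"A": 0, "B": 0}  # each node's copy of X
        self.partitioned = False

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                # The write cannot reach the other node, so refuse it.
                raise Unavailable("write rejected during partition")
            self.values[node] = value  # AP: accept locally, replicas diverge
        else:
            for name in self.values:  # healthy network: replicate everywhere
                self.values[name] = value

    def read(self, node):
        return self.values[node]


# AP behaviour: both requests succeed, but the read is stale.
ap = TwoNodeRegister("AP")
ap.partitioned = True
ap.write("A", 1)
stale_read = ap.read("B")  # node B never saw the write

# CP behaviour: the write fails, so both nodes still agree.
cp = TwoNodeRegister("CP")
cp.partitioned = True
try:
    cp.write("A", 1)
    cp_write_ok = True
except Unavailable:
    cp_write_ok = False
```

Rejecting the write is only one way to model the CP choice; a system could equally refuse the read on node B. Either way, one of the two requests fails.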

CONSISTENCY, AVAILABILITY &
PARTITION TOLERANCE
• Consistency
• In a consistent system, all users see the same data at the same
time, regardless of the node they connect to.
• A read operation on a consistent system should return the value of
the most recent write operation, whichever node serves it.
• For this to hold, data written to one node must be replicated to
the other nodes in the system before the write is considered
successful.

AVAILABILITY
• Availability
• An available distributed system remains operational at all times.
• Every request gets a response, regardless of the state of
individual nodes.
• This means that the system keeps operating even if multiple nodes
are down.
• Unlike in a consistent system, there’s no guarantee that the
response reflects the most recent write operation.

PARTITION TOLERANCE
• Partition tolerance
• A partition is a break in communication between nodes in a
distributed system.
• A partition-tolerant system does not fail, regardless of whether
messages between its nodes are dropped or delayed.
• To have partition tolerance, the system must replicate records
across combinations of nodes and networks.

CAP THEOREM NOSQL DATABASES
• NoSQL databases are well suited to
distributed networks.
• They can quickly scale across multiple
nodes.
• When deciding which NoSQL database
to use, it’s important to keep the CAP
theorem in mind.
• NoSQL databases can be classified
by the two CAP properties they
support: CA, CP, or AP.

CA DATABASES
• CA databases enable consistency and
availability across all nodes.
• Unfortunately, CA databases can’t tolerate
network partitions.
• In any distributed system, partitions are
bound to happen, which means this type
of database isn’t a very practical choice.
• That being said, you can still find a CA
database if you need one.
• Some relational databases, such
as PostgreSQL, provide consistency and
availability.

CP DATABASES
• CP databases enable consistency and
partition tolerance, but not availability.
• When a partition occurs, the system has
to make the inconsistent nodes unavailable
until the partition is resolved.
• A typical CP system is structured so that
only one primary node receives all of the
write requests in a given replica set.
• Secondary nodes replicate the data on the
primary node, so if the primary node fails,
a secondary node can take over.

AP DATABASES
• AP databases enable availability and
partition tolerance, but not consistency.
• In the event of a partition, all nodes
remain available, but they may not all hold
the latest data.
• For example, a user who reads from a node
that missed recent writes won’t receive the
most up-to-date version of the data.
• When the partition is eventually resolved,
most AP databases sync the nodes to
restore consistency across them.
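
The sync step that follows a healed partition can be sketched as below. This is a minimal illustration that assumes a last-write-wins rule based on a per-write timestamp; that rule is one common reconciliation strategy, not the only one, and the names `Versioned` and `sync` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # hypothetical logical time of the write


def sync(replica_a, replica_b):
    """Merge two replicas key by key, keeping the newest write for each key."""
    merged = dict(replica_a)
    for key, entry in replica_b.items():
        if key not in merged or entry.timestamp > merged[key].timestamp:
            merged[key] = entry
    return merged


# Before the partition, both nodes hold the same data.
node_a = {"x": Versioned("old", 1)}
node_b = {"x": Versioned("old", 1)}

# During the partition, a write reaches node A only.
node_a["x"] = Versioned("new", 2)
stale = node_b["x"].value  # node B still answers, but with old data

# After the partition heals, the nodes sync and converge.
healed = sync(node_a, node_b)
node_a, node_b = dict(healed), dict(healed)
```

Last-write-wins is simple but silently discards the older of two concurrent writes, which is part of the consistency cost an AP system accepts.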

MONGODB AND THE CAP THEOREM
• MongoDB is a popular NoSQL database management system that stores data
as documents.
• It's frequently used for big data and real-time applications running in
multiple locations.
• Relative to the CAP theorem, MongoDB is a CP data store: it resolves network
partitions by maintaining consistency, while compromising on availability.
• MongoDB is a single-master system: each replica set can have only one
primary node, which receives all the write operations.
• All other nodes in the same replica set are secondary nodes that replicate the
primary node's operation log and apply it to their own data set.
• By default, clients also read from the primary node, but they can specify
a read preference that allows them to read from secondary nodes.

• When the primary node becomes unavailable, the secondary node with the
most recent operation log is elected as the new primary node.
• Once all the other secondary nodes catch up with the new primary, the
cluster becomes available again.
• Because clients can't make any write requests during this interval, the data
remains consistent across the entire network.
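
The election rule described above (promote the secondary with the most recent operation log) can be sketched as follows. This is a deliberately simplified model with hypothetical names (`Member`, `elect_primary`, `last_oplog_ts`); it is not MongoDB's actual election protocol, which also involves voting among replica set members and other factors not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    last_oplog_ts: int  # timestamp of the newest oplog entry this node applied
    reachable: bool = True


def elect_primary(members):
    """Pick the reachable member whose operation log is most recent."""
    candidates = [m for m in members if m.reachable]
    if not candidates:
        raise RuntimeError("no reachable member to elect")
    return max(candidates, key=lambda m: m.last_oplog_ts)


replica_set = [
    Member("old-primary", last_oplog_ts=120, reachable=False),  # has failed
    Member("secondary-1", last_oplog_ts=118),
    Member("secondary-2", last_oplog_ts=115),
]
new_primary = elect_primary(replica_set)
```

In this example `secondary-1` is chosen because it lags the failed primary the least, so the fewest writes need to be caught up before the cluster accepts writes again.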

CASSANDRA AND THE CAP THEOREM
• Apache Cassandra is an open source NoSQL database maintained by the
Apache Software Foundation.
• It’s a wide-column database that lets you store data on a distributed network.
• However, unlike MongoDB, Cassandra has a masterless architecture, and as
a result, it has no single point of failure.
• Relative to the CAP theorem, Cassandra is an AP database: it delivers
availability and partition tolerance but can't deliver consistency all the time.
• Because Cassandra doesn't have a master node, every node must be able to
accept requests at all times.
• However, Cassandra provides eventual consistency by allowing clients to
write to any node at any time and reconciling inconsistencies as quickly as
possible.

• Data becomes inconsistent only in the case of a network partition, and
inconsistencies are quickly resolved. Cassandra also offers “repair”
functionality to help nodes catch up with their peers.
• Constant availability results in a highly performant system, which might
be worth the consistency trade-off in many cases.

MICROSERVICES AND THE CAP THEOREM
• Microservices are loosely coupled services that can be independently
developed, deployed, and maintained.
• Each service includes its own stack, database, and database model, and the
services communicate with each other over a network.
• Microservices are widely used in the cloud as well as in on-premises data centers.
• The CAP theorem can help you choose the best database when designing a
microservices-based application that runs from multiple locations.
• For example, if the ability to quickly iterate the data model and
scale horizontally is essential to your application, but you can
tolerate eventual (as opposed to strict) consistency, an AP
database like Cassandra or Apache CouchDB can meet your
requirements and simplify your deployment.
• On the other hand, if your application depends heavily on data
consistency—as in an eCommerce application or a payment
service—you might opt for a relational database like PostgreSQL.

MICROSERVICES ARCHITECTURE
• There is no universal definition of the term microservices.
• In the simplest terms, microservices (also called microservice
architecture) are an architectural style that structures an application
as a set of loosely coupled services.
• These services can be independently developed, deployed, and
maintained.

• They can often be developed, deployed, and scaled faster and more
reliably than traditional, complex monolithic applications.
• Using microservice architectures, an organization of any size can
evolve technology stacks tailored to its capabilities.
• There are many tangible benefits to using microservices, but there
is still some controversy over whether companies should switch from
a monolithic to a microservices architecture.

MONOLITHIC VS. MICROSERVICES
• The monolithic architecture is the traditional way of building and
deploying applications.
• This structure is based around the concept of a single, indivisible unit,
including the server side, client side, and database.
• All facets are unified and managed as a single unit and codebase.
• This means that every update is made to the same codebase, so a
change affects the whole stack.
• A microservices architecture, on the other hand, breaks that unit down
into independent units that function as separate services.
• This means that every service has its own logic and codebase.
• The services communicate with each other through APIs.

• Choose a monolithic architecture:
• If your company is a small team. You avoid the complexity of
deploying a microservices architecture.
• If you want a quicker launch. A monolithic architecture takes less time
to launch; it will take more time to update later on, but the initial
launch is quicker.
• Choose a microservices architecture:
• If you want a more scalable application. Scaling a microservices
architecture is far easier, and new capabilities and modules can be
added quickly and easily.
• If your company is large or plans to grow. A microservices architecture
is far more scalable and easier to customize over time.

PARALLEL COMPUTING
• The idea behind parallel computing is that more than one processor can work on a
given task simultaneously, which speeds up the task by a certain factor. Amdahl’s
law gives the upper bound on that factor.
• Parallel computing is most commonly implemented with multicore processing.
Years ago, chip makers started introducing microprocessors with more than one
processor core, known as ‘multicore’ designs, and that quickly became a standard
way to increase speed.
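
Amdahl's law, mentioned above, can be stated as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the task that can run in parallel and n is the number of processors. The following is a small sketch of that formula; the function name `amdahl_speedup` and its parameter names are chosen here for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speedup of a task under Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0)
    processors: number of processors working on the parallel part (>= 1)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# A task that is 90% parallelizable, run on 10 cores.
speedup_10_cores = amdahl_speedup(0.9, 10)

# With ever more cores the speedup approaches 1 / (1 - 0.9) = 10 but never reaches it.
speedup_many_cores = amdahl_speedup(0.9, 1_000_000)
```

The serial fraction sets the ceiling: with 10% of the work serial, no number of cores can make the task more than 10 times faster.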
