The document provides an overview of distributed systems concepts including distributed storage, transactions, computation, and coordination. It discusses various approaches to distributed storage including replication, sharding, consistent hashing, and distributed file systems. It also covers distributed transactions and the challenges of maintaining ACID properties in a distributed environment. The document introduces concepts of distributed computation frameworks like MapReduce and Spark and how they move computation towards data rather than moving large amounts of data. It discusses distributed coordination challenges and approaches like NTP for time synchronization, gossip protocols, and Paxos for reliable distributed logging.
Overview of distributed systems including definition and key components such as storage, transactions, computation, and coordination.
Exploration of distributed storage techniques including read replication, sharding, and examples like MongoDB and HDFS.
Problems associated with replication and sharding, highlighting complexity and consistency issues therein.
Concept of consistent hashing illustrated through numerical examples, focusing on its relevance in distributed systems.Introduction to the CAP theorem emphasizing consistency, availability, and partition tolerance with real-world implications.Core concepts around distributed transactions using the ACID model, with practical scenarios like order processing in coffee shops.
Explanation of distributed computation including MapReduce paradigm, data handling techniques, and process flow.
Practical applications of MapReduce leveraging tools like Hadoop, including its ecosystem and user-friendly interfaces.
Introduction to Spark's architecture and its operational model, detailing concepts like RDDs and execution plans.
Discussion on methods for distributed coordination such as NTP, epidemic protocols, and the Paxos consensus algorithm.
Concise review of the core topics like storage, transactions, computation, and coordination within distributed systems.
MapReduce
Once upon amidnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
"'Tis some visitor," I muttered, "tapping at my chamber door-
Only this, and nothing more."
Ah, distinctly I remember it was in the bleak December,
And each separate dying ember wrought its ghost upon the floor.
Eagerly I wished the morrow;- vainly I had sought to borrow
poems/raven.txt:
45.
MapReduce
It shall claspa sainted maiden whom the angels name Lenore-
Clasp a rare and radiant maiden whom the angels name Lenore."
Quoth the Raven, "Nevermore."
"Be that word our sign in parting, bird or fiend," I shrieked, upstarting-
"Get thee back into the tempest and the Night's Plutonian shore!
Leave no black plume as a token of that lie thy soul hath spoken!
Leave my loneliness unbroken!- quit the bust above my door!
Take thy beak from out my heart, and take thy form from off my door!"
Quoth the Raven, "Nevermore."
And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamp-light o'er him streaming throws his shadow on the floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted- nevermore!
poems/raven.txt:
Spark
• ScaEer/gather
paradigm
(similar
to
MapReduce)
• More
general
data
model
(RDDs)
• More
general
programming
model
(transform/ac6on)
• Storage
agnos6c
What’s
an
RDD?
•Bigger
than
a
computer
• Read
from
an
input
source
• Output
of
a
pure
func6on
• Immutable
• Typed
• Ordered
• Lazily
evaluated
• Par66oned
• Collec6on
of
things