3. History
“A distributed system is one in which the
failure of a computer you didn’t even know
existed can render your own computer
unusable.”
Leslie Lamport
28.05.1987.
7. Problems of distributed systems
Hardware failure
Network failure and splits
Network latency
Availability zone and region outages
Data corruption
Inconsistency
Security
13. Spark
General purpose data processing engine
Data abstraction: RDD (Resilient Distributed Dataset)
Batch, iterative and streaming analysis
Claimed speedup over MapReduce: up to 10x on disk, up to 100x in memory
Runs on YARN, Mesos or standalone
Word count
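// `spark` is assumed to be an existing SparkContext; the HDFS paths are elided in the original
val textFile = spark.textFile("hdfs://...")
val counts = textFile
  .flatMap(line => line.split(" "))  // split each line into words
  .map(word => (word, 1))            // pair each word with a count of 1
  .reduceByKey(_ + _)                // sum the counts per word
counts.saveAsTextFile("hdfs://...")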
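To make the three steps easier to follow without a cluster, here is a minimal sketch of the same pipeline on plain Scala collections. The object name `LocalWordCount` and the `groupBy` step are illustrative assumptions: `groupBy` stands in for the shuffle that `reduceByKey` performs in Spark, and nothing here uses the Spark API.

```scala
object LocalWordCount {
  // Mirrors the Spark word count on an in-memory Seq instead of an RDD.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // one element per word
      .filter(_.nonEmpty)                // drop empty tokens from repeated spaces
      .map(word => (word, 1))            // pair each word with a count of 1
      .groupBy(_._1)                     // local stand-in for the shuffle by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

For example, `LocalWordCount.wordCount(Seq("to be or", "not to be"))` counts "to" and "be" twice and "or" and "not" once.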
14. Mesos
Distributed systems kernel with resource management and scheduling APIs
Hardware abstraction
Fault tolerant with master election
Scalable
Native container isolation
Docker support
15. Akka
Toolkit and runtime for concurrent, distributed and resilient (CDR) systems
Actor model
Mathematical model of concurrent computation
Actors make local decisions in response to messages
High level of abstraction
Fault tolerant system
“Let it crash” model
Supervisor hierarchies
[Figure: an actor, run on a thread, with its behavior, state and mailbox]
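The parts of an actor (behavior, state, mailbox) and the "let it crash" idea can be sketched in plain Scala. This is a hand-rolled illustration, not Akka's API: the class `CounterActor`, its messages and the restart-by-resetting-state policy are all assumptions made for the example. A single-thread executor plays the role of the mailbox, so messages are processed one at a time.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical mini-actor (not Akka): state + behavior + mailbox.
final class CounterActor {
  // Mailbox: the single-thread executor's queue holds pending messages in order.
  private val mailbox = Executors.newSingleThreadExecutor()

  // State: written only by the mailbox thread; volatile so the final read is visible.
  @volatile private var count = 0
  @volatile private var restarts = 0

  // Behavior: how one message changes the state. No defensive error handling here.
  private def receive(msg: String): Unit = msg match {
    case "inc"  => count += 1
    case "boom" => throw new IllegalStateException("simulated failure")
    case _      => () // ignore unknown messages
  }

  // Fire-and-forget send; callers never touch the state directly.
  def tell(msg: String): Unit = mailbox.execute(new Runnable {
    def run(): Unit =
      try receive(msg)
      catch {
        // "Let it crash": the supervising wrapper restarts with fresh state.
        case _: Exception =>
          count = 0
          restarts += 1
      }
  })

  // Drains the mailbox, stops the thread and returns (count, restarts).
  def stopAndGet(): (Int, Int) = {
    mailbox.shutdown()
    mailbox.awaitTermination(5, TimeUnit.SECONDS)
    (count, restarts)
  }
}
```

Sending "inc", "inc", "boom", "inc" leaves the counter at 1 with one restart: the failure wipes the state, and later messages are still processed. A real supervisor hierarchy would let a parent actor choose the recovery strategy instead of hard-coding it.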
16. Cassandra
Combines ideas from Amazon Dynamo and Google Bigtable
Open sourced by Facebook in 2008.
Shared-nothing architecture
Fast, scalable, distributed partitioned row store
Multi data center deployments
Masterless replication
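One way to picture partitioning with masterless replication is a hash ring: every node can compute, on its own, which nodes hold a given key. The sketch below is a simplified illustration under stated assumptions, not Cassandra's implementation: the class `TokenRing`, one 32-bit token per node, and the use of Scala's `MurmurHash3` are choices made for the example. Real Cassandra uses its own partitioner and virtual nodes.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch of ring-based replica placement (not Cassandra code).
final class TokenRing(nodes: Seq[String], replicationFactor: Int) {
  require(nodes.nonEmpty, "at least one node is required")
  require(replicationFactor >= 1 && replicationFactor <= nodes.distinct.size,
    "replication factor must be between 1 and the number of distinct nodes")

  // Each node owns one token; the ring is the nodes sorted by token.
  private val ring: Vector[(Int, String)] =
    nodes.distinct.map(n => (MurmurHash3.stringHash(n), n)).sortBy(_._1).toVector

  // Replicas for a key: the first node whose token is >= the key's token,
  // then the next nodes clockwise, wrapping around the ring.
  def replicasFor(partitionKey: String): Seq[String] = {
    val token = MurmurHash3.stringHash(partitionKey)
    val start = ring.indexWhere(_._1 >= token) match {
      case -1 => 0 // past the largest token: wrap to the first node
      case i  => i
    }
    (0 until replicationFactor).map(i => ring((start + i) % ring.size)._2)
  }
}
```

Because the placement depends only on the key and the node list, any node can act as coordinator for a request, and no master has to be consulted.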