Distributed computation

A Note on Distributed Computing
Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall
Lucas Shen, Aug. 5, 2014

✤ Why this subject? For whom?
✤ Terminology
✤ Unified vision
✤ What’s the problem?
✤ Example: NFS @Sun
✤ Conclusion

Why this subject? For whom?
✤ Where it shows up: CPU and GPU clusters, cluster apps, Hadoop, Spark, cloud platforms (Google, Azure, Amazon) from simple instances to IaaS and SaaS, and services such as Dropbox
✤ Who cares: designers and programmers

Terminology
<Local computing>
programs are confined to a single address space
<Distributed computing>
programs make calls into other address spaces, possibly on another machine.

Unified Vision
from the programmer’s point of view, there is
no essential distinction between objects that
share an address space and objects that are
on two machines with different architectures
located on different continents.

How?
1. write the application without worrying about where
objects are located and how their communication is
implemented.
2. tune performance by “concretizing” object locations
and communication methods.
3. test with “real bullets” (e.g., networks being
partitioned, machines going down)

Advantages of this approach
✤ Change can be made at any granularity, from the entire system down to an individual object.
✤ As long as the interfaces between objects remain
constant, the implementations of those objects can be
altered at will.
✤ An object can be repaired and the repair installed
without worry that the change will impact the other
objects that make up the system.

Based on what beliefs?
1. there is a single natural object-oriented design for a given
application, regardless of the context in which that
application will be deployed
2. failure and performance issues are tied to the
implementation of the components of an application, and
consideration of these issues should be left out of an initial
design
3. the interface of an object is independent of the context in
which that object is used.

What’s wrong?
✤ Local and distributed computing are very different. You should take this into account from the very beginning.
✤ Who is “you”?

Stop avoiding problems
Designer vs. Programmer
The danger lies in promoting the myth that
“remote access and local access are exactly the
same” and not enforcing the myth.

Differences
✤ Latency
✤ Memory Access
✤ Partial failure

Latency
✤ A remote object invocation is four to five orders of magnitude slower than a local one
✤ Designers must decide which objects should be local and which can be remote
✤ Two proposed solutions:
1. Ignore the issue: hardware advances will make the difference irrelevant
2. Build tools that show the pattern of communication between the objects that make up an application, then tune the system
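The second solution calls for tooling that exposes communication patterns. A minimal sketch, assuming an in-process prototype where every object is still local: a hypothetical `CallRecorder` wrapper counts which methods are invoked on which object, so chatty objects (poor candidates for remote placement) stand out. `CallRecorder` and `Account` are illustrative names, not part of any real tool.

```python
from collections import Counter


class CallRecorder:
    """Hypothetical wrapper that counts method calls made on one object."""

    def __init__(self, target, name, log):
        self._target = target
        self._name = name
        self._log = log  # shared Counter: (object name, method name) -> call count

    def __getattr__(self, attr):
        # Only reached for attributes not set in __init__, i.e. the target's.
        value = getattr(self._target, attr)
        if not callable(value):
            return value  # plain data is passed through unrecorded

        def wrapper(*args, **kwargs):
            self._log[(self._name, attr)] += 1  # record the interaction
            return value(*args, **kwargs)

        return wrapper


class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount


log = Counter()
acct = CallRecorder(Account(), "account", log)
for _ in range(1000):
    acct.deposit(1)  # fine-grained calls: cheap locally, costly if remote

print(log.most_common())  # [(('account', 'deposit'), 1000)]
print(acct.balance)       # 1000
```

Here a thousand fine-grained `deposit` calls cost almost nothing locally; if `Account` lived in another address space, each of them would pay the remote latency, which suggests keeping it local or batching the calls.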

Memory access
✤ Pointers: a pointer into the local address space is not valid in another address space
✤ Two choices (Designer vs. Programmer):
1. all memory access must be controlled by an underlying system, such as distributed shared memory
2. the programmer must be aware of the different types of memory access (local and remote)
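To make the pointer problem concrete, a minimal sketch in Python, which has object references rather than raw pointers: within one address space two names can refer to the same object, but only bytes cross an address-space boundary. JSON stands in here for whatever marshaling an RPC layer would use (an assumption for illustration).

```python
import json

# Same address space: a second reference shares the object.
local = {"value": 1}
alias = local
alias["value"] = 2
print(local["value"])  # 2

# Across address spaces only bytes travel; the receiver rebuilds a copy.
wire = json.dumps(local).encode("utf-8")  # what would go over the network
remote_copy = json.loads(wire)            # what the other side reconstructs
remote_copy["value"] = 3
print(local["value"])        # 2: the "remote" side mutated a copy
print(remote_copy is local)  # False

# Sharing between references is lost too.
shared = {"count": 0}
pair = [shared, shared]                  # two references to one object
copied = json.loads(json.dumps(pair))
copied[0]["count"] += 1
print(copied[1]["count"])  # 0: the aliasing did not survive marshaling
```

The receiving side gets copies, so both mutation through a reference and sharing between references are lost unless an underlying system (choice 1) or the programmer (choice 2) deals with them explicitly.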

Partial failure
✤ Component failures are common, not exceptional
✤ There is no common agent that can determine which component has failed and inform the others of that failure
✤ Since a distributed system has no global state, how can we handle failures and recover from them quickly?
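A sketch of why partial failure is different, assuming a plain thread pool stands in for a remote call: when no reply arrives before the deadline, the caller cannot tell whether the callee crashed, the reply was lost, or the work is just slow, so the only honest answer is a third outcome. `Outcome`, `call_with_deadline`, and the three services are hypothetical names.

```python
import concurrent.futures
import time
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    FAILED = "failed"    # the callee reported an error
    UNKNOWN = "unknown"  # no answer in time: it may or may not have run


def call_with_deadline(fn, arg, timeout_s):
    """Hypothetical helper: run fn(arg) and classify what the caller learns."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, arg)
    try:
        return Outcome.OK, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return Outcome.UNKNOWN, None  # crashed? lost reply? just slow?
    except Exception as exc:
        return Outcome.FAILED, exc
    finally:
        pool.shutdown(wait=False)  # do not block on a call we gave up on


def fast_service(x):
    return x * 2


def slow_service(x):
    time.sleep(0.5)  # stands in for a hung server or a lost reply
    return x * 2


def broken_service(x):
    raise ConnectionError("server refused")


print(call_with_deadline(fast_service, 21, 1.0))       # (<Outcome.OK: 'ok'>, 42)
print(call_with_deadline(slow_service, 21, 0.1)[0])    # Outcome.UNKNOWN
print(call_with_deadline(broken_service, 21, 1.0)[0])  # Outcome.FAILED
```

Locally, a call either returns or raises. The `UNKNOWN` case exists only in the distributed setting, and with no common agent to ask, the caller alone must decide whether to retry, give up, or check state later.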

Two paths
1. Design the interfaces of objects as if they were all local
✤ Spectrum: GFS (with a master node) <—> fully distributed
✤ Fragile and not robust in any sense =.=
✤ Why so hard? A distributed system has no single point of resource allocation, synchronization, or failure recovery, and thus is conceptually very different.
2. Design interfaces as if they were all remote
✤ Every call must assume the worst-case scenario
✤ Introduces unnecessary guarantees for objects that are never intended to be used remotely

Lesson learned: NFS@Sun
✤ NFS: Sun’s distributed file system
✤ Designers were unwilling to change the interface to the
file system to reflect the distributed nature of file
access.
✤ An example of a non-distributed API (open, read, write, close) reimplemented in a distributed way

Soft mount: NFS@Sun
✤ Soft mounts expose network or server failures to the client program. Read and write operations return a failure status much more often than in the single-system case.
✤ programs written with no allowance for these failures
can easily corrupt the files used by the program.

Hard mount: NFS@Sun
✤ The application hangs until the server comes back up
✤ one server crashes, and many workstations—even
those apparently having nothing to do with that
server—freeze
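The soft/hard trade-off can be sketched as a retry policy around a single read. This is a hypothetical model, not NFS client code: `read_block`, `ServerDown`, and `flaky_server` are illustrative names, and real mount options are not modeled. A bounded retry count behaves like a soft mount (an error status comes back to the program), and unbounded retries behave like a hard mount (the program blocks until the server returns).

```python
import itertools


class ServerDown(Exception):
    """Hypothetical error raised when the simulated server does not answer."""


def read_block(read_once, max_retries=None):
    """Return (ok, data). None = hard mount (retry forever); int = soft mount."""
    attempts = itertools.count() if max_retries is None else range(max_retries + 1)
    for _ in attempts:
        try:
            return True, read_once()
        except ServerDown:
            continue  # server unreachable: try again
    return False, None  # soft mount gave up: the caller must check this status


def flaky_server(failures):
    """Hypothetical server that fails `failures` times, then answers."""
    state = {"left": failures}

    def read_once():
        if state["left"] > 0:
            state["left"] -= 1
            raise ServerDown()
        return b"block-data"

    return read_once


print(read_block(flaky_server(3), max_retries=2))     # (False, None)
print(read_block(flaky_server(3), max_retries=None))  # (True, b'block-data')
```

A program written for the local `read` interface would ignore the `False` status and write `None` (or a stale buffer) back out, which is the corruption the soft-mount slide warns about; the hard-mount variant avoids that only by blocking for as long as the outage lasts.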

Why?
✤ The limits on the reliability and robustness of NFS are not caused by the implementation of the parts of that system.
✤ NFS uses an interface designed for non-distributed computing, where partial failure was not possible.
✤ The limits on robustness, in turn, limit the scalability of NFS.

Conclusion
(knowing the difference is the start of advancement) @1994
✤ They are different, and you should take the differences seriously.
✤ Be conscious of those differences at all stages of the design and implementation of distributed applications.
✤ Organizations: allocate research and engineering resources more wisely. Rather
than using those resources in attempts to paper over the differences between the
two kinds of computing, resources can be directed at improving the performance
and reliability of each.
✤ Engineers: have to know whether they are sending messages to local or remote
objects, and access those objects differently.
✤ As users of today's cloud services, we find they work pretty well. But if we want to build a private cloud or a cluster in the garage, we need to take care of those details.
