A Note on Distributed Computing is a seminal paper arguing that the local and distributed models of computing are fundamentally different and that efforts to unify them are flawed; it also outlines the reasons behind this assertion. These slides were presented at the second Papers We Love meetup in Hyderabad, IN.
A Note on Distributed Computing - Papers We Love Hyderabad
1. A Note on
Distributed Computing
Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall
Sun Microsystems Laboratories, 1994
Papers We Love, Hyderabad
11 Jan 2015
2. • There are fundamental differences between local
computing and distributed computing
• These differences cannot be ignored when designing
systems
• System designs that ignore these differences are "doomed
to failure"
Introduction
3. • RPC (Remote Procedure Calls)
• RFC 707
• Sun RPC
• Argus, Emerald - 80s
• CORBA, DCOM - 90s
• Java RMI - 90s
• EJBs, XML-RPC, SOAP, REST - 90s to now
A bit of history
4. Difference in location is an implementation detail
Steps in application development
• Write the application using location-agnostic interfaces
• Finalize which objects will be local and which will be
remote
• Test with real-world scenarios
Unified View of Objects
5. Principles of this unified design
• There exists a single, natural design for the application regardless of
whether parts of it will be deployed locally or remotely
• Failures and performance issues are implementation dependent and
should not be considered in the initial design
• The interface of an object is independent of the context in which that
object is used
Unified View of Objects
6. The unified view is seductive because it promises:
• Changing implementations without any change in the
interface or the calling entities
• Changing locations of implementations without impacting
clients
7. • Latency
• Memory Access
• Partial Failure
• Concurrency
The Differences, or why the twain shan't meet
8. • Network speed versus processor speed improvements
• Counterpoints
• Hardware improvements
• Communication analysis tools to better locate objects
Latency
9. • A local pointer is not valid in a remote address space
• Centralize memory access?
Memory Access
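A minimal Python sketch of the first bullet on this slide. It assumes JSON round-tripping as a stand-in for marshalling an object into another address space; the paper does not prescribe any wire format, and `deposit` and `marshal_to_remote` are hypothetical names used only for illustration.

```python
import json


def deposit(account, amount):
    # Mutates whatever dict it is handed.
    account["balance"] += amount


def marshal_to_remote(obj):
    # Stand-in for crossing an address space: the receiver gets a copy
    # rebuilt from bytes, never the caller's own reference.
    return json.loads(json.dumps(obj))


local = {"owner": "alice", "balance": 100}

# Local call: caller and callee share one object.
deposit(local, 50)

# "Remote" call: the callee mutates its own copy.
remote_copy = marshal_to_remote(local)
deposit(remote_copy, 50)

print(local["balance"])        # 150: the remote deposit is not visible locally
print(remote_copy["balance"])  # 200
print(remote_copy is local)    # False
```

The same `deposit` code behaves differently depending on where its argument lives, which is why the location of an object cannot be hidden behind an unchanged interface.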
10. • Local
• No partial failures
• Detection possible
• Distributed
• Failure of components is independent, partial
• No central entity that can be reliably queried to find out
the state of failed components
Handling partial failure requires dealing with
nondeterminism.
Partial Failure
11. A central problem in distributed computing is to ensure that
the state of the system is consistent after partial failures.
To have a unified model
• Treat everything as local. Nondeterministic and fragile.
• Treat everything as remote. Introduces unnecessary
complexity and defeats the unified model's purpose.
Partial Failure
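The partial-failure slides above can be illustrated with a small in-process simulation. This is only a sketch: `SimulatedServer`, `RemoteTimeout`, and the two failure modes are hypothetical, and no real network is involved. It shows a caller seeing the same timeout whether the request was lost before it arrived or the reply was lost after the update was applied.

```python
class RemoteTimeout(Exception):
    """Raised when the caller gives up waiting for a reply."""


class SimulatedServer:
    """Hypothetical in-process stand-in for a remote counter service."""

    def __init__(self, failure_mode):
        self.failure_mode = failure_mode  # "request_lost" or "reply_lost"
        self.counter = 0

    def increment(self):
        if self.failure_mode == "request_lost":
            # The request never arrived: server state is unchanged.
            raise RemoteTimeout("no reply")
        self.counter += 1
        if self.failure_mode == "reply_lost":
            # The update happened, but the acknowledgement was lost.
            raise RemoteTimeout("no reply")
        return self.counter


def call_increment(server):
    try:
        return ("ok", server.increment())
    except RemoteTimeout:
        # The caller cannot tell which of the two failures occurred.
        return ("unknown", None)


lost_request = SimulatedServer("request_lost")
lost_reply = SimulatedServer("reply_lost")

outcome_a = call_increment(lost_request)
outcome_b = call_increment(lost_reply)

print(outcome_a == outcome_b)                    # True: identical from the caller's side
print(lost_request.counter, lost_reply.counter)  # 0 1: the server states differ
```

Blindly retrying after `"unknown"` would apply the increment twice in the second case, so the interface itself has to say how such outcomes are handled.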
12. • Suffers from the same problems if attempts are made to
unify interfaces
Concurrency
13. • Robustness is not a function of the implementation
• You cannot push robustness and performance problems under
"Quality of Service".
• These are interface issues
• Naming service locking example
It's not about QoS
14. • "Local" filesystem APIs used for a distributed system
• Soft mounting
• Exposes network / server failures to client program
• Can lead to filesystem corruption
• Hard mounting
• Application hangs until the server comes back up
• Chain of "frozen workstations"
Lessons from NFS
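The soft and hard mounting trade-off above can be sketched as two retry policies. This is a simulation under stated assumptions, not the NFS client's actual implementation: `SimulatedFileServer`, `soft_read`, and `hard_read` are hypothetical names, and the server is simply "down" for a fixed number of attempts.

```python
class ServerDown(Exception):
    """Raised by the simulated server while it is unavailable."""


class SimulatedFileServer:
    """Hypothetical file server that fails a fixed number of reads."""

    def __init__(self, down_for_attempts):
        self.remaining_down = down_for_attempts

    def read(self):
        if self.remaining_down > 0:
            self.remaining_down -= 1
            raise ServerDown()
        return b"data"


def soft_read(server, max_attempts):
    # Soft-mount style: give up after a bounded number of attempts and
    # surface the failure to the calling program.
    for _ in range(max_attempts):
        try:
            return server.read()
        except ServerDown:
            continue
    raise OSError("server not responding")


def hard_read(server):
    # Hard-mount style: keep retrying until the server answers.
    # A real client would block here; this sketch only counts attempts.
    attempts = 0
    while True:
        attempts += 1
        try:
            return server.read(), attempts
        except ServerDown:
            continue


try:
    soft_read(SimulatedFileServer(5), max_attempts=3)
    soft_result = "ok"
except OSError:
    soft_result = "error"

data, attempts = hard_read(SimulatedFileServer(5))

print(soft_result)     # error: the program sees a failure a local file API never produced
print(data, attempts)  # b'data' 6: the program waited out the whole outage
```

Neither policy makes the remote filesystem behave like a local one; each only chooses which symptom the application experiences.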
15. • Specify whether local or remote as part of interface
definition
• For dual-needs, split the interface into local and remote
• Compilers will generate code based on intended usage
• Access objects differently while writing code
• A change in thinking
Designing with these differences
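One way to read the first two bullets above is as two separate interface definitions for the same service. The sketch below is an assumption about what that could look like in Python, not the paper's own notation; `LocalNameService`, `RemoteNameService`, `RemoteError`, and the helper classes are hypothetical. Java RMI later took a comparable route by marking remote interfaces and declaring `RemoteException` on every remote method.

```python
from abc import ABC, abstractmethod


class RemoteError(Exception):
    """Hypothetical error signalling a communication failure."""


class LocalNameService(ABC):
    @abstractmethod
    def lookup(self, name):
        """Return the bound value; raises KeyError if the name is unbound."""


class RemoteNameService(ABC):
    @abstractmethod
    def lookup(self, name, timeout_seconds):
        """Return the bound value; may raise RemoteError on timeout."""


class InMemoryNames(LocalNameService):
    def __init__(self, bindings):
        self._bindings = dict(bindings)

    def lookup(self, name):
        return self._bindings[name]


class UnreachableNames(RemoteNameService):
    """Simulated remote service whose server never answers."""

    def lookup(self, name, timeout_seconds):
        raise RemoteError(f"no reply within {timeout_seconds}s")


def resolve_remote(service, name, default):
    # Callers of the remote interface must decide what a failure means.
    try:
        return service.lookup(name, timeout_seconds=2.0)
    except RemoteError:
        return default


local_value = InMemoryNames({"printer": "10.0.0.7"}).lookup("printer")
remote_value = resolve_remote(UnreachableNames(), "printer", None)

print(local_value)   # 10.0.0.7
print(remote_value)  # None
```

Because the remote signature carries a timeout and a failure type, a caller cannot use it without acknowledging that the call may not complete.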
16. Objects in different address spaces but on the same machine
• Share attributes of both local and remote
• Simpler code generation (shared memory, etc.)
A middle ground
17. “I wish….that the temptation of developer convenience hadn’t
led us to view distributed computing’s necessary complexity
as too hard, leaving us to try to replace it with accidental
complexity that doesn’t really work.”
- Steve Vinoski in his article Convenience over
Correctness
18. • The paper that was discussed -
https://www.cc.gatech.edu/classes/AY2010/cs4210_fall/papers/smli_tr-94-29.pdf
• Convenience over Correctness - http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf
• Distributed Programming in Argus - http://bit.ly/1BlCb6G
• A Distributed Object Model for the Java System - http://pdos.csail.mit.edu/6.824/papers/waldo-rmi.pdf
• RPC and its Offspring: Convenient, Yet Fundamentally Flawed -
http://www.infoq.com/presentations/vinoski-rpc-convenient-but-flawed
• A Critique of the Remote Procedure Call Paradigm - http://bit.ly/14KyZrn
References and Further Reading