Page 1
Distributed Systems
Page 2
What can we do now
that we could not do before?
Page 3
Technology advances
Processors
Memory
Networking
Storage
Protocols
Page 4
Networking: Ethernet - 1973, 1976
June 1976: Robert Metcalfe presents the concept of
Ethernet at the National Computer Conference
1980: Ethernet introduced as de facto standard (DEC,
Intel, Xerox)
Page 5
Network architecture
LAN speeds
• Original Ethernet: 2.94 Mbps
• 1985: thick Ethernet: 10 Mbps
1 Mbps with twisted pair networking
• 1991: 10BaseT - twisted pair: 10 Mbps
Switched networking: scalable bandwidth
• 1995: 100 Mbps Ethernet
• 1998: 1 Gbps (Gigabit) Ethernet
• 1999: 802.11b (wireless Ethernet) standardized
• 2001: 10 Gbps introduced
• 2005: 100 Gbps (over optical link)
348x to 35,000x faster
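As a quick sanity check on that speed-up range (a rough sketch in Python; the exact factors depend on whether a gigabit is counted as 1000 or 1024 megabits):

```python
# Rough speed-up of later LAN generations over the original 2.94 Mbps Ethernet.
baseline_mbps = 2.94

for name, mbps in [("10BaseT", 10), ("Fast Ethernet", 100),
                   ("Gigabit Ethernet", 1_000), ("10 Gbps Ethernet", 10_000),
                   ("100 Gbps optical", 100_000)]:
    print(f"{name}: ~{mbps / baseline_mbps:,.0f}x faster")

# Gigabit Ethernet works out to roughly 340x and 100 Gbps to roughly 34,000x,
# the same ballpark as the slide's 348x and 35,000x figures.
```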
Page 6
Network Connectivity
Then:
• large companies and universities on Internet
• gateways between other networks
• dial-up bulletin boards
• 1985: 1,961 hosts on the Internet
Now:
• One Internet (mostly)
• 2006: 439,286,364 hosts on the Internet
• widespread connectivity
High-speed WAN connectivity: 1 to >50 Mbps
• Switched LANs
• wireless networking
439 million more hosts
Page 7
Computing power
Computers got…
• Smaller
• Cheaper
• Power-efficient
• Faster
Microprocessors became technology leaders
Page 8
year    $/MB      typical system memory
1977    $32,000   16 KB
1987    $250      640 KB–2 MB
1997    $2        64 MB–256 MB
2007    $0.06     512 MB–2 GB+
9,000x cheaper
4,000x more capacity
Page 9
Storage: disk
131,000x cheaper in 20 years
30,000x more capacity
Recording density increased over 60,000,000 times over 50 years
• 1977: 310 KB floppy drive, $1,480
• 1987: 40 MB drive, $679
• 2008: 750 GB drive, $99 ($0.13/GB)
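A back-of-the-envelope check of the cost-per-byte claim, using only the drive prices listed above (it ignores inflation and uses decimal megabytes):

```python
# Cost per megabyte for the drives listed above.
drives = {
    "1977 floppy (310 KB, $1,480)": 1480 / 0.31,
    "1987 hard disk (40 MB, $679)": 679 / 40,
    "2008 hard disk (750 GB, $99)": 99 / 750_000,
}
for name, dollars_per_mb in drives.items():
    print(f"{name}: ${dollars_per_mb:,.4f} per MB")

# 1987 -> 2008: (679/40) / (99/750_000) is roughly 128,000x cheaper per MB,
# in line with the "131,000x cheaper in 20 years" figure above.
```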
Page 10
Music Collection
4,207 Billboard hits
• 18 GB
• Average song size: 4.4 MB
Today
• Download time per song @12.9 Mbps: 3.5 sec
• Storage cost: $5.00
20 years ago (1987)
• Download time per song, V90 modem @44 Kbps:
15 minutes
• Storage cost: $76,560
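The download times follow directly from the song size and the link speed. A rough sketch (raw transfer time only; real downloads add protocol overhead, which presumably accounts for the slide's slightly higher 3.5 s figure):

```python
# Raw transfer time for one 4.4 MB song at two link speeds.
song_bits = 4.4 * 8 * 1_000_000          # 4.4 MB expressed in bits

for link, bits_per_sec in [("12.9 Mbps broadband", 12.9e6),
                           ("44 Kbps modem", 44e3)]:
    seconds = song_bits / bits_per_sec
    print(f"{link}: {seconds:,.1f} s (~{seconds / 60:.1f} min)")

# Broadband: about 2.7 s; modem: about 800 s, i.e. over 13 minutes --
# consistent with the slide's 3.5 sec and 15 minutes once overhead is added.
```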
Page 11
Protocols
Faster CPU →
more time for protocol processing
• ECC, checksums, parsing (e.g. XML)
• Image, audio compression feasible
Faster network →
bigger (and bloated) protocols
• e.g., SOAP/XML, H.323
Page 12
Why do we want to network?
• Performance ratio
• Scaling multiprocessors may not be possible or cost effective
• Distributing applications may make sense
• ATMs, graphics, remote monitoring
• Interactive communication & entertainment
• work and play together:
email, gaming, telephony, instant messaging
• Remote content
• web browsing, music & video downloads, IPTV, file servers
• Mobility
• Increased reliability
• Incremental growth
Page 13
Problems
Designing distributed software can be difficult
• Operating systems handling distribution
• Programming languages?
• Efficiency?
• Reliability?
• Administration?
Network
• disconnect, loss of data, latency
Security
• want easy and convenient access
Page 14
Building and classifying
distributed systems
Page 15
Flynn’s Taxonomy (1972)
Classifies architectures by the number of instruction streams and the number of data streams:
SISD
• traditional uniprocessor system
SIMD
• array (vector) processor
• Examples:
• APU (attached processor unit in Cell processor)
• SSE3: Intel’s Streaming SIMD Extensions
• PowerPC AltiVec (Velocity Engine)
MISD
• Generally not used and doesn’t make sense
• Sometimes applied to classifying redundant systems
MIMD
• multiple computers, each with:
• program counter, program (instructions), data
• parallel and distributed systems
Page 16
Subclassifying MIMD
memory
• shared memory systems: multiprocessors
• no shared memory: networks of computers, multicomputers
interconnect
• bus
• switch
delay/bandwidth
• tightly coupled systems
• loosely coupled systems
Page 17
Bus-based multiprocessors
SMP: Symmetric Multi-Processing
All CPUs connected to one bus (backplane)
Memory and peripherals are accessed via the shared bus.
The system looks the same from any processor.
[Figure: CPU A, CPU B, memory, and I/O devices all attached to a single shared bus]
Page 18
Bus-based multiprocessors
Dealing with bus overload:
• add local cache memory to each CPU
• the CPU reads and writes its cache; main memory is accessed only on a cache miss
[Figure: CPU A and CPU B, each with a private cache, sharing a bus with memory and I/O devices]
Page 19
Working with a cache
CPU A reads location 12345 from memory
[Figure: main memory holds 12345: 7; the value 7 is loaded into CPU A’s cache; CPU B’s cache is empty]
Page 20
Working with a cache
CPU A modifies location 12345
[Figure: CPU A’s cache now holds 12345: 3, but main memory still holds 12345: 7]
Page 21
Working with a cache
CPU B reads location 12345 from memory
Gets old value
Memory not coherent!
[Figure: main memory still holds 12345: 7, so CPU B’s cache loads 7 even though CPU A’s cache holds 12345: 3]
Page 22
Write-through cache
Fix the coherency problem by writing all values through the bus to main memory
CPU A modifies location 12345 – write-through
Main memory is now coherent
[Figure: CPU A’s cache and main memory both hold 12345: 3]
Page 23
Write-through cache … continued
CPU B reads location 12345 from memory and loads it into its cache
[Figure: main memory, CPU A’s cache, and CPU B’s cache all hold 12345: 3]
Page 24
Write-through cache
CPU A modifies location 12345 – write-through
Cache on CPU B not updated
Memory not coherent!
[Figure: CPU A’s cache and main memory now hold 12345: 0, but CPU B’s cache still holds 12345: 3]
Page 25
Snoopy cache
Add logic to each cache controller: monitor the bus
When a cache sees a write on the bus (e.g. write [12345] ← 0), it updates or invalidates its own copy
Virtually all bus-based architectures use a snoopy cache
[Figure: CPU A writes 12345 ← 0; the write appears on the bus, so CPU B’s cache controller snoops it and its copy of 12345 becomes 0 as well]
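A minimal sketch of the snooping idea in Python: every cache controller watches writes that appear on the shared bus and reacts to them. The slide’s figure shows an update-based snoop; this sketch uses the closely related write-invalidate variant, and the class and variable names are invented for illustration:

```python
class SnoopyCache:
    """Write-through cache whose controller snoops writes on the shared bus."""
    def __init__(self, name, bus, memory):
        self.name, self.lines = name, {}
        self.bus, self.memory = bus, memory
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:                   # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)  # write-through on the bus

    def snoop(self, addr, value):
        self.lines.pop(addr, None)                   # invalidate our stale copy

class Bus:
    def __init__(self, memory):
        self.memory, self.caches = memory, []
    def attach(self, cache):
        self.caches.append(cache)
    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value                    # update main memory
        for c in self.caches:
            if c is not writer:
                c.snoop(addr, value)                 # other caches react

memory = {12345: 7}
bus = Bus(memory)
a, b = SnoopyCache("A", bus, memory), SnoopyCache("B", bus, memory)
b.read(12345)         # B caches the value 7
a.write(12345, 0)     # A writes 0; B's copy is invalidated via the bus
print(b.read(12345))  # prints 0 -- B re-fetches the new value, not stale 7
```

Without the `snoop` callback, CPU B would keep returning the stale 7, exactly the incoherence shown on the write-through slides.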
Page 26
Switched multiprocessors
• Bus-based architecture does not scale to a large number of CPUs (8+)
Page 27
Switched multiprocessors
Divide memory into groups and connect chunks of memory to the processors with a crossbar switch
n² crosspoint switches – expensive switching fabric
[Figure: crossbar with 4 CPUs and 4 memory modules; a crosspoint switch sits at every CPU–memory intersection]
Page 28
Crossbar alternative: omega network
Reduce crosspoint switches by adding more switching stages
[Figure: omega network connecting 4 CPUs to 4 memory modules through stages of small 2×2 switches]
Page 29
Crossbar alternative: omega network
With n CPUs and n memory modules:
• need log₂ n switching stages, each with n/2 switches
• Total: (n log₂ n)/2 switches
• Better than n², but still quite expensive
• Delay increases: with 1024 CPUs and memory chunks there is an overhead of 10 switching stages to memory and 10 back
[Figure: same omega network as on the previous slide]
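Plugging numbers into the two formulas makes the trade-off concrete (a small worked example; the specific values of n are arbitrary):

```python
from math import log2

# Switch counts for connecting n CPUs to n memory modules.
for n in (4, 64, 1024):
    crossbar = n * n                    # one crosspoint per CPU-memory pair
    stages = int(log2(n))               # omega network: log2(n) stages
    omega = stages * (n // 2)           # each stage has n/2 small 2x2 switches
    print(f"n={n}: crossbar {crossbar:,} crosspoints, "
          f"omega {omega:,} switches over {stages} stages each way")

# n=1024: 1,048,576 crosspoints vs. 5,120 omega switches -- far cheaper,
# but every memory access now crosses 10 stages going and 10 coming back.
```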
Page 30
NUMA
• Hierarchical Memory System
• Each CPU has local memory
• Other CPUs’ memory is also in its address space
• slower access
• Better average access time than omega network if most accesses are local
• Placement of code and data becomes difficult
Page 31
NUMA
• SGI Origin’s ccNUMA
• AMD64 Opteron
• Each CPU gets a bank of DDR memory
• Inter-processor communications are sent over a HyperTransport link
• Linux 2.5 kernel
• Multiple run queues
• Structures for determining layout of memory and processors
Page 32
Bus-based multicomputers
• No shared memory
• Communication mechanism needed on bus
• Traffic much lower than memory access
• Need not use physical system bus
• Can use LAN (local area network) instead
Page 33
Bus-based multicomputers
Collection of workstations on a LAN
[Figure: four workstations, each with its own CPU, memory, and LAN connector, attached to a shared interconnect (the LAN)]
Page 34
Switched multicomputers
Collection of workstations on a LAN
[Figure: four workstations, each with its own CPU, memory, and LAN connector, each connected to a port of an n-port switch]
Page 35
Software
Single System Image
Collection of independent computers that appears as a single system to the
user(s)
• Independent: autonomous
• Single system: user not aware of distribution
Distributed systems software
Responsible for maintaining single system image
Page 36
You know you have a distributed
system when the crash of a
computer you’ve never heard of
stops you from getting any work
done.
– Leslie Lamport
Page 37
Coupling
Tightly versus loosely coupled software
Tightly versus loosely coupled hardware
Page 38
Design issues: Transparency
High level: hide distribution from users
Low level: hide distribution from software
• Location transparency:
users don’t care where resources are
• Migration transparency:
resources move at will
• Replication transparency:
users cannot tell whether there are copies of resources
• Concurrency transparency:
users share resources transparently
• Parallelism transparency:
operations take place in parallel without user’s knowledge
Page 39
Design issues
Reliability
• Availability: fraction of time system is usable
• Achieve with redundancy
• Reliability: data must not get lost
• Includes security
Performance
• Communication network may be slow and/or unreliable
Scalability
• Distributable vs. centralized algorithms
• Can we take advantage of having lots of computers?
Page 40
Service Models
Page 41
Centralized model
• No networking
• Traditional time-sharing system
• Direct connection of user terminals to system
• One or several CPUs
• Not easily scalable
• Limiting factor: number of CPUs in system
• Contention for same resources
Page 42
Client-server model
Environment consists of clients and servers
Service: a task that a machine can perform
Server: a machine that performs the task
Client: a machine that requests the service
[Figure: two clients connected to a directory server, a print server, and a file server]
Workstation model
Assumes the client is used by one user at a time
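A minimal sketch of the client-server pattern using Python’s standard socket library; the address, port, and the "upper-case the text" service are invented for illustration, not part of the slides:

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5000            # illustrative address and port

def server():
    """Server: performs the service (upper-casing text) for one client."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _addr = srv.accept()
        with conn:
            request = conn.recv(1024)      # receive the client's request
            conn.sendall(request.upper())  # perform the task, return result

def client():
    """Client: requests the service over the network."""
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(b"hello, server")
        print(sock.recv(1024))             # b'HELLO, SERVER'

t = threading.Thread(target=server)
t.start()
time.sleep(0.2)                            # give the server a moment to listen
client()
t.join()
```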
Page 43
Peer-to-peer model
• Each machine on network has (mostly) equivalent capabilities
• No machines are dedicated to serving others
• E.g., collection of PCs:
• Access other people’s files
• Send/receive email (without server)
• Gnutella-style content sharing
• SETI@home computation
Page 44
Processor pool model
What about idle workstations
(computing resources)?
• Let them sit idle
• Run jobs on them
Alternatively…
• Collection of CPUs that can be assigned processes on demand
• Users won’t need heavy duty workstations
• GUI on local machine
• Computation model of Plan 9
Page 45
Grid computing
Provide users with seamless access to:
• Storage capacity
• Processing
• Network bandwidth
Heterogeneous and geographically distributed systems
Page 46Page 46
Multi-tier client-server
architectures
Page 47
Two-tier architecture
Common from the mid-1980s to the early 1990s
• UI on user’s desktop
• Application services on server
Page 48
Three-tier architecture
Client tier:
• user interface
• some data validation/formatting
Middle tier:
• queueing/scheduling of user requests
• transaction processor (TP)
• connection management
• format conversion
Back end:
• database
• legacy application processing
[Figure: client ↔ middle tier ↔ back end]
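As a hedged sketch of how the responsibilities split across the tiers (the duties come from the slide; the function names and the in-memory stand-in for the database are invented):

```python
# Back-end tier: database / legacy application processing (a dict stands in).
DATABASE = {"acct-42": 100.0}

def back_end(query):
    return DATABASE.get(query)

# Middle tier: connection management, queueing/scheduling, transaction
# processing, and format conversion between client and back end.
def middle_tier(request):
    query = request.strip()               # format conversion / cleanup
    balance = back_end(query)             # forward the work to the back end
    return f"balance={balance}"           # format the result for the client

# Client tier: user interface plus some data validation/formatting.
def client(account_id):
    if not account_id:                    # basic data validation
        raise ValueError("account id required")
    print(middle_tier(f"  {account_id}  "))

client("acct-42")                         # prints: balance=100.0
```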
Page 49
Beyond three tiers
Most architectures are multi-tiered
[Figure: client, firewalls, load balancer, web server, Java application server, database, and object store chained together]
Page 50
The end.

Editor's Notes

• #6: 2.94 Mbps is what you get by using the Alto’s 170-nanosecond clock, ticking twice per bit.
• #10: Memory prices plunged (price per MB): 1977: $32K, 1982: $8K, 1987: $250, 1998: $2. Mass storage prices plummeted (price per MB): 1977: microcomputers don’t have disks! MITS Altair floppy disk (88-DCDD), $1,480/kit, 250 Kbps transfer, 310,000 bytes capacity, 10 ms track-to-track, average time to read/write 400 ms. 1987: 40 MB Miniscribe (28 ms access time), $679. 1997: IBM 14.4 GB (9.5 ms access), $349. 2000: IBM 75 GB.