Parallel disk head emulation

Andy Twigg
Computer Lab

Outline
● Outline
● Parallel disk models
● Emulations
● Open problems
● Bibliography
– [Sanders et al., SODA'00, SPAA'00, SODA'02] and related work on balanced allocations [Czumaj, Berenbrink, ...]

Parallel disk models
● Ideally: want a large disk that can access D arbitrary blocks in one I/O (parallel disk head model)
● Reality: have kD disks that can each access 1 block per I/O
– But can access them in parallel (parallel disk model)
● Can we emulate the parallel disk head model on the parallel disk model?
– Quality of emulation: throughput and delay of requests, space overhead, ...

Assumptions
● One global buffer of size m
– Shared among all disks
● Can access exactly one block
per disk per I/O (no rotational
latency, seeking, ...)
● Redundancy: each block will
be stored on two disks
– More generally, r out of (r+1)

Emulation: queued writing
● Assume pairwise-independent hash functions f, g: [n] → [n]. Consider D queues Q_1 ... Q_D
● Each block i will be stored at f(i), g(i)
● Write((1−ε)D blocks): append blocks to queues, keep writing from queues until ∑_i |Q_i| < O(D/ε)
● Theorem [Sanders]:
– E[time to write (1−ε)D blocks] < 1 + exp(−D)

Aside: allocation processes
● E.g. [Azar, Broder, Karlin, Upfal STOC94], [Mitzenmacher 96], [Czumaj, Berenbrink, ..]
● m bins, n balls; ball i can go to 2 bins f(i), g(i) chosen independently and uniformly at random
● Balls arrive online, thrown into the least-loaded of f(i), g(i)
● Interested in max_j load(j)
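The process above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited papers: the function name two_choice_allocation, the seed parameter, and the tie-breaking rule (prefer f(i) on equal load) are assumptions, and a seeded pseudo-random generator stands in for the hash functions f and g.

```python
import random


def two_choice_allocation(num_balls, num_bins, seed=0):
    """Throw balls online; each ball goes to the less loaded of two random bins.

    Hypothetical helper for illustration. Returns the list of bin loads.
    """
    rng = random.Random(seed)
    load = [0] * num_bins
    for _ in range(num_balls):
        a = rng.randrange(num_bins)  # plays the role of f(i)
        b = rng.randrange(num_bins)  # plays the role of g(i)
        # Assumed tie-breaking: prefer the first choice on equal load.
        target = a if load[a] <= load[b] else b
        load[target] += 1
    return load


if __name__ == "__main__":
    loads = two_choice_allocation(num_balls=10_000, num_bins=10_000)
    print("max_j load(j) =", max(loads))
```

Running it for growing n gives a feel for how slowly the maximum load grows under two choices.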

Allocation graphs and schedules
● Allocation graph G_A: nodes are disks (bins), edges are blocks (balls). An undirected edge e = {i,j} means that block e is stored on disks i, j.
● Schedule: given a set of requested edges S, G_S is an orientation of G_A[S].
● Load(disk j) = indegree(j) in G_S
● #I/O steps = load(schedule) = max_j indegree(j)
→ maintain online an orientation of low indegree
– If blocks are stored at several disks, G_A is a hypergraph

Warm up
● Fact: Every connected component of G ~ G(D, (1/2)D) is either a tree or a tree with one cycle w.h.p.
● → Max load of a schedule with D/2 requests?
● Orienting a tree: pick a root r, orient edges away from r
● Orienting a tree plus one extra edge {u,v} that closes a cycle: orient that edge as (v,u), choose u as root and orient the remaining tree away from u
● Strategy: Divide requests into subsequences of length D/2 and schedule each as above
– Max load 1 for each D/2 requests → load 2N/D for N requests
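The orientation rule above can be sketched as follows, assuming every connected component is a tree or a tree plus one extra edge. The function names and the input format (a list of disk pairs, one per requested block) are illustrative assumptions. The returned list gives, for each block, the disk that serves it, i.e. the head of the oriented edge.

```python
from collections import defaultdict, deque


def _bfs_tree(root, adj, skip=None):
    """BFS from root, ignoring edge id `skip`.

    Returns ({node: id of the tree edge entering it}, [ids of non-tree edges]).
    """
    entering = {root: None}
    used, extra = set(), []
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y, b in adj[x]:
            if b == skip or b in used:
                continue
            used.add(b)
            if y in entering:
                extra.append(b)  # this edge closes a cycle
            else:
                entering[y] = b
                queue.append(y)
    return entering, extra


def orient_max_load_one(blocks):
    """Orient each block toward one of its two disks so every disk has indegree <= 1."""
    adj = defaultdict(list)
    for b, (i, j) in enumerate(blocks):
        adj[i].append((j, b))
        adj[j].append((i, b))
    target = [None] * len(blocks)
    seen = set()
    for start in list(adj):
        if start in seen:
            continue
        entering, extra = _bfs_tree(start, adj)
        seen.update(entering)
        if len(extra) > 1:
            raise ValueError("component has more than one cycle")
        if extra:
            b = extra[0]
            u = blocks[b][0]
            target[b] = u  # orient the cycle-closing edge as (v, u)
            entering, _ = _bfs_tree(u, adj, skip=b)  # re-root the tree at u
        for node, e in entering.items():
            if e is not None:
                target[e] = node  # tree edges point away from the root
    return target
```

For example, orient_max_load_one([(0, 1), (1, 2), (2, 0), (2, 3)]) assigns the four blocks to four distinct disks, so the batch is served in one I/O step.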

Max load 1.2*N/D
● Lemma [Pittel, Spencer, Wormald]: G ~ G(D, 1.67D) has no 3-core w.h.p.
● Strategy: Repeatedly pick the node with smallest remaining degree (at most 2, since there is no 3-core), orient its remaining edges toward it and remove it
● Max load 2 for each 1.67D requests → load 2N/(1.67D) ≈ 1.2 N/D for N requests
● BUT: all these must buffer requests before scheduling them
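A minimal sketch of the peeling strategy for one buffered batch, assuming the batch is given as a list of disk pairs. The names peel_min_degree and max_load are hypothetical. The code always removes a node of smallest remaining degree, so the resulting maximum load is 2 whenever the batch graph has no 3-core.

```python
import heapq
from collections import Counter, defaultdict


def peel_min_degree(blocks):
    """Orient every block toward whichever of its two disks is peeled first."""
    incident = defaultdict(set)
    for b, (i, j) in enumerate(blocks):
        incident[i].add(b)
        incident[j].add(b)
    heap = [(len(edges), node) for node, edges in incident.items()]
    heapq.heapify(heap)
    removed = set()
    target = [None] * len(blocks)
    while heap:
        degree, node = heapq.heappop(heap)
        if node in removed or degree != len(incident[node]):
            continue  # stale heap entry
        removed.add(node)
        for b in list(incident[node]):
            target[b] = node  # the peeled disk serves this block
            i, j = blocks[b]
            other = j if i == node else i
            if other != node:
                incident[other].discard(b)
                heapq.heappush(heap, (len(incident[other]), other))
        incident[node].clear()
    return target


def max_load(target):
    """Number of I/O steps needed: the largest indegree of any disk."""
    return max(Counter(target).values())
```

A triangle with a pendant edge has no 3-core and yields load 2, while the complete graph on 4 nodes is its own 3-core and forces load 3.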

Asynchronous reading: shortest-queue first
● Write(block i): buffer i, write i to both f(i), g(i) when each becomes free
● Read(block i): buffer the request at the least-loaded of f(i), g(i)
– each disk serves its queue in FIFO order
● Requests are scheduled online
● Conjecture [Sanders]: Delay O(log 1/ε) is achievable for average arrival rate (1−ε)D
– If 2 copies of each block are allowed (Theta(1/ε) for 1 copy)
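The read path can be simulated with a small discrete-time sketch. Everything here is an assumption made for illustration: time advances in unit I/O steps, requests arriving in step t may be served in step t, ties go to the first copy, and the function name and input format are hypothetical.

```python
from collections import deque


def shortest_queue_first(arrivals, num_disks):
    """Route each read to the shorter of its two disk queues; serve queues FIFO.

    arrivals[t] is a list of (block, disk_a, disk_b) requests arriving at step t.
    Returns {block: delay in I/O steps}.
    """
    queues = [deque() for _ in range(num_disks)]
    delay = {}
    t = 0
    while t < len(arrivals) or any(queues):
        if t < len(arrivals):
            for block, a, b in arrivals[t]:
                target = a if len(queues[a]) <= len(queues[b]) else b
                queues[target].append((block, t))
        for q in queues:  # each disk reads one block per I/O step
            if q:
                block, arrived = q.popleft()
                delay[block] = t - arrived + 1
        t += 1
    return delay
```

With three simultaneous requests that all share disks 0 and 1, two are served in the first step and the third waits one extra step.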

Max load O(log log n)
● Easier proof for lightly loaded case (n<d)
● Let G ~ G(n, n/8) and consider the following procedure:
    while there exists a node of degree ≤ 13:
        for each such node:
            orient its edges towards it and remove it
● Thm: max load = O(log log n)
– Claim 1: balls added at step i have height ≤ 13i
– Claim 2: largest connected component in G has size O(log n)
– Claim 3: procedure terminates in O(log log n) steps
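The round-based procedure can be made concrete as below. This is a sketch under assumptions: the graph is passed as an edge list, an edge between two low-degree nodes goes to whichever is processed first, and the function raises if no low-degree node is left while edges remain. The name peel_in_rounds and the threshold parameter are hypothetical.

```python
from collections import defaultdict


def peel_in_rounds(edges, threshold=13):
    """Each round, every node of remaining degree <= threshold takes its edges.

    Returns (target, rounds): target[e] is the node edge e is oriented toward.
    """
    incident = defaultdict(set)
    for e, (i, j) in enumerate(edges):
        incident[i].add(e)
        incident[j].add(e)
    target = [None] * len(edges)
    remaining = set(range(len(edges)))
    rounds = 0
    while remaining:
        low = [v for v, es in incident.items() if 0 < len(es) <= threshold]
        if not low:
            raise ValueError("every remaining node has degree > threshold")
        rounds += 1
        for v in low:
            for e in list(incident[v]):
                target[e] = v  # orient the edge towards v
                i, j = edges[e]
                incident[i].discard(e)
                incident[j].discard(e)
                remaining.discard(e)
    return target, rounds
```

On the path 0-1-2-3 with threshold 1, the two end nodes are peeled in the first round and the middle edge in the second.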

Neat: majority method
● Use 3 (3-wise independent) hash functions f, g, h
● Writing: Write block i to the least-loaded two of f(i), g(i), h(i) along with a timestamp
● Reading: Read i from the least-loaded two of f(i), g(i), h(i) and return the latest version
● Max load O(log log n / log n) for writing and reading
● + writes and reads can be scheduled together
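The reason reads see the latest write is that any two 2-element subsets of {f(i), g(i), h(i)} share a disk. A minimal sketch, with several assumptions: the class name MajorityStore is hypothetical, the hash functions are passed in as plain callables, a global counter stands in for timestamps, and "load" is modeled as a running count of operations per disk rather than a real queue length.

```python
import itertools


class MajorityStore:
    """Illustrative 2-out-of-3 replicated block store (not the authors' code)."""

    def __init__(self, num_disks, hashes):
        self.disks = [dict() for _ in range(num_disks)]
        self.load = [0] * num_disks  # assumed load measure: operations so far
        self.hashes = hashes
        self.clock = itertools.count(1)

    def _least_loaded_two(self, block):
        candidates = sorted({h(block) for h in self.hashes})
        return sorted(candidates, key=lambda d: self.load[d])[:2]

    def write(self, block, data):
        stamp = next(self.clock)
        for d in self._least_loaded_two(block):
            self.disks[d][block] = (stamp, data)
            self.load[d] += 1

    def read(self, block):
        versions = []
        for d in self._least_loaded_two(block):
            self.load[d] += 1
            if block in self.disks[d]:
                versions.append(self.disks[d][block])
        # Any two of the three disks include one holding the newest copy.
        return max(versions, key=lambda v: v[0])[1]
```

Usage: after write("a", "v1") and write("a", "v2"), one disk may still hold the stale "v1", yet read("a") returns "v2" because the timestamp comparison picks the newer of the two copies it fetched.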

Virtual disk model
● Want: A set of virtual disks V_1...V_m, each with specified bandwidth b(V_i) and capacity c(V_i)
● Have: a collection of physical disks D_1...D_n, each with bandwidth 1 and capacity c
● Efficient emulation of the virtual disk model?
– Admission control + a (1−ε)-bandwidth emulation of the parallel disk head model would imply that ∑_i b(V_i) < (1−ε)n and ∑_i c(V_i) < cn/2 are sufficient conditions for virtual disk model emulation
● Adding/removing virtual disks, changing capacities, ...
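The two sufficient conditions translate directly into an admission check. This is only a sketch of the inequality test: the function name admit and the representation of a virtual disk as a (bandwidth, capacity) pair are assumptions, and it says nothing about how the emulation itself would be built.

```python
def admit(virtual_disks, n, c, eps):
    """Check sum b(V_i) < (1 - eps) * n and sum c(V_i) < c * n / 2.

    virtual_disks: list of (bandwidth, capacity) pairs (assumed format).
    n: number of physical disks, each with bandwidth 1 and capacity c.
    """
    total_bandwidth = sum(b for b, _ in virtual_disks)
    total_capacity = sum(cap for _, cap in virtual_disks)
    return total_bandwidth < (1 - eps) * n and total_capacity < c * n / 2


if __name__ == "__main__":
    print(admit([(1.0, 10), (2.0, 20)], n=4, c=100, eps=0.1))
```

The factor 1/2 in the capacity condition reflects that each block is stored on two disks.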

Extensions
Open:
● Prove good delay bounds for asynchronous
reading
● Deterministic guarantees (expanders?)
● Emulation of virtual disk model
● Handling rotational latencies, seek times
