Distributed Coordination




A brief overview of several alternatives for Distributed Counting and Sorting.




Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Counting, Sorting and Distributed Coordination. Luis Galárraga, Saarland University
    • Agenda
      • Justification
      • Concurrent objects and basic concepts
      • Distributed counting
        • Software Combining Trees
        • Counting networks
      • Distributed sorting
        • Sorting networks
        • Sample sorting
    • Some justification
      • Counting and Sorting are basic tasks in software and algorithms.
      • At almost any stage, our programs rely heavily on one of these tasks.
        • Take advantage of new system architecture trends like multiprocessor environments
        • Avoid memory contention!!!
    • Memory contention
      • Defined as a scenario in which many threads in different processors try to access the same memory location.
      • Not serious for reads, but costly for writes: every write invalidates the copies of that location cached by other processors.
    • Concurrent Objects
    • Quiescent consistency
      • Quiescent periods for concurrent objects:
        • No pending calls
      • The state of any quiescent object must be equivalent to some sequential order of the completed method calls.
      • A quiescent counter:
        • Neither omissions nor duplicates!!
    • Quiescent consistency
    • Sequential Consistency
      • Calls should appear to take effect in program order.
      • Method calls by different threads are unrelated by program order.
    • Sequential Consistency
    • To discuss...
      • What about these examples?
    • Linearizability
      • If one call precedes another (even across different threads), then the earlier call must have taken effect before the later call.
      • If two calls overlap, then their order is ambiguous and we are free to order them in any convenient way.
      • Linearization points: the points at which each method call appears to take effect instantaneously.
    • Linearizability
    • Measuring the performance
      • Latency
        • Time it takes an individual call to complete
      • Throughput
        • Average rate at which a set of method calls complete
      • Lock-based approaches favor latency.
    • Distributed Counting
    • The “classical” approach
      • Counter:
        • Shared object holding an integer value, with a getAndIncrement(int n = 1) method that returns the current value and then adds n.
      • Do you want to increment the counter? Use a lock:
        • Acquire the lock
        • Perform the increment
        • Free the lock
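The three steps above can be sketched in Python. This is a minimal illustration, assuming Python's threading.Lock as the lock; LockCounter, get_and_increment and run are hypothetical names, not taken from the slides. The run helper lets several threads hammer the counter and checks that the returned values have neither omissions nor duplicates.

```python
import threading


class LockCounter:
    """Shared integer counter guarded by one lock (the "classical" approach)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def get_and_increment(self, n=1):
        with self._lock:            # acquire the lock; released automatically on exit
            prior = self._value     # perform the increment ...
            self._value += n
        return prior                # ... and hand back the value seen before it


def run(counter, threads=4, calls=1000):
    """Let several threads call the counter; return every value they received."""
    received = []
    received_lock = threading.Lock()

    def worker():
        mine = [counter.get_and_increment() for _ in range(calls)]
        with received_lock:
            received.extend(mine)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return received


counter = LockCounter()
values = run(counter)
print(sorted(values) == list(range(4000)))  # True: no omissions, no duplicates
```

Every call serializes on the same lock, which is exactly the memory hot spot the rest of the deck tries to avoid.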
    • Locks.. so many locks
      • They come in different flavors:
        • Spin locks (Filter and Bakery)
        • Test and set (with/without Exponential Backoff)
        • Queue locks (MCS, CLH, etc... )
        • Chapters 7 and 8 of [1]
      • Some are better than others, but all suffer from memory contention to some degree.
    • Software Combining Trees
      • Several threads call getAndIncrement at “more or less” the same time.
      • Some threads become responsible for gathering other threads' increments and combining them.
      • Some hierarchical data structure must be used.
      • Binary trees
    • Software Combining Trees
      • p threads
      • Balanced binary tree with k levels
        • $k = \min\{\, j \mid 2^j \ge p \,\}$, with $2^{k-1}$ leaf nodes
      • Each node holds 2 values to combine and a state.
      • Each thread is assigned to a leaf. A leaf can be assigned to 2 threads at most.
      • The value of the counter is stored at the root
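The sizing rule, together with the leaf-to-root path each thread walks, fits in a few lines of Python. This sketch rests on assumptions not on the slide: nodes are numbered heap-style (root 1, children of node i are 2i and 2i+1), threads 2i and 2i+1 share leaf i, and p is at least 2. tree_shape and path_to_root are hypothetical helper names.

```python
def tree_shape(p):
    """Return (k, number of leaves) for a combining tree serving p >= 2 threads."""
    k = (p - 1).bit_length()        # smallest k with 2**k >= p
    return k, 2 ** (k - 1)


def path_to_root(thread_id, p):
    """Heap-numbered nodes that thread `thread_id` visits on its way to the root."""
    k, leaves = tree_shape(p)
    node = leaves + thread_id // 2  # leaves are nodes 2**(k-1) .. 2**k - 1, two threads each
    path = []
    while node >= 1:
        path.append(node)
        node //= 2                  # parent in heap numbering
    return path


print(tree_shape(8), path_to_root(5, 8))  # (3, 4) [6, 3, 1]
```

Every path has exactly k nodes, which is where the O(log p) cost per call comes from.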
    • Software Combining Trees
      • To increment the counter, threads have to traverse the tree from leaf node to the root.
        • They might combine their values with other threads along the way.
      • Node states
        • IDLE: Initial state of the nodes
        • FIRST: One thread has visited this node and becomes the master
        • SECOND: A second thread (slave) is waiting for the master to combine.
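The arithmetic of one combining round can be checked without any threads. The sketch below is a sequential simulation, not the concurrent IDLE/FIRST/SECOND protocol: each node forwards the sum of its subtree's requests, the root applies a single update of the total, and on the way back down the master's side receives the prior value while the slave's side receives the prior value plus the master's combined amount. It assumes the left subtree always holds the master and that there is at least one thread; combining_round is a hypothetical name.

```python
def combining_round(increments, counter):
    """Sequentially simulate the arithmetic of one combining round.

    increments[t] is what thread t passes to getAndIncrement this round (None if it
    does not call). Returns (new counter value, value each thread gets back).
    """
    results = [None] * len(increments)

    def combined(lo, hi):
        # What the node covering threads lo..hi-1 forwards upward: its subtree's total.
        return sum(v for v in increments[lo:hi] if v is not None)

    def distribute(lo, hi, prior):
        # prior = counter value just before this subtree's combined increment.
        if hi - lo == 1:
            if increments[lo] is not None:
                results[lo] = prior
            return
        mid = (lo + hi) // 2
        distribute(lo, mid, prior)                       # master's side
        distribute(mid, hi, prior + combined(lo, mid))   # slave's side

    total = combined(0, len(increments))
    distribute(0, len(increments), counter)   # the root applies a single update of `total`
    return counter + total, results


print(combining_round([1, None, 2, 1], 10))  # (14, [10, None, 11, 13])
```

The results match running the same calls one after another in thread order, which is why the combined counter stays linearizable.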
    • Software Combining Trees
    • Performance analysis
      • They favor throughput over latency:
        • Calls take O(log p)
      • Reduced memory contention
      • Linearizable alternative
      • Sensitive to changes in concurrency rate
        • Threads might fail to combine immediately
      • What about n-ary trees?
      • How long to wait for other threads to combine?
    • Counting networks - Balancers
      • A balancer takes tokens arriving asynchronously on its 2 input wires and sends them alternately to its 2 output wires (top first, then bottom).
      • Balancing networks are built by connecting balancers' outputs to other balancers' inputs.
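A balancer needs little more than a toggle bit. In this Python sketch a threading.Lock makes the read-and-flip of the toggle atomic; that lock is an assumption standing in for whatever atomic primitive a real implementation would use. The input wire does not affect the output, so traverse takes no argument and returns 0 for the top output and 1 for the bottom one.

```python
import threading


class Balancer:
    """Sends tokens alternately to output 0 (top) and output 1 (bottom)."""

    def __init__(self):
        self._toggle = True           # True: the next token exits on the top wire
        self._lock = threading.Lock()

    def traverse(self):
        with self._lock:              # read and flip atomically, even under concurrency
            output = 0 if self._toggle else 1
            self._toggle = not self._toggle
            return output


b = Balancer()
print([b.traverse() for _ in range(5)])  # [0, 1, 0, 1, 0]
```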
    • Balancing network
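      • A balancing network of width w has w inputs $x_0, x_1, \dots, x_{w-1}$ and w outputs $y_0, y_1, \dots, y_{w-1}$ (the number of tokens that entered or left on each wire), and in quiescent periods: $\sum_{i=0}^{w-1} x_i = \sum_{i=0}^{w-1} y_i$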
      • The depth d is defined as the maximum number of balancers a token can traverse starting from any input wire.
    • The step property
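      • A balancing network has the step property if, in every quiescent state, $0 \le y_i - y_j \le 1$ for all $i < j$; such a network is called a Counting Network.
      • Threads shepherd tokens through the network.
        • Given the step property, we can use the network to count how many tokens have traversed it: the j-th token (counting from 0) to leave output wire i takes the value $i + j \cdot w$.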
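The step property and the way outputs become counter values can be written down directly. has_step_property and counter_value are hypothetical helpers; counter_value assumes the usual scheme in which output wire i hands out i, i + w, i + 2w, and so on, in turn.

```python
def has_step_property(y):
    """True iff 0 <= y[i] - y[j] <= 1 for every i < j."""
    return all(0 <= y[i] - y[j] <= 1
               for i in range(len(y)) for j in range(i + 1, len(y)))


def counter_value(wire, visits_before, width):
    """Value for the token leaving `wire` after `visits_before` earlier tokens on it."""
    return wire + visits_before * width


y = [2, 2, 1, 1]   # 6 tokens have left a width-4 network, which is now quiescent
values = sorted(counter_value(i, v, 4) for i, count in enumerate(y) for v in range(count))
print(has_step_property(y), values)  # True [0, 1, 2, 3, 4, 5]
```

Because the outputs form a step, the handed-out values are always a gap-free prefix 0, 1, ..., n-1.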
    • Bitonic Counting Network
      • For k = 1, a single balancer
      • For k > 1 (k a power of 2): two k-wide Bitonic networks whose outputs feed a 2k-merger
    • Bitonic Counting Network
      • 2k-merger (2k = 8):
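The recursive Bitonic construction can be sketched in Python, following the standard wiring as we understand it: a 2k-merger sends the even-numbered wires of its first input half and the odd-numbered wires of its second half to one k-merger, the rest to the other, and ends with a layer of k balancers joining matching outputs of the two halves. Balancer repeats the toggle-bit sketch so the block runs on its own; Merger, Bitonic and shepherd are hypothetical names, and shepherd is a test driver that sends tokens in on random input wires from several threads.

```python
import random
import threading


class Balancer:
    def __init__(self):
        self._toggle = True
        self._lock = threading.Lock()

    def traverse(self):
        with self._lock:
            output = 0 if self._toggle else 1
            self._toggle = not self._toggle
            return output


class Merger:
    """Merger[width]: two half-width mergers, then a final layer of width/2 balancers."""

    def __init__(self, width):
        self.width = width
        self.layer = [Balancer() for _ in range(width // 2)]
        self.half = [Merger(width // 2), Merger(width // 2)] if width > 2 else None

    def traverse(self, wire):
        out = 0
        if self.width > 2:
            k = self.width // 2
            # half[0]: even wires of the first input half, odd wires of the second half.
            sub = wire % 2 if wire < k else 1 - wire % 2
            out = self.half[sub].traverse(wire // 2)
        # Balancer `out` joins output `out` of both halves onto wires 2*out and 2*out+1.
        return 2 * out + self.layer[out].traverse()


class Bitonic:
    """Bitonic[width]: two Bitonic[width/2] networks feeding a Merger[width]."""

    def __init__(self, width):
        self.width = width
        self.merger = Merger(width)
        self.half = [Bitonic(width // 2), Bitonic(width // 2)] if width > 2 else None

    def traverse(self, wire):
        if self.width == 2:
            return self.merger.traverse(wire)   # Bitonic[2] is a single balancer
        k = self.width // 2
        sub = wire // k
        out = self.half[sub].traverse(wire % k)
        return self.merger.traverse(sub * k + out)


def shepherd(net, width, tokens, threads=4, seed=1):
    """Send tokens through `net` from several threads; return per-output-wire counts."""
    counts = [0] * width
    counts_lock = threading.Lock()

    def worker(t):
        rng = random.Random(seed + t)           # each thread picks random input wires
        for _ in range(tokens // threads):
            out = net.traverse(rng.randrange(width))
            with counts_lock:
                counts[out] += 1

    workers = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counts                               # read only once all threads are done


print(shepherd(Bitonic(8), 8, 1000))
```

Only the balancers hold state, so per-balancer locking is all the synchronization the network needs.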
    • Periodic Counting network
      • For k = 1, a single balancer
      • For k > 1, a sequence of lg(2k) 2k-block networks
    • Periodic Counting network
      • 2k-block (2k = 8)
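A Periodic network can be sketched the same way, as a stack of balancer layers. One assumption goes beyond the slide: each 2k-block is built in its "ladder" form, a first layer of balancers joining wire i with wire 2k-1-i, followed by two k-blocks side by side on the two halves. We believe this matches the block in the figure, but the figure's wiring was not preserved. Periodic, block_layers and shepherd are hypothetical names; Balancer is the toggle-bit sketch again.

```python
import random
import threading


class Balancer:
    def __init__(self):
        self._toggle = True
        self._lock = threading.Lock()

    def traverse(self):
        with self._lock:
            output = 0 if self._toggle else 1
            self._toggle = not self._toggle
            return output


def block_layers(lo, width):
    """Balancer layers of a block on wires lo .. lo+width-1, as (top, bottom) pairs."""
    if width == 2:
        return [[(lo, lo + 1)]]
    half = width // 2
    ladder = [(lo + i, lo + width - 1 - i) for i in range(half)]  # wire i meets width-1-i
    first = block_layers(lo, half)
    second = block_layers(lo + half, half)
    # The two half-width blocks sit side by side, so their layers run in parallel.
    return [ladder] + [a + b for a, b in zip(first, second)]


class Periodic:
    """lg(width) copies of the width-block, each copy with its own balancers."""

    def __init__(self, width):
        self.layers = []
        for _ in range(width.bit_length() - 1):
            for pairs in block_layers(0, width):
                table = {}
                for top, bottom in pairs:
                    b = Balancer()
                    table[top] = table[bottom] = (b, top, bottom)
                self.layers.append(table)

    def traverse(self, wire):
        for table in self.layers:
            b, top, bottom = table[wire]        # each wire meets one balancer per layer
            wire = top if b.traverse() == 0 else bottom
        return wire


def shepherd(net, width, tokens, threads=4, seed=1):
    """Send tokens through `net` from several threads; return per-output-wire counts."""
    counts = [0] * width
    counts_lock = threading.Lock()

    def worker(t):
        rng = random.Random(seed + t)
        for _ in range(tokens // threads):
            out = net.traverse(rng.randrange(width))
            with counts_lock:
                counts[out] += 1

    workers = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counts


net = Periodic(8)
print([net.traverse(0) for _ in range(8)])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Each block has depth lg w and there are lg w blocks, so this layout has depth lg² w.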
    • Counting networks
      • Bitonic and Periodic networks have depth $O(\lg^2 w)$, where $w = 2k$ is the width
    • Counting networks
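      • Saturation S measures the ratio of tokens to balancers: $S = 2n/(dw)$ for n tokens traversing a network of depth d and width w (which has $dw/2$ balancers)
        • S > 1: oversaturated; S < 1: undersaturated
      • 2k-block and 2k-merger networks are used to implement barriers
        • They are threshold networks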
    • Counting networks
      • Periodic and Bitonic are not the only counting networks:
        • Diffracting trees, depth O(lg w)
        • BM (Busch-Mavronicolas) networks: w inputs, p*w outputs for some p > 1
        • BM outperforms Bitonic under most conditions
        • Bitonic networks perform best under low concurrency.
      • Counting networks are robust to changes in the concurrency rate
    • Distributed Sorting
    • Sorting networks
      • A comparator is to a sorting network what a balancer is to a counting network.
        • But comparators are synchronous: a comparator produces its outputs only after both inputs have arrived!!!
    • Wow... This saves time!!
      • Isomorphism:
        • If a balancing network counts, then its comparison-network counterpart sorts.
        • Proof in [3]
    • Bitonic Sorting Network
      • Same structure as the Bitonic Counting network.
      • Collection of:
        • p threads, d layers of w/2 comparators. Layers define rounds
        • Table of size d * w/2 storing which entries (wires) compare in each layer
        • Synchronization via Barriers, see [4]
        • Each thread/processor performs s comparisons in every round, so p*s = w/2, a power of 2
    • Bitonic Sort
    • Sorting networks
      • Swaps do not need synchronization: the comparators of one layer touch disjoint entries
      • All threads must always be in the same round
        • Synchronization via a Barrier
        • barrier.await() returns when all threads have called it.
      • Time $O(s \log^2 p)$, p = number of threads
      • Suitable for small sets
        • A key may be compared by a different thread in every round
        • Not cache-efficient!!!
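The round structure, the comparator table and the barrier fit together in a short Python sketch. Several assumptions apply. The table comes from the classic index-based (XOR) formulation of Batcher's bitonic sorter, whose comparators alternate direction; we take it to be equivalent to the slide's network up to a relabeling of wires. threading.Barrier.wait plays the role of barrier.await(). Because of CPython's global interpreter lock, the sketch shows the synchronization pattern rather than a speedup. bitonic_table and parallel_bitonic_sort are hypothetical names.

```python
import random
import threading


def bitonic_table(w):
    """Comparator table for a bitonic sorter on w = 2**m wires.

    One entry per round (layer); each lists w/2 disjoint comparators (i, j, ascending).
    """
    table = []
    k = 2
    while k <= w:
        j = k // 2
        while j >= 1:
            layer = []
            for i in range(w):
                partner = i ^ j
                if partner > i:
                    # Runs whose index has bit k clear are sorted upward, the rest downward.
                    layer.append((i, partner, (i & k) == 0))
            table.append(layer)
            j //= 2
        k *= 2
    return table


def parallel_bitonic_sort(data, p):
    """Sort `data` in place with p threads; len(data) is a power of 2 divisible by 2p."""
    w = len(data)
    table = bitonic_table(w)
    s = (w // 2) // p            # comparators per thread per round, so p * s == w / 2
    barrier = threading.Barrier(p)

    def worker(t):
        for layer in table:
            # Comparators in one round touch disjoint wires, so swaps need no locks.
            for i, j, ascending in layer[t * s:(t + 1) * s]:
                if (data[i] > data[j]) == ascending:
                    data[i], data[j] = data[j], data[i]
            barrier.wait()       # returns only once all p threads finished this round

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(p)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return data


keys = random.sample(range(1000), 64)
print(parallel_bitonic_sort(keys, 4) == sorted(keys))  # True
```

Thread t always owns the same slice of each layer, so the key at a given wire is generally handled by different threads in different rounds, which is the cache problem noted above.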
    • Sample sorting
      • Designed for large sets that do not fit in main memory
        • Accessing them can be very expensive (if they are on disk... ouch!!)
        • We need more locality of reference. How?
      • p threads, n input keys
    • Sample sorting – 3 magic steps
      • Step 1: Choose p-1 splitter keys that divide the set evenly.
        • But the keys are not sorted!!
        • Each thread takes s samples, and the p*s samples are sorted using BitonicSort
        • Select the keys at positions s, 2s, ..., (p-1)*s as splitters
      • Now the big set is divided into p subsets of approximately n/p keys each.
    • Sample sorting – 3 magic steps
      • Step 2: Each thread sequentially processes n/p items, moving each item to its bucket (defined by the splitters)
      • Step 3: Each thread sequentially sorts the items in its bucket.
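The three steps map onto p threads and one barrier. In this sketch sorted() stands in for BitonicSort in step 1, samples are drawn with replacement from the whole input, splitters are taken at zero-based indices s, 2s, ..., (p-1)s, and bucket b is sorted by thread b. sample_sort is a hypothetical name, and the input is assumed to be non-empty.

```python
import random
import threading
from bisect import bisect_right


def sample_sort(keys, p, s, seed=0):
    """Sort a non-empty list `keys` with p threads, drawing s samples per thread."""
    n = len(keys)
    chunk = -(-n // p)                       # ceil(n / p) keys per thread
    rng = random.Random(seed)

    # Step 1: p*s random samples, sorted (sorted() stands in for BitonicSort);
    # the samples at zero-based indices s, 2s, ..., (p-1)s become the splitters.
    sample = sorted(rng.choice(keys) for _ in range(p * s))
    splitters = [sample[i * s] for i in range(1, p)]

    local = [[[] for _ in range(p)] for _ in range(p)]   # local[thread][bucket]
    buckets = [None] * p
    barrier = threading.Barrier(p)

    def worker(t):
        # Step 2: move each key of this thread's slice to its bucket.
        for key in keys[t * chunk:(t + 1) * chunk]:
            local[t][bisect_right(splitters, key)].append(key)
        barrier.wait()                       # every key is bucketed before sorting starts
        # Step 3: thread t gathers bucket t from all threads and sorts it sequentially.
        buckets[t] = sorted(key for row in local for key in row[t])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(p)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return [key for bucket in buckets for key in bucket]


data = [random.randrange(10_000) for _ in range(1_000)]
print(sample_sort(data, p=4, s=8) == sorted(data))  # True
```

Bucket b holds only keys between splitters b-1 and b, so concatenating the sorted buckets yields the sorted output without any merge step.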
    • Sample sorting
      • Time O(n/p * log(n/p))
        • Assuming a comparison-based sequential algorithm
        • What about fixed-length integer keys? Radix sort!!
      • Sampling can be avoided with prior knowledge of the keys' probability distribution
    • Other alternatives to Sample Sorting
      • Flash sorting
      • Parallel Merge Sorting
      • Parallel Radix Sorting
        • Load Balanced Parallel Radix Sort [5]
        • Partitioned Parallel Radix Sort [6]
        • Both seek a balance between fair work distribution and communication effort.
    • Sources
      • [1] M. Herlihy and N. Shavit, “Concurrent objects,” in The Art of Multiprocessor Programming, pp. 45–69, Burlington, USA: Elsevier Inc., 2008.
      • [2] E. N. Klein, C. Busch, and D. R. Musser, “An experimental analysis of counting networks,” Journal of the ACM, pp. 1020–1048, September 1994.
      • [3] J. Aspnes, M. Herlihy, and N. Shavit, “Counting networks,” Journal of the ACM, 1994.
      • [4] M. Herlihy and N. Shavit, “Barriers,” in The Art of Multiprocessor Programming, pp. 397–415, Burlington, USA: Elsevier Inc., 2008.
      • [5] A. Sohn and Y. Kodama, “Load balanced parallel radix sort,” in ICS ’98: Proceedings of the 12th International Conference on Supercomputing, pp. 305–312, New York, NY, USA: ACM, 1998.
      • [6] S.-J. Lee, M. Jeon, D. Kim, and A. Sohn, “Partitioned parallel radix sort,” J. Parallel Distrib. Comput., vol. 62, no. 4, pp. 656–668, 2002.