chiba-research 2010-01-22 at rakuten meeting
 

    Presentation Transcript

    • Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds
    • pay-as-you-go, Web, HPC
    • HPC: Amazon Web Services, Science Cloud. [Figure: cloud users reach their provided virtual compute resources (VMs) over HTTP and SSH through a monitor/resource manager. Each user sees "my resource" VMs next to "others" VMs; the cloud physical infrastructure (= Data Center) is a black box.]
    • Compute-intensive Applications [Walker ’08]: HPC. Data-intensive Applications [Deelman et al. ’08] [Palankar et al. ’08]: I/O *. (* http://montage.ipac.caltech.edu/)
    • [Figure labels: cloud storage; cloud users; "only once"; cloud compute resources; "high network transfer charge".]
    • Optimal Tree (clusters and grids): T = + M/B + …. [Figure: nodes A, B, C, and D connected over a WAN; Bandwidth(B-C) = 800 Mbps, Latency(B-C) = 2 ms.]
    • structured (ALM); [Castro et al. ’02] application-level multicast; [Castro et al. ’03] structured overlay; [Cohen et al. ’03]
    • Algorithm features: cluster vs. P2P
        feature                          cluster          P2P
        multicast topology               spanning tree    overlay
        communication type               push             pull
        network performance              high             low
        network proximity                dense            sparse
        node-to-node performance         homo.            hetero.
        topology                         stable           unstable
        adaptability for dynamic change  bad              good
    • Flat Tree: flat tree algorithm. [Figure labels: total I/O throughput; I/O bandwidth bottleneck; bucket; nodes.]
    • experiment 1; experiment 2
    • throughput. [Figure: histogram of download throughput from S3 (MB/sec, 1 to 12) against frequency (%) for file sizes 10MB, 100MB, and 1GB; iterations = 1000.]
    • comp. time & throughput. [Figure, two panels; file size: 1GB. Completion Time (sec) over runs 1 to 10, showing slowest, average, and fastest, for 8 nodes. Frequency (%) of download throughput from S3 (MB/sec, 1 to 11) for 8 nodes (N = 80), 16 nodes (N = 160), and 32 nodes (N = 320).]
    • total throughput (S3). [Figure, two panels. Throughput (MB/sec) of node 1 through node 8 over runs 1 to 10. Total throughput (MB/sec) for 8 nodes, 16 nodes, and 32 nodes; bar labels 52, 106, and 187.]
    • S3
    • I/O
    • Algorithm features: cluster vs. P2P vs. clouds
        feature                          cluster          P2P         clouds
        multicast topology               spanning tree    overlay     tree + overlay
        communication type               push             pull        pull
        network performance              high             low         middle
        network proximity                dense            sparse      dense
        node-to-node performance         homo.            hetero.     hetero.
        topology                         stable           unstable    (un)stable
        adaptability for dynamic change  bad              good        good
      [Figure labels: cluster multicast; proposed multicast algorithm on clouds; P2P multicast.]
    • Scatter (phase 1), Allgather (phase 2) [van de Geijn algorithm ’93]. [Figure label: bucket.]
    • phase 1: 32KB pieces; piece range for node i: $R_i = \left( \frac{iP}{N},\ \frac{(i+1)P}{N} - 1 \right)$. The expression/meaning table lists P, N, R_i, W_i, and the node IDs i, j (0 ≤ i, j < N). [Figure: 32KB pieces laid out across node 0, node 1, ..., node N-1.]
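
      The phase 1 range expression can be checked with a small Python sketch. It assumes that P is the number of pieces and N the number of nodes, and that the divisions are integer (floor) divisions when N does not divide P; neither the symbol meanings nor the rounding rule survives in the transcript, and the function name `piece_range` is made up for illustration.

```python
from typing import Tuple


def piece_range(i: int, num_pieces: int, num_nodes: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) piece indices R_i for node i.

    Assumption (not stated in the slides): floor division is used, so the
    ranges are contiguous and together cover pieces 0 .. num_pieces - 1.
    """
    if not 0 <= i < num_nodes:
        raise ValueError("node ID must satisfy 0 <= i < N")
    first = i * num_pieces // num_nodes
    last = (i + 1) * num_pieces // num_nodes - 1
    return first, last


if __name__ == "__main__":
    P, N = 32, 4  # hypothetical sizes, chosen only for the demonstration
    for node in range(N):
        print(node, piece_range(node, P, N))
```

      With floor division the last node absorbs the remainder, so no piece is left unassigned when P is not a multiple of N.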
    • phase 2: BitTorrent-like. Messages: possession(i); request(p); have(p); update(possession(i)). [Figure labels: pos. list; update.]
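
      The phase 2 exchange can be illustrated with a toy, single-process Python simulation. This is only a sketch under assumptions: the transcript keeps the message names but not their semantics, so it is assumed here that possession(i) is the set of pieces node i holds, that a node requests a missing piece p from a peer whose possession list contains p, and that possession lists are refreshed once per round. The function name `allgather` and the round-based scheduling are hypothetical.

```python
import random


def allgather(possession, num_pieces, seed=0):
    """Exchange pieces until every node holds all of them.

    possession[i] is the set of pieces node i holds after phase 1 and is
    updated in place. Returns (rounds, transfers), where each transfer is a
    (requester, peer, piece) tuple.
    """
    rng = random.Random(seed)
    all_pieces = set(range(num_pieces))
    if set().union(*possession) != all_pieces:
        raise ValueError("some piece is held by no node")
    rounds, transfers = 0, []
    while any(have != all_pieces for have in possession):
        # Assumed "update(possession(i))": peers see lists as of round start.
        snapshot = [set(have) for have in possession]
        for i, have in enumerate(possession):
            missing = sorted(all_pieces - have)
            if not missing:
                continue
            p = rng.choice(missing)  # assumed "request(p)"
            owners = [j for j, held in enumerate(snapshot) if p in held]
            peer = rng.choice(owners)  # a peer that answered "have(p)"
            have.add(p)
            transfers.append((i, peer, p))
        rounds += 1
    return rounds, transfers


if __name__ == "__main__":
    nodes = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]  # hypothetical phase 1 result
    print(allgather(nodes, 8)[0])
```

      Because every piece has at least one owner, each incomplete node gains exactly one piece per round in this toy model, so the loop always terminates.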
    • phase 1 (1/2): Download, Work Stealing. [Figure: 32KB pieces; node 0, node 1, ..., node N-1.]
    • phase 1 (2/2): work stealing in three steps: 1) steal request; 2) divide current work; 3) send new work list. The current work W is divided in proportion to bandwidth: $W_i = [\, W \cdot B_i / (B_i + B_j) \,]$ for fast node i and $W_j = [\, W \cdot B_j / (B_i + B_j) \,]$ for slow node j, where the brackets denote rounding to an integer. Example with W = 5, B_i = 8 MB/sec, and B_j = 2 MB/sec: $W_i = [\, 5 \cdot 8 / (8 + 2) \,] = 4$ and $W_j = [\, 5 \cdot 2 / (8 + 2) \,] = 1$. [Figure legend: assigned download pieces; already downloaded; not yet downloaded.]
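
      The bandwidth-proportional division can be sketched in Python as follows. The rounding rule is an assumption: the bracket glyphs in the slide are unreadable, and the slide's own example divides evenly, so this sketch rounds the slow node's share down and gives the remainder to the fast node so that no piece is dropped. The function name `divide_work` is hypothetical.

```python
import math
from typing import Tuple


def divide_work(remaining: int, bw_fast: float, bw_slow: float) -> Tuple[int, int]:
    """Split `remaining` pieces between a fast and a slow node.

    Returns (pieces_for_fast, pieces_for_slow), proportional to bandwidth.
    Assumption: the slow node's share is floored and the fast node takes the
    rest, so the two shares always sum to `remaining`.
    """
    if remaining < 0 or bw_fast <= 0 or bw_slow <= 0:
        raise ValueError("need remaining >= 0 and positive bandwidths")
    slow_share = math.floor(remaining * bw_slow / (bw_fast + bw_slow))
    fast_share = remaining - slow_share
    return fast_share, slow_share


if __name__ == "__main__":
    # The slide's example: W = 5, fast node at 8 MB/sec, slow node at 2 MB/sec.
    print(divide_work(5, 8, 2))
```

      For the slide's numbers this yields 4 pieces for the fast node and 1 for the slow node, matching the worked example.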
    • Evaluation of the non-steal and steal algorithms against flat tree: (completion time), (stability), (node scalability), (performance analysis).
        EC2 instance type (http://aws.amazon.com/ec2/instance-types/):
        type     CPU               memory    HDD       price
        small    1 ECU (1 core)    1.7 GB    160 GB    $0.10/hour
        ECU: EC2 Compute Unit, 1.0 ~ 1.2 GHz
    • completion time and stability. [Figure, two panels: non-steal algorithm and steal algorithm. Each shows completion time (sec) over runs 1 to 10 for slowest, average, and fastest.] Caption keywords: 1GB; 8; non-steal, steal; flat tree.
    • node scalability. [Figure: total throughput (MB/sec) of flat tree, non-steal, and steal for 4 nodes, 8 nodes, 16 nodes, and 32 nodes; 1GB.]
    • finer analysis of our algorithms. [Figure, two panels. Average throughput (MB/sec) of phase 1 (steal), phase 1 (non-steal), phase 2 (steal), and phase 2 (non-steal) for 4, 8, 16, and 32 nodes. Steal algorithm: completion time (sec) split into phase 1 and phase 2 for 2, 4, 8, 16, and 32 nodes; data labels 40, 70, 81, 95, 103, 102, 55.5, 29, 15, 10.] Caption keywords: non-steal; steal.