
  • gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments
    Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura (The University of Tokyo)
    2008/8/1, www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
  • Barriers of Grid Environments
    • Grid = Multiple Clusters (LAN/WAN)
    • Complex environment
      • Dynamic node joins
      • Resource removal/failure
        • Network and nodes
      • Connectivity
        • NAT/firewall
    Grid-enabled frameworks are crucial to facilitate computing in these environments.
  • What type of applications?
    • Typical Usage
      • Standalone jobs
      • No interaction among nodes
    • Parallel and distributed Applications
      • Orchestrate nodes for a single application
        • Map an existing application on the Grid
      • Requires complex interaction
      • ⇒ frameworks must make it simple and manageable
  • Common Approaches (1)
    • Programming-less
      • Batch Scheduler
        • Task placement (inter-cluster)
        • Transparent retries on failure
      • Enables minimal interaction
        • Pass data via files/raw sockets
        • Embarrassingly parallel tasks
        • Very limited in applicability
  • Common Approaches (2)
    • Incorporate some user programming
      • e.g. : Master-Worker framework
        • Program the master/worker(s)
          • Job distribution
          • Handling worker join/leave
          • Error handling
    • Enables simple interaction
      • Still limited in applicability
    • For more complex interactions (larger problem sets), frameworks must allow more flexible and general programming
  • The most flexible approach
    • Parallel Programming Languages
      • Extend existing languages: retains flexibility
      • Countless past examples
        • (MultiLisp[Halstead ‘85], JavaRMI, ProActive[Huet et al. ‘04], …)
      • Problem : not in context of the Grid
        • Node joins/leaves?
        • Resolve connectivity with NAT/firewall?
      • Coding becomes complex/overwhelming
    Can we not complement this approach?
  • Our Contribution
    • Grid-enabled distributed object-oriented framework
      • a focus on coping with complex environment
        • Joins, failures, connectivity
      • Simple programming & minimal configuration
        • A simple tool to act as a glue for the Grid
      • Implemented parallel applications on a Grid environment with 900 cores (9 clusters)
  • Agenda
    • Introduction
    • Related Work
    • Proposal
    • Evaluation
    • Conclusion
  • Programming-less frameworks
    • Condor/DAGMan [Thain et al. ‘05]
      • Batch scheduler
      • Transparent retries / handles multiple clusters
      • Extremely limited interaction among nodes
        • Tasks with DAG dependencies
        • Pass on data using intermediate/scratch files
  • “Restricted” programming frameworks
    • Master-Worker model: Jojo2 [Aoki et al. ‘06], OmniRPC [Sato et al. ‘01], Ninf-C [Nakata et al. ‘04], NetSolve [Casanova et al. ‘96]
      • Event-driven master code: handles join/leave
    • Map-Reduce [Dean et al. ‘05]
      • define 2 functions: map(), reduce()
      • Partial retries when nodes fail
    • Ibis – Satin [Wrzesinska et al. ‘06]
      • Distributed divide-and-conquer
      • Random work stealing: accommodate join/leave
    • Effective for specialized problem sets
      • Specializing on a problem/model made mapping/programming easy
      • For “unexpected” models, users have to resort to out-of-band/ad-hoc means
  • Distributed Object Oriented frameworks
    • ABCL [Yonezawa ‘90]
    • JavaRMI, Manta [Maassen et al. ‘99]
    • ProActive [Huet et al. ‘04]
    • Distributed object-oriented
      • Disperse objects among resources
    • Load delegation/distribution
      • Method invocations
      • RMI (Remote Method Invocation)
      • Async. RMIs for parallelism
    • RMI:
      • good abstraction
    • Extension of general language:
      • Allow flexible coding
  • Hurdles for DOO on the Grid
    • Race conditions
      • Simultaneous RMIs on 1 object
      • Active Objects
        • 1 object = 1 thread
        • Deadlocks:
        • e.g.: recursive calls
    • Handling asynchronous events
      • e.g., handling node joins
      • Why not event-driven?
        • The flow of the program becomes segmented and hard to follow
    • Handling joins/failures
      • Difficult to handle them transparently in a reasonable manner
  • Hurdles for Implementation
    • Connectivity with NAT/firewall
      • Solution: Build an overlay
    • Existing implementations
      • ProActive [Huet et al. ‘04]
        • Tree topology overlay
        • User must hand write connectable points
      • Jojo2 [Aoki et al. ‘06]
        • 2-level Hierarchical topology
          • SSH / UDP broadcast
        • Assumes a particular network topology/setting
          • out of user control
    • Requirements
      • Minimal user burden
  • Summary of the Problems
    • Distributed Object-Oriented on the Grid
      • Thread race conditions
      • Event handling
      • Node join/leave
      • Underlying connectivity
  • Proposal : gluepy
    • Grid-enabled distributed object-oriented framework
      • As a Python library
      • Glues together Grid resources via simple and flexible coding
    • Resolve the issues in an object-oriented paradigm
      • SerialObjects
        • define “ownership” for objects
        • blocking operations unblock on events
      • Constructs for handling Node join/leave
        • Resolve the “first reference” problem
        • Failures are abstracted as exceptions
      • Connectivity (NAT/firewall)
        • Peers automatically construct an overlay
  • The Basic Programming Model
    • RemoteObjects
      • Created/mapped to a process
      • Accessible from other processes (RMI)
      • Passive Objects
        • Threads are not bound to objects
    • Thread
      • Simply to gain parallelism
      • RMIs and async. invocations implicitly spawn a thread
    • Future
      • Returned for an async. invocation; a placeholder for the result
      • An uncaught exception is stored and re-raised at collection (see the sketch below)
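    To make the future semantics concrete, here is a minimal sketch in the style of the gluepy examples on the following slides. The peer object, its run() method, and handle_failure() are hypothetical; only the .future() / .get() / RemoteException idioms are taken from these slides.

      # async. RMI: returns immediately with a future
      f = peer.run.future(arg)
      # ... do other work in parallel ...
      try:
          result = f.get()            # blocks until the result arrives
      except RemoteException, e:      # an uncaught remote exception is
          handle_failure(e)           # stored and re-raised here, at collection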
  • Programming in gluepy
    • Basics: RemoteObject
      • Inherit Base class
      • Externally referenceable
    • Async. invocation with futures
      • No explicit threads
      • Easier to maintain a sequential flow
    • Mutual exclusion? Events?
      • ⇒ SerialObjects

      # inherit RemoteObject
      class Peer(RemoteObject):
          def run(self, arg):
              # work here…
              return result

      # async. RMI run() on all peers
      futures = []
      for p in peers:
          f = p.run.future(arg)
          futures.append(f)
      # wait for all results
      waitall(futures)
      # read all results
      for f in futures:
          print f.get()
  • “Ownership” with SerialObjects
    • SerialObjects
      • Objects with mutual exclusion
      • RemoteObject sub-class
    • No explicit locks
    • Ownership for each object
      • call ⇒ acquire
      • return ⇒ release
      • Method execution by only 1 thread
        • The “owner thread”
    • Owner releases ownership on blocking operations
      • e.g.: waitall(), RMI to another SerialObject
      • Pending threads contest for ownership; an arbitrary thread is scheduled
      • Eliminates deadlocks for recursive calls (see the sketch below)
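    A minimal sketch of how ownership release avoids the recursive-call deadlock of active objects. The classes A and B and their methods are hypothetical; SerialObject and the blocking-RMI semantics are from this slide.

      class A(SerialObject):
          def f(self, b):
              # an RMI to another SerialObject is a blocking operation, so the
              # calling thread releases ownership of this object while it waits
              return b.g(self)

          def h(self):
              return 42

      class B(SerialObject):
          def g(self, a):
              # the callback into 'a' can acquire its ownership, because 'a'
              # released it at the blocking call above: no deadlock
              return a.h()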
  • Signals to SerialObjects
    • We don’t want event-driven loops!
    • Events -> “signals”
      • Blocking ops unblock on a signal
    • Signals to objects
      • Unblock a thread blocking in the object’s context
        • If none, unblock the next thread that blocks
      • The unblocked thread can handle the signal (event)
  • SerialObjects in gluepy
    • e.g.: A queue
      • pop()
        • Blocks on an empty queue
      • add()
        • Calls signal() to unblock a waiter
    • Atomic section:
      • Between blocking ops in a method
      • Can update object attributes and invoke non-SerialObjects

      class DistQueue(SerialObject):
          def __init__(self):
              self.queue = []

          def add(self, x):
              self.queue.append(x)
              if len(self.queue) == 1:
                  self.signal()          # signal & wake a waiter

          def pop(self):
              while len(self.queue) == 0:
                  wait([])               # block until signal
              x = self.queue.pop(0)      # atomic section
              return x
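    As a usage note, a brief sketch of the blocking behaviour; DistQueue, signal(), and wait() are from this slide, while the surrounding calls are illustrative only.

      queue = DistQueue()
      queue.add("task-1")      # wakes any thread blocked in pop()
      item = queue.pop()       # returns immediately here, since the queue is non-empty;
                               # on an empty queue it would block until the next add()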
  • Managing dynamic resources
    • Node Join:
      • Python process starts
    • Node leave:
      • Process termination
    • Constructs for node joins/leaves
      • Node Join
        • ⇒ The “first reference” problem
        • Object lookup: obtain references to objects already in the computation
      • Node Leave
        • ⇒ RMI exception
        • Catch it to handle the failure
  • e.g.: Master-worker in gluepy (1/3)
    • Handles join/leave
    • Code for join:
      • A join invokes signal()
      • The signal unblocks the main master thread

      class Master(SerialObject):
          ...
          def nodeJoin(self, node):
              self.nodes.append(node)
              self.signal()                # signal for join

          def run(self):
              assigned = {}
              while True:
                  while len(self.nodes) > 0 and len(self.jobs) > 0:
                      ASYNC. RMIS TO IDLE WORKERS
                  readys = wait(futures)   # block & handle join
                  if readys == None:
                      continue
                  for f in readys:
                      HANDLE RESULTS
  • e.g. : Master-worker in gluepy (2/3)
    • Failure handling
      • Exception on collection
      • Handle exception to resubmit task
      for f in readys:
          node, job = assigned.pop(f)
          try:
              print "done:", f.get()
              self.nodes.append(node)
          except RemoteException, e:       # failure handling
              self.jobs.append(job)
  • e.g.: Master-worker in gluepy (3/3)
    • Deployment
      • Master exports its object
      • Workers get a reference and do an RMI to join

      # Worker init
      worker = Worker()
      master = RemoteRef("master")   # lookup on join
      master.nodeJoin(worker)
      while True: sleep(1)

      # Master init
      master = Master()
      master.register("master")
      master.run()
  • Automatic Overlay Construction(1)
    • Solution for Connectivity
      • Automatically construct an overlay
    • TCP overlay
      • On boot, acquire other peers’ info
      • Each node connects to a small number of peers
      • Establish a connected graph (see the conceptual sketch below)
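    A conceptual sketch of the bootstrap step, not gluepy’s actual code: assume each peer learns a list of other peers’ endpoints at boot and attempts TCP connections to a small random subset, so that the union of links forms a connected graph with high probability.

      import random, socket

      def bootstrap(known_endpoints, fanout=3):
          """Attempt TCP connections to a few randomly chosen peers."""
          links = []
          for host, port in random.sample(known_endpoints,
                                          min(fanout, len(known_endpoints))):
              try:
                  links.append(socket.create_connection((host, port), timeout=5))
              except socket.error:
                  pass   # peer behind NAT/firewall; it may connect to us instead
          return links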
  • Automatic Overlay Construction(2)
    • Firewalled clusters
      • Automatic port-forwarding
      • The user configures SSH info
    • Transparent routing
      • Peer-to-peer communication is routed (AODV [Perkins ‘97])

      # config file
      use src_pat dst_pat, prot=ssh, user=kenny
  • RMI failure detection on Overlay
    • Problem with overlay
      • A route consists of a number of connections
    • RMI failure
      • ⇒ failure of any intermediate connection
    • Path pointers
      • Recorded on each forwarding node
      • The RMI reply returns along the path it came
    • Failure of an intermediate connection
      • The preceding forwarding node back-propagates the failure to the invoker (see the sketch below)
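    A conceptual sketch (not gluepy’s actual code) of the path-pointer idea, assuming each forwarding node keeps a table from in-flight RMI id to its previous and next hop.

      class ForwardingNode:
          def __init__(self):
              self.path_pointers = {}                 # rmi_id -> (prev_hop, next_hop)

          def forward_request(self, rmi_id, prev_hop, next_hop, msg):
              self.path_pointers[rmi_id] = (prev_hop, next_hop)
              next_hop.send(msg)

          def forward_reply(self, rmi_id, msg):
              prev_hop, _ = self.path_pointers.pop(rmi_id)
              prev_hop.send(msg)                      # the reply retraces the recorded path

          def on_connection_failure(self, failed_hop):
              # back-propagate a failure for every RMI that was routed over the
              # broken connection, so each invoker eventually sees an RMI failure
              for rmi_id, (prev_hop, next_hop) in list(self.path_pointers.items()):
                  if next_hop is failed_hop:
                      del self.path_pointers[rmi_id]
                      prev_hop.send(("RMI_FAILURE", rmi_id))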
  • Agenda
    • Introduction
    • Related Work
    • Proposal
    • Evaluation
    • Conclusion
  • Experimental Environment
    • InTrigger: a Grid platform in Japan
    • Clusters (cores): hongo (98), chiba (186), okubo (28), suzuk (72), imade (60), kototoi (88), kyoto (70), hiro (88), mirai (48), istbs (316), tsubame (64)
    • Some clusters have only private IPs or firewalls that drop all packets, and require SSH forwarding
    • Max. scale: 9 clusters, over 900 cores
  • Necessary Configuration
    • Configuration necessary for Overlay
      • 2 clusters (tsubame, istbs) require SSH port-forwarding to other clusters
        • ⇒ 2 lines of configuration
      • Connection instructions are added by regular expression

      # istbs cluster uses SSH for inter-cluster conn.
      use 133.11.23. (?!133.11.23.), prot=ssh, user=kenny
      # tsubame cluster gateway uses SSH for inter-cluster conn.
      use 131.112.3.1 (?!172.17.), prot=ssh, user=kenny
  • Overlay Construction Simulation
      • Evaluate the overlay construction scheme
      • For different cluster configurations, varied the number of attempted connections per peer
      • 1,000 trials for each cluster / attempted-connection configuration
    (Figure: 28 global / 238 private peers case: 95%)
  • Dynamic Master-Worker
    • Master object distributes work to Worker objects
      • 10,000 tasks issued as RMIs
    • Workers repeat join/leave
      • Tasks for failed nodes are redistributed
      • No tasks were lost during the experiment
  • A Real-life Application
    • A combinatorial optimization problem
      • Permutation Flow Shop Problem
      • parallel branch-and-bound
        • Master-Worker like
        • Requires periodic exchange of bounds
      • Code
        • 250 lines of Python code as glue code
        • Each worker node starts up sequential C++ code
          • Communicates with the local Python process through pipes (see the sketch below)
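    A hedged sketch of the worker-side glue: the solver binary name, its line-based protocol, and the doJob() signature are hypothetical; only the “sequential C++ talking to local Python through pipes” structure comes from the slide.

      import subprocess

      class Worker(RemoteObject):
          def __init__(self):
              # hypothetical solver binary speaking a line-based protocol on stdin/stdout
              self.proc = subprocess.Popen(["./pfsp_solver"],
                                           stdin=subprocess.PIPE,
                                           stdout=subprocess.PIPE)

          def doJob(self, subproblem, bound):
              # hand a subproblem and the current best bound to the C++ process
              self.proc.stdin.write("%s %s\n" % (subproblem, bound))
              self.proc.stdin.flush()
              # read back the result line produced by the sequential solver
              return self.proc.stdout.readline().strip()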
  • Master-Worker interaction
    • Master does RMI to worker
      • Worker: periodic RMIs to the master
      • Not your typical master-worker; requires a flexible framework like ours
    (Diagram: the master calls doJob() on workers; workers call exchange_bound() on the master)
  • Performance
    • Work rate (see the formula sketch below)
      • c_i : total comp. time on core i
      • N : num. of cores
      • T : completion time
    • Slight drop with 950 cores
      • Due to the master node becoming overloaded
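    The slide lists only the variables; under the usual definition of work rate (an assumption, consistent with those variables), the plotted quantity would be

      \text{work rate} = \frac{\sum_i c_i}{N \cdot T}

    i.e. a value of 1 means all N cores were doing useful computation for the entire run of length T.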
  • Troubleshoot Search Engine
    • Ever stuck debugging, or troubleshooting?
    • Re-rank query results obtained from Google
      • Use results from machine learning web forums
      • Perform natural language processing on page contents at query time
    • Use a Grid backend
      • Computationally intensive
      • Requires good response time (within tens of seconds)
    (Example query: “vmware kernel panic”)
  • Troubleshoot Search Engine Overview
    • A Python CGI merged with the Grid backend
    • Leveraged sync/async RMIs to seamlessly integrate parallelism into a sequential program (see the sketch below)
    (Diagram: CGI → async. doQuery() → doSearch() → async. doWork(); stages: parsing, graph extraction, rescoring)
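    A hedged sketch of what the CGI-side integration might look like; fetch_google_results() and render_page() are hypothetical helpers, "search-frontend" and doQuery() are reconstructions from the diagram labels, while RemoteRef() and .future()/.get() are from earlier slides.

      # inside the Python CGI script: sequential flow, parallelism via async. RMI
      frontend = RemoteRef("search-frontend")   # hypothetical registered backend object
      f = frontend.doQuery.future(query)        # async. RMI to the Grid backend
      pages = fetch_google_results(query)       # hypothetical: overlap local work
      reranked = f.get()                        # block until the backend finishes
      print render_page(pages, reranked)        # hypothetical rendering helper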
  • Agenda
    • Introduction
    • Related Work
    • Proposal
    • Evaluation
    • Conclusion
  • Conclusion
    • gluepy: Grid-enabled distributed object-oriented framework
      • Supports simple and flexible coding for complex Grid
        • SerialObjects
        • Signal semantics
        • Object lookup / exception on RMI failure
        • Automatic overlay construction
      • as a tool to glue together Grid resources simply and flexibly
    • Implemented and evaluated applications on the Grid
      • Max. scale: over 900 cores (9 clusters)
        • NAT/firewalls, with runtime joins/leaves
      • Parallelized real-life applications
        • Took full advantage of gluepy constructs for seamless programming
  • Questions?
    • gluepy is available from its homepage
      • www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy