  1. A Framework for Flexible Programming in Complex Grid Environments
      04/24/08, Taura Lab., 2nd Year, 76426, Ken Hironaka
  2. New Context for Grid Computing
      • Grid computing: computation across multiple clusters over a WAN
      • Conventionally used for high-performance computing by parallel-programming experts
      • Demands and needs are broadening:
        • Natural language processing
        • Genetic sequence analysis
        • Users now include people who are not parallel-programming experts
  3. More Applications for Grid Computing
      • From pure computation to applications in which computation is only one part
      • "Cloud computing"
        • Applications with a Grid computing backend that handles intensive computation and load balancing
        • e.g., web applications
      • [Figure: a publicly accessible application backed by a Grid backend]
      • A simple job submitter is not enough
  4. Problems with Conventional Frameworks
      • Conventional Grid computing
        • Distributed task-computation frameworks
        • No interaction between tasks
      • Broader applications and demands call for complex interaction
      • [Figure: independent tasks exchanging files vs. fine-grained interaction]
      • Need: flexible workflow coordination without loss of simplicity, i.e., programming support for the Grid
  5. Problems with Grid Computing
      • Deployment is complex in Grid environments:
        • Nodes join dynamically
        • Nodes and networks fail
        • Network environment: NAT and firewalls are prevalent, and WAN connections are unreliable
      • [Figure: nodes joining and leaving, a firewall, and a faulty link, raising questions of configuration, communication (sockets), and error handling]
      • Need: simple deployment on complex environments
  6. Our Contribution
      • A distributed object-oriented programming framework that alleviates the burden of Grid environments
        • Flexible programming without loss of simplicity
        • Simple deployment: runs on the Grid with minimal configuration
        • Real-life applications
          • Deployed an application on over 900 cores across 9 clusters
          • A "troubleshooting" search engine on the Grid, as an example of cloud computing
  7. Agenda
      • Introduction
      • Related Work
      • Proposal
      • Preliminary Experiments
      • Conclusion and Future Work
  8. Distributed Objects and RMI
      • ProActive [Huet '04]
      • Distributed object-oriented: objects reside on remote nodes
      • Work delegation via RMI (Remote Method Invocation), e.g., foo.doJob(args) runs the computation on remote object foo
      • Parallel computation via asynchronous RMI, which can cause race conditions
      • Active objects: 1 object = 1 thread, which can induce deadlocks
      • [Figure: objects a and b invoke b.f() and a.g() on each other through asynchronous RMI, resulting in deadlock]
      • Synchronization is needed, so code becomes cluttered with locks and complex
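The active-object deadlock can be reproduced in a few lines. This Python sketch is not ProActive's API: the class `ActiveObject` and the methods `invoke`, `f`, `g`, and `h` are hypothetical, and a single-worker `ThreadPoolExecutor` stands in for the one thread that serves each active object. Object a's thread blocks in `f()` waiting for `b.g()`, while `b.g()` waits for `a.h()`, which is queued behind `f()` on a's only thread; a timeout makes the deadlock observable instead of hanging.

```python
from concurrent import futures


class ActiveObject:
    """Toy active object: one dedicated thread serves every call (1 object = 1 thread)."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self._thread = futures.ThreadPoolExecutor(max_workers=1)

    def invoke(self, method, *args):
        # Asynchronous RMI: queue the call on the object's own thread, return a future.
        return self._thread.submit(getattr(self, method), *args)

    def f(self):
        # Runs on this object's only thread and waits synchronously for peer.g().
        return self.peer.invoke("g").result()

    def g(self):
        # Calls back into the peer, whose only thread is still busy inside f().
        # The timeout turns the would-be deadlock into an observable error.
        return self.peer.invoke("h").result(timeout=0.5)

    def h(self):
        return f"{self.name}.h"


def run_callback_chain():
    a, b = ActiveObject("a"), ActiveObject("b")
    a.peer, b.peer = b, a
    try:
        return a.invoke("f").result()
    except futures.TimeoutError:
        return "deadlock"
```

`run_callback_chain()` returns `"deadlock"`. Only after `f()` gives up does the queued `a.h()` run, which is exactly the call that was needed earlier.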
  9. Handling Joins and Failures
      • JoJo [Nakada '04]
        • Master-worker framework
        • Event-driven coding: a handler is invoked for each event
          • Task completion
          • Node joins
          • Node failures
      • [Figure: a join handler and a failure handler]
      • Drawbacks: synchronization issues, and with event-driven programming, code for more complex problems easily becomes unreadable
  10. Resolving Connectivity on the Grid
      • ProActive [Huet '04]
        • Overlay network for communication
        • Relies on manual network configuration files that specify each connection
      • [Figure: a connection configuration file specifying each link across NAT and firewall]
      • Configuration overhead becomes enormous at Grid scale, since each link must be configured
  12. Our Proposal
      • A distributed object-oriented framework for the Grid
        • Distributed objects with Grid programming support
          • Deadlock-free synchronization
          • Additional constructs to cope with node joins and failures
        • Automatic, adaptive overlay construction for the Grid runtime
      • Object-oriented with support for race conditions, joins, and failures: flexibility and simplicity
      • Deployment requires minimal configuration: simplicity
  13. Object Synchronization Model
      • Parallel programming with minimal use of explicit locks
      • Distributed objects with ownership
        • An object's methods can be executed by only one thread at a time: the owner thread
        • Eliminates data races
      • The owner gives up ownership for blocking operations
        • Other threads may contest for ownership
        • Eliminates deadlocks in common cases
      • [Figure: threads wait on an object held by one owner thread; when the owner blocks it gives up ownership to a new owner, and on unblocking it re-contests for ownership with the waiting threads]
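A minimal sketch of the ownership rule, assuming a plain `threading.Lock` per object as the ownership token and one thread per call as the stand-in for a blocking RMI. The names `owned`, `OwnedObject`, `_blocking`, and `_rmi` are hypothetical, not the framework's API. Object a's `f()` calls `b.g()`, which calls back `a.h()`; because a gives up ownership while blocked, the callback can run and the chain completes instead of deadlocking.

```python
import threading


def owned(method):
    """Run `method` only while the calling thread holds the object's ownership."""
    def wrapper(self, *args, **kwargs):
        with self._ownership:
            return method(self, *args, **kwargs)
    return wrapper


class OwnedObject:
    def __init__(self, name):
        self.name = name
        self.peer = None
        self.x = 0
        self._ownership = threading.Lock()  # held by the current owner thread

    def _blocking(self, fn, *args):
        # Give up ownership for the duration of a blocking operation,
        # then re-contest for it before touching the object's state again.
        self._ownership.release()
        try:
            return fn(*args)
        finally:
            self._ownership.acquire()

    def _rmi(self, target, method, *args):
        # Stand-in for a blocking RMI: run the call on another thread and wait.
        out = []
        t = threading.Thread(target=lambda: out.append(getattr(target, method)(*args)))
        t.start()
        t.join()
        return out[0]

    @owned
    def f(self):
        self.x += 1                                          # atomic section
        r = self._blocking(self._rmi, self.peer, "g", self)  # ownership released
        self.x -= 1                                          # atomic section
        return r

    @owned
    def g(self, caller):
        return caller.h()  # calls back into the object that is blocked in f()

    @owned
    def h(self):
        return f"{self.name}.h"
```

With `a, b = OwnedObject("a"), OwnedObject("b")` and `a.peer = b`, `a.f()` returns `"a.h"` and `a.x` is back to 0. The lock in this sketch is not reentrant, so an owned method that directly calls another owned method on the same object from the same thread would block; this sketch does not handle that case.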
  14. Adaptation to Dynamic Resources
      • Programming support for joining/leaving nodes
      • Decentralized object lookup lets joining nodes access other objects and join the computation
      • Node failure ⇒ RMI failure
        • The failure is returned as an exception from the method invocation
        • The user can catch the exception and perform rollback procedures if necessary
      • [Figure: a new object on a joining node looks up objects in the computation; an RMI to an object on a failed node raises an exception]
  15. Automatic Overlay Construction (1)
      • Automatic, transparent communication
      • Configuration needed ONLY for firewalled clusters
      • Adapts to dynamic joins/leaves
      • Nodes cooperatively build a TCP overlay network
        • Each node picks a small number of nodes to connect to
        • The nodes form a connected graph
      • [Figure: connection attempts and established connections among global-IP, NAT, and firewalled nodes]
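The construction can be simulated with a random graph. This sketch assumes a simple reachability rule (`can_connect`): any node can connect to a global-IP node, while a NAT node accepts connections only from inside its own cluster; firewalled clusters and SSH forwarding are left out. Each node tries `k` random peers, and a breadth-first search checks whether the resulting overlay is connected. The function names and the node dictionaries are illustrative.

```python
import random
from collections import defaultdict, deque


def can_connect(src, dst):
    # Assumed reachability rule: any node can open a TCP connection to a
    # global-IP node, but a node behind NAT accepts connections only from
    # inside its own cluster. Firewalled clusters are not modeled here.
    return dst["global"] or src["cluster"] == dst["cluster"]


def build_overlay(nodes, k, rng):
    """Each node attempts connections to k randomly chosen peers."""
    adj = defaultdict(set)
    for i, src in enumerate(nodes):
        others = [j for j in range(len(nodes)) if j != i]
        for j in rng.sample(others, min(k, len(others))):
            if can_connect(src, nodes[j]):
                adj[i].add(j)  # an established TCP connection is usable both ways
                adj[j].add(i)
    return adj


def is_connected(adj, n):
    """Breadth-first search from node 0; True if every node is reachable."""
    seen, queue = {0}, deque([0])
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n
```

Repeating `build_overlay` many times for a fixed cluster layout while varying `k` gives the kind of experiment described in the Overlay Construction Simulation slide.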
  16. Automatic Overlay Construction (2)
      • NAT clusters: NAT nodes can connect to global nodes
      • Firewalled clusters: automatic SSH port forwarding, with the forwarding points specified by the user
      • Transparent communication
        • Point-to-point communication is routed over the overlay
        • Ad-hoc routing protocol: AODV [Perkins '97]
        • Adapts to node joins/leaves
      • [Figure: SSH firewall traversal and point-to-point communication over the overlay]
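To illustrate on-demand routing over the overlay, this sketch keeps only the core of AODV's route discovery: a flooded route request leaves reverse pointers, and the route reply installs next-hop entries on its way back. Real AODV (RFC 3561) also uses destination sequence numbers, route lifetimes, and route-error messages, all omitted here. `discover_route` and `follow` are hypothetical helper names.

```python
from collections import deque


def discover_route(adj, src, dst):
    """Simplified AODV-style route discovery over an overlay `adj` (node -> neighbors).

    A route request floods outward from src; each node records the neighbor it
    first heard the request from (a reverse pointer). The route reply then walks
    those pointers back from dst, installing next-hop entries toward dst.
    """
    reverse = {src: None}
    queue = deque([src])
    while queue and dst not in reverse:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in reverse:
                reverse[v] = u
                queue.append(v)
    if dst not in reverse:
        return None  # no route: dst is unreachable
    next_hop, node = {}, dst
    while reverse[node] is not None:  # the route reply travels dst -> src
        next_hop[reverse[node]] = node
        node = reverse[node]
    return next_hop


def follow(next_hop, src, dst):
    """List the hops a message takes from src to dst using the installed entries."""
    hops = [src]
    while hops[-1] != dst:
        hops.append(next_hop[hops[-1]])
    return hops
```

On the overlay `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}`, a discovery from 0 to 4 installs entries that route messages along 0, 1, 3, 4.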
  17. Failure Detection on the Overlay
      • How do we detect failures on the overlay?
      • RMI failure: failure of an intermediate or end node ⇒ link failure
      • Path pointers
        • Forwarding nodes remember the next hop
        • The RMI reply is returned along the same path
      • On a link failure along a pointer, the failure is back-propagated to the invoker
      • [Figure: path pointers from the invoker to the RMI handler; a failure is back-propagated along them]
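The path-pointer scheme can be modeled in a few lines, assuming each forwarding node stores, per request id, both the hop the request came from (for the reply) and the hop it went to (to recognize which requests a broken link affects). `OverlayNode`, `forward`, `send_back`, and `on_link_failure` are illustrative names; detecting the broken link itself (for example, through a TCP connection reset) is assumed to happen elsewhere.

```python
class OverlayNode:
    def __init__(self, name):
        self.name = name
        self.pointers = {}  # request id -> (previous hop, next hop): the path pointer
        self.outcomes = {}  # filled in at the invoker: request id -> "reply" / "failure"


def forward(route, req_id):
    """Forward an RMI request along `route` (invoker first, target last)."""
    for i, node in enumerate(route):
        prev = route[i - 1] if i > 0 else None
        nxt = route[i + 1] if i + 1 < len(route) else None
        node.pointers[req_id] = (prev, nxt)


def send_back(node, req_id, outcome):
    """Retrace the path pointers to the invoker, clearing them along the way."""
    while True:
        prev, _ = node.pointers.pop(req_id)
        if prev is None:  # reached the invoker
            node.outcomes[req_id] = outcome
            return node
        node = prev


def on_link_failure(node, dead_neighbor):
    """`node` lost its link to `dead_neighbor`: fail every request routed through it."""
    for req_id, (_, nxt) in list(node.pointers.items()):
        if nxt is dead_neighbor:
            send_back(node, req_id, "failure")
```

In the normal case the RMI handler calls `send_back(target, req_id, "reply")`; when an intermediate hop dies, its upstream neighbor calls `on_link_failure`, and the invoker receives `"failure"` for every request that was routed through the broken link.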
  19. Experiment Cluster Settings
      • 900 cores over 9 clusters: hongo, chiba, okubo, suzuk, imade, kototoi, kyoto, istbs, tsubame
      • [Figure: clusters with global IPs, clusters with private IPs, and firewalled clusters where all packets are dropped]
  20. Overlay Construction Simulation
      • Evaluates the overlay construction scheme
      • For different cluster configurations, varied the number of attempted connections per peer
      • 1000 trials for each combination of cluster configuration and number of attempted connections
      • Even in the pathological case, 20 connections per peer are enough to form a connected overlay
  21. Dynamic Master-Worker
      • A master object distributes work to worker objects
        • 10,000 tasks in total
        • Tasks are issued as RMIs
      • Worker nodes join/leave at runtime
        • New tasks go to new nodes
        • Tasks on failed nodes are reassigned
        • No tasks were lost during the computation
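The reassignment logic can be sketched as a single-threaded simulation; in the framework, `do_job` would be an RMI and `NodeFailure` the exception it raises when the worker's node fails. `Worker`, `NodeFailure`, `run_master`, and the `fail_after`/`joins` parameters are hypothetical. Failed workers are dropped and their task is put back in the queue; workers that join receive tasks from the round in which they join.

```python
from collections import deque


class NodeFailure(Exception):
    """Hypothetical stand-in for the exception an RMI raises when its target node fails."""


class Worker:
    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # tasks completed before this node "fails"
        self.completed = 0

    def do_job(self, task):
        # In the framework this would be an RMI to a remote worker object.
        if self.fail_after is not None and self.completed >= self.fail_after:
            raise NodeFailure(self.name)
        self.completed += 1
        return task * task


def run_master(tasks, workers, joins):
    """joins maps a round number to the workers whose nodes join at that round."""
    pending, results, pool = deque(tasks), {}, list(workers)
    rnd = 0
    while pending:
        pool.extend(joins.get(rnd, []))  # new node, new tasks
        if not pool:
            raise RuntimeError("no workers left")
        for w in list(pool):
            if not pending:
                break
            task = pending.popleft()
            try:
                results[task] = w.do_job(task)
            except NodeFailure:
                pool.remove(w)        # forget the failed node
                pending.append(task)  # roll back: the task goes back in the queue
        rnd += 1
    return results
```

For example, with one worker that fails after three tasks, one reliable worker, and a third worker joining in round 2, all 100 tasks of `range(100)` still end up in the result.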
  22. Dynamic Master-Worker
      • As the number of workers changes, the number of assigned tasks changes accordingly.
      • The master adaptively distributes, rolls back, and redistributes tasks.
  23. A Real-Life Application
      • Solving a combinatorial optimization problem: the permutation flow shop problem
      • Parallel branch-and-bound
        • Master-worker style
        • Periodic updates
      • Work distribution
        • Divide the search space evenly into subtasks
        • Load balancing
          • Unfinished tasks are subdivided and redistributed
          • Wasteful computation is quite possible
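In the permutation flow shop, every job visits machines 1..m in the same order and all machines process the jobs in the same sequence; here the goal is assumed to be choosing that sequence to minimize the makespan. This is a sequential branch-and-bound with a deliberately simple lower bound (remaining work per machine); it is not the bound used in the experiments, which the slides do not specify. In the parallel version described above, subtrees of `search` would be the subtasks handed to workers. All function names are hypothetical.

```python
from itertools import permutations


def advance(c, times):
    """Append one job to a partial schedule.

    c[k] is when machine k finishes the jobs scheduled so far; times[k] is the
    new job's processing time on machine k. Returns the updated finishing times.
    """
    out = []
    for k, t in enumerate(times):
        out.append(max(c[k], out[k - 1] if k else 0) + t)
    return out


def makespan(perm, p):
    c = [0] * len(p[0])
    for j in perm:
        c = advance(c, p[j])
    return c[-1]


def branch_and_bound(p):
    """Exact search over job orders; p[j][k] = time of job j on machine k."""
    n, m = len(p), len(p[0])
    best = {"value": float("inf"), "perm": None}  # incumbent (upper bound)

    def lower_bound(c, remaining):
        # Machine k still has to process every remaining job after time c[k],
        # and no job can leave the last machine before it leaves machine k.
        return max(c[k] + sum(p[j][k] for j in remaining) for k in range(m))

    def search(perm, c, remaining):
        if not remaining:
            if c[-1] < best["value"]:
                best["value"], best["perm"] = c[-1], list(perm)
            return
        if lower_bound(c, remaining) >= best["value"]:
            return  # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):
            perm.append(j)
            search(perm, advance(c, p[j]), remaining - {j})
            perm.pop()

    search([], [0] * m, frozenset(range(n)))
    return best["value"], best["perm"]


def brute_force(p):
    # Exhaustive check, feasible only for a handful of jobs.
    return min(makespan(q, p) for q in permutations(range(len(p))))
```

For small instances the result can be checked against `brute_force`; the incumbent held in `best` is the kind of bound that workers would periodically exchange with the master.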
  24. Master-Worker Coordination
      • The master performs RMIs to workers (doJob())
        • Workers periodically exchange bounds with the master (exchange_bound())
        • Not a straightforward master-worker application
        • Requires a flexible framework like ours
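The two directions of interaction can be sketched with threads standing in for remote nodes: the master calls each worker (`worker_do_job`, playing the role of doJob()), and workers call back into the master (`exchange_bound`) every few steps. `Master`, `worker_do_job`, `period`, and `run_workers` are hypothetical; the lock guards the shared incumbent, a role that object ownership would play in the framework. Workers here just scan precomputed candidate costs instead of searching.

```python
import threading


class Master:
    """Hypothetical master object holding the global incumbent (best bound so far)."""

    def __init__(self):
        self._lock = threading.Lock()  # serializes updates to the shared bound
        self.best = float("inf")

    def exchange_bound(self, local_best):
        # Worker -> master call: merge the worker's bound, return the global one.
        with self._lock:
            self.best = min(self.best, local_best)
            return self.best


def worker_do_job(master, candidates, period=3):
    # Master -> worker call (doJob): scan candidate costs, syncing every `period` steps.
    local_best = float("inf")
    for i, cost in enumerate(candidates, 1):
        local_best = min(local_best, cost)
        if i % period == 0:
            # A tighter bound found elsewhere would let a real search prune more.
            local_best = master.exchange_bound(local_best)
    return master.exchange_bound(local_best)


def run_workers(jobs):
    master = Master()
    threads = [threading.Thread(target=worker_do_job, args=(master, job)) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return master.best
```

`run_workers([[9, 7, 8], [5, 6, 4, 10], [12, 3]])` returns 3, the best cost seen by any worker.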
  25. Runtime Speedup
      • The speedup lacks scalability at over 900 cores
  26. Cumulative Computation Time
      • The growth in cumulative computation time is attributed to increased re-execution of tasks
      • Taking the cumulative computation time into account, the speedup from 169 cores to 948 cores (5.64 times) is 4.94
  27. Troubleshoot Search Engine
      • Ever been stuck debugging or troubleshooting?
      • Re-rank Google query results, giving weight to pages from web forums and pages with solutions
        • Natural language processing and machine learning
      • Parallel computation on a Grid backend
        • Real-time response
      • [Figure: the query "vmware kernel panic" is sent to the search engine, whose Grid backend computes in parallel]
  29. Conclusion
      • A distributed object-oriented programming framework for Grid environments
        • A novel distributed object-oriented programming model
        • Grid-enabled via automatic overlay construction
      • Showed that our framework can address the needs of real-life Grid applications
        • Deployed actual parallel applications on over 900 cores across 9 clusters with NAT/firewalls, joins, and failures
        • Implemented a Grid computing backend for a troubleshooting search engine
  30. Future Work
      • Reliable WAN communication for Grid overlays
        • Node failure
        • Connection failure
      • Weakness of WAN connections
        • Router policies that close connections after a given period
        • Obscure kernel bugs with NAT that cause connection resets
      • [Figure: a faulty link] WAN links are more vulnerable, and failures will occur
  31. Some Related Work
      • Robust Tree Topologies for Sensor Networks [England '06]
        • Creates a spanning tree for data reduction
        • Flat tree (fewest hops) for high reliability
        • Tree with short distances (shortest path) for low power consumption
        • ⇒ A spanning tree that merges the two metrics to get the best of both worlds
      • [Figure: fewest hops gives high reliability but high power usage; shortest path gives low reliability but low power usage]
  32. Possible Future Direction
      • Our context: Grid computing
        • Communication latency as a metric for link reliability
      • Fewest hops: reliability against node failure
      • Shortest distance: reliability against link failure
      • [Figure: short, reliable links vs. long, faulty links]
      • Can we construct an overlay topology that takes the best of both worlds?
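One simple way to explore this question is to blend the two metrics into a single link cost and build a shortest-path tree. The sketch uses the cost `alpha + (1 - alpha) * latency` per link with Dijkstra's algorithm, so `alpha = 1` yields a fewest-hop tree and `alpha = 0` a lowest-latency tree. This weighted sum is an illustrative assumption, not the method of [England '06]; `blended_tree` and its parameters are hypothetical.

```python
import heapq


def blended_tree(adj, root, alpha):
    """Shortest-path tree under the blended per-link cost alpha + (1 - alpha) * latency.

    adj maps node -> {neighbor: latency}; returns parent pointers (root -> None).
    """
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        for v, latency in adj[u].items():
            nd = d + alpha + (1 - alpha) * latency
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent
```

With a direct but slow link r-a (latency 10) and a two-hop fast path r-b-a (latency 1 each), `alpha = 1` attaches a directly to r, while `alpha = 0` routes it through b; intermediate values trade one metric against the other.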
  33. Publications
      1. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In 8th IEEE International Symposium on Cluster Computing and the Grid, May 2008 (poster paper, to appear).
      2. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In IPSJ Transactions on Advanced Computing Systems (conditional accept).
      3. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In Proceedings of 2008 Symposium on Advanced Computing Systems and Infrastructures (to appear).
      4. Ken Hironaka, Shogo Sawai, Kenjiro Taura. A Distributed Object-Oriented Library for Computation Across Volatile Resources. In Summer United Workshops on Parallel, Distributed and Cooperative Processing, August 2007.
      5. Ken Hironaka, Kenjiro Taura, Takashi Chikayama. A Low-Stretch Object Migration Scheme for Wide-Area Environments. In IPSJ Transactions on Programming, Vol. 48, No. SIG 12 (PRO 34), pp. 28-40, August 2007.
  36. Problems with Grid Computing (2)
      • Complexity of programming on the Grid
        • Low-level programming (sockets)
          • Communication
          • Multi-threaded computing (synchronization)
          • Heavy burden on non-experts
        • Flexibility and integration
          • Grid frameworks for task distribution
          • Independent parallel programming languages
          • Computing is not just the execution of many independent tasks: finer-grained communication is needed
          • Poor interfaces with user applications (Java, Ruby, Python, PHP)
  37. Related Work
      • Discussed with respect to the criteria necessary for modern Grid computing
        • Workflow coordination: flexibility without putting the burden on the user
        • Joining nodes / failure of resources: handling these events should not dominate the programming effort
        • Connectivity in wide-area networks: adaptation to networks with NAT/firewalls with few manual settings
  38. Workflow Coordination (1)
      • Condor / DAGMan [Thain '05]
        • "Tasks" are expressed as script files and distributed to idle nodes
        • Dependencies between tasks can be expressed as a DAG (directed acyclic graph)
      • Ibis / Satin [Wrzesinska '06]
        • Framework for divide-and-conquer problems
        • Tasks can be broken into smaller subtasks, on which they depend
      • [Figure: a central manager assigns tasks with DAG dependency relationships to nodes in a cluster, some of which are busy]
      • Many computations cannot be expressed as "tasks" with dependencies
      • A task can communicate only with the tasks it has dependencies with
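DAG-style dependencies like DAGMan's can be illustrated with Python's standard `graphlib.TopologicalSorter` (Python 3.9+): each "wave" holds the tasks whose dependencies are all done and that a central manager could therefore hand to idle nodes at once. This shows the dependency model only; it is not DAGMan's file syntax, and `dispatch_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def dispatch_waves(dependencies):
    """dependencies maps each task to the set of tasks it depends on.

    Returns batches of tasks that could be handed to idle nodes together.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies satisfied
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished
    return waves
```

For a diamond where B and C depend on A and D depends on both, `dispatch_waves({"B": {"A"}, "C": {"A"}, "D": {"B", "C"}})` yields `[["A"], ["B", "C"], ["D"]]`. Note that nothing in this model lets B and C talk to each other while running, which is the limitation the slide points out.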
  39. Object Synchronization Example

          class A:
              def __init__(self, x):
                  self.x = x

              def f(self, b):
                  self.x += 1  # atomic section
                  b.g()        # blocking RMI: a gives up ownership while blocked
                  self.x -= 1  # atomic section
                  return

      • In method f(), instance a invokes the blocking method g() on object b
      • Only one thread at a time executes inside a; a gives up ownership during the blocking RMI
      • The value x stays consistent
  40. Adaptation to Dynamic Resources
      • Signal delivery to objects
        • Unblocks any thread that is blocked in the object's context
          • Can be used to notify asynchronous events, such as a joining node
      • Node failure ⇒ RMI failure
        • The failure is returned as an exception from the method invocation
        • The user can catch the exception and perform rollback procedures if necessary
      • [Figure: a signal unblocks a thread blocked in an object; a failed RMI raises an exception]
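Signal delivery can be sketched with a condition variable, assuming a signal is an event value that wakes threads blocked in the object's context. `SignalableObject`, `wait_for_signal`, and `signal` are hypothetical names. `notify_all` wakes every waiting thread; in this sketch the first thread to reacquire the lock consumes the event and the others go back to waiting.

```python
import threading


class SignalableObject:
    """Hypothetical sketch: signals wake threads blocked in this object's context."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = []  # delivered but not yet consumed signals

    def wait_for_signal(self, timeout=None):
        # Block until a signal arrives (or the timeout expires); return it or None.
        with self._cond:
            if not self._cond.wait_for(lambda: self._pending, timeout=timeout):
                return None
            return self._pending.pop(0)

    def signal(self, event):
        # Deliver an asynchronous event, e.g. "a node joined".
        with self._cond:
            self._pending.append(event)
            self._cond.notify_all()
```

A master blocked in `wait_for_signal()` could, for example, be woken by a joining node calling `signal("node joined")` and then hand that node new work.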
  41. Preliminary Experiments
      • Overlay construction simulation
      • A simple master-worker application with dynamically joining/leaving nodes
      • A real-life parallel application on the Grid
      • A troubleshooting search engine