Distributed Computing


Comprehensive study of parallel, cluster, distributed, grid and cloud computing paradigms

Published in: Technology


  1. 1. Distributed Computing Sudarsun Santhiappan sudarsun@{burning-glass.com, gmail.com} Burning Glass Technologies Kilpauk, Chennai 600010
  2. 2. Technology is Changing... <ul><li>Computational Power gets Doubled every 18 months
  3. 3. Networking Bandwidth and Speed getting Doubled every 9 months
  4. 4. How to tap the benefits of this Technology ?
  5. 5. Should we grow as an Individual ?
  6. 6. Should we grow as a Team ? </li></ul>
  7. 7. The Coverage Today <ul><li>Parallel Processing
  8. 8. Multiprocessor or Multi-Core Computing
  9. 9. Symmetric Multiprocessing
  10. 10. Cluster Computing {PVM}
  11. 11. Distributed Computing {TAO, OpenMP}
  12. 12. Grid Computing {Globus Toolkit}
  13. 13. Cloud Computing {Amazon EC2} </li></ul>
  14. 14. Parallel Computing <ul><li>It is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").
  15. 15. Multi-Core, Multiprocessor SMP, Massively Parallel Processing (MPP) Computers
  16. 16. Is it easy to write a parallel program ? </li></ul>
  17. 17. Cluster Computing <ul><li>A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer
  18. 18. Operate in shared memory mode (mostly)
  19. 19. Tightly coupled with high-speed networking, mostly with optical fiber channels.
  20. 20. HA, Load Balancing, Compute Clusters
  21. 21. Can we Load Balance using DNS ? </li></ul>
  22. 22. Distributed Computing <ul><li>Wikipedia : It deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime </li></ul>
  23. 23. Grid Computing <ul><li>Wikipedia: A form of distributed computing whereby a super and virtual computer is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform large tasks.
  24. 24. pcwebopedia.com : Unlike conventional networks that focus on communication among devices, grid computing harnesses unused processing cycles of all computers in a network for solving problems too intensive for any stand-alone machine.
  25. 25. IBM: Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities. Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.
  26. 26. Sun: Grid Computing is a computing infrastructure that provides dependable, consistent, pervasive and inexpensive access to computational capabilities. </li></ul>
  27. 27. Cloud Computing <ul><li>Wikipedia: It is a style of computing in which dynamically scalable and often virtualised resources are provided as a service over the Internet.
  28. 28. Infrastructure As A Service (IaaS)
  29. 29. Platform As A Service (PaaS)
  30. 30. Software as a Service (SaaS)
  31. 31. Provide common business applications online accessible from a web browser.
  32. 32. Amazon Elastic Compute Cloud (EC2), Google Apps </li></ul>
  33. 33. Hardware: IBM p690 Regatta 32 POWER4 CPUs (1.1 GHz) 32 GB RAM 218 GB internal disk OS: AIX 5.1 Peak speed: 140.8 GFLOP/s * Programming model: shared memory multithreading (OpenMP) (also supports MPI) * GFLOP/s: billion floating point operations per second
  34. 34. 270 Pentium4 XeonDP CPUs 270 GB RAM 8,700 GB disk OS: Red Hat Linux Enterprise 3 Peak speed: 1.08 TFLOP/s * Programming model: distributed multiprocessing (MPI) * TFLOP/s: trillion floating point operations per second Hardware: Pentium4 Xeon Cluster
  35. 35. 56 Itanium2 1.0 GHz CPUs 112 GB RAM 5,774 GB disk OS: Red Hat Linux Enterprise 3 Peak speed: 224 GFLOP/s * Programming model: distributed multiprocessing (MPI) * GFLOP/s: billion floating point operations per second Hardware: Itanium2 Cluster schooner.oscer.ou.edu New arrival!
  36. 36. Vector Processing <ul><li>It is based on array processors where the instruction set includes operations that can perform mathematical operations on data elements simultaneously
  37. 37. Example: Finding Scalar dot product between two vectors
  38. 38. Is vector processing a parallel computing model?
  39. 39. What are the limitations of Vector processing ?
  40. 40. Extensively in Video processing & Games... </li></ul>
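The dot-product example mentioned above can be sketched in Python. This is only an illustration: plain Python does not issue vector instructions, and the function names `dot_scalar` and `dot_vector_style` are hypothetical. The second function is merely shaped like a vector operation (one element-wise multiply over whole operands, then a reduction).

```python
from operator import mul

def dot_scalar(a, b):
    """One multiply-add per loop iteration, as a scalar processor would do it."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_vector_style(a, b):
    """Element-wise multiply over the whole vectors, followed by a reduction."""
    products = list(map(mul, a, b))  # conceptually a single vector multiply
    return sum(products)             # conceptually a single vector reduce

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(dot_scalar(a, b), dot_vector_style(a, b))
```

On hardware with vector units, the element-wise multiply is the part that can be issued as one instruction over many data elements.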
  41. 41. Pipelined Processing <ul><li>The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step.
  42. 42. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.
  43. 43. A non-pipeline architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle
  44. 44. Processors with pipelining are organized inside into stages which can semi-independently work on separate jobs </li></ul>
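The claim that a pipeline issues instructions at the rate of its slowest step can be checked with a small timing model. This is a minimal sketch under idealised assumptions (no stalls, no hazards, every instruction passes through every stage); the function names and the stage times are hypothetical.

```python
def unpipelined_time(n_instructions, stage_times):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * sum(stage_times)

def pipelined_time(n_instructions, stage_times):
    """The clock period is set by the slowest stage; after the pipeline
    fills (one cycle per stage), one instruction completes per cycle."""
    cycle = max(stage_times)
    stages = len(stage_times)
    return (stages + n_instructions - 1) * cycle

stage_times = [1, 1, 2, 1]  # assumed stage latencies in nanoseconds
print(unpipelined_time(100, stage_times))  # 100 * 5 ns
print(pipelined_time(100, stage_times))    # (4 + 99) * 2 ns
```

With these assumed numbers the pipelined machine needs 206 ns instead of 500 ns, even though a single instruction takes longer (8 ns instead of 5 ns) because every stage is stretched to the slowest one.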
  45. 45. Parallel Vs Pipelined Processing <ul><li>Parallel processing </li></ul><ul><li>Pipelined processing </li></ul>[Figure: two timing diagrams for processors P1-P4 over time. Parallel processing: each processor handles one data stream (P1: a1 a2 a3 a4; P2: b1 b2 b3 b4; P3: c1 c2 c3 c4; P4: d1 d2 d3 d4); less inter-processor communication, complicated processor hardware. Pipelined processing: each processor performs one type of operation on every stream (P1: a1 b1 c1 d1; P2: a2 b2 c2 d2; P3: a3 b3 c3 d3; P4: a4 b4 c4 d4); more inter-processor communication, simpler processor hardware. Colors: different types of operations performed; a, b, c, d: different data streams processed.]
  46. 46. Data Dependence <ul><li>Parallel processing requires NO data dependence between processors </li></ul><ul><li>Pipelined processing will involve inter-processor communication </li></ul>P1 P2 P3 P4 P1 P2 P3 P4 time time
  47. 47. Typical Computing Elements [Figure: layered diagram of a multi-processor computing system, from bottom to top: hardware (processors P), microkernel operating system, threads interface, applications and programming paradigms. Legend: process, processor, thread.]
  48. 48. Why Parallel Processing ? <ul><li>Computation requirements are ever increasing; for instance -- visualization, distributed databases, simulations, scientific prediction (ex: climate, earthquake), etc.
  49. 49. Sequential architectures are reaching physical limits (speed of light, thermodynamics)
  50. 50. Limit on the number of transistors per square inch
  51. 51. Limit on inter-component link capacitance </li></ul>
  52. 52. Symmetric Multiprocessing SMP <ul><li>Involves a multiprocessor computer architecture where two or more identical processors can connect to a single shared main memory
  53. 53. Kernel can execute on any processor
  54. 54. Typically each processor does self-scheduling from the pool of available processes or threads
  55. 55. Scalability problems in Uniform Memory Access
  56. 56. NUMA to improve speed, but limitations on data migration
  57. 57. Intel, AMD processors are SMP units
  58. 58. What is ASMP ? </li></ul>
  59. 61. SISD : A Conventional Computer <ul><li>Speed is limited by the rate at which computer can transfer information internally. </li></ul>Ex:PC, Macintosh, Workstations Processor Data Input Data Output Instructions
  60. 62. The MISD Architecture <ul><li>More of an intellectual exercise than a practical configuration. Few built, but commercially not available </li></ul>Data Input Stream Data Output Stream Processor A Processor B Processor C Instruction Stream A Instruction Stream B Instruction Stream C
  61. 63. SIMD Architecture Ex: CRAY machine vector processing, Intel MMX (multimedia support) C i <= A i * B i Instruction Stream Processor A Processor B Processor C Data Input stream A Data Input stream B Data Input stream C Data Output stream A Data Output stream B Data Output stream C
  62. 64. MIMD Architecture: Unlike SISD and MISD machines, an MIMD computer works asynchronously. Two variants: shared memory (tightly coupled) MIMD and distributed memory (loosely coupled) MIMD. [Figure: Processors A, B and C, each with its own instruction stream, data input stream and data output stream.]
  63. 65. Shared Memory MIMD machine <ul><li>Communication: Source Processor writes data to GM & destination retrieves it.
  64. 66. Limitation : reliability & expandability A memory component or any processor failure affects the whole system. </li></ul><ul><li>Increase of processors leads to memory contention. </li></ul>Ex. : Silicon graphics supercomputers.... Global Memory System Processor A Processor B Processor C MEMORY BUS MEMORY BUS MEMORY BUS
  65. 67. Distributed Memory MIMD <ul><li>Communication : IPC on High Speed Network.
  66. 68. Network can be configured to ... Tree, Mesh, Cube, etc.
  67. 69. Unlike Shared MIMD </li></ul><ul><ul><li>Readily expandable
  68. 70. Highly reliable (any CPU failure does not affect the whole system) </li></ul></ul>Processor A Processor B Processor C IPC channel IPC channel MEMORY BUS MEMORY BUS MEMORY BUS Memory System A Memory System B Memory System C
  69. 71. Laws of caution..... <ul><li>Speed of computers is proportional to the square of their cost. </li></ul>i.e. cost = √speed (speed = cost²) <ul><li>Speedup by a parallel computer increases as the logarithm of the number of processors. </li></ul><ul><ul><li>Speedup = log₂(no. of processors) </li></ul></ul>[Figure: two plots, speed S against cost C (S = C²) and speedup S against number of processors P (S = log P)]
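The two rules of thumb above (commonly known as Grosch's law and Minsky's conjecture) are easy to tabulate. The sketch below only evaluates the formulas as stated on the slide, in arbitrary units; the function names are hypothetical, and neither rule is an exact law.

```python
import math

def grosch_speed(cost):
    """Speed grows with the square of cost (arbitrary units)."""
    return cost ** 2

def log_speedup(processors):
    """Speedup grows only with log2 of the processor count."""
    return math.log2(processors)

for cost in (1, 2, 4):
    print(f"cost {cost} -> speed {grosch_speed(cost)}")

for p in (2, 8, 1024):
    print(f"{p} processors -> speedup {log_speedup(p)}")
```

Read together, the two rules are pessimistic about parallelism: doubling the money spent on one machine quadruples its speed, while going from 8 to 1024 processors only raises the predicted speedup from 3 to 10.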
  70. 72. Micro Kernel based Operating Systems for High Performance Computing
  71. 73. <ul>Three approaches to building OS.... </ul><ul><ul><li>Monolithic OS
  72. 74. Layered OS
  73. 75. Microkernel based OS </li></ul></ul> Client server OS Suitable for MPP systems <ul><li>Simplicity, flexibility and high performance are crucial for OS. </li></ul>Operating System Models
  74. 76. Monolithic Operating System <ul><li>Better application Performance
  75. 77. Difficult to extend </li></ul>Ex: MS-DOS Application Programs Application Programs System Services Hardware User Mode Kernel Mode
  76. 78. Layered OS <ul><li>Easier to enhance
  77. 79. Each layer of code accesses the interface of the layer below it
  78. 80. Low-application performance </li></ul>Application Programs System Services User Mode Kernel Mode Memory & I/O Device Mgmt Hardware Process Schedule Application Programs Ex : UNIX
  79. 81. Microkernel/Client Server OS (for MPP Systems) <ul><li>Tiny OS kernel providing basic primitives (process, memory, IPC)
  80. 82. Traditional services become subsystems
  81. 83. Application performance competitive with a monolithic OS
  82. 84. OS = Microkernel + User Subsystems </li></ul>Client Application Thread lib. File Server Network Server Display Server Microkernel Hardware Send Reply Ex: Mach, PARAS, Chorus, etc. User Kernel
  83. 85. What are Micro Kernels ? <ul><li>Small operating system core
  84. 86. Contains only essential core operating systems functions
  85. 87. Many services traditionally included in the operating system are now external subsystems </li></ul><ul><ul><li>Device drivers
  86. 88. File systems
  87. 89. Virtual memory manager
  88. 90. Windowing system
  89. 91. Security services </li></ul></ul>
  90. 93. HPC Cluster Architecture Frontend Node Public Ethernet Private Ethernet Network Application Network (Optional) Power Distribution (Net addressable units as option) Node Node Node Node Node Node Node Node Node Node
  91. 94. Most Critical Problems with Clusters <ul><li>The largest problem in clusters is software skew </li></ul><ul><ul><li>When software configuration on some nodes is different than on others
  92. 95. Small differences (minor version numbers on libraries) can cripple a parallel program </li></ul></ul><ul><li>The second most important problem is lack of adequate job control of the parallel process </li></ul><ul><ul><li>Signal propagation
  93. 96. Cleanup </li></ul></ul>
  94. 97. Top 3 Problems with Software Packages <ul><li>Software installation works only in interactive mode </li></ul><ul><ul><li>Needs significant work by the end user </li></ul></ul><ul><li>Often rational default settings are not available </li></ul><ul><ul><li>Extremely time consuming to provide values
  95. 98. Should be provided by package developers but … </li></ul></ul><ul><li>Package is required to be installed on a running system </li></ul><ul><ul><li>Means multi-step operation: install + update
  96. 99. Intermediate state can be insecure </li></ul></ul>
  97. 100. Clusters Classification..1 <ul><li>Based on Focus (in Market) </li></ul><ul><ul><li>High Performance (HP) Clusters </li></ul></ul><ul><ul><ul><li>Grand Challenging Applications </li></ul></ul></ul><ul><ul><li>High Availability (HA) Clusters </li></ul></ul><ul><ul><ul><li>Mission Critical applications </li></ul></ul></ul>
  98. 101. HA Cluster: Server Cluster with "Heartbeat" Connection
  99. 102. Clusters Classification..2 <ul><li>Based on Workstation/PC Ownership </li></ul><ul><ul><li>Dedicated Clusters
  100. 103. Non-dedicated clusters </li></ul></ul><ul><ul><ul><li>Adaptive parallel computing
  101. 104. Also called Communal multiprocessing </li></ul></ul></ul>
  102. 105. Clusters Classification..3 <ul><li>Based on Node Architecture .. </li></ul><ul><ul><li>Clusters of PCs (CoPs)
  103. 106. Clusters of Workstations (COWs)
  104. 107. Clusters of SMPs (CLUMPs) </li></ul></ul>
  105. 108. Building Scalable Systems: Cluster of SMPs (Clumps) Performance of SMP Systems Vs. Four-Processor Servers in a Cluster
  106. 109. Clusters Classification..4 <ul><li>Based on Node OS Type .. </li></ul><ul><ul><li>Linux Clusters (Beowulf)
  107. 110. Solaris Clusters (Berkeley NOW)
  108. 111. NT Clusters (HPVM)
  109. 112. AIX Clusters (IBM SP2)
  110. 113. SCO/Compaq Clusters (Unixware)
  111. 114. Digital VMS Clusters, HP clusters </li></ul></ul>
  112. 115. Clusters Classification..5 <ul>Based on Processor Arch, Node Type </ul><ul><li>Homogeneous Clusters </li></ul><ul><ul><li>All nodes will have similar configuration </li></ul></ul><ul><li>Heterogeneous Clusters </li></ul><ul><ul><li>Nodes based on different processors and running different Operating Systems </li></ul></ul>
  113. 116. Cluster Implementation <ul><li>What is Middleware ?
  114. 117. What is Single System Image ?
  115. 118. Benefits of Single System Image </li></ul>
  116. 119. What is Cluster Middle-ware ? <ul><li>An interface between user applications and cluster hardware and OS platform.
  117. 120. Middle-ware packages support each other at the management, programming, and implementation levels.
  118. 121. Middleware Layers: </li></ul><ul><ul><li>SSI Layer
  119. 122. Availability Layer: It enables the cluster services of </li></ul></ul><ul><ul><ul><li>Checkpointing, Automatic Failover, recovery from failure,
  120. 123. fault-tolerant operating among all cluster nodes. </li></ul></ul></ul>
  121. 124. Middleware Design Goals <ul><li>Complete Transparency </li></ul><ul><ul><li>Lets the user see a single cluster system.. </li></ul></ul><ul><ul><ul><li>Single entry point, ftp, telnet, software loading... </li></ul></ul></ul><ul><li>Scalable Performance </li></ul><ul><ul><li>Easy growth of cluster </li></ul></ul><ul><ul><ul><li>no change of API & automatic load distribution. </li></ul></ul></ul><ul><li>Enhanced Availability </li></ul><ul><ul><li>Automatic Recovery from failures </li></ul></ul><ul><ul><ul><li>Employ checkpointing & fault tolerant technologies </li></ul></ul></ul><ul><ul><li>Handle consistency of data when replicated.. </li></ul></ul>
  122. 125. What is Single System Image (SSI) ? <ul><li>A single system image is the illusion, created by software or hardware, that a collection of computing elements appears as a single computing resource. </li></ul><ul><li>SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
  123. 126. A cluster without a SSI is not a cluster </li></ul>
  124. 127. Benefits of Single System Image <ul><li>Usage of system resources transparently
  125. 128. Improved reliability and higher availability
  126. 129. Simplified system management
  127. 130. Reduction in the risk of operator errors
  128. 131. User need not be aware of the underlying system architecture to use these machines effectively </li></ul>
  129. 132. Distributed Computing <ul><li>No shared memory
  130. 133. Communication among processes </li></ul><ul><ul><li>Send a message
  131. 134. Receive a message </li></ul></ul><ul><li>Asynchronous
  132. 135. Synchronous
  133. 136. Synergy among processes </li></ul>
  134. 137. Messages <ul><li>Messages are sequences of bytes moving between processes
  135. 138. The sender and receiver must agree on the type structure of values in the message
  136. 139. “Marshalling”: laying out the data so that there is no ambiguity, such as “four chars” vs. “one integer”. </li></ul>
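The "four chars" versus "one integer" ambiguity can be made concrete with Python's standard `struct` module. This is a minimal sketch: the one-byte type tags `I` and `C` and the function names are assumptions made for the example, not part of any particular message-passing library.

```python
import struct

def marshal_int(value):
    # Type tag 'I', then a 4-byte unsigned integer in network (big-endian) order.
    return b"I" + struct.pack("!I", value)

def marshal_chars(value):
    # Type tag 'C', then exactly four raw bytes.
    return b"C" + struct.pack("!4s", value)

def unmarshal(message):
    # The tag tells the receiver how to interpret the four payload bytes.
    tag, body = message[:1], message[1:]
    if tag == b"I":
        return struct.unpack("!I", body)[0]
    if tag == b"C":
        return struct.unpack("!4s", body)[0]
    raise ValueError("unknown type tag")

as_int = marshal_int(0x41424344)
as_chars = marshal_chars(b"ABCD")
print(as_int[1:] == as_chars[1:])            # the payload bytes are identical
print(unmarshal(as_int), unmarshal(as_chars))
```

Without the agreed tag (or some other shared convention), the receiver could not tell whether the same four bytes mean the integer 1094861636 or the characters "ABCD".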
  137. 140. Message Passing <ul><li>Process A sends a data buffer as a message to process B.
  138. 141. Process B waits for a message from A, and when it arrives copies it into its own local memory.
  139. 142. No memory shared between A and B. </li></ul>
  140. 143. Message Passing <ul><li>Obviously, </li></ul><ul><ul><li>Messages cannot be received before they are sent.
  141. 144. A receiver waits until there is a message. </li></ul></ul><ul><li>Asynchronous </li></ul><ul><ul><li>Sender never blocks, even if infinitely many messages are waiting to be received
  142. 145. Semi-asynchronous is a practical version of above with large but finite amount of buffering </li></ul></ul>
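The semi-asynchronous case can be sketched with a bounded queue from the Python standard library. The buffer size of 3 and the helper `try_send` are assumptions for illustration. A real sender would typically block when the buffer is full; here the helper just reports that situation instead.

```python
import queue

mailbox = queue.Queue(maxsize=3)  # large-but-finite buffering, here only 3 slots

def try_send(m):
    """Return True if the message was buffered, False if the sender would block."""
    try:
        mailbox.put_nowait(m)
        return True
    except queue.Full:
        return False

accepted = [try_send(m) for m in range(5)]  # the sender runs ahead of the receiver
first = mailbox.get_nowait()                # the receiver takes one message, freeing a slot
accepted_after_receive = try_send(99)

print(accepted, first, accepted_after_receive)
```

The sender is never delayed while buffer space remains, which is the practical approximation of "sender never blocks".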
  143. 146. Message Passing: Point to Point <ul><li>Q: send(m, P) </li></ul><ul><ul><li>Send message M to process P </li></ul></ul><ul><li>P: recv(x, Q) </li></ul><ul><ul><li>Receive message from process Q, and place it in variable x </li></ul></ul><ul><li>The message data </li></ul><ul><ul><li>Type of x must match that of m
  144. 147. As if x := m </li></ul></ul>
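The `send(m, P)` / `recv(x, Q)` pair can be imitated with two threads and a queue. This is a sketch of the semantics only: threads in one Python process do share memory, so the message is deep-copied to mimic separate address spaces, and the `channels` table and the argument order of `send` and `recv` are assumptions for this example.

```python
import copy
import queue
import threading

# One FIFO channel per (sender, receiver) pair; names Q and P follow the slide.
channels = {("Q", "P"): queue.Queue()}

def send(m, src, dst):
    # Copy the message so the receiver ends up with its own data.
    channels[(src, dst)].put(copy.deepcopy(m))

def recv(dst, src):
    # Blocks until a message from src is available.
    return channels[(src, dst)].get()

received = []

def process_p():
    x = recv("P", "Q")           # P: recv(x, Q)
    received.append(x)

def process_q():
    m = {"job": 42}
    send(m, "Q", "P")            # Q: send(m, P)

threads = [threading.Thread(target=process_p), threading.Thread(target=process_q)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)
```

After both threads finish, P holds a value equal to what Q sent, "as if x := m", regardless of which thread started first.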
  145. 148. Broadcast <ul><li>One sender Q, multiple receivers P
  146. 149. Not all receivers may receive at the same time
  147. 150. Q: broadcast (m) </li></ul><ul><ul><li>Send message M to processes </li></ul></ul><ul><li>P: recv(x, Q) </li></ul><ul><ul><li>Receive message from process Q, and place it in variable x </li></ul></ul>
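Broadcast can be sketched the same way, with one inbox per receiver. The receiver names `P1` to `P3` and the `inboxes` table are assumptions for illustration; each receiver thread picks up its copy whenever it gets scheduled, so not all receive at the same time.

```python
import queue
import threading

receiver_names = ["P1", "P2", "P3"]
inboxes = {name: queue.Queue() for name in receiver_names}
delivered = {}

def broadcast(m):
    # Q: broadcast(m) places one copy of m in every receiver's inbox.
    for name in receiver_names:
        inboxes[name].put(m)

def receiver(name):
    # P: recv(x, Q) blocks until this receiver's copy arrives.
    delivered[name] = inboxes[name].get()

threads = [threading.Thread(target=receiver, args=(n,)) for n in receiver_names]
for t in threads:
    t.start()
broadcast("start")
for t in threads:
    t.join()
print(delivered)
```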
  148. 151. Synchronous Message Passing <ul><li>Sender blocks until receiver is ready to receive.
  149. 152. Cannot send messages to self.
  150. 153. No buffering. </li></ul>
  151. 154. Asynchronous Message Passing <ul><li>Sender never blocks.
  152. 155. Receiver receives when ready. </li></ul><ul><li>Can send messages to self. </li></ul><ul><li>Infinite buffering. </li></ul>
  153. 156. Message Passing <ul><li>Speed not so good </li></ul><ul><ul><li>Sender copies message into system buffers.
  154. 157. Message travels the network.
  155. 158. Receiver copies message from system buffers into local memory.
  156. 159. Special virtual memory techniques help. </li></ul></ul><ul><li>Programming Quality </li></ul><ul><ul><li>less error-prone cf. shared memory </li></ul></ul>
  157. 160. Distributed Programs <ul><li>Spatially distributed programs </li></ul><ul><ul><li>A part here, a part there, …
  158. 161. Parallel
  159. 162. Synergy </li></ul></ul><ul><li>Temporally distributed programs </li></ul><ul><ul><li>Compute half today, half tomorrow
  160. 163. Combine the results at the end </li></ul></ul><ul><li>Migratory programs </li></ul><ul><ul><li>Have computation, will travel </li></ul></ul>
  161. 164. Technological Bases of Distributed+Parallel Programs <ul><li>Spatially distributed programs </li></ul><ul><ul><li>Message passing </li></ul></ul><ul><li>Temporally distributed programs </li></ul><ul><ul><li>Shared memory </li></ul></ul><ul><li>Migratory programs </li></ul><ul><ul><li>Serialization of data and programs </li></ul></ul>
  162. 165. Technological Bases for Migratory programs <ul><li>Same CPU architecture </li></ul><ul><ul><li>X86, PowerPC, MIPS, SPARC, …, JVM </li></ul></ul><ul><li>Same OS + environment
  163. 166. Be able to “checkpoint” </li></ul><ul><ul><li>suspend, and
  164. 167. then resume computation
  165. 168. without loss of progress </li></ul></ul>
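Checkpointing as described above (suspend, then resume without loss of progress) can be sketched with `pickle` from the standard library. This only captures explicit program state held in a dictionary; checkpointing an arbitrary running process, as systems like Condor do, is far more involved. The `run` function and the state layout are hypothetical.

```python
import pickle

def run(state, steps):
    """Advance a toy computation (a running sum) by the given number of steps."""
    for _ in range(steps):
        state["total"] += state["i"]
        state["i"] += 1
    return state

# Run half of the work, then checkpoint (suspend).
state = run({"i": 0, "total": 0}, 5)
snapshot = pickle.dumps(state)      # bytes that could be written to disk or shipped elsewhere

# Later, possibly on another machine with a compatible environment: resume.
resumed = run(pickle.loads(snapshot), 5)

uninterrupted = run({"i": 0, "total": 0}, 10)
print(resumed, uninterrupted)
```

The resumed run ends in the same state as an uninterrupted one, which is the "without loss of progress" property.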
  166. 169. Message Passing Libraries <ul><li>Programmer is responsible for initial data distribution, synchronization, and sending and receiving information
  167. 170. Parallel Virtual Machine (PVM)
  168. 171. Message Passing Interface (MPI)
  169. 172. Bulk Synchronous Parallel model (BSP) </li></ul>
  170. 173. BSP: Bulk Synchronous Parallel model <ul><li>Divides computation into supersteps
  171. 174. In each superstep a processor can work on local data and send messages.
  172. 175. At the end of the superstep, a barrier synchronization takes place and all processors receive the messages which were sent in the previous superstep </li></ul>
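The superstep structure can be sketched with threads and `threading.Barrier`. This is an illustration of the model, not the BSP library API: the worker count, the ring-shaped communication pattern and all names are assumptions.

```python
import threading

N = 4
barrier = threading.Barrier(N)
values = [1, 2, 3, 4]
mailboxes = [[] for _ in range(N)]   # mailboxes[p] collects messages addressed to p
results = [None] * N

def worker(pid):
    # Superstep 1: work on local data and send a message to the right-hand neighbour.
    local = values[pid] * 10
    mailboxes[(pid + 1) % N].append(local)

    # Barrier synchronization: nobody reads messages until everyone has sent.
    barrier.wait()

    # Superstep 2: messages sent in the previous superstep are now available.
    results[pid] = local + sum(mailboxes[pid])

threads = [threading.Thread(target=worker, args=(p,)) for p in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Because reads happen only after the barrier, the result does not depend on the order in which the workers happened to run during the first superstep.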
  173. 176. BSP: Bulk Synchronous Parallel model <ul><li>http://www.bsp-worldwide.org/
  174. 177. Book: Rob H. Bisseling, “Parallel Scientific Computation: A Structured Approach using BSP and MPI,” Oxford University Press, 2004, 324 pages, ISBN 0-19-852939-2. </li></ul>
  175. 178. BSP Library <ul><li>Small number of subroutines to implement </li></ul><ul><ul><li>process creation,
  176. 179. remote data access, and
  177. 180. bulk synchronization. </li></ul></ul><ul><li>Linked to C, Fortran, … programs </li></ul>
  178. 181. Portable Batch System (PBS) <ul><li>Prepare a .cmd file </li></ul><ul><ul><li>naming the program and its arguments
  179. 182. properties of the job
  180. 183. the needed resources  </li></ul></ul><ul><li>Submit .cmd to the PBS Job Server: qsub command 
  181. 184. Routing and Scheduling: The Job Server </li></ul><ul><ul><li>examines .cmd details to route the job to an execution queue.
  182. 185. allocates one or more cluster nodes to the job
  183. 186. communicates with the Execution Servers (MOMs) on the cluster to determine the current state of the nodes.
  184. 187. When all of the needed nodes are allocated, passes the .cmd on to the Execution Server on the first node allocated (the "mother superior"). </li></ul></ul><ul><li>Execution Server </li></ul><ul><ul><li>Logs in on the first node as the submitting user and runs the .cmd file in the user's home directory.
  185. 188. Runs an installation-defined prologue script.
  186. 189. Gathers the job's standard output and standard error.
  187. 190. Runs an installation-defined epilogue script.
  188. 191. Delivers stdout and stderr to the user. </li></ul></ul>
  189. 192. TORQUE, an open source PBS <ul><li>Tera-scale Open-source Resource and QUEue manager (TORQUE) enhances OpenPBS
  190. 193. Fault Tolerance </li></ul><ul><ul><li>Additional failure conditions checked/handled
  191. 194. Node health check script support </li></ul></ul><ul><li>Scheduling Interface
  192. 195. Scalability </li></ul><ul><ul><li>Significantly improved server to MOM communication model
  193. 196. Ability to handle larger clusters (over 15 TF/2,500 processors)
  194. 197. Ability to handle larger jobs (over 2000 processors)
  195. 198. Ability to support larger server messages </li></ul></ul><ul><li>Logging
  196. 199. http://www.supercluster.org/projects/torque/ </li></ul>
  197. 200. PVM, and MPI <ul><li>Message passing primitives
  198. 201. Can be embedded in many existing programming languages
  199. 202. Architecturally portable
  200. 203. Open-sourced implementations </li></ul>
  201. 204. Parallel Virtual Machine ( PVM ) <ul><li>PVM enables a heterogeneous collection of networked computers to be used as a single large parallel computer.
  202. 205. Older than MPI
  203. 206. Large scientific/engineering user community
  204. 207. http://www.csm.ornl.gov/pvm/ </li></ul>
  205. 208. Message Passing Interface (MPI) <ul><li>http://www-unix.mcs.anl.gov/mpi/
  206. 209. MPI-2.0 http://www.mpi-forum.org/docs/
  207. 210. MPICH: www.mcs.anl.gov/mpi/mpich/ by Argonne National Laboratory and Mississippi State University
  208. 211. LAM: http://www.lam-mpi.org/
  209. 212. http://www.open-mpi.org/ </li></ul>
  210. 213. OpenMP for shared memory <ul><li>Shared-memory multithreading API
  211. 214. User gives hints as directives to the compiler
  212. 215. http://www.openmp.org </li></ul>
  213. 216. SPMD <ul><li>Single program, multiple data
  214. 217. Contrast with SIMD
  215. 218. Same program runs on multiple nodes
  216. 219. May or may not be lock-step
  217. 220. Nodes may be of different speeds
  218. 221. Barrier synchronization </li></ul>
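In the SPMD style every node runs the same program and uses its own rank to decide which part of the data to work on. The sketch below simulates three nodes sequentially in one Python process; the function `program`, the strided partitioning and the node count are assumptions for illustration.

```python
def program(rank, size, data):
    """The single program: every node runs this, only `rank` differs."""
    my_part = data[rank::size]   # strided slice: this node's share of the data
    return sum(my_part)

data = list(range(1, 11))        # 1..10
size = 3                         # number of simulated nodes

# On a real cluster these calls would run on different nodes, not in lock-step.
partials = [program(rank, size, data) for rank in range(size)]
total = sum(partials)            # combined after all nodes have finished
print(partials, total)
```

The final combination step is where a barrier or reduction would sit in a real SPMD program, since faster nodes must wait for slower ones before the total is valid.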
  219. 222. Condor <ul><li>Cooperating workstations: come and go.
  220. 223. Migratory programs </li></ul><ul><ul><li>Checkpointing
  221. 224. Remote IO </li></ul></ul><ul><li>Resource matching
  222. 225. http://www.cs.wisc.edu/condor/ </li></ul>
  223. 226. Migration of Jobs <ul><li>Policies </li></ul><ul><ul><li>Immediate-Eviction
  224. 227. Pause-and-Migrate </li></ul></ul><ul><li>Technical Issues </li></ul><ul><ul><li>Check-pointing: Preserving the state of the process so it can be resumed.
  225. 228. Migrating from one architecture to another </li></ul></ul>
  226. 229. OpenMosix Distro <ul><li>Quantian Linux </li></ul><ul><ul><li>Boot from DVD-ROM
  227. 230. Compressed file system on DVD
  228. 231. Several GB of cluster software
  229. 232. http://dirk.eddelbuettel.com/quantian.html </li></ul></ul><ul><li>Live CD/DVD or Single Floppy Bootables </li></ul><ul><ul><li>http://bofh.be/clusterknoppix/
  230. 233. http://sentinix.org/
  231. 234. http://itsecurity.mq.edu.au/chaos/
  232. 235. http://openmosixloaf.sourceforge.net/
  233. 236. http://plumpos.sourceforge.net/
  234. 237. http://www.dynebolic.org/
  235. 238. http://bccd.cs.uni.edu/
  236. 239. http://eucaristos.sourceforge.net/
  237. 240. http://gomf.sourceforge.net/ </li></ul></ul><ul><li>Can be installed on HDD </li></ul>
  238. 241. What is openMOSIX? <ul><li>An open source enhancement to the Linux kernel
  239. 242. Cluster with come-and-go nodes
  240. 243. System image model: Virtual machine with lots of memory and CPU
  241. 244. Granularity: Process
  242. 245. Improves the overall (cluster-wide) performance.
  243. 246. Multi-user, time-sharing environment for the execution of both sequential and parallel applications
  244. 247. Applications unmodified (no need to link with special library) </li></ul>
  245. 248. What is openMOSIX? <ul><li>Execution environment: </li></ul><ul><ul><li>farm of diskless x86 based nodes
  246. 249. UP (uniprocessor), or
  247. 250. SMP (symmetric multi processor)
  248. 251. connected by standard LAN (e.g., Fast Ethernet) </li></ul></ul><ul><li>Adaptive resource management to dynamic load characteristics </li></ul><ul><ul><li>CPU, RAM, I/O, etc. </li></ul></ul><ul><li>Linear scalability </li></ul>
  249. 252. Users’ View of the Cluster <ul><li>Users can start from any node in the cluster, or sysadmin sets-up a few nodes as login nodes
  250. 253. Round-robin DNS: “hpc.clusters” with many IPs assigned to same name
  251. 254. Each process has a Home-Node </li></ul><ul><ul><li>Migrated processes always appear to run at the home node, e.g., “ps” show all your processes, even if they run elsewhere </li></ul></ul>
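Round-robin DNS, where one name such as "hpc.clusters" maps to many IPs, can be imitated with a toy resolver. This is a simplification: real DNS servers usually rotate the order of the whole record list rather than return a single address, and the IP addresses and the class name below are hypothetical.

```python
from itertools import cycle

class RoundRobinResolver:
    """Toy resolver that hands out the addresses of a name in rotation."""

    def __init__(self, records):
        self._records = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name):
        return next(self._records[name])

resolver = RoundRobinResolver({
    "hpc.clusters": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],  # assumed login nodes
})
logins = [resolver.resolve("hpc.clusters") for _ in range(4)]
print(logins)
```

Successive logins land on different nodes, which spreads users across the login nodes without any knowledge of the actual load on each one.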
  252. 255. MOSIX architecture <ul><li>network transparency
  253. 256. preemptive process migration
  254. 257. dynamic load balancing
  255. 258. memory sharing
  256. 259. efficient kernel communication
  257. 260. probabilistic information dissemination algorithms
  258. 261. decentralized control and autonomy </li></ul>
  259. 262. A two tier technology <ul><li>Information gathering and dissemination </li></ul><ul><ul><li>Support scalable configurations by probabilistic dissemination algorithms
  260. 263. Same overhead for 16 nodes or 2056 nodes </li></ul></ul><ul><li>Pre-emptive process migration that can migrate any process, anywhere, anytime - transparently </li></ul><ul><ul><li>Supervised by adaptive algorithms that respond to global resource availability
  261. 264. Transparent to applications, no change to user interface </li></ul></ul>
  262. 265. Tier 1: Information gathering and dissemination <ul><li>In each unit of time (e.g., 1 second) each node gathers information about: </li></ul><ul><ul><li>CPU(s) speed, load and utilization
  263. 266. Free memory
  264. 267. Free proc-table/file-table slots </li></ul></ul><ul><li>Info sent to a randomly selected node
  265. 268. Scalable - more nodes better scattering </li></ul>
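The idea of sending load information to a randomly selected node can be sketched as a simple gossip simulation. This is not the openMOSIX algorithm itself, which also ages and limits the information it keeps; the function names, the string placeholders for load data and the fixed seed are assumptions.

```python
import random

def gossip_round(known, rng):
    """Each node sends everything it knew at the start of the round to one random peer."""
    n = len(known)
    outgoing = [dict(k) for k in known]          # snapshot of start-of-round knowledge
    for sender in range(n):
        target = rng.choice([i for i in range(n) if i != sender])
        known[target].update(outgoing[sender])

def simulate(n, rounds, seed=0):
    rng = random.Random(seed)                    # seeded for repeatable runs
    known = [{i: f"load-of-{i}"} for i in range(n)]  # initially each node knows only itself
    for _ in range(rounds):
        gossip_round(known, rng)
    return known

for rounds in (1, 3, 6):
    sizes = [len(k) for k in simulate(8, rounds)]
    print(rounds, "rounds -> nodes known per node:", sizes)
```

Each node sends exactly one message per round no matter how large the cluster is, which is why the per-node overhead stays constant as nodes are added.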
  266. 269. Tier 2: Process migration <ul><li>Load balancing: reduce variance between pairs of nodes to improve the overall performance
  267. 270. Memory ushering: migrate processes from a node that nearly exhausted its free memory, to prevent paging
  268. 271. Parallel File I/O: bring the process to the file-server, direct file I/O from migrated processes </li></ul>
  269. 272. Network transparency <ul><li>The user and applications are provided a virtual machine that looks like a single machine.
  270. 273. Example: Disk access from diskless nodes on fileserver is completely transparent to programs </li></ul>
  271. 274. Preemptive process migration <ul><li>Any user’s process can migrate to any other node, transparently and at any time.
  272. 275. The migrating process is divided into: </li><ul><li>system context ( deputy ) that may not be migrated from home workstation (UHN);
  273. 276. user context ( remote ) that can be migrated on a diskless node; </li></ul></ul>
  274. 277. Splitting the Linux process <ul><li>System context (environment) - site dependent- “home” confined
  275. 278. Connected by an exclusive link for both synchronous (system calls) and asynchronous (signals, MOSIX events)
  276. 279. Process context (code, stack, data) - site independent - may migrate </li></ul>Deputy Remote Kernel Kernel Userland Userland openMOSIX Link Local master node diskless node
  277. 280. Dynamic load balancing <ul><li>Initiates process migrations in order to balance the load of farm
  278. 281. responds to variations in the load of the nodes, runtime characteristics of the processes, number of nodes and their speeds
  279. 282. makes continuous attempts to reduce the load differences among nodes
  280. 283. the policy is symmetrical and decentralized </li><ul><li>all of the nodes execute the same algorithm
  281. 284. the reduction of the load differences is performed independently by any pair of nodes </li></ul></ul>
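The pairwise, decentralized policy can be sketched as follows. This is a minimal model under strong assumptions: load is just a process count, any process can be moved at no cost, and pairs are chosen at random. The real openMOSIX policy weighs node speeds and migration cost, and the function names here are hypothetical.

```python
import random

def balance_pair(loads, i, j):
    """Two nodes even out their load by migrating processes between them."""
    total = loads[i] + loads[j]
    loads[i] = total // 2
    loads[j] = total - loads[i]

def balance(loads, rounds, seed=0):
    rng = random.Random(seed)
    loads = list(loads)                      # work on a copy
    for _ in range(rounds):
        i, j = rng.sample(range(len(loads)), 2)   # any pair acts independently
        balance_pair(loads, i, j)
    return loads

before = [12, 0, 3, 9, 0, 6]                 # processes per node
after = balance(before, rounds=20)
print(before, "->", after)
```

No node ever needs a global view: each pairwise exchange keeps the total number of processes unchanged and can only narrow, never widen, the gap between the most and least loaded nodes.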
  282. 285. The ACE ORB <ul><li>What Is CORBA?
  283. 286. CORBA Basics </li></ul><ul><ul><li>Clients, Servers, and Servants
  284. 287. ORBs and POAs
  285. 288. IDL and the Role of IDL Compilers
  286. 289. IORs
  287. 290. Tying it all together </li></ul></ul><ul><li>Overview of ACE/TAO
  288. 291. CORBA Services </li></ul><ul><ul><li>Naming Service
  289. 292. Trading Service
  290. 293. Event Service </li></ul></ul><ul><li>Multi-Threaded Issues Using CORBA </li></ul>
  291. 294. What Is CORBA? <ul><li>Common Object Request Broker Architecture </li></ul><ul><ul><li>Common Architecture
  292. 295. Object Request Broker – ORB </li></ul></ul><ul><li>Specification from the OMG </li></ul><ul><ul><li>http://www.omg.org/technology/documents/corba_spec_catalog.htm
  293. 296. Must be implemented before usable </li></ul></ul>
  294. 297. What Is CORBA? <ul><li>More specifically: </li></ul><ul><ul><li>“(CORBA) is a standard defined by the Object Management Group (OMG) that enables software components written in multiple computer languages and running on multiple computers to work together” (1)
  295. 298. Allows for Object Interoperability, regardless of: </li></ul></ul><ul><ul><ul><li>Operating Systems
  296. 299. Programming Language
  297. 300. Takes care of Marshalling and Unmarshalling of Data </li></ul></ul></ul><ul><ul><li>A method to perform Distributed Computing </li></ul></ul>
  298. 301. What Is CORBA? Program A <ul><li>Running on a Windows PC
  299. 302. Written in Java </li></ul>Program B <ul><li>Running on a Linux Machine
  300. 303. Written in C++ </li></ul>CORBA
  301. 304. CORBA Basics: Clients, Servers, and Servants <ul><li>CORBA Clients </li></ul><ul><ul><li>An Application (program)
  302. 305. Requests services from a Servant object </li></ul></ul><ul><ul><ul><li>Invokes a method call </li></ul></ul></ul><ul><ul><li>Can exist on a different computer from the Servant </li></ul></ul><ul><ul><ul><li>Can also exist on the same computer, or even within the same program, as the Servant </li></ul></ul></ul><ul><ul><li>Implemented by Software Developer </li></ul></ul>
  303. 306. CORBA Basics: Clients, Servers, and Servants <ul><li>CORBA Servers </li></ul><ul><ul><li>An Application (program)
  304. 307. Performs setup needed to get Servants configured properly </li></ul></ul><ul><ul><ul><li>ORB’s, POA’s </li></ul></ul></ul><ul><ul><li>Instantiates and starts Servant object(s)
  305. 308. Once configuration done and Servant(s) running, Clients can begin to send messages
  306. 309. Implemented by Software Developer </li></ul></ul>
  307. 310. CORBA Basics: Clients, Servers, and Servants <ul><li>Servants </li></ul><ul><ul><li>Objects
  308. 311. Implement interfaces
  309. 312. Respond to Client requests
  310. 313. Exist within the same program as the Server that created and started them
  311. 314. Implemented by Software Developer </li></ul></ul>
  312. 315. ORB’s and POA’s <ul><li>ORB: Object Request Broker </li></ul><ul><ul><li>The “ORB” in “CORBA” </li></ul></ul><ul><ul><ul><li>At the heart of CORBA </li></ul></ul></ul><ul><ul><li>Enables communication
  313. 316. Implemented by ORB Vendor </li></ul></ul><ul><ul><ul><li>An organization that implements the CORBA Specification (a company, a University, etc.) </li></ul></ul></ul><ul><ul><li>Can be viewed as an API/Framework </li></ul></ul><ul><ul><ul><li>Set of classes and methods </li></ul></ul></ul><ul><ul><li>Used by Clients and Servers to properly set up communication </li></ul></ul><ul><ul><ul><li>Client and Server ORB’s communicate over a network
  314. 317. Glue between Client and Server applications </li></ul></ul></ul>
  315. 318. ORB’s and POA’s <ul><li>POA: Portable Object Adapter </li></ul><ul><ul><li>A central CORBA goal: Programs using different ORB’s (provided by different ORB Vendors) can still communicate
  316. 319. The POA was adopted as the solution
  317. 320. Can be viewed as an API/Framework </li></ul></ul><ul><ul><ul><li>Set of classes and methods </li></ul></ul></ul><ul><ul><li>Sits between ORB’s and Servants </li></ul></ul><ul><ul><ul><li>Glue between Servants and ORBs </li></ul></ul></ul><ul><ul><li>Job is to: </li></ul></ul><ul><ul><ul><li>Receive messages from ORB’s
  318. 321. Activate the appropriate Servant
  319. 322. Deliver the message to the Servant </li></ul></ul></ul>
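The POA's three jobs listed above (receive a message from the ORB, find the right Servant, deliver the message) can be sketched in a few lines. This is a toy model, not the TAO or OMG API: `ToyORB`, `ToyPOA`, `EchoServant` and the object id `"echo-1"` are hypothetical names, and the "network" is an ordinary method call.

```python
class EchoServant:
    """Hypothetical servant: implements one interface method."""
    def echo(self, text):
        return text.upper()

class ToyPOA:
    """Maps object ids to servants and delivers requests to them."""
    def __init__(self):
        self._servants = {}

    def activate(self, object_id, servant):
        self._servants[object_id] = servant

    def dispatch(self, object_id, operation, args):
        servant = self._servants[object_id]          # find the servant
        return getattr(servant, operation)(*args)    # deliver the message

class ToyORB:
    """Stands in for the communication layer: hands requests to the POA."""
    def __init__(self, poa):
        self._poa = poa

    def invoke(self, object_id, operation, *args):
        return self._poa.dispatch(object_id, operation, args)

poa = ToyPOA()
poa.activate("echo-1", EchoServant())
orb = ToyORB(poa)
print(orb.invoke("echo-1", "echo", "hello"))
```

The servant never sees the ORB, and the ORB never sees the servant class; the adapter in the middle is the only piece that knows both.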
  320. 323. CORBA Basics: IDL <ul><li>IDL: The Interface Definition Language </li></ul><ul><ul><li>Keyword: Definition </li></ul></ul><ul><ul><ul><li>No “executable” code (cannot implement anything)
  321. 324. Very similar to C++ Header Files
  322. 325. Language independent from Target Language </li></ul></ul></ul><ul><ul><ul><ul><li>Allows Client and Server applications to be written in different (several) languages </li></ul></ul></ul></ul><ul><ul><li>A “contract” between Clients and Servers </li></ul></ul><ul><ul><ul><li>Both MUST have the exact same IDL
  323. 326. Specifies messages and data that can be sent by Clients and received by Servants </li></ul></ul></ul><ul><ul><li>Written by Software Developer </li></ul></ul>
  324. 327. CORBA Basics: IDL <ul><li>Used to define interfaces (i.e. Servants) </li></ul><ul><ul><li>Classes and methods that provide services </li></ul></ul><ul><li>IDL Provides… </li></ul><ul><ul><li>Primitive Data Types (e.g. short, long, float, boolean, char, string)
  325. 328. Ability to compose primitives into more complex data structures
  326. 329. Enumerations, Unions, Arrays, etc.
  327. 330. Object-Oriented Inheritance </li></ul></ul>
  328. 331. CORBA Basics: IDL <ul><li>IDL Compilers </li></ul><ul><ul><li>Converts IDL files to target language files
  329. 332. Done via Language Mappings </li></ul></ul><ul><ul><ul><li>Useful to understand your Language Mapping scheme </li></ul></ul></ul><ul><ul><li>Target language files contain all the implementation code that facilitates CORBA-based communication </li></ul></ul><ul><ul><ul><li>More or less “hides” the details from you </li></ul></ul></ul><ul><ul><li>Creates client “stubs” and Server “skeletons”
  330. 333. Provided by ORB Vendor </li></ul></ul>
  331. 334. CORBA Basics: IDL IDL File IDL Compiler Client Stub Files Server Skeleton Files Generates Generates Generated Files are in Target Language: <ul><li>C++
  332. 335. Java
  333. 336. etc. </li></ul>Generated Files are in Target Language: <ul><li>C++
  334. 337. Java
  335. 338. etc. </li></ul>Client Programs use the classes in the Client Stub files to send messages to the Servant objects Client Program Servant Object Servant Objects inherit from classes in the Server Skeleton files to receive messages from the Client programs Association Inheritance
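The stub and skeleton roles can be illustrated with a hand-written toy. A real IDL compiler generates these classes and a real ORB marshals data in its own wire format; here JSON is swapped in as the marshalling format purely for illustration, and `GreeterStub`, `GreeterSkeleton` and `GreeterServant` are hypothetical names.

```python
import json

class GreeterSkeleton:
    """Server side: unmarshals a request and calls the servant method."""
    def handle(self, payload):
        request = json.loads(payload)
        result = getattr(self, request["op"])(*request["args"])
        return json.dumps({"result": result})

class GreeterServant(GreeterSkeleton):
    """Developer-written servant: inherits from the skeleton."""
    def greet(self, name):
        return "Hello, " + name

class GreeterStub:
    """Client side: marshals a call so it looks like a local method."""
    def __init__(self, transport):
        self._transport = transport  # any callable: request str in, reply str out

    def greet(self, name):
        payload = json.dumps({"op": "greet", "args": [name]})
        return json.loads(self._transport(payload))["result"]

servant = GreeterServant()
stub = GreeterStub(servant.handle)   # in-process "network" for the sketch
print(stub.greet("grid"))
```

The client uses the stub by association, while the servant inherits from the skeleton, matching the two relationships named in the diagram.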
  336. 339. CORBA Basics: IDL <ul><li>Can also generate empty Servant class files </li></ul>IDL Compiler converts to C++ (in this case)
  337. 340. CORBA Basics: IOR’s <ul><li>IOR: Interoperable Object Reference </li></ul><ul><ul><li>Can be thought of as a “Distributed Pointer”
  338. 341. Unique to each Servant
  339. 342. Used by ORB’s and POA’s to locate Servants </li></ul></ul><ul><ul><ul><li>For Clients, used to find Servants across networks
  340. 343. For Servers, used to find proper Servant running within the application </li></ul></ul></ul><ul><ul><li>Opaque to Client and Server applications </li></ul></ul><ul><ul><ul><li>Only meaningful to ORB’s and POA’s
  341. 344. Contains information about IP Address, Port Numbers, networking protocols used, etc. </li></ul></ul></ul><ul><ul><li>The difficult part is obtaining them </li></ul></ul><ul><ul><ul><li>This is the purpose/reasoning behind developing and using CORBA Services </li></ul></ul></ul>
  342. 345. CORBA Basics: IOR’s <ul><li>Can be viewed in “stringified” format, but… </li></ul><ul><ul><li>Still not very meaningful </li></ul></ul>
  343. 346. CORBA Basics: IOR’s <ul><li>Standardized, to some degree: </li></ul>… … Standardized by the OMG: <ul><li>Used by Client side ORB’s to locate Server side (destination) ORB’s
  344. 347. Contains information needed to make physical connection </li></ul>NOT Standardized by the OMG; proprietary to ORB Vendors <ul><li>Used by Server side ORB’s and POA’s to locate destination Servants </li></ul>
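The idea of an opaque, stringified reference can be sketched as follows. This is not the real IOR encoding: the `TOYREF:` prefix, the JSON body, and the host, port and key values are all made up for illustration. The only point is that applications pass the string around unchanged, while an ORB-like layer decodes it into connection details and a servant key.

```python
import json

def make_ref(host, port, object_key):
    """Pack location and servant key into an opaque hex string (toy format)."""
    body = json.dumps({"host": host, "port": port, "key": object_key})
    return "TOYREF:" + body.encode("utf-8").hex()

def parse_ref(ref):
    """Decode a toy reference; only the ORB-like layer would call this."""
    prefix, hex_body = ref.split(":", 1)
    if prefix != "TOYREF":
        raise ValueError("not a toy reference")
    return json.loads(bytes.fromhex(hex_body).decode("utf-8"))

ref = make_ref("192.0.2.10", 2809, "echo-1")
print(ref[:24] + "...")   # unreadable to a human, like a stringified IOR
print(parse_ref(ref))
```

In this toy, `host` and `port` play the part that a client-side ORB needs to make the connection, and `key` plays the part that the server side uses to find the destination servant.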
  345. 348. CORBA Basics: Tying it All Together
  346. 349. CORBA Basics: Tying it All Together [Figure labels. Logical flow: Client Program, IOR (Servant Ref), Message(Data), Server Program, Servant. Actual flow: Client Program, ORB, IOR (Servant Ref), ORB, POA, Server Program, Servant.] Once the ORB’s and POA’s are set up and configured properly, transparency is possible <ul><li>ORB’s communicate over network
  347. 350. POA’s activate servants and deliver messages </li></ul>
  348. 351. Overview of ACE/TAO <ul><li>ACE: Adaptive Communications Environment </li></ul><ul><ul><li>Object-Oriented Framework/API
  349. 352. Implements many concurrent programming design patterns
  350. 353. Can be used to build more complex communications-based packages </li></ul></ul><ul><ul><ul><li>For example, an ORB </li></ul></ul></ul>
  351. 354. Overview of ACE/TAO <ul><li>TAO: The ACE ORB </li></ul><ul><ul><li>Built on top of ACE
  352. 355. A CORBA implementation
  353. 356. Includes many (if not all) CORBA features specified by the OMG </li></ul></ul><ul><ul><ul><li>Not just an ORB
  354. 357. Provides POA’s, CORBA Services, etc. </li></ul></ul></ul><ul><ul><li>Object-Oriented Framework/API </li></ul></ul>
  355. 358. CORBA Services: The Naming Service <ul><li>The CORBA Naming Service is similar to the White Pages (phone book)
  356. 359. Servants place their “names,” along with their IOR’s, into the Naming Service </li></ul><ul><ul><li>The Naming Service stores these as pairs </li></ul></ul><ul><li>Later, Clients obtain IOR’s from the Naming Service by passing the name of the Servant object to it </li></ul><ul><ul><li>The Naming Service returns the IOR </li></ul></ul><ul><li>Clients may then use the IOR to make requests </li></ul>
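The white-pages behaviour described above amounts to a lookup table from names to IORs. A minimal sketch, assuming an in-memory dictionary and hypothetical method names `bind` and `resolve` on a made-up `ToyNamingService` class (the IOR string is a placeholder, not a real reference):

```python
class ToyNamingService:
    """Hypothetical in-memory name-to-IOR table."""
    def __init__(self):
        self._bindings = {}

    def bind(self, name, ior):
        """Called by the server: store the (name, IOR) pair."""
        self._bindings[name] = ior

    def resolve(self, name):
        """Called by the client: return the IOR stored under this name."""
        return self._bindings[name]

naming = ToyNamingService()
naming.bind("Printer", "IOR:placeholder-printer")
print(naming.resolve("Printer"))
```

The client only needs to know the name in advance; the hard-to-obtain IOR is fetched at run time.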
  357. 360. CORBA Services: The Trading Service <ul><li>The CORBA Trading Service is similar to the Yellow Pages (phone book)
  358. 361. Servants place a description of the services they can provide (i.e. their “Trades”), along with their IOR’s, into the Trading Service </li></ul><ul><ul><li>The Trading Service stores these </li></ul></ul><ul><li>Clients obtain IOR’s from the Trading Service by passing the type(s) of Services they require </li></ul><ul><ul><li>The Trading Service returns an IOR </li></ul></ul><ul><li>Clients may then use the IOR to make requests </li></ul>
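The yellow-pages lookup differs from a name lookup in that the client asks by kind of service rather than by name. A minimal sketch under stated assumptions: offers are kept in an in-memory list, matching is by exact property equality, and the class `ToyTradingService`, its `export` and `query` methods, and the placeholder IOR strings are illustrative rather than any real trader API.

```python
class ToyTradingService:
    """Hypothetical in-memory trader: stores offers, answers queries by type."""
    def __init__(self):
        self._offers = []  # each offer: (service_type, properties, ior)

    def export(self, service_type, properties, ior):
        """Called by the server: advertise a service and its IOR."""
        self._offers.append((service_type, dict(properties), ior))

    def query(self, service_type, **required):
        """Return IORs of offers of this type whose properties match."""
        return [
            ior
            for offered_type, props, ior in self._offers
            if offered_type == service_type
            and all(props.get(k) == v for k, v in required.items())
        ]

trader = ToyTradingService()
trader.export("printer", {"color": True}, "IOR:placeholder-a")
trader.export("printer", {"color": False}, "IOR:placeholder-b")
print(trader.query("printer", color=True))
```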
  359. 362. Multi-Threaded Issues Using CORBA <ul><li>Server performance can be improved by using multiple threads </li></ul><ul><ul><li>GUI Thread
  360. 363. Listening Thread
  361. 364. Processing Thread </li></ul></ul><ul><li>Can also use multiple ORBs and POAs to improve performance </li></ul><ul><ul><li>Requires a multi-threaded solution </li></ul></ul>
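The split between a listening thread and a processing thread can be sketched with a queue between the two. This is a generic illustration, not CORBA-specific code: the "requests" are plain integers, the "work" is squaring them, and the names `listener` and `processor` are hypothetical.

```python
import queue
import threading

requests = queue.Queue()
results = []

def listener(messages):
    """Listening thread: accepts requests and hands them off immediately."""
    for msg in messages:
        requests.put(msg)
    requests.put(None)  # sentinel: no more requests

def processor():
    """Processing thread: does the slow work off the listening thread."""
    while True:
        msg = requests.get()
        if msg is None:
            break
        results.append(msg * msg)

t_listen = threading.Thread(target=listener, args=([1, 2, 3],))
t_work = threading.Thread(target=processor)
t_listen.start()
t_work.start()
t_listen.join()
t_work.join()
print(results)
```

Because the listener only enqueues, it stays free to accept the next request while earlier ones are still being processed.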
  362. 365. What is Grid Computing? <ul><li>Computational Grids </li></ul><ul><ul><li>Homogeneous (e.g., Clusters)
  363. 366. Heterogeneous (e.g., with one-of-a-kind instruments) </li></ul></ul><ul><li>Cousins of Grid Computing
  364. 367. Methods of Grid Computing </li></ul>
  365. 368. Computational Grids <ul><li>A network of geographically distributed resources including computers, peripherals, switches, instruments, and data.
  366. 369. Each user should have a single login account to access all resources.
  367. 370. Resources may be owned by diverse organizations. </li></ul>
  368. 371. Computational Grids <ul><li>Grids are typically managed by gridware.
  369. 372. Gridware can be viewed as a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance, availability…) </li></ul>
  370. 373. Cousins of Grid Computing <ul><li>Parallel Computing
  371. 374. Distributed Computing
  372. 375. Peer-to-Peer Computing
  373. 376. Many others: Cluster Computing, Network Computing, Client/Server Computing, Internet Computing, etc... </li></ul>
  374. 377. Distributed Computing <ul><li>People often ask: Is Grid Computing a fancy new name for the concept of distributed computing?
  375. 378. In general, the answer is “no.” Distributed Computing is most often concerned with distributing the load of a program across two or more processes. </li></ul>
  376. 379. Peer-to-Peer Computing <ul><li>Sharing of computer resources and services by direct exchange between systems.
  377. 380. Computers can act as clients or servers depending on what role is most efficient for the network. </li></ul>
  378. 381. Methods of Grid Computing <ul><li>Distributed Supercomputing
  379. 382. High-Throughput Computing
  380. 383. On-Demand Computing
  381. 384. Data-Intensive Computing
  382. 385. Collaborative Computing
  383. 386. Logistical Networking </li></ul>
  384. 387. Distributed Supercomputing <ul><li>Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer.
  385. 388. Tackle problems that cannot be solved on a single system. </li></ul>
  386. 389. High-Throughput Computing <ul><li>Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work. </li></ul>
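The high-throughput pattern, many independent tasks farmed out to whatever workers are free, can be sketched on a single machine. Here a local thread pool stands in for grid nodes, which is an assumption made only to keep the example self-contained; `task` and `run_batch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    """One independent unit of work; it shares no state with other tasks."""
    return sum(i * i for i in range(n))

def run_batch(sizes, workers=4):
    """Hand every task to the pool; idle workers pick up the next one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, sizes))

if __name__ == "__main__":
    print(run_batch([10, 100, 1000]))
```

Since the tasks never communicate, adding workers raises throughput without any change to the task code, which is what makes this style a good fit for otherwise unused processor cycles.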
  387. 390. On-Demand Computing <ul><li>Uses grid capabilities to meet short-term requirements for resources that are not locally accessible.
  388. 391. Models real-time computing demands. </li></ul>
  389. 392. Data-Intensive Computing <ul><li>The focus is on synthesizing new information from data that is maintained in geographically distributed repositories, digital libraries, and databases.
  390. 393. Particularly useful for distributed data mining. </li></ul>
  391. 394. Collaborative Computing <ul><li>Concerned primarily with enabling and enhancing human-to-human interactions.
  392. 395. Applications are often structured in terms of a virtual shared space. </li></ul>
  393. 396. Logistical Networking <ul><li>Global scheduling and optimization of data movement.
  394. 397. Contrasts with traditional networking, which does not explicitly model storage resources in the network.
  395. 398. Called &quot;logistical&quot; because of the analogy it bears with the systems of warehouses, depots, and distribution channels. </li></ul>
  396. 399. Globus <ul><li>A collaboration of Argonne National Laboratory’s Mathematics and Computer Science Division, the University of Southern California’s Information Sciences Institute, and the University of Chicago's Distributed Systems Laboratory.
  397. 400. Started in 1996 and is gaining popularity year after year. </li></ul>
  398. 401. Globus <ul><li>A project to develop the underlying technologies needed for the construction of computational grids.
  399. 402. Focuses on execution environments for integrating widely-distributed computational platforms, data resources, displays, special instruments and so forth. </li></ul>
  400. 403. The Globus Toolkit <ul><li>The Globus Resource Allocation Manager (GRAM) </li></ul><ul><ul><li>Creates, monitors, and manages services.
  401. 404. Maps requests to local schedulers and computers. </li></ul></ul><ul><li>The Grid Security Infrastructure (GSI) </li></ul><ul><ul><li>Provides authentication services. </li></ul></ul>
  402. 405. The Globus Toolkit <ul><li>The Monitoring and Discovery Service (MDS) </li></ul><ul><ul><li>Provides information about system status, including server configurations, network status, and locations of replicated datasets. </li></ul></ul><ul><li>Nexus and globus_io </li></ul><ul><ul><li>Provide communication services for heterogeneous environments. </li></ul></ul>
  403. 406. What are Clouds? <ul><li>Clouds are “Virtual Clusters” (“Virtual Grids”) of possibly “Virtual Machines” </li><ul><li>They may cross administrative domains or may “just be a single cluster”; the user cannot and does not want to know </li></ul><li>Clouds support access (lease of) computer instances </li><ul><li>Instances accept data and job descriptions (code) and return results that are data and status flags </li></ul><li>Each Cloud is a “Narrow” (perhaps internally proprietary) Grid
  404. 407. Clouds can be built from Grids
  405. 408. Grids can be built from Clouds </li></ul>
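The contract stated above, an instance accepts data and a job description (code) and returns data plus a status flag, can be sketched as a single function. This is an illustration only, not any cloud provider's API: `run_on_instance` is a hypothetical name, the "instance" is the local interpreter, and the status values are made up.

```python
def run_on_instance(job, data):
    """Run a job on the given data; return result data and a status flag."""
    try:
        return {"status": "ok", "data": job(data)}
    except Exception as exc:
        # The caller sees only the flag and a message, not the instance itself.
        return {"status": "failed", "data": str(exc)}

print(run_on_instance(sorted, [3, 1, 2]))
print(run_on_instance(lambda d: d / 0, 1)["status"])
```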
  406. 409. Virtualization and Cloud Computing <ul><li>The Virtues of Virtualization </li><ul><li>Portable environments, enforcement and isolation, fast to deploy, suspend/resume, migration… </li></ul><li>Cloud computing </li><ul><li>SaaS: software as a service
  407. 410. Service: provide me with a workspace
  408. 411. Virtualization makes it easy to provide a workspace/VM </li></ul><li>Cloud computing </li><ul><li>resource leasing, utility computing, elastic computing
  409. 412. Amazon’s Elastic Compute Cloud (EC2) </li></ul><li>Is this real? Or is this just a proof-of-concept? </li><ul><li>Successfully used commercially on a large scale
  410. 413. More experience for scientific applications </li></ul></ul>Virtual Workspaces: http://workspace.globus.org
  411. 414. Two major types of cloud <ul><li>Compute and Data Cloud </li><ul><li>EC2, Google MapReduce, Science clouds
  412. 415. Provision platform for running science codes
  413. 416. Open source infrastructure: workspace, eucalyptus, hub0
  414. 417. Virtualization: providing environments as VMs </li></ul><li>Hosting Cloud </li><ul><li>Google App Engine
  415. 418. High availability, fault tolerance, robustness, etc. for Web capabilities
  416. 419. Community example: IU hosting environment (quarry) </li></ul></ul>
  417. 420. Technical Questions on Clouds <ul><li>How is data compute affinity tackled in clouds? </li><ul><li>Co-locate data and compute clouds?
  418. 421. Lots of optical fiber i.e. “just” move the data? </li></ul><li>What happens in clouds when demand for resources exceeds capacity – is there a multi-day job input queue? </li><ul><li>Are there novel cloud scheduling issues? </li></ul><li>Do we want to link clouds (or ensembles as atomic clouds); if so, how and with what protocols?
  419. 422. Is there an intranet cloud, e.g. “cloud in a box” software to manage a personal (cores on my future 128-core laptop), department, or enterprise cloud? </li></ul>
  420. 423. Thanks Much.. <ul><li>99% of the slides are taken from the Internet from various Authors. Thanks to all of them!
  421. 424. Sudarsun Santhiappan
  422. 425. Director – R & D
  423. 426. Burning Glass Technologies
  424. 427. Kilpauk, Chennai 600010 </li></ul>