Distributed Computing


Comprehensive study of parallel, cluster, distributed, grid and cloud computing paradigms

Published in: Technology

    1. 1. Distributed Computing Sudarsun Santhiappan sudarsun@{burning-glass.com, gmail.com} Burning Glass Technologies Kilpauk, Chennai 600010
    2. 2. Technology is Changing... <ul><li>Computational power doubles every 18 months
    3. 3. Networking bandwidth and speed double every 9 months
    4. 4. How to tap the benefits of this Technology ?
    5. 5. Should we grow as an Individual ?
    6. 6. Should we grow as a Team ? </li></ul>
    7. 7. The Coverage Today <ul><li>Parallel Processing
    8. 8. Multiprocessor or Multi-Core Computing
    9. 9. Symmetric Multiprocessing
    10. 10. Cluster Computing {PVM}
    11. 11. Distributed Computing {TAO, OpenMP}
    12. 12. Grid Computing {Globus Toolkit}
    13. 13. Cloud Computing {Amazon EC2} </li></ul>
    14. 14. Parallel Computing <ul><li>It is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (“in parallel”).
    15. 15. Multi-Core, Multiprocessor SMP, Massively Parallel Processing (MPP) Computers
    16. 16. Is it easy to write a parallel program ? </li></ul>
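The divide-and-combine principle above can be sketched in a few lines. As a hedge: this uses a Python thread pool on one machine to stand in for multiple processors, and the chunk size and worker count are illustrative choices, not part of the slide.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Divide a large summation into smaller chunks solved concurrently,
    then combine the partial results."""
    chunk = max(1, (len(data) + workers - 1) // workers)
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, pieces)  # each chunk summed concurrently
    return sum(partials)                  # combine the partial results
```

Even for this trivial problem, note how much of the code is decomposition and recombination rather than actual arithmetic: a hint at why writing parallel programs is not easy.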
    17. 17. Cluster Computing <ul><li>A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer
    18. 18. Operate in distributed-memory mode (mostly): nodes coordinate by message passing rather than shared memory
    19. 19. Tightly coupled with high-speed networking, often over optical Fibre Channel links.
    20. 20. HA, Load Balancing, Compute Clusters
    21. 21. Can we Load Balance using DNS ? </li></ul>
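The DNS question above has a classic partial answer: round-robin DNS, which also reappears later in the openMOSIX login-node discussion. A toy sketch (the `RoundRobinDNS` class is invented for illustration, not a real library):

```python
from itertools import cycle

class RoundRobinDNS:
    """Toy name server: each lookup of a name returns the next address
    in turn, spreading clients across nodes. No health checks, so a
    dead node keeps receiving its share of requests (a known limitation)."""
    def __init__(self, name, addresses):
        self.records = {name: cycle(addresses)}

    def resolve(self, name):
        return next(self.records[name])
```

So yes, DNS can spread load coarsely, but it is not true load balancing: it is unaware of node load or node failure.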
    22. 22. Distributed Computing <ul><li>Wikipedia : It deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime </li></ul>
    23. 23. Grid Computing <ul><li>Wikipedia: A form of distributed computing whereby a super and virtual computer is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform large tasks.
    24. 24. pcwebopedia.com : Unlike conventional networks that focus on communication among devices, grid computing harnesses unused processing cycles of all computers in a network for solving problems too intensive for any stand-alone machine.
    25. 25. IBM: Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities. Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.
    26. 26. Sun: Grid Computing is a computing infrastructure that provides dependable, consistent, pervasive and inexpensive access to computational capabilities. </li></ul>
    27. 27. Cloud Computing <ul><li>Wikipedia: It is a style of computing in which dynamically scalable and often virtualised resources are provided as a service over the Internet.
    28. 28. Infrastructure As A Service (IaaS)
    29. 29. Platform As A Service (PaaS)
    30. 30. Software as a Service (SaaS)
    31. 31. Provide common business applications online accessible from a web browser.
    32. 32. Amazon Elastic Computing, Google Apps </li></ul>
    33. 33. Hardware: IBM p690 Regatta; 32 POWER4 CPUs (1.1 GHz); 32 GB RAM; 218 GB internal disk; OS: AIX 5.1; peak speed: 140.8 GFLOP/s*; programming model: shared-memory multithreading (OpenMP; also supports MPI). *GFLOP/s: billion floating-point operations per second
    34. 34. Hardware: Pentium4 Xeon Cluster; 270 Pentium4 XeonDP CPUs; 270 GB RAM; 8,700 GB disk; OS: Red Hat Enterprise Linux 3; peak speed: 1.08 TFLOP/s*; programming model: distributed multiprocessing (MPI). *TFLOP/s: trillion floating-point operations per second
    35. 35. Hardware: Itanium2 Cluster (schooner.oscer.ou.edu, new arrival!); 56 Itanium2 1.0 GHz CPUs; 112 GB RAM; 5,774 GB disk; OS: Red Hat Enterprise Linux 3; peak speed: 224 GFLOP/s*; programming model: distributed multiprocessing (MPI). *GFLOP/s: billion floating-point operations per second
    36. 36. Vector Processing <ul><li>It is based on array processors, where the instruction set includes operations that can perform a mathematical operation on many data elements simultaneously
    37. 37. Example: Finding Scalar dot product between two vectors
    38. 38. Is vector processing a parallel computing model?
    39. 39. What are the limitations of Vector processing ?
    40. 40. Used extensively in video processing & games... </li></ul>
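The scalar dot product example above can be sketched as follows. This plain-Python version only models the semantics of a vector unit (one elementwise multiply over all pairs, then a reduction); real SIMD hardware performs the multiplies simultaneously rather than in a loop.

```python
def dot(a, b):
    """Vector-processing view of the dot product: one 'vector' multiply
    across all element pairs, then a reduction to a scalar."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    products = [x * y for x, y in zip(a, b)]  # elementwise multiply (the vector op)
    return sum(products)                      # reduction step
```

The reduction step also hints at a limitation asked about above: the final sum is inherently sequential unless the hardware provides a tree-reduction.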
    41. 41. Pipelined Processing <ul><li>The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step.
    42. 42. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.
    43. 43. A non-pipeline architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle
    44. 44. Processors with pipelining are organized inside into stages which can semi-independently work on separate jobs </li></ul>
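The "series of independent steps, with storage at the end of each step" can be modeled with chained generators. A hedge: the three stages and their transformations (double, add one) are placeholder operations, and generators model the staged dataflow only, not true concurrent execution of the stages.

```python
def fetch(items):
    """Stage 1: hand each item onward (each yield is the per-stage storage)."""
    for x in items:
        yield x

def decode(source):
    """Stage 2: a placeholder transformation (double the value)."""
    for x in source:
        yield x * 2

def execute(source):
    """Stage 3: another placeholder transformation (add one)."""
    for x in source:
        yield x + 1

def run_pipeline(items):
    # Chain the stages; each item flows through all three in order,
    # and in hardware different items occupy different stages at once.
    return list(execute(decode(fetch(items))))
```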
    45. 45. Parallel Vs Pipelined Processing <ul><li>Parallel processing: less inter-processor communication, complicated processor hardware </li></ul><ul><li>Pipelined processing: more inter-processor communication, simpler processor hardware </li></ul>[Figure: four processors P1..P4 over time; colors denote different types of operations performed, and a, b, c, d denote different data streams processed; in parallel processing each processor handles one whole stream, in pipelined processing each processor performs one operation type on every stream]
    46. 46. Data Dependence <ul><li>Parallel processing requires NO data dependence between processors </li></ul><ul><li>Pipelined processing will involve inter-processor communication </li></ul>[Timing diagrams for P1..P4 omitted]
    47. 47. Typical Computing Elements [Figure: a multi-processor computing system layered as hardware, operating system (microkernel), programming paradigms, and applications, with processes and threads mapped onto processors through a threads interface]
    48. 48. Why Parallel Processing ? <ul><li>Computation requirements are ever increasing; for instance -- visualization, distributed databases, simulations, scientific prediction (ex: climate, earthquake), etc.
    49. 49. Sequential architectures are reaching physical limitations (speed of light, thermodynamics)
    50. 50. Limit on the number of transistors per square inch
    51. 51. Limit on inter-component link capacitance </li></ul>
    52. 52. Symmetric Multiprocessing SMP <ul><li>Involves a multiprocessor computer architecture where two or more identical processors are connected to a single shared main memory
    53. 53. Kernel can execute on any processor
    54. 54. Typically each processor self-schedules from the pool of available processes or threads
    55. 55. Scalability problems in Uniform Memory Access
    56. 56. NUMA to improve speed, but limitations on data migration
    57. 57. Intel, AMD processors are SMP units
    58. 58. What is ASMP ? </li></ul>
    59. 61. SISD : A Conventional Computer <ul><li>Speed is limited by the rate at which the computer can transfer information internally. </li></ul>Ex: PC, Macintosh, workstations. [Diagram: an instruction stream and a data input feed a single processor, producing a single data output]
    60. 62. The MISD Architecture <ul><li>More of an intellectual exercise than a practical configuration. Few built, but commercially not available </li></ul>[Diagram: one data input stream flows through processors A, B, and C, each driven by its own instruction stream, to one data output stream]
    61. 63. SIMD Architecture Ex: CRAY machine vector processing, Intel MMX (multimedia support). C[i] ← A[i] * B[i]. [Diagram: a single instruction stream drives processors A, B, and C, each with its own data input and output stream]
    62. 64. MIMD Architecture: Unlike SISD and MISD machines, an MIMD computer works asynchronously. Two variants: shared memory (tightly coupled) MIMD and distributed memory (loosely coupled) MIMD. [Diagram: processors A, B, and C, each with its own instruction stream, data input stream, and data output stream]
    63. 65. Shared Memory MIMD machine <ul><li>Communication: Source processor writes data to the global memory & the destination retrieves it.
    64. 66. Limitation: reliability & expandability. A memory component or any processor failure affects the whole system. </li></ul><ul><li>Increasing the number of processors leads to memory contention. </li></ul>Ex.: Silicon Graphics supercomputers.... [Diagram: processors A, B, and C attached to a global memory system via memory buses]
    65. 67. Distributed Memory MIMD <ul><li>Communication : IPC on High Speed Network.
    66. 68. Network can be configured to ... Tree, Mesh, Cube, etc.
    67. 69. Unlike Shared MIMD </li></ul><ul><ul><li>Readily expandable
    68. 70. Highly reliable (any CPU failure does not affect the whole system) </li></ul></ul>[Diagram: processors A, B, and C, each with its own memory system on a local memory bus, connected to each other by IPC channels]
    69. 71. Laws of caution..... <ul><li>Speed of computers is proportional to the square of their cost. </li></ul>i.e. cost = √speed (speed = cost²) <ul><li>Speedup by a parallel computer increases as the logarithm of the number of processors. </li></ul><ul><ul><li>Speedup = log₂(no. of processors) </li></ul></ul>
    70. 72. Micro Kernel based Operating Systems for High Performance Computing
    71. 73. Operating System Models <ul>Three approaches to building an OS.... </ul><ul><ul><li>Monolithic OS
    72. 74. Layered OS
    73. 75. Microkernel based OS (client-server OS; suitable for MPP systems) </li></ul></ul><ul><li>Simplicity, flexibility and high performance are crucial for an OS. </li></ul>
    74. 76. Monolithic Operating System <ul><li>Better application Performance
    75. 77. Difficult to extend </li></ul>Ex: MS-DOS. [Diagram: application programs in user mode call system services in kernel mode, layered above the hardware]
    76. 78. Layered OS <ul><li>Easier to enhance
    77. 79. Each layer of code accesses the interface of the layer below
    78. 80. Low application performance </li></ul>Ex: UNIX. [Diagram: application programs in user mode; system services, process scheduling, and memory & I/O device management in kernel mode, layered above the hardware]
    79. 81. Microkernel/Client Server OS (for MPP Systems) <ul><li>Tiny OS kernel providing basic primitive (process, memory, IPC)
    80. 82. Traditional services becomes subsystems
    81. 83. Application performance competitive with monolithic OS
    82. 84. OS = Microkernel + User Subsystems </li></ul>Ex: Mach, PARAS, Chorus, etc. [Diagram: a client application with a thread library exchanges send/reply messages, via the microkernel, with file, network, and display servers running in user space above the hardware]
    83. 85. What are Micro Kernels ? <ul><li>Small operating system core
    84. 86. Contains only essential core operating systems functions
    85. 87. Many services traditionally included in the operating system are now external subsystems </li></ul><ul><ul><li>Device drivers
    86. 88. File systems
    87. 89. Virtual memory manager
    88. 90. Windowing system
    89. 91. Security services </li></ul></ul>
    90. 93. HPC Cluster Architecture [Diagram: a frontend node on the public Ethernet; compute nodes on a private Ethernet, an optional application network, and power distribution (net-addressable units as an option)]
    91. 94. Most Critical Problems with Clusters <ul><li>The largest problem in clusters is software skew </li></ul><ul><ul><li>When software configuration on some nodes is different than on others
    92. 95. Small differences (minor version numbers on libraries) can cripple a parallel program </li></ul></ul><ul><li>The second most important problem is lack of adequate job control of the parallel process </li></ul><ul><ul><li>Signal propagation
    93. 96. Cleanup </li></ul></ul>
    94. 97. Top 3 Problems with Software Packages <ul><li>Software installation works only in interactive mode </li></ul><ul><ul><li>Needs significant work by the end user </li></ul></ul><ul><li>Often reasonable default settings are not available </li></ul><ul><ul><li>Extremely time consuming to provide values
    95. 98. Should be provided by package developers but … </li></ul></ul><ul><li>Package is required to be installed on a running system </li></ul><ul><ul><li>Means multi-step operation: install + update
    96. 99. Intermediate state can be insecure </li></ul></ul>
    97. 100. Clusters Classification..1 <ul><li>Based on Focus (in Market) </li></ul><ul><ul><li>High Performance (HP) Clusters </li></ul></ul><ul><ul><ul><li>Grand Challenging Applications </li></ul></ul></ul><ul><ul><li>High Availability (HA) Clusters </li></ul></ul><ul><ul><ul><li>Mission Critical applications </li></ul></ul></ul>
    98. 101. HA Cluster: Server Cluster with “Heartbeat” Connection
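A minimal sketch of the heartbeat idea behind HA clusters: the standby watches the primary's periodic heartbeat and takes over when it stops. The timeout rule and the 3-second default are illustrative choices, not from the slide.

```python
def primary_failed(last_heartbeat, now, timeout=3.0):
    """HA rule of thumb: if no heartbeat has been seen for `timeout`
    seconds, the standby declares the primary dead and takes over
    its services (e.g., its service IP address)."""
    return (now - last_heartbeat) > timeout
```

Real HA stacks add fencing/STONITH so a slow-but-alive primary cannot cause a split-brain takeover.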
    99. 102. Clusters Classification..2 <ul><li>Based on Workstation/PC Ownership </li></ul><ul><ul><li>Dedicated Clusters
    100. 103. Non-dedicated clusters </li></ul></ul><ul><ul><ul><li>Adaptive parallel computing
    101. 104. Also called Communal multiprocessing </li></ul></ul></ul>
    102. 105. Clusters Classification..3 <ul><li>Based on Node Architecture .. </li></ul><ul><ul><li>Clusters of PCs (CoPs)
    103. 106. Clusters of Workstations (COWs)
    104. 107. Clusters of SMPs (CLUMPs) </li></ul></ul>
    105. 108. Building Scalable Systems: Cluster of SMPs (Clumps) [Chart: performance of SMP systems vs. four-processor servers in a cluster]
    106. 109. Clusters Classification..4 <ul><li>Based on Node OS Type .. </li></ul><ul><ul><li>Linux Clusters (Beowulf)
    107. 110. Solaris Clusters (Berkeley NOW)
    108. 111. NT Clusters (HPVM)
    109. 112. AIX Clusters (IBM SP2)
    110. 113. SCO/Compaq Clusters (Unixware)
    111. 114. Digital VMS Clusters, HP clusters </li></ul></ul>
    112. 115. Clusters Classification..5 <ul>Based on Processor Arch, Node Type </ul><ul><li>Homogeneous Clusters </li></ul><ul><ul><li>All nodes will have similar configuration </li></ul></ul><ul><li>Heterogeneous Clusters </li></ul><ul><ul><li>Nodes based on different processors and running different Operating Systems </li></ul></ul>
    113. 116. Cluster Implementation <ul><li>What is Middleware ?
    114. 117. What is Single System Image ?
    115. 118. Benefits of Single System Image </li></ul>
    116. 119. What is Cluster Middle-ware ? <ul><li>An interface between user applications and cluster hardware and OS platform.
    117. 120. Middle-ware packages support each other at the management, programming, and implementation levels.
    118. 121. Middleware Layers: </li></ul><ul><ul><li>SSI Layer
    119. 122. Availability Layer: It enables the cluster services of </li></ul></ul><ul><ul><ul><li>Checkpointing, Automatic Failover, recovery from failure,
    120. 123. and fault-tolerant operation among all cluster nodes. </li></ul></ul></ul>
    121. 124. Middleware Design Goals <ul><li>Complete Transparency </li></ul><ul><ul><li>Lets the user see a single cluster system.. </li></ul></ul><ul><ul><ul><li>Single entry point, ftp, telnet, software loading... </li></ul></ul></ul><ul><li>Scalable Performance </li></ul><ul><ul><li>Easy growth of the cluster </li></ul></ul><ul><ul><ul><li>no change of API & automatic load distribution. </li></ul></ul></ul><ul><li>Enhanced Availability </li></ul><ul><ul><li>Automatic recovery from failures </li></ul></ul><ul><ul><ul><li>Employ checkpointing & fault-tolerance technologies </li></ul></ul></ul><ul><ul><li>Handle consistency of data when replicated.. </li></ul></ul>
    122. 125. What is Single System Image (SSI) ? <ul><li>A single system image is the illusion , created by software or hardware, that a collection of computing elements appear as a single computing resource. </li></ul><ul><li>SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
    123. 126. A cluster without an SSI is not a cluster </li></ul>
    124. 127. Benefits of Single System Image <ul><li>Usage of system resources transparently
    125. 128. Improved reliability and higher availability
    126. 129. Simplified system management
    127. 130. Reduction in the risk of operator errors
    128. 131. User need not be aware of the underlying system architecture to use these machines effectively </li></ul>
    129. 132. Distributed Computing <ul><li>No shared memory
    130. 133. Communication among processes </li></ul><ul><ul><li>Send a message
    131. 134. Receive a message </li></ul></ul><ul><li>Asynchronous
    132. 135. Synchronous
    133. 136. Synergy among processes </li></ul>
    134. 137. Messages <ul><li>Messages are sequences of bytes moving between processes
    135. 138. The sender and receiver must agree on the types and structure of values in the message
    136. 139. “Marshalling”: agreeing on the data layout so that there is no ambiguity, such as “four chars” vs. “one integer”. </li></ul>
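The "four chars vs. one integer" ambiguity can be made concrete with Python's struct module. The record layout used here (a 4-byte tag followed by one big-endian unsigned 32-bit integer) is invented for illustration; it is the kind of agreement sender and receiver must share.

```python
import struct

# Agreed layout: 4-byte ASCII tag, then one big-endian unsigned 32-bit int.
# Without this agreement, the last four bytes could equally well be read
# as "four chars" instead of "one integer".
RECORD = ">4sI"

def marshal(tag, value):
    """Flatten a (tag, value) record into an unambiguous byte sequence."""
    return struct.pack(RECORD, tag, value)

def unmarshal(buf):
    """Recover the (tag, value) record from the agreed byte layout."""
    return struct.unpack(RECORD, buf)
```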
    137. 140. Message Passing <ul><li>Process A sends a data buffer as a message to process B.
    138. 141. Process B waits for a message from A, and when it arrives copies it into its own local memory.
    139. 142. No memory shared between A and B. </li></ul>
    140. 143. Message Passing <ul><li>Obviously, </li></ul><ul><ul><li>Messages cannot be received before they are sent.
    141. 144. A receiver waits until there is a message. </li></ul></ul><ul><li>Asynchronous </li></ul><ul><ul><li>Sender never blocks, even if infinitely many messages are waiting to be received
    142. 145. Semi-asynchronous is a practical version of above with large but finite amount of buffering </li></ul></ul>
    143. 146. Message Passing: Point to Point <ul><li>Q: send(m, P) </li></ul><ul><ul><li>Send message m to process P </li></ul></ul><ul><li>P: recv(x, Q) </li></ul><ul><ul><li>Receive message from process Q, and place it in variable x </li></ul></ul><ul><li>The message data </li></ul><ul><ul><li>Type of x must match that of m
    144. 147. As if x := m </li></ul></ul>
    145. 148. Broadcast <ul><li>One sender Q, multiple receivers P
    146. 149. Not all receivers may receive at the same time
    147. 150. Q: broadcast (m) </li></ul><ul><ul><li>Send message m to all processes </li></ul></ul><ul><li>P: recv(x, Q) </li></ul><ul><ul><li>Receive message from process Q, and place it in variable x </li></ul></ul>
    148. 151. Synchronous Message Passing <ul><li>Sender blocks until receiver is ready to receive.
    149. 152. Cannot send messages to self.
    150. 153. No buffering. </li></ul>
    151. 154. Asynchronous Message Passing <ul><li>Sender never blocks.
    152. 155. Receiver receives when ready. </li></ul><ul><li>Can send messages to self. </li></ul><ul><li>Infinite buffering. </li></ul>
    153. 156. Message Passing <ul><li>Speed not so good </li></ul><ul><ul><li>Sender copies message into system buffers.
    154. 157. Message travels the network.
    155. 158. Receiver copies message from system buffers into local memory.
    156. 159. Special virtual memory techniques help. </li></ul></ul><ul><li>Programming Quality </li></ul><ul><ul><li>Less error-prone compared to shared memory </li></ul></ul>
    157. 160. Distributed Programs <ul><li>Spatially distributed programs </li></ul><ul><ul><li>A part here, a part there, …
    158. 161. Parallel
    159. 162. Synergy </li></ul></ul><ul><li>Temporally distributed programs </li></ul><ul><ul><li>Compute half today, half tomorrow
    160. 163. Combine the results at the end </li></ul></ul><ul><li>Migratory programs </li></ul><ul><ul><li>Have computation, will travel </li></ul></ul>
    161. 164. Technological Bases of Distributed+Parallel Programs <ul><li>Spatially distributed programs </li></ul><ul><ul><li>Message passing </li></ul></ul><ul><li>Temporally distributed programs </li></ul><ul><ul><li>Shared memory </li></ul></ul><ul><li>Migratory programs </li></ul><ul><ul><li>Serialization of data and programs </li></ul></ul>
    162. 165. Technological Bases for Migratory programs <ul><li>Same CPU architecture </li></ul><ul><ul><li>X86, PowerPC, MIPS, SPARC, …, JVM </li></ul></ul><ul><li>Same OS + environment
    163. 166. Be able to “checkpoint” </li></ul><ul><ul><li>suspend, and
    164. 167. then resume computation
    165. 168. without loss of progress </li></ul></ul>
    166. 169. Message Passing Libraries <ul><li>Programmer is responsible for initial data distribution, synchronization, and sending and receiving information
    167. 170. Parallel Virtual Machine (PVM)
    168. 171. Message Passing Interface (MPI)
    169. 172. Bulk Synchronous Parallel model (BSP) </li></ul>
    170. 173. BSP: Bulk Synchronous Parallel model <ul><li>Divides computation into supersteps
    171. 174. In each superstep a processor can work on local data and send messages.
    172. 175. At the end of the superstep, a barrier synchronization takes place and all processors receive the messages which were sent in the previous superstep </li></ul>
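The superstep/barrier structure above can be sketched with threads. Assumptions to note: threads on one machine stand in for BSP processors, CPython's list.append is relied on as atomic for message delivery, and the "sum everything at processor 0" job is a placeholder computation.

```python
import threading

def bsp_total(values):
    """Minimal BSP sketch: each 'processor' holds one value.
    Superstep 1: local work, then send the value to processor 0.
    Barrier synchronization.
    Superstep 2: processor 0 reduces the messages received in the
    previous superstep."""
    n = len(values)
    inbox = [[] for _ in range(n)]
    barrier = threading.Barrier(n)
    result = {}

    def worker(pid):
        inbox[0].append(values[pid])  # superstep 1: compute + send
        barrier.wait()                # end of superstep: all sends delivered
        if pid == 0:                  # superstep 2: messages now visible
            result["total"] = sum(inbox[0])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]
```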
    173. 176. BSP: Bulk Synchronous Parallel model <ul><li>http://www.bsp-worldwide.org/
    174. 177. Book: Rob H. Bisseling, “Parallel Scientific Computation: A Structured Approach using BSP and MPI,” Oxford University Press, 2004, 324 pages, ISBN 0-19-852939-2. </li></ul>
    175. 178. BSP Library <ul><li>Small number of subroutines to implement </li></ul><ul><ul><li>process creation,
    176. 179. remote data access, and
    177. 180. bulk synchronization. </li></ul></ul><ul><li>Linked to C, Fortran, … programs </li></ul>
    178. 181. Portable Batch System (PBS) <ul><li>Prepare a .cmd file </li></ul><ul><ul><li>naming the program and its arguments
    179. 182. properties of the job
    180. 183. the needed resources </li></ul></ul><ul><li>Submit .cmd to the PBS Job Server: the qsub command
    181. 184. Routing and Scheduling: The Job Server </li></ul><ul><ul><li>examines .cmd details to route the job to an execution queue.
    182. 185. allocates one or more cluster nodes to the job
    183. 186. communicates with the Execution Servers (MOMs) on the cluster to determine the current state of the nodes.
    184. 187. When all of the needed resources are allocated, passes the .cmd on to the Execution Server on the first node allocated (the “mother superior”). </li></ul></ul><ul><li>Execution Server </li></ul><ul><ul><li>will log in on the first node as the submitting user and run the .cmd file in the user's home directory.
    185. 188. Run an installation defined prologue script.
    186. 189. Gathers the job's output to the standard output and standard error
    187. 190. It will execute an installation-defined epilogue script.
    188. 191. Delivers stdout and stderr to the user. </li></ul></ul>
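A sketch of what such a .cmd file might look like. The job name, resource requests, and program (`myprog`) are hypothetical, and directive details vary between PBS installations.

```shell
#PBS -N myjob                 # job name (hypothetical)
#PBS -l nodes=4:ppn=2         # needed resources: 4 nodes, 2 processors each
#PBS -l walltime=01:00:00     # property of the job: maximum run time
#PBS -o myjob.out             # where stdout is delivered
#PBS -e myjob.err             # where stderr is delivered

cd "$PBS_O_WORKDIR"               # the Execution Server starts in $HOME
mpirun -np 8 ./myprog input.dat   # the program and its arguments
```

Submitted with `qsub myjob.cmd`; the Job Server then routes it to an execution queue as described above.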
    189. 192. TORQUE, an open source PBS <ul><li>Tera-scale Open-source Resource and QUEue manager (TORQUE) enhances OpenPBS
    190. 193. Fault Tolerance </li></ul><ul><ul><li>Additional failure conditions checked/handled
    191. 194. Node health check script support </li></ul></ul><ul><li>Scheduling Interface
    192. 195. Scalability </li></ul><ul><ul><li>Significantly improved server to MOM communication model
    193. 196. Ability to handle larger clusters (over 15 TF/2,500 processors)
    194. 197. Ability to handle larger jobs (over 2000 processors)
    195. 198. Ability to support larger server messages </li></ul></ul><ul><li>Logging
    196. 199. http://www.supercluster.org/projects/torque/ </li></ul>
    197. 200. PVM, and MPI <ul><li>Message passing primitives
    198. 201. Can be embedded in many existing programming languages
    199. 202. Architecturally portable
    200. 203. Open-sourced implementations </li></ul>
    201. 204. Parallel Virtual Machine ( PVM ) <ul><li>PVM enables a heterogeneous collection of networked computers to be used as a single large parallel computer.
    202. 205. Older than MPI
    203. 206. Large scientific/engineering user community
    204. 207. http://www.csm.ornl.gov/pvm/ </li></ul>
    205. 208. Message Passing Interface (MPI) <ul><li>http://www-unix.mcs.anl.gov/mpi/
    206. 209. MPI-2.0 http://www.mpi-forum.org/docs/
    207. 210. MPICH: www.mcs.anl.gov/mpi/mpich/ by Argonne National Laboratory and Mississippi State University
    208. 211. LAM: http://www.lam-mpi.org/
    209. 212. http://www.open-mpi.org/ </li></ul>
    210. 213. OpenMP for shared memory <ul><li>Distributed shared memory API
    211. 214. User gives hints as directives to the compiler
    212. 215. http://www.openmp.org </li></ul>
    213. 216. SPMD <ul><li>Single program, multiple data
    214. 217. Contrast with SIMD
    215. 218. Same program runs on multiple nodes
    216. 219. May or may not be lock-step
    217. 220. Nodes may be of different speeds
    218. 221. Barrier synchronization </li></ul>
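A minimal SPMD sketch: one program text, parameterized by rank. Here the "nodes" are plain function calls on one machine (a real run would launch one process per rank, e.g. via mpirun), and both the strided decomposition and the sum-of-squares work are illustrative choices.

```python
def spmd_program(rank, nprocs, data):
    """The same program runs on every node; `rank` selects this
    instance's share of the data (a strided decomposition).
    Instances need not run in lock-step and may differ in speed."""
    my_part = data[rank::nprocs]        # this rank's slice of the data
    return sum(x * x for x in my_part)  # placeholder local work

def run_all(data, nprocs=4):
    # Stand-in for launching one process per rank (e.g. via mpirun)
    # and reducing their partial results at the end.
    return sum(spmd_program(r, nprocs, data) for r in range(nprocs))
```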
    219. 222. Condor <ul><li>Cooperating workstations: come and go.
    220. 223. Migratory programs </li></ul><ul><ul><li>Checkpointing
    221. 224. Remote IO </li></ul></ul><ul><li>Resource matching
    222. 225. http://www.cs.wisc.edu/condor/ </li></ul>
    223. 226. Migration of Jobs <ul><li>Policies </li></ul><ul><ul><li>Immediate-Eviction
    224. 227. Pause-and-Migrate </li></ul></ul><ul><li>Technical Issues </li></ul><ul><ul><li>Check-pointing: Preserving the state of the process so it can be resumed.
    225. 228. Migrating from one architecture to another </li></ul></ul>
    226. 229. OpenMosix Distro <ul><li>Quantian Linux </li></ul><ul><ul><li>Boot from DVD-ROM
    227. 230. Compressed file system on DVD
    228. 231. Several GB of cluster software
    229. 232. http://dirk.eddelbuettel.com/quantian.html </li></ul></ul><ul><li>Live CD/DVD or Single Floppy Bootables </li></ul><ul><ul><li>http://bofh.be/clusterknoppix/
    230. 233. http://sentinix.org/
    231. 234. http://itsecurity.mq.edu.au/chaos/
    232. 235. http://openmosixloaf.sourceforge.net/
    233. 236. http://plumpos.sourceforge.net/
    234. 237. http://www.dynebolic.org/
    235. 238. http://bccd.cs.uni.edu/
    236. 239. http://eucaristos.sourceforge.net/
    237. 240. http://gomf.sourceforge.net/ </li></ul></ul><ul><li>Can be installed on HDD </li></ul>
    238. 241. What is openMOSIX? <ul><li>An open source enhancement to the Linux kernel
    239. 242. Cluster with come-and-go nodes
    240. 243. System image model: Virtual machine with lots of memory and CPU
    241. 244. Granularity: Process
    242. 245. Improves the overall (cluster-wide) performance.
    243. 246. Multi-user, time-sharing environment for the execution of both sequential and parallel applications
    244. 247. Applications unmodified (no need to link with special library) </li></ul>
    245. 248. What is openMOSIX? <ul><li>Execution environment: </li></ul><ul><ul><li>farm of diskless x86 based nodes
    246. 249. UP (uniprocessor), or
    247. 250. SMP (symmetric multi processor)
    248. 251. connected by standard LAN (e.g., Fast Ethernet) </li></ul></ul><ul><li>Adaptive resource management to dynamic load characteristics </li></ul><ul><ul><li>CPU, RAM, I/O, etc. </li></ul></ul><ul><li>Linear scalability </li></ul>
    249. 252. Users’ View of the Cluster <ul><li>Users can start from any node in the cluster, or the sysadmin sets up a few nodes as login nodes
    250. 253. Round-robin DNS: “hpc.clusters” with many IPs assigned to the same name
    251. 254. Each process has a Home-Node </li></ul><ul><ul><li>Migrated processes always appear to run at the home node, e.g., “ps” shows all your processes, even if they run elsewhere </li></ul></ul>
    252. 255. MOSIX architecture <ul><li>network transparency
    253. 256. preemptive process migration
    254. 257. dynamic load balancing
    255. 258. memory sharing
    256. 259. efficient kernel communication
    257. 260. probabilistic information dissemination algorithms
    258. 261. decentralized control and autonomy </li></ul>
    259. 262. A two tier technology <ul><li>Information gathering and dissemination </li></ul><ul><ul><li>Support scalable configurations by probabilistic dissemination algorithms
    260. 263. Same overhead for 16 nodes or 2056 nodes </li></ul></ul><ul><li>Pre-emptive process migration that can migrate any process, anywhere, anytime - transparently </li></ul><ul><ul><li>Supervised by adaptive algorithms that respond to global resource availability
    261. 264. Transparent to applications, no change to user interface </li></ul></ul>
    262. 265. Tier 1: Information gathering and dissemination <ul><li>In each unit of time (e.g., 1 second) each node gathers information about: </li></ul><ul><ul><li>CPU(s) speed, load and utilization
    263. 266. Free memory
    264. 267. Free proc-table/file-table slots </li></ul></ul><ul><li>Info sent to a randomly selected node
    265. 268. Scalable - more nodes better scattering </li></ul>
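A toy model of the probabilistic dissemination idea above: in each unit of time, every node that holds a piece of information forwards it to one randomly selected node. The node count, round count, and seeding are illustrative; real MOSIX exchanges load vectors, not a single fact.

```python
import random

def disseminate(n_nodes, rounds, seed=0):
    """Gossip sketch: `knows` is the set of nodes holding node 0's info.
    Each round, every knowing node sends it to one random node, so
    coverage tends to grow quickly regardless of cluster size."""
    rng = random.Random(seed)
    knows = {0}
    for _ in range(rounds):
        for _sender in list(knows):
            knows.add(rng.randrange(n_nodes))  # info sent to a random node
    return len(knows)
```

This is why the per-node overhead is the same for 16 nodes or thousands: each node still sends one message per unit of time.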
    266. 269. Tier 2: Process migration <ul><li>Load balancing: reduce variance between pairs of nodes to improve the overall performance
    267. 270. Memory ushering: migrate processes from a node that has nearly exhausted its free memory, to prevent paging
    268. 271. Parallel File I/O: bring the process to the file-server, direct file I/O from migrated processes </li></ul>
    269. 272. Network transparency <ul><li>The user and applications are provided a virtual machine that looks like a single machine.
    270. 273. Example: Disk access from diskless nodes on fileserver is completely transparent to programs </li></ul>
    271. 274. Preemptive process migration <ul><li>Any user’s process, transparently and at any time, may migrate to any other node.
    272. 275. The migrating process is divided into: </li><ul><li>system context ( deputy ) that may not be migrated from the home workstation (UHN);
    273. 276. user context ( remote ) that can be migrated to a diskless node; </li></ul></ul>
    274. 277. Splitting the Linux process <ul><li>System context (environment): site dependent; “home” confined
    275. 278. Connected by an exclusive link for both synchronous (system calls) and asynchronous (signals, MOSIX events)
    276. 279. Process context (code, stack, data): site independent; may migrate </li></ul>[Diagram: the deputy stays in the kernel on the local master node; the remote user context runs on a diskless node; deputy and remote are connected by the openMOSIX link]
    277. 280. Dynamic load balancing <ul><li>Initiates process migrations in order to balance the load of the farm
    278. 281. responds to variations in the load of the nodes, runtime characteristics of the processes, number of nodes and their speeds
    279. 282. makes continuous attempts to reduce the load differences among nodes
    280. 283. the policy is symmetrical and decentralized </li><ul><li>all of the nodes execute the same algorithm
    281. 284. the reduction of the load differences is performed independently by any pair of nodes </li></ul></ul>
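One such pairwise, decentralized step might look like the sketch below. The halve-the-difference policy and integer load units are simplifications of MOSIX's actual adaptive algorithms, which also weigh CPU speed, memory, and process runtime characteristics.

```python
def balance_pair(loads, a, b):
    """One symmetric step run independently by a pair of nodes:
    migrate enough work from the busier node to (roughly) halve
    the load difference between the two."""
    if loads[a] < loads[b]:
        a, b = b, a                        # a is now the busier node
    transfer = (loads[a] - loads[b]) // 2  # processes to migrate
    loads[a] -= transfer
    loads[b] += transfer
    return loads
```

Because every pair keeps shrinking its own difference, the cluster-wide load variance falls without any central coordinator.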
    282. 285. The ACE ORB <ul><li>What Is CORBA?
    283. 286. CORBA Basics </li></ul><ul><ul><li>Clients, Servers, and Servants
    284. 287. ORBs and POAs
    285. 288. IDL and the Role of IDL Compilers
    286. 289. IORs
    287. 290. Tying it all together </li></ul></ul><ul><li>Overview of ACE/TAO
    288. 291. CORBA Services </li></ul><ul><ul><li>Naming Service
    289. 292. Trading Service
    290. 293. Event Service </li></ul></ul><ul><li>Multi-Threaded Issues Using CORBA </li></ul>
    291. 294. What Is CORBA? <ul><li>Common Object Request Broker Architecture </li></ul><ul><ul><li>Common Architecture
    292. 295. Object Request Broker – ORB </li></ul></ul><ul><li>Specification from the OMG </li></ul><ul><ul><li>http://www.omg.org/technology/documents/corba_spec_catalog.htm
    293. 296. Must be implemented before usable </li></ul></ul>
    294. 297. What Is CORBA? <ul><li>More specifically: </li></ul><ul><ul><li>“ ( CORBA ) is a standard defined by the Object Management Group (OMG) that enables software components written in multiple computer languages and running on multiple computers to work together ” (1)
    295. 298. Allows for Object Interoperability, regardless of: </li></ul></ul><ul><ul><ul><li>Operating Systems
    296. 299. Programming Language
    297. 300. Takes care of Marshalling and Unmarshalling of Data </li></ul></ul></ul><ul><ul><li>A method to perform Distributed Computing </li></ul></ul>
    298. 301. What Is CORBA? Program A <ul><li>Running on a Windows PC
    299. 302. Written in Java </li></ul>Program B <ul><li>Running on a Linux Machine
    300. 303. Written in C++ </li></ul>CORBA
    301. 304. CORBA Basics: Clients, Servers, and Servants <ul><li>CORBA Clients </li></ul><ul><ul><li>An Application (program)
    302. 305. Request services from Servant object </li></ul></ul><ul><ul><ul><li>Invoke a method call </li></ul></ul></ul><ul><ul><li>Can exist on a different computer from Servant </li></ul></ul><ul><ul><ul><li>Can also exist on same computer, or even within the same program, as the Servant </li></ul></ul></ul><ul><ul><li>Implemented by Software Developer </li></ul></ul>
    303. 306. CORBA Basics: Clients, Servers, and Servants <ul><li>CORBA Servers </li></ul><ul><ul><li>An Application (program)
    304. 307. Performs setup needed to get Servants configured properly </li></ul></ul><ul><ul><ul><li>ORB’s, POA’s </li></ul></ul></ul><ul><ul><li>Instantiates and starts Servants object(s)
    305. 308. Once configuration is done and the Servant(s) are running, Clients can begin to send messages
    306. 309. Implemented by Software Developer </li></ul></ul>
    307. 310. CORBA Basics: Clients, Servers, and Servants <ul><li>Servants </li></ul><ul><ul><li>Objects
    308. 311. Implement interfaces
    309. 312. Respond to Client requests
    310. 313. Exist within the same program as the Server that created and started them
    311. 314. Implemented by Software Developer </li></ul></ul>
    312. 315. ORB’s and POA’s <ul><li>ORB: Object Request Broker </li></ul><ul><ul><li>The “ORB” in “CORBA” </li></ul></ul><ul><ul><ul><li>At the heart of CORBA </li></ul></ul></ul><ul><ul><li>Enables communication
    313. 316. Implemented by an ORB Vendor </li></ul></ul><ul><ul><ul><li>An organization that implements the CORBA Specification (a company, a University, etc.) </li></ul></ul></ul><ul><ul><li>Can be viewed as an API/Framework </li></ul></ul><ul><ul><ul><li>Set of classes and methods </li></ul></ul></ul><ul><ul><li>Used by Clients and Servers to properly set up communication </li></ul></ul><ul><ul><ul><li>Client and Server ORB’s communicate over a network
    314. 317. Glue between Client and Server applications </li></ul></ul></ul>
    315. 318. ORB’s and POA’s <ul><li>POA: Portable Object Adapter </li></ul><ul><ul><li>A central CORBA goal: Programs using different ORB’s (provided by different ORB Vendors) can still communicate
    316. 319. The POA was adopted as the solution
    317. 320. Can be viewed as an API/Framework </li></ul></ul><ul><ul><ul><li>Set of classes and methods </li></ul></ul></ul><ul><ul><li>Sits between ORB’s and Servants </li></ul></ul><ul><ul><ul><li>Glue between Servants and ORBs </li></ul></ul></ul><ul><ul><li>Job is to: </li></ul></ul><ul><ul><ul><li>Receive messages from ORB’s
    318. 321. Activate the appropriate Servant
    319. 322. Deliver the message to the Servant </li></ul></ul></ul>
    320. 323. CORBA Basics: IDL <ul><li>IDL: The Interface Definition Language </li></ul><ul><ul><li>Keyword: Definition </li></ul></ul><ul><ul><ul><li>No “executable” code (cannot implement anything)
    321. 324. Very similar to C++ Header Files
    322. 325. Independent of the Target Language </li></ul></ul></ul><ul><ul><ul><ul><li>Allows Client and Server applications to be written in different (several) languages </li></ul></ul></ul></ul><ul><ul><li>A “contract” between Clients and Servers </li></ul></ul><ul><ul><ul><li>Both MUST have the exact same IDL
    323. 326. Specifies messages and data that can be sent by Clients and received by Servants </li></ul></ul></ul><ul><ul><li>Written by Software Developer </li></ul></ul>
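The "contract" idea can be sketched in Python with an abstract base class standing in for an IDL interface. The `Thermometer` interface here is hypothetical, and in real CORBA the equivalent classes are generated by the IDL compiler rather than written by hand:

```python
from abc import ABC, abstractmethod

# Hypothetical contract mirroring an IDL interface such as:
#   interface Thermometer { double read_celsius(); };
class Thermometer(ABC):
    @abstractmethod
    def read_celsius(self) -> float: ...

class ThermometerServant(Thermometer):
    """Servant: the developer-written implementation of the interface.
    A client holding only the Thermometer contract can call it."""
    def read_celsius(self) -> float:
        return 21.5   # made-up reading for illustration

servant = ThermometerServant()
```

As with IDL, the interface cannot be instantiated directly; only implementations of it can.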
    324. 327. CORBA Basics: IDL <ul><li>Used to define interfaces (i.e. Servants) </li></ul><ul><ul><li>Classes and methods that provide services </li></ul></ul><ul><li>IDL Provides… </li></ul><ul><ul><li>Primitive Data Types (int, float, boolean, char, string)
    325. 328. Ability to compose primitives into more complex data structures
    326. 329. Enumerations, Unions, Arrays, etc.
    327. 330. Object-Oriented Inheritance </li></ul></ul>
    328. 331. CORBA Basics: IDL <ul><li>IDL Compilers </li></ul><ul><ul><li>Converts IDL files to target language files
    329. 332. Done via Language Mappings </li></ul></ul><ul><ul><ul><li>Useful to understand your Language Mapping scheme </li></ul></ul></ul><ul><ul><li>Target language files contain all the implementation code that facilitates CORBA-based communication </li></ul></ul><ul><ul><ul><li>More or less “hides” the details from you </li></ul></ul></ul><ul><ul><li>Creates client “stubs” and Server “skeletons”
    330. 333. Provided by ORB Vendor </li></ul></ul>
    331. 334. CORBA Basics: IDL [Diagram] The IDL Compiler reads the IDL File and Generates both Client Stub Files and Server Skeleton Files. The Generated Files are in the Target Language: <ul><li>C++
    332. 335. Java
    333. 336. etc. </li></ul>Client Programs use the classes in the Client Stub files (an association) to send messages to the Servant objects; Servant objects inherit from classes in the Server Skeleton files to receive messages from the Client programs.
    336. 339. CORBA Basics: IDL <ul><li>Can also generate empty Servant class files </li></ul>[Diagram: the IDL Compiler converts the IDL file to C++, in this case]
    337. 340. CORBA Basics: IOR’s <ul><li>IOR: Interoperable Object Reference </li></ul><ul><ul><li>Can be thought of as a “Distributed Pointer”
    338. 341. Unique to each Servant
    339. 342. Used by ORB’s and POA’s to locate Servants </li></ul></ul><ul><ul><ul><li>For Clients, used to find Servants across networks
    340. 343. For Servers, used to find proper Servant running within the application </li></ul></ul></ul><ul><ul><li>Opaque to Client and Server applications </li></ul></ul><ul><ul><ul><li>Only meaningful to ORB’s and POA’s
    341. 344. Contains information about IP Address, Port Numbers, networking protocols used, etc. </li></ul></ul></ul><ul><ul><li>The difficult part is obtaining them </li></ul></ul><ul><ul><ul><li>This is the purpose/reasoning behind developing and using CORBA Services </li></ul></ul></ul>
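A toy model of an object reference shows the idea of an opaque, stringifiable handle. This is not the real IOR wire format (which is CDR-encoded and standardized by the OMG); the fields and encoding below are illustrative only:

```python
from dataclasses import dataclass
import base64, pickle

@dataclass(frozen=True)
class ObjectRef:
    """Toy stand-in for an IOR: opaque to applications,
    meaningful only to the ORB/POA machinery."""
    host: str
    port: int
    object_key: bytes   # identifies the Servant inside the server process

def stringify(ref):
    # Real ORBs emit "IOR:<hex of a CDR-encoded profile>"; this mock
    # just hex-encodes a pickle to get the same opaque look.
    return "IOR:" + base64.b16encode(pickle.dumps(ref)).decode()

def destringify(s):
    return pickle.loads(base64.b16decode(s[len("IOR:"):]))

ref = ObjectRef("10.0.0.5", 2809, b"Servant/42")
```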
    342. 345. CORBA Basics: IOR’s <ul><li>Can be viewed in “stringified” format, but… </li></ul><ul><ul><li>Still not very meaningful </li></ul></ul>
    343. 346. CORBA Basics: IOR’s <ul><li>Standardized, to some degree: </li></ul>[Diagram] The first portion of an IOR is standardized by the OMG: <ul><li>Used by Client-side ORB’s to locate the Server-side (destination) ORB’s
    344. 347. Contains the information needed to make the physical connection </li></ul>The remainder is NOT standardized by the OMG; it is proprietary to the ORB Vendor: <ul><li>Used by Server-side ORB’s and POA’s to locate the destination Servants </li></ul>
    345. 348. CORBA Basics: Tying it All Together
    346. 349. CORBA Basics: Tying it All Together [Diagram: in the logical flow, the Client Program holds an IOR (Servant Ref) and sends Message(Data) directly to the Servant in the Server Program; in the actual flow, the message travels from the Client ORB over the network to the Server ORB, through the POA, and on to the Servant] Once ORB’s and POA’s are set up and configured properly, this transparency is possible <ul><li>ORB’s communicate over the network
    347. 350. POA’s activate servants and deliver messages </li></ul>
    348. 351. Overview of ACE/TAO <ul><li>ACE: Adaptive Communications Environment </li></ul><ul><ul><li>Object-Oriented Framework/API
    349. 352. Implements many concurrent programming design patterns
    350. 353. Can be used to build more complex communications-based packages </li></ul></ul><ul><ul><ul><li>For example, an ORB </li></ul></ul></ul>
    351. 354. Overview of ACE/TAO <ul><li>TAO: The ACE ORB </li></ul><ul><ul><li>Built on top of ACE
    352. 355. A CORBA implementation
    353. 356. Includes many (if not all) CORBA features specified by the OMG </li></ul></ul><ul><ul><ul><li>Not just an ORB
    354. 357. Provides POA’s, CORBA Services, etc. </li></ul></ul></ul><ul><ul><li>Object-Oriented Framework/API </li></ul></ul>
    355. 358. CORBA Services: The Naming Service <ul><li>The CORBA Naming Service is similar to the White Pages (phone book)
    356. 359. Servants place their “names,” along with their IOR’s, into the Naming Service </li></ul><ul><ul><li>The Naming Service stores these as (name, IOR) pairs </li></ul></ul><ul><li>Later, Clients obtain IOR’s from the Naming Service by passing the name of the Servant object to it </li></ul><ul><ul><li>The Naming Service returns the IOR </li></ul></ul><ul><li>Clients may then use the IOR to make requests </li></ul>
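A minimal sketch of the white-pages idea, with a flat dict standing in for the real CORBA Naming Service (which uses hierarchical NamingContexts); the service name and IOR string below are placeholders:

```python
class NamingService:
    """White pages: name -> IOR lookup (toy sketch)."""
    def __init__(self):
        self._bindings = {}

    def bind(self, name, ior):
        # Server side: advertise a Servant under a well-known name.
        self._bindings[name] = ior

    def resolve(self, name):
        # Client side: look up the IOR by name.
        return self._bindings[name]

ns = NamingService()
ns.bind("WeatherService", "IOR:000102...")   # placeholder IOR string
ior = ns.resolve("WeatherService")
```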
    357. 360. CORBA Services: The Trading Service <ul><li>The CORBA Trading Service is similar to the Yellow Pages (phone book)
    358. 361. Servants place a description of the services they can provide (i.e. their “Trades”), along with their IOR’s, into the Trading Service </li></ul><ul><ul><li>The Trading Service stores these </li></ul></ul><ul><li>Clients obtain IOR’s from the Trading Service by passing the type(s) of Services they require </li></ul><ul><ul><li>The Trading Service returns an IOR </li></ul></ul><ul><li>Clients may then use the IOR to make requests </li></ul>
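The yellow-pages idea differs from naming in that clients query by service *type* rather than by name. A minimal sketch, with made-up service types and IOR strings:

```python
class TradingService:
    """Yellow pages: clients ask for a service type and get back the
    IORs of all servants offering it (toy sketch)."""
    def __init__(self):
        self._offers = []   # list of (service_type, ior) pairs

    def export(self, service_type, ior):
        # Server side: advertise what the servant can do (its "trade").
        self._offers.append((service_type, ior))

    def query(self, service_type):
        # Client side: find every offer matching the required type.
        return [ior for t, ior in self._offers if t == service_type]

ts = TradingService()
ts.export("printing", "IOR:AAA...")   # placeholder IOR strings
ts.export("printing", "IOR:BBB...")
printers = ts.query("printing")
```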
    359. 362. Multi-Threaded Issues Using CORBA <ul><li>Server performance can be improved by using multiple threads </li></ul><ul><ul><li>GUI Thread
    360. 363. Listening Thread
    361. 364. Processing Thread </li></ul></ul><ul><li>Can also use multiple ORBs and POAs to improve performance </li></ul><ul><ul><li>Requires a multi-threaded solution </li></ul></ul>
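The listening/processing split can be sketched with a queue between two threads, so the listener never blocks on request handling. This is a generic Python sketch, not TAO's actual threading model; the messages are made up:

```python
import queue, threading

requests = queue.Queue()
results = []

def listener():
    """Stand-in for a listening thread: accepts incoming requests
    and enqueues them without doing any processing itself."""
    for msg in ("ping", "ping", "stop"):
        requests.put(msg)

def processor():
    """Processing thread: drains the queue and does the real work."""
    while True:
        msg = requests.get()
        if msg == "stop":
            break
        results.append(msg + "/pong")

t1 = threading.Thread(target=listener)
t2 = threading.Thread(target=processor)
t1.start(); t2.start()
t1.join(); t2.join()
```

The same decoupling is why multiple ORBs/POAs require a multi-threaded solution: each needs its own thread of control to stay responsive.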
    362. 365. What is Grid Computing? <ul><li>Computational Grids </li></ul><ul><ul><li>Homogeneous (e.g., Clusters)
    363. 366. Heterogeneous (e.g., with one-of-a-kind instruments) </li></ul></ul><ul><li>Cousins of Grid Computing
    364. 367. Methods of Grid Computing </li></ul>
    365. 368. Computational Grids <ul><li>A network of geographically distributed resources including computers, peripherals, switches, instruments, and data.
    366. 369. Each user should have a single login account to access all resources.
    367. 370. Resources may be owned by diverse organizations. </li></ul>
    368. 371. Computational Grids <ul><li>Grids are typically managed by gridware.
    369. 372. Gridware can be viewed as a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance, availability…) </li></ul>
    370. 373. Cousins of Grid Computing <ul><li>Parallel Computing
    371. 374. Distributed Computing
    372. 375. Peer-to-Peer Computing
    373. 376. Many others: Cluster Computing, Network Computing, Client/Server Computing, Internet Computing, etc... </li></ul>
    374. 377. Distributed Computing <ul><li>People often ask: Is Grid Computing a fancy new name for the concept of distributed computing?
    375. 378. In general, the answer is “no.” Distributed Computing is most often concerned with distributing the load of a program across two or more processes. </li></ul>
    376. 379. Peer-to-Peer Computing <ul><li>Sharing of computer resources and services by direct exchange between systems.
    377. 380. Computers can act as clients or servers depending on what role is most efficient for the network. </li></ul>
    378. 381. Methods of Grid Computing <ul><li>Distributed Supercomputing
    379. 382. High-Throughput Computing
    380. 383. On-Demand Computing
    381. 384. Data-Intensive Computing
    382. 385. Collaborative Computing
    383. 386. Logistical Networking </li></ul>
    384. 387. Distributed Supercomputing <ul><li>Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer.
    385. 388. Tackle problems that cannot be solved on a single system. </li></ul>
    386. 389. High-Throughput Computing <ul><li>Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work. </li></ul>
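The high-throughput pattern, farming out many independent tasks, can be sketched locally with a thread pool standing in for idle grid nodes; `simulate` is a made-up stand-in for one loosely coupled task, such as a single parameter-sweep point:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(task_id):
    """One independent task; no communication with other tasks, which is
    what makes the workload embarrassingly parallel / high-throughput."""
    return task_id * task_id

# The pool plays the role of the grid scheduler soaking up unused cycles.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, range(8)))
```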
    387. 390. On-Demand Computing <ul><li>Uses grid capabilities to meet short-term requirements for resources that are not locally accessible.
    388. 391. Models real-time computing demands. </li></ul>
    389. 392. Data-Intensive Computing <ul><li>The focus is on synthesizing new information from data that is maintained in geographically distributed repositories, digital libraries, and databases.
    390. 393. Particularly useful for distributed data mining. </li></ul>
    391. 394. Collaborative Computing <ul><li>Concerned primarily with enabling and enhancing human-to-human interactions.
    392. 395. Applications are often structured in terms of a virtual shared space. </li></ul>
    393. 396. Logistical Networking <ul><li>Global scheduling and optimization of data movement.
    394. 397. Contrasts with traditional networking, which does not explicitly model storage resources in the network.
    395. 398. Called &quot;logistical&quot; because of the analogy it bears with the systems of warehouses, depots, and distribution channels. </li></ul>
    396. 399. Globus <ul><li>A collaboration of Argonne National Laboratory’s Mathematics and Computer Science Division, the University of Southern California’s Information Sciences Institute, and the University of Chicago's Distributed Systems Laboratory.
    397. 400. Started in 1996 and is gaining popularity year after year. </li></ul>
    398. 401. Globus <ul><li>A project to develop the underlying technologies needed for the construction of computational grids.
    399. 402. Focuses on execution environments for integrating widely-distributed computational platforms, data resources, displays, special instruments and so forth. </li></ul>
    400. 403. The Globus Toolkit <ul><li>The Globus Resource Allocation Manager (GRAM) </li></ul><ul><ul><li>Creates, monitors, and manages services.
    401. 404. Maps requests to local schedulers and computers. </li></ul></ul><ul><li>The Grid Security Infrastructure (GSI) </li></ul><ul><ul><li>Provides authentication services. </li></ul></ul>
    402. 405. The Globus Toolkit <ul><li>The Monitoring and Discovery Service (MDS) </li></ul><ul><ul><li>Provides information about system status, including server configurations, network status, and locations of replicated datasets, etc. </li></ul></ul><ul><li>Nexus and globus_io </li></ul><ul><ul><li>provides communication services for heterogeneous environments. </li></ul></ul>
    403. 406. What are Clouds? <ul><li>Clouds are “Virtual Clusters” (“Virtual Grids”) of possibly “Virtual Machines” </li><ul><li>They may cross administrative domains or may “just be a single cluster”; the user cannot and does not want to know </li></ul><li>Clouds support access (lease of) computer instances </li><ul><li>Instances accept data and job descriptions (code) and return results that are data and status flags </li></ul><li>Each Cloud is a “Narrow” (perhaps internally proprietary) Grid
    404. 407. Clouds can be built from Grids
    405. 408. Grids can be built from Clouds </li></ul>
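The lease model described above can be sketched as a toy cloud that hands out instance handles until capacity is reached. This is EC2-style run/terminate in spirit only; no real EC2 API is used, and the instance-id format is invented:

```python
class Cloud:
    """Toy model of instance leasing: acquire an instance, use it,
    release it. Billing, images, and networking are ignored."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.running = set()
        self._next_id = 0

    def run_instance(self):
        if len(self.running) >= self.capacity:
            # A real cloud might queue the request instead of failing.
            raise RuntimeError("capacity exceeded")
        self._next_id += 1
        iid = f"i-{self._next_id:04d}"   # invented id format
        self.running.add(iid)
        return iid

    def terminate(self, iid):
        self.running.remove(iid)

cloud = Cloud(capacity=2)
a = cloud.run_instance()
b = cloud.run_instance()
```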
    406. 409. Virtualization and Cloud Computing <ul><li>The Virtues of Virtualization </li><ul><li>Portable environments, enforcement and isolation, fast to deploy, suspend/resume, migration… </li></ul><li>Cloud computing </li><ul><li>SaaS: software as a service
    407. 410. Service: provide me with a workspace
    408. 411. Virtualization makes it easy to provide a workspace/VM </li></ul><li>Cloud computing </li><ul><li>resource leasing, utility computing, elastic computing
    409. 412. Amazon’s Elastic Compute Cloud (EC2) </li></ul><li>Is this real? Or is this just a proof-of-concept? </li><ul><li>Successfully used commercially on a large scale
    410. 413. More experience for scientific applications </li></ul></ul>Virtual Workspaces: http://workspace.globus.org
    411. 414. Two major types of cloud <ul><li>Compute and Data Cloud </li><ul><li>EC2, Google Map Reduce, Science clouds
    412. 415. Provision platform for running science codes
    413. 416. Open source infrastructure: workspace, eucalyptus, hub0
    414. 417. Virtualization: providing environments as VMs </li></ul><li>Hosting Cloud </li><ul><li>GoogleApp Engine
    415. 418. High availability, fault tolerance, robustness, etc. for Web capabilities
    416. 419. Community example: IU hosting environment (quarry) </li></ul></ul>Virtual Workspaces: http://workspace.globus.org
    417. 420. Technical Questions on Clouds <ul><li>How is data-compute affinity tackled in clouds? </li><ul><li>Co-locate data and compute clouds?
    418. 421. Lots of optical fiber i.e. “just” move the data? </li></ul><li>What happens in clouds when demand for resources exceeds capacity – is there a multi-day job input queue? </li><ul><li>Are there novel cloud scheduling issues? </li></ul><li>Do we want to link clouds (or ensembles as atomic clouds); if so, how and with what protocols?
    419. 422. Is there an intranet cloud e.g. “cloud in a box” software to manage personal (cores on my future 128 core laptop) department or enterprise cloud? </li></ul>
    420. 423. Thanks Much.. <ul><li>99% of the slides are taken from the Internet from various Authors. Thanks to all of them!
    421. 424. Sudarsun Santhiappan
    422. 425. Director – R & D
    423. 426. Burning Glass Technologies
    424. 427. Kilpauk, Chennai 600010 </li></ul>