Project AppleSeed: A Parallel Macintosh Cluster



Viktor K. Decyk, Dean E. Dauger, and Pieter R. Kokelaar
Department of Physics and Astronomy
University of California, Los Angeles
Los Angeles, CA 90095-1547

Abstract

We have constructed a parallel cluster consisting of 22 Apple Macintosh G3 and G4 computers running the MacOS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster.

Introduction

Members of the Plasma Simulation Group at UCLA have been actively involved in a number of High-Performance Computing (HPC) projects [1-2] and have pioneered the use of parallel Particle-in-Cell (PIC) codes on massively parallel architectures [3]. Parallel computers are necessary for realistic 3D PIC calculations. In recent years, the parallel technology originally developed for the HPC program has migrated to clusters of commodity computers. Although these clusters generally cannot handle the largest problems, they are inexpensive and useful for code development and give reasonable performance on certain medium scale problems.

The most common platform for building such a parallel cluster is based on the Pentium processor running the Linux version of Unix. Building and maintaining a Linux cluster, however, is difficult for the novice, and requires substantial expertise in the Unix operating system. Indeed, reference [4] discusses many of the details one needs to worry about.

Recently, we have written a communications library and supporting software that enable one to build and run an Apple Macintosh cluster. Such a cluster runs the same programs as the massively parallel computers, yet is very simple to build and maintain, and gives excellent performance for certain problems. This library and related files and utilities are available at our web site.

AppleSeed Software Implementation

The current standard for programming on distributed memory computers is MPI [5]. Although Apple computers do not support MPI, they do support a number of
other communications libraries in the MacOS. Since our parallel codes use only a limited subset of MPI, it was straightforward to write a partial implementation of MPI (34 subroutines), which we call MacMPI, based on these native communications libraries.

Our first implementation of MacMPI was based on AppleTalk and the PPC Toolbox. This implementation is very robust and reliable, since PPC Toolbox is very mature, and works with virtually every Macintosh, even older machines using the Motorola 680x0 processors. However, it does not give optimum performance, since it was written in an earlier era when network speeds were much slower. In order to obtain high performance, we developed another version of MacMPI based on the TCP/IP implementation of Open Transport, called MacMPI_IP, which is about 7 times faster than the original MacMPI for large messages and is also reliable. Both versions of MacMPI are available on our web site in Fortran77 and C.

A utility called Launch Den Mother (and associated Launch Puppies) has also been written to automate the procedure of selecting remote computers, copying the executable and associated input files, and starting the parallel application on each computer.

In order to make the Macintosh cluster more useful for teaching students how to develop parallel programs, we have added some enhancements to MacMPI. One of these is the monitoring of MPI messages. When a monitor switch is turned on, a small status window appears which shows which nodes are communicating and the size of messages, as well as speedometers indicating the percent of time the program is communicating and what communication speeds are being achieved. We have also made available a number of demonstration programs and sample source code. The most interesting of these is the Parallel Fractal Demo (including an Interactive version), which runs on an arbitrary number of nodes.
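
To illustrate how a fractal calculation can run on an arbitrary number of nodes, the following is a minimal sketch in C of one possible row-wise decomposition. It is not the code of the Parallel Fractal Demo: the choice of the Mandelbrot set, the mapping of pixels to the complex plane, and the names row_range, mandel_iterations, and compute_strip are all assumptions made for illustration. In an MPI program the values of rank and nproc would typically come from MPI_Comm_rank and MPI_Comm_size, and each node would send its finished strip to one node for display; that message passing is not shown here.

```c
#include <assert.h>

/* Hypothetical helper: split nrows image rows as evenly as possible
   among nproc nodes. The first (nrows % nproc) nodes get one extra row. */
void row_range(int rank, int nproc, int nrows, int *first, int *count)
{
    assert(nproc > 0 && rank >= 0 && rank < nproc && nrows >= 0);
    int base = nrows / nproc;
    int extra = nrows % nproc;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}

/* Escape-time iteration count for the Mandelbrot point c = (cr, ci).
   Assumption: the demo's actual fractal is not specified in the text. */
int mandel_iterations(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Compute the strip of rows owned by this node. The caller supplies
   out with room for (rows owned) * width integers. The image covers
   the region [-2, 1] x [-1.5, 1.5] (an arbitrary choice). */
void compute_strip(int rank, int nproc, int width, int height,
                   int max_iter, int *out)
{
    int first, count;
    assert(width > 1 && height > 1);
    row_range(rank, nproc, height, &first, &count);
    for (int r = 0; r < count; r++) {
        double ci = -1.5 + 3.0 * (double)(first + r) / (double)(height - 1);
        for (int c = 0; c < width; c++) {
            double cr = -2.0 + 3.0 * (double)c / (double)(width - 1);
            out[r * width + c] = mandel_iterations(cr, ci, max_iter);
        }
    }
}
```

Because each row is computed independently, the only communication in such a scheme is the final collection of strips. Messages are then few and large, which is the regime in which the cluster performs well.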

AppleSeed Hardware Implementation

It is quite trivial to build a Macintosh cluster, and the directions will fit on a single page. All one needs is a Fast Ethernet switch or hub and some Category 5 cables. One then plugs one end of each cable into the Ethernet jack on each Mac and the other end into a port on the switch. To set up the software, one needs to download the Launch Den Mother utility and Puppy and set appropriate permissions with some Control Panel switches in the MacOS. To test your cluster, we recommend downloading and running our Parallel Fractal Demo. If you plan to write your own MPI software, then you also need to download the appropriate MacMPI library.

Our current AppleSeed cluster consists of 12 user-owned machines and 10 common machines. The user-owned machines are used for normal daily activities in the daytime, but are generally available at night for numerical computing. The common machines are always available for numerical computing. All the Macintoshes have two 100BaseT Ethernet adapters and are connected to two networks simultaneously. One adapter is set to TCP/IP and the machines are all connected via a Cisco switch, which is then connected to a fast campus ATM
network. The other adapter is set to AppleTalk, and most of the machines are connected via an Asanté switch, which is then connected to a slower campus network. When the machines in the cluster are communicating only with each other, their communications are handled entirely by the switches and do not go on the campus networks. The 10 common machines are currently clustered in groups of 4, 4 and 2 in different offices. Each sub-cluster shares a single keyboard and monitor, by connecting each computer in the sub-cluster to a Master View USB KVM Switch.

Performance

The performance of this cluster was excellent for certain classes of problems, mainly those where communication was small compared to the calculation and the message packet size was large. Results for the large 3D benchmark described in Ref. [6] are summarized below. One can see that the Mac cluster performance was better than that achieved by the Cray T3E-900 when using the same number of nodes.

-----------------------------------------------------------
Computer                           Push Time     Loop Time
-----------------------------------------------------------
Mac G4/450, IP cluster, 8 proc:     772 nsec.   2756.9 sec.
Mac G4/450, IP cluster, 4 proc:    1928 nsec.   6715.3 sec.
-----------------------------------------------------------
Cray T3E-900, w/MPI, 8 proc:       1800 nsec.   6196.3 sec.
Cray T3E-900, w/MPI, 4 proc:       3844 nsec.  13233.7 sec.
-----------------------------------------------------------

The above times are for a 3D particle simulation, using 7,962,624 particles and a 64x32x128 mesh for 425 time steps.

To determine what message sizes gave good performance, we developed a ping-pong and swap benchmark in which pairs of processors exchange packets of equal size, with the bandwidth defined as twice the packet size divided by the time to exchange the data. The figure below shows a typical curve. As one can see, high bandwidth is achieved for message sizes of around 2^12 (4096) words or larger. The best bandwidth rates achieved on this test are better than 90% of the peak speed of the 100 Mbps hardware.
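
The bandwidth definition and the comparison with the hardware peak can be written out as a short C sketch. The function names are hypothetical, and the word size in bytes is left as a parameter because the text gives message sizes in words only. The peak of 12.5 MB/sec is simply 100 Mbps divided by 8 bits per byte. The speedup helper takes the ratio of two loop times, such as those in the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Peak rate of 100 Mbps Fast Ethernet, in MB/sec (1 MB = 10^6 bytes). */
#define PEAK_MB_PER_SEC 12.5

/* Bandwidth as defined in the text: twice the packet size divided by
   the time to exchange the data. bytes_per_word is an assumption the
   caller must supply, since the text does not state the word size. */
double swap_bandwidth_mb_per_sec(size_t words, size_t bytes_per_word,
                                 double seconds)
{
    assert(seconds > 0.0);
    double bytes_moved = 2.0 * (double)words * (double)bytes_per_word;
    return bytes_moved / seconds / 1.0e6;
}

/* Fraction of the 100 Mbps hardware peak that a measured rate represents. */
double fraction_of_peak(double mb_per_sec)
{
    return mb_per_sec / PEAK_MB_PER_SEC;
}

/* Ratio of a reference loop time to a measured loop time; values
   above 1 mean the measured machine finished sooner. */
double speedup(double reference_seconds, double measured_seconds)
{
    assert(measured_seconds > 0.0);
    return reference_seconds / measured_seconds;
}
```

As a purely hypothetical data point, a 4096-word packet of 4-byte words exchanged in 3.2768 msec would correspond to 10 MB/sec, or 80% of peak. Applied to the 8-processor loop times in the table, speedup(6196.3, 2756.9) gives a Cray-to-Mac ratio of about 2.25.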

For the 3D benchmark case described in [7], the average packet size varied between 2^13 and 2^17 words, which is in the region of good performance.

Conclusion

The AppleSeed cluster is particularly attractive for small groups with limited resources. It has been useful for student training, code development, and running large calculations for extended periods. This is especially convenient for unfunded research or exploratory projects, or when meeting short deadlines. It also encourages a more interactive style of computing.

[Figure: Ping-Pong Communications Test with MacMPI (Macintosh G3/350, 100BaseT with Switch). Vertical axis: Rate (MB/sec), 0.0 to 12.5. Horizontal axis: log2(data length (words)), 0 to 20. Curves: Open Transport/IP, Pentium II Linux cluster, Open Transport/AppleTalk, PPC Toolbox/AppleTalk.]

Acknowledgements

This work has been supported by NSF contracts DMS-9722121 and PHY 93-19198 and DOE contracts DE-FG03-98DP00211, DE-FG03-97ER25344, DE-FG03-86ER53225, and DE-FG03-92ER40727.

References

[1] R. D. Sydora, V. K. Decyk, and J. M. Dawson, “Fluctuation-induced heat transport results from a large global 3D toroidal particle simulation model,” Plasma Phys. Control. Fusion 38, A281 (1996).
[2] Ji Qiang, R. Ryne, S. Habib, and V. K. Decyk, “An Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators,” Proc. Supercomputing 99, Portland, Oregon, Nov. 1999, CD-ROM.
[3] P. C. Liewer and V. K. Decyk, “A General Concurrent Algorithm for Plasma Particle-in-Cell Codes,” J. Computational Phys. 85, 302 (1989).
[4] T. L. Sterling, J. Salmon, D. J. Becker, and D. F. Savarese, How to Build a Beowulf [MIT Press, Cambridge, MA, USA, 1999].
[5] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The Complete Reference [MIT Press, Cambridge, MA, 1996].
[6] V. K. Decyk, “Skeleton PIC Codes for Parallel Computers,” Computer Physics Communications 87, 87 (1995).