Project AppleSeed: A Parallel Macintosh Cluster
Viktor K. Decyk, Dean E. Dauger, and Pieter R. Kokelaar
Department of Physics and Astronomy
University of California, Los Angeles
Los Angeles, CA 90095-1547
We have constructed a parallel cluster consisting of 22 Apple Macintosh G3 and
G4 computers running the MacOS, and have achieved very good performance on
numerically intensive, parallel plasma particle-in-cell simulations. A subset of the
MPI message-passing library was implemented in Fortran77 and C. This library
enabled us to port code, without modification, from other parallel processors to the
Macintosh cluster. For large problems where message packets are large and
relatively few in number, performance of 50-150 MFlops/node is possible,
depending on the problem. Unlike Unix-based clusters, no special expertise in
operating systems is required to build and run the cluster.
Members of the Plasma Simulation Group at UCLA have been actively involved
in a number of High-Performance Computing (HPC) projects [1-2] and have
pioneered the use of parallel Particle-in-Cell (PIC) codes on massively parallel
architectures [3]. Parallel computers are necessary for realistic 3D PIC calculations.
In recent years, the parallel technology originally developed for the HPC program
has migrated to clusters of commodity computers. Although these clusters
generally cannot handle the largest problems, they are inexpensive and useful for
code development and give reasonable performance on certain medium scale
problems. The most common platform for building such a parallel cluster is based
on the Pentium processor running the Linux version of Unix. Building and
maintaining a Linux cluster, however, is difficult for the novice, and requires
substantial expertise in the Unix operating system. Indeed, reference [4] discusses
many of the details one needs to worry about. Recently, we have written a
communications library and supporting software which enables one to build and
run an Apple Macintosh cluster. Such a cluster runs the same programs as the
massively parallel computers, yet is very simple to build and maintain, and gives
excellent performance for certain problems. This library and related files and
utilities are available at our web site: http://exodus.physics.ucla.edu/appleseed/.
AppleSeed Software Implementation
The current standard for programming on distributed memory computers is MPI
[5]. Although Apple computers do not support MPI, they do support a number of
other communications libraries in the MacOS. Since our parallel codes use only a
limited subset of MPI, it was straightforward to write a partial implementation of
MPI (34 subroutines), which we call MacMPI, based on these native
communications libraries. Our first implementation of MacMPI was based on
AppleTalk and the PPC Toolbox. This implementation is very robust and reliable,
since PPC Toolbox is very mature, and works with virtually every Macintosh, even
older machines using the Motorola 680x0 processors. However, it does not give
optimum performance, since it was written in an earlier era when network speeds
were much slower. In order to obtain high performance, we developed another
version of MacMPI based on the TCP/IP implementation of Open Transport, called
MacMPI_IP, which for large messages is about 7 times faster than the original
MacMPI while remaining reliable. Both versions of MacMPI are available on our
web site in Fortran77 and C. A utility called Launch Den Mother (and associated
Launch Puppies) has also been written to automate the procedure of selecting
remote computers, copying the executable and associated input files, and starting the
parallel application on each computer.
In order to make the Macintosh cluster more useful for teaching students how to
develop parallel programs, we have added some enhancements to MacMPI. One of
these is the monitoring of MPI messages. When a monitor switch is turned on, a
small status window appears which shows which nodes are communicating, the
size of messages, as well as speedometers indicating the percent of time the program
is communicating and what communication speeds are being achieved.
We have also made available a number of demonstration programs and sample
source code. The most interesting of these is the Parallel Fractal Demo (including an
Interactive version) which runs on an arbitrary number of nodes.
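The way such a demo can divide its work across an arbitrary number of nodes is easy to sketch. The code below is a hypothetical illustration, not the demo's actual source: it assumes a cyclic row decomposition (node `rank` of `size` computes rows rank, rank+size, rank+2*size, ...), which balances load well for fractals, and the `rank` and `size` values that a real MPI program would obtain from MPI_Comm_rank and MPI_Comm_size are passed in as plain arguments.

```c
#include <stdio.h>

/* Iteration count for one point of the Mandelbrot set (escape-time test). */
static int mandel_point(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    for (int n = 0; n < max_iter; n++) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)       /* |z| > 2: the point has escaped */
            return n;
        zi = 2.0 * zr * zi + ci;   /* z <- z*z + c */
        zr = zr2 - zi2 + cr;
    }
    return max_iter;               /* assumed inside the set */
}

/* Each node computes every size-th row, starting at its own rank.
   In a real MPI code, rank and size would come from MPI_Comm_rank and
   MPI_Comm_size; here they are ordinary parameters for illustration. */
static void compute_rows(int rank, int size, int width, int height,
                         int max_iter, int *image /* height*width */) {
    for (int j = rank; j < height; j += size) {
        for (int i = 0; i < width; i++) {
            double cr = -2.0 + 3.0 * i / (width - 1);   /* re in [-2, 1] */
            double ci = -1.5 + 3.0 * j / (height - 1);  /* im in [-1.5, 1.5] */
            image[j * width + i] = mandel_point(cr, ci, max_iter);
        }
    }
}
```

Each node would then return its finished rows to node 0, e.g. with one MPI_Send per row. The cyclic distribution keeps all nodes busy even though rows crossing the set's interior cost far more than rows near the edge.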
AppleSeed Hardware Implementation
It is quite trivial to build a Macintosh cluster, and the directions will fit on a
single page. All one needs is a Fast Ethernet switch or hub and some Category 5
cables: plug one end of each cable into the Ethernet jack on each Mac and the
other end into a port on the switch. To set up the software, one needs to download the
Launch Den Mother utility and Puppy and set appropriate permissions with some
Control Panel switches in the MacOS. To test your cluster, we recommend
downloading and running our Parallel Fractal Demo. If you plan to write your own
MPI software, then you also need to download the appropriate MacMPI library.
Our current AppleSeed cluster consists of 12 user-owned machines and 10
common machines. The user-owned machines are used for normal daily activities
in the daytime, but are generally available at night for numerical computing. The
common machines are always available for numerical computing. All the
Macintoshes have two 100BaseT Ethernet adapters and are connected to two
networks simultaneously. One adapter is set to TCP/IP and the machines are all
connected via a Cisco switch, which is then connected to a fast campus ATM
network. The other adapter is set to AppleTalk, and most of the machines are
connected via an Asanté switch, which is then connected to a slower campus
network. When the machines in the cluster are communicating only with each
other, their communications are handled entirely by the switches and do not go on
the campus networks.
The 10 common machines are currently clustered in groups of 4, 4 and 2 in
different offices. Each sub-cluster shares a single keyboard and monitor, by
connecting each computer in the sub-cluster to a Master View USB KVM Switch.
The performance of this cluster was excellent for certain classes of problems,
mainly those where communication was small compared to the calculation and the
message packet size was large. Results for the large 3D benchmark described in Ref.
[6] are summarized below. One can see that the Mac cluster performance was better
than that achieved by the Cray T3E-900 when using the same number of nodes.
Computer                           Push Time     Loop Time
Mac G4/450, IP cluster, 8 proc:    772 nsec.     2756.9 sec.
Mac G4/450, IP cluster, 4 proc:    1928 nsec.    6715.3 sec.
Cray T3E-900, w/MPI, 8 proc:       1800 nsec.    6196.3 sec.
Cray T3E-900, w/MPI, 4 proc:       3844 nsec.    13233.7 sec.
The above times are for a 3D particle simulation, using 7,962,624
particles and a 64x32x128 mesh for 425 time steps.
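As a rough consistency check on these figures (assuming the push time is the cost to advance one particle for one time step), dividing the loop time by particles times steps should land slightly above the push time, since the loop also includes the field solve and communication:

```c
/* Per-particle, per-step cost implied by a total loop time from the
   table above. The result should sit slightly above the quoted push
   time, because the loop also includes the field solve and
   communication, not just the particle push. */
static double per_particle_ns(double loop_sec) {
    const double particles = 7962624.0;   /* benchmark particle count */
    const double steps = 425.0;           /* benchmark time steps     */
    return loop_sec / (particles * steps) * 1.0e9;
}
/* e.g. per_particle_ns(2756.9) is about 815 nsec for the 8-processor
   G4 cluster, versus the 772 nsec push time quoted in the table */
```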
To determine which message sizes give good performance, we developed a ping-
pong and swap benchmark, in which pairs of processors exchange packets of equal
size and the bandwidth is defined as twice the packet size divided by the time to
exchange the data. The figure below shows a typical curve. As one can see, high
bandwidth is achieved for message sizes of around 2^12 (4096) words or larger. Best
bandwidth rates achieved on this test are better than 90% of the peak speed of the
100 Mbps hardware.
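The bandwidth definition used in this benchmark can be written out explicitly. The sketch below assumes 4-byte words and simply applies the stated formula; the MPI timing loop around the exchange is omitted.

```c
/* Swap-test bandwidth as defined above: each of the two processors both
   sends and receives one packet, so 2 * (packet bytes) cross the wire
   per exchange. Assumes 4-byte words. */
static double swap_bandwidth_mbps(long words, double exchange_sec) {
    const double bytes_per_word = 4.0;
    double bits = 2.0 * words * bytes_per_word * 8.0;
    return bits / exchange_sec / 1.0e6;    /* megabits per second */
}
/* e.g. a 2^12-word (4096-word) packet exchanged in 2.8 msec gives about
   94 Mbps, consistent with the better-than-90%-of-peak figure above */
```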
For the 3D benchmark case described in Ref. [6], the average packet size varied
between 2^13 and 2^17 words, which is in the region of good performance.
The AppleSeed cluster is particularly attractive for small groups with limited
resources. It has been useful for student training, code development, and running
large calculations for extended periods. This is especially convenient for unfunded
research or exploratory projects, or when meeting short deadlines. It also
encourages a more interactive style of computing.
[Figure: Ping-pong communications test with MacMPI. Bandwidth versus message
size for the Macintosh G3/350 (100BaseT with switch), a Pentium II Linux cluster,
and PPC Toolbox/AppleTalk.]
This work was supported by NSF contracts DMS-9722121 and PHY 93-19198 and DOE
contracts DE-FG03-98DP00211, DE-FG03-97ER25344, DE-FG03-86ER53225, and DE-
References
[1] R. D. Sydora, V. K. Decyk, and J. M. Dawson, “Fluctuation-induced heat transport
results from a large global 3D toroidal particle simulation model,” Plasma Phys.
Control. Fusion 38, A281 (1996).
[2] Ji Qiang, R. Ryne, S. Habib, and V. K. Decyk, “An Object-Oriented Parallel Particle-
in-Cell Code for Beam Dynamics Simulation in Linear Accelerators,” Proc.
Supercomputing 99, Portland, Oregon, Nov. 1999, CD-ROM.
[3] P. C. Liewer and V. K. Decyk, “A General Concurrent Algorithm for Plasma
Particle-in-Cell Codes,” J. Computational Phys. 85, 302 (1989).
[4] T. L. Sterling, J. Salmon, D. J. Becker, and D. F. Savarese, How to Build a Beowulf
(MIT Press, Cambridge, MA, 1999).
[5] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The
Complete Reference (MIT Press, Cambridge, MA, 1996).
[6] V. K. Decyk, “Skeleton PIC Codes for Parallel Computers,” Computer Physics
Communications 87, 87 (1995).