568 IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY, VOL. 4, NO. 4, DECEMBER 2004
Review of Cooling Technologies
for Computer Products
Richard C. Chu, Robert E. Simons, Michael J. Ellsworth, Roger R. Schmidt, and Vincent Cozzolino
Invited Paper
Abstract—This paper provides a broad review of the cooling
technologies for computer products from desktop computers to
large servers. For many years cooling technology has played a key
role in enabling and facilitating the packaging and performance
improvements in each new generation of computers. The role of
internal and external thermal resistance in module level cooling
is discussed in terms of heat removal from chips and module
and examples are cited. The use of air-cooled heat sinks and
liquid-cooled cold plates to improve module cooling is addressed.
Immersion cooling as a scheme to accommodate high heat flux
at the chip level is also discussed. Cooling at the system level is
discussed in terms of air, hybrid, liquid, and refrigeration-cooled
systems. The growing problem of data center thermal manage-
ment is also considered. The paper concludes with a discussion of
future challenges related to computer cooling technology.
Index Terms—Air cooling, data center cooling, flow boiling, heat
sink, immersion cooling, impingement cooling, liquid cooling, pool
boiling, refrigeration cooling, system cooling, thermal, thermal
management, water cooling.
I. INTRODUCTION
ELECTRONIC devices and equipment now permeate virtually
every aspect of our daily life. Among the most
from the handheld personal digital assistant to large scale main-
frames or servers. In many instances a computer is embedded
within some other device controlling its function and is not
even recognizable as such. The applications of computers vary
from games for entertainment to highly complex systems sup-
porting vital health, economic, scientific, and military activities.
In a growing number of applications computer failure results
in a major disruption of vital services and can even have
life-threatening consequences. As a result, efforts to improve
the reliability of electronic computers are as important as ef-
forts to improve their speed and storage capacity.
Since the development of the first electronic digital computers
in the 1940s, the effective removal of heat has played a key role
in ensuring the reliable operation of successive generations of
computers. The Electrical Numerical Integrator and Computer
(ENIAC), dedicated in 1946, has been described as a “30 ton,
boxcar-sized machine requiring an array of industrial cooling
Manuscript received August 30, 2004.
The authors are with the IBM Corporation, Poughkeepsie, NY 12601 USA
(e-mail: [email protected]).
Digital Object Identifier 10.1109/TDMR.2004.840855
fans to remove the 140 kW dissipated from its 18 000 vacuum
tubes” [1]. Following ENIAC, most early digital computers used
vacuum-tube electronics and were cooled with forced air.
The invention of the transistor by Bardeen, Brattain, and
Shockley at Bell Laboratories in 1947 [2] foreshadowed the
development of generations of computers yet to come. As a
replacement for vacuum tubes, the miniature transistor gener-
ated less heat, was much more reliable, and promised lower
production costs. For a while it was thought that the use of
transistors would greatly reduce if not totally eliminate cooling
concerns. This thought was short-lived as packaging engineers
worked to improve computer speed and storage capacity by
packaging more and more transistors on printed circuit boards,
and then on ceramic substrates.
The trend toward higher packaging densities dramatically
gained momentum with the invention of the integrated cir-
cuit separately by Kilby at Texas Instruments and Noyce at
Fairchild Semiconductor in 1959 [2]. During the 1960s, small
scale and then medium scale integration (SSI and MSI) led
from one device per chip to hundreds of devices per chip. The
trend continued through the 1970s with the development of
large scale integration (LSI) technologies offering hundreds
to thousands of devices per chip, and then through the 1980s
with the development of very large scale (VLSI) technologies
offering thousands to tens of thousands of devices per chip.
This
trend continued with the introduction of the microprocessor
and continues to this day with chip makers projecting that a
microprocessor chip with a billion or more transistors will be a
reality before 2010.
Processor module cooling is typically characterized in two
ways: cooling internal and cooling external to the module
package; this distinction applies to both single-chip and
multichip modules. Fig. 2 illustrates
the distinction between the two cooling regimes in the context
of a single-chip module.
A. Internal Module Cooling
The primary mode of heat transfer internal to the module is by
conduction. The internal thermal resistance is therefore dictated
by the module’s physical construction and material properties.
The objective is to effectively transfer the heat from the elec-
tronics circuits to an outer surface of the module where the heat
will be removed by external means which will be discussed in
the following section.
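The series view of internal and external resistance can be sketched numerically. The function and all values below are hypothetical, chosen only to illustrate how the two resistances add along the heat-flow path from junction to ambient:

```python
# Series thermal-resistance sketch for a single-chip module.
# All numbers are hypothetical, chosen only to show how the internal
# (conduction) and external (heat sink) resistances add.

def junction_temp(power_w, r_internal, r_external, t_ambient_c):
    """Junction temperature for heat flowing through two series resistances."""
    return t_ambient_c + power_w * (r_internal + r_external)

# 75 W chip, 0.10 degC/W internal, 0.25 degC/W external, 35 degC inlet air:
tj = junction_temp(75.0, 0.10, 0.25, 35.0)
print(f"junction temperature = {tj:.2f} degC")  # 35 + 75 * 0.35 = 61.25
```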
In the case of large multichip modules (MCMs) where
variation in the location and height of chips had to be
considered,
an approach (Figs. 3 and 4) was adopted that employed a
spring-loaded mechanical cylindrical piston touching each chip
with point contact and minute physical gaps between the chip
and piston and between the piston and module housing [3].
Fig. 2. Cross-section of a typical module denoting internal
cooling region and
external cooling region.
Fig. 3. Isometric cutaway view of an IBM TCM module with a
water-cooled
cold plate.
Fig. 4. Cross-sectional view of an IBM TCM module on an
individual chip
site basis.
The volume within the module was filled with helium gas to
minimize the thermal resistance across the gaps and achieve
an acceptable internal thermal resistance. The total module
cooling assembly was patented as a gas-encapsulated module
[4] and later named a thermal conduction module (TCM). TCM
cooling technology evolved through three generations of IBM
mainframes: system 3081, ES/3090, and ES/9000, with about
a threefold increase in cooling capability from 19 to 64 W/cm²
at the chip level and 3.7 to 11.8 W/cm² at the module level [5].
Fig. 5. Cross-sectional view of a Hitachi M-880 module on an individual chip site basis.
The last generation TCM incorporated a copper piston (the
original piston was aluminum) with a cylindrical center section
and a slight taper on each end to minimize the gap between
piston and cap while retaining intimate contact between the
piston face and the chip [6]. Additionally, the volume inside the
module was filled with a PAO (polyalphaolefin) oil instead of
helium to reduce the piston-to-cap and chip-to-piston thermal
resistances. Hitachi packaged a similar conduction scheme in
their M-880 [7] and MP5800 [8] processors. Instead of a
cylindrical piston Hitachi utilized an interdigitated microfin
structure (Fig. 5).
In the 1990s when IBM made the switch from bipolar to
CMOS circuit technology [10] the conduction cooling approach
was simplified and reduced in cost by adopting a “flat plate”
conduction approach as shown in Fig. 6. The thermal path from
chip to cap is provided by a controlled thickness (e.g., 0.10 mm
to 0.18 mm) of a thermally conductive paste. This was possible
largely due to improved planarity of the substrate, better control
of dimensional tolerances and enhanced thermal conductivity of
the paste.
As time went on, chip power levels continued to increase. In
addition, concentrated areas of high heat flux 2 to 3 times the
average chip heat flux referred to as hot spots emerged. To meet
internal thermal resistance requirements, in 2001 IBM chose to
attach a high-grade silicon carbide (SiC) spreader to the chip
with an adhesive thermal interface (ATI) and then use a more
conventional thermal paste between the spreader and the cap
[10]. This configuration is shown in Fig. 7.
The adhesive thermal interface (ATI), while not as thermally
conductive as the thermal paste, could be applied much thinner
resulting in a lower thermal resistance. SiC was chosen for the
spreader material for its unique combination of high thermal
conductivity and low coefficient of thermal expansion (CTE).
The CTE of the SiC closely matches that of the silicon chip thus
avoiding stress fracturing the interface when the module heats
up during use. The thermal resistance of this package arrange-
ment is lower than just using thermal paste between chip and
cap because of the use of the lower thermal resistance ATI on
the smaller chip area. The thermal paste thermal resistance is
mitigated by applying it over a much larger area.
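The area argument can be made concrete with a simple one-dimensional estimate, R = t/(kA), for each interface layer. All thicknesses, conductivities, and areas below are assumed illustrative values, not figures from the paper:

```python
def interface_resistance(thickness_m, k_w_per_mk, area_m2):
    """One-dimensional conduction resistance of an interface layer, degC/W."""
    return thickness_m / (k_w_per_mk * area_m2)

chip_area = 0.015 ** 2       # 15 mm x 15 mm chip (hypothetical)
spreader_area = 0.030 ** 2   # 30 mm x 30 mm spreader (hypothetical)

# Paste directly on the chip, versus a thin ATI on the chip plus paste
# spread over the larger spreader face (assumed thicknesses and k values):
paste_on_chip = interface_resistance(0.10e-3, 3.0, chip_area)
ati_plus_paste = (interface_resistance(0.02e-3, 1.0, chip_area)
                  + interface_resistance(0.10e-3, 3.0, spreader_area))
print(f"paste only:        {paste_on_chip:.3f} degC/W")
print(f"ATI plus spreader: {ati_plus_paste:.3f} degC/W")
```

Even though the ATI material is less conductive than the paste, its smaller thickness and the paste's larger footprint on the spreader give a lower total, mirroring the argument in the text.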
B. External Module Cooling
Cooling external to the module serves as the primary means
to effectively transfer the heat generated within the module to
Fig. 6. Cross-sectional view of central processor module
package with thermal
paste path to module cap [9].
Fig. 7. MCM cross-section showing heat spreader adhesively
attached to chip
(adapted from [10]).
the system environment. This is accomplished primarily by at-
taching a heat sink to the module. Traditionally, and prefer-
ably, the system environment of choice has been air because
of its ease of implementation, low cost, and transparency to
the end user or customer. This section, therefore, will focus
on air-cooled heat sinks. Liquid-cooled heat sinks typically re-
ferred to as cold plates will also be discussed.
1) Air-Cooled Heat Sinks: A typical air-cooled heat sink is
shown in Fig. 8. The heat sink is constructed of a base region
that is in contact with the module to be cooled. Fins protruding
from the base serve to extend surface area for heat transfer to
the air. Heat is conducted through the base, up into the fins and
then transferred to the air flowing in the spaces between the fins
by convection. The fin channels can run continuously in one
direction, as in a straight fin heat sink, or in two directions,
as in a pin fin heat sink (Fig. 9).
Air flow can either be through the heat sink laterally (in cross
flow) or can impinge from the top as seen in Fig. 10.
The thermal performance of the heat sink is a function of
many variables. Geometric variables include the thickness and
plan area of the base plus the fin thickness, height, and spacing.
The principal material variable is thermal conductivity. Also
factored in is volumetric air flow and pressure drop. Many opti-
mization studies have been conducted to minimize the external
thermal resistance for a particular set of application conditions
[11]–[13]. However, over time, as greater and greater thermal
performance has been required, fin heights and fin number have
increased while fin spacing has been decreased. Additionally,
heat sinks have migrated in construction from all aluminum
Fig. 8. Typical air-cooled heat sink.
Fig. 9. Typical (a) straight fin heat sink and (b) pin fin heat
sink.
Fig. 10. Air flow path through a heat sink: (a) cross flow or (b)
impingement.
(with thermal conductivity ranging from 150–200 W/mK) to
aluminum fins on copper bases (with thermal conductivity
ranging from 350–390 W/mK) to all copper. In certain cases
heat pipes have been embedded into heat sinks to more effec-
tively spread the heat [14]–[16].
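The interplay of the geometric and material variables above can be sketched with the classical straight-fin model (adiabatic-tip fin efficiency). The dimensions and the air-side heat transfer coefficient below are assumed, and base conduction and spreading resistance are ignored; this is an order-of-magnitude sketch, not a design tool:

```python
import math

def fin_efficiency(h, k, t, L):
    """Efficiency of a straight rectangular fin (adiabatic-tip approximation)."""
    m = math.sqrt(2.0 * h / (k * t))
    return math.tanh(m * L) / (m * L)

def sink_resistance(h, k, n_fins, t, L, W, base_area):
    """Convection-side resistance of a parallel-plate fin heat sink, degC/W.
    Ignores base conduction and spreading resistance."""
    a_fin = 2.0 * L * W                    # both faces of one fin
    a_base = base_area - n_fins * t * W    # exposed base between fins
    eta = fin_efficiency(h, k, t, L)
    return 1.0 / (h * (n_fins * eta * a_fin + a_base))

# 20 fins, 1 mm thick, 40 mm tall, on a 60 mm x 60 mm base,
# with an assumed air-side coefficient of 50 W/m^2-K:
args = dict(h=50.0, n_fins=20, t=1e-3, L=40e-3, W=60e-3, base_area=3.6e-3)
r_aluminum = sink_resistance(k=180.0, **args)
r_copper = sink_resistance(k=390.0, **args)
print(f"aluminum: {r_aluminum:.2f} degC/W, copper: {r_copper:.2f} degC/W")
```

The copper sink shows a lower resistance purely through higher fin efficiency, which is one driver behind the aluminum-to-copper migration the text describes.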
Heat sink attachment to the module also plays a role in the ex-
ternal thermal performance of a module. The method of attach-
ment and the material at the interface must be considered. The
material at the interface is important because when two surfaces
are brought together seemingly in contact with one another, sur-
face irregularities such as surface flatness and surface
roughness
result in just a fraction of the surfaces actually contacting one
another. The majority of the heat is therefore transferred
through
the material that fills the voids or gaps that exist between the
two surfaces [17]. One method of heat sink attachment is by
mechanical means using screws or a clamping mechanism. Air
has traditionally existed at the interface but more recently oils
or even phase change materials (PCMs) have been used [18] to
10. reduce the thermal resistance at the interface. Another method
of attachment has been adhesively with an elastomer or epoxy.
This method has worked well on smaller single-chip modules
where heat sinks do not have to be removed from the module.
2) Water-Cooled Cold Plates: For situations where air
cooling could not meet requirements, such as was the case in
IBM’s 3081, ES/3090, and ES/9000 systems in the 1980s and
early 1990s, and the case in Hitachi’s M-880 and MP5800 in the
1990s, heat was removed from the modules via water-cooled
cold plates. Compared to air, water cooling can provide al-
most an order of magnitude reduction in thermal resistance
principally due to the higher thermal conductivity of water.
In addition, because of the higher density and specific heat of
water, its ability to absorb heat in terms of the temperature
rise across the coolant stream is approximately 3500 times that
of air. Cold plates function very similarly to air-cooled heat
sinks. For example, the ES/9000 cold plate is an internal finned
structure made of tellurium copper [19]. As with the air-cooled
heat sinks, changes in material properties and geometry were
made to improve performance. A higher thermal conductivity
tellurium copper was chosen over beryllium copper used in
previous generation cold plates. Additionally, fin heights were
increased and channel widths (analogous to fin spacings) were
decreased. The ES/9000 module also marked the first time IBM
used a PAO oil at the interface between the module cap and
cold plate to reduce the thermal interface resistance.
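The quoted factor of roughly 3500 can be checked from round-number handbook properties, since the heat a coolant stream absorbs per unit volume and per degree of rise scales with the product of density and specific heat:

```python
# Volumetric heat capacity (rho * cp) of water versus air, using
# round-number properties near room temperature.
rho_water, cp_water = 998.0, 4182.0   # kg/m^3, J/(kg-K)
rho_air, cp_air = 1.18, 1005.0        # kg/m^3, J/(kg-K)

ratio = (rho_water * cp_water) / (rho_air * cp_air)
print(f"water/air volumetric heat capacity ratio ~ {ratio:.0f}")  # ~3500
```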
In an effort to significantly extend the cooling capability
of liquid-cooled cold plates, researchers continue to work on
microchannel cooling structures. The concept was originally
demonstrated over 20 years ago by Tuckerman and Pease [20].
They chemically etched 50-μm-wide by 300-μm-deep channels
into a 1 cm × 1 cm silicon chip. By directing water through
these microchannels they were able to remove 790 W with a
temperature difference of 71 °C. More recently, aluminum nitride
heat sinks fabricated using laser machining and adhesively
attached to the die have been used to cool a high-powered
MCM and achieve a junction-to-ambient unit thermal resistance
below 0.6 K-cm²/W [21]. The challenge continues to be to
provide a practical chip or module cooling structure and flow
interconnections in a manner which is both manufacturable
(i.e., cost effective) and reliable.
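The Tuckerman and Pease figures imply a unit thermal resistance of roughly 0.09 °C-cm²/W, as a quick check shows:

```python
# Unit thermal resistance implied by the Tuckerman-Pease microchannel data:
# 790 W removed from a 1 cm x 1 cm chip at a 71 degC temperature difference.
power_w, area_cm2, delta_t_c = 790.0, 1.0, 71.0

unit_resistance = delta_t_c / (power_w / area_cm2)  # degC-cm^2/W
print(f"unit thermal resistance ~ {unit_resistance:.3f} degC-cm^2/W")
```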
C. Immersion Cooling
Immersion cooling has been of interest as a possible method
to cool high heat flux components for many years. Unlike the
water-cooled cold plate approaches which utilize physical walls
to separate the coolant from the chips, immersion cooling brings
the coolant in direct physical contact with the chips. As a result,
most of the contributors to internal thermal resistance are elim-
inated, except for the thermal conduction resistance from the
device junctions to the surface of the chip in contact with the
liquid.
Direct liquid immersion cooling offers a high heat transfer co-
efficient which reduces the temperature rise of the heated chip
surface above the liquid coolant temperature. The magnitude
of the heat transfer coefficient depends upon the thermophys-
ical properties of the coolant and the mode of convective heat
transfer employed. The modes of heat transfer associated with
liquid immersion cooling are generally classified as natural con-
vection, forced convection, and boiling. Forced convection in-
cludes liquid jet impingement in the single phase regime and
boiling (including pool boiling, flow boiling, and spray cooling)
in the two-phase regime. An example of the broad range of heat
flux that can be accommodated with the different modes and
forms of direct liquid immersion cooling is shown in Fig. 11
[22].
Selection of a liquid for direct immersion cooling cannot
be made on the basis of heat transfer characteristics alone.
Chemical compatibility of the coolant with the chips and
other packaging materials exposed to the liquid is an essential
consideration. There may be several coolants that can provide
adequate cooling, but only a few will be chemically compatible.
Water is an example of a liquid which has very desirable
heat transfer properties, but which is generally undesirable for
direct immersion cooling because of its chemical and electrical
characteristics. Alternatively, fluorocarbon liquids (e.g., FC-72,
FC-86, FC-77, etc.) are generally considered to be the most
suitable liquids for direct immersion cooling, in spite of their
poorer thermophysical properties [22], [23].
1) Natural and Forced Liquid Convection: As in the case of
air cooling, liquid natural convection is a heat transfer process
in which mixing and fluid motion is induced by differences in
coolant density caused by heat transferred to the coolant. As
shown in Fig. 11, this mode of heat transfer offers the lowest
heat flux or cooling capability for a given wall superheat or
surface-to-liquid temperature difference. Nonetheless, the heat
transfer rates attainable with liquid natural convection can ex-
ceed those attainable with forced convection of air.
Higher heat transfer rates may be attained by utilizing a pump
to provide forced circulation of the liquid coolant over the chip
or module surfaces. This process is termed forced convection
and the allowable heat flux for a given surface-to-liquid temper-
ature difference can be increased by increasing the velocity of
the liquid over the heated surface. The price to be paid for the
increased cooling performance will be a higher pressure drop.
This can mean a larger pump and higher system operating pressures.
Although forced convection requires the use of a pump
and the associated piping, it offers the opportunity to remove
heat from high power chips and modules in a confined space.
The liquid coolant may then be used to transport the heat to a
remote heat exchanger to reject the heat to air or water.
Fig. 11. Heat flux ranges for direct liquid immersion cooling of
microelectronic chips [22].
Fig. 12. Forced convection thermal resistance results for simulated 12.7 mm × 12.7 mm microelectronic chips (adapted from [24]).
Experimental studies were conducted by Incropera and
Ramadhyani [24] to study liquid forced convection heat
transfer from simulated microelectronic chips. Tests were
performed with water and dielectric liquids (FC-77 and FC-72)
flowing over bare heat sources and heat sources with pin-fin
and finned pin extended surface enhancement. It can be seen in
Fig. 12 that, depending upon surface and flow conditions (i.e.,
Reynolds number), thermal resistance values obtained for the
fluorocarbon liquids ranged from 0.4 to 20 °C/W. It may be
noted that a thermal resistance on the order of 0.5 °C/W could
support chip powers of 100 W while maintaining chip junction
temperatures of 85 °C or less.
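That pairing of numbers is easy to verify: with a coolant temperature assumed here to be 35 °C, a 0.5 °C/W resistance carries a 100 W chip exactly to the 85 °C limit:

```python
def chip_temp(t_coolant_c, power_w, resistance_c_per_w):
    """Chip surface temperature above the coolant for a given resistance."""
    return t_coolant_c + power_w * resistance_c_per_w

# 0.5 degC/W resistance, 100 W chip, coolant assumed at 35 degC:
print(chip_temp(35.0, 100.0, 0.5))  # 85.0 -- right at the stated limit
```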
The Cray-2 supercomputer introduced in the mid-1980s pro-
vides an example of the application of forced convection liquid
cooling to computer electronics [25]. As shown in Fig. 13, the
module assembly used in the Cray-2 was three-dimensional
in structure consisting of eight interconnected printed circuit
boards on which were mounted arrays of single-chip carriers.
Module power dissipation was reported to be 600 to 700 W.
Cooling was provided by FC-77 liquid distributed vertically
between stacks of modules and flowing horizontally between
the printed circuit cards.
Even higher heat transfer rates may be obtained in the forced
convection mode by directing the liquid flow normal to the
heated surface in the form of a liquid jet. A number of studies
[26]–[28] have been conducted to demonstrate the cooling
efficacy of liquid jet impingement flows. An example of the
chip heat flux that can be accommodated using a single FC-72
liquid jet is shown in Fig. 14. Liquid jet impingement was the
basic cooling scheme employed in the aborted SSI SS-1 super-
computer. The cooling design provided for a maximum chip
power of 40 W corresponding to a chip heat flux of 95 W/cm².
2) Pool and Flow Boiling: Boiling is a complex convec-
tive heat transfer process depending upon liquid-to-vapor phase
change with the formation of vapor bubbles at the heated sur-
face. It may be characterized as either pool boiling (occurring in
an essentially stagnant liquid) or flow boiling. The pool boiling
heat flux, q, usually follows a relationship of the form

q = C · A · (T_w − T_sat)^n

where C is a constant depending upon each fluid–surface
combination, A is the heat transfer surface area, T_w is the
temperature of the heated surface, and T_sat is the saturation
temperature (i.e., boiling point) of the liquid. The value of
the exponent n is typically about 3. This means that as the
heat flux is increased at the chip surface, the heat transfer
coefficient or cooling effectiveness increases. For example, if
n = 3 and the power dissipation is doubled, the temperature
rise will increase by only about 26% in the boiling mode
compared to 100% in the forced convection mode.
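The 26% figure follows directly from the exponent: since the superheat scales as q^(1/n), doubling the power multiplies the temperature rise by 2^(1/3) ≈ 1.26:

```python
# With q = C * A * (Tw - Tsat)**n, the superheat scales as q**(1/n).
# Doubling the dissipated power therefore raises the temperature rise by:
n = 3
boiling_factor = 2.0 ** (1.0 / n)   # ~1.26, i.e. about a 26% increase
convection_factor = 2.0             # single phase: rise doubles (+100%)

print(f"boiling: +{(boiling_factor - 1.0) * 100:.0f}%")
print(f"forced convection: +{(convection_factor - 1.0) * 100:.0f}%")
```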
A problem that has been associated with pool boiling of fluo-
rocarbon liquids is that of temperature overshoot. This behavior
is characterized by a delay in the inception of boiling on the
heated surface. The heated surface continues to be cooled in the
natural convection mode, with increased surface temperatures
until a sufficient degree of superheat is reached for boiling to
occur. This behavior is a result of the good wetting character-
istics of fluorocarbon liquids and the smooth nature of silicon
chips. Although much work [29] has been done in this area, it is
still a potential problem in pool boiling applications using fluo-
rocarbon liquids to cool untreated silicon chips.
The maximum chip heat flux that can be accommodated in
pool boiling is determined by the critical heat flux. As power is
increased more and more vapor bubbles are generated. Even-
tually so many bubbles are generated that they form a vapor
blanket over the surface preventing fresh liquid from reaching
the surface and resulting in film boiling and high surface tem-
peratures. Typical critical heat fluxes encountered in saturated
Fig. 13. Forced convection liquid-cooled Cray-2 electronic
module assembly.
Fig. 14. Typical direct liquid jet impingement cooling
performance for a
6.5 mm � 6.5 mm integrated circuit chip (adapted from [28]).
(i.e., liquid temperature saturation temperature) pool boiling
of fluorocarbon liquids range from 10 to 15 W/cm , depending
upon the nature of the surface (i.e., material, finish, geometry).
The allowable critical heat flux may be extended by subcooling
the liquid below its saturation temperature. For example experi-
ments have shown that it is possible to increase the critical heat
in pool boiling to as much as 25 W/cm by subcooling the liquid
temperature to 25 C.
Higher critical heat fluxes may be achieved using flow
boiling. For example, heat fluxes from 25 to 30 W/cm² have
been reported for liquid velocities of 0.5 to 2.5 m/s over the
heated surface [30]. In addition, it may also be noted that
temperature overshoot has not been observed to be a problem
with flow boiling.
As in the case of air cooling or single phase liquid cooling,
the heat flux that may be supported at the component level (i.e.,
chip or module) may be increased by attaching a heat sink to
the surface. As part of an early investigation of pool boiling
with fluorocarbon liquids a small 3-mm-tall molybdenum stud
with a narrow slot (0.76 mm) down the middle was attached to
a 2.16 mm × 2.16 mm silicon chip. A heat flux at the chip level
in excess of 100 W/cm² was achieved [31].
An example of a computer electronics package utilizing pool
boiling to cool integrated circuit chips is provided by the IBM
Liquid Encapsulated Module (LEM) developed in the 1970s
[32]. As shown in Fig. 15, a substrate with 100 integrated
circuit chips was mounted within a sealed module-cooling
assembly containing a fluorocarbon coolant (FC-72). Boiling
at the exposed chip surfaces provided a high heat transfer
coefficient (1700 to 5700 W/m²-K) with which to meet chip
cooling requirements. Either an air-cooled or water-cooled
cold plate could be used to handle the module heat load. With
this approach it was possible to cool 4.6 mm × 4.6 mm chips
dissipating 4 W and module powers up to 300 W.
3) Spray Cooling: In recent years spray cooling has re-
ceived increasing attention as a means of supporting higher
heat flux in electronic cooling applications. Spray cooling is a
process in which very fine droplets of liquid are sprayed on the
heated surface. Cooling of the surface is then achieved through
a combination of thermal conduction through the liquid in
contact with the surface and evaporation at the liquid–vapor
interface.
One of the early investigations of spray cooling was con-
ducted by Yao et al. [33] with both real and ideal sprays of
FC-72 on a heated horizontal copper surface 3.65 cm in diameter.
A peak heat flux of 32 W/cm², or about 2 to 3 times
the critical heat flux achievable with saturated pool boiling was
reported.
Pautsch and Bar-Cohen [34] describe two methods of spray
cooling suitable for electronic cooling. One method is termed
“low density spray cooling” and is defined as occurring when
the liquid contacts and wets the surface and then boils before
interacting with the next impinging droplet. Although a very ef-
ficient method of heat transfer, it does not support very high
heat fluxes. The other method is termed “high density evapo-
rative cooling” and requires spraying the liquid on the surface
at a rate that maintains a continuously wetted surface. In the
paper, experiments are described demonstrating the capability
to accommodate heat fluxes in excess of 50 W/cm² while
maintaining chip junction temperatures below 85 °C with spray
evaporative cooling. Spray evaporative cooling is used to maintain
junction temperatures of ASICs on MCMs in the CRAY SV2
system between 70 °C and 85 °C for heat fluxes from 15 W/cm²
to 55 W/cm² [35]. In addition to the CRAY cooling application,
spray cooling has gained a foothold in the military sector pro-
viding for improved thermal management, dense system pack-
aging, and reduced weight [36].
Researchers have also investigated spray cooling heat transfer
using other liquids. Lin and Ponnappan determined that critical
heat fluxes can reach up to 90 W/cm² with fluorocarbon liquids,
490 W/cm² with methanol, and higher than 500 W/cm² with
water [37].
III. SYSTEM-LEVEL COOLING
Cooling systems for computers may be categorized
as air-cooled, hybrid-cooled, liquid-cooled, or refrigera-
tion-cooled. An air-cooled system is one in which air, usually
in the forced convection mode, is used to directly cool and carry
heat away from arrays of electronic modules and packages.
Fig. 15. IBM Liquid Encapsulated Module (LEM) cooling
concept.
In some systems air-cooling alone may not be adequate due
to heating of the cooling air as it passes through the machine.
In such cases a hybrid-cooling design may be employed, with
air used to cool the electronic packages and water-cooled
heat exchangers used to cool the air. For even higher power
packages it may be necessary to employ indirect liquid cooling.
This is usually done utilizing water-cooled cold plates on
which heat dissipating components are mounted, or which may
be mounted to modules containing integrated circuit chips.
Ultimately, direct liquid immersion cooling may be employed
to accommodate high heat fluxes and a high system heat load.
A. Air-Cooled Systems
Forced air-cooled systems may be further subdivided into se-
rial and parallel flow systems. In a serial flow system the same
air stream passes over successive rows of modules or boards, so
that each row is cooled by air that has been preheated by the
previous row. Depending on the power dissipated and the air
flow rate, serial air flow can result in a substantial air temperature
rise across the machine. The rise in cooling air temperature
is directly reflected in increased circuit operating temperatures.
This effect may be reduced by increasing the air flow rate. Of
course to do this requires larger blowers to provide the higher
flow rate and overcome the increase in air flow pressure drop.
Parallel air flow systems have been used to reduce the temperature
rise in the cooling air [38], [39]. In systems of this type, the
printed circuit boards or modules are all supplied air in parallel
as shown in Fig. 16. Since each board or module is delivered its
own fresh supply of cooling air, systems of this type typically
require a higher total volumetric flow rate of air.
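The air temperature rise described above follows from a simple energy balance: each row raises the air stream temperature by ΔT = Q/(ṁ·cp). A minimal sketch contrasting the two flow arrangements; the board powers, flow rate, and supply temperature below are illustrative assumptions, not values from the paper:

```python
# Energy balance: each board row heats the air stream by dT = Q / (m_dot * cp).
CP_AIR = 1005.0  # specific heat of air, J/(kg*K)

def serial_inlet_temps(t_supply, row_powers_w, m_dot_kg_s):
    """Inlet air temperature seen by each successive row in a serial-flow system."""
    temps, t = [], t_supply
    for q in row_powers_w:
        temps.append(t)                 # this row breathes air preheated upstream
        t += q / (m_dot_kg_s * CP_AIR)  # air picks up this row's heat
    return temps

def parallel_inlet_temps(t_supply, row_powers_w):
    """In a parallel-flow system every row receives fresh supply air."""
    return [t_supply for _ in row_powers_w]

# Illustrative numbers: four 500-W board rows, 0.2 kg/s of 20 C supply air.
rows = [500.0] * 4
print(serial_inlet_temps(20.0, rows, 0.2))  # last row sees noticeably preheated air
print(parallel_inlet_temps(20.0, rows))     # every row sees fresh 20 C air
```

The serial case shows why the last row runs hottest, and why raising ṁ (larger blowers) is the only serial-flow remedy.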
B. Hybrid Air–Water Cooling
An air-to-liquid hybrid cooling system offers a method to
manage cooling air temperature in a system without resorting
to a parallel configuration and higher air flow rates. In a system
of this type, a water-cooled heat exchanger is placed in the
heated air stream to extract heat and reduce the air temperature.
Fig. 16. Example of a parallel air-flow cooling scheme [40].
Fig. 17. Typical processor gate configuration with air-to-water heat exchanger between boards.
An example of the early use of this method was in the IBM
System/360 Model 91 (c. 1964) [40]. As shown in Fig. 17, the
cooling system incorporated an air-to-water finned tube heat
exchanger between each successive row of circuit boards. The
modules on the boards were still cooled by forced convection
with air; however, the heated air exiting a board passed through
an air-to-water heat exchanger before passing over the next
board.
Approximately 50% of the heat transferred to air in the board
columns was transferred to the cooling water. A comparison of
cooling air temperatures in the board columns with and without
hybrid air-to-water cooling is shown in Fig. 18. The reduction
in air temperatures with air-to-water hybrid cooling resulted in
a one-to-one reduction in chip junction operating temperatures.

Fig. 18. Typical air temperature profiles across five high board columns with and without air-to-water heat exchangers between boards.

Fig. 19. Closed-loop liquid-to-air hybrid cooling system.
Ultimately air-to-liquid hybrid cooling offers the potential for a
sealed, recirculating, closed-cycle air-cooling system with total
heat rejection of the heat load absorbed by the air to chilled
water [39]. Sealing the system offers additional advantages. It
allows the use of more powerful blowers to deliver higher air
flow rates with little or no impact on acoustics. In addition,
the potential for electromagnetic emissions from air inlet/outlet
openings in the computer frame is eliminated.
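The benefit of the interboard heat exchangers can be sketched with a simple stage-by-stage model; the effectiveness value, board powers, flow rate, and temperatures below are assumptions for illustration, not Model 91 data:

```python
# Stage model: board heats the air, then an interstage air-to-water heat
# exchanger removes a fraction (its effectiveness) of the excess over the
# water temperature before the next board.
CP_AIR = 1005.0  # specific heat of air, J/(kg*K)

def hybrid_air_temps(t_in, t_water, n_boards, q_board_w, m_dot, effectiveness):
    """Air temperature entering each successive board with interboard
    air-to-water heat exchangers of the given effectiveness."""
    temps, t = [], t_in
    for i in range(n_boards):
        temps.append(t)
        t += q_board_w / (m_dot * CP_AIR)       # board heats the air stream
        if i < n_boards - 1:                    # interstage heat exchanger
            t -= effectiveness * (t - t_water)  # part of the excess goes to water
    return temps

# Illustrative: five 500-W boards, 0.2 kg/s air, 15 C water, effectiveness 0.5.
print(hybrid_air_temps(24.0, 15.0, 5, 500.0, 0.2, 0.5))
print(hybrid_air_temps(24.0, 15.0, 5, 500.0, 0.2, 0.0))  # no heat exchangers
```

With effectiveness zero the model degenerates to serial air flow, so the printed pair shows directly how the interboard exchangers flatten the column temperature profile.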
Another variant of the hybrid cooling system is the
liquid-to-air cooling system shown schematically in Fig. 19. In
this system liquid is circulated in a sealed loop through a cold
plate attached to an electronic module dissipating heat. The
heat is then transported via the liquid stream to an air-cooled
heat exchanger where it is rejected to ambient air. This scheme
provides the performance advantages of indirect liquid cooling
at the module level while retaining the advantages of air cooling
at the system or box level. Most recently, a liquid-to-air cooling
system is being used to cool the two processor modules in the
Apple Power Mac G5 personal computer shipped earlier this
year [42].
Fig. 20. Large scale computer configuration of the 1980s with coolant distribution unit (CDU).
C. Liquid-Cooling Systems
Either the air-to-water heat exchangers in a hybrid
air–water-cooled system or the water-cooled cold plates in
a conduction-cooled system rely upon a controlled source of
water in terms of pressure, flow rate, temperature, and chem-
istry. In order to ensure the physical integrity, performance, and
long-term reliability of the cooling system, customer water is
usually not run directly through the water-carrying components
in electronic frames. This is because of the great variability
that can exist in the quality of water available at computer
installations throughout the world. Instead a pumping and heat
exchange unit, sometimes called a coolant distribution unit
(CDU) is used to control and distribute system cooling water to
computer electronics frames as shown in Fig. 20. The primary
closed loop (i.e., system) is used to circulate cooling water
to and from the electronics frames. The system heat load is
transferred to the secondary loop (i.e., customer water) via a
water-to-water heat exchanger in the CDU. Within an elec-
tronics frame a combination of parallel-series flow networks is
used to distribute water flow to individual cold plates and heat
exchangers. An example of the piping configuration used to
distribute water to cold plates mounted on multichip modules
in the IBM 3081 processor is shown in Fig. 21.
As shown in Fig. 22, the basic flow and heat exchange com-
ponents within a CDU consist of a heat exchanger, flow mixing
valve, pumps, expansion tank, and water supply/return mani-
folds. Water flow in the primary loop is provided at a fixed flow
rate by a single operating pump, with a stand-by pump to pro-
vide uninterrupted operation if the operating pump fails. The
temperature of the water in the primary loop is controlled by
using a mixing valve to regulate the fraction of the flow allowed
to pass through the water-to-water heat exchanger and forcing
the remainder to bypass the heat exchanger.
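The mixing-valve control described above reduces to a one-line energy balance on the valve; the temperatures in the example are illustrative assumptions, not values from the paper:

```python
def hx_flow_fraction(t_return, t_hx_out, t_setpoint):
    """Fraction of primary-loop flow routed through the water-to-water heat
    exchanger (the remainder bypasses) so the mixed supply hits the setpoint.
    Mixing balance: t_setpoint = f * t_hx_out + (1 - f) * t_return."""
    if t_hx_out >= t_return:
        raise ValueError("heat exchanger outlet must be colder than the return")
    if not (t_hx_out <= t_setpoint <= t_return):
        raise ValueError("setpoint must lie between HX outlet and return temperatures")
    return (t_return - t_setpoint) / (t_return - t_hx_out)

# Illustrative: 32 C return water, 18 C HX outlet, 24 C desired supply.
print(hx_flow_fraction(32.0, 18.0, 24.0))  # ~0.57 of the flow through the HX
```

Driving the valve from this balance is what lets the CDU hold a fixed supply temperature even as the customer (secondary) water temperature varies.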
Fig. 21. Modular cold plate subsystem and water distribution loops in the IBM 3081 processor frame.
Fig. 22. Flow schematic of a typical IBM coolant distribution unit (CDU).
A CDU is also required for direct immersion cooling systems
such as that used in the CRAY-2 discussed earlier. In this application
the CDU performs a similar role to that in water-cooled systems
and segregates the chemical coolant (e.g., FC-77) from the cus-
tomer water as shown in Fig. 23. Of course, all the materials
within the CDU, as well as the piping distribution system must
be chemically compatible with the coolant. In addition, because
of the relatively high vapor pressure of the coolants suitable for
direct immersion applications (e.g., fluorocarbons), the cooling
system must be both “vapor-tight” and “liquid-tight” to ensure
against any loss of the relatively expensive coolant.
D. Refrigeration-Cooled Systems
The potential for enhancement of computer performance
by operating at lower temperatures was recognized as long
ago as the late 1960s and mid-1970s.

Fig. 23. Cray-2 liquid immersion cooling system.

Some of the earliest
studies focused on Josephson devices operating at liquid he-
lium temperatures (4 K). The focus then shifted to CMOS
devices operating near liquid nitrogen temperatures (77 K). A
number of researchers have identified the electrical advantages
of operating electronics all the way down to liquid nitrogen
(LN₂) temperatures (77 K) [43]–[45]. In summary, the ad-
vantages are:
• increased average carrier drift velocities (even at high
fields);
• steeper sub-threshold slope, plus reduced sub-threshold
currents (channel leakages) which provide higher noise
margins;
• higher transconductance;
• well-defined threshold voltage behavior;
• no degradation of geometry effects;
• enhanced electrical line conductivity;
• allowable current density limits increase dramatically (i.e.,
electromigration concerns diminish).
To illustrate how much improvement is realized with decreasing
temperature, Fig. 24 shows the performance of a
0.1-µm CMOS circuit (relative to the performance of a 0.1-µm
circuit designed to operate at 100 °C) as a function of temperature
[43]. The performance behavior is shown for three
different assumptions about the threshold voltage. Only a slight
performance gain is realized if the circuit, unchanged from
its design to operate at 100 °C, is taken down in temperature
(same hardware). This is due to a rise in threshold voltage
that partially offsets the gain due to higher mobilities. Tuning
threshold voltages down until eventually the same off-current as
the 100 °C circuit is achieved yields the greatest performance
gain of almost 2× at 123 K. In addition, the electrical conductivity
of the two metals used today to interconnect circuits on a
chip improves as the temperature is lowered [46].
Fig. 24. Relative performance factors (with respect to a 100 °C value) of 1.5-V CMOS circuits as a function of temperature. Threshold voltages are adjusted differently with temperature in each of the three scenarios shown (adapted from [43]).
A conductivity improvement of approximately 1.5×, 2×, and
10× is realized at about 200 K, 123 K, and 77 K, respectively.
The reduction in capacitive (RC) delays can therefore approach
2× at the lower (77 K) temperatures.
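The gap between a 10× conductivity gain and only a roughly 2× delay reduction is consistent with just part of the path delay being wire-RC limited. An Amdahl-style sketch of that reasoning; the wire-limited fraction is an assumed parameter for illustration, not a figure from [43] or [46]:

```python
def relative_delay(conductivity_gain, wire_limited_fraction):
    """Amdahl-style estimate: only the wire-RC-limited share of the path delay
    scales with interconnect conductivity; the rest is unaffected."""
    return (1.0 - wire_limited_fraction) + wire_limited_fraction / conductivity_gain

# With ~56% of path delay assumed wire-limited, the quoted 10x conductivity
# gain at 77 K corresponds to roughly a 2x overall delay reduction.
for gain, temp in [(1.5, 200), (2.0, 123), (10.0, 77)]:
    print(f"{temp} K: relative delay factor {relative_delay(gain, 0.556):.2f}")
```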
One of the earliest systems to incorporate refrigeration was
the Cray-1 supercomputer announced in 1979 [47]. Its cooling
system was designed to limit the IC die temperature to a max-
imum of 65 °C. The heat generated by the ICs was conducted
through the IC package, into a PC board to which the IC packages
were attached, and then into a 2-mm-thick copper
plate conducted heat to its edges, which were in contact with
cast aluminum cold bars. A refrigerant, Freon 22, flowed through
stainless steel tubes embedded in the aluminum cold bars. The
refrigerant, which was maintained at 18.5 °C, absorbed the heat
that was conducted into the aluminum cold bars. The refrigeration
system ultimately rejected the heat to a cold water supply
flowing at 40 gpm. The maximum heat load of the system was
approximately 170 kW.
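These figures can be cross-checked with an energy balance on the cooling water; the fluid properties below are standard textbook values, not data from [47]:

```python
# Sanity check on the Cray-1 numbers: temperature rise of the 40-gpm cold water
# supply absorbing the full ~170-kW heat load, from Q = m_dot * cp * dT.
GPM_TO_KG_S = 3.785 / 60.0   # 1 US gpm of water ~ 3.785 kg/min
CP_WATER = 4186.0            # specific heat of water, J/(kg*K)

def water_temp_rise(q_watts, flow_gpm):
    """Water temperature rise for a given heat load and volumetric flow."""
    m_dot = flow_gpm * GPM_TO_KG_S
    return q_watts / (m_dot * CP_WATER)

print(f"{water_temp_rise(170e3, 40.0):.1f} K")  # roughly a 16 K rise
```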
In the latter part of the 1980s, ETA Systems Inc. devel-
oped a commercial supercomputer system using CMOS logic
chips operating in liquid nitrogen [48]. The processor mod-
ules were immersed in a pool of liquid nitrogen maintained
in a vacuum-jacketed cryostat vessel within the CPU cabinet
(Fig. 25). Processor circuits were maintained below 90 K. At
this temperature, circuit speed was reported to be almost double
that obtained at above ambient temperatures. Heat transfer ex-
periments were conducted to validate peak nucleate boiling
heat flux limits of approximately 12 W/cm². A closed-loop
Stirling refrigeration system (cryogenerator) was developed
to recondense the gaseous nitrogen produced by the boiling
process.
In 1991, IBM initiated an effort to demonstrate the feasibility
of packaging and cooling a CMOS processor in a form suitable
for product use [49]. A major part of the effort was devoted
to the development of a refrigeration system that would meet
IBM’s reliability and life expectancy specifications and handle
a cooling load of 250 W at 77 K.

Fig. 25. ETA-10 cryogenic system configuration [48].

A Stirling cycle type refrigerator
was chosen as the only practical refrigeration method for
obtaining liquid nitrogen temperatures. Prototype models were
built with cooling capacities of 500 and 250 W at 77 K. In
addition, a packaging scheme had to be developed that would
withstand cycling from room temperature down to 77 K and
provide thermal insulation to reduce the parasitic heat losses.
A low-temperature conduction module (LTCM) was built to
package the chip and module. The LTCM, or cryostat, consisted
of a stainless steel housing with a vacuum to minimize heat
losses. This hardware was used to measure chip performance
at 77 K. As a result of this effort, prototype Stirling cycle
cryocoolers in a form factor compatible with overall system
packaging constraints were built and successfully tested and
key elements of the packaging concept were demonstrated.
IBM’s most recent interest in refrigeration-cooling focused on
the application of conventional vapor compression refrigeration
technology to operate below room temperature conditions, but
well above cryogenic temperatures. In 1997, IBM developed,
built and shipped its first refrigeration-cooled server (the S/390
G4 system) [50], [51]. This cooling scheme provided an average
processor temperature of 40 °C, which represented a temperature
decrease of 35 °C below that of a comparable air-cooled system.
The system packaging layout is shown in Fig. 26. Below the
bulk power compartment is the central electronic complex
(CEC) where the MCM housing 12 processors is located. Two
modular refrigeration units (MRUs) located near the middle
of the frame provide cooling via the evaporator attached to the
back of the processor module. Only one MRU is operated at
a time during normal operation. The evaporator mounted on
the processor module is fully redundant with two independent
refrigerated passages. Refrigerant passing through one passage
is adequate to cool the MCM which dissipates a maximum
power of 1050 W. Following the success of this machine, IBM
has continued to exploit the advantages of sub-ambient cooling
at the high-end of its zSeries product line.
In 1999, Fujitsu released its Global Server GS8900 that utilized
a refrigeration unit to chill a secondary coolant and then
supply the coolant to liquid-cooled Central Processor Unit
(CPU) MCMs [52]. A schematic of the liquid-cooled system
is shown in Fig. 27. The refrigeration unit, which is called the
chilled coolant supply unit (CCSU), contains three air-cooled
refrigeration modules and two liquid circulating pumps. The re-
frigeration modules chill the coolant to near 0 °C. The system
board assembly housing the CPU modules is accommodated in
a closed box in which the dew point is controlled in order to
prevent condensation from forming on the electrical equipment. In
comparison to an air-cooled version of this system, circuit junction
temperatures are reduced by more than 50 °C.
Fig. 26. IBM S/390 G4 server with refrigeration-cooled processor module and redundant modular refrigeration units (MRUs).

Fig. 27. Configuration of Fujitsu’s GS8900 low-temperature liquid cooling system (adapted from [52]).
IV. DATA CENTER THERMAL MANAGEMENT
Due to technology compaction, the information technology
(IT) industry has seen a large decrease in the floor space
required to achieve a constant quantity of computing and storage
capability. However, the power consumed by the equipment
has not decreased at the same rate. This has resulted in a
significant increase in power density and heat dissipation within
the footprint of computer and telecommunications hardware.
The heat dissipated in these systems is exhausted to the room,
and the room must be maintained at acceptable temperatures
for reliable operation of the equipment. Cooling computer and
telecommunications equipment rooms is thus becoming a major
challenge.
The increasing heat load of datacom equipment has been
documented by a thermal management consortium of 17 com-
panies and published in collaboration with the Uptime Institute
[53] as shown in Fig. 28. Also shown in this figure are mea-
sured heat fluxes (based on product footprint) of some recent
product announcements. The most recent shows a rack dissipating
28 500 W, resulting in a heat flux based on the footprint
of the rack of 20 900 W/m². With these heat loads the focus
for customers of such equipment is in providing adequate air
flow at a temperature that meets the manufacturer’s require-
ments. Of course, this is a very complex problem considering
the dynamics of a data center and one that is only starting
to be addressed [54]–[61]. There are many opportunities for
improving the thermal environment of data centers and the
efficiency of the cooling techniques applied to those data centers
[61]–[63].

Fig. 28. Equipment power trends [53].

Fig. 29. Cluster of server racks.
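The 28 500-W rack cited earlier implies both a footprint and an airflow demand via simple balances; the 15 K allowable air temperature rise below is an assumed illustrative value, not a manufacturer requirement:

```python
# Footprint from the quoted heat flux, and the volumetric airflow a rack of
# this power needs, from Q = rho * V_dot * cp * dT.
CP_AIR, RHO_AIR = 1005.0, 1.2   # J/(kg*K), kg/m^3 near room conditions

def rack_airflow_m3_s(q_watts, dt_allow_k):
    """Volumetric airflow needed to carry a rack heat load at a given air
    temperature rise."""
    return q_watts / (RHO_AIR * CP_AIR * dt_allow_k)

q = 28_500.0                       # rack power from the text, W
print(q / 20_900.0)                # implied footprint, ~1.36 m^2
print(rack_airflow_m3_s(q, 15.0))  # ~1.6 m^3/s (~3300 cfm) at an assumed 15 K rise
```

Numbers like these explain why delivering enough chilled air through perforated tiles to a single high-power rack has become the dominant data center problem.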
Air-flow direction in the room has a major effect on the
cooling of computer rooms. A major requirement is the uni-
formity of air temperature at the computer inlets. A number
of papers have focused on whether the air should be deliv-
ered overhead or from underneath a raised floor [65]–[67],
ceiling height requirements to eliminate “heat traps” or hot air
stratification [64], [65], raised floor heights [64], and proper
distribution of the computer equipment in the data center [66], [68]
to eliminate the potential for hot spots or high temperatures.
Computer room cooling concepts can be classified according to
the two main types of room construction: 1) nonraised floor (or
standard room) and 2) raised floor. Some of the papers discuss
and compare these concepts in general terms [67], [69]–[71].
Data centers are typically arranged into hot and cold aisles
as shown in Fig. 29. This arrangement accommodates most
rack designs which typically employ front-to-back cooling and
somewhat separates the cold air exiting the perforated tiles
(for raised floor designs) and overhead chilled air flow (for
nonraised floor designs) from the hot air exhausting from the
back of the racks. The racks are positioned so that the fronts
of the racks face the cold aisle. Similarly,
the backs of the racks face each other and provide a hot-air
exhaust region. This layout allows the chilled air to wash the
front of the data processing (DP) equipment while the hot air
from the racks exits into the hot aisle as it returns to the inlet of
the air conditioning (A/C) units.
With the arrangement of computer server racks in rows within
a data center there may be zones where all the equipment within
that zone dissipates very high heat loads. This arrangement of
equipment may be required in order to achieve the performance
desired by the customer. These high-performance zones (shown
in Fig. 30) can provide significant challenges in maintaining an
environment within the manufacturer’s specifications. Fig. 31
shows trends for these high heat flux zones using the equipment
power trends shown in Fig. 28. In contrast, a data center that
employs a mix of computer equipment with lower power
racks is also shown in Fig. 31.
A. Room Air Flow Designs
Air flow distribution within a data center has a major effect
on the thermal environment of the data processing equipment
located within these rooms. A key requirement of manufacturers
is that the inlet temperature and humidity to the electronic equipment
be maintained within the specifications. Customers of such
equipment typically employ two types of air distribution systems
to provide this environment. These are briefly described
below.
1) Non-Raised Floor Room Cooling: Cooling air can be
supplied from the ceiling in the center of the room, where com-
puters are located, with exhausts located near the walls. Short
partitions are installed around the supply opening to minimize
short circuiting of supply air to returns. Similarly cool air from
a more distributed area of the ceiling can be supplied with
exhaust located around the perimeter or a return in the floor.

Fig. 30. Data center management focus areas.

Fig. 31. Zonal heat fluxes for commercial and high-performance computing.
Alternatively a design employed by the telecommunications
industry and more recently employed in the computer industry
utilizes heat exchangers located above the racks near the
ceiling.
The racks are arranged using the hot and cold aisle concept,
where hot air from the hot aisles enters the heat exchangers and,
once cooled, is forced down into the cold aisles by fans
mounted at the bottom of the heat exchangers.
2) Raised Floor Room Cooling: Computers typically have a
large number of cables connecting the components within a rack
and between racks. To maintain a neat layout, a raised floor (also
known as a false floor or double floor) is used and all interconnect
cabling is located under the raised floor. In many cases this space
under the raised floor can be used as an air supply plenum with
the use of perforated tiles exhausting chilled air. Similarly, it is
possible to have a false ceiling (also called dropped ceiling) in
the room with the space above the false ceiling used as the air
supply or the return plenum. The air flow can be from floor to
ceiling, ceiling to floor, floor to exhausts located in the walls or
other locations in the room.
B. Factors Influencing Rack Inlet Temperatures
The primary thermal management focus for data centers is
that the temperature and humidity requirements for the elec-
tronic equipment housed within the data center are met. For
example, one large computer manufacturer has a 42U (1U =
44.45 mm) tall rack configured for front-to-back air cooling and
requires that the inlet air temperature into the front of the rack
be maintained between 10 and 32 °C for elevations up to 1295 m
(4250 feet). Higher elevations require a derating of the maximum
dry bulb temperature of 1 °C for every 219 m (720 feet)
above 1295 m (4250 feet) up to 3048 m (10 000 feet). These
temperature requirements are to be maintained over the entire
front of the 2 m height of the rack where air is drawn into the
system. Fig. 30 shows an account with 49 of these racks each
dissipating from 7 to 8 kW. Since air enters the front of each
rack over the entire height of the rack it is a challenge to main-
tain the temperature within the requirements as stated above for
all the racks within the data center. Although the inlet air temperatures
for all the racks ultimately met the requirements, modifications
were required after the installation in order that the requirements
be met. Herein lies the challenge to data center facility
operators, especially with the increased equipment heat loads
as shown in Fig. 28. How do operators maintain these environ-
mental requirements for all the racks situated within the data
center and in a data center where the equipment is constantly
changing? Without proper attention to the design of the facili-
ties in providing proper airflow and rack inlet air temperatures
hot spots within the data center can occur.
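The altitude derating rule quoted above is easy to capture as a function; this sketch encodes only the rule as stated in the text, for the one manufacturer's 42U rack it describes:

```python
def max_inlet_temp_c(altitude_m):
    """Maximum allowable rack inlet dry-bulb temperature per the derating rule
    quoted in the text: 32 C up to 1295 m, minus 1 C per 219 m above that,
    valid up to 3048 m."""
    if altitude_m > 3048.0:
        raise ValueError("rule only specified up to 3048 m (10 000 ft)")
    derate = max(0.0, altitude_m - 1295.0) / 219.0
    return 32.0 - derate

print(max_inlet_temp_c(0.0))     # 32.0 at sea level
print(max_inlet_temp_c(2000.0))  # ~28.8 at 2000 m
```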
Besides the power density of the equipment in the data center
increasing significantly, there are other factors that influence
data center thermal management. Managers of IT equipment
need to deploy equipment quickly in order to get maximum
use of a large financial asset. This may mean that minimal time
is spent on site preparation, thereby potentially resulting in
thermal issues once the equipment is installed.
The construction cost of a data center is now exceeding $1000
per square foot in some metropolitan areas and the annual oper-
ating cost is $50 to $150 per square foot. For these reasons, IT
and facilities managers want to obtain the most out of their data
center space and maximize the utilization of their infrastructure.
Unfortunately, the current situation in many data centers does
not permit this optimization. The equipment installed into a data
center can be from many different manufacturers each having
a different environmental specification. With these requirements,
the IT facilities manager is forced to overcool the data center
to compensate for the equipment with the tightest requirements.
C. Need for Thermal Guidelines
Since many of the data center thermal management issues are
industry-wide, a number of equipment manufacturers decided
to form a consortium in 1998 to address common issues related
to thermal management of data centers and telecommunications
rooms. Initial interest was expressed from the following compa-
nies: Amdahl, Cisco Systems, Compaq, Cray, Inc., Dell Com-
puter, EMC, HP, IBM, Intel, Lucent Technologies, Motorola,
Nokia, Nortel Networks, Sun Microsystems, and Unisys. As a
result the Thermal Management Consortium for Data Centers
and Telecommunications Rooms was formed. Since the industry
was facing increasing power trends, it was decided that the first
priority was to develop and then publish (in collaboration with
Uptime Institute) a trend chart on power density of the
industry’s
equipment that would aid customers in planning data centers for
the future (see Fig. 28).
In January 2002, the American Society of Heating, Re-
frigerating and Air Conditioning Engineers (ASHRAE) was
approached with a proposal to create an independent committee
to specifically address high-density electronic heat loads. The
proposal was accepted by ASHRAE and eventually a tech-
nical committee, TC9.9 Mission Critical Facilities, Technology
Spaces, and Electronic Equipment, was formed. The first pri-
ority of TC9.9 was to create a Thermal Guidelines document
that would help to align the designs of equipment manufac-
turers and help data center facility designers to create efficient
and fault tolerant operation within the data center. The re-
sulting document, Thermal Guidelines for Data Processing
Environments, was published in January 2004 [73]. Some of
the key issues of that document will now be described.
For data centers, the primary thermal management focus is
on assuring that the housed equipment’s temperature and hu-
midity requirements are met. Each manufacturer has their own
environmental specification and a customer of many types of
electronic equipment is faced with a wide variety of environ-
mental specifications. In an effort to standardize, the ASHRAE
TC9.9 committee first surveyed the environmental specifica-
tions of a number of data processing equipment manufacturers.
From this survey, four classes were identified that would en-
compass most of the specifications. Also included within the
guidelines was a comparison to the NEBS (Network Equipment
Building Systems) specifications for the telecommunications
industry to show both the differences and also aid in possible
convergence of the specifications in the future. The four data
processing classes cover the entire environmental range from
air conditioned, server and storage environments of classes 1
and 2 to the lesser controlled environments like class 3 for
workstations, PCs and portables or class 4 for point of sales
equipment with virtually no environmental control.
In order for seamless integration between the server and the
data center to occur, certain protocols need to be developed
especially in the area of airflow. This section provides airflow
guidelines for both the IT/Facility managers and the equip-
ment manufacturers to design systems that are compatible and
minimize inefficiencies. Currently, manufacturers place their
equipment exhausts and inlets wherever it is convenient from
an architectural standpoint. As a result, there have been many
cases where the inlet of one server is directly next to the
exhaust
of adjacent equipment resulting in the ingestion of hot air. This
has direct consequences to the reliability of that machine. This
guideline attempts to steer manufacturers toward a common
airflow scheme to prevent this hot air ingestion by specifying
regions for inlets and exhausts. The guideline recommends one
of three airflow configurations: front-to-rear, front-to-top,
and front-to-top-and-rear.
Once manufacturers start implementing the equipment pro-
tocol, it will become easier for facility managers to optimize
their layouts to provide maximum possible density by following
the hot-aisle/cold-aisle concept as shown in Fig. 29. In other
words, the front face of all equipment is always facing the cold
aisle.
The ASHRAE guideline’s heat and airflow reporting sec-
tion defines what information is to be reported by the infor-
mation technology equipment manufacturer to assist the data
center planner in the thermal management of the data center.
The equipment heat release value is the key parameter that is
reported. In addition several other pieces of information are re-
quired if the heat release values are to be meaningful like total
system air flow rate, typical configurations of system, air flow
direction of system, and class environment, just to mention a
few.
Other publications will follow on data center thermal management,
with one planned for January 2005 that will update the
initial trend chart and will discuss air cooling and water cooling
in the context of the data center. However, to aid in the ad-
vancement of data center thermal management it is of utmost
importance to understand the current situation in high density
data centers in order to build on this understanding to further
enhance the thermal environment in data centers. In this effort
Schmidt [74] published the first paper of its kind to completely
thermally profile a high density data center. The motivation for
the paper was twofold. First, the paper provided some basic
information on the thermal/flow data collected from a high density
data center. Second, it provided a methodology which others can
follow in collecting thermal and air flow data from data centers
so that data can be assimilated to make comparisons. This data-
base can then provide the basis for future data center air cooling
design and aid in the understanding of deployment of racks of
higher heat loads in the future. This data needs to be further
expanded so that data center design and optimization from an
air-cooled viewpoint can occur.
Data centers do have limitations, and each data center is
unique; some data centers have much lower power
density limitations than others. To resolve these environmental
issues in some data centers today manufacturers of HVAC
equipment have begun to offer liquid cooling solutions to aid
in data center thermal management. The objective of these
new approaches is to move the liquid cooling closer to the
source of the problem, which is the electronic equipment that
is producing the heat. Placing the cooling near the source of
heat shortens the distance that air must be moved and results in
minimal static pressure. This increases the capacity, flexibility,
efficiency, and scalability of the cooling solutions. Several
viable options based on this strategy have been developed:
1) rear-mounted fin and tube heat exchangers; 2) internal fin
and tube heat exchangers either at the bottom of a rack of
electronic equipment or mounted to the side of a rack; and
3) overhead fin and tube heat exchangers. Although each one
of these is a liquid-cooled solution adjacent to the air-cooled
rack, the liquid can be either water based or refrigerant based.
These solutions and others will continue to be promoted with
the increased power densities being shipped and the projections
of the increased heat loads by the manufacturers of datacom
equipment.
V. FUTURE CHALLENGES
For many years the major challenge facing thermal engineers
has been how to limit chip operating temperatures in the face
of increases in heat flux with each new generation of chip de-
sign. This challenge may be expected to continue through the
remainder of this decade. As the size of semiconductor devices
is reduced further, leakage power dissipation may become com-
parable to or even greater than the active device power dissipa-
tion further compounding the thermal challenge.
In the previous sections the cooling technologies and designs
developed to respond to increased powers were discussed with
no mention of cost. Although controlling and reducing cost has
always been an objective, the overriding consideration was to
provide the necessary cooling even if the cost was higher than
desired. Today things are considerably different with intense
competition demanding increased performance at reduced cost.
While the focus remains on providing the necessary cooling, it
is no longer acceptable to do so at any cost. The cost of cooling
must be commensurate with the overall manufacturing cost of
the computer and indeed be a relatively small fraction of the
total cost!
Although air cooling may be expected to continue to be the
most pervasive method of cooling, in many instances the chips
and packages that require cooling are at or will soon exceed
the limits of air cooling. As this happens it will be necessary to
once again introduce water or some other form of liquid
cooling.
This represents a real challenge as it does not mean simply res-
urrecting the water-cooled designs of the past. Machines today
are packaged much more densely than in the past making the job
of introducing water or any other form of liquid cooling much
more challenging. In addition, today many machines must virtu-
ally operate continuously without interruption. This means that
the cooling design must incorporate redundancy to allow for a
blower or pump failure while continuing to provide the required
cooling function. It also means that provisions must be incorpo-
rated in the cooling design to allow replacement of the failed
unit while the machine continues to operate. All of these con-
siderations clearly represent an increased level of challenges for
thermal engineers. It also means that thermal engineers must be
an integral part of the design process from the very beginning
and work very closely with electrical and packaging engineers
to achieve a truly holistic design.
In addition, as identified in the thermal management section
of the 2002 National Electronics Manufacturing Technology
Roadmap [75], there are several major cooling areas requiring
further development and innovation. In order to diffuse high
heat flux from chip heat sources and reduce thermal resistance
at the chip-to-sink interface, there is a need to develop low cost,
higher thermal conductivity, packaging materials such as adhe-
sives, thermal pastes, and thermal spreaders. Advanced cooling
technologies in the form of heat pipes and vapor chambers are
already widely used. Further advances in these technologies
as well as thermoelectric cooling technology, direct liquid
cooling technology, high-performance air-cooled heat sinks
and air movers are also needed. As discussed earlier in
the paper, cooling at the data center level is also becoming a
very challenging problem. High performance cooling systems
that will minimize the impact to the environment within the
customer’s facility are needed to answer this challenge. Finally,
to achieve the holistic design referred to above, it will be
necessary to develop advanced modeling tools to integrate
the electrical, thermal, and mechanical aspects of package
and product function, while providing enhanced usability and
minimizing interface incompatibilities.
It is clear that thermal management for high-performance
computers will continue to be an area offering engineers many
challenges and opportunities for meaningful contributions and
innovations.
REFERENCES
[1] A. E. Bergles, “The evolution of cooling technology for
electrical, elec-
tronic, and microelectronic equipment,” ASME HTD, vol. 57,
pp. 1–9,
1986.
[2] D. Hanson, The New Alchemists. New York: Avon Books,
1982.
[3] R. C. Chu, U. P. Hwang, and R. E. Simons, “Conduction
cooling for an
LSI package: A one-dimensional approach,” IBM J. Res.
Develop., vol.
26, no. 1, pp. 45–54, Jan. 1982.
[4] R. C. Chu, O. R. Gupta, U. P. Hwang, and R. E. Simons,
“Gas encapsu-
lated cooling module,” U.S. Patent 3,741,292, 1976.
[5] R. C. Chu and R. E. Simons, “Cooling technology for high
performance
computers: Design applications,” in Cooling of Electronic
Systems, S.
Kakac, H. Yuncu, and K. Hijikata, Eds. Boston, MA: Kluwer,
1994,
pp. 97–122.
[6] G. F. Goth, M. L. Zumbrunnen, and K. P. Moran, “Dual-
Tapered piston
(DTP) module cooling for IBM enterprise system/9000
systems,” IBM
J. Res. Develop., vol. 36, no. 4, pp. 805–816, July 1992.
[7] F. Kobayashi, Y. Watanabe, M. Yamamoto, A. Anzai, A.
Takahashi, T.
Daikoku, and T. Fujita, “Hardware technology for HITACHI M-
880 pro-
cessor group,” in Proc. 41st Electronics Components and
Technology
Conf., Atlanta, GA, May 1991, pp. 693–703.
[8] F. Kobayashi, Y. Watanabe, K. Kasai, K. Koide, K.
Nakanishi, and R.
Sato, “Hardware technology for the Hitachi MP5800 series
(HDS Sky-
line Series),” IEEE Trans. Adv. Packag., vol. 23, no. 3, pp.
504–514, Aug.
2000.
[9] P. Singh, D. Becker, V. Cozzolino, M. Ellsworth, R.
Schmidt, and E.
Seminaro, “System packaging for a CMOS mainframe,”
Advancing Mi-
croelectron., vol. 25, no. 7, pp. 12–17, 1998.
[10] J. U. Knickerbocker, “An advanced multichip module
(MCM) for high-
performance unix servers,” IBM J. Res. Develop., vol. 46, no. 6,
pp.
779–804, Nov. 2002.
[11] D. J. De Kock and J. A. Visser, “Optimal heat sink design
using mathe-
matical optimization,” Adv. Electron. Packag., vol. 1, pp. 337–
347, 2001.
[12] J. R. Culham and Y. S. Muzychka, “Optimization of plate
fin heat sinks
using entropy generation minimization,” IEEE Trans. Compon.
Packag.
Technol., vol. 24, no. 2, pp. 159–165, Jun. 2001.
[13] M. F. Holahan, “Fins, fans, and form: Volumetric limits to
air-side heat
sink performance,” in Proc. 9th Intersociety Conf. Thermal and
Ther-
momechanical Phenomena in Electronic Systems, Las Vegas,
NV, Jun.
2004, pp. 564–570.
[14] F. Roknaldin and R. A. Sahan, “Cooling solution for next
generation
high-power processor boards in 1U computer servers,” Adv.
Electron.
Packag., vol. 2, pp. 629–634, 2003.
[15] M. Gao and Y. Cao, “Flat and U-shaped heat spreaders for
high-power
electronics,” Heat Transfer Eng., vol. 24, no. 3, pp. 57–65,
May/Jun.
2003.
[16] Z. Z. Yu and T. Harvey, “Precision-Engineered heat pipe
for cooling Pen-
tium II in compact PCI design,” in Proc. 7th Intersociety Conf.
Thermal
and Thermomechanical Phenomena in Electronic Systems, Las
Vegas,
NV, May 2000, pp. 102–105.
[17] V. W. Antonetti, S. Oktay, and R. E. Simons, “Heat
transfer in electronic
packages,” in Microelectronics Packaging Handbook, R. R.
Tummala
and E. J. Rymaszewski, Eds. New York: Van Nostrand
Reinhold, 1989,
pp. 189–190.
[18] R. S. Prasher, C. Simmons, and G. Solbrekken, “Thermal
contact resis-
tance of phase change and grease type polymeric materials,”
Amer. Soc.
Mechanical Engineers, Manufacturing Engineering Division
(MED),
vol. 11, pp. 461–466, 2000.
[19] D. J. Delia, T. C. Gilgert, N. H. Graham, U. P. Hwang, P.
W. Ing, J.
C. Kan, R. G. Kemink, G. C. Maling, R. F. Martin, K. P. Moran,
J. R.
Reyes, R. R. Schmidt, and R. A. Steinbrecher, “System cooling
design
for the water-cooled IBM enterprise system/9000 processors,”
IBM J.
Res. Develop., vol. 36, no. 4, pp. 791–803, Jul. 1992.
[20] D. B. Tuckerman and R. F. Pease, “High performance heat
sinking for
VLSI,” IEEE Electron. Device Lett., vol. EDL-2, no. 5, pp.
126–129,
May 1981.
[21] R. Hahn, A. Kamp, A. Ginolas, M. Schmidt, J. Wolf, V.
Glaw, M. Topper,
O. Ehrmann, and H. Reichl, “High power multichip modules
employing
the planar embedding technique and microchannel water heat
sinks,”
IEEE Trans. Compon., Packag., Manufact. Technol.–Part A, vol.
20, no.
4, pp. 432–441, Dec. 1997.
[22] A. E. Bergles and A. Bar-Cohen, “Direct liquid cooling of
microelec-
tronic components,” in Advances in Thermal Modeling of
Electronic
Components and Systems, A. Bar-Cohen and A. D. Kraus, Eds.
New
York: ASME Press, 1990, vol. 2, pp. 233–342.
[23] R. E. Simons, “Direct liquid immersion cooling for high
power density
microelectronics,” Electron. Cooling, vol. 2, no. 2, 1996.
[24] F. P. Incropera, "Liquid immersion cooling of electronic
components,”
in Heat Transfer in Electronic and Microelectronic Equipment,
A. E.
Bergles, Ed. New York: Hemisphere, 1990.
[25] R. D. Danielson, N. Krajewski, and J. Brost, “Cooling a
superfast com-
puter,” Electron. Packag. Produc., pp. 44–45, Jul. 1986.
[26] L. Jiji and Z. Dagan, “Experimental investigation of single
phase multi
jet impingement cooling of an array of microelectronic heat
sources,” in
Modern Developments in Cooling Technology for Electronic
Equipment,
W. Aung, Ed. New York: Hemisphere, 1988, pp. 265–283.
[27] P. F. Sullivan, S. Ramadhyani, and F. P. Incropera,
“Extended surfaces to
enhance impingement cooling with single circular jets,” Adv.
Electron.
Packag., vol. ASME EEP-1, pp. 207–215, Apr. 1992.
[28] G. M. Chrysler, R. C. Chu, and R. E. Simons, “Jet
impingement boiling
of a dielectric coolant in narrow gaps,” IEEE Trans. CPMT-A,
vol. 18,
no. 3, pp. 527–533, 1995.
[29] A. E. Bergles and A. Bar-Cohen, “Immersion cooling of
digital com-
puters,” in Cooling of Electronic Systems, S. Kakac, H. Yuncu,
and K.
Hijikata, Eds. Boston, MA: Kluwer, 1994, pp. 539–621.
[30] I. Mudawar and D. E. Maddox, "Critical heat flux in
subcooled flow
boiling of fluorocarbon liquid on a simulated chip in a vertical
rectan-
gular channel,” Int. J. Heat Mass Transfer, vol. 32, 1989.
[31] R. C. Chu and R. E. Simons, “Review of boiling heat
transfer for cooling
of high-power density integrated circuit chips,” in Process,
Enhanced,
and Multiphase Heat Transfer, A. E. Bergles, R. M. Manglik,
and A. D.
Kraus, Eds. New York: Begell House, 1996.
[32] R. E. Simons, “The evolution of IBM high performance
cooling tech-
nology,” IEEE Trans. CPMT-Part A, vol. 18, no. 4, pp. 805–
811, 1995.
[33] S. C. Yao, S. Deb, and N. Hammouda, “Impacting spray
boiling for
thermal control of electronic systems,” Heat Transfer Electron.,
vol.
ASME HTD-111, pp. 129–134, 1989.
[34] G. Pautsch and A. Bar-Cohen, “Thermal management of
multichip mod-
ules with evaporative spray cooling,” Adv. Electron. Packag.,
vol. ASME
EEP-26-2, pp. 1453–1461, 1999.
[35] G. Pautsch, “An overview on the system packaging of the
Cray SV2
supercomputer,” presented at the IPACK 2001 Conf., Kauai, HI,
2001.
[36] T. Cader and D. Tilton, "Implementing spray cooling
thermal manage-
ment in high heat flux applications,” in Proc. 2004 Intersociety
Conf.
Thermal Performance, 2004, pp. 699–701.
[37] G. Lin and R. Ponnappan, “Heat transfer characteristics of
spray cooling
in a closed loop,” Int. J. Heat Mass Transfer, vol. 46, pp. 3737–
3746,
2003.
[38] C. Hilbert, S. Sommerfeldt, O. Gupta, and D. J. Herrell,
“High perfor-
mance air cooled heat sinks for integrated circuits,” IEEE
Trans. CHMT,
vol. 13, no. 4, pp. 1022–1031, 1990.
[39] R. C. Chu, R. E. Simons, and K. P. Moran, “System
cooling design con-
siderations for large mainframe computers,” in Cooling
Techniques for
Computers, W. Aung, Ed. New York: Hemisphere, 1991.
[40] V. W. Antonetti, R. C. Chu, and J. H. Seely, “Thermal
design for IBM
system/360 model 91,” presented at the 8th Int. Electronic
Circuit Pack-
aging Symp., San Francisco, CA, 1967.
[41] R. C. Chu, M. J. Ellsworth, E. Furey, R. R. Schmidt, and R.
E. Simons,
“Method and apparatus for combined air and liquid cooling of
stacked
electronic components,” U.S. Patent 6,775,137 B2, Aug. 10,
2004.
[42] H. Bray, “Computer Makers Sweat Over Cooling,” The
Boston Globe,
2004.
[43] Y. Taur and J. Nowak, "CMOS devices below 0.1 μm: How
high will performance go?," in Int. Electron Devices Meeting Tech. Dig.,
1997,
pp. 215–218.
[44] K. Rose, R. Mangaser, C. Mark, and E. Sayre,
“Cryogenically cooled
CMOS,” Critical Rev. Solid State Materials Sci., vol. 4, no. 1,
pp. 63–99,
1999.
[45] W. F. Clark, E. Badih, and R. G. Pires, “Low temperature
CMOS—A
brief review,” IEEE Trans. Compon., Hybrids, Manufact.
Technol., vol.
15, no. 3, pp. 397–404, Jun. 1992.
[46] R. F. Barron, Cryogenic Systems, 2nd ed. New York:
Oxford Univ.
Press, 1985.
[47] J. S. Kolodzey, “Cray-1 computer technology,” IEEE
Trans. Compon.,
Hybrids, Manufact. Technol., vol. CHMT-4, no. 2, pp. 181–186,
Jun.
1981.
[48] D. M. Carlson, D. C. Sullivan, R. E. Bach, and D. R.
Resnick, “The
ETA-10 liquid-nitrogen-cooled supercomputer system," IEEE
Trans.
Electron. Devices, vol. 36, no. 8, pp. 1404–1413, Aug. 1989.
[49] R. E. Schwall and W. S. Harris, “Packaging and cooling of
low temper-
ature electronics,” in Advances in Cryogenic Engineering. New
York:
Plenum Press, 1991, pp. 587–596.
[50] R. R. Schmidt, “Low temperature electronics cooling,”
Electronics
Cooling, vol. 6, no. 3, Sep. 2000.
[51] R. R. Schmidt and B. Notohardjono, “High-End server low
temperature
cooling,” IBM J. Res. Develop., vol. 46, no. 2, pp. 739–751,
2002.
[52] A. Fujisaki, M. Suzuki, and H. Yamamoto, “Packaging
technology for
high performance CMOS server fujitsu GS8900,” IEEE Trans.
Adv.
Packag., vol. 24, pp. 464–469, Nov. 2001.
[53] Heat Density Trends in Data Processing, Computer
Systems and
Telecommunication Equipment. Santa Fe, NM: Uptime Institute,
2000.
[54] R. Schmidt, “Effect of data center characteristics on data
processing
equipment inlet temperatures,” in Proc. IPACK ’01, Advances
in
Electronic Packaging 2001, vol. 2, Kauai, HI, Jul. 2001, pp.
1097–1106.
[55] R. Schmidt and E. Cruz, "Raised floor computer data
center: Effect on
rack inlet temperatures of chilled air exiting both the hot and
cold aisles,”
in Proc. ITHERM, San Diego, CA, Jun. 2002, pp. 580–594.
[56] R. Schmidt and E. Cruz, "Raised floor computer data center: Effect on rack inlet tempera-
tempera-
tures when rack flow rates are reduced,” presented at the Int.
Electronic
Packaging Conf. and Exhibition, Maui, HI, Jul. 2003.
[57] R. Schmidt and E. Cruz, "Raised floor computer data center: Effect on rack inlet tempera-
tempera-
tures when adjacent racks are removed,” presented at the Int.
Electronic
Packaging Conf. and Exhibition, Maui, HI, July 2003.
[58] R. Schmidt and E. Cruz, "Raised floor computer data center: Effect on rack inlet temper-
temper-
atures when high powered racks are situated amongst lower
powered
racks,” presented at the ASME IMECE Conf., New Orleans, LA,
Nov.
2002.
[59] R. Schmidt and E. Cruz, "Clusters of high powered racks within a raised floor
computer
data center: Effect of perforated tile flow distribution on rack
inlet air
temperatures," presented at the ASME IMECE Conf.,
Washington, DC,
Nov. 2003.
[60] C. Patel, C. Bash, C. Belady, L. Stahl, and D. Sullivan,
“Computational
fluid dynamics modeling of high compute density data centers
to assure
system inlet air specifications,” in Proc. IPACK ’01, Advances
in Elec-
tronic Packaging 2001, vol. 2, Kauai, HI, July 2001, pp. 821–
829.
[61] C. Patel, R. Sharma, C. Bash, and A. Beitelmal, “Thermal
considerations
in cooling large scale compute density data centers,” in Proc.
ITHERM,
San Diego, CA, Jun. 2002, pp. 767–776.
[62] C. Patel, C. Bash, R. Sharma, M. Beitelmal, and R.
Friedrich, “Smart
cooling of data centers,” in Proc. IPACK ’03, Advances in
Electronic
Packaging 2003, Maui, HI, Jul. 2003, pp. 129–137.
[63] C. Bash, C. Patel, and R. Sharma, “Efficient thermal
management of
data centers—Immediate and long term research needs,”
HVAC&R Res.
J., vol. 9, no. 2, pp. 137–152, Apr. 2003.
[64] H. Obler, “Energy efficient computer cooling,”
Heating/Piping/Air Con-
ditioning, vol. 54, no. 1, pp. 107–111, Jan. 1982.
[65] J. M. Ayres, “Air conditioning needs of computers pose
problems for
new office building,” Heating, Piping and Air Conditioning,
vol. 34, no.
8, pp. 107–112, Aug. 1962.
[66] H. F. Levy, “Computer room air conditioning: How to
prevent a catas-
trophe,” Building Syst. Des., vol. 69, no. 11, pp. 18–22, Nov.
1972.
[67] R. W. Goes, “Design electronic data processing
installations for relia-
bility,” Heating, Piping Air Cond., vol. 31, no. 9, pp. 118–120,
Sept.
1959.
[68] W. A. Di Giacomo, “Computer room environmental
systems,” Heating,
Piping Air Cond., vol. 45, no. 11, pp. 76–80, Oct. 1973.
[69] F. J. Grande, “Application of a new concept in computer
room air con-
ditioning,” Western Electric Eng., vol. 4, no. 1, pp. 32–34, Jan.
1960.
[70] F. Green, “Computer room air distribution,” ASHRAE J.,
vol. 9, no. 2,
pp. 63–64, Feb. 1967.
[71] M. N. Birken, “Cooling computers,” Heating, Piping Air
Cond., vol. 39,
no. 6, pp. 125–128, Jun. 1967.
[72] H. F. Levy, “Air distribution through computer room
floors,” Building
Syst. Des., vol. 70, no. 7, pp. 16–16, Oct./Nov. 1973.
[73] Thermal Guidelines for Data Processing Environments.
Atlanta, GA:
ASHRAE, 2004.
[74] R. Schmidt, “Thermal profile of a high density data center-
methodology
to thermally characterize a data center,” presented at the
ASHRAE
Nashville Conf., Nashville, TN, May 2004.
[75] R. C. Chu and Y. Joshi, Eds., “Thermal Management,” in
National Elec-
tronics Manufacturing Technology Roadmaps. Herndon, VA:
National
Electronic Manufacturing Initiative, Inc., 2002.
Richard C. Chu has been an IBM Fellow since
1983. He is also a Fellow of ASME and AAAS.
Since joining IBM’s Development Laboratory in
Poughkeepsie, NY, in 1960, he has held a variety
of technical and managerial assignments. His
leadership and creativity in the area of thermal
management of microelectronic equipment have
earned him numerous awards and increasing re-
sponsibilities. He invented/co-invented the Modular
Conduction Cooling System and the Thermal Con-
duction Module (TCM) cooling concept, which
was the primary cooling solution for IBM’s high performance
computers for
many years. He has been recognized by IBM as a master
inventor with over
100 issued patents and over 150 patent disclosure publications.
He has also
published two co-authored books on the subject of thermal
management of
microelectronics.
Dr. Chu is the recipient of 38 IBM Invention Achievement
Awards, 4 IBM
Outstanding Innovation Awards, and an IBM Corporate Award.
Among his
many other honors, he is a past president of the IBM Academy
of Technology,
an elected member of the National Academy of Engineering, a
member of the
Academia Sinica, and a Distinguished Alumnus of both his alma
maters, Purdue
University and National Cheng-Kung University in Taiwan.
Most recently, he
was the recipient of the 2003 InterPACK Conference
Achievement Award.
Robert E. Simons received the B.S. degree in
mechanical engineering from Widener University,
Chester, PA, and the M.S. degree in operations
research and applied statistics from Union College,
Schenectady, NY.
Prior to retiring from IBM in 1995, he was a
Senior Technical Staff Member and manager in the
Advanced Thermal Laboratory at the IBM Devel-
opment Laboratory, Poughkeepsie, NY. He joined
IBM in 1966 working in the thermal area as an
engineer and manager, and was a key participant in
the thermal design and development of cooling technologies for
the IBM 3033,
3081, and 3090 computer systems, as well as the development
of direct liquid
immersion cooling techniques. As a co-inventor of the cooling
scheme for the
IBM Thermal Conduction Module (TCM), he received an IBM
Outstanding
Innovation Award and a Corporate Award. While at IBM, he
was a member
of the IBM Academy of Technology. He is an inventor on over
50 issued
U.S. patents and 75 invention publications. He has published
over 50 papers
and book chapters related to cooling electronic packages and
systems, and
developed a short course on electronics cooling that he taught in
the U.S. and
Europe.
Mr. Simons is a recipient of the Semi-Therm Significant
Contributor Award
and has been active in the conference since its inception serving
in the capacities
of session, program and general chairman. He is also a past
chairman of the
ASME Heat Transfer Division K-16 Committee on Heat
Transfer in Electronic
Equipment.
Michael J. Ellsworth received the B.E.M.E. in 1984
and the M.E.M.E. degree in 1988 from Manhattan
College, Riverdale, NY.
He is a Senior Technical Staff Member working in
the Advanced Thermal Laboratory in Poughkeepsie,
NY, and has been with IBM since 1988. While at
IBM he has explored improved cooling for applica-
tions ranging from laptops to high-end servers and
has investigated cooling technologies encompassing
air, water, and refrigeration. From 1992 to 1996 he
was a ceramic/thin film package applications engi-
neer and technical program manager in the Interconnect
Products Group, East
Fishkill, NY. He is a member of IEEE and of ASME where he
serves on the
Electronics and Photonics Packaging Division Executive
Committee and on the
K-16 Committee on Heat Transfer in Electronic Equipment. He
has published
15 technical papers and holds 33 U.S. patents.
Roger R. Schmidt has over 25 years experience
in engineering and engineering management in the
thermal design of IBM’s large scale computers. He
has led development teams in cooling mainframes,
client/servers, parallel processors and test equip-
ment utilizing such cooling mediums as air, water,
and refrigerants. He has published more than 60
technical papers and holds 44 patents in the area
of electronic cooling. He is a member of ASME’s
Heat Transfer Division and an active member of
the K-16 Electronic Cooling Committee. He has
been an Associate Editor of the Journal of Electronic
Packaging. He has
taught Mechanical Engineering courses extensively
over the past 20 years
for prospective Professional Engineers and has given seminars
on electronic
cooling at a number of universities.
Dr. Schmidt is a Distinguished Engineer, IBM Academy of
Technology
Member, and an ASME Fellow.
Vincent Cozzolino holds degrees in electrical engi-
neering and physics.
He joined IBM in November, 1977. After holding
various technical positions, he became a Manager in
1982. He has held management positions in manu-
facturing and development and managed employees
worldwide. He is currently the Vice President of
Product and Quality Engineering.
ADVANCED COOLING TECHNOLOGY FOR LEADING-
EDGE COMPUTER PRODUCTS
R.C. Chu
System 390 Division
IBM Corporation
Poughkeepsie, NY
USA
ABSTRACT
Cooling technology has been a vital prerequisite for the rapid
and continued advancement of computer products, ranging from
laptops to supercomputers. This paper provides a review of the
recent development of cooling technology for computers. Both air
cooling and liquid cooling are included. Air cooling is discussed
in terms of the advantages of impinging flow. An example of
module internal conduction enhancement is given. Liquid cooling
is discussed in terms of indirect liquid cooling with water
coupled with enhanced conduction, and direct immersion cooling
with dielectric coolants. Special cooling technology is included
in terms of the application of heat pipes and the possibility of
using liquid metal flow to cool electronic packages.
INTRODUCTION
During the last decade the number of circuits per chip
increased while circuit power dissipation decreased for
state-of-the-art semiconductor technology. Because of advances in
microelectronic fabrication techniques, the circuit size decreased
faster than the circuit power dissipation. This led to a geometric
growth in the module level power dissipation. The recent switch
from Bipolar to CMOS was expected to alleviate the power
dissipation problem. In reality, the switch has only resulted in a
backwards shift of about 10 years in the geometric growth of
module level heat flux. The change from Bipolar has provided a
short recess from the runaway growth in heat flux, but the recess
will soon be over.
A maximum operating temperature is specified for all
electronic chips in order to maintain a desired level of
reliability. It is therefore necessary to be able to measure and
predict the junction temperature of each chip. One method often
used is to express the junction temperature in terms of a total
temperature budget, made up of the ambient temperature plus
the temperature rises from ambient to the chip device junction.
For a single or multichip module this budget can be expressed
as:
Tj = ΔTj-c + ΔTint + ΔText + ΔTcoolant + Tamb

where

Tj = chip junction temperature
ΔTj-c = junction to chip temperature rise
ΔTint = module internal temperature rise = Pchip Rint
ΔText = module external temperature rise = Pmod Rext
ΔTcoolant = temperature rise of coolant = Q / (ṁ cp)
Tamb = ambient temperature
As can be seen from the preceding equation, simply
minimizing one term will not necessarily result in an acceptable
junction temperature. All terms must be considered. Usually
there is no way to modify the ambient temperature other than
changing the operating specification, so this is taken as fixed.
The junction to chip temperature rise is determined by the circuit
technology and is also considered fixed. The other three terms
may be controlled through application of cooling technology and
enhancements. In virtually all cases, the dominant temperature
rises will be the internal and external rises. The internal
temperature rise is the product of chip power and the internal
thermal resistance from chip to case. The external temperature
rise is the product of module power and the external thermal
resistance from the case to the cooling fluid. A variety of
techniques have been utilized in different cooling system designs
to minimize the internal and external thermal resistances, as well
as the temperature rise of the coolant. This paper addresses
some of the methods that have been used to provide reliable
cooling systems for leading edge computer products.
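The budget equation above can be sketched as a short calculation. All numerical values below are assumed illustrative examples, not figures from the paper:

```python
# Junction temperature budget:
#   Tj = dT_jc + dT_int + dT_ext + dT_cool + T_amb
# Every input value here is an assumed illustrative number.

def junction_temperature(p_chip, p_module, r_int, r_ext,
                         q_coolant, m_dot, c_p, dt_jc, t_amb):
    """Sum the temperature-rise terms of the budget equation."""
    dt_int = p_chip * r_int               # internal rise, chip to case
    dt_ext = p_module * r_ext             # external rise, case to coolant
    dt_cool = q_coolant / (m_dot * c_p)   # coolant temperature rise
    return t_amb + dt_cool + dt_ext + dt_int + dt_jc

# Example: a 40 W chip on a 100 W air-cooled module
tj = junction_temperature(p_chip=40.0, p_module=100.0,
                          r_int=0.286, r_ext=0.25,
                          q_coolant=100.0, m_dot=0.010, c_p=1006.0,
                          dt_jc=2.0, t_amb=25.0)
print(f"Junction temperature: {tj:.1f} C")
```

Note how no single term dominates by a wide margin; this is why minimizing one resistance alone cannot guarantee an acceptable junction temperature.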
AIR COOLING
Although the thermal performance of an air cooled system is
not as good as that of most water or immersion systems, most
computers still depend wholly or in part on air cooling. This
includes everything from portables to mainframes.
One technique now being used to extend air cooling may be
called highly parallel impingement. With this technique each
module on a card receives an individual, unheated air stream.
The individual air streams eliminate the problem of cooling air
temperature rise, the fourth term in the temperature budget. The
top illustration of figure 1 shows the concept. Air is drawn from
the card side of a large plenum through appropriately sized
orifices in the plate separating the plenum and the modules. As
shown, the air impinges against the pin fin heat sinks and flows
into the return plenums. From there the air is drawn through the
blower and pushed out of the system.
Figure 1. Parallel impingement and conventional cross flow.
0-7803-4306-9/98/$10.00 1998 IEEE
A couple of things should be noted in this impingement
scheme. First, either pin fin or parallel plate fin heat sinks could
be used. Second, depending on the power dissipation of the
individual modules, the vertical expanse of the supply plenum
could be made quite small, and could be incorporated into a book
package.
By using many individual air streams, the sensible energy gain
of an air stream cooling a given module is limited to the power
dissipated by the module alone. In a conventional system, where
the air enters along one edge of the card and flows across several
rows of modules, as shown in the bottom half of figure 1, either
the pressure drop is tolerable but the air temperature rises
beyond limits (low air flow rate) or the air temperature rise is not
excessive but the total pressure drop becomes too great (high air
flow rate). While the ducting of the highly parallel impingement
cooling scheme is more complex, it can solve a number of the
thermal and hydraulic problems associated with air cooling a
large array of modules.
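The cross-flow penalty described above can be made concrete by tracking the inlet air temperature seen by each module in a row. The module powers and flow rate below are assumed illustrative values:

```python
# Inlet air temperature seen by each module in a cross-flow row.
# The air preheats as it passes upstream modules; with highly
# parallel impingement, every module instead sees ambient air.
# Module powers and flow rate are assumed illustrative values.

C_P_AIR = 1006.0  # specific heat of air, J/(kg K)

def cross_flow_inlet_temps(module_powers, m_dot, t_amb):
    """Inlet temperature at each successive module in a cross-flow row."""
    temps, q_upstream = [], 0.0
    for p in module_powers:
        temps.append(t_amb + q_upstream / (m_dot * C_P_AIR))
        q_upstream += p
    return temps

powers = [30.0] * 5  # five 30 W modules in one row
inlets = cross_flow_inlet_temps(powers, m_dot=0.010, t_amb=25.0)
print([f"{t:.1f}" for t in inlets])  # downstream modules see hotter air
# With parallel impingement, every module's inlet stays at t_amb.
```

Raising the flow rate shrinks the preheat but raises pressure drop, which is the trade-off the impingement scheme avoids.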
MODULE INTERNAL CONDUCTION ENHANCEMENT
Selection of an appropriate cooling technology must consider
both internal and external resistances. As indicated in the
junction temperature equation, even if any one resistance were
reduced to zero it would still be necessary to contend with the
sum of the other resistances. Ideally it would be desirable to
minimize all the resistances. Much work over the past few years
has led to new heat sink designs and new cooling approaches
(such as impingement) to reduce external thermal resistance.
Advances in the reduction of the external resistance, while good
in itself, will lead to a limit dictated by the internal resistance.
Therefore, consideration must be given to techniques to reduce
the internal resistance of single and multichip modules. The
technique which will be discussed here is the use of a thermal
space transformer.
In a typical single chip module (SCM), the thermal path
between chip and housing is mainly through a paste or epoxy
layer between the back side of the chip and the inside of the
module cover. The paste or epoxy gap is relatively thick to
accommodate manufacturing tolerances and results in a
relatively large thermal resistance. Consider a single 18.2 mm
chip in a 42.5 mm ceramic column grid array (CCGA) module.
Using a high thermal conductivity paste (k = 3.8 W/m·K)
between the chip and the cover (1.0 mm aluminum) yields an
internal resistance of 0.286 °C/W. As can be seen in the table,
the largest contributor to the internal resistance is the chip to
cover resistance. If the paste layer could be spread over a larger
area, this thermal resistance could be substantially reduced.
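The leverage of spreading the paste layer can be estimated from the one-dimensional conduction resistance R = t / (k A). Only k = 3.8 W/m·K and the 18.2 mm chip size come from the text; the gap thickness and spreader footprint below are assumed:

```python
# Conduction resistance of the paste layer, R = t / (k * A).
# k and the chip size are from the text; the gap thickness and
# the spreader footprint are assumed illustrative values.

def paste_resistance(thickness_m, k, area_m2):
    """1-D conduction resistance in degrees C per watt."""
    return thickness_m / (k * area_m2)

k_paste = 3.8              # W/(m K), high-conductivity paste
t_paste = 0.10e-3          # m, assumed paste-gap thickness
a_chip = (18.2e-3) ** 2    # m^2, paste over the chip footprint alone
a_spread = (30.0e-3) ** 2  # m^2, assumed spreader footprint

r_chip = paste_resistance(t_paste, k_paste, a_chip)
r_spread = paste_resistance(t_paste, k_paste, a_spread)
print(f"{r_chip:.3f} C/W over the chip, {r_spread:.3f} C/W when spread")
```

The resistance falls in direct proportion to the area increase, which is exactly the benefit the thermal space transformer trades against its extra interfaces.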
Figure 2. Single chip module with thermal space transformer
(figure labels: heat sink, oil interface, cover, paste interface,
spreader, oil interface, chip, substrate).
Table 1. Thermal resistances (°C/W) for conventional and
thermal transformer enhanced packages.
As shown in figure 2, the thermal space transformer spreads
the heat out and provides a larger area for conduction through
the paste. In the module with the thermal space transformer, the
paste layer thickness remains the same. Although this design
results in more interfaces, as can be seen in Table 1 the overall
internal thermal resistance is smaller: about 40% smaller than in
the original design. This temperature savings on the internal
side will allow either a smaller heat sink or a lower cooling air
flow rate.
LIQUID COOLING
While air cooling can be thought of as the basic yardstick for
comparing the efficacy of electronic cooling systems, there are
many applications which require a more substantial cooling
system. Air is cheap, but water provides the largest capacity for
heat removal in computers. Due to performance requirements, a
single module may often dissipate a considerable amount of