Chip-Package Thermal Co-Simulation Technique for Thermally Aware Chip Design
Kamal Karimanal
ANSYS Incorporated
275, Technology Drive
Canonsburg, PA, USA, 15317
Phone: 724 514 3650
Email: kamal.karimanal@ansys.com
ABSTRACT
This paper proposes a fast and accurate approach to temperature-aware, chip-level circuit design that can be applied at an early stage. In essence, the package and its surroundings are characterized up front using rigorous thermal simulation techniques, so that an automated and accurate thermal model is available on demand to the silicon design team. The model takes the chip power map as input and outputs the steady-state temperature map. This was accomplished by applying linear superposition to characterize a detailed model of the chip together with its packaging and relevant surroundings. The proposed repository of compact models in library form removes the need, at the point of use, for an advanced background in heat transfer or for time-consuming computations, both of which have traditionally been bottlenecks to the early adoption of thermal management during chip design.
KEY WORDS: chip package co-design, silicon package co-design, power map, linear superposition, hot spot, chip temperature distribution.
NOMENCLATURE
Cij Influence coefficient (°C/W) of power matrix element i due to heat generation at element j
T Centroid temperature (°C) of a power map element
Q Heat dissipation (W)
Greek symbols Error
INTRODUCTION
Traditional electronics thermal management has focused on controls at the downstream end of the supply chain, such as fan sizing, fan and vent positioning, heat sink optimization, and cost-performance tradeoffs. These are essential functions of thermal management and will continue to be important steps in ensuring reliable electronic products. Thermal management at the silicon design stage, however, has been largely non-existent because of several challenges:
1. Inability to accurately predict chip power and its distribution.
2. The presumed applicability of a uniform heat distribution on the chip, which justified the simplicity of treating power minimization as the single thermal goal at the chip design stage.
3. Lack of awareness, in the past, of the total cost of ownership of electronic equipment, including energy costs.
4. Prohibitive turnaround time for accurate, realistic temperature prediction of each power distribution configuration during the chip design stage, owing to computational and workflow challenges.
Methodologies and tools to predict chip power distribution with a fair level of certainty for a given operational mode are available and continue to improve ([1], [2], [3]). Continuous miniaturization, multi-core configurations and the trend towards system-on-chip designs have ensured that chip heat distribution is always non-uniform and can vary with operating mode. The total-cost approach to equipment selection practiced by datacenter operators and consumers has made energy efficiency an important design criterion. Finally, and most importantly, this work focuses on a solution to the remaining challenge: the thermal prediction turnaround time imposed by the computational and workflow issues mentioned earlier.
The possibility that the lowest chip power is not necessarily the most appropriate design criterion was articulated by Hamann et al. [4], who also used the principle of linear superposition to estimate power distribution from IR camera readings of temperature distribution. A technique for quick estimation of the steady-state chip temperature distribution using linear superposition and an analytical spreading-resistance-based model was proposed by Sikka [5]. The concept of using test measurements or simulation predictions of temperatures at points of interest from a sufficient series of linearly independent physical or numerical experiments was proposed by Stout ([6], [7]), who used an example problem containing 3 sources and 5 points of interest to demonstrate the validity of the approach for steady-state as well as transient temperature prediction.
This work applies the linear superposition technique to a 10 × 10 array of heat sources (power map) inside a flip-chip package cooled by forced convection. The applicability of the linearity assumption to a typical forced-convection, heat-sink-cooled scenario is studied using ANSYS Icepak software and an in-house linear solver program for performing the characterization, validation and fast temperature prediction for any given power map condition.
DESCRIPTION OF METHODOLOGY APPLICABILITY
The methodology used in this work involves the following essential parts:
Characterization
The characterization part involves the following steps:
1. Build a sufficiently detailed thermal model of the chip, its packaging and heat sinking mechanism.
2. Create a rectangular array of heat generation surfaces on the active surface of the chip. The array could be 10 × 10, 30 × 30, etc., depending on the resolution of the power map matrix that will be used as input at the end-use stage.
3. Perform as many simulations as there are sources in the array, sequentially subjecting each source to the same known power while all other sources in the array dissipate no power. The temperatures at the centroids of all sources form the response of each of these mutually independent numerical experiments. The results of the characterization simulations can then be used to compute the self- and mutual-influence coefficients as follows:
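$$C_{ij} = \frac{T_i^{(j)} - T_{amb}}{P_j} \qquad (1)$$
where T_i^(j) is the centroid temperature of source i in the run in which only source j dissipates the characterization power P_j, and T_amb is the ambient temperature.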
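The characterization loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the Icepak/Tcl automation used in this work: the detailed CFD model is replaced by a hypothetical three-source linear conductance network (`solve_network`, with made-up conductances), and the hypothetical `characterize` function applies equation (1) to one run per source in which only that source is powered.

```python
# Sketch of the characterization step, equation (1).
# ASSUMPTION: a hypothetical linear conductance network stands in for the
# detailed CFD model so that the example runs on its own.

T_AMB = 25.0  # ambient temperature (°C), hypothetical value


def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_network(q):
    """Hypothetical stand-in for one detailed simulation.

    Sources sit in a row; each is linked to its neighbour (1.0 W/°C) and to
    ambient (0.5 W/°C). Returns source temperatures (°C) for powers q (W).
    """
    g_link, g_amb = 1.0, 0.5
    n = len(q)
    g = [[0.0] * n for _ in range(n)]  # conductance matrix
    for i in range(n):
        g[i][i] += g_amb
        if i + 1 < n:
            g[i][i] += g_link
            g[i + 1][i + 1] += g_link
            g[i][i + 1] -= g_link
            g[i + 1][i] -= g_link
    rise = solve_linear(g, q)
    return [T_AMB + r for r in rise]


def characterize(n_sources, p_unit=1.0):
    """One run per source j (only j powered); column j of [C] from eq. (1)."""
    c = [[0.0] * n_sources for _ in range(n_sources)]
    for j in range(n_sources):
        q = [0.0] * n_sources
        q[j] = p_unit
        temps = solve_network(q)
        for i in range(n_sources):
            c[i][j] = (temps[i] - T_AMB) / p_unit
    return c


C = characterize(3)
```

Because the stand-in network is a pure linear conduction network, the resulting [C] is symmetric and reproduces the network's response to any combined power input exactly. With a CFD model, only the conduction paths behave this strictly, as discussed under Applicability and Scope.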
The coefficient matrix determined from this characterization can then be used to determine the temperatures of the same system for any given power map [Q], with its elements listed in the same source order used during characterization, as follows:
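$$T_i = T_{amb} + \sum_{j=1}^{N} C_{ij}\,Q_j, \qquad i = 1, \dots, N \qquad (2)$$
that is, [T] − T_amb = [C][Q] in matrix form, where N is the number of sources in the array.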
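Equation (2), with the ambient temperature added back as implied by equation (1), is a single matrix-vector product at the point of use. Below is a minimal sketch. It assumes [C] is stored as a nested list with rows indexed by response location and columns by source, and that the power map has already been flattened in the same source order. The function name `predict_temperatures` and the 2 × 2 coefficients are hypothetical.

```python
def predict_temperatures(c, q, t_amb):
    """Equation (2): T_i = T_amb + sum_j C_ij * Q_j.

    c: N x N influence-coefficient matrix (°C/W), row i = response point
    q: power map flattened to N values (W), same source order as c
    t_amb: ambient temperature used during characterization (°C)
    """
    n = len(q)
    if len(c) != n or any(len(row) != n for row in c):
        raise ValueError("C must be N x N with N = len(q)")
    return [t_amb + sum(c[i][j] * q[j] for j in range(n)) for i in range(n)]


# Hypothetical 2-source example: 0.8 °C/W self and 0.3 °C/W mutual influence.
C_example = [[0.8, 0.3],
             [0.3, 0.8]]
T = predict_temperatures(C_example, [10.0, 5.0], t_amb=35.0)  # ≈ [44.5, 42.0]
```

For a 10 × 10 power map, [C] is 100 × 100, so one prediction costs 10,000 multiply-adds. This is why the point-of-use turnaround can be measured in seconds or less.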
Validation
The temperatures predicted in this manner were compared against ANSYS Icepak predictions from complete CFD simulations of the system in which the power map [Q] was applied directly to the source objects. The estimation error at each source was quantified relative to the simulation as follows:
$$\text{Error}_i = \frac{T_{i,\text{est}} - T_{i,\text{sim}}}{T_{i,\text{sim}}} \qquad (3)$$
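Equation (3), scaled to a percentage as plotted in Figure 6, can be computed per source as shown below. The function name `relative_errors` and the three temperatures in the usage line are hypothetical.

```python
def relative_errors(t_est, t_sim):
    """Equation (3) in percent: 100 * (T_est - T_sim) / T_sim for each source."""
    if len(t_est) != len(t_sim):
        raise ValueError("estimate and simulation must cover the same sources")
    return [100.0 * (e - s) / s for e, s in zip(t_est, t_sim)]


# Hypothetical centroid temperatures (°C) for three sources.
errs = relative_errors([80.0, 75.5, 70.2], [80.4, 75.5, 70.0])
worst = max(errs, key=abs)  # largest deviation in magnitude, sign preserved
```

Because the denominator in equation (3) is the absolute centroid temperature in °C, the same deviation would be a larger percentage if it were normalized by the temperature rise above ambient instead.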
A 20 mm × 20 mm × 1.15 mm chip in a flip-chip package was used for the study. Details of the package are shown in Figure 1. The package was cooled using an extruded heat sink whose dimensions are depicted in Figure 2. The complete assembly for the scenario studied is shown in Figure 3, and the material properties used are listed in Table 1.
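Equations (1) and (2) treat the power map as a vector, so each cell of the 10 × 10 map needs a fixed index and location. The sketch below makes two illustrative assumptions that the paper does not state: the sources tile the full 20 mm × 20 mm die face, and they are numbered row by row. `flat_index` and `centroid_mm` are hypothetical helpers.

```python
# ASSUMPTION: the 10 x 10 sources tile the full 20 mm x 20 mm die face and
# are numbered row by row (row-major).
DIE_MM = 20.0
N_SIDE = 10
PITCH_MM = DIE_MM / N_SIDE  # 2 mm per source under this assumption


def flat_index(row, col, n_side=N_SIDE):
    """Map a (row, col) power-map cell to the index used in [C] and [Q]."""
    return row * n_side + col


def centroid_mm(k, n_side=N_SIDE, pitch=PITCH_MM):
    """Centroid (x, y) in mm of source k, measured from one die corner."""
    row, col = divmod(k, n_side)
    return ((col + 0.5) * pitch, (row + 0.5) * pitch)
```

Under this assumption, the centroids fall at 1, 3, ..., 19 mm along each axis. These are the points at which T_i in equations (1) and (2) would be reported.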
Figure 1: Cross-sectional details of the flip-chip package model
Figure 2: Cross section of the heat sink used in the simulations
Figure 3: Overall assembly and model setup
| Part Name | Thermal Conductivity (W/m·K) |
|---|---|
| Chip | 180 |
| Underfill region | 0.8 in-plane, 10 normal |
| Solder ball region | 0.03 in-plane, 12 normal |
| Die attach | 0.8 |
| Internal Heat Spreader (IHS), copper | 380 |
| Aluminum heat sink | 205 |
| Substrate | 20 in-plane, 0.35 normal |
| PCB | 20 in-plane, 0.35 normal |

Table 1: Properties used in the sample case
Applicability and Scope
Equation (2) assumes a linear system. Heat conduction in the solids is very nearly linear, since the temperature dependence of material conductivity is expected to be minimal within the range of temperatures of interest in electronics cooling. However, natural convection and radiation heat transfer depend on the distribution of exposed surface temperatures. Hence the proposed methodology is not directly applicable to scenarios cooled primarily by natural convection or radiation. Even for forced convection, the heat transfer coefficient distribution on the exposed surfaces is a function of the temperature distribution (and, as a result, of the chip power distribution), because thermal boundary layer growth depends on the surface heat flux distribution. However, the predicted temperature is expected to be less sensitive to this effect under forced convection than under natural convection or radiation cooling. One of the objectives of this work is to quantify this sensitivity of prediction accuracy to uncertainties in the heat transfer coefficient for an example electronics cooling problem.
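A single lumped node can illustrate how a temperature-dependent boundary condition breaks superposition. The sketch below is a hypothetical illustration, not part of the study. It compares a constant film conductance with a laminar natural-convection-like law in which h grows as ΔT^0.25, so that Q ∝ ΔT^1.25. The constants and function names are made up.

```python
def rise_linear(q_w, g_w_per_c=2.0):
    """Temperature rise (°C) for a constant film conductance: linear in Q."""
    return q_w / g_w_per_c


def rise_natural_convection(q_w, k=2.0):
    """Temperature rise (°C) when h ~ dT**0.25 (laminar natural convection).

    Then Q = k * dT**1.25, so dT = (Q / k) ** 0.8. k is a hypothetical constant.
    """
    return (q_w / k) ** 0.8


def superposition_error(rise, q1, q2):
    """Percent error of summing single-source responses vs. one combined run."""
    combined = rise(q1 + q2)
    superposed = rise(q1) + rise(q2)
    return 100.0 * (superposed - combined) / combined


err_lin = superposition_error(rise_linear, 10.0, 5.0)             # exact: 0 %
err_nc = superposition_error(rise_natural_convection, 10.0, 5.0)  # large
```

Adding the responses to 10 W and 5 W reproduces the combined 15 W response exactly for the constant conductance. For the natural-convection law, the sum overpredicts the combined response by roughly 14%, which is why such scenarios fall outside the scope of the method.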
More importantly, the methodology is aimed at thermally aware chip design, and the end-use application is often unknown at the early chip design stages. Hence, the uncertainty introduced by assuming linear external forced convection is likely to be much smaller than that of the single junction-to-ambient thermal resistance (Rja) metric, which is widely used for temperature estimation at the chip design stage today.
Figure 4 shows a schematic of an organizational workflow that would help realize thermally aware chip design using the proposed methodology. This workflow addresses concerns about both turnaround bottlenecks and estimation accuracy. It begins with a team with core competency in thermal analysis, which ensures appropriate boundary conditions and representative models when creating the library of characterization data from which the [C] matrices are built. The libraries may be made available on the organizational intranet, allowing the chip design team to estimate temperatures for every heat distribution scenario without building and solving a full computational simulation each time. The turnaround at the point of use for the proposed methodology is expected to be only a few seconds. By comparison, simulation setup and manual communication of data can easily take days.
It should be pointed out that the total simulation time for the up-front characterization is likely to be longer than the time needed to predict the temperature distribution accurately for any single power map scenario using a detailed Computational Fluid Dynamics (CFD) model. Given this, the value of the proposed methodology should be considered in the following context:
Figure 4: Workflow schematic to facilitate organizational deployment.
Once the proposed approach is deployed for temperature prediction at the chip design stage, the number of power map scenarios to be studied will far exceed a few. Since the final application environments may be diverse, the chip thermal criteria could be based on a few typical cooling scenarios such as passive forced cooling, active heat sink air cooling, liquid cooling, etc. For such typical models, the level of detail incorporated need not require several hours of computation for each of the linearly independent characterization runs. At the same time, the proposed approach captures the necessary conduction and convection pathways. This makes it significantly more applicable and valuable than simplistic treatments such as a single Rja or a film coefficient applied on the chip. Moreover, the time-consuming characterization simulations can be run simultaneously because their inputs are independent, and they can be completed before chip-level circuit design even starts. Since the characterization has been highly automated, the cost of manual effort is minimized. Thus, the on-demand availability of accurate and fast temperature prediction will more than pay for the up-front characterization effort and cost.
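The trade-off argued above can be written as a simple cost comparison. The symbols are hypothetical and introduced only for this sketch. N is the number of characterization runs (100 for a 10 × 10 array), each taking time t_c. There are k power map scenarios, each requiring either a direct CFD run of duration t_f or a superposition evaluation of duration t_e. P runs are executed in parallel.

```latex
% Compute time for k power-map scenarios (hypothetical symbols)
\text{direct CFD: } k\,t_f \qquad \text{superposition: } N\,t_c + k\,t_e
% Superposition uses less total compute time once
k > k^{*} = \frac{N\,t_c}{t_f - t_e} \approx N\,\frac{t_c}{t_f} \qquad (t_e \ll t_f)
% Wall-clock time of the characterization with P runs in parallel
t_{char,\,wall} \approx \left\lceil N/P \right\rceil t_c
```

If the characterization models were as detailed as the validation model (t_c ≈ t_f), a 10 × 10 array would break even in compute time at roughly 100 scenarios. Coarser characterization models lower k*, while parallel execution shortens only the wall-clock time. In either case, that time is spent before chip design begins rather than on its critical path.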
RESULTS AND DISCUSSION
A 10 × 10 power map was used to validate the proposed methodology. This heat generation distribution was applied on the chip surface and is shown in Table 2.
Table 2: Chip surface heat distribution (power map) used for validation.
The temperature distribution on the chip surface due to the applied power map was determined using the methodology described in the previous section. The 10 × 10 array required 100 linearly independent cases to be simulated in ANSYS Icepak. An application-specific, user-programmable interface and workflow automation were scripted in the Tcl/Tk programming language using the macro capability of ANSYS Icepak. This automation created the 10 × 10 array of heat generating surfaces and parameterized the power input. It also set up the 100 linearly independent numerical experiments and reported the temperatures from which the [C] matrix was created. A program was written in C++ to take as input the reports written by ANSYS Icepak and the power map in Comma Separated Values (CSV) format, and to produce the temperature prediction by implementing equation (2). The predicted temperatures are shown in Figure 5 (top). They were validated against the traditional CFD approach, which used the power map directly as the heat generation source term on the chip surface. The CFD results are shown in Figure 5 (bottom).
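The C++ point-of-use program itself is not reproduced in the paper. As a hedged illustration of its input side only, the Python sketch below reads a power map from CSV into the flattened order expected by equation (2). The file layout (no header, one chip row per line, values in watts) and the function name `read_power_map` are assumptions. The Icepak report format is not described in the paper and is not parsed here.

```python
import csv
import io


def read_power_map(lines, n_side=10):
    """Parse an n_side x n_side CSV power map (W) into a row-major list.

    ASSUMED format: no header, one chip row per CSV row, values in watts.
    The flattened order must match the source order used to build [C].
    """
    rows = [r for r in csv.reader(lines) if r and any(c.strip() for c in r)]
    if len(rows) != n_side or any(len(r) != n_side for r in rows):
        raise ValueError(f"expected a {n_side} x {n_side} power map")
    q = [float(cell) for r in rows for cell in r]
    if any(v < 0 for v in q):  # assumption: heat sources only, no sinks
        raise ValueError("power values must be non-negative")
    return q


# Usage with an in-memory 3 x 3 map; a real file would use open(path, newline="").
sample = io.StringIO("0.5,0.5,0.5\n1.0,2.0,1.0\n0.5,0.5,0.5\n")
q = read_power_map(sample, n_side=3)
```

Checking the shape at load time catches a mismatched map before it is multiplied by a [C] characterized for a different array size.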
Results show that the predictions from the two methods were almost identical. To quantify the agreement, the two sets of results were substituted into equation (3) to compute the percentage error at all 100 sources. The estimates, shown in Figure 6, indicate less than 1% error in all cases. The alternating pattern in the error estimates, however, is hard to ignore. A closer look at thermal boundary layer behavior explains it: the heat transfer coefficient distribution on the fin surfaces depends on the heat source distribution. In Figure 6, within each set of 10 data points, the points from left to right correspond to source locations starting at the downstream edge and moving progressively toward the upstream edge. For the downstream sources, the self-heating influence coefficients determined from the characterization runs tend to predict lower temperatures. In those runs, the thermal boundary layer begins just ahead of the heated source, resulting in higher fin surface heat transfer coefficients. In the validation case, however, all sources were heated, so the thermal boundary layer began much farther upstream. Hence the error estimates were negative for the downstream sources.
Despite this source of error, the overall error estimates are acceptably low. Note that the difference in heat transfer coefficient between the characterization and validation cases affects only one link of the heat transfer pathway. The other links, such as the spreading resistance, the contact resistances at the interfaces and all other conduction resistances in the system, are independent of the power distribution and hence introduce no error into the prediction.
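A simplified one-dimensional series-resistance model makes quantitative the argument that only one link of the pathway is affected. This lumped model is an assumption introduced here for illustration, not the model used in the study. All conduction, spreading and contact resistances are combined into R_cond, and fin-side convection is represented as R_conv = 1/(hA).

```latex
% Simplified 1-D series model (assumption): one lumped path from source to air
\theta = T - T_{amb} = Q\,(R_{cond} + R_{conv}), \qquad R_{conv} = \frac{1}{hA}
% R_cond does not depend on the power map; only R_conv shifts by \delta R_{conv}
\frac{\delta\theta}{\theta}
  = \frac{R_{conv}}{R_{cond} + R_{conv}} \cdot \frac{\delta R_{conv}}{R_{conv}}
  \approx -\,\frac{R_{conv}}{R_{cond} + R_{conv}} \cdot \frac{\delta h}{h}
```

For example, if convection accounted for half of the total resistance, a 4% shift in the effective heat transfer coefficient would change the predicted temperature rise by only about 2%.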
Figure 5: Comparison between the temperature map estimated from equation (2) (top) and the CFD results from ANSYS Icepak (bottom)
Figure 6: Percentage difference between estimation and Icepak simulation prediction for the 100 sources.
Summary & Conclusions
A methodology for effectively deploying thermally aware chip design was proposed. The methodology involves a one-time pre-characterization followed by any number of temperature estimations at the chip design stage, using a computer program that implements the linear superposition approach. Scripts were also written to automate the characterization workflow in ANSYS Icepak software. This automation makes an otherwise tedious process practical for organization-wide deployment.
The temperature estimate for a typical forced-convection-cooled scenario was compared with a full-fledged CFD simulation of the same power map, and the two differed by less than 1%. The trend of the differences, however, pointed to systematic deviations. These arise from nonlinearity in the forced convection heat transfer coefficient, caused by the dependence of thermal boundary layer growth on the chip surface heat distribution.
References
[1] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," International Symposium on Computer Architecture, Jun. 2000.
[2] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, "Temperature-Aware Microarchitecture," Proceedings of the 30th Annual International Symposium on Computer Architecture, Jun. 2003.
[3] S. Wilton and N. Jouppi, "CACTI: An Enhanced Cache Access and Cycle Time Model," IEEE Journal of Solid-State Circuits, May 1996.
[4] H. F. Hamann, J. Lacey, A. Weger, and J. Wakil, "Spatially-Resolved Imaging of Microprocessor Power (SIMP): Hotspots in Microprocessors," Thermal and Thermomechanical Phenomena in Electronics Systems, IEEE Computer Society, May 2006, pp. 121–125.
[5] K. Sikka, "An Analytical Temperature Prediction Method for a Chip Power Map," 21st IEEE SemiTherm Symposium, 2005.
[6] R. Stout, "Linear Superposition Speeds Thermal Modeling (Part I)," Power Electronics Technology, Jan. 2007.
[7] R. Stout, "Linear Superposition Speeds Thermal Modeling (Part II)," Power Electronics Technology, Feb. 2007.