
OLA Conf 2002 - OLA in SoC Design Environment - paper



The integration of Open Library Architecture (OLA) libraries within nano-technology design environments can positively impact SoC design cycle times. Consistent calculation of desired information across a standard application programming interface (API) ensures analysis convergence among tools, eliminates data exchange processing and storage requirements, and significantly reduces iterations through design process steps.


Benefits of OLA Integration into Nano-Technology SoC Design Environments

2002 First Annual OLA Developer’s Conference
February 11-12, 2002
San Jose, California

Timothy J. Ehrler
Senior Principal Methodology Engineer
SoC Methodology Development, Design Technology Group
Philips Semiconductors
8372 S. River Parkway, Tempe AZ 85284

Abstract

As technologies progress to the sub-100nm level, increased chip densities are allowing greater functionality to be combined onto a single die. Increasingly complex designs are evolving from what had previously been sets of ASIC chips into a highly integrated system on a chip (SoC). This added complexity is reflected not only in the design itself, but also in the demands placed upon the EDA tools and methodologies necessary to implement such designs.

Critical to the SoC design cycle is the convergence to sufficiently accurate timing and power. Most EDA methodologies rely on tool-specific, proprietary characterization data views, or on a “de-facto” standard format. Calculation algorithms differ, as do critical signal integrity (SI) analysis capabilities, with designers encountering inconsistent, divergent results, often among different tools from the same vendor. The necessary exchange of large volumes of timing information among tools, with the associated storage and export/import time costs, further impacts design cycle times. Multiple passes through design processes magnify these impacts.

The integration of Open Library Architecture (OLA) libraries within nano-technology design environments can positively impact SoC design cycle times. Consistent calculation of desired information across a standard application programming interface (API) ensures analysis convergence among tools, eliminates data exchange processing and storage requirements, and significantly reduces iterations through design process steps.

1. Technology Advancements

Semiconductor technology has been advancing at least as rapidly as the rate predicted by Moore’s Law. As transistor sizes have decreased, so too have associated cell sizes, with increased device operating frequencies and decreased cell delays which are becoming more susceptible to IR drop and more dependent upon output loading and input slew rates than with previous technologies. At the same time, timing has become increasingly affected by interconnect related issues such as cross-coupling, wire inductance, and signal noise.

As technology progresses to, and even exceeds, the sub-100nm, or “nano-technology”, level, the capability exists to implement a complete functional system on a single chip. Whereas previous technologies had necessitated that a total system “solution” be distributed across a number of advanced ASIC chips, current technologies now allow, and indeed encourage, the complete implementation within a single “system-on-chip” (SoC).

2. Design Flow Complexity

In order to realize the implementation of such expansive designs, however, a new paradigm has emerged which focuses on integrating previously developed and validated complex blocks of logic and/or intellectual property (IP), cores, and memories. The high levels of integration associated with this paradigm are dramatically increasing the interconnect to cell delay ratio, requiring more accurate timing calculation methodologies based upon the emerging deep sub-micron (DSM) interconnect issues.

3. Technology & Design Information

In order to address these technology and design issues, many more tools are being injected into traditional design flows, most of which analyze, generate, or depend upon, concise timing and/or power information to arrive at optimal design solutions. Worse still, much of this information is exchanged among tools by formatting and exporting to mass storage from one tool, followed by importing
from storage, parsing, and interpreting that data within another tool.

Although the format or content of traditional representations of the characterized information required by a particular tool may be well defined, the interpretation of that data, the calculation algorithms involved, and the accuracy of such calculations may differ significantly among tools. The resulting inconsistent, and oftentimes correspondingly inaccurate, timing information substantially contributes to increased design cycles which rely on consistent and accurate timing to accomplish solution design objectives [6].

4. Traditional Design Flow

In order to illustrate the major issues facing timing closure driven design flows, we’ll first review a typical design flow using traditional library formats. This flow, restricted to only a relevant subset for this discussion, is illustrated in Figure 1. The simplistic assumption herein is that the user’s design flow may encompass a variety of tools from multiple tool vendors, including the foundry-provided delay calculator required for sign-off. This also implies, perhaps in the extreme, that each tool, or type thereof, requires its own library, the format of which may be industry standard, “de-facto” standard, or proprietary, and may not be common to other tools within the flow.

Of particular note within the timing sections of the design flow, shown within the shaded areas, is the inclusion of a foundry or semiconductor vendor supplied delay calculation tool. This tool generates a timing back-annotation SDF file using (perhaps) proprietary timing calculation algorithms specific to the supported technology. In addition to providing delay and constraint timing, it may also provide a slewrate, or ramp time, report. Scripts or other tools may process this report, or it may be directly imported by the static timing analysis and/or synthesis and optimization tools, using such information as constraints for further analysis. This becomes much more critical to timing closure within later physical design phases since design performance becomes increasingly impacted by slight changes to the design itself, where slewrates may become more consequential than delay times.

Although functional simulation and formal verification process steps are included within the illustrated flow, they are not relevant to the initial timing closure discussions, but their presence within the flow will be touched on when discussing an OLA based design flow.

[Figure 1 diagram: RTL; Synthesis, Optimization, Scan Insertion; Delay Calculation (Tech. WL) producing Netlist, SDF, and Slew Report; Static Timing Analysis; Functional Simulation; Formal Verification; Import Library; Floor Planning; Design Database; Wireload Extraction and Custom Wireloads; Delay Calculation (Custom WL); Place & Route, Clock Tree, Pad Ring; Parasitics Extraction and SPEF; Delay Calculation (Parasitics). Each tool is shown with its own LIB view.]

Figure 1. Traditional Design Flow

4.1 Pre-Route Timing Closure

Preliminary timing closure is usually performed after the initial RTL-to-gates synthesis process in order to arrive at a sufficiently practical implementation of the design solution within given performance specifications. This phase may also require closure for gross power consumption, which may or may not be arrived at using additional analysis tools. Interconnect timing is estimated using the technology library’s wireload tables, which can be detrimental to the closure cycle since such models are statistical by
nature, and cannot reflect the varying interconnect characteristics among IP, cores, and random logic.

Although the iterations through this process of timing calculation, static timing analysis, and logic optimization may not be as numerous as when performed in later physical design phases, an especially high performance design may require a significant number of iterations when implemented with low-speed/low-power, i.e. low-performance, technology libraries. The greater the disparity between the design performance objectives and the performance of the implementation technology, the more iterations must occur in order to achieve initial closure. As shown, however, the cost of each iteration cycle is the generation and back-annotation of SDF and slewrate information files, along with the associated processing, I/O, and storage resource costs.

If there are any discrepancies between the timing view from which the SDF file has been generated and that of the consuming tool, considerable effort is required to modify the SDF to conform to the views demanded by the latter. Given the significant size and content of this timing information file for SoC designs, conversion tool limits may well be exceeded by the complexity of the task.

4.2 Floor Plan Timing Closure

Secondary timing closure may be performed after initial floor planning but prior to final placement and routing of the design. At this point in the flow, custom wireload models may be derived from the floor plan in order to make a more meaningful estimate of interconnect timing. Iterations through this phase can assist in reaching gross placement timing, but can be very deceptive: the derived custom wireloads, although targeted at this particular implementation only, are still statistical and still cannot accurately account for the varying types of interconnect among the blocks and gates.

In addition to the costly overhead of SDF processing, there are also the costs, though not nearly as severe, of processing the custom wireloads. Design changes resulting from the timing analysis warrant corresponding changes to the design database. This, in turn, requires the extraction and generation of a netlist file for those tools not having direct access to the database, with the associated time and storage costs.

4.3 Post-Route Timing Closure

The most critical phase of design implementation is the final timing closure after placement, routing, clock tree synthesis, and I/O pad ring processing steps have been completed. At this point in the design cycle, the design has been completely implemented at the physical level, and all information required to achieve power and timing closure is available to the respective tools.

Of particular relevance to this discussion is the timing closure iteration cycle. Included within this process is the major overhead of parasitics extraction, with the associated I/O, storage, and processing costs, all of which can be tremendous. At this stage, extracting the parasitics and generating the SPEF file can take tens of hours of processing time and multiple Gbytes of storage space. Importing that information can take even longer since the contents must be parsed and processed in a manner dictated by the consuming tool, and may require a correspondingly large amount of memory to do so. In addition, SDF generation can take many hours and consume many hundreds of Mbytes of storage, with the same impact of importing and processing that information by its consuming tool.

5. Timing Closure Impediments

Because of the methodologies employed within traditional design flows, and the deficiencies that can be attributed to the representation, organization, exchange, and processing of characterization and design information, the efforts involved in achieving timing closure with large SoC designs can be immense, requiring significant resource commitments in terms of compute facilities, mass storage, personnel, and design time. The major issues contributing to ineffective timing closure include timing calculation methods, interconnect analysis, view consistency, and information exchange. The limitations and restrictions caused by these issues result in additional iterations within stages of the design cycle, oscillating around design performance targets as the designer attempts to converge on sufficiently accurate timing.

5.1 Timing Calculation Methods

Each tool within a design flow will usually contain its own timing engine, based upon the supporting library views containing pertinent characterization data. The algorithms of these engines are sufficiently different that the timing obtained from one tool may be inconsistent with that of another, and may be calculated with varying levels of accuracy among them. Methods and calculations regarding the derating and/or scaling of this timing will differ, as will the capability to perform instance-specific versus global PVT point processing to account for IR drop and thermal effects.
5.2 Interconnect Analysis

In addition to differing timing calculation methods, each tool may have its own interconnect analysis algorithms as well. Different methods of network reduction may be employed, loads may be calculated as lumped or effective, and network driving waveforms and their subsequent propagation may or may not be implemented or supported, and most probably differ among tools.

Signal integrity issues, such as cross-coupling effects and noise propagation, may or may not be implemented, or may be implemented sufficiently differently as to appear conflicting among tools.

5.3 Timing View Consistency

Tools and library views are inherently coupled, resulting in inconsistent, and oftentimes conflicting, timing representation among the many library views consumed within the design flow, depending on the capabilities and purposes of the tools involved. Timing may be conditional within one library, unconditional in another, and omitted entirely in a third. Complementary constraints may be described differently, such as a SETUPHOLD window in one and separate SETUP and HOLD timing in another. Interpretation and support for timing constructs may differ, such as REMOVAL being treated as HOLD or ignored altogether.

5.4 Timing Information Exchange

With SDF files being used as the most common method of exchanging timing information among tools, insufficient, inconsistent, and inaccurate information is presented to the consuming tools. One of the most consequential deficiencies of the format is the absence of slewrate information, which becomes more critical to analysis and design tools for DSM SoC designs. Lacking this information, a tool may derive inaccurate ramp times, default to an incorrect value, or simply assume a 0.0 value, all of which will severely affect tools that rely on skew information for critical paths or structures, such as clock trees.

There can be significant differences between the timing view defined within the library from which the SDF is generated and that of the consuming tool, resulting in unsuccessful back-annotation or, even worse, default or erroneous timing. An SDF generation tool may merge interconnect timing with path timing rather than specifying them separately, preventing consuming tools from properly performing their function. Calculated negative timing may or may not be generated in the SDF, and consuming tools may or may not accept it. Support of multiple versions of the SDF format, 2.1 and 3.0, may require that both flavors be generated to satisfy the consuming tool requirements, which can convey inconsistent information to the various tools.

Some tools may support specific constructs, such as REMOVAL timing, but others may translate them to another, such as HOLD constraints, while still others may ignore them completely. Although an SDF can represent a triplet of timing as well as a single timing point, SDF generators may well only support one, while consuming tools may only support the other. If a triplet representation is required, but the generation is at a single point, multiple generations must be performed to obtain the corresponding points and the results merged into a single SDF file, with the associated I/O, processing, and storage overhead costs.

Aside from the above issues, significant overhead costs are involved in exchanging timing information among tools in this manner. The generation of the information requires formatting, I/O processing, and storage resources, while the consumer requires I/O, parsing, and processing resources. With large SoC designs, this can well take tens of hours and hundreds of Mbytes of storage with each tool.

6. Open Library Architecture (OLA)

The creation of the Delay Calculation Language (DCL) based Delay Calculation System (DCS) by IBM introduced the concept of embedding timing calculation algorithms within a technology library. The application would “converse” with the library through a standard set of application programming interfaces (API) to request particular timing information rather than accessing and interpreting raw timing information from a library and then calculating the desired result. Later enhancements to include power calculation capabilities resulted in the IEEE 1481-1999 standard for Delay and Power Calculation System (DPCS) [1].

Subsequent extensions to the system to include graph-based functional descriptions, vector based timing and power arithmetic models, and cell and pin properties and attributes from Accellera’s Advanced Library Format (ALF) standard [2][3] further expanded its capabilities. The resulting SI2 Open Library Architecture (OLA) standard was further improved upon to include more concise APIs for interconnect parasitics, with later additional APIs developed to address signal integrity issues such as cross-coupling, noise propagation, parasitic analysis, and physical characteristics for floor planning, placement, routing, etc. [4][5].
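The idea of an application “conversing” with a library, rather than reading raw data and calculating results itself, can be sketched as a pair of objects that call each other. This is only a toy illustration: the class and method names (`ToyApplication`, `ToyLibrary`, `request_timing`, `get_pvt`, `get_load`) and the linear delay model are hypothetical, and none of this is the actual DPCS or OLA API.

```python
from dataclasses import dataclass


@dataclass
class Pvt:
    """Operating point supplied by the application on request."""
    process: float      # relative process factor, 1.0 = nominal
    voltage: float      # volts
    temperature: float  # degrees C (unused by the toy model below)


class ToyApplication:
    """Hypothetical timing tool that answers the library's call-backs."""

    def __init__(self, pvt, loads):
        self.pvt = pvt
        self.loads = loads  # {(cell, pin): capacitance}

    def get_pvt(self, cell):
        return self.pvt

    def get_load(self, cell, pin):
        return self.loads[(cell, pin)]


class ToyLibrary:
    """Stand-in for a compiled library that owns the calculation."""

    def request_timing(self, app, cell, from_pin, to_pin, input_slew):
        # from_pin identifies the arc in a real library; the toy model ignores it.
        pvt = app.get_pvt(cell)            # library asks the application for PVT
        load = app.get_load(cell, to_pin)  # ... and for the output load
        # Made-up linear model; a real library embeds characterized algorithms.
        scale = pvt.process * (1.0 + 0.5 * (1.2 - pvt.voltage))
        delay = (0.10 + 0.5 * load + 0.2 * input_slew) * scale
        slew = (0.05 + 0.8 * load) * scale
        return {"delay": delay, "slew": slew}


app = ToyApplication(Pvt(1.0, 1.2, 25.0), {("dff", "q"): 0.02})
timing = ToyLibrary().request_timing(app, "dff", "ck", "q", input_slew=0.1)
print(timing)
```

The point of the design is that the application never sees the model: it supplies context when asked and receives both delay and slew back, so every tool linked against the same library obtains the same numbers.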
6.1 OLA Concept

The purpose of OLA is to provide a single method by which information required by an application is consistently and accurately calculated. It replaces the traditional method, in which characterization information is parsed and interpreted from varying view formats and the desired results are calculated using application-specific algorithms, with a compiled library from which the desired information can be programmatically requested, calculated, then returned, as shown in Figure 2.

[Figure 2 diagram: several Tools, each connected through DPCS to a single OLA LIB.]

Figure 2. OLA Concept

The concept of OLA, and of DPCS in general, is that an application dynamically links the OLA library at runtime, and “converses” with the library through a standard defined set of application programming interfaces (API) to obtain such information as is needed by that application. The application initiates the request for information, and the library responds to the requests, returning the requested information. It does so by using the information provided through the API, using internally cached information, using library characterization information, and/or requesting additional information from the application, then calculating the requested information and returning it to the application. At any time during this “conversation”, additional information may be requested until all required information has been collected, the results calculated, and then returned to the requestor.

A very simplistic example of this interaction can be shown for an application, such as a static timing analysis tool, requiring timing within a flip-flop cell ‘dff’ from the rising clock ‘ck’ to falling output ‘q’.

[Figure 3 diagram: flip-flop cell ‘dff’ with pins d, q, and ck.]

Figure 3. Timing Example

APP: request timing from OLA
OLA: get passed timing path
OLA: request PVT from APP
APP: return PVT
OLA: get passed ‘ck’ slew
OLA: request ‘q’ load from APP
APP: return ‘q’ load
OLA: calculate timing (early/late delay/slew)
OLA: return timing
APP: use requested timing information

6.2 OLA Benefits

By embedding the algorithmic calculations within the library itself, consistent results are always obtained for use by the requesting application. Slewrate information is calculated in conjunction with delay timing, providing tools with additional information not otherwise directly available through SDF timing information exchange. Since network reduction, parasitic analysis, cross-talk, and noise propagation methodologies are embedded as well, interconnect timing calculations between cells are as consistent as those within cells. Providing this consistent and additional information, as well as eliminating annotation failures due to timing view inconsistencies and conflicting/ambiguous annotation information interpretation, significantly reduces timing closure iteration cycles.

The generation of multiple SDF files at different PVT points, and the overhead cost of merging them into an acceptable form for consuming tools, is eliminated since instance-specific timing calculations are supported. Incremental timing is easily performed on demand, again eliminating the requirement for SDF generation and annotation to account for incremental design changes.

Because timing information and algorithms are compiled into the library instead of being made available in a readable format, intellectual property content can be hidden from the user. This protects the vendor’s IP, allows for the implementation of internal timing within the IP, and also prevents local “hacking” of library information by users.

In addition to providing a consistent calculation methodology, functional expressions, such as those specified for conditional timing and functional behavior, are available in a graph-based form. This removes from each application the requirement to parse and interpret expressions, again eliminating inconsistent interpretation of library information among tools. It also provides consistent functional information such that synthesis, formal verification,
and simulation tools can use the library as well, eliminating even more views from the design flow.

6.3 Design Flow Usage

The most productive usage of OLA libraries within the design flow involves those stages relating to timing closure. By replacing the separate typical static timing analysis sub-flows involving the foundry-supplied delay calculator (Figure 4) with one interfacing with the OLA library (Figure 5), a more concise, consistent, and accurate timing analysis can be performed. This eliminates the need for a stand-alone delay calculator since the timing algorithms contained therein are now included within the library itself, and provides slew as well as timing information to the analysis tool.

[Figure 4 diagram: Delay Calculation with its LIB produces Netlist, SDF, and Slew Report, consumed by Static Timing Analysis with its own LIB.]

Figure 4. Typical Timing Analysis

[Figure 5 diagram: Static Timing Analysis connected through DPCS to the OLA LIB.]

Figure 5. OLA Timing Analysis

Notably missing from this sub-flow is the SDF file, the usage of which for timing back-annotation is no longer required. The reduction in the number of required library view formats to that of OLA only, eliminating perhaps inconsistent and inaccurate timing views from the analysis sub-flow, promotes faster timing convergence as well.

The combination of compatible timing views, consistent timing calculations, and the elimination of incomplete timing information exchange through the intermediate SDF file greatly reduces the number of iterations required to converge upon a timing solution.

7. OLA Based Design Flow

An equivalent design flow which integrates OLA libraries, replacing the stand-alone foundry-supplied delay calculator and SDF back-annotation file, is shown in Figure 6. The relative simplicity of this flow with respect to the previous typical one is immediately apparent in the simplified timing closure stages, as well as in the notable reduction in the number of required library views for the various tools included within the flow.

[Figure 6 diagram: RTL; Synthesis, Optimization, Scan Insertion; Netlist; Static Timing Analysis; Functional Simulation; Formal Verification; Design Database; Floor Planning; Wireload Extraction and Custom Wireloads; Place & Route, Clock Tree, Pad Ring; Parasitics Extraction and SPEF. The tools share a single OLA LIB through the Delay and Power Calculation System (OLA).]

Figure 6. OLA Based Design Flow

7.1 Timing Closure Improvements

The consistent and accurate timing calculation algorithms embedded within the OLA library allow faster convergence to a reliable timing solution. This capability is available for pre-route, floor plan, and post-route stages of timing closure, all using consistent timing calculation methods and algorithms embedded therein.

Iterations within the timing closure stages of the design flow are significantly reduced primarily due to this combination of accurate and consistent timing calculation. The single timing engine within the library itself provides consistent information to the application, complete with slew times, for both early and late timing. Tool-specific algorithms are avoided, as are the commonplace incompatibilities among differing timing views usually present within the associated libraries. Back-annotation of [in]complete timing information using SDF files is also avoided since such information need not be exchanged among
tools, but rather is calculated and provided as needed by the application.

Instance-specific PVT-related timing is provided for consideration of IR drop and thermal effects, as is the capability to provide incremental timing as opposed to requiring generation and exchange of complete block or design timing information.

Interconnect timing calculations, with the associated parasitics network reduction and waveform propagation algorithms, are part of the library timing engine, and provide consistent results to all applications. Signal integrity issues such as cross-talk can be implemented therein, as can the inclusion of inductance for RLC rather than RC based timing.

7.2 Extended Integration

In addition to the elimination of the delay calculation tool and SDF file, note the further elimination of many of the tool-specific library views. Since OLA libraries provide information and associated algorithms in a standard accessible method, and provide for more than just timing and power analysis tools, OLA-compliant tools other than those intended strictly for static timing analysis can be integrated into the design flow as well, further reducing the need for the various formats of tool-specific views previously required. Such tools include synthesis, scan insertion, optimization, functional simulation, formal verification, and many others.

An extremely aggressive integration of OLA-compliant tools and libraries, utilized wherever possible within a complete industry design flow [7], can dramatically reduce the number of required library views, as shown in Figure 7, yielding corresponding improvements within the design flow.

Design Process                  Tools   Standard /    Total     OLA Replaceable   Total     Format
                                        Proprietary   Formats   / Deleteable      Formats   Reduction
                                        Formats
RTL Development/Analysis          5      3/0            3        2/0                2        33%
Design Synthesis                  7      4/6           10        4/1                6        40%
Logic/Timing Verification        17      5/11          16        6/5                6        63%
Partitioning & Floor Planning    11      3/9           12        5/0                8        33%
Layout & Chip Finishing          21      4/15          19        6/2               12        37%

Figure 7. Library View Requirement Reduction

8. Conclusion

The capability of providing consistent and accurate timing information at all levels of the design process, from pre-route through post-route, can dramatically reduce, if not eliminate, iterations within timing closure stages, converging on a design solution which meets performance objectives much faster and more easily than with traditional approaches.

Interconnect analysis, with due consideration of signal integrity issues, can be calculated in a consistently accurate manner, allowing faster timing convergence once the physical implementation of a design is realized. In addition, instance-specific PVT-based timing provides for increased accuracy where IR drop and thermal effects may manifest themselves, and the capability to provide incremental timing on demand eliminates the need for further iteration cycles.

Above all, the elimination of SDF file based timing information exchange requirements among tools, with the incurred compatibility, resource, and time costs, greatly improves design development productivity.

In conclusion, the integration of OLA libraries within a design flow, in conjunction with appropriate OLA-compliant tools, can significantly improve design efforts by reducing timing closure time through the use of more accurate and consistent timing calculation methods, which directly contributes to reduced design cycle time.

References

[1] Design Automation Standards Committee of the IEEE Computer Society, “IEEE Standard for Integrated Circuit (IC) Delay and Power Calculation System”, IEEE 1481-1999, 26 June 1999.
[2] Accellera, “Advanced Library Format (ALF) for ASIC Technology, Cells, and Blocks”, revision 2.0, 14 December 2000.
[3] IEEE P1603, “A standard for an Advanced Library Format (ALF) describing Integrated Circuit (IC) technology, cells, and blocks”, revision draft 2, 12 November 2001.
[4] Silicon Integration Initiative, “Specification for the Open Library Architecture (OLA)”, revision 1.7.04, 3 January 2002.
[5] J. Abraham, S. Churiwala, “Flexible Model for Delay and Power”, Silicon Integration Initiative, 1998.
[6] T. Tessier, C. Buhlman, “Timing Closure of a 870Kgate + 3 Mbit Ram, 0.2u-12mm Die in a 1312 Pin Package IC”, SNUG 2001.
[7] T. Ehrler, “Multiple Design Flows: Reducing Support Requirements with OLA”, Custom Integrated Circuits Conference 2001, ALF/OLA Panel Discussion, 6-9 May 2001.
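The arithmetic behind the Figure 7 library view reductions can be checked with a short script. The rule used here is an inference from the reported numbers, not something the paper states: each replaceable format is assumed to be swapped for one shared OLA view, and each deleteable format is simply dropped. The names `ROWS` and `with_ola` are illustrative.

```python
# (design process, standard, proprietary, replaceable, deleteable,
#  reported formats with OLA, reported reduction in percent)
ROWS = [
    ("RTL Development/Analysis",      3,  0, 2, 0,  2, 33),
    ("Design Synthesis",              4,  6, 4, 1,  6, 40),
    ("Logic/Timing Verification",     5, 11, 6, 5,  6, 63),
    ("Partitioning & Floor Planning", 3,  9, 5, 0,  8, 33),
    ("Layout & Chip Finishing",       4, 15, 6, 2, 12, 37),
]


def with_ola(standard, proprietary, replaceable, deleteable):
    """Return (formats before, formats after, reduction %) under the assumed rule."""
    total = standard + proprietary
    # Replaceable views collapse into a single OLA view; deleteable views vanish.
    after = total - replaceable - deleteable + 1
    # Round half up (Python's round() would turn 62.5 into 62).
    reduction = int(100 * (total - after) / total + 0.5)
    return total, after, reduction


for name, std, prop, repl, dele, rep_after, rep_pct in ROWS:
    total, after, pct = with_ola(std, prop, repl, dele)
    status = "matches" if (after, pct) == (rep_after, rep_pct) else "differs"
    print(f"{name:31s} {total:2d} -> {after:2d}  {pct:2d}%  ({status})")
```

Under this assumed rule all five rows reproduce the totals and percentages reported in Figure 7.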