SNUG 2009 paper



Air presentation at SNUG Europe 2009


Implementation Methodology for Dual-Mode GPS Receiver

David Tester¹; J Young, C Atkinson, T Ryan²

GPS functionality has been synonymous with in-car navigation but has recently emerged as a must-have feature in cellphones such as the Apple iPhone, Blackberry Pearl and Nokia N95. These are examples of embedded positioning, capable of enabling features and functionality in a wide range of additional portable electronics such as digital cameras, watches and media players. Air has developed a GPS receiver optimized for the requirements of embedded positioning. Capable of supporting today’s “killer” navigation application, the Air architecture is optimized for the more demanding requirements of embedded GPS and, critically, for the first time offers the capability of 24/7 continuous location awareness for mobile, battery powered, consumer devices. This paper outlines the design flow and implementation decisions used in the successful development of Air’s first generation airwave1 product, optimized for the first non-cellular embedded GPS application (geotagging) in the digital camera market.

Disruptive products are created through new, innovative architecture decisions that exploit system optimizations not available to competitors. Products are not “made” through careful implementation of circuit level functionality, but the same product opportunity can be “lost” through inappropriate implementation of that same circuit level functionality. This paper will not present any architectural details of Air’s first generation dual-mode low power GPS receiver. Instead, the focus is how the unique architecture was successfully mapped to silicon, with specific emphasis on back-end IC design. Air partnered with Synopsys for both digital EDA tools and physical IC design services. The resulting product enables GPS functionality for less than the standby current of a cellphone.

I. INTRODUCTION

Development of any low-power product demands optimization from system to transistor level. This paper outlines recent experiences at Air in the development of a 130nm structured custom 41.6M transistor GPS receiver IC, with specific focus on the back-end physical silicon design.

Air is a pre-revenue, venture capital funded fabless semiconductor company developing a family of embedded GPS receivers optimized for 24/7 operation in mobile devices. Start-ups must identify new, emerging markets but also deliver disruptive products before competitors. As a result, anything and everything that can be performed in parallel must be done in parallel to support this target. This demands that back-end physical IC design starts before front-end activities have completed. Many of the decisions required for back-end work need to be taken (at risk) before all, or even most, of the required information is either available or frozen. The resulting inherent conflict between schedule and execution steers key back-end decisions.

Challenges associated with back-end implementation of complex system-level products, in the context of a large, well resourced organization, are well documented. This paper outlines the start-up’s perspective on the same problem, but in the context of taking a product from concept to market in the minimum time with minimum investment and resources. The conflict between exhaustive, conclusive analysis and getting a product to market is not for everyone! Many of the critical implementation decisions can only be made based on previous experience and instinct.

Conventional wisdom dictates that complex system-level products demand both a “bleeding edge” process node (45nm, etc) and $25M+ of venture capital investment to bring initial silicon to market. The product described in this paper was implemented in 130nm CMOS technology and taken from concept to engineering sample silicon within 3 years with significantly less than $25M, through the combination of an experienced team and a robust methodology.

Successful realization of Air’s target power budget for airwave1 demanded a wide range of low power techniques spanning system and architecture level through RTL, gate and transistor levels.

II. PRODUCT OVERVIEW AND ARCHITECTURE

airwave1 is a single die consumer GPS solution containing optimized GPS signal processing and an integrated radio with embedded processor, memory and support peripherals, plus additional on-chip support analog functions. Implemented in a 130nm CMOS process, the IC requires a GPS antenna, SAW filter, crystal and passive support components.

airwave1 provides independent GPS searching and tracking capabilities. Multiple instances of two implementations of the proprietary satellite tracking DSP are provided, comprising 125K and 100K gates, along with the 1.1M gate searching DSP. The embedded 32b microprocessor requires 47K gates, with various support digital blocks using an additional 61K gates. The integrated GPS radio and general purpose support analog are implemented as two independent analog macros.

Standard cell logic and the device pad ring are implemented with libraries from TSMC and ARM Physical IP, whilst the power management functionality uses a mix of cells from both Air and ARM Physical IP. Memory macros were licensed from both Dolphin and ARM Physical IP.

¹ David Tester is co-founder and CTO of Air.
² Jon Young, Chris Atkinson and Tom Ryan are with Synopsys.

III. DEVELOPMENT FLOW AND EDA TOOLS

airwave1 was constructed as a “structured custom” device within a conventional digital IC development flow that includes additional verification stages to ensure the product power budget was not violated, as discussed later in the paper.

Digital functionality is implemented with a combination of VHDL and Verilog and synthesized with Design Compiler, with RTL to gate level netlist (and later pre-P&R and post-P&R) verification performed with Formality. Static timing analysis was performed with PrimeTime. Device layout was performed with ICC. Clock gating functionality was inserted with PowerCompiler. P&R blocks used macros developed by Air and characterized by Synopsys using Liberty NCX and NanoTime prior to use in a traditional DesignCompiler based logic synthesis flow.

RTL design and verification were performed by Air. Custom digital macro characterization was performed by Synopsys. Logic synthesis and formal verification were performed by Air. Digital and analog macro layout was performed by Air. Block layout and device layout were performed by Synopsys. Pre-P&R static timing was performed by Air, with post-P&R static timing performed by both Air and Synopsys using post-layout parasitics extracted by Star-RCXT.

The radio and support analog are developed in a traditional design flow by an internal team and delivered into the ICC flow as GDS with a CDL netlist. Prior to integration these macros were verified as DRC and LVS clean, both with tools from the analog design flow and with Hercules. Final LVS and DRC verification for the complete device was performed by Synopsys with Hercules.

IV. LEAKAGE AND POWER MANAGEMENT ISSUES

Dynamic power consumption of a conventional GPS receiver is impacted by the balance between functionality implemented as hardware and that implemented as software.

Static power consumption increases with logic complexity and memory requirements. Each transistor potentially contributes to the total static (or leakage) power consumption of a device. What are the options to reduce static power consumption?

Leakage from each transistor is defined by the bias conditions for that component. Within a custom IC flow this offers the opportunity to use local power-down transistors to force bias conditions on circuits that are not required. Additionally, the size of transistors can be optimized: minor variations in transistor sizes can often provide a significant reduction in leakage. Finally, the supply voltage to individual circuits can be removed when specific functionality is not required.

In the context of semi-custom IC design these techniques are not directly available. Changing transistor sizing in an existing standard cell library would violate the library license agreement and would demand characterization of the new library. The skills and design tools required to perform these activities are often not available. What options remain?

Custom digital cells were developed for airwave1, but not to directly address dynamic (or static) power consumption issues.

Static power is a function of the total number of logic gates powered by the digital supply rail. If further reduction in total gate count is not an option (or gate count is already minimized), then an additional option is to break functionality into blocks and remove power from individual blocks: power islands. Replacing a single digital supply voltage with multiple switched power domains can eliminate leakage current from major functional blocks when those parts of a system are idle. airwave1 comprises 44 independent digital supply domains.

Reducing total gate count in a design reduces total transistor gate area, but often at the cost of increased development time. For a start-up, additional optimization to refine, rather than create, functionality can conflict with schedule requirements.

Embedded memory within an IC presents exactly the same static power consumption issues. Rather than optimizing gate count, the challenge becomes optimization of memory size.

Leakage current is also temperature dependent. Obtaining an acceptable leakage current at 25°C is often not a challenge; reaching that acceptable leakage current at 85°C is complex.

V. UNDERSTANDING DYNAMIC POWER CONSUMPTION

Battery powered products, such as GPS receivers, must optimize the power consumed during normal operation.

1. What options to minimize dynamic power are available?
2. What is the minimum data processing rate required?
3. What clock frequency does the design operate with?
4. Are all clock edges required for processing?
5. Are “spare” clock cycles available in the system timing?
6. Do all subsystems operate at the same frequency?
7. Does all logic within a block operate at the same rate?
8. Can further clock edges be removed with clock gating?
9. Can clock gating be added at the RTL level? Capturing clock gating at this level ensures that maximum knowledge about the processing rates is captured, at a potential schedule cost.

Additional optimization can be realized with tools such as Synopsys PowerCompiler, which automatically identify flip-flops that could be gated, either because the clock edge is not required or because the data does not change. airwave1 development used a mixture of both techniques.

Investigation of power dissipated by typical flip-flop designs shows that 10% to 20% of the power consumed can relate directly to switching activity on flip-flop clock pins and internal buffers. The efficiency of clock gating depends on where the gating cells are placed within the buffer chain used to build the clock tree.

Does a design really need both the Q and QN output pins for each latch or flip-flop? Provision of both pins increases both the area and the power consumption of each cell.

Deep cones of logic between flip-flops risk spurious (non-minimum) switching activity when the driving registers change state. Such logic can significantly increase total power consumption in adders and multipliers, for example.
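
The levers discussed in Sections IV and V (power islands, clock gating and supply voltage) can be compared using the standard first-order CMOS estimates: dynamic power αCV²f and static power V·I_leak. The following Python sketch is a minimal illustration under stated assumptions, not Air’s power flow. The `Block` fields, the helper functions and every numeric value are hypothetical placeholders. A switched-off island is assumed to leak nothing, and clock gating is assumed to remove all clock-network switching for the suppressed cycles.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block power parameters (placeholders, not airwave1 data)."""
    name: str
    c_logic: float         # switched logic capacitance (F)
    activity: float        # activity factor: average 0->1 transitions per node per cycle
    c_clock: float         # flip-flop clock-pin plus clock-buffer capacitance (F)
    freq: float            # block clock frequency (Hz)
    i_leak: float          # leakage current while the power island is on (A)
    on_fraction: float     # fraction of time the power island is switched on
    gated_fraction: float  # fraction of clock cycles suppressed by clock gating


def dynamic_power(b: Block, vdd: float) -> float:
    """Switching power in watts: alpha*C*V^2*f for logic plus C*V^2*f for the clock net.

    Assumes the block is clocked whenever its island is on, and that gating removes
    all clock-net switching for suppressed cycles (in practice the saving depends on
    where the gating cell sits in the clock buffer chain).
    """
    p_logic = b.activity * b.c_logic * vdd ** 2 * b.freq
    p_clock = (1.0 - b.gated_fraction) * b.c_clock * vdd ** 2 * b.freq
    return b.on_fraction * (p_logic + p_clock)


def static_power(b: Block, vdd: float) -> float:
    """Leakage power in watts; a switched-off island is assumed to leak nothing."""
    return b.on_fraction * b.i_leak * vdd


def total_power(blocks: list[Block], vdd: float) -> float:
    """Average total power in watts for blocks sharing one supply voltage."""
    return sum(dynamic_power(b, vdd) + static_power(b, vdd) for b in blocks)


if __name__ == "__main__":
    # Placeholder values chosen only to show trends; clock rates echo Section VII.
    search = Block("searching DSP", c_logic=2e-9, activity=0.1, c_clock=0.5e-9,
                   freq=96e6, i_leak=200e-6, on_fraction=0.2, gated_fraction=0.5)
    track = Block("tracking DSP", c_logic=0.3e-9, activity=0.1, c_clock=0.1e-9,
                  freq=16e6, i_leak=30e-6, on_fraction=1.0, gated_fraction=0.7)
    for vdd in (1.2, 1.0):
        print(f"VDD = {vdd} V: {total_power([search, track], vdd) * 1e3:.3f} mW")
```

Setting `on_fraction` to zero removes a block’s contribution entirely (the power-island case), while lowering VDD from 1.2 V to 1.0 V scales the dynamic terms by (1.0/1.2)² ≈ 0.69. The model ignores regulator losses, so the real benefit of voltage reduction still depends on the voltage regulation architecture. It also uses a single leakage value, even though Section IV notes that leakage is much harder to control at 85°C than at 25°C.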

Reduction of the digital power supply voltage, either globally or block by block, can reduce the dynamic power consumed by core logic, although the overall efficiency improvement depends on the voltage regulation architecture.

VI. PREDICTING DYNAMIC POWER CONSUMPTION

Predicting digital power consumption in the traditional synthesis based semi-custom flow is a challenge. Power is not modeled in the typical RTL based logic simulation flow. Switching activity, gate drive strength information and P&R parasitic capacitance are not modeled until the final stages of the typical design flow. The influence of these factors on decisions made during the architecture stage of implementation remains unknown until late (often too late) in the development process. The “power aware” design flow used is shown in Figure 1.

Figure 1 – Power Aware Design Flow

Logic complexity and switching activity are minimized through careful system design and modeling (prior to RTL coding), and the efficiency of the resulting implementation is peer reviewed throughout the development process.

Rather than performing circuit level simulations after the P&R process is completed to discover the power consumption, the post-layout parasitic capacitance is bounded at the start of the design process with a P&R constraint, enabling circuit level simulation of key blocks long before P&R has taken place.

Circuit level simulation of key digital blocks was performed pre-layout with estimated (and bounded) routing parasitics, and post-layout with extracted parasitics, in a Cadence analog flow with the Spectre and UltraSim circuit simulators. Evaluation of the resulting silicon shows that actual power consumption for the most power critical digital blocks on airwave1 is within 10% of simulations.

VII. LOGIC SYNTHESIS, TIMING CONSTRAINTS AND LAYOUT

airwave1 includes multiple independent signal processing blocks for satellite detection and tracking operating at various clock rates of 96MHz, 64MHz, 32MHz and 16MHz. The maximum clock rate blocks contain logic with hundreds of paths containing over 92 levels of logic between flip-flops.

Analysis of post-layout performance suggests that high drive strength logic gates do not offer an optimum power tradeoff when routing is limited and parasitic capacitance is minimized. Circuit level simulation of key logic paths within airwave1 confirms this gate drive strength vs routing parasitic tradeoff.

Synthesis strategies for the various DSP blocks in airwave1 are very different. Within the high data processing rate satellite searching DSP, timing closure is complex, demanding gate delays below 200ps once post-layout parasitic capacitance is included. Only X1, X2 and X4 drive strength cells were permitted, both in the initial logic synthesis and in post-placement optimization. In contrast, the low data processing rate satellite tracking DSP requires a synthesis strategy that targets minimum drive strength and minimum logic area rather than timing.

Circuit level simulations of the gate level netlist after logic synthesis but prior to layout were essential to confirm that power consumption for each functional block remained within the overall power budget for the complete system.

VIII. CLOCK TREE ESTIMATION AND IMPLEMENTATION

Clock trees within a design directly impact power consumption. Typical clock trees constructed with automatic CTS tools provide functional, but over-designed, results. Target requirements for clock skew and transition times affect the power consumption of the clock tree built by CTS. Whilst P&R tools will typically remove all logic within a pre-layout clock tree, realistic targets for clock skew and transition times are essential in initial synthesis to create logic that needs minimal optimization after the clock trees are constructed.

Whilst the device was constructed with a hierarchical block-by-block approach, the design contains more than 44 major clocks and more than 400 clocks in total. Each domain contains multiple levels of clock gating, both manually inserted in RTL and inserted automatically with PowerCompiler.

Multiple P&R iterations are mandatory to tune the synthesis model of clock uncertainty and transition times if clock trees built from strings of X20 buffers are to be avoided.

Estimation of clock tree power consumption remains a manual activity. Circuit level simulation of clock tree performance was essential to confirm that block power budgets were met post-layout.

IX. ROUTING CONGESTION AND POST-LAYOUT CAPACITANCE

airwave1 is implemented in a 130nm CMOS six-layer metal (6LM) process with ultra-thick top metal (UTM). The impact of post-layout parasitic capacitance on switching performance and power consumption can be significant, and reliable prediction of digital power consumption demands control of post-layout routing capacitance.

More metal layers generally give better utilization after P&R, but what are the implications of “ultra thick” top layer metal? Minimum metal pitch and spacing rules for UTM make the upper metal ineffective for detailed signal routing and suitable only for power supply distribution. It is
true that IR drop in the DVDD and DVSS lines is significantly improved, but standard cell utilization degrades as a result.

X. POWER DOMAINS & POWER MANAGEMENT KIT

As previously described in Section IV, airwave1 contains 44 independent digital power domains for fine control of leakage current during operation of the GPS receiver. This is illustrated in Figure 2 below.

Figure 2 – Voltage Domains in airwave1

All communication between blocks uses conventional voltage clamp cells from a vendor power management kit. These cells were manually inserted into the design at RTL level, with corresponding synthesis don’t touch constraints in the flow.

Power domain control cells were automatically inserted by ICC using its built-in capabilities, driven by manually generated TCL commands, with gate level netlist verification of the resulting design. As part of this process the number and size of the header cells needed by each voltage region had to be calculated, and the impact on the chip die and floorplan understood. The library cell used to create the switched voltage domains was developed internally at Air and imported into ICC.

There was no requirement for state retention flip-flops due to system level optimizations. On-chip RAM, however, requires continuous power, with a corresponding impact on memory macro leakage current.³

XI. CUSTOM LOGIC CELLS OPTIMIZED FOR POWER

The high performance satellite searching DSP contains 105K flip-flops as part of the local datapath, providing a total of 264 discrete memory blocks. Whether P&R tools could constrain the placement of 105,000 elements and build a structured array presented a risk for the physical design phase of airwave1. Development of a custom macrocell not only eliminates the cell placement risk but also offers an opportunity to optimize the performance of these local memory functions compared to a traditional array built from multiple synthesized flip-flops.

Knowledge of relative DFF placement, driver and load allows each flip-flop to be optimized with power, for the specific use case, as the target constraint. In these rare circumstances, gates with sub-optimal propagation delays and transition times can offer optimized power consumption. The resulting macrocell offered a 40% power improvement, with an additional 25% area reduction, compared to synthesis. Example switching performance is shown in Figure 3.

Synopsys characterized the macro, using Liberty NCX for the cell characterization and then NanoTime to generate the performance data for the cell array, which was subsequently included in the standard logic synthesis, static timing and P&R flow.

Figure 3 – Switching Performance of Power Optimized Flip-Flop

XII. IMPACT OF POST-PLACEMENT LOGIC OPTIMIZATION

During the digital layout process there are various points where ICC can re-optimize logic to fix timing and design rule violations. Each optimization step offers the opportunity to transition a block from meeting to violating its power budget! Typically, logic synthesis for low power exploits carefully coded RTL with specific synthesis constraints to ensure exactly the desired logic is obtained. Constraints applied to ICC for optimization steps must match those used for initial synthesis.

XIII. PHYSICAL DESIGN ISSUES (IN A START-UP)

For any pre-revenue company, time-to-market is critical. In the race to bring a new product to market there is an inherent conflict between the activities essential to create the optimum device floorplan and a time optimized development schedule. For a start-up developing a complex system-level product against an aggressive time-to-market goal, this is where the fun begins and where conflict between front-end and back-end design appears.

No top level netlist? No problem! Critical decisions on device floorplan, pad-ring, global signal routing and package need to be made before the major functional block design is complete,
before block level layout is complete and long before the final (or even the preliminary) top level netlist is available.

Figure 4 – airwave1 Floorplan

Can the analog macros be delivered days before tape-out?...

Hierarchical layout of a complex IC trades die area for risk. Whilst flat layout of a complex IC allows EDA tools to make unexpected placement and routing optimizations, predictable execution of the back-end phase of development is only possible with a hierarchical floorplan and implementation.

Without a fully automated method for implementing the power-down regions, the time to re-spin a floorplan (including header cells, power routing and voltage aware well-tie and filler cell placement) can be longer than expected; this project therefore made extensive use of TCL scripting to minimize the impact of changes and automate the process.

Throughout the digital design flow, all block interfaces were maintained with minimal connectivity issues for optimized chip-level routing and timing, in the knowledge that a block-by-block hierarchical approach to device layout would be essential and would enable physical IC design to start (at the block level) before the functional RTL development phase had completed.

In a six layer metal process where the top two metal layers are better suited to power supply routing than to signal routing, global routing for a complex system-level product is a major issue for a hierarchical layout flow and demands careful floorplanning in advance.

The “signal” processed by a GPS radio is, quite literally, noise. Digital circuits are very good at generating not just wideband noise but also noise at very specific design-related frequencies. Careful floorplanning is required to ensure that all blocks capable of generating noise that would degrade radio performance are suitably located on the die. Circuits capable of generating noise and circuits sensitive to noise must be shielded with guard rings. Coupling capacitance between global routing signals must be minimized.

XIV. DEVICE PACKAGING OPTIONS AND PAD RING DESIGN

Whilst the internal evaluation package requires a 304 ball BGA for the 244 pad airwave1 engineering sample silicon, the device is offered to customers in a 68 lead QFN package. Careful design of the pad ring was essential to ensure that the 176 evaluation-only I/Os (244 pads less the 68 QFN leads) could be appropriately accommodated in the customer QFN bond option. An example module containing the engineering sample silicon with full GPS reference design is shown in Figure 5. Figure 6 is the first photo geotagged with airwave1 silicon.

XV. DESIGN TOOL FLOW

Digital design followed a conventional logic synthesis based flow using the Synopsys tools DesignCompiler, PrimeTime, PowerCompiler, Formality, ICC and Hercules. Analog design followed a conventional Cadence flow using Composer, Artist, Spectre, SpectreRF, Virtuoso and Assura. Licensed IP from all vendors and macros created by Air were all subject to QA verification for LVS and DRC with Hercules. Final full-chip LVS and DRC were performed in Hercules prior to release by Air of the final GDSII to the foundry for manufacture.

Figure 5 – GPS module with engineering sample airwave1 silicon

Figure 6 – The first photograph geotagged with airwave1 silicon

XVI. THE FUTURE

The silicon described in this paper is the engineering sample release of the airwave1 product. Having demonstrated silicon that provided right-first-time functionality, the development team is now working on the production version of airwave1.
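
The bond-out constraint described in Section XIV is simple bookkeeping that a short script can sanity-check before the pad ring is frozen. The following Python sketch is a hypothetical illustration, not Air’s flow. It assumes each QFN lead and each BGA ball connects to exactly one pad, ignoring multiply-bonded supply pads and any exposed die paddle. The `Pad` type, the `bond_out_summary` function and the pad names are placeholders; only the counts (244 pads, 304 balls, 68 leads) come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pad:
    name: str              # hypothetical pad name
    customer_bonded: bool  # True if the pad is bonded in the customer QFN option


def bond_out_summary(pads: list[Pad], qfn_leads: int, bga_balls: int) -> dict[str, int]:
    """Check both bond options, assuming one pad per lead/ball (no multi-bonded supplies)."""
    customer = sum(1 for p in pads if p.customer_bonded)
    eval_only = len(pads) - customer
    if len(pads) > bga_balls:
        raise ValueError(f"{len(pads)} pads exceed {bga_balls} BGA balls")
    if customer > qfn_leads:
        raise ValueError(f"{customer} customer pads exceed {qfn_leads} QFN leads")
    return {
        "pads": len(pads),
        "customer": customer,
        "eval_only": eval_only,
        "spare_bga_balls": bga_balls - len(pads),
        "spare_qfn_leads": qfn_leads - customer,
    }


if __name__ == "__main__":
    # Counts from Section XIV; which pads are customer-bonded is a placeholder here.
    pads = [Pad(f"PAD{i}", customer_bonded=(i < 68)) for i in range(244)]
    print(bond_out_summary(pads, qfn_leads=68, bga_balls=304))
```

With the Section XIV figures this reports 176 evaluation-only pads, 60 spare BGA balls and no spare QFN leads. A real check would also need the pad ordering around the ring, since bond wires to a QFN lead frame must not cross.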

XVII. CONCLUSION

Air successfully completed development of the 130nm CMOS single-die GPS receiver on time and on budget in conjunction with the Synopsys Professional Services physical design group. The resulting right-first-time complex mixed-signal silicon has been sampled to lead digital camera customers in Japan.⁴

ACKNOWLEDGMENT

Development of any complex semiconductor product is a group activity involving (and often demanding) system, silicon and software optimizations and tradeoffs. The receiver described in this paper forms part of a system-level GPS semiconductor product developed by the R&D team at Air Semiconductor. The authors wrote this paper, but the product results from the combined contributions of all team members.

David Tester is the CTO and leads the architecture and product development activities for embedded GPS products at Air, having raised series-A venture capital funding and co-founded the company in 2006. Prior to co-founding Air, he spent 15 years in various semiconductor development and management positions in both the UK and the US with Dialog Semiconductor, LSI Logic, Conexant, Symbionics and GEC Research. He was listed in GPS World’s “50 Leaders to Watch” during 2008. Air was awarded the Red Herring Europe 100 along with both the Electra and IET start-up of the year awards in 2008. His high volume, standard product, consumer IC background spans both analog and digital silicon development, ranging from system level to transistor level design. He has participated in the development of over 20 high volume consumer semiconductor products for the navigation, wireless voice, wireless data, digital TV and PC graphics markets. Mr Tester is a senior member of IEEE, ION and IET; he is registered as a Chartered Engineer with both ECUK and FEANI. He holds nine US patents.

Tom Ryan (left), Jon Young (centre) and Chris Atkinson (right) work for Synopsys Global Technical Support in Reading, UK. Global Technical Support enables customer adoption and deployment of Synopsys’ technology and flows to improve design productivity and tape-out predictability.