May 1, 2013
A breakthrough in logic design drastically improving performance from 65/55 nm and below
Ilan Sever
Library Group CTO and Israeli Subsidiary Manager
DOLPHIN INTEGRATION
Corporate ID
• Incorporated as a French SA in 1985
• Listed on Alternext of NYSE in 2007
• As the provider of design products for mixed-signal SoCs
• Now active from 180 nm to 28 nm
  – with 135 design engineers
  – plus field application engineers and SoC integration engineers expert at hardware modeling, to provide SoCs with the best subsystems:
    • High-Resolution Audio (converters and audio signal processing)
    • High-Resolution Measurement (converters for power metering, MEMS, etc.)
    • Low-power Storage (register banks and memories)
    • Low-power Microcontrol Logic (80x51 legacy, eFlash caches, coprocessors...)
  – and innovative libraries of standard cells and memory registers
  – with power regulation, reference, clock & detector networking
  – where the major differentiator is the flexibility of IP configurations (FLIP)
Dolphin in Israel
• Incorporated in October 2009 as Dolphin Integration Ltd
• With the charter to develop innovative small-capacity memory architectures
• 7 employees, all engineers
• Developed three product families in technologies ranging from 0.13 µm down to 55 nm:
  – Innovative high-density 1PRFile "AURA": up to 25% smaller than competitors' solutions with half the dynamic power; licensed by TSMC and by leading IDMs and fabless companies
  – Innovative high-density DPRFile "ERIS": up to 35% smaller than competitors' solutions while providing two full Read+Write ports (as opposed to 1R1W two-port registers)
  – Patent-pending "CARME" multi-port register allowing seamless replacement of flip-flops and extremely high-speed asynchronous access for acceleration of digital blocks
Market trends: boom in average SoC clock speed
• Consumer electronics and mobile devices drive the need for higher SoC performance
  – High performance required for embedded processors
  – High density and low power required for the rest of the SoC
• Targeted applications
  – Smartphone
  – Multimedia
  – Gaming
  – Computing
  – …
Source: Kurzweil
Design techniques for improving performance of critical paths on logic blocks
• Logic designers can leverage 4 solutions to improve the performance of logic blocks while maintaining the best density/power trade-off:

| Design technique | Drawbacks | Impacts |
| Multi-process support: use LP process for power-critical circuits, G process for speed-critical circuits | The high leakage of G process | Leakage loss |
| Multi-Vt support in standard cells: use LVt cells with improved performance in critical paths | An additional LVt layer is needed; the high leakage of LVt cells | Leakage loss |
| Multi-track support in standard cells: use 7/8-track libraries for density-optimized blocks and 10/12/14-track libraries for speed-critical blocks | Most libraries are not path-mixable, so optimization is limited to the whole logic block level: cells are oversized for all non-critical paths of the block | Area loss |
| CARME bit-cell based register packs: use CARME for speed-critical registers | - | - |
Barriers and challenges in optimizing the register files within a logic design
• Flexibility of configurations: #words, #bits (unlike custom solutions)
• Reset operation (does not exist in SRAM-based macros)
• Scan/DFT (SRAM-based macros do not support scan and require BIST)
• Write & read access protocol and speed
• Multiple ports
• Usage within a standard logic flow
• Automatic P&R inside the area of standard logic rows
• Dynamic consumption and IR drop during read/write activity
• Support for power-down & retention modes
• Area: always a key factor
Cell Compatible register packs CARME key features

| Property/Challenge | Synthesizable FF-based register | Synthesizable latch-based register | SRAM-based register | CARME register pack |
| Reset | Yes | Yes | No | Yes |
| Scan / DFT | Yes | No (need BIST) | No (need BIST) | Yes |
| Write access | Synchronous | Synchronous | Synchronous | Synchronous, optional asynchronous write-through |
| Read access | Asynchronous | Asynchronous | Synchronous | Asynchronous |
| Multi port | Yes | Yes | No | Yes |
| Placement and routability | Standard P&R | Standard P&R | Hard macro placement outside logic rows | Hard macro compatible with placement in logic rows |
Cell Compatible register packs CARME key features
• A brand-new kind of bit-cell based generator that can be used as an alternative to standard-cell based implementation for storage elements such as registers
  – CARME instances act exactly like synthesized registers, ensuring seamless replacement
• CARME is the ideal solution for those who want to improve speed while further improving logic density and dynamic power
• Traditional registers, once placed, are unstructured and widespread unless hierarchically placed & routed. In contrast, CARME registers are structured as "packs" to facilitate RTL engineering while still enjoying the flexibility of a generator
Cell Compatible register packs CARME performance @ 65 nm LP
• Benchmark results after synthesis (with scan insertion) on Motu Uta V5
  – Process: TSMC 65 nm LP
  – Standard cell library performances are for SVt
  – PVT used for timings: SS; 1.08 V; 125°C
  – Accuracy of results for CARME: speed ±10%, area ±5%
• 18% gain in density, 22% gain in speed
CARME vs. alternatives
• Benchmark: implementation of a 16x16 look-up table (TSMC 65 nm LP)

| Property/Challenge | Synthesizable FF-based | Synthesizable latch-based | SRAM-based | CARME register pack |
| Area (65LP) | 3973 µm² | 1965 µm² | 1530 µm² | 2704 µm² |
| Speed (access time, typical) | 0.39 ns | 0.6-0.8 ns | 0.8-1.0 ns | 0.22 ns |
| Power @ 1 GHz | 2.03 mW | 0.80 mW | 1.43 mW | 0.89 mW |
| Reset | Yes | Yes | No | Yes |
| Write access | Synchronous | Synchronous | Synchronous | Synchronous |
| Read access | Asynchronous | Synchronous | Synchronous | Asynchronous |
| Multi port | Yes | Yes | No | Yes |
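The relative gains implied by the 16x16 LUT table can be checked with a few lines of arithmetic. This is a minimal sketch; the dictionary layout and function names are illustrative, and for access times given as a range it assumes the best (lowest) value, which is the most favourable reading for the alternatives.

```python
# Values transcribed from the 16x16 LUT benchmark table (TSMC 65 nm LP).
# Access times given as ranges are stored as (best, worst) in ns.
BENCHMARK = {
    "FF-based": {"area_um2": 3973, "access_ns": (0.39, 0.39), "power_mw": 2.03},
    "Latch-based": {"area_um2": 1965, "access_ns": (0.6, 0.8), "power_mw": 0.80},
    "SRAM-based": {"area_um2": 1530, "access_ns": (0.8, 1.0), "power_mw": 1.43},
    "CARME": {"area_um2": 2704, "access_ns": (0.22, 0.22), "power_mw": 0.89},
}


def percent_change(old: float, new: float) -> float:
    """Signed change from old to new, in percent, rounded to 0.1."""
    return round(100.0 * (new - old) / old, 1)


def compare(baseline: str, candidate: str = "CARME") -> dict:
    b, c = BENCHMARK[baseline], BENCHMARK[candidate]
    return {
        "area_pct": percent_change(b["area_um2"], c["area_um2"]),
        "power_pct": percent_change(b["power_mw"], c["power_mw"]),
        # Ratio of best-case access times: > 1 means the candidate is faster.
        "speedup": round(b["access_ns"][0] / c["access_ns"][0], 2),
    }


for name in ("FF-based", "Latch-based", "SRAM-based"):
    print(name, compare(name))
```

Against the FF-based register this gives about -31.9% area, -56.2% power and a 1.77x faster access; against the SRAM-based macro, CARME is larger (+76.7% area) but faster.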
Fast Asynchronous Read
• READ is asynchronous
• Can support up to 4 independent read ports
[Figure: timing diagram of read_addr and data_out, annotated with the read_addr=>data_out delay]
CARME compiler highlights
• Architecture
  – Based on a patentable bit-cell
  – Optimized for easy & risk-free integration within standard-cell rows
• Flexibility
  – 2 to 128 words
  – 4 to 144 bits wide
  – Up to 4 independent read ports
• Features & benefits
  – Very fast asynchronous read operation
  – Synchronous write with optional fast write-through
  – 1 write port, multiple read ports
  – Reset function
  – Retention mode
  – Byte/bit-write control
[Figure: CARME register pack 16x16, TSMC 65LP, access time 220 ps]
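The flexibility ranges above can be expressed as a configuration check, for example inside a script that decides whether a register is a candidate for a register pack. This is a hypothetical sketch: `RegisterPackConfig` and its field names are illustrative and are not the CARME compiler's actual interface; only the numeric limits come from the slide.

```python
from dataclasses import dataclass

# Ranges quoted on the "CARME compiler highlights" slide.
MIN_WORDS, MAX_WORDS = 2, 128
MIN_BITS, MAX_BITS = 4, 144
MAX_READ_PORTS = 4


@dataclass(frozen=True)
class RegisterPackConfig:
    """Hypothetical configuration record; field names are illustrative only."""
    words: int
    bits: int
    read_ports: int = 1
    write_through: bool = False
    bit_write: bool = False

    @property
    def capacity_bits(self) -> int:
        return self.words * self.bits

    def validate(self) -> list:
        """Return a list of human-readable errors (empty if the config is in range)."""
        errors = []
        if not MIN_WORDS <= self.words <= MAX_WORDS:
            errors.append(f"words={self.words} outside {MIN_WORDS}..{MAX_WORDS}")
        if not MIN_BITS <= self.bits <= MAX_BITS:
            errors.append(f"bits={self.bits} outside {MIN_BITS}..{MAX_BITS}")
        if not 1 <= self.read_ports <= MAX_READ_PORTS:
            errors.append(f"read_ports={self.read_ports} outside 1..{MAX_READ_PORTS}")
        return errors


print(RegisterPackConfig(words=16, bits=16).validate())        # in range
print(RegisterPackConfig(words=256, bits=16, read_ports=5).validate())
```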
CARME compiler highlights
• Proprietary bitcell features:
  – Scannable
  – Resettable
  – High-speed write
  – Support for multiple high-speed read ports
  – Area efficient: ½ of a normal D-FF
  – Low power: ½ of a normal D-FF
  – Retention-ready: replaces retention FFs
  – Non-pushed rules: easily retargetable
Basic Architecture
[Figure labels: up to 16 bitcells in a pack; all outputs are routed to the distribution plane; output mux; address bus; data bus]
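The slide gives only the building blocks (packs of up to 16 bitcells, an output mux, address and data buses). As a rough illustration, the sketch below assumes that each pack holds one bit position for up to 16 consecutive words, so the high address bits select a pack and the low 4 bits select a bitcell inside it. That grouping is an assumption for illustration, not a description of the actual CARME layout.

```python
CELLS_PER_PACK = 16  # from the slide: "up to 16 bitcells in a pack"


def packs_per_column(words: int) -> int:
    """Packs needed to hold one bit position for `words` words (ceiling division)."""
    return -(-words // CELLS_PER_PACK)


def split_address(addr: int) -> tuple:
    """Assumed decode: high address bits pick the pack, low 4 bits pick the bitcell."""
    return divmod(addr, CELLS_PER_PACK)


def read_column(column: list, addr: int) -> int:
    """Two-level select for one bit position: pick the pack, then the bitcell in it."""
    packs = [column[i:i + CELLS_PER_PACK] for i in range(0, len(column), CELLS_PER_PACK)]
    pack, cell = split_address(addr)
    return packs[pack][cell]


print(packs_per_column(128), split_address(37))
```

Under this assumption, a 128-word configuration spreads each bit position across 8 packs.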
Multiple read ports
• Add read ports in a modular way without complexity or performance degradation
[Figure labels: Addr Bus A → OutMux Port A → DataOut Port A; Addr Bus B → OutMux Port B → DataOut Port B]
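The access behavior described on these slides (synchronous write, asynchronous read on several independent ports, reset, optional write-through, bit-write control) can be captured in a small cycle-level behavioral model, for instance as a reference when checking a replaced register. This is a hypothetical sketch, not Dolphin's LogiWare model; in particular, modeling write-through as "data on the write port is visible to a read of the same address before the clock edge" is an assumption.

```python
class RegisterPackModel:
    """Cycle-level behavioral sketch: synchronous write on clock(),
    asynchronous (combinational) reads on any of `read_ports` ports."""

    def __init__(self, words, bits, read_ports=1, write_through=False):
        self.words, self.bits = words, bits
        self.read_ports = read_ports
        self.write_through = write_through
        self.full_mask = (1 << bits) - 1
        self.mem = [0] * words
        self.pending = None  # (addr, data, mask) currently driven on the write port

    def reset(self):
        """Reset function: clear all words and drop any pending write."""
        self.mem = [0] * self.words
        self.pending = None

    def set_write(self, addr, data, mask=None):
        """Drive the single write port; the write commits at the next clock edge.
        `mask` models bit-write control (1 = write this bit)."""
        if not 0 <= addr < self.words:
            raise IndexError(f"address {addr} out of range")
        mask = self.full_mask if mask is None else mask & self.full_mask
        self.pending = (addr, data & self.full_mask, mask)

    def clock(self):
        """Rising edge: commit the pending write, keeping unmasked bits."""
        if self.pending is not None:
            addr, data, mask = self.pending
            self.mem[addr] = (self.mem[addr] & ~mask) | (data & mask)
            self.pending = None

    def read(self, port, addr):
        """Asynchronous read: depends only on the address and stored contents
        (plus pending write data when write-through is enabled)."""
        if not 0 <= port < self.read_ports:
            raise IndexError(f"read port {port} does not exist")
        value = self.mem[addr]
        if self.write_through and self.pending is not None and self.pending[0] == addr:
            _, data, mask = self.pending
            value = (value & ~mask) | (data & mask)
        return value
```

Each read port is independent, so two ports can return different words in the same cycle; keeping the write commit in `clock()` separates the synchronous and asynchronous paths.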
CARME compiler highlights
• Flexible number of read ports
[Figure panels: 1 port, 2 ports, 4 ports]
CARME compiler highlights
• Fits inside logic rows: zero overhead for spacers, power rings, wrappers
• Custom layout fits the number of horizontal & vertical tracks
• IR-drop-aware placement: shared among rows
• Just like a big standard cell!
CARME compiler highlights
• Routing-aware structure
• Feed-through over the cell
CARME performance @ 65 nm LP
• Register performance

| Block name | Register size | Configuration | Write speed (ps) | Read speed (ps) | Dynamic power (µA/MHz) |
| MCU OR1200 | 32x32 | 2R1W | 497 | 611 | 1.3 |
| ALU CHRONOS | 16x32 | 3R1W | 406 | 490 | 0.52 |
| USB | 32x8 | 1R1W | 358 | 442 | 0.28 |
| UART | 16x8 | 1R1W | 332 | 420 | 0.19 |
| SPI | 4x8 | 1R1W | 315 | 240 | 0.15 |

PVT used for timings: SS; 1.08 V; 125°C
PVT used for dynamic power consumption: TT; 1.2 V; 25°C
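The dynamic-power column is given per MHz, so converting it to an actual power figure takes one multiplication by clock frequency and one by supply voltage. This minimal sketch assumes the average current scales linearly with frequency and uses the 1.2 V TT characterization voltage; results at other corners or activity rates would differ.

```python
# Dynamic-power coefficients (uA/MHz) transcribed from the 65 nm LP register table,
# characterized at TT, 1.2 V, 25 degC.
DYN_UA_PER_MHZ = {"MCU OR1200": 1.3, "ALU CHRONOS": 0.52, "USB": 0.28, "UART": 0.19, "SPI": 0.15}
VDD_TT = 1.2  # volts


def dynamic_current_ua(block: str, freq_mhz: float) -> float:
    # Assumes average current scales linearly with clock frequency.
    return DYN_UA_PER_MHZ[block] * freq_mhz


def dynamic_power_uw(block: str, freq_mhz: float, vdd: float = VDD_TT) -> float:
    # P = I * V; microamps times volts gives microwatts.
    return dynamic_current_ua(block, freq_mhz) * vdd


for block in DYN_UA_PER_MHZ:
    print(f"{block}: {dynamic_power_uw(block, 100):.1f} uW at 100 MHz")
```

For example, the OR1200 register file clocked at 100 MHz draws about 130 µA, i.e. roughly 156 µW at 1.2 V.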
CARME performance @ 65 nm LP
Post-P&R results on Motu-Uta using a high-density 7-track Spinner (pulsed-latch) library:
• Without CARME: 114000 µm² at 195 MHz
• Using CARME: 97600 µm² at 196 MHz (-15% area, same speed)
• Using CARME: 103000 µm² at 233 MHz (-10% area, +20% speed)
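The percentages on this slide follow directly from the post-P&R figures. A short check, using only the numbers quoted above:

```python
BASELINE = (114_000, 195)  # (area in um^2, frequency in MHz) without CARME
RUNS = {
    "area-optimized": (97_600, 196),
    "speed-optimized": (103_000, 233),
}


def deltas(area: float, freq: float, base=BASELINE) -> tuple:
    """Return (area change %, frequency change %) relative to the baseline run."""
    b_area, b_freq = base
    return (
        round(100 * (area - b_area) / b_area, 1),
        round(100 * (freq - b_freq) / b_freq, 1),
    )


for name, (area, freq) in RUNS.items():
    print(name, deltas(area, freq))
```

The exact values are -14.4% area at +0.5% frequency and -9.6% area at +19.5% frequency, which the slide rounds to about -15%, -10% and +20%.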
Cell Compatible register packs CARME integration flow
• uHD-BTF Standard Cells (patent pending): reduced cell-set library based on pulsed latch for ultra-high density
• CARME Compiler (patent pending): bit-cell based generator of register packs
• LogiWare models: library of Verilog models of registers
• Scripts: automatic detection of registers in an RTL design and their swift replacement by a model enabling both synthesis and instantiation of registers
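To give an idea of what a register-detection script does, the sketch below scans Verilog source for register-array declarations such as `reg [15:0] lut [0:15];` and reports their word and bit counts. It is a hypothetical illustration, not Dolphin's detection script: it uses a regular expression and handles only constant numeric bounds, so parameterized widths and SystemVerilog forms are ignored.

```python
import re

# Matches unpacked-array declarations such as "reg [15:0] lut [0:15];".
# Only constant numeric bounds are handled; parameters are not resolved.
ARRAY_DECL = re.compile(
    r"\breg\s*\[\s*(\d+)\s*:\s*(\d+)\s*\]\s*(\w+)\s*\[\s*(\d+)\s*:\s*(\d+)\s*\]\s*;"
)


def detect_register_arrays(verilog_src: str) -> list:
    """Return one dict per detected register array: name, words, bits."""
    found = []
    for m in ARRAY_DECL.finditer(verilog_src):
        msb, lsb, name, first, last = m.groups()
        bits = abs(int(msb) - int(lsb)) + 1
        words = abs(int(last) - int(first)) + 1
        found.append({"name": name, "words": words, "bits": bits})
    return found


src = """
module demo;
  reg [15:0] lut [0:15];
  reg [7:0]  fifo [31:0];
  reg [3:0]  state;        // plain register, not an array
endmodule
"""
print(detect_register_arrays(src))
```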
Cell Compatible register packs CARME integration flow
[Figure: integration flow diagram with steps numbered 1 to 8. Elements: user's original RTL; DETECTION script; memory instances list; SELECTION script; updated memory instances list; LOGIWARE library; memory compilers; hard-macro-instantiated RTL; synthesis; netlist with hard macros. Legend: standard implementation flow; CARME implementation flow (automated steps); Dolphin's silicon IP offering.]
The selection script allows replacement per criteria defined by the user:
• Above a certain number of bits (e.g., > 500 bits)
• Above a defined speed/area/leakage gain
• Always replace inside a specified block
• Do not touch a specified block
• Etc.
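The user-defined selection criteria listed above map naturally onto a small filter. This is a hypothetical sketch of such a selection step: the class and field names are illustrative, the per-register gain estimate is assumed to be supplied by some earlier analysis, and giving "do not touch" priority over "always replace" is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class RegisterInstance:
    name: str
    block: str            # block containing the register
    words: int
    bits: int
    est_gain_pct: float   # hypothetical estimated speed/area/leakage gain


@dataclass
class SelectionCriteria:
    min_bits: int = 500                          # "above a certain # of bits"
    min_gain_pct: float = 0.0                    # "above a defined gain"
    always_replace_blocks: frozenset = frozenset()
    do_not_touch_blocks: frozenset = frozenset()


def select_for_replacement(instances, criteria):
    """Return names of registers to swap for register packs."""
    selected = []
    for inst in instances:
        if inst.block in criteria.do_not_touch_blocks:    # assumed to take priority
            continue
        if inst.block in criteria.always_replace_blocks:
            selected.append(inst.name)
            continue
        if (inst.words * inst.bits > criteria.min_bits
                and inst.est_gain_pct >= criteria.min_gain_pct):
            selected.append(inst.name)
    return selected


regs = [
    RegisterInstance("lut", "cpu", 16, 16, 20.0),
    RegisterInstance("regs", "cpu", 32, 32, 10.0),
    RegisterInstance("fifo", "usb", 32, 8, 5.0),
    RegisterInstance("cfg", "secure", 64, 16, 30.0),
]
crit = SelectionCriteria(min_bits=500, min_gain_pct=5.0,
                         always_replace_blocks=frozenset({"usb"}),
                         do_not_touch_blocks=frozenset({"secure"}))
print(select_for_replacement(regs, crit))
```

Here `lut` (256 bits) stays below the size threshold, `fifo` is replaced because its block is forced, and `cfg` is skipped because its block is protected.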
Summary
• CARME is an innovative, patent-pending breakthrough in logic design, combining the flexibility and testability of synthesizable registers with the high density of memory generators and the high speed and low power of custom data-paths.
• Dolphin Integration is continuously challenging the traditional library market by introducing patented, ground-breaking innovations that allow SoC architects and back-end engineers to maximize their silicon performance/cost.
THANK YOU!
Ilan Sever: email@example.com
Sales: email@example.com
www.dolphin-integration.com