  • 1. May 1, 2013. A breakthrough in logic design drastically improving performance from 65/55 nm and below. Ilan Sever, Library Group CTO and Israeli Subsidiary Manager, DOLPHIN INTEGRATION
  • 2. Corporate ID
      • Incorporated as a French SA in 1985
      • On Alternext of NYSE in 2007
      • The provider of design products for mixed-signal SoCs
      • Now active from 180 nm to 28 nm
        – with 135 design engineers
        – plus Field Application Engineers and SoC Integration Engineers expert at hardware modeling, to provide SoCs with the best subsystems:
          • High Resolution Audio (converters and audio signal processing)
          • High Resolution Measurement (converters for power metering, MEMS, etc.)
          • Low-power Storage (register banks and memories)
          • Low-power Microcontrol Logic (80x51 legacy, eFlash caches, coprocessors...)
        – and innovative libraries of standard cells and memory registers
        – with power regulation, reference, clock & detector networking
        – where the major differentiator is the Flexibility of IP configurations (FLIP)
  • 3. Dolphin in Israel
      • Incorporated in October 2009 as Dolphin Integration Ltd
      • With the charter to develop innovative small-capacity memory architectures
      • 7 employees, all engineers
      • Developed three product families in technologies ranging from 0.13 µm down to 55 nm:
        – Innovative high-density 1PRFile “AURA”, up to 25% smaller than the competitor's solution with half the dynamic power; licensed by TSMC, by leading IDMs and fabless companies
        – Innovative high-density DPRFILE “ERIS”, up to 35% smaller than the competitor's solution while providing two full Read+Write ports (as opposed to 1R1W 2-port registers)
        – Patent-pending “CARME” multi-port register, allowing seamless replacement of flip-flops and extremely high-speed asynchronous access for acceleration of digital blocks
  • 4. Market trends: boom of average SoC clock speed
      • Consumer electronics and mobile devices drive the need for higher SoC performance
        – High performance required for the embedded processor
        – High density and low power required for the rest of the SoC
      • Targeted applications
        – Smartphone
        – Multimedia
        – Gaming
        – Computing
        – …
      Source: Kurzweil
  • 5. Design techniques for improving performance of critical paths on logic blocks
      • Logic designers can leverage 4 solutions to improve the performance of logic blocks while maintaining the best density/power trade-off
      Design technique | Drawbacks | Impact
      Multi-process support: use LP process for power-critical circuits and G process for speed-critical circuits | High leakage of the G process | Leakage loss
      Multi-Vt support in standard cells: use LVt cells with improved performance in critical paths | An additional LVt layer is needed; high leakage of LVt cells | Leakage loss
      Multi-track support in standard cells: use 7/8-track libraries for density-optimized blocks and 10/12/14-track libraries for speed-critical blocks | Most libraries are not path mixable, so optimization is limited to the whole logic block level: cells are oversized for all non-critical paths of the block | Area loss
      CARME bit-cell based register packs: use CARME for speed-critical registers | - | -
  • 6. Barriers and challenges in optimizing the register files within a logic design
      • Flexibility of configurations: #words, #bits (unlike custom solutions)
      • Reset operation (does not exist in SRAM-based macros)
      • Scan/DFT (SRAM-based macros do not support scan and require BIST)
      • Write & read access protocol and speed
      • Multi ports
      • Usage within a standard logic flow
      • Automatic P&R inside the area of standard logic rows
      • Dynamic consumption and IR-drop during read/write activity
      • Support for power-down & retention modes
      • Area: always a key factor
  • 7. Cell Compatible register packs: CARME key features
      Property/Challenge | Synthesizable FF-based register | Synthesizable latch-based register | SRAM-based register | CARME register pack
      Reset | Yes | Yes | No | Yes
      Scan / DFT | Yes | No (need BIST) | No (need BIST) | Yes
      Write access | Synchronous | Synchronous | Synchronous | Synchronous, with optional asynchronous write-through
      Read access | Asynchronous | Asynchronous | Synchronous | Asynchronous
      Multi port | Yes | Yes | No | Yes
      Placement and routability | Standard P&R | Standard P&R | Hard macro placement outside logic rows | Hard macro compatible with placement in logic rows
  • 8. Cell Compatible register packs: CARME key features
      • Brand new kind of bit-cell based generator which can be used as an alternative to standard-cell based implementation for storage elements such as registers
        – CARME instances behave exactly as synthesized registers, thus ensuring a seamless replacement
      • CARME is the ideal solution for those who want to improve speed and, beyond that, logic density and dynamic power
      • Traditional registers, once placed, are unstructured and widespread unless hierarchically placed & routed. In contrast, CARME registers are structured as “packs” to facilitate RTL engineering while still enjoying the flexibility of a generator
  • 9. Cell Compatible register packs: CARME performances @ 65 nm LP
      • Benchmark results after synthesis (with scan insertion) on Motu Uta V5
        – Process: TSMC 65 nm LP
        – Standard cell library performances are for SVt
        – PVT used for timings: SS; 1.08 V; 125°C
        – Accuracy of results for CARME: speed +/-10%, area +/-5%
      • Results: 18% gain in density, 22% gain in speed
  • 10. CARME vs. alternatives
      • Benchmark: implementation of a 16x16 look-up table (TSMC 65 nm LP)
      Property/Challenge | Synthesizable FF-based | Synthesizable latch-based | SRAM-based | CARME register pack
      Area (65LP) | 3973 µm² | 1965 µm² | 1530 µm² | 2704 µm²
      Speed (access time, typical) | 0.39 ns | 0.6-0.8 ns | 0.8-1.0 ns | 0.22 ns
      Power @ 1 GHz | 2.03 mW | 0.80 mW | 1.43 mW | 0.89 mW
      Reset | Yes | Yes | No | Yes
      Write access | Synchronous | Synchronous | Synchronous | Synchronous
      Read access | Asynchronous | Synchronous | Synchronous | Asynchronous
      Multi port | Yes | Yes | No | Yes
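The comparison in this benchmark can be made explicit with a little arithmetic. The sketch below only re-uses the figures from the table; the midpoint chosen for the two access-time ranges and the helper names `relative_change` and `compare` are assumptions made for illustration, not part of the original benchmark.

```python
# Figures copied from the 16x16 look-up-table benchmark (TSMC 65 nm LP).
# Assumption: where the table gives a range (0.6-0.8 ns, 0.8-1.0 ns),
# the midpoint of the range is used.
BENCH = {
    "FF-based":    {"area_um2": 3973, "access_ns": 0.39, "power_mw": 2.03},
    "Latch-based": {"area_um2": 1965, "access_ns": 0.70, "power_mw": 0.80},
    "SRAM-based":  {"area_um2": 1530, "access_ns": 0.90, "power_mw": 1.43},
    "CARME":       {"area_um2": 2704, "access_ns": 0.22, "power_mw": 0.89},
}


def relative_change(new, ref):
    """Percentage change of `new` versus `ref` (negative means smaller)."""
    return 100.0 * (new - ref) / ref


def compare(candidate, reference, table=BENCH):
    """Per-metric percentage change of `candidate` relative to `reference`."""
    return {
        metric: round(relative_change(table[candidate][metric],
                                      table[reference][metric]), 1)
        for metric in table[candidate]
    }


if __name__ == "__main__":
    for ref in ("FF-based", "Latch-based", "SRAM-based"):
        print(f"CARME vs {ref}: {compare('CARME', ref)}")
```

Against the synthesizable FF-based implementation this gives roughly 32% less area, 44% shorter access time and 56% less power. Against the latch-based and SRAM-based columns the same figures show CARME as larger in area but faster.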
  • 11. Fast Asynchronous Read
      • READ is asynchronous
      • Can support up to 4 independent read ports
      [Timing diagram: read_addr and data_out, showing the read_addr => data_out delay]
  • 12. CARME compiler highlights
      • Architecture
        – Based on patentable bit-cell
        – Optimized for easy & risk-free integration within standard-cell rows
      • Flexibility
        – 2 to 128 words
        – 4 to 144 bits wide
        – Up to 4 independent read ports
      • Features & benefits
        – Very fast asynchronous read operation
        – Synchronous write with optional fast write-through
        – 1 write port, multiple read ports
        – Reset function
        – Retention mode
        – Byte/bit-write control
      [Figure: CARME register pack 16x16, TSMC 65LP, access time 220 ps]
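The port behavior listed above (one synchronous write port, several asynchronous read ports, reset, bit-write control) can be illustrated with a toy behavioral model. This is a minimal sketch, not Dolphin's model: the class name `RegisterPack`, its method names, the all-zero reset value and the bit-mask interface are assumptions, and the optional write-through and retention mode are left out because the slides do not define their exact semantics.

```python
class RegisterPack:
    """Toy behavioral model: 1 synchronous write port, N asynchronous read ports.

    Hypothetical illustration only; names and reset value are assumptions.
    """

    def __init__(self, words, bits, read_ports=1):
        # Configuration limits taken from the slide.
        if not 2 <= words <= 128:
            raise ValueError("words must be in 2..128")
        if not 4 <= bits <= 144:
            raise ValueError("bits must be in 4..144")
        if not 1 <= read_ports <= 4:
            raise ValueError("read_ports must be in 1..4")
        self.words, self.bits, self.read_ports = words, bits, read_ports
        self._full_mask = (1 << bits) - 1
        self._mem = [0] * words
        self._pending = None  # write captured before the next clock edge

    def reset(self):
        """Clear all words (assumed reset value: 0) and drop any pending write."""
        self._mem = [0] * self.words
        self._pending = None

    def write(self, addr, data, bit_mask=None):
        """Present a write; it only takes effect at the next clock_edge()."""
        if not 0 <= addr < self.words:
            raise IndexError("write address out of range")
        mask = self._full_mask if bit_mask is None else bit_mask & self._full_mask
        self._pending = (addr, data & self._full_mask, mask)

    def clock_edge(self):
        """Commit the pending write, touching only the bits enabled in the mask."""
        if self._pending is not None:
            addr, data, mask = self._pending
            self._mem[addr] = (self._mem[addr] & ~mask) | (data & mask)
            self._pending = None

    def read(self, port, addr):
        """Asynchronous read: returns the stored word without waiting for a clock."""
        if not 0 <= port < self.read_ports:
            raise IndexError("no such read port")
        return self._mem[addr]
```

A 16x16 pack with two read ports would be created as `RegisterPack(16, 16, read_ports=2)`; a value written with `write()` becomes visible on every read port only after `clock_edge()`.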
  • 13. CARME compiler highlights
      • Proprietary bitcell features:
        • Scannable
        • Resettable
        • High-speed write
        • Support for multiple high-speed read ports
        • Area efficient: ½ of a normal D-FF
        • Low power: ½ of a normal D-FF
        • Retention-ready: replaces retention FFs
        • Non-pushed rules: easily retargetable
  • 14. Basic Architecture
      [Figure: bit-cell pack with Address Bus, Data Bus and Output Mux]
      • Up to 16 bitcells in a pack
      • All outputs are routed to the Distribution Plane
  • 15. Multiple read ports
      • Add read ports in a modular way without complexity or performance degradation
      [Figure: Addr Bus A, OutMux Port A, DataOut Port A; Addr Bus B, OutMux Port B, DataOut Port B]
  • 16. CARME compiler highlights
      • Scannable latch array
      [Figure: latch array with signals labeled ScanCK and ScanCK_N]
  • 17. CARME compiler highlights
      • Flexible number of read ports
      [Figure: layouts with 1 port, 2 ports and 4 ports]
  • 18. CARME compiler highlights
      • Fits inside logic rows: zero overhead for spacers, power rings, wrappers
      • Custom layout fits the number of horizontal & vertical tracks
      • IR-drop-aware placement: shared among rows
      • Just like a big standard cell!
  • 19. CARME compiler highlights
      • Routing-aware structure
      • Feed-through over the cell
  • 20. CARME performances @ 65 nm LP
      • Register performances
      Block name | Register size | Configuration | Write operation speed (ps) | Read operation speed (ps) | Dynamic power (µA/MHz)
      MCU OR1200 | 32x32 | 2R1W | 497 | 611 | 1.3
      ALU CHRONOS | 16x32 | 3R1W | 406 | 490 | 0.52
      USB | 32x8 | 1R1W | 358 | 442 | 0.28
      UART | 16x8 | 1R1W | 332 | 420 | 0.19
      SPI | 4x8 | 1R1W | 315 | 240 | 0.15
      PVT used for timings: SS; 1.08 V; 125°C
      PVT used for dynamic power consumption: TT; 1.2 V; 25°C
  • 21. CARME performances @ 65 nm LP
      Post-P&R results on Motu-Uta using a High-Density 7-Track Spinner (pulsed-latch) library:
      • Without CARME: 114000 µm² at 195 MHz
      • Using CARME: 97600 µm² at 196 MHz (-15% area, same speed)
      • Using CARME: 103000 µm² at 233 MHz (-10% area, +20% speed)
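The percentages quoted on this slide follow directly from the raw area and frequency figures. The short sketch below recomputes them; the helper name `pct_change` and the data layout are illustrative choices, and only the numbers come from the slide.

```python
# Post-P&R figures from the slide (Motu-Uta, 7-track pulsed-latch library).
BASELINE = {"area_um2": 114000, "freq_mhz": 195}
WITH_CARME = [
    {"label": "area-optimized", "area_um2": 97600, "freq_mhz": 196},
    {"label": "speed-optimized", "area_um2": 103000, "freq_mhz": 233},
]


def pct_change(new, ref):
    """Percentage change of `new` relative to `ref`."""
    return 100.0 * (new - ref) / ref


if __name__ == "__main__":
    for run in WITH_CARME:
        area = pct_change(run["area_um2"], BASELINE["area_um2"])
        freq = pct_change(run["freq_mhz"], BASELINE["freq_mhz"])
        print(f"{run['label']}: area {area:+.1f}%, frequency {freq:+.1f}%")
```

The unrounded values are about -14.4% area at essentially the same clock, and about -9.6% area with +19.5% clock frequency, which the slide rounds to -15%, -10% and +20%.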
  • 22. Cell Compatible register packs: CARME integration flow
      • uHD-BTF Standard Cells: patent-pending reduced cell stem library based on pulsed latch for ultra high density
      • CARME Compiler: patent-pending bit-cell based generator of register packs
      • LogiWare models: library of Verilog models of registers
      • Scripts: automatic detection of registers in an RTL design and their swift replacement by a model enabling both synthesis and instantiation of registers
  • 23. Cell Compatible register packs: CARME integration flow
      [Flow diagram with numbered steps 1 to 8. Box labels: Memory compilers; Memory instances list; LOGIWARE Library; User's original RTL; DETECTION script; SELECTION script; Hard macros instantiated RTL; Netlist with hard macros; Synthesis; Updated memory instances list. Legend: standard implementation flow; CARME implementation flow (automated steps); Dolphin's silicon IPs offering.]
      Selection script allows replacement per criteria defined by the user:
        – Above a certain number of bits (e.g. > 500 bits)
        – Above a defined speed/area/leakage gain
        – Always replace inside a specified block
        – Do not touch a specified block
        – Etc.
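The selection criteria listed above can be sketched as a small filter over the detected registers. This is a hypothetical illustration, not Dolphin's SELECTION script: the `Register` and `Criteria` types, the prefix-based block matching, the single `est_gain_pct` figure and the way the criteria are combined (do-not-touch wins over always-replace; size and gain thresholds must both hold) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Register:
    name: str                   # hierarchical name, e.g. "cpu/regfile" (hypothetical)
    words: int
    bits: int
    est_gain_pct: float = 0.0   # assumed single estimate of speed/area/leakage gain


@dataclass
class Criteria:
    min_total_bits: int = 500       # replace only above this number of bits
    min_gain_pct: float = 0.0       # replace only above this estimated gain
    always_replace: tuple = ()      # block prefixes that are always replaced
    do_not_touch: tuple = ()        # block prefixes that are never replaced


def select(registers, criteria):
    """Return the names of the registers chosen for replacement."""
    selected = []
    for reg in registers:
        # Assumption: an explicit "do not touch" overrides every other rule.
        if any(reg.name.startswith(p) for p in criteria.do_not_touch):
            continue
        if any(reg.name.startswith(p) for p in criteria.always_replace):
            selected.append(reg.name)
            continue
        # Assumption: size and gain thresholds must both be satisfied.
        big_enough = reg.words * reg.bits > criteria.min_total_bits
        worth_it = reg.est_gain_pct >= criteria.min_gain_pct
        if big_enough and worth_it:
            selected.append(reg.name)
    return selected
```

With a 500-bit threshold, a 32x32 register (1024 bits) would qualify on size while a 16x8 register (128 bits) would not, unless its block is listed in `always_replace`.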
  • 24. Summary
      • CARME is an innovative patent-pending breakthrough in logic design, combining the flexibility and testability of synthesizable registers with the high density of memory generators and the high speed and low power of custom data-paths.
      • Dolphin Integration is continuously challenging the traditional library market with the introduction of patented ground-breaking innovations, allowing SoC architects and backend engineers to maximize their silicon performance/cost.
  • 25. THANK YOU! Ilan Sever. Sales: www.dolphin-