High Level Design & ESL
How design cost is driving innovation in system-level designs?
Rajesh Gupta
University of California, San Diego
  • In this work I propose a verification framework consisting of scalable techniques for automatic verification of system designs. In particular, I propose techniques for two components: a property checker and a refinement or equivalence checker. The property checker takes a high-level model as input and, through a repeated process of checking and correction, produces a golden reference model. The refinement or equivalence checker takes a specification and an implementation as input and checks whether the implementation preserves certain properties of the specification. In short, we use a combination of techniques from … to do property checking and a combination of techniques from … to do equivalence checking. Our results show that our techniques are efficient and useful.
  • Consider an input program, called the Specification, that goes through a series of transformations to form a transformed program, called the Implementation. We want to make sure that the implementation preserves some properties of the specification. One way to do this is to use a checker and prove once and for all that the transformations used will always produce correct results. However, proving refinement this way is usually very hard. Another way, and the one that we use, is to prove it for each pair of programs: we prove that the implementation is a refinement of the specification, or that it is equivalent to the specification. By refinement we mean that the set of possible traces of the implementation is a subset of the traces of the specification.
  • We have implemented these two algorithms in a system called ARCCoS. In ARCCoS we have also implemented a very simple partial-order (PO) technique to reduce the number of interleavings in the control state of the two programs.
  • Next, we ran our tool on a set of benchmarks; the results are shown in this table. The most significant result is that we were able to run our tool on an example with 53 threads in around 37 minutes. We also ran our tool on parts of an industrial example, the EP2 system, and were able to infer the simulation relation quickly.
  • Next, let's look at an example of a transformation done by SPARK. This program computes the sum from p+1 to 10. SPARK takes this input and applies loop pipelining, copy propagation, and some other heuristics to get the scheduled IR shown on the right. In particular, the instruction i4: k = k + 1 is duplicated: one copy is placed above the loop and one at the end of the loop. Then some instructions are grouped together, based on the resource allocation, to be executed concurrently. Our goal is to show that the specification and the implementation are equivalent, so we applied our algorithm to validate it. The first, forward pass finds the points of interest and the local conditions. The second, backward pass propagates these local conditions backwards using weakest preconditions until we reach a fixpoint.
  • This slide shows the big picture of our approach. We are building a framework called Satya that takes a SystemC design as input and converts it to an intermediate representation. We then do static analysis to generate the partial-order reduction (POR) information and an initial test bench. This information is used by an explicit stateless model checker that explores all possible behaviors of the design. If required, it then generates another test bench and explores again for that test bench.
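The trace-subset notion of refinement from the note on refinement checking can be illustrated on small finite-state systems. The sketch below is an explicit-state check, not the simulation-relation inference with a theorem prover that ARCCoS uses. The labelled transition systems, the vending-machine labels, and the function name `trace_refines` are all hypothetical, and internal (silent) actions are assumed to be absent.

```python
from collections import deque


def trace_refines(impl, impl_init, spec, spec_init):
    """Return True if every finite trace of impl is also a trace of spec.

    Each system maps a state to a list of (label, next_state) pairs.
    """
    start = (impl_init, frozenset([spec_init]))
    seen = {start}
    queue = deque([start])
    while queue:
        i_state, s_states = queue.popleft()
        for label, i_next in impl.get(i_state, []):
            # All spec states reachable by the same label (subset construction).
            s_next = frozenset(
                target
                for s in s_states
                for (spec_label, target) in spec.get(s, [])
                if spec_label == label
            )
            if not s_next:
                return False  # impl performs a step the spec cannot match
            pair = (i_next, s_next)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# Hypothetical example: the spec offers tea or coffee, the impl only tea.
spec = {0: [("coin", 1)], 1: [("tea", 0), ("coffee", 0)]}
impl = {0: [("coin", 1)], 1: [("tea", 0)]}

print(trace_refines(impl, 0, spec, 0))  # impl's traces versus spec's
print(trace_refines(spec, 0, impl, 0))  # the reverse direction
```

Swapping the two arguments shows that refinement is directional: the system that offers fewer behaviors refines the one that offers more, but not the other way round.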
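The note on ARCCoS mentions reducing the number of interleavings. As a rough illustration of why that number matters, the sketch below counts the interleavings of fully independent processes using the standard multinomial formula. The function name `interleavings` and the step counts are hypothetical; real processes communicate, so this only indicates how quickly the count can grow.

```python
from math import factorial


def interleavings(steps_per_process):
    """Number of ways to interleave independent processes.

    steps_per_process[i] is the number of sequential steps of process i.
    The count is (sum of steps)! divided by the product of each steps!.
    """
    total = factorial(sum(steps_per_process))
    for steps in steps_per_process:
        total //= factorial(steps)
    return total


for shape in ([2, 2], [3, 3, 3], [4, 4, 4, 4]):
    print(shape, interleavings(shape))
```

When all steps are independent, every interleaving leads to the same result, so a partial-order technique only needs to explore one representative of each such group.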
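The SPARK example in the notes can be replayed directly. The sketch below transcribes the specification and the pipelined implementation into Python and runs them in lockstep, asserting at each loop head the relation that the second pass reports for the pair (a2, b1): k_s = k_i, sum_s = sum_i, and k_s + 1 = t_i. The function names are hypothetical, and this is bounded testing over a few inputs, not the theorem-prover-based proof described in the notes.

```python
def spec_sum(p):
    """Specification: sum of the integers from p + 1 to 10."""
    total = 0
    k = p
    while k < 10:
        k = k + 1
        total = total + k
    return total


def impl_sum(p):
    """Implementation after loop pipelining and copy propagation."""
    total = 0
    k = p
    t = p + 1          # copy of 'k = k + 1' hoisted above the loop
    while k < 10:
        k = t
        total = total + t
        t = t + 1      # second copy at the end of the loop body
    return total


def lockstep_check(p):
    """Run both programs together and assert the relation at each loop head."""
    ks, sums = p, 0
    ki, sumi, ti = p, 0, p + 1
    while True:
        assert ks == ki and sums == sumi and ks + 1 == ti
        if not ks < 10:
            break
        ks, sums = ks + 1, sums + ks + 1         # one spec iteration
        ki, sumi, ti = ti, sumi + ti, ti + 1     # one impl iteration
    return sums == sumi


print(all(spec_sum(p) == impl_sum(p) for p in range(-5, 15)))
print(lockstep_check(3))
```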
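The idea of an explicit stateless search, as mentioned in the note on Satya, can be sketched on a toy program: each schedule is replayed from the initial state, and no set of visited states is stored. The two-thread counter, the function names `run` and `schedules`, and the absence of any partial-order reduction are assumptions made for illustration; this is not Satya's algorithm and involves no SystemC.

```python
def run(schedule):
    """Replay one schedule from the initial state and return the final counter.

    Each thread performs two steps: read the shared counter, then write back
    the value it read plus one (a non-atomic increment).
    """
    shared = 0
    local = [0, 0]   # value each thread last read
    step = [0, 0]    # next step of each thread (0 = read, 1 = write)
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = shared
        else:
            shared = local[tid] + 1
        step[tid] += 1
    return shared


def schedules(remaining, prefix=()):
    """Yield every complete schedule as a tuple of thread ids."""
    if not any(remaining):
        yield prefix
        return
    for tid, left in enumerate(remaining):
        if left > 0:
            rest = list(remaining)
            rest[tid] -= 1
            yield from schedules(tuple(rest), prefix + (tid,))


outcomes = {run(s) for s in schedules((2, 2))}
print(sorted(outcomes))
```

Exploring all schedules exposes the lost-update outcome that a single simulation run might never hit, which is the kind of behavior an exhaustive explorer is meant to find.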
  • Fmcad08

    1. High Level Design & ESL
       How design cost is driving innovation in system-level designs?
       Rajesh Gupta, University of California, San Diego
       mesl.ucsd.edu
       FMCAD, Portland, Nov. 17, 2008
    2. My main point
       • At various times, VLSI design has been driven by
         • Area, timing, power, reliability, manufacturing variability
       • Cost of design is likely to be the driver for future innovations in how we architect, design and implement future ICs in each of these areas:
         • Tools, Methods
         • Architectures
         • Programming models and methods
    3. The Technology and Its Industry
       [Figure labels: Components, Systems, Tools, Masks, Mask data]
       (Slide footer: 12/18/03, R. Gupta, UC San Diego)
    4. More Silicon to More Boxes…
       • Of the 72 distinct application markets that rely on value-added IC designs (ASIC, ASSP, FPGA, SOC)
         • over 50% are less than $500M, 75% are less than $1B
       • The rising fabless, fablite
         • The US has 56% of over 1K design houses…
         • …and accounts for 76% of industry revenues
         • (Wireless 27%, networking 25%, consumer 20%)
       • Cost is increasingly the driver for fabless
         • Only 17% of designs above 500 MHz
         • 67% of ASIC designs are 299 MHz and lower
         • Sizes pretty much evenly distributed from 100K to 5M gates
       Source: IBS
    5. Is there a problem here?
       [Chart: WW Market Forecast: ASIC vs. FPGA; x-axis: years 2003 to 2011; y-axis: $ (millions), 0 to 35,000; series: Total ASIC, Total FPGA]
       Source: Gartner Dataquest “ASIC and FPGA WW Market Forecast, January 2008”
    6. More & Moore
       • Most things in real life do not scale anywhere close to this
         • Battery energy, power sources
         • Size, Space, Spectrum
         • Design time
       • Dealing with the effects of Moore
         • “Embedded Systems”
       [Figure labels: 486; pad-limited die: 200 pins, 52 mm²]
       [Chart: improvement compared to year 0 (1x to 16x) vs. time in years (0 to 6)]
    7. A Tale of Two Consequences
       1. EDA: Raise abstractions
          • Raising abstraction has always been part of the solution strategy to lower design costs.
            • In design modeling, design synthesis, design verification
       2. Architecture: Raise programmability
          • Holy Grail: ASIC efficiency with CPU programmability.
          • The tremendous space of architectural innovations between ASIC and FPGA
       ► Let us take a look at the two sides from a familiar perspective
    8. FPGA v. ASIC: Cost v. Volume
       [Figure: Total Cost vs. Volume with lines for FPGA, ASIC, Structured ASIC (SA) and New Fabric (T); intercepts cf, ct, ca; crossover volumes xf, xa]
       A good solution:
         xf → 0 or better ASIC, ct → cf
         xa → infinity or better FPGA, mt → ma
       • Currently we are: cf = 2 ca; mf = 20 ma
         • Fixed cost of FPGA design = 2 * ASIC design costs
         • Per-part cost of FPGAs is 20x the per-part cost of ASICs.
    9. ASIC/FPGA Tradeoff
       [Figure: Total Cost vs. Volume with lines F, A, SA and T; intercepts cf, ct, ca; crossover volumes xf, xa]
       A good solution:
         xf → 0 or better ASIC, ct → cf
         xa → infinity or better FPGA, mt → ma
    10. Better ASIC or Better FPGA?
        [Figure: Total Cost vs. Volume with lines F and A and intercepts cf and ca]
        • Improved Area Utilization
        • Reduced Design Cost; Chip implementation, Shuttles, etc.
        • Space of ‘synthetic’ solutions
    11. [Figure: three Total Cost vs. Volume panels, each with lines F and A and intercepts cf and ca]
        • Better area utilization in FPGA, 7x target
        • Better synthesis, EDA, 2x target
        • Design for synthesis, 3x cost increase
    12. Technical Dimensions of the Problem
        • SE: Silicon Efficiency
          • Inherently better circuit implementation styles, levels, logic: Asynchronous, GALS
        • AE: Architectural Efficiency
          • Inherently improved application-level performance or performance independent of mapping methods
        • PA: Programmer Accessibility
          • Use existing programming models/methods to ensure IP availability and integration.
        • DP: Designer Productivity
    13. Designer Productivity is Challenge #1
        ITRS, last updated 2006
        • Verification
        • Predictable Implementation
        • Embedded SW
        • Distributed design, AMS
    14. Impact on Designer Productivity
        Design Technology                         | Year    | Productivity Delta | gates/DY | Comments
        ------------------------------------------|---------|--------------------|----------|---------------------------------------------
        Physical Design (APR)                     | 1993    | 38.9%              | 5.55K    | PD integration
        Tall-thin Engineer                        | 1995    | 63.6%              | 9.1K     | Chip/circuit/PD/Verif.
        Small block reuse                         | 1997    | 340%               | 40K      | 2.5K-75K gates
        Large block reuse                         | 1999    | 38.9%              | 56K      | 75K-1M gates
        IC implementation suites                  | 2001    | 63.6%              | 91K      | RTL-GDSII integration
        RTL functional verification               | 2003    | 37.5%              | 125K     | SW development verif.
        ES Methodology                            | 2005    | 60%                | 200K     | Behavioral above RTL
        Very large block reuse                    | 2007    | 200%               | 600K     | >1M gates, IP cores
        Homogeneous parallel processing           | 2009    | 100-200%           | 1.2M     | Many identical cores around a main processor
        Intelligent test bench                    | 2011    | 37.5%              | 2.4M     | Automation of verification partitioning
        Concurrent SW compiler                    | 2013    | 60%                | 3.3M     | Enables SW in parallel SOCs
        Heterogeneous massive parallel processing | 2015    | 100-200%           | 5.3M     | Specialized cores around a main processor
        System-level DA and executable specs      | 2017-19 | 100-200%           | 10.5M    | On/off-chip integration of functions
        Total                                     |         | 264,000%           |          |
    15. Raising Verification
        • Scalable techniques for automatic verification of system designs
        [Figure: design levels: Architecture Level; Transaction Level Model (TLM) (Non-Synthesizable Subset); Register Transfer Level (RTL), Micro-architecture Level (Synthesizable Subset)]
        [Figure: steps between levels: Mostly Manual; High Level Synthesis]
        [Figure: components: Property Checker, Golden Reference Model, Refinement or Equivalence Checker]
        [Figure: verification techniques listed: Translation Validation, Automated Theorem Proving, Relational Approach, Partial Order Reduction, Explicit Stateless Search, Automatic Test Generation]
    16. Refinement Checking
        [Figure labels: Input Program (Specification); Transformations; Transformed Program (Implementation); Refinement or Equivalence Checker]
    17. Prototype Implementation - ARCCoS
        [Figure labels: CSP Specification; CSP Implementation; Front End Parser; Specification (CFG); Implementation (CFG); Simulation Relation Inference Engine; Partial Order Reduction Engine; Checking Engine; Automated Theorem Prover (Simplify); Simulation Relation]
    18. Results from ARCCoS
        Description                     | #Process (Spec / Impl / Total) | Time, no PO (min:sec) | Time, PO (min:sec)
        --------------------------------|--------------------------------|-----------------------|-------------------
        Simple buffer                   | 3 / 4 / 7                      | 00:00                 | 00:00
        Simple vending machine          | 1 / 1 / 2                      | 00:00                 | 00:00
        Cyclic scheduler                | 3 / 3 / 6                      | 01:01                 | 00:49
        College student tracking system | 1 / 2 / 3                      | 00:01                 | 00:01
        Single communication link       | 3 / 8 / 11                     | 00:01                 | 00:01
        2 parallel communication links  | 6 / 12 / 18                    | 01:28                 | 00:04
        3 parallel communication links  | 9 / 16 / 25                    | 514:52                | 00:21
        4 parallel communication links  | 12 / 20 / 32                   | DNT                   | 01:11
        5 parallel communication links  | 15 / 24 / 39                   | DNT                   | 02:32
        6 parallel communication links  | 18 / 28 / 46                   | DNT                   | 08:29
        7 parallel communication links  | 21 / 32 / 53                   | DNT                   | 37:28
        Hardware refinement             | 3 / 5 / 8                      | 00:00                 | 00:00
        EP2 System                      | 1 / 2 / 3                      | 01:51                 | 01:47
    19. Example
        (a) Specification
            i1: sum = 0
            i2: k = p
            i3: (k < 10)        i6: ¬(k < 10)
            i4: k = k + 1
            i5: sum = sum + k
            i7: return sum
            Control points: a0, a1, a2, a3, a4, a5, a6
        Transformations: loop pipelining, copy propagation
        Resource allocation: +, +, <
        (b) Implementation
            j1: sum = 0
            j2: k = p
            j41: t = p + 1
            j3: (k < 10)        j6: ¬(k < 10)
            j4: k = t
            j5: sum = sum + t
            j42: t = t + 1
            j7: return sum
            Control points: b0, b1, b2, b3, b4
        sum = \sum_{i=p+1}^{10} i
        (l1, l2)       | 1st Pass        | 2nd Pass
        1. (a0, b0)    | p_s = p_i       | p_s = p_i
        2. (a2, b1)    | k_s = k_i       | k_s = k_i ∧ sum_s = sum_i ∧ (k_s + 1) = t_i
        3. (a5, b3)    | sum_s = sum_i   | sum_s = sum_i
    20. Ongoing work
        [Figure: the Satya framework; labels: SystemC Design, Intermediate Representation, Static Analysis, Partial Order Information, Test Bench, Explicit Stateless Model Checker, Query Engine, Explore Engine, SystemC Simulator]
    21. Closing Thoughts
        • ASIC design cost is the new driver
        • Solution space is expanded to include not only tools but also architectures
        • A time for tremendous creativity
        [Figure: three Total Cost vs. Volume panels, each with lines F and A and intercepts cf and ca: better area utilization in FPGA, 7x target; better synthesis, EDA, 2x target; design for synthesis, 3x cost increase]
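The cost-versus-volume picture on slides 8 and 9 treats each fabric as a straight line: total cost = fixed cost + per-part cost × volume, with the c values as intercepts and the m values as slopes. A minimal sketch of the crossover computation follows. The numbers and the names `CostLine`, `cheap_start`, and `cheap_part` are hypothetical and are not taken from the slides; they are chosen only so that the two lines cross.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostLine:
    fixed: float     # intercept: one-time design cost
    per_unit: float  # slope: cost of each additional part

    def total(self, volume: float) -> float:
        return self.fixed + self.per_unit * volume


def crossover(low_fixed: CostLine, high_fixed: CostLine) -> Optional[float]:
    """Volume at which both options cost the same.

    Returns None unless low_fixed really has the lower fixed cost and the
    higher per-unit cost, since otherwise the lines never cross at a
    positive volume.
    """
    fixed_gap = high_fixed.fixed - low_fixed.fixed
    slope_gap = low_fixed.per_unit - high_fixed.per_unit
    if fixed_gap <= 0 or slope_gap <= 0:
        return None
    return fixed_gap / slope_gap


# Hypothetical cost lines (arbitrary currency units).
cheap_start = CostLine(fixed=1.0e6, per_unit=40.0)  # low entry cost, costly parts
cheap_part = CostLine(fixed=5.0e6, per_unit=2.0)    # high entry cost, cheap parts

x = crossover(cheap_start, cheap_part)
print(f"crossover volume: {x:.0f} parts")
```

In these terms, the slide's targets amount to moving the crossover: lowering the fixed cost of the high-fixed option pulls the crossover toward zero, while lowering the per-part cost of the low-fixed option pushes it toward infinity.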
