VLSI and Embedded System Design provides an overview of VLSI design and embedded systems. It discusses the challenges in VLSI design, including power dissipation and interconnect issues. It then defines embedded systems and describes their characteristics: single-functioned, tightly constrained, and reactive and real-time. Key technologies for embedded system design are discussed, including processor technology, integrated circuit technology, and design technologies that allow a unified view of hardware/software co-design. Optimizing design metrics such as cost, performance, power, and flexibility is a major challenge.
VLSI and Embedded System Design: An Overview of Key Challenges
1. VLSI and Embedded System Design (An Overview)
Prof. N. S. Murthy
(Former Dean and HOD, ECE, NIT Warangal)
nsmurthy58@gmail.com
2. Lecture Outline
• What are the challenges in VLSI design?
• What is an embedded system?
• What is an SoC?
• What is an FPGA?
• ASIC vs. FPGA
• Applications
12. Power will be a major problem
[Figure: power (watts, log scale from 0.1 to 100,000) vs. year (1971 to 2008) for Intel processors from the 4004, 8008, 8080, 8085, 8086, 286, 386 and 486 to the Pentium® processor, with the levels 500 W, 1.5 kW, 5 kW and 18 kW marked]
Power delivery and dissipation will be prohibitive
Courtesy, Intel
14. Power Consumption
• Dynamic
– Transition
– Short circuit
• Leakage
– Sub-threshold leakage
– Diode/Drain leakage
– Gate leakage
At 250 nm, leakage accounted for only about 5% of total power, but its share increases rapidly as geometries shrink
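As a rough illustration (not from the slides), the usual first-order models are P = α·C·V_DD²·f for transition power and P = I·V_DD for a current drawn from the supply, which this sketch uses for both the short-circuit and the leakage terms. The function names and all numbers in the usage line are hypothetical, chosen only to show how the leakage share of the total is computed.

```python
def switching_power(alpha: float, c_switched: float, vdd: float, f_clk: float) -> float:
    """Transition (dynamic) power: P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_switched * vdd ** 2 * f_clk


def supply_current_power(i_avg: float, vdd: float) -> float:
    """Power of an average current drawn from the supply: P = I * Vdd, in watts."""
    return i_avg * vdd


def power_breakdown(alpha, c_switched, vdd, f_clk, i_short_circuit, i_leak):
    """Combine the components listed on the slide into a total and a leakage share."""
    p_sw = switching_power(alpha, c_switched, vdd, f_clk)
    p_sc = supply_current_power(i_short_circuit, vdd)  # short-circuit, as an average current
    p_leak = supply_current_power(i_leak, vdd)         # sub-threshold + diode + gate leakage
    total = p_sw + p_sc + p_leak
    return {"switching": p_sw, "short_circuit": p_sc, "leakage": p_leak,
            "total": total, "leakage_share": p_leak / total}


# Hypothetical block: activity 0.1, 1 nF switched, 1.0 V, 1 GHz, 5 mA short-circuit, 10 mA leakage
print(power_breakdown(0.1, 1e-9, 1.0, 1e9, 5e-3, 10e-3))
```

Raising `i_leak` while holding the other arguments fixed shows the leakage share growing, which is the trend the slide describes for smaller geometries.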
21. Not Only Microprocessors
Digital Cellular Market (Phones Shipped)
Year    1996   1997   1998   1999   2000
Units   48M    86M    162M   260M   435M
[Figure: cell phone block diagram: analog baseband, digital baseband (DSP + MCU), power management, small-signal RF, power RF]
(data from Texas Instruments)
22. Challenges in Digital Design
“Microscopic Problems”
• Ultra-high speed design
• Interconnect
• Noise, Crosstalk
• Reliability, Manufacturability
• Power Dissipation
• Clock distribution
Everything Looks a Little Different
“Macroscopic Issues”
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.
…and There’s a Lot of Them
[Figure: axes labelled DSM (deep submicron) and 1/DSM]
24. Why Scaling?
• Technology shrinks by a factor of 0.7 per generation
• With every generation, 2x more functions can be integrated per chip (0.7 × 0.7 ≈ 0.5, so each function needs about half the area), and chip cost does not increase significantly
• Hence the cost of a function decreases by 2x
• But …
– How do we design chips with more and more functions?
– The design engineering population does not double every two years…
• Hence, a need for more efficient design methods
– Exploit different levels of abstraction
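The scaling arithmetic can be tabulated in a few lines. This is a sketch under two simplifying assumptions: a pure 0.7x linear shrink per generation and a constant cost per unit of chip area. The 0.25 µm starting point is hypothetical.

```python
def scaling_table(start_feature_um: float, generations: int, shrink: float = 0.7):
    """Per generation: feature size, relative functions per area, relative cost per function.

    Assumes a 0.7x linear shrink and a constant cost per unit chip area (both simplifications).
    """
    rows = []
    for g in range(generations + 1):
        feature = start_feature_um * shrink ** g
        density = (1.0 / shrink ** 2) ** g      # 1 / 0.49 ≈ 2.04x more functions per area each step
        cost_per_function = 1.0 / density       # constant chip cost spread over more functions
        rows.append((g, feature, density, cost_per_function))
    return rows


for g, feature, density, cost in scaling_table(0.25, 4):
    print(f"gen {g}: {feature:.3f} um, density x{density:.2f}, cost/function x{cost:.2f}")
```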
26. Major Design Challenges
Microscopic issues
• Ultra-high speeds
• Power dissipation and supply rail drop
• Growing importance of interconnect
• Noise, crosstalk
• Reliability, manufacturability
• Clock distribution
Macroscopic issues
• Time-to-market
• Design complexity (millions of gates)
• High levels of abstraction
• Reuse and IP, portability
• Systems on a chip (SoC)
• Tool interoperability
Design Approach
• Top-down approach
– Define the top-level block, identify the sub-blocks needed to build it, and keep dividing them down to the leaf cells
• Bottom-up approach
– Identify the available building blocks, use them to build bigger cells, and use those to build the top-level block
• A combination of both
27. Design Metrics
• How do we evaluate the performance of a digital circuit (gate, block, …)?
– Cost
– Reliability
– Scalability
– Speed (delay, operating frequency)
– Power dissipation
– Energy to perform a function
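Power and energy per function are separate metrics: energy per operation is roughly power × delay (the power-delay product), so a slower, lower-power gate can still spend more energy per operation. The sketch below compares two gates whose numbers are hypothetical, not measurements.

```python
def power_delay_product(power_w: float, delay_s: float) -> float:
    """Energy per operation (J), approximated as average power x delay."""
    return power_w * delay_s


def energy_delay_product(power_w: float, delay_s: float) -> float:
    """Energy x delay (J*s): penalizes designs that save energy only by being slow."""
    return power_delay_product(power_w, delay_s) * delay_s


# Hypothetical gates: A is faster but draws more power than B
gate_a = {"power_w": 10e-6, "delay_s": 100e-12}
gate_b = {"power_w": 5e-6, "delay_s": 250e-12}

for name, g in (("A", gate_a), ("B", gate_b)):
    pdp = power_delay_product(g["power_w"], g["delay_s"])
    edp = energy_delay_product(g["power_w"], g["delay_s"])
    print(f"gate {name}: {pdp * 1e15:.2f} fJ/op, EDP {edp:.3e} J*s")
```

Gate B draws half the power of gate A yet needs more energy per operation, which is why power and energy appear as separate metrics.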
28. Cost of Integrated Circuits
• NRE (non-recurring engineering) costs
– design time and effort, mask generation
– one-time cost factor
• Recurrent costs
– silicon processing, packaging, test
– proportional to volume
– proportional to chip area
32. Some Examples (1994)
Chip          Metal   Line width  Wafer   Defects/  Area    Dies/   Yield  Die
              layers  (µm)        cost    cm²       (mm²)   wafer          cost
386DX         2       0.90        $900    1.0       43      360     71%    $4
486 DX2       3       0.80        $1200   1.0       81      181     54%    $12
Power PC 601  4       0.80        $1700   1.3       121     115     28%    $53
HP PA 7100    3       0.80        $1300   1.0       196     66      27%    $73
DEC Alpha     3       0.70        $1500   1.2       234     53      19%    $149
Super Sparc   3       0.70        $1700   1.6       256     48      13%    $272
Pentium       3       0.80        $1500   1.5       296     40      9%     $417
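The die-cost column is consistent with die cost = wafer cost / (dies per wafer × yield). This relation is inferred from the table itself (it reproduces every row after rounding), not stated on the slide. The sketch takes yield and dies per wafer as given rather than modelling them from defect density and area.

```python
# (chip, wafer cost in $, dies per wafer, yield) copied from the 1994 table above
CHIPS = [
    ("386DX", 900, 360, 0.71),
    ("486 DX2", 1200, 181, 0.54),
    ("Power PC 601", 1700, 115, 0.28),
    ("HP PA 7100", 1300, 66, 0.27),
    ("DEC Alpha", 1500, 53, 0.19),
    ("Super Sparc", 1700, 48, 0.13),
    ("Pentium", 1500, 40, 0.09),
]


def die_cost(wafer_cost: float, dies_per_wafer: int, yield_fraction: float) -> float:
    """Cost of one good die: the whole wafer cost is carried only by the dies that work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


for name, wafer, dies, y in CHIPS:
    print(f"{name:13s} ${die_cost(wafer, dies, y):7.2f}")
```

Rounded to whole dollars this gives $4, $12, $53, $73, $149, $272 and $417, matching the last column. Large dies lose twice: fewer dies fit on a wafer, and fewer of them work.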
33. Embedded Systems Design: A Unified Hardware/Software Introduction
Introduction to Embedded Systems Design
34. Embedded systems overview
• Computing systems are everywhere
• Most of us think of “desktop” computers
– PCs
– Laptops
– Mainframes
– Servers
• But there’s another type of computing system
– Far more common...
35. Embedded systems overview
• Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define: nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
[Figure: “Computers are in here... and here... and even here...”; “Lots more of these, though they cost a lot less each.”]
36. A “short list” of embedded systems
And the list goes on and on
Anti-lock brakes
Auto-focus cameras
Automatic teller machines
Automatic toll systems
Automatic transmission
Avionic systems
Battery chargers
Camcorders
Cell phones
Cell-phone base stations
Cordless phones
Cruise control
Curbside check-in systems
Digital cameras
Disk drives
Electronic card readers
Electronic instruments
Electronic toys/games
Factory control
Fax machines
Fingerprint identifiers
Home security systems
Life-support systems
Medical testing systems
Modems
MPEG decoders
Network cards
Network switches/routers
On-board navigation
Pagers
Photocopiers
Point-of-sale systems
Portable video games
Printers
Satellite phones
Scanners
Smart ovens/dishwashers
Speech recognizers
Stereo systems
Teleconferencing systems
Televisions
Temperature controllers
Theft tracking systems
TV set-top boxes
VCRs, DVD players
Video game consoles
Video phones
Washers and dryers
37. Some common characteristics of embedded systems
• Single-functioned
– Executes a single program, repeatedly
• Tightly-constrained
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s
environment
– Must compute certain results in real time, without delay
38. An embedded system example: a digital camera
[Figure: digital camera chip containing a microcontroller, CCD preprocessor, pixel coprocessor, A2D, D2A, JPEG codec, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl and multiplier/accumulator, together with the lens and CCD]
• Single-functioned -- always a digital camera
• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent
39. Design challenge – optimizing design metrics
• Obvious design goal:
– Construct an implementation with desired
functionality
• Key design challenge:
– Simultaneously optimize numerous design metrics
• Design metric
– A measurable feature of a system’s
implementation
– Optimizing design metrics is a key challenge
40. Design challenge – optimizing design metrics
• Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system,
excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without
incurring heavy NRE cost
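Unit cost and NRE cost trade off against production volume. The sketch below computes the volume at which a high-NRE, low-unit-cost option starts to beat a low-NRE, high-unit-cost one. Both technologies and their dollar figures are hypothetical.

```python
def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Total cost of producing `volume` units: one-time NRE plus recurring unit cost."""
    return nre + unit_cost * volume


def per_product_cost(nre: float, unit_cost: float, volume: int) -> float:
    """NRE amortized over the volume, plus the unit cost."""
    return nre / volume + unit_cost


def crossover_volume(nre_a: float, unit_a: float, nre_b: float, unit_b: float) -> float:
    """Volume at which option B (higher NRE, lower unit cost) costs the same as option A."""
    if unit_b >= unit_a or nre_b <= nre_a:
        raise ValueError("expected B to have higher NRE and lower unit cost than A")
    return (nre_b - nre_a) / (unit_a - unit_b)


# Hypothetical technologies: A = low NRE / high unit cost, B = high NRE / low unit cost
A = (2_000, 100)
B = (30_000, 30)
print(crossover_volume(*A, *B))            # volume above which B is cheaper
print(per_product_cost(*A, 1_000), per_product_cost(*B, 1_000))
```

With these numbers the crossover is at 400 units; at 1,000 units, B costs $60 per product against $102 for A.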
41. Design challenge – optimizing design metrics
• Common metrics (continued)
– Time-to-prototype: the time needed to build a working version of
the system
– Time-to-market: the time required to develop a system to the point
that it can be released and sold to customers
– Maintainability: the ability to modify the system after its initial
release
– Correctness, safety, and many more
42. Design metric competition – improving one may worsen others
• Expertise with both
software and hardware is
needed to optimize design
metrics
– Not just a hardware or
software expert, as is common
– A designer must be
comfortable with various
technologies in order to
choose the best for a given
application and constraints
[Figure: the competing metrics size, performance, power and NRE cost; the digital camera chip (microcontroller, CCD preprocessor, pixel coprocessor, A2D, D2A, JPEG codec, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl, multiplier/accumulator, with lens and CCD) partitioned into hardware and software]
43. Three key embedded system technologies
• Technology
– A manner of accomplishing a task, especially using
technical processes, methods, or knowledge
• Three key technologies for embedded systems
– Processor technology
– IC technology
– Design technology
44. Processor technology
• The architecture of the computation engine used to implement
a system’s desired functionality
• Processor does not have to be programmable
– “Processor” not equal to general-purpose processor
[Figure: three processor types for the same functionality. General-purpose (“software”): controller (control logic and state register, IR, PC), datapath with a register file and a general ALU, program memory holding assembly code for “total = 0; for i = 1 to …”, and data memory. Application-specific: the same organization with application-specific registers and a custom ALU. Single-purpose (“hardware”): controller (control logic, state register) and a datapath containing only the index and total registers and an adder, plus data memory]
45. Processor technology
• Processors vary in their customization for the problem at hand
Desired functionality:
total = 0
for i = 1 to N loop
    total += M[i]
end loop
[Figure: the desired functionality can be implemented on a general-purpose processor, an application-specific processor, or a single-purpose processor]
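The same desired functionality can be written as an ordinary program (the general-purpose view) or modelled as a single-purpose processor: a controller FSM driving a datapath that holds only `index` and `total`. The state names (INIT, TEST, ACC, DONE) and the one-clock-cycle-per-state timing are assumptions for illustration, not taken from the slide.

```python
def sum_software(m):
    """General-purpose ("software") view: the desired functionality is just a program."""
    total = 0
    for value in m:
        total += value
    return total


def sum_single_purpose(m):
    """Cycle-level sketch of a single-purpose processor for the same loop.

    The controller is a small FSM (state register + control logic); the datapath holds only
    the `index` and `total` registers and an adder. Returns (total, clock cycles used).
    """
    n = len(m)
    state, index, total, cycles = "INIT", 0, 0, 0
    while state != "DONE":
        cycles += 1
        if state == "INIT":            # total = 0; index starts at the first element
            total, index = 0, 0        # (0-based here, where the pseudocode uses 1..N)
            state = "TEST"
        elif state == "TEST":          # loop condition: index still within the N elements?
            state = "ACC" if index < n else "DONE"
        elif state == "ACC":           # total += M[i], then advance the index
            total += m[index]
            index += 1
            state = "TEST"
    return total, cycles


data = [5, 6, 7]
print(sum_software(data), sum_single_purpose(data))
```

Under these assumptions the hardware version takes 2N + 2 cycles and needs no program memory, instruction register or general ALU. That is the customization the slide contrasts with the general-purpose processor.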
46. The co-design ladder
• In the past, hardware and software design technologies were very different
• Recent maturation of synthesis enables a unified view of hardware and software
• Hardware/software “codesign”
[Figure: the co-design ladder, from sequential program code (e.g., C, VHDL) down to an implementation. Software path: compilers (1960s, 1970s) produce assembly instructions, and assemblers and linkers (1950s, 1960s) produce machine instructions, giving a microprocessor plus program bits (“software”). Hardware path: behavioral synthesis (1990s) produces register transfers, RT synthesis (1980s, 1990s) produces logic equations/FSMs, and logic synthesis (1970s, 1980s) produces logic gates, giving a VLSI, ASIC, or PLD implementation (“hardware”)]
The choice of hardware versus software for a particular function is simply a
tradeoff among various design metrics, like performance, power, size, NRE
cost, and especially flexibility; there is no fundamental difference between
what hardware or software can implement.
47. Summary
• Embedded systems are everywhere
• Key challenge: optimization of design metrics
– Design metrics compete with one another
• A unified view of hardware and software is necessary to
improve productivity
• Three key technologies
– Processor: general-purpose, application-specific, single-purpose
– IC: Full-custom, semi-custom, PLD
– Design: Compilation/synthesis, libraries/IP, test/verification
48. Why Worry about Power?
The Tongue-in-Cheek Answer
• Total energy of the Milky Way galaxy: 10^59 J
• Minimum switching energy for a digital gate (1 electron @ 100 mV): 1.6 × 10^-20 J (limit: thermal noise)
• Upper bound on the number of digital operations: 6 × 10^78
• Operations per year performed by 1 billion 100-MOPS computers: 3 × 10^24
• Assuming computational requirements double every year (Moore’s Law), the galaxy’s entire energy would be consumed in 180 years
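The slide's arithmetic can be checked directly. This sketch uses only the slide's figures plus the assumption of a 365-day year; the yearly doubling is applied literally, starting from the first year's operations.

```python
GALAXY_ENERGY_J = 1e59          # total energy of the Milky Way (slide figure)
MIN_SWITCH_ENERGY_J = 1.6e-20   # one electron at 100 mV (slide figure)
SECONDS_PER_YEAR = 365 * 24 * 3600


def max_operations() -> float:
    """Upper bound on digital operations if every joule went into minimum-energy switching."""
    return GALAXY_ENERGY_J / MIN_SWITCH_ENERGY_J


def operations_per_year(computers: float = 1e9, ops_per_second: float = 100e6) -> float:
    """Yearly operations of 1 billion computers at 100 MOPS each."""
    return computers * ops_per_second * SECONDS_PER_YEAR


def years_until_exhausted(first_year_ops: float, budget: float, growth: float = 2.0) -> int:
    """Years until cumulative operations reach the budget, if demand doubles every year."""
    used, rate, years = 0.0, first_year_ops, 0
    while used < budget:
        used += rate
        rate *= growth
        years += 1
    return years


print(f"{max_operations():.2e} operations, {operations_per_year():.2e} per year")
print(years_until_exhausted(operations_per_year(), max_operations()))
```

With these inputs the bound is 6.25 × 10^78 operations, the first-year demand is about 3.15 × 10^24, and the loop stops after 181 years, in line with the slide's figure of roughly 180 years.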
49. Power: The Dominant Design Constraint (1)
Cost of large data centers is solely determined by the power bill …
[Figure: Google data centre on the Columbia River, Oregon]
NY Times, June 06
50. 400 Million Personal Computers Worldwide (Year 2000)
• Assumed to consume 0.16 tera-kWh (1.6 × 10^11 kWh) per year
• Equivalent to 26 nuclear power plants
• Over 1 giga-kWh per year just for cooling
• Including manufacturing electricity
[Ref: Bar-Cohen et al., 2000]
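The yearly total can be turned into average figures per PC and per plant. This sketch assumes a constant draw over all 8,760 hours of the year and ignores duty cycles, so its results are averages implied by the slide's numbers, not measured loads.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, assuming a constant draw all year


def pc_energy_figures(total_kwh_per_year: float, num_pcs: float, num_plants: int):
    """Average figures implied by a yearly energy total (a sketch; ignores duty cycles)."""
    kwh_per_pc = total_kwh_per_year / num_pcs
    avg_watts_per_pc = kwh_per_pc * 1000 / HOURS_PER_YEAR
    avg_total_gw = total_kwh_per_year / HOURS_PER_YEAR / 1e6   # kWh per hour = kW; kW / 1e6 = GW
    return {"kwh_per_pc": kwh_per_pc,
            "avg_watts_per_pc": avg_watts_per_pc,
            "avg_total_gw": avg_total_gw,
            "gw_per_plant": avg_total_gw / num_plants}


print(pc_energy_figures(0.16e12, 400e6, 26))
```

This works out to 400 kWh per PC per year (an average of about 46 W per PC) and about 18 GW in total, or roughly 0.7 GW per plant.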
52. Chip Architecture and Power Density
Integration of diverse functionality in an SoC causes major variations in activity (and hence power density)
[Figure: the past, temperature uniformity; today, steep gradients]
Temperature variations cause performance degradation: higher temperature means slower clock speed
53. Temperature Gradients (and Performance)
[Figure: IBM Power PC 4 temperature map, showing a glass ceramic substrate, an SiC spreader (chip underneath the spreader) and a copper hat (heat sink on top not shown)]
Hot spot: 138 W/cm² (3.6× the chip's average flux of about 38 W/cm²)
55. Power: The Dominant Design Constraint (3)
Exciting emerging applications require “zero power”
Example: computation/communication nodes for wireless sensor networks
Meso-scale low-cost wireless transceivers for ubiquitous wireless data acquisition that
• are fully integrated
– size smaller than 1 cm³
• are dirt cheap
– at or below $1
• minimize power/energy dissipation
– limiting power dissipation to 100 µW enables energy scavenging
• and form self-configuring, robust, ad-hoc networks containing 100s to 1000s of nodes
56. How to Make Electronics Truly Disappear?
From 10s of cm³ and 10s to 100s of mW
To 10s of mm³ and 10s of µW
57. Power: The Dominant Design Constraint
Exciting emerging applications require “zero power”
• Real-time health monitoring
• Smart surfaces
• Artificial skin
[Figure: Philips Sand module, UCB mm³ radio, UCB PicoCube]
Still at least one order of magnitude away
58. Evolution of Supply Voltages in the Past
[Figure: supply voltage (V, 0.5 to 5) vs. minimum feature size (micron, log scale)]
Supply voltage scaling started only in the 1990s
65. Motivation for Low Power Design
Low power design is important for three different reasons:
• Device temperature
– Failure rate, cooling and packaging costs
• Battery life
– Time between charges, system cost
• Environment
– Overall energy consumption
72. Reverse-Biased Diode Leakage
[Figure: p+ diffusions in an N region biased at Vdd, with a gate; the reverse leakage current flows across the reverse-biased junction]
I_DL = J_S × A, where A is the junction area
J_S = 1 to 5 pA/µm² for a 1.2 µm CMOS technology
J_S = 10 to 100 pA/µm² at 25 °C for 0.25 µm CMOS
J_S doubles for every 9 °C increase in temperature!
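The two rules on this slide (I_DL = J_S × A, and J_S doubling every 9 °C) combine into a simple temperature model. The 50 pA/µm² starting value (mid-range of the 0.25 µm figure) and the 10^6 µm² total junction area are hypothetical.

```python
def saturation_current_density(js_25c: float, temp_c: float, doubling_c: float = 9.0) -> float:
    """J_S at temp_c, assuming it doubles every `doubling_c` degrees above 25 °C."""
    return js_25c * 2 ** ((temp_c - 25.0) / doubling_c)


def diode_leakage(js: float, area_um2: float) -> float:
    """I_DL = J_S * A, with J_S in A/um^2 and A in um^2."""
    return js * area_um2


# Hypothetical: 50 pA/um^2 at 25 °C, 1e6 um^2 of reverse-biased junction area
for t in (25, 70, 97):
    js = saturation_current_density(50e-12, t)
    print(f"{t:3d} C: J_S = {js * 1e12:8.1f} pA/um^2, I_DL = {diode_leakage(js, 1e6) * 1e6:8.1f} uA")
```

A 72 °C rise is eight doublings, so the leakage grows by a factor of 256.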
76. Optimization is possible at every level
ALGORITHMIC LEVEL, e.g., DFT: O(N²); FFT: O(N log N)
ARCHITECTURAL LEVEL
LOGIC LEVEL
CIRCUIT LEVEL
DEVICE LEVEL
…
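The algorithmic-level example can be made concrete. The sketch below implements a direct DFT and a textbook recursive radix-2 FFT, checks that they agree, and counts complex multiplications. It illustrates the complexity gap only and makes no claim about any particular hardware implementation.

```python
import cmath


def dft(x):
    """Direct DFT: every output sums over all N inputs, so O(N^2) operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT, O(N log N); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def complex_multiplies(n):
    """Complex multiplications: N^2 for the direct DFT vs (N/2) log2 N for radix-2 FFT.

    n must be a power of two, so n.bit_length() - 1 equals log2(n).
    """
    return n * n, (n // 2) * (n.bit_length() - 1)


print(complex_multiplies(1024))
```

For N = 1024 the counts are 1,048,576 versus 5,120, roughly a 200x reduction in multiplications and, to first order, in the switching energy they cost.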
77. System Level Design
The same MP3 application running on different systems consumes significantly different amounts of power
• System partitioning
• Busses/memory/IO devices/interfaces
• Choice of components
• Coding
• System states (sleep, snooze, etc.)
• DVS/DFS (dynamic voltage/frequency scaling), …
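Two of the system-level knobs listed above, system states and DVS/DFS, can be estimated with simple averages. This sketch assumes dynamic power scales as V²·f and ignores leakage and transition overheads between states. The MP3-player power figures are hypothetical.

```python
def average_power(states):
    """Average power over a schedule of (power_w, fraction_of_time) system states."""
    if abs(sum(fraction for _, fraction in states) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(power * fraction for power, fraction in states)


def dvfs_dynamic_power(p_nominal: float, v_ratio: float, f_ratio: float) -> float:
    """Dynamic power after scaling voltage and frequency: P ~ V^2 * f (leakage ignored)."""
    return p_nominal * v_ratio ** 2 * f_ratio


# Hypothetical MP3 player: active 10% of the time at 100 mW, asleep 90% at 1 mW
print(average_power([(0.100, 0.10), (0.001, 0.90)]))   # duty cycling with a sleep state
# Same player running at half frequency with the supply lowered to 70% of nominal
print(dvfs_dynamic_power(0.100, 0.7, 0.5))
```

Duty cycling brings the average to 10.9 mW. Halving the frequency while lowering the supply to 0.7 of nominal brings the dynamic power to 24.5 mW.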
82. Device Technology
• Multi-oxide devices
• Multiple “cell types” on a single substrate
– Logic, SRAM, Flash etc.
• Support for many other low power design
techniques (multiple thresholds, multiple
voltages, multiple frequencies etc.)
86. Reduction of Sub-threshold Leakage Current
• Reduce the supply voltage
• Reduce the size of the circuit
– Resize transistors according to performance requirements
– Dynamically cut the power supply to unused circuits
• Cooling
• Increase the threshold voltage
– Stack the off-transistors in series
– Isolate the supply through sleep transistors
– Dual threshold: higher threshold on non-critical paths
– Adaptive body biasing
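A simplified model shows why the threshold-raising techniques work. Off-state sub-threshold current falls by one decade for every S millivolts of threshold voltage, where S is the sub-threshold swing. This sketch assumes S = 100 mV/decade for round numbers; the ideal room-temperature limit is about 60 mV/decade. It ignores DIBL, body effect and temperature dependence, and I0 and the dual-V_T values are hypothetical.

```python
def subthreshold_leakage(i0: float, vt_mv: float, swing_mv_per_decade: float = 100.0) -> float:
    """Off-state current I_sub ≈ I0 * 10^(-V_T / S), with V_GS = 0 (simplified model)."""
    return i0 * 10 ** (-vt_mv / swing_mv_per_decade)


def leakage_reduction(delta_vt_mv: float, swing_mv_per_decade: float = 100.0) -> float:
    """Factor by which leakage drops when V_T is raised by delta_vt_mv."""
    return 10 ** (delta_vt_mv / swing_mv_per_decade)


# Hypothetical dual-V_T library: 300 mV cells on critical paths, 400 mV cells elsewhere
low_vt = subthreshold_leakage(1e-6, 300)
high_vt = subthreshold_leakage(1e-6, 400)
print(low_vt / high_vt, leakage_reduction(100))
```

Under these assumptions, moving a non-critical cell from 300 mV to 400 mV cuts its leakage tenfold, at the cost of a slower gate.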
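87. OPTIMIZATION AT LOGIC LEVEL
2:1 MULTIPLEXER (select S, inputs A and B, output Y)
[Figure: a 20-transistor realization built from gates of 6, 6 and 6 transistors plus a 2-transistor inverter, versus a 14-transistor realization built from gates of 4, 4 and 4 transistors plus a 2-transistor inverter]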
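One plausible reading of the transistor counts is an AND-OR multiplexer built from 6-transistor AND/OR gates (NAND/NOR plus an inverter), versus a NAND-NAND multiplexer built from 4-transistor NAND gates. This is an assumption, since the figure did not survive. The sketch checks that both compute Y = S'·A + S·B; that select polarity is also assumed.

```python
from itertools import product

# Assumed static-CMOS transistor costs (not from the slide): AND2/OR2 = NAND2/NOR2 + inverter
COST = {"INV": 2, "NAND2": 4, "AND2": 6, "OR2": 6}


def inv(a):
    return 1 - a


def nand(a, b):
    return 1 - (a & b)


def mux_and_or(s, a, b):
    """Y = S'A + SB built from AND2, AND2, OR2 and an inverter for S."""
    return (inv(s) & a) | (s & b)


def mux_nand_nand(s, a, b):
    """The same function from three NAND2 gates and an inverter (De Morgan)."""
    return nand(nand(inv(s), a), nand(s, b))


AND_OR_TRANSISTORS = 2 * COST["AND2"] + COST["OR2"] + COST["INV"]   # 6 + 6 + 6 + 2
NAND_NAND_TRANSISTORS = 3 * COST["NAND2"] + COST["INV"]             # 4 + 4 + 4 + 2

same = all(mux_and_or(s, a, b) == mux_nand_nand(s, a, b)
           for s, a, b in product((0, 1), repeat=3))
print(same, AND_OR_TRANSISTORS, NAND_NAND_TRANSISTORS)
```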
88. OPTIMIZATION AT CIRCUIT LEVEL
2:1 MULTIPLEXER USING TRANSMISSION GATE LOGIC
[Figure: transmission-gate multiplexer with inputs A and B, select S and its complement, output Y]
6 Transistors (including 2 for inverting S)
89. Optimized transistor-level realization of a Boolean function
Y = (AB + C)D
[Figure: an 18-transistor realization versus an 8-transistor realization, with inputs A, B, C, D, supply VDD and output Y]
90. Low Power RTL Synthesis Techniques
• Module selection
• Retiming
• Pipelining
• Parallelism
• Bus data encoding
• FSM encoding
• Transformations for switching-activity reduction
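Gray coding is one common choice for bus data encoding and for FSM state encoding of sequential states. The sketch counts how many bus lines toggle over a run of sequential values, assuming every line has the same capacitance, so that toggles are a proxy for switching power.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)


def toggles(a: int, b: int) -> int:
    """Number of bus lines that switch when the bus value changes from a to b."""
    return bin(a ^ b).count("1")


def total_toggles(sequence) -> int:
    return sum(toggles(x, y) for x, y in zip(sequence, sequence[1:]))


addresses = list(range(16))                     # e.g. sequential addresses from a counter
binary_cost = total_toggles(addresses)
gray_cost = total_toggles([gray(a) for a in addresses])
print(binary_cost, gray_cost)
```

For the 16 sequential values this gives 26 bit transitions in plain binary versus 15 in Gray code (one per step).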
91. Module Selection
• Modules are used for implementing functional
units, small memory modules etc.
• Different implementations differ significantly in power consumption
• Word length, as well as the number coding techniques employed, can play a significant role
92. Ripple Carry Adder
Carry-signal switching propagates through all the stages and consumes power
ACTEL: MAPLD2004
93. Carry Look Ahead Adder
Carry-signal switching propagates through far fewer stages, which not only reduces delay but can also reduce power
ACTEL: MAPLD2004
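The structural difference between the two adders can be shown with a functional model. In the ripple-carry adder each carry waits for the previous one. In a 4-bit carry-lookahead adder every carry is a two-level function of the generate (g), propagate (p) and input-carry (c0) signals. This sketch checks only that both compute the same sums; it does not simulate glitches or power.

```python
def ripple_carry_add(a: int, b: int, n: int = 4, c0: int = 0):
    """n-bit ripple-carry adder: each carry is computed from the previous one, in series."""
    carry, total = c0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def cla4_add(a: int, b: int, c0: int = 0):
    """4-bit carry-lookahead adder: every carry is a two-level function of g, p and c0."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4


print(ripple_carry_add(11, 6), cla4_add(11, 6))   # 11 + 6 = 17 -> sum bits 0001, carry out 1
```

The lookahead version trades extra gates for a shorter carry path, so carry transitions pass through fewer stages before settling.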
94. Other Operations and Operators
• ALUs
– Traditional method: Perform all operations and
use select for the output; very inefficient in terms
of switching activity
– Permit switching activity only in the operator
required in this cycle
• Complex operators like MAC
• CORDIC functions
– Look-up table vs. computation
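96. Retiming: Positioning a Flip-flop and Power Consumption
[Figure: logic driving a load CL directly (switching activity Eg), versus the same logic driving a flip-flop (FF, capacitance CR) whose output (switching activity ER) drives CL]
P1 = k · Eg · CL
P2 = k · (Eg · CR + ER · CL)
P2 can be less than P1: P2 < P1 whenever Eg · CR < (Eg − ER) · CL, i.e., when the flip-flop filters out enough glitching activity (ER < Eg) to outweigh the capacitance CR it adds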
97. Pipelining
• Pipelining affects power in two different ways
• One factor is similar to retiming: flip-flops can cut down on glitches
• Pipelining shortens the critical path, giving higher frequency and performance (throughput); for a given throughput, this slack can instead be used to reduce the voltage and hence the power
99. Increasing Parallelism/ Concurrency
• Chandrakasan[4] first showed that concurrency can
be used to reduce power instead of increasing
performance
• Primary idea is to reduce the frequency of operation
and/or voltage to meet a certain throughput
• Power consumed by additional logic required to
distribute computation and multiplex results needs
to be accounted for
100. Effect of Parallelism
• Case 1: single FU (functional unit); freq: f0, voltage: v0, throughput: T0
• Case 2: two FUs for enhanced performance; f1 = f0, v1 = v0, T1 > T0
• Case 3: two FUs for reducing power; f2 < f0, v2 < v0, T2 = T0
[Figure: each FU feeds a register; in cases 2 and 3 the outputs of the two FU/register pairs are combined through a MUX]
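Case 3 can be quantified with P = C·V²·f relative to case 1. The capacitance and voltage factors below are illustrative assumptions, not values from the slide. Duplicating the FU plus the MUX and routing overhead is taken as 2.15x the switched capacitance, and FUs clocked at f0/2 are taken to tolerate a supply of 0.58·v0.

```python
def relative_power(c_ratio: float, v_ratio: float, f_ratio: float) -> float:
    """Dynamic power P = C * V^2 * f relative to case 1 (single FU at f0, v0)."""
    return c_ratio * v_ratio ** 2 * f_ratio


# Illustrative assumptions: duplicated FU + MUX/routing overhead ~2.15x capacitance,
# and FUs clocked at f0/2 can run from ~0.58 v0.
C_PARALLEL, V_REDUCED = 2.15, 0.58

case1 = relative_power(1.0, 1.0, 1.0)                 # T0 with one FU
case2 = relative_power(C_PARALLEL, 1.0, 1.0)          # same f and v: about 2x throughput, more power
case3 = relative_power(C_PARALLEL, V_REDUCED, 0.5)    # each FU at f0/2: throughput stays T0
print(case1, case2, case3)
```

Under these assumptions case 3 delivers the original throughput for about 36% of the original power. Without the voltage reduction (v_ratio = 1), the same configuration would draw about 1.075x the original power, so the saving comes from the lower voltage that parallelism makes possible.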
101. Examples and Case study
• Usage of redundant arithmetic
• Usage of alternative number representation
(normalized / Gray coded)
• Usage of running transforms
____________________________________________
• Design of an alternative arithmetic unit (e.g. CORDIC)
• Design of an FFT address generator
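103. CORDIC Algorithm – a simple, smart way of computing trigonometric quantities (e.g., $\cos\theta$) in digital hardware and of realizing multiplierless architectures.
CORDIC: “Coordinate Rotation Digital Computer”
Define a set of basic CORDIC angles $\alpha_k$, $0^\circ < \alpha_k < 90^\circ$:
• $\alpha_k = \tan^{-1}(2^{-k}), \quad k = 0, 1, 2, \ldots$
• $\alpha_0 = 45^\circ > \alpha_1 > \alpha_2 > \cdots > \alpha_{k-1} > \alpha_k > \cdots > 0$
Given an angle $\theta$, $0^\circ \le \theta \le 90^\circ$, we can write
$$\theta = \sum_{i=0}^{\infty} \mu_i \alpha_i, \quad \text{where } \mu_i \in \{1, -1\}$$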
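104. In practice, the summation is truncated to a finite number of terms, say $M$ (called the wordlength):
$$\theta \approx \sum_{i=0}^{M-1} \mu_i \alpha_i$$
The sequence $\mu_i$ is generated as:
$\theta(0) = \theta$;
For $i = 0$ to $M-1$
    $\mu_i = \mathrm{sign}(\theta(i))$;
    $\theta(i+1) = \theta(i) - \mu_i \alpha_i$;
End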
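The angle-decomposition loop translates directly into Python. This is a sketch in floating point with angles in radians; sign(0) is taken as +1, which the slide leaves unspecified. The returned residual is θ(M), the part of the angle not yet covered by the M basic angles.

```python
import math


def cordic_angles(m: int):
    """Basic CORDIC angles alpha_k = atan(2^-k), k = 0..M-1, in radians."""
    return [math.atan(2.0 ** -k) for k in range(m)]


def decompose(theta: float, m: int):
    """Return (mu_0..mu_{M-1}, residual) with theta = sum(mu_i * alpha_i) + residual."""
    residual, mus = theta, []
    for alpha in cordic_angles(m):
        mu = 1 if residual >= 0 else -1    # mu_i = sign(theta(i)); sign(0) taken as +1
        mus.append(mu)
        residual -= mu * alpha             # theta(i+1) = theta(i) - mu_i * alpha_i
    return mus, residual


mus, err = decompose(math.radians(30), 20)
print(mus[:6], err)
```

For angles between 0° and 90°, the residual after M steps is bounded by the last basic angle $\alpha_{M-1} = \tan^{-1}(2^{-(M-1)})$. Each extra term therefore adds roughly one bit of angular precision, which is why M is called the wordlength.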
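105. CORDIC Rotation
Rotating the point $(x, y)$ by $\theta$ gives $(x', y')$:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \cos\theta \begin{bmatrix} 1 & -\tan\theta \\ \tan\theta & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \mathrm{Rot}(\theta) \begin{bmatrix} x \\ y \end{bmatrix}$$
Taking $\theta$ as the truncated sum $\sum_{k=0}^{M-1} \mu_k \alpha_k$,
$$\mathrm{Rot}(\theta) = \mathrm{Rot}(\mu_{M-1}\alpha_{M-1}) \cdots \mathrm{Rot}(\mu_1\alpha_1)\,\mathrm{Rot}(\mu_0\alpha_0)$$
and, since $\cos(\mu_k\alpha_k) = \cos\alpha_k$ and $\tan(\mu_k\alpha_k) = \mu_k \tan\alpha_k$ for $\mu_k = \pm 1$,
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \left(\prod_{k=0}^{M-1} \cos\alpha_k\right) \begin{bmatrix} 1 & -\mu_{M-1}\tan\alpha_{M-1} \\ \mu_{M-1}\tan\alpha_{M-1} & 1 \end{bmatrix} \cdots \begin{bmatrix} 1 & -\mu_0\tan\alpha_0 \\ \mu_0\tan\alpha_0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$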
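106. Since $\tan\alpha_k = 2^{-k}$:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \left(\prod_{k=0}^{M-1} \cos\alpha_k\right) \begin{bmatrix} 1 & -\mu_{M-1}2^{-(M-1)} \\ \mu_{M-1}2^{-(M-1)} & 1 \end{bmatrix} \cdots \begin{bmatrix} 1 & -\mu_0 2^{0} \\ \mu_0 2^{0} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
• $K = \prod_{k=0}^{M-1} \cos\alpha_k$: universal constant (it depends only on $M$ and tends to about 0.6073 as $M$ grows)
• Elementary rotations need only shifting operations and can be pipelined
• Shifting done by direct bus connections
[Figure: a shift by $2^{-i}$, $v = 2^{-i}u$, realized by wiring bus u to bus v]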
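107. Rotation by $\mu_i\alpha_i$ (i-th stage of a pipelined CORDIC unit):
$$x_{i+1} = x_i - \mu_i 2^{-i} y_i, \qquad y_{i+1} = y_i + \mu_i 2^{-i} x_i, \qquad \mu_i = \mathrm{sign}(\theta(i))$$
[Figure: Pipelined CORDIC Unit (PCU): $x_0 = x$ and $y_0 = y$ enter stage Rot($\mu_0\alpha_0$); pipeline latches separate the stages, which produce $(x_1, y_1), \ldots, (x_{M-1}, y_{M-1})$; the last stage Rot($\mu_{M-1}\alpha_{M-1}$) produces $(x_M, y_M)$]
After the last stage, $x' = \left(\prod_{k=0}^{M-1}\cos\alpha_k\right) x_M$ and $y' = \left(\prod_{k=0}^{M-1}\cos\alpha_k\right) y_M$.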
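A bit-accurate software model of the pipelined stages shows that the loop needs only shifts and additions. This is a sketch under assumed parameters: a fixed-point format with 30 fractional bits, M = 24 stages, and a final multiplication by the constant K done in floating point for readability. Each loop iteration corresponds to one pipeline stage; a hardware PCU would also need a latch between stages.

```python
import math

FRAC_BITS = 30                 # assumed fixed-point format: real value = integer / 2**30
SCALE = 1 << FRAC_BITS
M = 24                         # assumed word length = number of CORDIC stages

# alpha_k = atan(2^-k) as fixed-point constants (a small ROM in hardware)
ALPHA = [round(math.atan(2.0 ** -k) * SCALE) for k in range(M)]
# Universal constant K = prod(cos alpha_k), applied once after the last stage
K = math.prod(math.cos(math.atan(2.0 ** -k)) for k in range(M))


def cordic_rotate(x: float, y: float, theta: float):
    """Rotate (x, y) by theta radians (|theta| <= 90 deg); the loop uses only shifts and add/sub."""
    xi, yi, residual = round(x * SCALE), round(y * SCALE), round(theta * SCALE)
    for i in range(M):
        if residual >= 0:          # mu_i = +1
            xi, yi = xi - (yi >> i), yi + (xi >> i)
            residual -= ALPHA[i]
        else:                      # mu_i = -1
            xi, yi = xi + (yi >> i), yi - (xi >> i)
            residual += ALPHA[i]
    return K * xi / SCALE, K * yi / SCALE


print(cordic_rotate(1.0, 0.0, math.radians(30)))   # approximately (cos 30 deg, sin 30 deg)
```

Starting from (1, 0), the outputs approximate $\cos\theta$ and $\sin\theta$ to roughly $\alpha_{M-1} \approx 2^{-23}$. In hardware the `>> i` becomes fixed wiring, and the constant K can be folded into the initial values.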
108. CONCLUSION
• There ALWAYS EXISTS A BETTER SOLUTION than the present one, and we can think of it.
• But there ALWAYS EXISTS A BETTER SOLUTION than what we can think of!