Telecommunications Engineering Master
Embedded Systems : Introduction
Higher Institute for Applied Sciences and Technology
Dr. Daoud KARAKOULA
Module Objectives
• Obtain experience in hardware/software design of embedded systems
• Learn how to move from algorithm to architecture
• Learn about interfacing protocols in embedded systems
• Introduce high-level programming languages for describing embedded systems (ES)
• Introduce modern design issues: SoC, NoC, co-design, …
Syllabus
• Embedded Processors and Memory
• Embedded Systems I/O
  - Interfacing bus, protocols, timers, A/D and D/A, …
• Embedded Communications
  - Parallel/serial communication
  - Wireless communication
  - Network communication, …
• Processors and FPGA
• Design of Embedded Processors (HDL)
Course material: resources & references
• Embedded System Design: A Unified Hardware/Software Introduction, by F. Vahid (UCR) and T. Givargis (UCI)
• Embedded System Design, Peter Marwedel
• The Art of Designing Embedded Systems, Jack Ganssle
• The Embedded Systems course of Dr. Amer Baghdadi
What are Embedded Systems ?
• The embedding of microprocessors into equipment and portable devices started before the appearance of the home computer
• Embedded systems consume the majority of the microprocessors made today
☺ Huge application domain
☺ Prototyping boards
What are Embedded Systems ?
Definition
An embedded system is nearly any computing system other than a general-purpose computer (desktop, laptop, or mainframe).
An embedded system is a microprocessor-based system that is built to control a function or range of functions and is not designed to be programmed by the end user.
What are Embedded Systems ?
Hardware and Software
Modern design requires a designer to have a unified view of software and hardware
• Integrated circuit (IC) capacities
• Quality compiler availability
• Synthesis technology
[Figure: layered view of an embedded system. Software: application software, OS (resource management), software communication (drivers, interrupts). Hardware: hardware communication network, CPUs (DSP, MCU), IPs, memories.]
Examples of Embedded Systems
• Front panel of a microwave oven: simple control
• MP3 player: 32-bit µP
• GPS receiver: 16-bit µP
• Palm VX: 32-bit µP, Motorola Dragonball EZ
• Canon EOS-3 camera: 3 µPs, a 32-bit RISC CPU runs the auto-focus
• Nokia 6620-g: 32-bit RISC CPU, ARM-9
• iPhone 3G: ARM11 processor
  - 64-bit data-path
  - 8-stage pipeline
  - Can vary in clock speed up to 700 MHz or more
  - ARM Intelligent Energy Manager (reduces power consumption by 25-50%)
  - Features vector floating point coprocessor
  - ARM Jazelle enabled for embedded Java execution
Characteristics of Embedded Systems
• Single-functioned
• Real-time operation
• Constrained physical size and weight
• Low manufacturing cost
• No general-purpose processor of the kind found in desktop computers
• Need to work with restricted memory
• Low power
  - Power consumption is critical in battery-powered devices
Design Challenges
• How much hardware do we need?
  What is the word size of the CPU? The size of memory?
• How to minimize power?
  Reduce memory accesses
• How to speed up our design?
  Introduce parallelism, pipelining
• How to reduce the NRE (non-recurring engineering) cost?
  NRE is the one-time cost of designing the system
• Expertise with both software and hardware is needed to optimize design metrics
• Improving one metric may worsen others
[Figure: competing design metrics: size, performance, power, NRE cost]
Architecture of Embedded Systems
Processors
Desired functionality:
total = 0
for i = 1 to N loop
   total += M[i]
end loop
The desired functionality can be implemented on a general-purpose processor, an application-specific processor, or a single-purpose processor.
Introduction
• Processor
  - Digital circuit that performs computation tasks
  - Controller and datapath
  - General-purpose: variety of computation tasks
  - Single-purpose: one particular computation task
  - Custom single-purpose: non-standard task
• A custom single-purpose processor may be
  - Fast, small, low power
  - But high NRE, longer time-to-market, less flexible
[Figure: digital camera chip: lens, CCD, CCD preprocessor, pixel coprocessor, A2D, D2A, µProcessor, JPEG codec, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl, multiplier/accumulator]
Combinational logic: basic logic gates
One-input gates:
x    Buffer   Inverter
0    0        1
1    1        0
Two-input gates:
x y    AND   OR   XOR   NAND   NOR   XNOR
0 0    0     0    0     1      1     1
0 1    0     1    1     1      0     0
1 0    0     1    1     1      0     0
1 1    1     1    0     0      0     1
Combinational logic: basic functions
• Mux (m x 1): data inputs E(m-1) … E0 (n bits each), select S (log m bits), output Q (n bits)
  Q = E0 if S=0..00
      E1 if S=0..01
      …
      E(m-1) if S=1..11
• Decoder: inputs E(log n – 1) … E0, outputs Q(n-1) … Q0
  Q0 = 1 if E=0..00
  Q1 = 1 if E=0..01
  …
  Q(n-1) = 1 if E=1..11
  With enable input en: en=0 gives outputs = 0..00
• Adder (n-bit): inputs A, B (n bits), outputs Sum (n bits) and C
  Sum = A+B (first n bits)
  C = (n+1)'th bit of A+B (C: carry)
  With input Cin: Sum = A + B + Cin
• Comparator (n-bit): inputs A, B (n bits), outputs I, E, S
  S = 1 if A>B
  E = 1 if A=B
  I = 1 if A<B
• ALU (n bits, m operations): inputs A, B (n bits), select S (log m bits), output Q (n bits)
  Q = A op B, op determined by S
  May have status outputs: carry, zero, etc.
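The behaviour of these blocks can be sketched in C. This is only an illustrative model under assumptions not in the slides: a fixed width n = 8, a 4 x 1 mux, a 2-to-4 decoder, and the hypothetical names `adder8`, `comparator8`, `mux4` and `decoder2to4`.

```c
#include <stdint.h>

/* Hypothetical 8-bit (n = 8) models of the combinational blocks. */

typedef struct { uint8_t sum; uint8_t carry; } adder_out;

/* Adder with carry-in: Sum = first n bits of A + B + Cin, C = (n+1)'th bit. */
adder_out adder8(uint8_t a, uint8_t b, uint8_t cin) {
    unsigned wide = (unsigned)a + (unsigned)b + (cin & 1u);
    adder_out r;
    r.sum = (uint8_t)(wide & 0xFFu);
    r.carry = (uint8_t)((wide >> 8) & 1u);
    return r;
}

typedef struct { uint8_t s, e, i; } cmp_out;

/* Comparator: S = 1 if A>B, E = 1 if A=B, I = 1 if A<B. */
cmp_out comparator8(uint8_t a, uint8_t b) {
    cmp_out r;
    r.s = (uint8_t)(a > b);
    r.e = (uint8_t)(a == b);
    r.i = (uint8_t)(a < b);
    return r;
}

/* 4 x 1 mux: Q = E[S], with a 2-bit select. */
uint8_t mux4(const uint8_t e[4], uint8_t s) {
    return e[s & 3u];
}

/* 2-to-4 decoder with enable: one-hot output, all zeros when en = 0. */
uint8_t decoder2to4(uint8_t e, uint8_t en) {
    return en ? (uint8_t)(1u << (e & 3u)) : (uint8_t)0;
}
```

For instance, `adder8(200, 100, 0)` wraps around to a sum of 44 with the carry set, which is the (n+1)'th bit mentioned in the adder description.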
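Sequential logic: basic functions
(Q+ is the next value of Q; "clk" denotes the active clock edge.)
• D flip-flop (D-FF): inputs D, clk, Init; output Q
  Q+ = 0 if Init=1,
       D if clk,
       Q otherwise
• Register (n-bit): inputs E (n bits), load, clk, Init; output Q (n bits)
  Q+ = 0 if Init=1,
       E if load=1 and clk,
       Q otherwise
• Shift register (n-bit): inputs E, Shift, clk, Init; output Q
  Q+ = 0 if Init=1,
       LSB if Shift=1 and clk
       - content shifted
       - E stored in MSB
• Counter (n-bit): inputs en, clk, Init; output Q (n bits)
  Q+ = 0 if Init=1,
       Q+1 if en=1 and clk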
Custom single-purpose processor basic model
• Controller + datapath
  - Controller: receives external control inputs and the datapath state signals; produces external control outputs and the datapath control signals
  - Datapath: receives external data inputs; produces external data outputs
• A view inside the controller and datapath
  - Controller: state register plus combinational logic (control logic and next state)
  - Datapath: registers and functional units
Example: Greatest Common Divisor
• First, write the algorithm
(a) black-box view: block GCD with inputs clk, go_i, x_i, y_i and output d_o
(b) desired functionality:
0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }
GCD(42, 8): loop of 9 iterations
evolution of (x,y) : ?
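To follow the evolution of (x, y), the inner loop can be run in software. The sketch below is an assumption-laden rewrite of lines 5 to 9 as a C function: the name `gcd_subtract` and the `steps` counter are hypothetical, the `go_i` handshake is left out, and both inputs are assumed to be strictly positive (a zero input would never terminate).

```c
/* Subtraction-based GCD, mirroring lines 5-9 of the slide's algorithm.
 * Assumes x_i > 0 and y_i > 0. If steps is non-null, it receives the
 * number of subtractions performed. */
unsigned gcd_subtract(unsigned x_i, unsigned y_i, unsigned *steps) {
    unsigned x = x_i;
    unsigned y = y_i;
    unsigned n = 0;
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        n++;                /* one subtraction per pass through the loop */
    }
    if (steps)
        *steps = n;
    return x;               /* value driven on d_o */
}
```

For GCD(42, 8) this performs 8 subtractions, that is, 9 successive (x, y) values when the starting pair is counted.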
Example: Greatest Common Divisor
• Convert the algorithm to a "complex" state machine
  - Known as FSMD: finite-state machine with datapath
  - Templates can be used to perform such a conversion
(b) state diagram (FSMD), given as state: action, transitions
1:    (while (1))      1 → 2;  !1 → 1-J
2:    (while (!go_i))  !go_i → 2-J;  !(!go_i) → 3
2-J:                   → 2
3:    x = x_i          → 4
4:    y = y_i          → 5
5:    (while (x != y)) x!=y → 6;  !(x!=y) → 9
6:    (if (x < y))     x<y → 7;  !(x<y) → 8
7:    y = y - x        → 6-J
8:    x = x - y        → 6-J
6-J:                   → 5-J
5-J:                   → 5
9:    d_o = x          → 1-J
1-J:                   → 1
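State diagram templates
• Assignment statement
  a = b
  next statement
  Template: one state performing a = b, followed by the next statement
• Loop statement
  while (cond) {
     loop-body-statements
  }
  next statement
  Template: condition state C: with transition cond to the loop-body statements, which lead to join state J: and back to C:; transition !cond leads to the next statement
• Branch statement
  if (c1)
     c1 stmts.
  else if c2
     c2 stmts.
  else
     other stmts
  next statement
  Template: condition state C: with transitions c1 to the c1 stmts, !c1*c2 to the c2 stmts, and !c1*!c2 to the others; all branches meet at join state J:, which leads to the next statement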
Datapath
Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
  - Based on reads and writes
  - Use multiplexors for multiple sources
• Create a unique identifier for each datapath component control input and output
[Figure: GCD datapath. Inputs x_i and y_i (n bits) enter two 2x1 muxes (x_sel, y_sel) feeding registers 0: x and 0: y (x_ld, y_ld). Functional units: comparator != (5: x!=y, output x_neq_y), comparator < (6: x<y, output x_inf_y), subtractor (8: x-y), subtractor (7: y-x). Register 9: d (d_ld) drives d_o.]
Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations
Controller FSM (state code, state, datapath configuration, transitions):
0000  1:                       → 2
0001  2:                       !go_i → 2-J;  go_i → 3
0010  2-J:                     → 2
0011  3:    x_sel=0, x_ld=1    → 4
0100  4:    y_sel=0, y_ld=1    → 5
0101  5:                       x_neq_y → 6;  !x_neq_y → 9
0110  6:                       x_inf_y → 7;  !x_inf_y → 8
0111  7:    y_sel=1, y_ld=1    → 6-J
1000  8:    x_sel=1, x_ld=1    → 6-J
1001  6-J:                     → 5-J
1010  5-J:                     → 5
1011  9:    d_ld=1             → 1-J
1100  1-J:                     → 1
[Figure: the FSMD next to the controller FSM and the datapath it drives; x_i and y_i enter the datapath, go_i enters the controller, d_o leaves the datapath]
Splitting into a controller and datapath
• Control unit: the controller FSM, with inputs go_i, x_neq_y and x_inf_y and outputs x_sel, y_sel, x_ld, y_ld and d_ld
• Datapath: muxes, registers x, y and d, two subtractors and two comparators, with inputs x_i and y_i (n bits) and output d_o
Implementation model of the controller
• Combinational logic: inputs go_i, x_neq_y, x_inf_y and the current state Q3..Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld and the next state E3..E0
• State register: holds the current state and loads the next state on each clock
Why split?
Controller state table for the GCD example
Inputs: current state Q3 Q2 Q1 Q0, x_neq_y, x_inf_y, go_i. Outputs: next state Q3+ Q2+ Q1+ Q0+ (E3..E0), x_sel, y_sel, x_ld, y_ld, d_ld. "-" and "x" mark don't-care values.
Q3 Q2 Q1 Q0  x_neq_y x_inf_y go_i  |  Q3+ Q2+ Q1+ Q0+  x_sel y_sel x_ld y_ld d_ld
0  0  0  0   -       -       -     |  0   0   0   1    x     x     0    0    0
0  0  0  1   -       -       0     |  0   0   1   0    x     x     0    0    0
0  0  0  1   -       -       1     |  0   0   1   1    x     x     0    0    0
0  0  1  0   -       -       -     |  0   0   0   1    x     x     0    0    0
0  0  1  1   -       -       -     |  0   1   0   0    0     x     1    0    0
0  1  0  0   -       -       -     |  0   1   0   1    x     0     0    1    0
0  1  0  1   0       -       -     |  1   0   1   1    x     x     0    0    0
0  1  0  1   1       -       -     |  0   1   1   0    x     x     0    0    0
0  1  1  0   -       0       -     |  1   0   0   0    x     x     0    0    0
0  1  1  0   -       1       -     |  0   1   1   1    x     x     0    0    0
0  1  1  1   -       -       -     |  1   0   0   1    x     1     0    1    0
1  0  0  0   -       -       -     |  1   0   0   1    1     x     1    0    0
1  0  0  1   -       -       -     |  1   0   1   0    x     x     0    0    0
1  0  1  0   -       -       -     |  0   1   0   1    x     x     0    0    0
1  0  1  1   -       -       -     |  1   1   0   0    x     x     0    0    1
1  1  0  0   -       -       -     |  0   0   0   0    x     x     0    0    0
Completing the GCD custom single-purpose processor design
• We finished the datapath
• We have a state table for the next state and control logic
  - Remaining step: combinational logic design
• This is not an optimized design, but we see the basic steps
• FSM design entry: schematic, CAD tools (StateCAD), HDL
[Figure: controller (combinational logic for control and new state, state register) and datapath (registers, functional units)]
Optimizing single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
  - original program
  - FSMD
  - datapath
  - FSM
Optimizing the original program
• Analyze program attributes and look for areas of possible improvement
  - number of computations
  - size of variables
  - time and space complexity
  - operations used
    - multiplication and division are very expensive
Optimizing the original program (cont’)
Original program:
0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }
GCD(42, 8): 9 iterations to complete the loop, (x,y): (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
Optimized program: replace the subtraction operation(s) with the modulo operation in order to speed up the program.
0: int x, y, r;
1: while (1) {
2:    while (!go_i);
      // x must be the larger value
3:    if (x_i >= y_i) {
4:       x = x_i;
5:       y = y_i;
      }
6:    else {
7:       x = y_i;
8:       y = x_i;
      }
9:    while (y != 0) {
10:      r = x % y;
11:      x = y;
12:      y = r;
      }
13:   d_o = x;
   }
GCD(42, 8): 3 iterations to complete the loop, (x,y): (42, 8), (8, 2), (2, 0).
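The gain from the modulo version can be checked in software. The following is a minimal sketch, assuming a plain C function in place of the hardware handshake; the name `gcd_modulo` and the `steps` counter are hypothetical and not part of the slides.

```c
/* Modulo-based GCD, mirroring lines 3-13 of the optimized program.
 * If steps is non-null, it receives the number of modulo operations. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *steps) {
    unsigned x, y, r;
    unsigned n = 0;
    /* x must be the larger value */
    if (x_i >= y_i) {
        x = x_i;
        y = y_i;
    } else {
        x = y_i;
        y = x_i;
    }
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        n++;                /* one modulo operation per pass */
    }
    if (steps)
        *steps = n;
    return x;               /* value driven on d_o */
}
```

For GCD(42, 8) the loop body runs twice, producing the three (x, y) values (42, 8), (8, 2), (2, 0). The trade-off noted earlier in the deck still applies: a modulo unit is a far more expensive functional unit than a subtractor.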
Optimizing the FSMD
• Areas of possible improvement
  - merge states
    - states with constants on transitions can be eliminated, since the transition taken is already known
    - states with independent operations can be merged
  - separate states
    - states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
  - scheduling
Optimizing the FSMD (cont.)
Starting from the original FSMD:
• eliminate state 1: transitions have constant values
• merge state 2 and state 2J: no loop operation in between them
• merge state 3 and state 4: assignment operations are independent of one another
• merge state 5 and state 6: transitions from state 6 can be done in state 5
• eliminate state 5J and 6J: transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J: transition from state 1-J can be done directly from state 9
Optimized FSMD (int x, y;), given as state: action, transitions
2:                       !go_i → 2;  go_i → 3
3:  x = x_i, y = y_i     → 5
5:                       x<y → 7;  x>y → 8;  otherwise → 9
7:  y = y - x            → 5
8:  x = x - y            → 5
9:  d_o = x              → 2
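The optimized FSMD can be simulated one state per clock cycle. The sketch below is an illustration under assumptions that are not in the slides: `go_i` is taken to be asserted from the first cycle, the simulation stops after state 9 instead of returning to state 2, both inputs are strictly positive, and the names `gcd_fsmd`, `state_t` and `cycles` are hypothetical.

```c
/* States of the optimized FSMD (numbering follows the slide). */
typedef enum { S2, S3, S5, S7, S8, S9 } state_t;

/* Runs the optimized FSMD once and returns d_o.
 * If cycles is non-null, it receives the number of states visited. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *cycles) {
    state_t s = S2;
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    int go_i = 1;           /* assumption: go_i already asserted */
    int done = 0;

    while (!done) {
        n++;                /* one state per clock cycle */
        switch (s) {
        case S2: s = go_i ? S3 : S2;                    break;
        case S3: x = x_i; y = y_i; s = S5;              break;
        case S5: s = (x < y) ? S7 : (x > y) ? S8 : S9;  break;
        case S7: y = y - x; s = S5;                     break;
        case S8: x = x - y; s = S5;                     break;
        case S9: d_o = x; done = 1;                     break;
        }
    }
    if (cycles)
        *cycles = n;
    return d_o;
}
```

Under these assumptions GCD(42, 8) takes 20 cycles: states 2 and 3, eight passes through state 5 followed by state 7 or 8, a final visit to state 5, and state 9.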
Optimizing the datapath
• Sharing of functional units
  - one-to-one mapping, as done previously, is not necessary
  - if the same operation occurs in different states, they can share a single functional unit
• Multi-functional units
  - ALUs support a variety of operations and can be shared among operations occurring in different states
Optimizing the FSM
• State encoding
  - task of assigning a unique bit pattern to each state in an FSM
  - size of state register and combinational logic vary with the encoding
• State minimization
  - task of merging equivalent states into a single state
  - two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
Introduction to GPP
• General-Purpose Processor
  - Processor designed for a variety of computation tasks
  - Low unit cost, in part because the manufacturer spreads NRE over large numbers of units
    - Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
  - Carefully designed since higher NRE is acceptable
    - Can yield good performance, size and power
  - Low NRE cost for the user, short time-to-market/prototype, high flexibility
    - User just writes software; no processor design
  - a.k.a. “microprocessor”: “micro” used when they were implemented on one or a few chips rather than entire rooms
Basic Architecture
• Control unit and datapath
  - Note similarity to single-purpose processor
• Key differences
  - Datapath is general
  - Control unit doesn’t store the algorithm: the algorithm is “programmed” into the memory
[Figure: processor with a control unit (controller, PC, IR) and a datapath (ALU, registers), linked by control and status signals, connected to memory and I/O]
Datapath Operations
• Load
  - Read memory location into register
• ALU operation
  - Input certain registers through ALU, store back in register
• Store
  - Write register to memory location
[Figure: the value 10 is loaded from memory into a register, incremented by the ALU (+1) to 11, and stored back to memory]
Control Unit
• Control unit: configures the datapath operations
  - Sequence of desired operations (“instructions”) stored in memory: the “program”
• Instruction cycle: broken into several sub-operations, each one clock cycle, e.g.:
  - Fetch instruction: get next instruction into IR
  - Decode: determine what the instruction means
  - Fetch operands: move data from memory to datapath register
  - Execute: move data through the ALU
  - Store: write data from register to memory
Example memory contents:
100: load R0, M[500]
101: inc R1, R0
102: store M[501], R1
…
500: 10
501: …
Control Unit Sub-Operations
• Fetch instruction
  - Get next instruction into IR
  - PC: program counter, always points to next instruction
  - IR: holds the fetched instruction
  - Example: PC = 100 is used as the memory address; IR receives load R0, M[500]
• Decode
  - Determine what the instruction means
• Fetch operands
  - Move data from memory to datapath register
  - Example: the value 10 stored at M[500] is moved into the datapath
• Execute
  - Move data through the ALU
  - This particular instruction (load R0, M[500]) does nothing during this sub-operation
• Store
  - Write data from register to memory
  - This particular instruction (load R0, M[500]) does nothing during this sub-operation
Instruction Cycles
Each instruction goes through fetch inst., decode, fetch operands, exec. and store results, one sub-operation per clock cycle (clk).
PC=100: load R0, M[500]     R0 = 10
PC=101: inc R1, R0          R1 = 11
PC=102: store M[501], R1    M[501] = 11
What is the problem with this processor ?
Architectural Considerations
• Performance can be improved by:
  - Faster clock (but there’s a limit)
  - Pipelining: slice up instruction into stages, overlap stages
  - Multiple ALUs to support more than one instruction stream
    - Superscalar and VLIW architectures
Clock Frequency
• Inverse of clock period
• The clock period must be longer than the longest register-to-register delay in the entire processor
• Memory access is often the longest
Pipelining: Increasing Instruction Throughput
• Dish-cleaning analogy with two available resources (wash, dry) and 8 dishes
  - Non-pipelined dish cleaning: each dish is washed and then dried before the next dish is started
  - Pipelined dish cleaning: dish k+1 is washed while dish k is dried
• Pipelined instruction execution: the five stages (fetch inst., decode, fetch ops., execute, store res.) each work on a different instruction, so instructions 1 to 8 overlap in time
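The throughput gain can be quantified with a small calculation. This is a sketch under idealized assumptions that are not stated on the slides: every stage takes exactly one time unit and the pipeline never stalls. The function names are hypothetical.

```c
/* Time units needed to process n items through a given number of stages. */

/* Non-pipelined: each item occupies all stages before the next one starts. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long stages) {
    return n * stages;
}

/* Ideal pipeline: the first item needs `stages` units to fill the pipeline,
 * then one further item completes every unit. */
unsigned long cycles_pipelined(unsigned long n, unsigned long stages) {
    return (n == 0) ? 0 : stages + n - 1;
}
```

With 8 dishes and 2 stages this gives 16 time units without pipelining and 9 with it; with 8 instructions and 5 stages, 40 versus 12.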
Superscalar Architecture
• Superscalar
  - Scalar operation: executing on one or two numbers
  - Fetch instructions in packets
  - Static scheduling (at compilation time) or dynamic scheduling (at execution time)
  - Dynamic scheduling needs a complex hardware block to detect the independent instructions
[Figure: a sequential instruction flow comes from cache/memory; multiple instructions are fetched and decoded, then issued to several functional units (FU) that share the registers]
VLIW Architecture
• VLIW (Very Long Instruction Word): long instruction (128-1024 bits) composed of several independent operations (rather than one)
• Equivalent to a superscalar architecture with static scheduling
• More and more widespread
[Figure: one multi-operation instruction is fetched from cache/memory and its operations are issued to several functional units (FU) that share the registers]
Superscalar vs. VLIW
• Superscalar
  - HW detects potential parallelism, register renaming
  - very complex HW, execution window is limited
  - e.g. PowerPC, Pentium, AMD K5
• VLIW
  - parallelism detection at compile time
  - simpler hardware, whole program is analyzed
  - large registers, large code size (wasted bits in instruction word)
  - e.g. TMS320C6x (multimedia), IA64 (servers & workstations)
Two Memory Architectures
• Princeton (Von Neumann): the processor uses a single memory for program and data
  - Fewer memory wires
  - Simple implementation
• Harvard: the processor uses a separate program memory and data memory
  - Simultaneous program and data memory access
The Von Neumann model is generally the most used.
Harvard vs. Princeton:
  - Harvard: more control signals; computation speed is higher
  - Princeton: fewer control signals; no parallelism
Cache Memory
• Memory access may be slow
• Cache is small but fast memory close to processor
  - Holds copy of part of memory
  - Fast/expensive technology, usually on the same chip
• Main memory: slower/cheaper technology, usually on a different chip
• Hits and misses
  - Hit: the memory address is in the cache
  - Miss: it is not; the cache is updated
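The hit/miss behaviour can be illustrated with a toy model. The slides do not say how the cache is organized, so everything specific here is an assumption: a direct-mapped cache with 8 one-word lines, and the hypothetical names `cache_t`, `cache_init` and `cache_access`.

```c
#include <stdint.h>
#include <string.h>

#define LINES 8   /* assumed cache size: 8 one-word lines */

typedef struct {
    int      valid[LINES];
    uint32_t tag[LINES];
    unsigned hits;
    unsigned misses;
} cache_t;

void cache_init(cache_t *c) {
    memset(c, 0, sizeof *c);
}

/* Returns 1 on a hit, 0 on a miss. On a miss the line is updated so that
 * the cache now holds a copy of the requested address. */
int cache_access(cache_t *c, uint32_t addr) {
    uint32_t index = addr % LINES;   /* which line the address maps to */
    uint32_t tag   = addr / LINES;   /* identifies the address within the line */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    c->valid[index] = 1;
    c->tag[index] = tag;
    c->misses++;
    return 0;
}
```

Accessing address 500 twice gives a miss followed by a hit. Address 508 maps to the same line in this model, so it evicts 500 and the next access to 500 misses again.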
Programmer’s View
• Programmer doesn’t need detailed understanding of architecture
  - Instead, needs to know what instructions can be executed
• Two levels of instructions:
  - Assembly level
  - Structured languages (C, C++, Java, etc.)
• Most development today is done using structured languages
  - But some assembly-level programming may still be necessary
  - Drivers: portion of program that communicates with and/or controls (drives) another device
    - Often have detailed timing considerations, extensive bit manipulation
    - Assembly level may be best for these
Assembly-Level Instructions
Instruction 1: opcode operand1 operand2
Instruction 2: opcode operand1 operand2
Instruction 3: opcode operand1 operand2
Instruction 4: opcode operand1 operand2
...
• Instruction Set
  - Defines the legal set of instructions for that processor
    - Data transfer: memory/register, register/register, I/O, etc.
    - Arithmetic/logical: move register through ALU and back
    - Branches: determine next PC value when not just PC+1
Addressing Modes
Addressing mode      Operand field       Register-file contents   Memory contents
Immediate            data
Register-direct      register address    data
Register indirect    register address    memory address           data
Direct               memory address                               data
Indirect             memory address                               memory address, then data
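How each mode reaches its data can be written as a small lookup function. This is a sketch with assumed sizes (16 registers, 256 bytes of memory, 8-bit operands) and hypothetical names (`addr_mode`, `fetch_operand`); none of them come from the slides.

```c
#include <stdint.h>

typedef enum {
    IMMEDIATE,
    REGISTER_DIRECT,
    REGISTER_INDIRECT,
    DIRECT,
    INDIRECT
} addr_mode;

/* Returns the data designated by the operand field under the given mode. */
uint8_t fetch_operand(addr_mode mode, uint8_t operand,
                      const uint8_t regs[16], const uint8_t mem[256]) {
    switch (mode) {
    case IMMEDIATE:         return operand;                    /* field is the data         */
    case REGISTER_DIRECT:   return regs[operand & 0x0F];       /* register holds the data   */
    case REGISTER_INDIRECT: return mem[regs[operand & 0x0F]];  /* register holds an address */
    case DIRECT:            return mem[operand];               /* field is an address       */
    case INDIRECT:          return mem[mem[operand]];          /* memory holds an address   */
    }
    return 0;
}
```

The number of lookups grows from zero (immediate) to two memory reads (indirect), which is why the more indirect modes cost more access time.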
A Simple Instruction Set
Each instruction has an opcode followed by operands.
Assembler instruction   First byte   Second byte   Operation
MOV Rn, direct          0000 Rn      direct        Rn = M(direct)
MOV direct, Rn          0001 Rn      direct        M(direct) = Rn
MOV @Rn, Rm             0010 Rn      Rm            M(Rn) = Rm
MOV Rn, #immed.         0011 Rn      immediate     Rn = immediate
ADD Rn, Rm              0100 Rn      Rm            Rn = Rn + Rm
SUB Rn, Rm              0101 Rn      Rm            Rn = Rn - Rm
JZ Rn, relative         0110 Rn      relative      PC = PC + relative (only if Rn = 0)
Sample Programs
C program:
int total = 0;
for (int i=10; i!=0; i--)
   total += i;
// next instructions...
Equivalent assembly program:
0        MOV R0, #0;    // total = 0
1        MOV R1, #10;   // i = 10
2        MOV R2, #1;    // constant 1
3        MOV R3, #0;    // constant 0
Loop:    JZ R1, Next;   // jump if i = 0
5        ADD R0, R1;    // total += i
6        SUB R1, R2;    // i--
7        JZ R3, Loop;   // jump always (R3 = 0)
Next:    // next instructions...
Programmer Considerations
• Program and data memory space
  - Embedded processors often very limited
    - e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• N-bit processor
  - N-bit ALU, registers, buses, memory data interface
  - Embedded: 8-bit, 16-bit, 32-bit common
  - Desktop/servers: 32-bit, 64-bit
• Registers: how many are there?
  - Only a direct concern for assembly-level programmers
• I/O
  - How to communicate with external signals?
• Interrupts
Application-Specific Instruction-Set Processors (ASIPs)
• General-purpose processors
  - Sometimes too general to be effective in a demanding application
    - e.g., video processing: requires huge video buffers and operations on large arrays of data, inefficient on a GPP
  - But a single-purpose processor has high NRE and is not programmable
• ASIPs: targeted to a particular domain
  - Contain architectural features specific to that domain
    - e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
  - Still programmable
A Common ASIP: Digital Signal Processors (DSP)
• For signal processing applications
  - Large amounts of digitized data, often streaming
  - Data transformations must be applied fast
  - e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
  - Several instruction execution units
  - Multiply-accumulate single-cycle instruction, other instrs.
  - Efficient vector operations, e.g., add two arrays
    - Vector ALUs, loop buffers, etc.
Example: TMS320C67x core
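The multiply-accumulate (MAC) operation is the inner step of most filters. As a minimal sketch, the C function below computes one output sample of an FIR filter; the name `fir_mac`, the 16-bit sample type and the 32-bit accumulator are illustrative assumptions, and a real DSP would execute each loop iteration as a single-cycle MAC instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* One FIR output sample: the sum over k of h[k] * x[k]. */
int32_t fir_mac(const int16_t *h, const int16_t *x, size_t taps) {
    int32_t acc = 0;
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)h[k] * (int32_t)x[k];   /* one multiply-accumulate */
    }
    return acc;
}
```

With coefficients {1, 2, 3} and samples {4, 5, 6} the result is 4 + 10 + 18 = 32.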
Another Common ASIP: Microcontroller
• For embedded control applications
  - Reading sensors, setting actuators
  - Mostly dealing with events (bits): data is present, but not in huge amounts
  - e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
• Microcontroller features
  - On-chip peripherals
    - Timers, analog-digital converters, serial communication, etc.
    - Tightly integrated for programmer, typically part of register space
  - On-chip program and data memory
  - Direct programmer access to many of the chip’s pins
  - Specialized instructions for bit-manipulation and other low-level operations
Example: Sharp LH77790B
Trend: Even More Customized ASIPs
• In the past, microprocessors were acquired as chips
• Today, we increasingly acquire a processor as Intellectual Property (IP)
  - e.g., synthesizable VHDL model
• Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
  - Can have significant performance, power and size impacts
  - Problem: need compiler/debugger for customized ASIP
    - Remember, most development uses structured languages
    - One solution: automatic compiler/debugger generation
      - e.g., www.tensillica.com
    - Another solution: retargetable compilers
      - e.g., www.improvsys.com (customized VLIW architectures)
Selecting a Microprocessor
• Issues
  - Technical: speed, power, size, cost
  - Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor’s speed?
  - Clock speed, but instructions per cycle may differ
  - Instructions per second, but work per instr. may differ
  - Dhrystone: synthetic benchmark, developed in 1984. Dhrystones/sec.
    - MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
      - So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
  - SPEC: set of more realistic benchmarks, but oriented to desktops
  - EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
    - Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
Presentation of the elementary processor
• 8-bit general-purpose processor
• Based on an accumulator register called ACCU (8 bits)
• Each instruction is coded with 8 bits: two for the operation type (opcode) and 6 bits to code the operand or the address of the operand in the memory (depending on the operation type)
• Four instruction types
Mnemonic   Instruction coding   Description
NOR        00AAAAAA             ACCU = ACCU NOR Mem[AAAAAA]
ADD        01AAAAAA             ACCU = ACCU + Mem[AAAAAA], update Carry
STA        10AAAAAA             Mem[AAAAAA] = ACCU
JCC        11DDDDDD             If Carry = 0 ⇒ PC = DDDDDD, else clear Carry (Carry=0)
[Instruction set source: http://www.tuhh.de/~setb0209/cpu/ by T. Böscke]
Example of a test program
Adr      Mem content binary (hexa)   Instruction in assembler ; Comments
000000 : 00001000 (0x08) NOR 0b001000 ; ACCU = ACCU NOR M[001000]
000001 : 01000111 (0x47) ADD 0b000111 ; ACCU = ACCU + M[000111] (Carry)
000010 : 10000110 (0x86) STA 0b000110 ; M[000110] = ACCU
000011 : 11000100 (0xC4) JCC 0b000100 ; If Carry = 0 then PC = 000100 Else clear Carry
000100 : 11000100 (0xC4) JCC 0b000100 ; PC = 000100 (Carry is already cleared!)
000101 : 00000000 (0x00)
000110 : 00000000 (0x00)
000111 : 01111110 (0x7E)
001000 : 11111111 (0xFF)
001001 : 00000000 (0x00)
… … (data)
111111 : 00000000 (0x00)
Processor design (1/3)
• Considering the basic template architecture
• Considering the instruction set, the number of registers, and any architectural specifications/constraints
• And using the previously presented design methodology
[Figure: template processor with a control unit (controller, PC, IR) and a datapath (ALU, registers), linked by control and status signals, connected to memory and I/O]
Processor design (2/3)
Algorithm: FSMD ?
Clear PC, IR, Carry, Registers;
while (1) {
   Fetch Inst (get one instruction);
   Decode the instruction;
   if ( CodeOp=00 or CodeOp=01 )
   {
      Fetch Operand (get the operand);
      if CodeOp=00
         Execute NOR (ACCU = ACCU NOR M[AAAAAA]);
      else
         Execute ADD (ACCU = ACCU + M[AAAAAA]
                      Update Carry);
   }
   else if CodeOp=10
   {
      Execute STA (Mem[AAAAAA] = ACCU);
   }
   else
   {
      Execute JCC (if Carry=0
                      PC=DDDDDD
                   else
                      Carry=0);
   }
}
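The algorithm above can be tried out as a software model before moving to hardware. The sketch below is an instruction-level simulator, not the FSMD itself. It rests on assumptions the slides leave open: ADD sets Carry to the carry-out of the 8-bit addition (clearing it when there is none), PC is incremented right after the fetch, and the names `cpu_t`, `cpu_reset` and `cpu_step` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t mem[64];   /* 6-bit address space, 8-bit words */
    uint8_t accu;      /* accumulator ACCU                 */
    uint8_t pc;        /* program counter (6 bits used)    */
    uint8_t carry;     /* Carry flag                       */
} cpu_t;

/* Clear PC, Carry and ACCU; the memory contents are left untouched. */
void cpu_reset(cpu_t *c) {
    c->accu = 0;
    c->pc = 0;
    c->carry = 0;
}

/* Executes one instruction: fetch, decode, then execute. */
void cpu_step(cpu_t *c) {
    uint8_t ir = c->mem[c->pc & 0x3F];          /* fetch instruction   */
    uint8_t op = (uint8_t)(ir >> 6);            /* CodeOp = IR[7:6]    */
    uint8_t a  = (uint8_t)(ir & 0x3F);          /* operand = IR[5:0]   */
    unsigned sum;

    c->pc = (uint8_t)((c->pc + 1) & 0x3F);

    switch (op) {
    case 0:                                     /* NOR */
        c->accu = (uint8_t)~(c->accu | c->mem[a]);
        break;
    case 1:                                     /* ADD, update Carry   */
        sum = (unsigned)c->accu + (unsigned)c->mem[a];
        c->accu = (uint8_t)(sum & 0xFFu);
        c->carry = (uint8_t)((sum >> 8) & 1u);
        break;
    case 2:                                     /* STA */
        c->mem[a] = c->accu;
        break;
    default:                                    /* JCC */
        if (c->carry == 0)
            c->pc = a;
        else
            c->carry = 0;
        break;
    }
}
```

Loaded with the test program from the earlier slide, five steps leave 0x7E in both ACCU and M[000110], with PC held at 000100 by the final JCC, which jumps to itself.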
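Processor design (3/3)
[Figure: resulting architecture]
• Control unit: control FSM
  - Inputs: Rst, CodeOp (IR[7:6]), C (carry)
  - Outputs: clrPC, incPC, ldPC, ldIR, selADR, selALU, ldACCU, ldR1, ldC, clrC, enM, weM
• Datapath
  - PC (clrPC, incPC, ldPC) and IR (ldIR); IR[7:6] gives CodeOp and IR[5:0] the operand field
  - Mux (selADR) on the 6-bit memory address Adr [5:0]
  - R1 (ldR1) and ACCU (ldACCU), 8 bits each
  - ALU performing NOR or ADD: selALU=‘0’ for NOR, selALU=‘1’ for ADD
  - Carry flag C (ldC, clrC)
• Memory: Adr [5:0], DataIn [7:0], DataOut [7:0], enM, weM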
It is time to exercise!

Embedded system Design introduction _ Karakola

  • 1. Telecommunications Telecommunications Telecommunications Telecommunications Engineering Master Engineering Master Engineering Master Engineering Master Embedded Systems : Introduction Higher Institute for Applied Sciences and Technology Dr. Daoud KARAKOULA
  • 2. Module Objectives Obtain experience in hardware/software design of embedded systems Learn how to move from algorithm to architecture Learn about interfacing protocols in embedded systems Introducing high-level programming languages to describe ES Introducing modern design issues : SOC, NOC, co-design, …
  • 3. Syllabus Embedded Processors and Memory Embedded Systems IO Interfacing bus, Protocols, Timers, AD and DA, … Embedded Communications Parallel/Serial Communication, Parallel/Serial Communication, Wireless Communication, Network Communication, … Processors and FPGA Design of Embedded Processors (HDL)
  • 4. Course material: resources references Embedded System Design: A Unified Hardware/Software Introduction, by F. Vahid (UCR) and T. Givargis (UCI) Embedded System Design, Peter Marwedel The Art of Designing Embedded Systems, Jack Ganssle The course of Dr. Amer Baghdadi of Embedded Systems
  • 5. What are Embedded Systems ? The embedding of microprocessors into equipment and portable devices started before the appearance of the home computer It consumes the majority of microprocessors that are made today microprocessors that are made today ☺ Huge application domaine ☺ Prototyping boards
  • 6. What are Embedded Systems ? Definition Definition An embedded system is nearly any computing system other than general purpose computer : desktop, laptop, or mainframe computer An embedded system is a microprocessor-based system that is built to control a function or range of functions and is not designed to be programmed by the end user
  • 7. What are Embedded Systems ? Hardware and Software Hardware and Software Modern design requires a designer to have a unified view of software and hardware Integrated circuit (IC) capacities Quality compiler availability Synthesis technology hardware software Application software OS Sw. Comm (drivers, interruptions) Resources management Hardware communication network CPUs (DSP, MCU), IPs, Memories
  • 8. Examples of Embedded Systems Examples Examples Front panel of a microwave oven simple control MP3 player 32-bit µP GPS Receiver player 16-bit µP Palm VX: 32-bit µP motorola Dragonball EZ
  • 9. Examples of Embedded Systems Examples Examples Camera canon EOS-3 3 µPs, 32-bit RISC CPU runs auto-focus Nokia 6620-g : 32-bit RISC CPU ARM-9
  • 10. Examples of Embedded Systems Examples Examples iPhone 3G ARM11 processor - 64-bit data-path - 64-bit data-path - 8-stage pipeline - Can vary in clock speed up to 700MHz or more - ARM Intelligent Energy Manager (reduce power consumption 25-50%) - Features vector floating point coprocessor - ARM Jazelle enabled for embedded Java execution
  • 11. Characteristics of Embedded Systems Single functioned Real-time operation Physical size and weight Low manufacturing cost Low manufacturing cost Not using general purpose processor which we find in desktop computer Need to work with restricted memory Low power - Power consumption is critical in battery-powered devices
  • 12. Design Challenges • How much hardware do we need ? what is word size of the CPU ? size of memory ? • How to minimize power ? reduce memory accesses • How to speed up our design ? Size Performance Power NRE cost • How to speed up our design ? introduce parallelism, pipeline technique • How to reduce the NRE (Non-recurring Engineering) cost ? The one-time cost of designing the system • Expertise with both software and hardware is needed to optimize design metrics • Improving one metric may worsen others NRE cost
  • 14. Processors total = 0 for i = 1 to N loop total += M[i] end loop Desired functionality General-purpose processor Single-purpose processor Application-specific processor
  • 15. Introduction Processor Digital circuit that performs a computation tasks Controller and datapath General-purpose: variety of computation tasks Single-purpose: one particular CCD preprocessor Pixel coprocessor A2D D2A Digital Camera chip CCD Single-purpose: one particular computation task Custom single-purpose: non-standard task A custom single-purpose processor may be Fast, small, low power But, high NRE, longer time-to-market, less flexible µProcessor JPEG codec DMA controller Memory controller ISA bus interface UART LCD ctrl Display ctrl Multiplier/Accum lens
  • 16. Combinational logic: basic logic gates Buffer x F F x y x y F F y x AND OR XOR x y 0 1 1 0 1 1 0 0 F 1 0 1 0 x y 0 1 1 0 1 1 0 0 F 1 1 1 0 x y 0 1 1 0 1 1 0 0 F 0 1 0 0 x 1 0 F 1 0 x F x y F x y F x y F Inverseur NAND NOR XNOR x y 0 1 1 0 1 1 0 0 F 0 0 0 1 x y 0 1 1 0 1 1 0 0 F 0 1 0 1 x y 0 1 1 0 1 1 0 0 F 1 0 1 1 x 1 0 F 0 1
  • 17. Combinational logic: basic functions Comparator n-bit n n A B I E S Add n-bit n n A B C n Sum Decoder E(log n – 1) E0 Q0 Qn-1 A Q n n S0 Slog m UAL n bits, m Ops B n Mux m x 1 E(m-1) E0 Q n n S0 Slog m S = 1 if AB E = 1 if A=B I = 1 if AB Sum = A+B (first n bits) C = (n+1)’th bit of A+B (C:Carry) Q = A op B op determined by S Q0 = 1 if E=0..00 Q1 = 1 if E=0..01 … Qn-1=1 if E=1..11 Q = E0 if S=0..00 E1 if S=0..00 … Em-1 if S=1..11 May have status outputs carry, zero, etc. with input Cin : Somme = A + B + Cin with enable input en : en=0 Output = 0..00
  • 18. Sequential logic: basic functions Counter (n-bit) n Q clk en Init Shift register (n-bit) Q clk E Init Register (n-bit) Q clk load Init n Q n E Shift D-FF Q clk D Init Q Q Q+ = 0 if Init=1, Q+1 if en=1 clk Q+ = 0 if Init=1, LSB if Shift=1 clk - content shifted - E stored in MSB Q+ = 0 if Init=1, D if clk Q otherwise Q+ = 0 if Init=1, E if load=1 clk Q otherwise
  • 19. Custom single-purpose processor basic model controller datapath state signals external control inputs external data inputs control signals combinational logic (control logic and next state) controller registers datapath controller + datapath external data outputs external control outputs and next state) state register functional units a view inside the controller and datapath
  • 20. Example: Greatest Common Divisor 0: int x, y; 1: while (1) { (b) desired functionality GCD clk go_i x_i y_i d_o (a) black-box First, write the algorithm 1: while (1) { 2: while (!go_i); 3: x = x_i; 4: y = y_i; 5: while (x != y) { 6: if (x y) 7: y = y - x; else 8: x = x - y; } 9: d_o = x; } GCD(42, 8) – loop of 9 iterations evolution of (x,y) : ?
  • 21. Example: Greatest Common Divisor Convert algorithm to “complex” state machine (b) state diagram (FSMD) Known as FSMD: finite-state machine with datapath 1: 3: 4: 2: 2-J: x = x_i y = y_i !go_i 1 !(!go_i) !1 !(x!=y) Can use templates to perform such conversion 5: y = y -x d_o = x x = x - y 6: 7: 6-J: 5-J: 9: 1-J: 8: x!=y xy !(xy) !(x!=y) 0: int x, y; 1: while (1) { 2: while (!go_i); 3: x = x_i; 4: y = y_i; 5: while (x != y) { 6: if (x y) 7: y = y - x; else 8: x = x - y; } 9: d_o = x; }
  • 22. State diagram templates Branch statement if (c1) c1 stmts. else if c2 c2 stmts. else other stmts next statement Loop statement while (cond) { loop-body- statements } next statement Assignment statement a = b next statement J: c2 stmts next statement C: !c1*c2 c1 !c1*!c2 others c1 stmts J: l-b-stmts next statement C: cond !cond a = b next statement
  • 23. Datapath Creating the datapath Create a register for any declared variable Create a functional unit for each arithmetic operation Connect the ports, registers and functional units 1: 3: 4: 2: 2-J: x = x_i !go_i 1 !(!go_i) !1 y_ld x_ld y_sel x_sel x_i y_i Mux 2x 1 n n Mux 2x 1 0: x 0: y units Based on reads and writes Use multiplexors for multiple sources Create unique identifier for each datapath component control input and output 5: 4: y = y_i y = y -x d_o = x x = x - y 6: 7: 6-J: 5-J: 9: 1-J: 8: x!=y xy !(xy) !(x!=y) soustractor – comparator comparator != soustractor – 8: x-y 7: y-x 6: xy 5: x!=y x_inf_y x_neq_y d_ld y_ld d_o 0: x 0: y 9: d
  • 24. Creating the controller’s FSM Same structure as FSMD Replace complex actions/conditions with datapath configurations x_i y_i Unité opérative n n x_i y_i Unité opérative n n 1: 3: 4: 2: 2-J: x = x_i y = y_i !go_i 1 !(!go_i) !1 FSMD 1: 3: 4: 2: 2-J: x_sel=0 x_ld=1 !go_i 1 !(!go_i) !1 y_sel=0 0000 0001 0010 0011 0100 go_i Controller FSM Mux 2x 1 d_ld Mux 2x 1 0: x 0: y soustracteur – comparateur comparateur != soustracteur – 9: d 8: x-y 7: y-x 6: xy 5: x!=y x_inf_y x_neq_y y_ld x_ld y_sel x_sel d_o Mux 2x 1 d_ld Mux 2x 1 0: x 0: y soustracteur – comparateur comparateur != soustracteur – 9: d 8: x-y 7: y-x 6: xy 5: x!=y x_inf_y x_neq_y y_ld x_ld y_sel x_sel d_o 5: y = y_i y = y -x d_o = x x = x - y 6: 7: 6-J: 5-J: 9: 1-J: 8: x!=y xy !(xy) !(x!=y) x_inf_y d_ld x_neq_y y_ld x_ld x_sel y_sel 5: 4: d_ld = 1 6: 7: 6-J: 5-J: 9: 1-J: 8: !x_neq_y y_ld=1 x_sel=1 x_ld=1 y_sel=1 y_ld=1 x_neq_y !x_inf_y x_inf_y 0100 0101 0110 0111 1000 1001 1010 1011 1100
  • 25. Splitting into a controller and datapath Implementation model of the controller Combinational logic x_sel y_sel x_ld x_ld x_neq_y go_i Mux 2x 1 Mux 2x 1 0: x 0: y soustracteur – comparateur comparateur != soustracteur – y_ld x_ld y_sel x_sel x_i y_i Unité opérative n n Mux 2x 1 Mux 2x 1 0: x 0: y soustracteur – comparateur comparateur != soustracteur – y_ld x_ld y_sel x_sel x_i y_i Unité opérative n n 1: 3: 4: 2: 2-J: x_sel=0 x_ld=1 !go_i 1 !(!go_i) !1 y_sel=0 y_ld=1 0000 0001 0010 0011 0100 go_i Unité de contrôle 1: 3: 4: 2: 2-J: x_sel=0 x_ld=1 !go_i 1 !(!go_i) !1 y_sel=0 y_ld=1 0000 0001 0010 0011 0100 go_i Unité de contrôle State register x_inf_y d_ld Q1 Q2 Q3 Q4 E1 E2 E3 E4 d_ld – != – 9: d 8: x-y 7: y-x 6: xy 5: x!=y x_inf_y x_neq_y d_o d_ld – != – 9: d 8: x-y 7: y-x 6: xy 5: x!=y x_inf_y x_neq_y d_o 5: 4: d_ld = 1 6: 7: 6-J: 5-J: 9: 1-J: 8: !x_neq_y y_ld=1 x_sel=1 x_ld=1 y_sel=1 y_ld=1 x_neq_y !x_inf_y x_inf_y 0100 0101 0110 0111 1000 1001 1010 1011 1100 5: 4: d_ld = 1 6: 7: 6-J: 5-J: 9: 1-J: 8: !x_neq_y y_ld=1 x_sel=1 x_ld=1 y_sel=1 y_ld=1 x_neq_y !x_inf_y x_inf_y 0100 0101 0110 0111 1000 1001 1010 1011 1100 Why splitting ?
  • 26. Controller state table for the GCD example Q3 Q2 Q1 Q0 x_neq_y x_inf_y go_i Inputs Q3 + (E3) Q2 + (E2) Q1 + (E1) Q0 + (E0) x_sel y_sel x_ld y_ld d_ld Outputs 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 - - - - - - - - - - - - 0 0 0 1 - - 0 0 1 0 1 0 - - 0 0 0 1 - - 1 0 1 0 1 1 - - 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 x x 0 x x x x 0 0 0 1 0 0 0 0 1 0 0 0 1 0 x x 0 0 0 0 0 0 1 0 1 1 x x 0 0 0 0 0 1 1 x x 0 0 0 0 1 1 0 x x 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 1 0 1 0 1 0 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - 0 1 1 0 - 1 - 0 1 0 1 1 - - 0 1 1 0 - 0 - 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 x 1 x x x x x x x 1 x x x x x x x x 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 x x 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 x x 0 0 0 1 0 0 0 x x 0 0 0
  • 27. Completing the GCD custom single-purpose processor design We finished the datapath We have a state table for the next state and control logic Combinational logic (control and new-state) Controller registers Datapath new-state) State register Functional units This is not an optimized design, but we see the basic steps combinational logic design
  • 29. Optimizing single-purpose processors Optimization is the task of making design metric values the best possible Optimization opportunities original program original program FSMD datapath FSM
  • 30. Optimizing the original program Analyze program attributes and look for areas of possible improvement number of computations size of variable time and space complexity operations used multiplication and division very expensive
  • 31. Optimizing the original program (cont’) 0: int x, y; 1: while (1) { 2: while (!go_i); 3: x = x_i; 4: y = y_i; 5: while (x != y) { 6: if (x y) 7: y = y - x; else original program 0: int x, y, r; 1: while (1) { 2: while (!go_i); // x doit être le plus grand 3: if (x_i = y_i) { 4: x=x_i; 5: y=y_i; } 6: else { 7: x=y_i; optimized program replace the subtraction operation(s) with modulo operation in order to speed up program else 8: x = x - y; } 9: d_o = x; } 7: x=y_i; 8: y=x_i; } 9: while (y != 0) { 10: r = x % y; 11: x = y; 12: y = r; } 13: d_o = x; } program GCD(42, 8) - 9 iterations to complete the loop (x,y): (42, 8), (43, 8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2). GCD(42,8) - 3 iterations to complete the loop (x,y): (42, 8), (8,2), (2,0)
  • 32. Optimizing the FSMD Areas of possible improvements merge states states with constants on transitions can be eliminated, transition taken is already known states with independent operations can be merged separate states states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size scheduling
  • 33. Optimizing the FSMD (cont.) 3: 2: 2-J: x = x_i 4: !go_i !(!go_i) !(x!=y) 1: 1 !1 original FSMD eliminate state 1 – transitions have constant values merge state 2 and state 2J – no loop operation in between them merge state 3 and state 4 – assignment operations are independent of one another int x, y; 5: 3: 2: x = x_i y = y_i go_i xy xy optimized FSMD !go_i y = y_i 5: y = y -x d_o = x x = x - y 7: 6-J: 5-J: 9: 8: 6: x!=y xy !(xy) !(x!=y) 1-J: operations are independent of one another merge state 5 and state 6 – transitions from state 6 can be done in state 5 eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively eliminate state 1-J – transition from state 1-J can be done directly from state 9 y = y -x d_o = x x = x - y 7: 9: 8: xy xy
  • 34. Optimizing the datapath Sharing of functional units one-to-one mapping, as done previously, is not necessary if same operation occurs in different states, they can share a single functional unit Multi-functional units ALUs support a variety of operations, it can be shared among operations occurring in different states
  • 35. Optimizing the FSM State encoding task of assigning a unique bit pattern to each state in an FSM size of state register and combinational logic vary State minimization task of merging equivalent states into a single state state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the next same state
  • 36. Introduction to GPP General-Purpose Processor Processor designed for a variety of computation tasks Low unit cost, in part because manufacturer spreads NRE over large numbers of units Motorola sold half a billion 68HC05 microcontrollers in 1996 alone Carefully designed since higher NRE is acceptable Can yield good performance, size and power Low NRE cost, short time-to-market/prototype, high flexibility User just writes software; no processor design a.k.a. “microprocessor” – “micro” used when they were implemented on one or a few chips rather than entire rooms
  • 37. Basic Architecture Control unit and datapath Note similarity to single- purpose processor Processor Control unit Datapath Control Controller ALU Registers Status Key differences ? Memory I/O IR PC Registers
  • 38. Basic Architecture Control unit and datapath Note similarity to single- purpose processor Processor Control unit Datapath Control Controller ALU Registers Status Key differences Datapath is general Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory Memory I/O IR PC Registers
  • 39. Datapath Operations Load Read memory location into register ALU operation Processor Datapath ALU Registers Control Control unit Controller Status +1 11 ALU operation Input certain registers through ALU, store back in register Store Write register to memory location Registers Memory I/O IR PC 10 … … 10 10 11
  • 40. Control Unit Control unit: configures the datapath operations Sequence of desired operations (“instructions”) stored in memory – “program” Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.: Fetch instruction : Get next Processor Control unit Datapath ALU Registers Controller Control Status Fetch instruction : Get next instruction into IR Decode : Determine what the instruction means Fetch operands : Move data from memory to datapath register Execute : Move data through the ALU Store : Write data from register to memory Registers IR PC Memory I/O 10 … … 500 501 100 101 load R0, M[500] Inc R1, R0 store M[501], R1 102 R0 R1
  • 41. Control Unit Sub-Operations: Fetch instruction. Get the next instruction into IR. PC: program counter, always points to the next instruction. IR: holds the fetched instruction. [Figure: processor block diagram; PC = 100 supplies the address and IR receives load R0, M[500].]
  • 42. Control Unit Sub-Operations: Decode. Determine what the instruction means. [Figure: processor block diagram; PC = 100, IR = load R0, M[500].]
  • 43. Control Unit Sub-Operations: Fetch operands. Move data from memory to a datapath register. [Figure: processor block diagram; the value 10 at M[500] is moved into R0.]
  • 44. Control Unit Sub-Operations: Execute. Move data through the ALU. This particular instruction (load R0, M[500]) does nothing during this sub-operation. [Figure: processor block diagram; PC = 100, IR = load R0, M[500], R0 = 10.]
  • 45. Control Unit Sub-Operations: Store. Write data from a register to memory. This particular instruction (load R0, M[500]) does nothing during this sub-operation. [Figure: processor block diagram; PC = 100, IR = load R0, M[500], R0 = 10.]
  • 46. Instruction Cycles. [Figure: timing diagram against clk. With PC=100, load R0, M[500] goes through Fetch inst., Decode, Fetch operands, Exec. and Store results, one sub-operation per clock cycle; R0 = 10.]
  • 47. Instruction Cycles. [Figure: timing diagram continued. With PC=101, inc R1, R0 goes through the same five sub-operations after the first instruction has finished; the ALU adds 1 and R1 = 11.]
  • 48. Instruction Cycles. [Figure: timing diagram continued. With PC=102, store M[501], R1 goes through the same five sub-operations after the second instruction has finished; M[501] = 11.]
  • 49. Instruction Cycles. [Figure: same timing diagram for PC=100, PC=101 and PC=102.] What is the problem with this processor?
  • 50. Architectural Considerations. Performance can be improved by: a faster clock (but there’s a limit); pipelining: slice up instructions into stages and overlap the stages; multiple ALUs to support more than one instruction stream; superscalar and VLIW architectures.
  • 51. Clock Frequency. The clock frequency is the inverse of the clock period. The clock period must be longer than the longest register-to-register delay in the entire processor. Memory access is often the longest delay. [Figure: processor block diagram.]
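As a worked example of the rule above, the sketch below derives an upper bound on the clock frequency from the longest register-to-register delay. The delay values and the function name are hypothetical, chosen only to make the arithmetic concrete.

```python
def max_clock_frequency_hz(delays_ns):
    """Upper bound on clock frequency: inverse of the longest path delay."""
    longest_ns = max(delays_ns.values())
    return 1e9 / longest_ns   # the period must exceed the delay, so this is a bound

# Hypothetical register-to-register delays, in nanoseconds.
delays_ns = {"alu": 4.0, "register_file": 2.0, "memory_access": 10.0}
f_max = max_clock_frequency_hz(delays_ns)
print(f"{f_max / 1e6:.0f} MHz")
```

With these assumed numbers the memory access dominates, so the clock cannot run faster than 100 MHz however fast the ALU is.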
  • 52. Pipelining: Increasing Instruction Throughput. [Figure: dish cleaning with two available resources (Wash, Dry) for dishes 1 to 8, non-pipelined versus pipelined, plotted against time; pipelined instruction execution with the stages Fetch-inst., Decode, Fetch ops., Execute and Store res. for instructions 1 to 8, plotted against time.]
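The gain shown in the figure can be checked with a little arithmetic. The sketch below assumes an ideal pipeline: every stage takes one time unit and there are no stalls; the function names are illustrative.

```python
def non_pipelined_time(n_items, n_stages):
    """Each item finishes all stages before the next one starts."""
    return n_items * n_stages

def pipelined_time(n_items, n_stages):
    """Ideal pipeline: fill it once, then one item completes per time unit."""
    if n_items == 0:
        return 0
    return n_stages + (n_items - 1)

# Dish cleaning: 8 dishes, 2 stages (wash, dry).
print(non_pipelined_time(8, 2), pipelined_time(8, 2))
# Instruction execution: 8 instructions, 5 stages.
print(non_pipelined_time(8, 5), pipelined_time(8, 5))
```

Under these assumptions the eight dishes take 16 time units without pipelining and 9 with it, and the eight instructions take 40 clock cycles without pipelining and 12 with it.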
  • 53. Superscalar Architecture. Scalar operation: executing on one or two numbers. Instructions are fetched in packets. Scheduling is static (at compilation time) or dynamic (at execution time). With dynamic scheduling, a complex hardware block is needed to detect the independent instructions. [Figure: a sequential instruction flow comes from cache/memory; multiple instructions are fetched and decoded, then dispatched to several functional units (FU) that share the registers.]
  • 54. VLIW Architecture. VLIW (Very Long Instruction Word): a long instruction (128-1024 bits) composed of several independent operations (rather than one). Equivalent to a superscalar architecture with static scheduling. More and more widespread. [Figure: one multi-operation instruction is fetched from cache/memory and feeds several functional units (FU) that share the registers.]
  • 55. Superscalar vs. VLIW. Superscalar: the hardware detects potential parallelism and performs register renaming; very complex hardware; the execution window is limited; e.g. PowerPC, Pentium, AMD K5. VLIW: parallelism is detected at compile time; simpler hardware; the whole program is analyzed; large registers and large code size (wasted bits in the instruction word); e.g. TMS320C6x (multimedia), IA64 (servers, workstations).
  • 56. Two Memory Architectures. Princeton (Von Neumann): fewer memory wires, simple implementation. Harvard: simultaneous program and data memory access. The Von Neumann model is generally the most used. Comparison: Harvard has more control signals and a higher computation speed; Princeton has fewer control signals and no parallelism. [Figure: Harvard processor with separate program and data memories; Princeton processor with a single memory for program and data.]
  • 57. Cache Memory. Memory access may be slow. A cache is a small but fast memory close to the processor. It holds a copy of part of the memory. Hits and misses: a hit occurs if the memory address is in the cache; a miss occurs if it is not, and the cache is then updated. [Figure: processor, cache (fast/expensive technology, usually on the same chip) and memory (slower/cheaper technology, usually on a different chip).]
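Hits and misses can be illustrated with a tiny cache model. The slide does not say how the cache is organized; the sketch below assumes a direct-mapped cache with one-word lines, and the class and its fields are hypothetical names.

```python
class DirectMappedCache:
    """Assumed organization: direct-mapped, one word per line."""

    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.tags = [None] * n_lines   # tag stored in each line (None = empty)
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, update the cache and return False."""
        index = address % self.n_lines
        tag = address // self.n_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag         # miss: the cache is updated
        return False

cache = DirectMappedCache(4)
results = [cache.access(a) for a in [0, 1, 0, 4, 0, 1]]
print(results, cache.hits, cache.misses)
```

In this access pattern addresses 0 and 4 map to the same line, so they evict each other; that is a property of the assumed direct-mapped organization rather than of caches in general.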
  • 58. Programmer’s View. The programmer doesn’t need a detailed understanding of the architecture; instead, the programmer needs to know what instructions can be executed. Two levels of instructions: assembly level, and structured languages (C, C++, Java, etc.). Most development today is done using structured languages, but some assembly-level programming may still be necessary. Drivers: the portion of a program that communicates with and/or controls (drives) another device. Drivers often have detailed timing considerations and extensive bit manipulation; the assembly level may be best for these.
  • 59. Assembly-Level Instructions. [Figure: a program as a sequence of instructions (Instruction 1, Instruction 2, Instruction 3, Instruction 4, ...), each laid out as: opcode, operand1, operand2.] Instruction set: defines the legal set of instructions for that processor. Data transfer: memory/register, register/register, I/O, etc. Arithmetic/logical: move a register through the ALU and back. Branches: determine the next PC value when it is not just PC+1.
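  • 60. Addressing Modes. Immediate: the operand field holds the data. Register-direct: the operand field holds a register address; that register holds the data. Register indirect: the operand field holds a register address; that register holds a memory address; that memory location holds the data. Direct: the operand field holds a memory address; that memory location holds the data. Indirect: the operand field holds a memory address; that memory location holds another memory address, which holds the data.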
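The five addressing modes can be expressed as a single operand-lookup routine. This is a minimal sketch: the mode names, the dictionary-based register file and memory, and the sample contents are all assumptions for illustration.

```python
def fetch_operand(mode, field, regs, mem):
    """Resolve the operand field to data according to the addressing mode."""
    if mode == "immediate":
        return field                 # the field is the data itself
    if mode == "register_direct":
        return regs[field]           # field -> register -> data
    if mode == "register_indirect":
        return mem[regs[field]]      # field -> register -> memory address -> data
    if mode == "direct":
        return mem[field]            # field -> memory address -> data
    if mode == "indirect":
        return mem[mem[field]]       # field -> memory -> memory address -> data
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {1: 20}            # hypothetical register file: R1 = 20
mem = {20: 30, 30: 40}    # hypothetical memory contents

for mode, field in [("immediate", 20), ("register_direct", 1),
                    ("register_indirect", 1), ("direct", 20), ("indirect", 20)]:
    print(mode, fetch_operand(mode, field, regs, mem))
```

Each extra level of indirection costs one more register-file or memory read, which is why the modes differ in speed as well as in flexibility.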
  • 61. A Simple Instruction Set. Each instruction is an opcode followed by operands:
      Assembly instruction   First byte   Second byte   Operation
      MOV Rn, direct         0000 Rn      direct        Rn = M(direct)
      MOV direct, Rn         0001 Rn      direct        M(direct) = Rn
      MOV @Rn, Rm            0010 Rn      Rm            M(Rn) = Rm
      MOV Rn, #immed.        0011 Rn      immediate     Rn = immediate
      ADD Rn, Rm             0100 Rn      Rm            Rn = Rn + Rm
      SUB Rn, Rm             0101 Rn      Rm            Rn = Rn - Rm
      JZ Rn, relative        0110 Rn      relative      PC = PC + relative (only if Rn = 0)
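A two-byte encoder for this instruction set might look as follows. The slide gives only the 4-bit opcodes and the byte in which each field sits; the field widths (4-bit register numbers, Rm in the upper half of the second byte, 8-bit direct, immediate and relative values) and all Python names are assumptions.

```python
# Opcodes from the slide; the dictionary keys are hypothetical names.
OPCODES = {
    "MOV_FROM_MEM": 0b0000,   # MOV Rn, direct
    "MOV_TO_MEM":   0b0001,   # MOV direct, Rn
    "MOV_INDIRECT": 0b0010,   # MOV @Rn, Rm
    "MOV_IMM":      0b0011,   # MOV Rn, #immed.
    "ADD":          0b0100,   # ADD Rn, Rm
    "SUB":          0b0101,   # SUB Rn, Rm
    "JZ":           0b0110,   # JZ Rn, relative
}

def encode(op, rn, operand, operand_is_register=False):
    """Return the two instruction bytes (assumed layout, see lead-in)."""
    first = (OPCODES[op] << 4) | (rn & 0xF)
    if operand_is_register:
        second = (operand & 0xF) << 4   # assumed: Rm in the upper four bits
    else:
        second = operand & 0xFF         # 8-bit value, two's complement if negative
    return bytes([first, second])

print(encode("MOV_IMM", 1, 10).hex())                      # MOV R1, #10
print(encode("ADD", 0, 1, operand_is_register=True).hex()) # ADD R0, R1
print(encode("JZ", 3, -3).hex())                           # JZ R3, -3
```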
  • 62. Sample Programs.
      C program:
        int total = 0;
        for (int i=10; i!=0; i--)
          total += i;
        // next instructions...
      Equivalent assembly program:
        0        MOV R0, #0;    // total = 0
        1        MOV R1, #10;   // i = 10
        2        MOV R2, #1;    // constant 1
        3        MOV R3, #0;    // constant 0
        4 Loop:  JZ R1, Next;   // jump to Next if i = 0
        5        ADD R0, R1;    // total += i
        6        SUB R1, R2;    // i--
        7        JZ R3, Loop;   // jump always (R3 = 0)
          Next:  // next instructions...
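The assembly program can be traced with a small interpreter to confirm that it computes the same total as the C loop. This is a sketch with two simplifications that are not in the slides: instructions are Python tuples, and jump targets are absolute addresses rather than PC-relative offsets.

```python
# (mnemonic, Rn, operand); jump targets are absolute addresses (a simplification).
program = [
    ("MOV", 0, 0),    # 0: total = 0
    ("MOV", 1, 10),   # 1: i = 10
    ("MOV", 2, 1),    # 2: constant 1
    ("MOV", 3, 0),    # 3: constant 0
    ("JZ", 1, 8),     # 4: Loop: jump to Next (address 8) if i == 0
    ("ADD", 0, 1),    # 5: total += i
    ("SUB", 1, 2),    # 6: i--
    ("JZ", 3, 4),     # 7: R3 is always 0, so always jump back to Loop
]

def run(program):
    """Run until the PC leaves the program; return the register values."""
    regs = [0, 0, 0, 0]
    pc = 0
    while pc < len(program):
        op, rn, operand = program[pc]
        pc += 1
        if op == "MOV":
            regs[rn] = operand          # load immediate
        elif op == "ADD":
            regs[rn] += regs[operand]
        elif op == "SUB":
            regs[rn] -= regs[operand]
        elif op == "JZ":
            if regs[rn] == 0:
                pc = operand
    return regs

regs = run(program)
print(regs[0])   # total
```

The instruction set has no unconditional jump, which is why the program keeps a constant 0 in R3 and uses JZ R3 to close the loop.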
  • 63. Programmer Considerations. Program and data memory space: embedded processors are often very limited, e.g., 64 Kbytes of program memory and 256 bytes of RAM (expandable). N-bit processor: N-bit ALU, registers, buses and memory data interface. Embedded: 8-bit, 16-bit and 32-bit are common. Desktop/servers: 32-bit, 64-bit. Registers: how many are there? Only a direct concern for assembly-level programmers. I/O: how to communicate with external signals? Interrupts.
  • 64. Application-Specific Instruction-Set Processors (ASIPs). General-purpose processors are sometimes too general to be effective in a demanding application, e.g., video processing, which requires huge video buffers and operations on large arrays of data and is inefficient on a GPP. But a single-purpose processor has a high NRE cost and is not programmable. ASIPs are targeted to a particular domain: they contain architectural features specific to that domain, e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc., and they are still programmable.
  • 65. A Common ASIP: Digital Signal Processors (DSP). For signal processing applications: large amounts of digitized data, often streaming; data transformations must be applied fast; e.g., cell-phone voice filter, digital TV, music synthesizer. DSP features: several instruction execution units; a single-cycle multiply-accumulate instruction, among other instructions; efficient vector operations, e.g., adding two arrays; vector ALUs, loop buffers, etc.
  • 67. Another Common ASIP: Microcontroller. For embedded control applications: reading sensors, setting actuators; mostly dealing with events (bits): data is present, but not in huge amounts; e.g., VCR, disk drive, digital camera (assuming an SPP for image compression), washing machine, microwave oven. Microcontroller features: on-chip peripherals (timers, analog-digital converters, serial communication, etc.), tightly integrated for the programmer and typically part of the register space; on-chip program and data memory; direct programmer access to many of the chip’s pins; specialized instructions for bit manipulation and other low-level operations.
  • 69. Trend: Even More Customized ASIPs. In the past, microprocessors were acquired as chips. Today, we increasingly acquire a processor as Intellectual Property (IP), e.g., a synthesizable VHDL model. This is an opportunity to add custom datapath hardware and a few custom instructions, or to delete a few instructions, which can have significant performance, power and size impacts. Problem: a compiler/debugger is needed for the customized ASIP (remember, most development uses structured languages). One solution: automatic compiler/debugger generation, e.g., www.tensillica.com. Another solution: retargetable compilers, e.g., www.improvsys.com (customized VLIW architectures).
  • 70. Selecting a Microprocessor. Issues: technical (speed, power, size, cost) and other (development environment, prior expertise, licensing, etc.). Speed: how to evaluate a processor’s speed? Clock speed: but instructions per cycle may differ. Instructions per second: but work per instruction may differ. Dhrystone: synthetic benchmark, developed in 1984; measured in Dhrystones/sec. MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780), a.k.a. Dhrystone MIPS; commonly used today. So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second. SPEC: set of more realistic benchmarks, but oriented to desktops. EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org; suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications.
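The Dhrystone MIPS conversion on this slide is a single multiplication, shown here as a short worked example. The constant comes from the slide; the function names are illustrative.

```python
DHRYSTONES_PER_SEC_PER_MIPS = 1757   # from the slide: the VAX 11/780 baseline

def mips_to_dhrystones_per_sec(mips):
    return mips * DHRYSTONES_PER_SEC_PER_MIPS

def dhrystones_per_sec_to_mips(dhrystones_per_sec):
    return dhrystones_per_sec / DHRYSTONES_PER_SEC_PER_MIPS

print(mips_to_dhrystones_per_sec(750))        # the slide's example
print(dhrystones_per_sec_to_mips(1_317_750))  # and back again
```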
  • 71. Presentation of the elementary processor. 8-bit general-purpose processor, based on an accumulator register called ACCU (8 bits). Four instruction types. Each instruction is coded with 8 bits: two for the operation type (the opcode, CodeOp) and six to code the operand or the address of the operand in memory (depending on the operation type).
      Mnemonic   Instruction coding   Description
      NOR        00AAAAAA             ACCU = ACCU NOR Mem[AAAAAA]
      ADD        01AAAAAA             ACCU = ACCU + Mem[AAAAAA], update Carry
      STA        10AAAAAA             Mem[AAAAAA] = ACCU
      JCC        11DDDDDD             If Carry = 0 ⇒ PC = DDDDDD, else clear Carry (Carry = 0)
      [Source of the instruction set: http://www.tuhh.de/~setb0209/cpu/ by T. Böscke]
  • 72. Example of a test program.
      Adr      Mem content, binary (hex)   Instruction in assembler ; Comments
      000000 : 00001000 (0x08)   NOR 0b001000 ; ACCU = ACCU NOR M[001000]
      000001 : 01000111 (0x47)   ADD 0b000111 ; ACCU = ACCU + M[000111] (Carry)
      000010 : 10000110 (0x86)   STA 0b000110 ; M[000110] = ACCU
      000011 : 11000100 (0xC4)   JCC 0b000100 ; If Carry = 0 then PC = 000100, else clear Carry
      000100 : 11000100 (0xC4)   JCC 0b000100 ; PC = 000100 (Carry is already cleared!)
      Data:
      000101 : 00000000 (0x00)
      000110 : 00000000 (0x00)
      000111 : 01111110 (0x7E)
      001000 : 11111111 (0xFF)
      001001 : 00000000 (0x00)
      …
      111111 : 00000000 (0x00)
  • 73. Processor design (1/3). Considering the basic template architecture; considering the instruction set, the number of registers, and any architectural specifications/constraints; and using the previously presented design methodology. [Figure: template processor with a control unit (controller, PC, IR) and a datapath (ALU, registers), linked by control and status signals and connected to memory and I/O.]
  • 74. Processor design (2/3). Algorithm (FSMD?):
      Clear PC, IR, Carry, Registers;
      while (1) {
        Fetch Inst (get one instruction);
        Decode the instruction;
        if (CodeOp=00 or CodeOp=01) {
          Fetch Operand (get the operand);
          if (CodeOp=00)
            Execute NOR (ACCU = ACCU NOR M[AAAAAA]);
          else
            Execute ADD (ACCU = ACCU + M[AAAAAA], update Carry);
        } else if (CodeOp=10) {
          Execute STA (Mem[AAAAAA] = ACCU);
        } else {
          Execute JCC (if Carry=0 PC=DDDDDD else Carry=0);
        }
      }
      [Figure: template processor with a control unit (controller, PC, IR) and a datapath (ALU, registers), connected to memory and I/O.]
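The algorithm above can be turned into a short behavioural simulator and run on the test program from slide 72. This is a sketch, not the hardware design: the `run` function, the step limit (needed because the test program ends in an endless JCC loop) and the choice to reset ACCU to 0 are assumptions.

```python
def run(image, max_steps):
    """Simulate the 4-instruction processor for at most `max_steps` instructions."""
    mem = list(image) + [0] * (64 - len(image))   # 6-bit address space
    pc = accu = carry = 0                         # reset: clear PC, Carry, registers
    for _ in range(max_steps):
        ir = mem[pc]                              # fetch instruction
        pc = (pc + 1) & 0x3F
        code_op, arg = ir >> 6, ir & 0x3F         # decode: 2-bit CodeOp, 6-bit field
        if code_op == 0b00:                       # NOR
            accu = ~(accu | mem[arg]) & 0xFF
        elif code_op == 0b01:                     # ADD, update Carry
            total = accu + mem[arg]
            accu, carry = total & 0xFF, total >> 8
        elif code_op == 0b10:                     # STA
            mem[arg] = accu
        else:                                     # JCC
            if carry == 0:
                pc = arg
            else:
                carry = 0
    return pc, accu, carry, mem

# Memory image of the test program, addresses 000000 to 001000.
test_program = [0x08, 0x47, 0x86, 0xC4, 0xC4, 0x00, 0x00, 0x7E, 0xFF]
pc, accu, carry, mem = run(test_program, 20)
print(pc, hex(accu), carry, hex(mem[6]))
```

Starting from ACCU = 0, the NOR with 0xFF leaves 0x00 in ACCU, the ADD loads 0x7E without a carry, STA writes it to address 000110, and the processor then spins on the JCC at address 000100.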
  • 75. Processor design (3/3). [Figure: datapath and control unit of the elementary processor. Datapath: ALU (NOR or ADD; SelALU=‘0’ for NOR, SelALU=‘1’ for ADD), ACCU, carry flag C, register R1, PC, IR, an address mux (selADR) and the memory (DataIn, DataOut, Adr). Control unit: an FSM with inputs CodeOp (bits [7:6]), C and Rst, driving selALU, selADR, incPC, ldPC, clrPC, ldIR, ldACCU, ldR1, ldC, clrC, enM and weM. Data paths are 8 bits wide [7:0]; addresses are 6 bits wide [5:0].]
  • 76. It is time to exercise!