Digital design with SystemC

Specification Languages:
Part 2
Marc Engels
e-mail: marc.engels@flandersmake.be

Specification Languages
 Part 1: Specification Models
 Part 2: Model based system design
 Show how the models of part 1 can be used for
architectural design
 Provide hands-on experience with SystemC v2.3.2
(released in October 2017).
 Introduce OO techniques for design of hardware systems
 Part 3: Project

Course Material for part 2
 Prerequisite:
 part 1 of specification languages
 C++ (good tutorial at www.cplusplus.com)
 Coding and debugging programs
 RTL description of synchronous digital circuits
 Material for part 2:
 Slides with notes.
 IEEE Standard SystemC Language Reference Manual, IEEE
Std 1666-2011.

Model Based System
Design
Class 1: constructing a
functional model
Marc Engels
e-mail:
marc.engels@flandersmake.be

Functional modeling in
SystemC
 Introduction to design of digital embedded systems
 SystemC introduction
 SystemC functional model syntax
 Exercise 1: building a functional model in SystemC

Consumer devices become
increasingly intelligent

… as well as professional
equipment

Characteristics of embedded
systems
 Optimize for power, cost, and size
 Robust design
 Provide the ability for evolution and mass customization
 Minimize time to market
 Some functionality might be safety-critical
 Interfacing with the real world, leading to real time constraints

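Embedded systems combine
various types of real-time behavior
[Figure: the processing system is coupled to a real world process through sensors and actuators. Sensor signals pass through signal conditioning and an ADC; output signals pass through a DAC and actuator powering. Events, actions, and the user are further interfaces.]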

Digital embedded systems
combine hard- and software
[Figure: block diagram of a digital embedded system with the blocks user interface, NVM, ROM, µP or DSP core, RAM, conf. logic, memories, peripheral, modem, buffers, video/graphics processor, protocol, speech processing, and analysis of channel.]
+ analog, sensors and actuators

Design flow for digital embedded
systems
[Figure: design flow diagram. System functionality (functional and performance requirements) and an architecture template (architectural and non-functional requirements) feed a mapping step, which produces a dedicated architecture and C-code.]

Function to architecture
conversion follows three axes
From system functionality to dedicated architecture (refinement steps in parentheses):
Computations: operations -> operators (resource allocation, scheduling)
Data: variables, arrays, floating point -> memories, fixed point (memory allocation, address generation, word sizing)
Communication: point-to-point, queues -> busses, detailed protocol (bus allocation, introduce arbiters, include protocols)

SystemC

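SystemC bridges gap between
function and architecture
[Figure: languages placed between system functionality and dedicated architecture. MATLAB and C/C++ sit at the functional side, VHDL and Verilog at the architecture side, and SystemC spans the range in between.]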

What is SystemC?
 A modeling framework in C++ for the refinement of a system from a functional
description into an architecture
 Contributions:
 hardware modeling with C++: OCAPI (IMEC) and SCENIC (Synopsys/UC
Irvine)
 fixed-point data types: Frontier Design
 hardware-software co-design: CoWare (IMEC/CoWare)
 Language first standardized in December 2005 as IEEE 1666, revised in 2011 as
IEEE 1666-2011
 Extensions of SystemC:
 Verification library.
 Transaction level modeling library (integrated in IEEE 1666-2011).
 Analog and mixed-signal modeling.
 More info: www.accellera.org

Which tools are available for
SystemC?
 Open source simulation library available
 Open source translators from Verilog or VHDL to SystemC
 Commercial synthesis tools:
 Cadence (Stratus HLS).
 Mentor (Catapult C).
 NEC (CyberWorkBench).
 SystemCrafter (SC).
 Xilinx (Vivado Design Suite).

SystemC language
architecture
[Figure: layered architecture, listed from top to bottom]
User application
Libraries for specific models of computation and/or methodologies, e.g. TLM
interfaces, bus models, SystemC verification library
Pre-defined channels: signal, clock, FIFO, mutex, semaphore
Utilities: report handling, tracing
Core language: modules, ports, exports, processes, interfaces, channels, events;
event-driven simulation kernel
Data types: 4-valued logic type, 4-valued logic vectors, bit-vectors,
finite-precision integers, limited-precision integers, fixed-point types
C++ language

SystemC core language
sc_module
sc_port
sc_prim_channel
sc_process
sc_interface
sc_event
sc_export

SystemC

Kahn Process Networks in
SystemC
[Figure: two processes connected by a FIFO.]
 (Modules to structure design)
 Functional processes
 First-In-First-Out queues
 Simulation engine

Modules are used for structural
partitioning of the functionality
 Each module has its own class, derived from the sc_module
class.
 Every constructor of a module class shall have exactly one
parameter of class sc_module_name.
 It is good practice to make this name for an instance of the
module the same as the C++ variable name through which
the module is referenced.
 A module can be hierarchical or contain processes. In the latter case,
the SC_HAS_PROCESS(class name) macro is used to indicate
that the module contains processes.

Example of a functional model of
an adder
With macros:
SC_MODULE(adder) {
    // define ports
    // define processes, internal data, etc.
    SC_CTOR(adder) {
        // body of constructor;
        // process declaration, sensitivities, etc.
    }
};
Explicit:
class adder : public sc_module {
public:
    // define ports
    // define processes, internal data, etc.
    SC_HAS_PROCESS(adder);
    adder(sc_module_name name) :
        sc_module(name) {
    }
};

Ports are used to communicate
with a FIFO channel
 General port definition: sc_port<interface>
 Predefined ports are: sc_fifo_in<T> and sc_fifo_out<T>.
 sc_fifo_in<T> is derived from sc_port<sc_fifo_in_if<T>,0> with interface
functions read(), nb_read(), and num_available().
 sc_fifo_out<T> is derived from sc_port<sc_fifo_out_if<T>,0> with interface
functions write(), nb_write(), and num_free().
 blocking read and write interface functions (automatic synchronization with
implicit wait() operations)
int a = f1.read(); // read a token
f1.write(a); // write a token
 Inspecting queues
int a = f1.num_available(); // number of tokens in a queue
int a = f1.num_free(); // number of free places in a queue

an adder (continued)
SC_MODULE(adder) {
sc_fifo_in<int> a,b;
sc_fifo_out<int> c;
//define processes, internal data, etc.
SC_CTOR(adder) {
};
};

SC_THREAD processes are used
to model functional processes
 SC_THREAD processes run forever once started.
 SC_THREAD processes can be suspended by means of the
wait(event) function. In functional modeling the wait
statements are hidden in the read() and write() functions to the
queues.
 Multiple processes per module are possible
 Processes can also be dynamically created.

SC_MODULE(adder) {
sc_fifo_in<int> a,b;
sc_fifo_out<int> c;
void compute() {
while(true) {
int valuea = a.read();
int valueb = b.read();
c.write(valuea+valueb);
}
}
SC_CTOR(adder) {
SC_THREAD(compute);
}
};

Define the main program
 The SystemC library must be included in the main program:
 #include <systemc.h>
 In sc_main() the following actions are taken:
 Instantiate channels with:
• sc_fifo<T> name("name", length); // default length 16
• e.g. sc_fifo<int> f1("f1", 2);
 Instantiate the modules.
 Bind ports of modules to channels:
• Positional
• named.
 Call sc_start() to start simulation and run until end of any
activity.

int sc_main(int argc, char *argv[]) {
    // Elaboration phase
    sc_fifo<int> fifo_a, fifo_b, fifo_c; // channel instantiation
    … // instantiate signal generation and evaluation module
    adder my_adder("my_adder"); // module instantiation
    my_adder.a(fifo_a); // binding of port to channel
    my_adder.b(fifo_b);
    my_adder.c(fifo_c);
    … // other modules and test bench, which drive fifo_a and fifo_b.
    sc_start(); // start simulation
    return 0;
}

Modules can also be used to
create hierarchy
SC_MODULE(superfunc) {
    // IO ports
    sc_fifo_in<float> in;
    sc_fifo_out<float> out;
    // internal queues
    sc_fifo<float> d;
    // internal modules; "function" is a user-defined module class
    function func1;
    function *func2;
    // Module constructor
    SC_CTOR(superfunc) :
        func1("func1") {
        func1.in(in);
        func1.out(d);
        func2 = new function("func2");
        func2->in(d);
        func2->out(out);
    }
};
[Figure: module superfunc contains func1 and func2, both instances of the sc_module function, connected through the internal FIFO d.]

Simulation engine
 In an un-timed model, the simulator only advances in delta
cycles:
 If it is started to run for a finite amount of time, it will never
stop.
 We therefore run it until no events are present: sc_start();
 Ways of stopping the simulator:
 Terminate a process (return from SC_THREAD): the
simulator will stop due to the lack of events.
 Call sc_stop() when a termination condition is fulfilled.

SystemC

Goal of this exercise
 use a simplified JPEG block diagram to practice functional
modeling
 develop a functional process that fits into a system
 simulate a functional model
 observe the overall behavior of a system

What is JPEG?
 “JPEG” stands for
“Joint Photographic Experts Group”
 “JPEG” is a standard for color image compression
 “JPEG” is widely used (e.g. on the WWW)
 More information?
 http://www.jpeg.org/

(Partial) JPEG: a simple block
diagram
[Figure: JPEG-ENCODER: Original Image -> R2B -> DCT -> Quantize (+table) -> ZIGZAG SCAN -> RUN-LENGTH ENCODER. JPEG-DECODER: RUN-LENGTH DECODER -> ZIGZAG SCAN -> Normalize (+table) -> IDCT -> B2R -> Reconstructed Image. Parameters on both sides: width, height, #bits.]

2D Discrete Cosine Transform
 Non-optimized equation
 DCT can be separated in consecutive 1-D operations
 There are many optimized DCT-algorithms available
$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{(2i+1)u\pi}{16} \cdot \cos\frac{(2j+1)v\pi}{16}$$
$$f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v) \cos\frac{(2i+1)u\pi}{16} \cdot \cos\frac{(2j+1)v\pi}{16}$$
$$\text{where } C(l) = \begin{cases} \frac{1}{\sqrt{2}} & l = 0 \\ 1 & l \neq 0 \end{cases}$$
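The non-optimized equations can be evaluated directly. The following is a minimal sketch in plain C++ (no SystemC needed); the names Block, dct_c, dct_basis, dct2d and idct2d are illustrative assumptions and are not taken from the exercise library.

```cpp
#include <array>
#include <cmath>

using Block = std::array<std::array<double, 8>, 8>;

// Normalization factor C(l): 1/sqrt(2) for l == 0, otherwise 1.
inline double dct_c(int l) { return l == 0 ? 1.0 / std::sqrt(2.0) : 1.0; }

// cos((2*i + 1) * u * pi / 16), the basis term shared by DCT and IDCT.
inline double dct_basis(int i, int u) {
    const double pi = std::acos(-1.0);
    return std::cos((2 * i + 1) * u * pi / 16.0);
}

// Forward 2-D DCT, a direct evaluation of the equation for F(u,v).
Block dct2d(const Block& f) {
    Block F{};
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int i = 0; i < 8; ++i)
                for (int j = 0; j < 8; ++j)
                    sum += f[i][j] * dct_basis(i, u) * dct_basis(j, v);
            F[u][v] = 0.25 * dct_c(u) * dct_c(v) * sum;
        }
    return F;
}

// Inverse 2-D DCT, a direct evaluation of the equation for f(i,j).
Block idct2d(const Block& F) {
    Block f{};
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 8; ++j) {
            double sum = 0.0;
            for (int u = 0; u < 8; ++u)
                for (int v = 0; v < 8; ++v)
                    sum += dct_c(u) * dct_c(v) * F[u][v]
                         * dct_basis(i, u) * dct_basis(j, v);
            f[i][j] = 0.25 * sum;
        }
    return f;
}
```

Each output needs 64 multiply-accumulate steps in this form, which is why the separable and optimized variants mentioned above matter in practice.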

Quantization
 Each DCT coefficient is divided by the coefficient amplitude
that is just detectable by the human eye (table)
 The result is rounded to an integer
 This reduces the number of bits needed to represent the DCT
coefficient
 The quantization is the place where information of the image
might be lost, resulting in lossy compression.

Quantization Table
N =
16  11  10  16   24   40   51   61
12  12  14  19   26   58   60   55
14  13  16  24   40   57   69   56
14  17  22  29   51   87   80   62
18  22  37  56   68  109  103   77
24  35  55  64   81  104  113   92
49  64  78  87  103  121  120  101
72  92  95  98  112  100  103   99
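A minimal sketch of quantization with the table N, in plain C++. It assumes rounding to the nearest integer with std::lround (ties away from zero), which the slides do not specify, and it assumes that the "Normalize" block of the decoder multiplies by the same table. The function and type names are illustrative, not those of the exercise modules.

```cpp
#include <array>
#include <cmath>

using Block = std::array<std::array<double, 8>, 8>;
using IntBlock = std::array<std::array<int, 8>, 8>;

// Quantization table N from the slide.
constexpr int kQuantTable[8][8] = {
    {16, 11, 10, 16,  24,  40,  51,  61},
    {12, 12, 14, 19,  26,  58,  60,  55},
    {14, 13, 16, 24,  40,  57,  69,  56},
    {14, 17, 22, 29,  51,  87,  80,  62},
    {18, 22, 37, 56,  68, 109, 103,  77},
    {24, 35, 55, 64,  81, 104, 113,  92},
    {49, 64, 78, 87, 103, 121, 120, 101},
    {72, 92, 95, 98, 112, 100, 103,  99}};

// Divide each DCT coefficient by its table entry and round to an integer.
IntBlock quantize(const Block& F) {
    IntBlock q{};
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v)
            q[u][v] = static_cast<int>(std::lround(F[u][v] / kQuantTable[u][v]));
    return q;
}

// Assumed inverse step: multiply by the table entry. The rounding error
// introduced by quantize() is not recovered, hence the lossy compression.
Block normalize(const IntBlock& q) {
    Block F{};
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v)
            F[u][v] = static_cast<double>(q[u][v]) * kQuantTable[u][v];
    return F;
}
```

A coefficient of -17 at position (0,1) becomes -2 after quantization and -22 after normalization, which illustrates where the information is lost.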

The coefficients are zigzag
scanned
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
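The scan order in the table does not have to be stored by hand; it can be generated by walking the anti-diagonals of the block. A minimal sketch in plain C++ (the function name zigzag_table is an assumption):

```cpp
#include <algorithm>
#include <array>

// Returns t[row][col] = position of that coefficient in the zigzag scan.
std::array<std::array<int, 8>, 8> zigzag_table() {
    std::array<std::array<int, 8>, 8> t{};
    int idx = 0;
    for (int s = 0; s <= 14; ++s) {        // s = row + col of an anti-diagonal
        const int lo = std::max(0, s - 7); // smallest valid row on it
        const int hi = std::min(s, 7);     // largest valid row on it
        if (s % 2 == 1) {
            // odd diagonals are walked with increasing row (down-left)
            for (int r = lo; r <= hi; ++r) t[r][s - r] = idx++;
        } else {
            // even diagonals are walked with decreasing row (up-right)
            for (int r = hi; r >= lo; --r) t[r][s - r] = idx++;
        }
    }
    return t;
}
```

The inverse scan of the decoder can reuse the same table by swapping the roles of index and position.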

(Simplified) Run-length coding
 Send the DC value “as is”
 Represent the high frequency data with (zero run-length,
amplitude) combinations.
 End the stream with EOB (= 63).
 Example:
 in: 79, 0, -2, -1, 3, -1, 0, 0, -1, 0, 0, 0, …
 out: 79, 1,-2, 0,-1, 0, 3, 0,-1,2,-1, 63
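The simplified scheme can be sketched as follows in plain C++. The function name rl_encode is an assumption, and the sketch assumes that EOB is always emitted, also when the last coefficient is non-zero. The value 63 is unambiguous as EOB because a run of 63 zeros can never be followed by another coefficient in a block of 64.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kEob = 63;  // end-of-block marker from the slide

// Encodes one zigzag-ordered block: the DC value as is, then
// (zero run-length, amplitude) pairs, then EOB.
std::vector<int> rl_encode(const std::vector<int>& block) {
    assert(block.size() == 64);
    std::vector<int> out;
    out.push_back(block[0]);  // DC value "as is"
    int run = 0;              // zeros seen since the last non-zero value
    for (std::size_t k = 1; k < block.size(); ++k) {
        if (block[k] == 0) {
            ++run;
        } else {
            out.push_back(run);
            out.push_back(block[k]);
            run = 0;
        }
    }
    out.push_back(kEob);      // trailing zeros are implied by EOB
    return out;
}
```

Applied to the example input above (padded with zeros to 64 values), the sketch reproduces the example output.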

How to start?
 Download exercise files from http://www.icorsi.ch/
 Follow installation instructions of exercises.
 You will find:
 In /exercises/exercise1/: main.cpp to start from
 In /exercises/modules/: library with JPEG encoder modules
{r2b,dct,quantize,zz_enc,rl_enc}.{h,cpp}, JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp}
 In /exercises/images/: test images
 In /exercises/add2systemc/: additional functions (df_fork, fifo_stat)
 Things to be done:
 make rl_dec.h and rl_dec.cpp
 complete the main.cpp with the modules.
 Compile and execute the application.
 Inspect the number of reads and writes in the fifos
 Visualize resulting image
 Test if you can launch the application in the debugger.
 Optional: make a hierarchy for the encoder and decoder.

Using SystemC on
Linux/Cygwin
 Use g++ (I used version 4.5.3).
 Make a workspace in Eclipse:
 Add your source files to the project.
 Add libmodules.a
 Add libadd2systemc.a (for next exercises).
 Add libsystemc.a
 Put the right include paths and linker paths
 Build your application from within Eclipse.
 Execute your application from within Eclipse.
 Exercise1.exe -i ../images/mountain.pgm -o result.pgm

Model Based System
Design
Class 2: Fixed-point
refinement
Marc Engels

Fixed point refinement
 Fixed word length optimization
 Overflow and quantization
 MSB determination
 LSB determination
 Fixed word length support in SystemC
 Exercise 2: fixed point refinement of IDCT

Fixed point refinement is one of the
steps in architectural design
[Figure: the three refinement axes from system functionality to dedicated architecture. Computations: operations -> operators (resource allocation, scheduling). Data: variables, arrays, floating point -> memories, fixed point (memory allocation, address generation, word sizing). Communication: point-to-point, queues -> busses, detailed protocol (bus allocation, introduce arbiters, include protocols).]

Finite word lengths are a must
for DSP applications
Floating-point
•powerful
•expensive (storage & ops)
•3 bytes (mantissa) + 1 byte (exponent)
Fixed-point
•minimum area
•low power
•high speed
[Figure: fixed-point multiplier with an 8-bit and a 6-bit operand and a 14-bit result.]

How to model a fixed-point
signal?
Total number of bits WL
Integer bits IWL
Value representation
•2’s complement (i=-1)
•unsigned (i=1)
[Figure: bit weights from MSB to LSB: i·2^2, 2^1, 2^0, 2^-1, 2^-2, 2^-3, with WL the total width, IWL the integer part, and WL-IWL the fractional part.]

How do we quantize?
[Figure: fixed-point (fxp) versus floating-point (flp) transfer characteristics for four quantization modes: truncate (floor), round, magnitude truncate, and ceil.]

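What happens on an overflow?
[Figure: fixed-point (fxp) versus floating-point (flp) characteristics for wrap-around and for saturation; with saturation the output is held at the max. value.]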

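Saturation Hardware
[Figure: VALUE is compared against MAX_VAL and MIN_VAL by two comparators (comp); two multiplexers (mux) select VALUE, MAX_VAL, or MIN_VAL as RESULT.]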

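During design we must specify
fixed-point formats for signals
[Figure: a floating-point algorithm between an ADC and a DAC, with word lengths 8 and 7 given at the converters. The algorithm is built from multipliers, an adder, and a delay z^-1; the formats of the internal signals are still unknown (marked "?").]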

Fixed-point refinement is a
complex optimization problem
Minimize overall cost:
minimal word lengths
truncate and wrap-around
MSB determination:
goal: avoid unwanted overflows
method: find min, max signal values
result: MSB position, value
representation, overflow
LSB determination:
goal: keep required precision
method: evaluate difference
between flp and fxp behavior
result: LSB position, quantization
[Figure: a signal word whose MSB side is set by the safe range and whose LSB side is set by quantization.]

MSB determination can be
based on range calculations
Put range (min, max) on inputs
Propagate range over the operators
This gives a safe (pessimistic) estimate
[Figure: range calculation on the example y = m + d with m = 12·x and d = z^-1 x. Range info on the input: x in [0,255]. Calculated ranges: d in [0,255], m in [0,3060], y in [0,3315].]

Range propagation is a simple
calculation
Operator   min_c                            max_c
c = a+b    min_a + min_b                    max_a + max_b
c = a-b    min_a - max_b                    max_a - min_b
c = a*b    MIN(min_a*min_b, min_a*max_b,    MAX(min_a*min_b, min_a*max_b,
               max_a*min_b, max_a*max_b)        max_a*min_b, max_a*max_b)
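The three rules translate directly into interval arithmetic. A minimal sketch in plain C++ (the type Range and the function names add, sub, and mul are illustrative assumptions):

```cpp
#include <algorithm>

// Closed interval [min, max] of the values a signal can take.
struct Range {
    double min;
    double max;
};

// c = a + b
Range add(Range a, Range b) { return {a.min + b.min, a.max + b.max}; }

// c = a - b: the smallest result pairs min_a with max_b, and vice versa.
Range sub(Range a, Range b) { return {a.min - b.max, a.max - b.min}; }

// c = a * b: with mixed signs any of the four corner products can be extreme.
Range mul(Range a, Range b) {
    const double p1 = a.min * b.min;
    const double p2 = a.min * b.max;
    const double p3 = a.max * b.min;
    const double p4 = a.max * b.max;
    return {std::min({p1, p2, p3, p4}), std::max({p1, p2, p3, p4})};
}
```

With x in [0,255] and the constant 12 written as the range [12,12], mul gives [0,3060] and adding the delayed input gives [0,3315], the values of the range-calculation example.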

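Range calculations can get
unstable with feedback
[Figure: feedback loop with a multiplier (coefficient a), an adder, and a delay z^-1 between X(n) and Y(n), with feedback signal F(n). A plot of value versus sample n shows the calculated bounds minF and maxF of F(n) widening with n.]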

Collecting signal statistics from
simulations is an alternative
Perform simulation with realistic stimuli.
Collect minimum and maximum value on each signal during the
simulation
This gives an optimistic, stimuli dependent estimate
[Figure: the example y = m + d with m = 12·x and d = z^-1 x, driven by stimuli; the min and max of each signal are recorded during simulation.]

Combine both methods for
accurate MSB determination
           signal statistic        range propagation
name       min    max    MSB1     min     max     MSB2
signal1    -1.5   1.6    2        -1.9    1.9     2
signal2    -1.3   1.4    2        -2.1    2.1     3
signal3    -1.2   1.2    2        -22.0   22.0    6
signal4    -1.2   1.2    2        -∞      ∞       ∞
If MSB1 == MSB2: wrap-around (MSB1)
If MSB1 < MSB2: wrap-around (MSB2)
If MSB1 << MSB2: saturation (MSB1)
If MSB2 is ∞: saturation (MSB1)
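The decision rules can be sketched in plain C++. Two assumptions are made here: the MSB values in the table are read as the number of integer bits of a two's complement word (this reproduces all table entries), and "MSB1 << MSB2" is modeled by a hypothetical threshold max_gap, since the slides do not quantify it. All names are illustrative.

```cpp
#include <cmath>
#include <limits>

// Marker for a range that range propagation could not bound.
constexpr int kUnbounded = std::numeric_limits<int>::max();

// Smallest number of integer bits (sign included) such that
// -2^(iwl-1) <= lo and hi < 2^(iwl-1).
int msb_bits(double lo, double hi) {
    if (!std::isfinite(lo) || !std::isfinite(hi)) return kUnbounded;
    int iwl = 1;
    while (lo < -std::ldexp(1.0, iwl - 1) || hi >= std::ldexp(1.0, iwl - 1)) ++iwl;
    return iwl;
}

struct MsbChoice {
    int msb;        // selected MSB position
    bool saturate;  // true: saturation logic, false: wrap-around
};

// msb1: from signal statistics, msb2: from range propagation.
// max_gap is an assumed threshold for "MSB1 << MSB2".
MsbChoice choose_msb(int msb1, int msb2, int max_gap = 2) {
    if (msb2 == kUnbounded || msb2 - msb1 > max_gap) return {msb1, true};
    return {msb2, false};  // covers MSB1 == MSB2 and MSB1 slightly below MSB2
}
```

For signal3 the two estimates differ by four bits, so the sketch selects saturation at MSB1 = 2, whereas signal2 gets wrap-around at MSB2 = 3.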

Quantization effects can be
modeled as additive noise
[Figure: a quantizer Q with B bits between input and output is replaced by an adder that adds a noise source to the input.]
Noise is approximated by a statistical model with the following
assumptions:
the noise is uncorrelated to the input.
the noise is white.
the probability distribution is uniform.

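Each quantization effect has
mean and variance
 Rounding with step ∆:
$$m_n = 0 \quad\text{and}\quad \sigma_n^2 = \frac{\Delta^2}{12}$$
 Truncation with step ∆:
$$m_n = -\frac{\Delta}{2} \quad\text{and}\quad \sigma_n^2 = \frac{\Delta^2}{12}$$
 Magnitude truncation with step ∆:
$$m_n = 0 \quad\text{and}\quad \sigma_n^2 = \frac{\Delta^2}{3}$$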
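These values follow from the uniform-distribution assumption of the noise model. A short derivation, assuming the error is defined as e = Q(x) - x and, for magnitude truncation, an input distribution that is symmetric around zero:

```latex
\begin{align*}
\text{Rounding: } e \sim U\!\left[-\tfrac{\Delta}{2}, \tfrac{\Delta}{2}\right):
  \quad & m_n = 0, \quad
  \sigma_n^2 = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} e^2 \, de
             = \frac{\Delta^2}{12} \\
\text{Truncation: } e \sim U\!\left(-\Delta, 0\right]:
  \quad & m_n = -\frac{\Delta}{2}, \quad
  \sigma_n^2 = \frac{1}{\Delta}\int_{-\Delta}^{0}
               \left(e + \tfrac{\Delta}{2}\right)^2 de
             = \frac{\Delta^2}{12} \\
\text{Magnitude truncation: } e \sim U\!\left(-\Delta, \Delta\right):
  \quad & m_n = 0, \quad
  \sigma_n^2 = \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} e^2 \, de
             = \frac{\Delta^2}{3}
\end{align*}
```

Truncation only shifts the error interval of rounding by half a step, so the variance is unchanged. Magnitude truncation doubles the width of the error interval, which multiplies the variance by four.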

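This results in an equivalent
linear network
[Figure: the example network y = m + d with m = 12·x and d = z^-1 x, with a quantizer Q1 on the input x and a quantizer Q2 on the output y, next to the equivalent linear network in which Q1 and Q2 are replaced by the additive noise sources e1(t) and e2(t).]
$$y(t) = \bigl(12\,x(t) + x(t-1)\bigr) + \bigl(12\,e_1(t) + e_2(t) + e_1(t-1)\bigr)$$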

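… but quantization is a non-linear
operation
[Figure: first-order recursive filter with feedback coefficient -0.96, a delay z^-1, and a quantizer Q (B bits, round to nearest integer) between X(n) and Y(n). Stimulus: x(0) = 14, x(n) = 0 for n > 0. Two plots compare the response with rounding and without rounding.]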
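The effect can be reproduced with a few lines of plain C++. The sketch assumes the recursion y(n) = x(n) - 0.96·y(n-1) with the quantizer applied to y(n); the exact filter structure is read off the figure labels and is therefore an assumption, as is the function name impulse_response.

```cpp
#include <cmath>
#include <vector>

// Response of y(n) = x(n) + a*y(n-1) to x(0) = x0, x(n) = 0 for n > 0.
// With round_output set, y(n) is rounded to the nearest integer each step.
std::vector<double> impulse_response(double a, double x0, int n, bool round_output) {
    std::vector<double> y;
    double prev = 0.0;  // y(n-1), zero before the first sample
    for (int k = 0; k < n; ++k) {
        const double x = (k == 0) ? x0 : 0.0;
        double cur = x + a * prev;
        if (round_output) cur = std::round(cur);
        y.push_back(cur);
        prev = cur;
    }
    return y;
}
```

Without rounding the response decays towards zero. With rounding it gets stuck in an oscillation between +12 and -12, a limit cycle that the linear noise model cannot predict.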

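LSB determination is based on
simulations
[Figure: the example network is simulated twice with the same stimuli, once as reference and once with quantizers Q inserted, and the outputs y are compared. Flow chart with the steps "simulate" and "output ok" (yes/no), ending at "All fixed-point".]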

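Signal to quantization noise
ratio (SQNR)
$$SQNR_x = 10 \log_{10}\left(\frac{m_s^2 + \sigma_s^2}{m_e^2 + \sigma_e^2}\right)$$
[Figure: the signal x (mean m_s, standard deviation σ_s) passes through a quantizer Q to give x_Q; the error e (mean m_e, standard deviation σ_e) is the difference between x and x_Q.]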
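Since m² + σ² is the mean square of a signal, the SQNR can be measured in simulation as the ratio of two sums of squares. A minimal sketch in plain C++ (the function name sqnr_db is an assumption):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SQNR in dB of a quantized sequence against its floating-point reference.
double sqnr_db(const std::vector<double>& ref, const std::vector<double>& quantized) {
    assert(ref.size() == quantized.size() && !ref.empty());
    double signal_power = 0.0;  // sum of s^2, proportional to m_s^2 + sigma_s^2
    double noise_power = 0.0;   // sum of e^2, proportional to m_e^2 + sigma_e^2
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double e = ref[i] - quantized[i];
        signal_power += ref[i] * ref[i];
        noise_power += e * e;
    }
    // The common factor 1/N cancels in the ratio. With IEEE floating point,
    // identical sequences give an infinite SQNR.
    return 10.0 * std::log10(signal_power / noise_power);
}
```

An error of 10 % of the signal amplitude on every sample gives a power ratio of 100, that is 20 dB.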

LSB selection optimizes cost and
performance
quantization  SQNR   SQNR    SQNR    SQNR     SQNR    SQNR         SQNR     cost     SNR     PSNR
set           pi     accu    pix     coeffs   block   temp block   blocki
0             208    253     Inf     184      Inf     225          Inf      787968   27.64   31.49
1             45.5   59.76   Inf     174      Inf     Inf          Inf      759296   27.48   31.33
2             45.5   59.76   25.15   174      Inf     Inf          Inf      759296   22.66   26.51
3             45.5   59.76   38.77   174      Inf     Inf          Inf      759296   26.91   30.75
4             45.5   59.76   47.3    30.88    Inf     Inf          Inf      230912   27.35   31.19
5             45.5   59.8    47.3    30.88    29.38   Inf          Inf      230912   27.34   31.19
6             45.5   61.4    47.3    30.88    29.38   -1.93        Inf      41472    20.47   24.32
7             45.5   59.8    47.3    30.88    29.38   Inf          Inf      72192    27.34   31.19
8             45.5   60.23   47.3    30.88    29.38   16.73        Inf      56832    26.96   30.8
9             45.5   59.88   47.3    30.88    29.38   31.86        Inf      67072    27.31   31.16


SystemC introduces a number
of specific data types
Type Description
sc_logic 4 value {0,1,X,Z} single bit
sc_int 1 to 64 bit signed integer
sc_uint 1 to 64 bit unsigned integer
sc_bigint Arbitrary size signed integer
sc_biguint Arbitrary size unsigned integer
sc_bv Arbitrary sized 2 value vector
sc_lv Arbitrary sized 4 value vector
sc_fixed Signed fixed point
sc_ufixed Unsigned fixed point
sc_fix Untemplated signed fixed point
sc_ufix Untemplated unsigned fixed point

SystemC templated fixed-point
types
 Two fixed point templates
 sc_fixed <wl, iwl, q_mode, o_mode, n_bits> x; // signed
 sc_ufixed <wl, iwl, q_mode, o_mode, n_bits> y; // unsigned
 Parameters:
 wl = number of bits
 iwl = number of integer bits
 q_mode = quantization method (SC_RND / SC_TRN /
SC_TRN_ZERO / ...)
 o_mode = overflow method (SC_SAT / SC_WRAP / … )
 n_bits = number of saturated bits in case of wrapping (default 0)
 If quantization and overflow not specified the defaults (SC_TRN and
SC_WRAP) are used

Fixed point lengths
sc_fixed <5, 7> v;    bits: X X X X X 0 0 .    range [ -64 , 60 ]
sc_fixed <5, 3> v;    bits: X X X . X X        range [ -4 , 3.75 ]
sc_fixed <5, -2> v;   bits: . S S X X X X X    range [ -0.125 , 0.1171875 ]
(X: stored bit, 0: implied zero, S: sign extension)

Quantization methods
Range of both declarations: [ 0 , 7.75 ], precision = 0.25
sc_ufixed <5, 3, SC_RND> v;
v = 3.1875    (011.0011 -> 01101 = 3.25, quantization error 0.0625)
sc_ufixed <5, 3, SC_TRN> v;
v = 3.1875    (011.0011 -> 01100 = 3.0, quantization error 0.1875)

Overflow handling
Range of both declarations: [ -16 , 15 ]
sc_fixed <5, 5, SC_RND, SC_SAT> v;
v = 18;    (18 -> 01111 = 15)
sc_fixed <5, 5, SC_RND, SC_WRAP> v;
v = 18;    (18 -> 10010 = -14)

Fixed-point simulation
operations in floating-point
quantization and overflow handling during assignment
sc_fixed <4,3> a;    // lsb precision 0.5
sc_fixed <4,1> b;    // lsb precision 0.125
sc_fixed <4,2> c;    // lsb precision 0.25
a = 1.6;      // Q: 1.6 -> 1.5
b = 0.9;      // Q: 0.9 -> 0.875
c = a * b;    // 1.5 * 0.875 = 1.3125, Q: 1.3125 -> 1.25
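The behavior on assignment can be emulated without SystemC, which helps to check expected values by hand. The sketch below is plain C++ and is not the SystemC implementation: FxFormat, QMode, OMode, and fx_cast are hypothetical names, and only the counterparts of SC_RND, SC_TRN, SC_SAT, and SC_WRAP (with n_bits = 0) are covered.

```cpp
#include <cassert>
#include <cmath>

enum class QMode { Round, Truncate };  // stand-ins for SC_RND and SC_TRN
enum class OMode { Saturate, Wrap };   // stand-ins for SC_SAT and SC_WRAP

struct FxFormat {
    int wl;          // total number of bits
    int iwl;         // number of integer bits (may exceed wl or be negative)
    bool is_signed;  // true: like sc_fixed, false: like sc_ufixed
    QMode q;
    OMode o;
};

// Returns the value that is stored when v is assigned to a variable of format f.
double fx_cast(double v, const FxFormat& f) {
    assert(f.wl > 0 && f.wl < 53);  // integer codes must stay exact in a double
    const double lsb = std::ldexp(1.0, f.iwl - f.wl);  // weight of the LSB
    double code = v / lsb;                             // value in units of the LSB
    // Quantization: round to nearest (ties upward) or truncate toward -infinity.
    code = (f.q == QMode::Round) ? std::floor(code + 0.5) : std::floor(code);
    // Overflow: representable integer codes are lo .. hi.
    const double span = std::ldexp(1.0, f.wl);
    const double lo = f.is_signed ? -span / 2.0 : 0.0;
    const double hi = lo + span - 1.0;
    if (code < lo || code > hi) {
        if (f.o == OMode::Saturate)
            code = (code < lo) ? lo : hi;
        else
            code -= span * std::floor((code - lo) / span);  // drop the upper bits
    }
    return code * lsb;
}
```

With this helper the examples above can be replayed: 3.1875 becomes 3.25 with rounding and 3.0 with truncation in a <5,3> unsigned format, 18 becomes 15 (saturation) or -14 (wrap-around) in a <5,5> signed format, and the product of the quantized a and b ends up as 1.25 in c.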

SystemC fixed point types with
non-static arguments
 Fixed point parameter values
 sc_fxtype_params my_type(wl,iwl,q_mode,o_mode,n_bits);
 x = my_type.wl();
 my_type.iwl(x-2);
 Two non-static fixed point types
 sc_fix x(my_type); // signed
 sc_ufix y(my_type); // unsigned
 For arrays, these types are used with a context
 sc_fxtype_context my_context(my_type);
 sc_fix z[64];
 Remark: for fixed point simulations, include in every file
 #define SC_INCLUDE_FX
 #include <systemc.h>


Goal of this exercise
 Perform fixed point refinement for all the internal variables of
the IDCT in the JPEG example
 determine the MSB to avoid internal overflows without overflow
logic.
 determine the LSB to have no more than 0.5 dB degradation on
the PSNR of the resulting image

How to start?
 You find:
 In .../exercises/exercise2/: the functional model with a fixed point IDCT
implementation; types-file datatypes_original.txt
 In /exercises/modules/: library of JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp}
 Special fixed point support functions of directory
…/exercises/add2systemc/ are used
 In /exercises/images/: test images
 Things to do:
 Inspect the code to understand the behavior
 Make the application
 Change the datatypes.txt file
 Syntax: exercise2 -i <inputfile> -o <outputfile> -t <typefile>
Model Based System Design
Class 3: Communication
Refinement
Marc Engels

Communication refinement
 Communication refinement
 Communication refinement in SystemC
 Exercise 3: communication refinement for
the JPEG decoder

Communication refinement is one
of the steps in architectural design
[Figure: the three refinement axes from system functionality to dedicated architecture. Computations: operations -> operators (resource allocation, scheduling). Data: variables, arrays, floating point -> memories, fixed point (memory allocation, address generation, word sizing). Communication: point-to-point, queues -> busses, detailed protocol (bus allocation, introduce arbiters, include protocols).]

Functional models use FIFO
communication
 Queues guarantee consistent data passing
 Implementation could become expensive for large sizes
 communication must be optimized
[Figure: Process1 and Process2 connected by a queue with (infinite) storage.]

Many communications can be
reduced to a single register
 Output of functions is registered
 No extra implementation cost
 No storage for data
 Consistency of communication needs to be guaranteed
[Figure: two processes connected by a wire.]

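Example of correct wired
communication
[Figure: state diagrams of two processes connected by a wire. Process 1 cycles through filt1, filt2, filt3, and filt4, and performs write() in filt4. Process 2 starts with w=0, increments w (w++) while w<4, and from w=4 on cycles through read() op1, op2, op3, and op4.]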

Communication is perfectly
aligned
Clock cycle   Process 1        Process 2
1             filt1            w=1
2             filt2            w=2
3             filt3            w=3
4             filt4 write()    w=4
5             filt1            read() op1
6             filt2            op2
7             filt3            op3
8             filt4 write()    op4
9             filt1            read() op1
10            filt2            op2
…             …                …
We have to guarantee the condition that every write()
comes before a read()

Small changes to design can
result in errors
 Increase (decrease) the number of operations in process 1 (2):
the same data will be consumed twice.
 Decrease (increase) the number of operations in process 1 (2):
data will be lost.
 If the number of initial wait operations in process 2 is too low,
we will use undefined data.
 If the number of initial wait operations in process 2 is too high,
we will lose the first data elements.

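Example of wrong wired
communication
[Figure: state diagrams of two processes connected by a wire. Process 1 cycles through filt1, filt2, filt3, and filt4 with write() in filt4; Process 2 cycles through read() op1 and op2 only.]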

The example results in
undesired behavior
Clock cycle   Process 1        Process 2
1             filt1            read() op1
2             filt2            op2
3             filt3            read() op1
4             filt4 write()    op2
5             filt1            read() op1
6             filt2            op2
7             filt3            read() op1
8             filt4 write()    op2
9             filt1            read() op1
10            filt2            op2
…             …                …
Adapt cycle budget or introduce handshake protocol

Simple handshake protocol is
more robust
 The flag “a” (ask) indicates that the receiver is ready to read
data in the next cycle.
 The flag “r” (ready) indicates that data has been written.
 Safe communication requires at least two cycles.

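Simple handshake protocol is
more robust
[Figure: state diagrams of two processes connected by the flags r and a. The sender runs filt1 (r=0), filt2, and filt3, then waits while !a; if (a==1) it executes filt4, write(), and sets r=1. The receiver keeps a=1 and waits while !r; if (r==1) it executes read(), op1, and sets a=0, followed by op2 with a=1. Initial values: a=1, r=0.]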

… and effectively synchronizes
the communication
Clock cycle   Sender                 Receiver
1             r=0  filt1             a=1
2             r=0  filt2             a=1
3             r=0  filt3             a=1
4             r=1  filt4 write()     a=1
5             r=0  filt1             a=0  read() op1
6             r=0  filt2             a=1  op2
7             r=0  filt3             a=1
8             r=1  filt4 write()     a=1
9             r=0  filt1             a=0  read() op1
10            r=0  filt2             a=1  op2
…             …                      …

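… also when receiver is slower
than transmitter
[Figure: state diagrams as before, now with a two-state sender (filt1 with r=0; then, if (a==1), filt2, write(), and r=1, otherwise waiting while !a) and a three-operation receiver (if (r==1): read(), op1, a=0; then op2; then op3 with a=1; waiting while !r). Initial values: a=1, r=0.]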

… but then introduces one
extra wait cycle at the receiver
Cycle   Sender                 Receiver
1       r=0  filt1             a=1
2       r=1  filt2 write()     a=1
3       r=0  filt1             a=0  read() op1
4       r=0                    a=0  op2
5       r=0                    a=1  op3
6       r=1  filt2 write()     a=1
7       r=0  filt1             a=0  read() op1
8       r=0                    a=0  op2
9       r=0                    a=1  op3
10      r=1  filt2 write()     a=1
…       …                      …
The extra wait cycle can be avoided by already putting a=1 during op2
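The ask/ready handshake of the preceding slides can be explored with a small cycle-based simulation in plain C++. This is a sketch under assumptions, not the SystemC model of the exercises: the flags and the data are modeled as registers (a value written in one cycle is seen by the other side in the next cycle), the sender writes in its last stage, the receiver reads in its first stage and raises a again in its last stage, and all names (Wires, simulate_handshake, sender_work, receiver_work) are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Registered wires between the two processes.
struct Wires {
    bool ask = false;    // flag "a": receiver is ready to read
    bool ready = false;  // flag "r": data has been written
    int data = 0;
};

// Simulates `cycles` clock cycles and returns the tokens seen by the receiver.
// sender_work:   cycles per token at the sender (filt stages)
// receiver_work: cycles per token at the receiver (op stages)
std::vector<int> simulate_handshake(int sender_work, int receiver_work, int cycles) {
    assert(sender_work >= 2 && receiver_work >= 2);  // "at least two cycles"
    Wires cur;
    int sender_stage = 0;   // index of the current filt stage
    int receiver_busy = 0;  // op cycles still to do after a read()
    int next_value = 0;     // the sender transmits 0, 1, 2, ...
    std::vector<int> received;

    for (int c = 0; c < cycles; ++c) {
        Wires nxt = cur;  // both processes read cur and write nxt

        // Sender: compute, and write only if the receiver asked for data.
        nxt.ready = false;
        if (sender_stage < sender_work - 1) {
            ++sender_stage;
        } else if (cur.ask) {
            nxt.data = next_value++;
            nxt.ready = true;
            sender_stage = 0;
        }  // otherwise stall in the last stage

        // Receiver: read when data is ready, then finish the remaining ops.
        if (receiver_busy == 0) {
            if (cur.ready) {
                received.push_back(cur.data);
                nxt.ask = false;
                receiver_busy = receiver_work - 1;
            } else {
                nxt.ask = true;
            }
        } else {
            --receiver_busy;
            nxt.ask = (receiver_busy == 0);  // ask again during the last op
        }
        cur = nxt;
    }
    return received;
}
```

With sender_work = 4 and receiver_work = 2 the sketch writes in cycles 4 and 8 and reads in cycles 5 and 9; with sender_work = 2 and receiver_work = 3 it writes in cycles 2, 6, and 10 and reads in cycles 3 and 7, as in the two cycle tables. In both cases every token is received exactly once and in order.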

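Most general protocol: 4-phase
handshake protocol
[Figure: state diagrams of the two sides of a 4-phase handshake with the signals Req, Ack, and Data. One side sets Req=1, waits for Ack, gets the data, sets Req=0, and waits for Ack=0; the other side waits for Req, puts the data, sets Ack=1, waits for Req to return to 0, sets Ack=0, and executes.]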
Multiple variations on these
handshake protocols exist
 Instead of signal levels, the protocol can be based on signal
transitions.
 The protocol can be simplified if both systems run on the same
clock.
 Protocols can be simplified if one knows that the receiver or
the transmitter is fastest.
 Synchronization can be performed on the basis of a block:
 Set-up communication for first element of a block
 Next, communicate every cycle
 Some protocols are based on typical FIFO signals: full and
empty.

In some cases buffered
communication is required
[Figure: process1 and process2 connected by the two queues Q1 and Q2.]
Cycle   process1     process2
1       write(Q1)
2       write(Q1)
3       write(Q2)
4                    read(Q2)
5                    read(Q1)
6                    read(Q1)
Queue size can be determined by monitoring the maximum
number of elements in a queue during simulation.

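Queues must be introduced
explicitly in hardware
[Figure: Process1 and Process2 are connected through a FIFO process of size N, controlled by an fsm; both sides of the FIFO process use the wired handshake protocol with the flags r and a.]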

Several communications can
also be multiplexed on a bus
[Figure: Process1, Process2, Process3, and Process4 are connected to a shared bus with an arbiter; each process uses the r and a handshake flags.]
Bus and arbiter classes
can be reused!

results in behavioral model
 Model that defines the relative ordering of inputs and outputs
 A clock signal is used for ordering
 Pins are accurate to the final implementation
 Internal resources are not mapped on clock cycles
(scheduling) and functional units (resource binding)

the JPEG decoder

In SystemC behavioral models
use (clocked) threads
 Modeled with thread processes SC_THREAD or with clocked thread
processes SC_CTHREAD
 Every module has a clock input:
 sc_in_clk clk;
 The SC_THREAD process is made static sensitive to a clock edge
 sensitive << clk.pos();
 To separate clock cycles wait() statements are used.
 A synchronous or asynchronous reset signal can be specified:
 reset_signal_is(reset, true);
 async_reset_signal_is(reset, true);
 Simulation must be run for a finite time (or will not stop!) or halted
explicitly.

Behavioral models communicate
via standard signals
 All inputs and outputs are standard signals
 Define signals with:
 sc_signal<T> a;
 Predefined ports for sc_signal<T> channels:
 sc_in<T> with interface function read() or assignment operator.
 sc_out<T> with interface function write() or assignment operator.
 sc_inout<T> that combines both interface functions.

Clocks in SystemC
 Create clock
 sc_clock clock1 ("clock_label", period, time_unit, duty_ratio, offset, offset_time_unit, first_value);
 sc_clock clock2 ("clock_label", period, time_unit, duty_ratio);
 sc_clock clock3 ("clock_label", period, time_unit);
 Clock Binding
• f1.clk( clock1 );
 Clocks are typically defined in sc_main();
 Example
sc_clock clock1 ("clock1", 20, SC_NS, 0.5, 2, SC_NS, true);
[Figure: clock waveform with edges at 2, 12, 22, 32, and 42 ns; the first edge, at 2 ns, is rising.]

Example: summing 3 values on
an input
SC_MODULE(sum3) {
    sc_in_clk CLOCK;
    sc_in<bool> RESET;
    sc_in<unsigned> A;
    sc_out<unsigned> D;
    void compute();
    SC_CTOR(sum3) {
        SC_CTHREAD(compute, CLOCK.pos());
        reset_signal_is(RESET, true);
    }
};
void sum3::compute() {
    unsigned tmp;
    // reset section
    while (true) { // main loop
        tmp = A.read();
        wait(); // end first I/O cycle
        tmp += A.read();
        wait(); // end second I/O cycle
        tmp += A.read();
        D.write(tmp);
        wait(); // end third I/O cycle
    }
}

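Gradual Communication
refinement (1/2)
[Figure: two clocked behavioral processes (Behavioral_process1, Behavioral_process2) connected by the r/a handshake; converters link the handshake signals to the queues Q1 and Q2.]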

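Gradual Communication
refinement (2/2)
[Figure: successive refinement steps. Functional Process1 is connected through queue Q1 and converter C1 to the clocked behavioral Process2 with the r/a handshake; in the final step behavioral Process1 and behavioral Process2 are connected directly by the r/a handshake and share the clock.]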

Converter SystemC code
template <class T> SC_MODULE(FF2P) {
sc_fifo_in<T> input;
sc_out<T> output;
sc_in<bool> ask;
sc_out<bool> ready;
sc_in_clk clk;
SC_CTOR(FF2P) {
SC_THREAD(process);
sensitive << clk.pos();
}
void process() {
T value;
enum ctrl_state {READINPUT, WRITEOUTPUT};
ctrl_state state;
// reset cycle
ready.write(false); state = READINPUT; wait();
while(true) {
if (state == READINPUT) {
ready.write(false); value = input.read();
state = WRITEOUTPUT;
} else {
if (ask.read() == true) {
output.write(value); ready.write(true);
state = READINPUT;
} else {
ready.write(false); state = WRITEOUTPUT;
};
};
wait();
}
return;
}
};
template <class T> SC_MODULE(P2FF) {
sc_fifo_out<T> output;
sc_in<T> input;
sc_in<bool> ready;
sc_out<bool> ask;
sc_in_clk clk;
SC_CTOR(P2FF) {
SC_THREAD(process);
sensitive << clk.pos();
}
void process() {
T value;
enum ctrl_state {READINPUT, WRITEOUTPUT};
ctrl_state state;
// reset cycle
ask.write(true); state = READINPUT; wait();
while(true) {
if (state == READINPUT) {
if (ready.read() == true) {
value = input.read(); ask.write(false);
output.write(value); state = WRITEOUTPUT;
} else {
ask.write(true); state = READINPUT;
};
} else {
ask.write(true); state = READINPUT;
};
wait();
}
return;
}
};

the JPEG decoder

Exercise 3: communication
refinement for the JPEG decoder
 Goal: Replace the FIFO between the run-length encoder and decoder by
a handshake protocol
 You will find:
 In /exercises/exercise3/: solution of exercise2
 In /exercises/modules/: JPEG decoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules
{src,snk,test}.{h,cpp}
 In /exercises/add2systemc/: FIFO to protocol conversion functions
{FF2P, P2FF}.h
 Introduce a handshake protocol between rl_enc and rl_dec.
 introduce refined versions of rl_dec in jpeg_dec.h and main.cpp.
 simulate and verify correct operation.

Model Based System
Design
Class 4: computation
refinement
Marc Engels
e-mail:
marc.engels@flandersmake.be

Computation refinement in
SystemC
 Computation refinement
 Computation refinement in SystemC
 Exercise 4: computation refinement of a JPEG decoder

RTL refinement is the 3rd step in
architectural design
[Figure: the three refinement axes from system functionality to system architecture. Computations: operations -> operators (resource allocation, scheduling). Data: variables, arrays, floating point -> memories, fixed point (memory allocation, address generation, word sizing). Communication: point-to-point, queues -> busses, detailed protocol (bus allocation, introduce arbiters, include protocols).]

For synthesis all blocks need
to be transformed to RTL
 Transformation is a gradual refinement process
 replace a behavioral block by an RTL block
 verify by system simulation
[Figure: a SYSTEM inside a TESTBENCH, built from the sub-systems S1, S2, and S3 with the blocks func1, beh2/RTL2, beh3/RTL3, and beh4/RTL4; behavioral blocks are replaced by their RTL versions one at a time.]

Behavioral model can be
represented by an FSM
Process_behavioral{// SC_CTHREAD
ask.write(TRUE);
while (ready.read() == FALSE) {wait();}
wait();
while(TRUE) {
ask.write(FALSE);
x = input.read();
wait();
d = x * b1;
y = d * b2;
output.write(y);
ask.write(TRUE);
while (ready.read() == FALSE)
{wait();}
wait();
}
}
[Figure: the equivalent FSM with the state actions "ask=1"; "ask=0, x=input"; and "d = x * b1, y = d * b2, output = y, ask=1"; the transitions are guarded by ready and !ready.]

Behavioral to RTL: scheduling of
operations in FSM
[Figure: the FSM of the behavioral model (state actions "ask=1"; "ask=0, x=input"; "d = x * b1, y = d * b2, output = y, ask=1") next to the rescheduled FSM (state actions "ask=1"; "ask=0, x=input, d = x * b1"; "ask=1, y = d * b2, output = y"); the transitions are guarded by ready and !ready.]

Rescheduled FSM is
represented in RTL code
[Figure: the rescheduled FSM with the state actions "ask=1"; "ask=0, x=input, d = x * b1"; and "ask=1, y = d * b2, output = y"; the transitions are guarded by ready and !ready.]
Process_RTL{// SC_CTHREAD
ask.write(TRUE);
wait();
while(TRUE) {
ask.write(FALSE);
x = input.read();
d = x * b1;
wait();
ask.write(TRUE);
y = d * b2;
output.write(y);
{wait();}
wait();
}
}

RTL description corresponds to
a datapath
[Figure: possible mapping onto a datapath with one multiplier (*), the operands b1 and b2, registers (D Q) for x, d, and y, the constants 1 and 0 for ask, and the ready input.]
RT description introduces synthesis
decisions:
register inference
resource sharing
parallelism
Process_RTL{// SC_CTHREAD
ask.write(TRUE);
wait();
while(TRUE) {
ask.write(FALSE);
x = input.read();
d = x * b1;
wait();
ask.write(TRUE);
y = d * b2;
output.write(y);
{wait();}
wait();
}
}

… and a controller
[Figure: a controller, consisting of a state register, a next-state function, and an output function, next to the datapath. The control signals steer the register transfers in the datapath and status signals return to the controller; both blocks have inputs and outputs. Further labels: ready, ins0, ins1, ins2, c0, c1, c2.]

115
Critical path of combinatorial
logic is crucial
[Figure: a clocked process containing combinatorial logic (multiplexers, adders, multipliers, etc.) between in and out; timing diagram of clock, in, calc and out, with the critical path marked on calc]

116
Pipelining reduces the critical
path
[Figure: area versus data insertion interval (DII) for chains of adders (+). Design points: non-pipelined (DII = critical path), word pipelined and bit pipelined (lsb to msb). DII labels shown for the pipelined variants: operator delay, operator delay/2, 1-bit operator delay]

117
Multiplexing reduces the area
of the solution
[Figure: area versus data insertion interval (DII). Design points: non-pipelined (DII = critical path), muxed DSP (DII = 2 x critical path), processor architecture, e.g. VLIW]

118
Real embedded systems show
architectural variability
E.g. Robot Vision System
[Figure: processing chain CCD camera, line delay, Sobel operator, edge detector, feature extractor, pattern recognizer, robot controller, shown together with three architecture styles: array type (modular array of processing elements with global control and communication), muxed DP's (hardwired control, memories, data path) and µcoded processor (µ-code ROM, PC logic, µ-code control, RAM, programmable function units, off-chip memory)]

119
Area can be estimated at a
high level
Total circuit = Control Unit (CU) + Datapath (DP)
Component              Area is a function of
CU: State_reg          # states
CU: logic              # states, # ctrl_lines, # states each ctrl_line is active
DP: Storage            # bits and # words of each storage
DP: func_units (FU)    # bits and type of each FU
DP: Muxes              # sources of muxes
DP: wires              # DP connections, # DP components
Source: Gajski

120
Standard cell data can be
used to derive parameters
type               name       width
2 input MUX        mxi2v0x1   3.08
2 input NMUX       mxn2v0x1   3.52
2 input AND        an2v4x2    2.20
2-bit half adders  ha2v0x2    5.28
Q flip-flop        dfnt1v0x2  7.92
…                  …          …
Source: www.vlsitechnology.org
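As a rough illustration of how such cell data feeds a high-level area estimate, the sketch below uses two widths from the table. The cost model is a hypothetical first-order assumption (an n-bit register is n flip-flops, and a k-source multiplexer is a tree of k-1 two-input multiplexers per bit), and the widths are treated as relative area units. Control logic, functional units and wiring are ignored.

```cpp
#include <cassert>

// Cell widths taken from the table above, used here as relative area units.
constexpr double kMux2Width = 3.08;  // 2 input MUX, mxi2v0x1
constexpr double kDffWidth  = 7.92;  // Q flip-flop, dfnt1v0x2

// Assumed model: an n-bit register costs n flip-flops.
double register_area(int bits) {
    assert(bits >= 0);
    return bits * kDffWidth;
}

// Assumed model: a multiplexer with `sources` inputs is built per bit
// as a tree of (sources - 1) two-input multiplexers.
double mux_area(int bits, int sources) {
    assert(bits >= 0 && sources >= 1);
    return bits * (sources - 1) * kMux2Width;
}
```

For example, under these assumptions a 16-bit register costs 16 x 7.92 = 126.72 units and an 8-bit, 4-source multiplexer costs 8 x 3 x 3.08 = 73.92 units.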

121
Storage: Registers vs. memories
Registers:
 Inferred by synthesis.
 Larger size per storage bit.
 No overhead.
 Fast & parallel.
 Best < 1 kbits storage
Memories:
 Not synthesized, but created by memory generators.
 Smaller size per storage bit.
 Fixed overhead.
 Slow & serial
 Best > 1 kbits storage

122
SystemC

123
RTL design is modeled with modules
and processes
An sc_module is an identifiable hardware unit.
A module can contain multiple processes that run in parallel.
Signals are used to communicate between (executions of) processes.
Variables are used inside a single execution of a process.

124
Restrictions (1/2) in SystemC
Synthesizable Subset (draft 1.3)
 Modules
 Exactly one constructor.
 Processes
 Only SC_CTHREAD and SC_METHOD are supported;
SC_THREAD is not supported.
 In an SC_CTHREAD there must be a wait() statement before
the infinite loop or as the first statement in this loop.
 At most one clock signal is allowed per process.
 The reset behavior is specified in the process, not in the
constructor of the modules.
 Between two clock events, at most one assignment to a
signal is supported.
 Processes communicate through signals, not shared
variables.

125
Restrictions (2/2) in SystemC
Synthesizable Subset (draft 1.3)
 Datatypes:
 No floating point.
 Char is implemented as signed char, all integer types are
2’s complement.
 Pointers are not supported.
 Untemplated fixed point types are not supported.
 No division operator for fixed point types.
 No global variables but global constants are OK.
 Functions:
 No new(), delete() and sizeof() functions.
 Destructors have no effect.
 Exception handling is not supported.

126
Example: relation Synthesizable
SystemC and VHDL
SystemC
#include "systemc.h"
SC_MODULE(dff) {
  sc_in<bool> din;
  sc_in_clk clock;
  sc_out<bool> dout;
  void doit(); // Member function
  SC_CTOR(dff) {
    SC_CTHREAD(doit, clock.pos());
  }
};
void dff::doit() { // Process body
  while (true) {
    wait();
    dout.write(din.read());
  }
}
VHDL
entity dff is
  port ( din, clock : in bit; dout : out bit );
end dff;
architecture dff of dff is
begin
  doit : process(clock) -- Sensitivity List
  begin
    if (clock'event and clock='1') then
      dout <= din;
    end if;
  end process;
end dff;

127
Signals for communication
between processes
 Declaration
 Scalar Signal: sc_signal<sc_uint<32 > > a;
 Vector Signal: sc_signal<sc_logic> a[32];
 Signals use request-update mechanism: write takes effect after a delta-cycle
 When you assign a value to a signal or port, the value on the right side is
not transferred to the left side until the process halts. This means that the
signal value as seen by other processes is not updated immediately, but it
is deferred.
 When you assign a value to a variable, the value on the right side is
immediately transferred to the left side of the assignment statement.
 SystemC supports resolved ports and signals
 Multi-valued logic type: 0, 1, Z, X
 Allow multiple drivers
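The difference between a signal and a variable can be illustrated with a small stand-in written in plain C++. ToySignal and swap_via_signals are hypothetical names for this sketch. This is not the SystemC sc_signal class, and the explicit update() call plays the role that the simulation kernel plays in SystemC.

```cpp
#include <cassert>

// Minimal model of a signal with request-update semantics:
// write() only requests a new value, update() makes it visible to readers.
template <typename T>
class ToySignal {
public:
    explicit ToySignal(T init = T()) : current_(init), next_(init) {}
    T read() const { return current_; }      // always the value before the pending update
    void write(const T& v) { next_ = v; }    // request: deferred
    bool update() {                          // update: returns true if the value changed
        bool changed = !(current_ == next_);
        current_ = next_;
        return changed;
    }
private:
    T current_;
    T next_;
};

// Two "processes" exchange values in the same evaluation phase.
// Because writes are deferred, no temporary variable is needed.
inline void swap_via_signals(ToySignal<int>& a, ToySignal<int>& b) {
    assert(&a != &b);
    a.write(b.read());
    b.write(a.read());  // still reads the old value of a
    a.update();
    b.update();
}
```

With ordinary variables the second assignment would already see the new value of a, so both would end up equal. With deferred writes the two values are exchanged, which is the behavior of two registers clocked together.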

128
Signals can infer registers
w = x
y1 = w * 10
z = x // writing at the end of cycle
wait()
y2 = z * 10 // reading at the beginning of cycle
Simulation (one column per clock cycle):
x    1   2   3   x
y1   10  20  30  x
z    x   1   2   3
y2   x   10  20  30
Synthesis:
[Figure: x is multiplied by 10 to give y1, with w as a plain connection; x also feeds a D flip-flop (z) driven by clock, whose output is multiplied by 10 to give y2]

129
Random Access Memory is
modeled with a behavioral model
// ram_asyn.h - asynchronous RAM
#include "systemc.h"
SC_MODULE(ram_asyn) {
  sc_in<sc_uint<6> > addr;
  sc_in<bool> rwb;
  sc_in<int> datain;
  sc_out<int> dout;
  int memdata[64]; // local memory storage
  void ramaction();
  SC_CTOR(ram_asyn) {
    SC_METHOD(ramaction);
    sensitive << addr << datain << rwb;
    for (int i = 0; i < 64; i++) { memdata[i] = 0; }
  }
};
[Figure: asynchronous RAM (64) block with inputs address, datain and rwb, and output dataout]
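The slide declares ramaction() but does not show its body. The plain C++ model below sketches one possible behavior, under assumptions that are not confirmed by the slides: rwb is taken as read/write-bar (true = read, false = write), and on a write the output simply reflects the stored word. ToyAsyncRam and access() are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Plain C++ model of a 64-word asynchronous RAM (no SystemC types).
struct ToyAsyncRam {
    std::array<int, 64> memdata{};  // local memory storage, zero-initialized

    // Called whenever addr, datain or rwb changes; returns the value driven on the output.
    // Assumption: rwb == true means read, rwb == false means write.
    int access(std::size_t addr, bool rwb, int datain) {
        assert(addr < memdata.size());
        if (!rwb) {
            memdata[addr] = datain;  // write cycle
        }
        return memdata[addr];        // read cycle, or write-through on a write
    }
};
```

In the SystemC module the same logic would read addr, rwb and datain from the input ports and write the result to dout.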

130
SystemC has a 4-step
simulation engine
1: Initialize
2: Iterative execution of
functional, behavioral & RTL
processes until no activity
3: Update primitive channels
4: Go back to 2
[Figure: processes Functional1, behav2 and RT3 connected through queues q1, q3 and q4 and signals s1, s2, s3 and s4, with P2FF and FF2P adapters in between]
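Steps 2 to 4 can be sketched as an evaluate/update loop in plain C++. ToyKernel is a hypothetical, heavily simplified model: it has no notion of time, events or sensitivity lists, and it simply re-runs every process in each round until no signal value changes any more.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy evaluate/update kernel over integer signals with deferred writes.
struct ToyKernel {
    std::vector<int> cur;  // values visible to processes
    std::vector<int> nxt;  // values requested by write()
    std::vector<std::function<void(ToyKernel&)>> processes;

    explicit ToyKernel(std::size_t n) : cur(n, 0), nxt(n, 0) {}

    int read(std::size_t i) const { return cur.at(i); }
    void write(std::size_t i, int v) { nxt.at(i) = v; }

    // Repeats evaluate (step 2) and update (step 3) until there is no activity.
    // Returns the number of rounds in which some signal changed.
    int settle(int max_deltas = 100) {
        int deltas = 0;
        bool activity = true;
        while (activity) {
            assert(deltas < max_deltas);          // guard against loops that never settle
            for (auto& p : processes) p(*this);   // evaluate: processes only see cur
            activity = (cur != nxt);              // update: commit requested values
            cur = nxt;
            if (activity) ++deltas;
        }
        return deltas;
    }
};
```

As a usage example, consider a chain of two processes, where the first computes s1 = s0 + 1 and the second computes s2 = s1 * 2. A new value on s0 needs several rounds before s2 is stable, because each deferred write becomes visible only after the following update.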

131
Measuring performance
 const sc_time& sc_time_stamp(): returns the current time
during simulation.
 Following functions are defined for sc_time:
 double to_seconds(): converts the time into seconds
 void print(): prints the time on the screen
 If the clock period is known, the number of clock cycles can
be calculated.
 Throughput ≥ Datarate/Simulation_time
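The cycle count and a sample rate can be derived from the simulated time with simple arithmetic. The helpers below are a sketch with hypothetical names. Times are passed as integer picoseconds to avoid rounding, whereas in a SystemC testbench the elapsed time would come from sc_time_stamp(), for example through to_seconds().

```cpp
#include <cassert>
#include <cstdint>

// Number of whole clock cycles that fit in the elapsed simulation time.
std::uint64_t elapsed_cycles(std::uint64_t elapsed_ps, std::uint64_t clock_period_ps) {
    assert(clock_period_ps > 0);
    return elapsed_ps / clock_period_ps;
}

// Processed samples per second of simulated time.
double samples_per_second(std::uint64_t samples, std::uint64_t elapsed_ps) {
    assert(elapsed_ps > 0);
    const double elapsed_s = static_cast<double>(elapsed_ps) * 1e-12;
    return static_cast<double>(samples) / elapsed_s;
}
```

For instance, 1 µs of simulated time with a 10 ns clock corresponds to 100 cycles, and 64 samples processed in that time give 64 million samples per second.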

132
Dump signals for wave plotting
sc_signal< sc_int<32> > signal1;
sc_signal<bool> signal2;
sc_trace_file *tracefile;
tracefile = sc_create_vcd_trace_file(tracefilename);
sc_trace(tracefile, signal1, "signal1");
sc_trace(tracefile, signal2, "signal2");
sc_close_vcd_trace_file(tracefile);

133
SystemC

134
How to start?
 Goal: refine run-length decoder in RTL model.
 You will find:
 In /exercises/exercise4/ : solution of exercise3
 In /exercises/modules/: JPEG-encoder modules
{b2r,idct,normalize,zz_dec}.{h,cpp} and test bench modules {src,snk,test}.{h,cpp}
 In /exercises/add2systemc: behavioral RAM models.
 Make RTL model of run-length decoder.
 draw FSM of the RTL model.
 introduce the RTL model in jpeg_dec.h and integrate in main.cpp.
 simulate and verify correct operation with gtkwave viewer.
 Estimate the needed hardware for this RTL model.

Editor's Notes