In these slides, I introduce how I implemented the RSA256 algorithm in Verilog and verified it with Verilator.
The project uses C++ to build the C model and the SystemC model.
To help build the models, we created a C++ class, vint, to simulate the behavior of Verilog. It supports the normal Verilog operations under stricter rules.
The SystemC model can be translated directly into Verilog, so the intent of the Verilog design stays clear and concise.
To simplify simulation, we limit each module to one input port and one output port. Each port uses the valid/ready protocol to control the data flow, which can be modeled as an sc_fifo in SystemC.
With these abstractions, we can easily implement unit tests for all of our modules and make sure they behave as intended.
----
Please access the source code at:
https://github.com/yodalee/rsa256
COSCUP2023 RSA256 Verilator.pdf
1. Robust Verilog Testing Using Verilator & SystemC & C++17
Take RSA256 as an example
Yodalee <lc85301@gmail.com>
Yu-Sheng Lin <johnjohnlys@gmail.com>
3. Outline
● Verilator is a good, open-source SystemVerilog (SV) simulation tool
○ Verilator compiles SystemVerilog into a C++ class
○ Controlling the signals in a C++ testbench is tedious
● Arrays and structs can simplify SV coding
○ Use C++17 to build structs and arrays that work the same as in SV
● Design patterns for signals using SystemC
○ Unify the control interface
○ Map them to SystemC sc_fifo
● Case study: RSA 256
4. Why Verilator
● Open-source SystemVerilog simulation tool.
○ https://www.veripool.org/verilator/
○ https://github.com/verilator/verilator/
● Fast
● Decent SV support
● Free license.
○ Enables massively parallel simulation.
○ Suitable for CI.
5. How Verilator works
Verilog:
module Mod(
  input [7:0] i_data,
  output output_ok,
  output [12:0] o_data [2]
);
Generated C++:
struct VMod {
  u8 i_data;
  u8 output_ok;
  u16 o_data[2];
};
User C++ testbench:
VMod m;
m.i_data = 45;
while (not m.output_ok) {}
EXPECT_EQ(m.o_data[0], 30);
Simulation Binary:
./run.exe
Gtest: 100 != 30
6. Challenges
● Too many signals to control → Solution: use struct
● Bitwidth information loss
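The bitwidth loss can be illustrated with a small sketch. The `VMod` struct below is a hand-written stand-in that mirrors the generated struct shown on the previous slide; it is not real Verilator output, and `mask13` and `fits_in_port` are hypothetical helpers introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written stand-in for the generated class (assumption, not Verilator output).
struct VMod {
    std::uint8_t  i_data;     // input  [7:0]
    std::uint8_t  output_ok;  // 1-bit output, stored in 8 bits
    std::uint16_t o_data[2];  // output [12:0], stored in 16 bits
};

// Hypothetical helper: keep only the 13 bits that the port really has.
constexpr std::uint16_t mask13(std::uint16_t v) {
    return static_cast<std::uint16_t>(v & 0x1FFF);
}

// The C++ type happily holds values that cannot exist on a 13-bit wire,
// so the testbench has to check the width itself.
constexpr bool fits_in_port(std::uint16_t v) { return v == mask13(v); }

inline void example_bitwidth_loss() {
    VMod m{};
    m.o_data[0] = 0x2000;               // bit 13 set: compiles without complaint
    assert(!fits_in_port(m.o_data[0])); // but it is not a legal 13-bit value
    assert(mask13(m.o_data[0]) == 0);
}
```

Nothing in the `u16` type records that only 13 bits are meaningful, which is the gap the custom integer type in the following slides is meant to close.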
7. Use Struct to Simplify SV & Challenges
Verilog:
module Mod(
  input [7:0] i_data,
  input [13:0] i_data2,
  input i_data3
);
Rewrite as SystemVerilog (struct support: Verilator v5.000+ [1]):
typedef struct {
  logic [7:0] data;
  logic [13:0] data2;
  logic data3;
} ModIn;
module Mod(
  input ModIn i
);
Generated C++ (bitwidth information loss, struct information loss [2]):
class VMod {
  u32 i;
};
[1] We use v5.006 right now.
[2] Verilator has many issues and PRs about this.
8. We need int/array/struct that works for SV and C++
● Challenge: Information loss when converting from SV to Verilator C++
○ Bitwidth & struct information
● Why?
○ The C++ interface that Verilator generates is not standardized.
● Solution: Abstraction
○ Build an SV-compatible type system with C++17
typedef struct {
logic [7:0] data;
logic [13:0] data2;
logic data3;
} ModIn ;
class ModIn {
u32 i;
};
User code Adaptor
Generated C++
SV-C++ interface
9. 3 weapons mimicking the SV typing system
● vuint<11>
○ logic [10:0] sig;
○ Replace sc_uint
● varray<vuint<11>, 3, 4>
○ logic [10:0] sig [3][4];
○ std::array with multiple dimensions
● vstruct (macro)
○ typedef struct packed { ... } iStruct;
○ C++17 based type reflection struct supporting $bits, $pack
(Examples follow later.)
10. Arbitrary bitwidth integer
● Why reinvent sc_uint<int>?
○ sc_uint has virtual functions, so we cannot memcpy it
○ sc_uint must link against libsystemc
○ You need sc_biguint<int> for wide integers
● vuint is strictly typed
○ vuint<10> == vuint<11> is disallowed, as required by many lint tools
● C++11 type deduction and variadic templates
○ auto val = Concat(vuint<4>, vuint<99>, vuint<2>)
vuint<11> v
=
logic [10:0] v
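A minimal sketch of the idea follows. It is not the project's vuint: it is an illustration under the assumption that 64 bits is enough (the real type is described as supporting arbitrary widths such as vuint<99>), and the `comparable` trait is a hypothetical helper used only to show that mixed-width comparison does not compile.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Illustrative fixed-width unsigned integer, limited to 1..64 bits.
template <unsigned N>
class vuint {
    static_assert(N >= 1 && N <= 64, "this sketch supports 1..64 bits");
public:
    static constexpr unsigned width = N;
    static constexpr std::uint64_t mask = ~std::uint64_t{0} >> (64 - N);

    constexpr vuint() = default;
    constexpr explicit vuint(std::uint64_t v) : value_(v & mask) {}
    constexpr std::uint64_t value() const { return value_; }

    // Only equal widths can be compared; there is no implicit conversion,
    // so vuint<10> == vuint<11> finds no matching operator.
    friend constexpr bool operator==(vuint a, vuint b) { return a.value_ == b.value_; }

private:
    std::uint64_t value_ = 0;
};

// Concat of two values: the result width is deduced as the sum of the widths.
template <unsigned A, unsigned B>
constexpr vuint<A + B> Concat(vuint<A> hi, vuint<B> lo) {
    return vuint<A + B>((hi.value() << B) | lo.value());
}

// Concat of three or more values, folded left to right.
template <unsigned A, unsigned B, unsigned C, unsigned... Rest>
constexpr auto Concat(vuint<A> a, vuint<B> b, vuint<C> c, vuint<Rest>... rest) {
    return Concat(Concat(a, b), c, rest...);
}

// Hypothetical trait: true only if `A == B` is a valid expression.
template <class A, class B, class = void>
struct comparable : std::false_type {};
template <class A, class B>
struct comparable<A, B,
    std::void_t<decltype(std::declval<A>() == std::declval<B>())>> : std::true_type {};

static_assert(comparable<vuint<10>, vuint<10>>::value, "same width compares");
static_assert(!comparable<vuint<10>, vuint<11>>::value, "mixed width is rejected");
```

With this sketch, `auto val = Concat(vuint<4>(0xA), vuint<8>(0xBC), vuint<4>(0xD));` deduces `vuint<16>` without the width being written out by hand.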
11. Array
● Just like std::array, but multi-dimensional.
● Also supports arrays of structs.
varray<vuint<11>,2,3> v
=
logic [10:0] v [2][3];
or
logic [0:1][0:2][10:0] v;
We treat them the same.
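One way such a multi-dimensional alias could be built is by nesting std::array with a variadic template. This is a sketch under that assumption, not the project's varray; `varray_impl` is a hypothetical helper name, and `std::uint16_t` stands in for vuint<11> to keep the example self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Helper that peels off one dimension per recursion step.
template <class T, std::size_t... Dims>
struct varray_impl;

// No dimensions left: the element type itself.
template <class T>
struct varray_impl<T> { using type = T; };

// Outermost dimension D0 wraps an array of the remaining dimensions.
template <class T, std::size_t D0, std::size_t... Rest>
struct varray_impl<T, D0, Rest...> {
    using type = std::array<typename varray_impl<T, Rest...>::type, D0>;
};

template <class T, std::size_t... Dims>
using varray = typename varray_impl<T, Dims...>::type;

// varray<T, 2, 3> is indexed v[i][j] with i < 2 and j < 3, like `v [2][3]` in SV.
static_assert(std::is_same<varray<int, 2, 3>,
                           std::array<std::array<int, 3>, 2>>::value,
              "nested std::array");

inline void example_varray() {
    varray<std::uint16_t, 2, 3> v{};  // stand-in for varray<vuint<11>, 2, 3>
    v[1][2] = 0x7FF;
    assert(v.size() == 2 && v[0].size() == 3);
    assert(v[1][2] == 0x7FF);
}
```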
12. vstruct (macro) : Verilog struct
● C++17 magic!
● We want to fully utilize C++ standard.
● In SystemVerilog:
○ $bits(oStruct) == 36
○ logic [$bits(oStruct)-1:0] v;
● Our C++ API allows us to:
○ vuint<bit<oStruct>> value;
○ vuint<36> value = packed(oStruct);
○ Also supports: unpack, JSON printing, Verilator I/O
struct iStruct {
vuint<3> a;
vuint<10> b;
};
struct oStruct {
vuint<10> sig;
varray<iStruct, 2> c;
};
13. vstruct enabled by type reflection (oversimplified)
● All you need is 1 line of macro for every struct.
● Based on boost::hana, you can loop through the struct...
struct oStruct {
  vuint<10> sig;
  varray<iStruct, 2> c;
  MY_MACRO(sig, c);
};
Expand to... (pseudocode)
auto& get<0>() { return sig; }
auto& get<1>() { return c; }
constexpr unsigned N = 2;
Implement (pseudocode)
bit<T>:
  unsigned b = 0;
  for (int i = 0; i < T::N; ++i)
  {
    b += bit<get<i>()>;
  }
  return b;
packed(t):
  return Concat(
    packed(get<i>()) for i in range(N)
  );
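The pseudocode above can be approximated in plain C++17 without boost::hana. The sketch below is an assumption-laden illustration, not the project's implementation: `VSTRUCT_MEMBERS`, `bits_of` and `pack` are hypothetical names, `vuint` is reduced to a bare wrapper, std::array stands in for varray, the packed result is limited to 64 bits, and placing the first member (and array element 0) in the most significant bits is a choice made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <utility>

template <unsigned N> struct vuint {          // bare stand-in, N < 64
    static_assert(N >= 1 && N < 64, "sketch only");
    std::uint64_t v = 0;
};

// Hypothetical one-line macro: list the members once per struct.
#define VSTRUCT_MEMBERS(...) \
    auto members() const { return std::tie(__VA_ARGS__); }

struct iStruct { vuint<3> a; vuint<10> b; VSTRUCT_MEMBERS(a, b) };
struct oStruct { vuint<10> sig; std::array<iStruct, 2> c; VSTRUCT_MEMBERS(sig, c) };

// bits_of<T>: structs defer to the tuple returned by members().
template <class T>
struct bits_of : bits_of<decltype(std::declval<const T&>().members())> {};
template <unsigned N>
struct bits_of<vuint<N>> : std::integral_constant<unsigned, N> {};
template <class T, std::size_t K>
struct bits_of<std::array<T, K>>
    : std::integral_constant<unsigned, static_cast<unsigned>(K) * bits_of<T>::value> {};
template <class... Ts>
struct bits_of<std::tuple<Ts...>>
    : std::integral_constant<unsigned, (0u + ... + bits_of<std::decay_t<Ts>>::value)> {};

// pack: concatenate all fields, first member in the most significant bits.
template <class T> std::uint64_t pack(const T& s);   // structs, defined below

template <unsigned N>
std::uint64_t pack(const vuint<N>& x) { return x.v & ((std::uint64_t{1} << N) - 1); }

template <class T, std::size_t K>
std::uint64_t pack(const std::array<T, K>& arr) {
    std::uint64_t out = 0;
    for (const T& e : arr) out = (out << bits_of<T>::value) | pack(e);
    return out;
}

template <class T>
std::uint64_t pack(const T& s) {
    std::uint64_t out = 0;
    std::apply([&](const auto&... m) {
        ((out = (out << bits_of<std::decay_t<decltype(m)>>::value) | pack(m)), ...);
    }, s.members());
    return out;
}

static_assert(bits_of<iStruct>::value == 13, "3 + 10");
static_assert(bits_of<oStruct>::value == 36, "10 + 2 * 13");
```

The two static_asserts reproduce the `$bits(oStruct) == 36` figure from the previous slide: 10 bits for `sig` plus two 13-bit `iStruct` elements.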
15. How Verilator works
● How to drive the module?
● How to feed data?
● Verilog can have any kind of interface
16. The Valid/Ready Protocol
● Used in the ARM AXI and Intel Avalon specifications, among others
● Cycle 1: Sender sets valid to 0; there is no data to transfer.
● Cycle 2: Sender sets valid to 1 because it has data to transfer, but Receiver sets ready to 0, so Sender holds valid and data.
● Cycle 3: Receiver sets ready to 1, so the transfer completes; in the next cycle Sender can set valid to 0 or send the next data.
Sender → Receiver: Valid
Receiver → Sender: Ready
17. The Valid/Ready Protocol
● Sender should hold valid until Receiver accepts the data
● Sender should not change data before Receiver accepts it
● Receiver can freely set/reset ready
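The three rules can be sketched as a cycle-by-cycle software model. This is only an illustration of the handshake; `simulate_handshake` and its parameters are hypothetical names, and the sender here always offers data while it has any left.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Runs one cycle per entry of ready_pattern and returns what the receiver
// accepted. A transfer happens only in cycles where valid and ready are both 1.
std::vector<int> simulate_handshake(const std::vector<int>& to_send,
                                    const std::vector<bool>& ready_pattern) {
    std::vector<int> received;
    std::size_t next = 0;  // index of the item currently offered by the sender
    for (bool ready : ready_pattern) {
        const bool valid = next < to_send.size();  // sender has data to offer
        // While valid && !ready, `next` does not move: valid and data are held.
        if (valid && ready) {
            received.push_back(to_send[next]);
            ++next;  // only after acceptance may the sender change the data
        }
    }
    return received;
}
```

For example, sending {10, 20, 30} against the ready pattern 0,1,0,0,1,1,1 delivers all three values in order: the receiver's stalls delay the transfers but never drop or reorder data.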
18. The Valid/Ready Protocol
We can abstract the valid/ready protocol with SystemC sc_fifo
● sc_fifo full → Valid = 1, Ready = 0
● sc_fifo empty → Valid = 0, Ready = 1
Sender → Valid/Ready → Receiver becomes Sender → sc_fifo → Receiver
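The mapping between a bounded FIFO and the two handshake signals can be sketched in plain C++. `BoundedFifo` is a hypothetical stand-in written for this example, not SystemC's sc_fifo; it only shows that "not full" plays the role of ready on the producer side and "not empty" plays the role of valid on the consumer side.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical bounded FIFO used to illustrate the valid/ready mapping.
template <class T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {}

    bool ready() const { return q_.size() < capacity_; }  // producer side: room left
    bool valid() const { return !q_.empty(); }            // consumer side: data waiting

    // Non-blocking write: succeeds only when the FIFO is ready.
    bool try_write(const T& v) {
        if (!ready()) return false;
        q_.push_back(v);
        return true;
    }

    // Non-blocking read: succeeds only when the FIFO holds valid data.
    bool try_read(T& out) {
        if (!valid()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> q_;
};
```

With a capacity of 1, an empty FIFO reports valid = 0 and ready = 1, and after one write it reports valid = 1 and ready = 0, matching the two bullets above.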
19. SystemC Abstraction of Module
Limit module to be one input, one output.
Define input/output with structure or alias; use sc_fifo as the input/output interface.
SC_MODULE(Montgomery) {
  sc_in_clk clk;
  sc_fifo_in<MontgomeryIn> data_in;
  sc_fifo_out<MontgomeryOut> data_out;
  SC_CTOR(Montgomery) {
    SC_THREAD(Thread);
  }
  void Thread();
};
void Montgomery::Thread() {
  while (true) {
    MontgomeryIn in = data_in.read();
    KeyType a = in.a;
    KeyType b = in.b;
    KeyType round_result(0);
    …
    data_out.write(round_result);
  }
}
20. Translate SystemC to Verilog
The SystemC module is fully tested, then translated to Verilog.
Keep the module simple, otherwise it will be difficult to translate.
SystemC:
void Montgomery::Thread() {
  while (true) {
    MontgomeryIn in = data_in.read();
    KeyType a = in.a;
    KeyType b = in.b;
    KeyType round_result(0);
    …
    data_out.write(round_result);
  }
}
Verilog (sc_fifo translates to valid/ready and optional data):
module Montgomery(
  // input data
  input i_valid,
  output i_ready,
  input MontgomeryIn i_in,
  …
);
always_ff @(posedge clk) begin
  if (i_ready && i_valid) begin
    a <= {2'b0, i_in.a};
    b <= {2'b0, i_in.b};
    round_result <= 'b0;
  end
end
…
21. Verilog Testbench
● Wrap the DUT class generated by Verilator
○ Assume that the DUT has i_valid/i_ready and o_valid/o_ready
○ The testbench generates the clock with sc_clock
○ The driver/monitor implement before_clk and after_clk to control valid/ready
Testbench: Driver → (i_valid/i_ready) → DUT → (o_valid/o_ready) → Monitor
22. Test Methodology
● At least 2 sets of input/output data
○ More is better
○ Never just test 1 set of input/output
● Random
○ Deterministic input/output for basic correctness
○ Random input/output to find more issues
23. The RSA256 Implementation
https://github.com/yodalee/rsa256
4 modules, all with single input/single output.
Golden data generated by C model (or Python script)
1 to 1 mapping from SystemC to Verilog
Block diagram: RSA256 (inputs: plaintext, key, modulus; output: ciphertext) contains TwoPower, RSAMont and Montgomery
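For readers unfamiliar with Montgomery-based modular exponentiation, here is a small-width software sketch of the textbook method. It is not taken from the repository: it uses an 8-bit modulus instead of 256 bits, all function names are hypothetical, and which computation each real module performs is an assumption; the three helpers only loosely mirror the block names on the slide. `naive_modexp` plays the role of a golden model.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Bit-serial Montgomery product: returns a * b * 2^(-k) mod n.
// Requires n odd, n < 2^k, and a, b < n.
u64 montgomery_mul(u64 a, u64 b, u64 n, unsigned k) {
    u64 m = 0;
    for (unsigned i = 0; i < k; ++i) {
        if ((a >> i) & 1) m += b;  // add b for each set bit of a
        if (m & 1) m += n;         // make m even without changing m mod n
        m >>= 1;                   // exact division by 2
    }
    return (m >= n) ? m - n : m;   // m < 2n here, one subtraction suffices
}

// 2^(2k) mod n by repeated doubling; used to enter the Montgomery domain.
u64 two_power_mod(u64 n, unsigned k) {
    u64 r = 1 % n;
    for (unsigned i = 0; i < 2 * k; ++i) {
        r <<= 1;
        if (r >= n) r -= n;
    }
    return r;
}

// msg^key mod n using only Montgomery products (LSB-first square-and-multiply).
// Requires msg < n and key < 2^k.
u64 rsa_modexp(u64 msg, u64 key, u64 n, unsigned k) {
    const u64 r2 = two_power_mod(n, k);
    u64 base = montgomery_mul(msg, r2, n, k);  // msg * 2^k mod n
    u64 acc  = montgomery_mul(1, r2, n, k);    // 1 * 2^k mod n
    for (unsigned i = 0; i < k; ++i) {
        if ((key >> i) & 1) acc = montgomery_mul(acc, base, n, k);
        base = montgomery_mul(base, base, n, k);
    }
    return montgomery_mul(acc, 1, n, k);       // leave the Montgomery domain
}

// Plain reference model used as golden data in the checks.
u64 naive_modexp(u64 msg, u64 key, u64 n) {
    u64 r = 1 % n;
    for (u64 i = 0; i < key; ++i) r = (r * msg) % n;
    return r;
}
```

A toy key pair such as n = 187 (11 × 17), e = 7, d = 23 is enough to check the sketch against the reference model and to confirm that decrypting an encrypted message returns the original.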
24. Conclusion
● The data types in SystemC are not suitable for simulating Verilog
○ We create vint, varray and vstruct.
○ These data types map directly to SystemVerilog
● We design, implement and test the RSA 256 modules, and validate them with Verilator
○ With abstraction over the interface, the designer (a.k.a. me) can focus on test data.
25. Some (Possible) Future Work
● Replace SystemC with a pure C++ framework
● Support complex interfaces like AXI
● Actually tape out the chip with the SkyWater service