C-Based VLSI Design: An Overview
Dr. Chandan Karfa
Department of Computer Science and Engineering
IIT Guwahati
VLSI Design Flow
System Specification
Architectural Design
High-level Synthesis
Logic Synthesis
Physical Design
Fabrication
Packaging & Testing
High-Level Synthesis (HLS)/C-Based VLSI Design
• C to gates
• Design time
• Design complexity (10x)
• Verification effort
• Hardware/software co-design
Importance of Design Automation
• Shorter design cycle
• Design space exploration
• Fewer errors in the design
• Less verification effort
• Specification-driven optimization at a higher abstraction level
C-Based VLSI Design
• Enables design at a higher abstraction level (e.g., C, C++, Java)
• 14 of the top 20 semiconductor companies use HLS tools
• Used in communications, signal processing, computation, crypto, healthcare, etc.
• Tailor the implementation to match the characteristics of the target technology (e.g., speed, resources, area budget)
– Video components in the Tegra X1 chip were designed using Catapult HLS.
– NVIDIA 4K processing was designed with C-based HLS.
– Qualcomm is designing parts of Snapdragon with Catapult HLS.
– Vivado HLS is part of the Xilinx design flow.
– Intel HLS is part of the Quartus design flow.
HLS
High-level behaviour → High-level synthesis → Register transfer level (RTL) description
Example: 2nd order differential equation solver
Diffeq: (x, dx, u, a, clock, y)
  input: x, dx, u, a, clock;
  output: y
  while (x < a)
    u1 = u - (3*x*u*dx) - (3*y*dx);
    y1 = y + (u*dx);
    x1 = x + dx;
    x = x1, y = y1, u = u1;
  end
Excerpt of HLS-generated RTL (Verilog):
always @(posedge ap_clk) begin
    if (1'b1 == ap_CS_fsm_state5) begin
        j_reg_126 <= j_4_reg_293;
    end else if ((1'b1 == ap_CS_fsm_state1) & (ap_start == 1'b1)) begin
        j_reg_126 <= 3'd0;
    end
end
assign tmp_108_fu_235_p1_temp_6 = tmp_108_fu_235_p1 & 63'd12;
assign statemt_addr_28_reg_324_temp_7 = statemt_addr_28_reg_324 & 4'd19;
assign tmp_108_fu_235_p1_temp_6_temp_8 = tmp_108_fu_235_p1_temp_6 | statemt_addr_28_reg_324_temp_7;
ap_ST_fsm_state2: begin
    if ((exitcond_fu_175_p2 == 1'd1) & (1'b1 == ap_CS_fsm_state2)) begin
        ap_NS_fsm = ap_ST_fsm_state1;
    end else begin
        ap_NS_fsm = ap_ST_fsm_state3;
    end
end
HLS
High-level behaviour → High-level synthesis → RTL description, consisting of a data-path and a controller
High-level Synthesis Steps
• Preprocessing: Intermediate representation (CDFG) construction, data-dependency analysis, live-variable analysis, compiler optimizations.
• Scheduling: Assigns a control step to each operation of the input behaviour.
• Allocation: Computes the minimum number of functional units and registers.
• Binding: Maps variables to registers, operations to functional units, and data transfers to interconnection units.
• Data path & controller design: The controller is designed based on the interconnections among the data-path elements and the data transfers required in each control step.
[Figure: high-level synthesis flow. Input behaviour → preprocessing → scheduling → allocation & binding → data-path generation and controller generation → RTL behaviour, in which the controller sends control signals to the data-path and receives status signals from it.]
Working with an example
Example: 2nd order differential equation solver
Diffeq: (x, dx, u, a, clock, y)
input: x, dx, u, a, clock;
output: y
while(x < a)
u1 = u-(3*x*u*dx)-(3*y*dx);
y1 = y+(u*dx);
x1 = x+dx;
x = x1, y = y1, u = u1;
end
CDFG
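For reference, the behaviour above can also be written as an ordinary software function. This is only a sketch: the function name `diffeq`, passing the initial value of `y` as an argument, and dropping the `clock` input are assumptions made for illustration and do not come from the slides.

```python
def diffeq(x, dx, u, a, y):
    """Software model of the Diffeq behaviour; returns the final y."""
    while x < a:
        # All three updates read the values held at the start of the iteration.
        u1 = u - (3 * x * u * dx) - (3 * y * dx)
        y1 = y + (u * dx)
        x1 = x + dx
        x, y, u = x1, y1, u1
    return y
```

A model like this is handy as a golden reference when checking the hardware that the later steps produce.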
Preprocessing
Basic blocks with 3-address code, forming the control and dataflow graph (CDFG):
I:
  Read(p1, dx)
  Read(p2, x)
  Read(p3, a)
  Read(p1, y)
  Read(p2, u)
  c = x < a
B1:
  V1 : t1 = u * dx
  V2 : t2 = 3 * x
  V3 : t3 = 3 * y
  V4 : t4 = u * dx
  V5 : t5 = t1 * t2
  V6 : t6 = t3 * dx
  V7 : t7 = u - t5
  V8 : u = t7 - t6
  V9 : y = y + t4
  V10 : x = x + dx
  V11 : c = x < a
B2:
  Write(p1, y)
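One way to convince yourself that block B1 preserves the loop body is to run both forms side by side. The sketch below does this for a single iteration; the function names `b1_three_address` and `b1_original` are hypothetical and chosen only for this illustration.

```python
def b1_three_address(x, dx, u, y, a):
    """One pass through basic block B1, statement by statement (V1 to V11)."""
    t1 = u * dx      # V1
    t2 = 3 * x       # V2
    t3 = 3 * y       # V3
    t4 = u * dx      # V4
    t5 = t1 * t2     # V5
    t6 = t3 * dx     # V6
    t7 = u - t5      # V7
    u = t7 - t6      # V8
    y = y + t4       # V9
    x = x + dx       # V10
    c = x < a        # V11
    return x, u, y, c


def b1_original(x, dx, u, y, a):
    """One loop iteration written as in the source behaviour."""
    u1 = u - (3 * x * u * dx) - (3 * y * dx)
    y1 = y + (u * dx)
    x1 = x + dx
    return x1, u1, y1, x1 < a
```

With integer inputs the two functions return identical tuples, since the 3-address form only regroups the same multiplications and subtractions.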
Data dependency graph
[Figure: data dependency graph of B1. The inputs u, dx, x, 3, y and a feed the operation nodes V1 to V11; the edges carry the intermediate values t1 to t8; the output of V11 is c.]
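The edges of such a graph can be derived mechanically from the 3-address code by tracking which statement last wrote each variable. The sketch below is one minimal way to do it; the tuple encoding of B1 and the name `dependency_edges` are assumptions for illustration. It records only true (read-after-write) dependencies and ignores anti-dependencies such as V8 overwriting u after V1 has read it.

```python
# (name, destination, operand1, operand2) for each statement of B1.
B1 = [
    ("V1", "t1", "u", "dx"),
    ("V2", "t2", "3", "x"),
    ("V3", "t3", "3", "y"),
    ("V4", "t4", "u", "dx"),
    ("V5", "t5", "t1", "t2"),
    ("V6", "t6", "t3", "dx"),
    ("V7", "t7", "u", "t5"),
    ("V8", "u", "t7", "t6"),
    ("V9", "y", "y", "t4"),
    ("V10", "x", "x", "dx"),
    ("V11", "c", "x", "a"),
]


def dependency_edges(block):
    """Return read-after-write edges as (producer, consumer, value)."""
    last_writer = {}  # variable -> statement that most recently wrote it
    edges = []
    for name, dest, *operands in block:
        for var in operands:
            if var in last_writer:
                edges.append((last_writer[var], name, var))
        last_writer[dest] = name
    return edges
```

For B1 this yields eight edges, for example V1 to V5 through t1 and V10 to V11 through the updated x.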
Scheduling
[Figure: the data dependency graph of B1 scheduled into four control steps, S1 to S4.]
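A simple baseline for assigning control steps is as-soon-as-possible (ASAP) scheduling, which places every operation in the earliest step allowed by its data dependencies. The sketch below assumes unlimited functional units and one control step per operation; it is a textbook baseline and not necessarily the schedule drawn in the figure. The `PREDS` table is transcribed from the dependencies in B1.

```python
# Predecessors of each operation in B1 (data dependencies only).
PREDS = {
    "V1": [], "V2": [], "V3": [], "V4": [], "V10": [],
    "V5": ["V1", "V2"], "V6": ["V3"], "V7": ["V5"],
    "V8": ["V6", "V7"], "V9": ["V4"], "V11": ["V10"],
}


def asap(preds):
    """Earliest control step of each operation, one step per operation."""
    step = {}

    def visit(op):
        if op not in step:
            # An operation starts one step after its latest predecessor.
            step[op] = 1 + max((visit(p) for p in preds[op]), default=0)
        return step[op]

    for op in preds:
        visit(op)
    return step
```

The longest dependency chain (V1 or V2, then V5, V7, V8) fixes the minimum latency at four control steps.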
Register Allocation and Binding
R1: t1, t3, t6
R2: t2, t5, t7
R3: t4
R4: t8
R5: u
R6: x
R7: dx
R8: y
R9: c
R10: 3
R11: a
[Figure: interval graph showing the lifetime of each variable across control steps S1 to S4, annotated with the register assigned to it.]
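Sharing registers among variables with non-overlapping lifetimes can be automated with the left-edge algorithm. The sketch below is an illustration under stated assumptions: the lifetimes of t1 to t8 are inferred from the schedule (the step in which each value is written and the last step in which it is read), and the last read of t8 is assumed to be S2 because the step of V11 cannot be recovered from the text. The variables u, x, dx, y, c, 3 and a keep dedicated registers and are left out.

```python
# (variable, step in which it is written, last step in which it is read).
LIFETIMES = [
    ("t1", 1, 2), ("t2", 1, 2), ("t3", 2, 3), ("t4", 1, 2),
    ("t5", 2, 3), ("t6", 3, 4), ("t7", 3, 4), ("t8", 1, 2),
]


def left_edge(lifetimes):
    """Pack lifetimes into registers; returns one variable list per register."""
    pending = sorted(lifetimes, key=lambda v: v[1])  # sort by start step
    registers = []
    while pending:
        row, free_from, rest = [], 0, []
        for name, start, end in pending:
            # A register can be rewritten in the step of its last read.
            if start >= free_from:
                row.append(name)
                free_from = end
            else:
                rest.append((name, start, end))
        registers.append(row)
        pending = rest
    return registers
```

Under these assumptions the result groups t1, t3, t6 in one register, t2, t5, t7 in a second, and gives t4 and t8 one register each, which agrees with R1 to R4 above.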
FU Allocation and Binding: Multiplier
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
Multiplication operations with non-overlapping schedules can be mapped to the same multiplier FU.
FU Allocation and Binding: Adder
ADD: A1: V7, V8, V9, V10
Add/sub operations with non-overlapping schedules can be mapped to the same adder FU.
Functional Unit Allocation and Binding
FU alloc and bind:
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
ADD: A1: V7, V8, V9, V10
COMP: C1: V11
Register Transfer Level (RTL) Behaviour
Original behaviour:
S1:
  V1 : t1 = u * dx
  V2 : t2 = 3 * x
  V4 : t4 = u * dx
  V10 : x = x + dx
S2:
  V5 : t5 = t1 * t2
  V3 : t3 = 3 * y
  V9 : y = y + t4
Register mapping:
  R1: t1, t3, t6
  R2: t2, t5, t7
  R3: t4
  R4: t8
  R5: u
  R6: x
  R7: dx
  R8: y
  R9: c
  R10: 3
  R11: a
FU mapping:
  MULT: M1: V1, V5
  MULT: M2: V2, V3
  MULT: M3: V4, V6
  ADD: A1: V7, V8, V9, V10
  COMP: C1: V11
RTL behaviour:
S1:
  V1 : R1 = R5 <M1> R7
  V2 : R2 = R10 <M2> R6
  V4 : R3 = R5 <M3> R7
  V10 : R4 = R6 <A> R7
S2:
  V5 : R2 = R1 <M1> R2
  V3 : R1 = R10 <M2> R8
  V9 : R8 = R8 <A> R3
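The register transfers of S1 and S2 can be checked with a small simulation. The sketch below assumes that all transfers in a control step read the old register values and write at the same clock edge, which is why S2 can read R1 and overwrite it in the same step. The dictionaries `FUS` and `RTL` and the helper `run_step` are hypothetical names for this illustration.

```python
import operator

# FU name -> operation it performs in these two control steps.
FUS = {"M1": operator.mul, "M2": operator.mul,
       "M3": operator.mul, "A": operator.add}

# Each transfer: (destination, left source, FU, right source).
RTL = {
    "S1": [("R1", "R5", "M1", "R7"), ("R2", "R10", "M2", "R6"),
           ("R3", "R5", "M3", "R7"), ("R4", "R6", "A", "R7")],
    "S2": [("R2", "R1", "M1", "R2"), ("R1", "R10", "M2", "R8"),
           ("R8", "R8", "A", "R3")],
}


def run_step(regs, transfers):
    """Apply all transfers of one control step at the same clock edge."""
    # Every source is read from the old register file before any write.
    new_values = {dst: FUS[fu](regs[a], regs[b]) for dst, a, fu, b in transfers}
    return {**regs, **new_values}
```

Starting from u = 5, x = 2, dx = 3, y = 7 in R5, R6, R7, R8 and the constant 3 in R10, two calls to `run_step` leave t5 = u*dx*3*x in R2, t3 = 3*y in R1 and y + u*dx in R8.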
Datapath Synthesis
[Figure: data-path generation from the RTL behaviour: a functional unit (FU) with its source and destination registers. Register labels in the figure: R1, R2, R1, R5, R4; R6, R1, R5, R6, R2; R1, R1, R2, R7, R4.]
Controller Synthesis
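[Figure: data-path and control unit. The data-path contains a multiplier (*), an ALU and the registers r1, r2, u, y, x, dx, 3 and a. The control unit drives the register enables, the mux controls and the ALU control (+, -, <); c is the status signal.]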
Control Signals
Control assertion pattern: <FU, FU_MUX_in, Reg_en, Reg_MUX_in>
FU: 1 bit
FU_MUX_in: 7 bits
Reg_en: 11 bits
Reg_MUX_in: 2 bits
Total: 21 bits
S1: <1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
S2: <1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0>
S3: <…..>
S4: <…..>
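The 21-bit control words can be split back into their four fields to see what each step asserts. The sketch below assumes the fields appear in the order given by the control assertion pattern and that bit i of Reg_en enables register R(i+1); both are assumptions, although they are consistent with the registers written in S1 and S2.

```python
# Field name and width, in the order of the control assertion pattern.
FIELDS = [("FU", 1), ("FU_MUX_in", 7), ("Reg_en", 11), ("Reg_MUX_in", 2)]

S1 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
S2 = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]


def decode(word):
    """Split a control word into a dict of field name -> list of bits."""
    if len(word) != sum(width for _, width in FIELDS):
        raise ValueError("control word must be 21 bits long")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = word[pos:pos + width]
        pos += width
    return fields


def enabled_registers(word):
    """Registers whose enable bit is set, assuming bit i enables R(i+1)."""
    return [f"R{i + 1}" for i, bit in enumerate(decode(word)["Reg_en"]) if bit]
```

Under this reading S1 enables R1 to R4 and S2 enables R1, R2 and R8, matching the destinations of the register transfers in those steps.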
Final RTL
[Figure: final RTL, shown with the S1 control word <1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1>.]
Topics to be covered
• Scheduling Possibilities
• Register and FU allocation and binding
• Datapath and Controller Synthesis
Scheduling Possibilities
[Figure: two alternative schedules of the data dependency graph over control steps S1 to S4. The first requires at least 3 multipliers; the second requires at least 4 multipliers.]
[Figure: two further schedules over control steps S1 to S4. The first requires at least 3 multipliers; the second requires at least 2 multipliers.]
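How many multipliers a schedule needs is simply the largest number of multiplications placed in any one control step. As an illustration, the sketch below computes an as-late-as-possible (ALAP) schedule within four steps and counts the multipliers it needs. ALAP is a standard technique used here as an assumption; whether it is the exact schedule drawn on the slide cannot be recovered from the text.

```python
from collections import Counter

# Predecessors of each operation in B1 (data dependencies only).
PREDS = {
    "V1": [], "V2": [], "V3": [], "V4": [], "V10": [],
    "V5": ["V1", "V2"], "V6": ["V3"], "V7": ["V5"],
    "V8": ["V6", "V7"], "V9": ["V4"], "V11": ["V10"],
}
MULTS = {"V1", "V2", "V3", "V4", "V5", "V6"}


def alap(preds, num_steps):
    """Latest control step of each operation within num_steps steps."""
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    step = {}

    def visit(op):
        if op not in step:
            # An operation ends one step before its earliest successor.
            latest = min((visit(s) for s in succs[op]), default=num_steps + 1)
            step[op] = latest - 1
        return step[op]

    for op in preds:
        visit(op)
    return step


def multipliers_needed(step):
    """Largest number of multiplications sharing one control step."""
    per_step = Counter(s for op, s in step.items() if op in MULTS)
    return max(per_step.values())
```

With four steps, ALAP spreads the six multiplications two per step over S1 to S3, so two multipliers suffice. Placing V1 to V4 all in S1 instead would need four.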
Automation of register and FU allocation and binding
• How to automate register allocation and binding?
• How to automate FU allocation and binding?
• Map the problem to a graph colouring or clique partitioning problem and solve it.
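As a small illustration of the graph colouring view, multiplier binding can be posed as follows: two multiplications conflict if they run in the same control step, and each colour stands for one multiplier. The sketch below uses a greedy colouring, which is a heuristic and not guaranteed to be optimal in general. The step of V6 (S3) is inferred from its dependencies rather than stated on the slides, and the function names are hypothetical.

```python
from itertools import combinations

# Control step of each multiplication (V6 in S3 is an inferred assumption).
MULT_STEP = {"V1": 1, "V2": 1, "V4": 1, "V3": 2, "V5": 2, "V6": 3}


def conflict_graph(step):
    """Connect operations that execute in the same control step."""
    graph = {op: set() for op in step}
    for a, b in combinations(step, 2):
        if step[a] == step[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_colouring(graph):
    """Give each node the lowest colour unused by its coloured neighbours."""
    colour = {}
    for node in sorted(graph, key=lambda n: -len(graph[n])):
        used = {colour[n] for n in graph[node] if n in colour}
        colour[node] = next(c for c in range(len(graph)) if c not in used)
    return colour
```

For this schedule the colouring uses three colours, that is, three multipliers. The exact grouping may differ from the M1 to M3 binding on the slides while being equally valid.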
Data path and controller synthesis:
- Mux-based and bus-based architectures
- Various ways to optimize interconnections
Thank You
