Summary Of Course Projects

SETIAWAN SOEKAMTOPUTRA

MASTER OF ELECTRICAL AND COMPUTER ENGINEERING
ILLINOIS INSTITUTE OF TECHNOLOGY
DECEMBER 2010 GRADUATE

CONTENTS

• 32-bit Pipelined CPU
• MC68K-Based Monitor Program
• Pipelined MIPS Processor with hazard handling and data forwarding
• Simple Mesh-Like and Ring-Like Network on Chip Design
• Small office network design
• 4-bit 10T adder circuit with dual-Vt logic design
• Single-ended 6T vs. standard 6T SRAM bitcell design
• QR Matrix Factorization
• Electro Active Polymer Energy Harvesting Design
• Advanced Encryption Standard Hardware Design


SPRING 2009

• Introduction to VLSI Design
• 32-bit Pipelined CPU
• Multiplier with accumulator and pipeline optimization
• Microcomputer
• MC68K-Based Monitor Program
• Advanced Computer Architecture
• Pipelined MIPS Processor with hazard handling and data forwarding


32-BIT PIPELINED CPU

• Hardware Description Language
• Verilog
• Tools
• Compiler: Cadence Verilog-XL
• Logic Synthesis: Synopsys Design Compiler
• Simulation tools: Cadence SimVision, Mentor Graphics ModelSim
• Place and Route: Cadence SoC Encounter
• Objectives
• Run the full ASIC flow for this implementation in Verilog
• RTL, post-synthesis, and post-place-and-route simulation for verification
• Determine maximum frequency, area, delay, and power



• 32-bit Memory File
• Eight ALU functions, including multiplication, addition, subtraction, OR, AND, XOR, and XNOR
• M: multiplicand, N: multiplier
• Multiplier:
• A radix-2^r multiplier produces N/r partial products
• Radix-4 Booth-encoded multiplier → reduces the number of partial products (N/2 vs. N)
• Wallace tree → reduces the number of logic levels required to perform the summation
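The partial-product reduction can be illustrated with a small behavioral model. This is a minimal sketch of radix-4 Booth recoding in Python, not the Verilog used in the project; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the Wallace tree is replaced here by a plain sum.

```python
def booth_radix4_digits(multiplier, n_bits):
    """Recode an n_bits two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; there are n_bits / 2 digits, so an
    N-bit multiplier yields N/2 partial products instead of N.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even for radix-4 recoding")
    # Keep n_bits of the multiplier and append the implicit bit y[-1] = 0.
    y = (multiplier & ((1 << n_bits) - 1)) << 1
    digits = []
    for i in range(n_bits // 2):
        window = (y >> (2 * i)) & 0b111  # bits y[2i+1], y[2i], y[2i-1]
        high, mid, low = (window >> 2) & 1, (window >> 1) & 1, window & 1
        digits.append(-2 * high + mid + low)
    return digits


def booth_multiply(multiplicand, multiplier, n_bits=32):
    """Multiply by summing one weighted partial product per Booth digit."""
    digits = booth_radix4_digits(multiplier, n_bits)
    # Hardware would compress these with a Wallace tree; here we just add.
    partial_products = [d * multiplicand * 4 ** i for i, d in enumerate(digits)]
    return sum(partial_products)
```

For a 32-bit multiplier the recoding returns 16 digits, which matches the N/2 figure on the slide.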



• Results
• Maximum frequency: 40 MHz < f < 41 MHz



• Case studies:
• Case 1: Replace the ALU multiplier with a multiplier-accumulator (MAC), which is useful for DSP implementations
• Case 2: Pipeline optimization
• MAC benefit: reduces the number of instructions needed to compute a sum of products.
• Pipeline optimization is applied by inserting registers along the critical path (in this case, the MAC unit)


Case I



• Case 1 results

• Case 2 results



• Case 2: deciding where to place the pipeline registers



• Provided:
• Multiplier accumulator block diagram
• Simple CPU design written in Verilog
• All required tools
• Implementation
• Implement the aforementioned MAC unit in Verilog and modify the design to accommodate the new unit
• Insert a number of registers for pipelining
• Design functionality test
• Verify in simulation that the function F = (-10)*5 + (-60)*2 + (-60)*8 produces the correct result
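The expected value of the test function can be worked out with a small behavioral model. The following is a sketch only, assuming a 32-bit two's-complement accumulator; `to_signed32` and `mac` are hypothetical helper names, not signals from the Verilog design.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Wrap an integer to 32-bit two's complement and return it as signed."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mac(accumulator, a, b):
    """One multiply-accumulate step: accumulator + a * b, wrapped to 32 bits."""
    return to_signed32(accumulator + a * b)


# F = (-10)*5 + (-60)*2 + (-60)*8, one MAC operation per product term.
terms = [(-10, 5), (-60, 2), (-60, 8)]
f_value = 0
for a, b in terms:
    f_value = mac(f_value, a, b)
```

The accumulator passes through -50 and -170 and ends at -650, so a simulation of the MAC-based CPU should show that final value. Each term takes one MAC step, where a plain multiplier would need a multiply followed by a separate add.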



• Results



• Additional Analysis Result
• Finding the maximum frequency
• Expected maximum frequency of the design: 58 MHz
• Frequency vs. area vs. power consumption


MC68K-BASED MONITOR PROGRAM

• Instructor: Dr. Jafar Saniie
• Requirements/Specifications
• Build a simple monitor program for the MC68000 processor that lets the user perform common memory and register accesses and exercise basic exception handlers.
• Language
• 68000 assembly language
• Tools
• EASy68K Editor/Assembler/Simulator



• Monitor program flowchart



• Monitor program system diagram



• Includes a command interpreter that checks and validates user input.
• Monitor debugger commands:
• MEMD → Memory Display
• MEMS → Memory Set
• SORT → Memory Sort
• FILL → Memory Fill
• MOVE → Memory Move
• MEMM → Memory Modify
• FIND → Block Memory Search
• REGM → Register Modify
• REGD → Register Display
• RUNS → Execute program at specified location
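The check-and-validate step of the command interpreter can be sketched as a table lookup followed by argument parsing. This is a Python illustration of the idea, not the 68000 assembly from the project; the `$`-prefixed hexadecimal address syntax and the names `COMMANDS`, `parse_address`, and `parse_command` are assumptions made for the example.

```python
# Command mnemonics taken from the list above; descriptions are for help text.
COMMANDS = {
    "MEMD": "Memory Display",
    "MEMS": "Memory Set",
    "SORT": "Memory Sort",
    "FILL": "Memory Fill",
    "MOVE": "Memory Move",
    "MEMM": "Memory Modify",
    "FIND": "Block Memory Search",
    "REGM": "Register Modify",
    "REGD": "Register Display",
    "RUNS": "Execute program at specified location",
}


def parse_address(token):
    """Parse a hexadecimal operand such as $1000 (assumed syntax)."""
    if not token.startswith("$"):
        raise ValueError("operand must start with '$': " + token)
    return int(token[1:], 16)  # int() raises ValueError on non-hex digits


def parse_command(line):
    """Validate one input line and return (mnemonic, [operands])."""
    tokens = line.strip().upper().split()
    if not tokens:
        raise ValueError("empty command")
    name, operands = tokens[0], tokens[1:]
    if name not in COMMANDS:
        raise ValueError("unknown command: " + name)
    return name, [parse_address(tok) for tok in operands]
```

A line such as `memd $1000 $1010` parses to `("MEMD", [0x1000, 0x1010])`, while an unknown mnemonic or a malformed operand is rejected before any handler runs.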



• Monitor debugger exception-handling commands:
• TBUS → Bus Error Exception
• TADD → Address Error
• TILL → Illegal Instruction Exception
• TPRI → Privilege Violation
• TDIV → Division by Zero



• Results (a subset of the 17 commands implemented)
Register display

Memory display

Command interpreter

HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR
• MIPS (Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set architecture (ISA)
• Instructor: Prof. Jia Wang
• Requirements/Specifications
• Design a MIPS processor with pipelining, data forwarding, and hazard handling capabilities.
• Run RTL simulation to verify functionality
• Language
• VHDL
• Tools
• ModelSim PE 6.5
• MARS 3.6 MIPS Simulator
• Provided:
• Data memory unit design
• Testbench code


MIPS PROCESSOR
• Data width: 32-bit
• 5-stage pipeline
• Instruction Fetch
• Instruction Decode
• Execute
• Memory Access
• Write-Back
• Main Modules
• Program counter (PC)
• Control Unit
• ALU Control Unit
• Register File
• ALU
• Instruction Memory
• Data Memory
• Hazard Detection Unit
• Forwarding Unit
• Branch Hazard
• Branch calculation occurs in the Instruction Decode stage
• A branch miss costs only one stall cycle.
• Data Hazard
• Stall if the data being written is needed by the next instruction
• Data Forwarding
• Result data is used immediately rather than being written back to the register file first.
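The forwarding and stall decisions can be sketched with the conditions commonly used for a classic five-stage pipeline. The Python below is a behavioral illustration under that assumption, not the project's VHDL; all signal names (`id_ex_rs`, `ex_mem_rd`, and so on) are hypothetical, and the stall shown is the load-use case only.

```python
def forward_select(id_ex_rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose the source of one ALU operand in the Execute stage.

    Returns "EX/MEM" or "MEM/WB" when a newer result must be forwarded,
    or "REGFILE" when the value read in Instruction Decode is still valid.
    Register 0 is never forwarded because it always reads as zero in MIPS.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"  # most recent result wins
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"
    return "REGFILE"


def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Stall one cycle when a load's destination is read by the next instruction."""
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))
```

For example, `add $t0, ...` followed immediately by `sub ..., $t0, ...` selects the `EX/MEM` path, while `lw $t0, ...` followed by a use of `$t0` triggers the stall because the loaded value is not available until after the Memory Access stage.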


HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR

• MIPS Architecture


MIPS PROCESSOR
• Test program (Running on MARS 3.6)


MIPS PROCESSOR
• Result


FALL 2009

• Hardware/Software Co-Design
• Simple Mesh-Like Network on Chip Design
• Simple Ring-Like Network on Chip Design
• Introduction to Computer Network
• Design of 2-story small office computer network


HARDWARE/SOFTWARE CO-DESIGN

• Projects:
• Network on chip prototype design with three nodes
• Simple Mesh-Like Network on Chip Design


NETWORK ON CHIP PROTOTYPE DESIGN WITH THREE NODES
• Instructor: Prof. Jia Wang
• Specifications
• Three-node NoC architecture in a partially connected mesh topology
• Three processing elements and three routers.
• Queue system: FIFO
• Language
• SystemC running on Visual C++
• Tools
• Microsoft Visual C++


• Three-node NoC System Diagram

• Third node function (called PE_dumpbox)
• It receives all packets that cannot be processed by the destination processing unit due to overloading in the network
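The overflow behavior can be sketched as a bounded FIFO that diverts packets once it is full. This is a Python illustration only, not the SystemC model from the project; the `Router` class, its `capacity` parameter, and the use of a plain list for the dump box are assumptions made for the example.

```python
from collections import deque


class Router:
    """Router with a bounded FIFO buffer and an overflow path to a dump box."""

    def __init__(self, name, capacity, dumpbox):
        self.name = name
        self.capacity = capacity  # assumed buffer depth, in packets
        self.buffer = deque()
        self.dumpbox = dumpbox    # stands in for PE_dumpbox

    def receive(self, packet):
        """Queue a packet, or divert it to the dump box when the buffer is full."""
        if len(self.buffer) >= self.capacity:
            self.dumpbox.append(packet)
            return False
        self.buffer.append(packet)
        return True

    def forward(self):
        """Release the oldest queued packet (FIFO order), or None if empty."""
        return self.buffer.popleft() if self.buffer else None


pe_dumpbox = []
router1 = Router("Router 1", capacity=2, dumpbox=pe_dumpbox)
accepted = [router1.receive(p) for p in ["p0", "p1", "p2"]]
```

With a depth of two, the third packet cannot be queued and ends up in `pe_dumpbox`, while the first two leave the router in arrival order.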


• Results
• Overload in the Router 1 network buffer at cycle 3

• The third processing unit, PE_dumpbox, receives the packet


MESH-LIKE NETWORK ON CHIP PROTOTYPE DESIGN
• Specifications
• A simple mesh-like NoC architecture
• Each router has one processing element (PE)
• Queue system: FIFO
• 4-by-4 grid of nodes
• Language
• SystemC
• Tools
• Microsoft Visual C++


PROTOTYPE DESIGN
• Simple NoC Architecture


PROTOTYPE DESIGN
• Results
• Generated packets

• The result shows that packets are delivered


PROTOTYPE DESIGN
• Results
• Delays occur because only one packet is delivered to a processing element (PE) at a time


PROTOTYPE DESIGN
• Benefit and drawback:
• Benefit: packets reach the destination address in fewer hops → less contention and a higher average bit rate.
• Drawback: the design is more complex and needs more wires.


INTRODUCTION TO COMPUTER NETWORK
• Project:
• Design a prototype of a 2-story small office computer network capable of serving 20 users with three department LANs, four servers, and wireless Internet
• Language
• N/A
• Tools
• Microsoft Visio


SMALL OFFICE NETWORK DESIGN

• Proposed configurations
• IP address allocation



• Proposed configurations
• Design Topology



• Office Layout

1st floor

2nd floor

Colored arrows show how cables are managed

SPRING 2010

• Advanced VLSI
• High Performance VLSI IC System
• Single-ended 6T vs. standard 6T SRAM bitcell design comparison
• QR Factorization
• Implementing QR factorization algorithm in C


4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Project:
• Specifications
• Adder circuit is based on:
J. Lin, M. Sheu, and C. Ho. A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design. IEEE Trans. on Circuits and Systems, May 2007.
• Adder: cascaded ripple-carry adders
• Technology node: 45nm (FreePDK)
• Voltage: 1.1V @ 25 MHz
• Performance measurements (delay and power consumption) for the 10T adder circuit using high-threshold-voltage (high-Vt), low-Vt, and dual-Vt transistors
• Tools
• Cadence Virtuoso Schematic Design
• Synopsys HSPICE Simulator
• Nanosim Simulator


• High Vt vs. low Vt

• Full Adder Design (1-bit)
• Complementary and level restoring carry logic (CLRCL)


• Full Adder Design (1-bit) Critical Path
• Dual-VT: low-VT transistors are used on the critical path for speed, and high-VT transistors elsewhere for low leakage
• The NMOS in the multiplexer and the PMOS in the inverter are low-VT transistors


• Logic Equation
Sum = (A XNOR B)·Cin + (A XOR B)·Cin_bar
Cout = (A XOR B)·Cin + (A XNOR B)·A
• Design Components
• Inverter (left) and multiplexer (right)
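The logic equations can be checked exhaustively against ordinary binary addition. The Python below is a bit-level sketch of the equations only; it does not model the transistor-level multiplexer and inverter circuit, and the function names are hypothetical.

```python
from itertools import product


def full_adder_10t(a, b, cin):
    """Evaluate the Sum and Cout equations above for single-bit inputs."""
    a_xor_b = a ^ b
    a_xnor_b = a_xor_b ^ 1
    cin_bar = cin ^ 1
    total = (a_xnor_b & cin) | (a_xor_b & cin_bar)
    cout = (a_xor_b & cin) | (a_xnor_b & a)
    return total, cout


def ripple_add4(x, y, cin=0):
    """Cascade four 1-bit full adders; returns (4-bit sum, carry out)."""
    result, carry = 0, cin
    for bit in range(4):
        s, carry = full_adder_10t((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry


# Every input combination must agree with a + b + cin = sum + 2 * cout.
truth_table_ok = all(
    a + b + cin == s + 2 * cout
    for a, b, cin in product((0, 1), repeat=3)
    for s, cout in [full_adder_10t(a, b, cin)]
)
```

All eight input combinations satisfy the identity, and cascading four stages gives a 4-bit ripple-carry adder in which the carry must propagate from bit 0 to bit 3 in the worst case.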


• 1-bit Full Adder (consisting of multiplexers and inverters) and its symbol

• 4-bit Full Adder


• Methodology
• Use combinations of input vectors to measure delay and power consumption
• Delay: switching delay between the least significant bit (bit 0) and the most significant bit (bit 3)
• Power: average and maximum power during simulation

• Results
• Delay (in seconds), high-to-low and low-to-high transitions, for High-VT, Low-VT, and Dual-VT

• Results
• Power consumption (in watts): average power and maximum power for High-VT, Low-VT, and Dual-VT

• Results


• Issue
• Output voltage degrades, particularly with high-VT transistors or at high frequency (> 125 MHz), because pass transistors deliver a weak 1 (NMOS) or a weak 0 (PMOS).


SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Specifications
• Design from:
J. Singh, et al. Single Ended 6T SRAM with Isolated Read-Port for Low-Power Embedded Systems. IEEE, 2009.
• Technology node: 45nm
• Transistors: high-VT MOSFETs
• Tools
• Cadence Virtuoso Schematic Design
• Synopsys HSPICE Simulator


SRAM BITCELL DESIGN
• Background
• SRAM occupies the majority of the die area
• Dynamic power: consumed by read and write activity
• Static power: consumed while retaining the stored logic value
• Benefits/Drawbacks of Single-Ended SRAM
• Faster read of logic '1'
• One bit line (no complementary bit-bar line) → fewer wires
• Slower write of '1' due to the weak-1 behavior of the NMOS pass transistor (but around 85% of writes are zero writes)
• Role of the isolated read port: prevents the bitcell content from being exposed during reads
• Considerably lower power dissipation, better read static noise margin (SNM)


SRAM BITCELL DESIGN
• Standard 6T SRAM
• Read: precharge BL and BL* → WordLine = 1
• Write: assert the new value on BL and BL* → WordLine = 1
• Transistor sizing:
• Access transistor: medium
• Pull-up transistor: weak
• Pull-down transistor: strong


SRAM BITCELL DESIGN
• Standard SRAM Design (using Cadence Virtuoso)


SRAM BITCELL DESIGN
• Single-Ended SRAM Design


SRAM BITCELL DESIGN
• Comparison Results
• Write Delay (0 to 0.5Vdd or 1 to 0.5Vdd)

"...around 85% of the instruction write bits are '0,' and over 90% of the data write bits are '0'..." (quoted from [3])
[3] Y. Chang, F. Lai, C. Yang. Zero-Aware Asymmetric SRAM Cell for Reducing Cache Power in Writing Zero. IEEE Trans. on VLSI Systems, Vol. 12, No. 8, August 2004.


SRAM BITCELL DESIGN
• Comparison Results
• Power Consumption Comparison


SRAM BITCELL DESIGN
• Noise Margin


QR MATRIX FACTORIZATION

• Purposes:
• Implementing QR factorization algorithm in C
• Specifications
• Written in C under RedHat OS
• QR Factorization
• A matrix decomposition for solving linear systems of equations without inverting the left-hand-side matrix.
• Applicable to: m-by-n matrix A
• Decomposition: A = QR, where Q is an orthogonal matrix of size m-by-m and R is an upper triangular matrix
• The QR decomposition provides an alternative way of solving the system of equations Ax = b without inverting the matrix A. Because Q is orthogonal, Q^T Q = I, so Ax = b is equivalent to Rx = Q^T b, which is easier to solve since R is triangular.
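The solve path Ax = b → Rx = Q^T b can be sketched in a few lines. The example below uses modified Gram-Schmidt in Python as one possible way to obtain Q and R; it is not necessarily the algorithm used in the project's C implementation, it assumes A has linearly independent columns, and it produces the reduced form in which Q has as many columns as A. The names `qr_gram_schmidt` and `solve_qr` are hypothetical.

```python
import math


def qr_gram_schmidt(a):
    """Factor an m-by-n matrix (list of rows) into Q (m-by-n) and R (n-by-n)."""
    m, n = len(a), len(a[0])
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [float(a[i][j]) for i in range(m)]
        # Remove the components along the orthonormal columns found so far.
        for k, qk in enumerate(q_cols):
            r[k][j] = sum(qk[i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * qk[i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        if r[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def solve_qr(a, b):
    """Solve Ax = b through Rx = Q^T b followed by back substitution."""
    q, r = qr_gram_schmidt(a)
    n = len(r)
    y = [sum(q[i][j] * b[i] for i in range(len(b))) for j in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))) / r[i][i]
    return x


solution = solve_qr([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

For the 2-by-2 system 2x + y = 3, x + 3y = 5, the solution is x = 0.8, y = 1.4, and no matrix inverse is formed at any point: the only division is by the diagonal entries of R.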



• Algorithm



• Result


FALL 2010

• Electro Active Polymer Energy Harvesting
• Advanced Encryption Standard


ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• EAP circuitry converts mechanical energy to electrical energy when the material is stretched under a bias voltage.
• EAP material → VHB 4905 tape and carbon grease


• Previous prototype:
• Charge management IC: TI's bq2000
• Li-ion battery 3 V, 45 mAh
• Application: TI's eZ430-F2013
• Boost converter to supply biasing voltage (5 V → 1.5 kV):
• EMCO Q15N-5
• Drawbacks
• High energy consumption
• EAP output power is too small to even turn on the battery charging circuit (which needs 20.6 mA)
• Solutions
• EAP material efficiency
• Higher capacitance
• Battery and circuit that can store small amounts of energy without requiring much energy to operate
• Apply a low biasing voltage → eliminates the boost converter


• Simulation model using Simulink
• Circuit model parameters:
• EAP model parameters, input voltage (battery), and output capacitor Co


• Simulation model using Simulink
• EAP Model Parameters:
• Cidle, Cforced, force frequency f (how often the EAP is stretched)
• An absolute-value function turns the original sine wave into an always-positive waveform


• Simulation result:


• Prototype:
• Battery charging: Cymbet CBC5300
• Battery: 2xCBC050 (3x50uAh) at 3.5V output
• Capability to harvest 1.05V
• PCB layout tool: Altium Designer
• Application: MSP430-F2274 with CC2500 2.4GHz RF Transceiver



Battery charging profile for CBC050


ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• AES variants with 512-bit and 1024-bit keys
• Area and power consumption comparison with 128-bit and 256-bit AES keys
• CMOS technology: 45nm
• Operating voltage: 1.1 V @ 100 MHz
• Verilog language
• Tools:
• Synthesis: Synopsys Design Compiler
• Simulation: ModelSim
• Objective: find the relationship between key size and the area and power consumption of the implemented hardware.


HARDWARE DESIGN
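• AES encryption flow
• Inputs: Cipher Key and Plaintext
• Key Expansion: derives the round keys from the Cipher Key
• Initial Round: AddRoundKey with RoundKey[0]
• Normal Round: SubBytes → ShiftRows → MixColumns → AddRoundKey with RoundKey[i], then i = i + 1
• Repeat the Normal Round while i < number of rounds
• Final Round: SubBytes → ShiftRows → AddRoundKey
• Output: Ciphered Text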

HARDWARE DESIGN
• Block View of AES Operation
• Plaintext (bytes 0 to 15) XOR first roundkey (bytes 0 to 15)
• State Block:
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
• State Block after SubBytes (replaces each byte with its S-box value):
S0 S4 S8 S12
S1 S5 S9 S13
S2 S6 S10 S14
S3 S7 S11 S15
• State Block (after ShiftRows):
S0 S4 S8 S12
S5 S9 S13 S1
S10 S14 S2 S6
S15 S3 S7 S11
• MixColumns: a(x) applied per column
• State Block (after MixColumns) XOR next roundkey → ready for next round:
M0 M4 M8 M12      K0 K4 K8 K12
M5 M9 M13 M1      K1 K5 K9 K13
M10 M14 M2 M6     K2 K6 K10 K14
M15 M3 M7 M11     K3 K7 K11 K15
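The byte layout and the ShiftRows permutation in the block view can be reproduced with a short model. This is a Python sketch for illustration, not the Verilog from the project; it assumes the column-major state layout shown above (byte index = 4 × column + row), and the function names are hypothetical.

```python
def add_round_key(state, round_key):
    """AddRoundKey: XOR each of the 16 state bytes with the matching key byte."""
    return [s ^ k for s, k in zip(state, round_key)]


def shift_rows(state):
    """ShiftRows on a 16-entry state stored column by column.

    Row r is rotated left by r positions, so the entry at (row, col)
    comes from (row, (col + row) mod 4).
    """
    shifted = [None] * 16
    for col in range(4):
        for row in range(4):
            shifted[4 * col + row] = state[4 * ((col + row) % 4) + row]
    return shifted


def as_rows(state):
    """Return the state as four rows for printing in the slide's layout."""
    return [[state[4 * col + row] for col in range(4)] for row in range(4)]


labels = ["S%d" % i for i in range(16)]
rows_after_shift = as_rows(shift_rows(labels))
```

Applied to the labels S0 to S15, the rows come out as S0 S4 S8 S12, S5 S9 S13 S1, S10 S14 S2 S6, and S15 S3 S7 S11, which is the arrangement shown for the state block after ShiftRows.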

HARDWARE DESIGN
• Block Diagram

Blocks shown: Plain_text and Cipher_key inputs, Key Expansion Module, AddRoundKey, SubBytes and ShiftRows, MixColumns, AddRoundKey, three multiplexers (Mux, one with an initial value of zero), and the Ciphered_text output

HARDWARE DESIGN
• Results
• Linear trend of total power: y = 0.852x + 2.739, R² = 0.985

Design    Area         Dynamic power (mW)   Static power (mW)   Total power (mW)
AES128    58824.876    3.3574               0.2971603           3.6545603
AES256    64188.036    3.9442               0.3341722           4.2783722
AES512    76881.193    5.0289               0.409219            5.438119
AES1024   96312.560    5.6042               0.5053051           6.1095051
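The linear trend reported with these results can be reproduced from the total power column. The sketch below fits an ordinary least-squares line in Python, assuming the x axis is simply the position of each design (1 for AES128 through 4 for AES1024), which is how a chart trendline over categories would number them; that numbering is an assumption, not something stated on the slide.

```python
# Total power in mW, taken from the results above.
total_power_mw = {
    "AES128": 3.6545603,
    "AES256": 4.2783722,
    "AES512": 5.438119,
    "AES1024": 6.1095051,
}


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R squared."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


xs = [1, 2, 3, 4]  # assumed category index per design
ys = list(total_power_mw.values())
slope, intercept, r_squared = linear_fit(xs, ys)
```

Under that assumption the slope and intercept round to 0.852 and 2.739, matching the reported trend line, and R squared lands within 0.001 of the reported 0.985. Note that x here counts key-size doublings, so a straight line in x means total power grows roughly linearly with the logarithm of the key size.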


HARDWARE DESIGN
• Results: Area
Area: AES128 58824.87654, AES256 64188.0369, AES512 76881.19388, AES1024 96312.56036
