Nirav A. Desai desai.nirav.12.09@gmail.com1
MM-Wave Active Sensor: BPSK Spectrum can be seen in the Spectrum Analyzer
Design work done jointly by Nirav Desai, Munkyo Seo, Colin Sheldon and Mark Rodwell
I assisted in these mm-wave MIMO experiments at UCSB
EE 5323: VLSI DESIGN 1 PROJECT
Course Instructor: Prof. Chris Kim
16-bit BRENT KUNG ADDER DESIGN in 45 nm CMOS
Nirav Desai
ID: 4280229
Department of Electrical and Computer Engineering
University of Minnesota
Brent Kung Adder Gate Level Diagram
1. Input Block with Pre Computation
[Figure: gate-level schematic of the input block. Signal paths: Input Adder Chain 1, Input Adder Chain 2, Input Adder Chain 3 and Input Adder Chain 4, with output buffers to drive capacitive loads. Gate size labels: 1X, 1X, 1X, 1X, 1.224X, 1.562X, 1.23X, 1.274X, 1.097X, 1.553X, 1.108X, 1.034X, 3.883X, 3.043X, 2.943X, 10.1683X, 10.8506X, 36X, 40X. Block outputs: Pi*Pi-1 and Gi + Pi*Gi-1]
Brent Kung Adder Gate Level Diagram
2. Intermediate Dot Product Blocks
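[Figure: gate-level schematic of the intermediate dot product block. Signal paths: Intermediate Adder Chain 1 and Intermediate Adder Chain 2, with output buffers to drive capacitive loads. Gate size labels: 1X, 1X, 1X, 1X, 1.72X, 6X, 4X, 16X, 16X. Block outputs: Pi*Pi-1 and Gi + Pi*Gi-1]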
Brent Kung Adder Gate Level Diagram
3. Output Block for Post Computation
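[Figure: gate-level schematic of the output block. Inputs Ci-1 and Pi produce the sum bit Si through output buffers that drive capacitive loads. Gate size labels: 1.182X, 1.117X]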
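The three blocks above (pre-computation, dot product tree, post-computation) can be illustrated with a small behavioral model. This is only a sketch in Python, not the transistor-level design from the slides: the function names `dot` and `brent_kung_add` are hypothetical, carry-in is assumed to be 0, and the tree wiring follows the textbook Brent-Kung up-sweep/down-sweep pattern.

```python
def dot(upper, lower):
    """Prefix 'dot' operator on (G, P) pairs: (G_u + P_u*G_l, P_u*P_l)."""
    g_u, p_u = upper
    g_l, p_l = lower
    return (g_u | (p_u & g_l), p_u & p_l)


def brent_kung_add(a, b, width=16):
    """Add two unsigned integers; returns (sum, carry_out). width must be a power of two."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    # Pre-computation: per-bit propagate (XOR) and generate (AND).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    gp = list(zip(g, p))
    levels = width.bit_length() - 1
    # Up-sweep: build group (G, P) over spans of 2, 4, 8, ... bits.
    for d in range(levels):
        half, step = 1 << d, 1 << (d + 1)
        for i in range(step - 1, width, step):
            gp[i] = dot(gp[i], gp[i - half])
    # Down-sweep: fill in the prefixes the up-sweep did not produce.
    for d in range(levels - 2, -1, -1):
        half, step = 1 << d, 1 << (d + 1)
        for i in range(step + half - 1, width, step):
            gp[i] = dot(gp[i], gp[i - half])
    # Post-computation: S_i = P_i xor C_(i-1), assuming carry-in = 0.
    carries = [0] + [gp[i][0] for i in range(width - 1)]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, gp[width - 1][0]
```

For example, `brent_kung_add(0xFFFF, 0x0001)` exercises the full carry chain, since every bit position propagates the carry generated at bit 0.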
Brent Kung Adder Transistor Level Design
XOR GATE
Brent Kung Adder Transistor Level Design
Inverter Design Optimization
• NMOS Width = 90 nm
• PMOS / NMOS Length = 50 nm
• Vdd = 1.1 V
• Current averaged over one period of 2 ns
• Optimal PMOS Width = 165 nm
• βinverter = 165/90 = 1.833
• Sizing for NAND, NOR and XOR changed appropriately
Brent Kung Adder Transistor Level Design
1. Input Block with Pre Computation
Input Adder Block Chain 1
Gate Number   1        2          3        4          5          LOAD
Gate Name     BUFFER   INVERTER   NOR      INVERTER   NAND       LOAD
g value       1.000    1.000      1.646    1.000      1.352      36.000
f value       3.540    3.540      2.151    3.540      2.618648
b value       2.893    2.400      1.000    1.000      1.000      1.000
S Value       1.000    1.224      1.097    3.883      10.16831   36.000
Stage G = 2.225   Stage F = 36.000   Stage B = 6.943   Stage H = 556.248   Gate h = 3.540
Input Adder Block Chain 2
Gate Number   1        2          3        4        LOAD
Gate Name     BUFFER   INVERTER   XOR      NAND     LOAD
g value       1.000    1.000      1.893    1.295    13.748
f value       4.518    4.518      2.386    3.488
b value       2.893    2.400      1.780    1.000    1.000
S Value       1.000    1.562      1.553    3.043    13.748
Stage G = 2.451   Stage F = 13.748   Stage B = 12.359   Stage H = 416.510   Gate h = 4.518
Input Adder Block Chain 3
Gate Number   1        2          3        LOAD
Gate Name     BUFFER   INVERTER   NOR      LOAD
g value       1.000    1.000      1.646    3.941
f value       3.558    3.558      2.162
b value       2.893    2.400      1.000
S Value       1.000    1.230      1.108    3.941
Stage G = 1.646   Stage F = 3.941   Stage B = 6.943   Stage H = 45.038   Gate h = 3.558
Input Adder Block Chain 4
Gate Number   1        2          3        4        5          LOAD
Gate Name     BUFFER   INVERTER   XOR      NAND     INVERTER   LOAD
g value       1.000    1.000      1.893    1.295    1.000      40.000
f value       3.686    3.686      1.947    2.847    3.686447
b value       2.893    2.400      1.000    1.000    1.000      1.000
S Value       1.000    1.274      1.034    2.943    10.85056   40.000
Stage G = 2.451   Stage F = 40.000   Stage B = 6.943   Stage H = 680.832   Gate h = 3.686
3.94084
Logical Effort Design for Signal
Chains labeled in previous slide #2
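The numbers in the chain tables can be reproduced with the standard logical effort recipe: path effort H = G x B x F, per-stage effort h = H^(1/N), and each stage's input capacitance obtained by walking forward from the first gate. The sketch below is an illustration, not the author's spreadsheet. The function name `size_chain` is hypothetical, and it assumes the first gate has size 1 and that sizes are reported relative to a minimum gate of the same type (input capacitance divided by g), which is what matches the S values for Chain 1.

```python
import math


def size_chain(g, b, load):
    """Return (H, h, sizes) for a chain of gates.

    g: logical effort of each stage
    b: branching effort at the output of each stage
    load: load capacitance in units of a minimum inverter's input capacitance
    """
    n = len(g)
    path_g = math.prod(g)        # Stage G
    path_b = math.prod(b)        # Stage B
    path_f = load / g[0]         # Stage F, assuming the first gate has size 1
    path_h = path_g * path_b * path_f
    h = path_h ** (1.0 / n)      # equal effort per stage
    c_in = g[0]                  # input capacitance of stage 1
    sizes = [1.0]
    for i in range(1, n):
        # Each stage drives b[i-1] copies of the next stage with stage effort h.
        c_in = c_in * h / (g[i - 1] * b[i - 1])
        sizes.append(c_in / g[i])
    return path_h, h, sizes


# Input Adder Block Chain 1 from the table above.
H, h, sizes = size_chain(
    g=[1.0, 1.0, 1.646, 1.0, 1.352],
    b=[2.893, 2.4, 1.0, 1.0, 1.0],
    load=36.0,
)
```

With the Chain 1 inputs this gives H close to 556.25, h close to 3.540 and sizes close to 1, 1.224, 1.097, 3.883 and 10.168, in agreement with the S values listed for that chain.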
Brent Kung Adder Transistor Level Design
2. Intermediate Dot Product Blocks
Logical Effort Design for Signal
Chains labeled in previous slide #3
Intermediate Adder Block Chain 1
Gate Number   1          2        LOAD
Gate Name     INVERTER   NAND     LOAD
g value       1.000      1.352    1.000
f value       2.848      2.107    2.848
b value       1.000      1.000    1.000
S Value       1.000      2.107    6.000
Stage G = 1.352   Stage F = 6.000   Stage B = 1.000   Stage H = 8.112   Gate h = 2.848
Intermediate Adder Block Chain 2
Gate Number   1        2        LOAD
Gate Name     BUFFER   NAND     LOAD
g value       1.000    1.352    2.848
f value       2.775    2.053
b value       2.000    1.000
S Value       1.000    1.026
Stage G = 1.352   Stage F = 2.848   Stage B = 2.000   Stage H = 7.701   Gate h = 2.775
Brent Kung Adder Simulated Performance
Voltage (V)   Delay Max-C14 (ns)   Power Max (mW)   Power-Delay Product (pJ)
1.1           0.359                6.73             2.41
0.9           0.503                2.95             1.483
0.7           0.937                0.924            0.865
Simulations with maximally sized 1 stage buffers as determined by Logical Effort Design
of individual chains
Voltage (V)   Delay Max-C14 (ns)   Power Max (mW)   Power-Delay Product (pJ)
1.1           0.403                5.186            2.089
0.9           0.569                2.277            1.295
0.7           1.069                0.692            0.739
Simulations with minimally sized 1 stage buffers
Without parasitic extraction and interconnect parasitics, buffering does not improve performance significantly.
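The power-delay product column is simply delay times power (ns x mW = pJ). The short sketch below recomputes it from the delay and power figures in the two tables and compares the two buffer options at each supply voltage. The variable and function names are illustrative only.

```python
# (voltage in V, delay in ns, power in mW), copied from the two tables above.
MAX_SIZED_BUFFERS = [(1.1, 0.359, 6.73), (0.9, 0.503, 2.95), (0.7, 0.937, 0.924)]
MIN_SIZED_BUFFERS = [(1.1, 0.403, 5.186), (0.9, 0.569, 2.277), (0.7, 1.069, 0.692)]


def power_delay_product(delay_ns, power_mw):
    """Power-delay product in pJ (ns * mW = pJ)."""
    return delay_ns * power_mw


def compare(max_rows, min_rows):
    """Per voltage: (voltage, PDP max-sized, PDP min-sized, delay saved in percent)."""
    rows = []
    for (v, d_max, p_max), (_, d_min, p_min) in zip(max_rows, min_rows):
        saved = 100.0 * (d_min - d_max) / d_min
        rows.append((v, power_delay_product(d_max, p_max),
                     power_delay_product(d_min, p_min), saved))
    return rows


for v, pdp_max, pdp_min, saved in compare(MAX_SIZED_BUFFERS, MIN_SIZED_BUFFERS):
    print(f"{v:.1f} V: PDP {pdp_max:.3f} pJ (max) vs {pdp_min:.3f} pJ (min), "
          f"delay saved {saved:.1f}%")
```

At every voltage the larger buffers shorten the delay but raise the power-delay product, which is consistent with the remark that buffering brings little benefit before parasitics are included.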
Brent Kung Adder Worst Case Delay
Input Pattern: A: FFFF B: 0000 -> 0001
Dotted Lines show Carry Bits 15 and 14
Carry Bit 15 Carry Bit 14
Brent Kung Adder Layout
Input Block with Pre Computation
Input Inverters for Bit 0 and Bit 1
Output Buffers
PEX waveforms show
larger size may be needed
XOR
NAND
10X
Brent Kung Adder Layout
XOR 1.553X
Brent Kung Adder Layout
NAND 10.57X layout with interdigitated fingers to reduce parasitics
Brent Kung Adder Layout
Intermediate Dot Product Generator
Output Buffers
PEX waveforms show a larger size may be necessary here
Brent Kung Adder Layout
Output Stage with Buffers
Brent Kung Adder Layout
Full Layout: 49.5um X 48.6um
Future Design Modifications
• The design uses large buffers at the output of every stage to drive large capacitances.
• The buffers are not needed at nodes with low fanout and can be eliminated there.
• The buffers at the input nodes currently increase power consumption and add to the delay.
• Thus the overall performance can be improved with fewer buffers.
References:
Course Slides from Prof. Kia Bazargan’s Course on VLSI
A Taxonomy of Parallel Prefix Networks (David Harris), reference paper on course website
Digital Integrated Circuits by Jan Rabaey
SRAM DESIGN PROJECT PHASE 2
Nirav Desai
4280229
VLSI DESIGN 2: Prof. Kia Bazargan
Dept. of ECE
College of Science and Engineering
University of Minnesota, Twin Cities
SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE
• NMOS inverter = 110 nm, PMOS inverter = 220 nm, NMOS access = 90 nm
• NMOSinv/NMOSaccess = 1.2, PMOSinv/NMOSaccess = 2.4
• Cbitline = 0.747 fF for 512 cell array (interconnect parasitics from ASU PTM website)
SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE
• NMOS inverter = 150 nm, PMOS inverter = 555 nm, NMOS access = 180 nm
• NMOSinv/NMOSaccess = 1.2, PMOSinv/NMOSaccess = 3, Cbitline = 0.747 fF
• The curve shows the SRAM cell is close to write failure.
• Precharging the bitline to less than 1.1 V could be explored to increase SNM.
Simulation Setup
• M0, M1, M3, M4 form the cross-coupled inverter pair
• M5, M6 are the access transistors
• C1, C2 are the bitline capacitances
• M7 is the precharge switch for the bitline (bit); V3 precharges the bitline to 0.8 V
• V6 precharges bitbar and writes a 0 to the cell
[Figure: schematic with probed signals V(write), V(ic), V(word), V(qbar), V(q), V(bitbar), V(bit)]
Timing Waveforms for Characterization
V(write) – Applied to source of M7 (precharge switch)
V(word) – Wordline Voltage
V(qbar)
V(q)
V(ic) – Enables the precharge switch M7
V(bitbar)
V(bit)
• V(write) precharges Cbit to 0.8 V via M7.
• V(word) disables access transistors M5 and M6 during precharge.
• V(qbar) and V(q) are used to generate the butterfly curves.
• V(ic) enables M7 during precharge. It could be implemented as NOT(V(word)).
• V(bitbar) precharges to 0.8 V, shows charge pumping when M7 turns off and follows V(qbar) when the wordline is enabled.
• V(bit) follows V(q) after the wordline is enabled.
• V(bit) is precharged to Vdd by V6.
PASS TRANSISTOR BASED TREE DESIGN
1:8 Row Decoder Tree
Similar Tree Decoder for 16 LSB Bits
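A behavioral sketch of a 1:8 tree decoder may help here. Each address bit controls one level of a binary tree of pass gates, so the root signal is steered along exactly one path to one of eight outputs. This is an illustrative model only: the function name `tree_decode` is hypothetical, unselected outputs are modeled as 0 (in a real pass-transistor tree they would float or need to be discharged), and the MSB is assumed to control the level nearest the root.

```python
def tree_decode(address, levels=3, data=1):
    """Steer `data` through a binary tree of pass gates; returns 2**levels outputs."""
    if not 0 <= address < (1 << levels):
        raise ValueError("address out of range")
    signals = [data]  # signal at the root of the tree
    for level in reversed(range(levels)):  # MSB controls the first split
        bit = (address >> level) & 1
        next_signals = []
        for s in signals:
            next_signals.append(s if bit == 0 else 0)  # branch gated by the complemented bit
            next_signals.append(s if bit == 1 else 0)  # branch gated by the true bit
        signals = next_signals
    return signals


# Address 5 (binary 101) selects row 5 of 8.
print(tree_decode(5))
```

The model uses 2 + 4 + 8 = 14 pass gates for three address bits, and the selected signal always passes through three gates in series, which is why the pass gate width matters for the decoder delay.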
TREE DECODER DESIGN
PASS TRANSISTOR BASED TREE DESIGN
[Figure: pass gate between IN and OUT with clock inputs CK and CK; W/L = 880/50]
Identical Sizing for NMOS and PMOS to minimize charge injection effects
• Delay drops by ~40ps/2 for every doubling of transistor widths
• The delay reduction saturates at about 89 ps near a width of 1000 nm
• Used W/L of 880/50 for the final tree
TREE DECODER TIMING DIAGRAMS
The following waveforms were applied to the row and column selection inputs of the tree decoder
TREE DECODER TIMING DIAGRAMS
It takes one cycle to initialize the tree decoder, after which we get clean pulses for each row output
LSB pulse is wider than MSB pulse in bottom figure to allow the tree to clear present state before next
TREE DECODER TIMING DIAGRAMS
The top waveform shows the matrix point output where the row and column select inputs are high
The output node discharges when the input goes low
READ WRITE CIRCUIT
( Design by Bong Jin )
Sense Amplifier Write Driver
Precharge Circuit
READ WRITE CIRCUIT TEST SETUP
Bitline Capacitance estimate from ASU PTM Website
Cbit estimate for 512 rows
NMOS Switches to allow read without disabling write circuit
Single SRAM Cell for
simulations
READ / WRITE TIMING WAVEFORMS
Precharge Pulse ( Active Low )
Data Meant to be written to cell
Write Enable Pulse
Read Enable Pulse
Output of Write Buffer
Disable output buffer (tristate logic)
Bitline
Bitline Bar
Data Output
Data Out Bar
SRAM Cell Layout
2X2 SRAM Array Layout
VDD
GND
GND
WORD 1
WORD 0
B0 B0BAR B1 B1BAR
This unit can be replicated in all directions without any changes. LVS check remaining
Array Size = 3.7975umX2.4725um
References
Digital Integrated Circuits
Jan Rabaey, Anantha Chandrakasan, Borivoje Nikolic
( SRAM Cell Design, Decoders, Read Write Circuits )
CMOS VLSI Design by Weste and Harris
( Butterfly Curves )
CMOS Circuit Design, Layout and Simulation
Baker, Li, Boyce (Decoder Design)
Course slides of Prof. Kia Bazargan
( Precharge Techniques, Decoders, SRAM Cell Design )
System Diagram for developing LMS Algorithm for Channel Estimation ( H(z) )
Errors e1 and e2 (e2 being the quantized error) could have the same convergence if the channel model H(z) is adapted using an LMS model
Next few slides show regular LMS and modified LMS Error Convergence
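For reference, the regular LMS update for channel estimation can be sketched as below. This is the textbook algorithm, not the modified LMS from the slides, and it does not model the quantized error e2. The function names, the 3-tap example channel, the step size and the noise-free setup are all assumptions chosen for illustration.

```python
import random


def channel_output(x, h):
    """Pass x through an FIR channel with taps h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def lms_identify(x, d, num_taps, mu):
    """Regular LMS: adapt w so that the filter output tracks the desired signal d."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Regressor: the most recent num_taps input samples, zero padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y                      # estimation error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


random.seed(1)
x = [random.choice((-1.0, 1.0)) for _ in range(2000)]
h_true = [1.0, 0.5, -0.25]               # hypothetical channel H(z)
d = channel_output(x, h_true)
w, errors = lms_identify(x, d, num_taps=3, mu=0.05)
```

In this noise-free setup the weights converge to the channel taps and the error decays toward zero. The step size `mu` trades convergence speed against steady-state misadjustment once noise is present.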
Adaptive DSP Course by Prof. Keshab Parhi
Error Convergence for regular LMS takes more time than the modified LMS
Modified LMS adapts all tap weights using different errors, computed using as many filter output estimates as the filter order. The assumption is that the optimum gradient direction for each tap weight is different and is given by the corresponding error.
Lattice predictors would be a more efficient way to do this than LMS, since each stage of a predictor is optimum for that order, unlike modified LMS, where each tap weight is adapted in a suboptimal manner.
EEG Spectral Estimates for Pre-Ictal, Ictal and Post-Ictal Signal Sequences
Spectral Estimation for a low pass filtered impulse sequence using different techniques
Correlograms provide best Spectral Estimates for Low Pass Filtered Impulse Trains
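A correlogram estimates the power spectrum as the Fourier transform of an estimated autocorrelation sequence. The sketch below shows the basic idea in plain Python with a biased autocorrelation estimate. The function name, the lag limit and the frequency grid are assumptions, and the test signal is a pure cosine rather than the low pass filtered impulse train used in the slides.

```python
import math


def correlogram(x, max_lag, num_freqs):
    """PSD estimate on num_freqs points from 0 to pi rad/sample (inclusive)."""
    n = len(x)
    # Biased autocorrelation estimate r[lag] for lag = 0..max_lag.
    r = [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
         for lag in range(max_lag + 1)]
    psd = []
    for k in range(num_freqs):
        w = math.pi * k / (num_freqs - 1)
        # r is symmetric in lag, so the transform reduces to a cosine sum.
        psd.append(r[0] + 2.0 * sum(r[lag] * math.cos(w * lag)
                                    for lag in range(1, max_lag + 1)))
    return psd


# A cosine at 0.25*pi rad/sample should peak a quarter of the way along the grid.
signal = [math.cos(0.25 * math.pi * n) for n in range(256)]
psd = correlogram(signal, max_lag=64, num_freqs=65)
peak_bin = psd.index(max(psd))
```

Limiting `max_lag` to a fraction of the record length keeps the poorly estimated large-lag terms out of the transform, which is the usual reason correlogram estimates look smoother than a raw periodogram.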
EE 5364 / CS 5204:
Advanced Computer Architecture
Final Course Project on
Design of a Branch Predictor
Prepared by:
Nirav Desai 4280229
Amanda Skinner 3749048
Course Instructor: Prof. Pen-Chung Yew
Department of ECE
University of Minnesota, Twin Cities
Why Branch Predictor?
• Branch predictors improve the flow of the instruction pipeline
• As branch predictor accuracy increases, cache misses decrease for both data and instruction caches
Why Branch Predictor?
Why Prefetching?
• As branch predictor accuracy increases, cache misses go down
• Prefetching and increasing cache size decrease cache misses
[Figure: miss rate for the Mesa benchmark. Both the L1-Data and L2 cache associativities were changed [4]]
Reference Prediction Table[1]
• LA-PC runs ahead of PC and keeps track of load and store instructions
• RPT keeps track of previous reference addresses and strides for load and store instructions
• L2 cache prefetching can be done by storing spill-over data and instructions from L1 cache blocks.
• Intel Core 2 Duo uses RPT for L1 cache prefetching and a loop counter local branch predictor
Design of Branch Predictor
• The loop counter would give high accuracy on matrix multiplication
• Track all registers for the loop counter, since different interleaved threads may use different registers
• A loop counter error would imply dynamic update of registers based on non-local values
• Tag registers giving repeated conditional branch errors on the Branch Decision Table
• Use the O-GEHL predictor for all tagged branches
• Using the loop counter and duplicate ALU will allow indexing long histories with limited geometric length
Branch Decision Table
Field                      Entered by
Branch Address             LA-PC
Predicted Direction        Loop Counter or O-GEHL
Predicted Branch Target    Duplicate ALU
Actual Direction           PC
Actual Branch Target       PC
Tag                        O-GEHL
Counters Used C(i)(j)      O-GEHL
if prediction != actual decision:
  Prediction computed by Loop Counter?
    Yes: incorrect duplicate register values
      Re-initialize Duplicate Register Stack
      Set LA-PC to PC
      After 2 successive errors make an entry in O-GEHL
      Also tag the branch address in Branch Decision Table to be used with O-GEHL
  Prediction computed by O-GEHL?
    Yes: run the update equation on counters listed in table
      Set LA-PC to PC
Loop Counter Branch Predictor
Op-Code = 4 (beq) OR Op-Code = 5 (bne)
Duplicate Register Flag == 0 ?
Yes No
First Conditional Branch
Copy Register Stack to
Duplicate Register Stack
( Equivalent to initializing
the duplicate register stack)
Duplicate Register Stack Initialized
Set Register Flag for rs and rt = 1
These registers will be tracked by the Duplicate ALU
Proceed to Branch Prediction Computation
Op code == 4 ? then test rs == rt ?
Op code == 5 ? then test rs != rt ?
Yes (branch taken), execute:
Copy Off-Set from bits 15 to bit 0
Sign Extend Off Set to bit 31 ( Total 32 bits )
Left Shift by 2 ( to get Word Address )
Add to PC+4 to get Branch Target Address
No (branch not taken): Inc LA-PC by 4
Do addition and subtraction for all
instructions having rs and rt with
register flags set to 1
rs – Bits 25:21 rt – Bits: 20:16
The loop counter looks at only
the conditional branches
Can be extended to bgtz, blez
Op-Code:
Bits 31:26
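The branch direction test and target computation in this flowchart can be written compactly. The sketch below follows the standard MIPS encoding of beq (op-code 4) and bne (op-code 5). The function names and the `regs` list standing in for the duplicate register stack are hypothetical.

```python
def branch_target(pc, instruction):
    """PC + 4 + sign-extended 16-bit offset shifted left by 2, wrapped to 32 bits."""
    offset = instruction & 0xFFFF          # bits 15:0
    if offset & 0x8000:                    # sign extend to 32 bits
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def predict_conditional(instruction, regs):
    """Return True if a beq/bne would be taken given the tracked register values."""
    opcode = (instruction >> 26) & 0x3F    # bits 31:26
    rs = (instruction >> 21) & 0x1F        # bits 25:21
    rt = (instruction >> 16) & 0x1F        # bits 20:16
    if opcode == 4:                        # beq
        return regs[rs] == regs[rt]
    if opcode == 5:                        # bne
        return regs[rs] != regs[rt]
    raise ValueError("not a beq/bne instruction")


def next_la_pc(pc, instruction, regs):
    """Look-ahead PC after a conditional branch: target if taken, else PC + 4."""
    if predict_conditional(instruction, regs):
        return branch_target(pc, instruction)
    return (pc + 4) & 0xFFFFFFFF


# bne $1, $2, -1 at PC 0x1000 branches back to itself while $1 != $2.
regs = [0] * 32
regs[1], regs[2] = 3, 10
bne = (5 << 26) | (1 << 21) | (2 << 16) | 0xFFFF
```

With these values `next_la_pc(0x1000, bne, regs)` returns 0x1000, and once the two registers hold equal values it returns 0x1004.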
O-GEHL Branch Predictor[2]
[Figure: counter tables C(i)(j) for the first three history lengths: C(1)(1..2), C(2)(1..4), C(3)(1..9)]
History lengths go in a geometric progression given by $L(i) = \alpha^{i-1} L(1) + \text{constant}$
Best Series found from experiments: 2, 4, 9, 12, 18, 31, 54, 114, 145, 266
Dynamic History length fitting with variable α also possible.
[Figure: last counter table C(10)(1..266)]
Sum = ΣC(i)(j)+C(i+1)(k)+…C(i+9)(l)
• j,k,l .. Are incremented on every
unconditional branch.
• j increments are modulo 2,
k increments are modulo 4,
l increments are modulo 266.
• Each C(i)(j) is a 4 bit saturating counter
that counts -8 to 7.
• Counter Update given by:
if(p!=out)
if(branch==taken) c(i)(j)++
if(branch!=taken) c(i)(j)--
• Dynamic Threshold (θ) Fitting possible
• Threshold(θ) by default is 0.
Sum > θ then p = taken
Sum < θ then p = not taken
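The prediction and update rules listed on this slide can be sketched as follows. This models only the sum-and-threshold step on the counters already selected from each table (one per history length), not the indexing with branch history. The function names are hypothetical, and treating Sum == θ as "not taken" is an assumption, since the slide leaves that case unspecified.

```python
COUNTER_MIN, COUNTER_MAX = -8, 7   # 4-bit signed saturating counter range


def predict(selected_counters, theta=0):
    """Predict taken when the sum of the selected counters exceeds the threshold."""
    return sum(selected_counters) > theta


def saturating_step(counter, taken):
    """Increment on taken, decrement on not taken, clamped to the 4-bit range."""
    if taken:
        return min(COUNTER_MAX, counter + 1)
    return max(COUNTER_MIN, counter - 1)


def update(selected_counters, taken, theta=0):
    """Apply the slide's rule: counters change only when the prediction was wrong."""
    if predict(selected_counters, theta) != taken:
        return [saturating_step(c, taken) for c in selected_counters]
    return list(selected_counters)


# Ten tables, one selected counter each, all starting at zero.
counters = [0] * 10
counters = update(counters, taken=True)   # mispredicted, so every counter moves up
```

After this single update `predict(counters)` returns True. The published O-GEHL predictor also updates when the magnitude of the sum is below the threshold, which this sketch leaves out to stay with the rule as written on the slide.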
Duplicate ALU ( for MIPS )[3]
[Figure: the LA-PC address and instruction feed a duplicate instruction queue and a decode unit that compares the Op-Code. Instruction fields: Op Code bits 31-26, Reg 1 bits 25-21, Reg 2 bits 20-16, Reg 3 bits 15-11]
Op-Code == 4 OR 5: (beq, bne) Use Loop Counter
Op-Code == 2 OR 3: (jump, jal) Always take
Op-Code == 0 & FUNCT==8 OR 9: (jr, jalr) Always take
Branch Target for Jump: 32bits: bits 31:28: 4 MSB bits of current PC+4
bits 27:2: Jump Target from instruction
bits 1:0 : 00 ( Word Addresses )
Branch Target for Branch: 32 bits: Current PC + 4 + bits 15:0 left shifted by 2 to give word addresses
Compare Register Flags for reg1, reg2, reg3
If register flags set, do the computation for
Op-Code: 0 bits(5:0) 32: add r1, r2, r3
Op-Code: 0 bits(5:0) 34: sub r1, r2, r3
Op-Code: 0 bits(5:0) 33: addu r1, r2, r3
Op-Code: 0 bits(5:0) 35: subu r1, r2, r3
Op-Code: 8: addi r1, constant
Op-Code: 9: addiu r1, constant
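The decode rules above map directly onto bit-field extraction. The sketch below uses the standard MIPS field positions. The function names and the category labels returned by `classify` are illustrative and are not taken from the slides.

```python
def decode_fields(instruction):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (instruction >> 26) & 0x3F,   # bits 31:26
        "reg1": (instruction >> 21) & 0x1F,     # bits 25:21
        "reg2": (instruction >> 16) & 0x1F,     # bits 20:16
        "reg3": (instruction >> 11) & 0x1F,     # bits 15:11
        "funct": instruction & 0x3F,            # bits 5:0
        "immediate": instruction & 0xFFFF,      # bits 15:0
        "jump_index": instruction & 0x03FFFFFF, # bits 25:0
    }


def classify(instruction):
    """Decide how the duplicate ALU should treat an instruction."""
    f = decode_fields(instruction)
    op, funct = f["opcode"], f["funct"]
    if op in (4, 5):                             # beq, bne
        return "loop_counter"
    if op in (2, 3):                             # j, jal
        return "always_taken"
    if op == 0 and funct in (8, 9):              # jr, jalr
        return "always_taken"
    if op == 0 and funct in (32, 33, 34, 35):    # add, addu, sub, subu
        return "track_arithmetic"
    if op in (8, 9):                             # addi, addiu
        return "track_arithmetic"
    return "other"


def jump_target(pc, instruction):
    """Top 4 bits of PC + 4, then the 26-bit jump index, then two zero bits."""
    return ((pc + 4) & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)


j_instr = (2 << 26) | 0x100                      # j to word index 0x100
add_instr = (1 << 21) | (2 << 16) | (3 << 11) | 32
```

Here `jump_target(0x00400000, j_instr)` evaluates to 0x400 and `classify(add_instr)` returns "track_arithmetic".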
• Set LA-PC Busy bit on instruction read
• When LA-PC updated by branch predictors,
busy bit reset
• For arithmetic, reset busy bit after 2 cycles
• Instruction read when busy bit reset
• LA-PC different from that used in RPT
This branch predictor can be used on Multi Threaded CPUs
Test results on O-GEHL Branch Predictor[5]
References
1. An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty
Jean-Loup Baer, Tien-Fu Chen
Department of Computer Science and Engineering,
University of Washington, Seattle, WA 98195
Supercomputing '91 Proceedings of the 1991 ACM/IEEE Conference on Supercomputing
2. The O-GEHL Branch Predictor
Andre Seznec
The 1st JILP Championship Branch Prediction Competition CBP1 (2004)
Available from www.jilp.org
3. Computer Organisation and Design
The Hardware-Software Interface
David Patterson and John Hennessy
4. http://en.wikipedia.org/wiki/CPU_cache
5. Analysis of the Optimized GEHL Predictor
Andre Seznec
Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf
Research Ideas I am working on right now
Strained Silicon on SiGe Solar Cell
• Requires Chemical Vapor Deposition or MBE techniques for fabrication
• Completed a short term course on Semiconductor Technology and Manufacturing at IIT Bombay
to learn about these techniques in November 2012.
• Tandem Solar Cell design gives a wide band of absorbable frequencies with different band gaps.
• Optimal thickness at quarter wavelength will give maximum absorption at designed frequency
• Back plate metal contacts and top plate fingered contacts
• Economically viable for charging battery packs in electric vehicles and for replacing LPG cooking
gas cylinders.
• Long term viability for power generation feasible due to low operating costs and low distribution
costs in a distributed model.
• Reference: Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high
conversion efficiency:
Photovoltaic Specialists Conference, 2002. Conference Record of the Twenty-Ninth IEEE
Date of Conference: 19-24 May 2002
Author(s): Usami, N.
Inst. for Mater. Res., Tohoku Univ., Sendai, Japan
Takahashi, T. ; Fujiwara, K. ; Ujihara, T. ; Sazaki, G. ; Murakami, Y. ; Nakajima, K.
Page(s): 247 - 249
Rake Receiver with MDS Codes
• Rake receivers could be used to identify strongest multi path component from a received signal.
• This could be achieved by correlating the received signal with itself over different delays and
finding the strongest delay component.
• This does not involve maximal ratio combining.
• It could be combined with MDS codes for wireless communications where given any d bits
corrupted by channel noise or multi path effects, the signal could still be recovered uniquely.
• Reference: Lectures of Prof. Cutter on iTunesU under the course on Digital Communications 2
taught at MIT.
• Reference: W-CDMA Rake Receiver implementation in DSP: EE Times: Link:
http://www.eetimes.com/electronics-news/4139933/W-CDMA-RAKE-Receiver-Comes-to-Life-in-DSP
• Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1044&context=ojii_volumes
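The idea of correlating the received signal with itself over different delays can be sketched as below. Note that the autocorrelation of a multipath signal peaks at the relative delays between paths, so this finds the delay of the strongest echo relative to the main path. The function names, the two-echo test channel and the random binary test sequence are assumptions made for the illustration.

```python
import random


def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def strongest_delay(signal, max_lag):
    """Nonzero lag with the largest autocorrelation magnitude."""
    r = autocorrelation(signal, max_lag)
    return max(range(1, max_lag + 1), key=lambda lag: abs(r[lag]))


def multipath(x, paths):
    """Sum of delayed, scaled copies of x; paths is a list of (delay, gain)."""
    return [sum(gain * x[n - delay] for delay, gain in paths if n - delay >= 0)
            for n in range(len(x))]


random.seed(7)
tx = [random.choice((-1.0, 1.0)) for _ in range(2000)]
rx = multipath(tx, [(0, 1.0), (5, 0.6), (9, 0.2)])   # main path plus two echoes
delay = strongest_delay(rx, max_lag=20)
```

In this example the strongest echo sits five samples behind the main path, so the search is expected to return 5. No maximal ratio combining is performed, in line with the approach described above.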
Class S RF Power Amplifiers on
GaN HEMTs
• Class S RF Power Amplifiers with fully differential H-Bridge topology could give a theoretical
100% efficiency.
• GaN HEMTs give the best high frequency switching characteristics.
• The 2 features could be combined to give a high efficiency RF power amplifier topology.
• Under-graduate project on Class S Audio Amplifier Design
• Reference: Ph.D. Dissertation of Stephan Maroldt, University of Freiburg
• Reference: Device Evaluation for Current Mode Class D RF Power Amplifiers with high output
power and efficiency. Thesis of Thomas Dellsperger
http://www.ece.ucsb.edu/rad/pubs/master/tdellsperger_2003.pdf
• Reference: High linearity and high efficiency Class B RF Power Amplifiers in GaN HEMTs
http://www.ece.ucsb.edu/faculty/rodwell/publications_and_presentations/publications/239.pdf
Microprocessor Design
• The attached slides describe the design of a 16-bit Brent Kung adder and a 1024x16 asynchronous SRAM in 45 nm CMOS, along with the design of a branch predictor and cache prefetch unit for a MIPS microprocessor.
• These design ideas could be combined with other ideas for pipeline design, ALU design and interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45 nm CMOS.
• Various power reduction and clock gating techniques could be applied at a higher level of the
hierarchy.
• Clock gating could be done at a coarse level like not clocking a core which is not being used or
at a fine level where the modules not being used are not clocked. In a deeply pipelined design,
the divider need not be clocked if only multiply and accumulate operations are being carried out.
• References for clock gating: Clock Tree Power Optimization based on RTL clock gating:
http://dl.acm.org/citation.cfm?id=775989
• Attended tutorials at the VLSI Design Conference in 2013 to learn more about these techniques.
• Clock gating could be done using higher power FETs.
mm-wave MIMO OFDM
• mm-wave MIMO OFDM could be used for wireless backhaul networks due to its high capacity
• mm-wave MIMO systems could be extended to 2x2, 4x4, 8x8, etc topologies to exploit spatial
diversity and get higher data rate.
• Reference:
• 4 channel spatial multiplexing over a mm-wave line of sight link
Microwave Symposium Digest, 2009. MTT '09. IEEE MTT-S International
Date of Conference: 7-12 June 2009
Author(s): Sheldon, C.
Dept. of Electr. & Comput. Eng., Univ. of California, Santa Barbara, CA, USA
Munkyo Seo ; Torkildson, E. ; Rodwell, M. ; Madhow, U.
Page(s): 389 - 392
Routing algorithm to reduce
congestion
• The routing algorithm to reduce congestion could be based on the idea of sparsity.
• High congestion nodes could be dropped from the network map till congestion on the node
drops.
• The underlying packet streams would be using a flow control based routing protocol.
• Each node would store a map of the network which would be updated periodically using ping
back messages.
• Could be applied to packet switched networks, traffic control and wireless sensor networks.
• Reference: Flow control based routers developed by Anagran.
• Reference: Ad-Hoc On Demand Distance based algorithms treat packets as flows by leaving backward pointers to subsequent packets in the chain at each router node.
http://www.cs.ucsb.edu/~ebelding/txt/wmcsa99.pdf
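One way to picture the sparsity idea is a shortest-path search that simply leaves congested nodes out of the network map until their load falls again. The sketch below uses breadth-first search on a small hypothetical topology. The function name, the congestion metric and the threshold value are all assumptions, and the flow control protocol itself is not modeled.

```python
from collections import deque


def route(graph, src, dst, congestion, threshold):
    """Fewest-hop path from src to dst that avoids nodes above the congestion threshold."""
    # Congested nodes are dropped from the map, except the endpoints themselves.
    dropped = {node for node, load in congestion.items()
               if load > threshold and node not in (src, dst)}
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited and neighbor not in dropped:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no uncongested path exists


# Hypothetical network: A-B-D is the short path, A-C-E-D the detour.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "E": ["C", "D"], "D": ["B", "E"]}
congested = route(graph, "A", "D", {"B": 0.9}, threshold=0.8)
recovered = route(graph, "A", "D", {"B": 0.3}, threshold=0.8)
```

While node B is above the threshold the route takes the detour through C and E. Once its load drops, B returns to the map and the shorter path is used again.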
Photonic Computers
• These could use multiplexer based logic gates.
• Photonic multiplexers have been widely researched and developed for optical communications.
• Phase detectors could be used to identify the phase and thus the value of the stored signal.
• These would use electronic charge storage and high speed electro-optic conversion.
• Reference: Prior research on this has been carried out at UCSB.

More Related Content

Viewers also liked

How to raise money for an NGO
How to raise money for an NGOHow to raise money for an NGO
How to raise money for an NGONirav Desai
 
Accenture managing-maintenance-and-support-costs
Accenture managing-maintenance-and-support-costsAccenture managing-maintenance-and-support-costs
Accenture managing-maintenance-and-support-costsKarthik Arumugham
 
Design of a high speed low power Brent Kung Adder in 45nM CMOS
Design of a high speed low power Brent Kung Adder in 45nM CMOSDesign of a high speed low power Brent Kung Adder in 45nM CMOS
Design of a high speed low power Brent Kung Adder in 45nM CMOSNirav Desai
 
Active Noise Reduction by the Filtered xLMS Algorithm
Active Noise Reduction by the Filtered xLMS AlgorithmActive Noise Reduction by the Filtered xLMS Algorithm
Active Noise Reduction by the Filtered xLMS AlgorithmNirav Desai
 
Design of a web portal for farmer's insurance company
Design of a web portal for farmer's insurance companyDesign of a web portal for farmer's insurance company
Design of a web portal for farmer's insurance companyNirav Desai
 
Porter’s value chain food retail 2
Porter’s value chain food retail 2Porter’s value chain food retail 2
Porter’s value chain food retail 2Nirav Desai
 
Design of a low power asynchronous SRAM in 45nM CMOS
Design of a low power asynchronous SRAM in 45nM CMOSDesign of a low power asynchronous SRAM in 45nM CMOS
Design of a low power asynchronous SRAM in 45nM CMOSNirav Desai
 
A CASE STUDY ON SONY CORPORATION
A CASE STUDY ON SONY CORPORATIONA CASE STUDY ON SONY CORPORATION
A CASE STUDY ON SONY CORPORATIONNirav Desai
 

Viewers also liked (8)

How to raise money for an NGO
How to raise money for an NGOHow to raise money for an NGO
How to raise money for an NGO
 
Accenture managing-maintenance-and-support-costs
Accenture managing-maintenance-and-support-costsAccenture managing-maintenance-and-support-costs
Accenture managing-maintenance-and-support-costs
 
Design of a high speed low power Brent Kung Adder in 45nM CMOS
Design of a high speed low power Brent Kung Adder in 45nM CMOSDesign of a high speed low power Brent Kung Adder in 45nM CMOS
Design of a high speed low power Brent Kung Adder in 45nM CMOS
 
Active Noise Reduction by the Filtered xLMS Algorithm
Active Noise Reduction by the Filtered xLMS AlgorithmActive Noise Reduction by the Filtered xLMS Algorithm
Active Noise Reduction by the Filtered xLMS Algorithm
 
Design of a web portal for farmer's insurance company
Design of a web portal for farmer's insurance companyDesign of a web portal for farmer's insurance company
Design of a web portal for farmer's insurance company
 
Porter’s value chain food retail 2
Porter’s value chain food retail 2Porter’s value chain food retail 2
Porter’s value chain food retail 2
 
Design of a low power asynchronous SRAM in 45nM CMOS
Design of a low power asynchronous SRAM in 45nM CMOSDesign of a low power asynchronous SRAM in 45nM CMOS
Design of a low power asynchronous SRAM in 45nM CMOS
 
A CASE STUDY ON SONY CORPORATION
A CASE STUDY ON SONY CORPORATIONA CASE STUDY ON SONY CORPORATION
A CASE STUDY ON SONY CORPORATION
 

Research presentation

  • 1. Nirav A. Desai, desai.nirav.12.09@gmail.com
  • 3. MM-Wave Active Sensor: the BPSK spectrum can be seen on the spectrum analyzer. Design work done jointly by Nirav Desai, Munkyo Seo, Colin Sheldon, and Mark Rodwell.
  • 4. I assisted in these mm-wave MIMO experiments at UCSB.
  • 24. EE 5323: VLSI DESIGN 1 PROJECT. Course Instructor: Prof. Chris Kim. 16-bit BRENT KUNG ADDER DESIGN in 45 nm CMOS. Nirav Desai, ID: 4280229, Department of Electrical and Computer Engineering, University of Minnesota.
  • 26. Brent Kung Adder Gate Level Diagram: 1. Input Block with Pre Computation. [Figure: gate-level schematic of Input Adder Chains 1 to 4, each gate annotated with its size (from 1X up to 40X), with output buffers to drive capacitive loads. Block outputs: Pi*Pi-1 and Gi + Pi*Gi-1.]
  • 27. Brent Kung Adder Gate Level Diagram: 2. Intermediate Dot Product Blocks. [Figure: gate-level schematic of Intermediate Adder Chains 1 and 2, each gate annotated with its size (from 1X up to 16X), with output buffers to drive capacitive loads. Block outputs: Pi*Pi-1 and Gi + Pi*Gi-1.]
  • 28. Brent Kung Adder Gate Level Diagram: 3. Output Block for Post Computation. [Figure: gate-level schematic with inputs Ci-1 and Pi and output Si; gates sized 1.182X and 1.117X, followed by output buffers to drive capacitive loads.]
  • 29. Brent Kung Adder Transistor Level Design: XOR GATE
  • 30. Brent Kung Adder Transistor Level Design: Inverter Design Optimization
      • NMOS width = 90 nm
      • PMOS / NMOS length = 50 nm
      • Vdd = 1.1 V
      • Current averaged over one period of 2 ns
      • Optimal PMOS width = 165 nm
      • βinverter = 165/90 = 1.833
      • Sizing for NAND, NOR and XOR changed accordingly
  • 31. Brent Kung Adder Transistor Level Design: 1. Input Block with Pre Computation. Logical Effort Design for Signal Chains labeled in previous slide #2.

      Input Adder Block Chain 1 (Stage G = 2.225, Stage F = 36.000, Stage B = 6.943, Stage H = 556.248, Gate H = 3.540)
      Gate Number   1        2          3        4          5          LOAD
      Gate Name     BUFFER   INVERTER   NOR      INVERTER   NAND
      g value       1.000    1.000      1.646    1.000      1.352      36.000
      f value       3.540    3.540      2.151    3.540      2.618648
      b value       2.893    2.400      1.000    1.000      1.000      1.000
      S value       1.000    1.224      1.097    3.883      10.16831   36.000

      Input Adder Block Chain 2 (Stage G = 2.451, Stage F = 13.748, Stage B = 12.359, Stage H = 416.510, Gate H = 4.518)
      Gate Number   1        2          3        4          LOAD
      Gate Name     BUFFER   INVERTER   XOR      NAND
      g value       1.000    1.000      1.893    1.295      13.748
      f value       4.518    4.518      2.386    3.488
      b value       2.893    2.400      1.780    1.000      1.000
      S value       1.000    1.562      1.553    3.043      13.748

      Input Adder Block Chain 3 (Stage G = 1.646, Stage F = 3.941, Stage B = 6.943, Stage H = 45.038, Gate H = 3.558)
      Gate Number   1        2          3        LOAD
      Gate Name     BUFFER   INVERTER   NOR
      g value       1.000    1.000      1.646    3.941
      f value       3.558    3.558      2.162
      b value       2.893    2.400      1.000
      S value       1.000    1.230      1.108    3.941

      Input Adder Block Chain 4 (Stage G = 2.451, Stage F = 40.000, Stage B = 6.943, Stage H = 680.832, Gate H = 3.686)
      Gate Number   1        2          3        4          5          LOAD
      Gate Name     BUFFER   INVERTER   XOR      NAND       INVERTER
      g value       1.000    1.000      1.893    1.295      1.000      40.000
      f value       3.686    3.686      1.947    2.847      3.686447
      b value       2.893    2.400      1.000    1.000      1.000      1.000
      S value       1.000    1.274      1.034    2.943      10.85056   40.000
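The sizes in the Chain 1 table can be reproduced with a short calculation. The sketch below is not the author's tool; it assumes the usual logical effort relations in the notation the slide appears to use (path effort H = G·F·B, gate effort h = H^(1/N), and size S(i+1) = S(i)·h / (b(i)·g(i+1))), with the first gate at unit size. The function name `size_chain` and its parameters are hypothetical.

```python
from math import prod

def size_chain(g, b, load):
    """Size a gate chain by logical effort (illustrative sketch).

    g    -- logical effort of each gate, first gate first
    b    -- branching effort at the output of each gate
    load -- load capacitance in units of a unit inverter's input capacitance
    Assumes the first gate has size 1 (hypothetical convention for this sketch).
    """
    n = len(g)
    G = prod(g)              # path logical effort
    B = prod(b)              # path branching effort
    F = load / (g[0] * 1.0)  # path electrical effort: C_load / C_in of first gate
    H = G * F * B            # path effort
    h = H ** (1.0 / n)       # optimal effort per gate
    sizes = [1.0]
    for i in range(n - 1):
        # Each stage bears effort h: h = b[i] * g[i+1] * S[i+1] / S[i]
        sizes.append(sizes[-1] * h / (b[i] * g[i + 1]))
    return h, sizes

# Input Adder Block Chain 1: BUFFER, INVERTER, NOR, INVERTER, NAND driving a 36X load
h, sizes = size_chain(
    g=[1.0, 1.0, 1.646, 1.0, 1.352],
    b=[2.893, 2.4, 1.0, 1.0, 1.0],
    load=36.0,
)
print(round(h, 3), [round(s, 3) for s in sizes])
```

Under these assumptions the result agrees with the S values listed for Chain 1 to about three decimal places; the same call with the g, b and load values of the other chains can be used to check the remaining tables.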
  • 32. Brent Kung Adder Transistor Level Design: 2. Intermediate Dot Product Blocks. Logical Effort Design for Signal Chains labeled in previous slide #3.

      Intermediate Adder Block Chain 1 (Stage G = 1.352, Stage F = 6.000, Stage B = 1.000, Stage H = 8.112, Gate H = 2.848)
      Gate Number   1          2       LOAD
      Gate Name     INVERTER   NAND
      g value       1.000      1.352   1.000
      f value       2.848      2.107   2.848
      b value       1.000      1.000   1.000
      S value       1.000      2.107   6.000

      Intermediate Adder Block Chain 2 (Stage G = 1.352, Stage F = 2.848, Stage B = 2.000, Stage H = 7.701, Gate H = 2.775)
      Gate Number   1          2       LOAD
      Gate Name     BUFFER     NAND
      g value       1.000      1.352   2.848
      f value       2.775      2.053
      b value       2.000      1.000
      S value       1.000      1.026
  • 33. Brent Kung Adder Simulated Performance

      Simulations with maximally sized 1 stage buffers as determined by Logical Effort Design of individual chains:
      Voltage (V)   Delay Max-C14 (ns)   Power Max (mW)   Power-Delay Product (xE-12 J)
      1.1           0.359                6.73             2.41
      0.9           0.503                2.95             1.483
      0.7           0.937                0.924            0.865

      Simulations with minimally sized 1 stage buffers:
      Voltage (V)   Delay Max-C14 (ns)   Power Max (mW)   Power-Delay Product (xE-12 J)
      1.1           0.403                5.186            2.089
      0.9           0.569                2.277            1.295
      0.7           1.069                0.692            0.739

      Without parasitic extraction and interconnect parasitics, buffering doesn't improve performance significantly.
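The power-delay products in the two tables follow directly from the delay and power columns, since ns × mW = 1E-12 J. A minimal sketch of that arithmetic, using only the numbers from the slide (the helper name `pdp_pj` and the variable names are hypothetical):

```python
def pdp_pj(delay_ns, power_mw):
    """Power-delay product in picojoules: ns * mW = 1e-9 s * 1e-3 W = 1e-12 J."""
    return delay_ns * power_mw

# (Vdd in V, delay in ns, power in mW), copied from the slide
max_buffers = [(1.1, 0.359, 6.73), (0.9, 0.503, 2.95), (0.7, 0.937, 0.924)]
min_buffers = [(1.1, 0.403, 5.186), (0.9, 0.569, 2.277), (0.7, 1.069, 0.692)]

for (vdd, d_max, p_max), (_, d_min, p_min) in zip(max_buffers, min_buffers):
    print(f"Vdd={vdd} V: PDP max-sized={pdp_pj(d_max, p_max):.3f} pJ, "
          f"min-sized={pdp_pj(d_min, p_min):.3f} pJ")
```

At every supply voltage the minimally sized buffers give the lower product, which is consistent with the slide's remark that buffering does not help much before parasitics are included.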
  • 34. Brent Kung Adder Worst Case Delay. Input pattern: A: FFFF, B: 0000 -> 0001. Dotted lines show carry bits 15 and 14.
  • 35. Brent Kung Adder Layout: Input Block with Pre Computation. [Figure labels: input inverters for bit 0 and bit 1; XOR; NAND 10X; output buffers.] PEX waveforms show that a larger output buffer size may be needed.
  • 36. Brent Kung Adder Layout: XOR 1.553X
  • 37. Brent Kung Adder Layout: NAND 10.57X. Layout with interdigitated fingers to reduce parasitics.
  • 38. Brent Kung Adder Layout: Intermediate Dot Product Generator. [Figure label: output buffers.] PEX waveforms show that a larger size may be necessary here.
  • 39. Brent Kung Adder Layout: Output Stage with Buffers
  • 40. Brent Kung Adder Layout: Full Layout, 49.5 um x 48.6 um
  • 41. Future Design Modifications
      • The design uses large buffers at the output of every stage to drive large capacitances.
      • The buffers are not needed at nodes with low fanout and can be eliminated.
      • The buffers at the input nodes currently increase power consumption and add to the delay.
      • Overall performance can therefore be improved by using fewer buffers.
  • 42. References
      • Course slides from Prof. Kia Bazargan's course on VLSI
      • David Harris, A Taxonomy of Parallel Prefix Networks (reference paper on course website)
      • Jan Rabaey, Digital Integrated Circuits
  • 43. SRAM DESIGN PROJECT PHASE 2. Nirav Desai, 4280229. VLSI DESIGN 2: Prof. Kia Bazargan. Dept. of ECE, College of Science and Engineering, University of Minnesota, Twin Cities.
  • 44. SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE
      • NMOS inverter = 110 nm, PMOS inverter = 220 nm, NMOS access = 90 nm
      • NMOSinv/NMOSaccess = 1.2, PMOSinv/NMOSaccess = 2.4
      • Cbitline = 0.747 fF for 512 cell array (interconnect parasitics from ASU PTM website)
  • 45. SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE
      • NMOS inverter = 150 nm, PMOS inverter = 555 nm, NMOS access = 180 nm
      • NMOSinv/NMOSaccess = 1.2, PMOSinv/NMOSaccess = 3, Cbitline = 0.747 fF
      • The curve shows the SRAM cell is close to write failure.
      • Precharging the bitline to less than 1.1 V could be explored to increase SNM.
  • 46. Simulation Setup
      • M0, M1, M3, M4 form the cross-coupled inverter pair.
      • M5, M6 are the access transistors.
      • C1, C2 are the bitline capacitances.
      • M7 is the precharge switch for the bitline (bit); V3 precharges the bitline to 0.8 V.
      • V6 precharges bitbar and writes a 0 to the cell.
      • Signals labeled in the schematic: V(write), V(ic), V(word), V(qbar), V(q), V(bitbar), V(bit).
  • 47. Timing Waveforms for Characterization
      • V(write): applied to the source of M7 (precharge switch); precharges Cbit to 0.8 V via M7.
      • V(word): wordline voltage; disables access transistors M5 and M6 during precharge.
      • V(qbar) and V(q): used to generate the butterfly curves.
      • V(ic): enables the precharge switch M7 during precharge. It could be implemented as NOT(V(word)).
      • V(bitbar): precharges to 0.8 V, shows charge pumping when M7 turns off, and follows V(qbar) when the wordline is enabled.
      • V(bit): precharged to Vdd by V6; follows V(q) after the wordline is enabled.
  • 48. PASS TRANSISTOR BASED TREE DESIGN: 1:8 row decoder tree. Similar tree decoder for 16 LSB bits.
  • 49. TREE DECODER DESIGN
  • 50. PASS TRANSISTOR BASED TREE DESIGN. [Figure: pass gate with IN, OUT and CK terminals, W/L = 880/50.]
      • Identical sizing for NMOS and PMOS to minimize charge injection effects
      • Delay drops by ~40ps/2 for every doubling of transistor widths
      • Delay drop saturates at about 89 ps around a width of 1000 nm
      • Used W/L of 880/50 for the final tree
  • 51. TREE DECODER TIMING DIAGRAMS. The following waveforms were applied to the row and column selection inputs of the tree decoder.
  • 52. TREE DECODER TIMING DIAGRAMS. It takes one cycle to initialize the tree decoder, after which we get clean pulses for each row output. The LSB pulse is wider than the MSB pulse in the bottom figure to allow the tree to clear its present state before next
  • 53. TREE DECODER TIMING DIAGRAMS. The top waveform shows the matrix point output where the row and column select inputs are high. The output node discharges when the input goes low.
  • 55. READ WRITE CIRCUIT (design by Bong Jin). [Figure labels: sense amplifier, write driver, precharge circuit.]
  • 56. READ WRITE CIRCUIT TEST SETUP. [Figure labels: bitline capacitance estimate from ASU PTM website (Cbit estimate for 512 rows); NMOS switches to allow read without disabling the write circuit; single SRAM cell for simulations.]
  • 57. READ / WRITE TIMING WAVEFORMS. Signals shown: precharge pulse (active low); data meant to be written to the cell; write enable pulse; read enable pulse; output of write buffer; disable output buffer (tristate logic); bitline; bitline bar; data output; data out bar.
  • 58. SRAM Cell Layout
  • 59. 2X2 SRAM Array Layout. [Figure labels: VDD, GND, GND, WORD 1, WORD 0, B0, B0BAR, B1, B1BAR.] This unit can be replicated in all directions without any changes. LVS check remaining. Array size = 3.7975 um x 2.4725 um.
  • 60. References
      • Jan Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits (SRAM cell design, decoders, read/write circuits)
      • Weste and Harris, CMOS VLSI Design (butterfly curves)
      • Baker, Li, Boyce, CMOS Circuit Design, Layout and Simulation (decoder design)
      • Course slides of Prof. Kia Bazargan (precharge techniques, decoders, SRAM cell design)
  • 61. System diagram for developing the LMS algorithm for channel estimation (H(z)). Errors e1 and e2 (e2 being the quantized error) could have the same convergence if the channel model H(z) is adapted using an LMS model. The next few slides show regular LMS and modified LMS error convergence. Adaptive DSP Course by Prof. Keshab Parhi.
  • 62. Error convergence takes longer for regular LMS than for the modified LMS.
  • 63. Modified LMS adapts all tap weights using different errors, computed from as many filter output estimates as the filter order. The assumption is that the optimum gradient direction for each tap weight is different and is given by the corresponding error. Lattice predictors would be a more efficient way to do this than LMS, since each stage of a predictor is optimum for that order, unlike modified LMS, where each tap weight is adapted in a suboptimal manner.
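For reference, the regular LMS baseline mentioned in these slides can be sketched as follows. This is the textbook update w ← w + μ·e·x applied to noiseless identification of an FIR channel; it is not the course project's code and it does not implement the modified LMS or the quantized error e2. The channel taps, step size, and the name `lms_identify` are assumptions made for illustration.

```python
import random

def lms_identify(channel, n_samples=2000, mu=0.1, seed=0):
    """Estimate FIR channel taps with the standard LMS update (illustrative sketch)."""
    rng = random.Random(seed)
    n_taps = len(channel)
    w = [0.0] * n_taps        # adaptive model of H(z)
    x_hist = [0.0] * n_taps   # most recent input samples, newest first
    errors = []
    for _ in range(n_samples):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        d = sum(hk * xk for hk, xk in zip(channel, x_hist))  # channel output
        y = sum(wk * xk for wk, xk in zip(w, x_hist))        # model output
        e = d - y                                            # estimation error
        w = [wk + mu * e * xk for wk, xk in zip(w, x_hist)]  # LMS tap update
        errors.append(e)
    return w, errors

# Hypothetical 3-tap channel used only for this example
weights, errors = lms_identify([0.5, -0.3, 0.1])
print([round(wk, 4) for wk in weights])
```

Plotting `errors` against the sample index gives the kind of convergence curve the slides compare; a modified scheme would replace the single error `e` with one error per tap.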
  • 64. EEG spectral estimates for pre-ictal, ictal and post-ictal signal sequences.
  • 65. Spectral estimation for a low pass filtered impulse sequence using different techniques.
  • 66. Correlograms provide the best spectral estimates for low pass filtered impulse trains.
  • 67. EE 5364 / CS 5204: Advanced Computer Architecture. Final Course Project on Design of a Branch Predictor. Prepared by: Nirav Desai 4280229 (ECE), Amanda Skinner 3749048 (CS). Course Instructor: Prof. Pen-Chung Yew, Department of ECE, University of Minnesota, Twin Cities.
  • 68. Why Branch Predictor?
      • Branch predictors improve the flow of the instruction pipeline.
      • As branch predictor accuracy increases, cache misses decrease for both data and instruction caches.
  • 69. Why Branch Predictor?
  • 70. Why Prefetching? [4]
      • As branch predictor accuracy increases, cache misses go down.
      • Prefetching and increasing cache size decrease cache misses.
      • [Figure: miss rate for the Mesa benchmark. Both the L1-Data and L2 cache associativities were changed.]
  • 71. Reference Prediction Table [1]
      • LA-PC runs ahead of PC and keeps track of load and store instructions.
      • RPT keeps track of previous reference addresses and strides for load and store instructions.
      • L2 cache prefetching can be done by storing spill-over data and instructions from L1 cache blocks.
      • INTEL CORE 2 Duo uses RPT for L1 cache prefetching and a loop counter local branch predictor.
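The idea of an RPT that remembers the previous address and stride of each load or store can be sketched as below. This is a deliberately simplified model, assuming one entry per instruction address and a prefetch only when the same non-zero stride is seen twice in a row; the class and method names are hypothetical, and the state machine of a full RPT is omitted.

```python
class ReferencePredictionTable:
    """Simplified stride table keyed by the load/store instruction address."""

    def __init__(self):
        self.entries = {}  # instruction PC -> (previous data address, last stride)

    def access(self, pc, addr):
        """Record one memory reference; return an address to prefetch, or None."""
        entry = self.entries.get(pc)
        if entry is None:
            self.entries[pc] = (addr, 0)   # first time this instruction is seen
            return None
        prev_addr, prev_stride = entry
        stride = addr - prev_addr
        self.entries[pc] = (addr, stride)
        if stride != 0 and stride == prev_stride:
            return addr + stride           # stride confirmed: predict next reference
        return None

rpt = ReferencePredictionTable()
for a in (100, 108, 116, 124):
    print(a, rpt.access(pc=0x400, addr=a))
```

A load walking an array with a constant stride starts producing prefetch addresses on its third reference under this scheme.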
  • 72. Design of Branch Predictor
      • A loop counter would give high accuracy on matrix multiplication.
      • Track all registers for the loop counter, since different interleaved threads may use different registers.
      • A loop counter error would imply a dynamic update of registers based on non-local values.
      • Tag registers giving repeated conditional branch errors in the Branch Decision Table.
      • Use the O-GEHL predictor for all tagged branches.
      • Using the loop counter and duplicate ALU will allow indexing long histories with limited geometric length.
  • 73. Branch Decision Table
      Table columns and the unit that fills each one:
      • Branch Address: entered by LA-PC
      • Predicted Direction: entered by Loop Counter or O-GEHL
      • Predicted Branch Target: entered by Duplicate ALU
      • Actual Direction: entered by PC
      • Actual Branch Target: entered by PC
      • Counters Used C(i)(j): entered by O-GEHL
      • Tag: entered by O-GEHL
      If prediction != actual decision:
      • Prediction computed by Loop Counter? Yes: incorrect duplicate register values. Re-initialize the duplicate register stack and set LA-PC to PC. After 2 successive errors, make an entry in O-GEHL and tag the branch address in the Branch Decision Table to be used with O-GEHL.
      • Prediction computed by O-GEHL? Yes: run the update equation on the counters listed in the table and set LA-PC to PC.
  • 74. Loop Counter Branch Predictor
      • Triggered when Op-Code = 4 (beq) or Op-Code = 5 (bne). The loop counter looks at only the conditional branches; it can be extended to bgtz, blez.
      • Instruction fields: Op-Code is bits 31:26, rs is bits 25:21, rt is bits 20:16.
      • Duplicate Register Flag == 0? Yes: this is the first conditional branch, so copy the register stack to the duplicate register stack (equivalent to initializing the duplicate register stack). No: the duplicate register stack is already initialized.
      • Set the register flag for rs and rt to 1. These registers will be tracked by the Duplicate ALU, which does addition and subtraction for all instructions having rs and rt with register flags set to 1.
      • Proceed to the branch prediction computation: for Op-Code 4 test rs == rt, for Op-Code 5 test rs != rt.
      • If the test succeeds, execute the branch: copy the offset from bits 15 to 0, sign extend it to bit 31 (32 bits in total), left shift by 2 (converting the word offset to a byte offset), and add it to PC+4 to get the branch target address.
      • Otherwise, increment LA-PC by 4.
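The field extraction and target computation described for the loop counter can be sketched in a few lines. The bit positions and the beq/bne conditions come from the slide and the MIPS I-type format; the function names and the example instruction words are hypothetical, and register values are passed in directly rather than read from a duplicate register stack.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS I-type instruction word into (opcode, rs, rt, imm16)."""
    opcode = (instr >> 26) & 0x3F   # bits 31:26
    rs = (instr >> 21) & 0x1F       # bits 25:21
    rt = (instr >> 16) & 0x1F       # bits 20:16
    imm16 = instr & 0xFFFF          # bits 15:0
    return opcode, rs, rt, imm16

def sign_extend16(imm16):
    """Sign extend a 16-bit field to a Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc, instr):
    """PC + 4 + (sign-extended offset << 2), wrapped to 32 bits."""
    _, _, _, imm16 = decode_fields(instr)
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

def predict_taken(opcode, rs_value, rt_value):
    """Branch outcome from the tracked (duplicate) register values."""
    if opcode == 4:                 # beq
        return rs_value == rt_value
    if opcode == 5:                 # bne
        return rs_value != rt_value
    raise ValueError("not a beq/bne instruction")

# beq $1, $2, -1 at a hypothetical PC: the branch targets its own address
print(hex(branch_target(0x00400010, 0x1022FFFF)))
```

When `predict_taken` returns False the look-ahead PC simply advances by 4, as in the flowchart.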
  • 75. O-GEHL Branch Predictor [2]
      • Figure: ten tables of counters C(i)(j), one table per history length: C(1)(1) to C(1)(2), C(2)(1) to C(2)(4), C(3)(1) to C(3)(9), and so on up to C(10)(1) to C(10)(266).
      • History lengths go in geometric progression, given by L(i) = α^(i-1) L(1) + constant.
      • Best series found from experiments: 2, 4, 9, 12, 18, 31, 54, 114, 145, 266.
      • Dynamic history length fitting with variable α is also possible.
      • Sum = C(i)(j) + C(i+1)(k) + … + C(i+9)(l)
      • j, k, l, ... are incremented on every unconditional branch.
      • j increments are modulo 2, k increments are modulo 4, l increments are modulo 266.
      • Each C(i)(j) is a 4-bit saturating counter that counts from -8 to 7.
      • Counter update: if (p != out), then C(i)(j)++ if the branch was taken and C(i)(j)-- if the branch was not taken.
      • Threshold θ is 0 by default: if Sum > θ then p = taken; if Sum < θ then p = not taken.
      • Dynamic threshold (θ) fitting is possible.
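The counter scheme described on this slide can be sketched as follows. This is a simplified illustration of the slide's description only, not Seznec's full O-GEHL predictor (which indexes its tables by hashing the branch address with the global history). The class name and method names are hypothetical, and treating Sum == θ as "not taken" is an assumption, since the slide leaves that case undefined.

```python
HISTORY_LENGTHS = [2, 4, 9, 12, 18, 31, 54, 114, 145, 266]  # series from the slide
COUNTER_MIN, COUNTER_MAX = -8, 7                            # 4-bit saturating range


class SlideGehlSketch:
    """Ten counter tables; table i holds HISTORY_LENGTHS[i] counters."""

    def __init__(self, theta=0):
        self.theta = theta
        self.tables = [[0] * n for n in HISTORY_LENGTHS]
        self.idx = [0] * len(HISTORY_LENGTHS)   # the j, k, ..., l indices

    def predict(self):
        """Sum one counter per table; predict taken if the sum exceeds theta."""
        total = sum(t[j] for t, j in zip(self.tables, self.idx))
        return total > self.theta   # assumption: a tie counts as not taken

    def update(self, predicted, taken):
        """On a misprediction, nudge the selected counters toward the outcome."""
        if predicted == taken:
            return
        for t, j in zip(self.tables, self.idx):
            if taken:
                t[j] = min(COUNTER_MAX, t[j] + 1)
            else:
                t[j] = max(COUNTER_MIN, t[j] - 1)

    def advance(self):
        """Increment each index modulo its table size."""
        self.idx = [(j + 1) % n for j, n in zip(self.idx, HISTORY_LENGTHS)]
```

A typical cycle would call `predict()`, then `update(predicted, taken)` once the actual direction is known, and `advance()` at whatever event the design uses to step the indices.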
  • 76. Duplicate ALU (for MIPS) [3]
      • Datapath: LA-PC address, instruction, Duplicate Instruction Queue, Decode Unit.
      • Decoded fields: Op Code (bits 31-26), Reg 1 (bits 25-21), Reg 2 (bits 20-16), Reg 3 (bits 15-11).
      • Compare Op-Code:
          • Op-Code == 4 or 5 (beq, bne): use the Loop Counter.
          • Op-Code == 2 or 3 (jump, jal): always take.
          • Op-Code == 0 and FUNCT == 8 or 9 (jr, jalr): always take.
      • Branch target for jump (32 bits): bits 31:28 are the 4 MSBs of the current PC+4, bits 27:2 are the jump target from the instruction, and bits 1:0 are 00 (word addresses).
      • Branch target for branch (32 bits): current PC + 4 + bits 15:0 left shifted by 2 to give word addresses.
      • Compare register flags for reg1, reg2, reg3. If the register flags are set, do the computation for:
          • Op-Code 0, bits 5:0 = 32: add r1, r2, r3
          • Op-Code 0, bits 5:0 = 34: sub r1, r2, r3
          • Op-Code 0, bits 5:0 = 33: addu r1, r2, r3
          • Op-Code 0, bits 5:0 = 35: subu r1, r2, r3
          • Op-Code 8: addi r1, constant
          • Op-Code 9: addiu r1, constant
      • Set the LA-PC busy bit on instruction read.
      • When LA-PC is updated by the branch predictors, the busy bit is reset.
      • For arithmetic, reset the busy bit after 2 cycles.
      • An instruction is read when the busy bit is reset.
      • LA-PC is different from that used in RPT.
      • This branch predictor can be used on multi-threaded CPUs.
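The op-code comparison and jump target assembly on this slide can be sketched in Python. This is a minimal illustration, assuming 32-bit unsigned addresses; the function names and the returned labels are hypothetical and only mirror the decisions listed above.

```python
def classify(instr):
    """Decide how the decode unit treats an instruction, per the slide."""
    opcode = (instr >> 26) & 0x3F   # bits 31:26
    funct = instr & 0x3F            # bits 5:0, meaningful when op-code is 0
    if opcode in (4, 5):            # beq, bne
        return "loop_counter"
    if opcode in (2, 3):            # jump, jal
        return "always_taken"
    if opcode == 0 and funct in (8, 9):   # jr, jalr
        return "always_taken"
    return "not_a_branch"


def jump_target(pc, instr):
    """Assemble the 32-bit jump target from PC+4 and the 26-bit target field."""
    upper = (pc + 4) & 0xF0000000   # bits 31:28 come from the current PC+4
    target26 = instr & 0x03FFFFFF   # 26-bit jump target from the instruction
    return upper | (target26 << 2)  # bits 1:0 end up as 00
```

For jr and jalr the target comes from a register rather than the instruction, so `jump_target` applies only to op-codes 2 and 3.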
  • 77. Test results on O-GEHL Branch Predictor [5]
  • 78. References
      1. Jean-Loup Baer, Tien-Fu Chen (Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195). An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty. Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing.
      2. Andre Seznec. The O-GEHL Branch Predictor. The 1st JILP Championship Branch Prediction Competition (CBP1), 2004. Available from www.jilp.org
      3. David Patterson and John Hennessy. Computer Organization and Design: The Hardware/Software Interface.
      4. http://en.wikipedia.org/wiki/CPU_cache
      5. Andre Seznec. Analysis of the Optimized GEHL Predictor. Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf
  • 79. Research Ideas I am working on right now
  • 80. Strained Silicon on SiGe Solar Cell
      • Requires Chemical Vapor Deposition or MBE techniques for fabrication.
      • Completed a short-term course on Semiconductor Technology and Manufacturing at IIT Bombay in November 2012 to learn about these techniques.
      • A tandem solar cell design with different band gaps gives a wide band of absorbable frequencies.
      • An optimal thickness of a quarter wavelength will give maximum absorption at the designed frequency.
      • Back plate metal contacts and top plate fingered contacts.
      • Economically viable for charging battery packs in electric vehicles and for replacing LPG cooking gas cylinders.
      • Long-term viability for power generation is feasible due to low operating costs and low distribution costs in a distributed model.
      • Reference: Usami, N. (Inst. for Mater. Res., Tohoku Univ., Sendai, Japan); Takahashi, T.; Fujiwara, K.; Ujihara, T.; Sazaki, G.; Murakami, Y.; Nakajima, K. Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high conversion efficiency. Conference Record of the Twenty-Ninth IEEE Photovoltaic Specialists Conference, 19-24 May 2002, pages 247 - 249.
  • 81. Rake Receiver with MDS Codes
      • Rake receivers could be used to identify the strongest multipath component of a received signal.
      • This could be achieved by correlating the received signal with itself over different delays and finding the strongest delay component.
      • This does not involve maximal ratio combining.
      • It could be combined with MDS codes for wireless communications, where the signal could still be recovered uniquely given any d bits corrupted by channel noise or multipath effects.
      • Reference: Lectures of Prof. Cutter on iTunesU under the course on Digital Communications 2 taught at MIT.
      • Reference: W-CDMA Rake Receiver implementation in DSP, EE Times: http://www.eetimes.com/electronics-news/4139933/W-CDMA-RAKE-Receiver-Comes-to-Life-in-DSP
      • Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1044&context=ojii_volumes
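The delay search described on this slide (correlating the received signal with itself over different delays and picking the strongest one) can be sketched in Python. This is a toy illustration under stated assumptions: a noiseless two-path channel, a random ±1 chip sequence as the transmitted signal, and hypothetical function names. It finds the strongest non-zero-lag autocorrelation peak and does no combining.

```python
import random


def autocorrelation(signal, lag):
    """Correlate the signal with a copy of itself delayed by `lag` samples."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))


def strongest_delay(signal, max_lag):
    """Return the non-zero lag with the largest autocorrelation magnitude."""
    return max(range(1, max_lag + 1),
               key=lambda lag: abs(autocorrelation(signal, lag)))


def two_path_signal(length, delay, gain, seed=0):
    """Toy received signal: direct path plus one delayed, attenuated echo."""
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in range(length)]
    return [chips[n] + (gain * chips[n - delay] if n >= delay else 0.0)
            for n in range(length)]
```

For example, `strongest_delay(two_path_signal(2000, 5, 0.6), 10)` searches lags 1 to 10 of a signal whose echo was placed 5 samples behind the direct path.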
  • 82. Class S RF Power Amplifiers on GaN HEMTs
      • Class S RF power amplifiers with a fully differential H-bridge topology could give a theoretical efficiency of 100%.
      • GaN HEMTs give the best high-frequency switching characteristics.
      • The two features could be combined to give a high-efficiency RF power amplifier topology.
      • Undergraduate project on Class S Audio Amplifier Design.
      • Reference: Ph.D. dissertation of Stephan Maroldt, University of Freiburg.
      • Reference: Device Evaluation for Current Mode Class D RF Power Amplifiers with high output power and efficiency. Thesis of Thomas Dellsperger. http://www.ece.ucsb.edu/rad/pubs/master/tdellsperger_2003.pdf
      • Reference: High linearity and high efficiency Class B RF Power Amplifiers in GaN HEMTs. http://www.ece.ucsb.edu/faculty/rodwell/publications_and_presentations/publications/239.pdf
  • 83. Microprocessor Design
      • The attached slides describe the design of a 16-bit Brent Kung adder and a 1024x16 asynchronous SRAM in 45 nm CMOS, along with the design of a branch predictor and cache prefetch unit for a MIPS microprocessor.
      • These design ideas could be combined with other ideas for pipeline design, ALU design and interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45 nm CMOS.
      • Various power reduction and clock gating techniques could be applied at a higher level of the hierarchy.
      • Clock gating could be done at a coarse level, such as not clocking a core that is not being used, or at a fine level, where individual modules that are not being used are not clocked. In a deeply pipelined design, the divider need not be clocked if only multiply and accumulate operations are being carried out.
      • Clock gating could be done using higher power FETs.
      • Reference for clock gating: Clock Tree Power Optimization based on RTL clock gating: http://dl.acm.org/citation.cfm?id=775989
      • Attended tutorials at the VLSI Design Conference in 2013 to learn more about these techniques.
  • 84. mm-wave MIMO OFDM
      • mm-wave MIMO OFDM could be used for wireless backhaul networks due to its high capacity.
      • mm-wave MIMO systems could be extended to 2x2, 4x4, 8x8, etc. topologies to exploit spatial diversity and get a higher data rate.
      • Reference: Sheldon, C. (Dept. of Electr. & Comput. Eng., Univ. of California, Santa Barbara, CA, USA); Munkyo Seo; Torkildson, E.; Rodwell, M.; Madhow, U. 4 channel spatial multiplexing over a mm-wave line of sight link. IEEE MTT-S International Microwave Symposium Digest, 2009 (MTT '09), 7-12 June 2009, pages 389 - 392.
  • 85. Routing algorithm to reduce congestion
      • The routing algorithm to reduce congestion could be based on the idea of sparsity.
      • High-congestion nodes could be dropped from the network map until congestion on the node drops.
      • The underlying packet streams would use a flow control based routing protocol.
      • Each node would store a map of the network, which would be updated periodically using ping-back messages.
      • Could be applied to packet switched networks, traffic control and wireless sensor networks.
      • Reference: Flow control based routers developed by Anagran.
      • Reference: Ad hoc On-Demand Distance Vector (AODV) based algorithms treat packets as flows by leaving backward pointers to subsequent packets in the chain at each router node. http://www.cs.ucsb.edu/~ebelding/txt/wmcsa99.pdf
  • 86. Photonic Computers
      • These could use multiplexer-based logic gates.
      • Photonic multiplexers have been widely researched and developed for optical communications.
      • Phase detectors could be used to identify the phase, and thus the value, of the stored signal.
      • These would use electronic charge storage and high-speed electro-optic conversion.
      • Reference: Prior research on this has been carried out at UCSB.

Editor's Notes

  1. Department of ECE, University of Minnesota