3. Nirav A. Desai desai.nirav.12.09@gmail.com3
MM-Wave Active Sensor: BPSK Spectrum can be seen in the Spectrum Analyzer
Work on the design done
together by:
Nirav Desai, Munkyo Seo,
Colin Sheldon, Mark Rodwell
4. Nirav A. Desai desai.nirav.12.09@gmail.com4
I assisted in these mm-wave MIMO
experiments at UCSB
24. Nirav A. Desai desai.nirav.12.09@gmail.com24
EE 5323: VLSI DESIGN 1 PROJECT
Course Instructor: Prof. Chris Kim
16-bit BRENT KUNG ADDER DESIGN in 45nM CMOS
Nirav Desai
ID: 4280229
Department of Electrical and Computer Engineering
University of Minnesota
28. Nirav A. Desai desai.nirav.12.09@gmail.com28
Brent Kung Adder Gate Level Diagram
3. Output Block for Post Computation
1.182X
1.117X
Ci-1
Pi
Output Buffers to drive
Capacitive Loads
Si
29. Nirav A. Desai desai.nirav.12.09@gmail.com29
Brent Kung Adder Transistor Level Design
XOR GATE
30. Nirav A. Desai desai.nirav.12.09@gmail.com30
Brent Kung Adder Transistor Level Design
Inverter Design Optimization
• NMOS Width = 90nm
• PMOS / NMOS Length = 50nM
• Vdd = 1.1V
• Current Averaged Over
One Period of 2 ns
• Optimal PMOS Width = 165nM
• βinverter = 165/90 = 1.834
• Sizing for NAND, NOR and XOR
Changed appropriately
31. Nirav A. Desai desai.nirav.12.09@gmail.com31
Brent Kung Adder Transistor Level Design
1. Input Block with Pre Computation
Input Adder Block Chain 1
Gate Number 1.000 2.000 3.000 4.000 5.000 Stage G Stage F Stage B Stage H Gate H
Gate Name BUFFER INVERTER NOR INVERTER NAND LOAD h
g value 1.000 1.000 1.646 1.000 1.352 36.000 2.225 36.000 6.943 556.248 3.540
f value 3.540 3.540 2.151 3.540 2.618648
b value 2.893 2.400 1.000 1.000 1.000 1.000
S Value 1.000 1.224 1.097 3.883 10.16831 36.000
Input Adder Block Chain 2
Gate Number 1.000 2.000 3.000 4.000 Stage G Stage F Stage B Stage H Gate H
Gate Name BUFFER INVERTER XOR NAND LOAD h
g value 1.000 1.000 1.893 1.295 13.748 2.451 13.748 12.359 416.510 4.518
f value 4.518 4.518 2.386 3.488
b value 2.893 2.400 1.780 1.000 1.000
S Value 1.000 1.562 1.553 3.043 13.748
Input Adder Block Chain 3
Gate Number 1.000 2.000 3.000 Stage G Stage F Stage B Stage H Gate H
Gate Name BUFFER INVERTER NOR LOAD h
g value 1.000 1.000 1.646 3.941 1.646 3.941 6.943 45.038 3.558
f value 3.558 3.558 2.162
b value 2.893 2.400 1.000
S Value 1.000 1.230 1.108 3.941
Input Adder Block Chain 4
Gate Number 1.000 2.000 3.000 4.000 5.000 Stage G Stage F Stage B Stage H Gate H
Gate Name BUFFER INVERTER XOR NAND INVERTER LOAD h
g value 1.000 1.000 1.893 1.295 1.000 40.000 2.451 40.000 6.943 680.832 3.686
f value 3.686 3.686 1.947 2.847 3.686447
b value 2.893 2.400 1.000 1.000 1.000 1.000
S Value 1.000 1.274 1.034 2.943 10.85056 40.000
3.94084
Logical Effort Design for Signal
Chains labeled in previous slide #2
32. Nirav A. Desai desai.nirav.12.09@gmail.com32
Brent Kung Adder Transistor Level Design
2. Intermediate Dot Product Blocks
Logical Effort Design for Signal
Chains labeled in previous slide #3
Intermediate Adder Block Chain 1
Gate Number 1.000 2.000 Stage G Stage F Stage B Stage H Gate H
Gate Name INVERTER NAND LOAD h
g value 1.000 1.352 1.000 1.352 6.000 1.000 8.112 2.848
f value 2.848 2.107 2.848
b value 1.000 1.000 1.000
S Value 1.000 2.107 6.000
Intermediate Adder Block Chain 2
Gate Number 1.000 2.000 Stage G Stage F Stage B Stage H Gate H
Gate Name BUFFER NAND LOAD h
g value 1.000 1.352 2.848 1.352 2.848 2.000 7.701 2.775
f value 2.775 2.053
b value 2.000 1.000
S Value 1.000 1.026
33. Nirav A. Desai desai.nirav.12.09@gmail.com33
Brent Kung Adder Simulated Performance
Voltage (V) Delay Max-C14
(nS)
Power Max
(mW)
Power-Delay
Product (xE-12)
1.1 0.359 6.73 2.41
0.9 0.503 2.95 1.483
0.7 0.937 0.924 0.865
Simulations with maximally sized 1 stage buffers as determined by Logical Effort Design
of individual chains
Voltage (V) Delay Max-C14
(nS)
Power Max
(mW)
Power-Delay
Product (xE-12)
1.1 0.403 5.186 2.089
0.9 0.569 2.277 1.295
0.7 1.069 0.692 0.739
Simulations with minimally sized 1 stage buffers
Without Parasitic Extraction and Interconnect Parasitics buffering doesn’t improve performance significantly.
34. Nirav A. Desai desai.nirav.12.09@gmail.com34
Brent Kung Adder Worst Case Delay
Input Pattern: A: FFFF B: 0000 -> 0001
Dotted Lines show Carry Bits 15 and 14
Carry Bit 15 Carry Bit 14
35. Nirav A. Desai desai.nirav.12.09@gmail.com35
Brent Kung Adder Layout
Input Block with Pre Computation
Input Inverters for Bit 0 and Bit 1
Output Buffers
PEX waveforms show
larger size may be needed
XOR
NAND
10X
36. Nirav A. Desai desai.nirav.12.09@gmail.com36
Brent Kung Adder Layout
XOR 1.553X
37. Nirav A. Desai desai.nirav.12.09@gmail.com37
Brent Kung Adder Layout
NAND 10.57X Layout with inter digitated fingers to reduce parasitics
38. Nirav A. Desai desai.nirav.12.09@gmail.com38
Brent Kung Adder Layout
Intermediate Dot Product Generator
Output Buffers
PEX Waveforms
show larger
Size may be necessary
here
39. Nirav A. Desai desai.nirav.12.09@gmail.com39
Brent Kung Adder Layout
Output Stage with Buffers
40. Nirav A. Desai desai.nirav.12.09@gmail.com40
Brent Kung Adder Layout
Full Layout: 49.5um X 48.6um
41. Nirav A. Desai desai.nirav.12.09@gmail.com41
Future Design Modifications
• The design uses large buffers at the output of every
stage to drive large capacitances
• The buffers are not needed at nodes with low fanouts
and can be eliminated.
• The buffers at input nodes right now cause more power
consumption and add to the delay .
• Thus the overall performance can be improved with fewer buffers.
42. Nirav A. Desai desai.nirav.12.09@gmail.com42
References:
Course Slides from Prof. Kia Bazargan’s Course
on VLSI
A Taxonomy of Parallel Prefix Networks
(David Harris ) – Reference paper on course
website
Digital Integrated Circuits by Jan Rabaey
43. Nirav A. Desai desai.nirav.12.09@gmail.com43
SRAM DESIGN PROJECT PHASE 2
Nirav Desai
4280229
VLSI DESIGN 2: Prof. Kia Bazargan
Dept. of ECE
College of Science and Engineering
University of Minnesota, Twin Cities
43
University of Minnesota
44. Nirav A. Desai desai.nirav.12.09@gmail.com44
SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE
•NMOS inverter = 110nM PMOS inverter = 220nM NMOS Access = 90nM
•NMOSinv/NMOSaccess = 1.2 PMOSinv/NMOSaccess=2.4
•Cbitline = 0.747fF for 512 cell array ( Interconnect Parasitics from ASU PTM Website )
University of Minnesota
45. Nirav A. Desai desai.nirav.12.09@gmail.com45
SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE
•NMOS inverter = 150nM PMOS inverter = 555nM NMOS Access = 180nM
•NMOSinv/NMOSaccess = 1.2 PMOSinv/NMOSaccess = 3 Cbitline = 0.747fF
•Curve shows SRAM cell is close to write failure.
•Bitline Precharge to less than 1.1V could be explored to increase SNM.
University of Minnesota
46. Nirav A. Desai desai.nirav.12.09@gmail.com46
Simulation Setup
• M0,M1,M3,M4 form the cross coupled inverter pair
• M5,M6 are access transistors
• C1, C2 is the bitline capacitance
• M7 is the precharge switch for bitline ( bit ) - V3 precharges the bitline to 0.8V
• V6 precharges bitbar and writes a 0 to the cell
V(write)
V(ic) V(word)
V(qbar)
V(q)
V(bitbar)V(bit)
University of Minnesota
47. Nirav A. Desai desai.nirav.12.09@gmail.com47
Timing Waveforms for Characterization
V(write) – Applied to source of M7 (precharge switch)
V(word) – Wordline Voltage
V(qbar)
V(q)
V(ic) – Enables the precharge switch M7
V(bitbar)
V(bit)
• V(write) precharges Cbit to 0.8V via M7
• V(word) disables access transistors
M5 and M6 during precharge .
• V(qbar) and V(q) are used to generate
the butterfly curves.
• V(ic) enables M7 during precharge
It could be implemented as
NOT(V(word)).
• V(bitbar) precharges to 0.8V, shows
charge pumping when M7 turns off and
follows V(qbar) when wordline is
enabled.
• V(bit) follows V(q) after word line is
enabled.
• V(bit) precharged to Vdd by V6
University of Minnesota
48. Nirav A. Desai desai.nirav.12.09@gmail.com48
PASS TRANSISTOR BASED TREE DESIGN
1:8 Row Decoder Tree
Similar Tree Decoder for 16 LSB Bits
University of Minnesota
49. Nirav A. Desai desai.nirav.12.09@gmail.com49
TREE DECODER DESIGN
50. Nirav A. Desai desai.nirav.12.09@gmail.com50
PASS TRANSISTOR BASED TREE DESIGN
IN OUT
CK
CK
50
880
=
L
W
Identical Sizing for NMOS and PMOS to minimize charge injection effects
• Delay drops by ~40ps/2 for every
Doubling of transistor widths
• Delay drop saturates around
1000nM to 89ps
• Used W/L of 880/50 for final tree
University of Minnesota
51. Nirav A. Desai desai.nirav.12.09@gmail.com51
TREE DECODER TIMING DIAGRAMS
The following waveforms were applied to the row and column selection inputs of the tree decoder
University of Minnesota
52. Nirav A. Desai desai.nirav.12.09@gmail.com52
TREE DECODER TIMING DIAGRAMS
It takes one cycle for initializing
the tree decoder after which we get clean pulses for each row output
LSB pulse is wider than MSB pulse in bottom figure to allow the tree to clear present state before next
University of Minnesota
53. Nirav A. Desai desai.nirav.12.09@gmail.com53
TREE DECODER TIMING DIAGRAMS
The top waveforms shows the matrix point output where the row and column select inputs are high
The output node discharges when the input goes low
University of Minnesota
55. Nirav A. Desai desai.nirav.12.09@gmail.com55
READ WRITE CIRCUIT
( Design by Bong Jin )
Sense Amplifier Write Driver
Precharge Circuit
University of Minnesota
56. Nirav A. Desai desai.nirav.12.09@gmail.com56
READ WRITE CIRCUIT TEST SETUP
Bitline Capacitance estimate from ASU PTM Website
Cbit estimate for 512 rows
NMOS Switches to allow read without disabling write circuit
Single SRAM Cell for
simulations
University of Minnesota
57. Nirav A. Desai desai.nirav.12.09@gmail.com57
READ / WRITE TIMING WAVEFORMS
Precharge Pulse ( Active Low )
Data Meant to be written to cell
Write Enable Pulse
Read Enable Pulse
Output of Write Buffer
Disable output buffer ( tristate logic
Bitline
Bitline Bar
Data Output
Data Out Bar
University of Minnesota
58. Nirav A. Desai desai.nirav.12.09@gmail.com58
SRAM Cell Layout
University of Minnesota
59. Nirav A. Desai desai.nirav.12.09@gmail.com59
2X2 SRAM Array Layout
VDD
GND
GND
WORD 1
WORD 0
B0 B0BAR B1 B1BAR
This unit can be replicated in all directions without any changes. LVS check remaining
Array Size = 3.7975umX2.4725um
University of Minnesota
60. Nirav A. Desai desai.nirav.12.09@gmail.com60
References
Digital Integrated Circuits
Jan Rabaey, Anantha Chandrakasan, Borivoje Nikolic
( SRAM Cell Design, Decoders, Read Write Circuits )
CMOS VLSI Design by Weste and Harris
( Butterfly Curves )
CMOS Circuit Design, Layout and Simulation
Baker, Li, Boyce (Decoder Design)
Course slides of Prof. Kia Bazargan
( Precharge Techniques, Decoders, SRAM Cell Design )
University of Minnesota
61. Nirav A. Desai desai.nirav.12.09@gmail.com61
System Diagram for developing LMS Algorithm for Channel Estimation ( H(z) )
Errors e1 and e2 ( e2 being the Quantized Error ) could have the same convergence
If the channel model H(z) is adapted using a LMS Model
Next few slides show regular LMS and modified LMS Error Convergence
Adaptive DSP Course by Prof. Keshab Parhi
62. Nirav A. Desai desai.nirav.12.09@gmail.com62
Error Convergence for regular LMS takes more time than the modified LMS
Adaptive DSP Course by Prof. Keshab Parhi
63. Nirav A. Desai desai.nirav.12.09@gmail.com63
Modified LMS Adapts all tap weights using different errors computed using as many
filter output estimates as the filter order. The assumption being that the optimum
gradient direction for each tap weight is different and is given by the corresponding error
Lattice Predictors would be a more efficient way to do this as compared to LMS since
each stage of a predictor is optimum for that order unlike modified LMS where you
adapt each tap weight in a sub optimal manner.
Adaptive DSP Course by Prof. Keshab Parhi
64. Nirav A. Desai desai.nirav.12.09@gmail.com64
EEG Spectral Estimates for Pre-Ictal, Ictal and Post-Ictal Signal Sequences
Adaptive DSP Course by Prof. Keshab Parhi
65. Nirav A. Desai desai.nirav.12.09@gmail.com65
Spectral Estimation for a low pass filtered impulse sequence using different techniques
Adaptive DSP Course by Prof. Keshab Parhi
66. Nirav A. Desai desai.nirav.12.09@gmail.com66
Correlograms provide best Spectral Estimates for Low Pass Filtered Impulse Trains
Adaptive DSP Course by Prof. Keshab Parhi
67. Nirav A. Desai desai.nirav.12.09@gmail.com67
EE 5364 / CS 5204:
Advanced Computer Architecture
Final Course Project on
Design of a Branch Predictor
Prepared by:
Nirav Desai 4280229
Amanda Skinner 3749048
Course Instructor: Prof. Pen-Chung Yew
Department of ECE
University of Minnesota, Twin Cities
68. Nirav A. Desai desai.nirav.12.09@gmail.com68Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
Why Branch Predictor?
• Branch Predictors improve the flow of
the instruction pipeline
• As Branch predictor accuracy increases,
cache misses decrease, or improve, for
both data and instruction caches
70. Nirav A. Desai desai.nirav.12.09@gmail.com70Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
• As branch predictor accuracy increases, cache misses go down
• Prefetching and increasing cache size decreases cache misses
Miss Rate for Mesa benchmark. Both the L1-Data and L2 cache
associativities were changed
Why Prefetching ?
[4]
71. Nirav A. Desai desai.nirav.12.09@gmail.com71Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
• LA-PC runs ahead of PC and keeps track of load and store instructions
• RPT keeps track of previous reference addresses and strides for load
and store instructions
• L2 Cache prefetching can be done by storing spill over data and
instructions from L1 Cache blocks.
• INTEL CORE 2 Duo uses RPT for L1 Cache Prefetching and Loop
Counter Local Branch Predictor
Reference Prediction Table[1]
72. Nirav A. Desai desai.nirav.12.09@gmail.com72
• Loop Counter would give high accuracy on matrix multiplication
• Track all registers for loop counter as possibility of different
interleaved threads using different registers
• Loop Counter error would imply dynamic update of registers
based on non-local values
• Tag registers giving repeated conditional branch errors on the
Branch Decision Table
• Use the O-GEHL predictor for all tagged branches
• Using the loop counter and duplicate ALU will allow indexing
long histories with limited geometric length
Design of Branch Predictor
Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
73. Nirav A. Desai desai.nirav.12.09@gmail.com73Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
Branch Decision Table
Branch
Address
Predicted
Direction
Predicted
Branch
Target
Actual
Direction
Actual
Branch
Target
Counters
Used
C(i)(j)
T
a
g
Counters
Used
C(i)(j)
Entered by
LA-PC
Entered by
Loop Counter or
O-GEHL
Entered by
Duplicate
ALU
Entered
by PC
Entered by
PC
Entered
by O-
GEHL
Entered by
O-GEHL
if prediction != actual decision
Prediction computed by Loop Counter ?
Yes - Incorrect Duplicate Register Values
Re-Initialize Duplicate Register Stack
Set LA-PC to PC
After 2 successive errors make an entry in O-GEHL
Also tag the branch address in Branch Decision Table
to be used with O-GEHL
Prediction computed by O-GEHL ?
Yes – Run the update equation on
counters listed in table
Set LA-PC to PC
74. Nirav A. Desai desai.nirav.12.09@gmail.com74Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
Loop Counter Branch Predictor
Op-Code = 4 (beq) OR Op-Code = 5 (bne)
Duplicate Register Flag == 0 ?
Yes No
First Conditional Branch
Copy Register Stack to
Duplicate Register Stack
( Equivalent to initializing
the duplicate register stack)
Duplicate Register Stack Initialized
Set Register Flag for rs and rt = 1
These registers will be tracked by the Duplicate ALU
Proceed to Branch Prediction Computation
rs == rt ? rs != rt ?
Op code == 4 ? Op code == 5 ?
yesno yes noExecute
Copy Off-Set from bits 15 to bit 0
Sign Extend Off Set to bit 31 ( Total 32 bits )
Left Shift by 2 ( to get Word Address )
Add to PC+4 to get Branch Target Address
Inc
LA-PC
By 4
Inc
LA-PC
By 4
Do addition and subtraction for all
instructions having rs and rt with
register flags set to 1
rs – Bits 25:21 rt – Bits: 20:16
The loop counter looks at only
the conditional branches
Can be extended to bgtz, blez
Op-Code:
Bits 31:26
75. Nirav A. Desai desai.nirav.12.09@gmail.com75Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
O-GEHL Branch Predictor[2]
C12()
C11()
C24()
C23()
C22()
C21()
C39()
C38()
C37()
C36()
C35()
C34()
C33()
C32()
C31()
History Lengths go in Geometric Progression given by L(i) = αi-1
L(1) + constant
Best Series found from experiments: 2, 4, 9, 12, 18, 31, 54, 114, 145, 266
Dynamic History length fitting with variable α also possible.
C10266()
C10265()
C101()
Sum = ΣC(i)(j)+C(i+1)(k)+…C(i+9)(l)
• j,k,l .. Are incremented on every
unconditional branch.
• j increments are modulo 2,
k increments are modulo 4,
l increments are modulo 266.
• Each C(i)(j) is a 4 bit saturating counter
that counts -8 to 7.
• Counter Update given by:
if(p!=out)
if(branch==taken) c(i)(j)++
if(branch!=taken) c(i)(j)--
• Dynamic Threshold (θ) Fitting possible
• Threshold(θ) by default is 0.
Sum > θ then p = taken
Sum < θ then p = not taken
76. Nirav A. Desai desai.nirav.12.09@gmail.com76Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
Duplicate ALU ( for MIPS )[3]
LA-PC Address -Instruction
Duplicate
Instruction Queue
Reg 3
Reg 2
Reg 1
Op
Code
31-26
25-21
20-16
15-11
Decode
Unit
Compare
Op-Code
Op-Code == 4 OR 5: (beq, bne) Use Loop Counter
Op-Code == 2 OR 3: (jump, jal) Always take
Op-Code == 0 & FUNCT==8 OR 9: (jr, jalr) Always take
Branch Target for Jump: 32bits: bits 31:28: 4 MSB bits of current PC+4
bits 27:2: Jump Target from instruction
bits 1:0 : 00 ( Word Addresses )
Branch Target for Branch: 32 bits: Current PC + 4 + bits 15:0 left shifted by 2 to give word addresses
Compare Register Flags for reg1, reg2, reg3
If register flags set, do the computation for
Op-Code: 0 bits(5:0) 32: add r1, r2, r3
Op-Code: 0 bits(5:0) 34: sub r1, r2, r3
Op-Code: 0 bits(5:0) 33: addu r1, r2, r3
Op-Code: 0 bits(5:0) 35: subu r1, r2, r3
Op-Code: 8: addi r1, constant
Op-Code: 9: addiu r1, constant
• Set LA-PC Busy bit on instruction read
• When LA-PC updated by branch predictors,
busy bit reset
• For arithmetic, reset busy bit after 2 cycles
• Instruction read when busy bit reset
• LA-PC different from that used in RPT
This branch predictor can be used on Multi Threaded CPUs
77. Nirav A. Desai desai.nirav.12.09@gmail.com77
Test results on O-GEHL Branch
Predictor[5]
Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
78. Nirav A. Desai desai.nirav.12.09@gmail.com78Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS
References
1. An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty
Jean-Loup Baer, Tien-Fu Chen
Department of Computer Science and Engineering,
University of Washington, Seattle, WA 98195
Supercomputing '91 Proceedings of the 1991 ACM/IEEE Conference on Supercomputing
2. The O-GEHL Branch Predictor
Andre Seznec
The 1st JILP Championship Branch Prediction Competition CBP1 (2004)
Available from www.jilp.org
3. Computer Organisation and Design
The Hardware-Software Interface
David Patterson and John Hennessy
4. http://en.wikipedia.org/wiki/CPU_cache
5. Analysis of the Optimized GEHL Predictor
Andre Seznec
Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf
79. Nirav A. Desai desai.nirav.12.09@gmail.com79
Research Ideas I am working on
right now
80. Nirav A. Desai desai.nirav.12.09@gmail.com80
Strained Silicon on SiGe Solar Cell
• Requires Chemical Vapor Deposition or MBE techniques for fabrication
• Completed a short term course on Semiconductor Technology and Manufacturing at IIT Bombay
to learn about these techniques in November 2012.
• Tandem Solar Cell design gives a wide band of absorbable frequencies with different band gaps.
• Optimal thickness at quarter wavelength will give maximum absorption at designed frequency
• Back plate metal contacts and top plate fingered contacts
• Economically viable for charging battery packs in electric vehicles and for replacing LPG cooking
gas cylinders.
• Long term viability for power generation feasible due to low operating costs and low distribution
costs in a distributed model.
• Reference: Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high
conversion efficiency:
Photovoltaic Specialists Conference, 2002. Conference Record of the Twenty-Ninth IEEE
Date of Conference: 19-24 May 2002
Author(s): Usami, N.
Inst. for Mater. Res., Tohoku Univ., Sendai, Japan
Takahashi, T. ; Fujiwara, K. ; Ujihara, T. ; Sazaki, G. ; Murakami, Y. ; Nakajima, K.
Page(s): 247 - 249
81. Nirav A. Desai desai.nirav.12.09@gmail.com81
Rake Receiver with MDS Codes
• Rake receivers could be used to identify strongest multi path component from a received signal.
• This could be achieved by correlating the received signal with itself over different delays and
finding the strongest delay component.
• This does not involve maximal ratio combining.
• It could be combined with MDS codes for wireless communications where given any d bits
corrupted by channel noise or multi path effects, the signal could still be recovered uniquely.
• Reference: Lectures of Prof. Cutter on iTunesU under the course on Digital Communications 2
taught at MIT.
• Reference: W-CDMA Rake Receiver implementation in DSP: EE Times: Link:
http://www.eetimes.com/electronics-news/4139933/W-CDMA-RAKE-Receiver-Comes-to-Life-in-DSP
• Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB
Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?
article=1044&context=ojii_volumes
82. Nirav A. Desai desai.nirav.12.09@gmail.com82
Class S RF Power Amplifiers on
GaN HEMTs
• Class S RF Power Amplifiers with fully differential H-Bridge topology could give a theoretical
100% efficiency.
• GaN HEMTs give the best high frequency switching characteristics.
• The 2 features could be combined to give a high efficiency RF power amplifier topology.
• Under-graduate project on Class S Audio Amplifier Design
• Reference: Ph.D. Dissertation of Stephan Maroldt, University of Freiburg
• Reference: Device Evaluation for Current Mode Class D RF Power Amplifiers with high output
power and efficiency. Thesis of Thomas Dellsperger
http://www.ece.ucsb.edu/rad/pubs/master/tdellsperger_2003.pdf
• Reference: High linearity and high efficiency Class B RF Power Amplifiers in GaN HEMTs
http://www.ece.ucsb.edu/faculty/rodwell/publications_and_presentations/publications/239.pdf
83. Nirav A. Desai desai.nirav.12.09@gmail.com83
Microprocessor Design
• The attached slides describe the design of a 16 bit Brent Kung Adder and 1024x16
asynchronous SRAM in 45 nM CMOS along with the design of a branch predictor and cache
prefetch unit for a MIPS microprocessor.
• These design ideas could be combined with other ideas for pipeline design, ALU design and
interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45nM
CMOS.
• Various power reduction and clock gating techniques could be applied at a higher level of the
hierarchy.
• Clock gating could be done at a coarse level like not clocking a core which is not being used or
at a fine level where the modules not being used are not clocked. In a deeply pipelined design,
the divider need not be clocked if only multiply and accumulate operations are being carried out.
• References for clock gating: Clock Tree Power Optimization based on RTL clock gating:
http://dl.acm.org/citation.cfm?id=775989
• Attended tutorials at the VLSI Design Conference in 2013 to learn more about these techniques.
• Clock gating could be done using higher power FETs.
84. Nirav A. Desai desai.nirav.12.09@gmail.com84
mm-wave MIMO OFDM
• mm-wave MIMO OFDM could be used for wireless backhaul networks due to its high capacity
• mm-wave MIMO systems could be extended to 2x2, 4x4, 8x8, etc topologies to exploit spatial
diversity and get higher data rate.
• Reference:
• 4 channel spatial multiplexing over a mm-wave line of sight link
Microwave Symposium Digest, 2009. MTT '09. IEEE MTT-S International
Date of Conference: 7-12 June 2009
Author(s): Sheldon, C.
Dept. of Electr. & Comput. Eng., Univ. of California, Santa Barbara, CA, USA
Munkyo Seo ; Torkildson, E. ; Rodwell, M. ; Madhow, U.
Page(s): 389 - 392
85. Nirav A. Desai desai.nirav.12.09@gmail.com85
Routing algorithm to reduce
congestion
• The routing algorithm to reduce congestion could be based on the idea of sparsity.
• High congestion nodes could be dropped from the network map till congestion on the node
drops.
• The underlying packet streams would be using a flow control based routing protocol.
• Each node would store a map of the network which would be updated periodically using ping
back messages.
• Could be applied to packet switched networks, traffic control and wireless sensor networks.
• Reference: Flow control based routers developed by Anagran.
• Reference: Ad-Hoc On Demand Distance based algorithms treat packets as flows by leaving
backwards pointers to subsequent packets in the chain at each router nodes.
http://www.cs.ucsb.edu/~ebelding/txt/wmcsa99.pdf
86. Nirav A. Desai desai.nirav.12.09@gmail.com86
Photonic Computers
• These could use multiplexer based logic gates.
• Photonic multiplexers have been widely researched and developed for optical communications.
• Phase detectors could be used to identify the phase and thus the value of the stored signal.
• These would use electronic charge storage and high speed electro-optic conversion.
• Reference: Prior research on this has been carried out in UCSB.
Editor's Notes
Department of ECE University of Minnesota University of Minnesota
Department of ECE University of Minnesota University of Minnesota
Department of ECE University of Minnesota University of Minnesota