WIRELESS COMMUNICATIONS
From Systems to Silicon
Raghu Rao
Wireless Systems Group,
Xilinx Inc.
R. M. Rao, 2008
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
• Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
Communications Roadmap
• Key markets
• Core DSP technologies
– OFDM
– MIMO
• IP Network is key
• Enables new
approaches to
– QoS management
– Robustness
– Capacity
Wireless Environment
• Multipaths caused by reflections from various
objects.
Modeling the Channel
• As the mobile moves through the environment, the
field strength varies due to :
– Free space path loss
– Long term (slow) fading
– Short term (fast) fading
[Figure: signal level (dB) versus log(distance), showing path loss, long term fading, and short term fading]
Doppler
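• Change in the received carrier frequency due to the motion of the mobile relative to the base station
• $\Delta f = f_d = (v/\lambda)\cos\theta$
– For f = 900 MHz, v = 70 mph (112 km/h)
– $f_{d,max} = v/\lambda = 93.3$ Hz
[Figure: mobile travelling a distance $D = v \cdot t$ at angle $\theta$ to the arriving wave]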
Delay Spread
• Measure of the time distribution of power in the channel
impulse response
– Typical office: 25 ns to 60 ns
– Large lobbies and atria: 100 ns
– Warehouse and factory floors: 100 ns to 200 ns
– Delay spreads are up to 10 µs in cellular environments
• Greater than 3 µs in urban areas
• 0.5 µs in suburban and open areas
Exponential Power Delay Profile
• If the delay spread of the channel is larger than the symbol interval
we will see multiple paths in our channel.
• Leads to inter-symbol interference (ISI).
• Leads to a frequency selective channel.
• Average energy of the channel impulse response follows an
exponential power-delay profile.
Coherence Bandwidth
• Maximum frequency bandwidth for which the signals
are still considered to be correlated.
• $B_c$ (in Hz) $= 1/(2\pi\tau_{rms})$ when considering amplitude correlation (correlation coefficient = 0.5)
• $\tau_{rms}$ is the rms delay spread of the channel
Coherence Time
• Maximum time period for which the signals are still
considered to be correlated.
• It is used to characterize the time varying nature of
the channel.
• Rule of thumb: $9/(16\pi f_m) < T_c < 0.423/f_m$
– $f_m$ is the maximum Doppler frequency
– Correlation coefficient = 0.5
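As a rough illustration, the rules of thumb for coherence time (this slide) and coherence bandwidth (previous slide) can be evaluated numerically. The helper names are hypothetical, and the example inputs (93.3 Hz maximum Doppler, 100 ns rms delay spread) are taken from earlier slides.

```python
import math

def coherence_time_bounds(f_m_hz):
    """Return (lower, upper) rule-of-thumb bounds on coherence time in seconds."""
    lower = 9.0 / (16.0 * math.pi * f_m_hz)
    upper = 0.423 / f_m_hz
    return lower, upper

def coherence_bandwidth_hz(tau_rms_s):
    """Coherence bandwidth B_c = 1 / (2 * pi * tau_rms)."""
    return 1.0 / (2.0 * math.pi * tau_rms_s)

tc_low, tc_high = coherence_time_bounds(93.3)   # 900 MHz carrier at 112 km/h
bc = coherence_bandwidth_hz(100e-9)             # 100 ns rms delay spread
print(f"T_c between {tc_low * 1e3:.2f} ms and {tc_high * 1e3:.2f} ms")
print(f"B_c = {bc / 1e6:.2f} MHz")
```

Under these assumptions the channel stays roughly constant for a few milliseconds and across about 1.6 MHz.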
Link Budget
• A link budget is used to compute the range, transmit
power, receiver sensitivity and other requirements of the
communication system.
• In free space the path loss is given by the Friis equation:
$$P_r = \frac{P_t G_t G_r \lambda^2}{(4\pi)^2 d^2}$$
• $G_t$, $G_r$ represent transmit and receive antenna gains. $P_t$, $P_r$ represent the transmit power and receive power. $\lambda$ is the wavelength, $d$ is the distance.
Link Budget
• Expressing path loss in dB:
$$P_r(\mathrm{dB}) = P_t(\mathrm{dB}) + G_t(\mathrm{dB}) + G_r(\mathrm{dB}) + 20\log_{10}\left(\frac{\lambda}{4\pi}\right) - n \cdot 10\log_{10}(d)$$
• Note: $n$ is the path loss exponent, which depends on the environment (2 in free space).
• To compute the SNR at the baseband we need to include thermal noise in the signal bandwidth $B$, and the noise figure of the system $NF$:
$$P_r(\mathrm{dBm}) = -174\ \mathrm{dBm/Hz} + NF + 10\log_{10}(B) + SNR$$
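The two link budget formulas (received power with path loss exponent n, and receiver sensitivity from thermal noise, noise figure, and required SNR) can be sketched in a few lines. The function names, the 2.5 GHz carrier, and the 1 km distance are assumptions chosen only for illustration.

```python
import math

def rx_power_dbm(pt_dbm, gt_dbi, gr_dbi, freq_hz, dist_m, n=2.0):
    """Received power in dBm; n is the path loss exponent (2 in free space)."""
    wavelength = 3.0e8 / freq_hz
    return (pt_dbm + gt_dbi + gr_dbi
            + 20.0 * math.log10(wavelength / (4.0 * math.pi))
            - n * 10.0 * math.log10(dist_m))

def rx_sensitivity_dbm(bandwidth_hz, nf_db, snr_db):
    """Minimum received power: -174 dBm/Hz + NF + 10 log10(B) + SNR."""
    return -174.0 + nf_db + 10.0 * math.log10(bandwidth_hz) + snr_db

# Illustrative downlink: 40 dBm transmit power, 15 dBi and -1 dBi antennas,
# 2.5 GHz carrier (assumed), 1 km free-space path.
p_rx = rx_power_dbm(40.0, 15.0, -1.0, 2.5e9, 1000.0)
sens = rx_sensitivity_dbm(10e6, 9.0, -3.31)
print(f"received power {p_rx:.1f} dBm, sensitivity {sens:.2f} dBm, "
      f"margin {p_rx - sens:.1f} dB")
```

The difference between the two values is the margin left over for fading, interference, and a path loss exponent above 2.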
Link Budget
• Margin for desired outage taking into account receiver
structure and antenna diversity.
– Standards specify outage (coverage) probabilities
– WiMax: 90% coverage within the cell, 75% at the boundary of the cell.
• Compensation factors for other impairments
– Interference from neighbouring cell
– Shadow fading, etc.
• Diversity helps achieve the outage probability (or reduces
the margin for outage) without increase in transmit power.
Diversity
• Diversity provides the receiver with multiple looks at the transmitted
signal.
• Prob(all channels in a fade) << Prob(any 1 channel in a fade)
• Diversity improves link reliability.
[Figure: signal level (dB) versus time for Channel 1, Channel 2, and the combined channel]
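The claim that all channels rarely fade together can be made concrete with a small calculation. This is a sketch under stated assumptions that are not on the slide: independent Rayleigh-fading branches with equal mean power, where a "fade" means the branch power drops below a threshold relative to its mean.

```python
import math

def fade_probability(threshold_db, branches=1):
    """P(all independent Rayleigh branches are below the threshold).

    threshold_db is measured relative to the mean branch power.
    """
    ratio = 10.0 ** (threshold_db / 10.0)
    p_single = 1.0 - math.exp(-ratio)   # exponential power distribution
    return p_single ** branches

p1 = fade_probability(-10.0, branches=1)
p2 = fade_probability(-10.0, branches=2)
print(f"one branch: {p1:.4f}, two branches: {p2:.4f}")
```

With a 10 dB fade threshold, a second independent branch reduces the probability of a simultaneous fade by roughly a factor of ten.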
Diversity Techniques
• Spatial Diversity
– Antennas “sufficiently spaced” apart (> ½ wavelength).
– Will result in an independent channel response and provide another look at the
transmitted signal.
• Frequency Diversity
– Transmit over multiple carrier frequencies.
– If the frequencies are “sufficiently far” (coherence bandwidth) apart the channel
response will be different on the different frequencies.
• Time Diversity
– Channel is continuously changing.
– Transmit signals “sufficiently spaced” (coherence time) apart in time so the 2nd
transmission “sees” a different channel compared to the first one.
• Polarization Diversity
– Signals transmitted on two orthogonal polarizations exhibit uncorrelated fading
statistics.
MIMO Systems
[Figure: Tx antennas 1 to M and Rx antennas 1 to N, linked by the channel matrix H]
• MIMO systems:
• Multiple Antennas at the transmitter and
receiver.
• 3 types of MIMO Systems:
• STBC MIMO systems
• Diversity gain.
• Spatial Multiplexing MIMO systems
• Capacity/throughput gain.
• Feedback MIMO systems
• Higher performance through interference reduction.
• MISO (multiple input single output) Systems:
• STBC can be used with just 1 receive antenna.
• Provides diversity gain.
• To achieve array gain, need knowledge of
channel at the transmitter (feedback).
Spatial Multiplexing
• A spatial multiplexing MIMO system transmits different data symbols from each
transmitter.
• The signals from each transmitter combine over the air and are received by multiple
receive antennas.
• SM systems have a rate=M (num transmit antennas). The diversity order depends on
the type of encoding and receiver (uncoded SM with ML decoding has diversity
order=N (num receive antennas)).
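[Figure: three modulators transmit x(t), y(t), z(t) from symbol streams x(n), y(n), z(n); the MIMO receiver recovers x(n), y(n), z(n) from the received signals, for example]
$r_1(t) = a_{11}x(t) + a_{12}y(t) + a_{13}z(t)$
$r_3(t) = a_{31}x(t) + a_{32}y(t) + a_{33}z(t)$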
Spatial Multiplexing Receivers
Zero Forcing receiver:
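[Figure: 2x2 MIMO channel; Tx antennas 1 and 2 reach Rx antennas 1 and 2 through the gains $h_{11}$, $h_{12}$, $h_{21}$, $h_{22}$]
$$y_1 = h_{11}x_1 + h_{12}x_2 + n_1$$
$$y_2 = h_{21}x_1 + h_{22}x_2 + n_2$$
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \end{bmatrix}$$
$$\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = W \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
For ZF receivers, $W = H^{-1}$.
Significant increase in noise when the channel is in a deep fade.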
Spatial Multiplexing Receivers
• MMSE MIMO Decoders:
– Cancels interference and minimizes noise.
– Minimizes the overall error (mean squared error), $E[(x - \hat{x})^2]$.
$$W_{MMSE} = \sqrt{\frac{M}{E_s}} \left( H^H H + \frac{M}{SNR} I_M \right)^{-1} H^H$$
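The zero-forcing and MMSE weight matrices can be compared on a small example. The following is a minimal sketch for a 2x2 channel in plain Python; the helper names and the channel values are made up for illustration, and the scalar factor in front of the MMSE matrix is taken as 1 (unit symbol energy per stream is assumed).

```python
def conj_transpose(a):
    """Hermitian transpose of a 2x2 matrix stored as nested lists."""
    return [[a[r][c].conjugate() for r in range(2)] for c in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def zf_weights(h):
    """Zero forcing: W = H^-1."""
    return inverse(h)

def mmse_weights(h, snr_linear, m=2):
    """MMSE: W = (H^H H + (M / SNR) I)^-1 H^H, unit symbol energy assumed."""
    hh = conj_transpose(h)
    gram = matmul(hh, h)
    for i in range(2):
        gram[i][i] += m / snr_linear
    return matmul(inverse(gram), hh)

def apply(w, y):
    return [w[0][0] * y[0] + w[0][1] * y[1], w[1][0] * y[0] + w[1][1] * y[1]]

def frobenius_sq(w):
    """Sum of squared magnitudes; proportional to the output noise power."""
    return sum(abs(v) ** 2 for row in w for v in row)

h = [[1.0 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.8 - 0.6j]]  # example channel
x = [1 + 1j, -1 + 1j]                                      # transmitted symbols
y = apply(h, x)                                            # noiseless reception
x_zf = apply(zf_weights(h), y)
print("ZF estimate:", x_zf)
print("noise gain ZF:", frobenius_sq(zf_weights(h)),
      "MMSE at 0 dB SNR:", frobenius_sq(mmse_weights(h, 1.0)))
```

Without noise, ZF recovers the symbols exactly; the MMSE weights have a smaller norm, which is the mechanism by which MMSE limits noise enhancement at the cost of some residual interference.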
Spatial Multiplexing Receivers
• Zero-Forcing
• MMSE
• Successive Interference cancellation receivers
• Sphere detectors (sub-optimal Maximum
Likelihood)
Transmit Diversity
• Space Time Block Code (STBC)
– 2 Antenna STBC also known as “Alamouti Code”.
– Improves BER/SER performance.
[Figure: information source → constellation mapper → Alamouti ST block code → two transmit antennas (channels $h_1$, $h_2$) → STBC decoder → ML decisions, giving soft decisions for $c_1$ and $c_2$]
Received signals (noise omitted):
Symbol period 1: $r_1 = h_1 c_1 + h_2 c_2$
Symbol period 2: $r_2 = h_1(-c_2^*) + h_2(c_1^*)$
STBC Decoder
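In matrix form the received signal is:
$$\begin{bmatrix} r_1 \\ r_2^* \end{bmatrix} = \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2^* \end{bmatrix}, \qquad r = Hc + n$$
Decoder:
$$\hat{c} = H^H r = H^H (Hc + n)$$
$$\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix} = \left( |h_1|^2 + |h_2|^2 \right) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + H^H \begin{bmatrix} n_1 \\ n_2^* \end{bmatrix}$$
Low complexity decoder: just 2 complex multiplies per symbol for a 2 antenna system (and grows linearly with block length/number of antennas).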
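The encoder and matched-filter decoder for the 2-antenna code can be sketched directly from the matrix form above. Function names and the example channel values are illustrative; a single receive antenna and a channel that stays constant over both symbol periods are assumed.

```python
def alamouti_encode(c1, c2):
    """Per symbol period, the pair sent from (antenna 1, antenna 2)."""
    return [(c1, c2), (-c2.conjugate(), c1.conjugate())]

def channel(tx_pairs, h1, h2, noise=(0j, 0j)):
    """One receive antenna; h1 and h2 are constant over both periods."""
    return [h1 * a + h2 * b + n for (a, b), n in zip(tx_pairs, noise)]

def alamouti_decode(r1, r2, h1, h2):
    """c_hat = H^H [r1, conj(r2)]^T, normalised by |h1|^2 + |h2|^2."""
    r2c = r2.conjugate()
    c1_hat = h1.conjugate() * r1 + h2 * r2c
    c2_hat = h2.conjugate() * r1 - h1 * r2c
    gain = abs(h1) ** 2 + abs(h2) ** 2
    return c1_hat / gain, c2_hat / gain

h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j          # example channel gains
c1, c2 = 1 + 1j, -1 + 1j                  # QPSK symbols
r1, r2 = channel(alamouti_encode(c1, c2), h1, h2)
d1, d2 = alamouti_decode(r1, r2, h1, h2)
print(d1, d2)
```

Each decoded symbol needs two complex multiplies, and both symbols see the combined gain of the two paths, which is where the diversity benefit comes from.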
Other MIMO schemes
• Achieving high rate high diversity MIMO systems
is an area of active research.
• There are many suboptimal STBC schemes that
improve the rate but reduce the diversity order.
• There are also combinations of spatial
multiplexing and STBC schemes.
• One such scheme is 2 (or more) Alamouti’s in
parallel.
Stacked Alamouti
[Figure: transmitter for interference cancelling STBC: the information source feeds two branches (data stream 1, data stream 2), each with a constellation mapper and an Alamouti ST block code]
[Figure: receiver for interference cancelling STBC: received signals r1, r2 → interference cancellation and ML decision → C1, C2 (data stream 1, data stream 2)]
• Interference Cancelling STBC
• 2 Alamouti’s in parallel
• Rate 2 system
• Diversity order =
N*(M-K+1)
– K : co-channel users
– N : transmit antennas per user.
– M : receive antennas
• Requires N*(K-1)+1 antennas
at the receiver to suppress K-1
interferers.
Orthogonal Frequency Division
Multiplexing (OFDM)
[Figure: magnitude versus frequency]
OFDM divides a frequency selective channel into a number
of flat fading channels
OFDM Modulation
(a) Transmitter: QAM mapping → S/P → IFFT (frequency domain to time domain) → P/S → cyclic prefix → D/A and RF
(b) Receiver: RF and A/D → strip cyclic prefix → S/P → FFT (time domain to frequency domain) → P/S → FEQ → QAM decoding
• A QAM symbol is modulated onto each subcarrier
• IFFT/FFT are used for efficient modulation and demodulation
Combating Multipath
• Sampling at instant Ts all channels experience
the same channel and there is no ICI
[Figure: constructing the cyclic prefix (CP): the CP precedes the OFDM symbol; multipath components with maximum delay $\tau_{max}$ arrive before the sampling instant $T_s$]
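The effect of the cyclic prefix can be demonstrated with a toy OFDM link. This is a sketch under stated assumptions: 8 subcarriers, a CP of 2 samples (at least 1 is required here), a made-up 3-tap channel that the receiver knows perfectly, and a naive DFT from the standard library in place of a real FFT.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out

def ofdm_modulate(symbols, cp_len):
    """IFFT, then copy the last cp_len samples to the front (cp_len >= 1)."""
    time = dft(symbols, inverse=True)
    return time[-cp_len:] + time

def multipath(signal, taps):
    """Linear convolution with the channel taps, truncated to the input length."""
    return [sum(taps[j] * signal[i - j] for j in range(len(taps)) if i - j >= 0)
            for i in range(len(signal))]

def ofdm_demodulate(rx, n_fft, cp_len, taps):
    """Strip the CP, FFT, then one complex division per subcarrier (FEQ)."""
    freq = dft(rx[cp_len:cp_len + n_fft])
    chan = dft(list(taps) + [0] * (n_fft - len(taps)))
    return [y / h for y, h in zip(freq, chan)]

symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
taps = [1.0, 0.5, 0.25j]   # example multipath channel, delay spread of 2 samples
cp_len = 2                 # CP covers the channel memory
rx = multipath(ofdm_modulate(symbols, cp_len), taps)
recovered = ofdm_demodulate(rx, len(symbols), cp_len, taps)
print([complex(round(v.real, 6), round(v.imag, 6)) for v in recovered])
```

Because the CP is at least as long as the channel memory, the linear convolution looks circular after the CP is stripped, so each subcarrier sees a single complex gain and a one-tap equalizer suffices.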
MIMO and OFDM
• MIMO – Multiple Input Multiple Output
Communication System. Employs multiple
antennas at both transmitter and receiver.
• OFDM – Orthogonal Frequency Division
Multiplexing. Breaks up a broadband channel into
many parallel narrowband channels (subcarriers).
• MIMO-OFDM – A Combination of MIMO and
OFDM. Appears like many parallel MIMO systems
on orthogonal subcarriers.
MIMO-OFDM System
[Figure: OFDM transmitters 1 to N → rich scattering environment → OFDM demodulators 1 to N → MIMO decoder]
Each transmitter is an independent OFDM modulator.
The source symbols could be space-time block coded or just QAM modulated
for spatial multiplexing.
Each receiver is an OFDM demodulator combined with a MIMO decoder to
invert the channel on each subcarrier and extract the source symbols.
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
• Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
802.16/802.16e
• The 802.16 WirelessMAN standard includes
requirements for operation in :
– Line Of Sight (LOS), 10-66 GHz for fixed wireless systems.
– Non Line Of Sight (NLOS), <11 GHz for fixed wireless
systems.
• 802.16e (Mobile WiMax) adds enhancements for mobility
in the <11 GHz licensed and unlicensed bands.
– Operation in mobile mode is limited to licensed bands between
2 GHz and 6 GHz.
Scalable OFDMA parameters
Parameters                       Values
System bandwidth (MHz)           1.25     5        10       20
FFT size (NFFT)                  128      512      1024     2048
Sampling frequency (Fs, MHz)     1.4      5.6      11.2     22.4
Sample time (1/Fs, ns)           714.28   178.57   89.28    44.64
Subcarrier spacing               10.94 kHz
Useful symbol time               91.4 µs
Guard interval                   11.4 µs
OFDMA symbol time                102.9 µs
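The table shows that scaling the FFT size with the sampling frequency keeps the subcarrier spacing fixed. A short sketch reproduces the derived rows; the guard fraction of 1/8 is an assumption inferred from the ratio of the guard interval to the useful symbol time in the table.

```python
def ofdma_parameters(fs_hz, n_fft, guard_fraction=1 / 8):
    """Return (subcarrier spacing in Hz, useful, guard, total symbol time in s)."""
    spacing = fs_hz / n_fft
    useful = 1.0 / spacing
    guard = useful * guard_fraction
    return spacing, useful, guard, useful + guard

profiles = [(1.4e6, 128), (5.6e6, 512), (11.2e6, 1024), (22.4e6, 2048)]
for fs, n_fft in profiles:
    spacing, useful, guard, total = ofdma_parameters(fs, n_fft)
    print(f"NFFT={n_fft:5d}: spacing {spacing / 1e3:.2f} kHz, useful {useful * 1e6:.1f} us, "
          f"guard {guard * 1e6:.1f} us, symbol {total * 1e6:.1f} us")
```

All four bandwidth profiles give the same symbol timing, which is the point of the scalable design.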
Link Budget
Parameter                                  Downlink             Uplink
Transmit power                             10 W = 40 dBm        200 mW = 23 dBm
                                           (max = 20 W)         (max = 200 mW)
Antenna height                             32 m                 1.5 m
Antenna gain                               15 dBi (BS)          -1 dBi (mobile)
EIRP                                       55 dBm (approx)      22 dBm
Occupied subcarriers                       840 out of 1024      840 out of 1024
Power/subcarrier                           28 dBm               3.44 dBm
Noise figure                               9 dB (at mobile)     4 dB (at BS)
Total margin for interference,             20 dB                20 dB
  shadow fading, ... (75% coverage
  at cell edge, 90% overall)
BS to BS distance                          2.8 km               2.8 km
SNR required (QPSK 1/8, repetition         -3.31 dB             -2.5 dB
  code = 4, BER = 10^-6 after FEC)
Rx sensitivity                             -100.7 dBm           -111.1 dBm
Max allowable path loss                    136.4 dB             133 dB
Time Division Duplexing
• 802.16e can be deployed in TDD and FDD environments.
• Initial certification profiles are only for TDD.
• The DL subframe and UL subframe lengths are adjustable.
• TDD assures channel reciprocity.
[Figure: sequence of frames (j-2) to (j+2); each frame consists of a downlink subframe and an uplink subframe with an adaptive boundary]
TTG: Transmit-Receive transition gap
RTG: Receive-Transmit transition gap
OFDMA Frame Structure
DL-MAP – Downlink MAP : downlink allocations
UL-MAP – Uplink MAP : uplink allocations
FCH – Frame control header : contains information about the DL-MAP
[Figure: OFDMA frame, subchannel logical number versus OFDMA symbol number. Downlink (DL) subframe: preamble, FCH, DL-MAP, UL-MAP, and DL bursts (SS1, SS2, SS3, SS4, broadcast, multicast, and SS1 from BS2). TTG. Uplink (UL) subframe: ranging subchannel and UL bursts SS1 to SS4. RTG, followed by the preamble, FCH, and DL-MAP of the next frame.]
Data rates for SIMO/MIMO
configurations
Source: WiMax Forum
64 QAM with 5/6 CTC
Baseband Transmission Model
• OFDM receiver provides estimates of
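– Channel $h_{n,i}(t)$
– Frequency offset $\Omega_0$
– Sample timing $T'$
– OFDM symbol timing
[Figure: OFDM transmitter (symbols $a_{i,k}$) → $s(t)$ → channel → $r(t)$ → ADC (sample period $T'$) → $r(n)$ → inner receiver → outer receiver. The resulting channel combines $h_{n,i}(t)$, a timing delay $\delta(t - \varepsilon T')$, the frequency offset $\Omega_0(t)$, and additive noise $n(t)$.]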
Generic OFDM Transmitter
• Figure shows a generic MIMO OFDM Tx
– MIMO not an element of 802.11a, but it is in 802.11n,
3GPP-LTE and 802.16e
[Figure: MAC → source → coding (e.g. LDPC) → space-time encoder → beamforming → one chain per antenna: insert pilots → IFFT → append CP → CFR → DUC → DPD → DAC → RF → PA]
OFDM Receiver Architecture
• Figure illustrates architecture for generic OFDM Rx
• Details will vary as a function of
– Packet-based versus broadcast transmission
– Existence of a preamble (or not) in the waveform
[Figure: ADC → DDC → sample clock adjustment → coarse frequency offset correction → symbol timing → CP removal → FFT → extract pilots → fine sample clock adjustment → fine frequency offset adjustment → frequency domain equalizer → channel decoding (e.g. LDPC) → medium access controller → to/from network. Supporting blocks: DAC, power estimation, extract preamble, channel estimation.]
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
– Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
Digital Receiver Architecture:
Abstracted Architecture
• Common model of abstraction for digital receiver is inner/outer receiver
Receiver abstraction:
• Digital IF Processing
– Up-Conversion
– Down-Conversion
– Channelizer
– Fast AGC
• Inner Receiver
– Frequency Offset Estimation/Correction
– Sample Clock Offset Correction
– Channel Estimation/Equalization
– Frame detection
– AGC
– Successive Interference Cancellation
– Space-Time-Coding
– IFFT/FFT
– Per sub-carrier processing: Beamforming, QRD-RLS
• Outer Receiver
– Channel Coding: LDPC, TPC, CTC, Viterbi, (De-)Interleave
• Control, Protocol and Link Layer processing
– Medium Access Control (MAC)
– Link Layer Processing
– System Initialization, Control and Monitoring
– Application
– Ethernet, PCI Express, SRIO, CPRI, OBSAI
Receiver Abstraction and
Projection on to Platform FPGA
• Digital IF Processing (FPGA platform: SX)
– Characteristics: MAC intensive
– Comments: DSP48 main requirement
• Inner Receiver (FPGA platform: SX/LX)
– Characteristics: MAC intensive; some functions LUT intensive (CORDIC in QRD-RLS); FFT processing for OFDM; correlation processing for timing; per-carrier processing complexity (MIMO-OFDM) scales as Num. sub-carriers $\times N_{TX} \times N_{RX}$
– Comments: DSP48 leveraged FFT; FPGA fabric for CORDIC
• Outer Receiver (FPGA platform: LX)
– Characteristics: symbol rate tasks; channel coding
– Comments: ACS/ACSO dominated by low bit precision add/multiplexors; good match for fabric; lots of memory required
• Control/Protocol (FPGA platform: FX)
– Characteristics: Gigabit connectivity; Linux; OS "heavy" tasks; TCP/IP
– Comments: embedded PPC used; Rocket IO for PCI Express, SRIO
[Figure: receiver abstraction mapped onto SX, SX/LX, LX, and FX devices]
FPGA product portfolio tailored for various processing tasks in communications receiver
Digital Frontend
Digital upconversion (downconversion)
Crest factor reduction
Digital pre-distortion
The Platform
• Connectivity: serial gigabit, OBSAI/CPRI, proprietary serial backplane, inter-chip connectivity
• Embedded software: MAC (Media Access), decision oriented tasks, CORBA, RTOS, NBAP, SCA (JTRS radios)
• Logic & IO: OBSAI/CPRI, SRIO, AD/DA interface (ADCs and DACs), EMIF
• High performance processing: high MIPs tasks, radio PHY (DUC, DDC, CFR, DPD, RACH, searcher, OFDM PHY, TCC, MIMO); supported by embedded DSP tiles, distributed memory, block memory and logic fabric
Virtex-4/5 FPGA Architecture
High-Level View
• FPGA family with 3 members
tailored for specific classes of
processing
– SX: DSP
– LX: Logic centric
– FX: Full featured
• Embedded PowerPC hard IP
• Giga-bit serial connectivity
• DSP processing tiles “DSP48”
Virtex-5 FPGA Platform
• 2 slices per CLB, 4 LUTs per slice
• LUTs can be configured as a shift register
• LUTs can be configured as distributed memory (RAM)
Virtex-4 DSP48 Slice
• Arithmatica parallel counter: 20% faster performance and uses less area
• Integrated cascade routing enables scalable performance
• Arithmatica A+Adder: 20% faster than other implementations
• Pipeline registers enable 500 MHz performance
Scalable 500 MHz performance not possible using standard cell libraries and standard cell design flow
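[Figure: DSP48 slice datapath. 18-bit inputs A and B (with the BCIN/BCOUT cascade) feed an 18x18 multiplier; the 36-bit product is sign extended to 48 bits. The X, Y, and Z multiplexers select among the product, the 48-bit C input, PCIN, the P feedback, zero, and a wire shift right by 17 bits. A 48-bit adder/subtracter (CIN, SUB) drives the output register P and PCOUT to the adjacent DSP48 tile.]
[Figure: pipelined multiplier, 3 delay latency ($z^{-3}$): 18-bit inputs A and B, 36-bit product sign extended to 48 bits on P (PCOUT), split into LS word and MS word]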
Pipelined Complex 18x18 MPY
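[Figure: pipelined complex 18x18 multiplier built from four DSP48 slices (sn = slice n). Slices S1 and S2 cascade two 18x18 products to form the imaginary output Pi; slices S3 and S4 form the real output Pr, with one product subtracted. Inputs Ar, Ai, Br, Bi are 18 bits wide; the 36-bit products are sign extended onto the 48-bit cascade, which starts from '0' and is registered.]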
Wide Filters At Full Speed
Within the Virtex-4 DSP Slice Column
• Systolic N-tap FIR
– Scalable N-levels deep implementation
– N levels deep at 500 MHz performance
• Uses integrated pipeline registers to synchronize filter inputs
• Utilizes input and output cascade routing
• Build a massively parallel 512-tap FIR filter in a single device, achieving 256 GMAC/s performance
• An equivalent implementation would consume 444 embedded multipliers and 77,008 LCs and would achieve only half the performance
Xilinx FFT IP (4)
• FFT fully utilizes FPGA arithmetic hardware resources
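• FFT viewed as a recursion using a butterfly kernel
[Figure: butterfly kernel with inputs $a$ and $b$; CADD1 forms $a + b$, CADD2 forms $a - b$, and CMPY applies the phase factor to give $(a - b)\,e^{-j2\pi k/N}$]
Phase factors: $e^{-j2\pi k/N}$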
• CADD{1|2}: complex adder
• CMPY: complex multiplier
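The recursion built on this butterfly can be sketched in software. The version below is a radix-2 decimation-in-frequency FFT written for clarity rather than speed; it is an illustration of the kernel, not a description of how the Xilinx core is implemented, and the function names are made up.

```python
import cmath

def butterfly(a, b, k, n):
    """One kernel: two complex adds (CADD1, CADD2) and one complex multiply (CMPY)."""
    w = cmath.exp(-2j * cmath.pi * k / n)   # phase factor e^{-j 2 pi k / N}
    return a + b, (a - b) * w

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    sums, diffs = [], []
    for k in range(half):
        s, d = butterfly(x[k], x[k + half], k, n)
        sums.append(s)
        diffs.append(d)
    out = [0j] * n
    out[0::2] = fft_dif(sums)    # even-indexed outputs
    out[1::2] = fft_dif(diffs)   # odd-indexed outputs
    return out

def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

data = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0j, 0.5 + 0.5j, -2 + 1j, 3 + 0j, 1 - 1j]
spectrum = fft_dif(data)
print(max(abs(a - b) for a, b in zip(spectrum, naive_dft(data))))
```

An N-point transform uses (N/2) log2 N butterflies, so the hardware cost is dominated by how many butterflies are instantiated in parallel.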
Virtex-4 DSP Slice
• DSP slice key for
implementing high-
performance arithmetic
• Embedded 18x18 MPY
and 48b adder
– Butterfly phase rotator
– Cross-addition
Butterfly CMPLX MPY
• Complex MPY used in FFT
butterfly
• Optimized to employ Virtex-4
DSP Slice
– 4 and 3 MPY option
• Complex MPY available as IP
module†
[Figure: complex multiplier with inputs Ar, Ai, Br, Bi and outputs Pr, Pi, mapped onto DSP slices 1 to 4]
$P_r + jP_i = (A_r + jA_i) \times (B_r + jB_i)$
† Available: 6.2i IP Update 2
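The 4-multiplier and 3-multiplier options can be compared in a few lines. The 3-multiplier form shown here is one well-known rearrangement that trades a multiplier for extra adders; the slide does not say which rearrangement the IP module uses, so treat it as an assumption.

```python
def cmul4(ar, ai, br, bi):
    """Direct form: 4 real multiplies, 2 real adds."""
    return ar * br - ai * bi, ar * bi + ai * br

def cmul3(ar, ai, br, bi):
    """Rearranged form: 3 real multiplies, 5 real adds."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2

print(cmul4(3, 4, 5, -6), cmul3(3, 4, 5, -6))
```

Both forms return the same (Pr, Pi) pair; in hardware the 3-multiplier form saves a DSP slice but needs pre-adders and wider intermediate words.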
Performance/Parallelism/Area
• FPGA: highly parallel computing machine
• Achieve performance using functional unit parallelism
• Area/throughput tradeoff delivered via Xilinx IP library
• Butterfly array to produce high-
performance FFT processor
• High computation rate using (possibly)
hundreds of DSP slices
– Allocate resources as appropriate to meet
system requirements
• Large memory bandwidth using multi-
port memory constructed from BRAMs
Mem read BW: 320 x 36 x 500e6 = 5.76 Tera-bps
FFT Architecture
• For small number of carriers and modest data rates single
butterfly (I)FFT is probably suitable - Small FPGA footprint
[Figure: iteration engine: input data → switch → data RAM 0 / data RAM 1 → butterfly with phase factor ROM → switch → output data]
Block boundary detection/Fine
timing acquisition
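[Figure: incoming samples pass through delay lines ($z^{-1}$ chains) and are correlated with the known sequence; the preamble is 1 OFDM block of repeated data, and the delayed branch is half an OFDM block behind. The conjugate product ()* is averaged (ave); its squared magnitude $|\cdot|^2$ gives the timing estimate and its argument (arg) gives the frequency estimate.]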
F. Tufvesson, O. Edfors, M. Faulkner, “Time and Frequency Synchronization for OFDM
using PN-Sequence Preambles”, VTC-1999/Fall, vol 4, pp.2203-7, New Jersey, 1999.
Fine-timing acquisition using a
clipped correlator
[Figure: System Generator model of the clipped correlator (panels: bank of correlators; 1-bit correlator). An FSM driven by BaudClk generates the data address, coefficient address, and load signals for the time-multiplexed correlator cells C1 to C7; each cell combines a coefficient ROM with a 1-bit MAC (add/subtract and register), and the cell outputs are summed in a pipelined AddSub tree.]
• 10 time multiplexed correlators
• Each 1-bit correlator: 10 slices
• Total for clipped correlator: 589 slices
• Full precision correlators: 32 embedded multipliers, 896 flip-flops
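The idea behind a clipped (1-bit) correlator can be shown with a small software model: both the incoming samples and the known sequence are reduced to their sign bits, so each multiply becomes a sign comparison. This sketch assumes real-valued samples and uses a 13-chip Barker code as a stand-in for the known sequence; the actual design operates on the preamble of the system.

```python
def sign_bit(v):
    """Clip a sample to +1 or -1."""
    return 1 if v >= 0 else -1

def clipped_correlate(samples, reference):
    """Slide the 1-bit reference over the 1-bit samples; no multipliers needed."""
    ref_bits = [sign_bit(r) for r in reference]
    bits = [sign_bit(s) for s in samples]
    n = len(ref_bits)
    return [sum(b * r for b, r in zip(bits[i:i + n], ref_bits))
            for i in range(len(bits) - n + 1)]

def detect_timing(samples, reference):
    """Index of the largest correlation value."""
    scores = clipped_correlate(samples, reference)
    return max(range(len(scores)), key=scores.__getitem__)

barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
samples = ([0.3, -0.2, 0.1, -0.4, 0.2]          # leading samples before the sequence
           + [0.7 * b for b in barker13]        # attenuated known sequence
           + [-0.1, 0.3, -0.3])                 # trailing samples
scores = clipped_correlate(samples, barker13)
print(scores, detect_timing(samples, barker13))
```

The peak is insensitive to the signal amplitude, which is why the hardware can replace embedded multipliers with a few slices per correlator.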
QRD
• One of the popular methods of matrix inversion is
based on QRD.
• $H = QR$, where Q is unitary and R is upper triangular
• A unitary matrix has a trivial inverse: $Q^{-1} = Q^H$
• An upper triangular matrix can be inverted by back-substitution
• Therefore $H^{-1} = R^{-1} Q^H$
Givens Rotations
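• For a 2x1 vector of real numbers:
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sqrt{a^2 + b^2} \\ 0 \end{bmatrix}, \qquad c = \frac{a}{\sqrt{a^2 + b^2}}, \quad s = \frac{b}{\sqrt{a^2 + b^2}}$$
• For an NxM matrix, repeat the process 2 cells at a time (the remaining entries are updated by each rotation):
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$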
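The rotation-by-rotation triangularization can be sketched for a real-valued matrix. This is a plain software illustration under stated assumptions (real entries, rows rotated against the pivot row, function names invented here); a hardware QRD would typically use CORDIC and handle complex data.

```python
import math

def givens(a, b):
    """Return (c, s) such that [c s; -s c] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_givens(matrix):
    """Return (QH, R) with QH * matrix = R and R upper triangular."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    r = [list(map(float, row)) for row in matrix]
    qh = [[1.0 if i == j else 0.0 for j in range(n_rows)] for i in range(n_rows)]
    for col in range(n_cols):
        # Zero the entries below the diagonal from the bottom row upwards.
        for row in range(n_rows - 1, col, -1):
            c, s = givens(r[col][col], r[row][col])
            for m in (r, qh):   # apply the same rotation to R and to Q^H
                top, bot = m[col], m[row]
                m[col] = [c * t + s * b for t, b in zip(top, bot)]
                m[row] = [-s * t + c * b for t, b in zip(top, bot)]
    return qh, r

a = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0]]
qh, r = qr_givens(a)
for row in r:
    print([round(v, 4) for v in row])
```

For the 3x3 case the rotations are applied in the order a31, a21, a32, and accumulating the same rotations on an identity matrix yields Q^H alongside R.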
Systolic Arrays
• Structured arrays with identical cells. Usually a
“boundary” cell and an “internal” cell for the QRD
process.
[Figure: systolic array built from boundary cells and internal cells]
1. The boundary cell generates the rotations.
2. The internal cell applies the rotations to all the cells in the row.
3. The systolic array in this figure can handle any matrix up to 3x3.
Triangularization mode
• For QRD of up to a 3x3 matrix we need 3 boundary cells and 3 internal cells.
• Boundary cells calculate rotation vectors and internal cells store them.
• Data is fed column-wise into the systolic array.
• This may have to be staggered depending on the pipelining delays through the boundary cell and internal cell.
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$
[Figure: the columns $(a_{11}, a_{21}, a_{31})$, $(a_{12}, a_{22}, a_{32})$, and $(a_{13}, a_{23}, a_{33})$ are fed into the systolic array]
The rotation factors for zeroing out cell A(2,1) are stored in cell A(1,2), etc.
Q-matrix computation mode
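$$Q^H A = R, \qquad Q^H I = Q^H$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & c_{32} & s_{32} \\ 0 & -s_{32} & c_{32} \end{bmatrix} \begin{bmatrix} c_{21} & s_{21} & 0 \\ -s_{21} & c_{21} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c_{31} & 0 & s_{31} \\ 0 & 1 & 0 \\ -s_{31} & 0 & c_{31} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$
[Figure: with the rotations stored in the array, A enters and R leaves through $Q^H$; the columns of the identity matrix, (1, 0, 0), (0, 1, 0), and (0, 0, 1), are then fed in, and the outputs are labelled first, second, and third column of the Q matrix]
[Equations: cell update relations in terms of x, I, c, and s; garbled in the source]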
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
– Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
FPGA Tools for DSP Systems
Design
• Higher level tools are raising the level of
abstraction.
• Allows non-hardware engineers (algorithm
designers) to get a first look at hardware.
• System Generator
– Simulink to Hardware
• C-to-Gates tools
– C or “higher” level languages to gates
System Generator
System Level Modeling & Simulation Framework
Work in the language of your problem
HDL
C
HDL Simulation Flow
[Figure: DSP development flow and System Generator flow]
1. Develop algorithm & system model (Simulink MDL)
2. Automatic code generation (RTL VHDL & cores, HDL test bench, test vectors)
3. Xilinx implementation flow (bitstream)
Download to FPGA
Configurable MIMO-OFDM
Transmitter
[Figure: System Generator model of the configurable MIMO-OFDM transmitter. Blocks: packet controller; packetization and configurable STBC encoding; pilot insertion and data loading; FFT time shared across antennas; add cyclic extension/block shaping; spatial demultiplexing and interpolation; clock generator (SampleClk, BaudClk). Inputs: DataIn, DataEnable, DataDone. Outputs: RealOut1/ImagOut1 to RealOut4/ImagOut4.]
Resource sharing (folding factor)
Ratio of System clock rate to symbol rate > 8 needed for a 4 transmit antenna system
MIMO Receiver Architecture
[Figure: four receive chains (Rx 1 to Rx 4), each with packet detection, strip CP, input FIFO, FFT, and output FIFO. Shared blocks: combine PD, block boundary detection, coarse CFO estimate, CFO estimator, CFO compensator, pilot based CFO estimator, packet controller (preamble/payload), channel estimator, MIMO decoder matrix (MMSE, etc.), MIMO decoder FIFO, and MIMO decode producing soft decisions. The front end processes samples at the sample clock rate; the later stages process samples at the system clock rate.]
MIMO-OFDM Receiver
[Figure: System Generator model of the 4x4 MIMO-OFDM receiver. Blocks: packet detection; fine timing acquisition; carrier frequency offset correction (input buffer); cyclic prefix removal; FFT; channel estimation (Ch_tx1rx1 to Ch_tx4rx4); weight matrix computation (wreal_i_j, wimag_i_j); MIMO decoder; output FIFO; clock generator (SampleClk, BaudClk). Inputs: RealIn1/ImagIn1 to RealIn4/ImagIn4, Reset. Outputs: SoftDecReal1/SoftDecImag1 to SoftDecReal4/SoftDecImag4, PacketDetect, ValidOut.]
Channel Estimation
[Figure: SysGen block diagram of the 4x4 channel estimator. Inputs: Rxreal1..4, Rximag1..4, ValidData, Addr, ChEstPilots, ReadAddr. Outputs: Chreal1..16, Chimag1..16. Blocks: sixteen ChEst blocks (Tx1-Rx1 through Tx4-Rx4), training symbol sources for Tx1 to Tx4 (channel estimation pilots), 4x4 channel estimation memory (single port RAMs), control signals, input FIFO, and constant multipliers (x 0.3535). Signal types are mostly Fix_16_10 and Fix_32_20.]
Packet Detection
DecisionMetric = (CorrelationPeak >= 0.5(AveragePower))
[Figure: SysGen packet detection block. Inputs: RealIn, ImagIn, BaudClk. Outputs: PacketDetect, CorrMetric, AvePower. Blocks: Delay (z^-32), Correlate (complex multiply), Power Calculator, Complex Sliding Window Averager, Sliding Window Averager, Squarer, Shift (X >> 1), Relational (a > b).]
Schmidl and Cox algorithm for packet detection and coarse carrier frequency offset estimation.
T. M. Schmidl, D. C. Cox, “Low Overhead Low Complexity Synchronization for OFDM”, ICC 1996, Vol 3, pp 1301-1306.
[Figure: Schmidl and Cox detector. The received signal r(n) is multiplied by the conjugate of a copy delayed by D samples (z^-D) and summed (C) to give c(n); the signal power is summed (P) to give p(n); the metric is $m(n) = |c(n)|^2 / (p(n))^2$. The preamble consists of identical halves of 1 OFDM symbol.]
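The detection rule above can be sketched in a few lines of Python. This is only an illustrative floating-point model, not the SysGen design: the function names, the toy preamble, and the choice to compare the correlation magnitude |c(n)| against 0.5 p(n) are assumptions made for the example.

```python
import cmath

def sliding_metrics(r, half_len):
    """Correlation c(n) and power p(n) over a window of half_len samples."""
    corr, power = [], []
    for n in range(len(r) - 2 * half_len + 1):
        first = r[n:n + half_len]
        second = r[n + half_len:n + 2 * half_len]
        # c(n): conjugate of the earlier half times the later half
        corr.append(sum(a.conjugate() * b for a, b in zip(first, second)))
        # p(n): energy of the later half
        power.append(sum(abs(b) ** 2 for b in second))
    return corr, power

def detect_packet(r, half_len, threshold=0.5):
    """First index n with |c(n)| >= threshold * p(n), or None if there is none."""
    corr, power = sliding_metrics(r, half_len)
    for n, (c, p) in enumerate(zip(corr, power)):
        if p > 0 and abs(c) >= threshold * p:
            return n
    return None

# Toy input (hypothetical): silence, a preamble of two identical halves, silence.
half = [cmath.exp(0.7j * k * k) for k in range(8)]
rx = [0j] * 10 + half + half + [0j] * 10
corr, power = sliding_metrics(rx, 8)
start = detect_packet(rx, 8)
```

With a threshold of 0.5 the detector fires while the window is still sliding into the preamble, which is why a separate fine timing stage follows packet detection in the receiver chain.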
Two Branch CFO estimation using
Schmidl and Cox algo
[Figure: SysGen two-branch correlator. Inputs: RealIn 1, ImagIn 1, RealIn 2, ImagIn 2, BaudClk, Rst. Outputs: CorrMetric_real, CorrMetric_imag, AvePwr. Blocks: Delay (z^-32), Complex Multiply, Complex Sliding Window Averager, Magnitude-Squared, Sliding Window Averager, AddSub (a + b), Slice, Reinterpret.]
Combine the metric from both antennas.
Carrier frequency offset causes a linearly increasing rotation in the time domain (a sample $Y$ becomes $Y e^{j\theta}$).
Carrier Frequency Offset
Estimation
• Pre-FFT
– Uses a dedicated preamble or symbol for CFO estimation
• Post-FFT using channel estimation pilots
– Uses channel estimation training symbols
• Post-FFT CFO Tracking
– Needs continuous pilots during payload symbols
• CFO Estimation using Cyclic Prefix
– Works well when you have a lengthy cyclic prefix
– Examples: WiMax, 3GPP-LTE, DVB-T/H
– Does not need preamble or pilot support
Pre-FFT Carrier Frequency Offset
Estimation
[Figure: SysGen pre-FFT CFO estimator. Inputs: RealIn 1, ImagIn 1, RealIn 2, ImagIn 2, Baud_clk, Rst, BBD. Output: CFO_Est. Blocks: Packet Detection (outputs CorrMetric_real, CorrMetric_imag, AvePwr), Truncate, Convert, CORDIC ATAN (inputs x, y; outputs mag, atan), CMult (x 0.003906), Rising edge detector, Register, Delay.]
The angle of the correlation metric is
proportional to the carrier frequency offset.
Right-size the number of bits before the
CORDIC operation.
CORDIC ATAN from the Xilinx Math library
calculates the angle.
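As a rough floating-point illustration of this step, the sketch below sums the delayed correlation over all receive branches and converts its angle to a CFO value. It assumes a preamble with two identical halves of `half_len` samples, so that the angle equals 2π·ε·half_len/n_fft for an offset ε measured in subcarrier spacings; `math.atan2` stands in for the CORDIC ATAN block, and all names are hypothetical.

```python
import cmath
import math

def coarse_cfo(branches, start, half_len, n_fft):
    """CFO estimate in units of the subcarrier spacing.

    branches: one list of complex samples per receive antenna.
    start:    index of the first sample of the repeated preamble.
    """
    acc = 0j
    for r in branches:
        for m in range(half_len):
            # Summing over branches combines the metric from all antennas
            acc += r[start + m].conjugate() * r[start + m + half_len]
    theta = math.atan2(acc.imag, acc.real)  # stands in for the CORDIC ATAN
    return theta * n_fft / (2 * math.pi * half_len)

def apply_cfo(samples, cfo, n_fft):
    """Rotate sample n by 2*pi*cfo*n/n_fft radians (helper for the example)."""
    return [x * cmath.exp(2j * math.pi * cfo * n / n_fft)
            for n, x in enumerate(samples)]

# Two antennas see the same preamble through different (made-up) channel gains.
half = [cmath.exp(0.7j * k * k) for k in range(8)]
preamble = half + half
ant1 = apply_cfo(preamble, 0.3, 16)
ant2 = apply_cfo([0.5j * x for x in preamble], 0.3, 16)
estimate = coarse_cfo([ant1, ant2], 0, 8, 16)
```

Because the angle is only known modulo 2π, this kind of estimator is unambiguous only while |ε|·half_len/n_fft stays below one half.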
[Equation lost in extraction; surviving symbols: a hat accent, $\theta$, $\pi$, $N_s$, 2, 2.]
Post-FFT CFO Estimation and
tracking
Location of channel estimation
training symbols for Antenna 1
for a 2 antenna MIMO system
A subset of channel estimation
training symbols is used for CFO
estimation
Angular rotation on symbol 1
Angular rotation on symbol 2
$\theta(k)$: proportional to CFO
$\hat{\epsilon} = \dfrac{\mathrm{mean}(\theta(k))}{2\pi\,(1 + N_{CP}/N_s)}$
CFO causes a linear
rotation every sample in
the time domain.
CFO causes a constant
rotation on all subcarriers
in the frequency domain.
This rotation increases
from OFDM symbol to
symbol and can be used
to estimate CFO.
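A minimal sketch of this idea, assuming the same training value is sent on each pilot subcarrier in two consecutive OFDM symbols and that the channel does not change between them: the per-subcarrier rotation θ(k) is averaged and divided by 2π(1 + n_cp/n_fft), the phase advance per unit of normalized CFO over one symbol plus its cyclic prefix. The function name and the test values are hypothetical.

```python
import cmath
import math

def post_fft_cfo(sym1, sym2, n_fft, n_cp):
    """CFO in subcarrier spacings from the rotation between two OFDM symbols.

    sym1, sym2: FFT outputs on the same pilot subcarriers of consecutive symbols.
    """
    # theta(k): rotation of subcarrier k from symbol 1 to symbol 2
    thetas = [cmath.phase(b * a.conjugate()) for a, b in zip(sym1, sym2)]
    mean_theta = sum(thetas) / len(thetas)
    return mean_theta / (2 * math.pi * (1 + n_cp / n_fft))

# Synthetic pilots: symbol 2 equals symbol 1 rotated by the CFO-induced phase.
n_fft, n_cp, cfo = 64, 16, 0.02
pilots1 = [cmath.exp(1j * k) * (1 + 0.1 * k) for k in range(8)]
rotation = cmath.exp(2j * math.pi * cfo * (n_fft + n_cp) / n_fft)
pilots2 = [p * rotation for p in pilots1]
estimate = post_fft_cfo(pilots1, pilots2, n_fft, n_cp)
```

Averaging the angles directly is adequate only for small residual offsets; near ±π the individual angles can wrap, in which case averaging the complex products before taking the angle is the safer choice.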
Carrier Frequency Offset
Correction
[Figure: SysGen CFO correction block. Inputs: RealIn 1..4, ImagIn 1..4, CFO_Est, FFT_Start, CFO_Est_valid, Reset. Outputs: RealOut 1..4, ImagOut 1..4. Blocks: DDS (inputs freq_off, Enable, Reset; outputs cos_out, sin_out), four Complex Multiply blocks, Counter, Relational, Logical, Negate, Rising edge detector, CMult (x 0.01563), Delay. Signal types are mostly Fix_16_10.]
Direct digital synthesizer (DDS) from
the Xilinx DSP SysGen library.
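The correction step can be modeled in software as a phase accumulator followed by one complex multiply per sample. The sketch below is a floating-point stand-in written for illustration, not the Xilinx DDS core: `dds` and `correct_cfo` are hypothetical names, and the CFO is assumed to be expressed in subcarrier spacings for an `n_fft`-point system.

```python
import cmath
import math

def dds(cfo, n_fft):
    """Yield (cos, sin) of a phase that advances 2*pi*cfo/n_fft per sample."""
    step = 2 * math.pi * cfo / n_fft
    phase = 0.0
    while True:
        yield math.cos(phase), math.sin(phase)
        phase = math.fmod(phase + step, 2 * math.pi)  # keep the accumulator bounded

def correct_cfo(streams, cfo, n_fft):
    """De-rotate every receive stream by multiplying with (cos - j*sin)."""
    corrected = []
    for r in streams:
        osc = dds(cfo, n_fft)
        out = []
        for x in r:
            c, s = next(osc)
            out.append(x * complex(c, -s))
        corrected.append(out)
    return corrected

# Round trip: impose a known offset on a test stream, then remove it.
clean = [complex(k, -k) for k in range(32)]
rx = [x * cmath.exp(2j * math.pi * 0.25 * n / 16) for n, x in enumerate(clean)]
fixed = correct_cfo([rx], 0.25, 16)[0]
```

All streams share the same oscillator settings because every receive antenna is driven by the same local oscillator and therefore sees the same offset.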
Design methodology issues
• FPGA tools
– Where to from here?
• C-to-gates
– Higher level design languages to gates
– Raising the level of abstraction
End of Roadmap for the
Von Neumann Model
• CPUs are as smart as they can be! [Plot: SPECInt92/MHz versus MHz. Source: Ronen [2001]]
• Spot the CPU! [Die photo: TI 6416 with CPU, L1 $ and L2 $. Source: Agarwala [2002]]
• Clock frequency scaling: absolute power limits. With Moore’s law you also get leakage! Source: Borkar [1999]
• Divide and conquer: multi-core arrays. [Figure: 6x6 GALS Processor Array. Source: Zu & Baas [2006]]
• 1945-2005: sequential programming
• 2005-????: concurrent programming
Merging Mindsets:
Software Design vs. Hardware Design
[Figure: classes A (with start()), B, C and D alongside resources resourceA, resourceB and resourceC.]
• Software design
– Events
– Protocols
– Ordering
– Sequential execution
– Encapsulation
– Abstraction
– Portability
– Re-use
• Hardware design
– Implementation detail
– Control logic
– Interface glue
– Concurrency
– Communication
– Architecture
– Clocks
– Signals
– Timing
Combining the strengths of both paradigms can bring about a radical
improvement in hardware/software system design productivity.
Objective for a New Methodology:
reduce design cost (by a lot)
• Quality of result (QoR) is not a design goal!
– Performance, power, BOM cost budgets make QoR a design constraint
• The real objective is to meet the QoR target and minimize:
– Non-recurring engineering costs (NRE)
– Time-to-market (TTM)
• The new methodology should save on design cost by enabling
– Design of portable, retargetable, composable IP blocks
– Rapid design space exploration and system composition
[Figure: total design cost (NRE $, TTM) versus QoR (performance/$, performance/W) for the traditional HDL flow and the new methodology; annotations: abstraction, profit, abstraction cost.]
‘C’ or higher level language to
Gates
• The design community is interested in higher-level design
methodologies such as C-to-gates.
• ESL (electronic system level) tools and design
methodologies are being explored.
• However, extracting all the concurrency from a
sequential description is not an easy problem.
Actor/Dataflow Programming Model
[Figure: actors with encapsulated state and guarded atomic actions, linked by point-to-point, buffered token-passing connections.]
• A well-known and researched model for concurrent systems
– Edward Lee et al. (UC Berkeley)
– Arvind et al. (MIT)
• Broadly applicable to heterogeneous HW/SW systems
• Actors are described in the CAL language (UC Berkeley)
– Open source simulator available from SourceForge
– Under consideration as reference model for MPEG
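To make the model concrete, here is a small Python sketch of an actor with encapsulated state, guarded atomic actions and buffered token-passing connections. It is not CAL and does not follow any particular simulator; the `Actor`, `BlockSum` and `run` names and the trivial scheduler are invented for this illustration.

```python
from collections import deque

class Actor:
    """Holds private state and a list of (guard, action) pairs."""
    def __init__(self):
        self.actions = []

    def fire(self):
        """Run the first action whose guard holds; report whether one ran."""
        for guard, action in self.actions:
            if guard():
                action()  # each action runs to completion (atomic)
                return True
        return False

class BlockSum(Actor):
    """Sums `block` input tokens, then emits the total as one output token."""
    def __init__(self, inp, out, block):
        super().__init__()
        self.inp, self.out, self.block = inp, out, block
        self.total, self.count = 0, 0  # encapsulated state
        self.actions = [
            (lambda: self.count == self.block, self.emit),
            (lambda: self.count < self.block and len(self.inp) > 0, self.consume),
        ]

    def consume(self):
        self.total += self.inp.popleft()
        self.count += 1

    def emit(self):
        self.out.append(self.total)
        self.total, self.count = 0, 0

def run(actors):
    """Fire actors until no guard is satisfied anywhere."""
    while any(actor.fire() for actor in actors):
        pass

# Point-to-point buffered connections are modeled as FIFOs of tokens.
source = deque([1, 2, 3, 4, 5, 6])
sink = deque()
run([BlockSum(source, sink, 3)])
```

Because actors interact only through their FIFOs, the same description could in principle be scheduled sequentially in software or mapped to concurrently running hardware blocks.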
Conclusion
• FPGAs are finding wide use in infrastructure
communication systems and signal processing
systems.
• FPGAs are an efficient choice for exploring VLSI
architectures.
• FPGA tools are raising the level of abstraction so that
algorithm designers can explore h/w architectures
without learning “h/w design tools/languages”.
Questions?

rao-vlsi-comm-09 (1).ppt

  • 1.
    WIRELESS COMMUNICATIONS From Systemsto Silicon Raghu Rao Wireless Systems Group, Xilinx Inc.
  • 2.
    R. M. Rao,2008 2 Agenda • Introduction to Wireless communications – Systems design and considerations • The wireless environment • Link budget • MIMO and OFDM Systems – High level view of wireless communication systems • Mobile WiMax, an example of wireless comm system, • Hardware/software partitioning • PHY/MAC etc. • The Platform FPGA • Overview of FPGAs and FPGA tools – Building DSP sub-systems on FPGAs – Digital baseband • FPGA tools and design methodology
  • 3.
    R. M. Rao,2008 3 Communications Roadmap • Key markets • Core DSP technologies – OFDM – MIMO • IP Network is key • Enables new approaches to – QoS management – Robustness – Capacity
  • 4.
    R. M. Rao,2008 4 Wireless Environment • Multipaths caused by reflections from various objects.
  • 5.
    R. M. Rao,2008 5 Modeling the Channel • As the mobile moves through the environment, the field strength varies due to : – Free space path loss – Long term (slow) fading – Short term (fast) fading log(distance) Signal Level (dB) pathloss long term fading short term fading
  • 6.
    R. M. Rao,2008 6 Doppler • Changes in the received carrier frequency due to the relative motion of the mobile to the base station • f= fd = (v/l) cos(q) – for f=900 MHz, v = 70 MPH (112 km/h) – fD-max = v/l = 93.3 Hz q D=v. t
  • 7.
    R. M. Rao,2008 7 Delay Spread • Measure of the time distribution of power in the channel impulse response – Typical office 25 ns to 60 ns – Large Lobbies and atria: 100 ns – Warehouse and factory floors: 100 ns to 200 ns – Delay spreads are up to 10 microseconds in cellular environments • Greater than 3 msec in urban areas • 0.5 ms in suburban and open areas
  • 8.
    R. M. Rao,2008 8 Exponential Power Delay Profile • If the delay spread of the channel is larger than the symbol interval we will see multiple paths in our channel. • Leads to inter-symbol interference (ISI). • Leads to a frequency selective channel. • Average energy of the channel impulse response follows an exponential power-delay profile.
  • 9.
    R. M. Rao,2008 9 Coherence Bandwidth • Maximum frequency bandwidth for which the signals are still considered to be correlated. • Bc in Hz = 1/(2ptrms) when considering amplitude correlation (correlation coefficient = 0.5) • trms is the rms-delay-spread of the channel
  • 10.
    R. M. Rao,2008 10 Coherence Time • Maximum time period for which the signals are still considered to be correlated. • It is used to characterize the time varying nature of the channel. • Rule of Thumb 9/(16pfm)<Tc<0.423/(fm) – fm is the maximum Doppler frequency – Correlation coefficient = 0.5
  • 11.
    R. M. Rao,2008 11 Link Budget • A link budget is used to compute the range, transmit power, receiver sensitivity and other requirements of the communication system. • In free space the path loss is given by the Friis equation : • Gt , Gr represent transmit and receive antenna gains. Pt , Pr represent the transmit power and receive power. is the wavelength, d is the distance. 2 2 2 (4 ) t t r r PG G P d l p  l
  • 12.
    R. M. Rao,2008 12 Link Budget • Expressing path loss in dB : • Note: is the path loss exponent depending on the environment (2 in free space). • To compute the SNR at the baseband we need to include thermal noise in the signal bandwidth B, and noise figure of the system NF. ( ) ( ) ( ) ( ) 20log( ) ( ).10log( ) 4 r t t r P dB P dB G dB G dB d l  p       ( ) 174 / 10log( ) r P dB dBm Hz NF B SNR     
  • 13.
    R. M. Rao,2008 13 Link Budget • Margin for desired outage taking into account receiver structure and antenna diversity. – Standards specify outage probabilities – WiMax – 90% in the cell, 75% at the boundary of the cell. • Compensation factors for other impairments – Interference from neighbouring cell – Shadow fading, etc. • Diversity helps achieve the outage probability (or reduces the margin for outage) without increase in transmit power.
  • 14.
    R. M. Rao,2008 14 Diversity • Diversity provides the receiver with multiple looks at the transmitted signal. • Prob(all channels in a fade) << Prob(any 1 channel in a fade) • Diversity improves link reliability. 0 20 40 60 80 100 120 140 160 180 200 -20 -15 -10 -5 0 5 10 Time Signal Level (dB) Channel 1 Channel 2 Combined channel
  • 15.
    R. M. Rao,2008 15 Diversity Techniques • Spatial Diversity – Antennas “sufficiently spaced” apart (> ½ wavelength). – Will result in an independent channel response and provide another look at the transmitted signal. • Frequency Diversity – Transmit over multiple carrier frequencies. – If the frequencies are “sufficiently far” (coherence bandwidth) apart the channel response will be different on the different frequencies. • Time Diversity – Channel is continuously changing. – Transmit signals “sufficiently spaced” (coherence time) apart in time so the 2nd transmission “sees” a different channel compared to the first one. • Polarization Diversity – Signals transmitted on two orthogonal polarizations exhibit uncorrelated fading statistics.
  • 16.
    R. M. Rao,2008 16 MIMO Systems Tx Antenna 1 Tx Antenna 2 Rx Antenna 1 Rx Antenna 2 Tx Antenna M Rx Antenna N H • MIMO systems: • Multiple Antennas at the transmitter and receiver. • 3 types of MIMO Systems: • STBC MIMO systems • Diversity gain. • Spatial Multiplexing MIMO systems • Capacity/throughput gain. • Feedback MIMO systems • Higher performance thru interference reduction. • MISO (multiple input single output) Systems: • STBC can be used with just 1 receive antenna. • Provides diversity gain. • To achieve array gain, need knowledge of channel at the transmitter (feedback).
  • 17.
    R. M. Rao,2008 17 Spatial Multiplexing • A spatial multiplexing MIMO system transmits different data symbols from each transmitter. • The signals from each transmitter combine over the air and are received by multiple receive antennas. • SM systems have a rate=M (num transmit antennas). The diversity order depends on the type of encoding and receiver (uncoded SM with ML decoding has diversity order=N (num receive antennas)). MODULATOR MODULATOR MODULATOR MIMO Receiver MIMO Receiver x(t) y(t) z(t) r1(t) = a11x(t)+a12y(t)+a13z(t) r3(t) = a31x(t)+a32y(t)+a33z(t) x(n) y(n) z(n) x(n) y(n) z(n)
  • 18.
    R. M. Rao,2008 18 Spatial Multiplexing Receivers Zero Forcing receiver: 11 h 22 h 21 h 12 h Tx Antenna 1 Tx Antenna 2 Rx Antenna 1 Rx Antenna 2 1 11 1 12 2 1 2 21 1 22 2 2 1 11 12 1 1 2 21 22 2 2 1 1 2 2 1 1 11 12 1 2 21 22 2 ˆ ˆ ˆ ˆ y h x h x n y h x h x n y h h x n y h h x n x y x y x h h y x h h y W                                                                        Significant increase in noise when the channel is in a deep fade. For ZF receivers 1 W H  
  • 19.
    R. M. Rao,2008 19 Spatial Multiplexing Receivers • MMSE MIMO Decoders: – Cancels interference and minimizes noise. – Minimizes the over all error (mean squared error). 2 ˆ [( ) ] E x x  1 H H MMSE M s M M W H H I H E SNR         
  • 20.
    R. M. Rao,2008 20 Spatial Multiplexing Receivers • Zero-Forcing • MMSE • Successive Interference cancellation receivers • Sphere detectors (sub-optimal Maximum Likelihood)
  • 21.
    R. M. Rao,2008 21 Transmit Diversity • Space Time Block Code (STBC) – 2 Antenna STBC also known as “Alamouti Code”. – Improves BER/SER performance. Information Source Constellation Mapper Alamouti ST block code h1 h2 Symbol Period 2 Symbol Period 1 STBC Decoder ML Decision ML Decision Soft decision for c1 Soft decision for c2 1 1 1 2 2 r hc h c   * * 2 1 2 2 1 ( ) ( ) r h c h c   
  • 22.
    R. M. Rao,2008 22 STBC Decoder 1 1 2 1 1 * * * * 2 2 1 2 2 r h h c n r h h c n                            r Hc n   Decoder: 2 2 1 1 1 2 * 2 2 ˆ ( ) 0 ˆ ( ) 0 H H c H r H Hc n c n c h h c n                   In matrix form the received signal is: Low complexity decoder. Just 2 complex mults per symbol for a 2 antenna system (and grows linearly with block length/num antennas).
  • 23.
    R. M. Rao,2008 23 Other MIMO schemes • Achieving high rate high diversity MIMO systems is an area of active research. • There are many suboptimal STBC schemes that improve the rate but reduce the diversity order. • There are also combinations of spatial multiplexing and STBC schemes. • One such scheme is 2 (or more) Alamouti’s in parallel.
  • 24.
    R. M. Rao,2008 24 Stacked Alamouti Information Source Constellation Mapper Alamouti ST block code Constellation Mapper Alamouti ST block code Data Stream 1 Data Stream 2 Interference Cancellation and ML Decision C1 C2 Data Stream 1 Data Stream 2 r1 r2 Receiver for Interference Cancelling STBC Transmitter for Interference Cancelling STBC • Interference Cancelling STBC • 2 Alamouti’s in parallel • Rate 2 system • Diversity order = N*(M-K+1) – K : co-channel users – N : transmit antennas per user. – M : receive antennas • Requires N*(K-1)+1 antennas at the receiver to suppress K-1 interferers.
  • 25.
    R. M. Rao,2008 25 Orthogonal Frequency Division Multiplexing (OFDM) Frequency Magnitude OFDM divides a frequency selective channel into a number of flat fading channels
  • 26.
    R. M. Rao,2008 26 OFDM Modulation QAM Mapping IFFT Cyclic Prefix S/P P/S D/A and RF (a) RF and A/D Strip cyclic prefix S/P FFT P/S QAM decoding (b) FEQ • A QAM symbol is modulated onto each subcarrier • IFFT/FFT are used for efficient modulation and demodulation Frequency Domain Time Domain Time Domain Frequency Domain
  • 27.
    R. M. Rao,2008 27 Combating Multipath • Sampling at instant Ts all channels experience the same channel and there is no ICI Multipath components tmax Sampling Instant Ts OFDM Symbol CP Constructing the cyclic prefix (CP)
  • 28.
    R. M. Rao,2008 28 MIMO and OFDM • MIMO – Multiple Input Multiple Output Communication System. Employs multiple antennas at both transmitter and receiver. • OFDM – Orthogonal Frequency Division Multiplexing. Breaks up a broadband channel into many parallel narrowband channels (subcarriers). • MIMO-OFDM – A Combination of MIMO and OFDM. Appears like many parallel MIMO systems on orthogonal subcarriers.
  • 29.
    R. M. Rao,2008 29 MIMO-OFDM System OFDM TRANSMITTER 1 OFDM TRANSMITTER N OFDM DEMODULATOR 1 OFDM DEMODULATOR N RICH SCATTERING ENVIRONMENT MIMO DECODER Each transmitter is an independent OFDM modulator. The source symbols could be space-time block coded or just QAM modulated for spatial multiplexing. Each receiver is an OFDM demodulator combined with a MIMO decoder to invert the channel on each subcarrier and extract the source symbols.
  • 30.
    R. M. Rao,2008 30 Agenda • Introduction to Wireless communications – Systems design and considerations • The wireless environment • Link budget • MIMO and OFDM Systems – High level view of wireless communication systems • Mobile WiMax, an example of wireless comm system, • Hardware/software partitioning • PHY/MAC etc. • The Platform FPGA • Overview of FPGAs and FPGA tools – Building DSP sub-systems on FPGAs – Digital baseband • FPGA tools and design methodology
  • 31.
    R. M. Rao,2008 31 802.16/802.16e • The 802.16 WirelessMAN standard includes requirements for operation in : – Line Of Sight (LOS), 10-66 GHz for fixed wireless systems. – Non Line Of Sight (NLOS), <11 GHz for fixed wireless systems. • 802.16e (Mobile WiWax) adds enhancements for mobility in the <11 GHz licensed and unlicensed bands. – Operation in mobile mode is limited to licensed bands between 2 GHz and 6 GHz.
  • 32.
    R. M. Rao,2008 32 Scalable OFDMA parameters Parameters Values System bandwidth (MHz) 1.25 5 10 20 FFT size (NFFT) 128 512 1024 2048 Sampling Frequency (Fs, MHz) 1.4 5.6 11.2 22.4 Sample Time (1/Fs ns) 714.28 178.57 89.28 44.64 Subcarrier spacing 10.94 KHz Useful Symbol time 91.4 us Guard interval 11.4 us OFDMA symbol time 102.9 us
  • 33.
    R. M. Rao,2008 33 Link Budget Downlink Uplink Transmit Power 10 Watts = 40dBm (max=20 Watts) 200 mW = 23dBm (max=200 mW) Antenna Height 32 meters 1.5 meters Antenna Gain 15 dBi (BS) -1 dBi (mobile) EIRP 55 dBm (approx) 22 dBm # occupied subcarriers 840 out of 1024 840 out of 1024 Power/subcarrier 28 dBm 3.44 dBm Noise Figure 9 dB (at mobile) 4 dB (at BS) Total margin for interference, shadow fading, .. (75% coverage at cell edge, 90% overall) 20 dB 20 dB BS to BS distance 2.8 kms 2.8 kms SNR Required (Modulation – QPSK 1/8, (repetition code = 4)) (BER=10^-6 after FEC) -3.31 dB -2.5 dB Rx sensitivity -100.7 dB -111.1 dB Max allowable path loss 136.4 dB 133 dB
  • 34.
    R. M. Rao,2008 34 Time Division Duplexing • 802.16e can be deployed in TDD and FDD environments. • Initial certification profiles are only for TDD. • The DL subframe and UL subframe lengths are adjustable. • TDD assures channel reciprocity. Frame (j-2) Frame (j+2) Frame (j+1) Frame (j) Frame (j-1) Downlink subframe Uplink subframe Adaptive TTG : Transmit- Receive transition gap RTG : Receive- Transmit transition gap
  • 35.
    R. M. Rao,2008 35 OFDMA Frame Structure DL-MAP – Downlink MAP : downlink allocations UL-MAP – Uplink MAP : uplink allocations FCH – Frame control header : contains information about the DL-MAP FC H FC H Downlink (DL) Subframe Uplink (UL) Subframe TTG RTG OFDMA Symbol Number Subchannel logical number Preamble DL-MAP UL-MAP DL Burst SS1 DL Burst Broadcast DL Burst Multicast DL Burst SS2 DL Burst SS3 DL Burst SS1 (From BS2) DL Burst SS4 Preamble DL-MAP UL Burst SS1 UL Burst SS2 UL Burst SS3 UL Burst SS4 Ranging subchannel
  • 36.
    R. M. Rao,2008 36 Data rates for SIMO/MIMO configurations Source: WiMax Forum 64 QAM with 5/6 CTC
  • 37.
    R. M. Rao,2008 37 Baseband Transmission Model • OFDM receiver provides estimates of – Channel hn,i(t) – Frequency offset W0 – Sample timing T' – OFDM symbol timing OFDM Transmitter Channel Inner Receiver Outer Receiver ai,k s(t) r(t) ADC Resulting Channel hi(t) Timing Delay d(t-eT') s(t) hn,i(t) Timing Delay W0(t) Noise n(t) T' r(n) r(n)
  • 38.
    R. M. Rao,2008 38 Generic OFDM Transmitter • Figure shows a generic MIMO OFDM Tx – MIMO not an element of 802.11a, but it is in 802.11n, 3GPP-LTE and 802.16e MAC Source Coding e.g. LDPC Space-Time Encoder Beamforming IFFT Append CP Insert Pilots CFR DUC DPD DAC RF PA IFFT Append CP Insert Pilots CFR DUC DPD DAC RF PA
  • 39.
    R. M. Rao,2008 39 OFDM Receiver Architecture • Figure illustrates architecture for generic OFDM Rx • Details will vary as a function of – Packet-based versus broadcast transmission – Existance of a preamble (or not) in the waveform ADC DAC DDC Sample Clock Adj. Course Freq. Offset Correction Symbol Timing CP Removal FFT Extract Pilots Fine Sample Clock Adj Fine Freq. Offset Adj. Freq. Domain Equalizer Channel Estimation Power Est. Extract Preamble Channel Decoding, e.g. LDPC Medium Access Controller To/From Network
  • 40.
    R. M. Rao,2008 40 Agenda • Introduction to Wireless communications – Systems design and considerations • The wireless environment • Link budget • MIMO and OFDM Systems – High level view of wireless communication systems • Mobile WiMax, an example of wireless comm system, • Hardware/software partitioning • PHY/MAC etc. • The Platform FPGA – Overview of FPGAs and FPGA tools – Building DSP sub-systems on FPGAs – Digital baseband • FPGA tools and design methodology
  • 41.
    R. M. Rao,2008 41 Digital Receiver Architecture: Abstracted Architecture • Common model of abstraction for digital receiver is inner/outer receiver Ø Frequency Offset Estimation/Correction Ø Sample Clock Offset Correction Ø Channel Estimation/Equalization Ø Frame detection Ø AGC Ø Successive Interference Cancellation Ø Space-Time-Coding Ø IFFT/FFT Ø Per sub-carrier processing Inner Receiver Receiver Abstraction Outer Receiver Control, Protocol and Link Layer processing Digital IF Processing q Beamforming q QRD-RLS Ø Up-Conversion Ø Down-Conversion Ø Channelizer Ø Fast AGC Ø Channel Coding q LDPC q TPC q CTC q Viterbi q (De-) Interleave Ø Medium Access Control (MAC) Ø Link Layer Processing Ø System Initialization, Control and Monitoring Ø Application Ø Ethernet Ø PCI Express Ø SRIO Ø CPRI Ø OBSAI
  • 42.
    R. M. Rao,2008 42 Receiver Abstraction and Projection on to Platform FPGA Receiver Function Characteristics FPGA Platform Comments Digital IF Processing Ø MAC Intensive SX Ø DSP48 main requirement Inner Receiver Ø MAC intensive Ø Some functions LUT intensive CORDIC in QRD-RLS Ø FFT processing for OFDM Ø Correlation processing for timing Ø Per-carrier complexity processing (MIMO-OFDM) SX/LX Ø DSP48 leveraged FFT Ø FPGA fabric for CORDIC FFT Outer Receiver Ø Symbol rate tasks Ø Channel coding LX Ø ACS/ACSO dominated by low bit precision add/multiplexors Good match for fabric Lots of memory required Control/ Protocol Ø Gigabit connectivity Ø Linux Ø OS “heavy” tasks Ø TCP/IP FX Ø Embedded PPC used Ø Rocket IO for PCI Express SRIO Num. Sub-carriers TX RX N N   SX/LX Receiver Abstraction LX FX SX FPGA product portfolio Tailored for various processing Tasks in communications receiver
  • 43.
    R. M. Rao,2008 43 Digital Frontend Digital upconversion (downconversion) Crest factor reduction Digital pre-distortion
  • 44.
    R. M. Rao,2008 44 Serial Gigabit OBSAI/CPRI Proprietary serial backplane Inter-chip connectivity Embedded Software MAC (Media Access) Decision oriented tasks CORBA RTOS NBAP SCA (JTRS radios) Connectivity DAC DAC ADC ADC Logic & IO OBSAI/CPRI SRIO AD/DA interface EMIF DUC,DDC CFR,DPD RACH Searcher OFDM PHY TCC MIMO High Performance Processing High MIPs tasks Radio PHY Supported by embedded DSP tiles, distributed memory, block memory and logic fabric SRIO EMIF The Platform
  • 45.
    R. M. Rao,2008 45 Virtex-4/5 FPGA Arhitecture High-Level View • FPGA family with 3 members tailored for specific classes of processing – SX: DSP – LX: Logic centric – FX: Full featured • Embedded PowerPC hard IP • Giga-bit serial connectivity • DSP processing tiles “DSP48”
  • 46.
    R. M. Rao,2008 46 Virtex-5 FPGA Platform • 2 slices per CLB, 4 LUTs per CLB • Can be configured as a shift register • Can be configured as distributed memory Can be configured as RAM Can be configured as a shift register
  • 47.
    R. M. Rao,2008 47 Arithmatica Parallel Counter 20% Faster Performance and Uses Less Area Integrated Cascade Routing Enables Scalable Performance Arithmatica A+Adder 20% Faster Than Other Implementations Pipeline Registers Enable 500Mhz Performance Scalable 500MHz Performance Not Possible Using Standard Cell Libraries and Standard Cell Design Flow Virtex-4 DSP48 Slice
  • 48.
    R. M. Rao,2008 48 Z Y X  36 36 48 A B BCIN 18 18 18 P 48 CIN SUB 36 18 18 18 BCOUT 48 ZERO 48 48 PCOUT 48 PCIN 48 18 72 Wire Shift Right By 17b C 48 48 48 To Adjacent DSP48 Tile Register 48 Pipelined Multiplier 3 delay latency 18 18 B A P (PCOUT) LS Word MS Word 48 36b product sign extended to 48b z-3
  • 49.
    R. M. Rao,2008 49 Pipelined Complex 18x18 MPY Ar 18 Bi 18 ‘0’ 48 Ar 18 Bi 18 48 S1 S2 48 sn = Slice n Ar 18 Br 18 ‘0’ 48 Ai 18 Bi 18 48 S3 S4 48 - Pi Pr Register 36 Sign Extension
  • 50.
    R. M. Rao,2008 50 Wide Filters At Full Speed Within the Virtex-4 DSP Slice Column • Systolic N-tap FIR – Scalable N-levels deep implementation – N-levels deep at 500MHz performance • Uses Integrated Pipeline Registers to Synchronize Filter Inputs • Utilizes Input and Output Cascade Routing Build Massively Parallel 512-TAP FIR Filter In a Single Device Achieving 256 GMACCs/s Performance Equivalent Implementation Would Consume 444 Embedded Multipliers and 77,008 LCs And Would Only Achieve ½ The Performance
  • 51.
    R. M. Rao,2008 51 Xilinx FFT IP (4) • FFT fully utilizes FPGA arithmetic hardware resources • FFT viewed as a recursion using a butterfly kernel  b (  b) Phase factors: e-j2pk/N (  b) e-j2pk/N CADD1 CADD2 CMPY • CADD{1|2}: complex adder • CMPY: complex multiplier
  • 52.
    Virtex-4 DSP Slice • The DSP slice is key for implementing high-performance arithmetic • Embedded 18x18 MPY and 48b adder – Butterfly phase rotator – Cross-addition
  • 53.
    Butterfly CMPLX MPY • Complex MPY used in FFT butterfly: $P_r + jP_i = (A_r + jA_i) \times (B_r + jB_i)$ • Optimized to employ Virtex-4 DSP Slice – 4-MPY and 3-MPY options • Complex MPY available as IP module† [Figure: inputs Ar, Ai, Br, Bi feed DSP Slices 1 to 4, producing Pr and Pi] † Available: 6.2i IP Update 2
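The 4-multiplier and 3-multiplier options can be compared with a short sketch. The 3-multiplier factorization below is one standard rearrangement, shown as an assumption; the slide does not say which factorization the IP module uses. It trades one multiplier for three extra additions and one extra bit of operand width.

```python
def cmul4(ar, ai, br, bi):
    """Direct form: 4 real multiplies, 2 real additions."""
    pr = ar * br - ai * bi
    pi = ar * bi + ai * br
    return pr, pi


def cmul3(ar, ai, br, bi):
    """Rearranged form: 3 real multiplies, 5 real additions."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2
```

With integer (fixed-point) operands the two forms give identical results, so the choice is purely an area/operand-width tradeoff.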
  • 54.
    Performance/Parallelism/Area • FPGA: highly parallel computing machine • Achieve performance using functional unit parallelism • Area/throughput tradeoff delivered via Xilinx IP library • Butterfly array to produce high-performance FFT processor • High computation rate using (possibly) hundreds of DSP slices – Allocate resources as appropriate to meet system requirements • Large memory bandwidth using multi-port memory constructed from BRAMs – Mem read BW: 320 x 36 x 500e6 = 5.76 Tbps
  • 55.
    FFT Architecture • For a small number of carriers and modest data rates, a single-butterfly (I)FFT is probably suitable – Small FPGA footprint [Figure: iteration engine between input data and output data, with two data RAMs (Data RAM 0, Data RAM 1) connected through switches and a phase factor ROM]
  • 56.
    Block boundary detection/Fine timing acquisition [Figure: incoming samples pass through a tapped delay line of $z^{-1}$ stages and are correlated against a known sequence; conjugate $(\cdot)^*$, squared-magnitude $|\cdot|^2$, averaging and $\arg$ blocks produce the timing estimate (Timing Est) and the frequency estimate (Freq Est). The known sequence is half an OFDM block; the preamble is 1 OFDM block of repeated data.] F. Tufvesson, O. Edfors, M. Faulkner, “Time and Frequency Synchronization for OFDM using PN-Sequence Preambles”, VTC-1999/Fall, vol. 4, pp. 2203-7, New Jersey, 1999.
  • 57.
    Fine-timing acquisition using a clipped correlator [Figure: System Generator schematic of a bank of correlators (C1 to C7) built from 1-bit correlators, each with a coefficient ROM, a MAC and an address/load FSM clocked by BaudClk, followed by delay stages and a chain of adders.] • Bank of correlators • 1-bit correlator • 10 time-multiplexed correlators • Each 1-bit correlator: 10 slices • Total for clipped correlator: 589 slices • Full-precision correlators: 32 embedded multipliers, 896 flip-flops
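The idea behind a clipped (1-bit) correlator can be sketched as follows. This is a simplified, real-valued model under the assumption that both the received samples and the known sequence are reduced to their sign bits; it does not reproduce the time-multiplexed structure of the schematic, and the names are hypothetical.

```python
def sign_bit(value):
    """Clip a sample to +1 or -1 (the 1-bit representation)."""
    return 1 if value >= 0 else -1


def clipped_correlate(samples, known):
    """Correlate sign bits only, so each multiply reduces to a sign comparison."""
    ref = [sign_bit(v) for v in known]
    hard = [sign_bit(v) for v in samples]
    span = len(ref)
    return [
        sum(h * r for h, r in zip(hard[n:n + span], ref))
        for n in range(len(hard) - span + 1)
    ]


def timing_estimate(samples, known):
    """Offset at which the clipped correlation peaks."""
    metric = clipped_correlate(samples, known)
    return max(range(len(metric)), key=lambda n: metric[n])
```

Dropping the amplitude information is what lets each correlator fit in a handful of slices instead of using embedded multipliers, at the cost of some sensitivity at low SNR.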
  • 58.
    QRD • One of the popular methods of matrix inversion is based on QRD: $H = QR$ • $Q$ is unitary and $R$ is upper triangular • A unitary matrix has a trivial inverse: $Q^{-1} = Q^H$ • An upper triangular matrix can be inverted by back-substitution, so $H^{-1} = R^{-1} Q^H$
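As a small illustration of $H^{-1} = R^{-1} Q^H$, the sketch below inverts an upper triangular $R$ by back-substitution and combines it with the conjugate transpose of $Q$. It assumes $Q$ and $R$ are already available as nested lists and that $R$ has a nonzero diagonal; the helper names are illustrative.

```python
def invert_upper_triangular(r):
    """Solve R x = e_col for each column by back-substitution."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for row in range(n - 1, -1, -1):
            rhs = 1.0 if row == col else 0.0
            acc = rhs - sum(r[row][k] * inv[k][col] for k in range(row + 1, n))
            inv[row][col] = acc / r[row][row]
    return inv


def conjugate_transpose(m):
    """Hermitian transpose; works for real or complex entries."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def invert_from_qr(q, r):
    """H^-1 = R^-1 Q^H for H = Q R."""
    return matmul(invert_upper_triangular(r), conjugate_transpose(q))
```

No general-purpose matrix inversion is needed: the only divisions are by the diagonal entries of $R$.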
  • 59.
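    Givens Rotations • For a 2x1 vector of real numbers: $\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sqrt{a^2+b^2} \\ 0 \end{bmatrix}$, with $c = \frac{a}{\sqrt{a^2+b^2}}$ and $s = \frac{b}{\sqrt{a^2+b^2}}$ • For an NxM matrix, repeat the process 2 cells at a time (entries are updated at each step): $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$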
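A minimal real-valued sketch of the rotation and of its repeated use to triangularize a matrix. The elimination order (column by column, top to bottom) is a choice made for this example, and the function names are illustrative.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0   # nothing to rotate
    return a / r, b / r


def triangularize(matrix):
    """Zero the entries below the diagonal, one Givens rotation at a time."""
    a = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    for col in range(min(n_rows, n_cols)):
        for row in range(col + 1, n_rows):
            c, s = givens(a[col][col], a[row][col])
            # Apply the same rotation to every cell of the two rows.
            for j in range(n_cols):
                top, bot = a[col][j], a[row][j]
                a[col][j] = c * top + s * bot
                a[row][j] = -s * top + c * bot
    return a
```

Each rotation touches only two rows, which is what makes the method map naturally onto an array of small identical cells.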
  • 60.
    Systolic Arrays • Structured arrays with identical cells. Usually a “boundary” cell and an “internal” cell for the QRD process. 1. The boundary cell generates the rotations. 2. The internal cell applies the rotations to all the cells in the row. 3. The systolic array in this figure can handle any matrix up to 3x3. [Figure: triangular systolic array of boundary cells and internal cells]
  • 61.
    Triangularization mode • For QRD of up to a 3x3 matrix we need 3 boundary cells and 3 internal cells. • Boundary cells calculate rotation vectors and internal cells store them. • Data is fed column-wise into the systolic array. • This may have to be staggered depending on the pipelining delays through the boundary cell and internal cell. • The rotation factors for zeroing out cell A(2,1) are stored in cell A(1,2), etc. • Sequence of eliminations (entries are updated at each step): $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$ [Figure: the columns $(a_{11}, a_{21}, a_{31})$, $(a_{12}, a_{22}, a_{32})$ and $(a_{13}, a_{23}, a_{33})$ are fed into the array.]
  • 62.
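    Q-matrix computation mode • $Q^H A = R$ and $Q^H I = Q^H$ • The stored rotations give $\begin{bmatrix} 1 & 0 & 0 \\ 0 & c_{32} & s_{32} \\ 0 & -s_{32} & c_{32} \end{bmatrix} \begin{bmatrix} c_{31} & 0 & s_{31} \\ 0 & 1 & 0 \\ -s_{31} & 0 & c_{31} \end{bmatrix} \begin{bmatrix} c_{21} & s_{21} & 0 \\ -s_{21} & c_{21} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$ • The columns of the identity matrix, $(1, 0, 0)^T$, $(0, 1, 0)^T$ and $(0, 0, 1)^T$, are fed through the array; the outputs are labelled first, second and third column of the Q matrix [Figure: systolic array in Q-matrix mode, with the internal-cell update equations in terms of the stored $c$ and $s$ values.]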
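The relation $Q^H I = Q^H$ can be checked numerically with a small sketch: the rotations that triangularize $A$ are applied to the augmented matrix $[A \mid I]$, which leaves $[R \mid Q^H]$. This is a real-valued, square-matrix software model, not the systolic array itself, and the function names are illustrative.

```python
import math


def qr_by_augmentation(a):
    """Rotate [A | I] into [R | Q^H] using Givens rotations (real, square A)."""
    n = len(a)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        for row in range(col + 1, n):
            x, y = aug[col][col], aug[row][col]
            r = math.hypot(x, y)
            if r == 0.0:
                continue   # entry is already zero
            c, s = x / r, y / r
            for j in range(2 * n):
                top, bot = aug[col][j], aug[row][j]
                aug[col][j] = c * top + s * bot
                aug[row][j] = -s * top + c * bot
    r_mat = [row[:n] for row in aug]
    q_h = [row[n:] for row in aug]
    return q_h, r_mat


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

The identity columns see exactly the same rotations as the data columns, which is why the array can reuse the stored rotation factors in a second pass.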
  • 63.
    Agenda • Introduction to Wireless communications – Systems design and considerations • The wireless environment • Link budget • MIMO and OFDM Systems – High level view of wireless communication systems • Mobile WiMax, an example of wireless comm system, • Hardware/software partitioning • PHY/MAC etc. • The Platform FPGA – Overview of FPGAs and FPGA tools – Building DSP sub-systems on FPGAs – Digital baseband • FPGA tools and design methodology
  • 64.
    FPGA Tools for DSP Systems Design • Higher-level tools are raising the level of abstraction. • This allows non-hardware engineers (algorithm designers) to get a first look at hardware. • System Generator – Simulink to hardware • C-to-Gates tools – C or “higher” level languages to gates
  • 65.
    System Generator • System-level modeling and simulation framework • Work in the language of your problem [Figure labels: HDL, C]
  • 66.
    DSP Development Flow (System Generator Flow) 1. Develop algorithm and system model (Simulink MDL) 2. Automatic code generation (RTL VHDL and cores, HDL test bench, test vectors, used in the HDL simulation flow) 3. Xilinx implementation flow (bitstream), then download to FPGA
  • 67.
    Configurable MIMO-OFDM Transmitter [Figure: System Generator model with the blocks Packet Controller, Packetization and Encoding, Pilot Insertion and Data Loading, FFT, Add Cyclic Extension, Spatial Demultiplexing and Clock Generator (SampleClk, BaudClk); inputs DataIn, DataEnable and DataDone; four complex outputs RealOut1/ImagOut1 to RealOut4/ImagOut4; fixed-point signals such as Fix_16_10 and UFix_6_0.] • Packet controller • Packetization and configurable STBC encoding • Pilot insertion and data loading • Time-shared FFT across antennas • Add cyclic extension/block shaping • Spatial demultiplexing and interpolation • Resource sharing (folding factor): a ratio of system clock rate to symbol rate > 8 is needed for a 4 transmit antenna system
  • 68.
    MIMO Receiver Architecture [Figure: four receive chains (Rx 1 to Rx 4), each with Packet Detection, Strip CP, Input FIFO, FFT and Output FIFO; shared blocks are Combine PD, Block Boundary Detection, coarse CFO estimate (CFO estimator), pilot-based CFO estimator, CFO Compensator, Channel Estimator, MIMO Decoder Matrix (MMSE, etc.), MIMO Decoder FIFO, MIMO Decode producing soft decisions, and a Packet Controller (preamble, payload).] • Part of the chain processes samples at the sample clock rate and part at the system clock rate
  • 69.
    Fine-timing acquisition using a clipped correlator [Figure: System Generator schematic of a bank of correlators (C1 to C7) built from 1-bit correlators, each with a coefficient ROM, a MAC and an address/load FSM clocked by BaudClk, followed by delay stages and a chain of adders.] • Bank of correlators • 1-bit correlator • 10 time-multiplexed correlators • Each 1-bit correlator: 10 slices • Total for clipped correlator: 589 slices • Full-precision correlators: 32 embedded multipliers, 896 flip-flops
  • 70.
    MIMO-OFDM Receiver [Figure: System Generator top-level model; inputs RealIn1/ImagIn1 to RealIn4/ImagIn4 and Reset; outputs SoftDecReal1/SoftDecImag1 to SoftDecReal4/SoftDecImag4, PacketDetect and ValidOut; 16 channel estimates (Ch_tx1rx1 to Ch_tx4rx4) feed the weight matrix (wreal_1_1/wimag_1_1 to wreal_4_4/wimag_4_4).] Blocks: • Packet Detection • Fine Timing Acq • Cyclic prefix removal • Channel Estimation • Weight Matrix Computation • MIMO Decoder • FFT • Carrier Frequency Offset Correction • Output FIFO
  • 71.
    Channel Estimation [Figure: System Generator model with 16 channel estimator blocks (ChEst Tx1-Rx1 to ChEst Tx4-Rx4), training symbol sources for Tx1 to Tx4, single-port RAMs, counters, control signals and constant multipliers (x 0.3535); inputs Rxreal1/Rximag1 to Rxreal4/Rximag4, ValidData, Addr, ChEstPilots and ReadAddr; outputs Chreal1/Chimag1 to Chreal16/Chimag16; data paths in Fix_16_10 and Fix_32_20.] • Channel estimation pilots for Tx1 to Tx4 • 4x4 channel estimation • Memory • Control signals • Input FIFO
  • 72.
    Packet Detection • DecisionMetric = (CorrelationPeak >= 0.5(AveragePower)) • Schmidl and Cox algorithm for packet detection and coarse carrier frequency offset estimation, using identical halves of 1 OFDM symbol [Figure: System Generator model: the input $r(n)$ and its copy delayed by $D$ samples ($z^{-D}$, here $z^{-32}$) are conjugate-multiplied to give $c(n)$; sliding-window averagers form the correlation $p(n)$ and the average power; a squarer, a right shift by 1 and a comparator produce the metric $m(n)$ and the PacketDetect flag, with outputs CorrMetric and AvePower.] T. M. Schmidl, D. C. Cox, “Low Overhead Low Complexity Synchronization for OFDM”, ICC 1996, vol. 3, pp. 1301-1306.
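The detection rule can be sketched in a few lines. This is a floating-point model under assumptions: the correlation and the power are summed (not averaged) over one half-symbol window, and the comparison is made on magnitudes with a threshold of 0.5, following the decision metric on the slide. The exact scaling and squaring used in the schematic are not reproduced, and the names are hypothetical.

```python
def schmidl_cox_metrics(r, half_len):
    """For each offset n, return (P(n), R(n)).

    P(n) correlates a window with the window half_len samples later;
    R(n) is the energy of the later window.
    """
    out = []
    for n in range(len(r) - 2 * half_len + 1):
        p = sum(r[n + m].conjugate() * r[n + m + half_len] for m in range(half_len))
        energy = sum(abs(r[n + m + half_len]) ** 2 for m in range(half_len))
        out.append((p, energy))
    return out


def detect_packet(r, half_len, threshold=0.5):
    """First offset where |P(n)| >= threshold * R(n), or None if never met."""
    for n, (p, energy) in enumerate(schmidl_cox_metrics(r, half_len)):
        if energy > 0 and abs(p) >= threshold * energy:
            return n
    return None
```

Because the metric rises gradually as the window slides onto the repeated halves, the detection point is only a coarse estimate, which is why a separate fine-timing stage follows in the receiver.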
  • 73.
    Two-Branch CFO Estimation Using the Schmidl and Cox Algorithm [Figure: System Generator model: for each of the two antenna inputs (RealIn1/ImagIn1, RealIn2/ImagIn2), a complex multiply of the signal with its copy delayed by 32 samples ($z^{-32}$) feeds a complex sliding-window averager; AddSub blocks sum the two branches; a magnitude-squared block and a sliding-window averager form the power; outputs CorrMetric_real, CorrMetric_imag and AvePwr.] • Combine the metric from both antennas • Carrier frequency offset causes a linearly increasing rotation in the time domain: $Y$ becomes $Y e^{j\theta}$
  • 74.
    Carrier Frequency Offset Estimation • Pre-FFT – Uses a dedicated preamble or symbol for CFO estimation • Post-FFT using channel estimation pilots – Uses channel estimation training symbols • Post-FFT CFO Tracking – Needs continuous pilots during payload symbols • CFO Estimation using Cyclic Prefix – Works well when you have a lengthy cyclic prefix – Examples: WiMax, 3GPP-LTE, DVB-T/H – Does not need preamble or pilot support
  • 75.
    Pre-FFT Carrier Frequency Offset Estimation [Figure: System Generator model: the Packet Detection block outputs CorrMetric_real, CorrMetric_imag and AvePwr; the correlation metric is truncated and passed to a CORDIC ATAN block, whose angle output is scaled by a constant multiplier and registered as CFO_Est on a rising edge of the block boundary detect (BBD) signal.] • The angle of the correlation metric is proportional to the carrier frequency offset. • Right-size the number of bits before the CORDIC operation. • CORDIC ATAN from the Xilinx Math library calculates the angle. [Equation: CFO estimate expressed in terms of the correlation angle $\theta$, $2\pi$ and $N_s$]
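A sketch of how the angle of the correlation metric maps to a frequency offset. It uses the standard relation for two identical halves spaced `half_len` samples apart, $\hat{f} = \theta / (2\pi \cdot \text{half\_len})$ in cycles per sample; this is assumed to correspond to the slide's formula, which is not fully legible. `cmath.phase` stands in for the CORDIC ATAN block.

```python
import cmath
import math


def estimate_cfo(r, start, half_len):
    """Estimate CFO (cycles per sample) from two identical half-symbols.

    r[start : start + half_len] and the following half_len samples are
    assumed to carry the same data, so their correlation has a phase of
    2*pi*cfo*half_len. Unambiguous only for |cfo| < 1 / (2 * half_len).
    """
    p = sum(
        r[start + m].conjugate() * r[start + m + half_len]
        for m in range(half_len)
    )
    theta = cmath.phase(p)   # software stand-in for the CORDIC arctangent
    return theta / (2 * math.pi * half_len)
```

The unambiguous range shrinks as the repeated half gets longer, so a longer preamble gives a less noisy but narrower-range estimate.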
  • 76.
    Post-FFT CFO Estimation and Tracking [Figure: location of the channel estimation training symbols for antenna 1 in a 2-antenna MIMO system, showing the angular rotation on symbol 1 and on symbol 2; the rotation $\theta(k)$ is proportional to the CFO.] • A subset of channel estimation training symbols is used for CFO estimation • CFO causes a linear rotation every sample in the time domain. • CFO causes a constant rotation on all subcarriers in the frequency domain. This rotation increases from OFDM symbol to symbol and can be used to estimate CFO. [Equation: CFO estimate $\hat{\epsilon}$ computed from $\mathrm{mean}(\theta(k))$, scaled by $2\pi$ and a factor involving the cyclic prefix length]
  • 77.
    Carrier Frequency Offset Correction [Figure: System Generator model: a DDS driven by the scaled CFO estimate (freq_off, Enable, Reset) produces cos_out and sin_out; four complex multipliers rotate the four receive streams RealIn1/ImagIn1 to RealIn4/ImagIn4 to give RealOut1/ImagOut1 to RealOut4/ImagOut4; control inputs are CFO_Est, CFO_Est_valid, FFT_Start and Reset; data paths are mostly Fix_16_10.] • Direct digital synthesizer (DDS) from the Xilinx DSP SysGen library.
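The correction step amounts to multiplying each sample by a counter-rotating complex exponential, which is the role the DDS and the complex multipliers play in the schematic. The sketch below is a floating-point model with the offset expressed in cycles per sample (an assumed unit); `apply_cfo` is included only to build a test signal.

```python
import cmath
import math


def apply_cfo(samples, cfo):
    """Model the channel effect: rotate sample n by +2*pi*cfo*n."""
    return [v * cmath.exp(2j * math.pi * cfo * n) for n, v in enumerate(samples)]


def correct_cfo(samples, cfo_estimate):
    """Undo the rotation: multiply sample n by exp(-j*2*pi*cfo_estimate*n)."""
    return [
        v * cmath.exp(-2j * math.pi * cfo_estimate * n)
        for n, v in enumerate(samples)
    ]
```

In a multi-antenna receiver the same sine/cosine pair can be shared by all streams when they are driven from one oscillator, so one DDS serves several multipliers.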
  • 78.
    Design methodology issues • FPGA tools – Where to from here? • C-to-gates – Higher-level design languages to gates – Raising the level of abstraction
  • 79.
    End of Roadmap for the Von Neumann Model • CPUs are as smart as they can be! [Figure: SPECInt92/MHz versus MHz; source: Ronen [2001]] • Spot the CPU! [Figure: TI 6416 die with L1 $, L2 $ and CPU; source: Agarwala [2002]] • Clock frequency scaling: absolute power limits; with Moore’s law you also get leakage! [Source: Borkar [1999]] • Divide and conquer: multi-core arrays [Figure: 6x6 GALS processor array; source: Zu & Baas [2006]] • 1945-2005: sequential programming • 2005 - ????: concurrent programming
  • 80.
    Merging Mindsets: Software Design vs. Hardware Design [Figure: classes A to D with start(), and resourceA to resourceC] – Events, protocols, ordering, sequential execution, encapsulation, abstraction, portability, re-use – Implementation detail: control logic, interface glue, concurrency, communication, architecture, clocks, signals, timing • Combining the strengths of both paradigms can bring about a radical improvement in hardware/software system design productivity.
  • 81.
    Objective for a New Methodology: reduce design cost (by a lot) • Quality of result (QoR) is not a design goal! – Performance, power, BOM cost budgets make QoR a design constraint • The real objective is to meet the QoR target and minimize: – Non-recurring engineering costs (NRE) – Time-to-market (TTM) • The new methodology should save on design cost by enabling – Design of portable, retargetable, composable IP blocks – Rapid design space exploration and system composition [Figure: total design cost (NRE $, TTM) versus QoR (performance/$, performance/W) for the traditional HDL flow and the new methodology, marking the abstraction profit and the abstraction cost]
  • 82.
    ‘C’ or higher level language to Gates • There is interest from the design community in higher-level design methodologies, such as C-to-Gates. • ESL (electronic system level) tools and design methodologies are being explored. • But extracting all the concurrency from a sequential description is not an easy problem.
  • 83.
    Actor/Dataflow Programming Model [Figure: actors with encapsulated state and guarded atomic actions, linked by point-to-point, buffered token-passing connections] • A well-known and researched model for concurrent systems – Edward Lee et al. (UC Berkeley) – Arvind et al. (MIT) • Broadly applicable to heterogeneous HW/SW systems • Actors are described in the CAL language (UC Berkeley) – Open source simulator available from SourceForge – Under consideration as reference model for MPEG
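The model on this slide can be illustrated with a toy sketch. It is written in Python, not CAL, and every class and function name is hypothetical: each actor owns private state, reads and writes tokens only through FIFO connections, and exposes a guard (`can_fire`) that must hold before its atomic action (`fire`) runs.

```python
from collections import deque


class Scale:
    """Stateless actor: multiplies each input token by a fixed gain."""

    def __init__(self, gain, inp, out):
        self.gain, self.inp, self.out = gain, inp, out

    def can_fire(self):          # guard: a token must be available
        return len(self.inp) > 0

    def fire(self):              # atomic action: consume one, produce one
        self.out.append(self.gain * self.inp.popleft())


class Accumulate:
    """Actor with encapsulated state: emits the running sum of its input."""

    def __init__(self, inp, out):
        self.inp, self.out = inp, out
        self.total = 0           # private state, never touched from outside

    def can_fire(self):
        return len(self.inp) > 0

    def fire(self):
        self.total += self.inp.popleft()
        self.out.append(self.total)


def run(actors):
    """Naive scheduler: keep firing any enabled actor until none can fire."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True
```

Because actors interact only through their token queues, the result does not depend on the order in which the scheduler visits them, which is the property that lets the same description map onto sequential software or concurrent hardware.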
  • 84.
    Conclusion • FPGAs are finding wide use in infrastructure communication systems and signal processing systems. • FPGAs are an efficient choice for exploring VLSI architectures. • FPGA tools are raising the level of abstraction so that algorithm designers can explore h/w architectures without learning “h/w design tools/languages”.
  • 85.
    Questions?