
Design and Hardware Implementation of Low-Complexity Multiuser Precoders (ETH Zurich, 2010)


Presentation on the design and FPGA implementation of low-complexity multiuser vector precoders, given at ETH Zurich in October 2010.

Published in: Technology

  1. Design and FPGA implementation of low-complexity multiuser vector precoders
     M. Barrenechea, M. Mendicute, L. Barbero, J. Thompson
     Signal Theory and Communications Area, Mondragon Goi Eskola Politeknikoa, University of Mondragon
  2. Outline
     - Vector precoding
     - Fixed sphere encoder
     - Channel matrix pre-processing
     - Simulation results
     - FPGA implementation and optimization
     - Implementation results
     - Summary and conclusions
  3. Vector precoding
     In uncoordinated receiver scenarios, the use of precoding techniques at the base station can allow the separation of the users' information streams.
     [Figure: multiuser MIMO downlink channel. At the base station, the precoder maps the symbols s_1 ... s_K onto the transmit signals x_1 ... x_M, which pass through the wireless K x M channel matrix H and are received as y_1 ... y_K by users 1 ... K.]
  4. Vector precoding: linear precoding techniques
     Main linear approaches: zero-forcing (ZF), regularized, and MMSE (Wiener filter, WF).
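The equations of this slide did not survive in the transcript. As a reminder only, the textbook forms of these three linear precoders are commonly written as below. The notation is an assumption and may differ from the original slides: H is the K x M channel matrix, α is a regularization factor, σ² is the per-user noise variance, and P_T is the transmit power. Each matrix is understood up to a scalar factor that normalizes the transmit power.

```latex
\begin{align}
\mathbf{P}_{\mathrm{ZF}}  &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H}\right)^{-1} \\
\mathbf{P}_{\mathrm{reg}} &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H} + \alpha\,\mathbf{I}_{K}\right)^{-1} \\
\mathbf{P}_{\mathrm{WF}}  &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H} + \xi\,\mathbf{I}_{K}\right)^{-1},
\qquad \xi = \frac{K\sigma^{2}}{P_{T}}
\end{align}
```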
  5. Vector precoding
     The perturbation vector a is chosen to minimize the unscaled transmit power. Another approach is to choose a so that the mean square error is minimized (WF-VP).
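The power-minimizing choice of a can be illustrated with an exhaustive search over a very small perturbation set. This is a minimal sketch under stated assumptions, not the method of the slides: the cost is taken to be ||P (s + τ a)||², the helper names (`matvec`, `unscaled_power`, `brute_force_vp`), the 2-user matrix P, the symbols s and the value τ = 8 are all hypothetical and chosen only for illustration.

```python
from itertools import product


def matvec(P, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def unscaled_power(P, s, a, tau):
    """Unscaled transmit power ||P (s + tau * a)||^2 (assumed cost function)."""
    x = matvec(P, [si + tau * ai for si, ai in zip(s, a)])
    return sum(abs(xi) ** 2 for xi in x)


def brute_force_vp(P, s, tau, offsets=(-1, 0, 1)):
    """Exhaustively search a small grid of complex-integer perturbations."""
    points = [complex(re, im) for re in offsets for im in offsets]
    return min(product(points, repeat=len(s)),
               key=lambda a: unscaled_power(P, s, a, tau))


# Hypothetical, poorly conditioned 2-user precoding matrix.
P = [[1.0 + 0.0j, 0.9 + 0.0j],
     [0.9 + 0.0j, 1.0 + 0.0j]]
s = [3 + 3j, 3 + 3j]   # illustrative 16-QAM symbols
tau = 8.0              # assumed modulo constant for points at +/-1, +/-3

a_opt = brute_force_vp(P, s, tau)
power_without = unscaled_power(P, s, (0j, 0j), tau)
power_with = unscaled_power(P, s, a_opt, tau)
print(a_opt, power_without, power_with)
```

The exhaustive search grows exponentially with the number of users, which is the motivation for the tree-search techniques discussed next.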
  6. Vector precoding
     Solution: search for the closest point in a lattice. The problem is similar to maximum likelihood (ML) detection in MIMO systems. The main differences are the following:
     1. The VP lattice, which is infinite, must be restricted to a finite set of points to be implemented.
     2. The VP search is not affected by noise.
     3. Quantization is less critical in VP, since both s and a belong to known sets.
     4. A failure of the search causes bit errors in MIMO detection, whereas in VP it only means a larger unscaled power and hence a noisier reception, which may affect the BER slightly.
  8. Fixed Sphere Encoder
     Sphere encoder (SE):
     - Reduces the complexity in comparison to an exhaustive search.
     - The search is constrained to the perturbation vectors a lying within a hypersphere of radius R around the signal s.
     - The triangular matrix U, obtained through the Cholesky or QR decomposition of the precoding matrix, is used to enable a recursive search through a tree.
  9. Fixed Sphere Encoder: sphere encoder search tree
     - Sequential algorithm → suboptimal resource usage.
     - Variable complexity → variable throughput.
  10. Fixed Sphere Encoder
      - Originally designed for signal detection in MIMO scenarios [Barbero06]. Performs a suboptimal, fixed-complexity tree search.
      - Tree configuration vector n, with one entry n_i per tree level.
      [Barbero06] L. Barbero, "Rapid prototyping of a fixed-complexity sphere decoder and its application to iterative decoding of turbo-MIMO systems," PhD dissertation, University of Edinburgh, 2006.
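A fixed-complexity tree search of this kind can be sketched as follows. This is an illustrative reading of the idea, not the authors' implementation. It assumes the cost ||U (a + v)||² with an upper-triangular U and an offset vector v (for example s/τ), a search that starts at the last level, and a tree configuration vector whose entry n[i] is the number of candidates kept per node at level i. The function names and all numerical values are hypothetical.

```python
def fse_search(U, v, n, points):
    """Fixed-complexity tree search minimising ||U (a + v)||^2.

    U      : upper-triangular matrix as a list of rows (assumed input)
    v      : offset vector (assumed to be s / tau)
    n      : tree configuration vector, n[i] candidates kept at level i
    points : finite candidate set for every element of a
    """
    M = len(v)
    paths = [({}, 0.0)]                  # (symbols chosen so far, accumulated distance)
    for i in range(M - 1, -1, -1):       # the root of the tree is level M-1
        new_paths = []
        for a, aed in paths:
            # Contribution of the levels already decided (j > i).
            z = v[i] + sum(U[i][j] / U[i][i] * (a[j] + v[j])
                           for j in range(i + 1, M))
            # Keep only the n[i] candidates closest to -z.
            best = sorted(points, key=lambda p: abs(p + z))[:n[i]]
            for p in best:
                ped = abs(U[i][i]) ** 2 * abs(p + z) ** 2
                new_paths.append(({**a, i: p}, aed + ped))
        paths = new_paths
    a, aed = min(paths, key=lambda t: t[1])
    return [a[i] for i in range(M)], aed


def cost(U, v, a):
    """Direct evaluation of ||U (a + v)||^2, used only to check the search."""
    M = len(v)
    return sum(abs(sum(U[i][j] * (a[j] + v[j]) for j in range(i, M))) ** 2
               for i in range(M))


# Hypothetical 3-level example; all numbers are illustrative only.
points = [complex(re, im) for re in range(-2, 3) for im in range(-2, 3)]
U = [[1.0, 0.5 + 0.2j, -0.3j],
     [0.0, 0.8, 0.4 + 0.0j],
     [0.0, 0.0, 0.6]]
v = [0.3 - 0.4j, -0.2 + 0.1j, 0.45 + 0.35j]
a_fixed, aed_fixed = fse_search(U, v, n=[1, 2, 5], points=points)
print(a_fixed, aed_fixed)
```

With n = [1, 2, 5] only 5 · 2 · 1 = 10 complete paths are evaluated regardless of the channel, which is what makes the complexity and throughput fixed. Setting every n[i] to the size of the candidate set turns the same routine into an exhaustive search.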
  11. Fixed Sphere Encoder
      - In order to fix the tree, the lattice must be restricted to a finite set of points. The following candidate points (25 per level) have been considered:
      [Figure: candidate points in the complex plane (axes: Real, Imaginary), annotated with percentages.]
  13. Channel matrix pre-processing: ordering of the channel matrix
      - Since most of the branches of the SE are going to be removed to design the FSE, the following considerations must be taken into account:
        - The mean number of visited nodes per tree level is inversely proportional to the diagonal element u_ii of the triangular matrix.
        - This effect is more relevant at the top levels of the tree, since T_i decreases with i.
      - The following ordering strategies have been considered the most suitable:
        - A V-BLAST-like iterative algorithm from the MIMO detection literature, based on the minimization of the norm of the pseudoinverse of the precoding matrix.
        - A simple non-iterative ordering of the columns of the precoding matrix according to their norm.
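The second, non-iterative strategy can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the precoding matrix is assumed to be stored as a list of rows, the function names are hypothetical, and the sort direction (ascending by default, selectable through the hypothetical `descending` flag) is an assumption, since the slide does not state it.

```python
import math


def column_norms(P):
    """Euclidean norm of each column of P (P is a list of rows)."""
    return [math.sqrt(sum(abs(row[j]) ** 2 for row in P))
            for j in range(len(P[0]))]


def norm_ordering(P, descending=False):
    """Column permutation that sorts the columns of P by their norm."""
    norms = column_norms(P)
    return sorted(range(len(norms)), key=lambda j: norms[j], reverse=descending)


def permute_columns(P, order):
    """Apply a column permutation to P."""
    return [[row[j] for j in order] for row in P]


# Hypothetical 3x3 matrix; values chosen only for illustration.
P = [[3.0, 1j, 2.0],
     [4.0, 0.0, 0.0],
     [0.0, 1.0, 2.0]]
order = norm_ordering(P)
P_ordered = permute_columns(P, order)
print(order, column_norms(P_ordered))
```

Because reordering the columns of the precoding matrix changes which user is processed at which tree level, the same permutation has to be applied to the entries of s and undone on the resulting perturbation vector.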
  14. Channel matrix pre-processing: ordering of the channel matrix
      [Figures: averaged values of u_ii at the different tree levels depending on the ordering; averaged number of evaluated nodes at each level.]
  15. Channel matrix pre-processing
      [Figure: effect of ordering on the number of evaluated nodes, 6x6 system with 16-QAM modulation.]
  17. Simulation results
      - Multiuser setups considered: 4x4, 6x6 and 8x8.
      - Tree configurations:
        - n_4x4 = [1, 1, 2, 5]
        - n_6x6 = [1, 1, 1, 2, 3, 4]
        - n_8x8 = [1, 1, 1, 1, 2, 2, 3, 4]
      - Rayleigh channel, constant over each block.
      - 16-QAM modulation.
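The cost of a given tree configuration can be worked out directly from the vector. The sketch below assumes that the last entry of each vector belongs to the root level, where the search starts, and that every surviving node spawns n_i children at level i. Both are assumptions about the convention used on the slides, and the function name is hypothetical.

```python
def nodes_per_level(n):
    """Nodes evaluated at each level, root level first.

    Assumes n[-1] is the number of candidates at the root level and that
    each node at a level expands into n[i] children at the next one.
    """
    counts, paths = [], 1
    for children in reversed(n):
        paths *= children
        counts.append(paths)
    return counts


configs = {
    "4x4": [1, 1, 2, 5],
    "6x6": [1, 1, 1, 2, 3, 4],
    "8x8": [1, 1, 1, 1, 2, 2, 3, 4],
}
for name, n in configs.items():
    counts = nodes_per_level(n)
    print(name, counts, "total:", sum(counts))
```

Under these assumptions the totals are independent of the channel realization, in contrast with the variable node count of the sphere encoder.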
  18. Simulation results
      [Figure: number of visited nodes.]
  19. Simulation results
      [Figure: BER comparison.]
  21. FPGA implementation and optimization: implemented VP FSE algorithm
      - 6x6 system
      - 16-QAM modulation
      - Tree configuration vector
      - 3 pipeline stages
      - A restricted set of integers (5x5 = 25 points) is used instead of the full lattice.
      - Channel ordering, which is carried out once per transmission block, has not been considered.
      - Distance computation: PED (partial Euclidean distance) and AED (accumulated Euclidean distance).
  22. FPGA implementation and optimization: algorithm implementation
      Implemented using Xilinx System Generator for DSP.
  23. FPGA implementation and optimization: special features
      - PED 6
        - The n_6 = 6 closest points to each symbol s_6 are known beforehand.
        - Due to symmetries, the set of 6 points can be computed by mapping each symbol to its equivalent in the first quadrant and then adjusting the signs of the resulting set accordingly.
      - PED 5
        - A 2-point slicer is needed to compute the 2 closest points.
        - The first closest point is obtained directly; the second closest point is one of two candidates.
  24. FPGA implementation and optimization
      274 multipliers required → prohibitive for a low-cost FPGA implementation. A series of hardware optimizations has been proposed to reduce the number of required embedded multipliers.
      - Optimization 1: Rearrangement of complex multiplications
        - Initial system → 4 multipliers and 2 adders
        - Alternative complex multiplication → 3 multipliers and 5 adders
        - Required number of multipliers after OPT. 1 → 224
      - Optimization 2: Hard quantization
        - If the values of u_ij/u_ii are quantized to a very small number of bits and the multiplications required to compute z_i are implemented using programmable logic, the number of multipliers is reduced to 74, although the number of required slices increases slightly. A small degradation is introduced.
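The rearrangement in Optimization 1 matches the well-known way of trading one multiplication for three extra additions in a complex product. The sketch below shows one standard form of that identity. Whether the slides use exactly this arrangement is an assumption, and the function names are hypothetical.

```python
def complex_mult_4(a, b, c, d):
    """(a + jb)(c + jd) with 4 multiplications and 2 additions."""
    return a * c - b * d, a * d + b * c


def complex_mult_3(a, b, c, d):
    """(a + jb)(c + jd) with 3 multiplications and 5 additions."""
    k1 = c * (a + b)      # 1 addition, 1 multiplication
    k2 = a * (d - c)      # 1 addition, 1 multiplication
    k3 = b * (c + d)      # 1 addition, 1 multiplication
    return k1 - k3, k1 + k2   # 2 additions


# Integer operands keep the comparison exact, as in fixed-point hardware.
print(complex_mult_4(3, -2, 5, 7), complex_mult_3(3, -2, 5, 7))
```

In an FPGA the adders map to general-purpose logic while multipliers consume scarce embedded blocks, so the exchange is usually worthwhile when multipliers are the limiting resource.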
  25. FPGA implementation and optimization
      Optimization 3: Approximated Euclidean distance
      Replace the ℓ2-norm calculation performed to obtain the PEDs by a simpler method:
      1. The Manhattan distance metric (ℓ1-norm).
      2. A second simplified metric.
      Both of these techniques introduce a small BER performance degradation. However, after the implementation of OPT3 the number of multipliers is reduced to 30.
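The trade-off behind the Manhattan approximation can be seen on a single complex distance. The sketch below only compares the two metrics. How the approximate metric is scaled and accumulated into the PEDs in the actual design is not stated in the transcript, and the function names and sample values are hypothetical.

```python
def squared_euclidean(d):
    """Squared l2 distance of a complex difference: needs two multiplications."""
    return d.real ** 2 + d.imag ** 2


def manhattan(d):
    """l1 (Manhattan) distance of a complex difference: no multiplications."""
    return abs(d.real) + abs(d.imag)


# Two hypothetical differences between a received value and two candidates.
d1 = 0.7 + 0.0j
d2 = 0.45 + 0.45j

print(squared_euclidean(d1), squared_euclidean(d2))
print(manhattan(d1), manhattan(d2))
```

For these values the Euclidean metric prefers the second candidate while the Manhattan metric prefers the first, which illustrates why the approximation can occasionally change a decision and cause a small performance degradation.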
  26. FPGA implementation and optimization
      Optimization 4: Simplified 2-point slicer
      - So far, deciding which of the two candidates was the second closest point required the computation of both distances.
      - A new technique that does not require any extra distance calculation has been derived, with a separate decision rule for each of three cases: interior, edge and vertex.
      - No performance loss after OPT4.
      - The total number of multipliers is 22.
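The decision rules of this slide were not preserved. As an illustration of how a second closest point can be chosen without computing any distance, the sketch below uses a standard property of the integer grid: after rounding, the second closest point lies one step away along the axis with the larger residual. It covers only an unbounded grid, so it would correspond at most to the interior case. The edge and vertex cases of the restricted 5x5 set are not handled, and the function name is hypothetical.

```python
import math


def two_point_slicer(z):
    """Two closest complex-integer points to z on an unbounded grid."""
    # Closest point: round each component to the nearest integer.
    q = complex(math.floor(z.real + 0.5), math.floor(z.imag + 0.5))
    d = z - q   # residual, each component in [-0.5, 0.5)
    # Second closest point: step along the axis with the larger residual,
    # in the direction of that residual. No distance is computed.
    if abs(d.real) >= abs(d.imag):
        second = q + math.copysign(1.0, d.real)
    else:
        second = q + 1j * math.copysign(1.0, d.imag)
    return q, second


for z in (0.3 + 0.1j, -1.2 + 0.45j, 1.6 - 0.9j):
    print(z, two_point_slicer(z))
```

Only two magnitude comparisons and one increment are needed, which is the kind of saving the slide describes, although the actual rule used by the authors may differ.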
  28. FPGA implementation and optimization: summary of results
      The performance loss derived from the implementation of the optimization strategies is just 0.2 dB at a BER of 10^-4. As for the hardware resources, a reduced-complexity implementation has been achieved.
  30. Summary and conclusions
      - The FSE (fixed sphere encoder) achieves close-to-optimal BER performance for VP systems with:
        - Reduced complexity. Only a small subset of the tree branches is analyzed in comparison to the SE.
        - Fixed architecture and throughput. The number of nodes and branches to be computed is fixed and can be parallelized.
      - A 6x6 FPGA implementation has been presented which achieves good performance and throughput.
      - Implementation and optimization issues have been presented which show that the sphere encoder is less sensitive to quantization and suboptimal design choices than its MIMO detection counterpart.
  31. End
      Thank you for your attention! You can send any comments, requests or questions to: Dr. Mikel Mendicute [email_address]
