Design and Hardware Implementation of Low-Complexity Multiuser Precoders (ETH Zurich, 2010)

Presentation on the design and FPGA implementation of low-complexity multiuser vector precoders, given at ETH Zurich in October 2010.

  1. Design and FPGA implementation of low-complexity multiuser vector precoders
     M. Barrenechea, M. Mendicute, L. Barbero, J. Thompson
     Signal Theory and Communications Area, Mondragon Goi Eskola Politeknikoa, University of Mondragon
  2. Outline
     - Vector precoding
     - Fixed sphere encoder
     - Channel matrix pre-processing
     - Simulation results
     - FPGA implementation and optimization
     - Implementation results
     - Summary and conclusions
  3. Vector precoding
     In scenarios where the receivers cannot cooperate, precoding at the base station allows the users' information streams to be separated.
     [Figure: multiuser MIMO downlink channel. The precoder at the base station maps the symbols s_1, ..., s_K onto the transmit signals x_1, ..., x_M, which pass through the wireless K x M channel matrix H; users 1, ..., K receive y_1, ..., y_K.]
  4. Vector precoding: linear precoding techniques
     Main linear approaches:
     - Zero-forcing (ZF)
     - Regularized
     - MMSE (WF)
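
The expressions for these three precoders are not preserved in the transcript. As a reminder only, and assuming the usual downlink model with K single-antenna users and M transmit antennas, their common textbook forms can be sketched as follows. The symbols P, γ, α, ξ and σ_n² are notation chosen here (unit transmit power is assumed), not taken from the slides.

```latex
\begin{align}
\mathbf{y} &= \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad
\mathbf{x} = \frac{1}{\sqrt{\gamma}}\,\mathbf{P}\,\mathbf{s}, \qquad
\gamma = \|\mathbf{P}\mathbf{s}\|^{2} \\
\mathbf{P}_{\mathrm{ZF}}  &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H}\right)^{-1} \\
\mathbf{P}_{\mathrm{reg}} &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H} + \alpha\,\mathbf{I}_{K}\right)^{-1}, \qquad \alpha > 0 \\
\mathbf{P}_{\mathrm{WF}}  &= \left(\mathbf{H}^{H}\mathbf{H} + \xi\,\mathbf{I}_{M}\right)^{-1}\mathbf{H}^{H}, \qquad \xi = K\sigma_{n}^{2}
\end{align}
```

With α = ξ the last two expressions coincide, since H^H (H H^H + α I)^{-1} = (H^H H + α I)^{-1} H^H for any α > 0.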
  5. Vector precoding
     The perturbation vector a that minimizes the unscaled transmit power can be found as: [equation not preserved]
     Another approach is to minimize the mean square error instead (WF-VP): [equation not preserved]
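
The two minimizations on this slide did not survive extraction. A hedged reconstruction in the notation commonly used for vector precoding is given below. The modulo constant τ, the linear precoding matrix P, the regularization factor ξ and the triangular factor U are symbols assumed here.

```latex
\begin{align}
\mathbf{a}_{\mathrm{VP}} &= \arg\min_{\mathbf{a}' \in \tau\mathbb{Z}^{K} + j\tau\mathbb{Z}^{K}}
  \left\|\mathbf{P}\left(\mathbf{s} + \mathbf{a}'\right)\right\|^{2} \\
\mathbf{a}_{\mathrm{WF\text{-}VP}} &= \arg\min_{\mathbf{a}' \in \tau\mathbb{Z}^{K} + j\tau\mathbb{Z}^{K}}
  \left(\mathbf{s} + \mathbf{a}'\right)^{H}\left(\mathbf{H}\mathbf{H}^{H} + \xi\,\mathbf{I}_{K}\right)^{-1}\left(\mathbf{s} + \mathbf{a}'\right)
  = \arg\min_{\mathbf{a}'} \left\|\mathbf{U}\left(\mathbf{s} + \mathbf{a}'\right)\right\|^{2},
  \qquad \mathbf{U}^{H}\mathbf{U} = \left(\mathbf{H}\mathbf{H}^{H} + \xi\,\mathbf{I}_{K}\right)^{-1}
\end{align}
```

Both problems have the same shape, a squared norm of a matrix times (s + a'), which is what allows a single lattice search to serve either criterion.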
  6. Vector precoding
     Solution: search for the closest point in a lattice.
     The problem is similar to maximum likelihood (ML) detection in MIMO systems: [equation not preserved]
     The main differences are the following:
     1. The VP lattice, which is infinite, must be restricted to a finite subset before it can be implemented.
     2. The VP search is not affected by noise.
     3. Quantization is less critical in VP, since both s and a belong to known sets.
     4. A failed search causes bit errors in MIMO detection, whereas in VP it only leads to a larger unscaled power and hence a noisier reception, which may degrade the BER slightly.
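
For comparison, the ML detection problem at a MIMO receiver has the familiar form below. The notation is assumed here: the constellation is written as a calligraphic X and the received vector as y.

```latex
\begin{equation}
\hat{\mathbf{s}}_{\mathrm{ML}} = \arg\min_{\mathbf{s}' \in \mathcal{X}^{M}}
  \left\|\mathbf{y} - \mathbf{H}\mathbf{s}'\right\|^{2}
\end{equation}
```

In detection the search set is the finite constellation and the target point y is noisy. In VP the search set is the infinite lattice and the target, determined by the data vector s, is known exactly at the transmitter.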
  8. Fixed Sphere Encoder
     Sphere encoder (SE):
     - Reduces the complexity in comparison to an exhaustive search.
     - The search is constrained to the perturbation vectors a that lie within a hypersphere of radius R around the signal s.
     - The triangular matrix U, obtained through the Cholesky or QR decomposition of the precoding matrix, is used to enable a recursive search through a tree.
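
The recursion enabled by the triangular structure can be sketched as follows. This assumes an upper-triangular U with real positive diagonal entries u_ii and uses the quantities u_ij/u_ii and z_i mentioned in later slides. The accumulated distance D_i is notation introduced here.

```latex
\begin{align}
\left\|\mathbf{U}\left(\mathbf{s} + \mathbf{a}\right)\right\|^{2}
  &= \sum_{i=1}^{K} u_{ii}^{2}\,\left|a_{i} + z_{i}\right|^{2},
  \qquad z_{i} = s_{i} + \sum_{j=i+1}^{K} \frac{u_{ij}}{u_{ii}}\left(s_{j} + a_{j}\right) \\
D_{i} &= D_{i+1} + u_{ii}^{2}\,\left|a_{i} + z_{i}\right|^{2},
  \qquad D_{K+1} = 0
\end{align}
```

Because z_i depends only on the entries a_j with j > i, the search can proceed level by level from i = K down to i = 1, and a branch can be discarded as soon as D_i exceeds R².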
  9. Fixed Sphere Encoder
     [Figure: sphere encoder search tree]
     - Sequential algorithm → suboptimal resource usage.
     - Variable complexity → variable throughput.
  10. Fixed Sphere Encoder
      - Originally designed for signal detection in MIMO scenarios [Barbero06].
      - Performs a suboptimal, fixed-complexity tree search.
      - Tree configuration vector n: entry n_i fixes how many candidates are expanded per node at tree level i.
      [Barbero06] L. Barbero, "Rapid prototyping of a fixed-complexity sphere decoder and its application to iterative decoding of turbo-MIMO systems," PhD dissertation, University of Edinburgh, 2006.
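
A minimal sketch of such a fixed-complexity search is given below, not the authors' implementation. It assumes an upper-triangular U with positive diagonal, a candidate set of τ times the Gaussian integers in {-2, ..., 2}², and a search that starts at the last level. The function name `fse_search` and the parameters `tau` and `span` are hypothetical.

```python
from itertools import product


def fse_search(U, s, n, tau=1.0, span=2):
    """Fixed-complexity search for a perturbation a that makes ||U (s + a)||^2 small.

    U    : K x K upper-triangular matrix (list of rows), positive diagonal
    s    : list of K complex data symbols
    n    : tree configuration vector; n[i] candidates are kept per node at level i
    tau  : lattice spacing (assumed modulo constant)
    span : candidates per axis are tau * {-span, ..., span} (an assumption)
    Returns (perturbation vector, accumulated distance) of the best path found.
    """
    K = len(s)
    candidates = [tau * complex(re, im)
                  for re, im in product(range(-span, span + 1), repeat=2)]
    paths = [([0j] * K, 0.0)]            # (partial perturbation, accumulated distance)
    for i in range(K - 1, -1, -1):       # start at the last row of U
        new_paths = []
        for a, aed in paths:
            # Contribution of the levels that have already been decided.
            z = s[i] + sum(U[i][j] / U[i][i] * (s[j] + a[j]) for j in range(i + 1, K))
            # Keep only the n[i] candidates closest to -z at this node.
            best = sorted(candidates, key=lambda c: abs(c + z))[:n[i]]
            for c in best:
                ped = abs(U[i][i]) ** 2 * abs(c + z) ** 2
                new_a = a[:]
                new_a[i] = c
                new_paths.append((new_a, aed + ped))
        paths = new_paths
    return min(paths, key=lambda p: p[1])
```

The number of paths after each level is the running product of the entries of `n`, so the work per channel use is constant regardless of the channel or the data, which is the property that makes the scheme attractive for a pipelined hardware design.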
  11. Fixed Sphere Encoder
      - To fix the tree, the lattice must be restricted to a finite set. The following candidate points (25 per level) have been considered:
      [Figure: candidate points in the complex plane; axes: Real, Imaginary]
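
The figure with the candidate points is not preserved. Assuming the 25 points form a 5x5 grid of integers centred on the origin (the implementation slides mention a restricted 5x5 group of integers), the set can be generated as sketched below. The function name and the scaling parameter `tau` are hypothetical.

```python
def candidate_points(span=2, tau=1.0):
    """Return the (2*span + 1)**2 points tau * (p + jq) with |p| <= span and |q| <= span."""
    axis = range(-span, span + 1)
    return [tau * complex(p, q) for p in axis for q in axis]


points = candidate_points()
print(len(points))  # 25
```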
  13. Channel matrix pre-processing: ordering of the channel matrix
      - Since most of the branches of the SE are removed to design the FSE, the following considerations must be taken into account:
        - The mean number of visited nodes per tree level is inversely proportional to [symbol not preserved].
        - This effect is more relevant at the top levels of the tree, since T_i decreases with i.
      - The following ordering strategies have been considered:
        - A V-BLAST-like iterative algorithm from the MIMO detection literature, based on minimizing the norm of the pseudoinverse of the precoding matrix.
        - A simple non-iterative ordering of the columns of the precoding matrix according to their norm.
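
The second, non-iterative strategy can be sketched as follows. The slide does not state whether the columns are sorted by increasing or decreasing norm, so the direction is left as a parameter. The function name `order_by_column_norm` is hypothetical.

```python
import math


def order_by_column_norm(P, descending=False):
    """Sort the columns of P (a list of rows) by their Euclidean norm.

    Returns the permutation (original column indices in their new order)
    and the reordered matrix. The sort direction is an assumption to be
    chosen by the caller; the source does not specify it.
    """
    rows, cols = len(P), len(P[0])
    norms = [math.sqrt(sum(abs(P[r][c]) ** 2 for r in range(rows))) for c in range(cols)]
    perm = sorted(range(cols), key=lambda c: norms[c], reverse=descending)
    reordered = [[P[r][c] for c in perm] for r in range(rows)]
    return perm, reordered
```

Whatever permutation is chosen also has to be applied to the entries of the symbol vector before the search, and undone on the resulting perturbation vector afterwards.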
  14. Channel matrix pre-processing: ordering of the channel matrix
      - Average values of u_ii at each level, depending on the ordering [data not preserved]
      - Average number of evaluated nodes at each level [data not preserved]
  15. Channel matrix pre-processing
      [Figure: effect of the ordering on the number of evaluated nodes, 6x6 system with 16-QAM modulation]
  17. Simulation results
      - Multiuser setups considered: 4x4, 6x6 and 8x8
      - Tree configurations:
        - n_4x4 = [1, 1, 2, 5]
        - n_6x6 = [1, 1, 1, 2, 3, 4]
        - n_8x8 = [1, 1, 1, 1, 2, 2, 3, 4]
      - Rayleigh fading channel, constant over each block.
      - 16-QAM modulation
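
The fixed workload implied by each tree configuration can be worked out with a few lines of code. The sketch below assumes that the search starts at the last entry of n and ends at the first. The total number of complete paths (the product of the entries) does not depend on that assumption, but the per-level counts do. The function name is hypothetical.

```python
def evaluated_nodes(n):
    """Nodes evaluated per level (in search order) and in total for tree configuration n."""
    per_level = []
    paths = 1
    for n_i in reversed(n):      # assumed search order: last entry first
        paths *= n_i
        per_level.append(paths)
    return per_level, sum(per_level)


for name, n in [("4x4", [1, 1, 2, 5]),
                ("6x6", [1, 1, 1, 2, 3, 4]),
                ("8x8", [1, 1, 1, 1, 2, 2, 3, 4])]:
    print(name, *evaluated_nodes(n))
# 4x4 [5, 10, 10, 10] 35
# 6x6 [4, 12, 24, 24, 24, 24] 112
# 8x8 [4, 12, 24, 48, 48, 48, 48, 48] 280
```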
  18. Simulation results
      [Figure: number of visited nodes]
  19. Simulation results
      [Figure: BER comparison]
  21. FPGA implementation and optimization: implemented VP FSE algorithm
      - 6x6 system
      - 16-QAM modulation
      - Tree configuration vector [value not preserved]
      - 3 pipeline stages
      - A restricted group of integers (5x5 = 25 points) is used instead of the full lattice.
      - Channel ordering, which is carried out once per transmission block, has not been considered.
      - Distance computation: PED and AED [equations not preserved]
  22. FPGA implementation and optimization: algorithm implementation
      Implemented using Xilinx System Generator for DSP.
  23. FPGA implementation and optimization: special features
      - PED 6
        - The n_6 = 6 closest points to each symbol s_6 are known beforehand.
        - Due to symmetries, the set of 6 points can be obtained by mapping each symbol to its equivalent in the first quadrant and then changing the signs of that equivalent point's set accordingly.
      - PED 5
        - A 2-point slicer is needed to compute the 2 closest points.
        - First closest point: [expression not preserved]
        - Second closest point: [expression not preserved] or [expression not preserved]
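
The symmetry argument for PED 6 can be illustrated with a small sketch. It assumes 16-QAM symbols with coordinates in {-3, -1, 1, 3}, a modulo constant τ = 8 and the 5x5 candidate grid, none of which are stated explicitly on this slide. All names are hypothetical. Only the four first-quadrant symbols need a stored table; the other twelve follow by sign changes.

```python
from itertools import product

TAU = 8.0  # assumed modulo constant for 16-QAM with coordinates in {-3, -1, 1, 3}
LATTICE = [TAU * complex(p, q) for p, q in product(range(-2, 3), repeat=2)]


def closest_points(s, count=6):
    """Brute force: the `count` lattice points a with the smallest |s + a|."""
    return sorted(LATTICE, key=lambda a: abs(s + a))[:count]


# Precomputed table for the four first-quadrant symbols only.
FIRST_QUADRANT = {complex(x, y): closest_points(complex(x, y))
                  for x in (1, 3) for y in (1, 3)}


def closest_points_by_symmetry(s):
    """Look up the first-quadrant equivalent of s and flip the signs accordingly."""
    sx = 1 if s.real >= 0 else -1
    sy = 1 if s.imag >= 0 else -1
    table = FIRST_QUADRANT[complex(abs(s.real), abs(s.imag))]
    return [complex(sx * a.real, sy * a.imag) for a in table]
```

In hardware the brute-force step would be replaced by a small constant table, so this level of the tree needs no distance sorting at run time.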
  24. FPGA implementation and optimization
      274 multipliers required → prohibitive for a low-cost FPGA implementation. A series of hardware optimizations has been proposed to reduce the number of embedded multipliers required.
      Optimization 1: Rearrangement of complex multiplications
      - Initial system → 4 multipliers and 2 adders
      - Alternative complex multiplication → 3 multipliers and 5 adders
      - Required number of multipliers after OPT1 → 224
      Optimization 2: Hard quantization
      If the values of u_ij/u_ii are quantized to a very small number of bits, and the multiplications required to compute z_i are implemented in programmable logic, the number of multipliers drops to 74, although the number of slices required increases slightly. A small degradation is introduced.
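
The slide does not say which rearrangement was used for Optimization 1. One well-known way of trading one multiplier for three extra adders is sketched below; the function name is hypothetical.

```python
def complex_mult_3(a, b, c, d):
    """Compute (a + jb) * (c + jd) with 3 real multiplications and 5 additions/subtractions.

    Returns the pair (real part, imaginary part).
    """
    k1 = c * (a + b)        # multiplication 1, addition 1
    k2 = a * (d - c)        # multiplication 2, addition 2
    k3 = b * (c + d)        # multiplication 3, addition 3
    return k1 - k3, k1 + k2  # additions 4 and 5


print(complex_mult_3(1, 2, 3, 4))  # (-5, 10), i.e. (1 + 2j) * (3 + 4j) = -5 + 10j
```

If every multiplier saved by this step comes from one complex product, the drop from 274 to 224 would correspond to 50 complex multiplications in the design; that count is an inference from the two figures, not a number given in the source.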
  25. FPGA implementation and optimization
      Optimization 3: Approximated Euclidean distance
      Replace the ℓ2-norm calculation performed to obtain the PEDs with a simpler method:
      1. The Manhattan distance metric (ℓ1-norm)
      2. The metric [expression not preserved]
      Both techniques introduce a small BER degradation. However, after OPT3 the number of multipliers is reduced to 30.
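
The saving behind the first approximation can be seen by comparing the two metrics for a single complex residual. This is a minimal sketch with hypothetical function names; the second metric on the slide is not preserved and is therefore not shown.

```python
def squared_euclidean(z):
    """Exact squared distance: needs two real multiplications."""
    return z.real * z.real + z.imag * z.imag


def manhattan(z):
    """Manhattan (l1) distance: absolute values and one addition, no multiplications."""
    return abs(z.real) + abs(z.imag)


z = 3 - 4j
print(squared_euclidean(z), manhattan(z))  # 25.0 7.0
```

The Manhattan distance never underestimates the true Euclidean distance and overestimates it by at most a factor of √2, so the ranking of candidates can change only when their distances are already close.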
  26. FPGA implementation and optimization
      Optimization 4: Simplified 2-point slicer
      - So far, deciding which of the two candidates [expressions not preserved] was the second closest point required the computation of both distances.
      - A new technique that does not require any extra distance calculation has been derived, with one rule for each of the following cases [rules not preserved]:
        - Interior
        - Edge
        - Vertex
      - No performance loss after OPT4.
      - The total number of multipliers is 22.
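
The actual interior, edge and vertex rules are not preserved, so the sketch below is only one way of finding the two closest points without computing any distance, and is not the authors' rule set. It assumes the input has already been divided by τ, so that the candidates are the Gaussian integers in {-2, ..., 2}². The function name `two_point_slicer` is hypothetical. The idea is that moving one unit step from the closest point changes the squared distance by 1 minus twice the residual projected on that step, so comparing residual components is enough.

```python
import math


def two_point_slicer(z, span=2):
    """Closest and second-closest points to z in {-span..span}^2 (Gaussian integers),
    found with rounding, clamping and comparisons only."""
    def clamp_round(x):
        return max(-span, min(span, math.floor(x + 0.5)))

    q_re, q_im = clamp_round(z.real), clamp_round(z.imag)
    r_re, r_im = z.real - q_re, z.imag - q_im   # residual with respect to the closest point

    # Unit moves that stay inside the grid, with the residual projected on each move.
    moves = []
    if q_re < span:
        moves.append((r_re, 1, 0))
    if q_re > -span:
        moves.append((-r_re, -1, 0))
    if q_im < span:
        moves.append((r_im, 0, 1))
    if q_im > -span:
        moves.append((-r_im, 0, -1))
    _, d_re, d_im = max(moves)   # largest projection gives the smallest distance
    return complex(q_re, q_im), complex(q_re + d_re, q_im + d_im)
```

When the closest point lies in the interior of the grid all four moves are available, on an edge three, and on a vertex two, which may be how the three cases listed on the slide arise.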
  28. FPGA implementation and optimization: summary of results
      The performance loss caused by the optimization strategies is just 0.2 dB at a BER of 10^-4. As for the HW resources, a reduced-complexity implementation has been achieved.
  30. Summary and conclusions
      - The FSE (fixed sphere encoder) achieves close-to-optimal BER performance for VP systems with:
        - Reduced complexity. Only a small subset of the tree branches is analyzed in comparison to the SE.
        - Fixed architecture and throughput. The number of nodes and branches to be computed is fixed and can be parallelized.
      - A 6x6 FPGA implementation has been presented which achieves good performance and throughput.
      - Implementation and optimization issues have been presented which show that the sphere encoder is less sensitive to quantization and suboptimal design choices than its MIMO detection counterpart.
  31. End
      Thank you for your attention! You can send any comments, requests or questions to: Dr. Mikel Mendicute [email_address]
