Design and Hardware Implementation of Low-Complexity Multiuser Precoders (ETH Zurich, 2010)
Presentation on the design and FPGA implementation of low-complexity multiuser vector precoders, given at ETH Zurich in October 2010.


    • Design and FPGA implementation of low-complexity multiuser vector precoders. M. Barrenechea, M. Mendicute, L. Barbero, J. Thompson. Signal Theory and Communications Area, Mondragon Goi Eskola Politeknikoa, University of Mondragon.
    • Outline
      • Vector precoding
      • Fixed sphere encoder
      • Channel matrix pre-processing
      • Simulation results
      • FPGA implementation and optimization
      • Implementation results
      • Summary and conclusions
    • Vector precoding: In uncoordinated receiver scenarios, precoding at the base station allows the users' information streams to be separated.
      • [Figure: multiuser MIMO downlink channel. A precoder at the base station maps the data symbols s_1 ... s_K onto the transmit signals x_1 ... x_M, which reach users 1 ... K (received signals y_1 ... y_K) through the K x M wireless channel matrix H.]
    • Vector precoding: Linear precoding techniques. Main linear approaches: zero-forcing (ZF), regularized, and MMSE (Wiener filter, WF).
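For reference, the three linear precoders are commonly written as below. This is a sketch in assumed notation rather than a copy of the slide's equations: H is the K x M channel matrix, P the M x K precoding matrix, α a regularization constant, and ξ = Kσ_n²/E_tr (noise variance σ_n², available transmit energy E_tr). Power scaling factors are omitted.

```latex
\begin{align}
\mathbf{P}_{\mathrm{ZF}}  &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H}\right)^{-1} \\
\mathbf{P}_{\mathrm{reg}} &= \mathbf{H}^{H}\left(\mathbf{H}\mathbf{H}^{H} + \alpha\,\mathbf{I}_{K}\right)^{-1} \\
\mathbf{P}_{\mathrm{WF}}  &= \left(\mathbf{H}^{H}\mathbf{H} + \xi\,\mathbf{I}_{M}\right)^{-1}\mathbf{H}^{H}
\end{align}
```

By the matrix inversion lemma, the regularized and WF forms coincide when α = ξ.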
    • Vector precoding: The perturbation vector a is chosen to minimize the unscaled transmitted power. Another approach is to minimize the mean square error (WF-VP).
    • Vector precoding: Solution: search for the closest point in a lattice. The problem is similar to maximum likelihood (ML) detection in MIMO systems. The main differences are the following:
      • 1.- The VP lattice, which is infinite, must be reduced to be implemented.
      • 2.- The VP search is not affected by noise.
      • 3.- Quantization is less critical in VP since both s and a belong to known sets.
      • 4.- A failure of the search causes bit errors in MIMO detection, whereas in VP it only means a larger unscaled power and a noisier reception, which may affect the BER slightly.
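The closest-point search can be illustrated with a brute-force sketch. It assumes the usual vector precoding formulation, in which a minimizes ||P(s + τa)||² for a precoding matrix P and a modulo constant τ. The matrix, symbols, τ = 8 and the 3 x 3 candidate grid below are hypothetical values chosen only for illustration.

```python
from itertools import product

def unscaled_power(P, x):
    """Return ||P x||^2 for a matrix P (list of rows) and a vector x."""
    return sum(abs(sum(p * xc for p, xc in zip(row, x))) ** 2 for row in P)

def vp_exhaustive(P, s, tau, grid):
    """Try every perturbation vector a whose entries come from grid."""
    def power(a):
        return unscaled_power(P, [si + tau * ai for si, ai in zip(s, a)])
    return list(min(product(grid, repeat=len(s)), key=power))

# Hypothetical 2-user example (values are assumptions, not from the slides)
P = [[1.0, 0.9], [0.9, 1.0]]
s = [3 + 3j, 3 + 3j]
tau = 8                                   # assumed modulo constant
grid = [complex(re, im) for re in (-1, 0, 1) for im in (-1, 0, 1)]

a = vp_exhaustive(P, s, tau, grid)
power_plain = unscaled_power(P, s)
power_vp = unscaled_power(P, [si + tau * ai for si, ai in zip(s, a)])
```

The cost grows as len(grid) ** len(s), which is why the tree-search methods on the following slides are needed.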
    • Fixed Sphere Encoder Sphere encoder (SE):
        • Reduces the complexity in comparison to an exhaustive search.
        • The search is constrained to the perturbation vectors a belonging to a hypersphere of radius R around the signal s .
        • The triangular matrix U, obtained through the Cholesky or QR decomposition of the precoding matrix, is used to enable a recursive search through a tree.
    • Fixed Sphere Encoder: Sphere encoder search tree. Sequential algorithm → suboptimal resource usage. Variable complexity → variable throughput.
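A minimal depth-first sketch of such a search follows. It assumes an upper-triangular U with U^H U = P^H P, so that the cost splits into per-level increments |u_ii|² |a_i + offset_i|², where offset_i collects s_i/τ and the contribution of the levels already fixed. Function names, example values and τ = 8 are assumptions for illustration, and the candidate set is a finite grid rather than the full lattice.

```python
import math

def sphere_encode(U, s, tau, grid, radius_sq=math.inf):
    """Depth-first search with radius pruning; returns (a, distance, visited)."""
    M = len(s)
    state = {"dist": radius_sq, "a": None, "visited": 0}

    def descend(i, dist, chosen):
        if i < 0:                          # leaf reached: shrink the sphere
            state["dist"] = dist
            state["a"] = [chosen[k] for k in range(M)]
            return
        # Contribution of the levels already decided on this path
        offset = s[i] / tau + sum(
            U[i][j] / U[i][i] * (chosen[j] + s[j] / tau)
            for j in range(i + 1, M))
        # Visit children from closest to farthest
        for a in sorted(grid, key=lambda c: abs(c + offset)):
            d = dist + abs(U[i][i]) ** 2 * abs(a + offset) ** 2
            if d >= state["dist"]:
                break                      # remaining siblings are no better
            state["visited"] += 1
            descend(i - 1, d, {**chosen, i: a})

    descend(M - 1, 0.0, {})
    return state["a"], state["dist"], state["visited"]

# Hypothetical 2x2 example
U = [[1.0, 0.5 + 0.2j], [0.0, 0.8]]
s = [1 + 1j, -3 + 1j]
tau = 8
grid = [complex(re, im) for re in range(-2, 3) for im in range(-2, 3)]
a_hat, dist, visited = sphere_encode(U, s, tau, grid)
```

The number of visited nodes depends on U and s, which is the variable complexity the slide refers to.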
    • Fixed Sphere Encoder
      • Originally designed for signal detection in MIMO scenarios [Barbero06]. Performs a suboptimal, fixed-complexity tree search.
        • Tree configuration vector
      [Barbero06] L. Barbero, "Rapid prototyping of a fixed-complexity sphere decoder and its application to iterative decoding of turbo-MIMO systems," PhD dissertation, University of Edinburgh, 2006.
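The fixed-complexity idea can be sketched as follows, assuming a tree configuration vector n in which n[i] is the number of children kept per node at level i, and the same per-level cost split used by a sphere search (upper-triangular U, modulo constant τ). All names and example values are hypothetical.

```python
def fse_encode(U, s, tau, n, grid):
    """Fixed-complexity tree search; returns (a, accumulated distance)."""
    M = len(s)
    paths = [(0.0, {})]                    # (distance so far, {level: a_i})
    for i in range(M - 1, -1, -1):         # the search starts at the last level
        new_paths = []
        for dist, chosen in paths:
            offset = s[i] / tau + sum(
                U[i][j] / U[i][i] * (chosen[j] + s[j] / tau)
                for j in range(i + 1, M))
            # Keep only the n[i] candidates closest to -offset
            ranked = sorted(grid, key=lambda a: abs(a + offset))
            for a in ranked[:n[i]]:
                inc = abs(U[i][i]) ** 2 * abs(a + offset) ** 2
                new_paths.append((dist + inc, {**chosen, i: a}))
        paths = new_paths
    best_dist, best = min(paths, key=lambda p: p[0])
    return [best[i] for i in range(M)], best_dist

# Hypothetical 2x2 example
U = [[1.0, 0.5 + 0.2j], [0.0, 0.8]]
s = [1 + 1j, -3 + 1j]
tau = 8
grid = [complex(re, im) for re in range(-2, 3) for im in range(-2, 3)]
n = [1, 5]                                 # 5 branches at the top, then 1 each
a_fse, dist_fse = fse_encode(U, s, tau, n, grid)
```

Every call evaluates the same number of paths (the product of the entries of n), so the branches can be computed in parallel with a fixed throughput.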
    • Fixed Sphere Encoder
      • In order to fix the tree, the lattice must be reduced. The following candidate points (25 per level) have been considered:
      [Figure: candidate points in the complex plane; axes: Real, Imaginary.]
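A 25-point candidate set can be generated as below. The 5 x 5 integer layout matches the restricted group mentioned later in the implementation slides; centring it on the origin is an assumption.

```python
def candidate_grid(side=5):
    """Return side*side complex integers centred on the origin (assumed layout)."""
    half = side // 2
    return [complex(re, im)
            for re in range(-half, half + 1)
            for im in range(-half, half + 1)]

grid = candidate_grid()
```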
    • Channel matrix pre-processing Ordering of the channel matrix
      • Since most of the branches of the SE are going to be removed to design the FSE, the following considerations must be taken into account:
      • - The mean number of visited nodes per tree level is inversely proportional to .
          • - This effect is more relevant at the top levels of the tree, since T i decreases with i.
      • The following ordering strategies have been considered best:
      • - V-BLAST-like iterative algorithm from the MIMO detection literature, based on minimizing the norm of the pseudoinverse of the precoding matrix.
      • - Simple non-iterative ordering of the columns of the precoding matrix according to their norm.
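The non-iterative strategy amounts to a single sort. The sketch below is a minimal illustration: the sort direction is not stated on the slide, so `descending` is an assumed parameter, and the function name is hypothetical.

```python
import math

def order_columns_by_norm(P, descending=True):
    """Reorder the columns of P (list of rows) by their Euclidean norm."""
    cols = range(len(P[0]))
    norms = [math.sqrt(sum(abs(row[c]) ** 2 for row in P)) for c in cols]
    perm = sorted(cols, key=lambda c: norms[c], reverse=descending)
    P_ordered = [[row[c] for c in perm] for row in P]
    return P_ordered, perm

# Hypothetical 2x2 example: column norms are sqrt(2) and 5
P = [[1, 3], [1, 4]]
P_ordered, perm = order_columns_by_norm(P)
```

The same permutation has to be applied to the data symbols so that each symbol stays paired with its column.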
    • Channel matrix pre-processing: Ordering of the channel matrix
      • [Figure: averaged values of u_ii at each level for the different orderings.]
      • [Figure: averaged number of evaluated nodes at each level.]
    • Channel matrix pre-processing Effect of ordering on the number of evaluated nodes: 6x6 System with 16-QAM modulation
    • Simulation results
      • Multiuser setups considered:
        • 4x4
        • 6x6
        • 8x8
      • Tree configurations:
        • n_4x4 = [1, 1, 2, 5]
        • n_6x6 = [1, 1, 1, 2, 3, 4]
        • n_8x8 = [1, 1, 1, 1, 2, 2, 3, 4]
      • Rayleigh fading channel, constant within each block.
      • 16-QAM modulation
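The fixed complexity implied by each tree configuration can be worked out directly. The sketch assumes the search starts at the last entry of n and that every kept child counts as one evaluated node; the function name is hypothetical.

```python
from math import prod

def fse_complexity(n):
    """Return (complete paths, evaluated nodes per level) for configuration n."""
    nodes = []
    branches = 1
    for n_i in reversed(n):            # last entry of n is the top of the tree
        branches *= n_i
        nodes.append(branches)
    return prod(n), list(reversed(nodes))

paths_4, nodes_4 = fse_complexity([1, 1, 2, 5])
paths_6, nodes_6 = fse_complexity([1, 1, 1, 2, 3, 4])
paths_8, nodes_8 = fse_complexity([1, 1, 1, 1, 2, 2, 3, 4])
```

Under these assumptions the three setups evaluate 10, 24 and 48 complete paths respectively, independently of the channel realization.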
    • Simulation results Number of visited nodes:
    • Simulation results
      • BER comparison:
    • FPGA implementation and optimization: Implemented VP FSE algorithm
      • 6 x 6 system
      • 16-QAM modulation
      • Tree configuration vector
      • 3 pipeline stages
      • Restricted group (5 x 5 = 25 points) of integers instead of the lattice.
      • Channel ordering, which is carried out every transmission block, has not been considered.
      • Distance computation: PED (partial Euclidean distance) and AED (accumulated Euclidean distance)
    • FPGA implementation and optimization Algorithm implementation Implemented using Xilinx System Generator for DSP
    • FPGA implementation and optimization Special features
      • PED 6
        • The n_6 = 6 closest points to each symbol s_6 are known beforehand.
        • Due to symmetries, the set of 6 points can be computed by mapping each symbol to its equivalent in the first quadrant and varying the sign of the set for the equivalent point accordingly.
      • PED 5
        • A 2-point slicer is needed to compute the closest 2 points.
        • First closest point:
        • Second closest point:
        • or
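The symmetry argument for PED 6 can be sketched with a small lookup table. The sketch assumes that the stored points are the grid values a minimizing |s + τa|, that τ = 8, and that the constellation is 16-QAM with levels ±1, ±3; the table layout and function names are hypothetical.

```python
def closest_perturbations(symbol, tau, grid, count):
    """Direct search: the count grid points a with the smallest |symbol + tau*a|."""
    return sorted(grid, key=lambda a: abs(symbol + tau * a))[:count]

def closest_by_symmetry(symbol, table):
    """Look up the first-quadrant equivalent, then restore the signs."""
    sign_re = 1 if symbol.real >= 0 else -1
    sign_im = 1 if symbol.imag >= 0 else -1
    folded = complex(abs(symbol.real), abs(symbol.imag))
    return [complex(sign_re * a.real, sign_im * a.imag) for a in table[folded]]

tau = 8                                    # assumed modulo constant
grid = [complex(re, im) for re in range(-2, 3) for im in range(-2, 3)]
levels = (1, 3)                            # first-quadrant 16-QAM levels
table = {complex(re, im): closest_perturbations(complex(re, im), tau, grid, 6)
         for re in levels for im in levels}

points = closest_by_symmetry(-3 + 1j, table)
```

Only 4 of the 16 entries need to be stored; the other 12 follow from sign changes.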
    • FPGA implementation and optimization
      • 274 multipliers required → prohibitive for low-cost FPGA implementation.
      • A series of hardware optimizations has been proposed to reduce the number of required embedded multipliers.
      • Optimization 1: Rearrangement of complex multiplications
        • Initial system → 4 multipliers and 2 adders
        • Alternative complex multiplication → 3 multipliers and 5 adders
        • Required number of multipliers after OPT. 1 → 224
      • Optimization 2: Hard quantization
        • If the values of u_ij/u_ii are quantized to a very small number of bits, and the multiplications required to compute z_i are implemented in programmable logic, the number of multipliers drops to 74, although the number of required slices increases slightly.
        • A small performance degradation is introduced.
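One well-known rearrangement with exactly 3 multiplications and 5 additions is shown below. Whether the presented design uses this particular factorization is an assumption.

```python
def complex_mult_3(a, b, c, d):
    """Compute (a + jb)(c + jd) with 3 real multiplications and 5 additions."""
    k1 = c * (a + b)          # addition 1, multiplication 1
    k2 = a * (d - c)          # addition 2, multiplication 2
    k3 = b * (c + d)          # addition 3, multiplication 3
    real = k1 - k3            # addition 4: ac - bd
    imag = k1 + k2            # addition 5: ad + bc
    return real, imag

real, imag = complex_mult_3(3, -2, 5, 7)   # (3 - 2j) * (5 + 7j)
```

The trade is one multiplier for three extra adders, which suits devices where embedded multipliers are the scarce resource.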
    • FPGA implementation and optimization
      • Optimization 3: Approximated Euclidean distance
        • Replace the ℓ2-norm calculation performed to obtain the PEDs with a simpler method:
        • 1.- The Manhattan distance metric (ℓ1-norm)
        • 2.- The metric
        • Both of these techniques introduce a small BER performance degradation. However, after the implementation of OPT3, the number of multipliers is reduced to 30.
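The saving can be seen by comparing the two metrics on a complex value. This is a minimal sketch: how the approximate distances are scaled and accumulated along the tree is not stated on the slide and is not modelled here.

```python
def squared_euclidean(z):
    """Exact squared distance: needs two multiplications."""
    return z.real ** 2 + z.imag ** 2

def manhattan(z):
    """Manhattan approximation: absolute values and one addition only."""
    return abs(z.real) + abs(z.imag)

z = 3 - 4j
exact = squared_euclidean(z)
approx = manhattan(z)
```

The Manhattan value never underestimates the true norm |z| and exceeds it by at most a factor of sqrt(2), which is consistent with only a small performance loss.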
    • FPGA implementation and optimization Optimization 4: Simplified 2-point slicer
      • So far, deciding which of the two candidates was the second closest point required the computation of both distances.
      • A new technique that does not require any extra distance calculation has been derived. It distinguishes three cases:
        • Interior
        • Edge
        • Vertex
      • No performance loss after OPT4.
      • The total number of multipliers is 22
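A distance-free 2-point slicer can be sketched for an unbounded integer grid as follows. It is an illustration of the idea, not the slide's rule: the interior, edge and vertex handling for the restricted 5 x 5 set is not reproduced, and the function name is hypothetical. The second closest point is the neighbour along the axis with the larger rounding residual.

```python
import math

def two_point_slicer(z):
    """Return the closest and second closest complex integers to z."""
    first = complex(math.floor(z.real + 0.5), math.floor(z.imag + 0.5))
    r = z - first                          # rounding residual, each part within 0.5
    if abs(r.real) >= abs(r.imag):
        step = 1 if r.real >= 0 else -1    # move along the real axis
    else:
        step = 1j if r.imag >= 0 else -1j  # move along the imaginary axis
    return first, first + step

first, second = two_point_slicer(1.2 - 0.4j)
```

Only a comparison of two absolute values is needed, so no multiplier is spent on choosing between the two candidates.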
    • FPGA implementation and optimization: Summary of results. The performance loss derived from the implementation of the optimization strategies is just 0.2 dB at a BER of 10^-4. As for the HW resources, a reduced-complexity implementation has been achieved.
    • Summary and conclusions
      • The FSE (fixed sphere encoder) achieves close-to-optimal BER performance for VP systems with:
        • Reduced complexity. Only a small subset of tree branches is analyzed in comparison to the SE.
        • Fixed architecture and throughput. The number of nodes and branches to be computed is fixed and can be parallelized.
      • A 6x6 FPGA implementation has been presented which achieves good performance and throughput.
      • Implementation and optimization issues have been presented which show that the sphere encoder is less sensitive to quantization and suboptimal design choices than its MIMO detection counterpart.
    • End Thank you for your attention!! You can send any comments/requests/questions to: Dr. Mikel Mendicute [email_address]