

**Master Thesis** 

Alberto Sonnino

Institut für Technik der Informationsverarbeitung



**Directors** 

Prof. Dr.-Ing. Dr. h. c. J. Becker, Prof. Dr.-Ing. E. Sax, Prof. Dr. rer. nat. W. Stork

**Supervising Tutors**

M. Tech G. Shalina, Dipl.-Ing. P. Figuli

Institut für Technik der Informationsverarbeitung (ITIV)

### Performance Driven Optimizations in FPGA Based QAM Systems

KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association

www.kit.edu

# **Introduction and Motivation**



- Challenges in the current trend
  - Pursuit of high SNR and high data rate
  - Contribution toward future terabit communication
  - FPGAs are clocked below 1 GHz: need for parallelism






- My work: performance optimization of a QAM transmitter
  - Exploiting parallelism
  - FPGA platform
  - Mixed-domain (time and frequency) approach
- Current state of the art
  - 2012: 128.6 MHz achieved (Tongji University, Shanghai, China) [2]
    - Transmitter and receiver
    - On Xilinx Virtex IV
  - 2013: 625.0 MHz achieved (University of Paderborn, Germany) [3]
    - Only transmitter
    - On Xilinx Virtex VI
  - 2015: 750.0 MHz achieved (E2v Semiconductors, UK) [4]
    - Only transmitter and no filter
    - On Xilinx Virtex VI




#### Hardware Choice

- FPGA, chosen for its configurability, flexibility and cost
- Growing technology

#### Modulation Choice

- Quadrature Amplitude Modulation (QAM)
- Allows carrying many bits per symbol

#### Filter Choice

- Avoid inter-symbol interference (ISI)
- Finite impulse response (FIR)
- Square Root Raised Cosine (SRRC)
- No filter optimizations in this work [5]



# Outline

- Introduction and Motivation
- Fundamentals
  - Standard transmission chain
  - Fundamentals of each block
- Concepts & Methodology
  - Strategy
  - Ideal model
- Implementation
  - Implementation of each block
- Experimental Results
  - Achieved precision
  - Achieved performance
- Summary & Further Improvements



#### Standard Transmission Chain



#### Focus of this work

- QAM mapper
- Filter
- Modulator





#### QAM Mapper

- M-QAM formats (M = {…
- Clusterization in log<sub>2</sub>(…
- Gray code for Hamming…
- Rectangular constellation…
- Large M implies high…
- But symbol misinterpretation…





- Modulator
  - Local oscillator deliver…
  - Multiplication and sub…



#### Fourier Transform

- Decomposes a signal into an alternative representation
- The Discrete Fourier Transform (DFT) maps the signal into the Fourier domain
- The Inverse Discrete Fourier Transform (IDFT) maps it back

$$X[k] = \sum_{n=0}^{N-1} x[n] e^{-2\pi i k n/N} \qquad k \in \mathbb{Z}$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{2\pi i k n/N} \qquad n \in \mathbb{Z}$$

- Linear operations have an equivalent in the Fourier domain
- Useful for this work: convolution becomes multiplication

$$\begin{aligned} \mathcal{F}\{f \ast g\} &= \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} = F \cdot G \\ f \ast g &= \mathcal{F}^{-1}\{\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\} = \mathcal{F}^{-1}\{F \cdot G\} \end{aligned}$$
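The convolution property can be checked numerically. The sketch below is an illustration only, not the thesis code: it assumes NumPy, uses made-up sample values, and relies on the fact that for the finite DFT the property holds for circular convolution.

```python
import numpy as np

def circular_convolution(f, g):
    """Direct circular convolution of two equal-length sequences."""
    n = len(f)
    return np.array([sum(f[m] * g[(k - m) % n] for m in range(n))
                     for k in range(n)])

def fourier_convolution(f, g):
    """Same operation in the Fourier domain: IDFT(DFT(f) * DFT(g))."""
    return np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))

# Illustrative values only (not taken from the thesis)
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.25, 0.0, 0.0])

direct = circular_convolution(f, g)
via_fourier = fourier_convolution(f, g).real  # imaginary part is rounding noise
```

Both paths give the same samples up to floating-point rounding, which is what allows the filter to be applied as a pointwise multiplication between the DFT and the IDFT.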



#### Filter


- Satisfying the Nyquist criterion avoids ISI
- Pulse-shaping filter limits the transmission band
- FIR filter: linear phase, inherent stability, no feedback
- Matched filter improves SNR (if the noise is purely stochastic)
- Good compromise: SRRC filter
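For reference, SRRC taps can be generated from the standard closed-form impulse response. This is a minimal sketch assuming NumPy; the function name `srrc_taps` and the roll-off, oversampling and span values are hypothetical choices for illustration, not parameters from this work.

```python
import numpy as np

def srrc_taps(beta, sps, span):
    """Square-root raised-cosine FIR taps, normalized to unit energy.

    beta: roll-off factor (0 < beta <= 1)
    sps:  samples per symbol
    span: filter length in symbols
    """
    half = span * sps // 2
    t = np.arange(-half, half + 1) / sps  # time axis in symbol periods
    h = np.empty_like(t)
    for i, ti in enumerate(t):
        if np.isclose(ti, 0.0):
            # Limit of the closed form at t = 0
            h[i] = 1.0 + beta * (4.0 / np.pi - 1.0)
        elif np.isclose(abs(ti), 1.0 / (4.0 * beta)):
            # Limit of the closed form at t = +/- 1/(4*beta)
            h[i] = (beta / np.sqrt(2.0)) * (
                (1.0 + 2.0 / np.pi) * np.sin(np.pi / (4.0 * beta))
                + (1.0 - 2.0 / np.pi) * np.cos(np.pi / (4.0 * beta)))
        else:
            num = (np.sin(np.pi * ti * (1.0 - beta))
                   + 4.0 * beta * ti * np.cos(np.pi * ti * (1.0 + beta)))
            den = np.pi * ti * (1.0 - (4.0 * beta * ti) ** 2)
            h[i] = num / den
    return h / np.sqrt(np.sum(h ** 2))

# Hypothetical parameters: roll-off 0.25, 8 samples per symbol, 8 symbols long
taps = srrc_taps(beta=0.25, sps=8, span=8)
```

The taps are symmetric around the center, which is what gives the FIR filter its linear phase.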



# **Concepts & Methodology**



#### Strategy

- Reference MATLAB model
- Identify which part to implement in frequency domain
- Prototype a single-channel (non-parallel) transmitter
- Optimize for Xilinx Virtex 7
- Generic model with parallelization and scalability







#### Implemented System



#### Data Packing

- Parallel inputs/outputs packed into the same bus
- Precision fixed to 16 bits
- Each *data<sub>i</sub>* is a 16-bit vector
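The packing scheme can be modeled in software as below. This is a sketch under assumptions: the slides do not state the slot ordering or number format, so placing sample 0 in the lowest 16 bits and using signed two's-complement values are hypothetical choices, as are the function names.

```python
def pack_bus(samples, width=16):
    """Pack signed `width`-bit samples into one wide integer.

    Assumption: sample 0 occupies the lowest `width` bits of the bus.
    """
    mask = (1 << width) - 1
    bus = 0
    for i, s in enumerate(samples):
        bus |= (s & mask) << (i * width)  # two's-complement truncation
    return bus

def unpack_bus(bus, n, width=16):
    """Recover n signed `width`-bit samples from the packed bus."""
    mask = (1 << width) - 1
    out = []
    for i in range(n):
        raw = (bus >> (i * width)) & mask
        # Re-apply the sign if the top bit of the slot is set
        out.append(raw - (1 << width) if raw >> (width - 1) else raw)
    return out

samples = [1, -2, 32767, -32768]
bus = pack_bus(samples)
restored = unpack_bus(bus, len(samples))
```

With N parallel channels the bus is 16N bits wide, matching the output width listed under Characteristics.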







#### Specifications

| Category   | Name   | Description               |
|------------|--------|---------------------------|
| Latency    |        | 17 cycles                 |
| Parameters | N      | Number of parallel inputs |
|            | FORMAT | QAM format                |
| Inputs     | clk    | Clock                     |
|            | reset  | Reset                     |
|            | in     | Clustered stream          |
| Outputs    | tvalid | Validity flag             |
|            | out    | Output data               |

Block diagram: Transmitter (parameters N, FORMAT) with inputs in, clock, reset and outputs tvalid, out.

#### Characteristics

- Input width: (FORMAT x N)
- Output width: 16N
- Uses 2N<sup>2</sup> complex multipliers, 4N<sup>2</sup>-2N adders and 4N multipliers



#### QAM Mapper



- Three parameters (N, W, FORMAT): number of inputs, bus width, QAM format
- Supports 8-QAM, 16-QAM, 32-QAM and 64-QAM
- Each format implemented in a separate Verilog file
- Generates only the circuit for the desired format



#### DFT & IDFT



- One parameter (N): number of inputs
- No parallel DFT / IDFT Xilinx IP cores available yet
- Each one uses N<sup>2</sup> complex multipliers and 2N(N-1) adders
- Rescaling by 2<sup>17</sup> to fit the 16-bit bus
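The multiplier count follows from computing the DFT directly as a matrix-vector product: each of the N outputs sums N products, giving N² complex multiplications and N(N-1) complex additions, i.e. 2N(N-1) real adders. The sketch below is a floating-point model assuming NumPy; it does not model the 16-bit fixed-point rescaling, and the function names are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """N x N matrix of twiddle factors W[k, m] = exp(-2*pi*i*k*m/N)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    return np.exp(-2j * np.pi * k * m / n)

def direct_dft(x):
    """Fully parallel DFT: one complex product per matrix entry."""
    return dft_matrix(len(x)) @ np.asarray(x, dtype=complex)

def direct_idft(X):
    """Inverse transform: conjugated twiddles and a 1/N scaling."""
    n = len(X)
    return np.conj(dft_matrix(n)) @ np.asarray(X, dtype=complex) / n

# Illustrative input for N = 16 parallel samples
x = np.cos(2 * np.pi * np.arange(16) / 16) + 0.5j
X = direct_dft(x)
x_back = direct_idft(X)
```

For N = 16 the matrix has 256 entries, one complex multiplier each, which is why the DSP utilization grows quadratically with N.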



#### Filter



- One parameter (N): number of inputs
- Frequency domain: simple multiplication with filter coefficients
- Uses 2N multipliers
- Rescaling by 2<sup>16</sup> to fit the 16-bit bus



#### Modulator



- One parameter (N): number of inputs
- Uses 2N multipliers and N adders (configured in subtracter mode)
- Rescaling by 2<sup>16</sup> to fit the 16-bit bus
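Two multipliers and one subtracter per channel correspond to the usual quadrature modulation s = I·cos − Q·sin. A minimal floating-point sketch, assuming NumPy; the carrier frequency, sample rate and sample values are placeholders rather than values from this work.

```python
import numpy as np

def modulate(i_samples, q_samples, carrier_hz, sample_rate_hz):
    """Quadrature modulation: s[n] = I[n]*cos(phi[n]) - Q[n]*sin(phi[n])."""
    i_samples = np.asarray(i_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    n = np.arange(len(i_samples))
    phase = 2.0 * np.pi * carrier_hz * n / sample_rate_hz
    # Two multiplications and one subtraction per output sample
    return i_samples * np.cos(phase) - q_samples * np.sin(phase)

# Placeholder values for illustration
i_in = np.array([3.0, 1.0, -1.0, -3.0])
q_in = np.array([1.0, -1.0, 3.0, -3.0])
s = modulate(i_in, q_in, carrier_hz=100.0, sample_rate_hz=1600.0)
```

The result equals the real part of (I + jQ)·e^{jφ}, so only the two products that contribute to the real part need to be computed.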



#### Fourier QAM Modulator (FQM) Utility



# **Experimental Results**



#### Test Conditions

- N = 16, 100 Hz carriers
- Different configurations of the adder and multiplier cores
- All supported QAM formats

#### Design Precision

Less than 1% error with respect to the MATLAB reference model






#### Final Result

Adders implemented in the fabric and multipliers in DSP slices



Effective speed of 16 × 62.5 MHz = 1 GHz (instead of 750 MHz [4])

Throughput per modulation format:

| Format | Throughput                 |
|--------|----------------------------|
| 8-QAM  | 3 × 16 × 62.5 MHz = 3 Gb/s |
| 16-QAM | 4 × 16 × 62.5 MHz = 4 Gb/s |
| 32-QAM | 5 × 16 × 62.5 MHz = 5 Gb/s |
| 64-QAM | 6 × 16 × 62.5 MHz = 6 Gb/s |
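The throughput figures are the product of bits per symbol, number of parallel channels and clock frequency. The helper below reproduces that arithmetic; the function name is hypothetical, and the defaults are the N = 16 and 62.5 MHz values quoted above.

```python
import math

def throughput_gbps(qam_order, n_parallel=16, clock_mhz=62.5):
    """Throughput in Gb/s: log2(M) bits x N channels x clock frequency."""
    bits_per_symbol = int(math.log2(qam_order))
    return bits_per_symbol * n_parallel * clock_mhz / 1000.0

rates = {m: throughput_gbps(m) for m in (8, 16, 32, 64)}
```

Each step up in QAM order adds one bit per symbol and therefore 1 Gb/s at this channel count and clock.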

# **Summary & Further Improvements**



#### Topic

- Performance optimization of QAM transmitter
- Exploiting parallelism using a mixed-domain approach
- Achieved during this term
  - Familiarization with Xilinx tools
  - Understanding of the underlying physical concepts
  - MATLAB simulation and prototyping of a single-channel transmitter
  - Building and optimizing the parallel design
  - Scalable generic model
- Further improvements
  - Implement FFT instead of DFT (or wait for next Xilinx release)
  - Reduce the DSP utilization to allow N = 32
  - Support additional modulation formats

# Bibliography



- [1] The End of Moore's Law? Why It Matters
  - TIMnovate, Prof. S. Maital
  - https://timnovate.wordpress.com/2015/01/23/the-end-of-moores-law-why-it-matters/
- [2] FPGA Implementation of High-throughput Complex Adaptive Equalizer for QAM Receiver
  - Siqiang Ma, Yong'en Chen
  - Tongji University, Shanghai, China
  - 2012
- [3] The Influence of Laser Phase Noise on Carrier Phase Estimation of a Real-Time 16-QAM Transmission with FPGA Based Coherent Receiver
  - A. Al-Bermani, C. Wördehoff, O. Jan, K. Puntsri, M. F. Panhwar, U. Rückert, R. Noé
  - University of Paderborn, Paderborn, Germany
  - 2013
- [4] A high speed transmission system using QAM and direct conversion with high bandwidth converters
  - Marc Stackler, Andrew Glascott-Jones, Nicolas Chantier
  - E2v Semiconductors
  - 2015
- [5] Parametric Design Space Exploration for Optimizing QAM Based High-speed Communication
  - S. Percy George Ford, P. Figuli and J. Becker
  - IEEE/CIC International Conference on Communications in China
  - 2015



# Thank you for your attention!