Introduction

Emerging broadband wireless protocols, based on WiMax and its derivatives, demand ever higher throughput and data rates. The fast chip-rate and digital RF processing these protocols require are best implemented in hardware using FPGAs. FPGAs are a high-performance, cost-effective way to implement the digital functionality of these physical layer protocols because they include the following resources:
This article focuses on the first resource, DSP multiplier blocks. By reducing the number of DSP multiplier blocks used in FFT and FIR functions and implementing them optimally, the designer can minimize resources while still meeting throughput requirements. This allows users to migrate to the most cost-effective FPGA devices available. The four reduction techniques are as follows:
Table 1 - Four Techniques for Multiplier Reduction in WiMax System Design (DUC/DDC: digital upconversion/downconversion)

These four multiplier-saving techniques are described in the following sections.

Efficient Complex Multiplication for OFDM Functions in WiMax

One key feature of WiMax system design is the support for orthogonal frequency division multiplexing, or OFDM. FPGAs make it straightforward to implement OFDM transmitters and receivers in discrete time using the IFFT (inverse FFT) and the FFT (fast Fourier transform), respectively. Some protocols, such as 802.16a (fixed WiMax), require a specific FFT size of 256 points. Other protocols require a range of FFT sizes (802.16e, mobile WiMax), or a scalable FFT size that adapts to dynamic channel and bandwidth requirements (Scalable OFDMA).

Complex Multiplication

The most efficient use of multipliers when implementing 256- and 1024-point FFTs is through a radix-4 structure. FFT algorithms are decomposed via the reuse of 4-point discrete Fourier transform (DFT) butterfly structures. For example, a 16-point FFT is implemented with two stages of radix-4 DFT structures (16 = 4^2) using decimation-in-time, decimation-in-frequency or other related forms of decomposition. The first stage consists of four 4-point DFTs and the second stage also consists of four 4-point DFTs. Because the outputs of each DFT must be multiplied by three phase factors before being fed to the next stage, a total of nine phase factors between the first and second stages require nine complex multiplications.

At first glance, a complex multiplication requires four multipliers and two adder/subtractors. However, the expression can be rewritten in a form that requires only three multiplications, three adders and two subtractors. Note that the adders are implemented in the FPGA core logic, using the abundant general-purpose Programmable Logic Cell (PLC) slices in ripple mode.
If D = Dr + jDi is the complex data and C = Cr + jCi is the complex coefficient, the standard expression for the complex multiplication is as follows:

R = D • C = (Dr + jDi) • (Cr + jCi) = Rr + jRi
The above standard expression requires the use of four multipliers. Expanded into real and imaginary parts, it reads:

Rr = Dr*Cr - Di*Ci
Ri = Dr*Ci + Di*Cr

This expression can be algebraically rearranged so that the products Dr*Ci and Di*Cr are shared between the real and imaginary parts. The new expression for the complex result is:

Rr = ((Dr + Di)*(Cr - Ci)) + (Dr*Ci - Di*Cr)
Ri = Dr*Ci + Di*Cr

Only three multiplications remain: (Dr + Di)*(Cr - Ci), Dr*Ci and Di*Cr. As shown schematically in Figure 1, the optimal complex multiplication is implemented with 3 multipliers, 3 adders and 2 subtractors. Note that add/subtract blocks use less die area than 18x18 multiplier blocks in an FPGA.
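The rearrangement can be checked numerically. The following is a minimal Python sketch, not the hardware implementation: the function names `complex_mul_3` and `complex_mul_4` are hypothetical, and integer inputs are assumed so that the two forms can be compared exactly.

```python
def complex_mul_4(dr, di, cr, ci):
    """Standard form: four real multiplications, one adder, one subtractor."""
    return dr * cr - di * ci, dr * ci + di * cr


def complex_mul_3(dr, di, cr, ci):
    """Rearranged form: three real multiplications, three adders, two subtractors."""
    p1 = (dr + di) * (cr - ci)  # multiplier 1, fed by one adder and one subtractor
    p2 = dr * ci                # multiplier 2, shared by Rr and Ri
    p3 = di * cr                # multiplier 3, shared by Rr and Ri
    rr = (p1 + p2) - p3         # one adder and one subtractor
    ri = p2 + p3                # one adder
    return rr, ri


if __name__ == "__main__":
    # (3 + 4j) * (1 + 2j) computed both ways
    print(complex_mul_4(3, 4, 1, 2), complex_mul_3(3, 4, 1, 2))
```

For the nine complex multiplications of the 16-point example, this form needs 27 real multipliers instead of 36. In hardware, the sums Dr + Di and Cr - Ci each grow by one bit, so the first multiplier must accept slightly wider operands than the other two.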
In summary, the 25% reduction in multiplier usage (three multipliers instead of four) offers a choice between two benefits:

1. Reduction of multipliers to achieve the same FFT throughput
2. Increase of FFT throughput with the same number of multipliers

Efficient FIR Filter Implementation in Digital Upconverters/Downconverters

The next three multiplier-saving techniques involve the implementation of digital upconversion and downconversion in FPGAs. This has become an area of focus for optimization as wireless designers attempt to address the need to move data from very high frequency sample rates to chip processing rates. Digital down/up converter (DDC/DUC) sub-systems are some of the main digital components of the transmitter/receiver functionality within a base station, and historically have been implemented with expensive analog/mixed-signal components. There are three techniques that can be used to reduce the number of multipliers in an FPGA implementation:

1. Coefficient symmetry of FIR filters saves multipliers
2. Distributed arithmetic operations use block memory (EBR)
3. Cascaded integrator-comb filters use adders

Up-Conversion / Down-Conversion Overview

As shown in the upper portion of Figure 2, the DDC is composed of the following components: an I/Q splitter, based on a numerically controlled oscillator (NCO) and two mixers, that mixes the input signal coming from the RF section with sine and cosine waves; and a decimation section that can be configured either as 3 levels of FIR decimation filters or as an FIR decimation filter followed by a cascaded integrator-comb (CIC) filter.
The DUC in the lower portion of Figure 2 is composed of the following components: an interpolation section built either from 3 levels of FIR interpolation filters or from a CIC filter followed by an FIR interpolation filter; and an I/Q mixer, based on an NCO and two mixers, that modulates the I and Q output signals before they are sent to the RF section. Remember, decimation involves the removal of samples to reach a lower sample rate, while interpolation involves adding interpolated samples to increase the sample rate.

General Implementation Guidelines for Converters

The DDC/DUC system is a multiplier-intensive system. Decimation and interpolation filters are typically implemented by an array of multipliers and adders, and the mixer function is a multiplier. An area-efficient method to implement the NCO is based on phase shifting using complex multipliers. The first step in overcoming the multiplier-intensive system challenge is to split and cascade the filters:
Once the filter steps are defined, there are techniques to reduce the number of multipliers in the actual filters. These are discussed in the next section.

Three Specific Multiplier Saving Techniques for Converters

1. Symmetric decimation and interpolation filters

The symmetry of the coefficients of the DDC decimation filters and DUC interpolation filters can be used to achieve up to a 50% multiplier reduction. When the filter is symmetric, the FIR coefficients h(0), h(1), …, h(n) satisfy h(k) = h(n-k) for 0 <= k <= n. Since h(k) = h(n-k), the two corresponding samples can be added first and their sum multiplied once by h(k). The number of multipliers required can therefore be reduced by a factor of up to 2 (for an even number of coefficients). In FPGAs, inexpensive ripple-mode logic is available to implement the addition of the two data samples that share the same coefficient.

2. FIR filters implemented with EBR memory blocks via distributed arithmetic functions

Efficient use of FPGA resources is extremely important for multiplier-intensive applications such as DDC or DUC. Using memories and LUT fabric resources as multipliers can significantly increase implementation efficiency. EBRs and the fabric's distributed memories can be used as FIR filter multipliers using the technique of distributed arithmetic, also known as the soft multiplier technique. Using this technique, the effective number of multipliers available in an FPGA device can typically be increased by a factor of 2 to 5. Figure 3 demonstrates how an EBR can be used to implement an FIR filter using a distributed arithmetic technique. The samples are serially shifted onto the EBR address bus, one bit position at a time. Inside the EBR there is a table of pre-computed results: the multiplication and summation of each input sample bit (address bit) with its appropriate coefficient. The accumulator accumulates the n intermediate results (n is the sample bit resolution) and provides a complete FIR filter result after n clock cycles.
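The first two techniques can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not a description of any particular IP core: all function names are hypothetical, samples are assumed to be unsigned integers, and a Python list stands in for the EBR lookup table. `window[k]` is the sample that is multiplied by `h[k]`.

```python
def fir_direct(h, window):
    """Direct form: one multiplication per coefficient."""
    return sum(c * x for c, x in zip(h, window))


def fir_symmetric(h, window):
    """Folded form for symmetric h: add each sample pair first, then multiply once."""
    taps = len(h)
    acc = 0
    for k in range(taps // 2):
        # pre-adder followed by a single multiplier shared by h[k] and h[taps-1-k]
        acc += h[k] * (window[k] + window[taps - 1 - k])
    if taps % 2:
        # odd number of coefficients: the centre tap has no partner
        acc += h[taps // 2] * window[taps // 2]
    return acc


def build_da_table(h):
    """Table contents: for each pattern of one bit per tap, the sum of the selected coefficients."""
    taps = len(h)
    return [sum(h[k] for k in range(taps) if (addr >> k) & 1)
            for addr in range(1 << taps)]


def fir_distributed(table, window, bits):
    """Bit-serial distributed arithmetic: one lookup and one shift-add per sample bit.

    Assumes every sample is an unsigned integer below 2**bits.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(window):
            addr |= ((x >> b) & 1) << k   # bit b of every sample forms the address
        acc += table[addr] << b           # weight the partial sum by 2**b
    return acc
```

With `h = [1, -2, 5, 5, -2, 1]` and `window = [3, 7, 0, 15, 9, 4]`, all three functions return the same result; the folded form uses three multiplications instead of six, and the distributed form uses none. Because the table grows as 2 to the power of the number of taps, a longer filter would normally be split across several smaller tables whose outputs are added.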
3. CIC filter uses adders instead of multipliers

Replacing part of the interpolation/decimation FIR filter chain with Cascaded Integrator-Comb (CIC) filters is another method to reduce the number of multipliers needed for implementation. CIC filters have no multipliers. They are based on adders and subtractors. Digital up/down converters usually require a large rate change, on the order of a few hundred. Interpolation or decimation filters with high rate-change factors tend to be very expensive in hardware. CIC filters, also called Hogenauer filters, can serve as inexpensive high-factor decimation or interpolation filters [1]. They are used to achieve arbitrary and large rate changes in digital systems and can be implemented efficiently by using only adders and subtractors. Because FPGAs have fast carry chains for implementing adders, a CIC filter is well suited to FPGA implementation. The structure and characteristics of the integrator and comb are listed in Table 2.
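As a rough illustration of why a CIC filter needs no multipliers, the following Python sketch models a CIC decimator. The function name `cic_decimate` and its parameters are hypothetical, a differential delay of 1 is assumed, and register widths are ignored: Python integers do not overflow, whereas hardware relies on sufficiently wide wrap-around registers.

```python
def cic_decimate(samples, rate, stages):
    """CIC decimator sketch: `stages` integrators, downsample by `rate`, `stages` combs."""
    integ = [0] * stages
    decimated = []
    for i, x in enumerate(samples):
        for s in range(stages):
            integ[s] += x            # integrator: running sum, adder only
            x = integ[s]
        if (i + 1) % rate == 0:      # keep every `rate`-th integrator output
            decimated.append(x)

    prev = [0] * stages
    out = []
    for y in decimated:
        for s in range(stages):
            # comb at the low rate: current value minus previous value, subtractor only
            y, prev[s] = y - prev[s], y
        out.append(y)
    return out


if __name__ == "__main__":
    # One stage, decimate by 4: each output is the sum of a block of four inputs
    print(cic_decimate([1, 2, 3, 4, 5, 6, 7, 8], rate=4, stages=1))
```

A single stage behaves like a moving sum over `rate` input samples followed by downsampling; cascading stages sharpens the response, still using only additions and subtractions. The output is scaled by `rate ** stages`, so a real design would account for that gain when sizing registers.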
Implementing Converter and OFDM Functions Using Intellectual Property Cores

It is fairly simple to implement DDC or DUC converters in Lattice FPGAs, due to the availability of the constituent components as IP cores. One application that uses CIC filters as interpolators in digital rate conversion is shown in Figure 4, which shows the use of a CIC interpolator for up-conversion in digital radio applications.
The digital up-converter uses the following IP core configurations:
Advantages of LatticeECP2/M

The LatticeECP2/M family of low-cost FPGAs has several high-performance features that are very relevant to WiMax system design. Very few, if any, of these features are found in other low-cost FPGA families; instead, they are only found in expensive high-end FPGA families:
These abundant, high-performance resources are found in the low-cost LatticeECP2/M family, at price points that are far lower than other FPGA devices. The WiMax system designer is also able to use several design techniques to reduce the number of DSP multipliers required, thus providing opportunities to migrate to smaller, less expensive FPGA devices.

September 20, 2007