Evaluating Performance – FPGAs vs. DSPs

The latest digital signal processing-enhanced FPGAs boast huge gate counts, ample amounts of hard-wired SRAM, and an abundance of hard-wired multipliers. These attributes hint at the potential for phenomenal performance in digital signal processing applications. But experienced engineers know that the difference between potential performance and actual performance can be huge.

Simplified Metrics Don’t Cut It

Experienced engineers also know to view the performance claims of chip vendors with skepticism. This cynicism definitely holds true when evaluating the digital signal processing performance of FPGAs. For example, FPGA vendors sometimes quote the digital signal processing performance of their chips in terms of MACs (multiply-accumulates) per second. On the surface, this approach seems reasonable; after all, many digital signal processing algorithms make heavy use of MACs, and DSP vendors are also fond of quoting processor performance in terms of MACs per second. But, like other oversimplified performance metrics, such as MIPS, the MACs-per-second measurement suffers from several serious flaws, not the least of which is that no universal definition exists for exactly what a MAC operation comprises.

When FPGA vendors quote MACs-per-second numbers for their devices, they often base their figures on distributed-arithmetic implementations of FIR (nonrecursive) filters. Distributed arithmetic is a natural way to implement a FIR filter on an FPGA. The problem is that in a distributed-arithmetic FIR, the MAC operators rely on the fact that the coefficients of the filter are constants. If the MAC operations in your algorithm don’t use constant coefficients (for example, if you’re building an adaptive filter or a correlator), the FPGA will deliver lower MACs-per-second performance, perhaps much lower. How, then, should you go about determining the true digital signal processing application performance of these new FPGAs?
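To make the constant-coefficient dependence concrete, here is a minimal Python sketch of a distributed-arithmetic dot product, the core of one FIR output sample. It is an illustration under stated assumptions, not any vendor's implementation: the function names (build_da_lut, da_dot, direct_dot), the 4-tap coefficient set, and the 8-bit sample width are all invented for the example. The lookup table is computed once from the fixed coefficients, after which each output needs only table lookups, shifts, and adds.

```python
def build_da_lut(coeffs):
    """Precompute, for every pattern of one bit per tap, the sum of the
    coefficients whose bit is set. This is only possible because the
    coefficients are constants known ahead of time."""
    taps = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << taps)
    ]


def da_dot(lut, samples, bits):
    """Dot product of the constant coefficients with `samples`, where each
    sample is a signed integer that fits in `bits` bits (two's complement).
    No multiplications are performed on the samples."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into a table address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b
        if b == bits - 1:
            acc -= term  # the sign bit carries negative weight
        else:
            acc += term
    return acc


def direct_dot(coeffs, samples):
    """Reference multiply-accumulate implementation for comparison."""
    return sum(c * x for c, x in zip(coeffs, samples))


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]          # hypothetical fixed filter taps
    window = [5, -7, 0, 127]        # hypothetical 8-bit input samples
    lut = build_da_lut(coeffs)
    assert da_dot(lut, window, bits=8) == direct_dot(coeffs, window)
```

Because the table stores sums of the coefficients, changing any coefficient means rebuilding the table (16 entries here, doubling with every added tap). That is why the shortcut does not carry over directly to adaptive filters or correlators, where the "coefficients" change at run time.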
How to measure the true digital signal processing performance of these FPGAs is the question that BDTI (Berkeley Design Technology) recently began considering. For years, the company has been benchmarking the signal-processing performance of DSPs and general-purpose processors using the BDTI Benchmarks™, a suite of digital signal processing algorithm kernels (for example, an FFT and a Viterbi decoder) coupled with a methodology designed to ensure fair comparisons. The company has evaluated more than 50 processors with the BDTI Benchmarks™, so using the same benchmarks for FPGAs seemed to be an attractive approach that would allow the company to quickly compare FPGAs with those and other processors. Unfortunately, BDTI found that this approach was the wrong way to go.

Holistic Optimization

Although a handful of algorithms dominates the computation requirements of a typical digital signal processing application, individual algorithm kernels are unsuitable as benchmarks for high-capacity FPGAs, for several reasons. With a processor, digital signal processing application developers aggressively optimize each of the key performance-hungry algorithms for speed. When an algorithm is running, it has exclusive use of all of the processor’s computation resources. But with an FPGA, designers have the flexibility to trade off parallelism (and hence performance) against resources used (logic blocks and multipliers, for example). Unlike with a processor, when using an FPGA it makes no sense to use all of the available resources for a single algorithm, because doing so would leave no resources for the rest of the application. Instead, the designer must optimize the application as a whole, allocating the available hardware among each of the constituent algorithms (Table A).
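The whole-application trade-off can be sketched as a small search problem. The Python below is a toy model, not BDTI's methodology: the kernel names, the "resource units", the cycle counts, and the function best_allocation are all hypothetical, invented only to show why tuning one kernel in isolation is the wrong goal on an FPGA. Each kernel offers several design points (more hardware, fewer cycles), the kernels run as pipeline stages, and the slowest stage limits throughput.

```python
from itertools import product

# Hypothetical design points per kernel: (resource units used, cycles per block).
# All figures are invented for illustration; they are not measured data.
OPTIONS = {
    "fir":     [(2, 400), (4, 200), (8, 100)],
    "fft":     [(4, 600), (8, 300), (16, 150)],
    "viterbi": [(3, 500), (6, 250), (12, 125)],
}


def best_allocation(options, budget):
    """Choose one design point per kernel so that the slowest stage is as
    fast as possible without exceeding the resource budget. Ties are broken
    in favor of the cheaper allocation. Returns None if nothing fits."""
    names = list(options)
    best = None
    for choice in product(*(options[name] for name in names)):
        cost = sum(units for units, _ in choice)
        if cost > budget:
            continue
        bottleneck = max(cycles for _, cycles in choice)
        if best is None or (bottleneck, cost) < (best[0], best[1]):
            best = (bottleneck, cost, dict(zip(names, choice)))
    return best


if __name__ == "__main__":
    bottleneck, cost, plan = best_allocation(OPTIONS, budget=20)
    print(f"bottleneck: {bottleneck} cycles, resources used: {cost}")
    for name, (units, cycles) in plan.items():
        print(f"  {name}: {units} units, {cycles} cycles")
```

With these made-up numbers and a budget of 20 units, the fastest FFT option alone (16 units) leaves too little for the other two kernels, so the search settles on mid-range design points for all three. A kernel-only benchmark would reward exactly the allocation that the full application cannot afford.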
The need to optimize the application as a whole led the company to the conclusion that, to evaluate the digital signal processing performance of FPGAs, it needed a benchmark that looked more like a complete application and less like a single algorithm kernel. The next question facing BDTI was what kind of application to use as the basis of the benchmark. Informal market research indicated that the new digital signal processing-enhanced FPGAs were of strong interest to developers of communications systems, leading BDTI to develop a benchmark based on a communications receiver.

Cost vs. Cost/Performance

Although they are powerful, the latest digital signal processing-enhanced FPGAs are also expensive. Even relatively inexpensive members of Xilinx’s Virtex-II and Altera’s Stratix families cost hundreds of dollars, and the most expensive family members cost thousands of dollars per chip. Such prices render these chips unsuitable for highly cost-constrained products, such as cable-TV set-top boxes or DSL modems. But in communications infrastructure equipment, a several-hundred-dollar price tag does not automatically disqualify a chip, especially if that chip can handle the processing for many communications channels. Therefore, for the initial benchmarking of digital signal processing-enhanced FPGAs, BDTI evaluated how many channels of a communications receiver each of the benchmarked chips could support.

The benchmark comprises a simplified OFDM (orthogonal frequency division multiplexing) receiver (Figure A). OFDM is a complex technique that is finding increasing use in a variety of high-speed data communications applications, such as fixed-location wireless systems. A detailed benchmark specification spells out all of the key parameters of the benchmark receiver, such as sample rates, filter lengths, and channel-code constraint lengths. Engineers implement as many channels of the receiver as they can cram into a single chip.
The Benchmark

BDTI invited Altera and Xilinx to implement the BDTI Communications Benchmark™ (OFDM) on their digital signal processing-enhanced FPGAs. Xilinx initially agreed but later withdrew from the project. If you’d like to see Xilinx results for the benchmark, e-mail BDTI at xilinx-benchmark@BDTI.com; BDTI will send Xilinx a summary of your requests. BDTI also invited Motorola and Texas Instruments to implement the benchmarks on their high-end DSPs, which target communications infrastructure equipment. Altera and Motorola took up the challenge, and each delivered a highly optimized implementation of the benchmark. BDTI has not had an opportunity to implement the benchmark on Stratix hardware, so it obtained preliminary results for the Altera chips from simulations. In contrast, BDTI obtained hardware-verified results for the Motorola MSC8101 DSP, which is based on the StarCore SC140 core. BDTI scrutinized both Altera’s and Motorola’s benchmark implementations for correct operation and conformance to specifications.

Surprising Results

It’s clear that the new digital signal processing-enhanced FPGAs, with dozens of hard-wired multipliers and RAMs and tens of thousands of configurable-logic blocks, are capable of vast parallelism in applications that are amenable to parallelization. As with a processor, though, an FPGA’s throughput in an application depends not only on how much work it can perform in one cycle but also on the clock speed it can attain. For processors, the maximum clock speed is easy to ascertain. But for a given FPGA, the clock speed attainable depends in part on what you’re doing with it. Altera used simulations to estimate the clock speed attainable on its Stratix FPGAs with its implementation of BDTI’s Communications Benchmark™ (OFDM). Although BDTI expected the digital signal processing-enhanced FPGAs to perform well, the company was surprised by just how well they did.
For example, the Altera Stratix 1S20-6 chip is expected to support about ten channels of BDTI’s Communications Benchmark. In contrast, the 300 MHz Motorola MSC8101 can support less than one channel (Table B).
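Channel counts only tell half the story until chip price is factored in. The short Python sketch below works through the cost-per-channel arithmetic using the 1,000-unit prices the article quotes for June 2003 ($325 for the 1S20-6 and $116 for the MSC8101). The exact channel counts are in Table B and are not reproduced here, so the values used (10 channels for the FPGA, and 1 channel as an optimistic upper bound for the DSP) are assumptions, and cost_per_channel is a hypothetical helper name.

```python
def cost_per_channel(price_usd, channels):
    """Chip price divided by the number of receiver channels it supports."""
    return price_usd / channels


# Prices as quoted in the article; channel counts are rounded assumptions.
fpga_cost = cost_per_channel(325.0, 10.0)    # "about ten channels"
dsp_cost_floor = cost_per_channel(116.0, 1.0)  # "less than one channel", so
                                               # the true figure is higher

throughput_ratio_floor = 10.0 / 1.0
cost_ratio_floor = dsp_cost_floor / fpga_cost

if __name__ == "__main__":
    print(f"FPGA: ${fpga_cost:.2f} per channel")
    print(f"DSP:  at least ${dsp_cost_floor:.2f} per channel")
    print(f"throughput advantage: at least {throughput_ratio_floor:.1f}x")
    print(f"cost/performance advantage: at least {cost_ratio_floor:.2f}x")
```

Under these assumptions the FPGA's lead shrinks from at least 10x in raw channels to roughly 3.5x or more per dollar, because the FPGA costs almost three times as much as the DSP.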
The 1S20-6 sells for $325 (for 1,000 unit purchases, as of June 2003). The MSC8101 sells for $116 in similar quantities, which is fairly typical of high-end DSPs. As a result, on a cost/performance basis, the advantage of the 1S20-6 over the MSC8101 is somewhat smaller than its throughput advantage, but FPGAs retain a significant lead. Full details on the results of BDTI’s FPGA-versus-DSP benchmarking work appear in BDTI’s report, FPGAs for DSP.

Other Factors

BDTI’s initial benchmarking work suggests that the new digital signal processing-enhanced FPGAs can indeed achieve impressive performance in certain types of DSP applications. But experience with these new devices and discussions with users indicate that factors other than performance often greatly influence decisions regarding FPGA use. For example, one key challenge facing digital signal processing developers using FPGAs is the relative complexity of the design process and the lack of digital signal processing-specific features in the development tools, compared with the tools available for the best-supported DSPs.

Clearly, as with most technology-selection choices, deciding whether to use an FPGA for a digital signal processing application requires a sophisticated, multidimensional evaluation, one that depends on many specifics of the target application. In researching BDTI’s recently completed report, the company developed a framework to guide developers in this challenging process. BDTI plans to continue its benchmarking and analysis, tracking the new generations of digital signal processing-enhanced FPGAs, processors, and other emerging alternatives.

Jeff Bier is general manager and co-founder of BDTI (Berkeley, CA), a well-known digital signal processing technology analysis and consulting company. BDTI’s Web site (www.BDTI.com) contains a wealth of free information of interest to developers of digital signal processing systems.
The company recently expanded its focus beyond traditional DSPs and microprocessors to include FPGAs.

October 7, 2003
All material on this site copyright © 2006 techfocus media, inc. All rights reserved.