These days, FPGAs are the rarely disputed champions of single-chip DSP processing power. A chip that can (conceptually, at least) crank out hundreds of multiplications per cycle at clock frequencies approaching half a gigahertz can crunch a staggering volume of data. No competing technology matches the raw GMACs numbers that FPGAs can claim. Often, however, a claim is as far as it goes, with significant design barriers standing between your elegantly parallelizable algorithm and the neatly arranged rows of multipliers and accumulators waiting on your FPGA. Sometimes the barrier is in the I/O ring of the chip, as it can be difficult to feed all those hungry multipliers with enough data to keep them busy.

Now, with their new 65nm Virtex-5 SXT family, Xilinx has added high-speed serial transceivers to the DSP-enabled FPGA mix. Along with beefed-up DSP blocks, additional RAM, and other goodies, this FPGA family hits a whole new level in potential DSP performance.

With Virtex-4, Xilinx began offering a variety of flavors in their FPGA families – some with more connectivity, some with more embedded computing resources, some with more DSP resources. With “last season’s” 90nm Virtex-4, Xilinx had three flavors: “LX,” which had the most basic logic fabric, “SX,” which had the most DSP resources, and “FX,” which had high-speed transceivers and embedded PowerPC processors. Now, with Virtex-5, they’ve expanded to four flavors, and transceivers are being offered in a much larger number of them – a testament to the proliferation of serial connectivity in today’s system designs.

With Virtex-4, Xilinx also learned a hard lesson about rolling out all the flavors of a new family at the same time. They had difficulty bringing all the versions of Virtex-4 to volume availability in a timely manner, frustrating some customers in the process.
With Virtex-5, the company is taking a new approach – announcing each family in sequence, closer to volume availability. This week, it was time for Virtex-5 SXT – the updated DSP version of the device, with the newly added “T” for transceivers.

Xilinx claims that SXT can deliver “352 GMACs at 550MHz” – a theoretical number you will definitely never see in any real-world design scenario. FPGA companies tend to estimate GMACs by multiplying the maximum number of multipliers (640 in this case) by the data sheet maximum clock frequency (550MHz for SXT): 640 × 0.55GHz = 352 GMACs. Of course, in the real world, you’re not going to have a design that uses exactly all 640 multipliers, runs at exactly the maximum 550MHz with no timing problems elsewhere in the design, and has a clean enough I/O and datapath scheme to keep those 640 multipliers fed and busy 100% of the time.

That caveat aside, you will probably still be able to get more real-world DSP performance than you can with any other silicon platform right now. The limiting factor will likely be your own design tools, time, and ability, rather than a lack of resources or performance available on the device. Efficiently mapping your DSP algorithm to FPGA hardware is much more complicated and involved than whipping up some software to run on a conventional DSP processor – but if you read FPGA Journal often, you already knew that.

Virtex-5 SXT is based on Xilinx’s new 65nm Virtex-5 fabric, including the new 6-input LUT and the new, more efficient and predictable routing architecture. The SXT family has a wealth of DSP blocks, this time featuring asymmetric 18×25 multipliers rather than the previous generation’s 18×18 ones. This means that wider operations can now be implemented in a single stage rather than by ganging two multipliers together – a potential power, performance, and area win.
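To make the single-stage point concrete, here is a minimal Python sketch of the arithmetic only. It is not a model of the actual DSP block. It assumes signed two's-complement operands, and the function names and the 17-bit split point are illustrative choices, not anything taken from Xilinx documentation. It compares one 25×18 multiply against the same product assembled from two 18×18 multiplies and an extra shift-and-add.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value fits in a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def multiply_25x18(a: int, b: int) -> int:
    """Model one 25x18 signed multiplier: a single multiply, no extra stage."""
    assert fits_signed(a, 25) and fits_signed(b, 18)
    return a * b


def multiply_with_18x18_pair(a: int, b: int) -> int:
    """Build the same product from two 18x18 signed multiplies plus a shift-add."""
    assert fits_signed(a, 25) and fits_signed(b, 18)
    a_low = a & 0x1FFFF   # low 17 bits, always non-negative
    a_high = a >> 17      # remaining upper bits, sign preserved
    # Both halves must fit an 18-bit signed multiplier input.
    assert fits_signed(a_low, 18) and fits_signed(a_high, 18)
    partial_low = a_low * b      # first 18x18 multiply
    partial_high = a_high * b    # second 18x18 multiply
    return (partial_high << 17) + partial_low  # combine in an adder stage


if __name__ == "__main__":
    a, b = -5_000_000, 100_000  # a needs 24 bits, b needs 18 bits
    print(multiply_25x18(a, b) == multiply_with_18x18_pair(a, b))
```

Both functions return the same product. The second simply spends two multiplies and an adder to get there, which is the kind of extra stage the wider multiplier is meant to avoid for operands of this size.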
It is important to remember that these same DSP blocks are available in other Virtex-5 families as well – the SXT version just has a higher ratio of DSP blocks to other resources. Xilinx achieved a significant dynamic power savings with these new blocks as well, claiming a 40% reduction over Virtex-4’s DSP blocks. Xilinx claims 1.4mW/100MHz for DSP operations – significantly less power per calculation than one would get running a traditional DSP algorithm on a processor-based architecture. Xilinx also claims a 35% improvement in logic speeds with the new technology, boosting the theoretical performance cap. The DSP blocks also include cascaded routing resources, designed so that many common DSP designs, such as filters, can be implemented with minimal regular FPGA fabric – relying on cascaded DSP blocks for almost all of the logic.

Often, DSP performance is limited by the speed with which data can be moved to and from memory. Xilinx has included up to 10.3Mb (megabits) of memory on the largest SXT device, and bandwidth for moving data into and out of on-chip memory is one of the least appreciated performance features of FPGAs. FPGAs inherently provide an enormous parallel pipe for shifting data to and from on-chip memory, along with incredible flexibility for partitioning memory resources for the most efficient use with your algorithm. Xilinx has also allowed for high-performance memory interfaces at the I/O ring when external memory capacity is needed.

SXT supports a number of I/O standards popular in DSP applications, including CPRI, OBSAI, SRIO, and PCIe, along with additional video and telecom standards. It includes hard-wired blocks for PCI Express and 10/100/1000 Ethernet. This I/O support is implemented with the same transceivers previously announced with the Virtex-5 LXT line. Unlike Virtex-4’s transceivers, these focus on lower power rather than on higher-performance standards like 10Gbps.

The SXT family has three members planned – SX35T, SX50T, and SX95T.
Of those, the middle-sized SX50T is shipping now. Xilinx also says that the full family is supported in the current ISE 9.1i release of their software development tool suite.

Xilinx has worked to provide a variety of design flows for DSP design with SXT, allowing designers to pursue their personal preferences. A traditional HDL-based (Verilog/VHDL) design flow is (of course) supported, along with a C++/MATLAB language-based flow and several different Simulink-based flows. With Xilinx’s System Generator, a mixed MATLAB/Simulink/C++ flow is also available. Xilinx has a large library of DSP-related IP available as well, so you can often be off and running without needing to jump into any HDL coding or new ESL methodologies right away.

While DSP design for FPGAs is still far from being a solved problem from a design tool perspective, more robust IP libraries and improved tools make the situation better every year.
February 6, 2007
All material on this site copyright © 2003-2007 techfocus media, inc. All rights reserved. FPGA and Structured ASIC Journal