


Accelerating VoIP applications using Virtex FPGAs

A. Tavoularis, M.G. Manousos, D. Economou, G. Lykakis
National Technical University of Athens

The emergence of VoIP has driven the development of a new protocol stack supporting telephony over IP networks, which appeals to end users because of its reduced cost. Voice samples are encoded and decoded according to standards such as G.711, G.726 and G.729, then encapsulated into RTP/UDP/IP streams that can be forwarded to an Ethernet or an xDSL interface.
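The encapsulation step can be illustrated with a short sketch. It packs the 12-byte RTP fixed header defined in RFC 3550, then adds the UDP header (8 bytes) and the IPv4 header (20 bytes, no options) to compute the per-packet overhead. The function names and the example SSRC value are illustrative assumptions. Payload types 0 (PCMU, i.e. G.711 μ-law) and 18 (G.729) are the static assignments of RFC 3551.

```python
import struct

RTP_HEADER = 12    # RTP fixed header, no CSRC list or extension
UDP_HEADER = 8
IPV4_HEADER = 20   # minimum IPv4 header, no options

PT_PCMU = 0   # G.711 mu-law, RFC 3551 static payload type
PT_G729 = 18  # G.729, RFC 3551 static payload type


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int, marker: bool = False) -> bytes:
    """Pack an RTP fixed header with V=2, P=0, X=0, CC=0 (illustrative helper)."""
    byte0 = 2 << 6                                      # version 2, no padding/extension/CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # marker bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def ip_packet_length(voice_payload: int) -> int:
    """Total IPv4 packet length for one voice frame carried in RTP/UDP."""
    return voice_payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER


# A 20 ms G.729 frame is 20 bytes (8 kbit/s x 20 ms / 8); at the 8 kHz RTP
# clock the timestamp advances by 160 per frame.
hdr = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD, payload_type=PT_G729)
print(len(hdr), ip_packet_length(20))  # 12 60
```

The 40-byte header is a fixed per-packet cost. Short G.729 frames therefore carry proportionally more overhead than longer G.711 frames.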

While G.711 is a low-complexity compression scheme, G.729 requires significant processing power, so external DSPs must be considered for real-time applications. External DSPs, however, add delay to software execution for two reasons. First, the driver must implement the low-level protocol for exchanging data with the external DSP. Second, DSP host buses usually operate at lower speeds than CPU buses.
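The complexity gap shows up clearly in code. G.711 μ-law encoding needs only a clip, an add and a few shifts per sample, as in the sketch below of the common bias-and-segment implementation (A-law is similar). G.729, a CS-ACELP codec, instead performs linear-prediction analysis and codebook searches on every 10 ms frame. The function name here is illustrative.

```python
ULAW_BIAS = 0x84    # 132, added so every magnitude falls into one of 8 segments
ULAW_CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed PCM sample as an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS   # 132..32767
    exponent = magnitude.bit_length() - 8                 # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F       # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF    # mu-law codes are inverted


print([hex(linear_to_ulaw(s)) for s in (0, -1, 32767, -32768)])
# ['0xff', '0x7f', '0x80', '0x0']
```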

In addition, the full path from a voice codec to an access interface demands substantial processing capability, because of the complexity of the protocol stacks that must be implemented. In the transmit direction, the formatted voice frames have to be uploaded to the CPU, encapsulated into RTP/UDP/IP packets and then forwarded to an Ethernet or xDSL link. For xDSL, IP packets are also encapsulated into PPP and formatted into AAL5 streams. This adds overhead to the CPU and reduces the number of voice channels the system can support. Implementing the whole system in software may seem the most obvious solution, but it degrades system performance, so a more sophisticated architecture is required.
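A little arithmetic shows the cost of the xDSL path. The sketch below assumes VC-multiplexed PPP over AAL5 (RFC 2364), where the only PPP overhead is a 2-byte protocol field. AAL5 then appends an 8-byte trailer and pads the result to a multiple of the 48-byte cell payload, and each cell is 53 bytes on the link. The constant and function names are illustrative assumptions.

```python
import math

RTP_UDP_IPV4 = 12 + 8 + 20   # RTP fixed header + UDP + IPv4 (no options)
PPP_OVERHEAD = 2             # assumption: VC-multiplexed PPPoA, protocol field only
AAL5_TRAILER = 8             # UU, CPI, Length (2 bytes), CRC-32 (4 bytes)
CELL_PAYLOAD = 48
CELL_SIZE = 53


def atm_cost(voice_payload: int) -> tuple[int, int]:
    """Return (ATM cells, bytes on the ATM link) for one voice frame."""
    cpcs_len = voice_payload + RTP_UDP_IPV4 + PPP_OVERHEAD + AAL5_TRAILER
    cells = math.ceil(cpcs_len / CELL_PAYLOAD)   # padding fills the last cell
    return cells, cells * CELL_SIZE


# 20 ms frames: G.729 -> 20 bytes, G.711 -> 160 bytes
print(atm_cost(20), atm_cost(160))  # (2, 106) (5, 265)
```

Under these assumptions, a 20-byte G.729 frame occupies 106 bytes on the link, more than five times its voice payload. Every such frame also repeats the same per-packet encapsulation work.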

MRG100 Architecture: Network processors are an emerging class of chips designed for wire-speed packet processing across technologies such as IP and ATM. They consist of a CPU and peripheral processing elements that implement low-level protocol functionality in hardware. This relieves the CPU of time-consuming tasks such as bit manipulation, protocol encapsulation and stripping, and syndrome generation/validation. This class of chips was originally aimed at the core network, but the spread of networking technologies into the home has led to their use in access devices such as IADs (Integrated Access Devices) and residential gateways. With this in mind, we developed the REGATE chip, a network processor incorporating various peripheral machines that implement protocol functionality in hardware, such as DES/TDES, ATM, AAL0/5 and HDLC. In this article, we present a subset of REGATE that was ported to a Virtex FPGA. REGATE will have its own on-chip processor, but the experiments used an external CPU, showing that this approach is flexible and easily adopted.
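HDLC zero-bit insertion is a typical example of the bit manipulation such peripheral machines take off the CPU. The transmitter inserts a 0 after every run of five consecutive 1s, so the flag pattern 01111110 can never appear inside a frame, and the receiver removes it. The sketch below works on lists of bits for clarity and assumes flags have already been stripped on receive. A hardware machine would typically do the same with a small run-length counter. The function names are illustrative.

```python
def hdlc_stuff(bits: list[int]) -> list[int]:
    """Insert a 0 after every run of five consecutive 1s (zero-bit insertion)."""
    out: list[int] = []
    ones = 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 5:
            out.append(0)   # stuffed bit
            ones = 0
    return out


def hdlc_destuff(bits: list[int]) -> list[int]:
    """Remove the 0 that follows every run of five 1s (inverse of hdlc_stuff)."""
    out: list[int] = []
    ones = 0
    skip = False
    for b in bits:
        if skip:            # this is the stuffed 0: drop it and restart the count
            skip = False
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 5:
            skip = True
    return out


data = [0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
stuffed = hdlc_stuff(data)
print(hdlc_destuff(stuffed) == data)  # True
```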

The whole system, shown in Figure 1, is based on the SA1110 processor. The SA1110 handles the higher-level protocols, while the lower-level protocols are implemented in hardware. The programmable DMA controller plays a crucial role in the system. Bus resources are shared between the CPU and the DMA controller according to the traffic needs at any given instant. DMA cycles can also be shared between the protocol machines using a programmable weighted round-robin mechanism. When a packet is received or transmitted, the DMA controller requests ownership of the bus and transfers data between system SDRAM and the respective machine.
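The weighted round-robin sharing of DMA cycles can be modelled in a few lines. In the sketch below, each protocol machine is visited in a fixed order and granted up to `weight` transfers per round while it has requests pending. Idle machines are skipped, so no slots are wasted. The machine names, weights and function are hypothetical; they do not represent the REGATE register interface or its exact arbitration logic.

```python
def wrr_grants(weights: dict[str, int], pending: dict[str, int], slots: int) -> list[str]:
    """Return the order in which DMA slots are granted to protocol machines.

    Machines are visited in the insertion order of `weights`; each receives up to
    `weight` consecutive grants per round while it still has pending requests.
    """
    pending = dict(pending)   # work on a copy of the request counters
    grants: list[str] = []

    def busy() -> bool:
        return any(w > 0 and pending.get(m, 0) > 0 for m, w in weights.items())

    while len(grants) < slots and busy():
        for machine, weight in weights.items():
            for _ in range(weight):
                if len(grants) == slots or pending.get(machine, 0) == 0:
                    break     # slot budget used up, or this machine is idle
                grants.append(machine)
                pending[machine] -= 1
    return grants


# Hypothetical machines: the DSP interface gets twice the share of each AAL5 direction.
weights = {"dsp_if": 2, "aal5_tx": 1, "aal5_rx": 1}
pending = {"dsp_if": 4, "aal5_tx": 1, "aal5_rx": 3}
print(wrr_grants(weights, pending, 8))
```

Reprogramming the weights changes each machine's share without touching the scheduling loop, which is the flexibility a programmable arbiter provides.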

Figure 1 : System Architecture

VoIP streams involve several processing stages that consume considerable processing power. Among other functions, the FPGA integrates a DSP interface and a low-level protocol that exchanges voice samples with the external DSP in both the receive and transmit directions. At regular intervals an interrupt is issued to the FPGA. The two sides then exchange the number of channels that have voice packets and the packet length for each channel. Data are read from or written to the DSP and stored in the FPGA Block RAM. The DMA controller then requests ownership of the bus to transfer the data to or from system SDRAM. In this way, the CPU is only notified that data are available in memory, and its operation is not delayed by the low speed of the DSP host interface.
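The per-interrupt exchange can be pictured as one burst. The burst first announces which channels carry voice packets and how long each packet is, followed by the packets themselves. The byte layout below is entirely hypothetical, since the article does not give the actual format. It only illustrates how a receiver might split such a burst before the frames are handed to the DMA controller.

```python
def build_dsp_burst(frames: dict[int, bytes]) -> bytes:
    """Hypothetical layout: N, then N (channel, length) byte pairs, then the frames."""
    header = bytearray([len(frames)])
    for channel, frame in frames.items():
        header += bytes([channel, len(frame)])   # one byte each, so both must be < 256
    return bytes(header) + b"".join(frames.values())


def parse_dsp_burst(burst: bytes) -> dict[int, bytes]:
    """Split a burst in the hypothetical layout above into per-channel frames."""
    n = burst[0]
    header = burst[1:1 + 2 * n]
    offset = 1 + 2 * n
    frames: dict[int, bytes] = {}
    for i in range(n):
        channel, length = header[2 * i], header[2 * i + 1]
        frames[channel] = burst[offset:offset + length]
        offset += length
    if offset != len(burst):
        raise ValueError("burst length does not match header")
    return frames


burst = build_dsp_burst({0: b"abc", 5: b"de"})
print(parse_dsp_burst(burst))  # {0: b'abc', 5: b'de'}
```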

Evaluation: In the transmit path, encoded voice samples acquired from the DSP are encapsulated by the CPU into RTP/UDP/IP packets. These are passed to the AAL5 machine and finally to the UTOPIA interface. In the receive direction, AAL5 packets are stripped and uploaded to system SDRAM. The CPU then removes the RTP/UDP/IP overhead, and the voice packets are passed to the DSP for decoding and playback. The experiments compared the system's performance with that of a legacy processor implementing both the low-level DSP protocol and the AAL5 processing in software. In both systems the CPU operating frequency was set to 100 MHz. The interval between DSP interrupts was set to 5 ms, which was found to be a compromise between end-to-end delay and FPGA buffer length.
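The interrupt interval determines how much voice data accumulates in the FPGA between transfers. The arithmetic below computes bytes per channel per interval from the codec bit rate. It then gives a per-direction buffer size for a hypothetical channel count, assuming a double buffer so the DSP can fill one half while the DMA drains the other. The double buffer, the channel count and the function names are assumptions for illustration. A longer interval means fewer interrupts, but both the buffer and the buffering delay grow in proportion.

```python
def bytes_per_interval(bit_rate_bps: int, interval_ms: int) -> int:
    """Voice bytes produced by one channel during one interrupt interval."""
    bits = bit_rate_bps * interval_ms
    if bits % 8000:
        raise ValueError("interval does not hold a whole number of bytes")
    return bits // 8000   # /1000 converts ms to s, /8 converts bits to bytes


def buffer_bytes(channels: int, bit_rate_bps: int, interval_ms: int, halves: int = 2) -> int:
    """Per-direction buffer size, assuming `halves`-way (double) buffering."""
    return channels * bytes_per_interval(bit_rate_bps, interval_ms) * halves


print(bytes_per_interval(64000, 5), bytes_per_interval(8000, 5))  # G.711, G.729: 40 5
print(buffer_bytes(8, 64000, 5))  # 8 hypothetical G.711 channels: 640
```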

Measurements covered DSP packet lengths from 10 bytes up to 100 bytes. This spans the range from G.729 to G.711 codecs and the most common packetization intervals (e.g. 10 ms, 20 ms, 30 ms). With the 40-byte RTP/UDP/IP header, this yields AAL5 packets of 50 to 140 bytes. The experimental results showed that implementing AAL5 in hardware can be 80 to 100 times faster than a software implementation, and that CPU engagement is reduced by a factor of about 200. Charts 1 and 2 show the execution time, in clock cycles, of AAL5 packetization and of AAL5 stripping respectively. Each chart covers the software implementation, the hardware implementation and the CPU bus utilization. Finally, Chart 3 shows the execution time, in clock cycles, of the low-level protocol for exchanging information with the DSP, implemented both in hardware and in software. This protocol is symmetric and executes in the same time in both directions, so only one chart is included.
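The per-packet work measured in Charts 1 and 2 can be sketched in software. Packetization pads the SDU so that the SDU plus the 8-byte trailer (UU, CPI, Length, CRC-32) fills a whole number of 48-byte cells. It then computes a CRC-32 over everything except the CRC field. Stripping verifies the CRC, checks the length field and removes the pad and trailer. This is a minimal illustration, not the software baseline the authors measured. Its bit-serial CRC loop shows the kind of per-byte work that a hardware AAL5 machine can perform in parallel logic. The function names are illustrative.

```python
import struct

CELL_PAYLOAD = 48
TRAILER = 8   # UU (1) + CPI (1) + Length (2) + CRC-32 (4)


def crc32_aal5(data: bytes) -> int:
    """CRC-32 as used by AAL5: poly 0x04C11DB7, MSB first, init/xorout 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def aal5_packetize(sdu: bytes, uu: int = 0) -> bytes:
    """Build a CPCS-PDU: SDU + pad + UU + CPI + Length + CRC-32."""
    if not 0 < len(sdu) <= 0xFFFF:
        raise ValueError("SDU length must be 1..65535 bytes")
    pad = -(len(sdu) + TRAILER) % CELL_PAYLOAD
    body = sdu + bytes(pad) + struct.pack("!BBH", uu, 0, len(sdu))   # CPI = 0
    return body + struct.pack("!I", crc32_aal5(body))


def aal5_strip(pdu: bytes) -> bytes:
    """Validate a CPCS-PDU and return the SDU; raise ValueError on any error."""
    if not pdu or len(pdu) % CELL_PAYLOAD:
        raise ValueError("PDU is not a whole number of cell payloads")
    (crc,) = struct.unpack("!I", pdu[-4:])
    if crc32_aal5(pdu[:-4]) != crc:
        raise ValueError("CRC-32 mismatch")
    (length,) = struct.unpack("!H", pdu[-6:-4])
    pad = len(pdu) - TRAILER - length
    if length == 0 or not 0 <= pad < CELL_PAYLOAD:   # length 0 marks an aborted PDU
        raise ValueError("invalid length field")
    return pdu[:length]


sdu = bytes(range(50))            # e.g. a 50-byte RTP/UDP/IP packet
pdu = aal5_packetize(sdu)
print(len(pdu), aal5_strip(pdu) == sdu)  # 96 True
```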

Chart 1 : AAL5 packetization



Chart 2 : AAL5 protocol stripping and validation



Chart 3 : DSP Protocol execution time vs Packet length

Conclusion: The presented architecture enhances system performance, accelerates protocol processing and frees CPU resources, so the system can support additional functions or VoIP channels. These benefits arise because CPU involvement in data transfers is minimized and protocol processing takes place in hardware. This approach can be used in the development of devices such as media gateways and VoIP phones.

Acknowledgement

The described work has been performed within the IST-2000-28429-REGATE R&D project co-funded by the European Commission.

Antonis Tavoularis holds a Diploma in Electrical and Computer Engineering from the University of Patras. He is currently a research associate at the Telecommunications Laboratory of NTUA, where he is working towards a PhD in the field of broadband access systems.

Michael G. Manousos is currently a research associate at the Telecommunications Laboratory of NTUA. His main research activities include embedded systems and quality control of VoIP applications.

Demetrios Economou holds a Diploma in Electrical and Computer Engineering from NTUA. He is currently a technical manager of the software development group at InAccess Networks S.A.

George Lykakis holds a Diploma in Electrical and Computer Engineering from NTUA. He is a senior engineer at InAccess Networks S.A. and has spent 8 years developing hardware and FPGAs for telecom access systems. His interests include embedded systems and digital VLSI design.

March 2, 2004


All material on this site copyright © 2006 techfocus media, inc. All rights reserved.
FPGA and Structured ASIC Journal