Stretch Goals
Bridging the DSP/FPGA Gap

BUUUUUUZZZZZZ. That’s your alarm clock. Time to get up at 0-dark-hundred and head to the airport for that early bird flight to San Jose (or “insert your city here”). You’re so familiar with this trip that the flight attendants greet you by name. Your status with the airline is something north of Super Platinum Plus. While it’s nice to feel, er, “special,” wouldn’t it be cool if there was another way to get to where you’re going? Imagine that you could get in your car at 6:05 a.m. and drive to your destination city as fast as the plane could get you there. No security scan required (you long ago stopped wearing your favorite belt because it sets off the sensor), no long-term parking and shuttle bus from hell, no insufferably long line at the coffee counter, and we won’t even touch on the random draw of people who regale you with stories of their travels once you’re in your assigned seat. All of that would be gone, replaced by the comfort of your car, your favorite CD, your custom lumbar settings. Best of all, you wouldn’t need any special training to learn how to make your car drive like a plane. No pilot’s license, no flight training. You would be in control, surrounded by what feels familiar, but you would be armed with completely new capabilities – in this case speed (and perhaps a really good radar detector…).

At the center of our frequent flyer fantasy is a simple notion. Take something totally familiar (your car) and enable it with unheard-of capabilities. This idea forms the foundation of the innovative technology at Stretch, Inc., a company that does for the semiconductor world what that souped-up car did for your commute.

In embedded system design there is a constant struggle between having a flexible programmable solution that can vary with changing market dynamics, and delivering the cost/performance results required to create a differentiated product. ASICs and ASSPs really aren’t an option here. The market moves too quickly. General-purpose processors and DSPs offer flexibility, but often fall short on performance in compute-intensive applications. You could consider combining a processor with an FPGA, with the FPGA doing the tough stuff, but that increases your design complexity and adds cost.

Stretch has created a software-configurable off-the-shelf processor that provides designers with a significant performance boost, helping bridge the technology gap between DSPs and FPGAs, while retaining important benefits of both. The Stretch processor has the ease of development and the flexibility of DSPs, but by embedding programmable logic entirely inside the processor architecture, it offers the performance of an FPGA. Essentially, it gives software designers the ability to accelerate their software algorithm at the C level, rather than having to offload this piece to the hardware designer.

C, C++, Señor

Stretch designed their processor with a RISC architecture in order to leverage existing optimization technology and to keep the programming environment familiar to designers. By allowing software designers to program, debug, and optimize performance in a software development environment, Stretch is able to provide the power of programmable logic acceleration to the much larger embedded developer market in a familiar, trusted format.

Stretch has been able to deliver superior performance by making one fundamental change to the current process – designers can now reduce an entire C language function into a single instruction and automatically generate a hardware accelerator in programmable logic to map to that instruction. Stretch calls it “crushing” software hot spots. Everything else about the software remains the same.

While this may not sound like a big deal, the result is significant. On a conventional processor, such as a DSP, a hot spot would likely be optimized in hand-written assembly code that spells out the sequence of processor operations one by one. On a Stretch processor, an entire hot spot – expressed in C/C++ – is reduced (or crushed) to a single instruction that the compiler can schedule along with the other instructions. According to Stretch, what would have been hundreds of instructions on other processors can be reduced to just one on the Stretch processor.
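
To make the idea of a hot spot concrete, here is a minimal sketch in plain C of the kind of small, self-contained function that could be a candidate for crushing. The function name sad16 and the choice of a sum of absolute differences are assumptions made for illustration; the code uses no Stretch-specific syntax and does not come from Stretch's tools or documentation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hot spot (illustration only, not Stretch syntax):
 * sum of absolute differences over two 16-byte blocks.
 * It reads fixed-size inputs and returns one value, with no side effects. */
static uint32_t sad16(const uint8_t a[16], const uint8_t b[16])
{
    uint32_t sum = 0;
    for (int i = 0; i < 16; i++) {
        /* Widen to int before subtracting so the difference can be negative. */
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    }
    return sum;
}
```

On a conventional processor this loop becomes a sequence of loads, subtracts, absolute values, and adds repeated 16 times. A function of this shape, with fixed-size inputs, one result, and no side effects, is the sort of thing the single-instruction approach is aimed at.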

Avoiding the need for the programmer to step into the hardware domain is key. In hardware, an HDL-savvy engineer would create state machines and pipelining to exact specifications, and then design everything around that specific microarchitecture. Once the design is complete, the engineer has to hope that decision was the right one, because changing it is not easy. The “everything” that was designed around it – including the controller, the interface protocol, etc. – would have to change as well. It would be a near-total rewrite of the HDL.

With processors, the situation is very different. Adding logic and increasing an instruction's latency by one or two cycles is no big deal. The compiler knows when an accelerated instruction has a longer latency, and will automatically reschedule the rest of the program around the new figure. Optimizing a program in software usually requires many iterative revisions to the accelerated instructions to arrive at the best combination, and because the Stretch compiler is mature enough to accommodate that iteration smoothly, the designer is relieved of the hassle of hardware-related rewrites.
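
A rough way to see why an extra cycle or two of latency matters so little is a toy cycle count. The sketch below is a simplified model, not a description of the Stretch pipeline: it assumes the instructions are independent of one another, fully pipelined, and issued one per cycle, and the function names are invented for illustration.

```c
#include <stdint.h>

/* Toy model (assumption, not the Stretch pipeline): cycles needed to
 * complete n independent, fully pipelined instructions issued one per
 * cycle, each taking `latency` cycles to produce its result. */
static uint64_t pipelined_cycles(uint64_t n, uint64_t latency)
{
    return n == 0 ? 0 : n + latency - 1;
}

/* The same work if every instruction had to wait for the previous
 * result, so that no overlap is possible. */
static uint64_t serialized_cycles(uint64_t n, uint64_t latency)
{
    return n * latency;
}
```

Under these assumptions, 1,000 instructions at a latency of 3 take 1,002 cycles, and raising the latency to 5 adds only two more. Without any overlap the same change would cost 2,000 extra cycles. This is why a scheduler that can fill the gap with other useful work makes latency changes cheap.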

“We offer the fastest way to get the most challenging computation problems done,” said Gary Banta, Co-founder and CEO of Stretch. “The key to this is the reduction of large functions into single instructions. Our technology is a very powerful model of development, using C, and it produces performance results beyond any processor.”

Shaken, Not Stirred

Mixing programmable logic with processors is not for the faint of heart. This challenge is actually what brought the co-founders of the company, Gary Banta and Albert Wang, together. Banta, a veteran in the processor industry, was working as an entrepreneur in residence at a Silicon Valley VC firm, exploring chip performance challenges facing the semiconductor industry. He had come to the conclusion years before that software would be the ultimate design methodology, which meant that processors had to be the platform for design rather than an HDL or hardware-based platform. It was this thinking that led him to Wang’s doorstep, at Tensilica.

Banta wanted to find a way to leverage an existing solution for his new venture rather than starting from scratch. He found the technology he was looking for at Tensilica. His original plan was to have Tensilica supply his future company with IP. During discussions with the company, he met Wang, chief engineer at Tensilica. Turns out they had both been thinking about the same kinds of problems and how to solve them. They believed that, together, they had an opportunity to develop a product with the ease of use of a processor programmable solution, but with much higher performance.

Which brings us back to the challenge of mixing programmable logic with processors. Each of the company’s processor chips is based on the Stretch S5 engine, which incorporates the Tensilica Xtensa V RISC processor core and Stretch Instruction Set Extension Fabric (ISEF). The ISEF is a software-configurable datapath based on proprietary programmable logic.

According to Stretch, the FPGA-like ISEF logic is designed specifically for implementing variable-sized ALUs, multipliers, and shifters – all datapath extensions to the processor. The ISEF computes complex functions in parallel, but it can be tailored by system designers to meet their needs. Using the ISEF, system designers extend the processor instruction set and define the new instructions using only their C/C++ code. As a result, developers get the performance of logic with the ease of C/C++ development.
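
As an illustration of why variable-sized datapath elements are attractive, consider an operation whose natural width does not match the processor's registers. The sketch below is a hypothetical example in plain C, not Stretch code: a saturating add on unsigned 12-bit values, which a fixed-width processor has to emulate with masks, a wider add, and a compare.

```c
#include <stdint.h>

/* Hypothetical example (not Stretch syntax): saturating add of two
 * unsigned 12-bit values held in 16-bit containers. */
static uint16_t sat_add_u12(uint16_t x, uint16_t y)
{
    /* Mask each input to 12 bits, then add in a wider type. */
    uint32_t sum = (uint32_t)(x & 0x0FFFu) + (uint32_t)(y & 0x0FFFu);
    /* Clamp to the largest 12-bit value instead of wrapping around. */
    return (uint16_t)(sum > 0x0FFFu ? 0x0FFFu : sum);
}
```

In programmable logic, the same operation could plausibly be built as an adder of the required width with the clamp folded in. That is the kind of datapath extension the paragraph above describes.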

One major difference between the ISEF and FPGAs is density. The ISEF programmable datapath is much denser than an FPGA. In the S5000, only the ISEF is configurable; the processor around it is a custom ASIC. Additionally, since ISEF configurations can be loaded and unloaded during operation, a small ISEF area can support a huge number of instructions, which makes it more efficient than an FPGA.

Extending to New Markets

By keeping pricing low (under $100 for the most robust of the solutions), Stretch is hoping to continue to expand its reach beyond compute-intensive applications. They point out that applying their technology would help just about any embedded system designer to increase their productivity.

“If you were to look at someone doing a good job of coding on a DSP or a RISC processor, they would focus their attention in the same places that they would with a Stretch processor,” said Albert Wang, Co-founder and CTO at Stretch. “In focusing their attention, they would often build reusable functions that they can call on. From the standpoint of looking at that function, they would be seeking to minimize the number of cycles in the function to speed it up as much as possible. That process involves their understanding the implications as it goes through the compiler. In that sense, we have the designers looking in the same place, but we give them the opportunity to take that entire function, implement it in hardware, and then begin to go through the process to see how effectively they used the hardware and the input bandwidth and the output bandwidth to produce that function most effectively. And they can iterate on that. Very quickly, they can see the achievement of vastly more work being done in each instruction.”

Stretch processors come in three flavors, all powered by the same engine. The S5000 processor is the base model; the S5620 processor targets PowerPC users; and the S5610 processor is for systems using 64-bit MIPS-based processors. Stretch provides an Integrated Development Environment (IDE) for its processors, as well as two development platforms.

There’s something to be said for the flexibility that a designer can experience when he or she is wed not to the permanence of silicon, but to the impermanence of programmable logic. It allows an unprecedented level of freedom to explore and create a more powerful solution. You may almost feel as though you can fly…

Amy Malagamba, FPGA and Programmable Logic Journal

August 2, 2005

All material on this site copyright © 2006 techfocus media, inc. All rights reserved.