Raising the Bar

Nallatech elevates FPGA-based system design

It’s fun sometimes to see how the pros do things. I find it inspiring (and a little bit humbling) to watch someone who’s good enough at what they do to throw the book away and use their intimate, almost intuitive understanding of the subject to accomplish great things with an elegance of simplicity that belies the difficulty of the task.

Nallatech Ltd. of Scotland is well known as one of the leading experts in Xilinx FPGAs. They provide a wide range of high-performance computing products on FPGA-based platforms and develop custom solutions for specific clients. In interviewing them about one of their recent projects, I immediately got the sensation of watching masters at work.

Their challenge was to design a complex multi-board image-processing system with a mass storage interface. They had to complete the design in only four months, combining digital signal processing (DSP), high-speed serial interfaces, and multiple system-on-chip designs with parallel processors and custom hardware and software components. What they brought to the party was a highly experienced team, previous designs from which they could borrow technology, and an established design process with the flexibility to incorporate innovation.

More specifically, Nallatech was redesigning a flexible image-processing platform by upgrading from a PCI-based design to a compact PC104-based design and adding a high-speed storage system. They had considerable previous experience designing with Xilinx FPGAs, and set their sights on several members of the Virtex-II family along with the MicroBlaze soft-core embedded processor. The project, originally a contract job, was based on some existing standard products. The optical interface, packaged in a separate module, was to consist of a Virtex-II 6000 part doing massively parallel DSP calculations, then feeding data through Cypress HOTLink transceivers to a compact PC104 card where high-speed data storage was to be accomplished. The front-end processing was to be user-configurable for targeting a variety of end-user image-processing applications.

BenNUEY PC104+

Probably the most challenging new component of the system was the high-speed storage module. Each module plugged into a carrier PC104 slot, and its LVDS serial connections had to be mapped to SCSI for the disk drives.

“The storage module was a nice application for the MicroBlaze embedded processor,” said Derek Stark, Senior Software Engineer at Nallatech. “We wanted to re-use off-the-shelf products as much as possible. We wrote routines in C for the MicroBlaze for asynchronous transfer including the packet handler and interpretation. We took the approach of first writing the whole application in C, then identifying parts that needed acceleration. The routines that needed acceleration were re-coded in RTL and moved to hardware implementations on the same Virtex-II device.”

OK, hold on a minute here. For those of us with an ASIC background, we just drove right off the road and halfway across the pasture. Sure, we’ve all been to trade shows and read about developing system-level models in C or some other high-level language, doing transaction-based simulation, then beginning our hardware/software partitioning based on various performance analyses. But in the real world, we usually just go back to our desks and bang out some RTL for the parts we know will be performance intensive, and then wait while the software guys finish the code that runs on the embedded processor. Nallatech just calmly and logically decided to use the Virtex-II/MicroBlaze combo as its own emulator, then did performance analysis on their entire application written in C to decide what to accelerate.

This approach to hardware/software partitioning allowed the system to be developed and debugged in the final hardware environment (at reduced speed), thus eliminating a host of problems setting up a virtual prototyping environment. They got by with a tiny fraction of the design tool budget that would be required for a conventional hardware/software co-design system, and they probably had their application up and running before most teams would have decided how to evaluate the tools that they’d need.
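To make the idea concrete, here is a minimal sketch in C of how a routine might start life in software and later be swapped for an accelerated version without disturbing its callers. This is not Nallatech's code. Every name in it (`packet_sum_sw`, `packet_sum_hw_model`, `use_accelerated_sum`, `handle_packet`) is hypothetical, the byte-sum routine is just a placeholder workload, and the "hardware" path is modeled in plain C so the sketch runs on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* A stage is called through a pointer so that its implementation can be
 * replaced later without changing the code that calls it. */
typedef uint32_t (*sum_fn)(const uint8_t *buf, size_t len);

/* Software-first version: plain C, easy to step through with breakpoints. */
uint32_t packet_sum_sw(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* ASSUMPTION: stand-in for an accelerated version. On a real target this
 * would hand the buffer to a custom hardware block and read back a result;
 * here it is modeled in C, consuming four bytes per step. */
uint32_t packet_sum_hw_model(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4)
        sum += (uint32_t)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
    for (; i < len; i++)          /* leftover bytes at the tail */
        sum += buf[i];
    return sum;
}

/* Currently selected implementation; the design starts out all-software. */
sum_fn packet_sum = packet_sum_sw;

/* Switch the stage over once the accelerated version exists. */
void use_accelerated_sum(void)
{
    packet_sum = packet_sum_hw_model;
}

/* Caller code is identical before and after the switch. */
uint32_t handle_packet(const uint8_t *buf, size_t len)
{
    return packet_sum(buf, len);
}
```

The point of the indirection is that the all-software version stays around as a functional reference: both paths can be fed the same data and their results compared.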

Over on the data capture side of the project, folks were using a fairly conventional FPGA design process, writing VHDL code for the Virtex-II 6000 devices, simulating with Aldec’s Active-HDL, and implementing using Xilinx’s ISE tools. The big Xilinx devices left ample room for later user configuration when the final application was known. The 18-bit multipliers in the Virtex-II chips were a natural fit for image processing. “If we’d gone with traditional DSP processors, we would have had an I/O bottleneck, and a number of them would have been required,” said Stark.
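For readers who want to see why those multipliers suit image work, the following is a small C model of a 3x3 filter kernel built from 18x18-bit signed multiplies. It is an illustrative sketch, not the project's VHDL: the function names, the 3x3 window size, and the filter coefficients used in the usage note are all assumptions. In an FPGA the nine multiplies could run in parallel, one per multiplier block; the C loop below performs them one after another.

```c
#include <assert.h>
#include <stdint.h>

/* Range of an 18-bit signed operand. */
#define OPERAND_MIN (-(1 << 17))
#define OPERAND_MAX ((1 << 17) - 1)

/* Model of one 18x18 signed multiply. The product needs at most 36 bits,
 * so it is held in a 64-bit integer here. */
int64_t mult18x18(int32_t a, int32_t b)
{
    assert(a >= OPERAND_MIN && a <= OPERAND_MAX);
    assert(b >= OPERAND_MIN && b <= OPERAND_MAX);
    return (int64_t)a * (int64_t)b;
}

/* Multiply-accumulate over a 3x3 pixel window (row-major order).
 * Hypothetical helper: returns the filter response for one output pixel. */
int64_t conv3x3(const int32_t window[9], const int32_t coeff[9])
{
    int64_t acc = 0;
    for (int i = 0; i < 9; i++)
        acc += mult18x18(window[i], coeff[i]);
    return acc;
}
```

As a usage example, a horizontal edge-detection kernel `{-1, 0, 1, -2, 0, 2, -1, 0, 1}` applied to a flat window (all nine pixels equal) gives 0, since there is no edge to detect.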

On the storage side, the quick reprogrammability of the MicroBlaze was an asset for aspects of the protocol that weren’t completely nailed down at design time. Changes and refinements could be worked into the C code and tested in the final system without even re-configuring the hardware. “The storage elements needed to take data streams in, process them, and put the data out to disk,” said Stark. “We could accomplish that 100% with the MicroBlaze, just not at speed. The MicroBlaze allowed us to update the program without reprogramming the device, drop in a test program in C, set breakpoints, and functionally debug by examining memory, variables and registers. This helped us debug the functionality very quickly. We also were able to do performance analysis and profiling that helped us decide what modules needed acceleration. There were several surprises in which pieces were the bottlenecks, and these may only have been uncovered as a result of the hardware/software partitioning approach we used.” Those modules were re-coded in RTL and implemented in hardware to get the design to run at speed.
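The profiling step Stark describes can be pictured with a short C sketch that takes per-stage cycle counts and picks out the stage most in need of acceleration. This is only an illustration under stated assumptions: the `stage_profile` record, the helper functions, and the stage names and cycle counts in the usage note are all invented for the example, and how the counts would be gathered on the target is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile record for one stage of the application. */
typedef struct {
    const char *name;    /* label for the stage */
    uint64_t    cycles;  /* processor cycles measured in that stage */
} stage_profile;

/* Sum of the cycles spent across all stages. */
uint64_t total_cycles(const stage_profile *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i].cycles;
    return total;
}

/* Index of the stage with the highest cycle count, or -1 if n is 0.
 * On a tie the earliest stage wins. */
int hottest_stage(const stage_profile *s, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || s[i].cycles > s[best].cycles)
            best = (int)i;
    return best;
}

/* Share of the total taken by stage i, as a whole percentage rounded
 * down. Returns 0 when nothing has been measured. */
unsigned share_percent(const stage_profile *s, size_t n, size_t i)
{
    uint64_t total = total_cycles(s, n);
    if (total == 0)
        return 0;
    return (unsigned)((s[i].cycles * 100) / total);
}
```

With made-up measurements of 120, 900, and 180 cycles for three stages, `hottest_stage` returns index 1 and `share_percent` reports 75 for it, which marks that stage as the first candidate for re-coding in RTL.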

“The software environment also allowed us to very quickly add features that were required, but not needed at hardware speeds,” Stark continued. Nallatech’s team consisted of one software developer working on the front-end interface, three hardware designers developing RTL code and simulating with Active-HDL, one designer developing firmware and the MicroBlaze disk controllers, one engineer handling the customer interface, and two PCB/mechanical designers working on the board and chassis, which had to be ruggedized.

The FPGA count was two Virtex-II 6000 devices for the image capture module, two Virtex-II 3000 devices for control, and four Virtex-II 1000 devices in the storage system.

BenNUEY PC104+ used as a platform for an on-board Avionics system.

Obviously, this is a highly complex and performance-intensive system for such a short development cycle. The thing that stands out the most is the simplicity, the novelty, and the success of the design process Nallatech employed. “We didn’t want to depend on simulation,” said Stark. “We simulated the RTL code with testbenches, then put the module in hardware coupled to the MicroBlaze and captured responses. We used Xilinx’s ChipScope ILA to capture and debug problems in hardware after that.”

Nallatech felt that, even with their experience, the storage system portion of the design was surprisingly quick. The design cycle consisted of two months to get the flow all the way through the system, then a month or two of fine tuning with a three to four month overall schedule. “Understanding the SCSI protocol itself helped immensely,” said Stark, “as well as the flexibility of the devices we used. Hardware we could base on legacy designs such as the optical module also reduced our development time.”

The back end of the FPGA portion of the design was done mostly with Xilinx-supplied tools, including XST for logic synthesis. For the C compilation, the team used Microsoft’s Visual Studio C++ tools and Xilinx’s Embedded Development Kit (EDK). The project also utilized Nallatech’s Field Upgradeable Systems Environment (FUSE), which serves as “middleware,” handling data transfer to and from the host and abstracting the details of the FPGA-based communication from the software layer.

The team experienced no significant timing closure issues on the project. A few iterations were required for serializing data and high-speed links, but overall the hardware (running at 80 MHz with serial links up to 200 MHz) ran fairly easily at speed, owing probably to effective hardware/software partitioning and experienced RTL coding on hardware modules.

The biggest obstacles the Nallatech team encountered were the initial planning and familiarization tasks such as understanding the schedule specifications, gaining familiarity with the MicroBlaze processor and environment, and understanding the microprocessor-based data flow. There were also some issues that arose on the cabling front that were easily conquered.

While Nallatech’s project seems conventional enough overall, their use of the FPGA as its own debug and emulation environment and also as a software/hardware partitioning and embedded software environment should make industry experts take notice. The combination of a talented design team, a high-performance/high-density FPGA with embedded processor, and embedded debugging tools is an extremely powerful and productive design and implementation path. One could easily speculate that accomplishing this same design with an ASIC-based solution, or with discrete processors, could have required an order of magnitude more development time, engineering cost, and tools investment.

As is often the case, engineering expertise and experience with a technology reap huge rewards in productivity, and often make the difference between technical and economic success and failure in a design project.

Kevin Morris, FPGA and Programmable Logic Journal

March 16, 2004

