Physics Drives Physical into the Mainstream
New demands on design tools

You’ve finished the RTL for your FPGA design. You’re targeting the latest SRAM FPGA family and using the works: embedded soft-core processor, internal memory with external RAM interfaces, a DSP datapath leveraging built-in hardwired multipliers, and the FPGA vendor’s PCI core. After weeks of architectural and RTL work, it finally simulates correctly, and you’re ready to go to synthesis and place-and-route. It should be smooth sailing from here.

The first run through synthesis (after you got rid of all the non-synthesizable constructs and entered all those timing constraints) looks like not-so-bad news. You have three paths not meeting timing, all with minimal negative slack. We’ll call them A, B, and C. Since they’re within 5% of the constraint, you figure you’ll just dash on into place-and-route and see what happens. (You read our “synthesis shootout” article and know not to put too much stock in pre-layout timing estimates anyway.)
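For reference, slack is simply the constraint minus the estimated path delay, and "within 5%" means the shortfall is small relative to the clock period. The following is a minimal sketch; the 10 ns clock and the delays for paths A, B, and C are made-up numbers, since the scenario above gives none.

```python
# Hypothetical numbers: a 10 ns clock constraint and invented path delays.
CLOCK_PERIOD_NS = 10.0

def slack_ns(path_delay_ns, period_ns=CLOCK_PERIOD_NS):
    """Positive slack means the path meets timing; negative means it fails."""
    return period_ns - path_delay_ns

def overshoot_pct(path_delay_ns, period_ns=CLOCK_PERIOD_NS):
    """How far a failing path exceeds the constraint, as a percentage of it."""
    return max(0.0, -slack_ns(path_delay_ns, period_ns)) / period_ns * 100.0

# Pre-layout estimates for the three failing paths (illustrative only).
estimated_delays_ns = {"A": 10.3, "B": 10.45, "C": 10.1}

failing = {name: d for name, d in estimated_delays_ns.items() if slack_ns(d) < 0}
near_misses = sorted(name for name, d in failing.items() if overshoot_pct(d) <= 5.0)
```

With these assumed delays, all three paths fail, but each overshoots the constraint by less than 5%, which is the situation that tempts you to press on into layout.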

Your first hint that it might be a bad day comes when place-and-route is still running as you leave your desk for lunch. You’d hoped it would be done so you could look at the results before you took your break. The plot begins to thicken when you come back from lunch and place-and-route is still chugging away. You double-check the process monitor and see that your machine is apparently fine. The tool is just running longer than expected. You stay an extra hour after work to see if it completes by the end of the day, and then leave, a little concerned that there might be a problem.

Your first hint that it might be a bad week comes the next morning when your layout job is still running. In mid-morning, when you’re about to kill it and try again, you’re surprised when it finishes and presents you with a timing report. The news is discouraging. There are now eleven paths that do not meet timing, some by as much as 20% over the clock cycle. You also notice that paths A, B, and C are not among the suspects, and that paths D through N are now at fault. These are obviously not the results you’d hoped for, but at least you’ve got a starting place with accurate timing numbers. You can now go back and change that part of the RTL that you always suspected was sloppy, and try again with some knowledge of what’s going to happen downstream. The good news is that your design fits on the part and routed 100%.

Your first hint that it might be a bad month comes the next day when you see that the newly re-written design now has fifteen critical paths, none of which were in the original results. Three weeks and 35 iterations through synthesis and place-and-route later, it becomes clear that the hints and harbingers have hatched, and you’re now in full-blown design project hell.

Our “Mr. Moore’s Wild Ride” article made it clear that we’ve entered a new era in FPGA design. With 130nm and 90nm FPGAs hitting their stride, the rules of the game have changed enough to demand a change in our design approach. With 90nm design rules, the commanding share of delay along a path is caused by the interconnect between logic blocks and not by the blocks themselves. These delays are not well-behaved. You can’t just add up the delay from the logic elements between registers and lump in a little parasitic and hope to be close. With the massive scale of the high-density devices facilitated by these smaller geometries, the interconnect delay is highly variable and almost completely a function of the physical implementation of your design.
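To make the point concrete, here is a rough sketch of why summing logic delays is no longer enough. All numbers are hypothetical; the only idea taken from the text is that the same logic can see very different routing delays depending on where it lands.

```python
# All delays are hypothetical, in nanoseconds.
LOGIC_DELAYS_NS = [0.4, 0.4, 0.4]  # three logic levels between two registers

def path_delay(logic_delays, net_delays):
    """Register-to-register delay: every logic block plus every net on the path."""
    return sum(logic_delays) + sum(net_delays)

def interconnect_share(logic_delays, net_delays):
    """Fraction of the total path delay spent in routing."""
    return sum(net_delays) / path_delay(logic_delays, net_delays)

# Same netlist, two placements. Three logic levels give four nets:
# register -> logic -> logic -> logic -> register.
compact_nets_ns = [0.5, 0.6, 0.5, 0.6]    # blocks placed next to each other
scattered_nets_ns = [0.5, 2.8, 0.6, 3.1]  # blocks spread across the device

compact_total = path_delay(LOGIC_DELAYS_NS, compact_nets_ns)
scattered_total = path_delay(LOGIC_DELAYS_NS, scattered_nets_ns)
```

The logic contribution is identical in both cases. Only the placement changed, yet under these assumed numbers the scattered version is more than twice as slow, and routing accounts for most of the delay either way.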

Each time you run through layout in a traditional design flow, the entire design is thrown up in the air and re-placed. If the design or constraints or tool options are even slightly changed, you may get a completely new and unrecognizable result. If your design has tight constraints and millions of gates of logic, you can end up in exactly the scenario described above: unconstrained iteration that never converges through a lengthy sequence of design tools.

Since most companies cannot tolerate project schedules that are plus-or-minus several thousand percent (depending on the layout lottery), a better solution is required. Several months ago, in our “Getting Physical” article, we surveyed the emerging trend of using ASIC-like physical synthesis and floorplanning tools in FPGA design. Given the latest generation of parts, this idea has now moved into the mainstream.

EDA and FPGA vendors are seeing a rapidly increasing adoption of physical tools for FPGA design. While these tools tend to be on the expensive end of the spectrum for programmable logic, they are much cheaper than blowing project schedules or spending 30% more for silicon by requiring a higher speed grade.

Mentor Graphics recently announced its Precision Physical solution for the design of high-density FPGAs after several years of development and customer trials. “When we first developed this technology,” says Dr. Peter Suaris, Chief Scientist for Mentor Graphics’ Synthesis Group, “we were focused on automating the timing closure process for difficult designs. What we quickly found was that pushbutton physical optimization was not the main advantage. Designers were more excited about the fact that they could now see exactly what was causing timing problems and correlate back to the original RTL or to physical layout to correct them.” Mentor says that the biggest challenge in deploying physical design technology was getting perfect timing correlation between Precision Physical and the FPGA vendor’s layout. With that correlation established, their solution gives designers the capability to determine if timing problems are caused by logic problems, such as too many levels of logic, or physical problems, such as paths that meander across the entire device.

“If the problem is purely physical, the tool can usually solve it automatically,” continues Suaris. “If the problem is in the original RTL, the tool can pinpoint the part of the RTL code that is causing the issue and facilitate the solution.” Correcting a problem in RTL, however, is only part of a complete solution. Bringing new RTL into a traditional design flow would likely yield a completely new placement with entirely different timing characteristics. Mentor’s solution allows the design to be incrementally modified from one run to the next, so that only the trouble sections are affected. Designers have found this feature to be critical because, otherwise, solving one problem can give rise to a host of new ones when the design is re-placed. “It can be like squeezing a balloon,” says Suaris. “When you get one path under control, the problem pops out somewhere else. Incremental optimization allows convergence on a solution to be predictable.”

RTL, schematic, and layout views of timing paths facilitate rapid closure.
Image courtesy of Mentor Graphics Corporation.

Precision Physical’s automated flow uses techniques such as logic replication, register re-timing, and re-synthesis of the design in both the logical and physical domain to reach timing closure. Mentor says the average design can typically be improved in the range of 15% using the automated flow alone. Precision Physical also includes an interactive mode, however, for solving problems where designer expertise comes in handy. In DSP datapaths, for example, where embedded high-speed multipliers are often used, the registers associated with each multiplier must be placed in specific orientations for best performance. A designer using the interactive mode can easily locate the parts of the datapath in the optimal configuration where an automated, algorithmic approach might never find the perfect solution.
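Register re-timing is the easiest of these techniques to illustrate. The sketch below is a toy model, not Mentor's algorithm: it assumes a single register somewhere along a straight chain of combinational blocks (with invented delays) and moves that register to wherever the slower of the two resulting stages is shortest. Real re-timing works on a full circuit graph and must preserve the design's behavior.

```python
def stage_delays(block_delays, register_after):
    """Split a chain of combinational blocks into two pipeline stages.

    register_after is the number of blocks that sit in front of the register.
    """
    return sum(block_delays[:register_after]), sum(block_delays[register_after:])

def best_register_position(block_delays):
    """Try every legal position and keep the one with the shortest worst stage."""
    positions = range(1, len(block_delays))
    return min(positions, key=lambda p: max(stage_delays(block_delays, p)))

blocks_ns = [1.0, 2.5, 0.5, 1.0, 1.0]  # hypothetical block delays
original_position = 1                   # register sits after the first block

before = max(stage_delays(blocks_ns, original_position))
retimed_position = best_register_position(blocks_ns)
after = max(stage_delays(blocks_ns, retimed_position))
```

With these assumed delays, the original placement leaves one stage with almost all of the logic. Moving the register one block downstream balances the two stages and shortens the critical stage without changing the amount of logic.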

Mentor’s Precision Physical supports Xilinx’s Virtex-E, Virtex-II, Virtex-II Pro, Spartan-IIE, and Spartan-3 series FPGAs, with Altera Stratix support in beta. Additional technologies are scheduled to be supported in the next production release.

Supporting a wider range of technologies is Synplicity’s Amplify physical synthesis tool. Amplify has been on the market a number of years, and the latest version benefits from the experience of the many projects that have used it in the past. “With the new, smaller geometry FPGAs, it is even more important to do physical synthesis for performance optimization and timing closure,” says Jeff Garrison, Director of FPGA Tools Marketing for Synplicity. “With design projects getting continuously more complex, predictability is becoming even more important than performance.”

Amplify takes the netlist and timing results from place-and-route and incrementally re-synthesizes and re-places the design concurrently to reach timing closure. Many designers choose to use an interactive mode that allows pre-placement of design blocks into regions, which guides later detailed placement. “Many design teams have made the interactive Amplify flow their normal process,” says Garrison. “Others use a conventional flow and try the automatic mode of Amplify if they’re within 10-15% of meeting timing.”

Synplicity says that, although Amplify was originally used only by teams going for the maximum performance, it is now moving more and more into the mainstream FPGA user’s tool suite. Even design teams using the new super-low-cost families such as Xilinx Spartan-3 are finding that they can benefit from a physical solution if it makes their schedules more predictable, or if it can lop off a significant percentage of their silicon cost by saving a speed grade. In these high-volume applications, cost is a major concern, and a tool can easily earn its keep on the first design.

Amplify’s latest version adds a new hierarchical timing report. The new report extracts physical hierarchy from the design and shows “islands” that simplify the process of floorplanning by isolating critical paths and showing how timing-critical paths interact with each other. By floorplanning the physical hierarchy of the design, a designer can reduce the uncertainty caused by subsequent placement runs and get faster, more predictable convergence.

Amplify supports a wide range of technologies from Altera and Xilinx, including Altera’s Stratix, Stratix GX, Cyclone, Mercury, Excalibur ARM, APEX II, APEX 20K/20KE/20KC, FLEX 10K/10KE, and ACEX 1K, and Xilinx’s Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Spartan-II, Spartan-IIE, and Spartan-3.

Magma’s Palace physical synthesis tool supports Actel’s ProASIC Plus, Altera’s Stratix, Cyclone, APEX, and MAX 7000, QuickLogic’s Eclipse, and Xilinx’s Virtex-II and Virtex-II Pro in its current release. Magma’s solution leverages technology acquired from Aplus Design Technologies as well as the company’s considerable experience in ASIC physical synthesis.

Actel has seen particularly good results using Palace on their flash-based ProASIC Plus series, showing that physical synthesis isn’t useful just for high-density SRAM devices. Magma claims that the tool can save an average of a speed grade, and for high-volume and consumer applications such as those targeted by Actel’s ProASIC Plus, the cost savings can be substantial.

Hier Design’s PlanAhead technology, while not purely physical synthesis, plays in the same space. PlanAhead uses an ASIC-like hierarchical floorplanning approach to constrain the design, with the same primary goal as physical synthesis: to make the design cycle more predictable and controllable and to reach higher levels of performance. PlanAhead has seen considerable acceptance by companies migrating from ASIC design to FPGAs, where there is familiarity with ASIC design methodology.

One of the big challenges with physical design (and with FPGA design in general) is management and interpretation of design constraints. The timing analysis engines that form the basis of many of the physical synthesis technologies described here are driven by user-specified timing constraints. In the ASIC world, there are fairly universal standards for constraint formats and semantics, but the FPGA world has a frightening blend of proprietary formats and ASIC-based ones. Interpretation of those constraints and support for the various formats is a major headache for tool developers. Maintaining and supporting all those formats requires a lot of effort and offers very little return. On the FPGA vendor side, there’s little incentive to migrate to a standard that would make designs more portable to other vendors.

The FPGA vendors are not, however, sitting on the sidelines while commercial EDA develops physical synthesis. Altera’s Quartus II package has actually included physical synthesis capability for a while. Everyone who gets Altera’s Quartus II has access to their physical synthesis technology. Altera’s solution is fully automatic and is invoked by a checkbox inside Quartus II. Altera reports that their solution can give a 12-20% improvement in Fmax at a cost of about 30% on compile time. Their solution is also compatible with solutions from EDA vendors, so if you’re trying to maximize design performance, there seems to be no reason not to turn it on.
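To put Altera's reported figures in perspective, here is a back-of-the-envelope sketch. The 100 MHz baseline and two-hour compile are hypothetical; only the 12-20% Fmax range and the roughly 30% compile-time cost come from the figures quoted above.

```python
def apply_physical_synthesis(fmax_mhz, compile_hours, fmax_gain, compile_penalty=0.30):
    """Scale Fmax up by the reported gain and compile time up by the penalty."""
    return fmax_mhz * (1.0 + fmax_gain), compile_hours * (1.0 + compile_penalty)

baseline_fmax_mhz = 100.0     # hypothetical design
baseline_compile_hours = 2.0  # hypothetical compile time

# Low and high ends of the reported 12-20% improvement range.
low_fmax, low_hours = apply_physical_synthesis(baseline_fmax_mhz, baseline_compile_hours, 0.12)
high_fmax, high_hours = apply_physical_synthesis(baseline_fmax_mhz, baseline_compile_hours, 0.20)
```

For this assumed design, the trade is roughly 112 to 120 MHz in exchange for a compile that grows from 2 hours to about 2.6. A design that misses timing by 10% would likely close under those figures, while one that misses by 25% probably would not.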

Altera also provides capabilities in their place-and-route that facilitate physical synthesis solutions. Key among these is LogicLock, which allows absolute or relative placements of hierarchical blocks within a design to be locked down so that only the smaller, changed portions of the design move from iteration to iteration. The idea is to preserve what’s working while you optimize that which is not.

Physical synthesis is the most difficult EDA technology to develop for FPGA, particularly because the physical architectural details are generally well-guarded secrets at the FPGA vendors. Deploying a physical synthesis solution requires significant cooperation between the EDA developer and the FPGA vendor’s architecture team, and development schedules are typically long. This is why, for example, even with significant pre-work and lead time, none of the commercial solutions described above yet supports Altera’s Stratix II 90nm FPGA family, which would, of course, be an ideal application for physical synthesis technology. The trend of delayed availability of physical tools is likely to continue as vendors rush to get new devices to market and EDA companies scramble to keep up.

The question of whether physical synthesis is more naturally in the domain of EDA or FPGA companies persists. On the side of EDA is the fact that considerable research, development, and experience have been gained by EDA companies in doing ASIC physical synthesis. Algorithms, expertise, and even patent portfolios gained from ASIC give EDA a substantial boost in deploying technology for FPGA. On the side of the FPGA vendors is the fact that they are closer to the physical architecture and thus control more of the variables. FPGAs are architecturally very complex, and hard-IP, fixed and variable LUT structures, and routing restrictions all make FPGA physical design more of a challenge than the ASIC variety.

Additionally, some algorithms developed for ASIC don’t travel well to FPGA because of these architectural differences. In ASIC, for example, greater distance almost always translates into greater delay, whereas in FPGA delay increases in rather unpredictable steps depending on the specific device resources being used. FPGA physical synthesis must, therefore, use a more complex model, and converging on a solution is more difficult.
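The difference between the two delay models can be sketched as follows. Both models are toys with invented numbers: the ASIC-style one grows smoothly with distance, while the FPGA-style one assumes a connection uses a fixed class of routing segment, so delay jumps whenever the span crosses into the next class. Real devices are less tidy than this, since delay also depends on which specific resources happen to be free.

```python
def asic_wire_delay_ns(distance, ns_per_unit=0.05):
    """Toy ASIC-style model: delay grows smoothly with distance."""
    return distance * ns_per_unit

# Toy FPGA-style model. Each entry is (max span in tiles, delay in ns);
# the segment classes and their delays are made-up values.
SEGMENTS = [
    (1, 0.3),   # neighbor connection
    (2, 0.5),   # short segment
    (6, 0.7),   # medium segment
    (24, 1.2),  # long line
]

def fpga_route_delay_ns(distance):
    """Return the delay of the shortest segment class that covers the span."""
    for max_span, delay in SEGMENTS:
        if distance <= max_span:
            return delay
    raise ValueError("span exceeds the longest segment in this toy model")

# Moving a block from 3 tiles away to 6 tiles away doubles the toy ASIC delay
# but costs nothing in the toy FPGA model; one more tile triggers a jump.
asic_3, asic_6 = asic_wire_delay_ns(3), asic_wire_delay_ns(6)
fpga_3, fpga_6, fpga_7 = (fpga_route_delay_ns(d) for d in (3, 6, 7))
```

An optimizer that assumes "closer is always faster" gets no signal from the flat regions of the stepped model and is caught out by the jumps, which is one way to see why ASIC placement algorithms need rework before they behave well on FPGA.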

Physical synthesis is certainly a place where EDA is making a stand to re-establish itself with the FPGA market, however. After seeing simulation and RTL synthesis become somewhat commoditized in FPGA, vendors are turning to the promise of physical synthesis to establish tools that can earn their keep with value-based pricing. If this trend holds, we might look for even more commercial vendors to enter the fray and a return of big EDA to the FPGA scene.

Kevin Morris, FPGA and Programmable Logic Journal

March 2, 2004



All material on this site copyright © 2006 techfocus media, inc. All rights reserved.