HOME :: JOB LISTINGS :: WEBCASTS :: ARCHIVES :: MEDIA KIT :: SUBSCRIBE :: FORUMS


SPONSORED WHITE PAPER

Timing-Closure In High-End FPGAs:
The Premier Solution

Jeff Garrison, Sr. Director, FPGA Implementation Marketing, Synplicity, Inc.

Introduction

Timing-closure is a growing concern for FPGA designers, particularly with the recent introduction of multi-million gate architectures fabricated at the 90nm and 65nm technology nodes. It is not sufficient for a timing-closure solution – the entire flow, including synthesis – to only meet the required timing; such a solution must also minimize the number of time-consuming synthesis-place-route iterations and provide results that remain stable across multiple physical synthesis runs and during final routing.

A Problem With Conventional Technologies

It is no secret that wire delays dominate cell delays in modern silicon chips. In the case of FPGAs implemented at the 90nm technology node, for example, wire delays can account for 80-to-90% of each delay path. This causes a problem with conventional FPGA synthesis solutions, because only the cell delays are known, while the wire delays – which cannot be fully characterized until after place-and-route – are estimated.

As noted in Figure 1 below, the distribution of delays – the difference between what is estimated after logic synthesis compared to the actual delays following place-and-route – increases with each technology node and also as a function of the size of a design.

The result of these inaccurate delay estimations is a timing-closure nightmare, because the critical paths seen by the synthesis engine do not match the critical paths seen by place-and-route. In many cases, based on these inaccurate delay estimations, true critical paths may be under-optimized (thereby degrading performance), while non-critical paths may be over-optimized (thereby degrading area). Ultimately, this results in a significant increase in the number of synthesis-place-route iterations with poor convergence. Also, it leads to instability, because minor changes to the RTL can result in large, unpredictable changes to the results from synthesis and place-and-route.

Figure 1. There is increasing uncertainty in delay distribution as designs use newer technology nodes

Some Considerations Associated With FPGA Architectures

The majority of today's conventional FPGA logic synthesis solutions are derived from technology that was originally designed with ASIC synthesis in mind. In the case of physically-aware synthesis for ASICs, for example, placement and synthesis are combined in a single pass. The reason this works so well in the ASIC world is that routing tracks are built-to-order. This means that the delays associated with the final placed-and-routed design tend to correlate reasonably well to those estimated by the synthesis engine.

One way to think of this is that working with an ASIC is like living in an undeveloped area (such as a desert) with a large budget. In this case you can build new roads as you need them and – generally speaking – each road can be as fast as you wish. Thus, if you wished to construct a path from your home to the office, this path could be implemented as a point-to-point super-fast highway. Based on this, it would be easy for you to accurately estimate the time it will take for you to travel to work and back.

By comparison, working with an FPGA is similar to living in the middle of a large, well-established city, whose “interconnects” are a mixture of small back streets, medium-sized roads, and really fast highways. In that case, when you are returning home from the office, you don’t necessarily aim at driving the shortest distance. Instead, you may commence your journey by heading the “wrong way” until you can get on a highway, at which point you may reverse your direction and pass the office again as you head toward your home.

Planning on traveling in a straight line is great if nothing goes wrong. For example, if you wish to travel from point A to point B in a car and you plan on taking a certain road, you may estimate that the trip will take only 1 hour assuming good weather and no significant amount of traffic. But suppose some fog comes in and it starts to rain, plus it turns out that a major festival is taking place and there are cars everywhere. There may be lots of alternative routes available: some offering a short detour but with lots of tight curves, while others are straighter but longer – which route offers the best solution?

A similar situation occurs with regard to synthesis-place-route – basing one's estimations on a straight-line route for a signal is great if nothing gets in the way and nothing goes wrong. In this case, the logic synthesis will work and the place-and-route will closely match its results. In a real-world design, however, multiple gates and routes end up fighting for the same resources. This means that some placement-signal combinations will have to "take a detour", but which ones have to take the detour and which detour should they take?

The situation is further confused by the fact that having two logical functions in close proximity to each other on an FPGA does not necessarily imply the availability of a fast connection between them. Although this is somewhat non-intuitive – and it is dependant on the available routing resources – one may achieve better routing and timing results by placing the logic functions further apart. This is why ASIC-derived synthesis technologies return less-than-optimal results when applied to FPGA architectures. This is one reason why design flows based on these technologies require large numbers of time-consuming iterations between the front-end (synthesis) and back-end (place-and-route) engines in order to achieve correlation and timing closure.

A More-Detailed Look At Conventional Synthesis Technologies

In order to improve the results from standard RTL synthesis, there are three conventional synthesis-enhancement techniques: floorplanning, in-place optimization (IPO), and physically-aware synthesis.

Floorplanning

Floorplanning tools allow designers to define physical regions on the device, to place regions (by hand or auto-interactively), and to assign distinct portions of the design to the regions. (In the case of FPGAs, floorplanning is not limited to module assignments – it is also very common to floorplan detailed, point-to-point paths.) The floorplanned design is typically synthesized and optimized on a block-by-block basis and then everything is "stitched together" at the end.

In the case of FPGA implementations, analysis of a number of representative designs shows that critical paths in a design typically end up passing through three “rooms” in the floorplan. Due to the regular structures associated with FPGA architectures, as logic modules are assigned to physical regions things actually end up getting spread out, with the result that the use of floorplanning can actually end up degrading the design’s performance.

Floorplanning tools do not consider routing resources globally and it is not possible to accurately analyze all of the timing paths until after the design has been fully routed, which makes it difficult to realize global optimizations. This can result in large numbers of time-consuming iterations between the front-end and back-end tools. Furthermore, having a floorplan is detrimental with regard to migrating the design to a different device or FPGA vendor. Since floorplanning is largely a manual process, it requires special expertise on the part of the user.

Figure 2. Manual floorplanning can reduce the FPGA delay distribution, but is limited and requires special expertise

In-Place Optimization

In the case of in-place optimization (IPO), the output from synthesis is passed to the downstream place-and-route engines. Following place-and-route and parasitic extraction, real-world delays are back-annotated into the synthesis domain. These new values trigger incremental optimizations in the synthesis engine such as logic restructuring and replication. The result is a new netlist that has been partially modified. This netlist is then handed over to incremental place-and-route engines which generate a modified design topology.

The end results from an IPO-based flow are typically better than those achieved by all but the very best of floorplans and hand routing. Once again, however, this technique can require numerous time-consuming iterations between the front-end and back-end tools.

Furthermore, a significant problem with an IPO-based technique is that optimizing timing paths on an individual basis often leads to degradation of other paths and incomplete timing closure. Designers need reliable results that allow them to make changes to the design without losing what they have achieved in earlier versions. However, the IPO-based technique does not produce stable results over multiple design iterations, because optimizing critical paths in one iteration can create new critical paths in the next. Similarly, adding constraints to improve timing in one area can worsen timing in other areas. All of this can result in convergence issues.

Physically-Aware Synthesis

As noted earlier, the underlying concept behind physically-aware synthesis is to combine placement and synthesis in a single pass. This works well in the ASIC domain, because the placement-aware synthesis engine can base its timing estimations on the proximity of the placed cells and on the ensuing Steiner and Manhattan routing estimations.

Starting around the 2002-2003 timeframe, some conventional synthesis solutions started to apply ASIC-derived physically-aware synthesis technology to FPGA designs, but this technology has not enjoyed a high level of success because things are different in the FPGA domain. The problem is that, unlike tracks in the ASIC world that are “built-to-order,” an FPGA has a fixed amount of pre-defined routing resources, and not all of these routes are created equal.

Most of the physically-aware synthesis techniques apply partial solutions for improving the critical paths in the design. There are integrations of retiming and placement, replication and placement, clustering and placement, etc. As a result of these approaches, the netlist is modified in a physically-aware way, but not legalized. The last step of legalization can undo the improvement made by physically-aware optimizations, or make them show unrealistic improvements. They don't provide integral solution of placement, mapping and optimizations.

Lets take a look at the nature of the problem that we are trying to solve.

The Local Optima Problem

Assume that (for purposes unknown) you construct a simple device consisting of a power source and two variable resistors in series with an incandescent light bulb (Figure 3). Let’s further assume that the resistors can be rotated through a 180 degree range from –90 degrees to +90 degrees, and that the point of least resistance for each resistor is its center (upright) position.

Figure 3. A simple device consisting of a power source and two potentiometers in series with a light

Now, suppose that you were to set the resistors to random positions, then you give this device to someone and ask them to use it to make the light as bright as possible. Most people would select one of the knobs, turn it a little way to the left or right, and note the result. If the light gets brighter they will continue to turn the knob in that direction. Alternatively, if the light dims, they will reverse their original action and try turning the knob the other way. After a little experimentation, the user would quickly determine the optimal setting for the first resistor. They would then turn their attentions to the second resistor and perform a similar sequence of actions.

The phrase “solution space” refers to all of the possible solutions associated with a particular problem. We can represent the simple solution space corresponding to our light bulb problem in the form of a three-dimensional graph (Figure 4).

Figure 4. The simple solution space for our light bulb problem

In this particular case, we’re assuming that the x axis corresponds to the rotation of the first resistor, the z axis corresponds to the second, and the y axis reflects the brightness of the light.

If we were to use a software program to solve a similar problem, one approach would be to use a “hill climbing” algorithm. In this case, the program would commence at some random point, make a small modification to one of the variables under its control, and then check to see if the quality of the solution had improved or degraded. In the case of the former, the program would continue to effect changes in the same direction; but if the original modification had caused the solution to worsen, then the program would reverse the direction in which it had changed the variable and try again. The program could repeat this process with all of the variables under its control. Also, the program would require some way to recognize when it had reached the optimal solution.

In the real world, of course, the solution space associated with a particular problem may consist of a number of “hills” and “valleys” (Figure 5). In this case, a hill climbing algorithm may end up on some local maxima (the top of one of the “hills”), but the algorithm has no way of “seeing” any other “hills,” some of which may be higher.

Figure 5. A slightly more complex solution space

Furthermore, as the number of variables used to describe the problem increases, the result can be a solution space containing constructs such as “tunnels,” “bridges,” and even elements like “cities in the sky” (the names of these constructs are fairly self-explanatory). As the topology of a solution space approaches these levels of complexity, finding the highest peak (or even being able to tell the difference between “up” and “down”) becomes highly problematic.

The point of all this is that – due to the complexities of FPGA fabrics and routing topologies – design flows based on the use of conventional synthesis solutions can spend vast amounts of time and computational resources, only to arrive at a less-than-optimal solution.

Graph-Based Physical Synthesis Solution

An alternative synthesis solution that truly handles the complexities associated with FPGA architectures is that of graph-based physical synthesis. Conventional synthesis approaches are reactive in that they constantly have to react to bad decisions and delay estimations. By comparison, graph-based physical synthesis is proactive in that it starts off by making the correct decisions, which result in accurate delay estimations. In turn, these good delay estimations dramatically reduce the number of synthesis-place-route estimations. Furthermore, the way in which graph-based physical synthesis works means that it is inherently stable; small changes to the RTL result in small localized changes to the synthesis-place-route timing path results.

The way this works is to characterize all of the tracks in the FPGA – including entry points, end points, and internal exit points, and to then build a “map” of all of these tracks. In the software world this type of map is referred to as a Graph; hence the reason why this technique is known as "Graph-Based Physical Synthesis".

Figure 6. Graph-based physical synthesis further reduces FPGA delay distribution resulting in very tight correlation between post synthesis timing and final post P&R timing

In addition to the tracks themselves, this map also includes details as to which LUT pins have access to which types of track, any differences in input-to-output delays through each LUT, and the size and locations of any hard macros in the device. Returning to our analogy for a moment, this is like having access to a map of a city showing the streets and highways and features such as parks (hard macros) that you will have to drive around. When wishing to travel between two points in the city as quickly as possible, you would use the map to select the fastest route – in many cases this will not be the shortest and most direct route.

Similarly, instead of looking for proximity, the synthesis engine focuses on speed using an interconnect-centric approach. Starting with the most critical paths and working its way down to the least critical paths (thereby ensuring that the fastest routes are available for the most critical paths), the synthesis engine will select tracks and their associated entry points and exit points; from these tracks it will derive placements; from the tracks and placements it will derive accurate delays; and it will then optimize and iterate as required.

A key point is that all of the optimization and iteration is performed in the front-end (synthesis) portion of the flow. During all steps of graph-based synthesis, the netlist is legally placed and routing is accurately estimated. Therefore, all improvements to the critical path done during physical synthesis are realistic. The output from the graph-based physical synthesis engine is a fully placed netlist (including the specific LUT pins that are to be associated with each track) that can be handed over to the FPGA’s back-end route engine.

This dramatically reduces the amount of delay distribution. In turn, these highly accurate delay estimations significantly reduce iterations and increase design stability.

As we noted in the previous topic, a conventional synthesis solution can easily arrive at a less-than-optimal solution. It mistakenly thinks it's reached the highest peak in its solution space, but it can't "see" taller surrounding hills because they are too far away. One approach is the FPGA synthesis engine's equivalent of the film Groundhog Day – that is, to generate a new pseudo-random seed (starting point) and do the whole thing again (sort of like shaking a snow globe and seeing where all the flakes end up falling).

By comparison, the graph-based physical synthesis engine's view of the solution space is like flying high over the landscape in a plane. Its ability to "see" the whole world means that the graph-based physical synthesis engine can easily spot the highest mountain and then "parachute" directly in to a soft landing on top.

The end result is a single-pass, push-button synthesis step requiring minimal iterations with the downstream place-and-route tools.

Summary

There is an increasingly serious timing closure problem when using high-performance, high-complexity FPGAs implemented at the 90nm technology node and lower. The solution is to use a graph-based physical synthesis approach.

Unlike the ASIC-centric physically-aware synthesis world in which routing tracks are derived from placement selections, it is the placements derived from the routing selections when using a Graph-Based Physical Synthesis approach in the FPGA domain.

Due to potentially inaccurate delay estimation, FPGA design flows based on existing (ASIC-derived) physical synthesis engines tend to exhibit a high degree of instability with regard to their results (small changes to the RTL can result in major changes to the synthesis-place-route results). In turn, this may require many time-consuming iterations between the front-end (synthesis) and back-end (place-and-route) portions of the flow. And, after all of this, they may still fail to converge on a timing solution.

By comparison, an analysis of graph-based physical synthesis results from more than 200 designs shows that 90% are within 10% of the final real-world timing and 80% of these designs are within 5% (by comparison, only 30% of designs using state-of-the art ASIC-derived physical synthesis are within 5% of the real-world timing values and many designs can easily be in error by 30% or more).

In addition to the fact that a flow using graph-based physical synthesis requires minimal iterations with the downstream place-and-route engines, the quality of the placed netlist is dramatically improved; this optimal placement means that the router is presented with obvious decisions, thereby allowing it to run significantly faster.

by Jeff Garrison, Sr. Director, FPGA Implementation Marketing, Synplicity, Inc.

May 31, 2007

[back to top]

Comments on this article? Send them to comments@fpgajournal.com

All material on this site copyright © 2006 techfocus media, inc. All rights reserved.
FPGA and Structured ASIC Journal
Privacy Statement