SPONSORED WHITE PAPER

Strategies to Improve Runtime in ISE 9.1i

By Philippe Garrault – Technical Marketing Engineer, Xilinx, Inc.

FPGAs have evolved tremendously in capacity and performance in recent years, and they now take on more of a system's core functionality. As a result, the logic and constraint content fed to FPGA place and route (P&R) tools has grown. From an algorithmic perspective, added features and density often translate into exponentially greater complexity. Left unchecked, this complexity could make P&R runtime a serious impediment to designer productivity. Yet from the user's perspective, software runtime must stay reasonably "fast". Whether during logic creation, logic verification, design constraint closure, or in-system debugging, designers need to run several design iterations through the P&R tool per day to move the project toward completion at a pace acceptable to management.

The first section of this document places the FPGA design cycle in the broader context of system development. It highlights which steps typically require fast implementation runtime and what the design looks like at each step. The second section covers ISE 9.1i algorithm improvements and describes the new flows and options available for controlling software runtime. The last section lists common project and design strategies that affect runtime.

1. The need for faster implementation tools

Figure 1 illustrates the system development process and details the phases of FPGA implementation. The notes on the right-hand side highlight the demands on the FPGA software for faster runtime:

  • Fast runtime with few and approximate constraints
  • Fast runtime with incomplete constraints and large design portions added or modified
  • Fast runtime with detailed constraints and small logic or constraint changes

Design complexity, in terms of both logic and constraints, differs markedly from one stage of FPGA development to the next. FPGA software algorithms and options therefore need to be tailored to these varied user expectations.

Figure 1: System development cycle with FPGA runtime software requirements highlighted

Beyond the design differing so much with project maturity, pursuing one goal (here, runtime reduction) often comes at the expense of another, as with many things in life. In FPGA implementation software, optimizing algorithms for runtime can hurt maximum achievable performance, power consumption, or logic area. For instance, the P&R tool can apply many algorithms to a design to squeeze the maximum performance out of the selected device. Try too many of them and runtime grows unacceptably long. Try too few and performance falls short of what the hardware can deliver.

With each software release, Xilinx runs many customer designs through the tools, tuning algorithms and options to statistically outperform previous releases across these conflicting goals. With the 9.1i release, Xilinx also introduces new algorithms, flows, and user-controlled options, which the following sections discuss.

2. ISE 9.1i runtime improvements

With every release, P&R algorithms are enhanced to support new architectures or tuned for existing parts to achieve better quality of results (QoR), runtime, and so on. Project options and flows go through the same review. ISE 9.1i is no exception, and the next few sections cover the changes in this release that influence overall application runtime.

2.1. P&R algorithm changes

P&R algorithms are entirely constraint driven. The algorithms applied to a particular design depend on how its estimated timing, area, location, and power compare with the user-defined constraints. The next implementation step therefore depends on the margin between the required performance and the current state of the implementation. It follows that user-applied constraints AND the accuracy of the software's performance estimates have a tremendous impact on runtime. Inaccurate estimates can waste runtime on more optimization than the constraints require. Overzealous user constraints may force the software to chase targets that are at or beyond the device's capabilities.
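
The toy Python model below shows what constraint-driven effort means in practice. It picks the next step from the worst-case slack, which is the required delay minus the estimated delay. This is a sketch under stated assumptions, not the ISE algorithm. The three step names and the 0.2 ns margin are hypothetical. Note that an over-tight constraint keeps the slack negative, so the model keeps choosing the expensive pass.

```python
from typing import Dict


def worst_slack_ns(required_ns: Dict[str, float], estimated_ns: Dict[str, float]) -> float:
    """Slack of a path = required delay - estimated delay; return the smallest one."""
    return min(required_ns[p] - estimated_ns[p] for p in required_ns)


def next_step(required_ns: Dict[str, float], estimated_ns: Dict[str, float],
              margin_ns: float = 0.2) -> str:
    """Toy constraint-driven policy (hypothetical): effort depends on the timing margin."""
    slack = worst_slack_ns(required_ns, estimated_ns)
    if slack >= margin_ns:
        return "stop"        # comfortably met: more optimization would only cost runtime
    if slack >= 0.0:
        return "light_pass"  # met with little margin: cheap cleanup only
    return "heavy_pass"      # failing: spend CPU time on the critical paths


# An arbitrarily tight 4 ns requirement on a path estimated at 6.5 ns never converges.
print(next_step({"p": 4.0}, {"p": 6.5}))
```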

To reduce such wasted effort, the timing estimation engines in the ISE 9.1i packer, placer, and especially the router have been improved. They now detect placement situations that cannot be routed within the given timing constraints much earlier. This more accurate timing estimation saves runtime because CPU cycles are no longer spent exploring impossible situations. ISE 9.1i can therefore spend more cycles on realistically achievable timing constraints while skipping impractical configurations.
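
Early detection of impossible situations can be pictured as pruning with an optimistic bound. In the hypothetical sketch below, a candidate site is discarded when even a best-case delay estimate already exceeds the requirement. That estimate is the Manhattan distance multiplied by an assumed fastest per-tile delay. No routing effort is spent on discarded sites. The site coordinates and the 0.1 ns per tile figure are illustrative only, and this is not the ISE 9.1i estimator itself.

```python
from typing import List, Tuple

Site = Tuple[int, int]  # hypothetical (x, y) tile coordinates


def optimistic_delay_ns(src: Site, dst: Site, best_ns_per_tile: float) -> float:
    """Lower bound on route delay: Manhattan distance at the fastest per-tile delay."""
    return (abs(src[0] - dst[0]) + abs(src[1] - dst[1])) * best_ns_per_tile


def feasible_sites(driver: Site, candidates: List[Site], required_ns: float,
                   best_ns_per_tile: float = 0.1) -> List[Site]:
    """Keep only sites that could meet timing in the best case; the rest are
    skipped before any expensive routing is attempted."""
    return [c for c in candidates
            if optimistic_delay_ns(driver, c, best_ns_per_tile) <= required_ns]


print(feasible_sites((0, 0), [(1, 1), (10, 10), (3, 2)], required_ns=1.0))
```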

The ISE 9.1i estimation enhancements apply to all FPGA architectures and provide both QoR AND runtime benefits, as Figure 2 illustrates. Gains are most visible for tough timing constraints, that is, for designs whose constraints are at or slightly beyond what the place and route tool can deliver.

Figure 2: ISE 9.1i delivers on average 2.5X faster runtime than ISE 8.2i for tough constraints

2.2. New SmartCompile Flows

2.2.1. SmartCompile – SmartGuide

This new ISE 9.1i flow saves runtime by reusing placement and routing information from a previous implementation. SmartGuide nevertheless gives priority to meeting all design constraints.

SmartGuide is very simple to use because it requires no methodology or constraint changes. Simply turn on SmartGuide as illustrated in Figure 3.

Figure 3: How to enable SmartGuide

During the re-implementation phase, SmartGuide compares the new netlist with the reference one and matches as many logic elements and routes as possible. It then places and routes the matched elements exactly as before and runs the regular place and route algorithms on the changed design elements. Because meeting user timing constraints across the entire design is the primary objective, the algorithm may modify the placement or routing of guided elements when necessary to meet performance requirements on the unguided portion of the design.
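
The matching step can be sketched as follows. The sketch assumes, hypothetically, that an element counts as unchanged when its instance name, cell type, and connected nets are identical to the reference run. This is not SmartGuide's actual matching algorithm. It also does not model the unguiding of elements that SmartGuide may perform to meet timing. The netlist and placement dictionaries and the site names are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical netlist model: instance name -> (cell type, nets it connects to).
Netlist = Dict[str, Tuple[str, FrozenSet[str]]]
# Hypothetical implementation result: instance name -> placement site.
Placement = Dict[str, str]


def guide(ref_netlist: Netlist, ref_placement: Placement,
          new_netlist: Netlist) -> Tuple[Placement, List[str]]:
    """Reuse the previous site for every instance whose name, cell type and
    connectivity are unchanged; everything else is left for the regular placer."""
    guided: Placement = {}
    unguided: List[str] = []
    for inst, signature in new_netlist.items():
        if ref_netlist.get(inst) == signature and inst in ref_placement:
            guided[inst] = ref_placement[inst]
        else:
            unguided.append(inst)
    return guided, unguided


def guided_fraction(guided: Placement, new_netlist: Netlist) -> float:
    """Share of the new netlist that keeps its previous placement."""
    return len(guided) / len(new_netlist) if new_netlist else 1.0


ref = {"u1": ("LUT4", frozenset({"a", "b"})), "u2": ("FF", frozenset({"b", "c"}))}
place = {"u1": "SLICE_X0Y0", "u2": "SLICE_X0Y1"}
new = {"u1": ("LUT4", frozenset({"a", "b"})),
       "u2": ("FF", frozenset({"b", "d"})),   # rewired
       "u3": ("LUT4", frozenset({"d"}))}      # added
print(guide(ref, place, new))
```

The guided fraction is only a rough proxy for the expected savings. The smaller the change, the more of the previous result can be reused.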

Runtime savings are typically largest when the percentage of changed netlist logic and routing is smallest. The location of timing-critical paths also influences runtime. For instance, the new or modified logic may contain paths on which timing is difficult to meet. In that case the place and route algorithms may need to move large parts of the unchanged design to close timing on those paths, which reduces the runtime savings.

Figure 4a illustrates the guiding process. Figure 4b presents the runtime savings that can be expected for different types of design change.

Figure 4a: SmartGuide process (guided area highlighted in blue)


Figure 4b: Typical SmartGuide runtime savings by quantity and type of design change

2.2.2. SmartCompile – Partitions

SmartCompile – Partitions is a flow whose primary goal is to re-implement only the changed design partitions, exactly preserving the previous implementation results for unchanged partitions. This direct copy-and-paste of unchanged partitions also provides appreciable runtime savings.
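
One simple way to decide which partitions can be copied is to compare a content hash of each partition's source against the hash recorded for the previous run. The sketch below assumes that approach purely for illustration. The partition names, the result strings, and the use of SHA-256 over RTL text are hypothetical. ISE's own change detection, which also considers netlists and constraints, is not modeled.

```python
import hashlib
from typing import Dict, Optional, Tuple


def digest(source: str) -> str:
    """Content hash of a partition's source (RTL text in this sketch)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_partitions(sources: Dict[str, str],
                    previous: Dict[str, Tuple[str, str]]
                    ) -> Dict[str, Tuple[str, Optional[str]]]:
    """previous maps partition -> (digest, saved result). Unchanged partitions
    get their saved result copied; changed or new ones are re-implemented."""
    plan: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, src in sources.items():
        saved = previous.get(name)
        if saved is not None and saved[0] == digest(src):
            plan[name] = ("copy", saved[1])
        else:
            plan[name] = ("implement", None)
    return plan


prev = {"uart": (digest("module uart;"), "uart_routed"),
        "cpu": (digest("module cpu;"), "cpu_routed")}
print(plan_partitions({"uart": "module uart;", "cpu": "module cpu; // edited"}, prev))
```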

Throughout the development cycle, you will meet recurring situations where, after design changes, you need to redo the implementation and quickly perform a post-implementation analysis. For example:

• You may have a completed design portion on which meeting timing was difficult. You then need to add another piece of logic to the design while preserving the results on that existing portion.

• You may also want to go to the lab and debug, in hardware, an unchanged portion of the design while other, unrelated portions are still changing.

• As a project manager, you may have received a modified portion of the design from one of the development teams and need to verify that the entire design still fits in the selected device.

The recurring theme is that you need to re-implement only a portion of the design, and meeting timing constraints on the modified portion is not the primary objective at that point. Instead, you need fast implementation runtime so that post-implementation analyses can be run and their conclusions shared back with the development teams quickly.

The partition flow is illustrated in Figure 5 and the main steps are as follows:

Figure 5: ISE 9.1i SmartCompile – Partition flow (results of unchanged partitions are copied and pasted)

1. Split the design hierarchy into multiple partitions.

The fastest runtimes are achieved when few partitions change between two implementations. Partitions typically follow the natural design hierarchy. Other good partition candidates are design portions with high logic utilization or tight timing. As a general recommendation, keep critical paths within a single partition.

For each partition, select the appropriate preservation level. Typically, the more information is preserved, the more runtime is saved:

• Synthesis - Preserves the netlist associated with the partition. Synthesis results are preserved unless the RTL description of the partition has changed.

• Placement - Preserves the netlist and the placement of logic elements unless the RTL or netlist constraints have changed.

• Routing - Preserves the netlist, the placement of logic elements, and the routing data unless the RTL, netlist, or placement constraints have changed.

2. Set your constraints and run the design implementation tools.

3. Make changes and re-run the design implementation tools.

Each application's report lists which partitions were preserved and which were re-implemented, along with the cause of each re-implementation.
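
The preservation rules from step 1 and the per-partition report described in step 3 can be modeled as a small lookup. In this sketch, the change categories (`rtl`, `netlist_constraints`, `placement_constraints`) are hypothetical labels transcribed from the three preservation-level descriptions. The report wording is illustrative and is not the format of the actual ISE reports.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class Preserve(Enum):
    SYNTHESIS = "synthesis"
    PLACEMENT = "placement"
    ROUTING = "routing"


# Kinds of change that force re-implementation at each preservation level,
# transcribed from the preservation-level descriptions in step 1.
INVALIDATED_BY: Dict[Preserve, Set[str]] = {
    Preserve.SYNTHESIS: {"rtl"},
    Preserve.PLACEMENT: {"rtl", "netlist_constraints"},
    Preserve.ROUTING: {"rtl", "netlist_constraints", "placement_constraints"},
}


def partition_status(name: str, level: Preserve, changes: Iterable[str]) -> str:
    """Report line: preserved, or re-implemented together with the cause(s)."""
    causes = sorted(INVALIDATED_BY[level] & set(changes))
    if causes:
        return f"{name}: re-implemented (cause: {', '.join(causes)})"
    return f"{name}: preserved ({level.value})"


print(partition_status("dsp_chain", Preserve.ROUTING, ["placement_constraints"]))
print(partition_status("uart", Preserve.SYNTHESIS, []))
```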

2.3. SmartPreview

This option lets you interrupt the router with the CTRL+C key combination. SmartPreview then presents a menu with options to save the design database in its current state and to decide whether or not to let the tool continue. This database snapshot can save considerable runtime at different stages of the design cycle. For instance:

• Identify difficult designs early.

  • You may load this database snapshot into Floorplan Editor, Floorplanner, or FPGA Editor to analyze the placement or see which part of the design is not yet routed. You may already be able to extract congestion information: what type of logic is in the bottleneck? You may then decide to change synthesis options to pack more or less logic into the dedicated memory or DSP blocks.

  • In addition, you can run timing analysis on this database to identify design areas where timing is harder to achieve. Combining this analysis with the placement analysis, you may decide to constrain the synthesis tool to further reduce area on non-critical parts of the design. You may also check whether the synthesis tool's timing estimates for the failing paths match. If they do not, the synthesis tool may not have performed all the optimizations it could, and adjusting its constraints could produce a netlist that is easier to place and route.

• Debug portions of a design in the lab.

  • Once the place and route log file indicates that all nets are routed, you can use SmartPreview to create a snapshot of the design database and then generate a programming file to load onto a device in the lab. The timing report also tells you which parts of the design do not meet timing. Armed with this data, you can already verify in the lab the parts of the design that meet timing. You can do the same for the parts that do not meet timing if your setup allows you to lower the system operating frequency.
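
The interrupt-and-snapshot behavior can be illustrated with Python's own Ctrl+C handling, which raises `KeyboardInterrupt`. The sketch below is a hypothetical router loop, not SmartPreview's implementation. On interrupt, it saves a copy of the routed and unrouted nets and then asks whether to continue. `route_one` and `ask_user` are placeholders supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple


def route_with_preview(nets: List[str],
                       route_one: Callable[[str], str],
                       ask_user: Callable[[], str],
                       snapshots: List[dict]) -> Tuple[Dict[str, str], List[str]]:
    """Route nets one by one. On Ctrl+C (KeyboardInterrupt), save a snapshot of
    the current state, then continue or stop depending on the user's answer."""
    routed: Dict[str, str] = {}
    pending = list(nets)
    while pending:
        try:
            net = pending[0]
            routed[net] = route_one(net)
            pending.pop(0)
        except KeyboardInterrupt:
            # Store copies so that later routing does not alter the saved snapshot.
            snapshots.append({"routed": dict(routed), "unrouted": list(pending)})
            if ask_user() != "continue":
                break
    return routed, pending
```

In an interactive session, `ask_user` could simply be `lambda: input("continue or stop? ")`. The saved snapshot plays the role of the partially routed database you would open for analysis.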

SmartPreview saves runtime and improves productivity by enabling analyses that were typically possible only after the place and route tool had completed.

3. Additional Strategies to Improve Runtimes

Throughout the design cycle, you have several ways to minimize runtime while ensuring that timing, area, and power constraints are met. You control the input netlist, the constraint file, the tool settings, and the design methodology. For each of these, Table 1 lists techniques and recommendations to avoid superfluous software runtime.

Table 1: Techniques to reduce runtime, by category

Category: Overall Design Flow

Design Specifications

Fast runtime with few and approximate constraints

  • Ensure the netlist was generated for the targeted device.
  • Avoid using IP cores unless they are properly constrained.
  • Avoid excessively or arbitrarily tight constraints.
  • Avoid large numbers of timing constraints; specify them concisely, for example with wildcards.

Design Development

Fast runtime with incomplete constraints and large design portions added or modified

  • Try the SmartCompile – Partition flow to preserve previous run results in the midst of large design changes.

Design Closure

Fast runtime with detailed constraints and small logic or constraint changes

  • Try the SmartCompile – SmartGuide flow to preserve previous run timing results in the midst of small design changes.

Category: Quality of Synthesis (Input Netlist)

Constraints

Synthesis tools are also constraint driven. A suboptimal input netlist yields suboptimal implementation results and may force the implementation tools to work hard on paths that could have been better optimized during synthesis.

  • Use timing constraints in your synthesis tool to specify the expected performance of internal clock domains and of paths to and from I/Os.
  • When the synthesis tool uses almost all of one device resource (such as memory, DSP, or LUTs), consider setting constraints to save area on non-timing-critical parts of the design or to force replication on difficult parts. Such changes alter the logic or connectivity of the design and therefore affect place and route runtime.

Align Synthesis and Implementation Device Target

  • Synthesis tools perform resource management and map logic onto other resources when they run out of one resource in the selected device. This results in suboptimal netlists when the device chosen during implementation actually has a different amount of block memory or DSP blocks, for instance.
  • Synthesize the design for the intended architecture when evaluating an FPGA family. A netlist generated for a different (older) FPGA architecture than the one set in the implementation tool yields suboptimal results. Because the netlist does not use all the available device library components, it can make it harder for the P&R tool to meet the performance objective, which increases runtime.
  • Consider using synthesis tool options that read in the IP core netlist (-read_cores in XST). This allows better optimization of the paths to and from IP cores.

Category: Timing Constraints

Constraining without over-constraining

Over-constraining a design makes it more difficult for the placer and router to achieve timing closure. In some cases, it produces worse results than realistic timing objectives would.

Number of Constraints

Minimize the number of constraints. Consider the following where applicable:

  • Use global timing constraints instead of individual timing constraints.
  • Use TIMEGRPs to group signals with the same timing requirements.
  • Use FROM-TO constraints to define multi-cycle paths.
  • Use OFFSET constraints with individual timing groups only for exceptions.
  • Use TIG constraints on paths that need no timing analysis, to reduce the difficulty of meeting all timing constraints during place and route.

Category: Placement Constraints

I/O Placement

Whenever possible, use the LOC constraint for I/Os:

  • It reduces the design complexity from the tool's perspective.
  • Place your I/Os so that the data path flows from one side of the device to the other.

Critical Logic Floorplanning

When floorplanning, consider the following:

  • Adding a few AREA_GROUP constraints for critical paths or acquired IP cores saves runtime.
  • Too many AREA_GROUP constraints, especially overlapping ones, can cause long runtimes. To be effective, AREA_GROUP constraints require a good understanding of the device's logic, block, and routing resources.
  • Use a few LOC constraints on DSP or memory blocks to guide the placement of neighboring logic.

Clock Domain Floorplanning

This strategy saves runtime by confining synchronous elements driven by the same clock buffer to specific clock regions. It prevents clock resource contention throughout the device and makes implementation results more consistent as the design evolves or the tools are rerun.

Global clocks

  • Highly recommended for designs with more global clock signals than a single clock region of the device architecture can accommodate.
  • Assign design logic clock domains to separate device clock regions. Tip: use the clock region report from a previous implementation as a starting point and syntax example.

Regional clocks

  • For designs with more than two regional clocks, consider locking down the clock drivers (BUFRs).

Category: ISE Settings

Place and Route Effort Levels

  • Use the lowest effort level that satisfies your constraints.
  • Use matching map and par effort levels.
  • Enable the timing-driven packer and placer (-timing) option for highly utilized designs.
  • Use the “Extra Effort Level” only when your design is within a few percent of meeting all timing objectives.

Advanced Place and Route Options

Settings that further improve area, power dissipation, and performance usually also lengthen compilation times.

  • Let Xplorer find the best options for your design once it is stable and close to meeting timing objectives.
  • Disable power optimization unless your power budget is exceeded.
  • Avoid constraints that pack the logic too tightly, as this can create artificial routing conflicts that are runtime intensive to resolve.
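
The Number of Constraints and Placement Constraints recommendations can be illustrated with a small Python script that emits UCF lines. It produces one PERIOD per clock domain, global OFFSETs, a wildcard-grouped FROM-TO multi-cycle path, a TIG, I/O LOCs, and one AREA_GROUP. All net, instance, pin, and range names are hypothetical. Check the generated syntax against the Constraints Guide for your ISE version before use.

```python
from typing import Dict, List


def clock_period(clk_net: str, period_ns: float) -> List[str]:
    """One global PERIOD constraint covers every register-to-register path in the domain."""
    return [f'NET "{clk_net}" TNM_NET = "TNM_{clk_net}";',
            f'TIMESPEC "TS_{clk_net}" = PERIOD "TNM_{clk_net}" {period_ns:g} ns HIGH 50%;']


def global_offsets(clk_pad: str, in_ns: float, out_ns: float) -> List[str]:
    """Global OFFSETs instead of one per I/O; per-group OFFSETs only for exceptions."""
    return [f'OFFSET = IN {in_ns:g} ns BEFORE "{clk_pad}";',
            f'OFFSET = OUT {out_ns:g} ns AFTER "{clk_pad}";']


def multicycle(src_pattern: str, dst_pattern: str, name: str, limit_ns: float) -> List[str]:
    """Group registers with wildcards, then relax the path between the groups with FROM-TO."""
    return [f'INST "{src_pattern}" TNM = "TNM_{name}_src";',
            f'INST "{dst_pattern}" TNM = "TNM_{name}_dst";',
            f'TIMESPEC "TS_{name}" = FROM "TNM_{name}_src" TO "TNM_{name}_dst" {limit_ns:g} ns;']


def ignore(net: str) -> List[str]:
    """TIG removes a path that needs no timing analysis from the P&R workload."""
    return [f'NET "{net}" TIG;']


def io_locs(pins: Dict[str, str]) -> List[str]:
    """LOC each I/O port to a package pin."""
    return [f'NET "{port}" LOC = "{pin}";' for port, pin in pins.items()]


def area_group(inst_pattern: str, group: str, slice_range: str) -> List[str]:
    """Confine one critical block or IP core to a region of the device."""
    return [f'INST "{inst_pattern}" AREA_GROUP = "{group}";',
            f'AREA_GROUP "{group}" RANGE = {slice_range};']


if __name__ == "__main__":
    ucf = (clock_period("clk", 10.0)
           + global_offsets("clk", 3.0, 5.0)
           + multicycle("ctrl/cfg_reg*", "dp/acc_reg*", "cfg2acc", 20.0)
           + ignore("rst_async")
           + io_locs({"led_0": "P12"})
           + area_group("fir_core/*", "AG_fir", "SLICE_X0Y0:SLICE_X31Y63"))
    print("\n".join(ucf))
```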

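The rule to use the lowest effort level that satisfies your constraints amounts to trying levels from cheapest to most expensive and stopping at the first one that meets timing. In this sketch, `run` is a hypothetical callback that implements the design at a given level and returns the worst-case slack in ns. The `std`/`med`/`high` values and the map/par argument lists are assumptions based on the ISE `-ol` option. Verify them against the Development System Reference Guide.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Effort levels, cheapest first (assumed values of the ISE -ol option).
LEVELS: Tuple[str, ...] = ("std", "med", "high")


def tool_args(level: str) -> Tuple[List[str], List[str]]:
    """Matching map and par effort levels, with timing-driven packing/placement.
    Input and output file arguments are omitted from this sketch."""
    return ["map", "-timing", "-ol", level], ["par", "-ol", level]


def lowest_sufficient_effort(run: Callable[[str], float],
                             levels: Sequence[str] = LEVELS) -> Optional[str]:
    """Return the first (cheapest) level whose run reports non-negative worst slack."""
    for level in levels:
        if run(level) >= 0.0:
            return level
    return None  # even the highest level fails: revisit constraints or the design first
```

Each failed attempt costs a full run. In practice you would reuse the level found here for later iterations rather than repeating the search.
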
4. Conclusion

The new ISE 9.1i SmartCompile flows (SmartGuide and Partitions), combined with the SmartPreview feature and new algorithmic enhancements, deliver noticeable runtime improvements. The algorithmic gains are largest for tough designs with difficult-to-meet timing constraints. Whatever phase of the FPGA development cycle your design is in, and however well defined your logic and constraints are, these algorithm, flow, and feature enhancements will help you reach your objectives faster and more efficiently. Combine them with the strategies described in Section 3 for the best results.

5. Additional resources:

Xilinx Design Tools: http://www.xilinx.com/ise
ISE 9.1i – what’s new: http://toolbox.xilinx.com/docsan/xilinx9/swcol/whatsnew.htm
ISE 9.1i – Software manuals: http://www.xilinx.com/support/software_manuals.htm


May 31, 2007



All material on this site copyright © 2006 techfocus media, inc. All rights reserved.
FPGA and Structured ASIC Journal