In recent years, FPGAs have grown tremendously in capacity and performance and have taken on more of a system's core functionality. As a result, the logic and constraint content fed to the FPGA place and route (P&R) tool has increased. From an algorithmic perspective, extra features and density often translate into an exponential increase in complexity. Left unchecked, the effect of this complexity on place and route runtime could become a serious impediment to designer productivity. At the same time, from a user perspective, software runtime must be kept reasonably “fast”. Today, whether during logic creation, logic verification, design constraint closure or in-system debugging, designers need to run multiple design iterations through the place and route tool per day to move the project toward completion at a pace acceptable to management.
The first section of this document places the FPGA design cycle in the broader context of overall system development. It highlights which steps typically require fast implementation runtime and what the design properties are at each step. The second section covers ISE 9.1i algorithm improvements and describes the new flows and options available to control software runtime. The last section lists common project and design strategies that affect runtime.
1. The need for faster implementation tools
Figure 1 illustrates the system development process, detailing the different phases in the FPGA implementation. The notes on the right-hand side highlight the demands placed on the FPGA software for faster runtime. These are:
We can easily see that design complexity, in terms of logic and constraints, differs greatly between the stages of FPGA development. The FPGA software algorithms and options therefore need to be tailored to meet the user’s varied expectations.
Not only does the design differ vastly depending on how far along the project is, but pursuing one goal, here runtime reduction, often comes at the expense of something else, as with many things in life. In the FPGA implementation software world, optimizing algorithms for runtime can negatively impact maximum achievable performance, power consumption or logic area usage. For instance, there are many algorithms that the FPGA place and route tool can attempt on a design to squeeze out the maximum performance for the selected device. Try too many algorithms and runtime grows to unacceptable lengths. Try too few and performance falls short of the hardware capabilities. With each software release, Xilinx runs many customer designs through the tool, tuning algorithms and options to statistically exceed previous releases on these conflicting goals. With the 9.1i release, Xilinx also introduces new algorithms, flows and user-controlled options, which we discuss in the following paragraphs.
2. ISE 9.1i runtime improvements
With every release, P&R algorithms are enhanced to add support for new architectures or tuned for existing parts to achieve better quality of results (QoR), runtime and so on. The same review process is applied to project options and flows. ISE 9.1i is no exception, and the next few sections cover the changes in this release that influence the overall application runtime.
P&R algorithms are completely constraint driven. This means that the algorithms applied to implement a particular design depend on how the estimated timing, area, location and power performance compare with the user-defined constraints. The next implementation step therefore depends on the margin between the required performance and the current implementation status. It follows that user-applied constraints AND the accuracy of the software's performance estimation have a tremendous impact on runtime. Inaccurate software estimation can waste runtime on more optimization than the constraints require, and overzealous user constraints may force the software to attempt to meet constraints that are at or beyond the device capabilities. To address this, the timing estimation engines of the ISE 9.1i packer, placer and especially router have been improved and are now better at detecting early those placement situations that cannot be routed within the given timing constraints. This increase in timing estimation accuracy saves runtime by not spending CPU cycles exploring impossible situations. ISE 9.1i can therefore spend additional cycles on realistically achievable timing constraints while skipping impractical configurations. These enhancements apply to all FPGA architectures and provide QoR AND runtime benefits, as illustrated in Figure 2. Gains are most visible for tough timing constraints, that is, designs for which constraints are at or slightly above what the place and route tool can deliver.
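The constraint-driven behavior described above can be pictured as a decision based on timing slack, the margin between the required delay and the estimated delay. The Python fragment below is only a minimal sketch under stated assumptions: the names `slack_ns`, `next_step` and the `best_case_ns` lower bound are hypothetical illustrations and are not taken from ISE, whose actual algorithms are not described in this article.

```python
def slack_ns(required_ns: float, estimated_ns: float) -> float:
    """Timing margin for one path; zero or positive means the constraint is met."""
    return required_ns - estimated_ns


def next_step(required_ns: float, estimated_ns: float, best_case_ns: float) -> str:
    """Choose the next action for one constrained path (illustrative only).

    best_case_ns is an ASSUMED lower bound on the delay the device could
    ever achieve for this path; it is a hypothetical input, not an ISE value.
    """
    if best_case_ns > required_ns:
        # Even the best case misses the constraint: do not burn CPU cycles on it.
        return "skip"
    if slack_ns(required_ns, estimated_ns) >= 0:
        # Constraint already met: no further optimization is needed.
        return "done"
    # Constraint missed but still reachable: keep optimizing.
    return "optimize"


if __name__ == "__main__":
    print(next_step(required_ns=10.0, estimated_ns=9.0, best_case_ns=6.0))
    print(next_step(required_ns=10.0, estimated_ns=12.0, best_case_ns=8.0))
    print(next_step(required_ns=5.0, estimated_ns=12.0, best_case_ns=8.0))
```

In this toy model, a pessimistic estimate (say 12 ns reported for a path that would really achieve 9 ns) sends the path back to "optimize" needlessly, which is the kind of wasted runtime that better estimation accuracy avoids.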
2.2. New SmartCompile Flows
The purpose of the new SmartGuide flow in ISE 9.1i is to save runtime by reusing placement and routing information from a previous implementation, while still giving priority to meeting all design constraints. SmartGuide is very simple to use since it does not require any methodology or constraint change; simply turn on SmartGuide as illustrated in Figure 3.
During the re-implementation phase, SmartGuide compares the new netlist with the reference one and matches as many logic elements and routes as possible. It then places and routes the matched elements exactly as before and runs the regular place and route algorithms on the changed design elements. Since meeting user timing constraints on the entire design is the primary objective, the algorithm may modify the placement or routing of guided elements if necessary to achieve the performance requirements on the unguided design portion. Typically, the smaller the percentage of netlist logic and routing that has changed, the larger the runtime savings. The location of timing-critical paths also influences runtime. For instance, if the new or modified logic contains paths for which meeting timing is difficult, the place and route algorithms may need to move large parts of the unchanged design portion to achieve timing closure on these paths, resulting in lower runtime savings. Figure 4a illustrates the guiding process and presents the runtime savings that can be expected depending on the type of design change.
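The matching step described above can be sketched as follows. This is a deliberately simplified illustration, not SmartGuide's actual matching algorithm: each netlist is modeled as a hypothetical dictionary from element name to a description string, and an element counts as guided only if its name and description are both unchanged.

```python
from __future__ import annotations


def split_guided(reference: dict[str, str], new: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (guided, unguided) element names of the new netlist.

    ASSUMPTION: a netlist is modeled as {element_name: description}.
    Real netlists also carry connectivity, which this sketch ignores.
    """
    guided = {name for name, desc in new.items() if reference.get(name) == desc}
    unguided = set(new) - guided
    return guided, unguided


def guided_fraction(reference: dict[str, str], new: dict[str, str]) -> float:
    """Share of the new netlist that could reuse the previous placement."""
    guided, _ = split_guided(reference, new)
    return len(guided) / len(new) if new else 0.0


if __name__ == "__main__":
    reference = {"u1": "LUT4:A&B", "u2": "FF:clk", "u3": "LUT4:A|B"}
    new = {"u1": "LUT4:A&B", "u2": "FF:clk", "u3": "LUT4:A^B", "u4": "FF:clk"}
    guided, unguided = split_guided(reference, new)
    print(sorted(guided), sorted(unguided), guided_fraction(reference, new))
```

The sketch does not model the case where guided elements are later moved to meet timing on the unguided portion; it only shows why a small change percentage leaves more of the design available for reuse.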
SmartCompile – Partitions is a flow whose primary goal is to re-implement only the changed design partitions, thus exactly preserving the previous implementation results for unchanged partitions. At the same time, this direct copy-and-paste of the unchanged partitions provides appreciable runtime savings. Throughout the development cycle, you will find recurring situations where, after design changes, you need to redo the implementation and quickly perform a post-implementation analysis. For example:
The recurrent theme here is that you need to re-implement only a portion of the design, and meeting timing constraints on the modified design portion is not the primary objective at that point. Instead, a short implementation runtime is required so that post-implementation analyses can be run and their conclusions shared with the development teams quickly. The partition flow is illustrated in Figure 5 and the main steps are as follows:
The fastest runtimes are achieved when few partitions have changed between two implementations. Typically, partitions follow the natural design hierarchy. Other good partition candidates are design portions with high logic utilization or tight timing. A general recommendation is also to keep critical paths within a single partition. For each partition, select the appropriate preservation level. Typically, the more information is preserved, the more runtime is saved.
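The idea of re-implementing only changed partitions can be illustrated with a small sketch. How ISE itself detects a changed partition is not described in this article; the content fingerprint used here is an assumption made purely for illustration, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def fingerprint(source: str) -> str:
    """Content hash standing in for 'has this partition changed?' (assumed mechanism)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def partitions_to_reimplement(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """List the partitions that are new or whose source changed.

    previous maps partition name -> fingerprint saved from the last run.
    current maps partition name -> current source text.
    Every other partition could reuse its preserved results.
    """
    return sorted(
        name
        for name, source in current.items()
        if previous.get(name) != fingerprint(source)
    )


if __name__ == "__main__":
    previous = {"cpu": fingerprint("cpu rev 1"), "uart": fingerprint("uart rev 1")}
    current = {"cpu": "cpu rev 1", "uart": "uart rev 2", "dma": "dma rev 1"}
    print(partitions_to_reimplement(previous, current))
```

With few changed partitions, the returned list is short and most of the design is carried over unchanged, which is where the runtime savings come from.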
The SmartPreview option lets you interrupt the router process using the CTRL+C keyboard combination. SmartPreview then presents a menu with options to save the design database in its current state and to decide whether or not to let the tool continue. This database snapshot can be very useful for saving runtime at different stages in the design cycle. For instance:
SmartPreview saves runtime and improves user productivity by enabling analyses that were typically done only after the place and route tool completed.
3. Additional Strategies to Improve Runtimes
Throughout the design cycle, you have different ways to minimize runtime while ensuring that timing, area and power constraints are met. You have control over the input netlist, constraint file, tool settings and design methodology. For each of these, Table 1 lists different techniques and recommendations to avoid superfluous software runtime. Table 1:
4. Conclusion
Innovative ISE 9.1i flows such as SmartCompile and SmartGuide, combined with the SmartPreview feature and new algorithmic enhancements, especially for tough designs with difficult-to-meet timing constraints, allow for noticeable runtime improvements. Thus, whichever phase of the FPGA development cycle your design is in and however well defined your logic or constraints are, these algorithm, flow and feature enhancements, combined with the existing strategies described at the end of this document, will help you achieve your objectives faster and more efficiently.
5. Additional resources:
Xilinx Design Tools: http://www.xilinx.com/ise
By Philippe Garrault – Technical Marketing Engineer, Xilinx, Inc.
May 31, 2007
Comments on this article? Send them to comments@fpgajournal.com
All material on this site copyright © 2006 techfocus media, inc. All rights reserved.