Abstract
This paper uses well-known routing estimation
techniques to analyze the trends of routing area
requirements for Coarse-grained Standard Metal
Structured ASICs versus Field Programmable Gate
Arrays (FPGAs) and Standard Cell. Standard Cell is
typically a fine-grained cell architecture, with
functions created out of standard single gate
primitives, combined with custom metal interconnect,
metal segments and vias fabricated with custom
masks. FPGAs are standard products with
programmable devices that are used to connect
predefined metal segments together with coarse-grained
programmable cells. By contrast, Structured
ASICs have regular arrays of coarse-grained

Routing Model
While there have been recent refinements to Rent’s rule based routing estimation [1][2], many routing area estimations are still based on Rent’s rule [3][4], a function that estimates the number of pins on a block from the number of cells in the block. Over 50 years ago, Rent pointed out that the number of pins P required for a block of logic containing C cells could be estimated by

$P = K \cdot C^{R}$

where R is now called Rent’s coefficient. This number is typically between 0.6 and 0.7 for random logic. For well-defined blocks, designed to minimize the number of pins on the boundary, this coefficient can be as small as 0.5.
Using Rent’s rule as the basis, most routing models estimate the wire lengths by integrating Rent’s rule from a single cell up to the number of cells in the design. One such derivation defines the total wire length in cell pitches as:

$W_{len} = \tfrac{1}{2} K \cdot C^{(R + 1/2)} / (2R - 1) + K \cdot C^{R} / 2$

where $K = N_p / C$, $N_p$ is the total number of used cell pins in the overall design, and C is the number of cells. The average wire length per cell, $W_{len} / C$, would then be:

$W_{avg} = K \cdot [\, C^{(R - 1/2)} / (2R - 1) + C^{(R - 1)} \,] / 2$

Area Estimation
At some point the area required for the routing exceeds the underlying area required for the cell itself. For a specific design the required area is defined in the derivation below as:

$\text{Block Area} = \max(\text{Gate Area}, \text{Wire Area})$

The routing area is determined by:

$\text{Wire Area} = K_r \cdot W_{len} \cdot C_p / \text{Layers}$

where $K_r$ is the routing utilization factor (typically 2), and $C_p$ is the average area of a wire traversing a single cell pitch. This can be determined by:

$C_p = (\text{cell area})^{1.5} /$

What this suggests is that for any specific number of routing layers there exists a design size that will exceed the routing space provided. Any design above that size is routing area dominated, and any design below that size is cell area dominated.

Coarse-Grained vs. Fine-Grained Cell Architectures
The above models provide a mechanism to predict the effects of coarse-grained versus fine-grained architectures, from the point of view of area. Given coarse-grained and fine-grained cells with the following characteristics:
Table 1
The fine-grained option is approximated by a 3-input NAND gate with 2.5 inputs and 1 output pin per cell. The coarse-grained option is a look-up-table-based cell that averages 15 gates, with 7.5 inputs and 1.5 output pins per cell. Two extra routing layers are defined for the fine-grained option, to account for the local interconnect of the coarse-grained option. That is to say, it takes two additional layers of interconnect to create the coarse-grained fabric, over what it takes to create the fine-grained fabric. This seems reasonable, since a routing area estimation of 15 gates of fine-grained logic should require 2.33 layers, plus one layer for the fine-grained transistor interconnect, or 3.33 layers of random interconnect, which can be reduced to an even 3 layers with hand-tuned routing.
After both fine-grained and coarse-grained fabrics become wire dominated, the coarse-grained fabric is denser than the fine-grained. The coarse-grained advantage is primarily due to the pin-out advantage of the coarse-grained architecture. Rent’s rule, using the coefficient in Table 1, yields 21.5 pins for a 15-gate cluster, which is much larger than the average of 9 pins for the coarse-grained cell in the table. Conversely, the Rent’s rule coefficient for 9 external pins out of a 15-gate cluster comprised of fine-grained cells is 0.35, much lower than the normal 0.67 for random logic.

Interconnect Architectures
Standard Cell design requires custom masks for all layers of interconnect. Conversely, Standard Metal requires only a single custom via mask, which connects the fixed wire segments and coarse-grained cell configuration options. This carries the inherent overhead of arranging the segments so that they can all be interconnected at the programmable via layer. As Figure 1 below shows, some of the wiring space is taken up with segments used to bring the signals and segment ends up to the custom via layer. This via connection space is the overhead for Standard Metal.
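The Rent’s rule figures quoted above can be checked numerically. The following is a minimal Python sketch, not the author’s tooling: the function names are hypothetical, K is assumed to be 3.5 pins per fine-grained cell (2.5 inputs plus 1 output), and R = 0.67 is assumed to be the random-logic coefficient referred to in Table 1. The wire-length functions follow the Routing Model expression for total wire length and take the per-cell average as the total divided by C.

```python
import math

# Assumption: pins per fine-grained cell = 2.5 inputs + 1 output.
K_FINE = 2.5 + 1.0
# Assumption: random-logic Rent coefficient quoted in the text.
R_RANDOM = 0.67


def rent_pins(k: float, cells: float, r: float) -> float:
    """Rent's rule: estimated external pins P = K * C**R."""
    return k * cells ** r


def implied_rent_exponent(pins: float, k: float, cells: float) -> float:
    """Solve P = K * C**R for R, given an observed pin count."""
    return math.log(pins / k) / math.log(cells)


def total_wire_length(k: float, cells: float, r: float) -> float:
    """Total wire length in cell pitches, per the Routing Model section.

    The expression is singular at R = 0.5, so this sketch only
    accepts R > 0.5 (an assumption of the sketch, not of the article).
    """
    if r <= 0.5:
        raise ValueError("this sketch assumes R > 0.5")
    return 0.5 * k * cells ** (r + 0.5) / (2 * r - 1) + k * cells ** r / 2


def average_wire_length(k: float, cells: float, r: float) -> float:
    """Average wire length per cell: total wire length divided by C."""
    return total_wire_length(k, cells, r) / cells


if __name__ == "__main__":
    # Pins predicted for a 15-gate cluster of fine-grained cells.
    print(round(rent_pins(K_FINE, 15, R_RANDOM), 1))        # 21.5
    # Exponent implied by 9 external pins on the same cluster.
    print(round(implied_rent_exponent(9, K_FINE, 15), 2))   # 0.35
```

Under these assumptions the sketch reproduces both the 21.5-pin estimate and the 0.35 coefficient, which illustrates how much clustering reduces the pin count seen by the global routing.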
Figure 1
Figure 1 shows two layers of interconnect. With 4 interconnect layers, there are 114 horizontal and vertical tracks out of a possible 140 tracks used for routing in Standard Metal. This is an overhead of more than 18% compared to Standard Cell routing. The resulting area penalty applies to any routing-dominated design.
In addition, when comparing standard metal with coarse-grained cells (what we call Structured ASIC) to custom metal with fine-grained cells (the traditional structure of Standard Cell), Structured ASIC could still be denser, given the right characteristics of the Structured ASIC’s architecture. Using the mathematical model above, Figure 2 below is a set of
Figure 2
Clearly, once the design is large enough (in this case between 100,000 and 1,000,000 gates) that both Standard Cell and Structured ASIC densities are determined by their routing, the coarse-grained Structured ASIC fabric can be denser than the fine-grained Standard Cell fabric, if the efficiency of clustering can sufficiently overcome the initial cell density differential and the Standard Metal penalty.
On the other hand, FPGA architectures consist of coarse-grained cells and programmable interconnect. As a result, FPGA architectures have not only the problem of transporting signals to the substrate but also the overhead of the devices used for programmable interconnect. This can be analyzed by making the generous assumptions that the most common programmable interconnect, which consists of a simple latch driving a pass gate, is at least the area of the NAND gate in Table 1, and that the coarse-grained cell area is equivalent to that of the cell in Table 1. Further assuming that all the possible interconnects available to a Structured Array per gate are also available to an FPGA yields the ratio of gate areas as a function of the number of gates, as shown in Table 2 below. The analysis suggests that FPGAs with equivalent routing interconnect could be from 50 times to as much as 136 times larger than Structured Arrays, and this ratio grows until the Structured Array becomes routing limited. Actual numbers appear to be on the order of ½ to ¼ of these ratios, which suggests FPGAs have much less interconnect capability than Standard Metal or Standard Cell routing, which usually results in less efficient utilization of cells.

Experimental Results
We compared the area of a synthesizable processor design called Marvin, a 150Kgate + 30KB SRAM SoC based on the OpenRISC OR1200 processor.
It was synthesized, placed and routed in a 0.13 micron standard cell technology in a little over 1.5 square millimeters of logic gates, which is consistent with a somewhat routing-limited design: otherwise, at our theoretical 7.5 square microns per gate, the design should occupy about 1.125 square millimeters.
It was also implemented in an eASIC array, using Synopsys’ Design Compiler for synthesis, and eASIC’s layout tools for packing, placement and routing. It used 20,128 eCells, which, using the area estimate in Table 1, is about 2 times the area of the Standard Cell implementation. This result is in keeping with the model, if the effective gate count
Marvin was also implemented in a Xilinx Virtex-II FPGA, which took 10,342 slices, where a slice consists of two 4-input LUTs and two flip-flops, approximately twice the size of the coarse-grained eCell described in Table 1. This result has somewhat poorer packing than the Structured ASIC result and, given the routing overhead, much worse area utilization.
While the effective 7.5 gates per cell utilization of the Structured ASIC is well below the theoretical numbers in the model, much of this can be attributed to the effectiveness of synthesis and packing. Table 3 below shows some sample designs synthesized by both Synopsys’ Design Compiler and Magma’s Blast Fusion, and then implemented using eASIC’s eTools.
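The arithmetic behind these experimental figures can be reproduced with a short Python sketch. It is an illustration only: the function and constant names are hypothetical, the quoted 7.5 microns per gate is read as an area of 7.5 square microns, and one Virtex-II slice is counted as two eCell equivalents, following the rough size comparison in the text.

```python
# Figures quoted in the text for the Marvin design.
GATES = 150_000          # logic gate count ("150Kgate")
UM2_PER_GATE = 7.5       # assumption: 7.5 square microns per gate
ECELLS = 20_128          # eCells used in the eASIC implementation
SLICES = 10_342          # slices used in the FPGA implementation
ECELLS_PER_SLICE = 2     # assumption: a slice is about two eCells


def theoretical_area_mm2(gates: int, um2_per_gate: float) -> float:
    """Cell-limited logic area in square millimeters."""
    return gates * um2_per_gate / 1_000_000


def gates_per_cell(gates: int, cells: int) -> float:
    """Effective packing density in gates per coarse-grained cell."""
    return gates / cells


if __name__ == "__main__":
    # Cell-limited standard cell area, versus the ~1.5 mm^2 observed.
    print(theoretical_area_mm2(GATES, UM2_PER_GATE))
    # Effective gates per eCell in the Structured ASIC implementation.
    print(round(gates_per_cell(GATES, ECELLS), 2))
    # Effective gates per eCell equivalent in the FPGA implementation.
    print(round(gates_per_cell(GATES, SLICES * ECELLS_PER_SLICE), 2))
```

Under these assumptions the Structured ASIC packs slightly under 7.5 gates per eCell, and the FPGA slightly fewer gates per eCell equivalent, which matches the somewhat poorer packing described above.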
Table 3
Using smaller, logic-only structures, a density of ~15 gates per eCell can be achieved, which suggests that with the proper tuning of existing tools this result can be achieved on designs as large as or larger than Marvin.

Conclusions
The current experimental results fall below estimates derived from the model, but improvements in synthesis and coarse cell packing of smaller designs suggest the theoretical model can be achieved. The model suggests that, given a sufficiently efficient coarse-grained cell design and a standard metal interconnect scheme that minimizes the overhead of single via interconnect, it is possible to construct a Structured ASIC that has at least comparable, if not somewhat higher, gate density than Standard Cell, and is more than 50 times denser than FPGAs. In fact this advantage is accentuated by the observation [3] that the Rent’s rule coefficient for global wiring between reasonably large blocks is much lower than that of random logic.

References
[1] B.S. Landman and R.L. Russo, IEEE Trans. Computers, 20, pp. 1469 (1971).
[2] P. Christie, IEEE Trans. on VLSI Systems, Special Issue on System-Level Interconnect Prediction, 9(6):913-921, December 2001.
[3] J.A. Davis, V.K. De, and J.D. Meindl, IEEE Trans. Electron Devices, 45, pp. 580 (1998).
[4] J. Dambre, D. Stroobandt and J. Van Campenhout, "Fast Estimation of the Partitioning Rent Characteristic Using a Recursive Partitioning Model", pp. 45, Proc. of Intl. Workshop on System-Level Interconnect Prediction, 2003.

by Zvi Or-Bach
Founder & CEO, eASIC Corp.
November 17, 2005