Algorithmic C Synthesis Fuels Functional Reuse

by Shawn McCloud, High-level Synthesis Product Line Director, Design Creation and Synthesis Division, Mentor Graphics Corp.

Reusable intellectual property (IP) has been touted for years as the best strategy for efficiently creating ever more complicated system-on-chip (SoC) designs. IP is certainly gaining traction in today’s advanced ASIC and FPGA designs. However, adoption rates fall far short of what industry pundits predicted just a few years ago. Remember those breathless scenarios of small design teams stitching together hundreds of IP blocks to create incredibly complex ICs in just a matter of weeks? The reality today is much more prosaic.

According to the July 2004 Semico Research report, Semiconductor Intellectual Property: An Idea Whose Time Has Come, the overall IP market in 2004 was a little less than $1.5 billion. True, the market is growing, but generic CPU and DSP cores account for $900 million of it. While the CPU market has experienced compound annual growth rates of 27%, the remaining non-CPU IP market, called commodity IP, has experienced growth rates closer to 17%. The primary reason for the widening growth gap is the inability of commodity IP to stay differentiated as system requirements and silicon processes keep changing. As a result, commodity IP loses value much faster than CPU cores do.

Limitations of Today’s IP

The biggest inhibitor to the widespread adoption of commodity IP is the “flexibility vs. time” tradeoff designers are forced to make, which stems from the fact that almost all third-party IP is delivered in one of three forms: soft, parameterizable, or hard IP.

Soft IP blocks have the design architecture and algorithm specified in register-transfer level (RTL) code descriptions that can be read by RTL synthesis tools. Unfortunately, soft IP has virtually no physical information and as such has not been tuned to the physical characteristics of a given semiconductor process. There is little flexibility in terms of area, performance, and power because the design architecture is fixed. Because soft IP is written in RTL, it can be modified to ease these physical and architectural limitations. Unfortunately, substantial modification quickly erodes the time-to-market benefits of IP because of the added risk, verification overhead, and modification time.

Parameterizable IP, on the other hand, implies blocks that have the basic design topology established and have been synthesized into one or more technologies to achieve specific performance, area, and power estimates. The designer is allowed to specify certain parameters such as bit widths, number of taps, sample rate, filter type, interpolation, and decimation factor, but the design architecture is still fixed for a specific performance. For example, a Turbo Decoder architected for 1-cycle throughput will require significant silicon real estate and power. That can be a problem in lower performance systems, where the same function could be achieved with a smaller design architecture, leading to a much more efficient use of silicon.

Hard IP blocks offer design teams the greatest time-to-market advantage, since they are delivered with the complete mask-level data required to implement the block in silicon. The degree of tuning to a specific process may be fairly low, as in the case of generic IP blocks, or exceptionally high, as in the case of process-specific blocks. Of course, this comes at the cost of design flexibility. Because designers can make only minimal customizations to achieve application-specific design goals, they are forced to live with what the IP provider has instantiated in the hard IP block.

Algorithmic C Synthesis Breaks the Impasse

Methodologies based on pure ANSI C++ can offer a more efficient, attractive path to implementation for third-party IP, freeing the designer from the “either/or” decision of time vs. performance. By nature, the synthesized C/C++ source code is a purely functional description of the algorithm and therefore independent of the target architecture and technology. The original C specification, written in the most natural form, can be synthesized to any architecture and any technology.

C-based design tools allow engineers to work at a higher level of abstraction, enabling them to easily explore alternative device architectures and tune designs for optimal performance, size, or power consumption. The tools automatically generate error-free RTL descriptions, saving weeks of development time over typical RTL flows (figure 1) while offering maximum design flexibility and supporting true “Functional Reuse” (figure 2).

Figure 1: A typical RTL flow can be lengthy. A C-based design flow is significantly more efficient and enables optimal functional reuse.

 

Figure 2: The advantage of synthesizing from pure ANSI C/C++ is in the abstraction.

The most productive C-based tools fall into the category of algorithmic synthesis. Algorithmic synthesis enables designers to take pure ANSI C++ descriptions and automatically synthesize optimized RTL implementations. C/C++ is the language of choice for most algorithm designers because it is not overconstrained by the timing and structural constraints inherent in other high-level description languages, allowing for faster simulation and verification.

With algorithmic synthesis based on pure ANSI C/C++, it is possible to develop functional IP in the form of C algorithms and synthesize process-specific and optimized RTL code. The source code doesn’t embed constraints such as clock cycles, concurrency, modules, and ports, which would result in a rigid description that is verbose and bound to a specific technology. Instead, the user can apply synthesis directives to specify the target technology (ASIC or FPGA), to describe the interface properties, to control the amount of parallelism in the design, to trade off area for speed, and more.
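As an illustration of what such a purely functional description can look like, here is a minimal sketch of an 8-tap FIR filter in plain C++. The function name `fir_step`, the constant `kTaps`, and the choice of 16-bit samples are assumptions made for this example and are not taken from any particular tool or IP library. Note what is absent: no clock, no ports, no statement about how many multipliers exist.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical example: number of filter taps (an assumption for illustration).
constexpr std::size_t kTaps = 8;

// Computes one output sample from the newest input sample.
// `delay_line` holds the most recent inputs and is updated in place.
// Nothing here says how many cycles the loops take or how much hardware
// is used; in a synthesis flow those choices would be made outside the source.
std::int32_t fir_step(std::int16_t sample,
                      std::array<std::int16_t, kTaps>& delay_line,
                      const std::array<std::int16_t, kTaps>& coeffs) {
    // Shift the delay line by one position; the oldest sample is dropped.
    for (std::size_t i = kTaps - 1; i > 0; --i) {
        delay_line[i] = delay_line[i - 1];
    }
    delay_line[0] = sample;

    // Multiply-accumulate across all taps.
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < kTaps; ++i) {
        acc += static_cast<std::int32_t>(delay_line[i]) * coeffs[i];
    }
    return acc;
}
```

Feeding a single 1 followed by zeros into a cleared delay line returns the coefficients one by one, which is a quick way to check the function in an ordinary software test bench before any hardware decision is made.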

Working with IP at this level, the design team can efficiently evaluate alternative implementations, modifying and re-verifying C to effectively perform a series of "what-if" evaluations of alternative algorithms. Since the designers can explore many possible scenarios in a relatively short period of time, they can quickly determine an optimal implementation within a reasonable schedule (figure 3).

Figure 3: Pure ANSI C/C++ enables broader micro-architecture “what if” analysis.

Moreover, a design flow based on algorithmic synthesis enables designers to tune the implementation to exactly match the performance required for the specific technology, including latency, throughput, and frequency. And since the C representation is completely abstracted from the final implementation, designers can express these requirements as a series of "soft" constraints that drive the C-to-RTL implementation, rather than coding them into the source. This means that they can easily re-target the same C representation for different micro-architectures and ASIC/FPGA implementations.
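The kind of "what-if" tradeoff described above can be sketched with a deliberately simple model. The code below assumes a loop of `taps` multiply-accumulate operations, that each hardware multiplier completes one of them per clock cycle, and that the only knob is how many run in parallel. The names `Candidate`, `explore`, and `smallest_meeting` are hypothetical, and the numbers are arithmetic consequences of that assumption, not estimates from a synthesis tool.

```cpp
#include <cstddef>
#include <vector>

// One point in the design space (simplified model, see assumptions above).
struct Candidate {
    std::size_t unroll;       // multiply-accumulates executed in parallel
    std::size_t multipliers;  // hardware multipliers needed (equals unroll)
    std::size_t cycles;       // clock cycles per output sample
};

// Enumerates power-of-two parallelism factors from 1 up to `taps`.
std::vector<Candidate> explore(std::size_t taps) {
    std::vector<Candidate> out;
    for (std::size_t u = 1; u <= taps; u *= 2) {
        out.push_back({u, u, (taps + u - 1) / u});  // ceil(taps / u)
        if (u > taps / 2) break;  // the next doubling would exceed taps
    }
    return out;
}

// Returns the candidate with the fewest multipliers that still meets the
// cycle budget, or nullptr if none does. Relies on `explore` listing
// candidates in order of increasing multiplier count.
const Candidate* smallest_meeting(const std::vector<Candidate>& candidates,
                                  std::size_t max_cycles) {
    for (const Candidate& c : candidates) {
        if (c.cycles <= max_cycles) return &c;
    }
    return nullptr;
}
```

Under this model, an 8-tap loop with a budget of one cycle per sample needs eight multipliers, while a budget of eight cycles needs only one. The source algorithm is the same in both cases; only the constraint changes.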

Once the IP is synthesized to RTL, this is a useful point to "stitch" the various functional blocks together and verify the entire hardware system. Since RTL uses technology-dependent coding styles and hard-codes the micro-architecture, it makes sense at this point for design teams to take full advantage of mature and robust RTL design tools for tasks such as test insertion or power analysis.

Creating Optimal “Tuned” IP-based Designs

Design teams optimize or “tune” their designs when using IP at the algorithmic C level based on three parameters: system performance, interface structures, and target semiconductor technology.

System Performance: An IP block’s implementation depends heavily on the system performance required by the end application. For instance, the type of datapath used to support video functionality in a consumer application can differ widely depending on system performance goals. In a set-top box (STB), where performance is paramount and power concerns are not as pressing, a designer would most likely use a highly parallel, high-performance datapath. But in a PDA that has a small, low-resolution screen, power management and size take precedence over high-quality video graphics, leading to a lower clock speed and lower performance hardware. In contrast, the most likely choice for a digital still camera (DSC) would be a datapath with a midrange width to balance low-power requirements with crisp images.

This is where the true benefit kicks in. The same C-based IP for a video algorithm could be used with equal effectiveness by all the design teams mentioned above, even though the three have significantly different requirements. Relying on algorithmic synthesis, each design team could rapidly examine and find an optimal structure to meet their specific performance goals.

Interface structures: As with system performance, the most desirable type of interface depends on the application. Going back to the applications mentioned above (the PDA, STB, and DSC), each requires a different interface to meet its specifications. A PDA typically requires a shared-memory interface to conserve power. In contrast, an STB relies on a streaming style of interface for maximum throughput. The design team for the DSC needs to explore various options to find the right approach, carefully examining alternatives such as a pixel pipe, shared memory, or streaming interface to find the right balance of performance and power conservation.

Fortunately, some algorithmic synthesis tools available today automatically synthesize interface structures. Working at the algorithmic C level with C-based IP and automatically synthesizing to RTL gives design teams the flexibility to fully examine various interface alternatives without significantly impacting design schedules.
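A software analogy may help show why the interface can be separated from the algorithm. In the sketch below, a single core function is reused behind a shared-memory style wrapper and a streaming style wrapper. All names here (`invert_pixel`, `process_shared`, `process_stream`) are hypothetical, and the hand-written wrappers only stand in for what the article says a synthesis tool would generate from interface directives.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical core algorithm: invert one 8-bit pixel.
inline std::uint8_t invert_pixel(std::uint8_t p) {
    return static_cast<std::uint8_t>(255 - p);
}

// Shared-memory style: the block reads and writes a frame buffer in place.
void process_shared(std::uint8_t* frame, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        frame[i] = invert_pixel(frame[i]);
    }
}

// Streaming style: pixels arrive one at a time on an input FIFO and leave
// on an output FIFO; no frame buffer is addressed.
void process_stream(std::queue<std::uint8_t>& in,
                    std::queue<std::uint8_t>& out) {
    while (!in.empty()) {
        out.push(invert_pixel(in.front()));
        in.pop();
    }
}
```

Both wrappers produce the same pixel values because the core function is untouched. Swapping one for the other changes how data moves, not what is computed.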

Semiconductor Technology: As mainstream design moves into the nanometer range, it is important to optimize the design to fully realize the capability of the targeted semiconductor technology. ASIC technologies may have dozens of architectures for each operation, providing a wide range of area, performance, and power. For example, it’s not uncommon to see 20 or more architectures for a multiplication, ranging from a high-speed Booth-encoded parallel multiplier to far slower yet compact accumulate logic without any lookahead. Unfortunately, choosing the best operator architecture for each of the hundreds of operations that compose an algorithm can be extremely difficult. For FPGAs, design teams might want to take advantage of special features offered by the manufacturer, such as block multipliers, block memory, distributed memory, special DSP macros, or pipelined multipliers.

Matching these specific capabilities with third-party IP is typically a cumbersome, time-consuming process (for hard IP, it might not be possible at all). Here again, starting at the algorithmic C level enables designers to accommodate the specific IC technology without appreciably adding to the design time.
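To make the range of operator architectures more tangible, the sketch below contrasts two ways of computing the same 16-bit product. A shift-and-add loop stands in here for the compact, slow end of the range, and the built-in `*` operator stands in for a fully parallel multiplier. This mapping is an assumption for illustration only; the function names are hypothetical and the step count is a property of the loop, not a hardware measurement.

```cpp
#include <cstdint>

// Shift-and-add multiplication: examines one bit of `b` per iteration and
// conditionally adds a shifted copy of `a`. Always takes 16 iterations,
// which loosely mirrors a single adder reused over many cycles.
// If `steps` is non-null, the iteration count is written to it.
std::uint32_t mul_shift_add(std::uint16_t a, std::uint16_t b,
                            unsigned* steps = nullptr) {
    std::uint32_t acc = 0;
    const std::uint32_t multiplicand = a;
    unsigned count = 0;
    for (unsigned bit = 0; bit < 16; ++bit) {
        if ((b >> bit) & 1u) {
            acc += multiplicand << bit;  // fits in 32 bits for 16-bit inputs
        }
        ++count;
    }
    if (steps != nullptr) {
        *steps = count;
    }
    return acc;
}

// Single-expression multiply, standing in for a parallel multiplier.
std::uint32_t mul_parallel(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint32_t>(a) * b;
}
```

Both functions return identical results for every pair of 16-bit inputs, so the surrounding algorithm does not need to know which one is used. That is the property that lets the operator choice be left to the synthesis step.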

Fulfilling the Promise of IP

Given the inflexibility of hard IP, the restrictions of parameterizable IP, and the time demands of modifying soft IP, it is no wonder that IP adoption has been slower than expected. With the advent of C-based algorithmic synthesis tools, however, design teams can finally move beyond the painful tradeoffs between flexibility and time to market. The crucial advantage is that algorithmic synthesis separates design architecture from functionality. Now, design teams can fully examine the entire gamut of options and then quickly create an optimal IP-based design without impacting tight design schedules.

August 9, 2005

FPGA and Structured ASIC Journal