SPONSORED WHITE PAPER

Poseidon Tools White Paper

Summary

Processor-based systems are common in modern SoCs and Platform FPGAs: the flexibility of the processor and its ability to handle complex algorithms complement the power of dedicated hardware. However, the complexity of processor-based systems hides system bottlenecks and can make hardware acceleration difficult to exploit. Poseidon Triton Tools augment existing design tools to solve these problems, reducing design time and increasing the efficiency of these designs.

Triton Tuner gives designers the ability to rapidly evaluate architectures much earlier in the design cycle, which can eliminate costly architectural rework in later stages of the project. Tuner also provides an environment to see inside the system architecture, determine where the cycles are being spent, and free up extra MIPS in an existing system. A video IP provider was able to reduce its design time by 40% using Tuner.

Triton Builder enables designers to rapidly and predictably repartition hardware and software, adding sophisticated plug-and-play hardware acceleration to a processor-based system. Builder converts the selected C code into high-performance dedicated hardware, and the generated microarchitecture also includes the communication interfaces needed to form a complete and balanced system design. A wireless system provider was able to exploit its extensive C code base by moving it to hardware, providing more MIPS/MHz and MIPS/microwatt.

Problems to be Solved

Current processor tools have not kept up with the challenges of new high-performance systems. Designers need tools to explore and exploit performance, power, and cost opportunities in their processor-based designs. In particular, they need tools that help them address:

  • The current high level of design complexity
  • Inadequate processor performance in the face of ever-increasing functional demand
  • Increasing power consumption
  • Memory hierarchy and system architecture bottlenecks
  • Design scalability, i.e., enhancing the capability of the design without radical redesign work

Managing High Design Complexity

Designs are increasing in complexity in response to the explosion of usable gates. Designers are struggling with tools that operate at the RTL level and focus on the back-end process of getting all of the signals wired up correctly. Design teams need a tool that raises the level of abstraction and allows them to properly design and optimize their architecture. They also need to be able to model the interactions of the blocks and communication paths so the architecture can be evaluated on real-world data.

Inadequate Processor Performance

Gaining more headroom for an application, or adding extra features to an existing design, requires that the code run faster on the system or that key portions be moved to hardware. To accomplish this, the designer must have visibility into the system to determine where the cycles are going and where inefficiencies exist. Existing tools are not sufficient to generate optimal designs: compilers and simple profilers cannot comprehend the more complex components and system architectures found in modern designs.

To provide higher performance, components increasingly rely on special modes and operating conditions under which they can supply data and computation most efficiently. These work well as long as the proper conditions are met; however, large penalties are paid when operations deviate from them. DDR memories are an example: they can provide very high data bandwidth, but only if the designer properly accounts for the conditions and operating modes the memory needs in order to deliver it. Tailoring the code to make the best use of these resources can often double its speed compared with the output of the best optimizing compiler.
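The DDR point can be illustrated with a toy cost model. The sketch below is not taken from the Triton tools or from any DDR datasheet: the row size and the hit and miss cycle counts are assumed values, chosen only to show how the same number of accesses can cost very different amounts depending on whether they stay within an open row.

```python
# Toy DRAM row-buffer model. All sizes and timings are illustrative
# assumptions, not values from a DDR datasheet or from the Triton tools.
ROW_BYTES = 2048    # assumed bytes per open row
HIT_CYCLES = 4      # assumed cost of an access that hits the open row
MISS_CYCLES = 24    # assumed cost of closing one row and opening another


def access_cost(addresses):
    """Return the total cycles for a sequence of byte addresses on one bank."""
    open_row = None
    cycles = 0
    for addr in addresses:
        row = addr // ROW_BYTES
        if row == open_row:
            cycles += HIT_CYCLES
        else:
            cycles += MISS_CYCLES
            open_row = row
    return cycles


N = 1024
sequential = [4 * i for i in range(N)]        # consecutive 32-bit words
strided = [ROW_BYTES * i for i in range(N)]   # a new row on every access

seq_cycles = access_cost(sequential)
str_cycles = access_cost(strided)
print(seq_cycles, str_cycles)
```

Under these assumptions the strided walk costs several times more cycles than the sequential one for the same number of accesses, which is the kind of penalty a compiler alone will not reveal.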

Increasing Power Consumption

Power consumption is becoming a large problem in the design of silicon devices. Power used to be a concern only in the battery-operated market, but this has changed. New technologies are pushing clock rates higher to meet the rising demand for processing speed, and power consumption rises with them. Power is now a concern for the feasibility of a design, and designers are being forced to look to more expensive solutions to solve both the on-chip and off-chip problems created by the increase in power.

Evaluating System and Memory Architecture

With the rising complexity of SoCs, system design is becoming a critical function that occupies a large portion of the design team. Given the flexibility of new processor architectures and the use of legacy and third-party IP, it is important to develop a sound system architecture that interfaces efficiently to all of the externally and internally generated IP. The architecture must also satisfy all of the system and software requirements. At the same time, programmable platforms such as Structured ASICs, FPGAs, or board designs are becoming more common, and these allow the system designer to modify the architecture to best implement the code.

A modern processor-based design offers many choices of caches, memory sizes, bus topologies, and so forth. Working through these options and understanding the tradeoffs in the context of the application can be challenging without a tool to objectively measure and compare them. Various IP blocks now come ready to connect to popular buses, which simplifies assembly. However, this means the designer must properly set up the bus capacity and topology to bring these IP blocks together into an efficiently functioning system. How efficiently a design team handles these challenges is a critical factor in a successful design program.
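As a small illustration of measuring one such tradeoff objectively, the sketch below compares cache sizes against a fixed access trace. It is a generic direct-mapped cache model written for this example, not a Triton Tuner model; the line size, the cache sizes, and the trace (four passes over an 8 KiB array) are assumptions.

```python
def miss_rate(addresses, cache_bytes, line_bytes=32):
    """Miss rate of a direct-mapped cache for a trace of byte addresses."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines   # block number currently held in each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if resident[index] != block:
            resident[index] = block
            misses += 1
    return misses / len(addresses)


# Hypothetical workload: four word-by-word passes over an 8 KiB array.
trace = [4 * i for i in range(2048)] * 4

rates = {size: miss_rate(trace, size) for size in (4096, 8192, 16384)}
for size, rate in rates.items():
    print(size, rate)
```

For this particular trace, doubling the cache from 4 KiB to 8 KiB removes every miss after the first pass, while going on to 16 KiB buys nothing further. A different application trace would give a different answer, which is why the comparison has to be made in the context of the actual code.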

Design Scalability

As a product evolves, performance, power, or cost pressures require the original design to scale in one of these dimensions. Frequently, the solution is to move software functions into hardware. The reasons are generally:

  • Increase performance. A dedicated hardware block will greatly accelerate the software function.
  • Lower power consumption. A hardware block can use a much lower clock rate than an equivalent software-implemented function to get the same level of performance, thereby reducing power consumption.
  • Lower system cost. A hardware block can often eliminate the need for one or more processors while still meeting product performance requirements.
  • Protect IP. A hardware block is much more difficult to reverse engineer than software, which is subject to straightforward code disassembly.
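The power argument in the list above can be made concrete with the usual first-order model of dynamic power in CMOS logic, P ≈ α·C·V²·f. The sketch below is a back-of-the-envelope calculation only: the activity factor, capacitance, voltage, and both clock rates are hypothetical, and a real hardware block will not have the same switched capacitance as the processor it replaces.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage, freq_hz):
    """First-order dynamic power estimate: P = alpha * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * freq_hz


# Hypothetical figures: the same function running in software at 400 MHz
# and in a dedicated block at 50 MHz, with everything else held equal.
software = dynamic_power_watts(0.2, 1e-9, 1.2, 400e6)
hardware = dynamic_power_watts(0.2, 1e-9, 1.2, 50e6)
ratio = software / hardware
print(software, hardware, ratio)
```

With all other terms held equal, dynamic power scales linearly with clock rate, so the assumed 8x reduction in frequency gives an 8x reduction in this estimate.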

Traditional methods for moving functions to hardware require major effort on the part of the system designer. They involve architecting new data flows through the system and developing a new software flow with an interface to the hardware. Such projects require expertise in processor design as well as the time to develop and test the additional hardware. Designers need a tool that provides scalability in their designs.

Poseidon Tools Overview

The Poseidon Triton tools environment is a system design and acceleration tool set that enables the designer to quickly develop an optimal architecture to meet demanding system requirements. It is designed for processor-based systems that require efficient, robust architectures and need to be optimized for performance, power, and cost. Triton consists of two main tools:

  • Triton Tuner: System hardware and software analysis tools
  • Triton Builder: Hardware accelerator generation tool

Triton Tuner is a simulation and analysis environment based on SystemC. With Tuner, the system designer co-simulates the hardware and software and identifies inefficiencies in the design. The designer can then make the appropriate changes to improve system operation. Triton Builder is a hardware accelerator generation tool. With Builder, algorithms can be moved from software to hardware, thereby enhancing the processing power of the entire system. Builder provides a high level of automation, making the process of generating new hardware fast and easy. Together, these tools provide a system design and optimization environment that enables designers to quickly make dramatic improvements in the performance and power consumption of an application. The Triton suite is an integrated tool set that increases the capabilities and efficiency of the design team and plugs into the existing design flow.

Design Flow

The Poseidon Triton tools have been designed to be extremely flexible: they can be used independently or together as an integrated suite. System design is usually an iterative process in which the user analyzes system performance, identifies inefficiencies, modifies the system, and then checks the resulting performance. The Triton tools aid in this process by providing easy-to-use tools that speed up the identification of problem areas, along with an integrated flow that allows the user to move between the tools to develop the optimal architecture. The typical flow (Figure 1) consists of:

  • Poseidon Triton Tuner to profile the ANSI C code, discover bottlenecks in the code or architecture, and eliminate the inefficiencies.
  • Poseidon Triton Builder to partition processor-intensive algorithms in ANSI C into hardware and generate a hardware accelerator.
  • Poseidon Triton Tuner to verify that the new system performs to the desired level.

Together, Tuner and Builder form a tight design iteration loop so the user can quickly make system-level decisions and evaluate their impact.
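The shape of this iteration loop can be sketched in a few lines. The code below does not call Tuner or Builder; the function names, the per-function cycle counts, and the fixed 10x hardware speedup are all hypothetical stand-ins for the profile, partition, and verify steps.

```python
def tune_and_build(profile, target_cycles, hw_speedup=10, max_accelerators=3):
    """Move the hottest functions to hardware until the cycle target is met.

    profile maps a function name to its software cycle count. The fixed
    speedup factor is an assumption made for illustration only.
    """
    cycles = dict(profile)
    accelerated = []
    while sum(cycles.values()) > target_cycles and len(accelerated) < max_accelerators:
        candidates = [name for name in cycles if name not in accelerated]
        if not candidates:
            break
        hottest = max(candidates, key=cycles.get)         # "profile" step
        cycles[hottest] = cycles[hottest] // hw_speedup   # "build accelerator" step
        accelerated.append(hottest)                       # loop repeats to "verify"
    return accelerated, sum(cycles.values())


# Hypothetical profile of an application, in cycles per frame.
profile = {"fir_filter": 600_000, "fft": 300_000, "control": 100_000}
moved, total = tune_and_build(profile, target_cycles=250_000)
print(moved, total)
```

In this made-up case the loop stops after two accelerators, because the third function is no longer worth moving once the target is met.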

Details of the Design Flow

Tuner reads the application code and an architectural description, then creates a transaction-level model of the system on which the application code is executed. The user performs system analysis and optimization on the simulated system. If additional performance gains are required, Builder is used to partition the performance bottleneck into hardware. The architecture and source code are transferred into Builder, which generates a hardware accelerator by moving selected loops and functions into hardware. Builder generates a complete system ready to execute, including the modified source, drivers, test vectors, and RTL for the accelerator. Builder also creates a transaction-level model of the new hardware, which the designer then uses in Tuner to verify that the new design meets the system requirements.

Interfaces to Existing Tools

The Triton Tools augment your existing design tools by providing standard interfaces. These interfaces are:

  • ANSI C. Tuner and Builder accept and generate ANSI C for software descriptions.
  • SPIRIT Architecture. Tuner and Builder accept architecture descriptions from tools such as Xilinx XPS and SPIRIT-compliant tools such as Mentor Platform Express.
  • Compilers. Tuner and Builder use your existing compilers and build environment; Triton does not force you to adopt a new compiler. All Builder-created drivers follow standard ANSI C and pass through your existing flow.
  • Synthesizable VHDL and Verilog RTL. All Builder-generated RTL is synthesizable VHDL or Verilog that common synthesis tools, such as those from Synopsys and Synplicity, can accept.
  • VHDL and Verilog Testbench. The Builder-generated testbench is compatible with standard VHDL and Verilog simulators.

Triton Tools integrate into your existing design flow; they do not require that existing components in the flow be replaced.

Triton Tuner

Triton Tuner is a SystemC-based simulation tool targeted at processor-based systems. Tuner enables the user to quickly model and co-simulate both the application software and the hardware on the same platform. During the simulation, Tuner collects critical performance data. It then provides the user with the statistical, graphical, and analysis tools needed to analyze and design processor-based architectures. Triton Tuner combines three basic technologies into a system design and optimization tool for both the system engineer and the software designer. These technologies are:

  • Transaction-level hardware and software co-simulation
  • System-level transaction modeling
  • Graphical performance analysis tools

These technologies work together to provide a system analysis tool that enables the designer to quickly analyze software and hardware architectures in the co-simulated environment. With these tools, designers can develop optimum system-level architectures as well as software and hardware algorithms.

Transaction-Level Co-simulation

Triton Tuner is based on SystemC transaction-level simulation technology. Processor and IP vendors have adopted SystemC as the common simulation environment in which to provide hardware models. Tuner supports several levels of abstraction, including Programmer’s View, Programmer’s View with Timing, and Cycle Callable. For software and system-architecture analysis and optimization, Programmer’s View with Timing provides the best tradeoff among simulation accuracy, the depth of performance data that can be captured, and simulation speed. Most processor and IP vendors provide SystemC models of their IP, so models to support a design in the Triton tools suite are widely available.

System Modeling and Models

Timed transaction-level models use the transaction as the base unit of communication. The model executes a transaction and, because it knows how many cycles the transaction takes to complete, it can simulate the performance of the hardware without carrying forward all of the cycle-by-cycle interactions of slower cycle-accurate models. This is also a great advantage for legacy IP: it is not necessary to model all of the intricacies that occur within the IP. A transaction model can be generated from the functional description and the external timing.
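The idea of a timed transaction can be shown with a minimal sketch. This is plain Python written for illustration, not SystemC and not a Triton model; the class names and the read and write latencies are assumptions. Each transaction returns its result together with the number of cycles it consumed, and the simulator advances time by that amount in one step.

```python
class TimedMemory:
    """Hypothetical memory model with a fixed cycle cost per transaction."""

    def __init__(self, read_cycles, write_cycles):
        self.read_cycles = read_cycles
        self.write_cycles = write_cycles
        self.data = {}

    def transact(self, kind, addr, value=None):
        """Execute one transaction and return (result, cycles consumed)."""
        if kind == "write":
            self.data[addr] = value
            return None, self.write_cycles
        if kind == "read":
            return self.data.get(addr, 0), self.read_cycles
        raise ValueError("unknown transaction kind: " + kind)


class Simulator:
    """Advances simulated time one whole transaction at a time."""

    def __init__(self, target):
        self.target = target
        self.now = 0     # current simulated cycle
        self.log = []    # (start cycle, kind, address, cycles) per transaction

    def issue(self, kind, addr, value=None):
        result, cycles = self.target.transact(kind, addr, value)
        self.log.append((self.now, kind, addr, cycles))
        self.now += cycles
        return result


sim = Simulator(TimedMemory(read_cycles=5, write_cycles=3))
sim.issue("write", 0x100, 42)
value = sim.issue("read", 0x100)
print(value, sim.now, sim.log)
```

The internal behaviour of the memory is never modelled cycle by cycle; only its function and its external timing are, which is what makes this style of model fast and straightforward to write for legacy IP.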

Processor transaction models are usually based on an instruction-set simulator (“ISS”) for a particular processor. The ISS interfaces to the rest of the simulation through a transaction-level bus interface. The interface provides cycle-by-cycle timing accuracy for transactions on the system bus to the memory and peripherals. As a result, the simulator has high performance, approaching that of the ISS, along with the accuracy needed to observe complex interactions among the processor, memory, and peripherals on the bus. Simulation speeds are around 300 kHz on a 1.2 GHz computer while capturing full performance data.
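To put the quoted simulation speed in perspective, the arithmetic below derives two figures from it. The 300 kHz and 1.2 GHz values come from the text above; the 30-million-cycle workload is a hypothetical example.

```python
SIM_HZ = 300e3    # simulated target cycles per wall-clock second (from the text)
HOST_HZ = 1.2e9   # host clock rate (from the text)

# Host cycles spent, on average, to simulate one target cycle.
host_cycles_per_sim_cycle = HOST_HZ / SIM_HZ


def wall_clock_seconds(target_cycles):
    """Wall-clock time needed to simulate the given number of target cycles."""
    return target_cycles / SIM_HZ


# Hypothetical workload: a test run of 30 million target cycles.
run_seconds = wall_clock_seconds(30_000_000)
print(host_cycles_per_sim_cycle, run_seconds)
```

At this rate the host spends about 4,000 of its own cycles per simulated cycle, and the hypothetical 30-million-cycle run takes 100 seconds of wall-clock time.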

Graphical Analysis Tools

During the co-simulation, Tuner captures the essential performance information from the simulated system. Time-correlated performance data is captured from both the hardware and the software, maintaining the link between events. Tuner presents system statistics that give the designer critical insight into the performance of the current system. A graphical interface displays selected data, making it easy to identify bottlenecks in the system. Critical software events can then be linked to the operation of the hardware, providing a simple methodology for eliminating inefficiencies between the hardware and software.

Performance indices (“PIs”) are displayed in both tabular and graphical forms. Tuner shows overall execution times and times broken down to the level of the functions in the code, as shown in Figure 2.

Tuner also makes it possible to analyze system performance over time with a graphical display of PIs. When a specific PI is selected, a graphical window displays the data as shown in Figure 3. Events of interest, such as a spike in cache misses, can be easily identified. Tuner can then be used to identify the source code that was executing at the time of the event, so the designer can gain insight into how the application is using the available resources. This link between performance-limiting events and the software that caused them gives the user an effective tool for pinpointing performance bottlenecks.
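The link from a performance event back to the code can be sketched as a lookup between two time-correlated records. The data below is invented and the record formats are assumptions, not Tuner's actual data model: one list holds cache-miss counts per time window, the other holds the cycle range during which each function was executing.

```python
WINDOW = 1000  # assumed sampling window, in cycles

# Hypothetical samples: (window start cycle, cache misses in that window).
samples = [(0, 12), (1000, 15), (2000, 240), (3000, 18)]

# Hypothetical execution timeline: (start cycle, end cycle, function name).
timeline = [
    (0, 1800, "init_tables"),
    (1800, 3200, "motion_estimate"),
    (3200, 4000, "write_output"),
]


def spike_window(samples):
    """Return the (start cycle, count) sample with the highest count."""
    return max(samples, key=lambda sample: sample[1])


def functions_in_window(timeline, start, length):
    """Return the functions whose execution overlaps [start, start + length)."""
    end = start + length
    return [name for (f_start, f_end, name) in timeline
            if f_start < end and f_end > start]


start, misses = spike_window(samples)
suspects = functions_in_window(timeline, start, WINDOW)
print(start, misses, suspects)
```

Here the spike in the third window overlaps only one function, which is where the designer would start looking.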

 

Triton Tuner provides the system designer and the algorithm developer with a powerful tool for developing robust, efficient, and cost-effective architectures. With Tuner the designer can:

  • See inside the system, find out where the cycles are going, and squeeze extra MIPS from the existing system.
  • Properly design the memory hierarchy, selecting the correct size, type, and speed to meet the system requirements.
  • Quickly evaluate competing processor architectures and select the best solution for the application.
  • Rapidly evaluate and develop robust system architectures much earlier in the design, eliminating costly problems later in the design flow.

Triton Builder

Triton Builder is a hardware accelerator synthesis tool. Builder enables the designer to quickly and easily move application algorithms running on a processor to a hardware accelerator connected to the system bus. The application-specific accelerator is generated from selected loops and functions, giving the designer control over which critical functions are moved and thereby maximizing performance while minimizing cost and power.

Builder creates an optimized processing engine to accelerate the selected software. An example system with accelerators is shown in Figure 4.

This system contains the processor with its cache, memories, and peripherals, along with three hardware accelerators that Builder has generated from the original ANSI C application. Two accelerators are connected to the shared system bus, and the third interfaces directly to the auxiliary port of the processor. A system can contain any number of accelerators, and they can execute concurrently.

Builder divides the task of creating the application-specific accelerator into two parts: communication and compute. For the communication portion, Builder uses a communication template library that contains various architectural solutions for the DMA, control, and local memory. The compute core is either created manually by the designer or generated by the Poseidon C synthesis tool.

Integrating these two hardware functions into a complete plug-and-play accelerator greatly simplifies the design process and produces an efficient, high-performance solution. This is one of the compelling reasons to use Builder over generic C synthesis tools.

Driver Generation and Integration

Another compelling reason to use Builder is its ability to generate the software needed to integrate the accelerator into the original code. Builder generates all of the drivers necessary to use the new hardware. It takes the original source code and creates a new copy of the application in which the algorithms moved to hardware have been removed and drivers have been inserted to invoke and control the accelerator. The result is executable code that runs on the new hardware and matches the functionality of the old architecture. The driver makes the use of the accelerator transparent to the rest of the software.

The designer can use either polling or interrupts as the method to notify the processor of accelerator completion.
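The difference between the two notification methods can be sketched with a mock accelerator. This is an illustration in Python, not a Builder-generated driver (which would be ANSI C); the class, its tick-based timing, and the use of a sum as the "accelerated" function are all hypothetical.

```python
class MockAccelerator:
    """Hypothetical accelerator that finishes a fixed number of ticks after start."""

    def __init__(self, latency_ticks, on_done=None):
        self.latency_ticks = latency_ticks   # must be at least 1
        self.on_done = on_done               # optional interrupt handler
        self.remaining = None
        self.pending = None
        self.done = False
        self.result = None

    def start(self, data):
        self.remaining = self.latency_ticks
        self.done = False
        self.pending = sum(data)   # stand-in for the function moved to hardware

    def tick(self):
        """Advance the accelerator by one time step."""
        if self.remaining is None or self.done:
            return
        self.remaining -= 1
        if self.remaining == 0:
            self.result = self.pending
            self.done = True
            if self.on_done is not None:
                self.on_done(self.result)   # "raise the interrupt"


def run_polling(data, latency_ticks):
    """Processor spins on the done flag until the accelerator finishes."""
    acc = MockAccelerator(latency_ticks)
    acc.start(data)
    polls = 0
    while not acc.done:
        acc.tick()
        polls += 1
    return acc.result, polls


def run_interrupt(data, latency_ticks):
    """Processor does other work and is notified through a callback."""
    received = []
    acc = MockAccelerator(latency_ticks, on_done=received.append)
    acc.start(data)
    other_work = 0
    while not received:
        other_work += 1   # useful work done while the accelerator runs
        acc.tick()
    return received[0], other_work


print(run_polling([1, 2, 3], 5))
print(run_interrupt([1, 2, 3], 5))
```

Both paths return the same result. The difference is what the processor does in the meantime: with polling the waiting time is spent checking status, while with interrupts it is available for other work.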

Testbench and Verification

Builder creates the testbench to verify the functionality of the accelerator, along with the files needed to simulate the accelerated system. Builder also generates a transaction-level model of the accelerator for system simulation in Tuner, where the design can be evaluated and verified. This enables the designer to verify the functionality and performance of the new architecture. The Triton tools' high level of automation enables the designer to quickly evaluate many scenarios until the desired results are achieved.

Summary

The Triton tool suite provides designers with the tools to design, optimize, and accelerate the architectures of processor-based systems. Designers can make tradeoffs among performance, power, and cost. With their high level of automation, the Triton tools also reduce the time and effort required to develop processor-based architectures. The key benefits of the Triton Tools are:

  • Complete design flow for software optimization & system acceleration
  • Reduces design cycle through highly automated flow
  • Generates highest MIPS/MHz systems
  • Trade off for Performance/Power/Cost

For more information, visit our website at www.poseidon-systems.com and request a free 30-day evaluation.

June 22, 2006

All material on this site copyright © 2006 techfocus media, inc. All rights reserved.
FPGA and Structured ASIC Journal