SPONSORED WHITE PAPER

FPGAs for High-Performance DSP Applications

This white paper compares the performance of DSP applications in Altera FPGAs with that of popular DSP processors and competing FPGAs. With higher performance, you can time-division-multiplex your DSP design to increase the number of processing channels, which reduces the overall cost of your system. Table 1 shows the performance advantages Altera offers over other silicon solutions for DSP systems.

Table 1. Altera DSP Performance Advantage

| Comparison Category | Altera Performance Advantage |
|---|---|
| Altera FPGAs vs. DSP processors | 10x DSP processing power per dollar |
| High-performance FPGAs: Altera’s Stratix II FPGAs vs. Xilinx’s Virtex-4 FPGAs | Up to 1.8x and on average 1.2x higher performance |
| Low-cost FPGAs: Altera’s Cyclone II FPGAs vs. Xilinx’s Spartan-3 FPGAs | Up to 2x and on average 1.5x higher performance |

Figure 1 compares design performance in Altera Stratix II and Cyclone II devices to Xilinx Virtex-4 and Spartan-3 devices, respectively.

Figure 1. DSP Proprietary IP & Open Core Results Comparison

The Stratix II devices achieved an fMAX of over 350 MHz in 9 of the 17 designs, and two FIR designs exceeded 400 MHz. In comparison, only 2 of the 17 designs in Virtex-4 devices operated above 350 MHz.

The Cyclone II devices achieved an fMAX of over 200 MHz in 9 of the 17 designs, and one FIR design exceeded 300 MHz. None of the 17 designs in Spartan-3 devices operated above 200 MHz.
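The link between fMAX and channel count can be sketched with a little arithmetic. The following is a minimal illustration, not part of the benchmark: it assumes that the "Clock/Output" parameter in the appendix means clock cycles per output sample, and the 1 MSPS per-channel sample rate is a hypothetical value chosen only for the example. The function name `tdm_channels` is likewise invented here.

```python
import math


def tdm_channels(fmax_mhz, sample_rate_msps, clocks_per_output=1):
    """Channels that one time-shared datapath can serve.

    Each output sample costs `clocks_per_output` clock cycles, so a core
    clocked at fmax_mhz produces fmax_mhz / clocks_per_output million
    outputs per second, which are shared among the channels.
    """
    outputs_per_second_m = fmax_mhz / clocks_per_output
    return math.floor(outputs_per_second_m / sample_rate_msps)


# FIR2 fMAX figures from Table 5 and 64 clocks per output from the appendix.
# The 1 MSPS per-channel rate is a hypothetical value for illustration.
cyclone_ii = tdm_channels(314, sample_rate_msps=1.0, clocks_per_output=64)
spartan_3 = tdm_channels(186, sample_rate_msps=1.0, clocks_per_output=64)
print(cyclone_ii, spartan_3)
```

Under these assumptions the faster core serves more channels from the same datapath, which is the mechanism behind the cost argument above.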

Performance Comparison Metrics

There are many ways to compare the performance of different DSP solutions, and each provides a different level of accuracy. The following are three ways to compare DSP performance.

• Embedded Multipliers Performance: This is a simplistic method for comparing relative DSP performance that does not take into account the supporting architecture surrounding the embedded multipliers and the complexity and performance of the overall DSP design. This method is the least accurate of the three.

• DSP IP Benchmarks: This method is a more accurate performance comparison between different silicon solutions because it measures the performance of popular functional operations that are integral to many DSP designs. Finite Impulse Response (FIR) filtering and Fast Fourier Transforms (FFT) are two of the most common DSP IP benchmarks.

• Application Level Benchmarks: This method precisely measures the performance of a particular silicon solution when implementing a specific application. An example is the benchmarking results from Berkeley Design Technology Inc. (BDTI).

The performance comparisons in this white paper use DSP IP benchmarks and application level benchmarks. The DSP IP performance data is based on both open and proprietary IP cores comparing Altera’s Stratix II and Cyclone II FPGAs with Xilinx’s Virtex-4 and Spartan-3 devices, respectively. The application level benchmark data is based on real DSP systems for comparison of Altera’s first generation Stratix FPGAs against popular DSP processors.

BDTI Benchmarks - FPGA vs. DSP Processor

Berkeley Design Technology Inc. (BDTI) is the leading provider of independent DSP benchmarks. Its periodic analysis, FPGAs for DSP, compares FPGA performance with that of common DSP processors. The latest benchmark, based on an orthogonal frequency division multiplexing (OFDM) system, shows that Altera’s first-generation Stratix FPGAs provide a cost reduction per channel of over 95% compared to other DSP processors (see Table 2).

Table 2. BDTI Benchmark Results on OFDM System Comparing Stratix FPGAs & Other DSP Processors

| | DSP A | DSP B | Altera Stratix EP1S20-6 | Altera Stratix EP1S80-6 |
|---|---|---|---|---|
| Channels | <0.2 | ~0.7 | ~20 | ~60 |
| Cost (1 ku) (1) | ~$15 | ~$210 | $120 | $600 |
| Cost/channel | ~$100 | ~$300 | ~$6 | ~$10 |

Note to Table 2:
(1) As of the second quarter of 2005. Results from FPGAs for DSP and unpublished benchmarks. Results © 2005 BDTI
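The cost-per-channel row in Table 2 is the device cost divided by the channel count. The sketch below reproduces that arithmetic from the rounded table values; the function names are illustrative, and DSP A is left out because its channel count is given only as an upper bound (<0.2).

```python
def cost_per_channel(unit_cost_usd, channels):
    """Device cost divided by the number of channels it can process."""
    return unit_cost_usd / channels


def cost_reduction(baseline_cost, candidate_cost):
    """Fractional saving of the candidate relative to the baseline."""
    return 1.0 - candidate_cost / baseline_cost


# (cost in USD at 1 ku, channels), using the approximate Table 2 values.
devices = {
    "DSP B": (210, 0.7),
    "Stratix EP1S20-6": (120, 20),
    "Stratix EP1S80-6": (600, 60),
}

per_channel = {
    name: cost_per_channel(cost, channels)
    for name, (cost, channels) in devices.items()
}

for name, value in per_channel.items():
    print(f"{name}: about ${value:.0f} per channel")
```

With these rounded inputs, the EP1S20-6 comes out at about $6 per channel against about $300 for DSP B, so `cost_reduction(300, 6)` is about 0.98. Because the published figures are approximate, the exact percentage depends on which pair of devices is compared.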

OFDM Receiver System Information

The benchmarked OFDM receiver system uses algorithms ranging from table look-ups to MAC-intensive transforms. The data sizes range from 4 to 16 bits, while the data rate ranges from 40 to 320 Mbps. Data includes real and complex values. See Figure 2.

Figure 2. OFDM System Block Diagram

Input and output precision is 8-bit. The FIR filter in this design is a 127-tap complex FIR with real coefficients, and the FFT is a 256-point complex FFT with input and output in natural order. The Slicer is a QAM-256 demapper. A soft-decision Viterbi decoder is used in this design.
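To make "complex FIR with real coefficients" concrete, here is a minimal direct-form sketch. It is not the benchmarked implementation: the moving-average coefficients are hypothetical, and the multiply count is the plain figure before any coefficient-symmetry or resource-sharing optimization.

```python
def fir_real_coeffs(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k].

    `samples` are complex and `coeffs` are real, so the real and imaginary
    parts are filtered independently (two real multiplies per tap).
    """
    out = []
    for n in range(len(samples)):
        acc = 0j
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def real_multiplies_per_output(taps):
    """Real multiplies per complex output sample with real coefficients."""
    return 2 * taps


taps = 127
coeffs = [1.0 / taps] * taps            # hypothetical moving-average taps
impulse = [1 + 1j] + [0j] * (taps - 1)  # complex unit impulse
response = fir_real_coeffs(impulse, coeffs)
print(real_multiplies_per_output(taps))
```

Feeding an impulse returns the coefficients scaled by the input value, which is a quick sanity check for any FIR implementation.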

For even higher performance, Altera’s Stratix II FPGAs offer an average of 50% higher performance than Stratix FPGAs, based on benchmark results using real customer designs. See the Stratix II Performance & Logic Efficiency Analysis White Paper for more details.

FPGA vs. FPGA

DSP IP performance benchmarks compare both high-performance, high-density FPGAs and low-cost FPGAs.

• The high-performance, high-density FPGA analysis compares Altera Stratix II FPGAs and Xilinx Virtex-4 FPGAs.

• The low-cost FPGA analysis compares Altera Cyclone II FPGAs and Xilinx Spartan-3 FPGAs.

The DSP IP performance benchmark uses Altera and Xilinx proprietary IP cores and open cores from www.opencores.org.

Benchmarking Methodology & Setup

Benchmarking FPGA performance is a very complex task. A poor benchmarking process can produce inconclusive or incorrect results. Altera has invested significantly to develop a rigorous and scientific benchmarking methodology that is endorsed by industry experts as a meaningful and accurate way to measure FPGA performance. For the detailed benchmarking methodology, refer to the FPGA Performance Benchmarking Methodology White Paper. Table 3 shows the benchmark setup.

Table 3. Benchmark Setup

| FPGA Category | FPGA Family | Speed Grade | Synthesis Tool (Proprietary IP Cores) | Synthesis Tool (Open Cores) | Place-&-Route Tool |
|---|---|---|---|---|---|
| High-Performance FPGAs | Altera Stratix II | Fastest (-3) | QIS (1), (2) | Synplify Pro 8.0 | Quartus II version 5.0 |
| High-Performance FPGAs | Xilinx Virtex-4 | Fastest (-12) | XST (1), (2) | Synplify Pro 8.0 | ISE 7.1i Service Pack 1 |
| Low-Cost FPGAs | Altera Cyclone II | Fastest (-6) | QIS (1), (2) | Synplify Pro 8.0 | Quartus II version 5.0 |
| Low-Cost FPGAs | Xilinx Spartan-3 | Fastest (-5) | XST (1), (2) | Synplify Pro 8.0 | ISE 7.1i Service Pack 1 |

Notes to Table 3:
(1) QIS: Quartus Integrated Synthesis; XST: Xilinx Synthesis Technology.
(2) Each FPGA vendor’s synthesis tool is used to compile the proprietary cores because these cores are generated netlists and the tool is only responsible for synthesizing the core wrapper.

Proprietary IP & Open Core Designs

Proprietary IP cores are cores generated from Altera’s MegaWizard and Xilinx’s CORE Generator tools. For proprietary IP core comparison, Altera used three types of common DSP IP cores with a total of nine designs:

• FIR filters

• FFT

• Forward Error Correction (FEC)

These IP cores are generated from each FPGA vendor’s tool and benchmarked without further manual optimization.

For open core comparison, Altera selected and benchmarked six different DSP-related open IP cores from www.opencores.org. A core is chosen if its popularity statistic on this web site is greater than 10%. In addition, the complex FFT core is chosen because it is commonly found in DSP designs.

The selected open cores are written in generic HDL code except for the use of FPGA-specific primitives in the original designs, such as instantiations of embedded memory blocks and multipliers. To allow these designs to compile for different FPGAs and to provide a fair comparison, the FPGA-specific primitives in each design are converted to use the embedded features of the target FPGA to achieve the best performance. After the FPGA-specific primitives are converted, the open cores are benchmarked without further manual optimization to keep them as close as possible to their original state.

More information for both the proprietary IP and open cores is available in the appendix.

High-Performance FPGA Proprietary IP & Open Core Comparison

For high-performance and high-density FPGAs, Altera’s Stratix II family offers up to 1.8x higher performance, and an average of 1.2x higher performance, than Xilinx Virtex-4 FPGAs. See Figure 3 for the relative performance comparison and Table 4 for detailed performance data for the Stratix II and Virtex-4 families.

Modern FPGAs embed dedicated multipliers to increase the speed of multiply-accumulate operations that are essential for many DSP designs. However, the best system performance relies on more than raw multiplier speed. It is critical to couple these multipliers with a complementary logic structure and routing fabrics of the same performance.

The Stratix II family seamlessly integrates DSP blocks that operate at up to 450 MHz with high-performance adaptive logic modules (ALMs) and routing fabric to offer the highest system performance for your DSP designs. As shown in Figure 1, the Stratix II device family operated at over 350 MHz in 9 of the 17 designs, and two FIR designs exceeded 400 MHz. In comparison, only 2 of the 17 designs in Virtex-4 devices exceeded 350 MHz, well under the performance claimed in the Virtex-4 data sheet. This shows that high system performance can only be achieved by having an intelligent combination of embedded features and fabrics.

Figure 3. Stratix II vs. Virtex-4 Proprietary IP & Open Core Relative Performance Comparison

Table 4. Detailed Stratix II vs. Virtex-4 DSP Proprietary IP & Open Core Benchmark Data

| DSP IP Category | Design Name | Stratix II (MHz) | Virtex-4 (MHz) | Stratix II / Virtex-4 | Category Average |
|---|---|---|---|---|---|
| FPGA Embedded DSP Block Based FIR Filter | FIR1 | 368 | 306 | 1.20 | 1.20 |
| | FIR2 | 376 | 333 | 1.13 | |
| | FIR3 | 450 | 341 | 1.32 | |
| | FIR4 | 406 | 322 | 1.26 | |
| | FIR5 | 368 | 334 | 1.10 | |
| FFT | FFT1 | 389 | 293 | 1.33 | 1.19 |
| | FFT2 | 393 | 370 | 1.06 | |
| Forward Error Correction (FEC) | Reed Solomon | 284 | 196 | 1.45 | 1.20 |
| | Viterbi | 229 | 231 | 0.99 | |
| Open Cores | AES (Rijndael) | 231 | 222 | 1.04 | 1.04 |
| | CORDIC | 374 | 366 | 1.02 | 1.02 |
| | Radix 4 Complex FFT (CFFT) | 340 | 270 | 1.26 | 1.26 |
| | Simple FM Receiver (FM) | 177 | 99 | 1.78 | 1.78 |
| | VCS-DCT | 231 | 237 | 0.97 | 1.10 |
| | VCS-Huffman Decoder | 276 | 232 | 1.19 | |
| | VCS-Huffman Encoder | 392 | 344 | 1.14 | |
| | VGA/LCD Controller | 269 | 246 | 1.09 | 1.09 |
| Average | | | | | 1.19 |
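The ratio column in Table 4 is the Stratix II fMAX divided by the Virtex-4 fMAX. The white paper does not state how the category averages are computed; the sketch below assumes a geometric mean of the per-design ratios, which happens to reproduce the printed values for the three proprietary IP categories. Treat that as an inference from the numbers rather than the documented method.

```python
import math


def speed_ratio(altera_mhz, xilinx_mhz):
    """Relative performance: Altera fMAX over Xilinx fMAX."""
    return altera_mhz / xilinx_mhz


def geometric_mean(values):
    """Geometric mean, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (Stratix II MHz, Virtex-4 MHz) pairs copied from Table 4.
table4 = {
    "FIR": [(368, 306), (376, 333), (450, 341), (406, 322), (368, 334)],
    "FFT": [(389, 293), (393, 370)],
    "FEC": [(284, 196), (229, 231)],
}

category_avg = {
    name: round(geometric_mean([speed_ratio(a, x) for a, x in pairs]), 2)
    for name, pairs in table4.items()
}
print(category_avg)
```

A geometric mean is a common choice for averaging ratios because it treats a 2x speedup and a 0.5x slowdown symmetrically.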

Low-Cost FPGA Proprietary IP & Open Core Comparison

Altera’s low-cost Cyclone II FPGAs offer up to 2x higher performance, and an average of 1.5x higher performance, than the Xilinx Spartan-3 family. Based on the benchmarked data, the Cyclone II device family operated at over 200 MHz in 9 of the 17 designs, and one FIR design exceeded 300 MHz. None of the 17 designs in Spartan-3 devices operated above 200 MHz. In addition, Cyclone II FPGAs outperform Spartan-3 devices in all designs benchmarked. This performance advantage can directly translate to higher channel count or lower cost for typical designs.

Figure 4 shows the relative performance comparison between Cyclone II and Spartan-3 FPGAs. Table 5 shows detailed performance data for Cyclone II and Spartan-3 FPGAs.

Figure 4. Cyclone II vs. Spartan-3 Proprietary DSP IP Core Relative Performance Comparison

Table 5. Detailed Cyclone II vs. Spartan-3 DSP Proprietary IP & Open Core Benchmark Data

| DSP IP Category | Design Name | Cyclone II (MHz) | Spartan-3 (MHz) | Cyclone II / Spartan-3 | Category Average |
|---|---|---|---|---|---|
| FPGA Embedded DSP Block Based FIR Filter | FIR1 | 258 | 172 | 1.50 | 1.40 |
| | FIR2 | 314 | 186 | 1.68 | |
| | FIR3 | 208 | 186 | 1.12 | |
| | FIR4 | 209 | 154 | 1.36 | |
| | FIR5 | 136 | (1) | (1) | |
| FFT | FFT1 | 211 | 144 | 1.46 | 1.32 |
| | FFT2 | 206 | 174 | 1.19 | |
| Forward Error Correction (FEC) | Reed Solomon | 197 | 100 | 1.97 | 1.76 |
| | Viterbi | 172 | 109 | 1.57 | |
| Open Cores | AES (Rijndael) | 147 | 125 | 1.18 | 1.18 |
| | CORDIC | 246 | 175 | 1.40 | 1.40 |
| | Radix 4 Complex FFT | 206 | 155 | 1.33 | 1.33 |
| | Simple FM Receiver (FM) | 108 | 50 | 2.15 | 2.15 |
| | VCS-DCT | 166 | 96 | 1.72 | 1.55 |
| | VCS-Huffman Decoder | 183 | 128 | 1.43 | |
| | VCS-Huffman Encoder | 266 | 178 | 1.50 | |
| | VGA/LCD Controller | 173 | 118 | 1.46 | 1.46 |
| Average | | | | | 1.48 |

Note to Table 5:
(1) The Spartan-3 family cannot support the required number of dedicated multipliers for this design.
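The headline counts for the low-cost comparison can be re-derived from the Table 5 data with a simple threshold count, sketched below. The VCS-DCT Cyclone II entry is taken as 166 MHz, which is the value consistent with the printed 1.72 ratio against 96 MHz; that reading is an assumption. The unavailable Spartan-3 FIR5 result is represented as `None`.

```python
# fMAX values in MHz, in Table 5 row order (FIR1 ... VGA/LCD Controller).
# VCS-DCT for Cyclone II is taken as 166 MHz (assumed from the 1.72 ratio).
cyclone_ii = [258, 314, 208, 209, 136, 211, 206, 197, 172,
              147, 246, 206, 108, 166, 183, 266, 173]
# None marks FIR5, which Spartan-3 could not implement (note 1).
spartan_3 = [172, 186, 186, 154, None, 144, 174, 100, 109,
             125, 175, 155, 50, 96, 128, 178, 118]


def count_above(fmax_values, threshold_mhz):
    """Number of designs whose fMAX exceeds the threshold."""
    return sum(1 for f in fmax_values
               if f is not None and f > threshold_mhz)


def always_faster(a_values, b_values):
    """True if every design with both results is faster on device A."""
    return all(a > b for a, b in zip(a_values, b_values) if b is not None)


print(count_above(cyclone_ii, 200), count_above(spartan_3, 200))
print(always_faster(cyclone_ii, spartan_3))
```

The same two helpers can be pointed at the Table 4 columns with a 350 MHz threshold to check the high-performance counts.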

Conclusion

Based on the benchmarking results from BDTI as well as Altera’s rigorous benchmarking methodology, Stratix II and Cyclone II FPGAs provide a performance advantage over both popular DSP processors and the competing FPGAs. High system performance for DSP applications cannot be achieved by simply embedding dedicated multipliers – it is an aggregate result of high-performance multipliers and performance-matching logic structure and routing architecture as implemented in Stratix II FPGAs. In addition, Altera’s Quartus II development software and DSP Builder provide a simple way to access the DSP performance in Stratix II and Cyclone II FPGAs without time-consuming manual optimization.

• Altera devices provide, on average, 10x more DSP processing power per dollar than the industry’s most widely used DSP processor solutions.

• Altera’s high-density Stratix II FPGAs offer up to 1.8x and an average of 1.2x higher performance than Xilinx’s Virtex-4 family.

• Altera’s low-cost Cyclone II FPGAs offer up to 2x and an average of 1.5x higher performance than Xilinx’s Spartan-3 family.

Higher DSP performance directly translates to cost savings in typical designs by increasing time-division-multiplexing and, therefore, increasing the total number of processing channels available in your system. Altera offers a comprehensive DSP solution consisting of a complete integrated software environment, performance-optimized devices, DSP intellectual property (IP) cores, development kits, reference designs, and customer training. For more information, visit www.altera.com/dsp.

 

Appendix

Proprietary DSP IP Core Information

Design descriptions and Altera MegaCore IP parameters, by DSP IP category:

FPGA Embedded DSP Block Based FIR Filter (Altera v.3.2.1, Xilinx v.5.1)

| Design Name | Taps | Clock/Output | Coefficient Width | Data Width | Channel | Coefficient Symmetry |
|---|---|---|---|---|---|---|
| FIR1 | 128 | 64 | 16 | 16 | 1 | Yes |
| FIR2 | 128 | 64 | 8 | 8 | 1 | Yes |
| FIR3 | 128 | 16 | 8 | 8 | 1 | Yes |
| FIR4 | 128 | 4 | 8 | 8 | 1 | Yes |
| FIR5 | 128 | 1 | 8 | 8 | 1 | Yes |

FFT (Altera v.2.1.2, Xilinx v.3.1)

| Design Name | Arch. | Points | Data Precision | Twiddle | Engine Throughput | Engine # | Complex Multiplier |
|---|---|---|---|---|---|---|---|
| FFT1 | Burst | 1024 | 16-bit | 16-bit | Quad | 1 | Standard |
| FFT2 | Streaming | 1024 | 16-bit | 16-bit | Quad | 1 | Standard |

Reed Solomon Decoder (Altera v.3.6.0, Xilinx v.5.1)

| Design Name | Pre Setting | Decoding | Key Size | Bit/Symbol | Symbol/Codeword | Check Symbol/Codeword |
|---|---|---|---|---|---|---|
| Reed Solomon | DVB Standard | Continuous | Half | 8 | 204 | 16 |

Viterbi Decoder (Altera v.4.2.0, Xilinx v.5.0)

| Design Name | Architecture | Soft Width | Constraint Length | Trace Back |
|---|---|---|---|---|
| Viterbi | Parallel | 3 | 7 | 66 |
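A few derived quantities follow directly from the FEC parameters above. The sketch below uses standard coding-theory relations: a Reed-Solomon code with r check symbols corrects up to r/2 symbol errors, and a convolutional code with constraint length K has 2^(K-1) trellis states. The function names are illustrative and are not part of either vendor's tools.

```python
def rs_parameters(symbols_per_codeword, check_symbols):
    """Derive message length, correction capability and code rate."""
    n = symbols_per_codeword
    k = n - check_symbols          # data symbols per codeword
    t = check_symbols // 2         # correctable symbol errors
    return {"n": n, "k": k, "t": t, "code_rate": k / n}


def viterbi_states(constraint_length):
    """Trellis states for a convolutional code of constraint length K."""
    return 2 ** (constraint_length - 1)


rs = rs_parameters(symbols_per_codeword=204, check_symbols=16)
states = viterbi_states(constraint_length=7)
print(rs, states)
```

For the benchmarked settings this gives an RS(204, 188) code that corrects up to 8 symbol errors per codeword, and a 64-state Viterbi trellis.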

 

DSP Open Core Information

| Core ID | Core Name | Original URL |
|---|---|---|
| AES | AES (Rijndael) | www.opencores.org/projects.cgi/web/aes_core |
| CORDIC | CORDIC | www.opencores.org/projects.cgi/web/cordic/overview |
| FM | Simple FM Receiver | www.opencores.org/projects.cgi/web/simple_fm_receiver |
| VGA | VGA/LCD Controller | www.opencores.org/projects.cgi/web/vga_lcd |
| VCS | Video Compression System | www.opencores.org/projects.cgi/web/video_systems |
| CFFT | Radix 4 Complex FFT | www.opencores.org/projects.cgi/web/cfft |


References

• Stratix II Performance & Logic Efficiency White Paper

• FPGA Performance Benchmarking Methodology White Paper

• For more information on Stratix II FPGA performance, see the Altera web site (www.altera.com/alterazone)

 


101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
www.altera.com

Copyright © 2005 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries.* All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.



September 15, 2005


All material on this site copyright © 2006 techfocus media, inc. All rights reserved.
FPGA and Structured ASIC Journal