Legacy Parallel to Serial Backplane Design and Considerations

Parallel digital interconnect and backplanes have existed since the advent of modern electronic systems.

PCI has emerged as the most pervasive interconnect and backplane drive technology on these modern systems. It was first introduced in the early 1990s as a chip-to-chip interconnect standard based on 32 bits of data operating at 33 MHz. PCI has evolved over the years from 32-bit/33 MHz to 64-bit/66 MHz and, most recently, 64-bit/133 MHz, with plans to migrate up to 266 MHz and beyond in the future.

Many system design engineers viewed PCI as a vehicle to address not only their chip-to-chip interconnect requirements, but also their board-to-board interconnect requirements by migrating PCI into the backplane. PCI was never designed or intended for backplane or even mid-plane interconnect applications. Nevertheless, many design engineers successfully deployed systems that utilized PCI as not only the chip-to-chip interconnect, but also as the board-to-board (backplane) interconnect.

Parallel backplane design served the industry well for many years, regardless of whether the system used PCI or a proprietary parallel backplane arrangement. The challenge with parallel backplane interconnect arose from increased system bandwidth requirements, which forced IC manufacturers and system design engineers alike to use wider data buses (16 → 32 → 64 → 128 bits) and higher operating frequencies (33 MHz → 66 MHz → 133 MHz). Wider data buses (>64 bits) and higher operating frequencies (>120 MHz) have all but relegated parallel buses to chip-to-chip interconnect, primarily because of the length of interconnect being driven. Large buses operating at relatively high frequencies over long PCB traces suffer many debilitating effects. Signal noise becomes intolerable due to transmission line effects such as crosstalk and reflections, and ground bounce and skew further limit the usefulness of large, high-speed parallel backplanes.


Diagram 1

Faced with the increased performance requirements of new technologies such as 3G wireless, 10GbE, OC-192 transport, and multiple protocol networking equipment, to name a few, design engineers had to find a solution that supports higher data rates while increasing reliability and reducing cost. The industry turned toward the storage market for a viable solution, which came in the form of high-speed serial interconnect. Serial interconnect in backplanes has many significant benefits over legacy parallel interconnect. The first and most important is the performance and reliable, robust operation of the serial connection.

•Parallel data transfer
–Multiple lines consume board space
–Lines interfere with each other
–Each line needs its own termination circuitry



•Serial Data Transfer
–Fewer lines yields reduced board space
–Line interference can be minimized
–Uses a fraction of the termination circuitry vs. parallel
–No Skew Issues



Diagram 2

 

The serial connection operates by taking parallel signals in on the “local” side of the PCB and converting them to a serial bit stream (via a parallel-to-serial converter) on the backplane side (see Diagram 3). The clock is embedded in the data stream on the transmit side of the serial connection, and a Clock Data Recovery (CDR) circuit extracts it from the data on the receive side of the link (see Diagram 4). For example, 8, 10 or 12 bits of parallel data enter a SerDes (serializer/deserializer) device, which generates a serial data stream with the clock embedded in the stream.


Diagram 3

 

 


Diagram 4
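The parallel-to-serial conversion described above can be illustrated with a minimal Python sketch. It models only the bit reordering: clock embedding, line coding, and word alignment are deliberately left out, and the function names and the MSB-first bit order are assumptions made for illustration, not details taken from any particular SerDes device.

```python
from typing import List


def serialize(words: List[int], width: int = 10) -> List[int]:
    """Flatten parallel words into a serial bit list, most significant bit first."""
    bits: List[int] = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word} does not fit in {width} bits")
        # Shift each bit position out in turn, as a parallel-to-serial converter would.
        for position in range(width - 1, -1, -1):
            bits.append((word >> position) & 1)
    return bits


def deserialize(bits: List[int], width: int = 10) -> List[int]:
    """Regroup a serial bit list into parallel words of the given width."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length is not a multiple of the word width")
    words: List[int] = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


if __name__ == "__main__":
    parallel_in = [0x2A5, 0x13C, 0x001]  # three arbitrary 10-bit words
    stream = serialize(parallel_in, width=10)
    assert deserialize(stream, width=10) == parallel_in
    print(len(stream), "bits sent over the serial link")
```

A real receiver also has to find the word boundaries in the recovered bit stream, which this sketch sidesteps by assuming the stream starts on a word boundary.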

 

Benefits of Serial Backplanes

Area Reduction:
Converting the “local” parallel data to serial greatly reduces the number of traces, which allows the backplane to be made smaller. The backplane PCB is the largest and most expensive board in many systems. In fact, the size of the system backplane is in many cases the limiting factor in reducing the system rack size.

Additionally, serial backplanes allow smaller connectors to be used between the “local” PCB and the backplane, further reducing the size and complexity of the system design, basically an 11 to 1 reduction. The two main reasons for implementing a serial backplane are (1) high data throughput with reliable performance and (2) backplane PCB reduction. The latter is realized through a smaller system rack form factor and fewer PCB layers, resulting in lower cost.

Noise Reduction:
Current serial signaling technologies utilize a differential Input/Output (I/O) buffer. Differential buffers use much smaller signal swings than historical single-ended buffers. The reduced signal swing results in a lower power I/O buffer but, more importantly, significantly lowers noise. The noise reduction benefit is seen in much lower RFI/EMI, ground bounce and transmission line effects, including crosstalk and reflections.

Bandwidth Increase:
Design engineers moving from parallel to serial backplanes have a multitude of implementation options. For example, consider an engineer converting a legacy design that used 32-bit/33MHz PCI for both the “local” side of the PCB and the backplane, with a total aggregate bandwidth of 1.056Gbs (32 bits x 33MHz). The engineer could select a SerDes device that takes in the local PCI data, serializes it and transmits it at roughly 1Gbs, or could elect to provide 8 bits of data to each of 4 SerDes channels operating at roughly 265Mbs. Another option would be to further increase the data rate of the serial link. With today’s SerDes technology, the engineer can select from slower SerDes devices with more channels, or higher speed devices with fewer channels or even a single channel. SerDes devices operate from 155Mbs on the low end up to 10Gbs on the high end, and incorporate two main signaling technologies, Low Voltage Differential Signaling (LVDS) and Current Mode Logic (CML).
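The bandwidth arithmetic in this example is easy to check. The short Python sketch below reproduces it; the function names are illustrative, and line-coding overhead (which would raise the required line rate) is ignored as a simplifying assumption.

```python
def aggregate_bandwidth_mbps(bus_width_bits: int, clock_mhz: float) -> float:
    """Raw bandwidth of a parallel bus: one transfer of bus_width_bits per clock."""
    return bus_width_bits * clock_mhz


def per_channel_rate_mbps(total_mbps: float, channels: int) -> float:
    """Payload rate each serial channel must carry if the load is split evenly."""
    return total_mbps / channels


if __name__ == "__main__":
    pci_total = aggregate_bandwidth_mbps(32, 33)      # 32-bit bus at 33 MHz
    single = per_channel_rate_mbps(pci_total, 1)      # one fast serial channel
    quad = per_channel_rate_mbps(pci_total, 4)        # four slower channels
    print(f"PCI 32b/33MHz aggregate: {pci_total} Mbps")
    print(f"1 channel: {single} Mbps, 4 channels: {quad} Mbps each")
```

Splitting 1056 Mbps across four channels gives 264 Mbps per channel, which is the basis for the approximate 265Mbs figure quoted above.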


Diagram 5

As a general rule of thumb, LVDS operates from 155Mbs to 1.25Gbs. CML, on the other hand, operates from 600Mbs to 10Gbs. LVDS and CML can inter-operate, but require external resistors for level shifting. Therefore, it’s important that design engineers consider their existing serial backplane requirements and future needs before embarking on a SerDes backplane design.
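As a rough illustration of this rule of thumb, the following Python sketch lists which signaling technology covers a given per-channel data rate. The ranges come straight from the paragraph above; the function name and the idea of returning both options in the overlap region are assumptions for illustration, not a selection procedure from any vendor.

```python
from typing import List

# Rule-of-thumb operating ranges quoted in the text, in Mbps.
LVDS_RANGE_MBPS = (155.0, 1250.0)
CML_RANGE_MBPS = (600.0, 10000.0)


def candidate_signaling(rate_mbps: float) -> List[str]:
    """Return the signaling technologies whose rule-of-thumb range covers the rate."""
    candidates: List[str] = []
    if LVDS_RANGE_MBPS[0] <= rate_mbps <= LVDS_RANGE_MBPS[1]:
        candidates.append("LVDS")
    if CML_RANGE_MBPS[0] <= rate_mbps <= CML_RANGE_MBPS[1]:
        candidates.append("CML")
    return candidates


if __name__ == "__main__":
    for rate in (264, 850, 3125):
        print(rate, "Mbps ->", candidate_signaling(rate))
```

Rates between 600Mbs and 1.25Gbs fall in both ranges, so a design with a planned migration to higher speeds may favor CML from the start to avoid level shifting later.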

Migration Path:
One of the many benefits of serial backplanes is the ability to migrate to higher speed serial interconnect as system bandwidth requirements increase. A sound, high-speed backplane design methodology makes this migration possible.

In the case of Lattice Semiconductor’s SerDes offering, the user is able to increase the serial backplane performance without changing the SerDes devices. For example, a user can go from 155Mbs to 850Mbs per channel or, depending on the product family, from 600Mbs to 3.70Gbs, simply by increasing the SerDes device’s reference clock.

Programmability – SerDes vs. ASSP:
The other design aspect that must be considered is what type or class of SerDes device to use. There are basically two types: ASSP (non-programmable) and programmable. The advantage of a programmable SerDes device is its inherent flexibility. The programmable fabric allows the user to customize the “local” side of the PCB (see Diagram 6), so the user can build in any local bus required, either PCI or proprietary. Combining programmable logic with SerDes in one device replaces a separate PLD or FPGA plus an ASSP SerDes, which reduces component count and shortens time to market. The programmable SerDes also allows flexible I/O assignments, meaning that the user can select the pin assignment that best eases board layout, potentially eliminating PCB layers on the local board. Another advantage is that I/O voltage levels and I/O types are both programmable selections with the Lattice Semiconductor programmable SerDes devices.


Table 1: Lattice Programmable SerDes

ORT4/82G5
8-Channel x 1.25 / 2.5 / 3.125 Gbit/s Transceiver

Programmable Features
•ORCA Series 4 FPGA technology
•Up to 400K programmable gates
•432 Programmable user I/O
•111Kb Embedded RAM
•I/O support includes HSTL, LVDS, SSTL, LVPECL, PECL, TTL, CMOS
•Interface to three extra 2Kx32 dual-port RAMs (in the embedded section)

High-Speed Interface Features
•4/8 channels at 1.25/2.5/3.125/3.7 Gbit/s
•4 channels demonstrated at 4.25 Gbit/s
•High-speed CML I/Os with internal termination
•Total aggregate bandwidth – 10/20 Gbit/s
•Power-down option on each HSI receiver
•Embedded 8B/10B coder/decoder
•Cell Processing
•High-speed 680 PBGAM Packaging
•Multi-channel alignment FIFOs
•Two 4Kx36 Memory blocks in embedded macro

ORSO4/82G5
8-Channel x 1.35 / 2.7 Gbit/s Transceiver

Programmable Features
•ORCA Series 4 FPGA technology
•Up to 400K programmable gates
•432 Programmable user I/O
•111Kb Embedded RAM
•I/O support includes HSTL, LVDS, SSTL, LVPECL, PECL, TTL, CMOS
•Interface to three extra 2Kx32 dual-port RAMs (in the embedded section)

High-Speed Interface Features
•4/8 channels at 1.35/2.7 Gbit/s
•High-speed CML I/Os with internal termination
•Total aggregate bandwidth – 10/20 Gbit/s
•Power-down option on each HSI receiver
•SONET scrambler
•Cell Processing interface to devices
•High-speed 680 PBGAM Packaging
•Multi-channel alignment FIFOs
•Two 4Kx36 Memory blocks in embedded macro

ORT8850H
8-Channel x 850 Mbit/s Transceiver

Programmable Features
•ORCA Series 4 FPGA technology
•Up to 600K programmable gates
•536 Programmable user I/O
•147Kb Embedded RAM

High-Speed Interface Features
•8 channels of 850 Mbit/s
•Total aggregate bandwidth – 6.8 Gbit/s
•LVDS I/Os compliant with EIA-644
•3 full-duplex DDR I/O groups: RapidIO-like I/F
•Power-down option on each HSI receiver
•Pseudo-SONET framer including A1/A2
•SONET scrambler
•In-band management & configuration
•Multi-channel alignment FIFOs

ISPGDX2 (64, 128 & 256)
4 to 16 Channels, 400Mbs to 850Mbs, Low Cost (>$2.00 / Ch)

Programmable Features
•High speed high channel count
•Up to 256 User Programmable I/Os
•I/O support includes HSTL, LVDS, SSTL, LVPECL, GTL+, TTL, CMOS

High-Speed Interface Features
•4/16 channels at 400Mbs/850Mbs
•High-speed, Low Cost LVDS I/Os
•Total aggregate bandwidth – 3.4/13.6 Gbit/s
•Embedded 10B/12B coder/decoder
•Low Latency

ISPXPGA
4 to 20 Channels, 400Mbs to 850Mbs, Non-Volatile

Programmable Features
•ISPXP FPGA technology
•Up to 30,000 Registers
•496 Programmable user I/O
•415Kb Embedded RAM
•I/O support includes HSTL, LVDS, SSTL, LVPECL, TTL, CMOS

High-Speed Interface Features
•4/20 channels at 400Mbs / 850Mbs
•High-speed LVDS I/O
•Total aggregate bandwidth – 3.4/17 Gbit/s
•Embedded 10B/12B coder/decoder

XPIO
1 Channel, 9.95Gbs to 10.7Gbs, SFI-4

•Parallel LVDS data range from 622 to 670 Mbps
•Single-chip solution
•0.13u CMOS technology
•Low jitter clock multiplier
•On-chip clock data recovery
•16:1 serialization and 1:16 deserialization
•Embedded limiting amplifier
•Built-In-Self-Test (BIST)
•Lowest power consumption at 0.8W
•1 channel at 9.95 Gbit/s to 10.7 Gbit/s


Latency:
The next consideration for the designer is the latency requirement, if any, of the system design. Many systems today use a large shared memory that is accessed by multiple cards within the rack. This is commonly done to reduce the cost and board area of the line card or, in a multi-processing system, the processing card.

As we look at developing a multi-processor shared memory system, we find that if we attempt to implement the backplane connection in a parallel configuration, our concern is the number of traces routed across the backplane.

If we consider a system configuration where there are 15 processor cards and 1 shared memory card, we end up with a total of 480 traces on the backplane (not including the additional control signals), which corresponds to 32 data lines from each of the 15 processor cards. Also, the performance of the parallel interconnect will be limited to <120MHz for the reasons previously stated. Therefore, a parallel backplane is not feasible for the shared memory design.
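The trace count can be reproduced, and compared against a serial alternative, with a few lines of Python. This is a sketch under stated assumptions: the 32-bit bus width is inferred from 480 / 15, each processor card is assumed to have its own point-to-point connection to the memory card, and the serial case assumes one full-duplex differential channel per card (two wires for transmit, two for receive). None of these serial-side figures come from the article.

```python
def parallel_backplane_traces(processor_cards: int, bus_width_bits: int) -> int:
    """Data traces when every processor card runs its own parallel bus to the memory card."""
    return processor_cards * bus_width_bits


def serial_backplane_traces(
    processor_cards: int,
    channels_per_card: int = 1,   # assumption: one SerDes channel per card
    wires_per_channel: int = 4,   # assumption: differential TX pair + differential RX pair
) -> int:
    """Data traces when every processor card uses full-duplex serial channels instead."""
    return processor_cards * channels_per_card * wires_per_channel


if __name__ == "__main__":
    parallel = parallel_backplane_traces(15, 32)
    serial = serial_backplane_traces(15)
    print(f"parallel: {parallel} traces, serial: {serial} traces")
```

Under these assumptions the serial arrangement needs 60 data traces instead of 480, before control signals are counted on either side.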

If we then look at implementing a high speed SerDes design, we may be tempted to use the highest speed SerDes, for example 3.125Gbs. Keep in mind that some level of programmability is needed to facilitate the connection to the processor card local bus. The local bus interface may entail developing a PCI master/target (PCI as the local bus) or just programmable I/O for a simple proprietary local bus.

Reviewing a high-speed programmable device (>2.5Gbs), we find that the latency of the device is on the order of 130ns to 150ns on the receive side, and approximately 70ns to 90ns on the transmit side. (The receive side will always have greater latency due to the decoder and buffering requirements.)

Because the larger FPGA fabric of the high speed (>2.5Gbs) SerDes device makes the latency unacceptable, we need to find a device that offers a limited amount of programmability (only the I/O needs to be programmable, for flexibility in pin assignments and voltage levels). This leads us to the Lattice ISPGDX2, which provides programmable I/Os and very low latency. The low latency is the result of its high speed, lightly loaded routing array. The latency for the GDX2 is 35ns on the receive side and 17ns on the transmit side (see Diagram 6).

Worst-case clock to data out delay (pin to pin) = 2Bt + 8ns = 2 x (1/800MHz) + 8ns = 10.5ns

Worst-case clock to data out delay (pin to pin) = 2Bt + 10ns + 5ns (mux delay) = 17.5ns

Worst-case delay = 1.5Trcp + 4.5Bt + 10ns = 34.375ns

Diagram 6
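The three worst-case delay expressions above can be evaluated numerically. In the Python sketch below, Bt is taken to be the bit time at 800Mbs (1.25ns), which matches the 2*(1/800) term in the first expression. Trcp is not defined in the text; it is assumed here to be a reference clock period of 12.5ns (ten bit times), because that is the value that reproduces the quoted 34.375ns. The function names are illustrative only.

```python
LINE_RATE_MBPS = 800.0
BIT_TIME_NS = 1000.0 / LINE_RATE_MBPS        # Bt = 1.25 ns at 800 Mbps
REF_CLOCK_PERIOD_NS = 10 * BIT_TIME_NS       # Trcp = 12.5 ns (assumed, see lead-in)


def clock_to_data_out_ns() -> float:
    """First expression: 2Bt + 8 ns."""
    return 2 * BIT_TIME_NS + 8.0


def clock_to_data_out_with_mux_ns() -> float:
    """Second expression: 2Bt + 10 ns + 5 ns mux delay."""
    return 2 * BIT_TIME_NS + 10.0 + 5.0


def worst_case_delay_ns() -> float:
    """Third expression: 1.5Trcp + 4.5Bt + 10 ns."""
    return 1.5 * REF_CLOCK_PERIOD_NS + 4.5 * BIT_TIME_NS + 10.0


if __name__ == "__main__":
    print(clock_to_data_out_ns())           # 10.5
    print(clock_to_data_out_with_mux_ns())  # 17.5
    print(worst_case_delay_ns())            # 34.375
```

The 17.5ns and 34.375ns results line up with the roughly 17ns transmit and 35ns receive latencies quoted for the GDX2.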


The GDX2 family contains three members that support 4, 8, and 16 SerDes channels operating at up to 850Mbs. This allows us to use the smallest 4-channel device on the processor card and the 16-channel device on the memory card, making the GDX2 ideal for shared memory, low latency designs using proprietary local bus designs.

In the event that the processor cards, memory card and local bus are running PCI, we would need to implement a PCI (master/target) bus interface. Because the GDX2 has limited programmable gates, we would need to trade off some latency for the density needed to implement the PCI master/target. Therefore, we would use the Lattice XPGA200, an FPGA that combines world-class speed and non-volatility with the same SerDes technology used on the Lattice GDX2 devices. Due to the larger FPGA fabric, the latency goes up by approximately 30% to 40% compared to the GDX2, which is offset by combining the PCI controller and SerDes into a single device.
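To put the latency figures from this section side by side, the following Python sketch adds the transmit-side and receive-side device latency for one traversal of the link. It is a simplified comparison: trace flight time across the backplane is ignored, and applying the 30% to 40% increase to the combined GDX2 figure (rather than to each side separately) is an assumption made for illustration.

```python
def one_way_latency_ns(tx_ns: float, rx_ns: float) -> float:
    """Transmit-side plus receive-side device latency for one link traversal."""
    return tx_ns + rx_ns


# Figures quoted in the text, in nanoseconds.
HIGH_SPEED_BEST_NS = one_way_latency_ns(70, 130)    # >2.5 Gbps programmable device, low end
HIGH_SPEED_WORST_NS = one_way_latency_ns(90, 150)   # >2.5 Gbps programmable device, high end
GDX2_NS = one_way_latency_ns(17, 35)

# XPGA estimate: 30% to 40% above the GDX2 figure (assumed to apply to the total).
XPGA_LOW_NS = GDX2_NS * 1.30
XPGA_HIGH_NS = GDX2_NS * 1.40

if __name__ == "__main__":
    print(f"High-speed FPGA SerDes: {HIGH_SPEED_BEST_NS:.0f} to {HIGH_SPEED_WORST_NS:.0f} ns")
    print(f"GDX2:                   {GDX2_NS:.0f} ns")
    print(f"XPGA (estimated):       {XPGA_LOW_NS:.1f} to {XPGA_HIGH_NS:.1f} ns")
```

Even with the added fabric delay, the estimated XPGA figure stays well below the 200ns or more of the high-speed programmable device.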


SerDes Quality
When considering the use of SerDes devices, independent of programmable or non-programmable options, there are several hundred parameters to consider. However, there are four critical parameters that should be evaluated in a lab environment closely approximating the environment in which the SerDes device is expected to operate.

These four parameters, in order of importance/criticality are:

1. Receive Jitter Tolerance
2. Transmit Jitter
3. Eye Diagram
4. Power Consumption/Dissipation

1. Receive (RX) Jitter Tolerance: a measure of the SerDes device’s ability to reject or tolerate jitter on the receive signal. Jitter is noise that gets coupled onto a signal and results in uncertain signal transition times (see Diagram 7). All signals have some amount of jitter, and there are many different types of jitter. For example, there is Device Dependent (DD) jitter, inter-symbol jitter, and so on. What is important is for the design engineer to understand how tolerant of incoming jitter a SerDes device is before using it in a design. RX jitter tolerance varies widely from manufacturer to manufacturer. Board layout can either make or break a SerDes system design. (Please see Lattice PCB Layout Guidelines.) It should be noted, however, that no PCB design technique can guarantee an error-free design. In other words, a SerDes device’s RX jitter tolerance cannot be totally compensated for through board design. Therefore, the user should select a device with the highest jitter tolerance. For example, the Lattice ORT-x2G5 device has an RX jitter tolerance of 0.73UI (UI = Unit Interval, see Diagram 8).


Diagram 7

The Unit of Measurement for Jitter is Time or Fractional Unit Intervals
T Jitter (UI) = T Jitter / T period

Diagram 8
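The conversion between jitter expressed as time and jitter expressed in unit intervals follows directly from the formula above. The Python sketch below shows both directions; the function names are illustrative, and the 2.5 Gbps rate used with the 0.73UI tolerance figure is an assumed operating point chosen for the example, since the text does not state the rate at which that figure applies.

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Bit period (1 UI) in picoseconds for a given serial data rate."""
    return 1000.0 / rate_gbps


def jitter_in_ui(jitter_ps: float, rate_gbps: float) -> float:
    """T_jitter (UI) = T_jitter / T_period."""
    return jitter_ps / unit_interval_ps(rate_gbps)


def jitter_in_ps(jitter_ui: float, rate_gbps: float) -> float:
    """Inverse conversion: a fraction of a unit interval back to picoseconds."""
    return jitter_ui * unit_interval_ps(rate_gbps)


if __name__ == "__main__":
    # 0.73 UI tolerance at an assumed 2.5 Gbps line rate (UI = 400 ps).
    print(f"{jitter_in_ps(0.73, 2.5):.0f} ps of tolerated jitter")
    # 100 ps of measured jitter on the same link, expressed in UI.
    print(f"{jitter_in_ui(100.0, 2.5):.2f} UI")
```

Because the unit interval shrinks as the data rate rises, the same tolerance in UI corresponds to a smaller absolute time budget on a faster link.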

A related aspect of receive jitter tolerance is power supply noise rejection, because any noise that appears on the SerDes device’s power supplies and grounds gets coupled directly into the SerDes and CDR PLLs. Therefore, it is critical to properly decouple VCC and ground using high “Q” decoupling capacitors placed between VCC and ground, as close as possible to the device.

2. Transmit (TX) Jitter: the amount of jitter that the SerDes device transmits to the receiving SerDes on the other side of the serial link. Ideally, you want a SerDes device that generates minimal jitter on the transmit side and a SerDes device on the receive side of the link that tolerates maximal jitter. TX jitter is also measured in UI; the smaller the value, the better.

3. Eye Diagrams: a simple way of viewing the integrity of a serial link. The “data eye” is typically measured at the receiving end of the serial link. The eye diagram shows two key elements of the link: the amplitude and the time period, that is, the amount of time the eye is “open,” expressed in Unit Intervals (UI). The eye diagram provides insight into the integrity of the SerDes device transmitting on the transmission media and the characteristics of that media (see Diagrams 9 and 10).

Pre-emphasis is a common method employed by SerDes suppliers to overcome the effects of loading (resistance, capacitance and inductance) on interconnect material, commonly FR-4. These loading effects degrade signal integrity, which closes the data eye and increases bit errors. Pre-emphasis is a user selectable setting that invokes a high pass filter to compensate for them.


EYE Diagram
Diagram 9

Unit Interval (UI) is the normalized bit period, i.e., the inverse of the bit rate:
– 1 UI for a 1 Gbps eye diagram is 1 ns
– 0.4 UI for a 1 Gbps signal is 400 ps (0.4 ns)
– 0.4 UI for a 2 Gbps signal is 200 ps

Data Falling Outside the Eye Will be received without Error (BER<1E-12)

Diagram 10


In other words, depending on the characteristics of the link, the designer has the option to apply pre-emphasis to “customize” the SerDes device’s differential buffer to the interconnect material. A common misunderstanding about SerDes is that pre-emphasis consumes significant additional power, as if added power were used to “open” the eye. This is not the case. Again, pre-emphasis is a high pass filter that is adjusted for the environment it is driving into, and even at the maximum pre-emphasis setting it consumes only slightly more power. Additionally, pre-emphasis needs to be tuned for the type of interconnect material being used; more is not always better (see Diagrams 11, 12, 13, and 14).

Diagram 11

No PreEmphasis
Eye Width: 201ps
Eye Height: 152.2mV

Diagram 12

12.5% PreEmphasis
Eye Width: 232ps
Eye Height: 204.2mV

Diagram 13

25% PreEmphasis
Eye Width: 234ps
Eye Height: 268.2mV

Pre-Emphasis Effects
Compensate for Long Trace Lengths

Diagram 14
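The eye measurements quoted with the diagrams above can be compared quantitatively. The Python sketch below computes the percentage gain in eye width and eye height at each pre-emphasis setting relative to the setting with no pre-emphasis. The data values are the ones listed with the diagrams; the function and variable names are illustrative.

```python
from typing import List, Tuple

# (pre-emphasis setting, eye width in ps, eye height in mV), as listed with the diagrams.
EYE_MEASUREMENTS: List[Tuple[str, float, float]] = [
    ("none", 201.0, 152.2),
    ("12.5%", 232.0, 204.2),
    ("25%", 234.0, 268.2),
]


def percent_gain(value: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def gains_over_baseline(
    measurements: List[Tuple[str, float, float]],
) -> List[Tuple[str, float, float]]:
    """Width and height gains of each later entry relative to the first entry."""
    _, base_width, base_height = measurements[0]
    return [
        (setting, percent_gain(width, base_width), percent_gain(height, base_height))
        for setting, width, height in measurements[1:]
    ]


if __name__ == "__main__":
    for setting, width_gain, height_gain in gains_over_baseline(EYE_MEASUREMENTS):
        print(f"{setting}: width +{width_gain:.1f}%, height +{height_gain:.1f}%")
```

With these numbers, the first 12.5% step widens the eye by about 15%, while doubling the setting to 25% adds only about one more percentage point of width even though the eye height keeps growing. That is consistent with the point that pre-emphasis should be tuned to the interconnect rather than simply maximized.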


It should be noted that an eye diagram is best measured using a high bandwidth oscilloscope, while transmit and receive jitter are best measured using a high sample rate time interval analyzer (TIA), such as those from Wavecrest.

4. Power Consumption:
Power consumption has become an increasingly important parameter over the last several years, particularly with programmable SerDes devices, primarily because they combine programmable fabric with SerDes buffers. It is common to find SerDes devices that consume upwards of 400 to 500 mW per channel, with the FPGA fabric drawing upwards of 5 to 10 watts.

Therefore, if we consider a design that requires 16 channels of SerDes and 3 million gates of FPGA, the total power consumption would be on the order of 16 watts. As in the latency example, the designer must make trade-offs among the devices employed for a given design. Lattice Semiconductor’s SerDes devices range from 0.8 watts for the 10Gbs SerDes to 0.210 watts for the 3.7Gbs SerDes, down to 0.065 watts for the Lattice GDX2.
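The 16-watt estimate can be bracketed from the per-channel and fabric figures given above. The Python sketch below is a simple budget that adds SerDes channel power to fabric power; treating the two as independent and additive is a simplifying assumption, and the function name is illustrative.

```python
def design_power_w(channels: int, per_channel_w: float, fabric_w: float) -> float:
    """Total power: SerDes channels plus FPGA fabric, treated as simply additive."""
    return channels * per_channel_w + fabric_w


if __name__ == "__main__":
    # 16 channels at 400 to 500 mW each, fabric drawing 5 to 10 W.
    low = design_power_w(16, 0.4, 5.0)
    high = design_power_w(16, 0.5, 10.0)
    print(f"Estimated total: {low:.1f} W to {high:.1f} W")
```

The quoted 16 watts falls inside the resulting range of roughly 11 to 18 watts.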


Another concern, particularly if a programmable SerDes device is used, is how the user accesses the SerDes portion of the device. Most programmable SerDes suppliers require the user to configure the SerDes portion using HDL, with the same tools used to design the programmable portion of the design. This can be very time consuming and tedious work.

In the case of Lattice Semiconductor SerDes products, due to the large number of control and status registers required to configure the SerDes block of the device, a Graphical User Interface (GUI) is provided as part of the standard SerDes software development kit. This GUI not only speeds design but can save months of debug time (see Diagram 15).


Diagram 15

Conclusion:
In summary, many elements must be considered before embarking on a high speed SerDes design. SerDes can provide significant cost savings through lower PCB costs, smaller form factors, reduced power, lower EMI/RFI and a straightforward migration path to high data throughput. However, it’s important to remember that due diligence is required by the design engineer to ensure the proper device is selected for the job at hand.

Lattice Semiconductor supplies a wide variety of SerDes devices ranging from 155Mbs to 10Gbs, depending on the design requirements. These range from the Lattice GDX2, which offers very low latency (<35ns worst case) at up to 850Mbs, to the highly flexible ORT82G5 FPSC (Field Programmable System on a Chip), which operates at over 3.7Gbs with up to 16,000 lookup tables. Finally, the Lattice XPIO-110, which is also a low latency SerDes, operates up to 10Gbs.

Jock Tomlinson, Vice President
Field Application Engineering & Major Accounts
Lattice Semiconductor Corporation

December 9, 2003


All material on this site copyright © 2006 techfocus media, inc. All rights reserved.
FPGA and Structured ASIC Journal