Legacy Parallel to Serial Backplane Design and Considerations
Parallel digital interconnect and backplanes have existed since the advent
of modern electronic systems.
PCI, first introduced in the early 1990s as a chip-to-chip interconnect
standard with a 32-bit data path operating at 33 MHz, has emerged as the
most pervasive interconnect and backplane technology in these systems.
Over the years PCI has evolved from 32-bit/33 MHz to 64-bit/66 MHz and,
most recently, to 64-bit/133 MHz, with plans to migrate to 266 MHz and
beyond.
Many system design engineers viewed PCI as a vehicle not only for their
chip-to-chip interconnect requirements but also for board-to-board
interconnect across the backplane. PCI was never designed or intended for
backplane or even mid-plane interconnect applications. Nevertheless, many
design engineers successfully deployed systems that used PCI for both
chip-to-chip and board-to-board (backplane) interconnect.
Parallel backplane design served the industry well for many years,
whether a system used PCI or a proprietary parallel backplane. The
challenge with parallel backplane interconnect arose from increasing
system bandwidth requirements, which forced IC manufacturers and system
design engineers alike to use wider data buses (16, 32, 64, then 128 bits)
and higher operating frequencies (33 MHz, 66 MHz, then 133 MHz). Wide
buses (>64 bits) at high frequencies (>120 MHz) have all but relegated
parallel buses to chip-to-chip interconnect, primarily because of the
length of the interconnect being driven. Wide buses running at relatively
high frequencies over long PCB traces suffer many debilitating effects:
transmission line effects such as crosstalk and reflections make signal
noise intolerable, and ground bounce and skew further limit the usefulness
of large, high-speed parallel backplanes.

Diagram 1
Faced with the performance requirements of new technologies such as 3G
wireless, 10GbE, OC-192 transport, and multiprotocol networking equipment,
design engineers had to find a solution that supported higher data rates
while increasing reliability and reducing cost. The industry turned to the
storage market for a viable solution, and that solution took the form of
high-speed serial interconnect. Serial interconnect offers many significant
benefits over legacy parallel backplanes, the first and most important
being the performance and robust operation of the serial connection.
•Parallel data transfer
–Multiple lines consume board space
–Lines interfere with each other
–Each line needs its own termination circuitry
•Serial data transfer
–Fewer lines yield reduced board space
–Line interference can be minimized
–Uses a fraction of the termination circuitry vs. parallel
–No skew issues
Diagram 2
A serial connection takes parallel signals in on the “local” side of the
PCB and converts them to a serial bit stream on the backplane side using a
parallel-to-serial converter (see Diagram 3). The clock is embedded in the
transmitted data stream, and a Clock Data Recovery (CDR) circuit on the
receive side of the link extracts the clock from the data (see Diagram 4).
For example, 8, 10, or 12 bits of parallel data enter a SerDes
(serializer/deserializer) device, which generates a serial data stream
with the clock embedded in it.
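The parallel-to-serial conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any Lattice device: it assumes MSB-first bit order and a receiver that is already aligned to word boundaries, and it leaves out both line coding (such as the 8B/10B coder listed in Table 1) and the CDR itself. The hypothetical `transitions` helper counts bit edges, because a CDR can only recover the clock from transitions in the data.

```python
def serialize(words, width=10):
    """Shift parallel words out as a serial bit list, MSB first (assumed order)."""
    bits = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word:#x} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):
            bits.append((word >> shift) & 1)
    return bits


def deserialize(bits, width=10):
    """Regroup a word-aligned serial bit list back into parallel words."""
    if len(bits) % width:
        raise ValueError("bit stream is not a whole number of words")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


def transitions(bits):
    """Count 0->1 and 1->0 edges; a CDR locks onto these edges."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


if __name__ == "__main__":
    parallel_in = [0x17C, 0x283, 0x000]       # three 10-bit words
    stream = serialize(parallel_in)
    print(len(stream))                        # 30 bits on the wire
    print(deserialize(stream) == parallel_in)
    print(transitions(serialize([0x000])))    # an all-zero word has no edges
```

The all-zero word is the reason real links add line coding: without edges, the receiver has nothing to recover a clock from.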
Diagram 3
Diagram 4
Benefits of Serial Backplanes
Area Reduction:
Converting the “local” parallel data to serial greatly reduces the number
of traces, which allows a smaller backplane. In many systems the backplane
PCB is both the largest and the most expensive board; in fact, backplane
size is often the factor that limits how far the system rack can be
shrunk. Serial backplanes also allow smaller connectors between the
“local” PCB and the backplane, further reducing the size and complexity
of the system design, roughly an 11-to-1 reduction. The two main reasons
for implementing a serial backplane are (1) high data throughput with
reliable performance and (2) backplane PCB reduction. The latter is
realized through a smaller system rack form factor and fewer PCB layers,
resulting in lower cost.
Noise Reduction:
Current serial signaling technologies use differential input/output (I/O)
buffers, which provide much smaller signal swings than traditional
single-ended buffers. The reduced swing yields a lower-power I/O buffer
but, more importantly, significantly lowers noise. The noise reduction
shows up as much lower RFI/EMI, less ground bounce, and reduced
transmission line effects such as crosstalk and reflections.
Bandwidth Increase:
Design engineers moving from parallel to serial backplanes have many
implementation options. For example, consider a legacy design that uses
32-bit/33 MHz PCI for both the “local” side of the PCB and the backplane,
for a total aggregate bandwidth of 1.056 Gb/s (32 bits × 33 MHz). The
engineer could select a SerDes device that takes in the PCI local data,
serializes it, and transmits it on a single link at roughly 1 Gb/s, or
could instead feed 8 bits of data to each of 4 SerDes channels operating
at roughly 264 Mb/s (1.056 Gb/s ÷ 4). A further option is to increase the
data rate of the serial link. With today’s SerDes technology, the engineer
can choose between slower devices with more channels and faster devices
with fewer channels, or even a single channel. SerDes devices operate from
155 Mb/s at the low end up to 10 Gb/s at the high end, and use two main
signaling technologies: Low Voltage Differential Signaling (LVDS) and
Current Mode Logic (CML).
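The bandwidth arithmetic in the PCI example is simple enough to check in code. This sketch assumes one bit per data line per clock for the parallel bus and an even split of traffic across SerDes lanes; it deliberately ignores line-coding overhead (8B/10B, for instance, sends 10 line bits for every 8 data bits), so real per-lane line rates would be higher than these payload figures. The function names are illustrative only.

```python
def parallel_bandwidth_mbps(bus_width_bits, clock_mhz):
    """Peak bandwidth of a parallel bus: one bit per data line per clock."""
    return bus_width_bits * clock_mhz


def per_lane_payload_mbps(total_mbps, lanes):
    """Payload each SerDes lane carries if traffic is split evenly.

    Line-coding overhead is NOT included (simplifying assumption).
    """
    if lanes < 1:
        raise ValueError("need at least one lane")
    return total_mbps / lanes


if __name__ == "__main__":
    pci = parallel_bandwidth_mbps(32, 33)          # 32-bit/33 MHz PCI
    print(f"PCI 32b/33MHz: {pci / 1000:.3f} Gb/s")
    for lanes in (1, 4):
        rate = per_lane_payload_mbps(pci, lanes)
        print(f"{lanes} lane(s): {rate:.0f} Mb/s per lane")
```

Splitting 1,056 Mb/s across four lanes gives 264 Mb/s of payload per lane before any coding overhead.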

Diagram 5
As a rule of thumb, LVDS operates from 155 Mb/s to 1.25 Gb/s, while CML
operates from 600 Mb/s to 10 Gb/s. LVDS and CML can interoperate but
require external resistors for level shifting. It is therefore important
that design engineers consider both their current serial backplane
requirements and their future needs before embarking on a SerDes
backplane design.
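One way to apply this rule of thumb when planning for future needs is to check which signaling families cover both today's per-lane rate and the rate you expect to migrate to. The ranges below are the rule-of-thumb figures from the text, not datasheet limits, and the helper names are hypothetical.

```python
# Rule-of-thumb operating ranges from the text, in Mb/s (not datasheet limits).
SIGNALING_RANGES_MBPS = {
    "LVDS": (155, 1250),
    "CML": (600, 10000),
}


def candidate_signaling(rate_mbps):
    """Return the signaling families whose range covers rate_mbps."""
    return [name for name, (low, high) in SIGNALING_RANGES_MBPS.items()
            if low <= rate_mbps <= high]


def covers_migration(current_mbps, future_mbps):
    """Families that cover both the current and the planned future rate."""
    now = set(candidate_signaling(current_mbps))
    return [name for name in candidate_signaling(future_mbps) if name in now]


if __name__ == "__main__":
    for rate in (400, 850, 3125):
        print(rate, candidate_signaling(rate))
    print(covers_migration(850, 3125))   # only CML spans both rates
```

A design that starts at 400 Mb/s on LVDS and later needs 3.125 Gb/s has no common family, which is exactly the case where mixing LVDS and CML (and the level-shifting resistors) comes into play.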
Migration Path:
One of the many benefits of serial backplanes is the ability to migrate
to higher-speed serial interconnect as system bandwidth requirements
increase. A sound high-speed backplane design methodology supports this
migration capability.
In the case of Lattice Semiconductor’s SerDes offerings, the user can
increase serial backplane performance without changing SerDes devices.
For example, depending on the product family, a user can go from
155 Mb/s to 850 Mb/s per channel, or from 600 Mb/s to 3.7 Gb/s, simply by
increasing the SerDes device’s reference clock.
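Because the line rate tracks the reference clock, planning a speed upgrade reduces to choosing a new reference clock frequency. The sketch below assumes a fixed PLL multiplication ratio; the value of 10 used here is purely hypothetical, so take the real ratio and supported range from the datasheet of the device family in question.

```python
PLL_MULTIPLIER = 10                 # hypothetical refclk-to-line-rate ratio
SUPPORTED_RANGE_MBPS = (155, 850)   # per-channel range quoted for one family


def refclk_for_rate_mhz(target_mbps, multiplier=PLL_MULTIPLIER,
                        supported=SUPPORTED_RANGE_MBPS):
    """Reference clock needed for a target per-channel line rate."""
    low, high = supported
    if not low <= target_mbps <= high:
        raise ValueError(f"{target_mbps} Mb/s is outside {low}-{high} Mb/s")
    return target_mbps / multiplier


if __name__ == "__main__":
    for rate in (155, 622, 850):
        print(f"{rate} Mb/s -> {refclk_for_rate_mhz(rate):.2f} MHz refclk")
```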
Programmability – SerDes vs. ASSP:
The other design aspect to consider is what type or class of SerDes
device to use. There are basically two types: ASSP (non-programmable) and
programmable. The advantage of a programmable SerDes device is its
flexibility: the programmable fabric allows the user to customize the
“local” side of the PCB (see Diagram 6), so the user can build in
whatever local bus is required, either PCI or proprietary. Combining
programmable logic with SerDes reduces component count (one device
replaces a PLD or FPGA plus an ASSP SerDes) and shortens time to market.
A programmable SerDes also allows flexible I/O assignments, meaning the
user can select the pin assignment that best eases board layout,
potentially eliminating PCB layers on the local board. Another advantage
is in I/O voltage levels and I/O type, both of which are programmable
selections on Lattice Semiconductor programmable SerDes devices.
Lattice Programmable SerDes
Table 1

ORT4/82G5: 8-channel × 1.25 / 2.5 / 3.125 Gbit/s transceiver
Programmable features:
•ORCA Series 4 FPGA technology
•Up to 400K programmable gates
•432 programmable user I/O
•111Kb embedded RAM
•I/O support includes HSTL, LVDS, SSTL, LVPECL, PECL, TTL, CMOS
•Interface to three extra 2Kx32 dual-port RAMs (in the embedded section)
High-speed interface features:
•4/8 channels at 1.25/2.5/3.125/3.7 Gbit/s
•4 channels demonstrated at 4.25 Gbit/s
•High-speed CML I/Os with internal termination
•Total aggregate bandwidth: 10/20 Gbit/s
•Power-down option on each HSI receiver
•Embedded 8B/10B coder/decoder
•Cell processing
•High-speed 680 PBGAM packaging
•Multi-channel alignment FIFOs
•Two 4Kx36 memory blocks in embedded macro

ORSO4/82G5: 8-channel × 1.35 / 2.7 Gbit/s transceiver
Programmable features:
•ORCA Series 4 FPGA technology
•Up to 400K programmable gates
•432 programmable user I/O
•111Kb embedded RAM
•I/O support includes HSTL, LVDS, SSTL, LVPECL, PECL, TTL, CMOS
•Interface to three extra 2Kx32 dual-port RAMs (in the embedded section)
High-speed interface features:
•4/8 channels at 1.35/2.7 Gbit/s
•High-speed CML I/Os with internal termination
•Total aggregate bandwidth: 10/20 Gbit/s
•Power-down option on each HSI receiver
•SONET scrambler
•Cell processing interface to devices
•High-speed 680 PBGAM packaging
•Multi-channel alignment FIFOs
•Two 4Kx36 memory blocks in embedded macro

ORT8850H: 8-channel × 850 Mbit/s transceiver
Programmable features:
•ORCA Series 4 FPGA technology
•Up to 600K programmable gates
•536 programmable user I/O
•147Kb embedded RAM
High-speed interface features:
•8 channels at 850 Mbit/s
•Total aggregate bandwidth: 6.8 Gbit/s
•LVDS I/Os compliant with EIA-644
•3 full-duplex DDR I/O groups: RapidIO-like I/F
•Power-down option on each HSI receiver
•Pseudo-SONET framer including A1/A2
•SONET scrambler
•In-band management & configuration
•Multi-channel alignment FIFOs

ISPGDX2 (64, 128 & 256): 4 to 16 channels, 400 Mbit/s to 850 Mbit/s, low cost (<$2.00 per channel)
Programmable features:
•High-speed, high channel count
•Up to 256 user-programmable I/Os
•I/O support includes HSTL, LVDS, SSTL, LVPECL, GTL+, TTL, CMOS
High-speed interface features:
•4/16 channels at 400/850 Mbit/s
•High-speed, low-cost LVDS I/Os
•Total aggregate bandwidth: 3.4/13.6 Gbit/s
•Embedded 10B/12B coder/decoder
•Low latency

ISPXPGA: 4 to 20 channels, 400 Mbit/s to 850 Mbit/s, non-volatile
Programmable features:
•ISPXP FPGA technology
•Up to 30,000 registers
•496 programmable user I/O
•415Kb embedded RAM
•I/O support includes HSTL, LVDS, SSTL, LVPECL, TTL, CMOS
High-speed interface features:
•4/20 channels at 400/850 Mbit/s
•High-speed LVDS I/O
•Total aggregate bandwidth: 3.4/17 Gbit/s
•Embedded 10B/12B coder/decoder

XPIO: 1 channel, 9.95 Gbit/s to 10.7 Gbit/s, SFI-4
Programmable features:
•Parallel LVDS data range from 622 to 670 Mbit/s
•Single-chip solution
•0.13 µm CMOS technology
High-speed interface features:
•Low-jitter clock multiplier
•On-chip clock data recovery
•16:1 serialization and 1:16 deserialization
•Embedded limiting amplifier
•Built-In Self-Test (BIST)
•Lowest power consumption, at 0.8 W
•1 channel at 9.95 Gbit/s to 10.7 Gbit/s
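The aggregate-bandwidth entries in Table 1 are channel count times per-channel rate. The check below covers only the 850 Mbit/s rows, where the arithmetic is unambiguous; the 10/20 Gbit/s figures for the ORT4/82G5 and ORSO4/82G5 equal 4 and 8 channels at 2.5 Gbit/s, but the table does not state which per-channel rate they assume, so those rows are left out. The row labels are illustrative.

```python
# (device, channel count, per-channel rate in Mbit/s, aggregate quoted in Table 1 in Gbit/s)
TABLE_1_ROWS = [
    ("ORT8850H", 8, 850, 6.8),
    ("ISPGDX2, 4 channels", 4, 850, 3.4),
    ("ISPGDX2, 16 channels", 16, 850, 13.6),
    ("ISPXPGA, 4 channels", 4, 850, 3.4),
    ("ISPXPGA, 20 channels", 20, 850, 17.0),
]


def aggregate_gbps(channels, rate_mbps):
    """Total raw bandwidth across all channels, in Gbit/s."""
    return channels * rate_mbps / 1000.0


def check_rows(rows, tolerance=1e-9):
    """Return the devices whose quoted aggregate disagrees with the arithmetic."""
    return [device for device, channels, rate, quoted in rows
            if abs(aggregate_gbps(channels, rate) - quoted) > tolerance]


if __name__ == "__main__":
    for device, channels, rate, quoted in TABLE_1_ROWS:
        print(f"{device}: {channels} x {rate} Mbit/s = "
              f"{aggregate_gbps(channels, rate):.1f} Gbit/s (table: {quoted})")
    print("mismatches:", check_rows(TABLE_1_ROWS))
```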
Latency:
The next consideration for the designer is the system's latency
requirements, if any. Many systems today use a large shared memory that
is accessed by multiple cards within the rack. This is commonly done to
reduce the cost and board area of each line card or processing card (in
multiprocessing systems).
When developing a multiprocessor shared-memory system with a parallel
backplane connection, our main concern is the number of traces routed
across the backplane. In a configuration with 15 processor cards and 1
shared memory card, we end up with a total of 480 traces (15 cards × 32
data bits, not including additional control signals) on the backplane. In
addition, the performance of the parallel interconnect is limited to
<120 MHz for the reasons stated previously. A parallel backplane is
therefore not feasible for the shared memory design.
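The 480-trace figure follows from 15 processor cards each running a 32-bit data path to the memory card; 32 bits is an inference, since it is the width that brings 15 cards to 480 traces. For comparison, the sketch also counts a hypothetical serial alternative in which each card uses one full-duplex differential lane (a transmit pair plus a receive pair, four traces). Control signals are excluded in both cases, as in the text.

```python
def parallel_trace_count(cards, data_bits):
    """Point-to-point parallel data traces from each card to the memory card."""
    return cards * data_bits


def serial_trace_count(cards, lanes_per_card=1, traces_per_lane=4):
    """Hypothetical serial link: each full-duplex lane = TX pair + RX pair."""
    return cards * lanes_per_card * traces_per_lane


if __name__ == "__main__":
    processor_cards = 15
    print("parallel:", parallel_trace_count(processor_cards, data_bits=32))
    print("serial:  ", serial_trace_count(processor_cards))
```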
If we instead implement a high-speed SerDes design, we may be tempted to
use the fastest SerDes available, for example 3.125 Gb/s. Keep in mind
that some level of programmability is needed to connect to the processor
card's local bus. The local bus interface may entail developing a PCI
master/target (with PCI as the local bus) or simply programmable I/O for a
proprietary local bus.
Reviewing a high-speed (>2.5 Gb/s) programmable device, we find latency on
the order of 130 ns to 150 ns on the receive side and approximately 70 ns
to 90 ns on the transmit side. (The receive side always has greater
latency because of its decoding and buffering requirements.)
Because the larger FPGA fabric of the high-speed (>2.5 Gb/s) SerDes
device makes this latency unacceptable, we need a device that offers only
a limited amount of programmability (programmable I/O for flexible pin
assignments and voltage levels). This leads us to the Lattice ISPGDX2,
which provides programmable I/Os and very low latency, the result of its
high-speed, lightly loaded routing array. GDX2 latency is 35 ns on the
receive side and 17 ns on the transmit side (see Diagram 6).
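Worst-case clock-to-data-out delay (pin to pin):
$2B_t + 8\,\mathrm{ns} = 2 \times (1/800\,\mathrm{MHz}) + 8\,\mathrm{ns} = 10.5\,\mathrm{ns}$
Worst-case clock-to-data-out delay (pin to pin), including 5 ns mux delay:
$2B_t + 10\,\mathrm{ns} + 5\,\mathrm{ns} = 17.5\,\mathrm{ns}$
Worst-case delay:
$1.5\,T_{rcp} + 4.5\,B_t + 10\,\mathrm{ns} = 34.375\,\mathrm{ns}$
where $B_t$ is the bit time ($1/800\,\mathrm{MHz} = 1.25\,\mathrm{ns}$).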
Diagram 6
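The Diagram 6 delay expressions can be evaluated for any line rate. In this sketch, Bt is the bit time (1.25 ns at the 800 Mb/s used in the diagram), and Trcp is taken to be the recovered-clock period, equal to ten bit times (12.5 ns at 800 Mb/s). That interpretation of Trcp is an assumption, chosen because it is the value that reproduces the 34.375 ns result; the constant terms (8 ns, 10 ns, 5 ns mux delay) are copied from the diagram.

```python
def bit_time_ns(rate_mbps):
    """Bt: duration of one bit in ns."""
    return 1000.0 / rate_mbps


def gdx2_delays_ns(rate_mbps, bits_per_recovered_clock=10):
    """Evaluate the three worst-case expressions from Diagram 6.

    bits_per_recovered_clock=10 is an assumption (Trcp = 10 * Bt).
    """
    bt = bit_time_ns(rate_mbps)
    trcp = bits_per_recovered_clock * bt
    return {
        "clock_to_out": 2 * bt + 8,
        "clock_to_out_with_mux": 2 * bt + 10 + 5,
        "worst_case": 1.5 * trcp + 4.5 * bt + 10,
    }


if __name__ == "__main__":
    for rate in (800, 850):
        delays = gdx2_delays_ns(rate)
        print(rate, {name: round(value, 3) for name, value in delays.items()})
```

Because every rate-dependent term scales with the bit time, running the link faster shortens these delays slightly; the fixed terms dominate.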
The GDX2 family has three members, supporting 4, 8, and 16 SerDes
channels that operate at up to 850 Mb/s. This allows us to use the
smallest, 4-channel device on each processor card and the 16-channel
device on the memory card, making the GDX2 ideal for low-latency shared
memory designs that use proprietary local buses.
If the processor cards, the memory card, and the local bus run PCI, we
need to implement a PCI master/target bus interface. Because the GDX2 has
limited programmable gates, we need to trade some latency for the density
required to implement the PCI master/target. We would therefore use the
Lattice XPGA200, an FPGA that combines world-class speed and
non-volatility with the same SerDes technology used in the Lattice GDX2
devices. Because of the larger FPGA fabric, latency rises by
approximately 30% to 40% compared with the GDX2, a penalty offset by
combining the PCI controller and SerDes in a single device.
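Applying the quoted 30% to 40% increase to the GDX2 figures (35 ns receive, 17 ns transmit) gives a rough latency window for the XPGA200. This is back-of-the-envelope arithmetic on the percentages in the text, not a datasheet value.

```python
GDX2_LATENCY_NS = {"receive": 35.0, "transmit": 17.0}   # figures from the text


def scaled_window_ns(base_ns, low_increase=0.30, high_increase=0.40):
    """Latency range after a fractional increase over a baseline."""
    return base_ns * (1 + low_increase), base_ns * (1 + high_increase)


if __name__ == "__main__":
    for side, base in GDX2_LATENCY_NS.items():
        low, high = scaled_window_ns(base)
        print(f"{side}: {low:.1f} to {high:.1f} ns (GDX2: {base:.0f} ns)")
```

Even the upper end, about 49 ns on the receive side, stays well below the 130 ns to 150 ns quoted for the >2.5 Gb/s programmable device.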
SerDes Quality
When considering SerDes devices, whether programmable or non-programmable,
there are several hundred parameters to consider. However, four critical
parameters should be evaluated in a lab environment that closely
approximates the environment in which the SerDes device is expected to
operate.
These four parameters, in order of importance/criticality are:
1. Receive Jitter Tolerance
2. Transmit Jitter
3. Eye Diagram
4. Power Consumption/Dissipation
1. Receive (RX) Jitter Tolerance: is a measure of
the SerDes device’s ability to reject or tolerate jitter on the
received signal. Jitter is noise coupled onto a signal that makes its
transitions occur at uncertain times (see Diagram 7). All signals have
some amount of jitter, and there are many different types of jitter,
for example data-dependent jitter (DDJ), inter-symbol interference (ISI),
and so on. What is important for the design engineer is to understand
how tolerant of incoming jitter a SerDes device is before using it in a
design, because RX jitter tolerance varies widely from manufacturer to
manufacturer. Board layout can make or break a SerDes system design
(please see the Lattice PCB Layout Guidelines). It should be noted,
however, that no PCB design technique can guarantee an error-free design;
in other words, a SerDes device’s RX jitter tolerance cannot be fully
compensated for through board design. The user should therefore select a
device with the highest jitter tolerance. For example, the Lattice
ORT-x2G5 device has an RX jitter tolerance of 0.73 UI (UI = Unit
Interval, see Diagram 8).

Diagram 7
The unit of measurement for jitter
is time or fractional unit intervals:
$T_{jitter}(\mathrm{UI}) = T_{jitter} / T_{period}$
Diagram 8
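The jitter-to-UI relationship converts directly into code. The first three conversions are the Diagram 10 examples; the 2.5 Gb/s rate attached to the 0.73 UI tolerance is an assumption for illustration, since the text does not say at which rate it applies.

```python
def unit_interval_ps(rate_gbps):
    """One UI is the bit period, 1 / bit rate, expressed in ps."""
    return 1000.0 / rate_gbps


def jitter_to_ui(jitter_ps, rate_gbps):
    """T_jitter(UI) = T_jitter / T_period."""
    return jitter_ps / unit_interval_ps(rate_gbps)


def ui_to_ps(jitter_ui, rate_gbps):
    """Inverse conversion: fractional UI back to time."""
    return jitter_ui * unit_interval_ps(rate_gbps)


if __name__ == "__main__":
    print(ui_to_ps(1.0, 1.0))    # 1 UI at 1 Gb/s
    print(ui_to_ps(0.4, 1.0))    # 0.4 UI at 1 Gb/s
    print(ui_to_ps(0.4, 2.0))    # 0.4 UI at 2 Gb/s
    # 0.73 UI of RX jitter tolerance, assuming a 2.5 Gb/s link:
    print(f"{ui_to_ps(0.73, 2.5):.0f} ps")
```

Expressing jitter in UI makes figures comparable across data rates: the same 0.4 UI is 400 ps at 1 Gb/s but only 200 ps at 2 Gb/s.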
Closely tied to receive jitter tolerance is power supply noise rejection,
because any noise that appears on the SerDes device’s power supplies and
grounds is coupled directly into the SerDes and CDR PLLs. It is therefore
critical to properly decouple VCC and ground using high-“Q” decoupling
capacitors connected between VCC and ground and placed as close as
possible to the device.
2. Transmit (TX) Jitter: is the amount of jitter that
the SerDes device transmits to the receiving side of the serial link.
Ideally, you want a SerDes device that transmits minimal jitter at the
transmit side and a SerDes device at the receive side that can tolerate
maximal jitter. TX jitter is also measured in UI, where lower is better.
3. Eye Diagrams: are a simple way of viewing the integrity
of a serial link. The “data eye” is typically measured at the receiving
end of the link. The eye diagram shows two key elements of the link: the
amplitude, and the time during which the eye is “open”, expressed in
unit intervals (UI). The eye diagram provides insight into the integrity
of the SerDes device transmitting on the transmission media and into the
characteristics of those media (see Diagrams 9 and 10).
Pre-emphasis is a common method SerDes suppliers use to overcome the
effects of loading (resistance, capacitance, and inductance) in
interconnect material, commonly FR-4. Pre-emphasis is a user-selectable
setting that invokes a high-pass filter to compensate for the loading
effects of the interconnect medium, which degrade signal integrity. This
degradation closes the data eye and increases bit errors.

EYE Diagram
Diagram 9

Unit Interval (UI) is the normalized bit
period, i.e., the inverse of the bit rate:
– 1 UI for a 1 Gbps eye diagram is 1 ns
– 0.4 UI for a 1 Gbps signal is 400 ps (0.4 ns)
– 0.4 UI for a 2 Gbps signal is 200 ps
Data falling outside the eye will be received
without error (BER < 1E-12)
Diagram 10
In other words, depending on the characteristics of the link, the
designer can apply pre-emphasis to “customize” the SerDes device’s
differential buffer to the interconnect material. A common
misunderstanding about SerDes is that pre-emphasis consumes additional
power, as if added power were used to “open” the eye. This is not the
case. Again, pre-emphasis is a high-pass filter that is adjusted for the
environment it drives into, and even at the maximum pre-emphasis setting
it consumes only slightly more power. Additionally, pre-emphasis needs to
be tuned for the type of interconnect material being used; more is not
always better (see Diagrams 11, 12, 13, and 14).
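To see why a high-pass filter helps, the sketch below applies a generic two-tap FIR pre-emphasis to an NRZ bit sequence, y[n] = (x[n] − α·x[n−1]) / (1 + α). It is one common textbook way to realize transmit pre-emphasis, not a description of Lattice's circuit, and mapping α to the 12.5% and 25% settings shown in the following diagrams is an assumption. The output is normalized so the peak swing does not grow; the filter instead lowers the level of repeated bits relative to transitions, which is the high-frequency boost that counteracts the low-pass loading of FR-4.

```python
def pre_emphasize(bits, alpha):
    """Two-tap FIR pre-emphasis on NRZ symbols (+1 for a 1, -1 for a 0).

    y[n] = (x[n] - alpha * x[n-1]) / (1 + alpha); dividing by (1 + alpha)
    keeps the largest output level at +/-1, so the peak swing is unchanged.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if not bits:
        return []
    symbols = [1.0 if bit else -1.0 for bit in bits]
    out = []
    previous = symbols[0]          # assume the line idled at the first level
    for x in symbols:
        out.append((x - alpha * previous) / (1.0 + alpha))
        previous = x
    return out


if __name__ == "__main__":
    pattern = [0, 1, 1, 1, 0, 1, 0]
    for alpha in (0.0, 0.125, 0.25):    # hypothetical mapping to the settings
        levels = pre_emphasize(pattern, alpha)
        print(alpha, [round(level, 3) for level in levels])
```

With α = 0.25, each transition is driven at full swing while the following repeated bits settle at 0.6 of it; the larger α is, the stronger that contrast, which is why too much pre-emphasis can over-compensate a short or low-loss channel.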
Diagram 11

No pre-emphasis
Eye width: 201 ps
Eye height: 152.2 mV
Diagram 12

12.5% pre-emphasis
Eye width: 232 ps
Eye height: 204.2 mV
Diagram 13

25% pre-emphasis
Eye width: 234 ps
Eye height: 268.2 mV

Pre-emphasis effects
compensate for long trace lengths
Diagram 14
It should be noted that an eye diagram is best measured with a
high-bandwidth oscilloscope, while transmit and receive jitter are best
measured with a high-sample-rate time interval analyzer (TIA), such as
those from Wavecrest.
4. Power Consumption:
Power consumption has become an increasingly important parameter over
the last several years, particularly for programmable SerDes devices,
due primarily to the combination of programmable fabric with SerDes
buffers. It is common to find SerDes devices that consume 400 to 500 mW
per channel, with the FPGA fabric drawing 5 to 10 W.
For example, a design that requires 16 channels of SerDes and 3 million
gates of FPGA would consume on the order of 16 W in total. As with the
latency example, the designer must make trade-offs in the devices
employed for a given design.
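The 16-watt estimate can be bracketed with the per-channel and fabric figures quoted above. The sketch assumes that a 3-million-gate FPGA falls within the 5 to 10 W fabric range given in the text; actual figures depend on utilization, clock rates, and the specific device.

```python
def total_power_range_w(channels, per_channel_mw=(400, 500), fabric_w=(5.0, 10.0)):
    """(low, high) total power: SerDes channels plus FPGA fabric."""
    serdes_low = channels * per_channel_mw[0] / 1000.0
    serdes_high = channels * per_channel_mw[1] / 1000.0
    return serdes_low + fabric_w[0], serdes_high + fabric_w[1]


if __name__ == "__main__":
    low, high = total_power_range_w(16)
    print(f"16 channels + fabric: {low:.1f} W to {high:.1f} W")
```

The roughly 16 W figure sits inside this 11.4 W to 18 W window.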
Lattice Semiconductor’s SerDes devices range from 0.8 W for the 10 Gb/s
SerDes to 0.210 W for the 3.7 Gb/s SerDes, down to 0.065 W for the
Lattice GDX2.
Another concern, particularly if a programmable SerDes device is used,
is how the user accesses the SerDes portion of the device. Most
programmable SerDes suppliers require the user to configure the SerDes
portion in HDL, using the same tools used to design the programmable
portion. This can be very time-consuming and tedious work.
For Lattice Semiconductor SerDes products, because of the large number
of control and status registers required to configure the SerDes block,
a graphical user interface (GUI) is provided as part of the standard
SerDes software development kit. This GUI not only speeds design but can
save months of debug time (see Diagram 15).

Diagram 15
Conclusion:
In summary, many elements must be considered before embarking on a
high-speed SerDes design. SerDes can provide significant cost savings
through lower PCB costs, smaller form factors, reduced power, lower
EMI/RFI, and a straightforward migration path to higher data throughput.
However, design engineers must do their due diligence to ensure the
proper device is selected for the job at hand.
Lattice Semiconductor supplies a wide variety of SerDes devices, ranging
from 155 Mb/s to 10 Gb/s, depending on the design requirements: from the
Lattice GDX2, which offers very low latency (<35 ns worst case) at up to
850 Mb/s, to the highly flexible ORT82G5 FPSC (Field Programmable System
on a Chip), which operates at over 3.7 Gb/s with up to 16,000 lookup
tables. Finally, the Lattice XPIO-110, also a low-latency SerDes,
operates at up to 10 Gb/s.
Jock Tomlinson, Vice President
Field Application Engineering & Major Accounts
Lattice Semiconductor Corporation
December 9, 2003
Comments on this article? Send them to comments@fpgajournal.com