
Programmable Hardware Delivers 10,000X Improvement in Verification Speed over Software

By Tony Chan-Carusone, Alphawave Semi https://awavesemi.com/


At OFC 2025 next week, the industry will examine how to remove bottlenecks from networking, with advanced modulation for optical transmission playing a key role in enabling this. In the following article, Tony Chan-Carusone explores the critical role of Forward Error Correction (FEC) in high-speed wireline networking, particularly with the adoption of PAM4 modulation for mid-distance transmission across data centers, and how advances in programmable hardware for verification can achieve speeds 10,000 times faster than traditional software-based simulation.

As data transmission rates increase, FEC is essential for maintaining low error rates and obviating the need for retransmission. The performance of modern FEC depends critically on the details of the receiver DSP, particularly with respect to the potential for bursts of errors to corrupt entire frames of data, making them uncorrectable. Software-based time-domain simulation is traditionally used to verify the performance of FEC. However, software simulation is too slow to confirm the probability of these extremely rare error events.

Fortunately, using FPGAs, complete links can be modelled and simulated with enough speed and accuracy to validate FEC performance in a wide variety of real application scenarios prior to widespread deployment. Moreover, such a model can be used to evaluate alternative DSP and FEC for emerging applications. With high-speed networking evolving rapidly, these innovations will be a key focus at OFC, where experts will explore the latest advancements shaping the future of optical and data center connectivity.

In the race to increase the speeds of wireline networking and communications, forward error correction (FEC) has become a vital part of the toolkit. To function effectively, especially with the increasing use of four-level pulse amplitude modulation (PAM4), high-speed protocols need FEC to avoid a rise in the number of reception errors. Each incremental increase in the transmitted symbol rate requires higher signal bandwidths, with a commensurate increase in the amount of noise in receivers. Thus, more powerful and complex FEC may be expected to counter the increased noise levels.

Next-generation PAM4 wireline links for data-centre interconnection will support transmission rates of 200 Gbps per serial lane. The IEEE 802.3dj task force is responsible for writing the standard that implementors will use to develop their 200 Gbps Ethernet interfaces. To prevent a rise in bit error rate (BER), the task force has adopted a concatenated FEC scheme in which an inner and an outer code provide two layers of error correction. However, many details of the system-level architecture, and how they affect FEC performance, still need to be analysed.

Figure 1. Concatenated FEC

Required conversion steps

The requirement for two concatenated FEC codes lies in the end-to-end composition of 200 Gbps links. Several transmission hops must be considered. First, a short electrical link transmits data from a host chip in a server or switch to an optical module. The optical module receives this electrical signal and retransmits it as an optical signal that travels over a much longer distance to the receiving optical module. This module then receives the incoming optical signal and translates it into an electrical equivalent, relaying the data to the destination chip in another server or switch. In the IEEE 802.3dj architecture, all three of these links employ PAM4 signalling and each of them can introduce errors in the transmitted symbols.

Figure 2. 200 Gb/s multi-part link with concatenated FEC

The electrical links can suffer from large amounts of inter-symbol interference (ISI). Correcting for this ISI often requires equalization techniques, such as decision feedback equalization (DFE) or maximum-likelihood sequence detection (MLSD). These equalization techniques are necessary to establish a link, but are subject to error propagation, whereby a single error in the received bit stream due to an extreme noise event may significantly increase the probability of additional errors in neighbouring bits. Thus, errors are probabilistically correlated and it is relatively common to see them arise in bursts.
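The error-propagation effect can be illustrated with a toy model. The following is a minimal sketch, not the receiver architecture discussed in the article: it assumes a channel with a single post-cursor ISI tap (the hypothetical parameter h1), Gaussian noise of standard deviation sigma, and a one-tap DFE that subtracts the ISI estimated from its own previous decision. All names and parameter values are illustrative assumptions.

```python
import random

LEVELS = (-3, -1, 1, 3)  # PAM4 amplitude levels


def slice_pam4(z):
    """Return the PAM4 level nearest to the equalized sample z."""
    return min(LEVELS, key=lambda level: abs(z - level))


def simulate_dfe(n_symbols, h1=0.6, sigma=0.35, seed=1):
    """Simulate a one-tap DFE; return a list of per-symbol error flags."""
    rng = random.Random(seed)
    prev_tx = 0        # previous transmitted symbol (creates the ISI)
    prev_decision = 0  # previous DFE decision (used to cancel the ISI)
    errors = []
    for _ in range(n_symbols):
        tx = rng.choice(LEVELS)
        rx = tx + h1 * prev_tx + rng.gauss(0.0, sigma)
        # The DFE cancels ISI using its own decision, right or wrong.
        decision = slice_pam4(rx - h1 * prev_decision)
        errors.append(decision != tx)
        prev_tx, prev_decision = tx, decision
    return errors


def error_stats(errors):
    """Return (overall error rate, error rate right after an error)."""
    ser = sum(errors) / len(errors)
    preceded = sum(errors[:-1])
    repeated = sum(1 for a, b in zip(errors, errors[1:]) if a and b)
    conditional = repeated / preceded if preceded else 0.0
    return ser, conditional


if __name__ == "__main__":
    ser, conditional = error_stats(simulate_dfe(100_000))
    print(f"overall symbol error rate:   {ser:.4f}")
    print(f"error rate following an error: {conditional:.4f}")
```

In this sketch a wrong decision leaves residual ISI in the next sample, so the error rate immediately after an error is far higher than the overall rate, which is the burst behaviour described above.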

On the other hand, optical connections are generally subject to less ISI, but have a lower signal-to-noise ratio. In practice, PAM4 optical transceivers contain bandwidth-limited amplifiers, introducing some ISI that also demands some equalization. However, the errors will generally be less strongly correlated than on purely electrical links. Thus, simulations can model the predominant optical link impairment using additive Gaussian noise, resulting in random and largely uncorrelated errors.
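For the additive-Gaussian-noise case, the symbol error rate of an ideal PAM4 slicer has a simple closed form, which makes a useful sanity check for any time-domain model. The sketch below assumes levels at -3, -1, 1 and 3 (so the distance to the nearest decision threshold is 1) and no ISI; the function names are illustrative only.

```python
import math
import random

LEVELS = (-3, -1, 1, 3)


def q_function(x):
    """Tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam4_ser_theory(sigma):
    """Closed-form PAM4 symbol error rate: 2 * (1 - 1/4) * Q(1 / sigma)."""
    return 1.5 * q_function(1.0 / sigma)


def pam4_ser_monte_carlo(sigma, n_symbols, seed=7):
    """Estimate the symbol error rate by direct time-domain simulation."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_symbols):
        tx = rng.choice(LEVELS)
        rx = tx + rng.gauss(0.0, sigma)
        decision = min(LEVELS, key=lambda level: abs(rx - level))
        errors += decision != tx
    return errors / n_symbols


if __name__ == "__main__":
    sigma = 0.4
    print("theory:     ", pam4_ser_theory(sigma))
    print("monte carlo:", pam4_ser_monte_carlo(sigma, 200_000))
```

Agreement between the two numbers is only expected for this idealised memoryless channel. Once equalization and correlated errors enter the picture, no such closed form is available, which is why time-domain simulation is needed.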

This difference in behaviour between the electrical and optical links supports the use of two levels of FEC in a concatenated arrangement. The outer code corrects errors in all three links. The inner code protects only the optical part of the connection and can therefore use a simpler error-correction method. In the case of the upcoming 200 Gbps Ethernet standards, the proposed code is a binary extended Hamming code that can correct a 1-bit error in each 128-bit codeword. This is effective enough for the uncorrelated errors that are likely to be encountered in the optical domain.
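To make the single-error-correcting behaviour concrete, here is a minimal hard-decision sketch of a textbook extended Hamming code with 128-bit codewords and 120 data bits. It is an illustration only: the bit ordering and parity layout are the classic position-indexed construction, which is an assumption and not necessarily the encoding defined in the standard, and a practical inner decoder may use soft decisions instead.

```python
R = 7          # parity bits of the underlying Hamming(127, 120) code
N = 1 << R     # 128-bit extended codeword; position 0 holds overall parity
K = N - R - 1  # 120 data bits


def _syndrome(word):
    """XOR of the positions (1..N-1) of all set bits."""
    syndrome = 0
    for pos in range(1, N):
        if word[pos]:
            syndrome ^= pos
    return syndrome


def encode(data_bits):
    """Encode 120 data bits into a 128-bit extended Hamming codeword."""
    assert len(data_bits) == K
    word = [0] * N
    data = iter(data_bits)
    for pos in range(1, N):
        if pos & (pos - 1):          # not a power of two: data position
            word[pos] = next(data)
    syndrome = _syndrome(word)
    for i in range(R):               # parity bits force the syndrome to zero
        word[1 << i] = (syndrome >> i) & 1
    word[0] = sum(word) & 1          # overall parity: total weight is even
    return word


def decode(word):
    """Return (word, status); corrects one bit error, detects two."""
    word = list(word)
    syndrome = _syndrome(word)
    if sum(word) & 1:                # odd weight: assume exactly one error
        word[syndrome] ^= 1          # syndrome 0 points at the parity bit
        return word, "corrected"
    if syndrome:                     # even weight but non-zero syndrome
        return word, "uncorrectable"
    return word, "clean"


if __name__ == "__main__":
    codeword = encode([1, 0] * 60)
    received = list(codeword)
    received[37] ^= 1                # inject a single bit error
    fixed, status = decode(received)
    print(status, fixed == codeword)
```

With three bit errors, this decoder sees odd weight and flips a bit as if there were a single error, so it can add an error rather than remove one. A hard-decision decoder of this kind cannot distinguish that case from a genuine single error.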

To correct correlated errors that arrive in bursts from any of the three links, the outer code proposed for the standard is a Reed-Solomon code, commonly known as the KP4 FEC with reference to a previous standard where the same code was used. This can correct up to 15 FEC-symbol errors per 544-symbol codeword in the FEC-encoded stream, where each FEC symbol in turn comprises multiple PAM symbols. Note that when a decoding operation fails, it can mistakenly introduce yet more errors in its effort to make the corrections. Further complicating matters, a codeword interleaver is introduced between the inner and outer codes, which spreads out any error bursts introduced by the inner code, improving overall performance of the link.
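The benefit of interleaving can be sketched with simple counting. The example below assumes a plain round-robin interleaver, in which consecutive FEC symbols on the line belong to different outer codewords in turn. This is a hypothetical simplification chosen for illustration, not the interleaver specified by the standard, and the function names are invented here.

```python
MAX_CORRECTABLE = 15  # from the article: up to 15 FEC-symbol errors per codeword


def burst_per_codeword(burst_start, burst_len, depth):
    """Count corrupted FEC symbols landing in each of `depth` codewords.

    Symbol i on the line is assumed to belong to codeword i % depth.
    """
    counts = [0] * depth
    for index in range(burst_start, burst_start + burst_len):
        counts[index % depth] += 1
    return counts


def all_correctable(counts):
    """True if no codeword exceeds the outer code's correction capability."""
    return all(count <= MAX_CORRECTABLE for count in counts)


if __name__ == "__main__":
    for depth in (1, 2, 4):
        counts = burst_per_codeword(100, 20, depth)
        print(depth, counts, all_correctable(counts))
```

Without interleaving (depth 1), a burst of 20 corrupted FEC symbols lands in one codeword and exceeds the 15-symbol limit. Spread across two or more codewords, the same burst leaves each codeword within what the outer code can correct.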

The complexity of these new codes leads to potential hazards in their implementation. Traditionally, it has been assumed that for a given FEC code, a required post-FEC BER can be translated into an equivalent required pre-FEC BER. Called the FEC limit paradigm, this approach allowed system analysis to focus on evaluating only the relatively high pre-FEC BER of worst-case links, which can be simulated in software as described below.  However, when widely deployed, 200 Gbps links will face a diversity of noise and ISI in each of the three constituent hops, which will translate into different error correlations and, hence, different post-FEC BER even at the same pre-FEC BER.  Moreover, the choice of equalization in each receiver may also impact the end-to-end post-FEC BER.  Thus, it has become necessary to accurately evaluate the very low probability of FEC decoder failure in the presence of all these variations.  It is also desirable to consider the impact of different DSP approaches (e.g. DFE vs. MLSD).

Software approaches to FEC analysis

Such protocols are often analysed using software-based time-domain simulation. A program models the transmission of test data through a wireline channel that captures different signal-integrity impairments and counts the resulting bit errors. But software simulation is slow compared to the speed of the physical system, a problem that is compounded both by the need to target extremely low post-FEC BERs and by the use of DFE and MLSD techniques, which are more computationally complex.

A typical analysis holds the electrical link’s BER at a constant level, while the optical link’s BER is swept to find the level at which the overall system’s codeword error ratio (CER) is sufficiently low. To meet the Ethernet standard’s specification, this target CER level should be 1.45×10⁻¹¹. Obtaining enough data for analysis using only software simulation can take days, or even weeks. This delay is prohibitive in a situation where a development team needs to try out different protocol and hardware-design strategies.
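A rough sense of the simulation burden follows from basic statistics. If a run of n codewords produces no codeword errors, the CER can be claimed to be below a target p with confidence c once (1 − p)^n ≤ 1 − c, or roughly n ≥ 3/p for 95% confidence. The sketch below applies this to the target CER. It assumes 10-bit Reed-Solomon symbols, and the simulator throughput passed to simulation_days is a placeholder to be replaced with a measured figure, not a value from the article.

```python
import math

CODEWORD_SYMBOLS = 544  # from the article: 544-symbol outer codeword
SYMBOL_BITS = 10        # assumption: 10-bit Reed-Solomon symbols


def codewords_needed(target_cer, confidence=0.95):
    """Error-free codewords required to claim CER < target_cer."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_cer))


def simulation_days(target_cer, bits_per_second, confidence=0.95):
    """Days of simulation at a given (assumed) throughput in bits/s."""
    codewords = codewords_needed(target_cer, confidence)
    bits = codewords * CODEWORD_SYMBOLS * SYMBOL_BITS
    return bits / bits_per_second / 86_400


if __name__ == "__main__":
    target = 1.45e-11
    print("codewords needed:", codewords_needed(target))
    # Hypothetical throughputs, for comparing relative run times only.
    for rate in (1e6, 1e10):
        print(f"{rate:.0e} bit/s -> {simulation_days(target, rate):.3g} days")
```

The required run length scales inversely with the target CER, so every factor gained in simulator throughput translates directly into a proportionally shorter run.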

Another possibility is to use a statistical analysis tool to predict a system’s post-FEC BER without having to run a long time-domain simulation. However, there are currently no statistical methods available that can accurately model the architectures considered for the 200 Gbps Ethernet standard. The equalization methods and symbol interleaving techniques proposed for the standard introduce too much complexity to statistical modelling.

Evaluation of hardware for FEC analysis

As the overhead of running software on even high-performance processors represents the biggest bottleneck, a viable solution is to use programmable hardware as the simulation engine. This provides the ability to evaluate algorithms and changing channel conditions in far less time. It also lets developers try out ideas and implement them quickly by reconfiguring the hardware platform.

The capacity of today’s field-programmable gate arrays (FPGAs) allows many parallel instances of a complete simulation to run on a single device, including a built-in processor to manage the dataflows. A platform for doing so is described in ‘An FPGA-Accelerated Platform for Post-FEC BER Analysis of 200 Gb/s Wireline Systems’.

The FPGA model need not include resource-intensive Reed-Solomon encoders and decoders. Instead, the platform can use a checker to count the FEC-symbol errors in each Reed-Solomon codeword. If the total number of FEC-symbol errors in a codeword exceeds 15, then a codeword error can be registered without actually performing the decoding. To allow high flexibility in modelling different setups, the hardware platform can be parameterized. Moreover, for emerging applications, different constituent linear block codes can be substituted.
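The checker idea can be expressed in a few lines of software. This is a minimal sketch of the counting rule described above, not the platform's hardware implementation; it assumes 10-bit FEC symbols and takes as input the per-bit error flags for one codeword (transmitted bits XOR received bits). The function names are illustrative.

```python
SYMBOL_BITS = 10        # assumption: 10-bit Reed-Solomon symbols
CODEWORD_SYMBOLS = 544  # from the article
MAX_CORRECTABLE = 15    # from the article


def count_symbol_errors(error_bits):
    """Count FEC symbols that contain at least one bit error."""
    assert len(error_bits) == CODEWORD_SYMBOLS * SYMBOL_BITS
    return sum(
        1
        for start in range(0, len(error_bits), SYMBOL_BITS)
        if any(error_bits[start:start + SYMBOL_BITS])
    )


def is_codeword_error(error_bits):
    """Register a codeword error without running a Reed-Solomon decoder."""
    return count_symbol_errors(error_bits) > MAX_CORRECTABLE


if __name__ == "__main__":
    flags = [0] * (CODEWORD_SYMBOLS * SYMBOL_BITS)
    for symbol in range(16):              # corrupt one bit in 16 symbols
        flags[symbol * SYMBOL_BITS] = 1
    print(count_symbol_errors(flags), is_codeword_error(flags))
```

Because a symbol counts as errored whether one or all of its bits are wrong, a dense burst confined to a few symbols is far less harmful to the outer code than the same number of bit errors scattered across many symbols.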

A key challenge in performing accurate time-domain simulations is the generation of random noise. For example, additive white Gaussian noise (AWGN) is commonly used to evaluate FEC performance, which suits statistical techniques. But amplifiers tend to colour the noise spectrum. By passing the AWGN through a finite impulse response filter, the model can better represent this coloured noise. Noise generation is one of the most computationally intensive operations in time-domain simulations, and can be significantly accelerated on an FPGA platform.
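The noise-colouring step can be sketched in software as follows. The filter taps here are arbitrary illustrative values, not a model of any particular amplifier, and are chosen with unit energy so that the noise variance is preserved. The lag-1 autocorrelation is used as a simple indicator of colouring.

```python
import random


def white_noise(n_samples, sigma=1.0, seed=11):
    """Generate white Gaussian noise samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


def fir_filter(samples, taps):
    """Apply a direct-form FIR filter with zero initial state."""
    history = [0.0] * len(taps)
    output = []
    for sample in samples:
        history = [sample] + history[:-1]
        output.append(sum(t * h for t, h in zip(taps, history)))
    return output


def autocorrelation(x, lag):
    """Normalised sample autocorrelation of x at the given lag."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    n = len(x) - lag
    covariance = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n)) / n
    return covariance / variance


if __name__ == "__main__":
    taps = [0.8, 0.6]  # hypothetical taps; 0.8**2 + 0.6**2 = 1
    white = white_noise(50_000)
    coloured = fir_filter(white, taps)
    print("white lag-1 autocorrelation:   ", autocorrelation(white, 1))
    print("coloured lag-1 autocorrelation:", autocorrelation(coloured, 1))
```

White noise shows a lag-1 autocorrelation near zero, while the filtered noise is clearly correlated from sample to sample. A receiver model driven by this noise therefore sees noise events that are no longer independent.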

As in software, hardware channel emulation can trade off simulation accuracy against speed. Accurately capturing noise statistics and the ISI of channels with a long response requires more logic. Thus, simpler modelling allows more parallel instances of a channel simulation to run on any given FPGA platform. For example, using a simple AWGN channel model, an FPGA might host 200 parallel emulations, whereas a more complex channel with a soft-decision inner FEC decoding scheme might reduce the number of cores to eight. Nevertheless, throughputs at least four orders of magnitude higher than software-only simulation are readily achievable.

Hardware simulation platforms can be scaled using multiple FPGAs in parallel. For example, using dozens of FPGAs, CER levels can be validated to the Ethernet standard in less than a day. This significant speed improvement over existing simulation methods makes it possible to explore new and complex FEC architectures with high accuracy and supports the creation of reliable optical connectivity at 200 Gbps and beyond.

In addition to discussing the emerging demands of FEC, Alphawave Semi will demonstrate products and IP for mid-distance transmission over PAM4 optical connections at OFC in the Exhibition Hall, booth #5645.


Bio:

Tony Chan Carusone is CTO at leading interconnect specialist Alphawave Semi, which develops IP, chiplets, custom silicon and standard connectivity products to advance data center interconnect and compute.

Posted: 28 March 2025 by Tony Chan-Carusone, Alphawave Semi https://awavesemi.com/


The views expressed in this blog are those of the authors and do not necessarily reflect the views or policies of The Optical Fiber Communication Conference and Exposition (OFC)  or its sponsors.