Channel Coding for Tb/s Communications
Norbert Wehn; MPSoC 2019
Norbert Wehn gave a keynote at the 19th International Forum on MPSoC, titled “Channel Coding for Tb/s Communications”.


Turbo codes for Tb/s communications: code design and hardware architecture
Ronald Garzon-Bohorquez, Charbel Abdel Nour, Stefan Weithoffer, Catherine Douillard, Norbert Wehn; OWHTC


Abstract: A presentation on "Channel coding for Tb/s wireless communications: insights into code design and implementation", which was given at the International Symposium on Ubiquitous Networking - UNet'19 in Limoges, France



Non-binary turbo codes: design, simplified decoding and comparison with SoA codes
Klaimi, Rami; Abdel Nour, Charbel; Douillard, Catherine
Abstract: A presentation on "Non-binary turbo codes: design, simplified decoding and comparison with SoA codes", which was given at the GdR ISIS workshop in Paris, France





Abstract: The continuous demand for higher throughput in communication systems, in addition to higher spectral efficiency and lowest signal processing latencies, leads to throughput requirements for the digital baseband signal processing far beyond 100 Gbit/s. This is one to two orders of magnitude higher than the tens of Gbit/s targeted in the 5G standardization. At the same time, advances in silicon technology due to shrinking feature sizes and increased performance parameters alone won’t provide the necessary gain, especially in energy efficiency for wireless transceivers, which have tightly constrained power and energy budgets. The focus of this talk lies on channel coding, which is a major source of complexity in digital baseband processing. We will highlight implementation challenges for the most advanced channel coding techniques, i.e. Turbo codes, LDPC codes and Polar codes, and discuss approaches to tackle these challenges.



Abstract: Presentation on Polar Codes for Terabit/s Data Rates

D1.2 "B5G Wireless Tb/s FEC KPI Requirements and Technology Gap Analysis" [March 2018]
This report determines the FEC performance requirement set for the EPIC project and wireless Tb/s use-cases in general. This report sets the performance targets for the FEC development work in the rest of the project.

D3.3 Report on link-level simulation performance final results [September 2020]
Report on link-level simulation performance final results and on communication performance comparison between the different families of FEC codes under study.

D4.4 FPGA Verification Report [September 2020]
This report will describe FPGA verification of selected encoders/decoders that perform best and are likely to become commercial FEC IPs. It will report a description of the simulation chain, along with implementation results for HW complexity and BER performance.

D5.3 Beyond 5G Forward Error Correction (FEC) Workshop [August 2019]
This deliverable includes the organization and execution of scientific workshops and special sessions on next generation FEC research within leading conferences.

The EPIC project uses Zenodo as its open research data repository, in order to grant Open Access to scientific publications. Check out our Zenodo community EPIC!

2020

A 506 Gbit/s Polar Successive Cancellation List Decoder with CRC PIMRC 2020
Kestel, C.; Johannsen, L.; Griebel, O.; Jimenez, J.; Vogt, T.; Lehnigk-Emden, T.; Wehn, N.
Abstract: Polar codes have recently attracted significant attention due to their excellent error-correction capabilities. However, efficient decoding of Polar codes for high throughput is very challenging. Beyond 5G, data rates towards 1 Tbit/s are expected. Low complexity decoding algorithms like Successive Cancellation (SC) decoding enable such high throughput but suffer in error-correction performance. Polar Successive Cancellation List (SCL) decoders, with and without Cyclic Redundancy Check (CRC), exhibit a much better error-correction performance but imply higher implementation cost. In this paper we investigate in depth and quantify various trade-offs of these decoding algorithms with respect to error-correction capability and implementation costs in terms of area, throughput and energy efficiency in a 28 nm CMOS FD-SOI technology. We present a framework that automatically generates decoder architectures for throughputs beyond 100 Gbit/s. This framework includes various architectural optimizations for SCL decoders that go beyond the state of the art. We demonstrate a 506 Gbit/s SCL decoder with CRC that was generated by this framework.

Low-complexity Computational Units for the Local-SOVA Decoding Algorithm PIMRC 2020
Weithoffer, S.; Klaimi, R.; Nour, C.; Wehn, N.; Douillard, C.
Abstract: Recently the Local-SOVA algorithm was suggested as an alternative to the max-Log MAP algorithm commonly used for decoding Turbo codes. In this work, we introduce new complexity reductions to the Local-SOVA algorithm, which allow an efficient implementation at a marginal BER penalty of 0.05 dB. Furthermore, we present the first hardware architectures for the computational units of the Local-SOVA algorithm, namely for the add-compare select unit and the soft output unit, targeting radix orders 2, 4 and 8. We provide place & route implementation results for 28 nm technology and demonstrate an area reduction of 46 to 75% for the soft output unit for radix orders 4 in comparison with the respective max-Log MAP soft output unit. These area reductions compensate for the overhead in the add compare select unit, resulting in an overall area saving of around 27 to 46% compared to the max-Log-MAP. These savings simplify the design and implementation of high throughput Turbo decoders.

Advanced Hardware Architectures for Turbo Code Decoding Beyond 100 Gb/s WCNC 2020
Weithoffer, S.; Griebel, O.; Klaimi, R.; Nour, C.; Wehn, N.
Abstract: In this paper, we present two new hardware architectures for Turbo Code decoding that combine functional, spatial and iteration parallelism. Our first architecture is the first fully pipelined iteration unrolled architecture that supports multiple frame sizes. This frame flexibility is achieved by providing a set of interleavers designed to achieve a hardware implementation with a reduced routing overhead. The second architecture efficiently utilizes the dynamics of the error rate distribution for different decoding iterations and comprises two stages: a fully pipelined iteration unrolled decoder stage applied for a pre-determined number of iterations, and a second stage with an iterative afterburner-decoder activated only for frames not successfully decoded by the first stage. We give post place & route results for implementations of both architectures for a maximum frame size of K = 128 and demonstrate a throughput of 102.4 Gb/s in 28 nm FDSOI technology. With an area efficiency of 6.19 and 7.15 Gb/s/mm² our implementations clearly outperform the state of the art.
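As a rough illustration of the throughput figure quoted in this abstract: once a fully pipelined, iteration unrolled decoder is full, it can in principle emit one decoded frame per clock cycle, so throughput is approximately frame size times clock frequency. The sketch below assumes that one-frame-per-cycle model (an assumption made here for illustration, not a statement from the abstract) and back-computes the clock frequency that K = 128 and 102.4 Gb/s would imply. The function names are hypothetical.

```python
def pipelined_throughput_bps(frame_bits: int, clock_hz: float) -> float:
    """Throughput of a pipeline that emits one frame of frame_bits bits per cycle."""
    return frame_bits * clock_hz


def implied_clock_hz(frame_bits: int, throughput_bps: float) -> float:
    """Clock frequency implied by a throughput under the same assumption."""
    return throughput_bps / frame_bits


K = 128                # maximum frame size quoted in the abstract
THROUGHPUT = 102.4e9   # 102.4 Gb/s quoted in the abstract

f_clk = implied_clock_hz(K, THROUGHPUT)
print(f"implied clock: {f_clk / 1e6:.0f} MHz")
```

Under this model the frame size, not the number of unrolled iterations, sets the throughput; the iteration count instead shows up in pipeline depth and area.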

A Low-Complexity Dual-Trellis Decoding Algorithm for High-Rate Convolutional Codes WCNC 2020
Vinh Hoang Son Le, Charbel Abdel Nour, Catherine Douillard and Emmanuel Boutillon
Abstract: Decoding using the dual trellis is considered as a potential technique to increase the throughput of soft-input soft-output decoders for high coding rate convolutional codes. However, the dual Log-MAP algorithm suffers from a high decoding complexity. More specifically, the source of complexity comes from the soft-output unit, which has to handle a high number of extrinsic values in parallel. In this paper, we present a new low-complexity sub-optimal decoding algorithm using the dual trellis, namely the dual Max-Log-MAP algorithm, suited for high coding rate convolutional codes. A complexity analysis and simulation results are provided to compare the dual Max-Log-MAP and the dual Log-MAP algorithms. Despite a minor loss of about 0.2 dB in performance, the dual Max-Log-MAP algorithm significantly reduces the decoder complexity and makes it a first-choice algorithm for high-throughput high-rate decoding of convolutional and turbo codes.

Fully Pipelined Iteration Unrolled Decoders - The Road to Tb/s Turbo Decoding ICASSP 2020
Weithoffer, S.; Klaimi, R.; Nour, C.; Wehn, N.; Douillard, C.
Abstract: Turbo codes are a well-known code class used for example in the LTE mobile communications standard. They provide built-in rate flexibility and a low-complexity and fast encoding. However, the serial nature of their decoding algorithm makes high-throughput hardware implementations difficult. In this paper, we present recent findings on the implementation of ultra-high throughput Turbo decoders. We illustrate how functional parallelization at the iteration level can achieve a throughput of several hundred Gb/s in 28 nm technology. Our results show that, by spatially parallelizing the half-iteration stages of fully pipelined iteration unrolled decoders into X-windows of size 32, an area reduction of 40% can be achieved. We further evaluate the area savings through further reduction of the X-window size. Lastly, we show how the area complexity and the throughput of the fully pipelined iteration unrolled architecture scale to larger frame sizes. We consider the same target bit error rate performance for all frame sizes and highlight the direct correlation to area consumption.

Union Bound Evaluation for Non-Binary Turbo Coded Modulations IEEE Communications Letters, June 2020
Rami Klaimi, Charbel Abdel Nour, Catherine Douillard, and Joumana Farah
Abstract: A method to compute the truncated union bound of non-binary turbo codes mapped to high order modulations is proposed in this letter. It calls for the estimate of the truncated Euclidean distance spectrum. To this end, we identify the error events that limit the Euclidean distance of non-binary turbo codes based on memory-1 convolutional codes defined over GF(q), q > 2. Application examples are elaborated for codes over GF(64). The resulting bounds are found to accurately predict the asymptotic performance of the coded modulation scheme.

2019

Abstract: By using a Majority Logic (MJL) aided Successive Cancellation (SC) decoding algorithm, an architecture and a specific implementation for high throughput polar coding are proposed. The SC-MJL algorithm exploits the low complexity nature of SC decoding and the low latency property of MJL. In order to reduce the complexity of SC-MJL decoding, an adaptive quantization scheme is developed within a 1-5 bit range for the internal log-likelihood ratios (LLRs). The bit allocation is based on maximizing the mutual information between the input and output LLRs of the quantizer. This scheme causes a negligible (< 0.1 dB) performance loss when the code block length is N = 1024 and the number of information bits is K = 854. The decoder is implemented in 45 nm ASIC technology using a deeply pipelined, unrolled hardware architecture with register balancing. The pipeline depth is kept at 40 clock cycles in ASIC by merging consecutive decoding stages implemented as combinational logic. The ASIC synthesis results show that the SC-MJL decoder has 427 Gb/s throughput in 45 nm technology. When we scale the implementation results to the 7 nm technology node, the throughput reaches 1 Tb/s with under 10 mm² chip area and 0.37 W power dissipation.
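The figures in this abstract can be combined into two derived quantities that are not stated explicitly: the code rate K/N and the energy per decoded bit at the scaled 7 nm operating point. The sketch below is plain arithmetic on the quoted numbers; the function names are hypothetical, and it assumes the 1 Tb/s throughput and 0.37 W power refer to the same operating point.

```python
def code_rate(info_bits: int, block_length: int) -> float:
    """Code rate R = K / N."""
    return info_bits / block_length


def energy_per_bit_pj(power_w: float, throughput_bps: float) -> float:
    """Energy per bit in picojoules: power divided by throughput."""
    return power_w / throughput_bps * 1e12


N, K = 1024, 854            # block length and information bits from the abstract
rate = code_rate(K, N)

# Scaled 7 nm figures quoted in the abstract (assumed to belong together).
energy = energy_per_bit_pj(0.37, 1e12)

print(f"code rate: {rate:.3f}")
print(f"energy per bit: {energy:.2f} pJ/bit")
```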

Revisiting the Max-Log-Map algorithm with SOVA update rules: new simplifications for high-radix SISO decoders
Vinh Hoang Son Le, Charbel Abdel Nour, Emmanuel Boutillon and Catherine Douillard; IEEE Transactions on Communications, April 2020
Abstract: This paper proposes a new soft-input soft-output decoding algorithm particularly suited for low-complexity high-radix turbo decoding, called local soft-output Viterbi algorithm (local SOVA). The local SOVA uses the forward and backward state metric recursions just as the conventional Max-Log MAP (MLM) algorithm does, and produces soft outputs using the SOVA update rules. The proposed local SOVA exhibits a lower computational complexity than the MLM algorithm when employed for high-radix decoding in order to increase throughput, while having the same error correction performance even when used in a turbo decoding process. Furthermore, with some simplifications, it offers various trade-offs between error correction performance and computational complexity. In particular, employing the local SOVA algorithm for radix-8 decoding of the LTE turbo code reduces the complexity by 33% without any performance degradation and by 36% with a slight penalty of only 0.05 dB. Moreover, the local SOVA algorithm opens the door for the practical implementation of turbo decoders for radix-16 and higher.
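For orientation, the Max-Log MAP algorithm that serves as the baseline here differs from the exact Log-MAP in how log(e^a + e^b) is evaluated: exactly, via the Jacobian logarithm (often written max*), or approximately, by keeping only the maximum. The sketch below shows only this generic approximation; it is not the local SOVA update rule from the paper, and the function names are illustrative.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact log(exp(a) + exp(b)) via the Jacobian logarithm (Log-MAP)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-Log approximation: drop the correction term."""
    return max(a, b)


# The dropped correction term is at most log(2), reached when a == b,
# and it vanishes as the two metrics move apart.
for a, b in [(0.0, 0.0), (1.0, 3.0), (0.0, 10.0)]:
    print(a, b, max_star(a, b) - max_log(a, b))
```

Dropping the correction term removes the lookup table or logarithm from every metric update, which is why max-based variants are attractive for high-throughput hardware.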

Abstract: Puncturing a low-rate convolutional code to generate a high-rate code has some drawbacks in terms of hardware implementation. In fact, a Maximum A-Posteriori (MAP) decoder based on the original trellis will then have a decoding throughput close to the decoding throughput of the mother non-punctured code. A solution to overcome this limitation is to perform MAP decoding on the dual trellis of a high-rate equivalent convolutional code. In the literature, dual trellis construction is only reported for specific punctured codes with rate k/(k + 1). In this paper, we propose a multi-step method to construct the equivalent dual code defined by the corresponding dual trellis for any punctured code. First, the equivalent non-systematic generator matrix of the high-rate punctured code is derived. Then, the reciprocal parity-check matrix for the construction of the dual trellis is deduced. As a result, we show that the dual-MAP algorithm applied on the newly constructed dual trellis yields the same performance as the original MAP algorithm while allowing the decoder to achieve a higher throughput. When applied to turbo codes, this method enables highly efficient implementations of high-throughput high-rate turbo decoders.
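To make the starting point of this abstract concrete, the sketch below punctures the output of a rate-1/2 systematic mother code to rate k/(k + 1) by keeping every systematic bit and one parity bit out of every k. The puncturing pattern and the function names are assumptions chosen for illustration; the paper's method applies to arbitrary punctured codes, and the dual trellis construction itself is not shown.

```python
from fractions import Fraction


def puncture(systematic: list[int], parity: list[int], k: int) -> list[int]:
    """Keep all systematic bits and the parity bit at every k-th position."""
    if len(systematic) != len(parity):
        raise ValueError("systematic and parity streams must have equal length")
    out = []
    for i, (s, p) in enumerate(zip(systematic, parity)):
        out.append(s)
        if i % k == 0:      # illustrative pattern: parity kept once per k bits
            out.append(p)
    return out


def punctured_rate(k: int) -> Fraction:
    """Resulting code rate: k information bits per k + 1 transmitted bits."""
    return Fraction(k, k + 1)


sent = puncture([1, 0, 1, 1], [0, 1, 1, 0], k=4)
print(sent, punctured_rate(4))
```

A decoder working on the mother trellis still processes one trellis section per information bit regardless of how many parity bits were removed, which is the throughput limitation the abstract refers to.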

Abstract: The continuous demands for higher throughput, higher spectral efficiency, lower latencies, lower power and large scalability in communication systems impose large challenges on the baseband signal processing. In the future, throughput requirements far beyond 100 Gbit/s are expected, which is much higher than the tens of Gbit/s targeted in the 5G standardization. At the same time, advances in silicon technology due to shrinking feature sizes and increased performance parameters alone will not provide the necessary gain, especially in energy efficiency for wireless transceivers, which have tightly constrained power and energy budgets. In this talk we will focus on channel coding, which is a major source of complexity in digital baseband processing. We will give an overview and first results of the EPIC project, funded by European Union's Horizon 2020 research and innovation program, that aims to develop a new generation of Forward-Error-Correction codes in a manner that will serve as a fundamental enabler of practicable beyond 5G wireless Tb/s solutions. We will highlight implementation challenges for the most advanced channel coding techniques, i.e. Turbo codes, Low Density Parity Check (LDPC) codes and Polar codes and present decoder architectures for all three code classes that are designed for highest throughput.

2018

A Simple Relaxation Scheme for Polar Codes
Sungkwon Hong, Onur Sahin, Chunxuan Ye and Fengjun Xi; ICTC 2018
Abstract: In this paper, a simple relaxation scheme to reduce the encoding and decoding complexity of polar codes is introduced. Unlike the conventional relaxation schemes, the proposed technique relies on selecting relevant encoding/decoding nodes based on initialized relaxation attribute values and their extension to the remainder of the encoder and decoder stages. We show that the proposed relaxation scheme provides comparable BLER performance to the conventional polar codes by numerical simulations, while having significant complexity reduction.

Design of Low-Complexity Convolutional Codes over GF(q)
Rami Klaimi, Charbel Abdel Nour, Catherine Douillard and Joumana Farah; Globecom 2018
Abstract: This paper proposes a new family of recursive systematic convolutional codes, defined in the non-binary domain over different Galois fields GF(q) and intended to be used as component codes for the design of non-binary turbo codes. A general framework for the design of the best codes over different GF(q) is described. The designed codes offer better performance than the non-binary convolutional codes found in the literature. They also outperform their binary counterparts when combined with their corresponding QAM modulation or with lower order modulations.

Low-complexity decoders for non-binary turbo codes
Rami Klaimi, Charbel Abdel Nour, Catherine Douillard and Joumana Farah; ISTC 2018
Abstract: Following the increasing interest in non-binary coding schemes, turbo codes over different Galois fields have started to be considered recently. While showing improved performance when compared to their binary counterparts, the decoding complexity of this family of codes remains a main obstacle to their adoption in practical applications. In this work, a new low-complexity variant of the Min-Log-MAP algorithm is proposed. Thanks to the introduction of a bubble sorter for the different metrics used in the Min-Log-MAP decoder, the number of required computations is significantly reduced. A reduction by a factor of 6 in the number of additions and compare-select operations can be achieved with only a minor impact on error rate performance. With the use of an appropriate quantization, the resulting decoder paves the way for a future hardware implementation.

Mitigating Correlation Problems in Turbo Decoders
Ronald Garzon-Bohorquez, Rami Klaimi, Charbel Abdel Nour and Catherine Douillard; ISTC 2018
Abstract: In this paper, new interleaver design criteria for turbo codes are proposed, targeting the reduction of the correlation between component decoders. To go beyond the already known correlation girth maximization, we propose several additional criteria that limit the impact of short correlation cycles and increase code diversity. Two application examples are elaborated, targeting an 8-state binary turbo code and a non-binary turbo code defined over GF(64). The proposed design criteria are shown to improve the error correcting performance of the code, especially in the error floor region.

25 Years of Turbo Codes: From Mb/s to beyond 100 Gb/s
Stefan Weithoffer, Charbel Abdel Nour, Norbert Wehn, Catherine Douillard, Claude Berrou; ISTC 2018
Abstract: In this paper, we demonstrate how the development of parallel hardware architectures for turbo decoding can be continued to achieve a throughput of more than 100 Gb/s. A new, fully pipelined architecture shows better error correcting performance for high code rates than the fully parallel approaches known from the literature. This is demonstrated by comparing both architectures for a frame size K = 128 LTE turbo code and a frame size K = 128 turbo code with parity puncture constrained interleaving. To the best of our knowledge, an investigation of the error correcting performance at high code rates of fully parallel decoders is missing from the literature. Moreover, place & route results for a case study implementation of the new architecture in 28 nm technology show a throughput of 102.4 Gb/s and an area efficiency of 4.34 Gb/s, making it superior to reported implementations of other parallel decoder hardware architectures.

Abstract: IEEE 802.11ay is the amendment to the 802.11 standard that enables Wi-Fi devices to achieve 100 Gbps using the unlicensed mm-Wave (60 GHz) band at comparable ranges to today's commercial 60 GHz devices based on the 802.11ad standard. In this paper, we propose a full row-based layered LDPC decoder supporting all the coding rates for 802.11ay. By exploiting a property of the 802.11ay parity check matrix, multiple layers are combined into a single layer, which improves the hardware utilization and hence increases the throughput. Frame interleaved scheduling increases the throughput significantly by making best use of each pipeline stage. The decoder is synthesized in both 28 nm and 16 nm CMOS technology. The 28 nm implementation running at 600 MHz achieves a throughput of 67 Gbps for coding rate 13/16 with an average power consumption of 323 mW at 4 iterations, yielding an energy efficiency of 4.8 pJ/bit and an area efficiency of 160 Gbps/mm². For 16 nm running at 1 GHz, the decoder achieves a throughput of 114 Gbps with an average power of 319 mW at 4 iterations. This gives an improved energy efficiency of 2.8 pJ/bit and an area efficiency of 589 Gbps/mm².
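The energy efficiency figures in this abstract follow directly from the quoted power and throughput, since energy per bit is power divided by throughput. The short sketch below reproduces that arithmetic and also derives the core area implied by the quoted area efficiency. The implied areas are a back-calculation made here, not numbers from the abstract, and the function names are hypothetical.

```python
def energy_pj_per_bit(power_w: float, throughput_bps: float) -> float:
    """Energy per decoded bit in picojoules."""
    return power_w / throughput_bps * 1e12


def implied_area_mm2(throughput_gbps: float, area_eff_gbps_per_mm2: float) -> float:
    """Area implied by a throughput and an area efficiency."""
    return throughput_gbps / area_eff_gbps_per_mm2


# (node, power in W, throughput in Gbps, area efficiency in Gbps/mm^2)
designs = [("28 nm", 0.323, 67.0, 160.0), ("16 nm", 0.319, 114.0, 589.0)]

for node, power, tput, area_eff in designs:
    e = energy_pj_per_bit(power, tput * 1e9)
    a = implied_area_mm2(tput, area_eff)
    print(f"{node}: {e:.1f} pJ/bit, implied area about {a:.2f} mm^2")
```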

Abstract: The continuous demands for increased spectral efficiency, higher throughput, lower latency and lower energy in communication systems impose large challenges on the baseband processing in wireless communication. This applies in particular to channel coding (Forward Error Correction), which is a core technology component in any digital baseband. Future Beyond-5G use cases are expected to require wireless data rates in the Terabit/s range in a power envelope in the order of 1-10 Watts. In the past, progress in microelectronic silicon technology driven by Moore’s law was an enabler of large leaps in throughput, lower latency, lower power etc. However, we have reached a point where microelectronics can no longer keep pace with the increased requirements from communication systems. In addition, advanced technology nodes imply new challenges such as reliability, power density, cost etc. Thus, channel coding for Beyond-5G systems requires a real cross layer approach, covering information theory, algorithm development, parallel hardware architectures and semiconductor technology. The EPIC project addresses these challenges and aims to develop new Forward Error Correction (FEC) schemes for future Beyond-5G use cases targeting a throughput in the Tb/s range. Focus will be on the most advanced FEC schemes, i.e. Turbo codes, Low Density Parity Check (LDPC) codes and Polar codes.

Polar Code decoder exploration framework
Claus Kestel, Stefan Weithoffer, Norbert Wehn; Advances in Radio Science
Abstract: The increasing demand for fast wireless communications requires sophisticated baseband signal processing. One of the computationally intensive tasks here is advanced Forward Error Correction (FEC), especially the decoding. Finding efficient hardware implementations for sophisticated FEC decoding algorithms that fulfill throughput demands under strict implementation constraints is an active research topic due to increasing throughput, low latency, and high energy efficiency requirements. This paper focuses on the class of Polar Codes, which is currently an active research topic. We present a modular framework to automatically generate and evaluate a wide range of Polar Code decoders, with emphasis on design space exploration for efficient hardware architectures. To demonstrate the efficiency of our framework, we show a very high throughput Soft Cancellation (SCAN) Polar Code decoder that was automatically generated. This decoder is, to the best of our knowledge, the fastest SCAN Polar Code decoder published so far.

2017

Abstract: A method to design efficient puncture-constrained interleavers for turbo codes (TCs) is introduced. Resulting TCs profit from a joint optimization of puncturing pattern and interleaver to achieve an improved error rate performance. First, the puncturing pattern is selected based on the constituent code Hamming distance spectrum and on the TC extrinsic information exchange under uniform interleaving. Then, the interleaver function is defined via a layered design process taking account of several design criteria such as minimum span, correlation girth, and puncturing constraints. We show that applying interleaving with a periodic cross connection pattern that can be assimilated to a protograph improves error-correction performance when compared to the state-of-the-art TCs. An application example is elaborated and compared with the long term evolution (LTE) standard: a significant gain in performance can be observed. An additional benefit of the proposed technique resides in the important reduction of the search space for the different interleaver parameters.

Abstract: The continuing trend towards higher data rates in wireless communication systems will, in addition to a higher spectral efficiency and lowest signal processing latencies, lead to throughput requirements for the digital baseband signal processing beyond 100 Gbit/s, which is at least one order of magnitude higher than the tens of Gbit/s targeted in the 5G standardization. At the same time, advances in silicon technology due to shrinking feature sizes and increased performance parameters alone won’t provide the necessary gain, especially in energy efficiency for wireless transceivers, which have tightly constrained power and energy budgets. In this paper, we highlight the challenges for wireless digital baseband signal processing beyond 100 Gbit/s and the limitations of today’s architectures. Our focus lies on the channel decoding and MIMO detection, which are major sources of complexity in digital baseband signal processing. We discuss techniques on algorithmic and architectural level, which aim to close this gap. For the first time we show Turbo-Code decoding techniques towards 100 Gbit/s and a complete MIMO receiver beyond 100 Gbit/s in 28 nm technology.
