Low-Complexity Non-Iterative Soft-Decision BCH Decoder Architecture for WBAN Applications

Boseok Jung, Taesung Kim, and Hanho Lee

Abstract—This paper presents a low-complexity non-iterative soft-decision Bose-Chaudhuri-Hocquenghem (SD-BCH) decoder architecture and design technique for wireless body area networks (WBANs). A SD-BCH decoder with test syndrome computation, a syndrome calculator, Chien search and metric check, and error location decision is proposed. The proposed SD-BCH decoder not only uses test syndromes, but also does not have an iteration process. The proposed SD-BCH decoder provides a 0.75~1 dB coding gain compared to a hard-decision BCH (HD-BCH) decoder, and almost similar coding gain compared to a conventional SD-BCH decoder. The proposed SD-BCH (63, 51) decoder was designed and implemented using 90-nm CMOS standard cell technology. Synthesis results show that the proposed non-iterative SD-BCH decoder using a serial structure can lead to a 75% reduction in hardware complexity and a clock speed 3.8 times faster than a conventional SD-BCH decoder.

Index Terms—BCH codes, soft-decision decoding, decoder, low-complexity, WBAN

I. INTRODUCTION

Recently, wireless body area network (WBAN) technology has drawn wide attention owing to the demand for short-range wireless communication and the advent of energy-efficient VLSI technology. The WBAN [1] is a special purpose sensor network designed to operate autonomously to connect various medical sensors and appliances, located inside and outside of a human body. It consists of small, intelligent devices attached on or implanted in the body, which are capable of establishing a wireless communication link.

Bose-Chaudhuri-Hocquenghem (BCH) codes are important multiple-error-correcting cyclic codes that are widely used in communications and storage systems [2]-[4]. In the WBAN [1], the BCH (63, 51) code and its shortened (31, 19) code are adopted to enhance transmission reliability under different channel conditions. Either hard-decision BCH (HD-BCH) decoders or soft-decision BCH (SD-BCH) decoders can be adopted at the receiver. In general, a SD-BCH decoder can achieve better error performance compared to a HD-BCH decoder, and the improvement in bit error rate (BER) performance of the SD-BCH decoder can translate into power savings at the transmitter, given the same data link requirements.

In general, there are several methods for SD-BCH codes. Maximum likelihood decoding (MLD) [5], generalized minimum distance (GMD) [6], sliding encoding-window (SEW) [7], and Chase algorithms [8] were developed to produce a list of candidate codeword. In several soft-decoding methods, a Chase algorithm can be used to correct errors with a hard-decision kernel. Generally, a SD-BCH decoder with a Chase algorithm has an iteration process and also requires a test pattern generator.

Yang et al. [9] found that a SD-BCH decoder for energy-constrained WBANs provided a 1dB coding gain, compared to a HD-BCH decoder. In order to reduce the energy dissipation and area, Yang et al. [9] presented...
early termination (ET), probabilistic sorting and pass-
transistor logic–based Chien search. An early termination
scheme was used to reduce the number of unnecessary
test patterns. A probabilistic sorting scheme is utilized to
reduce the architectural complexity of the sorting circuit.
For the hard-decision kernel, Yang et al. used a Peterson
algorithm, which uses the determinant value.

In this paper, we propose a non-iterative SD-BCH
decoding algorithm and efficient decoder architecture
without iteration. Specifically, the SD-BCH decoder has
test syndrome computation (TSC), which eliminates the
test pattern generator, a hard-decision kernel based on a
modified step-by-step (m-SBS) algorithm [4], and error
location decision (ELD). Although the SD-BCH decoder
generally uses an iteration method, the proposed SD-
BCH decoder does not require iteration. Moreover, in
order to reduce hardware complexity, optimized test
syndrome computation, a syndrome factor calculator, and
error location decision are also presented.

The rest of this paper is organized as follows. In
Section II, SD-BCH decoding algorithm is briefly
described. Section III describes the proposed non-
iterative SD-BCH decoding algorithm and provides
analysis of a BER simulation. Section IV presents the
proposed SD-BCH decoder architecture and design
techniques. In Section V, the results and a performance
comparison are presented. Finally, a conclusion is
provided in Section VI.

II. OVERVIEW OF SOFT-DECISION BCH
DECODING ALGORITHM

In this section, we introduce the basics of SD-BCH (n,
k, t) decoding algorithm, n and k denote the codeword
length and the information length, respectively, and t
denotes the error correcting capability. Type II Chase-
based SD-BCH decoding algorithm has been discussed in
[8, 9]. The Chase-II algorithm is a sub-optimum soft-
decision algorithm that uses an error correction only
hard-decision decoding (HDD) as the kernel.

The procedure for the Chase-II algorithm is described as
follows:
1) Find the locations of the least reliable bits (LRBs),
where \( p = \lceil d_{\text{min}}/2 \rceil \), and \( d_{\text{min}} \) is the minimum Hamming
distance of the codeword.
2) Generate test patterns by considering all
combinations of the LRBs.
3) Decode each test pattern by using the HDD kernel.
If this test pattern is decoded successfully using the HDD
kernel, the decoded word is regarded as a candidate
codeword.
4) Evaluate the Euclidean distance for each candidate
 codeword in the list and select the one with the smallest
Euclidean distance as the best decision codeword.

In fact, it is unnecessary for the HDD kernel in the
Chase algorithm to process all test patterns since
redundant test patterns can be eliminated to save energy.

In [9, 10], the authors proposed a test pattern generator
to correct more errors than HD-BCH decoder. This idea
comes from adding LRBs in a codeword, which is passed
a hard-decision unit. The procedure for the conventional
SD-BCH decoding using test-pattern generator is
described as follows:

1) Generate a candidate codeword from a received
codeword \( r(x) = r_0 x^0 + r_1 x^1 + r_2 x^2 + \ldots + r_{n-1} x^{n-1} \),
where codeword length is \( n \).

2) Assume that LRBs locations are \( x^0 \) and \( x^t \), test
pattern polynomials, which generated in test pattern
generator, are:

\[
lp_1(x) = r_0 x^0 + r_1 x^1 + r_2 x^2 + \ldots + r_{n-1} x^{n-1} \quad \text{for test pattern 1},
\]

\[
lp_2(x) = r_0 x^0 + r_1 x^1 + r_2 x^2 + \ldots + r_{n-1} x^{n-1} + r_{e_0} x^0 \quad \text{for test pattern 2},
\]

\[
lp_3(x) = r_0 x^0 + r_1 x^1 + r_2 x^2 + \ldots + r_{n-1} x^{n-1} + r_{e_0} x^0 + r_{e_1} x^t \quad \text{for test pattern 3},
\]

\[
lp_4(x) = r_0 x^0 + r_1 x^1 + r_2 x^2 + \ldots + r_{n-1} x^{n-1} + r_{e_1} x^t \quad \text{for test pattern 4},
\]

where \( p = 2, r_i(0 \leq i \leq n-1), r_{e_0}(0 \leq i \leq n-1), r_{e_0} \) and \( r_{e_1} \) is in
\( GF(2) \). By definition, the error pattern is \( e(x) = r(x) - v(x) = e_{j_1} x^{j_1} + e_{j_2} x^{j_2} + e_{j_3} x^{j_3} \ldots + e_{j_v} x^{j_v} \),
where \( 0 \leq j_1 < j_2 < \ldots < j_v \leq n-1 \) and contained errors is \( v \). So, from
equation \( e(x) \), error patterns are described as follows:

\[
e_{p_1} (x) = x^{j_1} + x^{j_2} + x^{j_3} + \ldots + x^{j_v} \quad \text{for error pattern 1},
\]

\[
e_{p_2} (x) = x^{j_1} + x^{j_2} + x^{j_3} + \ldots + x^{j_v} + x^0 \quad \text{for error pattern 2},
\]

\[
e_{p_3} (x) = x^{j_1} + x^{j_2} + x^{j_3} + \ldots + x^{j_v} + x^0 + x^t \quad \text{for error pattern 3},
\]

\[
e_{p_4} (x) = x^{j_1} + x^{j_2} + x^{j_3} + \ldots + x^{j_v} + x^t \quad \text{for error pattern 4},
\]
where \( 0 \leq t0 \leq n-1 \) and \( 0 \leq tl \leq n-1 \). Syndromes can be calculated using the error patterns. To calculate syndrome \( S_x \) substitute for \( x \) in \( \epsilon(x) \). Thus syndrome is 
\[
S_i = (a^{i1})^j + (a^{i2})^j + ... + (a^{in})^j
\]
Syndromes of test pattern can be calculated as

\[
\begin{align*}
S_{tp1} &= (a^{i1})^j + (a^{i2})^j + ... + (a^{in})^j, \\
S_{tp2} &= (a^{i1})^j + (a^{i2})^j + ... + (a^{in})^j, \\
S_{tp3} &= (a^{i1})^j + (a^{i2})^j + ... + (a^{in})^j + (a^{i1})^j, \\
S_{tp4} &= (a^{i1})^j + (a^{i2})^j + ... + (a^{in})^j + (a^{i1})^j,
\end{align*}
\]

where \( 1 \leq i \leq 2t; a^{i1}, a^{i2}, a^{i3}, ..., a^{in} \) are elements of \( GF(2^n) \) and unknown value, but \( a^{i0} \) and \( a^{1h} \) are already known value. Syndromes can be calculated whenever test pattern is generated.

### III. PROPOSED NON-ITERATIVE SOFT-DECISION BCH DECODING ALGORITHM

Suppose an HD-BCH \((n, k, t)\) code has an \( n \)-bit codeword, a \( k \)-bit message and \( t \) correctable bits using algebraic decoders, operation under a Galois-field \( GF(2^n) \), and that \( a \) is the primitive root over the primitive polynomial \( f(x) \). Otherwise, with a SD-BCH code based on a Chase algorithm [8], it is possible that the number of corrected errors is \( t + p \), which determines the position of the \( p \) least reliable bits (LRBs), where \( p \leq d_{min}/2 \). This requires a test pattern generator and an error-correction-only hard-decision decoder (HDD) as the kernel. Chase algorithm–based soft-decision decoders (SDDs) evaluate the soft-decision metric using the HDD kernel. However, in case of BCH (63, 51, 2) code and \( p = 2 \), we only need a test syndrome \( a^{p0}, a^{p1} \) without a test pattern generator. The major steps for the proposed non-iterative SD-BCH (63, 51, 2) decoding are described as follows:

1. Separate the hard-decision value and magnitude in the log likelihood ratio (LLR) input bits.
2. Compute syndromes \( S_i, S_j \) from the syndrome calculator (SC) and test syndromes \( a^{i1}, a^{i2} \) from TSC.
3. Calculate syndrome factors \( R_{hd}, A_{hd}, B_{hd}, R_{tp2}, A_{tp2}, B_{tp2}, R_{tp3}, A_{tp3}, B_{tp3}, R_{tp4}, A_{tp4}, B_{tp4} \) with syndromes and test syndromes.
4. Find the \( H \) values from a Chien search (CS), and calculate the metric value \( M \) from the \( H \) values.
5. Decide the decision codeword \( d \) from the controller and successfully correct the codeword.

In addition, the proposed non-iterative SD-BCH decoding algorithm is described in detail as follows:

#### Proposed Non-Iterative Soft-Decision BCH (63, 51, 2) Decoding

1. **Separate hard-decision and magnitude**
   - **Input:** \( r_{tp1}, r_{tp2}, r_{tp3}, r_{tp4} \) (i = n-1, ..., 0)
   - \( r_{min} = r_{tp0} \), \( m = \{1\} \)
2. ** Syndrome Calculator and Test Syndrome Computation**
   - **Syndrome Calculator:**
     - \( S_i = \sum r_{tp1} \cdot a^i, S_j = \sum r_{tp3} \cdot a^{i1} \) (i = n-1, ..., 0)
   - **Test Syndrome Computation:**
     - **Initial:** \( m_{min} = \{1,1,1\}, m_0 = \{1,1,1\} \)
     - for \( i = n-1 \) until \( i = (n-1)/2 + 1 \) begin
       - if \( m_i < m_{min} \) then \( m_i = m_i \), \( a^i = a^i \)
     - end
     - for \( i = (n-1)/2 \) until \( i = 0 \) begin
       - if \( m_i < m_{min} \) then \( m_i = m_i, a^i = a^i \)
     - end
3. ** Syndrome Factor Calculator**
   - **Input:** \( S_i, S_j, a^{i0}, a^{i1} \)
   - **Hard-decision:**
     - \( R_{hd} = S_i^1 + S_j, A_{hd} = S_i^1, B_{hd} = S_i \)
   - **Test Patterns:**
     - \( A_{tp1} = R_{hd} + A_{hd} (a^{i0})^j + B_{hd} a^{0} \)
     - \( A_{tp2} = R_{hd} + A_{hd} (a^{i0})^j + B_{hd} a^{0} \)
     - \( A_{tp3} = R_{hd} + A_{hd} (a^{i0})^j + B_{hd} a^{0} \)
     - \( A_{tp4} = R_{hd} + A_{hd} (a^{i0})^j + B_{hd} a^{0} \)
4. **Chien Search and Metric Check**
   - **Input:** \( R_i, A_j, B_j \) (j = HD, tp2, tp3, tp4)
   - **Chien Search:**
     - for \( j = HD \) until \( tp4 \) begin
       - if \( R_i = A_j, a^i + B_j, a^{i1} \) then \( H_{ij} = 1 \)
       - else \( H_{ij} = 0 \) (i = n-1, ..., 0)
     - end
   - **Metric Check:**
     - for \( j = HD \) until \( tp4 \) begin
       - \( M_i = \sum H_{ij} \) (i = n-1, ..., 0)
     - end
5. **Error Location Decision**
   - **Input:** \( H_{hd}, H_{tp2}, H_{tp3}, H_{tp4}, a^{i0}, a^{i1} \)
   - for \( i = n-1 \) until \( i = 0 \) begin
     - if \( (a^{i0} + a^{i1}) = 0 \) \( H_{li} = 1 \) else \( H_{li} = 0 \)
   - if \( (ctr = 0) \) \( d_i = H_{hd} \)
   - else if \( (ctr = 1) \) \( d_i = H_{hd} + H_{tp2} \)
   - else if \( (ctr = 2) \) \( d_i = H_{hd} + H_{tp3} + H_{tp4} \)
   - else \( d_i = H_{tp2} + H_{tp3} \)
   - end

Initially, we can separate hard-decision \( r_{hd,i} \) with magnitude \( |r_i| \) (i = n-1, ..., 0) from the received LLR. The hard-decision \( r_{hd,i} \) which is one bit, is a most significant bit of the LLR and is fed into the hard-decision decoder.
decision kernel to calculate the syndromes. On the other hand, magnitude $|r_j|$ is used to find a test syndrome value in test syndrome computation.

Yang et al. [9] demonstrated a probabilistic sorting scheme to have a correct second minimum value with a high probability. The second minimum value is generated from the last two stages of the tree. On the other hand, in the proposed method, two minimum values in serial processing can be searched for the test syndrome value to divide the total clock cycle into halves. Therefore, the syndrome factors are calculated to be

\[ \min_0 \text{ and } \min_1 \text{ store the minimum value} \]

magnitude $|r_j|$ to compare with the next magnitude value. Initially, $\min_0$ and $\min_1$ are both $\{1, 1, 1\}$ to be compared with $|r_j|$, where $i = n - 1, \ldots, 0$.

After computing syndromes and test syndromes, the values of $A$, $B$, and $R$ as syndrome factors are calculated by the syndrome factor calculator. In the m-SBS algorithm [4], the syndrome factors are determined from syndrome values $S_i$ and $S_j$ as follows:

\[
\begin{align*}
\left(S_i + S_j\right) &= S_i \alpha^j + S_j \alpha^{2j} \quad (0 \leq j \leq n-1) \\
R &= A \alpha^j + B \alpha^{2j}
\end{align*}
\]

In Eqs. (1, 2), the values of $A$, $B$, and $R$ are determined to be $A = S_i^2$, $B = S_j$, $R = S_i^3 + S_j$ and correct errors from the Chien search. Therefore, syndrome factors $A_{HD}$, $B_{HD}$, and $R_{HD}$ of the hard decision in the proposed SD-BCH decoder are the same values as $A$, $B$, and $R$. However, the syndrome factors of test patterns are different from each other, because test syndromes should be added to the syndromes ($S_i$, $S_j$). The syndromes of test patterns can be reformulated as follows:

Syndrome of test pattern 1:

\[ S_{tp1} = (\alpha^j)^j + (\alpha^2)^j + \ldots + (\alpha^n)^j, \]

Syndrome of test pattern 2:

\[ S_{tp2} = (\alpha^j)^j + (\alpha^2)^j + \ldots + (\alpha^n)^j + (\alpha^0)^j = S_{tp1} + (\alpha^0)^j, \]

Syndrome of test pattern 3:

\[ S_{tp3} = (\alpha^j)^j + (\alpha^2)^j + \ldots + (\alpha^n)^j + (\alpha^0)^j + (\alpha^1)^j = S_{tp2} + (\alpha^1)^j, \]

Syndrome of test pattern 4:

\[ S_{tp4} = (\alpha^j)^j + (\alpha^2)^j + \ldots + (\alpha^n)^j + (\alpha^4)^j = S_{tp1} + (\alpha^4)^j, \]

That is, in $S_j$ and $S_p$, $tp$ syndrome factors include the value of $\alpha^i$, and $tp$ syndrome factors include the values of $\alpha^0$ and $\alpha^4$. Finally, $tp$ syndrome factors include the value of $\alpha^i$. In the case of $tp_2$, by adding $\alpha^0$, the values of $A_{tp_2}$, $B_{tp_2}$, and $R_{tp_2}$ are calculated as follows:

\[
R_{tp_2} = (S_i + \alpha^0)^3 + (S_j + (\alpha^0)^3) = S_i^3 + S_j + S_i \cdot \alpha^0 + S_j \cdot \alpha^2 + \alpha^0 \cdot \alpha \cdot 2 + \alpha^0
\]

\[
A_{tp_2} = (S_i + \alpha^0)^2 = S_i^2 + \alpha^2 + \alpha \cdot 2 + \alpha
\]

\[
B_{tp_2} = S_i + \alpha^0 = B_{HD} + \alpha^0
\]

where the $GF(2^n)$ adder element is implemented using an exclusive or (XOR) operation. Since the test syndrome values indicate the locations of LRBs, the syndrome factors of test patterns can be determined by adding test syndromes in Eqs. (3-5). For example, syndrome $S_i$ is changed to the value of $S_j$ by adding test syndrome $\alpha^i$, and syndrome $S_j$ is changed to the value of $S_j$ by adding test syndrome $\alpha^j$. Thus, the syndrome factors of the second test pattern $tp_2$ include syndrome factors of the hard decision and test syndrome $\alpha^i$. The syndrome factors of the fourth test pattern $tp_4$ include $A_{tp_2}$, $B_{tp_2}$, and test syndrome $\alpha^4$. On the other hand, the syndrome factors of third test pattern $tp_3$ are different. For $tp_3$, the values of $A_{tp_3}$, $B_{tp_3}$, and $R_{tp_3}$ can be calculated as

\[
R_{tp_3} = (S_i + \alpha^0 + \alpha^4)^3 + (S_j + \alpha^3 + \alpha^2 + \alpha^4) = (S_i^3 + S_j + S_i^2 + \alpha^0 + S_j + \alpha^2 + \alpha^4)
\]

\[
A_{tp_3} = (S_j + \alpha^0)^2 = S_j^2 + \alpha^2 + \alpha \cdot 2 + \alpha
\]

\[
B_{tp_3} = S_i + \alpha^0 + B_{tp_2} + \alpha^4
\]

In Eqs. (6-8), the syndrome factors of $tp_3$ are arranged by the syndrome factor of $tp_2$ and $\alpha^i$. Consequently,
the equations of syndrome factors are simple and can make an efficient area of the syndrome factor calculator (SFC) block. The Chien search calculates the $H$ value that has the information about error location in each codeword of the hard decision and test pattern.

In Eq. (2), if $R + Aa^j + Bc^j = 0$, where $j = n - 1, \ldots, 0$, the $H$ value is 1. If $R + Aa^j + Bc^j \neq 0$, the $H$ value is zero. While the $H$ value is calculated, the metric check computes the $M$ value, which is the number of errors, by adding the $H$ value. Finally, in the error location decision, the decision codeword $d$ is set to 1 between the hard decision and test patterns by the controller.

Fig. 1 shows the BER performance comparison for the proposed (63, 51) SD-BCH decoder, a conventional SD-BCH (63, 51) decoder, and a HD-BCH (63, 51) decoder.

IV. PROPOSED NON-ITERATIVE SOFT-DECISION BCH DECODER ARCHITECTURE

The proposed SD-BCH decoder architecture consists of four major units: TSC, hard-decision kernel, ELD, and controller, as shown in Fig. 2. The hard-decision kernel consists of a syndrome calculator (SC), a syndrome factor calculator (SFC), and Chien search (CS) and metric check (MC).

In this subsection, we discuss the detailed hardware architecture of the proposed TSC architecture using a loop circuit. Fig. 3 shows the proposed TSC block. Since the test pattern generator requires a lot of resistors, the proposed TSC architecture only uses the test syndrome values by the loop circuit generating a syndrome value without the location of min$_1$, min$_2$. The proposed TSC block generates test syndromes $\alpha^0$, $\alpha^1$ by inserting the magnitude of the received LLR. The TSC block needs control signal $\text{ctr}_{TS}$ and comparison signal $\text{cp}$ to get test syndrome $\alpha^0$ and $\alpha^1$; $\text{ctr}_{TS}$ is 1, while magnitude is inserted from $|f_{n,1}|$ to $|f_{n+1,j/2}|$. If $\text{cp}$ is 1, the inserted magnitude value is smaller than the previous magnitude value. Then, test syndrome $\alpha^0$ is updated to a syndrome value that indicates the location of the smallest magnitude. If $\text{cp}=0$, the stored magnitude value is smaller than the inserted magnitude value, and $\alpha^0$ holds the previous value.

2. Hard-Decision Kernel

The hard-decision kernel uses a HD-BCH decoding structure based on an m-SBS algorithm [4]. It consists of the SC block, the SFC block, and the CS and MC block, as shown in Fig. 2.
A. Syndrome calculator

The SC block calculates all the syndromes $S_i$ ($1 \leq i \leq 2^t-1$) by inserting a hard-decision bit.

As mentioned, $S_i$ ($i=1, 3$) is required in m-SBS algorithm-based HD-BCH decoding [4] to calculate the syndrome. The SC consists of a parallel SC cell to receive the parallelized codeword, as shown in Fig. 4(a).

B. Syndrome Factor Calculator

Fig. 4(b) shows the detailed hardware architecture of the proposed SFC architecture. The SFC block calculates the syndrome factors, which are $R$, $A$, and $B$ using the syndrome values $S_i$, $S_j$ from the SC block, and the test syndrome values $\alpha^i$, $\alpha^j$ from the TSC block. The SFC block consists of $GF$ multipliers and $GF$ adders. Therefore, the architecture of the SFC block is comparatively simple and has an efficient area, because it remains unaffected by a serial or parallel structure.

C. Chien Search and Metric Check

Fig. 4(c) shows the CS and MC block, which consists of four parallel blocks because the proposed SD-BCH decoder operates without iteration. The values of the syndrome factor, which are outputs of the SFC block, are fed to the CS block. The function of a NOR gate replaces the syndrome value with the $H$ value using Eq. (2). The output $H$ values can be calculated with the CS block to search the error location information. The CS block checks whether an error has occurred. If $R + A\alpha^i + B\alpha^j = 0$, then the $H$ value is 1; that is, an error has occurred in the $j^{th}$ location. Otherwise, the $H$ value is zero; that is, no error has occurred. The CS block consists of constant $GF$ multipliers, $GF$ adders, multiplexers, NOR gates, and D flip-flops. In the MC block, metric value $M$, which is the number of errors, is computed by adding the $H$ values during 63 clock cycles. For example, if the number of errors is two, the $M$ value becomes 2.

3. Error Location Decision

The ELD block consists of a constant $GF$ multiplier, $GF$ adders, multiplexers, NOR gates and D-FFs, as shown in Fig. 5. The ELD block checks the error location and corrects errors according to $H$ values. The m-bit $H$ value can be transformed to a one-bit value by bitwise NOR. If the $H$ value is non-zero, otherwise, it is zero. In the proposed method, two types of $H$ value are required to check the error location and correct errors. The first type of $H$ value is the reference value that has information about the number of errors in the codeword.
The second type of H value is compared with the reference H value to check whether the bit location is erroneous. The decision codeword d is decided by control signal ctr, which is selected by the controller. The main loop circuit generates \( \alpha^j \), where \( j = n-1, \ldots, 0 \).

### 4. Controller

In the controller, \( ctr \) is selected by \( R_{HD}, R_{q2}, R_{q3}, \) and \( R_{q4} \), which are the syndrome factors from the SFC block, and metric value \( M \). First, control signal \( ctr \) is decided by \( R \) values. For example, if \( R_{HD} \) is zero, \( ctr \) is 1 and selects the hard-decision candidate codeword. In addition, if \( R_{HD} \) is not zero and \( R_{q2} \) is zero, \( ctr \) is 1 and decides the second test pattern. If \( R_{HD} \) and \( R_{q2} \) are not zero and \( R_{q3} \) is zero, \( ctr \) is 2. Finally, if \( R_{q4} \) is zero and \( R_{HD}, R_{q2}, \) and \( R_{q3} \) are not zero, \( ctr \) is 3 and selects the third test pattern. It means that control signal \( ctr \) decides one of the test patterns, because \( R \) value = zero indicates that the number of errors is none or 1.

### V. RESULTS AND COMPARISON

The proposed SD-BCH decoder architecture was modeled in Verilog HDL and then simulated to verify its functionality using a test pattern generated from a C simulator. After complete verification of the design functionality, it was then synthesized using appropriate time and area constraints. Both simulation and synthesis steps were carried out using the SYNOPSYS design tool and 90-nm CMOS technology.

Table 1 shows the hardware complexity comparison of the proposed SD-BCH decoder for a P-parallel factor. The basic building blocks of the SD-BCH decoder are the GF multiplier, GF adder, multiplexer, D flip-flop, and the comparator. The number of complex operation units (e.g., GF multiplier, GF adder) increases depending on the parallel factor. However, the number of GF multipliers and GF adders in the SFC block is the same, regardless of the parallel factor.

Table 2 shows the performance comparisons between the proposed SD-BCH decoders using levels of parallelism \( P = 1, 3, 7 \) and the conventional SD-BCH decoder using a fully-parallel hard-decision kernel [9]. The proposed SD-BCH decoder using a serial structure has 4,121 NAND gate counts from the synthesized results, which shows a 75% gate count saving, compared with the conventional SD-BCH decoder. The proposed BCH decoder operates at a clock rate of 250 MHz, has a latency of 126 clocks, and throughput of 202 Mbps. In addition, the proposed SD-BCH decoder using a 7-parallel structure performs 1.68 times throughput with gate-count savings of 50%, compared with a
conventional SD-BCH decoder. If there are single error in the input, the conventional SD-BCH decoder [9] has 1 processing cycle (= 15 ns) due to a fully-parallel (P = 63) hard-decision kernel and early termination scheme. However, if there are more than two errors in the input, the processing cycles of a conventional SD-BCH decoder will be increased with iteration. The proposed SD-BCH decoder has the same processing cycles regardless of the number of errors. That is, the proposed SD-BCH (63, 51) decoders with 3-parallel (P = 3) and 7-parallel (P=7) have 42 and 18 processing cycles, respectively. In addition, the efficiency of the SD-BCH decoder is calculated by the throughput-to-gate-count ratio (Mbps/M gates). The proposed SD-BCH decoders with P=3 and P=7 have much better efficiency compared to the conventional SD-BCH decoder using a fully-parallel (P = 63) hard-decision kernel. For the same BER performance, the proposed SD-BCH decoder has an efficient area and continuously operates without iteration.

VI. CONCLUSIONS

This paper presents a low-complexity non-iterative SD-BCH decoder architecture and its efficient design techniques. A hardware-friendly, non-iterative SD-BCH decoding algorithm is proposed and adopted for the SD-BCH decoder. In addition, a novel test syndrome computation, a hard-decision kernel, and an error location decision block are proposed. The proposed SD-BCH decoder has better BER performance than a HD-BCH decoder and significantly less hardware complexity than a conventional SD-BCH decoder.

ACKNOWLEDGMENTS

This research was supported by MSIP, Korea, under the ITRC support program (IITP-2016-H8501-16-1019) supervised by the IITP, and by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT and Future Planning (2013R1A2A2A01068628).

REFERENCES


BoSeok Jung received the B.S and M.S degrees, both in Information & Communication Engineering, from Inha University, Incheon, Korea, in 2013 and 2015, respectively. His research interests VLSI and SOC architecture design for digital signal processing and communication systems.
Taesung Kim received the B.S degree in Information & Communication Engineering in 2015, from Anyang University, Anyang, Korea. Now, he is currently working toward the M.S degree in Inha University, Incheon, Korea. His research interests VLSI and SOC architecture design for digital signal processing and communication systems.

Hanho Lee received Ph.D. and M.S. degrees, both in Electrical & Computer Engineering, from the University of Minnesota, Minneapolis, in 2000 and 1996, respectively. In 1999, he was a Member of Technical Staff-1 at Lucent Technologies, Bell Labs, Holmdel, New Jersey. From April 2000 to August 2002, he was a Member of Technical Staff at the Lucent Technologies (Bell Labs Innovations), Allentown. From August 2002 to August 2004, he was an Assistant Professor at the Department of Electrical and Computer Engineering, University of Connecticut, USA. Since August 2004, he has been with the Department of Information and Communication Engineering, Inha University, where he is currently Professor. He was a visiting researcher at Electronics and Telecommunications Research Institute (ETRI), Korea, in 2005. From August 2010 to August 2011, he was a visiting scholar at Bell Labs, Alcatel-Lucent, Murray Hill, New Jersey, USA. His research interest includes VLSI architecture design for digital signal processing, forward error correction architectures, cryptographic systems, and communications.