Two MDS Array Codes for Disk Erasures: the Blaum-Bruck-Vardy Code and the BASIC Code

In this post, I am going to review two erasure codes: the Blaum-Bruck-Vardy code and the BASIC code (also here). These are erasure codes, which means, their purpose is to encode a number of data disks into a number of coding disks so that when one or more data/coding disks fail, the failed disk can be reconstructed using the existing data and coding disks.

A strength of these codes is that although the algebra is described on extension fields/rings over GF(2), the encoding/decoding process uses only Boolean addition/rotation operation and no finite field operation. These codes are also MDS (Maximum Distance Separable), which means they have the largest possible (minimum) distance for a fixed message-length and codeword-length.

(Recall that if a code has d data components and c parity components in its generator matrix in standard form, its distance is at most c + 1 by the Singleton bound. Hence the code is MDS if and only if it can tolerate c arbitrary disk failures.)

The BASIC code does the following things in relations to the BBV code:

  1. Adds a virtual parity bit after each disk, giving each disk an even parity
  2. Does polynomial arithmetic modulo 1+x^p instead of h(x) = 1+x+\cdots + x^{p-1} as in the case of BBV code
  3. Shows equivalence to the BBV code by making a nice observation via Chinese Remainder Theorem
  4. Proves MDS property for any number of coding disks when p is “large enough” and has a certain structure

Open Question: What is the least disk size for which these codes are MDS with arbitrary distance?

 


The Blaum-Bruck-Vardy code

Let p be an odd prime. (Bounds on p in connection to the distance of the code will come later.) Although there can be at most p data disks, we will assume there are exactly p of them. There are r coding disks. Each data disk is represented as a Boolean polynomial of degree p-2, that is, a polynomial in the ring

R_p := GF(2)[x]/h(x)

where

h(x) = 1+x+x^2+\cdots + x^{p-1}.

Let a_0, \cdots, a_{p-1} \in R_p be the data disks. Let C : R_n \rightarrow R_n : a \mapsto Ca denote a cyclic shift of the polynomial a \in R_n modulo h(x). If deg(a) = p-2, the shifted polynomial is the usual shift (modulo x^p - 1) followed by a flip in all (Boolean) coefficients, since modulo h(x), x^{p-1} = 1 + x + \cdots + x^{p-2} and an addition modulo 2 is a flip for Boolean coefficients.

With this definition, the ith coding disk a_{p+i}, for 0 \leq i \leq r-1, is  is the (Boolean) sum

\displaystyle a_{p+i} := a_0 + C^ia_1 + C^{2i}a_2 +\cdots + C^{(p-1)i}a_{p-1}.

Given the data disks a_0, \cdots, a_{p-1}, the collection

\left( a_0, \cdots, a_{p-1}, a_{p}, a_{p+1}, \cdots, a_{p+r-1}\right)

is treated as a systematic codeword of length p+r with p information symbols and r coding symbols. The rate of this code is p/(p+r) = 1/(1+r/p). Since we want high-rate codes, we are interested in small values of r.

Suppose d is the minimum distance of this code. The singleton bound says d is at most (p+r) - p + 1 = r+1. When code achieves this distance it is said to be an MDS code. For this code, it happens when

  • r \leq 3 for all p
  •  4 \leq r \leq 8 when ord(2) = p-1 modulo p and p excludes some small subsets (depending on r)

A lower-bound on p. Suppose the BBV code has k data disks and r coding disks. According to this analysis, when 2 is a primitive root modulo p, the BBV code is MDS for all r \geq 9 if p-1 is strictly greater than \Omega\left( kr \min\{k,r\} \right). (See the above link for an exact bound.) This is a sufficient condition. A lower-bound on p for this code to be MDS is still an open problem (as of October 2017).

Summary of the BBV code

Parameters/Conditions:  For a prime p,  the disk size is p-1,. There are k data disks, k\leq p, and r\leq 8 coding disks. Fault tolerance: The code is MDS (that is, a [p + r, p, r+1]-linear code) for up to 3 disk failures for all p; up to 8 failures when p \geq 31 is a prime for which 2 is a primitive element; up to any number of failures if p = O(kr \min{k,r})Alphabet: GF(2)[x]/(1+x+\cdots+x^{p-1})Encoding process: See above. Decoding process: Skipped.

Strengths:

  1. Coding disks are independent of each other
  2. The shifts are modulo sum{x^i} which is an irreducible polynomial
  3. The polynomials have Boolean coefficients
  4. Only Boolean additions and rotations are performed. No finite field operations needed for encoding/decoding

Limitations: Not sure.


The BASIC code

The Boolean Addition and Shift Implementable Cyclic-convolutional code, or the BASIC code in short, is like the BBV code. The difference is that the polynomial arithmetic is performed modulo 1+x^m instead of h(x) := 1+x+\cdots+x^{m-1}.

Connections with the BBV code. When m is an odd prime number such that 2 is a primitive root modulo m, the polynomial 1+x^m factors into two irreducible factors over GF(2) as (1+x) h(x). This post talks about the factoring, and Artin’s conjecture says that there are infinitely many such primes.

The data disks are given an even weight by adding a (virtual) parity bit, and hence both the data and coding disks are Boolean polynomials of an even weight in the ring

R_m := GF(2)/(1+x^m).

Let C_m be the subring of R_m containing only even-weight polynomials. By the Chinese Remainder Theorem, there is a bijection taking every f(x) in C_m to the tuple ( f(x) \bmod (1+x),   f(x) \bmod h(x)  ). However, since f(x) has an even weight, f(x) \bmod (1+x) is an even number which is zero (modulo 2) for all f(x)\in C_m. Thus the ring C_m is isomorphic to the field GF(2)/( h(x) ) used in the BBV code.

Data Disks. Let m be an odd number. The disks contain m-1 bits each. By (virtually) appending a parity bit, the jth data disk is treated as an element s_j(x) of the ring

C_m := \{ f(x) : f (x) \in R_m, f(x) \text{ has even weight} \}

That is, the polynomial s_j(x) for the jth disk has degree m-1 and an even number of nonzero coefficients.

The ring C_m can be described in two ways. First, it is the set of polynomials of the form a(x)(1+x) \bmod (x^m+1) for every a(x) \in R_m. Notice that the dimension of C_m over GF(2) is m-1 because, well, it contains all (and only) the even-weight words. Second, since the mth coefficient is the parity bit, it is one (resp. zero) when the Boolean sum of all other coefficients is one (resp. zero). Hence the total weight is always zero.

Why does the polynomial a(x)(1+x) \bmod x^m+1 have an even weight? Let [.]_i denote the ith coefficient of any polynomial, and let a^\prime(x) := a(x)(1+x) \bmod x^m + 1. Notice that [a^\prime(x)]_i is 1 whenever [a(x)]_i differs from [a(x)]_{i-1 \bmod m}. The number of such 0-1/1-0 transitions is always even because modulo m,  the first and last coefficients are adjacent. Hence a^\prime(x) will have even weight.

The check polynomial for any element s(x) \in C_m is the all-ones polynomial

h(x) := 1 + x + x^2 + \cdots + x^{m-1}

because s(x)h(x) = 0.Why? Because if we imagine a matrix whose first column is the coefficient-vector s(x), second column is xs(x), third column is x^2s(x) and so on, and recalling that xs(x) \bmod (x^m+1) is a rotation, it becomes clear that the sum along each row will be zero since the first column contains an even number of ones to begin with.

Coding disks. Suppose we have k data disks

w:=\left(s_0(x), \cdots, s_{k-1}(x)\right).

We treat the k-tuple w as the message. Recall that for each of these polynomials, the mth coefficient, which is not part of the original data, is the sum of all other coefficients. The tth coding disk c_t(x), then, is defined as the Boolean sum of the columns s_j(x), each rotated-down by tj times. That is,

\displaystyle c_t(x) :=  \bigoplus_{0 \leq j \leq k-1}{x^{jt}s_j(x)} \bmod (1+x^p)

for 0 \leq t \leq r-1. In other words, c_t(x) is the vector inner product of the vectors w and (  1, x^j,  x^{2j}, x^{3j}, ..., x^{(k-1)j} ).

Generator matrix. The generator matrix for this code, in standard form, looks like G = [I \mid V], and a systematic codeword is [w \vert wV] where

V:=\left( x^{jt} \right)_{j \in \{0, \cdots, k\}, t \in \{0, \cdots, r-1\} }

is a Vandermonde matrix with entries in R_m. Hence a codeword is a C_m-linear combination of  k row-vectors over R_m.

Invertibility in the ring C_m. Since m is odd, the polynomial x^m + 1 is separable over GF(2). (Why?) Hence the check polynomial h(x) splits into linear factors in its splitting field. It is not hard to see that a polynomial a(x) \in R_m has gcd\left( a(x), h(x) \right)=1 if and only if a(x) is a unit modulo h(x). By the Chinese Remainder Theorem, this happens if and only if a(x) is a unit modulo each of the linear factors of h(x). Such a polynomial a(x) is called C_m-invertible.

MDS-ness and Vandermonde minors. For this code to be MDS, it has to be able to decode with up to r disk failures. The structure of the generator matrix tells us that if any t data disks fail, any k \times k submatrix of G involving any t coding columns and the remaining k-t data columns must be invertible. Since the columns corresponding to the data disks are standard basis vectors, this means any order-t submatrix of the Vandermonde matrix V must be invertible in the ring R_m. Put differently, every t-minor of V must be C_m-invertible. This invertibility condition is, in fact, equivalent to the MDS condition.

MDS-ness and a lower-bound on m. The analysis in the BASIC code paper mentioned above and this paper shows that for this code to be MDS, it is sufficient that m = \Omega(kr \min\{k,r\}) is an odd prime number such that 2 is a primitive element modulo m.

The intuition behind the above bound is as follows. To prove that all minors of a generalized Vandermonde matrix are invertible modulo h(x), they computed the maximum degree  D of any minor modulo h(x) using this result. They observed that h(x) is irreducible over GF(2) when 2 is a primitive root modulo m. Therefore, any polynomial f(x) over GF(2) with degree at most m-2 must be coprime to h(x), and hence invertible modulo h(x). This gives the sufficient bound D \leq m - 2. This bound on D can be turned on its head and used as a bound on m \geq D + 2 if we know an upper bound on D.

What could be the largest degree D of a minor of order \ell? Consider an order-\ell submatrix V_{I,J} given by the rows I and columns J of the Vandermonde matrix V. Assume that the index sets I and J are in increasing order. If we look into the Leibniz expansion of the determinant of V_{I,J}, the term having the largest degree must be the product of the diagonal entries. (Why? We can prove it by induction. Use the 2 \times 2 case as the base case and use the fact that \alpha^2 + \beta^2 \geq 1 \alpha \beta for all real \alpha, \beta.) The degree of this term will be \sum_{t=1}^{\ell}{i_t j_t}, which is no more than \ell \cdot i_{\ell} \cdot j_{\ell}. Since \ell \leq \min\{k,r\}, i_\ell \leq k, j_\ell \leq r, the bound D = \Omega(kr\min\{k,r\} ) follows.

Summary of the BASIC code

Parameters/Conditions:  For a prime p,  the disk size is p-1,. There are k data disks and r\leq 8 coding disks. Fault tolerance: The code is MDS (that is, a [k+r, k, r+1]-linear code) for up to r=3 disk failures for all p; up to r=8 failures when p \geq 31 is a prime for which 2 is a primitive element; up to any number of failures if p = O(kr \min{k,r})Alphabet: GF(2)[x]/(1+x^p)Encoding process: See above. Decoding process: Skipped.

Strengths:

  1. All of the BBV code, but taking remainders modulo (1+x^p) is computationally and conceptually simpler than doing the same modulo 1+x+x^2+\cdots + x^{p-1}

Limitations: Not sure

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s