# Two MDS Array Codes for Disk Erasures: the Blaum-Bruck-Vardy Code and the BASIC Code

In this post, I am going to review two erasure codes: the Blaum-Bruck-Vardy code and the BASIC code (also here). These are erasure codes, which means, their purpose is to encode a number of data disks into a number of coding disks so that when one or more data/coding disks fail, the failed disk can be reconstructed using the existing data and coding disks.

A strength of these codes is that although the algebra is described on extension fields/rings over $GF(2)$, the encoding/decoding process uses only Boolean addition/rotation operation and no finite field operation. These codes are also MDS (Maximum Distance Separable), which means they have the largest possible (minimum) distance for a fixed message-length and codeword-length.

(Recall that if a code has $d$ data components and $c$ parity components in its generator matrix in standard form, its distance is at most $c + 1$ by the Singleton bound. Hence the code is MDS if and only if it can tolerate $c$ arbitrary disk failures.)

The BASIC code does the following things in relations to the BBV code:

1. Adds a virtual parity bit after each disk, giving each disk an even parity
2. Does polynomial arithmetic modulo $1+x^p$ instead of $h(x) = 1+x+\cdots + x^{p-1}$ as in the case of BBV code
3. Shows equivalence to the BBV code by making a nice observation via Chinese Remainder Theorem
4. Proves MDS property for any number of coding disks when $p$ is “large enough” and has a certain structure

Open Question: What is the least disk size for which these codes are MDS with arbitrary distance?

## The Blaum-Bruck-Vardy code

Let $p$ be an odd prime. (Bounds on $p$ in connection to the distance of the code will come later.) Although there can be at most $p$ data disks, we will assume there are exactly $p$ of them. There are $r$ coding disks. Each data disk is represented as a Boolean polynomial of degree $p-2$, that is, a polynomial in the ring

$R_p := GF(2)[x]/h(x)$

where

$h(x) = 1+x+x^2+\cdots + x^{p-1}$.

Let $a_0, \cdots, a_{p-1} \in R_p$ be the data disks. Let $C : R_n \rightarrow R_n : a \mapsto Ca$ denote a cyclic shift of the polynomial $a \in R_n$ modulo $h(x)$. If $deg(a) = p-2$, the shifted polynomial is the usual shift (modulo $x^p - 1$) followed by a flip in all (Boolean) coefficients, since modulo $h(x), x^{p-1} = 1 + x + \cdots + x^{p-2}$ and an addition modulo $2$ is a flip for Boolean coefficients.

With this definition, the $i$th coding disk $a_{p+i}$, for $0 \leq i \leq r-1$, is  is the (Boolean) sum

$\displaystyle a_{p+i} := a_0 + C^ia_1 + C^{2i}a_2 +\cdots + C^{(p-1)i}a_{p-1}.$

Given the data disks $a_0, \cdots, a_{p-1}$, the collection

$\left( a_0, \cdots, a_{p-1}, a_{p}, a_{p+1}, \cdots, a_{p+r-1}\right)$

is treated as a systematic codeword of length $p+r$ with $p$ information symbols and $r$ coding symbols. The rate of this code is $p/(p+r) = 1/(1+r/p)$. Since we want high-rate codes, we are interested in small values of $r$.

Suppose $d$ is the minimum distance of this code. The singleton bound says $d$ is at most $(p+r) - p + 1 = r+1$. When code achieves this distance it is said to be an MDS code. For this code, it happens when

• $r \leq 3$ for all $p$
•  $4 \leq r \leq 8$ when $ord(2) = p-1$ modulo $p$ and $p$ excludes some small subsets (depending on $r)$

A lower-bound on $p$. Suppose the BBV code has $k$ data disks and $r$ coding disks. According to this analysis, when $2$ is a primitive root modulo $p$, the BBV code is MDS for all $r \geq 9$ if $p-1$ is strictly greater than $\Omega\left( kr \min\{k,r\} \right)$. (See the above link for an exact bound.) This is a sufficient condition. A lower-bound on $p$ for this code to be MDS is still an open problem (as of October 2017).

### Summary of the BBV code

Parameters/Conditions:  For a prime $p$,  the disk size is $p-1$,. There are $k$ data disks, $k\leq p$, and $r\leq 8$ coding disks. Fault tolerance: The code is MDS (that is, a $[p + r, p, r+1]$-linear code) for up to $3$ disk failures for all $p$; up to $8$ failures when $p \geq 31$ is a prime for which $2$ is a primitive element; up to any number of failures if $p = O(kr \min{k,r})$Alphabet: $GF(2)[x]/(1+x+\cdots+x^{p-1})$Encoding process: See above. Decoding process: Skipped.

Strengths:

1. Coding disks are independent of each other
2. The shifts are modulo $sum{x^i}$ which is an irreducible polynomial
3. The polynomials have Boolean coefficients
4. Only Boolean additions and rotations are performed. No finite field operations needed for encoding/decoding

Limitations: Not sure.

## The BASIC code

The Boolean Addition and Shift Implementable Cyclic-convolutional code, or the BASIC code in short, is like the BBV code. The difference is that the polynomial arithmetic is performed modulo $1+x^m$ instead of $h(x) := 1+x+\cdots+x^{m-1}$.

Connections with the BBV code. When $m$ is an odd prime number such that $2$ is a primitive root modulo $m$, the polynomial $1+x^m$ factors into two irreducible factors over $GF(2)$ as $(1+x) h(x)$. This post talks about the factoring, and Artin’s conjecture says that there are infinitely many such primes.

The data disks are given an even weight by adding a (virtual) parity bit, and hence both the data and coding disks are Boolean polynomials of an even weight in the ring

$R_m := GF(2)/(1+x^m)$.

Let $C_m$ be the subring of $R_m$ containing only even-weight polynomials. By the Chinese Remainder Theorem, there is a bijection taking every $f(x)$ in $C_m$ to the tuple $( f(x) \bmod (1+x), f(x) \bmod h(x) )$. However, since $f(x)$ has an even weight, $f(x) \bmod (1+x)$ is an even number which is zero (modulo 2) for all $f(x)\in C_m$. Thus the ring $C_m$ is isomorphic to the field $GF(2)/( h(x) )$ used in the BBV code.

Data Disks. Let $m$ be an odd number. The disks contain $m-1$ bits each. By (virtually) appending a parity bit, the $j$th data disk is treated as an element $s_j(x)$ of the ring

$C_m := \{ f(x) : f (x) \in R_m, f(x) \text{ has even weight} \}$

That is, the polynomial $s_j(x)$ for the $j$th disk has degree $m-1$ and an even number of nonzero coefficients.

The ring $C_m$ can be described in two ways. First, it is the set of polynomials of the form $a(x)(1+x) \bmod (x^m+1)$ for every $a(x) \in R_m$. Notice that the dimension of $C_m$ over $GF(2)$ is $m-1$ because, well, it contains all (and only) the even-weight words. Second, since the $m$th coefficient is the parity bit, it is one (resp. zero) when the Boolean sum of all other coefficients is one (resp. zero). Hence the total weight is always zero.

Why does the polynomial $a(x)(1+x) \bmod x^m+1$ have an even weight? Let $[.]_i$ denote the $i$th coefficient of any polynomial, and let $a^\prime(x) := a(x)(1+x) \bmod x^m + 1$. Notice that $[a^\prime(x)]_i$ is $1$ whenever $[a(x)]_i$ differs from $[a(x)]_{i-1 \bmod m}$. The number of such 0-1/1-0 transitions is always even because modulo $m$,  the first and last coefficients are adjacent. Hence $a^\prime(x)$ will have even weight.

The check polynomial for any element $s(x) \in C_m$ is the all-ones polynomial

$h(x) := 1 + x + x^2 + \cdots + x^{m-1}$

because $s(x)h(x) = 0$.Why? Because if we imagine a matrix whose first column is the coefficient-vector $s(x)$, second column is $xs(x)$, third column is $x^2s(x)$ and so on, and recalling that $xs(x) \bmod (x^m+1)$ is a rotation, it becomes clear that the sum along each row will be zero since the first column contains an even number of ones to begin with.

Coding disks. Suppose we have $k$ data disks

$w:=\left(s_0(x), \cdots, s_{k-1}(x)\right).$

We treat the $k$-tuple $w$ as the message. Recall that for each of these polynomials, the $m$th coefficient, which is not part of the original data, is the sum of all other coefficients. The $t$th coding disk $c_t(x)$, then, is defined as the Boolean sum of the columns $s_j(x)$, each rotated-down by $tj$ times. That is,

$\displaystyle c_t(x) := \bigoplus_{0 \leq j \leq k-1}{x^{jt}s_j(x)} \bmod (1+x^p)$

for $0 \leq t \leq r-1$. In other words, $c_t(x)$ is the vector inner product of the vectors $w$ and $( 1, x^j, x^{2j}, x^{3j}, ..., x^{(k-1)j} )$.

Generator matrix. The generator matrix for this code, in standard form, looks like $G = [I \mid V]$, and a systematic codeword is $[w \vert wV]$ where

$V:=\left( x^{jt} \right)_{j \in \{0, \cdots, k\}, t \in \{0, \cdots, r-1\} }$

is a Vandermonde matrix with entries in $R_m$. Hence a codeword is a $C_m$-linear combination of  $k$ row-vectors over $R_m$.

Invertibility in the ring $C_m$. Since $m$ is odd, the polynomial $x^m + 1$ is separable over $GF(2)$. (Why?) Hence the check polynomial $h(x)$ splits into linear factors in its splitting field. It is not hard to see that a polynomial $a(x) \in R_m$ has $gcd\left( a(x), h(x) \right)=1$ if and only if $a(x)$ is a unit modulo $h(x)$. By the Chinese Remainder Theorem, this happens if and only if $a(x)$ is a unit modulo each of the linear factors of $h(x)$. Such a polynomial $a(x)$ is called $C_m$-invertible.

MDS-ness and Vandermonde minors. For this code to be MDS, it has to be able to decode with up to $r$ disk failures. The structure of the generator matrix tells us that if any $t$ data disks fail, any $k \times k$ submatrix of $G$ involving any $t$ coding columns and the remaining $k-t$ data columns must be invertible. Since the columns corresponding to the data disks are standard basis vectors, this means any order-$t$ submatrix of the Vandermonde matrix $V$ must be invertible in the ring $R_m$. Put differently, every $t$-minor of $V$ must be $C_m$-invertible. This invertibility condition is, in fact, equivalent to the MDS condition.

MDS-ness and a lower-bound on $m$. The analysis in the BASIC code paper mentioned above and this paper shows that for this code to be MDS, it is sufficient that $m = \Omega(kr \min\{k,r\})$ is an odd prime number such that $2$ is a primitive element modulo $m$.

The intuition behind the above bound is as follows. To prove that all minors of a generalized Vandermonde matrix are invertible modulo $h(x)$, they computed the maximum degree  $D$ of any minor modulo $h(x)$ using this result. They observed that $h(x)$ is irreducible over $GF(2)$ when $2$ is a primitive root modulo $m$. Therefore, any polynomial $f(x)$ over $GF(2)$ with degree at most $m-2$ must be coprime to $h(x)$, and hence invertible modulo $h(x)$. This gives the sufficient bound $D \leq m - 2$. This bound on $D$ can be turned on its head and used as a bound on $m \geq D + 2$ if we know an upper bound on $D$.

What could be the largest degree $D$ of a minor of order $\ell$? Consider an order-$\ell$ submatrix $V_{I,J}$ given by the rows $I$ and columns $J$ of the Vandermonde matrix $V$. Assume that the index sets $I$ and $J$ are in increasing order. If we look into the Leibniz expansion of the determinant of $V_{I,J}$, the term having the largest degree must be the product of the diagonal entries. (Why? We can prove it by induction. Use the $2 \times 2$ case as the base case and use the fact that $\alpha^2 + \beta^2 \geq 1 \alpha \beta$ for all real $\alpha, \beta$.) The degree of this term will be $\sum_{t=1}^{\ell}{i_t j_t}$, which is no more than $\ell \cdot i_{\ell} \cdot j_{\ell}$. Since $\ell \leq \min\{k,r\}, i_\ell \leq k, j_\ell \leq r$, the bound $D = \Omega(kr\min\{k,r\} )$ follows.

### Summary of the BASIC code

Parameters/Conditions:  For a prime $p$,  the disk size is $p-1$,. There are $k$ data disks and $r\leq 8$ coding disks. Fault tolerance: The code is MDS (that is, a $[k+r, k, r+1]$-linear code) for up to $r=3$ disk failures for all $p$; up to $r=8$ failures when $p \geq 31$ is a prime for which $2$ is a primitive element; up to any number of failures if $p = O(kr \min{k,r})$Alphabet: $GF(2)[x]/(1+x^p)$Encoding process: See above. Decoding process: Skipped.

Strengths:

1. All of the BBV code, but taking remainders modulo $(1+x^p)$ is computationally and conceptually simpler than doing the same modulo $1+x+x^2+\cdots + x^{p-1}$

Limitations: Not sure