Two MDS Array Codes for Disk Erasures: the Blaum-Bruck-Vardy Code and the BASIC Code

In this post, I am going to review two erasure codes: the Blaum–Bruck–Vardy (BBV) code and the BASIC code (also here). An erasure code encodes a number of data disks into a number of coding disks so that when one or more data/coding disks fail, the failed disks can be reconstructed from the surviving data and coding disks.

A strength of these codes is that although the algebra is described over extension fields/rings of $GF(2)$, the encoding/decoding process uses only Boolean addition/rotation operations and no finite field operations. These codes are also MDS (Maximum Distance Separable), which means they have the largest possible minimum distance for a fixed message length and codeword length.

(Recall that if a code has $d$ data components and $c$ parity components in its generator matrix in standard form, its distance is at most $c + 1$ by the Singleton bound. Hence the code is MDS if and only if it can tolerate $c$ arbitrary disk failures.)
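To make the reconstruction idea concrete, here is a minimal sketch of the simplest MDS case: one parity disk ($c = 1$) that is the XOR of the data disks, so any single lost disk is the XOR of the survivors. (The helper below is my own illustration, not code from either paper.)

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # three "data disks"
parity = xor_blocks(data)                       # one "coding disk" (c = 1)

# Disk 1 fails; rebuild it from the survivors and the parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

With a single parity disk this code tolerates any one failure; the BBV and BASIC constructions extend this idea to $c$ coding disks.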

The BASIC code does the following things in relation to the BBV code:

1. Adds a virtual parity bit after each disk, giving each disk an even parity
2. Does polynomial arithmetic modulo $1+x^p$ instead of $h(x) = 1+x+\cdots + x^{p-1}$ as in the case of BBV code
3. Shows equivalence to the BBV code by making a nice observation via Chinese Remainder Theorem
4. Proves MDS property for any number of coding disks when $p$ is “large enough” and has a certain structure
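Point 2 is what keeps the arithmetic cheap: in $GF(2)[x]/(1+x^p)$, multiplying by $x^k$ is a cyclic rotation of the coefficient vector, and addition is coefficientwise XOR. A toy sketch (the list-of-bits representation is my own choice):

```python
p = 5

def rotate(coeffs, k):
    """Multiply a polynomial in GF(2)[x]/(x^p + 1) by x^k:
    a cyclic rotation of its coefficient vector."""
    return [coeffs[(i - k) % p] for i in range(p)]

def add(a, b):
    """Addition in GF(2)[x] is coefficientwise XOR."""
    return [x ^ y for x, y in zip(a, b)]

a = [1, 0, 1, 1, 0]                      # 1 + x^2 + x^3
assert rotate(a, 1) == [0, 1, 0, 1, 1]   # x + x^3 + x^4
assert rotate(a, p) == a                 # x^p = 1 in this ring
assert add(a, rotate(a, 1)) == [1, 1, 1, 0, 1]
```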

Open Question: What is the least disk size for which these codes are MDS with arbitrary distance?

Generalized Vandermonde Matrices with Missing Row-Powers

Let $[n] := \{0, 1, 2, \cdots, n -1 \}$. Let $x = (x_0, \cdots, x_{n-1})$ with distinct $x_j$s. Let $R = \{0 = r_0, r_1, \cdots, r_{n-1} = r\}$ be a set of distinct nonnegative row-powers. Consider the $n \times n$ Generalized Vandermonde matrix $V(x ; R) := \left( x_j^{r_i} \right)_{i,j \in [n]}$. When $R = [n]$, this matrix becomes the ordinary Vandermonde matrix, $V(x)$.

An equivalent description of $R$ is the largest row-power $r \in R$ and the set of missing rows from $[r]$: that is, the items that are in $[r]$ but not in $R$. Let $L_r = \{\ell_0, \ell_1, \cdots \}$ be this set. Define the punctured Generalized Vandermonde matrix $V_{\perp}(x; L_r) := V(x; R)$. Let $s_k(x)$ be the $k$th elementary symmetric polynomial.

Now we are ready to present some interesting results without proof, from the paper “Lower bound on Sequence Complexity” by Kolokotronis, Limniotis, and Kalouptsidis.

$\det V_{\perp}(x ; \{\ell \}) = \det V(x) \ s_{n-\ell}(x).$

$\det V_{\perp}(x ; \{\ell_0, \ell_1\}) = \det V(x) \ \det \left( s_{n-\ell_i+j}(x) \right)_{i,j \in [2]}.$

$\det V_{\perp}(x ; L) = \det V(x) \ \det \left( s_{n-\ell_i+j}(x) \right)_{i,j \in [s]} \text{ where } L = \{ \ell_0, \ell_1, \cdots, \ell_{s-1}\}.$
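We can sanity-check the first identity numerically over the integers. The harness below is my own (it is not from the paper): it takes $n = 3$, points $x = (2, 3, 5)$, and the single missing row-power $\ell = 1$.

```python
from itertools import combinations

def det(M):
    # Exact integer determinant by Laplace expansion (fine for tiny matrices).
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(n))

def elem_sym(xs, k):
    # k-th elementary symmetric polynomial evaluated at xs.
    total = 0
    for c in combinations(xs, k):
        prod = 1
        for v in c:
            prod *= v
        total += prod
    return total

x = (2, 3, 5)
n, ell = len(x), 1
powers = [r for r in range(n + 1) if r != ell]   # row-powers {0, ..., n} \ {ell}
V_punct = [[xj ** r for xj in x] for r in powers]
V = [[xj ** r for xj in x] for r in range(n)]
assert det(V_punct) == det(V) * elem_sym(x, n - ell)   # 186 == 6 * 31
```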

I will add some applications later on.

Vandermonde Submatrices and Arithmetic Progressions

[This post, which is based on an ongoing discussion with Alex Russell and Ravi Sundaram, contains some unpublished results.]

Currently, we are asking whether all submatrices of the order-$p$ Vandermonde matrix over a finite extension of $GF(2)$ are invertible where $p$ is prime. The answer is “no” in general: there are examples of fields where the Vandermonde matrix has a singular submatrix.

We can ask an easier(?) question, though. What happens if we randomly sample a set of columns and look at the submatrices formed by subsets of the sampled columns? With a touch of beautiful insight, Professor Russell has connected Szemerédi’s theorem on arithmetic progressions with this question.

Let $AP_k$ denote an arithmetic progression of length $k$. Let $[N] := \{1, 2, \cdots, N\}$ for $N \in \mathbb{N}$.

The Szemerédi theorem says that any “sufficiently dense” subset of the naturals (one with positive upper density) contains an $AP_k$ for every $k \in \mathbb{N}$. A finitary version says: fix your favourite $k \in \mathbb{N}$ and $\delta \in (0, 1]$. Then there exists a natural number $N := N_{k, \delta}$ such that any subset $S \subset [N]$ of size at least $\delta N$ contains an $AP_k$. Yet another version says:

Szemerédi’s Theorem. The size of the largest subset $S \subset [N]$ without an $AP_k$ cannot be too large; in particular, it is $o(N)$.

Recall that a function $f$ is $o(g)$ if it grows slowly compared to $g$, in the sense that $\lim_{x\rightarrow \infty}{f(x)/g(x)} = 0$.
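For small $N$ one can watch this quantity grow by brute force. The exponential search below is my own illustration of the $k = 3$ case (it enumerates all $2^N$ subsets, so it is only feasible for tiny $N$):

```python
def has_ap3(s):
    """Check whether the set s contains a 3-term arithmetic progression."""
    return any(b + (b - a) in s for a in s for b in s if b > a)

def largest_ap3_free(N):
    """Size of the largest subset of {1, ..., N} with no 3-term AP."""
    best = 0
    for mask in range(1 << N):
        subset = {i + 1 for i in range(N) if mask >> i & 1}
        if not has_ap3(subset):
            best = max(best, len(subset))
    return best

# For N = 1..8 the maximum sizes are 1, 2, 2, 3, 4, 4, 4, 4;
# e.g. {1, 2, 4, 5} is a largest AP_3-free subset of [5].
assert [largest_ap3_free(N) for N in range(1, 9)] == [1, 2, 2, 3, 4, 4, 4, 4]
```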

When does the Discrete Fourier Transform Matrix have Nonsingular Submatrices?

I am studying a coding theory problem. The question is this:

Open Question: Is there a prime $p$ and a positive integer $d$ such that all submatrices of the $p\times p$ Discrete Fourier Transform matrix over the field $GF(2^d)$ are nonsingular?

Currently, I have only counterexamples: let $d$ be the degree of the smallest extension of $GF(2)$ that contains a nontrivial $p$th root of unity. Then I know many primes $p$ for which the $p \times p$ DFT matrix over $GF(2^d)$ has a singular submatrix.
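For concreteness, here is one such counterexample at $p = 7$, where the smallest suitable extension is $GF(2^3) = GF(8)$: taking $\omega$ a root of $x^3 + x + 1$ (a primitive 7th root of unity), the submatrix on rows and columns $\{0, 1, 3\}$ is singular. The verification sketch below is mine:

```python
# Arithmetic in GF(8) = GF(2)[x]/(x^3 + x + 1), elements as 3-bit ints.
MOD = 0b1011  # x^3 + x + 1

def gf_mul(a, b):
    """Carryless multiplication with reduction modulo x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b1000:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

omega = 0b010  # the class of x: a primitive 7th root of unity

def det3(M):
    # 3x3 determinant over GF(8); in characteristic 2, signs become XORs.
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return (gf_mul(a, gf_mul(e, i) ^ gf_mul(f, h))
            ^ gf_mul(b, gf_mul(d, i) ^ gf_mul(f, g))
            ^ gf_mul(c, gf_mul(d, h) ^ gf_mul(e, g)))

rows = cols = [0, 1, 3]
M = [[gf_pow(omega, (i * j) % 7) for j in cols] for i in rows]
assert gf_pow(omega, 7) == 1  # omega really is a 7th root of unity
assert det3(M) == 0           # a singular submatrix of the 7x7 DFT matrix
```

The singularity boils down to $1 + \omega + \omega^3 = 0$, which holds because $\omega^3 = \omega + 1$ in this field.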

In this post, I am going to show a failed attempt to answer this question using the results in this paper by Evra, Kowalski, and Lubotzky.

An Intuitive Explanation of the Discrete Fourier Transform

Every time I thought “Now I understand Fourier Transform,” I was wrong.

Doing the Fourier Transform of a function is just seeing it from “another” point of view. The “usual” view of a function is in the standard basis $\{e_1, \cdots, e_n\}$. For example, $f$ can be seen as a vector (in the basis given by the elements in the domain) whose coordinates are the evaluations of $f$ on the elements in the domain. It can also be seen as a polynomial (in the monomial basis) whose coefficients are these evaluations. Let us call this vector $u$.

The same function can also be seen in the “Fourier basis”, which is just another orthogonal basis, formed by the basis vectors $\{v_t\}_t$. The $t$th coordinate in the new basis is given by the inner product between $u$ and the $t$th basis vector $v_t$. We call these inner products the Fourier coefficients. The Discrete Fourier Transform matrix (the DFT matrix) “projects” a function from the standard basis to the Fourier basis in the usual sense of projection: taking the inner product along a given direction.
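In code, the “projection” picture is literal: each Fourier coefficient is the inner product of $u$ with one basis vector, and under numpy’s sign and normalization conventions this reproduces `numpy.fft.fft` exactly. (The vector `u` is an arbitrary example of mine.)

```python
import numpy as np

n = 8
k = np.arange(n)
# Rows of the DFT matrix are the (conjugated) Fourier basis vectors
# v_t[k] = exp(-2*pi*i*t*k/n).
F = np.exp(-2j * np.pi * np.outer(k, k) / n)

u = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0])
fourier_coeffs = F @ u   # inner product of u with each basis vector

assert np.allclose(fourier_coeffs, np.fft.fft(u))
# The basis really is orthogonal: rows of F are pairwise orthogonal
# with squared norm n.
assert np.allclose(F @ F.conj().T, n * np.eye(n))
```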

In this post, I am going to use elementary group theoretic notions, polynomials, matrices, and vectors. The ideas in this post will be similar to this Wikipedia article on Discrete Fourier Transform.

Why x^p-1 Factors over GF(2) the Way It Does

Let $p$ be a prime number. We know that $F_2=GF(2)=\{0,1\}$. Let $F_2[x]$ be the ring of polynomials with indeterminate $x$ and coefficients in $F_2$. The polynomial $(x^p-1) \in F_2[x]$ factors over $F_2$ as follows for various $p$:

$x^3-1=(1+x) (1+x+x^2).$
$x^5-1=(1+x) (1+x+x^2+x^3+x^4).$
$x^7-1=(1+x) (1+x+x^3) (1+x^2+x^3).$
$x^{11}-1=(1+x) (1+x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^{10}).$
$x^{13}-1=(1+x) (1+x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^{10}+x^{11}+x^{12}).$
$x^{17}-1=(1+x) (1+x^3+x^4+x^5+x^8) (1+x+x^2+x^4+x^6+x^7+x^8).$

None of the factors above can be factored any further; hence they are irreducible over $F_2$. Let us call $x+1$ the trivial factor, since its root $x= 1$ already belongs to $F_2$. But why do we have two nontrivial irreducible factors of $x^7-1$, each of degree 3, whereas $x^{13}-1$ has only one nontrivial irreducible factor, of degree 12? It appears that either there is only one nontrivial factor, or all nontrivial factors have the same degree. Why?
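The pattern in the examples above is governed by the multiplicative order of $2$ mod $p$: every nontrivial factor has degree $r = \mathrm{ord}_p(2)$, and there are $(p-1)/r$ of them. A quick check of this against the factorizations listed (the helper is mine):

```python
def mult_order(a, p):
    """Multiplicative order of a modulo p."""
    k, v = 1, a % p
    while v != 1:
        v = v * a % p
        k += 1
    return k

# (p, (degree of each nontrivial factor, number of nontrivial factors)),
# read off from the factorizations above.
for p, expected in [(3, (2, 1)), (5, (4, 1)), (7, (3, 2)),
                    (11, (10, 1)), (13, (12, 1)), (17, (8, 2))]:
    r = mult_order(2, p)
    s = (p - 1) // r
    assert (r, s) == expected
```

So $x^7-1$ splits into two cubics because $2^3 \equiv 1 \pmod 7$, while $x^{13}-1$ keeps one degree-12 factor because $2$ is a primitive root mod 13.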

The Structure of a Quotient Ring modulo x^p-1

Let $f(X)=X^p-1 \in F_l[X]$, where $p$ is a prime and $l$ is a prime different from $p$ (so that $f$ is separable). We are interested in the structure of the ring $R=F_l[X]/(f)$. Via the Chinese Remainder Theorem, this question is equivalent to finding a factorization of $f$ over $F_l$. Suppose $f$ factors over $F_l$ as

$\displaystyle f(X) = (X-1) \prod_{i=1}^{s}{q_i(X)} \qquad (*)$

where each $q_i$ is an irreducible polynomial. What are the degrees of $q_1, \cdots, q_s$? What is $s$?

We claim that $R=F_l \oplus (F_{l^r})^s$, where $r$ is the multiplicative order of $l$ mod $p$ and $s = (p-1)/r$.
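A minimal dimension check of this claim, for a few choices of $l$ and $p$ (the pairs are my own examples): as an $F_l$-vector space, $F_l \oplus (F_{l^r})^s$ has dimension $1 + rs$, which must equal $p = \deg f$.

```python
def mult_order(a, p):
    """Multiplicative order of a modulo p."""
    k, v = 1, a % p
    while v != 1:
        v = v * a % p
        k += 1
    return k

for l, p in [(2, 7), (3, 7), (3, 11), (5, 13)]:
    r = mult_order(l, p)     # common degree of the nontrivial factors q_i
    s = (p - 1) // r         # how many of them there are
    assert 1 + r * s == p    # F_l (+) (F_{l^r})^s has F_l-dimension p
```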

This structure has been used in a beautifully instructive paper by Evra, Kowalski, and Lubotzky. (Yes, the Lubotzky of Lubotzky–Phillips–Sarnak.) In this paper, they have established connections among the Fourier transform, the uncertainty principle, and the dimensions of ideals generated by polynomials in the ring $R$ above. We’ll talk about these connections in another post. The content of this post is Proposition 2.6(3) from their paper.