Linear solvers

In the implicit scheme, one ends up with a large sparse block system \mathbf{A} \vec{x} = \vec{b}. For a typical mesh size of ~2 million cells, this adds up to a 10 million \times 10 million system. One can choose to solve the system either directly (through Gaussian elimination) or iteratively. Direct solvers solve the system in a fixed number of steps and are very robust. However, the number of operations scales as \mathcal{O} (n^3), which can be extremely costly. Typically one wants to keep n<10^{4} for direct solvers. Hence, iterative methods are the preferred choice in typical CFD applications. In addition, preconditioners are commonly used to accelerate the solution. We will first go through one of the more modern families of iterative methods, Krylov methods, before talking about algebraic multigrid (AMG) preconditioners.
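
As a quick illustration of the two routes, here is a sketch with SciPy on a toy tridiagonal system (a stand-in, not a real CFD matrix; the size and solver defaults below are arbitrary):

# Sketch: direct vs. iterative solution of a sparse system with SciPy.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# Toy diagonally dominant tridiagonal matrix, stored in sparse format
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

x_direct = spla.spsolve(A, b)          # direct solve via sparse LU factorization
x_iter, info = spla.gmres(A, b)        # iterative Krylov solve (GMRES)

print(info)                            # 0 means GMRES reached its tolerance
print(np.linalg.norm(x_direct - x_iter))   # the two answers agree to the solver tolerance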

Krylov methods

It took me a while to wrap my head around Krylov-type linear solvers. This is because most sources start by saying that the method searches for a solution in the Krylov subspace:

    \begin{align*} \mathcal{K}_m(\mathbf{A}, \vec{r}_0) = span \{ \vec{r}_0, \mathbf{A}\vec{r}_0 , \mathbf{A}^2\vec{r}_0 , \dots , \mathbf{A}^{m-1}\vec{r}_0 \} \end{align*}

where \vec{r}_0 = \vec{b} - \mathbf{A} \vec{x}_0, with \vec{x}_0 being the first guess for \vec{x}. Now, I do not have a mathematics degree, so looking at the notation above gives me a headache. Only after further reading did I learn that any vector which can be expressed as a linear combination of a set of vectors is said to be in the span of that set. Basically, if you can express your vector as a linear combination of \{ \vec{r}_0, \mathbf{A}\vec{r}_0, \dots, \mathbf{A}^{m-1}\vec{r}_0 \}, your vector is said to belong to the order m Krylov subspace of \mathbf{A} and \vec{r}_0. As a further illustration:

    \begin{align*} &\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix} \quad , \quad \vec{r} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad , \quad \Rightarrow \mathcal{K}_3(\mathbf{A}, \vec{r}) = span \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 9 \\ 4 \end{bmatrix} \right\},\\ &Let\ \vec{q} = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 4 \\ 1 \end{bmatrix} - \begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} -3 \\ 1 \end{bmatrix} \in \mathcal{K}_3(\mathbf{A}, \vec{r}) . \end{align*}

So we can say that \vec{q} belongs to the order 3 Krylov subspace.
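
For the skeptical, the little example above can be checked numerically (a NumPy sketch):

# Sketch: verify the 2x2 Krylov example with NumPy.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 0.0]])
r = np.array([1.0, 2.0])

# Basis of the order 3 Krylov subspace: {r, A r, A^2 r}
K = np.column_stack([r, A @ r, A @ A @ r])
print(K)                              # columns [1, 2], [4, 1], [9, 4]

# q = 2 r + A r - A^2 r is a linear combination of the basis, hence q is in K_3(A, r)
q = K @ np.array([2.0, 1.0, -1.0])
print(q)                              # [-3, 1]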

Okay, so the next question is: why, of all possible places, do we want to search for \vec{x} in the Krylov subspace? One has to look at the minimal polynomial of an n\times n matrix \mathbf{A} to understand this. Suppose we were to keep taking powers of \mathbf{A} and form the polynomial:

    \begin{align*} p = \alpha_0 \mathbf{I} + \alpha_1 \mathbf{A} + \alpha_2 \mathbf{A}^2 + \alpha_3 \mathbf{A}^3 + \dots + \alpha_k \mathbf{A}^k. \end{align*}

When we reach the smallest k for which some combination of the coefficients \alpha_i makes the polynomial vanish, k is said to be the degree of the minimal polynomial of \mathbf{A}. As an example, for a minimal polynomial of degree 3:

    \begin{align*} 0 = \alpha_0 \mathbf{I} + \alpha_1 \mathbf{A} + \alpha_2 \mathbf{A}^2 + \alpha_3 \mathbf{A}^3. \end{align*}

Multiplying throughout by \mathbf{A}^{-1}, then by \vec{b},

    \begin{align*} 0 = \alpha_0 \mathbf{A}^{-1} + \alpha_1 \mathbf{I} + \alpha_2 \mathbf{A} + \alpha_3 \mathbf{A}^2 \\ \Rightarrow \mathbf{A}^{-1} = -\frac{1}{\alpha_0} \left( \alpha_1 \mathbf{I} + \alpha_2 \mathbf{A} + \alpha_3 \mathbf{A}^2 \right) \\ \mathbf{A}^{-1}\vec{b} = -\frac{1}{\alpha_0} \left( \alpha_1 \vec{b} + \alpha_2 \mathbf{A}\vec{b} + \alpha_3 \mathbf{A}^2\vec{b} \right) \in \mathcal{K}_3(\mathbf{A}, \vec{b}). \end{align*}

Aha! So \vec{x} can be expressed in terms of \mathcal{K}_k(\mathbf{A}, \vec{b}), where k is the degree of the minimal polynomial of \mathbf{A}. Now, there is a theorem stating that for an n \times n matrix, k \leq n. Hence the Krylov subspace is not a bad place to start looking for a solution. What it also means is that if we expand the space far enough, up to \mathcal{K}_n(\mathbf{A}, \vec{b}), we will arrive at the exact solution \vec{x}. Of course, the point of Krylov methods is to avoid that scenario, and hopefully get a satisfactory answer with as small a subspace as possible.
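
Here is a small numerical sketch of that claim: for a small matrix, the exact solution \mathbf{A}^{-1}\vec{b} is reproduced by a linear combination of the Krylov vectors \{ \vec{b}, \mathbf{A}\vec{b}, \dots, \mathbf{A}^{n-1}\vec{b} \} (the matrix and its size below are arbitrary choices):

# Sketch: the exact solution of A x = b lies in the order n Krylov subspace K_n(A, b).
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned test matrix
b = rng.standard_normal(n)

# Build the Krylov basis K = [b, A b, ..., A^(n-1) b] column by column
K = np.empty((n, n))
K[:, 0] = b
for j in range(1, n):
    K[:, j] = A @ K[:, j - 1]

x_exact = np.linalg.solve(A, b)
alpha, *_ = np.linalg.lstsq(K, x_exact, rcond=None)   # coefficients in the Krylov basis

print(np.linalg.norm(K @ alpha - x_exact))            # ~0 (round-off): x lies in K_n(A, b)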

As the subspace \mathcal{K}_m grows, we run into another issue: the new vectors become increasingly close to one another. If one were familiar with the power iteration method for finding eigenvectors, one would recognise that \mathbf{A}^m \vec{b} approaches the dominant eigenvector of \mathbf{A} as m \rightarrow \infty. To circumvent this problem, we have to orthogonalize the vectors in \mathcal{K}_m.

To get a feel for what that means, imagine we have 2 vectors at a 0.1^{\circ} angle to each other (i.e. \vec{e}_1 = [1, 0]^T and \vec{e}_2 = [cos(0.1^{\circ}), sin(0.1^{\circ})]^T). We can describe any point in (x, y) as a linear combination of \vec{e}_1 and \vec{e}_2. For example, the vector \vec{q} = [0,1]^T can be expressed as -572.9572\vec{e}_1 + 572.9581\vec{e}_2, with \alpha_1 = -572.9572 and \alpha_2 = 572.9581. The coefficients were found by solving a system of equations:

    \begin{align*} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} \vec{e}_1,\vec{e}_2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \end{align*}

So we see that the coefficients are sensitive to the concatenated \vec{e} matrix. As you can imagine, the closer \vec{e}_1 and \vec{e}_2 are to each other, the worse the condition number of the matrix. If we lived in a world with unlimited floating-point precision, this would not be a problem at all (and many other problems which keep mathematicians up at night would disappear too). However, we still live in a world where unicorns are nonexistent, so we have to think of ways to circumvent this. One way would be to orthogonalize the vectors (i.e. let \vec{e}_1 = [1, 0]^T and \vec{e}_2 = [0, 1]^T), since the condition number of an orthonormal matrix is 1. The new vectors are still able to describe any point in the (x,y) plane, so we do not lose any territory with the change in basis.
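
Here is the 0.1^{\circ} example worked out in NumPy, showing the blown-up condition number of the nearly parallel basis versus the orthonormal one:

# Sketch: coefficients and condition number for a nearly parallel basis.
import numpy as np

theta = np.deg2rad(0.1)
e1 = np.array([1.0, 0.0])
e2 = np.array([np.cos(theta), np.sin(theta)])

E = np.column_stack([e1, e2])
alpha = np.linalg.solve(E, np.array([0.0, 1.0]))

print(alpha)                          # approx [-572.9572, 572.9581]
print(np.linalg.cond(E))              # roughly 1146 for a 0.1 degree angle
print(np.linalg.cond(np.eye(2)))      # orthonormal basis: condition number 1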

Alright! Now that we have covered the gist of Krylov methods, we can take a closer look at one of the algorithms: the Generalized Minimal Residual (GMRES) method. In this method, the basis vectors of the Krylov subspace are orthogonalized by the Arnoldi iteration. Stacking these orthogonalized basis vectors side by side results in the \mathbf{Q} matrix, which is n \times m in size for the order m Krylov subspace. The new basis vector for the order m+1 subspace is calculated from:

    \begin{align*} \mathbf{A} \mathbf{Q}_{m} = \mathbf{Q}_{m+1} \mathbf{\tilde{H}}_{m}. \end{align*}

\mathbf{\tilde{H}}_m is known as a Hessenberg matrix and is (m+1) \times m in size. Remember that the columns of \mathbf{Q} are orthonormal, so \mathbf{Q}^T\mathbf{Q} = \mathbf{I}. If we look for a solution of the form \vec{x} = \mathbf{Q}_{m}\vec{y} (taking \vec{x}_0 = \vec{0} so that \vec{r}_0 = \vec{b}), our \mathbf{A}\vec{x} = \vec{b} system can be rewritten as:

    \begin{align*} \mathbf{A} \mathbf{Q}_{m} \vec{y} &= \vec{b} \\ \Rightarrow \mathbf{Q}_{m+1} \mathbf{\tilde{H}}_{m} \vec{y} &= \vec{b} \\ \mathbf{\tilde{H}}_{m} \vec{y} &= \mathbf{Q}_{m+1}^T \vec{b}. \end{align*}

Note that since \mathbf{\tilde{H}}_m is an (m+1) \times m matrix, we have an overdetermined system. We now solve a least squares problem for \vec{y}:

    \begin{align*} min \left\Vert \mathbf{\tilde{H}}_{m} \vec{y} - \mathbf{Q}_{m+1}^T \vec{b} \right\Vert = min \left\Vert \vec{r}_m \right\Vert. \end{align*}

Hence the name Minimal Residual. The estimate \vec{x}_m is recovered from \vec{y} via \vec{x}_m = \mathbf{Q}_{m} \vec{y}, so we see that \vec{y} holds the coefficients of the orthogonalized versions of \{ \vec{b}, \mathbf{A}\vec{b}, \mathbf{A}^2\vec{b}, \dots , \mathbf{A}^{m-1}\vec{b} \}. In the algorithm, one starts from m=1, then expands the space with every iteration. As m increases, the sizes of \mathbf{Q} and \mathbf{\tilde{H}} grow, potentially overtaking the memory requirements of the original sparse \mathbf{A} matrix. Hence, most implementations restart the process once m gets large enough (usually around m = 30), using the latest estimate of \vec{x} as the new initial guess.
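
To make this concrete, here is a bare-bones sketch of GMRES in NumPy (my own toy implementation, not a library routine): it builds the orthonormal basis with the Arnoldi iteration and solves the small least squares problem with a dense solver. Real implementations instead update a QR factorization of \mathbf{\tilde{H}} with Givens rotations and add restarts and preconditioning; the function name and test matrix below are purely for illustration.

# Sketch: GMRES without restarts or preconditioning, built from the Arnoldi iteration.
import numpy as np

def gmres_sketch(A, b, m_max=30, tol=1e-8):
    n = b.size
    beta = np.linalg.norm(b)
    Q = np.zeros((n, m_max + 1))
    H = np.zeros((m_max + 1, m_max))
    Q[:, 0] = b / beta                         # q_1 = r_0 / ||r_0||, taking x_0 = 0

    for m in range(m_max):
        # Arnoldi step: orthogonalize A q_m against all previous basis vectors
        v = A @ Q[:, m]
        for j in range(m + 1):
            H[j, m] = Q[:, j] @ v
            v -= H[j, m] * Q[:, j]
        H[m + 1, m] = np.linalg.norm(v)
        if H[m + 1, m] > 1e-14:
            Q[:, m + 1] = v / H[m + 1, m]

        # Least squares problem: min_y ||H_tilde y - Q_(m+1)^T b||, with Q_(m+1)^T b = beta e_1
        rhs = np.zeros(m + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 2, :m + 1], rhs, rcond=None)
        residual = np.linalg.norm(H[:m + 2, :m + 1] @ y - rhs)
        if residual < tol * beta:
            break

    return Q[:, :y.size] @ y, residual          # x_m = Q_m y

# Usage on a small diagonally dominant test matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)
b = rng.standard_normal(50)
x, res = gmres_sketch(A, b)
print(np.linalg.norm(A @ x - b))               # should be small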

Algebraic multigrid preconditioners

As the name suggests, there are 2 main concepts to this: algebraic multigrid methods (AMG) and preconditioners. We will go through each concept individually before piecing them together.

We know that iterative linear solvers take an initial guessed solution \vec{x}_0 and improve on it to get the final answer. As a convenient guess, we can set \vec{x}_0 = \vec{0} or \vec{x}_0 = \vec{b}. This would result in a noisy residual vector \vec{r}_0 = \vec{b} - \mathbf{A}\vec{x}_0. With fixed-point iteration methods like Jacobi or Gauss-Seidel, the noise in the residual gets damped down really quickly (usually within 2 to 3 iterations). After the first few iterations, the error becomes smooth and subsequent iterations drive it towards zero extremely slowly. Such methods are really effective at smoothing out spikes in the error, but perform terribly thereafter.
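
To see this behaviour concretely, here is a small sketch: weighted Jacobi sweeps applied to a purely spiky error and a purely smooth error on a 1D Poisson matrix (the mesh size, damping factor and number of sweeps are just illustrative choices):

# Sketch: weighted Jacobi damps high-frequency error fast but barely touches smooth error.
import numpy as np

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson stencil [-1, 2, -1]
d_inv = 1.0 / np.diag(A)
omega = 2.0 / 3.0                                      # classic damping factor for this problem

def jacobi_sweeps(e, sweeps=3):
    # Smooth the homogeneous problem A e = 0, so the iterate is exactly the error
    for _ in range(sweeps):
        e = e - omega * d_inv * (A @ e)
    return e

x = np.arange(1, n + 1) / (n + 1)
e_spiky = np.sin(np.pi * n * x)       # highest-frequency mode on this grid
e_smooth = np.sin(np.pi * x)          # lowest-frequency mode

print(np.linalg.norm(jacobi_sweeps(e_spiky)) / np.linalg.norm(e_spiky))    # ~0.04: damped ~27x
print(np.linalg.norm(jacobi_sweeps(e_smooth)) / np.linalg.norm(e_smooth))  # ~0.998: barely moves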

Hence, the idea of multigrid methods is to transfer the smoothed error from one grid to another such that it becomes spiky again (albeit with reduced amplitude) on the new grid. The same smoothing operation is carried out on the new grid and the process continues recursively. After a few repetitions, the process reverses and we end up with a small final error vector in relatively few iterations. So how do we make the smooth error become spiky again?

There! As we can see from the diagram I took from Saad’s book, by removing every alternate point in the original grid, we get an oscillatory error on the new grid. The error is then smoothed (usually with only 2 to 3 Jacobi or Gauss-Seidel iterations) on this new grid, transferred to an even coarser grid, and so on. The diagram shows the original geometric multigrid method, where the grids were defined physically. For the unstructured meshes commonly used, it becomes challenging to create physically coarser and coarser meshes. Hence, methods to do this algebraically (without considering the physical manifestation of the mesh) were heavily explored, resulting in algebraic multigrid methods.

To understand this a little more, we define the error term \vec{e} in terms of the actual solution \vec{x} and estimated solution \hat{\vec{x}}:

    \begin{gather*} \vec{e} = \vec{x} - \hat{\vec{x}} , \\ \mathbf{A}\left(\vec{x}-\hat{\vec{x}}\right) + \mathbf{A}\hat{\vec{x}} = \vec{b} \\ \Rightarrow \mathbf{A}\vec{e} = \vec{b} - \mathbf{A}\hat{\vec{x}} = \vec{r}. \end{gather*}

We see that by solving the linear system with the residual vector \vec{r} on the RHS, we get the error term, which leads us from \hat{\vec{x}} to the actual solution. As we cycle through the grids, we get a more refined estimate of \vec{e}. The trick in algebraic multigrid methods is in manipulating the system without the need for any geometric information, allowing us to treat the process as a “black box”, which is an engineer’s favorite approach. In the algebraic method, we transfer results from the fine mesh to the coarse one (the restriction operation) by using a short and fat interpolation matrix known as the restriction matrix (\mathbf{I}_f^c):

    \begin{align*} \vec{r}^{c} = \mathbf{I}_f^c  \vec{r}^{f} . \end{align*}

In the reverse direction (the prolongation operation), we use a tall and skinny interpolation matrix \mathbf{I}_c^f = \left(\mathbf{I}_f^c\right)^T, termed the prolongation matrix; the restriction and prolongation matrices are simply transposes of each other. Hence, when we want to cycle to the next coarser mesh, after dropping the fine-to-coarse notation for the short and fat matrix \mathbf{I}_f^c, we perform:

    \begin{align*} \left( \mathbf{I} \mathbf{A} \mathbf{I}^T \right) \vec{e}^{c} = \mathbf{I}  \vec{r}^{f} \\ \vec{e}^{f} = \mathbf{I}^T \vec{e}^{c} . \end{align*}

This is known as a Galerkin projection. We see that after solving the coarse system of equations, we get the error on the fine mesh by using the interpolation matrix again. The interpolation matrix is thus a key ingredient in AMG methods. Here is some pseudocode that shows how AMG works between 2 grids:

x_f = smooth(A, b, x_0);              // Initial guess, 2-3 iterations on x_0
r_f = b - A*x_f;                      // Find residual
r_c = I*r_f;                          // Interpolate residual to coarse mesh
e_c = solve(I*A*I_transpose, r_c);    // Solve for error (or recursion)
e_f = I_transpose*e_c;                // Prolongate error back onto the fine mesh
x_f += e_f;                           // Update guess
x_f = smooth(A, b, x_f);              // Additional smoothing, 2-3 iterations on x_f

At the finest grid, we perform a smoothing on the initial guess x_0. This means we “solve” A*x_f = b using a fixed-point iteration like damped Jacobi or Gauss-Seidel with x_0 as the initial guess. By “solve”, I mean we stop after only 2 to 3 iterations. Then, we interpolate the residual to the next mesh before solving the projected system (I*A*I_transpose)*e_c = r_c. In a 2-grid setup, we solve that system for real. Once that is done, we go in the reverse direction: transfer the error term back to the fine mesh, update the solution, then perform another smoothing pass with the updated solution as the initial guess.

With more grids, we simply recurse at the solve step until we reach the coarsest grid, where a direct solver is used. At the end of the process, if the residual is not satisfactory, the obtained x_f is used as the new x_0, and the process loops until convergence. This process is known as a V-cycle.
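
To make the pseudocode concrete, here is a minimal runnable two-grid sketch in Python. It assumes a 1D Poisson model problem and a hand-built full-weighting restriction matrix standing in for \mathbf{I}; a real AMG code constructs \mathbf{I} algebraically, as discussed next.

# Sketch: one two-grid cycle on a 1D Poisson problem, mirroring the pseudocode above.
import numpy as np

def poisson1d(n):
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def smooth(A, b, x, sweeps=3, omega=2.0/3.0):
    # Damped Jacobi: the "smooth" step of the pseudocode
    d_inv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        x = x + omega * d_inv * (b - A @ x)
    return x

n_c = 31                                 # coarse interior points
n_f = 2 * n_c + 1                        # fine interior points
A = poisson1d(n_f)
b = np.ones(n_f)

# Full-weighting restriction: coarse node i sits on fine node 2i+1
I_fc = np.zeros((n_c, n_f))
for i in range(n_c):
    I_fc[i, 2*i:2*i + 3] = [0.25, 0.5, 0.25]

A_c = I_fc @ A @ I_fc.T                  # Galerkin coarse operator I A I^T

x_f = smooth(A, b, np.zeros(n_f))        # pre-smoothing on the initial guess
r_f = b - A @ x_f                        # fine-grid residual
r_c = I_fc @ r_f                         # restrict residual to the coarse grid
e_c = np.linalg.solve(A_c, r_c)          # coarse solve (the recursion point for more grids)
x_f = x_f + I_fc.T @ e_c                 # prolongate the error and correct
x_f = smooth(A, b, x_f)                  # post-smoothing

print(np.linalg.norm(b - A @ x_f) / np.linalg.norm(b))   # relative residual after one cycle, well below 1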

The algebraic part of the process comes in the form of constructing \mathbf{I}. When solving a PDE with finite difference/volume/element methods, a node’s neighbors produce nonzero entries in that node’s row of the coefficient matrix. This can be used heuristically to group nodes together for the coarse mesh. A classical approach is the Ruge-Stuben method, which uses the strength of connection between each node and the off-diagonal entries in its row as the grouping criterion. A very detailed set of notes by Falgout from Lawrence Livermore National Laboratory, explaining the method and AMG in general, can be found here.
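
As a rough sketch of what “strength of connection” means in the classical Ruge-Stuben setting: node i is taken to depend strongly on node j if -a_{ij} \geq \theta \max_{k \neq i} (-a_{ik}), with \theta typically around 0.25. The 4 \times 4 matrix below is just a toy with one strong and one weak coupling direction.

# Sketch: classical strength-of-connection test used for Ruge-Stuben coarsening.
import numpy as np

def strong_connections(A, theta=0.25):
    n = A.shape[0]
    strong = []
    for i in range(n):
        off_diag = -A[i, :].copy()
        off_diag[i] = -np.inf                       # ignore the diagonal entry
        threshold = theta * off_diag.max()
        strong.append([j for j in range(n)
                       if j != i and -A[i, j] >= threshold])
    return strong

# Toy matrix: strong (-1.0) coupling in one direction, weak (-0.01) in the other
A = np.array([[ 2.02, -1.0 , -0.01,  0.0 ],
              [-1.0 ,  2.02,  0.0 , -0.01],
              [-0.01,  0.0 ,  2.02, -1.0 ],
              [ 0.0 , -0.01, -1.0 ,  2.02]])

print(strong_connections(A))    # each node is strongly connected only to its -1.0 neighbour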
