Inverting PDEs with adjoints

Most books and lectures spend a great deal of time teaching methods for solving PDEs – finding a solution given the equation and its coefficients. In the real world, however, we rarely have precise knowledge of the parameters of the model. In fact, we sometimes perform experiments to measure the solution of the PDE and must recover the coefficients from the results – an inverse problem.

One such example is the estimation of the earth’s electrical resistivity in the subsurface. Below is a figure taken from a paper here.

The electrical potential of the earth (\phi) is governed by the Poisson equation:

    \begin{align*} \nabla \cdot \left(\sigma \nabla \phi\right) = s \end{align*}

In the example above, the electrical potential is measured at probes inserted into the ground (\phi_{obs}) and a mapping of the conductivity \sigma is to be found. So we have the solution to the PDE at discrete probe locations, and the challenge is to find the coefficient \sigma. One might immediately think of a least-squares problem of the sort:

    \begin{align*} \min_{\sigma} &\sum_i \left( \phi_i - \phi_{obs,i} \right)^2 \\ \text{subject to } &\nabla \cdot \left(\sigma \nabla \phi\right) = s \end{align*}

In order to solve such a minimization, one could use gradient descent, and this would require us to calculate the gradient of the cost function with respect to the conductivity. This is where adjoint methods come in.

Adjoint methods
In a more general case, the minimization problem is:

    \begin{align*} \min_{\beta}\ &J\left(\vec{u},\vec{\beta}\right) \\ \text{subject to}\ &N\left(\vec{u},\vec{\beta}\right) = 0 \text{ in}\ \Omega \\ \text{with}\ &a \frac{\partial \vec{u}}{\partial n} + b \vec{u} = g \text{ on}\ \partial \Omega \end{align*}

Where we are minimizing a cost function J over the design variables \vec{\beta}, and \vec{u} is the primal variable governed by the PDE operator N with a general Robin condition imposed on the boundary. The derivative (or sensitivity) \frac{d J}{d \vec{\beta}} has to be calculated.

If we only have one design variable, we can simply perturb it and solve the PDE twice to get the sensitivity by finite difference. However, \vec{\beta} usually contains a large number of elements, and simply using finite differences becomes intractable. For example, in the earth conductivity problem one may want a unique \sigma at every discrete mesh node, in which case the length of \vec{\beta} grows to the total number of nodes.
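For a single design variable, the perturb-and-resolve idea looks like the sketch below. The “PDE” here is a made-up scalar stand-in (\beta u = 1, so each solve is trivial), not the earth-conductivity problem; the point is simply that every additional design variable costs one more solve.

```python
# Finite-difference sensitivity for a single design variable.
# Hypothetical stand-in model: the "PDE" is beta * u = 1, so a
# "solve" is just u = 1 / beta.

def solve_pde(beta):
    return 1.0 / beta

def cost(beta, u_obs=2.0):
    u = solve_pde(beta)
    return (u - u_obs) ** 2

beta = 1.0
eps = 1e-6

# One extra solve per design variable: J(beta + eps) needs a re-solve.
dJ_dbeta_fd = (cost(beta + eps) - cost(beta)) / eps

# Analytic check: dJ/dbeta = 2 (u - u_obs) du/dbeta, with du/dbeta = -1/beta^2
u = solve_pde(beta)
dJ_dbeta_exact = 2.0 * (u - 2.0) * (-1.0 / beta ** 2)
print(dJ_dbeta_fd, dJ_dbeta_exact)
```

With n_\beta parameters this becomes n_\beta + 1 solves per gradient evaluation, which is exactly the cost the adjoint method avoids.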

Using a mathematical trick, we can solve a (different) PDE once, then use the result to obtain as many sensitivities as we like at minimal additional cost. This trick can be applied to either the continuous or the discretised equations, giving the continuous adjoint formulation and the discrete adjoint formulation respectively. The discrete adjoint formulation is easier to grasp, so we dive into it with an example. We will also look at the continuous adjoint formulation later and see where it gets complicated.

Discrete adjoint formulation
Through the discretisation process, the original PDE is reduced to a set of residuals which we would like to drive to zero:

    \begin{align*} \vec{R}\left(\vec{u},\vec{\beta}\right) = 0 \end{align*}

This means that the residual is identically zero, and so any derivative of it is also zero. Linearising this residual with respect to the i^{th} \beta:

    \begin{align*} \frac{d \vec{R}}{d \beta_i} = 0 &= \frac{\partial \vec{R}}{\partial \beta_i} + \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right) \frac{\partial \vec{u}}{\partial \beta_i} \\ \Rightarrow \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right) \frac{\partial \vec{u}}{\partial \beta_i} &= -\frac{\partial \vec{R}}{\partial \beta_i} \end{align*}

From the objective function, the sensitivity is:

    \begin{align*} \frac{d J}{d \beta_i} = \frac{\partial J}{\partial \beta_i} + \left(\frac{\partial J}{\partial \vec{u}}\right)^\intercal \frac{\partial \vec{u}}{\partial \beta_i} \end{align*}

This is what we are really after: it tells us how the objective function changes with the design variables and is what drives the minimization. Note that the difficulty lies in the second term, \partial \vec{u}/ \partial \beta_i, because as the first equation shows, it requires us to solve the linear system for n_\beta different right-hand sides.

Now we introduce a variable \vec{\phi} which is defined as:

    \begin{align*} \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right)^\intercal \vec{\phi} = \frac{\partial J}{\partial \vec{u}} \end{align*}

\vec{\phi} is going to be extremely helpful to us because the problematic second term is now:

    \begin{align*} \left(\frac{\partial J}{\partial \vec{u}}\right)^\intercal \frac{\partial \vec{u}}{\partial \beta_i} = \left[\left(\frac{\partial \vec{R}}{\partial \vec{u}}\right)^\intercal \vec{\phi} \right]^{\intercal} \frac{\partial \vec{u}}{\partial \beta_i} = \vec{\phi}^{\intercal} \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right)\frac{\partial \vec{u}}{\partial \beta_i} = \vec{\phi}^{\intercal} \left(-\frac{\partial \vec{R}}{\partial \beta_i}\right) \end{align*}

Therefore the sensitivity now becomes:

    \begin{align*} \frac{d J}{d \beta_i} = \frac{\partial J}{\partial \beta_i} + \vec{\phi}^{\intercal} \left(-\frac{\partial \vec{R}}{\partial \beta_i}\right) \end{align*}

Where \partial J / \partial \beta_i and \partial \vec{R} / \partial \beta_i are easy and cheap to calculate, and we only need to solve the linear system once for \vec{\phi}, which is known as the adjoint solution.
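To see the saving concretely, here is a self-contained sketch comparing the forward route (one linear solve per \beta_i) against the adjoint route (one transposed solve in total) on a hypothetical 2×2 “discretised PDE” \vec{R}\left(\vec{u},\vec{\beta}\right) = A\vec{u} - \vec{\beta} = 0 with J = \vec{c}\cdot\vec{u}; the numbers in A and \vec{c} are arbitrary.

```python
# Forward vs adjoint sensitivities on a tiny linear "discretised PDE":
#   R(u, beta) = A u - beta = 0,  J = c . u
# so dR/du = A and dR/dbeta_i = -e_i (the i-th unit vector).

def solve2(M, rhs):
    """Solve a 2x2 system M x = rhs by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x0 = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
    x1 = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
    return [x0, x1]

A = [[3.0, 1.0], [2.0, 5.0]]
At = [[3.0, 2.0], [1.0, 5.0]]   # transpose of A
c = [1.0, 4.0]                  # dJ/du

# Forward: one linear solve per design variable (n_beta solves).
forward = []
for i in range(2):
    rhs = [0.0, 0.0]
    rhs[i] = 1.0                # -dR/dbeta_i = e_i
    du_dbeta = solve2(A, rhs)
    forward.append(c[0] * du_dbeta[0] + c[1] * du_dbeta[1])

# Adjoint: a single solve with the transposed matrix, after which
# dJ/dbeta_i = phi . (-dR/dbeta_i) = phi_i for every i.
phi = solve2(At, c)
adjoint = [phi[0], phi[1]]

print(forward, adjoint)         # the two lists agree
```

Both routes give the same sensitivities, but the adjoint route used a single solve regardless of the number of design variables.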

Discrete adjoint example
Now we move on to the more interesting stuff with an example. Suppose we have a 2D metal plate and we want to find its conductivity, which varies with temperature. We conduct an experiment where the plate is subjected to a known heat flux (i.e. we heat up the middle of the plate) and we measure the temperature over the entire plate. The problem is thus:

    \begin{align*} \min_{k} J &= \min_{k} \sum_i \left( T_i - T_{obs,i} \right)^2 \\ \text{subject to } &\nabla \cdot \left(k \nabla T \right) = q \text{ in}\ \Omega\\ \text{with}\ T &= T_{bound} \text{ on}\ \partial \Omega \end{align*}

Discrete adjoint example truth values
We create some truth values with a known conductivity distribution.

The thermal conductivity follows

    \begin{align*} k = 13221 \times T^{-1} + 27 \end{align*}

Solving the heat equation using finite differences in Python gives us the temperature field, which we will use as our “experiment” result:

And the conductivity field we can use to assess our inverted results:

Discrete adjoint example solution method
Now on to solving the minimization problem. In this example, we allow k to vary at every grid point. The solution process is made up of 2 loops: an inner loop that solves the nonlinear system of equations for the primal solution using Newton’s method, and an outer loop where the adjoint solution is found and used to update the conductivity at every grid point.

An overview of the solution process is:

Initialize T and k 
Start adjoint loop
    Start Newton's iteration loop
        Assemble dR/dT matrix and R vector
        Solve for delta T and update T
    Compute dJ/dT
    Solve for adjoint solution using transpose of dR/dT matrix calculated before
    Compute dR/dk matrix
    Loop through nodes
        Compute dJ/dk for each node
        Update k with -dJ/dk
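The loop structure above can be sketched end-to-end on a toy problem. This is deliberately not the 2D plate code: the residual below is a hypothetical decoupled equation per node, R_i = k_i T_i + T_i^3 - 1, chosen so that \partial \vec{R}/\partial \vec{T} is diagonal and every “linear solve” is a scalar division, but the Newton inner loop, adjoint solve and conductivity update follow exactly the steps listed.

```python
# Two-loop adjoint inversion on a toy problem (hypothetical, not the
# 2D plate): one decoupled residual per node,
#   R_i = k_i * T_i + T_i**3 - 1 = 0,
# so dR/dT is diagonal, and J = sum_i (T_i - T_obs_i)**2.

T_obs = [0.6, 0.8]     # "measured" temperatures
k = [3.0, 3.0]         # initial guess for the design variables
T = [0.5, 0.5]
lr = 1.0               # gradient-descent step ("learning rate")

for outer in range(200):                      # adjoint (outer) loop
    # Newton's iteration loop for the primal solution
    for newton in range(20):
        for i in range(len(T)):
            R = k[i] * T[i] + T[i] ** 3 - 1.0
            dR_dT = k[i] + 3.0 * T[i] ** 2    # diagonal Jacobian
            T[i] -= R / dR_dT
    # adjoint solve and design update, node by node
    for i in range(len(T)):
        dJ_dT = 2.0 * (T[i] - T_obs[i])
        dR_dT = k[i] + 3.0 * T[i] ** 2
        phi = dJ_dT / dR_dT                   # (dR/dT)^T phi = dJ/dT
        dR_dk = T[i]
        dJ_dk = phi * (-dR_dk)                # dJ/dk_i = phi^T (-dR/dk_i)
        k[i] -= lr * dJ_dk

J = sum((T[i] - T_obs[i]) ** 2 for i in range(len(T)))
print(k, J)
```

At convergence, T matches T_obs and each k_i approaches the value that makes R vanish there, i.e. k_i = (1 - T_{obs,i}^3)/T_{obs,i}.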

Newton’s method requires the calculation of the Jacobian matrix \partial \vec{R}/\partial \vec{T}, while the sensitivity requires the calculation of the term \partial \vec{R}/\partial \vec{k}. Luckily, we can compute both using automatic differentiation of a single function. We define the following function to compute the residual at an internal grid point using central differences:

def compute_R_int_discrete_k(input_vec):
  # input_vec = [T_i,j  T_i-1,j  T_i+1,j  T_i,j-1  T_i,j+1  q  k_i,j  k_i-1,j  k_i+1,j  k_i,j-1  k_i,j+1]

  # k * laplacian(T), central differences
  R_int_term_1 = input_vec[6]*(
      (input_vec[2] - 2*input_vec[0] + input_vec[1])/(DELTA_H**2) +
      (input_vec[4] - 2*input_vec[0] + input_vec[3])/(DELTA_H**2)
  )

  # grad(k) . grad(T), central differences
  R_int_term_2 = (
      (input_vec[8] - input_vec[7])/(2*DELTA_H)*
      (input_vec[2] - input_vec[1])/(2*DELTA_H)
  ) + (
      (input_vec[10] - input_vec[9])/(2*DELTA_H)*
      (input_vec[4] - input_vec[3])/(2*DELTA_H)
  )

  # source term: with the PDE written as div(k grad T) = q,
  # the residual is div(k grad T) - q
  R_int_term_3 = input_vec[5]

  return R_int_term_1 + R_int_term_2 - R_int_term_3

The required input values are passed in as a single array to facilitate the use of the grad function from the autograd package. Details of the package can be found here. The function to calculate the derivative is then created:

compute_R_int_discrete_k_grad = grad(compute_R_int_discrete_k)

The function compute_R_int_discrete_k_grad returns a numpy array of the derivatives of R with respect to the various inputs in input_vec. The derivative is calculated with one line of code. Perfect.
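Whether the derivatives come from autograd or are hand-coded, it is worth spot-checking them against finite differences. Below is a sketch on a simplified 1D interior residual (a hypothetical stand-in with a constant k on the stencil, not the 2D function above):

```python
# Spot-check an analytic derivative against finite differences.
# Simplified 1D interior residual (hypothetical stand-in):
#   R = k * (T_r - 2*T_c + T_l) / H**2 - q

H = 0.1   # grid spacing

def residual(T_c, T_l, T_r, k, q):
    return k * (T_r - 2.0 * T_c + T_l) / H ** 2 - q

def dR_dk(T_c, T_l, T_r):
    # analytic derivative of R with respect to k
    return (T_r - 2.0 * T_c + T_l) / H ** 2

args = (300.0, 295.0, 308.0)   # T_c, T_l, T_r
k, q, eps = 15.0, 50.0, 1e-6

fd = (residual(*args, k + eps, q) - residual(*args, k, q)) / eps
print(fd, dR_dk(*args))        # the two values agree
```

The same check can be looped over every entry of input_vec to validate a full Jacobian row.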

We now have one more quantity to calculate before we can solve for \vec{\phi}, and that is \partial J/\partial \vec{T}. This is simply:

    \begin{align*} \frac{\partial J}{\partial \vec{T}} = 2\left(\vec{T}-\vec{T}_{obs}\right) \end{align*}

Discrete adjoint example results
After trying a few “learning rates”, we are able to get the objective function to reduce nicely and achieve an RMS error of less than 4 degrees.

Below is a plot of the temperature and conductivity field.

It is difficult to get the conductivity to be as smooth as the truth because k is varied at each node individually, which introduces spikes in the first derivative.

However, in the plot above, we see that the optimised conductivity is very close to the truth.

One way we can smoothen the inverted field is to use basis functions to describe k. These can be functions of temperature or of space (i.e. let k=\sum \beta_i \psi_i\left(T\right) or k=\sum \beta_i \psi_i\left(x, y\right)). The design variable is now \beta, and minor changes have to be made to the calculation of R and J. With the use of automatic differentiation, the calculation of \partial R/\partial \beta and \partial J/\partial \beta is again trivial.

The complete set of Python scripts can be found here.

Lagrange multipliers
In the derivation of the discrete adjoint equation earlier, I brought up the equation \left(\partial \vec{R}/\partial \vec{u}\right)^\intercal \vec{\phi}=\partial J/\partial \vec{u} out of the blue, and it magically solved all our problems. Now let’s dig a little deeper and see where this term comes from.

In a constrained optimisation problem, we can form an augmented cost function known as the Lagrangian:

    \begin{align*} J_{a} = J + \vec{\phi}^\intercal \vec{R} \end{align*}

The additional term does not change the cost function, since \vec{R}=0 everywhere, and we are free to choose any vector for the Lagrange multiplier \vec{\phi}. Taking the derivative as before,

    \begin{align*} \frac{d J_a}{d \beta} &= \frac{\partial J}{\partial \vec{u}}\frac{d \vec{u}}{d \beta} + \frac{\partial J}{\partial \beta} + \vec{\phi}^\intercal \left(\frac{\partial \vec{R}}{\partial \vec{u}}\frac{d \vec{u}}{d \beta} + \frac{\partial \vec{R}}{\partial \beta}\right) \\ &= \left(\frac{\partial J}{\partial \vec{u}} + \vec{\phi}^\intercal \frac{\partial \vec{R}}{\partial \vec{u}}\right)\frac{d \vec{u}}{d \beta} + \frac{\partial J}{\partial \beta} + \vec{\phi}^\intercal \frac{\partial \vec{R}}{\partial \beta} \end{align*}

Ah! We see that if we make the term in the brackets on the last line vanish (which requires \left(\partial \vec{R}/\partial \vec{u}\right)^\intercal \vec{\phi} = -\partial J/\partial \vec{u}, the same adjoint system as before up to the sign of \vec{\phi}), we arrive at the same alternate calculation for the sensitivity as we derived previously. Hence, by solving for a particular vector \vec{\phi}, we totally change the way d J/d \beta is calculated, saving a whole lot of computational effort. This is where the elegance of adjoint methods shines through for me.

Continuous adjoint formulation
The same concept is used in the derivation of the continuous adjoint formulation, however, there are a few additional steps. We will look at the plate conductivity example problem in 1D to derive the continuous adjoint formulation. First, we start with the cost function in the continuous formulation:

    \begin{align*} J &= \int_{\Omega} cT dx \\ J_a &= \int_{\Omega} cT dx + \int_{\Omega} \phi \left[ \frac{\partial}{\partial x} \left( k \frac{\partial T}{\partial x} \right) - f \right] dx \end{align*}

Where c can be a penalty function, for example. We have diverged slightly from the example problem to make the derivation simpler. Taking the variation of the augmented cost function, and assuming that the variation commutes with the derivatives inside the integral,

    \begin{align*} \delta J_a &= \frac{\partial J_a}{\partial k} \delta k + \frac{\partial J_a}{\partial T} \delta T \\ \delta J_a &= \int_{\Omega} c \delta T dx + \int_{\Omega} \phi \left[\frac{\partial}{\partial x} \left( \delta k \frac{\partial T}{\partial x} \right) + \frac{\partial}{\partial x} \left( k \frac{\partial \delta T}{\partial x} \right)\right] dx \end{align*}

Now we want to move \delta T outside the derivative, so we apply integration by parts twice to the last term:

    \begin{align*} \int \phi \frac{\partial}{\partial x} \left( k \frac{\partial \delta T}{\partial x} \right) dx &= \left[\phi k \frac{\partial \delta T}{\partial x}\right]_{x_0}^{x_1} - \int k \left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial \delta T}{\partial x}\right) dx \\ &= \left[\phi k \frac{\partial \delta T}{\partial x}\right]_{x_0}^{x_1} - \left[ \delta T k \left(\frac{\partial \phi}{\partial x}\right)\right]_{x_0}^{x_1} + \int \frac{\partial }{\partial x}\left( k \frac{\partial \phi}{\partial x}\right) \delta T dx \end{align*}

Substituting it back into the augmented cost function,

    \begin{align*} \delta J_a &= \int_{\Omega} \left[ c + \frac{\partial }{\partial x}\left(k \frac{\partial \phi}{\partial x}\right) \right] \delta T dx + \int_{\Omega} \phi \frac{\partial}{\partial x} \left( \delta k \frac{\partial T}{\partial x} \right) dx + \text{boundary terms} \end{align*}

Now, this step must be familiar because we are going to eliminate the term in the square brackets, giving us the adjoint equation:

    \begin{align*} \frac{\partial }{\partial x}\left( k \frac{\partial \phi}{\partial x}\right) = - c \end{align*}

This makes the \delta T contribution to \delta J_a vanish, since we are zeroing the terms paired with \delta T. Now we want to get \delta k out of the derivative, so, you guessed it, we have to integrate the last term by parts once.

    \begin{align*} \int \phi \frac{\partial}{\partial x} \left( \delta k \frac{\partial T}{\partial x} \right) dx &= \left[\phi \delta k \frac{\partial T}{\partial x}\right]_{x_0}^{x_1} - \int \delta k \left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial T}{\partial x}\right) dx \end{align*}

Finally,

    \begin{align*} \delta J_a = -\int_{\Omega} \delta k \left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial T}{\partial x}\right)dx + \text{boundary terms} \end{align*}

Ignoring the boundary terms for now, this means that the scalar objective function varies with k through \left(\partial \phi / \partial x\right) \left(\partial T / \partial x\right) integrated over the domain. Hence the local variation of J with k is:

    \begin{align*} \frac{d J}{d k} = -\left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial T}{\partial x}\right) \end{align*}

Back to the boundary terms, we see that 3 terms popped out through the integration by parts process. In order for us to use the above relation to obtain the sensitivity, we have to pick appropriate boundary conditions for \phi such that the terms indeed vanish. This is similar to the natural boundary conditions that appear in FEM formulations, except that this time we have to handle them explicitly.

Whew, that was a lengthy process, but we finally derived the continuous adjoint equations. Note that this is only for a single-variable Poisson equation in 1D. One can imagine how much more complicated the derivation gets with more complicated PDEs, such as the coupled RANS equations with turbulence models. For every new equation, its corresponding adjoint has to be derived and implemented separately. The treatment of the boundary conditions is another matter that has to be handled delicately.

Note that in the discrete formulation there was no need to derive specific boundary conditions for the adjoint solution, and with the naive implementation through automatic differentiation we were able to code up an example relatively quickly. However, this approach does not scale well, as automatic differentiation uses much more memory and computational resources than the corresponding continuous implementation. Hence, the choice between the discrete and continuous adjoint formulations is a balance between programming effort and computational effort.

More adjoint examples
Adjoints are also used in CFD to do some really interesting things. Traditionally, they have been used to drive shape optimization and this functionality is available in most commercial and open source codes. For example, the objective function can be set to the aerodynamic efficiency of the model, and the sensitivity to the surface deformation is calculated. This involves a mesh morpher and additional terms in the sensitivity computation coming from the mesh deformation method. An example taken from here shows the results of running an optimisation cycle using adjoint methods on the back of a car. The baseline shape is on the right half and the optimised shape on the left. We can see that the optimised solution has a narrower boat tail and greater pressure recovery.

Another, more recent, example of adjoints in CFD is in the field of turbulence modelling. Machine learning has been applied to RANS models in order to improve their predictions over a larger range of flow regimes. A great paper is found here. The first step in training these models is gathering data, and these data are obtained from field inversion – much like what we are doing here.

An example of the results from a tuned turbulence model from the quoted paper is shown above. The model was improved from the green line to the red line in the figure on the left. The middle figure shows the flow field based on the baseline RANS model, while the rightmost figure shows the flow field based on the machine-learned turbulence model. The machine-learned model was able to predict the post-stall characteristics of the airfoil accurately, which is a major deficiency in most current RANS models.

Another interesting fact: adjoint and automatic differentiation techniques have even found their way into finance, where they are used to speed up computations of the sensitivities of swap options to asset prices and volatility.

Understanding turbulence

by joelcfdwebsite

The subject of turbulence modelling conjures up images of page-long equations that can be intimidating to the average reader (at least they are to me). The importance of understanding turbulence modelling was brought home to me in a recent project where the boundary layer makes up a critical part of the flow. I decided to take this opportunity to try to gain an intuitive understanding of turbulence.

Reynolds stresses from a statistical point of view

Most undergraduate courses introduce turbulence through the Reynolds-Averaged Navier-Stokes (RANS) equations and highlight the additional \left<u_i'u_j'\right> terms brought about by the Reynolds decomposition process. They then go on to explain the difficulty in closing the equations and introduce the Boussinesq approximation as a way of solving the problem in an engineering sense. This is a nice way of explaining turbulence in a nutshell, but what is turbulence?

The Reynolds stress tensor can actually be thought of as a covariance matrix between the velocity components. Recall that the covariance between two random quantities x and y is taken to be:

    \begin{align*} cov\left(x,y\right) = \frac{1}{N} \sum_n^N \left(x_n-\left<x\right>\right)\left(y_n-\left<y\right>\right) = \left<\left(x-\left<x\right>\right)\left(y-\left<y\right>\right)\right> \end{align*}

Also recall the Reynolds decomposition:

    \begin{align*} u_i &= \left<u_i\right> + u_i' \\ \Rightarrow \left<u_i'u_j'\right> &= \left<\left(u_i-\left<u_i\right>\right)\left(u_j-\left<u_j\right>\right)\right> = cov(u_i,u_j) \end{align*}

Hence we see how the Reynolds stress tensor can be thought of as a covariance matrix. So what does this seemingly statistical quantity have to do with stresses being applied on a fluid? First, we have to imagine a boundary on an infinitesimal cube of fluid.

Take the top surface for example. At an instant in time, the instantaneous flux of fluid crossing the boundary into the cube injects momentum in the x-direction (M_1):

    \begin{align*} \Delta M_1 = \rho \left(-\vec{u}\cdot \hat n\right)u_1 \Delta A = -\left(\rho u_2\right)u_1 \Delta A \end{align*}

Where the terms in parentheses represent the mass flux going into the cube. Likewise, flux leaving the cube can take positive/negative momentum along with it, reducing/increasing the momentum left in the cube. In turbulent flows, these fluxes fluctuate. Averaged over time:

    \begin{align*} \left<\Delta M_1\right> = -\rho \left<\left(\left<u_1\right>+u_1'\right)\left(\left<u_2\right>+u_2'\right)\right> \Delta A = -\rho \left( \left<u_1\right>\left<u_2\right> + \left<u_1'u_2'\right> \right) \Delta A \end{align*}

Where the \left<u_i\right>u_j' terms cancel in the averaging process. We see that the covariance terms add an additional momentum change on average. In more concrete terms, imagine we have a scatter of velocities on the top surface of the cube, as shown in the left plot:

The u_1 and u_2 velocities have mean values of 0 and have the following covariance matrix:

    \begin{align*} \begin{bmatrix} 1 & 0.75 \\ 0.75 & 1 \end{bmatrix} \end{align*}

The positive correlation between u_1 and u_2 tells us that there is a tendency for the fluid to have a negative x-velocity when the fluid enters the cube (i.e. u_1<0 when u_2<0), and a positive x-velocity when fluid is leaving the cube (i.e. u_1>0 when u_2>0). On average, fluid crossing the top boundary brings low momentum in and takes high momentum out, reducing the momentum in the cube.

The covariance measures how strong that tendency is, and hence how strongly the momentum is affected. This is exemplified in the plot on the right, which shows the \Delta M_1 term for each point on the scatter plot. The red line shows that the mean value of -u_1u_2 is -0.74 in this case. Theoretically, this value should equal the negative of the off-diagonal entry of the covariance matrix, which is -0.75.

Hence we can think of turbulence as the preference of the fluid to move in one direction in tandem with another, on average. Because of the fluid’s preference to fluctuate in a certain way, it carries an average momentum flux which is non-zero in general. The covariance of the fluctuations varies throughout the flow domain and between different flows, and is a difficult quantity to predict (if it can be predicted at all).
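The scatter-plot argument above is easy to reproduce numerically. The sketch below draws (u_1, u_2) pairs with zero mean, unit variance and covariance 0.75 (matching the covariance matrix quoted earlier) and checks that the mean momentum-flux term -\left<u_1 u_2\right> comes out near -0.75:

```python
# Draw (u1, u2) samples with zero mean, unit variance and
# covariance 0.75, then compute the mean momentum-flux term -<u1 u2>.
import random

random.seed(0)
rho_uv = 0.75          # target covariance (equals the correlation here)
N = 200_000

flux_sum = 0.0
for _ in range(N):
    u2 = random.gauss(0.0, 1.0)
    # u1 correlated with u2: cov(u1, u2) = rho_uv, var(u1) = 1
    u1 = rho_uv * u2 + (1.0 - rho_uv ** 2) ** 0.5 * random.gauss(0.0, 1.0)
    flux_sum += -u1 * u2

mean_flux = flux_sum / N
print(mean_flux)       # close to -0.75
```

As in the scatter plot, a finite sample lands near, but not exactly on, the theoretical value.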

Decomposition and rotation of the Reynolds stress tensor

As one rotates the axes (i.e. rotates the cube of fluid), the Reynolds stress tensor is transformed into the new axes. However, certain properties of the tensor remain invariant. One such property is the trace, which works out to be twice the turbulent kinetic energy. Hence one can subtract \frac{2}{3}k\delta_{ij} from the Reynolds stress tensor to extract the Reynolds stress anisotropy a_{ij}:

    \begin{align*} a_{ij} = \left<u_i'u_j'\right> - \frac{2}{3}k\delta_{ij} \end{align*}

One can think of \frac{2}{3}k\delta_{ij} as an analog to the hydrostatic pressure faced by an infinitesimal cube of fluid. This hydrostatic pressure always acts in the normal direction with the same magnitude regardless of how one rotates the cube.

At certain orientations, one can obtain a Reynolds stress tensor which is diagonal. Linking this back to our interpretation of the tensor as a covariance matrix, this is equivalent to performing a principal component decomposition! The eigenvalues then tell us how the velocities are scattered.

For example, the velocities may vary greatly in one principal component but not much in the other 2, resulting in 1D turbulence. The velocity may also be scattered in an ellipse on a 2D plane with little variation in the third component (2D turbulence). Finally, the velocities may be totally uncorrelated, resulting in a scatter within a sphere (3D isotropic turbulence).

Lumley triangle

A more rigorous method of describing turbulence as we did above is through the Lumley triangle. First, we nondimensionalize the anisotropy tensor:

    \begin{align*} b_{ij} = \frac{a_{ij}}{2k} = \frac{\left<u_i'u_j'\right>}{2k} - \frac{1}{3}\delta_{ij} \end{align*}

For a general tensor, there are 3 quantities which are invariant under rotation. These quantities are simply referred to as the invariants, and they are sometimes labelled I_1, I_2 and I_3. The trace and determinant make up 2 of the invariants. All invariants can be calculated from the eigenvalues. More information can be found on Wikipedia.

Because the trace of b_{ij} is zero, we really only have 2 independent invariants. This means that we are able to describe the anisotropy on a 2D plot independent of the tensor coordinate axes. Pulling a plot from a very useful paper where the shape of the velocity scatter is shown as a surface:

This tells us the type of turbulence in a way that is not based on any particular coordinate frame. Sometimes, instead of I_2 and I_3, we use \xi and \eta, which are calculated slightly differently from the eigenvalues to stretch out the triangle for easier visualisation. As an example, let’s look at the variation of turbulence throughout a turbulent boundary layer. The data was taken from here, from DNS simulations.

On the left plot above, we see the variation of the various terms of the Reynolds stress tensor across the boundary layer. The \left<u'u'\right> component rises quickly and reaches a peak around y+ \approx 7, above all other components. Far from the wall, all the components on the diagonal of the Reynolds stress tensor converge.

The variation in turbulence is plotted on the Lumley triangle on the right in red. Close to the wall, the turbulence is highly 2D (the fluctuations are dominated by \left<u'u'\right> and \left<w'w'\right>); it moves towards the 1D corner as \left<u'u'\right> reaches its peak at y+ \approx 7. As we move further from the wall, the red dots move towards the origin, indicating that the turbulence approaches isotropy far enough from the wall. There are many other ways to visualize the type of turbulence in a flow domain, one of which is quite cool.

Limits of the Boussinesq approximation

According to the Boussinesq hypothesis, the Reynolds stress anisotropy can be closed with:

    \begin{align*} a_{ij} = -\nu_T\left(\frac{\partial \left<u_i\right>}{\partial x_j}+\frac{\partial \left<u_j\right>}{\partial x_i}\right) = -\nu_T s_{ij} \end{align*}

There are 2 important assumptions made here: (1) the turbulence is determined by the local mean flow gradients, and (2) the anisotropy is aligned with the mean rate-of-strain tensor. Both of these assumptions can be shown to be plain wrong in certain cases. Take the flow in the channel shown below for example.

In this channel, the flow gradients are zero in the parallel flow sections. In the varying section:

    \begin{align*} \left<u\right> = U_0 \quad \left<v\right> = ay \quad \left<w\right> = -az \quad \end{align*}

Where U_0 and a are constants. This flow was investigated experimentally in this paper, and the results are summarized in Wilcox’s book, which is a good resource (along with Pope’s book).

The experimental and computational results (k-\omega^2 model) are shown above as markers and lines respectively. With the Boussinesq approximation, the model obtains the correct trends in anisotropy in the varying section, then immediately returns to isotropy in the parallel section afterwards (x>2.3), mirroring the mean rate-of-strain tensor. In experiments, however, the flow takes time to return to isotropy. Hence the anisotropy of turbulence is affected by the flow history, which is not modeled with the Boussinesq approximation. This is an example of when assumption (1) is not correct.

A separate shear flow example, taken from Pope’s book, shows when assumption (2) is wrong. Consider a flow with the profile shown below.

In this example, the only non-zero gradient is \partial \left<u\right>/\partial y so:

    \begin{align*} s_{ij} = \begin{bmatrix} 0 & \frac{\partial\left<u\right>}{\partial y} & 0 \\ \frac{\partial\left<u\right>}{\partial y} & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \end{align*}

Since \left<u_i'u_j'\right> = -\nu_T s_{ij} + \frac{2}{3}k\delta_{ij}, we expect all the diagonal terms of the Reynolds stress tensor to be equal. However, both experimental and DNS results show that the Reynolds stress tensor has unequal diagonal terms. For example, from the DNS results:

    \begin{align*} \frac{\left<u_i'u_j'\right>}{k} = \begin{bmatrix} 1.06 & -0.33 & 0 \\ -0.33 & 0.32 & 0 \\ 0 & 0 & 0.62 \end{bmatrix} \end{align*}

The diagonal terms would all be calculated as 2/3 using the turbulent viscosity model. Hence we see where turbulence models based on the Boussinesq approximation are not expected to perform well.
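As a quick check of the numbers quoted above, we can compute the anisotropy tensor b_{ij} from the DNS Reynolds stresses: its trace vanishes by construction, while the distinctly non-zero diagonal entries quantify how far this flow is from the isotropic state the turbulent viscosity model predicts here.

```python
# Anisotropy tensor b_ij from the DNS Reynolds stresses quoted above
# (the tensor is given already normalised by k).
R_over_k = [[1.06, -0.33, 0.00],
            [-0.33, 0.32, 0.00],
            [0.00, 0.00, 0.62]]

# trace(<u_i'u_j'>) = 2k, so the trace of R/k should be 2
trace = sum(R_over_k[i][i] for i in range(3))

# b_ij = <u_i'u_j'>/(2k) - delta_ij/3 (traceless by construction)
b = [[R_over_k[i][j] / 2.0 - (1.0 / 3.0 if i == j else 0.0)
     for j in range(3)] for i in range(3)]

print(trace)                          # ~2, consistent with the data
print([b[i][i] for i in range(3)])    # clearly non-zero -> anisotropic
```

Under the Boussinesq closure, all three diagonal entries of b_{ij} would be zero for this flow.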

CFD for a better tomorrow

Recently, I had the pleasure of meeting the guys behind Yonah, a start-up aiming to revolutionize healthcare in rural areas. As electric motor makers pack increasingly more power into smaller packages, a new class of hybrid VTOL/fixed-wing UAVs has emerged in recent years. The concept is simple: slap 4 VTOL propellers onto a fixed-wing aircraft to remove the dependence on runways, then transition to forward fixed-wing flight as normal. Since the motors are small and light, they carry little penalty. The team aims to use this technology to deliver medical supplies to remote villages where no runways exist for conventional planes, and where the terrain makes driving long and arduous. You can check out their interview on Channel NewsAsia.

I really like the drive and enthusiasm shown by the team – if you wake up every day knowing you are working towards improving the lives of others, who wouldn’t have that drive? – and decided to chip in with my limited knowledge.

Since they have access to the National Supercomputing Centre (NSCC), I decided to characterize their aircraft in OpenFOAM, which was available there, and see what insights we could obtain. I was also pretty excited to be able to use the NSCC, which is such a computing behemoth.

Meshing

The first order of business was to clean up the geometry and generate a mesh. As with all real engineering projects, the given geometry was very dirty, with many small unnoticeable gaps. From experience, I have learnt that it is sometimes better to keep only the large features/surfaces and delete everything else, then reconstruct the geometry from there.

Once we had a nice watertight geometry, the mesh was generated in Ansys Mesher. Care was taken to ensure the first layer was of the correct height using a y+ calculator, and we made sure to export the mesh in ASCII format.


OpenFOAM set up

Here comes the tricky bit. The learning curve of OpenFOAM is notoriously steep, which is the price you pay for free software. However, I believe in understanding every single input one puts into a CFD analysis. Using commercial software without practicing mindfulness sometimes causes one to treat it like a black box, and we lose touch with the fundamentals.

For me the trickiest part is setting the correct boundary conditions. Since this is an external flow problem, we can use the freestream boundary for velocity and the matching freestreamPressure for pressure. The freestream boundary looks at the local flow conditions and decides whether to impose a velocity (if inflow) or a zero gradient (if outflow). It is actually inletOutlet under the hood.

As we are using a high y+ mesh to save on cell count, one needs to use wall functions. We are also using the k-omega SST turbulence model. Typically one sets the turbulence intensity (I) and the turbulent-to-laminar viscosity ratio (\nu_t/\nu) at the boundaries. A turbulence intensity of 1% is a good value for external flows, and from there,

    \begin{align*} I=\frac{u'}{U}=\frac{\sqrt{\frac{1}{3}\left(u'_x^2+u'_y^2+u'_z^2\right)}}{\sqrt{U_x^2+U_y^2+U_z^2}} \quad \Rightarrow \quad k = \frac{1}{2}\left(u'_x^2+u'_y^2+u'_z^2\right) = \frac{3}{2}\left(IU\right)^2 \end{align*}

Since \nu_t=k/\omega, the specific dissipation rate at the boundary is calculated with:

    \begin{align*} \omega=\frac{k/\nu}{\nu_t/\nu} \end{align*}

With a typical \nu_t/\nu value of 10, one easily obtains the k and \omega values at the boundaries. The quoted I=1\% and \nu_t/\nu=10 were taken from Fluent’s guidelines. Similar to the freestream boundary for velocity, we use inletOutlet for both k and \omega.

Now, at the walls, we have to use kqRWallFunction and omegaWallFunction to invoke the wall functions for our high y+ mesh. However, one still needs to set an initial value. For k, it is straightforward: since it is a no-slip wall, k_{wall}=0. As for \omega, the following relation holds in the viscous sublayer:

    \begin{align*} \omega_{visc}=\frac{6\nu}{\beta y^2} \end{align*}

Where y is the centroid distance to the wall and \beta is a constant equal to 0.075. As we get closer and closer to the wall, \omega_{visc} approaches infinity, so we can just put a very large number for it.
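To make the recipe concrete, the boundary values above can be computed in a few lines (a Python sketch; the flow speed, viscosity and wall distance below are made-up illustration values, not taken from the actual case):

```python
def freestream_k(U, I):
    """Turbulent kinetic energy k = (3/2)*(I*U)^2 from intensity I."""
    return 1.5 * (I * U) ** 2

def freestream_omega(k, nu, visc_ratio):
    """Specific dissipation rate omega = k/nu_t, with nu_t = (nu_t/nu)*nu."""
    return k / (visc_ratio * nu)

def wall_omega(nu, y, beta=0.075):
    """Viscous sublayer value omega = 6*nu/(beta*y^2) at wall distance y."""
    return 6.0 * nu / (beta * y ** 2)

# Hypothetical freestream: 20 m/s in air, I = 1%, nu_t/nu = 10
U, nu = 20.0, 1.5e-5
k = freestream_k(U, I=0.01)            # 0.06 m^2/s^2
omega = freestream_omega(k, nu, 10.0)  # 400 1/s
omega_wall = wall_omega(nu, y=1e-3)    # large near the wall, as expected
```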

Results

Thanks to the amazing power of the NSCC, I was able to get an entire polar within a couple of hours. We were able to gain some insights into the design as explained in the project page.

Back to basics with PDEPE

Recently I got introduced to a PDE solver in MATLAB and thought it was a wonderful tool. The function in question is pdepe, and on MATLAB’s webpage it is described as a one-dimensional elliptic-parabolic PDE solver. By wrapping our governing equation, initial conditions and boundary conditions into functions with predefined signatures, we were able to solve a nonlinear system of PDEs (!) quickly. The total amount of code we wrote was less than 100 lines.

Using PDEPE
Despite the parabolic-elliptic caveat given on the MATLAB website, one may be tempted to try solving a hyperbolic equation. Trying it out is easy. Let’s solve a linear advection equation for a cosine wave:

    \begin{align*} \frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} &= 0 \quad , \quad 0 \leq x \leq L \\ u(x, 0) &= \begin{cases}  \cos(x) \quad , \quad 0 \leq x \leq 2\pi \\ 1 \quad , \quad otherwise \end{cases} ,\\ u(0, t) &= 1 \quad , \quad u(L, t) = 1 . \end{align*}

One first casts the governing equation into the form given by MATLAB:

    \begin{align*} c \frac{\partial u}{\partial t} = x^{-m}\frac{\partial}{\partial x}\left(x^m f\right) + s . \end{align*}

Where c, f and s are functions of \left(x, t, u, \partial u/ \partial x\right). We then write a function to compute c, f and s:

function [c, f, s] = advectionEquation(x, t, u, dudx)
    c = 1;
    f = -u;
    s = 0;
end

For the boundary conditions, we need to identify p and q such that at the boundaries,

    \begin{align*} p + qf = 0 . \end{align*}

It’s a bit annoying that MATLAB uses the flux f instead of \partial u/\partial x. If we had a Neumann boundary condition (i.e. \partial u /\partial x = 0 at L), we would have a bit of trouble implementing it for our case. Writing p and q on the boundaries into the required function:

function [pl, ql, pr, qr] = boundaryConditions(xl, ul, xr, ur, t)
    pl = ul - 1;
    ql = 0;
    pr = ur - 1; 
    qr = 0;
end

The initial condition is straightforward:

function u0 = initialCondition(x)
    if x <= 2*pi
        u0 = cos(x);
    else
        u0 = 1;
    end
end

Putting everything into a script:

% Parameters
m = 0;
x = linspace(0, 7*pi, 500);
t = linspace(0, 12, 5);

% Solve
sol = pdepe(...
    m,...
    @advectionEquation,...
    @initialCondition,...
    @boundaryConditions,...
    x,...
    t);
u = sol(:,:,1);

% Plot results
figure
for iT = 1:length(t)
    plot(x, u(iT, :), 'linewidth', 2)
    hold on
end
grid on
title('Advection of cosine wave')
xlabel('x')
ylabel('u')

Perfect! At first glance, one sees the wave moving smoothly to the right with no problems. However, look closer at the trough of the wave. I’ll help you with some markers:

We see that the shape is not preserved – the trough is getting shallower and shallower. This is likely due to numerical viscosity diffusing the solution. However, the results are still encouraging.

Dealing with discontinuities
With the favorable results, let’s move on and try a more challenging case: Advection of a step function.

    \begin{align*} \frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} &= 0 \quad , \quad -L \leq x \leq L \\ u(x, 0) &= \begin{cases}  1 \quad , \ x \leq 0 \\ 0 \quad , \quad x > 0 \end{cases} ,\\ u(-L, t) &= 1 \quad , \quad u(L, t) = 0 . \end{align*}

The only difference here is the initial condition and boundary conditions. The required MATLAB functions are similar to the one given earlier so I will leave them out.

Argh! Our friend spurious oscillation rears its ugly head! The theoretical discontinuity locations are marked with black markers and we can see that the drop occurs at the correct location, but there are some wild oscillations occurring before it. You can also see that the discontinuity gets a little smeared out. The quickest and dirtiest way to remedy such a situation would be to add some damping (artificial viscosity). Our equation now becomes:

    \begin{align*} \frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} + b \frac{\partial^2 u}{\partial x^2} = 0. \end{align*}

We need to tweak the function in MATLAB:

function [c, f, s] = advectionEquation(x, t, u, dudx)
    a = 1;
    b = -0.003; % Found through trial and error
    c = 1;
    f = -b*dudx;
    s = -a*dudx;
end

Things are starting to get a little fudgy. One has to tweak the coefficient b around a little: we want it to be big enough to remove the oscillations but small enough not to affect the results too greatly. Also, note that b should be <0, otherwise we will be adding “negative diffusion” (it’s “negative” because diffusion terms are typically written on the RHS), which destabilizes the solution. I eventually settled on a value of -0.003, small enough that only a little overshoot remains near the discontinuity.

Great! Now the solution looks really good. However, we have to remember what we did to arrive at this solution: the addition of a nonphysical diffusion term. This term is highly dependent on the solution and mesh size. How much to add is really a dark art and left to the coder’s discretion. Also, in this case it is relatively easy to determine how much diffusion to add because it is a pure advection equation – there should be as little smearing of the solution as possible and we already know what the solution looks like. If we had an advection-diffusion equation, and especially if the terms are nonlinear, it becomes increasingly difficult to gauge how much artificial diffusion to add.

These oscillations were a big problem in the computational fluid dynamics community in the 70s. The mathematical character of compressible flows is highly hyperbolic: discontinuities can form in the domain even from smooth initial and boundary conditions. Shockwaves are a prime example. Highly sophisticated theories and mathematical treatments gave rise to the total variation diminishing (TVD) schemes, which make use of flux limiters. Despite the complicated-sounding name, a flux limiter is just a way of automatically switching the discretisation from second order to first order close to discontinuities. This helps to remove the wiggles close to discontinuities while keeping the solution accurate elsewhere.

Another highly popular CFD scheme is the JST scheme, named after the people who created it: Jameson, Schmidt and Turkel. It’s basically what we did with the b coefficient, except that they devised a set of rules to compute how much diffusion to add. Artificial viscosity is added through second- and fourth-difference terms. And of course the rigor that went into it is at a whole other level.

One thing I really want to find out is how much smearing our “fudged” solution creates. How would the second order advection equation with artificial viscosity compare to a pure advection equation solved with upwind discretization? To answer this, I wrote a backward Euler solver and will look at the results in the next post.
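For reference, such a scheme fits in very few lines (a Python sketch, not the actual solver for the next post; the grid and time step values are arbitrary). For u_t + a u_x = 0 with a > 0, backward Euler in time with first-order upwind in space gives a lower bidiagonal system, so a single forward sweep solves it exactly at each step:

```python
def backward_euler_upwind(u0, a, dx, dt, nsteps, u_left):
    """Solve u_t + a*u_x = 0 (a > 0) with backward Euler in time and
    first-order upwind in space. The implicit system
    (1 + c) u_i^{n+1} - c u_{i-1}^{n+1} = u_i^n  (c = a*dt/dx)
    is lower bidiagonal, so one forward sweep per step solves it."""
    c = a * dt / dx  # Courant number; the scheme is stable for any c
    u = list(u0)
    for _ in range(nsteps):
        u[0] = u_left  # Dirichlet inflow condition
        for i in range(1, len(u)):
            u[i] = (u[i] + c * u[i - 1]) / (1.0 + c)
    return u

# Advect a step function: u = 1 for x <= 0, 0 otherwise
nx, L = 200, 10.0
dx = L / (nx - 1)
x = [-L / 2 + i * dx for i in range(nx)]
u0 = [1.0 if xi <= 0 else 0.0 for xi in x]
u = backward_euler_upwind(u0, a=1.0, dx=dx, dt=0.05, nsteps=40, u_left=1.0)
```

The solution stays monotone – no spurious oscillations at all – but the front gets smeared by the scheme’s built-in numerical diffusion, which is exactly the trade-off discussed above.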

What is y+ and why is it important?

In setting up cases for Reynolds averaged Navier-Stokes (RANS) analysis in CFD, we usually incorporate a prism layer along wall surfaces to capture boundary layer effects in better detail. The technique in such cases is to size the first layer of the mesh for a target y+ value.

The image above, which I got from here, shows an example. When searching for y+ online, one commonly finds the definition:

    \begin{align*} y^+ = \frac{y u_{\tau}}{\nu} \quad , \quad u_{\tau}^2 = \frac{\tau_w}{\rho} = \nu \frac{dU}{dy}\rvert_{y=0} . \end{align*}

\tau_w can be found with empirical relations, for example by using the 1/7th power law, C_f = 0.026/Re_x^{1/7}. Then, \tau_w = (1/2) \rho U^2 C_f. The Reynolds number Re_x is usually evaluated with the characteristic length, like the chord for an airfoil. There are some snazzy websites where you can easily compute the cell height for your desired y+ value, like here and here. These are all well and good, but what exactly is this y+ we are calculating and what dictates its value?
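The calculation those websites perform is easy to reproduce (a Python sketch; the flow conditions below are made-up illustration values):

```python
def first_cell_height(U, L, nu, rho, y_plus=1.0):
    """Estimate the first-cell height for a target y+ using the 1/7th power
    law skin friction correlation C_f = 0.026 / Re_x^(1/7)."""
    Re = U * L / nu                     # Reynolds number on length L
    cf = 0.026 / Re ** (1.0 / 7.0)      # skin friction coefficient
    tau_w = 0.5 * rho * U ** 2 * cf     # wall shear stress
    u_tau = (tau_w / rho) ** 0.5        # friction velocity
    return y_plus * nu / u_tau          # y = y+ * nu / u_tau

# Hypothetical case: 30 m/s air over a 0.5 m chord, target y+ = 1
h = first_cell_height(U=30.0, L=0.5, nu=1.5e-5, rho=1.225, y_plus=1.0)
# h comes out on the order of 1e-5 m, i.e. tens of microns
```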

Before we proceed, it would be timely to visit the post on RANS and the mixing length model as we will be using knowledge on both of them for our discussion later on.

RANS flow in a channel

Suppose we have a two-dimensional pressure-driven flow in a channel. The flow is steady and fully developed. This means that the \partial / \partial t and \partial / \partial x terms are all zero except for \partial p / \partial x. The flow is also assumed to be parallel to the channel walls, so V=0 and U = U(y). From this, the x direction RANS equation becomes:

    \begin{align*} \underbrace{ \frac{\partial U}{\partial t} }_{\partial / \partial t = 0} + \underbrace{ U \frac{\partial}{\partial x} U }_{\partial U / \partial x = 0}+ \underbrace{ V \frac{\partial}{\partial y} U }_{V = 0} = -\frac{1}{\rho} \frac{\partial}{\partial x} P + \underbrace{ \nu \frac{\partial^2}{\partial x^2} U }_{\partial U / \partial x = 0} + \nu \frac{\partial^2}{\partial y^2} U - \underbrace{ \frac{\partial}{\partial x} \left< u' u'\right> }_{\partial \left<u'u'\right> / \partial x = 0} - \frac{\partial}{\partial y} \left< u' v'\right> \\ \Rightarrow -\frac{1}{\rho} \frac{\partial}{\partial x} P + \nu \frac{\partial^2}{\partial y^2} U - \frac{\partial}{\partial y} \left< u' v'\right> = 0 . \end{align*}

In the y direction,

    \begin{align*} \underbrace{ \frac{\partial V}{\partial t} }_{V=0} + \underbrace{ U \frac{\partial}{\partial x} V }_{V=0}+ \underbrace{ V \frac{\partial}{\partial y} V }_{V=0} = -\frac{1}{\rho} \frac{\partial}{\partial y} P + \underbrace{ \nu \frac{\partial^2}{\partial x^2} V }_{V=0} + \underbrace{ \nu \frac{\partial^2}{\partial y^2} V }_{V=0}- \underbrace{ \frac{\partial}{\partial x} \left< u' v'\right> }_{\partial \left<u'v'\right> / \partial x = 0} - \frac{\partial}{\partial y} \left< v' v'\right>\\ \Rightarrow -\frac{1}{\rho} \frac{\partial}{\partial y} P - \frac{\partial}{\partial y} \left< v' v'\right> = 0 . \end{align*}

Differentiating the y direction equation in x, then integrating in y,

    \begin{align*} -\frac{1}{\rho} \frac{\partial}{\partial x} \frac{\partial}{\partial y} P - \frac{\partial}{\partial x} \frac{\partial}{\partial y} \left< v' v'\right> &= 0 \\ -\frac{1}{\rho} \frac{\partial}{\partial y} \frac{\partial}{\partial x} P - \frac{\partial}{\partial y} \underbrace{\frac{\partial}{\partial x} \left< v' v'\right>}_{\partial \left<v'v'\right> / \partial x = 0} &= 0 \\ \\ \int \frac{\partial}{\partial y} \frac{\partial}{\partial x} P dy &= const \\ \Rightarrow \frac{\partial}{\partial x} P &= const . \end{align*}

This is helpful because we can now integrate the x equation in y:

(1)   \begin{align*} \int _0^y -\frac{1}{\rho} \frac{\partial}{\partial x} P + \nu \frac{\partial^2}{\partial y^2} U - \frac{\partial}{\partial y} \left< u' v'\right> dy &= 0 \\ -\frac{1}{\rho} \frac{\partial P}{\partial x} y + \nu \frac{\partial U}{\partial y} \rvert_y - \nu \frac{\partial U}{\partial y} \rvert_0 - \left< u' v'\right> \rvert_y + \underbrace{\left< u' v'\right> \rvert_0}_{0} &= 0 . \end{align*}

Here we make the argument that the flow is symmetrical at y=H. Imagine we have a mirror at that location, and a fluid particle is moving towards it from the bottom. There will be another particle approaching the mirror from the top, and the y component velocities will cancel. This means V=v'=0 at y=H. Also, another more obvious observation is that \partial U / \partial y=0 at H. Now since we defined our integration limits to an arbitrary y, we can let it be H:

    \begin{align*} -\frac{1}{\rho} \frac{\partial P}{\partial x} H + \nu \underbrace{ \frac{\partial U}{\partial y} \rvert_H}_{0} - \nu \frac{\partial U}{\partial y} \rvert_0 &- \underbrace{ \left< u' v'\right> \rvert_H}_{0} + \underbrace{\left< u' v'\right> \rvert_0}_{0} = 0 \\ \Rightarrow \nu \frac{\partial U}{\partial y} \rvert_0 &= -\frac{1}{\rho} \frac{\partial P}{\partial x} H \\ \frac{u_{\tau}^2}{H} &= -\frac{1}{\rho} \frac{\partial P}{\partial x} . \end{align*}

This is a very useful result since it helps us get rid of the pressure term in equation 1. Substituting the results into equation 1:

(2)   \begin{align*} \frac{u_{\tau}^2}{H} y + \nu \frac{\partial U}{\partial y} - u_{\tau}^2 - \left< u' v'\right>&= 0 \\ \nu \frac{\partial U}{\partial y} - \left< u' v'\right> &= u_{\tau}^2\left(1-\frac{y}{H}\right) \end{align*}

Whew! After all these manipulations (and latex typing), we finally get an equation for our turbulent flow in a channel. Let’s zoom in on the case when y is very close to the wall (i.e. y/H \ll 1).

Laminar sublayer

As you can already guess, the first thing we can do is to neglect the y/H term when we are close to the wall. This still does not give us anything useful. At the wall, our velocities go to zero, so we can also assume \left<u'v'\right> \approx 0. This leaves us with:

    \begin{align*} \nu \frac{\partial U}{\partial y} &= u_{\tau}^2 . \end{align*}

Dividing both sides by u_{\tau}^2 and introducing the nondimensional velocity and wall distance (as defined by the terms in the parentheses):

    \begin{align*} \frac{\partial \left(U / u_{\tau}\right)}{\partial \left(y u_{\tau}/\nu\right)} &= 1 \\ \frac{\partial u^+}{\partial y^+} &= 1 . \end{align*}

Integrating to an arbitrary y as we did before:

    \begin{align*} \int _0^y \frac{\partial u^+}{\partial y^+} dy^+ &= y^+\rvert_y \\ u^+\rvert_y - \underbrace{u^+\rvert_0}_0 &= y^+\rvert_y \\ \\ \therefore u^+ &= y^+ . \end{align*}

Finally, a useful result from all the math we did. What does this simple expression u^+ = y^+ mean? Despite its simplicity, it is a powerful result – it tells us that no matter what flow conditions you are in or how large the channel is (i.e. it can extend to infinity and we essentially get a flat plate), the velocity profile close to the wall will always fall onto this distribution. It is a universal result. Note that since we ignored the Reynolds stresses, this velocity profile holds for laminar flows too. So now the question is how far away from the wall we can get before the results break down.

On the left, from a plot I got here, we see the dashed lines showing the different derived velocity profiles, while the solid line shows a blended velocity profile across all regions. In the diagram on the right, which I took from a very useful set of notes by Brennen, we see plots of actual results and a pretty good match between them. Typically, the u^+ = y^+ profile is valid for y^+<5. Now we start to appreciate why we need to go through the trouble of sizing our prism layers properly.

Generally there are 2 approaches to a RANS simulation: a “low y+” simulation or a “high y+” one. This corresponds to the nondimensional height of the first prism layer closest to the wall. In the “low y+” approach, we resolve the entire laminar sublayer. To ensure that we capture the velocity profile properly, the guideline is normally y^+ \leq 1 for the first layer. However, this can result in a large cell count, which may be too expensive for some scenarios. Hence, turbulence modelers have created wall functions such that everything from the laminar sublayer up to the log-law region is modeled within the first prism layer, reducing the required mesh size. These typically require 30 \leq y^+ \leq 100, which can be a bit tricky to achieve in simulations and might require some trial and error.

Log-law layer

Alright, let’s move further up the boundary layer into the turbulent zone to see where the restriction of 30 \leq y^+ \leq 100 for the “high y+” simulation comes from. In this region above the laminar sublayer, Prandtl assumed that there is a constant shear stress across the boundary layer. Recall that the total shear stress (laminar plus Reynolds shear stress) is:

    \begin{align*} \tau &= \nu \frac{\partial U}{\partial y} - \left<u'v'\right> \end{align*}

Constant shear stress means that the y derivative of \tau is zero, so,

(3)   \begin{align*} \frac{\partial }{\partial y} \left[ \nu \frac{\partial U}{\partial y} - \left<u'v'\right> \right] = 0. \end{align*}

Now, applying the mixing length model,

    \begin{align*} \left<u'v'\right> &= \left< \ell \frac{\partial U}{\partial y} \times \ell \frac{\partial U}{\partial y} \right> \\ \left<u'v'\right> &= \ell^2 \left( \frac{\partial U}{\partial y} \right)^2 \end{align*}

We have assumed that the variation of v' follows that of u'. Prandtl made one further assumption that \ell \sim y, so we let \ell = \kappa y where \kappa is a constant. Now substituting our results into equation 3 and integrating to an arbitrary y,

    \begin{align*} \underbrace{\nu \frac{\partial U}{\partial y} \rvert_y}_{\ll 1} - \nu \frac{\partial U}{\partial y} \rvert_0 &- \kappa^2 y^2 \left( \frac{\partial U}{\partial y} \right)^2 \rvert_y + \underbrace{\kappa^2 y^2 \left( \frac{\partial U}{\partial y} \right)^2 \rvert_0}_{0} = 0 \\ \Rightarrow \kappa^2 y^2 \left( \frac{\partial U}{\partial y} \right)^2 &= -u_{\tau}^2 \\ \kappa y \frac{\partial U}{\partial y} &= -u_{\tau} \end{align*}

Nondimensionalizing the variables as usual, and absorbing the negative sign into \kappa,

    \begin{align*} \frac{u_{\tau}^2}{\nu} \frac{\partial \left( U / u_{\tau}\right)}{\partial \left( y u_{\tau}/\nu\right)} &= \frac{1}{\kappa y} u_{\tau} \\ \frac{\partial \left( U / u_{\tau}\right)}{\partial \left( y u_{\tau}/\nu\right)} &= \frac{1}{\kappa y u_{\tau} / \nu} \\ \\ \therefore \frac{\partial u^+}{\partial y^+} &= \frac{1}{\kappa y^+} . \end{align*}

After integration, we finally obtain:

    \begin{align*} u^+ = \frac{1}{\kappa}\log y^+ + B . \end{align*}

Investigations with experimental results over the years have shown that \kappa \approx 0.41 and B\approx 5.1. The log law generally holds for 30 \leq y^+ \leq 100, hence the restriction we have for “high y+” simulations. As a final note, I must mention that advances in CFD software are relaxing these requirements, with options like enhanced wall treatment in Ansys. However, they are mostly present in proprietary software and might not be found in all codes.
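As a small numerical aside (a Python sketch using the constants above), one can check where the sublayer profile u^+ = y^+ and the log law intersect – the crossover famously sits near y^+ \approx 11, in the buffer region between the two limits:

```python
import math

KAPPA, B = 0.41, 5.1

def log_law(y_plus):
    """Log-law velocity profile u+ = (1/kappa) * ln(y+) + B."""
    return math.log(y_plus) / KAPPA + B

def sublayer_loglaw_crossing(lo=5.0, hi=30.0, tol=1e-10):
    """Find where u+ = y+ meets the log law by bisection.
    f(5) < 0 and f(30) > 0, so a root is bracketed."""
    f = lambda y: y - log_law(y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

y_cross = sublayer_loglaw_crossing()  # close to the classic y+ ~ 11
```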

Reynolds averaged Navier-Stokes and Prandtl’s mixing length model

As an engineer, almost all of the flows that I have been analyzing have been turbulent in nature. In the real world, we have no choice but to dabble in the dark art of turbulence modelling. Despite the empiricism involved, my experience with Reynolds averaged Navier-Stokes (RANS) is that they do give pretty useful results that we can pass on with certain confidence to the next stage of the design process. Of course, when we run into more unique cases with bluff bodies and shear layers etc., we treat them with more caution. Today we will take a closer look at the basics of RANS.

Derivation of RANS

Let’s start with the incompressible Navier-Stokes equations, using the Reynolds decomposition u = U + u' and p=P+p':

    \begin{align*} \frac{\partial \vec{u}}{\partial t} + \left(\vec{u}\cdot \nabla\right) \vec{u} &= -\frac{1}{\rho} \nabla p + \nu \nabla^2 \vec{u} \\ \frac{\partial u_i}{\partial t} + u_j \frac{\partial}{\partial x_j} u_i &= -\frac{1}{\rho} \frac{\partial}{\partial x_i} p + \nu \frac{\partial^2}{\partial x_j^2} u_i \end{align*}

On the second line we are using tensor notation with implicit summation over the index j. We inject the decomposition and average, noting that the average of a fluctuating component is zero (i.e. \left<u'_i\right>=0), but not that of products of fluctuations (i.e. \left<u'_iu'_j\right>\neq 0). Also, the average of a mean component is just the mean component itself:

    \begin{align*} \left< \frac{\partial U_i}{\partial t} \right> + \overbrace{ \left< \frac{\partial u'_i}{\partial t} \right> }^0 + \left< U_j \frac{\partial}{\partial x_j} U_i \right> + \overbrace{ \left< U_j \frac{\partial}{\partial x_j} u'_i \right> }^0 &+ \overbrace{ \left< u'_j \frac{\partial}{\partial x_j} U_i \right> }^0 + \left< u'_j \frac{\partial}{\partial x_j} u'_i \right> \\ &= \left< -\frac{1}{\rho} \frac{\partial}{\partial x_i} P \right> + \underbrace{ \left< -\frac{1}{\rho} \frac{\partial}{\partial x_i} p' \right> }_0 + \nu \left< \frac{\partial^2}{\partial x_j^2} U_i \right> + \underbrace{ \nu \left< \frac{\partial^2}{\partial x_j^2} u'_i \right> }_0 ,\\ \Rightarrow \frac{\partial U_i}{\partial t} + U_j \frac{\partial}{\partial x_j} U_i + \left< u'_j \frac{\partial}{\partial x_j} u'_i \right> &= -\frac{1}{\rho} \frac{\partial}{\partial x_i} P + \nu \frac{\partial^2}{\partial x_j^2} U_i . \end{align*}

Applying the chain rule and continuity equation for the term in angular brackets,

    \begin{align*} \left< u'_j \frac{\partial}{\partial x_j} u'_i \right> = \left< \frac{\partial}{\partial x_j} \left(u'_j u'_i\right) \right> - \underbrace{\left< u'_i \frac{\partial}{\partial x_j} u'_j \right>}_0 \\ \Rightarrow \left< u'_j \frac{\partial}{\partial x_j} u'_i \right> = \frac{\partial}{\partial x_j} \left< u'_j u'_i\right> . \end{align*}

We arrive at the familiar form of RANS:

    \begin{align*} \Rightarrow \frac{\partial U_i}{\partial t} + U_j \frac{\partial}{\partial x_j} U_i = -\frac{1}{\rho} \frac{\partial}{\partial x_i} P + \nu \frac{\partial^2}{\partial x_j^2} U_i - \frac{\partial}{\partial x_j} \left< u'_j u'_i\right> . \end{align*}

The last term on the right is the Reynolds stress term. This is the well-known closure problem – there is no exact expression for these terms in terms of the mean flow. There are many suggestions to close the equations, however.

Mixing length model

One of the earlier methods is the mixing length model, introduced by none other than Prandtl. He postulated that there is a certain length over which a fluid parcel will keep its characteristics before mixing with the surrounding fluid. Another way to see it is to imagine a particle in the x-y plane moving around randomly, but within a circle of a certain radius. The particle keeps its averaged characteristics constant at all times. Linearizing the averaged velocity about where the particle is, we see that to keep its averaged velocity constant, the fluctuation in velocity is equal to:

    \begin{align*} u' = \ell \lvert \frac{\partial U}{\partial y} \rvert . \end{align*}

Where \ell is the mixing length. This is more easily understood by visualizing the particle:

Principal component analysis

Principal component analysis (PCA) has many names depending on the field of application – proper orthogonal decomposition, eigenvalue decomposition etc. In fluid dynamics, among other things, it can be used in flow control. Let’s say we want to design an active jet along the top surface of a wing to control the wake behind it. For example, maybe we want to puff air into the flow at the right moment at high angles of attack to reduce the wake size and hence pressure drag.

What I mean is that we want to reduce the area in blue in the picture above, which I took from a great paper by Taira et al. As we all know, the flow behind the airfoil at such angles of attack is highly oscillatory and dynamic. Typical CFD analyses like this require millions of cells. How can we describe such a highly complicated flow field then? With the CFD data, we can describe the flowfield as a huge vector of velocities at every cell. Such a vector will be on the order of 10^{6}\times 1. Obviously this would result in an unfeasible state space model for control.

Using our intuition, we know that the flow will be dominated by certain vortices shedding at certain frequencies. We can think of the flowfield as a superposition of a mean flow with such vortices. Thus we start by adding the big vortices to the mean flow field, slowly adding on smaller and smaller vortices which are less important to it.

Now our complicated flowfield can be represented by just a handful of major flow features! This is explained visually with images I pulled from the same paper by Taira et al. Put in another way, we have extracted the principal components of the flowfield based on a set of snapshots we have of the data. To understand how these principal components are extracted, we have to look at a more modest example of a vector sized 2\times 1.

Understanding PCA

In this example, we have a set of data represented by 2 states. For example, it can be a dataset of the height and weight of each student in a class. Plotting such data will look like:

Stacking the data vectors on top of each other, we have our data matrix \mathbf{X}, which is an n\times p matrix. In our case, it is 5\times 2 in size. To find the principal components, we simply find the eigenvectors of \mathbf{X}^T\mathbf{X}, resulting in \mathbf{W}, which is a matrix with all the eigenvectors stacked side by side (i.e. \mathbf{W}=\left[\vec{W}_1, \vec{W}_2, \cdots\right]). The eigenvector corresponding to the largest eigenvalue represents the most dominant feature of the data set, the eigenvector corresponding to the second largest eigenvalue represents the second most dominant feature, and so on. This is where most explanations of PCA stop. However, to get a better understanding of what is going on, let’s see what the first eigenvector represents.

To get the direction which most closely describes the data points, we want to reduce the angle \vec{W}_1 makes with the vector to all our data points \vec{X}_i, which is denoted \theta in the diagram above. This also means that we want the dot product \left(\vec{W}_1 \cdot \vec{X}_i\right) to be maximized for all our data points, resulting in the condition:

    \begin{align*} max \left[\sum_i  \left(\vec{W}_1 \cdot \vec{X}_i\right)^2 \right] . \end{align*}

Note that we square the dot product to get a magnitude. The equivalent matrix form is:

    \begin{align*} max \left[\vec{W}_1^T \left(\mathbf{X}^T\mathbf{X}\right) \vec{W}_1  \right] . \end{align*}

This would give us the direction of \vec{W}_1, but we need to further impose a condition that \|\vec{W}_1\|=1, so we get:

    \begin{align*} max \left[\frac{\vec{W}_1^T \left(\mathbf{X}^T\mathbf{X}\right) \vec{W}_1}{\vec{W}_1^T\vec{W}_1}  \right] . \end{align*}

The problem is now reduced to finding the maximum of a Rayleigh quotient: max\ R\left(\mathbf{A}, \vec{v}\right) = \left(\vec{v}^T\mathbf{A}\vec{v}\right)/\left(\vec{v}^T\vec{v}\right), where \mathbf{A} is symmetric. There is a proof showing that the maximum and minimum of the Rayleigh quotient are the largest and smallest eigenvalues of \mathbf{A}. These occur when \vec{v} is the eigenvector corresponding to the largest or smallest eigenvalue. As a quick but not at all rigorous check, if we substitute any eigenvector \vec{\phi} of \mathbf{A} into the Rayleigh quotient:

    \begin{align*} R\left(\mathbf{A}, \vec{\phi}\right) = \frac{\vec{\phi}^T\mathbf{A}\vec{\phi}}{\vec{\phi}^T\vec{\phi}} = \frac{\vec{\phi}^T\lambda\vec{\phi}}{\vec{\phi}^T\vec{\phi}} = \lambda . \end{align*}

So we see how the maximum/minimum of the Rayleigh quotient comes from the eigenvectors. Of course, we did not look into what happens when \vec{v} is not an eigenvector of \mathbf{A}, but that is outside our scope here.

Okay, back to our discussion on PCA, we now know that \vec{W}_1 should be the eigenvector corresponding to the largest eigenvalue of \mathbf{X}^T\mathbf{X}.

So now we have found \vec{W}_1, shown as the blue arrow above. What about the remaining components? To find them, we subtract the component \vec{X}_i has in the \vec{W}_1 direction for all our data points and apply the same procedure. The magnitude of \vec{X}_i along \vec{W}_1 is simply the dot product. Denoting the subtracted \vec{X}_i as \hat{\vec{X}}_i, we get \hat{\vec{X}}_i = \vec{X}_i - \left(\vec{X}_i \cdot \vec{W}_1\right) \vec{W}_1.

So here we see that after subtraction, our black markers follow the dotted line to the red markers. To find the second component, we find the eigenvectors corresponding to the largest eigenvalue for the red markers:

Great! So now we have all our components for the dataset. As you can see with the blue arrow, we can describe our data pretty accurately in just that direction. Also, the subtraction process is akin to deflation in most eigenvector search algorithms. So the principal components we are looking for are just the eigenvectors of \mathbf{X}^T\mathbf{X}, as frequently explained. In summary:
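The whole procedure fits in a few lines of numpy (a Python sketch on a fabricated two-state dataset, in the spirit of the height/weight example; the numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated 2D dataset with one dominant direction
direction = np.array([3.0, 1.0]) / np.sqrt(10.0)
X = rng.standard_normal((50, 1)) * 5.0 * direction \
    + rng.standard_normal((50, 2)) * 0.3

# Principal components: eigenvectors of X^T X
eigvals, W = np.linalg.eigh(X.T @ X)
W = W[:, ::-1]  # eigh returns ascending eigenvalues; put largest first
w1 = W[:, 0]    # first principal component, should align with `direction`

# The "deflation" step from the text: subtract each point's
# component along w1; what remains is orthogonal to w1
X_hat = X - np.outer(X @ w1, w1)
```

Here `np.linalg.eigh` plays the role of the eigs call in the faces example, and `X_hat @ w1` is zero to machine precision, confirming that the deflated data has no component left along the first direction.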

Example with faces

Anything that can be described as a vector of numbers can be analyzed with PCA. Be it the velocity field behind an airfoil, the height and weight of students, or even images (since they are just matrices of pixel values). A quick search on the internet yielded a database of human faces from AT&T Laboratories Cambridge, which I thought would make an interesting test case.

All the face images are shown in the picture above. Each image was saved as a 4096 \times 1 vector of numbers. Each entry in the vector represents a pixel, and the number represents the darkness of the pixel. Finding the first k principal components was as simple as calling [V, D] = eigs(X'*X, k) in MATLAB. Plotting the first few dominant modes:

We see that the first dominant mode indeed captures the main features of the human face. It’s literally what a generic face looks like. Also, note that the nose and lip areas are pretty blurred. Looking at the later modes, we see that glasses and different lip and nose shapes become more prominent. Hence, as we add the modes, we are adding definition to those areas.

What about the results? The original pictures are on the left, while the reconstructed ones are on the right. Not bad! We have reduced our original 4096 \times 1 vector 256 times to a 16 \times 1 vector which describes the faces pretty accurately.

In this post, we have seen how to extract the main features in any dataset, and also more details about PCA beyond the steps and formula in application. We have also seen how it helps in flow control problems by reducing the dimensionality of the problem.

Linear solvers

In the implicit scheme, one ends up with a large sparse block system \mathbf{A} \vec{x} = \vec{b}. For a typical mesh size of ~2 million cells with five unknowns per cell, it adds up to a 10 million \times 10 million system. One can choose to either solve the system directly (through Gaussian elimination) or iteratively. Direct solvers solve the system in a fixed number of steps and are very robust. However, the operation count scales as \mathcal{O} (n^3), which can be extremely costly. Typically one wants to keep n<10^{4} for direct solvers. Hence, iterative methods are the preferred choice in typical CFD applications. In addition, preconditioners are commonly used to accelerate the solution. We will first go through one of the more modern iterative methods – Krylov methods – before talking about Algebraic multigrid (AMG) preconditioners.

Krylov methods

It took me a while to wrap my head around Krylov-type linear solvers. This is because most sources start by saying that the method searches for a solution in the Krylov subspace:

    \begin{align*} \mathcal{K}_m(\mathbf{A}, \vec{r}_0) = span \{ \vec{r}_0, \mathbf{A}\vec{r}_0 , \mathbf{A}^2\vec{r}_0 , \dots , \mathbf{A}^{m-1}\vec{r}_0 \} \end{align*}

Where \vec{r}_0 = \mathbf{A} \vec{x}_0 - \vec{b}, with \vec{x}_0 being the first guess for \vec{x}. Now I do not have a mathematics degree, so looking at the notation above gives me a headache. Only after further reading did I learn that any vector which can be expressed as a linear combination of a set of vectors is said to be in the span of that set. Basically, if you can express your vector as a linear combination of \{ \vec{r}_0, \mathbf{A}\vec{r}_0, \dots, \mathbf{A}^{m-1}\vec{r}_0 \}, your vector is said to belong to the order m Krylov subspace of \mathbf{A} and \vec{r}_0. As a further illustration:

    \begin{align*} &\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix} \quad , \quad \vec{r} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad , \quad \Rightarrow \mathcal{K}_3(\mathbf{A}, \vec{r}) = span \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 9 \\ 4 \end{bmatrix} \right\},\\ &Let\ \vec{q} = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 4 \\ 1 \end{bmatrix} - \begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} -3 \\ 1 \end{bmatrix} \in \mathcal{K}_3(\mathbf{A}, \vec{r}) . \end{align*}

So we can say that \vec{q} belongs to the order 3 Krylov subspace.

Okay, so now the next question is: why, of all possible places, do we want to search for \vec{x} in the Krylov subspace? One has to look at the minimal polynomial of an n\times n matrix \mathbf{A} to understand this. Suppose we were to keep multiplying by \mathbf{A} and form the polynomial:

    \begin{align*} p = \alpha_0 \mathbf{I} + \alpha_1 \mathbf{A} + \alpha_2 \mathbf{A}^2 + \alpha_3 \mathbf{A}^3 + \dots + \alpha_k \mathbf{A}^k. \end{align*}

When we reach the minimum k for which some combination of \alpha makes p zero, k is said to be the degree of the minimal polynomial of \mathbf{A}. As an example, for a minimal polynomial of degree 3:

    \begin{align*} 0 = \alpha_0 \mathbf{I} + \alpha_1 \mathbf{A} + \alpha_2 \mathbf{A}^2 + \alpha_3 \mathbf{A}^3. \end{align*}

Multiplying throughout by \mathbf{A}^{-1}, rearranging, and then multiplying by \vec{b},

    \begin{align*} 0 = \alpha_0 \mathbf{A}^{-1} + \alpha_1 \mathbf{I} + \alpha_2 \mathbf{A} + \alpha_3 \mathbf{A}^2 \\ \Rightarrow \mathbf{A}^{-1} = -\frac{1}{\alpha_0} \left( \alpha_1 \mathbf{I} + \alpha_2 \mathbf{A} + \alpha_3 \mathbf{A}^2 \right) \\ \mathbf{A}^{-1}\vec{b} = -\frac{1}{\alpha_0} \left( \alpha_1 \vec{b} + \alpha_2 \mathbf{A}\vec{b} + \alpha_3 \mathbf{A}^2\vec{b} \right) \in \mathcal{K}_3(\mathbf{A}, \vec{b}). \end{align*}

Aha! So \vec{x} can be expressed in terms of \mathcal{K}_k(\mathbf{A}, \vec{b}), where k is the degree of the minimal polynomial of \mathbf{A}. Now there is a theorem that for an n \times n matrix, k \leq n. Hence the Krylov subspace is not a bad place to start looking for a solution. What it also means is that if we expand the space until \mathcal{K}_n(\mathbf{A}, \vec{b}), we will arrive at the exact solution for \vec{x}. Of course, the point of Krylov methods is to avoid that scenario, and hopefully get a satisfactory answer with as small a subspace as possible.

As the subspace \mathcal{K}_m grows, we run into another issue: the new vectors become increasingly close to each other. If one is familiar with the power iteration method for finding eigenvectors, one would recognise that \mathbf{A}^m \vec{b} approaches the dominant eigenvector of \mathbf{A} as m \rightarrow \infty. To circumvent this problem, we have to orthogonalize the vectors in \mathcal{K}_m.

To get a feel of what that means, imagine we have 2 vectors at a 0.1^{\circ} angle to each other (i.e. \vec{e}_1 = [1, 0]^T and \vec{e}_2 = [cos(0.1^{\circ}), sin(0.1^{\circ})]^T). We can describe any point in (x, y) as a linear combination of \vec{e}_1 and \vec{e}_2. For example, the vector \vec{q} = [0,1]^T can be expressed as -572.9572\vec{e}_1 + 572.9581\vec{e}_2, with \alpha_1 = -572.9572 and \alpha_2 = 572.9581. The coefficients were found by solving a system of equations:

    \begin{align*} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} \vec{e}_1,\vec{e}_2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \end{align*}

So we see that the coefficients are sensitive to the concatenated \vec{e} matrix. As you can imagine, the closer \vec{e}_1 and \vec{e}_2 are to each other, the worse the condition number of the matrix. If we lived in a world with unlimited floating-point precision, this would not be a problem at all (and many other problems which keep mathematicians up at night would disappear too). However, we still live in a world where unicorns are nonexistent, so we have to think of ways to circumvent this. One way would be to orthogonalize the vectors (i.e. let \vec{e}_1 = [1, 0]^T and \vec{e}_2 = [0, 1]^T) since the condition number of an orthonormal matrix is 1. The new vectors are still able to describe any point in the (x,y) plane, so we do not lose any territory with the change in basis.

Alright! Now that we have covered the gist of Krylov methods, we can take a closer look at one of the algorithms: Generalized Minimum Residual (GMRES) method. In this method, the basis vectors for the Krylov subspace are orthogonalized by the Arnoldi iteration. Stacking these orthogonalized basis vectors side by side results in the \mathbf{Q} matrix which is n \times m in size for the order m Krylov space. The new basis vector for the m+1 space is calculated from:

    \begin{align*} \mathbf{A} \mathbf{Q}_{m} = \mathbf{Q}_{m+1} \mathbf{\tilde{H}}_{m}. \end{align*}

\mathbf{\tilde{H}}_m is known as the Hessenberg matrix, which is (m+1) \times m in size. Remember that the columns of \mathbf{Q} are orthonormal, so \mathbf{Q}^T\mathbf{Q} = \mathbf{I}. If we restrict our search to \vec{x} = \mathbf{Q}_{m} \vec{y}, our \mathbf{A}\vec{x} = \vec{b} system can be rewritten as:

    \begin{align*} \mathbf{A} \mathbf{Q}_{m} \vec{y} &= \mathbf{Q}_{m+1} \mathbf{\tilde{H}}_{m} \vec{y} = \vec{b} \\ \Rightarrow \mathbf{\tilde{H}}_{m} \vec{y} &= \mathbf{Q}_{m+1}^T \vec{b}. \end{align*}

With \vec{y} = \mathbf{Q}_{m}^T \vec{x}. (Multiplying through by \mathbf{Q}_{m+1}^T loses nothing here, since \vec{b} lies in the span of \mathbf{Q}_{m+1}: its first column is \vec{b}/\Vert\vec{b}\Vert when \vec{x}_0 = \vec{0}.) Note that since \mathbf{\tilde{H}}_m is a (m+1) \times m matrix, we have an overdetermined system. We now solve a least squares problem for \vec{y}:

    \begin{align*} min \left\Vert \mathbf{\tilde{H}}_{m} \vec{y} - \mathbf{Q}_{m+1}^T \vec{b} \right\Vert = min \left\Vert \vec{r}_m \right\Vert \end{align*}

Hence the name Minimum Residual. The estimated \vec{x}_m is recovered from \vec{y} by \vec{x}_m = \mathbf{Q}_{m} \vec{y}, so we see that \vec{y} holds the coefficients of the vectors in the orthogonalized \{ \vec{b}, \mathbf{A}\vec{b}, \mathbf{A}^2\vec{b}, \dots , \mathbf{A}^{m-1}\vec{b} \}. In the algorithm, one starts from m=1, then expands the space with every iteration. As m increases, the sizes of \mathbf{Q} and \mathbf{\tilde{H}} grow, potentially overtaking the memory requirements of the original sparse \mathbf{A} matrix. Hence, most implementations restart the process once m is large enough (usually around m = 30), using the latest \vec{x} estimate as the new initial guess.

Algebraic multigrid preconditioners

As the name suggests, there are 2 main concepts to this: algebraic multigrid methods (AMG) and preconditioners. We will go through each concept individually before piecing them together.

We know that iterative linear solvers take an initial guessed solution \vec{x}_0 and improve on it to get the final answer. As a convenient guess, we can set \vec{x}_0 = \vec{0} or \vec{x}_0 = \vec{b}. This would result in a noisy residual vector \vec{r}_0 = \mathbf{A}\vec{x}_0 - \vec{b}. With fixed-point iteration methods like Jacobi or Gauss-Seidel, the noise in the residual gets damped down really quickly (usually within 2 to 3 iterations). After the first few iterations, the error becomes smooth and subsequent iterations drive it towards zero extremely slowly. Such methods are really effective at smoothing out spikes in the errors, but perform terribly thereafter.

Hence, the idea of multigrid methods is to transfer the smoothed error from one grid to another such that it becomes spiky again (albeit with reduced amplitude) on the other grid. The same smoothing operation is carried out in the new grid and the process continues recursively. After a few repetitions, the process reverses and we get a final small error vector in relatively few iterations. So how do we make the smooth error become spiky again?

There! As we can see from the diagram I took from Saad’s book, by removing every alternate point in the original grid, we get an oscillatory error in the new grid. The error is then smoothed (usually with only 2 to 3 Jacobi or Gauss-Seidel iterations) in this new grid, transferred to another coarser grid, and so on. The diagram shows the original geometric multigrid method, where the grids are defined physically. For the unstructured meshes commonly used, it becomes challenging to create physically coarser and coarser meshes. Hence, methods to do this algebraically (without considering the physical manifestation of the mesh) were heavily explored, resulting in algebraic multigrid methods.

To understand this a little more, we define the error term \vec{e} in terms of the actual solution \vec{x} and estimated solution \hat{\vec{x}}:

    \begin{gather*} \vec{e} = \vec{x} - \hat{\vec{x}} , \\ \mathbf{A}\left(\vec{x}-\hat{\vec{x}}\right) + \mathbf{A}\hat{\vec{x}} = \vec{b} \\ \Rightarrow \mathbf{A}\vec{e} = \vec{b} - \mathbf{A}\hat{\vec{x}} = \vec{r}. \end{gather*}

We see that by solving the linear system with the residual vector \vec{r} on the RHS, we get the error term which leads us to the actual solution from \hat{\vec{x}}. As we cycle through the grids, we get a more refined estimate of \vec{e}. The trick in algebraic multigrid methods is in manipulating the system without the need of any geometric information, allowing us to treat the process as a “blackbox”, which is an engineer’s favorite approach. In the algebraic method, we transfer results from the fine mesh to the coarse one (restriction operation) by using a short and fat interpolation matrix known as the restriction matrix (\mathbf{I}_f^c):

    \begin{align*} \vec{r}^{c} = \mathbf{I}_f^c  \vec{r}^{f} . \end{align*}

In the reverse direction (prolongation operation), we use a tall and skinny interpolation matrix \mathbf{I}_c^f = \left(\mathbf{I}_f^c\right)^T termed the prolongation matrix, so the restriction and prolongation matrices are just transposes of each other. Hence, when we want to cycle to the next coarse mesh, after dropping the fine-to-coarse notation for the short and fat matrix \mathbf{I}_f^c, we perform:

    \begin{align*} \left( \mathbf{I} \mathbf{A} \mathbf{I}^T \right) \vec{e}^{c} = \mathbf{I}  \vec{r}^{f} \\ \vec{e}^{f} = \mathbf{I}^T \vec{e}^{c} . \end{align*}

This is known as a Galerkin projection. We see that after solving the coarse system of equation, we get the error in the fine mesh by using the interpolation matrix again. The interpolation is thus a key ingredient in AMG methods. Here is a pseudo code that shows how AMG works between 2 grids:

x_f = smooth(A, b, x_0);              // Initial guess, 2-3 iterations on x_0
r_f = b - A*x_f;                      // Find residual
r_c = I*r_f;                          // Interpolate residual to coarse mesh
e_c = solve(I*A*I_transpose, r_c);    // Solve for error (or recursion)
e_f = I_transpose*e_c;                // Prolongate error back onto fine mesh
x_f += e_f;                           // Update guess
x_f = smooth(A, b, x_f);              // Additional smoothing, 2-3 iterations on x_f

At the finest grid, we perform a smoothing on the initial guess x_0. This means we “solve” A*x_f = b using fixed-point iterations like damped Jacobi or Gauss-Seidel with x_0 as the initial guess. By “solve”, I mean we stop after only 2 to 3 iterations. Then, we interpolate the residual to the next mesh, before solving the projected system (I*A*I_transpose)*e_c = r_c. In a two-grid setup, we solve this system for real. Once that is done, we go in the reverse direction: transfer the error term back to the fine mesh, update the solution, then perform another smoothing process with the updated solution as the initial guess.

With more grids, we simply do a recursion at the solve part until we reach the coarsest grid. A direct solver is then used. At the end of the process, if the residuals are not satisfactory, the obtained x_f is used as the new x_0, and the process loops until convergence. This process is known as a V-cycle.

The algebraic part of the process comes in the form of constructing \mathbf{I}. When solving a PDE with finite difference/volume/element, a node’s neighbor will cause a nonzero entry in its own row of the coefficient matrix. This can be used heuristically to group nodes together for the coarse mesh. A classical approach is the Ruge-Stuben method, which uses the strength of connection between each node and the off-diagonal nodes in its own row as a grouping criterion. A very detailed set of notes by Falgout from the Lawrence Livermore National Laboratory explaining the method and AMG in general can be found here.

Automatic differentiation

Yay! I finally get to talk about one of my favourite topics today: Automatic Differentiation (AD). I was horrendous at calculus in school, and learning about this was akin to a life changing experience for me – I never have to worry about differentiating long and complicated mathematical expressions again.

Sample code

All you need is some programming knowledge about template classes and poof, you can find even the Jacobian of your vector functions without any laborious manual differentiation and the subsequent programming effort. For a function that maps a 3-component input vector to a 3-component output vector, you save yourself from the misery 9 times over!

We’ll jump straight into an example:

#include <iostream>
#include <Eigen/Dense>
#include <unsupported/Eigen/AutoDiff>

using namespace std;

template<typename T>
T scalarFunctionOne(T const & x) {
  return 2*x*x + 3*x + 1;
};

void checkFunctionOne(double & x, double & dfdx) {
  dfdx = 4*x + 3;
}

template<typename T>
T scalarFunctionTwo(T const & x, T const & y) {
  return 2*x*x + 3*x + 3*x*y*y + 2*y + 1;
};

void checkFunctionTwo(double & x, double & y, double & dfdx, double & dfdy ) {
  dfdx = 4*x + 3 + 3*y*y;
  dfdy = 6*x*y + 2;
}

int main () {
  
  double x, y, z, dfdx, dgdy, dgdz;
  Eigen::AutoDiffScalar<Eigen::VectorXd> xA, yA, zA, fA, gA;
  
  cout << endl << "Testing scalar function with 1 input..." << endl;
  
  xA.value() = 1;
  xA.derivatives() = Eigen::VectorXd::Unit(1, 0);
  
  fA = scalarFunctionOne(xA);
  
  cout << "  AutoDiff:" << endl;
  cout << "    Function output: " << fA.value() << endl;
  cout << "    Derivative: " << fA.derivatives() << endl;
  
  x = 1;
  checkFunctionOne(x, dfdx);
  
  cout << "  Hand differentiation:" << endl;
  cout << "    Derivative: " << dfdx << endl << endl;
  
  cout << "Testing scalar function with 2 inputs..." << endl;
  
  yA.value() = 1;
  zA.value() = 2;
  
  yA.derivatives() = Eigen::VectorXd::Unit(2, 0);
  zA.derivatives() = Eigen::VectorXd::Unit(2, 1);
  
  gA = scalarFunctionTwo(yA, zA);
  
  cout << "  AutoDiff:" << endl;
  cout << "    Function output: " << gA.value() << endl;
  cout << "    Derivative: " << gA.derivatives()[0] << ", " 
    << gA.derivatives()[1] << endl;
  
  y = 1;
  z = 2;
  checkFunctionTwo(y, z, dgdy, dgdz);
  
  cout << "  Hand differentiation:" << endl;
  cout << "    Derivative: " << dgdy << ", " 
    << dgdz << endl;
  
  return EXIT_SUCCESS;
  
}

And the output:

Testing scalar function with 1 input...
  AutoDiff:
    Function output: 6
    Derivative: 7
  Hand differentiation:
    Derivative: 7

Testing scalar function with 2 inputs...
  AutoDiff:
    Function output: 22
    Derivative: 19, 14
  Hand differentiation:
    Derivative: 19, 14

First off, I created 4 functions: 2 to test the automatic differentiation method, and 2 for checking the results. scalarFunctionOne has 1 input while scalarFunctionTwo has 2 inputs. Both have only one output. Template programming is key to AD. We have to make our functions able to accept a generic input type other than the usual double. This is really not an issue at all for most problems. If you have already written the function, all you need to do is change the type declaration and voila, you are now able to tap into the automatic differentiation package.

We are using Eigen’s automatic differentiation package here. Eigen is an amazing C++ math library. Since it is header-only, you don’t have to worry about portability and distribution issues. The AutoDiffScalar type will handle the dual number component for us. I will explain a little more about dual numbers later.

We see that before passing the AutoDiffScalar types into our function, we need to set its value and derivative output index. For example, yA.derivatives() = Eigen::VectorXd::Unit(2, 0) means that we want the derivative of the function with respect to y to be written in the first index of the output derivative vector. We can now pass the AutoDiffScalars into our function and let Eigen do the magic.

Dual numbers

As advertised, we see that AutoDiff gives us the expected results without the need of creating new functions. So why does this work? We start with complex numbers. Let’s take the square of (1+2i):

    \begin{align*} (1+2i)^2 = 1 + 4i +4i^2 = -3 + 4i . \end{align*}

Complex number arithmetic works like algebra, except for the additional rule that i^2 = -1. Dual numbers work in the same way, with i replaced by \epsilon and the rule becoming \epsilon^2 = 0. Applying dual numbers to scalarFunctionOne:

    \begin{align*} f\left(x+\epsilon\right) &= 2\left(x+\epsilon\right)^2 + 3\left(x+\epsilon\right) + 1 \\ &= 2\left(x^2 + 2x\epsilon + \epsilon ^2\right) + 3\left(x+\epsilon\right) + 1 \\ &= 2\left(x^2 + 2x\epsilon\right) + 3\left(x+\epsilon\right) + 1 \\ &= 2 x^2 + 3x + 1 + \left(4x+3\right)\epsilon . \end{align*}

We see that the coefficient of \epsilon is equal to the derivative of the function! One way dual numbers can be implemented in programming is to define a dual number class with 2 components: a real component and a dual component. We then overload the operators on dual numbers (i.e. addition, subtraction, multiplication etc.) such that we can enforce \epsilon^2 = 0. Hence, one way to look at AD is to appreciate that no matter how complex a function is, it can be broken down into a collection of simpler operations that we can overload with dual numbers.

Templates supercharged

The toughest challenge I faced with AD was structuring the code to reduce repetition. I had created a base class for the various flux calculations of the different boundary conditions (BC), and their input and output array sizes are determined at runtime depending on whether the problem is 2D or 3D. These flux calculators also depend on a Gas class which I created to solve the thermodynamic relations (i.e. perfect gas). An example is:

class FluxRoe : public Flux {
  
  public:
  
    void initialize(Config * config, Gas * gas, Mesh * mesh) 
    {
      // Save pointers to required information. Gas class used in
      // vector function, so it also needs to be a template class
      // for AD.
    }
    
    // Vector function
    void computeFlux
    (
      double const * leftVariable, 
      double const * rightVariable, 
      double const * normal,
      double * flux
    ) 
    {
      // Array sizes determined at runtime. Output to flux.
    }
    
  protected:
  
    Gas * gas_;
      
};

Eigen provides a convenient AutoDiffJacobian class, which calculates the jacobian of your vector function without the need for you to manually define the AutoDiffScalar like we did previously. The AutoDiffJacobian class required me to structure my vector function in a certain way. Namely, I needed the vector function to be called via the () operator, and several variables had to be defined. To add on to the woes, I would definitely need a base class for the flux jacobians. An example of a vector function to be used with AutoDiffJacobian is:

template <short N_IN, short N_OUT>
class VectorFunction {
  
  public:
  
    // Eigen needs this to determine no. of variables at compile time
    enum 
    {
      InputsAtCompileTime = N_IN,
      ValuesAtCompileTime = N_OUT
    };
    
    // Also needed by Eigen
    typedef Eigen::Matrix<double, N_IN, 1> InputType;
    typedef Eigen::Matrix<double, N_OUT, 1> ValueType;
  
    // Vector function
    template <typename T>
    void operator()
    (
      const Eigen::Matrix<T, N_IN, 1>& vIn, 
      Eigen::Matrix<T, N_OUT, 1>* vOut
    ) const 
    {
      // Compute function with vIn and output to vOut.
    }
    
};

int main () {

  Eigen::Matrix<double, 3, 1> vIn;
  Eigen::Matrix<double, 3, 1> vOut;
  Eigen::Matrix<double, 3, 3> mJacobian;

  Eigen::AutoDiffJacobian< VectorFunction<3, 3> > vectorFunAD;
  
  // Set values in vIn, vOut and mJacobian...
  
  vectorFunAD(vIn, &vOut, &mJacobian); // Voila! jacobian is in mJacobian.
  
  return EXIT_SUCCESS;
  
}

With the 2 code snippets above, we see the disconnect between what we have and what is required. Just thinking back on these issues makes my head feel about to explode. I spent a good 3 weeks coming up with a satisfactory solution (at least to me). The initial thought was to copy and paste code into new jacobian classes for each individual BC and number of dimensions, but that would be a clumsy solution since many chunks of code would be repeated. For a more acceptable solution, I had to turn to static polymorphism and make use of template template parameters (yes, the word template is used twice).

The first major step was to make the Flux and Gas classes into template classes. i.e.

template<typename T>
class Flux {
  public:
    void computeFlux
    (
      T const * leftVariable, 
      T const * rightVariable, 
      double const * normal,
      T * flux
    ) 
    { 
      //... 
    }
  protected:
    Gas<T> * gas_;
};

The next major step was to apply static polymorphism to a wrapper class for Eigen:

template< template<typename> typename F, short N_LEFT, short N_RIGHT, short N_OUT >
class FluxWrapper {
  
  public:
    
    enum 
    {
      InputsAtCompileTime = N_LEFT+N_RIGHT,
      ValuesAtCompileTime = N_OUT
    };
    typedef Eigen::Matrix<double, InputsAtCompileTime, 1> InputType;
    typedef Eigen::Matrix<double, ValuesAtCompileTime, 1> ValueType;
    
    void setNormal (double const * normal) {
      normal_ = normal;
    }
    
    template <typename T>
    void operator()
    (
      const Eigen::Matrix<T, InputsAtCompileTime, 1>& vIn, 
      Eigen::Matrix<T, N_OUT, 1>* vOut
    ) const 
    {
      F<T> f; // Flux calculator instantiated with AutoDiff variable
      
      f.computeFlux(
        &( vIn[0] ),
        &( vIn[N_LEFT] ),
        normal_,
        &( (*vOut)[0]) );
      
    }
    
  protected:
  
    double const * normal_;
    
};

The various wrapped flux calculators are then created with:

typedef FluxWrapper<FluxRoe, 4, 4, 4> FluxRoeWrappedTwoDim;
typedef FluxWrapper<FluxRoe, 5, 5, 5> FluxRoeWrappedThreeDim;

Finally, I use static polymorphism again to create my individual jacobian classes which are derived from a base class:

template <typename F> 
class FluxJacobianWrapper : public FluxJacobian {
  public:
    void computeJacobian
    (
      double const * leftVariable,
      double const * rightVariable,
      double const * normal,
      double * leftJacobian,
      double * rightJacobian
    )
    {
      Eigen::Matrix<double, F::InputsAtCompileTime, 1> vIn;
      Eigen::Matrix<double, F::ValuesAtCompileTime, 1> vOut;
      Eigen::Matrix<double, F::ValuesAtCompileTime, F::InputsAtCompileTime> mJacobian;
    
      Eigen::AutoDiffJacobian<F> fluxAD; // Create flux class for AD
      
      fluxAD.setNormal(normal);
      
      // Map double array to Eigen storage type (MatrixXd etc.)...
      
      fluxAD(vIn, &vOut, &mJacobian);
      
      // Map jacobian values to double array...
      
    }
};

The flux jacobians are used by:

typedef FluxJacobianWrapper<FluxRoeWrappedTwoDim> FluxJacobianRoeTwoDim;
typedef FluxJacobianWrapper<FluxRoeWrappedThreeDim> FluxJacobianRoeThreeDim;

Great! Now I just need to code the various combination that may happen once and the appropriate code will be generated for me.

Characteristic form of the Euler equations

When I first learnt about eigenvalues and eigenvectors, I never thought they would actually be useful. Nowadays I can’t seem to avoid them even if I wanted to – Linear solvers? Don’t forget about the spectral radius! Control systems? Can’t go without the natural frequencies! Euler equations? Guess where the wave speeds come from!

Eigenvalues and eigenvectors intuitively

In order to understand the role of eigenvalues and eigenvectors in fluid dynamics, let’s turn to its sibling, structural dynamics, for a while. The canonical introductory example in structural dynamics is the problem of 2 weights connected by springs, as shown in the diagram I pulled from here.

The system can be described with constant mass and stiffness matrices by \mathbf{M}\ddot{\vec{x}} + \mathbf{K} \vec{x} = 0 when no forces are applied. \mathbf{K} has off-diagonal terms, so the system is coupled. Rearranging, we get \ddot{\vec{x}} + \mathbf{M}^{-1}\mathbf{K} \vec{x} = 0. By diagonalizing the matrix, one gets:

    \begin{align*} \ddot{\vec{x}} + \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^{-1} \vec{x} &= 0 \\ \mathbf{Q}^{-1} \ddot{\vec{x}} + \mathbf{\Lambda} \mathbf{Q}^{-1} \vec{x} &= 0 \\ \ddot{\vec{y}} + \mathbf{\Lambda} \vec{y} &= 0 . \end{align*}

We have made the transformation \vec{y} = \mathbf{Q}^{-1}\vec{x}. Since \mathbf{\Lambda} is a diagonal matrix, we get an uncoupled system of equations which is now trivial to solve. By projecting the state vector onto the eigenvectors, we are working in components that are independent of each other. Collectively, these independent components give the system its characteristics, like mode shapes in beams or principal components of a reduced system. Note that I am using the term characteristic very loosely here, since we are just trying to get an intuitive feel.

Fluid dynamics

So now that we’ve seen how to uncouple a system of equations, we can simply apply the same method to the fluid equations and we’re done! Simple, no? Okay, let’s see where that leads us with the Euler equations. Consider the one-dimensional case with conserved variables \vec{W}:

    \begin{gather*} \frac{\partial \vec{W}}{\partial t} + \frac{d \vec{F}}{d \vec{W}} \frac{\partial \vec{W}}{\partial x} = \frac{\partial \vec{W}}{\partial t} + \mathbf{A} \frac{\partial \vec{W}}{\partial x} = 0 \quad , \quad \mathbf{A} = \mathbf{T}\mathbf{\Lambda}\mathbf{T}^{-1} \\ \Rightarrow \frac{ \mathbf{T}^{-1} \partial \vec{W} }{\partial t} + \mathbf{\Lambda} \frac{ \mathbf{T}^{-1} \partial \vec{W} }{\partial x} = 0. \end{gather*}

So this leads us to the characteristic form with d\vec{C} = \mathbf{T}^{-1} d\vec{W}. Note that \mathbf{T} is not a constant matrix so we can’t bring it into the differential operator. Then the characteristic form looks like:

    \begin{align*} \frac{\partial \vec{C} }{\partial t} + \mathbf{\Lambda} \frac{\partial \vec{C} }{\partial x} = 0. \end{align*}

Despite the simple looking form of the equation, the full set of equations is:

    \begin{align*} \left(\frac{\partial \rho}{\partial t} - \frac{1}{a^2}\frac{\partial P}{\partial t}\right) + u \left( \frac{\partial \rho}{\partial x} - \frac{1}{a^2}\frac{\partial P}{\partial x} \right) &= 0 \\ \left( \frac{\partial u}{\partial t} + \frac{1}{\rho a}\frac{\partial P}{\partial t} \right) + \left(u+a\right)\left(\frac{\partial u}{\partial x} + \frac{1}{\rho a}\frac{\partial P}{\partial x} \right)&= 0 \\ \left( \frac{\partial u}{\partial t} - \frac{1}{\rho a}\frac{\partial P}{\partial t} \right) + \left(u-a\right)\left(\frac{\partial u}{\partial x} - \frac{1}{\rho a}\frac{\partial P}{\partial x} \right)&= 0 . \end{align*}

It’s still not too bad, we now have 3 advection equations which obey:

    \begin{align*} d\rho - \frac{dP}{a^2} = 0 \quad &for \quad \frac{dx}{dt} = u \\ du + \frac{dP}{\rho a} = 0 \quad &for \quad \frac{dx}{dt} = u+a \\ du - \frac{dP}{\rho a} = 0 \quad &for \quad \frac{dx}{dt} = u-a . \end{align*}

Integrating the left hand side of each equation gives us the characteristic variables which remain constant along the curve dx/dt = \lambda, where \lambda is either u, (u+a) or (u-a). The curves are called wave fronts since they are the paths along which the variables propagate. Let’s try to get the characteristic variable for the middle equation, for example:

    \begin{align*} \int du + \int \frac{dP}{\rho a} = const \quad &for \quad \frac{dx}{dt} = u+a \\ u + \int \frac{dP}{\rho a} = const \quad &for \quad \frac{dx}{dt} = u+a . \end{align*}

It seems like we’ve run into some trouble with the integration for the second term on the left. What about applying the perfect gas relation?

    \begin{align*} \int \frac{dP}{\rho a} = \sqrt{\frac{R}{\gamma}} \int \frac{dP}{P / \sqrt{T}}. \end{align*}

No dice. In fact, for the characteristic form of the Euler equations, only the first equation can be integrated, and it yields the entropy. The other characteristic variables do not have a physical manifestation in general (if the flow is homentropic, the integrals do evaluate, giving the classical Riemann invariants u \pm 2a/\left(\gamma-1\right), but that is a special case). In addition, the wave speeds u, (u+a) and (u-a) are themselves dependent on the solution, which does not lend itself to the simple solution we had previously hoped for.

So we’ve gone through all these efforts for nothing? Well, not really. The characteristic variables may not be integrable into something physical, but at least the wave fronts tell us something. For this one-dimensional problem, we see that information in the flow moves with wave speeds u, (u+a) and (u-a).

Now imagine for a moment that you are swimming alone in a large pool. As you pull your strokes, you generate ripples in the water. Looking ahead, you have a wave moving forward away from you and if you were to look back, you would also see a wave moving away behind you. Now outside of the ripple, the water is still, but within the ripple, the water has been influenced by your presence. Suppose a friend were to join you a little later and he’s caught up with your backward propagating ripple. You can imagine that the shape of his ripples will be affected by your presence, it is dependent on the state of the water surrounding him before he pulls his strokes.

This is very roughly what the characteristics tell us about the fluid. The characteristic variables are like the ripples, and they are broadcast at the speed of sound relative to the flow (i.e. (u+a) and (u-a)), as well as at the actual flow speed u. For a subsonic forward flow (u < a), we have 2 characteristics moving forward and one moving back. For example, when there is a change in area in a 1D flow, the air particles where the change occurs send 2 characteristics downwind and one upwind. It is this one characteristic being sent at (u-a) that allows the flow upwind of the area change to adjust gradually, accommodating the change in area.

Another point to note here, as was already pointed out, is that \mathbf{T} is not constant. Hence we run into difficulties integrating the characteristic variables. If we approximate it with a constant \mathbf{T}_0, we have for the second component of \vec{C}:

    \begin{gather*} c_2 = \int du + \frac{1}{\rho_0 a_0} \int dP = u + \frac{P}{\rho_0 a_0} = const \\ \Rightarrow \Delta c_2 = \Delta u + \frac{\Delta P}{\rho_0 a_0} . \end{gather*}

This would come in handy when we look at flux calculation later.