Inverting PDEs with adjoints

Most books and lectures spend a great deal of time teaching methods for solving PDEs – finding a solution given the equation and its coefficients. In the real world, however, we often do not know the parameters of the model clearly. Sometimes we instead perform experiments to measure the solution of the PDE and must recover the coefficients from the results – an inverse problem.

One such example is the estimation of the electrical conductivity of the earth's subsurface. Below is a figure taken from a paper here.

Here the electrical potential of the earth (\phi) is governed by the Poisson equation:

    \begin{align*} \nabla \cdot \left(\sigma \nabla \phi\right) = s \end{align*}

In the example above, the electrical potential is measured at probes inserted into the ground (\phi_{obs}) and a map of the conductivity \sigma is to be found. So we have the solution to the PDE at discrete probe locations, and the challenge is to find the coefficient \sigma. One might immediately think of a least-squares problem of the sort:

    \begin{align*} \min_{\sigma} &\sum_i \left( \phi_i - \phi_{obs,i} \right)^2 \\ \text{subject to } &\nabla \cdot \left(\sigma \nabla \phi\right) = s \end{align*}

To solve such a minimization, one could use gradient descent, which requires us to calculate the gradient of the cost function with respect to the conductivity. This is where adjoint methods come in.

Adjoint methods
In a more general case, the minimization problem is:

    \begin{align*} \min_{\beta}\ &J\left(\vec{u},\vec{\beta}\right) \\ \text{subject to}\ &N\left(\vec{u},\vec{\beta}\right) = 0 \text{ in}\ \Omega \\ \text{with}\ &a \frac{\partial \vec{u}}{\partial n} + b \vec{u} = g \text{ on}\ \partial \Omega \end{align*}

Here we are minimizing a cost function J over the design variables \vec{\beta}, where \vec{u} is the primal variable governed by the PDE operator N with a general Robin condition imposed on the boundary. The derivative (or sensitivity) \frac{d J}{d \vec{\beta}} has to be calculated.

If we only have one design variable, we can simply perturb it and solve the PDE twice to get the sensitivity by finite differences. However, \vec{\beta} usually contains a large number of elements, and finite differences quickly become intractable. For example, in the earth conductivity problem one may want a unique \sigma at every discrete mesh node, so the length of \vec{\beta} grows to the total number of nodes.

Using a mathematical trick, we can solve a (different) PDE once, then use the result to obtain as many sensitivities as we like at minimal additional cost. This trick can be applied either to the continuous equations or to the discretised equations; the two variants are known as the continuous adjoint formulation and the discrete adjoint formulation respectively. The discrete adjoint formulation is easier to grasp, so we dive into it with an example. We will also look at the continuous adjoint formulation later and see where it gets complicated.

Discrete adjoint formulation
Through the discretisation process, the original PDE is reduced to a set of residuals which we would like to drive to zero:

    \begin{align*} \vec{R}\left(\vec{u},\vec{\beta}\right) = 0 \end{align*}

This means that the residual is identically zero, and any derivative of a constant is zero. Linearising the residual with respect to the i^{th} design variable \beta_i:

    \begin{align*} \frac{d \vec{R}}{d \beta_i} = 0 &= \frac{\partial \vec{R}}{\partial \beta_i} + \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right) \frac{\partial \vec{u}}{\partial \beta_i} \\ \Rightarrow \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right) \frac{\partial \vec{u}}{\partial \beta_i} &= -\frac{\partial \vec{R}}{\partial \beta_i} \end{align*}

From the objective function, the sensitivity is:

    \begin{align*} \frac{d J}{d \beta_i} = \frac{\partial J}{\partial \beta_i} + \left(\frac{\partial J}{\partial \vec{u}}\right)^\intercal \frac{\partial \vec{u}}{\partial \beta_i} \end{align*}

This is what we are really after. It tells us how the objective function changes with the design variable and is used in the minimization. Note that the difficulty in this whole thing is calculating the second term \partial \vec{u}/ \partial \beta_i because, as we have seen from the first equation, this requires us to solve the linear system for n_\beta different right-hand sides.

Now we introduce a variable \vec{\phi} which is defined as:

    \begin{align*} \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right)^\intercal \vec{\phi} = \frac{\partial J}{\partial \vec{u}} \end{align*}

\vec{\phi} is going to be extremely helpful to us because the problematic second term is now:

    \begin{align*} \left(\frac{\partial J}{\partial \vec{u}}\right)^\intercal \frac{\partial \vec{u}}{\partial \beta_i} = \left[\left(\frac{\partial \vec{R}}{\partial \vec{u}}\right)^\intercal \vec{\phi} \right]^{\intercal} \frac{\partial \vec{u}}{\partial \beta_i} = \vec{\phi}^{\intercal} \left(\frac{\partial \vec{R}}{\partial \vec{u}}\right)\frac{\partial \vec{u}}{\partial \beta_i} = \vec{\phi}^{\intercal} \left(-\frac{\partial \vec{R}}{\partial \beta_i}\right) \end{align*}

Therefore the sensitivity now becomes:

    \begin{align*} \frac{d J}{d \beta_i} = \frac{\partial J}{\partial \beta_i} + \vec{\phi}^{\intercal} \left(-\frac{\partial \vec{R}}{\partial \beta_i}\right) \end{align*}

Here \partial J / \partial \beta_i and \partial \vec{R} / \partial \beta_i are easy and cheap to calculate, and we only need to solve the linear system once for \vec{\phi}, which is also known as the adjoint solution.
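As a minimal numerical sketch of this recipe (illustrative numpy code with a made-up linear residual, not any problem from this post; all names here are hypothetical):

```python
import numpy as np

# Toy discretised problem: residual R(u, beta) = A u - beta, so dR/du = A
# and dR/dbeta = -I, with objective J(u) = sum((u - u_obs)^2).
rng = np.random.default_rng(0)
n = 5
A = 4.0 * np.eye(n) + 0.1 * rng.normal(size=(n, n))  # stand-in for dR/du
u_obs = rng.normal(size=n)
beta = rng.normal(size=n)

u = np.linalg.solve(A, beta)        # primal solve: R(u, beta) = 0
dJ_du = 2.0 * (u - u_obs)           # partial J / partial u
phi = np.linalg.solve(A.T, dJ_du)   # one adjoint solve: (dR/du)^T phi = dJ/du
dR_dbeta = -np.eye(n)               # partial R / partial beta
dJ_dbeta = phi @ (-dR_dbeta)        # dJ/dbeta_i = phi^T (-dR/dbeta_i), since
                                    # partial J / partial beta = 0 here
```

A finite-difference check of any single component (perturb one \beta_i, re-solve the system, difference J) should agree with dJ_dbeta, but costs one extra solve per component – exactly the cost the single adjoint solve avoids.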

Discrete adjoint example
Now we move on to the more interesting stuff with an example. Suppose we have a 2D metal plate and we want to find its conductivity, which varies with temperature. We conduct an experiment in which the plate is subjected to a known heat flux (e.g. we heat up the middle of the plate) and we measure the temperature over the entire plate. The problem is thus:

    \begin{align*} \min_{k} J &= \min_{k} \sum_i \left( T_i - T_{obs,i} \right)^2 \\ \text{subject to } &\nabla \cdot \left(k \nabla T \right) = q \text{ in}\ \Omega\\ \text{with}\ T &= T_{bound} \text{ on}\ \partial \Omega \end{align*}

Discrete adjoint example truth values
We create some truth values with a known conductivity distribution.

The thermal conductivity follows

    \begin{align*} k = 13221 \times T^{-1} + 27 \end{align*}

Solving the heat equation using finite differences in Python gives us the temperature field, which we will use as our “experiment” result:

And the conductivity field we can use to assess our inverted results:

Discrete adjoint example solution method
Now on to solving the minimization problem. In this example, we allow k to vary at every grid point. The solution process is made up of two loops: an inner loop that solves the nonlinear system of equations for the primal solution using Newton’s method, and an outer loop in which the adjoint solution is found and used to update the conductivity at every grid point.

An overview of the solution process is:

Initialize T and k 
Start adjoint loop
    Start Newton's iteration loop
        Assemble dR/dT matrix and R vector
        Solve for delta T and update T
    Compute dJ/dT
    Solve for adjoint solution using transpose of dR/dT matrix calculated before
    Compute dR/dk matrix
    Loop through nodes
        Compute dJ/dk for each node
        Update k with -dJ/dk
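The loop above might be sketched in Python as follows (a schematic only: assemble_system and compute_dRdk are hypothetical helpers standing in for the assembly routines, and a fixed learning rate lr stands in for the k update):

```python
import numpy as np

def invert_conductivity(T, k, T_obs, assemble_system, compute_dRdk,
                        lr=1e-3, n_outer=50, n_newton=10):
    """Two-loop structure: Newton inner loop for the primal solve,
    adjoint solve plus steepest-descent k update in the outer loop."""
    for _ in range(n_outer):
        for _ in range(n_newton):              # inner loop: drive R(T, k) to 0
            dRdT, R = assemble_system(T, k)    # Jacobian dR/dT and residual R
            T = T + np.linalg.solve(dRdT, -R)  # Newton update of T
        dJdT = 2.0 * (T - T_obs)               # from the least-squares objective
        phi = np.linalg.solve(dRdT.T, dJdT)    # adjoint solve, reusing the Jacobian
        dRdk = compute_dRdk(T, k)              # partial R / partial k
        dJdk = -dRdk.T @ phi                   # sensitivity at every node
        k = k - lr * dJdk                      # update k against the gradient
    return T, k
```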

Newton’s method requires the Jacobian matrix \partial \vec{R}/\partial \vec{T}, while computing the sensitivity requires the term \partial \vec{R}/\partial \vec{k}. Luckily, we can compute both using automatic differentiation of a single function. We define the following function to compute the residual at an internal grid point using central differences:

def compute_R_int_discrete_k(input_vec):
  # input_vec = [T_i,j  T_i-1,j  T_i+1,j  T_i,j-1  T_i,j+1  q  k_i,j  k_i-1,j  k_i+1,j  k_i,j-1  k_i,j+1]
  # k_i,j times the Laplacian of T
  R_int_term_1 = input_vec[6]*(
      (input_vec[2] - 2*input_vec[0] + input_vec[1])/(DELTA_H**2) +
      (input_vec[4] - 2*input_vec[0] + input_vec[3])/(DELTA_H**2))
  # grad(k) dot grad(T)
  R_int_term_2 = (
      (input_vec[8] - input_vec[7])/(2*DELTA_H) *
      (input_vec[2] - input_vec[1])/(2*DELTA_H) +
      (input_vec[10] - input_vec[9])/(2*DELTA_H) *
      (input_vec[4] - input_vec[3])/(2*DELTA_H))
  # source term
  R_int_term_3 = input_vec[5]
  return R_int_term_1 + R_int_term_2 + R_int_term_3

The required input values are passed in as a single array to facilitate the use of the grad function from the autograd package. Details of the package can be found here. The function to calculate the derivative is then created:

compute_R_int_discrete_k_grad = grad(compute_R_int_discrete_k)

The function compute_R_int_discrete_k_grad returns a numpy array of the derivative of R with respect to the various inputs in input_vec. The derivative is calculated with one line of code. Perfect.

We now have one more quantity to calculate to solve for \vec{\phi}, and that is \partial J/\partial \vec{T}. This is simply:

    \begin{align*} \frac{\partial J}{\partial \vec{T}} = 2\left(\vec{T}-\vec{T}_{obs}\right) \end{align*}

Discrete adjoint example results
After trying a few “learning rates”, we are able to get the objective function to decrease nicely and achieve an RMS error of less than 4 degrees.

Below is a plot of the temperature and conductivity field.

It is difficult to get the conductivity to be as smooth as the truth because k varies independently at each node, which introduces spikes in the first derivative.

However, in the plot above we see that the optimised conductivity is very close to the truth.

One way to smooth the inverted field is to use basis functions to describe k. These can be functions of temperature or of space (i.e. let k=\sum \beta_i \psi_i\left(T\right) or k=\sum \beta_i \psi_i\left(x, y\right)). The design variables are now \beta, and minor changes have to be made to the calculation of R and J. With automatic differentiation, the calculation of \partial R/\partial \beta and \partial J/\partial \beta is again trivial.
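A minimal sketch of the temperature-basis variant (the basis \psi_0(T)=1, \psi_1(T)=1/T is chosen here only because it can represent the truth model above exactly; the choice of basis is up to the modeller):

```python
import numpy as np

def k_from_basis(beta, T):
    """Conductivity from temperature basis functions: k(T) = sum_i beta_i * psi_i(T)."""
    psi = np.stack([np.ones_like(T), 1.0 / T])  # psi_0 = 1, psi_1 = 1/T
    return beta @ psi

T = np.linspace(300.0, 900.0, 5)   # some sample temperatures
beta = np.array([27.0, 13221.0])   # coefficients reproducing the truth model
k = k_from_basis(beta, T)          # equals 13221/T + 27 at every sample
```

The chain-rule term \partial k/\partial \beta_i is then just \psi_i(T), so \partial R/\partial \beta follows from \partial R/\partial k by automatic differentiation as before.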

The complete set of Python scripts can be found here.

Lagrange multipliers
In the derivation of the discrete adjoint equation earlier, I brought up the equation \left(\partial \vec{R}/\partial \vec{u}\right)^\intercal \vec{\phi}=\partial J/\partial \vec{u} out of the blue, and it magically solved all our problems. Now let’s dig a little deeper and see where this term comes from.

In a constrained optimisation problem, we can form an augmented cost function known as the Lagrangian:

    \begin{align*} J_{a} = J + \vec{\phi}^\intercal \vec{R} \end{align*}

The additional term does not change the cost function, since \vec{R}=0 everywhere, and we are free to choose any vector for \vec{\phi}, the Lagrange multiplier. Taking the derivative as before,

    \begin{align*} \frac{d J_a}{d \beta} &= \frac{\partial J}{\partial \vec{u}}\frac{d \vec{u}}{d \beta} + \frac{\partial J}{\partial \beta} + \vec{\phi}^\intercal \left(\frac{\partial \vec{R}}{\partial \vec{u}}\frac{d \vec{u}}{d \beta} + \frac{\partial \vec{R}}{\partial \beta}\right) \\ &= \left(\frac{\partial J}{\partial \vec{u}} + \vec{\phi}^\intercal \frac{\partial \vec{R}}{\partial \vec{u}}\right)\frac{d \vec{u}}{d \beta} + \frac{\partial J}{\partial \beta} + \vec{\phi}^\intercal \frac{\partial \vec{R}}{\partial \beta} \end{align*}

Ah! We see that if we choose \vec{\phi} to make the term in the brackets on the last line vanish, i.e. \left(\partial \vec{R}/\partial \vec{u}\right)^\intercal \vec{\phi} = -\partial J/\partial \vec{u} (the adjoint equation from before, up to the sign convention on \vec{\phi}), we arrive at the same alternate calculation for the sensitivity as we had derived previously. Hence, by solving for one particular vector \vec{\phi}, we totally change the way d J/d \beta is calculated, saving a whole lot of computational effort. This is where the elegance of adjoint methods shines through for me.

Continuous adjoint formulation
The same concept is used in the derivation of the continuous adjoint formulation, however, there are a few additional steps. We will look at the plate conductivity example problem in 1D to derive the continuous adjoint formulation. First, we start with the cost function in the continuous formulation:

    \begin{align*} J &= \int_{\Omega} cT\, dx \\ J_a &= \int_{\Omega} cT\, dx + \int_{\Omega} \phi \left[\frac{\partial}{\partial x} \left( k \frac{\partial T}{\partial x} \right) - f\right] dx \end{align*}

Here c could be a penalty function, for example. We have diverged slightly from the example problem to make the derivation simpler. Taking the variation of the augmented cost function and assuming that the variation commutes with the derivatives inside the integral,

    \begin{align*} \delta J_a &= \frac{\partial J_a}{\partial k} \delta k + \frac{\partial J_a}{\partial T} \delta T \\ \delta J_a &= \int_{\Omega} c \delta T dx + \int_{\Omega} \phi \left[\frac{\partial}{\partial x} \left( \delta k \frac{\partial T}{\partial x} \right) + \frac{\partial}{\partial x} \left( k \frac{\partial \delta T}{\partial x} \right)\right] dx \end{align*}

Now we want to move \delta T outside the derivative, so we apply integration by parts twice to the last term:

    \begin{align*} \int \phi \frac{\partial}{\partial x} \left( k \frac{\partial \delta T}{\partial x} \right) dx &= \left[\phi k \frac{\partial \delta T}{\partial x}\right]_{x_0}^{x_1} - \int k \left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial \delta T}{\partial x}\right) dx \\ &= \left[\phi k \frac{\partial \delta T}{\partial x}\right]_{x_0}^{x_1} - \left[ \delta T\, k \left(\frac{\partial \phi}{\partial x}\right)\right]_{x_0}^{x_1} + \int \frac{\partial }{\partial x}\left( k \frac{\partial \phi}{\partial x}\right) \delta T\, dx \end{align*}

Substituting it back into the augmented cost function,

    \begin{align*} \delta J_a &= \int_{\Omega} \left[ c + \frac{\partial }{\partial x}\left(k \frac{\partial \phi}{\partial x}\right) \right] \delta T dx + \int_{\Omega} \phi \frac{\partial}{\partial x} \left( \delta k \frac{\partial T}{\partial x} \right) dx + boundary\ terms \end{align*}

Now, this step must be familiar because we are going to eliminate the term in the square brackets, giving us the adjoint equation:

    \begin{align*} \frac{\partial }{\partial x}\left( k \frac{\partial \phi}{\partial x}\right) = - c \end{align*}

This makes \partial J_a / \partial T = 0, since the terms paired with \delta T vanish. Now we want to get \delta k out of the derivative, so, you guessed it, we integrate the last term by parts once.

    \begin{align*} \int \phi \frac{\partial}{\partial x} \left( \delta k \frac{\partial T}{\partial x} \right) dx &= \left[\phi \delta k \frac{\partial T}{\partial x}\right]_{x_0}^{x_1} - \int \delta k \left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial T}{\partial x}\right) dx \end{align*}


    \begin{align*} \delta J = -\int_{\Omega} \delta k \left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial T}{\partial x}\right)dx + boundary\ terms \end{align*}

Ignoring the boundary terms for now, this tells us that the variation of the scalar objective function is \delta k weighted by -\left(\partial \phi / \partial x\right) \left(\partial T / \partial x\right) and integrated over the domain. Hence the local variation of J with k is:

    \begin{align*} \frac{d J}{d k} = -\left(\frac{\partial \phi}{\partial x}\right) \left(\frac{\partial T}{\partial x}\right) \end{align*}

(Note the minus sign carried over from the integration by parts.)

Back to the boundary terms: three terms popped out through the integration by parts process. In order to use the above relation to obtain the sensitivity, we have to pick appropriate boundary conditions for \phi such that these terms indeed vanish. This is similar to the natural boundary conditions that appear in FEM formulations, except that this time we have to handle them explicitly.
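As a numerical sanity check of the 1D derivation (a self-contained numpy sketch with uniform k, constant source f, constant weight c, and homogeneous Dirichlet conditions on both T and \phi so the boundary terms vanish; all values here are made up), the adjoint-based sensitivity matches a finite-difference perturbation of k:

```python
import numpy as np

n = 201
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
k, f, c = 2.0, 5.0, 3.0

def solve_poisson(k, rhs):
    """Solve d/dx(k du/dx) = rhs with u(0) = u(1) = 0, uniform k."""
    A = k * (np.diag(-2.0 * np.ones(n - 2))
             + np.diag(np.ones(n - 3), 1)
             + np.diag(np.ones(n - 3), -1)) / h**2
    u = np.zeros(n)
    u[1:-1] = np.linalg.solve(A, np.full(n - 2, rhs))
    return u

def trapezoid(y):
    """Trapezoidal rule on the uniform grid."""
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

T = solve_poisson(k, f)     # primal:  d/dx(k dT/dx) = f
phi = solve_poisson(k, -c)  # adjoint: d/dx(k dphi/dx) = -c

J = trapezoid(c * T)
# sensitivity of J to a uniform perturbation of k, via the adjoint relation
dJdk_adjoint = trapezoid(-np.gradient(phi, x) * np.gradient(T, x))

# finite-difference reference: perturb k and re-solve the primal problem
eps = 1e-6
dJdk_fd = (trapezoid(c * solve_poisson(k + eps, f)) - J) / eps
```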

Whew, that was a lengthy process, but we finally derived the continuous adjoint equations. Note that this is only for a single-variable Poisson equation in 1D. One can imagine how much more complicated the derivation gets for more complicated PDEs, such as the coupled RANS equations with turbulence models. For every new equation, its corresponding adjoint has to be derived and implemented separately. The treatment of the boundary conditions is another matter that has to be handled delicately.

Note that in the discrete formulation, there was no need to derive specific boundary conditions for the adjoint solution. With the naive implementation through automatic differentiation, we were able to code up an example relatively quickly. However, this approach does not scale well, as automatic differentiation uses much more memory and computational resources than the corresponding continuous implementation. Hence, the choice between the discrete and continuous adjoint formulations is a balance between programming effort and computational effort.

More adjoint examples
Adjoints are also used in CFD to do some really interesting things. Traditionally, they have been used to drive shape optimization and this functionality is available in most commercial and open source codes. For example, the objective function can be set to the aerodynamic efficiency of the model, and the sensitivity to the surface deformation is calculated. This involves a mesh morpher and additional terms in the sensitivity computation coming from the mesh deformation method. An example taken from here shows the results of running an optimisation cycle using adjoint methods on the back of a car. The baseline shape is on the right half and the optimised shape on the left. We can see that the optimised solution has a narrower boat tail and greater pressure recovery.



Another more recent example of adjoints in CFD is in the field of turbulence modelling. Machine learning has been applied to RANS models in order to improve their prediction over a larger range of flow regimes. A great paper is found here. The first step to training these models is gathering data and these data are obtained from field inversion – much like what we are doing here.

An example of the results from a tuned turbulence model from the quoted paper is shown above. The model was improved from the green line to the red line in the figure on the left. The middle figure shows the flow field based on the baseline RANS model, while the rightmost figure shows the flow field based on the machine-learned turbulence model. The machine-learned model was able to predict the post-stall characteristics of the airfoil accurately, which is a major deficiency of most current RANS models.

Another interesting fact: adjoint and automatic differentiation techniques have even found their way into finance, where they are used to speed up computation of the sensitivities of swap options to asset prices and volatility.
