Automatic differentiation

Yay! I finally get to talk about one of my favourite topics today: Automatic Differentiation (AD). I was horrendous at calculus in school, and learning about this was akin to a life-changing experience for me – I never have to worry about differentiating long and complicated mathematical expressions again.

Sample code

All you need is some programming knowledge about template classes and poof, you can find even the Jacobian of your vector functions without any laborious manual differentiation and the subsequent programming effort. For a function that maps a 3-element input vector to a 3-element output vector, you save yourself from the misery 9 times over!

We’ll jump straight into an example:

#include <cstdlib>
#include <iostream>
#include <Eigen/Dense>
#include <unsupported/Eigen/AutoDiff>

using namespace std;

template<typename T>
T scalarFunctionOne(T const & x) {
  return 2*x*x + 3*x + 1;
}

void checkFunctionOne(double & x, double & dfdx) {
  dfdx = 4*x + 3;
}

template<typename T>
T scalarFunctionTwo(T const & x, T const & y) {
  return 2*x*x + 3*x + 3*x*y*y + 2*y + 1;
}

void checkFunctionTwo(double & x, double & y, double & dfdx, double & dfdy ) {
  dfdx = 4*x + 3 + 3*y*y;
  dfdy = 6*x*y + 2;
}

int main () {
  
  double x, y, z, dfdx, dgdy, dgdz;
  Eigen::AutoDiffScalar<Eigen::VectorXd> xA, yA, zA, fA, gA;
  
  cout << endl << "Testing scalar function with 1 input..." << endl;
  
  xA.value() = 1;
  xA.derivatives() = Eigen::VectorXd::Unit(1, 0);
  
  fA = scalarFunctionOne(xA);
  
  cout << "  AutoDiff:" << endl;
  cout << "    Function output: " << fA.value() << endl;
  cout << "    Derivative: " << fA.derivatives() << endl;
  
  x = 1;
  checkFunctionOne(x, dfdx);
  
  cout << "  Hand differentiation:" << endl;
  cout << "    Derivative: " << dfdx << endl << endl;
  
  cout << "Testing scalar function with 2 inputs..." << endl;
  
  yA.value() = 1;
  zA.value() = 2;
  
  yA.derivatives() = Eigen::VectorXd::Unit(2, 0);
  zA.derivatives() = Eigen::VectorXd::Unit(2, 1);
  
  gA = scalarFunctionTwo(yA, zA);
  
  cout << "  AutoDiff:" << endl;
  cout << "    Function output: " << gA.value() << endl;
  cout << "    Derivative: " << gA.derivatives()[0] << ", " 
    << gA.derivatives()[1] << endl;
  
  y = 1;
  z = 2;
  checkFunctionTwo(y, z, dgdy, dgdz);
  
  cout << "  Hand differentiation:" << endl;
  cout << "    Derivative: " << dgdy << ", " 
    << dgdz << endl;
  
  return EXIT_SUCCESS;
  
}

And the output:

Testing scalar function with 1 input...
  AutoDiff:
    Function output: 6
    Derivative: 7
  Hand differentiation:
    Derivative: 7

Testing scalar function with 2 inputs...
  AutoDiff:
    Function output: 22
    Derivative: 19, 14
  Hand differentiation:
    Derivative: 19, 14

First off, I created 4 functions: 2 to test the automatic differentiation method, and 2 to check the results. scalarFunctionOne has 1 input while scalarFunctionTwo has 2 inputs; both have a single output. Template programming is key to AD: we have to make our functions accept a generic input type rather than the usual double. This is really not an issue at all for most problems. If you have already written the function, all you need to do is change the type declaration and voila, you can now tap into the automatic differentiation package.

We are using Eigen’s automatic differentiation package here. Eigen is an amazing C++ math library. Being a header-only library, you don’t have to worry about portability and distribution issues. The AutoDiffScalar type will handle the dual number component for us. I will explain a little more about dual numbers later.

We see that before passing the AutoDiffScalar types into our function, we need to set their values and derivative output indices. For example, yA.derivatives() = Eigen::VectorXd::Unit(2, 0) means that we want the derivative of the function with respect to y to be written to the first index of the derivative vector (which has length 2). We can now pass the AutoDiffScalars into our function and let Eigen do the magic.

Dual numbers

As advertised, we see that AutoDiff gives us the expected results without needing to create new functions. So why does this work? We start with complex numbers. Let’s take the square of (1+2i):

    \begin{align*} (1+2i)^2 = 1 + 4i +4i^2 = -3 + 4i . \end{align*}

Complex number arithmetic works like ordinary algebra, except for the additional rule that i^2 = -1. Dual numbers work the same way, with i replaced by \epsilon and the rule \epsilon^2 = 0. Applying dual numbers to scalarFunctionOne:

    \begin{align*} f\left(x+\epsilon\right) &= 2\left(x+\epsilon\right)^2 + 3\left(x+\epsilon\right) + 1 \\ &= 2\left(x^2 + 2x\epsilon + \epsilon^2\right) + 3\left(x+\epsilon\right) + 1 \\ &= 2\left(x^2 + 2x\epsilon\right) + 3\left(x+\epsilon\right) + 1 \\ &= 2 x^2 + 3x + 1 + \left(4x+3\right)\epsilon . \end{align*}

We see that the coefficient of \epsilon is equal to the derivative of the function! One way to implement dual numbers in code is to define a dual number class with 2 components: a real component and a dual component. We then overload the operators on dual numbers (addition, subtraction, multiplication, etc.) so that \epsilon^2 = 0 is enforced. Hence, one way to look at AD is to appreciate that no matter how complex a function is, it can be broken down into a collection of simpler operations that we can overload with dual numbers.
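
To make this concrete, here is a minimal sketch of such a dual number class (the Dual type and its operators below are my own illustration, not Eigen's internal implementation):

#include <iostream>

// Minimal dual number: a real (value) part and a dual (derivative) part.
struct Dual {
  double real; // function value
  double dual; // coefficient of epsilon, i.e. the derivative
  Dual(double r, double d = 0.0) : real(r), dual(d) {}
};

// (a + b*eps) + (c + d*eps) = (a + c) + (b + d)*eps
Dual operator+(Dual const & l, Dual const & r) {
  return Dual(l.real + r.real, l.dual + r.dual);
}

// (a + b*eps)*(c + d*eps) = a*c + (a*d + b*c)*eps, since eps^2 = 0
Dual operator*(Dual const & l, Dual const & r) {
  return Dual(l.real * r.real, l.real * r.dual + l.dual * r.real);
}

int main() {
  Dual x(1.0, 1.0);                            // seed the derivative: dx/dx = 1
  Dual f = Dual(2)*x*x + Dual(3)*x + Dual(1);  // same as scalarFunctionOne
  std::cout << f.real << " " << f.dual << std::endl; // prints 6 7
  return 0;
}

Running this reproduces the value 6 and derivative 7 that we obtained from Eigen above.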

Templates supercharged

The toughest challenge I faced with AD was structuring the code to reduce repetition. I had created a base class for the flux calculations of the various boundary conditions (BC), and their input and output array sizes are determined at runtime depending on whether the problem is 2D or 3D. These flux calculators also depend on a Gas class which I created to handle the thermodynamic relations (e.g. a perfect gas). An example is:

class FluxRoe : public Flux {
  
  public:
  
    void initialize(Config * config, Gas * gas, Mesh * mesh) 
    {
      // Save pointers to required information. Gas class used in
      // vector function, so it also needs to be a template class
      // for AD.
    }
    
    // Vector function
    void computeFlux
    (
      double const * leftVariable, 
      double const * rightVariable, 
      double const * normal,
      double * flux
    ) 
    {
      // Array sizes determined at runtime. Output to flux.
    }
    
  protected:
  
    Gas * gas_;
      
};

Eigen provides a convenient AutoDiffJacobian class, which calculates the Jacobian of your vector function without you having to manually define the AutoDiffScalar like we did previously. The AutoDiffJacobian class requires the vector function to be structured in a certain way: it has to be callable via the () operator, and several enums and typedefs have to be defined. To add to the woes, I would definitely need a base class for the flux Jacobians. An example of a vector function to be used with AutoDiffJacobian is:

template <short N_IN, short N_OUT>
class VectorFunction {
  
  public:
  
    // Eigen needs this to determine no. of variables at compile time
    enum 
    {
      InputsAtCompileTime = N_IN,
      ValuesAtCompileTime = N_OUT
    };
    
    // Also needed by Eigen
    typedef Eigen::Matrix<double, N_IN, 1> InputType;
    typedef Eigen::Matrix<double, N_OUT, 1> ValueType;
  
    // Vector function
    template <typename T>
    void operator()
    (
      const Eigen::Matrix<T, N_IN, 1>& vIn, 
      Eigen::Matrix<T, N_OUT, 1>* vOut
    ) const 
    {
      // Compute function with vIn and output to vOut.
    }
    
};

int main () {

  Eigen::Matrix<double, 3, 1> vIn;
  Eigen::Matrix<double, 3, 1> vOut;
  Eigen::Matrix<double, 3, 3> mJacobian;

  Eigen::AutoDiffJacobian< VectorFunction<3, 3> > vectorFunAD;
  
  // Set values in vIn (vOut and mJacobian are outputs)...
  
  vectorFunAD(vIn, &vOut, &mJacobian); // Voila! jacobian is in mJacobian.
  
  return EXIT_SUCCESS;
  
}

With the 2 code snippets above, we see the disconnect between what we have and what is required. Just thinking back on these issues makes my head feel like it is about to explode. I spent a good 3 weeks coming up with a satisfactory solution (at least to me). The initial thought was to copy and paste code into new Jacobian classes for each individual BC and number of dimensions, but that would be a clumsy solution since many chunks of code would be repeated. For a more acceptable solution, I had to turn to static polymorphism and make use of template template parameters (yes, the word template is used twice).

The first major step was to make the Flux and Gas classes into template classes, i.e.

template<typename T>
class Flux {
  public:
    void computeFlux
    (
      T const * leftVariable, 
      T const * rightVariable, 
      double const * normal,
      T * flux
    ) 
    { 
      //... 
    }
  protected:
    Gas<T> * gas_;
};

The next major step was to apply static polymorphism to a wrapper class for Eigen:

template< template<typename> typename F, short N_LEFT, short N_RIGHT, short N_OUT >
class FluxWrapper {
  
  public:
    
    enum 
    {
      InputsAtCompileTime = N_LEFT+N_RIGHT,
      ValuesAtCompileTime = N_OUT
    };
    typedef Eigen::Matrix<double, InputsAtCompileTime, 1> InputType;
    typedef Eigen::Matrix<double, ValuesAtCompileTime, 1> ValueType;
    
    void setNormal (double const * normal) {
      normal_ = normal;
    }
    
    template <typename T>
    void operator()
    (
      const Eigen::Matrix<T, N_LEFT + N_RIGHT, 1>& vIn, 
      Eigen::Matrix<T, N_OUT, 1>* vOut
    ) const 
    {
      F<T> f; // Flux calculator instantiated with AutoDiff variable
      
      f.computeFlux(
        &( vIn[0] ),
        &( vIn[N_LEFT] ),
        normal_,
        &( (*vOut)[0]) );
      
    }
    
  protected:
  
    double const * normal_;
    
};

The various wrapped flux calculators are then created with:

typedef FluxWrapper<FluxRoe, 4, 4, 4> FluxRoeWrappedTwoDim;
typedef FluxWrapper<FluxRoe, 5, 5, 5> FluxRoeWrappedThreeDim;

Finally, I use static polymorphism again to create my individual Jacobian classes, which are derived from a base class:

template <typename F> 
class FluxJacobianWrapper : public FluxJacobian {
  public:
    void computeJacobian
    (
      double const * leftVariable,
      double const * rightVariable,
      double const * normal,
      double * leftJacobian,
      double * rightJacobian
    )
    {
      Eigen::Matrix<double, F::InputsAtCompileTime, 1> vIn;
      Eigen::Matrix<double, F::ValuesAtCompileTime, 1> vOut;
      Eigen::Matrix<double, F::ValuesAtCompileTime, F::InputsAtCompileTime> mJacobian;
    
      Eigen::AutoDiffJacobian<F> fluxAD; // Create flux class for AD
      
      fluxAD.setNormal(normal);
      
      // Map double array to Eigen storage type (MatrixXd etc.)...
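      // A sketch (my assumption: the left and right states each take up
      // half of vIn, i.e. F::InputsAtCompileTime / 2 entries):
      //   vIn << Eigen::Map<const Eigen::VectorXd>(leftVariable,  F::InputsAtCompileTime / 2),
      //          Eigen::Map<const Eigen::VectorXd>(rightVariable, F::InputsAtCompileTime / 2);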
      
      fluxAD(vIn, &vOut, &mJacobian);
      
      // Map jacobian values to double array...
      
    }
};

The flux Jacobian calculators for 2D and 3D are then defined with:

typedef FluxJacobianWrapper<FluxRoeWrappedTwoDim> FluxJacobianRoeTwoDim;
typedef FluxJacobianWrapper<FluxRoeWrappedThreeDim> FluxJacobianRoeThreeDim;
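
As a rough usage sketch (the nDim flag and the double arrays below are my own placeholders, and I assume the FluxJacobian base class declares computeJacobian as a virtual function), the appropriate calculator can then be selected at runtime:

// Pick the wrapped Jacobian calculator for the problem dimension.
FluxJacobian * fluxJacobian;
if (nDim == 2) {
  fluxJacobian = new FluxJacobianRoeTwoDim();
} else {
  fluxJacobian = new FluxJacobianRoeThreeDim();
}

// leftVariable, rightVariable, normal, leftJacobian and rightJacobian
// are plain double arrays sized for the problem at hand.
fluxJacobian->computeJacobian(leftVariable, rightVariable, normal,
                              leftJacobian, rightJacobian);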

Great! Now I only need to write the code once, and the appropriate classes for the various combinations of BC and dimensions are generated for me.
