Principal component analysis

Principal component analysis (PCA) goes by many names depending on the field of application – proper orthogonal decomposition, eigenvalue decomposition, and so on. In fluid dynamics, among other things, it can be used in flow control. Let's say we want to design an active jet along the top surface of a wing to control the wake behind it. For example, maybe we want to puff air into the flow at the right moment at high angles of attack to reduce the wake size and hence the pressure drag.

In other words, we want to reduce the blue area in the picture above, which I took from a great paper by Taira et al. As we all know, the flow behind an airfoil at such angles of attack is highly oscillatory and dynamic. A typical CFD analysis like this requires millions of cells. How, then, can we describe such a highly complicated flow field? With the CFD data, we can describe the flow field as a huge vector of velocities at every cell. Such a vector will be on the order of 10^{6}\times 1. Obviously this would result in an infeasible state-space model for control.

Using our intuition, we know that the flow will be dominated by certain vortices shedding at certain frequencies. We can think of the flow field as a superposition of a mean flow with such vortices. Thus we start by adding the big vortices to the mean flow field, slowly adding on smaller and smaller vortices which contribute less and less to it.

Now our complicated flow field can be represented by just a handful of major flow features! This is shown visually with images I pulled from the same paper by Taira et al. Put another way, we have extracted the principal components of the flow field based on a set of snapshots of the data. To understand how these principal components are extracted, we have to look at a more modest example with vectors of size 2\times 1.

Understanding PCA

In this example, we have a set of data represented by 2 states. For example, it could be a dataset of the height and weight of each student in a class. Plotting such data looks like:

Stacking the data vectors on top of each other, we have our data matrix \mathbf{X}, which is an n\times p matrix; in our case it is 5\times 2 in size. (Strictly speaking, PCA is usually applied after subtracting the mean of each column from \mathbf{X}, so that \mathbf{X}^T\mathbf{X} is proportional to the covariance matrix, but the procedure is the same either way.) To find the principal components, we simply find the eigenvectors of \mathbf{X}^T\mathbf{X}, resulting in \mathbf{W}, a matrix with all the eigenvectors stacked side by side (i.e. \mathbf{W}=\left[\vec{W}_1, \vec{W}_2, \cdots\right]). The eigenvector corresponding to the largest eigenvalue represents the most dominant feature of the dataset, the eigenvector corresponding to the second largest eigenvalue represents the second most dominant feature, and so on. This is where most explanations of PCA stop. However, to get a better understanding of what is going on, let's see what the first eigenvector represents.
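To make this concrete, here is a minimal numpy sketch of that step. The height/weight numbers are made up for illustration; they just stand in for the 5\times 2 data matrix described above:

```python
import numpy as np

# Hypothetical 5x2 data matrix: rows are students, columns are
# (height in cm, weight in kg). Numbers are made up for illustration.
X = np.array([[160.0, 55.0],
              [165.0, 60.0],
              [170.0, 66.0],
              [175.0, 70.0],
              [180.0, 77.0]])

# Eigendecomposition of the symmetric 2x2 matrix X^T X.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, W = np.linalg.eigh(X.T @ X)

# Reorder so the first column of W is the most dominant direction.
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

print(eigvals)   # largest eigenvalue first
print(W[:, 0])   # first principal direction (a unit vector)
```

The columns of `W` returned by `eigh` are already unit length, which matters for the normalization condition discussed below.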

To find the direction which most closely describes the data points, we want to reduce the angle \vec{W}_1 makes with each data vector \vec{X}_i, denoted \theta in the diagram above. This also means that we want the dot product \left(\vec{W}_1 \cdot \vec{X}_i\right) to be maximized over all our data points, resulting in the condition:

    \begin{align*} \max \left[\sum_i  \left(\vec{W}_1 \cdot \vec{X}_i\right)^2 \right] . \end{align*}

Note that we square the dot product so that projections on either side of \vec{W}_1 do not cancel out. The equivalent matrix form is:

    \begin{align*} \max \left[\vec{W}_1^T \left(\mathbf{X}^T\mathbf{X}\right) \vec{W}_1  \right] . \end{align*}

This would give us the direction of \vec{W}_1, but we need to further impose a condition that \|\vec{W}_1\|=1, so we get:

    \begin{align*} \max \left[\frac{\vec{W}_1^T \left(\mathbf{X}^T\mathbf{X}\right) \vec{W}_1}{\vec{W}_1^T\vec{W}_1}  \right] . \end{align*}

The problem is now reduced to maximizing a Rayleigh quotient: R\left(\mathbf{A}, \vec{v}\right) = \left(\vec{v}^T\mathbf{A}\vec{v}\right)/\left(\vec{v}^T\vec{v}\right), where \mathbf{A} is symmetric. It can be proved that the maximum and minimum of the Rayleigh quotient are the largest and smallest eigenvalues of \mathbf{A}, attained when \vec{v} is the eigenvector corresponding to the largest or smallest eigenvalue. As a quick but not at all rigorous check, if we substitute any eigenvector \vec{\phi} of \mathbf{A} into the Rayleigh quotient:

    \begin{align*} R\left(\mathbf{A}, \vec{\phi}\right) = \frac{\vec{\phi}^T\mathbf{A}\vec{\phi}}{\vec{\phi}^T\vec{\phi}} = \frac{\vec{\phi}^T\lambda\vec{\phi}}{\vec{\phi}^T\vec{\phi}} = \lambda . \end{align*}

So we see how the maximum/minimum of the Rayleigh quotient comes from the eigenvectors. Of course, we did not look into what happens when \vec{v} is not an eigenvector of \mathbf{A}, but that is outside our scope here.
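If you don't want to take the bounds on faith, a quick numerical sketch makes them plausible. Here a random symmetric matrix stands in for \mathbf{X}^T\mathbf{X}, and we check that random directions never beat the extreme eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric 4x4 matrix standing in for X^T X.
M = rng.standard_normal((4, 4))
A = M + M.T

def rayleigh(A, v):
    """Rayleigh quotient R(A, v) = (v^T A v) / (v^T v)."""
    return (v @ A @ v) / (v @ v)

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order

# Sample many random directions: R(A, v) never exceeds the largest
# eigenvalue and never drops below the smallest.
samples = [rayleigh(A, rng.standard_normal(4)) for _ in range(1000)]
assert max(samples) <= eigvals[-1] + 1e-9
assert min(samples) >= eigvals[0] - 1e-9

# Substituting an eigenvector returns its eigenvalue, as derived above.
print(rayleigh(A, eigvecs[:, -1]))
```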

Okay, back to our discussion on PCA, we now know that \vec{W}_1 should be the eigenvector corresponding to the largest eigenvalue of \mathbf{X}^T\mathbf{X}.

So now we have found \vec{W}_1, shown as the blue arrow above. What about the remaining components? We subtract from each data point \vec{X}_i its component in the \vec{W}_1 direction and apply the same procedure. The magnitude of \vec{X}_i along \vec{W}_1 is simply the dot product, so the subtracted vector, which we denote \hat{\vec{X}}_i, is \hat{\vec{X}}_i = \vec{X}_i - \left(\vec{X}_i \cdot \vec{W}_1\right) \vec{W}_1.

So here we see that after subtraction, our black markers follow the dotted line to the red markers. To find the second component, we find the eigenvectors corresponding to the largest eigenvalue for the red markers:

Great! So now we have all the components for the dataset. As you can see from the blue arrow, we can describe our data pretty accurately in just that direction. The subtraction process is akin to deflation in most eigenvector search algorithms. So the principal components we are looking for are just the eigenvectors of \mathbf{X}^T\mathbf{X}, as frequently explained. In summary:
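The whole find-then-deflate procedure fits in a few lines of numpy. This is a sketch on the same made-up 5\times 2 height/weight data as before; the last print confirms that the second component found after deflation is orthogonal to the first:

```python
import numpy as np

# Same hypothetical height/weight data as before (made up numbers).
X = np.array([[160.0, 55.0],
              [165.0, 60.0],
              [170.0, 66.0],
              [175.0, 70.0],
              [180.0, 77.0]])

def leading_direction(X):
    """Eigenvector of X^T X with the largest eigenvalue (unit length)."""
    vals, vecs = np.linalg.eigh(X.T @ X)   # ascending order
    return vecs[:, -1]

# First principal component.
w1 = leading_direction(X)

# Deflation: subtract each point's component along w1 ...
X_hat = X - np.outer(X @ w1, w1)

# ... then the leading direction of the residual is the 2nd component.
w2 = leading_direction(X_hat)

print(abs(w1 @ w2))   # ~0: the two components are orthogonal
```

This matches taking both eigenvectors of \mathbf{X}^T\mathbf{X} directly, which is why in practice one eigendecomposition gives all the components at once.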

Example with faces

Anything that can be described as a vector of numbers can be analyzed with PCA, be it the velocity field behind an airfoil, the height and weight of students, or even images (since they are just a grid of pixel values). A quick search on the internet yielded a database of human faces from AT&T Laboratories Cambridge, which I thought would make an interesting test case.

All the face images are shown in the picture above. Each image was saved as a 4096 \times 1 vector of numbers. Each entry in the vector represents a pixel, and its value represents the darkness of that pixel. Finding the first k principal components was as simple as calling [V, D] = eigs(X'*X, k) in Matlab. Plotting the first few dominant modes:
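For readers without Matlab, the equivalent step in numpy is a full eigendecomposition followed by keeping the top k columns. The data below is a random stand-in (smaller than 4096 pixels, to keep it quick); in the actual experiment each row of X is one flattened face image:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the face data: 40 "images" flattened to 256-long
# vectors. Random here; in the post each row is a grayscale face.
X = rng.random((40, 256))

k = 16
# Eigendecomposition of the symmetric matrix X^T X, keeping the k
# eigenvectors with the largest eigenvalues: the numpy analogue of
# Matlab's [V, D] = eigs(X'*X, k).
vals, vecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
V = vecs[:, -k:][:, ::-1]              # columns: most dominant first
D = vals[-k:][::-1]
```

For the real 4096-pixel case, an iterative sparse eigensolver (as Matlab's `eigs` uses) avoids decomposing the full 4096 \times 4096 matrix.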

We see that the first dominant mode indeed captures the main features of the human face. It's literally what a generic face looks like. Also, note that the nose and lip areas are quite blurred. Looking at the later modes, we see that glasses and different lip and nose shapes become more prominent. Hence as we add the modes, we are adding definition to those areas.

What about the results? The original pictures are on the left, while the reconstructed ones are on the right. Not bad! We have reduced our original 4096 \times 1 vector by a factor of 256, down to a 16 \times 1 vector which describes the faces pretty accurately.
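The compression and reconstruction steps themselves are just two matrix products: project each image onto the top-k components, then sum the weighted components back up. A sketch, again on random stand-in data rather than the actual faces:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in data: 40 "images" of 256 pixels (random; the post uses faces).
X = rng.random((40, 256))

k = 16
vals, vecs = np.linalg.eigh(X.T @ X)
V = vecs[:, -k:]                 # top-k principal directions, orthonormal

# Compress: each image becomes just a k-vector of coefficients ...
coeffs = X @ V                   # shape (40, 16)

# ... and reconstruct by summing the weighted principal components.
X_rec = coeffs @ V.T             # shape (40, 256)

err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
print(err)                       # relative reconstruction error
```

On random data the error is of course larger than on faces, which have strong common structure for the leading modes to capture.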

In this post, we have seen how to extract the main features of any dataset, and looked at PCA in more detail than the usual steps and formulas. We have also seen how it helps in flow control problems by reducing the dimensionality of the problem.
