SVD > PCA
I wrote a blog post about Robust PCA. As a prerequisite for readers, I will explain what SVD and PCA are. As we shall see, PCA is essentially SVD, and learning these two will be a nice segue into Robust PCA.
SVD
Formula
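Any matrix $X \in \mathbb{R}^{m \times n}$ can be written as
$$X = U \Sigma V^T$$
where $U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix,
$V \in \mathbb{R}^{n \times n}$ is also an orthogonal matrix,
and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix.
The diagonal values of $\Sigma$ are called singular values. SVD shows that we can decompose any rectangular matrix into three matrices with nice properties (i.e. orthogonal and diagonal).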
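To make the decomposition concrete, here is a minimal NumPy sketch. The names `X`, `U`, `s`, `Vt` and `Sigma` and the 5 by 3 random matrix are assumptions chosen for illustration. Note that `np.linalg.svd` returns the singular values as a 1-D array and returns the transpose of the right orthogonal factor, so the rectangular diagonal matrix has to be rebuilt by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # arbitrary rectangular example matrix

# full_matrices=True gives square orthogonal factors: U is 5x5, Vt is 3x3
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the 5x3 rectangular "diagonal" matrix from the singular values
Sigma = np.zeros(X.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

X_rebuilt = U @ Sigma @ Vt

print("reconstructs X:   ", np.allclose(X, X_rebuilt))
print("U is orthogonal:  ", np.allclose(U.T @ U, np.eye(5)))
print("V is orthogonal:  ", np.allclose(Vt @ Vt.T, np.eye(3)))
print("singular values:  ", s)  # NumPy returns them in descending order
```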
Low Rank SVD
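In machine learning, we use low rank (or truncated) SVD a lot, because it can compress the information a matrix $X \in \mathbb{R}^{m \times n}$ holds into smaller matrices. Formally, for a chosen rank $k$,
$$X \approx U_k \Sigma_k V_k^T$$
where $U_k \in \mathbb{R}^{m \times k}$ has orthonormal columns,
$V_k \in \mathbb{R}^{n \times k}$ also has orthonormal columns,
and $\Sigma_k \in \mathbb{R}^{k \times k}$ is a diagonal matrix.
$\Sigma_k$ contains the $k$ largest singular values. The columns of $U_k$ and $V_k$ can be thought of as “the most information rich” vectors of $X$. Note that the exact equality no longer holds.
[Figure: illustration of the truncated SVD formula above.]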
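The truncation can be sketched in a few lines of NumPy. The helper name `truncated_svd`, the matrix sizes, and the "low rank plus small noise" test matrix are all assumptions made for this example; the point is only that keeping the leading `k` singular triplets stores far fewer numbers while staying close to the original matrix.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the leading k singular triplets of X (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
# Example data: a rank-4 matrix plus a little noise, so a rank-4 fit is nearly exact
X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 50))
X = X + 0.01 * rng.standard_normal((100, 50))

U_k, s_k, Vt_k = truncated_svd(X, k=4)

# U_k * s_k scales each column of U_k by its singular value (same as U_k @ diag(s_k))
X_k = (U_k * s_k) @ Vt_k

rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
stored = U_k.size + s_k.size + Vt_k.size

print("relative Frobenius error:", rel_err)
print("numbers stored:", stored, "instead of", X.size)
```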
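Finding $U$ and $V$
Recall that the definition of an orthogonal matrix is $U^T U = U U^T = I$. Using this fact, you can find $U$ and $V$ by applying eigendecomposition twice:
$$X X^T = U \Sigma V^T V \Sigma^T U^T = U (\Sigma \Sigma^T) U^T$$
$$X^T X = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T$$
Essentially, applying eigendecomposition to $X X^T$ and $X^T X$ gives us $U$ and $V$ as stacks of eigenvectors. The eigenvalues on the diagonals of $\Sigma \Sigma^T$ and $\Sigma^T \Sigma$ are the squares of the singular values.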
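The two eigendecompositions can be checked numerically. This is a sketch under assumptions: the matrix is called `X`, it is a random 6 by 4 example with distinct singular values, and `sorted_eigh` is a hypothetical helper. Because eigenvectors are only defined up to sign, the comparison against `np.linalg.svd` is done up to sign; in practice you would call an SVD routine directly rather than form the two product matrices.

```python
import numpy as np

def sorted_eigh(A):
    """Eigendecomposition of a symmetric matrix, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(A)      # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))

evals_left, U_eig = sorted_eigh(X @ X.T)    # 6x6 problem: left singular vectors
evals_right, V_eig = sorted_eigh(X.T @ X)   # 4x4 problem: right singular vectors

# Square roots of the eigenvalues are the singular values
# (clip guards against tiny negative values caused by round-off)
sing_from_eig = np.sqrt(np.clip(evals_right, 0.0, None))

U_ref, s_ref, Vt_ref = np.linalg.svd(X, full_matrices=False)

# |<eigenvector, singular vector>| should be 1 for each matching column
right_match = np.abs(np.sum(V_eig * Vt_ref.T, axis=0))
left_match = np.abs(np.sum(U_eig[:, :4] * U_ref, axis=0))

print("singular values agree:", np.allclose(sing_from_eig, s_ref))
print("right vectors agree up to sign:", np.allclose(right_match, 1.0))
print("left vectors agree up to sign: ", np.allclose(left_match, 1.0))
```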
Applications
There are numerous applications of (truncated) SVD:

Collaborative Filtering: an algorithm for recommender systems. $X$ is a user-item matrix, which is decomposed into $U_k$, a user matrix, and $V_k$, an item matrix. SVD finds a $k$-dimensional embedding vector for each user and item.

Image Compression: $X$ is the original image. $X$ can be decomposed into a smaller matrix space by choosing a small $k$. Reconstructing the image for different values of $k$ shows the trade-off: the lower $k$ is, the more compressed the image is. Image quality does go down, but the reconstruction preserves many important aspects of the original image.

Semantic Indexing: What is known as LSI (latent semantic indexing) in NLP is essentially SVD. In LSI, $X$ is a term frequency matrix of dimension document $\times$ term. $U_k$ and $V_k$ are low-dimensional embeddings of documents and terms, respectively.
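As an illustration of the image compression case, the sketch below compresses a small synthetic grayscale "image" at several ranks. The synthetic image (smooth waves plus a bright disk), its 64 by 96 size, and the ranks tried are assumptions standing in for a real photo; the storage count assumes you keep `k` columns of each factor plus `k` singular values.

```python
import numpy as np

# Synthetic grayscale image used as a stand-in for a real photo (assumption)
h, w = 64, 96
yy, xx = np.mgrid[0:h, 0:w]
disk = (xx - 48) ** 2 + (yy - 32) ** 2 < 400
image = np.sin(xx / 7.0) + np.cos(yy / 5.0) + disk.astype(float)

U, s, Vt = np.linalg.svd(image, full_matrices=False)

results = {}
for k in (1, 5, 20):
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]          # rank-k reconstruction
    rel_err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    stored_fraction = k * (h + w + 1) / image.size   # factors vs. raw pixels
    results[k] = (rel_err, stored_fraction)
    print(f"k={k:2d}  relative error={rel_err:.4f}  storage={stored_fraction:.1%}")
```

The same pattern carries over to the other two applications: rows of the scaled left factor act as user (or document) embeddings, and rows of the right factor act as item (or term) embeddings.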
PCA
I would say that PCA is one of the applications of SVD.
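The objective of PCA is to “compress” a data matrix $X$ to a lower rank:
$$\min_{L} \|X - L\|_F \quad \text{subject to} \quad \operatorname{rank}(L) \le k$$
Solving this optimization problem lets us find $L$ with rank at most $k$ that best approximates $X$. Eckart and Young proved in 1936 ^{1} (!) that the optimal $L$ is $U_k \Sigma_k V_k^T$, with only the top $k$ principal components. The result holds not only for the Frobenius norm but for all norms that are invariant under unitary transformations. What are principal components? We already calculated them above! They are the columns of $U_k \Sigma_k$. The corresponding “principal vectors” are the columns of $V_k$. (This reading assumes the rows of $X$ are samples and each column has been mean-centered.) PCA is that simple. To learn more about matrix norms, see this. We will be using some of them in Robust PCA.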
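Here is a minimal sketch of PCA done through SVD, with a numerical check of the Eckart-Young result. The assumptions are mine: rows of the data are samples, columns are mean-centered before the SVD, the variable names (`X`, `L`, `scores`, `directions`) are illustrative, and the random rank-`k` projection is just one arbitrary competitor to compare against.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
X = data - data.mean(axis=0)           # center each column (assumed preprocessing)

k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)

directions = Vt[:k, :].T               # principal directions, one per column
scores = U[:, :k] * s[:k]              # data expressed in those directions
L = scores @ directions.T              # best rank-k approximation of X

# Eckart-Young: the Frobenius error equals the norm of the discarded singular values
err = np.linalg.norm(X - L, "fro")
expected = np.sqrt(np.sum(s[k:] ** 2))

# Competitor: project X onto a random k-dimensional subspace instead
Q, _ = np.linalg.qr(rng.standard_normal((5, k)))
err_rand = np.linalg.norm(X - X @ Q @ Q.T, "fro")

print("scores equal X projected on directions:", np.allclose(scores, X @ directions))
print("truncated SVD error:", err, "predicted:", expected)
print("random subspace error:", err_rand)
```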
Conclusion
I hope you see how PCA is easily derived from SVD. Now that you know PCA, you’re ready to learn Robust PCA!