Hi there, I'll start this series with the PCA (Principal Component Analysis) course on Coursera (the third course in the Mathematics for Machine Learning specialization).
Why do we, as data scientists, need to learn PCA?
As we know, the success of data science comes from its ability to solve problems that humans cannot handle reliably or quickly. Take my own work as an example: I once built a model to predict travel time for different routes in one of the major cities in Taiwan. To build this model, I first had to understand which features would influence future travel time in the city. So I started collecting a set of features based on a literature review, a customer survey, and the availability of data.
However, when all of these features go into a model, the resulting model becomes difficult to interpret and may pick up interference from noise. This is where PCA comes to the rescue. PCA reduces the dimensionality of the feature set, which makes the model easier to interpret and more efficient to run. This matters a great deal in the data science industry: beyond the model's performance, decision makers want to gain insight from its results so that they can integrate the model into their service processes and long-term plans.
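To make "reducing the dimensionality of the feature set" concrete, here is a minimal sketch, assuming NumPy is available. It is not code from the Coursera course: the function name `pca_reduce` and the synthetic data (5 features secretly driven by 2 hidden factors plus a little noise) are hypothetical choices for illustration. It centers the data, takes the singular value decomposition, and keeps only the top `k` principal directions.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples x n_features) onto the top-k principal components.

    Returns the k-dimensional scores and the fraction of total variance
    explained by each of the kept components.
    """
    # PCA works on mean-centered data
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions,
    # already sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    # Squared singular values are proportional to the variance along each direction
    explained_variance_ratio = s**2 / np.sum(s**2)
    return X_centered @ components.T, explained_variance_ratio[:k]

# Hypothetical data: 200 samples, 5 features generated from 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

Z, ratio = pca_reduce(X, 2)
print(Z.shape)      # 2 columns instead of 5
print(ratio.sum())  # nearly all of the variance is kept
```

Because the 5 features here really carry only 2 underlying signals, 2 principal components retain almost all of the variance, and a downstream model only has to work with 2 inputs instead of 5. Real features (such as the travel-time features above) will rarely compress this cleanly.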
How do we start our journey to meet our hero, PCA?
In the coming days, I'll follow the Coursera course and share what I've learned.
Here is a brief outline:
1. Statistical Introduction
2. Transformation of Vectors in Spaces
3. Orthogonal Projection
Let's move forward!