DAY 14
AI & Data

## Day 14: Feature Engineering, Kurtosis, and Skewness

Feature engineering is the process of turning raw data into features; it is an essential step before we fit data to models. Three types of features appear frequently: numeric features, categorical features, and time-series features. We will discuss them in the following articles. Generally, our data will contain both numeric and categorical features, so our feature engineering will need to include at least encoding and feature scaling methods.

## Encoding Methods

Commonly used encoding methods are listed below. Which one to use depends on the problem type.

1. One-hot encoding
2. Label encoding
3. Mean encoding
4. Target encoding
5. Count (frequency) encoding
6. Label binarization
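As a minimal sketch of a few of these methods, assuming a hypothetical toy `city` column, one-hot, label, and count encoding could look like this with pandas and scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: a single categorical column.
df = pd.DataFrame({"city": ["Taipei", "Tainan", "Taipei", "Hsinchu"]})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Label encoding: map each category to an integer code (sorted alphabetically).
le = LabelEncoder()
df["city_label"] = le.fit_transform(df["city"])

# Count (frequency) encoding: replace each category with its occurrence count.
df["city_count"] = df["city"].map(df["city"].value_counts())

print(one_hot)
print(df)
```

Tree-based models often work well with label or count encoding, while linear models usually prefer one-hot encoding, since integer codes would impose a spurious ordering on the categories.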

## Feature Scaling Methods

### Min-Max Scaling

Rescales raw data into the [0, 1] range. The most commonly used tool in scikit-learn is MinMaxScaler.
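A minimal sketch with a hypothetical single-feature column: MinMaxScaler maps the minimum to 0 and the maximum to 1 via x' = (x - min) / (max - min).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature column on an arbitrary scale.
X = np.array([[10.0], [20.0], [40.0], [50.0]])

# Fit the min/max on the data, then rescale into [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.ravel())  # → [0.   0.25 0.75 1.  ]
```

Note that min-max scaling is sensitive to outliers: a single extreme value compresses all other points into a narrow band.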

### Standardization

Transforms data to zero mean and unit variance. Standardization does not perform well when the data's distribution is far from normal. The most commonly used method is the z-score, implemented in scikit-learn as StandardScaler.
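A minimal sketch on a hypothetical column: StandardScaler applies the z-score x' = (x - mean) / std, so the transformed column has mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature column.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Fit mean and standard deviation, then apply the z-score transform.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
print(X_std.ravel())
print(X_std.mean(), X_std.std())  # ≈ 0 and 1 after standardization
```

Unlike min-max scaling, the result is not bounded to a fixed range, but it is much less distorted by outliers.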

## 偏度 Skewness

Skewness is a measure of the asymmetry of a probability distribution. A distribution can be symmetric (skewness = 0), positively skewed (skewness > 0, long right tail), or negatively skewed (skewness < 0, long left tail).
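A quick sketch with synthetic samples, assuming `scipy.stats.skew` is available: a normal sample is roughly symmetric, while an exponential sample has a long right tail and therefore positive skewness.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

symmetric = rng.normal(size=10_000)          # skewness ≈ 0
right_skewed = rng.exponential(size=10_000)  # long right tail → skewness > 0
left_skewed = -right_skewed                  # mirrored → skewness < 0

print(skew(symmetric))
print(skew(right_skewed))
print(skew(left_skewed))
```

Strongly skewed features are often transformed (e.g. with a log) before feeding them to models that assume roughly normal inputs.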

## 峰度 Kurtosis

Kurtosis describes the shape of a probability distribution, in particular how heavy its tails are: heavy-tailed distributions have high kurtosis, light-tailed ones low kurtosis. A normal distribution has kurtosis 3 (excess kurtosis 0).
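A quick sketch with synthetic samples, assuming `scipy.stats.kurtosis` is available. Note that SciPy returns *excess* kurtosis by default (`fisher=True`), so a normal sample yields roughly 0 rather than 3.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal = rng.normal(size=10_000)
heavy_tailed = rng.standard_t(df=3, size=10_000)  # heavier tails than normal

print(kurtosis(normal))                 # excess kurtosis ≈ 0
print(kurtosis(heavy_tailed))           # noticeably > 0
print(kurtosis(normal, fisher=False))   # "plain" kurtosis ≈ 3
```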
