DAY 28
Data Technology

• What is feature standardization?
• Why standardize?
• How is feature standardization done?

How is feature standardization done?

Standardization methods:

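• min-max normalization:
Rescales the feature data proportionally into the range 0 to 1 (or -1 to 1).
• standard deviation normalization:
Rescales each feature so that its mean is 0 and its variance (and therefore its standard deviation) is 1.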
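
The two methods above can be written out by hand. The following is a minimal sketch with NumPy, assuming a small made-up array `X` (the values are hypothetical and only there to show two features on very different scales):

```python
import numpy as np

# Hypothetical toy data: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Min-max normalization: (x - min) / (max - min), computed per column,
# which maps every feature into the range [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standard deviation normalization: (x - mean) / std, computed per column,
# which gives every feature a mean of 0 and a standard deviation of 1.
X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_zscore.mean(axis=0), X_zscore.std(axis=0))
```

After either transformation both columns live on the same scale, even though the second column started out a hundred times larger than the first.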

Writing the code

``````
from sklearn import preprocessing
import numpy as np
``````
• `train_test_split` splits the data (one part for training, one part for testing)
• `make_classification` generates random training data
• `SVC` is the classifier used as the example
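``````
# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split
# make_classification is imported directly from sklearn.datasets
from sklearn.datasets import make_classification
from sklearn.svm import SVC
``````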

``````
import matplotlib.pyplot as plt
``````

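``````
# Generate 300 samples with 2 informative features;
# scale=100 multiplies the feature values so they span a wide range
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, n_informative=2,
                           random_state=3, scale=100, n_clusters_per_class=1)
# Plot the two features, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
``````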

``````
# Hold out 20% of the data for testing, then train an SVC on the raw (unscaled) features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC()
clf.fit(X_train, y_train)
``````

``````
# Accuracy on the test set without standardization
clf.score(X_test, y_test)
``````

Standardization:

``````
# Rescale every feature column to mean 0 and standard deviation 1
X = preprocessing.scale(X)
``````
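
As a quick sanity check on what `preprocessing.scale` does, the sketch below regenerates the same data under separate names (`X_raw`, `X_scaled`, `X_01` are names chosen here for illustration, not part of the original post) and inspects the column statistics. It also shows `preprocessing.minmax_scale`, which is the scikit-learn counterpart of the min-max method described earlier.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_classification

# Same generator settings as in the post, stored under a separate name.
X_raw, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                               n_informative=2, random_state=3, scale=100,
                               n_clusters_per_class=1)

# Standard deviation normalization: each column ends up with mean 0, std 1.
X_scaled = preprocessing.scale(X_raw)
col_mean = X_scaled.mean(axis=0)
col_std = X_scaled.std(axis=0)

# Min-max normalization: each column ends up in the range [0, 1].
X_01 = preprocessing.minmax_scale(X_raw, feature_range=(0, 1))

print(col_mean, col_std)
print(X_01.min(axis=0), X_01.max(axis=0))
```

The printed means should be zero up to floating-point rounding, and the standard deviations should be 1.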

``````
# Split again and train a fresh SVC, this time on the standardized features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC()
clf.fit(X_train, y_train)
``````

``````
# Accuracy on the test set after standardization
clf.score(X_test, y_test)
``````
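
The code above scales the whole dataset before splitting it. A common alternative, sketched here as one possible variation rather than the method used in this post, is to split first and let a pipeline learn the scaling from the training data only. `make_pipeline` and `StandardScaler` are real scikit-learn APIs; the fixed `random_state=0` for the split is an assumption added here so the run is repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same generator settings as in the post.
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, random_state=3, scale=100,
                           n_clusters_per_class=1)

# Split the raw data first; random_state=0 is an assumption for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The pipeline fits StandardScaler on the training data only,
# then applies the same mean and std to the test data when scoring.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

score = model.score(X_test, y_test)
print(score)
```

With this arrangement the test set never influences the mean and standard deviation used for scaling, so the test score is a cleaner estimate of how the model handles unseen data.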