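Now that we worked through what an SVM actually is yesterday, let's move on to a simple implementation! We'll start with Python (which, I think, is the first time Python has come up since this series began!?)
Because this is the first Python walkthrough, I'll introduce each package in a bit more detail.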
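import pandas as pd # tabular data (DataFrame) handling
import numpy as np # N-dimensional arrays and matrix operations
import matplotlib.pyplot as plt # plotting
import seaborn as sns
# seaborn is a high-level plotting package built on top of matplotlib that makes it easier to create charts
from sklearn.datasets import load_iris # load the iris dataset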
iris = load_iris()
df_data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'])
df_data # take a look at the assembled data
The output is:
Next, split the data into a train set and a test set.
from sklearn.model_selection import train_test_split
X = df_data.drop(labels=['Species'], axis=1).values # drop Species and keep the remaining feature columns
y = df_data['Species'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
print('train shape:', X_train.shape)
print('test shape:', X_test.shape)
The output is:
train shape: (120, 4)
test shape: (30, 4)
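To see what `stratify=y` contributes to the split above, a quick check can count the samples per class on each side. The following is a minimal sketch that rebuilds the same split directly from `load_iris()`; the variable names `train_counts` and `test_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target  # integer labels 0, 1, 2

# Same arguments as the split in the article
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Count how many samples of each class ended up on each side
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print('train counts per class:', train_counts)
print('test counts per class:', test_counts)
```

Because iris has 50 samples per class, a stratified 80/20 split should leave 40 samples per class in the train set and 10 per class in the test set, so neither side is skewed toward one species.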
Next, define the functions used to draw the decision boundary.
def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy
def plot_contours(ax, clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
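The two helpers above work by laying a regular grid over the plot area and asking the classifier for a prediction at every grid point. As a small sketch of the grid part only, the snippet below repeats `make_meshgrid` on a toy input with a coarse step (`h=0.5`, chosen here purely so the arrays stay small) and flattens the grid the same way `plot_contours` does. The name `grid_points` is hypothetical.

```python
import numpy as np

def make_meshgrid(x, y, h=.02):
    # Pad the data range by 1 on each side, then sample it with step h
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy

# Toy data spanning [0, 1] on both axes
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
xx, yy = make_meshgrid(x, y, h=0.5)

# One row per grid point, one column per coordinate:
# this is the array that gets passed to clf.predict
grid_points = np.c_[xx.ravel(), yy.ravel()]
print('grid shape:', xx.shape)
print('points passed to predict:', grid_points.shape)
```

With the default `h=.02` the grid is far denser, which is what makes the filled contour look like a smooth boundary, at the cost of many more predictions.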
# Reduce the four original iris features to two dimensions to make visualization easier.
from sklearn.decomposition import PCA
pca = PCA(n_components=2, iterated_power=1)
train_reduced = pca.fit_transform(X_train)
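Reducing four features to two discards some information, so it can be worth checking how much variance the two components keep. Below is a minimal sketch using PCA's `explained_variance_ratio_` attribute; it rebuilds the same split from `load_iris()` so it runs on its own, and the variable `retained` is a name introduced here.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Same PCA settings as in the article
pca = PCA(n_components=2, iterated_power=1)
train_reduced = pca.fit_transform(X_train)

# Fraction of the total variance captured by each component, and their sum
retained = pca.explained_variance_ratio_.sum()
print('reduced shape:', train_reduced.shape)
print('variance ratio per component:', pca.explained_variance_ratio_)
print('total variance retained: %.3f' % retained)
```

If the retained fraction is close to 1, a decision boundary drawn in the 2D space is a reasonable picture of how separable the classes are in the original four dimensions.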
Train a linear SVM model (kernel = linear)
from sklearn import svm
# Build a model with kernel='linear'
svcModel = svm.SVC(kernel='linear', C=1)
# Fit the model on the training data
svcModel.fit(train_reduced, y_train)
# Predict classes for the training data
predicted = svcModel.predict(train_reduced)
# Compute the accuracy (measured on the training data)
accuracy = svcModel.score(train_reduced, y_train)
X0, X1 = train_reduced[:, 0], train_reduced[:, 1]
xx, yy = make_meshgrid(X0, X1)
plot_contours(plt, svcModel, xx, yy,
              cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X0, X1, c=y_train, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
# The axes are the two PCA components, not the original sepal measurements
plt.xlabel('Principal component 1')
plt.ylabel('Principal component 2')
plt.title('SVC with linear kernel' + '\n' + 'Accuracy:%.2f' % accuracy)
The output is:
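The accuracy shown in the plot title is measured on the training data, so it says little about unseen samples. One way to check generalization, sketched below, is to project the held-out test set with the PCA that was fitted on the training set (using `transform`, not `fit_transform`) and score the model on it. The names `test_reduced`, `train_accuracy`, and `test_accuracy` are introduced here and are not part of the original walkthrough.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the projection on the training data only, then reuse it for the test data
pca = PCA(n_components=2, iterated_power=1)
train_reduced = pca.fit_transform(X_train)
test_reduced = pca.transform(X_test)

svcModel = svm.SVC(kernel='linear', C=1)
svcModel.fit(train_reduced, y_train)

train_accuracy = svcModel.score(train_reduced, y_train)
test_accuracy = svcModel.score(test_reduced, y_test)
print('train accuracy: %.2f' % train_accuracy)
print('test accuracy: %.2f' % test_accuracy)
```

Fitting the PCA on the training set alone keeps information from the test set out of the model, which is why the test data only goes through `transform`.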
Implementing an SVM in R is really very simple.
require("e1071")
library(ggplot2)
data(iris)
# Linear
svm_model_linear <- svm(Species ~ ., data=iris, kernel="linear")
plot(svm_model_linear, iris, Petal.Width ~ Petal.Length,slice = list(Sepal.Width = 3, Sepal.Length = 4))
The output is: