Day 14: Support Vector Machine (SVM), Implementation

2022 iThome Ironman Contest

DAY 14

AI & Data

Linguistics and NLP series, part 14

14th Ironman Contest # svm # linear # python # r

cjom06991

Team KnULPers_from_NCCU

2022-09-29 11:24:21


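Yesterday we worked through what an SVM actually is, so today let's build a simple implementation! We start with Python (which seems to be the first time Python has been covered since the contest began!?).

Python SVM Implementation

Since this is the first Python walkthrough, each package gets a slightly more detailed introduction.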


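import pandas as pd # tabular data (DataFrame) handling
import numpy as np # n-dimensional arrays and matrix operations
import matplotlib.pyplot as plt # plotting
import seaborn as sns 
# seaborn is a high-level plotting package built on top of matplotlib that makes charts easier to create
from sklearn.datasets import load_iris # load the iris dataset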


iris = load_iris()
df_data = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= ['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm','Species'])
                     
df_data # take a look at the assembled data

The result is a table with 150 rows and 5 columns (four features plus Species).
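One detail worth checking in that table: because `np.c_` stacks the features and the labels into a single float array, `Species` ends up stored as 0.0, 1.0, 2.0 rather than as names. Below is a minimal sketch of how the labels could be mapped back to species names; the variable `species_names` is a hypothetical helper name that does not appear in the original code.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df_data = pd.DataFrame(
    data=np.c_[iris["data"], iris["target"]],
    columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"],
)

# Species is a float column here, because np.c_ produces one float array
print(df_data["Species"].dtype)

# Hypothetical helper: map the numeric codes back to the species names
species_names = df_data["Species"].astype(int).map(dict(enumerate(iris.target_names)))
print(df_data.shape)
print(species_names.value_counts())
```

The numeric codes are what the SVM is trained on later, so this mapping is only for reading the table more easily.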

Next, split the data into a train set and a test set.


from sklearn.model_selection import train_test_split
X = df_data.drop(labels=['Species'],axis=1).values # drop Species and keep the remaining feature columns
y = df_data['Species'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print('train shape:', X_train.shape)
print('test shape:', X_test.shape)

The result is:

train shape: (120, 4)
test shape: (30, 4)
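The `stratify=y` argument in the split keeps the class proportions the same in both subsets. The following is a small sketch that checks this by counting labels per class; it loads the data with `load_iris(return_X_y=True)` instead of going through the DataFrame, which is a shortcut assumed here for brevity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class landed in each subset
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train class counts:", train_counts)  # [40 40 40]
print("test class counts:", test_counts)    # [10 10 10]
```

Since iris has 50 samples per class, an 80/20 stratified split gives 40 training and 10 test samples for every species.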

Next, define the functions used to plot the decision boundary.


def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out


# Reduce the original 4 iris features to 2 dimensions to make visualization easier.

from sklearn.decomposition import PCA
pca = PCA(n_components=2, iterated_power=1)
train_reduced = pca.fit_transform(X_train)
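Reducing four features to two throws some information away, so it can be useful to check how much variance the two components keep. This is a rough sketch that reads the fitted PCA's `explained_variance_ratio_` attribute; the variable name `kept_variance` is an assumption introduced for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pca = PCA(n_components=2, iterated_power=1)
train_reduced = pca.fit_transform(X_train)

# Fraction of the training-set variance captured by each of the 2 components
print("per component:", pca.explained_variance_ratio_)

# Hypothetical name: total fraction of variance kept after the reduction
kept_variance = pca.explained_variance_ratio_.sum()
print("total kept: %.3f" % kept_variance)
```

If the total is close to 1, a decision boundary drawn in the 2D plane should be a fair picture of how separable the classes are.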

Train a linear SVM model (kernel = linear)


from sklearn import svm

# Build a model with kernel='linear'
svcModel=svm.SVC(kernel='linear', C=1)
# Fit the model on the training data
svcModel.fit(train_reduced, y_train)
# Predict classes for the training data
predicted=svcModel.predict(train_reduced)
# Compute accuracy (on the training data)
accuracy = svcModel.score(train_reduced, y_train)

X0, X1 = train_reduced[:, 0], train_reduced[:, 1]
xx, yy = make_meshgrid(X0, X1)
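# plt is passed in place of an Axes object; it also provides contourf
plot_contours(plt, svcModel, xx, yy,
              cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X0, X1, c=y_train, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
# The axes are the two PCA components, not the raw sepal measurements
plt.xlabel('PCA component 1')
plt.ylabel('PCA component 2')
plt.title('SVC with linear kernel'+ '\n' + 'Accuracy:%.2f'%accuracy)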

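The result is:

(Figure: decision regions of the linear-kernel SVC over the two PCA components, with the training points overlaid.)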
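The accuracy shown in the plot title is measured on the training data. To see how the model does on unseen samples, the held-out test set has to go through the same PCA projection first (using `transform`, not `fit_transform`). Here is a minimal sketch under that assumption; the names `svc_model`, `test_reduced`, `train_accuracy` and `test_accuracy` are introduced for illustration and are not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit PCA on the training data only, then reuse the same projection for the test data
pca = PCA(n_components=2, iterated_power=1)
train_reduced = pca.fit_transform(X_train)
test_reduced = pca.transform(X_test)

svc_model = SVC(kernel="linear", C=1).fit(train_reduced, y_train)

train_accuracy = svc_model.score(train_reduced, y_train)
test_accuracy = svc_model.score(test_reduced, y_test)
print("train accuracy: %.2f" % train_accuracy)
print("test accuracy: %.2f" % test_accuracy)
```

Fitting PCA on the training set alone keeps information from the test set out of the model, so the test score stays an honest estimate.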

R SVM Implementation

Implementing an SVM in R is really simple.


require("e1071")
library(ggplot2)
data(iris)

# Linear
svm_model_linear <- svm(Species ~ ., data=iris, kernel="linear")


plot(svm_model_linear, iris, Petal.Width ~ Petal.Length,slice = list(Sepal.Width = 3, Sepal.Length = 4))

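The result is:

(Figure: SVM classification plot of Petal.Width against Petal.Length for the linear-kernel model.)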
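The `slice` argument in the R `plot` call is needed because the model was trained on all four features while the plot only has two axes: the other two features are held at fixed values. The same idea can be sketched in Python, as a rough analogue rather than a reproduction of the R result (for example, `e1071::svm` scales the inputs by default and this sketch does not). The grid ranges and the names `clf`, `grid` and `Z` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
# Train on all 4 features, as the R model does
clf = SVC(kernel="linear").fit(iris.data, iris.target)

# Grid over the two plotted features (assumed ranges covering the iris data)
petal_length, petal_width = np.meshgrid(
    np.linspace(1.0, 7.0, 50), np.linspace(0.0, 2.5, 50)
)

# Hold the other two features fixed, mirroring slice = list(Sepal.Width = 3, Sepal.Length = 4)
# Column order in iris.data: sepal length, sepal width, petal length, petal width
grid = np.c_[
    np.full(petal_length.size, 4.0),
    np.full(petal_length.size, 3.0),
    petal_length.ravel(),
    petal_width.ravel(),
]

# Predicted class for every grid point; this array is what a contour plot would color
Z = clf.predict(grid).reshape(petal_length.shape)
print(Z.shape)
```

Passing `petal_length`, `petal_width` and `Z` to `plt.contourf` would then draw the decision regions for that particular slice.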