DAY 29
Data Technology

# Cross-Validation

• k-fold cross-validation
• leave-one-out cross-validation
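The two variants differ mainly in how many train/test splits they produce: k-fold yields k splits, while leave-one-out yields one split per sample. A minimal sketch using scikit-learn's `KFold` and `LeaveOneOut` (the 10-sample array is just an illustration, not from the original post):

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features

# k-fold: k splits (here k = 5, so each test fold holds 2 samples)
kfold_splits = list(KFold(n_splits=5).split(X))
print(len(kfold_splits))  # 5

# leave-one-out: one split per sample, each test set has exactly 1 sample
loo_splits = list(LeaveOneOut().split(X))
print(len(loo_splits))  # 10
```

Leave-one-out is therefore much more expensive on large datasets, since the model is refit once per sample.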

## Why Do We Need Cross-Validation?

Cross-validation is intended to avoid the possible bias introduced by relying on any one particular division of the data into test and train components. The idea is to partition the original set in several different ways and to compute an average score over the different partitions.

## How Is Cross-Validation Done?

K-fold cross-validation validates your model by generating different combinations of the data you already have. For example, if you have 100 samples, you can train your model on the first 90 and test on the last 10. Then you could train on samples 1-80 and 91-100, and test on samples 81-90. Then repeat. This way, you get different train/test combinations, essentially giving you 'more' validation data from your original dataset.
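The fold rotation described above can be sketched with scikit-learn's `KFold` (the 100-element array is a stand-in for the 100 samples in the example):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100)       # stand-in for 100 samples
kf = KFold(n_splits=10)  # 10 folds of 10 samples each, no shuffling

for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # each iteration tests on a different 10-sample slice and trains on the other 90
    print(f"fold {fold}: train={len(train_idx)} samples, "
          f"test samples {test_idx[0]}-{test_idx[-1]}")
```

With the default `shuffle=False`, the folds are contiguous slices, so the first iteration tests on samples 0-9, the second on 10-19, and so on.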

## Code

### Imports

```python
from sklearn.model_selection import cross_val_score
```

`cross_val_score` scores the model's accuracy during cross-validation. (Note: the old `sklearn.cross_validation` module was removed in scikit-learn 0.20; import from `sklearn.model_selection` instead.)

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt
```

```python
iris = datasets.load_iris()
X = iris.data
y = iris.target
```

### KNeighborsClassifier

```python
knn = KNeighborsClassifier(n_neighbors=10)
```
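For contrast with cross-validation, here is what a single train/test split looks like using the `train_test_split` that was imported above. The `test_size=0.3` and `random_state=4` values are illustrative choices, not from the original post:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=4)

knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on this one particular split
```

This score depends on which samples happen to land in the test set, which is exactly the bias cross-validation averages away.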

### cross_val_score

`'accuracy'` is the scoring metric; it measures how accurate the model is (higher is better). Next, we average the scores:

```python
scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')
print(scores)
print(scores.mean())
```

### Changing n_neighbors

```python
k_range = range(1, 31)
k_scores = []
for k_number in k_range:
    knn = KNeighborsClassifier(n_neighbors=k_number)
    scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
    k_scores.append(scores.mean())
```
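Once `k_scores` is filled, the best k can be read off with `np.argmax`. A self-contained sketch (the loop is rebuilt on the iris data for completeness; `best_k` is a name introduced here for illustration):

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

k_range = range(1, 31)
k_scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                            cv=10, scoring='accuracy').mean()
            for k in k_range]

# the k whose mean cross-validated accuracy is highest
best_k = k_range[int(np.argmax(k_scores))]
print(best_k, max(k_scores))
```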

### Plotting

```python
plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')
plt.show()
```

## Summary
