Day 07: Writing Our First CNN Program -- Comparing Accuracy on Handwritten Digit Recognition

2018 iT 邦幫忙 Ironman Contest

DAY 7

AI & Machine Learning

Part 7 of the series "Understanding Neural Networks Through 100 Figures -- Concepts and Practice"


I code so I am

2017-12-17 09:35:39


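Sample Program

We again tackle handwritten digit recognition, this time to see how the CNN approach differs from the simple neural network used earlier. The program comes from https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py. I added comments to it; the annotated version is available here, under the file name cnn.py.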

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

# Mini-batch size for gradient descent
batch_size = 128
# Number of classes
num_classes = 10
# Number of training epochs
epochs = 12

# Image height (rows) and width (columns)
img_rows, img_cols = 28, 28

# Load the MNIST training and test data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Keep the original integer labels for the crosstab (confusion matrix) later
y_test_org = y_test

# channels_first: the color-channel axis (depth; 1 for grayscale MNIST, 3 for R/G/B) is the 2nd dimension, height and width are the 3rd and 4th
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else: # channels_last: the color-channel axis (depth) is the 4th dimension, height and width are the 2nd and 3rd
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Scale pixel values from 0-255 to 0-1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Convert the y labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Build a Sequential model (layers stacked in a simple linear order)
model = Sequential()
# Convolution layer: 32 filters (the depth of the output space), 3x3 kernel, ReLU activation
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
# Convolution layer: 64 filters (the output depth, not the output size), 3x3 kernel, ReLU activation
model.add(Conv2D(64, (3, 3), activation='relu'))
# Max-pooling layer: 2x2 pool, keeps the maximum value in each window
model.add(MaxPooling2D(pool_size=(2, 2)))
# Dropout layer: randomly drops input units to reduce overfitting; drop rate 0.25
model.add(Dropout(0.25))
# Flatten layer: turns the multi-dimensional input into 1-D; commonly used between convolutional and fully connected layers
model.add(Flatten())
# Fully connected layer with 128 outputs
model.add(Dense(128, activation='relu'))
# Dropout layer: randomly drops input units to reduce overfitting; drop rate 0.5
model.add(Dropout(0.5))
# Output layer: softmax activation assigns a probability to each class
model.add(Dense(num_classes, activation='softmax'))

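# Compile: choose the loss function, optimizer, and evaluation metric
# (Standalone Keras 2.x defaults Adadelta to lr=1.0; tf.keras and Keras 3 default to 0.001,
#  so on newer versions pass learning_rate=1.0 to reproduce this example's training.)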
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

# Train; the training history is stored in train_history
train_history = model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Show the test loss and accuracy (score)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
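Two of the layers above, MaxPooling2D and Dropout, are easy to misread from the one-line comments. The sketch below is a simplified NumPy illustration of the idea, not Keras's actual implementation, and the function names `max_pool_2x2` and `inverted_dropout` are hypothetical. It works on one image's feature maps of shape (height, width, channels): 2x2 max pooling keeps the largest value in each non-overlapping 2x2 window, halving width and height (24x24x64 becomes 12x12x64 in this model), while dropout, during training only, zeroes each element with probability `rate` and scales the survivors by 1/(1 - rate) so the expected sum stays the same; at prediction time it passes the input through unchanged.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on one image's feature maps, shape (H, W, C).

    An odd trailing row/column is dropped, as with padding='valid'.
    """
    h, w, c = x.shape
    x = x[: h - h % 2, : w - w % 2, :]
    # Split H and W into (blocks, 2) pairs and take the max inside each 2x2 window
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def inverted_dropout(x, rate, rng, training=True):
    """Zero each element with probability `rate`; rescale the kept ones by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x  # at prediction time dropout does nothing
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


if __name__ == "__main__":
    fmap = np.arange(16, dtype=float).reshape(4, 4, 1)
    print(max_pool_2x2(fmap)[:, :, 0])                  # values [[5, 7], [13, 15]]
    print(max_pool_2x2(np.zeros((24, 24, 64))).shape)   # (12, 12, 64)
    rng = np.random.default_rng(0)
    print(inverted_dropout(np.ones(8), 0.5, rng))       # each entry is 0.0 or 2.0
```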

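Training takes quite a while. It makes me want to buy a better GPU and install the GPU build of TensorFlow to cut down on tea and coffee breaks. As soon as training finishes, save the model with the code below so you don't have to retrain next time.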

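# Save the model architecture
from keras.models import model_from_json  # not needed for saving; used later to rebuild the model from cnn.config
json_string = model.to_json()
with open("cnn.config", "w") as text_file:
    text_file.write(json_string)

# Save the trained weights
model.save_weights("cnn.weight")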

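Training reaches 99.11% accuracy, much higher than the simple neural network, although it also takes considerably longer to run. Once the model is saved, however, we only need to train it this one time; afterwards we can load the model and weights and go straight to prediction. Readers who don't want to wait can also get cnn.config and cnn.weight here and load the model and weights directly.

You can also run cnn_1.py to see what the model predicts for the digits drawn with draw.exe in Part 3.

One more tip: you can compute a confusion matrix, which shows how many test-set samples were classified correctly and how many were mistaken for each other class. The diagonal from top-left to bottom-right holds the correct counts and every other cell holds misclassifications, so you can see which digit each digit is most often mistaken for, then add training data for those cases to reduce the errors.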

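# Compute the confusion matrix: counts of correct and incorrect test-set classifications
import numpy as np
import pandas as pd
# predict_classes() exists only in older Keras versions; argmax over the softmax output gives the same labels
predictions = np.argmax(model.predict(x_test), axis=1)
pd.crosstab(y_test_org, predictions, rownames=['Actual'], colnames=['Predicted'])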

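Figure. Confusion matrix

Program Explanation

The overall structure is much the same as the program in Part 2; the main difference is the layer design (the model.add(...) calls, lines 42-59 of cnn.py). Note that for the ConvxD convolution layers (Conv1D, Conv2D, Conv3D), the first argument, the number of filters, is not the output size but the output depth. The output width and height depend on the other settings and are computed as ((W-F+2P)/S)+1, where: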
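The diagonal reading of the crosstab above can also be done in code. Below is a minimal NumPy sketch, assuming integer labels like `y_test_org` and the argmax predictions; the helper names are hypothetical. Rows are actual labels and columns are predicted labels, the same layout as the crosstab. Accuracy is the diagonal sum divided by the total, and the largest off-diagonal cell is the digit pair most often confused. Unlike `pd.crosstab`, which only lists labels that actually occur, this version always returns a full `num_classes` x `num_classes` matrix.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=10):
    """Rows = actual label, columns = predicted label (same layout as the crosstab)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at (actual, predicted) for every sample; np.add.at handles repeated pairs
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def accuracy_from_cm(cm):
    """Correct predictions sit on the diagonal."""
    return np.trace(cm) / cm.sum()


def most_confused_pair(cm):
    """Return (actual, predicted, count) for the largest off-diagonal cell."""
    errors = cm.copy()
    np.fill_diagonal(errors, 0)
    actual, predicted = np.unravel_index(np.argmax(errors), errors.shape)
    return int(actual), int(predicted), int(errors[actual, predicted])


if __name__ == "__main__":
    # Toy labels standing in for y_test_org and the model's predictions
    y_true = [4, 4, 4, 9, 9, 7, 7, 1]
    y_pred = [4, 9, 9, 9, 4, 7, 1, 1]
    cm = confusion_matrix(y_true, y_pred)
    print(accuracy_from_cm(cm))    # 0.5
    print(most_confused_pair(cm))  # (4, 9, 2): a 4 mistaken for a 9, twice
```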

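W: the input width (or height)
F: the filter (kernel) size, e.g. 3 for a 3x3 kernel
P: the amount of zero padding on each border. When the sliding window would extend past the edge of the input, the layer either skips those positions (Keras padding='valid', the default, so P = 0) or pads the border with zeros; for a 3x3 kernel with padding='same', P = 1.
S: the stride, i.e. how many pixels the sliding window moves at each step.

With this formula we can compute the output width or height. For example, for the first Conv2D layer (line 42 of cnn.py) the output width and height are ((28-3+0)/1)+1 = 26. You can verify this with model.summary(): the output shape is (None, 26, 26, 32).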
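Applying the formula layer by layer reproduces the whole summary. Below is a small pure-Python sketch, assuming the Keras defaults used in cnn.py: padding='valid' (P = 0) and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D, which makes pooling the same formula with F = 2, S = 2. The Dropout layers are left out because they do not change shapes, and the function names are illustrative only. It also counts trainable parameters: (F x F x input depth + 1) x filters for a convolution layer and (inputs + 1) x outputs for a Dense layer, the +1 being the bias.

```python
def conv_output_size(w, f, p=0, s=1):
    """((W - F + 2P) / S) + 1, using floor division when the result is not an integer."""
    return (w - f + 2 * p) // s + 1


def trace_mnist_cnn(size=28, channels=1):
    """(layer, output shape in channels_last order, parameter count) for the cnn.py model."""
    layers = []
    # Conv2D(32, 3x3): weights plus one bias per filter
    size = conv_output_size(size, 3)
    layers.append(("Conv2D(32)", (size, size, 32), (3 * 3 * channels + 1) * 32))
    # Conv2D(64, 3x3): input depth is now 32
    size = conv_output_size(size, 3)
    layers.append(("Conv2D(64)", (size, size, 64), (3 * 3 * 32 + 1) * 64))
    # MaxPooling2D(2x2): F = 2, S = 2, no parameters
    size = conv_output_size(size, 2, s=2)
    layers.append(("MaxPooling2D", (size, size, 64), 0))
    flat = size * size * 64
    layers.append(("Flatten", (flat,), 0))
    layers.append(("Dense(128)", (128,), (flat + 1) * 128))
    layers.append(("Dense(10)", (10,), (128 + 1) * 10))
    return layers


if __name__ == "__main__":
    layers = trace_mnist_cnn()
    for name, shape, params in layers:
        print(f"{name:14s} {str(shape):16s} {params}")
    print("total params:", sum(p for _, _, p in layers))  # 1199882
```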

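Figure. Structure of the CNN sample program

Conclusion

Using a CNN for handwritten digit recognition is a bit like using a sledgehammer to crack a nut. Digit images are simple, made up only of strokes, whereas a CNN's strength is extracting features automatically and recognizing complex shapes built from lines, surfaces, and corners. So we will go through more applications to show what it can really do. Before that, the next part will cover the functions and parameters used in this sample program.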

Day 06: A Powerful Tool for Image Processing -- Convolutional Neural Network

Day 08: CNN Model Design


3 comments

ronaldxsun

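iT 邦 Newbie, Level 5 ‧ 2019-06-06 16:57:27

Hello, I've learned a lot from your articles,
but I have a small question.
The accuracy from score = model.evaluate(x_test, y_test, verbose=0), i.e. score[1],
does not match the accuracy I compute myself from the confusion matrix. Do you know what might cause this?
I'm doing two-class classification, where every sample is either A or B.
Thanks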


I code so I am ‧ iT 邦 Expert, Level 1 ‧ 2019-06-06 22:29:34

They should match. Get the predicted labels with model.predict_classes or argmax(model.predict(...)), then compare them with y_test to compute the accuracy.
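The reply's suggestion can be sketched with NumPy alone; the probability arrays below are made-up stand-ins for `model.predict()` output, and the helper names are hypothetical. One possible cause of a mismatch in a two-class setup (an assumption, since the thread does not show the commenter's model): if the network ends in a single sigmoid unit rather than a 2-unit softmax, argmax over axis 1 of an (n, 1) array is always 0, so the labels have to come from a 0.5 threshold instead, which is what Keras's binary accuracy uses.

```python
import numpy as np


def predicted_labels(probs, threshold=0.5):
    """Turn model.predict()-style probabilities into class labels."""
    probs = np.asarray(probs)
    if probs.ndim == 2 and probs.shape[1] > 1:
        # Softmax output, one column per class: take the highest-scoring class
        return probs.argmax(axis=1)
    # Single sigmoid output: argmax would always return 0, so threshold instead
    return (probs.reshape(-1) > threshold).astype(int)


def accuracy(y_true, y_pred):
    """Fraction of matching labels; y_true may be one-hot or integer labels."""
    y_true = np.asarray(y_true)
    if y_true.ndim == 2 and y_true.shape[1] > 1:
        y_true = y_true.argmax(axis=1)  # undo one-hot encoding
    return float(np.mean(y_true.reshape(-1) == np.asarray(y_pred).reshape(-1)))


if __name__ == "__main__":
    softmax_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    sigmoid_probs = [[0.7], [0.2], [0.55]]
    print(predicted_labels(softmax_probs))   # [0 1 0]
    print(np.argmax(sigmoid_probs, axis=1))  # [0 0 0]: argmax is wrong for one sigmoid unit
    print(predicted_labels(sigmoid_probs))   # [1 0 1]
    print(accuracy([[0, 1], [0, 1], [1, 0]], predicted_labels(softmax_probs)))
```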


b19990705

iT 邦 Newbie, Level 5 ‧ 2020-06-10 02:01:19

Hello,
I'd like to ask:
if I write a digit in MS Paint and save it,
then load the model to test it,
how do I do that?


I code so I am ‧ iT 邦 Expert, Level 1 ‧ 2020-06-10 10:02:10

from skimage import data, color, io

uploaded_file='<file name>'

image1 = io.imread(uploaded_file, as_gray=True)
image_resized = resize(image1, (28, 28), anti_aliasing=True)    
if K.image_data_format() == 'channels_first':
    X1 = image_resized.reshape(1,28,28) #/ 255
else:
    X1 = image_resized.reshape(28,28,1) #/ 255

X1 = np.abs(1-X1)
predictions = model.predict_classes(X1)

b19990705 ‧ iT 邦 Newbie, Level 5 ‧ 2020-06-10 17:36:41

Hello,
when I tested it just now, the line
image_resized = resize(image1, (28, 28), anti_aliasing=True)
raised the error
name 'resize' is not defined
Am I missing a package?

I code so I am ‧ iT 邦 Expert, Level 1 ‧ 2020-06-10 20:19:00

Add this at the top:

from skimage.transform import rescale, resize

b19990705 ‧ iT 邦 Newbie, Level 5 ‧ 2020-06-10 21:18:05

Hello,
thank you for the explanation.
While testing, I found that
this part
if K.image_data_format() == 'channels_first':
    X1 = image_resized.reshape(1,28,28) #/ 255
else:
    X1 = image_resized.reshape(28,28,1) #/ 255
needs to be changed to this:
if K.image_data_format() == 'channels_first':
    X1 = image_resized.reshape(1,1,28,28) #/ 255
else:
    X1 = image_resized.reshape(1,28,28,1) #/ 255
otherwise you get this error:
Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (28, 28, 1)
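Putting the fixes from this thread together (the missing `resize` import and the extra batch dimension that Conv2D needs), here is a NumPy-only sketch of the preprocessing step. `to_model_input` is a hypothetical helper name, and it assumes the drawing has already been read and resized to 28x28 with skimage as in the replies above (`io.imread(..., as_gray=True)` followed by `resize(..., (28, 28), anti_aliasing=True)`, which yields values in 0-1). It inverts the image, because MS Paint draws a dark digit on a white background while MNIST digits are white on black, then reshapes to a 4-D batch of one.

```python
import numpy as np


def to_model_input(gray28, channels_first=False):
    """Prepare one 28x28 grayscale drawing (dark digit on white) for model.predict()."""
    x = np.asarray(gray28, dtype="float32")
    if x.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {x.shape}")
    if x.max() > 1.0:
        x = x / 255.0  # defensive assumption: values above 1 mean 0-255 pixel intensities
    # Invert: MNIST digits are white (1.0) on a black (0.0) background
    x = 1.0 - x
    # Add the batch axis and the channel axis so Conv2D receives a 4-D array
    if channels_first:
        return x.reshape(1, 1, 28, 28)
    return x.reshape(1, 28, 28, 1)


if __name__ == "__main__":
    drawing = np.ones((28, 28))   # white canvas
    drawing[6:22, 13:15] = 0.0    # a black vertical stroke, roughly a "1"
    x1 = to_model_input(drawing)
    print(x1.shape)                          # (1, 28, 28, 1)
    print(x1[0, 10, 13, 0], x1[0, 0, 0, 0])  # 1.0 0.0
```

With the trained model loaded, `np.argmax(model.predict(x1), axis=1)` should then give the predicted digit; pass `channels_first=True` when `K.image_data_format()` returns `'channels_first'`.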