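Today we will implement Grad-CAM to find out which parts of an image a neural network actually focuses on.
Grad-CAM is an improvement on CAM. CAM only works when the last convolutional layer is followed by GAP (Global Average Pooling) and then the output layer; for any other design the model has to be changed. Grad-CAM removes this limitation. Whatever comes after the last convolutional layer, Grad-CAM can be applied without modifying the model.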
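The relationship can be made concrete with a minimal NumPy sketch on synthetic arrays. This is not the tf-keras-vis implementation. Grad-CAM averages the gradients of the class score over each feature map to get one weight per channel. It then applies a ReLU to the weighted sum of the feature maps.

If the network ends in GAP followed by a linear layer, every gradient equals that layer's weight divided by the number of spatial positions. Grad-CAM then reproduces CAM up to a constant factor. The function name grad_cam and the array shapes are illustrative assumptions.

```python
import numpy as np

def grad_cam(activations, grads):
    """Grad-CAM map from feature maps A (H, W, K) and dScore/dA (H, W, K)."""
    weights = grads.mean(axis=(0, 1))                          # alpha_k: spatial mean of gradients
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # sum_k alpha_k * A^k
    return np.maximum(cam, 0.0)                                # keep only positive evidence

rng = np.random.default_rng(0)
H, W, K = 7, 7, 4
A = rng.random((H, W, K))   # hypothetical last-conv feature maps
w_c = rng.normal(size=K)    # hypothetical dense weights for class c after GAP

# With score y_c = sum_k w_c[k] * mean(A^k), dy_c/dA^k_ij = w_c[k] / (H * W) at every position
grads = np.broadcast_to(w_c / (H * W), (H, W, K))

gc = grad_cam(A, grads)
cam = np.maximum(np.tensordot(A, w_c, axes=([2], [0])), 0.0)  # classic CAM
print(np.allclose(gc * H * W, cam))  # True
```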
We use Colab as our platform and Keras for the implementation.
The Grad-CAM code is based on keisen/tf-keras-vis.
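The dataset is fashion_mnist, which is built into Keras.
The training set contains 60,000 28x28 grayscale images and the test set 10,000 images of the same size, covering 10 classes of fashion items. The dataset is intended as a drop-in replacement for MNIST. The class labels are: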
Label  Description
0  T-shirt/top
1  Trouser
2  Pullover
3  Dress
4  Coat
5  Sandal
6  Shirt
7  Sneaker
8  Bag
9  Ankle boot
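For readable plot titles later, the table can be kept as a plain Python list indexed by label. CLASS_NAMES and label_name are hypothetical helpers, not part of Keras or tf-keras-vis.

```python
# Label index -> class name, in the order of the Fashion-MNIST label table
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label):
    """Return the Fashion-MNIST class name for an integer label 0-9 (int() also accepts NumPy ints)."""
    return CLASS_NAMES[int(label)]

print(label_name(9))  # Ankle boot
```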
After loading the dataset, we scale all pixel values to the range 0 to 1.
We then set batch_size=150 and epochs=25.
from keras.layers import Input, Dense, Conv1D, Conv2D, MaxPooling1D,\
MaxPooling2D, UpSampling1D, UpSampling2D, Dropout, Lambda, Convolution2D,\
Reshape, Activation, Flatten, add, concatenate, Subtract, BatchNormalization
from keras.models import Model, Sequential
from keras.datasets import fashion_mnist
import numpy as np
import keras
import tensorflow as tf
nb_classes=10
nb_epoch=25
batch_size=150
# Load Fashion-MNIST and scale pixel values from 0-255 to 0-1
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# Add a channel axis for Conv2D: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], x_train.shape[2], 1))
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], x_test.shape[2], 1))
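A quick sanity check on these settings: 60,000 training images with batch_size=150 gives 400 gradient updates per epoch, so 25 epochs make 10,000 updates. The compile step below uses Adam with an initial learning rate of 1e-4 and a decay of 1e-6.

Assuming the legacy Keras time-based decay lr / (1 + decay * iterations), the learning rate drops by only about 1% over the whole run. The snippet does this arithmetic in plain Python.

```python
import math

n_train = 60000          # Fashion-MNIST training images
batch_size = 150
nb_epoch = 25
lr0, decay = 1e-4, 1e-6

steps_per_epoch = math.ceil(n_train / batch_size)  # updates per epoch
total_steps = steps_per_epoch * nb_epoch           # updates over the whole run
# Time-based decay (assumed legacy Keras formula): lr = lr0 / (1 + decay * step)
final_lr = lr0 / (1 + decay * total_steps)

print(steps_per_epoch, total_steps, f"{final_lr:.4e}")  # 400 10000 9.9010e-05
```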
We reuse the model from the earlier post.
input_shape=(28,28,1)
input = Input(input_shape, name='input')
layer=Conv2D(32, kernel_size=(2, 2), activation='relu', padding='same')(input)
layer=BatchNormalization()(layer)
layer=Conv2D(32, kernel_size=(2, 2), activation='relu',padding='same')(layer)
layer=BatchNormalization()(layer)
layer=Conv2D(64, kernel_size=(2, 2), activation='relu', padding='same')(layer)
layer=BatchNormalization()(layer)
layer=Conv2D(64, kernel_size=(2, 2), activation='relu', padding='same')(layer)
layer=BatchNormalization()(layer)
layer=MaxPooling2D(pool_size=(2, 2))(layer)
layer=Conv2D(128, kernel_size=(2, 2), activation='relu', padding='same')(layer)
layer=BatchNormalization()(layer)
layer=Conv2D(128, kernel_size=(2, 2), activation='relu', padding='same')(layer)
layer=BatchNormalization()(layer)
layer=MaxPooling2D(pool_size=(2, 2))(layer)
layer=Conv2D(256, kernel_size=(2, 2), activation='relu', padding='same')(layer)
layer=BatchNormalization()(layer)
layer=Conv2D(256, kernel_size=(2, 2), activation='relu', padding='same',name='final')(layer)
layer=BatchNormalization()(layer)
layer=MaxPooling2D(pool_size=(2, 2))(layer)
layer = Dropout(0.5)(layer)
layer = Flatten(name='flatten')(layer)
output = Dense(nb_classes, name="Dense_10nb", activation='softmax')(layer)
model = Model(inputs=[input], outputs=[output])
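# Recent Keras versions reject Adam's old `lr`/`decay` arguments.
# InverseTimeDecay with decay_steps=1 gives the same time-based decay: lr / (1 + decay * step)
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=1e-4, decay_steps=1, decay_rate=1e-6)
model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=lr_schedule), metrics=['accuracy'])
model.summary()
# Train with the batch size and number of epochs set above
model.fit(x_train, y_train, batch_size=batch_size, epochs=nb_epoch)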
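It helps to know the resolution Grad-CAM works at. Every Conv2D uses padding='same', so only the three MaxPooling2D layers (default 'valid' padding) shrink the image: 28 → 14 → 7 → 3. The layer named final therefore outputs 7x7x256 feature maps, and Flatten produces 3*3*256 = 2304 features.

If Grad-CAM is computed on the final layer, its heatmap starts as a 7x7 grid that is then resized to 28x28. The trace below is a hypothetical plain-Python helper, not a Keras API.

```python
def pooled(size, pool=2):
    # Output size of MaxPooling2D with 'valid' padding and stride equal to the pool size
    return (size - pool) // pool + 1

size = 28
trace = []
# (description, kind, channels); 'same'-padded convolutions keep the spatial size
for name, kind, channels in [
    ("conv 32 x2", "conv", 32), ("conv 64 x2", "conv", 64), ("pool", "pool", 64),
    ("conv 128 x2", "conv", 128), ("pool", "pool", 128),
    ("conv 256 x2 (final)", "conv", 256), ("pool", "pool", 256),
]:
    if kind == "pool":
        size = pooled(size)
    trace.append((name, size, channels))

for name, s, c in trace:
    print(f"{name:22s} {s}x{s}x{c}")

flatten_features = size * size * 256
print("Flatten:", flatten_features)  # Flatten: 2304
```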
Next, we run both Grad-CAM and Grad-CAM++ on the trained model and compare their results.
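Grad-CAM and Grad-CAM++ differ mainly in how each feature map is weighted. Grad-CAM uses the spatial mean of the gradients. Grad-CAM++ uses a pixel-wise weighted sum of the positive gradients.

Below is a minimal NumPy sketch of the closed-form Grad-CAM++ weights commonly used in implementations. That form approximates the higher-order derivatives with powers of the first-order gradient, assuming an exponential applied to the class score. This is an illustration on synthetic arrays, not the tf-keras-vis code, and grad_cam_pp is a hypothetical name.

```python
import numpy as np

def grad_cam_pp(activations, grads):
    """Grad-CAM++ map from feature maps A (H, W, K) and dScore/dA (H, W, K)."""
    g2, g3 = grads ** 2, grads ** 3
    sum_a = activations.sum(axis=(0, 1), keepdims=True)  # sum_ab A^k_ab for each channel
    denom = 2.0 * g2 + sum_a * g3
    # alpha^k_ij = g^2 / (2 g^2 + sum_ab A^k_ab * g^3); 0 where the denominator is 0
    alpha = np.divide(g2, denom, out=np.zeros_like(g2), where=denom != 0)
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(0, 1))  # w_k
    cam = np.tensordot(activations, weights, axes=([2], [0]))
    return np.maximum(cam, 0.0)

# Tiny check: one channel, 2x2 map of ones, constant gradient 1
A = np.ones((2, 2, 1))
g = np.ones((2, 2, 1))
print(grad_cam_pp(A, g))  # every entry is 4 * (1/6), about 0.667
```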
%%time
from matplotlib import cm
import matplotlib.pyplot as plt
from tf_keras_vis.gradcam import Gradcam, GradcamPlusPlus
from tf_keras_vis.utils import normalize

def Grad_CAM_savepictures(file_index, model, save_name):
    # Score to explain: the model output for this test image's ground-truth label
    def loss(output):
        return output[0][y_test[file_index]]

    # Replace the final softmax with a linear activation so gradients come from the raw class score
    def model_modifier(m):
        m.layers[-1].activation = tf.keras.activations.linear
        return m

    # Add a batch axis: (28, 28, 1) -> (1, 28, 28, 1)
    originalimage = x_test[file_index]
    originalimage = originalimage.reshape((1, originalimage.shape[0], originalimage.shape[1], 1))
    gray = x_test[file_index].reshape((x_test.shape[1], x_test.shape[2]))

    # Grad-CAM; clone=False applies model_modifier to `model` itself instead of a copy
    gradcam = Gradcam(model, model_modifier=model_modifier, clone=False)
    cam = gradcam(loss, originalimage, penultimate_layer=-1)
    cam = normalize(cam)
    # Colour the heatmap with the jet colormap and overlay it on the grayscale image
    heatmap = np.uint8(cm.jet(cam)[..., :3] * 255)
    ax1 = plt.subplot(1, 2, 1)
    ax1.imshow(gray, cmap="gray")
    ax1.imshow(heatmap.reshape((x_test.shape[1], x_test.shape[2], 3)), alpha=0.4)  # overlay
    ax1.set_title("Grad-CAM")

    # Grad-CAM++ with the same score function and layer
    gradcampp = GradcamPlusPlus(model, model_modifier=model_modifier, clone=False)
    cam = gradcampp(loss, originalimage, penultimate_layer=-1)
    cam = normalize(cam)
    heatmap = np.uint8(cm.jet(cam)[..., :3] * 255)
    ax2 = plt.subplot(1, 2, 2)
    ax2.imshow(gray, cmap="gray")
    ax2.imshow(heatmap.reshape((x_test.shape[1], x_test.shape[2], 3)), alpha=0.4)  # overlay
    ax2.set_title("Grad-CAM++")

    plt.savefig(save_name)
    plt.show()

# Example: visualise test image 0 (index and file name chosen for illustration)
Grad_CAM_savepictures(0, model, "gradcam_0.png")
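We pass the trained model into the Gradcam class from tf-keras-vis. Calling the resulting gradcam object with the score function and the image runs the gradient computation and returns the heatmap (cam).
Overlaying this heatmap on the original image gives the visualization.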
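The overlay step can be sketched in plain NumPy:

1. Min-max normalise the heatmap to [0, 1].
2. Enlarge it to the 28x28 input size if it was computed on a coarser feature map.
3. Alpha-blend it over the image.

To stay dependency-free, this sketch uses nearest-neighbour upsampling via np.kron and a single red channel instead of the jet colormap. overlay is a hypothetical helper, and tf-keras-vis may resize the map with a different interpolation.

```python
import numpy as np

def minmax(x, eps=1e-8):
    # Scale to [0, 1]; eps avoids division by zero for a constant map
    return (x - x.min()) / (x.max() - x.min() + eps)

def overlay(gray, cam, alpha=0.4):
    """Blend heatmap cam (h, w) over grayscale image gray (H, W); H, W must be multiples of h, w."""
    H, W = gray.shape
    fy, fx = H // cam.shape[0], W // cam.shape[1]
    cam_up = np.kron(minmax(cam), np.ones((fy, fx)))  # nearest-neighbour upsampling
    rgb = np.stack([gray] * 3, axis=-1)               # grayscale -> RGB
    heat = np.zeros_like(rgb)
    heat[..., 0] = cam_up                             # red channel = attention strength
    return (1 - alpha) * rgb + alpha * heat

rng = np.random.default_rng(0)
img = rng.random((28, 28))  # stand-in for a scaled Fashion-MNIST image
cam7 = rng.random((7, 7))   # stand-in for a 7x7 Grad-CAM map
out = overlay(img, cam7)
print(out.shape)  # (28, 28, 3)
```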
The softmax in the model's last layer can distort the visualization, so model_modifier replaces it with a linear activation.
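One reason for the linear replacement is gradient saturation. The derivative of a softmax probability with respect to its own logit is p_c(1 - p_c), which goes to zero as the model becomes confident. Gradients taken through the softmax can therefore become tiny, and they also depend on the other classes' logits.

The NumPy sketch below compares that derivative with the gradient of the raw logit, which is always 1, for a confidently classified example. The logits are made up.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([10.0, 0.0, 0.0])  # hypothetical logits, class 0 strongly predicted
p = softmax(z)

grad_softmax = p[0] * (1 - p[0])  # d p_0 / d z_0 through the softmax
grad_linear = 1.0                 # d z_0 / d z_0 once the softmax is replaced by linear
print(f"{p[0]:.6f} {grad_softmax:.2e} {grad_linear}")
```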
Grad-CAM also needs a score to take gradients of. The loss function returns the model output for the image's class label, y_test[file_index].
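The score does not have to use the ground-truth label. Grad-CAM is often run on the predicted class instead, which shows what drove the model's actual decision, including on misclassified images.

Below is a minimal sketch of choosing the class index from a batch of model outputs. pick_target, its use_prediction flag and class_score are hypothetical helpers.

```python
import numpy as np

def pick_target(probs, true_label, use_prediction=False):
    """Class index whose score Grad-CAM should explain."""
    return int(np.argmax(probs[0])) if use_prediction else int(true_label)

def class_score(output, target):
    # Same indexing as the article's loss(): first image in the batch, chosen class
    return output[0][target]

probs = np.array([[0.05, 0.10, 0.70, 0.15, 0, 0, 0, 0, 0, 0]])  # made-up model output
print(pick_target(probs, true_label=4))                       # 4 (ground truth)
print(pick_target(probs, true_label=4, use_prediction=True))  # 2 (predicted)
print(class_score(probs, 2))                                  # 0.7
```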
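The redder a region, the more weight the network gives it.
A bit awkwardly, the network pays relatively little attention to the body of the item itself; the regions it focuses on are mostly the item's edges.
Today we implemented Grad-CAM and Grad-CAM++. In these examples Grad-CAM++ gave the better results, and with these heatmaps we can see which parts of the image the network relies on.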
https://github.com/keisen/tf-keras-vis/blob/master/examples/attentions.ipynb