Although TensorFlow ships several pre-trained models that let us finish a training task quickly, there is no simple, ready-made API when you want to run certain experiments (say, tweaking the number of channels in MobileNetV2's CNN layers). So today we get our hands dirty and learn how to build a MobileNetV2 from scratch, then transfer the pre-trained weights TensorFlow provides onto it!
As usual, we start by instantiating the official MobileNetV2 and freezing its weights.
import tensorflow as tf

# Load the official MobileNetV2 backbone with ImageNet weights and freeze it.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False

# Attach a fresh classification head; NUM_OF_CLASS is the number of target
# classes (2 in this walkthrough, matching the summary below).
net = tf.keras.layers.GlobalAveragePooling2D()(base.output)
net = tf.keras.layers.Dense(NUM_OF_CLASS)(net)
model = tf.keras.Model(inputs=[base.input], outputs=[net])
model.summary()
This prints a long listing of every layer in the model, which we can study to learn how to rebuild it from scratch.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 112, 112, 32) 864         input_1[0][0]
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 112, 112, 32) 128         Conv1[0][0]
__________________________________________________________________________________________________
(... layers omitted ...)
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 1280)         0           out_relu[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 2)            2562        global_average_pooling2d[0][0]
==================================================================================================
Total params: 2,260,546
Trainable params: 2,562
Non-trainable params: 2,257,984
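A quick sanity check on those numbers: only the new Dense head is trainable, and 1280 inputs * 2 classes + 2 biases = 2,562, exactly the trainable-parameter count above. One housekeeping note: the constants and datasets used below (NUM_OF_CLASS, LR, EPOCHS, ds_train, ds_test) are not defined in this post, so here is a minimal sketch of what they might look like, assuming a two-class image folder and an arbitrary learning rate:

# Hypothetical setup -- the actual data source isn't shown in this post.
NUM_OF_CLASS = 2   # matches the (None, 2) Dense output above
LR = 0.01          # assumed SGD learning rate
EPOCHS = 10        # the post trains for 10 epochs

def load_split(path):
    ds = tf.keras.preprocessing.image_dataset_from_directory(
        path, image_size=(224, 224), batch_size=32)
    # MobileNetV2 expects pixel values scaled to [-1, 1].
    return ds.map(lambda x, y: (tf.keras.applications.mobilenet_v2.preprocess_input(x), y))

ds_train = load_split('data/train')  # hypothetical directory layout
ds_test = load_split('data/test')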
Next, we train for a quick 10 epochs; since the backbone weights are frozen, it converges fast.
model.compile(
    optimizer=tf.keras.optimizers.SGD(LR),
    # from_logits=True because the Dense head has no softmax activation.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
history = model.fit(
    ds_train,
    epochs=EPOCHS,
    validation_data=ds_test,
    verbose=True)
Output:
loss: 0.1718 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.9364 - val_sparse_categorical_accuracy: 0.7833
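Rather than scrolling through the full summary, it can also help to dump the layer names programmatically; for instance, a quick loop to inspect everything inside one block of the stock model:

# Print every layer belonging to block_1 of the official model.
for layer in base.layers:
    if layer.name.startswith('block_1_'):
        print(layer.name, layer.output_shape)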
Now we get to the part where we rebuild the model. From the summary above we can see that MobileNetV2 is largely made of repeating block_{num}_* structures; each of these blocks is the so-called bottleneck (inverted residual) design: a 1x1 Conv that expands the channels, a DepthwiseConv, then a 1x1 Conv that projects back down (Conv -> DepthwiseConv -> Conv).
def get_mobilenetV2(shape):
    input_node = tf.keras.layers.Input(shape=shape)
    # Stem: strided 3x3 conv, BN, ReLU6.
    net = tf.keras.layers.Conv2D(32, 3, (2, 2), use_bias=False, padding='same')(input_node)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    # The first block (expanded_conv) has no 1x1 expansion: depthwise conv,
    # then a linear projection down to 16 channels.
    net = tf.keras.layers.DepthwiseConv2D(3, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    net = tf.keras.layers.Conv2D(16, 1, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    # 16 bottleneck blocks; strided blocks downsample, so they cannot keep a shortcut.
    net = bottleneck(net, 16, 24, (2, 2), shortcut=False, zero_pad=True)   # block_1
    net = bottleneck(net, 24, 24, (1, 1), shortcut=True)                   # block_2
    net = bottleneck(net, 24, 32, (2, 2), shortcut=False, zero_pad=True)   # block_3
    net = bottleneck(net, 32, 32, (1, 1), shortcut=True)                   # block_4
    net = bottleneck(net, 32, 32, (1, 1), shortcut=True)                   # block_5
    net = bottleneck(net, 32, 64, (2, 2), shortcut=False, zero_pad=True)   # block_6
    net = bottleneck(net, 64, 64, (1, 1), shortcut=True)                   # block_7
    net = bottleneck(net, 64, 64, (1, 1), shortcut=True)                   # block_8
    net = bottleneck(net, 64, 64, (1, 1), shortcut=True)                   # block_9
    net = bottleneck(net, 64, 96, (1, 1), shortcut=False)                  # block_10
    net = bottleneck(net, 96, 96, (1, 1), shortcut=True)                   # block_11
    net = bottleneck(net, 96, 96, (1, 1), shortcut=True)                   # block_12
    net = bottleneck(net, 96, 160, (2, 2), shortcut=False, zero_pad=True)  # block_13
    net = bottleneck(net, 160, 160, (1, 1), shortcut=True)                 # block_14
    net = bottleneck(net, 160, 160, (1, 1), shortcut=True)                 # block_15
    net = bottleneck(net, 160, 320, (1, 1), shortcut=False)                # block_16
    # Head: 1x1 conv up to 1280 channels (the out_relu output in the summary).
    net = tf.keras.layers.Conv2D(1280, 1, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    return input_node, net
def bottleneck(net, filters, out_ch, strides, shortcut=True, zero_pad=False):
    # Strided blocks use explicit zero padding + a 'valid' depthwise conv,
    # mirroring how the official Keras implementation handles downsampling.
    padding = 'valid' if zero_pad else 'same'
    shortcut_net = net
    # Expansion: 1x1 conv widens the input by a factor of 6.
    net = tf.keras.layers.Conv2D(filters * 6, 1, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    if zero_pad:
        net = tf.keras.layers.ZeroPadding2D(padding=((0, 1), (0, 1)))(net)
    # Depthwise 3x3 conv (strided when the block downsamples).
    net = tf.keras.layers.DepthwiseConv2D(3, strides=strides, use_bias=False, padding=padding)(net)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    # Linear projection back down to out_ch; note there is no ReLU afterwards.
    net = tf.keras.layers.Conv2D(out_ch, 1, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    if shortcut:
        net = tf.keras.layers.Add()([net, shortcut_net])
    return net
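As a quick check that bottleneck reproduces the shapes in the summary, we can push a dummy input through a single block (a small sketch; block_1 takes the 112x112x16 stem output down to 56x56x24):

# block_1: stride 2, 16 -> 24 channels, no shortcut.
x = tf.keras.layers.Input(shape=(112, 112, 16))
y = bottleneck(x, 16, 24, (2, 2), shortcut=False, zero_pad=True)
print(tf.keras.Model(x, y).output_shape)  # (None, 56, 56, 24)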
With both functions in place, we build a rework_model and copy the weights over from the original model into the rebuilt one.
input_node, net = get_mobilenetV2((224, 224, 3))
net = tf.keras.layers.GlobalAveragePooling2D()(net)
net = tf.keras.layers.Dense(NUM_OF_CLASS)(net)
rework_model = tf.keras.Model(inputs=[input_node], outputs=[net])
rework_model.compile(
    optimizer=tf.keras.optimizers.SGD(LR),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# Both models were built in the same layer order, so zip() pairs each original
# layer with its rebuilt counterpart.
for origin_layer, rework_layer in zip(model.layers, rework_model.layers):
    origin_layer.trainable = True  # unfreeze the original as we go; not needed for the copy itself
    rework_layer.set_weights(origin_layer.get_weights())
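Before relying on evaluate(), it is cheap to assert directly that every weight tensor really was copied (a quick verification sketch):

import numpy as np

# Every weight tensor in the two models should now be numerically identical.
for origin_layer, rework_layer in zip(model.layers, rework_model.layers):
    for w_orig, w_rework in zip(origin_layer.get_weights(), rework_layer.get_weights()):
        assert np.array_equal(w_orig, w_rework), origin_layer.name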
To confirm that the rebuilt rework_model matches the original in both structure and weights, we use evaluate() to check that both models produce the same loss and accuracy.
model.evaluate(ds_test, verbose=True)
rework_model.evaluate(ds_test, verbose=True)
Output:
32/32 [==============================] - 3s 80ms/step - loss: 0.9364 - sparse_categorical_accuracy: 0.7833
32/32 [==============================] - 4s 79ms/step - loss: 0.9364 - sparse_categorical_accuracy: 0.7833
Looks good: both models report identical loss and sparse_categorical_accuracy, so we really have rebuilt a MobileNetV2 from scratch!
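This unlocks the kind of experiment mentioned at the start: every layer now lives in our own code, so changing the width of a CNN layer is a one-line edit. As a hypothetical illustration, here is a variant of the first stage with block_1/block_2 widened from 24 to 48 channels; the widened layers can no longer load the pre-trained weights and would have to be retrained, but everything else can still be copied layer by layer as above:

# Hypothetical variant: first bottleneck stage doubled to 48 channels.
def get_wide_block1(shape):
    input_node = tf.keras.layers.Input(shape=shape)
    net = tf.keras.layers.Conv2D(32, 3, (2, 2), use_bias=False, padding='same')(input_node)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    net = tf.keras.layers.DepthwiseConv2D(3, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    net = tf.keras.layers.ReLU(max_value=6)(net)
    net = tf.keras.layers.Conv2D(16, 1, use_bias=False, padding='same')(net)
    net = tf.keras.layers.BatchNormalization()(net)
    net = bottleneck(net, 16, 48, (2, 2), shortcut=False, zero_pad=True)  # block_1, widened
    net = bottleneck(net, 48, 48, (1, 1), shortcut=True)                  # block_2, widened
    return input_node, net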