Yesterday we covered building models with the Sequential API; today we'll look at the functional API. The functional API offers more freedom than the Sequential style, making it a good fit for more complex models, such as those with branching structures or non-linear topologies. Multi-input and multi-output models are also best built with the functional API.
The hallmark of the functional API is that each layer is used as a function and the data is passed in as its input, which feels slightly different from stacking layers one by one in a Sequential model:
# Sequential model
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
# Or, equivalently
model = Sequential([
    Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"),
])
# Functional API
# Assume x is the output of the previous layer
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(x)
As the code above shows, the functional API states explicitly which variable is used as input, and each layer's output can in turn serve as the input to other layers, which also makes it possible to reuse certain layers. This approach gives you a great deal of freedom, which is why it can be used to build models with complex structures.
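As a quick illustration (a hypothetical example, not part of this series' model), here is a small two-branch network that the functional API can express but Sequential cannot: the same input tensor feeds two parallel convolution branches, whose outputs are then concatenated.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical two-branch model: one input tensor reused by two branches.
inputs = tf.keras.Input(shape=(64, 64, 3))
branch_a = layers.Conv2D(32, kernel_size=(3, 3), padding="same", activation="relu")(inputs)
branch_b = layers.Conv2D(32, kernel_size=(5, 5), padding="same", activation="relu")(inputs)
merged = layers.Concatenate()([branch_a, branch_b])  # channels: 32 + 32 = 64
pooled = layers.GlobalAveragePooling2D()(merged)
outputs = layers.Dense(5, activation="softmax")(pooled)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

Because `inputs` appears twice, this graph has a branch-and-merge topology that a plain stack of `model.add(...)` calls could never describe.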
Let's rewrite yesterday's Sequential model with the functional API. First, import the required classes:
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.models import Model
Then convert the Sequential code into functional API code:
input_layer = Input(shape=(256, 256, 3))
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(input_layer)
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Flatten()(x)
x = Dense(units=4096, activation="relu")(x)
x = Dense(units=4096, activation="relu")(x)
output_layer = Dense(units=5, activation="softmax")(x)
model = Model(inputs=input_layer, outputs=output_layer)
The last line ties everything together: it specifies the initial input and the final output to obtain the complete model.
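A nice side benefit of this style is that any intermediate tensor can serve as the output of another Model built from the same graph. A self-contained sketch with hypothetical layer sizes (unrelated to the VGG-style model above):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inputs)
features = layers.Flatten()(x)
outputs = layers.Dense(5, activation="softmax")(features)

# Full classifier: input -> softmax probabilities
classifier = tf.keras.Model(inputs=inputs, outputs=outputs)
# Same graph, different output tensor: a feature extractor
feature_extractor = tf.keras.Model(inputs=inputs, outputs=features)
```

Both models share the same underlying layers and weights; only the chosen output tensor differs.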
Use summary() to inspect the model structure:
model.summary()
The output:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 256, 256, 3)] 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
conv2d_14 (Conv2D) (None, 256, 256, 64) 36928
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 128, 128, 64) 0
_________________________________________________________________
conv2d_15 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
conv2d_16 (Conv2D) (None, 128, 128, 128) 147584
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 64, 64, 128) 0
_________________________________________________________________
conv2d_17 (Conv2D) (None, 64, 64, 256) 295168
_________________________________________________________________
conv2d_18 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
conv2d_19 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 32, 32, 256) 0
_________________________________________________________________
conv2d_20 (Conv2D) (None, 32, 32, 512) 1180160
_________________________________________________________________
conv2d_21 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
conv2d_22 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 16, 16, 512) 0
_________________________________________________________________
conv2d_23 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
conv2d_24 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
conv2d_25 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 8, 8, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 32768) 0
_________________________________________________________________
dense_3 (Dense) (None, 4096) 134221824
_________________________________________________________________
dense_4 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_5 (Dense) (None, 5) 20485
=================================================================
Total params: 165,738,309
Trainable params: 165,738,309
Non-trainable params: 0
_________________________________________________________________
The model here is identical to yesterday's.
Keras ships with many well-known network architectures as built-in functions, usually trained on ImageNet. ImageNet is a large image database containing more than 14 million images across over 20,000 categories, and it has contributed enormously to research on image recognition and classification; VGGNet, GoogLeNet (Inception), ResNet, and others were all trained on ImageNet. The model functions in Keras include the VGG, ResNet, Inception, Xception, MobileNet, DenseNet, NASNet, EfficientNet, and ConvNeXt families.
For example, the VGG16 used throughout this series can be obtained directly by calling the built-in VGG16() function, so there is no need to assemble it by hand. That said, building it yourself has advantages too: you can modify the hyperparameters inside or adjust the architecture to your needs.
How to use the VGG16() function:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = VGG16(include_top=True, weights='imagenet', input_tensor=inputs)
outputs = base_model.output
model = tf.keras.Model(inputs=inputs, outputs=outputs)
Setting include_top in VGG16() to True means the model includes the fully connected layers; False means it does not, and you can then build your own classifier head as needed. With include_top=True, the settings must match the original ImageNet training configuration or the code will raise an error; for example, the input image size must be (224, 224, 3). input_tensor specifies the input layer.
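For reference, a minimal inference sketch with the full model might look like the following. Note that VGG16 expects its inputs to be normalized with preprocess_input; here weights=None is used only to skip the large weight download, so use weights='imagenet' to get meaningful predictions. The random array is a stand-in for a real 224×224 image.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# weights=None skips the large download; with weights='imagenet'
# the predictions below would be meaningful ImageNet scores.
model = VGG16(include_top=True, weights=None)
img = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0  # stand-in for a real image
preds = model.predict(preprocess_input(img))
# preds has shape (1, 1000): one probability per ImageNet class
```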
The result of running summary():
model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
predictions (Dense) (None, 1000) 4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
Note that the final output is 1000, meaning there are 1,000 classes; that is the number of classes used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a competition based on the ImageNet dataset.
To adapt this to the dataset used in this series, the code needs a few changes. First, set include_top to False: the bear dataset (a reminder: this series uses the Bear dataset as its example, so from now on we'll just call it the bear dataset) has 5 classes in total, so the original 1,000 output neurons cannot be used. The images in this series are 256×256, so the image size in the input layer must be changed as well. The code:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers
inputs = tf.keras.Input(shape=(256, 256, 3))
base_model = VGG16(include_top=False, weights='imagenet', input_tensor=inputs)
x = base_model.output
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
Here weights is set to 'imagenet', meaning the weights trained on ImageNet are used; it can also be set to None to train the weights from scratch on your own dataset. The convolution and pooling portion of this VGG16 model is then connected to layers we build ourselves: 1 Flatten layer and 3 Dense (fully connected) layers, where the final Dense layer is set to 5 neurons, one per class, so the model outputs a probability for each class.
Running model.summary():
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 256, 256, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 256, 256, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 128, 128, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 128, 128, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 64, 64, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 64, 64, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 32, 32, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 32, 32, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 16, 16, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 8, 8, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 32768) 0
_________________________________________________________________
dense (Dense) (None, 4096) 134221824
_________________________________________________________________
dense_1 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_2 (Dense) (None, 5) 20485
=================================================================
Total params: 165,738,309
Trainable params: 165,738,309
Non-trainable params: 0
_________________________________________________________________
You can see that the final fully connected layer has been changed to match our dataset's output.
Finally, add the data augmentation layers introduced on Day 8:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers
# use data augmentation layer
data_augmentation = tf.keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)
inputs = tf.keras.Input(shape=(256, 256, 3))
x = data_augmentation(inputs)
base_model = VGG16(include_top=False, weights='imagenet', input_tensor=x)
x = base_model.output
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
Running model.summary():
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 256, 256, 3)] 0
_________________________________________________________________
sequential (Sequential) (None, 256, 256, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 256, 256, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 128, 128, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 128, 128, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 64, 64, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 64, 64, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 32, 32, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 32, 32, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 16, 16, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 8, 8, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 32768) 0
_________________________________________________________________
dense (Dense) (None, 4096) 134221824
_________________________________________________________________
dense_1 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_2 (Dense) (None, 5) 20485
=================================================================
Total params: 165,738,309
Trainable params: 165,738,309
Non-trainable params: 0
_________________________________________________________________
The extra sequential layer right after the input layer is the data augmentation stage (the name can be changed if you like); it has now become part of the model itself.
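One detail worth knowing about these preprocessing layers: they are only active during training. At inference time they pass images through unchanged, which a small check can confirm:

```python
import tensorflow as tf
from tensorflow.keras import layers

data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.2),
])

images = tf.random.uniform((4, 256, 256, 3))
at_inference = data_augmentation(images, training=False)  # identity pass-through
at_training = data_augmentation(images, training=True)    # randomly transformed
```

This is why the augmentation can safely live inside the model: when you later call model.predict(), Keras runs the layers with training=False and no random transforms are applied.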
Today we covered the functional API style, and building models now feels much more flexible! Tomorrow we move on to the model training stage. Hold on tight!