iT邦幫忙

11th iT 邦幫忙 Ironman Contest

DAY 19
Google Developers Machine Learning

How to Train Your Model 訓模高手: My Personal TensorFlow Experience Series, Part 19

【19】TensorFlow training tips: evaluating model performance with tf.GraphKeys.TRAINABLE_VARIABLES

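Once you have built a few models, the next thing to pay attention to is how well a model performs at run time. With images, which I work with most, a recurring question is whether the computation should run in the cloud or on an edge device, and answering it requires benchmarking the model.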

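So how can I be reasonably confident that a trained model suits the target device? Once the model is built, running inference a few times already gives a rough idea.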

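For example, suppose I am deploying a face recognition model and want it to handle 10 frames per second (FPS = 10). Each frame then has to be processed within 1000 / 10 = 100 ms. At this stage only inference speed matters, not accuracy, so no training is needed: once the session has initialized the variables, we can feed inputs straight into the network.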
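The frame-budget arithmetic above can be kept as a tiny helper when comparing several models against one target. A minimal sketch, assuming measured latency is in milliseconds; `frame_budget_ms` and `meets_target` are hypothetical names for illustration:

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame (ms) to sustain target_fps frames per second."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def meets_target(measured_ms_per_frame: float, target_fps: float) -> bool:
    """True if a measured per-frame inference time fits within the budget."""
    return measured_ms_per_frame <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    print(frame_budget_ms(10))     # 100.0
    print(meets_target(85.0, 10))  # True
    print(meets_target(120.0, 10)) # False
```

In a real pipeline the budget also has to cover frame capture and preprocessing, so inference alone should come in somewhat under it.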

The following builds an AlexNet in TensorFlow:

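import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)

NUM_CLASSES = 1000  # assumption: set this to your own number of classes

input_node = tf.placeholder(shape=[None, 224, 224, 3], dtype=tf.float32, name='input_node')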

net = tf.layers.conv2d(input_node, 96, (11, 11), 
                        strides=(4, 4), 
                        activation=tf.nn.relu, 
                        padding='same', 
                        name='conv_1')
net = tf.nn.lrn(net, depth_radius=5, 
                        bias=1.0, 
                        alpha=0.0001 / 5.0, 
                        beta=0.75, 
                        name='norm_1')
net = tf.layers.max_pooling2d(net, pool_size=(3, 3), 
                        strides=2, 
                        name='max_pool_1')

net = tf.layers.conv2d(net, 256, (5, 5), 
                        strides=(1, 1), 
                        activation=tf.nn.relu, 
                        padding="same", 
                        name='conv_2')
net = tf.nn.lrn(net, depth_radius=5, 
                        bias=1.0, 
                        alpha=0.0001 / 5.0, 
                        beta=0.75, 
                        name='norm_2')
net = tf.layers.max_pooling2d(net, pool_size=(3, 3), 
                        strides=2, 
                        name='max_pool_2')

net = tf.layers.conv2d(net, 384, (3, 3), 
                        strides=(1, 1), 
                        padding="same", 
                        activation=tf.nn.relu, 
                        name='conv_3')

net = tf.layers.conv2d(net, 384, (3, 3), 
                        strides=(1, 1), 
                        padding="same", 
                        activation=tf.nn.relu, 
                        name='conv_4')

net = tf.layers.conv2d(net, 256, (3, 3), 
                        strides=(1, 1), 
                        padding="same", 
                        activation=tf.nn.relu, 
                        name='conv_5')
net = tf.layers.max_pooling2d(net, pool_size=(3, 3), 
                                    strides=2, 
                                    padding="valid", 
                                    name='max_pool_5')

net = tf.reshape(net, [-1, 6 * 6 * 256], name='flat')
net = tf.layers.dense(net, 4096, activation=tf.nn.relu, name='dense_1')

net = tf.layers.dense(net, 4096, activation=tf.nn.relu, name='dense_2')

logits = tf.layers.dense(net, NUM_CLASSES, name="logits_layer")

Computation graph:
https://ithelp.ithome.com.tw/upload/images/20190927/20107299m9CeU7phD7.png
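The hard-coded `6 * 6 * 256` in the `tf.reshape` call follows from the 224x224 input and each layer's stride and padding. A minimal sketch that traces the spatial size through the AlexNet above, assuming TensorFlow's output-size rules (`'same'`: ceil(n / stride); `'valid'`: ceil((n - kernel + 1) / stride)) and that `tf.layers.max_pooling2d` falls back to its default `padding='valid'` where none is given; `out_size` and `trace` are hypothetical helpers:

```python
import math


def out_size(n: int, kernel: int, stride: int, padding: str) -> int:
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return math.ceil((n - kernel + 1) / stride)
    raise ValueError(f"unknown padding: {padding}")


# (name, kernel, stride, padding) for the spatial layers of the AlexNet above.
# max_pool_1 and max_pool_2 pass no padding, so they use the default 'valid'.
ALEXNET_SPATIAL = [
    ("conv_1", 11, 4, "same"),
    ("max_pool_1", 3, 2, "valid"),
    ("conv_2", 5, 1, "same"),
    ("max_pool_2", 3, 2, "valid"),
    ("conv_3", 3, 1, "same"),
    ("conv_4", 3, 1, "same"),
    ("conv_5", 3, 1, "same"),
    ("max_pool_5", 3, 2, "valid"),
]


def trace(n: int, layers):
    """Return (layer name, output height/width) after each layer, for a square input."""
    sizes = []
    for name, k, s, p in layers:
        n = out_size(n, k, s, p)
        sizes.append((name, n))
    return sizes


if __name__ == "__main__":
    for name, n in trace(224, ALEXNET_SPATIAL):
        print(f"{name:>10}: {n}x{n}")
```

Because the flatten size is hard-coded, changing the input resolution means re-deriving it; `tf.layers.flatten` infers it from the incoming tensor instead.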

AlexNet is a fairly compact model, so for comparison I'll also build a larger one: VGG16.

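# Build VGG16 in a fresh graph (or in a separate script) so that the
# trainable-variable count below covers only this model.
tf.reset_default_graph()

input_node = tf.placeholder(shape=[None, 224, 224, 3], dtype=tf.float32, name='input_node')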

net = tf.layers.conv2d(input_node, 64, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_1_1')
net = tf.layers.conv2d(net, 64, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_1_2')
net = tf.layers.max_pooling2d(net, pool_size=(2, 2), 
                        strides=2, 
                        name='max_pool_1')

net = tf.layers.conv2d(net, 128, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_2_1')
net = tf.layers.conv2d(net, 128, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_2_2')
net = tf.layers.max_pooling2d(net, pool_size=(2, 2), 
                        strides=2, 
                        name='max_pool_2')

net = tf.layers.conv2d(net, 256, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_3_1')
net = tf.layers.conv2d(net, 256, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_3_2')
net = tf.layers.conv2d(net, 256, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_3_3')
net = tf.layers.max_pooling2d(net, pool_size=(2, 2), 
                        strides=2, 
                        name='max_pool_3')

net = tf.layers.conv2d(net, 512, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_4_1')
net = tf.layers.conv2d(net, 512, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_4_2')
net = tf.layers.conv2d(net, 512, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_4_3')
net = tf.layers.max_pooling2d(net, pool_size=(2, 2), 
                        strides=2, 
                        name='max_pool_4')

net = tf.layers.conv2d(net, 512, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_5_1')
net = tf.layers.conv2d(net, 512, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_5_2')
net = tf.layers.conv2d(net, 512, (3, 3), 
                        strides=(1, 1), 
                        activation=tf.nn.relu,
                        padding='same', 
                        name='conv_5_3')
net = tf.layers.max_pooling2d(net, pool_size=(2, 2), strides=2, name='max_pool_5')

net = tf.reshape(net, [-1, 7 * 7 * 512], name='flat')
# VGG16: 13 conv layers + 3 dense layers (dense_1, dense_2, logits_layer) = 16 weight layers
net = tf.layers.dense(net, 4096, activation=tf.nn.relu, name='dense_1')
net = tf.layers.dense(net, 4096, activation=tf.nn.relu, name='dense_2')
logits = tf.layers.dense(net, NUM_CLASSES, name='logits_layer')

Computation graph:
https://ithelp.ithome.com.tw/upload/images/20190927/20107299GhOhF6bfvJ.png

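With these two networks in place, we can write a few benchmarks. The first is the model's variable count: tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) returns the weights that training updates. Each variable is a multi-dimensional tensor (a conv kernel is 4-D, a dense kernel is 2-D, a bias is 1-D), so its parameter count is the product of its dimensions, and the model's total is the sum over all variables.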

total_parameters = 0
for variable in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES):
    shape = variable.get_shape()  # e.g. (3, 3, 64, 128) for a 3x3 conv kernel
    variable_parameters = 1
    for dim in shape:
        variable_parameters *= dim.value  # TF 1.x Dimension -> int
    total_parameters += variable_parameters
print(f'trainable parameters count: {total_parameters}')
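To sanity-check the number the loop above prints, the same multiply-then-sum rule can be applied by hand. A minimal framework-free sketch for the AlexNet above, assuming `NUM_CLASSES = 1000` and writing out by hand the kernel and bias shapes that `tf.layers.conv2d` and `tf.layers.dense` create; `ALEXNET_VARIABLES` and `count_parameters` are hypothetical names:

```python
from math import prod

NUM_CLASSES = 1000  # assumption: set this to your own number of classes

# Hand-written variable shapes for the AlexNet above:
# conv kernels are (kh, kw, c_in, c_out), dense kernels are (n_in, n_out),
# and each layer also has a bias of shape (n_out,). LRN and pooling add no variables.
ALEXNET_VARIABLES = {
    "conv_1": [(11, 11, 3, 96), (96,)],
    "conv_2": [(5, 5, 96, 256), (256,)],
    "conv_3": [(3, 3, 256, 384), (384,)],
    "conv_4": [(3, 3, 384, 384), (384,)],
    "conv_5": [(3, 3, 384, 256), (256,)],
    "dense_1": [(6 * 6 * 256, 4096), (4096,)],
    "dense_2": [(4096, 4096), (4096,)],
    "logits_layer": [(4096, NUM_CLASSES), (NUM_CLASSES,)],
}


def count_parameters(shapes):
    """Same rule as the loop above: multiply each variable's dimensions, then sum."""
    return sum(prod(shape) for shape in shapes)


if __name__ == "__main__":
    total = 0
    for layer, shapes in ALEXNET_VARIABLES.items():
        n = count_parameters(shapes)
        total += n
        print(f"{layer:>12}: {n:>12,}")
    print(f"{'total':>12}: {total:>12,}")
```

The per-layer breakdown shows that dense_1 alone, which connects the 6 * 6 * 256 = 9216 flattened features to 4096 units, holds more than half of the model's parameters.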

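Next, we need a way to measure inference speed. One detail to watch: the first session run in TensorFlow is noticeably slower than later ones. My guess is that some objects are only initialized when they are actually needed (lazy loading), so the first run is excluded from the measurement. Below, the model runs inference 10 times and the total elapsed time is printed.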

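import timeit

import cv2
import numpy as np
import tensorflow as tf

TIMES = 10
image = cv2.imread('../05/ithome.jpg')
if image is None:
    raise FileNotFoundError('could not read ../05/ithome.jpg')
# cv2.resize takes (width, height); the placeholder is (batch, height, width, channels)
image = cv2.resize(image, (int(input_node.shape[2]), int(input_node.shape[1])))
image = np.expand_dims(image, 0).astype(np.float32)  # add batch dim, match float32 input

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    sess.run(logits, feed_dict={input_node: image})  # <-- first time is slow, not timed

    start = timeit.default_timer()
    for _ in range(TIMES):
        sess.run(logits, feed_dict={input_node: image})
    elapsed = timeit.default_timer() - start

    print(f'cost time: {elapsed} sec ({elapsed / TIMES * 1000:.1f} ms per inference)')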
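The loop above reports only the total time. Timing each call separately shows whether a few slow runs skew the result, and the median is less sensitive to such outliers than the mean. A minimal stdlib sketch, assuming any zero-argument callable as the workload; `benchmark` is a hypothetical helper, not a TensorFlow API:

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 1, runs: int = 10) -> Dict[str, float]:
    """Time run_once() after discarding `warmup` calls; returns per-call stats in ms."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        run_once()  # first calls may include one-time setup, so they are not timed
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload. Inside the session above, one would pass something like
    # lambda: sess.run(logits, feed_dict={input_node: image})
    stats = benchmark(lambda: sum(i * i for i in range(100_000)), warmup=1, runs=10)
    print({k: round(v, 3) for k, v in stats.items()})
```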

Now we can run the tests.

AlexNet:
https://ithelp.ithome.com.tw/upload/images/20190927/20107299MnVDFcSJvE.png

VGG16:
https://ithelp.ithome.com.tw/upload/images/20190927/20107299tYX7Dfwm2n.png

VGG16 has more than twice as many trainable parameters as AlexNet, while AlexNet's inference is nearly 8 times faster than VGG16's!

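One caveat: a model with more variables is not necessarily slower, because speed depends mainly on how much computation the layers perform (a convolution applies the same small kernel at every spatial position), not only on how many weights they store. Treat these speed numbers as a rough reference. They are also unoptimized; TensorFlow, for example, offers optimizations such as TensorFlow Lite (tflite) aimed at ARM CPUs.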
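To put rough numbers on this, here is a sketch that counts both parameters and multiply-accumulate operations (MACs) for the AlexNet above and a standard VGG16 (two 4096-unit dense layers before the classifier). It assumes a 224x224 input and `NUM_CLASSES = 1000`, and it ignores LRN, pooling, activations, and bias additions. The output sizes are hand-derived, and `cost` is a hypothetical helper:

```python
from typing import List, Tuple

NUM_CLASSES = 1000  # assumption: set this to your own number of classes

Conv = Tuple[int, int, int, int]  # (output height/width, kernel, c_in, c_out), square maps
Dense = Tuple[int, int]           # (n_in, n_out)

ALEXNET_CONV: List[Conv] = [
    (56, 11, 3, 96), (27, 5, 96, 256), (13, 3, 256, 384),
    (13, 3, 384, 384), (13, 3, 384, 256),
]
ALEXNET_DENSE: List[Dense] = [(6 * 6 * 256, 4096), (4096, 4096), (4096, NUM_CLASSES)]

VGG16_CONV: List[Conv] = (
    [(224, 3, 3, 64), (224, 3, 64, 64)]
    + [(112, 3, 64, 128), (112, 3, 128, 128)]
    + [(56, 3, 128, 256)] + [(56, 3, 256, 256)] * 2
    + [(28, 3, 256, 512)] + [(28, 3, 512, 512)] * 2
    + [(14, 3, 512, 512)] * 3
)
VGG16_DENSE: List[Dense] = [(7 * 7 * 512, 4096), (4096, 4096), (4096, NUM_CLASSES)]


def cost(conv: List[Conv], dense: List[Dense]) -> Tuple[int, int]:
    """Return (parameters, MACs) for one forward pass of a single image."""
    params = macs = 0
    for out, k, c_in, c_out in conv:
        params += k * k * c_in * c_out + c_out      # kernel + bias
        macs += out * out * k * k * c_in * c_out    # one k*k*c_in dot product per output value
    for n_in, n_out in dense:
        params += n_in * n_out + n_out
        macs += n_in * n_out
    return params, macs


if __name__ == "__main__":
    a_params, a_macs = cost(ALEXNET_CONV, ALEXNET_DENSE)
    v_params, v_macs = cost(VGG16_CONV, VGG16_DENSE)
    print(f"AlexNet: {a_params / 1e6:.1f}M params, {a_macs / 1e9:.2f}G MACs")
    print(f"VGG16:   {v_params / 1e6:.1f}M params, {v_macs / 1e9:.2f}G MACs")
    print(f"ratio:   {v_params / a_params:.1f}x params, {v_macs / a_macs:.1f}x MACs")
```

Under these assumptions VGG16 has roughly 2x the parameters of this AlexNet but more than 10x the MACs, which points the same way as the measured ~8x speed gap. MACs are still only a first-order estimate: real latency also depends on memory traffic, kernel implementations, and hardware.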

GitHub link

