DAY 18
Big Data

## Today's Goals

• Understand the Convolutional Autoencoder
• Implement a deconvolutional layer
• Implement a max unpooling layer
• Inspect the code layer and the decoder output

A complete, easier-to-read version is available as a GitHub IPython Notebook.

## Deconvolution

• x dimensions: 28 × 28, channels: 1
• encoder_layer1 dimensions: 14 × 14, channels: 16
• encoder_layer2 dimensions: 7 × 7, channels: 32
• decoder_layer1 dimensions: 14 × 14, channels: 16
• decoder_layer2 dimensions: 28 × 28, channels: 1
• x_reconstruct = decoder_layer2

(`tf.nn.conv2d_transpose` takes almost the same arguments as `tf.nn.conv2d`; the one addition is an explicit `output_shape`.)
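To see why `output_shape` is needed at all: with `padding='SAME'`, a stride-2 convolution maps an input of size n to ceil(n / 2), so two different input sizes can shrink to the same output size, and the transpose cannot infer which one to restore. A quick shape-arithmetic sketch (plain Python, no TensorFlow required):

```python
import math

def conv_same_out(size, stride=2):
    # SAME padding: output size = ceil(input / stride)
    return math.ceil(size / stride)

# the encoder path of this post: 28 -> 14 -> 7
print(conv_same_out(28), conv_same_out(14))  # 14 7

# ambiguity: 13 and 14 both shrink to 7, so the decoder must be told
# explicitly (via output_shape) which size to produce
print(conv_same_out(13), conv_same_out(14))  # 7 7
```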

### Build convolution and deconvolution functions

```python
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 2, 2, 1], padding='SAME')

def deconv2d(x, W, output_shape):
    return tf.nn.conv2d_transpose(x, W, output_shape, strides=[1, 2, 2, 1], padding='SAME')
```

### Build compute graph

```python
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[None, 784])
x_origin = tf.reshape(x, [-1, 28, 28, 1])

W_e_conv1 = weight_variable([5, 5, 1, 16], "w_e_conv1")
b_e_conv1 = bias_variable([16], "b_e_conv1")
h_e_conv1 = tf.nn.relu(tf.add(conv2d(x_origin, W_e_conv1), b_e_conv1))

W_e_conv2 = weight_variable([5, 5, 16, 32], "w_e_conv2")
b_e_conv2 = bias_variable([32], "b_e_conv2")
h_e_conv2 = tf.nn.relu(tf.add(conv2d(h_e_conv1, W_e_conv2), b_e_conv2))

code_layer = h_e_conv2
print("code layer shape : %s" % h_e_conv2.get_shape())

# conv2d_transpose filters are [height, width, output_channels, input_channels],
# so this layer outputs 16 channels and its bias must have shape [16]
W_d_conv1 = weight_variable([5, 5, 16, 32], "w_d_conv1")
b_d_conv1 = bias_variable([16], "b_d_conv1")
# tf.pack was renamed tf.stack in TensorFlow >= 1.0
output_shape_d_conv1 = tf.pack([tf.shape(x)[0], 14, 14, 16])
h_d_conv1 = tf.nn.relu(tf.add(deconv2d(h_e_conv2, W_d_conv1, output_shape_d_conv1), b_d_conv1))

W_d_conv2 = weight_variable([5, 5, 1, 16], "w_d_conv2")
b_d_conv2 = bias_variable([1], "b_d_conv2")
output_shape_d_conv2 = tf.pack([tf.shape(x)[0], 28, 28, 1])
h_d_conv2 = tf.nn.relu(tf.add(deconv2d(h_d_conv1, W_d_conv2, output_shape_d_conv2), b_d_conv2))

x_reconstruct = h_d_conv2
print("reconstruct layer shape : %s" % x_reconstruct.get_shape())
```
```
code layer shape : (?, 7, 7, 32)
reconstruct layer shape : (?, ?, ?, ?)
```

The reconstruct shape prints as `(?, ?, ?, ?)` because `output_shape` is built from the dynamic `tf.shape(x)`, so the static shape is unknown at graph-construction time.

### Build cost function

```python
cost = tf.reduce_mean(tf.pow(x_reconstruct - x_origin, 2))

# the training loop below calls optimizer.run(); the original post does not
# show the optimizer definition, so Adam with this learning rate is an assumption
optimizer = tf.train.AdamOptimizer(0.01).minimize(cost)
```
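The cost is simply the pixel-wise mean squared error between the input image and its reconstruction; in NumPy terms:

```python
import numpy as np

x_origin = np.array([0.0, 1.0, 1.0, 0.0])       # a few "pixels" of the input
x_reconstruct = np.array([0.1, 0.9, 0.8, 0.0])  # the decoder's output for them
mse = np.mean((x_reconstruct - x_origin) ** 2)
print(round(mse, 3))  # 0.015
```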

### Training

```python
sess = tf.InteractiveSession()
batch_size = 60
init_op = tf.global_variables_initializer()
sess.run(init_op)

for epoch in range(5000):
    batch = mnist.train.next_batch(batch_size)
    if epoch < 1500:
        if epoch % 100 == 0:
            print("step %d, loss %g" % (epoch, cost.eval(feed_dict={x: batch[0]})))
    else:
        if epoch % 1000 == 0:
            print("step %d, loss %g" % (epoch, cost.eval(feed_dict={x: batch[0]})))
    optimizer.run(feed_dict={x: batch[0]})

print("final loss %g" % cost.eval(feed_dict={x: mnist.test.images}))
```
```
step 0, loss 0.103204
step 100, loss 0.0235892
step 200, loss 0.0167378
step 300, loss 0.0203425
step 400, loss 0.0175616
step 500, loss 0.0171841
step 600, loss 0.0155463
step 700, loss 0.0153204
step 800, loss 0.00345139
step 900, loss 0.00248451
step 1000, loss 0.00312759
step 1100, loss 0.00264636
step 1200, loss 0.00225495
step 1300, loss 0.00223552
step 1400, loss 0.00223038
step 2000, loss 0.0017688
step 3000, loss 0.00150994
step 4000, loss 0.00099031
final loss 0.000948159
```

## Max Unpooling

From the `tf.nn.max_pool_with_argmax` documentation: "The indices in argmax are flattened, so that a maximum value at position [b, y, x, c] becomes flattened index ((b * height + y) * width + x) * channels + c."
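This flattening formula is easy to check with NumPy on a tiny NHWC tensor (a sketch, independent of TensorFlow):

```python
import numpy as np

batch, height, width, channels = 1, 4, 4, 1
x = np.zeros((batch, height, width, channels))

b, y, xx, c = 0, 2, 3, 0
x[b, y, xx, c] = 9.0  # plant the maximum here

# the formula quoted above
flat_index = ((b * height + y) * width + xx) * channels + c
print(flat_index, x.ravel()[flat_index])  # 11 9.0
```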

1. Use the approach from the GitHub issue discussion: enlarge by 2× and write each value into a fixed position within its block (e.g. the top-left corner), leaving the other positions as zeros.

2. Borrow the image upsampling function `tf.image.resize_nearest_neighbor` to scale up proportionally, so no zeros are filled in.

• In the encoder, every convolutional layer is followed by a max pooling layer, so the convolution strides go back to 1 and only the max pooling reduces the dimensions.
• After some testing, using relu in the encoder and sigmoid in the decoder gives noticeably better reconstructions; using relu in both fails to train.

### Build helper functions

```python
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def deconv2d(x, W, output_shape):
    return tf.nn.conv2d_transpose(x, W, output_shape, strides=[1, 1, 1, 1], padding='SAME')

def max_unpool_2x2(x, output_shape):
    # tf.concat_v2 is simply tf.concat in TensorFlow >= 1.0
    out = tf.concat_v2([x, tf.zeros_like(x)], 3)
    out = tf.concat_v2([out, tf.zeros_like(out)], 2)
    out_size = output_shape
    return tf.reshape(out, out_size)

def max_pool_2x2(x):
    _, argmax = tf.nn.max_pool_with_argmax(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    pool = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    return pool, argmax
```
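To make concrete what `max_unpool_2x2` does, here is the same zeros-and-reshape trick in NumPy on a tiny 2×2 feature map (a sketch; the real helper works on TensorFlow tensors):

```python
import numpy as np

def max_unpool_2x2_np(x):
    # mirror of the TensorFlow helper: append zeros along the channel
    # axis, then along the width axis, then reshape to double h and w
    b, h, w, c = x.shape
    out = np.concatenate([x, np.zeros_like(x)], axis=3)      # (b, h, w, 2c)
    out = np.concatenate([out, np.zeros_like(out)], axis=2)  # (b, h, 2w, 2c)
    return out.reshape(b, 2 * h, 2 * w, c)

x = np.arange(1.0, 5.0).reshape(1, 2, 2, 1)
print(max_unpool_2x2_np(x)[0, :, :, 0])
# [[1. 0. 2. 0.]
#  [0. 0. 0. 0.]
#  [3. 0. 4. 0.]
#  [0. 0. 0. 0.]]
```

Every value lands in the top-left corner of its 2×2 block and the other three positions stay zero, which is exactly the sparsity that shows up in this method's reconstructions.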

### Build compute graph

```python
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[None, 784])
x_origin = tf.reshape(x, [-1, 28, 28, 1])

W_e_conv1 = weight_variable([5, 5, 1, 16], "w_e_conv1")
b_e_conv1 = bias_variable([16], "b_e_conv1")
h_e_conv1 = tf.nn.relu(tf.add(conv2d(x_origin, W_e_conv1), b_e_conv1))
h_e_pool1, argmax_e_pool1 = max_pool_2x2(h_e_conv1)

W_e_conv2 = weight_variable([5, 5, 16, 32], "w_e_conv2")
b_e_conv2 = bias_variable([32], "b_e_conv2")
h_e_conv2 = tf.nn.relu(tf.add(conv2d(h_e_pool1, W_e_conv2), b_e_conv2))
h_e_pool2, argmax_e_pool2 = max_pool_2x2(h_e_conv2)

code_layer = h_e_pool2
print("code layer shape : %s" % code_layer.get_shape())

# transpose filters are [height, width, output_channels, input_channels],
# so this layer outputs 16 channels and its bias has shape [16]
W_d_conv1 = weight_variable([5, 5, 16, 32], "w_d_conv1")
b_d_conv1 = bias_variable([16], "b_d_conv1")

# the convolutional layer does not change the output shape
output_shape_d_conv1 = tf.pack([tf.shape(x)[0], 7, 7, 16])
h_d_conv1 = tf.nn.sigmoid(tf.add(deconv2d(code_layer, W_d_conv1, output_shape_d_conv1), b_d_conv1))

# the max unpool layer doubles the output shape
output_shape_d_pool1 = tf.pack([tf.shape(x)[0], 14, 14, 16])
h_d_pool1 = max_unpool_2x2(h_d_conv1, output_shape_d_pool1)

W_d_conv2 = weight_variable([5, 5, 1, 16], "w_d_conv2")
b_d_conv2 = bias_variable([1], "b_d_conv2")

# the convolutional layer does not change the output shape
output_shape_d_conv2 = tf.pack([tf.shape(x)[0], 14, 14, 1])
h_d_conv2 = tf.nn.sigmoid(tf.add(deconv2d(h_d_pool1, W_d_conv2, output_shape_d_conv2), b_d_conv2))

# the max unpool layer doubles the output shape
output_shape_d_pool2 = tf.pack([tf.shape(x)[0], 28, 28, 1])
h_d_pool2 = max_unpool_2x2(h_d_conv2, output_shape_d_pool2)

x_reconstruct = h_d_pool2
print("reconstruct layer shape : %s" % x_reconstruct.get_shape())
```
```
code layer shape : (?, 7, 7, 32)
reconstruct layer shape : (?, 28, 28, 1)
```

### Training

With the zero-filling unpool, the loss plateaus at a much higher level:

```
step 0, loss 0.143627
step 100, loss 0.0930532
step 200, loss 0.0884409
step 300, loss 0.0928427
step 400, loss 0.0909152
step 500, loss 0.0826991
step 600, loss 0.0848275
step 700, loss 0.0810827
step 800, loss 0.0864824
step 900, loss 0.086182
step 1000, loss 0.0865941
step 1100, loss 0.0828469
step 1200, loss 0.091879
step 1300, loss 0.0909268
step 1400, loss 0.0917193
step 2000, loss 0.0869293
step 3000, loss 0.078727
step 4000, loss 0.0829563
final loss 0.0858374
```

### Build helper functions (nearest-neighbor version)

```python
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def deconv2d(x, W, output_shape):
    return tf.nn.conv2d_transpose(x, W, output_shape, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    _, argmax = tf.nn.max_pool_with_argmax(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    pool = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    return pool, argmax

def max_unpool_2x2(x, shape):
    # upsample 2x with nearest neighbor instead of zero filling;
    # shape is the static shape of x, i.e. [batch, h, w, c]
    inference = tf.image.resize_nearest_neighbor(x, tf.pack([shape[1] * 2, shape[2] * 2]))
    return inference
```
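For comparison, a NumPy sketch of what nearest-neighbor 2× upsampling does: each pixel is copied into its entire 2×2 block, so unlike the zero-filling method no empty positions appear.

```python
import numpy as np

def resize_nearest_2x_np(x):
    # NumPy equivalent of tf.image.resize_nearest_neighbor at exactly 2x:
    # repeat every row and every column once
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.arange(1.0, 5.0).reshape(1, 2, 2, 1)
print(resize_nearest_2x_np(x)[0, :, :, 0])
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```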

### Training

With nearest-neighbor unpooling, the loss comes down to a much lower level:

```
step 0, loss 0.117823
step 100, loss 0.022254
step 200, loss 0.0202093
step 300, loss 0.0189887
step 400, loss 0.0180246
step 500, loss 0.0189967
step 600, loss 0.0178063
step 700, loss 0.0176837
step 800, loss 0.0180569
step 900, loss 0.0173936
step 1000, loss 0.0176715
step 1100, loss 0.0180319
step 1200, loss 0.0170217
step 1300, loss 0.0181365
step 1400, loss 0.0164201
step 2000, loss 0.0178452
step 3000, loss 0.0171906
step 4000, loss 0.0164161
final loss 0.0172661
```

### Plot reconstructed images

Bingo! The reconstructed images look quite good, and they do not show the sparse, dotted artifacts of the previous method.

## Today's Takeaways

### Questions

• The mean squared error of the convolutional autoencoder seems much lower than that of the plain autoencoder. Why?

## Learning Resources

tensorflow 學習筆記30
