DAY 25
Big Data

## Today's Goals

- Understand RNNs
- Train an RNN on MNIST
- Observe how the RNN trains and what results it produces

A full, easier-to-read version is available as a GitHub IPython Notebook.

## Introduction

Recurrent Neural Network, RNN for short. Unlike the previously covered CNN (feature extraction) and Autoencoder (dimensionality reduction and reconstruction), an RNN deals with problems involving sequences over time. For example, the words in an article depend on what comes before and after them, so if you wanted to build an article generator, you would need an RNN.
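Before the TensorFlow code, the recurrence itself can be sketched in a few lines of NumPy (an illustrative sketch, not the code used below; the weight scale and batch size are arbitrary, while the shapes mirror the MNIST setup with 28 inputs and 128 hidden units):

```python
import numpy as np

# One vanilla RNN step: h_t = tanh(x_t W_x + h_{t-1} W_h + b).
# The same weights are reused at every time step -- that reuse is
# what lets the network carry information along a sequence.
rng = np.random.default_rng(0)
n_input, n_hidden, batch = 28, 128, 4

W_x = rng.normal(scale=0.1, size=(n_input, n_hidden))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """Compute the next hidden state from the input and previous state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

x_t = rng.normal(size=(batch, n_input))
h0 = np.zeros((batch, n_hidden))
h1 = rnn_step(x_t, h0)
print(h1.shape)  # (4, 128)
```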

## MNIST Test with RNN

### Configuration

```
n_input = 28    # MNIST data input (image shape: 28*28)
n_steps = 28    # time steps (one image row per step)
n_hidden = 128  # number of neurons in the fully connected layer
n_classes = 10  # MNIST classes (0-9 digits)

x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# weight_variable / bias_variable are the helper functions defined
# in the earlier posts of this series.
weights = {
    "w_fc": weight_variable([n_hidden, n_classes], "w_fc")
}
biases = {
    "b_fc": bias_variable([n_classes], "b_fc")
}
```

### Adjusting the input x for the RNN

- First, transpose x so its dimensions become `(n_steps, None, n_input)`
```
x_transpose = tf.transpose(x, [1, 0, 2])
print("x_transpose shape: %s" % x_transpose.get_shape())
```
```
x_transpose shape: (28, ?, 28)
```
- Then reshape it to `(n_steps * None, n_input)`
```
x_reshape = tf.reshape(x_transpose, [-1, n_input])
print("x_reshape shape: %s" % x_reshape.get_shape())
```
```
x_reshape shape: (?, 28)
```
- Finally, split it into a list of length `n_steps`, where the i-th element corresponds to the i-th step; each element has shape `None x n_input`
```
x_split = tf.split(0, n_steps, x_reshape)
print("type of x_split: %s" % type(x_split))
print("length of x_split: %d" % len(x_split))
print("shape of x_split[0]: %s" % x_split[0].get_shape())
```
```
type of x_split: <type 'list'>
length of x_split: 28
shape of x_split[0]: (?, 28)
```
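The same three transformations can be checked with plain NumPy, using a concrete batch size in place of TensorFlow's `?` (a sketch for shape-checking only):

```python
import numpy as np

batch, n_steps, n_input = 5, 28, 28
x = np.zeros((batch, n_steps, n_input))

# Step 1: (batch, n_steps, n_input) -> (n_steps, batch, n_input)
x_transpose = np.transpose(x, (1, 0, 2))
# Step 2: merge the first two axes -> (n_steps * batch, n_input)
x_reshape = x_transpose.reshape(-1, n_input)
# Step 3: split along axis 0 into n_steps arrays of (batch, n_input)
x_split = np.split(x_reshape, n_steps, axis=0)

print(x_transpose.shape)               # (28, 5, 28)
print(x_reshape.shape)                 # (140, 28)
print(len(x_split), x_split[0].shape)  # 28 (5, 28)
```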

```
# rnn and rnn_cell are modules from the pre-1.0 TensorFlow API.
basic_rnn_cell = rnn_cell.BasicRNNCell(n_hidden)
h, states = rnn.rnn(basic_rnn_cell, x_split, dtype=tf.float32)
print("type of outputs: %s" % type(h))
print("length of outputs: %d" % len(h))
print("shape of h[0]: %s" % h[0].get_shape())
print("type of states: %s" % type(states))
```
```
type of outputs: <type 'list'>
length of outputs: 28
shape of h[0]: (?, 128)
type of states: <class 'tensorflow.python.framework.ops.Tensor'>
```

### Fully connected layer

```
h_fc = tf.matmul(h[-1], weights['w_fc']) + biases['b_fc']
y_ = h_fc
```

### Cost function

```
# Pre-1.0 TensorFlow: positional argument order is (logits, labels).
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(h_fc, y))
```
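For reference, what softmax cross-entropy computes can be written out in NumPy (a simplified, numerically stable sketch of the math, not TensorFlow's implementation):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over the batch, for one-hot labels."""
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean((labels * log_probs).sum(axis=1))

logits = np.array([[2.0, 1.0, 0.1]])
labels = np.array([[1.0, 0.0, 0.0]])  # one-hot target: class 0
print(round(softmax_cross_entropy(logits, labels), 5))  # 0.41703
```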

### Accuracy function

```
correct_prediction = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
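The accuracy computation is just an argmax comparison averaged over the batch, which is easy to mirror in NumPy (illustrative values only):

```python
import numpy as np

# Accuracy = fraction of samples whose argmax prediction matches the label.
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
y_true = np.array([[0, 1], [0, 1], [0, 1]])  # one-hot labels
correct = np.argmax(y_pred, 1) == np.argmax(y_true, 1)
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 2 of 3 correct -> 0.6666667
```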

### Training

```
batch_size = 100
init_op = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init_op)

# Note: the definition of `optimizer` (used at the end of the loop) is
# missing from this snippet; any gradient-based optimizer works, e.g.
# optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
variables_names = [v.name for v in tf.trainable_variables()]

for step in range(5000):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    batch_x = np.reshape(batch_x, (batch_size, n_steps, n_input))
    cost_train, accuracy_train, states_train, rnn_out = sess.run(
        [cost, accuracy, states, h[-1]], feed_dict={x: batch_x, y: batch_y})
    values = sess.run(variables_names)
    rnn_out_mean = np.mean(rnn_out)
    for k, v in zip(variables_names, values):
        if k == 'RNN/BasicRNNCell/Linear/Matrix:0':
            w_rnn_mean = np.mean(v)

    if step < 1500:
        if step % 100 == 0:
            print("step %d, loss %.5f, accuracy %.3f, mean of rnn weight %.5f, mean of rnn out %.5f"
                  % (step, cost_train, accuracy_train, w_rnn_mean, rnn_out_mean))
    else:
        if step % 1000 == 0:
            print("step %d, loss %.5f, accuracy %.3f, mean of rnn weight %.5f, mean of rnn out %.5f"
                  % (step, cost_train, accuracy_train, w_rnn_mean, rnn_out_mean))
    optimizer.run(feed_dict={x: batch_x, y: batch_y})
```
```
step 0, loss 2.32416, accuracy 0.090, mean of rnn weight -0.00024, mean of rnn out -0.00735
step 100, loss 1.64264, accuracy 0.380, mean of rnn weight -0.00052, mean of rnn out -0.09297
step 200, loss 1.13360, accuracy 0.600, mean of rnn weight 0.00075, mean of rnn out 0.00324
step 300, loss 1.03078, accuracy 0.670, mean of rnn weight 0.00082, mean of rnn out -0.00883
step 400, loss 1.29169, accuracy 0.510, mean of rnn weight 0.00108, mean of rnn out 0.00112
step 500, loss 1.48408, accuracy 0.420, mean of rnn weight 0.00160, mean of rnn out -0.01736
step 600, loss 1.43396, accuracy 0.570, mean of rnn weight 0.00256, mean of rnn out -0.05415
step 700, loss 2.06715, accuracy 0.350, mean of rnn weight 0.00297, mean of rnn out -0.04546
step 800, loss 1.53593, accuracy 0.390, mean of rnn weight 0.00282, mean of rnn out 0.00934
step 900, loss 1.58583, accuracy 0.370, mean of rnn weight 0.00266, mean of rnn out 0.01959
step 1000, loss 1.36978, accuracy 0.470, mean of rnn weight 0.00299, mean of rnn out 0.04775
step 1100, loss 2.12206, accuracy 0.360, mean of rnn weight 0.00161, mean of rnn out -0.00393
step 1200, loss 1.50930, accuracy 0.470, mean of rnn weight 0.00138, mean of rnn out -0.01369
step 1300, loss 1.39899, accuracy 0.520, mean of rnn weight 0.00152, mean of rnn out 0.00569
step 1400, loss 1.44504, accuracy 0.430, mean of rnn weight 0.00158, mean of rnn out -0.00496
step 2000, loss 2.32795, accuracy 0.170, mean of rnn weight 0.00122, mean of rnn out 0.09313
step 3000, loss 2.43317, accuracy 0.100, mean of rnn weight 0.00119, mean of rnn out 0.07819
step 4000, loss 2.42197, accuracy 0.110, mean of rnn weight 0.00111, mean of rnn out 0.07806
```
```
cost_test, accuracy_test = sess.run([cost, accuracy],
                                    feed_dict={x: np.reshape(mnist.test.images, [-1, 28, 28]),
                                               y: mnist.test.labels})
print("final loss %.5f, accuracy %.5f" % (cost_test, accuracy_test))
```
```
final loss 2.41618, accuracy 0.10320
```

### Result

```
print(h[-1].eval(feed_dict={x: np.reshape(mnist.test.images, [-1, 28, 28]),
                            y: mnist.test.labels})[0, :])
```
```
[ 0.99999559  1.          1.         -1.         -0.99998862  1.          1.
 -1.          1.          1.         -0.99999774  1.         -0.99999982
  1.          1.          1.          1.          0.9999997   0.99999994
  0.9999994  -0.99999225 -1.         -1.          1.         -1.          1.
 -0.99999952  1.          0.99999928 -1.          0.99674076  1.         -1.
 -0.99999458 -0.99956894  1.          0.99983639  0.99999982  0.99956954
 -0.99999893  1.         -0.99999994  1.         -0.99997771  1.          1.
  1.         -1.00000012 -1.          1.         -0.99970055  0.99998623
 -0.99999619 -1.         -0.99960238  0.99785262 -1.          0.99962986
 -1.          1.         -1.         -1.          1.         -1.          1.
  0.99979544  1.          1.          1.          1.          1.          1.
  1.          1.          1.          1.          0.99939382 -1.         -1.
 -0.99976331 -0.99999881 -1.         -0.99999976 -1.         -0.99999964
  1.         -1.         -0.99999934  0.99999392  0.99910891 -0.99995011
 -1.         -1.         -1.         -0.99998069  0.99999958 -0.99999964
  1.         -1.          0.99999958  1.         -1.          1.
 -0.99998337  0.99999732  1.          1.          1.         -0.99997371
 -1.         -0.999376    0.99992633  0.9999997   1.         -1.
  0.99999499  1.         -1.         -1.          1.         -1.
 -0.99995339  0.99957949 -1.         -0.999933   -0.99999905 -0.99999183
  1.        ]
```
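Almost every entry of the hidden state above is ±1, i.e. the tanh units have saturated. A quick NumPy check shows why that stalls training: where tanh saturates, its derivative vanishes, so almost no gradient flows back through the recurrence.

```python
import numpy as np

# tanh'(z) = 1 - tanh(z)^2: as |z| grows, the activation pins to +-1
# and the gradient through it collapses toward zero.
for z in [0.5, 3.0, 10.0]:
    grad = 1.0 - np.tanh(z) ** 2
    print("z=%5.1f  tanh=%.8f  gradient=%.8f" % (z, np.tanh(z), grad))
```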

### LSTM

```
lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
h, states = rnn.rnn(lstm_cell, x_split, dtype=tf.float32)
```
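As a rough sketch of what an LSTM cell computes per step (gate layout and initialization simplified; this is not TensorFlow's exact `BasicLSTMCell` implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM step: input, candidate, forget, and output gates
    computed from the concatenation [x_t, h_prev]."""
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, g, f, o = np.split(z, 4, axis=1)
    # The cell state c is updated additively, which is what lets
    # gradients survive over many time steps.
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(g)
    h = np.tanh(c) * sigmoid(o)
    return h, c

rng = np.random.default_rng(0)
n_input, n_hidden, batch = 28, 128, 4
W = rng.normal(scale=0.1, size=(n_input + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
x_t = rng.normal(size=(batch, n_input))
h0 = c0 = np.zeros((batch, n_hidden))
h1, c1 = lstm_step(x_t, h0, c0, W, b)
print(h1.shape, c1.shape)  # (4, 128) (4, 128)
```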
```
step 0, loss 2.30415, accuracy 0.110, mean of lstm weight 0.00013, mean of lstm out 0.01331
step 100, loss 0.31279, accuracy 0.880, mean of lstm weight -0.00529, mean of lstm out -0.00088
step 200, loss 0.17318, accuracy 0.940, mean of lstm weight -0.00648, mean of lstm out 0.00784
step 300, loss 0.15617, accuracy 0.950, mean of lstm weight -0.00778, mean of lstm out -0.00153
step 400, loss 0.08717, accuracy 0.980, mean of lstm weight -0.00872, mean of lstm out 0.00838
step 500, loss 0.13275, accuracy 0.960, mean of lstm weight -0.00991, mean of lstm out 0.00275
step 600, loss 0.11011, accuracy 0.970, mean of lstm weight -0.01076, mean of lstm out 0.00076
step 700, loss 0.12507, accuracy 0.960, mean of lstm weight -0.01037, mean of lstm out 0.00274
step 800, loss 0.09086, accuracy 0.970, mean of lstm weight -0.01050, mean of lstm out 0.00409
step 900, loss 0.05551, accuracy 0.990, mean of lstm weight -0.01066, mean of lstm out -0.00078
step 1000, loss 0.03132, accuracy 0.990, mean of lstm weight -0.01064, mean of lstm out -0.00035
step 1100, loss 0.06873, accuracy 0.980, mean of lstm weight -0.01098, mean of lstm out -0.00248
step 1200, loss 0.08930, accuracy 0.980, mean of lstm weight -0.01073, mean of lstm out -0.00918
step 1300, loss 0.10252, accuracy 0.980, mean of lstm weight -0.01027, mean of lstm out -0.00038
step 1400, loss 0.00594, accuracy 1.000, mean of lstm weight -0.01041, mean of lstm out 0.00762
step 2000, loss 0.04595, accuracy 0.990, mean of lstm weight -0.01243, mean of lstm out 0.00263
step 3000, loss 0.12044, accuracy 0.960, mean of lstm weight -0.01405, mean of lstm out -0.00691
step 4000, loss 0.03068, accuracy 0.990, mean of lstm weight -0.01488, mean of lstm out -0.00371

final loss 0.06450, accuracy 0.98390
```

Bingo! The accuracy climbs very high; it seems the LSTM really does fix the plain RNN's shortcomings.

## Summary

### Open questions

- Check whether TensorFlow has a function for displaying the gradients at each stage
- Study how backpropagation works for RNNs

## Learning Resources

TensorFlow study notes 30