【Day14】 PyTorch to TensorFlow

13th Ironman Contest

Rorschach

2021-09-14 15:18:17


Part1 - Function

Let's start with a few commonly used operations!

Type conversion

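import numpy as np
import tensorflow as tf
import torch

# random np array - shape = 1,2,2
test = np.random.rand(1,2,2)
# convert to a Tensor that torch can operate on
nn_v = torch.from_numpy(test)
# convert to a Tensor that tf can operate on
tf_v = tf.convert_to_tensor(test)
# convert the torch tensor back to np
nn_v.numpy()
# convert the tf tensor back to np
tf_v.numpy()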
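
One detail worth knowing when converting: torch.from_numpy does not copy the data, and it keeps NumPy's float64 dtype instead of PyTorch's usual float32. The following is a minimal sketch of that behavior; the variable names `dtype_kept`, `shared`, and `nn_v32` are only for illustration.

```python
import numpy as np
import torch

test = np.random.rand(1, 2, 2)      # NumPy creates float64 by default
nn_v = torch.from_numpy(test)       # shares memory with `test`, no copy is made

# The dtype is carried over: float64, not PyTorch's default float32
dtype_kept = nn_v.dtype == torch.float64

# Writing into the NumPy array is visible through the tensor
test[0, 0, 0] = 5.0
shared = nn_v[0, 0, 0].item() == 5.0

# Cast explicitly if the model expects float32 inputs
nn_v32 = nn_v.float()
```

tf.convert_to_tensor also keeps float64 for a float64 array, so an explicit cast is usually needed on both sides before feeding a float32 model.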

In PyTorch you will often see calls like XXX.to(device), where device = "cpu" or "cuda:0".

model.to(device)

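This means "copy the data from where it was originally loaded to the specified device"; computation then runs on whichever device holds the data. Models are usually placed on the GPU for training, but if at inference time you want to go back to NumPy-style data, that only works on the CPU: a torch tensor that is still on the GPU cannot be converted directly with .numpy() as shown above.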

In that case, just detach() the tensor from the computation graph and then move it back to the CPU:

nn_v.detach().cpu().numpy()
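
To see why both steps are there, here is a small sketch. It assumes a tensor that is part of the autograd graph (and on the GPU if one is available); `w`, `out`, and `direct_ok` are illustrative names, not from AutoVC.

```python
import numpy as np
import torch

# Use the GPU when present, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

w = torch.ones(2, 2, device=device, requires_grad=True)
out = w * 3.0   # `out` is tracked by autograd

# Calling .numpy() directly fails: the tensor requires grad
# (and, on a GPU, it also lives in device memory)
try:
    out.numpy()
    direct_ok = True
except (RuntimeError, TypeError):
    direct_ok = False

# detach() drops the autograd history, cpu() moves the data to host memory
arr = out.detach().cpu().numpy()
```

On a CPU-only machine the .cpu() call is a no-op, so the same line works in both settings.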

Expanding dimensions: you can see that the tf call looks exactly like the np one

# shape(1,2,2) -> (1,1,2,2)
# in torch, axis is always written as dim
nn_v.unsqueeze(dim=0)
tf.expand_dims(tf_v,axis=0)
np.expand_dims(test,axis=0)

Transpose works the same way

# swaps dimensions 0 and 1
nn_v.transpose(0,1)
# the full permutation has to be written out explicitly
tf.transpose(tf_v,(1,0,2))
np.transpose(test,(1,0,2))
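
If you prefer the torch call to read like the tf/np one, torch also has permute, which takes the full permutation. A minimal sketch, using a (2,3,4) array instead of (1,2,2) so the swapped axes are easy to tell apart:

```python
import numpy as np
import torch

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
t = torch.from_numpy(x)

swapped = t.transpose(0, 1)        # swap two axes: (2,3,4) -> (3,2,4)
permuted = t.permute(1, 0, 2)      # same result, written as a full permutation
ref = np.transpose(x, (1, 0, 2))   # NumPy reference
```

Both torch calls return views of the same data, so neither copies the tensor.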

Expand (broadcast_to)

# shape(1,2,2) -> (3,2,2)
# torch allows -1 to mean "keep the original size"
nn_v.expand(3,-1,-1)
# tf needs the full shape spelled out, and so does np
tf.broadcast_to(tf_v,[3,2,2])
np.broadcast_to(test,[3,2,2])
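
A quick check that the torch and np versions agree, as a sketch. It also shows that expand only creates a view over the original storage; `same_storage` is an illustrative name.

```python
import numpy as np
import torch

test = np.random.rand(1, 2, 2)
nn_v = torch.from_numpy(test)

expanded = nn_v.expand(3, -1, -1)         # (1,2,2) -> (3,2,2), size-1 axis repeated
ref = np.broadcast_to(test, (3, 2, 2))    # read-only NumPy view with the same values

# expand does not allocate new memory: it points at the same storage
same_storage = expanded.data_ptr() == nn_v.data_ptr()
```

Because the result is a view, call .clone() on the torch side (or .copy() on the np side) before writing into it.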

For Concatenate, all three are used in the same way

torch.cat([nn_v,nn_target],dim=0)
tf.concat([tf_v,tf_target],axis=0)
np.concatenate([test,target],axis=0)
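
The snippet above assumes `target`, `nn_target`, and `tf_target` already exist. A self-contained sketch for the torch and np pair, with a made-up `target` of shape (3,2,2):

```python
import numpy as np
import torch

test = np.random.rand(1, 2, 2)
target = np.random.rand(3, 2, 2)     # hypothetical second array for illustration
nn_v = torch.from_numpy(test)
nn_target = torch.from_numpy(target)

cat_t = torch.cat([nn_v, nn_target], dim=0)       # (1,2,2) + (3,2,2) -> (4,2,2)
cat_n = np.concatenate([test, target], axis=0)
```

All axes other than the one being concatenated must have matching sizes.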

Part2 - Layers

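Next, let's show that the layers used in AutoVC can produce almost identical results in TF.
AutoVC mainly uses only 4 kinds of layers:
1. LinearNorm -> nn.Linear with custom weight initialization
2. ConvNorm -> nn.Conv1d with custom weight initialization
3. LSTM -> nn.LSTM
4. BatchNormalize -> nn.BatchNorm1d
Verification method -> initialize the weights to 1 and the biases to 0 on both sides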

LinearNorm = Dense

# In PyTorch
class LinearNorm(torch.nn.Module):
    def __init__(self, in_dim, out_dim, bias=True, w_init_gain='linear'):
        super(LinearNorm, self).__init__()
        self.linear_layer = torch.nn.Linear(in_dim, out_dim, bias=bias)
        # For verification: weights = 1, bias = 0 (w_init_gain is unused here)
        torch.nn.init.constant_(self.linear_layer.weight.data, 1)
        if bias:  # the bias tensor is None when bias=False
            torch.nn.init.constant_(self.linear_layer.bias.data, 0)
    def forward(self, x):
        return self.linear_layer(x)

Result
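
For reference, a TF counterpart could look like the sketch below. It is an assumption about how the comparison can be set up, not the exact code behind the result: a Keras Dense layer with ones/zeros initializers, checked against a NumPy value that a weights-1, bias-0 linear layer must produce (every output unit is the sum of the inputs).

```python
import numpy as np
import tensorflow as tf

in_dim, out_dim = 4, 3

# Dense with weights = 1 and bias = 0, mirroring the PyTorch setup
dense = tf.keras.layers.Dense(out_dim,
                              kernel_initializer="ones",
                              bias_initializer="zeros")

x = np.random.rand(2, in_dim).astype(np.float32)
y = dense(tf.constant(x)).numpy()

# With all-ones weights, each output unit equals the sum over the input features
expected = np.repeat(x.sum(axis=1, keepdims=True), out_dim, axis=1)
```

Feeding the same `x` through LinearNorm(in_dim, out_dim) should give the same `expected` values, which is what makes this initialization convenient for checking.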

ConvNorm = Conv1d plus a transpose. For 3-D data (e.g. shape(1,2,2)), PyTorch treats axis = 1 as the channel axis, while TF treats axis = 2 as the channel axis.

class ConvNorm(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=5, stride=1,
                 padding=2, dilation=1, bias=True, w_init_gain='linear'):
        super(ConvNorm, self).__init__()
        if padding is None:
            assert(kernel_size % 2 == 1)
            padding = int(dilation * (kernel_size - 1) / 2)

        self.conv = torch.nn.Conv1d(in_channels, out_channels,
                                    kernel_size=kernel_size, stride=stride,
                                    padding=padding, dilation=dilation,
                                    bias=bias)

        torch.nn.init.constant_(self.conv.weight.data, 1)
        if bias:  # self.conv.bias is None when bias=False
            torch.nn.init.constant_(self.conv.bias.data, 0)

    def forward(self, signal):
        conv_signal = self.conv(signal)
        return conv_signal

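Result: the outputs match as long as you make sure the padding is equivalent to padding=same. If you want to understand how the two frameworks pad, see this article; all the padding we use this time is equivalent to same.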
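
A sketch of the TF side under the same weights-1, bias-0 setup, assuming an input kept in PyTorch's (batch, channels, length) layout. The transposes before and after the Keras Conv1D are what "Conv1d plus a transpose" refers to; the NumPy reference is my own check, not part of AutoVC.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 2, 8).astype(np.float32)    # PyTorch layout (N, C, L)

conv = tf.keras.layers.Conv1D(filters=3, kernel_size=5, padding="same",
                              kernel_initializer="ones",
                              bias_initializer="zeros")

x_tf = tf.transpose(tf.constant(x), (0, 2, 1))     # to TF layout (N, L, C)
y = tf.transpose(conv(x_tf), (0, 2, 1)).numpy()    # back to (N, C_out, L)

# Reference: zero-pad 2 steps on each side, then sum each 5-wide window
# over all input channels (that is what an all-ones kernel computes)
padded = np.pad(x, ((0, 0), (0, 0), (2, 2)))
window_sum = np.stack([padded[0, :, i:i + 5].sum() for i in range(8)])
expected = np.broadcast_to(window_sum, (1, 3, 8))
```

With kernel_size=5 and stride 1, padding="same" pads 2 on each side, which matches padding=2 in the PyTorch ConvNorm above.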

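For LSTM, the problem is that the two frameworks initialize Bidirectional differently. I searched for a long time and could not find a way to set the initialization (perhaps it cannot be changed), but they do the same thing. To make checking easier, I ended up summing their values.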

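The last one, BatchNormalization, took me quite a while to study. I later found that the two should be the same thing during "training", but first let's see what happens when we simply call them directly.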

The way torch.nn.BatchNorm1d is initialized is the same as tf's default initialization

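I also tried switching the BatchNorm layer to eval() mode, but the results were still different, and after a lot of searching I still had no answer QAQ.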
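
One possible explanation, offered as a guess rather than a confirmed diagnosis: a freshly created torch.nn.BatchNorm1d is in training mode and normalizes with the batch statistics, while a Keras BatchNormalization layer called directly uses its moving statistics unless training=True is passed. The default epsilon also differs (1e-5 in PyTorch, 1e-3 in Keras), which would leave a small gap even in eval() mode. The sketch below shows the two PyTorch modes against NumPy; `keras_like_eval` is only arithmetic with the Keras default epsilon, not an actual Keras call.

```python
import numpy as np
import torch

torch.manual_seed(0)
x = torch.rand(4, 3, 5)            # (batch, channels, length)
xn = x.numpy()

# Training mode (the default): normalize with per-channel batch statistics
bn_train = torch.nn.BatchNorm1d(3)
y_train = bn_train(x).detach().numpy()
mean = xn.mean(axis=(0, 2), keepdims=True)
var = xn.var(axis=(0, 2), keepdims=True)          # biased variance
manual_train = (xn - mean) / np.sqrt(var + 1e-5)

# Eval mode on a fresh layer: running_mean = 0, running_var = 1
bn_eval = torch.nn.BatchNorm1d(3).eval()
y_eval = bn_eval(x).detach().numpy()
manual_eval = xn / np.sqrt(1.0 + 1e-5)

# Same formula with Keras's default epsilon, for comparison only
keras_like_eval = xn / np.sqrt(1.0 + 1e-3)
gap = np.abs(y_eval - keras_like_eval).max()
```

If this is indeed the cause, setting epsilon=1e-5 and passing training=True on the Keras side would be the first things to try.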

In the end I simply wrote my own BatchNormalization so that TF gives almost the same results as PyTorch

class tfBatchNormalize1d(tf.keras.layers.Layer):
    def __init__(self, num_features,size, epsilon,name):
        super(tfBatchNormalize1d, self).__init__()
        one_init = tf.keras.initializers.Ones()
        zero_init = tf.keras.initializers.Zeros()
        v_o = one_init(shape=(size,))
        v_z = zero_init(shape=(size,))

        self.gamma = tf.Variable(initial_value=v_o, name =f"gamma_{name}")
        self.beta = tf.Variable(initial_value=v_z, name =f"beta_{name}")
        self.num_features = num_features
        self.epsilon = epsilon   

    def call(self, x):
        # x has shape (batch, channels, length); statistics are per channel
        mean, variance = tf.nn.moments(x, [0, 2])

        ## I am not quite sure how they handle the out-of-range case,
        ## so only the first num_features channels are used here
        mean = mean[None, :self.num_features, None]
        variance = variance[None, :self.num_features, None]
        x = (x - mean) / tf.sqrt(variance + self.epsilon)
        # gamma and beta have shape (size,), so they broadcast along the last axis
        return x*self.gamma + self.beta

With this, the results are close to PyTorch's: at least the first 4 decimal places match

Part3 - Loss

AutoVC uses the following two loss functions

import torch.nn.functional as F
F.mse_loss(y_true, y_predict)
# this is just MAE
F.l1_loss(y_true, y_predict)

In TF this is equivalent to

def mse_loss(y_true, y_pred):
    err = tf.keras.losses.mean_squared_error(y_true, y_pred)
    return tf.reduce_mean(err)

def l1_loss(y_true, y_pred):
    err = tf.keras.losses.mean_absolute_error(y_true, y_pred)
    return tf.reduce_mean(err)
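
The extra tf.reduce_mean is there because the PyTorch losses, with their default reduction, average over every element and return a single scalar. A small sketch of that PyTorch behavior against NumPy, with made-up values:

```python
import numpy as np
import torch
import torch.nn.functional as F

y_true = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y_pred = torch.tensor([[1.5, 2.0], [2.0, 6.0]])

# Default reduction='mean': one scalar averaged over all elements
mse = F.mse_loss(y_pred, y_true).item()
mae = F.l1_loss(y_pred, y_true).item()

# NumPy reference that the TF versions should also reproduce
diff = (y_pred - y_true).numpy()
mse_ref = float(np.mean(diff ** 2))       # (0.25 + 0 + 1 + 4) / 4
mae_ref = float(np.mean(np.abs(diff)))    # (0.5 + 0 + 1 + 2) / 4
```

The Keras functions only average over the last axis, so without the outer tf.reduce_mean they would return one value per row instead of a scalar.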

Part4 - Gradient

In PyTorch, the gradient update looks like this

g_optimizer = torch.optim.Adam(model.parameters(), 0.0001)
for i in range(num_iters):
    ...
    ...
    ...

    g_loss = l1_loss + mse_loss
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
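
The zero_grad() call matters because backward() adds to the existing .grad values instead of overwriting them. A minimal sketch with a single tensor `w` (an illustrative stand-in for the model parameters):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(w^2))/dw = 2w
(w * w).sum().backward()
first = w.grad.clone()            # [2, 4]

# Second backward pass without zeroing: gradients are added on top
(w * w).sum().backward()
accumulated = w.grad.clone()      # [4, 8]

# After zeroing, the next backward pass starts from scratch
w.grad.zero_()
(w * w).sum().backward()
after_zero = w.grad.clone()       # [2, 4]
```

optimizer.zero_grad() does the same reset for every parameter the optimizer manages.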

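In TF there is no zero_grad to call: the gradients are computed fresh from the tape on every step, so it is as if zero_grad were always applied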

g_optimizer = tf.keras.optimizers.Adam(0.0001)
for i in range(num_iters):
    ...
    ...
    with tf.GradientTape() as autovc_tape:
        ...
        ...
        g_loss = l1_loss + mse_loss
    gradients_of_autovc = autovc_tape.gradient(g_loss,model.trainable_variables)
    g_optimizer.apply_gradients(zip(gradients_of_autovc,model.trainable_variables))
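
A runnable miniature of the same loop, as a sketch: a one-layer toy model and SGD stand in for AutoVC and Adam so the numbers can be checked by hand. The model, input, and loss here are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy model: a single weight vector initialized to ones, no bias
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, use_bias=False, kernel_initializer="ones")
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
x = tf.constant([[1.0, 2.0]])

grads_seen = []
for _ in range(2):
    with tf.GradientTape() as tape:
        pred = model(x)                       # forward pass is recorded on the tape
        loss = tf.reduce_sum(tf.square(pred))
    variables = model.trainable_variables
    grads = tape.gradient(loss, variables)    # fresh gradients for this step only
    grads_seen.append(grads[0].numpy().ravel())
    optimizer.apply_gradients(zip(grads, variables))
```

The second step's gradient is computed only from the updated weights; nothing from the first step is carried over, which is why no zero_grad equivalent appears in the loop.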