11th iThome Ironman Contest

DAY 13

AI & Data

A Song of Ice and Fire in Deep Learning: TensorFlow vs PyTorch, series part 13

Day 13: Extending PyTorch with a Python C extension


renewang

2019-09-28 23:45:13


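Note to readers: my computer has suffered an unfortunate incident. I will keep posting while it is being repaired, but for now the code in these articles is not verified by hand; thank you for your understanding. You can keep following this article to see its current update status.

Extending PyTorch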

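In the previous few articles we introduced TorchScript, a module that helps compile Python code into a faster form that can also run in a C++ runtime. We also saw that PyTorch provides a C++ frontend API that resembles its Python API. Today's article takes the traditional Python route instead: using the Python C API together with setuptools/distutils to compile C/C++ source into a dynamic library that the Python interpreter loads at run time. The resulting C extension functions are then called from Python just like pure Python functions, only they run faster.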
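To make the last point concrete, CPython's own standard library already ships functions implemented in C, and they are called exactly like Python functions. The sketch below uses only the standard library; `math.sqrt` merely stands in for a function from a custom extension, and `py_sqrt` is a hypothetical pure Python counterpart introduced for comparison.

```python
import importlib.machinery
import math
import types

# File suffixes that this interpreter loads as compiled extension modules
# (for example ".so" on Linux or ".pyd" on Windows).
print(importlib.machinery.EXTENSION_SUFFIXES)

# math.sqrt is implemented in C, yet it is called like any Python function.
print(math.sqrt(16.0))


def py_sqrt(x):
    """Pure Python counterpart, defined with `def`."""
    return x ** 0.5


# The only visible difference from the caller's side is the object type.
is_c_function = isinstance(math.sqrt, types.BuiltinFunctionType)
is_py_function = isinstance(py_sqrt, types.FunctionType)
```

From the caller's point of view the two functions are interchangeable, which is the property the extension built in this article relies on.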

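Most of today's article follows the official tutorial CUSTOM C++ AND CUDA EXTENSIONS. Since I have no CUDA environment, the CUDA part of the source code is not implemented here.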

LLTM, or Long-Long-Term-Memory unit

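Suppose we want to implement a new LSTM-like unit that has no forget gate and uses the Exponential Linear Unit (ELU) as the activation function for its internal state. The official page gives a Python implementation. The Python version is straightforward to read but not very efficient to run, and the tutorial names two reasons for the slowdown.
First, the Python code invokes each operation one at a time; when many operations have to run on the GPU through CUDA, the repeated CUDA kernel launches add noticeable latency. Second, a C++ implementation can group related operations together, in the same way that CUDA kernel invocations are optimized: fusing operations of a similar nature speeds up execution.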
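The cell arithmetic described above can be sketched without any dependencies. This is not the tutorial's PyTorch code: it is a simplified single-sample version on plain Python lists, and the helper names (`sigmoid`, `elu`, `lltm_forward`) and the sample values are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def elu(z, alpha=1.0):
    # ELU: identity for positive inputs, alpha * (exp(z) - 1) otherwise
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def lltm_forward(x, weights, bias, old_h, old_cell):
    """One LLTM step for a single sample (hypothetical helper).

    x: input features, length I
    weights: 3*S rows, each of length S + I
    bias: length 3*S
    old_h, old_cell: previous hidden and cell state, length S
    """
    state_size = len(old_cell)
    X = old_h + x  # concatenate [old_h, input]
    # One affine map yields the pre-activations of all three gates
    gate_weights = [b + sum(w * v for w, v in zip(row, X))
                    for row, b in zip(weights, bias)]
    input_gate = [sigmoid(z) for z in gate_weights[:state_size]]
    output_gate = [sigmoid(z) for z in gate_weights[state_size:2 * state_size]]
    candidate_cell = [elu(z) for z in gate_weights[2 * state_size:]]
    # No forget gate: the old cell state is carried over unchanged
    new_cell = [c + cand * i
                for c, cand, i in zip(old_cell, candidate_cell, input_gate)]
    new_h = [math.tanh(c) * o for c, o in zip(new_cell, output_gate)]
    return new_h, new_cell


# With all-zero weights and bias, both gates are sigmoid(0) = 0.5 and the
# candidate is elu(0) = 0, so the cell state passes through untouched.
new_h, new_cell = lltm_forward(
    x=[0.3, -0.2],
    weights=[[0.0] * 3 for _ in range(3)],
    bias=[0.0] * 3,
    old_h=[0.1],
    old_cell=[1.0])
```

Each list comprehension here corresponds to one separately dispatched tensor operation in the Python/PyTorch version, which is the per-operation overhead the tutorial points to.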

When writing the C++ functions we can use PyTorch's backend C++ API, called ATen, or "A TENsor library for C++11". This library provides tensor operations. The C++ source for the forward pass of our LLTM cell is as follows:

#include <torch/extension.h>

#include <vector>

std::vector<torch::Tensor> lltm_forward(
    torch::Tensor input,
    torch::Tensor weights,
    torch::Tensor bias,
    torch::Tensor old_h,
    torch::Tensor old_cell) {
  auto X = torch::cat({old_h, input}, /*dim=*/1);

  auto gate_weights = torch::addmm(bias, X, weights.transpose(0, 1));
  auto gates = gate_weights.chunk(3, /*dim=*/1);

  auto input_gate = torch::sigmoid(gates[0]);
  auto output_gate = torch::sigmoid(gates[1]);
  auto candidate_cell = torch::elu(gates[2], /*alpha=*/1.0);

  auto new_cell = old_cell + candidate_cell * input_gate;
  auto new_h = torch::tanh(new_cell) * output_gate;

  return {new_h,
          new_cell,
          input_gate,
          output_gate,
          candidate_cell,
          X,
          gate_weights};
}

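Unfortunately, unlike the Python API, which generates a backward function automatically, an LLTM written against the backend API must supply its own backward pass. The source code is as follows: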

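// s'(z) = (1 - s(z)) * s(z)
torch::Tensor d_sigmoid(torch::Tensor z) {
  auto s = torch::sigmoid(z);
  return (1 - s) * s;
}

// tanh'(z) = 1 - tanh^2(z)
torch::Tensor d_tanh(torch::Tensor z) {
  return 1 - z.tanh().pow(2);
}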

// elu'(z) = relu'(z) + { alpha * exp(z) if (alpha * (exp(z) - 1)) < 0, else 0}
torch::Tensor d_elu(torch::Tensor z, torch::Scalar alpha = 1.0) {
  auto e = z.exp();
  auto mask = (alpha * (e - 1)) < 0;
  return (z > 0).type_as(z) + mask.type_as(z) * (alpha * e);
}

std::vector<torch::Tensor> lltm_backward(
    torch::Tensor grad_h,
    torch::Tensor grad_cell,
    torch::Tensor new_cell,
    torch::Tensor input_gate,
    torch::Tensor output_gate,
    torch::Tensor candidate_cell,
    torch::Tensor X,
    torch::Tensor gate_weights,
    torch::Tensor weights) {
  auto d_output_gate = torch::tanh(new_cell) * grad_h;
  auto d_tanh_new_cell = output_gate * grad_h;
  auto d_new_cell = d_tanh(new_cell) * d_tanh_new_cell + grad_cell;

  auto d_old_cell = d_new_cell;
  auto d_candidate_cell = input_gate * d_new_cell;
  auto d_input_gate = candidate_cell * d_new_cell;

  auto gates = gate_weights.chunk(3, /*dim=*/1);
  d_input_gate *= d_sigmoid(gates[0]);
  d_output_gate *= d_sigmoid(gates[1]);
  d_candidate_cell *= d_elu(gates[2]);

  auto d_gates =
      torch::cat({d_input_gate, d_output_gate, d_candidate_cell}, /*dim=*/1);

  auto d_weights = d_gates.t().mm(X);
  auto d_bias = d_gates.sum(/*dim=*/0, /*keepdim=*/true);

  auto d_X = d_gates.mm(weights);
  const auto state_size = grad_h.size(1);
  auto d_old_h = d_X.slice(/*dim=*/1, 0, state_size);
  auto d_input = d_X.slice(/*dim=*/1, state_size);

  return {d_old_h, d_input, d_weights, d_bias, d_old_cell};
}
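Because the backward pass is written by hand, it is worth checking it against finite differences. The sketch below is a scalar, dependency-free rendition of the same chain-rule steps, stopping at the gate pre-activations (the matrix products for the weights, bias, and inputs are left out). All function names and sample values are assumptions for illustration, and `d_elu` assumes `alpha > 0`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def elu(z, alpha=1.0):
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def d_sigmoid(z):
    s = sigmoid(z)
    return (1.0 - s) * s


def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2


def d_elu(z, alpha=1.0):
    # Assumes alpha > 0: slope 1 for z > 0, alpha * exp(z) otherwise
    return 1.0 if z > 0 else alpha * math.exp(z)


def forward(g_in, g_out, g_cand, old_cell):
    """Scalar LLTM step starting from the three gate pre-activations."""
    new_cell = old_cell + elu(g_cand) * sigmoid(g_in)
    new_h = math.tanh(new_cell) * sigmoid(g_out)
    return new_h, new_cell


def backward(grad_h, grad_cell, g_in, g_out, g_cand, old_cell):
    """Mirrors the chain-rule steps of lltm_backward for scalars."""
    _, new_cell = forward(g_in, g_out, g_cand, old_cell)
    d_output_gate = math.tanh(new_cell) * grad_h
    d_tanh_new_cell = sigmoid(g_out) * grad_h
    d_new_cell = d_tanh(new_cell) * d_tanh_new_cell + grad_cell
    d_old_cell = d_new_cell
    d_candidate_cell = sigmoid(g_in) * d_new_cell
    d_input_gate = elu(g_cand) * d_new_cell
    return (d_input_gate * d_sigmoid(g_in),
            d_output_gate * d_sigmoid(g_out),
            d_candidate_cell * d_elu(g_cand),
            d_old_cell)


def numeric_grads(grad_h, grad_cell, args, eps=1e-6):
    """Central differences of loss = grad_h * new_h + grad_cell * new_cell."""
    def loss(a):
        new_h, new_cell = forward(*a)
        return grad_h * new_h + grad_cell * new_cell

    grads = []
    for k in range(len(args)):
        upper = list(args)
        lower = list(args)
        upper[k] += eps
        lower[k] -= eps
        grads.append((loss(upper) - loss(lower)) / (2 * eps))
    return tuple(grads)


args = (0.4, -0.7, -1.2, 0.5)  # g_in, g_out, g_cand, old_cell (arbitrary)
analytic = backward(0.8, -0.3, *args)
numeric = numeric_grads(0.8, -0.3, args)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
```

If a derivative helper or a chain-rule step were wrong, `max_error` would be far larger than the finite-difference noise; the same kind of check can be applied to the real extension with tensors once it is built.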

With the forward and backward functions in place, the next step is to "wrap" these two C++ functions in a torch.autograd.Function object so that they can also be called from Python.

Python binding

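This section covers how to "wrap" our pure C++ functions so that they can be called from Python. There are many ways to do this, including hand-writing code against the Python C API; for that approach, see the official Python website.
Hand-writing, however, forces the programmer to keep producing near-identical code, which is why the third-party library boost.python exists: it generates the wrapper code that the Python interpreter can load. Because boost.python requires installing many headers and libraries, a more lightweight alternative called pybind11 appeared, and it is the library PyTorch officially uses to generate glue code.
With pybind11, the functions of the extension module are declared as follows: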

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &lltm_forward, "LLTM forward");
  m.def("backward", &lltm_backward, "LLTM backward");
}

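Next, put the following source in setup.py and build the module with setuptools (for example by running python setup.py install), and the job is done:

from setuptools import setup, Extension
from torch.utils import cpp_extension

setup(name='lltm_cpp',
      ext_modules=[Extension(
          name='lltm_cpp',
          sources=['lltm.cpp'],
          include_dirs=cpp_extension.include_paths(),
          language='c++')],
      cmdclass={'build_ext': cpp_extension.BuildExtension})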
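Once the module is built, the two bound functions can be wrapped in a torch.autograd.Function as mentioned earlier. The sketch below follows the pattern in the official tutorial but, like the rest of this article, has not been run against a real build. The `HAVE_LLTM_CPP` guard is an addition of mine so that the file still imports on a machine where PyTorch or the compiled `lltm_cpp` module is missing.

```python
import importlib.util

# Guard (an assumption for this sketch): only define the wrapper when both
# PyTorch and the compiled extension can be found.
HAVE_LLTM_CPP = all(
    importlib.util.find_spec(name) is not None
    for name in ("torch", "lltm_cpp"))

if HAVE_LLTM_CPP:
    import torch  # must be imported before the extension that links to it
    import lltm_cpp  # the module built by setup.py

    class LLTMFunction(torch.autograd.Function):
        @staticmethod
        def forward(ctx, input, weights, bias, old_h, old_cell):
            outputs = lltm_cpp.forward(input, weights, bias, old_h, old_cell)
            new_h, new_cell = outputs[:2]
            # Save what lltm_backward expects after grad_h and grad_cell
            ctx.save_for_backward(*outputs[1:], weights)
            return new_h, new_cell

        @staticmethod
        def backward(ctx, grad_h, grad_cell):
            d_old_h, d_input, d_weights, d_bias, d_old_cell = lltm_cpp.backward(
                grad_h.contiguous(), grad_cell.contiguous(),
                *ctx.saved_tensors)
            # Gradients go back in the order of forward's inputs
            return d_input, d_weights, d_bias, d_old_h, d_old_cell
```

When the guard is true, the cell is invoked as `LLTMFunction.apply(input, weights, bias, old_h, old_cell)`, and autograd calls `backward` with the incoming gradients for `new_h` and `new_cell`.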
