Day 18: TensorFlow 2.0: Revisiting tf.function

11th Ironman Contest

renewang

2019-10-03 20:55:58

Functions, not Sessions

[Figure: tf1.0 vs tf2.0]

The figure above is taken from the RFC "TF 2.0: Functions, not Sessions" (see Reference 1).

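The design principles of TensorFlow 2.0 are:

1. Python functions as graphs: under this principle, tf.Session is replaced by tf.function, and tf.placeholder becomes the arguments of the function wrapped by tf.function. The benefit is that the program logic matches what the TensorFlow runtime actually computes. In 1.x, the user first had to build a computation graph and then hand all or part of it to a tf.Session object for execution.
2. Program-order semantics and control dependencies: in the past, TensorFlow users had to keep two execution models in mind, one being the Python interpreter and the other the graph held by tf.Session. When a tf.Session had to both read and write a stateful tf.Variable in the same run, the two models diverged: the order inside the session was undefined, so the result did not necessarily match the logic of the program as written. In TensorFlow 1.x, the user had to annotate the static graph with tf.control_dependencies to say which op runs first, for example the read or the write.

Let's use an example to explain principle 2. First, a piece of TensorFlow 1.x code: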

import tensorflow as tf  # TensorFlow 1.x

v = tf.Variable(1.0)  # create a Variable with initial value 1.0
init_op = tf.global_variables_initializer()  # op that initializes all global variables
assign_op = v.assign(2.0)  # op that writes the value 2.0
read = v.read_value()  # op that reads the current value of Variable v

with tf.Session() as sess:  # create a Session object
  sess.run(init_op)  # the Session runs the initializer op
  val = sess.run(read)  # the Session runs the read op
  print(val)  # prints 1.0, because assign_op has not been run yet
  val = sess.run([read, assign_op])[0]  # the Session runs the read and write ops together
  print(val)  # nondeterministic: may print 1.0 or 2.0
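
For comparison, here is a minimal sketch of how the 1.x-style graph could be annotated with tf.control_dependencies so that the read always happens after the write, as described in principle 2. It is written against the tf.compat.v1 API so that it can also run under TensorFlow 2.x; that choice, the use of v.initializer instead of the global initializer, and the name read_after_write are assumptions made for illustration and are not part of the original example.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    v = tf.Variable(1.0)  # graph-mode variable with initial value 1.0
    # read_value=False makes assign() return the assignment op itself
    assign_op = v.assign(2.0, read_value=False)
    # Any op created inside this block must wait for assign_op to finish
    with tf.control_dependencies([assign_op]):
        read_after_write = v.read_value()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(v.initializer)  # initialize just this one variable
    val = sess.run(read_after_write)  # running the read forces the write first
    print(val)
```

With the dependency spelled out, the read can no longer observe the initial value, but the user has to remember to write this annotation by hand.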

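In 2.0, you can instead wrap one or more ops in a tf.function. Inside a tf.function the ops are not guaranteed to execute exactly in the order they were written, because the graph is optimized at compile time. However, tf.function automatically adds control dependencies so that stateful operations, such as the write and the read of v below, run in program order, so the output of the tf.function is guaranteed to be consistent.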

import tensorflow as tf  # TensorFlow 2.x

v = tf.Variable(1.0)
@tf.function
def f():
  v.assign(2.0)
  return v.read_value()

print(f()) # always prints 2.0

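With this simple example we have seen what tf.function is for. Next...

What is `tf.function`

Today we get to the core of TensorFlow 2.0: tf.function. tf.function can be thought of as the counterpart of PyTorch's TorchScript; its main job is to package computation at the function level. Once a function is wrapped with the tf.function decorator, tf.function does the following with its code:

Parses the Python syntax and translates it into syntax compatible with tf.Tensor. For example, if ... else is translated into tf.cond.
Compiles the code into a static computation graph and optimizes that graph.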
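
To make the first step concrete, here is a small sketch of a function whose branch depends on the value of a tensor. The function name absolute_value and the input values are made up for illustration. The last lines list the op types in the traced graph, so you can check for yourself which conditional op the if ... else was turned into; the exact op names depend on the TensorFlow version, so no expected output is given for that line.

```python
import tensorflow as tf

@tf.function
def absolute_value(x):
    # The condition depends on a tensor value, so this Python branch
    # has to be rewritten as a graph-level conditional when traced.
    if x > 0:
        result = x
    else:
        result = -x
    return result

neg = absolute_value(tf.constant(-3.0))
pos = absolute_value(tf.constant(4.0))
print(neg.numpy(), pos.numpy())

# Inspect the traced graph for a scalar float32 input
concrete = absolute_value.get_concrete_function(tf.TensorSpec([], tf.float32))
print(sorted({op.type for op in concrete.graph.get_operations()}))
```

Both calls reuse the same traced graph, and the branch is chosen at run time from the tensor's value rather than at trace time.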

Let's use the examples from the official tf.function guide, "Better performance with tf.function and AutoGraph", to explain how to use tf.function.

Using `tf.function` directly to decorate a target function

# non-eager
@tf.function 
def simple_nn_layer(x, y):
  return tf.nn.relu(tf.matmul(x, y))

x = tf.random.uniform((3, 3)) # eager code
y = tf.random.uniform((3, 3)) # eager code

simple_nn_layer(x, y)
# => <tf.Tensor: id=23, shape=(3, 3), dtype=float32, numpy=
#array([[1.2809252 , 0.44859692, 0.9838194 ],
#       [0.7492161 , 0.22386347, 0.36669958],
#       [1.3041403 , 0.5147673 , 1.0727234 ]], dtype=float32)>

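As you can see, the output of simple_nn_layer, the target function decorated with tf.function, is a tf.Tensor. Because eager mode is the default, the returned tensor carries a NumPy buffer, shown as numpy=... in the printout and accessible through the tensor's numpy() method.

Nested `tf.function` decoration

tf.function has a nice property: it automatically converts every function called inside the target function, so you do not have to trace the calls back yourself and add a tf.function decorator to each one.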

def linear_layer(x): 
  return 2 * x + 1

print("linear_layer:", linear_layer)
#=> linear_layer: <function linear_layer at 0x7fd5f2baf1e0>

@tf.function
def deep_net(x):
  return tf.nn.relu(linear_layer(x))

print("deep_net:", deep_net)
#=> deep_net: <tensorflow.python.eager.def_function.Function object at 0x7fd5f2bceeb8>

deep_net(tf.constant((1, 2, 3)))
#=> <tf.Tensor: id=47, shape=(3,), dtype=int32, numpy=array([3, 5, 7], dtype=int32)>

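From the code above we can see that a Python function decorated directly with tf.function, such as deep_net, is turned into an eager.def_function.Function object. linear_layer, which is called in the body of deep_net, is still a plain Python function, but because it is called inside deep_net it is also parsed and compiled into the computation graph.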
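
One way to observe this for yourself is to record tf.executing_eagerly() inside the undecorated helper. The sketch below reuses the linear_layer and deep_net names from above; the eager_flags list is a hypothetical addition used only to capture what the helper sees on each call.

```python
import tensorflow as tf

eager_flags = []  # hypothetical list recording the mode seen by linear_layer

def linear_layer(x):
    # Plain Python function, never decorated with tf.function
    eager_flags.append(tf.executing_eagerly())
    return 2 * x + 1

@tf.function
def deep_net(x):
    return tf.nn.relu(linear_layer(x))

linear_layer(tf.constant([1, 2, 3]))    # called directly: runs eagerly
out = deep_net(tf.constant([1, 2, 3]))  # called through deep_net: traced into the graph
print(eager_flags)
print(out.numpy())
```

The first recorded flag comes from the direct call and the last one from the call made while deep_net was being traced, which shows that the same helper runs in graph-building mode when reached through a tf.function.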

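Roughly speaking, if the target function of tf.function is made up of many very small ops, it will run faster than the eager code; if it contains computationally expensive ops such as convolution or LSTM, the difference between the two is small. See the official documentation for examples.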
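
A rough way to try the "many small ops" case yourself is sketched below. The function many_small_ops, the loop count, and the number of timing repetitions are arbitrary choices made for illustration, and the timings depend entirely on your machine and TensorFlow version, so no expected numbers are shown.

```python
import timeit
import tensorflow as tf

def many_small_ops(x):
    # 100 cheap multiply-add steps; the Python loop is unrolled when traced
    for _ in range(100):
        x = x * 1.01 + 0.1
    return x

graph_version = tf.function(many_small_ops)

x = tf.constant(1.0)
eager_result = many_small_ops(x)
graph_result = graph_version(x)  # first call also traces the function (warm up)

eager_time = timeit.timeit(lambda: many_small_ops(x), number=100)
graph_time = timeit.timeit(lambda: graph_version(x), number=100)
print("eager:", eager_time)
print("function:", graph_time)
```

Both versions compute the same value; only the per-op dispatch overhead differs, which is why the gap tends to shrink when each op is expensive.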

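Note, however, that tf.keras.layers objects are not decorated with tf.function by default, and neither are the tf.nn.* functions. To use them through tf.function, wrap them in another Python function and put the tf.function decorator on that function's definition, as in the following code: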

import timeit

lstm_cell = tf.keras.layers.LSTMCell(10)
print(lstm_cell)
#=> <tensorflow.python.keras.layers.recurrent_v2.LSTMCell object at 0x7fd5f23513c8>

@tf.function
def lstm_fn(input, state):
  return lstm_cell(input, state)

print(lstm_fn)
#=> <tensorflow.python.eager.def_function.Function object at 0x7fd5f2321128>

input = tf.zeros([10, 10])
state = [tf.zeros([10, 10])] * 2
# warm up
lstm_cell(input, state); lstm_fn(input, state)
print("eager lstm:", timeit.timeit(lambda: lstm_cell(input, state), number=10))
print("function lstm:", timeit.timeit(lambda: lstm_fn(input, state), number=10))
# => eager lstm: 0.007967374000145355
#    function lstm: 0.004763247000028059

AutoGraph is Behind the Scenes

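What tf.function does is translate an ordinary Python function into a computation graph, and the module responsible for this translation is tf.autograph. With the to_code function this module provides, we can inspect the translated result.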

@tf.function
def sum_even(items):
  s = 0
  for c in items:
    if c % 2 > 0:
      continue
    s += c
  return s

#sum_even(tf.constant([10, 12, 15, 20]))
print(tf.autograph.to_code(sum_even.python_function))
#=>def tf__sum_even(items):
#  do_return = False
#  retval_ = ag__.UndefinedReturnValue()
#  with ag__.FunctionScope('sum_even', 'sum_even_scope',
#      ag__.ConversionOptions(recursive=True, user_requested=True,
#      optional_features=(), internal_convert_user_code=True)) as sum_even_scope:
#...
#  return ag__.retval(retval_)

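Through the python_function attribute of the eager.def_function.Function object produced by tf.function, we can still access the original Python function. Passing this Python function to tf.autograph.to_code() gives back the low-level source code translation. Note that the tf.autograph.to_code printout above shows only a small fragment, because the generated function is quite complex; you can go to the official site, or use the Colab notebook it provides, to reproduce the full output.
Here is another to_code result, this time for deep_net. We print its to_code output to show that linear_layer really is converted as well, rather than just taking the documentation's word for it.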

print(tf.autograph.to_code(deep_net.python_function))
#=>def tf__deep_net(x):
#  do_return = False
#  retval_ = ag__.UndefinedReturnValue()
#  with ag__.FunctionScope('deep_net', 'deep_net_scope',
#      ag__.ConversionOptions(recursive=True, user_requested=True,
#      optional_features=(), internal_convert_user_code=True)) as deep_net_scope:
#    do_return = True
#    retval_ = deep_net_scope.mark_return_value(
#        ag__.converted_call(tf.nn.relu, deep_net_scope.callopts,
#        (ag__.converted_call(linear_layer, deep_net_scope.callopts, (x,), None, deep_net_scope),),
#        None, deep_net_scope))
#  do_return,
#  return ag__.retval(retval_)

We can see that ag__.converted_call(tf.nn.relu, ...) converts tf.nn.relu dynamically, while ag__.converted_call(linear_layer, ...) converts linear_layer dynamically.

Miscellaneous

Other details about tf.function include:

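Debugging with tf.config.experimental_run_functions_eagerly(True) (available as tf.config.run_functions_eagerly in TensorFlow 2.3 and later): Python's built-in debugger, pdb, cannot be used inside the **graph mode** code compiled by tf.function, so to use pdb or another Python debugger you first call tf.config.experimental_run_functions_eagerly(True) to enable **eager mode**. Once **eager mode** is on, you can insert pdb.set_trace() wherever you need to debug and step through with pdb. When you are done, call tf.config.experimental_run_functions_eagerly(False) to turn eager mode off again. The pattern boils down to: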

tf.config.experimental_run_functions_eagerly(True)

# f is a function decorated with tf.function
f(tf.constant(1)) 

tf.config.experimental_run_functions_eagerly(False)
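
A slightly fuller sketch of this pattern follows. The function f, the modes list, and the set_run_eagerly helper name are hypothetical; the helper simply picks whichever of the two switch names the installed TensorFlow provides. The commented-out line marks where a pdb breakpoint would go, and the try/finally makes sure eager execution is switched off again even if the function raises.

```python
import tensorflow as tf

# Use the newer name when it exists, otherwise fall back to the
# experimental name from TensorFlow 2.0 (assumption: one of them is present).
set_run_eagerly = getattr(tf.config, "run_functions_eagerly", None)
if set_run_eagerly is None:
    set_run_eagerly = tf.config.experimental_run_functions_eagerly

modes = []  # hypothetical list recording how the body was executed

@tf.function
def f(x):
    modes.append(tf.executing_eagerly())
    # import pdb; pdb.set_trace()  # sees concrete values only when run eagerly
    return x + 1

f(tf.constant(1))  # normal call: the body is traced into a graph

set_run_eagerly(True)
try:
    result = f(tf.constant(1))  # the body now runs as ordinary eager Python
finally:
    set_run_eagerly(False)

print(modes, int(result))
```

The first recorded flag comes from tracing and the last from the eager run, which is the window in which a Python debugger can inspect real tensor values.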