After playing around these past few days, you have probably all noticed a problem: Transformers are not exactly efficient, and running them at high throughput clearly burns a lot of compute. Not to mention that in the near future, when the metaverse takes off and everyone in the world goes into it, natural language processing will need to be much more efficient, or the hardware won't be able to keep up.
Fortunately, we have Hugging Face, which offers a super handy performance-optimization library that can greatly improve the efficiency of training and running models on your target hardware. That library is Optimum. So let's open up your Azure Machine Learning and give Optimum a try!
pip install optimum
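Depending on your environment, you may also need the ONNX Runtime backend; installing it via the onnxruntime extra (my assumption here, in case the plain install above does not pull it in) looks like this:
pip install optimum[onnxruntime]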
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

# Baseline: the plain PyTorch model served through the usual transformers pipeline
model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
vanilla_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

question = "What's my name?"
context = "My name is Ko Ko and I live in Taiwan."
result = vanilla_qa(question=question, context=context)
print(result)
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Same model, but pre-exported to ONNX and executed by ONNX Runtime
model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

question = "What's my name?"
context = "My name is Ko Ko and I live in Taiwan."
result = onnx_qa(question=question, context=context)
print(result)
Can you feel how much faster it got? That speedup is thanks to ONNX Runtime.
If you run into the error No module named 'tensorflow.python.keras.engine.keras_tensor', first upgrade your TensorFlow to 2.3 with pip install tensorflow-gpu==2.3.0, then restart the kernel and run it again. As the two screenshots below show, the original version took 4 seconds while the ONNX Runtime version took only 1 second. That is seriously fast!!
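If you want to reproduce the comparison yourself, a minimal timing sketch like the one below will do. This is my own snippet rather than anything from Optimum, and it assumes the vanilla_qa and onnx_qa pipelines plus question and context from the two blocks above are still defined:

import time

def time_pipeline(qa, n_runs=10):
    # Average wall-clock latency over several runs to smooth out noise
    start = time.perf_counter()
    for _ in range(n_runs):
        qa(question=question, context=context)
    return (time.perf_counter() - start) / n_runs

print(f"vanilla: {time_pipeline(vanilla_qa):.3f}s per call")
print(f"onnx:    {time_pipeline(onnx_qa):.3f}s per call")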
If you want to convert your own model to ONNX instead of using a pre-converted one, just add the from_transformers=True parameter:
from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer
from pathlib import Path

model_name = "deepset/roberta-base-squad2"
onnx_path = Path("onnx")

# from_transformers=True exports the PyTorch checkpoint to ONNX on the fly
model = ORTModelForQuestionAnswering.from_pretrained(model_name, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save the exported ONNX model and the tokenizer so they can be reused later
model.save_pretrained(onnx_path)
tokenizer.save_pretrained(onnx_path)
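Once saved, the exported model can be loaded straight back from that local onnx folder next time, with no re-conversion needed. Here is a quick sketch under the same setup as above:

from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

# Load the ONNX model and tokenizer back from the local "onnx" directory
model = ORTModelForQuestionAnswering.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")
onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(onnx_qa(question="What's my name?", context="My name is Ko Ko and I live in Taiwan."))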
And that's the basic usage of Optimum for today. Tomorrow we'll look at how to optimize your own transformer even further!