As the field has evolved from big data through machine learning to deep learning, multi-layer artificial neural networks have come to be seen by many as a go-to tool for all kinds of complex problems. Within deep learning, recurrent neural networks (RNNs) have shown their potential in sequence data processing, natural language processing and understanding, time-series analysis, and many other areas. Because of this, the RNN family has not stood still as a single traditional architecture: years of development and refinement by many researchers have produced numerous variants and extensions, such as Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU); even the Transformer, which will come up later, borrows some of these ideas in places.
This blog post walks through the evolution of RNNs and their variants, from the earliest vanilla RNN to more recent algorithms and ideas. For each RNN algorithm, it explains the underlying principles and a code implementation in depth, so that readers (and I) can better understand how these algorithms work and which application scenarios they suit.
A recurrent neural network (RNN) is a special type of artificial neural network. Unlike an ordinary feed-forward network, an RNN lets information flow back and forth between layers, meaning the output of some nodes (neurons) can influence their own future input.
This "memory" makes RNNs well suited to tasks that process a series of inputs, such as handwriting recognition or speech recognition. Put simply, an RNN can "remember" past information and use it to shape future decisions.
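A common textbook formulation of this recurrence (a generic form, not tied to any particular library) is h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b_h), with an output such as y_t = softmax(W_hy · h_t + b_y). The key point is that the same weights are reused at every time step, while the hidden state h_t carries information forward.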
RNNs come in two main types: finite impulse and infinite impulse. A finite-impulse RNN can be unrolled into an ordinary feed-forward network, whereas an infinite-impulse RNN contains cycles and cannot be unrolled this way.
Beyond the basic form, there are more advanced variants such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU). These add extra "gates" and "memory cells" to store and control information more effectively.
Finally, it is worth noting that RNNs are theoretically Turing complete, meaning they are in principle capable of running arbitrary programs over arbitrary input sequences. This is part of why, in recent years, RNNs and models that borrow from their ideas, such as the Transformer, have achieved major breakthroughs in natural language processing, machine translation, and other fields.
The concept behind RNNs can be traced back to the Ising model of the 1920s, proposed in Germany by the physicists Wilhelm Lenz and Ernst Ising. Although the Ising model was originally devised to explain magnetic behavior in solid-state physics, its mathematical structure resembles that of early RNNs. This early version was static, however: it could not learn or adapt.
In 1972, the mathematician Shun'ichi Amari made the Ising model adaptive, i.e., able to learn. This was essentially the first RNN model capable of learning.
In 1982, the Hopfield network was published in the United States: a neural network able to store and retrieve patterns, arguably related to the Ising model and Amari's work.
In 1986, David Rumelhart pushed RNNs further; he was a co-inventor of the backpropagation algorithm, which is widely used to train all kinds of neural networks, including RNNs.
In 1993, a model called the "neural history compressor" solved a very deep learning task that required more than 1,000 successive layers in an RNN unrolled over time.
Taken together, the RNN evolved from a simple physical model into a powerful tool for highly complex tasks, a trajectory running from basic storage and pattern recognition to sophisticated sequence processing and deep learning.
In the early 1990s, RNNs began to attract more attention, especially in speech recognition and natural language processing. However, basic RNN models ran into difficulties during training because of problems such as vanishing and exploding gradients.
In 1997, Sepp Hochreiter and Jürgen Schmidhuber proposed Long Short-Term Memory (LSTM), a special type of RNN designed to address the vanishing-gradient problem. LSTM opened new doors for RNN development and went on to achieve important results in machine translation, text generation, and many other areas.
In the late 2000s and the 2010s, with growing compute power and the arrival of big data, RNNs and their variants (GRU, LSTM, and others) performed strongly across applications including speech recognition, image generation, and natural language processing.
In 2014, Kyunghyun Cho and colleagues proposed the Gated Recurrent Unit (GRU), a simplified variant of the LSTM.
RNNs are widely applied in speech recognition, natural language processing, time-series forecasting, and many other fields. Beyond the basic RNN structure there are also numerous variants, such as bidirectional RNNs and multi-layer (stacked) RNNs.
In recent years, the Transformer has achieved breakthrough results in natural language processing. Although it is not an RNN, its self-attention mechanism and positional encoding make it remarkably effective on sequence data, surpassing RNNs on some tasks and, to a large extent, displacing RNN-based algorithms.
As deep learning continues to advance rapidly, RNNs keep evolving as well, with more efficient training algorithms, more variants, and a wider range of application scenarios.
Since this model has existed since roughly 1982, and only became a hot topic again around 2012-2014 (so it is a bit of an antique) thanks to bag-of-words-style applications, I won't rehash the theory here; see the classic explanations below instead. I recommend going straight to the English one, which is the classic treatment.
2. How recurrent neural networks (RNN) and long short-term memory (LSTM) models work
3. Recurrent Neural Network | RNN (循环神经网络)
This post focuses on the code: implementing the algorithm.
A recurrent neural network (RNN) is a class of neural networks suited to learning representations of sequential data, such as text in natural language processing (NLP).
The idea behind RNNs is to use sequence information when making predictions. They are called "recurrent" because they perform the same task for every element of the sequence, with each output depending on the previous computations.
Another way to think about RNNs is that they keep a "memory" capturing what has been computed so far. In theory, an RNN can exploit information from arbitrarily long sequences; in practice, a classic RNN can only look back a few steps.
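To make the recurrence concrete, here is a minimal NumPy sketch of a single forward pass. All names and dimensions here (inputDim, hiddenDim, and so on) are illustrative assumptions for this sketch and are separate from the Keras code below:

import numpy as np

inputDim, hiddenDim, steps = 5, 8, 4
rng = np.random.default_rng(0)
Wxh = 0.1 * rng.normal(size=(hiddenDim, inputDim))   # input-to-hidden weights
Whh = 0.1 * rng.normal(size=(hiddenDim, hiddenDim))  # hidden-to-hidden (recurrent) weights
bh = np.zeros(hiddenDim)

h = np.zeros(hiddenDim)                  # initial hidden state -- the "memory"
xs = rng.normal(size=(steps, inputDim))  # a toy input sequence
for x in xs:
    # The same weights are applied at every time step; h mixes the current
    # input with everything seen so far.
    h = np.tanh(Wxh @ x + Whh @ h + bh)
print(h)  # the final hidden state summarizes the whole sequence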
import numpy as np  # linear-algebra helpers
import requests  # to download the training text

url = 'https://github.com/markl-a/ML-demos/raw/main/3.RNNs/wonderland.txt'  # note: this is the 'raw' link
response = requests.get(url)
# Make sure the request succeeded
if response.status_code == 200:
    with open('wonderland.txt', 'wb') as f:
        f.write(response.content)
else:
    print('Failed to download the file.')
# Load the Keras-related packages
from tensorflow.keras.layers import SimpleRNN, Dense, Activation
from tensorflow.keras.models import Sequential
# Data preprocessing
RawData = "wonderland.txt"
# Read the input file as a byte stream and clean it up line by line
print("Reading the input file and cleaning it up line by line...")
with open(RawData, 'rb') as StreamData:
    SplitDataset = [
        line.strip().lower().decode("ascii", "ignore")
        for line in StreamData
        if len(line.strip()) > 0
    ]
text = " ".join(SplitDataset)
# Build the character-to-index and index-to-character mappings
charSet = sorted(set(text))  # sorted so the mapping is deterministic across runs
charToIndex = {c: i for i, c in enumerate(charSet)}
indexToChar = {i: c for i, c in enumerate(charSet)}
# Build the input sequences and their labels
print("Building input vectors and text labels")
seqLen, step = 10, 1
# Slide a seqLen-character window over the text; the label is the character that follows it
inputChars = [text[i:i + seqLen] for i in range(0, len(text) - seqLen, step)]
labelChars = [text[i + seqLen] for i in range(0, len(text) - seqLen, step)]
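# A quick toy illustration of the windowing above (hypothetical demo values,
# unrelated to the Alice-in-Wonderland data): with "hello world" and a window
# of 3, the inputs start ['hel', 'ell', 'llo'] and the labels start ['l', 'o', ' ']:
demoText, demoLen = "hello world", 3
print([demoText[i:i + demoLen] for i in range(0, len(demoText) - demoLen)][:3])
print([demoText[i + demoLen] for i in range(0, len(demoText) - demoLen)][:3])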
# Initialize and fill X (one-hot input windows) and y (one-hot next characters)
numChars = len(charSet)
X = np.zeros((len(inputChars), seqLen, numChars), dtype=bool)  # np.bool is removed in newer NumPy
y = np.zeros((len(inputChars), numChars), dtype=bool)
for i, inputChar in enumerate(inputChars):
    for j, ch in enumerate(inputChar):
        X[i, j, charToIndex[ch]] = 1
    y[i, charToIndex[labelChars[i]]] = 1
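# Sanity check: from the loops above, X has shape (number of windows, seqLen, numChars)
# and y has shape (number of windows, numChars).
print(X.shape, y.shape)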
# Hyperparameters
hiddenSize, batchSize = 128, 128
numIterations, numEpochsPerIteration, numPredsPerEpoch = 25, 1, 100
# Build the model: one SimpleRNN layer followed by a softmax over the characters
model = Sequential([
    SimpleRNN(hiddenSize, return_sequences=False, input_shape=(seqLen, numChars), unroll=True),
    Dense(numChars),
    Activation("softmax")
])
# Compile the model
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
# Train the model in batches and generate a test sample after each iteration
for iteration in range(numIterations):  # loop over the training iterations
    print("=" * 50)  # separator line
    print("Iteration #: %d" % (iteration))  # current iteration number
    # Train with batch size batchSize for numEpochsPerIteration epochs
    model.fit(X, y, batch_size=batchSize, epochs=numEpochsPerIteration)

    # Test the model:
    # pick a random window from inputChars as the seed, then generate the next 100 characters
    testIdx = np.random.randint(len(inputChars))  # random index
    testChars = inputChars[testIdx]  # the character sequence at that index is the seed
    print("Generating from seed: %s" % (testChars))  # show the chosen seed
    print(testChars, end="")  # print the seed, no newline
    # One character per prediction step
    for i in range(numPredsPerEpoch):
        # A zero tensor of shape (1, seqLen, numChars) holding a single input sequence
        Xtest = np.zeros((1, seqLen, numChars))
        # Fill Xtest with the one-hot encoding of the current seed
        # (note: use j here so we don't clobber the outer loop's i)
        for j, ch in enumerate(testChars):
            Xtest[0, j, charToIndex[ch]] = 1
        # Predict the distribution over the next character
        pred = model.predict(Xtest, verbose=0)[0]
        # Greedily pick the most likely character
        ypred = indexToChar[np.argmax(pred)]
        print(ypred, end="")  # print the predicted character, no newline
        # Slide the seed window: drop the first character, append the prediction
        testChars = testChars[1:] + ypred
    print()  # newline before the next iteration
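Sample training output: the loss for each iteration, followed by 100 characters generated from a random seed.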
==================================================
Iteration #: 0
1241/1241 [==============================] - 15s 6ms/step - loss: 2.3450
Generating from seed: y took the
y took the wast the sher the said the said the said the said the said the said the said the said the said the
==================================================
Iteration #: 1
1241/1241 [==============================] - 6s 4ms/step - loss: 2.0465
Generating from seed: any rate a
any rate and the sald the tore the routhe ther she her the sald the tore the routhe ther she her the sald the
==================================================
Iteration #: 2
1241/1241 [==============================] - 6s 5ms/step - loss: 1.9354
Generating from seed: n, and she
n, and she could to the dore to the dore to the dore to the dore to the dore to the dore to the dore to the do
==================================================
Iteration #: 3
1241/1241 [==============================] - 6s 5ms/step - loss: 1.8498
Generating from seed: e alice wa
e alice was a lang the could the doon the labbet in a moute the forme the forme the forme the forme the forme
==================================================
Iteration #: 4
1241/1241 [==============================] - 5s 4ms/step - loss: 1.7818
Generating from seed: ake us up
ake us up and she had alice and alice and alice and alice and alice and alice and alice and alice and alice an
==================================================
Iteration #: 5
1241/1241 [==============================] - 6s 5ms/step - loss: 1.7279
Generating from seed: you myself
you myself the grown the grown the grown the grown the grown the grown the grown the grown the grown the grown
==================================================
Iteration #: 6
1241/1241 [==============================] - 6s 5ms/step - loss: 1.6828
Generating from seed: n and how
n and how the said to the gryphon and the mock turtle to any all the some of the said to the gryphon and the m
==================================================
Iteration #: 7
1241/1241 [==============================] - 6s 5ms/step - loss: 1.6441
Generating from seed: ng at ever
ng at ever was the she said the mouse said the mouse said the mouse said the mouse said the mouse said the mou
==================================================
Iteration #: 8
1241/1241 [==============================] - 5s 4ms/step - loss: 1.6122
Generating from seed: ought; and
ought; and the mack turnle the mad a little sought and was a little sought and was a little sought and was a l
==================================================
Iteration #: 9
1241/1241 [==============================] - 5s 4ms/step - loss: 1.5845
Generating from seed: o the trad
o the tradece the dight and the cat a little sought and comping the could to the doom as she cauld her head th
==================================================
Iteration #: 10
1241/1241 [==============================] - 5s 4ms/step - loss: 1.5593
Generating from seed: d the quee
d the queens and was to she was to mare alice was so she said alice was so she said alice was so she said alic
==================================================
Iteration #: 11
1241/1241 [==============================] - 5s 4ms/step - loss: 1.5384
Generating from seed: g her paws
g her paws and the was the mock turtle the project gutenberg-tm electronic work on the which the was the mock
==================================================
Iteration #: 12
1241/1241 [==============================] - 6s 5ms/step - loss: 1.5200
Generating from seed: : so she h
: so she heard to have to have to have to have to have to have to have to have to have to have to have to have
==================================================
Iteration #: 13
1241/1241 [==============================] - 6s 5ms/step - loss: 1.5026
Generating from seed: adrille is
adrille is the mock turtle the project gutenberg-tm electronic work or the said the mock turtle the project gu
==================================================
Iteration #: 14
1241/1241 [==============================] - 5s 4ms/step - loss: 1.4875
Generating from seed: and was go
and was going to be not ment the project gutenberg-tm electronic works the hatter was the mory to do stated th
==================================================
Iteration #: 15
1241/1241 [==============================] - 5s 4ms/step - loss: 1.4745
Generating from seed: buting a p
buting a person and the courte in a minute or the things alice as the three so alice as the three so alice as
==================================================
Iteration #: 16
1241/1241 [==============================] - 5s 4ms/step - loss: 1.4611
Generating from seed: g, and lon
g, and long a little gone and the caterpillar the could not as she caterpillar the could not as she caterpilla
==================================================
Iteration #: 17
1241/1241 [==============================] - 6s 5ms/step - loss: 1.4505
Generating from seed: , and said
, and said the mock turtle at the work of the sablee at the work of the sablee at the work of the sablee at th
==================================================
Iteration #: 18
1241/1241 [==============================] - 6s 5ms/step - loss: 1.4387
Generating from seed: days wrong
days wrong as it made of the which was the mock turtle she was go now you know what it made the white rabbit w
==================================================
Iteration #: 19
1241/1241 [==============================] - 6s 5ms/step - loss: 1.4303
Generating from seed: i want a
i want a little streat of a must had a little streat of a must had a little streat of a must had a little str
==================================================
Iteration #: 20
1241/1241 [==============================] - 5s 4ms/step - loss: 1.4212
Generating from seed: arm, with
arm, with the dormouse said the king her it was the king her it was the king her it was the king her it was th
==================================================
Iteration #: 21
1241/1241 [==============================] - 6s 5ms/step - loss: 1.4125
Generating from seed: with diam
with diam, said the mock turtle in a much all down the mouse got may little the mock turtle in a much all dow
==================================================
Iteration #: 22
1241/1241 [==============================] - 8s 7ms/step - loss: 1.4042
Generating from seed: asleep, h
asleep, harr one of the door of the door of the door of the door of the door of the door of the door of the d
==================================================
Iteration #: 23
1241/1241 [==============================] - 6s 5ms/step - loss: 1.3980
Generating from seed: ed you wit
ed you with and all the some of the door all the some of the door all the some of the door all the some of the
==================================================
Iteration #: 24
1241/1241 [==============================] - 6s 5ms/step - loss: 1.3903
Generating from seed: ould chang
ould change the dormouse in a minute or the white rabbit on the the footman in the the footman in the the foot
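As the samples above show, greedy argmax decoding quickly falls into repetitive loops ("the said the said ..."). A common remedy is to sample the next character from the predicted distribution with a temperature instead of always taking the argmax. The sampleChar helper below and its temperature parameter are my own illustrative additions (not part of the code above); it only assumes the indexToChar mapping and the pred vector from the generation loop:

def sampleChar(preds, temperature=0.5):
    # Rescale the predicted distribution (lower temperature = more conservative),
    # then draw a character index from it instead of taking the argmax.
    preds = np.log(np.asarray(preds, dtype=np.float64) + 1e-8) / temperature
    probs = np.exp(preds) / np.sum(np.exp(preds))
    return np.random.choice(len(probs), p=probs)

# In the generation loop, one would then replace
#   ypred = indexToChar[np.argmax(pred)]
# with
#   ypred = indexToChar[sampleChar(pred)]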
Tomorrow's post will continue with LSTM, GRU, and the other extensions and evolutions of the RNN.