Notes: Echo State Networks (ESN)

Change log:

  • 2020/05/07 - updated layout, code, and references

Preface:

ESN is a variant of the RNN whose main advantage is that training is hundreds of times faster than an ordinary RNN (the output weights are solved as a regression problem rather than by backpropagation). This note is mainly based on the references below; if you have never seen this network before, please read [1] and [2] first. I touched ESNs briefly during my master's degree and recently had to revisit them for work, so I am taking notes here and will add diagrams to this post when I have time. Two code implementations follow, but note that they implement slightly different ESN structures, so please do read the papers mentioned in the references.
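A note on "solving a matrix by regression": in an ESN only the output weights Wout are trained, by a single linear solve over the collected reservoir states, which is why training is so much faster than backpropagation through time. A minimal sketch with random stand-in data (the sizes here are arbitrary, not from any of the code below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 500 time steps of "collected reservoir states" (100 units)
# and their 1-D targets. In a real ESN, X comes from running the reservoir.
X = rng.standard_normal((500, 100))
Wout_true = rng.standard_normal((100, 1))
Y = X @ Wout_true

# "Training" is a single linear solve -- no backpropagation through time.
Wout = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(Wout, Wout_true))  # True
```

In practice a ridge (L2-regularized) regression is used instead of plain least squares, which is exactly what the code in this post does.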

I studied two implementations in total (one by a student of the ESN inventor [6], one by a Medium user [7]); the latter is something of a variant. Below I provide my own modified versions. They produce the same results as the original code; I only restructured them slightly and added my own comments:

Simple ESN [6]

# -*- coding: utf-8 -*-
"""
原代碼, Mantas Lukoševičius(開山祖師Jaeger的學生), http://mantas.info
數據集可以到Lukoševičius那邊下載
"""

import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, ElasticNetCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

np.random.seed(42)

"""
Step 1. 讀取數據(MackeyGlass_t17),尺寸為(10000, 1)
"""
data = pd.read_table('MackeyGlass_t17.txt', header=None)


"""
Step 2. 定義訓練集長度len_train、測試集長度len_test、暫態長度len_transient、誤差計算長度len_error
"""
len_train = 2000
len_test = 2000
len_transient = min(int(len_train / 10), 100)
len_error = 500


"""
Step 3. 畫出數據前1000筆的曲線
"""
plt.figure('A sample of data').clear()
plt.plot(data[0:1000])
plt.title('A sample of data')


"""
Step 4. 定義ESN架構
輸入層神經元數inSize:
    - 等於輸入資料的特徵數
輸出層神經元數outSize:
    - 等於目標資料的特徵數
隱藏層神經元數resSize:
    - 建議最少100顆,且每次增幅單位為50顆
    - 沒看過超過1000的
洩漏率a:
    - 加強短期記憶能力,範圍在[0, 1]
    - 若a=1則ESN;a<1則Leaky ESN(P7)
輸入層的縮放因子input_scaling
    - 範圍在[0, 1]
稀疏程度sparsity:
    - 建議在0.01(1%)~0.05(5%),也有人建議0.05(5%)~0.2(20%)
指定譜半徑rho:
    - 範圍在[0, 1)
"""
inSize = 1
outSize = 1
resSize = 100
a = 0.3
input_scaling = 1.0
sparsity = 1.00
rho = 1.25


"""
Step 5. 初始化輸入層權重Win、隱藏層權重W
輸入層權重Win:
    - 如果有偏權值,則尺寸為(隱藏層神經元數, 1+輸入層神經元數)
    - 否則(隱藏層神經元數, 輸入層神經元數)
    - 常見範圍在[-1, 1]或者[-0.5, 0.5]
    - 建議用常態分布
初始隱藏層權重W_0:
    - 尺寸為(隱藏層神經元數, 隱藏層神經元數)
    - 常見的初始範圍在[-1, 1]或者[-0.5, 0.5]
    - 這邊提供另一種寫法,借鑑於Madalina Ciortan,這個方法才真正達到稀疏效果
    - W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense() - 0.5
    - scipy.sparse.rand(列數, 行數, density=稀疏程度).todense()
    - 假設density=0.1(10%)、列數=10、行數=10,則10x10的矩陣中只有總計10個元素不為零,而且是不規則零散分布在矩陣中
初始隱藏層權重的譜半徑rhoW:
    - max( abs( W ))
    - rhoW<1,隱藏層才具有ECHO特性
隱藏層權重W:
    - W = W_0 * rho/rhoW
"""
Win = (np.random.rand(resSize, 1+inSize)-0.5) * input_scaling
W_0 = np.random.rand(resSize,resSize)-0.5
# W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense()
# W_0[np.where(W_0 > 0)] = W_0[np.where(W_0 > 0)] - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0])) 
W = W_0 * rho/rhoW


"""
Step 5. 收集:取得每一時刻的狀態x(t),然後存在所有時刻的狀態X中
    - t=0時,x(0)為一個尺寸(隱藏層神經元數,1)的零矩陣,y(0)為一個尺寸(1,輸出層神經元數)的零矩陣
    - 記憶庫響應狀態x(t=0)=(resSize, 1)
    - 有偏權值:x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) ) 
    = (隱藏層神經元數,1) = 純量*(隱藏層神經元數,1) + 純量*( (隱藏層神經元數,1+輸入層神經元數)*(1+輸入層神經元數,1) + (隱藏層神經元數, 隱藏層神經元數)*(隱藏層神經元數,1) )
    = (隱藏層神經元數,1) = 純量*(隱藏層神經元數,1) + 純量*( (輸入層神經元數,1) + (輸入層神經元數,1) )
    
    - 無偏權值:x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t) + W*x(t-1) )
    - 記得要刪掉X的前len_transient筆
"""
X = []
x = np.zeros([resSize, 1])
for t in range(len_train):
    u = data.loc[t]
    x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack((1, u))) + np.dot(W, x) )
    X.append(np.vstack([1, u, x]))

X = X[len_transient:]
X = np.array(X).reshape(-1, 1+inSize+resSize)



"""
Step 6. 訓練:利用所有時刻的狀態X、所有時刻的目標Yt,以Ridge回歸求得輸出層權重Wout
    - 輸出層權重的尺寸為(輸出層神經元數, 1+輸入層神經元數+隱藏層神經元數)
    - 借鑑Madalina Ciortan,我直接用Ridge取代了Wout
    - L1正則的回歸器效果都不太好(Lasso、ElasticNet),我猜是因為很多權重被刪掉的關係
    
    - y(t) = Wout*[1;u(t);x(t)]
      (輸出層神經元數,1) = (輸出層神經元數,1+輸入層神經元數+隱藏層神經元數)*(1+輸入層神經元數+隱藏層神經元數,1)
"""
Yt = data.loc[len_transient+1:len_train]
ridge = Ridge(alpha=1e-8)
ridge.fit(X, Yt)

    
"""
Step 7. 測試
    - 第一筆測試輸入u(t-1)=data[2000]
    - 第一筆狀態x(t-1)延續於Step 5
    - Y是用來記錄模型在測試階段各時刻的輸出
    - 作者提供兩種模型輸出:遞迴式多步預測和單步預測,選一個用就可以了
    - 有偏權值:x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) ) 
              y(t) = Wout*[1;u(t);x(t)]
    - 無偏權值:x(t) = (1-a)*x(t-1) + a*tanh( Win*u(t) + W*x(t-1) )
              y(t) = Wout*[u(t);x(t)]    
"""
Y_pred = []
u = data.loc[len_train]
for t in range(len_test):
    x = (1-a)*x + a*np.tanh( np.dot( Win, np.vstack([1,u]) ) + np.dot( W, x ) )
    y = ridge.predict(np.vstack((1,u,x)).T)
    Y_pred.append(y)
    # Recursive multi-step forecasting: the current output becomes the next input
    u = y
    # One-step-ahead forecasting: the current input comes from measurements or offline data
    # u = data.loc[len_train+t+1]
Y_pred = np.array(Y_pred)
Y_pred = Y_pred.reshape(-1, outSize)


"""
Step 8. 採MSE計算測試集中前len_error筆誤差
    - 為什麼只計算len_error筆誤差,因為Mantas當初是採遞迴式多步預測作為輸出方式
    - 遞迴式多步預測(Recursive Multi-Step Forecasting)的問題在於預測能力會隨著時間而衰弱,從本代碼的執行結果中也顯而易見
    - 所以我猜這就是他只取500筆作計算的原因
"""
mse = mean_squared_error(data.loc[len_train+1:len_train+len_error], Y_pred[:len_error])
print('MSE = ' + str( mse ))
    

"""
畫出測試集輸入訊號
"""
plt.figure('Target and generated signals').clear()
plt.plot(data.loc[len_train+1:len_train+len_test].to_numpy(), 'g')
plt.plot( Y_pred, 'b' )
plt.title('Target and generated signals $y(n)$ starting at $n=0$')
plt.legend(['Target signal', 'Free-running predicted signal'])


"""
畫出20條權重在t=100~300的狀態響應X
"""
plt.figure('Some reservoir activations').clear()
plt.plot( X[0:200, 0:20] )# 看起來是t=0~200,但加上沖洗時間initLen=100後應為t=100~300
plt.title('Some reservoir activations $\mathbf{x}(n)$')


"""
畫出輸出權重的值
"""
plt.figure('Output weights').clear()
plt.bar( range(1+inSize+resSize), ridge.coef_.ravel() )
plt.title('Output weights $\mathbf{W}^{out}$')

plt.show()

Recursive multi-step forecasting result. Note that it starts to diverge around sample 1000, which is why Lukoševičius computes the error over only 500 samples:
https://ithelp.ithome.com.tw/upload/images/20200507/201247666GZswxCOWZ.png
One-step-ahead forecasting result:
https://ithelp.ithome.com.tw/upload/images/20200507/20124766slv9Uq7Koe.png
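A side note on the rescaling in Step 5: dividing W_0 by its largest absolute eigenvalue and multiplying by the target rho makes the spectral radius of W exactly rho. A quick self-contained check (the size and rho here are arbitrary):

```python
import numpy as np

np.random.seed(0)
resSize, rho = 100, 0.9

W_0 = np.random.rand(resSize, resSize) - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0]))  # spectral radius of W_0
W = W_0 * rho / rhoW

# Eigenvalues scale linearly, so the spectral radius of W is now rho.
print(np.isclose(np.max(np.abs(np.linalg.eig(W)[0])), rho))  # True
```

This is also why a value like rho = 1.25 is meaningful despite the "range [0, 1)" guideline: it is the spectral radius you are choosing to impose (see [5]).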

Medium user's version [7]:

# -*- coding: utf-8 -*-
"""
重點文獻:
1. Reservoir computing approaches for representation and classification of multivariate time series
2. https://towardsdatascience.com/gentle-introduction-to-echo-state-networks-af99e5373c68
"""


import numpy as np
import scipy.io
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, f1_score

"""
Reservoir implementation
"""
class Reservoir(object):
    
    def __init__(self, n_internal_units=100, spectral_radius=0.99, leak=None,
                 connectivity=0.3, input_scaling=0.2, noise_level=0.01, circle=False):
        """
        Step 1. 建立參數
        - n_internal_units = 記憶庫神經元,某文獻說約在300~100,但我看過的期刊最少都100起跳
        - input_scaling = Win的縮放因子
        - noise_level = 白噪音的縮放因子
        - leak = 洩漏率,範圍在[0, 1]之間
        """
        self._n_internal_units = n_internal_units
        self._input_scaling = input_scaling
        self._noise_level = noise_level
        self._leak = leak
        
        
        """
        Step 2. 輸入權重Win初始化為None
        - _input_weights = 輸入權重
        """
        self._input_weights = None
        
        
        """
        Step 3. 根據遞迴機制circle啟用與否(Wr),來決定記憶庫權重W的建立方式
        - circle = 遞迴機制
        - _internal_weights = 記憶庫權重,size=[n_internal_units, n_internal_units]
        """       
        if circle:
            # W+Wr
            self._internal_weights = self._initialize_internal_weights_Circ(n_internal_units,
                                                                            spectral_radius)
        else:
            # W
            self._internal_weights = self._initialize_internal_weights(n_internal_units,
                                                                       connectivity,
                                                                       spectral_radius)


    def _initialize_internal_weights_Circ(self, n_internal_units, spectral_radius):
        """
        
        """
        internal_weights = np.zeros((n_internal_units, n_internal_units))
        internal_weights[0,-1] = spectral_radius
        for i in range(n_internal_units-1):
            internal_weights[i+1,i] = spectral_radius
                
        return internal_weights
    
    

    def _initialize_internal_weights(self, n_internal_units,
                                     connectivity, spectral_radius):
        """
        Step 4. 以常態分佈產生稀疏度connectivity的記憶庫權重W
        - internal_weights = 記憶庫權重W,size=[n_internal_units, n_internal_units]
        - connectivity = 稀疏度,部分期刊建議設1%~5%。例如某個10X10的矩陣,稀疏度設10%,則矩陣中只有10個元素不為零。    
        - spectral_radius = 譜半徑,必須要<1
        """
        internal_weights = sparse.rand(n_internal_units,
                                       n_internal_units,
                                       density=connectivity).todense()


        """
        Step 5. 記憶庫權重W所有非零項範圍從0~+1=>-0.5~+0.5        
        """
        internal_weights[np.where(internal_weights > 0)] -= 0.5
        
        
        """
        Step 6. 求得記憶庫權重W的特徵值E,然後以e_max=max( abs( E ) )和譜半徑spectral_radius調整記憶庫權重W
        - W = (spectral_radius*W)/e_max
        - 這邊的spectral_radius為調整譜半徑用的係數
        - 實際上W的譜半徑是e_max沒有錯
        """
        E, _ = np.linalg.eig(internal_weights)
        e_max = np.max(np.abs(E))
        internal_weights /= np.abs(e_max)/spectral_radius       

        return internal_weights


    def _compute_state_matrix(self, X, n_drop=0):
        """
        Step 10. 初始化過往狀態x(t)
        previous_state = 過往狀態,size=[N, _n_internal_units]
        """
        N, T, _ = X.shape
        previous_state = np.zeros((N, self._n_internal_units), dtype=float)

        
        """
        Step 11. 建立狀態矩陣X
        state_matrix = 狀態矩陣,size=[N, T-n_drop, _n_internal_units]
        """        
        state_matrix = np.empty((N, T - n_drop, self._n_internal_units), dtype=float)
        for t in range(T):
            # get the current input u(t)
            current_input = X[:, t, :]

            # compute the pre-activation state x(t)
            # state_before_tanh = W*x(t-1) + Win*u(t)
            state_before_tanh = self._internal_weights.dot(previous_state.T) + self._input_weights.dot(current_input.T)

            # add white noise
            # state_before_tanh = W*x(t-1) + Win*u(t) + random*_noise_level
            state_before_tanh += np.random.rand(self._n_internal_units, N)*self._noise_level

            # apply the tanh nonlinearity and the leak rate _leak
            # the two branches below can be summarized as
            # x(t) = (1-a)*x(t-1) + tanh( W*x(t-1) + Win*u(t) + random*_noise_level )
            # with a=1 giving branch 1, otherwise branch 2
            if self._leak is None:
                # branch 1: a=1, plain ESN
                previous_state = np.tanh(state_before_tanh).T
            else:
                # branch 2: a in [0, 1), leaky ESN
                # strictly speaking this should be x(t) = (1-a)*x(t-1) + a*tanh(...)
                previous_state = (1.0 - self._leak)*previous_state + np.tanh(state_before_tanh).T

            # once past the washout period, store the state x(t) in the state matrix
            if (t > n_drop - 1):
                state_matrix[:, t - n_drop, :] = previous_state

        return state_matrix

    
    def get_states(self, X, n_drop=0, bidir=True):
        # get the number of samples N, time steps T, and features V
        # build Win below if it has not been created yet

        """
        Step 8. Build the input weights Win from the three data dimensions
        (N: number of rows/samples; T: time steps; V: number of columns/features)
        - _input_weights = input weights, size = [_n_internal_units, V]
        - np.random.binomial(1, 0.5, [self._n_internal_units, V]): RxV values of 0 or 1
        - 2.0*np.random.binomial(1, 0.5, [self._n_internal_units, V]) - 1.0: RxV values of -1 or +1
        - (2.0*np.random.binomial(1, 0.5, [self._n_internal_units, V]) - 1.0)*self._input_scaling: RxV values of -/+ _input_scaling
        """
        N, T, V = X.shape
        if self._input_weights is None:
            self._input_weights = (2.0*np.random.binomial(1, 0.5 , [self._n_internal_units, V]) - 1.0)*self._input_scaling
        
            
        """
        Step 9. 計算順向狀態矩陣X
        - states/state_matrix = 狀態矩陣,size=[N, T-n_drop, _n_internal_units]
        """
        states = self._compute_state_matrix(X, n_drop)
        
        
        """
        Step 12. 如果雙向機制bidir啟動,將數據反向後再算一次狀態矩陣X,然後順向狀態矩陣X與反向狀態矩陣Xr合併
        """
        if bidir is True:
            # 數據沿著時間軸反向
            X_r = X[:, ::-1, :]
            # 計算反向狀態矩陣states_r
            states_r = self._compute_state_matrix(X_r, n_drop)
            # 順向狀態矩陣states與反向狀態矩陣states_r合併
            states = np.concatenate((states, states_r), axis=2)

        return states
    
    def getReservoirEmbedding(self, X, pca, ridge_embedding, n_drop=5, bidir=True, test=False):

        """
        Step 7. Get the raw reservoir states res_states
        - res_states = the state at every time step (except the first n_drop)
        - n_drop = number of initial steps not recorded in the state matrix
        """
        res_states = self.get_states(X, n_drop=n_drop, bidir=bidir)

        
        """
        Step 13. 為了降維,將初始狀態矩陣res_states從[N, T-n_drop, _n_internal_units]重塑為res_states[N*(T-n_drop), _n_internal_units]
        資料筆數N_samples      
        """
        N_samples = res_states.shape[0]
        res_states = res_states.reshape(-1, res_states.shape[2])                   


        """
        Step 14. 根據當前模式,決定PCA是對res_states採學習或者轉換方式
        經降維後的res_states    
        """
        # 若目前為測試模式,則pca僅執行轉換
        if test:
            red_states = pca.transform(res_states)
        # 若目前為訓練模式,則pca執行學習
        else:
            red_states = pca.fit_transform(res_states)          
        # 重新將red_states從[N*(T-n_drop), _n_internal_units]=>[N, T-n_drop, _n_internal_units]
        red_states = red_states.reshape(N_samples,-1,red_states.shape[1])  
        
        
        
        """
        Step 15. 透過Ridge回歸,進行編碼
        編碼器input_repr
        """
        coeff_tr = []
        biases_tr = []   

        for i in range(X.shape[0]):
            ridge_embedding.fit(red_states[i, 0:-1, :], red_states[i, 1:, :])
            coeff_tr.append(ridge_embedding.coef_.ravel())
            biases_tr.append(ridge_embedding.intercept_.ravel())
        print(np.array(coeff_tr).shape,np.array(biases_tr).shape)
        input_repr = np.concatenate((np.vstack(coeff_tr), np.vstack(biases_tr)), axis=1)
        return input_repr


"""
讀取資料
"""
data = scipy.io.loadmat('dataset/JpVow.mat')
X = data['X']  
Y = data['Y']
Xte = data['Xte']
Yte = data['Yte']

"""
對輸出作熱編碼
"""
onehot_encoder = OneHotEncoder(sparse=False, categories='auto')
Y = onehot_encoder.fit_transform(Y)
Yte = onehot_encoder.transform(Yte)

"""
建立降維器、編碼器,以及解碼器
"""
pca = PCA(n_components=75)  
ridge_embedding = Ridge(alpha=10, fit_intercept=True)
readout = Ridge(alpha=5)

"""
ESN訓練
"""
res = Reservoir(n_internal_units=450, spectral_radius=0.6, leak=0.6,
                 connectivity=0.25, input_scaling=0.1, noise_level=0.01, circle=False)
"""
降維器訓練、編碼器訓練
"""
input_repr = res.getReservoirEmbedding(X, pca, ridge_embedding,  n_drop=5, bidir=True, test = False)
"""
將測試資料餵入ESN,並且進行編碼
"""
input_repr_te = res.getReservoirEmbedding(Xte,pca, ridge_embedding,  n_drop=5, bidir=True, test = True)


"""
Step 16. 解碼器訓練
"""
readout.fit(input_repr, Y)
"""
將測試資料餵入解碼器
"""
logits = readout.predict(input_repr_te)
"""
分類
"""
pred_class = np.argmax(logits, axis=1)

"""
結果可視化
"""
true_class = np.argmax(Yte, axis=1)
print(accuracy_score(true_class, pred_class))
print(f1_score(true_class, pred_class, average='weighted'))
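One detail of `_initialize_internal_weights` worth isolating: `scipy.sparse.rand` fills only a `density` fraction of the entries, and shifting the nonzeros by 0.5 recenters them around zero, which is what makes this initialization genuinely sparse. A small sketch (the 10x10 size is hypothetical):

```python
import numpy as np
from scipy import sparse

n, connectivity = 10, 0.1

# Only density*n*n entries are nonzero, scattered randomly.
W = np.asarray(sparse.rand(n, n, density=connectivity, random_state=0).todense())
print(np.count_nonzero(W) <= n * n * connectivity)  # True

# Shift the nonzero entries from (0, 1) into (-0.5, 0.5).
W[W > 0] -= 0.5
print(W.min() >= -0.5 and W.max() <= 0.5)  # True
```

By contrast, `np.random.rand(n, n) - 0.5` fills every entry, so a separate sparsity parameter would have no effect there.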

Using [6] as the framework and adding the Wfb, noise, and Win definitions from [7], the error can be brought down to 9.58e-10:
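The Win definition taken from [7] draws every weight as ±input_scaling with equal probability (a binomial construction), rather than uniformly from an interval. A quick check of what that expression produces (the sizes here are hypothetical):

```python
import numpy as np

np.random.seed(0)
Nx, Nu, input_scaling = 500, 1, 0.01

# np.random.binomial(1, 0.5, shape) gives 0/1; 2*b - 1 gives -1/+1;
# multiplying by input_scaling gives -input_scaling/+input_scaling.
Win = (2.0 * np.random.binomial(1, 0.5, [Nx, 1 + Nu]) - 1.0) * input_scaling
print(np.array_equal(np.unique(np.abs(Win)), [input_scaling]))  # True
```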

# -*- coding: utf-8 -*-
"""
原代碼, Mantas Lukoševičius(開山祖師Jaeger的學生), http://mantas.info
數據集可以到Lukoševičius那邊下載
"""

import matplotlib.pyplot as plt
import scipy
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, ElasticNetCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

np.random.seed(42)

"""
Step 1. 讀取數據(MackeyGlass_t17),尺寸為(10000, 1)
"""
data = pd.read_table('MackeyGlass_t17.txt', header=None).to_numpy()


"""
Step 2. 定義訓練集長度len_train、測試集長度len_test、暫態長度len_transient、誤差計算長度len_error
"""
len_train = 2000
len_test = 2000
len_transient = min(int(len_train / 10), 100)
len_error = 500


"""
Step 3. 畫出數據前1000筆的曲線
"""
plt.figure('A sample of data').clear()
plt.plot(data[0:1000])
plt.title('A sample of data')


"""
Step 4. 定義ESN架構
輸入層神經元數Nu:
    - 等於輸入資料的特徵數
輸出層神經元數Ny:
    - 等於目標資料的特徵數
隱藏層神經元數Nx:
    - 建議最少100顆,且每次增幅單位為50顆
    - 沒看過超過1000的
洩漏率a:
    - 加強短期記憶能力,範圍在[0, 1]
    - 若a=1則ESN;a<1則Leaky ESN(P7)
輸入層的縮放因子input_scaling
    - 範圍在[0, 1]
稀疏程度sparsity:
    - 建議在0.01(1%)~0.05(5%),也有人建議0.05(5%)~0.2(20%)
指定譜半徑rho:
    - 範圍在[0, 1)
使用偏權值use_bias
使用輸出層回饋權重(Wfb)use_fb
"""
Nu = 1
Ny = 1
Nx = 500
a = 0.3
input_scaling = 0.01
sparsity = 0.05
rho = 0.99
use_bias = True
use_fb = True
noise_level = 0.00


"""
Step 5. 初始化輸入層權重Win、隱藏層權重W
輸入層權重Win:
    - 如果有偏權值,則尺寸為(隱藏層神經元數, 1+輸入層神經元數)
    - 否則(隱藏層神經元數, 輸入層神經元數)
    - 常見範圍在[-1, 1]或者[-0.5, 0.5]
    - 建議用常態分布
輸出層回饋權重Wfb:
    - 尺寸為(輸出層神經元數, 1)
    - 常見範圍在[-1, 1]或者[-0.5, 0.5]
    - 建議用常態分布
初始隱藏層權重W_0:
    - 尺寸為(隱藏層神經元數, 隱藏層神經元數)
    - 常見的初始範圍在[-1, 1]或者[-0.5, 0.5]
    - 這邊提供另一種寫法,借鑑於Madalina Ciortan,這個方法才真正達到稀疏效果
    - W_0 = scipy.sparse.rand(resSize, resSize, density=sparsity).todense() - 0.5
    - scipy.sparse.rand(列數, 行數, density=稀疏程度).todense()
    - 假設density=0.1(10%)、列數=10、行數=10,則10x10的矩陣中只有總計10個元素不為零,而且是不規則零散分布在矩陣中
初始隱藏層權重的譜半徑rhoW:
    - max( abs( W_0 ))
    - rhoW<1,隱藏層才具有ECHO特性
隱藏層權重W:
    - W = W_0 * rho/rhoW
"""
if use_bias==True:
    # Win = (np.random.rand(Nx, 1+Nu)-0.5) * input_scaling
    Win = (2.0*np.random.binomial(1, 0.5 , [Nx, 1+Nu]) - 1.0) * input_scaling
else:
    # Win = (np.random.rand(Nx, Nu)-0.5) * input_scaling
    Win = (2.0*np.random.binomial(1, 0.5 , [Nx, Nu]) - 1.0) * input_scaling
if use_fb==True:
    # Wfb = np.random.rand(Nx,Ny)-0.5
    Wfb = (2.0*np.random.binomial(1, 0.5 , [Nx, Ny]) - 1.0)
else:
    Wfb = np.zeros([Nx,Ny])
# W_0 = np.random.rand(Nx,Nx)-0.5
W_0 = scipy.sparse.rand(Nx, Nx, density=sparsity).todense()
W_0[np.where(W_0 > 0)] = W_0[np.where(W_0 > 0)] - 0.5
rhoW = np.max(np.abs(np.linalg.eig(W_0)[0])) 
W = W_0 * rho/rhoW


"""
Step 5. 收集:取得每一時刻的狀態x(t),然後存在所有時刻的狀態X中
    - t=0時,x(0)為一個尺寸(隱藏層神經元數,1)的零矩陣,y(0)為一個尺寸(輸出層神經元數,1)的零矩陣
    - 無偏權值、無回饋:x(t) = (1-a)*x(t-1) + a*tanh( Win*   u(t)  + W*x(t-1) + 0          );X尺寸=(筆數,   輸入層神經元數+隱藏層神經元數            )
    - 無偏權值、有回饋:x(t) = (1-a)*x(t-1) + a*tanh( Win*   u(t)  + W*x(t-1) + Wfb*y(t-1) );X尺寸=(筆數,   輸入層神經元數+隱藏層神經元數+輸出層神經元數)
    - 有偏權值、無回饋:x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) + 0          );X尺寸=(筆數, 1+輸入層神經元數+隱藏層神經元數            )
    - 有偏權值、有回饋:x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) + Wfb*y(t-1) );X尺寸=(筆數, 1+輸入層神經元數+隱藏層神經元數+輸出層神經元數)
    - X應為一個尺寸(隱藏層神經元數,筆數)
    - 在訓練Wout前記得要刪掉X的前len_transient筆
"""
X = []
x = np.zeros([Nx, 1])
y = np.zeros([Ny, 1])
for t in range(len_train):
    u = data[t:t+1]
    if use_bias==True:
        x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level)
    else:
        x = (1-a)*x + a*np.tanh( np.dot(Win, u) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level)
    if use_bias==False and use_fb==False:
        X.append(np.vstack([u, x]))
    elif use_bias==False and use_fb==True:
        X.append(np.vstack([u, x, y]))
    elif use_bias==True and use_fb==False:
        X.append(np.vstack([1, u, x]))
    elif use_bias==True and use_fb==True:
        X.append(np.vstack([1, u, x, y]))
    y = data[t:t+1]

X = X[len_transient:]
X = np.array(X).reshape(len_train-len_transient, -1)



"""
Step 6. 訓練:利用所有時刻的狀態X、所有時刻的目標Yt,以Ridge回歸求得輸出層權重Wout
    - 輸出層權重的尺寸為(輸出層神經元數, 1+輸入層神經元數+隱藏層神經元數)
    - 借鑑Madalina Ciortan,我直接用Ridge取代了Wout
    - L1正則的回歸器效果都不太好(Lasso、ElasticNet),我猜是因為很多權重被刪掉的關係
    - 無偏權值、無回饋:Wout尺寸=(輸出層神經元數,   輸入層神經元數+隱藏層神經元數            )
    - 無偏權值、有回饋:Wout尺寸=(輸出層神經元數,   輸入層神經元數+隱藏層神經元數+輸出層神經元數)
    - 有偏權值、無回饋:Wout尺寸=(輸出層神經元數, 1+輸入層神經元數+隱藏層神經元數            )
    - 有偏權值、有回饋:Wout尺寸=(輸出層神經元數, 1+輸入層神經元數+隱藏層神經元數+輸出層神經元數)
"""
y_train = data[len_transient+1:len_train+1]
ridge = Ridge(alpha=1e-8)
ridge.fit(X, y_train)

    
"""
Step 7. 測試
    - 第一筆測試輸入u(t-1)=data[2000]
    - 第一筆狀態x(t-1)延續於Step 5
    - Y是用來記錄模型在測試階段各時刻的輸出
    - 作者提供兩種模型輸出:遞迴式多步預測和單步預測,選一個用就可以了
    - 無偏權值、無回饋:y(t) = Wout*[  u(t);x(t)]
    - 無偏權值、有回饋:y(t) = Wout*[  u(t);x(t);y(t-1)]
    - 有偏權值、無回饋:y(t) = Wout*[1;u(t);x(t)]
    - 有偏權值、有回饋:y(t) = Wout*[1;u(t);x(t);y(t-1)]  
"""
y_pred = []
u = data[len_train:len_train+1]
for t in range(len_test):
    if use_bias==True:
        x = (1-a)*x + a*np.tanh( np.dot(Win, np.vstack([1, u])) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level)
    else:
        x = (1-a)*x + a*np.tanh( np.dot(Win, u) + np.dot(W, x) + np.dot(Wfb, y) + np.random.rand(Nx, 1)*noise_level)
    if use_bias==False and use_fb==False:
        y = ridge.predict(np.vstack([u, x]).T)
    elif use_bias==False and use_fb==True:
        y = ridge.predict(np.vstack([u, x, y]).T)
    elif use_bias==True and use_fb==False:
        y = ridge.predict(np.vstack([1, u, x]).T)
    elif use_bias==True and use_fb==True:
        y = ridge.predict(np.vstack([1, u, x, y]).T)

    y_pred.append(y)
    # Recursive multi-step forecasting: the current output becomes the next input
    # u = y
    # One-step-ahead forecasting: the current input comes from measurements or offline data
    u = data[len_train+t+1:len_train+t+2]
    y = data[len_train+t:len_train+t+1]
    
y_pred = np.array(y_pred)
y_pred = y_pred.reshape(-1, Ny)


"""
Step 8. 採MSE計算測試集中前len_error筆誤差
    - 為什麼只計算len_error筆誤差,因為Mantas當初是採遞迴式多步預測作為輸出方式
    - 遞迴式多步預測(Recursive Multi-Step Forecasting)的問題在於預測能力會隨著時間而衰弱,從本代碼的執行結果中也顯而易見
    - 所以我猜這就是他只取500筆作計算的原因
"""
mse = mean_squared_error(data[len_train+1:len_train+1+len_error], y_pred[:len_error])
print('MSE = ' + str( mse ))
    

"""
畫出測試集輸入訊號
"""
plt.figure('Target and generated signals').clear()
plt.plot(data[len_train+1:len_train+1+len_test], 'g')
plt.plot( y_pred, 'b' )
plt.title('Target and generated signals $y(n)$ starting at $n=0$')
plt.legend(['Target signal', 'Free-running predicted signal'])


"""
畫出20條權重在t=100~300的狀態響應X
"""
plt.figure('Some reservoir activations').clear()
plt.plot( X[0:200, 0:20] )# 看起來是t=0~200,但加上沖洗時間initLen=100後應為t=100~300
plt.title('Some reservoir activations $\mathbf{x}(n)$')


"""
畫出輸出權重的值
"""
plt.figure('Output weights').clear()
if use_bias==False and use_fb==False:
    plt.bar( range(Nu+Nx), ridge.coef_.ravel() )
elif use_bias==False and use_fb==True:
    plt.bar( range(Nu+Nx+Ny), ridge.coef_.ravel() )
elif use_bias==True and use_fb==False:
    plt.bar( range(1+Nu+Nx), ridge.coef_.ravel() )
elif use_bias==True and use_fb==True:
    plt.bar( range(1+Nu+Nx+Ny), ridge.coef_.ravel() )
plt.title('Output weights $\mathbf{W}^{out}$')

plt.show()
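To see the shapes in the update equation at a glance, here is one leaky state update with bias and output feedback, matching the "bias, feedback" formula in Step 5 above (toy sizes, zero noise):

```python
import numpy as np

np.random.seed(0)
Nu, Nx, Ny, a = 1, 50, 1, 0.3

Win = np.random.rand(Nx, 1 + Nu) - 0.5   # (Nx, 1+Nu)
W = np.random.rand(Nx, Nx) - 0.5         # (Nx, Nx)
Wfb = np.random.rand(Nx, Ny) - 0.5       # (Nx, Ny)

x = np.zeros((Nx, 1))   # initial state x(0)
y = np.zeros((Ny, 1))   # initial output y(0)
u = np.array([[0.5]])   # one input sample u(t)

# x(t) = (1-a)*x(t-1) + a*tanh( Win*[1;u(t)] + W*x(t-1) + Wfb*y(t-1) )
x = (1 - a) * x + a * np.tanh(Win @ np.vstack([1, u]) + W @ x + Wfb @ y)
print(x.shape)  # (50, 1)
```

Since tanh is bounded by 1 and x started at zero, every entry of the new state lies within ±a here.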

References

[1] An Introduction to the Echo State Network and its Applications in Power System: introductory paper (with applications); a must-read if you are new to ESNs
[2] Adaptive Nonlinear System Identification with Echo State Networks: by Jaeger, the inventor of ESN; also introductory and a must-read for newcomers
[3] A Practical Guide to Applying Echo State Networks: by Lukoševičius (Jaeger's master's/PhD student); the key ideas are on pp. 2-3
[4] Predictive Modeling with Echo State Networks: adds Wfb and reports experiments on each parameter; Section 2.1 is the key part
[5] Echo State Networks for Adaptive Filtering: the key idea is Algorithm 1 on p. 18; it finally explains the 1.25 in Lukoševičius' original code
[6] Simple Echo State Network implementations: Lukoševičius' simple code, matching [2]
[7] Gentle introduction to Echo State Networks: code by Ciortan (a Medium user); it can be traced back to [8]
[8] Reservoir computing approaches for representation and classification of multivariate time series: time-series classification with a bidirectional mechanism, dimensionality reduction, and an encoder/decoder
[9] An Introduction to: Reservoir Computing and Echo State Networks: introductory slides

