This is a neural network with a single hidden layer:

[Figure: diagram of the neural network]

Assumptions

The input layer has 3 nodes. The input $X$ contains 3 samples, one per row, and their labels are $Y$:

$X = \begin{bmatrix} 1 & 2 & 0 \\ 2 & -3 & 2 \\ -1 & -1 & 3 \\ \end{bmatrix}, Y = \begin{bmatrix} 1 \\ 2 \\ -3 \\ \end{bmatrix}$

$W_1 = \begin{bmatrix} -1 & 0 \\ -1 & 1 \\ 1 & -1 \\ \end{bmatrix}$

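The hidden layer has 2 nodes and weight matrix $W_1$, so the linear combination is $Z=XW_1$.
The value after applying the activation function $\sigma$ is $K$, that is, $K = \sigma(Z)$.
Let the activation function $\sigma$ be the $Relu$ function, $\sigma(x)=Relu(x)=max(x,0)$, applied element-wise.
The output layer has 1 node and weight matrix $W_2$, so the linear output is $O=KW_2$.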

$W_2 = \begin{bmatrix} 1 \\ -2 \\ \end{bmatrix}$

The output is compared with the labels to compute the loss $J$. Assume a squared-error (least-squares) loss summed over the samples:

$J = \sum (\frac{1}{2}(O-Y)^2)$
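The forward pass and loss defined above can be sketched in a few lines. This is a minimal illustration that assumes NumPy (the article names no library); the helper names `forward` and `loss` are hypothetical.

```python
import numpy as np

# Values copied from the assumptions above; each row of X is one sample.
X = np.array([[1, 2, 0], [2, -3, 2], [-1, -1, 3]])
Y = np.array([[1], [2], [-3]])
W1 = np.array([[-1, 0], [-1, 1], [1, -1]])
W2 = np.array([[1], [-2]])


def forward(X, W1, W2):
    """Return (Z, K, O) for the one-hidden-layer network."""
    Z = X @ W1            # linear combination in the hidden layer
    K = np.maximum(Z, 0)  # element-wise ReLU
    O = K @ W2            # linear output
    return Z, K, O


def loss(O, Y):
    """Summed squared-error loss J = sum(0.5 * (O - Y)^2)."""
    return 0.5 * np.sum((O - Y) ** 2)


Z, K, O = forward(X, W1, W2)
J = loss(O, Y)
print(J)  # 45.0
```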

With this loss, the gradient at the output layer is:

$G_{out} = \frac{\partial J}{\partial O}=O-Y$

Gradient with respect to the output-layer weights $W_2$:

$\begin{equation} \begin{aligned} G_2 &= \frac{\partial J}{\partial W_2} = \frac{\partial J}{\partial O} \frac{\partial O}{\partial W_2} \\\\ &= ((G_{out})^T \cdot K)^T = K^TG_{out} \end{aligned} \end{equation}$

Derivative of the $Relu$ function (the value at $x = 0$ is taken to be 1 by convention):

$\sigma^{\prime}(x) = \begin{cases} 0 \quad x < 0 \\ 1 \quad x \ge 0 \\ \end{cases}$

First define a temporary matrix $G_{temp}$:

$G_{temp} = (G_{out} \cdot W^T_2) \circ \sigma^{\prime}(Z)$

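Here $\cdot$ denotes ordinary matrix multiplication and $\circ$ denotes the Hadamard product, the element-wise product of two matrices of the same shape.

Gradient with respect to the hidden-layer weights $W_1$: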

$\begin{equation} \begin{aligned} G_1 &= \frac{\partial J}{\partial W_1} = \frac{\partial J}{\partial O} \frac{\partial O}{\partial K} \frac{\partial K}{\partial Z} \frac{\partial Z}{\partial W_1} \\\\ &= (((G_{out} \cdot W^T_2) \circ \sigma^{\prime} (Z))^T \cdot X)^T \\\\ &= ((G_{temp})^T \cdot X)^T \\\\ &= X^TG_{temp} \end{aligned} \end{equation}$
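One way to sanity-check the closed-form gradients $G_1 = X^TG_{temp}$ and $G_2 = K^TG_{out}$ is to compare them with central finite differences of the loss. The sketch below assumes NumPy; the helper names `analytic_grads` and `numeric_grad` and the step size `eps` are hypothetical choices, not part of the original article.

```python
import numpy as np

# Values from the assumptions section, stored as floats so they can be perturbed.
X = np.array([[1, 2, 0], [2, -3, 2], [-1, -1, 3]], dtype=float)
Y = np.array([[1], [2], [-3]], dtype=float)
W1 = np.array([[-1, 0], [-1, 1], [1, -1]], dtype=float)
W2 = np.array([[1], [-2]], dtype=float)


def loss(W1, W2):
    K = np.maximum(X @ W1, 0)
    return 0.5 * np.sum((K @ W2 - Y) ** 2)


def analytic_grads(W1, W2):
    """Gradients from the formulas in the text."""
    Z = X @ W1
    K = np.maximum(Z, 0)
    G_out = K @ W2 - Y
    G2 = K.T @ G_out
    G_temp = (G_out @ W2.T) * (Z >= 0)  # Hadamard product with the ReLU derivative
    G1 = X.T @ G_temp
    return G1, G2


def numeric_grad(f, W, eps=1e-6):
    """Central finite-difference estimate of df/dW, one entry at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        G[idx] = (f(W + step) - f(W - step)) / (2 * eps)
    return G


G1, G2 = analytic_grads(W1, W2)
N1 = numeric_grad(lambda W: loss(W, W2), W1)
N2 = numeric_grad(lambda W: loss(W1, W), W2)
print(np.allclose(G1, N1, atol=1e-4), np.allclose(G2, N2, atol=1e-4))
```

The check is only meaningful away from the kink of $Relu$ at 0; here every entry of $Z$ is at least 2 away from zero, so a perturbation of size `eps` does not change which units are active.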

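The gradients $G_1,G_2$ can be used to update the weights $W_1,W_2$, giving the new weights $W_1^{new},W_2^{new}$.

Assume the update uses stochastic gradient descent with a learning rate of $0.1$; then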

$\begin{cases} W_1^{new} = W_1 - 0.1 \times G_1 \\ W_2^{new} = W_2 - 0.1 \times G_2 \\ \end{cases}$

Problem

Compute the matrices $Z,K,O,G_{out},G_2,\sigma^{\prime}(Z),G_{temp},G_1,W_1^{new},W_2^{new}$

The solution below skips the derivations and substitutes directly into the final formulas.

$Z$

From the linear combination $Z=XW_1$:

$\begin{equation} \begin{aligned} Z &= XW_1 \\\\ &= \begin{bmatrix} 1 & 2 & 0 \\ 2 & -3 & 2 \\ -1 & -1 & 3 \\ \end{bmatrix} \cdot \begin{bmatrix} -1 & 0 \\ -1 & 1 \\ 1 & -1 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -3 & 2 \\ 3 & -5 \\ 5 & -4 \\ \end{bmatrix} \end{aligned} \end{equation}$
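The product above can be reproduced numerically. A minimal sketch, assuming NumPy, with `X` and `W1` copied from the assumptions:

```python
import numpy as np

X = np.array([[1, 2, 0], [2, -3, 2], [-1, -1, 3]])
W1 = np.array([[-1, 0], [-1, 1], [1, -1]])

# (3 samples x 3 inputs) @ (3 inputs x 2 hidden nodes) -> 3 x 2
Z = X @ W1
print(Z.tolist())  # [[-3, 2], [3, -5], [5, -4]]
```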

$K$

From $K = \sigma(Z)$, where the activation function $\sigma$ is the $Relu$ function, $\sigma(x)=Relu(x)=max(x,0)$:

$\begin{equation} \begin{aligned} K &= \sigma(Z) \\\\ &= \sigma\ ( \begin{bmatrix} -3 & 2 \\ 3 & -5 \\ 5 & -4 \\ \end{bmatrix} ) \\\\ &= \begin{bmatrix} 0 & 2 \\ 3 & 0 \\ 5 & 0 \\ \end{bmatrix} \end{aligned} \end{equation}$
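As a quick check of this step, the element-wise $Relu$ can be written with `np.maximum`. This is a sketch assuming NumPy, starting from the matrix $Z$ obtained above:

```python
import numpy as np

Z = np.array([[-3, 2], [3, -5], [5, -4]])

# ReLU: keep positive entries, replace negative entries with 0.
K = np.maximum(Z, 0)
print(K.tolist())  # [[0, 2], [3, 0], [5, 0]]
```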

$O$

The output layer has 1 node with weight matrix $W_2$. From the linear output $O=KW_2$:

$\begin{equation} \begin{aligned} O &= KW_2 \\\\ &= \begin{bmatrix} 0 & 2 \\ 3 & 0 \\ 5 & 0 \\ \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -2 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -4 \\ 3 \\ 5 \\ \end{bmatrix} \end{aligned} \end{equation}$
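The same output can be computed with a single matrix product. A minimal sketch, assuming NumPy, with $K$ and $W_2$ taken from the text:

```python
import numpy as np

K = np.array([[0, 2], [3, 0], [5, 0]])
W2 = np.array([[1], [-2]])

# (3 samples x 2 hidden nodes) @ (2 hidden nodes x 1 output) -> 3 x 1
O = K @ W2
print(O.tolist())  # [[-4], [3], [5]]
```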

$G_{out}$

From the output-layer gradient $G_{out} = O-Y$:

$\begin{equation} \begin{aligned} G_{out} &= O-Y \\\\ &= \begin{bmatrix} -4 \\ 3 \\ 5 \\ \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \\ -3 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -5 \\ 1 \\ 8 \\ \end{bmatrix} \end{aligned} \end{equation}$
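The residual $O-Y$ also gives the loss value directly, since $J$ is half the sum of its squared entries. A short sketch, assuming NumPy:

```python
import numpy as np

O = np.array([[-4], [3], [5]])
Y = np.array([[1], [2], [-3]])

G_out = O - Y                  # dJ/dO, one entry per sample
J = 0.5 * np.sum(G_out ** 2)   # summed squared-error loss
print(G_out.tolist())  # [[-5], [1], [8]]
print(J)               # 45.0
```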

$G_2$

From the gradient for the output-layer weights, $G_2 = K^TG_{out}$, where $K^T$ is the transpose of $K$:

$K = \begin{bmatrix} 0 & 2 \\ 3 & 0 \\ 5 & 0 \\ \end{bmatrix}, K^T = \begin{bmatrix} 0 & 3 & 5 \\ 2 & 0 & 0 \\ \end{bmatrix}$

$\begin{equation} \begin{aligned} G_2 &= K^TG_{out} \\\\ &= \begin{bmatrix} 0 & 3 & 5 \\ 2 & 0 & 0 \\ \end{bmatrix} \cdot \begin{bmatrix} -5 \\ 1 \\ 8 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} 43 \\ -10 \\ \end{bmatrix} \end{aligned} \end{equation}$
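The product $K^TG_{out}$ can be read as a sum over samples of each sample's hidden activations scaled by its output error. The sketch below, assuming NumPy, computes it both ways; the name `G2_by_sample` is hypothetical.

```python
import numpy as np

K = np.array([[0, 2], [3, 0], [5, 0]])
G_out = np.array([[-5], [1], [8]])

G2 = K.T @ G_out  # (2 x 3) @ (3 x 1) -> 2 x 1

# Same quantity accumulated one sample at a time.
G2_by_sample = sum(np.outer(K[i], G_out[i]) for i in range(len(K)))

print(G2.tolist())            # [[43], [-10]]
print(G2_by_sample.tolist())  # [[43], [-10]]
```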

$\sigma^{\prime}(Z)$

Here $\sigma^{\prime}$ is the derivative of the $Relu$ function:

$\sigma^{\prime}(x) = \begin{cases} 0 \quad x < 0 \\ 1 \quad x \ge 0 \\ \end{cases}$

$\begin{equation} \begin{aligned} \sigma^{\prime}(Z) &= \sigma^{\prime}( \begin{bmatrix} -3 & 2 \\ 3 & -5 \\ 5 & -4 \\ \end{bmatrix}) \\\\ &= \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ \end{bmatrix} \end{aligned} \end{equation}$
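In code, this derivative is just a 0/1 mask over $Z$. A minimal sketch, assuming NumPy; the variable name `dsigma` is hypothetical, and the comparison uses `>=` to match the article's convention of 1 at $x = 0$:

```python
import numpy as np

Z = np.array([[-3, 2], [3, -5], [5, -4]])

# 1 where the entry is >= 0, otherwise 0.
dsigma = (Z >= 0).astype(int)
print(dsigma.tolist())  # [[0, 1], [1, 0], [1, 0]]
```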

$G_{temp}$

From the temporary matrix $G_{temp} = (G_{out} \cdot W^T_2) \circ \sigma^{\prime}(Z)$,
where $\cdot$ denotes ordinary matrix multiplication, $\circ$ denotes the Hadamard (element-wise) product, and $W_2^T$ is the transpose of $W_2$:

$W_2 = \begin{bmatrix} 1 \\ -2 \\ \end{bmatrix}, W_2^T = \begin{bmatrix} 1 & -2 \\ \end{bmatrix}$

$\begin{equation} \begin{aligned} G_{temp} &= (G_{out} \cdot W^T_2) \circ \sigma^{\prime}(Z) \\\\ &= ( \begin{bmatrix} -5 \\ 1 \\ 8 \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & -2 \end{bmatrix} ) \circ \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -5 & 10 \\ 1 & -2 \\ 8 & -16 \\ \end{bmatrix} \circ \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 1 & 0 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} 0 & 10 \\ 1 & 0 \\ 8 & 0 \\ \end{bmatrix} \end{aligned} \end{equation}$
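The two kinds of product in this step map onto two different NumPy operators: `@` for matrix multiplication and `*` for the Hadamard product. A sketch under that assumption, with the intermediate name `back` chosen only for illustration:

```python
import numpy as np

G_out = np.array([[-5], [1], [8]])
W2 = np.array([[1], [-2]])
dsigma = np.array([[0, 1], [1, 0], [1, 0]])  # sigma'(Z) from the previous step

back = G_out @ W2.T     # matrix product: (3 x 1) @ (1 x 2) -> 3 x 2
G_temp = back * dsigma  # Hadamard product: entry-by-entry

print(back.tolist())    # [[-5, 10], [1, -2], [8, -16]]
print(G_temp.tolist())  # [[0, 10], [1, 0], [8, 0]]
```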

$G_1$

From the gradient for the hidden-layer weights, $G_1 = X^TG_{temp}$, where $X^T$ is the transpose of $X$:

$X = \begin{bmatrix} 1 & 2 & 0 \\ 2 & -3 & 2 \\ -1 & -1 & 3 \\ \end{bmatrix}, X^T = \begin{bmatrix} 1 & 2 & -1 \\ 2 & -3 & -1 \\ 0 & 2 & 3 \\ \end{bmatrix}$

$\begin{equation} \begin{aligned} G_1 &= X^TG_{temp} \\\\ &= \begin{bmatrix} 1 & 2 & -1 \\ 2 & -3 & -1 \\ 0 & 2 & 3 \\ \end{bmatrix} \cdot \begin{bmatrix} 0 & 10 \\ 1 & 0 \\ 8 & 0 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -6 & 10 \\ -11 & 20 \\ 26 & 0 \\ \end{bmatrix} \end{aligned} \end{equation}$
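This last gradient can be checked the same way. A minimal sketch, assuming NumPy, which also confirms that $G_1$ has the same shape as $W_1$ so that the two can be subtracted in the update step:

```python
import numpy as np

X = np.array([[1, 2, 0], [2, -3, 2], [-1, -1, 3]])
G_temp = np.array([[0, 10], [1, 0], [8, 0]])

G1 = X.T @ G_temp  # (3 x 3) @ (3 x 2) -> 3 x 2, the shape of W1
print(G1.tolist())  # [[-6, 10], [-11, 20], [26, 0]]
```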

$W_1^{new},W_2^{new}$

The gradients $G_1,G_2$ are used to update the weights $W_1,W_2$, giving the new weights $W_1^{new},W_2^{new}$.
The update uses stochastic gradient descent with a learning rate of $0.1$, so

$\begin{cases} W_1^{new} = W_1 - 0.1 \times G_1 \\ W_2^{new} = W_2 - 0.1 \times G_2 \\ \end{cases}$

$\begin{equation} \begin{aligned} W_1^{new} &= W_1 - 0.1 \times G_1 \\\\ &= \begin{bmatrix} -1 & 0 \\ -1 & 1 \\ 1 & -1 \\ \end{bmatrix} - 0.1 \times \begin{bmatrix} -6 & 10 \\ -11 & 20 \\ 26 & 0 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -1 & 0 \\ -1 & 1 \\ 1 & -1 \\ \end{bmatrix} - \begin{bmatrix} -0.6 & 1 \\ -1.1 & 2 \\ 2.6 & 0 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -0.4 & -1 \\ 0.1 & -1 \\ -1.6 & -1 \\ \end{bmatrix} \end{aligned} \end{equation}$

$\begin{equation} \begin{aligned} W_2^{new} &= W_2 - 0.1 \times G_2 \\\\ &= \begin{bmatrix} 1 \\ -2 \\ \end{bmatrix} - 0.1 \times \begin{bmatrix} 43 \\ -10 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} 1 \\ -2 \\ \end{bmatrix} - \begin{bmatrix} 4.3 \\ -1 \\ \end{bmatrix} \\\\ &= \begin{bmatrix} -3.3 \\ -1 \\ \end{bmatrix} \end{aligned} \end{equation}$
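The update step can be sketched as follows, assuming NumPy. The variable `lr` and the helper `loss` are hypothetical names; the helper is included only to compare the loss before and after the step.

```python
import numpy as np

X = np.array([[1, 2, 0], [2, -3, 2], [-1, -1, 3]])
Y = np.array([[1], [2], [-3]])
W1 = np.array([[-1, 0], [-1, 1], [1, -1]])
W2 = np.array([[1], [-2]])
G1 = np.array([[-6, 10], [-11, 20], [26, 0]])
G2 = np.array([[43], [-10]])

lr = 0.1  # learning rate from the text
W1_new = W1 - lr * G1
W2_new = W2 - lr * G2

# Rounded to one decimal only to hide floating-point noise when printing.
print(np.round(W1_new, 1).tolist())  # [[-0.4, -1.0], [0.1, -1.0], [-1.6, -1.0]]
print(np.round(W2_new, 1).tolist())  # [[-3.3], [-1.0]]


def loss(W1, W2):
    K = np.maximum(X @ W1, 0)
    return 0.5 * np.sum((K @ W2 - Y) ** 2)


print(loss(W1, W2), loss(W1_new, W2_new))  # 45.0 7.0
```

On these numbers a single step lowers the summed loss from 45 to 7.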
