IF4074 Weeks 9–15
• Week 9 (18 October 2021): LSTM + RNN architectures + Major Assignment (Tubes) 2
• Week 10 (25 October 2021): RNN exercises + BPTT
• Week 11 (1 November 2021): Guest lecture (sharing on ML applications at Gojek)
• Week 12 (8 November 2021): RNN lab session
• Week 13 (15 November 2021): Feature Engineering 1 / experiment design assignment
• Week 14 (22 November 2021): Quiz 2
• Week 15 (29 November 2021): Feature Engineering 2 lab session
04 LSTM: What & Why
Advanced Machine Learning (Pembelajaran Mesin Lanjut)
Masayu Leylia Khodra (masayu@informatika.org)
KK IF – Teknik Informatika – STEI ITB
Module 4: Recurrent Neural Network
Long Short-Term Memory (LSTM): Why
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
h_t = f(U x_t + W h_{t-1} + b_xh)
y_t = f(V h_t + b_hy)
RNN: long-term dependency problem
[Diagram: RNN cell with input x_t, previous hidden state h_{t-1}, hidden state h_t, and weight matrices U, W, V]
• RNNs suffer from short-term memory (forward propagation).
• RNNs suffer from the vanishing gradient problem (backward propagation): in practice they fail to learn dependencies longer than about 5–10 time steps, and in the worst case this may completely stop the neural network from further training.
Long Short-Term Memory (LSTM): What
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTMs are explicitly designed to avoid the long-term dependency problem.
Introduced by Hochreiter & Schmidhuber (1997)
An LSTM is a special kind of RNN; the difference lies in the operations inside the LSTM cell. In a plain RNN, the repeating module has a very simple structure, whereas in an LSTM the repeating module contains four interacting layers.
LSTM: Cell State & Gates

Cell State
• acts as the "memory" of the network.
• acts as a transport highway that carries relevant information throughout the processing of the sequence.

Forget Gate
• decides what information should be thrown away or kept.
• values closer to 0 mean forget; values closer to 1 mean keep.

Input Gate
• decides what information from the current step is relevant to add.
• updates the cell state from the hidden state and the current input.

Output Gate
• decides what the next hidden state should be.
• the hidden state contains information on previous inputs and is also used for predictions.
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
Forget Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
f_t = σ(U_f x_t + W_f h_{t-1} + b_f)
A value of 1 means "completely keep this," while a value of 0 means "completely get rid of this."
[Diagram: forget gate f_t computed from x_t and h_{t-1}, applied to the previous cell state C_{t-1}]
Input Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
i_t = σ(U_i x_t + W_i h_{t-1} + b_i)
C̃_t = tanh(U_c x_t + W_c h_{t-1} + b_c)
[Diagram: input gate i_t and candidate cell state C̃_t computed from x_t and h_{t-1}]
Cell State
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
[Diagram: cell state update combining f_t ⊙ C_{t-1} and i_t ⊙ C̃_t]
Output Gate
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
o_t = σ(U_o x_t + W_o h_{t-1} + b_o)
h_t = o_t ⊙ tanh(C_t)
[Diagram: output gate o_t applied to tanh(C_t) to produce the new hidden state h_t]
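Putting the four gate equations together, a minimal NumPy sketch of a single LSTM step (illustrative only; the dict-based parameter layout U, W, b is an assumption, not any library's API):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, U, W, b):
    # U, W, b each hold one entry per gate: 'f', 'i', 'c', 'o'
    f_t = sigmoid(U['f'] @ x_t + W['f'] @ h_prev + b['f'])      # forget gate
    i_t = sigmoid(U['i'] @ x_t + W['i'] @ h_prev + b['i'])      # input gate
    C_tilde = np.tanh(U['c'] @ x_t + W['c'] @ h_prev + b['c'])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                          # new cell state
    o_t = sigmoid(U['o'] @ x_t + W['o'] @ h_prev + b['o'])      # output gate
    h_t = o_t * np.tanh(C_t)                                    # new hidden state
    return h_t, C_t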
LSTM Forward Propagation: Example
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
[Diagram: LSTM with 2-dimensional input x and 1-dimensional hidden state h; parameters U_f, U_i, U_c, U_o and W_f, W_i, W_c, W_o]

A1    A2    Target
1     2     0.5
0.5   3     1
…
f_t = σ(U_f x_t + W_f h_{t-1} + b_f)
i_t = σ(U_i x_t + W_i h_{t-1} + b_i)
C̃_t = tanh(U_c x_t + W_c h_{t-1} + b_c)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
o_t = σ(U_o x_t + W_o h_{t-1} + b_o)
h_t = o_t ⊙ tanh(C_t)
Initial parameters:
U_f = [0.700, 0.450]    W_f = 0.100    b_f = 0.150
U_i = [0.950, 0.800]    W_i = 0.800    b_i = 0.650
U_c = [0.450, 0.250]    W_c = 0.150    b_c = 0.200
U_o = [0.600, 0.400]    W_o = 0.250    b_o = 0.100
h_0 = 0, C_0 = 0
Computing h_t and C_t: Timestep t1

x_1 = [1, 2], target = 0.5; h_0 = 0, C_0 = 0

Gate        U·x_t   W·h_{t-1}+b   net     activation
forget      1.600   0.150         1.750   f_1 = 0.852
input       2.550   0.650         3.200   i_1 = 0.961
candidate   0.950   0.200         1.150   C̃_1 = 0.818
output      1.400   0.100         1.500   o_1 = 0.818

C_1 = 0.786, h_1 = 0.536
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
Computing h_t and C_t: Timestep t2

x_2 = [0.5, 3], target = 1; h_1 = 0.536, C_1 = 0.786
Gate        U·x_t   W·h_{t-1}+b   net     activation
forget      1.700   0.204         1.904   f_2 = 0.870
input       2.875   1.079         3.954   i_2 = 0.981
candidate   0.975   0.280         1.255   C̃_2 = 0.850
output      1.500   0.234         1.734   o_2 = 0.850

C_2 = 1.518, h_2 = 0.772
https://medium.com/@aidangomez/let-s-do-this-f9b699de31d9
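As a sanity check, a short NumPy script (not from the slides) that reproduces the two timesteps above with the given parameters:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# parameters from the slides: U maps the 2-dim input, W and b are scalars (hidden size 1)
U = {'f': np.array([0.70, 0.45]), 'i': np.array([0.95, 0.80]),
     'c': np.array([0.45, 0.25]), 'o': np.array([0.60, 0.40])}
W = {'f': 0.10, 'i': 0.80, 'c': 0.15, 'o': 0.25}
b = {'f': 0.15, 'i': 0.65, 'c': 0.20, 'o': 0.10}

h, C = 0.0, 0.0
for x in (np.array([1.0, 2.0]), np.array([0.5, 3.0])):
    f = sigmoid(U['f'] @ x + W['f'] * h + b['f'])
    i = sigmoid(U['i'] @ x + W['i'] * h + b['i'])
    c_tilde = np.tanh(U['c'] @ x + W['c'] * h + b['c'])
    C = f * C + i * c_tilde
    o = sigmoid(U['o'] @ x + W['o'] * h + b['o'])
    h = o * np.tanh(C)
    print(round(float(C), 3), round(float(h), 3))   # 0.786 0.536, then 1.518 0.772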
Implementing LSTM on Keras: Many to One
from keras import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(50, 1)))    # 10 units, processes 50x1 sequences
model.add(Dense(1, activation='linear'))    # linear output as this is a regression problem
https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
[Diagram: many-to-one network with 1-dim input x, 10-dim hidden state h, 1-dim output y, and weights U, W, V]
# example task: predict Amazon stock closing prices from the previous 50 timesteps
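One possible way to train this model (a hedged sketch with dummy data; the optimizer and loss choices are assumptions, not from the slides):

import numpy as np
X = np.random.rand(100, 50, 1)     # 100 dummy sequences, 50 timesteps, 1 feature each
y = np.random.rand(100, 1)         # dummy regression targets
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=5, batch_size=16)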
Number of Parameters

[Diagram: LSTM with 1-dim input x, 10-dim hidden state h, 1-dim output y, and weights U, W, V]

Total parameters = (1+10+1)*4*10 + (10+1)*1 = 491
A simple RNN of the same size needs only 131 parameters.
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)
In general, an LSTM with n units fed by an m-dimensional input and followed by a k-dimensional dense output has (m+n+1)*4*n + (n+1)*k parameters.
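The count can be verified directly in Keras (a small sanity-check snippet, not from the slides):

from keras import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(50, 1)))
model.add(Dense(1, activation='linear'))
print(model.count_params())   # 491 = 4*(1+10+1)*10 + (10+1)*1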
RNN → LSTM → GRU → ReGU
1985: Recurrent nets
1997: LSTM, Bi-RNN
2014: GRU
2017: Residual LSTM
2019: Residual Gated Unit (ReGU)
GRU: no cell state, 2 gates
ReGU: shortcut connection
Summary
• LSTMs avoid the long-term dependency problem.
• LSTMs have a cell state and 3 gates (forget, input, output).
• Computing h_t and C_t.
• Backpropagation Through Time.
03 RNN Architecture
Advanced Machine Learning (Pembelajaran Mesin Lanjut)
Masayu Leylia Khodra (masayu@informatika.org)
KK IF – Teknik Informatika – STEI ITB
Module 4: Recurrent Neural Network
General Architecture
[Diagram: stacked RNN with i-dim input x, hidden layers h1 (j units) through hh (k units), m-dim output y, input weights U and recurrent weights W per layer, and output weights V; the network is unrolled over n timesteps x_1 … x_n]

return_sequences = True/False controls whether a recurrent layer returns the hidden state at every timestep or only at the last one (see the sketch below).
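A minimal shape check of the return_sequences flag (illustrative snippet with random data, not from the slides):

import numpy as np
from keras import Sequential
from keras.layers import SimpleRNN

x = np.random.rand(4, 50, 1)   # 4 sequences, 50 timesteps, 1 feature

last_only = Sequential()
last_only.add(SimpleRNN(10, input_shape=(50, 1)))                         # last state only
per_step = Sequential()
per_step.add(SimpleRNN(10, input_shape=(50, 1), return_sequences=True))   # state at every timestep

print(last_only.predict(x).shape)   # (4, 10)
print(per_step.predict(x).shape)    # (4, 50, 10)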
Architecture
• fixed-sized input vector x_t
• fixed-sized output vector o_t
• RNN state s_t
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
One to many: image captioning
Many to one: text classification
Many to many: machine translation, video frame classification, POS tagging
One to Many: Image Captioning
CNN Encoder (Inception) – RNN Decoder (LSTM) (Vinyals et al., 2014)
Many to One: Text Classification
https://www.oreilly.com/learning/perform-sentiment-analysis-with-lstms-using-tensorflow
Many to Many: Sequence Tagging
https://www.depends-on-the-definition.com/guide-sequence-tagging-neural-networks-python/
The input is a sequence of words, and the output is the sequence of POS tags, one for each word.
Many to Many: Machine Translation
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
● Machine Translation: input is a sequence of words in source language (e.g. German). Output is a sequence of words in target language (e.g. English).
● A key difference is that our output only starts after we have seen the complete input, because the first word of our translated sentences may require information captured from the complete input sequence.
Implementing RNN on Keras: Many to One
from keras import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(10, input_shape=(50, 1)))   # simple recurrent layer, 10 neurons, processes 50x1 sequences
model.add(Dense(1, activation='linear'))        # linear output because this is a regression problem
https://towardsdatascience.com/a-comprehensive-guide-to-working-with-recurrent-neural-networks-in-keras-f3b2d5e2fa7f
[Diagram: many-to-one network with 1-dim input x, 10-dim hidden state h, 1-dim output y, and weights U, W, V]
# example task: predict Amazon stock closing prices from the previous 50 timesteps
Number of Parameters

[Diagram: simple RNN with 1-dim input x, 10-dim hidden state h, 1-dim output y, and weights U, W, V]

Total parameters = (1+10+1)*10 + (10+1)*1 = 131
Simple RNN:
U: matrix of hidden neurons x (input dimension + 1)
W: matrix of hidden neurons x hidden neurons
V: matrix of output neurons x (hidden neurons + 1)
Number of Parameters: Example 2

model = Sequential()   # initialize model
model.add(SimpleRNN(64, input_shape=(50, 1), return_sequences=True))   # 64 neurons
model.add(SimpleRNN(32, return_sequences=True))                        # 32 neurons
model.add(SimpleRNN(16))                                               # 16 neurons
model.add(Dense(8, activation='tanh'))
model.add(Dense(1, activation='linear'))

Total parameters = 4224 + 3104 + 784 + 136 + 9 = 8257
• SimpleRNN(64): (1+64+1)*64 = 4224
• SimpleRNN(32): (64+32+1)*32 = 3104
• SimpleRNN(16): (32+16+1)*16 = 784
• Dense(8): (16+1)*8 = 136
• Dense(1): (8+1)*1 = 9
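The same total can be read off model.summary(), or checked programmatically by continuing the snippet above:

print(model.count_params())   # 8257
model.summary()               # per-layer counts: 4224, 3104, 784, 136, 9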
Bidirectional RNNs
• In many applications we want to output a prediction of y(t) that may depend on the whole input sequence, e.g. co-articulation in speech recognition, right neighbors in POS tagging, etc.
• Bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence.
https://www.cs.toronto.edu/~tingwuwang/rnn_tutorial.pdf
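In Keras a recurrent layer can be made bidirectional with the Bidirectional wrapper (a minimal sketch, not from the slides):

from keras import Sequential
from keras.layers import Bidirectional, SimpleRNN, Dense

model = Sequential()
model.add(Bidirectional(SimpleRNN(10), input_shape=(50, 1)))   # forward + backward RNN; outputs are concatenated (20 dims)
model.add(Dense(1, activation='linear'))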
Bidirectional RNNs for Information Extraction
https://www.depends-on-the-definition.com/sequence-tagging-lstm-crf/
Summary
• Architecture: 1-to-n, n-to-1, n-to-n
• Number of parameters
• RNN
• Bidirectional RNN
• LSTM
05 Backpropagation Through Time
Advanced Machine Learning (Pembelajaran Mesin Lanjut)
Masayu Leylia Khodra (masayu@informatika.org)
KK IF – Teknik Informatika – STEI ITB
Module 4: Recurrent Neural Network
Backpropagation Through Time (BPTT)
• Forward pass: compute the current outputs for the sequence.
• Backward pass: compute δgates_t, δx_t, Δout_{t-1}, δU, δW, δb.
• Update weights: w_new = w_old − η · δw_old

The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on an unfolded network.
Example
[Diagram: LSTM with 2-dim input x and 1-dim hidden state h, parameters U_f, U_i, U_c, U_o and W_f, W_i, W_c, W_o, unfolded over two timesteps]

x_1 = [1, 2]   → h_1 = 0.536, target = 0.5
x_2 = [0.5, 3] → h_2 = 0.772, target = 1.25

U = [0.70  0.95  0.45  0.60]
    [0.45  0.80  0.25  0.40]      (one column per gate: f, i, c, o)
W = [0.100  0.800  0.150  0.250]
LSTM: Backward Propagation Timestep t
f_t = σ(U_f x_t + W_f h_{t-1} + b_f)
i_t = σ(U_i x_t + W_i h_{t-1} + b_i)
C̃_t = tanh(U_c x_t + W_c h_{t-1} + b_c)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
o_t = σ(U_o x_t + W_o h_{t-1} + b_o)
h_t = o_t ⊙ tanh(C_t)
δout_t = Δ_t + Δout_t
δC_t = δout_t ⊙ o_t ⊙ (1 − tanh²(C_t)) + δC_{t+1} ⊙ f_{t+1}
δC̃_t = δC_t ⊙ i_t ⊙ (1 − C̃_t²)
δi_t = δC_t ⊙ C̃_t ⊙ i_t ⊙ (1 − i_t)
δf_t = δC_t ⊙ C_{t-1} ⊙ f_t ⊙ (1 − f_t)
δo_t = δout_t ⊙ tanh(C_t) ⊙ o_t ⊙ (1 − o_t)
δx_t = Uᵀ · δgates_t
Δout_{t-1} = Wᵀ · δgates_t
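A minimal NumPy sketch of these backward equations for one timestep (illustrative only; the function name, argument list, and the [f, i, c̃, o] gate ordering are assumptions, and U is stored as input_dim x 4 gates as on the example slide, so no explicit transpose is needed):

import numpy as np

def lstm_backward_step(delta_t, dout_next, dC_next, f_next,
                       f_t, i_t, c_tilde, o_t, C_t, C_prev, U, W):
    # delta_t: dE/dh_t from the loss; dout_next: gradient arriving from timestep t+1
    dout = delta_t + dout_next
    dC = dout * o_t * (1 - np.tanh(C_t) ** 2) + dC_next * f_next
    d_c_tilde = dC * i_t * (1 - c_tilde ** 2)
    d_i = dC * c_tilde * i_t * (1 - i_t)
    d_f = dC * C_prev * f_t * (1 - f_t)
    d_o = dout * np.tanh(C_t) * o_t * (1 - o_t)
    dgates = np.array([d_f, d_i, d_c_tilde, d_o])   # ordered [f, i, c~, o]
    dx = U @ dgates            # gradient w.r.t. the input x_t
    dout_prev = W @ dgates     # gradient passed back to timestep t-1
    return dgates, dx, dout_prev, dC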
Computing δgates_t for timestep t=2
Last timestep: Δout_t = 0; f_{t+1} = 0; δC_{t+1} = 0
E = ½(target − h)², so Δ_t = ∂E/∂h = −(target − h) = h − target

t2: Δ_2 = 0.772 − 1.25 = −0.478 → δout_2 = −0.478 + 0 = −0.478
δC_2 = −0.478 · 0.850 · (1 − tanh²(1.518)) + 0 · 0 = −0.071
δf_2 = −0.071 · 0.786 · 0.870 · (1 − 0.870) = −0.006
δi_2 = −0.071 · 0.850 · 0.981 · (1 − 0.981) = −0.001
δC̃_2 = −0.071 · 0.981 · (1 − 0.850²) = −0.019
δo_2 = −0.478 · tanh(1.518) · 0.850 · (1 − 0.850) = −0.055

δgates_2 = [−0.006, −0.001, −0.019, −0.055]   (order: f, i, C̃, o)
Computing δx_2 and Δout_1 for timestep t=2

δx_t = Uᵀ · δgates_t
Δout_{t-1} = Wᵀ · δgates_t

U = [0.70  0.95  0.45  0.60]
    [0.45  0.80  0.25  0.40]
W = [0.100  0.800  0.150  0.250]
δgates_2 = [−0.006, −0.001, −0.019, −0.055]

δx_2 = [−0.047, −0.030]
Δout_1 = −0.018
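These two products are easy to check in NumPy (illustrative; U is laid out as input_dim x gates as above, so the transpose is already absorbed into the layout):

import numpy as np
U = np.array([[0.70, 0.95, 0.45, 0.60],
              [0.45, 0.80, 0.25, 0.40]])
W = np.array([0.100, 0.800, 0.150, 0.250])
dgates2 = np.array([-0.006, -0.001, -0.019, -0.055])   # [f, i, c~, o]
print(U @ dgates2)   # δx_2   ≈ [-0.047, -0.030]
print(W @ dgates2)   # Δout_1 ≈ -0.018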
Computing for timestep t=1: Δout_1 = −0.018

δout_1 = 0.036 − 0.018 = 0.018
δC_1 = −0.053
δf_1 = 0
δi_1 = −0.0017
δC̃_1 = −0.017
δo_1 = 0.0018

δgates_1 = [0.0000, −0.0017, −0.0170, 0.0018]
δx_1 = Uᵀ · δgates_1 = [−0.0082, −0.0049]
Δout_0 = Wᵀ · δgates_1 = −0.0035
Computing δU, δW, δb

δU = Σ_{t=1..2} δgates_t · x_tᵀ
   = [0; −0.0017; −0.0170; 0.0018] · [1  2] + [−0.006; −0.001; −0.019; −0.055] · [0.5  3]

δW = δgates_{t+1} · h_t = [−0.006; −0.001; −0.019; −0.055] · [0.536]   (only the t=1 term exists, since there is no δgates_3)

δb = Σ_{t=1..2} δgates_t

δU = [−0.0032  −0.0189]
     [−0.0022  −0.0067]
     [−0.0267  −0.0922]
     [−0.0259  −0.1626]

δW = [−0.0034, −0.0006, −0.0104, −0.0297]

δb = [−0.00631, −0.00277, −0.03641, −0.05362]

(one row/entry per gate: f, i, C̃, o)
Update Weights (η = 0.1): w_new = w_old − η · δw_old

δU = [−0.0032  −0.0189]        δW = [−0.0034, −0.0006, −0.0104, −0.0297]
     [−0.0022  −0.0067]        δb = [−0.00631, −0.00277, −0.03641, −0.05362]
     [−0.0267  −0.0922]
     [−0.0259  −0.1626]

U_old = [0.70  0.95  0.45  0.60]        U_new = [0.7003  0.9502  0.4527  0.6026]
        [0.45  0.80  0.25  0.40]                [0.4519  0.8007  0.2592  0.4163]

W_old = [0.100  0.800  0.150  0.250]    W_new = [0.1003  0.8001  0.1510  0.2530]

b_old = [0.1500  0.6500  0.2000  0.1000]    b_new = [0.1506  0.6503  0.2036  0.1054]
Truncated BPTT
https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent
Truncated BPTT was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network.
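In practice, truncation is often realized by cutting a long series into fixed-length windows before training, so that gradients only flow through k timesteps at a time (a rough sketch with made-up data; the window length k = 50 is an assumption):

import numpy as np
series = np.random.rand(10_000)                 # a hypothetical long univariate series
k = 50                                          # truncation length (timesteps per window)
X = np.array([series[i:i + k] for i in range(len(series) - k)])[..., np.newaxis]   # (samples, k, 1)
y = series[k:]                                  # next-value targets
# model.fit(X, y, ...)                          # e.g. with the LSTM model defined earlier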
Summary
• Backpropagation through time for LSTM
• Truncated BPTT