[Deep Learning] Recurrent Neural Networks (RNN)
Recurrent Neural Networks (RNN)
: the current output depends on all of the previous inputs in the *SEQUENCE*
Xt = current input
Ht-1 = previous hidden state (fed into the current step together with Xt)
W = set of learnable parameters (shared across all time steps)
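A minimal NumPy sketch of this update (the function and weight names rnn_step, W_xh, W_hh are illustrative, not from the post): the new hidden state is tanh(xt·W_xh + h_prev·W_hh + b).
```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the new hidden state depends on the
    current input x_t and the previous hidden state h_prev."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# toy dimensions (illustrative only)
input_size, hidden_size = 4, 8
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(input_size, hidden_size)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h  = np.zeros(hidden_size)

x_t    = rng.normal(size=(input_size,))
h_prev = np.zeros(hidden_size)       # initial hidden state
h_t    = rnn_step(x_t, h_prev, W_xh, W_hh, b_h)
```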
Comparison between a hidden RNN layer and a linear layer
- Hidden RNN Layer
: maintains a memory of past inputs in the sequence
=> The hidden state is updated at each time step and can capture short-term dependencies within the sequence.
- Linear Layer
: takes an input and produces an output without considering any temporal dependencies
=> A linear layer treats each time step independently and does NOT capture sequential info
=> It cannot maintain a memory of past inputs or model their influence on the current output
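A small PyTorch contrast (a sketch with made-up shapes, not code from the post): the linear layer maps each time step on its own, while the RNN's hidden state links the steps together.
```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 2, 5, 4, 8
x = torch.randn(batch, seq_len, input_size)

# Linear layer: applied to every time step independently -> no memory
linear = nn.Linear(input_size, hidden_size)
y_linear = linear(x)                  # (2, 5, 8); step t ignores all earlier steps

# RNN layer: the hidden state carries information across time steps
rnn = nn.RNN(input_size, hidden_size, batch_first=True)
y_rnn, h_last = rnn(x)                # y_rnn: (2, 5, 8), h_last: (1, 2, 8)
```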
Unfolded RNN
=> Final state hL depends on all L inputs
** apply the same neural network f at every sequence step (weight sharing)
** an unfolded RNN is one shared network, deep along the time axis
-> can, in principle, capture arbitrarily long patterns
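A toy NumPy unfolding (illustrative names and sizes): the same parameters W_xh, W_hh are reused at every one of the L steps, and the final state h_L has seen every input.
```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

L, input_size, hidden_size = 10, 4, 8
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(input_size, hidden_size)) * 0.1
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
b_h  = np.zeros(hidden_size)

xs = rng.normal(size=(L, input_size))   # the input sequence x_1 ... x_L
h = np.zeros(hidden_size)
for x_t in xs:
    # the SAME parameters are reused at every step (weight sharing)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
h_L = h                                  # final state: depends on all L inputs
```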
Different Types of RNN
(1) a prediction at each time step (many-to-many)
(ex) stock prediction, prediction of the next word
- apply a final transformation (linear model) on top of each hidden state
- apply a loss function at each time step (CCE, MSE)
- the total loss will be the sum (or average) of all the per-step losses (see the sketch after this list)
(2) a single prediction at the end (many-to-one)
(ex) speaker identification, emotion recognition from text
- apply a final transformation (linear model) on top of the last hidden state
- apply a loss function on top of it
(3) Seq2Seq
: RNN-based encoder + RNN-based decoder
- Encoder : encodes all the L inputs
- Decoder : takes one (or, with attention, a combination) of the encoded states and generates the output elements one by one
=> loss is computed on top of each prediction
(ex) machine translation, speech recognition (often combined with an attention mechanism)
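A hedged PyTorch sketch of cases (1) and (2) above (all shapes and names are made up; a Seq2Seq setup would add a second RNN as decoder and is omitted here): the same RNN outputs feed either a per-step head with a per-step loss, or a single head on the last hidden state.
```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_classes = 2, 5, 4, 8, 3
x = torch.randn(batch, seq_len, input_size)

rnn  = nn.RNN(input_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, num_classes)
outputs, h_last = rnn(x)                 # outputs: (2, 5, 8), h_last: (1, 2, 8)

# (1) many-to-many: a linear head + loss on EVERY hidden state
logits_per_step  = head(outputs)         # (2, 5, 3)
targets_per_step = torch.randint(0, num_classes, (batch, seq_len))
loss_m2m = nn.CrossEntropyLoss()(
    logits_per_step.reshape(-1, num_classes), targets_per_step.reshape(-1)
)                                        # average of the per-step losses

# (2) many-to-one: a linear head + loss on the LAST hidden state only
logits_last = head(h_last[-1])           # (2, 3)
targets_seq = torch.randint(0, num_classes, (batch,))
loss_m2o = nn.CrossEntropyLoss()(logits_last, targets_seq)
```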
How to Train an RNN
- After unfolding it, the RNN becomes a standard computational graph built from long chains of computations
- Compute the gradient with the backpropagation algorithm (backpropagation through time, BPTT)
- With the gradient, we can update the parameters with gradient descent
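A minimal training-loop sketch in PyTorch (toy shapes, a many-to-one regression is assumed): autograd unfolds the RNN over the sequence, loss.backward() performs backpropagation through time, and the optimizer applies gradient descent.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn  = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
opt  = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(2, 5, 4)                 # (batch, sequence length, features)
y = torch.randn(2, 1)                    # one regression target per sequence

for step in range(100):
    opt.zero_grad()
    _, h_last = rnn(x)                   # unfold the RNN over the sequence
    loss = nn.MSELoss()(head(h_last[-1]), y)
    loss.backward()                      # backpropagation through time
    opt.step()                           # gradient descent update
```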
Vanilla RNN
Problem in RNN - Vanishing Gradient
- In an RNN, the gradient flows through a long temporal chain of repeated multiplications
- Vanishing / Exploding Gradient
=> The vanishing gradient problem prevents the model from learning long-term dependencies
=> Adding shortcuts across time steps may prevent these problems
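A rough numerical illustration (a simplification: the real backward pass multiplies state-dependent Jacobians, replaced here by fixed matrices): repeatedly multiplying the gradient by a factor with norm below 1 makes it vanish, above 1 makes it explode.
```python
import numpy as np

# The gradient reaching a step k time steps back is (roughly) a product of
# k Jacobians of the recurrent transition.
rng = np.random.default_rng(0)
grad = rng.normal(size=8)

W_small = 0.5 * np.eye(8)                # norm < 1 -> vanishing
W_large = 1.5 * np.eye(8)                # norm > 1 -> exploding

g_v, g_e = grad.copy(), grad.copy()
for k in range(50):                      # 50 time steps back
    g_v = W_small.T @ g_v
    g_e = W_large.T @ g_e

print(np.linalg.norm(g_v))               # ~ 1e-15 : vanished
print(np.linalg.norm(g_e))               # ~ 1e+9  : exploded
```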
Bidirectional RNN
: employs two RNNs running in opposite directions, so the prediction at each time step is based on the whole input sequence and not only the previous elements.
- combine the forward and the backward state by concatenating them or summing them up
(ex) speech recognition, machine translation, and handwriting recognition
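A PyTorch sketch (toy shapes): with bidirectional=True the forward and backward hidden states are concatenated, so each time step's output has twice the hidden size.
```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 4)                           # (batch, time, features)
birnn = nn.RNN(input_size=4, hidden_size=8,
               batch_first=True, bidirectional=True)
outputs, h_last = birnn(x)

# At each time step the forward and backward states are concatenated,
# so every prediction can see the whole input sequence.
print(outputs.shape)                               # torch.Size([2, 5, 16])
print(h_last.shape)                                # torch.Size([2, 2, 8])
```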
CNNs + RNNs + MLPs
- CNN : learns local contexts
- RNN : learns long-term dependencies
- MLP : performs a final classification
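A hypothetical end-to-end sketch in PyTorch (the class name ConvRNNClassifier and all sizes are made up): a 1-D convolution extracts local context, an RNN propagates it across time, and an MLP classifies from the last hidden state.
```python
import torch
import torch.nn as nn

class ConvRNNClassifier(nn.Module):
    """Hypothetical CNN + RNN + MLP pipeline for sequence classification."""
    def __init__(self, input_size=4, conv_channels=16, hidden_size=32,
                 num_classes=3):
        super().__init__()
        # CNN: local context from a small window around each time step
        self.conv = nn.Conv1d(input_size, conv_channels, kernel_size=3, padding=1)
        # RNN: long-term dependencies across the whole sequence
        self.rnn = nn.RNN(conv_channels, hidden_size, batch_first=True)
        # MLP: final classification from the last hidden state
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, x):                              # x: (batch, time, features)
        z = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, channels, time)
        z = z.transpose(1, 2)                          # back to (batch, time, channels)
        _, h_last = self.rnn(z)
        return self.mlp(h_last[-1])                    # (batch, num_classes)

logits = ConvRNNClassifier()(torch.randn(2, 5, 4))     # -> shape (2, 3)
```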