
[Deep Learning] Regularization - Dropout | Data Augmentation | Multitask Learning

paka_corn 2023. 12. 10. 08:31

Regularization 

: techniques used to control the complexity of a model and prevent overfitting

 

=> Increase generalization ability! 

=> Deep Neural Networks are models with large capacity and are thus more prone to overfitting!

 

 

Data Augmentation 

: create fake training data by applying some transformations to the original data

- Used mostly for classification problems, especially on image and audio data

 

=> Augmentation can be applied at different levels of abstraction

Noise injection can also work as augmentation when the noise is applied to the hidden units
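
For images, this is usually done on the fly in the data-loading pipeline. Below is a minimal sketch using torchvision; the specific transforms, their parameters, and the add_gaussian_noise helper are illustrative assumptions, not something prescribed here.

```python
import torch
from torchvision import transforms

# Random transformations are re-sampled every time an image is loaded,
# so the model effectively sees a new variant of each sample every epoch.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                      # random crop + resize
    transforms.RandomHorizontalFlip(p=0.5),                 # mirror half the time
    transforms.RandomRotation(degrees=10),                  # small random rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # lighting variation
    transforms.ToTensor(),
])

# Noise injection: Gaussian noise can be added to inputs or hidden activations.
def add_gaussian_noise(x: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    return x + std * torch.randn_like(x)
```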

 

 

 

Multitask Learning

: divide the architecture into a shared part (early layers) and task-specific parts (last layers)

- the input must be the same across tasks!

 

=> The shared representation becomes much more robust and general, since it is trained with more data from different tasks.
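
A minimal PyTorch sketch of this hard parameter sharing setup; the layer sizes and the two hypothetical task heads (head_a, head_b) are assumptions for illustration.

```python
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=128, hidden_dim=64, n_classes_a=10, n_classes_b=5):
        super().__init__()
        # Shared part (early layers): trained with data from all tasks
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific parts (last layers): one head per task
        self.head_a = nn.Linear(hidden_dim, n_classes_a)
        self.head_b = nn.Linear(hidden_dim, n_classes_b)

    def forward(self, x):
        h = self.shared(x)          # same input, shared representation
        return self.head_a(h), self.head_b(h)

# During training, the per-task losses are combined (e.g. loss = loss_a + loss_b),
# so gradients from every task update the shared layers.
```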

 

 

 

 

 

Dropout 

: a regularization method that efficiently approximates training a large number of neural networks.

=> Dropout can be viewed as a cheap way to train ensemble models with an exponential number of architectures. 

 

During training,

Some neurons are randomly ignored or dropped out 

Neurons are temporarily removed from the network along with their input and output connections

 

- the probability of dropping a neuron is called the dropout rate p

 

- One way to implement it is to sample a binary mask (containing 0s and 1s) for each layer and multiply it with the outputs of the neurons

=> For every input sample, we sample a different mask 
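
A minimal NumPy sketch of this masking step for a single layer, assuming p denotes the dropout rate (the probability of dropping a unit):

```python
import numpy as np

def dropout_forward(h, p=0.5):
    """Apply dropout to the activations h of one layer during training.

    h: array of shape (batch_size, n_units). A different binary mask is
    sampled for every sample in the batch.
    """
    mask = (np.random.rand(*h.shape) > p).astype(h.dtype)  # 1 = keep, 0 = drop
    return h * mask
```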

 

 

During testing,

- Dropout is not used at test time

- all the neurons are active, so we have to scale their outputs properly

=> we have a mismatch in the input scale between training and test! This might significantly harm the performance. 

=> we can compensate for the effect of dropout by multiplying each unit's output by (1 - p) at test time.
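
A minimal sketch of that test-time compensation. Note that most libraries (e.g. PyTorch's nn.Dropout) instead use the "inverted dropout" formulation, which scales by 1/(1 - p) during training so the network can be used unchanged at test time; a sketch of that variant is included as well.

```python
import numpy as np

def dropout_test(h, p=0.5):
    # All units are active at test time, so scale outputs by the keep
    # probability (1 - p) to match the expected activations seen in training.
    return h * (1.0 - p)

# Inverted dropout: scale by 1/(1 - p) during training, do nothing at test time.
def inverted_dropout_forward(h, p=0.5):
    mask = (np.random.rand(*h.shape) > p).astype(h.dtype)
    return h * mask / (1.0 - p)
```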

 

 

 

 

 

 

 
