
[Deep Learning] Neural Networks - Perceptron | MLP | Activation Function | Backpropagation | Gradient Descent

paka_corn 2023. 6. 22. 11:10

Neural Networks 

Neural Network : A neural network is composed of neurons that take inputs and calculate outputs through weights and activation functions. Through this process, it learns patterns in data and gains the ability to make predictions.

 

Learning in a Neural Network 

1) Feed Forward 

 

 

2) Compute Loss 

 

 

3) Backpropagate ( = chain rule) 

: the method to compute the gradient efficiently

 

 

4) Gradient Descent ( * Optimization Algorithm * )

 

: attempts to optimize the objective function by progressively updating the parameters (weights) using the gradient.

-> Find the best weights to minimize the loss! 
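A minimal sketch of this four-step loop for a single linear neuron, written with numpy; the toy data, learning rate, and number of epochs are made-up values for illustration only.

```python
import numpy as np

# Toy data for illustration only: the true relation is y = 2 * x
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X

w = 0.0      # initial weight (arbitrary)
lr = 0.05    # learning rate (arbitrary)

for epoch in range(100):
    y_hat = w * X                              # 1) feed forward
    loss = 0.5 * np.mean((y_hat - y) ** 2)     # 2) compute loss
    grad = np.mean((y_hat - y) * X)            # 3) backpropagate: dLoss/dw by the chain rule
    w -= lr * grad                             # 4) gradient descent update

print(w, loss)   # w ends up very close to 2.0 and the loss close to 0
```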

 

 

Perceptron 

Perceptron : A perceptron consists of one or more input nodes, weights assigned to each input, a summation function, an activation function, and an output node.

 

=> no hidden layer, only single layer

=> input nodes (features) + weights + output 

- Weighted Sum : the sum of the inputs multiplied by their corresponding weights; the weights decide how much importance is assigned to each feature.

 

- Output : apply the activation function to the weighted sum 

 

- weight : learnable parameter, reflects the importance of the input.

 

- bias : an additional learnable input added to each neuron; it serves as an offset that lets the neuron’s output shift independently of its inputs
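A minimal sketch of one such perceptron with two input features; the weights and bias are hand-picked (not learned) so that the unit computes a logical AND.

```python
import numpy as np

def perceptron(x, w, b):
    z = np.dot(w, x) + b          # weighted sum of inputs plus bias
    return 1 if z > 0 else 0      # step (threshold) activation -> output

# Hand-picked weights and bias so that the unit computes a logical AND
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))   # prints 0, 0, 0, 1
```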

 

[ Limits of the Perceptron ] 

- Can only model linear decision boundaries (e.g., it cannot represent XOR) 

 

 

 

Multilayer Perceptron (MLP)  

 

 

 

 

- each neuron performs a weighted sum of the previous layer's outputs and then applies a non-linear activation function 

- use a non-linear activation function for more complex decision boundaries 
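As a sketch of why the hidden layer plus non-linearity matters, here is a tiny 2-layer MLP whose weights are set by hand (not learned) so that it computes XOR, something a single perceptron cannot represent.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hand-set weights: a hidden layer of 2 ReLU units, then a linear output
W1 = np.array([[1.0, 1.0],     # h1 = relu(x1 + x2)
               [1.0, 1.0]])    # h2 = relu(x1 + x2 - 1)
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])     # y = h1 - 2 * h2

def mlp(x):
    h = relu(W1 @ x + b1)      # hidden layer: weighted sum + non-linear activation
    return W2 @ h              # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mlp(np.array(x, dtype=float)))   # prints 0, 1, 1, 0 (XOR)
```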

 

 

 

Activation Function 

Activation Function : a function that takes each neuron's input and generates its output in an artificial neural network.

=> After computing the weighted sum of the input signals and bias, the activation function performs a non-linear transformation on the result to generate the output.

 

 

- the aim is to introduce non-linearity into the neuron’s output, allowing for more complex representations.

 

 

1) Threshold Function

: if the value is less than 0, the output is 0; 

if the value is greater than 0, the output is 1. 

 

-> Not used in real neural networks. 

=> Its gradient is 0 almost everywhere, so backpropagation cannot pass any useful gradient through it; adding more hidden layers does not make the network learn better.
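A one-line numpy sketch of the threshold function; its derivative is 0 everywhere it is defined, which is why no useful gradient can flow through it during training.

```python
import numpy as np

def threshold(z):
    return np.where(z > 0, 1.0, 0.0)   # 1 if z > 0, else 0

print(threshold(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 1.]
```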

 

 

 

 

 

 

2) Sigmoid Function 

: can be interpreted as the probability of the output being equal to one.

- maps the input to a value between 0 and 1.

=> commonly applied at the output layer for binary classification (Yes or No)

 

* Due to the vanishing gradient problem, activation functions like sigmoid and hyperbolic tangent are not commonly used in the hidden layers of deep neural networks.
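A minimal sketch of the sigmoid and its derivative; the derivative is at most 0.25 and approaches 0 for large |z|, which is where the vanishing gradient issue mentioned above comes from.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # maps any input into (0, 1)

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # at most 0.25, near 0 for large |z|

print(sigmoid(np.array([-5.0, 0.0, 5.0])))       # ~[0.007, 0.5, 0.993]
print(sigmoid_grad(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.25, 0.007]
```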

 

 

 

 

3) Rectified Linear Unit (ReLU)

 

- maps negative inputs to 0, and maintains positive inputs

=> commonly applied in the hidden layers 
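A one-line sketch of ReLU in numpy:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)   # 0 for z < 0, z otherwise

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
```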

 

 

4) Leaky ReLU 

 : allows a small, non-zero gradient when the input is negative, preventing the "dying ReLU" problem where neurons with always-negative inputs receive zero gradient and stop updating.
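A sketch of Leaky ReLU; the negative-side slope (0.01 here) is a tunable hyperparameter, not a fixed value.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small non-zero slope for negative inputs

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))   # [-0.02  0.    3.  ]
```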

 

 

 

 

5) Hyperbolic Tangent(tanh)

: similar to the sigmoid function, but the range is from -1 to 1. 

 

- zero-centered output (unlike the sigmoid), which can make optimization of the following layers easier.

- historically common in hidden layers and still widely used in recurrent networks.
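A quick sketch showing the (-1, 1) range of tanh:

```python
import numpy as np

print(np.tanh(np.array([-5.0, 0.0, 5.0])))   # ~[-0.9999, 0.0, 0.9999]
```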

 


 

 

 

Backpropagation 

: backpropagation is the algorithm that calculates the gradient of the error function with respect to the neural network's weights.

It applies the chain rule of calculus to compute the partial derivatives of the error with respect to the weights, working backwards from the output layer to the input layer.

The weights of the network are then updated by an optimization algorithm (commonly gradient descent) in a direction that reduces the error.
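A minimal sketch of what "applying the chain rule" means for one sigmoid neuron with squared error: the gradient factorises as dE/dw = dE/dŷ · dŷ/dz · dz/dw, and a finite-difference check confirms the result (the input, target, weight, and bias values are made up).

```python
import numpy as np

x, y, w, b = 1.5, 1.0, 0.4, 0.1          # made-up input, target, weight, bias

def forward(w):
    z = w * x + b                        # weighted sum
    y_hat = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
    return y_hat

y_hat = forward(w)
error = 0.5 * (y_hat - y) ** 2           # squared error for this single sample

# Chain rule: dE/dw = dE/dy_hat * dy_hat/dz * dz/dw
dE_dyhat = (y_hat - y)
dyhat_dz = y_hat * (1.0 - y_hat)
dz_dw = x
grad = dE_dyhat * dyhat_dz * dz_dw

# Finite-difference check of the same gradient
eps = 1e-6
numeric = (0.5 * (forward(w + eps) - y) ** 2 - 0.5 * (forward(w - eps) - y) ** 2) / (2 * eps)
print(error, grad, numeric)              # the two gradient values agree
```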

 

 

 

 

Gradient Descent 

Gradient Descent 

=> finds the minimum of the cost function by following its derivative (the gradient)

 

cost function = 1/2 (ŷ - y)^2 = 1/2 (predicted value - actual value)^2

Cost Function (= Loss function) : summed over all training samples, this squared error gives the RSS (Residual Sum of Squares)
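A minimal sketch of the update rule w ← w - α · dC/dw applied to this cost with ŷ = w · x; the data point, starting weight, and learning rate α are made up.

```python
x, y = 2.0, 6.0                 # single made-up data point; the ideal weight is 3.0
w, alpha = 0.0, 0.1             # starting weight and learning rate (made up)

for step in range(5):
    y_hat = w * x                             # prediction
    grad = (y_hat - y) * x                    # dC/dw for C = 0.5 * (y_hat - y)**2
    w = w - alpha * grad                      # gradient descent update
    print(step, round(w, 3), round(0.5 * (y_hat - y) ** 2, 3))
```

Each step moves w against the gradient, so the cost shrinks as w approaches its optimal value of 3.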

 

 

 

 

==> Backpropagation vs Gradient descent

backpropagation : computes the gradient of the loss with respect to the weights!

 

gradient descent : optimization algorithm that uses the gradient to update the weights and minimize the error! 
