
[Deep Learning] Convolutional Neural Network (CNN)

paka_corn 2023. 11. 7. 02:00

What is a Convolution? 

: a standard operation, long used in compression, signal processing, computer vision, and image processing 

 

Convolution = Filtering = Feature Extraction 
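To make this concrete, here is a minimal sketch (PyTorch is assumed; the step signal and the hand-picked difference kernel are only illustrative): convolving the signal with a small filter is literally a filtering step that extracts a feature, in this case "where does the signal change?".

```python
# Minimal sketch: convolution as filtering (PyTorch assumed, toy data).
import torch
import torch.nn.functional as F

signal = torch.tensor([[[0., 0., 0., 1., 1., 1.]]])  # (batch=1, channels=1, length=6)
kernel = torch.tensor([[[-1., 1.]]])                  # (out_ch=1, in_ch=1, kernel_size=2)

out = F.conv1d(signal, kernel)
print(out)  # tensor([[[0., 0., 1., 0., 0.]]]) -> reacts only where the signal changes
```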

 

 

 

Main difference with the MLP 

1) Local Connection

: Local connections can capture local patterns better than fully-connected models 

-> the network searches for local patterns everywhere by sliding the same kernel over the input

-> it therefore has the chance to react to patterns that appear in different positions

 

2) Weight Sharing 

: unlike MLPs, which employ different weights for different neurons, a CNN reuses the same kernel weights at every position
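A small sketch of both points at once (PyTorch assumed; the input length of 100 and the kernel size of 3 are illustrative): because the same 3-weight kernel is shared across every position, the convolutional layer needs only a handful of parameters, while a fully-connected layer on the same input needs thousands.

```python
# Minimal sketch: local connection + weight sharing vs. a fully-connected layer.
import torch.nn as nn

fc   = nn.Linear(100, 100)             # 100*100 weights + 100 biases = 10100 parameters
conv = nn.Conv1d(1, 1, kernel_size=3)  # 3 shared weights + 1 bias   = 4 parameters

print(sum(p.numel() for p in fc.parameters()))    # 10100
print(sum(p.numel() for p in conv.parameters()))  # 4
```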

 

 

 

Kernel Size

: the weights of the filter w are learned from data

- the length of the convolving filter is the kernel size k 
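A quick sketch (PyTorch assumed; k = 5 chosen arbitrarily) showing that the kernel size fixes how many weights the filter has, and that those weights are registered as trainable parameters:

```python
# Minimal sketch: the kernel size k determines the number of learned weights.
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=5)
print(conv.weight.shape)          # torch.Size([1, 1, 5]) -> k = 5 weights
print(conv.weight.requires_grad)  # True -> learned from data
```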

 

 

Input Channels

- In general, the convolution takes a multi-channel input, filters it with a set of filters, and returns a multi-channel output

- The multi-channel input is processed by a filter of dimensionality kernel_size x input channels

- The convolutional framework is flexible enough to manage multi-channel inputs as well.

-> filter size for a multi-channel input = kernel_size * input channels 
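A minimal sketch (PyTorch assumed; 3 input channels and k = 5 are illustrative) confirming that one filter spans all input channels, i.e. holds kernel_size * input_channels weights:

```python
# Minimal sketch: a single filter over a multi-channel input.
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=5)
print(conv.weight.shape)   # torch.Size([1, 3, 5]) -> 5 * 3 = 15 weights in the filter

x = torch.randn(1, 3, 50)  # (batch, in_channels=3, length=50)
print(conv(x).shape)       # torch.Size([1, 1, 46]) -> one output channel
```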

 

 

Output Channels 

: the number of different feature maps generated through the convolution operation.

 

- Each output channel is responsible for detecting and storing various features.

- In CNN, we want each filter to react to different local patterns 

- All these outputs produced by the convolution are gathered in a single matrix with dimensionality output_channels x output_length

- To capture multiple patterns, we have to process the input with many different filters

 

-> Multiple output channels can be used by the model to learn more complex patterns.
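A minimal sketch (PyTorch assumed; 8 output channels chosen arbitrarily): each output channel comes from its own filter and stores its own feature map, and all the maps are stacked along the channel dimension.

```python
# Minimal sketch: out_channels = 8 means 8 filters -> 8 feature maps.
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5)
x = torch.randn(1, 3, 50)

print(conv.weight.shape)  # torch.Size([8, 3, 5]) -> 8 filters, each 3 x 5
print(conv(x).shape)      # torch.Size([1, 8, 46]) -> 8 feature maps of length 46
```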

 

 

 

Stride 

: quantifies the amount of movement (step size) by which we slide a filter over the input

- hyperparameter of the convolution layer 

- if the stride factor is bigger than 1, the effect is to compress the input 
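A minimal sketch (PyTorch assumed) of the compression effect: with stride = 2 the filter skips every other position, so the output is roughly half as long.

```python
# Minimal sketch: stride > 1 compresses the input.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 100)

print(nn.Conv1d(1, 1, kernel_size=3, stride=1)(x).shape)  # torch.Size([1, 1, 98])
print(nn.Conv1d(1, 1, kernel_size=3, stride=2)(x).shape)  # torch.Size([1, 1, 49])
```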

 

 

 

Dilated Convolution 

- a technique that expands the filter by inserting holes between its consecutive elements

- this is done to cover a larger area of the input without adding more weights
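A minimal sketch (PyTorch assumed): with dilation = 2 the kernel keeps its 3 weights but spans 5 input positions, so it sees a wider area at no extra parameter cost.

```python
# Minimal sketch: dilation inserts holes between kernel elements.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 100)

plain   = nn.Conv1d(1, 1, kernel_size=3, dilation=1)
dilated = nn.Conv1d(1, 1, kernel_size=3, dilation=2)

print(plain.weight.shape, plain(x).shape)      # torch.Size([1, 1, 3]) torch.Size([1, 1, 98])
print(dilated.weight.shape, dilated(x).shape)  # torch.Size([1, 1, 3]) torch.Size([1, 1, 96])
```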

 

 

Stacking Convolutional Layers

- we can stack multiple convolutional layers to form a deep convolutional network.

- optionally, normalization is applied right after the convolution (layernorm or batch norm) 

- then, a non-linearity is applied (ReLU or LeakyReLU)

=> after non-linearity, the set of features = feature maps 

 

- After stacking multiple convolutional layers, we apply a flatten operation that stacks all the output channels into a single big vector

- Finally, we apply a linear transformation (fully-connected layer) on top of this vector

=> Feature Extraction 
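A minimal sketch of such a stack (PyTorch assumed; the channel counts, the input length of 100, and the 10 output classes are illustrative choices, not from the original post):

```python
# Minimal sketch: conv -> batch norm -> ReLU blocks, then flatten + linear layer.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3),   # convolution (feature extraction)
    nn.BatchNorm1d(16),                # optional normalization after the convolution
    nn.ReLU(),                         # non-linearity -> feature maps
    nn.Conv1d(16, 32, kernel_size=3),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Flatten(),                      # stack all output channels into one big vector
    nn.Linear(32 * 96, 10),            # final linear transformation (e.g. a classifier)
)

x = torch.randn(8, 1, 100)             # (batch, channels, length)
print(model(x).shape)                  # torch.Size([8, 10])
```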

 

 

 

Receptive Field 

: the region of the input space that affects a particular unit of the network

 

 

- Shows which specific area of the input image a particular neuron is looking at.

- To put it simply, the receptive field of a neuron represents the area in the input that the neuron uses for its computations.

- Smaller receptive fields enable neurons to detect finer and more detailed features.

 

=> Therefore, in CNNs, the receptive field is a crucial concept that explains how the model perceives and understands various parts of an image. 

 

 

* The receptive field depends on different factors

1) Kernel Size
2) Number of Layers
3) Stride Factor
4) Dilation Factor
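A small sketch of the usual receptive-field recursion over exactly these four factors (the helper function and the three-layer configuration are illustrative, not from the original post): each layer grows the receptive field by (kernel_size - 1) * dilation times the product of the strides of all earlier layers.

```python
# Minimal sketch: receptive field of a stack of 1D convolutions.
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, first layer first."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # growth contributed by this layer
        jump *= s                  # step between adjacent units, in input positions
    return rf

# kernel 3 / stride 1, kernel 3 / stride 2, kernel 3 / dilation 2
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 1, 2)]))  # 13
```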

 

 

 

Parameters 

- the number of parameters in a 1D convolutional layer

=> kernel size x input channels x output channels 

 

- the number of parameters in a 2D convolutional layer

=> kernel size (x) x kernel size (y) x input channels x output channels 
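A quick check of both formulas with PyTorch's own parameter count (bias is disabled here because the formulas above do not include the extra bias term per output channel):

```python
# Minimal sketch: verifying the parameter-count formulas.
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5, bias=False)
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=(5, 5), bias=False)

print(sum(p.numel() for p in conv1d.parameters()))  # 5 * 3 * 8     = 120
print(sum(p.numel() for p in conv2d.parameters()))  # 5 * 5 * 3 * 8 = 600
```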

 

 

 

Pooling 

: helps to make feature maps approximately invariant to small translations of the input 

 

- pooling is often applied after the non-linearity (activation function) 

- the size of the sliding window and its stride factor are hyperparameters of the pooling operation 

 

(ex) Max Pooling, Avg Pooling
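A minimal sketch (PyTorch assumed; window size 2 and stride 2 chosen arbitrarily): both pooling variants halve the length of every feature map while keeping the number of channels.

```python
# Minimal sketch: max pooling and average pooling over 1D feature maps.
import torch
import torch.nn as nn

x = torch.randn(1, 8, 100)  # 8 feature maps of length 100

print(nn.MaxPool1d(kernel_size=2, stride=2)(x).shape)  # torch.Size([1, 8, 50])
print(nn.AvgPool1d(kernel_size=2, stride=2)(x).shape)  # torch.Size([1, 8, 50])
```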

 

 
