All Contents (170)
Code&Data Insights
Bag of Words Model - The order is ignored (in the sentence) - Fast/simple (ex) Multinomial Naïve Bayes text classification (spam filtering), Information Retrieval (Google search) - Representation of a document => a vector of pairs => Word : all words in the vocabulary (aka terms) => Value : a number associated with the word in the document - Different possible schemes (1) Boo..
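A minimal sketch of the bag-of-words idea above, using a count scheme; the documents and vocabulary are toy examples, not from the post:

from collections import Counter

docs = ["free money now", "meeting at noon", "free free offer"]
vocab = sorted({w for d in docs for w in d.split()})   # all words in the vocabulary (terms)

def bow_vector(doc):
    counts = Counter(doc.split())          # word order in the sentence is ignored
    return [counts[w] for w in vocab]      # one (word, value) pair per vocabulary term

for d in docs:
    print(d, "->", bow_vector(d))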
Recurrent Neural Networks (RNN) : the current output depends on all the previous inputs *SEQUENCE* Xt = current input, Ht = previous state, W = set of learnable parameters Comparison between a hidden RNN layer and a linear layer - Hidden RNN Layer : the memory that maintains info about past inputs in the sequence => The hidden state is updated at each time step and can capture short-term dependenc..
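A minimal NumPy sketch of one recurrent step, assuming a vanilla tanh RNN; the sizes and weight names are illustrative assumptions:

import numpy as np

input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # learnable parameters W
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input Xt AND the previous state Ht
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):   # a sequence of 5 inputs
    h = rnn_step(x_t, h)                      # h carries information about all past inputs
print(h)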
Regularization : techniques used to control the complexity of models and prevent overfitting => Increase generalization ability! => Deep Neural Networks are models with large capacity and thus more prone to overfitting problems!!! Data Augmentation : create fake training data by applying some transformations to the original data - Used for classification pr..
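A minimal sketch of data augmentation for image classification, assuming random horizontal flips as the transformation; the batch is fake data:

import numpy as np

def augment(images, rng):
    out = images.copy()
    for i in range(len(out)):
        if rng.random() < 0.5:
            out[i] = out[i][:, ::-1]   # flip along the width axis; the label is unchanged
    return out

rng = np.random.default_rng(0)
batch = rng.random((8, 28, 28))        # fake batch of 8 grayscale images
augmented = augment(batch, rng)        # "fake" training data derived from the originals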
Main Steps in a Machine Learning Project 1. Look at the big picture 2. Get the data 3. Discover and visualize the data to gain insights 4. Prepare the data for ML algorithms (data cleaning, preprocessing) 5. Select a model and train it 6. Fine-tune the model 7. Present your solution 8. Launch, monitor, and maintain your system 1. Look at the Big Picture - Frame the Problem (business objective) : wh..
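A compressed sketch of steps 4-6 (prepare the data, select a model, fine-tune it) using scikit-learn on synthetic data; the pipeline and parameter grid are illustrative choices, not from the post:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),   # step 4: preprocessing
                 ("model", Ridge())])           # step 5: select a model
search = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]}, cv=5)  # step 6: fine-tune
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))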
What is a Convolution? : a standard operation, long used in compression, signal processing, computer vision, and image processing Convolution = Filtering = Feature Extraction Main differences from the MLP 1) Local Connection : local connections can capture local patterns better than fully-connected models -> search for all the local patterns by sliding the same kernel -> have the chance to react ..
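A minimal NumPy sketch of convolution-as-filtering, sliding one kernel over an image (valid padding, stride 1); the kernel here is an illustrative vertical-edge filter:

import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output depends only on a local patch (local connection),
            # and the SAME kernel is reused at every position.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # illustrative vertical-edge filter
image = np.random.default_rng(0).random((6, 6))
print(conv2d(image, edge_kernel).shape)          # (4, 4)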
Vanishing Gradients - usually occur because some activation functions squash the input into small values, which yields small gradients and thus negligible updates to the weights of the model - Or sometimes the input values are small to begin with : when backpropagating the gradient through long chains of computations, the gradient gets smaller and smaller - causes the gradient of..
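A short numerical sketch of the effect: the sigmoid derivative is at most 0.25, so multiplying many such local derivatives along a long chain shrinks the gradient toward zero:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, grad = 0.5, 1.0
for _ in range(20):                # 20 sigmoid layers in a chain
    s = sigmoid(x)
    grad *= s * (1.0 - s)          # chain rule: each local derivative is <= 0.25
    x = s
print(grad)                        # vanishingly small => negligible weight updates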
Regularization : prevents overfitting and improves the generalization of a model - It introduces additional constraints or penalties into the model training process to discourage the model from becoming too complex. - It aims to strike a balance between fitting the training data well and maintaining simplicity in the model Early Stopping : monitor the performance after each epoch on the validation..
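A minimal sketch of early stopping with a patience counter; the validation losses are made-up numbers that start to overfit after epoch 3:

val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.57]   # simulated validation loss per epoch
patience, wait = 2, 0
best_loss, best_epoch = float("inf"), -1
for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss, best_epoch, wait = loss, epoch, 0   # checkpoint the new best model
    else:
        wait += 1
        if wait >= patience:                           # no improvement for `patience` epochs
            print(f"stop at epoch {epoch}, restore checkpoint from epoch {best_epoch}")
            break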
Benefits of Advanced Optimization Methods - Faster Convergence - Improved Stability - Avoiding Local Minima - Better Generalization Momentum : accumulates an exponentially-decaying moving average of the past gradients - NOT ONLY depends on the learning rate, but ALSO on past gradients (SGD with Batch) If the previous update vt is very different from the current gradient => little update If previous up..
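A minimal sketch of the momentum update on a toy quadratic loss f(w) = 0.5*||w||^2, whose gradient is simply w; the learning rate and decay factor are illustrative:

import numpy as np

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    v = beta * v + grad    # exponentially-decaying moving average of past gradients
    w = w - lr * v         # update depends on the learning rate AND on past gradients
    return w, v

w, v = np.array([2.0, -3.0]), np.zeros(2)
for _ in range(200):
    w, v = momentum_step(w, v, grad=w)   # gradient of 0.5*||w||^2 is w
print(w)                                 # approaches the minimum at the origin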