[Machine Learning] Supervised Learning - Regression (Linear | Multiple | Logistic )
[ Regression ]
What is Regression?
Regression : predicts a number from infinitely many possible outputs
- a technique for investigating the relationship between independent variables (features) and a dependent variable (outcome). It is used as a method for predictive modelling in machine learning.
=> the point of predictive modelling in regression is to learn the best (optimal) regression coefficients from the given features and datasets.
[ Linear Regression ]
Model in Linear Regression
f(x) = wx + b
parameters : w (weight), b (bias)
Cost Function
=> Cost Function (= Loss Function)
= RSS (Residual Sum of Squares) : we can measure the regression error based on RSS
MSE (Mean Squared Error) : the cost function of simple linear regression
Cost Function Formula ( squared error cost function )
J(w,b) = (1 / 2m) * Σᵢ ( ŷ⁽ⁱ⁾ − y⁽ⁱ⁾ )² , summed over the m training examples
( the 1/2 is optional; it just makes the derivative cleaner ) | measures the difference between the predicted values and the training set labels
( ŷ (y hat) : predicted output, ŷ⁽ⁱ⁾ = f(x⁽ⁱ⁾) = w·x⁽ⁱ⁾ + b )
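A minimal sketch of this cost in code (NumPy assumed; the function name compute_cost is my own):

import numpy as np

def compute_cost(x, y, w, b):
    # Squared error cost J(w,b) for simple linear regression
    m = x.shape[0]
    y_hat = w * x + b                          # predictions ŷ⁽ⁱ⁾ for all m examples
    return np.sum((y_hat - y) ** 2) / (2 * m)  # the 1/2 simplifies the derivative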
w and b are parameters of the model, adjusted as the model learns from the data. They’re also referred to as “coefficients” or “weights”
Ways to Minimize the Cost
Gradient Descent
: find the minimum of the cost function by taking its derivative
=> For linear regression, if you find parameters w and b so that J(w,b) is very close to zero, the selected values of w and b cause the algorithm to fit the training set really well.
1) Start with some w, b (set w=0, b=0)
2) Keep changing w,b to reduce J(w,b)
3) Until we settle at or near a minimum ( repeat gradient descent algorithm until convergence )
** IMPORTANT ! -> Simultaneously update w and b !!! ( see the sketch below )
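A minimal sketch of what the simultaneous update means (alpha is the learning rate; dj_dw and dj_db are hypothetical helpers that compute the two partial derivatives):

# Correct : both updates are computed from the OLD values of w and b
tmp_w = w - alpha * dj_dw(x, y, w, b)
tmp_b = b - alpha * dj_db(x, y, w, b)
w, b = tmp_w, tmp_b

# Incorrect : w is overwritten first, so dj_db would see the NEW w
# w = w - alpha * dj_dw(x, y, w, b)
# b = b - alpha * dj_db(x, y, w, b)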
Update step for each parameter ( α : learning rate )
w = w − α · ∂J(w,b)/∂w
b = b − α · ∂J(w,b)/∂b
The derivative can be negative or positive depending on the slope
(i) If slope > 0, then derivative > 0 (positive number)
=> The learning rate α is always a positive number, so if you take w minus a positive number, you end up with a new value for w that is smaller
(ii) If slope < 0, then derivative < 0 (negative number)
=> w minus a negative number gives a new value for w that is larger
==> Therefore, in both cases the update moves w toward the minimum of the cost function
Learning Rate
(i) If the learning rate is too small, gradient descent may be very slow
(ii) If the learning rate is too large, gradient descent may overshoot and never reach the minimum ( it may even fail to converge and diverge! )
=> can reach a local minimum with a fixed learning rate
==> because the derivative shrinks as we approach the minimum, the steps automatically get smaller -> can reach the minimum without decreasing the learning rate !!! ( see the sketch below )
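Putting the pieces together, a minimal sketch of gradient descent for simple linear regression (NumPy; gradient_descent is my own name, not a library function):

import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    m = x.shape[0]
    w, b = 0.0, 0.0                                # 1) start with some w, b
    for _ in range(num_iters):                     # 2) keep changing w, b to reduce J(w,b)
        y_hat = w * x + b
        dj_dw = np.sum((y_hat - y) * x) / m        # ∂J/∂w
        dj_db = np.sum(y_hat - y) / m              # ∂J/∂b
        w, b = w - alpha * dj_dw, b - alpha * dj_db    # simultaneous update
    return w, b                                    # 3) at or near a minimum

# Usage : recover w ≈ 2, b ≈ 1 from noisy data
x = np.linspace(0, 1, 100)
y = 2 * x + 1 + 0.01 * np.random.randn(100)
w, b = gradient_descent(x, y, alpha=0.5, num_iters=2000)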
----------------------------------------------------------------------------------------------------------------------
[ Multiple Linear Regression ]
: has multiple features, f(x) = w₁x₁ + w₂x₂ + ... + wₙxₙ + b = w · x + b ( x = vector of input values )
Vectorization
np.dot(w, x) : dot product => works much faster than a for loop ( i.e., than summing the products without vectorization ) ! ( see the sketch below )
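A minimal sketch of the difference (NumPy assumed):

import numpy as np

w = np.random.randn(10_000)
x = np.random.randn(10_000)
b = 0.5

# Without vectorization : an explicit Python loop over every feature (slow)
f = 0.0
for j in range(w.shape[0]):
    f += w[j] * x[j]
f += b

# With vectorization : one call into optimized native code (much faster)
f_vec = np.dot(w, x) + b    # f(x) = w · x + b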
Gradient Descent
Ways to Enhance Gradient Descent - Feature scaling
Feature scaling
: aim for about -1 ≤ xⱼ ≤ 1 for each feature xⱼ
=> rescale until the features fall into acceptable ranges
- Do Mean Normalization : xⱼ = ( xⱼ − μⱼ ) / ( maxⱼ − minⱼ ) ( see the sketch below )
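A minimal sketch of mean normalization (NumPy; X is an m × n feature matrix, scaled column-wise):

import numpy as np

def mean_normalize(X):
    mu = X.mean(axis=0)                     # per-feature mean μⱼ
    rng = X.max(axis=0) - X.min(axis=0)     # per-feature range ( maxⱼ − minⱼ )
    return (X - mu) / rng                   # each column ends up roughly in [-1, 1]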
=> If gradient descent is not working properly, we can start from a very small learning rate.
However! if the learning rate is too small, the number of iterations will be too large!
Therefore, we can try multiple values by gradually increasing the learning rate, for example (0.001, 0.01, 0.1, 0.3, ..., 1) ( see the sketch below )
==> If the cost function is increasing, gradient descent is diverging, so we need a lower learning rate. -> the cost should be decreasing on every iteration, and it's ideal for it to end up close to 0!
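A minimal sketch of such a sweep, reusing the hypothetical compute_cost and gradient_descent helpers from above:

# Try increasing learning rates; keep the largest one whose cost still decreases
for alpha in (0.001, 0.01, 0.1, 0.3, 1.0):
    w, b = gradient_descent(x, y, alpha=alpha, num_iters=100)
    print(alpha, compute_cost(x, y, w, b))   # a growing / NaN cost means divergence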
----------------------------------------------------------------
[ Logistic Regression ]
Logistic Regression - the name says "regression", but it is mostly used for classification
=> solves the binary classification problem
Model : f(x) = g(w · x + b), where g(z) = 1 / (1 + e⁻ᶻ) is the sigmoid function ( output is between 0 and 1 )
Logistic Loss Function
L( f(x⁽ⁱ⁾), y⁽ⁱ⁾ ) = −y⁽ⁱ⁾ log( f(x⁽ⁱ⁾) ) − ( 1 − y⁽ⁱ⁾ ) log( 1 − f(x⁽ⁱ⁾) )
=> Loss is lowest when f(x⁽ⁱ⁾) predicts close to the true label y⁽ⁱ⁾ ( see the sketch below )
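A minimal sketch of the sigmoid and the logistic loss (NumPy; function names are my own):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))           # g(z) maps any real number into (0, 1)

def logistic_loss(f, y):
    eps = 1e-15                               # clip so log(0) never happens
    f = np.clip(f, eps, 1 - eps)
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

# Usage : loss is lowest when the prediction is close to the true label
print(logistic_loss(0.99, 1))   # ≈ 0.01  (confident and correct)
print(logistic_loss(0.01, 1))   # ≈ 4.61  (confident but wrong -> large loss)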