Code&Data Insights

[Machine Learning] Hyperparameters | Feature Importance | Gini Impurity | Mean Decrease in Impurity


paka_corn 2023. 7. 21. 07:06

Hyperparameters

: configuration values set before training that control and tune the behavior of a machine learning algorithm, such as the learning rate, tree depth, or number of estimators. Unlike model parameters, they are not learned from the data, yet they have a significant impact on the model's training process and performance.

 

=> Properly setting hyperparameters can optimize the model's performance and prevent overfitting.

=> Benefits: optimizing model performance, preventing overfitting, saving training time and resources, understanding and interpreting algorithms, and enhancing model generalization
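As a rough sketch of hyperparameter tuning with scikit-learn's GridSearchCV (the grid values below are illustrative assumptions, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative search space -- the specific values are arbitrary examples.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

# Cross-validated grid search tries every combination and keeps the best one.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

GridSearchCV exhaustively evaluates every combination; for larger search spaces, RandomizedSearchCV trades exhaustiveness for speed.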

 

Feature Importance 

: a technique used to determine the relative importance of each feature (input variable) to a model's decisions and predictions.

 

=> It allows us to understand which features have the most significant impact on the model's predictions. Identifying important features helps in several ways, including feature selection, feature engineering, and model explanation.
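A minimal sketch of reading impurity-based feature importances from a fitted random forest in scikit-learn (the iris dataset here is just a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# feature_importances_ holds one score per feature; the scores sum to 1.
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Note that impurity-based importances can be biased toward high-cardinality features; permutation importance is a common alternative check.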


Methods to compute feature importance 

1) Gini Impurity (for classification)

Gini Impurity? 

: a metric used during the construction of decision trees to measure the impurity of a node in classification problems. Impurity refers to how mixed the class labels in a node are; a lower Gini impurity indicates a purer node. For a node with class proportions p_k, the Gini impurity is 1 - Σ p_k^2. The decision tree algorithm minimizes Gini impurity by choosing, at each node, the split that reduces impurity the most.
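The definition above can be sketched directly in Python; `gini_impurity` is a hypothetical helper written for illustration, not part of any library:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

print(gini_impurity(["a", "a", "a", "a"]))  # pure node -> 0.0
print(gini_impurity(["a", "a", "b", "b"]))  # evenly mixed, 2 classes -> 0.5
```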


2) Mean Decrease in Impurity (for regression)

Mean Decrease in Impurity?

It measures a feature's importance as the average reduction in impurity across all the nodes in the tree where that feature is used for splitting; for regression trees, the impurity is typically the mean squared error (MSE). A higher Mean Decrease in Impurity score indicates a more significant impact of the feature on the model's predictive performance.
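A small sketch with synthetic data, assuming scikit-learn's RandomForestRegressor, whose `feature_importances_` attribute is exactly this impurity-based (MDI) score:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic target that depends only on the first feature,
# so MDI should rank feature 0 far above the two noise features.
y = 5 * X[:, 0] + rng.normal(scale=0.1, size=200)

reg = RandomForestRegressor(random_state=0).fit(X, y)
print(reg.feature_importances_)  # feature 0 dominates
```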
