
[Machine Learning] Decision Trees | Entropy



paka_corn 2023. 5. 31. 11:33

[ Decision Trees ]

Decision Tree : a type of Supervised Machine Learning model in which the data is repeatedly split according to certain parameters (features). The tree consists of two kinds of entities: decision nodes and leaves.

 

 

 

 

Decision Tree Learning 

Decision 1. How to choose which feature to split on at each node?

 

 

Decision 2. When do you stop splitting? => stopping criteria

- When a node is 100% one class

- When splitting a node will result in the tree exceeding a maximum depth 

(keeping the tree depth small can help prevent overfitting)

- When the information gain from additional splits is less than a threshold

- When number of examples in a node is below a threshold
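The four stopping criteria above can be sketched as a single check. This is a minimal illustration, not a library API; the function name `should_stop` and the threshold defaults (`max_depth=5`, `min_samples=2`, `min_gain=1e-3`) are assumptions chosen for the example.

```python
import numpy as np

def should_stop(y, depth, info_gain, max_depth=5, min_samples=2, min_gain=1e-3):
    """Return True if any of the stopping criteria is met (hypothetical thresholds)."""
    if len(np.unique(y)) == 1:      # node is 100% one class
        return True
    if depth >= max_depth:          # splitting would exceed the maximum depth
        return True
    if info_gain < min_gain:        # information gain below the threshold
        return True
    if len(y) < min_samples:        # number of examples below the threshold
        return True
    return False
```

In practice the thresholds are hyperparameters tuned per dataset; smaller `max_depth` and larger `min_gain` give simpler trees that are less prone to overfitting.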

 

 

Learning Step 

1) Start with all examples at the root node

2) Calculate information gain for all possible features, and pick the one with the highest information gain

3) Split the dataset according to the selected feature, and create left and right branches of the tree

4) Keep repeating the splitting process until a stopping criterion is met --> Recursive Algorithm
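The four learning steps above can be sketched as a small recursive function. This is a simplified sketch assuming binary (0/1) features and integer class labels; the helper names (`entropy`, `information_gain`, `build_tree`) and the `max_depth` stopping rule are choices made for this example.

```python
import numpy as np

def entropy(y):
    """H = -sum_i p_i * log2(p_i) over the class proportions in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(X, y, feature):
    """Entropy reduction from splitting on a binary feature."""
    left, right = y[X[:, feature] == 0], y[X[:, feature] == 1]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    w_left, w_right = len(left) / len(y), len(right) / len(y)
    return entropy(y) - (w_left * entropy(left) + w_right * entropy(right))

def build_tree(X, y, depth=0, max_depth=3):
    """Steps 1-4: pick the best feature, split, and recurse until stopping."""
    # Stopping criteria: pure node, or maximum depth reached
    if len(np.unique(y)) == 1 or depth >= max_depth:
        return {"leaf": int(np.bincount(y).argmax())}
    # Step 2: information gain for all features, pick the highest
    gains = [information_gain(X, y, f) for f in range(X.shape[1])]
    best = int(np.argmax(gains))
    if gains[best] <= 0:  # no split improves purity -> majority-class leaf
        return {"leaf": int(np.bincount(y).argmax())}
    # Step 3: split and create left/right branches; step 4: recurse
    mask = X[:, best] == 0
    return {"feature": best,
            "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}
```

Because each recursive call receives only the subset of examples that reached that branch, the stopping checks at the top of `build_tree` naturally terminate the recursion.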

 

 

 

 

 

[ Measure of Impurity - Entropy ] 

Entropy : a measure of the level of disorder or uncertainty in a given dataset or system.

- Entropy quantifies the similarities & differences within the data

 

- Entropy relates to Expected Value & Mutual Information (which quantifies the relationship between two variables)

 

- Expected value : the probability-weighted average over all possible outcomes

- Surprise : a highly improbable outcome is very surprising; the lower an outcome's probability, the higher its surprise

 

 

 

=> Entropy combines surprise & expected value: entropy is the expected surprise over all outcomes
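That relationship can be written out numerically. The sketch below uses the standard definition of surprise as log2(1/p), so entropy is the probability-weighted (expected) surprise; the function names are chosen for this example.

```python
import numpy as np

def surprise(p):
    """Surprise of an outcome with probability p: log2(1 / p)."""
    return np.log2(1.0 / p)

def entropy_from_surprise(probs):
    """Entropy = expected surprise = sum_i p_i * log2(1 / p_i)."""
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs * surprise(probs)))
```

For a fair coin, each outcome has surprise log2(2) = 1 bit, so the expected surprise (the entropy) is 1 bit; a rare outcome with p = 1/8 carries 3 bits of surprise but contributes only 3/8 of a bit to the entropy.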

 
