Code&Data Insights
[Machine Learning] Clustering - Hierarchical Clustering | Agglomerative Hierarchical Clustering | Dendrograms
paka_corn 2023. 6. 19. 05:54
[ Hierarchical Clustering ]
Hierarchical Clustering : Hierarchical clustering is a data analysis technique that groups data into a hierarchy (a tree) of nested clusters based on similarity or distance
- Distance is usually measured with Euclidean distance or Manhattan distance
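- 2 approaches for hierarchical clustering :
1) Agglomerative - Bottom-up (every point starts as its own cluster; clusters are merged)
2) Divisive - Top-down (all points start in one cluster; clusters are split)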
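The two distance measures mentioned above can be computed as in this minimal Python sketch. The helper names `euclidean` and `manhattan` are illustrative and do not come from any library.

```python
import math


def euclidean(p, q):
    """Straight-line (L2) distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block (L1) distance: the sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```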
[ Agglomerative Hierarchical Clustering ]
( Agglomerative Hierarchical Clustering :
Each data point starts as its own cluster. The two most similar clusters are then merged, again and again, until a stopping condition is met. )
How Does It Work?
Step 1)
Make each data point a single-point cluster => it forms N clusters
-> If there are 7 data points, there are 7 clusters.
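Here is Step 1 as a minimal Python sketch. It assumes a hypothetical set of 7 two-dimensional points and stores each cluster as a list of point indices. Both are illustrative choices and are not part of the original notes.

```python
# Hypothetical example data: 7 two-dimensional points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.5, 1.5), (4.0, 5.6)]

# Step 1: every data point becomes its own single-point cluster.
# A cluster is stored as a list of indices into `points`.
clusters = [[i] for i in range(len(points))]

print(len(clusters))  # 7
print(clusters)       # [[0], [1], [2], [3], [4], [5], [6]]
```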
Step 2)
Take the two closest data points and make them one cluster => It forms N-1 clusters
-> With 7 clusters (7 data points), the two closest data points are combined into one cluster.
-> There are 6 clusters now.
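Step 2 can be sketched by checking every pair of points and keeping the closest one. This assumes Euclidean distance (computed with `math.dist`, Python 3.8+) and reuses the same hypothetical 7 points.

```python
import math
from itertools import combinations

# Hypothetical example data: 7 two-dimensional points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.5, 1.5), (4.0, 5.6)]
clusters = [[i] for i in range(len(points))]  # Step 1: N = 7 clusters

# Step 2: find the pair of points with the smallest Euclidean distance.
a, b = min(combinations(range(len(points)), 2),
           key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))

# Replace the two single-point clusters with one merged cluster.
merged = clusters[a] + clusters[b]
clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]

print(a, b)           # 0 1
print(len(clusters))  # 6
```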
Step 3)
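Take the two closest clusters and make them one cluster => It forms N-2 clusters
(the distance between two clusters is defined by a linkage criterion, e.g. single, complete, average, or Ward linkage)
-> With 6 clusters (7 data points), the two closest clusters are combined into one cluster.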
-> There are 5 clusters now.
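The notes don't say how the distance between two *clusters* is measured. This sketch assumes single linkage, which uses the distance between the closest pair of members. It starts from the state after Step 2, where hypothetical points 0 and 1 are already merged.

```python
import math
from itertools import combinations

# Hypothetical example data: 7 two-dimensional points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.5, 1.5), (4.0, 5.6)]
# State after Step 2: points 0 and 1 were merged, leaving 6 clusters.
clusters = [[0, 1], [2], [3], [4], [5], [6]]


def single_linkage(c1, c2):
    """Assumed linkage: distance between the closest pair of members of c1 and c2."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)


def merge_closest(clusters):
    """Merge the two clusters with the smallest linkage distance."""
    a, b = min(combinations(range(len(clusters)), 2),
               key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]))
    merged = clusters[a] + clusters[b]
    return [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]


clusters = merge_closest(clusters)
print(len(clusters))  # 5
print(clusters)       # [[0, 1], [2], [3], [6], [4, 5]]
```

Other linkages change which clusters count as "closest". Complete linkage would take the maximum pairwise distance in `single_linkage` instead of the minimum, and average linkage would take the mean.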
Step 4)
Repeat STEP 3 until there is only one cluster
-> There is one cluster (containing all 7 data points)
-> FINISH
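Steps 1 to 4 combine into a naive loop. It costs on the order of N^3 distance computations, which is fine for small examples. Like the earlier sketches, it assumes single linkage, Euclidean distance, and the same hypothetical points. The merge history it records is exactly what a dendrogram draws.

```python
import math
from itertools import combinations

# Hypothetical example data: 7 two-dimensional points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.5, 1.5), (4.0, 5.6)]


def single_linkage(c1, c2):
    """Assumed linkage: distance between the closest pair of members."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)


def agglomerate(n_points):
    """Merge the two closest clusters until one is left; return the merge history."""
    clusters = [[i] for i in range(n_points)]  # Step 1: N single-point clusters
    history = []  # one (cluster_a, cluster_b, distance) entry per merge
    while len(clusters) > 1:  # Steps 2-4: repeat until only one cluster remains
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]))
        history.append((clusters[a], clusters[b], single_linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history


history = agglomerate(len(points))
for a, b, dist in history:
    print(f"merge {a} + {b} at distance {dist:.3f}")
```

The loop performs N - 1 = 6 merges. With single linkage the merge distances never decrease, so they can be drawn as increasing heights in a dendrogram.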
[ Dendrograms ]
Dendrograms : Dendrograms are graphical representations used in hierarchical clustering to display the hierarchical relationships between clusters.
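- A tree-like structure in which each branch represents a cluster. The height at which two branches are joined shows the distance (dissimilarity) between the merged clusters, so a long vertical branch means the clusters it separates are very different.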
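The information a dendrogram plots can be produced as one row per horizontal bar. Each row gives the two cluster ids being joined, the join height, and the size of the new cluster. Original points use ids 0..N-1, and the cluster created by merge i gets id N + i. This layout follows the convention of SciPy's linkage matrix, but the code below does not use SciPy. It is a sketch assuming single linkage and the same hypothetical points.

```python
import math
from itertools import combinations

# Hypothetical example data: 7 two-dimensional points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.5, 1.5), (4.0, 5.6)]


def single_linkage(c1, c2):
    """Assumed linkage: distance between the closest pair of members."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)


def dendrogram_rows(n_points):
    """One row (id_a, id_b, height, size) per horizontal bar of the dendrogram."""
    active = {i: [i] for i in range(n_points)}  # ids 0..N-1 are the original points
    rows = []
    next_id = n_points  # the cluster created by merge i gets id N + i
    while len(active) > 1:
        a, b = min(combinations(sorted(active), 2),
                   key=lambda pair: single_linkage(active[pair[0]], active[pair[1]]))
        height = single_linkage(active[a], active[b])  # y-position of the bar
        members = active.pop(a) + active.pop(b)
        rows.append((a, b, height, len(members)))
        active[next_id] = members
        next_id += 1
    return rows


for id_a, id_b, height, size in dendrogram_rows(len(points)):
    print(f"join {id_a:>2} and {id_b:>2} at height {height:.3f} ({size} points)")
```

A vertical branch runs from the height where a cluster is created up to the height where it is joined again. Those spans are the branch lengths discussed below.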
How to Find the Optimal Number of Clusters?
- Observe the lengths of the branches, focusing especially on the longest branch in the dendrogram.
( the longest branch : a significant difference or distance between clusters )
- How to Find the Longest Branch in the Dendrogram?
: To identify the longest branch, measure the branches in the vertical direction. A branch's length is the vertical distance between the height where its cluster is formed and the height where that cluster is merged with another one. The longest vertical branch that no (extended) horizontal merge line crosses marks the largest distance between clusters. Draw a horizontal line through that branch: the number of vertical lines it intersects is the suggested number of clusters.
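For a monotone linkage such as single linkage, the longest uncrossed vertical branch corresponds to the largest gap between two consecutive merge heights. Cutting inside that gap leaves N minus (merges below the cut) clusters. The sketch below assumes single linkage and the same hypothetical points. The function names are illustrative.

```python
import math
from itertools import combinations

# Hypothetical example data: 7 two-dimensional points.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0),
          (9.0, 1.0), (9.5, 1.5), (4.0, 5.6)]


def single_linkage(c1, c2):
    """Assumed linkage: distance between the closest pair of members."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)


def merge_heights(n_points):
    """Heights of the N - 1 merges (horizontal bars) in the dendrogram, bottom to top."""
    clusters = [[i] for i in range(n_points)]
    heights = []
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]))
        heights.append(single_linkage(clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return heights


def clusters_from_longest_branch(heights, n_points):
    """Cut through the largest vertical gap between consecutive merge heights.

    Only gaps between merges are compared, so this rule never suggests 1 cluster.
    """
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(gaps)), key=lambda i: gaps[i])
    threshold = (heights[k] + heights[k + 1]) / 2  # horizontal cut line inside the gap
    # Merges 0..k lie below the cut, and each one reduced the cluster count by 1.
    return n_points - (k + 1), threshold


heights = merge_heights(len(points))
n_clusters, threshold = clusters_from_longest_branch(heights, len(points))
print(n_clusters)  # 3
print(f"cut at height {threshold:.2f}")
```

For these hypothetical points, the cut separates {0, 1}, {2, 3, 6} and {4, 5}. The horizontal line at `threshold` crosses exactly three vertical branches.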