Code&Data Insights

[Machine Learning] Clustering - Hierarchical Clustering | Agglomerative Hierarchical Clustering | Dendrograms


paka_corn 2023. 6. 19. 05:54

[ Hierarchical Clustering ]

Hierarchical Clustering : A data analysis technique that groups data into a hierarchy of nested clusters based on similarity or distance.

- Similarity is typically measured with a distance metric such as Euclidean distance or Manhattan distance

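- 2 approaches to hierarchical clustering :

1) Agglomerative - Bottom-up (start with every point as its own cluster and merge)

2) Divisive - Top-down (start with one cluster containing all points and split)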
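The two distance measures mentioned above can be sketched in a few lines. This is only an illustration: the function names `euclidean` and `manhattan` and the sample points are assumptions made for this example, not part of the original post.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Grid distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

The same pair of points can be closer or farther apart depending on the metric, so the choice of metric can change which clusters get merged first.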

 

 

[ Agglomerative Hierarchical Clustering ]

( Agglomerative Hierarchical Clustering :

Set up the initial clusters at the starting point, then keep merging the two most similar clusters until a stopping condition is met )

 

 

How Does It Work?

Step 1) 

Make each data point a single-point cluster => it forms N clusters

-> If there are 7 data points, there are 7 clusters. 

 

Step 2)

Take the two closest data points and make them one cluster => It forms N-1 clusters

-> There are 7 clusters (7 data points); the two closest data points will be combined into one cluster.

-> There are 6 clusters now. 

 

Step 3) 

Take the two closest clusters and make them one cluster => It forms N-2 clusters

-> There are 6 clusters (7 data points); the two closest clusters will be combined into one cluster.

-> There are 5 clusters now. 

Repeat STEP 3

 

Step 4)

Repeat STEP 3 until there is only one cluster 

-> There is one cluster(7 data points)

-> FINISH 
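The steps above can be sketched in plain Python. This is a minimal, unoptimized sketch under stated assumptions: the post does not say how the distance between two multi-point clusters is measured, so single linkage (distance between the closest pair of members) is assumed here, and the 7 sample points and the names `single_linkage` and `agglomerate` are made up for illustration.

```python
import math

def single_linkage(c1, c2):
    """Assumed cluster distance: the closest pair of points across the two clusters."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the number of clusters after each step and the distance
    at which each merge happened.
    """
    clusters = [[p] for p in points]      # Step 1: N single-point clusters
    counts = [len(clusters)]
    heights = []
    while len(clusters) > 1:              # Steps 2-4: repeat until one cluster
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and append their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        counts.append(len(clusters))
        heights.append(d)
    return counts, heights

# Hypothetical data set with 7 points
points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (9, 1), (9.5, 1.2), (5, 9)]
counts, heights = agglomerate(points)
print(counts)   # [7, 6, 5, 4, 3, 2, 1]
print(heights)  # merge distances, one per step
```

In practice a library routine is usually preferable; for example, SciPy offers `scipy.cluster.hierarchy.linkage`, which also supports other linkage rules such as complete, average, and Ward.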

 

 

 

[ Dendrograms ]

Dendrograms : Dendrograms are diagrams used in hierarchical clustering to display the hierarchical relationships between clusters.

- A dendrogram is a tree-like structure in which each branch represents a cluster, and the height at which two branches join indicates the distance (dissimilarity) between the clusters being merged.
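To make the tree structure concrete, here is a rough sketch that builds a dendrogram as nested dictionaries and prints it as indented text. The dictionary layout, the function names `build_dendrogram` and `show`, the four sample points, and the use of single linkage are all assumptions chosen for this illustration.

```python
import math

def build_dendrogram(points):
    """Return a nested tree; internal nodes store the height (distance) of the merge."""
    # Each entry pairs a tree node with the list of points it contains
    nodes = [({"point": p}, [p]) for p in points]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                # Single linkage (assumed): closest pair across the two clusters
                d = min(math.dist(a, b) for a in nodes[i][1] for b in nodes[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        tree = {"left": nodes[i][0], "right": nodes[j][0], "height": d}
        merged = (tree, nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

def show(tree, depth=0):
    """Print the tree with indentation; deeper lines are lower in the dendrogram."""
    pad = "  " * depth
    if "point" in tree:
        print(f"{pad}{tree['point']}")
    else:
        print(f"{pad}merge at height {tree['height']:.2f}")
        show(tree["left"], depth + 1)
        show(tree["right"], depth + 1)

# Hypothetical data: two tight pairs that are far from each other
root = build_dendrogram([(0, 0), (0, 1), (5, 0), (5, 2)])
show(root)
```

For an actual plot, SciPy's `scipy.cluster.hierarchy.dendrogram` draws the same kind of tree from the output of `linkage`.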

 

 

 

 

 

How to Find Optimal Number of Clusters? 

- Observe the lengths of the branches, focusing especially on the longest branch in the dendrogram.

( the longest branch : indicates a large distance between the clusters it separates )

 

Optimal # of Clusters in the example dendrogram : 2

 

- How to Find the Longest Branch in the Dendrogram?

: To identify the longest branch, measure the branches in the vertical direction. The length of a vertical branch is the distance between the height at which its cluster was formed and the height at which that cluster is merged into a larger one. Look for the longest vertical branch that no horizontal merge line crosses, and draw a horizontal cut through it. The number of vertical branches the cut crosses is the number of clusters.
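One common way to express the longest-branch rule in code is to look for the largest gap between consecutive merge heights and cut the dendrogram inside that gap. The sketch below assumes the merge heights are already known (for example, from a linkage computation); the function name `clusters_from_largest_gap` and the sample heights are hypothetical.

```python
def clusters_from_largest_gap(merge_heights):
    """Pick the number of clusters by cutting inside the largest gap between merges.

    merge_heights: one distance per merge, so N points give N-1 heights.
    Needs at least two heights. Returns (number_of_clusters, cut_height).
    """
    heights = sorted(merge_heights)
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)   # index of the largest gap
    cut = (heights[k] + heights[k + 1]) / 2           # cut in the middle of the gap
    # k + 1 merges lie below the cut, so N - (k + 1) clusters remain
    n_clusters = len(heights) + 1 - (k + 1)
    return n_clusters, cut

# Hypothetical merge heights for 7 data points (6 merges)
heights = [1.0, 1.0, 2.0, 3.0, 4.0, 10.0]
print(clusters_from_largest_gap(heights))  # (2, 7.0)
```

This is a heuristic rather than a guarantee: when several gaps are of similar size, the dendrogram does not single out one clear cluster count.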

 

 
