
Graph-Based Clustering

wangzf / 2023-02-28


Graph-Based Distance

Affinity Propagation

Affinity Propagation (Chinese: 亲和力传播)

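Affinity Propagation is a graph-based clustering algorithm that identifies "exemplars" (representative points) and "clusters" in the data. Unlike traditional algorithms such as K-Means, it does not require the number of clusters to be specified in advance, nor does it need randomly initialized cluster centers. Instead, it derives the final clustering from the pairwise similarities between data points, exchanging messages between points until a set of exemplars emerges.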
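
The message passing behind this can be sketched in a few lines of NumPy. This is a minimal illustration of the standard responsibility/availability updates, not the scikit-learn implementation. The function name `affinity_propagation_sketch`, its defaults, and the fixed iteration count are assumptions made for the example. The diagonal of `S` is assumed to already hold the preference value.

```python
import numpy as np

def affinity_propagation_sketch(S, damping=0.5, n_iter=200):
    """Illustrative Affinity Propagation on a similarity matrix S (n x n).

    S[i, k] is the similarity of point i to candidate exemplar k;
    the diagonal S[k, k] is the "preference" of k to be an exemplar.
    Returns (exemplar_indices, labels); labels are -1 if no exemplar emerges.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(n_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # and a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag  # self-availability is not clipped at 0
        A = damping * A + (1 - damping) * A_new

    # Points with positive a(k,k) + r(k,k) are taken as exemplars.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels

# Toy usage: two small, well-separated groups of 2-D points.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
S = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)  # negative squared distance
np.fill_diagonal(S, np.median(S))  # a common heuristic choice of preference
exemplars, labels = affinity_propagation_sketch(S)
print(exemplars, labels)
```

The number of clusters is never passed in. It falls out of the preference on the diagonal: lower values tend to yield fewer exemplars, higher values more.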

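The advantages of Affinity Propagation are that the number of clusters does not have to be specified in advance and that it can handle non-convex clusters. However, the algorithm is computationally expensive: it works on the full pairwise similarity matrix, so memory use grows quadratically with the number of samples. It also copes poorly with noise points and outliers.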
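
To make the storage cost concrete, here is a rough back-of-the-envelope sketch. It assumes three dense n x n matrices (similarity, responsibility, availability) stored as 8-byte floats. The helper name `dense_ap_memory_bytes` is hypothetical, and real implementations may allocate additional temporaries.

```python
def dense_ap_memory_bytes(n_samples, n_matrices=3, bytes_per_value=8):
    """Rough memory estimate for dense Affinity Propagation matrices.

    Assumption: n_matrices dense (n_samples x n_samples) arrays of
    bytes_per_value-byte numbers; temporaries are ignored.
    """
    return n_matrices * n_samples * n_samples * bytes_per_value

for n in (1_000, 10_000, 100_000):
    gib = dense_ap_memory_bytes(n) / 2**30
    print(f"{n:>7} samples -> about {gib:.2f} GiB")
```

Because the cost is quadratic, a tenfold increase in the number of samples multiplies the estimate by one hundred.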

import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.cluster import AffinityPropagation

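# model
# NOTE: X is not defined in this snippet; it is assumed to be a NumPy array
# of shape (n_samples, 2) prepared beforehand.
af = AffinityPropagation(preference=-563, random_state=0).fit(X)
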
# cluster centers
cluster_centers_indices = af.cluster_centers_indices_

# cluster labels
af_labels = af.labels_

# number of clusters
n_clusters_ = len(cluster_centers_indices)
print(n_clusters_)

# result
plt.close("all")
plt.figure(1)
plt.clf()

colors = cycle("bgrcmyk")  # cycle() repeats the colors indefinitely
for k, col in zip(range(n_clusters_), colors):
    class_members = af_labels == k
    cluster_center = X[cluster_centers_indices[k]]
    plt.plot(X[class_members, 0], X[class_members, 1], col + ".")
    plt.plot(
        cluster_center[0],
        cluster_center[1],
        "o",
        markerfacecolor=col,
        markeredgecolor="k",
        markersize=14,
    )
    for x in X[class_members]:
        plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)

plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()
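
The snippet above needs a 2-D array `X`, which the post does not show. The following is a minimal sketch for trying it out on synthetic data, not the author's dataset. The blob centers, sample count, and `preference=-50` are assumptions modelled on the kind of toy data scikit-learn's `make_blobs` produces; the author's `-563` was presumably tuned to different data. The sketch also checks for an empty exemplar list, which some scikit-learn versions return when the algorithm fails to converge.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Assumption: synthetic stand-in for X (three 2-D Gaussian blobs).
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# Assumption: preference=-50 is chosen for this toy data only.
af = AffinityPropagation(preference=-50, random_state=0).fit(X)

if len(af.cluster_centers_indices_) == 0:
    # No exemplars found; labels are not meaningful in this case.
    print("No exemplars found; try a different preference or damping.")
else:
    n_clusters = len(af.cluster_centers_indices_)
    sizes = np.bincount(af.labels_, minlength=n_clusters)
    print("clusters:", n_clusters)
    print("cluster sizes:", sizes.tolist())
```

With `X` defined this way, the plotting code above can be run unchanged, although the preference value would need to be adapted to the data.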

References