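In practice, you often need to find the center of several latitude/longitude points. There are three common ways to compute it, each suited to a different need.
Geographic midpoint
Finding the geographic midpoint is straightforward: convert each latitude/longitude pair to x, y, z coordinates, find the center of those points in the 3D coordinate system by averaging x, y and z, and convert the result back to latitude and longitude.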
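from math import cos, sin, atan2, sqrt, radians, degrees

def get_centroid(cluster):
    """Geographic midpoint of [lat, lon] pairs in degrees; returns [lon, lat]."""
    x = y = z = 0
    coord_num = len(cluster)
    for coord in cluster:
        # Convert each point to a unit vector in 3D Cartesian coordinates
        lat, lon = radians(coord[0]), radians(coord[1])
        x += cos(lat) * cos(lon)
        y += cos(lat) * sin(lon)
        z += sin(lat)
    # Average the vectors
    x /= coord_num
    y /= coord_num
    z /= coord_num
    # Convert back to degrees; note that the result is ordered [lon, lat]
    return [degrees(atan2(y, x)), degrees(atan2(z, sqrt(x * x + y * y)))]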
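In formula form, the code above does the following for n points. The symbols are chosen here for illustration: φ_i is the latitude and λ_i the longitude of point i, both in radians, and (φ_c, λ_c) is the resulting midpoint.

```latex
\begin{aligned}
x_i &= \cos\varphi_i \cos\lambda_i, \quad y_i = \cos\varphi_i \sin\lambda_i, \quad z_i = \sin\varphi_i \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i \\
\lambda_c &= \operatorname{atan2}(\bar{y}, \bar{x}), \quad \varphi_c = \operatorname{atan2}\!\left(\bar{z}, \sqrt{\bar{x}^2 + \bar{y}^2}\right)
\end{aligned}
```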
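Average latitude/longitude
The average latitude/longitude method treats the coordinates as planar and simply averages the latitudes and the longitudes. Note: this is only a rough estimate and is suitable only for points within about 400 km of each other.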
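def get_geo_mid(data):
    """Arithmetic mean of [lat, lon] pairs in degrees; returns (lat, lon)."""
    lat_sum = lon_sum = 0
    coord_num = len(data)
    for coord in data:
        lat = coord[0]
        lon = coord[1]
        lat_sum += lat
        lon_sum += lon
    # Return the averages, not the last point visited by the loop
    return lat_sum / coord_num, lon_sum / coord_num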
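One case where plain averaging visibly breaks down is a pair of points on either side of the 180° meridian. The sketch below compares the arithmetic mean of two longitudes with a mean taken over unit vectors, the same idea the geographic midpoint uses. The function names `naive_mean_lon` and `circular_mean_lon` are made up for this illustration.

```python
from math import atan2, cos, degrees, radians, sin


def naive_mean_lon(lons):
    """Plain arithmetic mean of longitudes given in degrees."""
    return sum(lons) / len(lons)


def circular_mean_lon(lons):
    """Mean longitude in degrees, computed from unit vectors on the circle."""
    x = sum(cos(radians(lon)) for lon in lons)
    y = sum(sin(radians(lon)) for lon in lons)
    return degrees(atan2(y, x))


# Two points only 2 degrees apart, on either side of the antimeridian
lons = [179.0, -179.0]
naive = naive_mean_lon(lons)
circular = circular_mean_lon(lons)

print(naive)          # 0.0, which is on the opposite side of the globe
print(abs(circular))  # approximately 180
```

The arithmetic mean lands at longitude 0, while the vector-based mean stays next to the two inputs.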
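Point of minimum distance
The point of minimum distance is the one among the given points whose distance to all the other points is smallest; it is commonly used in routing-related scenarios. A fairly simple way to implement it is K-Means with K set to 1. Note that the KMeans shipped with scikit-learn uses Euclidean distance and does not support a custom distance metric. The workaround is to implement it yourself: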
from math import radians, sin, cos, asin, sqrt
import numpy as np
import pandas as pd

def haversine(latlon1, latlon2):
    """
    Compute the distance in meters between two [lat, lon] points (NumPy arrays, in degrees)
    """
    # .any() is true when at least one coordinate differs; identical points are 0 apart
    if (latlon1 - latlon2).any():
        lat1, lon1 = latlon1
        lat2, lon2 = latlon2
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
        c = 2 * asin(sqrt(a))
        r = 6370996.81  # Earth radius in meters
        distance = c * r
    else:
        distance = 0
    return distance

## KMeans implementation: start
def haversine_distance_matrix(X, Y=None):
    """Haversine distance matrix calculation"""
    if Y is None:
        Y = X
    # X and Y hold [lon, lat] columns; swap them to [lat, lon] for haversine()
    return np.apply_along_axis(lambda a, b: np.apply_along_axis(haversine, 1, b, a), 1, X[:, [1, 0]], Y[:, [1, 0]])

def initialize_centroids(points, k):
    """returns k centroids from the initial points"""
    centroids = points.copy()
    np.random.shuffle(centroids)
    return centroids[:k]

def move_centroids(points, closest, centroids):
    """returns the new centroids assigned from the points closest to them"""
    new_centroids = [points[closest == k].mean(axis=0)
                     for k in range(centroids.shape[0])]
    # Keep the previous centroid for any cluster that ended up empty
    for i, c in enumerate(new_centroids):
        if np.isnan(c).any():
            new_centroids[i] = centroids[i]
    return np.array(new_centroids)

def closest_centroid_haversine(points, centroids):
    """returns an array containing the index to the nearest centroid for each
    point
    """
    distances = haversine_distance_matrix(centroids, points)
    return np.argmin(distances, axis=0)

def clustering_by_kmeams(df, n_clusters=1, max_iter=300):
    """
    Entry point of the KMeans clustering algorithm
    :param df: DataFrame with 'lon' and 'lat' columns, in degrees
    :return: DataFrame of centroids with 'lon' and 'lat' columns
    """
    X = df[['lon', 'lat']].to_numpy()
    centroids = initialize_centroids(X, n_clusters)
    old_centroids = centroids
    i = 0
    while i < max_iter:
        i += 1
        # print("Iteration#{0:d}".format(i))
        closest = closest_centroid_haversine(X, centroids)
        centroids = move_centroids(X, closest, centroids)
        # Stop once the centroids no longer move
        done = np.all(np.isclose(old_centroids, centroids))
        if done:
            break
        old_centroids = centroids
    cdf = pd.DataFrame(centroids, columns=['lon', 'lat'])
    # k_means_labels = closest_centroid_haversine(X, centroids)
    # df['assigned_points'] = k_means_labels + 1
    return cdf
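As written, `move_centroids` updates each centroid with the arithmetic mean of its members, so with a single cluster the listing above converges to the plain average of the lon/lat columns. For comparison, here is a minimal sketch of the definition stated in the text: pick the input point whose total great-circle distance to all the points is smallest. It is a brute-force O(n²) search, not the K-Means approach above. The names `haversine_matrix`, `min_distance_point` and `EARTH_RADIUS_M`, the radius value, and the sample coordinates are all assumptions made for this illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed value)


def haversine_matrix(latlon):
    """Pairwise great-circle distances in meters for an (n, 2) array of [lat, lon] in degrees."""
    lat = np.radians(latlon[:, 0])
    lon = np.radians(latlon[:, 1])
    # Broadcasting builds all n x n differences at once
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def min_distance_point(latlon):
    """Index of the input point with the smallest total distance to all points."""
    totals = haversine_matrix(latlon).sum(axis=1)
    return int(np.argmin(totals))


# Five points on the same parallel; the last one is far from the rest
points = np.array([
    [30.0, 120.0],
    [30.0, 120.1],
    [30.0, 120.2],
    [30.0, 120.3],
    [30.0, 121.0],
])
best = min_distance_point(points)
print(best, points[best])
```

The result is always one of the input points, whereas the mean longitude of this sample (120.32) is not. That matters when the center has to be a real location such as an existing stop or address.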

One question: if K-Means uses a single cluster, isn't the center just the arithmetic mean? What is the point of the third method?