Methods for Computing the Center Point of Multiple Latitude/Longitude Coordinates

钱魏Way · 10,791 views

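In practice, you often need to compute the center of several latitude/longitude coordinates. There are three common ways to do this, and each one suits a different need.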

Geographic midpoint

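Finding the geographic midpoint is straightforward: convert each latitude/longitude pair into Cartesian x, y, z coordinates, find the center of those points in the 3D coordinate system, and convert that center back to latitude and longitude.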

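from math import cos, sin, atan2, sqrt, radians, degrees


def get_centroid(cluster):
    """Geographic midpoint of (lat, lon) pairs given in degrees.

    Note the return order: [longitude, latitude], in degrees.
    """
    x = y = z = 0
    coord_num = len(cluster)
    for coord in cluster:
        # Convert each point to a unit vector in 3D Cartesian coordinates
        lat, lon = radians(coord[0]), radians(coord[1])
        x += cos(lat) * cos(lon)
        y += cos(lat) * sin(lon)
        z += sin(lat)
    # Average the vectors, then convert the mean vector back to angles
    x /= coord_num
    y /= coord_num
    z /= coord_num
    return [degrees(atan2(y, x)), degrees(atan2(z, sqrt(x * x + y * y)))]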

Average latitude/longitude

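The average latitude/longitude treats the coordinates as planar coordinates and simply averages the latitudes and the longitudes. Note: this is only a rough estimate, suitable only when the points lie within about 400 km of each other.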

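def get_geo_mid(data):
    """Arithmetic mean of (lat, lon) pairs given in degrees; returns (lat, lon)."""
    lat_sum = lon_sum = 0
    coord_num = len(data)

    for coord in data:
        lat_sum += coord[0]
        lon_sum += coord[1]

    # Return the averages, not the last point visited in the loop
    return lat_sum / coord_num, lon_sum / coord_num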
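
To see why the two methods above can disagree, here is a minimal sketch that runs both on a pair of nearby points and on a pair that straddles the 180° meridian. The function names `geographic_midpoint` and `average_lat_lon` and the sample coordinates are made up for this illustration; both functions return (lat, lon) in degrees.

```python
from math import atan2, cos, degrees, radians, sin, sqrt


def geographic_midpoint(points):
    """Midpoint of (lat, lon) pairs in degrees, computed via 3D unit vectors."""
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = radians(lat_deg), radians(lon_deg)
        x += cos(lat) * cos(lon)
        y += cos(lat) * sin(lon)
        z += sin(lat)
    # atan2 only depends on the direction of the summed vector,
    # so dividing by the number of points is not needed here.
    lat = degrees(atan2(z, sqrt(x * x + y * y)))
    lon = degrees(atan2(y, x))
    return lat, lon


def average_lat_lon(points):
    """Plain arithmetic mean of latitudes and longitudes."""
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


# Hypothetical sample data
near = [(39.90, 116.40), (39.95, 116.45)]   # a few kilometers apart
across = [(10.0, 179.0), (10.0, -179.0)]    # either side of the 180° meridian

print(geographic_midpoint(near), average_lat_lon(near))
print(geographic_midpoint(across), average_lat_lon(across))
```

For the nearby pair the two results are practically identical. For the pair across the 180° meridian, plain averaging puts the longitude at 0°, on the opposite side of the globe, while the 3D method stays next to the input points.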

Minimum-distance point

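The minimum-distance point answers the question: which of the given points is closest to all the others? It is often used in routing-related scenarios. A simple approach is to run K-Means with K set to 1. Note that the KMeans bundled with scikit-learn uses Euclidean distance and does not support a custom distance metric, so the workaround is to implement it yourself: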

from math import radians, sin, cos, asin, sqrt
import numpy as np
import pandas as pd


def haversine(latlon1, latlon2):
    """
    Great-circle distance in meters between two (lat, lon) points in degrees.
    Both arguments must be NumPy arrays.
    """
    # Identical points are 0 apart; if any component differs, use the formula
    if (latlon1 - latlon2).any():
        lat1, lon1 = latlon1
        lat2, lon2 = latlon2
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
        c = 2 * asin(sqrt(a))
        r = 6370996.81  # Earth radius in meters
        distance = c * r
    else:
        distance = 0
    return distance


# KMeans implementation
def haversine_distance_matrix(X, Y=None):
    """Haversine distance matrix; X and Y hold (lon, lat) rows in degrees"""
    if Y is None:
        Y = X
    return np.apply_along_axis(lambda a, b: np.apply_along_axis(haversine, 1, b, a), 1, X[:, [1, 0]], Y[:, [1, 0]])
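
The nested `apply_along_axis` call above invokes the Python-level `haversine` once per pair of points. As a possible alternative, the same matrix can be computed with NumPy broadcasting. This is a sketch, not the article's code: `haversine_matrix` and `EARTH_RADIUS_M` are names introduced here, and unlike the function above it assumes rows in (lat, lon) order.

```python
import numpy as np

EARTH_RADIUS_M = 6370996.81  # same radius as in the article's haversine(), in meters


def haversine_matrix(a, b):
    """Pairwise great-circle distances in meters.

    a has shape (n, 2) and b has shape (m, 2); rows are (lat, lon) in degrees.
    Returns an (n, m) array.
    """
    a = np.radians(np.asarray(a, dtype=float))
    b = np.radians(np.asarray(b, dtype=float))
    lat_a, lon_a = a[:, None, 0], a[:, None, 1]   # shape (n, 1)
    lat_b, lon_b = b[None, :, 0], b[None, :, 1]   # shape (1, m)
    h = (np.sin((lat_a - lat_b) / 2) ** 2
         + np.cos(lat_a) * np.cos(lat_b) * np.sin((lon_a - lon_b) / 2) ** 2)
    # Clip guards against rounding that pushes h slightly outside [0, 1]
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


pts = [[0.0, 0.0], [1.0, 0.0]]    # hypothetical points one degree of latitude apart
dist = haversine_matrix(pts, pts)
print(dist)
```

The diagonal of the result is zero, and the off-diagonal entries are the length of one degree of arc for the chosen radius.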


def initialize_centroids(points, k):
    """returns k centroids from the initial points"""
    centroids = points.copy()
    np.random.shuffle(centroids)
    return centroids[:k]


def move_centroids(points, closest, centroids):
    """returns the new centroids assigned from the points closest to them"""
    new_centroids = [points[closest == k].mean(axis=0)
                     for k in range(centroids.shape[0])]
    for i, c in enumerate(new_centroids):
        if np.isnan(c).any():
            new_centroids[i] = centroids[i]
    return np.array(new_centroids)


def closest_centroid_haversine(points, centroids):
    """returns an array containing the index to the nearest centroid for each
       point
    """
    distances = haversine_distance_matrix(centroids, points)
    return np.argmin(distances, axis=0)


def clustering_by_kmeams(df, n_clusters=1, max_iter=300):
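    """
    Entry point for KMeans clustering with haversine distance.
    :param df: DataFrame with 'lon' and 'lat' columns, in degrees
    :param n_clusters: number of clusters
    :param max_iter: maximum number of iterations
    :return: DataFrame of centroids with 'lon' and 'lat' columns
    """
    X = df[['lon', 'lat']].to_numpy()  # as_matrix() was removed in pandas 1.0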
    centroids = initialize_centroids(X, n_clusters)
    old_centroids = centroids
    i = 0
    while i < max_iter:
        i += 1
        # print("Iteration #{0:d}".format(i))
        closest = closest_centroid_haversine(X, centroids)
        centroids = move_centroids(X, closest, centroids)
        done = np.all(np.isclose(old_centroids, centroids))
        if done:
            break
        old_centroids = centroids
    cdf = pd.DataFrame(centroids, columns=['lon', 'lat'])
    # k_means_labels = closest_centroid_haversine(X, centroids)
    # df['assigned_points'] = k_means_labels + 1
    return cdf
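
With `n_clusters=1`, every point is assigned to the single centroid, so `move_centroids` returns the arithmetic mean of all `lon`/`lat` values. Read literally, the definition at the start of this section asks for something different: the one given point whose total distance to all the others is smallest (a medoid). A minimal sketch of that reading follows. It is a swapped-in brute-force search rather than the K-Means code above, and `haversine_m`, `min_distance_point`, and the sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(p, q, radius_m=6370996.81):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_m * asin(sqrt(min(1.0, a)))


def min_distance_point(points):
    """Return the input point with the smallest summed distance to all points.

    Brute force over all pairs, so the cost grows quadratically with len(points).
    """
    return min(points, key=lambda p: sum(haversine_m(p, q) for q in points))


# Hypothetical sample data: three points on one parallel plus one to the north
points = [(39.90, 116.30), (39.90, 116.50), (39.90, 116.40), (40.40, 116.40)]
best = min_distance_point(points)
print(best)
```

Unlike the mean, the result is always one of the input points, which matters when the center has to be a real location such as an existing stop or depot.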


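One Reply to “Methods for Computing the Center Point of Multiple Latitude/Longitude Coordinates”

  1. One question: if K-Means is run with a single cluster, isn't the centroid just the arithmetic mean? What is the point of the third method?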
