The offline geocoder we wanted

4 points by gipsyjaeger, about 2 months ago
What is this?

This is an offline reverse geocoder written in Python. Given a latitude-longitude pair, it returns the correct administrative region, such as country, state, or district, without calling any external APIs. This avoids API costs, rate limits, and network dependency.

Why build another reverse geocoder?

Most offline reverse geocoders rely on nearest-neighbor lookups. While fast, this approach often fails near borders because the closest location is not always the correct administrative region. This project focuses on correctness over proximity by verifying which boundary a coordinate actually falls inside.

How does it work?

A KD-Tree is used to quickly shortlist nearby administrative boundaries. For those candidates, the system performs polygon containment checks to confirm the true region. It supports both single-process execution for small workloads and multiprocessing for large batch processing.

Performance

The system processes 10,000 coordinates in under 2 seconds, with an average polygon validation time below 0.4 milliseconds per coordinate.

Who is this for?

Anyone who needs reverse geocoding, predictable costs, and large-scale batch processing.

Implementation notes

This started as a toy implementation to explore boundary-aware reverse geocoding, but it turned out to be reliable enough for real production use. The dataset covers more than 210 countries with over 145,000 administrative boundaries.

Links

Source code: https://github.com/SOORAJTS2001/gazetteer

Documentation: https://gazetteer.readthedocs.io/en/stable

Feedback is welcome, especially around the approach, performance trade-offs, and edge cases.
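
For anyone who wants to see the "shortlist with a KD-Tree, then confirm with a containment check" idea in code, here is a minimal standard-library sketch. It is not taken from the gazetteer source and does not use its API. `Region`, `build_kdtree`, `k_nearest`, `point_in_polygon`, and `reverse_geocode` are hypothetical names, and the three rectangular regions are made-up data. The sketch assumes flat x/y coordinates (no spherical geometry or antimeridian handling), simple polygons without holes, one reference point per region (the mean of its vertices), and a fixed shortlist size `k`.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as flat x/y in this sketch


@dataclass
class Region:  # hypothetical container, not a gazetteer type
    name: str
    polygon: List[Point]  # simple ring, no holes

    @property
    def anchor(self) -> Point:
        # Vertex mean, used as the region's reference point in the KD-tree.
        xs, ys = zip(*self.polygon)
        return (sum(xs) / len(xs), sum(ys) / len(ys))


@dataclass
class KDNode:
    index: int  # position in the anchors list
    axis: int   # 0 splits on x, 1 splits on y
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points, indices=None, depth=0):
    """Build a 2-D KD-tree by splitting on the median, alternating axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % 2
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def _search(node, points, target, k, heap):
    if node is None:
        return
    p = points[node.index]
    d2 = (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    if len(heap) < k:
        heapq.heappush(heap, (-d2, node.index))  # negated: worst match on top
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, node.index))
    diff = target[node.axis] - p[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    _search(near, points, target, k, heap)
    # Visit the far side only if the splitting line is closer than the worst match.
    if len(heap) < k or diff * diff < -heap[0][0]:
        _search(far, points, target, k, heap)


def k_nearest(tree, points, target, k):
    """Return up to k (squared_distance, index) pairs, closest first."""
    heap = []
    _search(tree, points, target, k, heap)
    return sorted((-neg_d2, i) for neg_d2, i in heap)


def point_in_polygon(pt, polygon):
    """Ray casting: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def reverse_geocode(pt, regions, tree, anchors, k=2):
    """Shortlist k regions by anchor distance, then keep the one containing pt."""
    for _, i in k_nearest(tree, anchors, pt, k):
        if point_in_polygon(pt, regions[i].polygon):
            return regions[i].name
    return None  # no shortlisted region contains the point


# Made-up regions: a wide "West", a narrow "East" next to it, and "North" above.
regions = [
    Region("West", [(0, 0), (10, 0), (10, 4), (0, 4)]),
    Region("East", [(10, 0), (12, 0), (12, 4), (10, 4)]),
    Region("North", [(0, 4), (10, 4), (10, 8), (0, 8)]),
]
anchors = [r.anchor for r in regions]
tree = build_kdtree(anchors)

query = (9.5, 2.0)  # inside West, but closer to East's anchor
nearest_only = regions[k_nearest(tree, anchors, query, k=1)[0][1]].name
verified = reverse_geocode(query, regions, tree, anchors, k=2)
print(nearest_only, verified)  # East West
```

The query point sits just inside the wide region's border, so a pure nearest-neighbor lookup picks the narrow neighbor, while the containment check picks the region the point actually falls in. The shortlist size is a trade-off in this sketch: a small `k` keeps the number of polygon checks low, but a very large region whose anchor is far from the query can be left out of the shortlist, in which case the function returns `None`. How the actual project chooses its candidates is not stated in the post.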