I built a real-time AR aircraft identifier. Here is the math that makes it work.
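I've been building an Android app that identifies aircraft overhead when you point your phone at the sky. The app fetches live ADS-B data and overlays aircraft labels on the camera feed, but getting the math right took much longer than I expected, so I wrote it all up.
The problem sounds simple: you have a GPS coordinate in the sky and a GPS coordinate in your hand, and you want a pixel. But there are four coordinate-frame transitions between those two things, and they have sign conventions that fail silently: wrong output, no error.
The pipeline: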
```
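Geodetic (lat, lon, alt)
↓ flat-earth approximation (valid <100 km; error <2 px at 50 NM range)
ENU: East, North, Up (metres)
↓ R⊤ from Android's TYPE_ROTATION_VECTOR sensor
Device frame (dX, dY, dZ)
↓ one sign flip: Cz = −dZ
Camera frame (Cx, Cy, Cz)
↓ perspective divide + FOV normalisation
Screen pixels (Xpx, Ypx)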
```
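Why each transition is non-obvious:
Geodetic → ENU. The East component has a cosine factor that many implementations miss: E = Δλ × (π·RE/180) × cos(φ_user), with Δλ in degrees. Meridians converge toward the poles, so one degree of longitude spans fewer metres at latitude 25° (about 100.8 km) than at the equator (about 111.2 km). Without the factor, East-West positions look correct near the equator and quietly drift as latitude increases.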
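A minimal sketch of this step, written in Python for easy checking rather than as the app's actual Android code. The function name `geodetic_to_enu`, the mean Earth radius of 6,371 km, and treating Up as a plain altitude difference are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius; the post only calls it RE

def geodetic_to_enu(user_lat, user_lon, user_alt_m, ac_lat, ac_lon, ac_alt_m):
    """Flat-earth ENU offset (metres) of the aircraft relative to the user.

    Latitudes and longitudes are in degrees, altitudes in metres.
    """
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    # Longitude degrees shrink with latitude, hence the cosine factor.
    east = (ac_lon - user_lon) * metres_per_degree * math.cos(math.radians(user_lat))
    north = (ac_lat - user_lat) * metres_per_degree
    up = ac_alt_m - user_alt_m  # assumption: no Earth-curvature correction
    return east, north, up

# Coordinates from the captured example; 18,000 ft = 5,486.4 m, user altitude assumed 0.
east, north, up = geodetic_to_enu(24.8600, 80.9813, 0.0, 24.9321, 81.0353, 5486.4)

# The same East offset with the cosine factor left out, for comparison.
east_without_cos = (81.0353 - 80.9813) * math.pi * EARTH_RADIUS_M / 180.0
```

With these rounded coordinates, East comes out near 5,450 m with the cosine factor and near 6,000 m without it, a gap of roughly 550 m at this latitude.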
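ENU → Device frame. Android's rotation matrix R maps device axes to ENU world axes. To go the other way you use R⊤. In Android's row-major FloatArray(9), that means stepping through column indices, not row indices: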
```
R  (forward, wrong here): dX = R[0]·E + R[1]·N + R[2]·U
R⊤ (inverse, correct):    dX = R[0]·E + R[3]·N + R[6]·U
```
These produce completely different results. Both compile without complaint.
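To make the index pattern concrete, here is a small Python sketch (not the app's Android code). `device_to_world` and `world_to_device` are hypothetical helper names, and the matrix is a hand-built 30° rotation about the Up axis standing in for whatever `SensorManager.getRotationMatrixFromVector` would fill in on a real device.

```python
import math

def device_to_world(r, d):
    """Apply R: r is a row-major 3x3 matrix stored as 9 floats."""
    dx, dy, dz = d
    return (r[0] * dx + r[1] * dy + r[2] * dz,
            r[3] * dx + r[4] * dy + r[5] * dz,
            r[6] * dx + r[7] * dy + r[8] * dz)

def world_to_device(r, enu):
    """Apply R-transpose: walk down the columns of r instead of along the rows."""
    e, n, u = enu
    return (r[0] * e + r[3] * n + r[6] * u,
            r[1] * e + r[4] * n + r[7] * u,
            r[2] * e + r[5] * n + r[8] * u)

# Stand-in rotation matrix: device frame rotated 30 degrees about the Up axis.
c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
R = [c, -s, 0.0,
     s,  c, 0.0,
     0.0, 0.0, 1.0]

enu = (100.0, 200.0, 50.0)          # arbitrary test vector in metres
correct = world_to_device(R, enu)   # column indices (R-transpose)
wrong = device_to_world(R, enu)     # row indices (R), the silent mistake
round_trip = device_to_world(R, correct)  # should give back enu
```

Both results have the same length as the input, which is part of why the mistake is easy to overlook. Only the round trip back through R recovers the original ENU vector.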
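Device → Camera frame. Android's sensor frame defines +Zd as pointing out of the screen toward your face. The camera convention requires +Cz to point into the scene. So Cz = −dZ, always, while Cx = dX and Cy = dY stay as they are. This is the only correction needed for portrait mode.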
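As a sketch of this step in Python: the helper names are hypothetical, and the in-front check is an addition that the post does not describe.

```python
def device_to_camera(d):
    """Portrait mode: keep X and Y, flip Z so +Cz points into the scene."""
    dx, dy, dz = d
    return (dx, dy, -dz)

def is_in_front(c):
    """True when the point lies ahead of the camera (Cz > 0)."""
    return c[2] > 0.0

# A device-frame vector whose sign-flipped form matches the captured example.
camera = device_to_camera((729.0, 4692.0, -10077.0))
```

The check is worth having because the perspective divide would otherwise mirror points behind the phone onto the screen.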
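Camera → Screen. After the perspective divide and FOV normalisation, the Y axis flips: Ypx = (1 − NDCy) × H/2. Camera +Cy points up, while screen y=0 is at the top. Miss this and an aircraft above the horizon is drawn below screen centre.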
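A Python sketch of the projection, under two assumptions that the post does not spell out: that "FOV normalisation" means dividing by tan(θ/2), and that the X counterpart is Xpx = (1 + NDCx) × W/2. The function name `camera_to_screen` is hypothetical.

```python
import math

def camera_to_screen(c, width, height, fov_h_deg, fov_v_deg):
    """Project a camera-frame point to pixel coordinates, or None if behind the camera."""
    cx, cy, cz = c
    if cz <= 0.0:
        return None
    # Perspective divide, then scale so the edge of the field of view maps to +/-1.
    ndc_x = (cx / cz) / math.tan(math.radians(fov_h_deg) / 2.0)
    ndc_y = (cy / cz) / math.tan(math.radians(fov_v_deg) / 2.0)
    x_px = (1.0 + ndc_x) * width / 2.0
    y_px = (1.0 - ndc_y) * height / 2.0  # flip: camera +Y is up, screen +Y is down
    return x_px, y_px

# Camera-frame vector, screen size and FOV from the captured example.
x_px, y_px = camera_to_screen((729.0, 4692.0, 10077.0), 1080, 1997, 66.0, 50.0)
```

Under those assumptions the captured vector lands at roughly (600.2, 1.5), in line with the logged (600 px, 1 px).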
Real captured values (ATR 72, 18,000 ft):
```
User:     24.8600°N, 80.9813°E
Aircraft: 24.9321°N, 81.0353°E
ENU: E≈5,450 m N=8,014 m U=5,486 m
Bearing 34.2° (NNE), Elevation 29.5°, Range 11.1 km
Camera frame (after R⊤ + sign fix): (729, 4692, 10077)
Magnitude: 11,140 m ≈ 11,138 m (ENU range)
Screen (1080×1997, θH=66°, θV=50°): (600 px, 1 px)
```
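Phone azimuth 33.0°, aircraft bearing 34.2° → 1.2° right of centre in azimuth. Phone pitch −4.3° (read here as tilted 4.3° above the horizon), elevation 29.5° → roughly 25° above the optical axis, which matches the camera-frame vector (atan(4692/10077) ≈ 25.0°) and sits just inside the top edge of the frustum (θV/2 = 25°). Physically consistent throughout.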
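The bearing, elevation and range figures can be cross-checked from the ENU offset with a few lines of Python. This is a sketch with a hypothetical helper name; N and U are taken from the listing, while E is recomputed from the listed coordinates with the cosine factor (about 5,448 m, assuming RE = 6,371 km).

```python
import math

def bearing_elevation_range(east, north, up):
    """Return (bearing in degrees clockwise from North, elevation in degrees, slant range in m)."""
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    elevation = math.degrees(math.atan2(up, math.hypot(east, north)))
    slant_range = math.sqrt(east**2 + north**2 + up**2)
    return bearing, elevation, slant_range

bearing, elevation, slant_range = bearing_elevation_range(5448.0, 8014.0, 5486.0)

# A pure rotation preserves length, so the camera-frame vector should match the range.
camera_magnitude = math.sqrt(729.0**2 + 4692.0**2 + 10077.0**2)
```

These inputs give a bearing near 34.2°, an elevation near 29.5° and a slant range near 11.14 km, and the camera-frame magnitude agrees with that range to within a few metres.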
Happy to answer questions about any stage of the pipeline, or anything else that is of interest.