

We are building a UAV system to physically intercept fast-moving targets (100 km/h+) using onboard compute only (Qualcomm QRB5165).

We hit a wall regarding the latency vs. resolution trade-off and I’d love to hear some battle-tested opinions from the CV/embedded community.

The constraint: We need HD resolution to detect small targets at range, but running inference on full HD frames kills our control loop frequency (target is <20 ms glass-to-motor response).

We are debating two architectural paths:

Option A: Static Tiling (SAHI-style)
Slice the HD frame into overlapping tiles.
Pro: High detection probability for small objects.
Con: Even with NMS-free architectures, the inference time on the DSP effectively triples. Latency spikes cause our Proportional Navigation guidance to oscillate.

Option B: Dynamic ROI ("The Sniper Approach")
Run a low-res global search (320x320) at high FPS. Once a target is found, lock a dynamic high-res Region of Interest (ROI) from the raw camera stream and only run inference on that crop.
Pro: Extremely fast. Keeps the loop tight.
Con: Single point of failure. If the tracker (Kalman filter) loses the crop due to abrupt ego-motion, we are blind until global search re-acquires. In a terminal-phase intercept, that’s a miss.

Has anyone here successfully implemented robust Dynamic ROI on edge silicon (Jetson/Hexagon DSP) for erratic targets? Are we over-engineering this, or is full-frame HD inference simply dead on arrival for real-time guidance?

Any pointers to papers or repos are appreciated.

PS: If you live for these kinds of problems (and enjoy solving them in Munich), we are looking for a Founding Engineer to own this entire pipeline. Email in profile.
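To make the two options concrete, here is a minimal Python sketch of the geometry involved. It is illustrative only and not the actual pipeline. The function names (`tile_grid`, `roi_from_track`), the 640 px tile, the 128 px overlap, the 3-sigma margin, and the min/max crop sizes are all assumptions chosen for the example. `tile_grid` lists the overlapping tiles Option A would have to run inference on. `roi_from_track` sizes an Option B crop from a tracker's predicted position and position uncertainty, and returns `None` when the uncertainty no longer fits in one crop, which is the point where a fallback to global search would be needed.

```python
import math


def tile_grid(frame_w, frame_h, tile, overlap_px):
    """Option A: top-left corners of overlapping square tiles covering the frame."""
    stride = tile - overlap_px  # distance between neighbouring tile origins

    def starts(length):
        if length <= tile:
            return [0]
        s = list(range(0, length - tile, stride))
        s.append(length - tile)  # last tile sits flush with the frame edge
        return s

    return [(x, y) for y in starts(frame_h) for x in starts(frame_w)]


def roi_from_track(cx, cy, sigma_x, sigma_y, target_px, frame_w, frame_h,
                   k_sigma=3.0, min_size=128, max_size=640):
    """Option B: square crop around the predicted target position.

    cx, cy           predicted target centre in full-frame pixels
    sigma_x, sigma_y predicted position std-dev in pixels (e.g. from a
                     Kalman filter covariance; assumed to be supplied)
    target_px        expected apparent target size in pixels
    Returns (x0, y0, size), or None if the crop would exceed max_size,
    meaning the track is too uncertain and global search should take over.
    """
    margin = k_sigma * max(sigma_x, sigma_y)
    size = max(int(math.ceil(target_px + 2.0 * margin)), min_size)
    if size > max_size:
        return None
    # Centre the crop, then clamp it so it stays inside the frame.
    x0 = min(max(int(round(cx - size / 2)), 0), frame_w - size)
    y0 = min(max(int(round(cy - size / 2)), 0), frame_h - size)
    return x0, y0, size


if __name__ == "__main__":
    print(len(tile_grid(1920, 1080, 640, 128)), "tiles per frame")
    print(roi_from_track(960, 540, 10, 10, 20, 1920, 1080))
    print(roi_from_track(960, 540, 120, 120, 20, 1920, 1080))
```

With these assumed parameters a 1920x1080 frame needs 8 tile inferences per frame, while a well-tracked target needs a single 128 px crop. Growing the crop with the tracker's uncertainty, rather than keeping it fixed, is one possible way to trade a little latency for robustness when ego-motion inflates the prediction error.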

Ask HN: Dynamic ROI vs. Tiling for high-speed target tracking (<20 ms latency)?