Ask HN: Dynamic ROI vs. tiling for high-speed object tracking (<20 ms latency)?

1 point by LucaHerakles 3 days ago
We are building a UAV system to physically intercept fast-moving targets (100+ km/h) using onboard compute only (Qualcomm QRB5165).

We hit a wall regarding the latency vs. resolution trade-off, and I'd love to hear some battle-tested opinions from the CV/embedded community.

The constraint: we need HD resolution to detect small targets at range, but running inference on full HD frames kills our control loop frequency (the target is <20 ms glass-to-motor response).

We are debating two architectural paths:

Option A: Static Tiling (SAHI-style). Slice the HD frame into overlapping tiles.

Pro: High detection probability for small objects.

Con: Even with NMS-free architectures, the inference time on the DSP effectively triples. Latency spikes cause our Proportional Navigation guidance to oscillate.

Option B: Dynamic ROI ("The Sniper Approach"). Run a low-res global search (320x320) at high FPS. Once a target is found, lock a dynamic high-res region of interest (ROI) from the raw camera stream and run inference only on that crop.

Pro: Extremely fast. Keeps the loop tight.

Con: Single point of failure. If the tracker (Kalman filter) loses the crop due to abrupt ego-motion, we are blind until the global search re-acquires. In a terminal-phase intercept, that's a miss.

Has anyone here successfully implemented robust dynamic ROI on edge silicon (Jetson/Hexagon DSP) for erratic targets? Are we over-engineering this, or is full-frame HD inference simply dead on arrival for real-time guidance?

Any pointers to papers or repos are appreciated.

PS: If you live for these kinds of problems (and enjoy solving them in Munich), we are looking for a Founding Engineer to own this entire pipeline. Email in profile.
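Some rough numbers behind the resolution/latency constraint, using a simple pinhole-camera model. The 90° horizontal field of view, the 0.5 m target width and the 100 m range are hypothetical values chosen for illustration. They are not figures from the post.

```python
import math

def focal_px(image_width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels for a given horizontal field of view."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

def apparent_size_px(object_m: float, range_m: float, image_width_px: int, hfov_deg: float) -> float:
    """Approximate width in pixels of an object seen side-on at the given range."""
    return focal_px(image_width_px, hfov_deg) * object_m / range_m

def travel_m(speed_kmh: float, latency_s: float) -> float:
    """Distance a target covers during one latency interval."""
    return speed_kmh / 3.6 * latency_s

# Hypothetical optics: 90 degree HFOV; hypothetical 0.5 m target at 100 m.
full = apparent_size_px(0.5, 100, 1920, 90)  # full-HD width
low = apparent_size_px(0.5, 100, 320, 90)    # same optics, frame resized to 320 px wide
print(f"target width: {full:.1f} px at 1920 wide, {low:.1f} px at 320 wide")
print(f"travel in 20 ms at 100 km/h: {travel_m(100, 0.020):.2f} m")
```

Under these assumptions the target is about 4.8 px wide at 1920 px and 0.8 px wide at 320 px. It also moves about 0.56 m during one 20 ms glass-to-motor interval.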
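This is a minimal sketch of the slicing step in Option A. It assumes square 640-pixel tiles with 20% overlap on a 1920x1080 frame; both values are hypothetical and do not come from the post. The sketch only generates tile windows and shifts tile-local boxes back to frame coordinates. Merging duplicate detections across overlapping tiles is left out.

```python
from __future__ import annotations

def tile_starts(length: int, tile: int, overlap_ratio: float) -> list[int]:
    """Start offsets for tiles of size `tile` covering [0, length) with at least the given overlap."""
    if tile >= length:
        return [0]  # a single tile; an axis shorter than the tile would need padding
    stride = max(1, round(tile * (1 - overlap_ratio)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts

def make_tiles(width: int, height: int, tile: int = 640, overlap_ratio: float = 0.2) -> list[tuple[int, int, int, int]]:
    """(x0, y0, x1, y1) windows for slicing one frame, row by row."""
    return [(x, y, x + tile, y + tile)
            for y in tile_starts(height, tile, overlap_ratio)
            for x in tile_starts(width, tile, overlap_ratio)]

def tile_box_to_frame(box: tuple, window: tuple) -> tuple:
    """Shift a detection (x0, y0, x1, y1) from tile-local to full-frame pixels."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)

tiles = make_tiles(1920, 1080)
print(len(tiles), tiles[:2])
```

With these numbers, a 1080p frame is cut into 8 tiles, and per-frame inference cost grows roughly with the tile count. The actual multiplier depends on tile size, overlap and how efficiently the DSP processes the passes.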
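For reference on the guidance side, classical proportional navigation commands a lateral acceleration proportional to the line-of-sight (LOS) rate: a_c = N' * V_c * dλ/dt. Here N' is the navigation constant (commonly 3 to 5), V_c is the closing speed and λ is the LOS angle. The sketch below is planar and assumes relative position and velocity (target minus interceptor) are available in a common frame. The name `pn_accel` and the default N' = 4 are illustrative choices, not the post's implementation.

```python
import math

def pn_accel(rx: float, ry: float, vx: float, vy: float, nav_gain: float = 4.0) -> float:
    """Planar proportional navigation: commanded lateral acceleration N' * Vc * LOS rate.
    (rx, ry) and (vx, vy) are relative position and velocity, target minus interceptor."""
    r2 = rx * rx + ry * ry
    r = math.sqrt(r2)
    los_rate = (rx * vy - ry * vx) / r2       # time derivative of atan2(ry, rx), rad/s
    closing_speed = -(rx * vx + ry * vy) / r  # positive while range is shrinking
    return nav_gain * closing_speed * los_rate

# Target 100 m ahead, closing at 30 m/s with 3 m/s lateral drift.
print(pn_accel(100.0, 0.0, -30.0, 3.0))
```

The command is linear in the LOS rate. Any error in the LOS-rate estimate therefore reaches the actuators scaled by N' * V_c, including errors from stale or irregularly timed measurements.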
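The coordinate bookkeeping in Option B can be sketched as three small functions:

1. Map a hit from the 320x320 search input back to the full-resolution frame.
2. Cut a square crop around the tracker's predicted position.
3. Shift crop-local detections back to frame coordinates.

This sketch makes two assumptions. The search input is a plain stretch of the frame, with no letterboxing. The names `Box`, `roi_window`, `min_side` and `k` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

def search_to_frame(box: Box, frame_w: int, frame_h: int, net_size: int = 320) -> Box:
    """Map a box from the 320x320 search input back to full-resolution pixels.
    Assumes the frame was stretched (not letterboxed) to net_size x net_size."""
    sx, sy = frame_w / net_size, frame_h / net_size
    return Box(box.x0 * sx, box.y0 * sy, box.x1 * sx, box.y1 * sy)

def roi_window(cx: float, cy: float, sigma_px: float, frame_w: int, frame_h: int,
               min_side: int = 256, k: float = 3.0) -> tuple[int, int, int]:
    """Square crop (x0, y0, side) centred on the predicted target position.
    The side covers +/- k*sigma of position uncertainty; the window is clamped inside the frame."""
    side = max(min_side, int(2 * k * sigma_px))
    side = min(side, frame_w, frame_h)
    x0 = min(max(int(round(cx - side / 2)), 0), frame_w - side)
    y0 = min(max(int(round(cy - side / 2)), 0), frame_h - side)
    return x0, y0, side

def roi_to_frame(box: Box, x0: int, y0: int) -> Box:
    """Shift a detection from crop-local to full-frame pixels."""
    return Box(box.x0 + x0, box.y0 + y0, box.x1 + x0, box.y1 + y0)
```

The crop is sized from the tracker's position uncertainty (`sigma_px`) rather than fixed, so it widens while the filter is coasting. This may reduce, but does not remove, the risk of losing the target under abrupt ego-motion. A crop larger than the network input would still need resizing, which gives back part of the resolution gain.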
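The post mentions a Kalman filter as the tracker. Below is a minimal constant-velocity filter for a single image axis, in pure Python. It assumes decoupled x/y axes, a piecewise-constant white-noise acceleration model and a pixel-space state; `r` and `q` are hypothetical tuning values. The filter does not model ego-motion. A fuller version might fold gyro-derived camera rotation into the predict step.

```python
class AxisKF:
    """Constant-velocity Kalman filter for one image axis.
    State: [position (px), velocity (px/s)]; measurement: position only."""

    def __init__(self, pos: float, r: float = 4.0, q: float = 5e4):
        self.x = [pos, 0.0]                # velocity initially unknown, assumed zero
        self.P = [[r, 0.0], [0.0, 1e4]]    # wide prior on velocity
        self.r = r                         # measurement noise variance (px^2), hypothetical
        self.q = q                         # acceleration variance (px^2/s^4), hypothetical

    def predict(self, dt: float) -> float:
        """Propagate state and covariance by dt seconds; returns predicted position."""
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        q = self.q
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**4 / 4, b + dt * d + q * dt**3 / 2],
            [c + dt * d + q * dt**3 / 2, d + q * dt**2],
        ]
        return self.x[0]

    def update(self, z: float) -> None:
        """Fuse a position measurement z (px)."""
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation variance H P H^T + R
        k0, k1 = a / s, c / s         # Kalman gain P H^T / s
        y = z - self.x[0]             # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]

    @property
    def sigma(self) -> float:
        """1-sigma position uncertainty in pixels."""
        return self.P[0][0] ** 0.5


# Demo: target moving at 500 px/s, noiseless 60 Hz measurements for 2 s.
dt = 1 / 60
kf = AxisKF(pos=100.0)
for n in range(1, 121):
    kf.predict(dt)
    kf.update(100.0 + 500.0 * n * dt)
print(round(kf.x[1], 1))
```

One instance runs per axis. If several frames arrive without a detection, repeated `predict` calls make `sigma` grow. That is the kind of quantity that could size an uncertainty-scaled ROI crop.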
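The single-point-of-failure behaviour described in Option B is essentially a two-state machine. The system searches until it finds something, tracks on the ROI, and falls back to search after losing the target. A minimal sketch follows; `Mode`, `ModeController` and `max_misses` are hypothetical names, and the default of three misses is arbitrary.

```python
from enum import Enum, auto

class Mode(Enum):
    SEARCH = auto()  # low-res (e.g. 320x320) pass over the whole frame
    TRACK = auto()   # high-res inference on the predicted ROI only

class ModeController:
    """Hypothetical search/track switch for the dynamic-ROI pipeline.
    Tolerates up to max_misses - 1 consecutive ROI misses (coasting on the tracker's
    prediction) and returns to global search on the max_misses-th miss."""

    def __init__(self, max_misses: int = 3):
        self.mode = Mode.SEARCH
        self.misses = 0
        self.max_misses = max_misses

    def step(self, detected: bool) -> Mode:
        """Consume this frame's detection result; return the mode to run on the next frame."""
        if self.mode is Mode.SEARCH:
            if detected:
                self.mode, self.misses = Mode.TRACK, 0
        elif detected:
            self.misses = 0
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.mode = Mode.SEARCH
        return self.mode


ctl = ModeController(max_misses=3)
modes = [ctl.step(d) for d in [False, True, True, False, False, False, True]]
print([m.name for m in modes])
```

The time without a fresh detection is then roughly `max_misses` ROI frames plus however many search frames re-acquisition takes. Lowering `max_misses` shortens the coast, but a single dropped detection then costs a full global search.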