Ask HN: Can coding assistants "see" attached images?
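I've been using Cursor and I'm genuinely curious about something.

When you paste a screenshot of a broken UI and it immediately spots the misaligned div or padding issue, is it actually doing visual analysis, or just pattern-matching against common UI bugs from training data?

The speed feels almost too fast for real vision processing. And it seems to understand spatial relationships and layout in a way that feels different from just describing an image.

Are these tools using standard vision models, or is there preprocessing? How much comes from the image vs. the surrounding code context?

Does anyone know the technical details of what's actually happening under the hood?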