Hacker News (Chinese edition)

I've been using Cursor and I'm genuinely curious about something.

When you paste a screenshot of a broken UI and it immediately spots the misaligned div or padding issue, is it actually doing visual analysis, or just pattern-matching against common UI bugs from training data?

The speed feels almost too fast for real vision processing. And it seems to understand spatial relationships and layout in a way that feels different from just describing an image.

Are these tools using standard vision models, or is there preprocessing? How much comes from the image vs. the surrounding code context?

Anyone know the technical details of what's actually happening under the hood?

Ask HN: Can coding assistants "see" attached images?