Show HN: Droidrun – an LLM agent for Android

Posted by nodueck 6 days ago
Hi HN,

I'm Nikolai, a software engineer and co-founder at DroidRun. We built DroidRun, an LLM-based agent that leverages the Android Accessibility Tree for precise control and understanding of UI elements. It works on real phones and emulators, and it's open source.

**How it started:**

Our co-founder Niels Schmidt (you'll see him in the demos) coded a prototype and shared a quick video. It went viral, picking up about 50k views on X in under two hours. That moment pushed us to go all-in on DroidRun, and soon after we open-sourced it.

**How it works:**

Most agents rely on screenshots alone for context. We do that too, but we also feed the Accessibility Tree into the LLM. That gives the model structural, hierarchical, and spatial metadata about the UI elements.

Here's an example. Screenshot of a real UI: [https://imgur.com/a/ePRLpyv](https://imgur.com/a/ePRLpyv)

And a matching accessibility JSON snippet:

```json
{
  "index": 3,
  "resourceId": "com.android.settings:id/search_action_bar",
  "className": "LinearLayout",
  "text": "search_action_bar",
  "bounds": "42, 149, 1038, 338",
  "children": [
    {
      "index": 4,
      "resourceId": "com.android.settings:id/search_bar_title",
      "className": "TextView",
      "text": "In Einstellungen suchen",
      "bounds": "189, 205, 768, 282",
      "children": []
    }
  ]
}
```
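To make that concrete, here's a minimal sketch of how a tree like the one above could be flattened into compact, indexed lines for the model's context. This is an illustration only, not our actual serializer; `flatten_tree` and the exact line format are made up for this example.

```python
# Minimal sketch: flatten the accessibility JSON into indexed lines an LLM
# can reference by number. Illustrative only, not DroidRun's real serializer.

def flatten_tree(node: dict, depth: int = 0, lines: list | None = None) -> list:
    """Depth-first walk that emits one indented line per UI element."""
    if lines is None:
        lines = []
    lines.append(
        f"{'  ' * depth}[{node['index']}] {node['className']} "
        f"text={node.get('text', '')!r} bounds=({node['bounds']})"
    )
    for child in node.get("children", []):
        flatten_tree(child, depth + 1, lines)
    return lines

# Using the snippet from above (resourceId fields omitted for brevity):
tree = {
    "index": 3, "className": "LinearLayout",
    "text": "search_action_bar", "bounds": "42, 149, 1038, 338",
    "children": [{
        "index": 4, "className": "TextView",
        "text": "In Einstellungen suchen", "bounds": "189, 205, 768, 282",
        "children": [],
    }],
}
print("\n".join(flatten_tree(tree)))
# [3] LinearLayout text='search_action_bar' bounds=(42, 149, 1038, 338)
#   [4] TextView text='In Einstellungen suchen' bounds=(189, 205, 768, 282)
```

The indices are the important part: the model can refer to an element by number instead of guessing pixel coordinates.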
We also annotate UI regions in the screenshots with numbers, then match them against the tree. This structure gives the agent a deep understanding of what's on screen, even across different device types like tablets.
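Here's a similarly rough sketch of that overlay step, assuming Pillow and the node shape from the JSON above; `parse_bounds` and `annotate` are hypothetical helpers, not our actual code.

```python
# Rough illustration of the annotation step: draw each node's index onto the
# screenshot at its "bounds", so image regions line up with tree nodes.
# Assumes Pillow is installed; not DroidRun's actual implementation.
from PIL import Image, ImageDraw

def parse_bounds(bounds: str) -> tuple:
    """'42, 149, 1038, 338' -> (left, top, right, bottom) pixel box."""
    left, top, right, bottom = (int(v) for v in bounds.split(","))
    return (left, top, right, bottom)

def annotate(screenshot_path: str, nodes: list, out_path: str) -> None:
    img = Image.open(screenshot_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for node in nodes:
        box = parse_bounds(node["bounds"])
        draw.rectangle(box, outline="red", width=4)
        # Label the box with the same index the LLM sees in the tree.
        draw.text((box[0] + 8, box[1] + 8), str(node["index"]), fill="red")
    img.save(out_path)

# annotate("screen.png", [{"index": 3, "bounds": "42, 149, 1038, 338"}], "out.png")
```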
This combination allows for better generalization across devices and screen sizes: the agent can act with greater confidence and fewer hallucinations.

**Current status:**

- Ranked #1 on AndroidWorld until recently (it has become highly competitive)
- Supports real devices and emulators
- Strong performance on both simple and complex UI tasks
- Gemini 2.5 Pro works best so far, but we're iterating fast

**What's next:**

We're working on a cloud platform where you can run prompts on Android devices without any setup. Think of an LLM controlling a phone in the cloud, ready to test your automations.

**Looking for:**

- Feedback from HN
- Collaborators who love Android, LLMs, and agents
- OSS contributors