I built a screen-aware desktop assistant; now it can write and use your computer

Author: luthiraabeykoon · about 1 month ago
I posted Julie here a few days ago as a weekend prototype: an open-source desktop assistant that lives as a tiny overlay and uses your screen as context (instead of copy/paste, tab switching, etc.).

Update: I just shipped Julie v1.0, and the big change is that it's no longer only "answer questions about my screen." It can now run agents (writing/coding) and a computer-use mode via a CUA toolkit. (https://tryjulie.vercel.app/)

What that means in practice:

- General AI assistant: it hears what you hear, sees what you see, and gives you real-time answers to any question instantly.
- Writing agent: draft/rewrite in your voice, then iterate with you while staying in the overlay (no new workspace).
- Coding agent: help you implement/refactor with multi-step edits, while your editor stays the "source of truth."
- Computer-use agent: when you want, it can take the "next step" (click/type/navigate) instead of just telling you what to do.

The goal is still the same: don't break my flow. I want the assistant to feel like a tiny utility that helps for 20 seconds and disappears, not a second life you manage.

A few implementation notes/constraints (calling these out because I'm sure people will ask):

- Permissions are opt-in (screen + accessibility/automation), and it's meant to be used with you watching, not running silently.
- The UI is intentionally minimal; I'm trying hard not to turn it into a full chat app with tabs/settings/feeds.

Repo + installers are here: https://github.com/Luthiraa/julie

Would love feedback on two things:

1. If you've built/used computer-use agents: what safety/UX patterns actually feel acceptable day-to-day?
2. What's the one workflow you'd want this to do end-to-end without context switching?
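On the safety/UX question: one pattern that tends to feel acceptable for computer-use agents is a confirmation gate, where low-risk actions run immediately and anything that clicks, types, or navigates is described to the user first. A minimal Python sketch of that idea follows; the action names, dict shape, and `confirm` callback are my assumptions for illustration, not Julie's actual API:

```python
# Hypothetical confirmation gate for computer-use actions.
# Action names and structure are assumptions, not Julie's real API.

SAFE_ACTIONS = {"move", "scroll"}                 # executed without prompting
CONFIRM_ACTIONS = {"click", "type", "navigate"}   # need explicit user approval

def gate_action(action: dict, confirm) -> bool:
    """Return True if the proposed agent action should be executed.

    `confirm` is a callback (e.g. an overlay prompt) that shows the
    pending action to the user and returns their yes/no answer.
    """
    kind = action.get("type")
    if kind in SAFE_ACTIONS:
        return True
    if kind in CONFIRM_ACTIONS:
        # Keep the human in the loop: describe the step before doing it.
        return bool(confirm(f"Allow agent to {kind} ({action.get('target', '?')})?"))
    # Unknown or unlisted actions are rejected by default (deny-list fallback).
    return False

# Usage: a click requires approval; an unknown action is always refused.
approved = gate_action({"type": "click", "target": "Submit button"},
                       confirm=lambda msg: True)
rejected = gate_action({"type": "exec_shell"}, confirm=lambda msg: True)
```

The deny-by-default fallback matches the "used with you watching, not running silently" constraint above: the agent can only do what the allowlists and the user explicitly permit.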