Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps

11 | Author: mahmoud-almadi | 4 days ago
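Hi HN, we're Mahmoud and Alan, and we're building Cyberdesk (<a href="https://www.cyberdesk.io">https://www.cyberdesk.io</a>), a deterministic computer use agent for automating Windows desktop applications. Developers use us to automate repetitive tasks in legacy software in healthcare, accounting, construction, and more, by executing clicks and keystrokes directly on the desktop.
<p>Here are a couple of demos of Cyberdesk's computer use agent:</p>
Completing a lightning-fast file import automation into a legacy desktop app: <a href="https://youtu.be/H_lRzrCCN0E" rel="nofollow">https://youtu.be/H_lRzrCCN0E</a>
<p>Working on a monster of a Windows monolith called OpenDental (this one also showcases the agent's learning process): <a href="https://youtu.be/nXiJDebOJD0" rel="nofollow">https://youtu.be/nXiJDebOJD0</a>.</p>
Filing a W-2 tax form: <a href="https://youtu.be/6VNEzHdc8mc" rel="nofollow">https://youtu.be/6VNEzHdc8mc</a>
<p>Many industries are stuck with legacy Windows desktop applications, and their staff are plagued by repetitive, incredibly time-consuming tasks. Vendors offering automations for these tasks end up writing brittle Robotic Process Automation (RPA) scripts or hiring offshore teams to execute the tasks manually. RPA often breaks due to inevitable UI changes or unexpected popups, like a Windows update or a random in-app notification. Offshore teams are often unreliable and costlier than software, and they're not always an option for regulated industries.</p>
<p>I previously built RPA scripts impacting 20K+ employees at a Fortune 100 company, where I experienced RPA's brittleness and inflexibility firsthand. It was obvious to me that this was a band-aid solution to an unsolved problem. Alan was building a computer use agent for his previous startup and realized its huge potential to automate manual computer tasks across many industries, so we started working on Cyberdesk.</p>
<p>Computer use models can struggle with abstract, long-horizon tasks, but they excel at making context-aware decisions on a screen-by-screen basis, so they're a good fit for automating these desktop apps.</p>
<p>The key to reliability is crafting prompts that are highly specific and well thought out. Much like with ChatGPT, vague or ambiguous prompts won't get you the results you want. This is especially true in computer use because the model is processing nearly an entire desktop screen's worth of extra visual information; without precise instructions, it doesn't know which details to focus on or how to act.</p>
<p>Unlike RPA, Cyberdesk's agents don't blindly replay clicks. They read the screen state before every action and self-correct when flows drift (popups, latency, UI changes). Unlike off-the-shelf computer use AIs, Cyberdesk runs deterministically in production: the agent primarily follows the steps it has learned and only falls back to reasoning when anomalies occur. Cyberdesk learns workflows from natural-language instructions, capturing nuance and handling dynamic tasks, far beyond what a simple screen recording of a few runs can encode.</p>
<p>This approach is good for both reliability and cost: reliability, because we fall back to a computer use model in unexpected situations; and cost, because computer use models are expensive and we only use them when we need to. Otherwise, we use faster, more affordable visual LLMs to check the screen state step by step during deterministic runs. Our agents are also equipped with tools such as failsafes, data extraction, and screen evaluation to handle dynamic and sensitive situations.</p>
<p>How it works: you install our open source driver on any Windows machine (<a href="https://github.com/cyberdeyyoyoubackhackersk-hq/cyberdriver" rel="nofollow">https://github.com/cyberdeyyoyoubackhackersk-hq/cyberdriver</a>). It communicates with our backend to receive commands (click, type, scroll, screenshot) and sends back data (screenshots, API responses, etc.). You give our computer use agent a detailed natural-language description of the process for a given task, just like an SOP for an employee learning a new task for the first time. The agent then uses computer use AI models to learn the steps and memorizes them by saving each screenshot alongside its action (click on these coordinates, type XYZ, wait for the page to load, etc.).</p>
<p>The agent runs through these steps deterministically, so execution is fast and predictable. To account for popups and UI changes, the agent checks the live screen state against the memorized state to determine whether it's safe to proceed with the memorized step. If no major changes prevent safe execution of the memorized step, it proceeds; otherwise, it falls back to a computer use model with context on past actions and the remaining task.</p>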
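As a rough illustration of the replay-and-fallback loop described above, here is a minimal Python sketch. It is not Cyberdesk's actual implementation: `Step`, `run_workflow`, `FakeDesktop`, and all the callback names are hypothetical, screenshots are stood in for by plain strings, and the "visual check" is a simple equality test. It also assumes the fallback model only needs to recover the expected screen before replay resumes, which the post does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One memorized step (hypothetical structure)."""
    expected_screen: str  # stand-in for the screenshot saved during learning
    action: str           # e.g. "click 120,340" or "type XYZ"


def run_workflow(
    steps: List[Step],
    capture_screen: Callable[[], str],
    screens_match: Callable[[str, str], bool],
    execute: Callable[[str], None],
    fallback: Callable[[str, List[str], List[Step]], None],
) -> List[str]:
    """Replay memorized steps; call `fallback` when the live screen has drifted.

    Returns a log with one entry per replayed step or fallback.
    """
    log: List[str] = []
    done: List[str] = []
    for index, step in enumerate(steps):
        live = capture_screen()
        if not screens_match(live, step.expected_screen):
            # Assumption: the fallback gets the live screen, the actions taken
            # so far, and the remaining steps, and tries to recover.
            fallback(live, list(done), steps[index:])
            log.append(f"fallback:{live}")
            live = capture_screen()
            if not screens_match(live, step.expected_screen):
                raise RuntimeError(f"could not recover before step {index}")
        execute(step.action)
        done.append(step.action)
        log.append(f"replayed:{step.action}")
    return log


class FakeDesktop:
    """Toy desktop: a queue of screens, with a popup in the middle."""

    def __init__(self) -> None:
        self.screens = ["login", "popup", "form"]

    def capture(self) -> str:
        return self.screens[0]

    def execute(self, action: str) -> None:
        # Any action advances the toy desktop to the next screen.
        self.screens.pop(0)

    def dismiss(self, live: str, done: List[str], remaining: List[Step]) -> None:
        # Stand-in for the computer use model closing the unexpected popup.
        self.screens.pop(0)


def demo() -> List[str]:
    desktop = FakeDesktop()
    steps = [Step("login", "type password"), Step("form", "click submit")]
    return run_workflow(
        steps,
        capture_screen=desktop.capture,
        screens_match=lambda live, expected: live == expected,
        execute=desktop.execute,
        fallback=desktop.dismiss,
    )
```

In this toy run, the first step replays directly, the popup triggers one fallback call, and the second step then replays once the expected screen is back. A real system would compare screenshots with a vision model rather than string equality.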
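<p>Customers are currently using us for manual tasks like importing and exporting files from legacy desktop applications, booking patient appointments in a desktop PMS, and data entry such as filling out patient profiles in an EMR.</p>
<p>We don't have a self-serve option yet, but we'd love to onboard you manually. Book a demo here to learn more: <a href="https://www.cyberdesk.io">https://www.cyberdesk.io</a>. If you'd rather wait for the self-serve option, please submit your email here (<a href="https://forms.gle/HfQLxMXKcv9Eh8Gs8" rel="nofollow">https://forms.gle/HfQLxMXKcv9Eh8Gs8</a>) so you can be notified as soon as it's ready. You can also check out our docs here: <a href="https://docs.cyberdesk.io">https://docs.cyberdesk.io</a>.</p>
<p>We'd love to hear your thoughts on our approach and on desktop automation for legacy industries!</p>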