


Hey HN! We’re Chris, Jorrie, and Evan of BrowserBook, an IDE for writing and debugging Playwright-based web automations. You can download it as a Mac app here: <a href="https://browserbook.com">https://browserbook.com</a>, and there’s a demo video at <a href="https://www.youtube.com/watch?v=ODGJBCNqGUI" rel="nofollow">https://www.youtube.com/watch?v=ODGJBCNqGUI</a>.

Why we built this: When we were going through YC, we were a company that automated back-office healthcare workflows. Since the interoperability ecosystem in healthcare is so fragmented, we started using browser agents to automate EMRs, practice management software, and payment portals directly through the web. When we did, we ran into a ton of problems:

- Speed: High latency on LLM calls vs. a scripting approach
- Cost: We burned through tokens with all the context we needed to make the automations reasonably accurate
- Reliability: Even with detailed instructions, context, and tools, agents tended to drift on multi-step tasks in unpredictable ways
- Debuggability: When drift did occur, we were essentially playing whack-a-mole in our prompt and re-running the whole automation to debug issues (see above: speed and cost issues made this quite painful)

More and more we were just giving our agent scripts to execute. Eventually, we came to the conclusion that scripting is a better approach to web automation for this sort of use case. But scripting was also too painful, so we set out to solve those problems with BrowserBook.

Under the hood, it runs a standalone TypeScript REPL wired directly into an inline browser instance, with built-in tooling to make script development quick and easy. This includes:

- A fully interactive browser window directly in the IDE so you can run your code without context switching
- A Jupyter-notebook-style environment - the idea here is you can write portions of your automation in individual cells and run them individually (and quickly reset manually in the browser), instead of having to rerun the whole thing every time
- An AI coding assistant which uses the DOM context of the current page to write automation logic, which helps avoid digging around for selectors
- Helper functions for taking screenshots, data extraction, and managed authentication for auth-required workflows

Once you’ve created your automation, you can run it directly in the application or in our hosted environment via API, so you can use it in external apps or agentic workflows.

At its core, BrowserBook is an Electron app, so we can run a Chrome instance directly in the app without the need for cloud-hosted browsers. For API runs, we use hosted browser infra via Kernel (which is a fantastic product, btw), relying on their bot anti-detection capabilities (stealth mode, proxies, etc.).

Scripted automation can be unpopular because scripts are inherently brittle; unlike “traditional” software development, your code is deployed in an environment you don’t control - someone else’s website. With BrowserBook, we’re trying to “embrace the suck”, and acknowledge this “offensive programming” environment.

We’ve designed it from the ground up to assume scripts will break, and aim to provide the tools that make building and maintaining them easier. In the future, our plan is to leverage AI where it has shown its strength already - writing code - to minimize downtime and quickly repair broken scripts as the deployed environment changes.

Browser agents promised to solve this by handing the reins to an LLM which can handle inconsistency and ambiguity. While we think there are some applications where browser agents can be genuinely helpful, tasks that need to be done reliably and repeatedly are not among them.

We’d love for you to try it out! You can download BrowserBook from our website here: <a href="https://browserbook.com">https://browserbook.com</a> (only available for Mac so far, sorry!) And of course, we’d appreciate any feedback and comments you have!
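
For anyone who wants a concrete picture of the notebook-style workflow mentioned above, here is a minimal sketch of the general idea: named cells that share one live session, so a single cell can be rerun without replaying the whole flow. This is not BrowserBook's API. `Notebook`, `Session`, `cell`, `run`, and `runAll` are hypothetical names made up for illustration, and the session is a plain object standing in for a real Playwright `page`.

```typescript
// Hypothetical sketch of notebook-style cells; none of these names are BrowserBook's API.

// State shared by every cell. In a real script this would hold a Playwright `page`.
interface Session {
  loggedIn: boolean;
  rows: string[];
}

type CellFn<S> = (session: S) => Promise<void>;

class Notebook<S> {
  private cells = new Map<string, CellFn<S>>();
  private session: S;
  readonly log: string[] = [];

  constructor(session: S) {
    this.session = session;
  }

  // Register a named cell; re-adding a name replaces its body, like editing a cell.
  cell(name: string, fn: CellFn<S>): this {
    this.cells.set(name, fn);
    return this;
  }

  // Run one cell against the live session without replaying earlier cells.
  async run(name: string): Promise<void> {
    const fn = this.cells.get(name);
    if (!fn) throw new Error(`unknown cell: ${name}`);
    await fn(this.session);
    this.log.push(name);
  }

  // Run every cell in insertion order, as a full end-to-end run would.
  async runAll(): Promise<void> {
    for (const name of this.cells.keys()) {
      await this.run(name);
    }
  }
}

async function demo(): Promise<string[]> {
  const nb = new Notebook<Session>({ loggedIn: false, rows: [] });

  nb.cell("login", async (s) => {
    // A real cell would drive the browser here, e.g. page.goto / page.fill / page.click.
    s.loggedIn = true;
  });

  nb.cell("extract", async (s) => {
    if (!s.loggedIn) throw new Error("run the login cell first");
    s.rows = ["invoice-1", "invoice-2"]; // placeholder for scraped data
  });

  await nb.runAll();       // full run: login, then extract
  await nb.run("extract"); // iterate on one cell; the session is still logged in
  return nb.log;
}
```

The point of the design is that the expensive state (a logged-in browser) outlives any single cell, so debugging the extraction step does not mean paying for the login step again.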

Launch HN: BrowserBook (YC F24) – IDE for deterministic browser automation