Launch HN: Jazzberry (YC X25) – AI agent for finding bugs

Posted by MarcoDewey, 9 months ago
Hey HN,

We are building Jazzberry ([https://jazzberry.ai](https://jazzberry.ai)), an AI bug finder that automatically tests your code when a pull request is opened, to find and flag real bugs before they are merged.

Here's a demo video: [https://www.youtube.com/watch?v=L6ZTu86qK8U#t=7](https://www.youtube.com/watch?v=L6ZTu86qK8U#t=7)

We built Jazzberry to help you find bugs in your code base. Here's how it works:

When a PR is made, Jazzberry clones the repo into a secure sandbox. The diff from the PR is provided to the AI agent as context. To interact with the rest of the code base, the agent can execute bash commands inside the sandbox, and the output of those commands is fed back to it. This means the agent can read/write files, search, install packages, run interpreters, execute code, and so on. It observes the outcomes and iteratively tests to pinpoint bugs, which are then reported back in the PR as a markdown table. (A rough sketch of this loop is appended at the end of this post.)

Jazzberry is focused on dynamically testing your code in a sandbox to confirm the presence of real bugs. We are not a general code review tool; our only aim is to provide concrete evidence of what's broken and how.

Here are some real examples of bugs we have found so far (hypothetical sketches of the first and third patterns are also appended below):

*Authentication Bypass (Critical)* – When `AUTH_ENABLED` is `False`, the `get_user` dependency in `home/api/deps.py` always returns the first superuser, bypassing authentication and potentially leading to unauthorized access. It also defaults to the superuser when the authenticated auth0 user is not present in the database.

*Insecure Header Handling (High)* – The server doesn't validate header names or values, allowing injection of malicious headers and potentially leading to security issues.

*API Key Leakage (High)* – Different error messages in browser console logs revealed whether an API key was valid, allowing attackers to brute-force valid credentials by distinguishing format errors from authorization errors.

Working on this, we've realized how much the rise of LLM-generated code is amplifying the need for better automated testing. Traditional code coverage metrics and manual code review are already becoming less effective against thousands of lines of LLM-generated code, and we expect this to intensify over time: the complexity of AI-authored systems will ultimately require even more sophisticated AI tooling for effective validation.

Our backgrounds: Mateo has a PhD in reinforcement learning and formal methods, with over 20 publications and 350 citations. Marco holds an MSc in software testing, specializing in LLMs for automated test generation.

We are actively building and would love your honest feedback!
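
To make the loop above concrete, here is a minimal sketch of that observe/act cycle. It is illustrative only: `run_in_sandbox`, `query_llm`, the JSON action format, and the 20-step budget are simplified placeholders, not our production implementation.

```python
import json
import subprocess

MAX_STEPS = 20


def run_in_sandbox(command: str, workdir: str) -> str:
    """Run one bash command inside the sandboxed checkout and capture its output."""
    result = subprocess.run(
        ["bash", "-lc", command],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=120,
    )
    return result.stdout + result.stderr


def find_bugs(pr_diff: str, repo_dir: str, query_llm) -> str:
    """Drive the agent loop: show the diff, let the model run commands, return a markdown report.

    `query_llm` stands in for whatever chat-completion call is used; it takes the
    message history and returns a JSON-encoded action.
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a bug-finding agent. Reply with JSON, either "
                '{"action": "bash", "command": "..."} to test something, or '
                '{"action": "report", "markdown": "..."} with a markdown table of confirmed bugs.'
            ),
        },
        {"role": "user", "content": f"Pull request diff:\n{pr_diff}"},
    ]
    for _ in range(MAX_STEPS):
        step = json.loads(query_llm(messages))
        if step["action"] == "report":
            return step["markdown"]  # posted back on the PR as a comment
        output = run_in_sandbox(step["command"], repo_dir)
        messages.append({"role": "assistant", "content": json.dumps(step)})
        messages.append({"role": "user", "content": f"Command output:\n{output}"})
    return "| Bug | Evidence |\n|---|---|\n| none confirmed within the step budget | – |"
```

The property we care about is that every claim in the final report is backed by command output the agent actually observed in the sandbox.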
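For anyone curious what the first finding looks like in code, here is a self-contained sketch of the same pattern. The names mirror the report (`AUTH_ENABLED`, `get_user`), but this is a simplified illustration, not the actual `home/api/deps.py`.

```python
from dataclasses import dataclass
from typing import Optional

AUTH_ENABLED = False  # feature flag controlling whether auth is enforced


@dataclass
class User:
    username: str
    is_superuser: bool = False


# Stand-in for the users table; the first entry is the superuser.
USERS = [User("admin", is_superuser=True), User("alice"), User("bob")]


def lookup_auth0_user(token: Optional[str]) -> Optional[User]:
    """Placeholder for the real auth0 token -> database user lookup."""
    return next((u for u in USERS if token == f"token-{u.username}"), None)


def get_user(token: Optional[str] = None) -> User:
    if not AUTH_ENABLED:
        # BUG: with the flag off, every caller is treated as the first superuser,
        # so authentication is bypassed entirely.
        return USERS[0]
    user = lookup_auth0_user(token)
    if user is None:
        # BUG: an authenticated auth0 user who is missing from the database
        # silently falls back to the superuser instead of being rejected.
        return USERS[0]
    return user


# The kind of dynamic check that exposes the bug: an unauthenticated caller gets admin.
assert get_user(token=None).is_superuser
```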
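The third finding follows a classic pattern: two distinguishable error messages give an attacker an oracle for separating malformed keys from well-formed but unauthorized ones. A minimal sketch (the key format and messages are hypothetical), alongside the single-message fix:

```python
import re

VALID_KEYS = {"sk_live_c0ffee"}  # hypothetical stored keys


def check_api_key_leaky(key: str) -> str:
    """Leaky: the message reveals whether the key at least has the right format."""
    if not re.fullmatch(r"sk_live_[0-9a-f]{6}", key):
        return "error: malformed API key"      # attacker learns the format is wrong
    if key not in VALID_KEYS:
        return "error: unauthorized API key"   # attacker learns the format is right
    return "ok"


def check_api_key(key: str) -> str:
    """Safer: one generic message no matter why the key was rejected."""
    if not re.fullmatch(r"sk_live_[0-9a-f]{6}", key) or key not in VALID_KEYS:
        return "error: invalid API key"
    return "ok"


assert check_api_key_leaky("not-a-key") != check_api_key_leaky("sk_live_000000")  # the leak
assert check_api_key("not-a-key") == check_api_key("sk_live_000000")              # fixed
```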