Show HN: An experiment in giving coding agents long-term memory

Author: yacc22, 5 days ago
Recently I wanted to take a couple of steps closer to what people call AGI.

A lot of friends tell me it's impossible. At the end of the day these models are just gradient descent and probability machines, basically a very sophisticated Markov chain. Obviously I'm exaggerating lol, but you get the idea.

Still, I wanted to see if better infrastructure around the model could push agents a little closer to being reliable individual coders instead of assistants with severe amnesia.

After reading a bunch of papers on the topic, I started experimenting with persistent memory and guided learning for agents.

The idea is simple:

Every action an agent takes gets stored. When future agents work on the same repo, they receive condensed "tips and tricks" derived from past successes and failures.

These memories are embedded so a supervisor layer can semantically search for relevant context and inject it into the prompt without blowing up the context window.

I'm also experimenting with letting agents communicate with each other so multiple tasks can run in parallel, even if they overlap in code, and resolve conflicts themselves.

To test this, I built a small site. Right now it's very bare bones, but it lets you connect:

- a GitHub repo
- an Anthropic API key
- Linear issues

Then you create issues labeled gradient-agent in the To-Do column and the agents start working on them.

The more you iterate on a repo, the more "learning" accumulates around it. In theory this should make future PRs more stable.

The long-term goal is a system that can plan, design, code, test, and review an entire PR on its own without constant prompting.

Now the unfortunate part: I'm a poor university student and this infrastructure is not exactly cheap lol. I added monthly free credits so people can test it, but once those run out you either wait for the reset or pay as you go. (LLM API costs still fall on you.)

Also: it's very much a beta, so be careful with extremely complex tasks.

I'm considering hosting something like qwen3.5:122b so people can test without their own API keys. Credits would run out faster, but it might be cheaper overall. Curious if people would want that.

Site: https://usegradient.dev

Originally I planned to limit testing to ~20 people, but honestly I'm curious how it behaves with more users. Pretty please don't DDoS me.

Any feedback is appreciated.
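For anyone curious what the memory layer could look like, here is a minimal Python sketch of the store → embed → retrieve → inject loop described above. Everything in it is illustrative, not the actual implementation: the trigram-hash `embed()` is a toy stand-in for a real embedding model, and the class and function names are made up for this example.

```python
import hashlib
import math


def embed(text: str, dim: int = 256) -> list[float]:
    # Toy stand-in for a real embedding model: hash character trigrams
    # into a fixed-size unit vector. Illustrative only.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class MemoryStore:
    """Stores condensed 'tips and tricks' from past agent runs."""

    def __init__(self):
        self.entries = []  # list of (embedding, tip) pairs

    def record(self, tip: str) -> None:
        # Every condensed lesson from an agent run gets embedded and stored.
        self.entries.append((embed(tip), tip))

    def relevant(self, task: str, k: int = 3) -> list[str]:
        # Semantic search: the k tips closest to the task description.
        q = embed(task)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [tip for _, tip in ranked[:k]]


def build_prompt(task: str, store: MemoryStore, budget: int = 500) -> str:
    # Inject only the most relevant tips, capped by a character budget,
    # so the context window stays small no matter how much history exists.
    tips, used = [], 0
    for tip in store.relevant(task):
        if used + len(tip) > budget:
            break
        tips.append("- " + tip)
        used += len(tip)
    return "Tips from past runs:\n" + "\n".join(tips) + "\n\nTask: " + task
```

In a real deployment `embed()` would call an embedding API and the entries would live in a vector database; the character budget is just the simplest way to bound how much injected context a tip search can add.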