Show HN: System Prompt Learning – LLMs learn to solve problems from experience
I built a system that lets LLMs automatically learn and improve problem-solving strategies over time, inspired by Andrej Karpathy's idea of a "third paradigm" for LLM learning.
The basic idea: instead of using a static system prompt, the LLM builds up a database of strategies that actually work for different problem types. When you give it a new problem, it selects the most relevant strategies, applies them, then evaluates how well they worked and refines them. (A rough sketch of this loop follows the example below.)
For example, after seeing enough word problems, it learned this strategy:
1) Read carefully and identify unknowns,
2) Define variables with units,
3) Write equations,
4) Solve step-by-step,
5) Verify the answer.
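Here's a rough sketch of that select → apply → evaluate → refine loop. The helper names, the keyword-overlap matching, and the file-backed store are stand-ins of my own, not the plugin's actual internals; "llm" is any callable that takes a system and user prompt and returns text:

    # Hypothetical sketch of the strategy loop, not the plugin's real code.
    import json, os

    DB = "strategies.json"  # human-readable store you can inspect and edit

    def load():
        return json.load(open(DB)) if os.path.exists(DB) else []

    def save(strategies):
        json.dump(strategies, open(DB, "w"), indent=2)

    def solve(problem, llm):
        strategies = load()
        # 1) Select: naive keyword overlap stands in for real relevance matching.
        relevant = [s for s in strategies
                    if any(k in problem.lower() for k in s["keywords"])][:3]
        # 2) Apply: inject the selected strategies into the system prompt.
        system = "Strategies that worked before:\n" + "\n".join(s["text"] for s in relevant)
        answer = llm(system=system, user=problem)
        # 3) Evaluate + refine: have the model critique the attempt and store the lesson.
        lesson = llm(system="Critique this solution and state a reusable strategy.",
                     user="Problem: %s\nSolution: %s" % (problem, answer))
        strategies.append({"keywords": problem.lower().split()[:5], "text": lesson})
        save(strategies)
        return answer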
All strategies are stored as human-readable JSON that you can inspect and edit.
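To give a feel for it, a stored record might look something like this (hypothetical field names, not the plugin's exact schema):

    {
      "id": "strategy_042",
      "problem_type": "algebra word problem",
      "strategy": "Read carefully and identify unknowns; define variables with units; write equations; solve step-by-step; verify the answer.",
      "success_count": 12,
      "refinement_count": 3
    }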
I tested it on math benchmarks and saw decent improvements - 8.6% better on Arena Hard, 6.67% on AIME24. After 500 queries, the system had created 129 strategies and refined 97 of them.
The implementation is an open-source plugin for optillm (our inference optimization proxy). It works with any OpenAI-compatible API - you just add the "spl-" prefix to your model name. It has two modes: inference-only (uses existing strategies) and learning mode (creates and refines strategies).
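In practice that means you keep your existing client code and only change the model name. A minimal example with the standard OpenAI Python client, assuming optillm is running locally on its default port (adjust base_url to your setup):

    from openai import OpenAI

    # Point the standard client at the optillm proxy instead of the upstream API.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-key")

    resp = client.chat.completions.create(
        model="spl-gpt-4o-mini",  # the "spl-" prefix routes the request through the plugin
        messages=[{"role": "user", "content":
                   "A train leaves at 3pm going 60 mph. When has it covered 150 miles?"}],
    )
    print(resp.choices[0].message.content)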
What's interesting is that it bridges the gap between the sophisticated system prompts that production AI systems use and the basic prompts most of us work with. Your model literally gets better at the types of problems you throw at it.
I built it because I noticed that ChatGPT, Claude, etc. have incredibly detailed system prompts with problem-solving frameworks, but most developers use basic prompts and miss out on those performance gains. The approach is inspired by Andrej Karpathy's tweet about a "third paradigm" for LLM learning beyond just pretraining and fine-tuning: <a href="https://x.com/karpathy/status/1921368644069765486" rel="nofollow">https://x.com/karpathy/status/1921368644069765486</a>
The strategies are completely transparent - you can see exactly what the system has learned and why it makes the decisions it does. No black-box learning.
<a href="https://github.com/codelion/optillm/tree/main/optillm/plugins/spl">https://github.com/codelion/optillm/tree/main/optillm/plugins/spl</a>
Would love feedback on the approach. Has anyone else experimented with letting LLMs learn from their own experience?