Show HN: DeepThink plugin – bringing Gemini 2.5's parallel reasoning to open models
I built an open-source plugin in OptiLLM that implements Google's "Deep Think" reasoning approach for local models like DeepSeek R1 and Qwen3.
Google's recent Gemini 2.5 report introduced Deep Think, a technique where the model generates multiple hypotheses in parallel and critiques them before arriving at a final answer. The approach achieved state-of-the-art results on math olympiad and competitive coding benchmarks.
The plugin works by modifying the inference pipeline to explore multiple solution paths simultaneously and then synthesize the best one. Instead of single-pass generation, the model essentially runs an internal debate before responding.
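The generate-critique-synthesize loop described above can be sketched in a few lines. This is a toy illustration, not the plugin's actual API: the function names are made up, and a seeded random scorer stands in for the model's sampling and self-critique.

```python
import random


def generate_candidates(prompt: str, n: int = 4, seed: int = 0) -> list[str]:
    """Stand-in for n parallel reasoning passes over the same prompt.
    A real implementation would sample the model n times at temperature > 0."""
    rng = random.Random(seed)
    return [
        f"hypothesis {i}: {prompt} (score={rng.random():.2f})"
        for i in range(n)
    ]


def critique(candidate: str) -> float:
    """Stand-in critic: extract each draft's score. A real implementation
    would ask the model to critique and rate its own drafts."""
    return float(candidate.rsplit("=", 1)[1].rstrip(")"))


def deep_think(prompt: str, n: int = 4) -> str:
    """Generate n hypotheses in parallel, critique them, keep the best."""
    candidates = generate_candidates(prompt, n)
    # "Synthesis" here is just selection; the real plugin can merge drafts.
    return max(candidates, key=critique)
```

The real trade-off (noted in the details below) is that all n passes cost full inference time, so quality is bought with latency.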
Technical details:

- Works with any model that supports structured reasoning patterns
- Implements parallel thinking during response generation
- Particularly effective for complex reasoning tasks, math, and coding problems
- Increases inference time but significantly improves answer quality
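For a sense of how this is invoked: OptiLLM runs as an OpenAI-compatible proxy and selects an approach by prefixing the model name. The sketch below assumes that convention applies to this plugin too; the `deepthink-` slug, port, and model name are my assumptions, so check the repo README for the actual invocation.

```python
def build_request(base_model: str, prompt: str) -> dict:
    """Build chat-completion kwargs that route through the plugin
    via OptiLLM's approach-prefix convention (assumed slug: deepthink)."""
    return {
        "model": f"deepthink-{base_model}",
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = build_request("Qwen3-32B", "How many primes are there below 100?")

# Sending the request requires a running OptiLLM proxy, e.g.:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
#   resp = client.chat.completions.create(**kwargs)
```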
Link: [https://github.com/codelion/optillm/tree/main/optillm/plugins/deepthink](https://github.com/codelion/optillm/tree/main/optillm/plugins/deepthink)

Demo: [https://www.youtube.com/watch?v=b06kD1oWBA4](https://www.youtube.com/watch?v=b06kD1oWBA4)
The implementation won the Cerebras & OpenRouter Qwen 3 Hackathon, but more importantly, it's now available to anyone running local models.
Questions for HN:

- Has anyone tried similar parallel reasoning approaches with local models?
- What other proprietary techniques do you think would be valuable to open-source?
- Any suggestions for optimizing the performance trade-offs?
The goal is to democratize advanced reasoning capabilities that were previously locked behind APIs. Would love feedback on the approach and ideas for improvements.