Native Parallel Reasoner: Self-Evolving to Learn Parallel Reasoning

1 | Author: jacklanda | 6 days ago
Hey there!

We propose a parallel reasoner that aims to mitigate hallucination and enhance fairness in reasoning.

You're welcome to upvote it on Hugging Face!

[Twitter Post] https://x.com/ZilongZheng/status/1998252267783516444
[Project Page & Demo] https://bigai-nlco.github.io/Native-Parallel-Reasoner
[GitHub Repo] https://github.com/bigai-nlco/Native-Parallel-Reasoner
[HF Paper] https://huggingface.co/papers/2512.07461
[arXiv Preprint] https://arxiv.org/abs/2512.07461

Highlight: This model is a proof of concept (PoC) of a native parallel reasoning system. Unlike popular multi-agent approaches, which obtain multiple reasoning paths from multiple agents, it has a single agent doing multi-path reasoning within the same time slice.

Training starts from a single serial model with zero external supervision. Self-distillation generates synthetic trajectories, followed by multi-stage optimization through imitation learning and RL, running on SGLang infrastructure adapted to accelerate parallel thinking. It thinks and solves problems using naive parallelism.

With a minimal set of self-distilled parallel reasoning trajectories, it matches and slightly exceeds existing parallel and autoregressive reasoning baselines on several math and complex reasoning benchmarks. On the inference side, it achieves up to 4.6x wall-clock speedup. Physically, it achieves a ~100% parallel trigger rate. Logically, it exhibits emergent problem decomposition and divide-and-conquer capabilities, internalizing parallel thinking rather than falling back to serial strategies.
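For readers wondering where a wall-clock speedup can come from when one model reasons along several paths at once, here is a rough back-of-the-envelope sketch. It is not the authors' method or their measurement setup. It assumes an idealized trace made of a shared prefix, several independent branches, and a final summary, and it assumes all branches decode in lockstep at no extra cost per step. The function names and token counts are hypothetical.

```python
# Idealized decode-step model (an assumption for illustration only):
# a trace = shared prefix + independent branches + final summary.

def serial_steps(prefix: int, branches: list[int], summary: int) -> int:
    """Decode steps when branches are generated one after another."""
    return prefix + sum(branches) + summary


def parallel_steps(prefix: int, branches: list[int], summary: int) -> int:
    """Decode steps when all branches advance together; the longest branch dominates."""
    return prefix + max(branches, default=0) + summary


def ideal_speedup(prefix: int, branches: list[int], summary: int) -> float:
    """Upper bound on speedup under this toy model (ignores batching overhead)."""
    return serial_steps(prefix, branches, summary) / parallel_steps(prefix, branches, summary)


if __name__ == "__main__":
    # Hypothetical token counts, not taken from the paper.
    print(ideal_speedup(prefix=20, branches=[100, 100, 100, 100], summary=20))
```

Under this toy model the gain grows with the number of branches and shrinks as the shared prefix and summary take up more of the trace, so real speedups depend heavily on how the problem decomposes.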