Show HN: Per-instance TSP solver without pre-training (1.66% gap on d1291)
OP here.

Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance without any priors from other problems.

I built a solver using PPO that learns from scratch per instance. It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100.

The Core Idea:
My hypothesis was that while optimal solutions are mostly composed of "minimum edges" (nearest neighbors), the actual difficulty comes from a small number of "exception edges" outside that local scope.

Instead of pre-training, I designed an inductive bias based on the topological/geometric structure of these exception edges. The agent receives guidance on which edges are likely promising based on micro/macro structure, and PPO fills in the gaps through trial and error.

It is interesting to see RL reach this level without a dataset. I have open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the "exception edge" hypothesis.

Code & Colab: https://github.com/jivaprime/TSP_exception-edge

Happy to answer any questions about the geometric priors or the PPO implementation!
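To make the "exception edge" idea concrete, here is a minimal sketch of one plausible way to flag such edges: given city coordinates and a tour, mark tour edges that are not in the mutual k-nearest-neighbor graph. This is my own illustrative definition, not code from the repo; the `k` threshold, the helper names, and the toy instance are all assumptions.

```python
import math

def nearest_neighbors(coords, k):
    """For each city, return the set of indices of its k nearest neighbors."""
    n = len(coords)
    nbrs = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        nbrs.append(set(order[1:k + 1]))  # order[0] is the city itself
    return nbrs

def exception_edges(coords, tour, k=5):
    """Tour edges (a, b) that are NOT mutual k-nearest neighbors,
    i.e. edges that step outside the local nearest-neighbor structure."""
    nbrs = nearest_neighbors(coords, k)
    out = []
    for idx in range(len(tour)):
        a, b = tour[idx], tour[(idx + 1) % len(tour)]
        if not (b in nbrs[a] and a in nbrs[b]):
            out.append((a, b))
    return out

# Toy instance: six cities on a 2x1 grid plus one far outlier (city 6).
coords = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (5, 5)]
tour = [0, 1, 2, 3, 6, 4, 5]  # the outlier is spliced into the tour
print(exception_edges(coords, tour, k=3))  # → [(3, 6), (6, 4)]
```

Only the two edges touching the outlier are flagged; the remaining edges stay inside the local nearest-neighbor scope. A prior like this could then bias which edges an agent is encouraged to explore.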