Show HN: A per-instance TSP solver with no pre-training (1.66% gap on d1291)

4 points · jivaprime · about 1 month ago
OP here.

Most deep learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance, without any priors from other problems.

I built a solver using PPO (Proximal Policy Optimization) that learns from scratch per instance. It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100.

The core idea: my hypothesis was that while optimal solutions are mostly composed of "minimum edges" (nearest neighbors), the actual difficulty comes from a small number of "exception edges" outside that local scope.

Instead of pre-training, I designed an inductive bias based on the topological/geometric structure of these exception edges. The agent receives guidance on which edges are likely promising based on micro/macro structure, and PPO fills in the gaps through trial and error.

It is interesting to see RL reach this level without a dataset. I have open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the "exception edge" hypothesis.

Code & Colab: https://github.com/jivaprime/TSP_exception-edge

Happy to answer any questions about the geometric priors or the PPO implementation!
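To make the "exception edge" idea concrete, here is a minimal illustrative sketch (not code from the repository): given a tour, count the edges whose endpoints are not in each other's k-nearest-neighbor sets. The instance, the choice of k, and the tour below are made up for demonstration.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def knn_sets(points, k):
    # For each node, the set of indices of its k nearest neighbors.
    sets = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        sets.append(set(order[:k]))
    return sets

def exception_edges(points, tour, k=5):
    # Tour edges (i, j) where neither endpoint is among the
    # other's k nearest neighbors -- the "non-local" edges.
    nn = knn_sets(points, k)
    out = []
    for idx in range(len(tour)):
        i, j = tour[idx], tour[(idx + 1) % len(tour)]
        if j not in nn[i] and i not in nn[j]:
            out.append((i, j))
    return out

# Toy example: 10 points on a line, visited in order. Every
# consecutive edge is a nearest-neighbor edge; only the closing
# edge (9, 0) spans the whole instance and counts as an exception.
pts = [(float(i), 0.0) for i in range(10)]
tour = list(range(10))
print(exception_edges(pts, tour, k=5))  # → [(9, 0)]
```

In the post's framing, a solver that gets the many nearest-neighbor edges right essentially for free still has to discover these few long-range edges, which is where the learned guidance would matter.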