Show HN: Per-instance TSP solver without pre-training (1.66% gap on d1291)
OP here.

Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance without any priors from other problems.

I built a solver using PPO that learns from scratch per instance. It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100.

The Core Idea:
My hypothesis was that while optimal solutions are mostly composed of "minimum edges" (nearest neighbors), the actual difficulty comes from a small number of "exception edges" outside that local scope.

Instead of pre-training, I designed an inductive bias based on the topological/geometric structure of these exception edges. The agent receives guidance on which edges are likely promising based on micro/macro structure, and PPO fills in the gaps through trial and error.

It is interesting to see RL reach this level without a dataset. I have open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the "exception edge" hypothesis.

Code & Colab: https://github.com/jivaprime/TSP_exception-edge

Happy to answer any questions about the geometric priors or the PPO implementation!
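To make the "exception edge" idea concrete, here is a minimal sketch of one plausible way to flag such edges: given city coordinates and a tour, mark tour edges that are not in the mutual k-nearest-neighbor graph. This is my own illustrative definition, not code from the repo; the `k` threshold, the helper names, and the toy instance are all assumptions.

```python
import math

def nearest_neighbors(coords, k):
    """For each city, return the set of indices of its k nearest neighbors."""
    n = len(coords)
    nbrs = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        nbrs.append(set(order[1:k + 1]))  # order[0] is the city itself
    return nbrs

def exception_edges(coords, tour, k=5):
    """Tour edges (a, b) that are NOT mutual k-nearest neighbors,
    i.e. edges that step outside the local nearest-neighbor structure."""
    nbrs = nearest_neighbors(coords, k)
    out = []
    for idx in range(len(tour)):
        a, b = tour[idx], tour[(idx + 1) % len(tour)]
        if not (b in nbrs[a] and a in nbrs[b]):
            out.append((a, b))
    return out

# Toy instance: six cities on a 2x1 grid plus one far outlier (city 6).
coords = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (5, 5)]
tour = [0, 1, 2, 3, 6, 4, 5]  # the outlier is spliced into the tour
print(exception_edges(coords, tour, k=3))  # → [(3, 6), (6, 4)]
```

Only the two edges touching the outlier are flagged; the remaining edges stay inside the local nearest-neighbor scope. A prior like this could then bias which edges an agent is encouraged to explore.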