Local SLM as a compression layer for cloud API calls
Found caveman a while back (https://github.com/JuliusBrussee/caveman) and got kind of obsessed with it. The outputs are dense, but you can't show them to a real user. So I've been trying to hide that in the middle: a local SLM (small language model) compresses the input, the cloud model reasons in caveman style, and the local SLM expands it back. The user never sees the compressed parts.
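Roughly the shape of the pipeline, as a sketch. The trait names and the stubbed compress/expand logic are placeholders, not the real Phi-3/Candle code or any particular cloud API:

    // Three-stage pipeline: local compress -> cloud reason -> local expand.
    // All names and the stub logic are placeholders; the real versions would
    // wrap Phi-3 via Candle and an actual cloud chat-completion call.
    trait LocalSlm {
        fn compress(&self, text: &str) -> String;
        fn expand(&self, caveman: &str) -> String;
    }

    trait CloudModel {
        fn reason(&self, compressed_prompt: &str) -> String;
    }

    fn answer(local: &impl LocalSlm, cloud: &impl CloudModel, user_input: &str) -> String {
        let compressed = local.compress(user_input);   // user never sees this
        let dense_answer = cloud.reason(&compressed);  // cloud works in caveman style
        local.expand(&dense_answer)                    // re-hydrated for the user
    }

    // Stubs so the sketch compiles and runs on its own.
    struct StubSlm;
    impl LocalSlm for StubSlm {
        fn compress(&self, text: &str) -> String {
            // Placeholder heuristic, nothing like the real model's compression.
            text.split_whitespace()
                .filter(|w| w.len() > 3)
                .collect::<Vec<_>>()
                .join(" ")
        }
        fn expand(&self, caveman: &str) -> String {
            format!("(expanded) {caveman}")
        }
    }

    struct StubCloud;
    impl CloudModel for StubCloud {
        fn reason(&self, compressed_prompt: &str) -> String {
            format!("answer for: {compressed_prompt}")
        }
    }

    fn main() {
        let reply = answer(&StubSlm, &StubCloud, "please explain what this request is actually asking for");
        println!("{reply}");
    }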
I'm running Phi-3 via Candle for compression. It's fast enough, and the cloud calls are shorter. I haven't done real token counting yet, though.
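Even before real token counting, a crude chars/4 estimate would show whether compression is paying off. The 4-characters-per-token ratio is just the usual rule of thumb, not a real tokenizer:

    // Crude token estimate: ~4 characters per token, not a real tokenizer,
    // but enough to see roughly how much the compression step saves.
    fn approx_tokens(text: &str) -> usize {
        (text.chars().count() + 3) / 4
    }

    fn report_savings(raw: &str, compressed: &str) {
        let before = approx_tokens(raw);
        let after = approx_tokens(compressed);
        let saved = before.saturating_sub(after);
        println!(
            "raw ~{before} tok, compressed ~{after} tok, ~{:.0}% saved",
            100.0 * saved as f64 / before.max(1) as f64
        );
    }

    fn main() {
        let raw = "Could you please take a look at this and explain in detail what the request is asking for?";
        let compressed = "explain request detail";
        report_savings(raw, compressed);
    }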
The expansion step is the problem. Re-hydrating caveman output into readable text is harder than compressing the input, and the local model makes more mistakes there. I'm not sure whether that's a prompting issue or just a ceiling for a model this size.
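One cheap way to separate the two might be to few-shot the expansion step: hand-write two or three caveman-to-prose pairs, put them in the prompt, and see whether the error rate moves. The example pairs below are made up for illustration, not real caveman output:

    // Build an expansion prompt with a few hand-written dense -> prose pairs.
    // The pairs are invented for illustration; the point is only to test
    // whether few-shot examples move the local model's error rate.
    const EXPANSION_EXAMPLES: &[(&str, &str)] = &[
        (
            "cache miss high. fix: bigger cache or better key.",
            "The cache miss rate is high; either increase the cache size or choose a better cache key.",
        ),
        (
            "user want csv export. add button settings page.",
            "Users are asking for CSV export, so add an export button on the settings page.",
        ),
    ];

    fn build_expansion_prompt(dense_answer: &str) -> String {
        let mut prompt = String::from("Rewrite the compressed note into clear, complete sentences. Do not add new facts.\n\n");
        for (dense, prose) in EXPANSION_EXAMPLES {
            prompt.push_str(&format!("Compressed: {dense}\nRewritten: {prose}\n\n"));
        }
        prompt.push_str(&format!("Compressed: {dense_answer}\nRewritten:"));
        prompt
    }

    fn main() {
        println!("{}", build_expansion_prompt("expansion step weak. maybe prompt, maybe model small."));
    }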
I'm also not sure this makes sense at low API volumes. The added complexity might not be worth it.
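The back-of-the-envelope version of the volume question, with every number a placeholder to swap for real measurements and pricing: savings per month are roughly calls × input tokens × fraction saved × price per token, which at low volume comes out to pocket change:

    // Back-of-the-envelope check on whether the pipeline pays for itself.
    // Every number here is a placeholder; plug in real measurements and pricing.
    fn main() {
        let calls_per_month: f64 = 2_000.0;            // assumed low volume
        let input_tokens_per_call: f64 = 1_500.0;      // assumed average prompt size
        let fraction_saved: f64 = 0.5;                 // assumed compression ratio
        let price_per_million_input_tokens: f64 = 3.0; // assumed $/1M input tokens

        let tokens_saved = calls_per_month * input_tokens_per_call * fraction_saved;
        let dollars_saved = tokens_saved / 1_000_000.0 * price_per_million_input_tokens;

        println!("~{tokens_saved:.0} input tokens saved per month, ~${dollars_saved:.2} per month");
    }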