Ask HN: Do you A/B test your LLM prompts?
I'm exploring a dev tool idea that helps you A/B test your prompts, but I'm not sure if there's a need for it. You'd be able to write and version your prompts in a web UI, then A/B test them and see results with metrics you define.
So for example, with a bot that writes cold outbound emails, you could verify whether v1 or v2 of your system prompt results in a better reply rate.
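To make that concrete, here's a minimal Python sketch of what such a test might look like under the hood; everything in it (the prompt strings, assign_variant, record_outcome, reply_rate) is hypothetical, not an existing tool's API:

    import hashlib
    from collections import defaultdict

    # Two versions of the system prompt under test (made-up examples).
    VARIANTS = {
        "v1": "You write short, friendly cold outbound emails.",
        "v2": "You write direct, benefit-led cold outbound emails.",
    }

    def assign_variant(lead_id: str) -> str:
        # Stable 50/50 split: the same lead always gets the same version.
        h = int(hashlib.md5(lead_id.encode()).hexdigest(), 16)
        return "v1" if h % 2 == 0 else "v2"

    # Tally sends and replies per variant; reply rate = replies / sends.
    stats = defaultdict(lambda: {"sends": 0, "replies": 0})

    def record_outcome(lead_id: str, got_reply: bool) -> None:
        variant = assign_variant(lead_id)
        stats[variant]["sends"] += 1
        stats[variant]["replies"] += int(got_reply)

    def reply_rate(variant: str) -> float:
        s = stats[variant]
        return s["replies"] / s["sends"] if s["sends"] else 0.0

The tool idea would presumably wrap this kind of loop: versioning the prompts, doing the assignment and logging for you, and reporting the metric per version.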
Does anybody currently do something like this, or want something like this?