Open-source reinforcement learning model for predicting sales conversion from conversations
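For the past couple of months, I have been building a chess-game-like system for predicting conversion probabilities from sales conversations. Sales conversations are notoriously hard to analyse with current LLMs or SLMs; even ChatGPT, Claude, and Gemini failed to analyse them fully. So what if we could guide a conversation based on its predicted conversion probability? The idea is to train with reinforcement learning (RL) on 100,000+ sales conversations to predict the final conversion probability from embeddings. I used Azure OpenAI (specifically the text-embedding-3-large embedding model) to create a wide variety of conversations. The main goal of the RL setup is conversion (reward = 1): it produces different conversations and pathways, most of which end in non-conversion (0) and some in conversion (1), along with 3072-dimensional embedding vectors that capture the nuances and semantics of the dialogue. Other fields include: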
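- Company/product identifiers
- Conversation messages (JSON)
- Customer engagement and sales effectiveness scores (0-1)
- Probability trajectory at each turn
- Conversation style, flow pattern, and channel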
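To make the record layout above concrete, here is a minimal sketch of how one row could be represented and sanity-checked in Python. Every field name, the `parse_row` helper, and the example row are hypothetical; the actual column names are whatever the published dataset uses, so treat this as an illustration of the fields listed in the post rather than the real schema.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

EMBEDDING_DIM = 3072  # default output size of text-embedding-3-large


@dataclass
class SalesConversation:
    """One dataset row; every field name here is hypothetical."""
    company_id: str
    product_id: str
    messages: List[Dict[str, str]]       # decoded from the JSON message field
    customer_engagement: float           # score in [0, 1]
    sales_effectiveness: float           # score in [0, 1]
    probability_trajectory: List[float]  # conversion probability after each turn
    style: str
    flow_pattern: str
    channel: str
    converted: bool                      # final outcome of the conversation
    embedding: List[float]               # EMBEDDING_DIM floats


def parse_row(row: Dict[str, Any]) -> SalesConversation:
    """Build a record from a raw row and check the ranges described in the post."""
    record = SalesConversation(
        company_id=str(row["company_id"]),
        product_id=str(row["product_id"]),
        messages=json.loads(row["messages"]),
        customer_engagement=float(row["customer_engagement"]),
        sales_effectiveness=float(row["sales_effectiveness"]),
        probability_trajectory=[float(p) for p in row["probability_trajectory"]],
        style=str(row["style"]),
        flow_pattern=str(row["flow_pattern"]),
        channel=str(row["channel"]),
        converted=bool(row["converted"]),
        embedding=[float(x) for x in row["embedding"]],
    )
    scores = [record.customer_engagement, record.sales_effectiveness]
    scores += record.probability_trajectory
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores and probabilities must lie in [0, 1]")
    if len(record.embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} embedding values")
    return record


def terminal_reward(record: SalesConversation) -> float:
    """Reward from the post: 1 for a conversion, 0 otherwise."""
    return 1.0 if record.converted else 0.0


# Made-up example row, only to show the shape of the data.
EXAMPLE_ROW = {
    "company_id": "acme",
    "product_id": "crm-pro",
    "messages": json.dumps([
        {"speaker": "customer", "message": "How much is the team plan?"},
        {"speaker": "sales_rep", "message": "It starts at 20 seats."},
    ]),
    "customer_engagement": 0.7,
    "sales_effectiveness": 0.6,
    "probability_trajectory": [0.35, 0.52],
    "style": "consultative",
    "flow_pattern": "linear",
    "channel": "chat",
    "converted": True,
    "embedding": [0.0] * EMBEDDING_DIM,
}
EXAMPLE = parse_row(EXAMPLE_ROW)
```

Keeping the range checks next to the parsing step means a malformed row fails early, before it reaches training.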
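I then trained an RL model with PPO (Proximal Policy Optimization): a linear layer reduces the dimension of the embedding, and the reduced representation is used to make the final prediction.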
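As a rough illustration of the two ingredients mentioned here, the sketch below shows (1) a linear layer that shrinks a 3072-dimensional embedding before a small head turns it into a probability, and (2) the standard PPO clipped surrogate objective. It is a pure-Python stand-in, not the released training code: the reduced size of 8, the `tanh` activation, the sigmoid head, and all function names are assumptions, and the post does not say how the prediction is wired into the PPO policy.

```python
import math
import random

EMBEDDING_DIM = 3072  # size of a text-embedding-3-large vector
REDUCED_DIM = 8       # hypothetical reduced size, chosen only for this sketch


def init_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, weights, bias):
    """Compute W x + b for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_conversion(embedding, reducer, head):
    """Reduce the embedding with a linear layer, then map it to a probability."""
    reduced = [math.tanh(v) for v in linear(embedding, *reducer)]
    logit = linear(reduced, *head)[0]
    return sigmoid(logit)


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


rng = random.Random(0)
reducer = init_linear(EMBEDDING_DIM, REDUCED_DIM, rng)
head = init_linear(REDUCED_DIM, 1, rng)
fake_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
probability = predict_conversion(fake_embedding, reducer, head)
```

The clipping keeps the probability ratio within `1 ± clip_eps` whenever moving further would increase the objective, which is what stops a single PPO update from pushing the policy too far from the one that collected the data.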
The dataset, model, and training scripts are all open source. I have also written an arXiv paper on it.
Dataset: [https://huggingface.co/datasets/DeepMostInnovations/saas-sales-conversations](https://huggingface.co/datasets/DeepMostInnovations/saas-sales-conversations)
Model, dataset creation, training, and inference: [https://huggingface.co/DeepMostInnovations/sales-conversion-model-reinf-learning](https://huggingface.co/DeepMostInnovations/sales-conversion-model-reinf-learning)
Paper: [https://arxiv.org/abs/2503.23303](https://arxiv.org/abs/2503.23303)
Btw, use Python 3.10 for inference. I am also thinking of using open-source embedding models to create the embedding vectors, but that will take more time.
I also built a platform on top of this for building agents. It is completely free: [https://lexeek.deepmostai.com](https://lexeek.deepmostai.com). You can chat with the agent on this website: [https://www.deepmostai.com](https://www.deepmostai.com).