开源我们的创业项目 – 基于人工智能的虚拟形象(UE 5.2)
嗨,HN
<p>简而言之:我们不得不关闭我们的初创公司SPAR - 代码开源 https://github.com/spar-app/spar-services</p>
在2024年,我们开发了一种AI代理基础设施,以支持实时的、个性驱动的AI头像。我们的商业用例是为公司提供一种新的培训(对练)和入职工具,特别是针对需要培训面向客户员工的公司(例如高端零售)。
<p>为了实现上述目标,我们协调了三台服务器:
1. 第一台用于在虚幻引擎(5.2)上运行Metahuman;
2. 第二台用于运行经过自定义微调的开源大语言模型(LLM);
3. 第三台处理其他所有事务,连接上述两台服务器,并在客户端浏览器上进行流媒体传输(WebRTC),同时与外部API(文本转语音和语音转文本等)进行协调。</p>
<p>主要功能:
* 与不同个性头像的实时互动。
* 用于定制和优化LLM生成对话的微调工具包。
* 结构化反馈系统,将可操作的指导直接链接到对话要点。</p>
未来将利用AI和沉浸式体验来练习软技能。我们不会构建这个未来,但如果你们在构建,欢迎使用我们的工作来加速你们的进程。
查看原文
Hi HN<p>TL;DR: Had to shut down our startup SPAR - Open Sourcing the code https://github.com/spar-app/spar-services<p>In 2024, we developed an AI agent infrastructure to support realistic, personality-driven AI avatars in real-time.
The business use case was to provide a new training (sparring) and onboarding tool for companies. In particular, for companies that need to train customer-facing employees (ex, high-end retail)<p>To achieve the above, we were orchestrating three servers:
1. The first to run a Metahuman on Unreal Engine (5.2);
2. The second to run a custom finetuned open-sourced LLM;
3. The third to handle all the rest, connecting to the above two servers and streaming (WebRTC) on the client's browser, while coordinating with external APIs (Text-to-Speech and Speech-to-Text, etc.).<p>Key features:
* Real-time interactions with distinct avatar personalities.
* Fine-tuning toolkit for customizing and refining LLM-generated dialogues.
* Structured feedback system that links actionable guidance directly to conversation points.<p>The future will use AI and immersive experiences to practice soft skills.
We will not be building this future, but if you are, feel free to use our work to accelerate yours