Launch HN: Recall.ai (YC W20) – API for meeting recordings and transcripts

13 | by davidgu | 3 months ago
Hey HN, we're David and Amanda from Recall.ai (https://www.recall.ai). Today we're launching our Desktop Recording SDK, a way to get meeting data without a bot in the meeting: https://www.recall.ai/product/desktop-recording-sdk. It's our biggest release in quite a while, so we thought we'd finally do our Launch HN :)

Here's a demo that shows it producing a transcript from a meeting, followed by examples in code: https://www.youtube.com/watch?v=4croAGGiKTA. API docs are at https://docs.recall.ai/.

Back in W20, our first product was an API that lets you send a bot participant into a meeting. This gives developers access to audio/video streams and other data in the meeting. Today, this API powers most of the meeting recording products on the market.

Recently, recording meetings from a desktop app instead of a bot has become popular. Many products like Notion and ChatGPT have added desktop recording functionality, and LLMs have made it easier to work with unstructured transcripts. But it's actually hard to reliably record meetings at scale with a desktop app, and most developers who want to add recording functionality don't want to build all this infrastructure.

Doing a basic recording with just the microphone and system audio is fairly straightforward, since you can just use the system APIs. But it gets a lot harder when you want to capture speaker names, produce a video recording, get real-time data, or run this in production at large scale:

- Capturing speaker names involves using accessibility APIs to screen-scrape the video conference window to monitor who is speaking at what time. When video conferencing platforms change their UI, we must ship a change immediately so this keeps working.

- Producing a clean video recording that doesn't capture the video conferencing platform's UI involves detecting the participant tiles, cropping them out, and compositing them together into a single video.

- Because the desktop recording code runs on end-user machines, we need to make it as efficient as possible. This means writing highly platform-optimized code, taking advantage of hardware encoders when available, and spending a lot of time on profiling and performance testing.

Meeting recording has zero margin for failure: if anything breaks, you lose the data forever. That makes reliability especially important, which dramatically increases the amount of engineering effort required.

Our Desktop Recording SDK takes care of all this and lets developers build meeting recording features into their desktop apps, so they can record both video conferences and in-person meetings without a bot.

We built Recall.ai because we experienced this problem ourselves. At our first startup, we built a tool for product managers that included a meeting recording feature. 70% of our engineering time was taken up by just this feature! We ended up starting Recall.ai to solve this instead. Since then, over 2000 companies have used us to power their recording features, e.g. HubSpot for sales call recording and ClickUp for their AI note taker. Our users are engineering teams building commercial products for financial services, telehealth, incident management, sales, interviewing, and more. We also power internal tooling for large enterprises.

Running this sort of infrastructure has led to unexpected technical challenges! For example, we had to debug a 1 in 36 million segfault in our audio encoder (https://www.recall.ai/blog/debugging-a-1-in-36-000-000-segfault-in-our-audio-encoder), we encountered a Postgres lock-up that only occurs when you have tens of thousands of concurrent writers (https://news.ycombinator.com/item?id=44490510), and we saved over $1M a year on AWS by optimizing the way we shuffle data around between our processes (https://news.ycombinator.com/item?id=42067275).

You can try it here: https://www.recall.ai. It's self-serve with $5 of free credits. Pricing starts at $0.70 for every hour of recording, prorated to the second. We offer volume discounts.

All data recorded through Recall.ai is the property of our customers, we support 0-day retention, and we don't train models on customer data.

We would love your feedback!
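
For readers estimating costs, here is a minimal sketch of the per-second proration described in the pricing paragraph. It assumes the $0.70/hour starting rate with no volume discount, and it leaves amounts unrounded because the post does not say how fractions of a cent are handled. The function and constant names are illustrative only and are not part of the Recall.ai API.

```python
from decimal import Decimal

# Figures quoted in the post: $0.70 per recorded hour, $5 of free credits.
HOURLY_RATE_USD = Decimal("0.70")
FREE_CREDIT_USD = Decimal("5")
SECONDS_PER_HOUR = 3600


def recording_cost_usd(duration_seconds: int,
                       hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Unrounded cost of one recording when billing is prorated to the second."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return hourly_rate * duration_seconds / SECONDS_PER_HOUR


def seconds_covered_by_credit(credit: Decimal = FREE_CREDIT_USD,
                              hourly_rate: Decimal = HOURLY_RATE_USD) -> int:
    """Whole seconds of recording that a credit balance pays for (assumed list rate)."""
    return int(credit * SECONDS_PER_HOUR / hourly_rate)
```

Under these assumptions, a 30-minute recording (`recording_cost_usd(1800)`) comes to $0.35, and the $5 credit covers 25714 whole seconds, a little over 7 hours of recording.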