I built a one-click online AI rewriting tool (and what went wrong)

1 point by AzeniqTech, about 1 month ago (original post)
I’ve been dogfooding a small writing helper I built called Rephrazo, and I thought it might be useful to share some implementation details and mistakes so far.

The idea is simple:

* highlight text where you’re writing
* press a hotkey
* get an AI paraphrase in a small popup
* insert it back with one click

The goal is to remove the “copy - open AI tool - paste - rewrite - paste back” loop for small edits.

This post is about how I wired it up, what worked technically, and what didn’t.

### Constraints I designed for

From the beginning I tried to design under a few constraints:

* One hotkey → one main action
* Stay inside the current app (no browser, no big side panel)
* Minimal UI: single suggestion, one click to insert
* Latency “feels instant” or it doesn’t get used

Whenever I broke these constraints (added extra choices, prompts, etc.), usage dropped in dogfooding.

### High-level architecture

Rough breakdown:

* Desktop client that:
  * listens for a global hotkey
  * grabs the current text selection
  * sends it to an API
  * displays the returned paraphrase in a small overlay near the selection
* Backend API that:
  * accepts the selected text + some minimal context
  * calls an LLM
  * applies a fixed prompt (“make this clearer, keep tone/voice as much as possible”)
  * returns a single suggestion (no multi-choice for now)

No fancy infra yet; I'm just trying to keep the path from “key press” to “returned text” as short as possible.

### Text capture and insertion

The surprisingly tricky part wasn’t the LLM. It was:

* reliably capturing the selected text
* not messing up the user’s clipboard
* inserting the rewritten text back without breaking formatting

The first version literally abused the clipboard:

* save clipboard
* copy selection
* send to backend
* replace selection by pasting the result
* restore clipboard

This worked… until it didn’t:

* some apps ignore simulated keypresses
* sometimes the clipboard got overwritten by other things in between
* it felt fragile and “hacky”

I’m slowly moving toward more app-aware integrations (where possible) while still keeping a generic fallback.

### Latency and UX

Latency matters more than I expected. Rough buckets:

* under 500 ms → feels instant, people are happy
* 1–2 seconds → acceptable if the suggestion is clearly better
* over 3 seconds → people regret pressing the hotkey and use it less

A few tiny UX things helped:

* show a small “loading” state immediately near the selection
* render the popup instantly (skeleton state), then fill it when the response arrives
* on failure, show a short, honest message instead of silently doing nothing

If you’re building AI tools, this won’t surprise you, but it’s different when you watch your own users hesitate after a few slow responses.

### Things that went wrong

* I overbuilt customization early:
  * tone dropdowns
  * multiple modes (“shorter”, “longer”, “more formal”)
  * extra toggles

  People ignored them, or got decision fatigue.
* I underestimated how many edge cases there are with selection/insertion across different apps.
* I didn’t log enough in the first builds, so I had to retrofit telemetry to understand actual usage.

If you’re curious, the current early version is here: [https://rephrazo-ai.app/](https://rephrazo-ai.app/)
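A minimal sketch of the backend half of the high-level architecture might look like the code below. That half takes the selected text plus minimal context, applies one fixed prompt, and returns one suggestion. This is not Rephrazo's code. `RewriteRequest`, `build_prompt`, `rewrite`, the 500-character context cap, and all prompt wording other than the quoted instruction are hypothetical. The LLM call is passed in as a plain `complete(prompt) -> str` callable, so no particular provider SDK is assumed.

```python
from dataclasses import dataclass
from typing import Callable

# Instruction quoted in the post; the rest of the prompt text is an assumption.
FIXED_INSTRUCTION = "Make this clearer, keep tone/voice as much as possible."
MAX_CONTEXT_CHARS = 500  # hypothetical cap to keep requests small and fast


@dataclass
class RewriteRequest:
    selected_text: str
    context: str = ""  # e.g. text just before the selection (assumption)


def build_prompt(req: RewriteRequest) -> str:
    """Combine the fixed instruction, trimmed context and the selection."""
    # Keep the tail of the context, i.e. the part closest to the selection.
    context = req.context[-MAX_CONTEXT_CHARS:]
    parts = [FIXED_INSTRUCTION]
    if context:
        parts.append(f"Surrounding text (for tone only, do not rewrite):\n{context}")
    parts.append(f"Text to rewrite:\n{req.selected_text}")
    parts.append("Reply with the rewritten text only.")
    return "\n\n".join(parts)


def rewrite(req: RewriteRequest, complete: Callable[[str], str]) -> str:
    """Return exactly one suggestion; `complete` stands in for any LLM client."""
    if not req.selected_text.strip():
        raise ValueError("empty selection")
    suggestion = complete(build_prompt(req)).strip()
    # Never hand back an empty suggestion: fall back to the original text.
    return suggestion or req.selected_text


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stub model so the sketch runs without any network access.
        return "  The meeting moved to Friday.  "

    print(rewrite(RewriteRequest("meeting got moved to friday"), fake_llm))
```

Injecting `complete` keeps the provider swappable and lets the model be stubbed in tests. Returning one string instead of a list mirrors the single-suggestion constraint.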
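The clipboard round-trip from the text capture and insertion section can be made a little less fragile with two guards. The first is a random sentinel written before the simulated copy. If an app ignores the keypress, this is detected instead of the user's stale clipboard being returned as the "selection". The second is a check before restoring, so that a clipboard write from another program during the paste isn't clobbered.

This is a sketch under assumptions. `Clipboard`, `send_copy`/`send_paste`, and the 50 ms settle delay are hypothetical stand-ins for whatever OS clipboard and key-simulation APIs a real client uses. The fakes exist only to make it runnable. It also saves and restores plain text only, so images or rich-text formats the user had copied would still be lost.

```python
import time
import uuid
from typing import Callable, Optional, Protocol


class Clipboard(Protocol):
    """Hypothetical text-only clipboard interface; a real one wraps the OS API."""

    def get_text(self) -> str: ...
    def set_text(self, text: str) -> None: ...


def capture_selection(clipboard: Clipboard, send_copy: Callable[[], None],
                      settle_s: float = 0.05) -> Optional[str]:
    """Grab the selection via a simulated copy, always restoring the clipboard.

    Returns None when the target app ignored the simulated keypress.
    """
    saved = clipboard.get_text()
    # Random marker that a real selection is very unlikely to equal.
    sentinel = f"__capture_{uuid.uuid4().hex}__"
    clipboard.set_text(sentinel)
    try:
        send_copy()           # e.g. simulated Ctrl+C / Cmd+C
        time.sleep(settle_s)  # give the app time to write the clipboard
        copied = clipboard.get_text()
    finally:
        clipboard.set_text(saved)  # put the user's clipboard back no matter what
    return None if copied == sentinel else copied


def paste_text(clipboard: Clipboard, send_paste: Callable[[], None],
               text: str, settle_s: float = 0.05) -> bool:
    """Paste `text` over the selection, then restore the clipboard.

    Returns False if something else wrote to the clipboard mid-operation; the
    newer content is then left in place instead of being clobbered.
    """
    saved = clipboard.get_text()
    clipboard.set_text(text)
    try:
        send_paste()          # e.g. simulated Ctrl+V / Cmd+V
        time.sleep(settle_s)  # restoring too early can race the app's paste
    finally:
        untouched = clipboard.get_text() == text
        if untouched:
            clipboard.set_text(saved)
    return untouched


class FakeClipboard:
    def __init__(self, text: str = "") -> None:
        self.text = text

    def get_text(self) -> str:
        return self.text

    def set_text(self, text: str) -> None:
        self.text = text


class FakeEditor:
    """Stand-in for a target app with one selection; may ignore simulated keys."""

    def __init__(self, clipboard: FakeClipboard, selection: str,
                 honors_keys: bool = True) -> None:
        self.clipboard = clipboard
        self.selection = selection
        self.honors_keys = honors_keys

    def copy(self) -> None:
        if self.honors_keys:
            self.clipboard.set_text(self.selection)

    def paste(self) -> None:
        if self.honors_keys:
            self.selection = self.clipboard.get_text()


if __name__ == "__main__":
    cb = FakeClipboard("user's own clipboard")
    editor = FakeEditor(cb, "teh draft")
    print(capture_selection(cb, editor.copy, settle_s=0))
    print(paste_text(cb, editor.paste, "the draft", settle_s=0), editor.selection)
    print(cb.get_text())
```

None of this removes the underlying race; it only makes the failure cases visible. That visibility is one argument for the app-aware integrations mentioned in the post.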
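The skeleton-then-fill popup and the short, honest failure message described under latency and UX map onto a small async flow. Here is a minimal sketch, assuming an asyncio-based client. `RecordingPopup` is a hypothetical stand-in for the real overlay, and `fetch` is whatever coroutine calls the backend. The 5-second default timeout is an arbitrary choice a bit above the post's 3-second threshold.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class RecordingPopup:
    """Hypothetical popup that just records what it was asked to render."""

    def __init__(self) -> None:
        self.states: List[str] = []

    def render_loading(self) -> None:
        self.states.append("loading")

    def render_suggestion(self, text: str) -> None:
        self.states.append(f"suggestion:{text}")

    def render_error(self, message: str) -> None:
        self.states.append(f"error:{message}")


async def show_suggestion(popup: RecordingPopup,
                          fetch: Callable[[], Awaitable[str]],
                          timeout_s: float = 5.0) -> Optional[str]:
    """Render the skeleton first, then fill it in or show an honest error."""
    popup.render_loading()  # shown before the request even starts
    try:
        text = await asyncio.wait_for(fetch(), timeout=timeout_s)
    except asyncio.TimeoutError:
        popup.render_error("The rewrite took too long. Your text is unchanged.")
        return None
    except Exception:  # network/HTTP failures from whatever client `fetch` wraps
        popup.render_error("Couldn't get a suggestion. Your text is unchanged.")
        return None
    popup.render_suggestion(text)
    return text


async def _demo() -> None:
    async def fast_backend() -> str:
        await asyncio.sleep(0.01)  # simulated round trip
        return "The meeting moved to Friday."

    async def slow_backend() -> str:
        await asyncio.sleep(10)
        return "never shown"

    for backend in (fast_backend, slow_backend):
        popup = RecordingPopup()
        await show_suggestion(popup, backend, timeout_s=0.5)
        print(popup.states)


if __name__ == "__main__":
    asyncio.run(_demo())
```

`render_loading()` runs before any network I/O. The immediate reaction to the hotkey therefore no longer depends on backend latency; only the fill-in does.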
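Retrofitting telemetry is easier when each invocation emits one structured record. The sketch below tags each event with the latency buckets quoted under latency and UX and computes the insert rate per bucket; its field and outcome names are hypothetical. This is one way to check "people use it less after slow responses" against real logs. The post gives no verdict for 500 ms to 1 s or 2 to 3 s, so those ranges are reported as `unbucketed` rather than guessed.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


def latency_bucket(ms: float) -> str:
    """Map hotkey-to-suggestion latency onto the buckets from the post."""
    if ms < 500:
        return "instant"
    if 1000 <= ms <= 2000:
        return "acceptable"
    if ms > 3000:
        return "too_slow"
    return "unbucketed"  # 500 ms to 1 s and 2 to 3 s: the post gives no verdict


@dataclass
class RewriteEvent:
    latency_ms: float
    outcome: str  # hypothetical outcomes: "inserted", "dismissed", "error"


def to_log_line(event: RewriteEvent) -> str:
    """One JSON object per invocation, easy to grep or load later."""
    record = asdict(event)
    record["bucket"] = latency_bucket(event.latency_ms)
    return json.dumps(record)


def insertion_rate_by_bucket(events: Iterable[RewriteEvent]) -> Dict[str, float]:
    """Share of invocations in each latency bucket that ended in an insert."""
    totals: Counter = Counter()
    inserted: Counter = Counter()
    for event in events:
        bucket = latency_bucket(event.latency_ms)
        totals[bucket] += 1
        if event.outcome == "inserted":
            inserted[bucket] += 1
    return {bucket: inserted[bucket] / n for bucket, n in totals.items()}


if __name__ == "__main__":
    # Made-up sample events for illustration, not real usage data.
    events = [RewriteEvent(320, "inserted"), RewriteEvent(410, "dismissed"),
              RewriteEvent(1500, "inserted"), RewriteEvent(3400, "dismissed")]
    for e in events:
        print(to_log_line(e))
    print(insertion_rate_by_bucket(events))
```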