Hacker News (Chinese edition)



Hey HN! This is Adil, Salman and Jose, and we're behind archgw [1]: an intelligent proxy server designed as an edge and AI gateway for agents, one that natively knows how to handle prompts, not just network traffic. We've made several sweeping changes, so we're sharing the project again.

A bit of background on why we built this project. Building AI agent demos is easy, but creating something production-ready takes a lot of repetitive low-level plumbing work that everyone ends up doing. You're applying guardrails to make sure unsafe or off-topic requests don't get through. You're clarifying vague input so agents don't make mistakes. You're routing prompts to the right expert agent based on context or task type. You're writing integration code to quickly and safely add support for new LLMs. And every time a new framework hits the market or is updated, you're validating or re-implementing that same logic, again and again.

Putting all the low-level plumbing code in a framework gets messy to manage and harder to update and scale. Low-level work isn't business logic. That's why we built archgw: an intelligent proxy server that handles prompts during ingress and egress and offers several related capabilities from a single software service. It lives outside your app runtime, so you can keep your business logic clean and focus on what matters. Think of it like a service mesh, but for AI agents.

Prior to building archgw, the team spent time building Envoy [2] at Lyft, API Gateway at AWS, and specialized NLP models at Microsoft Research, and worked on safety at Meta. archgw was born out of the belief that the rule-based, single-purpose tools that handle the work around resiliency, processing and routing prompts should move into a dedicated infrastructure layer for agents, built on the battle-tested foundation of Envoy Proxy.

The intelligence in archgw comes from our fast task-specific LLMs [3], which handle things like agent routing and hand-off, guardrails, and preference-based intelligent LLM calling. Here are some additional details about the open source project. archgw is written in Rust, and the request path has three main parts:

* Listener subsystem, which handles downstream (ingress) and upstream (egress) request processing.

* Prompt handler subsystem. This is where archgw makes decisions on the safety of the incoming request via its prompt_guard hooks and identifies where to forward the conversation via its prompt_target primitive.

* Model serving subsystem, the interface that hosts all the lightweight LLMs engineered in archgw and offers a framework for things like hallucination detection for these models.

We loved building this open source project, and our belief is that this infra primitive will help developers build faster, safer and more personalized agents without all the manual prompt engineering and systems integration work needed to get there. We hope to invite other developers to use and improve Arch. Please give it a shot and leave feedback here or on our Discord channel [4]. Also, here is a quick demo of the project in action [5]. You can check out our public docs at [6]. Our models are also available at [7].

[1] https://github.com/katanemo/archgw

[2] https://www.envoyproxy.io/

[3] https://huggingface.co/collections/katanemo/arch-function-66...

[4] https://discord.com/channels/1292630766827737088/12926307682...

[5] https://www.youtube.com/watch?v=I4Lbhr-NNXk

[6] https://docs.archgw.com/

[7] https://huggingface.co/katanemo
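To make the request path described in the post a bit more concrete, here is a minimal toy sketch of the control flow in the prompt handler step: check a guardrail first, then pick a target to forward to, and ask for clarification when nothing matches. This is not archgw's code or configuration format (archgw itself is written in Rust and uses task-specific LLMs for these decisions). The names `PromptTarget`, `guard_check`, `pick_target` and `handle_prompt`, the keyword matching, and the endpoint URLs are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for a guardrail model: archgw uses task-specific LLMs,
# while this sketch uses plain substring matching to show the control flow only.
BLOCKED_PHRASES = ("ignore previous instructions",)


@dataclass(frozen=True)
class PromptTarget:
    """Hypothetical description of an upstream agent a prompt can be routed to."""
    name: str
    keywords: Tuple[str, ...]
    endpoint: str  # made-up URL, for illustration only


TARGETS = (
    PromptTarget("billing_agent", ("invoice", "refund"), "http://billing.internal/agent"),
    PromptTarget("weather_agent", ("weather", "forecast"), "http://weather.internal/agent"),
)


def guard_check(prompt: str) -> bool:
    """Return True if the prompt passes the toy guardrail."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def pick_target(prompt: str) -> Optional[PromptTarget]:
    """Return the first target whose keywords appear in the prompt, if any."""
    lowered = prompt.lower()
    for target in TARGETS:
        if any(keyword in lowered for keyword in target.keywords):
            return target
    return None


def handle_prompt(prompt: str) -> dict:
    """Guard first, then route; ask for clarification when no target matches."""
    if not guard_check(prompt):
        return {"action": "reject", "reason": "guardrail"}
    target = pick_target(prompt)
    if target is None:
        return {"action": "clarify", "reason": "no matching target"}
    return {"action": "forward", "target": target.name, "endpoint": target.endpoint}
```

Calling `handle_prompt("I need a refund for my invoice")` in this sketch returns a `forward` decision for the hypothetical `billing_agent`. The point of the example is only that these decisions happen in one place outside the application code, which is the separation the post argues for.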

Show HN: ArchGW – an intelligent edge and service proxy for agents