Ask HN: How are you vibe coding in a mature codebase?

3 points | by adam_gyroscope | 1 day ago
Here's how we're working with LLMs at my startup.

We have a monorepo with scheduled Python data workflows, two Next.js apps, and a small engineering team. We use GitHub for SCM and CI/CD, deploy to GCP and Vercel, and lean heavily on automation.

**Local development:** Every engineer gets Cursor Pro (plus Bugbot), Gemini Pro, OpenAI Pro, and optionally Claude Pro. We don't really care which model people use. In practice, LLMs are worth about 1.5 excellent junior/mid-level engineers per engineer, so paying for multiple models is easily worth it.

We rely heavily on pre-commit hooks: ty, ruff, TypeScript checks, tests across all languages, formatting, and other guards. Everything is auto-formatted. LLMs make types and tests much easier to write, though complex typing still needs some hand-holding.

**GitHub + Copilot workflow:** We pay for GitHub Enterprise primarily because it allows assigning issues to Copilot, which then opens a PR. Our rule is simple: if you open an issue, you assign it to Copilot. Every issue gets a code attempt attached to it.

There's no stigma around lots of PRs. We frequently delete the ones we don't use.

We use Turborepo for the monorepo and are all-in on uv on the Python side.

All coding practices are encoded in .cursor/rules files. For example: "If you are doing database work, only edit Drizzle's schema.ts and don't hand-write SQL." Cursor generally respects this, but other tools struggle to consistently read or follow these rules, no matter how many agent.md-style files we add.
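To make that rule concrete, here's a minimal sketch of the kind of edit it points agents at (illustrative table and column names, not our real schema): add or change columns in schema.ts and let drizzle-kit generate the migration, rather than hand-writing SQL.

```ts
// schema.ts (illustrative sketch; table and column names are made up, not our real schema)
import { pgTable, serial, text, timestamp } from "drizzle-orm/pg-core";

export const jobs = pgTable("jobs", {
  id: serial("id").primaryKey(),
  name: text("name").notNull(),
  status: text("status").notNull().default("queued"),
  // The kind of change an agent is allowed to make: add a column here,
  // then let drizzle-kit generate the migration instead of hand-writing SQL.
  startedAt: timestamp("started_at"),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});
```

Keeping schema.ts as the single source of truth means agent-written SQL can't quietly drift away from the schema.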
**My personal dev loop:** If I'm on the go and see a bug or have an idea, I open a GitHub issue (via Slack, mobile, or web) and assign it to Copilot. Sometimes the issue is detailed; sometimes it's a single sentence. Copilot opens a PR, and I review it later.

If I'm at the keyboard, I start in Cursor as an agent in a Git worktree, using whatever the best model is. I iterate until I'm happy, ask the LLM to write tests, review everything, and push to GitHub. Before a human review, I let Cursor Bugbot, Copilot, and GitHub CodeQL review the code, and ask Copilot to fix anything they flag.

**Things that are still painful:** To really know if code works, I need to run Temporal, two Next.js apps, several Python workers, and a Node worker. Some of this is Dockerized, some isn't. Then I need a browser to run manual checks.

AFAICT, there's no service that lets me give a prompt, write the code, spin up all this infra, run Playwright, handle database migrations, and let me manually poke at the system. We approximate this with GitHub Actions, but that doesn't help with manual verification or DB work.
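For a sense of what the manual browser pass covers, here's a minimal Playwright sketch (the URL, page title, and test id are placeholders, not our real app). The test itself is trivial; the missing piece is a disposable environment where Temporal, both apps, and the workers are actually running so a check like this means something.

```ts
// smoke.spec.ts (minimal sketch; the URL, title, and test id are placeholders, not our real app)
import { test, expect } from "@playwright/test";

test("app boots and renders the dashboard", async ({ page }) => {
  // Assumes the Next.js app is already running locally on port 3000.
  await page.goto("http://localhost:3000");
  await expect(page).toHaveTitle(/Dashboard/);

  // A cheap end-to-end signal: a widget that only renders once the API
  // and background workers are reachable.
  await expect(page.getByTestId("pipeline-status")).toBeVisible();
});
```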
Copilot doesn't let you choose a model when assigning an issue or during code review, and the model it does use is generally weak. You can pick a model in Copilot chat, but not in issues, PRs, or reviews.

Cursor + worktrees + agents still suck. Worktrees are created from the source repo including unstaged files, so if you want a clean agent environment, your main repo has to be clean. At times it feels simpler to just clone the repo into a new directory instead of using worktrees.

**What's working well:** Because we constantly spin up agents, our monorepo setup scripts are well-tested and reliable. They also translate cleanly into CI/CD.

Roughly 25% of "open issue → Copilot PR" results are mergeable as-is. That's not amazing, but it's better than zero, and it gets to about 50% after a few review comments. It would be higher if Copilot followed our setup instructions more reliably or let us use stronger models.

Overall, for roughly $1k/month, we're getting the equivalent of 1.5 additional junior/mid engineers per engineer. Those "LLM engineers" always write tests, follow standards, produce good commit messages, and work 24/7. There's friction in reviewing and context-switching across agents, but it's manageable.

What are you doing for vibe coding in a production system?