Manual vs. AI Requirements Gathering: 2 Sentences vs. a 127-Point Spec
We took a vague 2-sentence client request for a "Team Productivity Dashboard" and ran it through two different discovery processes: a traditional human analyst approach vs. an AI-driven interrogation workflow.

The results were uncomfortable. The human produced a polite paragraph summarizing the "happy path." The AI produced a 127-point technical specification that highlighted every edge case, security flaw, and missing feature we usually forget until Week 8.

Here is the breakdown of the experiment and why I think "scope creep" is mostly just discovery failure.

The Problem: The "Assumption Blind Spot"

We’ve all lived through the "Week 8 Crisis." You’re 75% through a 12-week build, and suddenly the client asks, "Where is the admin panel to manage users?" The dev team assumed it was out of scope; the client assumed it was implied because "all apps have logins."

Humans have high context. When we hear "dashboard," we assume standard auth, standard errors, and standard scale. We don't write it down because it feels pedantic.

AI has zero context. It doesn't know that "auth" is implied. It doesn't know that we don't care about rate limiting for a prototype. So it asks.

The Experiment

We fed the same input to a senior human analyst and an LLM workflow acting as a technical interrogator.

Input: "We need a dashboard to track team productivity. It should pull data from Jira and GitHub and show us who is blocking who."

Path A: Human Analyst
Output: ~5 bullet points.
Focused on the UI and the "business value."
Assumed: Standard Jira/GitHub APIs, single tenant, standard security.
Result: A clean, readable, but technically hollow summary.

Path B: AI Interrogator
Output: 127 distinct technical requirements.
Focused on: Failure states, data governance, and edge cases.
Result: A massive, boring, but exhaustive document.

The Results

The volume difference (5 vs. 127) is striking, but the content difference is what matters. The AI explicitly defined requirements that the human completely "blind spotted":

- Granular RBAC: "What happens if a junior dev tries to delete a repo link?"
- API Rate Limits: "How do we handle 429 errors from GitHub during a sync?"
- Data Retention: "Do we store the Jira tickets indefinitely? Is there a purge policy?"
- Empty States: "What does the dashboard look like for a new user with 0 tickets?"

The human spec implied these were "implementation details." The AI treated them as requirements. In my experience, treating RBAC as an implementation detail is exactly why projects go over budget.

Trade-offs and Limitations

To be fair, reading a 127-point spec is miserable. There is a serious signal-to-noise problem here.

- Bloat: The AI can be overly rigid. It suggested a microservices architecture for what should be a monolith. It hallucinated complexity where none existed.
- Paralysis: Handing a developer a 127-point list for a prototype is a great way to kill morale.
- Filtering: You still need a human to look at the list and say, "We don't need multi-tenancy yet, delete points 45-60."

However, I'd rather delete 20 unnecessary points at the start of a project than discover 20 missing requirements two weeks before launch.

Discussion

This experiment made me realize that our hatred of writing specs (and our reliance on "implied" context) is a major source of technical debt. The AI is useful not because it's smart, but because it's pedantic enough to ask the questions we think are too obvious to ask.

I’m curious how others handle this "implied requirements" problem:

1. Do you have a checklist for things like RBAC/Auth/Rate Limits that you reuse?
2. Is a 100+ point spec actually helpful, or does it just front-load the arguments?
3. How do you filter the "AI noise" from the critical missing specs?

If anyone wants to see the specific prompts we used to trigger this "interrogator" mode, happy to share in the comments.
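As a concrete illustration of the rate-limit requirement the AI flagged ("How do we handle 429 errors from GitHub during a sync?"), here is a minimal sketch of the kind of answer a spec point like that forces you to write down. This is not our actual implementation; the `with_backoff` helper, its parameters, and the fake endpoint are all hypothetical, shown only to make the 429 path explicit.

```python
import time

def with_backoff(fetch, max_retries=4, base_delay=1.0):
    """Call `fetch()` until it succeeds, backing off on 429 responses.

    `fetch` must return a (status_code, body) tuple; a real sync job
    would wrap a GitHub API call here and honor the Retry-After header
    instead of using a fixed exponential schedule.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Exponential backoff: base_delay * 1, 2, 4, ... seconds.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit: retries exhausted")

# Demo with a fake endpoint that rate-limits twice, then succeeds.
_responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = with_backoff(lambda: next(_responses), base_delay=0.01)
print(status, body)  # -> 200 ok
```

The point isn't the code; it's that "what happens on 429, and when do we give up?" is a requirement with a testable answer, not an implementation detail left to whoever hits it first in Week 8.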